QData · qiyanjun · Nov 28, 2020 · Oct 22, 2020
diff --git a/.github/workflows/check-formatting.yml b/.github/workflows/check-formatting.yml
@@ -28,7 +28,7 @@ jobs:
         python -m pip install --upgrade pip setuptools wheel
         pip install black flake8 isort # Testing packages
         python setup.py install_egg_info # Workaround https://github.com/pypa/pip/issues/4537
-        pip install -e .
+        pip install -e .[dev]
     - name: Check code format with black and isort
       run: |
         make lint
diff --git a/.github/workflows/run-pytest.yml b/.github/workflows/run-pytest.yml
@@ -29,9 +29,9 @@ jobs:
         pip install pytest pytest-xdist # Testing packages
         pip uninstall textattack --yes # Remove TA if it's already installed 
         python setup.py install_egg_info # Workaround https://github.com/pypa/pip/issues/4537
-        pip install .
+        pip install -e .[dev]
         pip freeze
     - name: Test with pytest
       run: |
-        pytest tests -vx --dist=loadfile -n auto
+        pytest tests -v
 
diff --git a/.readthedocs.yml b/.readthedocs.yml
@@ -17,3 +17,5 @@ python:
       - requirements: requirements.txt
       - method: pip
         path: .
+        extra_requirements:
+          - docs
diff --git a/Makefile b/Makefile
@@ -14,7 +14,7 @@ test: FORCE ## Run tests using pytest
 	python -m pytest --dist=loadfile -n auto
 
 docs: FORCE ## Build docs using Sphinx.
-	sphinx-build -b html docs docs/_build/html 
+	sphinx-build -b html docs docs/_build/html
 
 docs-check: FORCE ## Builds docs using Sphinx. If there is an error, exit with an error code (instead of warning & continuing).
 	sphinx-build -b html docs docs/_build/html -W

diff --git a/README.md b/README.md
diff --git a/README_ZH.md b/README_ZH.md
diff --git a/docs/1start/FAQ.md b/docs/1start/FAQ.md
@@ -0,0 +1,123 @@
+Frequently Asked Questions
+========================================
+
+## Via Slack: Where to Ask Questions
+
+For help and realtime updates related to TextAttack, please [join the TextAttack Slack](https://join.slack.com/t/textattack/shared_invite/zt-huomtd9z-KqdHBPPu2rOP~Z8q3~urgg)!
+
+
+## Via CLI: `--help`
+
++ Easiest self-help: `textattack --help`
++ More specific self-help:
+  - `textattack attack --help`  
+  - `textattack augment --help`
+  - `textattack train --help`
+  - `textattack peek-dataset --help`
+  - `textattack list`, e.g., `textattack list search-methods`
+
+
+## Via our papers: More details on results  
++ [references](https://textattack.readthedocs.io/en/latest/1start/references.html)
+
+
+## Via readthedocs: More details on APIs
++ [complete API reference on TextAttack](https://textattack.readthedocs.io/en/latest/apidoc/textattack.html) 
+
+
+## More Specific Questions
+
+
+### 1. How to Train
+
+For example, you can *train our default LSTM for 50 epochs on the Yelp Polarity dataset*:
+```bash
+textattack train --model lstm --dataset yelp_polarity --batch-size 64 --epochs 50 --learning-rate 1e-5
+```
+
+The training process has data augmentation built in:
+```bash
+textattack train --model lstm --dataset rotten_tomatoes --augment eda --pct-words-to-swap .1 --transformations-per-example 4
+```
+This uses the `EasyDataAugmenter` recipe to augment the `rotten_tomatoes` dataset before training.
+
+*Fine-tune `bert-base` on the `CoLA` dataset for 5 epochs*:
+```bash
+textattack train --model bert-base-uncased --dataset glue^cola --batch-size 32 --epochs 5
+```
+
+
+
+
+### 2. Use Custom Models
+
+TextAttack is model-agnostic! You can use `TextAttack` to analyze any model that outputs IDs, tensors, or strings. To help users get started, TextAttack includes pre-trained models for common NLP tasks.
+These models also enable a fairer comparison of attacks from the literature. A list of available pre-trained models and their validation accuracies is available [here](https://textattack.readthedocs.io/en/latest/3recipes/models.html).
+
+
+You can easily try out an attack on a local model of your choice. To attack a pre-trained model, create a short file that loads the model and its tokenizer as the variables `model` and `tokenizer`. The `tokenizer` must
+be able to transform string inputs to lists or tensors of IDs using a method called `encode()`. The
+model must take inputs via the `__call__` method.
+
+#### Model from a file
+To experiment with a model you've trained, you could create the following file
+and name it `my_model.py`:
+
+```python
+model = load_your_model_with_custom_code() # replace this line with your model loading code
+tokenizer = load_your_tokenizer_with_custom_code() # replace this line with your tokenizer loading code
+```
+
+Then, run an attack with the argument `--model-from-file my_model.py`. The model and tokenizer will be loaded automatically.
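+
+As a rough illustration of the `encode()` and `__call__` requirements above, a `my_model.py` for a toy classifier might look like the sketch below. The `ToyTokenizer` and `ToySentimentModel` classes are hypothetical stand-ins invented for this example; they are not part of TextAttack, and a real file would load your own trained model instead:
+
+```python
+# my_model.py (illustrative only; both classes are hypothetical stand-ins)
+
+
+class ToyTokenizer:
+    """Maps each lowercase, whitespace-separated word to an integer ID."""
+
+    def __init__(self, vocab):
+        # ID 0 is reserved for words that are not in the vocabulary.
+        self.word_to_id = {word: i + 1 for i, word in enumerate(vocab)}
+
+    def encode(self, text):
+        return [self.word_to_id.get(word, 0) for word in text.lower().split()]
+
+
+class ToySentimentModel:
+    """Scores a batch of ID lists as [negative, positive] pairs."""
+
+    def __init__(self, positive_ids):
+        self.positive_ids = set(positive_ids)
+
+    def __call__(self, batch_of_ids):
+        scores = []
+        for ids in batch_of_ids:
+            hits = sum(1 for i in ids if i in self.positive_ids)
+            positive = hits / len(ids) if ids else 0.0
+            scores.append([1.0 - positive, positive])
+        return scores
+
+
+# The two module-level variables described above.
+tokenizer = ToyTokenizer(["good", "great", "bad", "movie"])
+model = ToySentimentModel(positive_ids=[1, 2])
+```
+
+Here `model([tokenizer.encode("great movie")])` returns one `[negative, positive]` score pair per input, which is the shape a classification goal function would expect.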
+
+TextAttack is model-agnostic, meaning it can run attacks on models implemented in any deep learning framework. Model objects must be able to take a string (or list of strings) and return an output that can be processed by the goal function. For example, machine translation models take a list of strings as input and produce a list of strings as output. Classification and entailment models return an array of scores. As long as the user's model meets this specification, the model is fit to use with TextAttack.
+
+
+### 3. Use Custom Datasets 
+
+
+#### From a file
+
+Loading a dataset from a file is very similar to loading a model from a file. A 'dataset' is any iterable of `(input, output)` pairs.
+The following example would load a sentiment classification dataset from file `my_dataset.py`:
+
+```python
+dataset = [('Today was....', 1), ('This movie is...', 0), ...]
+```
+
+You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.
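+
+If your data lives in a CSV file rather than a Python literal, `my_dataset.py` can build the pairs at import time. The sketch below is one possible approach; the CSV content, the column names `text` and `label`, and the helper `load_pairs` are assumptions made up for this example:
+
+```python
+# my_dataset.py (illustrative only; the CSV content is made up for this sketch)
+import csv
+import io
+
+RAW_CSV = """text,label
+"A warm, funny film.",1
+"Dull and far too long.",0
+"""
+
+
+def load_pairs(csv_file):
+    """Read (text, label) rows into a list of (input, output) pairs."""
+    reader = csv.DictReader(csv_file)
+    return [(row["text"], int(row["label"])) for row in reader]
+
+
+# As in the example above, the pairs are exposed as a variable named `dataset`.
+# For a real file, replace io.StringIO(RAW_CSV) with open("your_file.csv", newline="").
+dataset = load_pairs(io.StringIO(RAW_CSV))
+```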
+
+
+
+#### Custom Dataset via AttackedText class
+
+To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object
+which maintains both a list of tokens and the original text, with punctuation. We use this object instead of a list of words or just raw text.
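+
+To make the idea concrete, here is a simplified stand-in that keeps the original string alongside the position of each word, so a word can be swapped without disturbing punctuation. `ToyAttackedText` is a hypothetical class written for this example and is not TextAttack's actual `AttackedText` implementation:
+
+```python
+import re
+
+
+class ToyAttackedText:
+    """Keeps the original string plus the character span of each word."""
+
+    def __init__(self, text):
+        self.text = text
+        self._spans = [match.span() for match in re.finditer(r"\w+", text)]
+
+    @property
+    def words(self):
+        return [self.text[start:end] for start, end in self._spans]
+
+    def replace_word_at_index(self, index, new_word):
+        """Return a new object with one word swapped; punctuation is untouched."""
+        start, end = self._spans[index]
+        return ToyAttackedText(self.text[:start] + new_word + self.text[end:])
+
+
+original = ToyAttackedText("Today was great, honestly!")
+perturbed = original.replace_word_at_index(2, "fine")
+print(perturbed.text)  # Today was fine, honestly!
+```
+
+Returning a new object rather than editing in place keeps the original text available for comparison against each perturbed candidate.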
+
+
+#### Custom Dataset via Data Frames or other Python data objects (*coming soon*)
+
+
+### 4. Benchmarking Attacks
+
+- See our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackboxNLP](https://arxiv.org/abs/2009.06368).
+
+- As we emphasized in the above paper, we do not recommend directly comparing attack recipes out of the box.
+
+- This is because attack recipes in the recent literature set up their constraints in different ways or with different thresholds. Without the constraint space held constant, an increase in attack success rate could come from an improved search or transformation method or from a less restrictive search space.
+
+
+### 5. Create Custom or New Attacks
+
+The `attack_one` method in an `Attack` takes as input an `AttackedText`, and outputs either a `SuccessfulAttackResult` if it succeeds or a `FailedAttackResult` if it fails. 
+
+- [Here is an example of using TextAttack to create a new attack method](https://github.com/jxmorris12/second-order-adversarial-examples) 
+
+
+We formulate an attack as consisting of four components: a **goal function** which determines if the attack has succeeded, **constraints** defining which perturbations are valid, a **transformation** that generates potential modifications given an input, and a **search method** which traverses through the search space of possible perturbations. The attack attempts to perturb an input text such that the model output fulfills the goal function (i.e., indicating whether the attack is successful) and the perturbation adheres to the set of constraints (e.g., grammar constraint, semantic similarity constraint). A search method is used to find a sequence of transformations that produce a successful adversarial example.
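+
+The four components can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not TextAttack's API: the lexicon-based `model`, the `SYNONYMS` table, and all function names are invented for this example, and the search is a simple greedy loop:
+
+```python
+# Toy model: the score is the fraction of words found in a small positive lexicon.
+POSITIVE = {"good", "great", "fine"}
+SYNONYMS = {"great": ["fine", "okay"], "movie": ["film"]}
+
+
+def model(words):
+    return sum(w in POSITIVE for w in words) / len(words)
+
+
+def goal_function(words):
+    """Goal: the attack succeeds when the positive score drops to zero."""
+    return model(words) == 0.0
+
+
+def transformation(words):
+    """Generate candidates by swapping one word for a listed synonym."""
+    for i, word in enumerate(words):
+        for synonym in SYNONYMS.get(word, []):
+            yield words[:i] + [synonym] + words[i + 1:]
+
+
+def constraint(original, candidate, max_changed=1):
+    """Accept candidates that differ from the original in at most max_changed words."""
+    changed = sum(a != b for a, b in zip(original, candidate))
+    return changed <= max_changed
+
+
+def greedy_search(original):
+    """Search method: repeatedly keep the valid candidate with the lowest score."""
+    current = original
+    while not goal_function(current):
+        candidates = [c for c in transformation(current) if constraint(original, c)]
+        if not candidates:
+            return None  # nothing left to try; the attack fails
+        best = min(candidates, key=model)
+        if model(best) >= model(current):
+            return None  # no candidate improves the goal; the attack fails
+        current = best
+    return current
+
+
+result = greedy_search(["a", "great", "movie"])
+```
+
+Because each component is a separate function, swapping in a different search method or a stricter constraint leaves the other three untouched, which is the property the modular design relies on.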
+
+
+This modular design unifies adversarial attack methods into one system and lets us easily assemble attacks from the literature while reusing components that are shared across attacks. We provide clean, readable implementations of 16 adversarial attack recipes from the literature (see [our tool paper](https://arxiv.org/abs/2005.05909) and [our benchmark search paper](https://arxiv.org/abs/2009.06368)). For the first time, these attacks can be benchmarked, compared, and analyzed in a standardized setting.
+
+
+
diff --git a/docs/1start/api-design-tips.md b/docs/1start/api-design-tips.md
@@ -17,7 +17,6 @@ One of the challenges for building such tools is that the tool should be flexibl
 
 We provide the following broad advice to help other future developers create user-friendly NLP libraries in Python:
 - To become model-agnostic, implement a model wrapper class: a model is anything that takes string input(s) and returns a prediction.
-- To become model-agnostic, implement a model wrapper class.
 - To become data-agnostic, take dataset inputs as (input, output) pairs, where each model input is represented as an OrderedDict.
 - Do not plan for inputs (tensors, lists, etc.) to be a certain size or shape unless explicitly necessary.
 - Centralize common text operations, like parsing and string-level operations, in one class.
@@ -29,6 +28,17 @@ We provide the following broad advice to help other future developers create use
  Our modular and extendable design allows us to reuse many components to offer 15+ different adversarial attack methods proposed by literature. Our model-agnostic and dataset-agnostic design allows users to easily run adversarial attacks against their own models built using any deep learning framework. We hope that our lessons from developing TextAttack will help others create user-friendly open-source NLP libraries.
 
 
+## TextAttack flowchart
+
+![TextAttack flowchart](/_static/imgs/intro/textattack_components.png)
+
+
++ Here is a summary diagram of the TextAttack ecosystem:
+
+![diagram](/_static/imgs/intro/textattack_ecosystem.png)
+
+
+
 ## More Details in Reference
 
 ```