
Conversation

@sanchit97 (Contributor) commented Aug 20, 2021

What does this PR do?

This PR is a continuation of #475, opened as a fresh PR because of the extensive structural changes.

Summary

This PR adds a new metric module to TextAttack, which makes it easier to add new metrics or apply existing metrics to adversarial examples. Currently, metrics are calculated in the attack_log_manager.py file, which is unintuitive and inflexible. Slides detailing this change are at: https://docs.google.com/presentation/d/1O77447GGWQF-xDnIdTr8d2MMPsQ9o9CG3pZZoo2bPrs/edit?usp=sharing

Additions

  • Added metrics module as textattack.metrics.
  • Added attack_metrics module as textattack.metrics.attack_metrics.
  • Added quality_metrics module as textattack.metrics.quality_metrics.

Changes

  • metrics is a new module; metric computation is moved out of attack_log_manager.py (see the usage sketch below).
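For illustration, a minimal usage sketch of the new module. The module paths come from the Additions list above; the Perplexity class name and the calculate() signature are assumptions based on the review discussion below, not quotes of the merged API.

```python
# Hedged usage sketch -- Perplexity and calculate() are assumed names,
# not guaranteed to match the merged code exactly.
from textattack.metrics.quality_metrics import Perplexity


def summarize_quality(attack_results):
    """attack_results: a list of AttackResult objects from an attack run."""
    ppl = Perplexity()                    # the heavy model is loaded once here
    return ppl.calculate(attack_results)  # returns a dict of summary statistics
```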

Deletions

  • (none listed)

Checklist

  • [x] The title of your pull request should be a summary of its contribution.
  • [x] Please write a detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
  • [x] If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it).
  • [x] To indicate a work in progress, please mark it as a draft on GitHub.
  • [x] Make sure existing tests pass.
  • [x] Add relevant tests. No quality testing = no merge.
  • [ ] All public methods must have informative docstrings that work nicely with Sphinx. For new modules/files, please add/modify the appropriate .rst file in TextAttack/docs/apidoc.

@sanchit97 sanchit97 changed the title Metric module Metric module - moved from #475 Aug 20, 2021
@sanchit97 sanchit97 changed the title Metric module - moved from #475 New metric module to improve flexibility and intuitiveness - moved from #475 Aug 20, 2021
@qiyanjun qiyanjun requested a review from jinyongyoo August 20, 2021 15:26
@jinyongyoo (Collaborator) left a comment


@sanchit97 Thank you for adding the perplexity module! I think PPL (and USE encoder similarity if you have time 😉) are two important metrics to add!

I'm still having a hard time understanding why we need to instantiate a new Metric every time we have a new list of results. One downside is clearly evident in the Perplexity class: whenever we instantiate a new Perplexity, we have to load a fresh copy of the GPT-2 model. Could you perhaps explain the design to me?

Also, I left some comments about calculating perplexity. Since what we add here may be used by users for reporting performance, I think it's important that we not only get it correct but also explain clearly what we're calculating and how.

@sanchit97 (Contributor, Author) left a comment


  1. __init__ and calculate have been changed to favor one-time initialization: heavy models are now loaded once in the constructor and reused across calls (see the sketch below).
  2. A new USE metric has been added and checked.
  3. [WIP] Tests and docs.
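As a rough illustration of the one-time-initialization design (a sketch only; the class layout and attribute names are assumptions, not the merged code):

```python
# Sketch of the one-time-initialization design discussed above. Names here
# (Metric, Perplexity, calculate) are assumed for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast


class Metric:
    """Abstract base: heavy resources are loaded in __init__ and reused in calculate()."""

    def calculate(self, results):
        raise NotImplementedError


class Perplexity(Metric):
    def __init__(self, model_name="gpt2"):
        # GPT-2 is loaded once per Perplexity instance, not once per calculate() call.
        self.model = GPT2LMHeadModel.from_pretrained(model_name).eval()
        self.tokenizer = GPT2TokenizerFast.from_pretrained(model_name)

    @torch.no_grad()
    def calculate(self, results):
        # `results` is a list of attack results; only the original texts are used here.
        texts = [r.original_result.attacked_text.text for r in results]
        ppls = []
        for text in texts:
            enc = self.tokenizer(text, return_tensors="pt")
            loss = self.model(**enc, labels=enc["input_ids"]).loss
            ppls.append(torch.exp(loss).item())
        return {"avg_original_perplexity": sum(ppls) / len(ppls)}
```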

@sanchit97 sanchit97 marked this pull request as ready for review September 10, 2021 06:57
@qiyanjun (Member) commented

@sanchit97 can you fix the errors?

@qiyanjun (Member) commented

@jinyongyoo I think this is ready. Can you help review it?

@jinyongyoo (Collaborator) left a comment


@sanchit97 This looks awesome! Thank you for the hard work! This will be a great addition to TextAttack.

One (just one, I promise!) concern I have is the perplexity numbers in the test output. They look a little too high (200-300, 1000-2000), though I cannot see any issues with the code. One reason might be that we're just using poor samples for testing, but maybe we can do a sanity check on a larger number of samples (e.g. 1000 from IMDB, Yelp, etc.) just to make sure.

Otherwise, most of my comments are minor ones about the docstrings. I'm going to approve the PR since the code and design look good! 🤗

Lastly, here are some possible suggestions for how we can improve the metrics module in the future (not necessary for this PR).

  • Batched passes with GPT-2: I know batching with GPT-2 is tricky because it has no padding token, but we can manually set the pad token to the EOS token and use attention masking to make sure the padding tokens are ignored. We also need to make sure that the labels at EOS-padded positions are set to -100 when calculating the loss. This will speed up the perplexity metric a lot (a rough sketch follows this list).
  • Global object sharing for GPT-2 and USE: note that we also use GPT-2 and USE as part of our constraints, so it's possible that we have already loaded the models onto our GPUs when running the attacks. It would be nice if we could reuse them for metrics without instantiating new models (it is very easy to hit GPU OOM). One solution is to make a GLOBAL_SHARED_OBJECTS dictionary within shared/utils, give it global scope, insert the GPT-2 and USE models into the dictionary the first time we instantiate them, and reuse them later in other modules (sketched below).
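A sketch of the batched approach described in the first bullet above (illustrative only; this is not the merged implementation, and the function name is made up):

```python
# Illustrative sketch of batched GPT-2 perplexity (not the merged code).
# GPT-2 has no pad token, so EOS is reused for padding, masked out via the
# attention mask, and padded label positions are set to -100 so they are
# ignored by the loss.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


@torch.no_grad()
def batched_perplexities(texts, batch_size=16):
    ppls = []
    for i in range(0, len(texts), batch_size):
        enc = tokenizer(
            texts[i : i + batch_size],
            return_tensors="pt",
            padding=True,
            truncation=True,
        )
        input_ids, attn = enc["input_ids"], enc["attention_mask"]
        labels = input_ids.clone()
        labels[attn == 0] = -100  # ignore padding positions in the loss

        logits = model(input_ids, attention_mask=attn).logits
        # Shift so that token t is predicted from tokens < t, then average the
        # per-token loss separately for each sentence.
        shift_logits = logits[:, :-1, :].transpose(1, 2)  # (batch, vocab, seq-1)
        shift_labels = labels[:, 1:]                       # (batch, seq-1)
        token_loss = F.cross_entropy(
            shift_logits, shift_labels, ignore_index=-100, reduction="none"
        )
        n_tokens = (shift_labels != -100).sum(dim=1).clamp(min=1)
        per_sentence_loss = token_loss.sum(dim=1) / n_tokens
        ppls.extend(torch.exp(per_sentence_loss).tolist())
    return ppls
```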

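And a minimal sketch of the shared-objects idea from the second bullet (hypothetical: GLOBAL_SHARED_OBJECTS is the name suggested in the comment above, not an existing attribute of textattack.shared.utils):

```python
# Hypothetical sketch of the GLOBAL_SHARED_OBJECTS idea -- not existing
# TextAttack code. A module-level dict acts as a process-wide cache so that
# GPT-2 / USE are loaded onto the GPU at most once and reused everywhere.
GLOBAL_SHARED_OBJECTS = {}


def get_shared(name, loader):
    """Return a cached object, building it with loader() on first use."""
    if name not in GLOBAL_SHARED_OBJECTS:
        GLOBAL_SHARED_OBJECTS[name] = loader()
    return GLOBAL_SHARED_OBJECTS[name]


# Usage (assuming some load_gpt2() helper exists):
# gpt2_lm = get_shared("gpt2-lm", load_gpt2)
```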
| Average perturbed word %: | 5.56% |
| Average num. words per input: | 15.5 |
| Avg num queries: | 1.33 |
| Average Original Perplexity: | 291.47 |
Collaborator commented on the output above:

The perplexity numbers here seem abnormally high (should be less than 100).

Member commented:

I have doubts about this too.

@jinyongyoo (Collaborator) commented

@sanchit97 I also realized I never responded to your replies. I don't get notifications from GitHub and I don't check the repo often, so I didn't see the responses until now. In the future, if you need my response, please try tagging me with @ or let me know via Slack.

As for how we calculate perplexity (the average loss exponentiated vs. the average of individual perplexities), I'm actually not sure of this myself either. I also wondered the same thing recently when I had to calculate perplexity. Personally, I think exponentiating the average loss makes sense if we're calculating the model's perplexity at the corpus level (e.g. WikiText), but since we're calculating the perplexity of individual sentences, I tend to think the average of individual perplexities gives a clearer meaning. But it's possible others might have different interpretations.
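To make the distinction concrete, a small hedged illustration (numbers are made up; this is not a claim about which definition the merged code uses):

```python
# The two perplexity definitions discussed above, on made-up numbers.
# `losses` holds the mean token-level negative log-likelihood of each sentence
# (sentences assumed equal length, so the corpus-level average is a plain mean).
import math

losses = [2.1, 3.4, 5.0]

# Corpus-level style: exponentiate the average loss.
corpus_ppl = math.exp(sum(losses) / len(losses))            # exp(3.5)  ~ 33.1

# Per-sentence style: average the individual perplexities.
avg_sentence_ppl = sum(math.exp(l) for l in losses) / len(losses)
# (exp(2.1) + exp(3.4) + exp(5.0)) / 3  ~  (8.2 + 30.0 + 148.4) / 3  ~  62.2
```

Since exp is convex, Jensen's inequality guarantees the per-sentence average is at least as large as the corpus-style value, which is one reason the two reporting conventions can diverge noticeably.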

@qiyanjun (Member) commented

@jinyongyoo thanks for the excellent review comments. I will fix those docstrings.

@qiyanjun (Member) commented

@ArshdeepSekhon can you review and try the code?

@qiyanjun qiyanjun merged commit 7d2de08 into master Sep 29, 2021