Fix CLARE StopIteration Bug #591
Merged
What does this PR do?
Summary
This PR fixes #589, in which the sentence encoder raises StopIteration when the newly_modified_indices of the attacked text is an empty set. This happens when the new words swapped, inserted, or merged by the CLARE transformations do not contain any letters. Our current implementation records newly_modified_indices only when the new words at the modified indices are actual words, not symbols. An example is shown below. Suppose we have the sentence
and apply WordMergeMaskedLM to it. One of the transformations it returns is
in which the phrase "manages sweetness" is changed to the symbols "),".
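To make the failure mode concrete, here is a minimal Python sketch. It is not TextAttack's code: is_word, record_modified_indices, and first_modified_index are hypothetical helpers that mirror the behaviour described in this PR. It also assumes the sentence encoder picks a modified index with next(iter(...)), which raises StopIteration when the set is empty. The position 3 is arbitrary.

```python
def is_word(token: str) -> bool:
    # Hypothetical rule: a token counts as a word if it contains at least one letter.
    return any(ch.isalpha() for ch in token)


def record_modified_indices(replacements: list[tuple[int, str]]) -> set[int]:
    # Keep only indices whose new token is a word; symbol-only tokens are
    # silently dropped, mirroring the bookkeeping described above.
    return {index for index, new_token in replacements if is_word(new_token)}


def first_modified_index(indices: set[int]) -> int:
    # Assumption: the encoder grabs one modified index this way.
    # next() on an empty iterator raises StopIteration.
    return next(iter(indices))


# Hypothetical merge: "manages sweetness" (at arbitrary position 3) becomes "),".
indices = record_modified_indices([(3, "),")])
print(indices)  # set()

try:
    first_modified_index(indices)
except StopIteration:
    print("StopIteration: no modified index was recorded")
```

A word-like replacement such as (3, "lacks") would instead yield {3}, and the lookup would succeed.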
This change will not be present in the newly_modified_indices of the transformed sentence, since we omit symbols. But when running the sentence encoder, newly_modified_indices of the transformed sentence needs to have at least one element, so the StopIteration bug occurs.
To fix the issue, one option is to change how newly_modified_indices is recorded. This is also linked to the deletion index issue #558. It would likely be a big change, since our current implementation relies on not counting symbols as modifications. The other option, as a temporary fix, is for the CLARE transformations to only include transformations whose substituted or added words contain at least one letter. That way, the modified indices are recorded as normal.
Additions
Check the replacement words before applying a transformation. If the replacement words contain no letters, skip that transformation.
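A minimal sketch of that check, assuming a transformation replaces a span of words at a given position with a single candidate (span=1 for a swap, span=0 for an insert, span=2 for a merge of two words). contains_letter and build_transformations are hypothetical names, not the CLARE transformation API, and "letter" is taken to mean any character for which str.isalpha() is True; the actual change may use a narrower check.

```python
def contains_letter(word: str) -> bool:
    # Assumption: a "letter" is any character for which str.isalpha() is True.
    return any(ch.isalpha() for ch in word)


def build_transformations(
    words: list[str], index: int, candidates: list[str], span: int = 1
) -> list[list[str]]:
    """Hypothetical sketch: replace words[index:index + span] with each candidate.

    span=1 swaps a word, span=0 inserts before `index`, span=2 merges two words.
    Candidates without any letter are skipped, so every returned transformation
    modifies at least one index that would be recorded as a word.
    """
    transformed = []
    for candidate in candidates:
        if not contains_letter(candidate):
            # Symbol-only output such as ")," would leave the modified indices empty.
            continue
        transformed.append(words[:index] + [candidate] + words[index + span:])
    return transformed


# Hypothetical tokens for illustration, not the sentence from the PR.
words = ["it", "manages", "sweetness", "well"]
print(build_transformations(words, 1, ["),", "lacks"], span=2))
# [['it', 'lacks', 'well']]
```

Filtering before the transformation is built keeps the existing index bookkeeping untouched, which matches the temporary-fix intent; the cost is that symbol-only candidates are never explored.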