@donggrant

What does this PR do?

Fixes the bug in the Augmenter Command class where extra quotation marks are added to the output CSV files. This PR does so without modifying the input file or creating a temporary file.

Summary

Instead of changing how the CSV reader works, a simpler solution is to mark where quotes and commas occur within the text field while the data is being read into the DictReader(). Then, before augmenting, the marked commas and quotes are restored in the text value. This avoids the repeated-quotation issue and the glitch where the first two quotes are removed by the CSV reader. After augmentation, the output file is cleaned to remove any markings left over from preprocessing.
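
For readers who want a concrete picture of the mark-and-restore idea, here is a minimal sketch. It is not the code from this PR: the placeholder tokens `QUOTE_MARK` and `COMMA_MARK`, the helper names, and the `text`/`label` columns are all assumptions made for illustration. It only shows that a field whose quotes and commas are masked passes through Python's `csv` module without any added quoting, and that the original text can be recovered afterwards.

```python
import csv
import io

# Hypothetical placeholder tokens; the markers used in the actual PR may differ.
QUOTE_MARK = "<<QUOTE>>"
COMMA_MARK = "<<COMMA>>"


def mark(text: str) -> str:
    """Mask quotes and commas so the csv module sees no special characters."""
    return text.replace('"', QUOTE_MARK).replace(",", COMMA_MARK)


def restore(text: str) -> str:
    """Put the original quotes and commas back into the text."""
    return text.replace(QUOTE_MARK, '"').replace(COMMA_MARK, ",")


def write_rows(rows, fieldnames):
    """Write rows to an in-memory CSV and return its contents as a string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


original = 'She said "hi", then left'

# Masked text contains neither the delimiter nor the quote character,
# so the writer emits the field without wrapping or doubling quotes.
marked_csv = write_rows([{"text": mark(original), "label": "1"}], ["text", "label"])

# Reading the masked file back and restoring yields the original text.
reader = csv.DictReader(io.StringIO(marked_csv))
recovered = [restore(row["text"]) for row in reader]
```

One design consequence of this approach is that the placeholder must never occur in real input, otherwise the restore step would alter genuine text.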

@qiyanjun
Member

@Hanyu-Liu-123 please review

Collaborator

@Hanyu-Liu-123 Hanyu-Liu-123 left a comment

Looks good to me! I think we could also add this solution to the csv logger for attack results since it most likely has the same issue with quotation marks.

I think it would also be helpful to warn users if there is a / in the original sentence, since it will be deleted in the output.
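
A warning along these lines could be sketched as follows. The function name `warn_if_reserved` and its `reserved` parameter are hypothetical and not part of this PR or of the library; the default of `/` is taken only from the comment above.

```python
import warnings


def warn_if_reserved(text: str, reserved: str = "/") -> bool:
    """Warn if `text` contains the reserved character.

    Returns True when a warning was issued, False otherwise.
    """
    if reserved in text:
        warnings.warn(
            f"Input contains {reserved!r}, which will be removed from the output.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Calling this on each input sentence before preprocessing would let users know up front that their text will not round-trip exactly.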

@qiyanjun qiyanjun merged commit 11782c4 into QData:master Mar 20, 2022
3 participants