Scala Permutation Evaluation

This repository contains scripts for evaluating the impact of token-level permutations (deletions or neighbour flips) on Polish sentences using the EuroEval framework. The scripts generate corrupted sentences, let you manually tag whether each permutation broke grammaticality, and update both the CSV and log files with these annotations.

📁 File Structure

data/
├── scala_sample_300_full_nostatus_run1.csv # Generated sample without status
├── scala_tags_run1.txt # Manual tags of erroneous or unsure sentences
├── scala_sample_300_full_run1.csv # CSV updated with status
├── scala_sample_log_run1.txt # Original log
└── scala_sample_log_run1_updated.txt # Log updated with status


scala-test.py # Main script: generates sample, applies permutations, creates initial CSV & log
scala-tags-to-csv.py # Updates CSV based on manual tagging
scala-tags-to-log.py # Updates log based on updated CSV

🧩 Workflow

1️⃣ Generate corrupted sample

Clone this repository, then clone the EuroEval repository next to it. Activate the virtual environment and run the main script:

source env/bin/activate
python scala-test.py

This will:
- Load the Polish test dataset (PLDT via EuroEval helper scripts).
- Deterministically select 300 sentences (using random.seed).
- Apply token-level permutations using euroeval (delete or flip_neighbours).
- Save the CSV (scala_sample_300_full_nostatus_run1.csv) and the log (scala_sample_log_run1.txt).
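The sampling and corruption steps above can be sketched as follows. This is a minimal standalone illustration, not the code in scala-test.py and not EuroEval's implementation: the helper names `delete_token`, `flip_neighbours_tokens` and `corrupt_sample`, the whitespace tokenisation, and the row keys are all assumptions made for the example. It uses a local `random.Random(seed)` instance so that the same seed always yields the same sample.

```python
import random


def delete_token(tokens, rng):
    """Return a copy of tokens with one randomly chosen token removed."""
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def flip_neighbours_tokens(tokens, rng):
    """Return a copy of tokens with one randomly chosen adjacent pair swapped."""
    i = rng.randrange(len(tokens) - 1)
    flipped = list(tokens)
    flipped[i], flipped[i + 1] = flipped[i + 1], flipped[i]
    return flipped


def corrupt_sample(sentences, k, seed):
    """Deterministically pick k sentences and corrupt each with one permutation."""
    rng = random.Random(seed)
    # Both permutations need at least two tokens to be meaningful.
    eligible = [s for s in sentences if len(s.split()) >= 2]
    rows = []
    for idx, sentence in enumerate(rng.sample(eligible, k), start=1):
        tokens = sentence.split()  # assumption: whitespace tokenisation
        op = rng.choice([delete_token, flip_neighbours_tokens])
        rows.append({
            "id": idx,
            "original": sentence,
            "corrupted": " ".join(op(tokens, rng)),
            "permutation": op.__name__,
        })
    return rows


sentences = [
    "Ala ma kota .",
    "To jest dobry pomysł .",
    "Pada deszcz .",
    "Lubię czytać książki .",
]
for row in corrupt_sample(sentences, k=3, seed=42):
    print(row["id"], row["permutation"], "|", row["corrupted"])
```

Running the sketch twice with the same seed produces identical rows, which is the property the manual tagging step relies on.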

Note: To generate a new run, change the CSV_PATH, LOG_PATH filenames and the random seed in scala-test.py.

2️⃣ Manual tagging

Open the generated log. Identify sentences where the permutation did not break grammaticality (i.e., the corrupted sentence is still correct after permutation). Record their IDs in a TXT file (scala_tags_run1.txt), separated by commas. If you are unsure about grammaticality, append a ? to the ID.

Example scala_tags_run1.txt:

2,5,7,13?
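A tag file in this format can be parsed in a few lines. The sketch below is illustrative only; `parse_tags` is a hypothetical helper, not a function from the repository. It maps plain IDs to the status error and IDs followed by ? to unsure, the two status values used in step 3.

```python
def parse_tags(text):
    """Map each tagged sentence ID to 'error', or 'unsure' if it ends with '?'."""
    statuses = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and blank input
        if item.endswith("?"):
            statuses[int(item[:-1])] = "unsure"
        else:
            statuses[int(item)] = "error"
    return statuses


print(parse_tags("2,5,7,13?"))
# {2: 'error', 5: 'error', 7: 'error', 13: 'unsure'}
```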

3️⃣ Update CSV with tags

Run:

python scala-tags-to-csv.py

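This script:
- Loads the original CSV (scala_sample_300_full_nostatus_run1.csv).
- Reads manual tags from the TXT file (scala_tags_run1.txt).
- Sets the default status ok for every sentence.
- Sets error for sentences whose IDs are listed in the tag file, i.e. those where the permutation did not break grammaticality.
- Sets unsure for sentences whose IDs are marked with ?.
- Saves the updated CSV (scala_sample_300_full_run1.csv).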
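The status update can be sketched with the standard csv module. This is a hedged illustration rather than the contents of scala-tags-to-csv.py: the column names `id` and `status`, the helper `apply_statuses`, and the sample rows are assumptions, and the demo uses in-memory buffers instead of the real files.

```python
import csv
import io


def apply_statuses(csv_in, csv_out, tagged):
    """Copy rows from csv_in to csv_out, adding a status column.

    tagged maps sentence IDs to 'error' or 'unsure'; all other rows get 'ok'.
    """
    reader = csv.DictReader(csv_in)
    fieldnames = list(reader.fieldnames)
    if "status" not in fieldnames:
        fieldnames.append("status")
    writer = csv.DictWriter(csv_out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        # Assumption: the CSV has an integer 'id' column.
        row["status"] = tagged.get(int(row["id"]), "ok")
        writer.writerow(row)


# Demo on in-memory data (hypothetical rows).
src = io.StringIO(
    "id,original,corrupted\n"
    "1,Ala ma kota .,Ala kota .\n"
    "2,Pada deszcz .,deszcz Pada .\n"
    "3,To jest dom .,To dom .\n"
)
out = io.StringIO(newline="")
apply_statuses(src, out, tagged={2: "error", 3: "unsure"})
print(out.getvalue())
```

With real files, open both with `newline=""` and an explicit `encoding="utf-8"`, as the csv module documentation recommends.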

4️⃣ Update log with CSV statuses

Run:

python scala-tags-to-log.py

This script:
- Loads the updated CSV with statuses (scala_sample_300_full_run1.csv).
- Reads the original log (scala_sample_log_run1.txt).
- Updates all STATUS: ... lines to match the CSV.
- Saves a new log file (scala_sample_log_run1_updated.txt).
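The README does not specify the log layout beyond the STATUS: ... lines, so the sketch below assumes a hypothetical format in which each entry begins with an `ID: <number>` line followed later by a `STATUS:` line. Under that assumption, rewriting the statuses is a single pass over the lines; `update_log` is an illustrative helper, not the repository's code.

```python
import re

# Assumed (hypothetical) log layout: "ID: 7" opens an entry, "STATUS: ..." closes it.
ID_RE = re.compile(r"^ID:\s*(\d+)\s*$")
STATUS_RE = re.compile(r"^STATUS:.*$")


def update_log(log_text, statuses):
    """Rewrite each STATUS line to the status of the most recently seen ID."""
    current_id = None
    out = []
    for line in log_text.splitlines():
        id_match = ID_RE.match(line)
        if id_match:
            current_id = int(id_match.group(1))
        elif STATUS_RE.match(line) and current_id in statuses:
            line = f"STATUS: {statuses[current_id]}"
        out.append(line)
    return "\n".join(out) + "\n"


log = (
    "ID: 1\nORIGINAL: Ala ma kota .\nCORRUPTED: Ala kota .\nSTATUS: \n\n"
    "ID: 2\nORIGINAL: Pada deszcz .\nCORRUPTED: deszcz Pada .\nSTATUS: \n"
)
updated = update_log(log, {1: "ok", 2: "error"})
print(updated)
```

STATUS lines whose ID is missing from the mapping are left unchanged, so a partial CSV never silently overwrites existing annotations.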

⚙️ Notes

Changing runs: For a new run (e.g., run2), update:
- CSV_PATH and LOG_PATH in scala-test.py
- Seed for random sampling
- Names of manual tag files (*.txt) in the tagging scripts.
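One way to keep these names in sync is to derive every path from a single run label. The sketch below follows the file names listed under File Structure; the helper `run_paths` and its dictionary keys are hypothetical and are not part of the existing scripts, which define their paths as constants.

```python
from pathlib import Path


def run_paths(run, data_dir="data"):
    """Build all per-run file paths from one run label such as 'run2'."""
    base = Path(data_dir)
    return {
        "csv_nostatus": base / f"scala_sample_300_full_nostatus_{run}.csv",
        "tags": base / f"scala_tags_{run}.txt",
        "csv_full": base / f"scala_sample_300_full_{run}.csv",
        "log": base / f"scala_sample_log_{run}.txt",
        "log_updated": base / f"scala_sample_log_{run}_updated.txt",
    }


for name, path in run_paths("run2").items():
    print(f"{name}: {path}")
```

The random seed still has to be changed separately in scala-test.py.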

✅ Summary

1. Run scala-test.py → generate corrupted sample + log.
2. Manually tag IDs in *.txt.
3. Run scala-tags-to-csv.py → update CSV statuses.
4. Run scala-tags-to-log.py → update log statuses.

This workflow produces a manually verified annotation of the corrupted sentences, suitable for benchmarking syntactic perturbation models.
