This folder contains a collection of bash scripts and configuration files used to run the entire SDGym benchmarking suite on the MIT SuperCloud infrastructure.
The contents of this folder consist of the following elements:
- `config`: Folder that contains configuration files for different tasks, which indicate the synthesizers and data modalities to run and the resources needed.
- `run.sh`: A bash script that interprets the `config` files and launches SDGym with the corresponding settings.
- `submit.sh`: A bash script that submits tasks to the MIT SuperCloud job queue using the `run.sh` script and the configuration files.
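The exact contents of a `config` file are defined by `run.sh`; purely as a hypothetical sketch (every variable name below is an assumption for illustration, not an actual SDGym setting), such a file could declare shell variables along these lines:

```shell
# Hypothetical config sketch -- variable names are illustrative only,
# not SDGym's real settings.
SYNTHESIZERS="identity gaussian_copula"   # which synthesizers to benchmark
MODALITIES="single-table"                 # which data modalities to run
CPUS=4                                    # resources to request from the job queue
MEMORY="16G"
```

Check the actual files in the `config` folder for the real variable names and values.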
In order to use these scripts you will need an account on the MIT SuperCloud.
If you also want to upload the results to the sdgym S3 bucket, you will need an AWS
IAM user with write permissions to the bucket and a `.aws/credentials` file configured accordingly.
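For reference, the standard `~/.aws/credentials` layout looks like this (the values are placeholders to replace with your IAM user's actual credentials):

```ini
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
```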
In order to run SDGym using these scripts, please follow these steps:
- Log into the MIT SuperCloud launch server.

- Clone and enter the SDGym repository:

  ```bash
  git clone https://github.com/sdv-dev/SDGym
  cd SDGym
  ```

- Install SDGym inside a virtualenv or conda env (NOTE: you may need to activate the conda module):

  ```bash
  conda create -y -n sdgym
  conda activate sdgym
  make install
  ```

- Enter the `scripts` folder and execute `submit.sh`, passing the config files you want to run:

  ```bash
  cd scripts
  ./submit.sh config/identity.conf config/...
  ```

NOTE: The first time the script is run it will download all the available datasets in the `datasets` folder within the `scripts` folder. Subsequent runs will skip this step if the `datasets` folder is found.
After this, you can verify that the tasks have been properly submitted by running `LLstat`, and
that a folder called `runs/<current-date-and-time>` has been created inside the `scripts` folder.
After all the tasks have finished (`LLstat` should show no running tasks), you can collect
the results and upload them to the sdgym S3 bucket using the following steps:
- Enter the `scripts/runs` folder:

  ```bash
  cd SDGym/scripts/runs
  ```

- Create a `tar.gz` file with the run results:

  ```bash
  tar cvzf <run-date-and-time>.tar.gz <run-date-and-time>
  ```

- Upload the `tar.gz` file that you just created to S3:

  ```bash
  aws s3 cp <run-date-and-time>.tar.gz s3://sdgym/runs
  ```

While running, SDGym will generate a large collection of CSV files containing results from the different synthesizers and datasets.
In order to collect all these CSVs into a single table you can use the `collect.py` python
script, passing it the path to the results folder and the path of the new CSV file to generate:
- Enter the `scripts` folder:

  ```bash
  cd SDGym/scripts
  ```

- Call `python collect.py`, passing the path to the run folder and the output csv path:

  ```bash
  python collect.py runs/<run-date-and-time> output.csv
  ```
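`collect.py` handles the aggregation for you; purely as an illustration of the general idea (not the script's actual logic), merging many result CSVs into one table while keeping a single header row can be sketched in shell like this, using a throwaway demo folder:

```shell
#!/usr/bin/env bash
set -e
# Illustrative only: build two tiny demo CSVs, then merge every CSV in the
# folder into one file, keeping the header line from the first file only.
mkdir -p demo-run
printf 'synthesizer,dataset,score\na,d1,0.5\n' > demo-run/part1.csv
printf 'synthesizer,dataset,score\nb,d2,0.7\n' > demo-run/part2.csv

out=combined.csv
first=1
for f in demo-run/*.csv; do
  if [ "$first" -eq 1 ]; then
    cat "$f" > "$out"          # first file: keep its header
    first=0
  else
    tail -n +2 "$f" >> "$out"  # later files: skip the duplicate header
  fi
done
wc -l "$out"
```

The real `collect.py` may behave differently; this only shows why a dedicated collection step is needed when results are spread across many files.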