This folder contains a collection of bash scripts and configuration files used to run the entire SDGym benchmarking suite on the MIT SuperCloud infrastructure.
The contents of this folder consist of the following elements:
- `config`: Folder that contains configuration files for different tasks, which indicate the synthesizers and data modalities to run and the resources needed.
- `run.sh`: A bash script that interprets the `config` files and launches SDGym with the corresponding settings.
- `submit.sh`: A bash script that submits tasks to the MIT SuperCloud job queue using the `run.sh` script and the configuration files.
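The exact contents of a `config` file are defined by `run.sh`; purely as a hypothetical sketch (every variable name below is an assumption for illustration, not an actual SDGym setting), such a file could declare shell variables along these lines:

```shell
# Hypothetical config sketch -- variable names are illustrative only,
# not SDGym's real settings.
SYNTHESIZERS="identity gaussian_copula"   # which synthesizers to benchmark
MODALITIES="single-table"                 # which data modalities to run
CPUS=4                                    # resources to request from the job queue
MEMORY="16G"
```

Check the actual files in the `config` folder for the real variable names and values.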
In order to use these scripts you will need an account on the MIT SuperCloud.
If you also want to upload the results to the sdgym S3 bucket, you will need an AWS
IAM user with write permissions to the bucket and a `.aws/credentials` file configured accordingly.
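For reference, the standard `~/.aws/credentials` layout looks like this (the values are placeholders to replace with your IAM user's actual credentials):

```ini
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
```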
In order to run SDGym using these scripts, please follow these steps:
- Log into the MIT SuperCloud launch server.

- Clone and enter the SDGym repository:

  ```bash
  git clone https://github.com/sdv-dev/SDGym
  cd SDGym
  ```

- Install SDGym inside a virtualenv or conda env (NOTE: you may need to activate the conda module):

  ```bash
  conda create -y -n sdgym
  conda activate sdgym
  make install
  ```

- Enter the `scripts` folder and execute `submit.sh`, passing the config files you want to run:

  ```bash
  cd scripts
  ./submit.sh config/identity.conf config/...
  ```

NOTE: The first time the script is run it will download all the available datasets in the `datasets` folder within the `scripts` folder. Subsequent runs will skip this step if the `datasets` folder is found.
After this, you can verify that the tasks have been properly submitted by running `LLstat`, and
that a folder called `runs/<current-date-and-time>` has been created inside the `scripts` folder.
After all the tasks have finished (`LLstat` should show no running tasks), you can collect
the results and upload them to the sdgym S3 bucket using the following steps:
- Enter the `scripts/runs` folder:

  ```bash
  cd SDGym/scripts/runs
  ```

- Create a `tar.gz` file with the run results:

  ```bash
  tar cvzf <run-date-and-time>.tar.gz <run-date-and-time>
  ```

- Upload the `tar.gz` file that you just created to S3:

  ```bash
  aws s3 cp <run-date-and-time>.tar.gz s3://sdgym/runs
  ```

While running, SDGym will generate a large collection of CSV files containing results from the different synthesizers and datasets.
In order to collect all these CSVs into a single table you can use the `collect.py` python
script, passing it the path to the results folder and the path of the new CSV file to generate:
- Enter the `scripts` folder:

  ```bash
  cd SDGym/scripts
  ```

- Call `python collect.py`, passing the path to the run folder and the output csv path:

  ```bash
  python collect.py runs/<run-date-and-time> output.csv
  ```
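`collect.py` handles the aggregation for you; purely as an illustration of the general idea (not the script's actual logic), merging many result CSVs into one table while keeping a single header row can be sketched in shell like this, using a throwaway demo folder:

```shell
#!/usr/bin/env bash
set -e
# Illustrative only: build two tiny demo CSVs, then merge every CSV in the
# folder into one file, keeping the header line from the first file only.
mkdir -p demo-run
printf 'synthesizer,dataset,score\na,d1,0.5\n' > demo-run/part1.csv
printf 'synthesizer,dataset,score\nb,d2,0.7\n' > demo-run/part2.csv

out=combined.csv
first=1
for f in demo-run/*.csv; do
  if [ "$first" -eq 1 ]; then
    cat "$f" > "$out"          # first file: keep its header
    first=0
  else
    tail -n +2 "$f" >> "$out"  # later files: skip the duplicate header
  fi
done
wc -l "$out"
```

The real `collect.py` may behave differently; this only shows why a dedicated collection step is needed when results are spread across many files.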