This repository provides the Cooperative Adversarial Self-supervised Skill Imitation (CASSI) algorithm that enables Solo to extract diverse skills through adversarial imitation from unlabeled, mixed motions using NVIDIA Isaac Gym.
Paper: Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions
Project website: https://sites.google.com/view/icra2023-cassi/home
Maintainer: Chenhao Li
Affiliation: Autonomous Learning Group, Max Planck Institute for Intelligent Systems, and Robotic Systems Lab, ETH Zurich
Contact: chenhli@ethz.ch
- Create a new Python virtual environment with `python 3.8`.
- Install `pytorch 1.10` with `cuda-11.3`:

      pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

- Install Isaac Gym:
  - Download and install Isaac Gym Preview 4:

        cd isaacgym/python
        pip install -e .

  - Try running an example:

        cd examples
        python 1080_balls_of_solitude.py

  - For troubleshooting, check the docs in `isaacgym/docs/index.html`.
- Install `solo_gym`:

      git clone https://github.com/martius-lab/cassi.git
      cd solo_gym
      pip install -e .
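
Once the steps above are done, a quick sanity check of the environment can save a failed training launch. The following is a minimal sketch, not part of the repository: it assumes the installed import names are `isaacgym`, `torch`, and `solo_gym`, and the helper names `missing_packages` and `python_matches` are hypothetical.

```python
import importlib.util
import sys

# Import names assumed from the installation steps (an assumption of this sketch).
REQUIRED = ("isaacgym", "torch", "solo_gym")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major=3, minor=8):
    """Check the running interpreter against the expected Python version."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    print("Python 3.8:", python_matches())
    print("Missing packages:", missing_packages() or "none")
```

The check uses `importlib.util.find_spec` rather than `import`, so it only reports whether each package can be found and does not load Isaac Gym or PyTorch.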
- The Solo environment is defined by an env file `solo8.py` and a config file `solo8_config.py` under `solo_gym/envs/solo8/`. The config file sets both the environment parameters in class `Solo8FlatCfg` and the training parameters in class `Solo8FlatCfgPPO`.
- The provided code exemplifies the training of Solo 8 with unlabeled mixed motions. Demonstrations induced by 6 locomotion gaits are randomly mixed and augmented with perturbations to 6000 trajectories of 120 frames each and stored in `resources/robots/solo8/datasets/motion_data.pt`. The state dimension indices are specified in `reference_state_idx_dict.json`. To train with other demonstrations, replace `motion_data.pt` and adapt the reward functions defined in `solo_gym/envs/solo8/solo8.py` accordingly.
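
When adapting the reward functions to new demonstrations, it can help to look up state components by name instead of hard-coding offsets. The sketch below is illustrative only: it assumes that `reference_state_idx_dict.json` maps each state name to a `[start, end)` index pair, which this README does not confirm, and the helper names and the keys `base_pos` and `dof_pos` are hypothetical.

```python
import json


def load_state_index(path):
    """Read a JSON file mapping state names to [start, end) index pairs.

    The [start, end) layout is an assumption of this sketch.
    """
    with open(path) as f:
        return json.load(f)


def select_state(frame, index, name):
    """Slice one named component out of a single flat state frame."""
    start, end = index[name]
    return frame[start:end]


if __name__ == "__main__":
    # Toy data with hypothetical state names, not the real dataset layout.
    index = {"base_pos": [0, 3], "dof_pos": [3, 5]}
    frame = [0.1, 0.2, 0.3, 1.0, 2.0]
    print(select_state(frame, index, "dof_pos"))
```

If the actual file uses a different layout, only `select_state` needs to change; code that asks for components by name stays the same.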
To train a policy, run:

    python scripts/train.py --task solo8

- The trained policy is saved in `logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt`, where `<experiment_name>` and `<run_name>` are defined in the train config.
- To disable rendering, append `--headless`.
To play a trained policy, run:

    python scripts/play.py

- By default, the loaded policy is the last model of the last run in the experiment folder.
- Other runs or model iterations can be selected by setting `load_run` and `checkpoint` in the train config.
- Use `u` and `j` to command the forward velocity, and `h` and `k` to switch between the extracted skills.
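
The default selection rule (last model of the last run) can be illustrated outside the repository. This is a rough sketch and not the repository's loader: `resolve_checkpoint` is a hypothetical helper, `None` stands in for "use the default", and taking the alphabetically last run directory as the "last run" is an assumption.

```python
import re
from pathlib import Path

# Matches checkpoint files named model_<iteration>.pt.
MODEL_RE = re.compile(r"model_(\d+)\.pt")


def resolve_checkpoint(experiment_dir, load_run=None, checkpoint=None):
    """Return the path of the checkpoint to load from logs/<experiment_name>/.

    load_run=None picks the last run directory by name (an assumption);
    checkpoint=None picks the highest saved iteration in that run.
    """
    experiment_dir = Path(experiment_dir)
    if load_run is None:
        runs = sorted(p for p in experiment_dir.iterdir() if p.is_dir())
        if not runs:
            raise FileNotFoundError(f"no runs found in {experiment_dir}")
        run_dir = runs[-1]
    else:
        run_dir = experiment_dir / load_run

    if checkpoint is None:
        matches = (MODEL_RE.fullmatch(p.name) for p in run_dir.iterdir())
        iterations = [int(m.group(1)) for m in matches if m]
        if not iterations:
            raise FileNotFoundError(f"no model_<iteration>.pt in {run_dir}")
        # Compare iterations as integers so model_100 ranks above model_50.
        checkpoint = max(iterations)
    return run_dir / f"model_{checkpoint}.pt"
```

Passing an explicit run directory name or iteration number mirrors setting `load_run` and `checkpoint` in the train config, while leaving both unset reproduces the default behavior described above.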
@inproceedings{li2023versatile,
title={Versatile skill control via self-supervised adversarial imitation of unlabeled mixed motions},
author={Li, Chenhao and Blaes, Sebastian and Kolev, Pavel and Vlastelica, Marin and Frey, Jonas and Martius, Georg},
booktitle={2023 IEEE international conference on robotics and automation (ICRA)},
pages={2944--2950},
year={2023},
organization={IEEE}
}
The code is built upon the open-sourced Isaac Gym Environments for Legged Robots and the PPO implementation. We refer to the original repositories for more details.
