ShangtongZhang/DeepRL


This branch contains the code for the paper:

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Shangtong Zhang, Bo Liu, Shimon Whiteson (AAAI 2021)

.
├── Dockerfile                                      # Dependencies
├── requirements.txt                                # Dependencies
├── template_jobs.py                                # Entry point for the experiments
│   ├── mvpi_td3_continuous                         # Launches MVPI-TD3 / TD3
│   ├── var_ppo_continuous                          # Launches TRVO
│   ├── mvp_continuous                              # Launches MVP
│   ├── risk_a2c_continuous                         # Launches the Prashanth baseline
│   ├── tamar_continuous                            # Launches the Tamar baseline
│   └── off_policy_mvpi                             # Launches offline MVPI
├── deep_rl/agent/MVPITD3_agent.py                  # MVPI-TD3 / TD3 implementation
├── deep_rl/agent/VarPPO_agent.py                   # TRVO implementation
├── deep_rl/agent/MVP_agent.py                      # MVP implementation
├── deep_rl/agent/RiskA2C_agent.py                  # Prashanth baseline implementation
├── deep_rl/agent/Tamar_agent.py                    # Tamar baseline implementation
├── deep_rl/agent/OffPolicyMVPI_agent.py            # Offline MVPI implementation
└── template_plot.py                                # Plotting
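As a rough setup and usage sketch (the exact invocation is an assumption; check `template_jobs.py` for how the entry functions such as `mvpi_td3_continuous` are actually selected and parameterized):

```shell
# Install dependencies directly, or build the Docker image instead
pip install -r requirements.txt
# docker build -t deep-rl .          # alternative: containerized setup

# Run the experiments; edit template_jobs.py to pick an entry function,
# e.g. mvpi_td3_continuous for MVPI-TD3 / TD3
python template_jobs.py

# Generate the figures from the logged results
python template_plot.py
```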

The data used for plotting is available via email upon request.

This branch is based on the DeepRL codebase and has been left unchanged since the paper was completed. Algorithm implementations not used in the paper may be broken and should not be used. Rebasing onto or merging the master branch may require extra effort.