This branch contains the code for the paper:
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Shangtong Zhang, Remi Tachet des Combes, Romain Laroche (JMLR 2022)
.
├── Dockerfile # Dependencies
├── requirements.txt # Dependencies
├── template_jobs.py # Entry point for the experiments
├── deep_rl/agent/OffPAC_agent.py # Off-policy actor critic
└── template_plot.py # Plotting
I can send the data for plotting via email upon request.
This branch is based on the DeepRL codebase and has been left unchanged since I completed the paper. Algorithm implementations not used in the paper may be broken and should not be used. Rebasing onto or merging the master branch may take extra effort.