Fix DQN target update frequency #323
vwxyzjn
left a comment
@qgallouedec thanks so much for the fix!
vwxyzjn
left a comment
LGTM. Given that this change does not impact the performance of our algorithm variants using their default parameters, we do not need to re-run the benchmark for these variants.
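A small sketch of why default parameters could be unaffected by a fix like this. It rests on an assumption the thread does not spell out: that before the fix, the target-network update check was nested inside the training-frequency check, and that the fix made it independent. The function name, the parameter names, and the numbers 10, 500, 4, and 6 are illustrative only and are not taken from this PR.

```python
def target_update_steps(total_steps, train_frequency, target_network_frequency, nested):
    """Return the global steps at which the target network would be synced.

    nested=True models the ASSUMED pre-fix behaviour: the target update is
    only reached on steps that are also training steps.
    nested=False models the ASSUMED post-fix behaviour: the target update
    depends only on target_network_frequency.
    """
    steps = []
    for global_step in range(1, total_steps + 1):
        on_train_step = global_step % train_frequency == 0
        on_target_step = global_step % target_network_frequency == 0
        fire = (on_train_step and on_target_step) if nested else on_target_step
        if fire:
            steps.append(global_step)
    return steps


# Illustrative values where the target frequency is a multiple of the
# training frequency: both variants fire on exactly the same steps.
same_before = target_update_steps(2000, 10, 500, nested=True)
same_after = target_update_steps(2000, 10, 500, nested=False)

# Illustrative values where it is not a multiple: the nested variant only
# fires on common multiples, so the effective update period is longer.
diff_before = target_update_steps(24, 4, 6, nested=True)
diff_after = target_update_steps(24, 4, 6, nested=False)
```

Under this assumption, any configuration whose target update frequency is a multiple of its training frequency behaves identically before and after the change, which would be consistent with not re-running the benchmark.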
Thanks @qgallouedec for this catch.
* add draft of SAC discrete implementation
* run pre-commit
* Use log softmax instead of author's log-pi code
* Revert to cleanrl SAC delay implementation (it's more stable)
* Remove docstrings and duplicate code
* Use correct clipreward wrapper
* fix bug in log softmax calculation
* adhere to cleanrl log_prob naming
* fix bug in entropy target calculation
* change layer initialization to match existing cleanrl codebase
* working minimal diff version
* implement original learning update frequency
* parameterize the entropy scale for autotuning
* add benchmarking script
* rename target entropy factor and set new default value
* add docs draft
* fix SAC-discrete links to work pre merge
* add preliminary result table for SAC-discrete
* clean up todos and add header
* minimize diff between sac_atari and sac_continuous
* add sac-discrete end2end test
* SAC-discrete docs rework
* Update SAC-discrete @100k results
* Fix doc links and unify naming in code
* update docs
* fix target update frequency (see PR #323)
* clarify comment regarding CNN encoder sharing
* fix benchmark installation
* fix eps in minimal diff version and improve code readability
* add docs for eps and finalize code
* use no_grad for actor Q-vals and re-use action-probs & log-probs in alpha loss
* update docs for new code and settings
* fix links to point to main branch
* update sac-discrete training plots
* new sac-d training plots
* update results table and fix link
* fix pong chart title
* add Jimmy Ba name as exception to code spell check
* change target_entropy_scale default value to same value as experiments
* remove blank line at end of pre-commit

Co-authored-by: Costa Huang <[email protected]>
Description
Closes #322
Types of changes
Checklist:
pre-commit run --all-files passes (required).
mkdocs serve.

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
--capture-video flag toggled on (required).
mkdocs serve.