Skip to content

Conversation

@qgallouedec
Copy link
Collaborator

@qgallouedec qgallouedec commented Nov 22, 2022

Description

Closes #322

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm variant.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format).
    • I have added links to the tracked experiments.
    • I have updated the overview sections at the docs and the repo
  • I have updated the tests accordingly (if applicable).

@vercel
Copy link

vercel bot commented Nov 22, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
cleanrl ✅ Ready (Inspect) Visit Preview Nov 22, 2022 at 3:14PM (UTC)

Copy link
Owner

@vwxyzjn vwxyzjn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qgallouedec thanks so much for the fix!

Copy link
Owner

@vwxyzjn vwxyzjn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Given that this change does not impact the performance of our algorithm variants using their default parameters, we do not need to re-run the benchmark for these variants.

@vwxyzjn
Copy link
Owner

vwxyzjn commented Nov 22, 2022

Thanks @qgallouedec for this catch.

@vwxyzjn vwxyzjn merged commit c515aef into vwxyzjn:master Nov 22, 2022
@qgallouedec qgallouedec deleted the fix-target-update-freq branch November 22, 2022 15:36
timoklein added a commit to timoklein/cleanrl that referenced this pull request Nov 24, 2022
@timoklein timoklein mentioned this pull request Nov 24, 2022
20 tasks
@sdpkjc sdpkjc mentioned this pull request Dec 2, 2022
6 tasks
timoklein added a commit that referenced this pull request Jan 13, 2023
* add draft of SAC discrete implementation

* run pre-commit

* Use log softmax instead of author's log-pi code

* Revert to cleanrl SAC delay implementation (it's more stable)

* Remove docstrings and duplicate code

* Use correct clipreward wrapper

* fix bug in log softmax calculation

* adhere to cleanrl log_prob naming

* fix bug in entropy target calculation

* change layer initialization to match existing cleanrl codebase

* working minimal diff version

* implement original learning update frequency

* parameterize the entropy scale for autotuning

* add benchmarking script

* rename target entropy factor and set new default value

* add docs draft

* fix SAC-discrete links to work pre merge

* add preliminary result table for SAC-discrete

* clean up todos and add header

* minimize diff between sac_atari and sac_continuous

* add sac-discrete end2end test

* SAC-discrete docs rework

* Update SAC-discrete @100k results

* Fix doc links and unify naming in code

* update docs

* fix target update frequency (see PR #323)

* clarify comment regarding CNN encoder sharing

* fix benchmark installation

* fix eps in minimal diff version and improve code readability

* add docs for eps and finalize code

* use no_grad for actor Q-vals and re-use action-probs & log-probs in alpha loss

* update docs for new code and settings

* fix links to point to main branch

* update sac-discrete training plots

* new sac-d training plots

* update results table and fix link

* fix pong chart title

* add Jimmy Ba name as exception to code spell check

* change target_entropy_scale default value to same value as experiments

* remove blank line at end of pre-commit

Co-authored-by: Costa Huang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Target network isn't updated to the correct frequency when target_network_frequency % train_frequency != 0

2 participants