[BREAKING] feat: decouple PPO epochs from critic #1358

emergenz · 2025-05-02T07:52:00Z

Checklist Before Starting

Search for similar PR(s).

What does this PR do?

Currently, ppo_epochs is used for both the actor, as well as the critic. While this is the case for many PPO implementations (https://github.com/EdanToledo/Stoix, https://github.com/openai/baselines), there is no inherent reason why this should be the case. This is probably due to historic reasons, since OpenAI's PPO implementation had a single loss for both actor and critic (https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo2/model.py; inherent coupling of num_epochs between actor and critic).

This PR simply decouples the actor and critic epochs.

API

The only API change is that I renamed critic.ppo_epochs to critic.epochs to reflect the new behaviour.

Additional Info.

Training: [FSDP, Megatron]
Inference: []

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title if it breaks any API.
Update the documentation about your changes in the docs.

eric-haibin-lin

the existing config still allows custom epoch num for critic via setting critic.ppo_epochs. does it not work for you?

emergenz · 2025-05-05T08:46:21Z

It does. I suppose these changes are just to make it more obvious to the user that this is possible. For context, the current code says self.config.ppo_epochs for the critic epochs. At first glance, this looks like the epochs are coupled between actor and critic.

Maybe I should rename config.critic.epochs to config.critic.critic_epochs to make the decoupling even more obvious.

Also, the default config actually does couple the two by linking the critic epochs to the PPO epochs. Of course one can just change that, but I would argue that there is no reason for this default coupling to begin with.

If you think that the default config should retain the coupling for 'backwards compatibility' with other well-known PPO implementations, then I would suggest that we just change the naming of critic.ppo_epochs to something else and leave the default config as it was before.

What do you think?

feat: decouple PPO epochs from critic

5c4b8c8

eric-haibin-lin reviewed May 4, 2025

View reviewed changes

ZihengJiang added the status: need review label May 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[BREAKING] feat: decouple PPO epochs from critic #1358

[BREAKING] feat: decouple PPO epochs from critic #1358

emergenz commented May 2, 2025

Uh oh!

eric-haibin-lin left a comment

Uh oh!

emergenz commented May 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[BREAKING] feat: decouple PPO epochs from critic #1358

Are you sure you want to change the base?

[BREAKING] feat: decouple PPO epochs from critic #1358

Conversation

emergenz commented May 2, 2025

Checklist Before Starting

What does this PR do?

API

Additional Info.

Checklist Before Submitting

Uh oh!

eric-haibin-lin left a comment

Choose a reason for hiding this comment

Uh oh!

emergenz commented May 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants