Initial implementation of power scheduler #974
Conversation
yeah it's probably time to make a more configurable schedule thing...
dlwh left a comment
seems good modulo some nits (I didn't check the math)
I don't love that you'd have to specify batch size twice (and it doesn't take into account batch schedule), but I really don't want to thread it through.
Do you want to give this a shot using the Marin "submodule" stuff and see how it does?
Mhmm. I think I can rewrite so that it takes into account BatchSchedule, but it's probably better to try and decouple the optimizer and scheduler first (which I can take on if you can tell me the direction on how to do it).
Sure
ok, you asked for it! So the scheduler refactor is going to be a bit delicate, since we need to support old-style configs (I don't want to break too much compatibility, and we probably want to keep the simple case simple). I'm going to describe it in its full glory, but I think we can split it into chunks, and you only really need to do the first one.

In an ideal world (based on our experience with Marin 8B), I think the schedule system would look something like:

```python
lr_schedule: str | LrSchedule | list[ScheduleStep[LrSchedule]]
```

If `lr_schedule` is a string, use the old behavior. Otherwise, use the new behavior.

**Chunk 1: LrSchedule**

This will enable `lr_schedule: str | LrSchedule`. `LrSchedule` should probably look a bit like how we do the `OptimConfig`, which is to say a draccus plugin registry. We should make an `LrSchedule` type that is a `PluginRegistry` and add the kinds we already have. `LrSchedule` should probably have all the settings needed to specify it, except for the schedule length, which should be figured out from context. A normal cosine schedule would be specified as either:

```yaml
optimizer:
  lr_schedule: "cosine"
  warmup: ...
  learning_rate: ...
```

or, for more fine-grained control:

```yaml
optimizer:
  lr_schedule:
    type: cosine
    learning_rate: <peak lr>
    exponent: ...
```

**Chunk 2: Staged Schedules**

We should support staged schedules too:

```yaml
optimizer:
  lr_schedule:
    # warmup
    - start: 0
      value:
        type: linear
        initial_lr: 0.0
        final_lr: 1e-4
    # stable
    - start: 1000
      value:
        type: constant
        learning_rate: 1e-4
    # decay
    - start: 100000
      value:
        type: inv
```

This is obviously a pain to specify for the common case, but sometimes we need the control.

This is a big lift, so lmk if this is too daunting! Like I said, we only really need the first chunk right now.
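To make the staged form concrete, here is a rough sketch of how a list of stages could be resolved into a per-step learning rate. The resolution logic and the `ScheduleStep` fields here are my illustration of the proposal, not the actual implementation; each stage's schedule is modeled simply as a callable from its local step to an LR.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage starts at a global step and supplies its own schedule,
# modeled here as step -> lr, with step counted from the stage start.
@dataclass
class ScheduleStep:
    start: int
    value: Callable[[int], float]

def staged_lr(stages: list[ScheduleStep], step: int) -> float:
    # Pick the last stage whose start is <= step (stages sorted by start),
    # and evaluate its schedule relative to that stage's own start.
    active = stages[0]
    for stage in stages:
        if stage.start <= step:
            active = stage
    return active.value(step - active.start)

# Mirrors the YAML above: linear warmup, then constant, then an inverse decay.
stages = [
    ScheduleStep(start=0, value=lambda s: 1e-4 * s / 1000),              # warmup
    ScheduleStep(start=1000, value=lambda s: 1e-4),                      # stable
    ScheduleStep(start=100_000, value=lambda s: 1e-4 / (1 + s / 1000)),  # decay
]
```

Evaluating `staged_lr(stages, step)` at increasing steps walks through warmup, the stable plateau, and the decay tail.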
I implemented chunk 1. I passed a ctx: LrScheduleContext into LrSchedule so that the user can't re-declare learning_rate and min_lr_ratio, which could lead to hidden bugs. I will probably revisit and implement chunk 2 some other time.
Implementation based on this paper: https://arxiv.org/abs/2408.13359. @dlwh @Helw150 I'm not sure if this is the best way to go about implementing this as is. I had to add five more parameters to the parameter list of the OptimizerConfig class to implement Power Scheduler, as, based on my understanding, there is a tight coupling between the optimizer and the scheduler. Should there be an effort to decouple the optimizer and the scheduler, so that we don't have to pass schedule-specific parameters directly into OptimizerConfig?
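For readers unfamiliar with the paper, here is a rough sketch of the power schedule's overall shape as I understand it: warmup, then a learning rate that follows a power of the training progress capped at a maximum, then a final decay. The constants `a`, `b`, and `decay_frac` below are placeholders, and the exact parameterization should be checked against the paper; this is not the code in this PR.

```python
def power_lr(step: int, *, total_steps: int, warmup: int = 100,
             lr_max: float = 1e-3, a: float = 4.6, b: float = 0.51,
             decay_frac: float = 0.1) -> float:
    """Approximate power schedule: linear warmup, then lr ~ a * step**-b
    capped at lr_max, then a final ramp-down over the last decay_frac of
    training (all constants here are illustrative placeholders)."""
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup:
        return lr_max * (step + 1) / warmup
    lr = min(lr_max, a * float(step) ** (-b))
    if step > decay_start:
        # simple linear ramp-down standing in for the paper's final decay
        lr *= (total_steps - step) / (total_steps - decay_start)
    return max(lr, 0.0)
```

A schedule like this needs `total_steps`, `a`, `b`, `decay_frac`, and `lr_max`, which illustrates why implementing it forced extra schedule-specific parameters onto OptimizerConfig.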