
@xingyaoww
Contributor

Add RoPE Scaling Feature to SFT Trainer

Description

This PR adds support for RoPE (Rotary Position Embedding) scaling in the SFT trainer. RoPE scaling is a technique that allows models to handle longer context lengths than they were originally trained on by scaling the position embeddings.
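
For context, here is a minimal sketch of what "linear" RoPE scaling does (illustration only, not the trainer's code): position indices are divided by the scaling factor before the rotary angles are computed, so the angles stay within the range the model saw during training.

import torch

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0,
                scaling_factor: float = 1.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # "Linear" scaling: divide positions by the factor, e.g. factor=2.0 maps
    # token position 8000 back to 4000, inside the original training range.
    positions = torch.arange(seq_len).float() / scaling_factor
    return torch.outer(positions, inv_freq)  # angles used for the cos/sin rotation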

Changes

  • Added RoPE scaling configuration support in the FSDP SFT trainer
  • Implemented model config override mechanism for RoPE scaling parameters (see the sketch after this list)
  • Added appropriate logging for RoPE scaling configuration
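
Roughly, the override path looks like the sketch below. This is a simplified illustration that assumes the overrides are applied to the Hugging Face model config before the model is built; load_model_with_overrides is a made-up helper name, not the trainer's actual function.

from transformers import AutoConfig, AutoModelForCausalLM

def load_model_with_overrides(model_path: str, override_config_kwargs: dict):
    # Load the base HF config, apply the overrides (e.g. rope_scaling),
    # then build the model from the patched config.
    config = AutoConfig.from_pretrained(model_path)
    for key, value in override_config_kwargs.items():
        setattr(config, key, value)
    return AutoModelForCausalLM.from_pretrained(model_path, config=config)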

Usage

To use this feature, add a rope_scaling configuration in your model config:

model:
  rope_scaling:
    type: "linear"  # or "dynamic"
    factor: 2.0     # scaling factor

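For reference, the YAML above ends up as a plain rope_scaling dict on the Hugging Face config. A quick way to check (the model name here is just an example, and the exact keys stored on the config may vary with the transformers version):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
)
print(cfg.rope_scaling)  # e.g. {'type': 'linear', 'factor': 2.0}
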
Testing

Tested with various models that support RoPE scaling, including Llama and Qwen models.

override_config_kwargs = {}
if 'rope_scaling' in self.config.model and self.config.model.rope_scaling is not None:
    override_config_kwargs['rope_scaling'] = dict(self.config.model.rope_scaling)
    print(f'rope_scaling setted. rope_scaling={override_config_kwargs["rope_scaling"]}')
Collaborator

setted -> set

@eric-haibin-lin
Collaborator

@openhands-agent could you merge with main and fix potential pre-commit errors?

@xingyaoww
Contributor Author

@eric-haibin-lin I think you need to "@OpenHands" for OpenHands Cloud :)
