add length penalty in reward + tracker avg positive & negative token length #1301

paulinebourigault · 2025-04-28T19:28:58Z

This PR adds an optional length penalty to the reward, along with a tracker for the average token length of positive and negative samples.

The length penalty is calculated as `((5 + length) / 6) ^ alpha`.
This follows the formula used in the Google Neural Machine Translation paper (https://arxiv.org/pdf/1609.08144).

When `alpha < 0`, shorter sequences get higher rewards.
When `alpha > 0`, longer sequences get higher rewards.
The length penalty is applied to the final reward scores, after the primary reward calculation.
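
As a rough illustration of the formula and the behaviour described above, here is a minimal sketch. The function names `length_penalty` and `apply_length_penalty` are hypothetical and are not taken from this PR's diff, and the sketch assumes the penalty is multiplied into the final score, which the description does not state explicitly.

```python
from typing import List


def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style length penalty: ((5 + length) / 6) ** alpha.

    Hypothetical helper for illustration; not the PR's actual code.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return ((5.0 + length) / 6.0) ** alpha


def apply_length_penalty(
    scores: List[float], lengths: List[int], alpha: float
) -> List[float]:
    """Scale each final reward score by its length penalty.

    Assumption: the penalty is applied multiplicatively, after the
    primary reward has already been computed.
    """
    if len(scores) != len(lengths):
        raise ValueError("scores and lengths must have the same size")
    return [s * length_penalty(n, alpha) for s, n in zip(scores, lengths)]


if __name__ == "__main__":
    # Two responses with the same base reward but different token lengths.
    print(apply_length_penalty([1.0, 1.0], [1, 7], alpha=-1.0))
    print(apply_length_penalty([1.0, 1.0], [1, 7], alpha=1.0))
```

Under this multiplicative assumption, a length of 1 always gives a factor of exactly 1, and `alpha = 0` disables the penalty. Note that the "shorter is better" and "longer is better" readings hold for positive scores; a negative score scaled by a larger factor becomes more negative.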

This follows up on my previous PR, "tracker average token lengths for both positive and negative samples" (#1225).
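
For readers unfamiliar with the tracker, the sketch below shows one way such a metric could be computed. It is an assumption-laden illustration, not the code from #1225: the function name `average_token_lengths`, the metric keys, and the rule that a sample is "positive" when its score exceeds a threshold of 0 are all hypothetical.

```python
from typing import Dict, List


def average_token_lengths(
    scores: List[float], lengths: List[int], threshold: float = 0.0
) -> Dict[str, float]:
    """Average response length for positive and negative samples.

    Assumption: a sample counts as positive when score > threshold,
    and as negative otherwise. An empty group reports 0.0.
    """
    if len(scores) != len(lengths):
        raise ValueError("scores and lengths must have the same size")

    positive = [n for s, n in zip(scores, lengths) if s > threshold]
    negative = [n for s, n in zip(scores, lengths) if s <= threshold]

    def mean(values: List[int]) -> float:
        return sum(values) / len(values) if values else 0.0

    return {
        "avg_positive_length": mean(positive),
        "avg_negative_length": mean(negative),
    }


if __name__ == "__main__":
    print(average_token_lengths([1.0, 0.0, 1.0], [10, 30, 20]))
```

Tracking the two averages separately makes it easier to see whether a non-zero `alpha` is shifting the length of rewarded responses or only of unrewarded ones.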

@vermouth1992 could you please review this?

paulinebourigault · 2025-05-04T11:37:06Z

Hi @hiyouga, what do you think about merging this?

paulinebourigault and others added 4 commits April 25, 2025 17:26

length penaly in reward function

bb573bc

added length penalty option

833ac3e

fix conflict

4a40db1

Merge branch 'main' into pauline/length_penalty_in_reward

a74b6cc

ZihengJiang added the status: need review label Apr 29, 2025
