

@paulinebourigault

This PR adds a length penalty option, along with a tracker for the average token length of positive and negative samples.

The length penalty is calculated using the formula: ((5 + length) / 6) ^ alpha.
This is based on the formula from the Google Neural Machine Translation (GNMT) paper [https://arxiv.org/pdf/1609.08144].

  • When alpha < 0, shorter sequences get higher rewards
  • When alpha > 0, longer sequences get higher rewards
  • The length penalty is applied to the final reward scores, after the primary reward has been computed
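A minimal sketch of the penalty described above, in plain Python. It assumes the penalty *multiplies* the score from the primary reward function (the description does not say whether it multiplies or divides), and `length_penalty` / `apply_length_penalty` are hypothetical helper names, not the functions added by this PR. Note that the constants make a length-1 sequence get a factor of exactly 1 for any alpha.

```python
def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style length penalty: ((5 + length) / 6) ** alpha."""
    return ((5 + length) / 6) ** alpha


def apply_length_penalty(scores, lengths, alpha):
    """Scale each final reward score by the penalty for its response length.

    Assumption: the penalty is applied multiplicatively, after the primary
    reward has been computed.
    """
    return [s * length_penalty(n, alpha) for s, n in zip(scores, lengths)]


if __name__ == "__main__":
    scores = [1.0, 1.0]
    lengths = [10, 100]  # a short and a long response, in tokens
    print(apply_length_penalty(scores, lengths, alpha=0.5))   # longer response gets the higher reward
    print(apply_length_penalty(scores, lengths, alpha=-0.5))  # shorter response gets the higher reward
```

With multiplicative scaling, the direction stated in the bullets holds for positive scores; a negative score is scaled by the same factor, so a larger factor pushes it further below zero.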

This follows up on my previous PR #1225, which tracks the average token length of both positive and negative samples.
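A rough sketch of what such a tracker computes. It assumes a sample counts as "positive" when its final score is strictly above a threshold (the hypothetical `threshold` parameter, defaulting to 0) and "negative" otherwise. `average_lengths_by_sign` is an illustrative name, not the code from #1225.

```python
import math


def average_lengths_by_sign(scores, lengths, threshold=0.0):
    """Return (mean length of positive samples, mean length of negative samples).

    Assumption: positive means score > threshold; everything else is negative.
    A group with no samples gets NaN so it does not look like a real average.
    """
    pos = [n for s, n in zip(scores, lengths) if s > threshold]
    neg = [n for s, n in zip(scores, lengths) if s <= threshold]
    pos_mean = sum(pos) / len(pos) if pos else math.nan
    neg_mean = sum(neg) / len(neg) if neg else math.nan
    return pos_mean, neg_mean


if __name__ == "__main__":
    scores = [1.0, 0.0, 2.0, -1.0]
    lengths = [10, 20, 30, 40]  # response lengths in tokens
    print(average_lengths_by_sign(scores, lengths))  # (20.0, 30.0)
```

Tracking the two averages separately shows whether the penalty is shifting lengths mainly among rewarded or among unrewarded responses.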

@vermouth1992, could you please review this?

@paulinebourigault
Author

Hi @hiyouga, what do you think about merging this?
