
Conversation

abuelnasr0
Contributor

I think the use of scaling_factor in the RotaryEmbedding layer is wrong. It should be used to scale the positions, not the frequencies.
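For context, here is a minimal NumPy sketch of what "scale the positions" means for linear RoPE scaling. The function name and signature are illustrative, not KerasNLP's actual API: the positions are divided by scaling_factor before taking the outer product with the inverse frequencies.

```python
import numpy as np

def rope_angles(positions, rotary_dim, scaling_factor=1.0, base=10000.0):
    # Inverse frequency for each pair of rotary dimensions.
    inv_freq = 1.0 / (base ** (np.arange(0, rotary_dim, 2) / rotary_dim))
    # Linear RoPE scaling divides the positions; frequencies stay untouched.
    scaled_positions = np.asarray(positions, dtype="float64") / scaling_factor
    # Angle matrix of shape (seq_len, rotary_dim // 2).
    return np.outer(scaled_positions, inv_freq)
```

Because the outer product is linear, doubling scaling_factor halves every rotation angle, stretching the effective context length.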

References:

Member

@mattdangerw mattdangerw left a comment


This looks good! Can we add some tests?

@mattdangerw
Member

@abuelnasr0 also apologies, we just changed our entire directory structure in #1608

(Hopefully for good reason, we want to allow pip install -e . and pip install git+https:// while still keeping our explicit API surface)

But it does mean everything will need an annoying merge/rebase. If it'd help for me to do any of those and push to this branch, just lmk!

@mattdangerw mattdangerw mentioned this pull request May 2, 2024
@abuelnasr0 abuelnasr0 force-pushed the rope_scaling_factor branch from 4c76fe6 to f93a813 Compare May 2, 2024 19:30
@abuelnasr0
Contributor Author

abuelnasr0 commented May 2, 2024

Can we add some tests?

I can add tests, but on Sunday. Sorry for that, but I will be AFK until then.

@mattdangerw
Member

I can add tests, but on Sunday. Sorry for that, but I will be AFK until then.

No rush at all! And thanks so much for all the major contributions to the library :)

I am just getting back from vacation myself, slowly catching up on all the review.

Member

@mattdangerw mattdangerw left a comment


LGTM! If tests pass, let's ship it!

@mattdangerw mattdangerw added the kokoro:force-run Runs Tests on GPU label May 6, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label May 6, 2024
@abuelnasr0
Contributor Author

thanks so much for all the major contributions to the library :)

You're welcome. I am trying to give back to the community as much as I can. Contributing to the library has actually helped me improve; I am learning new things with each PR. Thank you and the other authors for creating the library, and thank you for all your reviews, which were really helpful to me.

@abuelnasr0
Contributor Author

https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L96-L174

Check out these lines. What is implemented in this PR is LlamaLinearScalingRotaryEmbedding, and I think it was implemented in the main layer at first, but they decided to move it to a new layer. There's also LlamaDynamicNTKScalingRotaryEmbedding, which uses scaling_factor in another way, but I think LlamaLinearScalingRotaryEmbedding is more popular.
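To make the contrast concrete, here is a hedged NumPy sketch of the two variants, paraphrasing my reading of the linked transformers code (function names and the NTK base-adjustment formula are my transcription, not the exact Hugging Face implementation):

```python
import numpy as np

def linear_scaled_angles(positions, rotary_dim, scaling_factor, base=10000.0):
    # Linear scaling (LlamaLinearScalingRotaryEmbedding-style):
    # divide the positions by the factor; frequencies are untouched.
    inv_freq = 1.0 / (base ** (np.arange(0, rotary_dim, 2) / rotary_dim))
    return np.outer(np.asarray(positions) / scaling_factor, inv_freq)

def dynamic_ntk_angles(positions, rotary_dim, scaling_factor,
                       max_positions, base=10000.0):
    # Dynamic NTK scaling (LlamaDynamicNTKScalingRotaryEmbedding-style):
    # once the sequence exceeds the trained context, enlarge the base
    # instead of rescaling the positions.
    seq_len = len(positions)
    if seq_len > max_positions:
        base = base * (
            scaling_factor * seq_len / max_positions - (scaling_factor - 1)
        ) ** (rotary_dim / (rotary_dim - 2))
    inv_freq = 1.0 / (base ** (np.arange(0, rotary_dim, 2) / rotary_dim))
    return np.outer(positions, inv_freq)
```

Note that the dynamic variant is a no-op for sequences within the trained context, while linear scaling rescales every position unconditionally.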

@mattdangerw mattdangerw merged commit 778ccd7 into keras-team:master May 6, 2024