[Feature] Safe save with FSDP, slurm examples #1863
Conversation
c84a002 And after 1 epoch of training, I see different losses at 2 nodes and 4 nodes. Sure, 4 nodes trained faster, but I don't think it saw all the data. The 4-node run shows a higher loss compared to the 2-node run.
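For context on whether every sample is actually seen: with a `DistributedSampler`-style split, each rank receives a disjoint (round-robin) shard of roughly `len(dataset) / world_size` indices, so doubling the node count halves the steps each rank takes per epoch while the union of all shards still covers the whole dataset. A minimal pure-Python sketch of that partitioning (`shard_indices` is a hypothetical helper mimicking `torch.utils.data.DistributedSampler` with shuffling disabled, not code from this PR):

```python
import math

def shard_indices(num_samples: int, world_size: int, rank: int) -> list:
    """Round-robin shard, like DistributedSampler with shuffle=False.

    Indices are padded by wrapping around so every rank gets the same count.
    """
    per_rank = math.ceil(num_samples / world_size)
    padded = list(range(num_samples))
    padded += padded[: per_rank * world_size - num_samples]  # pad by repeating
    return padded[rank::world_size]  # every world_size-th index, offset by rank

# Union of shards covers the full dataset, even as world_size grows.
shards = [shard_indices(10, 4, r) for r in range(4)]
assert set().union(*map(set, shards)) == set(range(10))
# But each rank takes fewer steps per epoch: 3 at world_size=4 vs 5 at 2.
assert len(shard_indices(10, 4, 0)) == 3
assert len(shard_indices(10, 2, 0)) == 5
```

Under this scheme the higher loss at 4 nodes would be consistent with each GPU taking fewer optimizer steps per epoch, rather than with data being dropped — assuming the sampler is configured this way in the run above.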
hello, I am trying to use the suggested changes to fine-tune vicuna-33b. I am also curious, since the patch refers to internal transformers code (main branch) that has since changed, so I cannot find where to apply the change in the new transformers code. Was it this version of transformers: https://github.com/huggingface/transformers/blob/v4.29.1/src/transformers/trainer.py#L1498? If you could please let me know the tag (e.g. v4.29.1), I can find the right place to change. Thank you :)
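For readers trying to port the change to a newer transformers: the underlying problem is that under FSDP each rank holds only a shard of the weights, so a plain `state_dict()` call in the Trainer's save path does not produce a full checkpoint. A hedged sketch of the usual full-state-dict gather (this is not the exact patch in this PR; `save_full_state` is a hypothetical helper, and the APIs shown are PyTorch's `FSDP.state_dict_type` context manager and `FullStateDictConfig`):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

def save_full_state(model: FSDP, path: str) -> None:
    """Gather the full (unsharded) state dict onto rank 0 and save it there.

    offload_to_cpu avoids GPU OOM while the gathered weights are materialized;
    rank0_only leaves the state dict empty on all other ranks.
    """
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state = model.state_dict()  # full dict on rank 0, empty elsewhere
    if dist.get_rank() == 0:
        torch.save(state, path)
```

The patched save in the Trainer would call something like this instead of the default `model.state_dict()`; the exact hook point depends on the transformers version, which is what the question above is asking about.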
Force-pushed from bf7aa7e to a81a04c
Related issue number (if applicable)
#166
#588
#256
Checks
Ran format.sh to lint the changes in this PR.