Update llama.cpp to latest version supporting gpt-oss #797
Conversation
I've tested this update with the b6097 commit and can confirm everything works properly with gpt-oss models. Both build configurations I tested successfully load and run gpt-oss models without any issues. The bindings generation completed without errors and all the existing tests pass. Thanks for this update!
@webconsulting what configurations/code and quantizations did you use for locally testing the gpt-oss models? While all the existing tests pass and existing models work, I'm getting a decode error -3 when I try a gpt-oss model.
I’ve tested several of Bartowski’s models. I can also confirm that the Unsloth models didn’t work for me.
Hi! I've analyzed the difference between our implementations and found what's likely causing the decode error -3 with GPT-OSS models.

**Root cause:** the prompt isn't being built with the Harmony chat format that GPT-OSS models expect.

**Quick fix to test** (sketch; `self.messages` and `prompt` come from our chat-template code, and the message field names are illustrative):

```rust
for msg in &self.messages {
    // Illustrative Harmony framing for each message (field names assumed).
    prompt.push_str(&format!("<|start|>{}<|message|>{}<|end|>", msg.role, msg.content));
}
// Prime the assistant's reply on the Harmony "final" channel.
prompt.push_str("<|start|>assistant<|channel|>final<|message|>");
```
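For concreteness, this is roughly what that produces for a single user turn (a minimal sketch; the system/developer preamble a full Harmony prompt would normally carry is omitted):

```rust
// Hypothetical rendered prompt for one user message, ending with the
// assistant primed on the "final" channel so generation starts there.
let rendered = "<|start|>user<|message|>Hello!<|end|>\
    <|start|>assistant<|channel|>final<|message|>";
assert!(rendered.ends_with("<|channel|>final<|message|>"));
```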
@webconsulting thank you so much for looking into this in more detail! I went ahead and followed your suggestions (https://github.com/with-caer/curtana/pull/1/files), but using the official OpenAI Harmony crate (https://crates.io/crates/openai-harmony). After switching to a computer with a bit more RAM (and using the above code), I can confirm everything works with this latest commit, including the GPT OSS models. @MarcusDunn if these changes look good to you, would you be willing to get them merged/deployed onto crates.io?
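For reference, the crate-based approach looks roughly like this (a minimal sketch adapted from the openai-harmony README; the exact module paths, signatures, and the anyhow dependency are assumptions worth checking against the crate docs):

```rust
use openai_harmony::chat::{Conversation, Message, Role};
use openai_harmony::{load_harmony_encoding, HarmonyEncodingName};

fn main() -> anyhow::Result<()> {
    // Load the Harmony encoding used by the gpt-oss models.
    let enc = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss)?;

    // Build a one-turn conversation and render it ready for the
    // assistant's next turn to be generated.
    let convo = Conversation::from_messages([Message::from_role_and_content(
        Role::User,
        "Hello there!",
    )]);
    let tokens = enc.render_conversation_for_completion(&convo, Role::Assistant, None)?;

    // `tokens` are the prompt token ids to hand to llama.cpp for decoding.
    println!("{:?}", tokens);
    Ok(())
}
```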
Awesome this was updated so quickly. Just cut a release!
This PR updates the `llama.cpp` commit tag to `b6097`, which includes the recent changes to support GPT OSS (ggml-org/llama.cpp#15091).