
Conversation

@caer
Contributor

@caer caer commented Aug 6, 2025

This PR updates the llama.cpp commit tag to b6097, which includes the recent changes to support GPT OSS (ggml-org/llama.cpp#15091).

@webconsulting

I've tested this update with the b6097 commit and can confirm everything works properly with gpt-oss models.

Tested configurations:

  • CPU build: compilation and inference working correctly
  • CUDA build: compilation and inference working correctly

Both builds successfully load and run gpt-oss models without any issues. The bindings generation completed without errors and all the existing tests pass.

Thanks for this update!

@caer
Contributor Author

caer commented Aug 6, 2025

@webconsulting what configurations/code and quantizations did you use for locally testing the gpt-oss models? 🤔

While all the existing tests pass and existing models work, I'm getting a `Decode Error -3: unknown` from my inference pipeline (https://github.com/with-caer/curtana/blob/cba36e2b953b6b3d85b6f76ea985125d9aaaee83/curtana/src/lib.rs#L141) when I try to load either of these GPT models on a MacBook Air host.

@webconsulting

webconsulting commented Aug 6, 2025

I’ve tested several of Bartowski’s models:
https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF
Specifically, I tried the Q5_K_M, Q8_0, and BF16 versions.
I also downloaded the MXFP4 model (openai_gpt-oss-20b-MXFP4.gguf), but I don't think I’ve tested it yet.

I can also confirm that the Unsloth models didn’t work for me.

@webconsulting

MXFP4 is OK too.

@webconsulting

webconsulting commented Aug 6, 2025

Hi! I've analyzed the difference between our implementations and found what's likely causing the decode error -3 with GPT-OSS models.

Root Cause
GPT-OSS models require the Harmony prompt format, not standard chat templates. The model expects specific tokens like <|start|>, <|channel|>, <|message|>, and <|end|>.
In my implementation, I detect GPT-OSS models and use a specialized Harmony processor that formats messages like this:
```
<|start|>system<|message|>You are ChatGPT...\nReasoning: medium\n# Valid channels: analysis, commentary, final...<|end|>
<|start|>developer<|message|># Instructions\n...<|end|>
<|start|>user<|message|>Hello<|end|>
<|start|>assistant<|channel|>final<|message|>
```
When using `apply_chat_template` with the default template, the model receives unexpected tokens and fails during decode.
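Detecting which path to take can be as simple as checking the architecture string from the model's GGUF metadata. A hypothetical sketch (the helper name is mine; llama.cpp registers GPT-OSS under the architecture name `gpt-oss`, but verify against your local model files):

```rust
/// Hypothetical helper: choose the prompt format from the model's
/// `general.architecture` GGUF metadata value. llama.cpp registers
/// GPT-OSS under the architecture name "gpt-oss".
fn needs_harmony_format(architecture: &str) -> bool {
    architecture == "gpt-oss"
}

fn main() {
    // GPT-OSS goes through the Harmony formatter...
    assert!(needs_harmony_format("gpt-oss"));
    // ...everything else keeps using the model's bundled chat template.
    assert!(!needs_harmony_format("llama"));
    println!("routing ok");
}
```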

Quick Fix to Test
Replace the `apply_chat_template` call with a manual Harmony format construction:

```rust
let mut prompt = String::new();
prompt.push_str("<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nReasoning: medium\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|>");

for msg in &self.messages {
    match msg.role.as_str() {
        // Harmony reserves the "system" role for its own preamble,
        // so caller-supplied system messages map to "developer".
        "system" => {
            prompt.push_str("<|start|>developer<|message|>");
            prompt.push_str(&msg.content);
            prompt.push_str("<|end|>");
        }
        "user" => {
            prompt.push_str("<|start|>user<|message|>");
            prompt.push_str(&msg.content);
            prompt.push_str("<|end|>");
        }
        "assistant" => {
            prompt.push_str("<|start|>assistant<|channel|>final<|message|>");
            prompt.push_str(&msg.content);
            prompt.push_str("<|end|>");
        }
        _ => {}
    }
}

// Leave the prompt open on an assistant turn so the model generates the reply.
prompt.push_str("<|start|>assistant<|channel|>final<|message|>");
```
This should resolve the decode error. The model is strict about its format: without these specific tokens, it generates invalid sequences.
Let me know if you need the full Harmony specification or if there are other aspects to investigate!
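For a quick sanity check before wiring it into the pipeline, the snippet above can be wrapped in a small function. A sketch (`harmony_prompt` and the `(role, content)` tuples are illustrative stand-ins for whatever message type the caller already has):

```rust
/// Sketch: the Harmony quick fix as a reusable function.
/// `(role, content)` tuples stand in for the caller's message type.
fn harmony_prompt(messages: &[(&str, &str)]) -> String {
    let mut prompt = String::from(
        "<|start|>system<|message|>You are ChatGPT, a large language model \
         trained by OpenAI.\nKnowledge cutoff: 2024-06\nReasoning: medium\n\
         # Valid channels: analysis, commentary, final. Channel must be \
         included for every message.<|end|>",
    );
    for (role, content) in messages {
        let header = match *role {
            "system" => "<|start|>developer<|message|>",
            "user" => "<|start|>user<|message|>",
            "assistant" => "<|start|>assistant<|channel|>final<|message|>",
            _ => continue, // skip unknown roles
        };
        prompt.push_str(header);
        prompt.push_str(content);
        prompt.push_str("<|end|>");
    }
    // Leave the prompt open on an assistant turn for generation.
    prompt.push_str("<|start|>assistant<|channel|>final<|message|>");
    prompt
}

fn main() {
    let prompt = harmony_prompt(&[("user", "Hello")]);
    // The user turn must be fully delimited...
    assert!(prompt.contains("<|start|>user<|message|>Hello<|end|>"));
    // ...and the prompt must end on an open assistant turn.
    assert!(prompt.ends_with("<|start|>assistant<|channel|>final<|message|>"));
    println!("prompt ok");
}
```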

@caer
Contributor Author

caer commented Aug 7, 2025

@webconsulting thank you so much for looking into this in more detail! I went ahead and followed your suggestions (https://github.com/with-caer/curtana/pull/1/files), but using the official OpenAI Harmony crate (https://crates.io/crates/openai-harmony).

After switching to a computer with a bit more RAM (and using the above code), I can confirm everything works with this latest commit, including the GPT OSS models.
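For reference, the crate-based version looks roughly like this. This is an untested sketch adapted from the `openai-harmony` crate's documented API; it renders the conversation directly to token IDs, so no string templating is involved:

```rust
use openai_harmony::chat::{Conversation, Message, Role};
use openai_harmony::{load_harmony_encoding, HarmonyEncodingName};

fn main() -> anyhow::Result<()> {
    // Load the GPT-OSS Harmony encoding.
    let enc = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss)?;

    // Build a conversation and render it straight to token IDs,
    // leaving the prompt open on an assistant turn.
    let convo = Conversation::from_messages([
        Message::from_role_and_content(Role::User, "Hello there!"),
    ]);
    let tokens = enc.render_conversation_for_completion(&convo, Role::Assistant, None)?;
    println!("{} prompt tokens", tokens.len());
    Ok(())
}
```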

@MarcusDunn if these changes look good to you, would you be willing to get them merged/deployed onto crates.io?

@MarcusDunn MarcusDunn merged commit 7fb5a33 into utilityai:main Aug 7, 2025
4 of 5 checks passed
@MarcusDunn
Contributor

Awesome, this was updated so quickly. Just cut a release!
