
Conversation

@caer
Contributor

@caer caer commented Aug 6, 2025

This PR updates the llama.cpp commit tag to b6097, which includes the recent changes to support GPT OSS (ggml-org/llama.cpp#15091).

@webconsulting

I've tested this update with the b6097 commit and can confirm everything works properly with gpt-oss models.

Tested configurations:

  • CPU build: compilation and inference working correctly
  • CUDA build: compilation and inference working correctly

Both builds successfully load and run gpt-oss models without any issues. The bindings generation completed without errors and all the existing tests pass.

Thanks for this update!

@caer
Contributor Author

caer commented Aug 6, 2025

@webconsulting what configurations/code and quantizations did you use for locally testing the gpt-oss models? 🤔

While all the existing tests pass and existing models work, I'm getting a `Decode Error -3: unknown` from my inference pipeline (https://github.com/with-caer/curtana/blob/cba36e2b953b6b3d85b6f76ea985125d9aaaee83/curtana/src/lib.rs#L141) when I try to load either of these GPT models on a MacBook Air host.

@webconsulting

webconsulting commented Aug 6, 2025

I’ve tested several of Bartowski’s models:
https://huggingface.co/bartowski/openai_gpt-oss-20b-GGUF
Specifically, I tried the Q5_K_M, Q8_0, and BF16 versions.
I also downloaded the MXFP4 model (openai_gpt-oss-20b-MXFP4.gguf), but I don't think I’ve tested it yet.

I can also confirm that the Unsloth models didn’t work for me.

@webconsulting

MXFP4 is OK too.

@webconsulting

webconsulting commented Aug 6, 2025

Hi! I've analyzed the difference between our implementations and found what's likely causing the decode error -3 with GPT-OSS models.

Root Cause
GPT-OSS models require the Harmony prompt format, not standard chat templates. The model expects specific tokens like <|start|>, <|channel|>, <|message|>, and <|end|>.
In my implementation, I detect GPT-OSS models and use a specialized Harmony processor that formats messages like this:
```
<|start|>system<|message|>You are ChatGPT...\nReasoning: medium\n# Valid channels: analysis, commentary, final...<|end|>
<|start|>developer<|message|># Instructions\n...<|end|>
<|start|>user<|message|>Hello<|end|>
<|start|>assistant<|channel|>final<|message|>
```
When using `apply_chat_template` with the default template, the model receives unexpected tokens and fails during decode.
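Detecting which path to take can be as simple as checking the architecture string from the model's GGUF metadata. A hypothetical sketch (the helper name is mine; llama.cpp registers GPT-OSS under the architecture name `gpt-oss`, but verify against your local model files):

```rust
/// Hypothetical helper: choose the prompt format from the model's
/// `general.architecture` GGUF metadata value. llama.cpp registers
/// GPT-OSS under the architecture name "gpt-oss".
fn needs_harmony_format(architecture: &str) -> bool {
    architecture == "gpt-oss"
}

fn main() {
    // GPT-OSS goes through the Harmony formatter...
    assert!(needs_harmony_format("gpt-oss"));
    // ...everything else keeps using the model's bundled chat template.
    assert!(!needs_harmony_format("llama"));
    println!("routing ok");
}
```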

Quick Fix to Test
Replace the `apply_chat_template` call with a manual Harmony format construction:

```rust
let mut prompt = String::new();
prompt.push_str("<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nReasoning: medium\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|>");

for msg in &self.messages {
    match msg.role.as_str() {
        // Harmony reserves the "system" role for its own preamble,
        // so caller-supplied system messages map to "developer".
        "system" => {
            prompt.push_str("<|start|>developer<|message|>");
            prompt.push_str(&msg.content);
            prompt.push_str("<|end|>");
        }
        "user" => {
            prompt.push_str("<|start|>user<|message|>");
            prompt.push_str(&msg.content);
            prompt.push_str("<|end|>");
        }
        "assistant" => {
            prompt.push_str("<|start|>assistant<|channel|>final<|message|>");
            prompt.push_str(&msg.content);
            prompt.push_str("<|end|>");
        }
        _ => {}
    }
}

// Leave the prompt open on an assistant turn so the model generates the reply.
prompt.push_str("<|start|>assistant<|channel|>final<|message|>");
```
This should resolve the decode error. The model is strict about its format: without these specific tokens, it generates invalid sequences.
Let me know if you need the full Harmony specification or if there are other aspects to investigate!
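For a quick sanity check before wiring it into the pipeline, the snippet above can be wrapped in a small function. A sketch (`harmony_prompt` and the `(role, content)` tuples are illustrative stand-ins for whatever message type the caller already has):

```rust
/// Sketch: the Harmony quick fix as a reusable function.
/// `(role, content)` tuples stand in for the caller's message type.
fn harmony_prompt(messages: &[(&str, &str)]) -> String {
    let mut prompt = String::from(
        "<|start|>system<|message|>You are ChatGPT, a large language model \
         trained by OpenAI.\nKnowledge cutoff: 2024-06\nReasoning: medium\n\
         # Valid channels: analysis, commentary, final. Channel must be \
         included for every message.<|end|>",
    );
    for (role, content) in messages {
        let header = match *role {
            "system" => "<|start|>developer<|message|>",
            "user" => "<|start|>user<|message|>",
            "assistant" => "<|start|>assistant<|channel|>final<|message|>",
            _ => continue, // skip unknown roles
        };
        prompt.push_str(header);
        prompt.push_str(content);
        prompt.push_str("<|end|>");
    }
    // Leave the prompt open on an assistant turn for generation.
    prompt.push_str("<|start|>assistant<|channel|>final<|message|>");
    prompt
}

fn main() {
    let prompt = harmony_prompt(&[("user", "Hello")]);
    // The user turn must be fully delimited...
    assert!(prompt.contains("<|start|>user<|message|>Hello<|end|>"));
    // ...and the prompt must end on an open assistant turn.
    assert!(prompt.ends_with("<|start|>assistant<|channel|>final<|message|>"));
    println!("prompt ok");
}
```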

@caer
Contributor Author

caer commented Aug 7, 2025

@webconsulting thank you so much for looking into this in more detail! I went ahead and followed your suggestions (https://github.com/with-caer/curtana/pull/1/files), but using the official OpenAI Harmony crate (https://crates.io/crates/openai-harmony).

After switching to a computer with a bit more RAM (and using the above code), I can confirm everything works with this latest commit, including the GPT OSS models.
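For reference, the crate-based version looks roughly like this. This is an untested sketch adapted from the `openai-harmony` crate's documented API; it renders the conversation directly to token IDs, so no string templating is involved:

```rust
use openai_harmony::chat::{Conversation, Message, Role};
use openai_harmony::{load_harmony_encoding, HarmonyEncodingName};

fn main() -> anyhow::Result<()> {
    // Load the GPT-OSS Harmony encoding.
    let enc = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss)?;

    // Build a conversation and render it straight to token IDs,
    // leaving the prompt open on an assistant turn.
    let convo = Conversation::from_messages([
        Message::from_role_and_content(Role::User, "Hello there!"),
    ]);
    let tokens = enc.render_conversation_for_completion(&convo, Role::Assistant, None)?;
    println!("{} prompt tokens", tokens.len());
    Ok(())
}
```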

@MarcusDunn if these changes look good to you, would you be willing to get them merged/deployed onto crates.io?

@MarcusDunn MarcusDunn merged commit 7fb5a33 into utilityai:main Aug 7, 2025
4 of 5 checks passed
@MarcusDunn
Contributor

Awesome, this was updated so quickly. Just cut a release!
