model: add glm-asr support #17901
Conversation
```cpp
cur = ggml_mul_mat(ctx0, model.mm_1_w, cur);
cur = ggml_add(ctx0, cur, model.mm_1_b);
cur = ggml_gelu_erf(ctx0, cur);
cur = ggml_mul_mat(ctx0, model.mm_2_w, cur);
cur = ggml_add(ctx0, cur, model.mm_2_b);
```
replace this with build_ffn
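A rough sketch of what that could look like, assuming the existing `build_ffn` helper in clip.cpp with its (up, gate, down, activation) argument layout; the exact signature and the `FFN_GELU_ERF` enum value should be checked against the current code:

```cpp
// sketch only: collapse the manual up -> GELU -> down sequence into build_ffn
// (signature and FFN_GELU_ERF value assumed from the existing clip.cpp helpers)
cur = build_ffn(cur,
        model.mm_1_w, model.mm_1_b, // up projection + bias
        nullptr,      nullptr,      // no gate projection
        model.mm_2_w, model.mm_2_b, // down projection + bias
        FFN_GELU_ERF,               // matches the ggml_gelu_erf() call above
        -1);
```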
```cpp
// whisper downscales input token by half after conv1d
n_patches /= 2;
// reshape by merge_factor
n_patches /= 4;
```
Suggested change:

```diff
-n_patches /= 4;
+n_patches /= n_merge;
```
You also need to set `hparams.n_merge = 4` when loading the hparams; see the load_hparams() function.
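For reference, a minimal sketch of where that could go, assuming the PR adds a dedicated projector type for GLM-ASR (the enum name below is a placeholder, not the actual identifier):

```cpp
// in load_hparams(), alongside the other projector-specific cases
case PROJECTOR_TYPE_GLM_ASR: // placeholder name for whatever this PR defines
    {
        // audio frames are merged in groups of 4 before the projector,
        // so token counting can divide by n_merge instead of a literal 4
        hparams.n_merge = 4;
    } break;
```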
```cpp
cur = ggml_norm(ctx0, cur, hparams.eps);
cur = ggml_mul(ctx0, cur, model.mm_norm_pre_w);
cur = ggml_add(ctx0, cur, model.mm_norm_pre_b);
cur = ggml_reshape_2d(ctx0, cur, cur->ne[0] * 4, cur->ne[1] / 4);
```
This will fail if the number of elements is not divisible by 4.
Instead, abstract the StackAudioFrames logic used by ultravox into a new function, build_stack(), and reuse it here.
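A sketch of what such a build_stack() helper could look like, factored out of the ultravox StackAudioFrames block; member names such as `ctx0` follow the surrounding graph-builder code, and the zero-padding behaviour is an assumption about how the existing block handles non-divisible lengths:

```cpp
// stack `stack_factor` consecutive frames into one row, zero-padding the
// tail so the element count is always divisible by the stack width
ggml_tensor * build_stack(ggml_tensor * cur, int64_t stack_factor) {
    const int64_t stride     = cur->ne[0] * stack_factor;
    const int64_t n_elems    = ggml_nelements(cur);
    const int64_t padded_len = GGML_PAD(n_elems, stride);
    const int64_t pad        = padded_len - n_elems;
    if (pad > 0) {
        cur = ggml_view_1d(ctx0, cur, n_elems, 0);
        cur = ggml_pad(ctx0, cur, pad, 0, 0, 0);
    }
    return ggml_view_2d(ctx0, cur, stride, padded_len / stride,
            ggml_row_size(cur->type, stride), 0);
}
```

The GLM-ASR projector could then call something like `cur = build_stack(cur, 4);` instead of the bare `ggml_reshape_2d`, and the ultravox path could be switched over to the same helper.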
| "VLlama3ForCausalLM", | ||
| "LlavaForConditionalGeneration", | ||
| "VoxtralForConditionalGeneration", | ||
| "GlmasrModel", |
| "GlmasrModel", |
This will get overwritten by lm_config.
```python
if isinstance(self.hparams.get("eos_token_id"), list):
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(self.dir_model, trust_remote_code=True)
    special_vocab = gguf.SpecialVocab(self.dir_model, load_merges=True)
    special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
    special_vocab._set_special_token("eot", tokenizer.get_added_vocab()["<|user|>"])
    special_vocab._set_special_token("unk", tokenizer.get_added_vocab()["<|endoftext|>"])
    special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["<|endoftext|>"])
    special_vocab.add_to_gguf(self.gguf_writer)
    special_vocab.chat_template = "glmedge"
```
This is not OK; check for the root architecture instead (see Qwen3MoeModel). You also should not need to set any of those special tokens.
Also, setting the template name like that doesn't work anymore, I think, and it's a dirty hack to begin with; if the model creators can't be bothered, neither should we.
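For illustration, one way to branch on the root architecture rather than probing `eos_token_id`; the helper below is hypothetical (Qwen3MoeModel achieves the same thing by remembering the original `architectures` entry before hparams is swapped):

```python
import json

def _root_architecture(dir_model) -> str | None:
    # the top-level config.json still carries the root "architectures" entry,
    # even after self.hparams has been replaced by the contents of lm_config
    with open(dir_model / "config.json", encoding="utf-8") as f:
        return json.load(f).get("architectures", [None])[0]

# inside set_vocab(), branch on the root architecture instead:
#
#     if _root_architecture(self.dir_model) == "GlmasrModel":
#         ...  # GLM-ASR-specific vocab handling
#     else:
#         super().set_vocab()
```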
```python
special_vocab._set_special_token("unk", tokenizer.get_added_vocab()["<|endoftext|>"])
special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["<|endoftext|>"])
special_vocab.add_to_gguf(self.gguf_writer)
special_vocab.chat_template = "glmedge"
```
As always, it would be nice if the GLM team could take more responsibility for carefully testing and distributing the chat template.
We generally don't accept this kind of chat template hack anymore, as it is not supported by the Jinja engine.
This PR adds support for the GLM-ASR architecture, validated with the zai-org/GLM-ASR-Nano-2512 model.
Key Changes:
- Updated convert_hf_to_gguf.py to handle dynamic configuration keys (GLM-ASR uses "lm_config" instead of "text_config"). It now identifies the config section by checking: `llm_config_key = "lm_config" if "lm_config" in self.hparams else "text_config"`

Result