Conversation

@joeldushouyu

Summary

This PR allows running vision models (tested with Gemma 3 4B) on the Hexagon NPU.

For now, it only supports using the CDSP for FP16xFP32 matrix multiplication.
Note: I am fully aware that the current FP16xFP32 implementation is not the most optimal. For example, we could easily reduce unnecessary data repetition by using the VTCM as a cache, but I think that should go into a separate PR that focuses solely on optimization.
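For reference, the semantics the CDSP kernel has to reproduce are just a mixed-precision matmul: decode each FP16 weight to FP32 and accumulate in FP32. Below is a minimal scalar sketch of that contract (the function names are illustrative, not the actual HVX kernel in this PR):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Decode an IEEE-754 binary16 value to float (zero, subnormals, normals,
// inf/NaN). Conceptually, each fp16 weight goes through this before the
// fp32 multiply-accumulate.
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                              // +/- zero
        } else {                                      // subnormal: renormalize
            exp = 1;
            while (!(mant & 0x400)) { mant <<= 1; exp--; }
            mant &= 0x3FF;
            bits = sign | ((exp + 112) << 23) | (mant << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (mant << 13);     // inf / NaN
    } else {
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// Scalar reference: C[m x n] = A[m x k] (fp16) * B[k x n] (fp32),
// accumulating in fp32. The real kernel vectorizes this with HVX.
static void matmul_f16_f32(const uint16_t *A, const float *B, float *C,
                           int m, int n, int k) {
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += fp16_to_fp32(A[(size_t)i * k + p]) * B[(size_t)p * n + j];
            C[(size_t)i * n + j] = acc;
        }
}
```

The VTCM-caching idea mentioned above would keep decoded FP16 tiles resident on-chip so the same weights are not re-fetched for every output column.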

Test

I used the f16 vision weights and q40 language weights from unsloth.

1. build hexagon in docker

cmake --preset arm64-android-snapdragon-release -B build-snapdragon
cmake --build build-snapdragon
cmake --install build-snapdragon --prefix pkg-adb/llama.cpp

2. push the weights to the phone (tested with a Samsung S25 Ultra)

adb push mmproj-F16.gguf /data/local/tmp/gguf
adb push gemma-3-4b-it-Q4_0.gguf /data/local/tmp/gguf
adb push hydro_1.png /data/local/tmp/gguf   # image for testing

3. run the run-mtmd script

E=1 NDEV=1 D=HTP0 MTMD_DEVICE=HTP0 PROF=1 V=1 M=gemma-3-4b-it-Q4_0.gguf MMPROJ=mmproj-F16.gguf IMG=hydro_1.png ./scripts/snapdragon/adb/run-mtmd.sh -p '"What is in this image."'

@joeldushouyu joeldushouyu changed the title Mtmd hexagon ggml-hexagon: mm for mtmd Dec 9, 2025
@joeldushouyu joeldushouyu marked this pull request as ready for review December 9, 2025 22:27
@github-actions github-actions bot added script Script related ggml changes relating to the ggml tensor library for machine learning labels Dec 10, 2025
@joeldushouyu

As I mentioned earlier, I think there is still a lot of room to optimize the FP16xFP32 kernel by taking advantage of features like VTCM and DMA. That said, I am trying to figure out whether there is any publicly available documentation on how to use the HMX instructions, the built-in matrix-multiplication hardware on the CDSP.

I noticed in the Hexagon SDK docs that the qhl_hmx library was removed starting with SDK 6.0. Is there a specific reason for its removal, and is there any plan to introduce a replacement or an updated HMX library? My impression is that VTCM can help reduce data redundancy, but the HMX systolic core should still offer better compute throughput than implementing matrix multiplies with HVX vector dot products.

@joeldushouyu

joeldushouyu commented Dec 10, 2025

Note: commit c73a2c0 is a patch to pass the ggml test cases, mainly because the src0 data memory is non-contiguous in some of the test cases. Verified by running:

HB=0 ./scripts/snapdragon/adb/run-tool.sh test-backend-ops -b HTP0 -o MUL_MAT
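The non-contiguous case comes down to ggml tensors carrying per-dimension byte strides, so a kernel that assumes dense rows cannot index src0 directly. A small sketch of detecting that condition and compacting the data first (the struct and helper names here are illustrative, not ggml's actual API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Simplified mirror of ggml's 2-D layout: ne[] holds element counts
// (ne[0] = elements per row), nb[] holds byte strides (nb[0] = stride
// between elements, nb[1] = stride between rows).
typedef struct {
    int64_t     ne[2];
    size_t      nb[2];
    const char *data;
} tensor2d;

// A tensor is contiguous when elements are densely packed and rows
// follow each other with no padding or striding.
static bool is_contiguous(const tensor2d *t, size_t elem_size) {
    return t->nb[0] == elem_size &&
           t->nb[1] == elem_size * (size_t)t->ne[0];
}

// Gather a strided tensor into a dense buffer before handing it to a
// kernel that assumes contiguous rows.
static void make_contiguous(const tensor2d *t, size_t elem_size, char *dst) {
    for (int64_t r = 0; r < t->ne[1]; ++r)
        for (int64_t c = 0; c < t->ne[0]; ++c)
            memcpy(dst + ((size_t)(r * t->ne[0] + c)) * elem_size,
                   t->data + (size_t)r * t->nb[1] + (size_t)c * t->nb[0],
                   elem_size);
}
```

A view tensor (e.g. the first two columns of a wider matrix) fails the `is_contiguous` check because its row stride exceeds `elem_size * ne[0]`, which is the kind of src0 the patched test cases exercise.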
