
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17907

Make sure to read the contributing guidelines before submitting a PR

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary

Project: llama.cpp (auroralabs-loci)
PR #512: UPSTREAM PR #17907: [WIP] ggml-hexagon: Q4_0 mm opt
Versions Compared: 571689ba-c827-4914-a21b-e7f9319ce9fa vs babe7961-3459-4634-9db9-ac5ad6523347


Summary

This PR introduces Qualcomm Hexagon HTP backend optimizations for Q4_0 quantized matrix multiplication through loop unrolling, helper function extraction, and RoPE implementation refactoring. Analysis reveals no measurable performance impact across all binaries and functions. The changes affect two files (matmul-ops.c, rope-ops.c) and include code quality improvements such as const-correctness, constant hoisting, and use of memcpy for bulk copies. The modifications maintain performance parity, with power consumption changes below 0.001% across all binaries.
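The loop-unrolling and constant-hoisting patterns described above can be sketched as follows. This is a minimal illustration, not the PR's actual Hexagon HTP kernel: `dot_unrolled` is a hypothetical name, and a plain int8 dot product stands in for the Q4_0 path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of an unrolled dot product with a hoisted scale
 * factor, in the spirit of the optimizations the PR applies to the
 * Q4_0 matmul kernel. Not taken from matmul-ops.c. */
static int32_t dot_unrolled(const int8_t *restrict a,
                            const int8_t *restrict b,
                            size_t n, int32_t scale) {
    /* Four independent accumulators shorten the loop-carried dependency
     * chain and give the compiler more scheduling freedom. */
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    /* Main loop, unrolled by 4. */
    for (; i + 4 <= n; i += 4) {
        acc0 += (int32_t)a[i + 0] * b[i + 0];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
        acc2 += (int32_t)a[i + 2] * b[i + 2];
        acc3 += (int32_t)a[i + 3] * b[i + 3];
    }

    /* Scalar tail for lengths that are not a multiple of 4. */
    for (; i < n; ++i)
        acc0 += (int32_t)a[i] * b[i];

    /* The scale is loop-invariant, so it is hoisted out of the loop
     * and applied once to the combined accumulator. */
    return (acc0 + acc1 + acc2 + acc3) * scale;
}
```

The same idea applies to the bulk copies the summary mentions: replacing an element-by-element copy loop with a single `memcpy` over the whole contiguous region lets the libc implementation use wide loads and stores.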

@loci-dev loci-dev force-pushed the main branch 3 times, most recently from 0e7b989 to 24b5a2d Compare December 10, 2025 19:08
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 78ff3d3 to 117bfc3 Compare December 11, 2025 18:11
