
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17907

Make sure to read the contributing guidelines before submitting a PR

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary

Project: llama.cpp (auroralabs-loci)
PR #512: UPSTREAM PR #17907: [WIP] ggml-hexagon: Q4_0 mm opt
Versions Compared: 571689ba-c827-4914-a21b-e7f9319ce9fa vs babe7961-3459-4634-9db9-ac5ad6523347


Summary

This PR introduces Qualcomm Hexagon HTP backend optimizations for Q4_0 quantized matrix multiplication through loop unrolling, helper function extraction, and RoPE implementation refactoring. Analysis reveals no measurable performance impact across all binaries and functions. The changes affect two files (matmul-ops.c, rope-ops.c) and include code quality improvements such as const-correctness, constant hoisting, and use of memcpy for bulk copies. The modifications maintain performance parity, with power consumption changes below 0.001% across all binaries.
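The loop-unrolling and constant-hoisting patterns described above can be sketched as follows. This is a minimal illustration, not the PR's actual Hexagon HTP kernel: `dot_unrolled` is a hypothetical name, and a plain int8 dot product stands in for the Q4_0 path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of an unrolled dot product with a hoisted scale
 * factor, in the spirit of the optimizations the PR applies to the
 * Q4_0 matmul kernel. Not taken from matmul-ops.c. */
static int32_t dot_unrolled(const int8_t *restrict a,
                            const int8_t *restrict b,
                            size_t n, int32_t scale) {
    /* Four independent accumulators shorten the loop-carried dependency
     * chain and give the compiler more scheduling freedom. */
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    /* Main loop, unrolled by 4. */
    for (; i + 4 <= n; i += 4) {
        acc0 += (int32_t)a[i + 0] * b[i + 0];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
        acc2 += (int32_t)a[i + 2] * b[i + 2];
        acc3 += (int32_t)a[i + 3] * b[i + 3];
    }

    /* Scalar tail for lengths that are not a multiple of 4. */
    for (; i < n; ++i)
        acc0 += (int32_t)a[i] * b[i];

    /* The scale is loop-invariant, so it is hoisted out of the loop
     * and applied once to the combined accumulator. */
    return (acc0 + acc1 + acc2 + acc3) * scale;
}
```

The same idea applies to the bulk copies the summary mentions: replacing an element-by-element copy loop with a single `memcpy` over the whole contiguous region lets the libc implementation use wide loads and stores.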

@loci-dev loci-dev force-pushed the main branch 3 times, most recently from 0e7b989 to 24b5a2d Compare December 10, 2025 19:08
@loci-dev loci-dev force-pushed the main branch 7 times, most recently from 78ff3d3 to 117bfc3 Compare December 11, 2025 18:11
