
Conversation


@loci-dev loci-dev commented Dec 3, 2025

Mirrored from ggml-org/llama.cpp#17706

Because the MUL_MAT + ADD combination occurs very frequently during model execution, this PR fuses the two operations, replacing the separate calculations with the aclnnAddmm operator to improve efficiency and performance.
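To illustrate the idea, here is a minimal sketch of how such a fusion pass might detect the pattern in a ggml compute graph: a MUL_MAT node whose result feeds an ADD node can be dispatched as one addmm-style call (bias + A*B) instead of two kernels. The helper name `can_fuse_mul_mat_add` is illustrative and not taken from the PR; only the ggml tensor fields and op enums are assumed from the library.

```cpp
#include "ggml.h"

// Sketch only: check whether an ADD node consumes the output of a MUL_MAT
// node, so the pair could be lowered to a single fused addmm kernel call.
// (Hypothetical helper; the actual PR code may structure this differently.)
static bool can_fuse_mul_mat_add(const struct ggml_tensor * mul_mat,
                                 const struct ggml_tensor * add) {
    return mul_mat->op == GGML_OP_MUL_MAT &&
           add->op     == GGML_OP_ADD     &&
           // the ADD must take the MUL_MAT result as one of its inputs
           (add->src[0] == mul_mat || add->src[1] == mul_mat);
}
```

When the pattern matches, the backend can skip the standalone ADD node and compute the bias addition inside the same kernel launch, saving one pass over the output tensor.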

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #406

Project: llama.cpp
PR: CANN Backend Operator Fusion (MUL_MAT + ADD)
Versions: 49737af9 vs 05f5e78f


Summary

This PR introduces opt-in operator fusion for the CANN backend, combining MUL_MAT and ADD operations into a single aclnnAddmm kernel call. The feature is disabled by default and enabled via the GGML_CANN_OPERATOR_FUSION environment variable. Performance analysis shows no measurable changes across any binaries or functions, as the optimization is inactive in the baseline build. All 16 analyzed binaries maintain identical power consumption (< 0.001% variance), and no functions show Response Time or Throughput Time changes. The implementation adds 155 lines across 4 files without modifying core inference paths.


Note: Analysis conducted under Condition 1 - No performance metric changes detected. The fusion optimization requires explicit runtime enablement to observe performance impact.
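Since the fusion is gated behind an environment variable at runtime, a plausible gating pattern is a one-time check that caches the result. This is a sketch under assumptions: the exact accepted values of GGML_CANN_OPERATOR_FUSION and the function name `cann_fusion_enabled` are illustrative, not taken from the PR.

```cpp
#include <cstdlib>
#include <cstring>

// Sketch: read GGML_CANN_OPERATOR_FUSION once and cache the result, so
// fusion stays disabled unless the user explicitly opts in at runtime.
// (Hypothetical helper; accepted values are an assumption.)
static bool cann_fusion_enabled() {
    static const bool enabled = [] {
        const char * v = std::getenv("GGML_CANN_OPERATOR_FUSION");
        return v != nullptr && std::strcmp(v, "0") != 0;
    }();
    return enabled;
}
```

With such a guard, a baseline build (variable unset) takes the unfused path, which is consistent with the analysis above showing no measurable performance differences.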

loci-dev force-pushed the main branch 27 times, most recently from 3e4b499 to e81a7eb on December 5, 2025 at 13:17
loci-dev force-pushed the main branch 30 times, most recently from e70bc15 to ef96f85 on December 14, 2025 at 09:08