UPSTREAM PR #17737: CANN: implement the SSM_CONV operator #416

loci-dev · 2025-12-03T12:46:41Z

Description

We implement the SSM_CONV operator using depthwise 1D convolution.
We use high-level builtin aclnnConvolution function.

The goal is to compute the following:

$$ y[i,j,k] = \sum_{l=0}^{dconv}w[l,i] x[l+j, i, k] $$

where the shape of $y$ is $[dinner, nt, ns]$, $x$ is $[dconv - 1 + nt, dinner, ns]$ and $w$ is $[dconv, dinner]$.

In order to use aclnnConvolution to implement this formula, we reshape the tensors and set the groups parameter to d_inner to calculate the convolution for each channel independently.

Testing

We ran test-backend-ops test suite for SSM_CONV on two different cards: 310P3 and 910B3.

For the 310P3 card, it requires setting the cubeMathType parameter to ALLOW_FP32_DOWN_PRECISION, and it seems that causes the computation to be done not in f32, which in turn causes the tests to not pass with a small error (NMSE 0.000000114, greater than the allowed 1e-7). We had to override max_nmse_err() method for test_ssm_conv to set the maximum error to 1e-6 which allows the tests to pass.

On the 910B card, the operator runs in f32 natively, it passes the tests at the original 1e-7 precision.

Co-authored-by: Aleksei Lobanov, <[email protected]> Co-authored-by: Sujin Kang, <[email protected]>

loci-agentic-ai · 2025-12-03T13:42:49Z

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #416 - CANN SSM_CONV Operator Implementation

Overview

PR #416 implements the SSM_CONV operator for the CANN backend, adding support for state-space model convolution operations on Ascend NPUs. The changes introduce 137 new lines across 4 files with no deletions, representing a pure feature addition rather than a modification of existing code paths.

Performance Impact Analysis

Power Consumption: Analysis across all binaries shows 0.0% change in power consumption between versions. The measured values for key binaries remain identical:

libllama.so: 194,195 nJ (no change)
libggml-cpu.so: 116,810 nJ (no change)
llama-run: 218,940 nJ (no change)

Inference Performance: No functions in the core inference path (llama_decode, llama_encode, llama_tokenize) were modified. The new ggml_cann_ssm_conv function is an isolated addition to the CANN backend operator set and does not affect existing CPU or GPU inference paths. Tokens per second for standard transformer models remains unchanged.

Code Changes:

New function ggml_cann_ssm_conv implements depthwise 1D convolution using aclnnConvolution
Tensor reshaping logic converts between GGML layout (CLN format) and CANN NCL format
Platform-specific handling for Ascend 310P3 cards sets cubeMathType=1 for FP32 precision
Switch case additions in ggml_cann_compute_forward and ggml_backend_cann_supports_op register the new operator
Test tolerance adjustment from 1e-7 to 1e-6 accommodates 310P3 precision behavior

Scope: This PR exclusively affects state-space models (Mamba, RWKV architectures) running on CANN backend. Standard transformer models and non-CANN backends are unaffected. The implementation adds 123 lines of tensor manipulation and convolution setup code without modifying any existing operator implementations.

CANN: implement SSM_CONV operator

df6a560

Co-authored-by: Aleksei Lobanov, <[email protected]> Co-authored-by: Sujin Kang, <[email protected]>

loci-dev temporarily deployed to PROD__AL_DEMO December 3, 2025 12:46 — with GitHub Actions Inactive

loci-dev force-pushed the main branch 27 times, most recently from 9612097 to c217e38 Compare December 6, 2025 08:10

loci-dev force-pushed the main branch 30 times, most recently from b28744d to 4733ac4 Compare December 13, 2025 12:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

UPSTREAM PR #17737: CANN: implement the SSM_CONV operator #416

UPSTREAM PR #17737: CANN: implement the SSM_CONV operator #416

Uh oh!

loci-dev commented Dec 3, 2025

Uh oh!

loci-agentic-ai bot commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

UPSTREAM PR #17737: CANN: implement the SSM_CONV operator #416

Are you sure you want to change the base?

UPSTREAM PR #17737: CANN: implement the SSM_CONV operator #416

Uh oh!

Conversation

loci-dev commented Dec 3, 2025

Description

Testing

Uh oh!

loci-agentic-ai bot commented Dec 3, 2025

Performance Analysis Summary: PR #416 - CANN SSM_CONV Operator Implementation

Overview

Performance Impact Analysis

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants