Integrate SME1 SGEMM KleidiAI kernels #25760

Merged

edgchen1 merged 1 commit into microsoft:main from patryk-kaiser-ARM:SME1_sgemm_integration

Sep 12, 2025

Contributor

patryk-kaiser-ARM commented Aug 15, 2025

Key changes
This PR integrates the KleidiAI SME1 FP32 kernels into the existing sgemm_kleidiai.cpp implementation.

Adds an SME2 flag in onnxruntime/core/common/cpuid_info.h and onnxruntime/core/common/cpuid_info.cc.
The previously integrated SME2 kernels were gated on the SME(1) check. This change correctly distinguishes when the SME1 kernels and when the SME2 kernels should be used.

Bumps the KleidiAI version to 1.10.0.
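
The SME1/SME2 distinction described above can be illustrated with a minimal sketch. It assumes a hypothetical `CpuFeatures` struct and `SelectSgemmKernel` function; these names are illustrative only and are not the actual `CPUIDInfo` interface or the MLAS dispatch code from this PR. The idea is simply that a separate SME2 flag lets the SME2 kernels be chosen only on hardware that reports SME2, while SME1-only hardware gets the SME1 kernels.

```cpp
// Hypothetical stand-in for CPU feature flags (assumption: not the real CPUIDInfo API).
struct CpuFeatures {
  bool has_sme = false;   // SME(1) reported by the platform
  bool has_sme2 = false;  // SME2 reported by the platform
};

// Hypothetical kernel families, named for illustration only.
enum class SgemmKernel {
  Sme2,      // SME2 FP32 kernels
  Sme1,      // SME1 FP32 kernels
  Fallback,  // any non-SME implementation
};

// Pick the most capable kernel family the CPU supports.
// With only a single SME flag, SME1-only hardware could be routed to SME2 kernels;
// checking the two flags separately avoids that.
inline SgemmKernel SelectSgemmKernel(const CpuFeatures& cpu) {
  if (cpu.has_sme2) {
    return SgemmKernel::Sme2;
  }
  if (cpu.has_sme) {
    return SgemmKernel::Sme1;
  }
  return SgemmKernel::Fallback;
}
```

Checking the SME2 flag first keeps the selection a simple priority order: the most specific capability wins, and hardware without either flag takes the fallback path.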

Indicative performance data
Single-threaded runs on a Mac Mini M4 across various models, using: onnxruntime_perf_test -v -e cpu -I -m times -x 1 -y 1 -r 1

Next steps
Follow-up commits will address the outstanding to-do items from the previous PR, linked below:
KleidiAI SGEMM/IGEMM/Quantized MatMul - Modular MLAS API Changes for KleidiAI #25187

Contributor Author

patryk-kaiser-ARM commented Aug 15, 2025

@microsoft-github-policy-service agree company="Arm"

jywu-msft added the KleidiAI label

jywu-msft requested review from edgchen1 and hariharans29

August 19, 2025 03:46

patryk-kaiser-ARM marked this pull request as draft

August 21, 2025 10:06

edgchen1 reviewed

onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc

patryk-kaiser-ARM force-pushed the SME1_sgemm_integration branch from 3437e9a to dfdc6e1 Compare

August 25, 2025 12:04

patryk-kaiser-ARM marked this pull request as ready for review

August 25, 2025 12:10

Member

hariharans29 commented Aug 25, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines bot commented Aug 25, 2025

Azure Pipelines successfully started running 5 pipeline(s).

patryk-kaiser-ARM force-pushed the SME1_sgemm_integration branch 2 times, most recently from 443a898 to d165482 Compare

September 1, 2025 12:03

Member

hariharans29 commented Sep 2, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines bot commented Sep 2, 2025

Azure Pipelines successfully started running 5 pipeline(s).

patryk-kaiser-ARM requested a review from edgchen1

September 3, 2025 15:51

edgchen1 reviewed

onnxruntime/core/common/cpuid_info.cc

onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp (outdated)

onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp (outdated)

onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp

patryk-kaiser-ARM force-pushed the SME1_sgemm_integration branch 2 times, most recently from 40526c4 to 9800e11 Compare

September 9, 2025 12:50

Member

hariharans29 commented Sep 9, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines bot commented Sep 9, 2025

Azure Pipelines successfully started running 5 pipeline(s).


Integrate SME1 SGEMM KleidiAI kernels

39b7e05

Signed-off-by: Patryk Kaiser <[email protected]>

patryk-kaiser-ARM force-pushed the SME1_sgemm_integration branch from 9800e11 to 39b7e05 Compare

September 10, 2025 11:13

Member

hariharans29 commented Sep 10, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines bot commented Sep 10, 2025

Azure Pipelines successfully started running 5 pipeline(s).

edgchen1 approved these changes

hariharans29 approved these changes

hariharans29 closed this

hariharans29 reopened this

Member

hariharans29 commented Sep 12, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines bot commented Sep 12, 2025

Azure Pipelines successfully started running 5 pipeline(s).

Member

hariharans29 commented Sep 12, 2025

/azp run Windows ARM64 QNN CI Pipeline

azure-pipelines bot commented Sep 12, 2025

Azure Pipelines successfully started running 1 pipeline(s).

edgchen1 merged commit ec3bf7f into microsoft:main

155 of 163 checks passed

hariharans29 added a commit that referenced this pull request


Revert "Integrate SME1 SGEMM KleidiAI kernels (#25760)"

3c7d339

This reverts commit ec3bf7f.

hariharans29 mentioned this pull request

Revert "Integrate SME1 SGEMM KleidiAI kernels" #26053

Closed
