Integrate SME1 SGEMM KleidiAI kernels #25760
@microsoft-github-policy-service agree company="Arm"
3437e9a to dfdc6e1
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
443a898 to d165482
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
40526c4 to 9800e11
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
Signed-off-by: Patryk Kaiser <[email protected]>
9800e11 to 39b7e05
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
/azp run Windows ARM64 QNN CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
This reverts commit ec3bf7f.
Key changes
This PR integrates KleidiAI SME1 FP32 kernels into the existing kleidiai_sgemm.cpp implementation.
Adds an SME2 flag in onnxruntime/core/common/cpuid_info.h and onnxruntime/core/common/cpuid_info.cc.
The previously integrated SME2 kernels were gated on the SME(1) check. This change correctly distinguishes when the SME1 kernels and when the SME2 kernels should be used.
Bumps the KleidiAI version to 1.10.0.
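To illustrate the SME1/SME2 distinction described above, here is a minimal sketch of the dispatch idea. It is not the code in this PR: `CpuFeatures`, `SgemmKernel`, and `SelectSgemmKernel` are hypothetical names chosen for illustration, and the actual flags in cpuid_info.h and the kernel selection in kleidiai_sgemm.cpp may be structured differently.

```cpp
// Hypothetical feature flags for illustration only; the real detection
// lives in cpuid_info.h/.cc and may use different names.
struct CpuFeatures {
  bool has_sme;   // SME(1) is available
  bool has_sme2;  // SME2 is available
};

// Hypothetical kernel families the SGEMM path could choose between.
enum class SgemmKernel { kSme2, kSme1, kFallback };

// Prefer SME2 kernels when SME2 is reported, fall back to SME1 kernels when
// only SME(1) is reported, and otherwise use the non-SME path.
constexpr SgemmKernel SelectSgemmKernel(CpuFeatures f) {
  return f.has_sme2 ? SgemmKernel::kSme2
         : f.has_sme ? SgemmKernel::kSme1
                     : SgemmKernel::kFallback;
}

// An SME1-only CPU must not be routed to SME2 kernels.
static_assert(SelectSgemmKernel(CpuFeatures{true, false}) == SgemmKernel::kSme1,
              "SME1-only hardware selects the SME1 kernels");
```

Checking the SME2 flag first keeps the most capable kernels preferred, while the separate SME(1) check is what prevents SME2 kernels from being selected on hardware that reports only SME1.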
Indicative performance data

Single-threaded runs on a Mac Mini M4 across various models, using: onnxruntime_perf_test -v -e cpu -I -m times -x 1 -y 1 -r 1
Next steps
Upcoming commits will address the outstanding to-do items from the previous PR, linked below:
KleidiAI SGEMM/IGEMM/Quantized MatMul - Modular MLAS API Changes for KleidiAI #25187