@xiaomsft
Contributor

@xiaomsft xiaomsft commented Jul 29, 2025

Description

This PR implements the GatherBlockQuantized operator for the CUDA EP, with support for 4-bit and 8-bit data.

Motivation and Context

The GatherBlockQuantized operator is essential for expert selection in MoE models, especially when the model has been statically quantized.
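
As a rough illustration of what a gather over block-quantized data computes, here is a minimal CPU reference sketch. It is not the CUDA kernel from this PR and does not use any ONNX Runtime API. The function name `GatherDequantizeRows`, the 2-D row-major layout, the gather along axis 0, and the per-block scale and zero point along the last axis are all assumptions made for illustration. Only the 8-bit case is shown; the 4-bit case, which packs two values per byte, is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not the ONNX Runtime kernel): gathers rows of
// a 2-D block-quantized uint8 matrix and dequantizes them to float.
// Assumed layout: `data` is [rows, cols], row-major. Each run of `block_size`
// consecutive elements in a row shares one scale and one zero point, so
// `scales` and `zero_points` are [rows, cols / block_size].
// Assumes cols is a multiple of block_size and all indices are in range.
std::vector<float> GatherDequantizeRows(const std::vector<uint8_t>& data,
                                        const std::vector<float>& scales,
                                        const std::vector<uint8_t>& zero_points,
                                        const std::vector<int64_t>& indices,
                                        std::size_t cols,
                                        std::size_t block_size) {
  const std::size_t blocks_per_row = cols / block_size;
  std::vector<float> out;
  out.reserve(indices.size() * cols);
  for (int64_t idx : indices) {
    const std::size_t row = static_cast<std::size_t>(idx);
    for (std::size_t c = 0; c < cols; ++c) {
      // Locate the quantization block that covers element (row, c).
      const std::size_t block = row * blocks_per_row + c / block_size;
      const float q = static_cast<float>(data[row * cols + c]);
      const float zp = static_cast<float>(zero_points[block]);
      // Standard affine dequantization: (q - zero_point) * scale.
      out.push_back((q - zp) * scales[block]);
    }
  }
  return out;
}
```

In an MoE setting, `indices` would correspond to the selected rows, so only those rows are dequantized rather than the whole table.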

@tianleiwu
Contributor

Please update onnxruntime/test/contrib_ops/gather_block_quantized_op_test.cc to test CUDA EP when it is available.

@xiaomsft
Contributor Author

> Please update onnxruntime/test/contrib_ops/gather_block_quantized_op_test.cc to test CUDA EP when it is available.

Working on it

@xiaomsft
Contributor Author

@microsoft-github-policy-service agree [company="{Microsoft}"]

@xiaomsft
Contributor Author

@microsoft-github-policy-service agree company="Microsoft"

@xiaomsft xiaomsft force-pushed the xiaoh/gather_block_quantized_cuda branch from c8d2587 to 3e352e4 Compare July 30, 2025 05:06
@xiaomsft xiaomsft force-pushed the xiaoh/gather_block_quantized_cuda branch from 3e352e4 to bb04d4c Compare July 30, 2025 05:22
@tianleiwu
Contributor

tianleiwu commented Aug 1, 2025

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).


tianleiwu previously approved these changes Aug 1, 2025
@xiaomsft xiaomsft force-pushed the xiaoh/gather_block_quantized_cuda branch from 7be0d43 to 0c7938e Compare August 1, 2025 17:42
@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@kunal-vaishnavi kunal-vaishnavi merged commit 747b4b0 into microsoft:main Aug 1, 2025
89 of 91 checks passed
sophies927 pushed a commit that referenced this pull request Aug 2, 2025
### Description
This PR implements the GatherBlockQuantized operator for the CUDA EP,
with support for 4-bit and 8-bit data.


### Motivation and Context
The GatherBlockQuantized operator is essential for expert selection in
MoE models, especially when the model has been statically quantized.

---------

Co-authored-by: Xiaoyan Hu <[email protected]>
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
…5575)

gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request Sep 2, 2025
…5575)

tianleiwu pushed a commit that referenced this pull request Sep 4, 2025
@tianleiwu tianleiwu added the cherry-picked label and removed the release:1.23.0 label Sep 4, 2025
jywu-msft pushed a commit that referenced this pull request Sep 5, 2025
### Description
Cherry-pick the following PRs:
#25943
#25937 
#25917
#25909
#25898
#25897
#25888
#25881
#25830
#25619
#25575
#25572
#25558
#25530
#25474
#25455
#25110

Also includes two dependent PRs for qMoE CPU:
#25877
#25822

---------

Co-authored-by: xiaomsft <[email protected]>
Co-authored-by: Xiaoyan Hu <[email protected]>
Co-authored-by: Akshay Sonawane <[email protected]>
Co-authored-by: Kunal Vaishnavi <[email protected]>
Co-authored-by: Pradeep Sakhamoori <[email protected]>
Co-authored-by: mingyue <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Emmanuel <[email protected]>
Co-authored-by: Emmanuel Assumang <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: praneshgo <[email protected]>
Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: Jing Fang <[email protected]>
Co-authored-by: Ishwar Raut <[email protected]>

Labels

cherry-picked Cherry-picked for a cherrypicks branch

4 participants