Relax WeightBiasQuantization constraint for larger QDQ node group #25673
Conversation
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).
Pull Request Overview
This PR relaxes constraints in the WeightBiasQuantization transformer to enable quantization of weights in larger QDQ node groups that include type-preserving operations between the target node and QuantizeLinear nodes.
- Introduces a new function to check for valid paths to QuantizeLinear nodes that preserve type/shape without branching (a rough sketch of such a check follows this list)
- Replaces the strict single-consumer QuantizeLinear requirement with a more flexible path validation
- Fixes multiple spelling errors in function names across QNN builder components
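A minimal sketch of what such a path check could look like is shown below, using a simplified stand-in Node type rather than onnxruntime's actual Graph/Node classes; the function name HasSinglePathToQuantizeLinear and the list of type-preserving ops are illustrative assumptions, not the PR's real code.

```cpp
// Illustrative sketch only -- NOT the PR's actual implementation.
// A simplified stand-in for a graph node; the real transformer walks
// onnxruntime's Graph/Node structures instead.
#include <algorithm>
#include <string>
#include <vector>

struct Node {
  std::string op_type;                 // e.g. "Relu", "QuantizeLinear"
  std::vector<const Node*> consumers;  // nodes consuming this node's output
};

// Hypothetical set of ops assumed to preserve type/shape for this sketch.
static const std::vector<std::string> kTypePreservingOps = {"Relu", "Clip", "MaxPool"};

// Returns true if `node` reaches a QuantizeLinear through a single,
// non-branching chain of type-preserving ops.
bool HasSinglePathToQuantizeLinear(const Node& node) {
  const Node* current = &node;
  while (current->consumers.size() == 1) {  // any branching ends the search
    const Node* next = current->consumers[0];
    if (next->op_type == "QuantizeLinear") {
      return true;
    }
    if (std::find(kTypePreservingOps.begin(), kTypePreservingOps.end(),
                  next->op_type) == kTypePreservingOps.end()) {
      return false;  // a non-preserving op breaks the path
    }
    current = next;
  }
  return false;
}
```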
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/core/optimizer/qdq_transformer/weight_bias_quantization.cc | Adds new path validation logic and replaces strict QuantizeLinear consumer check |
| onnxruntime/test/optimizer/qdq_transformer_test.cc | Adds test case for Conv with ReLU pattern and updates test comments |
| onnxruntime/core/providers/qnn/builder/opbuilder/base_op_builder.h | Fixes function name spelling from "ReArranagePads" to "ReArrangePads" |
| onnxruntime/core/providers/qnn/builder/opbuilder/conv_op_builder.cc | Updates function call and removes extra whitespace |
| onnxruntime/core/providers/qnn/builder/opbuilder/pool_op_builder.cc | Updates function call to use corrected spelling |
| onnxruntime/core/providers/qnn/builder/opbuilder/pad_op_builder.cc | Updates function call to use corrected spelling |
| onnxruntime/core/providers/qnn/builder/qnn_model_wrapper.cc | Fixes spelling error in comment |
Force-pushed from f363878 to 34ed336
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).
Force-pushed from 34ed336 to 00bcce4
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).
- **Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)**
- **Add cuda graph implementation for NV TRT RTX EP (#25787)**
- **python GPU IO Bindings for NVIDIA (#25776)**
- **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)**
- **Fix a long standing bug on file memory mapping on windows. (#25833)**
- **Add API for precompiled model compatibility check using just the compat info (#25841)**
- **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for mobile build (#25849)**
- **Add default constructor to Ort::Status. (#25860)**
- #25871
- #25878
- #25884
- #25886
- #25866
The change has been added to the release branch.
- pick microsoft#25584
- pick microsoft#25635
- pick microsoft#25673
- pick microsoft#25702
- pick microsoft#25738
Description
Relax WeightBiasQuantization constraint for larger QDQ node group
Motivation and Context
The transformer `WeightBiasQuantization` quantizes float weights in the `Q -> DQ -> Conv/ConvTranspose/Gemm's Weights -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is a problem because it skips quantization for many common patterns, such as an unfused activation following `Conv` (`DQ -> Conv -> ReLU -> Q`). Checking for the ending Q is actually unnecessary (the fold can happen anyway without changing model semantics), but to minimize the behavior change, this PR simply extends the pattern to accept a single-path (no branching), type-preserving chain leading to `Q`, enabling more quantization support. A toy illustration of the relaxed check is sketched below.
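For illustration only, here is a self-contained toy example of why a strict direct-consumer check skips the `DQ -> Conv -> ReLU -> Q` case while a relaxed single-path walk accepts it; the Node type, function names, and the list of type-preserving ops are assumptions made for this sketch, not onnxruntime's actual code.

```cpp
// Toy demonstration only -- simplified stand-ins, not onnxruntime's Graph API.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Node {
  std::string op_type;
  std::vector<const Node*> consumers;
};

// Old rule (sketch): the Conv/ConvTranspose/Gemm output must feed QuantizeLinear directly.
bool FeedsQuantizeLinearDirectly(const Node& target) {
  return target.consumers.size() == 1 &&
         target.consumers[0]->op_type == "QuantizeLinear";
}

// Relaxed rule (sketch): a non-branching chain of type-preserving ops may sit in between.
bool ReachesQuantizeLinearOnSinglePath(const Node& target) {
  static const std::vector<std::string> kPreserving = {"Relu", "Clip", "MaxPool"};
  const Node* cur = &target;
  while (cur->consumers.size() == 1) {
    const Node* next = cur->consumers[0];
    if (next->op_type == "QuantizeLinear") return true;
    if (std::find(kPreserving.begin(), kPreserving.end(), next->op_type) == kPreserving.end())
      return false;  // non-preserving op breaks the path
    cur = next;
  }
  return false;  // branching or end of graph without reaching Q
}

int main() {
  // Build the Conv -> ReLU -> Q consumer chain from the motivation section.
  Node q{"QuantizeLinear", {}};
  Node relu{"Relu", {&q}};
  Node conv{"Conv", {&relu}};

  std::cout << "direct Q consumer check: " << FeedsQuantizeLinearDirectly(conv) << "\n";       // 0: skipped before
  std::cout << "single-path to Q check:  " << ReachesQuantizeLinearOnSinglePath(conv) << "\n"; // 1: now eligible
  return 0;
}
```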