
Conversation

@qti-yuduo
Contributor

Description

Relax WeightBiasQuantization constraint for larger QDQ node group

Motivation and Context

The transformer `WeightBiasQuantization` quantizes float weights in the `Q -> DQ -> Conv/ConvTranspose/Gemm's weights -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is an issue because it skips quantization for many common patterns, such as a `Conv` followed by an unfused activation (`DQ -> Conv -> ReLU -> Q`).

It's actually unnecessary to check for the ending `Q` here at all (the fold can happen anyway without changing model semantics). However, to minimize the behavior change, this PR simply extends the pattern to include a single-path (no branching), type-preserving chain leading to `Q`, enabling more quantization support.
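A minimal sketch of the relaxed check as described above, assuming it walks consumers one hop at a time. The `Node` struct and function names here are hypothetical stand-ins that only mimic the onnxruntime Graph APIs actually used in weight_bias_quantization.cc:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node; the real transformer walks
// onnxruntime::Node objects, which this illustrative struct only mimics.
struct Node {
  std::string op_type;
  std::vector<const Node*> consumers;     // nodes consuming this node's output
  bool preserves_type_and_shape = false;  // e.g. Relu would qualify
};

// Old, strict condition (quoted in the description): the target node must
// feed exactly one consumer, and that consumer must be QuantizeLinear.
bool FeedsSingleQuantizeLinear(const Node& node) {
  return node.consumers.size() == 1 &&
         node.consumers[0]->op_type == "QuantizeLinear";
}

// Relaxed condition sketched from this PR's description: follow a single,
// unbranched chain of type/shape-preserving ops until a QuantizeLinear is
// reached; any branch or non-preserving op disqualifies the path.
bool LeadsToQuantizeLinearOnSinglePath(const Node& node) {
  const Node* current = &node;
  while (current->consumers.size() == 1) {
    const Node* next = current->consumers[0];
    if (next->op_type == "QuantizeLinear") return true;
    if (!next->preserves_type_and_shape) return false;
    current = next;
  }
  return false;  // branch or graph output reached before any Q node
}

int main() {
  // Model the DQ -> Conv -> ReLU -> Q pattern that the old check rejected.
  Node q{"QuantizeLinear", {}, false};
  Node relu{"Relu", {&q}, true};
  Node conv{"Conv", {&relu}, false};

  std::printf("strict check:  %d\n", FeedsSingleQuantizeLinear(conv));          // 0: skipped before
  std::printf("relaxed check: %d\n", LeadsToQuantizeLinearOnSinglePath(conv));  // 1: now eligible
  return 0;
}
```

Under the old strict check, the `Conv -> ReLU -> Q` chain fails the single-consumer test and weight quantization is skipped; the relaxed walk accepts it.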

@HectorSVC
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@HectorSVC HectorSVC added the ep:QNN issues related to QNN execution provider label Aug 11, 2025
@HectorSVC HectorSVC requested a review from Copilot August 11, 2025 17:01
Contributor

Copilot AI left a comment


Pull Request Overview

This PR relaxes constraints in the WeightBiasQuantization transformer to enable quantization of weights in larger QDQ node groups that include type-preserving operations between the target node and QuantizeLinear nodes.

  • Introduces a new function to check for valid paths to QuantizeLinear nodes that preserve type/shape without branching
  • Replaces the strict single-consumer QuantizeLinear requirement with a more flexible path validation
  • Fixes multiple spelling errors in function names across QNN builder components

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file

  • onnxruntime/core/optimizer/qdq_transformer/weight_bias_quantization.cc: adds the new path-validation logic and replaces the strict QuantizeLinear consumer check
  • onnxruntime/test/optimizer/qdq_transformer_test.cc: adds a test case for the Conv-with-ReLU pattern and updates test comments
  • onnxruntime/core/providers/qnn/builder/opbuilder/base_op_builder.h: fixes the function name spelling from "ReArranagePads" to "ReArrangePads"
  • onnxruntime/core/providers/qnn/builder/opbuilder/conv_op_builder.cc: updates the function call and removes extra whitespace
  • onnxruntime/core/providers/qnn/builder/opbuilder/pool_op_builder.cc: updates the function call to use the corrected spelling
  • onnxruntime/core/providers/qnn/builder/opbuilder/pad_op_builder.cc: updates the function call to use the corrected spelling
  • onnxruntime/core/providers/qnn/builder/qnn_model_wrapper.cc: fixes a spelling error in a comment

@HectorSVC
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@HectorSVC
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@HectorSVC HectorSVC merged commit 0ca313b into microsoft:main Aug 14, 2025
86 of 87 checks passed
snnn pushed a commit that referenced this pull request Aug 28, 2025
snnn added a commit that referenced this pull request Aug 29, 2025
- **Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)**
- **Add cuda graph implementation for NV TRT RTX EP (#25787)**
- **python GPU IO Bindings for NVIDIA (#25776)**
- **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)**
- **Fix a long standing bug on file memory mapping on windows. (#25833)**
- **Add API for precompiled model compatibility check using just the compat info (#25841)**
- **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for mobile build (#25849)**
- **Add default constructor to Ort::Status. (#25860)**
- #25871
- #25878
- #25884
- #25886
- #25866
@snnn
Member

snnn commented Aug 30, 2025

The change has been added to the release branch.

gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request Sep 2, 2025
qti-yuduo pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Sep 24, 2025
@qti-yuduo qti-yuduo deleted the dev/yuduo/fold-q branch September 24, 2025 18:46
