Relax WeightBiasQuantization constraint for larger QDQ node group #25673
Conversation
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).
Pull Request Overview
This PR relaxes constraints in the WeightBiasQuantization transformer to enable quantization of weights in larger QDQ node groups that include type-preserving operations between the target node and QuantizeLinear nodes.
- Introduces a new function to check for valid paths to QuantizeLinear nodes that preserve type/shape without branching (a rough sketch of such a check follows this list)
- Replaces the strict single-consumer QuantizeLinear requirement with a more flexible path validation
- Fixes multiple spelling errors in function names across QNN builder components
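A minimal sketch of what such a path check could look like is shown below, using a simplified stand-in Node type rather than onnxruntime's actual Graph/Node classes; the function name HasSinglePathToQuantizeLinear and the list of type-preserving ops are illustrative assumptions, not the PR's real code.

```cpp
// Illustrative sketch only -- NOT the PR's actual implementation.
// A simplified stand-in for a graph node; the real transformer walks
// onnxruntime's Graph/Node structures instead.
#include <algorithm>
#include <string>
#include <vector>

struct Node {
  std::string op_type;                 // e.g. "Relu", "QuantizeLinear"
  std::vector<const Node*> consumers;  // nodes consuming this node's output
};

// Hypothetical set of ops assumed to preserve type/shape for this sketch.
static const std::vector<std::string> kTypePreservingOps = {"Relu", "Clip", "MaxPool"};

// Returns true if `node` reaches a QuantizeLinear through a single,
// non-branching chain of type-preserving ops.
bool HasSinglePathToQuantizeLinear(const Node& node) {
  const Node* current = &node;
  while (current->consumers.size() == 1) {  // any branching ends the search
    const Node* next = current->consumers[0];
    if (next->op_type == "QuantizeLinear") {
      return true;
    }
    if (std::find(kTypePreservingOps.begin(), kTypePreservingOps.end(),
                  next->op_type) == kTypePreservingOps.end()) {
      return false;  // a non-preserving op breaks the path
    }
    current = next;
  }
  return false;
}
```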
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/core/optimizer/qdq_transformer/weight_bias_quantization.cc | Adds new path validation logic and replaces strict QuantizeLinear consumer check |
| onnxruntime/test/optimizer/qdq_transformer_test.cc | Adds test case for Conv with ReLU pattern and updates test comments |
| onnxruntime/core/providers/qnn/builder/opbuilder/base_op_builder.h | Fixes function name spelling from "ReArranagePads" to "ReArrangePads" |
| onnxruntime/core/providers/qnn/builder/opbuilder/conv_op_builder.cc | Updates function call and removes extra whitespace |
| onnxruntime/core/providers/qnn/builder/opbuilder/pool_op_builder.cc | Updates function call to use corrected spelling |
| onnxruntime/core/providers/qnn/builder/opbuilder/pad_op_builder.cc | Updates function call to use corrected spelling |
| onnxruntime/core/providers/qnn/builder/qnn_model_wrapper.cc | Fixes spelling error in comment |
Force-pushed from f363878 to 34ed336
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).
Force-pushed from 34ed336 to 00bcce4
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).
- **Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)**
- **Add cuda graph implementation for NV TRT RTX EP (#25787)**
- **python GPU IO Bindings for NVIDIA (#25776)**
- **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)**
- **Fix a long standing bug on file memory mapping on windows. (#25833)**
- **Add API for precompiled model compatibility check using just the compat info (#25841)**
- **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for mobile build (#25849)**
- **Add default constructor to Ort::Status. (#25860)**
- #25871
- #25878
- #25884
- #25886
- #25866
The change has been added to the release branch.
- pick microsoft#25584
- pick microsoft#25635
- pick microsoft#25673
- pick microsoft#25702
- pick microsoft#25738
Description
Relax WeightBiasQuantization constraint for larger QDQ node group
Motivation and Context
The transformer `WeightBiasQuantization` quantizes float weights in the `Q -> DQ -> Conv/ConvTranspose/Gemm's Weights -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is a problem because it skips quantization for many common patterns, such as an unfused activation following `Conv` (`DQ -> Conv -> ReLU -> Q`). Checking for the ending Q is actually unnecessary (the fold can happen anyway without changing model semantics), but to minimize the behavior change, this PR simply extends the pattern to accept a single-path (no branching), type-preserving chain leading to `Q`, enabling more quantization support. A toy illustration of the relaxed check is sketched below.
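For illustration only, here is a self-contained toy example of why a strict direct-consumer check skips the `DQ -> Conv -> ReLU -> Q` case while a relaxed single-path walk accepts it; the Node type, function names, and the list of type-preserving ops are assumptions made for this sketch, not onnxruntime's actual code.

```cpp
// Toy demonstration only -- simplified stand-ins, not onnxruntime's Graph API.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Node {
  std::string op_type;
  std::vector<const Node*> consumers;
};

// Old rule (sketch): the Conv/ConvTranspose/Gemm output must feed QuantizeLinear directly.
bool FeedsQuantizeLinearDirectly(const Node& target) {
  return target.consumers.size() == 1 &&
         target.consumers[0]->op_type == "QuantizeLinear";
}

// Relaxed rule (sketch): a non-branching chain of type-preserving ops may sit in between.
bool ReachesQuantizeLinearOnSinglePath(const Node& target) {
  static const std::vector<std::string> kPreserving = {"Relu", "Clip", "MaxPool"};
  const Node* cur = &target;
  while (cur->consumers.size() == 1) {
    const Node* next = cur->consumers[0];
    if (next->op_type == "QuantizeLinear") return true;
    if (std::find(kPreserving.begin(), kPreserving.end(), next->op_type) == kPreserving.end())
      return false;  // non-preserving op breaks the path
    cur = next;
  }
  return false;  // branching or end of graph without reaching Q
}

int main() {
  // Build the Conv -> ReLU -> Q consumer chain from the motivation section.
  Node q{"QuantizeLinear", {}};
  Node relu{"Relu", {&q}};
  Node conv{"Conv", {&relu}};

  std::cout << "direct Q consumer check: " << FeedsQuantizeLinearDirectly(conv) << "\n";       // 0: skipped before
  std::cout << "single-path to Q check:  " << ReachesQuantizeLinearOnSinglePath(conv) << "\n"; // 1: now eligible
  return 0;
}
```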