quic-tirupath and others added 12 commits August 20, 2025 21:39
### Description
 - Add unit tests for LPBQ fusions on MatMul and Gemm nodes

### Motivation and Context
- This commit adds unit tests to avoid future regressions in LPBQ fusions
### Description
- Pre-allocate memory for the HTP context params list during context creation when VTCM backup buffer sharing is enabled. This avoids memory issues caused by vector resizing/reallocation.
- Handle the case where new binary contexts need to be processed
### Description
Updates MSFT Azure pipelines to QAIRT 2.37.0

### Motivation and Context
Regular uplevel
### Description

Disable two tests that were broken on X Elite by upgrading to QNN 2.37.0
### Description
The clearing of shared_allocators_ invalidates all entries in
shared_ort_allocators_.

Remove the unused shared_arena_allocators_. It became unnecessary once EPs
were given an example implementation of an OrtAllocator-based
stream-aware arena that they can use directly.


### Motivation and Context
Fix access violation (swallowed as it happens during shutdown) in dtor.
### Description
Fix GatherBlockQuantized shape inference test



### Motivation and Context
In the GatherBlockQuantized op contrib_defs, we have this shape inference check:
```
        for (int i = 0; i < r; ++i) {
          if (!data_shape.dim(i).has_dim_value() ||
              !scales_shape.dim(i).has_dim_value() ||
              (i == quantize_axis && (data_shape.dim(i).dim_value() * components + block_size - 1) / block_size != scales_shape.dim(i).dim_value()) ||
              (i != quantize_axis && data_shape.dim(i).dim_value() != scales_shape.dim(i).dim_value())) {
            fail_shape_inference("data shape and scales shape do not match");
          }
        }
```
This code was introduced last year. However, when I try to share weights
for the phi-4-mini-instruct model
<img width="233" height="494" alt="image"
src="https://github.com/user-attachments/assets/9c220543-0b81-4867-bcd1-1b7aa49e20cd"
/>
I need to feed a Reshape operator into GatherBlockQuantized. The shape
inference of Reshape does not come from the initializer directly, but from a
Concat which needs some constant folding. Therefore, at the first
sweep of shape inference, `data_shape.dim(i).has_dim_value()` is
`False`, which fails shape inference, and the model cannot work.
Therefore, the shape check should only run when
`data_shape.dim(i).has_dim_value()` is `True`, and likewise for `scales_shape`.
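The corrected behavior can be sketched as a standalone function (a hypothetical model of the fix: dimensions are `std::optional` values standing in for ONNX's `TensorShapeProto` dims; the real code lives in contrib_defs):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical standalone model of the fix: a dimension without a concrete
// value (no has_dim_value()) is skipped instead of failing shape inference.
bool ShapesCompatible(const std::vector<std::optional<int64_t>>& data_shape,
                      const std::vector<std::optional<int64_t>>& scales_shape,
                      int quantize_axis, int64_t block_size, int64_t components) {
  for (size_t i = 0; i < data_shape.size(); ++i) {
    if (!data_shape[i].has_value() || !scales_shape[i].has_value()) {
      continue;  // unknown dim: defer the check instead of failing
    }
    const int64_t d = *data_shape[i], s = *scales_shape[i];
    if (static_cast<int>(i) == quantize_axis) {
      // same rounded-up block count check as the original code
      if ((d * components + block_size - 1) / block_size != s) return false;
    } else if (d != s) {
      return false;
    }
  }
  return true;
}
```

With an unknown dim (e.g. before constant folding of the Concat resolves the Reshape target), the loop now defers the check rather than failing inference on the first sweep.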
### Description
- Change the output data type of the last node from int64/uint64 to int32/uint32, and add a Cast op to convert the output tensor back from int32/uint32 to int64/uint64.

### Motivation and Context
- Currently we only add the Cast op (int32->int64) when the input name contains "_cast_int32", but the input name may not contain this string because it can follow the data type of the previous node. In that case, the input data type of the op is int32 while the output data type is int64, causing an error.

- Unit test
- https://github.com/microsoft/onnxruntime/blob/4b1838b29608f5a19c0997971fd83bee6732ee56/onnxruntime/test/providers/qnn/reshape_expand_op_test.cc#L242
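The fix's core idea can be sketched as follows (a hypothetical simplification; the enum and function names are illustrative, not ORT's actual types): decide whether a Cast back to 64-bit is needed from the actual data types rather than from a name suffix.

```cpp
#include <cassert>
#include <string>

// Illustrative dtype enum (not ONNX's actual TensorProto_DataType values).
enum class DType { kInt32, kUInt32, kInt64, kUInt64 };

// Before: the Cast was inserted only when the tensor name carried a
// "_cast_int32" marker, which is fragile when the name follows the producer.
bool NeedsCastByName(const std::string& input_name) {
  return input_name.find("_cast_int32") != std::string::npos;
}

// After: look at the types themselves. If the node produces 32-bit data but
// the graph expects 64-bit output, a Cast is required regardless of names.
bool NeedsCastByType(DType produced, DType expected) {
  return (produced == DType::kInt32 && expected == DType::kInt64) ||
         (produced == DType::kUInt32 && expected == DType::kUInt64);
}
```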
### Description
See the title


### Motivation and Context
Let traditional (non-plug-in) EPs access OrtValue initializers.

Re: #25747
### Description
Update Qnn default version to 2.37.1.250807
### Description
If an option appears multiple times: unlike `getopt`, which simply returns each occurrence again in the parsing loop, `Abseil` processes them in order and the last one wins (overwriting earlier values).

This PR fixes the bug for `-f` (free dimension override by name) and `-F` (free dimension override by denotation).
see #25714
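The required behavior can be sketched outside of Abseil (a hypothetical minimal parser; the `name:value` syntax follows the description above): each `-f` occurrence must be accumulated into a map, not stored in a single value that the next occurrence overwrites.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: collect every "-f name:value" occurrence into a map.
// A plain last-one-wins string flag would keep only the final occurrence;
// accumulating preserves overrides for distinct dimension names.
std::map<std::string, int> CollectOverrides(
    const std::vector<std::string>& occurrences) {
  std::map<std::string, int> overrides;
  for (const std::string& occ : occurrences) {
    const size_t colon = occ.find(':');
    if (colon == std::string::npos) continue;  // malformed entry: skip
    overrides[occ.substr(0, colon)] = std::stoi(occ.substr(colon + 1));
  }
  return overrides;
}
```

Note that repeating the same dimension name still follows last-one-wins for that name, which matches the ordered-processing semantics described above.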
### Description

Add some device discovery support for non-Windows platforms.

### Motivation and Context

More device discovery support.
### Description

Add a new API `Graph_GetModelMetadata`

### Motivation and Context
The VitisAI EP converts ONNX IR to another IR suitable for AMD AI
compilers.
The metadata in an OrtModel contains much important information produced
by other tools, e.g. Olive.

This API can potentially be used by many other execution providers that need
to access the same information.
jywu-msft
jywu-msft previously approved these changes Aug 21, 2025
HectorSVC
HectorSVC previously approved these changes Aug 21, 2025
chilo-ms
chilo-ms previously approved these changes Aug 21, 2025
edgchen1
edgchen1 previously approved these changes Aug 21, 2025
yuslepukhin
yuslepukhin previously approved these changes Aug 21, 2025
@yuslepukhin yuslepukhin left a comment


:shipit:

minfhong-qti and others added 7 commits August 22, 2025 16:08
### Description
In PoolOpBuilder,
- Revise the check to use ORT macros.
- Fix invoking the function for 5D cases.

### Motivation and Context
Refer to #25778.
PoolOpBuilder incorrectly invokes a function that calculates a 4D shape on 5D input, although the function originally expects 3D cases only. Moreover, the check used `assert` to validate the shape, which works in neither Release nor RelWithDebInfo builds.
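The difference between the two validation styles can be sketched with a stand-in status type (hypothetical names; the real code uses ORT macros such as `ORT_RETURN_IF_NOT` and onnxruntime's `Status`): an `assert` compiles away under NDEBUG, while a status-returning check survives in every build configuration.

```cpp
#include <cstddef>
#include <string>

// Stand-in for onnxruntime's Status type.
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical macro in the spirit of ORT_RETURN_IF_NOT: unlike assert(),
// the check also runs in Release and RelWithDebInfo builds.
#define RETURN_IF_NOT(cond, msg)            \
  do {                                      \
    if (!(cond)) return Status{false, msg}; \
  } while (0)

// Illustrative shape-helper guard: reject non-3D input with a real error
// instead of an assert that disappears in optimized builds.
Status ValidatePoolInputRank(size_t rank) {
  RETURN_IF_NOT(rank == 3, "expected 3D input for this shape helper");
  return Status{true, ""};
}
```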
* Implements `GetEPContextNodes()`
* Enables usage of `AddExternalInitializersFromFilesInMemory` for models
that have to be communicated as byte stream but are larger than 2GB
* Add EP context unit tests for files, byte streams, and both embed modes

NOTE: For large models > 2GB, `embed_mode=0` must be used.
`embed_mode=1` fails due to protobuf limitations

---------

Co-authored-by: Maximilian Müller <[email protected]>

This reconfiguration is done so that the arena does NOT allocate tensors with
an exact matching size. With that strategy a tensor always triggers a new
allocation in the arena and never reuses memory, since the chunk size has to
match exactly.
This became a big problem with ORT GenAI, since the arena grew constantly
when prompting with different prompt lengths, and no arena shrinkage was
triggered to return older tensors. @skottmckay I am happy to be educated
on a better usage of the allocators.

Issues with this:
Since the arena is no longer used for workspace allocations (it uses
reserve), it will likely not be possible in the future to allocate on a
stream and immediately free the memory after an enqueue call. That could
have enabled workspace sharing in a multi-model pipeline very nicely.
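The reuse problem can be illustrated with a toy free-list (a sketch, not ORT's actual arena): with exact-size matching, a freed chunk is reused only for an identical request, so varying prompt lengths keep growing the pool; rounding requests up (e.g. to a power of two) lets slightly different sizes share chunks.

```cpp
#include <cstddef>
#include <vector>

// Toy arena sketch (not ORT's arena): a free chunk is reused only when the
// rounded request size matches the chunk size exactly.
struct ToyArena {
  bool round_to_pow2;  // false models the "exact matching size" strategy
  std::vector<std::size_t> free_chunks;
  std::size_t total_allocated = 0;

  std::size_t Round(std::size_t n) const {
    if (!round_to_pow2) return n;
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
  }

  std::size_t Alloc(std::size_t n) {
    const std::size_t need = Round(n);
    for (std::size_t i = 0; i < free_chunks.size(); ++i) {
      if (free_chunks[i] == need) {  // reuse an existing chunk
        free_chunks.erase(free_chunks.begin() + i);
        return need;
      }
    }
    total_allocated += need;  // no match: grow the arena
    return need;
  }

  void Free(std::size_t chunk) { free_chunks.push_back(chunk); }
};
```

With exact matching, allocating 1000 bytes, freeing, then allocating 1001 grows the arena twice; with power-of-two rounding both requests map to 1024 and the chunk is reused, which is the behavior varying prompt lengths need.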

@chilo-ms can you help merge this.
### Description
This PR provides C++ interfaces for the following:

Env
====
CopyTensors()

CreateSharedAllocator
GetSharedAllocator
ReleaseSharedAllocator
CreateAndRegisterAllocatorV2

RegisterAllocator
UnregisterAllocator

EpDevice
======
EpDevice_MemoryInfo
CreateSyncStreamForEpDevice

MemoryInfo
========
CreateMemoryInfo_V2
MemoryInfoGetName 
MemoryInfoGetId 
MemoryInfoGetMemType
MemoryInfoGetType
MemoryInfoGetDeviceMemType
MemoryInfoGetVendorId

Session
==========
SessionGetInputName
SessionGetOutputName

SessionGetMemoryInfoForInputs
SessionGetMemoryInfoForOutputs
SessionGetEpDeviceForInputs

SyncStream
===========
SyncStream_GetHandle
ReleaseSyncStream

OrtArenaCfg
===========
CreateArenaCfgV2

TRT
===
CreateTensorRTProviderOptions and V2
UpdateTensorRTProviderOptions

SessionOptions
==============
OrtSessionOptionsAppendExecutionProvider_CPU

Prepacked container
=============

CUDA Options V2
===========
OrtCUDAProviderOptionsV2
CreateCUDAProviderOptions

GetCUDAProviderOptionsByName
UpdateCUDAProviderOptionsWithValue
UpdateCUDAProviderOptions
GetCUDAProviderOptionsAsString

### Motivation and Context
Provide a way to write exception safe code.
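The exception-safety motivation can be sketched with the generic RAII pattern such wrappers follow (hypothetical C-style API names in the shape of ORT's Create*/Release* pairs; the real wrappers live in the C++ API headers): the C interface returns raw handles that need a matching Release call, while the C++ wrapper releases in its destructor even when an exception unwinds the stack.

```cpp
#include <memory>

// Hypothetical C-style API modeled on ORT's Create*/Release* pairs.
struct FakeStream { int id; };
static int g_live_streams = 0;

FakeStream* CreateSyncStream(int id) {
  ++g_live_streams;
  return new FakeStream{id};
}
void ReleaseSyncStream(FakeStream* s) {
  --g_live_streams;
  delete s;
}

// C++-style RAII wrapper: the release runs on every exit path, including
// exceptions, so calling code needs no manual try/catch cleanup.
using SyncStreamPtr = std::unique_ptr<FakeStream, decltype(&ReleaseSyncStream)>;

SyncStreamPtr MakeSyncStream(int id) {
  return SyncStreamPtr(CreateSyncStream(id), &ReleaseSyncStream);
}
```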
…y info (#25749)

### Description
This pull request introduces a new mechanism for validating compiled
model compatibility with execution providers (EPs) in ONNX Runtime. It
adds infrastructure for EPs to generate and store compatibility
information in model metadata, and for the runtime to enforce
compatibility checks during session initialization.

### Motivation and Context
The APIs proposed in this PR address two requirements:

1. Apps that have an already pre-compiled model on device need a way to
determine if the pre-compiled model is still valid (given the EPs /
drivers / etc. on the system).
2. Apps may have many different pre-compiled versions of a model stored
on a remote server, and want to figure out which of those models they
should download for the device where they are running.
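Requirement 2 can be sketched as a selection loop (hypothetical types and names; the actual ORT API surface for the compatibility check is not reproduced here): each precompiled model carries the compatibility string its EP wrote at compile time, and the app picks the first one the current device can run.

```cpp
#include <string>
#include <vector>

// Hypothetical record for a precompiled model stored on a remote server,
// with the compatibility info its EP generated at compile time.
struct PrecompiledModel {
  std::string url;
  std::string ep_compat_info;
};

// Stand-in for the runtime-side check an EP would implement: here a plain
// string comparison; a real EP would inspect driver/hardware versions.
bool IsCompatible(const std::string& compat_info, const std::string& device_info) {
  return compat_info == device_info;
}

// Pick the first model the current device can run (empty url if none).
std::string SelectModelToDownload(const std::vector<PrecompiledModel>& models,
                                  const std::string& device_info) {
  for (const auto& m : models) {
    if (IsCompatible(m.ep_compat_info, device_info)) return m.url;
  }
  return "";
}
```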

### Testing
Validated that the new suite of tests passes cleanly.
Created a private build of this ORT and the AMD Vitis EP. I stepped
through the core logic (the EP doesn't have this support wired up yet,
so there is no compatibility info written out) and, for regression
purposes, confirmed I could compile and run inferences through ResNet.

---------

Co-authored-by: Aditya Rastogi <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description

Disable cpuinfo for ARM64EC builds. There's an error when linking to
cpuinfo built for ARM64EC when using `--use_vcpkg`.

This issue was exposed by a recent change (#25228) but cpuinfo was
actually not being used before for ARM64EC. The macros here don't
properly account for ARM64EC:

https://github.com/microsoft/onnxruntime/blob/e6d3e085cb0bb96da7c3458b97316ecca234b37a/onnxruntime/core/common/cpuid_arch_definition.h#L8-L14
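The gap can be sketched as a pure function so the logic is checkable on any host (an illustration, not the linked header's actual code): under MSVC, an ARM64EC build defines `_M_ARM64EC` (and `_M_X64` for x64 compatibility) but not `_M_ARM64`, so a check testing only `_M_ARM64` misclassifies ARM64EC.

```cpp
#include <string>

// Hypothetical pure-function version of the macro check. The booleans mirror
// which MSVC macros are defined: an ARM64EC build defines _M_ARM64EC but NOT
// _M_ARM64, so the second branch covers the case the original macros missed.
std::string ClassifyArch(bool m_arm64_defined, bool m_arm64ec_defined) {
  if (m_arm64_defined) return "arm64";
  if (m_arm64ec_defined) return "arm64ec";  // previously fell through to "other"
  return "other";
}
```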

### Motivation and Context

Fix a packaging pipeline failure. Revert to the old behavior of not
calling cpuinfo from the CPUIDInfo ctor for ARM64EC.

This PR is just a workaround. The cpuinfo link issue needs more
investigation.
@adrianlizarraga adrianlizarraga changed the title ORT 1.23.0 cherry-pick prs 25592 - 25768 ORT 1.23.0 cherry-pick prs 25592 - 25831 Aug 22, 2025
jywu-msft
jywu-msft previously approved these changes Aug 23, 2025
…ntime version 18.5. (#25844)

### Description

Update mac.yml iphone_simulator job - use Xcode 16.4 and simulator
runtime version 18.5.

Changes from these PRs:
#25752
#25794

### Motivation and Context

Fix CI build failure.
@adrianlizarraga adrianlizarraga merged commit ad45432 into rel-1.23.0 Aug 25, 2025
80 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/1.23.0/cherrypick-08202025 branch August 25, 2025 23:45
@snnn snnn mentioned this pull request Sep 16, 2025