ORT 1.23.0 cherry-pick prs 25592 - 25831 #25805
Merged — adrianlizarraga merged 20 commits into rel-1.23.0 from adrianl/1.23.0/cherrypick-08202025 on Aug 25, 2025
Conversation
### Description
- Add unit tests for LPBQ fusions on MatMul and Gemm nodes

### Motivation and Context
- Adds unit tests for LPBQ fusions to avoid future regressions
### Description
- Pre-allocate memory for the HTP context params list during context creation when VTCM backup buffer sharing is enabled. This avoids memory issues caused by vector resizing/re-allocation.
- Handle the case where new binary contexts need to be processed.
### Description
Update MSFT Azure pipelines to QAIRT 2.37.0.

### Motivation and Context
Regular uplevel.
### Description
Disable two tests that were broken on X Elite by the upgrade to QNN 2.37.0.
### Description
The clearing of shared_allocators_ invalidates all entries in shared_ort_allocators_. Remove the unused shared_arena_allocators_, which became unnecessary once EPs were given an example implementation of an OrtAllocator-based stream-aware arena that they can use directly.

### Motivation and Context
Fix an access violation in the dtor (swallowed because it happens during shutdown).
### Description
Fix GatherBlockQuantized shape inference test
### Motivation and Context
In the GatherBlockQuantized op contrib_defs, the shape inference contains this check:
```cpp
for (int i = 0; i < r; ++i) {
  if (!data_shape.dim(i).has_dim_value() ||
      !scales_shape.dim(i).has_dim_value() ||
      (i == quantize_axis && (data_shape.dim(i).dim_value() * components + block_size - 1) / block_size != scales_shape.dim(i).dim_value()) ||
      (i != quantize_axis && data_shape.dim(i).dim_value() != scales_shape.dim(i).dim_value())) {
    fail_shape_inference("data shape and scales shape do not match");
  }
}
```
This code was introduced last year. However, when I tried to share weights for the phi-4-mini-instruct model (a screenshot of the model graph was attached here), I needed to feed a Reshape operator into GatherBlockQuantized. The shape inference of Reshape does not come from the initializer directly, but from a Concat that needs some constant folding. Therefore, on the first sweep of shape inference, `data_shape.dim(i).has_dim_value()` is `False`, which fails shape inference and prevents the model from working. The check therefore needs to run only when `data_shape.dim(i).has_dim_value()` is `True`, and likewise for `scales_shape`.
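The corrected behavior can be sketched with a minimal stand-in for the shape types (hypothetical `Dim` struct and `ScalesShapeMatches` helper; the real code operates on ONNX `TensorShapeProto` dimensions):

```cpp
#include <cstdint>
#include <vector>

// Minimal mock of a tensor dimension that may not have a concrete value yet
// (hypothetical stand-in for ONNX's TensorShapeProto::Dimension).
struct Dim {
  bool has_value = false;
  int64_t value = 0;
};

// Sketch of the corrected check: a dimension pair is only compared when BOTH
// sides have concrete values, so the first shape-inference sweep with
// unresolved (symbolic) dims no longer fails spuriously.
bool ScalesShapeMatches(const std::vector<Dim>& data_shape,
                        const std::vector<Dim>& scales_shape,
                        int quantize_axis, int64_t block_size,
                        int64_t components) {
  const int r = static_cast<int>(data_shape.size());
  for (int i = 0; i < r; ++i) {
    // Skip the comparison if either dim is still unknown at this sweep.
    if (!data_shape[i].has_value || !scales_shape[i].has_value) continue;
    if (i == quantize_axis) {
      // Ceiling division: number of blocks along the quantized axis.
      int64_t expected =
          (data_shape[i].value * components + block_size - 1) / block_size;
      if (expected != scales_shape[i].value) return false;
    } else if (data_shape[i].value != scales_shape[i].value) {
      return false;
    }
  }
  return true;
}
```

With this shape, a still-symbolic `data_shape` dim simply defers the check to a later inference sweep instead of failing the whole model.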
### Description
- Change the output data type of the last node from int64/uint64 to int32/uint32, then add a Cast op to convert the output tensor from int32/uint32 back to int64/uint64.

### Motivation and Context
- Currently the cast op (int32->int64) is only added when the input name contains "_cast_int32", but the input name may not contain this string because it can follow the data type of the previous node. In that case the op's input data type is int32 while its output data type is int64, causing an error.
- Unit test: https://github.com/microsoft/onnxruntime/blob/4b1838b29608f5a19c0997971fd83bee6732ee56/onnxruntime/test/providers/qnn/reshape_expand_op_test.cc#L242
### Description
See the title.

### Motivation and Context
Give traditional (non plug-in) EPs access to OrtValue initializers. Re: #25747
### Description
Update the QNN default version to 2.37.1.250807.
### Description
When an option appears multiple times, `getopt` simply returns each occurrence again in the parsing loop, whereas Abseil processes the occurrences in order and the last one wins (overwriting earlier values). This PR fixes the resulting bug for `-f` (free dimension override by name) and `-F` (free dimension override by denotation). See #25714.
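The behavioral difference can be sketched with a toy parser (illustrative names only; neither `getopt` nor Abseil is actually used here):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of a repeated "-f name:value" override flag.
using Override = std::pair<std::string, int>;

// getopt-style: every occurrence is surfaced to the parsing loop, so the
// caller naturally sees and can collect all of them.
std::vector<Override> CollectAll(const std::vector<Override>& args) {
  return args;  // each repeated -f is kept
}

// Abseil-style single-value flag semantics: later occurrences overwrite
// earlier ones, so only the last value per flag name survives unless the
// overrides are merged explicitly, which is what the fix has to do.
std::map<std::string, int> LastOneWins(const std::vector<Override>& args) {
  std::map<std::string, int> merged;
  for (const auto& [name, value] : args) merged[name] = value;
  return merged;
}
```

Merging per override *name* (rather than per flag) preserves `-f dim1:4 -f dim2:8` while still letting a repeated `-f dim1:16` override the earlier value.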
### Description
Add some device discovery support for non-Windows platforms.

### Motivation and Context
More device discovery support.
### Description
Add a new API, `Graph_GetModelMetadata`.

### Motivation and Context
The VitisAI EP converts ONNX IR to another IR suitable for AMD AI compilers. The metadata in an OrtModel contains much important information produced by other tools, e.g. Olive. This API can potentially be used by many other execution providers that need access to the same information.
jywu-msft previously approved these changes (Aug 21, 2025)
HectorSVC previously approved these changes (Aug 21, 2025)
chilo-ms previously approved these changes (Aug 21, 2025)
edgchen1 previously approved these changes (Aug 21, 2025)
yuslepukhin previously approved these changes (Aug 21, 2025)
### Description
In PoolOpBuilder:
- Revise the check to use ORT macros.
- Fix invoking the function for 5D cases.

### Motivation and Context
Refer to #25778. The Pool builder incorrectly invoked a function that calculates a 4D shape on 5D input, even though the function originally expected 3D cases only. The existing check used assert to validate the shape, which does not run in Release or RelWithDebInfo builds.
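The reason the assert did not help: `assert` is compiled out whenever `NDEBUG` is defined, which is the case for Release and RelWithDebInfo builds. A minimal sketch of the safer pattern of an explicit check that returns an error status, loosely mirroring the ORT macro style (the `Status` type and function names here are illustrative, not the actual ORT code):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative status type standing in for onnxruntime::common::Status.
struct Status {
  bool ok;
  std::string message;
};

// assert-based validation: compiled out entirely when NDEBUG is defined
// (Release / RelWithDebInfo), so bad shapes sail through in those builds.
void ValidateWithAssert(const std::vector<size_t>& shape) {
  assert(shape.size() == 4 && "expected a 4D shape");
}

// Explicit validation: runs in every build configuration and surfaces a
// proper error instead of letting undefined behavior happen downstream.
Status ValidateWithStatus(const std::vector<size_t>& shape) {
  if (shape.size() != 4) {
    return {false, "expected a 4D shape, got rank " +
                       std::to_string(shape.size())};
  }
  return {true, ""};
}
```

This is why the fix swaps the assert for ORT's error-returning macros: the check must fire in the build configurations that actually ship.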
* Implement `GetEPContextNodes()`
* Enable use of `AddExternalInitializersFromFilesInMemory` for models that must be communicated as a byte stream but are larger than 2GB
* Add EP context unit tests for files, byte streams, and both embed modes

NOTE: For large models > 2GB, `embed_mode=0` must be used; `embed_mode=1` fails due to protobuf limitations.

Co-authored-by: Maximilian Müller <[email protected]>
This reconfiguration is done so the arena does NOT allocate tensors with an exact matching size. With that strategy, a tensor always triggers a new allocation in the arena and never reuses memory, since the memory size has to match exactly. This became a big problem with ORT GenAI: the arena grew constantly when prompting with different prompt lengths, and no arena shrinkage was triggered to return older tensors. @skottmckay I am happy to be educated on a better usage of the allocators. Caveat: since the arena is no longer used for workspace allocations (using reserve), it will likely not be possible in the future to allocate on a stream and immediately free memory after an enqueue call, which could have enabled workspace sharing in a multi-model pipeline very nicely. @chilo-ms can you help merge this.
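The reuse problem comes down to size classes: rounding request sizes up (the idea behind arena extend strategies such as kNextPowerOfTwo) lets requests of slightly different sizes land in the same chunk, while exact-size matching forces a fresh allocation for every new prompt length. A stdlib-only sketch of the rounding idea (illustrative; not the actual BFCArena code):

```cpp
#include <cstddef>

// Round a request up to the next power of two — the size-class idea behind
// arena extend strategies like kNextPowerOfTwo (illustrative only).
size_t RoundUpPow2(size_t n) {
  size_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// With exact-size allocation, requests of 1000 and 1001 bytes need two
// distinct chunks; with rounding, both map to the same 1024-byte class,
// so the second request can reuse the chunk freed by the first.
bool CanReuse(size_t old_request, size_t new_request) {
  return RoundUpPow2(new_request) <= RoundUpPow2(old_request);
}
```

This illustrates why varying prompt lengths made an exact-match arena grow without bound: every new length was a brand-new size, so nothing was ever reused.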
### Description
This PR provides C++ interfaces for the following:

Env
- CopyTensors
- CreateSharedAllocator
- GetSharedAllocator
- ReleaseSharedAllocator
- CreateAndRegisterAllocatorV2
- RegisterAllocator
- UnregisterAllocator

EpDevice
- EpDevice_MemoryInfo
- CreateSyncStreamForEpDevice

MemoryInfo
- CreateMemoryInfo_V2
- MemoryInfoGetName
- MemoryInfoGetId
- MemoryInfoGetMemType
- MemoryInfoGetType
- MemoryInfoGetDeviceMemType
- MemoryInfoGetVendorId

Session
- SessionGetInputName
- SessionGetOutputName
- SessionGetMemoryInfoForInputs
- SessionGetMemoryInfoForOutputs
- SessionGetEpDeviceForInputs

SyncStream
- SyncStream_GetHandle
- ReleaseSyncStream

OrtArenaCfg
- CreateArenaCfgV2

TRT
- CreateTensorRTProviderOptions and V2
- UpdateTensorRTProviderOptions

SessionOptions
- OrtSessionOptionsAppendExecutionProvider_CPU

Prepacked container

CUDA Options V2
- OrtCUDAProviderOptionsV2
- CreateCUDAProviderOptions
- GetCUDAProviderOptionsByName
- UpdateCUDAProviderOptionsWithValue
- UpdateCUDAProviderOptions
- GetCUDAProviderOptionsAsString

### Motivation and Context
Provide a way to write exception-safe code.
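The motivation (exception-safe code) boils down to RAII: wrapping a C-style handle so its release runs even when an exception unwinds the stack. A generic, stdlib-only sketch of the pattern these C++ interfaces follow (the `FakeHandle` API is hypothetical; the real wrappers manage Ort* handles):

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical C-style API of the kind the ORT C++ wrappers sit on top of.
struct FakeHandle { int id; };
static int g_live_handles = 0;
FakeHandle* CreateHandle() { ++g_live_handles; return new FakeHandle{42}; }
void ReleaseHandle(FakeHandle* h) { --g_live_handles; delete h; }

// RAII wrapper: the deleter runs during stack unwinding, so the handle is
// released even if an exception is thrown mid-use.
using Handle = std::unique_ptr<FakeHandle, decltype(&ReleaseHandle)>;

void UseHandleThatThrows() {
  Handle h(CreateHandle(), &ReleaseHandle);
  throw std::runtime_error("boom");  // h is still released on unwind
}
```

With raw C API calls, the `throw` above would leak the handle; the wrapper guarantees the matching Release call without any manual cleanup code.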
…y info (#25749)

### Description
This pull request introduces a new mechanism for validating compiled model compatibility with execution providers (EPs) in ONNX Runtime. It adds infrastructure for EPs to generate and store compatibility information in model metadata, and for the runtime to enforce compatibility checks during session initialization.

### Motivation and Context
The APIs proposed in this PR address two requirements:
1. Apps that have an already pre-compiled model on device need a way to determine whether that pre-compiled model is still valid (given the EPs / drivers / etc. on the system).
2. Apps may have many different pre-compiled versions of a model stored on a remote server and want to figure out which of those models they should download for the device where they are running.

### Testing
Validated that the new suite of tests passes cleanly. Created a private build of this ORT and the AMD Vitis EP. I stepped through the core logic (the EP doesn't have this support wired up yet, so there is no compatibility info written out) and, for regression purposes, confirmed I could compile and run inferences through ResNet.

Co-authored-by: Aditya Rastogi <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Disable cpuinfo for ARM64EC builds. There's an error when linking against cpuinfo built for ARM64EC when using `--use_vcpkg`. This issue was exposed by a recent change (#25228), but cpuinfo was actually not being used for ARM64EC before. The macros here don't properly account for ARM64EC: https://github.com/microsoft/onnxruntime/blob/e6d3e085cb0bb96da7c3458b97316ecca234b37a/onnxruntime/core/common/cpuid_arch_definition.h#L8-L14

### Motivation and Context
Fix a packaging pipeline failure. Revert to the old behavior of not calling cpuinfo from the CPUIDInfo ctor for ARM64EC. This PR is just a workaround; the cpuinfo link issue needs more investigation.
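Background on why such macro checks go wrong: MSVC's ARM64EC target defines `_M_ARM64EC` (and, for x64 compatibility, `_M_X64`) but not `_M_ARM64`, so an architecture check that only tests `_M_ARM64` silently treats ARM64EC as x64. A sketch of the pitfall with the macro states modeled as plain booleans (illustrative; not the actual contents of cpuid_arch_definition.h):

```cpp
#include <string>

// Models the predefined-macro states as booleans so the decision logic can
// be exercised directly. On a real ARM64EC build: _M_ARM64EC and _M_X64 are
// defined, _M_ARM64 is NOT.
std::string ClassifyNaive(bool m_arm64, bool m_arm64ec, bool m_x64) {
  (void)m_arm64ec;  // the naive check never consults _M_ARM64EC
  if (m_arm64) return "arm64";
  if (m_x64) return "x64";  // ARM64EC falls through to here
  return "other";
}

std::string ClassifyFixed(bool m_arm64, bool m_arm64ec, bool m_x64) {
  // ARM64EC must be checked explicitly, and before the x64 branch.
  if (m_arm64 || m_arm64ec) return "arm64";
  if (m_x64) return "x64";
  return "other";
}
```

Under the naive scheme an ARM64EC translation unit takes the x64 code path, which is how ARM64EC-unaware consumers of cpuinfo end up with mismatched expectations.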
dbf253e
jywu-msft
previously approved these changes
Aug 23, 2025
…ntime version 18.5. (#25844)

### Description
Update the mac.yml iphone_simulator job to use Xcode 16.4 and simulator runtime version 18.5. Changes from these PRs: #25752, #25794

### Motivation and Context
Fix CI build failure.
HectorSVC
approved these changes
Aug 25, 2025
snnn
approved these changes
Aug 25, 2025
Description
Cherry-pick the following PRs into the rel-1.23.0 branch:
- Graph_GetModelMetadata #25768

Motivation and Context