quic-tirupath and others added 12 commits August 20, 2025 21:39
### Description
 - Add unit tests for LPBQ fusions on MatMul and Gemm nodes

### Motivation and Context
- This commit adds unit tests to avoid future regressions in LPBQ fusions
### Description
- Pre-allocate memory for the HTP context params list during context creation when VTCM backup buffer sharing is enabled. This avoids memory issues caused by vector resizing/reallocation.
- Handle the case where new binary contexts need to be processed
### Description
Updates MSFT Azure pipelines to QAIRT 2.37.0

### Motivation and Context
Regular uplevel
### Description

Disable two tests that were broken on X Elite by upgrading to QNN 2.37.0
### Description
The clearing of shared_allocators_ invalidates all entries in
shared_ort_allocators_.

Remove the unused shared_arena_allocators_. It became unnecessary once EPs
were given an example implementation of an OrtAllocator-based
stream-aware arena that they can use directly.


### Motivation and Context
Fix access violation (swallowed as it happens during shutdown) in dtor.
### Description
Fix GatherBlockQuantized shape inference test



### Motivation and Context
In the GatherBlockQuantized op contrib_defs, we have this shape inference check:
```
        for (int i = 0; i < r; ++i) {
          if (!data_shape.dim(i).has_dim_value() ||
              !scales_shape.dim(i).has_dim_value() ||
              (i == quantize_axis && (data_shape.dim(i).dim_value() * components + block_size - 1) / block_size != scales_shape.dim(i).dim_value()) ||
              (i != quantize_axis && data_shape.dim(i).dim_value() != scales_shape.dim(i).dim_value())) {
            fail_shape_inference("data shape and scales shape do not match");
          }
        }
```
This code was introduced last year. However, when I try to share weights
for the phi-4-mini-instruct model
<img width="233" height="494" alt="image"
src="https://github.com/user-attachments/assets/9c220543-0b81-4867-bcd1-1b7aa49e20cd"
/>
I need to feed a Reshape operator into GatherBlockQuantized. The shape
inference of Reshape does not come from the initializer directly, but from a
Concat which needs some constant folding. Therefore, at the first
sweep of shape inference, `data_shape.dim(i).has_dim_value()` is
`False`, which fails shape inference, and the model cannot work.
Therefore, the shape check should only run when
`data_shape.dim(i).has_dim_value()` is `True`, and likewise for `scales_shape`.
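The corrected behavior can be sketched as a standalone function (a hypothetical model of the fix: dimensions are `std::optional` values standing in for ONNX's `TensorShapeProto` dims; the real code lives in contrib_defs):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical standalone model of the fix: a dimension without a concrete
// value (no has_dim_value()) is skipped instead of failing shape inference.
bool ShapesCompatible(const std::vector<std::optional<int64_t>>& data_shape,
                      const std::vector<std::optional<int64_t>>& scales_shape,
                      int quantize_axis, int64_t block_size, int64_t components) {
  for (size_t i = 0; i < data_shape.size(); ++i) {
    if (!data_shape[i].has_value() || !scales_shape[i].has_value()) {
      continue;  // unknown dim: defer the check instead of failing
    }
    const int64_t d = *data_shape[i], s = *scales_shape[i];
    if (static_cast<int>(i) == quantize_axis) {
      // same rounded-up block count check as the original code
      if ((d * components + block_size - 1) / block_size != s) return false;
    } else if (d != s) {
      return false;
    }
  }
  return true;
}
```

With an unknown dim (e.g. before constant folding of the Concat resolves the Reshape target), the loop now defers the check rather than failing inference on the first sweep.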
### Description
- Change the output data type of the last node from int64/uint64 to int32/uint32, and add a Cast op to convert the output tensor back from int32/uint32 to int64/uint64.

### Motivation and Context
- Currently we only add the Cast op (int32->int64) when the input name contains "_cast_int32", but the input name may not contain this string because it can follow the data type of the previous node. In that case, the input data type of the op is int32 while the output data type is int64, causing an error.

- Unit test
- https://github.com/microsoft/onnxruntime/blob/4b1838b29608f5a19c0997971fd83bee6732ee56/onnxruntime/test/providers/qnn/reshape_expand_op_test.cc#L242
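The fix's core idea can be sketched as follows (a hypothetical simplification; the enum and function names are illustrative, not ORT's actual types): decide whether a Cast back to 64-bit is needed from the actual data types rather than from a name suffix.

```cpp
#include <cassert>
#include <string>

// Illustrative dtype enum (not ONNX's actual TensorProto_DataType values).
enum class DType { kInt32, kUInt32, kInt64, kUInt64 };

// Before: the Cast was inserted only when the tensor name carried a
// "_cast_int32" marker, which is fragile when the name follows the producer.
bool NeedsCastByName(const std::string& input_name) {
  return input_name.find("_cast_int32") != std::string::npos;
}

// After: look at the types themselves. If the node produces 32-bit data but
// the graph expects 64-bit output, a Cast is required regardless of names.
bool NeedsCastByType(DType produced, DType expected) {
  return (produced == DType::kInt32 && expected == DType::kInt64) ||
         (produced == DType::kUInt32 && expected == DType::kUInt64);
}
```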
### Description
See the title


### Motivation and Context
Let traditional (non-plug-in) EPs access OrtValue initializers.

Re: #25747
### Description
Update Qnn default version to 2.37.1.250807
### Description
If an option appears multiple times: unlike `getopt`, which simply returns each occurrence again in the parsing loop, `Abseil` processes them in order and the last one wins (overwriting earlier values).

This PR fixes the bug for `-f` (free dimension override by name) and `-F` (free dimension override by denotation).
see #25714
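The required behavior can be sketched outside of Abseil (a hypothetical minimal parser; the `name:value` syntax follows the description above): each `-f` occurrence must be accumulated into a map, not stored in a single value that the next occurrence overwrites.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: collect every "-f name:value" occurrence into a map.
// A plain last-one-wins string flag would keep only the final occurrence;
// accumulating preserves overrides for distinct dimension names.
std::map<std::string, int> CollectOverrides(
    const std::vector<std::string>& occurrences) {
  std::map<std::string, int> overrides;
  for (const std::string& occ : occurrences) {
    const size_t colon = occ.find(':');
    if (colon == std::string::npos) continue;  // malformed entry: skip
    overrides[occ.substr(0, colon)] = std::stoi(occ.substr(colon + 1));
  }
  return overrides;
}
```

Note that repeating the same dimension name still follows last-one-wins for that name, which matches the ordered-processing semantics described above.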
### Description

Add some device discovery support for non-Windows platforms.

### Motivation and Context

More device discovery support.
### Description

Add a new API `Graph_GetModelMetadata`

### Motivation and Context
The VitisAI EP converts ONNX IR to another IR suitable for AMD AI
compilers.
The metadata in an OrtModel contains much important information produced
by other tools, e.g. Olive.

This API can potentially be used by many other execution providers that need
to access the same information.
jywu-msft
jywu-msft previously approved these changes Aug 21, 2025
HectorSVC
HectorSVC previously approved these changes Aug 21, 2025
chilo-ms
chilo-ms previously approved these changes Aug 21, 2025
edgchen1
edgchen1 previously approved these changes Aug 21, 2025
yuslepukhin
yuslepukhin previously approved these changes Aug 21, 2025
@yuslepukhin yuslepukhin left a comment


:shipit:

minfhong-qti and others added 7 commits August 22, 2025 16:08
### Description
In PoolOpBuilder,
- Revise the check to use ORT macros.
- Fix invoking the function for 5D cases.

### Motivation and Context
Refer to #25778.
PoolOpBuilder incorrectly invokes a function that calculates a 4D shape on 5D input, although the function originally expects 3D cases only. Moreover, the check used `assert` to validate the shape, which works in neither Release nor RelWithDebInfo builds.
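The difference between the two validation styles can be sketched with a stand-in status type (hypothetical names; the real code uses ORT macros such as `ORT_RETURN_IF_NOT` and onnxruntime's `Status`): an `assert` compiles away under NDEBUG, while a status-returning check survives in every build configuration.

```cpp
#include <cstddef>
#include <string>

// Stand-in for onnxruntime's Status type.
struct Status {
  bool ok;
  std::string message;
};

// Hypothetical macro in the spirit of ORT_RETURN_IF_NOT: unlike assert(),
// the check also runs in Release and RelWithDebInfo builds.
#define RETURN_IF_NOT(cond, msg)            \
  do {                                      \
    if (!(cond)) return Status{false, msg}; \
  } while (0)

// Illustrative shape-helper guard: reject non-3D input with a real error
// instead of an assert that disappears in optimized builds.
Status ValidatePoolInputRank(size_t rank) {
  RETURN_IF_NOT(rank == 3, "expected 3D input for this shape helper");
  return Status{true, ""};
}
```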
* Implements `GetEPContextNodes()`
* Enables usage of `AddExternalInitializersFromFilesInMemory` for models
that have to be communicated as byte stream but are larger than 2GB
* Add EP context unit tests for files, byte streams, and both embed modes

NOTE: For large models > 2GB, `embed_mode=0` must be used.
`embed_mode=1` fails due to protobuf limitations

---------

Co-authored-by: Maximilian Müller <[email protected]>

This reconfiguration is done so that the arena does NOT allocate tensors with
an exact matching size. With that strategy a tensor always triggers a new
allocation in the arena and never reuses memory, since the chunk size has to
match exactly.
This became a big problem with ORT GenAI, since the arena grew constantly
when prompting with different prompt lengths, and no arena shrinkage was
triggered to return older tensors. @skottmckay I am happy to be educated
on a better usage of the allocators.

Issues with this:
Since the arena is no longer used for workspace allocations (it uses
reserve), it will likely not be possible in the future to allocate on a
stream and immediately free the memory after an enqueue call. That could
have enabled workspace sharing in a multi-model pipeline very nicely.
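The reuse problem can be illustrated with a toy free-list (a sketch, not ORT's actual arena): with exact-size matching, a freed chunk is reused only for an identical request, so varying prompt lengths keep growing the pool; rounding requests up (e.g. to a power of two) lets slightly different sizes share chunks.

```cpp
#include <cstddef>
#include <vector>

// Toy arena sketch (not ORT's arena): a free chunk is reused only when the
// rounded request size matches the chunk size exactly.
struct ToyArena {
  bool round_to_pow2;  // false models the "exact matching size" strategy
  std::vector<std::size_t> free_chunks;
  std::size_t total_allocated = 0;

  std::size_t Round(std::size_t n) const {
    if (!round_to_pow2) return n;
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
  }

  std::size_t Alloc(std::size_t n) {
    const std::size_t need = Round(n);
    for (std::size_t i = 0; i < free_chunks.size(); ++i) {
      if (free_chunks[i] == need) {  // reuse an existing chunk
        free_chunks.erase(free_chunks.begin() + i);
        return need;
      }
    }
    total_allocated += need;  // no match: grow the arena
    return need;
  }

  void Free(std::size_t chunk) { free_chunks.push_back(chunk); }
};
```

With exact matching, allocating 1000 bytes, freeing, then allocating 1001 grows the arena twice; with power-of-two rounding both requests map to 1024 and the chunk is reused, which is the behavior varying prompt lengths need.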

@chilo-ms can you help merge this.
### Description
This PR provides C++ interfaces for the following:

Env
====
CopyTensors()

CreateSharedAllocator
GetSharedAllocator
ReleaseSharedAllocator
CreateAndRegisterAllocatorV2

RegisterAllocator
UnregisterAllocator

EpDevice
======
EpDevice_MemoryInfo
CreateSyncStreamForEpDevice

MemoryInfo
========
CreateMemoryInfo_V2
MemoryInfoGetName 
MemoryInfoGetId 
MemoryInfoGetMemType
MemoryInfoGetType
MemoryInfoGetDeviceMemType
MemoryInfoGetVendorId

Session
==========
SessionGetInputName
SessionGetOutputName

SessionGetMemoryInfoForInputs
SessionGetMemoryInfoForOutputs
SessionGetEpDeviceForInputs

SyncStream
===========
SyncStream_GetHandle
ReleaseSyncStream

OrtArenaCfg
===========
CreateArenaCfgV2

TRT
===
CreateTensorRTProviderOptions and V2
UpdateTensorRTProviderOptions

SessionOptions
==============
OrtSessionOptionsAppendExecutionProvider_CPU

Prepacked container
=============

CUDA Options V2
===========
OrtCUDAProviderOptionsV2
CreateCUDAProviderOptions

GetCUDAProviderOptionsByName
UpdateCUDAProviderOptionsWithValue
UpdateCUDAProviderOptions
GetCUDAProviderOptionsAsString

### Motivation and Context
Provide a way to write exception safe code.
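The exception-safety motivation can be sketched with the generic RAII pattern such wrappers follow (hypothetical C-style API names in the shape of ORT's Create*/Release* pairs; the real wrappers live in the C++ API headers): the C interface returns raw handles that need a matching Release call, while the C++ wrapper releases in its destructor even when an exception unwinds the stack.

```cpp
#include <memory>

// Hypothetical C-style API modeled on ORT's Create*/Release* pairs.
struct FakeStream { int id; };
static int g_live_streams = 0;

FakeStream* CreateSyncStream(int id) {
  ++g_live_streams;
  return new FakeStream{id};
}
void ReleaseSyncStream(FakeStream* s) {
  --g_live_streams;
  delete s;
}

// C++-style RAII wrapper: the release runs on every exit path, including
// exceptions, so calling code needs no manual try/catch cleanup.
using SyncStreamPtr = std::unique_ptr<FakeStream, decltype(&ReleaseSyncStream)>;

SyncStreamPtr MakeSyncStream(int id) {
  return SyncStreamPtr(CreateSyncStream(id), &ReleaseSyncStream);
}
```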
…y info (#25749)

### Description
This pull request introduces a new mechanism for validating compiled
model compatibility with execution providers (EPs) in ONNX Runtime. It
adds infrastructure for EPs to generate and store compatibility
information in model metadata, and for the runtime to enforce
compatibility checks during session initialization.

### Motivation and Context
The APIs proposed in this PR address two requirements:

1. Apps that have an already pre-compiled model on device need a way to
determine if the pre-compiled model is still valid (given the EPs /
drivers / etc. on the system).
2. Apps may have many different pre-compiled versions of a model stored
on a remote server, and want to figure out which of those models they
should download for the device where they are running.
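Requirement 2 can be sketched as a selection loop (hypothetical types and names; the actual ORT API surface for the compatibility check is not reproduced here): each precompiled model carries the compatibility string its EP wrote at compile time, and the app picks the first one the current device can run.

```cpp
#include <string>
#include <vector>

// Hypothetical record for a precompiled model stored on a remote server,
// with the compatibility info its EP generated at compile time.
struct PrecompiledModel {
  std::string url;
  std::string ep_compat_info;
};

// Stand-in for the runtime-side check an EP would implement: here a plain
// string comparison; a real EP would inspect driver/hardware versions.
bool IsCompatible(const std::string& compat_info, const std::string& device_info) {
  return compat_info == device_info;
}

// Pick the first model the current device can run (empty url if none).
std::string SelectModelToDownload(const std::vector<PrecompiledModel>& models,
                                  const std::string& device_info) {
  for (const auto& m : models) {
    if (IsCompatible(m.ep_compat_info, device_info)) return m.url;
  }
  return "";
}
```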

### Testing
Validated that the new suite of tests passes cleanly.
Created a private build of this ORT and the AMD Vitis EP. I stepped
through the core logic (the EP doesn't have this support wired up yet,
so there is no compatibility info written out) and, for regression
purposes, confirmed I could compile and run inferences through ResNet.

---------

Co-authored-by: Aditya Rastogi <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description

Disable cpuinfo for ARM64EC builds. There's an error when linking to
cpuinfo built for ARM64EC when using `--use_vcpkg`.

This issue was exposed by a recent change (#25228) but cpuinfo was
actually not being used before for ARM64EC. The macros here don't
properly account for ARM64EC:

https://github.com/microsoft/onnxruntime/blob/e6d3e085cb0bb96da7c3458b97316ecca234b37a/onnxruntime/core/common/cpuid_arch_definition.h#L8-L14
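The gap can be sketched as a pure function so the logic is checkable on any host (an illustration, not the linked header's actual code): under MSVC, an ARM64EC build defines `_M_ARM64EC` (and `_M_X64` for x64 compatibility) but not `_M_ARM64`, so a check testing only `_M_ARM64` misclassifies ARM64EC.

```cpp
#include <string>

// Hypothetical pure-function version of the macro check. The booleans mirror
// which MSVC macros are defined: an ARM64EC build defines _M_ARM64EC but NOT
// _M_ARM64, so the second branch covers the case the original macros missed.
std::string ClassifyArch(bool m_arm64_defined, bool m_arm64ec_defined) {
  if (m_arm64_defined) return "arm64";
  if (m_arm64ec_defined) return "arm64ec";  // previously fell through to "other"
  return "other";
}
```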

### Motivation and Context

Fix a packaging pipeline failure. Revert to the old behavior of not
calling cpuinfo from the CPUIDInfo ctor for ARM64EC.

This PR is just a workaround. The cpuinfo link issue needs more
investigation.
@adrianlizarraga adrianlizarraga changed the title ORT 1.23.0 cherry-pick prs 25592 - 25768 ORT 1.23.0 cherry-pick prs 25592 - 25831 Aug 22, 2025
jywu-msft
jywu-msft previously approved these changes Aug 23, 2025
…ntime version 18.5. (#25844)

### Description

Update mac.yml iphone_simulator job - use Xcode 16.4 and simulator
runtime version 18.5.

Changes from these PRs:
#25752
#25794

### Motivation and Context

Fix CI build failure.
@adrianlizarraga adrianlizarraga merged commit ad45432 into rel-1.23.0 Aug 25, 2025
80 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/1.23.0/cherrypick-08202025 branch August 25, 2025 23:45
@snnn snnn mentioned this pull request Sep 16, 2025