fs-eire and others added 8 commits August 8, 2025 09:52
### Description

For f16 uniform variables, use u32 to bit-wise represent them.

### Motivation and Context

Some devices support f16 in shader/storage buffers, but not in uniform
buffers. Dawn will set f16_support to false for them. However, we
don't necessarily have to use f16 in uniforms.

This change together with #25349 will enable using f16 models on some
Android devices.
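
As a rough illustration of what representing an f16 value bit-wise in a u32 can look like on the host side, the sketch below packs IEEE 754 half-precision values into a 32-bit word. The helper names and the two-values-per-word layout are assumptions made for this example; they are not taken from this change, which may lay out its uniforms differently.

```python
import struct


def f16_bits(x: float) -> int:
    """Return the 16-bit IEEE 754 half-precision pattern of x as an int."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def pack_f16_pair(lo: float, hi: float = 0.0) -> int:
    """Pack two f16 values into one u32 (lo in bits 0-15, hi in bits 16-31).

    Hypothetical layout, chosen only for illustration.
    """
    return f16_bits(lo) | (f16_bits(hi) << 16)


def unpack_f16_pair(word: int) -> tuple[float, float]:
    """Reinterpret a u32 as two little-endian f16 values."""
    lo, hi = struct.unpack("<2e", struct.pack("<I", word))
    return lo, hi
```

A shader that receives such a word would reinterpret the bits rather than convert the integer numerically, which is why a uniform buffer without f16 support can still carry the value.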
### Description
<!-- Describe your changes. -->

1. Disable Turing GPU EP devices

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Turing will not be supported in this release

@chilo-ms @jywu-msft
This currently holds 2 major improvements:
- dynamic shape models should have much lower memory usage, and in
addition the memory management is moved towards ORT allocators
- the overhead for shape binding and address updates is reduced per
inference

---------

Co-authored-by: Gaurav Garg <[email protected]>
…ataTransfer registered by a plugin EP in the Environment (#25346)

### Description
<!-- Describe your changes. -->
Add ability to get shared allocator from env.
Add ability to create a MemCpyFunc using the IDataTransfer from the
environment.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Add support for onnxruntime_perf_test to register plugin EP dll and run
plugin EP.

As support for plugin execution providers (EPs) requires additional
options and most single-character options have already been used,
multi-character options are now necessary to ensure clarity and
readability. Therefore, support for `Abseil flags` is added, which
enables multi-character options and provides cross-platform
compatibility.


**New options:**

- `--plugin_ep_libs [registration names and libraries]` Specifies a list
of plugin execution provider (EP) registration names and their
corresponding shared libraries to register.
[Usage]: `--plugin_ep_libs "plugin_ep_name_1|plugin_ep_1.dll
plugin_ep_name_2|plugin_ep_2.dll ... "`

  
- `--plugin_eps [Plugin EPs]` Specifies a semicolon-separated list of
plugin execution providers (EPs) to use.
      [Usage]: `--plugin_eps "plugin_ep_1;plugin_ep_2;... "`

- `--plugin_ep_options [EP options]` Specifies provider options for each
EP listed in --plugin_eps. Options (key-value pairs) for each EP are
separated by space and EPs are separated by semicolons.
      [Usage]:
`--plugin_ep_options "ep_1_option_1_key|ep_1_option_1_value
...;ep_2_option_1_key|ep_2_option_1_value ...;..."` or
`--plugin_ep_options ";ep_2_option_1_key|ep_2_option_1_value ...;..."`
or
`--plugin_ep_options "ep_1_option_1_key|ep_1_option_1_value
...;;ep_3_option_1_key|ep_3_option_1_value ...;..."`

- `--list_ep_devices` Prints all available device indices and their
properties (including metadata). This option makes the program exit
early without performing inference.

- `--select_ep_devices [list of device indices]` A semicolon-separated
list of device indices to add to the session and run with.
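
To make the option formats above concrete, here is a minimal sketch of how such strings could be split, written in Python rather than the tool's own code. The function names are hypothetical, and the handling of edge cases (for example, library paths containing spaces, or a trailing semicolon) is an assumption; the actual parsing in `onnxruntime_perf_test` may differ.

```python
def parse_ep_libs(value: str) -> list[tuple[str, str]]:
    """Split 'name|library name|library ...' into (name, library) pairs.

    Assumes entries are whitespace-separated, so paths with spaces
    are not supported by this sketch.
    """
    pairs = []
    for token in value.split():
        name, sep, library = token.partition("|")
        if not sep or not name or not library:
            raise ValueError(f"expected 'name|library', got {token!r}")
        pairs.append((name, library))
    return pairs


def parse_ep_options(value: str) -> list[dict[str, str]]:
    """Split 'k|v k|v;k|v;...' into one options dict per EP.

    An empty segment yields an empty dict, so position i still lines up
    with the i-th EP in --plugin_eps.
    """
    per_ep = []
    for segment in value.split(";"):
        options = {}
        for token in segment.split():
            key, sep, val = token.partition("|")
            if not sep or not key:
                raise ValueError(f"expected 'key|value', got {token!r}")
            options[key] = val
        per_ep.append(options)
    return per_ep


def parse_device_indices(value: str) -> list[int]:
    """Split a semicolon-separated list of device indices."""
    return [int(part) for part in value.split(";") if part.strip()]
```

Keeping empty segments as empty dicts mirrors the usage forms that start with `;` or contain `;;`, where an EP is given no options but still occupies a position in the list.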

**Usage:**

1. Use `--plugin_ep_libs` and `--list_ep_devices` to list all the
devices.

````sh
--list_ep_devices --plugin_ep_libs "TensorRTEp|C:\TensorRTEp.dll example_ep|C:\example_plugin_ep.dll"
````
   It will print the device info:
````
===== EP device id 0 ======
name: CPUExecutionProvider
vendor: Microsoft
metadata:
  version: 1.23.0

===== EP device id 1 ======
name: example_ep
vendor: Contoso
metadata:
  supported_devices: CrackGriffin 7+
  version: 0.1.0

===== EP device id 2 ======
name: TensorRTEp
vendor: Nvidia
metadata:
  gpu_type: data center
  version: 0.1.0
````

2. Use `--select_ep_devices` to select the device by index, and add
`--plugin_eps` to specify the EP name. The EP name should match the name
that the EP library passes in when creating the EP factory.

````sh
--plugin_ep_libs "TensorRTEp|C:\TensorRTEp.dll" --select_ep_devices 2 --plugin_eps TensorRTEp -r 1 C:\mul_op\mul_1.onnx
````

3. Or simply use `-e` to specify the EP name. ORT perf test will add all
the devices created by the plugin EP.
The EP name should match the name that the EP library passes in when
creating the EP factory.

````sh
--plugin_ep_libs "TensorRTEp|C:\TensorRTEp.dll" --plugin_eps TensorRTEp -r 1 C:\mul_op\mul_1.onnx
````
### Description
<!-- Describe your changes. -->
Relax the restriction on the DML EP so that other CPU-based EPs can be used.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
#25504
### Description
<!-- Describe your changes. -->
Update the number of mel bins for the Whisper model, as it differs by
Whisper model version. Without this change, Whisper v3 models cannot run
because their num_mel_bins is 128.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Whisper v3 models currently cannot run: their num_mel_bins is 128, but
the value is fixed to 80, which causes issues during preprocessing.
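
The idea of reading the mel-bin count from the model instead of hard-coding it can be sketched as follows. This is an illustration only: the plain `dict` stands in for a model configuration, the function names are hypothetical, and the default of 3000 frames is an assumption for the example rather than something stated in this change.

```python
DEFAULT_NUM_MEL_BINS = 80  # the value that was previously fixed


def resolve_num_mel_bins(config: dict) -> int:
    """Use the model's own num_mel_bins when present, else fall back to 80."""
    return int(config.get("num_mel_bins", DEFAULT_NUM_MEL_BINS))


def expected_feature_shape(config: dict, batch_size: int = 1,
                           num_frames: int = 3000) -> tuple[int, int, int]:
    """Shape of the log-mel input: (batch, num_mel_bins, num_frames).

    num_frames=3000 is an assumed default for this sketch.
    """
    return (batch_size, resolve_num_mel_bins(config), num_frames)
```

With a v3-style configuration (`{"num_mel_bins": 128}`) the preprocessing shape follows the model, while configurations that omit the key keep the earlier behavior.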
<!-- Describe your changes. -->
Make Node::ToProto() const call Graph::ToGraphProto() const so that it
processes all of the subgraphs recursively and removes all the
in-memory references.

<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Node::ToProto() const does not respect the constness of subgraph
attributes and calls the non-const version of Graph::ToGraphProto(),
which does not process subgraph initializers and does not remove
in-memory references.

@yuslepukhin yuslepukhin left a comment

:shipit:

@adrianlizarraga adrianlizarraga marked this pull request as ready for review August 8, 2025 19:46
@adrianlizarraga adrianlizarraga merged commit 5cc8dd2 into rel-1.23.0 Aug 8, 2025
80 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/rel-1.23.0/cherrypick_080825 branch August 8, 2025 21:22
@snnn snnn mentioned this pull request Sep 16, 2025