
Conversation

@wcy123
Contributor

@wcy123 wcy123 commented Aug 5, 2025

Description

Related to #25320 and #23979. Enables tensor raw data sharing for externalized tensor protos tagged with kTensorProtoMemoryAddressTag.

Motivation and Context

With #25320 and #23979, all initializer tensor protos are associated with an OrtValue; the VitisAI EP needs to adapt to this change.

edgchen1 and others added 30 commits July 25, 2025 21:39
…ded but not constant. (microsoft#25544)

### Description

In the DynamicQuantizeMatMul KleidiAI-specific prepacking logic, handle the case
where the B zero point input is provided but not constant. In this case, we
should not prepack.

Add some unit tests that test the prepacking code path.

Add check for ARM SME instructions in DynamicQuantizeMatMul before
calling `MlasDynamicQGemmBatch()` and associated functions.

### Motivation and Context

Follow up to microsoft#25187
### Description

### Motivation and Context

Fix the build break on Windows+Ninja
### Description

Fixes the packaging pipeline.

---------

Co-authored-by: Copilot <[email protected]>
This PR uses the existing RunOption `gpu_graph_id` to control whether to
skip graph capture. When the WebGPU EP option `enableGraphCapture`
is enabled, setting gpu_graph_id = -1 in RunOptions means skipping graph
capture; otherwise, each session.run goes down the graph capture path.
If gpu_graph_id is not specified in RunOptions, `enableGraphCapture`'s
value determines whether to take the graph capture path.
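As a rough illustration, the decision rule described above can be sketched in plain Python (the function and parameter names here are made up for illustration, not the actual ORT implementation):

```python
def should_capture_graph(enable_graph_capture: bool, gpu_graph_id=None) -> bool:
    """Return True if this Run() should go down the graph-capture path.

    Sketch of the described behavior: gpu_graph_id == -1 in RunOptions
    explicitly skips capture; when it is unset, the session-level
    enableGraphCapture option decides.
    """
    if not enable_graph_capture:
        return False           # capture disabled at the session level
    if gpu_graph_id is None:
        return True            # unset: respect enableGraphCapture
    return gpu_graph_id != -1  # -1 explicitly skips capture for this run
```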
### Description
Refactor to split out classes and make things easier to find. 

### Motivation and Context
Cleanup
…ml (microsoft#25552)

### Description
The machine images were updated yesterday and now have Python
preinstalled, so we no longer need to install it ourselves.
Remove those steps to avoid conflicts.
Also refactor the YAML file a bit: templates now use parameterized
Python versions instead of a matrix strategy.
Additional equation support for QNN EP on einsum op.
- **DynamicQuantizeMatMul - handle case where B zero point input is
provided but not constant. (microsoft#25544)**
- **Refactor plugin EP support (microsoft#25541)**
- **Remove the python installation steps from
win-qnn-arm64-ci-pipeline.yml (microsoft#25552)**
### Description
This change is based on microsoft#25135.

Upgrade xnnpack and several related third-party dependencies, including
pthreadpool, cpuinfo, and kleidiai. This change also updates the xnnpack
execution provider code to accommodate changes in the xnnpack api.
QU8 average pooling is removed, as the corresponding microkernel no
longer seems to exist in xnnpack.
…te (microsoft#25553)

This PR fixes `webgpu_fix_frame_generator` by adding the present mode to the
surface configuration. This attribute is required by the latest Dawn to
render frames.
### Description

This implements the SwiGLU activation for MoE and qMoE. The activation
corresponds to
https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/swiglu.py.

Also update test_parity_moe.py to enable test for qMoE in CI pipelines.

### Motivation and Context

This is a naive implementation of the activation. Since the activation
reduces each row length by half, we cannot directly use an epilogue.
The current implementation needs an extra buffer to run the SwiGLU kernel.

In the future, we might look at alternatives that do not need the
extra buffer.
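For reference, the basic SwiGLU form (silu(gate) * linear, with the gate taken as the first half of each row) can be sketched in plain Python; the referenced Triton kernel may differ in details such as scaling or clamping:

```python
import math

def swiglu(row):
    """SwiGLU over one row: split it in half, then silu(gate) * linear.

    The output length is half the input length, which is why the kernel
    cannot run as an in-place epilogue and needs an extra buffer.
    """
    n = len(row) // 2
    gate, linear = row[:n], row[n:]
    silu = lambda v: v / (1.0 + math.exp(-v))  # x * sigmoid(x)
    return [silu(g) * x for g, x in zip(gate, linear)]
```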
### Description
Fixes a documentation error in onnxruntime_c_api.h: a parameter name
mismatch for `Graph_GetGraphView`.



### Motivation and Context
Fix errors in the GitHub action for generating the C/C++ documentation
from public header files.
…me (microsoft#25516)

After cherry-picking from win-onnxruntime (microsoft#25481), the MIGraphX EP
stopped compiling on the main branch.
### Description
This PR enhances unidirectional `FlashAttention` by applying causal
masking inside the main loop. This optimization eliminates unnecessary
memory loads by avoiding future entries in the KV cache.

Testing on Lunar Lake shows up to a 20% performance improvement for
`phi-4-mini-accuracy4` (with a prompt of 4096). Similar performance
gains were also observed for other models, including
`Qwen3-0.6B-accuracy4`.

This PR now uses the more readable `unidirectional` attribute instead of
`is_gqa`, to control causal masking.

### Motivation and Context
See above.
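The effect of applying the causal mask inside the KV loop can be illustrated with a toy single-query attention in plain Python (illustrative only, not the actual WebGPU shader): entries of the KV cache at positions after the query are never scored or loaded.

```python
import math

def causal_attention_row(q, keys, values, q_pos):
    """One query row of attention with the causal mask applied inside the
    loop: keys at positions > q_pos are skipped, not masked after the fact."""
    scores = []
    for k_pos, k in enumerate(keys):
        if k_pos > q_pos:
            break  # causal mask: future KV entries are never loaded
        scores.append(sum(qi * ki for qi, ki in zip(q, k)))
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # stable softmax numerators
    z = sum(weights)
    dim = len(values[0])
    return [sum(weights[i] * values[i][d] for i in range(len(weights))) / z
            for d in range(dim)]
```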
…ttr name (microsoft#25565)

### Description
Updates `Node_GetAttributeByName` to return an error status with code
`ORT_NOT_FOUND` and set the `attribute` output parameter to `NULL` when
called with a non-existing attribute name.

Why? Currently, a caller has to do string comparison of the `OrtStatus`
error message to determine if the attribute does not exist or if another
error occurred. This can be somewhat cumbersome. With this change, the
caller can just check the error code.

### Motivation and Context
Make it easier to use `Node_GetAttributeByName`.
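The contract change can be modeled in plain Python (a hypothetical sketch; the enum value and names below are illustrative, not the real C API): the caller branches on an error code rather than parsing the status message.

```python
from enum import Enum

class OrtErrorCode(Enum):
    """Illustrative stand-in for the C API's error codes (values made up)."""
    OK = 0
    FAIL = 1
    NOT_FOUND = 100

def get_attribute_by_name(attrs, name):
    """Model of the updated behavior: returns (status, value).

    A missing attribute yields NOT_FOUND with value None, so callers can
    check the code instead of string-matching an error message.
    """
    if name not in attrs:
        return OrtErrorCode.NOT_FOUND, None
    return OrtErrorCode.OK, attrs[name]
```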
### Motivation and Context
Fix data type check to skip optional I/Os.

Optional inputs/outputs have an empty name, which is valid ONNX syntax. Without this fix, any model with optional inputs/outputs would fail the check due to missing protobuf fields.

Without the fix, we hit an error fetching `elem_type` from protobuf:
```
2025-07-22 11:14:40.117740035 [I:onnxruntime:, qnn_execution_provider.cc:740 GetSupportedNodes] Validation FAILED for 1 nodes in NodeUnit (Pad) :
	Operator type: Pad Node name: /blocks.4/Pad Node index: 176 	REASON : The tensor doesn't have elem_type.
```
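The shape of the fix can be sketched in plain Python (illustrative names, not the actual QNN EP code): optional I/Os are identified by their empty name and skipped before the data type check requires `elem_type`.

```python
class ValueInfo:
    """Minimal stand-in for an ONNX value info entry."""
    def __init__(self, name="", elem_type=None):
        self.name = name          # optional I/O: empty name, per ONNX spec
        self.elem_type = elem_type

def unsupported_io_names(value_infos, supported_types):
    """Validate I/O element types, skipping optional (empty-name) entries
    so their missing elem_type no longer fails the whole node."""
    bad = []
    for vi in value_infos:
        if not vi.name:
            continue              # optional input/output: skip the check
        if vi.elem_type not in supported_types:
            bad.append(vi.name)
    return bad
```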
…#25581)

[QNN-EP] Einsum equation ReduceSum Multiply on broadcast X
### Description

This change fixes a problem when building the Dawn monolithic library: it
failed to pick up the correct imported abseil library.

The triplet files are now split per configuration to avoid this problem.
### Description
Update OrtEpFactory in new EPs to add allocator, data transfer and
stream stubs.

### Motivation and Context
### Description

For f16 uniform variables, use u32 to represent them bit-wise.

### Motivation and Context

Some devices support f16 in shader/storage buffers, but not in uniform
buffers. Dawn will set f16_support to false for them. However, we don't
necessarily have to use f16 in uniforms.

This change together with microsoft#25349 will enable using f16 models on some
Android devices.
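The bit-wise packing can be demonstrated with Python's stdlib `struct` module (the `"e"` format is IEEE half precision); on the shader side, WGSL's `unpack2x16float` builtin performs the inverse. The function names here are illustrative:

```python
import struct

def pack_f16_pair_as_u32(a: float, b: float) -> int:
    """Bit-cast two f16 values into one u32, little-endian: a in the low
    16 bits, b in the high 16 bits, as a u32 uniform would carry them."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]

def unpack_u32_as_f16_pair(word: int):
    """Inverse: recover the two f16 values from the u32 word (what
    unpack2x16float does in WGSL)."""
    return struct.unpack("<2e", struct.pack("<I", word))
```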
### Description
Add a new `Node_GetTensorAttributeAsOrtValue` API to support attributes
of `TENSOR` type.
This API returns a const OrtValue that represents the TensorProto in the
`TENSOR` attribute.
### Description

~~Remove the outdated patch file.~~

Remove the changes to `src/arm/windows/init.c` from the patch file.
The changes to `include/cpuinfo.h` are kept.
### Description
- Add a GraphTransformer `WhereDummyDq` that inserts a dummy DequantizeLinear on the Where node's initializer input, forming a node unit when the Where node has one DQ input and one scalar initializer input
- Add corresponding unit test for the optimization

### Motivation and Context
- To reduce the additional Dequantize and Quantize nodes, we would like to pass `WhereNodeGroupSelector::Check`.
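The reason the inserted DQ is "dummy" can be shown with the DequantizeLinear formula in plain Python: with scale 1 and zero point 0 it is an identity on the scalar initializer, existing only to complete the DQ → Where node unit (the values below are my assumption of what a dummy DQ would use):

```python
def dequantize_linear(q, scale=1.0, zero_point=0):
    """DequantizeLinear: (q - zero_point) * scale."""
    return (q - zero_point) * scale

# A dummy DQ with scale=1.0 and zero_point=0 leaves the value unchanged,
# so inserting it does not alter the model's numerics.
```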
…icrosoft#25590)

### Description

Use the session id to track sessions with LogSessionCreation.

If we call Run on different threads, we can differentiate them by
thread id, given that Run is not async.

### Motivation and Context

---------

Co-authored-by: hualxie <[email protected]>
…#25602)

### Description
Add new API to VitisAI to save graph as a string

### Motivation and Context
to support in-memory flow

---------

Co-authored-by: yifei <[email protected]>
Add support for bfloat16 in the MoE and qMoE CUDA ops.
### Description

1. Disable Turing GPU EP devices

### Motivation and Context
Turing will not be supported in this release

@chilo-ms @jywu-msft
### Description
 - Add unit tests for LPBQ fusions on MatMul and Gemm nodes

### Motivation and Context
- This commit adds unit tests to avoid future regressions in LPBQ fusions
### Description
We have a big packaging pipeline that builds the nuget/java/nodejs
packages and then runs their tests. This PR splits the tests into a
dedicated pipeline and refactors the code to use Maven to download
dependencies instead of direct HTTP fetches. The new approach allows us
to use Azure DevOps artifacts as an internal mirror to meet network
isolation requirements. This PR also enables the WebGPU and CoreML EP
tests for the Java package on macOS.

This PR also updates tools/python/run_packaging_pipelines.py a little
to add support for RC releases.

### Motivation and Context
Make the packaging pipelines smaller and easier to use.
qwu16 and others added 4 commits July 31, 2025 15:46
microsoft#25589)

### Description
Cache opSupportLimits in the WebNN backend to avoid querying it from the
lower layer each time, improving performance. Also update the trace
event in data transfer.



### Motivation and Context
In the current implementation, each time the ensureTensor API is called
to check an input/output tensor, the MLContext.opSupportLimits API is
called to query op support capability from Chromium, and this call
becomes a hotspot. Calling the API once when the session is created and
caching the result avoids the frequent lower-level API calls.
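The caching pattern amounts to query-once, serve-from-cache thereafter; a minimal Python sketch (illustrative class and names, not the actual TypeScript backend code):

```python
class WebnnBackendSketch:
    """Illustrative cache for opSupportLimits: hit the expensive
    lower-layer query once, then serve every later ensureTensor-style
    check from the cached copy."""

    def __init__(self, query_op_support_limits):
        self._query = query_op_support_limits  # expensive lower-layer call
        self._limits = None                    # filled on first use

    def op_support_limits(self):
        if self._limits is None:               # first call only
            self._limits = self._query()
        return self._limits                    # all later calls: cached
```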
…g HTP. (microsoft#25605)

### Description
Lower Gemm with 2d bias to FC + ElementwiseAdd when targeting HTP.

### Motivation and Context
This change allows Gemm with a 2D bias to stay on HTP instead of falling back to CPU.
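Numerically, the rewrite is just the identity Gemm(A, B, C) = MatMul(A, B) + C with a full 2D bias; a small plain-Python check of that equivalence (illustrative code, not the QNN EP lowering itself):

```python
def matmul(a, b):
    """Plain row-major matrix multiply over nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def gemm_with_2d_bias(a, b, c):
    """Gemm with a full 2D bias C, decomposed as FC (matmul) followed by
    an elementwise Add -- the split that keeps the op on HTP."""
    fc = matmul(a, b)
    return [[fc[i][j] + c[i][j] for j in range(len(fc[0]))]
            for i in range(len(fc))]
```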

---------

Signed-off-by: Mu-Chein Hsu <[email protected]>
@adrianlizarraga
Contributor

Hi @wcy123,
I see two issues with this PR:

@snnn snnn requested a review from devang-ml September 29, 2025 20:55
@wcy123
Contributor Author

wcy123 commented Nov 3, 2025

Yes, we are closing this PR to follow the ORT dev process.

@wcy123 wcy123 closed this Nov 3, 2025