Bugfix vitisai ep model clone with 23979 25320 #25654
Closed: wcy123 wants to merge 34 commits into microsoft:rel-1.23.0 from wcy123:bugfix-vitisai-ep-model-clone-with-23979-25320
Conversation
DynamicQuantizeMatMul: handle case where B zero point input is provided but not constant (microsoft#25544)

### Description
In the DynamicQuantizeMatMul KleidiAI-specific prepacking logic, handle the case where the B zero point input is provided but not constant. In this case, we should not prepack. Add unit tests that exercise the prepacking code path. Add a check for Arm SME instructions in DynamicQuantizeMatMul before calling `MlasDynamicQGemmBatch()` and associated functions.

### Motivation and Context
Follow-up to microsoft#25187.
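The guard described above can be sketched as a small predicate. This is an illustrative sketch, not the actual ORT implementation; the function and parameter names are hypothetical.

```python
# Hypothetical sketch of the prepacking guard: prepack the B tensor only when
# every quantization input we depend on is known (constant) at session
# initialization time.

def can_prepack_b(b_is_constant: bool,
                  b_zero_point_provided: bool,
                  b_zero_point_is_constant: bool) -> bool:
    """Return True only if prepacking at session-initialization time is safe."""
    if not b_is_constant:
        return False  # B itself may change between Run() calls
    if b_zero_point_provided and not b_zero_point_is_constant:
        # The zero point exists but is a runtime input: its value is unknown
        # at prepack time, so prepacked data could be wrong.
        return False
    return True
```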
### Description
Fix the build break on Windows+Ninja.
### Description
Fixes the packaging pipeline.

Co-authored-by: Copilot <[email protected]>
This PR uses the existing run option `gpu_graph_id` to control whether to skip graph capture. When the WebGPU EP option `enableGraphCapture` is enabled, setting `gpu_graph_id = -1` in RunOptions skips graph capture; otherwise, each `session.run` goes down the graph capture path. If `gpu_graph_id` is not specified in RunOptions, the value of `enableGraphCapture` determines whether to take the graph capture path.
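The decision logic described above can be sketched as a small pure function. This is an illustrative stand-in, not the actual WebGPU EP code; the function name is hypothetical.

```python
# Sketch of the graph-capture decision described in the PR text.
from typing import Optional

def should_capture_graph(enable_graph_capture: bool,
                         gpu_graph_id: Optional[int]) -> bool:
    if not enable_graph_capture:
        return False           # EP option disabled: never capture
    if gpu_graph_id is None:
        return True            # not specified: respect enableGraphCapture
    return gpu_graph_id != -1  # -1 explicitly skips graph capture
```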
### Description
Refactor to split out classes and make things easier to find.

### Motivation and Context
Cleanup.
Remove the python installation steps from win-qnn-arm64-ci-pipeline.yml (microsoft#25552)

### Description
Yesterday I updated the machine images; they now have Python preinstalled, so we no longer need these steps. Remove them to avoid conflicts. Also refactor the YAML file a little bit: templates now use parameterized Python versions instead of a matrix strategy.
Additional equation support for the einsum op on QNN EP.
- **DynamicQuantizeMatMul - handle case where B zero point input is provided but not constant. (microsoft#25544)**
- **Refactor plugin EP support (microsoft#25541)**
- **Remove the python installation steps from win-qnn-arm64-ci-pipeline.yml (microsoft#25552)**
### Description
This change is based on microsoft#25135. Upgrade XNNPACK and several related third-party dependencies, including pthreadpool, cpuinfo, and KleidiAI. This change also updates the XNNPACK execution provider code to accommodate changes in the XNNPACK API. Average pooling qu8 is removed, as the corresponding microkernel no longer seems to exist in XNNPACK.
…te (microsoft#25553)

This PR fixes webgpu_fix_frame_generator by adding the present mode to the surface configuration. This new attribute is required by the latest Dawn to render frames.
### Description
This implements the SwiGLU activation for MoE and qMoE. The activation corresponds to https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/swiglu.py. Also update test_parity_moe.py to enable the qMoE test in CI pipelines.

### Motivation and Context
This is a naive implementation of the activation. Since the activation reduces each row length to half, we cannot directly use the epilogue. The current implementation needs an extra buffer to run the SwiGLU kernel. In the future, we might look at alternatives that do not need the extra buffer.
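A minimal pure-Python sketch illustrates why the row length halves (the gate half is consumed), which is why a plain epilogue cannot be used and an extra buffer is needed. This uses the common `silu(x1) * x2` formulation; the exact variant in the referenced Triton kernel (interleaving, alpha scaling, limit clamping) may differ.

```python
# Sketch of a SwiGLU-style gated activation: the input row is split in half,
# one half gates the other, and the output row is half the input length.
import math

def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))  # x * sigmoid(x)

def swiglu(row):
    n = len(row) // 2
    gate, linear = row[:n], row[n:]
    return [silu(g) * x for g, x in zip(gate, linear)]
```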
### Description
Fixes a documentation error in onnxruntime_c_api.h: parameter name mismatch for `Graph_GetGraphView`.

### Motivation and Context
Fix errors in the GitHub action that generates the C/C++ documentation from the public header files.
…me (microsoft#25516) After cherry-picking from win-onnxruntime (microsoft#25481), the MIGraphX EP stopped compiling on the main branch.
### Description
This PR enhances unidirectional `FlashAttention` by applying causal masking inside the main loop. This optimization eliminates unnecessary memory loads by avoiding future entries in the KV cache. Testing on Lunar Lake shows up to a 20% performance improvement for `phi-4-mini-accuracy4` (with a prompt of 4096). Similar gains were observed for other models, including `Qwen3-0.6B-accuracy4`. This PR now uses the more readable `unidirectional` attribute instead of `is_gqa` to control causal masking.

### Motivation and Context
See above.
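The loop-bound idea behind this optimization can be sketched in pure Python. This is an illustrative reference model, not the WebGPU kernel: with causal masking, query position `i` only attends to KV positions up to `past_len + i`, so the inner loop can stop early instead of loading future KV entries and masking them out.

```python
# Causal attention reference: the inner loop runs only over visible KV
# positions, skipping future entries entirely.
import math

def causal_attention(q, k, v, past_len=0):
    """q: [new_tokens][d]; k, v: [past_len + new_tokens][d]."""
    out = []
    for i, qi in enumerate(q):
        limit = past_len + i + 1  # exclusive bound: no future KV loads
        scores = [sum(a * b for a, b in zip(qi, k[j])) for j in range(limit)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]  # stable softmax weights
        z = sum(w)
        d = len(v[0])
        out.append([sum(w[j] * v[j][t] for j in range(limit)) / z
                    for t in range(d)])
    return out
```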
…ttr name (microsoft#25565)

### Description
Updates `Node_GetAttributeByName` to return an error status with code `ORT_NOT_FOUND` and set the `attribute` output parameter to `NULL` when called with a non-existing attribute name. Why? Currently, a caller has to do a string comparison on the `OrtStatus` error message to determine whether the attribute does not exist or another error occurred, which is cumbersome. With this change, the caller can just check the error code.

### Motivation and Context
Make it easier to use `Node_GetAttributeByName`.
### Motivation and Context
Fix the data type check to skip optional I/Os. Optional inputs/outputs have an empty name, which is valid ONNX syntax. Without this fix, any model with optional inputs/outputs would fail the check due to missing protobuf fields: we'd hit an error fetching `elem_type` from the protobuf.

```
2025-07-22 11:14:40.117740035 [I:onnxruntime:, qnn_execution_provider.cc:740 GetSupportedNodes] Validation FAILED for 1 nodes in NodeUnit (Pad) : Operator type: Pad Node name: /blocks.4/Pad Node index: 176 REASON : The tensor doesn't have elem_type.
```
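The fix amounts to skipping empty-named entries during validation. This sketch uses hypothetical names, not the actual QNN EP code; the relevant ONNX convention is that an omitted optional input/output is represented by an empty string and carries no type information to validate.

```python
# Sketch of a data type check that tolerates omitted optional I/Os.

def supported_io_types(io_names, type_of, allowed):
    """io_names: node input/output names; type_of: name -> elem type."""
    for name in io_names:
        if name == "":
            continue  # omitted optional I/O: valid ONNX, nothing to check
        if type_of.get(name) not in allowed:
            return False
    return True
```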
…#25581) [QNN-EP] Einsum equation ReduceSum Multiply on broadcast X
### Description
This change fixes a problem when building the Dawn monolithic library: it failed to pick up the correct imported Abseil library. We now split the triplet files per configuration to avoid this problem.
### Description
Update OrtEpFactory in the new EPs to add allocator, data transfer, and stream stubs.
### Description
For f16 uniform variables, use u32 to represent them bit-wise.

### Motivation and Context
Some devices support f16 in shader/storage buffers but not in uniform buffers, and Dawn sets f16_support to false for them. However, we don't necessarily have to use f16 in uniforms. This change, together with microsoft#25349, will enable using f16 models on some Android devices.
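The bit-wise representation can be sketched on the CPU side: two IEEE-754 binary16 values packed into one 32-bit word (low half first), which a WGSL shader could unpack with the built-in `unpack2x16float`. This is an illustrative sketch using Python's `struct` binary16 format, not the actual EP code.

```python
# Pack two half-precision floats into one u32 word, and unpack them back.
import struct

def pack2x16float(a: float, b: float) -> int:
    lo = struct.unpack('<H', struct.pack('<e', a))[0]  # 'e' = binary16
    hi = struct.unpack('<H', struct.pack('<e', b))[0]
    return lo | (hi << 16)

def unpack2x16float(word: int):
    lo = struct.unpack('<e', struct.pack('<H', word & 0xFFFF))[0]
    hi = struct.unpack('<e', struct.pack('<H', (word >> 16) & 0xFFFF))[0]
    return lo, hi
```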
### Description
Add a new `Node_GetTensorAttributeAsOrtValue` API to support attributes of `TENSOR` type. This API returns a const OrtValue that represents the TensorProto in the `TENSOR` attribute.
### Description
~~Remove the out-of-date patch file.~~ Remove the changes to `src/arm/windows/init.c` from the patch file. The changes to `include/cpuinfo.h` are kept.
### Description
- Add a GraphTransformer `WhereDummyDq` to insert a dummy DequantizeLinear on a Where node's initializer input, to form a Node Unit when the Where node has one DQ and one scalar initializer input
- Add a corresponding unit test for the optimization

### Motivation and Context
- To reduce the additional Dequantize and Quantize nodes, we would like to pass `WhereNodeGroupSelector::Check`.
…icrosoft#25590)

### Description
Use the session id to track sessions with LogSessionCreation. If Run is called in different threads, we can differentiate them with the thread id, given that Run is not async.

Co-authored-by: hualxie <[email protected]>
…#25602)

### Description
Add a new API to VitisAI to save the graph as a string.

### Motivation and Context
To support the in-memory flow.

Co-authored-by: yifei <[email protected]>
Add support for bfloat16 in the MoE and qMoE CUDA ops.
### Description
Disable Turing GPU EP devices.

### Motivation and Context
Turing will not be supported in this release. @chilo-ms @jywu-msft
### Description
- Add unit tests for LPBQ fusions on MatMul and Gemm nodes

### Motivation and Context
- This commit adds unit tests to avoid future regressions in LPBQ fusions.
### Description
We have a big packaging pipeline that builds nuget/java/nodejs packages and then runs tests on them. This PR splits the tests into a dedicated pipeline and refactors the code to use Maven to download dependencies instead of direct HTTP fetches. The new approach allows us to use Azure DevOps artifacts as an internal mirror to meet network isolation requirements. This PR also enables the WebGPU and CoreML EP tests for the Java package on macOS, and updates tools/python/run_packaging_pipelines.py a little bit to add support for RC releases.

### Motivation and Context
Make the packaging pipelines smaller and easier to use.
microsoft#25589)

### Description
Cache opSupportLimits in the WebNN backend instead of querying it from the lower layer each time, to improve performance. Also update the trace event in data transfer.

### Motivation and Context
In the current implementation, every call to the ensureTensor API that checks an input/output tensor calls the MLContext.opSupportLimits API to query op support capability from Chromium, and this call is a hotspot. Querying it once when the session is created and caching the result avoids the frequent lower-layer API calls.
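The caching pattern described above can be sketched as follows. This is a Python stand-in for the TypeScript change, with hypothetical names: query the expensive capability once and reuse the cached result on every subsequent check.

```python
# Sketch of query-once caching: the lower-layer call runs at most once,
# no matter how many times the limits are consulted.
class Backend:
    def __init__(self, query_op_support_limits):
        self._query = query_op_support_limits
        self._limits = None
        self.query_count = 0  # instrumentation for the sketch

    def op_support_limits(self):
        if self._limits is None:
            self.query_count += 1
            self._limits = self._query()  # expensive lower-layer call
        return self._limits
```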
…g HTP. (microsoft#25605)

### Description
Lower Gemm with a 2D bias to FC + ElementwiseAdd when targeting HTP.

### Motivation and Context
This change allows Gemm with a 2D bias to stay on HTP instead of falling back to CPU.

Signed-off-by: Mu-Chein Hsu <[email protected]>
It is related to microsoft#25320 and microsoft#23979.
Contributor: Hi @wcy123,

Contributor (Author): Yes, we close this PR to follow the ORT dev process.
Description

It is related to #25320 and #23979. Enable tensor raw data sharing for externalized tensor protos with kTensorProtoMemoryAddressTag.

Motivation and Context

With #25320 and #23979, all initialized tensor protos are associated with an OrtValue; the VitisAI EP needs to adapt to this change.