Bugfix vitisai ep model clone with 23979 25320 #25654
Closed: wcy123 wants to merge 34 commits into microsoft:rel-1.23.0 from wcy123:bugfix-vitisai-ep-model-clone-with-23979-25320
Conversation
DynamicQuantizeMatMul: handle case where B zero point input is provided but not constant (microsoft#25544)

### Description
In the DynamicQuantizeMatMul KleidiAI-specific prepacking logic, handle the case where the B zero point input is provided but not constant. In this case, we should not prepack. Add unit tests that exercise the prepacking code path. Add a check for Arm SME instructions in DynamicQuantizeMatMul before calling `MlasDynamicQGemmBatch()` and associated functions.

### Motivation and Context
Follow-up to microsoft#25187.
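The guard described above can be sketched as a small predicate. This is an illustrative sketch, not the actual ORT implementation; the function and parameter names are hypothetical.

```python
# Hypothetical sketch of the prepacking guard: prepack the B tensor only when
# every quantization input we depend on is known (constant) at session
# initialization time.

def can_prepack_b(b_is_constant: bool,
                  b_zero_point_provided: bool,
                  b_zero_point_is_constant: bool) -> bool:
    """Return True only if prepacking at session-initialization time is safe."""
    if not b_is_constant:
        return False  # B itself may change between Run() calls
    if b_zero_point_provided and not b_zero_point_is_constant:
        # The zero point exists but is a runtime input: its value is unknown
        # at prepack time, so prepacked data could be wrong.
        return False
    return True
```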
### Description
Fix the build break on Windows+Ninja.
### Description
Fixes the packaging pipeline.

Co-authored-by: Copilot <[email protected]>
This PR uses the existing run option `gpu_graph_id` to control whether to skip graph capture. When the WebGPU EP option `enableGraphCapture` is enabled, setting `gpu_graph_id = -1` in RunOptions skips graph capture; otherwise, each `session.run` goes down the graph capture path. If `gpu_graph_id` is not specified in RunOptions, the value of `enableGraphCapture` determines whether to take the graph capture path.
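The decision logic described above can be sketched as a small pure function. This is an illustrative stand-in, not the actual WebGPU EP code; the function name is hypothetical.

```python
# Sketch of the graph-capture decision described in the PR text.
from typing import Optional

def should_capture_graph(enable_graph_capture: bool,
                         gpu_graph_id: Optional[int]) -> bool:
    if not enable_graph_capture:
        return False           # EP option disabled: never capture
    if gpu_graph_id is None:
        return True            # not specified: respect enableGraphCapture
    return gpu_graph_id != -1  # -1 explicitly skips graph capture
```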
### Description
Refactor to split out classes and make things easier to find.

### Motivation and Context
Cleanup.
Remove the python installation steps from win-qnn-arm64-ci-pipeline.yml (microsoft#25552)

### Description
Yesterday I updated the machine images; they now have Python preinstalled, so we no longer need these steps. Remove them to avoid conflicts. Also refactor the YAML file a little bit: templates now use parameterized Python versions instead of a matrix strategy.
Additional equation support for the einsum op on QNN EP.
- **DynamicQuantizeMatMul - handle case where B zero point input is provided but not constant. (microsoft#25544)**
- **Refactor plugin EP support (microsoft#25541)**
- **Remove the python installation steps from win-qnn-arm64-ci-pipeline.yml (microsoft#25552)**
### Description
This change is based on microsoft#25135. Upgrade XNNPACK and several related third-party dependencies, including pthreadpool, cpuinfo, and KleidiAI. This change also updates the XNNPACK execution provider code to accommodate changes in the XNNPACK API. Average pooling qu8 is removed, as the corresponding microkernel no longer seems to exist in XNNPACK.
…te (microsoft#25553)

This PR fixes webgpu_fix_frame_generator by adding the present mode to the surface configuration. This new attribute is required by the latest Dawn to render frames.
### Description
This implements the SwiGLU activation for MoE and qMoE. The activation corresponds to https://github.com/triton-lang/triton/blob/main/python/triton_kernels/triton_kernels/swiglu.py. Also update test_parity_moe.py to enable the qMoE test in CI pipelines.

### Motivation and Context
This is a naive implementation of the activation. Since the activation reduces each row length to half, we cannot directly use the epilogue. The current implementation needs an extra buffer to run the SwiGLU kernel. In the future, we might look at alternatives that do not need the extra buffer.
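A minimal pure-Python sketch illustrates why the row length halves (the gate half is consumed), which is why a plain epilogue cannot be used and an extra buffer is needed. This uses the common `silu(x1) * x2` formulation; the exact variant in the referenced Triton kernel (interleaving, alpha scaling, limit clamping) may differ.

```python
# Sketch of a SwiGLU-style gated activation: the input row is split in half,
# one half gates the other, and the output row is half the input length.
import math

def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))  # x * sigmoid(x)

def swiglu(row):
    n = len(row) // 2
    gate, linear = row[:n], row[n:]
    return [silu(g) * x for g, x in zip(gate, linear)]
```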
### Description
Fixes a documentation error in onnxruntime_c_api.h: parameter name mismatch for `Graph_GetGraphView`.

### Motivation and Context
Fix errors in the GitHub action that generates the C/C++ documentation from the public header files.
…me (microsoft#25516) After cherry-picking from win-onnxruntime (microsoft#25481), the MIGraphX EP stopped compiling on the main branch.
### Description
This PR enhances unidirectional `FlashAttention` by applying causal masking inside the main loop. This optimization eliminates unnecessary memory loads by avoiding future entries in the KV cache. Testing on Lunar Lake shows up to a 20% performance improvement for `phi-4-mini-accuracy4` (with a prompt of 4096). Similar gains were observed for other models, including `Qwen3-0.6B-accuracy4`. This PR now uses the more readable `unidirectional` attribute instead of `is_gqa` to control causal masking.

### Motivation and Context
See above.
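The loop-bound idea behind this optimization can be sketched in pure Python. This is an illustrative reference model, not the WebGPU kernel: with causal masking, query position `i` only attends to KV positions up to `past_len + i`, so the inner loop can stop early instead of loading future KV entries and masking them out.

```python
# Causal attention reference: the inner loop runs only over visible KV
# positions, skipping future entries entirely.
import math

def causal_attention(q, k, v, past_len=0):
    """q: [new_tokens][d]; k, v: [past_len + new_tokens][d]."""
    out = []
    for i, qi in enumerate(q):
        limit = past_len + i + 1  # exclusive bound: no future KV loads
        scores = [sum(a * b for a, b in zip(qi, k[j])) for j in range(limit)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]  # stable softmax weights
        z = sum(w)
        d = len(v[0])
        out.append([sum(w[j] * v[j][t] for j in range(limit)) / z
                    for t in range(d)])
    return out
```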
…ttr name (microsoft#25565)

### Description
Updates `Node_GetAttributeByName` to return an error status with code `ORT_NOT_FOUND` and set the `attribute` output parameter to `NULL` when called with a non-existing attribute name. Why? Currently, a caller has to do a string comparison on the `OrtStatus` error message to determine whether the attribute does not exist or another error occurred, which is cumbersome. With this change, the caller can just check the error code.

### Motivation and Context
Make it easier to use `Node_GetAttributeByName`.
### Motivation and Context
Fix the data type check to skip optional I/Os. Optional inputs/outputs have an empty name, which is valid ONNX syntax. Without this fix, any model with optional inputs/outputs would fail the check due to missing protobuf fields: we'd hit an error fetching `elem_type` from the protobuf.

```
2025-07-22 11:14:40.117740035 [I:onnxruntime:, qnn_execution_provider.cc:740 GetSupportedNodes] Validation FAILED for 1 nodes in NodeUnit (Pad) : Operator type: Pad Node name: /blocks.4/Pad Node index: 176 REASON : The tensor doesn't have elem_type.
```
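The fix amounts to skipping empty-named entries during validation. This sketch uses hypothetical names, not the actual QNN EP code; the relevant ONNX convention is that an omitted optional input/output is represented by an empty string and carries no type information to validate.

```python
# Sketch of a data type check that tolerates omitted optional I/Os.

def supported_io_types(io_names, type_of, allowed):
    """io_names: node input/output names; type_of: name -> elem type."""
    for name in io_names:
        if name == "":
            continue  # omitted optional I/O: valid ONNX, nothing to check
        if type_of.get(name) not in allowed:
            return False
    return True
```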
…#25581) [QNN-EP] Einsum equation ReduceSum Multiply on broadcast X
### Description
This change fixes a problem when building the Dawn monolithic library: it failed to pick up the correct imported Abseil library. We now split the triplet files per configuration to avoid this problem.
### Description
Update OrtEpFactory in the new EPs to add allocator, data transfer, and stream stubs.
### Description
For f16 uniform variables, use u32 to represent them bit-wise.

### Motivation and Context
Some devices support f16 in shader/storage buffers but not in uniform buffers, and Dawn sets f16_support to false for them. However, we don't necessarily have to use f16 in uniforms. This change, together with microsoft#25349, will enable using f16 models on some Android devices.
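The bit-wise representation can be sketched on the CPU side: two IEEE-754 binary16 values packed into one 32-bit word (low half first), which a WGSL shader could unpack with the built-in `unpack2x16float`. This is an illustrative sketch using Python's `struct` binary16 format, not the actual EP code.

```python
# Pack two half-precision floats into one u32 word, and unpack them back.
import struct

def pack2x16float(a: float, b: float) -> int:
    lo = struct.unpack('<H', struct.pack('<e', a))[0]  # 'e' = binary16
    hi = struct.unpack('<H', struct.pack('<e', b))[0]
    return lo | (hi << 16)

def unpack2x16float(word: int):
    lo = struct.unpack('<e', struct.pack('<H', word & 0xFFFF))[0]
    hi = struct.unpack('<e', struct.pack('<H', (word >> 16) & 0xFFFF))[0]
    return lo, hi
```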
### Description
Add a new `Node_GetTensorAttributeAsOrtValue` API to support attributes of `TENSOR` type. This API returns a const OrtValue that represents the TensorProto in the `TENSOR` attribute.
### Description
~~Remove the out-of-date patch file.~~ Remove the changes to `src/arm/windows/init.c` from the patch file. The changes to `include/cpuinfo.h` are kept.
### Description
- Add a GraphTransformer `WhereDummyDq` to insert a dummy DequantizeLinear on a Where node's initializer input, to form a Node Unit when the Where node has one DQ and one scalar initializer input
- Add a corresponding unit test for the optimization

### Motivation and Context
- To reduce the additional Dequantize and Quantize nodes, we would like to pass `WhereNodeGroupSelector::Check`.
…icrosoft#25590)

### Description
Use the session id to track sessions with LogSessionCreation. If Run is called in different threads, we can differentiate them with the thread id, given that Run is not async.

Co-authored-by: hualxie <[email protected]>
…#25602)

### Description
Add a new API to VitisAI to save the graph as a string.

### Motivation and Context
To support the in-memory flow.

Co-authored-by: yifei <[email protected]>
Add support for bfloat16 in the MoE and qMoE CUDA ops.
### Description
Disable Turing GPU EP devices.

### Motivation and Context
Turing will not be supported in this release. @chilo-ms @jywu-msft
### Description
- Add unit tests for LPBQ fusions on MatMul and Gemm nodes

### Motivation and Context
- This commit adds unit tests to avoid future regressions in LPBQ fusions.
### Description
We have a big packaging pipeline that builds nuget/java/nodejs packages and then runs tests on them. This PR splits the tests into a dedicated pipeline and refactors the code to use Maven to download dependencies instead of direct HTTP fetches. The new approach allows us to use Azure DevOps artifacts as an internal mirror to meet network isolation requirements. This PR also enables the WebGPU and CoreML EP tests for the Java package on macOS, and updates tools/python/run_packaging_pipelines.py a little bit to add support for RC releases.

### Motivation and Context
Make the packaging pipelines smaller and easier to use.
microsoft#25589)

### Description
Cache opSupportLimits in the WebNN backend instead of querying it from the lower layer each time, to improve performance. Also update the trace event in data transfer.

### Motivation and Context
In the current implementation, every call to the ensureTensor API that checks an input/output tensor calls the MLContext.opSupportLimits API to query op support capability from Chromium, and this call is a hotspot. Querying it once when the session is created and caching the result avoids the frequent lower-layer API calls.
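The caching pattern described above can be sketched as follows. This is a Python stand-in for the TypeScript change, with hypothetical names: query the expensive capability once and reuse the cached result on every subsequent check.

```python
# Sketch of query-once caching: the lower-layer call runs at most once,
# no matter how many times the limits are consulted.
class Backend:
    def __init__(self, query_op_support_limits):
        self._query = query_op_support_limits
        self._limits = None
        self.query_count = 0  # instrumentation for the sketch

    def op_support_limits(self):
        if self._limits is None:
            self.query_count += 1
            self._limits = self._query()  # expensive lower-layer call
        return self._limits
```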
…g HTP. (microsoft#25605)

### Description
Lower Gemm with a 2D bias to FC + ElementwiseAdd when targeting HTP.

### Motivation and Context
This change allows Gemm with a 2D bias to stay on HTP instead of falling back to CPU.

Signed-off-by: Mu-Chein Hsu <[email protected]>
It is related to microsoft#25320 and microsoft#23979.
Contributor: Hi @wcy123,

Contributor (Author): Yes, we close this PR to follow the ORT dev process.
Description

It is related to #25320 and #23979. Enable tensor raw data sharing for externalized tensor protos with kTensorProtoMemoryAddressTag.

Motivation and Context

With #25320 and #23979, all initialized tensor protos are associated with an OrtValue; the VitisAI EP needs to adapt to this change.