[java] Auto EP and compile model support #25131

Craigacp · 2025-06-22T17:38:30Z

Description

Java API for compile model and EP discovery APIs. Roughly equivalent to the C# version in #24604.

I haven't quite got the CMake configured so that the Java tests for EP registration only run when the ONNX Runtime shared provider support is built, but everything else works. I expect that to be a quick fix, but I'm not sure under what conditions it should be built or how we should handle it, so I don't know where or when to plumb it through.

Motivation and Context

API parity for Java.

cmake/onnxruntime_unittests.cmake

java/src/main/java/ai/onnxruntime/OrtSession.java

java/src/main/java/ai/onnxruntime/OrtModelCompilationOptions.java

skottmckay · 2025-06-27T08:43:50Z

> I haven't quite got the CMake configured so that the Java tests for EP registration only run when the ONNX Runtime shared provider support is built, but everything else works. I expect that to be a quick fix, but I'm not sure under what conditions it should be built or how we should handle it, so I don't know where or when to plumb it through.

The C++ unit tests only run in a Windows shared lib build, and also exclude minimal builds.

onnxruntime/cmake/onnxruntime_unittests.cmake

Lines 1832 to 1837 in 7a6cef6

    # Build library that can be used with RegisterExecutionProviderLibrary and automatic EP selection
    # We need a shared lib build to use that as a dependency for the test library
    # Currently we only have device discovery on Windows so no point building the test app on other platforms.
    if (WIN32 AND onnxruntime_BUILD_SHARED_LIB AND
        NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten" AND
        NOT onnxruntime_MINIMAL_BUILD)

Craigacp · 2025-06-27T16:28:56Z

> > I haven't quite got the CMake configured so that the Java tests for EP registration only run when the ONNX Runtime shared provider support is built, but everything else works. I expect that to be a quick fix, but I'm not sure under what conditions it should be built or how we should handle it, so I don't know where or when to plumb it through.
>
> The C++ unit tests only run in a Windows shared lib build, and also exclude minimal builds.
>
> onnxruntime/cmake/onnxruntime_unittests.cmake
>
> Lines 1832 to 1837 in 7a6cef6
>
>     # Build library that can be used with RegisterExecutionProviderLibrary and automatic EP selection
>     # We need a shared lib build to use that as a dependency for the test library
>     # Currently we only have device discovery on Windows so no point building the test app on other platforms.
>     if (WIN32 AND onnxruntime_BUILD_SHARED_LIB AND
>         NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten" AND
>         NOT onnxruntime_MINIMAL_BUILD)

Ok, I can lock them off. Should I use conditional compilation in the C binding to make the EP/device listing methods throw OrtException when used on unsupported platforms? It's a bit weird for them to return an empty list.
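To illustrate the two behaviours being weighed here, a minimal Java sketch follows. Every name in it (`DeviceListingSketch`, `UnsupportedPlatformException`, `listOrEmpty`, `listOrThrow`) is hypothetical and not part of the ONNX Runtime Java API, and the `discoverySupported` flag is only a stand-in for the platform check that conditional compilation in the C binding would perform.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch only; none of these names exist in ONNX Runtime. */
final class DeviceListingSketch {

  /** Stand-in for the OrtException the real binding would throw. */
  static final class UnsupportedPlatformException extends Exception {
    UnsupportedPlatformException(String message) {
      super(message);
    }
  }

  private DeviceListingSketch() {}

  /** Stand-in for the native device query; always reports a single "CPU" entry. */
  private static List<String> nativeDevices() {
    return Collections.singletonList("CPU");
  }

  /** Option 1: return an empty list when discovery is unavailable. */
  static List<String> listOrEmpty(boolean discoverySupported) {
    return discoverySupported ? nativeDevices() : Collections.<String>emptyList();
  }

  /** Option 2: throw when discovery is unavailable. */
  static List<String> listOrThrow(boolean discoverySupported)
      throws UnsupportedPlatformException {
    if (!discoverySupported) {
      throw new UnsupportedPlatformException(
          "Device discovery is not supported on this platform");
    }
    return nativeDevices();
  }
}
```

With the first variant a caller cannot tell "no devices found" apart from "discovery is not available on this platform"; the second makes that distinction explicit, at the cost of a checked exception on every call.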

skottmckay · 2025-06-27T21:51:13Z

> Should I use conditional compilation in the C binding to make the EP/device listing methods throw OrtException when used on unsupported platforms? It's a bit weird for them to return an empty list.

We'll be adding device discovery on Linux and macOS platforms very soon (it should be in the next release), so this is probably not necessary.

Craigacp · 2025-06-27T21:57:41Z

> > Should I use conditional compilation in the C binding to make the EP/device listing methods throw OrtException when used on unsupported platforms? It's a bit weird for them to return an empty list.
>
> We'll be adding device discovery on Linux and macOS platforms very soon (it should be in the next release), so this is probably not necessary.

OK, can I get cc'd on those PRs so I can enable the tests on those platforms?

skottmckay · 2025-06-27T21:59:21Z

> > > Should I use conditional compilation in the C binding to make the EP/device listing methods throw OrtException when used on unsupported platforms? It's a bit weird for them to return an empty list.
> >
> > We'll be adding device discovery on Linux and macOS platforms very soon (it should be in the next release), so this is probably not necessary.
>
> OK, can I get cc'd on those PRs so I can enable the tests on those platforms?

@edgchen1 Who will be adding them?

Craigacp · 2025-08-08T22:10:17Z

I've improved the javadoc a little and made the onnxruntime_providers_shared library an unconditional part of the Java build. Previously we only copied that in if an EP required it, but now with the dynamic loading of EPs it needs to always be there.

Craigacp · 2025-08-08T22:12:09Z

The EP tests are still only enabled on Windows. Let me know if support has been added for other platforms; enabling the tests there is a quick fix.

Craigacp · 2025-08-19T14:01:42Z

Enabling the shared library copy caused the macOS and Android builds to fail. I'm not sure why it's not supported on macOS, but I can easily lock off the Android one. I had assumed that if macOS was getting EP discovery support, the shared EP lib would be built automatically?

cmake/onnxruntime_java.cmake

Craigacp · 2025-08-22T03:34:35Z

It doesn't look like the 2 failing checks are due to this PR; they seem to be failing well before anything I've changed is touched.

edgchen1 · 2025-08-22T15:54:21Z

/azp run Windows x64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows ARM64 QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Linux QNN CI Pipeline

azure-pipelines · 2025-08-22T15:54:41Z

Azure Pipelines successfully started running 5 pipeline(s).

Craigacp · 2025-08-22T19:04:03Z

Those failures were a javadoc error which I've fixed. Not sure how the other tests didn't catch it.

edgchen1 · 2025-08-22T19:55:54Z

/azp run Windows x64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows ARM64 QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Linux QNN CI Pipeline

azure-pipelines · 2025-08-22T19:56:15Z

Azure Pipelines successfully started running 5 pipeline(s).

Craigacp · 2025-08-29T14:44:28Z

@edgchen1 please can I get this merged in? I'd prefer to make a fresh small PR for the Java bits of #25878 rather than change this one again given we've got all the tests green now. The Java bits of #25878 will need this PR to land to add the OrtEpDevice Java implementation.

java/src/main/java/ai/onnxruntime/OrtEpDevice.java

@skottmckay

* [CPU] Optimize GQA attention bias application for FP16 (microsoft#25871)

### Description

When using attention bias input for GQA op with FP16, on the platforms that don't natively support FP16 math a cast to fp32 needs to be performed, and thus a temporary buffer needs to be created to store the fp32 values. The issue is that this temporary buffer was being allocated / deallocated inside of a loop for every token being processed. Refactored the implementation so that the allocation takes place only once. Phi model throughput increased by 15%.

* Fixes for DynamicQuantizeMatMul and Attention3D tests (microsoft#25814)

### Description

This change fixes correctness issues in two areas that were causing failures in onnxruntime_test_all:

- DynamicQuantizeMatMul.WithConstantBInputs
- AttentionTest.Attention3DDefault
- AttentionTest.Attention3DWithPastAndPresentQkMatmul

What was wrong and how it’s fixed

1) DynamicQuantizeMatMul.WithConstantBInputs

- Root cause: The Kleidi dynamic quantization GEMM path could be selected even when the B scales contained values such as (zero, negative, or non-finite). That violates kernel assumptions and can lead to incorrect results.
- Fix: In `onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc`, we now explicitly validate that all B scales are finite and strictly positive before enabling the Kleidi/MLAS dynamic path. If any scale is invalid, we disable that path.

2) Attention tests (Attention3DDefault, Attention3DWithPastAndPresentQkMatmul)

- Root causes in `onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp`:
  - Incorrect handling of GEMM corner cases for alpha/beta and K==0 (e.g., not respecting C = beta*C when alpha==0 or K==0).
  - Unnecessary or premature fallbacks for small shapes.
- Fixes:
  - Add early-outs for degenerate sizes: if M==0 or N==0, return handled.
  - Correctly implement alpha/beta semantics:

---------

Signed-off-by: Jonathan Clohessy <[email protected]>

* Fix MoE CPP tests (microsoft#25877)

This change adds skip test for QMoE CPU tests when running on TensorRT or CUDA EP. In the QMoE kernel there was a memory overwrite bug in the accumulate part, updated that and this fixed the python tests back

* [c++] Eliminate dynamic initialization of static Ort::Global<void>::api_ (microsoft#25741)

### Description

Delay the call to `OrtGetApiBase()` until the first call to `Ort::GetApi()` so that `OrtGetApiBase()` is typically called after dynamic library loading.

### Motivation and Context

When ORT_API_MANUAL_INIT is not defined (which is the default), the static `Ort::Global<void>::api_` has a dynamic initializer that calls `OrtGetApiBase()->GetApi(ORT_API_VERSION)`

This dynamic initialization can cause problems when it interacts with other global/static initialization. On Windows in particular, it can also cause deadlocks when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to load any other libraries.

  * Replace the templated `Global<void>::api_` with an inline static initialized to nullptr.
  * `Ort::GetApi()` now calls `detail::Global::GetApi()` which calls `detail::Global::DefaultInit()` if initialization is needed.
  * When `ORT_API_MANUAL_INIT` is defined, `DefaultInit()` returns nullptr, which will eventually cause the program to crash. The callers have violated the initialization contract by not calling one of the `Ort::InitApi` overloads.
  * When `ORT_API_MANUAL_INIT` is not defined, `DefaultInit()` uses a function-level static to compute the result of `OrtGetApiBase()->GetApi(ORT_API_VERSION)` once and return it.
  * `Ort::Global<void>` has been replaced with a non-templated type and moved inside a `detail` namespace. Since the `Global<void>` object was documented as being used internally, it is believed that these changes here are non-breaking, as they do not impact a public API. The public APIs, `Ort::InitApi()` and `Ort::InitApi(const OrtApi*)` remain unchanged.
  * Add `#pragma detect_mismatch` to surface issues with compilation units that disagree on how ORT_API_MANUAL_INIT is defined. (MSVC only.)

---------

Co-authored-by: Copilot <[email protected]>

* python GPU IO Bindings for NVIDIA (microsoft#25776)

### Description

1. A Small change to use the shared allocator in Python binding.
2. Remove the FP64 support from the EP.

### Motivation and Context

The Python GPU IO binding is necessary for performance. The change will enable the shared allocator for GPU allocation. The FP64 was using the FP32 inference, aligned WRT TRT RTX support.

---------

Co-authored-by: Gaurav Garg <[email protected]>

* [CANN] Add a `enable_cann_subgraph` feature parameter (microsoft#25867)

### Description

Add a `enable_cann_subgraph` feature parameter. this parameter controls whether graph splitting is performed and can help quickly identify issues in certain scenarios.

* [EP ABI] Add OpAttr_GetTensorAttributeAsOrtValue and replace the existing Node_GetTensorAttributeAsOrtValue (microsoft#25886)

### Description

Replace `Node_GetTensorAttributeAsOrtValue` with `OpAttr_GetTensorAttributeAsOrtValue`. Change the API signature to make it one of the `OpAttr` interfaces instead of the `OrtNode` interface.

The original API was added [here](microsoft#25566).

* Language bindings for model compatibility API (microsoft#25878)

### Description

This change builds on top of microsoft#25841 , and adds the scaffolding necessary to call into this API from C++ / C# / Python.

### Motivation and Context

microsoft#25454 talks more about the broader notion of precompiled model compatibility. This change is directed at app developers whose apps may want to determine if a particular precompiled model (e.g. on a server somewhere) is compatible with the device where the application is running. There is functionality in `OrtEpFactory` for making this determination, which was exposed as a C API in microsoft#25841, and this change makes the API more broadly available in other languages.

### Testing and Validation

Introduced new unit test cases across each language, and verified that the API was being called and returned the correct result for the default CPU EP.

---------

Co-authored-by: Aditya Rastogi <[email protected]>

* [QNN-EP] Introduce Level1 Transformer into qnn.preprocess (microsoft#25883)

### Description

- Introduce Level1 Transformer into qnn.preprocess to support various optimizations.

### Motivation and Context

- This change brings in several useful optimizations such as `ConvBnFusion` and `ConstantFolding`, which are part of `TransformerLevel::Level1` and can benefit QNNEP.
- The goal is to optimize the ONNX model before quantization by integrating these passes into the Python tooling workflow.

* [QNN EP] Minor fix weight name missing when not valid QDQ node group (microsoft#25887)

### Description

Minor fix weight name missing when not valid QDQ node group

### Motivation and Context

Some quantized model failed QDQ node group validation, the weights then won't be folded as initializer. QNN EP failed to handle the dynamic weights here due to the transpose op input name look up. This change make sure we process the weights tensor before adding transposes.

* Add custom ops library_path to EP metadata (microsoft#25830)

## Summary

Adds EP metadata library path support to enable custom ops DLL registration with proper path resolution.

## Changes

- Added `library_path` metadata key to EP metadata infrastructure
- Pass resolved library path directly to `EpLibraryProviderBridge` constructor
- Simplified implementation per reviewer feedback (removed virtual method complexity)
- Added `#include <utility>` for std::move compliance

## Purpose

Enables downstream applications (like onnxruntime-genai) to resolve relative custom ops library paths using EP metadata, improving DLL registration reliability.

## Files Modified

- `plugin_ep/ep_factory_provider_bridge.h`
- `plugin_ep/ep_library.h`
- `plugin_ep/ep_library_plugin.h`
- `plugin_ep/ep_library_provider_bridge.cc`
- `plugin_ep/ep_library_provider_bridge.h`
- `utils.cc`

* [OVEP] OpenVINO EP Features and bug-fixes for ORT-1.23 (microsoft#25884)

### Description

This update introduces multiple improvements, fixes, and feature enhancements to the OpenVINO Execution Provider (OVEP) and related components in ONNX Runtime:

#### Configuration & Properties

- Updated load_config mapping to act as a passthrough to OpenVINO properties.
- Added support for providing layout information to inputs/outputs in OpenVINO.

#### Inference & Tensor Handling

- Improved OVInferRequest::SetTensor to correctly handle cached binding shape mismatches.
- Added support for self-detecting on-the-fly bfloat16 → float16 conversion.
- Fixed issues with input ONNX models when used with shared execution contexts.

#### Model Handling & Operator Support

- Fixed model copying behavior for QDQ stripping.
- Updated operator support status for OpenVINO 2025.2.

#### Platform & Integration Fixes

- Applied multiple PSU Lora fixes and related updates.
- Resolved filename confusion issues with wrapped OVIRs in EPCtx.
- Enabled memory-mapped native binaries for OpenVINO 2025.3.

#### Quality & Maintenance

- Addressed linting issues.
- Fixed coverage gaps in OVEP.
- Added a new test script for OpenVINO with ORT ABI integration.

---------

Co-authored-by: Ankit Maheshkar <[email protected]>
Co-authored-by: Ryan Metcalfe <[email protected]>
Co-authored-by: Klimenko, Mikhail <[email protected]>
Co-authored-by: sfatimar <[email protected]>
Co-authored-by: Garth Long <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: MayureshV1 <[email protected]>
Co-authored-by: Eric Crawford <[email protected]>
Co-authored-by: jatinwadhwa921 <[email protected]>
Co-authored-by: Vishnudas Thaniel S <[email protected]>
Co-authored-by: Javier Martinez <[email protected]>

* [java] Auto EP and compile model support (microsoft#25131)

### Description

Java API for compile model and EP discovery APIs. Roughly equivalent to the C# version in microsoft#24604.

cc: @skottmckay.

I haven't quite got the CMake configured so the Java tests for the ep registration only run when the ONNX Runtime shared provider support is built, but everything else works. I expect that to be a quick fix, but I'm not sure in what conditions it should be built and how we should handle it so I don't know where/when to plumb it through.

### Motivation and Context

API parity for Java.

* Add error handling to extract_nuget_files.ps1 (microsoft#25866)

### Description

1. Check process exit code when running 7z.exe . Currently the errors were silently ignored.
2. Add snld20 flag to the 7z.exe commands, which is needed to be compatible with the latest 7z release.

* [Fix] illegal memory access in GetInputIndices with optional inputs (microsoft#25881)

### Description

Fix illegal memory access in GetInputIndices with optional inputs

### Motivation and Context

When an input is optional, its ValueInfo may be nullptr. The current implementation directly calls InputValueInfo->GetName(), leading to illegal memory access. Update logic to skip optional inputs when valueInfo is nullptr .

* Re-enable cpuinfo for ARM64EC (microsoft#25863)

### Description

Re-enable cpuinfo for ARM64EC build and fix `CPUIDINFO_ARCH_ARM` so it is actually used. Patch cpuinfo to support vcpkg ARM64EC build. See pytorch/cpuinfo#324.

### Motivation and Context

Fix for workaround in microsoft#25831.

---------

Signed-off-by: Jonathan Clohessy <[email protected]>
Co-authored-by: derdeljan-msft <[email protected]>
Co-authored-by: Jonathan Clohessy <[email protected]>
Co-authored-by: Akshay Sonawane <[email protected]>
Co-authored-by: Christopher Warrington <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Ishwar Raut <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
Co-authored-by: Xinpeng Dou <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: adrastogi <[email protected]>
Co-authored-by: Aditya Rastogi <[email protected]>
Co-authored-by: qti-hungjuiw <[email protected]>
Co-authored-by: qti-yuduo <[email protected]>
Co-authored-by: Pradeep Sakhamoori <[email protected]>
Co-authored-by: Preetha Veeramalai <[email protected]>
Co-authored-by: Ankit Maheshkar <[email protected]>
Co-authored-by: Ryan Metcalfe <[email protected]>
Co-authored-by: Klimenko, Mikhail <[email protected]>
Co-authored-by: sfatimar <[email protected]>
Co-authored-by: Garth Long <[email protected]>
Co-authored-by: MayureshV1 <[email protected]>
Co-authored-by: Eric Crawford <[email protected]>
Co-authored-by: jatinwadhwa921 <[email protected]>
Co-authored-by: Vishnudas Thaniel S <[email protected]>
Co-authored-by: Javier Martinez <[email protected]>
Co-authored-by: Adam Pocock <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: mingyue <[email protected]>
Co-authored-by: Edward Chen <[email protected]>


…evices (#26028)

### Description

Adds the Java bits mirroring #25878, and renames a few things in #25131 for uniformity with the other APIs.

### Motivation and Context

Java API parity.

Craigacp added 5 commits June 10, 2025 17:32

Java bits for EP selection and compile API.

e2770f5

Building out JNI for OrtHardwareDevice and OrtEpDevice.

5bc9491

Finished OrtModelCompilationOptions.

197d245

Simplifying the getEpDevices JNI.

d553c4f

Fixing EP detection on Windows.

1f8c5d5

skottmckay reviewed Jun 27, 2025

cmake/onnxruntime_unittests.cmake

java/src/main/java/ai/onnxruntime/OrtSession.java

skottmckay reviewed Jun 27, 2025

java/src/main/java/ai/onnxruntime/OrtModelCompilationOptions.java

Fixing shared lib copying and adding more Javadoc.

acea9f1

Fix spotless errors.

fd5044c

edgchen1 reviewed Aug 19, 2025

cmake/onnxruntime_java.cmake (outdated)

Guarding the shared library copy in Java.

252fdfb

Fixing javadoc.

d8bfbed

Craigacp mentioned this pull request Aug 28, 2025

Language bindings for model compatibility API #25878

Merged

edgchen1 reviewed Aug 29, 2025

java/src/main/java/ai/onnxruntime/OrtEpDevice.java

edgchen1 approved these changes Aug 29, 2025

edgchen1 merged commit c9bdbd7 into microsoft:main Aug 29, 2025
86 checks passed

Craigacp deleted the java-auto-ep branch August 29, 2025 19:41

Craigacp mentioned this pull request Sep 12, 2025

[Java] Add OrtCompiledModelCompatibility and minor updates for OrtEpDevices #26028

Merged
