Closed
7 changes: 7 additions & 0 deletions include/onnxruntime/core/framework/execution_provider.h
@@ -317,6 +317,13 @@
virtual common::Status Compile(const std::vector<FusedNodeAndGraph>& fused_nodes_and_graphs,
std::vector<NodeComputeInfo>& node_compute_funcs);

// TODO: add documentation comment.

Check warning on line 320 in include/onnxruntime/core/framework/execution_provider.h (GitHub Actions / Optional Lint C++): [cpplint] Missing username in TODO; it should look like "// TODO(my_username): Stuff." [readability/todo] [2]
virtual common::Status GetCompiledModelCompatibility(const onnxruntime::GraphViewer& /*graph_viewer*/,
Contributor

@skottmckay quick heads-up regarding this PR, which we're trying to eventually take for 1.23: there's a requirement to ensure apps can eventually figure out whether a given compiled model is compatible with the underlying device, which implies a new method for EP implementors to fill in on the ABI (see #25313). @adrianlizarraga and I have been riffing on this; he has a draft stood up here, which I am going to try to carry forward. I wanted to make sure you were aware of this and to ask whether you have any questions or concerns. cc: @jywu-msft

Contributor

Mainly have a high-level question about what is going to be the input to determine if the model is compatible. Is it only info in the EPContext nodes or would model metadata potentially be involved as well?

If we also want to support a use-case where we can determine validity prior to downloading the model, does that involve the same input or different? I assume for that scenario we'd need to query the OrtEpFactory.

And with that in mind I'm wondering whether there should be some more structure around getting/storing this info in the Compile stage.

e.g. there's a function in OrtEp to return a string for the compatibility info and ORT calls that as part of Compile. ORT stores that in some well-defined place in model metadata. ORT uses that model metadata to query the factory and/or EP at runtime to determine if it can run the model.

The EP controls what is in the info, but we control where it is stored in order to provide a consistent way to check compatibility without necessarily needing the model. The compatibility metadata could be in the catalog to support a check prior to model download.
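To make the proposal above concrete, here is a minimal sketch of the "EP owns the content, ORT owns the location" idea. Everything in it is an assumption for illustration: `ModelMetadata`, `CompatibilityKey`, the `ep_compatibility_info.` key prefix, and both helper functions are hypothetical and are not part of ORT or of this PR.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a model's key/value metadata.
using ModelMetadata = std::map<std::string, std::string>;

// Hypothetical well-known key. The thread does not settle on a name or location.
inline std::string CompatibilityKey(const std::string& ep_name) {
  assert(!ep_name.empty());
  return "ep_compatibility_info." + ep_name;
}

// Compile stage: ORT asks the EP for an opaque string and stores it in a fixed place.
inline void StoreCompatibilityInfo(ModelMetadata& metadata, const std::string& ep_name,
                                   const std::string& ep_info) {
  metadata[CompatibilityKey(ep_name)] = ep_info;
}

// Later: ORT (or a catalog service) reads the string back without needing the graph.
inline std::optional<std::string> LoadCompatibilityInfo(const ModelMetadata& metadata,
                                                        const std::string& ep_name) {
  const auto it = metadata.find(CompatibilityKey(ep_name));
  if (it == metadata.end()) return std::nullopt;
  return it->second;
}
```

The string stays opaque to ORT in this sketch; only the EP (or its factory) would interpret it.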

Contributor

@adrastogi, Jul 11, 2025

> what is going to be the input to determine if the model is compatible. Is it only info in the EPContext nodes or would model metadata potentially be involved as well?

This is something the EP would dictate (the thought was that they'd want to at least have the graph as an input into that decision in case there is EPContext info that would factor into it, hence the argument to the API). That said, I am not sure if EP implementors will need more than this for howsoever they are making the determination. An open item is whether we can get more feedback from them on this.

When we were thinking through the design, the thought was that we'd get this ABI change in (given the schedule), and then fast-follow it with changes to the session creation flow where ORT would check if the supplied model is compiled, and if so, ask the relevant EPs to determine compatibility. And then as a possible path forward, we could expose a new flag in the session options for a caller to indicate how they want to handle the case where the model is compiled but suboptimal (e.g., continue with using the model; or bail out of the session so that they can try to recompile, assuming they can get to the original model; @adrianlizarraga also had some ideas on doing this via a callback as well).
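As a rough illustration of the session-option idea, the sketch below maps a compatibility result plus a caller-chosen policy to a proceed/fail decision. The enum values are copied from the diff in this PR; `SuboptimalModelPolicy` and `ShouldCreateSession` are hypothetical names that do not exist in ORT, and the callback variant mentioned above is not shown.

```cpp
#include <cassert>

// Mirrors the enum added in this PR (values copied from the diff).
enum OrtCompiledModelCompatibility {
  OrtCompiledModelCompatibility_SUPPORT_UNKNOWN = 0,
  OrtCompiledModelCompatibility_SUPPORTED_OPTIMAL,
  OrtCompiledModelCompatibility_SUPPORTED_PREFER_RECOMPILATION,
  OrtCompiledModelCompatibility_UNSUPPORTED,
};

// Hypothetical session option: how to treat a compiled model that works but is suboptimal.
enum class SuboptimalModelPolicy { kContinue, kFailSession };

// Returns true if session creation should proceed with the supplied compiled model.
inline bool ShouldCreateSession(OrtCompiledModelCompatibility compatibility,
                                SuboptimalModelPolicy policy) {
  switch (compatibility) {
    case OrtCompiledModelCompatibility_UNSUPPORTED:
      return false;  // never usable, regardless of policy
    case OrtCompiledModelCompatibility_SUPPORTED_PREFER_RECOMPILATION:
      return policy == SuboptimalModelPolicy::kContinue;  // caller decides
    case OrtCompiledModelCompatibility_SUPPORTED_OPTIMAL:
    case OrtCompiledModelCompatibility_SUPPORT_UNKNOWN:
      return true;  // unknown is treated as "try it" in this sketch
  }
  return true;
}
```

Treating `SUPPORT_UNKNOWN` as "proceed" is a design choice of the sketch, not something the thread has decided.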

> If we also want to support a use-case where we can determine validity prior to downloading the model, does that involve the same input or different? I assume for that scenario we'd need to query the OrtEpFactory.

This is basically the insight I gleaned from talking with @jonwis about this the other day. I was originally thinking about this PR in terms of the scenario where the app has a model in hand and wants to figure out whether it's compatible with the system where its code is running. However, that will not help solve the 'catalog' use-case you're describing, where the app wants to pick the right model from a set of options on a server somewhere. I had filed #25312 to try to solve the catalog scenario, with the thinking that the driver version was enough info to construct the identifier the app could use to pick the appropriate model from the catalog. I later learned that 1) the driver version doesn't tell the whole story, and 2) the issue as written is incomplete and does not address the code the app would then write to do the check; that would likely have to be done via a new API in ORT.

I was mentally considering these two use cases as separate things, but it's seeming like they are two sides of the same coin.

> I'm wondering whether there should be some more structure around getting/storing this info in the Compile stage.

I think you are ultimately right. I'm relatively new to the ORT codebase, so I'll need to spend some time unpacking the approach you sketched out in the second half of your message. There's also the little wrinkle around schedule physics, and what we can conceivably do in the current milestone vs. what might have to be deferred to later, and how to make sure both parts are aligned. If you have feedback / advice here, please let me know!

> e.g. there's a function in OrtEp to return a string for the compatibility info and ORT calls that as part of Compile. ORT stores that in some well-defined place in model metadata.

A string seems like the simplest representation of that compatibility info, so that makes sense. In terms of the well-defined place: I am still learning about the possible places where such data could be written out. I'm seeing the world through EPContext-colored lenses at the moment (i.e., so the operator schema would need to be updated to support this if that's the best place). Are there others that would be more appropriate?

> The EP controls what is in the info, but we control where it is stored in order to provide a consistent way to check compatibility without necessarily needing the model.

So maybe this implies that the proposed ABI method should be updated to take a string (e.g., const char*) of the compatibility info instead of a graph? For convenience, we would need a separate API for catalog-use-case applications to call, which would leverage the ABI methods. I'm still thinking about what that would look like (I suppose the apps could also get the EPs and call the method on them directly, but an API would encapsulate that better).

Contributor

Can we split it into Generate and Validate functions?

Instead of having GetCompiledModelCompatibility check and return status, we have something like 'GenerateCompatibilityInfo', which takes the graph (and may also need model metadata or a way to get it via the graph), and a 'ValidateCompatibility'.

When running the compile we call the 'generate' and save that info in say model metadata.
When validating, if we have the info (from model metadata or the catalog) we can pass it into the 'validate' (does not require the model). If we don't have the info we call 'generate' followed by 'validate'.

The validate probably needs to be in the OrtEpFactory as we don't create an EP instance without a model.
The generate would be in OrtEp.
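The Generate/Validate split described above could look roughly like the following. This is only a sketch under stated assumptions: `Graph`, `ExampleEp`, `ExampleEpFactory`, `Compatibility`, `CheckCompatibility`, and the `v1:`/`v2:` string format are all made up for illustration; the real signatures on OrtEp and OrtEpFactory are still under discussion.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class Compatibility { kUnknown, kOptimal, kPreferRecompilation, kUnsupported };

// Hypothetical placeholder for the compiled graph handed to the EP.
struct Graph {
  std::string ep_context_blob;
};

// 'Generate' side: lives on the EP instance, which only exists once there is a model.
struct ExampleEp {
  std::string GenerateCompatibilityInfo(const Graph& graph) const {
    return "v2:" + graph.ep_context_blob;  // made-up format: version prefix + payload
  }
};

// 'Validate' side: lives on the factory, so no model or EP instance is required.
struct ExampleEpFactory {
  Compatibility ValidateCompatibility(const std::string& info) const {
    if (info.rfind("v2:", 0) == 0) return Compatibility::kOptimal;
    if (info.rfind("v1:", 0) == 0) return Compatibility::kPreferRecompilation;
    return Compatibility::kUnsupported;
  }
};

// If stored info exists (model metadata or catalog), validate it directly;
// otherwise fall back to 'generate' followed by 'validate'.
inline Compatibility CheckCompatibility(const ExampleEpFactory& factory,
                                        const std::optional<std::string>& stored_info,
                                        const ExampleEp& ep, const Graph& graph) {
  const std::string info = stored_info ? *stored_info : ep.GenerateCompatibilityInfo(graph);
  return factory.ValidateCompatibility(info);
}
```

Only the fallback path needs the graph, which is what lets the catalog scenario work from the stored string alone.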

Member

Are we going with Scott's proposal (Generate/Validate) ?

Contributor

@adrastogi, Jul 16, 2025

Yes, that's the plan. I'm continuing to iterate on that (for now, the prior API proposal still appears in the code, but it'll be removed before the PR is published).

I have a separate commit from earlier this week where I pushed some candidate signatures for that. I've since revised the Generate one a bit: I am wondering whether it is strictly necessary, or whether the EP could produce the compatibility string just from the graph. Open to feedback on that.

Contributor

> we control where it is stored in order to provide a consistent way to check compatibility

@skottmckay, @adrianlizarraga and I were talking more about this; could the Generate method be avoided altogether by, say, having an agreed-upon convention for the EP to write the compatibility string into any EPContext nodes as part of compilation? That is admittedly a less strong form of consistency but would still provide a single place and simplify the API a bit.

Contributor

I'm perfectly fine with the compatibility string being a value in the EPContext node/s. Should there be a specific attribute added to the EPContext schema for this info to formalize the usage more?

Having it in an EPContext node raises a couple of questions if there are multiple. Is the compatibility value in one or all? If one, does it matter which one? If more than one, is the value required to be the same in all?

Out of interest would it be better to have the compatibility string in model metadata instead/as well? I'm thinking of a scenario where you have the model and want to check validity. Reading just the model metadata from the ModelProto would be cheaper/faster as ORT wouldn't need to create an onnxruntime::Model (with Graph and Node instances) to get to the EPContext node to read attributes.
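One possible answer to the multiple-node question is "any node may carry the value, but all values present must agree". The sketch below shows that rule only; it assumes the per-node attribute values have already been read into a vector, and `ResolveCompatibilityInfo` is a hypothetical helper, not an ORT function.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Each entry is the compatibility attribute read from one EPContext node,
// or std::nullopt if that node does not carry the attribute.
// Returns the agreed value, or std::nullopt if no node has one or the values conflict.
inline std::optional<std::string> ResolveCompatibilityInfo(
    const std::vector<std::optional<std::string>>& per_node_values) {
  std::optional<std::string> agreed;
  for (const auto& value : per_node_values) {
    if (!value) continue;  // node without the attribute is ignored
    if (!agreed) {
      agreed = value;  // first value seen becomes the candidate
    } else if (*agreed != *value) {
      return std::nullopt;  // conflicting values across nodes
    }
  }
  return agreed;
}
```

A real implementation would probably want to distinguish "absent" from "conflicting" with an error status rather than collapsing both to an empty result.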

OrtCompiledModelCompatibility& compatibility) {
compatibility = OrtCompiledModelCompatibility_SUPPORT_UNKNOWN;
return common::Status::OK();
}

#endif

void SetLogger(const logging::Logger* logger) {
33 changes: 33 additions & 0 deletions include/onnxruntime/core/session/onnxruntime_ep_c_api.h
@@ -335,6 +335,18 @@ typedef enum OrtEpDataLayout {
OrtEpDataLayout_Default = OrtEpDataLayout_NCHW,
} OrtEpDataLayout;

/**
* \brief Enumeration describing the compatibility state of a compiled model.
*
* \since Version 1.23.
*/
typedef enum OrtCompiledModelCompatibility {
OrtCompiledModelCompatibility_SUPPORT_UNKNOWN = 0,
OrtCompiledModelCompatibility_SUPPORTED_OPTIMAL,
OrtCompiledModelCompatibility_SUPPORTED_PREFER_RECOMPILATION,
OrtCompiledModelCompatibility_UNSUPPORTED,
} OrtCompiledModelCompatibility;

/**
* \brief The OrtEp struct provides functions to implement for an execution provider.
* \since Version 1.22.
@@ -527,6 +539,27 @@ struct OrtEp {
OrtStatus*(ORT_API_CALL* OnRunEnd)(_In_ OrtEp* this_ptr,
_In_ const OrtRunOptions* run_options,
_In_ bool sync_stream);

/** \brief Called by ORT to determine the compatibility of a compiled model with the EP.
*
* The application determines whether a given model compatibility (other than unsupported) should cause
* session-creation to fail.
*
* \param[in] this_ptr The OrtEp instance.
* \param[in] graph The top-level graph for the model.
* \param[out] model_compatibility Output parameter set to the OrtCompiledModelCompatibility enum value that
* describes the compatibility of the model with the EP.
*
* \note Implementation of this function is optional. If not implemented, ORT assumes a result of
* OrtCompiledModelCompatibility_SUPPORT_UNKNOWN.
*
* \snippet{doc} snippets.dox OrtStatus Return Value
*
* \since Version 1.23.
*/
OrtStatus*(ORT_API_CALL* GetCompiledModelCompatibility)(_In_ OrtEp* this_ptr,
_In_ const OrtGraph* graph,
_Out_ OrtCompiledModelCompatibility* model_compatibility);
};

/** \brief The function signature that ORT will call to create OrtEpFactory instances.
20 changes: 20 additions & 0 deletions onnxruntime/core/session/ep_plugin_provider_interfaces.cc
@@ -574,4 +574,24 @@ std::vector<AllocatorPtr> PluginExecutionProvider::CreatePreferredAllocators() {
return allocators;
}

Status PluginExecutionProvider::GetCompiledModelCompatibility(const onnxruntime::GraphViewer& graph_viewer,
OrtCompiledModelCompatibility& compatibility) {
if (ort_ep_->GetCompiledModelCompatibility == nullptr) {
// Plugin EP did not provide an implementation of this function, so we call a default implementation.
return Base::GetCompiledModelCompatibility(graph_viewer, compatibility);
}

// Create EpGraph (extends OrtGraph) for the actual plugin EP to consume.
std::unique_ptr<EpGraph> ep_graph = nullptr;
ORT_RETURN_IF_ERROR(EpGraph::Create(graph_viewer, ep_graph));

// Call EP plugin's OrtEp::GetCompiledModelCompatibility() function.
OrtCompiledModelCompatibility ep_compatibility = OrtCompiledModelCompatibility_SUPPORT_UNKNOWN;
ORT_RETURN_IF_ERROR(ToStatusAndRelease(ort_ep_->GetCompiledModelCompatibility(ort_ep_.get(), ep_graph.get(),
&ep_compatibility)));

compatibility = ep_compatibility;
return Status::OK();
}

} // namespace onnxruntime
4 changes: 4 additions & 0 deletions onnxruntime/core/session/ep_plugin_provider_interfaces.h
@@ -99,6 +99,10 @@
// needed based on matching against allocator_mem_infos_.
std::vector<AllocatorPtr> CreatePreferredAllocators() override;

// TODO: Add documentation comment

Check warning on line 102 in onnxruntime/core/session/ep_plugin_provider_interfaces.h (GitHub Actions / Optional Lint C++): [cpplint] Missing username in TODO; it should look like "// TODO(my_username): Stuff." [readability/todo] [2]
Status GetCompiledModelCompatibility(const onnxruntime::GraphViewer& graph_viewer,
OrtCompiledModelCompatibility& compatibility) override;

private:
struct FusedNodeState {
FusedNodeState() = default;
38 changes: 38 additions & 0 deletions onnxruntime/test/framework/ep_plugin_provider_test.cc
@@ -317,4 +317,42 @@ TEST(PluginExecutionProviderTest, InferOrtDeviceFromDeviceMemoryInfo) {
#endif // !defined(ORT_NO_EXCEPTIONS)
}

TEST(PluginExecutionProviderTest, TestGetCompiledModelCompatibility) {
auto [ep_wrapper, ort_ep_plugin] = test_plugin_ep::MakeTestOrtEp();

// Test default behavior (EP plugin does not provide an implementation of the function).
{
ort_ep_plugin->GetCompiledModelCompatibility = nullptr; // EP plugin doesn't implement. Should get default value.
OrtCompiledModelCompatibility compatibility = OrtCompiledModelCompatibility_SUPPORT_UNKNOWN;

// TODO: Need to load a model into a GraphViewer and pass it to this call.
// See how it is done in onnxruntime/test/ep_graph/test_ep_graph_utils.h.
// We can use the onnxruntime::Model class to load it.

// ASSERT_STATUS_OK(ep_wrapper->GetCompiledModelCompatibility(graph_viewer, compatibility));

ASSERT_EQ(compatibility, OrtCompiledModelCompatibility_SUPPORT_UNKNOWN);
}

// Test an EP plugin that provides a basic implementation that returns SUPPORTED_OPTIMAL.
{
auto get_compiled_model_compat = [](OrtEp* /*this_ptr*/, const OrtGraph* graph,
OrtCompiledModelCompatibility* compatibility) -> ::OrtStatus* {
(void)graph;
*compatibility = OrtCompiledModelCompatibility_SUPPORTED_OPTIMAL;
return nullptr;
};

ort_ep_plugin->GetCompiledModelCompatibility = get_compiled_model_compat;
OrtCompiledModelCompatibility compatibility = OrtCompiledModelCompatibility_SUPPORT_UNKNOWN;

// TODO: Need to load a model into a GraphViewer and pass it to this call.
// See how it is done in onnxruntime/test/ep_graph/test_ep_graph_utils.h.
// We can use the onnxruntime::Model class to load it.

// ASSERT_STATUS_OK(ep_wrapper->GetCompiledModelCompatibility(graph_viewer, compatibility));
// ASSERT_EQ(compatibility, OrtCompiledModelCompatibility_SUPPORTED_OPTIMAL);
(void)compatibility;
}
}
} // namespace onnxruntime::test