-
Notifications
You must be signed in to change notification settings - Fork 3.5k
[EP ABI] Get EP compiled model compatibility #25331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| std::vector<NodeComputeInfo>& node_compute_funcs); | ||
|
|
||
| // TODO: add documentation comment. | ||
| virtual common::Status GetCompiledModelCompatibility(const onnxruntime::GraphViewer& /*graph_viewer*/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skottmckay quick heads-up regarding this PR that we're trying to eventually take for 1.23; there's a requirement to ensure apps can eventually figure out if a given compiled model is compatible with the underlying device, which implies a new method for EP implementors to fill in on the ABI (see #25313). @adrianlizarraga and I have been riffing on this- he's got a draft stood up here which I am going to try to carry forward. Wanted to make sure you were aware of this and if you had any questions / concerns. cc: @jywu-msft
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly have a high-level question about what is going to be the input to determine if the model is compatible. Is it only info in the EPContext nodes or would model metadata potentially be involved as well?
If we also want to support a use-case where we can determine validity prior to downloading the model, does that involve the same input or different? I assume for that scenario we'd need to query the OrtEpFactory.
And with that in mind I'm wondering whether there should be some more structure around getting/storing this info in the Compile stage.
e.g. there's a function in OrtEp to return a string for the compatibility info and ORT calls that as part of Compile. ORT stores that in some well-defined place in model metadata. ORT uses that model metadata to query the factory and/or EP at runtime to determine if it can run the model.
The EP controls what is in the info, but we control where it is stored in order to provide a consistent way to check compatibility without necessarily needing the model. The compatibility metadata could be in the catalog to support a check prior to model download.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is going to be the input to determine if the model is compatible. Is it only info in the EPContext nodes or would model metadata potentially be involved as well?
This is something the EP would dictate (the thought was that they'd want to at least have the graph as an input into that decision in case there is EPContext info that would factor into it, hence the argument to the API). That said, I am not sure if EP implementors will need more than this for howsoever they are making the determination. An open item is whether we can get more feedback from them on this.
When we were thinking through the design, the thought was that we'd get this ABI change in (given the schedule), and then fast-follow it with changes to the session creation flow where ORT would check if the supplied model is compiled, and if so, ask the relevant EPs to determine compatibility. And then as a possible path forward, we could expose a new flag in the session options for a caller to indicate how they want to handle the case where the model is compiled but suboptimal (e.g., continue with using the model; or bail out of the session so that they can try to recompile, assuming they can get to the original model; @adrianlizarraga also had some ideas on doing this via a callback as well).
If we also want to support a use-case where we can determine validity prior to downloading the model, does that involve the same input or different? I assume for that scenario we'd need to query the OrtEpFactory.
This is basically the insight I gleaned from talking with @jonwis about this the other day... I was originally thinking about this current PR in terms of the scenario where the app has a model in-hand and wants to figure out if it's compatible with the system where their code is running. However, that will not help solve this 'catalog' use-case you're describing where the app wants to pick the right model from a set of options on a server somewhere. I had filed #25312 for trying to solve the catalog scenario, with the thinking that the driver version was enough info to construct that identifier the app could use to pick the appropriate model from the catalog. I later learned that 1) driver version doesn't tell the whole story, and 2) the issue as written is incomplete and does not address the code the app would then write to do the check- that would likely have to be done via a new API in ORT.
I was mentally considering these two use cases as separate things, but it's seeming like they are two sides of the same coin.
I'm wondering whether there should be some more structure around getting/storing this info in the Compile stage.
I think you are ultimately right. I'm relatively new to the ORT codebase, so I'll need to spend some time unpacking the approach you sketched out in the second half of your message. There's also the little wrinkle around schedule physics, and what we can conceivably do in the current milestone vs. what might have to be deferred to later, and how to make sure both parts are aligned. If you have feedback / advice here, please let me know!
e.g. there's a function in OrtEp to return a string for the compatibility info and ORT calls that as part of Compile. ORT stores that in some well-defined place in model metadata.
A string seems like the simplest representation of that compatibility info, so that makes sense. In terms of the well-defined place: I am still learning about the possible places where such data could be written out. I'm seeing the world through EPContext-colored lenses at the moment (i.e., so the operator schema would need to be updated to support this if that's the best place). Are there others that would be more appropriate?
The EP controls what is in the info, but we control where it is stored in order to provide a consistent way to check compatibility without necessarily needing the model.
So maybe this implies this proposed ABI method should be updated to take a string (e.g., const char*) of the compatibility info instead of a graph? For convenience, we would need a separate API for any catalog-use-case applications to call which would leverage the ABI methods- still thinking about what that would look like (I suppose the apps could also get and call the method on the EPs too; the API would better encapsulate that though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split it into Generate and a Validate functions?
Instead of having GetCompiledModelCompatibility check and return status, we have a something like 'GenerateCompatibiltyInfo' which takes the graph (may also need model metadata or a way to get it via the graph) and a 'ValidateCompatibility'.
When running the compile we call the 'generate' and save that info in say model metadata.
When validating, if we have the info (from model metadata or the catalog) we can pass it into the 'validate' (does not require the model). If we don't have the info we call 'generate' followed by 'validate'.
The validate probably needs to be in the OrtEpFactory as we don't create an EP instance without a model.
The generate would be in OrtEp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going with Scott's proposal (Generate/Validate) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the plan- I'm continuing to iterate on that (for now, the prior API proposal appears in the code, but it'll be removed before the PR is published).
I have a separate commit from earlier this week where I had pushed some candidate signatures for that (I've since revised the Generate one a bit- I am wondering if it is strictly necessary and the EP could produce the compatibility string just from the graph. Open to feedback on that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we control where it is stored in order to provide a consistent way to check compatibility
@skottmckay, @adrianlizarraga and I were talking more about this; could the Generate method be avoided altogether by, say, having an agreed-upon convention for the EP to write the compatibility string into any EPContext nodes as part of compilation? That is admittedly a less strong form of consistency but would still provide a single place and simplify the API a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm perfectly fine with the compatibility string being a value in the EPContext node/s. Should there be a specific attribute added to the EPContext schema for this info to formalize the usage more?
Having it in an EPContext node raises a couple of questions if there are multiple. Is the compatibility value in one or all? If one, does it matter which one? If more than one, is the value required to be the same in all?
Out of interest would it be better to have the compatibility string in model metadata instead/as well? I'm thinking of a scenario where you have the model and want to check validity. Reading just the model metadata from the ModelProto would be cheaper/faster as ORT wouldn't need to create an onnxruntime::Model (with Graph and Node instances) to get to the EPContext node to read attributes.
| * | ||
| * \since Version 1.23. | ||
| */ | ||
| const char*(ORT_API_CALL* GenerateCompiledModelCompatibilityInfoString)(_In_ OrtEp* this_ptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skottmckay I pushed a commit with some interface candidates based on my interpretation of the other comment thread- do these look reasonable to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can commit the suggested changes from lintrunner.
| * to determine if a compiled model is compatible with the EP. | ||
| * | ||
| * The returned string should be a null-terminated, UTF-8 encoded string. ORT will copy it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * to determine if a compiled model is compatible with the EP. | |
| * | |
| * The returned string should be a null-terminated, UTF-8 encoded string. ORT will copy it. | |
| * to determine if a compiled model is compatible with the EP. | |
| * | |
| * The returned string should be a null-terminated, UTF-8 encoded string. ORT will copy it. |
| * \param[in] this_ptr The OrtEpFactory instance. | ||
| * \param[in] compatibility_info The compatibility information string that will be used | ||
| * \param[out] model_compatibility Output parameter set to the OrtCompiledModelCompatibility enum value that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * \param[in] this_ptr The OrtEpFactory instance. | |
| * \param[in] compatibility_info The compatibility information string that will be used | |
| * \param[out] model_compatibility Output parameter set to the OrtCompiledModelCompatibility enum value that | |
| * \param[in] this_ptr The OrtEpFactory instance. | |
| * \param[in] compatibility_info The compatibility information string that will be used | |
| * \param[out] model_compatibility Output parameter set to the OrtCompiledModelCompatibility enum value that |
| */ | ||
| const char*(ORT_API_CALL* GenerateCompiledModelCompatibilityInfoString)(_In_ OrtEp* this_ptr, | ||
| _In_ const OrtGraph* graph, | ||
| _In_ const OrtModelMetadata* model_metadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skottmckay question here about the idea of persisting the compatibility information in the model metadata: would this be something added to the custom metadata map? Still learning about metadata in ORT- the alternative was adding it as a peer value to producer_name, et. al., which is quite a bit more surgery throughout the codebase. I'm moving in the direction of the former but wanted to cross-check.
| const auto validate_compatibility_info = [](OrtEpFactory* /*factory*/, | ||
| const char* /*compatibility_info*/, | ||
| OrtCompiledModelCompatibility* model_compatibility) -> OrtStatus* { | ||
| // CPU EP does not support compiled models. Set the compatibility to OrtCompiledModelCompatibility_SUPPORT_UNKNOWN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compatibility validation API, best as I can tell, needs to be plumbed across the internal factory implementation (even to provide a basic stub), but then this results in all of the provider factories having to provide those based on how the scaffolding is set up- I was getting build errors until I eventually implemented across the whole surface (for completeness, I would need to add the DML and WebGPU ones to this same file, and then we have the other EP factories to contend with).
Did I miss a simpler way? Am I going in the wrong direction here? I was trying to do this in a way that didn't involve changing the various EP factories (until they were ready to add whatever their validation schemes would be). cc: @adrianlizarraga @skottmckay
…putting it as a proposal so I can get feedback on the right way)
|
This PR has been replaced by #25749 |
Description
Motivation and Context