-
Notifications
You must be signed in to change notification settings - Fork 3.5k
[EP ABI] Get EP compiled model compatibility #25331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 1 commit
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
1667e6a
[EP ABI] Get EP compiled model compatibility
adrianlizarraga 144cc1e
Merge remote-tracking branch 'origin/main' into adrianl/ep-abi-model-…
fs-eire a963870
Interface revisions based on feedback
0beed66
Plumbing for generating compat string
66a547e
Plumbing for validation API
6e2b8e4
*An* approach to adding the compat strings (not happy with this, but …
ed62236
Fixing hack from earlier (got duped from misleading Intellisense)
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skottmckay quick heads-up regarding this PR that we're trying to eventually take for 1.23; there's a requirement to ensure apps can eventually figure out if a given compiled model is compatible with the underlying device, which implies a new method for EP implementors to fill in on the ABI (see #25313). @adrianlizarraga and I have been riffing on this- he's got a draft stood up here which I am going to try to carry forward. Wanted to make sure you were aware of this and if you had any questions / concerns. cc: @jywu-msft
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly have a high-level question about what is going to be the input to determine if the model is compatible. Is it only info in the EPContext nodes or would model metadata potentially be involved as well?
If we also want to support a use-case where we can determine validity prior to downloading the model, does that involve the same input or different? I assume for that scenario we'd need to query the OrtEpFactory.
And with that in mind I'm wondering whether there should be some more structure around getting/storing this info in the Compile stage.
e.g. there's a function in OrtEp to return a string for the compatibility info and ORT calls that as part of Compile. ORT stores that in some well-defined place in model metadata. ORT uses that model metadata to query the factory and/or EP at runtime to determine if it can run the model.
The EP controls what is in the info, but we control where it is stored in order to provide a consistent way to check compatibility without necessarily needing the model. The compatibility metadata could be in the catalog to support a check prior to model download.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something the EP would dictate (the thought was that they'd want to at least have the graph as an input into that decision in case there is EPContext info that would factor into it, hence the argument to the API). That said, I am not sure if EP implementors will need more than this for howsoever they are making the determination. An open item is whether we can get more feedback from them on this.
When we were thinking through the design, the thought was that we'd get this ABI change in (given the schedule), and then fast-follow it with changes to the session creation flow where ORT would check if the supplied model is compiled, and if so, ask the relevant EPs to determine compatibility. And then as a possible path forward, we could expose a new flag in the session options for a caller to indicate how they want to handle the case where the model is compiled but suboptimal (e.g., continue with using the model; or bail out of the session so that they can try to recompile, assuming they can get to the original model; @adrianlizarraga also had some ideas on doing this via a callback as well).
This is basically the insight I gleaned from talking with @jonwis about this the other day... I was originally thinking about this current PR in terms of the scenario where the app has a model in-hand and wants to figure out if it's compatible with the system where their code is running. However, that will not help solve this 'catalog' use-case you're describing where the app wants to pick the right model from a set of options on a server somewhere. I had filed #25312 for trying to solve the catalog scenario, with the thinking that the driver version was enough info to construct that identifier the app could use to pick the appropriate model from the catalog. I later learned that 1) driver version doesn't tell the whole story, and 2) the issue as written is incomplete and does not address the code the app would then write to do the check- that would likely have to be done via a new API in ORT.
I was mentally considering these two use cases as separate things, but it's seeming like they are two sides of the same coin.
I think you are ultimately right. I'm relatively new to the ORT codebase, so I'll need to spend some time unpacking the approach you sketched out in the second half of your message. There's also the little wrinkle around schedule physics, and what we can conceivably do in the current milestone vs. what might have to be deferred to later, and how to make sure both parts are aligned. If you have feedback / advice here, please let me know!
A string seems like the simplest representation of that compatibility info, so that makes sense. In terms of the well-defined place: I am still learning about the possible places where such data could be written out. I'm seeing the world through EPContext-colored lenses at the moment (i.e., so the operator schema would need to be updated to support this if that's the best place). Are there others that would be more appropriate?
So maybe this implies this proposed ABI method should be updated to take a string (e.g.,
const char*) of the compatibility info instead of a graph? For convenience, we would need a separate API for any catalog-use-case applications to call which would leverage the ABI methods- still thinking about what that would look like (I suppose the apps could also get and call the method on the EPs too; the API would better encapsulate that though).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split it into Generate and a Validate functions?
Instead of having
GetCompiledModelCompatibilitycheck and return status, we have a something like 'GenerateCompatibiltyInfo' which takes the graph (may also need model metadata or a way to get it via the graph) and a 'ValidateCompatibility'.When running the compile we call the 'generate' and save that info in say model metadata.
When validating, if we have the info (from model metadata or the catalog) we can pass it into the 'validate' (does not require the model). If we don't have the info we call 'generate' followed by 'validate'.
The validate probably needs to be in the OrtEpFactory as we don't create an EP instance without a model.
The generate would be in OrtEp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going with Scott's proposal (Generate/Validate) ?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the plan- I'm continuing to iterate on that (for now, the prior API proposal appears in the code, but it'll be removed before the PR is published).
I have a separate commit from earlier this week where I had pushed some candidate signatures for that (I've since revised the Generate one a bit- I am wondering if it is strictly necessary and the EP could produce the compatibility string just from the graph. Open to feedback on that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skottmckay, @adrianlizarraga and I were talking more about this; could the Generate method be avoided altogether by, say, having an agreed-upon convention for the EP to write the compatibility string into any EPContext nodes as part of compilation? That is admittedly a less strong form of consistency but would still provide a single place and simplify the API a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm perfectly fine with the compatibility string being a value in the EPContext node/s. Should there be a specific attribute added to the EPContext schema for this info to formalize the usage more?
Having it in an EPContext node raises a couple of questions if there are multiple. Is the compatibility value in one or all? If one, does it matter which one? If more than one, is the value required to be the same in all?
Out of interest would it be better to have the compatibility string in model metadata instead/as well? I'm thinking of a scenario where you have the model and want to check validity. Reading just the model metadata from the ModelProto would be cheaper/faster as ORT wouldn't need to create an onnxruntime::Model (with Graph and Node instances) to get to the EPContext node to read attributes.