Convert Initializers to OrtValues Phase 2 #25320
Merged
Conversation
Fix reshaping for external weights
Fix Fusion Helper
skottmckay reviewed on Jul 18, 2025
edgchen1 reviewed on Jul 22, 2025
adrianlizarraga approved these changes on Jul 23, 2025
adrianlizarraga added a commit that referenced this pull request on Jul 24, 2025
…tValues (#25482)

### Description
- Adds APIs to get information (file path, file offset, byte size) for initializers with data in external files. This allows EPs to do their own custom memory-mapping of initializer data. By default, EPs that don't have specific requirements can still use `ValueInfo_GetInitializerValue` to get an `OrtValue` with memory-mapped initializer data.
- Updates `OrtGraph` to load the `OrtValue` for an external initializer only on demand. This avoids memory-mapping all external initializers before the first call to `OrtEp::GetCapability`.

Follow-up to #25320.

New API functions:

| Function | Summary |
|----------|---------|
| `ValueInfo_GetExternalInitializerInfo` | Gets `OrtExternalInitializerInfo` from `OrtValueInfo` (or `NULL`). Must be released with `ReleaseExternalInitializerInfo`. |
| `ReleaseExternalInitializerInfo` | Releases the `OrtExternalInitializerInfo` instance. |
| `ExternalInitializerInfo_GetFilePath` | Returns the relative path to the file that stores the initializer's data. |
| `ExternalInitializerInfo_GetFileOffset` | Returns the byte offset within the file where the initializer's data is stored. |
| `ExternalInitializerInfo_GetByteSize` | Returns the size in bytes of the initializer's data within the file. |

Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
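As a usage illustration of the table above, here is a minimal sketch of how an EP might call these functions. The exact signatures are not quoted in this thread, so the conventions below (`OrtApi` members, a status-returning lookup, direct-return getters) are assumptions modeled on the existing C API style; consult onnxruntime_c_api.h for the real declarations.

```cpp
// Hedged sketch: an EP querying external-initializer info via the new C API.
// Assumption: these functions hang off OrtApi and follow its usual conventions.
#include <cstdio>
#include "onnxruntime_c_api.h"

void InspectInitializer(const OrtApi* ort, const OrtValueInfo* value_info) {
  OrtExternalInitializerInfo* info = nullptr;
  // Yields NULL via the out-param when the initializer is not file-backed.
  if (OrtStatus* st = ort->ValueInfo_GetExternalInitializerInfo(value_info, &info)) {
    std::fprintf(stderr, "error: %s\n", ort->GetErrorMessage(st));
    ort->ReleaseStatus(st);
    return;
  }
  if (info == nullptr) return;  // fall back to ValueInfo_GetInitializerValue

  const ORTCHAR_T* path = ort->ExternalInitializerInfo_GetFilePath(info);  // relative to the model
  int64_t offset = ort->ExternalInitializerInfo_GetFileOffset(info);
  size_t size = ort->ExternalInitializerInfo_GetByteSize(info);
  // An EP with custom requirements can now mmap [offset, offset + size) of `path` itself.
  (void)path; (void)offset; (void)size;
  ort->ReleaseExternalInitializerInfo(info);
}
```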
Member
Hi there! We haven't cut the release branch for this version yet, so I'm removing the
RyanMetcalfeInt8 pushed a commit to RyanMetcalfeInt8/onnxruntime that referenced this pull request on Jul 29, 2025
…tValues (microsoft#25482)
wcy123 pushed a commit to wcy123/onnxruntime that referenced this pull request on Aug 1, 2025
Related to microsoft#25320 and microsoft#23979.
This was referenced Aug 1, 2025
carzh pushed a commit that referenced this pull request on Aug 7, 2025
adrianlizarraga pushed a commit that referenced this pull request on Aug 8, 2025

### Description
Related to #25320 and #23979. Enables tensor raw-data sharing for tensor protos externalized with kTensorProtoMemoryAddressTag (a sketch of what that tag means follows below).

### Motivation and Context
With #25320 and #23979, all initialized tensor protos are associated with an OrtValue; the VitisAI EP needs to adapt to this change.

Co-authored-by: mingyue <[email protected]>
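For context, a hedged sketch of what "externalized with kTensorProtoMemoryAddressTag" means at the TensorProto level: the proto claims EXTERNAL data, but the `location` entry carries ORT's in-memory tag instead of a real file path. The tag string below is a placeholder for illustration only; the real constant is defined inside onnxruntime.

```cpp
// Hedged sketch: recognizing an in-memory "external" initializer.
// kTensorProtoMemoryAddressTag's actual value is defined by onnxruntime;
// the string here is a stand-in for illustration only.
#include <string>
#include "onnx/onnx_pb.h"

constexpr const char* kTensorProtoMemoryAddressTag = "/ORT_MEM_ADDR/";  // placeholder value

bool HasInMemoryAddressTag(const ONNX_NAMESPACE::TensorProto& proto) {
  if (proto.data_location() != ONNX_NAMESPACE::TensorProto_DataLocation_EXTERNAL) {
    return false;
  }
  // external_data is a list of key/value entries; for file-backed data the
  // "location" entry is a relative path, for in-memory data it is the tag.
  for (const auto& entry : proto.external_data()) {
    if (entry.key() == "location" && entry.value() == kTensorProtoMemoryAddressTag) {
      return true;
    }
  }
  return false;
}
```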
adrianlizarraga pushed a commit that referenced this pull request on Aug 9, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request on Aug 11, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request on Aug 11, 2025
…tValues (microsoft#25482)
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request on Aug 11, 2025
gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request on Sep 2, 2025
snnn added a commit that referenced this pull request on Oct 8, 2025

When Constant nodes have tensors larger than 127 bytes, they are converted to OrtValues with in-memory external data for efficiency. However, ONNX shape inference rejects TensorProtos with data_location=EXTERNAL, as it cannot distinguish between in-memory and file-based external data. This fix modifies InferenceContextImpl::getInputData() to detect in-memory external data and materialize it into a temporary TensorProto with embedded data that ONNX shape inference can process.

Fixes #26261. The issue was introduced in commit 3b97d79 (PR #25320), which converted large initializers to OrtValues. This regression caused models with Constant nodes holding tensors just over 127 bytes to fail loading with shape-inference errors.

Changes:
- Modified getInputData() to check for in-memory external data using utils::HasExternalDataInMemory()
- When detected, retrieves the OrtValue and creates a temporary TensorProto with embedded data (use_tensor_buffer=false)
- Added a temp_tensor_protos_ member to store these temporary protos so they outlive the shape-inference call
yuslepukhin added a commit that referenced this pull request on Oct 14, 2025

## Description
Fixes #26261. This PR resolves a regression introduced in v1.23.0 where models with Constant nodes containing tensors larger than 127 bytes fail to load with a shape-inference error.

### Root Cause
Commit 3b97d79 (PR #25320) introduced an optimization to convert large Constant node tensors (> 127 bytes) into OrtValues with in-memory external data references for better memory management. However, ONNX shape inference cannot distinguish between in-memory and file-based external data, and rejects any TensorProto with `data_location = EXTERNAL`.

### The Fix
Modified `InferenceContextImpl::getInputData()` to (see the sketch after this section):
1. Detect tensors with in-memory external data using `utils::HasExternalDataInMemory()`
2. Retrieve the corresponding OrtValue
3. Create a temporary TensorProto with embedded data (not an external reference)
4. Provide this temporary proto to ONNX shape inference

This allows ONNX shape inference to access the actual tensor data without rejecting it as external.

### Memory Impact
This fix introduces a minor, temporary increase in memory usage during the model-loading phase.
- **When:** The additional memory is allocated only when the shape-inference engine needs the data of a constant tensor larger than 127 bytes. This is a one-time event during the initial analysis of the model.
- **What:** The fix creates a temporary in-memory copy of the tensor data.
- **Duration:** The temporary copy is released as soon as shape inference completes.

The impact on the application's overall peak memory usage is expected to be negligible, and memory usage during inference is not affected. While the temporary tensor could in theory be large if a multi-gigabyte constant tensor were used for shape inference, that is highly unlikely in practice for well-designed models.

### Testing
- Tested with the problematic model from issue #26261
- All optimization levels now work correctly (DISABLE_ALL, BASIC, EXTENDED, ALL)
- Unit tests to be added

### Changes
- **onnxruntime/core/graph/graph.cc**:
  - Modified the `getInputData()` method in the `InferenceContextImpl` class
  - Added a `temp_tensor_protos_` member to store temporary TensorProtos during shape inference

## TODO
- [ ] Add unit tests
- [ ] Run full test suite

Co-authored-by: Dmitri Smirnov <[email protected]>
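An illustrative sketch of the materialization step described under "The Fix" above — not the exact onnxruntime source. It assumes the caller already holds the initializer's OrtValue and a `temp_tensor_protos_`-style vector that keeps the copies alive through shape inference; include paths are best-effort guesses at the real headers.

```cpp
// Illustrative sketch of the fix's core pattern (not the actual graph.cc code):
// turn an in-memory-external TensorProto into one with embedded raw data so
// ONNX shape inference will accept it.
#include <memory>
#include <vector>
#include "core/framework/ort_value.h"
#include "core/framework/tensorprotoutils.h"

const ONNX_NAMESPACE::TensorProto* MaterializeForShapeInference(
    const ONNX_NAMESPACE::TensorProto& initializer, const OrtValue& value,
    std::vector<std::unique_ptr<ONNX_NAMESPACE::TensorProto>>& temp_protos) {
  // File-backed or already-embedded data can be handed to ONNX as-is.
  if (!onnxruntime::utils::HasExternalDataInMemory(initializer)) {
    return &initializer;
  }
  // Copy the proto, drop the external reference, and embed the actual bytes.
  auto temp = std::make_unique<ONNX_NAMESPACE::TensorProto>(initializer);
  temp->clear_external_data();
  temp->set_data_location(ONNX_NAMESPACE::TensorProto_DataLocation_DEFAULT);
  const auto& tensor = value.Get<onnxruntime::Tensor>();
  temp->set_raw_data(tensor.DataRaw(), tensor.SizeInBytes());
  // Stored in a member-like vector so the proto outlives the inference call.
  temp_protos.push_back(std::move(temp));
  return temp_protos.back().get();
}
```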
apsonawane pushed a commit that referenced this pull request on Oct 17, 2025
apsonawane pushed a commit that referenced this pull request on Oct 20, 2025
fs-eire pushed a commit that referenced this pull request on Oct 24, 2025
yuslepukhin added a commit that referenced this pull request on Oct 30, 2025

…lues early (#26345)

### Description
Converts weights early and reverts "Properly remove in-memory references (#25652)". This reverts commit 3ca49d8 and makes the appropriate adjustments for the current state of the code.

This PR was made possible by, and follows on the heels of, #26263 and #25833. Previous history: #23979, #25320, #25626, #25652.

The first change (#26263) allows us to convert initializers to OrtValues early and save a great deal of memory at model-loading time. For the Phi-4-mini-instruct-INT4 model, the profiles look like this:

**Before:** [screenshot: memory profile before the change]
**After:** [screenshot: memory profile after the change]

The two peaks represent memory usage at optimization time (8.1 GB before) and after weights memory-mapping (6.5 GB before). After this change the corresponding numbers are 3.5 GB and 4.7 GB respectively.

Most of the savings during the optimization phase come from `ConstantFolding`, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues. Memory consumption before the conversion began was 9.3 GB and 6.7 GB respectively, so we are saving almost 6 GB during optimization and 2 GB in the steady state. The model also loads about 12 seconds faster.

An example of ConstantFolding being one of the top contributors, where memory is duplicated for a higher peak before Resolve takes care of no-longer-used initializers:

[Screenshots: peak during ConstantFolding / Transpose Optimizer; peak in AddInitializer called from ConstantFolding]

### Motivation and Context
Reduce memory usage.
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request on Nov 2, 2025
…6263)
Description
- Make protobuf weights refer to OrtValues on load.
- Create OrtValues for initializers loaded from ORT format as well, for uniformity.
- Adjust Graph::ToGraphProto() so exporting does not emit in-memory references in external data (see the sketch after this section).
- Make CoreML process external data, including in-memory references, so it can copy it.

Motivation and Context
Follow-up for #23979
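A hedged sketch of the Graph::ToGraphProto() adjustment above, not the actual implementation: when serializing, an initializer whose "external" data is really an in-memory reference must be written with embedded bytes, since a process-local address is meaningless in a saved model. The function name and include paths here are illustrative assumptions.

```cpp
// Hedged sketch of the export rule, not onnxruntime's actual ToGraphProto():
// in-memory external references are inlined as embedded bytes on export.
#include "core/framework/ort_value.h"
#include "core/framework/tensorprotoutils.h"

void ExportInitializer(const ONNX_NAMESPACE::TensorProto& initializer,
                       const OrtValue& value,
                       ONNX_NAMESPACE::GraphProto& graph_proto) {
  ONNX_NAMESPACE::TensorProto* out = graph_proto.add_initializer();
  *out = initializer;
  if (onnxruntime::utils::HasExternalDataInMemory(initializer)) {
    // A memory address must not leak into the serialized model; embed the data.
    out->clear_external_data();
    out->set_data_location(ONNX_NAMESPACE::TensorProto_DataLocation_DEFAULT);
    const auto& tensor = value.Get<onnxruntime::Tensor>();
    out->set_raw_data(tensor.DataRaw(), tensor.SizeInBytes());
  }
}
```

File-backed external data, by contrast, is left untouched on export; only the in-memory variant needs this inlining.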