
Conversation

yuslepukhin (Member) commented Jul 8, 2025

Description

- Make protobuf weights refer to OrtValues on load.
- Create OrtValues for initializers loaded from ORT format, for uniformity.
- Adjust Graph::ToGraphProto() so it does not export in-memory references in external data.
- Make the CoreML EP process external data, including in-memory references, so it can copy the data.

Motivation and Context

Follow-up to #23979.

jywu-msft changed the title from "Convert Initializers to OrtValues" to "Convert Initializers to OrtValues Phase 2" Jul 8, 2025
yuslepukhin marked this pull request as ready for review July 16, 2025 19:45
adrianlizarraga merged commit 3b97d79 into main Jul 23, 2025
92 of 96 checks passed
adrianlizarraga deleted the yusleoukhin/ort_initializers_ii branch July 23, 2025 06:40
adrianlizarraga added a commit that referenced this pull request Jul 24, 2025
…tValues (#25482)

### Description
- Adds APIs to get information (file path, file offset, byte size) for
initializers with data in external files. This allows EPs to do their
own custom memory-mapping of initializer data. By default, EPs that
don't have specific requirements can still use
`ValueInfo_GetInitializerValue` to get an `OrtValue` with memory-mapped
initializer data.
- Updates `OrtGraph` to load the `OrtValue` for an external initializer only
on demand. This avoids memory-mapping all external initializers
before the first call to `OrtEp::GetCapability`.

Follow-up to #25320.

New API functions:

| Function | Summary |
|----------|---------|
| `ValueInfo_GetExternalInitializerInfo` | Gets `OrtExternalInitializerInfo` from `OrtValueInfo` (or `NULL`). Must be released with `ReleaseExternalInitializerInfo`. |
| `ReleaseExternalInitializerInfo` | Releases the `OrtExternalInitializerInfo` instance. |
| `ExternalInitializerInfo_GetFilePath` | Returns the relative path to the file that stores the initializer's data. |
| `ExternalInitializerInfo_GetFileOffset` | Returns the byte offset within the file where the initializer's data is stored. |
| `ExternalInitializerInfo_GetByteSize` | Returns the size in bytes of the initializer's data within the file. |
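
A minimal sketch of how an EP might consume these functions, assuming the usual `OrtApi` accessor pattern. The exact shapes of the signatures (status-returning lookup, value-returning getters) are an assumption here and should be checked against `onnxruntime_c_api.h`:

```c++
// Hypothetical usage sketch; verify signatures against onnxruntime_c_api.h.
#include <stdio.h>
#include "onnxruntime_c_api.h"

void DescribeInitializer(const OrtApi* ort, const OrtValueInfo* value_info) {
  OrtExternalInitializerInfo* info = NULL;
  OrtStatus* status = ort->ValueInfo_GetExternalInitializerInfo(value_info, &info);
  if (status != NULL) {
    fprintf(stderr, "error: %s\n", ort->GetErrorMessage(status));
    ort->ReleaseStatus(status);
    return;
  }
  if (info == NULL) {
    // Initializer data is not in an external file. EPs without special
    // requirements can use ValueInfo_GetInitializerValue instead, which
    // yields an OrtValue with (possibly memory-mapped) initializer data.
    return;
  }
  const ORTCHAR_T* path = ort->ExternalInitializerInfo_GetFilePath(info);  // relative path
  int64_t offset = ort->ExternalInitializerInfo_GetFileOffset(info);       // byte offset in file
  size_t num_bytes = ort->ExternalInitializerInfo_GetByteSize(info);       // data length
  // An EP can now memory-map [offset, offset + num_bytes) from `path` itself.
  (void)path; (void)offset; (void)num_bytes;
  ort->ReleaseExternalInitializerInfo(info);
}
```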



---------

Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
snnn (Member) commented Jul 25, 2025

Hi there! We haven't cut the release branch for this version yet, so I'm removing the release:1.23.0 label for now to keep things tidy. Thanks so much for your contribution! We'll make sure this gets included when the release is prepared. 🤖

RyanMetcalfeInt8 pushed a commit to RyanMetcalfeInt8/onnxruntime that referenced this pull request Jul 29, 2025
wcy123 pushed a commit to wcy123/onnxruntime that referenced this pull request Aug 1, 2025
carzh pushed a commit that referenced this pull request Aug 7, 2025
adrianlizarraga pushed a commit that referenced this pull request Aug 8, 2025
### Description

Related to #25320 and #23979. Enables tensor raw-data sharing for
externalized tensor protos tagged with kTensorProtoMemoryAddressTag.
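
For intuition, a simplified, self-contained sketch of the sharing pattern; the tag value and struct layout below are stand-ins, not the real onnxruntime internals:

```c++
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for onnxruntime's kTensorProtoMemoryAddressTag (actual value differs).
constexpr const char* kMemAddressTagStandIn = "<in-memory-address-tag>";

// Simplified view of a TensorProto's external_data entry.
struct ExternalDataView {
  std::string location;  // file path, or the in-memory address tag
  int64_t offset = 0;    // file offset, or a raw address when tagged
  size_t length = 0;     // size of the data in bytes
};

// Returns a shareable pointer to tensor bytes that already live in memory;
// nullptr means the data is genuinely file-backed and must be loaded/mapped.
const void* TryShareRawData(const ExternalDataView& ed) {
  if (ed.location == kMemAddressTagStandIn) {
    return reinterpret_cast<const void*>(static_cast<uintptr_t>(ed.offset));
  }
  return nullptr;
}
```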

### Motivation and Context

With #25320 and #23979, all initialized tensor protos are associated with
an OrtValue; the VitisAI EP needs to adapt to this change.

Co-authored-by: mingyue <[email protected]>
adrianlizarraga pushed a commit that referenced this pull request Aug 9, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025
gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request Sep 2, 2025
snnn added a commit that referenced this pull request Oct 8, 2025
When a Constant node's tensor is larger than 127 bytes, it is converted
to an OrtValue with in-memory external data for efficiency. However, ONNX
shape inference rejects TensorProtos with data_location=EXTERNAL, as it
cannot distinguish between in-memory and file-based external data.

This fix modifies InferenceContextImpl::getInputData() to detect in-memory
external data and materialize it into a temporary TensorProto with embedded
data that ONNX shape inference can process.

Fixes #26261

The issue was introduced in commit 3b97d79 (PR #25320), which converted
large initializers to OrtValues. This regression caused models whose Constant
nodes hold tensors just over 127 bytes to fail loading with shape
inference errors.

Changes:
- Modified getInputData() to check for in-memory external data using
  utils::HasExternalDataInMemory()
- When detected, retrieves the OrtValue and creates a temporary TensorProto
  with embedded data (use_tensor_buffer=false)
- Added temp_tensor_protos_ member to store these temporary protos so they
  outlive the shape inference call
yuslepukhin added a commit that referenced this pull request Oct 14, 2025
## Description

Fixes #26261

This PR resolves a regression introduced in v1.23.0 where models with
Constant nodes containing tensors larger than 127 bytes fail to load
with a shape inference error.

### Root Cause

Commit 3b97d79 (PR #25320) introduced an optimization to convert
large Constant node tensors (> 127 bytes) into OrtValues with in-memory
external data references for better memory management. However, ONNX
shape inference cannot distinguish between in-memory and file-based
external data, and rejects any TensorProto with `data_location =
EXTERNAL`.

### The Fix

Modified `InferenceContextImpl::getInputData()` to:
1. Detect tensors with in-memory external data using
`utils::HasExternalDataInMemory()`
2. Retrieve the corresponding OrtValue
3. Create a temporary TensorProto with embedded data (not external
reference)
4. Provide this temporary proto to ONNX shape inference

This allows ONNX shape inference to access the actual tensor data
without rejecting it as external.
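
For intuition, here is a simplified, self-contained sketch of the pattern; the real code lives in `onnxruntime/core/graph/graph.cc` and operates on `onnx::TensorProto`, so the types below are stand-ins:

```c++
#include <cstddef>
#include <deque>
#include <string>

// Stand-in for onnx::TensorProto with in-memory external data.
struct TensorProtoLike {
  bool has_in_memory_external_data = false;  // cf. utils::HasExternalDataInMemory()
  const void* in_memory_ptr = nullptr;       // bytes owned by the backing OrtValue
  size_t num_bytes = 0;
  std::string raw_data;                      // embedded data when not external
};

class InferenceContextSketch {
 public:
  // Mirrors the fixed getInputData(): always hand shape inference a proto
  // whose data is embedded, never an in-memory external reference.
  const TensorProtoLike* GetInputData(const TensorProtoLike& proto) {
    if (!proto.has_in_memory_external_data) return &proto;
    TensorProtoLike& tmp = temp_tensor_protos_.emplace_back();
    tmp.raw_data.assign(static_cast<const char*>(proto.in_memory_ptr),
                        proto.num_bytes);
    return &tmp;  // temporary copy, released when shape inference completes
  }

 private:
  // A deque keeps earlier element addresses stable as temporaries accumulate.
  std::deque<TensorProtoLike> temp_tensor_protos_;
};
```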

### Memory Impact

This fix introduces a minor and temporary increase in memory usage
during the model loading phase.

- **When:** The additional memory is allocated only when the shape
inference engine needs to access the data of a constant tensor that is
larger than 127 bytes. This is a one-time event during the initial
analysis of the model.
- **What:** The fix creates a temporary in-memory copy of the tensor
data.
- **Duration:** This temporary copy is released as soon as shape
inference is complete.

The impact on the overall peak memory usage of the application is
expected to be negligible. The memory usage during inference is not
affected. While it is theoretically possible for the temporary tensor to
be large if a multi-gigabyte constant tensor is used for shape
inference, this is a highly unlikely scenario in practice for
well-designed models.

### Testing

- Tested with the problematic model from issue #26261
- All optimization levels now work correctly (DISABLE_ALL, BASIC,
EXTENDED, ALL)
- Unit tests to be added

### Changes

- **onnxruntime/core/graph/graph.cc**:
  - Modified the `getInputData()` method in the `InferenceContextImpl` class
  - Added a `temp_tensor_protos_` member to store temporary TensorProtos
    during shape inference

## TODO

- [ ] Add unit tests
- [ ] Run full test suite

---------

Co-authored-by: Dmitri Smirnov <[email protected]>
apsonawane pushed a commit that referenced this pull request Oct 17, 2025
apsonawane pushed a commit that referenced this pull request Oct 20, 2025
fs-eire pushed a commit that referenced this pull request Oct 24, 2025
yuslepukhin added a commit that referenced this pull request Oct 30, 2025
…lues early (#26345)

### Description
Converts weights to OrtValues early and reverts "Properly remove in-memory
references (#25652)".
This reverts commit 3ca49d8 and makes the
appropriate adjustments for the current state of the code.

This PR is made possible by, and follows on the heels of,
#26263 and
#25833.

Previous history:
#23979
#25320
#25626
#25652

The first change (#26263)
allows us to convert initializers to OrtValues early and save lots of
memory at model loading time.

Specifically, for the Phi-4-mini-instruct-INT4 model, before and after look
like this:

**Before**

[Image: memory profile before the change (DEBUG build, 2025-10-16)]

**After**

<img width="997" height="114" alt="After change DEBUG 2025-10-16 144819"
src="https://github.com/user-attachments/assets/df1783af-7f50-4cd2-b3ad-6868f23be53f"
/>

The two peaks represent memory usage at optimization time (8.1 GB before the
change) and after weights memory-mapping (6.5 GB).
After this change, the corresponding numbers are 3.5 GB and 4.7 GB
respectively.
Most of the savings during the optimization phase come from
`ConstantFolding`, where we are able to reuse the resulting OrtValues
directly for the new initializers (see the sketch below).
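
A rough, illustrative sketch of that design choice; all names below are stand-ins rather than the actual optimizer code. Sharing the folded buffer avoids keeping two copies alive until `Resolve()` runs:

```c++
#include <memory>
#include <utility>
#include <vector>

struct OrtValueSketch {  // stand-in for an OrtValue holding a tensor
  std::vector<char> buffer;
};

struct GraphSketch {
  std::vector<std::shared_ptr<OrtValueSketch>> initializers;

  // Before: copy the folded bytes into a fresh initializer; two copies
  // stay alive until Resolve() drops the stale one (higher peak memory).
  void AddInitializerByCopy(const OrtValueSketch& folded) {
    initializers.push_back(std::make_shared<OrtValueSketch>(folded));
  }

  // After: hand the folded OrtValue to the graph directly (no duplicate buffer).
  void AddInitializerByReuse(std::shared_ptr<OrtValueSketch> folded) {
    initializers.push_back(std::move(folded));
  }
};
```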

This PR concludes a series of PRs converting initializers to OrtValues.

Memory consumption before the conversion began was 9.3 GB and 6.7 GB
respectively. We are saving almost 6 GB during optimization and 2 GB in
the steady state.

[Image: memory profile before the conversion series began]

The model also loads about 12 seconds faster.

Example of ConstantFolding being one of the top contributors: we duplicate
memory, producing a higher peak, until Resolve removes the no-longer-used
initializers.

[Image: Snapshot 3, peak in ConstantFolding / Transpose optimizer]

<img width="1060" height="600" alt="Snapshot 4 Peak AddInitializer from
ConstantFolding"
src="https://github.com/user-attachments/assets/dd457ec6-23ee-4efd-8c60-625d5faad61e"
/>

<img width="325" height="160" alt="image"
src="https://github.com/user-attachments/assets/37c1194d-f683-49a7-afb1-073dfbb9bbfc"
/>


### Motivation and Context
Reduce memory usage.
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request Nov 2, 2025