Convert Initializers to OrtValues Phase 2 #25320
Merged
Conversation
Fix reshaping for external weights
Fix Fusion Helper
skottmckay reviewed on Jul 18, 2025
edgchen1 reviewed on Jul 22, 2025
adrianlizarraga approved these changes on Jul 23, 2025
adrianlizarraga added a commit that referenced this pull request on Jul 24, 2025
…tValues (#25482)

### Description
- Adds APIs to get information (file path, file offset, byte size) for initializers with data in external files. This allows EPs to do their own custom memory-mapping of initializer data. By default, EPs that don't have specific requirements can still use `ValueInfo_GetInitializerValue` to get an `OrtValue` with memory-mapped initializer data.
- Updates `OrtGraph` to load the `OrtValue` for an external initializer only on demand. This avoids memory-mapping all external initializers before the first call to `OrtEp::GetCapability`.

Follow-up to #25320.

New API functions:

| Function | Summary |
|----------|---------|
| `ValueInfo_GetExternalInitializerInfo` | Gets `OrtExternalInitializerInfo` from `OrtValueInfo` (or `NULL`). Must be released with `ReleaseExternalInitializerInfo`. |
| `ReleaseExternalInitializerInfo` | Releases the `OrtExternalInitializerInfo` instance. |
| `ExternalInitializerInfo_GetFilePath` | Returns the relative path to the file that stores the initializer's data. |
| `ExternalInitializerInfo_GetFileOffset` | Returns the byte offset within the file where the initializer's data is stored. |
| `ExternalInitializerInfo_GetByteSize` | Returns the size in bytes of the initializer's data within the file. |

Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
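As a usage illustration of the table above, here is a minimal sketch of how an EP might call these functions. The exact signatures are not quoted in this thread, so the conventions below (`OrtApi` members, a status-returning lookup, direct-return getters) are assumptions modeled on the existing C API style; consult onnxruntime_c_api.h for the real declarations.

```cpp
// Hedged sketch: an EP querying external-initializer info via the new C API.
// Assumption: these functions hang off OrtApi and follow its usual conventions.
#include <cstdio>
#include "onnxruntime_c_api.h"

void InspectInitializer(const OrtApi* ort, const OrtValueInfo* value_info) {
  OrtExternalInitializerInfo* info = nullptr;
  // Yields NULL via the out-param when the initializer is not file-backed.
  if (OrtStatus* st = ort->ValueInfo_GetExternalInitializerInfo(value_info, &info)) {
    std::fprintf(stderr, "error: %s\n", ort->GetErrorMessage(st));
    ort->ReleaseStatus(st);
    return;
  }
  if (info == nullptr) return;  // fall back to ValueInfo_GetInitializerValue

  const ORTCHAR_T* path = ort->ExternalInitializerInfo_GetFilePath(info);  // relative to the model
  int64_t offset = ort->ExternalInitializerInfo_GetFileOffset(info);
  size_t size = ort->ExternalInitializerInfo_GetByteSize(info);
  // An EP with custom requirements can now mmap [offset, offset + size) of `path` itself.
  (void)path; (void)offset; (void)size;
  ort->ReleaseExternalInitializerInfo(info);
}
```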
Member
Hi there! We haven't cut the release branch for this version yet, so I'm removing the
RyanMetcalfeInt8 pushed a commit to RyanMetcalfeInt8/onnxruntime that referenced this pull request on Jul 29, 2025
…tValues (microsoft#25482)
wcy123 pushed a commit to wcy123/onnxruntime that referenced this pull request on Aug 1, 2025
Related to microsoft#25320 and microsoft#23979.
This was referenced Aug 1, 2025
carzh pushed a commit that referenced this pull request on Aug 7, 2025
adrianlizarraga pushed a commit that referenced this pull request on Aug 8, 2025

### Description
Related to #25320 and #23979. Enables tensor raw-data sharing for tensor protos externalized with kTensorProtoMemoryAddressTag (a sketch of what that tag means follows below).

### Motivation and Context
With #25320 and #23979, all initialized tensor protos are associated with an OrtValue; the VitisAI EP needs to adapt to this change.

Co-authored-by: mingyue <[email protected]>
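For context, a hedged sketch of what "externalized with kTensorProtoMemoryAddressTag" means at the TensorProto level: the proto claims EXTERNAL data, but the `location` entry carries ORT's in-memory tag instead of a real file path. The tag string below is a placeholder for illustration only; the real constant is defined inside onnxruntime.

```cpp
// Hedged sketch: recognizing an in-memory "external" initializer.
// kTensorProtoMemoryAddressTag's actual value is defined by onnxruntime;
// the string here is a stand-in for illustration only.
#include <string>
#include "onnx/onnx_pb.h"

constexpr const char* kTensorProtoMemoryAddressTag = "/ORT_MEM_ADDR/";  // placeholder value

bool HasInMemoryAddressTag(const ONNX_NAMESPACE::TensorProto& proto) {
  if (proto.data_location() != ONNX_NAMESPACE::TensorProto_DataLocation_EXTERNAL) {
    return false;
  }
  // external_data is a list of key/value entries; for file-backed data the
  // "location" entry is a relative path, for in-memory data it is the tag.
  for (const auto& entry : proto.external_data()) {
    if (entry.key() == "location" && entry.value() == kTensorProtoMemoryAddressTag) {
      return true;
    }
  }
  return false;
}
```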
adrianlizarraga pushed a commit that referenced this pull request on Aug 9, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request on Aug 11, 2025
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request on Aug 11, 2025
…tValues (microsoft#25482)
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request on Aug 11, 2025
gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request on Sep 2, 2025
snnn added a commit that referenced this pull request on Oct 8, 2025

When Constant nodes have tensors larger than 127 bytes, they are converted to OrtValues with in-memory external data for efficiency. However, ONNX shape inference rejects TensorProtos with data_location=EXTERNAL, as it cannot distinguish between in-memory and file-based external data. This fix modifies InferenceContextImpl::getInputData() to detect in-memory external data and materialize it into a temporary TensorProto with embedded data that ONNX shape inference can process.

Fixes #26261. The issue was introduced in commit 3b97d79 (PR #25320), which converted large initializers to OrtValues. This regression caused models with Constant nodes holding tensors just over 127 bytes to fail loading with shape-inference errors.

Changes:
- Modified getInputData() to check for in-memory external data using utils::HasExternalDataInMemory()
- When detected, retrieves the OrtValue and creates a temporary TensorProto with embedded data (use_tensor_buffer=false)
- Added a temp_tensor_protos_ member to store these temporary protos so they outlive the shape-inference call
yuslepukhin added a commit that referenced this pull request on Oct 14, 2025

## Description
Fixes #26261. This PR resolves a regression introduced in v1.23.0 where models with Constant nodes containing tensors larger than 127 bytes fail to load with a shape-inference error.

### Root Cause
Commit 3b97d79 (PR #25320) introduced an optimization to convert large Constant node tensors (> 127 bytes) into OrtValues with in-memory external data references for better memory management. However, ONNX shape inference cannot distinguish between in-memory and file-based external data, and rejects any TensorProto with `data_location = EXTERNAL`.

### The Fix
Modified `InferenceContextImpl::getInputData()` to (see the sketch after this section):
1. Detect tensors with in-memory external data using `utils::HasExternalDataInMemory()`
2. Retrieve the corresponding OrtValue
3. Create a temporary TensorProto with embedded data (not an external reference)
4. Provide this temporary proto to ONNX shape inference

This allows ONNX shape inference to access the actual tensor data without rejecting it as external.

### Memory Impact
This fix introduces a minor, temporary increase in memory usage during the model-loading phase.
- **When:** The additional memory is allocated only when the shape-inference engine needs the data of a constant tensor larger than 127 bytes. This is a one-time event during the initial analysis of the model.
- **What:** The fix creates a temporary in-memory copy of the tensor data.
- **Duration:** The temporary copy is released as soon as shape inference completes.

The impact on the application's overall peak memory usage is expected to be negligible, and memory usage during inference is not affected. While the temporary tensor could in theory be large if a multi-gigabyte constant tensor were used for shape inference, that is highly unlikely in practice for well-designed models.

### Testing
- Tested with the problematic model from issue #26261
- All optimization levels now work correctly (DISABLE_ALL, BASIC, EXTENDED, ALL)
- Unit tests to be added

### Changes
- **onnxruntime/core/graph/graph.cc**:
  - Modified the `getInputData()` method in the `InferenceContextImpl` class
  - Added a `temp_tensor_protos_` member to store temporary TensorProtos during shape inference

## TODO
- [ ] Add unit tests
- [ ] Run full test suite

Co-authored-by: Dmitri Smirnov <[email protected]>
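An illustrative sketch of the materialization step described under "The Fix" above — not the exact onnxruntime source. It assumes the caller already holds the initializer's OrtValue and a `temp_tensor_protos_`-style vector that keeps the copies alive through shape inference; include paths are best-effort guesses at the real headers.

```cpp
// Illustrative sketch of the fix's core pattern (not the actual graph.cc code):
// turn an in-memory-external TensorProto into one with embedded raw data so
// ONNX shape inference will accept it.
#include <memory>
#include <vector>
#include "core/framework/ort_value.h"
#include "core/framework/tensorprotoutils.h"

const ONNX_NAMESPACE::TensorProto* MaterializeForShapeInference(
    const ONNX_NAMESPACE::TensorProto& initializer, const OrtValue& value,
    std::vector<std::unique_ptr<ONNX_NAMESPACE::TensorProto>>& temp_protos) {
  // File-backed or already-embedded data can be handed to ONNX as-is.
  if (!onnxruntime::utils::HasExternalDataInMemory(initializer)) {
    return &initializer;
  }
  // Copy the proto, drop the external reference, and embed the actual bytes.
  auto temp = std::make_unique<ONNX_NAMESPACE::TensorProto>(initializer);
  temp->clear_external_data();
  temp->set_data_location(ONNX_NAMESPACE::TensorProto_DataLocation_DEFAULT);
  const auto& tensor = value.Get<onnxruntime::Tensor>();
  temp->set_raw_data(tensor.DataRaw(), tensor.SizeInBytes());
  // Stored in a member-like vector so the proto outlives the inference call.
  temp_protos.push_back(std::move(temp));
  return temp_protos.back().get();
}
```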
apsonawane pushed a commit that referenced this pull request on Oct 17, 2025
apsonawane pushed a commit that referenced this pull request on Oct 20, 2025
fs-eire pushed a commit that referenced this pull request on Oct 24, 2025
yuslepukhin added a commit that referenced this pull request on Oct 30, 2025

…lues early (#26345)

### Description
Converts weights early and reverts "Properly remove in-memory references (#25652)". This reverts commit 3ca49d8 and makes the appropriate adjustments for the current state of the code.

This PR was made possible by, and follows on the heels of, #26263 and #25833. Previous history: #23979, #25320, #25626, #25652.

The first change (#26263) allows us to convert initializers to OrtValues early and save a great deal of memory at model-loading time. For the Phi-4-mini-instruct-INT4 model, the profiles look like this:

**Before:** [screenshot: memory profile before the change]
**After:** [screenshot: memory profile after the change]

The two peaks represent memory usage at optimization time (8.1 GB before) and after weights memory-mapping (6.5 GB before). After this change the corresponding numbers are 3.5 GB and 4.7 GB respectively.

Most of the savings during the optimization phase come from `ConstantFolding`, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues. Memory consumption before the conversion began was 9.3 GB and 6.7 GB respectively, so we are saving almost 6 GB during optimization and 2 GB in the steady state. The model also loads about 12 seconds faster.

An example of ConstantFolding being one of the top contributors, where memory is duplicated for a higher peak before Resolve takes care of no-longer-used initializers:

[Screenshots: peak during ConstantFolding / Transpose Optimizer; peak in AddInitializer called from ConstantFolding]

### Motivation and Context
Reduce memory usage.
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request on Nov 2, 2025
…6263)
Description
- Make protobuf weights refer to OrtValues on load.
- Create OrtValues for initializers loaded from ORT format as well, for uniformity.
- Adjust Graph::ToGraphProto() so exporting does not emit in-memory references in external data (see the sketch after this section).
- Make CoreML process external data, including in-memory references, so it can copy it.

Motivation and Context
Follow-up for #23979
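A hedged sketch of the Graph::ToGraphProto() adjustment above, not the actual implementation: when serializing, an initializer whose "external" data is really an in-memory reference must be written with embedded bytes, since a process-local address is meaningless in a saved model. The function name and include paths here are illustrative assumptions.

```cpp
// Hedged sketch of the export rule, not onnxruntime's actual ToGraphProto():
// in-memory external references are inlined as embedded bytes on export.
#include "core/framework/ort_value.h"
#include "core/framework/tensorprotoutils.h"

void ExportInitializer(const ONNX_NAMESPACE::TensorProto& initializer,
                       const OrtValue& value,
                       ONNX_NAMESPACE::GraphProto& graph_proto) {
  ONNX_NAMESPACE::TensorProto* out = graph_proto.add_initializer();
  *out = initializer;
  if (onnxruntime::utils::HasExternalDataInMemory(initializer)) {
    // A memory address must not leak into the serialized model; embed the data.
    out->clear_external_data();
    out->set_data_location(ONNX_NAMESPACE::TensorProto_DataLocation_DEFAULT);
    const auto& tensor = value.Get<onnxruntime::Tensor>();
    out->set_raw_data(tensor.DataRaw(), tensor.SizeInBytes());
  }
}
```

File-backed external data, by contrast, is left untouched on export; only the in-memory variant needs this inlining.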