Save much memory at model loading time by converting weights to OrtValues early #26345
Conversation
This reverts commit 3ca49d8. It also makes adjustments for the current source code state.
Pull Request Overview
This PR saves significant memory during model loading by converting weight initializers to OrtValues early in the graph construction process, rather than later during graph transformation. The changes revert the previous logic that deferred this conversion and implement early weight conversion at graph initialization time. The PR demonstrates dramatic memory savings during the optimization phase (from 8.1GB to 3.5GB for the Phi-4 Instruct model) by enabling reuse of OrtValues during constant folding operations.
Key changes:
- Early conversion of large initializers to OrtValues during graph construction
- Update of all graph transformation code to use AddInitializerWithExternalData instead of AddInitializer
- Removal of deferred initializer conversion logic from the session inference flow
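To illustrate the API shift in the second bullet, here is a minimal self-contained C++ sketch. The types below (`MiniOrtValue`, `MiniGraph`) are mocks invented for this example; only the method names mirror the real `Graph::AddInitializer` / `Graph::AddInitializerWithExternalData` calls, and the actual onnxruntime signatures differ.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Mock of an OrtValue: a reference-counted tensor buffer.
struct MiniOrtValue {
  std::shared_ptr<std::vector<float>> data;
};

// Mock graph holding named initializers.
struct MiniGraph {
  std::map<std::string, MiniOrtValue> initializers;

  // Old-style: copies the weight bytes into graph-owned storage (extra allocation).
  void AddInitializer(const std::string& name, const std::vector<float>& weights) {
    initializers[name] = {std::make_shared<std::vector<float>>(weights)};  // deep copy
  }

  // New-style: adopts the existing buffer, so no duplicate allocation is made.
  void AddInitializerWithExternalData(const std::string& name, MiniOrtValue value) {
    initializers[name] = std::move(value);  // transfers shared ownership only
  }
};
```

The point of the sketch is ownership: the old-style call duplicates the weight data, while the new-style call shares the buffer that already exists as an OrtValue.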
Reviewed Changes
Copilot reviewed 35 out of 35 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/core/graph/graph.cc | Implements early initializer-to-OrtValue conversion during graph construction |
| include/onnxruntime/core/graph/graph.h | Removes ConvertInitializersIntoOrtValues method declaration |
| onnxruntime/core/session/inference_session.cc | Removes deferred initializer conversion call from transform pipeline |
| onnxruntime/core/optimizer/*.cc | Updates optimizer classes to use AddInitializerWithExternalData for new initializers |
| orttraining/orttraining/core/optimizer/*.cc | Updates training optimizer classes to use AddInitializerWithExternalData |
| onnxruntime/test/ir/graph_test.cc | Removes test code that called the now-removed conversion method |
| onnxruntime/test/framework/cuda/fence_cuda_test.cc | Updates test utility to use AddInitializerWithExternalData |
Description
Converts weights early and reverts "Properly remove in-memory references (#25652)"
This reverts commit 3ca49d8 and makes appropriate adjustments for the current state of the code.
This PR is made possible by, and follows on the heels of:
#26263
#25833.
Previous history:
#23979
#25320
#25626
#25652
The first change (#26263) allows us to convert initializers to OrtValues early and save lots of memory at model loading time.
Specifically, for the Phi-4-mini-instruct-INT4 model, the before and after memory profiles look like this:
Before:

*(memory profile screenshot)*

After:
The two peaks represent memory usage at optimization time (8.1GB before) and after weights memory mapping (6.5GB).
After this change, the corresponding numbers are 3.5GB and 4.7GB respectively.
Most of the savings during the optimization phase come from ConstantFolding, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues.
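The constant-folding reuse described above can be sketched as follows. This is a self-contained illustration, not onnxruntime code: the buffer type and the three functions are hypothetical stand-ins for folding a node and materializing its result as a new initializer.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Buffer = std::shared_ptr<std::vector<float>>;

// Mock constant folding: evaluates an Add node and returns the result buffer.
Buffer FoldAdd(const Buffer& a, const Buffer& b) {
  auto out = std::make_shared<std::vector<float>>(a->size());
  for (size_t i = 0; i < a->size(); ++i) (*out)[i] = (*a)[i] + (*b)[i];
  return out;
}

// Old path: serialize the result into a proto-like copy, then re-load it.
// Two extra allocations, so the folded data exists three times at the peak.
Buffer OldStyleInitializer(const Buffer& folded) {
  std::vector<float> proto_copy(*folded);                   // copy #1 (TensorProto-like)
  return std::make_shared<std::vector<float>>(proto_copy);  // copy #2 (reloaded weight)
}

// New path: the folded value itself becomes the initializer, zero extra copies.
Buffer NewStyleInitializer(Buffer folded) {
  return folded;  // shared ownership; no duplication
}
```

The design point is that the folding result is already tensor data in the right layout, so handing the same buffer to the graph avoids the round-trip through a serialized copy.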
Memory consumption before the conversion effort began was 9.3GB and 6.7GB respectively. We are saving almost 6GB during the optimization phase and 2GB in the steady state.
The model also loads about 12 seconds faster.
An example of ConstantFolding being one of the top memory contributors: memory is duplicated, producing a higher peak, until Resolve removes the no-longer-used initializers.

*(profiler screenshot)*
Motivation and Context
Reduce memory usage.