[NV TRT RTX EP] Reconfigure memory arena to grow with power of 2 #25800

gedoensmax · 2025-08-20T20:21:35Z

This reconfiguration is done to NOT allocate tensors with an exact matching size. If that strategy is used a tensor will always trigger an allocation in the arena and not reuse memory since the memory size has to exactly match.
This became a big problem with ORT GenAI since the arena grew constantly when prompting with different prompt lengths. No arena shrinkage was triggered to return older tensors. @skottmckay I am happy to be educated of a better usage of the allocators.

Issues with this:
Since the arena is not used for workspace allocations anymore (using reserve) it will likely not be possible in the future to allocate on a stream and immediately free memory after an enqueue call. That could have enabled workspace sharing in a multi model pipeline very nicely.

@chilo-ms can you help merge this.

…llocation

chilo-ms · 2025-08-20T22:19:42Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines · 2025-08-20T22:20:02Z

Azure Pipelines successfully started running 5 pipeline(s).

@skottmckay

) This reconfiguration is done to NOT allocate tensors with an exact matching size. If that strategy is used a tensor will always trigger an allocation in the arena and not reuse memory since the memory size has to exactly match. This became a big problem with ORT GenAI since the arena grew constantly when prompting with different prompt lengths. No arena shrinkage was triggered to return older tensors. @skottmckay I am happy to be educated of a better usage of the allocators. Issues with this: Since the arena is not used for workspace allocations anymore (using reserve) it will likely not be possible in the future to allocate on a stream and immediately free memory after an enqueue call. That could have enabled workspace sharing in a multi model pipeline very nicely. @chilo-ms can you help merge this.

### Description Cherry-pick the following PRs into the `rel-1.23.0` branch: - #25592 - #25622 - #25688 - #25729 - #25743 - #25769 - #25745 - #25761 - #25751 - #25716 - #25228 - #25768 - #25788 - #25747 - #25800 - #25818 - #25762 - #25749 - #25831 ### Motivation and Context  --------- Co-authored-by: quic-tirupath <[email protected]> Co-authored-by: quic-calvnguy <[email protected]> Co-authored-by: qti-kromero <[email protected]> Co-authored-by: Jeff Kilpatrick <[email protected]> Co-authored-by: Scott McKay <[email protected]> Co-authored-by: David Fan <[email protected]> Co-authored-by: kuanyul-qti <[email protected]> Co-authored-by: Dmitri Smirnov <[email protected]> Co-authored-by: Chi Lo <[email protected]> Co-authored-by: Edward Chen <[email protected]> Co-authored-by: Chunye Wang@AMD <[email protected]> Co-authored-by: minfhong-qti <[email protected]> Co-authored-by: Vishal Agarwal <[email protected]> Co-authored-by: Maximilian Müller <[email protected]> Co-authored-by: Maximilian Müller <[email protected]> Co-authored-by: Changming Sun <[email protected]> Co-authored-by: adrastogi <[email protected]> Co-authored-by: Aditya Rastogi <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

@skottmckay

…rosoft#25800) This reconfiguration is done to NOT allocate tensors with an exact matching size. If that strategy is used a tensor will always trigger an allocation in the arena and not reuse memory since the memory size has to exactly match. This became a big problem with ORT GenAI since the arena grew constantly when prompting with different prompt lengths. No arena shrinkage was triggered to return older tensors. @skottmckay I am happy to be educated of a better usage of the allocators. Issues with this: Since the arena is not used for workspace allocations anymore (using reserve) it will likely not be possible in the future to allocate on a stream and immediately free memory after an enqueue call. That could have enabled workspace sharing in a multi model pipeline very nicely. @chilo-ms can you help merge this.

reconfigure the preferred allocator to not infer with tensor memory a…

3ff4ac6

…llocation

chilo-ms added ep:NvRTX NV RTX execution provider release:1.23.0 labels Aug 20, 2025

jywu-msft approved these changes Aug 22, 2025

View reviewed changes

jywu-msft merged commit 63c1d1a into microsoft:main Aug 22, 2025
86 checks passed

adrianlizarraga mentioned this pull request Aug 22, 2025

ORT 1.23.0 cherry-pick prs 25592 - 25831 #25805

Merged

jywu-msft removed the release:1.23.0 label Aug 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[NV TRT RTX EP] Reconfigure memory arena to grow with power of 2 #25800

[NV TRT RTX EP] Reconfigure memory arena to grow with power of 2 #25800

Uh oh!

gedoensmax commented Aug 20, 2025

Uh oh!

chilo-ms commented Aug 20, 2025

Uh oh!

azure-pipelines bot commented Aug 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[NV TRT RTX EP] Reconfigure memory arena to grow with power of 2 #25800

[NV TRT RTX EP] Reconfigure memory arena to grow with power of 2 #25800

Uh oh!

Conversation

gedoensmax commented Aug 20, 2025

Uh oh!

chilo-ms commented Aug 20, 2025

Uh oh!

azure-pipelines bot commented Aug 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants