
Conversation

@adrianlizarraga (Contributor) commented Jul 19, 2025

Description

  • Adds `ModelCompilationOptions_SetOutputModelWriteFunc` to the compile API to allow writing the output model's ONNX bytes to a user-provided write function (e.g., for streaming).
  • Adds `ModelCompilationOptions_SetOutputModelHandleInitializerFunc` to the compile API to allow the user to write individual initializers to a destination of their choosing. Also allows specifying whether an initializer should be embedded within the ONNX model or written to a custom file.
  • Adds C++, Python, and C# bindings for the new APIs.

A follow-up PR adds a write function for EPContext node binary data: #25471

Example

`ModelCompilationOptions_SetOutputModelWriteFunc`:

```cpp
static OrtStatus* ORT_API_CALL TestWriteToStream(void* stream_state, const void* buffer, size_t buffer_num_bytes) {
  std::ofstream* outfile = reinterpret_cast<std::ofstream*>(stream_state);
  outfile->write(reinterpret_cast<const char*>(buffer), buffer_num_bytes);
  return nullptr;  // No error
}

// Implementation of OrtOutStreamWriteFunc that directly returns an OrtStatus indicating an error.
static OrtStatus* ORT_API_CALL ReturnStatusFromStream(void* stream_state, const void* buffer, size_t buffer_num_bytes) {
  ORT_UNUSED_PARAMETER(stream_state);
  ORT_UNUSED_PARAMETER(buffer);
  ORT_UNUSED_PARAMETER(buffer_num_bytes);
  return Ort::GetApi().CreateStatus(ORT_FAIL, "Error from OrtOutStreamWriteFunc callback");
}

// Test using the CompileModel() API with settings:
// - input model comes from a file
// - write output model to custom write stream
TEST_F(QnnHTPBackendTests, CompileApi_InputFile_WriteOutputModelBytes) {
  const ORTCHAR_T* input_model_file = ORT_TSTR("./compileapi_inputfile_writeoutputmodelbytes.onnx");
  std::filesystem::remove(input_model_file);

  // Create a test model and save it to a file.
  TestModel test_model;
  CreateTestModel(BuildGraphWithQAndNonQ(false), 21, logging::Severity::kERROR, test_model);
  ASSERT_STATUS_OK(test_model.Save(input_model_file));

  // Initialize session options with QNN EP.
  Ort::SessionOptions so;
  ProviderOptions provider_options;
  provider_options["backend_type"] = "htp";
  provider_options["offload_graph_io_quantization"] = "0";
  so.AppendExecutionProvider("QNN", provider_options);

  const ORTCHAR_T* output_model_file = ORT_TSTR("compileapi_inputfile_writeoutputmodelbytes_ctx.onnx");
  std::filesystem::remove(output_model_file);

  // Open an output file. Test will incrementally write the output model to file
  // via calls to our OrtOutStreamWriteFunc callback.
  ASSERT_FALSE(std::filesystem::exists(output_model_file));
  std::ofstream outfile(output_model_file, std::ios::binary);

  // Create model compilation options from the session options.
  Ort::ModelCompilationOptions compile_options(*ort_env, so);
  compile_options.SetInputModelPath(input_model_file);
  compile_options.SetOutputModelWriteFunc(TestWriteToStream, reinterpret_cast<void*>(&outfile));
  compile_options.SetEpContextEmbedMode(true);

  // Compile the model.
  Ort::Status status = Ort::CompileModel(*ort_env, compile_options);
  ASSERT_TRUE(status.IsOK()) << status.GetErrorMessage();
  outfile.flush();
  outfile.close();

  // Check that the compiled model has the expected number of EPContext nodes.
  ASSERT_TRUE(std::filesystem::exists(output_model_file));
  CheckEpContextNodeCounts(output_model_file, 2, 2);
}
```

`ModelCompilationOptions_SetOutputModelHandleInitializerFunc`:

```cpp
struct CustomInitializerHandlerState {
  const ORTCHAR_T* external_file_path = nullptr;
  std::ofstream* outfile = nullptr;
};

static OrtStatus* ORT_API_CALL TestHandleInitializerDataFunc(void* state,
                                                             const char* initializer_name,
                                                             const OrtValue* initializer_value,
                                                             const OrtExternalInitializerInfo* /*external_info*/,
                                                             OrtExternalInitializerInfo** new_external_info) {
  const OrtApi& ort_api = Ort::GetApi();
  CustomInitializerHandlerState* custom_state = reinterpret_cast<CustomInitializerHandlerState*>(state);

  if (std::string("constant") == initializer_name) {
    // Keep a specific initializer in the model just to test both scenarios.
    // A real implementation may check the byte size and keep small initializers in the model.
    *new_external_info = nullptr;
    return nullptr;
  }

  //
  // Store other initializers in an external file.
  //

  // Get initializer's byte size.
  size_t byte_size = 0;
  if (OrtStatus* status = ort_api.GetTensorSizeInBytes(initializer_value, &byte_size); status != nullptr) {
    return status;
  }

  // Get initializer's data.
  const void* initializer_data = nullptr;
  if (OrtStatus* status = ort_api.GetTensorData(initializer_value, &initializer_data); status != nullptr) {
    return status;
  }

  // Write initializer data to some file.
  int64_t offset = custom_state->outfile->tellp();
  const ORTCHAR_T* location = custom_state->external_file_path;
  custom_state->outfile->write(static_cast<const char*>(initializer_data), byte_size);
  custom_state->outfile->flush();

  // Provide caller (ORT) with the new external info.
  if (OrtStatus* status = ort_api.CreateExternalInitializerInfo(location, offset, byte_size, new_external_info);
      status != nullptr) {
    return status;
  }

  return nullptr;
}

// Test using the CompileModel() API with settings:
// - input model comes from a file
// - write output model to a file
// - use callback to specify where each initializer is stored (i.e., external file or within model)
TEST_F(QnnHTPBackendTests, CompileApi_InputFile_OutputFile_InitializerHandler) {
  const ORTCHAR_T* input_model_file = ORT_TSTR("./compileapi_inputfile_outputfile_initializerhandler.onnx");
  const ORTCHAR_T* output_model_file = ORT_TSTR("./compileapi_inputfile_outputfile_initializerhandler_ctx.onnx");
  const ORTCHAR_T* initializer_file = ORT_TSTR("./compileapi_inputfile_outputfile_initializerhandler.bin");
  std::filesystem::remove(input_model_file);
  std::filesystem::remove(output_model_file);
  std::filesystem::remove(initializer_file);

  // Create a test model and save it to a file.
  TestModel test_model;
  CreateTestModel(BuildGraphWithQAndNonQ(false), 21, logging::Severity::kERROR, test_model);
  ASSERT_STATUS_OK(test_model.Save(input_model_file));

  // Initialize session options with QNN EP.
  Ort::SessionOptions so;
  ProviderOptions provider_options;
  provider_options["backend_type"] = "htp";
  provider_options["offload_graph_io_quantization"] = "0";
  so.AppendExecutionProvider("QNN", provider_options);

  // Open a file to store external initializers. ORT will call our handler function for every initializer.
  ASSERT_FALSE(std::filesystem::exists(initializer_file));
  std::ofstream outfile(initializer_file, std::ios::binary);
  CustomInitializerHandlerState custom_state = {initializer_file, &outfile};

  // Create model compilation options from the session options.
  Ort::ModelCompilationOptions compile_options(*ort_env, so);
  compile_options.SetInputModelPath(input_model_file);
  compile_options.SetOutputModelPath(output_model_file);
  compile_options.SetOutputModelHandleInitializerFunc(TestHandleInitializerDataFunc,
                                                      reinterpret_cast<void*>(&custom_state));
  compile_options.SetEpContextEmbedMode(true);

  // Compile the model.
  Ort::Status status = Ort::CompileModel(*ort_env, compile_options);
  ASSERT_TRUE(status.IsOK()) << status.GetErrorMessage();
  outfile.flush();
  outfile.close();

  ASSERT_TRUE(std::filesystem::exists(initializer_file));
  ASSERT_TRUE(std::filesystem::exists(output_model_file));
  CheckEpContextNodeCounts(output_model_file, 2, 2);
}

static OrtStatus* ORT_API_CALL ReuseExternalInitializers(void* state,
                                                         const char* /*initializer_name*/,
                                                         const OrtValue* /*initializer_value*/,
                                                         const OrtExternalInitializerInfo* external_info,
                                                         OrtExternalInitializerInfo** new_external_info) {
  // If the original initializer was stored in an external file, keep it there (just for testing).
  if (external_info != nullptr) {
    Ort::ConstExternalInitializerInfo info(external_info);
    auto location = info.GetFilePath();
    int64_t offset = info.GetFileOffset();
    size_t byte_size = info.GetByteSize();

    Ort::ExternalInitializerInfo new_info(nullptr);
    Ort::Status status = Ort::ExternalInitializerInfo::Create(location.c_str(), offset, byte_size, new_info);
    if (!status.IsOK()) {
      return status.release();
    }
    *new_external_info = new_info.release();

    // Keep track of number of reused external initializers so that we can assert
    // that we reused the expected number of initializers.
    // THIS IS TEST CODE. An application would not do this.
    size_t* num_reused_ext_initializers = reinterpret_cast<size_t*>(state);
    *num_reused_ext_initializers += 1;
    return nullptr;
  }

  // If not originally external, save it within the generated compiled model.
  *new_external_info = nullptr;
  return nullptr;
}
```

Motivation and Context

Add output streaming capabilities when saving compiled models.

@adrianlizarraga adrianlizarraga merged commit 8705c68 into main Sep 4, 2025
90 of 93 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/compile-api-output-stream branch September 4, 2025 20:10
tianleiwu pushed a commit that referenced this pull request Sep 4, 2025
@tianleiwu tianleiwu added cherry-picked Cherry-picked for a cherrypicks branch and removed release:1.23.0 labels Sep 4, 2025
jywu-msft pushed a commit that referenced this pull request Sep 5, 2025
### Description
Cherry-pick the following PRs:
#25943
#25937 
#25917
#25909
#25898
#25897
#25888
#25881
#25830
#25619
#25575
#25572
#25558
#25530
#25474
#25455
#25110

Also two dependent PRs for qMoE cpu: 
#25877
#25822

---------

Co-authored-by: xiaomsft <[email protected]>
Co-authored-by: Xiaoyan Hu <[email protected]>
Co-authored-by: Akshay Sonawane <[email protected]>
Co-authored-by: Kunal Vaishnavi <[email protected]>
Co-authored-by: Pradeep Sakhamoori <[email protected]>
Co-authored-by: mingyue <[email protected]>
Co-authored-by: Maximilian Müller <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Emmanuel <[email protected]>
Co-authored-by: Emmanuel Assumang <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: praneshgo <[email protected]>
Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: Jing Fang <[email protected]>
Co-authored-by: Ishwar Raut <[email protected]>
adrianlizarraga added a commit that referenced this pull request Oct 31, 2025
…#26439)

### Description
Fixes #26294

When using the old model compilation approach (session option configuration), ORT should verify that the generated output model does not already exist. Importantly, this check should be done _before_ calling an EP's compile() method. This PR fixes that check, which was unintentionally disabled by a previous PR (#25455).

Note that this check also (correctly) happens _after_ calling the EP's
compile() method, but it is better to catch it early if we can.



### Motivation and Context
Fixes a regression in the older compilation workflow.