@ludfjig ludfjig commented Sep 5, 2025

This PR improves error handling between host and guest functions to prevent memory leaks and ensure more reliable deserialization.

Previously, when a host function (invoked from the guest) returned an error, the error was reported immediately without unwinding the guest stack. This left guest-side allocations in an inconsistent state, leading to memory leaks on subsequent entries into the guest.

In addition, error reporting relied on a fragile mechanism: guest errors were manually serialized into a buffer, and the host detected them by attempting to deserialize the buffer as an error. If deserialization succeeded, an error was assumed to have occurred. This was risky because GuestError and FunctionCallResult are completely separate types, so nothing prevented them from sharing a serialized format.

Changes

  • Guest/host function calls now always return a FunctionCallResult, which explicitly represents either Ok or Err.
  • If a host function returns an error, the error is serialized back into the guest. The guest then unwinds properly and reports the error back to the host, which fixes the memory leak.
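
As a rough illustration of the two changes listed above, here is a minimal sketch in plain Rust. It is not the PR's code: `ReturnValue`, the fields of `GuestError`, the byte layout in `encode`/`decode`, and `guest_entry` are all hypothetical stand-ins (the real types are generated from the flatbuffer schemas). The point is only that an explicit Ok/Err tag removes the need to guess by trial deserialization, and that propagating the error with `?` lets the guest return through its normal path so its allocations are freed.

```rust
// Hypothetical stand-ins for the real Hyperlight types (assumed names and fields).
#[derive(Debug, Clone, PartialEq)]
pub enum ReturnValue {
    Void,
    Int(i32),
}

#[derive(Debug, Clone, PartialEq)]
pub struct GuestError {
    pub code: u32,
    pub message: String,
}

/// Every guest/host call yields exactly one of Ok or Err.
pub type FunctionCallResult = Result<ReturnValue, GuestError>;

// Assumed toy wire format: the first byte says which variant follows.
const TAG_OK: u8 = 0;
const TAG_ERR: u8 = 1;

pub fn encode(result: &FunctionCallResult) -> Vec<u8> {
    let mut buf = Vec::new();
    match result {
        Ok(ReturnValue::Void) => buf.extend_from_slice(&[TAG_OK, 0]),
        Ok(ReturnValue::Int(v)) => {
            buf.extend_from_slice(&[TAG_OK, 1]);
            buf.extend_from_slice(&v.to_le_bytes());
        }
        Err(e) => {
            buf.push(TAG_ERR);
            buf.extend_from_slice(&e.code.to_le_bytes());
            buf.extend_from_slice(e.message.as_bytes());
        }
    }
    buf
}

/// Decodes by inspecting the tag; returns None for malformed input.
pub fn decode(buf: &[u8]) -> Option<FunctionCallResult> {
    match buf {
        [TAG_OK, 0] => Some(Ok(ReturnValue::Void)),
        [TAG_OK, 1, a, b, c, d] => {
            Some(Ok(ReturnValue::Int(i32::from_le_bytes([*a, *b, *c, *d]))))
        }
        [TAG_ERR, a, b, c, d, msg @ ..] => {
            let code = u32::from_le_bytes([*a, *b, *c, *d]);
            let message = String::from_utf8(msg.to_vec()).ok()?;
            Some(Err(GuestError { code, message }))
        }
        _ => None,
    }
}

/// Hypothetical guest function that calls into the host.
/// On Err, `?` returns early and `_scratch` is dropped on the way out,
/// instead of the error being reported without unwinding.
pub fn guest_entry(host_call: impl Fn() -> FunctionCallResult) -> FunctionCallResult {
    let _scratch = vec![0u8; 64]; // guest-side allocation
    let value = host_call()?;
    Ok(value)
}
```

With a tag in front, an Ok payload and an Err payload can never be mistaken for each other, whatever their contents happen to be.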

TODO:

For C guests: if a host function returns an error, the guest will panic when trying to read an expected good value (because there is only an error instead) and leak memory as a result. I will fix this in a follow-up PR.

Closes #826
Closes #497

@ludfjig ludfjig force-pushed the host_error_leak_fix branch 2 times, most recently from d346b32 to 2e45037 Compare September 8, 2025 18:07
@ludfjig ludfjig added the kind/bugfix For PRs that fix bugs label Sep 8, 2025
@ludfjig ludfjig force-pushed the host_error_leak_fix branch 6 times, most recently from 8818cf5 to ae355d7 Compare September 9, 2025 18:10
@ludfjig ludfjig marked this pull request as ready for review September 9, 2025 18:20
@jsturtevant jsturtevant left a comment

Great catch! Overall this looks like an improvement, but others should probably take a look, as this was my first time reviewing this code.

assert!(
matches!(&res, HyperlightError::GuestError(_, msg) if msg == "Host function error!") // rust guest
|| matches!(&res, HyperlightError::GuestAborted(_, msg) if msg.contains("Host function error!")) // c guest
|| matches!(&res, HyperlightError::StackOverflow()) // c guest. TODO fix this. C guest leaks when host func returns error guest panics.

Do we have an issue for this? Was this the case prior?

@ludfjig ludfjig Sep 29, 2025

This was the case prior. Feedback is appreciated on how to make this a nicer C API (for all types, not just i32):

#[unsafe(no_mangle)]
pub extern "C" fn hl_get_host_return_value_as_Int() -> i32 {
    get_host_return_value().expect("Unable to get host return value as int")
}

Could we mimic something like what wasmtime does? https://github.com/bytecodealliance/wasmtime/blob/7380932631f7784d944cb0326a6ffaaf5dac29fc/crates/c-api/src/component/val.rs#L180-L185

I don't think this needs to be a blocker, as we are in this state today.
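
One possible shape for a non-panicking accessor, sketched here only to make the question concrete: return a status code and write the value through an out-pointer, so a C guest can check for a host error instead of hitting `expect`. This is neither wasmtime's API nor anything in this PR. The function name `hl_try_get_host_return_value_as_int`, the `HL_*` constants, and the thread-local stand-in for `get_host_return_value` are all hypothetical, and the sketch uses `std` purely to be self-contained (the real guest code may not have it available).

```rust
use std::cell::RefCell;

// Hypothetical stand-in for the guest's real host-return plumbing.
thread_local! {
    static LAST_HOST_RESULT: RefCell<Option<Result<i32, String>>> = RefCell::new(None);
}

/// Test helper (assumed): records what the "host" returned.
pub fn set_host_result(result: Result<i32, String>) {
    LAST_HOST_RESULT.with(|slot| *slot.borrow_mut() = Some(result));
}

/// Stand-in for `get_host_return_value`: takes the pending result, if any.
fn get_host_return_value() -> Result<i32, String> {
    LAST_HOST_RESULT
        .with(|slot| slot.borrow_mut().take())
        .unwrap_or_else(|| Err("no host return value available".to_string()))
}

// Assumed status codes for the C side.
pub const HL_OK: i32 = 0;
pub const HL_ERR_HOST: i32 = 1;
pub const HL_ERR_NULL_OUT: i32 = 2;

/// Writes the host's return value to `out` and returns HL_OK, or returns an
/// error code and leaves `out` untouched. Never panics on a host error.
/// A real export would also carry `#[unsafe(no_mangle)]` like the accessor above.
///
/// # Safety
/// `out` must be null or point to writable memory for one `i32`.
pub unsafe extern "C" fn hl_try_get_host_return_value_as_int(out: *mut i32) -> i32 {
    if out.is_null() {
        return HL_ERR_NULL_OUT;
    }
    match get_host_return_value() {
        Ok(value) => {
            // Caller guarantees `out` is valid; checked non-null above.
            unsafe { out.write(value) };
            HL_OK
        }
        Err(_) => HL_ERR_HOST,
    }
}
```

The same pattern would extend to other return types by varying the out-pointer type. Whether the error message should also be exposed to C is left open here.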

- Add all.fbs to include all schema files in one place
- Restructure function_call_result.fbs to use Result-like union
- Add HostError variant to ErrorCode enum in guest_error.fbs
- Update flatbuffer generation command in Justfile to use all.fbs
- Update documentation for new generation process

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update all generated Rust code based on the new schema definitions.
This includes new types for error handling and result structures.

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update function_types.rs to handle Result-like return values
- Simplify guest_error.rs wrapper implementation
- Update util.rs for new generated types
- Update mod.rs for new generated types

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Remove guest_err.rs from hyperlight_host (replaced by new error handling)
- Remove guest_err.rs from hyperlight_guest_bin (replaced by new error handling)
- Update func/mod.rs to remove obsolete import

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update initialized_multi_use.rs to use new Result-like error handling
- Update mem/mgr.rs to handle host function errors properly
- Update sandbox/outb.rs for new error propagation pattern

Signed-off-by: Ludvig Liljenberg <[email protected]>
- Update guest/host_comm.rs to use new Result-like return values
- Update guest_bin/call.rs to properly handle host function errors
- Update guest_bin/lib.rs to remove obsolete error handling import and make GUEST_HANDLE public (for use in C-API)
- Update guest_capi/error.rs to support new error types

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update sandbox_host_tests.rs to use the new Result-like error handling pattern.

Signed-off-by: Ludvig Liljenberg <[email protected]>
Update Cargo.lock and Cargo.toml files to reflect the dependency changes
needed for the new error handling implementation.

Signed-off-by: Ludvig Liljenberg <[email protected]>
Signed-off-by: Ludvig Liljenberg <[email protected]>
Signed-off-by: Ludvig Liljenberg <[email protected]>
@ludfjig ludfjig force-pushed the host_error_leak_fix branch from 6ecad00 to 3e17197 Compare September 29, 2025 18:50
jsturtevant previously approved these changes Sep 29, 2025
Signed-off-by: Ludvig Liljenberg <[email protected]>
Successfully merging this pull request may close these issues.

Memory Leak in Guest on Error Calling Host Function
Confusing ERROR tracing when not actually an error