Breaking revamp of Rust bindings for v3 #670

tnibler · 2025-10-28T14:14:59Z

Related to #574 , draft for the v3 Rust bindings. If we're gonna break APIs, let's just go all the way :D

The goal is to prevent unsoundness from the safe Rust interface, turning as many illegal operations as possible into compile errors , and catch the others at runtime.

Sorry for the mega-PR, but I don't know if it makes sense to split this up since all changes kind of affect everything.

Error handling, type safety

In general: existing methods on VectorType become unsafe fn xxx_unchecked, and safe variants are added that check all required invariants.

There's some ambiguity as to what is actually invalid/unsafe here. As far as I can tell, using an Index created with one ScalarKind (say int8) as one of a different type (like inserting an f64 vector) does not actually cause any memory unsafety. It can garble data though, which I'd say is enough to make the operation unsafe (pretty common in the ecosystem AFAIK?).

So I've added a check that the casts that will be done on the C++ make sense, but it would be good to have more opinions on which ones are ok and which ones are almost certainly mistakes. Some are easy (f32 to f16, bf16 is ok, f64 to b1x8 is not for example), others less so.

Safe `Index` views into file, buffer

Currently you can have an Index reading a buffer passed in by the user, which will hopefully stay live as long as the Index. Though the method is marked unsafe, it's easy to create a dangling pointer situation which is unfortunate in Rust of all places. Also, these immutable Index views will throw exceptions if any mutating methods are called on them.

My proposed solution is to separate the regular, mutable Index that owns its memory and the immutable IndexView into different types. This way IndexView can be associated with a lifetime so the backing buffer can't be dropped while it's being used, and the IndexView object (holding a pointer) can not be moved between threads.

While we're at it, splitting mutating and readonly methods on Index into different traits allows IndexView to only expose read methods and prevent some runtime exceptions this way.

Custom metric closure shenanigans

This part could maybe be split into a different PR. First a lot of duplicated code can be written in a generic way, which is always nice. There's also a memory leak pointed out in #629 (fixed by properly dropping the currently set metric closure when it's overwritten).

I only noticed that because I wanted to see how ergonomics could be improved here. Currently custom metrics are boxed closures taking *const T arguments, which would be very nice to instead turn into &[T] slices of correct length/dimensionality. Also, closures are currently double-boxed because Box<dyn Fn..> is a wide pointer, and metric_punned_t wants a regular pointer, so there's three (if I'm counting correctly) pointer dereferences per call to the metric function (trampoline -> pointer to wide pointer -> dynamic dispatch for dyn Fn trait object).

I think it's possible to add a shim in there to turn the pointers into slices without adding another indirection (currently closure_address is a pointer to the Box<dyn>, when it could instead point to the trait object/wide pointer instead and the trampoline casts that into a valid trait object reference). But I'm not sure if it's worth it when you could also just allow the user a regular fn (&[T], &[T]) -> Distance pointer, which avoid a lot of that and covers many (most?) use cases just fine.

FFI and closures is a difficult topic though so it would be cool for other to weigh in and check what I'm saying is correct.

Every method has a Sized bound anyway, so might as well put it a level higher. Also mark VectorType unsafe, since implementors must follow safety invariants, or else the entire program becomes unsound.

ashvardanian · 2025-11-01T13:00:52Z

Thanks for great suggestions, @tnibler! I'm fine with mega-PRs staged for major library rewrites. Assuming we are targeting the v3, it will take some time to fully rewrite the lib, but I'll keep you posted there. Your suggestions are invaluable 🤗

Before USearch v3, I'll be releasing ForkUnion v3 and UCall v1. The former has a Rust bindings as well, in case you have opinions on better parallel/concurrent abstractions for Rust 🦀

ashvardanian · 2025-11-01T13:03:20Z

P.S.: I'd love to include your entire commit history and thought process around different abstractions. It would be great if we could follow the same commit naming structure shared across the history of all projects I maintain, to be compatible with TinySemVer automation. Let's prefix the commits with intent verbs, like Add:, Improve:, Fix:, Docs:, Chore:, or Break: 🤗

flake.nix

tnibler · 2025-11-01T14:48:23Z

Oh yes you're right the commits are mess, none of this is final I was just looking for input and okay/disagreement on the rought direction. I'll rebase things into a neat order :)

The rough thought process:

If some methods need/want safe wrappers with error checks around the FFI call, it doesn't really cost anything to have the safe/unsafe _unchecked for all/most other methods as well even if they don't do anything special. That way, changes to requirements/invariants for the C++ API can happen without breaking the entire safe Rust API, since everything returns Result and users have to handle errors. Just gives a bit more flexibility for semver and everything.
The safety checks are also mostly the same for all VectorType implementors, so just one default implementation in the trait covers all of them which is nice.
Readonly/Read-write distinction at the type/trait level with Index and IndexView prevents illegal mutation calls to immutable Index at compile time
Separate Index and IndexView types also allows tracking the lifetime of the backing buffer for mmapped/in-memory views without adding a lifetime parameter to Index which would be really annoying.

Re: casting/number type conversion checks, they're just nice to have. They have to be conservative and really only prevent the obviously nonsensical casts.

It has to be pub since it's exposed through the public VectorType trait, but details of how FFI calls, trampolines etc are handled probably shouldn't be part of the public API.

This allows us to remove the unsafe `view_from_buffer` function by introducing a new IndexView type that is associated with the buffer's lifetime. It also prevents invalid usage of immutable Index instances, which would throw C++ exceptions if e.g., add() was called on an immutable view.

tnibler added 2 commits October 28, 2025 12:48

DRAFT: error handling, safe Rust methods

5e7ba0f

Add Sized bound to VectorType trait, mark trait unsafe

0b0f9b5

Every method has a Sized bound anyway, so might as well put it a level higher. Also mark VectorType unsafe, since implementors must follow safety invariants, or else the entire program becomes unsound.

tnibler force-pushed the safe-unsafe-rust branch from b558507 to a69e121 Compare October 28, 2025 21:08

ashvardanian added the v3 Breaking changes planned for v3 label Nov 1, 2025

ashvardanian marked this pull request as ready for review November 1, 2025 13:03

ashvardanian reviewed Nov 1, 2025

View reviewed changes

flake.nix Outdated Show resolved Hide resolved

tnibler added 5 commits November 1, 2025 15:49

Use generic change_metric_unchecked method, remove copy-paste impls

7612efa

Add (failing) test for memory leak in Index::change_metric

3f585d0

Fix leaked metric closure in change_metric

e8eb793

Mark MetricFunctionPtr as doc(hidden)

5532ef1

It has to be pub since it's exposed through the public VectorType trait, but details of how FFI calls, trampolines etc are handled probably shouldn't be part of the public API.

tnibler force-pushed the safe-unsafe-rust branch from a69e121 to 18c786b Compare November 1, 2025 14:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Breaking revamp of Rust bindings for v3 #670

Breaking revamp of Rust bindings for v3 #670

Uh oh!

tnibler commented Oct 28, 2025

Uh oh!

ashvardanian commented Nov 1, 2025

Uh oh!

ashvardanian commented Nov 1, 2025

Uh oh!

Uh oh!

tnibler commented Nov 1, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Breaking revamp of Rust bindings for v3 #670

Are you sure you want to change the base?

Breaking revamp of Rust bindings for v3 #670

Uh oh!

Conversation

tnibler commented Oct 28, 2025

Error handling, type safety

Safe Index views into file, buffer

Custom metric closure shenanigans

Uh oh!

ashvardanian commented Nov 1, 2025

Uh oh!

ashvardanian commented Nov 1, 2025

Uh oh!

Uh oh!

tnibler commented Nov 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Safe `Index` views into file, buffer

tnibler commented Nov 1, 2025 •

edited

Loading