-
-
Notifications
You must be signed in to change notification settings - Fork 781
feat(allocator): add String::from_strs_array_in
#9329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(allocator): add String::from_strs_array_in
#9329
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
0c71bac to
36a90a6
Compare
String::from_array_inString::from_array_in
CodSpeed Performance ReportMerging #9329 will not alter performanceComparing Summary
|
overlookmotel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea for this API is good. And we can also make it more performant with judicious use of some unsafe code! (to remove bounds checks)
However, the implementation I'm pretty sure is not correct. Add a test, and you'll see what I mean...
36a90a6 to
b990c9c
Compare
Ah, I see the problem, I reimplemented it. But I am not sure this if it has become more performant. |
overlookmotel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is, in my opinion, exactly how unsafe code should be used - enclosed in a small function where you can understand the logic and check it's sound, and presenting a safe interface to user, so they don't have to worry about checking that logic again every time they use it.
In addition to my suggested optimization below, I'd suggest:
Add more // SAFETY comments. Usually I go through the docs for the unsafe method that I'm calling and make a comment that "ticks off" each safety requirement one by one - and explains for each one how the code guarantees that requirement is satisfied.
This is boring and lengthy, but has 2 advantages: (a) it's a rigorous approach which makes it harder to miss anything and (b) it makes it easier for someone else to review the correctness of the logic.
For example, if one of the string is "", I'm wondering if it is legal to call ptr::copy_nonoverlapping with len = 0? Your comments don't reveal whether (a) you checked the docs and it is legal or (b) we don't know if it is or not. That kind of ambiguity isn't ideal for unsafe code.
Secondly, unsafe code should ideally have unit tests, and they should include edge cases e.g. empty array [&str; 0] or empty strings ["", "x", ""].
|
One other thing... What if the total length of the strings together exceeds That's not really possible on 64-bit machines, but it could be on WASM (32-bit). let (s1: &str, s2: &str) = get_strings_somehow();
// Lets say these 2 strings are both huge
assert!(s1.len() == 2 * 1024 * 1024 * 1024); // 2 GiB
assert!(s2.len() == 2 * 1024 * 1024 * 1024); // 2 GiB
// Panics in debug mode, but wraps around to 0 in release mode on 32-bit machine
let len = s1.len() + s2.len();
assert!(len == 0);
// `len` is 0 so this does not allocate
let mut vec = Vec::with_capacity_in(len, allocator);
let dst = vec.as_mut_ptr();
assert!(dst as usize == 1); // Dangling pointer, not valid for writes
// Later... boom! Overwrite 2 GiB of arbitrary memory
unsafe { ptr::copy_nonoverlapping(s1.as_ptr(), dst, s1.len()); |
542a429 to
caca4dd
Compare
|
One other thing: Could we call this method |
I like the new method name! |
caca4dd to
fef0716
Compare
String::from_array_inString::from_strs)array_in
String::from_strs)array_inString::from_strs_array_in
|
Thank you for helping me make this API work and safe! I really appreciate it. @overlookmotel |
Merge activity
|
This method is similar to in [Vec::from_array_in](https://github.com/oxc-project/oxc/blob/5acc6ec3e9b51b3c6649409759e5039b6bdce8eb/crates/oxc_allocator/src/vec.rs#L140-L167), this aims to solve the problem that we want to construct an `ArenaString` with the given `&str`s where from different variables. For example: https://github.com/oxc-project/oxc/blob/36a90a61e85bd132040dc9a562efc12e5ae59673/crates/oxc_transformer/src/common/helper_loader.rs#L309-L318 This can refactored to ```diff - let mut source = ArenaString::with_capacity_in(len, ctx.ast.allocator); - source.push_str(&self.module_name); - source.push_str("/helpers/"); - source.push_str(helper_name); + ArenaString::from_array_in([&self.module_name, "/helpers/", helper_name], ctx.ast.allocator); ```
fef0716 to
8b51a75
Compare
…from_strs_array_in` (#9639) Follow-on after #9329. More fully document the constraints that ensure the safety of `String::from_strs_array_in`. The implementation in #9329 was already sound, but the comments didn't prove that (and in fact without checking the docs for `ptr::copy_nonoverlapping` and `*mut T::add`, I wasn't sure that it *was* sound if some of the input strings are zero length). Also add a debug assertion to check the pointer calculations are correct. Also refactor: Use `String::from_utf8_unchecked` to construct the eventual `String`, instead of `String::from_raw_parts_in`. The 2 are currently equivalent, but `allocator_api2::vec::Vec` does not guaranteed that `with_capacity_in` won't allocate *more* bytes than requested. If it does, `String::from_utf8_unchecked` makes use of that spare capacity in the `String`, whereas `String::from_raw_parts_in(ptr, len, len, allocator)` doesn't. That wasn't a soundness hole because both `Vec` and `String` are non-`Drop`, but it would have been a potential memory leak if they were.

This method is similar to in Vec::from_array_in, this aims to solve the problem that we want to construct an
ArenaStringwith the given&strs where from different variables.For example:
oxc/crates/oxc_transformer/src/common/helper_loader.rs
Lines 309 to 318 in 36a90a6
This can refactored to