@Dunqing Dunqing commented Feb 24, 2025

This method is similar to `Vec::from_array_in`. It aims to solve the problem of constructing an `ArenaString` from several `&str`s that come from different variables.

For example:

// Construct string directly in arena without an intermediate temp allocation
fn get_runtime_source(&self, helper: Helper, ctx: &mut TraverseCtx<'a>) -> Atom<'a> {
    let helper_name = helper.name();
    let len = self.module_name.len() + "/helpers/".len() + helper_name.len();
    let mut source = ArenaString::with_capacity_in(len, ctx.ast.allocator);
    source.push_str(&self.module_name);
    source.push_str("/helpers/");
    source.push_str(helper_name);
    Atom::from(source)
}

This can be refactored to:

- let mut source = ArenaString::with_capacity_in(len, ctx.ast.allocator); 
- source.push_str(&self.module_name); 
- source.push_str("/helpers/"); 
- source.push_str(helper_name);
+ ArenaString::from_array_in([&self.module_name, "/helpers/", helper_name], ctx.ast.allocator);
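
The shape of the proposed API can be sketched with the standard library's `String`. This is a minimal safe sketch and not the PR's implementation: `concat_strs` is a hypothetical name, and it allocates on the global heap rather than in an arena.

```rust
/// Hypothetical stand-in for the proposed method, built on std `String`
/// instead of an arena-backed string.
pub fn concat_strs<const N: usize>(strs: [&str; N]) -> String {
    // Work out the final length up front so the buffer is allocated once.
    // In this safe version a wrong total only costs a reallocation.
    let len: usize = strs.iter().map(|s| s.len()).sum();
    let mut out = String::with_capacity(len);
    for s in strs {
        out.push_str(s);
    }
    out
}
```

A caller would pass the pieces as one array, for example `concat_strs([module_name, "/helpers/", helper_name])`, and would not need to compute the capacity by hand.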

@Dunqing Dunqing force-pushed the 02-24-feat_ast_allocator_add_string_from_array_in branch from 0c71bac to 36a90a6 Compare February 24, 2025 10:30
@github-actions github-actions bot added the C-enhancement Category - New feature or request label Feb 24, 2025
@Dunqing Dunqing changed the title feat(ast/allocator): add String::from_array_in feat(allocator): add String::from_array_in Feb 24, 2025

codspeed-hq bot commented Feb 24, 2025

CodSpeed Performance Report

Merging #9329 will not alter performance

Comparing 02-24-feat_ast_allocator_add_string_from_array_in (8b51a75) with main (ec922e9)

Summary

✅ 39 untouched benchmarks

@overlookmotel overlookmotel left a comment

The idea for this API is good. And we can also make it more performant with judicious use of some unsafe code! (to remove bounds checks)

However, the implementation I'm pretty sure is not correct. Add a test, and you'll see what I mean...

@Boshen Boshen marked this pull request as draft February 26, 2025 13:08
@Dunqing Dunqing force-pushed the 02-24-feat_ast_allocator_add_string_from_array_in branch from 36a90a6 to b990c9c Compare February 26, 2025 14:57

Dunqing commented Feb 26, 2025

> The idea for this API is good. And we can also make it more performant with judicious use of some unsafe code! (to remove bounds checks)
>
> However, the implementation I'm pretty sure is not correct. Add a test, and you'll see what I mean...

Ah, I see the problem. I have reimplemented it, but I am not sure whether it has become more performant.

@overlookmotel overlookmotel left a comment

Nice! This is, in my opinion, exactly how unsafe code should be used - enclosed in a small function where you can understand the logic and check it's sound, and presenting a safe interface to the user, so they don't have to worry about checking that logic again every time they use it.

In addition to my suggested optimization below, I'd suggest:

Add more // SAFETY comments. Usually I go through the docs for the unsafe method that I'm calling and make a comment that "ticks off" each safety requirement one by one - and explains for each one how the code guarantees that requirement is satisfied.

This is boring and lengthy, but has 2 advantages: (a) it's a rigorous approach which makes it harder to miss anything and (b) it makes it easier for someone else to review the correctness of the logic.

For example, if one of the strings is "", I'm wondering if it is legal to call ptr::copy_nonoverlapping with len = 0? Your comments don't reveal whether (a) you checked the docs and it is legal or (b) we don't know if it is or not. That kind of ambiguity isn't ideal for unsafe code.

Secondly, unsafe code should ideally have unit tests, and they should include edge cases e.g. empty array [&str; 0] or empty strings ["", "x", ""].
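
The two suggestions above (itemised `// SAFETY` comments and edge-case unit tests) might look roughly like the following. This is a sketch under assumptions, not the code in this PR: `concat_strs_unchecked` is a hypothetical name, and it uses std's `Vec<u8>` and `String` in place of the arena types.

```rust
use std::ptr;

/// Hypothetical std-based analogue of the method under review.
pub fn concat_strs_unchecked<const N: usize>(strs: [&str; N]) -> String {
    // Checked addition: a wrapped total would make the buffer too small
    // for the raw copies below.
    let total_len = strs
        .iter()
        .fold(0usize, |acc, s| acc.checked_add(s.len()).expect("length overflow"));

    let mut bytes: Vec<u8> = Vec::with_capacity(total_len);
    let mut dst = bytes.as_mut_ptr();

    for s in strs {
        // SAFETY (`ptr::copy_nonoverlapping`):
        // * `src` is valid for reads of `s.len()` bytes: it comes from a live `&str`.
        // * `dst` is valid for writes of `s.len()` bytes: the buffer holds at least
        //   `total_len` bytes, and the bytes written so far plus `s.len()` never
        //   exceed `total_len`.
        // * Both pointers are aligned: `u8` has alignment 1.
        // * The regions do not overlap: `bytes` is a fresh allocation that no
        //   input string can point into.
        // * `s.len() == 0`: per the `copy_nonoverlapping` docs a zero-sized copy
        //   still requires aligned pointers, which holds here.
        //
        // SAFETY (`<*mut u8>::add`): the result stays within the allocation or
        // one byte past its end, and an offset of 0 is always allowed.
        unsafe {
            ptr::copy_nonoverlapping(s.as_ptr(), dst, s.len());
            dst = dst.add(s.len());
        }
    }

    // SAFETY: the loop initialized exactly `total_len` bytes, and
    // `total_len <= bytes.capacity()`.
    unsafe { bytes.set_len(total_len) };
    // SAFETY: concatenating valid UTF-8 strings yields valid UTF-8.
    unsafe { String::from_utf8_unchecked(bytes) }
}

#[test]
fn empty_array() {
    let empty: [&str; 0] = [];
    assert_eq!(concat_strs_unchecked(empty), "");
}

#[test]
fn empty_strings() {
    assert_eq!(concat_strs_unchecked(["", "x", ""]), "x");
}
```

Each bullet in the comment corresponds to one requirement in the docs of the unsafe function being called, so a reviewer can check them off one at a time.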

overlookmotel commented Feb 26, 2025

One other thing... What if the total length of the strings together exceeds usize::MAX? Does sum() panic in case of arithmetic overflow in release mode? If it doesn't, then we have an opportunity for UB.

That's not really possible on 64-bit machines, but it could be on WASM (32-bit).

let (s1, s2): (&str, &str) = get_strings_somehow();
// Let's say these 2 strings are both huge
assert!(s1.len() == 2 * 1024 * 1024 * 1024); // 2 GiB
assert!(s2.len() == 2 * 1024 * 1024 * 1024); // 2 GiB
// Panics in debug mode, but wraps around to 0 in release mode on 32-bit machine
let len = s1.len() + s2.len();
assert!(len == 0);
// `len` is 0 so this does not allocate
let mut vec = Vec::with_capacity_in(len, allocator);
let dst = vec.as_mut_ptr();
assert!(dst as usize == 1); // Dangling pointer, not valid for writes
// Later... boom! Overwrite 2 GiB of arbitrary memory
unsafe { ptr::copy_nonoverlapping(s1.as_ptr(), dst, s1.len()) };
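
On the `sum()` question: the standard library docs for `Iterator::sum` say that it panics on overflow for primitive integers only when overflow checks are enabled, which is not the default in release builds. One way to close the gap is to add the lengths with `checked_add`. The sketch below is illustrative only. Both helper names are hypothetical, and they take lengths rather than strings so the overflow case can be exercised without allocating gigabytes.

```rust
/// Hypothetical helper: sum of `lens`, or `None` if the sum overflows `usize`.
pub fn checked_total_len(lens: &[usize]) -> Option<usize> {
    lens.iter().try_fold(0usize, |acc, &len| acc.checked_add(len))
}

/// What an unchecked sum effectively computes when overflow checks are off.
pub fn wrapping_total_len(lens: &[usize]) -> usize {
    lens.iter().fold(0usize, |acc, &len| acc.wrapping_add(len))
}
```

A caller that gets `None` can panic or return an error before any buffer is allocated, so the raw copy never sees a capacity smaller than the data.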

@Dunqing Dunqing force-pushed the 02-24-feat_ast_allocator_add_string_from_array_in branch 3 times, most recently from 542a429 to caca4dd Compare March 6, 2025 10:43
@overlookmotel

One other thing: Could we call this method String::from_strs_array_in? Otherwise the question is "array of what?". We might in future want to add e.g. String::from_bytes_array_in.

Dunqing commented Mar 6, 2025

> One other thing: Could we call this method String::from_strs_array_in? Otherwise the question is "array of what?". We might in future want to add e.g. String::from_bytes_array_in.

I like the new method name!

@Dunqing Dunqing force-pushed the 02-24-feat_ast_allocator_add_string_from_array_in branch from caca4dd to fef0716 Compare March 6, 2025 12:55
@Dunqing Dunqing changed the title feat(allocator): add String::from_array_in feat(allocator): add String::from_strs)array_in Mar 6, 2025
@Dunqing Dunqing changed the title feat(allocator): add String::from_strs)array_in feat(allocator): add String::from_strs_array_in Mar 6, 2025
@Dunqing Dunqing marked this pull request as ready for review March 6, 2025 12:56

Dunqing commented Mar 6, 2025

Thank you for helping me make this API work and safe! I really appreciate it. @overlookmotel

@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Mar 7, 2025

graphite-app bot commented Mar 7, 2025

Merge activity

  • Mar 7, 3:15 AM UTC: The merge label '0-merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Mar 7, 3:15 AM UTC: A user added this pull request to the Graphite merge queue.
  • Mar 7, 3:21 AM UTC: A user merged this pull request with the Graphite merge queue.

@graphite-app graphite-app bot force-pushed the 02-24-feat_ast_allocator_add_string_from_array_in branch from fef0716 to 8b51a75 Compare March 7, 2025 03:16
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Mar 7, 2025
@graphite-app graphite-app bot merged commit 8b51a75 into main Mar 7, 2025
33 checks passed
@graphite-app graphite-app bot deleted the 02-24-feat_ast_allocator_add_string_from_array_in branch March 7, 2025 03:21
@oxc-bot oxc-bot mentioned this pull request Mar 7, 2025
graphite-app bot pushed a commit that referenced this pull request Mar 10, 2025
…from_strs_array_in` (#9639)

Follow-on after #9329.

More fully document the constraints that ensure the safety of `String::from_strs_array_in`.

The implementation in #9329 was already sound, but the comments didn't prove that (and in fact without checking the docs for `ptr::copy_nonoverlapping` and `*mut T::add`, I wasn't sure that it *was* sound if some of the input strings are zero length).

Also add a debug assertion to check the pointer calculations are correct.

Also refactor:

Use `String::from_utf8_unchecked` to construct the eventual `String`, instead of `String::from_raw_parts_in`. The 2 are currently equivalent, but `allocator_api2::vec::Vec` does not guarantee that `with_capacity_in` won't allocate *more* bytes than requested. If it does, `String::from_utf8_unchecked` makes use of that spare capacity in the `String`, whereas `String::from_raw_parts_in(ptr, len, len, allocator)` doesn't.

That wasn't a soundness hole because both `Vec` and `String` are non-`Drop`, but it would have been a potential memory leak if they were.
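
The capacity point can be illustrated with the standard library types. This is a sketch under assumptions: it uses std's `Vec<u8>` and `String` rather than the `allocator_api2` and oxc types named above, and `capacity_round_trip` is a hypothetical name.

```rust
/// Returns (capacity of the `Vec`, capacity of the `String` built from it).
pub fn capacity_round_trip() -> (usize, usize) {
    // Reserve more than the 3 bytes written, standing in for an allocator
    // that hands back more space than was requested.
    let mut bytes: Vec<u8> = Vec::with_capacity(64);
    bytes.extend_from_slice(b"abc");
    let vec_capacity = bytes.capacity();

    // SAFETY: `b"abc"` is valid UTF-8.
    let s = unsafe { String::from_utf8_unchecked(bytes) };
    (vec_capacity, s.capacity())
}
```

Because the `String` takes over the `Vec`'s buffer as it is, the spare capacity stays usable. Rebuilding the string from raw parts with `capacity == len` would instead record a smaller capacity than the allocation really has.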