feat(allocator): replace allocator_api2's Vec with Vec2 #9656
CodSpeed Performance Report
Merging #9656 will improve performance by 5.27%
…9733) `stmts` here is an `ArenaVec<&Statement>` (`&Statement` not `Statement`). This is a temporary collection, and doesn't end up in the AST, so we shouldn't store it in the arena. Use a `std::vec::Vec` instead of `oxc_allocator::Vec`. This should also remove some of the lifetime problems we have in #9656. This change may cause a small perf hit, but in my view it's still a good change. If we want to get a speed boost, a better solution would be to have a temporary "scratch space" arena which we can allocate *all* temporary values into. This would likely give us a sizeable speed boost across many parts of Oxc (oxc-project/backlog#121).
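A minimal sketch of the idea in that commit message, assuming a hypothetical `Statement` enum and hypothetical `collect_exports` / `count_exports` helpers (none of this is Oxc's actual code): the temporary list of references is held in a `std::vec::Vec`, so it is freed when it goes out of scope and never occupies arena memory.

```rust
/// Hypothetical stand-in for an AST node; not Oxc's real `Statement`.
#[derive(Debug, PartialEq)]
pub enum Statement {
    Import(String),
    Export(String),
    Expression(String),
}

/// Collects references to the export statements into a temporary
/// `std::vec::Vec`. The vector only borrows from `program` and is
/// heap-allocated, so dropping it releases its memory immediately.
pub fn collect_exports(program: &[Statement]) -> Vec<&Statement> {
    program
        .iter()
        .filter(|stmt| matches!(stmt, Statement::Export(_)))
        .collect()
}

/// Uses the temporary collection and lets it drop at the end of the call.
pub fn count_exports(program: &[Statement]) -> usize {
    let stmts: Vec<&Statement> = collect_exports(program);
    stmts.len()
}
```

Because the element type is `&Statement`, no lifetime parameter for an arena appears in the signature at all, which is the simplification the commit message is pointing at.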
Here's a minimal reproduction of the lifetimes problem:

    use bumpalo::{Bump, collections::Vec};

    struct Test<'b> {
        _dummy: &'b (),
    }

    impl<'b> Test<'b> {
        pub fn test_vec(input: &Vec<'b, u64>, bump: &'b Bump) -> u64 {
            let vec = Vec::from_iter_in(input.iter(), bump);
            //                          ^^^^^^^^^^^^
            // Error: explicit lifetime required in the type of `input`. lifetime `'b` required.
            Self::sum(&vec)
        }

        fn sum(vec: &Vec<'b, &u64>) -> u64 {
            vec.iter().copied().sum()
        }
    }

This fixes it:

    -    fn sum(vec: &Vec<'b, &u64>) -> u64 {
    +    fn sum(vec: &Vec<'_, &u64>) -> u64 {

But I don't think that should be necessary. I think it should compile fine as is. Some discussion (but no complete conclusion) here: fitzgen/bumpalo#171
Yes! You are right!
Does that mean we should align the implementation with what std does? Accepting allocator rather than a
overlookmotel
left a comment
I remain unsure whether the `T: 'bump` lifetime bound is required on `vec2::Vec` (and on its various impls).
But now the only place this impacts our code is requiring a `C: 'new_alloc` bound on `CloneIn` for `Vec`, which is unproblematic.
So I think we can merge this now, and resolve the lifetime question later on.
Note that as well as the positive perf impact on the transformer, this PR also has a small negative effect on parser benchmarks.
I imagine this may be for the same reason that we saw the same performance effect in #9301. We can take a look at the parser code and see if we can improve that by reserving initial capacity in
Let me take a look at this.
Just replace allocator_api2's Vec with Vec2, and make the changes needed for it to compile successfully.
We may want to look at implementing the optimization in #9301 first. (or that may not even be an optimization any more!)
…9772) Follow-on after #9656. Add a defence against potential double-free bug in `String::from_utf8_unchecked`. As noted in the comment, this is probably unnecessary, but it doesn't hurt to add this defensive code while new `Vec` implementation is under development. We can remove it again later when we're satisfied we have covered all bases.
resolve: #9656 (comment)

#9656 brought a small performance improvement for the transformer but also caused a 1% performance hit in the parser. This PR recovers that performance by splitting `reserve_internal` into `reserve_exact_internal` and `reserve_amortized_internal`, which are the internal implementations of `reserve_exact` and `reserve` respectively.

Why does this change improve performance? The original `reserve_internal` implementation has a check for the reserve strategy, https://github.com/oxc-project/oxc/blob/fef680a4775559805e99622fb5aa6155cdf47034/crates/oxc_allocator/src/vec2/raw_vec.rs#L664-L668 which can be avoided because the caller of `reserve_internal` already knows the reserve strategy. After the change, `reserve_exact` and `reserve` call the corresponding internal implementation directly, which avoids the unnecessary check. Likewise, the `Fallibility` check can also be avoided, https://github.com/oxc-project/oxc/blob/fef680a4775559805e99622fb5aa6155cdf47034/crates/oxc_allocator/src/vec2/raw_vec.rs#L681-L683 because we know where the errors should be handled.

~~Due to this change, I also replaced Bumpalo's `CollectionAllocErr` with allocator-api2's `TryReserveError` because `CollectionAllocErr::AllocErr` cannot pass in a `Layout`.~~ I ended up reverting 937c61a as it caused a 1%-2% transformer performance regression (see [codspeed](https://codspeed.io/oxc-project/oxc/branches/03-15-pref_allocator_vec2_optimize_reserving_memory) and switch to the "replace CollectionAllocErr with TryReserveError" commit), and replaced it with 84edacd. I tried various ways to recover the performance, but none of them worked. I suspect the cause is that `TryReserveError` is 16 bytes whereas `CollectionAllocErr` is only 1 byte.

So, after both checks are removed, the performance returns to the original level. The whole change follows the standard library's `RawVec` implementation.
See https://doc.rust-lang.org/src/alloc/raw_vec.rs.html <img width="608" alt="image" src="https://github.com/user-attachments/assets/53066d8e-26f0-4eb1-8f33-4ca9e517e75b" />
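The split described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `RawCap`, `Strategy`, and `AllocErr` are hypothetical stand-ins that track a capacity number and allocate nothing, and the doubling growth rule is assumed for illustration; it is not the code in `raw_vec.rs`. The "before" routine branches on a strategy argument at run time, while the "after" routines each hard-code one strategy so the public methods can call them directly.

```rust
/// Toy capacity bookkeeping; a hypothetical stand-in for a raw vector
/// that performs no real allocation.
pub struct RawCap {
    pub cap: usize,
}

/// Hypothetical small error type (a fieldless enum occupying 1 byte).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum AllocErr {
    CapacityOverflow,
    AllocFailed,
}

#[derive(Clone, Copy)]
pub enum Strategy {
    Exact,
    Amortized,
}

impl RawCap {
    /// Before: one shared routine that checks the strategy on every call.
    pub fn reserve_internal(
        &mut self,
        len: usize,
        additional: usize,
        strategy: Strategy,
    ) -> Result<(), AllocErr> {
        if self.cap.saturating_sub(len) >= additional {
            return Ok(());
        }
        let required = len.checked_add(additional).ok_or(AllocErr::CapacityOverflow)?;
        self.cap = match strategy {
            Strategy::Exact => required,
            Strategy::Amortized => required.max(self.cap.saturating_mul(2)),
        };
        Ok(())
    }

    /// After: the exact strategy is fixed, so there is no strategy branch.
    fn reserve_exact_internal(&mut self, len: usize, additional: usize) -> Result<(), AllocErr> {
        if self.cap.saturating_sub(len) >= additional {
            return Ok(());
        }
        self.cap = len.checked_add(additional).ok_or(AllocErr::CapacityOverflow)?;
        Ok(())
    }

    /// After: the amortized (doubling) strategy is fixed as well.
    fn reserve_amortized_internal(&mut self, len: usize, additional: usize) -> Result<(), AllocErr> {
        if self.cap.saturating_sub(len) >= additional {
            return Ok(());
        }
        let required = len.checked_add(additional).ok_or(AllocErr::CapacityOverflow)?;
        self.cap = required.max(self.cap.saturating_mul(2));
        Ok(())
    }

    /// Public entry points call the matching internal routine directly.
    pub fn reserve(&mut self, len: usize, additional: usize) -> Result<(), AllocErr> {
        self.reserve_amortized_internal(len, additional)
    }

    pub fn reserve_exact(&mut self, len: usize, additional: usize) -> Result<(), AllocErr> {
        self.reserve_exact_internal(len, additional)
    }
}
```

Both versions compute the same capacities; the difference is only that the specialized routines carry no strategy argument to test. The sketch also keeps the error type as a fieldless enum, in the spirit of the size observation above: a larger error type makes every `Result` returned on this path larger too.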
self.reserve(1) calls with self.grow_one() for better efficiency
#9856

