perf(allocator/vec2): optimize reserving memory #9792
CodSpeed Performance Report: Merging #9792 will not alter performance.
overlookmotel
left a comment
Excellent!
resolve: #9656 (comment)

#9656 brought a small performance improvement for the transformer but also caused a 1% performance regression in the parser. This PR recovers that performance by splitting `reserve_internal` into `reserve_exact_internal` and `reserve_amortized_internal`, which are the internal implementations of `reserve_exact` and `reserve` respectively.

Why does this change improve performance?

The original `reserve_internal` implementation has a check for the reserve strategy,

https://github.com/oxc-project/oxc/blob/fef680a4775559805e99622fb5aa6155cdf47034/crates/oxc_allocator/src/vec2/raw_vec.rs#L664-L668

which can be avoided because the caller of `reserve_internal` already knows the reserve strategy. After the change, `reserve_exact` and `reserve` call the corresponding internal implementation directly, which avoids the unnecessary check.

Likewise, the `Fallibility` check can also be avoided,

https://github.com/oxc-project/oxc/blob/fef680a4775559805e99622fb5aa6155cdf47034/crates/oxc_allocator/src/vec2/raw_vec.rs#L681-L683

because we know where the errors should be handled.

~~Due to this change, I also replaced Bumpalo's `CollectionAllocErr` with allocator-api2's `TryReserveError` because `CollectionAllocErr::AllocErr` cannot pass in a `Layout`.~~

I ended up reverting 937c61a because it caused a 1%-2% transformer performance regression (see [codspeed](https://codspeed.io/oxc-project/oxc/branches/03-15-pref_allocator_vec2_optimize_reserving_memory) and switch to the "replace CollectionAllocErr with TryReserveError" commit), and replaced it with 84edacd. I tried various ways to recover the performance, but none of them worked. I suspect the cause is that `TryReserveError` is 16 bytes whereas `CollectionAllocErr` is only 1 byte.

So, after both checks are removed, performance returns to its original level. The whole change follows the standard library's `RawVec` implementation. See https://doc.rust-lang.org/src/alloc/raw_vec.rs.html

<img width="608" alt="image" src="https://github.com/user-attachments/assets/53066d8e-26f0-4eb1-8f33-4ca9e517e75b" />
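For readers who don't have the diff open, the split described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual `oxc_allocator` code: `Buf`, `ReserveError`, and `capacity_overflow` are made-up names, the buffer only tracks a capacity number instead of owning memory, and the growth rule (double or required, whichever is larger) is an assumption for illustration.

```rust
// Simplified, hypothetical model of the refactor; not the oxc_allocator code.
// `Buf` only tracks a capacity number instead of owning real memory.

#[derive(Debug, PartialEq)]
pub enum ReserveError {
    CapacityOverflow,
}

#[derive(Clone, Copy)]
pub enum ReserveStrategy {
    Exact,
    Amortized,
}

pub struct Buf {
    cap: usize,
}

impl Buf {
    pub fn new() -> Self {
        Buf { cap: 0 }
    }

    pub fn capacity(&self) -> usize {
        self.cap
    }

    // Before: one shared routine that inspects `strategy` at run time,
    // even though each caller always passes the same constant.
    pub fn reserve_internal(
        &mut self,
        used: usize,
        extra: usize,
        strategy: ReserveStrategy,
    ) -> Result<(), ReserveError> {
        if self.cap.saturating_sub(used) >= extra {
            return Ok(()); // enough room already
        }
        let required = used.checked_add(extra).ok_or(ReserveError::CapacityOverflow)?;
        self.cap = match strategy {
            ReserveStrategy::Exact => required,
            ReserveStrategy::Amortized => required.max(self.cap.saturating_mul(2)),
        };
        Ok(())
    }

    // After: one routine per strategy, so there is no strategy argument to check.
    pub fn reserve_exact_internal(&mut self, used: usize, extra: usize) -> Result<(), ReserveError> {
        if self.cap.saturating_sub(used) >= extra {
            return Ok(());
        }
        self.cap = used.checked_add(extra).ok_or(ReserveError::CapacityOverflow)?;
        Ok(())
    }

    pub fn reserve_amortized_internal(&mut self, used: usize, extra: usize) -> Result<(), ReserveError> {
        if self.cap.saturating_sub(used) >= extra {
            return Ok(());
        }
        let required = used.checked_add(extra).ok_or(ReserveError::CapacityOverflow)?;
        self.cap = required.max(self.cap.saturating_mul(2));
        Ok(())
    }

    // The infallible public wrappers handle the error themselves, so the
    // internal routines need no fallibility flag either.
    pub fn reserve_exact(&mut self, used: usize, extra: usize) {
        if self.reserve_exact_internal(used, extra).is_err() {
            capacity_overflow();
        }
    }

    pub fn reserve(&mut self, used: usize, extra: usize) {
        if self.reserve_amortized_internal(used, extra).is_err() {
            capacity_overflow();
        }
    }
}

#[cold]
fn capacity_overflow() -> ! {
    panic!("capacity overflow")
}
```

In this sketch both versions compute the same capacities; the only difference is that the "after" routines no longer take or match on a strategy value, and the decision to panic lives in the public wrappers rather than inside the shared routine.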