Skip to content

Conversation

@Dunqing
Copy link
Member

@Dunqing Dunqing commented Mar 15, 2025

resolve: #9656 (comment)

#9656 brought a small performance improvement for the transformer but also led the parser to 1% performance hits. This PR returns performance by splitting reserve_internal to reserve_exact_internal and reserve_amortized_internal respectively the internal implementation of reserve_exact and reserve.

Why the change can improve performance?

The original reserve_internal implementation has a check for reserve strategy,

// Nothing we can really do about these checks :(
let new_cap = match strategy {
Exact => used_cap.checked_add(needed_extra_cap).ok_or(CapacityOverflow)?,
Amortized => self.amortized_new_size(used_cap, needed_extra_cap)?,
};
which is can be avoided because the caller of reserve_internal already knows the reserve strategy. After the change, the reserve_exact and reserve can call the corresponding internal implementation directly, which can avoid unnecessary checks.

Likewise, the Fallibility check can also be avoided,

if let (Err(AllocError), Infallible) = (&res, fallibility) {
handle_alloc_error(new_layout);
}

because we know where the errors should be handled.

Due to this change, I also replaced Bumpalo's CollecitonAllocErr with allocator-api2's TryReserveError because CollecitonAllocErr::AllocErr cannot pass in a Layout. I ended up reverting 937c61a as it caused transformer performance 1%-2% regression (See codspeed and switch to "replace CollectionAllocErr with TryReserveError" commit), and replaced by 84edacd I've tried various way to save the performance but it not work. I suspect the cause is that TryReserveError is 16 bytes whereas CollecitonAllocErr is only 1 byte.

So, after both checks are removed, the performance returns to the original. The whole change is according to standard RawVec's implementation. See https://doc.rust-lang.org/src/alloc/raw_vec.rs.html

image

Copy link
Member Author

Dunqing commented Mar 15, 2025


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Dunqing Dunqing changed the title pref(allocator/vec2): optimize reserving memory perf(allocator/vec2): optimize reserving memory Mar 15, 2025
@github-actions github-actions bot added the C-performance Category - Solution not expected to change functional behavior, only performance label Mar 15, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 15, 2025

CodSpeed Performance Report

Merging #9792 will not alter performance

Comparing 03-15-pref_allocator_vec2_optimize_reserving_memory (17a9320) with main (e924c2b)

Summary

✅ 33 untouched benchmarks

@Dunqing Dunqing force-pushed the 03-15-pref_allocator_vec2_optimize_reserving_memory branch 5 times, most recently from e04b012 to e55b350 Compare March 16, 2025 10:01
@Dunqing Dunqing marked this pull request as ready for review March 16, 2025 10:01
@Dunqing Dunqing requested a review from overlookmotel March 16, 2025 10:01
@Dunqing Dunqing force-pushed the 03-15-pref_allocator_vec2_optimize_reserving_memory branch 9 times, most recently from 6634ef5 to 02413cd Compare March 16, 2025 14:47
@Dunqing Dunqing marked this pull request as draft March 16, 2025 14:47
@Dunqing Dunqing force-pushed the 03-15-pref_allocator_vec2_optimize_reserving_memory branch 9 times, most recently from 20a2e94 to 01c3780 Compare March 17, 2025 01:51
@Dunqing Dunqing marked this pull request as ready for review March 17, 2025 02:16
Copy link
Member

@overlookmotel overlookmotel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Mar 17, 2025
Copy link
Member

overlookmotel commented Mar 17, 2025

Merge activity

  • Mar 17, 8:25 AM EDT: The merge label '0-merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Mar 17, 8:26 AM EDT: A user added this pull request to the Graphite merge queue.
  • Mar 17, 8:32 AM EDT: A user merged this pull request with the Graphite merge queue.

resolve: #9656 (comment)

#9656 brought a small performance improvement for the transformer but also led the parser to 1% performance hits. This PR returns performance by splitting `reserve_internal` to `reserve_exact_internal` and `reserve_amortized_internal`  respectively the internal implementation of `reserve_exact` and `reserve`.

Why the change can improve performance?

The original `reserve_internal` implementation has a check for reserve strategy, https://github.com/oxc-project/oxc/blob/fef680a4775559805e99622fb5aa6155cdf47034/crates/oxc_allocator/src/vec2/raw_vec.rs#L664-L668 which is can be avoided because the caller of `reserve_internal` already knows the reserve strategy. After the change, the `reserve_exact` and `reserve` can call the corresponding internal implementation directly, which can avoid unnecessary checks.

Likewise, the `Fallibility` check can also be avoided, https://github.com/oxc-project/oxc/blob/fef680a4775559805e99622fb5aa6155cdf47034/crates/oxc_allocator/src/vec2/raw_vec.rs#L681-L683

because we know where the errors should be handled.

~~Due to this change, I also replaced Bumpalo's `CollecitonAllocErr` with allocator-api2's `TryReserveError` because `CollecitonAllocErr::AllocErr` cannot pass in a `Layout`.~~ I ended up reverting 937c61a as it caused transformer performance 1%-2% regression (See [codspeed](https://codspeed.io/oxc-project/oxc/branches/03-15-pref_allocator_vec2_optimize_reserving_memory) and switch to "replace CollectionAllocErr with TryReserveError" commit), and replaced by 84edacd I've tried various way to save the performance but it not work. I suspect the cause is that `TryReserveError` is 16 bytes whereas `CollecitonAllocErr` is only 1 byte.

So, after both checks are removed, the performance returns to the original. The whole change is according to standard `RawVec`'s implementation. See https://doc.rust-lang.org/src/alloc/raw_vec.rs.html

<img width="608" alt="image" src="https://github.com/user-attachments/assets/53066d8e-26f0-4eb1-8f33-4ca9e517e75b" />
@graphite-app graphite-app bot force-pushed the 03-15-pref_allocator_vec2_optimize_reserving_memory branch from 84edacd to 17a9320 Compare March 17, 2025 12:26
@graphite-app graphite-app bot merged commit 17a9320 into main Mar 17, 2025
26 checks passed
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Mar 17, 2025
@graphite-app graphite-app bot deleted the 03-15-pref_allocator_vec2_optimize_reserving_memory branch March 17, 2025 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-performance Category - Solution not expected to change functional behavior, only performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants