perf(allocator): optimize `Vec::push` for common case that vec is not full to capacity #9301

overlookmotel · 2025-02-23T14:45:29Z

No description provided.

overlookmotel · 2025-02-23T14:45:46Z

How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

0-merge - adds this PR to the back of the merge queue
hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

codspeed-hq · 2025-02-23T14:53:42Z

CodSpeed Performance Report

Merging #9301 will not alter performance

_{Comparing 02-23-perf_allocator_optimize_vec_push_for_common_case_that_vec_is_not_full_to_capacity (f1145e6) with main (5cc614a)}

Summary

✅ 33 untouched benchmarks

overlookmotel · 2025-03-14T06:21:50Z

@overlookmotel Note to self: Rebase on top of #9656 and re-run benchmarks.

… full to capacity

overlookmotel · 2025-03-14T06:44:08Z

@overlookmotel Note to self: Rebase on top of #9656 and re-run benchmarks.

Have done. Let's see...

Dunqing · 2025-03-14T06:59:32Z

It turns out the performance hit coming from different places. 😢

Dunqing · 2025-03-14T10:24:47Z

I stumbled upon a Vec::push_within_capacity method while I was looking through the standard Vec implementation that is very similar to what this optimization wants to do. But it looks like we need an opposite method, i.e. push_out_of_capacity. I have looked through the parser, and many situations can't predicate the capacity so we may need to use push_out_of_capacity instead.

overlookmotel · 2025-03-17T12:07:48Z

I stumbled upon a Vec::push_within_capacity method while I was looking through the standard Vec implementation that is very similar to what this optimization wants to do. But it looks like we need an opposite method, i.e. push_out_of_capacity. I have looked through the parser, and many situations can't predicate the capacity so we may need to use push_out_of_capacity instead.

Yes, push_out_of_capacity is a nice idea.

I also wonder if a VecBuilder type might be another way to approach this problem (particularly in the parser).

What I have in mind is that VecBuilder would store a certain capacity on the stack, and only allocate once the inline capacity is exceeded (like smallvec). Then you'd turn it into an actual Vec once you've finished filling the VecBuilder and exact capacity is known.

Outline:

const VEC_BUILDER_INLINE_CAPACITY: usize = 8; // Or maybe 4 is better

pub enum VecBuilder<'a, T> {
    Inline(VecBuilderInline<T>),
    Allocated(Vec<'a, T>),
}

struct VecBuilderInline<T> {
    data: [MaybeUninit<T>; VEC_BUILDER_INLINE_CAPACITY],
    len: usize, // Cannot be more than `VEC_BUILDER_INLINE_CAPACITY`
}

impl<'a, T> VecBuilder<'a, T> {
    pub fn new() -> Self {
        Self::Inline(VecBuilderInline {
            data: [MaybeUninit::new(); VEC_BUILDER_INLINE_CAPACITY,
            len: 0,
        }
    }

    pub fn into_vec(self) -> Vec<'a, T> {
        match self {
            Inline(inline_vec) => todo!("Convert to Vec"),
            Allocated(vec) => vec,
        }
    }
}

(actual implementation could be optimized to avoid branching on Inline / Allocated)

Usage:

let mut vec_builder = VecBuilder::new();
while Some(node) = get_node_somehow() {
    vec_builder.push(node);
}
let vec = vec_builder.into_vec();

Upside: Never end up with small Vecs with excess unused capacity = tighter memory usage in arena. Might make for less CPU cache misses.
Upside: Don't hit the cold path for "vec needs to grow" until 8+ elements in the Vec. Lots of Vecs never reach that capacity e.g. functions rarely have more than 8 params.
Downside: VecBuilder::into_vec has to copy from inline storage to arena. That has a cost.

I don't know how the positives and negatives would weigh against each other. Maybe when T is small (e.g. Expression / Statement / Argument), the positives might win out. Maybe?

overlookmotel · 2025-03-17T12:09:59Z

Our new Vec already puts "need to grow" on an #[inline(never)] cold path, unlike allocator_api2::vec::Vec. So this is no longer an optimization - in fact it's a deoptimization, as it penalises "need to grow" branch by forcing 2 function calls, instead of 1. So closing this.

github-actions bot added the C-performance Category - Solution not expected to change functional behavior, only performance label Feb 23, 2025

overlookmotel mentioned this pull request Feb 26, 2025

ci(benchmarks): exclude parser from isolated declarations benchmark timing #9388

Merged

overlookmotel force-pushed the 02-23-perf_allocator_optimize_vec_push_for_common_case_that_vec_is_not_full_to_capacity branch from d7ffd68 to 3460450 Compare February 26, 2025 13:42

overlookmotel mentioned this pull request Mar 14, 2025

feat(allocator): replace allocator_ap2's Vec with Vec2 #9656

Merged

perf(allocator): optimize Vec::push for common case that vec is not…

f1145e6

… full to capacity

overlookmotel force-pushed the 02-23-perf_allocator_optimize_vec_push_for_common_case_that_vec_is_not_full_to_capacity branch from 3460450 to f1145e6 Compare March 14, 2025 06:43

overlookmotel closed this Mar 17, 2025

overlookmotel deleted the 02-23-perf_allocator_optimize_vec_push_for_common_case_that_vec_is_not_full_to_capacity branch March 17, 2025 12:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

perf(allocator): optimize `Vec::push` for common case that vec is not full to capacity #9301

perf(allocator): optimize `Vec::push` for common case that vec is not full to capacity #9301

Uh oh!

overlookmotel commented Feb 23, 2025

Uh oh!

overlookmotel commented Feb 23, 2025

Uh oh!

codspeed-hq bot commented Feb 23, 2025 •

edited

Loading

Uh oh!

overlookmotel commented Mar 14, 2025

Uh oh!

overlookmotel commented Mar 14, 2025

Uh oh!

Dunqing commented Mar 14, 2025

Uh oh!

Dunqing commented Mar 14, 2025

Uh oh!

overlookmotel commented Mar 17, 2025

Uh oh!

overlookmotel commented Mar 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

perf(allocator): optimize Vec::push for common case that vec is not full to capacity #9301

perf(allocator): optimize Vec::push for common case that vec is not full to capacity #9301

Uh oh!

Conversation

overlookmotel commented Feb 23, 2025

Uh oh!

overlookmotel commented Feb 23, 2025

How to use the Graphite Merge Queue

Uh oh!

codspeed-hq bot commented Feb 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #9301 will not alter performance

Summary

Uh oh!

overlookmotel commented Mar 14, 2025

Uh oh!

overlookmotel commented Mar 14, 2025

Uh oh!

Dunqing commented Mar 14, 2025

Uh oh!

Dunqing commented Mar 14, 2025

Uh oh!

overlookmotel commented Mar 17, 2025

Uh oh!

overlookmotel commented Mar 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

perf(allocator): optimize `Vec::push` for common case that vec is not full to capacity #9301

perf(allocator): optimize `Vec::push` for common case that vec is not full to capacity #9301

codspeed-hq bot commented Feb 23, 2025 •

edited

Loading