Skip to content

Conversation

@overlookmotel
Copy link
Member

No description provided.

@github-actions github-actions bot added the C-performance Category - Solution not expected to change functional behavior, only performance label Feb 23, 2025
Copy link
Member Author


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 23, 2025

CodSpeed Performance Report

Merging #9301 will not alter performance

Comparing 02-23-perf_allocator_optimize_vec_push_for_common_case_that_vec_is_not_full_to_capacity (f1145e6) with main (5cc614a)

Summary

✅ 33 untouched benchmarks

@overlookmotel overlookmotel force-pushed the 02-23-perf_allocator_optimize_vec_push_for_common_case_that_vec_is_not_full_to_capacity branch from d7ffd68 to 3460450 Compare February 26, 2025 13:42
@overlookmotel
Copy link
Member Author

@overlookmotel Note to self: Rebase on top of #9656 and re-run benchmarks.

@overlookmotel overlookmotel force-pushed the 02-23-perf_allocator_optimize_vec_push_for_common_case_that_vec_is_not_full_to_capacity branch from 3460450 to f1145e6 Compare March 14, 2025 06:43
@overlookmotel
Copy link
Member Author

@overlookmotel Note to self: Rebase on top of #9656 and re-run benchmarks.

Have done. Let's see...

@Dunqing
Copy link
Member

Dunqing commented Mar 14, 2025

It turns out the performance hit coming from different places. 😢

image

@Dunqing
Copy link
Member

Dunqing commented Mar 14, 2025

I stumbled upon a Vec::push_within_capacity method while I was looking through the standard Vec implementation that is very similar to what this optimization wants to do. But it looks like we need an opposite method, i.e. push_out_of_capacity. I have looked through the parser, and many situations can't predicate the capacity so we may need to use push_out_of_capacity instead.

@overlookmotel
Copy link
Member Author

I stumbled upon a Vec::push_within_capacity method while I was looking through the standard Vec implementation that is very similar to what this optimization wants to do. But it looks like we need an opposite method, i.e. push_out_of_capacity. I have looked through the parser, and many situations can't predicate the capacity so we may need to use push_out_of_capacity instead.

Yes, push_out_of_capacity is a nice idea.

I also wonder if a VecBuilder type might be another way to approach this problem (particularly in the parser).

What I have in mind is that VecBuilder would store a certain capacity on the stack, and only allocate once the inline capacity is exceeded (like smallvec). Then you'd turn it into an actual Vec once you've finished filling the VecBuilder and exact capacity is known.

Outline:

const VEC_BUILDER_INLINE_CAPACITY: usize = 8; // Or maybe 4 is better

pub enum VecBuilder<'a, T> {
    Inline(VecBuilderInline<T>),
    Allocated(Vec<'a, T>),
}

struct VecBuilderInline<T> {
    data: [MaybeUninit<T>; VEC_BUILDER_INLINE_CAPACITY],
    len: usize, // Cannot be more than `VEC_BUILDER_INLINE_CAPACITY`
}

impl<'a, T> VecBuilder<'a, T> {
    pub fn new() -> Self {
        Self::Inline(VecBuilderInline {
            data: [MaybeUninit::new(); VEC_BUILDER_INLINE_CAPACITY,
            len: 0,
        }
    }

    pub fn into_vec(self) -> Vec<'a, T> {
        match self {
            Inline(inline_vec) => todo!("Convert to Vec"),
            Allocated(vec) => vec,
        }
    }
}

(actual implementation could be optimized to avoid branching on Inline / Allocated)

Usage:

let mut vec_builder = VecBuilder::new();
while Some(node) = get_node_somehow() {
    vec_builder.push(node);
}
let vec = vec_builder.into_vec();
  • Upside: Never end up with small Vecs with excess unused capacity = tighter memory usage in arena. Might make for less CPU cache misses.
  • Upside: Don't hit the cold path for "vec needs to grow" until 8+ elements in the Vec. Lots of Vecs never reach that capacity e.g. functions rarely have more than 8 params.
  • Downside: VecBuilder::into_vec has to copy from inline storage to arena. That has a cost.

I don't know how the positives and negatives would weigh against each other. Maybe when T is small (e.g. Expression / Statement / Argument), the positives might win out. Maybe?

@overlookmotel
Copy link
Member Author

Our new Vec already puts "need to grow" on an #[inline(never)] cold path, unlike allocator_api2::vec::Vec. So this is no longer an optimization - in fact it's a deoptimization, as it penalises "need to grow" branch by forcing 2 function calls, instead of 1. So closing this.

@overlookmotel overlookmotel deleted the 02-23-perf_allocator_optimize_vec_push_for_common_case_that_vec_is_not_full_to_capacity branch March 17, 2025 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-performance Category - Solution not expected to change functional behavior, only performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants