Skip to content

Conversation

@overlookmotel
Copy link
Member

@overlookmotel overlookmotel commented Jul 15, 2024

Remove an unnecessary branch from record_ast_node. As suggested by @rzvxa in #4263 (review).

@graphite-app
Copy link
Contributor

graphite-app bot commented Jul 15, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

Copy link
Member Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @overlookmotel and the rest of your teammates on Graphite Graphite

@github-actions github-actions bot added the A-semantic Area - Semantic label Jul 15, 2024
@overlookmotel overlookmotel marked this pull request as ready for review July 15, 2024 11:05
@codspeed-hq
Copy link

codspeed-hq bot commented Jul 15, 2024

CodSpeed Performance Report

Merging #4273 will not alter performance

Comparing 07-15-perf_semantic_remove_a_branch_from_record_ast_node_ (019b4d3) with main (f9d3f2e)

Summary

✅ 32 untouched benchmarks

@overlookmotel overlookmotel marked this pull request as draft July 15, 2024 11:22
@Boshen Boshen force-pushed the 07-15-perf_semantic_inline_ast_record_functions branch from fbdd40f to f9d3f2e Compare July 15, 2024 11:43
@rzvxa
Copy link
Contributor

rzvxa commented Jul 15, 2024

Huh, It is consistently slower? How does that make sense? I can't remember if the benchmark runs with CFG or not, I think the None check is just fewer CPU cycles with no CFG runs. What does the benchmark look like with CFG enabled?

@Boshen Boshen changed the base branch from 07-15-perf_semantic_inline_ast_record_functions to main July 15, 2024 11:49
@Boshen Boshen force-pushed the 07-15-perf_semantic_remove_a_branch_from_record_ast_node_ branch from 64f74b5 to 019b4d3 Compare July 15, 2024 11:49
@overlookmotel
Copy link
Member Author

overlookmotel commented Jul 15, 2024

@rzvxa I've run the benchmarks 3 times, and each time it shows a negative effect. NB: CFG is not enabled on semantic benchmark.

I am guessing what's going on is one of:

  • Because record_ast_nodes and retrieve_recorded_ast_node also call self.cfg.is_some(), this value is warm in cache, so faster to check.
  • Because record_ast_nodes and retrieve_recorded_ast_node also call self.cfg.is_some(), using self.cfg.is_some() here too allows compiler to see it's a repeated check and make some optimization. This seems unlikely to me, as calls to record_ast_node are so far away from record_ast_nodes.
  • Vec::last_mut is slower than Option::is_some. This surprises me as I'd assume first thing Vec::last_mut would do is a bounds check. That check would be whether an 8-byte value is 0, same as Option::is_some check is.

But this is all just guesswork. No idea really! Any thoughts?

Regardless, given the (small) perf regression, I think we should not merge this.

@overlookmotel overlookmotel requested a review from rzvxa July 15, 2024 11:51
@rzvxa
Copy link
Contributor

rzvxa commented Jul 15, 2024

Yes, Let's close it. I initially suggested this to have fewer checks in the CFG pass but since the linter benchmark(which contains CFG) is untouched it clearly wasn't the case.

Sorry to waste your time mate😓

@rzvxa rzvxa closed this Jul 15, 2024
@overlookmotel
Copy link
Member Author

Not a waste of time at all. Negative results are also results.

overlookmotel added a commit that referenced this pull request Jul 15, 2024
@overlookmotel
Copy link
Member Author

overlookmotel commented Jul 16, 2024

I thought about it more, and I have another theory. Maybe the thing is that Vec::last_mut is a more complicated function, and often used, so compiler doesn't inline it. So the difference between Vec::last_mut and Option::is_some is that Vec::last_mut adds the overhead a function call. Added to that, it may be optimized to expect the Vec to have entries, so returning None is the cold path and incurs a branch mis-prediction.

Maybe.

@rzvxa Still no idea if this is correct, but thought you might be interested.

@rzvxa
Copy link
Contributor

rzvxa commented Jul 16, 2024

Thanks for having this conversation with me.

I thought about it more, and I have another theory. Maybe the thing is that Vec::last_mut is a more complicated function, and often used, so compiler doesn't inline it. So the difference between Vec::last_mut and Option::is_some is that Vec::last_mut adds the overhead a function call. Added to that, it may be optimized to expect the Vec to have entries, so returning None is the cold path and incurs a branch mis-prediction.

Maybe.

@rzvxa Still no idea if this is correct, but thought you might be interested.

I've tested this with a few other approaches such as checking indices instead of using the last_mut and checks for is_some have outperformed them all. Rustc is just really good at handling options(as it should be).

@overlookmotel
Copy link
Member Author

Thanks for having this conversation with me.

I thought about it more, and I have another theory. Maybe the thing is that Vec::last_mut is a more complicated function, and often used, so compiler doesn't inline it. So the difference between Vec::last_mut and Option::is_some is that Vec::last_mut adds the overhead a function call. Added to that, it may be optimized to expect the Vec to have entries, so returning None is the cold path and incurs a branch mis-prediction.
Maybe.
@rzvxa Still no idea if this is correct, but thought you might be interested.

I've tested this with a few other approaches such as checking indices instead of using the last_mut and checks for is_some have outperformed them all. Rustc is just really good at handling options(as it should be).

Or maybe the explanation was this one:

  • Because record_ast_nodes and retrieve_recorded_ast_node also call self.cfg.is_some(), this value is warm in cache, so faster to check.

@Boshen Boshen deleted the 07-15-perf_semantic_remove_a_branch_from_record_ast_node_ branch August 12, 2024 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-semantic Area - Semantic

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants