Reduce allocations in UnboundLambda #23534
Conversation
@dotnet/roslyn-compiler for review
// when binding for real (not for return inference), there is still
// a good chance that we could reuse a body of a lambda previously bound for
// return type inference.
var cacheKey = ReturnInferenceCacheKey.Create(delegateType, IsAsync);
cacheKey
This cache key is not used as a cache key. In fact, it really isn't used, except to get things that are available on the delegate type.
My point is, please avoid creating the cache key, and instead directly compute the parameter type array and parameter ref kind array.
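For illustration only, a hedged sketch of what "compute directly" could look like against the public symbol API (the PR itself works with the compiler's internal symbol types, so the exact types and helper names here are assumptions):

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;

static class DelegateSignatureHelper
{
    // Reads parameter types and ref kinds straight from the delegate's Invoke
    // method, without materializing a separate cache-key object.
    public static (ImmutableArray<ITypeSymbol> Types, ImmutableArray<RefKind> RefKinds)
        GetParameterInfo(INamedTypeSymbol delegateType)
    {
        IMethodSymbol invoke = delegateType.DelegateInvokeMethod;
        if (invoke is null)
        {
            return (ImmutableArray<ITypeSymbol>.Empty, ImmutableArray<RefKind>.Empty);
        }

        var types = ImmutableArray.CreateBuilder<ITypeSymbol>(invoke.Parameters.Length);
        var refKinds = ImmutableArray.CreateBuilder<RefKind>(invoke.Parameters.Length);

        foreach (IParameterSymbol parameter in invoke.Parameters)
        {
            types.Add(parameter.Type);
            refKinds.Add(parameter.RefKind);
        }

        return (types.MoveToImmutable(), refKinds.MoveToImmutable());
    }
}
```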
➡️ Reverted the key change; can evaluate separately.
}

return result;
var cacheKey = ReturnInferenceCacheKey.Create(delegateType, IsAsync);
cacheKey
Ditto
BoundLambda result;
if (!_bindingCache.TryGetValue(delegateType, out result))
if (_bindingCache.TryGetValue(delegateType, out var result))
{
Concurrency on these caches will be quite rare, and will be encountered only when callers make concurrent calls into the compiler APIs. I think we are likely to be better served by a simple Dictionary with an explicit lock around its uses.
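A minimal sketch of that pattern, assuming a generic cache wrapper (the type and member names here are hypothetical, not from the PR):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration: a plain Dictionary guarded by an explicit lock,
// for caches that are only rarely accessed concurrently.
public sealed class LockedCache<TKey, TValue>
{
    private readonly object _gate = new object();
    private readonly Dictionary<TKey, TValue> _map = new Dictionary<TKey, TValue>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        lock (_gate)
        {
            if (_map.TryGetValue(key, out TValue existing))
            {
                return existing;
            }
        }

        // Compute outside the lock so a slow factory does not block readers; a
        // concurrent caller may compute the same value, mirroring TryAdd semantics.
        TValue created = factory(key);

        lock (_gate)
        {
            if (_map.TryGetValue(key, out TValue raced))
            {
                return raced; // another thread won the race; keep its value
            }

            _map.Add(key, created);
            return created;
        }
    }
}
```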
private readonly ConcurrentDictionary<ReturnInferenceCacheKey, BoundLambda> _returnInferenceCache = new ConcurrentDictionary<ReturnInferenceCacheKey, BoundLambda>();
[PerformanceSensitive(
    "https://github.com/dotnet/roslyn/issues/23582",
    Constraint = "Avoid " + nameof(ConcurrentDictionary<NamedTypeSymbol, BoundLambda>) + " which has a large default size, but this cache is normally small.")]
which has a large default size, but this cache is normally small.
Could we provide an explicit initial capacity instead, with a reasonably small value? #Closed
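For reference, ConcurrentDictionary has a constructor overload that takes a concurrency level and an initial capacity; a hedged sketch of the suggestion (the specific values are illustrative, not from the PR):

```csharp
// Illustrative only: small explicit capacity instead of ConcurrentDictionary's defaults.
private readonly ConcurrentDictionary<ReturnInferenceCacheKey, BoundLambda> _returnInferenceCache =
    new ConcurrentDictionary<ReturnInferenceCacheKey, BoundLambda>(concurrencyLevel: 2, capacity: 1);
```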
{
BoundLambda result;
if (!_bindingCache.TryGetValue(delegateType, out result))
if (_bindingCache.TryGetValue(delegateType, out var result))
var
Please spell out the type or, better, revert the inlining of the declaration. #Closed
📝 According to the .editorconfig for this section of the code, both forms are acceptable. We need to update the configuration to match the practices followed during the code review process.
➡️ Reverted style changes to the extent possible.
We need to update the configuration to match the practices followed during the code review process.
Note that the exact rule in the compilers code base is that var is not allowed unless the type is explicitly named in the initializer. #Closed
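A short illustration of that rule (the `Bind` call below is a hypothetical helper, used only to show the shape):

```csharp
// Allowed: the type appears explicitly in the initializer.
var cache = new Dictionary<NamedTypeSymbol, BoundLambda>();

// Not allowed under this rule: the type of `result` is not apparent on the right-hand side.
// var result = Bind(delegateType);

// Spell out the type instead.
BoundLambda result = Bind(delegateType);
```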
[PerformanceSensitive(
    "https://github.com/dotnet/roslyn/issues/23582",
    Constraint = "Avoid " + nameof(ConcurrentDictionary<NamedTypeSymbol, BoundLambda>) + " which has a large default size, but this cache is normally small.")]
private ImmutableDictionary<NamedTypeSymbol, BoundLambda> _returnInferenceCache = ImmutableDictionary<NamedTypeSymbol, BoundLambda>.Empty;
NamedTypeSymbol
The change in the key type looks wrong. #Closed
I wouldn't call it so much wrong as replacing one apparently unimportant optimization (sharing the cache for semantically identical delegate types) with another apparently important optimization. I believe the transformation preserves the correctness of the compiler. Do you, @AlekseyTs, believe that the change is semantically incorrect?
This one is going to be important to understand. According to the previous implementation of BindForReturnTypeInference, returning a new result from ReallyInferReturnType for the same delegate type is acceptable in the case of a cache miss (note the use of TryAdd instead of GetOrAdd). I verified that the conversion from NamedTypeSymbol to ReturnInferenceCacheKey is a deterministic function, but incorrectly treated it as one-to-one.
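A hedged reconstruction of the pattern being described (simplified; the real method takes additional arguments and state):

```csharp
private BoundLambda BindForReturnTypeInference(NamedTypeSymbol delegateType)
{
    ReturnInferenceCacheKey cacheKey = ReturnInferenceCacheKey.Create(delegateType, IsAsync);

    if (!_returnInferenceCache.TryGetValue(cacheKey, out BoundLambda result))
    {
        result = ReallyInferReturnType(delegateType);

        // TryAdd (rather than GetOrAdd) means a racing thread keeps and returns its
        // own freshly bound lambda; callers only need *a* correctly bound result,
        // not a unique shared instance.
        _returnInferenceCache.TryAdd(cacheKey, result);
    }

    return result;
}
```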
Neither the description of the change nor the comments mentioned the type change for the key. The motivation behind the change was not explained, and no numbers supporting the change were provided. I am talking exclusively about the type change for the key. If you have numbers showing that the type change on its own leads to a noticeable improvement, and you are confident that the change is not going to regress other scenarios, then we should get rid of the ReturnInferenceCacheKey type altogether. #Closed
BoundLambda result;
if (!_returnInferenceCache.TryGetValue(cacheKey, out result))
if (_returnInferenceCache.TryGetValue(delegateType, out var result))
var
Please spell out the type or, better, revert the inlining of the declaration. In general, var is not welcome in the compilers code base. #Closed
AlekseyTs left a comment
Please revert changes to the key type of the dictionaries and all refactorings that were triggered by the key type change.
@dotnet/roslyn-compiler This is ready for updated review. Changes are restricted to the type of dictionary used. I will update the original post with allocation savings under the new approach when available. @gafter @AlekseyTs You both mentioned using different dictionary types. Do you have examples of where these dictionaries get large? My belief is these are typically single-element dictionaries, in which case the current tree-based dictionary would be as efficient as anything else. It seems like you would need a very large number of very large sets to give anything else an advantage. Unfortunately my test suite is limited with respect to this change.
default:
    // Prefer candidates with fewer diagnostics.
    IEnumerable<BoundLambda> minDiagnosticsGroup = candidates.GroupBy(lambda => lambda.Diagnostics.Length).OrderBy(group => group.Key).First();
    IEnumerable<KeyValuePair<T, BoundLambda>> minDiagnosticsGroup = candidates.GroupBy(lambda => lambda.Value.Diagnostics.Length).OrderBy(group => group.Key).First();
lambda
Consider renaming parameter to pair or something similar.
return minDiagnosticsGroup
    .OrderBy(lambda => GetLambdaSortString(lambda.Symbol))
    .FirstOrDefault();
    .OrderBy(lambda => GetLambdaSortString(lambda.Value.Symbol))
lambda
Same suggestion here.
.OrderBy(lambda => GetLambdaSortString(lambda.Symbol))
.FirstOrDefault();
.OrderBy(lambda => GetLambdaSortString(lambda.Value.Symbol))
.FirstOrDefault()
FirstOrDefault
This probably should be First(), it looks like we don't expect the sequence to be empty.
AlekseyTs left a comment
LGTM (iteration 4)
📝 The allocation savings do not appear to have changed significantly from the original report.
@sharwell Thanks for following up on this. I am curious what exactly we are measuring. "Running analyzer during a build is slower than it should be" isn't quite clear in this respect. Does the analyzer do something specific? Could you please provide more details? These questions are not blocking this PR.
This pull request is the result of an open-ended performance investigation. A test scenario was constructed (#23582), and code changes were applied according to the prominent sources of allocations revealed by the test. Each change was described in terms of gains made against the baseline results for the constructed scenario.
Could you please share some details about the test scenario?
@AlekseyTs I think I edited my comment while you were reading it. I added a link to the test scenario I used for this (it's the same link that appears in the
@AlekseyTs If you know of a sample program which forces these maps to get large, let me know. I can run profiling passes on them to verify the change to a tree-based map will not cause regressions on code that differs from the input I used so far.
@sharwell Are we certain that the analyzers executed as part of the scenario are themselves doing things in an efficient manner, i.e. not causing the compiler to redo work unnecessarily, etc.? For example, code in https://github.com/dotnet/roslyn/blob/master/src/Features/Core/Portable/UseThrowExpression/AbstractUseThrowExpressionDiagnosticAnalyzer.cs requests a separate SemanticModel and causes the compiler to rebind the same code again and build another IOperation tree for the same code.
@AlekseyTs To the contrary, by all metrics I have encountered, analyzers are frequently extremely inefficient, and not in obvious ways. However, with analyzers running as an integral part of the build it's increasingly hard to separate the two.
@Pilchie or @MeiChin-Tsai for ask mode
I'm in favor of the change for the scenario, but I'll leave it to @MeiChin-Tsai to approve in case she has any concerns about risk.
Can you please rebase this to 15.7.x (which is not in ask mode)?
gafter left a comment
I agree with @gafter
@MeiChin-Tsai This is now retargeted to 15.7
To the best of my knowledge, the compiler side is not in ask mode for 15.7 yet, so you should be good to merge this. @jaredpar correct me if that's not the case.
@jinujoseph that is correct.
Customer scenario
Running analyzer during a build is slower than it should be, with the analyzer driver contributing substantial overhead even when the analyzers themselves are lightweight.
Bugs this fixes
Fixes #23463
Workarounds, if any
None needed
Risk
The dictionary algorithm is now optimistically concurrent, with a small risk of a performance regression in highly contended scenarios. In addition, dictionary lookup went from O(1) to O(log n). This should not be a problem: it favors the frequently used small dictionaries without making the large-dictionary edge case unreasonably slow.
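As a sketch of what "optimistically concurrent" means here (assuming the cache is an ImmutableDictionary field updated with a compare-and-swap loop; System.Collections.Immutable also offers ImmutableInterlocked.GetOrAdd for this pattern, and the real code may differ):

```csharp
using System;
using System.Collections.Immutable;
using System.Threading;

static class OptimisticCache
{
    // Publishes a value computed by `factory` unless another thread already
    // published one; readers never block, and a lost race simply retries
    // against the freshly observed snapshot.
    public static TValue GetOrAdd<TKey, TValue>(
        ref ImmutableDictionary<TKey, TValue> cache, TKey key, Func<TKey, TValue> factory)
    {
        // Fast path: lookup is O(log n) over the immutable tree.
        if (cache.TryGetValue(key, out TValue existing))
        {
            return existing;
        }

        TValue created = factory(key);

        while (true)
        {
            ImmutableDictionary<TKey, TValue> snapshot = cache;
            if (snapshot.TryGetValue(key, out TValue raced))
            {
                return raced; // another thread published a value first
            }

            ImmutableDictionary<TKey, TValue> updated = snapshot.Add(key, created);
            if (Interlocked.CompareExchange(ref cache, updated, snapshot) == snapshot)
            {
                return created; // our snapshot was still current; the value is published
            }
            // An unrelated update raced with ours; loop and try again.
        }
    }
}
```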
Performance impact
AnalyzerRunner indicates a reduction in allocations of 2.56GiB (2.9%).
Is this a regression from a previous update?
No.
Root cause analysis
AnalyzerRunner is a new tool for helping us test analyzer performance in isolation.
How was the bug found?
AnalyzerRunner.
Test documentation updated?
No.