Conversation

@sharwell sharwell commented Dec 2, 2017

Customer scenario

Running analyzers during a build is slower than it should be, with the analyzer driver contributing substantial overhead even when the analyzers themselves are lightweight.

Bugs this fixes

Fixes #23463

Workarounds, if any

None needed

Risk

The dictionary algorithm is now optimistically concurrent, with a small risk of a performance regression in highly contended scenarios. In addition, the dictionary lookup went from O(1) to O(log n). This should not be a problem, as it favors the frequently used small dictionaries without making the large edge cases unreasonably expensive.
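
For illustration, here is a minimal sketch of the optimistic pattern, assuming the cache lives in an ImmutableDictionary field; the type and member names are placeholders, not the actual compiler code:

```csharp
using System.Collections.Immutable;
using System.Threading;

internal sealed class OptimisticCache<TKey, TValue>
    where TKey : notnull
{
    private ImmutableDictionary<TKey, TValue> _map = ImmutableDictionary<TKey, TValue>.Empty;

    public TValue GetOrAdd(TKey key, TValue value)
    {
        while (true)
        {
            ImmutableDictionary<TKey, TValue> current = _map;
            if (current.TryGetValue(key, out TValue existing))
            {
                // Another thread already published a result for this key.
                return existing;
            }

            // Publish optimistically: succeed only if no other thread swapped in
            // a new map since we read 'current'; otherwise loop and retry.
            if (Interlocked.CompareExchange(ref _map, current.Add(key, value), current) == current)
            {
                return value;
            }
        }
    }
}
```

System.Collections.Immutable packages the same loop as ImmutableInterlocked.GetOrAdd, which is the idiomatic way to write it.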

Performance impact

AnalyzerRunner indicates a reduction in allocations of 2.56GiB (2.9%).

Is this a regression from a previous update?

No.

Root cause analysis

AnalyzerRunner is a new tool for helping us test analyzer performance in isolation.

How was the bug found?

AnalyzerRunner.

Test documentation updated?

No.

@sharwell sharwell changed the title Reduce allocations in UnboundLambda [WIP] Reduce allocations in UnboundLambda Dec 2, 2017
@sharwell sharwell changed the title [WIP] Reduce allocations in UnboundLambda Reduce allocations in UnboundLambda Dec 4, 2017
@sharwell sharwell force-pushed the optimize-unboundlambdastate branch from be91b2e to 22b4d28 Compare December 5, 2017 13:01
@sharwell sharwell requested a review from a team as a code owner December 5, 2017 13:01
sharwell commented Dec 5, 2017

@dotnet/roslyn-compiler for review

@sharwell sharwell requested a review from a team December 5, 2017 16:01
@gafter gafter added the Area-Compilers and Tenet-Performance (Regression in measured performance of the product from goals.) labels Dec 5, 2017
// when binding for real (not for return inference), there is still
// a good chance that we could reuse a body of a lambda previously bound for
// return type inference.
var cacheKey = ReturnInferenceCacheKey.Create(delegateType, IsAsync);
Member commented on cacheKey:

This cache key is not used as a cache key. In fact, it really isn't used, except to get things that are available on the delegate type.

Member replied:

My point is, please avoid creating the cache key, and instead directly compute the parameter type array and parameter ref kind array.

Contributor (Author) replied:

➡️ Reverted the key change; it can be evaluated separately.

}

return result;
var cacheKey = ReturnInferenceCacheKey.Create(delegateType, IsAsync);
Member commented on cacheKey:

Ditto

-BoundLambda result;
-if (!_bindingCache.TryGetValue(delegateType, out result))
+if (_bindingCache.TryGetValue(delegateType, out var result))
{
Member commented:

Concurrency on these caches will be quite rare, and encountered solely based on concurrent calls into the compiler APIs. I think we are likely to be better served using a simple Dictionary with an explicit lock around its uses.
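
For comparison, a sketch of the suggested alternative: a plain Dictionary with every access guarded by a single lock. The GetOrAdd helper here is hypothetical, not the compiler's code:

```csharp
using System;
using System.Collections.Generic;

internal sealed class LockedCache<TKey, TValue>
    where TKey : notnull
{
    private readonly object _gate = new object();
    private readonly Dictionary<TKey, TValue> _map = new Dictionary<TKey, TValue>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        lock (_gate)
        {
            if (!_map.TryGetValue(key, out TValue value))
            {
                // The factory runs under the lock, so each value is computed at
                // most once; simple, and fine when contention is rare.
                value = factory(key);
                _map.Add(key, value);
            }

            return value;
        }
    }
}
```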

@gafter gafter requested a review from a team December 5, 2017 16:49
@gafter gafter self-assigned this Dec 5, 2017
-private readonly ConcurrentDictionary<ReturnInferenceCacheKey, BoundLambda> _returnInferenceCache = new ConcurrentDictionary<ReturnInferenceCacheKey, BoundLambda>();
+[PerformanceSensitive(
+    "https://github.com/dotnet/roslyn/issues/23582",
+    Constraint = "Avoid " + nameof(ConcurrentDictionary<NamedTypeSymbol, BoundLambda>) + " which has a large default size, but this cache is normally small.")]
@AlekseyTs (Contributor) commented Dec 6, 2017:

which has a large default size, but this cache is normally small.

Could we provide an explicit initial capacity instead, using a reasonably small value? #Closed
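
For reference, ConcurrentDictionary does expose a constructor taking both a concurrency level and an initial capacity, so the suggestion might look like the following sketch (the values are illustrative placeholders, not a vetted choice):

```csharp
using System.Collections.Concurrent;

// Hypothetical: keep ConcurrentDictionary but bound its eager allocations.
// The key/value types reuse the names from the diff above.
ConcurrentDictionary<NamedTypeSymbol, BoundLambda> cache =
    new ConcurrentDictionary<NamedTypeSymbol, BoundLambda>(
        concurrencyLevel: 2, // size of the internal lock array (default: processor count)
        capacity: 1);        // initial bucket count (default: 31); grows on demand
```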

{
-BoundLambda result;
-if (!_bindingCache.TryGetValue(delegateType, out result))
+if (_bindingCache.TryGetValue(delegateType, out var result))
@AlekseyTs (Contributor) commented Dec 6, 2017 on var:

Please spell out the type, or better, revert the inlining of the declaration. #Closed

Contributor (Author) replied:

📝 According to the .editorconfig for this section of the code, both forms are acceptable. We need to update the configuration to match the practices followed during the code review process.

Contributor (Author) replied:

➡️ Reverted style changes to the extent possible.

@AlekseyTs (Contributor) replied Dec 8, 2017:

We need to update the configuration to match the practices followed during the code review process.

Note the exact rule in the compilers code base: var is not allowed unless the type is explicitly named in the initializer. #Closed

[PerformanceSensitive(
"https://github.com/dotnet/roslyn/issues/23582",
Constraint = "Avoid " + nameof(ConcurrentDictionary<NamedTypeSymbol, BoundLambda>) + " which has a large default size, but this cache is normally small.")]
private ImmutableDictionary<NamedTypeSymbol, BoundLambda> _returnInferenceCache = ImmutableDictionary<NamedTypeSymbol, BoundLambda>.Empty;
@AlekseyTs (Contributor) commented Dec 6, 2017 on NamedTypeSymbol:

The change in the key type looks wrong. #Closed

Member replied:

I wouldn't call it so much wrong as replacing one apparently unimportant optimization (sharing the cache for semantically identical delegate types) with another apparently more important optimization. I believe the transformation preserves the correctness of the compiler. Do you, @AlekseyTs, believe the change is semantically incorrect?

Contributor (Author) replied:

This one is going to be important to understand. According to the previous implementation of BindForReturnTypeInference, returning a new result from ReallyInferReturnType for the same delegate type is acceptable in the case of a cache miss (note the use of TryAdd instead of GetOrAdd). I verified that the conversion from NamedTypeSymbol to ReturnInferenceCacheKey is a deterministic function, but incorrectly treated it as one-to-one.
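
To make the TryAdd-vs-GetOrAdd distinction concrete, here is a standalone sketch with placeholder types (not the compiler's code). TryAdd is only safe here because any two results computed for the same key are interchangeable:

```csharp
using System.Collections.Concurrent;

internal static class TryAddSketch
{
    private static readonly ConcurrentDictionary<string, object> s_cache =
        new ConcurrentDictionary<string, object>();

    public static object Bind(string delegateType)
    {
        if (s_cache.TryGetValue(delegateType, out object cached))
        {
            return cached;
        }

        object fresh = ComputeResult(delegateType); // stands in for ReallyInferReturnType

        // TryAdd: if another thread won the race, its value stays in the cache,
        // but this caller still receives the locally computed 'fresh' instance.
        // GetOrAdd, by contrast, would hand every caller the single stored winner.
        s_cache.TryAdd(delegateType, fresh);
        return fresh;
    }

    private static object ComputeResult(string delegateType) => new object();
}
```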

@AlekseyTs (Contributor) replied Dec 6, 2017:

Neither the description of the change nor the comments mentioned the type change for the key. The motivation behind the change was not explained, and no numbers supporting the change were provided. I am talking exclusively about the type change for the key. If you have numbers showing that the type change on its own leads to a noticeable improvement, and you are confident that the change is not going to regress other scenarios, then we should get rid of the ReturnInferenceCacheKey type altogether. #Closed


-BoundLambda result;
-if (!_returnInferenceCache.TryGetValue(cacheKey, out result))
+if (_returnInferenceCache.TryGetValue(delegateType, out var result))
@AlekseyTs (Contributor) commented Dec 6, 2017 on var:

Please spell out the type, or better, revert the inlining of the declaration. In general, var is not welcome in the compilers code base. #Closed

@AlekseyTs (Contributor) left a review comment:

Please revert changes to the key type of the dictionaries and all refactorings that were triggered by the key type change.

@sharwell sharwell changed the title Reduce allocations in UnboundLambda [WIP] Reduce allocations in UnboundLambda Dec 7, 2017
sharwell commented Dec 7, 2017

@dotnet/roslyn-compiler This is ready for updated review. Changes are restricted to the type of dictionary used. I will update the original post with allocation savings under the new approach when available.

@gafter @AlekseyTs You both mentioned using different dictionary types. Do you have examples of where these dictionaries get large? My belief is that these are typically single-element dictionaries, in which case the current tree-based dictionary would be as efficient as anything else. It seems like you would need a very large number of very large sets to give anything else an advantage. Unfortunately, my test suite is limited with respect to this change.
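
For a sense of the trade-off, compare the per-instance costs; the exact numbers are runtime implementation details, so this is a sketch rather than a benchmark:

```csharp
using System.Collections.Concurrent;
using System.Collections.Immutable;

// ConcurrentDictionary eagerly allocates its lock array and bucket table up
// front, even when the dictionary will only ever hold a single entry.
ConcurrentDictionary<string, int> concurrent = new ConcurrentDictionary<string, int>();

// ImmutableDictionary<,>.Empty is a shared singleton, so an untouched cache
// allocates nothing, and a one-element cache is a single AVL-tree node.
// Lookup is O(log n), which is indistinguishable from O(1) at n == 1.
ImmutableDictionary<string, int> immutable =
    ImmutableDictionary<string, int>.Empty.Add("System.Func`1", 1);
```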

@sharwell sharwell changed the title [WIP] Reduce allocations in UnboundLambda Reduce allocations in UnboundLambda Dec 7, 2017
@jasonmalinowski jasonmalinowski removed the request for review from a team December 7, 2017 20:15
default:
// Prefer candidates with fewer diagnostics.
-IEnumerable<BoundLambda> minDiagnosticsGroup = candidates.GroupBy(lambda => lambda.Diagnostics.Length).OrderBy(group => group.Key).First();
+IEnumerable<KeyValuePair<T, BoundLambda>> minDiagnosticsGroup = candidates.GroupBy(lambda => lambda.Value.Diagnostics.Length).OrderBy(group => group.Key).First();
Contributor commented on lambda:

Consider renaming parameter to pair or something similar.

return minDiagnosticsGroup
-    .OrderBy(lambda => GetLambdaSortString(lambda.Symbol))
+    .OrderBy(lambda => GetLambdaSortString(lambda.Value.Symbol))
    .FirstOrDefault();
Contributor commented on lambda:

Same suggestion here.

-    .OrderBy(lambda => GetLambdaSortString(lambda.Symbol))
-    .FirstOrDefault();
+    .OrderBy(lambda => GetLambdaSortString(lambda.Value.Symbol))
+    .FirstOrDefault()
@AlekseyTs (Contributor) commented Dec 8, 2017 on FirstOrDefault:

This probably should be First(); it looks like we don't expect the sequence to be empty.
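
The difference in isolation, as a tiny sketch (the element type is a stand-in):

```csharp
using System.Collections.Generic;
using System.Linq;

IEnumerable<string> candidates = Enumerable.Empty<string>();

string a = candidates.FirstOrDefault(); // null: silently masks an empty sequence
string b = candidates.First();          // throws InvalidOperationException, surfacing the bug
```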

@AlekseyTs AlekseyTs dismissed their stale review December 8, 2017 00:46

Obsolete

@AlekseyTs (Contributor) left a review comment:

LGTM (iteration 4)

sharwell commented Dec 8, 2017

📝 The allocation savings do not appear to have changed significantly from the original report.

@AlekseyTs (Contributor) commented:

The allocation savings do not appear to have changed significantly from the original report.

@sharwell Thanks for following up on this. I am curious what exactly we are measuring; "Running analyzers during a build is slower than it should be" isn't quite clear in this respect. Does the analyzer do something specific? Could you please provide more details? These questions are not blocking this PR.

sharwell commented Dec 8, 2017

I am curious what exactly we are measuring; "Running analyzers during a build is slower than it should be" isn't quite clear in this respect. Does the analyzer do something specific? Could you please provide more details? These questions are not blocking this PR.

This pull request is the result of an open-ended performance investigation. A test scenario was constructed (#23582), and code changes were applied according to the prominent sources of allocations revealed by the test. Each change was described in terms of gains made against the baseline results for the constructed scenario.

@AlekseyTs (Contributor) commented:

Could you please share some details about the test scenario?

sharwell commented Dec 8, 2017

@AlekseyTs I think I edited my comment while you were reading it. I added a link to the test scenario I used for this (it's the same link that appears in the [PerformanceSensitive] attribute).

sharwell commented Dec 8, 2017

@AlekseyTs If you know of a sample program which forces these maps to get large, let me know. I can run profiling passes on it to verify that the change to a tree-based map will not cause regressions on code that differs from the input I used so far.

@AlekseyTs (Contributor) commented:

@sharwell Are we certain that the analyzers executed as part of the scenario are themselves doing things in an efficient manner, i.e. not causing the compiler to redo work unnecessarily, etc.? For example, the code in https://github.com/dotnet/roslyn/blob/master/src/Features/Core/Portable/UseThrowExpression/AbstractUseThrowExpressionDiagnosticAnalyzer.cs requests a separate SemanticModel and causes the compiler to rebind the same code again and build another IOperation tree for the same code.
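
A sketch of the anti-pattern being described, written as a hypothetical syntax-node analyzer (this is not the linked analyzer's actual code):

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
internal sealed class RebindingSketchAnalyzer : DiagnosticAnalyzer
{
    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray<DiagnosticDescriptor>.Empty;

    public override void Initialize(AnalysisContext context)
    {
        context.RegisterSyntaxNodeAction(ctx =>
        {
            // Cheap: reuses the semantic model the analyzer driver already built.
            SemanticModel shared = ctx.SemanticModel;

            // Expensive: creates a fresh SemanticModel whose binding and
            // IOperation caches are empty, forcing the compiler to redo work.
            SemanticModel separate =
                ctx.SemanticModel.Compilation.GetSemanticModel(ctx.Node.SyntaxTree);
        }, SyntaxKind.ThrowStatement);
    }
}
```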

sharwell commented Dec 8, 2017

@AlekseyTs To the contrary: by all metrics I have encountered, analyzers are frequently extremely inefficient, and not in obvious ways. However, with analyzers running as an integral part of the build, it's increasingly hard to separate the two.

@gafter gafter removed their assignment Dec 26, 2017
@sharwell sharwell changed the base branch from master to dev15.6.x January 9, 2018 16:06
@sharwell sharwell added this to the 15.6 milestone Jan 9, 2018
@sharwell (Contributor, Author) commented:

@Pilchie or @MeiChin-Tsai for ask mode

Pilchie commented Jan 17, 2018

I'm in favor of the change for the scenario, but I'll leave it to @MeiChin-Tsai to approve in case she has any concerns about risk.

@jinujoseph jinujoseph modified the milestones: 15.6, 15.7 Jan 31, 2018
gafter commented Feb 1, 2018

Can you please rebase this to 15.7.x (which is not in ask mode)?

@gafter (Member) left a review comment:

:shipit:

@gafter gafter self-assigned this Feb 1, 2018
@MeiChin-Tsai commented:

I agree with @gafter

@sharwell sharwell changed the base branch from dev15.6.x to dev15.7.x February 2, 2018 00:01
sharwell commented Feb 2, 2018

@MeiChin-Tsai This is now retargeted to 15.7

@jinujoseph (Contributor) commented:

To the best of my knowledge, the compiler side is not in ask mode for 15.7 yet, so you should be good to merge this. @jaredpar, correct me if that's not the case.

jaredpar commented Feb 2, 2018

@jinujoseph that is correct.

@sharwell sharwell merged commit ce5a3f1 into dotnet:dev15.7.x Feb 2, 2018
@sharwell sharwell deleted the optimize-unboundlambdastate branch February 2, 2018 21:11
gafter commented Feb 2, 2018

[image]
