-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Add array-fast path in Enumerable.Min/Max(int/long) #64470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,7 +2,7 @@ | |
| // The .NET Foundation licenses this file to you under the MIT license. | ||
|
|
||
| using System.Collections.Generic; | ||
| using System.Diagnostics.CodeAnalysis; | ||
| using System.Numerics; | ||
|
|
||
| namespace System.Linq | ||
| { | ||
|
|
@@ -15,6 +15,11 @@ public static int Max(this IEnumerable<int> source) | |
| ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source); | ||
| } | ||
|
|
||
| if (source.GetType() == typeof(int[])) | ||
| { | ||
| return Max((int[])source); | ||
| } | ||
|
|
||
| int value; | ||
| using (IEnumerator<int> e = source.GetEnumerator()) | ||
| { | ||
|
|
@@ -37,6 +42,57 @@ public static int Max(this IEnumerable<int> source) | |
| return value; | ||
| } | ||
|
|
||
| private static int Max(int[] array) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So what I'd really like to see here is for many of these methods to be available on Exposing new public APIs is of course something that needs to goto API proposal, but I think it would be better long term, would also allow more logic sharing for various helper APIs, and would centralize most of the "array like SIMD algorithms" to the same place in the BCL.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd be fine subsequently seeing such methods on MemoryExtensions and deleting the code from here to delegate there instead. Someone would need to do the work to figure out what that set of methods should be. I also don't think it precludes adding this now and then delegating later if/when the methods exist.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, I just wanted to call out what I'd like to see here and I think just about any method we think is worth special-casing for Sum/Min/Max are the "most obvious ones" |
||
| { | ||
| if (array.Length == 0) | ||
| { | ||
| ThrowHelper.ThrowNoElementsException(); | ||
| } | ||
|
|
||
| // Vectorize the search if possible. | ||
| int index, value; | ||
| if (Vector.IsHardwareAccelerated && array.Length >= Vector<int>.Count * 2) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we know how big the average input here is? This is going to give us different perf characteristics between Arm64 and x64 because the former will trigger for I'd expect we want similar handling in both and while manually unrolling may pipeline well on x64, that may not be the case on Arm64 or all x64 based chipsets. Generally the using
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not sure. Are you suggesting a default stance should be to use
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (This is why I opened #64409. We need really clear guidance on when to use which.)
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes. There is a lot of nuance here such as I'd generally expect that we do something like:
There are likely cases where microbenches or modern hardware (roughly Skylake and Zen2 or newer) will show
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Noting that I wouldn't block this PR on that doc being written up. I think this largely matches what we're already doing for other cases. Just that I believe this is roughly what the writeup will follow based on my own knowledge/experience.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We have many vectorized methods now. From a cursory scan, they all either use
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not aware of any existing SIMD code in the BCL where we need to or should only support There are likely places where There are likely places where we would see perf benefits by adding a The BCL is special from many angles and the guidance we have for our own usage may be more complex than the default recommendation for the typical user or individual OSS maintainer looking to accelerate their app.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Put another way, we are the core foundation on which a lot of other code is built and run. We have no way of statically knowing how any given app will use our own code and being the baseline the places where our code is used varies a lot. This is different from many applications or smaller libraries out there where they can easily get away with having simpler code/guidance and still seeing good benefits. This doesn't mean that we shouldn't provide or make available the extended guidance as well nor that other libraries may not have the same or similar considerations. But there are multiple angles and target audiences here and no one single answer.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Potentially put another way in an attempt at drawing a parallel, the BCL provides Most apps/libraries only need to use Some apps/libraries can then get additional perf benefits by providing additional overloads which use You can then go a step further and "hyper-optimize" by doing things like casting All of these options are valid and we use all of them throughout the BCL depending on what the exact circumstances and needs are. In the same way, it is fine for most user code to just use
Using platform specific intrinsics is a case of "hyper-optimization". You use these when you need the utmost control of the codegen and are trying to eek out every bit of performance or where the xplat helpers happen to fall short.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Right
I do not believe the core libraries are special in this regard. If someone is reaching for If the guidance we have doesn't work for us, it doesn't work. What I outlined in #64470 (comment) is essentially what we do, where the starting point is writing one |
||
| { | ||
| // The array is at least two vectors long. Create a vector from the first N elements, | ||
| // and then repeatedly compare that against the next vector from the array. At the end, | ||
| // the resulting vector will contain the maximum values found, and we then need only | ||
| // to find the max of those. | ||
| var maxes = new Vector<int>(array); | ||
| index = Vector<int>.Count; | ||
| do | ||
| { | ||
| maxes = Vector.Max(maxes, new Vector<int>(array, index)); | ||
| index += Vector<int>.Count; | ||
| } | ||
| while (index + Vector<int>.Count <= array.Length); | ||
|
|
||
| value = maxes[0]; | ||
| for (int i = 1; i < Vector<int>.Count; i++) | ||
| { | ||
| if (maxes[i] > value) | ||
| { | ||
| value = maxes[i]; | ||
| } | ||
| } | ||
| } | ||
| else | ||
| { | ||
| value = array[0]; | ||
| index = 1; | ||
| } | ||
|
|
||
| // Iterate through the remaining elements, comparing against the max. | ||
| for (int i = index; (uint)i < (uint)array.Length; i++) | ||
| { | ||
| if (array[i] > value) | ||
| { | ||
| value = array[i]; | ||
| } | ||
| } | ||
|
|
||
| return value; | ||
| } | ||
|
|
||
| public static int? Max(this IEnumerable<int?> source) | ||
| { | ||
| if (source == null) | ||
|
|
@@ -106,6 +162,11 @@ public static long Max(this IEnumerable<long> source) | |
| ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source); | ||
| } | ||
|
|
||
| if (source.GetType() == typeof(long[])) | ||
| { | ||
| return Max((long[])source); | ||
| } | ||
|
|
||
| long value; | ||
| using (IEnumerator<long> e = source.GetEnumerator()) | ||
| { | ||
|
|
@@ -128,6 +189,58 @@ public static long Max(this IEnumerable<long> source) | |
| return value; | ||
| } | ||
|
|
||
| private static long Max(long[] array) | ||
| { | ||
| if (array.Length == 0) | ||
| { | ||
| ThrowHelper.ThrowNoElementsException(); | ||
| } | ||
|
|
||
| // Vectorize the search if possible. | ||
| int index; | ||
| long value; | ||
| if (Vector.IsHardwareAccelerated && array.Length >= Vector<long>.Count * 2) | ||
| { | ||
| // The array is at least two vectors long. Create a vector from the first N elements, | ||
| // and then repeatedly compare that against the next vector from the array. At the end, | ||
| // the resulting vector will contain the maximum values found, and we then need only | ||
| // to find the max of those. | ||
| var maxes = new Vector<long>(array); | ||
| index = Vector<long>.Count; | ||
| do | ||
| { | ||
| maxes = Vector.Max(maxes, new Vector<long>(array, index)); | ||
| index += Vector<long>.Count; | ||
| } | ||
| while (index + Vector<long>.Count <= array.Length); | ||
|
|
||
| value = maxes[0]; | ||
| for (int i = 1; i < Vector<long>.Count; i++) | ||
| { | ||
| if (maxes[i] > value) | ||
| { | ||
| value = maxes[i]; | ||
| } | ||
| } | ||
| } | ||
| else | ||
| { | ||
| value = array[0]; | ||
| index = 1; | ||
| } | ||
|
|
||
| // Iterate through the remaining elements, comparing against the max. | ||
| for (int i = index; (uint)i < (uint)array.Length; i++) | ||
| { | ||
| if (array[i] > value) | ||
| { | ||
| value = array[i]; | ||
| } | ||
| } | ||
|
|
||
| return value; | ||
| } | ||
|
|
||
| public static long? Max(this IEnumerable<long?> source) | ||
| { | ||
| if (source == null) | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.