List: improved performance of InsertRange for lazy enumerable #17181
Conversation
Improved performance of List.InsertRange() when inserting a lazy enumerable in the middle of a list. Fixes #14876.
@stilettk thanks! Can you check we already have sufficient tests for this?
{
    Insert(index++, en.Current);
    // If items were added, move new items to the required position.
    // It needs to be done even if enumeration fails to maintain the clean state.
Comma after fails?
Will fix!
// It needs to be done even if enumeration fails to maintain the clean state.

// Swap [index, initialSize] and [initialSize, _size] segments.
Array.Reverse(_items, index, initialSize - index);
Is performing these three reverses faster than a single explicit loop?
Can you please explain which single-loop algorithm you mean? Please consider that the lengths of the swapped segments can be different, so a simple pairwise swap in a loop won't work. A small standalone demo of the rotation is below.
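To illustrate: the following is illustrative code (not from the PR) showing how three reversals rotate two segments of unequal length, using the identity (A^r B^r)^r = B A. Here the "new items" segment [8, 9] ends up at the insertion index 1:

using System;

class RotateDemo
{
    static void Main()
    {
        // Old tail A = [1,2,3,4,5] (length 5), appended new items B = [8,9] (length 2).
        int[] items = { 0, 1, 2, 3, 4, 5, 8, 9 };
        int index = 1, initialSize = 6;

        Array.Reverse(items, index, initialSize - index);            // [0,5,4,3,2,1,8,9]
        Array.Reverse(items, initialSize, items.Length - initialSize); // [0,5,4,3,2,1,9,8]
        Array.Reverse(items, index, items.Length - index);           // [0,8,9,1,2,3,4,5]

        Console.WriteLine(string.Join(", ", items)); // 0, 8, 9, 1, 2, 3, 4, 5
    }
}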
The new algorithm is not unconditionally better. This change will regress performance for a small range inserted into a large list. For example:
using System;
using System.Collections.Generic;

class Program
{
    static IEnumerable<int> MyRange()
    {
        yield return 1;
        yield return 42;
    }

    static void Main()
    {
        var l = new List<int>();
        for (int i = 0; i < 100000; i++)
            l.Add(i);

        int start = Environment.TickCount;
        for (int i = 0; i < 10000; i++)
        {
            l.InsertRange(1, MyRange());
        }
        int end = Environment.TickCount;
        Console.WriteLine((end - start).ToString());
    }
}
Before this change: 438ms. With this change: 969ms.
@jkotas Nice find! Indeed, for a small enumerable O(max(M, N)) behaves like O(M*N), because the asymptotic bound hides the constant factors. For example, in your case (N=2) there are 2 Array.Copy calls and 3 Array.Reverse calls.
I tried to work out how the counts influence the results by solving complexity equations based on iteration counts, but I didn't succeed because there are too many external factors that can't be approximated.
While I didn't find an exact dependency, I ran some benchmarks and found that the new algorithm is always faster for N >= 10 (benchmark results are below). The exact crossover differs for different M (that's why the new algorithm is faster even for M=1000000 and N=5), but for N < 10 the difference is not as large as for large N.
So, we could use the old algorithm for N < 10 and the new one for N >= 10. The "magic number" constant of 10 bothers me a bit, but I think it reflects the crossover between the linear (new) and quadratic (old) algorithm complexities. A sketch of such a hybrid is below.
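A hypothetical sketch of that hybrid (the names, the threshold value, and the switch-after-threshold structure are all illustrative, not actual List<T> code). Since N is not known upfront for a lazy enumerable, this version inserts the first few items one-by-one and switches to append-and-rotate once the count passes the threshold:

using System.Collections.Generic;

static class HybridInsertSketch
{
    private const int Threshold = 10; // illustrative; the real crossover is machine-dependent

    public static void InsertRangeHybrid<T>(List<T> list, int index, IEnumerable<T> source)
    {
        using (IEnumerator<T> en = source.GetEnumerator())
        {
            // Old algorithm for the first few items: cheap when N is small.
            int inserted = 0;
            while (inserted < Threshold && en.MoveNext())
                list.Insert(index + inserted++, en.Current);

            // New algorithm for the rest: append at the end, then rotate into position.
            int insertAt = index + inserted;
            int initialSize = list.Count;
            while (en.MoveNext())
                list.Add(en.Current);

            if (list.Count > initialSize)
            {
                list.Reverse(insertAt, initialSize - insertAt);
                list.Reverse(initialSize, list.Count - initialSize);
                list.Reverse(insertAt, list.Count - insertAt);
            }
        }
    }
}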
What do you think?
Benchmarks:
BenchmarkDotNet=v0.10.13, OS=Windows 10 Redstone 1 [1607, Anniversary Update] (10.0.14393.2125)
Intel Core i7-7800X CPU 3.50GHz (Kaby Lake), 1 CPU, 6 logical cores and 6 physical cores
.NET Core SDK=2.1.300-preview3-008391
[Host] : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT
after : .NET Core ? (CoreCLR 4.6.0.0, CoreFX 4.6.26420.0), 64bit RyuJIT
before : .NET Core ? (CoreCLR 4.6.26310.01, CoreFX 4.6.26420.0), 64bit RyuJIT
| Job | Toolchain | Counts | Mean | Error | StdDev |
|---|---|---|---|---|---|
| after | after | insert 10 in 1000000 | 6,353,462.1 ns | 60,910.153 ns | 50,862.732 ns |
| before | before | insert 10 in 1000000 | 8,060,061.3 ns | 87,981.266 ns | 68,689.979 ns |
| after | after | insert 10 in 100000 | 778,029.3 ns | 10,224.351 ns | 8,537.796 ns |
| before | before | insert 10 in 100000 | 825,851.4 ns | 9,417.468 ns | 8,348.339 ns |
| after | after | insert 10 in 10000 | 50,538.9 ns | 728.637 ns | 645.917 ns |
| before | before | insert 10 in 10000 | 54,620.9 ns | 393.464 ns | 348.795 ns |
| after | after | insert 10 in 1000 | 4,902.6 ns | 65.577 ns | 61.341 ns |
| before | before | insert 10 in 1000 | 5,004.0 ns | 55.883 ns | 49.539 ns |
| after | after | insert 10 in 100 | 723.4 ns | 5.421 ns | 4.527 ns |
| before | before | insert 10 in 100 | 946.8 ns | 11.479 ns | 10.738 ns |
| after | after | insert 10 in 10 | 280.6 ns | 5.380 ns | 5.524 ns |
| before | before | insert 10 in 10 | 505.9 ns | 6.787 ns | 6.349 ns |
| after | after | insert 10 in 1 | 344.6 ns | 5.514 ns | 5.157 ns |
| before | before | insert 10 in 1 | 530.8 ns | 7.409 ns | 6.930 ns |
| after | after | insert 5 in 1000000 | 6,290,664.5 ns | 92,389.944 ns | 86,421.602 ns |
| before | before | insert 5 in 1000000 | 6,507,284.5 ns | 126,311.112 ns | 98,615.399 ns |
| after | after | insert 5 in 100000 | 770,505.4 ns | 8,866.949 ns | 8,294.149 ns |
| before | before | insert 5 in 100000 | 748,731.3 ns | 10,642.227 ns | 9,954.745 ns |
| after | after | insert 5 in 10000 | 50,054.9 ns | 474.652 ns | 420.766 ns |
| before | before | insert 5 in 10000 | 47,189.2 ns | 669.830 ns | 593.787 ns |
| after | after | insert 5 in 1000 | 4,840.6 ns | 78.517 ns | 73.445 ns |
| before | before | insert 5 in 1000 | 4,286.1 ns | 44.895 ns | 41.994 ns |
| after | after | insert 5 in 100 | 664.7 ns | 4.691 ns | 3.917 ns |
| before | before | insert 5 in 100 | 713.8 ns | 6.343 ns | 5.933 ns |
| after | after | insert 5 in 10 | 226.4 ns | 4.033 ns | 3.772 ns |
| before | before | insert 5 in 10 | 324.6 ns | 5.109 ns | 4.779 ns |
| after | after | insert 5 in 1 | 253.3 ns | 4.510 ns | 4.219 ns |
| before | before | insert 5 in 1 | 352.1 ns | 5.881 ns | 5.501 ns |
With both algorithms included in the implementation, the cost of this optimization will be the extra code size. List is a very popular type and the code for it is stamped many times. It may be useful to take the extra code size into account as well when evaluating the different options.
I have thought about it a bit, I just didn't have time to write it up. My solution above isn't quite right, because the N=10 threshold was found by benchmarking on my system only; there are many other configurations, processor architectures, and so on.
As for the original pull request, I didn't find any other algorithm with the same performance for N=2. I tested an algorithm that goes through the array exactly once, and it has roughly the same performance as the current one.
Ultimately, I highly doubt that any managed algorithm can compete with the unmanaged Array.Copy calls. So, I see two options:
- Decline the change.
  - Everything remains the same.
  - The pull request and the corresponding issue should be closed.
- Accept the change.
  - ~1.5x performance decrease for small N.
  - Up to 99% performance increase for large N.
  - Linear (instead of quadratic) algorithm complexity.
What do you think?
@jkotas any thoughts on these proposed options?
Well, these improvements that help some cases and hurt other cases are hard to judge.
For cases like these, I usually try to look at the API telemetry and usage to try to convince myself that the change is worth it.
We may consider dropping the extra code that tries to keep the List consistent even in the presence of exceptions. The ICollection path above does not keep the list consistent in the presence of exceptions either.
Unfortunately, we don't. There are some tests, but this needs to be fixed in another PR later, because the tests are in another repo. Anyway, I have changed the code. If you mean performance tests, we don't have them for this method either.
Is there anything to worry about? Is the failed "Tizen armel Cross Checked Innerloop Build and Test" check critical?
Note that there was some coverage added for this yesterday in dotnet/corefx#28465.
The earlier change that this PR improves upon was found to be problematic and was reverted in #20471. Thanks for your PR anyway.
Summary
Fixes #14876. Improved performance of InsertRange() when inserting a lazy IEnumerable in the middle of a list by up to 99%.

Description
Detailed explanation can also be found in #14876. Short description: if an IEnumerable is being inserted into the list (not added to the end), the current implementation simply inserts the items one-by-one, shifting the right part of the list on each iteration. This results in O(M*N) complexity. The improved logic is to enumerate the collection to the end of the list and then move the enumerated items to the desired position, which is O(max(M, N)) complexity, as sketched below.
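A simplified sketch of this strategy on top of the public List<T> API (illustrative only; the actual PR works directly on the internal _items array):

using System.Collections.Generic;

static class InsertRangeSketch
{
    // M = existing items, N = enumerated items: O(M + N) element moves
    // instead of O(M * N) for inserting one-by-one.
    public static void InsertRangeLazy<T>(List<T> list, int index, IEnumerable<T> source)
    {
        int initialSize = list.Count;
        foreach (T item in source)
            list.Add(item); // enumerate to the end: amortized O(1) per item

        // Move the appended block to `index` by rotating with three reversals.
        list.Reverse(index, initialSize - index);
        list.Reverse(initialSize, list.Count - initialSize);
        list.Reverse(index, list.Count - index);
    }
}

Each element is touched a constant number of times by the reversals, which is what keeps the overall cost linear regardless of the insertion position.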
Benchmarks
I have tested by inserting an IEnumerable at the beginning of the list (the worst possible scenario). For example, Insert1000In100000List means inserting an enumerable with 1000 items into a list with 100000 items.
Remarks
There is a special case: what if the IEnumerable throws an exception during enumeration? By default the list would end up in an incorrect state: the inserted items would sit at the end of the list rather than at the desired position. I was choosing between two possible solutions, and chose the second approach for consistency, because AddRange (= InsertRange at the end) works the same way: in case of an exception it also ends up with a partially inserted enumerable.
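A hedged sketch of that behavior (hypothetical code, not the PR's exact implementation): running the rotation in a finally block leaves the partially inserted items at the target position even when the enumerable throws, matching the comment in the diff above:

using System.Collections.Generic;

static class ExceptionSafeInsertSketch
{
    public static void InsertRangeLazy<T>(List<T> list, int index, IEnumerable<T> source)
    {
        int initialSize = list.Count;
        try
        {
            foreach (T item in source)
                list.Add(item);
        }
        finally
        {
            // Rotate whatever was appended into position even if enumeration
            // failed, so the list stays in a clean state: partially inserted
            // items end up at `index`, mirroring AddRange's behavior on exception.
            if (list.Count > initialSize)
            {
                list.Reverse(index, initialSize - index);
                list.Reverse(initialSize, list.Count - initialSize);
                list.Reverse(index, list.Count - index);
            }
        }
    }
}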