This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@stilettk

Summary

Fixes #14876. Improves the performance of InsertRange() by up to 99% when inserting a lazy IEnumerable into the middle of a list.

Description

Detailed explanation can also be found in #14876. Short description:

If an IEnumerable is inserted into the middle of the list (not appended to the end), the current implementation simply inserts items one by one, shifting the right part of the list on each iteration. This results in O(M*N) complexity. The improved logic enumerates the collection onto the end of the list and then moves the enumerated items to the desired position, which is O(max(M, N)).
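The append-then-rotate idea can be sketched as follows. This is a minimal illustration using the public List<T> API; the actual PR works on the internal _items array, and the InsertRangeLazy name is hypothetical:

```csharp
using System;
using System.Collections.Generic;

static class InsertRangeSketch
{
    // Hypothetical helper illustrating the append-then-rotate idea.
    public static void InsertRangeLazy<T>(List<T> list, int index, IEnumerable<T> items)
    {
        int initialSize = list.Count;

        // Step 1: enumerate onto the end of the list, O(M) amortized.
        foreach (T item in items)
            list.Add(item);

        // Step 2: rotate the appended block into place with three
        // reversals, O(N) regardless of how many items were appended.
        list.Reverse(index, initialSize - index);            // reverse [index, initialSize)
        list.Reverse(initialSize, list.Count - initialSize); // reverse [initialSize, Count)
        list.Reverse(index, list.Count - index);             // reverse the whole tail
    }
}
```

For example, lazily inserting {10, 11} at index 1 of [0, 1, 2, 3] first yields [0, 1, 2, 3, 10, 11], and the three reversals then produce [0, 10, 11, 1, 2, 3].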

Benchmarks

I tested by inserting an IEnumerable at the beginning of the list (the worst possible scenario). For example, Insert1000In100000List means inserting an enumerable with 1000 items into a list with 100000 items.

BenchmarkDotNet=v0.10.13, OS=Windows 10 Redstone 1 [1607, Anniversary Update] (10.0.14393.2125)
Intel Core i7-7800X CPU 3.50GHz (Kaby Lake), 1 CPU, 6 logical cores and 6 physical cores
.NET Core SDK=2.1.300-preview3-008391
  [Host]     : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT
  Job-HXIMAP : .NET Core ? (CoreCLR 4.6.0.0, CoreFX 4.6.26420.0), 64bit RyuJIT
  Job-MVNXHZ : .NET Core ? (CoreCLR 4.6.26310.01, CoreFX 4.6.26420.0), 64bit RyuJIT

| Method                  | Toolchain |         Mean |        Error |       StdDev |
|------------------------ |---------- |-------------:|-------------:|-------------:|
| Insert1000In100000List  | before    |  17,031.6 us |   199.233 us |   186.362 us |
| Insert1000In1000000List | before    | 277,366.5 us | 3,890.782 us | 3,449.077 us |
| Insert10000In10000List  | before    |  16,673.8 us |   177.103 us |   165.662 us |
| Insert1000In100000List  | after     |     788.1 us |     8.808 us |     8.239 us |
| Insert1000In1000000List | after     |   6,302.9 us |    83.200 us |    73.755 us |
| Insert10000In10000List  | after     |     149.2 us |     2.440 us |     2.282 us |

Remarks

There is a special case: what if the IEnumerable throws an exception during enumeration? By default the list would end up in an incorrect state: the inserted items would sit at the end of the list rather than at the desired position. I was choosing between two possible solutions:

  1. Return the list to its original state (roll the insertion back).
  2. Restore the list to a valid state (keep the partially inserted items at the desired position).

I chose the second approach for consistency, because AddRange (which is InsertRange at the end) works the same way: in case of an exception it also ends up with a partially inserted enumerable.
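To make the trade-off concrete, here is a small illustration. The Throwing iterator is a hypothetical example; the end state shown is the "valid but partially modified" behavior described above (the item yielded before the failure stays at the target position):

```csharp
using System;
using System.Collections.Generic;

static class ExceptionExample
{
    // Hypothetical enumerable that fails partway through enumeration.
    public static IEnumerable<int> Throwing()
    {
        yield return 1;
        throw new InvalidOperationException("source failed");
    }

    public static void Main()
    {
        var list = new List<int> { 10, 20, 30 };
        try
        {
            list.InsertRange(1, Throwing());
        }
        catch (InvalidOperationException)
        {
            // Swallowed for the demo. The list stays valid but partially
            // modified: the item yielded before the failure remains.
        }
        Console.WriteLine(string.Join(",", list)); // 10,1,20,30
    }
}
```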

Improved performance of List.InsertRange() when inserting lazy enumerable in the middle of a list.

Fix #14876
@dnfclas

dnfclas commented Mar 24, 2018

CLA assistant check
All CLA requirements met.

@danmoseley
Member

@stilettk thanks! Can you check we already have sufficient tests for this?

{
Insert(index++, en.Current);
// If items were added, move new items to the required position.
// It needs to be done even if enumeration fails to maintain the clean state.
Member


Comma after fails?

Author


Will fix!

// It needs to be done even if enumeration fails to maintain the clean state.

// Swap [index, initialSize] and [initialSize, _size] segments.
Array.Reverse(_items, index, initialSize - index);
Member


Is performing these three reverses faster than a single explicit loop?

Author


Can you please explain which single-loop algorithm you mean? Note that the lengths of the swapped segments can differ, so a simple element-by-element swap in a loop won't work.

Member

@jkotas jkotas Mar 24, 2018


The new algorithm is not unconditionally better. This change will regress performance for a small range inserted into a large list. For example:

static IEnumerable<int> MyRange()
{
    yield return 1;
    yield return 42;
}

static void Main()
{
    var l = new List<int>();
    for (int i = 0; i < 100000; i++)
        l.Add(i);

    int start = Environment.TickCount;
    for (int i = 0; i < 10000; i++)
    {
        l.InsertRange(1, MyRange());
    }
    int end = Environment.TickCount;
    Console.WriteLine((end - start).ToString());
}

Before this change: 438ms. With this change: 969ms.

Author


@jkotas Nice find! Indeed, for a small enumerable O(max(M,N)) can lose to O(M*N) in practice, because big-O hides the constant factors. For example, in your case (N=2) there are two Array.Copy calls and three Array.Reverse calls.

I tried to determine how the counts influence the results by solving complexity equations based on iteration counts, but I didn't succeed: there are too many external factors that can't be approximated.

While I didn't find an exact dependency, I ran some benchmarks and found that the new algorithm is always faster for N >= 10 (the benchmark list is below). The exact crossover differs for different M (that's why the new algorithm is faster for M=1000000 and N=5), but for N < 10 the difference is not as large as it is for large N.

So, we could use the old algorithm for N < 10 and the new one for N >= 10. The "magic number" constant of 10 bothers me a bit, but I think it reflects the crossover between the linear (new) and quadratic (old) algorithm complexities.
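A threshold-based dispatch could be sketched like this. Everything here is hypothetical: the InsertRangeHybrid name, the Threshold constant, and the buffering strategy (since N is unknown for a lazy enumerable, the sketch buffers the first few items before committing to a path):

```csharp
using System;
using System.Collections.Generic;

static class HybridSketch
{
    // Empirical cutoff from the benchmarks above; hardware-dependent.
    const int Threshold = 10;

    public static void InsertRangeHybrid<T>(List<T> list, int index, IEnumerable<T> items)
    {
        var buffer = new List<T>(Threshold);
        using IEnumerator<T> en = items.GetEnumerator();
        while (buffer.Count < Threshold && en.MoveNext())
            buffer.Add(en.Current);

        if (buffer.Count < Threshold)
        {
            // Small range: insert one by one (the old O(M*N) path,
            // which is cheaper for small N).
            foreach (T item in buffer)
                list.Insert(index++, item);
            return;
        }

        // Large range: append everything, then rotate into place
        // with three reversals (the new O(max(M, N)) path).
        int initialSize = list.Count;
        list.AddRange(buffer);
        while (en.MoveNext())
            list.Add(en.Current);
        list.Reverse(index, initialSize - index);
        list.Reverse(initialSize, list.Count - initialSize);
        list.Reverse(index, list.Count - index);
    }
}
```

The buffer costs one extra small allocation per call, which is another factor to weigh against the threshold benefit.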

What do you think?

Benchmarks:

BenchmarkDotNet=v0.10.13, OS=Windows 10 Redstone 1 [1607, Anniversary Update] (10.0.14393.2125)
Intel Core i7-7800X CPU 3.50GHz (Kaby Lake), 1 CPU, 6 logical cores and 6 physical cores
.NET Core SDK=2.1.300-preview3-008391
  [Host] : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT
  after  : .NET Core ? (CoreCLR 4.6.0.0, CoreFX 4.6.26420.0), 64bit RyuJIT
  before : .NET Core ? (CoreCLR 4.6.26310.01, CoreFX 4.6.26420.0), 64bit RyuJIT

| Job    | Toolchain | Counts                |           Mean |          Error |        StdDev |
|------- |---------- |---------------------- |---------------:|---------------:|--------------:|
| after  | after     | insert 10 in 1000000  | 6,353,462.1 ns |  60,910.153 ns | 50,862.732 ns |
| before | before    | insert 10 in 1000000  | 8,060,061.3 ns |  87,981.266 ns | 68,689.979 ns |
| after  | after     | insert 10 in 100000   |   778,029.3 ns |  10,224.351 ns |  8,537.796 ns |
| before | before    | insert 10 in 100000   |   825,851.4 ns |   9,417.468 ns |  8,348.339 ns |
| after  | after     | insert 10 in 10000    |    50,538.9 ns |     728.637 ns |    645.917 ns |
| before | before    | insert 10 in 10000    |    54,620.9 ns |     393.464 ns |    348.795 ns |
| after  | after     | insert 10 in 1000     |     4,902.6 ns |      65.577 ns |     61.341 ns |
| before | before    | insert 10 in 1000     |     5,004.0 ns |      55.883 ns |     49.539 ns |
| after  | after     | insert 10 in 100      |       723.4 ns |       5.421 ns |      4.527 ns |
| before | before    | insert 10 in 100      |       946.8 ns |      11.479 ns |     10.738 ns |
| after  | after     | insert 10 in 10       |       280.6 ns |       5.380 ns |      5.524 ns |
| before | before    | insert 10 in 10       |       505.9 ns |       6.787 ns |      6.349 ns |
| after  | after     | insert 10 in 1        |       344.6 ns |       5.514 ns |      5.157 ns |
| before | before    | insert 10 in 1        |       530.8 ns |       7.409 ns |      6.930 ns |
| after  | after     | insert 5 in 1000000   | 6,290,664.5 ns |  92,389.944 ns | 86,421.602 ns |
| before | before    | insert 5 in 1000000   | 6,507,284.5 ns | 126,311.112 ns | 98,615.399 ns |
| after  | after     | insert 5 in 100000    |   770,505.4 ns |   8,866.949 ns |  8,294.149 ns |
| before | before    | insert 5 in 100000    |   748,731.3 ns |  10,642.227 ns |  9,954.745 ns |
| after  | after     | insert 5 in 10000     |    50,054.9 ns |     474.652 ns |    420.766 ns |
| before | before    | insert 5 in 10000     |    47,189.2 ns |     669.830 ns |    593.787 ns |
| after  | after     | insert 5 in 1000      |     4,840.6 ns |      78.517 ns |     73.445 ns |
| before | before    | insert 5 in 1000      |     4,286.1 ns |      44.895 ns |     41.994 ns |
| after  | after     | insert 5 in 100       |       664.7 ns |       4.691 ns |      3.917 ns |
| before | before    | insert 5 in 100       |       713.8 ns |       6.343 ns |      5.933 ns |
| after  | after     | insert 5 in 10        |       226.4 ns |       4.033 ns |      3.772 ns |
| before | before    | insert 5 in 10        |       324.6 ns |       5.109 ns |      4.779 ns |
| after  | after     | insert 5 in 1         |       253.3 ns |       4.510 ns |      4.219 ns |
| before | before    | insert 5 in 1         |       352.1 ns |       5.881 ns |      5.501 ns |

Member


With both algorithms included in the implementation, the cost of this optimization will be the extra code size. List<T> is a very popular type, and its generic code is stamped (instantiated) many times. It may be useful to take the extra code size into account as well when evaluating the different options.

Author

@stilettk stilettk Apr 26, 2018


I have thought about it a bit; I just didn't have time to write it up. My threshold solution above isn't reliable, because N=10 was found by benchmarking on my system only, and there are many other configurations, processor architectures, and so on.

As for the original pull request, I didn't find any other algorithm with the same performance for N=2. I tested an algorithm that goes through the array exactly once, and it has roughly the same performance as the current one.

Ultimately, I strongly doubt that any managed algorithm can compete with the unmanaged Array.Copy calls. So, I see two options:

  1. Decline the change.
     - Everything remains the same.
     - The pull request and the corresponding issue should be closed.
  2. Accept the change.
     - ~1.5x performance decrease for small N.
     - Up to 99% performance increase for large N.
     - Linear (not quadratic) algorithm complexity.

What do you think?

Member


@jkotas any thoughts on these proposed options?

Member


Well, these improvements that help some cases and hurt other cases are hard to judge.

For cases like these, I usually try to look at the API telemetry and usage to try to convince myself that the change is worth it.

We may consider dropping the extra code that tries to keep the List consistent even in the presence of exceptions. The ICollection path above does not keep the list consistent in the presence of exceptions either.

@stilettk
Author

stilettk commented Mar 24, 2018

Can you check we already have sufficient tests for this?

Unfortunately, we don't. There are tests for InsertRange in CoreFX, but only insertion of an ICollection is covered (see #14876 (comment)). This is because the test collection returns an IEnumerable<T> that is still an array underneath, so the `collection is ICollection<T>` check returns true:

public IEnumerable<T> ConstructTestCollection(T[] items)
{
    return items;
}

This needs to be fixed in a separate PR later, because the tests live in another repo. As a sanity check, I changed the ConstructTestCollection method to return a pure IEnumerable<T>, and the tests passed.
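The change can be sketched like this (the static LazyHelpers container is a hypothetical stand-in for the CoreFX test helper class):

```csharp
using System.Collections.Generic;

static class LazyHelpers
{
    // Yielding each element produces a compiler-generated iterator, which
    // implements IEnumerable<T> but not ICollection<T>, so InsertRange
    // takes the enumerator path instead of the bulk-copy path.
    public static IEnumerable<T> ConstructTestCollection<T>(T[] items)
    {
        foreach (T item in items)
            yield return item;
    }
}
```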

If you mean performance tests, we don't have them for InsertRange either. Again, that needs to be done in a separate issue, since those tests are in another repo.

Some checks were not successful

Is there anything here that needs attention? Is the failed "Tizen armel Cross Checked Innerloop Build and Test" leg critical?

@safern
Member

safern commented Mar 30, 2018

Note that there was some coverage added for this yesterday in: dotnet/corefx#28465

@jkotas
Member

jkotas commented Dec 1, 2018

The earlier change that this is improving upon was found to be problematic and reverted in #20471.

Thanks for your PR anyway.

@jkotas jkotas closed this Dec 1, 2018
