This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@stilettk

Summary

Fixes #14876. Improves the performance of InsertRange() by up to 99% when inserting a lazy IEnumerable into the middle of a list.

Description

Detailed explanation can also be found in #14876. Short description:

If an IEnumerable is inserted into the middle of the list (not appended to the end), the current implementation simply inserts items one by one, shifting the right part of the list on each iteration. This results in O(M*N) complexity. The improved logic enumerates the collection onto the end of the list and then moves the enumerated items to the desired position, which is O(max(M, N)).
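The append-then-rotate idea can be sketched as follows. This is a minimal illustration using the public List<T> API; the actual PR works on the internal _items array, and the InsertRangeLazy name is hypothetical:

```csharp
using System;
using System.Collections.Generic;

static class InsertRangeSketch
{
    // Hypothetical helper illustrating the append-then-rotate idea.
    public static void InsertRangeLazy<T>(List<T> list, int index, IEnumerable<T> items)
    {
        int initialSize = list.Count;

        // Step 1: enumerate onto the end of the list, O(M) amortized.
        foreach (T item in items)
            list.Add(item);

        // Step 2: rotate the appended block into place with three
        // reversals, O(N) regardless of how many items were appended.
        list.Reverse(index, initialSize - index);            // reverse [index, initialSize)
        list.Reverse(initialSize, list.Count - initialSize); // reverse [initialSize, Count)
        list.Reverse(index, list.Count - index);             // reverse the whole tail
    }
}
```

For example, lazily inserting {10, 11} at index 1 of [0, 1, 2, 3] first yields [0, 1, 2, 3, 10, 11], and the three reversals then produce [0, 10, 11, 1, 2, 3].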

Benchmarks

I tested by inserting an IEnumerable at the beginning of the list (the worst possible scenario). For example, Insert1000In100000List means inserting an enumerable with 1000 items into a list with 100000 items.

BenchmarkDotNet=v0.10.13, OS=Windows 10 Redstone 1 [1607, Anniversary Update] (10.0.14393.2125)
Intel Core i7-7800X CPU 3.50GHz (Kaby Lake), 1 CPU, 6 logical cores and 6 physical cores
.NET Core SDK=2.1.300-preview3-008391
  [Host]     : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT
  Job-HXIMAP : .NET Core ? (CoreCLR 4.6.0.0, CoreFX 4.6.26420.0), 64bit RyuJIT
  Job-MVNXHZ : .NET Core ? (CoreCLR 4.6.26310.01, CoreFX 4.6.26420.0), 64bit RyuJIT

| Method                  | Toolchain |         Mean |        Error |       StdDev |
|------------------------ |---------- |-------------:|-------------:|-------------:|
| Insert1000In100000List  | before    |  17,031.6 us |   199.233 us |   186.362 us |
| Insert1000In1000000List | before    | 277,366.5 us | 3,890.782 us | 3,449.077 us |
| Insert10000In10000List  | before    |  16,673.8 us |   177.103 us |   165.662 us |
| Insert1000In100000List  | after     |     788.1 us |     8.808 us |     8.239 us |
| Insert1000In1000000List | after     |   6,302.9 us |    83.200 us |    73.755 us |
| Insert10000In10000List  | after     |     149.2 us |     2.440 us |     2.282 us |

Remarks

There is a special case: what if the IEnumerable throws an exception during enumeration? By default the list would end up in an incorrect state: the inserted items would sit at the end of the list rather than at the desired position. I was choosing between two possible solutions:

  1. Return the list to its original state (roll the insertion back).
  2. Restore the list to a valid state (keep the partially inserted items at the desired position).

I chose the second approach for consistency, because AddRange (which is InsertRange at the end) works the same way: in case of an exception it also ends up with a partially inserted enumerable.
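To make the trade-off concrete, here is a small illustration. The Throwing iterator is a hypothetical example; the end state shown is the "valid but partially modified" behavior described above (the item yielded before the failure stays at the target position):

```csharp
using System;
using System.Collections.Generic;

static class ExceptionExample
{
    // Hypothetical enumerable that fails partway through enumeration.
    public static IEnumerable<int> Throwing()
    {
        yield return 1;
        throw new InvalidOperationException("source failed");
    }

    public static void Main()
    {
        var list = new List<int> { 10, 20, 30 };
        try
        {
            list.InsertRange(1, Throwing());
        }
        catch (InvalidOperationException)
        {
            // Swallowed for the demo. The list stays valid but partially
            // modified: the item yielded before the failure remains.
        }
        Console.WriteLine(string.Join(",", list)); // 10,1,20,30
    }
}
```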

Improved performance of List.InsertRange() when inserting lazy enumerable in the middle of a list.

Fix #14876
@dnfclas

dnfclas commented Mar 24, 2018

CLA assistant check
All CLA requirements met.

@danmoseley
Member

@stilettk thanks! Can you check we already have sufficient tests for this?

{
Insert(index++, en.Current);
// If items were added, move new items to the required position.
// It needs to be done even if enumeration fails to maintain the clean state.
Member


Comma after fails?

Author


Will fix!

// It needs to be done even if enumeration fails to maintain the clean state.

// Swap [index, initialSize] and [initialSize, _size] segments.
Array.Reverse(_items, index, initialSize - index);
Member


Is performing these three reverses faster than a single explicit loop?

Author


Can you please explain which single-loop algorithm you mean? Note that the lengths of the swapped segments can differ, so a simple element-by-element swap in a loop won't work.

Member

@jkotas jkotas Mar 24, 2018


The new algorithm is not unconditionally better. This change will regress performance for a small range inserted into a large list. For example:

static IEnumerable<int> MyRange()
{
    yield return 1;
    yield return 42;
}

static void Main()
{
    var l = new List<int>();
    for (int i = 0; i < 100000; i++)
        l.Add(i);

    int start = Environment.TickCount;
    for (int i = 0; i < 10000; i++)
    {
        l.InsertRange(1, MyRange());
    }
    int end = Environment.TickCount;
    Console.WriteLine((end - start).ToString());
}

Before this change: 438ms. With this change: 969ms.

Author


@jkotas Nice find! Indeed, for a small enumerable O(max(M,N)) can lose to O(M*N) in practice, because big-O hides the constant factors. For example, in your case (N=2) there are two Array.Copy calls and three Array.Reverse calls.

I tried to determine how the counts influence the results by solving complexity equations based on iteration counts, but I didn't succeed: there are too many external factors that can't be approximated.

While I didn't find an exact dependency, I ran some benchmarks and found that the new algorithm is always faster for N >= 10 (the benchmark list is below). The exact crossover differs for different M (that's why the new algorithm is faster for M=1000000 and N=5), but for N < 10 the difference is not as large as it is for large N.

So, we could use the old algorithm for N < 10 and the new one for N >= 10. The "magic number" constant of 10 bothers me a bit, but I think it reflects the crossover between the linear (new) and quadratic (old) algorithm complexities.
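A threshold-based dispatch could be sketched like this. Everything here is hypothetical: the InsertRangeHybrid name, the Threshold constant, and the buffering strategy (since N is unknown for a lazy enumerable, the sketch buffers the first few items before committing to a path):

```csharp
using System;
using System.Collections.Generic;

static class HybridSketch
{
    // Empirical cutoff from the benchmarks above; hardware-dependent.
    const int Threshold = 10;

    public static void InsertRangeHybrid<T>(List<T> list, int index, IEnumerable<T> items)
    {
        var buffer = new List<T>(Threshold);
        using IEnumerator<T> en = items.GetEnumerator();
        while (buffer.Count < Threshold && en.MoveNext())
            buffer.Add(en.Current);

        if (buffer.Count < Threshold)
        {
            // Small range: insert one by one (the old O(M*N) path,
            // which is cheaper for small N).
            foreach (T item in buffer)
                list.Insert(index++, item);
            return;
        }

        // Large range: append everything, then rotate into place
        // with three reversals (the new O(max(M, N)) path).
        int initialSize = list.Count;
        list.AddRange(buffer);
        while (en.MoveNext())
            list.Add(en.Current);
        list.Reverse(index, initialSize - index);
        list.Reverse(initialSize, list.Count - initialSize);
        list.Reverse(index, list.Count - index);
    }
}
```

The buffer costs one extra small allocation per call, which is another factor to weigh against the threshold benefit.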

What do you think?

Benchmarks:

BenchmarkDotNet=v0.10.13, OS=Windows 10 Redstone 1 [1607, Anniversary Update] (10.0.14393.2125)
Intel Core i7-7800X CPU 3.50GHz (Kaby Lake), 1 CPU, 6 logical cores and 6 physical cores
.NET Core SDK=2.1.300-preview3-008391
  [Host] : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT
  after  : .NET Core ? (CoreCLR 4.6.0.0, CoreFX 4.6.26420.0), 64bit RyuJIT
  before : .NET Core ? (CoreCLR 4.6.26310.01, CoreFX 4.6.26420.0), 64bit RyuJIT

| Job    | Toolchain | Counts                |           Mean |          Error |        StdDev |
|------- |---------- |---------------------- |---------------:|---------------:|--------------:|
| after  | after     | insert 10 in 1000000  | 6,353,462.1 ns |  60,910.153 ns | 50,862.732 ns |
| before | before    | insert 10 in 1000000  | 8,060,061.3 ns |  87,981.266 ns | 68,689.979 ns |
| after  | after     | insert 10 in 100000   |   778,029.3 ns |  10,224.351 ns |  8,537.796 ns |
| before | before    | insert 10 in 100000   |   825,851.4 ns |   9,417.468 ns |  8,348.339 ns |
| after  | after     | insert 10 in 10000    |    50,538.9 ns |     728.637 ns |    645.917 ns |
| before | before    | insert 10 in 10000    |    54,620.9 ns |     393.464 ns |    348.795 ns |
| after  | after     | insert 10 in 1000     |     4,902.6 ns |      65.577 ns |     61.341 ns |
| before | before    | insert 10 in 1000     |     5,004.0 ns |      55.883 ns |     49.539 ns |
| after  | after     | insert 10 in 100      |       723.4 ns |       5.421 ns |      4.527 ns |
| before | before    | insert 10 in 100      |       946.8 ns |      11.479 ns |     10.738 ns |
| after  | after     | insert 10 in 10       |       280.6 ns |       5.380 ns |      5.524 ns |
| before | before    | insert 10 in 10       |       505.9 ns |       6.787 ns |      6.349 ns |
| after  | after     | insert 10 in 1        |       344.6 ns |       5.514 ns |      5.157 ns |
| before | before    | insert 10 in 1        |       530.8 ns |       7.409 ns |      6.930 ns |
| after  | after     | insert 5 in 1000000   | 6,290,664.5 ns |  92,389.944 ns | 86,421.602 ns |
| before | before    | insert 5 in 1000000   | 6,507,284.5 ns | 126,311.112 ns | 98,615.399 ns |
| after  | after     | insert 5 in 100000    |   770,505.4 ns |   8,866.949 ns |  8,294.149 ns |
| before | before    | insert 5 in 100000    |   748,731.3 ns |  10,642.227 ns |  9,954.745 ns |
| after  | after     | insert 5 in 10000     |    50,054.9 ns |     474.652 ns |    420.766 ns |
| before | before    | insert 5 in 10000     |    47,189.2 ns |     669.830 ns |    593.787 ns |
| after  | after     | insert 5 in 1000      |     4,840.6 ns |      78.517 ns |     73.445 ns |
| before | before    | insert 5 in 1000      |     4,286.1 ns |      44.895 ns |     41.994 ns |
| after  | after     | insert 5 in 100       |       664.7 ns |       4.691 ns |      3.917 ns |
| before | before    | insert 5 in 100       |       713.8 ns |       6.343 ns |      5.933 ns |
| after  | after     | insert 5 in 10        |       226.4 ns |       4.033 ns |      3.772 ns |
| before | before    | insert 5 in 10        |       324.6 ns |       5.109 ns |      4.779 ns |
| after  | after     | insert 5 in 1         |       253.3 ns |       4.510 ns |      4.219 ns |
| before | before    | insert 5 in 1         |       352.1 ns |       5.881 ns |      5.501 ns |

Member


With both algorithms included in the implementation, the cost of this optimization will be the extra code size. List<T> is a very popular type, and its generic code is stamped (instantiated) many times. It may be useful to take the extra code size into account as well when evaluating the different options.

Author

@stilettk stilettk Apr 26, 2018


I have thought about it a bit; I just didn't have time to write it up. My threshold solution above isn't reliable, because N=10 was found by benchmarking on my system only, and there are many other configurations, processor architectures, and so on.

As for the original pull request, I didn't find any other algorithm with the same performance for N=2. I tested an algorithm that goes through the array exactly once, and it has roughly the same performance as the current one.

Ultimately, I strongly doubt that any managed algorithm can compete with the unmanaged Array.Copy calls. So, I see two options:

  1. Decline the change.
     - Everything remains the same.
     - The pull request and the corresponding issue should be closed.
  2. Accept the change.
     - ~1.5x performance decrease for small N.
     - Up to 99% performance increase for large N.
     - Linear (not quadratic) algorithm complexity.

What do you think?

Member


@jkotas any thoughts on these proposed options?

Member


Well, these improvements that help some cases and hurt other cases are hard to judge.

For cases like these, I usually try to look at the API telemetry and usage to try to convince myself that the change is worth it.

We may consider dropping the extra code that tries to keep the List consistent even in the presence of exceptions. The ICollection path above does not keep the list consistent in the presence of exceptions either.

@stilettk
Author

stilettk commented Mar 24, 2018

Can you check we already have sufficient tests for this?

Unfortunately, we don't. There are tests for InsertRange in CoreFX, but only insertion of an ICollection is covered (see #14876 (comment)). This is because the test collection returns an IEnumerable<T> that is still an array underneath, so the `collection is ICollection<T>` check returns true:

public IEnumerable<T> ConstructTestCollection(T[] items)
{
    return items;
}

This needs to be fixed in a separate PR later, because the tests live in another repo. As a sanity check, I changed the ConstructTestCollection method to return a pure IEnumerable<T>, and the tests passed.
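The change can be sketched like this (the static LazyHelpers container is a hypothetical stand-in for the CoreFX test helper class):

```csharp
using System.Collections.Generic;

static class LazyHelpers
{
    // Yielding each element produces a compiler-generated iterator, which
    // implements IEnumerable<T> but not ICollection<T>, so InsertRange
    // takes the enumerator path instead of the bulk-copy path.
    public static IEnumerable<T> ConstructTestCollection<T>(T[] items)
    {
        foreach (T item in items)
            yield return item;
    }
}
```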

If you mean performance tests, we don't have them for InsertRange either. Again, that needs to be done in a separate issue, since those tests are in another repo.

Some checks were not successful

Is there anything here that needs attention? Is the failed "Tizen armel Cross Checked Innerloop Build and Test" leg critical?

@safern
Member

safern commented Mar 30, 2018

Note that there was some coverage added for this yesterday in: dotnet/corefx#28465

@jkotas
Member

jkotas commented Dec 1, 2018

The earlier change that this is improving upon was found to be problematic and reverted in #20471.

Thanks for your PR anyway.

@jkotas jkotas closed this Dec 1, 2018
