Performance regression with AsyncEnumerable - Append, followed by SumAsync #121542

@iXyles

Description

In .NET 10, the AsyncEnumerable LINQ operators are included in the BCL instead of the standalone System.Linq.Async package. Most things work as expected, but when Append is followed by SumAsync we see an unexpected performance regression. This is one of the scenarios we have noted so far in our internal testing.

Configuration

Code snippet of benchmark:

using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run(BenchmarkConverter.TypeToBenchmarks(typeof(AppendTest)));

[MemoryDiagnoser]
[MinColumn, MaxColumn, Q1Column, Q3Column, AllStatisticsColumn]
[JsonExporterAttribute.Full]
public class AppendTest
{
    [Benchmark]
    public async Task<int> AppendInts()
    {
        var enumerable = AsyncEnumerable.Empty<int>();
        for (int i = 0; i < 10000; i++)
            enumerable = enumerable.Append(i);

        return await enumerable.SumAsync();
    }
}

Benchmark result with .NET 9 and System.Linq.Async:

// * Summary *
BenchmarkDotNet v0.15.6, macOS 26.1 (25B78) [Darwin 25.1.0]
Apple M2 Pro, 1 CPU, 12 logical and 12 physical cores
.NET SDK 9.0.306
  [Host]     : .NET 9.0.10 (9.0.10, 9.0.1025.47515), Arm64 RyuJIT armv8.0-a
  DefaultJob : .NET 9.0.10 (9.0.10, 9.0.1025.47515), Arm64 RyuJIT armv8.0-a


| Method     | Mean     | Error   | StdDev   | StdErr  | Median   | Min      | Max      | Q1       | Q3       | Op/s    | Gen0     | Gen1   | Allocated |
|----------- |---------:|--------:|---------:|--------:|---------:|---------:|---------:|---------:|---------:|--------:|---------:|-------:|----------:|
| AppendInts | 364.3 us | 7.14 us | 13.75 us | 2.03 us | 358.5 us | 342.9 us | 405.2 us | 356.0 us | 368.0 us | 2,745.0 | 166.5039 | 1.9531 |   1.34 MB |

Benchmark result with .NET 10 and the built-in AsyncEnumerable:

// * Summary *
BenchmarkDotNet v0.15.6, macOS 26.1 (25B78) [Darwin 25.1.0]
Apple M2 Pro, 1 CPU, 12 logical and 12 physical cores
.NET SDK 10.0.100
  [Host]     : .NET 10.0.0 (10.0.0, 10.0.25.52411), Arm64 RyuJIT armv8.0-a
  DefaultJob : .NET 10.0.0 (10.0.0, 10.0.25.52411), Arm64 RyuJIT armv8.0-a


| Method     | Mean     | Error    | StdDev   | StdErr  | Min      | Max      | Q1       | Q3       | Median   | Op/s  | Allocated |
|----------- |---------:|---------:|---------:|--------:|---------:|---------:|---------:|---------:|---------:|------:|----------:|
| AppendInts | 904.1 ms | 12.22 ms | 11.43 ms | 2.95 ms | 891.6 ms | 927.0 ms | 896.3 ms | 908.7 ms | 900.6 ms | 1.106 |   1.83 MB |

Regression?

Yes. The same benchmark has a mean of 364.3 us on .NET 9 with the System.Linq.Async implementation and 904.1 ms on .NET 10 with the built-in System.Linq.AsyncEnumerable implementation, roughly 2,500 times slower.

Data

Included under Configuration above.

Analysis

Copying the Append implementation from the v6 branch of System.Linq.Async into the .NET 10 benchmark removes the regression, and the result is even faster than the .NET 9 baseline:

// * Summary *
BenchmarkDotNet v0.15.6, macOS 26.1 (25B78) [Darwin 25.1.0]
Apple M2 Pro, 1 CPU, 12 logical and 12 physical cores
.NET SDK 10.0.100
  [Host]     : .NET 10.0.0 (10.0.0, 10.0.25.52411), Arm64 RyuJIT armv8.0-a
  DefaultJob : .NET 10.0.0 (10.0.0, 10.0.25.52411), Arm64 RyuJIT armv8.0-a

| Method     | Mean     | Error   | StdDev  | StdErr  | Min      | Max      | Q1       | Q3       | Median   | Op/s    | Gen0     | Gen1   | Allocated |
|----------- |---------:|--------:|--------:|--------:|---------:|---------:|---------:|---------:|---------:|--------:|---------:|-------:|----------:|
| AppendInts | 282.0 us | 3.06 us | 2.56 us | 0.71 us | 279.8 us | 288.1 us | 280.4 us | 281.7 us | 281.3 us | 3,546.4 | 166.5039 | 1.9531 |   1.34 MB |

Gist of copied implementation used for the benchmark test above: https://gist.github.com/iXyles/2c7bec4e1439222bdbbd970c6c36ded5
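
One possible explanation, which is an assumption and not something confirmed in this issue, is that each built-in Append call wraps its source in a new async iterator, so every element is forwarded through all the layers stacked above it. The sketch below uses a hypothetical NaiveAppend helper (not the BCL code) to count how many yields such nesting performs. For n chained appends the count is n * (n + 1) / 2, so the work grows quadratically.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Chain 100 appends, enumerate once, and report how many yields that took.
int yields = await NestedAppendSketch.CountYieldsAsync(100);
Console.WriteLine(yields); // 100 * 101 / 2 = 5050

static class NestedAppendSketch
{
    // Counts every item handed upward by any wrapper layer.
    private static int s_yields;

    // Hypothetical naive Append (an assumption, not the BCL implementation):
    // each call wraps the source in one more async iterator.
    private static async IAsyncEnumerable<int> NaiveAppend(IAsyncEnumerable<int> source, int element)
    {
        await foreach (int item in source)
        {
            s_yields++;          // item forwarded from an inner layer
            yield return item;
        }
        s_yields++;              // this layer's own appended element
        yield return element;
    }

    // Local stand-in for an empty async sequence, so the sketch has no package dependency.
    private static async IAsyncEnumerable<int> Empty()
    {
        await Task.CompletedTask;
        yield break;
    }

    public static async Task<int> CountYieldsAsync(int appends)
    {
        s_yields = 0;
        IAsyncEnumerable<int> sequence = Empty();
        for (int i = 0; i < appends; i++)
            sequence = NaiveAppend(sequence, i);

        // Drain the sequence; only the side effect on the counter matters here.
        await foreach (int _ in sequence) { }
        return s_yields;
    }
}
```

With the 10,000 appends used in the benchmark, the same formula gives about 50 million forwarded yields, whereas an implementation that collects the appended items into one flat structure would visit each element once. Whether this is what actually separates the two libraries would need to be checked against their sources.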
