Conversation

Contributor

@jgonz120 jgonz120 commented Dec 4, 2023

Bug

Fixes: NuGet/Home#12715

Regression? Last working version:

Description

Currently we use Newtonsoft.Json's JObject implementation for parsing project.assets.json. This implementation isn't memory efficient, so we decided to switch to System.Text.Json (STJ).

Implementing custom converters in STJ resulted in the whole JSON file being read at the start, leading to a LOH allocation. To avoid this I implemented a solution based off of this guide. This will allow us to parse through the file without ever having to load the full file at once.
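For readers unfamiliar with the approach, here is a minimal sketch of reading JSON from a stream in fixed-size chunks with Utf8JsonReader, carrying JsonReaderState from one chunk to the next. It is not the code from this PR: the function name CountPropertyNames, the buffer sizes, and the sample JSON are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper: counts property names without buffering the whole stream.
static int CountPropertyNames(Stream stream, int bufferSize = 4096)
{
    byte[] buffer = new byte[bufferSize];
    int bytesInBuffer = 0;
    int count = 0;
    var state = new JsonReaderState();
    bool isFinalBlock = false;

    while (!isFinalBlock)
    {
        // Fill the free part of the buffer; 0 bytes read means end of stream.
        int read = stream.Read(buffer, bytesInBuffer, buffer.Length - bytesInBuffer);
        isFinalBlock = read == 0;
        bytesInBuffer += read;

        // The reader is a ref struct, so a new one is created per chunk,
        // resuming from the state saved after the previous chunk.
        var reader = new Utf8JsonReader(buffer.AsSpan(0, bytesInBuffer), isFinalBlock, state);
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName)
            {
                count++;
            }
        }

        int consumed = (int)reader.BytesConsumed;
        int leftover = bytesInBuffer - consumed;
        if (leftover == buffer.Length)
        {
            // A single token is larger than the buffer: grow it and read more.
            Array.Resize(ref buffer, buffer.Length * 2);
        }
        else if (consumed > 0)
        {
            // Move the unread tail (a partial token) to the front of the buffer.
            Buffer.BlockCopy(buffer, consumed, buffer, 0, leftover);
        }

        bytesInBuffer = leftover;
        state = reader.CurrentState;
    }

    return count;
}

using var sample = new MemoryStream(Encoding.UTF8.GetBytes("{\"version\":3,\"targets\":{}}"));
Console.WriteLine(CountPropertyNames(sample, bufferSize: 16)); // prints 2
```

Only one buffer's worth of the file is held at a time, so a small buffer stays well below the large object heap threshold even when the assets file is several megabytes.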

| Method | Runtime | InputFile | Mean | StdDev | Median | Ratio | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 'LockFileFormat read StreamSTJ' | .NET 5.0 | 10KB.json | 593.2 us | 20.45 us | 593.7 us | 0.77 | 1.9531 | - | - | 42.64 KB | 0.21 |
| 'LockFileFormat read NJ' | .NET 5.0 | 10KB.json | 771.6 us | 22.54 us | 773.3 us | 1.00 | 11.7188 | 2.9297 | - | 199.64 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Core 3.1 | 10KB.json | 629.0 us | 8.61 us | 627.9 us | 0.78 | 1.9531 | - | - | 42.33 KB | 0.21 |
| 'LockFileFormat read NJ' | .NET Core 3.1 | 10KB.json | 804.7 us | 12.28 us | 804.5 us | 1.00 | 11.7188 | 2.9297 | - | 198.78 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Framework 4.7.2 | 10KB.json | 313.1 us | 5.21 us | 312.1 us | 0.70 | 7.8125 | 0.4883 | - | 50.97 KB | 0.23 |
| 'LockFileFormat read NJ' | .NET Framework 4.7.2 | 10KB.json | 447.2 us | 8.28 us | 448.3 us | 1.00 | 35.1563 | 10.2539 | - | 218.54 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET 5.0 | 786KB.json | 7,437.1 us | 21.04 us | 7,433.8 us | 0.63 | 179.6875 | 85.9375 | - | 2985.32 KB | 0.41 |
| 'LockFileFormat read NJ' | .NET 5.0 | 786KB.json | 11,899.3 us | 62.16 us | 11,887.7 us | 1.00 | 437.5000 | 218.7500 | - | 7357 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Core 3.1 | 786KB.json | 7,882.0 us | 59.31 us | 7,888.3 us | 0.60 | 171.8750 | 78.1250 | - | 2970.14 KB | 0.41 |
| 'LockFileFormat read NJ' | .NET Core 3.1 | 786KB.json | 13,205.2 us | 187.11 us | 13,169.7 us | 1.00 | 437.5000 | 218.7500 | - | 7317.3 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Framework 4.7.2 | 786KB.json | 10,399.7 us | 124.57 us | 10,409.3 us | 0.60 | 515.6250 | 250.0000 | - | 3182.19 KB | 0.41 |
| 'LockFileFormat read NJ' | .NET Framework 4.7.2 | 786KB.json | 17,258.3 us | 366.31 us | 17,192.9 us | 1.00 | 1250.0000 | 625.0000 | 93.7500 | 7695.95 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET 5.0 | 1308KB.json | 10,769.8 us | 51.83 us | 10,786.7 us | 0.64 | 265.6250 | 125.0000 | - | 4478.94 KB | 0.47 |
| 'LockFileFormat read NJ' | .NET 5.0 | 1308KB.json | 16,737.7 us | 143.36 us | 16,713.1 us | 1.00 | 562.5000 | 281.2500 | - | 9556.33 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Core 3.1 | 1308KB.json | 11,831.7 us | 46.65 us | 11,834.1 us | 0.66 | 265.6250 | 125.0000 | - | 4465.27 KB | 0.47 |
| 'LockFileFormat read NJ' | .NET Core 3.1 | 1308KB.json | 17,954.0 us | 268.64 us | 18,024.8 us | 1.00 | 562.5000 | 281.2500 | - | 9517.85 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Framework 4.7.2 | 1308KB.json | 15,375.0 us | 130.66 us | 15,337.3 us | 0.62 | 781.2500 | 390.6250 | - | 4832.49 KB | 0.48 |
| 'LockFileFormat read NJ' | .NET Framework 4.7.2 | 1308KB.json | 24,673.5 us | 246.52 us | 24,662.8 us | 1.00 | 1750.0000 | 781.2500 | 218.7500 | 10052.99 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET 5.0 | 2756KB.json | 29,047.0 us | 125.66 us | 29,041.5 us | 0.51 | 718.7500 | 343.7500 | - | 11956.48 KB | 0.44 |
| 'LockFileFormat read NJ' | .NET 5.0 | 2756KB.json | 57,121.3 us | 369.50 us | 57,184.0 us | 1.00 | 1888.8889 | 1000.0000 | 222.2222 | 27482.74 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Core 3.1 | 2756KB.json | 31,527.0 us | 251.44 us | 31,531.1 us | 0.56 | 687.5000 | 312.5000 | - | 11908.92 KB | 0.44 |
| 'LockFileFormat read NJ' | .NET Core 3.1 | 2756KB.json | 56,282.2 us | 569.83 us | 56,214.5 us | 1.00 | 1555.5556 | 777.7778 | 111.1111 | 27358.63 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Framework 4.7.2 | 2756KB.json | 47,430.5 us | 404.78 us | 47,439.5 us | 0.56 | 2181.8182 | 909.0909 | 272.7273 | 12596.34 KB | 0.44 |
| 'LockFileFormat read NJ' | .NET Framework 4.7.2 | 2756KB.json | 85,334.2 us | 1,111.29 us | 84,782.2 us | 1.00 | 5142.8571 | 2285.7143 | 857.1429 | 28617.17 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET 5.0 | 11527KB.json | 201,285.2 us | 3,929.03 us | 202,010.5 us | 0.59 | 4333.3333 | 2666.6667 | 1000.0000 | 63626.85 KB | 0.47 |
| 'LockFileFormat read NJ' | .NET 5.0 | 11527KB.json | 364,149.1 us | 34,805.34 us | 352,503.5 us | 1.00 | 9000.0000 | 5000.0000 | 1000.0000 | 136493.16 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Core 3.1 | 11527KB.json | 236,958.2 us | 4,460.41 us | 235,785.6 us | 0.63 | 4333.3333 | 2666.6667 | 1000.0000 | 63415.04 KB | 0.47 |
| 'LockFileFormat read NJ' | .NET Core 3.1 | 11527KB.json | 375,601.1 us | 10,888.11 us | 373,736.1 us | 1.00 | 9000.0000 | 5000.0000 | 1000.0000 | 135976.18 KB | 1.00 |
| 'LockFileFormat read StreamSTJ' | .NET Framework 4.7.2 | 11527KB.json | 295,284.9 us | 7,029.31 us | 292,850.2 us | 0.71 | 11500.0000 | 4500.0000 | 1500.0000 | 66077.99 KB | 0.47 |
| 'LockFileFormat read NJ' | .NET Framework 4.7.2 | 11527KB.json | 415,378.6 us | 7,298.57 us | 415,826.5 us | 1.00 | 24000.0000 | 9000.0000 | 2000.0000 | 140948.86 KB | 1.00 |

PR Checklist

  • PR has a meaningful title

  • PR has a linked issue.

  • Described changes

  • Tests

    • Automated tests added
    • OR
    • Test exception
    • OR
    • N/A
  • Documentation

    • Documentation PR or issue filled
    • OR
    • N/A

@jgonz120 jgonz120 requested a review from a team as a code owner December 4, 2023 23:20
Member

zivkan commented Dec 5, 2023

FYI @davkean

Contributor

davkean commented Dec 5, 2023

Exciting, I will take a look tomorrow.

{
    reader.TrySkip();
}
if (reader.ValueTextEquals(Utf8Level))
Contributor

Can pattern matching via a switch expression make this a little nicer?

while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
{
    var propertyName = reader.GetString();
    lockFileItem.Properties[propertyName] = reader.ReadNextTokenAsString();
Contributor

@davkean davkean Dec 5, 2023

Can these items have empty properties? If so, I'd handle the empty path by passing the lock file item a shared empty read-only dictionary on creation.
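A minimal sketch of the shared-empty-dictionary idea, assuming properties are collected before the item is built. The names emptyProperties and BuildProperties are hypothetical and do not come from the NuGet code base; the actual LockFileItem API may need a different shape.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// One shared, read-only empty instance, reused by every item without properties.
IReadOnlyDictionary<string, string> emptyProperties =
    new ReadOnlyDictionary<string, string>(new Dictionary<string, string>());

// Hypothetical helper: allocates a dictionary only when there is something to store.
IReadOnlyDictionary<string, string> BuildProperties(IReadOnlyList<KeyValuePair<string, string>> parsed)
{
    if (parsed.Count == 0)
    {
        return emptyProperties;
    }

    var properties = new Dictionary<string, string>(parsed.Count, StringComparer.Ordinal);
    foreach (KeyValuePair<string, string> pair in parsed)
    {
        properties[pair.Key] = pair.Value;
    }
    return properties;
}

var noProperties = new List<KeyValuePair<string, string>>();
Console.WriteLine(ReferenceEquals(BuildProperties(noProperties), BuildProperties(noProperties))); // prints True
```

The shared instance has to be read-only, since any mutation through it would be visible to every item that holds it.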


var lockFileTarget = new LockFileTarget();
//We want to read the property name right away
var propertyName = reader.GetString();
Contributor

Can we do the same trick for target frameworks as we are doing for the common property names where we compare them as UTF8? We're very likely parsing the same target frameworks over and over again.
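One way to read this suggestion, sketched under assumptions: keep a small table of pre-encoded UTF-8 framework names and compare the current token against them with Utf8JsonReader.ValueTextEquals, so a string is allocated only for names that are not in the table. The knownFrameworks table, its entries, and ReadFrameworkName are hypothetical. A real implementation would presumably return a cached NuGetFramework instead of a string.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical table of frequently seen framework names, encoded once up front.
(byte[] Utf8, string Name)[] knownFrameworks =
{
    (Encoding.UTF8.GetBytes("net8.0"), "net8.0"),
    (Encoding.UTF8.GetBytes("net472"), "net472"),
    (Encoding.UTF8.GetBytes(".NETStandard,Version=v2.0"), ".NETStandard,Version=v2.0"),
};

// Expects the reader to be positioned on a PropertyName or String token.
string ReadFrameworkName(ref Utf8JsonReader reader)
{
    foreach (var known in knownFrameworks)
    {
        // Compares the raw UTF-8 token against the cached bytes; no string is allocated.
        if (reader.ValueTextEquals(known.Utf8))
        {
            return known.Name;
        }
    }

    // Unknown name: fall back to allocating a string.
    return reader.GetString() ?? string.Empty;
}

bool Demo()
{
    byte[] json = Encoding.UTF8.GetBytes("{\"net8.0\":{}}");
    var reader = new Utf8JsonReader(json);
    reader.Read(); // StartObject
    reader.Read(); // PropertyName "net8.0"
    string name = ReadFrameworkName(ref reader);
    return ReferenceEquals(name, knownFrameworks[0].Name);
}

Console.WriteLine(Demo()); // prints True
```

A linear scan is only reasonable while the table stays small. Whether it beats the existing string-based framework cache would need measuring.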

while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
{
    string propertyName = reader.GetString();
    string versionString = reader.ReadNextTokenAsString();
Contributor

@davkean davkean Dec 5, 2023

I imagine version strings and ranges are the same throughout the assets file, can we dedupe them (via Dictionary<Memory<T>, VersionRange> or something) so we're only parsing the same bytes once per file?

Contributor Author

I believe these may vary a bit: they're specific to the package they depend on, and each package will have a different version.

Contributor

@davkean davkean Jan 9, 2024

I just looked at an assets file from https://github.com/dotnet/project-system/tree/main/src/Microsoft.VisualStudio.ProjectSystem.Managed, and 6.0.0 shows up 162 times.

When we look at low memory dumps with NuGet assets in memory, I see millions of version numbers that are identical across a variety of data structures.
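The deduplication discussed in this thread can be sketched as a per-file cache keyed by the version text. This is an illustration under assumptions: System.Version stands in for NuGetVersion so the sample runs without a NuGet.Versioning reference, and GetOrParse and parseCount are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Per-file cache: each distinct version string is parsed once and the result shared.
var cache = new Dictionary<string, Version>(StringComparer.Ordinal);
int parseCount = 0;

Version GetOrParse(string text)
{
    if (!cache.TryGetValue(text, out var version))
    {
        version = Version.Parse(text); // stand-in for NuGetVersion.Parse
        parseCount++;
        cache[text] = version;
    }
    return version;
}

// Simulate the assets file mentioned above, where "6.0.0" appears 162 times.
for (int i = 0; i < 162; i++)
{
    GetOrParse("6.0.0");
}

Console.WriteLine(parseCount); // prints 1
```

This variant still allocates a string per token before the lookup. Avoiding that would need a cache keyed on the UTF-8 bytes of the token.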

Contributor Author

@davkean I'm trying to get better acquainted with Memory. We have three scenarios where we can cache values here: VersionRange, NuGetVersion, and NuGetFramework.

For NuGetVersion, it looks like we're parsing it once we've already retrieved the string for the current token. So I was planning on using Dictionary<string, NuGetVersion>.

For NuGetFramework and VersionRange we could use the ValueSpan/ValueSequence of the current token. If we can leverage the ReadOnlySpan, we could do this check and get the corresponding object without allocating a string each time. I'm guessing that was the idea behind the suggestion of using Dictionary<Memory, VersionRange>? Would that mean using Dictionary<Memory<char[]>, VersionRange>, then using an IEqualityComparer that checks the arrays are equal with SequenceEqual?

I found this open issue that looks like it's trying to address this.

Member

NuGetFramework already has some optimizations:

private static bool TryParseCommonFramework(string frameworkString, [NotNullWhen(true)] out NuGetFramework? framework)

This caches well-known frameworks.

Nothing for version range/version though.

Contributor Author

Okay, so framework might not be worth it unless we want to figure out how to avoid the extra string allocation.

else
{
    lockFileTargetLibrary.Name = propertyName.Substring(0, slashIndex);
    lockFileTargetLibrary.Version = NuGetVersion.Parse(propertyName.Substring(slashIndex + 1));
Contributor

Ditto on the version number as above.

    throw new JsonException("Expected PropertyName, found " + reader.TokenType);
}

var lockFileTargetLibrary = new LockFileTargetLibrary();
Contributor

I imagine these are commonly referenced throughout an assets file. Can we dedupe these to avoid parsing the same data over and over again?

Member

@zivkan zivkan left a comment

Regarding the benchmarks in the PR description: considering the current version of NuGet gets inserted into .NET 8 and VS, I don't think having benchmark results for .NET 5 and .NET Core 3.1 provides a lot of value. The results would be much more meaningful if you showed .NET Framework 4.8 (or 4.7.2) and .NET 8.

I didn't go through everything in detail, but I already have so many comments, some of which might result in significant changes, so I don't want to spend more time on this now. Hopefully my comments are useful, and not annoying!

Contributor

davkean commented Dec 5, 2023

@jgonz120 This is great progress, so happy to see this work. Don't let the plethora of comments take away from this great work.

I'd super love to get my hands on a trace of the benchmarks. Before and after would be great, but I only need .NET Framework. Can you collect one for me using PerfView via Collect -> Collect with these options:

image

This will help me better understand if those comments I've added are problems or not.

Contributor

@donnie-msft donnie-msft left a comment

Thanks for taking on this work. I know it's been a goal for a while, and it looks like no small feat.
There's a lot of great feedback, so I'll take another pass later. Not much additional jumped out at me.
May the force be with you 🖖

@jgonz120 jgonz120 closed this Dec 5, 2023
@jgonz120 jgonz120 reopened this Dec 5, 2023
Contributor Author

jgonz120 commented Dec 5, 2023

I'm going to close this PR and break it out into smaller PRs.

Development

Successfully merging this pull request may close these issues.

Stop using JObject in assets file reading to reduce allocations

6 participants