-
Notifications
You must be signed in to change notification settings - Fork 734
Migrate from using Newtson.Json to System.Text.Json for parsing the assets file. #5529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
FYI @davkean |
|
Exciting, I will take a look tomorrow. |
src/NuGet.Core/NuGet.ProjectModel/LockFile/Utf8JsonStreamAssetsLogMessageConverter.cs
Show resolved
Hide resolved
| { | ||
| reader.TrySkip(); | ||
| } | ||
| if (reader.ValueTextEquals(Utf8Level)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can pattern matching via a switch expression make this a little nicer?
| while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName) | ||
| { | ||
| var propertyName = reader.GetString(); | ||
| lockFileItem.Properties[propertyName] = reader.ReadNextTokenAsString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these items have empty properties? if so, I'd handled the empty path by passing the lock file item a shared empty readonly-dictionary on creation.
src/NuGet.Core/NuGet.ProjectModel/LockFile/Utf8JsonStreamLockFileTargetConverter.cs
Show resolved
Hide resolved
|
|
||
| var lockFileTarget = new LockFileTarget(); | ||
| //We want to read the property name right away | ||
| var propertyName = reader.GetString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do the same trick for target frameworks as we are doing for the common property names where we compare them as UTF8? We're very likely parsing the same target frameworks over and over again.
src/NuGet.Core/NuGet.ProjectModel/LockFile/Utf8JsonStreamLockFileTargetLibraryConverter.cs
Show resolved
Hide resolved
src/NuGet.Core/NuGet.ProjectModel/LockFile/Utf8JsonStreamLockFileTargetLibraryConverter.cs
Show resolved
Hide resolved
| while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName) | ||
| { | ||
| string propertyName = reader.GetString(); | ||
| string versionString = reader.ReadNextTokenAsString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine version strings and ranges are the same throughout the assets file, can we dedupe them (via Dictionary<Memory<T>, VersionRange> or something) so we're only parsing the same bytes once per file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these may vary a bit they're specific to package they're dependent on, and each package will have a different version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked at an assets file from https://github.com/dotnet/project-system/tree/main/src/Microsoft.VisualStudio.ProjectSystem.Managed, and 6.0.0 shows up 162 times.
When we look at low memory dumps with NuGet assets in memory, I see millions of version numbers that are identical across a variety of data structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davkean I'm trying to get better acquainted with Memory. We have three scenarios where we can cache the values here.
VersionRange, NuGetVersion, and NuGetFramework.
For NuGetVersion, it looks like we're parsing it once we've already retrieved the string for the current token. So I was planning on using Dictionary<string, NuGetVersion>.
For NuGetFramework and VersionRange we could use the current ValueSpan/ValueSequence of the current token. So if we can leverage the readonlyspan we could do this check and get the corresponding object without allocating a string each time. I'm guessing that was the idea behind the suggestion of using Dictionary<Memory, VersionRange>? Would that mean using Dictionary<Memory<char[]>, VersionRange>, then using an IEqualityComparer to compare the arrays are equal with SequenceEqual?
I found this open issue that looks like it's trying to address this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NuGetFramework already has some optimizations,
| private static bool TryParseCommonFramework(string frameworkString, [NotNullWhen(true)] out NuGetFramework? framework) |
Nothing for version range/version though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so framework might not be worth it unless we want to figure out how to avoid the extra string allocation.
src/NuGet.Core/NuGet.ProjectModel/LockFile/Utf8JsonStreamLockFileTargetLibraryConverter.cs
Show resolved
Hide resolved
| else | ||
| { | ||
| lockFileTargetLibrary.Name = propertyName.Substring(0, slashIndex); | ||
| lockFileTargetLibrary.Version = NuGetVersion.Parse(propertyName.Substring(slashIndex + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on the version number as above.
| throw new JsonException("Expected PropertyName, found " + reader.TokenType); | ||
| } | ||
|
|
||
| var lockFileTargetLibrary = new LockFileTargetLibrary(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I image these are commonly referenced throughout an assets file, can we dedupe these to avoid parsing the same data over and over again?
zivkan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the benchmarks in the PR description, considering the current version of NuGet gets inserted into .NET 8 and VS, I don't think having benchmark results for .NET 5 and .NET Core 3.1 provide a lot of value. The results would be much more meaningful if you showed .NET Framework 4.8 (or 4.7.2), and .NET 8.
I didn't go through everything in detail, but I already have so many comments, some of which might result in significant changes, so I don't want to spend more time on this now. Hopefully my comments are useful, and not annoying!
src/NuGet.Core/NuGet.ProjectModel/LockFile/Utf8JsonStreamAssetsLogMessageConverter.cs
Show resolved
Hide resolved
test/NuGet.Core.Tests/NuGet.ProjectModel.Test/NjJsonPackageSpecReaderTests.cs
Show resolved
Hide resolved
src/NuGet.Core/NuGet.ProjectModel/LockFile/Utf8JsonStreamAssetsLogMessageConverter.cs
Show resolved
Hide resolved
|
@jgonz120 This is great progress, so happy to see this work. Don't let the plethora of comments take away from this great work. I super love to get my hands on trace of the benchmarks - before and after would be great, but only need .NET Framework, can you collect one for me using PerfView via Collect -> Collect with these options: This will help me better understand if those comments I've added are problems or not. |
donnie-msft
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking on this work. I know it's been a goal for a while, and it looks like no small feat.
There's a lot of great feedback, so I'll take another pass later. Not much additional jumped out at me.
May the force be with you 🖖
src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamProjectFileDependencyGroupConverter.cs
Show resolved
Hide resolved
|
I'm going to close this PR and break it out into smaller PRs. |

Bug
Fixes: NuGet/Home#12715
Regression? Last working version:
Description
Currently we use the NewtsonJson's JObject implementation for parsing the projectAssets.json. This implementation isn't memory efficient so we decided to switch to using System.Text.Json (STJ).
Implementing custom converters in STJ resulted in the JSON file being read at that start, leading to a LOH allocation. To avoid this I implemented a solution based off of this guide. This will allow us parse through the file without ever having to load the full file at once.
PR Checklist
PR has a meaningful title
PR has a linked issue.
Described changes
Tests
Documentation