-
Notifications
You must be signed in to change notification settings - Fork 734
System.Text.Json Migration - Adding code to parse the PackageSpec using STJ #5541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.Text.Json Migration - Adding code to parse the PackageSpec using STJ #5541
Conversation
* Moved files over and addressed some PR comments * added comment * switched to true and false strings * Added ctr to specify buffer for testing purposes. * remove commented code * switch to use Utf8 preamble for BOM * Create method for checking complete * combined code for ReadStringArray * Updated buffer size to match STJ's default buffer size * Switch Utf8JsonStreamReader to be disposable. * Switch to read the value for numbers into a string directly * revert back to using private var for utf8Bom * Remove ReadStringArrayAsList * Avoid referencing buffer after returning * Actually avoid referencing _buffer after returning * Update how buffers are fed into Utf8JsonReader to avoid feeding extra empty data. * remove extra line * Reverted back to using try get int for ReadTokenAsString * Update src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamReader.cs Co-authored-by: Andy Zivkovic <[email protected]> * Remove ValueTextEquals taking in string * Switched to Skip instead of TrySkip * Update src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamReader.cs Co-authored-by: Andy Zivkovic <[email protected]> * Added some unit tests * fix Bom * Switched to using Moq * Update src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamReader.cs Co-authored-by: Andy Zivkovic <[email protected]> * loop through stream when reading to ensure reading full bytes or to the end * update signature comment * Switched stream back to field and supress warning --------- Co-authored-by: Andy Zivkovic <[email protected]>
src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamPackageSpecReader.cs
Outdated
Show resolved
Hide resolved
src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamPackageSpecReader.cs
Outdated
Show resolved
Hide resolved
src/NuGet.Core/NuGet.ProjectModel/Utf8JsonStreamPackageSpecReader.cs
Outdated
Show resolved
Hide resolved
nkolev92
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly cleanup comments, nothing major.
I do have a suggestion for reducing the number of changes, thus making it easier to review.
zivkan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good for the "happy case". But looking at this from a library point of view, I have concerns about the error handling. However, since NuGet should be the only component that writes this file (we don't expect 3rd party tools to write the file, then NuGet to read it), maybe error handling is less important, since it should be so unlikely.
There's always a question of "doing it right" vs "good enough", but also the principal of least surprise. I'm worried about how difficult it might be to diagnose some problems if customers report bugs, and we don't realize that the json parsing code is silently ignoring JSON errors, rather than throwing exceptions and making it abundantly clear that something unexpected happened.
src/NuGet.Core/NuGet.ProjectModel/JsonPackageSpecReader.Utf8JsonStreamReader.cs
Show resolved
Hide resolved
| packageSpecPath); | ||
| } | ||
|
|
||
| if (jsonReader.Read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no else for this statement. My understanding is that Read only returns false at the end of the file, in which case it means this JSON file is incomplete. Traditionally, parsers would throw an exception at the point of syntax error (like missing value following a property name), but this code appears to return an incomplete value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're right that read should only return false at the end of the file. That said Read() will throw an error if we're at the end of the file at this point. The file needs to be structured to be able to call read on the reader and be at a StartObject. I added a test to confirm this.
Utf8JsonReader throws an exception if we try to read and hit the end of file when it's several layers deep into an object already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So from a practical point of view, Read() is never expected to return false, except for potentially the last read of the root object's deserialization (but from another PR, you found that Newtosoft.Json doesn't even validate that end of an object is also the end of the file).
This makes all the if (jsonReader.Read()) conditions unnecessary nesting.
I'm wondering from an API design point of view, if Utf8JsonStreamReader's Read() method should be changed to return void. We can add a EnsureEndOfFile() if we want to check that there is no contents at the end of the root object.
On the other hand, I feel guilty when my comments feel like scope creep, so this probably really doesn't matter, as Utf8JsonStreamReader is internal, and nuget should write the files that are read, so we have very high confidence there will not be any errors in the json.
src/NuGet.Core/NuGet.ProjectModel/JsonPackageSpecReader.Utf8JsonStreamReader.cs
Outdated
Show resolved
Hide resolved
| { | ||
| if (jsonReader.Read() && jsonReader.TokenType == JsonTokenType.StartObject) | ||
| { | ||
| while (jsonReader.Read() && jsonReader.TokenType == JsonTokenType.PropertyName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like other comments, I'm not sure the error handling here provides a good API experience. If jsonReader.Read() returns false, doesn't that mean the file has an error? If read succeeds, but the token type is wrong, that's deinately an error, but no error is being reported here.
|
Also, can the title of this PR please be updated to something meaningful? It confuses me every time I see it in my notifications list 😄 |
zivkan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears all my feedback is relevant to the Newtonsoft.Json code, so probably out of scope for this PR.
22f3566 to
1731952
Compare
…urebranch-jgonz120-package-spec-reader-migration
kartheekp-ms
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I apologize for providing feedback after the pull request has been merged. Given that it was merged into a feature branch, I would like you to consider these comments in your future PRs during the migration process.
| }); | ||
| } | ||
|
|
||
| [Obsolete] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An obsolete attribute allows to add a short explanation suggesting alternative workarounds. It could be great if we can add that information here. Are these methods marked as obsolete due to their usage in the Newtonsoft.Json API? If that's the case, perhaps we could add a message to inform the caller. In certain instances, the caller might be a customer of the NuGet SDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is private so we shouldn't have any customers calling it right? The public functions have a message in them already but I can update it if you think a specific call out to STJ would be better.
|
|
||
| if (jsonReader.TokenType == JsonTokenType.StartObject) | ||
| { | ||
| while (jsonReader.Read() && jsonReader.TokenType == JsonTokenType.PropertyName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed that the GetPackageSpec method contains several if-else blocks within this while loop, which might affect the readability and maintainability of the code. To improve this, I'm considering a refactor that involves using smaller, focused classes, each class is responsible for handling a specific property in the package spec. This way, the GetPackageSpec method can simply delegate the handling of each property to these specialized classes. One approach I'm thinking now is to combine the Strategy Pattern with the Abstract Factory Pattern. Implementing these patterns would significantly streamline the GetPackageSpec method, making it more concise and simpler to understand. Each property handler could be encapsulated in its own class, and an abstract factory could be used to instantiate the appropriate handler based on the property being processed. This would not only enhance readability but also make the codebase more modular and easier to extend or modify in the future.
interface IJsonPropertyHandler
{
void HandleProperty(ref Utf8JsonStreamReader jsonReader, PackageSpec packageSpec);
}class AuthorsPropertyHandler : IJsonPropertyHandler
{
public void HandleProperty(ref Utf8JsonStreamReader jsonReader, PackageSpec packageSpec)
{
// Implement logic for handling Authors property
}
}
class DescriptionPropertyHandler : IJsonPropertyHandler
{
public void HandleProperty(ref Utf8JsonStreamReader jsonReader, PackageSpec packageSpec)
{
// Implement logic for handling Description property
}
// ... other implementations for different properties
}interface IJsonPropertyHandlerFactory
{
IJsonPropertyHandler GetHandler(string propertyName);
}class JsonPropertyHandlerFactory : IJsonPropertyHandlerFactory
{
private readonly Dictionary<string, IJsonPropertyHandler> _handlers;
public JsonPropertyHandlerFactory()
{
_handlers = new Dictionary<string, IJsonPropertyHandler>
{
{ "Authors", new AuthorsPropertyHandler() },
{ "Description", new DescriptionPropertyHandler() },
// ... other mappings
};
}
public IJsonPropertyHandler GetHandler(string propertyName)
{
if (_handlers.TryGetValue(propertyName, out var handler))
{
return handler;
}
return null; // Or some default handler
}
}internal static PackageSpec GetPackageSpec(ref Utf8JsonStreamReader jsonReader, string name, string packageSpecPath, string snapshotValue)
{
var packageSpec = new PackageSpec();
var handlerFactory = new JsonPropertyHandlerFactory();
if (jsonReader.TokenType == JsonTokenType.StartObject)
{
while (jsonReader.Read() && jsonReader.TokenType == JsonTokenType.PropertyName)
{
string propertyName = jsonReader.GetString();
var handler = handlerFactory.GetHandler(propertyName);
if (handler != null)
{
handler.HandleProperty(ref jsonReader, packageSpec);
}
else
{
jsonReader.Skip(); // Skip unknown properties
}
}
}
// ... finalizing packageSpec
return packageSpec;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this type of pattern is necessary when many of these properties are just reading a string from the stream?
…ng STJ (#5541) Updated the logic for parsing the package spec to use STJ. Using a registry setting to determine if the code should use NJ or STJ for parsing.
…ng STJ (#5541) Updated the logic for parsing the package spec to use STJ. Using a registry setting to determine if the code should use NJ or STJ for parsing.
…ng STJ (#5541) Updated the logic for parsing the package spec to use STJ. Using a registry setting to determine if the code should use NJ or STJ for parsing.
Crated a new struct to parse the assets file using System.Text.Json instead of Newtonsoft. It will read the file without loading it completely in memory, reducing memory allocations. PRs associated with this: * Create class for reading Json files in chunks (#5530) * System.Text.Json Migration - Adding code to parse the PackageSpec using STJ (#5541) * System.Text.Json Migration - Adding code to parse the Project.Assets.Json file using STJ. (#5558) * Create class for reading Json files in chunks (#5530)
Bug
Fixes: NuGet/Home#12715
Regression? Last working version:
Description
Second PR in this series, this one focuses on the PackageSpec, which is required to complete the LockFile conversion.
FileFormatException
Updated this file to allow us to create this exception without having to provide the line or column where the error occurred. Unfortunately, we do not have access to this information, it's an internal member of the JsonReaderState
JsonPackageSpecReader
Has been split into two different files, one for NewtonsoftJson and another for SystemTextJson
The logic for both of these version's is similar since we were iterating through the JSON using the JsonTextReader.
PR Checklist
PR has a meaningful title
PR has a linked issue.
Described changes
Tests
Documentation