-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Prevent exception when attempting to write PaxTarEntry obtained from TarReader #75237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsFixes #75215 A When attempting to write such entry to a The fix is to add the key using the indexer. That way, we always write the most up-to-date value to the dictionary. We should try to get it into 7.0 as mentioned in the issue.
|
src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.WriteEntry.File.Tests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.File.Base.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.File.Base.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.File.Base.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind elaborate further why are you skipping these archives? and how do you deal with them in other tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first two, the reason is the same as the one I wrote a few lines above.
Generally speaking, these tests are collecting the unarchived files from the filesystem. We have to skip them because either some OS variants do not support the files in these test cases, or nuget is unable to pack them from runtime-assets and unpack them unchanged into the test data.
Other tests are not consuming the unarchived files, they are consuming the .tar or .tar.gz files directly, which we can iterate without interacting with the filesystem. But this test in particular needs to be able to compare with the filesystem, not with TarReader entries.
The skipped test cases aren't that important. What's important is the copy of an entry that comes from a TarReader into a new TarWriter, particularly in the PAX format, to verify the bug is fixed in the extended attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test in particular needs to be able to compare with the filesystem, not with TarReader entries.
I left a suggestion below that can help you avoid comparing against the filesystem #75237 (comment). Could that help removing all these continues?
src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.File.Base.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.WriteEntry.File.Tests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.WriteEntry.File.Tests.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way you can validate the copy is by re-iterating originArchive (along with destinationArchive) and comparing as much fields as possible against the entries in destinationReader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have both tests? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see value in extracting the entries and then using them for comparison in this case, the reported error is when you copy a Tar to another without writing to the disk. IMO, the filesystem should not be involved at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if there is a bug in the way we are reading information with the Tar APIs, then comparing the items read with the first TarReader with a second TarReader will give me the same results. The utf8 case discussed above is a good example of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then there should be a separate test that validates file system entries extracted to disk against tar archive entries. Said test should not exercise the "Copy tar" scenario, just "extract to disk then compare"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior looks like it was intentional, is it OK that we change it from "add if it was absent" to "always set"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This CollectExtendedAttributesFromStandardFields method does exactly what other tar tools do when creating a PAX entry:
- They always insert the values of
nameandmtimefor all entry types, regardless if they fit or not in the standard field. - They insert
linknamefor symlinks and hardlinks if not empty, regardless if it fits or not in the standard field. - If
gnameorunameare set, they get added if their byte lengths are too large for the standard field. - If
sizedoes not fit in the standard field when converted to string, it gets inserted.
If the user decides to manually change the mtime before writing it into the TarWriter, then the mtime would not be updated.
I need to add more unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs
Outdated
Show resolved
Hide resolved
This hasn't moved in 10 days. What's the plan? |
I had a call with @jozkee today. We have a mutual understanding now and will address the comments today. |
… or when using conversion constructor. Disallow dictionary with reserved keys when using the pax constructor that takes a dictionary. Allow user to modify reserved fields via their properties and ensure they get updated in the dictionary upon writing entry to an archive.
|
All the apple queues are failing with an unrelated infra problem that is being investigated right now (thanks Stephen for notifying First Responders). |
| { | ||
| if (key is PaxEaName or PaxEaSize or PaxEaMTime or PaxEaGName or PaxEaUName) | ||
| { | ||
| throw new ArgumentException(string.Format(SR.TarReservedExtendedAttribute, key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| throw new ArgumentException(string.Format(SR.TarReservedExtendedAttribute, key)); | |
| throw new ArgumentException(SR.Format(SR.TarReservedExtendedAttribute, key)); |
| private string _name; | ||
| internal int _mode; | ||
| internal int _uid; | ||
| internal int _gid; | ||
| internal long _size; | ||
| internal DateTimeOffset _mTime; | ||
| private long _size; | ||
| private DateTimeOffset _mTime; | ||
| internal int _checksum; | ||
| internal TarEntryType _typeFlag; | ||
| internal string? _linkName; | ||
| private string? _linkName; | ||
|
|
||
| // POSIX and GNU shared attributes | ||
|
|
||
| internal string _magic; | ||
| internal string _version; | ||
| internal string? _gName; | ||
| internal string? _uName; | ||
| private string? _gName; | ||
| private string? _uName; | ||
| internal int _devMajor; | ||
| internal int _devMinor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider grouping by and documenting the fields that are written/expected in the extended attributes.
|
|
||
| _format = format; | ||
| _name = name; | ||
| Name = name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Name = name; | |
| _name = name; |
This can keep using the fields as all _isPaxEa* bools are false at this point. No? The same applies for the rest of the cases.
| if (!allowReservedKeys) | ||
| { | ||
| foreach ((string key, string _) in existing) | ||
| { | ||
| if (key is PaxEaName or PaxEaSize or PaxEaMTime or PaxEaGName or PaxEaUName) | ||
| { | ||
| throw new ArgumentException(string.Format(SR.TarReservedExtendedAttribute, key)); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| _ea = new Dictionary<string, string>(existing); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are now iterating existing twice when if (!allowReservedKeys). Can you please change it to iterate just once on that case?
|
|
||
| // Used to access the data section of this entry in an unseekable file | ||
| private TarReader? _readerOfOrigin; | ||
| internal TarReader? _readerOfOrigin; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| internal TarReader? _readerOfOrigin; | |
| private TarReader? _readerOfOrigin; |
| // fields have data, we store it to avoid data loss, but we don't yet expose it publicly. | ||
| internal byte[]? _gnuUnusedBytes; | ||
|
|
||
| internal string Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you create an entry and then change Name using the setter, will ExtendedAttributes["path"] return the updated value? Seems to me that it won't.
| if (!_isPaxEaGNameSynced && !string.IsNullOrEmpty(GName)) | ||
| { | ||
| TryAddStringField(ExtendedAttributes, PaxEaGName, _gName, FieldLengths.GName); | ||
| TryAddStringField(ExtendedAttributes, PaxEaGName, GName, FieldLengths.GName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TryAddStringField will only change the dictionary element if Encoding.UTF8.GetByteCount(value) > maxLength and you are signaling _isPaxEaGNameSynced regardless of that condition.
| if (!_isPaxEaUNameSynced && !string.IsNullOrEmpty(UName)) | ||
| { | ||
| TryAddStringField(ExtendedAttributes, PaxEaUName, _uName, FieldLengths.UName); | ||
| TryAddStringField(ExtendedAttributes, PaxEaUName, UName, FieldLengths.UName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| } | ||
|
|
||
| if (_size > 99_999_999) | ||
| Size = GetTotalDataBytesToWrite(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you were not doing this before, why is it needed now?
|
Closing in favor of #76404 |
Fixes #75215
A
PaxTarEntryobtained from aTarReaderwill have an extended attributes dictionary filled with the fields that get collected by default.When attempting to write such entry to a
TarWriterviaWriteEntry(TarEntry), we collect the default extended attribute fields and store them in the dictionary. The problem is that we were usingDictionary.Add, which throws if the key already exists, and that is not the intended behavior.The fix is to add the key using the indexer. That way, we always write the most up-to-date value to the dictionary.
We should try to get it into 7.0 as mentioned in the issue.