@carlossanlop
Contributor

Fixes #75215

A PaxTarEntry obtained from a TarReader will have an extended attributes dictionary filled with the fields that get collected by default.

When attempting to write such an entry to a TarWriter via WriteEntry(TarEntry), we collect the default extended attribute fields and store them in the dictionary. The problem is that we were using Dictionary.Add, which throws if the key already exists, and that is not the intended behavior.

The fix is to add the key using the indexer. That way, we always write the most up-to-date value to the dictionary.
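
For illustration, the difference between the two calls can be reproduced with a plain Dictionary<string, string>. This is a minimal sketch, not the library code: the "path" key and both values are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;

// Simulates the dictionary of an entry that came from a reader:
// the "path" key (assumed for this example) is already present.
var extendedAttributes = new Dictionary<string, string> { ["path"] = "file.txt" };

bool addThrew = false;
try
{
    // Dictionary.Add rejects a key that already exists.
    extendedAttributes.Add("path", "renamed.txt");
}
catch (ArgumentException)
{
    addThrew = true;
}

// The indexer inserts the key if it is missing and overwrites it otherwise.
extendedAttributes["path"] = "renamed.txt";

Console.WriteLine($"Add threw: {addThrew}, stored value: {extendedAttributes["path"]}");
```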

We should try to get it into 7.0 as mentioned in the issue.

@ghost
ghost commented Sep 8, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: carlossanlop
Assignees: carlossanlop
Labels:

area-System.IO

Milestone: -

Comment on lines +45 to +47
Member

Do you mind elaborating further on why you are skipping these archives? And how do you deal with them in other tests?

Contributor Author

For the first two, the reason is the same as the one I wrote a few lines above.

Generally speaking, these tests are collecting the unarchived files from the filesystem. We have to skip them because either some OS variants do not support the files in these test cases, or NuGet is unable to pack them from runtime-assets and unpack them unchanged into the test data.

Other tests are not consuming the unarchived files. They are consuming the .tar or .tar.gz files directly, which we can iterate without interacting with the filesystem. But this test in particular needs to be able to compare with the filesystem, not with TarReader entries.

The skipped test cases aren't that important. What's important is the copy of an entry that comes from a TarReader into a new TarWriter, particularly in the PAX format, to verify the bug is fixed in the extended attributes.

Member

> this test in particular needs to be able to compare with the filesystem, not with TarReader entries.

I left a suggestion below that can help you avoid comparing against the filesystem: #75237 (comment). Could that help remove all these continue statements?

Comment on lines +252 to +250
Member

Another way you can validate the copy is by re-iterating originArchive (along with destinationArchive) and comparing as many fields as possible against the entries in destinationReader.
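
A rough sketch of that kind of validation, using the public System.Formats.Tar types and in-memory streams only. The entry names, the choice of compared fields, and the variable names are assumptions made for this example, not the test that was written for the PR. On a runtime still affected by the linked issue, the WriteEntry call in the copy loop is where the exception would surface.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

// Build a small PAX archive in memory to act as the origin archive.
using var originArchive = new MemoryStream();
using (var originWriter = new TarWriter(originArchive, TarEntryFormat.Pax, leaveOpen: true))
{
    originWriter.WriteEntry(new PaxTarEntry(TarEntryType.Directory, "dir"));
    originWriter.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, "dir/file.txt"));
}

// Copy every entry from a TarReader straight into a new TarWriter, without touching the disk.
originArchive.Position = 0;
using var destinationArchive = new MemoryStream();
using (var copyReader = new TarReader(originArchive, leaveOpen: true))
using (var destinationWriter = new TarWriter(destinationArchive, TarEntryFormat.Pax, leaveOpen: true))
{
    TarEntry? entry;
    while ((entry = copyReader.GetNextEntry()) is not null)
    {
        destinationWriter.WriteEntry(entry);
    }
}

// Re-iterate both archives side by side and compare a few fields.
originArchive.Position = 0;
destinationArchive.Position = 0;
using var originReader = new TarReader(originArchive, leaveOpen: true);
using var destinationReader = new TarReader(destinationArchive, leaveOpen: true);

int comparedEntries = 0;
while (true)
{
    TarEntry? expected = originReader.GetNextEntry();
    TarEntry? actual = destinationReader.GetNextEntry();
    if (expected is null || actual is null)
    {
        if (expected is not null || actual is not null)
        {
            throw new InvalidOperationException("The archives contain a different number of entries.");
        }
        break;
    }

    if (expected.Format != actual.Format ||
        expected.EntryType != actual.EntryType ||
        expected.Name != actual.Name ||
        expected.Length != actual.Length)
    {
        throw new InvalidOperationException($"Entry '{expected.Name}' was not copied unchanged.");
    }
    comparedEntries++;
}

Console.WriteLine($"Compared {comparedEntries} entries.");
```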

Contributor Author

@carlossanlop commented Sep 8, 2022

Why not have both tests? :)

Member

I don't see value in extracting the entries and then using them for comparison in this case. The reported error occurs when you copy one tar archive to another without writing to disk. IMO, the filesystem should not be involved at all.

Contributor Author

But if there is a bug in the way we are reading information with the Tar APIs, then comparing the items read with the first TarReader with a second TarReader will give me the same results. The utf8 case discussed above is a good example of this.

Member

@jozkee commented Sep 8, 2022

Then there should be a separate test that validates file system entries extracted to disk against tar archive entries. Said test should not exercise the "copy tar" scenario, just "extract to disk, then compare".

Comment on lines 626 to 629
Member

This behavior looks like it was intentional. Is it OK that we change it from "add if it was absent" to "always set"?

Contributor Author

This CollectExtendedAttributesFromStandardFields method does exactly what other tar tools do when creating a PAX entry:

  • They always insert the values of name and mtime for all entry types, regardless of whether they fit in the standard field.
  • They insert linkname for symlinks and hardlinks if not empty, regardless of whether it fits in the standard field.
  • If gname or uname are set, they get added if their byte lengths are too large for the standard field.
  • If size does not fit in the standard field when converted to string, it gets inserted.

With "add if absent", if the user manually changes the mtime before writing the entry into the TarWriter, the mtime in the extended attributes would not be updated.

I need to add more unit tests.
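
The rules listed above can be sketched as a small standalone function. Everything here is hypothetical and only mirrors the description: the function name, its parameters, the use of whole seconds for mtime, and the field limits (32 bytes for uname and gname, 11 octal digits for size, as in the ustar header layout) are assumptions for the example, not the actual CollectExtendedAttributesFromStandardFields implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper that mirrors the rules described in the comment above.
Dictionary<string, string> CollectStandardFields(
    string name, DateTimeOffset mtime, string? linkName, bool isLink,
    string? uname, string? gname, long size)
{
    const int MaxNameFieldBytes = 32;            // assumed uname/gname field size
    const long MaxSizeInField = 8_589_934_591;   // assumed limit: largest value in 11 octal digits

    var attributes = new Dictionary<string, string>();

    // name and mtime are always written, whether or not they fit in the standard field.
    attributes["path"] = name;
    attributes["mtime"] = mtime.ToUnixTimeSeconds().ToString(CultureInfo.InvariantCulture);

    // linkname is written for symlinks and hardlinks when it is not empty.
    if (isLink && !string.IsNullOrEmpty(linkName))
    {
        attributes["linkpath"] = linkName;
    }

    // gname and uname are written only when they are too long for the standard field.
    if (!string.IsNullOrEmpty(gname) && Encoding.UTF8.GetByteCount(gname) > MaxNameFieldBytes)
    {
        attributes["gname"] = gname;
    }
    if (!string.IsNullOrEmpty(uname) && Encoding.UTF8.GetByteCount(uname) > MaxNameFieldBytes)
    {
        attributes["uname"] = uname;
    }

    // size is written only when it does not fit in the standard field.
    if (size > MaxSizeInField)
    {
        attributes["size"] = size.ToString(CultureInfo.InvariantCulture);
    }

    return attributes;
}

var collected = CollectStandardFields(
    "file.txt", DateTimeOffset.UnixEpoch, linkName: null, isLink: false,
    uname: "user", gname: "group", size: 1024);

Console.WriteLine(string.Join(", ", collected.Keys));
```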

Contributor Author

Addressed.

@stephentoub
Member

> We should try to get it into 7.0 as mentioned in the issue.

This hasn't moved in 10 days. What's the plan?

@carlossanlop
Contributor Author

> This hasn't moved in 10 days. What's the plan?

I had a call with @jozkee today. We have a mutual understanding now and will address the comments today.

… or when using conversion constructor.

Disallow dictionary with reserved keys when using the pax constructor that takes a dictionary.
Allow user to modify reserved fields via their properties and ensure they get updated in the dictionary upon writing entry to an archive.
@carlossanlop
Contributor Author

All the apple queues are failing with an unrelated infra problem that is being investigated right now (thanks Stephen for notifying First Responders).

##[error]Git fetch failed with exit code: 128

{
    if (key is PaxEaName or PaxEaSize or PaxEaMTime or PaxEaGName or PaxEaUName)
    {
        throw new ArgumentException(string.Format(SR.TarReservedExtendedAttribute, key));

Member

Suggested change
- throw new ArgumentException(string.Format(SR.TarReservedExtendedAttribute, key));
+ throw new ArgumentException(SR.Format(SR.TarReservedExtendedAttribute, key));

Comment on lines +66 to 83
+ private string _name;
  internal int _mode;
  internal int _uid;
  internal int _gid;
- internal long _size;
- internal DateTimeOffset _mTime;
+ private long _size;
+ private DateTimeOffset _mTime;
  internal int _checksum;
  internal TarEntryType _typeFlag;
- internal string? _linkName;
+ private string? _linkName;

  // POSIX and GNU shared attributes

  internal string _magic;
  internal string _version;
- internal string? _gName;
- internal string? _uName;
+ private string? _gName;
+ private string? _uName;
  internal int _devMajor;
  internal int _devMinor;

Member

nit: consider grouping together and documenting the fields that are written/expected in the extended attributes.


  _format = format;
- _name = name;
+ Name = name;

Member

Suggested change
- Name = name;
+ _name = name;

This can keep using the fields as all _isPaxEa* bools are false at this point. No? The same applies for the rest of the cases.

Comment on lines +195 to 206
if (!allowReservedKeys)
{
    foreach ((string key, string _) in existing)
    {
        if (key is PaxEaName or PaxEaSize or PaxEaMTime or PaxEaGName or PaxEaUName)
        {
            throw new ArgumentException(string.Format(SR.TarReservedExtendedAttribute, key));
        }
    }
}

_ea = new Dictionary<string, string>(existing);

Member

You are now iterating existing twice when !allowReservedKeys is true: once to validate the keys and once more inside the Dictionary constructor. Can you please change it to iterate just once in that case?
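
One possible shape for a single-pass version, shown as a standalone sketch. The helper name CopyExtendedAttributes, the literal key strings standing in for the PaxEa* constants, and the exception message are assumptions for the example; the real code uses the library's own constants and resources.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the PaxEaName, PaxEaSize, PaxEaMTime, PaxEaGName and PaxEaUName constants.
bool IsReservedKey(string key) => key is "path" or "size" or "mtime" or "gname" or "uname";

// Validates and copies in the same loop, so the source is enumerated only once.
Dictionary<string, string> CopyExtendedAttributes(
    IEnumerable<KeyValuePair<string, string>> existing, bool allowReservedKeys)
{
    var copy = new Dictionary<string, string>();
    foreach (KeyValuePair<string, string> pair in existing)
    {
        if (!allowReservedKeys && IsReservedKey(pair.Key))
        {
            throw new ArgumentException($"The extended attribute key '{pair.Key}' is reserved.");
        }

        // Add keeps the duplicate-key check that the Dictionary copy constructor performs.
        copy.Add(pair.Key, pair.Value);
    }
    return copy;
}

var userAttributes = new Dictionary<string, string> { ["comment"] = "created by a test" };
Dictionary<string, string> copied = CopyExtendedAttributes(userAttributes, allowReservedKeys: false);
Console.WriteLine(copied.Count);
```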


  // Used to access the data section of this entry in an unseekable file
- private TarReader? _readerOfOrigin;
+ internal TarReader? _readerOfOrigin;

Member

Suggested change
- internal TarReader? _readerOfOrigin;
+ private TarReader? _readerOfOrigin;

// fields have data, we store it to avoid data loss, but we don't yet expose it publicly.
internal byte[]? _gnuUnusedBytes;

internal string Name

Member

If you create an entry and then change Name using the setter, will ExtendedAttributes["path"] return the updated value? Seems to me that it won't.

  if (!_isPaxEaGNameSynced && !string.IsNullOrEmpty(GName))
  {
-     TryAddStringField(ExtendedAttributes, PaxEaGName, _gName, FieldLengths.GName);
+     TryAddStringField(ExtendedAttributes, PaxEaGName, GName, FieldLengths.GName);

Member

TryAddStringField will only change the dictionary element if Encoding.UTF8.GetByteCount(value) > maxLength and you are signaling _isPaxEaGNameSynced regardless of that condition.
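
A small sketch of one situation where that could matter. The TryAddStringField below is a hypothetical stand-in written from the description in this comment (it writes the attribute only when the UTF-8 byte count exceeds the limit); the bool return value, the "gname" key, and the 32-byte limit are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in: writes the attribute only when the value does not fit in the field.
bool TryAddStringField(Dictionary<string, string> attributes, string key, string value, int maxLength)
{
    if (Encoding.UTF8.GetByteCount(value) > maxLength)
    {
        attributes[key] = value;
        return true;
    }
    return false;
}

// An entry read from an archive may already carry a long gname attribute.
string longGroupName = new string('g', 40);
var extendedAttributes = new Dictionary<string, string> { ["gname"] = longGroupName };

// The caller then assigns a short group name that fits in the assumed 32-byte field.
bool dictionaryChanged = TryAddStringField(extendedAttributes, "gname", "staff", 32);

// The helper left the dictionary untouched, so the old long value is still stored.
bool staleValueRemains = extendedAttributes["gname"] == longGroupName;

Console.WriteLine($"changed: {dictionaryChanged}, stale value remains: {staleValueRemains}");
```

If a synced flag were set unconditionally after such a call, it would claim the dictionary matches the property even though nothing was written.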

  if (!_isPaxEaUNameSynced && !string.IsNullOrEmpty(UName))
  {
-     TryAddStringField(ExtendedAttributes, PaxEaUName, _uName, FieldLengths.UName);
+     TryAddStringField(ExtendedAttributes, PaxEaUName, UName, FieldLengths.UName);

Member

ditto

}

if (_size > 99_999_999)
Size = GetTotalDataBytesToWrite();

Member

you were not doing this before, why is it needed now?

@jozkee
Member

jozkee commented Sep 29, 2022

Closing in favor of #76404

@jozkee jozkee closed this Sep 29, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 30, 2022
@carlossanlop carlossanlop deleted the ExtendedAttributesCollect branch July 28, 2023 15:28

Development

Successfully merging this pull request may close these issues.

Tar: Not able to write a PaxTarEntry from a TarReader.
