@carlossanlop
Contributor

@carlossanlop carlossanlop commented Jun 16, 2020

Fixes: #2162

The APIs added in this PR are:

namespace System.IO
{
    public static class Path
    {
        public static string RemoveRedundantSegments(string path);
        public static string RemoveRedundantSegments(ReadOnlySpan<char> path);
        public static bool TryRemoveRedundantSegments(ReadOnlySpan<char> path, Span<char> destination, out int charsWritten);
    }
}

The original internal methods were not able to handle all the edge cases.

The unit tests for both Windows and Unix are passing, as well as all the previously existing unit tests for Path.GetFullPath, which is the main method in System.IO that was consuming the formerly internal method that removed redundant segments.

The code is different between Linux and Windows because there are many special cases in Windows.

Unix rules:

  • / at the beginning of the path means the path is rooted. It's also the segment separator. Repeated contiguous separators get merged into one.
  • "." refers to the current directory. Gets removed except when it's the first character in a path without a known root.
  • ".." backtracks to the previous directory. The previous segment needs to be removed, and the current ".." segment does not get added. The exception is when the path root / is unknown and either there are no more directories to backtrack, or all other previous segments are also "..", in which case, the current segment must also stay.
  • \ is a valid file or folder character. For example: one\segment is a single file or folder name.
  • A segment whose name consists of 3 or more dots is a valid segment name.
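The Unix rules above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `UnixSegmentSketch` is a hypothetical helper, it allocates freely, and it assumes that a leading "." in an unrooted path cannot be backtracked over by a following "..".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; it is not the code from this PR.
public static class UnixSegmentSketch
{
    public static string RemoveRedundantSegments(string path)
    {
        if (string.IsNullOrEmpty(path))
        {
            return path;
        }

        bool rooted = path[0] == '/';
        bool trailingSeparator = path.Length > 1 && path[path.Length - 1] == '/';

        // Splitting on '/' and dropping empty entries merges repeated separators.
        // '\' is never split on, so "one\segment" stays a single name.
        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        var kept = new List<string>();

        for (int i = 0; i < segments.Length; i++)
        {
            string segment = segments[i];
            if (segment == ".")
            {
                // "." survives only as the first segment of an unrooted path.
                if (!rooted && i == 0)
                {
                    kept.Add(segment);
                }
            }
            else if (segment == "..")
            {
                string previous = kept.Count > 0 ? kept[kept.Count - 1] : null;
                if (previous != null && previous != ".." && previous != ".")
                {
                    kept.RemoveAt(kept.Count - 1); // backtrack one directory
                }
                else if (!rooted)
                {
                    kept.Add(segment); // nothing left to backtrack: keep ".."
                }
                // Rooted path with nothing to backtrack: ".." is dropped.
            }
            else
            {
                kept.Add(segment); // ordinary names, including "..." and longer
            }
        }

        string body = string.Join("/", kept);
        string result = rooted ? "/" + body : body;
        return trailingSeparator && body.Length > 0 ? result + "/" : result;
    }
}
```

Under these assumptions, "/a//b/./../c" becomes "/a/c", "a/../../b" becomes "../b", and "one\segment/./x/" becomes "one\segment/x/".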

Windows rules:

  • Both \ and / are considered path separators. The / character gets automatically normalized to \ when detected. Repeated contiguous separators get merged into one.
  • "." refers to the current directory. Gets removed except when it's the first character in a path without a known root.
  • ".." backtracks to the previous directory. The previous segment needs to be removed, and the current ".." segment does not get added. The exception is when the path root is unknown and either there are no more directories to backtrack, or all other previous segments are also "..", in which case, the current segment must also stay.
  • A segment whose name consists of 3 or more dots is a valid segment name, unless it's the last segment in the path and is not followed by a trailing separator, in which case the segment gets removed.
  • A segment whose name has trailing dots gets the trailing dots removed.
  • Paths that begin with a device prefix (\\.\, \\?\ or \??\) have some special rules:
    • A segment whose name consists of 3 or more dots never gets removed.
    • A segment whose name has trailing dots does not get the dots removed.
  • Paths that have a drive but not a separator are considered unqualified. Example: "C:folder".
  • Paths that start with a separator but have no drive are considered rooted. Example: \folder.
  • PathInternal.GetRootLength already contains the logic to decide what portion of the path should be considered the root. It can vary depending on whether the path begins with a device prefix, is in the \\Server\Share format, has a root with no drive, or has a root with no separator.

Windows paths are explained in detail in these sources:

@carlossanlop carlossanlop added this to the 5.0.0 milestone Jun 16, 2020
@carlossanlop carlossanlop requested review from ericstj and jozkee June 16, 2020 00:27
@carlossanlop carlossanlop self-assigned this Jun 16, 2020
@Dotnet-GitSync-Bot
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Member

Paths in the form of \\?\ and \??\ should never change as they are, by definition, fully qualified. If you pass a path with ? to Windows there is no concept of "relative" segments; they won't be eaten or skipped. It is also theoretically possible to have some device that needs \.\ and \..\ to be retained, and ? allows you to do that.

I don't think we should remove segments from \\?\C:\. as \\?\C:\ is not treated the same by Windows. No matter what you do we should be super explicit in the docs about the special behavior of \\?\.

\\.\ paths should, however, eat down to Path.GetPathRoot():

\\.\C:\
\\.\MyDrive\
\\.\UNC\Server\Share\

Contributor Author

@JeremyKuhne I thought about this case, but then I thought: if the user is explicitly asking to remove segments from a path prefixed with \??\ or \\?\, we probably should help them. It's only when they call GetFullPath or other APIs that these paths need to be considered already fully qualified.

But now that you mention that . and .. could be needed by devices, and need to be retained, I'll make sure to adjust the code and the unit tests to avoid removing segments from paths with these prefixes. If users need to remove segments from these paths, they can check if the path starts with \\?\ or \??\, in which case, they should call the RemoveRedundantSegments APIs that take a span, and pass their string sliced (removing the prefix).

Thanks for confirming this.
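The prefix check and slicing described in this comment could look roughly like the sketch below. `ExtendedPrefixSketch` and `TrySliceExtendedPrefix` are hypothetical names, and the span-based RemoveRedundantSegments overload is only a proposal in this PR, so the sketch stops at producing the slice a caller would pass to it.

```csharp
using System;

// Hypothetical helper; the names are made up for this illustration.
public static class ExtendedPrefixSketch
{
    // Returns true and the path without its prefix when the path starts
    // with \\?\ or \??\. Both prefixes are four characters long.
    public static bool TrySliceExtendedPrefix(ReadOnlySpan<char> path, out ReadOnlySpan<char> remainder)
    {
        if (path.StartsWith(@"\\?\".AsSpan(), StringComparison.Ordinal) ||
            path.StartsWith(@"\??\".AsSpan(), StringComparison.Ordinal))
        {
            remainder = path.Slice(4);
            return true;
        }

        // No extended prefix: hand back the path unchanged.
        remainder = path;
        return false;
    }
}
```

A caller that really wants segments removed from such a path would pass the sliced remainder to the span-based API and then put the prefix back in front of the result.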

Contributor Author

By the way, Path.GetPathRoot() does some amazing magic that helped me ensure a lot of cases got pre-covered.

Member

The other main issue I had with manipulating them is that they are no longer equivalent. Everything else you'd get the same results using the path before and after removing relative segments. I think that is one of the contracts here- you get the same results before and after. That said, there is weirdness around traversing links on Unix that I've never really rationalized. 🤷‍♂️

Contributor Author

@JeremyKuhne I remember another reason why I decided to make the RemoveRedundantSegments methods work with paths prefixed with \\?\: the pre-existing System.Runtime.Extensions unit tests for GetFullPath expect paths prefixed with \\?\ to get normalized.

Removing GetFullPath's ability to remove redundant segments and duplicate separators from paths prefixed with \\?\ or \??\ would be considered a breaking change, wouldn't it?

[Theory, MemberData(nameof(GetFullPath_Windows_CommonDevicePaths))]
public void GetFullPath_CommonDevice_Windows(string path, string basePath, string expected)
{
    Assert.Equal(@"\\.\" + expected, Path.GetFullPath(path, @"\\.\" + basePath));
    Assert.Equal(@"\\?\" + expected, Path.GetFullPath(path, @"\\?\" + basePath));
}

Contributor Author

I'm inclined to consider this a bug fix instead: paths that begin with \\?\ should not get modified by either GetFullPath(string, string) or RemoveRedundantSegments. In that case I would update these unit tests so that calling GetFullPath(string, string) with a path prefixed with \\?\ expects the same string back.

Contributor Author

Emphasis on "duplicate separators". It seems all these pre-existing unit tests are failing because they only expect duplicate separators to be collapsed, but maybe even that should not be addressed for paths prefixed with \\?\ and \??\.

Contributor Author

I think I understand what's happening. Those GetFullPath unit tests expect the \\?\ paths to get combined with the base path. I will create a separate unit test method to verify extended paths only.

Contributor Author

I updated the old existing unit tests. I split the cases that get \\?\ prefixed into separate unit tests with different outputs (unmodified, except when the paths get combined).

Member

The device path should eat down to ..\folder here as only the first .. is the "volume". Device paths are, by definition, always fully qualified.

Member

On that same note \\?\C:A\bar.txt the root is \\?\C:A\. Wanted to point that out so we don't get that confused with drive relative, which only happens in the C:A form.

Contributor Author

The device path should eat down to ..\folder here as only the first .. is the "volume". Device paths are, by definition, always fully qualified.

Thanks. If I'm understanding this correctly, it seems that my answer in the comment above will make sure this case gets covered.

On that same note \\?\C:A\bar.txt the root is \\?\C:A\. Wanted to point that out so we don't get that confused with drive relative, which only happens in the C:A form.

Thanks for confirming. Yes, I found out about edge cases like C:A when I realized Path.GetPathRoot() was giving me some strange root lengths after the device prefix, but I read its code and understood the reason.

Member

To be fully explicit:

  • \\?\..\..\folder -> \\?\..\..\folder (untouched)
  • \\.\..\..\folder -> \\.\..\folder (\\.\..\ is the root)
  • \\.\C:\..\folder -> \\.\C:\folder\ (for a more common root example)

Contributor Author

For the first example, I just made the adjustment in my last commits to ensure they are not modified.

I see what you mean with your 2nd example. I think my code may be able to handle that case right now, but I need to double check if I have unit tests for it.

The third case is handled for sure.

Contributor Author

@carlossanlop carlossanlop Jun 20, 2020

Update: For the 1st case, I submitted a commit to ensure they are not modified. For the 2nd case, I was already testing it in the unit tests that consume member data containing the substring Prefix_DriveRootless in their name.

@carlossanlop
Contributor Author

carlossanlop commented Jun 17, 2020

I'm investigating why the net472 leg is failing.
Edit: Need to add files to Microsoft.IO.Redist.csproj
Edit 2: Added the files, but using sb[^1] (System.Index) is not available there, so I modified those index accessors to sb[sb.Length - 1].
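The index change mentioned in Edit 2 amounts to something like the sketch below. It uses `StringBuilder` as a stand-in, since the builder type in the PR is internal, and `LastCharSketch` is a hypothetical name.

```csharp
using System.Text;

// Hypothetical helper; StringBuilder stands in for the PR's internal builder type.
public static class LastCharSketch
{
    // Same element that sb[^1] would read, written without System.Index
    // so it also compiles where that type is unavailable.
    public static bool EndsWithSeparator(StringBuilder sb, char separator)
    {
        return sb.Length > 0 && sb[sb.Length - 1] == separator;
    }
}
```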

@carlossanlop carlossanlop requested a review from JeremyKuhne June 17, 2020 23:50
@carlossanlop
Contributor Author

@jozkee @ericstj can I get a review?

Member

@ericstj ericstj left a comment

some minor feedback, haven't had time to really dig in and understand this. Curious if there is a way to better visualize the diff to the product code so it doesn't look like such a rewrite.

Member

Nit: unrelated

Contributor Author

VS keeps modifying them automatically.

Member

@jkotas jkotas Jun 24, 2020

Is this going to regress performance of the existing GetFullPath API? This API was not allocating anything when there was nothing to normalize. It is always going to allocate a new string after this change.

Contributor Author

This is how the old API used to look. There was one allocation that was always being done, when calling the constructor:

    internal static string RemoveRelativeSegments(string path, int rootLength)
    {
        var sb = new ValueStringBuilder(stackalloc char[260 /* PathInternal.MaxShortPath */]);

        if (RemoveRelativeSegments(path.AsSpan(), rootLength, ref sb))
        {
            path = sb.ToString();
        }

        sb.Dispose();
        return path;
    }

Then the overload that takes 3 arguments, which is being called in the if clause, would do the job of copying all the characters from the original path into the ValueStringBuilder instance.

The allocations were being done before. The main difference is that I added extra logic to handle some edge cases that the former method was not handling.

Contributor Author

I ran a unit test calling Path.GetFullPath, with my changes, inside an infinite while loop, and collected time spent using dotnet trace for a few seconds, then viewed the results in speedscope.app.

On the right side you can see the stack of calls, and the length of each bar is the time spent on each method. Most of the time is spent in CPU_TIME inside the TryRemoveRedundantSegments method.

[Image: speedscope view of the dotnet trace capture described above]

Member

@jkotas jkotas Dec 1, 2020

There was one allocation that was always being done, when calling the constructor:

There is no allocation done on GC heap in the current implementation of GetFullPath API when there is nothing to normalize.

Member

@jkotas jkotas Jun 24, 2020

Having performance numbers before/after this change for the existing Path APIs would be useful. https://github.com/dotnet/performance has micro-benchmarks for Path APIs.

On Windows, this change replaces the ~100-line RemoveRelativeSegments method with ~400 lines in a new RedundantSegmentHelper type and multiple methods. That's a bit concerning for an API that was advertised as already implemented as an internal API (#2162 (comment)).

Contributor Author

No problem. I'm working on getting the performance numbers.

Some things to keep in mind:

  • The reason why I had to rewrite everything is because the original internal method was unable to handle all the Windows edge cases, particularly with device prefixes.
  • I put everything in a new type because it would help having the code organized. Should I put everything in Path.cs to prevent creating a new type, even if that class gets cluttered?
  • I split the functionality into all those methods to make the code more readable. Should I instead put as much code as possible into fewer methods, even if we sacrifice readability?

Member

@jkotas / @carlossanlop I recommend we:

  1. Capture the numbers to see how much of a regression we introduce
  2. Decide if the regression is either A) acceptable, B) within reason, but should be tuned, or C) not within reason
  3. If it's within reason but should be tuned, file a perf regression issue to fix after Preview 8 and before RC2 (with other perf regression issues)
  4. If the regression ends up being so egregious that we aren't confident we'd be able to tune it back within reason, reevaluate this PR

Member

nit: undo formatting changes.

Contributor Author

VS keeps modifying them automatically.

Member

Why 260?
Also, can this be a const?

Contributor Author

MaxShortPath is a limit on Windows, but not on Unix. There are other places in cross-platform files where we do this same thing.

For example:

var sb = new ValueStringBuilder(stackalloc char[260 /* PathInternal.MaxShortPath */]);

var builder = new ValueStringBuilder(stackalloc char[260]); // MaxShortPath on Windows

What I can do is create an additional private constant with a name that describes the maximum cross platform path length, and use it in all these places.

Member

Please do, magic numbers bad. Constants have same IL as literals and are more maintainable.
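A named constant for the buffer size could look roughly like this. `PathBufferSketch`, `MaxShortPathLength` and `RoundTrip` are hypothetical names chosen for the sketch, and a plain `Span<char>` stands in for the internal `ValueStringBuilder`.

```csharp
using System;

// Hypothetical helper; the names are made up for this illustration.
public static class PathBufferSketch
{
    // 260 matches the MaxShortPath value quoted in the thread above.
    private const int MaxShortPathLength = 260;

    public static string RoundTrip(string path)
    {
        // Use the stack for short paths and fall back to the heap for long ones.
        Span<char> buffer = path.Length <= MaxShortPathLength
            ? stackalloc char[MaxShortPathLength]
            : new char[path.Length];

        path.AsSpan().CopyTo(buffer);
        return buffer.Slice(0, path.Length).ToString();
    }
}
```

Because the constant is folded into the call site at compile time, naming it costs nothing at run time and gives every `stackalloc` site one place to change.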

Member

Store path.AsSpan() in a local variable since it is called more than once.

Member

Can you elaborate on why you have to get the root length again after trimming the prefix? Also, shouldn't rootLength be added to charsToSkip?

Contributor Author

charsToSkip is the result of calling PathInternal.GetRootLength(originalPath), which is a method that for some special cases (specifically in paths that start with a device prefix), will tell me the root includes some segments beyond the prefix and the drive segment. The charsToSkip value will help ensure we don't remove segments beyond these characters (that substring must remain unmodified).

But it doesn't tell me if the path was rooted or not. Calling PathInternal.GetRootLength(pathWithoutPrefix) (with the prefix excluded) returns a different value that excludes those additional segments after the drive.

Here's an example:

using System;
using System.IO;
namespace ConsoleApp
{
    class Program
    {
        static void Main()
        {
            string[] paths = new string[]
            {
                @"\\.\C:..\folder\subfolder\file.txt", // GetPathRoot returns "\\.\C:..\"
                @"C:..\folder\subfolder\file.txt"      // GetPathRoot returns "C:"
            };

            foreach (string path in paths)
            {
                Console.WriteLine(Path.GetPathRoot(path));
            }
        }
    }
}

Contributor Author

I added a comment to the code to explain why I'm doing those two calls.

@carlossanlop
Contributor Author

@jkotas @jozkee @jeffhandley here is the perf PR: dotnet/performance#1394

Results

master branch

| Method                              | path                        |     Mean |   Error |  StdDev |   Median |      Min |      Max |  Gen 0 | Gen 1 | Gen 2 | Allocated |
| ----------------------------------- | --------------------------- | --------:| -------:| -------:| --------:| --------:| --------:| ------:| -----:| -----:| ---------:|
| GetFullPathWithoutRedundantSegments | .......(...)......\. [104]  | 527.9 ns | 6.47 ns | 5.73 ns | 528.5 ns | 518.3 ns | 537.8 ns | 0.0471 |     - |     - |     304 B |
| GetFullPathWithoutRedundantSegments | .......\f(...)3........ [92] | 518.4 ns | 8.76 ns | 7.77 ns | 516.0 ns | 505.8 ns | 532.7 ns | 0.0473 |     - |     - |     304 B |
| GetFullPathWithoutRedundantSegments | C:....\fol(...)3........ [90] | 243.5 ns | 2.14 ns | 2.00 ns | 243.4 ns | 240.6 ns | 246.5 ns | 0.0127 |     - |     - |      80 B |
| GetFullPathWithoutRedundantSegments | \.\Server\S(...)......\. [109] | 291.5 ns | 4.25 ns | 3.98 ns | 291.3 ns | 286.3 ns | 299.0 ns | 0.0162 |     - |     - |     104 B |

RRS branch (this PR)

| Method                              | path                        |     Mean |   Error |  StdDev |   Median |      Min |      Max |  Gen 0 | Gen 1 | Gen 2 | Allocated |
| ----------------------------------- | --------------------------- | --------:| -------:| -------:| --------:| --------:| --------:| ------:| -----:| -----:| ---------:|
| GetFullPathWithoutRedundantSegments | .......(...)......\. [104]  | 516.9 ns | 3.70 ns | 3.09 ns | 516.8 ns | 512.0 ns | 523.9 ns | 0.0478 |     - |     - |     304 B |
| GetFullPathWithoutRedundantSegments | .......\f(...)3........ [92] | 507.8 ns | 7.01 ns | 6.56 ns | 506.7 ns | 499.1 ns | 521.7 ns | 0.0467 |     - |     - |     304 B |
| GetFullPathWithoutRedundantSegments | C:....\fol(...)3........ [90] | 240.8 ns | 1.81 ns | 1.51 ns | 240.5 ns | 238.4 ns | 243.8 ns | 0.0126 |     - |     - |      80 B |
| GetFullPathWithoutRedundantSegments | \.\Server\S(...)......\. [109] | 285.8 ns | 1.69 ns | 1.58 ns | 286.0 ns | 283.5 ns | 288.4 ns | 0.0159 |     - |     - |     104 B |

Comparison

❯ dotnet run --base "D:\perf_before" --diff "D:\perf_after" --threshold 0.01%

summary:
better: 4, geomean: 1.018
total diff: 4

No Slower results for the provided threshold = 0.01% and noise filter = 0.3ns.

| Faster                                                                              | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| ----------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.IO.Tests.Perf_Path.GetFullPathWithoutRedundantSegments(path: "..\.\..\.\.    |      1.02 |           528.53 |           516.76 |         |
| System.IO.Tests.Perf_Path.GetFullPathWithoutRedundantSegments(path: "\\.\Server\    |      1.02 |           291.31 |           286.01 |         |
| System.IO.Tests.Perf_Path.GetFullPathWithoutRedundantSegments(path: "..\.\..\.\.    |      1.02 |           516.01 |           506.69 |         |
| System.IO.Tests.Perf_Path.GetFullPathWithoutRedundantSegments(path: "C:\..\.\.\f    |      1.01 |           243.39 |           240.52 |         |

@jkotas
Member

jkotas commented Jul 11, 2020

Your performance test is Windows-specific and it is testing an atypical case that is very unlikely to appear in the real world. Corner cases like this one are good for functional testing, but they are not appropriate for performance testing.

You should focus performance testing on common real-world use cases of this API, and also cover both Windows and Unix.

E.g. I have done a quick check on Linux that runs Path.GetFullPath("/home/jkotas/runtime/src/libraries/System.Private.CoreLib/src/System/IO/Path.cs");. I see a 1.5x regression on this one with your change.

@carlossanlop carlossanlop modified the milestones: 5.0.0, 6.0.0 Aug 4, 2020
@carlossanlop
Contributor Author

I checked out this branch in my WSL Ubuntu instance. I ran all the Path benchmarks, where I also made sure to include a benchmark testing the path shared by @jkotas above (see PR), and the new benchmark did not show up:

carlos@calopepc:~/performance/src/tools/ResultsComparer$ dotnet run -c release --base "/home/carlos/perf_before/" --diff "/home/carlos/perf_after/" --threshold 0.01%
summary:
better: 4, geomean: 1.077
worse: 3, geomean: 1.053
total diff: 7

| Slower                                     | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ------------------------------------------ | ---------:| ----------------:| ----------------:| --------:|
| System.IO.Tests.Perf_Path.GetTempPath      |      1.12 |           157.33 |           175.82 |         |
| System.IO.Tests.Perf_Path.GetDirectoryName |      1.02 |            33.14 |            33.91 |         |
| System.IO.Tests.Perf_Path.GetFileName      |      1.02 |            31.73 |            32.39 |         |

| Faster                                                 | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| ------------------------------------------------------ | ---------:| ----------------:| ----------------:| --------:|
| System.IO.Tests.Perf_Path.Combine                      |      1.28 |             3.84 |             3.00 |         |
| System.IO.Tests.Perf_Path.ChangeExtension              |      1.04 |            21.01 |            20.26 |         |
| System.IO.Tests.Perf_Path.GetFullPathForReallyLongPath |      1.01 |          3067.42 |          3039.39 |         |
| System.IO.Tests.Perf_Path.GetFullPathForLegacyLength   |      1.00 |           610.23 |           608.12 |         |

@jkotas
Member

jkotas commented Nov 26, 2020

I still see the regression for the example I gave above. Can you run this program before and after changes:

using System;
using System.Diagnostics;
using System.IO;

// Time 100 million GetFullPath calls on a path that needs no normalization.
var sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 100000000; i++)
    Path.GetFullPath("/home/jkotas/runtime/src/libraries/System.Private.CoreLib/src/System/IO/Path.cs");
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds.ToString());

It prints 41390 with your changes and 23331 without your changes in my WSL2 Ubuntu instance.

@carlossanlop
Contributor Author

carlossanlop commented Jan 19, 2021

@jkotas I modified Path.GetFullPath so that it is now calling the RemoveRedundantSegments overload that takes a string, instead of a span. That way, when there is no redundancy, the original string is returned.

I ran your code snippet on my Mac, and there was only a slight improvement:

  • Without my changes: 24886 ms
  • With my changes: 44872 ms
  • With my changes, but calling the string overload: 40983 ms

Would it be possible to get the changes merged as they are right now so we can get potential feedback in Preview 1? I can open an issue to make additional performance improvements later, before we ship.
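The string-returning fast path described in this comment can be sketched as a cheap pre-scan that hands back the original string instance when nothing needs normalizing. This is only an illustration under Unix rules: `FastPathSketch` is a hypothetical name, the slow path is passed in as a delegate rather than being the PR's code, and the scan is conservative, so a leading "." that the rules would keep still takes the slow path.

```csharp
using System;

// Hypothetical helper; it is not the code from this PR.
public static class FastPathSketch
{
    // Returns true when the path contains a duplicate separator or a
    // "." / ".." segment. Unix rules only: '/' is the sole separator.
    public static bool MayNeedNormalization(string path)
    {
        int segmentStart = 0;
        for (int i = 0; i <= path.Length; i++)
        {
            if (i < path.Length && path[i] != '/')
            {
                continue; // still inside a segment
            }

            int length = i - segmentStart;

            // An empty segment between two separators means "//".
            bool duplicateSeparator = length == 0 && i > 0 && i < path.Length;
            bool dotSegment =
                (length == 1 && path[segmentStart] == '.') ||
                (length == 2 && path[segmentStart] == '.' && path[segmentStart + 1] == '.');

            if (duplicateSeparator || dotSegment)
            {
                return true;
            }

            segmentStart = i + 1;
        }

        return false;
    }

    // Returns the same string instance, with no new allocation, when the scan finds nothing.
    public static string Normalize(string path, Func<string, string> slowPath)
    {
        return MayNeedNormalization(path) ? slowPath(path) : path;
    }
}
```

For a typical path such as "/home/user/file.cs" the scan finds nothing, so `Normalize` returns the argument itself and no new string is created.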

@jkotas
Member

jkotas commented Jan 19, 2021

Would it be possible to get the changes merged as they are right now so we can get potential feedback in Preview 1?

Who do you expect to use these APIs, and who do you expect to get feedback from?

@carlossanlop
Contributor Author

The original proposal had some links from people interested in it: #15584, MonoGame/MonoGame#7167. There was also an issue from the arcade repo I think, but right now I can't find it.

@jkotas
Member

jkotas commented Jan 19, 2021

Could you please ask somebody from your feature crew to do a detailed review of this change?

I can do a sanity check and sign off after you have sign-off from a member of your feature crew.

@jkotas
Member

jkotas commented Jan 19, 2021

I ran your code snippet in my Mac, and there was only a slight improvement:
Without my changes: 24886 ms
With my changes: 44872 ms

Does this mean that there is still a 1.8x performance regression caused by your changes? It does not sound acceptable to me.

@carlossanlop
Contributor Author

The code's doing more work than before, too. I had to make sure to handle more cases than the original internal method was handling.

@jkotas
Member

jkotas commented Jan 19, 2021

The code's doing more work than before, too.

I do not think that this is a good justification for introducing a 1.8x regression in existing public APIs.

@jeffhandley jeffhandley self-assigned this Jan 20, 2021
@carlossanlop
Contributor Author

I retrieved official benchmark results for Windows and Linux. Here is the performance PR.

The Stopwatch runs I executed above are unreliable. I ran them within a unit test in release mode, but I think the Xunit environment may have influenced the result.

@jeffhandley
Member

Those perf results look much better, @carlossanlop. Do I understand the following from those results correctly?

  • Ubuntu
    • GetFullPathNoRedundantSegments regresses 4%
    • GetFullPathForReallyLongPath regresses 3%
  • Windows
    • GetFullPathWithRedundantSegments regresses 17% (this is the new scenario that was not supported as well before)
    • GetFullPathForTypicalLongPath regresses 9%
    • GetFullPathForReallyLongPath regresses 4%
    • GetFullPathForLegacyLength improves 2%
    • GetFullPathNoRedundantSegments improves 1%

That all seems acceptable to me for this new functionality.

@jkotas
Member

jkotas commented Jan 28, 2021

Note that this change is not adding any new functionality to the existing APIs. The existing APIs function exactly the same as before, except that they are somewhat slower now. It may be useful to understand where this regression is coming from and whether it is possible to fix it with small tweaks while still allowing the code to be shared with the new APIs.

@jeffhandley
Member

My mistake; thanks, @jkotas. I had it in my head we were introducing the redundant segment removal behavior for the existing APIs as part of this--I forgot that behavior was already there and we are just refactoring those methods to extract the new public API.

@carlossanlop -- can you confirm that GetFullPath always has the same output now as it did before? If there are edge cases that weren't fully handled before, and they are now, then that could justify some slight regressions. But if all cases were previously handled correctly, then we'd need to rework this to avoid the regression.

@iSazonov
Contributor

I am not sure this helps, but if you want a real-world example of how PowerShell uses relative paths, you could look at PowerShell's NormalizeRelativePathHelper method: https://github.com/PowerShell/PowerShell/blob/6d5b0b3ad11dd8b4c3b106e70195948a96af08bf/src/System.Management.Automation/namespaces/FileSystemProvider.cs#L5255

@carlossanlop carlossanlop marked this pull request as draft February 13, 2021 01:49
Base automatically changed from master to main March 1, 2021 09:06
@hamarb123
Contributor

Hi, I'm looking forward to this change.
Just wondering if it would leave something like a/b/../c as a/c on Unix and as a\c on Windows without adding any additional path things, namely the C:\ on Windows or / at the start on Unix.
Also, would it leave an additional / or \ at the end of a path such as a/b/../c/?

@carlossanlop
Contributor Author

Just wondering if it would leave something like a/b/../c as a/c on Unix and as a\c on Windows without adding any additional path things, namely the C:\ on Windows or / at the start on Unix.

@hamarb123 It depends on the OS where you are calling this API:

  • If you're on Windows, and your string represents a Windows-friendly path (one that uses either \ or / as separator, because both are considered valid separators on Windows), then both a/b/../c and a\b\..\c should return a\c, with the separator normalized to \ (because that is the official one on Windows).

  • If you're on Unix, and you pass a/b/../c, this API should return a/c, because / is the only valid path separator on Unix. But if you pass a\b\..\c, (I think) the whole string would be considered a single path segment, so the API would return a\b\..\c (no changes), because \ is not a valid Unix separator.

Also, would it leave an additional / or \ at the end of a path such as a/b/../c/?

Yes, this API should preserve trailing separators, while making sure they get normalized (/ -> \ on Windows) and any duplicates are removed (C:\path\\\ -> C:\path\).

@ghost

ghost commented Apr 23, 2021

Draft Pull Request was automatically closed for inactivity. It can be manually reopened in the next 30 days if the work resumes.

@ghost ghost closed this Apr 23, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 23, 2021
This pull request was closed.
Successfully merging this pull request may close these issues.

Add Path.RemoveRelativeSegments Api