Skip to content

Conversation

@ViktorHofer
Copy link
Owner

Related: https://github.com/dotnet/corefx/issues/24145

Reviewers please read this first:

Implement & expose new Span/Memory APIs

The APIs aren't approved by or even submitted to the API review board yet. I had to make sure that what I had planned was even possible before taking it to the board. That's why this PR targets my master branch and not the corefx's one.

I introduced Span/Memory in places where it makes most sense.

Memory overloads: Match, Matches (return the provided input text in the Match object)
Span overloads: Escape, IsMatch, Replace, Match.Result

Replace with a MatchEvaluator is a special situation as the evaluator allows to reference members in the internal used Match object. As we are using a "light version" of Match (we are not setting the input Memory to avoid conversion from Span to Memory) for APIs that operate but not return a Match object, a "full" Match must be created here. To avoid copy costs from Span to Memory I'm pinning the Span for the lifetime of the Replace operation until the evaluation is done.

Break inheritance between RegexInterpreter and RegexRunner

I was discussing this with Stephen before. The current architecture makes it hard/impossible to modernize the Interpreter and introduce goodies like Span/MemoryPool because RegexRunner itself is exposed because of the CompileToAssembly functionality, which we currently don't support in Core. As the Interpreter itself is internal it makes sense to get rid of this burden and modernize it.

Currently the relevant code is copied from RegexRunner to RegexInterpreter but I play with idea of changing it to a by ref type and introduce ValueListBuilder members for the stacks. This should remove a lot of duplicate code. More about that later. Therefore please do yourself a favor and skip reviewing the code additions in RegexInterpreter.cs!

The CompiledRunner currently boxes the input string to a Memory when creating the Match object. We are limited with introducing changes here and I plan to discuss breaking compatibility with netfx for assemblies that were compiled with CompileToAssembly. More about that later, not part of this discussion/PR.

Tests

I added tests for the new overloads in all places where the string counterpart were used. As the string and the Span/Memory overloads share the same implementation, coverage should be fine. If you find any holes, please let me know.

Minor changes

  • Replace code paths especially for RTL optimized by avoiding an additional data structure for the reversed transformed inputs
  • Add missing xmldocs to exposed APIs where it was missing

Performance

Early Perfview/BenchmarkDotNet tests show that the existing string overloads benefit from the changes and aren't slower than they were before. More benchmarks and traces will follow soon!

I played with various inputs i.e. the regex-redux benchmark. After hacking the Shared ArrayPool (resizing the max array size) I could see with Perfview that 99% of the string GC cleanup was removed and replaced by borrowed char arrays which of course currently don't get cleaned up: https://github.com/dotnet/coreclr/issues/7747.

API Review

Ref struct for Match and siblings (Capture & Group).

I had a discussion with Jan offline and he pointed out that we might want to introduce a ref struct MatchValue type that is returned by APIs that take Span/Memory as an input.

The problem with just using the current class Match is that it gives you unsecure access to the Span. For example, you can send the Match object to other thread and start working on the Span there, while the current thread unwinds and frees the memory.
ref struct MatchValue would avoid this issue
(And also saved the alllocation)

The issues with that is that we currently have the following hiearchy: Match --> Group --> Capture and that Groups and Match contain collections of Captures/Groups.

Yes, the flip from class to valuetype tends to be like this. E.g. when we have introduced ValueTask to CoreFX, a bunch of parallel ValueSomething types went with it.
It is a topic for API review discussion
One option is to just not have Span version of the APIs that returns these collections or have callbacks

startat overload

For things like Regex.Replace, the startat argument means copy everything up to startat to the destination Span, and then run regular Replace that does not take startat method. So this looks like a convenience method to me - it saves you from typing a tiny bit of code in rare cases to achieve the same effect.

Should we add these startat convenience overloads for Span also?
If yes, this commit should be reverted bf7d7f9

RegexSplitEnumerator RTL yield order

If you call the Span version of Regex.Split and pass RegexOptions.RightToLeft to it the yield order of the enumerator will also be right to left as we start looking for matches from right to left. The current implementation (which is not an enumerator!) reverses the captured strings before returning.

RegexSplitEnumerator GetEnumerator (see ref diff)

I'm not aware of any other cases in the BCL where we have a GetEnumerator method like this on a struct enumerator. I understand you want it to be able to directly foreach the results without introducing an enumerable struct to serve as the return type, but I'm not sure this is a pattern we want to introduce. You should be sure to highlight this as part of any API review discussion.

Proposed APIs

This diff contains the Memory overloads and the MatchEvaluator overloads. See discussion above if we should introduce new ref types for Match, Group & Capture.

namespace System.Text.RegularExpressions
{
    public partial class Capture
    {
        internal Capture() { }
        public int Index { get { throw null; } }
        public int Length { get { throw null; } }
        public string Value { get { throw null; } }
+       public ReadOnlySpan<char> ValueSpan { get { throw null; } }
        public override string ToString() { throw null; }
    }
    public partial class Match : System.Text.RegularExpressions.Group
    {
        internal Match() { }
        public static System.Text.RegularExpressions.Match Empty { get { throw null; } }
        public virtual System.Text.RegularExpressions.GroupCollection Groups { get { throw null; } }
        public System.Text.RegularExpressions.Match NextMatch() { throw null; }
        public virtual string Result(string replacement) { throw null; }
+       public virtual bool TryResult(string replacement, Span<char> destination, out int charsWritten) { throw null; }
        public static System.Text.RegularExpressions.Match Synchronized(System.Text.RegularExpressions.Match inner) { throw null; }
    }
    public partial class Regex : System.Runtime.Serialization.ISerializable
    {
+       public ref struct SplitEnumerator
+       {
+           public ReadOnlySpan<char> Current { get { throw null; } }
+           public SplitEnumerator GetEnumerator() { throw null; }
+           public bool MoveNext() { throw null; }
+       }
        protected internal System.Collections.Hashtable caps;
        protected internal System.Collections.Hashtable capnames;
        protected internal int capsize;
        protected internal string[] capslist;
        protected internal System.Text.RegularExpressions.RegexRunnerFactory factory;
        public static readonly System.TimeSpan InfiniteMatchTimeout;
        protected internal System.TimeSpan internalMatchTimeout;
        protected internal string pattern;
        protected internal System.Text.RegularExpressions.RegexOptions roptions;
        protected Regex() { }
        protected Regex(System.Runtime.Serialization.SerializationInfo info, System.Runtime.Serialization.StreamingContext context) { }
        public Regex(string pattern) { }
        public Regex(string pattern, System.Text.RegularExpressions.RegexOptions options) { }
        public Regex(string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { }
        public static int CacheSize { get { throw null; } set { } }
        [System.CLSCompliant(false)]
        protected System.Collections.IDictionary Caps { get { throw null; } set { } }
        [System.CLSCompliant(false)]
        protected System.Collections.IDictionary CapNames { get { throw null; } set { } }
        public System.TimeSpan MatchTimeout { get { throw null; } }
        public System.Text.RegularExpressions.RegexOptions Options { get { throw null; } }
        public bool RightToLeft { get { throw null; } }
        public static string Escape(string str) { throw null; }
+       public static bool TryEscape(ReadOnlySpan<char> str, Span<char> destination, out int charsWritten) { throw null; }
        public string[] GetGroupNames() { throw null; }
        public int[] GetGroupNumbers() { throw null; }
        public string GroupNameFromNumber(int i) { throw null; }
        public int GroupNumberFromName(string name) { throw null; }
        protected void InitializeReferences() { }
        public bool IsMatch(string input) { throw null; }
+       public bool IsMatch(ReadOnlySpan<char> input) { throw null; }
        public bool IsMatch(string input, int startat) { throw null; }
        public static bool IsMatch(string input, string pattern) { throw null; }
        public static bool IsMatch(string input, string pattern, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static bool IsMatch(string input, string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static bool IsMatch(ReadOnlySpan<char> input, string pattern, System.Text.RegularExpressions.RegexOptions options = System.Text.RegularExpressions.RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public System.Text.RegularExpressions.Match Match(string input) { throw null; }
+       public System.Text.RegularExpressions.Match Match(ReadOnlyMemory<char> input) { throw null; }
        public System.Text.RegularExpressions.Match Match(string input, int startat) { throw null; }
        public System.Text.RegularExpressions.Match Match(string input, int beginning, int length) { throw null; }
        public static System.Text.RegularExpressions.Match Match(string input, string pattern) { throw null; }
        public static System.Text.RegularExpressions.Match Match(string input, string pattern, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static System.Text.RegularExpressions.Match Match(string input, string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static System.Text.RegularExpressions.Match Match(ReadOnlyMemory<char> input, string pattern, System.Text.RegularExpressions.RegexOptions options = RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public System.Text.RegularExpressions.MatchCollection Matches(string input) { throw null; }
+       public System.Text.RegularExpressions.MatchCollection Matches(ReadOnlyMemory<char> input) { throw null; }
        public System.Text.RegularExpressions.MatchCollection Matches(string input, int startat) { throw null; }
        public static System.Text.RegularExpressions.MatchCollection Matches(string input, string pattern) { throw null; }
        public static System.Text.RegularExpressions.MatchCollection Matches(string input, string pattern, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static System.Text.RegularExpressions.MatchCollection Matches(string input, string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static System.Text.RegularExpressions.MatchCollection Matches(ReadOnlyMemory<char> input, string pattern, System.Text.RegularExpressions.RegexOptions options = RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public string Replace(string input, string replacement) { throw null; }
        public string Replace(string input, string replacement, int count) { throw null; }
+       public bool TryReplace(ReadOnlySpan<char> input, string replacement, Span<char> destination, out int charsWritten, int count = -1) { throw null; }
        public string Replace(string input, string replacement, int count, int startat) { throw null; }
        public static string Replace(string input, string pattern, string replacement) { throw null; }
        public static string Replace(string input, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static string Replace(string input, string pattern, string replacement, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static bool TryReplace(ReadOnlySpan<char> input, string pattern, string replacement, Span<char> destination, out int charsWritten, System.Text.RegularExpressions.RegexOptions options = System.Text.RegularExpressions.RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator) { throw null; }
        public static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static string Replace(string input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static bool TryReplace(ReadOnlySpan<char> input, string pattern, System.Text.RegularExpressions.MatchEvaluator evaluator, Span<char> destination, out int charsWritten, System.Text.RegularExpressions.RegexOptions options = System.Text.RegularExpressions.RegexOptions.None, System.TimeSpan? matchTimeout = null) { throw null; }
        public string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator) { throw null; }
        public string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator, int count) { throw null; }
+       public bool TryReplace(ReadOnlySpan<char> input, System.Text.RegularExpressions.MatchEvaluator evaluator, Span<char> destination, out int charsWritten, int count = -1) { throw null; }
        public string Replace(string input, System.Text.RegularExpressions.MatchEvaluator evaluator, int count, int startat) { throw null; }
        public string[] Split(string input) { throw null; }
        public string[] Split(string input, int count) { throw null; }
        public string[] Split(string input, int count, int startat) { throw null; }
+       public SplitEnumerator Split(ReadOnlySpan<char> input, int count = 0) { throw null; }
        public static string[] Split(string input, string pattern) { throw null; }
        public static string[] Split(string input, string pattern, System.Text.RegularExpressions.RegexOptions options) { throw null; }
        public static string[] Split(string input, string pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout) { throw null; }
+       public static SplitEnumerator Split(ReadOnlySpan<char> input, string pattern, RegexOptions options = RegexOptions.None, TimeSpan? matchTimeout = null) { throw null; }
        void System.Runtime.Serialization.ISerializable.GetObjectData(System.Runtime.Serialization.SerializationInfo si, System.Runtime.Serialization.StreamingContext context) { }
        public override string ToString() { throw null; }
        public static string Unescape(string str) { throw null; }
+       public static bool TryUnescape(ReadOnlySpan<char> str, Span<char> destination, out int charsWritten) { throw null; }
        protected bool UseOptionC() { throw null; }
        protected bool UseOptionR() { throw null; }
        protected internal static void ValidateMatchTimeout(System.TimeSpan matchTimeout) { }
    }
}

@ViktorHofer ViktorHofer self-assigned this Jun 17, 2018
ViktorHofer pushed a commit that referenced this pull request Sep 7, 2018
* Fix ServiceController name population perf

* Split tests

* Remove dead field

* Remove new use of DangerousGetHandle

* SafeHandle all the things!

* VSB #1

* VSB #2

* Fix GLE

* Initialize machineName in ctor

* Test for empty name ex

* Null names

* Inadvertent edit

* Unix build

* Move interop into class

* Reverse SafeHandle for HAllocGlobal

* Fix tests

* Disable test for NETFX

* CR feedback

* Pattern matching on VSB

* Direct call

* typo
ViktorHofer added a commit that referenced this pull request Sep 19, 2018
ViktorHofer pushed a commit that referenced this pull request Sep 26, 2018
* Add TryAdd and Clear regression tests

* Add Run Condition on Clear()

* Address PR Feedback

* Address PR Feedback #2

*  Address PR Feedback #3

* Remove Extra Line

* Add MoveNext Result Asserts
bartonjs and others added 24 commits September 26, 2018 15:35
…bcurl

Rather than check a generic 1.0/1.1, test for the specific library version that
the crypto shim has loaded.  This makes things work when both libcurl
and the crypto shim are using OpenSSL 1.1 and also prevents a state where two
different copies of the library (at different patch versions) are utilized.
Since a 0-length buffer is technically possible (though not very usable), have sizeOfEachElement==0 -> ByteLength == 0.

Signed-off-by: dotnet-bot <[email protected]>
…side of lock (dotnet#32495)

Minimize the work done inside the lock in order to reduce contention.  This means we need to reacquire the lock if the connection isn't usable, but in the fast path case where the connection is usable, we remove the syscalls from being performed while holding the lock.
…nfoTests.ImplementedInterfaces test method signature (dotnet#32509)
…#31590)

* Add back the missing code that handle source=="vals[i]" code.

* Move tests to XmlSerializerTests.cs so it will run in sgen test project.

* Fix name
The thread statics and delegate marshalling are now being enabled for
collectible classes, so disable the corresponding tests.
Some of the work being done inside the lock can be done before or after it instead.
The GetFunctionPointerForDelegateTests had one subtest that was testing
that an attempt to use delegate marshalling in collectible types will
result in an exception as it was not supported. But the support is
coming in coreclr, so this test needs to be deleted.
…fsTestILC to preview1-03227-02, preview1-26927-03, preview1-26927-03, preview1-26927-01, beta-26927-00, beta-26927-00, respectively (master) (dotnet#32499)

* Update BuildTools, CoreClr, CoreFx, CoreSetup, ProjectNTfs, ProjectNTfsTestILC to preview1-03227-02, preview1-26927-03, preview1-26927-03, preview1-26927-01, beta-26927-00, beta-26927-00, respectively

* Disable failing test
rework connection pool cancellation to avoid deadlock
… preview1-26928-01, preview1-26928-01, preview1-26928-01, beta-26928-00, beta-26928-00, respectively (dotnet#32525)
ericstj and others added 25 commits October 18, 2018 20:44
Switch from BuildTools packaging to Arcade packaging
* fix SafeLsaMemoryHandle.InvalidHandle

* address PR feedback

* update tests

* address PR feedback

* update test

* sort csproj

* update test

* update test

* address PR feedback

* address PR feedback
…ctNTfsTestILC to preview1-27019-051, master-20181019-0048, preview1-27019-02, beta-27019-00, beta-27019-00, respectively (dotnet#32912)
* Fix a few XML comments

* Rename internal Dequeue<T> to Deque<T>

Just to better conform to the typical spelling.

* Fix includes ordering in Channels project file

* Fix continuation scheduling with non-default TaskScheduler and default SyncContext

When {Unsafe}OnCompleted is called on an AsyncOperation in Channels, without ConfigureAwait(false), we look to see if there's a current non-default SynchronizationContext, and if there isn't, then a current non-default TaskScheduler.  That's stored for later when the operation is completed.  However, if as OnCompleted is being called the operation is completed, then rather than using this captured context or scheduler later, the continuation is immediately queued to the appropriate context.  When that happens, if the current SynchronizationContext was non-null but still the default (an instance of the base class), we're incorrectly ignoring whatever TaskScheduler may have been present, and end up queueing to the thread pool rather than to that scheduler.

The fix is just to ensure that, in that case, we don't pay attention to the default SynchronizationContext.  This does so and adds a test.

* Add a test with both non-default SyncContext and TaskScheduler set

Validate that, when there's both a non-default SyncContext and a non-default TaskScheduler, we prefer the SyncContext.
* Disable socket inheritance on Windows and macOS

We already disabled it on Linux.  Doing the same for Windows and macOS.

* Add test for socket non-inheritance

* Disable new test on netfx
…1, master-20181019-1034, preview1-27019-05, respectively (dotnet#32922)
…Invoke (dotnet#32900)

* Add OpenSslVersionNumber

* Fix native compilation

* Seed some API doc comments

* Add a test

* Move (test) PlatformDetection's OpenSslVersion into PlatformDetection.
* Removing RTL_OSVERSIONINFOEX

* Try to substitude OpenProcessToken

* Removing CloseHandle

* Removing OpenProcessToken and related

* Removing RtlGetVersion and moving methods

* Removing xunit. Adding const

* RtlGetVersion reverted back

* Renaming
…1, master-20181021-0044, preview1-27021-01, respectively (dotnet#32936)
* Renamed SafeCloseSocket to SafeSocketHandle.

* Exposed Socket.SafeHandle and SafeSocketHandle.

* Use Socket.SafeHandle instead of reflection in SafePipeHandle.

* Added tests for Socket.SafeHandle.
* Renamed SafeCloseSocket to SafeSocketHandle.

* Exposed Socket.SafeHandle and SafeSocketHandle.

* Use Socket.SafeHandle instead of reflection in SafePipeHandle.

* Added tests for Socket.SafeHandle.
* Renamed SafeCloseSocket to SafeSocketHandle.

* Exposed Socket.SafeHandle and SafeSocketHandle.

* Use Socket.SafeHandle instead of reflection in SafePipeHandle.

* Added tests for Socket.SafeHandle.
* Renamed SafeCloseSocket to SafeSocketHandle.

* Exposed Socket.SafeHandle and SafeSocketHandle.

* Use Socket.SafeHandle instead of reflection in SafePipeHandle.

* Added tests for Socket.SafeHandle.
Remove SharedReference helper and use WeakRefrence<T> directly instead.
Add comments for code paths that were unclear.
Break inheritance between RegexInterpreter and RegexRunner.
ViktorHofer pushed a commit that referenced this pull request Aug 9, 2019
* Json prototype (#1)

Basic API for Json writable DOM with scenarios including collection initialization, accessing and modifying Json nodes.

* Json prototype - transformation API  (#2)

* transformation API added
* assertions to existing scenarios added

* Json prototype (#1)

Basic API for Json writable DOM with scenarios including collection initialization, accessing and modifying Json nodes.

* Json prototype - transformation API  (#2)

* transformation API added
* assertions to existing scenarios added

* JsonNumber implementation and tests (#3)

* IEquatables added
* JsonNumber implementation and unit tests added
* pragmas deleted
* review comments included, more tests added
* decimal support added to JsonNumber
* decimal support added to JsonArray and JsonObject

* all unimplemented classes and methods with accompanying tests removed

* First part of documentation added

* documentation completed

* missing exceptions added

* JsonElement changes removed

* part of the review comments included

* work on review comments

* code refactor

* more decimal tests added using MemberData

* more decimal tests added using MemberData

* more test cases added

* equals summary adjusted, equals tests added

* more Equals tests added, GetHashCode tests added, minor changes

* scientifing notation support added, rational numbers tests fixes

* rational overflow tests added

* ulong maxvalue tests added to rational types

* presision problems fixes

* exception strings fixed

* CI failing fixes (hopefully), review comments included

* missing == tests added to achieve 100% branch coverage

* review comments included

* trailing whitespaces fixes

* equals comments added

* equals object refactored to call quals json number

* merge fix
ViktorHofer pushed a commit that referenced this pull request Sep 24, 2019
This is attempt #2. The first attempt was commit
176da26.

Initialize HostArch to the arch-style used in RIDs directly by
converting things to lowercase.

Use the HostArch for the tool runtime instead of assuming x64.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.