Merged
Extend RegexCharClass.Canonicalize range inversion optimization
There's a simple optimization in RegexCharClass.Canonicalize, added in .NET 5, that takes a set made up of exactly two ranges and checks whether those ranges leave out exactly one character. If they do, the set can instead be rewritten as that character negated, a normalized form that is recognized and optimized downstream. We can extend this normalization slightly to cover two ranges separated not just by a single character but by more than that as well.
stephentoub committed Nov 15, 2021
commit 17dfcb85c7f2c3a90526c9221aa6b98661b7135f
@@ -1390,23 +1390,24 @@ private void Canonicalize(bool isNonBacktracking)
             rangelist.RemoveRange(j, rangelist.Count - j);
         }

-        // If the class now represents a single negated character, but does so by including every
-        // other character, invert it to produce a normalized form recognized by IsSingletonInverse.
-        if (!isNonBacktracking && // do not produce the IsSingletonInverse transformation in NonBacktracking mode
+        // If the class now represents a single negated range, but does so by including every
+        // other character, invert it to produce a normalized form with a single range. This
+        // is valuable for subsequent optimizations in most of the engines.
+        if (!isNonBacktracking && // TODO: Why is NonBacktracking special-cased?
             !_negate &&
             _subtractor is null &&
             (_categories is null || _categories.Length == 0))
         {
             if (rangelist.Count == 2)
             {
-                // There are two ranges in the list. See if there's one missing element between them.
+                // There are two ranges in the list. See if there's one missing range between them.
+                // Such a range might be as small as a single character.
                 if (rangelist[0].First == 0 &&
-                    rangelist[0].Last == (char)(rangelist[1].First - 2) &&
-                    rangelist[1].Last == LastChar)
+                    rangelist[1].Last == LastChar &&
+                    rangelist[0].Last < rangelist[1].First - 1)
                 {
-                    char ch = (char)(rangelist[0].Last + 1);
+                    rangelist[0] = new SingleRange((char)(rangelist[0].Last + 1), (char)(rangelist[1].First - 1));
                     rangelist.RemoveAt(1);
-                    rangelist[0] = new SingleRange(ch, ch);
                     _negate = true;
                 }
             }
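
For readers who want to try the extended check outside the regex engine, here is a minimal standalone sketch of the same idea. It is not the code in RegexCharClass: the CharRange type, the RangeInversionSketch class, and the TryInvertGap method are hypothetical names introduced only for illustration, and the sketch assumes the list is already sorted and merged the way Canonicalize leaves it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the engine's internal range type (an assumption for this sketch).
public readonly struct CharRange
{
    public CharRange(char first, char last) { First = first; Last = last; }
    public char First { get; }
    public char Last { get; }
}

public static class RangeInversionSketch
{
    private const char LastChar = '\uFFFF';

    // If `ranges` is exactly [\0..a] and [b..\uFFFF] with at least one character
    // missing between a and b, rewrite it as the single missing range and set
    // `negate` to true. Returns false and leaves the inputs untouched otherwise.
    public static bool TryInvertGap(List<CharRange> ranges, ref bool negate)
    {
        if (negate || ranges.Count != 2)
        {
            return false;
        }

        CharRange low = ranges[0];
        CharRange high = ranges[1];

        // char operands are promoted to int here, so `high.First - 1` cannot wrap around.
        if (low.First != '\0' || high.Last != LastChar || low.Last >= high.First - 1)
        {
            return false;
        }

        // The gap is everything strictly between the two ranges.
        ranges[0] = new CharRange((char)(low.Last + 1), (char)(high.First - 1));
        ranges.RemoveAt(1);
        negate = true;
        return true;
    }
}
```

Under these assumptions, a list holding '\0' through '`' and '{' through '\uFFFF' is rewritten as the single range 'a' through 'z' with negate set, while a list whose first range does not start at '\0' is left as it was. The previous form of the check would have accepted only a gap of exactly one character.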