Member

@stephentoub stephentoub commented Feb 17, 2022

If a pattern has no captures and every match of that pattern has the same fixed length, we can skip the Phase 3 computation: given the computed starting position of the match, we know exactly where it's going to end.
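As a rough illustration of the shortcut described above, here is a minimal sketch. It is not the code from this PR: `FixedLengthSketch`, `FindEnd`, `hasCaptures`, `fixedLength`, and the `scanForEnd` delegate are all hypothetical names, with `scanForEnd` standing in for the forward pass that Phase 3 would otherwise perform.

```csharp
using System;

// Hypothetical illustration only; none of these names come from the runtime's internals.
public static class FixedLengthSketch
{
    // Returns the exclusive end position of a match known to start at `start`.
    // `fixedLength` is non-null only when every match of the pattern has that exact length.
    // `scanForEnd` stands in for the forward scan that would otherwise locate the end.
    public static int FindEnd(int start, bool hasCaptures, int? fixedLength, Func<int, int> scanForEnd)
    {
        if (!hasCaptures && fixedLength is int length)
        {
            // Fast path: the end follows directly from the start, so no scan is needed.
            return start + length;
        }

        // Slow path: fall back to scanning forward from the start position.
        return scanForEnd(start);
    }
}
```

For a literal such as `Holmes`, every match is 6 characters long, so a match starting at index 10 ends at index 16 without invoking the scan. A pattern with captures, or one whose matches vary in length, still takes the scanning path.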

Also took the opportunity to add some comments.

Fixes #65383 (mostly... I think the rest is #65532)

| Method | Toolchain | Pattern | Options | Mean | Ratio |
|---|---|---|---|---:|---:|
| Count | \main\corerun.exe | (?i)Holmes | NonBacktracking | 675.67 us | 1.00 |
| Count | \pr\corerun.exe | (?i)Holmes | NonBacktracking | 645.59 us | 0.96 |
| Count | \main\corerun.exe | (?i)Sherlock | NonBacktracking | 150.84 us | 1.00 |
| Count | \pr\corerun.exe | (?i)Sherlock | NonBacktracking | 141.54 us | 0.94 |
| Count | \main\corerun.exe | (?i)Sherlock Holmes | NonBacktracking | 160.33 us | 1.00 |
| Count | \pr\corerun.exe | (?i)Sherlock Holmes | NonBacktracking | 148.33 us | 0.93 |
| Count | \main\corerun.exe | Holmes | NonBacktracking | 131.74 us | 1.00 |
| Count | \pr\corerun.exe | Holmes | NonBacktracking | 110.33 us | 0.84 |
| Count | \main\corerun.exe | Sherlock | NonBacktracking | 63.59 us | 1.00 |
| Count | \pr\corerun.exe | Sherlock | NonBacktracking | 55.97 us | 0.88 |
| Count | \main\corerun.exe | Sherlock Holmes | NonBacktracking | 76.50 us | 1.00 |
| Count | \pr\corerun.exe | Sherlock Holmes | NonBacktracking | 65.42 us | 0.86 |


ghost commented Feb 17, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: -
Labels: area-System.Text.RegularExpressions

Milestone: 7.0.0

Comment on lines 515 to 522
  if (startat == input.Length)
  {
      // Covers the special-case of an empty match at the end of the input.
      uint prevKind = GetCharKind(input, startat - 1);
      uint nextKind = GetCharKind(input, startat);
-
-     bool emptyMatchExists = _pattern.IsNullableFor(CharKind.Context(prevKind, nextKind));
-     return emptyMatchExists ?
+     return _pattern.IsNullableFor(CharKind.Context(prevKind, nextKind)) ?
          new SymbolicMatch(startat, 0) :
          SymbolicMatch.NoMatch;
  }
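The special case in this snippet (a match attempt that begins exactly at the end of the input) can be observed through the public `Regex` API. The sketch below is only an illustration: `EmptyMatchAtEndSketch` and `MatchAtEnd` are hypothetical names, and it uses the default engine so that it runs on older runtimes. The NonBacktracking engine is expected to give the same answers for these patterns, but that is an assumption here rather than something this snippet checks.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the PR.
public static class EmptyMatchAtEndSketch
{
    // Attempts a match starting at the very end of `input` and reports
    // whether it succeeded, plus the match index and length.
    public static (bool Success, int Index, int Length) MatchAtEnd(string pattern, string input)
    {
        Match m = new Regex(pattern).Match(input, input.Length);
        return (m.Success, m.Index, m.Length);
    }
}
```

With a nullable pattern such as `a*`, starting at the end of `"bbb"` yields an empty match at index 3. With `a+`, which cannot match the empty string, the attempt fails.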
Contributor

@olsaarik olsaarik Feb 18, 2022

Looking at the new comment I realized this optimization should additionally handle the case where there are capture groups and do the same ApplyEffects thing FindEndPositionCapturing is doing. The difference will be visible for some patterns with nullable capture groups that have anchors in them.

Edit: Oh actually any nullable patterns with nullable capture groups.

Member Author

Thanks, @olsaarik. That predates this PR, yes?

Contributor

Yes, shouldn't block this PR, just something I noticed.

Contributor

kunalspathak commented Feb 22, 2022

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022

Development

Successfully merging this pull request may close these issues.

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock