-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Add IndexOfAnyValues #78093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Add IndexOfAnyValues #78093
Changes from 1 commit
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
1e154c3
Add IndexOfAnyValues
MihaZupan c520667
Remove IAsciiSet special cases
MihaZupan a688c23
More tests
MihaZupan 6548ed3
Move expensive tests back to OuterLoop
MihaZupan 5b648d1
Tweak scalar paths
MihaZupan 55817fe
Improve IndexOfAnyValues<T> summary
MihaZupan 1e7684f
Add temporary MONO workaround
MihaZupan 1d79940
Simplify IndexOfAny{1, 2, 3}Values
MihaZupan fef55c8
Remove redundant AND before shift
MihaZupan File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Tweak scalar paths
- Loading branch information
commit 5b648d1185a00b52a9156bb4c904a0ecf375e30b
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little concerned about the size impact this method is going to have. Any use of IndexOfAnyValues.Create is going to root all possible implementations (IndexOfEmptyValues, IndexOfAny1Value, IndexOfAny2Values, IndexOfAny3Values, IndexOfAny4Values, IndexOfAny5Values, IndexOfAnyLatin1CharValues, IndexOfAnyCharValuesProbabilistic, and IndexOfAnyValuesInRange), regardless of which is actually used. Basically we're adding a choke point.
I don't have a better answer, unless we want to limit what this API produces, such that we don't special-case APIs that already have a direct public entrypoint (IndexOf0/1/2/3, IndexOfRange).
Let's go with it for now, but keep an eye on it. It'd also be helpful in this regard for us to either in this PR or immediately after follow it up with using these APIs everywhere they're applicable and see what kind of impact it actually has on our size benchmarks, and then what we can do about it. For example, maybe there are ways to share most of the code associated with the vectorization of the individual algorithms, such that each doesn't bring in nearly as much as it does today (Adam and I had spoken about an approach where we'd parameterize the algorithms with a generic struct that provided the setup and comparisons as methods that the driver could then call appropriately as part of loops, unrolled call sites, etc.) For use within corelib, we might also want to make some of these helpers internal and allow those uses to bypass the public Create in order to directly instantiate the needed type. If things still end up being bad, we could consider replacing the general Create with more specialized ones focused on known characteristics of the data. Worst case, we could also employ some additional linker switches to remove various code paths if we want to trade off optimal speed for size.
All that said, I still really like the simplicity of the design we currently have, and that it affords us the ability to pick the best implementation given the supplied data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds 50kB to System.Private.CoreLib.dll with R2R. It is not end of the world given how much we add to the product in each release.
Note that this only helps with IL size. It does not help with native code size. Also, the generic structs cost some startup time and memory at runtime.
We may want to look into whether the factory can be interpreted at compile time by the native aot compiler. cc @MichalStrehovsky
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 already working on that.
FWIW, in quite a few places so far I've been able to delete substantial chunks of code with this API, so it'd be pretty cool if this was actually a net size improvement in the long run.
Over a bunch of places across runtime where I've used this API locally, I've never used it for the "(IndexOf0/1/2/3, IndexOfRange)" set you called out. Likewise, for cases with 4/5 values, I've never used it if I knew that it would just defer to the existing 4/5 value implementation that could be reached via
IndexOfAny(const).Ignoring cases where the underlying hardware doesn't support these algorithms, we're really just using Ascii in the vast majority of cases from within runtime. Regex can also make similar decisions and avoid using this API for cases that wouldn't benefit.
That said, it's still nice that this API gives you an optimal implementation even if you aren't as concerned about startup/size costs and just want to use the same API for everything without requiring you to be aware of internal implementation details.
I'm curious about what this would look like.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my concern isn't the all-up size of corelib, rather the impact on a small trimmed app. But your lack of concern lowers my concern :)
👍
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be.
Right. Which begs the question of whether it's actually worth including those, since whether we use them or not, at present all that code is going to be kept with the current trimming.
Yes, my intention is that the source generator and compiler only use this for cases where there isn't currently a better direct API for it. In large part that's to help with readability, but it'll also help reduce startup overheads.
The rough strawman, which we never actually tried, was something along the lines of (very rough pseudo code)
then with a shared driver like:
and then use like:
etc. It'd end up looking similar in structure to what you currently have in this PR, actually, just at a lower level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean to interpret the
Createmethod this comment is on ahead of time depending on the callsite?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IndexOfAnyValues.Createis expected to be often used in static constructors with constant input. For example, like this: https://github.com/dotnet/runtime/pull/78666/files#diff-2599fdb4dd17bc235b019eb03aed3a26260765050d1e48419bc3e44319ecb147R31-R32