Specialize `Iterator::eq{_by}` for `TrustedLen` iterators #137122
Ok, sorry for the noise, but I got confused and then re-confused and then finally (hopefully) un-confused. This optimization is ok on
Title changed from "`Iterator::{eq|cmp|partial_cmp}_by` for `TrustedLen` iterators" to "`Iterator::eq{_by}` for `TrustedLen` iterators"
@bors try @rust-timer queue
… r=<try>

Specialize `Iterator::eq{_by}` for `TrustedLen` iterators

I'm sure I got some stuff wrong here, but opening this to get feedback and make sure it's a viable idea at all.

### Motivation

I had a piece of code that open-coded `Iterator::eq`, something like:

```rust
if current.len() != other.len() || current.iter().zip(other.iter()).any(|(a, b)| a != b) {
    // ...
}
```

... where both `current` and `other` are slices of the same type.

Changing the code to use `current.iter().eq(other)` made it a lot slower, since it wasn't checking the length of the two slices beforehand anymore, which in this instance made a big difference in perf. So I thought I'd see if I can improve `Iterator::eq`.

### Questions

1. I can't specialize for `ExactSizeIterator`. I think it's a limitation of `min_specialization`, but I'm not sure exactly why. Is specializing for `TrustedLen` good enough?
2. Should I make a codegen test for this? If so, then how? (I manually checked the assembly to make sure it works as expected.)
3. Where should I put `SpecIterCompare`?
4. Can I get a perf run for this, please? I think the compiler uses this in a few places, so it might have an effect.
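For reference, a minimal sketch of the two forms being compared in the motivation, assuming `i32` slices. The function names `eq_open_coded` and `eq_via_iterator` are hypothetical, and `eq_open_coded` returns the negation of the condition quoted above:

```rust
/// Open-coded comparison: the length check runs first, so slices of
/// different lengths are rejected without looking at any element.
fn eq_open_coded(current: &[i32], other: &[i32]) -> bool {
    current.len() == other.len() && current.iter().zip(other.iter()).all(|(a, b)| a == b)
}

/// The same comparison expressed through `Iterator::eq`.
fn eq_via_iterator(current: &[i32], other: &[i32]) -> bool {
    current.iter().eq(other)
}
```

Both functions should agree on every input; the difference discussed in this PR is only how much work is done before a length mismatch is noticed.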
☀️ Try build successful - checks-actions
where
    F: FnMut(Self::Item, <B as Iterator>::Item) -> ControlFlow<T>,
{
    if let (_, Some(a)) = self.size_hint()
pondering: What if, instead of specialization, `iter_compare` just always checked the `size_hint`s? That way it can work even for things that are neither exact nor trusted.
Something with a size hint of `(2, Some(10))` can't possibly be equal to one with `(14, None)`, for example.
And for ESIs, which almost always return `let len = self.len(); (len, Some(len))`, the compiler can probably optimize the two checks into one. And for the default `(0, None)` hint the compiler will optimize away the check because obviously the other one can't be shorter than zero.
Thing is - I'm not sure it would be acceptable if `Iterator::eq` started returning wrong results due to "wrong" size hints. It won't be UB, but I don't think any other iterator combinators can return wrong results because of a buggy `size_hint()`. IMHO even ESI's guarantee (which is less than `TrustedLen`'s) might not be strong enough.
I think it would be equivalent to, say, `for_each` not being run for the last elements of an iterator if its upper bound hint is incorrectly smaller than the number of elements it can yield.
An incorrect `size_hint` is absolutely enough to trigger garbage-in-garbage-out. The default `size_hint` of `(0, None)` is always correct, so this would only be an issue if someone explicitly overrides it incorrectly, and that's not allowed.
Of course it's not allowed to be UB if someone implements `size_hint` wrong, but implementing `size_hint` wrong is no different from implementing `fold` wrong, for example: it's 100% allowed to make other things misbehave.
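To make the garbage-in-garbage-out scenario concrete, here is a small sketch. `Lying` and `eq_trusting_hints` are both hypothetical: the first is an iterator that deliberately reports a wrong `size_hint`, the second is a stand-in for an `eq` that trusts hints from any iterator (not what this PR implements):

```rust
/// Yields `remaining` zeros but (incorrectly) reports `claimed` as its exact length.
struct Lying {
    remaining: usize,
    claimed: usize,
}

impl Iterator for Lying {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(0)
    }

    // Deliberately wrong: ignores how many items are actually left.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.claimed, Some(self.claimed))
    }
}

/// Hypothetical hint-trusting equality: bails out when the hints cannot overlap,
/// otherwise falls back to the element-wise `Iterator::eq`.
fn eq_trusting_hints<A, B>(a: A, b: B) -> bool
where
    A: Iterator,
    B: Iterator,
    A::Item: PartialEq<B::Item>,
{
    let (a_lo, a_hi) = a.size_hint();
    let (b_lo, b_hi) = b.size_hint();
    if a_hi.is_some_and(|hi| hi < b_lo) || b_hi.is_some_and(|hi| hi < a_lo) {
        return false;
    }
    a.eq(b)
}
```

Comparing `Lying { remaining: 3, claimed: 5 }` against three zeros is equal element by element, yet `eq_trusting_hints` reports `false`. The result is wrong but memory-safe, which is the distinction being drawn in this thread.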
That's a really good point. Hadn't looked at it that way.
But I do think that as a user, I would be much more surprised if an incorrect `size_hint` implementation caused anything other than wrong estimations for `with_capacity` or something, rather than that an incorrect `fold` will just blow everything up. Maybe the word "hint" makes it sound less consequential. 🤷
Anyways, @the8472's concern about eliding side effects might be a deal breaker, so I'll wait for the libs team decision on that before trying out different approaches for implementations.
Yeah, part of the problem is that, today, the `size_hint` is really only used by `collect`, which doesn't even use the upper part of it, just the bottom part.
I often wish there was instead just a `suggested_reserve() -> usize` or something that was more obviously both 1) just for the `collect` case, and 2) explicitly documented as allowed to be garbage.
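As a simplified illustration of using only the lower bound for reservation (this is not the actual `collect` implementation in the standard library, and `collect_with_reserve` is a hypothetical name):

```rust
/// Collects into a `Vec`, pre-allocating from the lower bound of the size hint.
fn collect_with_reserve<I: Iterator>(iter: I) -> Vec<I::Item> {
    // Only the lower bound is used; the upper bound is ignored here.
    let (lower, _upper) = iter.size_hint();
    let mut out = Vec::with_capacity(lower);
    // A too-small hint only costs extra reallocations, never correctness.
    out.extend(iter);
    out
}
```

In this usage a wrong hint affects performance only, which is the "allowed to be garbage" behaviour the comment wishes were documented explicitly.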
What about: iff the `size_hint()`s are exact (the lower and the upper bounds are the same), compare them?
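A sketch of what this suggestion could look like for arbitrary iterators; `exact_len` and `eq_with_exact_hint_check` are hypothetical helpers, not part of the PR:

```rust
/// Returns `Some(len)` when a size hint pins the length down exactly.
fn exact_len(hint: (usize, Option<usize>)) -> Option<usize> {
    match hint {
        (lo, Some(hi)) if lo == hi => Some(lo),
        _ => None,
    }
}

/// Compares lengths up front only when both hints are exact,
/// then falls back to the element-wise `Iterator::eq`.
fn eq_with_exact_hint_check<A, B>(a: A, b: B) -> bool
where
    A: Iterator,
    B: Iterator,
    A::Item: PartialEq<B::Item>,
{
    if let (Some(x), Some(y)) = (exact_len(a.size_hint()), exact_len(b.size_hint())) {
        if x != y {
            return false;
        }
    }
    a.eq(b)
}
```

Iterators with inexact hints, such as a `filter` adaptor, skip the early exit and are compared element by element as before.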
Not sure I understand the rationale for doing that?
`TrustedLen` is an unsafe trait where the implementor must guarantee that the lower bound is equal to the upper bound (unless the upper bound is `None`), so your suggestion would be defensively protecting against cases where the implementation is wrong.
I think that the fact that the trait is unsafe means we should prefer shaving off even a few instructions over safe-guarding for implementations that might not uphold the contract.
I meant doing this for any iterator, not just `TrustedLen`.
As mentioned in the meeting summary, we wanted to go with the safer choice first. Doing it based on `size_hint` can be assessed separately.
Finished benchmarking commit (61b77a2): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never

### Instruction count
This benchmark run did not return any relevant results for this metric.

### Max RSS (memory usage)
Results (primary 7.7%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

### Cycles
This benchmark run did not return any relevant results for this metric.

### Binary size
Results (primary 0.1%, secondary 0.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 774.698s -> 773.335s (-0.18%)
r? @the8472
Personally I think this would be great from a performance perspective, but it has the consequence of eliding side effects of the iterators. Last time I proposed a change that would elide some side effects of iterators, it got NACK'd by the team. Nominating for discussion, since perhaps this case is different because the `eq` impl involves some non-trivial variability in the number of items that would be consumed before iteration stops.
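The side-effect concern can be illustrated with a sketch that avoids depending on any particular standard library version. `eq_elementwise`, `eq_with_len_check`, and `count_side_effects` are hypothetical helpers; the length check here is driven by exact size hints as a stand-in for the `TrustedLen` specialization:

```rust
use std::cell::Cell;

/// Element-wise comparison with no up-front length check.
fn eq_elementwise<A, B>(mut a: A, mut b: B) -> bool
where
    A: Iterator,
    B: Iterator,
    A::Item: PartialEq<B::Item>,
{
    loop {
        match (a.next(), b.next()) {
            (None, None) => return true,
            (Some(x), Some(y)) if x == y => {}
            _ => return false,
        }
    }
}

/// Same comparison, but exact and different size hints short-circuit it.
fn eq_with_len_check<A, B>(a: A, b: B) -> bool
where
    A: Iterator,
    B: Iterator,
    A::Item: PartialEq<B::Item>,
{
    let (a_lo, a_hi) = a.size_hint();
    let (b_lo, b_hi) = b.size_hint();
    if a_hi == Some(a_lo) && b_hi == Some(b_lo) && a_lo != b_lo {
        return false;
    }
    eq_elementwise(a, b)
}

/// Counts how many times the mapping closure (the "side effect") runs
/// when comparing a 3-element iterator against a 4-element one.
fn count_side_effects(with_len_check: bool) -> usize {
    let calls = Cell::new(0usize);
    let xs = [1, 2, 3];
    let ys = [1, 2, 3, 4];
    let a = xs.iter().map(|x| {
        calls.set(calls.get() + 1);
        *x
    });
    let b = ys.iter().map(|y| *y);
    let _equal = if with_len_check { eq_with_len_check(a, b) } else { eq_elementwise(a, b) };
    calls.get()
}
```

Without the length check the closure runs once per element of the shorter iterator; with it, the closure never runs. Both variants return `false`, so only the observable side effects differ.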
Thanks, that's a good point, hadn't occurred to me. I guess one way to get around that is to add a
We discussed this during today's libs meeting, albeit with limited attendance. @rfcbot fcp merge
Team member @the8472 has proposed to merge this. The next step is review by the rest of the tagged team members: Concerns:
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me.
Thank you for the effort, @the8472!
Went through all the regressions, couldn't find a single true-positive one. So I think this is ready for a proper CR? 😁
🔔 This is now entering its final comment period, as per the review above. 🔔
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon.
Never had a PR go through an FCP before... what happens now? I'm guessing this still needs to go through the "standard" review & merge process?
    if let (_, Some(a)) = self.size_hint()
        && let (_, Some(b)) = b.size_hint()
The comparison isn't ideal. There can also be a match arm that checks if one is `Some` and the other is `None` and vice-versa. For a `TrustedLen` iterator a `None` upper bound guarantees that it's larger than `usize::MAX`, which means its length would be different than whatever is in the `Some`.
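The extra match arm described here could be sketched roughly as follows. `trusted_lengths_differ` is a hypothetical helper that takes only the upper bounds, and it assumes both hints come from `TrustedLen` iterators (the trait itself is unstable, so it is not named in the code):

```rust
/// Given the upper bounds of two size hints that are ASSUMED to come from
/// `TrustedLen` iterators, returns `true` when the lengths must differ.
fn trusted_lengths_differ(a_upper: Option<usize>, b_upper: Option<usize>) -> bool {
    match (a_upper, b_upper) {
        // Both lengths are known exactly.
        (Some(a), Some(b)) => a != b,
        // One length fits in `usize`, the other exceeds `usize::MAX`.
        (Some(_), None) | (None, Some(_)) => true,
        // Both exceed `usize::MAX`; nothing can be concluded from the hints.
        (None, None) => false,
    }
}
```

Compared with the `if let ... && let ...` chain quoted above, this also covers the mixed `Some`/`None` case instead of falling through to the element-wise comparison.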
}

trait SpecIterCompare<B: Iterator>: Iterator {
    fn spec_iter_compare<F, T>(self, b: B, f: F) -> ControlFlow<T, Ordering>
This is poorly named, it does not specialize the general `iter_compare` function which calculates `Ordering` between iterators. This is only used for `eq_by`, so `spec_iter_equals` or something like that would be more appropriate.
And we can push down the `ControlFlow` -> `bool` conversion into it. I think it's confusing... (cont.)
Moved the `ControlFlow` -> `bool` conversion into a new `iter_eq` fn so I don't have to repeat it in both the default and the specialized impls of `SpecIterEq`. Might be a bit overkill, I dunno
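A minimal sketch of the idea of keeping the `ControlFlow` -> `bool` conversion in one helper. This `iter_eq` is a hypothetical stand-in written against stable APIs; its signature and body are assumptions and do not reproduce the code in the PR:

```rust
use std::ops::ControlFlow;

/// Walks both iterators in lockstep and converts the `ControlFlow`
/// outcome into a plain `bool` in a single place.
fn iter_eq<A, B, F>(mut a: A, mut b: B, mut eq: F) -> bool
where
    A: Iterator,
    B: Iterator,
    F: FnMut(A::Item, B::Item) -> bool,
{
    let outcome = a.try_for_each(|x| match b.next() {
        Some(y) => {
            if eq(x, y) {
                ControlFlow::Continue(())
            } else {
                // Mismatching pair: stop early.
                ControlFlow::Break(())
            }
        }
        // `b` ran out before `a`: lengths differ.
        None => ControlFlow::Break(()),
    });
    // `a` finished without a mismatch; equal only if `b` is exhausted too.
    outcome.is_continue() && b.next().is_none()
}
```

With a helper of this shape, a default impl and a length-checking specialized impl could both delegate to it after any early exit, so neither needs its own conversion.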
let ord = a.cmp(&b);
if ord != Ordering::Equal {
    return ControlFlow::Continue(ord);
}
... and this could just return bool in that case.
d9f58d4 to eb7abeb
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
Addressed all your comments @the8472, thank you for patiently bearing with me 🙏
@bors r+ rollup=never Previous benchmark showed no significant change, still better to not roll it up since it's an optimization.
☀️ Test successful - checks-actions
### What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 7c275d0 (parent) -> 2f4dfc7 (this PR)

### Test differences
596 doctest diffs were found. These are ignored, as they are noisy.

### Test dashboard
Run
cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 2f4dfc753fd86c672aa4145940db075a8a149f17 --output-dir test-dashboard
And then open

### Job duration changes
How to interpret the job duration changes?
Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (2f4dfc7): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

### Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

### Max RSS (memory usage)
Results (primary 4.4%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

### Cycles
Results (primary -2.5%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

### Binary size
Results (primary -0.1%, secondary -0.0%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 472.692s -> 470.95s (-0.37%)