yotamofek
Contributor

I'm sure I got some stuff wrong here, but opening this to get feedback and make sure it's a viable idea at all.

Motivation

I had a piece of code that open-coded Iterator::eq, something like:

if current.len() != other.len()
    || current.iter().zip(other.iter()).any(|(a, b)| a != b) { ... }

... where both current and other are slices of the same type.
Changing the code to use current.iter().eq(other) made it a lot slower, since it wasn't checking the length of the two slices beforehand anymore, which in this instance made a big difference in perf. So I thought I'd see if I can improve Iterator::eq.
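
To make the comparison concrete, here is a minimal sketch of the two forms described above. The function names `differs_open_coded` and `differs_via_eq` are hypothetical and only for illustration. Both should return the same answer; they differ only in whether the length is checked up front.

```rust
/// Open-coded form: compare lengths first, and only fall back to an
/// element-by-element scan when the lengths match.
fn differs_open_coded<T: PartialEq>(current: &[T], other: &[T]) -> bool {
    current.len() != other.len()
        || current.iter().zip(other.iter()).any(|(a, b)| a != b)
}

/// Iterator form: leaves the decision to `Iterator::eq`. Whether a length
/// check happens before iterating depends on the standard library version.
fn differs_via_eq<T: PartialEq>(current: &[T], other: &[T]) -> bool {
    !current.iter().eq(other)
}
```

For slices of different lengths, the open-coded form can answer without touching any element, which is where the performance gap described above would come from.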

Questions

  1. I can't specialize for ExactSizeIterator, I think it's a limitation of min_specialization but not sure exactly why. Is specializing for TrustedLen good enough?
  2. Should I make a codegen test for this? If so, then how? (I manually checked the assembly to make sure it works as expected)
  3. Where should I put SpecIterCompare?
  4. Can I get a perf run for this, please? I think the compiler uses this in a few places, so it might have an effect.

@rustbot
Collaborator

rustbot commented Feb 16, 2025

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 16, 2025
@yotamofek yotamofek closed this Feb 16, 2025
@yotamofek
Contributor Author

Ok, sorry for the noise, but I got confused and then re-confused and then finally (hopefully) un-confused. This optimization is ok on Iterator::eq{_by}, but not on cmp and friends.
Fixed.

@yotamofek yotamofek reopened this Feb 16, 2025
@yotamofek yotamofek changed the title Specialize Iterator::{eq|cmp|partial_cmp}_by for TrustedLen iterators Specialize Iterator::eq{_by} for TrustedLen iterators Feb 16, 2025
@lqd
Member

lqd commented Feb 17, 2025

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 17, 2025
@bors
Collaborator

bors commented Feb 17, 2025

⌛ Trying commit d9f58d4 with merge 61b77a2...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 17, 2025
… r=<try>

Specialize `Iterator::eq{_by}` for `TrustedLen` iterators

@bors
Collaborator

bors commented Feb 17, 2025

☀️ Try build successful - checks-actions
Build commit: 61b77a2 (61b77a2c3cd2d5520be1f24e0ebf0fc672982a72)

where
    F: FnMut(Self::Item, <B as Iterator>::Item) -> ControlFlow<T>,
{
    if let (_, Some(a)) = self.size_hint()
Member

pondering: What if, instead of specialization, iter_compare just always checked the size_hints? That way it can work even for things that are neither exact nor trusted.

Something with a size hint of (2, Some(10)) can't possibly be equal to one with (14, None), for example.

And for ESIs, which almost always return let len = self.len(); (len, Some(len)), the compiler can probably optimize the two checks into one. And for the default (0, None) hint the compiler will optimize away the check because obviously the other one can't be shorter than zero.
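
A minimal sketch of the check being pondered here, assuming both hints are correct. The helper name `hints_rule_out_equality` is hypothetical and not part of the PR; it only answers whether the two length ranges can overlap at all.

```rust
/// Returns `true` when the two size hints describe length ranges that
/// cannot overlap, so the iterators cannot yield equally many items.
/// Hypothetical helper; assumes both hints are correct.
fn hints_rule_out_equality(
    a: (usize, Option<usize>),
    b: (usize, Option<usize>),
) -> bool {
    let (a_lo, a_hi) = a;
    let (b_lo, b_hi) = b;
    // An upper bound of `None` means "unknown", so it never rules anything out.
    let a_below_b = matches!(a_hi, Some(hi) if hi < b_lo);
    let b_below_a = matches!(b_hi, Some(hi) if hi < a_lo);
    a_below_b || b_below_a
}
```

With the default `(0, None)` hint on either side, both comparisons are trivially false, which matches the expectation above that the check can be optimized away.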

Contributor Author

Thing is - I'm not sure it would be acceptable if Iterator::eq started returning wrong results due to "wrong" size hints. It won't be UB, but I don't think any other iterator combinators can return wrong results because of a buggy size_hint(). IMHO even ESI's guarantee (which is less than TrustedLen's) might not be strong enough.

Contributor Author

I think it would be equivalent to, say, for_each not being run for the last elements of an iterator if its upper bound hint is incorrectly smaller than the number of elements it can yield.

Member

An incorrect size_hint is absolutely enough to trigger garbage-in-garbage-out. The default size_hint of (0, None) is always correct, so this would only be an issue if someone explicitly overrides it incorrectly, and that's not allowed.

Of course it's not allowed to be UB if someone implements size_hint wrong, but implementing size_hint wrong is no different from implementing fold wrong, for example: it's 100% allowed to make other things misbehave.
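
To illustrate the garbage-in-garbage-out point, here is a sketch with a deliberately wrong `size_hint`. Both `Liar` and `eq_trusting_hints` are hypothetical names invented for this example; the latter is an assumed hint-based comparison, not the code in this PR.

```rust
/// Hypothetical iterator that yields `remaining` items but always
/// claims (incorrectly) that at most one item is left.
struct Liar {
    remaining: u32,
}

impl Iterator for Liar {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(self.remaining)
    }

    // Incorrect override: the real upper bound is `remaining`.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, Some(1))
    }
}

/// Hypothetical equality that trusts size hints: it reports "not equal"
/// when the hinted length ranges are disjoint, then compares elements.
fn eq_trusting_hints<A, B>(mut a: A, mut b: B) -> bool
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
    A::Item: PartialEq,
{
    let (a_lo, a_hi) = a.size_hint();
    let (b_lo, b_hi) = b.size_hint();
    if matches!(a_hi, Some(hi) if hi < b_lo) || matches!(b_hi, Some(hi) if hi < a_lo) {
        return false;
    }
    loop {
        match (a.next(), b.next()) {
            (Some(x), Some(y)) if x == y => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}
```

Comparing `Liar { remaining: 3 }` against a `Vec` holding the same three items gives a wrong "not equal" answer under this scheme. That is a logic error rather than UB, which is the distinction drawn above.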

Contributor Author

That's a really good point. Hadn't looked at it that way.
But I do think that, as a user, I would be much more surprised by an incorrect size_hint implementation causing anything beyond wrong estimations for with_capacity or something than by an incorrect fold blowing everything up. Maybe the word "hint" makes it sound less consequential. 🤷

Anyways, @the8472 's concern about eliding side effects might be a deal breaker, so I'll wait for the libs team decision on that before trying out different approaches for implementations.

Member

Yeah, part of the problem is that, today, the size_hint is really only used by collect, which doesn't even use the upper part of it, just the bottom part.

I often wish there was instead just a suggested_reserve() -> usize or something that was more obviously both 1) just for the collect case, and 2) explicitly documented as allowed to be garbage.

Contributor

@ChayimFriedman2 ChayimFriedman2 Sep 4, 2025

What about: iff the size_hint()s are exact (the lower and the upper bounds are the same), compare them

Contributor Author

Not sure I understand the rationale for doing that?
TrustedLen is an unsafe trait where the implementor must guarantee that the lower bound is equal to the upper bound (unless the upper bound is None), so your suggestion would be defensively protecting against cases where the implementation is wrong.
I think that the fact that the trait is unsafe means we should prefer shaving off even a few instructions over safe-guarding for implementations that might not uphold the contract.

Contributor

I meant doing this for any iterator, not just TrustedLen.

Member

As mentioned in the meeting summary, we wanted to go with the safer choice first. Doing it based on size_hint can be assessed separately.

@rust-timer
Collaborator

Finished benchmarking commit (61b77a2): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 7.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 7.7% | [7.2%, 8.1%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 7.7% | [7.2%, 8.1%] | 2 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.1%, secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.2%] | 5 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [0.0%, 0.2%] | 5 |

Bootstrap: 774.698s -> 773.335s (-0.18%)
Artifact size: 362.37 MiB -> 362.39 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 17, 2025
@Amanieu
Member

Amanieu commented Feb 18, 2025

r? @the8472

@rustbot rustbot assigned the8472 and unassigned Amanieu Feb 18, 2025
@the8472
Member

the8472 commented Feb 18, 2025

Personally I think this would be great from a performance perspective, but it has the consequence of eliding side effects of the iterators. Last time I proposed a change that would elide some side effects of iterators, it got NACK'd by the team.

Nominating for discussion, since this case may be different: the eq impl already involves some non-trivial variability in the number of items that would be consumed before iteration stops.
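
A small sketch of what eliding side effects means here. All three functions (`eq_plain`, `eq_len_first`, `closure_calls`) are hypothetical stand-ins written for this illustration, not the standard library implementation; a `Cell` counter records how often a `map` closure runs under each strategy.

```rust
use std::cell::Cell;

/// Element-by-element comparison with no length shortcut.
fn eq_plain<A, B>(mut a: A, mut b: B) -> bool
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
    A::Item: PartialEq,
{
    loop {
        match (a.next(), b.next()) {
            (Some(x), Some(y)) if x == y => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}

/// Same comparison, but with an up-front length check.
fn eq_len_first<A, B>(a: A, b: B) -> bool
where
    A: ExactSizeIterator,
    B: ExactSizeIterator<Item = A::Item>,
    A::Item: PartialEq,
{
    a.len() == b.len() && eq_plain(a, b)
}

/// Compares a 3-item iterator with a 2-item one and returns the result
/// together with the number of times the left-hand closure was called.
fn closure_calls(len_first: bool) -> (bool, usize) {
    let calls = Cell::new(0);
    let left_items = [1, 2, 3];
    let right_items = [1, 2];
    let left = left_items.iter().map(|x| {
        calls.set(calls.get() + 1); // observable side effect
        *x
    });
    let right = right_items.iter().map(|x| *x);
    let equal = if len_first {
        eq_len_first(left, right)
    } else {
        eq_plain(left, right)
    };
    (equal, calls.get())
}
```

Both strategies agree that the iterators are unequal, but the length-first variant never calls the closure, while the plain variant calls it three times. Code that relies on those calls happening would observe the difference.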

@the8472 the8472 added the I-libs-nominated Nominated for discussion during a libs team meeting. label Feb 18, 2025
@yotamofek
Contributor Author

Personally I think this would be great from a performance perspective but it has the consequence of eliding side-effects of the iterators.

Thanks, that's a good point, hadn't occurred to me. I guess one way to get around that is to add a MAY_HAVE_SIDE_EFFECT associated const to TrustedLen, but that sounds like a pretty huge change just for this PR, no?
Anyways, waiting to hear what the team thinks about this 😁

@the8472
Member

the8472 commented Feb 19, 2025

We discussed this during today's libs meeting, albeit with limited attendance.
Those present didn't object to either approach (TrustedLen or size_hint), but for this PR we want to go with the TrustedLen approach, since that should be the least controversial one.
It could be expanded to size_hint in a followup PR and be discussed separately there, if there's appetite for that.

@rfcbot fcp merge

@rfcbot

rfcbot commented Feb 19, 2025

Team member @the8472 has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Feb 19, 2025
@craterbot craterbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Jun 30, 2025
@yotamofek
Contributor Author

yotamofek commented Jun 30, 2025

Thank you for the effort, @the8472 !
Yeah, I guess I was comparing the numbers to runs for other modes, not build-and-test.
The numbers now are manageable :)

@yotamofek
Contributor Author

Went through all the regressions, couldn't find a single true-positive one. So I think this is ready for a proper CR? 😁

@cuviper
Member

cuviper commented Aug 27, 2025

@rfcbot resolve side-effects
@rfcbot reviewed

@rust-rfcbot rust-rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Aug 27, 2025
@rust-rfcbot
Collaborator

🔔 This is now entering its final comment period, as per the review above. 🔔

@rust-rfcbot rust-rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Sep 6, 2025
@rust-rfcbot
Collaborator

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@yotamofek
Contributor Author

Never had a PR go through an FCP before... what happens now? I'm guessing this still needs to go through the "standard" review&merge process?

Comment on lines 4010 to 4011
if let (_, Some(a)) = self.size_hint()
    && let (_, Some(b)) = b.size_hint()
Member

The comparison isn't ideal. There can also be a match arm that checks whether one is Some and the other is None, and vice versa. For a TrustedLen iterator, a None upper bound guarantees that its length is larger than usize::MAX, which means it must differ from whatever length is in the Some.
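
The suggested match could be sketched roughly as follows. The free function `lengths_must_differ` is a hypothetical name for this illustration, and it assumes both upper bounds come from iterators that uphold the (unstable) TrustedLen contract; it is not the code that was merged.

```rust
/// Decides, from upper bounds alone, whether two iterators must have
/// different lengths. Only sound when both bounds are exact, as the
/// `TrustedLen` contract requires (assumed here, not enforced).
fn lengths_must_differ(a_upper: Option<usize>, b_upper: Option<usize>) -> bool {
    match (a_upper, b_upper) {
        // Both lengths are known exactly: compare them directly.
        (Some(a), Some(b)) => a != b,
        // One length fits in `usize`, the other exceeds `usize::MAX`.
        (Some(_), None) | (None, Some(_)) => true,
        // Both exceed `usize::MAX`: nothing can be concluded.
        (None, None) => false,
    }
}
```

Only the `(None, None)` case has to fall through to an element-by-element comparison purely for lack of information.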

}

trait SpecIterCompare<B: Iterator>: Iterator {
    fn spec_iter_compare<F, T>(self, b: B, f: F) -> ControlFlow<T, Ordering>
Member

This is poorly named: it does not specialize the general iter_compare function, which calculates the Ordering between iterators. It is only used for eq_by, so spec_iter_equals or something like that would be more appropriate.
And we can push down the ControlFlow -> bool conversion into it. I think it's confusing... (cont.)

Contributor Author

Moved the ControlFlow -> bool conversion into a new iter_eq fn so I don't have to repeat it in both the default and the specialized impls of SpecIterEq. Might be a bit overkill, I dunno.
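
For readers unfamiliar with the pattern, here is a rough sketch of a helper that folds a ControlFlow result into a bool. It borrows the name `iter_eq` from the comment above, but the signature and body are assumptions written for stable Rust and are not the code that landed.

```rust
use std::ops::ControlFlow;

/// Walks both iterators in lockstep and stops at the first mismatch.
/// Assumed shape for illustration only.
fn iter_eq<A, B, F>(mut a: A, mut b: B, mut eq: F) -> bool
where
    A: Iterator,
    B: Iterator,
    F: FnMut(A::Item, B::Item) -> bool,
{
    let flow = a.try_for_each(|x| match b.next() {
        Some(y) => {
            if eq(x, y) {
                ControlFlow::Continue(())
            } else {
                ControlFlow::Break(())
            }
        }
        // `b` ran out first, so the iterators cannot be equal.
        None => ControlFlow::Break(()),
    });
    // Equal only if `a` finished without a mismatch and `b` is exhausted too.
    matches!(flow, ControlFlow::Continue(())) && b.next().is_none()
}
```

Keeping the conversion in one place means a default and a specialized path could both return `bool` directly, which is the simplification requested in the review.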

Comment on lines 4013 to 4016
let ord = a.cmp(&b);
if ord != Ordering::Equal {
    return ControlFlow::Continue(ord);
}
Member

... and this could just return bool in that case.

@yotamofek yotamofek force-pushed the pr/std/iter-eq-exact-size branch from d9f58d4 to eb7abeb Compare September 18, 2025 19:48
@rustbot
Collaborator

rustbot commented Sep 18, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@yotamofek
Contributor Author

Addressed all your comments @the8472 , thank you for patiently bearing with me 🙏

@the8472
Member

the8472 commented Sep 18, 2025

@bors r+ rollup=never

Previous benchmark showed no significant change, still better to not roll it up since it's an optimization.

@bors
Collaborator

bors commented Sep 18, 2025

📌 Commit eb7abeb has been approved by the8472

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 18, 2025
@bors
Collaborator

bors commented Sep 18, 2025

⌛ Testing commit eb7abeb with merge 2f4dfc7...

@bors
Collaborator

bors commented Sep 19, 2025

☀️ Test successful - checks-actions
Approved by: the8472
Pushing 2f4dfc7 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 19, 2025
@bors bors merged commit 2f4dfc7 into rust-lang:master Sep 19, 2025
11 checks passed
@rustbot rustbot added this to the 1.92.0 milestone Sep 19, 2025
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 7c275d0 (parent) -> 2f4dfc7 (this PR)

Test differences

596 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 2f4dfc753fd86c672aa4145940db075a8a149f17 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-apple: 7508.1s -> 9204.9s (22.6%)
  2. dist-x86_64-apple: 7952.3s -> 6881.1s (-13.5%)
  3. dist-powerpc64le-linux-musl: 6015.5s -> 5349.4s (-11.1%)
  4. dist-aarch64-windows-gnullvm: 4801.8s -> 4420.7s (-7.9%)
  5. dist-apple-various: 4093.5s -> 3781.8s (-7.6%)
  6. x86_64-gnu-llvm-20: 2495.3s -> 2347.2s (-5.9%)
  7. dist-loongarch64-linux: 5298.5s -> 5000.7s (-5.6%)
  8. dist-x86_64-netbsd: 4750.9s -> 4971.3s (4.6%)
  9. dist-android: 1519.2s -> 1588.6s (4.6%)
  10. i686-msvc-1: 9981.5s -> 10416.0s (4.4%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (2f4dfc7): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.2% | [0.2%, 0.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.6% | [-0.6%, -0.6%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (primary 4.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.4% | [4.4%, 4.4%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 4.4% | [4.4%, 4.4%] | 1 |

Cycles

Results (primary -2.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.5% | [-2.5%, -2.5%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.5% | [-2.5%, -2.5%] | 1 |

Binary size

Results (primary -0.1%, secondary -0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 1 |
| All ❌✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |

Bootstrap: 472.692s -> 470.95s (-0.37%)
Artifact size: 389.97 MiB -> 389.99 MiB (0.00%)

Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue. to-announce Announce this issue on triage meeting