copy-tracking: refactor async stream for classifying diff entries#9087
Open
yuja reviewed Mar 12, 2026
ilyagr commented Mar 12, 2026
This refactors a part of #7537. The logic should remain exactly the same, except that the previous use of `FuturesOrdered` is replaced with an async stream evaluated with `.buffered()`. If we used `.buffered(usize::MAX)`, this would be completely equivalent to the previous `FuturesOrdered` approach, IIUC (`.buffered` uses `FuturesOrdered` under the hood). My understanding is that a limit of ~1024 is safer, since it prevents runaway memory use on extremely large diffs.

I plan some follow-up bugfixes that should be a lot easier to follow after this refactor. In particular, with those bugfixes, there will be cases where `classify_diff_entry` returns 0 entries. I considered using a SmallVec, but AI thinks that this would enlarge our futures (which, IIUC, are kept on the heap) for little benefit.
This could be squashed into parent, but the parent commit's diff seems easier to review without this change.
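
To illustrate the behavior described above (a bounded number of in-flight tasks, results delivered in input order, and a classifier that may yield zero entries), here is a minimal standard-library sketch. It is not the PR's code: it uses OS threads as a stand-in for futures, and `classify` and `buffered_in_order` are hypothetical names invented for this example.

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for `classify_diff_entry`: each input may yield
/// zero, one, or several output entries.
fn classify(n: u32) -> Vec<String> {
    match n % 3 {
        0 => vec![], // the "returns 0 entries" case
        1 => vec![format!("modified {n}")],
        _ => vec![format!("copied {n}"), format!("deleted {n}")],
    }
}

/// Runs `work` on every input with at most `limit` tasks in flight and
/// returns the flattened results in input order. This mimics the ordering
/// and bounding of a buffered stream, using threads instead of futures.
fn buffered_in_order<I>(inputs: I, limit: usize, work: fn(u32) -> Vec<String>) -> Vec<String>
where
    I: IntoIterator<Item = u32>,
{
    assert!(limit > 0, "limit must be at least 1");
    let mut pending: VecDeque<JoinHandle<Vec<String>>> = VecDeque::new();
    let mut out = Vec::new();
    for input in inputs {
        // The window is full: wait for the oldest task before starting another.
        if pending.len() == limit {
            let oldest = pending.pop_front().expect("window is non-empty");
            out.extend(oldest.join().expect("worker panicked"));
        }
        pending.push_back(thread::spawn(move || work(input)));
    }
    // Drain whatever is still in flight, oldest first.
    for handle in pending {
        out.extend(handle.join().expect("worker panicked"));
    }
    out
}
```

Calling `buffered_in_order(1..=4, 2, classify)` never has more than two tasks outstanding, and the empty result for input 3 simply contributes nothing to the output. A smaller `limit` trades throughput for a tighter bound on pending work, which is the trade-off the ~1024 limit is meant to strike.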
Contributor
PhilipMetzger
left a comment
minor comment from me
Comment on lines +369 to +371
/// Could be adjusted if we find the value too low or that this doesn't bound
/// memory use enough (seems unlikely to be over a megabyte with a rough
/// estimate)
Contributor
nit: I think this needs to clarify that only certain backends will get bitten by this, e.g. ersc and Google at the moment; otherwise this looks a bit misleading.
pending: FuturesOrdered::new(),
}
}
concurrency_buffer_size: usize,
Contributor
nit: This could be an `Option` to represent the intention here, with `RECOMMENDED_CONCURRENCY_BUFFER_SIZE` as the fallback in an `unwrap_or(...)`.
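
A minimal sketch of what this suggestion might look like, under stated assumptions: `StreamOptions` and `effective_buffer_size` are hypothetical names, the constant name is inferred from the comment above, and the value 1024 is taken from the rough limit mentioned in the commit description rather than from the actual code.

```rust
/// Assumed default; the commit description mentions a limit of roughly 1024.
const RECOMMENDED_CONCURRENCY_BUFFER_SIZE: usize = 1024;

/// Hypothetical options struct for illustration only; the real type in the
/// PR holds other fields as well.
#[derive(Default)]
struct StreamOptions {
    /// `None` means "use the recommended default".
    concurrency_buffer_size: Option<usize>,
}

impl StreamOptions {
    /// Resolves the buffer size, falling back to the recommended constant.
    fn effective_buffer_size(&self) -> usize {
        self.concurrency_buffer_size
            .unwrap_or(RECOMMENDED_CONCURRENCY_BUFFER_SIZE)
    }
}
```

With this shape, callers that do not care about the limit leave the field as `None`, and only callers with a specific need (for example, a test that wants a tiny window) pass `Some(n)`.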
Cc @steadmon. Also, @davidbarsky, could you take a look if you have a moment? I'm hoping this might be interesting to you, and it'd be nice if you double-checked my claims about async `.buffered`.

I recommend reviewing this with a diff that ignores whitespace changes.
This refactors a part of #7537. See commit description for more details.
Checklist
If applicable:
- `CHANGELOG.md`
- Documentation (`README.md`, `docs/`, `demos/`)
- Config schema (`cli/src/config-schema.json`)
- Understanding of the submitted code (how it works, how it's organized), including any code drafted by an LLM.
- Prose proof-read with an eye towards deleting anything that is irrelevant, clarifying anything that is confusing, and adding details that are relevant. This includes, for example, commit descriptions, PR descriptions, and code comments.