Skip to content

copy-tracking: refactor async stream for classifying diff entries#9087

Open
ilyagr wants to merge 2 commits intomainfrom
ig/refactor-copy-diff-stream
Open

copy-tracking: refactor async stream for classifying diff entries#9087
ilyagr wants to merge 2 commits intomainfrom
ig/refactor-copy-diff-stream

Conversation

@ilyagr
Copy link
Contributor

@ilyagr ilyagr commented Mar 11, 2026

Cc @steadmon. Also, @davidbarsky, could you take a look if you have a moment; I'm hoping this might be interesting to you, and it'd be nice if you double-checked my claims about async .buffered.

I recommend reviewing this with a diff that ignores whitespace changes.

This refactors a part of #7537. See commit description for more details.

Checklist

If applicable:

  • n/a I have updated CHANGELOG.md
  • n/a I have updated the documentation (README.md, docs/, demos/)
  • n/a I have updated the config schema (cli/src/config-schema.json)
  • n/a I have added/updated tests to cover my changes
  • I fully understand the code that I am submitting (what it does,
    how it works, how it's organized), including any code drafted by an LLM.
  • For any prose generated by an LLM, I have proof-read and copy-edited with
    an eye towards deleting anything that is irrelevant, clarifying anything
    that is confusing, and adding details that are relevant. This includes,
    for example, commit descriptions, PR descriptions, and code comments.

@ilyagr ilyagr force-pushed the ig/refactor-copy-diff-stream branch 2 times, most recently from 996fec0 to 15080ef Compare March 11, 2026 23:21
@ilyagr ilyagr marked this pull request as ready for review March 11, 2026 23:21
@ilyagr ilyagr requested a review from a team as a code owner March 11, 2026 23:21
@ilyagr ilyagr force-pushed the ig/refactor-copy-diff-stream branch 2 times, most recently from 15639e4 to 5a41c5f Compare March 12, 2026 00:21
@ilyagr ilyagr mentioned this pull request Mar 12, 2026
6 tasks
@ilyagr ilyagr force-pushed the ig/refactor-copy-diff-stream branch from 5a41c5f to a3a72f3 Compare March 18, 2026 04:11
ilyagr added 2 commits March 17, 2026 21:12
This refactors a part of #7537.

The logic should remain exactly the same, except that the previous use
of FuturesOrdered is replaced with an async stream being evaluated with
`.buffered()`. If we used `.buffered(usize::MAX)`, this would have
been completely equivalent to the previous FuturesOrdered approach IIUC
(`.buffered` uses FuturesOrdered under the hood). My understanding is
that a limit of ~1024 is safer to prevent runaway memory use for
extremely large diffs.

I plan some follow-up bugfixes that should be a lot easier to follow
after this refactor.

In particular, with those bugfixes, there will be cases where
`classify_diff_entry` returns 0 entries. I considered using a SmallVec,
but AI thinks that this would expand our futures (which IIUC are kept
on the heap) for little benefit.
This could be squashed into parent, but the parent commit's diff seems
easier to review without this change.
@ilyagr ilyagr force-pushed the ig/refactor-copy-diff-stream branch from a3a72f3 to 3c078a0 Compare March 18, 2026 04:13
Copy link
Contributor

@PhilipMetzger PhilipMetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment from me

Comment on lines +369 to +371
/// Could be adjusted if we find the value too low or that this doesn't bound
/// memory use enough (seems unlikely to be over a megabyte with a rough
/// estimate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this needs to clarify that only certain backends will get bitten by this, e.g ersc and Google atm otherwise this looks a bit misleading.

pending: FuturesOrdered::new(),
}
}
concurrency_buffer_size: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This could be an Option to represent the intention here with RECINNEBDED_CONCURRENCY_BUFFER_SIZE where its an unwrap_or(...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants