There are exactly 4 types of flaky tests in Windows x86 right now:

1. review_input_isolated_from_parent_history
   => Times out waiting for closing events
2. review_does_not_emit_agent_message_on_structured_output
   => Times out waiting for closing events
3. auto_compact_runs_after_token_limit_hit
   => Times out waiting for closing events
4. auto_compact_runs_after_token_limit_hit
   => Also has a problem where auto compact should add a third request, but receives 4 requests.

1, 2, and 3 seem to be solved with increasing threads on the Windows runner from 2 -> 4.
Don't know yet why #4 is happening, but probably also because of WireMock issues on Windows causing races.