Contributor

Hongten commented Feb 2, 2026

This PR resolves the timeout exception that occurs when triggering partition reassignment with the --bootstrap-server option. More details can be found at https://issues.apache.org/jira/browse/KAFKA-13392.

Root cause
When we run a reassignment using a plan file (e.g. xxx.json), the plan may still include replicas on a broker that is down. During execution, we try to apply throttling by calling adminClient.incrementalAlterConfigs(configs). The issue is that this API needs to connect to the target broker to set the broker-level throttle configs.
If the broker is down, it is unreachable, so the client keeps retrying and eventually times out with a TimeoutException.

My proposed solution
Add a new parameter: --broker-list-without-throttle
Description: Optional. Comma-separated broker ID list (e.g. 1,2) that should be excluded from broker-level throttle config updates during partition reassignment execution. When --execute and --throttle are used, it normally applies throttle configs on all brokers involved in the reassignment. If any of those brokers are known to be down or unreachable, adding them to --broker-list-without-throttle makes it skip the throttle-setting step for those brokers, avoiding retries/timeouts, while still throttling the remaining reachable brokers.
Value: a list of broker IDs, comma-separated
Example: 1001 or 1001,1002

  --bootstrap-server xxx.xxx.xxx.xxx:9092 \
  --reassignment-json-file reassignment-test.json \
  --throttle 209715200 \
  --execute \
  --broker-list-without-throttle 1001

If broker 1001 is known to be down, and the reassignment plan includes it, then we exclude 1001 from throttle config changes.
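
The exclusion described above can be sketched in plain Java. This is a minimal illustration, not the code from this PR: the class name ThrottleExclusionSketch and both method names are hypothetical, and the sketch assumes the option value is a comma-separated list of integer broker IDs, as in the example above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not the implementation in this PR.
final class ThrottleExclusionSketch {

    // Parses a value such as "1001" or "1001,1002" into a sorted set of broker IDs.
    // Throws NumberFormatException if an entry is not an integer.
    static Set<Integer> parseBrokerList(String value) {
        if (value == null || value.trim().isEmpty()) {
            return new TreeSet<>();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Integer::parseInt)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Returns the brokers that should still receive broker-level throttle configs:
    // every broker involved in the reassignment minus the excluded ones.
    static Set<Integer> brokersToThrottle(Set<Integer> reassigningBrokers, Set<Integer> excluded) {
        Set<Integer> result = new TreeSet<>(reassigningBrokers);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        // Assumed example: the plan touches brokers 1001, 1002 and 1003; 1001 is down.
        Set<Integer> involved = new TreeSet<>(Arrays.asList(1001, 1002, 1003));
        Set<Integer> excluded = parseBrokerList("1001");
        System.out.println(brokersToThrottle(involved, excluded)); // prints [1002, 1003]
    }
}
```

Only the brokers in the returned set would then be passed to the throttle config update, so no request is sent to the broker that is known to be down.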

Why this is needed

  • If we don’t use '--throttle' at all, Kafka won’t set a throttle on any broker (including the down one). But that’s risky: migrations can easily saturate network bandwidth or disk I/O.
  • If we only skip throttling for the known down broker, the reassignment logic itself doesn’t change, and the timeout is avoided. Meanwhile, healthy brokers still get throttled properly.

@github-actions github-actions bot added triage PRs from the community tools labels Feb 2, 2026
Contributor Author

Hongten commented Feb 2, 2026

cc @mimaison @chia7712 @showuon This PR is ready for review. Please take a look when you get a moment. Thanks :)

Comment on lines +90 to +97
brokerListWithoutThrottleOpt = parser.accepts("broker-list-without-throttle", "Optional. Comma-separated broker ID list (e.g. 1,2) that " +
"should be excluded from broker-level throttle config updates during partition reassignment execution. " +
"When --execute and --throttle are used, it normally applies throttle configs on all brokers involved in the reassignment. " +
"If any of those brokers are known to be down or unreachable, adding them to --broker-list-without-throttle makes it " +
"skip the throttle-setting step for those brokers, avoiding retries/timeouts, while still throttling the remaining reachable brokers.")
.withRequiredArg()
.describedAs("broker list without throttle")
.ofType(String.class);
Member

This adds a new parameter to a command-line tool (public API), so it needs a KIP to be discussed and approved by the community. You can find the process here and take it from there: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Process
Thanks for looking into this!
