Skip to content

Conversation

@emasab
Copy link
Contributor

@emasab emasab commented Jan 26, 2023

Reopening with a different branch as CI only runs on feature branches

Remaining tasks:

  • Write changelog to outline the new behaviour with epochs,
    and what it fixes.
    Also mention the new ERR__LOG_TRUNCATION error code that will
    be raised if log truncation is detected on the broker due to
    unclean leader election.

  • Write test cases using the mock broker. This also requires
    the new OffsetForLeaderEpochRequest to be added to the mock handlers.

  • Run the AK system tests truncation test suite to verify that the
    verifiable consumer (backed by librdkafka, preferably thru the python client)
    handles log truncation properly.
    Talk with Jing about running on Jenkins, and/or try to get it working
    with the local docker/ducker-ak based system testing.
    In the latter case you will need to find a way to get the verifiable clients
    on the docker images that are tested.

  • For high-level language bindings the following changes are needed:

  • Add the ERR__LOG_TRUNCATION error type.
  • Make sure that any relevant uses of rd_kafka_topic_partition_list_t
    and rd_kafka_topic_partition_t now respects and retains the
    leader epoch: use get_leader_epoch() to retrieve and set_leader_epoch()
    to set. Relevant APIs are:
    • seek
    • seek_partitions
    • commit
    • committed
    • position
    • *assign()
    • message_leader_epoch()
  • For .NET: add new method alternatives that takes the new TopicPartitionOffsetEpoch type.
  • The message object now has a leader_epoch() method, for the consumer, for committing
    offsets outside of Kafka.
  • The two convenience functions rd_kafka_buf_[write|read]_topic_partition_list()
    were changed to take an array of field specifiers, this allows these functions
    to be reused for various read/write partition array jobs even
    when the field ordering differs.
    However, the calling signature is variadic, which means existing code that calls
    these functions will compile but not work - this has been fixed in this PR
    of course, but other PRs sub-sequent to this one needs to be fixed too.

@emasab emasab marked this pull request as draft February 1, 2023 15:21
@emasab emasab force-pushed the feature/kip320 branch 2 times, most recently from 6c04083 to 5afd446 Compare March 1, 2023 15:49
@emasab emasab changed the title KIP-320 implementation, WIP KIP-320 implementation Mar 1, 2023
@emasab emasab force-pushed the feature/kip320 branch 2 times, most recently from fcb8a73 to cdb5340 Compare March 3, 2023 11:28
@emasab emasab marked this pull request as ready for review March 8, 2023 13:55
emasab added 12 commits March 10, 2023 11:38
offset validation, allows to retry the validation
if needed
retry instead of refresh, same action as
fenced leader epoch.
The consumer could be fetching from
a follower that is not in sync
using a fetch state check instead.
Fix stale next fetch position update
when fetch hasn't started yet
list, alter, delete consumer group offsets tests
compare returned offsets and epochs
testing TxnOffsetCommit changes.
Mock handler for OffsetForLeaderEpoch
Mock test for seek with leader epoch
@emasab emasab requested review from milindl and pranavrth March 10, 2023 10:43
Copy link
Contributor

@milindl milindl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished a first pass review for all the main parts of KIP-320. Mostly, it looks good to me. Just a few questions in the comments, especially one with reference to KIP-392 interaction with KIP-320.

The mock and test changes aren't a part of the review (and I have not yet made minor/nit comments).

@milindl milindl self-requested a review March 20, 2023 10:41
Copy link
Contributor

@milindl milindl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving it based on some internal discussion. I'll take a look once more but it seems fine to me.

Mainly, we discussed the comment about case3: the implementation in AK client is different from the KIP-392, and the same as here. I had made that comment based on the KIP. Things should be okay.
There are still some changes like log messages (minor) that @emasab is doing but those won't really block this PR from merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants