[SPARK-25005][SS]Support non-consecutive offsets for Kafka #22042
Conversation
    offsetRanges.zipWithIndex.map { case (o, i) => new KafkaSourceRDDPartition(i, o) }.toArray
    }

    override def count(): Long = offsetRanges.map(_.size).sum
The assumption in these methods is no longer right, so remove them.
Goooood catch. That would have never occurred to me!
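For context, a hedged illustration of why the size-based assumption breaks once offsets can have gaps (the numbers below are made up, not from the PR):

```scala
// Hypothetical numbers: a range of 5 offsets where 2 of them are transaction markers.
object CountAssumptionExample {
  def main(args: Array[String]): Unit = {
    val fromOffset = 10L
    val untilOffset = 15L
    val assumedCount = untilOffset - fromOffset // 5: what a size-based count() would report
    val actualRecords = 3                       // what poll() actually returns after skipping markers
    assert(assumedCount != actualRecords)       // the old count()/isEmpty shortcut over-counts
  }
}
```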
cc @tdas

Test build #94441 has finished for PR 22042 at commit

Test build #94446 has finished for PR 22042 at commit
tdas left a comment
Review Round 1: I reviewed the non-test code, and I think it deserves a bit of refactoring. My most important point: why create a new exception type when you can handle that condition in the fetchData method itself?
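A minimal sketch of what that suggestion could look like (all names and the in-memory "log" are hypothetical, not the PR's actual API):

```scala
// Hypothetical: offsets 0, 1 and 4 hold data; 2 and 3 are transaction markers.
object FetchSketch {
  private val log = Map(0L -> "a", 1L -> "b", 4L -> "c")

  // fetchData skips a gap internally instead of throwing a dedicated exception
  // and making the caller retry.
  def fetchData(offset: Long, untilOffset: Long): Option[String] = {
    var current = offset
    while (current < untilOffset) {
      log.get(current) match {
        case some @ Some(_) => return some  // found a committed data record
        case None           => current += 1 // gap: advance in place
      }
    }
    None                                    // nothing available in the requested range
  }

  def main(args: Array[String]): Unit = {
    assert(fetchData(2L, 5L).contains("c")) // skips the markers at offsets 2 and 3
  }
}
```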
    ju.Collections.emptyIterator[ConsumerRecord[Array[Byte], Array[Byte]]]
    @volatile private var nextOffsetInFetchedData = UNKNOWN_OFFSET

    @volatile private var offsetBeforePoll: Long = UNKNOWN_OFFSET
Can you add some docs to explain what these 2 vars signify and why they are needed?
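A sketch of the kind of documentation being asked for (the wording is illustrative, not the PR's; the class wrapper and UNKNOWN_OFFSET sentinel exist only so the snippet compiles on its own):

```scala
class OffsetTrackingSketch {
  private val UNKNOWN_OFFSET = -2L

  /** Consumer position recorded just before the last poll; comparing it with
   * `offsetAfterPoll` tells us whether the poll actually moved the consumer. */
  @volatile private var offsetBeforePoll: Long = UNKNOWN_OFFSET

  /** Consumer position right after the last poll, i.e. the next offset the consumer
   * will fetch. Offsets below it that the poll did not return are gaps left by
   * transaction markers or aborted records. */
  @volatile private var offsetAfterPoll: Long = UNKNOWN_OFFSET
}
```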
| * "isolation.level" is "read_committed". The offsets in the range [offset, nextOffsetToFetch) are | ||
| * missing. In order words, `nextOffsetToFetch` indicates the next offset to fetch. | ||
| */ | ||
| private[kafka010] class MissingOffsetException( |
nit: Is this meant to be used outside this KafkaDataConsumer class? If not, then maybe make it an inner class to KafkaDataConsumer.
    * missing. In order words, `nextOffsetToFetch` indicates the next offset to fetch.
    */
    private[kafka010] class MissingOffsetException(
    val offset: Long,
Maybe rename offset to something like missingOffset. It's weird to have a generically named field "offset" next to a specifically named field "nextOffsetToFetch".
    */
    private def fetchData(
    offset: Long,
    untilOffset: Long,
Update the docs of this method to say that it can throw MissingOffsetException and what that means?
    poll(pollTimeoutMs)
    } else if (!fetchedData.hasNext) {
    // The last pre-fetched data has been drained.
    if (offset < offsetAfterPoll) {
It's hard to understand this condition because it's hard to understand what offsetAfterPoll means. Does it refer to the offset that will be fetched next by the KafkaConsumer?
    }

    private def poll(pollTimeoutMs: Long): Unit = {
    offsetBeforePoll = consumer.position(topicPartition)
This variable offsetBeforePoll seems to be used only to identify whether data was actually fetched in a poll and nothing else. Rather than define another var (there are already many that are confusing), why not just return a boolean from poll which is true or false depending on whether the poll moved anything?
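A simplified sketch of that suggestion (field names mirror the class under review, but this is not the PR's implementation): poll itself reports whether the consumer position moved, so no offsetBeforePoll var is needed.

```scala
import java.time.Duration
import java.{util => ju}
import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

class PollSketch(
    consumer: KafkaConsumer[Array[Byte], Array[Byte]],
    topicPartition: TopicPartition) {

  private var fetchedData: ju.Iterator[ConsumerRecord[Array[Byte], Array[Byte]]] =
    ju.Collections.emptyIterator[ConsumerRecord[Array[Byte], Array[Byte]]]

  def poll(pollTimeoutMs: Long): Boolean = {
    val before = consumer.position(topicPartition)        // position prior to the poll
    val records = consumer.poll(Duration.ofMillis(pollTimeoutMs)).records(topicPartition)
    fetchedData = records.iterator()
    consumer.position(topicPartition) != before           // true iff the poll advanced the position
  }
}
```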
    throw new OffsetOutOfRangeException(
    Map(topicPartition -> java.lang.Long.valueOf(offset)).asJava)
    } else {
    } else if (offsetBeforePoll == offsetAfterPoll) {
Just to be clear, can this happen only if there is a timeout? And if so, why not push this condition and exception into the poll() method, thus simplifying this method?
| s"seek to $offset and poll but the offset was reset to $offsetAfterPoll") | ||
| throw new MissingOffsetException(offset, offsetAfterPoll) | ||
| } | ||
| } else { |
Let's remove this else and reduce the condition nesting. The previous if statement always ends in an exception, so we can remove this else.
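A generic illustration of that unnesting point (not the PR's actual code): when the first `if` branch always throws, the trailing `else` wrapper can be dropped and its body de-indented.

```scala
object UnnestExample {
  def check(offset: Long, earliest: Long, latest: Long): Unit = {
    if (offset < earliest || offset >= latest) {
      throw new IllegalArgumentException(s"offset $offset out of range [$earliest, $latest)")
    }
    // Previously wrapped in `else { ... }`; now sits at the top level.
    println(s"offset $offset is in range")
  }

  def main(args: Array[String]): Unit = check(5L, 0L, 10L)
}
```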
    if (offset < offsetAfterPoll) {
    // Offsets in [offset, offsetAfterPoll) are missing. We should skip them.
    resetFetchedData()
    throw new MissingOffsetException(offset, offsetAfterPoll)
So MissingOffsetException is only used to signal that some offset may be missing due to control messages and nothing else. And the higher-level function (i.e. get) just handles it by resetting the fetched offsets. Why not let this fetchData method handle the situation instead of creating a new exception just for this?
Test build #94808 has finished for PR 22042 at commit

Test build #94809 has finished for PR 22042 at commit
tdas left a comment
This first level of refactoring looks much better. But I think we can do more.
    * `read_committed`), it will be skipped and this method will try to fetch next available record
    * within [offset, untilOffset).
    *
    * This method also will try the best to detect data loss. If `failOnDataLoss` is `false`, it will
If failOnDataLoss is true then it should throw an exception... isn't it?
nit: "try the best" -> "try its best"
Good catch
    * within [offset, untilOffset).
    *
    * This method also will try the best to detect data loss. If `failOnDataLoss` is `false`, it will
    * throw an exception when we detect an unavailable offset. If `failOnDataLoss` is `true`, this
Will we throw an exception even when it's a control message and there is no real data loss?
> Will we throw an exception even when it's a control message and there is no real data loss?

No. It will be skipped and this method will try to fetch the next available record within [offset, untilOffset).
    * instead.
    */
    private case class FetchedRecord(
    record: Option[ConsumerRecord[Array[Byte], Array[Byte]]],
Can't we reuse the objects here? And do we need to have an Option, thus creating a lot of Option objects all the time?
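A minimal sketch of that reuse idea (hypothetical shape, not necessarily the PR's final class): keep one mutable FetchedRecord per consumer and overwrite it on every fetch instead of allocating a new case class plus an Option each time; a null record stands in for "no record found".

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord

class FetchedRecordSketch(
    var record: ConsumerRecord[Array[Byte], Array[Byte]],
    var nextOffsetToFetch: Long) {

  def withRecord(
      r: ConsumerRecord[Array[Byte], Array[Byte]],
      nextOffset: Long): FetchedRecordSketch = {
    record = r
    nextOffsetToFetch = nextOffset
    this                              // return the same instance, nothing new allocated
  }
}
```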
    val r = p.records(topicPartition)
    logDebug(s"Polled $groupId ${p.partitions()} ${r.size}")
    fetchedData = r.iterator
    offsetAfterPoll = consumer.position(topicPartition)
I strongly think that this should not be a var, but rather a clear return value. We have been burnt by too many mutable vars/defs (see all the flakiness caused by the structured ProgressReporter) and we should consciously try to improve this everywhere by not having vars all over the place.
    poll(offset, pollTimeoutMs)
    } else if (!fetchedData.hasNext) {
    // The last pre-fetched data has been drained.
    if (offset < offsetAfterPoll) {
This is the place preventing me from making offsetAfterPoll a local var.
Test build #95056 has finished for PR 22042 at commit

Test build #95063 has finished for PR 22042 at commit
tdas left a comment
The refactor with FetchedData looks cleaner. But this needs a bit more work, especially on the test side. Left a whole bunch of questions and comments.
    */
    private case class FetchedData(
    private var records: ju.ListIterator[ConsumerRecord[Array[Byte], Array[Byte]]],
    var nextOffsetInFetchedData: Long,
Make this a public getter with a private setter.
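An illustration of the public-getter/private-setter idiom in Scala (the field name is borrowed from the diff above; the wrapping class is a stand-in):

```scala
class FetchedDataSketch(initialOffset: Long) {
  private var _nextOffsetInFetchedData: Long = initialOffset

  // Anyone can read the value...
  def nextOffsetInFetchedData: Long = _nextOffsetInFetchedData

  // ...but only the class itself can update it.
  private def nextOffsetInFetchedData_=(v: Long): Unit = {
    _nextOffsetInFetchedData = v
  }
}
```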
    }

    if (!fetchedData.hasNext) {
    assert(offset <= fetchedData.offsetAfterPoll,
Add comments here on what this case means.
    * @throws TimeoutException if cannot fetch the record in `pollTimeoutMs` milliseconds.
    */
    private def fetchData(
    offset: Long,
Maybe rename this method to fetchRecord, to make it consistent with the return type.
    * @throws TimeoutException if the consumer position is not changed after polling. It means the
    * consumer polls nothing before timeout.
    */
    private def poll(offset: Long, pollTimeoutMs: Long): Unit = {
Maybe rename this method to be consistent with what it does ... fetch data.
    if (offset < range.earliest || offset >= range.latest) {
    throw new OffsetOutOfRangeException(
    Map(topicPartition -> java.lang.Long.valueOf(offset)).asJava)
    poll(offset, pollTimeoutMs)
comment that this method updates fetchedData
    CheckAnswer((1 to 5) ++ (11 to 13): _*), // offset: 12, 13, 14
    AdvanceManualClock(100),
    waitUntilBatchProcessed,
    CheckAnswer((1 to 5) ++ (11 to 15): _*), // offset: 15, 16, 17*
Use CheckNewAnswer instead of the cumulative CheckAnswer.
    true
    }

    val producer = testUtils.createProducer(usingTrascation = true)
You could define a testWithProducer method and wrap the finally in it.
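A sketch of what such a helper could look like (hypothetical; `createProducer` stands in for the test suite's producer factory): build the producer, run the test body, and always close the producer in a finally block so individual tests don't repeat the cleanup.

```scala
import org.apache.kafka.clients.producer.KafkaProducer

object ProducerTestHelpers {
  def testWithProducer(createProducer: () => KafkaProducer[String, String])(
      body: KafkaProducer[String, String] => Unit): Unit = {
    val producer = createProducer()
    try {
      body(producer)
    } finally {
      producer.close()   // cleanup happens even if the test body throws
    }
  }
}
```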
    // 1 from smallest, 1 from middle, 8 from biggest
    CheckAnswer(),
    WithKafkaProducer(topic, producer) { producer =>
    // Send 5 messages. They should be visible only after being committed.
Why so? This is read_uncommitted, right?
    },
    AdvanceManualClock(100),
    waitUntilBatchProcessed,
    CheckAnswer(1 to 3: _*), // offset 0, 1, 2
Why only 3 records when 1 to 5 have been sent already and we are reading uncommitted data?
> Why only 3 records when 1 to 5 have been sent already and we are reading uncommitted data?

I'm using maxOffsetsPerTrigger = 3 to cut the batches on purpose. Otherwise, it's really hard to cover all of the cases.
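A sketch of the kind of source configuration being described (values are illustrative and `spark` is an existing SparkSession; this is not the suite's code): maxOffsetsPerTrigger caps each micro-batch at 3 offsets so batch boundaries land on the interesting corner cases, and the kafka.-prefixed option passes isolation.level through to the underlying consumer.

```scala
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic-1")
  .option("maxOffsetsPerTrigger", "3")
  .option("kafka.isolation.level", "read_uncommitted")
  .load()
```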
    props
    }

    def createProducer(usingTrascation: Boolean): KafkaProducer[String, String] = {
nit: usingTrascation -> usingTransaction
Test build #95115 has finished for PR 22042 at commit
tdas left a comment
Just a few nits here and there. This looks pretty good. Thank you for doing this. It was tricky to deal with Kafka transactions.
    offsetRange.topicPartition, executorKafkaParams, reuseKafkaConsumer)

    private val rangeToRead = resolveRange(offsetRange)
unnecessary
    object WithKafkaProducer {
    def apply(
    topic: String,
    producer: KafkaProducer[String, String])(
Ping on this comment. Maybe you missed this?
    reportDataLoss(false, s"Skip missing records in [$offset, ${record.offset})")
    record
    fetchedRecord.withRecord(record, fetchedData.nextOffsetInFetchedData)
    }
nit: This can be unnested: if ... else { if ... else ... } -> if ... else if ... else
    def apply(
    topic: String,
    producer: KafkaProducer[String, String])(
    func: KafkaProducer[String, String] => Unit): AssertOnQuery = {
nit: AssertOnQuery -> StreamAction
| s"AddKafkaData(topics = $topics, data = $data, message = $message)" | ||
| } | ||
|
|
||
| object WithKafkaProducer { |
nit: This is not creating a KafkaProducer as most With*** methods do. The point of this is to force synchronization of the consumer. So maybe rename it to WithOffsetSync { ... }?
Test build #95236 has finished for PR 22042 at commit
This is the flaky test I fixed in #22230. Retest this please.
retest this please
Test build #95297 has finished for PR 22042 at commit

Test build #95319 has finished for PR 22042 at commit
LGTM.
Thanks! Merging to master.
What changes were proposed in this pull request?

As the user uses Kafka transactions to write data, the offsets in Kafka will be non-consecutive: the log will contain some transaction (commit or abort) markers. In addition, if the consumer's `isolation.level` is `read_committed`, `poll` will not return aborted messages either. Hence, we will see non-consecutive offsets in the data returned by `poll`. However, as `seekToEnd` may move the offset point to these missing offsets, there are 4 possible corner cases we need to support:

- The whole batch contains no data messages
- The first offset in a batch is not a committed data message
- The last offset in a batch is not a committed data message
- There is a gap in the middle of a batch

They are all covered by the new unit tests.

How was this patch tested?

The new unit tests.

Closes apache#22042 from zsxwing/kafka-transaction-read.
Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Shixiong Zhu <[email protected]>
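For readers unfamiliar with how these gaps arise, here is an illustrative sketch (the broker address, topic, and transactional.id are placeholders): a transactional producer whose commitTransaction() writes a control marker into the partition log, so the next data record's offset is not consecutive with the last committed record.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object TransactionalGapExample {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("transactional.id", "example-txn")

    val producer = new KafkaProducer[String, String](props)
    producer.initTransactions()
    try {
      producer.beginTransaction()
      (1 to 5).foreach { i =>
        producer.send(new ProducerRecord[String, String]("topic-1", i.toString))
      }
      producer.commitTransaction()  // the commit marker occupies one offset, leaving a gap
    } finally {
      producer.close()
    }
  }
}
```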