[SPARK-25497][SQL] Limit operation within whole stage codegen should not consume all the inputs #22630
Conversation
The change here simply moves the inner loop after the `batchEnd` and metrics update, so that we get correct metrics when we stop earlier because of a limit.
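Illustratively, the reordered range loop can be sketched as follows. This is a hand-written approximation with hypothetical variable names, not the exact generated code; it only shows the shape of the control flow, where batch advancement and the per-batch metric update happen before the inner loop:

```java
// Sketch of the reordered range loop: the batch boundary and the metric
// update happen before the inner loop, so stopping early on a limit still
// leaves the per-batch metrics consistent.
public class RangeLoopSketch {
    // Returns {rowsConsumed, numOutputMetric} for a hypothetical
    // range(0, partitionEnd) under a limit, processed in batches.
    static long[] run(long partitionEnd, int batchSize, long limit) {
        long nextIndex = 0, batchEnd = 0, numOutput = 0;
        boolean limitReached = false;
        while (!limitReached) {
            if (nextIndex == batchEnd) {
                long nextBatch = Math.min(batchEnd + batchSize, partitionEnd);
                numOutput += nextBatch - batchEnd;  // metric updated per batch
                batchEnd = nextBatch;
            }
            if (nextIndex == batchEnd) break;       // partition exhausted
            // inner loop over the current batch
            while (nextIndex < batchEnd && !limitReached) {
                nextIndex++;                        // "consume" one row
                if (nextIndex >= limit) limitReached = true;
            }
        }
        return new long[] {nextIndex, numOutput};
    }

    public static void main(String[] args) {
        long[] r = run(100, 10, 25);
        System.out.println(r[0] + " rows consumed, metric says " + r[1]);
    }
}
```

Note the metric is still per-batch granular: with a limit of 25 and a batch size of 10, 25 rows are consumed but the metric reports 30, which matches the per-batch accuracy trade-off discussed in this thread.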
I changed the test to check whole-stage mode only. The metrics are different between whole-stage and normal mode, and the bug only existed in whole-stage mode.
I have not looked at this in detail. But what if there is a limit before Aggregate? We should not consume all input rows.
Let's say the query is range -> limit -> agg -> limit.
The agg does consume all of its input, which comes from the first limit. The range will have a stop check w.r.t. the first limit, not the second. If there is no limit before the agg, then the range will not have a stop check.
Note that this is sub-optimal for adjacent limits, but I think it's fine, as the optimizer will merge adjacent limits.
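The propagation rule described above can be modeled with a small sketch. This is a hypothetical illustration of which checks reach the range node for range -> limit1 -> agg -> limit2 (the operator and counter names are made up, not Spark's actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class LimitCheckPropagation {
    // Hypothetical model of check propagation, walking the plan top-down:
    // each limit adds its counter check, and a blocking operator (the agg)
    // drops all checks collected so far, because the limit above it cannot
    // fire until the agg has consumed all of its input anyway.
    static List<String> checksSeenByRange() {
        List<String> checks = new ArrayList<>();
        checks.add("limit2_counter < 5");    // the limit above the agg
        checks.clear();                      // agg is blocking: reset
        checks.add("limit1_counter < 100");  // the limit below the agg
        return checks;                       // what the range loop condition sees
    }

    public static void main(String[] args) {
        System.out.println(checksSeenByRange());
    }
}
```

So the range's loop condition only tests the first limit's counter, exactly as described in the comment above.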
  def outputFromRegularHashMap: String = {
    s"""
-      |while ($iterTerm.next()) {
+      |while ($iterTerm.next()$keepProducingDataCond) {
Here I only add the stop check for the regular hash map. The fast hash map is small and entirely in memory, so it's OK to always output all of it.
Test build #96944 has finished for PR 22630 at commit
Test build #96947 has finished for PR 22630 at commit
Test build #96948 has finished for PR 22630 at commit
This is an interesting change. I like this idea.
   */
  def needStopCheck: Boolean = parent.needStopCheck

+ def conditionsOfKeepProducingData: Seq[String] = parent.conditionsOfKeepProducingData
Can we briefly describe here what these two methods do?
  if (parent.limitNotReachedChecks.isEmpty) {
    ""
  } else {
    parent.limitNotReachedChecks.mkString(" && ", " && ", "")
Here we are assuming that this will be AND-ed with an already existing condition. I don't see a case in which this may be used in a different context as of now, but what about just producing the conditions here and putting the initial `&&` outside of this? It may be easier to reuse. WDYT?
Then we would have a lot of places generating the initial `&&`. If we do have a different context in the future, we can use `limitNotReachedChecks` directly.
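For illustration, the leading separator in the join is what lets the result be appended directly to an existing loop condition. Here is a hypothetical sketch using `String.join` to mirror Scala's `mkString(" && ", " && ", "")` (the check strings are made-up examples):

```java
import java.util.List;

public class CondJoinSketch {
    // Emulates parent.limitNotReachedChecks.mkString(" && ", " && ", ""):
    // a leading " && " plus the checks joined by " && ", or "" when there
    // are no limits in the query.
    static String limitNotReachedCond(List<String> checks) {
        return checks.isEmpty() ? "" : " && " + String.join(" && ", checks);
    }

    public static void main(String[] args) {
        String cond = limitNotReachedCond(List.of("count1 < 10", "count2 < 20"));
        // Appended to an existing condition without a dangling operator:
        System.out.println("while (input.hasNext()" + cond + ")");
    }
}
```

This shows why the prefixed form is convenient: the caller emits `while ($input.hasNext()$cond)` unconditionally, and the empty-list case degenerates to the original loop.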
     |
     | if (shouldStop()) return;
     |}
     |$iterTerm.close();
This is an unrelated change, right? It changes nothing in the generated code, right? Just want to double-check I am not missing something. (What changes is that before we were not doing the cleanup in the limit-operator case, and now we do; I see this.)
Yes, it's unrelated and is a no-op. `outputFromRowBasedMap` and `outputFromVectorizedMap` put the resource closing at the end, and I want to be consistent here.
  // with partition start. `batchEnd` tracks the end index of the current batch, initialized
  // with `nextIndex`. In the outer loop, we first check if `nextIndex == batchEnd`. If it's true,
  // it means the current batch is fully consumed, and we will update `batchEnd` to process the
  // next batch. If `batchEnd` reaches partition end, exit the outer loop. finally we enter the
nit: capitalize "finally"
  }
  """, inlineToOuterClass = true)
- val countTerm = ctx.addMutableState(CodeGenerator.JAVA_INT, "count") // init as count = 0
+ ctx.addMutableState(CodeGenerator.JAVA_INT, countTerm, forceInline = true, useFreshName = false)
Why do we need `forceInline`?
Because the counter variable name is decided before we obtain the `CodegenContext`. If we don't inline here, we need a way to notify the upstream operators about the counter name, which is hard to do.
  }

  private def collectNodeWithinWholeStage[T <: SparkPlan : ClassTag](plan: SparkPlan): Seq[T] = {
    val stages = plan.collect {
collectFirst?
We also want to detect the case where there are 2 whole-stage codegen subtrees, and fail.
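That rationale can be shown with a tiny stand-in for the Scala `collect`. This is a hypothetical sketch that models plan nodes as strings rather than using Spark's real `SparkPlan` API:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CollectStages {
    // Gather *all* whole-stage nodes, so the caller can assert there is
    // exactly one and fail loudly when there are two, a situation that
    // collectFirst would silently hide.
    static List<String> wholeStageNodes(List<String> planNodes) {
        return planNodes.stream()
                .filter(n -> n.startsWith("WholeStageCodegen"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> stages = wholeStageNodes(
            List.of("WholeStageCodegen(1)", "Exchange", "WholeStageCodegen(2)"));
        System.out.println(stages.size() + " whole-stage subtrees found");
    }
}
```

With `collectFirst` the second subtree would go unnoticed; collecting everything lets the test assert `stages.length == 1`.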
  val range = ctx.freshName("range")
  val shouldStop = if (parent.needStopCheck) {
-   s"if (shouldStop()) { $number = $value + ${step}L; return; }"
+   s"if (shouldStop()) { $nextIndex = $value + ${step}L; return; }"
In this case we are not very accurate in the metrics, right? I mean, we always say that we are returning a full batch, even though we may have consumed fewer rows than a batch.
What about updating the metrics before returning? Something like `$inputMetrics.incRecordsRead($localIdx - $localEnd);`?
You are right about the problem, but I'm not going to touch this part in this PR. Note that this PR focuses on limit in whole-stage codegen.
Personally I feel it's OK to make the metrics a little inaccurate for better performance; we can discuss it later in other PRs.
BTW I do have a local branch that fixes this problem, I just haven't had time to benchmark it yet. I'll send it out later and let's move the discussion there.
I am not sure why you need a benchmark for this (unless you did something different from what I suggested in the earlier comment). In that case it is a single metric update which happens only when stopping; it shouldn't introduce any significant overhead. Am I missing something? Anyway, let's move the discussion to the next PR then, thanks.
> Something like `$inputMetrics.incRecordsRead($localIdx - $localEnd);`?

`localIdx` is purely local to the loop; if we access it outside of the loop, we need to define `localIdx` outside of the loop as well. This may have some performance penalty. cc @kiszk
But `shouldStop` is called local to the loop, isn't it?
`shouldStop` is called locally, but the metrics update is not.
Anyway, the JVM JIT is mysterious and we need to be super careful when updating this kind of hot loop. That said, I'm not confident in any change to the hot loop without a benchmark.
OK, let's get back to this eventually; this is anyway not worse than before.
Sorry for the late comment. It would be good to discuss the details in another PR.
At first, I agree with the necessity of benchmarking. Here are my thoughts:
- I think that `localIdx` can be defined as a local variable outside of the loop. Or, how about storing `localIdx` to another local variable only if `parent.needStopCheck` is `true`?
- Since `shouldStop()` is simple and does no updates, we expect the JIT to apply inlining and some optimizations.
- If we want to call `incRecordsRead`, it would be good to exit the loop using `break` and then call `incRecordsRead`.
Test build #96983 has finished for PR 22630 at commit
Test build #96984 has finished for PR 22630 at commit
Test build #96986 has finished for PR 22630 at commit
Test build #96985 has finished for PR 22630 at commit
  def newLimitCountTerm(): String = {
    val id = curId.getAndIncrement()
    s"_limit_counter_$id"
  }
Can't we use freshName?
there is no CodegenContext here.
see MapObjects.apply as an existing example.
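The pattern being discussed (generating a unique name before any `CodegenContext` exists) can be sketched as follows. This is a hypothetical Java rendering of the Scala snippet above, not Spark's actual code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class LimitCounterNames {
    // A static monotonically increasing id guarantees distinct counter names
    // across multiple limits in the same stage, without needing a
    // CodegenContext (freshName lives on the context, which is not yet
    // available when the name must be decided).
    private static final AtomicLong curId = new AtomicLong();

    static String newLimitCountTerm() {
        return "_limit_counter_" + curId.getAndIncrement();
    }

    public static void main(String[] args) {
        System.out.println(newLimitCountTerm());
        System.out.println(newLimitCountTerm()); // a different name each call
    }
}
```

This mirrors the `MapObjects.apply` precedent cited above: uniqueness comes from a global counter rather than from the per-context fresh-name machinery.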
  val inputRow = if (needsUnsafeRowConversion) null else row
  s"""
-    |while ($input.hasNext()) {
+    |while ($input.hasNext()$limitNotReachedCond) {
We can put `limitNotReachedCond` as the first condition, to avoid possibly buffering a row.
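The reasoning relies on `&&` short-circuiting: if the limit check comes first, `hasNext()` (which may buffer a row in a real iterator) is never invoked once the limit is reached. A hypothetical demonstration, with a fake `hasNext` that just counts its calls:

```java
// Demonstrates the ordering suggestion: with the limit check first,
// hasNext() is called exactly `limit` times and never after the limit
// is reached, so no extra row gets buffered.
public class ShortCircuitSketch {
    static int hasNextCalls = 0;
    static int produced = 0;

    // Stand-in for an iterator whose hasNext() buffers the next row.
    static boolean hasNext() { hasNextCalls++; return true; }

    static void consume(long limit) {
        hasNextCalls = 0;
        produced = 0;
        long count = 0;
        while (count < limit && hasNext()) { // limit check first: short-circuits
            count++;
            produced++;
        }
    }

    public static void main(String[] args) {
        consume(3);
        System.out.println(produced + " rows, " + hasNextCalls + " hasNext() calls");
    }
}
```

With the reversed order, `while (hasNext() && count < limit)`, the loop would call `hasNext()` one extra time and buffer a row that is then discarded.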
  // upstream operators. Here we override this method to return Nil, so that upstream operators will
  // not generate useless conditions (which are always evaluated to false) for the Limit operators
  // after Sort.
  override def limitNotReachedChecks: Seq[String] = Nil
It seems that all blocking operators will have this behavior. Shall we rather have a `blockingOperator` flag def, and make this a final function incorporating that logic?
It's only done in Sort and Aggregate currently. I don't want to over-design it until there are more use cases.
I am fine with doing it later, but I'd like to avoid having other places duplicate this logic in the future, in order to avoid possible mistakes.
   * limit-not-reached checks.
   */
  final def limitNotReachedCond: String = {
    if (parent.limitNotReachedChecks.isEmpty) {
Just one thought: since we propagate (correctly) the `limitNotReachedChecks` to all the children, shall we also enforce that we are calling this on a node which will not propagate the `limitNotReachedChecks` any further? We may use the blocking flag proposed in the other comment, maybe.
The reason I'd like to do this is to enforce that we are not introducing the same limit condition check more than once, in more than one operator, which would be useless and may cause a (small) perf issue. WDYT?
It's not very useful to enforce that. The consequence is so minor that I don't think it's worth the complexity. I want to have a simple and robust framework for the limit optimization first.
> I want to have a simple and robust framework

Yes, I 100% agree; that's why I'd like to detect early all the possible situations which we are not thinking of as possible, but may happen in corner cases we are not considering. What I am suggesting here is to enforce and fail for testing only, of course; in production we shouldn't do anything like that.
Test build #96996 has finished for PR 22630 at commit
Test build #96997 has finished for PR 22630 at commit
LGTM
LGTM apart from the minor comments, which we can also address later
Test build #97099 has finished for PR 22630 at commit
Test build #97101 has finished for PR 22630 at commit
retest this please.
mgaido91
left a comment
LGTM apart from one nit, thanks for your work on this @cloud-fan and @viirya
  final def limitNotReachedCond: String = {
    // InputAdapter is also a leaf node.
    val isLeafNode = children.isEmpty || this.isInstanceOf[InputAdapter]
    assert(isLeafNode || this.isInstanceOf[BlockingOperatorWithCodegen],
nit: shall we do this only if Utils.isTesting and otherwise just emit a warning maybe?
ah good idea!
viirya
left a comment
LGTM
  val isLeafNode = children.isEmpty || this.isInstanceOf[InputAdapter]
  assert(isLeafNode || this.isInstanceOf[BlockingOperatorWithCodegen],
    "only leaf nodes and blocking nodes need to call this method in its data producing loop.")
  if (isLeafNode || this.isInstanceOf[BlockingOperatorWithCodegen]) {
if (!isLeafNode && !this.isInstanceOf[BlockingOperatorWithCodegen])?
  // InputAdapter is also a leaf node.
  val isLeafNode = children.isEmpty || this.isInstanceOf[InputAdapter]
  if (isLeafNode || this.isInstanceOf[BlockingOperatorWithCodegen]) {
    val errMsg = "only leaf nodes and blocking nodes need to call 'limitNotReachedCond' " +
nit: Only
  if (Utils.isTesting) {
    throw new IllegalStateException(errMsg)
  } else {
    logWarning(errMsg)
nit: shall we also mention to report to the community if seen?
Test build #97106 has finished for PR 22630 at commit
LGTM
Test build #97109 has finished for PR 22630 at commit
Test build #97111 has finished for PR 22630 at commit
    case w: WholeStageCodegenExec => w
  }
  assert(stages.length == 1, "The query plan should have one and only one whole-stage.")
  stages.head
nit: Do we need this line?
  val loopCondition = if (limitNotReachedChecks.isEmpty) {
    "true"
  } else {
    limitNotReachedChecks.mkString(" && ")
nit: I am a bit afraid of 64KB Java bytecode overflow from using mkString. On the other hand, I understand that this condition generation is performance sensitive.
This is whole-stage codegen. If bytecode overflow happens, we will fall back.
  if (parent.limitNotReachedChecks.isEmpty) {
    ""
  } else {
    parent.limitNotReachedChecks.mkString("", " && ", " &&")
nit: I am a bit afraid of 64KB Java bytecode overflow from using mkString. On the other hand, I understand that this condition generation is performance sensitive.
Test build #97136 has finished for PR 22630 at commit
LGTM
Thanks! merging to master |
What changes were proposed in this pull request?
This PR is inspired by #22524, but proposes a safer fix.
The current limit whole-stage codegen has 2 problems:
1. It's only applied to `InputAdapter`; many leaf nodes can't stop earlier w.r.t. limit.
2. It needs to override a method, which will break if we have more than one limit in the whole stage.

The first problem is easy to fix: just figure out which nodes can stop earlier w.r.t. limit, and update them. This PR updates `RangeExec`, `ColumnarBatchScan`, `SortExec`, and `HashAggregateExec`.
The second problem is hard to fix. This PR proposes to propagate the limit counter variable name upstream, so that the upstream leaf/blocking nodes can check the limit counter and quit the loop earlier.
For better performance, the implementation here follows `CodegenSupport.needStopCheck`, so that we codegen the check only if there is a limit in the query. For a columnar node like range, we check the limit counter per batch instead of per row, to keep the inner loop tight and fast.
Why is this safer?
1. The leaf/blocking nodes don't have to check the limit counter and stop earlier; it's only for performance. (This is the same as before.)
2. The blocking operators can stop propagating the limit counter name, because the counter of a limit after a blocking operator will never increase before the blocking operator consumes all the data from its upstream operators. So the upstream operators don't care about limits after blocking operators. This is also for performance only; it's OK if we forget to do it for some new blocking operators.
How was this patch tested?
a new test