[SPARK-13981][SQL] Defer evaluating variables within Filter operator. #11792

nongli · 2016-03-17T20:01:39Z

What changes were proposed in this pull request?

This improves the Filter codegen for NULLs by deferring loading the values for IsNotNull.
Instead of generating code like:

boolean isNull = ...
int value = ...
if (isNull) continue;

we will generate:
boolean isNull = ...
if (isNull) continue;
int value = ...

This is useful since retrieving the values can be non-trivial (they can be dictionary encoded
among other things). This currently only works when the attribute comes from the column batch
but could be extended to other cases in the future.

How was this patch tested?

On tpcds q55, this fixes the regression from introducing the IsNotNull predicates.

TPCDS Snappy:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
--------------------------------------------------------------------------------
q55                                      4564 / 5036         25.2          39.6
q55                                      4064 / 4340         28.3          35.3

nongli · 2016-03-17T20:04:46Z

Generated code before:

      InternalRow scan_row = scan_batch.getRow(scan_batchIdx++);
      /* input[0, int] */
      boolean scan_isNull = scan_row.isNullAt(0);
      int scan_value = scan_isNull ? -1 : (scan_row.getInt(0));
      /* input[1, int] */
      boolean scan_isNull1 = scan_row.isNullAt(1);
      int scan_value1 = scan_isNull1 ? -1 : (scan_row.getInt(1));

      if (scan_isNull) continue;
      if (scan_isNull1) continue;

      /* (input[0, int] >= 2452245) */
      boolean filter_value = false;
      filter_value = scan_value >= 2452245;
      if (!filter_value) continue;

      /* (input[0, int] <= 2452275) */
      boolean filter_value3 = false;
      filter_value3 = scan_value <= 2452275;
      if (!filter_value3) continue;

      filter_metricValue.add(1);

with this patch

      InternalRow scan_row = scan_batch.getRow(scan_batchIdx++);

    /* input[0, int] */
    boolean scan_isNull = scan_row.isNullAt(0);
    int scan_value = scan_isNull ? -1 : (scan_row.getInt(0));

    if (!(!(scan_isNull))) continue;

    /* (input[0, int] >= 2452245) */
    boolean filter_value2 = false;
    filter_value2 = scan_value >= 2452245;
    if (!filter_value2) continue;

    /* (input[0, int] <= 2452275) */
    boolean filter_value5 = false;
    filter_value5 = scan_value <= 2452275;
    if (!filter_value5) continue;

    /* input[1, int] */
    boolean scan_isNull1 = scan_row.isNullAt(1);
    int scan_value1 = scan_isNull1 ? -1 : (scan_row.getInt(1));

    if (!(!(scan_isNull1))) continue;

    filter_metricValue.add(1);

SparkQA · 2016-03-17T21:33:17Z

Test build #53463 has finished for PR 11792 at commit 29e408d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-03-17T22:44:27Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

output will have different nullability than child.output, we should use child.output here.

nwm, it's nice trick to use output here, since we already generate the code for BoundReference, the nullablity of it does not matter, but help to simplify the code of expressions.

Could you add a comment for that?

SparkQA · 2016-03-21T23:33:33Z

Test build #53715 has finished for PR 11792 at commit dd35e2e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-03-21T23:38:14Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

It's better to use notNullAttributes ++ otherPreds to make sure that IsNotNull always come first.

That defeats the point of this. Then we're referencing all the attributes first before the predicates.

I can not find where this assumption (IsNotNull come before) is guarateed, if not, we could generate wrong results.

Also, the conjuncts is in the sequence like this:

IsNotNull(A), IsNotNull(B), A > xx, B > xxx

We should re-order them explicitly, make sue they are in this order:

IsNotNull(A), A > xx, IsNotNull(B), B > xxx

SparkQA · 2016-03-24T22:39:16Z

Test build #54107 has finished for PR 11792 at commit 5d71fc5.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class ExprCode(var _code: String, var isNull: String, var value: String,

nongli · 2016-03-24T22:40:43Z

@davies I did this a different way. Let me know your thoughts.

… and NULL improvements. This improves the Filter codegen to optimize IsNotNull filters which are common. This patch defers loading attributes as late as possible within the filter operator. This takes advantage of short-circuiting. Instead of generating code like: boolean isNull = ... int value = ... boolean isNull2 = ... int value2 = ... if (isNull) continue; we will generate: boolean isNull = ... int value = ... if (isNull) continue; int value2 = ... if (isNull) continue; On tpcds q55, this fixes the regression from introducing the IsNotNull predicates. TPCDS Snappy: Best/Avg Time(ms) Rate(M/s) Per Row(ns) -------------------------------------------------------------------------------- q55 4564 / 5036 25.2 39.6 q55 4064 / 4340 28.3 35.3

SparkQA · 2016-03-26T00:02:06Z

Test build #54228 has finished for PR 11792 at commit 8f5dfc9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-03-28T17:56:12Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

+        val idx = notNullPreds.indexWhere { n => n.asInstanceOf[IsNotNull].child.semanticEquals(r)}
+        if (idx != -1 && !generatedIsNotNullChecks(idx)) {
+          // Use the child's output. The nullability is what the child produced.
+          val code = genPredicate(notNullPreds(idx), input, child.output)


generatedIsNotNullChecks(idx) = true genPredicate(notNullPreds(idx), input, child.output)

davies · 2016-03-28T17:59:40Z

LGTM, just two minor comments.

SparkQA · 2016-03-28T20:21:22Z

Test build #54342 has finished for PR 11792 at commit 1a77012.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-28T21:46:47Z

Test build #2702 has finished for PR 11792 at commit 1a77012.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-29T01:48:53Z

Test build #2706 has finished for PR 11792 at commit 1a77012.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-03-29T03:31:27Z

Thanks - merging in master.

davies reviewed Mar 17, 2016
View reviewed changes

davies reviewed Mar 21, 2016
View reviewed changes

nongli force-pushed the spark-13981 branch from dd35e2e to 5d71fc5 Compare March 24, 2016 22:36

nongli force-pushed the spark-13981 branch from 5d71fc5 to 8f5dfc9 Compare March 25, 2016 22:25

davies reviewed Mar 28, 2016
View reviewed changes

CR

1a77012

asfgit closed this in 4a55c33 Mar 29, 2016

nongli deleted the spark-13981 branch March 29, 2016 18:18

JoshRosen mentioned this pull request Jun 1, 2019

[SPARK-27915][SQL][WIP] Update logical Filter's output nullability based on IsNotNull conditions #24765

Closed

[SPARK-13981][SQL] Defer evaluating variables within Filter operator. #11792

[SPARK-13981][SQL] Defer evaluating variables within Filter operator. #11792

Uh oh!

Conversation

nongli commented Mar 17, 2016

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

nongli commented Mar 17, 2016

Uh oh!

SparkQA commented Mar 17, 2016

Uh oh!

davies Mar 17, 2016

Choose a reason for hiding this comment

Uh oh!

davies Mar 17, 2016

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Mar 21, 2016

Uh oh!

davies Mar 21, 2016

Choose a reason for hiding this comment

Uh oh!

nongli Mar 22, 2016

Choose a reason for hiding this comment

Uh oh!

davies Mar 22, 2016

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Mar 24, 2016

Uh oh!

nongli commented Mar 24, 2016

Uh oh!

SparkQA commented Mar 26, 2016

Uh oh!

davies Mar 28, 2016

Choose a reason for hiding this comment

Uh oh!

davies commented Mar 28, 2016

Uh oh!

SparkQA commented Mar 28, 2016

Uh oh!

SparkQA commented Mar 28, 2016

Uh oh!

SparkQA commented Mar 29, 2016

Uh oh!

rxin commented Mar 29, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants