
Conversation

@nongli
Contributor

@nongli nongli commented Mar 17, 2016

What changes were proposed in this pull request?

This improves the Filter codegen for NULLs by deferring value loads until after the IsNotNull checks.
Instead of generating code like:

boolean isNull = ...
int value = ...
if (isNull) continue;

we will generate:

boolean isNull = ...
if (isNull) continue;
int value = ...

This is useful since retrieving the values can be non-trivial (they can be dictionary encoded
among other things). This currently only works when the attribute comes from the column batch
but could be extended to other cases in the future.
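
The pattern the patch aims for can be sketched in plain Java (toy code: `ToyRow`, `countMatches`, and the cutoff values are illustrative stand-ins, not Spark's actual generated classes):

```java
// Sketch of the deferred-load pattern (illustrative toy code, not Spark's
// actual generated code). Loading a value can be expensive, e.g. for
// dictionary-encoded columns, so the null check runs before the load.
public class DeferredLoadSketch {
    // Toy row: parallel arrays of values and null flags.
    static final class ToyRow {
        final int[] values;
        final boolean[] nulls;
        ToyRow(int[] values, boolean[] nulls) { this.values = values; this.nulls = nulls; }
        boolean isNullAt(int i) { return nulls[i]; }
        int getInt(int i) { return values[i]; } // stand-in for a costly decode
    }

    static int countMatches(ToyRow[] rows) {
        int matches = 0;
        for (ToyRow row : rows) {
            boolean isNull = row.isNullAt(0);
            if (isNull) continue;       // skip the load entirely for null rows
            int value = row.getInt(0);  // deferred until after the null check
            if (value < 2452245) continue;
            if (value > 2452275) continue;
            matches++;
        }
        return matches;
    }
}
```

The point of the ordering is that null rows never pay for the decode at all.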

How was this patch tested?

On TPCDS q55, this fixes the regression from introducing the IsNotNull predicates.

TPCDS Snappy:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
--------------------------------------------------------------------------------
q55 (before)                             4564 / 5036         25.2          39.6
q55 (after)                              4064 / 4340         28.3          35.3

@nongli
Contributor Author

nongli commented Mar 17, 2016

Generated code before:

      InternalRow scan_row = scan_batch.getRow(scan_batchIdx++);
      /* input[0, int] */
      boolean scan_isNull = scan_row.isNullAt(0);
      int scan_value = scan_isNull ? -1 : (scan_row.getInt(0));
      /* input[1, int] */
      boolean scan_isNull1 = scan_row.isNullAt(1);
      int scan_value1 = scan_isNull1 ? -1 : (scan_row.getInt(1));

      if (scan_isNull) continue;
      if (scan_isNull1) continue;

      /* (input[0, int] >= 2452245) */
      boolean filter_value = false;
      filter_value = scan_value >= 2452245;
      if (!filter_value) continue;

      /* (input[0, int] <= 2452275) */
      boolean filter_value3 = false;
      filter_value3 = scan_value <= 2452275;
      if (!filter_value3) continue;

      filter_metricValue.add(1);

With this patch:

      InternalRow scan_row = scan_batch.getRow(scan_batchIdx++);

      /* input[0, int] */
      boolean scan_isNull = scan_row.isNullAt(0);
      int scan_value = scan_isNull ? -1 : (scan_row.getInt(0));

      if (!(!(scan_isNull))) continue;

      /* (input[0, int] >= 2452245) */
      boolean filter_value2 = false;
      filter_value2 = scan_value >= 2452245;
      if (!filter_value2) continue;

      /* (input[0, int] <= 2452275) */
      boolean filter_value5 = false;
      filter_value5 = scan_value <= 2452275;
      if (!filter_value5) continue;

      /* input[1, int] */
      boolean scan_isNull1 = scan_row.isNullAt(1);
      int scan_value1 = scan_isNull1 ? -1 : (scan_row.getInt(1));

      if (!(!(scan_isNull1))) continue;

      filter_metricValue.add(1);

@SparkQA

SparkQA commented Mar 17, 2016

Test build #53463 has finished for PR 11792 at commit 29e408d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

output will have different nullability than child.output; we should use child.output here.

Contributor

Never mind, it's a nice trick to use output here: since we already generate the code for BoundReference, its nullability does not matter, and it helps simplify the code of the expressions.

Could you add a comment explaining that?

@SparkQA

SparkQA commented Mar 21, 2016

Test build #53715 has finished for PR 11792 at commit dd35e2e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

It's better to use notNullAttributes ++ otherPreds to make sure that the IsNotNull predicates always come first.

Contributor Author

That defeats the point of this change: then we'd be referencing all the attributes before evaluating any of the predicates.

Contributor

I cannot find where this assumption (that IsNotNull comes first) is guaranteed; if it is not, we could generate wrong results.

Also, the conjuncts arrive in a sequence like this:

IsNotNull(A), IsNotNull(B), A > xx, B > xxx

We should re-order them explicitly, to make sure they are in this order:

IsNotNull(A), A > xx, IsNotNull(B), B > xxx
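
The reordering described in this comment could be sketched as follows (toy `Pred`/`ReorderSketch` types, purely illustrative; not Catalyst's actual Expression classes):

```java
import java.util.*;

// Toy sketch of the suggested conjunct reordering: place each IsNotNull(attr)
// immediately before the first other predicate that references attr, so the
// null check short-circuits before any value comparison on that attribute.
public class ReorderSketch {
    static class Pred {
        final String attr;       // attribute this predicate references
        final boolean isNotNull; // true for IsNotNull(attr)
        final String text;       // display form, used for comparison in this sketch
        Pred(String attr, boolean isNotNull, String text) {
            this.attr = attr; this.isNotNull = isNotNull; this.text = text;
        }
        @Override public String toString() { return text; }
    }

    static List<Pred> reorder(List<Pred> conjuncts) {
        List<Pred> notNulls = new ArrayList<>();
        List<Pred> others = new ArrayList<>();
        for (Pred p : conjuncts) (p.isNotNull ? notNulls : others).add(p);

        Set<String> emitted = new HashSet<>();
        List<Pred> out = new ArrayList<>();
        for (Pred p : others) {
            // Emit the pending null check for this attribute right before its predicate.
            for (Pred n : notNulls) {
                if (n.attr.equals(p.attr) && emitted.add(n.attr)) out.add(n);
            }
            out.add(p);
        }
        // Keep any IsNotNull whose attribute never appears in another predicate.
        for (Pred n : notNulls) {
            if (emitted.add(n.attr)) out.add(n);
        }
        return out;
    }
}
```

Running this on the comment's example `IsNotNull(A), IsNotNull(B), A > xx, B > xxx` yields `IsNotNull(A), A > xx, IsNotNull(B), B > xxx`.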

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54107 has finished for PR 11792 at commit 5d71fc5.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ExprCode(var _code: String, var isNull: String, var value: String,

@nongli
Contributor Author

nongli commented Mar 24, 2016

@davies I did this a different way. Let me know your thoughts.

… and NULL improvements.

This improves the Filter codegen to optimize IsNotNull filters, which are common. This patch
defers loading attributes as late as possible within the filter operator, taking advantage
of short-circuiting.

Instead of generating code like:

boolean isNull = ...
int value = ...
boolean isNull2 = ...
int value2 = ...
if (isNull) continue;
if (isNull2) continue;

we will generate:

boolean isNull = ...
int value = ...
if (isNull) continue;
boolean isNull2 = ...
int value2 = ...
if (isNull2) continue;

On TPCDS q55, this fixes the regression from introducing the IsNotNull predicates.

TPCDS Snappy:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
--------------------------------------------------------------------------------
q55 (before)                             4564 / 5036         25.2          39.6
q55 (after)                              4064 / 4340         28.3          35.3
@SparkQA

SparkQA commented Mar 26, 2016

Test build #54228 has finished for PR 11792 at commit 8f5dfc9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val idx = notNullPreds.indexWhere { n => n.asInstanceOf[IsNotNull].child.semanticEquals(r)}
if (idx != -1 && !generatedIsNotNullChecks(idx)) {
// Use the child's output. The nullability is what the child produced.
val code = genPredicate(notNullPreds(idx), input, child.output)
Contributor

generatedIsNotNullChecks(idx) = true
genPredicate(notNullPreds(idx), input, child.output)

@davies
Contributor

davies commented Mar 28, 2016

LGTM, just two minor comments.

@SparkQA

SparkQA commented Mar 28, 2016

Test build #54342 has finished for PR 11792 at commit 1a77012.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 28, 2016

Test build #2702 has finished for PR 11792 at commit 1a77012.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 29, 2016

Test build #2706 has finished for PR 11792 at commit 1a77012.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Mar 29, 2016

Thanks - merging in master.

