Skip to content
Closed
Prev Previous commit
Next Next commit
remove exprId both in notNullPreds and otherPreds when get notNullAtt…
…ributes
  • Loading branch information
wangshuo128 committed Sep 27, 2019
commit 270172bb23688ff823bb7f93e3f26ca357f52568
Original file line number Diff line number Diff line change
Expand Up @@ -110,14 +110,15 @@ case class FilterExec(condition: Expression, child: SparkPlan)

// The columns that will filtered out by `IsNotNull` could be considered as not nullable.
private val notNullAttributes = notNullPreds.flatMap(_.references).distinct.map(_.exprId)
.diff(otherPreds.flatMap(_.references).distinct.map(_.exprId))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix seems sub-optimal to me. If we have a IsNotNull(a), then a should be a notNullAttribute even if a appears in otherPreds like EqualTo(a, b).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the bug is about how we codegen null checks, the current logic is:

  1. split the predicates into notNullPreds (IsNotNull expressions) and otherPreds
  2. try codegen otherPreds first, then notNullPreds.
  3. if an other-predicate can leverage a specific not-null-predicate, then codegen that not-null-predicate first, so that we can codegen the other-predicate with non-nullable attributes.

There is a problem in step 3. Image that we have a IsNotNull(SubString(a)) and a SomeFunc(a). Then a is a not-null attribute for sure, even if there is no IsNullNull(a) predicate.

When we codegen SomeFunc(a), we can't find a IsNotNull(a) expression, so we skip the null check. However, we assume a is not nullable when codegen SomeFunc(a), which is wrong as IsNotNull(SubString(a)) has not been codegened yet.


// Mark this as empty. We'll evaluate the input during doConsume(). We don't want to evaluate
// all the variables at the beginning to take advantage of short circuiting.
override def usedInputs: AttributeSet = AttributeSet.empty

override def output: Seq[Attribute] = {
child.output.map { a =>
if (a.nullable && notNullPreds.exists(_.semanticEquals(a))) {
if (a.nullable && notNullAttributes.contains(a.exprId)) {
a.withNullability(false)
} else {
a
Expand Down