Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Constant Folding(null propagation)
  • Loading branch information
chenghao-intel committed Apr 28, 2014
commit 2645d4fd0acca695be02e332e7f3a69b33f7f2a1
Original file line number Diff line number Diff line change
Expand Up @@ -65,8 +65,7 @@ abstract class BinaryPredicate extends BinaryExpression with Predicate {
def nullable = left.nullable || right.nullable
}

case class Not(child: Expression) extends Predicate with trees.UnaryNode[Expression] {
def references = child.references
case class Not(child: Expression) extends UnaryExpression with Predicate {
override def foldable = child.foldable
def nullable = child.nullable
override def toString = s"NOT $child"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -94,11 +94,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
case q: LogicalPlan => q transformExpressionsDown {
// Skip redundant folding of literals.
case l: Literal => l
case e @ If(Literal(v, _), trueValue, falseValue) => if(v == true) trueValue else falseValue
case e @ In(Literal(v, _), list) if(list.exists(c => c match {
case Literal(candidate, _) if(candidate == v) => true
case _ => false
})) => Literal(true, BooleanType)
case e if e.foldable => Literal(e.eval(null), e.dataType)
}
}
}

/**
* The expression may be constant value, due to one or more of its children expressions is null or
* not null constantly, replaces [[catalyst.expressions.Expression Expressions]] with equivalent
* [[catalyst.expressions.Literal Literal]] values if possible caused by that.
*/
object NullPropagation extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsUp {
case l: Literal => l
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to skip literals since none of the conditions below can ever match a raw literal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking if put the literal matching in the beginning, maybe helpful avoid the further pattern matching of the rest rules. Just a tiny performance optimization for Literal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By that logic it would be an optimization to skip any class that won't match the cases below. Why is Literal a special case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as the rule ConstantFolding, NullPropagation won't do any transformation for Literal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in the case of ConstantFolding the subsequent pattern will match Literal, since a Literal is technically foldable. Matching the next pattern causes the rule to invoke the expression evaluator and create an identical, wasted object.

In NullPropogation, a Literal will not match any of the later rules. So in essence you are second guessing the code generated by the pattern matcher. While there may be extreme cases where that is required for performance, I don't think this is one of them.

case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these already fold correctly.

scala> sql("SELECT null IS NULL")
res4: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[0] at RDD at SchemaRDD.scala:96
== Query Plan ==
Project [true AS c0#0]

Maybe we should write tests for each case, before adding the rule, to make sure it is broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, correctly, if all of the operands are literal, and it's covered by the rule expression.foldable in most of cases. I removed the overlapped cases from the rule NullPropagation

case e @ IsNull(Literal(_, _)) => Literal(false, BooleanType)
case e @ IsNull(c @ Rand) => Literal(false, BooleanType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this more generally stated as case IsNull(e) if e.nullable == false => Literal(false)? Similarly below.

case e @ IsNotNull(Literal(null, _)) => Literal(false, BooleanType)
case e @ IsNotNull(Literal(_, _)) => Literal(true, BooleanType)
case e @ IsNotNull(c @ Rand) => Literal(true, BooleanType)
case e @ GetItem(Literal(null, _), _) => Literal(null, e.dataType)
case e @ GetItem(_, Literal(null, _)) => Literal(null, e.dataType)
case e @ GetField(Literal(null, _), _) => Literal(null, e.dataType)
case e @ Coalesce(children) => {
val newChildren = children.filter(c => c match {
case Literal(null, _) => false
case _ => true
})
if(newChildren.length == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can .length == null? Also, transform already does equality checking to minimize GC churn, so I think its okay to just return a filtered version of the children.

Literal(null, e.dataType)
} else if(newChildren.length == children.length){
e
} else {
Coalesce(newChildren)
}
}
// TODO put exceptional cases(Unary & Binary Expression) before here.
case e: UnaryExpression => e.child match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its reasonable to enforce this nullability semantic on unary and binary nodes, but we should add something to their scaladoc. Maybe also just make SortOrder not a UnaryNode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I've update the SortOrder which inherits from UnaryNode instead of UnaryExpression.

case Literal(null, _) => Literal(null, e.dataType)
}
case e: BinaryExpression => e.children match {
case Literal(null, _) :: right :: Nil => Literal(null, e.dataType)
case left :: Literal(null, _) :: Nil => Literal(null, e.dataType)
}
}
}
}

/**
* Simplifies boolean expressions where the answer can be determined without evaluating both sides.
* Note that this rule can eliminate expressions that might otherwise have been evaluated and thus
Expand Down