[WIP][Spark-SQL] Optimize the Constant Folding for Expression #482
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -94,11 +94,59 @@ object ConstantFolding extends Rule[LogicalPlan] { | |
| case q: LogicalPlan => q transformExpressionsDown { | ||
| // Skip redundant folding of literals. | ||
| case l: Literal => l | ||
| case e @ If(Literal(v, _), trueValue, falseValue) => if(v == true) trueValue else falseValue | ||
| case e @ In(Literal(v, _), list) if(list.exists(c => c match { | ||
| case Literal(candidate, _) if(candidate == v) => true | ||
| case _ => false | ||
| })) => Literal(true, BooleanType) | ||
| case e if e.foldable => Literal(e.eval(null), e.dataType) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * The expression may be constant value, due to one or more of its children expressions is null or | ||
| * not null constantly, replaces [[catalyst.expressions.Expression Expressions]] with equivalent | ||
| * [[catalyst.expressions.Literal Literal]] values if possible caused by that. | ||
| */ | ||
| object NullPropagation extends Rule[LogicalPlan] { | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
| case q: LogicalPlan => q transformExpressionsUp { | ||
| case l: Literal => l | ||
| case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType) | ||
|
Contributor
Some of these already fold correctly.
scala> sql("SELECT null IS NULL")
res4: org.apache.spark.sql.SchemaRDD =
SchemaRDD[0] at RDD at SchemaRDD.scala:96
== Query Plan ==
Project [true AS c0#0]
Maybe we should write tests for each case, before adding the rule, to make sure it is broken.
Contributor
Author
Yes, those fold correctly when all of the operands are literals, and that case is covered by the rule. |
||
| case e @ IsNull(Literal(_, _)) => Literal(false, BooleanType) | ||
| case e @ IsNull(c @ Rand) => Literal(false, BooleanType) | ||
|
Contributor
Is this more generally stated as |
||
| case e @ IsNotNull(Literal(null, _)) => Literal(false, BooleanType) | ||
| case e @ IsNotNull(Literal(_, _)) => Literal(true, BooleanType) | ||
| case e @ IsNotNull(c @ Rand) => Literal(true, BooleanType) | ||
| case e @ GetItem(Literal(null, _), _) => Literal(null, e.dataType) | ||
| case e @ GetItem(_, Literal(null, _)) => Literal(null, e.dataType) | ||
| case e @ GetField(Literal(null, _), _) => Literal(null, e.dataType) | ||
| case e @ Coalesce(children) => { | ||
| val newChildren = children.filter(c => c match { | ||
| case Literal(null, _) => false | ||
| case _ => true | ||
| }) | ||
| if(newChildren.length == null) { | ||
|
Contributor
Can |
||
| Literal(null, e.dataType) | ||
| } else if(newChildren.length == children.length){ | ||
| e | ||
| } else { | ||
| Coalesce(newChildren) | ||
| } | ||
| } | ||
| // TODO put exceptional cases(Unary & Binary Expression) before here. | ||
| case e: UnaryExpression => e.child match { | ||
|
Contributor
I think it's reasonable to enforce this nullability semantic on unary and binary nodes, but we should add something to their scaladoc. Maybe also just make
Contributor
Author
Yes, I've updated SortOrder, which now inherits from UnaryNode instead of UnaryExpression. |
||
| case Literal(null, _) => Literal(null, e.dataType) | ||
| } | ||
| case e: BinaryExpression => e.children match { | ||
| case Literal(null, _) :: right :: Nil => Literal(null, e.dataType) | ||
| case left :: Literal(null, _) :: Nil => Literal(null, e.dataType) | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Simplifies boolean expressions where the answer can be determined without evaluating both sides. | ||
| * Note that this rule can eliminate expressions that might otherwise have been evaluated and thus | ||
|
|
||
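For readers following the diff, here is a minimal, self-contained sketch of the null-propagation idea. It is not the Catalyst code: `NullSketch`, `Expr`, `Lit`, `Attr`, `Add`, `IsNull`, and `Coalesce` are hypothetical stand-ins, and `Lit(null)` models a NULL literal. The sketch assumes, as the diff does for binary expressions, that a NULL operand makes the whole expression NULL.

```scala
// Hypothetical mini expression tree; none of these are Catalyst classes.
object NullSketch {
  sealed trait Expr
  case class Lit(value: Any) extends Expr        // Lit(null) models a NULL literal
  case class Attr(name: String) extends Expr     // column reference, unknown at plan time
  case class Add(left: Expr, right: Expr) extends Expr
  case class IsNull(child: Expr) extends Expr
  case class Coalesce(children: List[Expr]) extends Expr

  // Rewrites one node, assuming its children were already simplified.
  def rewrite(e: Expr): Expr = e match {
    case IsNull(Lit(null)) => Lit(true)
    case IsNull(Lit(_))    => Lit(false)
    case Add(Lit(null), _) => Lit(null)          // NULL on either side yields NULL
    case Add(_, Lit(null)) => Lit(null)
    case Coalesce(children) =>
      // Drop NULL literals; they can never be the result of COALESCE.
      val kept = children.filter {
        case Lit(null) => false
        case _         => true
      }
      if (kept.isEmpty) Lit(null)                // every child was a NULL literal
      else if (kept.length == children.length) e // nothing removed, keep the node
      else Coalesce(kept)
    case other => other
  }

  // Bottom-up traversal, comparable in spirit to transformExpressionsUp.
  def simplify(e: Expr): Expr = {
    val withNewChildren = e match {
      case Add(l, r)    => Add(simplify(l), simplify(r))
      case IsNull(c)    => IsNull(simplify(c))
      case Coalesce(cs) => Coalesce(cs.map(simplify))
      case leaf         => leaf
    }
    rewrite(withNewChildren)
  }
}
```

Note that `Add(Attr("a"), Lit(null))` is not foldable, because one operand is a column reference, yet it still simplifies to `Lit(null)`. That is the kind of case a pure constant-folding rule would not catch.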
There is no need to skip literals since none of the conditions below can ever match a raw literal.
My thinking was that matching Literal first might avoid further pattern matching against the remaining rules. It is just a tiny performance optimization for Literal.
By that logic it would be an optimization to skip any class that won't match the cases below. Why is Literal a special case?
As with the ConstantFolding rule, NullPropagation does not perform any transformation on a Literal.
Yes, but in the case of ConstantFolding the subsequent pattern will match Literal, since a Literal is technically foldable. Matching the next pattern causes the rule to invoke the expression evaluator and create an identical, wasted object.
In NullPropagation, a Literal will not match any of the later rules. So in essence you are second-guessing the code generated by the pattern matcher. While there may be extreme cases where that is required for performance, I don't think this is one of them.
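The distinction drawn in this thread can be illustrated with a small, self-contained sketch. It is only an approximation under stated assumptions: `SkipSketch`, `Expr`, `Lit`, `Attr`, and `Add` are hypothetical stand-ins rather than Catalyst classes, and `eval()` takes no input row. In a folding rule, a literal without the early case falls through to the `foldable` case and is rebuilt as an equal but distinct object. In a null-propagation style rule, a bare literal matches no case at all, so an early skip changes nothing about the result.

```scala
// Hypothetical stand-ins for illustration; not the Catalyst API.
object SkipSketch {
  sealed trait Expr { def foldable: Boolean; def eval(): Any }
  case class Lit(value: Any) extends Expr {
    def foldable: Boolean = true
    def eval(): Any = value
  }
  case class Attr(name: String) extends Expr {
    def foldable: Boolean = false
    def eval(): Any = sys.error("cannot evaluate a column reference without a row")
  }
  case class Add(left: Expr, right: Expr) extends Expr {
    def foldable: Boolean = left.foldable && right.foldable
    def eval(): Any = left.eval().asInstanceOf[Int] + right.eval().asInstanceOf[Int]
  }

  // Folding rule with the early Literal case: literals are returned untouched.
  val foldWithSkip: PartialFunction[Expr, Expr] = {
    case l: Lit          => l
    case e if e.foldable => Lit(e.eval())
  }

  // Folding rule without it: a literal is foldable, so it is evaluated
  // and wrapped in a new, identical Lit.
  val foldWithoutSkip: PartialFunction[Expr, Expr] = {
    case e if e.foldable => Lit(e.eval())
  }

  // Null-propagation style rule: no case here can match a bare literal.
  val nullRule: PartialFunction[Expr, Expr] = {
    case Add(Lit(null), _) => Lit(null)
    case Add(_, Lit(null)) => Lit(null)
  }
}
```

With `val one = SkipSketch.Lit(1)`, `foldWithSkip(one)` returns the same instance, `foldWithoutSkip(one)` returns an equal copy, and `nullRule.isDefinedAt(one)` is false.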