address review comments
kiszk committed Apr 16, 2017
commit 791aad97338a7fac600ed1f185fb43a73e74d772
@@ -92,7 +92,6 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: SQLConf)
CombineUnions,
// Constant folding and strength reduction
NullPropagation(conf),
EliminateMapObjects,
FoldablePropagation,
OptimizeIn(conf),
ConstantFolding,
@@ -120,7 +119,8 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: SQLConf)
CostBasedJoinReorder(conf)) ::
Batch("Decimal Optimizations", fixedPoint,
DecimalAggregates(conf)) ::
Batch("Typed Filter Optimization", fixedPoint,
Batch("Object Expressions Optimization", fixedPoint,
EliminateMapObjects,
CombineTypedFilters) ::
Batch("LocalRelation", fixedPoint,
ConvertToLocalRelation,
@@ -369,7 +369,7 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] {
case EqualNullSafe(Literal(null, _), r) => IsNull(r)
case EqualNullSafe(l, Literal(null, _)) => IsNull(l)

case _ @ AssertNotNull(c, _) if !c.nullable => c
case AssertNotNull(c, _) if !c.nullable => c
Member

Is this safe to do? According to the description of AssertNotNull, even if c is non-nullable, we still need to add this assertion in some cases.

Member Author

@kiszk, Apr 17, 2017

I think that this is what @cloud-fan suggested in his comment.
Is there a better alternative implementation to remove this?

Member

I am not sure whether @cloud-fan's no-op AssertNotNull is the same as the case in AssertNotNull's description.

Contributor

Ah, good catch! Sorry, it was my mistake. But then it seems we cannot remove MapObjects, as the null check has to be done.

Contributor

Actually, I checked all the usages of AssertNotNull: we never use AssertNotNull to check a non-nullable column/field, so it seems the documentation of AssertNotNull is wrong. Can you double check?

Member Author

I agree with @cloud-fan. I have also checked the usages of AssertNotNull. IIUC, all of them are used for throwing a runtime exception.

Member

I think the purpose of AssertNotNull is to give a proper exception at runtime when an expression (note: it can be a nullable or a non-nullable expression) evaluates to a null value.

Maybe for MapObjects we can safely remove it, but I am not sure whether it is okay in other cases too.

Contributor

As long as we trust the nullable property, I think it's safe to remove it. We don't use AssertNotNull to validate the input data.

Member

OK. Sounds reasonable to me.
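
The conclusion of this thread can be sketched with a toy expression tree. This is only an illustration under stated assumptions: the types below (`Expr`, `Column`, the one-argument `AssertNotNull`, `dropRedundantAssert`) are hypothetical stand-ins, not Catalyst's real classes, and the sketch assumes the declared `nullable` flag can be trusted.

```scala
// Toy model: every name here is hypothetical and much simpler than Catalyst.
object AssertNotNullSketch {
  sealed trait Expr { def nullable: Boolean }

  // A column whose nullability is declared by the schema.
  case class Column(name: String, nullable: Boolean) extends Expr

  // Assumed semantics: fails at runtime if the child evaluates to null,
  // so the result of the assertion itself is never null.
  case class AssertNotNull(child: Expr) extends Expr { val nullable = false }

  // Drop the assertion only when the child is already declared non-nullable.
  def dropRedundantAssert(e: Expr): Expr = e match {
    case AssertNotNull(c) if !c.nullable => dropRedundantAssert(c)
    case AssertNotNull(c)                => AssertNotNull(dropRedundantAssert(c))
    case other                           => other
  }
}
```

In this sketch an assertion over a nullable child is kept, which matches the point made above: the rewrite is safe only as long as the nullable property is trusted.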


// For Coalesce, remove null literals.
case e @ Coalesce(children) =>
@@ -110,11 +110,18 @@ object EliminateMapObjects extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
Contributor

Shall we call plan.transformAllExpressions?

Member Author

Thanks, it works. Done.

case _ @ DeserializeToObject(Invoke(
MapObjects(_, _, _, Cast(LambdaVariable(_, _, dataType, _), castDataType, _),
Contributor

I think Cast should be eliminated when running this rule?

Member Author

@kiszk, Apr 16, 2017

Yes, I eliminate Cast in this rule, too. This Cast seems to have been added by a recent commit.

Member Author

Oh, do you mean

  1. Cast should be eliminated by this rule?; or
  2. Cast should be eliminated before applying this rule?

inputData, None, _),
inputData, None),
funcName, returnType: ObjectType, arguments, propagateNull, returnNullable),
outputObjAttr, child) if dataType == castDataType =>
Member

I think we can remove this case. Cast has already been removed by SimplifyCasts.

Member Author

For now, as you pointed out, Cast has been removed by SimplifyCasts.
I left this case in for robustness: in the future, this optimization might be executed before SimplifyCasts if the rules are reordered.
What do you think? cc: @cloud-fan

Member

The order does not matter. The batch will be run multiple times.

Member Author

I see
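
The point about rule order can be illustrated with a toy fixed-point executor. This is a sketch only: `runBatch`, the string-based "plan", and the two stand-in rules are hypothetical and are not Spark's `RuleExecutor` API. It assumes the rules converge within the iteration limit.

```scala
// Toy fixed-point executor over a string "plan"; all names are hypothetical.
object FixedPointSketch {
  type Rule = String => String

  // Apply every rule in order, and repeat until the plan stops changing
  // or the iteration limit is reached.
  def runBatch(rules: Seq[Rule], maxIterations: Int)(plan: String): String = {
    var current = plan
    var iteration = 0
    var changed = true
    while (changed && iteration < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      iteration += 1
    }
    current
  }

  // Stand-in for SimplifyCasts: strips a no-op cast.
  val simplifyCasts: Rule = _.replace("Cast(x)", "x")

  // Stand-in for EliminateMapObjects: fires only once the cast is gone.
  val eliminateMapObjects: Rule = _.replace("MapObjects(x)", "input")
}
```

Starting from `"Invoke(MapObjects(Cast(x)))"`, both rule orders reach `"Invoke(input)"` when the batch may iterate. With the elimination rule listed first, the sketch simply needs one extra pass; it only falls short if the batch is limited to a single iteration.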

DeserializeToObject(Invoke(
inputData, funcName, returnType, arguments, propagateNull, returnNullable),
outputObjAttr, child)
case _ @ DeserializeToObject(Invoke(
Member

Nit: _ @ DeserializeToObject -> DeserializeToObject

MapObjects(_, _, _, LambdaVariable(_, _, dataType, _), inputData, None),
Member

@gatorsmile, Apr 16, 2017

dataType is not being used. Basically, this rule is to get rid of MapObjects when no function is applied to LambdaVariable. Do we still need to check whether customCollectionCls is equal to None? Any other scenario besides type casting?

Member Author

As @cloud-fan pointed out in this comment, it is necessary. customCollectionCls was introduced by #16541.
It is not None when Seq() is used.

Member

Ok, for safety, we can keep it.
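
The two conditions discussed here (no function applied to the lambda variable, and no custom collection class) can be sketched on a toy tree. The case classes below are hypothetical, heavily simplified shapes, not Catalyst's real `MapObjects` or `Invoke` signatures.

```scala
// Toy model of the rewrite; all shapes are hypothetical simplifications.
object MapObjectsSketch {
  sealed trait Expr
  case class Input(name: String) extends Expr
  case object LambdaVariable extends Expr
  case class Cast(child: Expr) extends Expr
  case class MapObjects(
      lambdaBody: Expr,
      input: Expr,
      customCollectionCls: Option[Class[_]]) extends Expr
  case class Invoke(target: Expr, funcName: String) extends Expr

  // Remove MapObjects only when the lambda is the bare variable (identity)
  // and no custom collection class has to be built.
  def eliminate(e: Expr): Expr = e match {
    case Invoke(MapObjects(LambdaVariable, input, None), funcName) =>
      Invoke(input, funcName)
    case other => other
  }
}
```

In this sketch a `MapObjects` with `Some(...)` as its collection class is left untouched, which mirrors the reason given above for keeping the `None` check.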

funcName, returnType: ObjectType, arguments, propagateNull, returnNullable),
outputObjAttr, child) =>
DeserializeToObject(Invoke(
inputData, funcName, returnType, arguments, propagateNull, returnNullable),
outputObjAttr, child)
}
}
@@ -28,11 +28,17 @@ import org.apache.spark.sql.catalyst.rules.RuleExecutor
import org.apache.spark.sql.types._

class EliminateMapObjectsSuite extends PlanTest {
object Optimize extends RuleExecutor[LogicalPlan] {
val batches =
class Optimize(addSimplifyCast: Boolean) extends RuleExecutor[LogicalPlan] {
val batches = if (addSimplifyCast) {
Batch("EliminateMapObjects", FixedPoint(50),
NullPropagation(conf),
SimplifyCasts,
EliminateMapObjects) :: Nil
} else {
Batch("EliminateMapObjects", FixedPoint(50),
NullPropagation(conf),
EliminateMapObjects) :: Nil
}
}

implicit private def intArrayEncoder = ExpressionEncoder[Array[Int]]()
@@ -42,19 +48,23 @@ class EliminateMapObjectsSuite extends PlanTest {
val intObjType = ObjectType(classOf[Array[Int]])
val intInput = LocalRelation('a.array(ArrayType(IntegerType, false)))
val intQuery = intInput.deserialize[Array[Int]].analyze
val intOptimized = Optimize.execute(intQuery)
val intExpected = DeserializeToObject(
Invoke(intInput.output(0), "toIntArray", intObjType, Nil, true, false),
AttributeReference("obj", intObjType, true)(), intInput)
comparePlans(intOptimized, intExpected)
Seq(true, false).foreach { addSimplifyCast =>
val intOptimized = new Optimize(addSimplifyCast).execute(intQuery)
val intExpected = DeserializeToObject(
Invoke(intInput.output(0), "toIntArray", intObjType, Nil, true, false),
AttributeReference("obj", intObjType, true)(), intInput)
comparePlans(intOptimized, intExpected)
}

val doubleObjType = ObjectType(classOf[Array[Double]])
val doubleInput = LocalRelation('a.array(ArrayType(DoubleType, false)))
val doubleQuery = doubleInput.deserialize[Array[Double]].analyze
val doubleOptimized = Optimize.execute(doubleQuery)
val doubleExpected = DeserializeToObject(
Invoke(doubleInput.output(0), "toDoubleArray", doubleObjType, Nil, true, false),
AttributeReference("obj", doubleObjType, true)(), doubleInput)
comparePlans(doubleOptimized, doubleExpected)
Seq(true, false).foreach { addSimplifyCast =>
val doubleOptimized = new Optimize(addSimplifyCast).execute(doubleQuery)
val doubleExpected = DeserializeToObject(
Invoke(doubleInput.output(0), "toDoubleArray", doubleObjType, Nil, true, false),
AttributeReference("obj", doubleObjType, true)(), doubleInput)
comparePlans(doubleOptimized, doubleExpected)
}
}
}