Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
address comments.
  • Loading branch information
gatorsmile committed Nov 3, 2016
commit 2364cc292e0f6585a4647225446b1044c723908c
Original file line number Diff line number Diff line change
Expand Up @@ -95,8 +95,7 @@ case class FilterExec(condition: Expression, child: SparkPlan)

// If one expression and its children are null intolerant, it is null intolerant.
private def isNullIntolerant(expr: Expression): Boolean = expr match {
case e: NullIntolerant =>
if (e.isInstanceOf[LeafExpression]) true else e.children.forall(isNullIntolerant)
case e: NullIntolerant => e.children.forall(isNullIntolerant)
case _ => false
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1667,6 +1667,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
expr = "_1 + _2", expectedNonNullableColumns = Seq("_1", "_2"))
verifyNullabilityInFilterExec(df,
expr = "_1", expectedNonNullableColumns = Seq("_1"))
// `constructIsNotNullConstraints` infers the IsNotNull(_2) from IsNotNull(_2 + Rand())
// Thus, we are able to set nullability of _2 to false.
// If IsNotNull(_2) is not given from `constructIsNotNullConstraints`, the impl of
// isNullIntolerant in `FilterExec` needs an update for more advanced inference.
verifyNullabilityInFilterExec(df,
expr = "_2 + Rand()", expectedNonNullableColumns = Seq("_2"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not work in current approach. It works now because we infer redundant IsNotNull constraints. E.g., if Filter has a constraint IsNotNull(_2 + Rand()), we will infer another IsNotNull(_2) from it. Your approach is working on IsNotNull(_2) to decide _2 is non-nullable column, not IsNotNull(_2 + Rand()).

I submitted another PR #15653 for redundant IsNotNull constraints. But I am not sure if we want to fix it since it doesn't affect correctness. I left that to @cloud-fan or @hvanhovell to decide it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already explained why my current solution works in my previous statement. Personally, I like simple code, which is easy to understand and maintain, especially when it can cover all the cases. Result correctness and code maintainability are alwasy more important.

If constructIsNotNullConstraints is changed by somebody else (i.e., it does not provide the expected IsNotNull constraints), the test cases added by this PR will fail. Then, we can modify the codes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so I said I will left that to @cloud-fan or others to decide...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the simplicity argument but can you please add a comment here explaining why this particular case is working due to the null inference rule?

verifyNullabilityInFilterExec(df,
Expand Down