-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-17981] [SPARK-17957] [SQL] Fix Incorrect Nullability Setting to False in FilterExec #15523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
54c3cc8
ce418f9
52cb8fb
c25df4d
4f2101e
49daace
2364cc2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -87,7 +87,14 @@ case class FilterExec(condition: Expression, child: SparkPlan) | |
|
|
||
| // Split out all the IsNotNulls from condition. | ||
| private val (notNullPreds, otherPreds) = splitConjunctivePredicates(condition).partition { | ||
| case IsNotNull(a: NullIntolerant) if a.references.subsetOf(child.outputSet) => true | ||
| case IsNotNull(a) => isNullIntolerant(a) && a.references.subsetOf(child.outputSet) | ||
| case _ => false | ||
| } | ||
|
|
||
| // One expression is null intolerant iff it and its children are null intolerant | ||
| private def isNullIntolerant(expr: Expression): Boolean = expr match { | ||
| case e: NullIntolerant => | ||
| if (e.isInstanceOf[LeafExpression]) true else e.children.forall(isNullIntolerant) | ||
|
||
| case _ => false | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
This change is too conservative. Actually we only need to consider a non
NullIntolerantexpression when it contains the attributes in the output. I think we can do more aggressive way. E.g.,There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized the original code was from your PR. Then, in your above code, why you still need to keep
a.references.subsetOf(child.outputSet)? It looks confusing to me.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even
apassed the check ofisNullIntolerant, i.e., it has not nonNullIntolerantwhich wraps output attributes. If it doesn't refer to any output attributes, we don't need it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you show me an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsNotNull(Rand() > 0.5)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh, I see.
First, we definitely need test cases to cover each positive and negative scenario. Previously, we did not have any test case to check the validity of nullability changes. Second, the code needs more comments when the variable/function names are not able to explain the codes.