-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26459][SQL] replace UpdateNullabilityInAttributeReferences with FixNullability #23390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this rule out of the Analyzer, so that it can be used in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is from the UpdateNullabilityInAttributeReferences. Leaf nodes don't have child and no nullability will be updated, then the case below is noop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the impl of resolveOperatorsUp, if the plan is analyzed, this rule will not take any effect in the optimizer stage. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the plan is analyzed
More precisely, the _analyzed flag is true.
This flag will be reset to false if the plan changed(plan copy happened). If it's true, then the plan is not changed since last analysis and we don't need to update the nullability.
|
Test build #100478 has finished for PR 23390 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Analyzer.scala is too big, let's make this as a new file, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about UpdateNullability -> UpdateAttributeNullability?
|
If you don't mind, shall we have a new JIRA issue for this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: plz update the comment inside this test: UpdateNullabilityInAttributeReferences -> UpdateNullability
|
LGTM |
|
Could you adjust the PR title and description by mentioning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this check even after the resolution batch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no we don't. But it has no harm, and help us to merge these 2 rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still looks weird to me if we call an analyzer rule in the optimizer. Our codegen impl depends on the correctness of nullability fields. I am wondering which rule could break it? join reordering?
I think our existing test cases might already have such a case. Could you throw an exception if this rule changes the nullability in the optimizer stage? I want to know the exact case why we need to run this in the optimizer stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original PR #18576 explains it, and the test case is https://github.com/apache/spark/pull/23390/files#diff-099c363f75cfc9011d9e08f5a8067038R29
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, if we introduce fixed-size array type, then CreateArray returns fixed-size array, and GetArrayItem can define the nullable smarter if the input is fixed-size array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu do you have more use cases? If it's the only use case, maybe we can simply remove this optimization as its use case is rare. And we can optimize it in a better way in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to understand which cases are improved and then update the nullable at the right place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Removing it from the optimizer looks ok to me, but I remember the rule seems to be related to the existing tests? See: #18576 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we accept this patch and think about removing this optimization later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, that sounds good to me. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When removing it in a following pr, could you reopen the jira, too? https://issues.apache.org/jira/browse/SPARK-21351
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, feel free to merge this PR if you think it's ready to go. thanks!
|
Test build #100486 has finished for PR 23390 at commit
|
|
Test build #100487 has finished for PR 23390 at commit
|
|
Retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just have a map exprId -> nullable here and use this map is the transformExpressions below?
|
Test build #100492 has finished for PR 23390 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds weird that in optimization phase we make correct nullability to wrong again. Can we have other way to solve it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not about wrong nullability, it's an optimization. see https://github.com/apache/spark/pull/23390/files#r244326583
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see. Then sounds like removing this optimization (https://github.com/apache/spark/pull/23390/files#r244326906) is ok.
|
Test build #100732 has finished for PR 23390 at commit
|
|
retest this please |
|
Pending Jenkins |
|
Test build #100951 has finished for PR 23390 at commit
|
|
retest this please |
|
Test build #100957 has finished for PR 23390 at commit
|
|
@cloud-fan sorry, but could you resolve the conflict? |
|
Thanks, pending Jenkins |
|
Test build #100999 has finished for PR 23390 at commit
|
|
Thanks! merging to master. |
…h FixNullability ## What changes were proposed in this pull request? This is a followup of apache#18576 The newly added rule `UpdateNullabilityInAttributeReferences` does the same thing the `FixNullability` does, we only need to keep one of them. This PR removes `UpdateNullabilityInAttributeReferences`, and use `FixNullability` to replace it. Also rename it to `UpdateAttributeNullability` ## How was this patch tested? existing tests Closes apache#23390 from cloud-fan/nullable. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
…e optimizer ## What changes were proposed in this pull request? This pr removed `UpdateAttributeNullability` from the optimizer because the same logic happens in the analyzer. See SPARK-26459(#23390) for more detailed discussion. ## How was this patch tested? N/A Closes #23508 from maropu/SPARK-21351. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This is a followup of #18576
The newly added rule
UpdateNullabilityInAttributeReferencesdoes the same thing theFixNullabilitydoes, we only need to keep one of them.This PR removes
UpdateNullabilityInAttributeReferences, and useFixNullabilityto replace it. Also rename it toUpdateAttributeNullabilityHow was this patch tested?
existing tests