[SPARK-18578][SQL] Full outer join in correlated subquery returns incorrect results #16005
…rrect results

## What changes were proposed in this pull request?

This patch fixes the incorrect results in the rule ResolveSubquery in Catalyst's Analysis phase.

## How was this patch tested?

./dev/run-tests and a new unit test on the problematic pattern.
    // in a Full (Outer) Join operator and its descendants
    case j @ Join(left, right, FullOuter, _) =>
      failOnOuterReference(j)
      failOnOuterReferenceInSubTree(left, "a FULL OUTER JOIN")
If you call failOnOuterReferenceInSubTree(j, "a FULL OUTER JOIN") you only need to do that once.
Right.
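The review suggestion above can be illustrated with a toy model. This is only a sketch: `Plan`, `Expr`, `OuterRef`, `subTreeHasOuterReference`, and `checkFullOuterJoin` are hypothetical stand-ins and not Catalyst's classes or the PR's actual helpers. It shows why a single recursive check rooted at the join node covers the join condition and both operands, so separate calls for `left` and `right` are unnecessary.

```scala
// Toy model only: none of these types are Spark Catalyst classes.
object OuterRefCheckSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  // Hypothetical marker for a reference to a column of the outer query.
  final case class OuterRef(name: String) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr

  sealed trait Plan {
    def children: Seq[Plan]
    def expressions: Seq[Expr]
  }
  final case class Relation(name: String) extends Plan {
    def children: Seq[Plan] = Nil
    def expressions: Seq[Expr] = Nil
  }
  final case class Filter(condition: Expr, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def expressions: Seq[Expr] = Seq(condition)
  }
  final case class FullOuterJoin(left: Plan, right: Plan, condition: Option[Expr]) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
    def expressions: Seq[Expr] = condition.toList
  }

  // True if the expression mentions an outer (correlated) reference anywhere.
  def containsOuter(e: Expr): Boolean = e match {
    case OuterRef(_)   => true
    case EqualTo(l, r) => containsOuter(l) || containsOuter(r)
    case Attr(_)       => false
  }

  // Checks the node's own expressions, then recurses into every child.
  def subTreeHasOuterReference(plan: Plan): Boolean =
    plan.expressions.exists(containsOuter) || plan.children.exists(subTreeHasOuterReference)

  // One call on the join itself covers its condition, its left operand, and its right operand.
  def checkFullOuterJoin(j: FullOuterJoin): Either[String, Unit] =
    if (subTreeHasOuterReference(j))
      Left("outer reference found in a FULL OUTER JOIN or its descendants")
    else Right(())
}
```

In this toy model, an outer reference placed in the join condition, in the left operand, or in the right operand is caught by the same call on the join node.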
@hvanhovell I want to get your opinion on this. The more I read the code in this block of The code disallows operators in a sub plan of an operator hosting correlation on a case by case basis. As it is today, it only blocks Union/Intersect/Except/Expand/LocalLimit/GlobalLimit/Sample/FOJ and the right table of LOJ (and the left table of ROJ). That means any LogicalPlan operators that are not in the list above are permitted to be under a correlation point. Is this risky? There are many (30+ at least, from browsing the LogicalPlan type hierarchy) operators derived from the LogicalPlan class. Should we whitelist which operators are allowed? For the case of ScalarSubquery, it explicitly checks that only SubqueryAlias/Project/Filter/Aggregate are allowed (CheckAnalysis.scala around line 126-165 in and after
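The difference between the two approaches raised in this comment can be sketched with a toy plan tree. Everything here is an assumption for illustration: `Node`, `passesDenyList`, `passesAllowList`, and the operator name `SomeNewOperator` are hypothetical, operators are represented as plain strings rather than Catalyst classes, and `Relation` is added to the allow list only so that toy plans can have a leaf.

```scala
// Toy model only: operators are plain strings, not Catalyst LogicalPlan classes.
object OperatorAllowListSketch {
  final case class Node(operator: String, children: Seq[Node] = Nil)

  // Operators named in the comment as currently blocked (join cases omitted for brevity).
  val denied: Set[String] =
    Set("Union", "Intersect", "Except", "Expand", "LocalLimit", "GlobalLimit", "Sample")

  // Operators named in the comment as allowed for ScalarSubquery,
  // plus a hypothetical leaf "Relation" so the toy plans are complete.
  val allowed: Set[String] =
    Set("SubqueryAlias", "Project", "Filter", "Aggregate", "Relation")

  // Flattens the tree into the list of operator names it contains.
  def operators(n: Node): Seq[String] = n.operator +: n.children.flatMap(operators)

  // Deny list: anything not explicitly blocked is accepted.
  def passesDenyList(n: Node): Boolean = !operators(n).exists(denied)

  // Allow list: anything not explicitly permitted is rejected.
  def passesAllowList(n: Node): Boolean = operators(n).forall(allowed)
}
```

With a plan such as `Node("Project", Seq(Node("SomeNewOperator", Seq(Node("Relation")))))`, the deny-list check accepts the unknown operator while the allow-list check rejects it, which is the risk the comment describes for operators nobody has reviewed yet.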
@nsyca +1000 on whitelisting operators. That is what we should have done from the start. Let's break it down:
Test build #69135 has finished for PR 16005 at commit
@hvanhovell I will work on the whitelist in a new JIRA under SPARK-18455. It will be my top priority task and I hope we can merge it in the next minor release of 2.0.x. Let's have this PR scoped for the FOJ and Window cases. Shall we?
Test build #69137 has finished for PR 16005 at commit
LGTM. Merging to master/2.1. Thanks!
…orrect results

## What changes were proposed in this pull request?

- Raise Analysis exception when correlated predicates exist in the descendant operators of either operand of a Full outer join in a subquery as well as in a FOJ operator itself
- Raise Analysis exception when correlated predicates exist in a Window operator (a side effect inadvertently introduced by SPARK-17348)

## How was this patch tested?

Run sql/test catalyst/test and new test cases, added to SubquerySuite, showing the reported incorrect results.

Author: Nattavut Sutyanyong <[email protected]>

Closes #16005 from nsyca/FOJ-incorrect.1.

(cherry picked from commit a367d5f)
Signed-off-by: Herman van Hovell <[email protected]>

Hmmm. I cannot merge to 2.0 :(... Can you open a backport against 2.0?
What changes were proposed in this pull request?
How was this patch tested?
Run sql/test catalyst/test and new test cases, added to SubquerySuite, showing the reported incorrect results.