Conversation

@cloud-fan (Contributor) commented Dec 27, 2017

What changes were proposed in this pull request?

I found this problem while auditing the analyzer code. It's dangerous to introduce an extra AnalysisBarrier during analysis, because the plan inside it will bypass all subsequent analysis, which may not be expected. We should only preserve existing AnalysisBarriers, not introduce new ones.
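To illustrate why this is dangerous, here is a minimal toy sketch (not the actual Spark source; all names are simplified stand-ins): analyzer rules applied through a resolveOperators-style traversal never descend into a barrier, so anything a newly introduced barrier wraps silently skips the remaining rules.

```scala
// Toy model of plans and a barrier node (illustrative, not Spark's classes).
sealed trait Plan
case class Relation(name: String, resolved: Boolean) extends Plan
case class Project(child: Plan) extends Plan
case class AnalysisBarrier(child: Plan) extends Plan

// Apply `rule` top-down, but never descend into an AnalysisBarrier:
// its subtree is treated as already analyzed and is skipped entirely.
def resolveOperators(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = plan match {
  case b: AnalysisBarrier => b // traversal stops here
  case other =>
    rule.applyOrElse(other, identity[Plan]) match {
      case Project(child)     => Project(resolveOperators(child)(rule))
      case AnalysisBarrier(c) => AnalysisBarrier(c) // rule produced a barrier: stop
      case leaf               => leaf
    }
}

// A toy analyzer rule that marks relations as resolved.
val resolveRelations: PartialFunction[Plan, Plan] = {
  case Relation(name, false) => Relation(name, resolved = true)
}

val plain     = Project(Relation("t", resolved = false))
val barriered = Project(AnalysisBarrier(Relation("t", resolved = false)))

// The plain plan is fully analyzed; the barrier silently blocks the rule.
resolveOperators(plain)(resolveRelations)     // Project(Relation("t", true))
resolveOperators(barriered)(resolveRelations) // unchanged: relation stays unresolved
```

A plan wrapped mid-analysis therefore never sees the rules that come after, which is exactly the bug class this PR guards against.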

How was this patch tested?

existing tests

@cloud-fan (Contributor, Author)

cc @viirya @gatorsmile

@cloud-fan (Contributor, Author):

just make the names shorter

@cloud-fan (Contributor, Author):

I refactored the code to resolve expressions and add missing attributes in one shot, so that we have a central place to deal with analysis barrier and to decide which operator is supported and which is not.
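As a rough illustration of the "one shot" shape (toy types and names, not the actual Spark code), a single recursive method can both resolve expressions and widen projections in one pass, so there is one central match deciding which operators are supported:

```scala
// Hypothetical sketch: resolve expressions against a plan and, in the same
// recursion, rewrite the plan so attributes missing from an operator's
// output become available. All types here are simplified stand-ins.
sealed trait Expr
case class UnresolvedAttr(name: String) extends Expr
case class Attr(name: String) extends Expr

sealed trait LogicalPlan
case class Table(output: Set[String]) extends LogicalPlan
case class Proj(projectList: Set[String], child: LogicalPlan) extends LogicalPlan

def resolveAndAddMissing(
    exprs: Seq[Expr], plan: LogicalPlan): (Seq[Expr], LogicalPlan) = plan match {
  case t @ Table(out) =>
    // Leaf: resolve whatever the table can provide.
    (exprs.map {
      case UnresolvedAttr(n) if out(n) => Attr(n)
      case e                           => e
    }, t)
  case Proj(list, child) =>
    // Resolve against the child first, then widen this projection so any
    // newly resolved attributes it was hiding become part of its output.
    val (resolved, newChild) = resolveAndAddMissing(exprs, child)
    val missing = resolved.collect { case Attr(n) if !list(n) => n }
    (resolved, Proj(list ++ missing, newChild))
}

val plan = Proj(Set("a"), Table(Set("a", "b")))
resolveAndAddMissing(Seq(UnresolvedAttr("b")), plan)
// resolves `b` and rewrites the projection to Proj(Set("a", "b"), ...)
```

Because resolution and attribute addition share one traversal, a barrier case (or an unsupported operator) only has to be handled in one place.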

@SparkQA commented Dec 27, 2017

Test build #85439 has finished for PR 20094 at commit 64709fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member:

If the right plan is wrapped in an analysis barrier (e.g., when we join two datasets), then calling right.collect later doesn't work.

Member:

Oh, I see, you recursively call dedupRight on it.

@viirya (Member) commented Dec 28, 2017:

newRight was previously introduced so that it could be wrapped in AnalysisBarrier. We can get rid of this redundant variable now.

case d: Distinct =>
(exprs.map(resolveExpression(_, d)), d)

case u: UnaryNode =>
Member:

Shouldn't we stop at SubqueryAlias as before?

@cloud-fan (Contributor, Author):

Ah, good catch! I missed that because the logic was in resolveExpressionRecursively instead of addMissingAttr.

It shows that merging these two methods makes things clearer :)
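For context, the boundary in question can be sketched with toy types (illustrative names, not the Spark implementation): the traversal descends through unary operators but treats SubqueryAlias as a stopping point, so attributes are never pulled up through an alias.

```scala
// Toy sketch of the stopping rule discussed above: attributes below a
// SubqueryAlias are not reachable, matching the old addMissingAttr behavior.
sealed trait Node { def output: Set[String] }
case class Leaf(output: Set[String], hidden: Set[String]) extends Node
case class SubqueryAlias(name: String, child: Node) extends Node {
  def output: Set[String] = child.output
}
case class Unary(output: Set[String], child: Node) extends Node

// Attribute names an expression could be resolved against from `n`.
def resolvableFrom(n: Node): Set[String] = n match {
  case Leaf(out, hidden)       => out ++ hidden
  case SubqueryAlias(_, child) => child.output // stop: only the visible output
  case Unary(out, child)       => out ++ resolvableFrom(child)
}

val plan = Unary(Set("a"), SubqueryAlias("t", Unary(Set("b"), Leaf(Set("b"), Set("c")))))
resolvableFrom(plan) // Set("a", "b"): "c" is not reachable through the alias
```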

@viirya (Member) commented Dec 28, 2017

LGTM with two minor comments.

@SparkQA commented Dec 28, 2017

Test build #85452 has finished for PR 20094 at commit 8879870.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 28, 2017

Test build #85450 has finished for PR 20094 at commit cd39760.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 28, 2017

Test build #85453 has finished for PR 20094 at commit 6a25d60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) left a comment:

LGTM

@gatorsmile (Member):

Thanks! Merged to master.

@asfgit closed this in 755f2f5 on Dec 28, 2017