[SPARK-20612][SQL] Throw exception when there is unresolvable attributes in Filter #17874
Conversation
Is it a bug? IIRC this is intentional, so that the DataFrame behavior is consistent with SQL.
Test build #76495 has finished for PR 17874 at commit
The rule was added by #12235. From its description and code comment, it should only be used for a HAVING clause that accesses a grouping column not present in the SELECT clause, rather than as a general rule for adding missing attributes to Filter.
@cloud-fan This rule could make the query work: But the where condition should not be able to refer
Test build #76538 has finished for PR 17874 at commit
 * projection, so that they will be available during sorting. Another projection is added to
 * remove these attributes after sorting.
 *
 * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
This is by design.
For example,
select type, avg (price)
from titles
group by type
having sum (total_sales) > 10000
This example is copied from Sybase ASE. I believe this is part of Transact-SQL.
This is by design.
For "the HAVING clause could also use a grouping column that is not present in the SELECT", yes.
For other general cases, I doubt it.
We have another rule that does this (HAVING clause with grouping columns). That is why the tests still pass after this rule is removed. The above query also works without this rule.
Since we introduced this by accident, I do not think we can remove it now. It could break the applications that are built on it. cc @rxin @cloud-fan @marmbrus
In Postgres, this one we should not support; we should not add missing attributes through subqueries.
It seems to me Spark also parses the above SQL query this way. SQL systems define an order of evaluation. E.g., MySQL:
Maybe another point of view is, we can split
val model = new FPGrowth().setMinSupport(0.7).fit(dataset)
val prediction = model.transform(df)
assert(prediction.select("prediction").where("id=3").first().getSeq[String](0).isEmpty)
assert(prediction.where("id=3").select("prediction").first().getSeq[String](0).isEmpty)
I'm worried that existing Spark applications may already use this pattern, so whether or not it's a bug, it is effectively a feature now and we can't break it...
@cloud-fan Since you are all concerned about breaking existing applications, I'll close this. But I think we should not add missing attributes through subqueries, as I showed above. I'll create another PR to fix it. What do you think?
Yeah, that example looks very weird and we should fix it, thanks!
What changes were proposed in this pull request?
We have a rule in Analyzer that adds missing attributes in a Filter into its child plan. It makes the following code work:
It should throw an analysis exception instead of implicitly adding the missing attributes into the underlying plan.
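The snippet that originally accompanied this description is not preserved on this page. As a rough illustration only, the behavior under discussion can be modeled with a toy plan tree in plain Scala. Everything here (`Plan`, `Relation`, `Project`, `Filter`, `AddMissingAttributes`, and the `strict` flag) is a hypothetical stand-in and not Spark's Catalyst classes or the actual analyzer rule. The sketch assumes a Filter directly on top of a Project, and shows both the current lenient behavior and the stricter behavior this PR proposes.

```scala
// Toy logical plan. These types are hypothetical and are NOT Spark's Catalyst classes.
sealed trait Plan { def output: Seq[String] }

case class Relation(columns: Seq[String]) extends Plan {
  def output: Seq[String] = columns
}

case class Project(columns: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = columns
}

// `references` lists the column names used by the filter condition.
case class Filter(references: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = child.output
}

object AddMissingAttributes {
  // strict = false: widen the child projection so the filter can see the missing columns.
  // strict = true:  report an analysis error instead (the behavior proposed in this PR).
  def apply(plan: Plan, strict: Boolean): Either[String, Plan] = plan match {
    case f @ Filter(refs, child) =>
      val missing = refs.distinct.filterNot(child.output.contains)
      if (missing.isEmpty) {
        Right(f)
      } else if (strict) {
        Left(s"cannot resolve ${missing.mkString(", ")} given input columns ${child.output.mkString(", ")}")
      } else {
        child match {
          case Project(cols, grandChild) if missing.forall(grandChild.output.contains) =>
            // Add the missing columns below the filter, then project them away again on top.
            Right(Project(cols, Filter(refs, Project(cols ++ missing, grandChild))))
          case _ =>
            Left(s"cannot resolve ${missing.mkString(", ")} given input columns ${child.output.mkString(", ")}")
        }
      }
    case other => Right(other)
  }
}
```

For a plan shaped like `prediction.select("prediction").where("id=3")`, that is `Filter(Seq("id"), Project(Seq("prediction"), Relation(Seq("id", "prediction"))))`, the lenient mode rewrites the plan so that `id` is carried through the inner projection and dropped again by an outer one, while the strict mode returns an error. The concern raised in the review is that existing applications may depend on the lenient outcome.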
How was this patch tested?
Jenkins tests.
Please review http://spark.apache.org/contributing.html before opening a pull request.