[SPARK-6145][SQL] fix ORDER BY on nested fields #4918
Closed · 2 commits · +14 −3
I'd rather use `checkAnswer` for these tests, especially if we are going to put them in `SQLQuerySuite`. When `checkAnswer` fails it gives nice exceptions, and we also get to test end to end.
I've tried the latest master code in Spark SQL CLI:
BTW, in Hive
It seems Hive has bugs with these ambiguous attribute references, which is why I think we probably need to change that code:
https://github.com/apache/spark/pull/4892/files#diff-27c76f96a7b2733ecfd6f46a1716e153R201
Or it seems Hive only supports `ORDER BY` on attributes that are listed in the projection list.
Oh right, `analyzed` is not actually checking analysis. Ugh... My mistake.
I think the bug here is that we are partially analyzing nested field accesses. We should not resolve the `a` in `a.a` unless we can also resolve the field access too.
The fact that Hive only supports ordering on things from the `SELECT` clause sounds like a bug to me. That is not how the SQL spec works, right?
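To make the "do not partially resolve" point concrete, here is a minimal toy sketch in Python. It is not Spark's analyzer: the `resolve` function and the dictionary-based schemas are hypothetical and only illustrate resolving a dotted reference all-or-nothing, so that a reference whose prefix matches but whose field access does not is left unresolved for a later attempt against a different input.

```python
from typing import Optional, Tuple


def resolve(name: str, schema: dict) -> Optional[Tuple[str, ...]]:
    """Resolve a dotted reference such as "a.a" all-or-nothing.

    `schema` maps a column name to a dict of nested fields, or to None
    when the column is not a struct (hypothetical toy representation).
    Returns the full path when every part resolves, otherwise None.
    """
    parts = name.split(".")
    current = schema
    for part in parts:
        if not isinstance(current, dict) or part not in current:
            return None  # never hand back a half-resolved prefix
        current = current[part]
    return tuple(parts)


# Output of a projection that exposes a non-struct column named "a".
projected = {"a": None}
# Output of the underlying relation, where "a" is a struct with a field "a".
relation = {"a": {"a": None, "b": None}}

# "a" matches in the projection, but the field access cannot be resolved,
# so the whole reference stays unresolved instead of binding to the wrong "a".
print(resolve("a.a", projected))
# Against the relation the complete path resolves.
print(resolve("a.a", relation))
```

With this rule, the sort expression is only bound once the entire path can be resolved against some input, which is the behaviour argued for above.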
Supporting ordering on more than the attributes in the SELECT clause should be a feature of Spark SQL, so it seems we may not be able to keep the same naming convention as Hive for nested data access. But will that break lots of stuff?
If you end up making things ambiguous, why not just alias the unnesting manually? I do not think it is okay to change the default unnesting alias anymore.
Yes, that's exactly what I described in https://github.com/apache/spark/pull/4892/files#diff-27c76f96a7b2733ecfd6f46a1716e153R201
But the bug you raised in #4892 is quite interesting
Anyway, I will investigate how database systems other than Hive handle this.
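As one quick data point for that kind of comparison, the sketch below uses SQLite through Python's standard `sqlite3` module. The table `t` and its columns are made up for illustration, and SQLite has no nested fields, so this only checks the simpler question of whether `ORDER BY` may reference a column that is not in the projection list.

```python
import sqlite3

# In-memory database with a hypothetical two-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(3, "c"), (1, "a"), (2, "b")])

# ORDER BY references k, which does not appear in the SELECT list.
rows = conn.execute("SELECT v FROM t ORDER BY k").fetchall()
print(rows)  # [('a',), ('b',), ('c',)]
```

SQLite accepts the query and sorts by the non-projected column. This says nothing about how nested field references are named or disambiguated in other systems.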