[SPARK-23914][SQL] Add array_union function #21061
Closed
Changes from 1 commit
Commits
34 commits
dc9d6f0 initial commit (kiszk)
3019840 update description (kiszk)
8cee6cf fix test failure (kiszk)
2041ec4 address review comments (kiszk)
8c2280b introduce ArraySetUtils to reuse code among array_union/array_interse… (kiszk)
b3a3132 fix python test failure (kiszk)
a2c7dd1 fix python test failure (kiszk)
5313680 simplification (kiszk)
98f8d1f fix pyspark test failure (kiszk)
30ee7fc address review comments (kiszk)
cd347e9 add new tests based on review comment (kiszk)
d2eaee3 fix mistakes in rebase (kiszk)
2ddeb06 fix unexpected changes (kiszk)
71b31f0 merge changes in #21103 (kiszk)
7e71340 use GenericArrayData if UnsafeArrayData cannot be used (kiszk)
04c97c3 use BinaryArrayExpressionWithImplicitCast (kiszk)
401ca7a update test cases (kiszk)
15b953b rebase with master (kiszk)
f050922 support complex types (kiszk)
8a27667 add test cases with duplication in an array (kiszk)
e50bc55 rebase with master (kiszk)
7e3f2ef address review comments (kiszk)
e5401e7 address review comment (kiszk)
3e21e48 keep the order of input array elements (kiszk)
3c39506 address review comments (kiszk)
6654742 fix scala style error (kiszk)
be9f331 address review comment (kiszk)
90e84b3 address review comments (kiszk)
6f721f0 address review comments (kiszk)
0c0d3ba address review comments (kiszk)
4a217bc cleanup (kiszk)
f5ebbe8 eliminate duplicated code (kiszk)
763a1f8 address review comments (kiszk)
7b51564 address review comment (kiszk)
address review comment
commit 7b515649c75dc68d5d74ebca15628b6377a05344
Once we have the unique elements of the two arrays in the hash set, can't we build the final array from it directly instead of scanning the two arrays again?
We could. Originally, I took that approach.
After discussing with @ueshin, I decided to generate the result array from the original arrays instead of from the hash set. This way the result array has a single deterministic order across the different code paths in array_union.
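The ordering argument above can be illustrated with a minimal sketch in plain Python. This is not the code from this PR: the function names `array_union_ordered` and `array_union_from_set` are hypothetical, and Python's `set` merely stands in for the hash set discussed here. The first function walks the input arrays and uses the set only for membership checks, so the output order is fixed by the inputs; the second reads the result out of the set, so its order depends on the set implementation.

```python
def array_union_ordered(left, right):
    """Union of two lists, keeping first-occurrence order (left, then right)."""
    seen = set()   # hash set used only to detect duplicates
    result = []
    for array in (left, right):
        for element in array:
            if element not in seen:
                seen.add(element)
                result.append(element)
    return result


def array_union_from_set(left, right):
    """Alternative: take the elements straight out of the hash set.

    The same elements are returned, but their order is whatever the
    set's iteration order happens to be, not the order of the inputs.
    """
    return list(set(left) | set(right))


ordered = array_union_ordered([3, 1, 2, 1], [2, 5, 1])
from_set = array_union_from_set([3, 1, 2, 1], [2, 5, 1])
```

Both variants return the same set of elements. Only the first guarantees that every caller sees them in the same order, at the price of walking the input arrays rather than just draining the set.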
I see, though I think this will come with some performance cost.