[SPARK-17187][SQL] Supports using arbitrary Java object as internal aggregation buffer object #14753
Changes from 1 commit (of 12 commits)
Commits
10861b2 object aggregation buffer (clockfly)
0fdc1ea fix comments (clockfly)
d3108ab fix review comments (clockfly)
2873765 fix review comments (clockfly)
7190eb0 fix review comments (clockfly)
5904bcd on viirya's comment (clockfly)
8c8bd9a on yin's comment (clockfly)
7e7cb85 On wenchen's comment (clockfly)
86166a1 On wenchen's comment (clockfly)
e060d21 On wenchen's comment (clockfly)
ac8e36a add test for nullable aggregation function (clockfly)
ca574e1 use while loop (yhuai)
fix review comments
commit 7190eb0c2a4dce2c5b84c29fb90bb2def23a3520
@@ -432,6 +432,12 @@ abstract class DeclarativeAggregate
  * 4. After processing all input aggregation objects of current group (group by key), the framework
  * calls method `eval(buffer: T)` to generate the final output for this group.
  * 5. The framework moves on to next group, until all groups have been processed.
+ *
+ * NOTE: SQL with TypedImperativeAggregate functions is planned in sort based aggregation,
+ * instead of hash based aggregation, as TypedImperativeAggregate use BinaryType as aggregation
+ * buffer's storage format, which is not supported by hash based aggregation. Hash based
+ * aggregation only support aggregation buffer of mutable types (like LongType, IntType that have
+ * fixed length and can be mutated in place in UnsafeRow)
  */
 abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {

@@ -507,8 +513,9 @@ abstract class TypedImperativeAggregate[T] extends ImperativeAggregate {
     }
   }

+  private[this] val anyObjectType = ObjectType(classOf[AnyRef])
   private def getField[U](input: InternalRow, fieldIndex: Int): U = {
-    input.get(fieldIndex, null).asInstanceOf[U]
+    input.get(fieldIndex, anyObjectType).asInstanceOf[U]
   }

   final override lazy val aggBufferAttributes: Seq[AttributeReference] = {

Contributor (review comment on the `getField` line): Seems we only need
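The lifecycle described in the doc comment, and the reason the NOTE gives for planning these functions with sort based aggregation, can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `SimpleTypedAgg`, `CountDistinct`, and `SortBasedDemo` are hypothetical names invented here, not Spark classes, and the real framework works on `InternalRow` and serialized buffers rather than plain Scala collections. The point it shows is that a buffer of arbitrary object type (here a growable set) has no fixed length to mutate in place, but is easy to handle when rows arrive sorted by group key.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a typed aggregate (NOT Spark's API).
trait SimpleTypedAgg[IN, T, OUT] {
  def createAggregationBuffer(): T
  def update(buffer: T, input: IN): Unit
  def eval(buffer: T): OUT
}

// The buffer is an arbitrary JVM object: a set that grows with the input,
// so it cannot be stored as a fixed-length field and mutated in place.
object CountDistinct extends SimpleTypedAgg[String, mutable.HashSet[String], Int] {
  def createAggregationBuffer(): mutable.HashSet[String] = mutable.HashSet.empty[String]
  def update(buffer: mutable.HashSet[String], input: String): Unit = { buffer += input }
  def eval(buffer: mutable.HashSet[String]): Int = buffer.size
}

object SortBasedDemo {
  // Sort rows by group key, then walk each run of equal keys with one buffer.
  def aggregate[K: Ordering, IN, T, OUT](
      rows: Seq[(K, IN)],
      agg: SimpleTypedAgg[IN, T, OUT]): Seq[(K, OUT)] = {
    val sorted = rows.sortBy(_._1).toVector
    val out = Seq.newBuilder[(K, OUT)]
    var i = 0
    while (i < sorted.length) {
      val key = sorted(i)._1
      val buffer = agg.createAggregationBuffer() // fresh buffer per group
      while (i < sorted.length && sorted(i)._1 == key) {
        agg.update(buffer, sorted(i)._2)         // fold each input into the buffer
        i += 1
      }
      out += ((key, agg.eval(buffer)))           // step 4: final output for the group
    }
    out.result()                                 // step 5: all groups processed
  }
}
```

Calling `SortBasedDemo.aggregate(rows, CountDistinct)` on a sequence of `(key, value)` pairs returns one `(key, distinctCount)` pair per group, in key order.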
Isn't this the wrong way around? Isn't `ImperativeAggregate` the untyped version of a `TypedImperativeAggregate`, much like `Dataset` and `DataFrame`? I know this has been done for engineering purposes, but I still wonder if we shouldn't reverse the hierarchy here.

`ImperativeAggregate` only defines the interface. It does not specify which buffer types are accepted, right?
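To make the hierarchy question concrete, here is a minimal sketch of a typed layer sitting on top of an untyped interface. Everything in it is an assumption made for illustration: `HierarchySketch`, `UntypedAgg`, `TypedAgg`, `LongBox`, and `SumAgg` are hypothetical names, and `Row` is just `Array[Any]`, not Spark's `InternalRow`. It shows one way the untyped interface can stay silent about buffer types while the typed subclass pins the buffer to a single object slot and casts it back, in the spirit of the `getField` helper in the diff.

```scala
object HierarchySketch {
  // Hypothetical row type for this sketch only.
  type Row = Array[Any]

  // Untyped interface: says how buffers are driven, not what they contain.
  trait UntypedAgg {
    def initialize(buffer: Row): Unit
    def update(buffer: Row, input: Row): Unit
    def eval(buffer: Row): Any
  }

  // Typed layer: stores one object of type T in slot 0 of the buffer row
  // and implements the untyped methods by delegating to typed ones.
  abstract class TypedAgg[T] extends UntypedAgg {
    def createBuffer(): T
    def update(buffer: T, input: Row): Unit
    def eval(buffer: T): Any

    final def initialize(buffer: Row): Unit = { buffer(0) = createBuffer() }
    final def update(buffer: Row, input: Row): Unit =
      update(getField[T](buffer, 0), input)
    final def eval(buffer: Row): Any = eval(getField[T](buffer, 0))

    // Unchecked cast back to the typed buffer object.
    private def getField[U](row: Row, index: Int): U = row(index).asInstanceOf[U]
  }

  // A mutable holder used as the typed buffer.
  final class LongBox(var value: Long)

  object SumAgg extends TypedAgg[LongBox] {
    def createBuffer(): LongBox = new LongBox(0L)
    def update(buffer: LongBox, input: Row): Unit = {
      buffer.value += input(0).asInstanceOf[Long]
    }
    def eval(buffer: LongBox): Any = buffer.value
  }
}
```

A caller that only knows `UntypedAgg` can drive `SumAgg` through `initialize`, `update`, and `eval` on a one-slot `Row` without ever seeing `LongBox`.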