[SPARK-48421][SQL] SPJ: Add documentation #46745
But it would be great to have eyes from other folks like @sunchao and @cloud-fan.
sunchao
left a comment
LGTM too with a few nits, thanks @szehon-ho !
Storage Partition Join (SPJ) is an optimization technique in Spark SQL that makes use of the existing storage layout to avoid the shuffle phase.

This is a generalization of the concept of Bucket Joins, which is only applicable for [bucketed](sql-data-sources-load-save-functions.html#bucketing-sorting-and-partitioning) tables, to tables partitioned by functions registered in FunctionCatalog. Storage Partition Joins are currently supported for compatible V2 DataSources.
nit: this bucketed link doesn't work
Hm, I built the site and the link seems to work, but let me know if another link would be better here.
docs/sql-performance-tuning.md
Outdated
The following SQL properties enable Storage Partition Join.
nit: perhaps "The following SQL properties enable Storage Partition Join and various optimizations of it"?
done, added 'in different join queries with various optimizations.' (as some flags are about different scenarios)
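For readers following this thread, here is a minimal sketch of what "SQL properties" means in practice. The two property names below come from Spark's SQL configuration, not from this conversation, so treat them as assumptions and verify them against the Spark version in use. `to_set_statements` is a hypothetical helper that renders the properties as session-level `SET` statements.

```python
# Assumed SPJ-related properties; verify the names and defaults against the
# SQL configuration reference for your Spark version.
SPJ_PROPERTIES = {
    # Assumed master switch for Storage Partition Join on V2 data sources.
    "spark.sql.sources.v2.bucketing.enabled": "true",
    # Assumed optimization flag covering joins where one side is missing
    # some partition values.
    "spark.sql.sources.v2.bucketing.pushPartValues.enabled": "true",
}


def to_set_statements(props):
    """Hypothetical helper: render properties as Spark SQL SET statements."""
    return [f"SET {key}={value};" for key, value in props.items()]


for statement in to_set_statements(SPJ_PROPERTIES):
    print(statement)
```

The same key/value pairs could instead be passed as `--conf` options or set on a session builder. Which flags a given join needs depends on the scenario, as noted in the reply above.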
Merged to master.
Thank you @HyukjinKwon @sunchao
What changes were proposed in this pull request?
Add docs for SPJ
Why are the changes needed?
There are no docs describing SPJ, even though it is mentioned in migration notes: #46673
Does this PR introduce any user-facing change?
No
How was this patch tested?
Checked the new text
Was this patch authored or co-authored using generative AI tooling?
No