[SPARK-25245][DOCS][SS] Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming #22238
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f |
| | | # Additional Information |
| | | **Gotchas** |
| | | - For structured streaming, modifying "spark.sql.shuffle.partitions" is restricted once you run the query. |
| | | - This is because state is partitioned via key, hence number of partitions for state should be unchanged. |
| | | - If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning. Please note that it will also affect downstream operators. |
| | | **Further Reading** |
| | | - See and run the |
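The reasoning in the "Gotchas" bullets (state is partitioned by key, so the number of state partitions must stay unchanged) can be illustrated with a small, self-contained Python sketch. This is not Spark code: `partition_for` and `moved_keys` are hypothetical helpers, and CRC32 stands in for the Murmur3 hash Spark actually uses for hash partitioning; only the `hash(key) mod numPartitions` shape matters here. It shows that with an unchanged partition count every key maps back to the same partition, whereas changing the count sends keys to partitions other than the ones holding their existing state.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a grouping key to a partition id.

    Hypothetical stand-in for Spark's hash partitioning: Spark uses Murmur3,
    CRC32 is used here only because it is deterministic across processes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_partitions: int, new_partitions: int):
    """Return the keys whose partition id differs between the two counts."""
    return [
        k
        for k in keys
        if partition_for(k, old_partitions) != partition_for(k, new_partitions)
    ]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    # Assume state was written by a run using 200 shuffle partitions
    # (200 is the default value of spark.sql.shuffle.partitions).
    unchanged = moved_keys(keys, 200, 200)
    changed = moved_keys(keys, 200, 100)
    print(f"keys relocated when the count is unchanged: {len(unchanged)}")
    print(f"keys relocated when going 200 -> 100: {len(changed)} of {len(keys)}")
```

If a restarted query used a different count, a key would be routed to a partition that does not hold its previous state, which is why the count has to stay fixed across restarts. `coalesce`, by contrast, merges existing partitions instead of rehashing keys, which is why it is suggested for running fewer tasks.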
Hmm... @HeartSaVioR, how about leaving these as a note somewhere in the code or API docs instead?
IMO, it would be better to keep it here as well as in the code; we may not be able to surface it in the right API docs, and users may miss it.
@HeartSaVioR, maybe add an example here to illustrate how to use `coalesce`?
I was going to add the explanation to the `doc()` of `spark.sql.shuffle.partitions`, but it looks like what we explain in `doc()` is not published automatically. (Please correct me if I'm missing something.) SQLConf is not even exposed to scaladoc. That's why I'm adding this to the structured streaming guide. Actually, I think most end users only look at this doc for structured streaming, and we can't (and shouldn't) expect them to read the source code to find it.

I also didn't notice that `spark.sql.shuffle.partitions` is already explained in `sql-programming-guide.md`, but I still think we need to explain here every config that behaves differently for streaming queries than for batch queries, and `spark.sql.shuffle.partitions` is such a case.

Btw, `Gotchas` looks a bit funny, though. Maybe a dedicated section would be better, something like `## Configuration Options For Structured Streaming` in `sql-programming-guide.md`?