-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-9858][SPARK-9859][SPARK-9861][SQL] Add an ExchangeCoordinator to estimate the number of post-shuffle partitions for aggregates and joins #9276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is for testing purpose.
|
Test build #44342 has finished for PR 9276 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
", in bytes" ?
|
Test build #44472 has finished for PR 9276 at commit
|
|
Test build #44487 has finished for PR 9276 at commit
|
|
yhuai@24e1caf is only for testing purpose. We will remove it before we commit the code. |
|
@JoshRosen I changed |
|
Test build #44544 has finished for PR 9276 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment why that is required briefly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Test build #44561 has finished for PR 9276 at commit
|
|
The last three commits are just for testing purpose. |
|
test this please |
|
Test build #44632 has finished for PR 9276 at commit
|
|
Test build #44638 has finished for PR 9276 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add a toString method to this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
test this please |
|
Test build #44748 has finished for PR 9276 at commit
|
|
Test build #44781 has finished for PR 9276 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an @GuardedBy("this")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a set instead of an array buffer in order to guard against double-registration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the case, I guess we should expose that and fix the double registration problem, right? In doEstimationIfNecessary, we have an assert assert(exchanges.length == numExchanges).
|
Let's merge this now and post-hoc review in more detail later. |
|
Test build #44882 has finished for PR 9276 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this call super.beforeAll()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we are fine since SparkFunSuite does not extend BeforeAndAfterAll.
|
ok. I am merging it now. Will have a follow-up pr to address @JoshRosen's comments. |
…to estimate the number of post-shuffle partitions for aggregates and joins https://issues.apache.org/jira/browse/SPARK-9858 https://issues.apache.org/jira/browse/SPARK-9859 https://issues.apache.org/jira/browse/SPARK-9861 Author: Yin Huai <[email protected]> Closes apache#9276 from yhuai/numReducer.
…f post-shuffle partitions for aggregates and joins (follow-up) https://issues.apache.org/jira/browse/SPARK-9858 This PR is the follow-up work of #9276. It addresses JoshRosen's comments. Author: Yin Huai <[email protected]> Closes #9453 from yhuai/numReducer-followUp. (cherry picked from commit 8211aab) Signed-off-by: Yin Huai <[email protected]>
…f post-shuffle partitions for aggregates and joins (follow-up) https://issues.apache.org/jira/browse/SPARK-9858 This PR is the follow-up work of #9276. It addresses JoshRosen's comments. Author: Yin Huai <[email protected]> Closes #9453 from yhuai/numReducer-followUp.
…to estimate the number of post-shuffle partitions for aggregates and joins https://issues.apache.org/jira/browse/SPARK-9858 https://issues.apache.org/jira/browse/SPARK-9859 https://issues.apache.org/jira/browse/SPARK-9861 Author: Yin Huai <[email protected]> Closes apache#9276 from yhuai/numReducer.
…f post-shuffle partitions for aggregates and joins (follow-up) https://issues.apache.org/jira/browse/SPARK-9858 This PR is the follow-up work of apache#9276. It addresses JoshRosen's comments. Author: Yin Huai <[email protected]> Closes apache#9453 from yhuai/numReducer-followUp.
…f post-shuffle partitions for aggregates and joins (follow-up) https://issues.apache.org/jira/browse/SPARK-9858 This PR is the follow-up work of apache/spark#9276. It addresses JoshRosen's comments. Author: Yin Huai <[email protected]> Closes #9453 from yhuai/numReducer-followUp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the implementation, we can't guarantee to satisfy minNumPostShufflePartitions right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. It is advisory.
|
Is |
|
No it is not documented. Right now, it is mainly for people who are interested in experimenting it. There are still work that needs to be done to make it support more cases. |
|
@yhuai , could you please let us know is there any known issues / limitation with this feature ? Has this feature been tested under some large jobs ? We are also considering automatical determining shuffle partitions, and happened to see this PR, and therefore interested in exploring this feature a little bit to see if we could productionize it for all jobs (by default). |
https://issues.apache.org/jira/browse/SPARK-9858
https://issues.apache.org/jira/browse/SPARK-9859
https://issues.apache.org/jira/browse/SPARK-9861