-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11307] Reduce memory consumption of OutputCommitCoordinator #9274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A shuffle map stage's maximum partition id is determined by the number of partitions in the RDD being computed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I was reviewing this, I was wondering if a ShuffleMapStage could have a different maximum partitionId if it was from a skipped stage. I'm now convinced it cannot, but it might be a bit clearer if we change the constructor to not even take a numTasks argument, since it should always be rdd.partitions.length? Not necessary for this change, but just a thought while you are touching this.
Also -- isn't the output commit coordinator irrelevant for ShuffleMapStages anyway? If not, than I think there might be another bug there for skipped stages. Since it indexes by stageId, you can have two different stages, that really represent the exact same shuffle, so you could have two different tasks authorized to commit that are handling the same stage. (Which wouldn't be a problem introduced by this change, but I just thought it was worth mentioning.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it should be irrelevant for ShuffleMapStages. I was just being overly-conservative here.
|
/cc @kayousterhout @markhamstra, this seems like a potentially easy win for reducing driver memory consumption when performing a write that outputs millions of partitions. This isn't necessarily a huge amount of memory savings, but it's a substantial reduction in the number of map entry objects created, which could have GC benefits. |
|
Test build #44327 has finished for PR 9274 at commit
|
|
one small question, overall lgtm. but I'm not very familiar w/ the speculative execution code so would appreciate an expert opinion. |
|
LGTM |
|
Test build #45056 has finished for PR 9274 at commit
|
|
Merged into master, thanks! |
OutputCommitCoordinator uses a map in a place where an array would suffice, increasing its memory consumption for result stages with millions of tasks.
This patch replaces that map with an array. The only tricky part of this is reasoning about the range of possible array indexes in order to make sure that we never index out of bounds.