
Conversation

@JoshRosen
Contributor

OutputCommitCoordinator uses a map in a place where an array would suffice, increasing its memory consumption for result stages with millions of tasks.

This patch replaces that map with an array. The only tricky part is reasoning about the range of possible array indices in order to guarantee that we never index out of bounds.
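To illustrate the idea (this is a simplified sketch, not Spark's actual `OutputCommitCoordinator` code; the class and method names below are hypothetical): a hash map keyed by partition id allocates one boxed entry object per partition, while an array sized to the stage's known partition count stores the same "authorized committer" state flat, with no per-entry objects. The array version is safe as long as every task's partition id is in `0 until numPartitions`.

```scala
// Hypothetical sketch of tracking the authorized commit attempt per partition.
object CommitState {
  val NO_AUTHORIZED_COMMITTER: Int = -1

  // Map-based version: one entry object per partition that ever asks to commit.
  class MapBased {
    private val authorized = scala.collection.mutable.Map.empty[Int, Int]
    def canCommit(partition: Int, attempt: Int): Boolean =
      authorized.get(partition) match {
        case Some(winner) => winner == attempt   // only the first authorized attempt may commit
        case None =>
          authorized(partition) = attempt        // first request wins
          true
      }
  }

  // Array-based version: valid indices are 0 until numPartitions, so as long as
  // partition ids are bounded by the stage's partition count, no index can be
  // out of bounds. No per-entry allocation; just one flat Int array.
  class ArrayBased(numPartitions: Int) {
    private val authorized = Array.fill(numPartitions)(NO_AUTHORIZED_COMMITTER)
    def canCommit(partition: Int, attempt: Int): Boolean =
      if (authorized(partition) == NO_AUTHORIZED_COMMITTER) {
        authorized(partition) = attempt
        true
      } else {
        authorized(partition) == attempt
      }
  }
}
```

For a result stage with millions of partitions, the array needs only one `Int` slot per partition, which is where the GC-pressure savings mentioned below come from.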

Contributor Author


A shuffle map stage's maximum partition id is determined by the number of partitions in the RDD being computed.

Contributor


As I was reviewing this, I wondered whether a ShuffleMapStage could have a different maximum partitionId if it came from a skipped stage. I'm now convinced it cannot, but it might be a bit clearer if we changed the constructor to not even take a numTasks argument, since it should always be rdd.partitions.length. Not necessary for this change, but just a thought while you're touching this code.

Also -- isn't the output commit coordinator irrelevant for ShuffleMapStages anyway? If not, then I think there might be another bug there for skipped stages: since it indexes by stageId, you can have two different stages that really represent the exact same shuffle, so you could have two different tasks authorized to commit for the same stage. (That wouldn't be a problem introduced by this change, but I thought it was worth mentioning.)

Contributor Author


Yeah, it should be irrelevant for ShuffleMapStages. I was just being overly conservative here.

@JoshRosen
Contributor Author

/cc @kayousterhout @markhamstra, this seems like a potentially easy win for reducing driver memory consumption when performing a write that outputs millions of partitions. This isn't necessarily a huge amount of memory savings, but it's a substantial reduction in the number of map entry objects created, which could have GC benefits.

@SparkQA

SparkQA commented Oct 26, 2015

Test build #44327 has finished for PR 9274 at commit 9dc210e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Contributor

squito commented Oct 27, 2015

One small question; overall LGTM. I'm not very familiar with the speculative execution code, though, so I'd appreciate an expert opinion there.

@davies
Contributor

davies commented Nov 4, 2015

LGTM

@SparkQA

SparkQA commented Nov 5, 2015

Test build #45056 has finished for PR 9274 at commit 5085aa8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Nov 5, 2015

Merged into master, thanks!

@asfgit asfgit closed this in d0b5633 Nov 5, 2015
@JoshRosen JoshRosen deleted the SPARK-11307 branch November 5, 2015 18:27
