[SPARK-1194] Fix the same-RDD rule for cache replacement #96
Changes from 1 commit
Commits: 62c92ac · 40cdcb2 · 6e40c22 · 2524ab9
@@ -236,13 +236,23 @@ private class MemoryStore(blockManager: BlockManager, maxMemory: Long)
    while (maxMemory - (currentMemory - selectedMemory) < space && iterator.hasNext) {
      val pair = iterator.next()
      val blockId = pair.getKey
-     if (rddToAdd.isDefined && rddToAdd == getRddId(blockId)) {
-       logInfo("Will not store " + blockIdToAdd + " as it would require dropping another " +
-         "block from the same RDD")
-       return false
+     // Apply the same-RDD rule for cache replacement. Quoted from the
+     // original RDD paper:
+     //
+     // When a new RDD partition is computed but there is not enough
Contributor:
Hey @liancheng I think it's okay to remove this quote. If you look at the scaladoc, it already explains the intended policy w.r.t. partitions in the same RDD, so I think that is sufficient. The scaladoc says "which leads to a wasteful cyclic replacement pattern for RDDs don't fit into memory that we want to avoid".

Contributor (Author):
Thanks, removed :)
+     // space to store it, we evict a partition from the least recently
+     // accessed RDD, unless this is the same RDD as the one with the
+     // new partition. In that case, we keep the old partition in memory
+     // to prevent cycling partitions from the same RDD in and out.
+     //
+     // TODO implement LRU eviction
Contributor:
entries is already a LinkedHashMap, so you iterate in LRU order: you can remove the comment.

Contributor:
@mridulm, I think LinkedHashMap actually keeps the order of insertion, not access order? (Though I'm not clear how to track when a block is accessed by the tasks for now...)

Contributor (Author):

Contributor:
I see
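For reference, whether that iteration is actually LRU depends on how the java.util.LinkedHashMap is constructed: the default constructor preserves insertion order, while the three-argument constructor with accessOrder = true iterates from least to most recently accessed. A small, self-contained sketch of the difference (the block ids below are made up for illustration; this is not the MemoryStore code itself):

```scala
import java.util.LinkedHashMap

object LinkedHashMapOrderDemo {
  def main(args: Array[String]): Unit = {
    // accessOrder = true: iteration visits the least recently accessed entry
    // first, which is the order an LRU eviction scan wants.
    val byAccess = new LinkedHashMap[String, Long](16, 0.75f, true)
    byAccess.put("rdd_0_0", 10L)
    byAccess.put("rdd_0_1", 20L)
    byAccess.put("rdd_1_0", 30L)
    byAccess.get("rdd_0_0") // touching an entry moves it to the most-recent end

    val it = byAccess.keySet().iterator()
    while (it.hasNext) {
      println(it.next()) // prints rdd_0_1, rdd_1_0, rdd_0_0
    }
    // With the default constructor (accessOrder = false) the map keeps pure
    // insertion order, and get() does not move entries around.
  }
}
```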
+     rddToAdd match {
+       case Some(rddId) if rddId == getRddId(blockId) =>
Contributor (Author):
Made a mistake here,
+         // no-op
+       case _ =>
+         selectedBlocks += blockId
+         selectedMemory += pair.getValue.size
Contributor:
Just a suggested alternative to LRU: to minimize the number of affected RDDs, how about evicting blocks from the RDDs that occupy the most memory first? Since usually all the blocks of an RDD are needed for its computation, this approach may minimize the chance of recomputation.
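Purely to illustrate that suggestion, here is a hypothetical helper (not part of this patch, and not MemoryStore's real API) that orders candidate blocks so that blocks belonging to the RDDs currently holding the most memory come first:

```scala
// Hypothetical sketch of the "evict from the biggest RDDs first" idea.
// `blocks` pairs a block id with its size in bytes; `rddOf` extracts the owning
// RDD id from a block id. Both are stand-ins for illustration only.
def largestRddFirst(
    blocks: Seq[(String, Long)],
    rddOf: String => Option[Int]): Seq[(String, Long)] = {
  // Total bytes currently held by each RDD (None groups non-RDD blocks).
  val bytesPerRdd: Map[Option[Int], Long] =
    blocks.groupBy { case (id, _) => rddOf(id) }.map { case (rdd, bs) =>
      rdd -> bs.map(_._2).sum
    }
  // Blocks of the most memory-hungry RDDs are considered for eviction first.
  blocks.sortBy { case (id, _) => -bytesPerRdd(rddOf(id)) }
}
```

Whether this would actually reduce recomputation depends on the workload; the change under review keeps the existing iteration order and only adds the same-RDD rule on top of it.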
      }
-     selectedBlocks += blockId
-     selectedMemory += pair.getValue.size
    }
  }
Contributor:
Copy over this log message to L267/L277, for the case where there are blocks in entries but eviction won't help store this block.

Agree, thanks.
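As a wrap-up, here is a small, self-contained toy model of the selection logic this change is after; everything below (the Block case class, names, sizes) is made up for illustration and is not MemoryStore's actual code. Candidate blocks are scanned in order, and any block belonging to the same RDD as the one being stored is skipped rather than selected for eviction:

```scala
import scala.collection.mutable.ArrayBuffer

object SameRddRuleDemo {
  // A cached block: its id, the RDD it belongs to (None for non-RDD blocks), and its size.
  final case class Block(id: String, rddId: Option[Int], size: Long)

  // Pick blocks to evict until enough space is free, never touching blocks of
  // the RDD whose new partition we are trying to store (the same-RDD rule).
  def selectBlocksToEvict(
      candidates: Seq[Block],
      rddToAdd: Option[Int],
      spaceNeeded: Long,
      freeMemory: Long): Seq[Block] = {
    val selected = ArrayBuffer.empty[Block]
    var selectedMemory = 0L
    val iterator = candidates.iterator
    while (freeMemory + selectedMemory < spaceNeeded && iterator.hasNext) {
      val block = iterator.next()
      rddToAdd match {
        case Some(rddId) if block.rddId.contains(rddId) =>
          // Same-RDD rule: never evict a block of the RDD we are caching for.
        case _ =>
          selected += block
          selectedMemory += block.size
      }
    }
    selected
  }

  def main(args: Array[String]): Unit = {
    val cached = Seq(
      Block("rdd_1_0", Some(1), 40L),
      Block("rdd_2_0", Some(2), 40L),
      Block("rdd_1_1", Some(1), 40L))
    // Storing a new partition of RDD 1: only the RDD 2 block is eligible.
    println(selectBlocksToEvict(cached, rddToAdd = Some(1), spaceNeeded = 50L, freeMemory = 20L))
  }
}
```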