[SPARK-24296][CORE] Replicate large blocks as a stream. #21451
Conversation
EDIT: no longer a WIP, as all the dependencies are in.
Test build #91266 has finished for PR 21451 at commit
Test build #91268 has finished for PR 21451 at commit
Test build #91271 has finished for PR 21451 at commit
It's not necessary to add a parameter. Change the message parameter to InputStream.
yes, there are other ways to do this, but I wanted to leave the old code paths as untouched as possible to minimize the behavior change / risk of bugs. I also think it's helpful to clearly separate the portion that is read entirely into memory from the streaming portion; it makes the code easier to work with. Also, InputStream suggests the data is getting pulled instead of pushed.
your earlier approach definitely gave a lot of inspiration for this change. I'm hoping that making it a more isolated change helps us make progress here.
What about incorporating the message parameter into the streamData parameter?
I'm gonna move the discussion to #21346 since that is the PR that will introduce this API
vanzin left a comment
Didn't see any red flags but definitely would like another look after the other change goes in - but not sure I'll have time for that.
Need to update to match API.
Need to update comment.
Indentation is off here.
Using .asInstanceOf[UploadBlockStream] would achieve the same goal here with less indentation, just with a different exception...
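To illustrate the trade-off being discussed (a sketch only; `handleUpload` and the error message are placeholders, not the PR's actual code):

```scala
// Pattern match: an unexpected message type fails with a descriptive error,
// at the cost of one extra level of indentation.
message match {
  case stream: UploadBlockStream => handleUpload(stream)
  case other => throw new IllegalArgumentException(s"Unexpected message: $other")
}

// Cast: flatter code, but an unexpected type surfaces as a ClassCastException.
val stream = message.asInstanceOf[UploadBlockStream]
handleUpload(stream)
```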
it's
Want to turn this into a config constant? I'm seeing it in a bunch of places.
s/mb/buffer
Confusing in a place that deals with sizes all over.
Test build #92398 has finished for PR 21451 at commit
Test build #92407 has finished for PR 21451 at commit
Test build #92427 has finished for PR 21451 at commit
@mridulm @jerryshao @felixcheung last one in the 2GB block limit series. Just rebased to include the updates to #21440. I will also run my tests on a cluster with this: https://github.com/squito/spark_2gb_test/blob/master/src/main/scala/com/cloudera/sparktest/LargeBlocks.scala. Thanks for all the reviews!
Test build #93250 has finished for PR 21451 at commit
retest this please
Test build #93255 has finished for PR 21451 at commit
When replicating large cached RDD blocks, it can be helpful to replicate them as a stream, to avoid using large amounts of memory during the transfer. This also allows blocks larger than 2GB to be replicated. Added unit tests in DistributedSuite. Also ran tests on a cluster for blocks > 2gb.
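The core idea, sketched for readers who want something concrete (illustrative only, not the PR's implementation; the helper name and buffer size are assumptions):

```scala
import java.io.{File, FileOutputStream, InputStream}

// Copy an incoming replication stream to a temp file in small chunks, so
// memory usage stays bounded regardless of block size (including > 2GB).
def writeBlockToTempFile(in: InputStream, tmpFile: File, bufferSize: Int = 64 * 1024): Long = {
  val out = new FileOutputStream(tmpFile)
  val buffer = new Array[Byte](bufferSize)
  var total = 0L
  try {
    var read = in.read(buffer)
    while (read != -1) {
      out.write(buffer, 0, read)
      total += read
      read = in.read(buffer)
    }
  } finally {
    out.close()
  }
  total
}
```

This contrasts with the existing upload path quoted later in the review, where the whole block arrives as a single byte array wrapped in a NioManagedBuffer.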
Test build #93549 has finished for PR 21451 at commit
FYI, I did finally run my scale tests again on a cluster, and shuffles, remote reads, and replication all worked for blocks over 2GB (sorry, got sidetracked with a few other things in the meantime).
tgravescs left a comment
made one pass through, need to look at some things in more depth to make sure I understand
}
val data = new NioManagedBuffer(ByteBuffer.wrap(uploadBlock.blockData))
val blockId = BlockId(uploadBlock.blockId)
logInfo(s"Receiving replicated block $blockId with level ${level} " +
this seems like it could be pretty verbose, put at debug or trace?
private[spark] val MEMORY_MAP_LIMIT_FOR_TESTS =
  ConfigBuilder("spark.storage.memoryMapLimitForTests")
    .internal()
add a .doc that says this is for testing only
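The suggestion would look roughly like this (the doc text, config type, and default below are assumptions added for illustration, not the PR's actual definition):

```scala
private[spark] val MEMORY_MAP_LIMIT_FOR_TESTS =
  ConfigBuilder("spark.storage.memoryMapLimitForTests")
    .internal()
    .doc("For testing only, not intended to be set by users. Limits the size " +
      "of memory-mapped regions so large-block code paths can be exercised " +
      "with small test data.")
    .bytesConf(ByteUnit.BYTE)
    .createWithDefault(Int.MaxValue)
```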
override def receiveStream(
    client: TransportClient,
    messageHeader: ByteBuffer,
    responseContext: RpcResponseCallback): StreamCallbackWithID = {
fix spacing
  .asInstanceOf[(StorageLevel, ClassTag[_])]
}
val blockId = BlockId(message.blockId)
logInfo(s"Receiving replicated block $blockId with level ${level} as stream " +
debug?
// stream.
channel.close()
// TODO Even if we're only going to write the data to disk after this, we end up using a lot
// of memory here. We wont' get a jvm OOM, but might get killed by the OS / cluster
spelling won't
yeah, agree this could be an issue with YARN since overhead memory might not be big enough; can we file a jira to specifically track this?
filed SPARK-25035
LGTM (after applying @tgravescs's comments). Great job on the whole issue.
Test build #94319 has finished for PR 21451 at commit
import static org.apache.spark.network.shuffle.protocol.BlockTransferMessage.Type;

/**
 * A request to Upload a block, which the destintation should receive as a stream.
nit: spelling destination
// we just write to a temp file, and call putBytes on the data in that file.
val tmpFile = diskBlockManager.createTempLocalBlock()._2
new StreamCallbackWithID {
  val channel: WritableByteChannel = Channels.newChannel(new FileOutputStream(tmpFile))
do we need to honor spark.io.encryption.enabled here and encrypt the file on local disk?
yeah sure looks like it.
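A sketch of what honoring the flag might look like at this call site (assuming a SerializerManager is in scope as serializerManager, and that tmpFile is the temp block file from the quoted diff; this is not the PR's final code):

```scala
import java.io.FileOutputStream
import java.nio.channels.{Channels, WritableByteChannel}

// SerializerManager.wrapForEncryption leaves the stream unchanged when
// spark.io.encryption.enabled is off; otherwise bytes are encrypted before
// they reach the local disk.
val out = serializerManager.wrapForEncryption(new FileOutputStream(tmpFile))
val channel: WritableByteChannel = Channels.newChannel(out)
```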
Test build #94695 has finished for PR 21451 at commit
Test build #94725 has finished for PR 21451 at commit
test this please
vanzin left a comment
Looks sane to me.
 */
final protected def encryptionTest(name: String)(fn: SparkConf => Unit) {
  encryptionTestHelper(name) { case (name, conf) =>
    test(name)(fn(conf))
nit: indentation
case MemoryMode.ON_HEAP => ByteBuffer.allocate _
case MemoryMode.OFF_HEAP => Platform.allocateDirectBuffer _
}
new EncryptedBlockData(tmpFile, blockSize, conf, key).toChunkedByteBuffer(allocator)
toChunkedByteBuffer is also pretty memory-hungry, right? You'll end up needing enough memory to hold the entire file in memory, if I read the code right.
This is probably ok for now, but should probably mention it in your TODO above.
yeah, you store the entire file in memory (after decrypting). It's not memory mapped either, so it'll probably be a regular OOM (depending on memory mode). Updated the comment.
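To make the memory concern concrete (purely illustrative; this is not ChunkedByteBuffer's implementation, and the helper name is made up):

```scala
import java.io.InputStream
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

// Accumulate an entire stream into in-memory chunks: for an N-byte (decrypted)
// file this allocates roughly N bytes of buffers, so a multi-GB block needs
// multi-GB of heap (or off-heap) memory on the receiving side.
def readFullyIntoChunks(in: InputStream, chunkSize: Int = 4 * 1024 * 1024): Seq[ByteBuffer] = {
  val chunks = ArrayBuffer[ByteBuffer]()
  val buf = new Array[Byte](chunkSize)
  var read = in.read(buf)
  while (read != -1) {
    val chunk = ByteBuffer.allocate(read)
    chunk.put(buf, 0, read)
    chunk.flip()
    chunks += chunk
    read = in.read(buf)
  }
  chunks.toSeq
}
```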
Test build #94762 has finished for PR 21451 at commit
Test build #94778 has finished for PR 21451 at commit
@tgravescs @vanzin any more comments? I think I've addressed everything.
LGTM. Merging to master.
Is this possibly a bug introduced by this PR? After merging this PR, I saw this error multiple times: https://issues.apache.org/jira/browse/SPARK-25422
looking. so far it seems unrelated to me, but as you've said it's failed in a few builds, so I'm gonna keep digging. The error is occurring before any RDDs are getting replicated via the new code path, and this change mostly does not touch the path involved in sending a broadcast. I've been unable to repro so far despite running the test hundreds of times, but I might need to run more tests or put in some pauses or something. gonna compare with other test runs with the failure as well.
@squito Thanks for digging into it! This PR introduced the failing test case. We need to know whether it exposes a serious bug (even if not introduced by this PR) and whether it impacts our 2.4 release.
still looking -- will put comments on the jira so it's more visible