[SPARK-32916][SHUFFLE][test-maven][test-hadoop2.7] Ensure the number of chunks in meta file and index file are equal #30433
Changes from 1 commit
```diff
@@ -526,21 +526,17 @@ private boolean isDuplicateBlock() {
      * This is only invoked when the stream is able to write. The stream first writes any deferred
      * block parts buffered in memory.
      */
-    private void writeAnyDeferredBufs() throws IOException {
-      if (deferredBufs != null && !deferredBufs.isEmpty()) {
-        for (ByteBuffer deferredBuf : deferredBufs) {
-          writeBuf(deferredBuf);
-        }
-        deferredBufs = null;
+    private void writeDeferredBufs() throws IOException {
+      for (ByteBuffer deferredBuf : deferredBufs) {
+        writeBuf(deferredBuf);
       }
+      deferredBufs = null;
     }

     /**
-     * This throws RuntimeException which will abort the merge of a particular shuffle partition.
+     * This throws RuntimeException if the number of IOExceptions have exceeded threshold.
      */
-    private void incrementIOExceptionsAndAbortIfNecessary() {
-      // Update the count of IOExceptions
-      partitionInfo.incrementIOExceptions();
+    private void abortIfNecessary() {
       if (partitionInfo.shouldAbort(mergeManager.ioExceptionsThresholdDuringMerge)) {
         deferredBufs = null;
         throw new RuntimeException(String.format("%s when merging %s",
```
```diff
@@ -549,6 +545,16 @@ private void incrementIOExceptionsAndAbortIfNecessary() {
       }
     }

+    /**
+     * This increments the number of IOExceptions and throws RuntimeException if it exceeds the
+     * threshold which will abort the merge of a particular shuffle partition.
+     */
+    private void incrementIOExceptionsAndAbortIfNecessary() {
+      // Update the count of IOExceptions
+      partitionInfo.incrementIOExceptions();
+      abortIfNecessary();
+    }
+
     @Override
     public void onData(String streamId, ByteBuffer buf) throws IOException {
       // When handling the block data using StreamInterceptor, it can help to reduce the amount
```
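For context, `abortIfNecessary` relies on per-partition bookkeeping of write failures. The following is a minimal, hypothetical sketch of what `incrementIOExceptions()` and `shouldAbort()` imply; the class name and the exact comparison are assumptions for illustration, not the actual Spark implementation:

```java
// Hypothetical, simplified sketch of the per-partition IOException bookkeeping implied by
// partitionInfo.incrementIOExceptions() and partitionInfo.shouldAbort(...). Names and the
// exact comparison are assumptions, not the actual Spark code.
class PartitionIOExceptionTracker {
  private int numIOExceptions = 0;

  // Called each time a write to the merged shuffle file fails with an IOException.
  void incrementIOExceptions() {
    numIOExceptions++;
  }

  // Returns true once the observed IOExceptions exceed the configured threshold, signalling
  // that further merge attempts for this partition should be aborted.
  boolean shouldAbort(int ioExceptionsThresholdDuringMerge) {
    return numIOExceptions > ioExceptionsThresholdDuringMerge;
  }
}
```

Splitting `abortIfNecessary` out of `incrementIOExceptionsAndAbortIfNecessary` lets the same threshold check run for a stream that has not itself hit an IOException, for example one that was waiting on the partition lock.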
```diff
@@ -586,6 +592,7 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
         deferredBufs = null;
         return;
       }
+      abortIfNecessary();
       logger.trace("{} shuffleId {} reduceId {} onData writable",
         partitionInfo.appShuffleId.appId, partitionInfo.appShuffleId.shuffleId,
         partitionInfo.reduceId);
```
```diff
@@ -596,7 +603,9 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
       // If we got here, it's safe to write the block data to the merged shuffle file. We
       // first write any deferred block.
       try {
-        writeAnyDeferredBufs();
+        if (deferredBufs != null && !deferredBufs.isEmpty()) {
+          writeDeferredBufs();
+        }
         writeBuf(buf);
       } catch (IOException ioe) {
         incrementIOExceptionsAndAbortIfNecessary();
```
```diff
@@ -674,7 +683,10 @@ public void onComplete(String streamId) throws IOException {
       }
       if (partitionInfo.getCurrentMapIndex() < 0) {
         try {
-          writeAnyDeferredBufs();
+          if (deferredBufs != null && !deferredBufs.isEmpty()) {
+            abortIfNecessary();
+            writeDeferredBufs();
+          }
         } catch (IOException ioe) {
           incrementIOExceptionsAndAbortIfNecessary();
           // If the above doesn't throw a RuntimeException, then we propagate the IOException
```

Contributor
Why does this need to be inside the if block, instead of keeping

Contributor (Author)
I was thinking that if a stream is completing but doesn't need to write any deferred bufs then we should let it complete without failures.
I don't think we should invoke this here. Invoking inside onComplete should be sufficient.
The purpose is to do this check only once per block instead of once per buf, and to avoid throwing unnecessary exceptions from onData, which leads to channel close.
@Victsm This change is to address this comment:
#30433 (comment)
The scenario is that there can be pending streams waiting on the lock for the partitionInfo while the exception threshold has already been met. When a pending stream acquires the lock, it will attempt to write to the data file even though the exception threshold has been reached. I have added the UT testPendingBlockIsAbortedImmediately to verify this.
We already throw IOExceptions when a write to the data file fails. I don't see how throwing "IOExceptions exceeded threshold" makes it any worse.
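To make that scenario concrete, here is a simplified, hypothetical sketch (not the actual resolver code) of a pending stream that was blocked on the partition lock being rejected as soon as it acquires the lock after the threshold has been crossed:

```java
// Simplified illustration only: a stream contends for the same partition state as other
// streams. Class and method names here are hypothetical stand-ins for the merge resolver.
import java.io.IOException;
import java.nio.ByteBuffer;

class PartitionState {
  int numIOExceptions;
  boolean shouldAbort(int threshold) { return numIOExceptions > threshold; }
}

class PendingStreamExample {
  private static final int THRESHOLD = 4;  // hypothetical value

  static void onData(PartitionState partition, ByteBuffer buf) throws IOException {
    synchronized (partition) {
      // A stream that was waiting on this lock re-checks the threshold before writing,
      // so it is aborted immediately instead of appending more data to the merged file.
      if (partition.shouldAbort(THRESHOLD)) {
        throw new RuntimeException("IOExceptions exceeded threshold when merging");
      }
      // ... otherwise write buf to the merged shuffle file ...
    }
  }
}
```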
The way I see the issue mentioned in that comment is that we are not preventing new blocks from being merged when the IOException threshold is reached.
To do that, we only need to invoke abortIfNecessary inside onComplete, whether or not we still have any deferredBuf to write at that point.
This way, for the normal case without IOExceptions, we are only invoking abortIfNecessary once per block. By invoking it here, we would invoke it once per buf in the normal case.
Of course, if we only check inside onComplete, we would delay rejection of these pending blocks until we reach their stream's end. I think this is a reasonable tradeoff to make, considering that the majority of the time the code is executing the normal case instead of the exception case.
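A hedged sketch of the placement tradeoff being debated, using hypothetical stand-in methods: a check in onData runs once per received buffer and rejects a poisoned partition early, while a check only in onComplete runs once per block but delays the rejection until the stream ends.

```java
// Illustrative sketch only: contrasts where the threshold check runs. All names are
// hypothetical stand-ins, not the actual resolver methods.
import java.io.IOException;
import java.nio.ByteBuffer;

abstract class CheckPlacementSketch {
  abstract void abortIfNecessary();                    // throws RuntimeException past the threshold
  abstract void writeBuf(ByteBuffer buf) throws IOException;
  abstract void finalizeBlock() throws IOException;

  // Option A: check in onData runs once per received buffer, so a failing partition is
  // rejected as early as possible, at the cost of one extra check per buffer.
  void onData(String streamId, ByteBuffer buf) throws IOException {
    abortIfNecessary();
    writeBuf(buf);
  }

  // Option B: check only in onComplete runs once per block, but a pending stream keeps
  // streaming until the end of the block before it is rejected.
  void onComplete(String streamId) throws IOException {
    abortIfNecessary();
    finalizeBlock();
  }
}
```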
hmmm. I think the intention is to not have the server attempt writing if the threshold is reached for the partition. Probably this check here makes that behavior more accurate.
However, I don't have a strong opinion on this, since the assumption is that if the number of IOExceptions has already reached the threshold, any further writes will result in an IOException as well. With that assumption, the write in onData after the threshold is met will very likely throw an IOException too, and since the threshold is already met, the server will instead throw "IOExceptions exceeded threshold". I can remove it from onData. @Ngone51 What do you think?
@Ngone51 @mridulm, by throwing the exception inside onComplete, we are still effectively rejecting any new blocks from being successfully appended to the file.
Had an offline discussion with @mridulm and @otterc; the concern about closing channels by throwing an exception mid-stream seems negligible, since it only happens after reaching the max IOException count.
Calling abortIfNecessary inside onData should also have negligible performance implications.
With those, I think it should be fine to keep this part of the code as is.
I'm wondering whether the server closes the channel, or the client stops streaming the remaining data, when we throw the exception in onData. Otherwise, we'd keep receiving the following data from the client and throw the exception multiple times.
(I didn't find anywhere that we close the channel or that the client gets a chance to stop streaming. I may have missed it somewhere.)
@Ngone51 I think that's existing behavior in Spark with how it uses StreamInterceptor to stream data coming from an RPC message whose body is outside of the frame.
This is also documented with SPARK-6237 in #21346:
spark/common/network-common/src/main/java/org/apache/spark/network/server/RpcHandler.java, lines 62 to 66 in 82aca7e
It is the server which closes the channel if an exception is thrown from onData. Once an exception gets thrown from onData while StreamInterceptor hasn't finished processing all the out-of-frame bytes for a given RPC message, the TransportFrameDecoder will no longer be able to successfully decode subsequent RPC messages from this channel. Thus, the server needs to close the channel at this point.
Once the channel gets closed, the client will no longer be able to transfer any more data to the server using the same channel.
The connection needs to be reestablished at this time, resetting state on the client side.
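For readers unfamiliar with this code path, the callback involved has roughly the following shape. This is a hedged sketch built on Spark's StreamCallback interface (org.apache.spark.network.client); the threshold, counter, and write helper are made-up illustrations, not the resolver's actual logic.

```java
// Hedged sketch: a minimal StreamCallback that gives up mid-stream, to illustrate the
// behavior discussed above. Only the StreamCallback interface itself is from Spark; the
// abort condition, counter, and writeBuf helper are hypothetical.
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.spark.network.client.StreamCallback;

class AbortingStreamCallback implements StreamCallback {
  private static final int MAX_FAILED_WRITES = 4;  // hypothetical threshold
  private int failedWrites = 0;

  private void writeBuf(ByteBuffer buf) throws IOException {
    // write buf to the merged shuffle file (omitted in this sketch)
  }

  @Override
  public void onData(String streamId, ByteBuffer buf) throws IOException {
    if (failedWrites > MAX_FAILED_WRITES) {
      // Throwing here means the rest of this frame is never consumed, so the server can no
      // longer decode subsequent messages on this channel and has to close it.
      throw new RuntimeException("IOExceptions exceeded threshold for stream " + streamId);
    }
    try {
      writeBuf(buf);
    } catch (IOException ioe) {
      failedWrites++;
      throw ioe;
    }
  }

  @Override
  public void onComplete(String streamId) throws IOException {
    // finalize the merged block (omitted)
  }

  @Override
  public void onFailure(String streamId, Throwable cause) throws IOException {
    // clean up any partial state for this stream (omitted)
  }
}
```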
The client does receive the exception thrown from onData. I simulated an exception from onData at the server for a particular push block. Below are the logs of the client.
Note: I am running an older version of magnet which still uses ShuffleBlockId for shuffle push, and some other classes are old.
After this, the client logs show that the server has terminated the connection.