Skip to content

Conversation

@LinhongLiu
Copy link
Contributor

@LinhongLiu LinhongLiu commented Nov 15, 2018

What changes were proposed in this pull request?

Empty chunk in ChunkedByteBuffer will truncate the ChunkedByteBufferInputStream.
The detail reason is described in: https://issues.apache.org/jira/browse/SPARK-26068

How was this patch tested?

Modified current UT to cover this case.

@LinhongLiu LinhongLiu force-pushed the fix-empty-chunked-byte-buffer branch from ab81c1e to fa7af44 Compare November 15, 2018 05:21
@LinhongLiu
Copy link
Contributor Author

cc @xuanyuanking

@advancedxy
Copy link
Contributor

cc @ericl and @JoshRosen, this bug was introduced by https://github.com/apache/spark/pull/14099/files

After loosing empty chunk check, the ChunkedByteBufferInputStream doesn't handle empty chunks correctly

xuanyuanking
xuanyuanking approved these changes Nov 15, 2018
Copy link
Member

@xuanyuanking xuanyuanking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, after looking into the code in detail the problem will not happened in Spark, @linhong-intel maybe you can leave the detail analysis and close this.

@LinhongLiu
Copy link
Contributor Author

Problem:
ChunkedByteBuffer has signature ChunkedByteBuffer(var chunks: Array[ByteBuffer]). This means user is allowed to pass any kind of chunks to it. Then we will face the problem described in JIRA. That's why I submit this PR.

But on the other hand:
This is an internal class and Spark safely uses ChunkedByteBuffer in 2 ways:

  1. Use ChunkedByteBuffer(byteBuffer: ByteBuffer) to pass only one buffer. Even if it's an empty one, spark will handle this case correctly.
  2. Use ChunkedByteBufferOutputStream to create ChunkedByteBuffer with multiple chunks. In this case, empty ByteBuffer will never happen.

As a result, current spark code will never reach the problem as far as we won't use ChunkedByteBuffer(var chunks: Array[ByteBuffer]) directly.

So it's both OK either we fix this or not.

@cloud-fan
Copy link
Contributor

It's good to fix a potential bug, can you add a unit test?

@LinhongLiu
Copy link
Contributor Author

@cloud-fan Thanks for your reply.
could you please check if current test case change is enough to cover this bug?
If not, I'll add another unit test.

@cloud-fan
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Nov 15, 2018

Test build #98878 has finished for PR 23040 at commit fa7af44.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

extends InputStream {

private[this] var chunks = chunkedByteBuffer.getChunks().iterator
private[this] var chunks = chunkedByteBuffer.getChunks().filter(_.hasRemaining).iterator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment above, saying that we do this filter because read assumes chunks has no empty chunk?

@cloud-fan
Copy link
Contributor

LGTM except one comment

@cloud-fan
Copy link
Contributor

also cc @jiangxb1987 @zsxwing

@LinhongLiu
Copy link
Contributor Author

cc @cloud-fan @srowen
review is fixed.

@SparkQA
Copy link

SparkQA commented Nov 19, 2018

Test build #98989 has finished for PR 23040 at commit 3c6d349.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in b58b1fd Nov 20, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…chunks correctly

## What changes were proposed in this pull request?

Empty chunk in ChunkedByteBuffer will truncate the ChunkedByteBufferInputStream.
The detail reason is described in: https://issues.apache.org/jira/browse/SPARK-26068

## How was this patch tested?
Modified current UT to cover this case.

Closes apache#23040 from LinhongLiu/fix-empty-chunked-byte-buffer.

Lead-authored-by: Liu,Linhong <[email protected]>
Co-authored-by: Xianjin YE <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants