Skip to content

Conversation

@witgo
Copy link
Contributor

@witgo witgo commented Oct 24, 2014

cc @rxin

@witgo witgo changed the title NioBlockTransferService.fetchBlocks may cause spark to hang. [SPARK-4064]NioBlockTransferService.fetchBlocks may cause spark to hang. Oct 24, 2014
@SparkQA
Copy link

SparkQA commented Oct 24, 2014

Test build #22137 has started for PR 2929 at commit f8f3195.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 24, 2014

Test build #22137 has finished for PR 2929 at commit f8f3195.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22137/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change looks good. but can you comment on when this would happen and put it into the source code?

thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @aarondav since this will conflict with your change

@SparkQA
Copy link

SparkQA commented Oct 25, 2014

Test build #22216 has started for PR 2929 at commit c23e562.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 25, 2014

Test build #22216 has finished for PR 2929 at commit c23e562.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22216/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this say "Received empty message from $cmId" or "Received no data from $cmId"?

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #22269 has started for PR 2929 at commit 20110f2.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #22269 has finished for PR 2929 at commit 20110f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22269/
Test PASSed.

@rxin
Copy link
Contributor

rxin commented Oct 28, 2014

Thanks. I'm merging this one.

@rxin
Copy link
Contributor

rxin commented Oct 28, 2014

@aarondav this will conflict with your pr.

@asfgit asfgit closed this in 7c0c26c Oct 28, 2014
@witgo witgo deleted the SPARK-4064 branch October 28, 2014 08:38
@JoshRosen
Copy link
Contributor

This is moot now, since the related code has already changed, but I just wanted to point out that this PR seems subtly incorrect in cases where some but not all of of the blocks are missing in the remote block manager. As of this commit, getBlock returns null when blocks are missing, which ends up getting wrapped into a None in processBlock, so a fetch failure for a particular block results in the absence of a block result rather than an explicit error response. In some later commits, this was changed (perhaps unintentionally) so that fetch failures result in exceptions that result in connection manager error messages, so these errors are now handled in the future's onFailure handler.

In retrospect, I think that the right fix here would have been to either a) send negative error responses, or b) have the requester match the responses it received against the ids in its request and treat any missing blocks as failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants