Conversation

@srowen
Member

@srowen srowen commented Jul 19, 2016

What changes were proposed in this pull request?

Document RDD.pipe semantics; don't execute process for empty input partitions.

Note this includes the fix in #14256 because it's necessary to even test this. One or the other will merge the fix.

How was this patch tested?

Jenkins tests including new test.

@SparkQA

SparkQA commented Jul 19, 2016

Test build #62521 has finished for PR 14260 at commit 62692af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jul 19, 2016

Is it possible that the underlying command always returns something, even for 0 rows? E.g., if it is counting the number of elements?

@srowen
Member Author

srowen commented Jul 19, 2016

Yeah, that's the 'problem': consider wc -l, which returns 0 for no input at all. That explains the current behavior, where an empty partition results in a partition of one element: "0". That seems a bit odd, though not that odd, because pipe is actually quite partition-oriented. I don't feel strongly about changing the behavior for an empty partition, though, if anyone thinks it actually has the right semantics. In any event, the current behavior could be documented better.
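
To make the empty-partition case concrete, here is a minimal sketch in plain Python. It is not Spark code and not the implementation in this patch: `pipe_partition`, the `skip_empty` flag, and the sample partitions are hypothetical names for illustration, and a small Python one-liner stands in for wc -l so the sketch does not depend on wc being installed. It only assumes the partition-oriented model described above: one process per partition, elements written as lines to stdin, and each line of stdout becoming an output element.

```python
import subprocess
import sys

# Stand-in for `wc -l`: counts input lines and always prints a number,
# even when it receives no input at all.
COUNT_LINES = [
    sys.executable,
    "-c",
    "import sys; print(len(sys.stdin.readlines()))",
]


def pipe_partition(lines, command, skip_empty):
    """Run `command` once for one partition, feeding its elements as stdin lines.

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    if skip_empty and not lines:
        # Behavior proposed in this PR: no process is run for an empty partition.
        return []
    stdin_text = "".join(line + "\n" for line in lines)
    result = subprocess.run(
        command, input=stdin_text, capture_output=True, text=True, check=True
    )
    # Each line of the process's stdout becomes one output element.
    return result.stdout.splitlines()


# Three partitions, the middle one empty.
partitions = [["a", "b"], [], ["c"]]

before = [pipe_partition(p, COUNT_LINES, skip_empty=False) for p in partitions]
after = [pipe_partition(p, COUNT_LINES, skip_empty=True) for p in partitions]

print(before)  # [['2'], ['0'], ['1']]
print(after)   # [['2'], [], ['1']]
```

With `skip_empty=False` the empty partition still yields the single element "0", which is the behavior being questioned; with `skip_empty=True` it yields nothing, so summing the counts gives the same total either way, but the number of output elements differs.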

@SparkQA

SparkQA commented Jul 20, 2016

Test build #62595 has finished for PR 14260 at commit 4f9d4d8.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member Author

srowen commented Jul 20, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Jul 20, 2016

Test build #62601 has finished for PR 14260 at commit 4f9d4d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jul 20, 2016

LGTM

@rxin
Contributor

rxin commented Jul 20, 2016

Merging in master/2.0.

asfgit pushed a commit that referenced this pull request Jul 20, 2016

Author: Sean Owen <[email protected]>

Closes #14260 from srowen/SPARK-16613.

(cherry picked from commit 4b079dc)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 4b079dc Jul 20, 2016
@srowen srowen deleted the SPARK-16613 branch July 24, 2016 08:23
3 participants