[SPARK-8373][PySpark]Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD #6826
Test build #34927 has finished for PR 6826 at commit
retest this please
`emptyRDD[T]` should produce an RDD of `T`, not byte arrays, right?
T in the return type is not used. Actually, it can be an arbitrary type.
I will update it to T
Test build #34935 has finished for PR 6826 at commit
/cc @davies
LGTM
Test build #34937 has finished for PR 6826 at commit
Is this API used anywhere? If the answer is "no", we should remove it.
I think this is just bringing it to parity with the Scala SparkContext API, which we currently also only use in tests. This is probably fine.
This class is PythonRDD, not JavaSparkContext.
Oh, I see
… calling sum on an empty RDD This PR fixes the sum issue and also adds `emptyRDD` so that it's easy to create a test case. Author: zsxwing <[email protected]> Closes #6826 from zsxwing/python-emptyRDD and squashes the following commits: b36993f [zsxwing] Update the return type to JavaRDD[T] 71df047 [zsxwing] Add emptyRDD to pyspark and fix the issue when calling sum on an empty RDD (cherry picked from commit 0fc4b96) Signed-off-by: Andrew Or <[email protected]>
This is a follow-up PR to remove unused `PythonRDD.emptyRDD` added by #6826 Author: zsxwing <[email protected]> Closes #6867 from zsxwing/remove-PythonRDD-emptyRDD and squashes the following commits: b66d363 [zsxwing] Remove PythonRDD.emptyRDD
This PR fixes the sum issue and also adds `emptyRDD` so that it's easy to create a test case.
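To illustrate why summing an empty RDD can fail, here is a minimal pure-Python sketch. It does not use PySpark and does not reproduce this PR's actual patch. It assumes an RDD can be modeled as a list of partitions (an empty RDD having zero partitions), and the function names `partition_sums`, `sum_with_reduce`, and `sum_with_fold` are hypothetical. The idea is that combining per-partition sums with a plain reduce has nothing to start from when there are no partitions, while a fold with a zero value still returns 0.

```python
from functools import reduce
import operator


def partition_sums(partitions):
    # One partial sum per partition; an empty partition contributes 0.
    return [sum(p) for p in partitions]


def sum_with_reduce(partitions):
    # Reduce without an initial value: raises TypeError when there are
    # no partial sums at all (i.e. the modeled RDD has zero partitions).
    return reduce(operator.add, partition_sums(partitions))


def sum_with_fold(partitions):
    # Fold with a zero value: well defined even with zero partitions.
    return reduce(operator.add, partition_sums(partitions), 0)


empty = []  # assumption: models an empty RDD as zero partitions

try:
    sum_with_reduce(empty)
except TypeError as exc:
    print("reduce on empty input failed:", exc)

print(sum_with_fold(empty))
print(sum_with_fold([[1, 2], [], [3]]))
```

In this model an empty partition is harmless, because `sum([])` is 0. The failure only appears when there are no partitions at all, which is why an easy way to create an empty RDD is useful for writing the test case.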