[SPARK-2871] [PySpark] Add missing API #1791
Closed
24 commits (all by davies):
ff2cbe3 add missing API in SparkContext
e0b3d30 add histogram()
5d5be95 change histogram API
a95eca0 add zipWithIndex and zipWithUniqueId
4ffae00 collectPartitions()
7a9ea0a update docs of histogram
53640be histogram() in pure Python, better support for int
9a01ac3 fix docs
7ba5f88 refactor
a25c34e fix bug of countApproxDistinct
1218b3b add countApprox and countApproxDistinct
034124f Merge branch 'master' into api
9132456 fix pep8
977e474 address comments: improve docs
ac606ca comment out not implemented APIs
f0158e4 comment out not implemented API in SparkContext
cb4f712 Mark SparkConf as read-only after initialization
96713fa Merge branch 'master' into api
e9e1037 Merge branch 'master' into api
63c013d address all the comments:
1213aca Merge branch 'master' into api
28fd368 Merge branch 'master' into api
1ac98d6 remove splitted changes
657a09b remove countApproxDistinct()
Changes from 1 commit: 4ffae0031e1f00641845fc5e9e3b62f54e7c56ad (collectPartitions())
In the Scala API, this is marked as a private API used only for tests. Is there a non-test use case for this?
It helps with debugging: you can collect some of the partitions of the RDD and investigate them.
It would also be helpful to have an API called slice(start, [end]) that selects a range of the partitions. DPark has this kind of API, and it has helped us a lot: narrowing down the data makes debugging faster.
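To make the two ideas in this exchange concrete, here is a minimal pure-Python sketch that models an RDD as a plain list of partitions (each partition a list of records). The helpers `collect_partitions`, `slice_partitions` and `collect` are hypothetical names for illustration only; they are not PySpark or DPark APIs. The choice that `end` defaults to "through the last partition" (like Python's `seq[start:]`) is also an assumption, since the comment leaves the default unspecified.

```python
from typing import Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Toy model: an "RDD" is just a list of partitions, each a list of records.
# All helper names here are hypothetical, not part of PySpark or DPark.


def collect_partitions(partitions: Sequence[Sequence[T]],
                       ids: Iterable[int]) -> List[List[T]]:
    """Return the records of each requested partition, one list per id.

    Raises IndexError if an id is out of range.
    """
    return [list(partitions[i]) for i in ids]


def slice_partitions(partitions: Sequence[Sequence[T]],
                     start: int,
                     end: Optional[int] = None) -> List[List[T]]:
    """Keep only partitions [start, end); the result is still partitioned.

    Assumption: end=None means "through the last partition", as with
    Python's seq[start:].
    """
    return [list(p) for p in partitions[start:end]]


def collect(partitions: Sequence[Sequence[T]]) -> List[T]:
    """Flatten all partitions into a single list of records, in order."""
    return [record for part in partitions for record in part]


if __name__ == "__main__":
    data = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    print(collect_partitions(data, [0, 2]))      # [[0, 1, 2], [6, 7, 8]]
    print(slice_partitions(data, 2))             # [[6, 7, 8], [9]]
    print(collect(slice_partitions(data, 1, 3)))  # [3, 4, 5, 6, 7, 8]
```

In this sketch, slicing returns a narrower partitioned value rather than records, so it can be chained with further transformations before collecting, which matches the "narrow down the data" debugging workflow described above.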
I agree with Josh, let's delete this for now. We can open a separate JIRA about making it public and maybe discuss there.
BTW, I do like a slice-based API in general; that might be what we propose publicly.