[DOCS][SPARK-18365] Improve Sample Method Documentation #15815
Changes from all commits
c6e1000
9f04fa8
e46e6a7
ce2bb90
a257d81
19c4828
c94b75c
b4f2611
fae4a80
f29acb4
064c653
f464b6e
0d7cde8
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -549,6 +549,11 @@ def distinct(self): |
| | | def sample(self, withReplacement, fraction, seed=None): |
| | | """Returns a sampled subset of this :class:`DataFrame`. |
| | | |
| | | .. note:: |

Member
Tiny question about this syntax (I don't know it) -- I see other instances of this in the code base have to use a line continuation.

Contributor, Author
I've found that this syntax works quite well. I'm not familiar with the line continuation syntax that you're referring to, but this will display appropriately (on one line as a sentence).

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | This is not guaranteed to provide exactly the fraction specified of the total count |
| | | of the given :class:`DataFrame`. |
| | | |
| | | >>> df.sample(False, 0.5, 42).count() |
| | | 2 |
| | | """ |
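
To illustrate why the note says the sampled count is not exact, here is a minimal pure-Python sketch. It assumes that sampling without replacement keeps each row independently with probability `fraction`; the helper `bernoulli_sample` is hypothetical and is not Spark's implementation.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


rows = list(range(1000))

# Sample sizes for a few different seeds. They tend to land near
# fraction * len(rows), but nothing forces them to equal it.
sizes = [len(bernoulli_sample(rows, 0.5, seed)) for seed in range(5)]
print(sizes)

# Reusing the same seed reproduces the same sample.
assert bernoulli_sample(rows, 0.5, 42) == bernoulli_sample(rows, 0.5, 42)
```

Under this assumption the result size is a random quantity centered on `fraction * len(rows)`, which is consistent with the wording of the note: the fraction acts as a per-row probability rather than a target count.
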
There are a couple of overloads of `sample` here; update them all, and maybe apply the clarification about `seed` you added below?
Yes. Will add this.
There is also another method in the Python RDD that I forgot; I will fix it now as well.
Fixed; the Python one didn't need it once I re-read the docs.