[SPARK-38774][PYTHON] Implement Series.autocorr #36048
check against the Pandas side:
Thanks for working on this @zhengruifeng ! cc @ueshin @xinrong-databricks @itholic FYI!
python/pyspark/pandas/series.py
Nit: should we define the column names in variables which are reused throughout the method?
good point, will update soon
awdavidson
left a comment
Would be good to add some basic tests around this?
Thanks @zhengruifeng! https://github.com/apache/spark/blob/master/python/pyspark/pandas/tests/test_series.py is a good place to add tests. It would also be great to describe what changes in the "Does this PR introduce any user-facing change?" section of the PR description; an example is good enough.
@xinrong-databricks Will add the tests and update the PR description, thanks!
python/pyspark/pandas/series.py
Shall we add .. versionadded:: 3.4.0?
Yeah, let's document the .. versionadded:: 3.4.0 here.
Sure!
awdavidson
left a comment
+1 LGTM
cc @HyukjinKwon , I think this PR is ready too
python/pyspark/pandas/series.py
Maybe we should also add a note about the global window operation and its performance impact.
HyukjinKwon
left a comment
LGTM otherwise
90ca30c to 78daa69
all tests passed
Merged to master.
Thanks all for reviewing!
… consistent with Pandas

### What changes were proposed in this pull request?
in `Series.autocorr`, rename `periods` as `lag`

### Why are the changes needed?
when implementing the `Series.autocorr` in my first PS PR #36048 , I wrongly follow the parameter name `min_periods` in `Series.corr`, it should be `lag` to be the same with [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.Series.autocorr.html)

### Does this PR introduce _any_ user-facing change?
no, since 3.4 is not released

### How was this patch tested?
existing UTs

Closes #38216 from zhengruifeng/ps_ser_autocorr_rename_parameter.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
What changes were proposed in this pull request?
Implement Series.autocorr
Why are the changes needed?
for API coverage
Does this PR introduce any user-facing change?
yes, Series now supports the function `autocorr`
How was this patch tested?
added doctest
added doctest
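For readers unfamiliar with the method, here is a minimal pure-Python sketch of the quantity `Series.autocorr` is meant to return: the Pearson correlation between a series and a copy of itself shifted by `lag`. This is not the code from this PR and not the pandas implementation. The function name `autocorr_sketch` is hypothetical, and the sketch assumes a non-negative `lag` and no missing values.

```python
import math
from typing import Sequence


def autocorr_sketch(values: Sequence[float], lag: int = 1) -> float:
    """Pearson correlation between `values` and `values` shifted by `lag`.

    Hypothetical helper for illustration only: assumes lag >= 0 and
    that `values` contains no missing entries.
    """
    if lag < 0:
        raise ValueError("this sketch only handles a non-negative lag")

    n = len(values) - lag  # number of overlapping (current, lagged) pairs
    if n < 2:
        return math.nan  # correlation is undefined for fewer than two pairs

    current = values[lag:]  # values[t]
    lagged = values[:n]     # values[t - lag]

    mean_c = sum(current) / n
    mean_l = sum(lagged) / n

    cov = sum((c - mean_c) * (l - mean_l) for c, l in zip(current, lagged))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_l = sum((l - mean_l) ** 2 for l in lagged)

    if var_c == 0 or var_l == 0:
        return math.nan  # a constant side has no defined correlation

    return cov / math.sqrt(var_c * var_l)


print(autocorr_sketch([1, 2, 3, 4, 5], lag=1))    # 1.0
print(autocorr_sketch([1, -1, 1, -1, 1], lag=1))  # -1.0
```

A linear ramp is perfectly correlated with its own shift, while an alternating sequence is perfectly anti-correlated at lag 1. Pairing every value with the value `lag` rows earlier depends on a single global ordering of the rows, which is presumably why the review asks for a note on the global window operation.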