[SPARK-38774][PYTHON] Implement Series.autocorr #36048

zhengruifeng · 2022-04-02T12:50:38Z

What changes were proposed in this pull request?

Implement Series.autocorr

Why are the changes needed?

For pandas API coverage.

Does this PR introduce any user-facing change?

Yes, Series now supports the autocorr function:

In [86]: s = pd.Series([.2, .0, .6, .2, np.nan, .5, .6])

In [87]: s.autocorr()
Out[87]: -0.14121975762272054

How was this patch tested?

Added doctests.

zhengruifeng · 2022-04-02T12:51:22Z

Checked against the pandas side:

In [86]: s = pd.Series([.2, .0, .6, .2, np.nan, .5, .6])

In [87]: s.autocorr()
Out[87]: -0.14121975762272054

In [88]: s.autocorr(0)
Out[88]: 1.0

In [89]: s.autocorr(2)
Out[89]: 0.9707253433941511

In [90]: s.autocorr(-3)
Out[90]: 0.2773500981126146

In [91]: s.autocorr(5)
Out[91]: -0.9999999999999998

In [92]: s.autocorr(6)
/home/zrf/.zrf/anaconda3/lib/python3.9/site-packages/numpy/lib/function_base.py:2683: RuntimeWarning: Degrees of freedom <= 0 for slice
  c = cov(x, y, rowvar, dtype=dtype)
/home/zrf/.zrf/anaconda3/lib/python3.9/site-packages/numpy/lib/function_base.py:2542: RuntimeWarning: divide by zero encountered in true_divide
  c *= np.true_divide(1, fact)
Out[92]: nan
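For readers who want to see what these numbers mean, here is a minimal plain-Python sketch of the lag-N autocorrelation semantics: the Pearson correlation between the series and a copy of itself shifted by `lag`, skipping any pair that contains a NaN. It is not the Spark implementation from this PR; the function name and the list-based input are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def autocorr(values: Sequence[float], lag: int = 1) -> float:
    """Pearson correlation between `values` and `values` shifted by `lag`."""
    n = len(values)
    pairs: List[Tuple[float, float]] = []
    for i in range(n):
        j = i - lag  # element that lands on position i after the shift
        if 0 <= j < n:
            x, y = values[i], values[j]
            # Skip pairs with a missing value on either side.
            if not (math.isnan(x) or math.isnan(y)):
                pairs.append((x, y))

    # Fewer than two complete pairs: the correlation is undefined.
    if len(pairs) < 2:
        return math.nan

    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)

    # A constant side has zero variance, so the ratio is undefined.
    if var_x == 0.0 or var_y == 0.0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


s = [0.2, 0.0, 0.6, 0.2, math.nan, 0.5, 0.6]
for lag in (1, 0, 2, -3, 5, 6):
    print(lag, autocorr(s, lag))
```

On the series used above, this sketch should agree with the pandas results up to floating-point rounding. With a lag of 6 only one complete pair remains, which is why the result is nan.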

HyukjinKwon · 2022-04-03T00:53:43Z

Thanks for working on this @zhengruifeng ! cc @ueshin @xinrong-databricks @itholic FYI!

awdavidson · 2022-04-03T17:32:21Z

python/pyspark/pandas/series.py

Nit: should we define the column names that are reused throughout the method as variables?

good point, will update soon

awdavidson

It would be good to add some basic tests around this.

xinrong-meng · 2022-04-03T18:56:30Z

Thanks @zhengruifeng!

https://github.com/apache/spark/blob/master/python/pyspark/pandas/tests/test_series.py is a good place to add tests.

It would be great to specify what changes in the "Does this PR introduce any user-facing change?" section of the PR description. An example is good enough.

zhengruifeng · 2022-04-04T00:48:43Z

@xinrong-databricks Will add the tests and update the PR description, thanks!

xinrong-meng · 2022-04-04T16:46:41Z

python/pyspark/pandas/series.py

Shall we add .. versionadded:: 3.4.0?

Yeah, let's document the .. versionadded:: 3.4.0 here.

awdavidson

+1 LGTM

zhengruifeng · 2022-04-13T03:54:12Z

cc @HyukjinKwon , I think this PR is ready too

HyukjinKwon · 2022-04-13T04:06:22Z

python/pyspark/pandas/series.py

Maybe we should also add a note about the global window operation and its performance impact.

HyukjinKwon

LGTM otherwise
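As a rough illustration of the two documentation requests in this review (the `.. versionadded:: 3.4.0` directive and a note about the global window operation), a docstring could look like the sketch below. The function name `autocorr_stub` and the wording of the note are hypothetical and are not copied from the merged code; the parameter name `lag` follows the follow-up rename recorded later in this thread.

```python
def autocorr_stub(lag: int = 1) -> float:
    """
    Compute the lag-N autocorrelation (hypothetical docstring sketch).

    .. versionadded:: 3.4.0

    .. note:: Hypothetical wording. This method relies on a global window
        operation, that is, a window without a partition specification.
        Spark moves all the data into a single partition for such a window,
        which can cause serious performance degradation on large datasets.

    Parameters
    ----------
    lag : int, default 1
        Number of lags to apply before computing the correlation.
    """
    # Illustration of the docstring layout only; there is no implementation.
    raise NotImplementedError("docstring illustration only")


print(autocorr_stub.__doc__)
```

Sphinx renders `versionadded` and `note` as separate callouts in the API reference, so a reader sees both the release in which the method appeared and the performance caveat before calling it.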

HyukjinKwon · 2022-04-13T08:09:47Z

Build link: https://github.com/zhengruifeng/spark/runs/6003246221

zhengruifeng · 2022-04-13T09:01:10Z

all tests passed

HyukjinKwon · 2022-04-13T09:08:56Z

Merged to master.

zhengruifeng · 2022-04-13T09:17:45Z

Thanks all for reviewing!

… consistent with Pandas

### What changes were proposed in this pull request?

In `Series.autocorr`, rename `periods` to `lag`.

### Why are the changes needed?

When implementing `Series.autocorr` in my first PS PR #36048, I wrongly followed the parameter name `min_periods` in `Series.corr`; it should be `lag`, the same as in [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.Series.autocorr.html).

### Does this PR introduce _any_ user-facing change?

No, since 3.4 is not released.

### How was this patch tested?

Existing UTs.

Closes #38216 from zhengruifeng/ps_ser_autocorr_rename_parameter.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>


github-actions bot added CORE PYTHON labels Apr 2, 2022

awdavidson reviewed Apr 3, 2022

xinrong-meng reviewed Apr 4, 2022

awdavidson approved these changes Apr 5, 2022

HyukjinKwon reviewed Apr 13, 2022

HyukjinKwon approved these changes Apr 13, 2022

zhengruifeng added 5 commits April 13, 2022 15:14

init

3bdd8dc

reformat

ecbd743

address comments

adce588

add versionadded

1389b00

add performance note

78daa69

zhengruifeng force-pushed the pandas_series_autocorr branch from 90ca30c to 78daa69 April 13, 2022 07:28

HyukjinKwon closed this in eb699ec Apr 13, 2022

zhengruifeng deleted the pandas_series_autocorr branch April 13, 2022 09:18

zhengruifeng mentioned this pull request Oct 12, 2022

[SPARK-38774][PS][FOLLOW-UP] Make parameter name in Series.autocorr consistent with Pandas #38216

Closed
