[SPARK-54131][PYTHON][TESTS] Update Pandas version 2.3.3
#52828
Pandas version 2.3.3
Hi, @bjornjorgensen. I left a few comments.
dev/requirements.txt
 pyarrow>=15.0.0
 six==1.16.0
-pandas>=2.2.0
+pandas>=2.3.3
Please revert this because the minimum Pandas version is still 2.2.0.
spark/python/packaging/classic/setup.py
Line 153 in e837ff9
_minimum_pandas_version = "2.2.0"
 zlib1g-dev

-ARG BASIC_PIP_PKGS="numpy==1.22.4 pyarrow==15.0.0 pandas==2.2.0 six==1.16.0 scipy scikit-learn coverage unittest-xml-reporting"
+ARG BASIC_PIP_PKGS="numpy==1.22.4 pyarrow==15.0.0 pandas==2.3.3 six==1.16.0 scipy scikit-learn coverage unittest-xml-reporting"
Please revert this because the minimum Pandas version is still 2.2.0.
spark/python/packaging/classic/setup.py
Line 153 in e837ff9
_minimum_pandas_version = "2.2.0"
@dongjoon-hyun OK, but should we change _minimum_pandas_version to 2.3.3, or do you still want to keep pandas 2.2.0 as the minimum? pandas 2.3.3 is the first version that supports Python 3.14.
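This question hinges on Python 3.14 needing pandas 2.3.3 while older interpreters do not. A hypothetical sketch of an interpreter-dependent floor makes the trade-off concrete. The function `effective_pandas_floor` and its mapping are illustrative assumptions, not what PySpark does; this PR keeps a single minimum of 2.2.0.

```python
import sys
from typing import Optional, Tuple

# Illustrative floors only: the PR keeps one global minimum of "2.2.0".
# "2.3.3" is the first pandas release that supports Python 3.14.
_DEFAULT_PANDAS_FLOOR = "2.2.0"
_PY314_PANDAS_FLOOR = "2.3.3"


def effective_pandas_floor(python_version: Optional[Tuple[int, int]] = None) -> str:
    """Return the lowest pandas version that can work on the given interpreter."""
    if python_version is None:
        # Default to the running interpreter, e.g. (3, 12).
        python_version = tuple(sys.version_info[:2])
    if python_version >= (3, 14):
        return _PY314_PANDAS_FLOOR
    return _DEFAULT_PANDAS_FLOOR
```

For example, `effective_pandas_floor((3, 12))` yields `"2.2.0"` and `effective_pandas_floor((3, 14))` yields `"2.3.3"`. In packaging metadata, the same idea can be written as a PEP 508 environment marker such as `pandas>=2.3.3; python_version >= "3.14"`.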
Thank you.
I have changed the minimum version back to 2.2.0, but let me know if we should raise it to 2.3.3.
Python 3.14 is not Apache PySpark's minimum supported Python version. The minimum versions of Python libraries follow different criteria, @bjornjorgensen, because the majority of PySpark users are still using Python 3.10/3.11/3.12/3.13.
I changed the title of this PR to add [TESTS] now that it's only for tests. CC @cloud-fan
+1, LGTM. Thank you, @bjornjorgensen.
 ln -sf /usr/local/pypy/pypy3.10/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.3.0' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.3.3' scipy coverage matplotlib lxml
Please revise the following part of the PR description because of this change. You can simply delete the "from 2.3.2" part.
What changes were proposed in this pull request?
Update pandas from 2.3.2 to 2.3.3
I fixed it for you, @bjornjorgensen.
@dongjoon-hyun Thank you
There are so many Python environment specs for the CI... do we have a plan to unify them? cc @HyukjinKwon
Hi, @cloud-fan. FYI, they were split recently by @zhengruifeng for Apache Spark 4.0.0 via the following umbrella JIRA: SPARK-50294.
### What changes were proposed in this pull request?
Update pandas to 2.3.3

### Why are the changes needed?
New version with some bug fixes and support for python 3.14

_Pandas 2.3.3 is the first version of pandas that is generally compatible with the upcoming Python 3.14, and both wheels for free-threaded and normal Python 3.14 will be uploaded for this release._

[Release notes](https://pandas.pydata.org/pandas-docs/version/2.3/whatsnew/index.html#release)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass CI/CD tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52828 from bjornjorgensen/pandas-2_3_3.

Authored-by: Bjørn Jørgensen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit fc3a590)
Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to master/4.1 for Apache Spark 4.1.0. Thank you, @bjornjorgensen and @cloud-fan.

What changes were proposed in this pull request?
Update pandas to 2.3.3
Why are the changes needed?
A new version with some bug fixes and support for Python 3.14.
Pandas 2.3.3 is the first version of pandas that is generally compatible with the upcoming Python 3.14, and both wheels for free-threaded and normal Python 3.14 will be uploaded for this release.
Release notes
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass CI/CD tests.
Was this patch authored or co-authored using generative AI tooling?
No.