@bjornjorgensen (Contributor) commented Nov 1, 2025

What changes were proposed in this pull request?

Update pandas to 2.3.3

Why are the changes needed?

New version with some bug fixes and support for Python 3.14.

Pandas 2.3.3 is the first version of pandas that is generally compatible with the upcoming Python 3.14, and both wheels for free-threaded and normal Python 3.14 will be uploaded for this release.

Release notes

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass CI/CD tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@bjornjorgensen changed the title from "[WIP] Pandas 2 3 3" to "[WIP] [SPARK-54131][PYTHON] Update Pandas version 2.3.3" on Nov 1, 2025
@bjornjorgensen changed the title from "[WIP] [SPARK-54131][PYTHON] Update Pandas version 2.3.3" to "[SPARK-54131][PYTHON] Update Pandas version 2.3.3" on Nov 1, 2025
@dongjoon-hyun (Member) left a comment

Hi, @bjornjorgensen . I left a few comments.

pyarrow>=15.0.0
six==1.16.0
-pandas>=2.2.0
+pandas>=2.3.3
Member

Please revert this because the minimum Pandas version is still 2.2.0.

_minimum_pandas_version = "2.2.0"

zlib1g-dev

-ARG BASIC_PIP_PKGS="numpy==1.22.4 pyarrow==15.0.0 pandas==2.2.0 six==1.16.0 scipy scikit-learn coverage unittest-xml-reporting"
+ARG BASIC_PIP_PKGS="numpy==1.22.4 pyarrow==15.0.0 pandas==2.3.3 six==1.16.0 scipy scikit-learn coverage unittest-xml-reporting"
Member

Please revert this because the minimum Pandas version is still 2.2.0.

_minimum_pandas_version = "2.2.0"

Contributor Author

@dongjoon-hyun OK, but should we change _minimum_pandas_version to 2.3.3, or do you still want to keep pandas 2.2.0 as the minimum? pandas 2.3.3 is the first version that supports Python 3.14.

Contributor Author

Thank you.
I have changed the minimum version back to 2.2.0, but let me know if we should raise it to 2.3.3.

@dongjoon-hyun (Member), Nov 2, 2025

Python 3.14 is not Apache PySpark's minimum supported Python version. The minimum versions of Python libraries have different criteria, @bjornjorgensen , because the majority of PySpark users are still using Python 3.10/3.11/3.12/3.13.

but should we change _minimum_pandas_version to 2.3.3, or do you still want to keep pandas 2.2.0 as the minimum?
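
To illustrate the distinction made in this comment, here is a minimal sketch (not PySpark's actual implementation) of how a supported minimum differs from the version pinned in the CI images. Only `_minimum_pandas_version = "2.2.0"` comes from this thread; `parse_release`, `meets_minimum`, and `CI_PINNED_PANDAS` are hypothetical names, and the parser assumes plain numeric versions such as `2.3.3`.

```python
# Illustrative sketch only; PySpark's real version check may differ.
_minimum_pandas_version = "2.2.0"  # value quoted in the review comments
CI_PINNED_PANDAS = "2.3.3"  # hypothetical name for the version pinned in the test images


def parse_release(version: str) -> tuple:
    """Convert a plain numeric version such as '2.3.3' into (2, 3, 3)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = _minimum_pandas_version) -> bool:
    """Return True when the installed version is at or above the supported floor."""
    return parse_release(installed) >= parse_release(minimum)


# The CI pin can move ahead without raising the floor users must satisfy.
print(meets_minimum(CI_PINNED_PANDAS))  # True
print(meets_minimum("2.2.0"))  # True
print(meets_minimum("2.1.4"))  # False
```

Under these assumptions, bumping the tested version to 2.3.3 leaves users on 2.2.0 unaffected, which is why the change ends up scoped to the test images.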

@bjornjorgensen changed the title from "[SPARK-54131][PYTHON] Update Pandas version 2.3.3" to "[SPARK-54131][PYTHON][TESTS] Update Pandas version 2.3.3" on Nov 3, 2025
@bjornjorgensen (Contributor Author)

I changed the title of this PR to add [TESTS] now that it's only for tests. CC @cloud-fan

@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @bjornjorgensen .

ln -sf /usr/local/pypy/pypy3.10/bin/pypy /usr/local/bin/pypy3
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.3.0' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.3.3' scipy coverage matplotlib lxml
Member

Please revise the following part of the PR description because of this change. You can simply delete the "from 2.3.2" part.

What changes were proposed in this pull request?

Update pandas from 2.3.2 to 2.3.3

Member

I fixed it for you, @bjornjorgensen .

Contributor Author

@dongjoon-hyun Thank you

@cloud-fan (Contributor)

There are so many Python environment specs for the CI... do we have a plan to unify them? cc @HyukjinKwon

@dongjoon-hyun (Member)

Hi, @cloud-fan . FYI, the specs were split by @zhengruifeng recently, in Apache Spark 4.0.0, via the umbrella JIRA SPARK-50294 (Refactor docker image for testing).

dongjoon-hyun pushed a commit that referenced this pull request Nov 3, 2025
### What changes were proposed in this pull request?
Update pandas to 2.3.3

### Why are the changes needed?
New version with some bug fixes and support for python 3.14

_Pandas 2.3.3 is the first version of pandas that is generally compatible with the upcoming Python 3.14, and both wheels for free-threaded and normal Python 3.14 will be uploaded for this release._

[Release notes](https://pandas.pydata.org/pandas-docs/version/2.3/whatsnew/index.html#release)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass CI/CD tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #52828 from bjornjorgensen/pandas-2_3_3.

Authored-by: Bjørn Jørgensen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit fc3a590)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun (Member)

Merged to master/4.1 for Apache Spark 4.1.0.

Thank you, @bjornjorgensen and @cloud-fan .

@bjornjorgensen bjornjorgensen deleted the pandas-2_3_3 branch November 3, 2025 20:04