Closed
Update python/pyspark/sql/pandas/conversion.py
HyukjinKwon authored Nov 18, 2021
commit eb2a55ee3f86341bb6e4d73caef4f55e79713daf
2 changes: 1 addition & 1 deletion python/pyspark/sql/pandas/conversion.py
@@ -227,7 +227,7 @@ def toPandas(self) -> "PandasDataFrameLike":
      else:
          series = pdf[column_name]

-     # No need to cast for empty series for timedelta.
+     # No need to cast for non-empty series for timedelta. The type is already correct.
      should_check_timedelta = is_timedelta64_dtype(t) and len(pdf) == 0
Member

Did you actually mean len(pdf) != 0? Or did I misread the code/comment?

Member Author

Oh, the comments are wrong. Let me rewrite them.

Member Author

BTW, this is to work around a bug in the Arrow <> pandas conversion.

For some reason, a pd.Series(pd.Timedelta(...), dtype="object") created from Arrow becomes float64 when you cast it with series.astype("timedelta64[us]") and the data is non-empty. This cannot be reproduced with a plain pandas Series.

So I avoided it here by skipping the cast when the data is non-empty, because the type is already correct in that case. When the data is empty, the type becomes object, and it has to be cast.
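
To make the resulting condition concrete, here is a minimal sketch of the decision it encodes, under a few assumptions: `needs_cast` and `is_timedelta_name` are hypothetical names that do not exist in PySpark, and the target type is passed as a dtype name string rather than a real dtype object, so the string check is only a stand-in for `is_timedelta64_dtype`.

```python
from typing import Optional


def is_timedelta_name(t: Optional[str]) -> bool:
    # Hypothetical stand-in for is_timedelta64_dtype, working on dtype names only.
    return t is not None and t.startswith("timedelta64")


def needs_cast(t: Optional[str], num_rows: int) -> bool:
    # Mirrors the condition in the diff: timedelta columns are cast only
    # when the frame is empty; other known target types are always cast.
    should_check_timedelta = is_timedelta_name(t) and num_rows == 0
    return (t is not None and not is_timedelta_name(t)) or should_check_timedelta


# Walk through the four interesting cases.
for dtype_name, rows in [("timedelta64[ns]", 0), ("timedelta64[ns]", 3),
                         ("int64", 3), (None, 0)]:
    print(dtype_name, rows, needs_cast(dtype_name, rows))
```

In this sketch only the non-empty timedelta case and the case with no target type skip the cast, which matches the explanation above.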

Member

Thanks for updating it. Looks good now.


if (t is not None and not is_timedelta64_dtype(t)) or should_check_timedelta: