Commit 9c35c43
[SPARK-52580][PS] Avoid CAST_INVALID_INPUT of `replace` in ANSI mode
### What changes were proposed in this pull request?
Avoid CAST_INVALID_INPUT of `replace` in ANSI mode.
Specifically, under ANSI mode:
- Values are now cast with `try_cast()`, which yields NULL instead of raising on invalid input.
- NaN checks no longer call `F.isnan()` on non-numeric types.
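The two points above can be illustrated with a plain-Python sketch. It does not use the Spark API and is not the code in this patch; `strict_cast`, `try_cast`, and `is_nan` are hypothetical helpers that only model the semantics: a strict cast raises on invalid input, a try-cast returns `None` (standing in for NULL), and the NaN check is skipped for non-numeric values.

```python
import math
from typing import Any, Callable, Optional


def strict_cast(value: Any, target: Callable[[Any], Any]) -> Any:
    """Hypothetical stand-in for CAST under ANSI mode: invalid input raises."""
    return target(value)


def try_cast(value: Any, target: Callable[[Any], Any]) -> Optional[Any]:
    """Hypothetical stand-in for TRY_CAST: invalid input yields None (NULL)."""
    if value is None:
        return None
    try:
        return target(value)
    except (TypeError, ValueError):
        return None


def is_nan(value: Any) -> bool:
    """Only floats can be NaN, so non-numeric values are never checked."""
    return isinstance(value, float) and math.isnan(value)


print(try_cast("3", int))    # 3
print(try_cast("a", int))    # None
print(is_nan("a"))           # False
print(is_nan(float("nan")))  # True
```

Calling `strict_cast("a", int)` raises `ValueError`, which is the plain-Python analogue of the CAST_INVALID_INPUT error this change avoids.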
An example of the difference in the Spark plan between ANSI off and on:
```
# if the original column is of StringType
# ANSI off
Column<'CASE WHEN in(C, 0, 1, 2, 3, 5, 6) THEN 4 ELSE C END'>
# ANSI on
Column<'CASE WHEN in(C, TRY_CAST(0 AS STRING), TRY_CAST(1 AS STRING), TRY_CAST(2 AS STRING), TRY_CAST(3 AS STRING), TRY_CAST(5 AS STRING), TRY_CAST(6 AS STRING)) THEN TRY_CAST(4 AS STRING) ELSE TRY_CAST(C AS STRING) END'>
```
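Read as plain Python, the ANSI-on expression above behaves roughly like the sketch below. This models only the `CASE WHEN ... IN ...` expression shown, not the pandas-on-Spark implementation; `replace_in_string_column` is a hypothetical name, and `str(v)` stands in for `TRY_CAST(v AS STRING)`.

```python
from typing import Any, Iterable, List, Optional


def replace_in_string_column(
    column: Iterable[Optional[str]],
    to_replace: Iterable[Any],
    value: Any,
) -> List[Optional[str]]:
    """Model of CASE WHEN C IN (casted candidates) THEN casted value ELSE C."""
    # Assumption: casting each candidate to the column type is modeled by str().
    candidates = {str(v) for v in to_replace}
    replacement = str(value)
    # A NULL (None) never satisfies IN, so it falls through to the ELSE branch.
    return [
        replacement if cell is not None and cell in candidates else cell
        for cell in column
    ]


result = replace_in_string_column(["a", "b", "c", "d", None], [0, 1, 2, 3, 5, 6], 4)
print(result)  # ['a', 'b', 'c', 'd', None]
```

Because every operand is brought to the column's type before comparison, no integer is ever cast from a non-numeric string, which is the situation that raised the error.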
### Why are the changes needed?
Ensure pandas on Spark works well with ANSI mode on.
Part of https://issues.apache.org/jira/browse/SPARK-52556.
### Does this PR introduce _any_ user-facing change?
Yes. `replace` now works under ANSI mode. For example:
```py
>>> ps.set_option("compute.fail_on_ansi_mode", False)
>>> ps.set_option("compute.ansi_mode_support", True)
>>> pdf = pd.DataFrame(
... {"A": [0, 1, 2, 3, np.nan], "B": [5, 6, 7, 8, np.nan], "C": ["a", "b", "c", "d", None]},
... index=np.random.rand(5),
... )
>>> psdf = ps.from_pandas(pdf)
>>> psdf["C"].replace([0, 1, 2, 3, 5, 6], 4)
0.458472       a
0.749773       b
0.222904       c
0.397280       d
0.293933    None
Name: C, dtype: object
>>> psdf.replace([0, 1, 2, 3, 5, 6], [6, 5, 4, 3, 2, 1])
            A    B     C
0.458472  6.0  2.0     a
0.749773  5.0  1.0     b
0.222904  4.0  7.0     c
0.397280  3.0  8.0     d
0.293933  NaN  NaN  None
```
### How was this patch tested?
Unit tests
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #51297 from xinrong-meng/replace.
Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Xinrong Meng <[email protected]>
1 parent e9a285e commit 9c35c43
File tree
2 files changed, +46 -11 lines changed
- python/pyspark/pandas
  - tests/computation (0 additions & 1 deletion)