Commit c99463d
[SPARK-26979][PYTHON][FOLLOW-UP] Make binary math/string functions take string as columns as well
## What changes were proposed in this pull request?
This is a followup of #23882 to handle binary math/string functions. For instance, see the cases below:
**Before:**
```python
>>> from pyspark.sql.functions import lit, ascii
>>> spark.range(1).select(lit('a').alias("value")).select(ascii("value"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/functions.py", line 51, in _
    jc = getattr(sc._jvm.functions, name)(col._jc if isinstance(col, Column) else col)
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
  File "/.../spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.ascii. Trace:
py4j.Py4JException: Method ascii([class java.lang.String]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
	at py4j.Gateway.invoke(Gateway.java:276)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
```
```python
>>> from pyspark.sql.functions import atan2
>>> spark.range(1).select(atan2("id", "id"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/functions.py", line 78, in _
    jc = getattr(sc._jvm.functions, name)(col1._jc if isinstance(col1, Column) else float(col1),
ValueError: could not convert string to float: id
```
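The `atan2` failure follows directly from the conversion visible in the traceback: anything that is not a `Column` is passed to `float()`. A minimal sketch of that dispatch, runnable without Spark, is given here; the `Column` class and the `old_binary_math_arg` helper are hypothetical stand-ins for illustration, not PySpark code.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column; wraps a JVM-side handle."""
    def __init__(self, jc):
        self._jc = jc


def old_binary_math_arg(col):
    """Mimic the pre-PR conversion shown in the traceback: Column handle or float."""
    return col._jc if isinstance(col, Column) else float(col)


# A numeric string was silently accepted as a literal number.
print(old_binary_math_arg("1"))  # 1.0

# A column name cannot be parsed as a float, so the call fails.
try:
    old_binary_math_arg("id")
except ValueError as exc:
    print("ValueError:", exc)
```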
**After:**
```python
>>> from pyspark.sql.functions import lit, ascii
>>> spark.range(1).select(lit('a').alias("value")).select(ascii("value"))
DataFrame[ascii(value): int]
```
```python
>>> from pyspark.sql.functions import atan2
>>> spark.range(1).select(atan2("id", "id"))
DataFrame[ATAN2(id, id): double]
```
Note that:
- This PR causes a slight behaviour change for binary math functions. Previously, numbers passed as strings (e.g., `"1"`) were accepted as numeric arguments. After this PR, such strings are interpreted as column names.
- I intentionally didn't document this behaviour change, since we're heading towards Spark 3.0 and I don't think numbers as strings make much sense in math functions.
- There is another exception, `when`, which takes strings as literal values, as shown below. This PR doesn't fix this ambiguity.
```python
>>> spark.range(1).select(when(lit(True), col("id"))).show()
```
```
+--------------------------+
|CASE WHEN true THEN id END|
+--------------------------+
| 0|
+--------------------------+
```
```python
>>> spark.range(1).select(when(lit(True), "id")).show()
```
```
+--------------------------+
|CASE WHEN true THEN id END|
+--------------------------+
| id|
+--------------------------+
```
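The two conventions described in the notes can be contrasted without a Spark session. This is a minimal sketch under stated assumptions: `Column`, `to_column_arg`, and `when_value_arg` are hypothetical names invented for illustration, and the fallback to `float()` for non-string values is assumed rather than taken from this description.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column, tagged by kind."""
    def __init__(self, kind, value):
        self.kind = kind    # "name" for a column reference, "literal" for a constant
        self.value = value

    def __repr__(self):
        return "Column({}={!r})".format(self.kind, self.value)


def to_column_arg(arg):
    """Illustrative convention for binary math functions after this PR."""
    if isinstance(arg, Column):
        return arg
    if isinstance(arg, str):
        return Column("name", arg)          # strings are now column names
    return Column("literal", float(arg))    # assumption: other values become floats


def when_value_arg(arg):
    """Illustrative convention that `when` keeps for its value argument."""
    return arg if isinstance(arg, Column) else Column("literal", arg)


print(to_column_arg("1"))    # Column(name='1')
print(to_column_arg(1))      # Column(literal=1.0)
print(when_value_arg("id"))  # Column(literal='id')
```

In real code, wrapping the argument in `col(...)` or `lit(...)` states the intent explicitly and sidesteps the ambiguity in either direction.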
This PR also revises the helper names introduced in #23882.
#23882 changed them to:
- Rename `_create_function` to `_create_name_function`.
- Define a new `_create_function` that takes strings as column names.
This PR proposes to:
- Revert the `_create_name_function` name to `_create_function`.
- Define a new `_create_function_over_column` that takes strings as column names.
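The difference between the two factories can be sketched as follows. This is an illustration, not the actual PySpark source: `Column`, `FakeJvmFunctions`, and `_resolve_column` are hypothetical stand-ins, and only the pass-through expression in `_create_function` is taken from the traceback above.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column."""
    def __init__(self, jc):
        self._jc = jc


class FakeJvmFunctions:
    """Hypothetical stand-in for the JVM-side functions; accepts only column handles."""
    @staticmethod
    def ascii(jcol):
        if not (isinstance(jcol, tuple) and jcol[0] == "jcol"):
            raise TypeError("Method ascii([class java.lang.String]) does not exist")
        return ("jcol", "ascii({})".format(jcol[1]))


def _resolve_column(col):
    """Hypothetical helper: turn a Column or a column-name string into a fake handle."""
    if isinstance(col, Column):
        return col._jc
    if isinstance(col, str):
        return ("jcol", col)
    raise TypeError("Invalid argument, not a string or column: {!r}".format(col))


def _create_function(name):
    """Pass non-Column arguments through unchanged (strings reach the JVM as strings)."""
    def _(col):
        jc = getattr(FakeJvmFunctions, name)(col._jc if isinstance(col, Column) else col)
        return Column(jc)
    _.__name__ = name
    return _


def _create_function_over_column(name):
    """Resolve strings as column names before calling the JVM-side function."""
    def _(col):
        return Column(getattr(FakeJvmFunctions, name)(_resolve_column(col)))
    _.__name__ = name
    return _


ascii_old = _create_function("ascii")
ascii_new = _create_function_over_column("ascii")
print(ascii_new("value")._jc)  # ('jcol', 'ascii(value)')
```

Calling `ascii_old("value")` in this sketch raises a `TypeError`, mirroring the "method does not exist" error in the **Before** case.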
## How was this patch tested?
Some unit tests were added for binary math / string functions.
Closes #24121 from HyukjinKwon/SPARK-26979.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 8b0aa59 commit c99463d
2 files changed, 64 additions and 29 deletions