Commit 07f390a

viirya authored and cloud-fan committed
[SPARK-22347][PYSPARK][DOC] Add document to notice users for using udfs with conditional expressions
## What changes were proposed in this pull request?

Under the current execution mode of Python UDFs, Python UDFs are not well supported as branch values or the else value in a CaseWhen expression. Because fixing this would require a non-trivial change (e.g., #19592) and the issue has a simpler workaround, we should just notify users about it in the documentation.

## How was this patch tested?

Only document change.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19617 from viirya/SPARK-22347-3.
1 parent 96798d1 commit 07f390a
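
The behaviour described in the commit message can be illustrated with a small pure-Python model. This is only a simplified sketch, not Spark's actual execution code: the names `reciprocal` and `eval_when_udf` are hypothetical, and the model simply assumes that the function is run on every row before the condition is consulted.

```python
def reciprocal(x):
    # Stands in for a UDF that is only valid for some rows: it fails on zero.
    return 1.0 / x


def eval_when_udf(rows, condition, func):
    """Simplified model: run func on every row first, then apply the condition."""
    udf_results = [func(x) for x in rows]  # evaluated for ALL rows
    # Only afterwards is the condition used to pick the value or None.
    return [r if condition(x) else None for x, r in zip(rows, udf_results)]


rows = [2.0, 0.0, 4.0]
try:
    eval_when_udf(rows, lambda x: x != 0, reciprocal)
    outcome = "ok"
except ZeroDivisionError:
    # The row 0.0 fails even though the condition would have excluded it.
    outcome = "failed on a row the condition would have excluded"
```

Under this model the condition cannot protect the function from rows it is not prepared to handle, which is why the note recommends moving the check into the function itself.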

1 file changed, +14 -0 lines changed

python/pyspark/sql/functions.py

Lines changed: 14 additions & 0 deletions
@@ -2185,6 +2185,13 @@ def udf(f=None, returnType=StringType()):
         duplicate invocations may be eliminated or the function may even be invoked more times than
         it is present in the query.
 
+    .. note:: The user-defined functions do not support conditional execution by using them with
+        SQL conditional expressions such as `when` or `if`. The functions still apply on all rows no
+        matter the conditions are met or not. So the output is correct if the functions can be
+        correctly run on all rows without failure. If the functions can cause runtime failure on the
+        rows that do not satisfy the conditions, the suggested workaround is to incorporate the
+        condition logic into the functions.
+
     :param f: python function if used as a standalone function
     :param returnType: a :class:`pyspark.sql.types.DataType` object
@@ -2278,6 +2285,13 @@ def pandas_udf(f=None, returnType=StringType()):
     .. seealso:: :meth:`pyspark.sql.GroupedData.apply`
 
     .. note:: The user-defined function must be deterministic.
+
+    .. note:: The user-defined functions do not support conditional execution by using them with
+        SQL conditional expressions such as `when` or `if`. The functions still apply on all rows no
+        matter the conditions are met or not. So the output is correct if the functions can be
+        correctly run on all rows without failure. If the functions can cause runtime failure on the
+        rows that do not satisfy the conditions, the suggested workaround is to incorporate the
+        condition logic into the functions.
     """
     return _create_udf(f, returnType=returnType, pythonUdfType=PythonUdfType.PANDAS_UDF)
 
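
The workaround suggested in the added notes, incorporating the condition logic into the function, might look roughly like the sketch below. The function name `safe_reciprocal` and the sample values are hypothetical and chosen only for illustration. The function body is plain Python so it can be run without Spark, and the comments show how it could be wrapped with `udf`, assuming a DataFrame `df` with a numeric column `x`.

```python
def safe_reciprocal(x):
    """Return 1/x, or None for rows that do not satisfy the condition."""
    # The check that would otherwise live in when(...) is folded in here,
    # so the function is safe to run on every row.
    if x is None or x == 0:
        return None
    return 1.0 / x


# Plain-Python check of the behaviour on rows that would otherwise fail.
results = [safe_reciprocal(x) for x in [2.0, 0.0, None, 4.0]]

# With PySpark installed, the function could be registered along these lines
# (df and the column name "x" are assumptions, not part of the commit):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   safe_reciprocal_udf = udf(safe_reciprocal, DoubleType())
#   df.select(safe_reciprocal_udf(df["x"]))
```

Because the guard is inside the function, the result no longer depends on whether the engine evaluates the function before or after a surrounding `when` expression.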
