[SPARK-22347][PySpark][DOC] Add document to notice users for using udfs with conditional expressions #19617
retest this please.
Test build #83242 has finished for PR 19617 at commit
HyukjinKwon
left a comment
LGTM otherwise.
python/pyspark/sql/functions.py
Outdated
it is present in the query.
.. note:: The user-defined functions do not support conditional execution by using them with
SQL conditional expressions such `when` or `if`. The functions still apply on all rows no
Looks like a tiny typo: expressions such `when` or `if`. -> expressions such as `when` or `if`.
Oops! Thanks.
duplicate invocations may be eliminated or the function may even be invoked more times than
it is present in the query.
.. note:: The user-defined functions do not support conditional execution by using them with
Hm, should we maybe clarify that the output itself is correct as long as applying the function to all rows does not cause a runtime failure? Maybe I am worrying too much, but I think the note might mislead readers into thinking the output is incorrect.
Yes. I think this is a valid worry. Thanks.
Test build #83261 has finished for PR 19617 at commit
cc @cloud-fan for checking the document too.
thanks, merging to master!
What changes were proposed in this pull request?
Under the current execution mode of Python UDFs, Python UDFs are not well supported as branch values or as the else value in a CaseWhen expression.
Fixing this would require a non-trivial change (e.g., #19592), and the issue has a simpler workaround, so we should just note this for users in the documentation.
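
To illustrate the behavior the note describes, here is a minimal pure-Python sketch. It does not call Spark: `eager_case_when` is a hypothetical stand-in that models a UDF being applied to every row before the condition selects a branch. The workaround shown, moving the condition inside the function, is an assumption about what the "simpler workaround" refers to; all names below are illustrative.

```python
def unsafe_inverse(x):
    # Fails on 0; relies on an outer condition to filter such rows.
    return 1.0 / x


def safe_inverse(x):
    # Assumed workaround: the guard lives inside the function itself,
    # so it is safe even when called on every row.
    if x is None or x == 0:
        return None
    return 1.0 / x


def is_nonzero(v):
    return v is not None and v != 0


def eager_case_when(values, predicate, fn):
    # Hypothetical model of the behavior in the note: fn runs on ALL
    # rows first, and only afterwards does the condition pick a result.
    computed = [fn(v) for v in values]
    return [c if predicate(v) else None for v, c in zip(values, computed)]


values = [1, 2, 0, 4]

# The unguarded function fails even though the condition excludes the 0 row.
try:
    eager_case_when(values, is_nonzero, unsafe_inverse)
    unguarded_failed = False
except ZeroDivisionError:
    unguarded_failed = True

# The guarded function tolerates being applied to every row.
guarded = eager_case_when(values, is_nonzero, safe_inverse)
```

In PySpark, a function like `safe_inverse` would be wrapped with `udf` and applied to the column directly instead of being placed inside `when(...)`. As raised in the review, the result is still correct whenever the function does not fail on the excluded rows; the guard only prevents the runtime failure.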
How was this patch tested?
Only document change.