[SPARK-24890] [SQL] Short circuiting the if condition when trueValue and falseValue are the same
#21848
| } | ||
|
|
||
| case e @ CaseWhen(branches, _) if branches.headOption.map(_._1) == Some(TrueLiteral) => | ||
| case CaseWhen(branches, _) if branches.headOption.map(_._1).contains(TrueLiteral) => |
Since we removed Scala 2.10, it seems to be okay. However, if we revert this irrelevant change, this PR becomes neater (and easier for someone to backport this).
Since it's not a bug fix, I guess it's unlikely someone will backport this :)
Yes, the community doesn't allow backporting this. I meant others who want to have it.
Eh, in any event, wouldn't it be better to revert this change if doing so has any actual advantage over keeping an unrelated style change?
Normally, we avoid adding unneeded refactoring in such a PR. Please avoid it next time. Thanks!
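For readers comparing the two spellings discussed in this thread, here is a minimal sketch showing that they agree. It uses hypothetical stand-in types (`Expr`, `TrueLit`, `FalseLit`, `Attr`), not Spark's Catalyst classes, and relies only on `Option.contains`, which has been in the standard library since Scala 2.11.

```scala
object HeadOptionCheck {
  // Hypothetical stand-ins for Catalyst expressions; not the real Spark classes.
  sealed trait Expr
  case object TrueLit extends Expr
  case object FalseLit extends Expr
  final case class Attr(name: String) extends Expr

  // A CASE WHEN branch modeled as (condition, value).
  type Branch = (Expr, Expr)

  // Spelling before the change: compare the mapped Option against Some(...).
  def firstIsTrueOld(branches: Seq[Branch]): Boolean =
    branches.headOption.map(_._1) == Some(TrueLit)

  // Spelling after the change: Option.contains (Scala 2.11+).
  def firstIsTrueNew(branches: Seq[Branch]): Boolean =
    branches.headOption.map(_._1).contains(TrueLit)
}
```

Both functions return the same result for an empty branch list, a list whose first condition is `TrueLit`, and a list whose first condition is anything else, which is why the reviewers treat the change as purely stylistic.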
|
|
| } | ||
| } | ||
|
|
||
|
|
nit: extra blank line.
I thought we always put two blank lines between two objects.
I think it's okay to remove it, though, judging from the style guide:
"Use one or two blank line(s) to separate class definitions."
https://github.com/databricks/scala-style-guide#blank-lines-vertical-whitespace
Looks like either way is fine.
| case If(TrueLiteral, trueValue, _) => trueValue | ||
| case If(FalseLiteral, _, falseValue) => falseValue | ||
| case If(Literal(null, _), _, falseValue) => falseValue | ||
| case If(_, trueValue, falseValue) if trueValue.semanticEquals(falseValue) => trueValue |
This is not right. The condition must be deterministic.
Can you elaborate?
For trueValue.semanticEquals(falseValue), it's guaranteed that both trueValue and falseValue are deterministic.
def semanticEquals(other: Expression): Boolean =
  deterministic && other.deterministic && canonicalized == other.canonicalized
Understandable that the condition can be non-deterministic, but this doesn't change the result of If.
The condition could have a side effect. For example, calling a stateful UDF.
Good point.
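To make the side-effect concern in this thread concrete, here is a minimal sketch with a toy expression tree. Everything in it (`Expr`, `Lit`, `CountingCond`, `eval`, `simplifyUnguarded`, `simplifyGuarded`) is hypothetical and only imitates the shape of the rule under review; it is not Spark's Catalyst API. `CountingCond` plays the role of a stateful UDF by counting how often it is evaluated.

```scala
object IfSimplifySketch {
  // Toy expression tree; a hypothetical stand-in, not Spark's Catalyst.
  sealed trait Expr { def deterministic: Boolean = true }
  final case class Lit(value: Int) extends Expr

  // A condition with a visible side effect: it counts its own evaluations.
  final class CountingCond extends Expr {
    var calls = 0
    override def deterministic: Boolean = false
  }

  final case class If(cond: Expr, t: Expr, f: Expr) extends Expr

  def eval(e: Expr): Int = e match {
    case Lit(v) => v
    case c: CountingCond =>
      c.calls += 1 // the side effect
      1            // always "true" in this toy model
    case If(c, t, f) => if (eval(c) != 0) eval(t) else eval(f)
  }

  // Rule as first proposed: drop the condition whenever both branches match.
  def simplifyUnguarded(e: Expr): Expr = e match {
    case If(_, t, f) if t == f => t
    case other => other
  }

  // Rule with the guard asked for in review: only drop a deterministic condition.
  def simplifyGuarded(e: Expr): Expr = e match {
    case If(c, t, f) if c.deterministic && t == f => t
    case other => other
  }
}
```

With the unguarded rule the counter never increments because the condition is removed before evaluation; with the guarded rule the `If` is left intact and the condition still runs. The value of the expression is the same in both cases, which is exactly why the difference is easy to miss.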
|
Since this skips the evaluation of the if condition, this will cause the following difference between this PR and Spark 2.3.1. |
|
This is a good point.
…On Mon, Jul 23, 2018, 12:03 PM Dongjoon Hyun ***@***.***> wrote:
Since this skips the evaluation of if condition, this will cause the
following difference.
*This PR*
scala> sql("select * from t").show
+----+
| a|
+----+
| 1|
|null|
+----+
scala> sql("select if(assert_true(a is null),a,a) from t").show
+-----------------------------------------------------+
|(IF(CAST(assert_true((a IS NULL)) AS BOOLEAN), a, a))|
+-----------------------------------------------------+
| 1|
| null|
+-----------------------------------------------------+
*Spark 2.3.1*
scala> sql("select * from t").show
+----+
| a|
+----+
| 1|
|null|
+----+
scala> sql("select if(assert_true(a is null),a,a) from t").show
18/07/23 11:59:11 ERROR Executor: Exception in task 0.0 in stage 20.0 (TID 20)
java.lang.RuntimeException: 'isnull(input[0, int, true])' is not true!
|
|
Test build #93454 has finished for PR 21848 at commit
|
|
For now, it seems we don't have a good way to know whether an expression has side effects. Some expressions like |
|
@gatorsmile this can remove some of the expensive condition expressions, so I would like to find a way to implement it properly. Thank you all for chiming in with many good points. Let me summarize here.
This means |
|
Currently, we are setting the expressions We should change |
|
This will simplify the scope of this PR a lot. My concern is the more |
|
Test build #93471 has finished for PR 21848 at commit
|
|
Hmm, it seems we have a limitation on where non-deterministic expressions can appear. |
|
@dbtsai I have a question. How does the current code check the following condition?
|
| case class AssertNotNull(child: Expression, walkedTypePath: Seq[String] = Nil) | ||
| extends UnaryExpression with NonSQLExpression { | ||
|
|
||
| override lazy val deterministic: Boolean = false |
Let us create a separate PR for the changes to deterministic? We need extra changes when changing the flags.
Fair. I'll create a followup PR for this.
|
@kiszk |
|
Here is a followup JIRA for making |
|
Test build #93523 has finished for PR 21848 at commit
|
|
LGTM Thanks! Merged to master. |
| case If(FalseLiteral, _, falseValue) => falseValue | ||
| case If(Literal(null, _), _, falseValue) => falseValue | ||
| case If(cond, trueValue, falseValue) | ||
| if cond.deterministic && trueValue.semanticEquals(falseValue) => trueValue |
Looks like we still have this problem: skipping the evaluation of cond can hide an exception it would have thrown.
More recently, SPARK-33544 introduced another approach for that. I think it superseded SPARK-24913, and we can switch this rule to the SPARK-33544 approach.
@dbtsai, can we try to follow up on this using NoThrow?
cc @tgravescs too FYI
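The remaining problem raised in this comment can be sketched with a toy model: a condition that is deterministic but may throw, like the `assert_true` example earlier in the conversation. All names here (`Expr`, `Lit`, `IsNull`, `AssertTrue`, `eval`, `simplifyMerged`, `simplifyStricter`) are hypothetical, and the `noThrow` flag is only loosely modeled on the idea named in the comment; it is an assumption for illustration, not Spark's actual `NoThrow` API.

```scala
object ThrowingCondSketch {
  // Toy expression tree; a hypothetical stand-in, not Spark's Catalyst.
  sealed trait Expr {
    def deterministic: Boolean = true
    // Hypothetical flag: true when evaluation can never throw.
    def noThrow: Boolean = true
  }
  // None models SQL NULL.
  final case class Lit(value: Option[Int]) extends Expr
  final case class IsNull(child: Expr) extends Expr {
    override def noThrow: Boolean = child.noThrow
  }
  // Deterministic, but throws when its child is not true.
  final case class AssertTrue(child: Expr) extends Expr {
    override def noThrow: Boolean = false
  }
  final case class If(cond: Expr, t: Expr, f: Expr) extends Expr {
    override def noThrow: Boolean = cond.noThrow && t.noThrow && f.noThrow
  }

  // Truth is encoded as Some(1); anything else selects the false branch.
  def eval(e: Expr): Option[Int] = e match {
    case Lit(v) => v
    case IsNull(c) => Some(if (eval(c).isEmpty) 1 else 0)
    case AssertTrue(c) =>
      if (eval(c).contains(1)) None // toy model: yields NULL on success
      else throw new RuntimeException(s"'$c' is not true!")
    case If(c, t, f) => if (eval(c).contains(1)) eval(t) else eval(f)
  }

  // Shape of the merged rule: guarded by determinism only.
  def simplifyMerged(e: Expr): Expr = e match {
    case If(c, t, f) if c.deterministic && t == f => t
    case other => other
  }

  // Stricter variant: also require that the condition cannot throw.
  def simplifyStricter(e: Expr): Expr = e match {
    case If(c, t, f) if c.deterministic && c.noThrow && t == f => t
    case other => other
  }
}
```

In this model, `If(AssertTrue(IsNull(Lit(Some(1)))), Lit(Some(1)), Lit(Some(1)))` throws when evaluated directly, evaluates quietly after `simplifyMerged`, and is left unchanged by `simplifyStricter`, so the exception is preserved.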
What changes were proposed in this pull request?
When trueValue and falseValue are semantically equivalent, the condition expression in if can be removed to avoid extra computation at runtime.
How was this patch tested?
Test added.