[SPARK-39249][SQL] Improve subexpression elimination for conditional expressions #36626

WangGuangxin · 2022-05-21T14:55:19Z

What changes were proposed in this pull request?

Currently, subexpression elimination for conditional expressions only applies when the subexpression is common across all branchGroups. We can improve this further by also eliminating expressions that are common between alwaysEvaluatedInputs and branchGroups.
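The difference between the two rules can be sketched with plain sets. This is a simplified toy model, not Spark's implementation: the function names and the string representation of expressions are hypothetical, and each branch is modelled as the set of subexpressions it contains. It uses an expression of the form IF(IsNull(a), b, KnownNotNull(a)) as input.

```python
from typing import List, Set


def common_to_all_branches(branches: List[Set[str]]) -> Set[str]:
    """Existing rule (simplified): keep only subexpressions present in every branch."""
    if not branches:
        return set()
    result = set(branches[0])
    for branch in branches[1:]:
        result &= branch
    return result


def shared_with_always_evaluated(always_evaluated: Set[str],
                                 branches: List[Set[str]]) -> Set[str]:
    """Proposed extra rule (simplified): subexpressions of an always-evaluated
    input that also appear in at least one branch."""
    in_any_branch: Set[str] = set().union(*branches)
    return always_evaluated & in_any_branch


# IF(IsNull(a), b, KnownNotNull(a)): the predicate always runs,
# the two value expressions are the conditional branches.
always_evaluated = {"a", "IsNull(a)"}
branches = [{"b"}, {"a", "KnownNotNull(a)"}]

old_candidates = common_to_all_branches(branches)
new_candidates = old_candidates | shared_with_always_evaluated(always_evaluated, branches)

print(old_candidates)  # no shared subexpression under the existing rule
print(new_candidates)  # a becomes a candidate under the proposed rule
```

In this model the existing rule finds nothing to share, while the proposed rule picks up a because it occurs both in the predicate and in one branch.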

Why are the changes needed?

Take the following case as an example:

IF(IsNull(a), b, KnownNotNull(a))

Here a can miss subexpression elimination because it is not common to all branchGroups. However, it is safe to treat a as a common subexpression and evaluate it eagerly, since it is part of the predicate, which is always executed. If a is an expensive expression, evaluating it a second time in the branch wastes time.
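The cost can be illustrated with a small stand-alone model that counts evaluations. This is only a sketch under assumptions: `expensive_a`, the row layout, and both evaluator functions are hypothetical stand-ins, not Spark code or generated code.

```python
from typing import Dict, Optional

calls = {"a": 0}


def expensive_a(row: Dict[str, Optional[int]]) -> Optional[int]:
    """Stand-in for a costly expression; counts how often it runs."""
    calls["a"] += 1
    return row["x"]


def eval_without_elimination(row: Dict[str, Optional[int]]) -> Optional[int]:
    # IF(IsNull(a), b, KnownNotNull(a)): a runs in the predicate
    # and again in the else branch.
    if expensive_a(row) is None:
        return row["b"]
    return expensive_a(row)


def eval_with_elimination(row: Dict[str, Optional[int]]) -> Optional[int]:
    # a is evaluated once, eagerly, which is safe because the predicate always runs.
    a = expensive_a(row)
    if a is None:
        return row["b"]
    return a


row = {"x": 7, "b": 0}

calls["a"] = 0
result_naive = eval_without_elimination(row)
calls_naive = calls["a"]

calls["a"] = 0
result_cse = eval_with_elimination(row)
calls_cse = calls["a"]

print(result_naive, calls_naive)
print(result_cse, calls_cse)
```

Both evaluators return the same value; for a non-null a, the first runs the expensive expression twice and the second runs it once.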

This kind of expression is common when we compute a sum over a decimal type, because of #29026:

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala

Line 125 in 291d155

If(

Many TPC-DS queries show an improvement. The following are some of the more noticeable improvements on TPC-DS 10T:

Query	With this PR	Without this PR	Speed up
4	310.862	635.299	104.37%
44	47.851	63.717	33.16%
50	188.106	217.023	13.32%
80	36.723	46.006	25.28%
93	193.26	224.135	13.78%
95	102.811	126.125	18.48%

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added more unit tests.

WangGuangxin · 2022-05-21T14:57:15Z

@viirya @cloud-fan Could you please help review this?

AmplabJenkins · 2022-05-21T20:46:56Z

Can one of the admins verify this patch?

Kimahriman · 2022-05-21T22:17:16Z

FYI I created #32987 a while ago to address this in a more general way. I've tried to keep it up to date, but there seemed to be concerns about creating a subexpression for something that might only execute once I guess? Even though that's already happening in certain cases

WangGuangxin · 2022-06-06T04:37:44Z

@viirya @cloud-fan Could you please help review this?

github-actions · 2022-09-15T00:24:34Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Improve subexpression elimination for conditional expressions

e88c864

github-actions bot added the SQL label May 21, 2022

fix ut

02f704e

github-actions bot added the Stale label Sep 15, 2022

github-actions bot closed this Sep 16, 2022
