[SPARK-42551][SQL] Support more subexpression elimination cases #41119

wankunde · 2023-05-10T10:57:26Z

What changes were proposed in this pull request?

How to support more subexpressions elimination cases

Get all common expressions from input expressions of the current physical operator to current CodeGenContext. Recursively visits all subexpressions regardless of whether the current expression is a conditional expression.
For each common expression:
- Add a new boolean variable subExprInit to indicate whether it has already been evaluated.
- Add a new code block in CodeGenSupport trait, and reset those subExprInit variables to false before the physical operators begin to evaluate the input row.
- Add a new wrapper subExpr function for each common subexpression.

private void subExpr_n(${argList}) {
 if (!subExprInit_n) {
   ${eval.code}
   subExprInit_n = true;
   subExprIsNull_n = ${eval.isNull};
   subExprValue_n = ${eval.value};
 }
}

When generating the input expression code, if the input expression is a common expression, the expression code will be replaced with the corresponding subExpr function. When the subExpr function is called for the first time, subExprInit will be set to true, and the subsequent function calls will do nothing.

Why are the changes needed?

Support more subexpression elimination cases, improve performance.

For example, TPCDS q23b query, we can reuse the result of projection sum(ss_quantity * ss_sales_price) in the if expressions:

if (isnull((cast(input[2, int, true] as decimal(10,0)) * input[3, decimal(7,2), true])))
   input[0, decimal(28,2), true]
else 
   (input[0, decimal(28,2), true] + cast(knownnotnull((cast(input[2, int, true] as decimal(10,0)) * input[3, decimal(7,2), true])) as decimal(28,2)))

q4, q62, q67 are similar to the above.

Query	Time with PR	Time without PR	Time diff	Percentage
q1.sql	36.97	36.623	-0.347	99.06%
q2.sql	42.699	41.741	-0.958	97.76%
q3.sql	6.038	6.111	0.073	101.21%
q4.sql	273.849	323.799	49.95	118.24%
q5.sql	76.202	76.353	0.151	100.20%
q6.sql	8.451	9.011	0.56	106.63%
q7.sql	13.017	12.954	-0.063	99.52%
q8.sql	10.22	11.027	0.807	107.90%
q9.sql	72.843	72.019	-0.824	98.87%
q10.sql	13.233	14.028	0.795	106.01%
q11.sql	112.252	112.983	0.731	100.65%
q12.sql	3.036	3.488	0.452	114.89%
q13.sql	12.471	12.911	0.44	103.53%
q14a.sql	210.201	220.12	9.919	104.72%
q14b.sql	185.374	187.57	2.196	101.18%
q15.sql	10.189	10.338	0.149	101.46%
q16.sql	80.756	82.503	1.747	102.16%
q17.sql	28.523	28.567	0.044	100.15%
q18.sql	13.417	14.271	0.854	106.37%
q19.sql	6.366	6.53	0.164	102.58%
q20.sql	3.427	4.939	1.512	144.12%
q21.sql	2.096	2.16	0.064	103.05%
q22.sql	14.4	14.01	-0.39	97.29%
q23a.sql	507.253	545.185	37.932	107.48%
q23b.sql	707.054	768.148	61.094	108.64%
q24a.sql	193.116	193.793	0.677	100.35%
q24b.sql	177.109	179.54	2.431	101.37%
q25.sql	22.264	22.949	0.685	103.08%
q26.sql	8.68	8.973	0.293	103.38%
q27.sql	8.535	8.558	0.023	100.27%
q28.sql	101.953	102.713	0.76	100.75%
q29.sql	75.392	76.211	0.819	101.09%
q30.sql	12.265	13.508	1.243	110.13%
q31.sql	26.477	26.965	0.488	101.84%
q32.sql	3.393	3.507	0.114	103.36%
q33.sql	6.909	7.277	0.368	105.33%
q34.sql	8.41	8.572	0.162	101.93%
q35.sql	34.214	36.822	2.608	107.62%
q36.sql	9.027	9.79	0.763	108.45%
q37.sql	36.076	36.753	0.677	101.88%
q38.sql	71.768	74.473	2.705	103.77%
q39a.sql	7.753	7.617	-0.136	98.25%
q39b.sql	6.365	7.229	0.864	113.57%
q40.sql	16.588	17.164	0.576	103.47%
q41.sql	1.162	1.188	0.026	102.24%
q42.sql	2.3	2.561	0.261	111.35%
q43.sql	7.407	7.605	0.198	102.67%
q44.sql	28.939	30.473	1.534	105.30%
q45.sql	9.796	9.634	-0.162	98.35%
q46.sql	9.496	9.692	0.196	102.06%
q47.sql	27.087	27.151	0.064	100.24%
q48.sql	14.524	14.889	0.365	102.51%
q49.sql	21.466	21.572	0.106	100.49%
q50.sql	194.755	195.052	0.297	100.15%
q51.sql	37.493	38.56	1.067	102.85%
q52.sql	2.227	2.28	0.053	102.38%
q53.sql	5.375	5.437	0.062	101.15%
q54.sql	12.556	13.015	0.459	103.66%
q55.sql	2.341	2.809	0.468	119.99%
q56.sql	7.424	7.207	-0.217	97.08%
q57.sql	17.606	17.797	0.191	101.08%
q58.sql	6.169	6.374	0.205	103.32%
q59.sql	27.602	27.744	0.142	100.51%
q60.sql	7.04	7.459	0.419	105.95%
q61.sql	7.838	7.816	-0.022	99.72%
q62.sql	9.726	10.762	1.036	110.65%
q63.sql	4.816	5.176	0.36	107.48%
q64.sql	253.937	261.034	7.097	102.79%
q65.sql	78.942	78.373	-0.569	99.28%
q66.sql	15.199	14.73	-0.469	96.91%
q67.sql	926.049	1022.971	96.922	110.47%
q68.sql	7.932	7.977	0.045	100.57%
q69.sql	12.101	14.699	2.598	121.47%
q70.sql	20.7	20.872	0.172	100.83%
q71.sql	14.96	15.065	0.105	100.70%
q72.sql	73.215	73.955	0.74	101.01%
q73.sql	5.973	6.126	0.153	102.56%
q74.sql	97.611	99.577	1.966	102.01%
q75.sql	125.005	129.508	4.503	103.60%
q76.sql	34.812	35.34	0.528	101.52%
q77.sql	7.686	8.474	0.788	110.25%
q78.sql	287.959	292.936	4.977	101.73%
q79.sql	8.401	9.616	1.215	114.46%
q80.sql	59.371	60.051	0.68	101.15%
q81.sql	18.452	19.499	1.047	105.67%
q82.sql	64.093	65.032	0.939	101.47%
q83.sql	4.675	4.867	0.192	104.11%
q84.sql	10.456	10.816	0.36	103.44%
q85.sql	12.347	12.77	0.423	103.43%
q86.sql	6.537	6.843	0.306	104.68%
q87.sql	77.427	77.876	0.449	100.58%
q88.sql	83.082	83.385	0.303	100.36%
q89.sql	6.645	6.801	0.156	102.35%
q90.sql	7.841	7.883	0.042	100.54%
q91.sql	3.88	4.129	0.249	106.42%
q92.sql	3.044	3.271	0.227	107.46%
q93.sql	361.149	365.883	4.734	101.31%
q94.sql	43.929	46.667	2.738	106.23%
q95.sql	196.363	197.427	1.064	100.54%
q96.sql	12.457	12.496	0.039	100.31%
q97.sql	80.131	81.821	1.69	102.11%
q98.sql	6.885	7.522	0.637	109.25%
q99.sql	17.685	18.009	0.324	101.83%
	6788.707	7116.357	327.65	104.82%

One of our production query which has 19 case when expressions, it's query time changed from 1.1 hour to 42 seconds.

A simplify benchmark of the above production query.

    spark.range(1, 2000000, 1, 1)
      .selectExpr(
        "cast(id + 1 as decimal) as a",
        "cast(id + 2 as decimal) as b",
        "cast(id + 3 as decimal) as c",
        "cast(id + 4 as decimal) as d")
      .createOrReplaceTempView("tab")
    runBenchmark("Subexpression elimination in ProjectExec") {
      val benchmark =
        new Benchmark("Subexpression elimination in ProjectExec", 2000000, output = output)
      benchmark.addCase(s"Test query") { _ =>
        val query =
          s"""
             |SELECT a, b, c, d,
             |       a * b / c as s1,
             |       CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN 1 ELSE 0 END s2,
             |       CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |            CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN 1 ELSE 0 END
             |       ELSE 0 END s3,
             |       CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |            CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |                 CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN 1 ELSE 0 END
             |            ELSE 0 END
             |       ELSE 0 END s4,
             |       CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |            CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |                 CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |                      CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN 1 ELSE 0 END
             |                 ELSE 0 END
             |            ELSE 0 END
             |       ELSE 0 END s5
             |FROM tab
             |""".stripMargin
        spark.sql(query).noop()
      }
      benchmark.run()

Local benchmark result:
Before this PR:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Subexpression elimination in ProjectExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test query                                         9713           9900         263          0.2        4856.7       1.0X

After this PR:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Subexpression elimination in ProjectExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test query                                         1238           1307          98          8.1         123.8       1.0X

Does this PR introduce any user-facing change?

No

How was this patch tested?

Exists UT.

Do not reuse the expression if CodegenContext changed.

wankunde · 2023-05-12T05:57:13Z

Hi, @cloud-fan @ulysses-you @wangyum could you help to review this PR ? Thanks

HyukjinKwon · 2023-05-15T03:39:28Z

cc @rednaxelafx

HyukjinKwon · 2023-05-15T03:40:02Z

and @peter-toth

wankunde · 2023-05-19T06:16:05Z

Hi, @rednaxelafx @peter-toth could you help to review this PR ? Thanks

peter-toth · 2023-05-19T09:04:03Z

Hi, @rednaxelafx @peter-toth could you help to review this PR ? Thanks

Hi @wankunde, thanks for pinging me. I can take a look at this PR sometime next week or the week after...

wankunde · 2023-05-31T03:21:18Z

I think it would be easier to compare the difference of the code generated by ProjectExec and FilterExec before reviewing this PR.
ProjectExec generated code: https://gist.github.com/wankunde/8034cb6bdd8cac108765bffb16bd0e21
FilterExec generated code: https://gist.github.com/wankunde/e1531f42402638252d958f0ad4d3769d

cc @peter-toth @reactormonk @cloud-fan

peter-toth · 2023-05-31T12:16:39Z

Sorry, I still haven't got time to review the PR thoroughly. (Maybe next week...)

IMO it is a good idea to lazily evaluate expressions and cache the results runtime in case of any expresions that might be used more than once. In some cases it could achieve much better performance than any kind of static analysis we do currently...
But the downside is that the generated code is bigger and so it can cause regressions too. So, probably I would add a new feature flag to allow switching between the current "static" and the suggested new "runtime" subexpression elimination of this PR. But others might have better ideas...

wankunde · 2023-06-01T10:12:19Z

Add a microbenchmark to evaluate the overhead of an additional function call.
The expressions "a * b", "a / b", "a * b / c" will be wrapped in a function and will only be called once.

First benchmark for Project

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    spark.range(1, 20000000, 1, 1)
      .selectExpr(
        "cast(id + 1 as decimal) as a",
        "cast(id + 2 as decimal) as b",
        "cast(id + 3 as decimal) as c",
        "cast(id + 4 as decimal) as d")
      .createOrReplaceTempView("tab")
    runBenchmark("Subexpression elimination in ProjectExec") {
      val benchmark =
        new Benchmark("Subexpression elimination in ProjectExec", 20000000, output = output)
      for (expr <- Seq("a * b", "a / b", "a * b / c")) {
        benchmark.addCase(s"Test $expr expr") { _ =>
          val query =
            s"""
               |SELECT a, b, c, d, CASE WHEN $expr > 0 THEN 0 WHEN $expr > -1 THEN -1 END AS e
               |FROM tab
               |""".stripMargin
          spark.sql(query).noop()
        }
      }

      benchmark.run()
    }
  }

Benchmark results are almost the same

Before this PR

Subexpression elimination in ProjectExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test a * b expr                                    2221           2273          73          9.0         111.0       1.0X
Test a / b expr                                    7581           7612          45          2.6         379.0       0.3X
Test a * b / c expr                                8418           8582         232          2.4         420.9       0.3X

After this PR

Subexpression elimination in ProjectExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test a * b expr                                    2223           2269          65          9.0         111.1       1.0X
Test a / b expr                                    7565           7632          95          2.6         378.2       0.3X
Test a * b / c expr                                8568           8673         149          2.3         428.4       0.3X

Second benchmark for Filter.
If there is no complex expression, the query time changes from 1165ms to 1270ms,
If there is only one complex expression, the query time is the same as before. (6527ms and 7982ms)

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    spark.range(1, 20000000, 1, 1)
      .selectExpr(
        "cast(id + 1 as decimal) as a",
        "cast(id + 2 as decimal) as b",
        "cast(id + 3 as decimal) as c",
        "cast(id + 4 as decimal) as d")
      .createOrReplaceTempView("tab")
    runBenchmark("Subexpression elimination in FilterExec") {
      val benchmark =
        new Benchmark("Subexpression elimination in FilterExec", 20000000, output = output)
      for (expr <- Seq("a * b", "a / b", "a * b / c")) {
        benchmark.addCase(s"Test $expr expr") { _ =>
          val query =
            s"""
               |SELECT a, b, c, d
               |FROM tab
               |WHERE $expr < 0 AND $expr < 1
               |""".stripMargin
          spark.sql(query).noop()
        }
      }

      benchmark.run()
    }
  }

Before this change: 
Subexpression elimination in FilterExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test a * b expr                                    1046           1165         169         19.1          52.3       1.0X
Test a / b expr                                    6519           6527          12          3.1         325.9       0.2X
Test a * b / c expr                                7634           7982         492          2.6         381.7       0.1X


After this change: 
Subexpression elimination in FilterExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test a * b expr                                    1245           1270          35         16.1          62.3       1.0X
Test a / b expr                                    6469           6582         160          3.1         323.4       0.2X
Test a * b / c expr                                7751           7997         348          2.6         387.6       0.2X

peter-toth · 2023-06-21T08:31:15Z

@wankunde, I didn't foget this PR just realized that we can improve the current subexpression elimination logic in terms of recognizing the surely evaluated subexpressions and keeping track of conditionally evaluated ones.

IMO if surely evaluated count is >= 2 or surely evaluated count = 1 but there is also a certain probability of evaluating the subexpression conditionally then we can include the subexpression for elimination. (This is not a new idea, basically @cloud-fan already mentioned this in #32987 (comment)). So I opened a PR for tracking the expected count of sure and conditional evaluations here: #41677. (Need to add a few more tests so it is just WIP now...)

If that PR gets accepted we can revisit those subexpressions that have only conditional evaluations at 2+ places and apply your lazy construct idea proposed in this PR.

wankunde · 2023-08-18T07:32:23Z

Hi, @peter-toth , any update about the expression elimination work ?

peter-toth · 2023-08-22T14:56:14Z

Yeah, unfortunately #41677 got a bit stuck. I will try to revive it this week.

github-actions · 2023-12-01T00:21:12Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

wankunde added 8 commits May 9, 2023 19:30

[SPARK-42551][SQL] Support more subexpression elimination cases

ffe4ca8

Support whole stage subexpressions elimination

db77ea2

Bug fix for whole stage subexpression elimination

e5ee05e

Bug fix

f152b2a

Do not support CSE in produce method

9a7cdc6

Clear stale commonExpressions when reoptimize plan

076792f

Remove whole-stage expression elimination

f524ef5

Merge code into origin CSE

670446a

github-actions bot added the SQL label May 10, 2023

wankunde added 2 commits May 10, 2023 23:24

bug fix

fbfa00b

Consider common expression local input variables.

06238be

Do not reuse the expression if CodegenContext changed.

wankunde force-pushed the cse3 branch from bedd30b to 06238be Compare May 12, 2023 01:53

github-actions bot added the Stale label Dec 1, 2023

github-actions bot closed this Dec 2, 2023

[SPARK-42551][SQL] Support more subexpression elimination cases #41119

[SPARK-42551][SQL] Support more subexpression elimination cases #41119

Uh oh!

Conversation

wankunde commented May 10, 2023

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

wankunde commented May 12, 2023

Uh oh!

HyukjinKwon commented May 15, 2023

Uh oh!

HyukjinKwon commented May 15, 2023

Uh oh!

wankunde commented May 19, 2023

Uh oh!

peter-toth commented May 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wankunde commented May 31, 2023

Uh oh!

peter-toth commented May 31, 2023

Uh oh!

wankunde commented Jun 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

peter-toth commented Jun 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wankunde commented Aug 18, 2023

Uh oh!

peter-toth commented Aug 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Dec 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

peter-toth commented May 19, 2023 •

edited

Loading

wankunde commented Jun 1, 2023 •

edited

Loading

peter-toth commented Jun 21, 2023 •

edited

Loading

peter-toth commented Aug 22, 2023 •

edited

Loading