Skip to content

Conversation

@wankunde
Copy link
Contributor

What changes were proposed in this pull request?

How to support more subexpressions elimination cases

  • Get all common expressions from input expressions of the current physical operator to current CodeGenContext. Recursively visits all subexpressions regardless of whether the current expression is a conditional expression.
  • For each common expression:
    • Add a new boolean variable subExprInit to indicate whether it has already been evaluated.
    • Add a new code block in CodeGenSupport trait, and reset those subExprInit variables to false before the physical operators begin to evaluate the input row.
    • Add a new wrapper subExpr function for each common subexpression.
private void subExpr_n(${argList}) {
 if (!subExprInit_n) {
   ${eval.code}
   subExprInit_n = true;
   subExprIsNull_n = ${eval.isNull};
   subExprValue_n = ${eval.value};
 }
}
  • When generating the input expression code, if the input expression is a common expression, the expression code will be replaced with the corresponding subExpr function. When the subExpr function is called for the first time, subExprInit will be set to true, and the subsequent function calls will do nothing.

Why are the changes needed?

Support more subexpression elimination cases, improve performance.

For example, TPCDS q23b query, we can reuse the result of projection sum(ss_quantity * ss_sales_price) in the if expressions:

if (isnull((cast(input[2, int, true] as decimal(10,0)) * input[3, decimal(7,2), true])))
   input[0, decimal(28,2), true]
else 
   (input[0, decimal(28,2), true] + cast(knownnotnull((cast(input[2, int, true] as decimal(10,0)) * input[3, decimal(7,2), true])) as decimal(28,2)))

q4, q62, q67 are similar to the above.

Query Time with PR Time without PR Time diff Percentage
q1.sql 36.97 36.623 -0.347 99.06%
q2.sql 42.699 41.741 -0.958 97.76%
q3.sql 6.038 6.111 0.073 101.21%
q4.sql 273.849 323.799 49.95 118.24%
q5.sql 76.202 76.353 0.151 100.20%
q6.sql 8.451 9.011 0.56 106.63%
q7.sql 13.017 12.954 -0.063 99.52%
q8.sql 10.22 11.027 0.807 107.90%
q9.sql 72.843 72.019 -0.824 98.87%
q10.sql 13.233 14.028 0.795 106.01%
q11.sql 112.252 112.983 0.731 100.65%
q12.sql 3.036 3.488 0.452 114.89%
q13.sql 12.471 12.911 0.44 103.53%
q14a.sql 210.201 220.12 9.919 104.72%
q14b.sql 185.374 187.57 2.196 101.18%
q15.sql 10.189 10.338 0.149 101.46%
q16.sql 80.756 82.503 1.747 102.16%
q17.sql 28.523 28.567 0.044 100.15%
q18.sql 13.417 14.271 0.854 106.37%
q19.sql 6.366 6.53 0.164 102.58%
q20.sql 3.427 4.939 1.512 144.12%
q21.sql 2.096 2.16 0.064 103.05%
q22.sql 14.4 14.01 -0.39 97.29%
q23a.sql 507.253 545.185 37.932 107.48%
q23b.sql 707.054 768.148 61.094 108.64%
q24a.sql 193.116 193.793 0.677 100.35%
q24b.sql 177.109 179.54 2.431 101.37%
q25.sql 22.264 22.949 0.685 103.08%
q26.sql 8.68 8.973 0.293 103.38%
q27.sql 8.535 8.558 0.023 100.27%
q28.sql 101.953 102.713 0.76 100.75%
q29.sql 75.392 76.211 0.819 101.09%
q30.sql 12.265 13.508 1.243 110.13%
q31.sql 26.477 26.965 0.488 101.84%
q32.sql 3.393 3.507 0.114 103.36%
q33.sql 6.909 7.277 0.368 105.33%
q34.sql 8.41 8.572 0.162 101.93%
q35.sql 34.214 36.822 2.608 107.62%
q36.sql 9.027 9.79 0.763 108.45%
q37.sql 36.076 36.753 0.677 101.88%
q38.sql 71.768 74.473 2.705 103.77%
q39a.sql 7.753 7.617 -0.136 98.25%
q39b.sql 6.365 7.229 0.864 113.57%
q40.sql 16.588 17.164 0.576 103.47%
q41.sql 1.162 1.188 0.026 102.24%
q42.sql 2.3 2.561 0.261 111.35%
q43.sql 7.407 7.605 0.198 102.67%
q44.sql 28.939 30.473 1.534 105.30%
q45.sql 9.796 9.634 -0.162 98.35%
q46.sql 9.496 9.692 0.196 102.06%
q47.sql 27.087 27.151 0.064 100.24%
q48.sql 14.524 14.889 0.365 102.51%
q49.sql 21.466 21.572 0.106 100.49%
q50.sql 194.755 195.052 0.297 100.15%
q51.sql 37.493 38.56 1.067 102.85%
q52.sql 2.227 2.28 0.053 102.38%
q53.sql 5.375 5.437 0.062 101.15%
q54.sql 12.556 13.015 0.459 103.66%
q55.sql 2.341 2.809 0.468 119.99%
q56.sql 7.424 7.207 -0.217 97.08%
q57.sql 17.606 17.797 0.191 101.08%
q58.sql 6.169 6.374 0.205 103.32%
q59.sql 27.602 27.744 0.142 100.51%
q60.sql 7.04 7.459 0.419 105.95%
q61.sql 7.838 7.816 -0.022 99.72%
q62.sql 9.726 10.762 1.036 110.65%
q63.sql 4.816 5.176 0.36 107.48%
q64.sql 253.937 261.034 7.097 102.79%
q65.sql 78.942 78.373 -0.569 99.28%
q66.sql 15.199 14.73 -0.469 96.91%
q67.sql 926.049 1022.971 96.922 110.47%
q68.sql 7.932 7.977 0.045 100.57%
q69.sql 12.101 14.699 2.598 121.47%
q70.sql 20.7 20.872 0.172 100.83%
q71.sql 14.96 15.065 0.105 100.70%
q72.sql 73.215 73.955 0.74 101.01%
q73.sql 5.973 6.126 0.153 102.56%
q74.sql 97.611 99.577 1.966 102.01%
q75.sql 125.005 129.508 4.503 103.60%
q76.sql 34.812 35.34 0.528 101.52%
q77.sql 7.686 8.474 0.788 110.25%
q78.sql 287.959 292.936 4.977 101.73%
q79.sql 8.401 9.616 1.215 114.46%
q80.sql 59.371 60.051 0.68 101.15%
q81.sql 18.452 19.499 1.047 105.67%
q82.sql 64.093 65.032 0.939 101.47%
q83.sql 4.675 4.867 0.192 104.11%
q84.sql 10.456 10.816 0.36 103.44%
q85.sql 12.347 12.77 0.423 103.43%
q86.sql 6.537 6.843 0.306 104.68%
q87.sql 77.427 77.876 0.449 100.58%
q88.sql 83.082 83.385 0.303 100.36%
q89.sql 6.645 6.801 0.156 102.35%
q90.sql 7.841 7.883 0.042 100.54%
q91.sql 3.88 4.129 0.249 106.42%
q92.sql 3.044 3.271 0.227 107.46%
q93.sql 361.149 365.883 4.734 101.31%
q94.sql 43.929 46.667 2.738 106.23%
q95.sql 196.363 197.427 1.064 100.54%
q96.sql 12.457 12.496 0.039 100.31%
q97.sql 80.131 81.821 1.69 102.11%
q98.sql 6.885 7.522 0.637 109.25%
q99.sql 17.685 18.009 0.324 101.83%
6788.707 7116.357 327.65 104.82%

One of our production query which has 19 case when expressions, it's query time changed from 1.1 hour to 42 seconds.

image

A simplify benchmark of the above production query.

    spark.range(1, 2000000, 1, 1)
      .selectExpr(
        "cast(id + 1 as decimal) as a",
        "cast(id + 2 as decimal) as b",
        "cast(id + 3 as decimal) as c",
        "cast(id + 4 as decimal) as d")
      .createOrReplaceTempView("tab")
    runBenchmark("Subexpression elimination in ProjectExec") {
      val benchmark =
        new Benchmark("Subexpression elimination in ProjectExec", 2000000, output = output)
      benchmark.addCase(s"Test query") { _ =>
        val query =
          s"""
             |SELECT a, b, c, d,
             |       a * b / c as s1,
             |       CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN 1 ELSE 0 END s2,
             |       CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |            CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN 1 ELSE 0 END
             |       ELSE 0 END s3,
             |       CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |            CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |                 CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN 1 ELSE 0 END
             |            ELSE 0 END
             |       ELSE 0 END s4,
             |       CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |            CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |                 CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN
             |                      CASE WHEN d = 0 THEN 0 WHEN a * b / c > 0 THEN 1 ELSE 0 END
             |                 ELSE 0 END
             |            ELSE 0 END
             |       ELSE 0 END s5
             |FROM tab
             |""".stripMargin
        spark.sql(query).noop()
      }
      benchmark.run()

Local benchmark result:
Before this PR:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Subexpression elimination in ProjectExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test query                                         9713           9900         263          0.2        4856.7       1.0X

After this PR:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Subexpression elimination in ProjectExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test query                                         1238           1307          98          8.1         123.8       1.0X

Does this PR introduce any user-facing change?

No

How was this patch tested?

Exists UT.

@github-actions github-actions bot added the SQL label May 10, 2023
wankunde added 2 commits May 10, 2023 23:24
Do not reuse the expression if CodegenContext changed.
@wankunde
Copy link
Contributor Author

Hi, @cloud-fan @ulysses-you @wangyum could you help to review this PR ? Thanks

@HyukjinKwon
Copy link
Member

cc @rednaxelafx

@HyukjinKwon
Copy link
Member

and @peter-toth

@wankunde
Copy link
Contributor Author

Hi, @rednaxelafx @peter-toth could you help to review this PR ? Thanks

@peter-toth
Copy link
Contributor

peter-toth commented May 19, 2023

Hi, @rednaxelafx @peter-toth could you help to review this PR ? Thanks

Hi @wankunde, thanks for pinging me. I can take a look at this PR sometime next week or the week after...

@wankunde
Copy link
Contributor Author

I think it would be easier to compare the difference of the code generated by ProjectExec and FilterExec before reviewing this PR.
ProjectExec generated code: https://gist.github.com/wankunde/8034cb6bdd8cac108765bffb16bd0e21
FilterExec generated code: https://gist.github.com/wankunde/e1531f42402638252d958f0ad4d3769d

cc @peter-toth @reactormonk @cloud-fan

@peter-toth
Copy link
Contributor

Sorry, I still haven't got time to review the PR thoroughly. (Maybe next week...)

IMO it is a good idea to lazily evaluate expressions and cache the results runtime in case of any expresions that might be used more than once. In some cases it could achieve much better performance than any kind of static analysis we do currently...
But the downside is that the generated code is bigger and so it can cause regressions too. So, probably I would add a new feature flag to allow switching between the current "static" and the suggested new "runtime" subexpression elimination of this PR. But others might have better ideas...

@wankunde
Copy link
Contributor Author

wankunde commented Jun 1, 2023

Add a microbenchmark to evaluate the overhead of an additional function call.
The expressions "a * b", "a / b", "a * b / c" will be wrapped in a function and will only be called once.

First benchmark for Project

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    spark.range(1, 20000000, 1, 1)
      .selectExpr(
        "cast(id + 1 as decimal) as a",
        "cast(id + 2 as decimal) as b",
        "cast(id + 3 as decimal) as c",
        "cast(id + 4 as decimal) as d")
      .createOrReplaceTempView("tab")
    runBenchmark("Subexpression elimination in ProjectExec") {
      val benchmark =
        new Benchmark("Subexpression elimination in ProjectExec", 20000000, output = output)
      for (expr <- Seq("a * b", "a / b", "a * b / c")) {
        benchmark.addCase(s"Test $expr expr") { _ =>
          val query =
            s"""
               |SELECT a, b, c, d, CASE WHEN $expr > 0 THEN 0 WHEN $expr > -1 THEN -1 END AS e
               |FROM tab
               |""".stripMargin
          spark.sql(query).noop()
        }
      }

      benchmark.run()
    }
  }

Benchmark results are almost the same

Before this PR

Subexpression elimination in ProjectExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test a * b expr                                    2221           2273          73          9.0         111.0       1.0X
Test a / b expr                                    7581           7612          45          2.6         379.0       0.3X
Test a * b / c expr                                8418           8582         232          2.4         420.9       0.3X

After this PR

Subexpression elimination in ProjectExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test a * b expr                                    2223           2269          65          9.0         111.1       1.0X
Test a / b expr                                    7565           7632          95          2.6         378.2       0.3X
Test a * b / c expr                                8568           8673         149          2.3         428.4       0.3X

Second benchmark for Filter.
If there is no complex expression, the query time changes from 1165ms to 1270ms,
If there is only one complex expression, the query time is the same as before. (6527ms and 7982ms)

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    spark.range(1, 20000000, 1, 1)
      .selectExpr(
        "cast(id + 1 as decimal) as a",
        "cast(id + 2 as decimal) as b",
        "cast(id + 3 as decimal) as c",
        "cast(id + 4 as decimal) as d")
      .createOrReplaceTempView("tab")
    runBenchmark("Subexpression elimination in FilterExec") {
      val benchmark =
        new Benchmark("Subexpression elimination in FilterExec", 20000000, output = output)
      for (expr <- Seq("a * b", "a / b", "a * b / c")) {
        benchmark.addCase(s"Test $expr expr") { _ =>
          val query =
            s"""
               |SELECT a, b, c, d
               |FROM tab
               |WHERE $expr < 0 AND $expr < 1
               |""".stripMargin
          spark.sql(query).noop()
        }
      }

      benchmark.run()
    }
  }
Before this change: 
Subexpression elimination in FilterExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test a * b expr                                    1046           1165         169         19.1          52.3       1.0X
Test a / b expr                                    6519           6527          12          3.1         325.9       0.2X
Test a * b / c expr                                7634           7982         492          2.6         381.7       0.1X


After this change: 
Subexpression elimination in FilterExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Test a * b expr                                    1245           1270          35         16.1          62.3       1.0X
Test a / b expr                                    6469           6582         160          3.1         323.4       0.2X
Test a * b / c expr                                7751           7997         348          2.6         387.6       0.2X

@peter-toth
Copy link
Contributor

peter-toth commented Jun 21, 2023

@wankunde, I didn't foget this PR just realized that we can improve the current subexpression elimination logic in terms of recognizing the surely evaluated subexpressions and keeping track of conditionally evaluated ones.

IMO if surely evaluated count is >= 2 or surely evaluated count = 1 but there is also a certain probability of evaluating the subexpression conditionally then we can include the subexpression for elimination. (This is not a new idea, basically @cloud-fan already mentioned this in #32987 (comment)). So I opened a PR for tracking the expected count of sure and conditional evaluations here: #41677. (Need to add a few more tests so it is just WIP now...)

If that PR gets accepted we can revisit those subexpressions that have only conditional evaluations at 2+ places and apply your lazy construct idea proposed in this PR.

@wankunde
Copy link
Contributor Author

Hi, @peter-toth , any update about the expression elimination work ?

@peter-toth
Copy link
Contributor

peter-toth commented Aug 22, 2023

Yeah, unfortunately #41677 got a bit stuck. I will try to revive it this week.

@github-actions
Copy link

github-actions bot commented Dec 1, 2023

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 1, 2023
@github-actions github-actions bot closed this Dec 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants