Commit 3354a21
[SPARK-37001][SQL] Disable two level of map for final hash aggregation by default
### What changes were proposed in this pull request?
This PR is to disable two level of maps for final hash aggregation by default. The feature was introduced in apache#32242 and we found it can lead to query performance regression when the final aggregation gets rows with a lot of distinct keys. The 1st level hash map is full so a lot of rows will waste the 1st hash map lookup and inserted into 2nd hash map. This feature still benefits query with not so many distinct keys though, so introducing a config here `spark.sql.codegen.aggregate.final.map.twolevel.enabled`, to allow query to enable the feature when seeing benefit.
### Why are the changes needed?
Fix query regression.
### Does this PR introduce _any_ user-facing change?
Yes, the introduced `spark.sql.codegen.aggregate.final.map.twolevel.enabled` config.
### How was this patch tested?
Existing unit test in `AggregationQuerySuite.scala`.
Also verified generated code for an example query in the file:
```
spark.sql(
"""
|SELECT key, avg(value)
|FROM agg1
|GROUP BY key
""".stripMargin)
```
Verified the generated code for final hash aggregation not have two level maps by default:
https://gist.github.com/c21/d4ce87ef28a22d1ce839e0cda000ce14 .
Verified the generated code for final hash aggregation have two level maps if enabling the config:
https://gist.github.com/c21/4b59752c1f3f98303b60ccff66b5db69 .
Closes apache#34270 from c21/agg-fix.
Authored-by: Cheng Su <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>1 parent c11c44e commit 3354a21
File tree
3 files changed
+19
-2
lines changed- sql
- catalyst/src/main/scala/org/apache/spark/sql
- catalyst/expressions
- internal
- core/src/main/scala/org/apache/spark/sql/execution/aggregate
3 files changed
+19
-2
lines changedLines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
217 | 217 | | |
218 | 218 | | |
219 | 219 | | |
220 | | - | |
221 | 220 | | |
222 | 221 | | |
| 222 | + | |
223 | 223 | | |
224 | 224 | | |
225 | 225 | | |
| |||
Lines changed: 10 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1710 | 1710 | | |
1711 | 1711 | | |
1712 | 1712 | | |
| 1713 | + | |
| 1714 | + | |
| 1715 | + | |
| 1716 | + | |
| 1717 | + | |
| 1718 | + | |
| 1719 | + | |
| 1720 | + | |
| 1721 | + | |
| 1722 | + | |
1713 | 1723 | | |
1714 | 1724 | | |
1715 | 1725 | | |
| |||
Lines changed: 8 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
667 | 667 | | |
668 | 668 | | |
669 | 669 | | |
670 | | - | |
| 670 | + | |
| 671 | + | |
| 672 | + | |
| 673 | + | |
| 674 | + | |
| 675 | + | |
| 676 | + | |
| 677 | + | |
671 | 678 | | |
672 | 679 | | |
673 | 680 | | |
| |||
0 commit comments