Commit cdcccd7
[SPARK-23405] Generate additional constraints for Join's children
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
I run a sql: `select ls.cs_order_number from ls left semi join catalog_sales cs on ls.cs_order_number = cs.cs_order_number`, The `ls` table is a small table ,and the number is one. The `catalog_sales` table is a big table, and the number is 10 billion. The task will be hang up. And i find the many null values of `cs_order_number` in the `catalog_sales` table. I think the null value should be removed in the logical plan.
>== Optimized Logical Plan ==
>Join LeftSemi, (cs_order_number#1 = cs_order_number#22)
>:- Project cs_order_number#1
> : +- Filter isnotnull(cs_order_number#1)
> : +- MetastoreRelation 100t, ls
>+- Project cs_order_number#22
> +- MetastoreRelation 100t, catalog_sales
Now, use this patch, the plan will be:
>== Optimized Logical Plan ==
>Join LeftSemi, (cs_order_number#1 = cs_order_number#22)
>:- Project cs_order_number#1
> : +- Filter isnotnull(cs_order_number#1)
> : +- MetastoreRelation 100t, ls
>+- Project cs_order_number#22
> : **+- Filter isnotnull(cs_order_number#22)**
> :+- MetastoreRelation 100t, catalog_sales
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: KaiXinXiaoLei <[email protected]>
Author: hanghang <[email protected]>
Closes #20670 from KaiXinXiaoLei/Spark-23405.1 parent ff14801 commit cdcccd7
File tree
3 files changed
+28
-13
lines changed- sql/catalyst/src
- main/scala/org/apache/spark/sql/catalyst
- optimizer
- plans/logical
- test/scala/org/apache/spark/sql/catalyst/optimizer
3 files changed
+28
-13
lines changedLines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
661 | 661 | | |
662 | 662 | | |
663 | 663 | | |
664 | | - | |
| 664 | + | |
665 | 665 | | |
666 | 666 | | |
667 | 667 | | |
| |||
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
Lines changed: 15 additions & 12 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | | - | |
27 | | - | |
28 | | - | |
| 26 | + | |
| 27 | + | |
29 | 28 | | |
30 | | - | |
| 29 | + | |
31 | 30 | | |
32 | | - | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
37 | | - | |
38 | | - | |
39 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
40 | 34 | | |
41 | 35 | | |
42 | 36 | | |
43 | 37 | | |
44 | 38 | | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
45 | 48 | | |
46 | 49 | | |
47 | 50 | | |
| |||
Lines changed: 12 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
192 | 192 | | |
193 | 193 | | |
194 | 194 | | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
195 | 207 | | |
0 commit comments