[SQL] SPARK-6489: Optimize lateral view with explode to not read unnecessary columns. #5358
Changes from all commits
1b29835
376d332
644f688
9e7aaec
8909a5d
6014acc
54abc3a
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -55,6 +55,7 @@ case class Generate( | |
| child: LogicalPlan) | ||
| extends UnaryNode { | ||
|
|
||
|
|
||
|
Contributor
This mutable state is no longer needed. |
||
| protected def generatorOutput: Seq[Attribute] = { | ||
| val output = alias | ||
| .map(a => generator.output.map(_.withQualifiers(a :: Nil))) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -502,7 +502,11 @@ class FilterPushdownSuite extends PlanTest { | |
| .where(('c > 6) || ('b > 5)).analyze | ||
| } | ||
| val optimized = Optimize(originalQuery) | ||
|
|
||
| comparePlans(optimized, originalQuery) | ||
| val correctAnswer = { | ||
| testRelationWithArrayType | ||
| .generate(Explode(Seq("c"), 'c_arr), true, false, Some("arr")) | ||
| .where(('c > 6) || ('b > 5)).analyze | ||
| } | ||
| comparePlans(optimized, correctAnswer) | ||
|
Contributor
What is going on here? |
||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -56,7 +56,8 @@ case class Generate( | |
| val boundGenerator = BindReferences.bindReference(generator, child.output) | ||
|
|
||
| override def execute(): RDD[Row] = { | ||
| if (join) { | ||
| // #SPARK-6489 do not join when the child has no output | ||
| if (join && child.output.nonEmpty) { | ||
|
Contributor
This seems unrelated. Also, how is it possible to run a generator when there is no output?
Author
SELECT one FROM person LATERAL VIEW explode(1) AS one;
Contributor
We should not be trying to optimize degenerate queries such as this one by putting random hacks into the execution engine. |
||
| child.execute().mapPartitions { iter => | ||
| val nullValues = Seq.fill(generator.output.size)(Literal(null)) | ||
| // Used to produce rows with no matches when outer = true. | ||
|
|
||
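For readers unfamiliar with the `join` and `outer` flags discussed in this hunk, here is a minimal sketch of the row-level behaviour they appear to control. It is a simplified model in plain Scala, not Spark's implementation: `Row`, `generate`, and `generatorArity` are hypothetical names, and the semantics are inferred from the `if (join)` branch and the "rows with no matches when outer = true" comment in the diff.

```scala
object GenerateSemanticsSketch {
  // Hypothetical stand-in for a SQL row; not Spark's Row class.
  type Row = Seq[Any]

  // Simplified model of a Generate operator:
  //  - join = false: emit only the generator's rows
  //  - join = true : prepend the input row to every generated row
  //  - outer = true (with join): if the generator produced nothing,
  //    keep the input row once, padded with nulls
  def generate(
      input: Seq[Row],
      generator: Row => Seq[Row],
      generatorArity: Int,
      join: Boolean,
      outer: Boolean): Seq[Row] =
    input.flatMap { row =>
      val produced = generator(row)
      if (!join) produced
      else if (outer && produced.isEmpty) Seq(row ++ Seq.fill[Any](generatorArity)(null))
      else produced.map(generated => row ++ generated)
    }
}
```

In this toy model, joining an empty input row produces the same values as not joining at all, which may be the intuition behind the `child.output.nonEmpty` guard that the reviewer questions.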
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| A, 20, 10:12:19 | ||
| B, 25, 7:8:4 | ||
| C, 19, 12:4:232 | ||
| D, 73, 243:53:7835 | ||
| E, 88, 1345:23:532532:353 |
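To make the data above concrete, the following sketch models what a query such as `SELECT name, sum(d) ... LATERAL VIEW explode(data) d AS d GROUP BY name` computes over rows of this shape. It assumes the three fields are `name`, `age`, and a colon-separated integer array `data`; that layout is an assumption based on the column names used in the pruning tests, and `Person`, `parse`, and `sumByName` are hypothetical helpers, not part of the PR.

```scala
object ExplodeSketch {
  // Assumed row layout: name, age, colon-separated data array.
  case class Person(name: String, age: Int, data: Seq[Int])

  // Parse one comma-separated line such as "A, 20, 10:12:19".
  def parse(line: String): Person = {
    val fields = line.split(",").map(_.trim)
    Person(fields(0), fields(1).toInt, fields(2).split(":").map(_.toInt).toSeq)
  }

  // Model of explode + GROUP BY name + sum(d):
  // flatMap plays the role of explode, emitting one (name, d) pair per array element.
  def sumByName(people: Seq[Person]): Map[String, Int] =
    people
      .flatMap(p => p.data.map(d => (p.name, d)))
      .groupBy(_._1)
      .map { case (name, pairs) => name -> pairs.map(_._2).sum }
}
```

Note that `sumByName` never touches `age`. That is the point of the column pruning this PR aims for: only `name` and `data` need to be read from the table.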
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -87,6 +87,27 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter { | |
| Seq("key"), | ||
| Seq.empty) | ||
|
|
||
| createPruningTest("Column pruning - explode with aggregate", | ||
| "SELECT name, sum(d) AS sumd FROM person LATERAL VIEW explode(data) d AS d GROUP BY name", | ||
| Seq("name", "sumd"), | ||
| Seq("name","data"), | ||
| Seq.empty) | ||
|
Contributor
I don't think we need to go all the way to Hive to create an integration test. Instead create some unit tests in |
||
|
|
||
| createPruningTest("Column pruning - outer explode with limit", | ||
| "SELECT name FROM person LATERAL VIEW OUTER explode(data) outd AS d" + | ||
| " where name < \"C\" limit 3", | ||
| Seq("name"), | ||
| Seq("name", "data"), | ||
| Seq.empty) | ||
|
|
||
| createPruningTest(s"Column pruning - select all without explode optimze - query test", | ||
| "SELECT * FROM person LATERAL VIEW OUTER explode(data) outd AS d WHERE 20 < age", | ||
| Seq("name", "age", "data", "d"), | ||
| Seq("name", "age", "data"), | ||
| Seq.empty) | ||
|
|
||
|
|
||
|
|
||
| // Partition pruning tests | ||
|
|
||
| createPruningTest("Partition pruning - non-partitioned, non-trivial project", | ||
|
|
||
Just look for Project on top of generate instead of attempting to walk the query tree. You can assume other rules will push projects down to you.
Generate blocks collectProjectsAndFilters in PhysicalOperation.
Besides a Project sitting directly on top of Generate, a Filter may also sit on top of Generate. How should that case be handled?
Project
  Filter
    Generate(explode)
Could there be several Filters between Project and Generate?
It is possible that we need several rules to accomplish all of the various optimizations. However, as it's written now, this is trying to do too much and as a result is too hard to follow.
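The suggestion earlier in this thread (match only a Project directly on top of a Generate, and rely on other rules to push projects down) could look roughly like the sketch below. It uses a toy plan ADT in plain Scala: `Plan`, `Relation`, `Project`, `Generate`, and `prune` are hypothetical stand-ins, not Catalyst classes, and the Filter-in-between case raised by the author is deliberately left to other rules.

```scala
object PruneGenerateSketch {
  // Toy logical plan; column names stand in for attributes.
  sealed trait Plan { def output: Seq[String] }
  case class Relation(output: Seq[String]) extends Plan
  case class Project(columns: Seq[String], child: Plan) extends Plan {
    def output: Seq[String] = columns
  }
  // input: the column the generator reads; generated: the column it produces.
  case class Generate(input: String, generated: String, join: Boolean, child: Plan) extends Plan {
    def output: Seq[String] = if (join) child.output :+ generated else Seq(generated)
  }

  // Single-node rule: only fires for a Project directly over a joining Generate.
  def prune(plan: Plan): Plan = plan match {
    case Project(cols, g @ Generate(input, _, true, child)) =>
      // Keep only the child columns the Project asks for, plus the generator's input.
      val needed = child.output.filter(c => cols.contains(c) || c == input)
      if (needed.size < child.output.size)
        Project(cols, g.copy(child = Project(needed, child)))
      else plan // nothing to prune; leave the plan unchanged
    case other => other
  }
}
```

Because the rule inspects a single node shape, it stays short and is a no-op when applied a second time. In a real optimizer it would be applied across the whole tree by the rule framework rather than by hand.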