-
Notifications
You must be signed in to change notification settings - Fork 29k
[SQL] SPARK-6489: Optimize lateral view with explode to not unnecessary columns. #5358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
What's wrong?Who can verify it? |
|
ok to test |
|
Test build #29754 has finished for PR 5358 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that OK if we just do like
case gen @ Generate(_, _, _, _, child)
if (child.outputSet -- gen.references).nonEmpty =>
gen.copy(child = Project(a.references.toSeq, child))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we don't need other code change except the unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok,thanks:)
|
Test build #29844 has finished for PR 5358 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mutable state is no longer needed.
and update the prefix of answerfile when user.dir is not spark/sql/hive
|
Test build #30300 has finished for PR 5358 at commit
|
|
Test build #30311 has finished for PR 5358 at commit
|
|
Test build #30322 has finished for PR 5358 at commit
|
|
Test build #30328 has finished for PR 5358 at commit
|
|
What's wrong with the code in PruningSuite? createPruningTest("Column pruning - explode with aggregate",
"SELECT name, sum(d) AS sumd FROM person LATERAL VIEW explode(data) d AS d GROUP BY name",
Seq("name", "sumd"),
Seq("name","data"),
Seq.empty)
createPruningTest("Column pruning - outer explode with limit",
"SELECT name FROM person LATERAL VIEW OUTER explode(data) outd AS d" +
" where name < \"C\" limit 3",
Seq("name"),
Seq("name", "data"),
Seq.empty)
createPruningTest(s"Column pruning - select all without explode optimze - query test",
"SELECT * FROM person LATERAL VIEW OUTER explode(data) outd AS d WHERE 20 < age",
Seq("name", "age", "data", "d"),
Seq("name", "age", "data"),
Seq.empty)jenkins report -> Error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just look for Project on top of generate instead of attempting to walk the query tree. You can assume other rules will push projects down to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generate block the process of collectProjectsAndFilters in PhysicalOperation.
Except that Project on top of generate, the expression named Filter may also be on the top of generate. How to solve this scene?
Project
Filter
Generate(explode)
Are there many Filters between Project and Generate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that we need several rules to accomplish all of the various optimizations. However, as its written now this is trying to do too much and as a result is too hard to follow.
|
Can one of the admins verify this patch? |
|
@cloud-fan can you take this over? |
This PR takes over #5358 Author: Wenchen Fan <[email protected]> Closes #8268 from cloud-fan/6489. (cherry picked from commit b0dbaec) Signed-off-by: Michael Armbrust <[email protected]>
This PR takes over #5358 Author: Wenchen Fan <[email protected]> Closes #8268 from cloud-fan/6489.
|
Looks like #8286 already supersedes this. I think we can safely close this patch. |
This PR takes over apache/spark#5358 Author: Wenchen Fan <[email protected]> Closes #8268 from cloud-fan/6489.
|
I'm going to close this pull request. |
No description provided.