[SPARK-12719][SQL] [WIP] SQL generation support for generators, including UDTF #11596
Conversation
|
@rxin @cloud-fan @liancheng @gatorsmile Hi Reynold and Wenchen, |
|
ok to test |
|
Test build #52719 has finished for PR 11596 at commit
|
|
hi @dilipbiswal , thanks for working on this! Some high-level questions:
Generally speaking, we should use one unified format for |
Please add an assert here.
`assert(join == true || plan.projectList.size == 1)` |
@cloud-fan Thank you.
Yes, based on the tests I have tried. I am testing this more to see if I uncover any issues. Now I am converting `select explode(array(1,2,3)) from t1` to `select tablequalifier.colname from t1 LATERAL VIEW explode(array(1,2,3)) tablequalifier as colname`
No, as we have a restriction of allowing only one generator in a projection list. That's why we try to |
|
But a |
|
@cloud-fan Yeah, that should be possible. Are you thinking of representing each Generate as a sub-select clause? |
|
I'm not sure which one is better, it depends on which one is easier to understand and implement. I'm ok to generate verbose SQL string, but the generator itself should be as simple and robust as possible. |
|
Test build #52725 has finished for PR 11596 at commit
|
|
@cloud-fan Thanks! Actually, I remember exploring this option. There is a generate syntax for outer lateral views, like the following: `SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;` I thought if we converted this generate operator to a sub-select where the generator is in the projection |
|
ah makes sense, then we should always generate a LATERAL VIEW format SQL. |
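A minimal sketch of the `LATERAL VIEW` rendering agreed on above, written as plain Scala with no Spark dependency. `GenerateSketch`, the simplified `Generate` case class, and `toLateralViewSql` are hypothetical names used only for illustration; this is not the code in this PR or Spark's actual SQL builder. The `outer` flag is assumed to map to the `LATERAL VIEW OUTER` syntax mentioned earlier in the thread.

```scala
object GenerateSketch {
  // Hypothetical, simplified stand-in for a Generate operator.
  final case class Generate(
      generatorSql: String,     // e.g. "explode(array(1,2,3))"
      outer: Boolean,           // true => emit LATERAL VIEW OUTER
      qualifier: String,        // table alias for the generated rows
      outputNames: Seq[String], // column aliases produced by the generator
      childSql: String)         // SQL text of the child relation

  // Renders:
  //   SELECT <q.col, ...> FROM <child> LATERAL VIEW [OUTER] <gen> <q> AS <col, ...>
  def toLateralViewSql(g: Generate): String = {
    val outerKeyword = if (g.outer) " OUTER" else ""
    val columns = g.outputNames.mkString(", ")
    val selectList = g.outputNames.map(c => s"${g.qualifier}.$c").mkString(", ")
    s"SELECT $selectList FROM ${g.childSql} " +
      s"LATERAL VIEW$outerKeyword ${g.generatorSql} ${g.qualifier} AS $columns"
  }
}
```

Because the generator never appears in the select list here, the one-generator-per-projection restriction discussed above does not come into play in this form.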
You can just do `plan.generatorOutput.map(_.sql).mkString(", ")`
Sure. Thx
#### What changes were proposed in this pull request?

As shown in another PR: apache#11596, we are using `SELECT 1` as a dummy table, when the table is used for SQL statements in which a table reference is required, but the contents of the table are not important. For example,

```SQL
SELECT value FROM (select 1) dummyTable Lateral View explode(array(1,2,3)) adTable as value
```

Before the PR, the optimized plan contains a useless `Project` after Optimizer executing the `ColumnPruning` rule, as shown below:

```
== Analyzed Logical Plan ==
value: int
Project [value#22]
+- Generate explode(array(1, 2, 3)), true, false, Some(adtable), [value#22]
   +- SubqueryAlias dummyTable
      +- Project [1 AS 1#21]
         +- OneRowRelation$

== Optimized Logical Plan ==
Generate explode([1,2,3]), false, false, Some(adtable), [value#22]
+- Project
   +- OneRowRelation$
```

After the fix, the optimized plan removed the useless `Project`, as shown below:

```
== Optimized Logical Plan ==
Generate explode([1,2,3]), false, false, Some(adtable), [value#22]
+- OneRowRelation$
```

This PR is to remove `Project` when its Child's output is Nil

#### How was this patch tested?

Added a new unit test case into the suite `ColumnPruningSuite.scala`

Author: gatorsmile <[email protected]>

Closes apache#11599 from gatorsmile/projectOneRowRelation.
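
The pruning described in the commit message above can be illustrated with a toy plan tree. This is a sketch under stated assumptions, not the actual `ColumnPruning` rule: `PruneSketch`, its miniature `Plan` classes, and `removeEmptyProject` are hypothetical, and the condition used here (an empty project list over a child whose output is empty) is a conservative reading of "remove `Project` when its Child's output is Nil".

```scala
object PruneSketch {
  // Hypothetical miniature plan tree; these are not Spark's LogicalPlan classes.
  sealed trait Plan { def output: Seq[String] }

  case object OneRowRelation extends Plan {
    val output: Seq[String] = Nil
  }

  final case class Project(projectList: Seq[String], child: Plan) extends Plan {
    def output: Seq[String] = projectList
  }

  final case class Generate(generator: String, generatorOutput: Seq[String], child: Plan)
      extends Plan {
    def output: Seq[String] = child.output ++ generatorOutput
  }

  // Removes a Project that selects nothing from a child that outputs nothing.
  // Assumption: such a Project cannot change the result, so its child can replace it.
  def removeEmptyProject(plan: Plan): Plan = plan match {
    case Project(list, child) if list.isEmpty && child.output.isEmpty =>
      removeEmptyProject(child)
    case Project(list, child) =>
      Project(list, removeEmptyProject(child))
    case g: Generate =>
      g.copy(child = removeEmptyProject(g.child))
    case other =>
      other
  }
}
```

Applied to a `Generate` over an empty `Project` over `OneRowRelation`, the sketch returns the `Generate` directly over `OneRowRelation`, which mirrors the shape of the optimized plan after the fix.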
d7f0207 to bea871f
|
Test build #52915 has finished for PR 11596 at commit
|
|
@cloud-fan Can you please help trigger a retest? The failure does not seem related to my changes. |
|
retest this please |
|
Test build #52930 has finished for PR 11596 at commit
|
|
Hi @dilipbiswal , sorry that recently I made a major refactor to |
|
Closing this in favor of |
## What changes were proposed in this pull request?

This PR adds SQL generation support for `Generate` operator. It always converts `Generate` operator into `LATERAL VIEW` format as there are many limitations to put UDTF in project list.

This PR is based on #11658, please see the last commit to review the real changes.

Thanks dilipbiswal for his initial work! Takes over #11596

## How was this patch tested?

new tests in `LogicalPlanToSQLSuite`

Author: Wenchen Fan <[email protected]>

Closes #11696 from cloud-fan/generate.
What changes were proposed in this pull request?
This is an alternate way to generate SQL from analyzed logical plans containing the Generate operator.
In this PR, generators in the projection list are expressed as LATERAL VIEW.
Sample Plan :
Generated Query:
How was this patch tested?
Tests added to LogicalPlanToSQLSuite