@allisonwang-db allisonwang-db commented Feb 6, 2023

What changes were proposed in this pull request?

This PR is a follow-up to #37758. It updates the rule AddMetadataColumns so that it no longer introduces an extra Project node.

Why are the changes needed?

To fix an issue introduced by #37758.

-- t1: [key, value] t2: [key, value]
select t1.key, t2.key from t1 full outer join t2 using (key)

Before this PR, the rule AddMetadataColumns added a new Project between the USING join and the select list:

Project [key, key]
+- Project [key, key, key, key] <--- extra project
   +- Project [coalesce(key, key) AS key, value, value, key, key]
      +- Join FullOuter, (key = key)
         :- LocalRelation <empty>, [key#0, value#0]
         +- LocalRelation <empty>, [key#0, value#0]

After this PR, the extra Project is no longer added.
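
For anyone who wants to inspect the plan locally, here is a minimal sketch that runs the query above and prints the analyzed plan. It is not the unit test added in this PR. The object name, the sample rows, and the local session settings are assumptions made for illustration; only the schema `[key, value]` and the query come from the description.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.Project

// Hypothetical helper object, not part of this PR.
object UsingJoinPlanCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("using-join-plan-check")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample rows; the description only specifies the columns [key, value].
    Seq((1, "a"), (2, "b")).toDF("key", "value").createOrReplaceTempView("t1")
    Seq((2, "x"), (3, "y")).toDF("key", "value").createOrReplaceTempView("t2")

    val df = spark.sql(
      "select t1.key, t2.key from t1 full outer join t2 using (key)")

    // Print the analyzed logical plan so the Project nodes above the join are visible.
    val analyzed = df.queryExecution.analyzed
    println(analyzed.treeString)

    // Count every Project node in the analyzed plan. This also counts any Project
    // nodes that come from the temp view definitions, so compare the number across
    // Spark builds rather than reading it as an absolute value.
    val projects = analyzed.collect { case p: Project => p }
    println(s"Project nodes in analyzed plan: ${projects.size}")

    spark.stop()
  }
}
```

Running this against a build with and without the change should make the difference in the number of Project nodes above the join visible; the exact plan text may vary between Spark versions.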

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new unit test.

@github-actions github-actions bot added the SQL label Feb 6, 2023
@cloud-fan
Contributor

The PySpark failure is unrelated. Merging to master/3.4!

@cloud-fan cloud-fan closed this in 286d336 Feb 7, 2023
cloud-fan pushed a commit that referenced this pull request Feb 7, 2023
…aColumns

Closes #39895 from allisonwang-db/spark-40149-follow-up.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 286d336)
Signed-off-by: Wenchen Fan <[email protected]>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…aColumns
