
@karenfeng

### What changes were proposed in this pull request?

Follow-up to #31666, which introduced a bug: the qualified star expansion of a subquery alias over the output of a NATURAL/USING join produced duplicated columns.

### Why are the changes needed?

Duplicated, hidden columns should not appear in the output of a star expansion.

### Does this PR introduce _any_ user-facing change?

The query

```
val df1 = Seq((3, 8)).toDF("a", "b")
val df2 = Seq((8, 7)).toDF("b", "d")
val joinDF = df1.join(df2, "b")
joinDF.alias("r").select("r.*")
```

now outputs a single column `b` instead of two duplicate `b` columns.
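The rule can be illustrated with a small, self-contained Scala model. This is a hypothetical sketch, not Spark's analyzer code: `Attr`, `StarExpansionModel`, `expandBuggy` and `expandFixed` are illustrative names only. The model assumes that a USING join on `b` exposes `b, a, d` as its visible output and keeps the right side's copy of `b` as a hidden attribute (so that a qualified reference such as `df2.b` can still resolve). Wrapping the join in the alias `r` re-qualifies every attribute with `r`, and the only difference between the two expansion functions is whether hidden attributes are filtered out.

```scala
// Hypothetical toy model, NOT Spark's implementation: an attribute has a name,
// a qualifier (table or alias name) and a flag marking it as hidden, i.e. kept
// only so that qualified references can still resolve after a USING join.
final case class Attr(name: String, qualifier: String, hidden: Boolean = false)

object StarExpansionModel {
  // Assumed output of `df1(a, b) JOIN df2(b, d) USING (b)`: the join key appears
  // once in the visible output; the right side's copy of `b` is hidden.
  val joinOutput: Seq[Attr] = Seq(
    Attr("b", "df1"),
    Attr("a", "df1"),
    Attr("d", "df2"),
    Attr("b", "df2", hidden = true)
  )

  // A subquery alias re-qualifies every attribute (hidden ones included).
  def alias(attrs: Seq[Attr], name: String): Seq[Attr] =
    attrs.map(_.copy(qualifier = name))

  // Buggy behavior: `q.*` also picks up hidden attributes, duplicating `b`.
  def expandBuggy(attrs: Seq[Attr], q: String): Seq[String] =
    attrs.filter(_.qualifier == q).map(_.name)

  // Fixed behavior: hidden attributes never appear in a star expansion.
  def expandFixed(attrs: Seq[Attr], q: String): Seq[String] =
    attrs.filter(a => a.qualifier == q && !a.hidden).map(_.name)

  def main(args: Array[String]): Unit = {
    val r = alias(joinOutput, "r")
    println(expandBuggy(r, "r")) // `b` appears twice
    println(expandFixed(r, "r")) // `b` appears once
  }
}
```

In this model, expanding `r.*` with the filter yields a single `b`, mirroring the user-facing change described for the query above; the column order is a property of the model, not a claim about Spark's exact output order.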

### How was this patch tested?

UTs

github-actions bot added the SQL label Jun 3, 2022
cloud-fan pushed a commit that referenced this pull request Jun 6, 2022
…ery alias from NATURAL/USING JOIN

Closes #36763 from karenfeng/SPARK-39376.

Authored-by: Karen Feng
Signed-off-by: Wenchen Fan
(cherry picked from commit 18ca369)
Signed-off-by: Wenchen Fan
cloud-fan closed this in 18ca369 Jun 6, 2022
cloud-fan pushed a commit that referenced this pull request Jun 6, 2022
…ery alias from NATURAL/USING JOIN


dongjoon-hyun left a comment:


+1, LGTM.

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…ery alias from NATURAL/USING JOIN

Closes apache#36763 from karenfeng/SPARK-39376.

Authored-by: Karen Feng
Signed-off-by: Wenchen Fan