@maropu
Member

@maropu maropu commented Jan 13, 2017

What changes were proposed in this pull request?

This PR preserves aliases that are given for pivot aggregations, to solve the issue reported in SPARK-17237. Pivoting adds backticks (e.g. 3_count(`c`)) to column names and, in some cases,
this causes analysis exceptions like:

scala> val df = Seq((2, 3, 4), (3, 4, 5)).toDF("a", "x", "y")
scala> df.groupBy("a").pivot("x").agg(count("y"), avg("y")).na.fill(0)
org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`y`)`;
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:134)
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:144)
...

So, this PR also removes these backticks from column names.
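
As a rough illustration of what preserving aliases means for the repro above, here is a minimal sketch. It assumes a Spark build that includes this change and a local SparkSession. The object name `PivotAliasSketch`, the helper `pivotWithAliases`, and the aliases `cnt` and `avg_y` are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, count}

object PivotAliasSketch {
  // Hypothetical helper: pivots the repro data with explicitly aliased aggregates.
  def pivotWithAliases(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Same data as in the repro above.
    val df = Seq((2, 3, 4), (3, 4, 5)).toDF("a", "x", "y")

    // Alias each aggregate so the pivoted column names can be built from
    // the pivot value and the alias rather than from the SQL text of the
    // aggregate expression.
    df.groupBy("a")
      .pivot("x")
      .agg(count("y").as("cnt"), avg("y").as("avg_y"))
      .na.fill(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pivot-alias-sketch")
      .getOrCreate()
    try {
      // Print the resulting column names for inspection.
      println(pivotWithAliases(spark).columns.mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

With aliases preserved, the pivoted column names should contain no backticks, so `na.fill(0)` should be able to resolve them instead of failing in `parseAttributeName`.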

How was this patch tested?

Added a test in DataFrameAggregateSuite.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71281 has finished for PR 16565 at commit 875e2c0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Could you change the PR title to "[SPARK-17237][SQL][Backport-2.0] Remove backticks in a pivot result schema"?

@maropu
Member Author

maropu commented Jan 13, 2017

Okay! I'm looking into the test failure now, so just a sec. Thanks.

@maropu maropu changed the title [SPARK-17237][SQL] Remove backticks in a pivot result schema [SPARK-17237][SQL][Backport-2.0] Remove backticks in a pivot result schema Jan 13, 2017
@SparkQA

SparkQA commented Jan 13, 2017

Test build #71290 has finished for PR 16565 at commit e2c2fae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

I checked the change history. Actually, you also backported #15111. Could you please update your PR description and PR title?

@gatorsmile
Member

LGTM except for one comment

@maropu
Member Author

maropu commented Jan 13, 2017

@gatorsmile Oh, I see. Is it okay to mix this PR with the fix of #15111? Or would it be better to backport #15111 first, then backport this?

@maropu
Member Author

maropu commented Jan 15, 2017

@gatorsmile ping

@gatorsmile
Member

I think it is fine to do it together. Basically, your PR is to fix the bug of #15111

@maropu
Member Author

maropu commented Jan 15, 2017

okay! I'll update them

@maropu maropu changed the title [SPARK-17237][SQL][Backport-2.0] Remove backticks in a pivot result schema [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Alias specified for aggregates in a pivot are not honored Jan 15, 2017
@maropu maropu changed the title [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Alias specified for aggregates in a pivot are not honored [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Preserve aliases that are given for pivot aggregations Jan 15, 2017
@maropu
Member Author

maropu commented Jan 15, 2017

@gatorsmile How about this fix? Could you please check it again?

Member

@gatorsmile gatorsmile left a comment

LGTM

asfgit pushed a commit that referenced this pull request Jan 15, 2017
…re given for pivot aggregations

## What changes were proposed in this pull request?
This PR preserves aliases that are given for pivot aggregations, to solve the issue reported in `SPARK-17237`. Pivoting adds backticks (e.g. 3_count(\`c\`)) to column names and, in some cases,
this causes analysis exceptions like:
```
scala> val df = Seq((2, 3, 4), (3, 4, 5)).toDF("a", "x", "y")
scala> df.groupBy("a").pivot("x").agg(count("y"), avg("y")).na.fill(0)
org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`y`)`;
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:134)
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:144)
...
```
So, this pr also removes these backticks from column names.

## How was this patch tested?
Added a test in `DataFrameAggregateSuite`.

Author: Takeshi YAMAMURO <[email protected]>

Closes #16565 from maropu/SPARK-17237-3.
@gatorsmile
Member

Thanks! Merging to 2.0

Could you please close it?

@maropu
Member Author

maropu commented Jan 15, 2017

Okay and thanks!
