[SPARK-25908][SQL][FOLLOW-UP] Add back unionAll #23131

gatorsmile · 2018-11-24T20:09:21Z

What changes were proposed in this pull request?

This PR is to add back `unionAll`, which is widely used. The name is also consistent with ANSI SQL (`UNION ALL`). We also have the corresponding `intersectAll` and `exceptAll`, which were introduced in Spark 2.4.
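For reference, the "ALL" family follows SQL bag (multiset) semantics. The sketch below models those semantics with plain Scala collections rather than Spark. `AllSetOps` is a hypothetical name, and this illustrates only the semantics, not how Spark implements the operators.

```scala
// Plain-Scala model of the bag ("ALL") semantics behind unionAll,
// intersectAll and exceptAll. Rows are modeled as Seq elements.
// Hypothetical helper for illustration only; not Spark code.
object AllSetOps {
  // UNION ALL: keep every row from both inputs, duplicates included.
  def unionAll[T](left: Seq[T], right: Seq[T]): Seq[T] = left ++ right

  // INTERSECT ALL: a row appearing m times on the left and n times on the
  // right appears min(m, n) times. Seq.intersect is already a multiset op.
  def intersectAll[T](left: Seq[T], right: Seq[T]): Seq[T] = left.intersect(right)

  // EXCEPT ALL: a row appearing m times on the left and n times on the
  // right appears max(m - n, 0) times. Seq.diff is a multiset difference.
  def exceptAll[T](left: Seq[T], right: Seq[T]): Seq[T] = left.diff(right)
}
```

With `left = Seq(1, 1, 2, 3)` and `right = Seq(1, 3, 3, 4)`, this model gives `Seq(1, 1, 2, 3, 1, 3, 3, 4)` for `unionAll`, `Seq(1, 3)` for `intersectAll`, and `Seq(1, 2)` for `exceptAll`.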

How was this patch tested?

Added a test case in DataFrameSuite

gatorsmile · 2018-11-24T20:11:07Z

cc @rxin @srowen @cloud-fan

rxin · 2018-11-24T20:12:02Z

sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

  }

+  /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.


Say that this is an alias of `union`.
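A minimal sketch of what the suggested doc comment describes: a one-line alias that simply delegates to `union`. `MiniDataset` is a hypothetical stand-in, not Spark's `Dataset`, and the delegation shown is an assumption for illustration.

```scala
// Hypothetical stand-in for Dataset, used only to show the alias pattern.
final case class MiniDataset[T](rows: Seq[T]) {
  /** Returns a new MiniDataset containing the union of rows in this and `other`.
    * Duplicates are kept, as in SQL UNION ALL.
    */
  def union(other: MiniDataset[T]): MiniDataset[T] = MiniDataset(rows ++ other.rows)

  /** Returns a new MiniDataset containing the union of rows in this and `other`.
    * This is an alias for `union`.
    */
  def unionAll(other: MiniDataset[T]): MiniDataset[T] = union(other)
}
```

Keeping the alias as a pure delegation means the two names cannot drift apart in behavior; only the documentation needs to state the relationship.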

srowen

Sounds fine; this effectively un-deprecates it. The migration docs need to be updated too, in sparkr.md and sql-migration-guide-upgrade.md, to remove references to unionAll. I updated the JIRA release notes.

It needs to be restored to R too; see 41e1416#diff-508641a8bd6c6b59f3e77c80cdcfa6a9

SparkQA · 2018-11-24T21:35:48Z

Test build #99229 has finished for PR 23131 at commit 12dfd77.

This patch fails some tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-24T21:40:48Z

Test build #99230 has finished for PR 23131 at commit f0dfe7b.

This patch fails some tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-24T23:50:04Z

Test build #99228 has finished for PR 23131 at commit 133246d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-11-25T00:22:23Z

docs/sql-migration-guide-upgrade.md

   APIs. Instead, `DataFrame` remains the primary programming abstraction, which is analogous to the
   single-node data frame notion in these languages.

- - Dataset and DataFrame API `unionAll` has been deprecated and replaced by `union`


Hmm, we cannot change history. Up to and including Spark 2.4.0, we show the deprecation warning:

```
scala> spark.version
res2: String = 2.4.0

scala> df.unionAll(df2)
<console>:28: warning: method unionAll in class Dataset is deprecated: use union()
       df.unionAll(df2)
          ^
```

Shall we keep the history in this specific migration doc section, "Upgrading From Spark SQL 1.6 to 2.0", and add a comment about 3.0.0 instead?

srowen

That's my fault for making this suggestion. Yeah, maybe it's best to leave this statement and add a note here, or in the 3.0 migration guide, that it has since been un-deprecated.

SparkQA · 2018-11-25T01:21:33Z

Test build #99231 has finished for PR 23131 at commit 170262b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-25T08:05:02Z

Test build #99234 has finished for PR 23131 at commit 515c04c.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-11-25T18:26:50Z

retest this please

dongjoon-hyun

+1, LGTM. Thanks!

SparkQA · 2018-11-25T22:03:11Z

Test build #99242 has finished for PR 23131 at commit 515c04c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-11-26T02:34:30Z

Shall we say `union` is an alias of `unionAll`, instead of `unionAll` being an alias of `union`? According to the SQL spec, `unionAll` is implemented correctly in that it keeps duplicate rows, while `union` does not follow the SQL spec (SQL `UNION` removes duplicates); it is too widely used, and too late, to change its behavior.
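To make the distinction concrete, here is a plain-Scala model (not Spark code) of SQL `UNION ALL`, SQL `UNION` (which deduplicates), and the duplicate-keeping behavior of `Dataset.union` described above. `UnionSemantics` and its method names are hypothetical.

```scala
// Plain-Scala model of the union behaviors under discussion.
// Hypothetical helper for illustration only; not Spark code.
object UnionSemantics {
  // SQL UNION ALL: bag union, duplicates preserved.
  def sqlUnionAll[T](a: Seq[T], b: Seq[T]): Seq[T] = a ++ b

  // SQL UNION (UNION DISTINCT): duplicates removed, first occurrence kept.
  def sqlUnion[T](a: Seq[T], b: Seq[T]): Seq[T] = (a ++ b).distinct

  // Spark's Dataset.union keeps duplicates, so it matches UNION ALL,
  // not SQL UNION.
  def sparkUnion[T](a: Seq[T], b: Seq[T]): Seq[T] = sqlUnionAll(a, b)
}
```

In Spark itself, SQL-style `UNION` semantics can be obtained by calling `distinct()` on the result of `union`.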

gatorsmile · 2018-11-26T04:54:47Z

Thanks! Merged to master.

Yes. Adding a Distinct over the Union is very expensive, especially when the underlying dataset is huge.
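One way to see why the extra Distinct costs so much: concatenation can stream rows without remembering anything, while deduplication must track every distinct row it has seen (in Spark this generally also means shuffling rows by their full value). The single-machine sketch below is illustrative only; `DistinctCost` is a hypothetical name and this is not Spark's execution path.

```scala
import scala.collection.mutable

// Hypothetical single-machine illustration of the cost difference.
object DistinctCost {
  // UNION ALL over iterators: lazy concatenation, no per-row state kept.
  def unionAll[T](a: Iterator[T], b: Iterator[T]): Iterator[T] = a ++ b

  // UNION with deduplication: must remember every distinct row seen so far,
  // so memory grows with the number of distinct rows.
  // Returns the deduplicated rows and the final size of the "seen" state.
  def unionDistinct[T](a: Iterator[T], b: Iterator[T]): (Vector[T], Int) = {
    val seen = mutable.HashSet.empty[T]
    val out = Vector.newBuilder[T]
    (a ++ b).foreach { row =>
      // HashSet.add returns true only when the row was not already present.
      if (seen.add(row)) out += row
    }
    (out.result(), seen.size)
  }
}
```

Because `unionAll` here is lazy, `DistinctCost.unionAll(Iterator.from(0), Iterator.empty).take(3).toList` returns `List(0, 1, 2)` even though the first input is infinite; `unionDistinct` has no such shortcut, since it cannot emit a final result until it has checked every row.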

felixcheung · 2018-11-27T17:09:16Z

R/pkg/R/DataFrame.R


+#' Return a new SparkDataFrame containing the union of rows
+#'
+#' This is an alias for `union`.


If the goal is for this to be like the other `*All` methods, it should go into a separate doc page, with `@seealso`, examples, etc.

Because this was written as a deprecated function, its doc page was merged with `union`. As committed now, none of the text above will show up, and `unionAll` will not be listed in the method index.

Also, backticks don't format with roxygen2. This should be:

This is an alias for \code{union}.

gatorsmile

I see. Instead of directly copying the comments back, we should follow `intersectAll`. Opened a ticket: https://issues.apache.org/jira/browse/SPARK-26189

This PR is to add back `unionAll`, which is widely used. The name is also consistent with our ANSI SQL. We also have the corresponding `intersectAll` and `exceptAll`, which were introduced in Spark 2.4.

Added a test case in DataFrameSuite

Closes apache#23131 from gatorsmile/addBackUnionAll.

Authored-by: gatorsmile
Signed-off-by: gatorsmile


Add back unionAll

133246d

rxin reviewed Nov 24, 2018

srowen requested changes Nov 24, 2018

address comments.

12dfd77

update the doc

f0dfe7b

update SparkR

170262b

srowen approved these changes Nov 24, 2018

dongjoon-hyun reviewed Nov 25, 2018

address comments.

515c04c

dongjoon-hyun approved these changes Nov 25, 2018

asfgit closed this in 9414578 Nov 25, 2018

felixcheung reviewed Nov 27, 2018
