[SPARK-21897][PYTHON][R] Add unionByName API to DataFrame in Python and R #19105

HyukjinKwon · 2017-09-02T01:56:47Z

What changes were proposed in this pull request?

This PR proposes adding wrappers for the unionByName API to R and Python as well.

Python

df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])
df1.unionByName(df2).show()

+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   6|   4|   5|
+----+----+----+
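
To make the row `6, 4, 5` above easier to follow, here is a minimal plain-Python sketch of resolving columns by name instead of by position. It is not Spark's implementation and does not use Spark at all; the helper names `union_by_position` and `union_by_name` are hypothetical, and raising `ValueError` when the column sets differ is an assumption of this sketch only.

```python
def union_by_position(cols1, rows1, rows2):
    """Positional union: rows of the second input are appended unchanged."""
    return cols1, rows1 + rows2


def union_by_name(cols1, rows1, cols2, rows2):
    """By-name union: reorder each row of the second input to match cols1.

    Hypothetical helper for illustration; the error handling is an
    assumption of this sketch, not a statement about Spark's behavior.
    """
    if sorted(cols1) != sorted(cols2):
        raise ValueError("column names differ: %r vs %r" % (cols1, cols2))
    # For each column of the first input, find its position in the second.
    index = [cols2.index(name) for name in cols1]
    reordered = [tuple(row[i] for i in index) for row in rows2]
    return cols1, rows1 + reordered


cols1, rows1 = ["col0", "col1", "col2"], [(1, 2, 3)]
cols2, rows2 = ["col1", "col2", "col0"], [(4, 5, 6)]

print(union_by_position(cols1, rows1, rows2))
print(union_by_name(cols1, rows1, cols2, rows2))
```

With the same inputs as the example above, the positional version keeps the second row as `(4, 5, 6)`, while the by-name version places the value from `col0` first, giving `(6, 4, 5)`.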

R

df1 <- select(createDataFrame(mtcars), "carb", "am", "gear")
df2 <- select(createDataFrame(mtcars), "am", "gear", "carb")
head(unionByName(limit(df1, 2), limit(df2, 2)))

  carb am gear
1    4  1    4
2    4  1    4
3    4  1    4
4    4  1    4

How was this patch tested?

Doctests for Python and unit test added in test_sparkSQL.R for R.

HyukjinKwon · 2017-09-02T01:57:31Z

cc @felixcheung and @actuaryzhang, could you take a look please?

SparkQA · 2017-09-02T03:00:36Z

Test build #81335 has finished for PR 19105 at commit 058feeb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-09-02T03:34:30Z

Python side change LGTM

felixcheung · 2017-09-02T07:41:52Z

Did you mean something different for the example in the PR description above?
it says

head(union(limit(df1, 2), limit(df2, 2)))

felixcheung

LGTM, minor comments

felixcheung · 2017-09-02T07:42:49Z

R/pkg/R/DataFrame.R

            union(x, y)
          })

+#' Return a new SparkDataFrame containing the union of rows


I'd suggest a slightly different title here - this one uses the same words as union

Sure, I just addressed both comments.

felixcheung · 2017-09-02T07:43:26Z

R/pkg/R/DataFrame.R

+#'
+#' Return a new SparkDataFrame containing the union of rows in this SparkDataFrame
+#' and another SparkDataFrame. This is different from both \code{UNION ALL} and
+#' \code{UNION DISTINCT} in SQL as column positions are not taken into account.


I'd list union() here too

HyukjinKwon · 2017-09-02T07:50:34Z

Oh, yea. I made a mistake during copying and pasting the example in the description. I just corrected it.

SparkQA · 2017-09-02T09:06:16Z

Test build #81341 has finished for PR 19105 at commit 7ce6713.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2017-09-03T12:05:48Z

Thank you @viirya and @felixcheung.

HyukjinKwon · 2017-09-03T12:05:58Z

Merged to master.

Add unionByName API to DataFrame in Python and R

058feeb

felixcheung approved these changes Sep 2, 2017

Address comments

7ce6713

felixcheung approved these changes Sep 2, 2017

asfgit closed this in 07fd68a Sep 3, 2017

GulajavaMinistudio mentioned this pull request Sep 4, 2017

[SPARK-21897][PYTHON][R] Add unionByName API to DataFrame in Python a… GulajavaMinistudio/spark#151

Merged

HyukjinKwon deleted the unionByName-r-python branch January 2, 2018 03:37
