@HyukjinKwon
Member

HyukjinKwon commented Sep 2, 2017

What changes were proposed in this pull request?

This PR proposes adding wrappers for the unionByName API to R and Python as well.

Python

df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])
df1.unionByName(df2).show()
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   6|   4|   5|
+----+----+----+
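
To make the output above easier to follow, here is a minimal plain-Python sketch of the difference between a positional union and a by-name union. It is only an illustration of the semantics, not Spark's implementation: the helpers `union_by_position` and `union_by_name` are hypothetical names, tables are modeled as a list of column names plus a list of row tuples, and column names are assumed to be unique.

```python
# Plain-Python illustration of by-position vs. by-name union semantics.
# This is NOT Spark's implementation; both helper names are hypothetical.

def union_by_position(cols1, rows1, cols2, rows2):
    """Append rows2 as-is, ignoring the column names of the second table."""
    if len(cols1) != len(cols2):
        raise ValueError("both inputs must have the same number of columns")
    return cols1, rows1 + rows2


def union_by_name(cols1, rows1, cols2, rows2):
    """Reorder each row of the second table to match cols1, then append."""
    # Assumes column names are unique within each table.
    if sorted(cols1) != sorted(cols2):
        raise ValueError("both inputs must have the same set of column names")
    # For every column of the first table, find its position in the second.
    index = [cols2.index(name) for name in cols1]
    reordered = [tuple(row[i] for i in index) for row in rows2]
    return cols1, rows1 + reordered


cols1, rows1 = ["col0", "col1", "col2"], [(1, 2, 3)]
cols2, rows2 = ["col1", "col2", "col0"], [(4, 5, 6)]

by_position = union_by_position(cols1, rows1, cols2, rows2)
by_name = union_by_name(cols1, rows1, cols2, rows2)

print(by_position)  # second row stays (4, 5, 6)
print(by_name)      # second row becomes (6, 4, 5)
```

With the same inputs as the example above, the by-name variant yields the row (6, 4, 5), which matches the second row of the `show()` output, while the positional variant would keep (4, 5, 6) under the first table's column names.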

R

df1 <- select(createDataFrame(mtcars), "carb", "am", "gear")
df2 <- select(createDataFrame(mtcars), "am", "gear", "carb")
head(unionByName(limit(df1, 2), limit(df2, 2)))
  carb am gear
1    4  1    4
2    4  1    4
3    4  1    4
4    4  1    4

How was this patch tested?

Added doctests for Python and a unit test in test_sparkSQL.R for R.

@HyukjinKwon
Member Author

HyukjinKwon commented Sep 2, 2017

cc @felixcheung and @actuaryzhang, could you take a look please?

@SparkQA

SparkQA commented Sep 2, 2017

Test build #81335 has finished for PR 19105 at commit 058feeb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Sep 2, 2017

Python side change LGTM

@felixcheung
Member

felixcheung commented Sep 2, 2017

Did you mean something different for the example in the PR description above?
It says:

head(union(limit(df1, 2), limit(df2, 2)))

Member

felixcheung left a comment

LGTM, minor comments

union(x, y)
})

#' Return a new SparkDataFrame containing the union of rows
Member

I'd suggest a slightly different title - this one uses the same words as union

Member Author

Sure, I just addressed both comments.

#'
#' Return a new SparkDataFrame containing the union of rows in this SparkDataFrame
#' and another SparkDataFrame. This is different from both \code{UNION ALL} and
#' \code{UNION DISTINCT} in SQL as column positions are not taken into account.
Member

I'd list union() here too

@HyukjinKwon
Member Author

HyukjinKwon commented Sep 2, 2017

Oh, yeah. I made a mistake while copying and pasting the example in the description. I just corrected it.

@SparkQA

SparkQA commented Sep 2, 2017

Test build #81341 has finished for PR 19105 at commit 7ce6713.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Thank you @viirya and @felixcheung.

@HyukjinKwon
Member Author

Merged to master.

4 participants