[SPARK-20375][R] R wrappers for array and map #17674

zero323 · 2017-04-18T21:12:13Z

What changes were proposed in this pull request?

Adds wrappers for o.a.s.sql.functions.array and o.a.s.sql.functions.map

How was this patch tested?

Unit tests, check-cran.sh
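
For readers unfamiliar with the two functions being wrapped, here is a minimal plain-Python sketch of what they do per row. It is not the SparkR or Scala implementation: `make_array` and `make_map` are hypothetical names, and a row is modelled as an ordinary dict. The one behaviour taken from the Scala API is that `map` expects its columns grouped as key1, value1, key2, value2, and so on.

```python
def make_array(row, *names):
    """Collect the named fields of one row into a list, in argument order."""
    return [row[name] for name in names]


def make_map(row, *names):
    """Pair the named fields as key1, value1, key2, value2, ..."""
    if len(names) % 2 != 0:
        raise ValueError("expected an even number of columns (key/value pairs)")
    keys = names[0::2]    # fields at even positions hold the keys
    values = names[1::2]  # fields at odd positions hold the values
    return {row[k]: row[v] for k, v in zip(keys, values)}


# Hypothetical row used only for illustration.
row = {"k": "a", "v": 1, "x": 2.0, "y": 3.0}
print(make_array(row, "x", "y"))
print(make_map(row, "k", "v"))
```

The real functions operate on Column expressions rather than on materialised rows, so this only illustrates the shape of the result.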

zero323 · 2017-04-18T21:13:48Z

cc @felixcheung

SparkQA · 2017-04-18T21:50:40Z

Test build #75912 has finished for PR 17674 at commit 453a39d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-18T22:20:13Z

Test build #75917 has finished for PR 17674 at commit 6615b38.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-04-19T02:39:41Z

R/pkg/R/functions.R

"@param ... additional Column(s)." is what we have in other places

Should we adjust this for concat(_ws), least, greatest and countDistinct?

I'd say, yes please.

felixcheung · 2017-04-19T02:47:31Z

R/pkg/R/functions.R

this should be Non-aggregate functions as per Scala doc

Do you mean normal_funcs?

perhaps that's what it maps to in R, I haven't checked closely.
though I'd think it'd be better to be consistent with Scala so they could be more easily discoverable.

also I think we should change the @family name into full text instead of the short form some_funcs - that shows up in the generated doc. I didn't get around to making all those changes but it might make sense in the 2.3 release.

felixcheung · 2017-04-19T02:47:40Z

R/pkg/R/functions.R

ditto Non-aggregate functions

felixcheung · 2017-04-19T02:49:48Z

R/pkg/R/functions.R

null in JVM is mapped to NA in R - we haven't documented that consistently, but it would be good to start thinking about a better way to do that

I think it is clear from the context that we mean SQL NULL, and both lit(NA) and lit(NULL) create a SQL NULL literal. But this reminds me of something else:

    > lit(NaN)
    Column NULL
    > select(createDataFrame(data.frame(x=c(1))), lit(NaN))
    SparkDataFrame[NULL:null]

doesn't look right. PySpark handles this correctly

    >>> lit(float("Nan"))
    Column<b'NaN'>

with DoubleType.

I wouldn't be surprised if we have some issues with NaN...
but does it work if you add it to an existing dataframe instead of going via createDataFrame? there's some additional type inference going on in the 2nd route.

It doesn't work with createDataFrame either.

For lit it should be a quick fix because we can call Java lit with Float.NaN. createDataFrame won't be that simple.

actually but does it work if you add it to an existing dataframe instead of going via createDataFrame? there's some additional type inference going on in the 2nd route.
I mean like

    a <- as.DataFrame(cars)
    a$foo <- lit(NaN)

No, it doesn't.

ok, let's open a JIRA on that separately..

My thoughts exactly.
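
The expectation in this thread, that a NaN literal should stay a double-typed NaN while NA/NULL becomes SQL NULL, can be sketched in plain Python. This is only an illustration of the distinction, not how SparkR or PySpark implement `lit`; `infer_literal_type` is a hypothetical helper and the type names are plain strings.

```python
import math


def infer_literal_type(value):
    """Hypothetical helper: name the SQL type a literal value should get."""
    if value is None:
        # A missing value corresponds to SQL NULL.
        return "null"
    if isinstance(value, float):
        # NaN is an ordinary floating-point value, so it stays a double.
        return "double"
    raise TypeError("unsupported literal in this sketch: %r" % (value,))


nan = float("Nan")  # float() accepts "nan" in any letter case
print(math.isnan(nan))          # NaN is a float value, not a missing one
print(nan is None)              # and it is distinct from None
print(infer_literal_type(nan))
print(infer_literal_type(None))
```

The behaviour reported above for SparkR corresponds to taking the first branch for NaN as well, which is why the resulting column shows up as NULL.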

felixcheung · 2017-04-19T02:50:02Z

R/pkg/R/functions.R

@param ... additional Column(s).

felixcheung · 2017-04-19T02:51:12Z

R/pkg/R/generics.R

this is also under ###################### Expression Function Methods ########################## - that might not be the right place

It covers all o.a.s.sql.functions right now. I am not sure these two are different enough to be an exception (and what about struct which belongs to the same category).

actually you are right - I saw ###################### Column Methods ########################## and thought that's the place but you are right, we already have them in both places.

I'm fine with what you have

SparkQA · 2017-04-19T11:40:36Z

Test build #75939 has finished for PR 17674 at commit d2b9723.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung

LGTM

felixcheung · 2017-04-20T04:20:19Z

merged to master. thanks! one step closer to parity

## What changes were proposed in this pull request?

Adds wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map`

## How was this patch tested?

Unit tests, `check-cran.sh`

Author: zero323 <[email protected]>

Closes apache#17674 from zero323/SPARK-20375.

zero323 force-pushed the SPARK-20375 branch from 453a39d to 6615b38 Compare April 18, 2017 21:46

felixcheung requested changes Apr 19, 2017


Add wrappers for array and map functions

d2b9723

zero323 force-pushed the SPARK-20375 branch from 6615b38 to d2b9723 Compare April 19, 2017 11:04

zero323 mentioned this pull request Apr 19, 2017

[SPARK-20371][R] Add wrappers for collect_list and collect_set #17672

Closed

felixcheung approved these changes Apr 19, 2017


asfgit closed this in 46c5749 Apr 20, 2017

zero323 deleted the SPARK-20375 branch April 20, 2017 20:44


3 participants