-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-20375][R] R wrappers for array and map #17674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @felixcheung |
|
Test build #75912 has finished for PR 17674 at commit
|
|
Test build #75917 has finished for PR 17674 at commit
|
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@param ... additional Column(s). is what we have other places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we adjust this for concat(_ws), least, greatest and countDistinct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say, yes please.
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be Non-aggregate functions as per Scala doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean normal_funcs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps that what it maps to in R, I haven't checked closely.
though I'd think it'd be better to be consistent with Scala so they could be more easily discoverable.
also I think we should change the @family name into full text instead of the short form some_funcs - that shows up in the generated doc. I didn't get around making all those changes but might make sense in the 2.3 release.
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto Non-aggregate functions
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null in JVM is mapped to NA in R - we haven't documented that consistently, but would be good to start thinking about the better way to do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is clear from the context that we mean SQL NULL and both lit(NA) and lit(NULL) create SQL NULL literal. But this reminds me of something else:
> lit(NaN)
Column NULL
> select(createDataFrame(data.frame(x=c(1))), lit(NaN))
SparkDataFrame[NULL:null]
doesn't look right. PySpark handles this correctly
>>> lit(float("Nan"))
Column<b'NaN'>with DoubleType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't be surprised that we have some issues with NaN...
but does it work if you add it to an existing dataframe instead of going via createDataFrame? there's some additional type inference going on in the 2nd route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't work with createDataFrame either.
For lit it should be a quick fix because we can call Java lit with Float.NaN. createDataFrame won't be that simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually but does it work if you add it to an existing dataframe instead of going via createDataFrame? there's some additional type inference going on in the 2nd route.
I mean like
a <- as.DataFrame(cars)
a$foo <- lit(NaN)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let's open a JIRA on that separately..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts exactly.
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@param ... additional Column(s).
R/pkg/R/generics.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also ###################### Expression Function Methods ########################## might not be the right place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It covers all o.a.s.sql.functions right now. I am not sure these two are different enough to be an exception (and what about struct which belongs to the same category).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually you are right - I saw ###################### Column Methods ########################## and thought that's the place but you are right, we already have them in both places.
I'm fine with what you have
|
Test build #75939 has finished for PR 17674 at commit
|
felixcheung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
merged to master. thanks! one step closer to parity |
## What changes were proposed in this pull request? Adds wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map` ## How was this patch tested? Unit tests, `check-cran.sh` Author: zero323 <[email protected]> Closes apache#17674 from zero323/SPARK-20375.
What changes were proposed in this pull request?
Adds wrappers for
o.a.s.sql.functions.arrayando.a.s.sql.functions.mapHow was this patch tested?
Unit tests,
check-cran.sh