Member

zero323 commented Apr 18, 2017

What changes were proposed in this pull request?

Adds SparkR wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map`.

How was this patch tested?

Unit tests, `check-cran.sh`

Member Author

zero323 commented Apr 18, 2017

cc @felixcheung


SparkQA commented Apr 18, 2017

Test build #75912 has finished for PR 17674 at commit 453a39d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 18, 2017

Test build #75917 has finished for PR 17674 at commit 6615b38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

`@param ... additional Column(s).` is what we use in other places.

Member Author

Should we adjust this for `concat(_ws)`, `least`, `greatest` and `countDistinct` as well?

Member

I'd say yes, please.

Member

This should be "Non-aggregate functions", as in the Scala docs.

Member Author

Do you mean `normal_funcs`?

Member

Perhaps that's what it maps to in R; I haven't checked closely. Still, I think it would be better to be consistent with Scala so the functions are easier to discover.

Also, I think we should change the `@family` names to full text instead of the short form (`some_funcs`), since that is what shows up in the generated docs. I didn't get around to making all those changes, but it might make sense to do so in the 2.3 release.

Member

Ditto: "Non-aggregate functions".

Member

`null` in the JVM is mapped to `NA` in R. We haven't documented that consistently, but it would be good to start thinking about a better way to do so.

Member Author

I think it is clear from the context that we mean SQL `NULL`, and both `lit(NA)` and `lit(NULL)` create a SQL `NULL` literal. But this reminds me of something else:

> lit(NaN)
Column NULL 

> select(createDataFrame(data.frame(x=c(1))), lit(NaN))
SparkDataFrame[NULL:null]

This doesn't look right. PySpark handles this correctly:

>>> lit(float("Nan"))
Column<b'NaN'>

with `DoubleType`.
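To make the distinction in this report concrete, here is a minimal, hypothetical sketch of literal type inference that treats a missing value (Python `None`, standing in for R's `NA`/`NULL`) as a SQL `NULL` literal while keeping `NaN` as an ordinary `double`, which matches the PySpark behavior shown above. The function `infer_literal` and the type-name strings are illustrative assumptions, not part of the SparkR or PySpark API.

```python
import math
from typing import Any, Optional, Tuple


def infer_literal(value: Any) -> Tuple[str, Optional[Any]]:
    """Hypothetical helper: return (sql_type_name, literal_value) for a scalar.

    A missing value becomes a NULL literal, while NaN stays a double,
    because NaN is a valid IEEE 754 floating-point value, not a missing one.
    """
    if value is None:
        # Missing value: an untyped SQL NULL literal.
        return ("null", None)
    if isinstance(value, bool):
        # Check bool before int, since bool is a subclass of int in Python.
        return ("boolean", value)
    if isinstance(value, int):
        return ("bigint", value)
    if isinstance(value, float):
        # NaN (and +/-inf) are kept as doubles rather than mapped to NULL.
        return ("double", value)
    if isinstance(value, str):
        return ("string", value)
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


if __name__ == "__main__":
    sql_type, v = infer_literal(float("nan"))
    print(sql_type, math.isnan(v))  # double True
    print(infer_literal(None))      # ('null', None)
```

The key design point is that the `None` check and the `float` check are separate branches, so `NaN` never falls through to the `NULL` case.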

Member

I wouldn't be surprised if we have some issues with `NaN`. But does it work if you add it to an existing DataFrame instead of going through `createDataFrame`? There's some additional type inference going on in the second route.

Member Author

It doesn't work with `createDataFrame` either.

For `lit` it should be a quick fix, because we can call the Java `lit` with `Float.NaN`. `createDataFrame` won't be that simple.

Member

Actually, does it work if you add it to an existing DataFrame instead of going through `createDataFrame`? There's some additional type inference going on in the second route.
I mean something like:

a <- as.DataFrame(cars)
a$foo <- lit(NaN)

Member Author

No, it doesn't.

Member

OK, let's open a separate JIRA for that.

Member Author

My thoughts exactly.

Member

`@param ... additional Column(s).`

Member

This is also under `###################### Expression Function Methods ##########################`, which might not be the right place.

Member Author

That section covers all of `o.a.s.sql.functions` right now. I am not sure these two are different enough to be an exception (and what about `struct`, which belongs to the same category?).

Member

Actually, you are right. I saw `###################### Column Methods ##########################` and thought that was the place, but we already have them in both places.

I'm fine with what you have.


SparkQA commented Apr 19, 2017

Test build #75939 has finished for PR 17674 at commit d2b9723.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

felixcheung left a comment

LGTM

Member

Merged to master. Thanks! One step closer to parity.

asfgit closed this in 46c5749 Apr 20, 2017
zero323 deleted the SPARK-20375 branch April 20, 2017 20:44
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
## What changes were proposed in this pull request?

Adds wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map`

## How was this patch tested?

Unit tests, `check-cran.sh`

Author: zero323

Closes apache#17674 from zero323/SPARK-20375.