[SPARK-9319][SPARKR] Add support for setting column names, types #9218
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -293,13 +293,22 @@ setMethod("colnames<-", | |
| dataFrame(sdf) | ||
| }) | ||
|
|
||
| rToScalaTypes <- new.env() | ||
| rToScalaTypes[["integer"]] <- "integer" # in R, integer is 32bit | ||
| rToScalaTypes[["numeric"]] <- "double" # in R, numeric == double which is 64bit | ||
| rToScalaTypes[["double"]] <- "double" | ||
| rToScalaTypes[["character"]] <- "string" | ||
| rToScalaTypes[["logical"]] <- "boolean" | ||
|
|
||
| #' coltypes | ||
|
Contributor
coltypes<- ?
Member
Author
R doc is usually under the getter name instead of the setter (
Contributor
Looking at "names" and "names<-" in DataFrame.R, it seems "coltypes" and "coltypes<-" should share the same function description. We should coordinate this PR with the PR for "coltypes".
Member
Author
That's the plan too. Could we merge #8984 now? :) |
||
| #' | ||
| #' Set the column types of a DataFrame. | ||
| #' | ||
| #' @name coltypes | ||
| #' @param x (DataFrame) | ||
| #' @return value (character) A character vector with the target column types for the given DataFrame | ||
| #' @return value (character) A character vector with the target column types for the given | ||
| #' DataFrame. Column types can be one of integer, numeric/double, character, logical, or NA | ||
| #' to keep that column as-is. | ||
| #' @rdname coltypes | ||
| #' @aliases coltypes | ||
| #' @export | ||
|
|
@@ -309,7 +318,8 @@ setMethod("colnames<-", | |
| #' sqlContext <- sparkRSQL.init(sc) | ||
| #' path <- "path/to/file.json" | ||
| #' df <- jsonFile(sqlContext, path) | ||
| #' coltypes(df) <- c("string", "integer") | ||
| #' coltypes(df) <- c("character", "integer") | ||
| #' coltypes(df) <- c(NA, "numeric") | ||
| #'} | ||
| setMethod("coltypes<-", | ||
|
Contributor
Do you think users may expect
Member
Author
Certainly, it is in PR 8984 by @olarayej
Contributor
So this is a little tricky. In #8984 we are converting the SparkSQL types to R types. For consistency, we should therefore take R types here (i.e. character, numeric, etc.) and convert them to SparkSQL types.
Member
Author
That's correct. I'm hoping #8984 can be merged soon so I can add a new reverse mapping in the same place. I could mark this [WIP] if you'd like. |
||
| signature(x = "DataFrame", value = "character"), | ||
|
|
@@ -321,7 +331,15 @@ setMethod("coltypes<-", | |
| } | ||
| newCols <- lapply(seq_len(ncols), function(i) { | ||
| col <- getColumn(x, cols[i]) | ||
| cast(col, value[i]) | ||
| if (!is.na(value[i])) { | ||
| stype <- rToScalaTypes[[value[i]]] | ||
| if (is.null(stype)) { | ||
| stop("Only atomic type is supported for column types") | ||
| } | ||
| cast(col, stype) | ||
| } else { | ||
| col | ||
| } | ||
| }) | ||
| nx <- select(x, newCols) | ||
| dataFrame(nx@sdf) | ||
|
|
||
We can keep the R types -> SQLTypes mapping here, but it would be better to refactor the mapping between R types and SQLTypes later (to combine it with the SQLTypes -> R types mapping in "colTypes()").
yes, that's the plan.
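
The type lookup and NA handling discussed in this thread can be sketched in plain R, without a Spark session. This is a minimal illustration, not the code merged in the PR: `resolveColTypes` is a hypothetical helper name introduced here, and the table simply mirrors the `rToScalaTypes` entries shown in the diff.

```r
# Mirror of the rToScalaTypes table from the diff (R type -> Spark SQL type name).
rToScalaTypes <- new.env()
rToScalaTypes[["integer"]] <- "integer"
rToScalaTypes[["numeric"]] <- "double"
rToScalaTypes[["double"]] <- "double"
rToScalaTypes[["character"]] <- "string"
rToScalaTypes[["logical"]] <- "boolean"

# Hypothetical helper (not part of SparkR): translate requested R types
# into cast targets. NA entries stay NA, meaning "keep this column as-is".
resolveColTypes <- function(value) {
  vapply(value, function(v) {
    if (is.na(v)) {
      return(NA_character_)
    }
    stype <- rToScalaTypes[[v]]
    if (is.null(stype)) {
      # Same error message as in the diff for unsupported types.
      stop("Only atomic type is supported for column types")
    }
    stype
  }, character(1), USE.NAMES = FALSE)
}

resolveColTypes(c("character", "integer"))
resolveColTypes(c(NA, "numeric"))
```

In the PR's setter, each non-NA result would be passed to `cast(col, stype)`, while an NA leaves the column unchanged.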