Commit 1a93323

felixcheung authored and shivaram committed
[SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR
Added tests for functions that are reported as masked, to make sure the base:: or stats:: function can still be called. Those that can't be called are added to the SparkR programming guide.

It seems to me that `table`, `sample`, `subset`, `filter`, and `cov` not working is not actually expected. I investigated and experimented with them but couldn't get them to work. It looks like, because they are defined in base or stats, they are missing the S3 generic, e.g.

```
> methods("transform")
[1] transform,ANY-method       transform.data.frame
[3] transform,DataFrame-method transform.default
see '?methods' for accessing help and source code
> methods("subset")
[1] subset.data.frame       subset,DataFrame-method subset.default
[4] subset.matrix
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, parent.frame()) :
  function 'subset' appears not to be S3 generic; found functions that look like S3 methods
```

Any idea?

More information on masking:
http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm
http://www.sfu.ca/~sweldon/howTo/guide4.pdf

This is what the output doc looks like (minus css):
![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png)

Author: felixcheung <[email protected]>

Closes #9785 from felixcheung/rmasked.
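For readers unfamiliar with masking, here is a minimal sketch in plain R that does not involve SparkR at all. The global `filter` defined below is a hypothetical stand-in for a package function with the same name as `stats::filter`; it is an assumption for illustration, not code from this commit.

```r
# Hypothetical stand-in for a package function that masks stats::filter.
filter <- function(x, ...) "masked version"

# An unqualified call now finds the stand-in first.
masked_result <- filter(1:100, rep(1, 3))

# The namespace-qualified call still reaches the original stats function.
original_result <- stats::filter(1:100, rep(1, 3))

# conflicts() lists names that are defined in more than one place
# on the search path.
masked_names <- conflicts()

# Remove the stand-in so later unqualified calls resolve to stats::filter again.
rm(filter)
```

The qualified form `stats::filter(...)` is the same workaround the new section of the programming guide recommends for the functions that cannot be reached by their bare name.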
1 parent d02d5b9 commit 1a93323

6 files changed: +77 -6 lines changed

R/pkg/R/DataFrame.R

Lines changed: 1 addition & 1 deletion
@@ -2152,7 +2152,7 @@ setMethod("with",
           })
 
 #' Returns the column types of a DataFrame.
-#'
+#'
 #' @name coltypes
 #' @title Get column types of a DataFrame
 #' @family dataframe_funcs

R/pkg/R/functions.R

Lines changed: 1 addition & 1 deletion
@@ -2204,7 +2204,7 @@ setMethod("denseRank",
 #' @export
 #' @examples \dontrun{lag(df$c)}
 setMethod("lag",
-          signature(x = "characterOrColumn", offset = "numeric", defaultValue = "ANY"),
+          signature(x = "characterOrColumn"),
           function(x, offset, defaultValue = NULL) {
             col <- if (class(x) == "Column") {
               x@jc

R/pkg/R/generics.R

Lines changed: 2 additions & 2 deletions
@@ -539,7 +539,7 @@ setGeneric("showDF", function(x,...) { standardGeneric("showDF") })
 
 # @rdname subset
 # @export
-setGeneric("subset", function(x, subset, select, ...) { standardGeneric("subset") })
+setGeneric("subset", function(x, ...) { standardGeneric("subset") })
 
 #' @rdname agg
 #' @export
@@ -790,7 +790,7 @@ setGeneric("kurtosis", function(x) { standardGeneric("kurtosis") })
 
 #' @rdname lag
 #' @export
-setGeneric("lag", function(x, offset, defaultValue = NULL) { standardGeneric("lag") })
+setGeneric("lag", function(x, ...) { standardGeneric("lag") })
 
 #' @rdname last
 #' @export
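The signature changes above can be illustrated with a small standalone sketch. It assumes that an S4 generic whose formals match the existing function (`stats::lag(x, ...)`) can keep that function as its default method. `DemoColumn` is a hypothetical class standing in for SparkR's `Column`; it does not appear in the commit.

```r
library(methods)

# Hypothetical S4 class standing in for a SparkR Column.
setClass("DemoColumn", representation(name = "character"))

# Same formals as stats::lag(x, ...), so the existing function can
# serve as the default method of the new generic.
setGeneric("lag", function(x, ...) { standardGeneric("lag") })

# Dispatch only on x; the method declares its own extra arguments,
# which are matched against the generic's `...`.
setMethod("lag", signature(x = "DemoColumn"),
          function(x, offset = 1, defaultValue = NULL) {
            paste0("lag(", x@name, ", ", offset, ")")
          })

# The S4 method handles the demo class ...
col_result <- lag(new("DemoColumn", name = "c"), 2)

# ... while a time series falls through to the stats implementation.
ts_result <- lag(datasets::ldeaths, 12)
```

This mirrors the test added to `test_sparkSQL.R`, which checks that `lag(ldeaths, 12)` still returns a series of length 72 once the generic is defined.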

R/pkg/inst/tests/test_mllib.R

Lines changed: 5 additions & 0 deletions
@@ -31,6 +31,11 @@ test_that("glm and predict", {
   model <- glm(Sepal_Width ~ Sepal_Length, training, family = "gaussian")
   prediction <- predict(model, test)
   expect_equal(typeof(take(select(prediction, "prediction"), 1)$prediction), "double")
+
+  # Test stats::predict is working
+  x <- rnorm(15)
+  y <- x + rnorm(15)
+  expect_equal(length(predict(lm(y ~ x))), 15)
 })
 
 test_that("glm should work with long formula", {

R/pkg/inst/tests/test_sparkSQL.R

Lines changed: 32 additions & 1 deletion
@@ -433,6 +433,10 @@ test_that("table() returns a new DataFrame", {
   expect_is(tabledf, "DataFrame")
   expect_equal(count(tabledf), 3)
   dropTempTable(sqlContext, "table1")
+
+  # Test base::table is working
+  #a <- letters[1:3]
+  #expect_equal(class(table(a, sample(a))), "table")
 })
 
 test_that("toRDD() returns an RRDD", {
@@ -673,6 +677,9 @@ test_that("sample on a DataFrame", {
   # Also test sample_frac
   sampled3 <- sample_frac(df, FALSE, 0.1, 0) # set seed for predictable result
   expect_true(count(sampled3) < 3)
+
+  # Test base::sample is working
+  #expect_equal(length(sample(1:12)), 12)
 })
 
 test_that("select operators", {
@@ -753,6 +760,9 @@ test_that("subsetting", {
   df6 <- subset(df, df$age %in% c(30), c(1,2))
   expect_equal(count(df6), 1)
   expect_equal(columns(df6), c("name", "age"))
+
+  # Test base::subset is working
+  expect_equal(nrow(subset(airquality, Temp > 80, select = c(Ozone, Temp))), 68)
 })
 
 test_that("selectExpr() on a DataFrame", {
@@ -888,6 +898,9 @@ test_that("column functions", {
   expect_equal(result, list(list(3L, 2L, 1L), list(6L, 5L, 4L)))
   result <- collect(select(df, sort_array(df[[1]])))[[1]]
   expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L)))
+
+  # Test that stats::lag is working
+  expect_equal(length(lag(ldeaths, 12)), 72)
 })
 #
 test_that("column binary mathfunctions", {
@@ -1086,7 +1099,7 @@ test_that("group by, agg functions", {
   gd3_local <- collect(agg(gd3, var(df8$age)))
   expect_equal(162, gd3_local[gd3_local$name == "Justin",][1, 2])
 
-  # make sure base:: or stats::sd, var are working
+  # Test stats::sd, stats::var are working
   expect_true(abs(sd(1:2) - 0.7071068) < 1e-6)
   expect_true(abs(var(1:5, 1:5) - 2.5) < 1e-6)
 
@@ -1138,6 +1151,9 @@ test_that("filter() on a DataFrame", {
   expect_equal(count(filtered5), 1)
   filtered6 <- where(df, df$age %in% c(19, 30))
   expect_equal(count(filtered6), 2)
+
+  # Test stats::filter is working
+  #expect_true(is.ts(filter(1:100, rep(1, 3))))
 })
 
 test_that("join() and merge() on a DataFrame", {
@@ -1284,6 +1300,12 @@ test_that("unionAll(), rbind(), except(), and intersect() on a DataFrame", {
   expect_is(unioned, "DataFrame")
   expect_equal(count(intersected), 1)
   expect_equal(first(intersected)$name, "Andy")
+
+  # Test base::rbind is working
+  expect_equal(length(rbind(1:4, c = 2, a = 10, 10, deparse.level = 0)), 16)
+
+  # Test base::intersect is working
+  expect_equal(length(intersect(1:20, 3:23)), 18)
 })
 
 test_that("withColumn() and withColumnRenamed()", {
@@ -1365,6 +1387,9 @@ test_that("describe() and summarize() on a DataFrame", {
   stats2 <- summary(df)
   expect_equal(collect(stats2)[4, "name"], "Andy")
   expect_equal(collect(stats2)[5, "age"], "30")
+
+  # Test base::summary is working
+  expect_equal(length(summary(attenu, digits = 4)), 35)
 })
 
 test_that("dropna() and na.omit() on a DataFrame", {
@@ -1448,6 +1473,9 @@ test_that("dropna() and na.omit() on a DataFrame", {
   expect_identical(expected, actual)
   actual <- collect(na.omit(df, minNonNulls = 3, cols = c("name", "age", "height")))
   expect_identical(expected, actual)
+
+  # Test stats::na.omit is working
+  expect_equal(nrow(na.omit(data.frame(x = c(0, 10, NA)))), 2)
 })
 
 test_that("fillna() on a DataFrame", {
@@ -1510,6 +1538,9 @@ test_that("cov() and corr() on a DataFrame", {
   expect_true(abs(result - 1.0) < 1e-12)
   result <- corr(df, "singles", "doubles", "pearson")
   expect_true(abs(result - 1.0) < 1e-12)
+
+  # Test stats::cov is working
+  #expect_true(abs(max(cov(swiss)) - 1739.295) < 1e-3)
 })
 
 test_that("freqItems() on a DataFrame", {

docs/sparkr.md

Lines changed: 36 additions & 1 deletion
@@ -286,7 +286,7 @@ head(teenagers)
 
 # Machine Learning
 
-SparkR allows the fitting of generalized linear models over DataFrames using the [glm()](api/R/glm.html) function. Under the hood, SparkR uses MLlib to train a model of the specified family. Currently the gaussian and binomial families are supported. We support a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'.
+SparkR allows the fitting of generalized linear models over DataFrames using the [glm()](api/R/glm.html) function. Under the hood, SparkR uses MLlib to train a model of the specified family. Currently the gaussian and binomial families are supported. We support a subset of the available R formula operators for model fitting, including '~', '.', ':', '+', and '-'.
 
 The [summary()](api/R/summary.html) function gives the summary of a model produced by [glm()](api/R/glm.html).
 
@@ -351,3 +351,38 @@ summary(model)
 ##Sepal_Width 0.404655
 {% endhighlight %}
 </div>
+
+# R Function Name Conflicts
+
+When loading and attaching a new package in R, it is possible to have a name [conflict](https://stat.ethz.ch/R-manual/R-devel/library/base/html/library.html), where a
+function is masking another function.
+
+The following functions are masked by the SparkR package:
+
+<table class="table">
+  <tr><th>Masked function</th><th>How to Access</th></tr>
+  <tr>
+    <td><code>cov</code> in <code>package:stats</code></td>
+    <td><code><pre>stats::cov(x, y = NULL, use = "everything",
+           method = c("pearson", "kendall", "spearman"))</pre></code></td>
+  </tr>
+  <tr>
+    <td><code>filter</code> in <code>package:stats</code></td>
+    <td><code><pre>stats::filter(x, filter, method = c("convolution", "recursive"),
+              sides = 2, circular = FALSE, init)</pre></code></td>
+  </tr>
+  <tr>
+    <td><code>sample</code> in <code>package:base</code></td>
+    <td><code>base::sample(x, size, replace = FALSE, prob = NULL)</code></td>
+  </tr>
+  <tr>
+    <td><code>table</code> in <code>package:base</code></td>
+    <td><code><pre>base::table(...,
+            exclude = if (useNA == "no") c(NA, NaN),
+            useNA = c("no", "ifany", "always"),
+            dnn = list.names(...), deparse.level = 1)</pre></code></td>
+  </tr>
+</table>
+
+You can inspect the search path in R with [`search()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/search.html)
+
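As a rough illustration of the guide text added above, the following sketch inspects the search path and calls the four listed functions through their namespaces. It uses only base R and the bundled `datasets` package. The variable names are arbitrary, and the results depend on which packages are attached in the session.

```r
# Environments R searches, in order, when resolving an unqualified name.
path <- search()

# Search-path entries in which a function named `cov` is visible.
cov_locations <- utils::find("cov")

# Namespace-qualified calls bypass the search path, so they reach the
# base and stats versions even when another package masks the bare name.
cov_matrix <- stats::cov(datasets::swiss)
smoothed   <- stats::filter(1:100, rep(1, 3))
shuffled   <- base::sample(1:12)
counts     <- base::table(letters[1:3])
```

With SparkR attached, `utils::find("cov")` would be expected to list the SparkR package ahead of `package:stats`, which is what makes the qualified form necessary.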
