[SPARK-16508][SparkR] Fix warnings on undocumented/duplicated arguments by CRAN-check #14558
```diff
@@ -77,3 +77,8 @@ spark-warehouse/
 # For R session data
 .RData
 .RHistory
+.Rhistory
+*.Rproj
+*.Rproj.*
+.Rproj.user
```
```diff
@@ -120,7 +120,6 @@ setMethod("schema",
 #'
 #' Print the logical and physical Catalyst plans to the console for debugging.
 #'
-#' @param x A SparkDataFrame
 #' @param extended Logical. If extended is FALSE, explain() only prints the physical plan.
 #' @family SparkDataFrame functions
 #' @aliases explain,SparkDataFrame-method
```
```diff
@@ -177,11 +176,10 @@ setMethod("isLocal",
 #'
 #' Print the first numRows rows of a SparkDataFrame
 #'
-#' @param x A SparkDataFrame
-#' @param numRows The number of rows to print. Defaults to 20.
-#' @param truncate Whether truncate long strings. If true, strings more than 20 characters will be
-#'                 truncated and all cells will be aligned right
-#'
+#' @param numRows the number of rows to print. Defaults to 20.
+#' @param truncate whether truncate long strings. If true, strings more than 20 characters will be
+#'                 truncated. However, if set greater than zero, truncates strings longer than `truncate`
+#'                 characters and all cells will be aligned right.
 #' @family SparkDataFrame functions
 #' @aliases showDF,SparkDataFrame-method
 #' @rdname showDF
```
```diff
@@ -206,7 +204,7 @@ setMethod("showDF",
 #'
 #' Print the SparkDataFrame column names and types
 #'
-#' @param x A SparkDataFrame
+#' @param object a SparkDataFrame.
 #'
 #' @family SparkDataFrame functions
 #' @rdname show
```
```diff
@@ -318,6 +316,7 @@ setMethod("colnames",
     columns(x)
   })

+#' @param value a character vector. Must have the same length as the number of columns in the SparkDataFrame.
 #' @rdname columns
 #' @aliases colnames<-,SparkDataFrame-method
 #' @name colnames<-
```
```diff
@@ -406,7 +405,6 @@ setMethod("coltypes",
 #'
 #' Set the column types of a SparkDataFrame.
 #'
-#' @param x A SparkDataFrame
 #' @param value A character vector with the target column types for the given
 #'              SparkDataFrame. Column types can be one of integer, numeric/double, character, logical, or NA
 #'              to keep that column as-is.
```
```diff
@@ -510,9 +508,9 @@ setMethod("registerTempTable",
 #'
 #' Insert the contents of a SparkDataFrame into a table registered in the current SparkSession.
 #'
-#' @param x A SparkDataFrame
-#' @param tableName A character vector containing the name of the table
-#' @param overwrite A logical argument indicating whether or not to overwrite
+#' @param x a SparkDataFrame.
+#' @param tableName a character vector containing the name of the table.
+#' @param overwrite a logical argument indicating whether or not to overwrite.
 #'   the existing rows in the table.
```
Member: why is tableName moved?

Contributor (Author): The reasons are
Does that make sense?

Member: I see. In many cases we have setGeneric with the fewer number of parameters, whether it is because it has to match an existing generic or because some parameters are not put in the signature with a type. I see your point: if the parameter is in the generic, then perhaps we should @param document it there as well. I think we should try to keep it near the function definition/body though, with the exception we have talked about, where the generic applies to multiple function definitions with different classes and that first parameter could be in different classes, so it needs a central place. What do you think?

Contributor (Author): Yes, I agree that ideally the doc should be kept near the function definition/body, and that's consistent with many other functions. So then the only issue is

Member: I agree, that is an unfortunate side-effect. I wonder if there is a way to order this in the Rd generated by roxygen; maybe this could be a reasonable PR to change roxygen for special casing. For now I think we should prioritize maintainability and keep the doc close to the function as much as possible.
```diff
 #'
 #' @family SparkDataFrame functions
```
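The trade-off discussed in the thread above can be sketched in roxygen terms. This is an illustrative sketch only (the method body is simplified and the placement is hypothetical), not the code of this PR:

```r
# Option 1: document a shared parameter once, on the generic.
# Every method sharing the @rdname inherits the @param from here,
# which gives multi-class functions a central place for their docs.

#' @param x a SparkDataFrame.
#' @rdname insertInto
setGeneric("insertInto", function(x, tableName, ...) {
  standardGeneric("insertInto")
})

# Option 2: document parameters next to the method body. Easier to
# maintain, but roxygen may then order the generic's parameters after
# the method's parameters in the generated Rd file.

#' @param tableName a character vector containing the name of the table.
#' @param overwrite a logical argument indicating whether or not to overwrite
#'   the existing rows in the table.
#' @rdname insertInto
setMethod("insertInto",
          signature(x = "SparkDataFrame", tableName = "character"),
          function(x, tableName, overwrite = FALSE) {
            # Simplified body; the real implementation calls into the JVM.
            invisible(NULL)
          })
```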
```diff
@@ -571,7 +569,9 @@ setMethod("cache",
 #' supported storage levels, refer to
 #' \url{http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence}.
 #'
-#' @param x The SparkDataFrame to persist
+#' @param x the SparkDataFrame to persist.
+#' @param newLevel storage level chosen for the persistance. See available options in
+#'   the description.
 #'
 #' @family SparkDataFrame functions
 #' @rdname persist
```
```diff
@@ -634,9 +634,10 @@ setMethod("unpersist",
 #' \item{3.} {Return a new SparkDataFrame partitioned by the given column(s),
 #'            using `spark.sql.shuffle.partitions` as number of partitions.}
 #'}
-#' @param x A SparkDataFrame
-#' @param numPartitions The number of partitions to use.
-#' @param col The column by which the partitioning will be performed.
+#' @param x a SparkDataFrame.
+#' @param numPartitions the number of partitions to use.
+#' @param col the column by which the partitioning will be performed.
+#' @param ... additional column(s) to be used in the partitioning.
 #'
 #' @family SparkDataFrame functions
 #' @rdname repartition
```
```diff
@@ -915,8 +916,6 @@ setMethod("sample_frac",

 #' Returns the number of rows in a SparkDataFrame
 #'
-#' @param x A SparkDataFrame
-#'
 #' @family SparkDataFrame functions
 #' @rdname nrow
 #' @name count
```
```diff
@@ -1092,8 +1091,10 @@ setMethod("limit",
     dataFrame(res)
   })

-#' Take the first NUM rows of a SparkDataFrame and return a the results as a R data.frame
+#' Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame
+#'
+#' @param x a SparkDataFrame.
+#' @param num number of rows to take.
 #' @family SparkDataFrame functions
 #' @rdname take
 #' @name take
```
```diff
@@ -1120,9 +1121,9 @@ setMethod("take",
 #' then head() returns the first 6 rows in keeping with the current data.frame
 #' convention in R.
 #'
-#' @param x A SparkDataFrame
-#' @param num The number of rows to return. Default is 6.
-#' @return A data.frame
+#' @param x a SparkDataFrame.
+#' @param num the number of rows to return. Default is 6.
+#' @return A data.frame.
 #'
 #' @family SparkDataFrame functions
 #' @aliases head,SparkDataFrame-method
```
```diff
@@ -1146,7 +1147,7 @@ setMethod("head",

 #' Return the first row of a SparkDataFrame
 #'
```
Member: I think, similar to what you have for other functions, this could go to generics.R. Do you have any other idea how to document functions that work with multiple classes?

Contributor (Author): I don't have a good answer yet. When I tried to move the param there, it seems that the generic functions for RDD Actions and Transformations are not exposed. Do you know the specific reason for that, by chance?

Member: Right, they are not: RDD functions are not exported from the package (not public) and we don't want Rd files generated for them. Please see PR #14626; we want a separate setGeneric for non-RDD functions, and then this line documenting both the DataFrame and Column parameter can go to generics.R.
```diff
-#' @param x A SparkDataFrame
+#' @param x a SparkDataFrame or a column used in aggregation function.
 #'
 #' @family SparkDataFrame functions
 #' @aliases first,SparkDataFrame-method
```
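Following the reviewer's suggestion, once non-RDD generics get their own `setGeneric` (PR #14626), the shared first argument could be documented once on the generic and inherited by every method on the same Rd page. A hypothetical sketch (simplified bodies, assumed file placement), not the actual Spark source:

```r
# generics.R: the @param covers both method signatures via the shared @rdname.

#' @param x a SparkDataFrame or a column used in aggregation function.
#' @rdname first
setGeneric("first", function(x, ...) { standardGeneric("first") })

# Method for SparkDataFrame: no duplicated @param x needed here.

#' @rdname first
#' @aliases first,SparkDataFrame-method
setMethod("first", signature(x = "SparkDataFrame"), function(x) {
  take(x, 1)
})

# Method for Column: shares the same Rd page and the same @param.

#' @rdname first
#' @aliases first,Column-method
setMethod("first", signature(x = "Column"), function(x) {
  # Simplified body; the real implementation calls the JVM `first` function.
  x
})
```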
```diff
@@ -1240,7 +1241,6 @@ setMethod("group_by",
 #'
 #' Compute aggregates by specifying a list of columns
 #'
-#' @param x a SparkDataFrame
 #' @family SparkDataFrame functions
 #' @aliases agg,SparkDataFrame-method
 #' @rdname summarize
```
```diff
@@ -1387,16 +1387,15 @@ setMethod("dapplyCollect",
 #' Groups the SparkDataFrame using the specified columns and applies the R function to each
 #' group.
 #'
-#' @param x A SparkDataFrame
-#' @param cols Grouping columns
-#' @param func A function to be applied to each group partition specified by grouping
+#' @param cols grouping columns.
+#' @param func a function to be applied to each group partition specified by grouping
 #'             column of the SparkDataFrame. The function `func` takes as argument
 #'             a key - grouping columns and a data frame - a local R data.frame.
 #'             The output of `func` is a local R data.frame.
-#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
+#' @param schema the schema of the resulting SparkDataFrame after the function is applied.
 #'               The schema must match to output of `func`. It has to be defined for each
 #'               output column with preferred output column name and corresponding data type.
-#' @return a SparkDataFrame
+#' @return A SparkDataFrame.
 #' @family SparkDataFrame functions
 #' @aliases gapply,SparkDataFrame-method
 #' @rdname gapply
```
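As a usage sketch of the `gapply` parameters documented above (assumes a running SparkR session; not part of the diff):

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)

# The schema describes `func`'s output: the grouping key plus one aggregate.
schema <- structType(structField("cyl", "double"),
                     structField("max_mpg", "double"))

# `func` receives the key (grouping columns) and a local R data.frame for
# that group, and must return a local R data.frame matching `schema`.
result <- gapply(df, "cyl",
                 function(key, x) {
                   data.frame(key, max(x$mpg))
                 },
                 schema)

head(collect(result))
```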
```diff
@@ -1479,13 +1478,12 @@ setMethod("gapply",
 #' Groups the SparkDataFrame using the specified columns, applies the R function to each
 #' group and collects the result back to R as data.frame.
 #'
-#' @param x A SparkDataFrame
-#' @param cols Grouping columns
-#' @param func A function to be applied to each group partition specified by grouping
+#' @param cols grouping columns.
+#' @param func a function to be applied to each group partition specified by grouping
 #'             column of the SparkDataFrame. The function `func` takes as argument
 #'             a key - grouping columns and a data frame - a local R data.frame.
 #'             The output of `func` is a local R data.frame.
-#' @return a data.frame
+#' @return A data.frame.
 #' @family SparkDataFrame functions
 #' @aliases gapplyCollect,SparkDataFrame-method
 #' @rdname gapplyCollect
```
```diff
@@ -2461,8 +2459,8 @@ setMethod("unionAll",
 #' Union two or more SparkDataFrames. This is equivalent to `UNION ALL` in SQL.
 #' Note that this does not remove duplicate rows across the two SparkDataFrames.
 #'
-#' @param x A SparkDataFrame
-#' @param ... Additional SparkDataFrame
+#' @param x a SparkDataFrame.
+#' @param ... additional SparkDataFrame(s).
 #' @return A SparkDataFrame containing the result of the union.
 #' @family SparkDataFrame functions
 #' @aliases rbind,SparkDataFrame-method
```
```diff
@@ -2519,8 +2517,8 @@ setMethod("intersect",
 #' Return a new SparkDataFrame containing rows in this SparkDataFrame
 #' but not in another SparkDataFrame. This is equivalent to `EXCEPT` in SQL.
 #'
-#' @param x A SparkDataFrame
-#' @param y A SparkDataFrame
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
 #' @return A SparkDataFrame containing the result of the except operation.
 #' @family SparkDataFrame functions
 #' @aliases except,SparkDataFrame,SparkDataFrame-method
```
```diff
@@ -2561,10 +2559,11 @@ setMethod("except",
 #' and to not change the existing data.
 #' }
 #'
-#' @param df A SparkDataFrame
-#' @param path A name for the table
-#' @param source A name for external data source
-#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
+#' @param df a SparkDataFrame.
+#' @param path a name for the table.
+#' @param source a name for external data source.
+#' @param mode one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
+#' @param ... additional argument(s) passed to the method.
 #'
 #' @family SparkDataFrame functions
 #' @aliases write.df,SparkDataFrame,character-method
```
```diff
@@ -2623,10 +2622,11 @@ setMethod("saveDF",
 #' ignore: The save operation is expected to not save the contents of the SparkDataFrame
 #' and to not change the existing data. \cr
 #'
-#' @param df A SparkDataFrame
-#' @param tableName A name for the table
-#' @param source A name for external data source
-#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
+#' @param df a SparkDataFrame.
+#' @param tableName a name for the table.
+#' @param source a name for external data source.
+#' @param mode one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default).
+#' @param ... additional option(s) passed to the method.
 #'
 #' @family SparkDataFrame functions
 #' @aliases saveAsTable,SparkDataFrame,character-method
```
```diff
@@ -2662,10 +2662,10 @@ setMethod("saveAsTable",
 #' Computes statistics for numeric columns.
 #' If no columns are given, this function computes statistics for all numerical columns.
 #'
-#' @param x A SparkDataFrame to be computed.
-#' @param col A string of name
-#' @param ... Additional expressions
-#' @return A SparkDataFrame
+#' @param x a SparkDataFrame to be computed.
+#' @param col a string of name.
+#' @param ... additional expressions.
+#' @return A SparkDataFrame.
 #' @family SparkDataFrame functions
 #' @aliases describe,SparkDataFrame,character-method describe,SparkDataFrame,ANY-method
 #' @rdname summary
```
```diff
@@ -2700,6 +2700,7 @@ setMethod("describe",
     dataFrame(sdf)
   })

+#' @param object a SparkDataFrame to be summarized.
 #' @rdname summary
 #' @name summary
 #' @aliases summary,SparkDataFrame-method
```
```diff
@@ -2715,16 +2716,20 @@ setMethod("summary",
 #'
 #' dropna, na.omit - Returns a new SparkDataFrame omitting rows with null values.
 #'
-#' @param x A SparkDataFrame.
+#' @param x a SparkDataFrame.
 #' @param how "any" or "all".
 #'            if "any", drop a row if it contains any nulls.
 #'            if "all", drop a row only if all its values are null.
 #'            if minNonNulls is specified, how is ignored.
-#' @param minNonNulls If specified, drop rows that have less than
+#' @param minNonNulls if specified, drop rows that have less than
 #'                    minNonNulls non-null values.
 #'                    This overwrites the how parameter.
-#' @param cols Optional list of column names to consider.
-#' @return A SparkDataFrame
+#' @param cols optional list of column names to consider. In `fillna`,
+#'             columns specified in cols that do not have matching data
+#'             type are ignored. For example, if value is a character, and
+#'             subset contains a non-character column, then the non-character
+#'             column is simply ignored.
+#' @return A SparkDataFrame.
 #'
 #' @family SparkDataFrame functions
 #' @rdname nafunctions
```
```diff
@@ -2769,18 +2774,12 @@ setMethod("na.omit",

 #' fillna - Replace null values.
 #'
-#' @param x A SparkDataFrame.
-#' @param value Value to replace null values with.
+#' @param value value to replace null values with.
 #'              Should be an integer, numeric, character or named list.
 #'              If the value is a named list, then cols is ignored and
 #'              value must be a mapping from column name (character) to
 #'              replacement value. The replacement value must be an
 #'              integer, numeric or character.
-#' @param cols optional list of column names to consider.
-#'             Columns specified in cols that do not have matching data
-#'             type are ignored. For example, if value is a character, and
-#'             subset contains a non-character column, then the non-character
-#'             column is simply ignored.
 #'
 #' @rdname nafunctions
 #' @name fillna
```
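A short usage sketch for the `dropna`/`fillna` parameters above (assumes `df` is an existing SparkDataFrame with columns `age` and `name`; not part of the diff):

```r
# fillna with a single value: applied to columns of matching type only.
df1 <- fillna(df, 0)

# fillna with a named list: cols is ignored; each name maps a column to
# its replacement value (integer, numeric or character).
df2 <- fillna(df, list("age" = 0, "name" = "unknown"))

# dropna: drop rows containing any null, or require at least 2 non-null
# values per row (minNonNulls overrides `how`).
df3 <- dropna(df, how = "any")
df4 <- dropna(df, minNonNulls = 2)
```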
```diff
@@ -2845,8 +2844,11 @@ setMethod("fillna",
 #' Since data.frames are held in memory, ensure that you have enough memory
 #' in your system to accommodate the contents.
 #'
-#' @param x a SparkDataFrame
-#' @return a data.frame
+#' @param x a SparkDataFrame.
+#' @param row.names NULL or a character vector giving the row names for the data frame.
+#' @param optional If `TRUE`, converting column names is optional.
+#' @param ... additional arguments passed to the method.
+#' @return A data.frame.
 #' @family SparkDataFrame functions
 #' @aliases as.data.frame,SparkDataFrame-method
 #' @rdname as.data.frame
```
```diff
@@ -3000,9 +3002,8 @@ setMethod("str",
 #' Returns a new SparkDataFrame with columns dropped.
 #' This is a no-op if schema doesn't contain column name(s).
 #'
-#' @param x A SparkDataFrame.
-#' @param cols A character vector of column names or a Column.
-#' @return A SparkDataFrame
+#' @param col a character vector of column names or a Column.
+#' @return A SparkDataFrame.
 #'
 #' @family SparkDataFrame functions
 #' @rdname drop
```
```diff
@@ -3049,8 +3050,8 @@ setMethod("drop",
 #'
 #' @name histogram
 #' @param nbins the number of bins (optional). Default value is 10.
-#' @param col the column (described by character or Column object) to build the histogram from.
+#' @param df the SparkDataFrame containing the Column to build the histogram from.
+#' @param colname the name of the column to build the histogram from.
 #' @return a data.frame with the histogram statistics, i.e., counts and centroids.
 #' @rdname histogram
 #' @aliases histogram,SparkDataFrame,characterOrColumn-method
```
```diff
@@ -3184,6 +3185,7 @@ setMethod("histogram",
 #' @param x A SparkDataFrame
 #' @param url JDBC database url of the form `jdbc:subprotocol:subname`
 #' @param tableName The name of the table in the external database
+#' @param ... additional argument(s) passed to the method
 #' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
 #' @family SparkDataFrame functions
 #' @rdname write.jdbc
```
Review comments on the .gitignore changes:

Duplicate line. Can these be pushed down into a .gitignore in the subdirectory? This file is a mess.

Hmm... Is the upper/lower case difference caused by different R versions or platforms?

I'm not sure if this is essential for this PR. I'd suggest leaving this out.

+1. This is not part of this PR.