Commit d9c5903

Merge remote-tracking branch 'origin/master' into SPARK-27676
2 parents 58e9544 + 54da3bb

File tree: 855 files changed (+47268 / -11160 lines)


.github/PULL_REQUEST_TEMPLATE

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@
 (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
 (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
 
-Please review http://spark.apache.org/contributing.html before opening a pull request.
+Please review https://spark.apache.org/contributing.html before opening a pull request.

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -1,12 +1,12 @@
 ## Contributing to Spark
 
 *Before opening a pull request*, review the
-[Contributing to Spark guide](http://spark.apache.org/contributing.html).
+[Contributing to Spark guide](https://spark.apache.org/contributing.html).
 It lists steps that are required before creating a PR. In particular, consider:
 
 - Is the change important and ready enough to ask the community to spend time reviewing?
 - Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
+- Is this a new feature that can stand alone as a [third party project](https://spark.apache.org/third-party-projects.html) ?
 - Is the change being proposed clearly explained and motivated?
 
 When you contribute code, you affirm that the contribution is your original work and that you

LICENSE-binary

Lines changed: 2 additions & 1 deletion
@@ -368,6 +368,8 @@ org.eclipse.jetty:jetty-servlets
 org.eclipse.jetty:jetty-util
 org.eclipse.jetty:jetty-webapp
 org.eclipse.jetty:jetty-xml
+org.scala-lang.modules:scala-xml_2.12
+org.opencypher:okapi-shade
 
 core/src/main/java/org/apache/spark/util/collection/TimSort.java
 core/src/main/resources/org/apache/spark/ui/static/bootstrap*
@@ -412,7 +414,6 @@ org.scala-lang:scala-compiler
 org.scala-lang:scala-library
 org.scala-lang:scala-reflect
 org.scala-lang.modules:scala-parser-combinators_2.12
-org.scala-lang.modules:scala-xml_2.12
 org.fusesource.leveldbjni:leveldbjni-all
 net.sourceforge.f2j:arpack_combined_all
 xmlenc:xmlenc

NOTICE-binary

Lines changed: 15 additions & 0 deletions
@@ -1163,3 +1163,18 @@ Copyright 2014 The Apache Software Foundation
 
 Apache Mahout (http://mahout.apache.org/)
 Copyright 2014 The Apache Software Foundation
+
+scala-xml
+Copyright (c) 2002-2019 EPFL
+Copyright (c) 2011-2019 Lightbend, Inc.
+
+scala-xml includes software developed at
+LAMP/EPFL (https://lamp.epfl.ch/) and
+Lightbend, Inc. (https://www.lightbend.com/).
+
+Licensed under the Apache License, Version 2.0 (the "License").
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

R/README.md

Lines changed: 4 additions & 4 deletions
@@ -17,7 +17,7 @@ export R_HOME=/home/username/R
 
 #### Build Spark
 
-Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
+Build Spark with [Maven](https://spark.apache.org/docs/latest/building-spark.html#buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
 
 ```bash
 build/mvn -DskipTests -Psparkr package
@@ -35,15 +35,15 @@ SparkContext, you can run
 
 ./bin/sparkR --master "local[2]"
 
-To set other options like driver memory, executor memory etc. you can pass in the [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`
+To set other options like driver memory, executor memory etc. you can pass in the [spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`
 
 #### Using SparkR from RStudio
 
 If you wish to use SparkR from RStudio, please refer [SparkR documentation](https://spark.apache.org/docs/latest/sparkr.html#starting-up-from-rstudio).
 
 #### Making changes to SparkR
 
-The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
+The [instructions](https://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
 If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
 Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.
 
@@ -58,7 +58,7 @@ To run one of them, use `./bin/spark-submit <filename> <args>`. For example:
 ```bash
 ./bin/spark-submit examples/src/main/r/dataframe.R
 ```
-You can run R unit tests by following the instructions under [Running R Tests](http://spark.apache.org/docs/latest/building-spark.html#running-r-tests).
+You can run R unit tests by following the instructions under [Running R Tests](https://spark.apache.org/docs/latest/building-spark.html#running-r-tests).
 
 ### Running on YARN
 

R/WINDOWS.md

Lines changed: 6 additions & 6 deletions
@@ -20,19 +20,19 @@ license: |
 
 To build SparkR on Windows, the following steps are required
 
-1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
+1. Install R (>= 3.1) and [Rtools](https://cloud.r-project.org/bin/windows/Rtools/). Make sure to
 include Rtools and R in `PATH`. Note that support for R prior to version 3.4 is deprecated as of Spark 3.0.0.
 
 2. Install
-[JDK8](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) and set
+[JDK8](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) and set
 `JAVA_HOME` in the system environment variables.
 
-3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
+3. Download and install [Maven](https://maven.apache.org/download.html). Also include the `bin`
 directory in Maven in `PATH`.
 
-4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
+4. Set `MAVEN_OPTS` as described in [Building Spark](https://spark.apache.org/docs/latest/building-spark.html).
 
-5. Open a command shell (`cmd`) in the Spark directory and build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
+5. Open a command shell (`cmd`) in the Spark directory and build Spark with [Maven](https://spark.apache.org/docs/latest/building-spark.html#buildmvn) and include the `-Psparkr` profile to build the R package. For example to use the default Hadoop versions you can run
 
 ```bash
 mvn.cmd -DskipTests -Psparkr package
@@ -52,7 +52,7 @@ To run the SparkR unit tests on Windows, the following steps are required —ass
 
 4. Set the environment variable `HADOOP_HOME` to the full path to the newly created `hadoop` directory.
 
-5. Run unit tests for SparkR by running the command below. You need to install the needed packages following the instructions under [Running R Tests](http://spark.apache.org/docs/latest/building-spark.html#running-r-tests) first:
+5. Run unit tests for SparkR by running the command below. You need to install the needed packages following the instructions under [Running R Tests](https://spark.apache.org/docs/latest/building-spark.html#running-r-tests) first:
 
 ```
 .\bin\spark-submit2.cmd --conf spark.hadoop.fs.defaultFS="file:///" R\pkg\tests\run-all.R

R/pkg/R/DataFrame.R

Lines changed: 4 additions & 4 deletions
@@ -1179,16 +1179,16 @@ setMethod("collect",
           function(x, stringsAsFactors = FALSE) {
             connectionTimeout <- as.numeric(Sys.getenv("SPARKR_BACKEND_CONNECTION_TIMEOUT", "6000"))
             useArrow <- FALSE
-            arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.enabled")[[1]] == "true"
+            arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.sparkr.enabled")[[1]] == "true"
             if (arrowEnabled) {
               useArrow <- tryCatch({
                 checkSchemaInArrow(schema(x))
                 TRUE
               }, error = function(e) {
                 warning(paste0("The conversion from Spark DataFrame to R DataFrame was attempted ",
                                "with Arrow optimization because ",
-                               "'spark.sql.execution.arrow.enabled' is set to true; however, ",
-                               "failed, attempting non-optimization. Reason: ",
+                               "'spark.sql.execution.arrow.sparkr.enabled' is set to true; ",
+                               "however, failed, attempting non-optimization. Reason: ",
                                e))
                 FALSE
               })
@@ -1476,7 +1476,7 @@ dapplyInternal <- function(x, func, schema) {
     schema <- structType(schema)
   }
 
-  arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.enabled")[[1]] == "true"
+  arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.sparkr.enabled")[[1]] == "true"
   if (arrowEnabled) {
     if (inherits(schema, "structType")) {
       checkSchemaInArrow(schema)

R/pkg/R/SQLContext.R

Lines changed: 2 additions & 2 deletions
@@ -259,7 +259,7 @@ getSchema <- function(schema, firstRow = NULL, rdd = NULL) {
 createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0,
                             numPartitions = NULL) {
   sparkSession <- getSparkSession()
-  arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.enabled")[[1]] == "true"
+  arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.sparkr.enabled")[[1]] == "true"
   useArrow <- FALSE
   firstRow <- NULL
 
@@ -302,7 +302,7 @@ createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0,
     },
     error = function(e) {
       warning(paste0("createDataFrame attempted Arrow optimization because ",
-                     "'spark.sql.execution.arrow.enabled' is set to true; however, ",
+                     "'spark.sql.execution.arrow.sparkr.enabled' is set to true; however, ",
                      "failed, attempting non-optimization. Reason: ",
                      e))
       FALSE

R/pkg/R/group.R

Lines changed: 1 addition & 1 deletion
@@ -229,7 +229,7 @@ gapplyInternal <- function(x, func, schema) {
   if (is.character(schema)) {
     schema <- structType(schema)
   }
-  arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.enabled")[[1]] == "true"
+  arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.sparkr.enabled")[[1]] == "true"
  if (arrowEnabled) {
    if (inherits(schema, "structType")) {
      checkSchemaInArrow(schema)

R/pkg/R/mllib_classification.R

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))
 #'
 #' @param data SparkDataFrame for training.
 #' @param formula A symbolic description of the model to be fitted. Currently only a few formula
-#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#'                operators are supported, including '~', '.', ':', '+', '-', '*', and '^'.
 #' @param regParam The regularization parameter. Only supports L2 regularization currently.
 #' @param maxIter Maximum iteration number.
 #' @param tol Convergence tolerance of iterations.
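The doc change above records that the R formula interface now also accepts the `'*'` (main effects plus interaction) and `'^'` (term crossing up to a degree) operators. A hedged illustration using `spark.glm`, which goes through the same formula handling; the choice of wrapper and dataset is mine, not taken from this commit:

```r
library(SparkR)
sparkR.session(master = "local[2]")

df <- createDataFrame(mtcars)

# mpg ~ wt * hp expands to wt + hp + wt:hp (main effects plus interaction)
model <- spark.glm(df, mpg ~ wt * hp, family = "gaussian")
summary(model)

# (wt + hp + disp)^2 crosses the terms up to second order
model2 <- spark.glm(df, mpg ~ (wt + hp + disp)^2, family = "gaussian")

sparkR.session.stop()
```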
