Branch 1.6 #13882 (Closed)

Conversation
…er install in dep tests This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script. First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed. I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests. /cc zsxwing Author: Josh Rosen <[email protected]> Closes #10704 from JoshRosen/fix-build-test-problems. (cherry picked from commit a449914) Signed-off-by: Josh Rosen <[email protected]>
…ampType casting Warning users about casting changes. Author: Brandon Bradley <[email protected]> Closes #10708 from blbradley/spark-12758. (cherry picked from commit a767ee8) Signed-off-by: Michael Armbrust <[email protected]>
https://issues.apache.org/jira/browse/SPARK-11823 This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test. Author: Yin Huai <[email protected]> Closes #10715 from yhuai/SPARK-11823-ignore. (cherry picked from commit aaa2c3b) Signed-off-by: Josh Rosen <[email protected]>
…d function "aggregate" Currently, the documentation for the RDD function aggregate's parameters is unclear, especially the parameter "zeroValue". It's helpful to let junior Scala users know that "zeroValue" participates in both the "seqOp" and "combOp" phases. Author: Tommy YU <[email protected]> Closes #10587 from Wenpei/rdd_aggregate_doc. (cherry picked from commit 9f0995b) Signed-off-by: Sean Owen <[email protected]>
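The doc change's point can be illustrated without Spark. The sketch below (a plain-Java stand-in for `RDD.aggregate`; the `aggregate` helper and its partitioned input are hypothetical) shows `zeroValue` seeding both the per-partition `seqOp` fold and the cross-partition `combOp` merge:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java stand-in for RDD.aggregate (no Spark): zeroValue seeds every
// per-partition fold (seqOp) AND the final merge of partials (combOp).
public class AggregateSketch {
    static int aggregate(List<List<Integer>> partitions, int zeroValue,
                         BinaryOperator<Integer> seqOp, BinaryOperator<Integer> combOp) {
        int result = zeroValue;                  // combOp phase starts from zeroValue...
        for (List<Integer> partition : partitions) {
            int partial = zeroValue;             // ...and so does each seqOp phase
            for (int x : partition) {
                partial = seqOp.apply(partial, x);
            }
            result = combOp.apply(result, partial);
        }
        return result;
    }

    public static void main(String[] args) {
        // With zeroValue = 1 and two partitions, the 1 is added three times:
        // once per partition by seqOp and once more by combOp.
        int sum = aggregate(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4)),
                            1, Integer::sum, Integer::sum);
        System.out.println(sum); // 13, not 11: (1+1+2) + (1+3+4) combined from 1
    }
}
```

With a non-neutral `zeroValue` the result differs from a plain sum, which is exactly what a reader of the fixed documentation should expect.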
[SPARK-12582][Test] IndexShuffleBlockResolverSuite fails on Windows * IndexShuffleBlockResolverSuite fails on Windows because a file is not closed. * mv IndexShuffleBlockResolverSuite.scala from "test/java" to "test/scala". https://issues.apache.org/jira/browse/SPARK-12582 Author: Yucai Yu <[email protected]> Closes #10526 from yucai/master. (cherry picked from commit 7e15044) Signed-off-by: Sean Owen <[email protected]>
…gression Use a much smaller step size in LinearRegressionWithSGD MLlib examples to achieve a reasonable RMSE. Our training folks hit this exact same issue when concocting an example and had the same solution. Author: Sean Owen <[email protected]> Closes #10675 from srowen/SPARK-5273. (cherry picked from commit 9c7f34a) Signed-off-by: Sean Owen <[email protected]>
…orm equals to zero Cosine similarity with 0 vector should be 0 Related to #10152 Author: Sean Owen <[email protected]> Closes #10696 from srowen/SPARK-7615. (cherry picked from commit c48f2a3) Signed-off-by: Sean Owen <[email protected]>
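The guard the change describes is simple to sketch in plain Java (this is an illustration of the rule, not the MLlib code): when either vector has zero norm, the similarity is defined as 0 rather than letting 0/0 produce NaN:

```java
public class CosineSketch {
    // Minimal sketch (not the MLlib implementation): cosine similarity
    // with a zero-norm vector is defined as 0 instead of NaN from 0/0.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        normA = Math.sqrt(normA);
        normB = Math.sqrt(normB);
        // the guard this change introduces
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity(new double[]{0, 0}, new double[]{1, 2})); // 0.0
    }
}
```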
This reverts commit 8b5f230.
#10311 introduces some rare, non-deterministic flakiness for hive udf tests, see #10311 (comment) I can't reproduce it locally, and may need more time to investigate, a quick solution is: bypass hive tests for json serialization. Author: Wenchen Fan <[email protected]> Closes #10430 from cloud-fan/hot-fix. (cherry picked from commit 8543997) Signed-off-by: Michael Armbrust <[email protected]>
…in GROUP BY clause cloud-fan Can you please take a look ? In this case, we are failing during check analysis while validating the aggregation expression. I have added a semanticEquals for HiveGenericUDF to fix this. Please let me know if this is the right way to address this issue. Author: Dilip Biswal <[email protected]> Closes #10520 from dilipbiswal/spark-12558. (cherry picked from commit dc7b387) Signed-off-by: Yin Huai <[email protected]> Conflicts: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
The default run has changed, but the documentation didn't fully reflect the change. Author: Luc Bourlier <[email protected]> Closes #10740 from skyluc/issue/mesos-modes-doc. (cherry picked from commit cc91e21) Signed-off-by: Reynold Xin <[email protected]>
…verflow jira: https://issues.apache.org/jira/browse/SPARK-12685 master PR: #10627 the log of word2vec reports trainWordsCount = -785727483 during computation over a large dataset. Update the priority as it will affect the computation process. alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1)) Author: Yuhao Yang <[email protected]> Closes #10721 from hhbyyh/branch-1.4. (cherry picked from commit 7bd2564) Signed-off-by: Joseph K. Bradley <[email protected]>
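The reported negative `trainWordsCount` is classic `int` overflow, easy to reproduce standalone: summing word counts in an `int` wraps negative once the total passes 2^31 - 1, which flips the sign of the alpha expression quoted above. Widening the accumulator to `long` (as the fix does) avoids it:

```java
public class WordCountOverflow {
    // Sum counts the buggy way (int) and the fixed way (long).
    static int sumAsInt(int[] counts) {
        int total = 0;
        for (int c : counts) total += c; // wraps past Integer.MAX_VALUE
        return total;
    }

    static long sumAsLong(int[] counts) {
        long total = 0L;
        for (int c : counts) total += c; // each int is widened before adding
        return total;
    }

    public static void main(String[] args) {
        int[] counts = {1_500_000_000, 1_500_000_000}; // ~3e9 words total
        System.out.println(sumAsInt(counts));  // -1294967296: a negative trainWordsCount
        System.out.println(sumAsLong(counts)); // 3000000000
    }
}
```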
…thon3 This replaces the `execfile` used for running custom python shell scripts with explicit open, compile and exec (as recommended by 2to3). The reason for this change is to make the pythonstartup option compatible with python3. Author: Erik Selin <[email protected]> Closes #10255 from tyro89/pythonstartup-python3. (cherry picked from commit e4e0b3f) Signed-off-by: Josh Rosen <[email protected]>
I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE doesn't occur because the `inMemSorter` is set to null later and the `free()` method is not called. It happens when there is another exception like OOM thrown before setting `inMemSorter` to null. Anyway, we can add the null check to avoid it.
```
ERROR spark.TaskContextImpl: Error in TaskCompletionListener
java.lang.NullPointerException
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
at org.apache.spark.scheduler.Task.run(Task.scala:91)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
```
Author: Carson Wang <[email protected]>
Closes #10637 from carsonwang/FixNPE.
(cherry picked from commit eabc7b8)
Signed-off-by: Josh Rosen <[email protected]>
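The shape of the fix can be sketched without Spark. The class names below are simplified stand-ins for `UnsafeExternalSorter`/`UnsafeInMemorySorter`: the cleanup path checks for `null` before calling `free()`:

```java
// Simplified stand-ins (hypothetical names): inMemSorter may legitimately
// be null if an earlier failure (e.g. OOM) occurred before it was assigned.
public class NullGuardSketch {
    static class InMemSorterLike {
        void free() { /* release memory pages */ }
    }

    static class ExternalSorterLike {
        InMemSorterLike inMemSorter;
        ExternalSorterLike(InMemSorterLike s) { inMemSorter = s; }

        void cleanupResources() {
            // the null check the patch adds: skip free() when there is
            // nothing to free, instead of an NPE in the completion listener
            if (inMemSorter != null) {
                inMemSorter.free();
                inMemSorter = null;
            }
        }
    }

    static boolean cleanupSucceeds(InMemSorterLike s) {
        ExternalSorterLike sorter = new ExternalSorterLike(s);
        sorter.cleanupResources();
        return sorter.inMemSorter == null;
    }

    public static void main(String[] args) {
        // completes instead of throwing NullPointerException
        System.out.println(cleanupSucceeds(null));
    }
}
```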
…number of features is large jira: https://issues.apache.org/jira/browse/SPARK-12026 The issue is valid as features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger. I tested on local and the change can improve the performance and the running time was stable. Author: Yuhao Yang <[email protected]> Closes #10146 from hhbyyh/chiSq. (cherry picked from commit 021dafc) Signed-off-by: Joseph K. Bradley <[email protected]>
When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exceptions that do occur. Added unit tests to verify an IOException is thrown and logged if the FileAppender is not flagged to stop, and that no IOException is thrown when the flag is set. Author: Bryan Cutler <[email protected]> Closes #10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844. (cherry picked from commit 56cdbd6) Signed-off-by: Sean Owen <[email protected]>
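The guarded read the PR describes looks roughly like this (hypothetical names; `AppenderLike` stands in for FileAppender): an IOException from `read()` is swallowed only if the stop flag was set first.

```java
import java.io.IOException;
import java.io.InputStream;

public class AppenderSketch {
    static class AppenderLike {
        volatile boolean markedForStop = false;

        void appendFrom(InputStream in) throws IOException {
            try {
                while (in.read() != -1) { /* append bytes to the log file */ }
            } catch (IOException e) {
                // expected when the executor process was destroyed and its
                // stderr stream closed after the appender was told to stop
                if (!markedForStop) throw e;
            }
        }
    }

    // A stream that behaves like one closed mid-read.
    static InputStream closedStream() {
        return new InputStream() {
            @Override public int read() throws IOException {
                throw new IOException("Stream closed");
            }
        };
    }

    static boolean completesQuietly(boolean stopped) {
        AppenderLike appender = new AppenderLike();
        appender.markedForStop = stopped;
        try {
            appender.appendFrom(closedStream());
            return true;  // exception was ignored
        } catch (IOException e) {
            return false; // exception still surfaced
        }
    }

    public static void main(String[] args) {
        System.out.println(completesQuietly(true));  // true: ignored while stopping
        System.out.println(completesQuietly(false)); // false: still reported otherwise
    }
}
```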
… allocation Add `listener.synchronized` to get `storageStatusList` and `execInfo` atomically. Author: Shixiong Zhu <[email protected]> Closes #10728 from zsxwing/SPARK-12784. (cherry picked from commit 501e99e) Signed-off-by: Shixiong Zhu <[email protected]>
If the sort column contains a slash (e.g. "Executor ID / Host") in YARN mode, sorting fails with the following message. It's similar to SPARK-4313. Author: root <root@R520T1.(none)> Author: Koyo Yoshida <[email protected]> Closes #10663 from yoshidakuy/SPARK-12708. (cherry picked from commit 32cca93) Signed-off-by: Kousuke Saruta <[email protected]>
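The usual remedy for this class of bug (a sketch of the pattern, not necessarily the exact patch) is to URL-encode the column name before embedding it in the page link, so the slash survives the round trip:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SortColumnEncoding {
    public static void main(String[] args) {
        String sortColumn = "Executor ID / Host";
        // '/' and spaces are not URL-safe, so encode before building the link...
        String encoded = URLEncoder.encode(sortColumn, StandardCharsets.UTF_8);
        // ...and decode on the way back in when handling the request
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        System.out.println(encoded); // Executor+ID+%2F+Host
        System.out.println(decoded); // Executor ID / Host
    }
}
```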
Author: Oscar D. Lara Yejas <[email protected]> Author: Oscar D. Lara Yejas <[email protected]> Author: Oscar D. Lara Yejas <[email protected]> Author: Oscar D. Lara Yejas <[email protected]> Closes #9613 from olarayej/SPARK-11031. (cherry picked from commit ba4a641) Signed-off-by: Shivaram Venkataraman <[email protected]>
…read completion Changed Logging FileAppender to use join in `awaitTermination` to ensure that thread is properly finished before returning. Author: Bryan Cutler <[email protected]> Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701. (cherry picked from commit ea104b8) Signed-off-by: Sean Owen <[email protected]>
http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline ``` val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model") ``` should be ``` val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model") ``` cc: jkbradley Author: Jeff Lam <[email protected]> Closes #10769 from Agent007/SPARK-12722. (cherry picked from commit 86972fa) Signed-off-by: Sean Owen <[email protected]>
…plied in GROUP BY clause Addresses the comments from Yin. #10520 Author: Dilip Biswal <[email protected]> Closes #10758 from dilipbiswal/spark-12558-followup. (cherry picked from commit db9a860) Signed-off-by: Yin Huai <[email protected]> Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
…ures Currently `summary()` fails on a GLM model fitted over a vector feature missing ML attrs, since the output feature attrs will also have no name. We can avoid this situation by forcing `VectorAssembler` to make up suitable names when inputs are missing names. cc mengxr Author: Eric Liang <[email protected]> Closes #10323 from ericl/spark-12346. (cherry picked from commit 5e492e9) Signed-off-by: Xiangrui Meng <[email protected]>
…ntegration doc This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc. Author: Shixiong Zhu <[email protected]> Closes #10746 from zsxwing/flume-doc. (cherry picked from commit a973f48) Signed-off-by: Tathagata Das <[email protected]>
… integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc. Author: Shixiong Zhu <[email protected]> Closes #10822 from zsxwing/kinesis-doc. (cherry picked from commit 721845c) Signed-off-by: Tathagata Das <[email protected]>
In SPARK-10743 we wrap cast with `UnresolvedAlias` to give `Cast` a better alias if possible. However, for cases like filter, the `UnresolvedAlias` can't be resolved, and actually we don't need a better alias for this case. This PR moves the cast wrapping logic to `Column.named` so that we will only do it when we need an alias name. backport #10781 to 1.6 Author: Wenchen Fan <[email protected]> Closes #10819 from cloud-fan/bug.
… in interface.scala Author: proflin <[email protected]> Closes #10824 from proflin/master. (cherry picked from commit c00744e) Signed-off-by: Reynold Xin <[email protected]>
Change assertion's message so it's consistent with the code. The old message says that the invoked method was lapack.dports, where in fact it was lapack.dppsv method. Author: Wojciech Jurczyk <[email protected]> Closes #10818 from wjur/wjur/rename_error_message. (cherry picked from commit ebd9ce0) Signed-off-by: Sean Owen <[email protected]>
…ReaderBase It looks like there's one place left in the codebase, SpecificParquetRecordReaderBase, where we didn't use SparkHadoopUtil's reflective accesses of TaskAttemptContext methods, which could create problems when using a single Spark artifact with both Hadoop 1.x and 2.x. Author: Josh Rosen <[email protected]> Closes #10843 from JoshRosen/SPARK-12921.
https://issues.apache.org/jira/browse/SPARK-12747 Postgres JDBC driver uses "FLOAT4" or "FLOAT8" not "real". Author: Liang-Chi Hsieh <[email protected]> Closes #10695 from viirya/fix-postgres-jdbc. (cherry picked from commit 55c7dd0) Signed-off-by: Reynold Xin <[email protected]>
Contributor: Please close this PR.
Contributor: @liu549676915 please close this PR.
## What changes were proposed in this pull request? In the case that we don't know which module an object came from, we call pickle.whichmodule() to go through all the loaded modules to find the object, which could fail because of some modules, for example six; see https://bitbucket.org/gutworth/six/issues/63/importing-six-breaks-pickling We should ignore the exception here and use `__main__` as the module name (it means we can't find the module). ## How was this patch tested? Manually tested. Can't have a unit test for this. Author: Davies Liu <[email protected]> Closes #13788 from davies/whichmodule. (cherry picked from commit d489354) Signed-off-by: Davies Liu <[email protected]>
## What changes were proposed in this pull request? This PR fixes `DataFrame.describe()` by forcing materialization to make the `Seq` serializable. Currently, `describe()` of `DataFrame` throws `Task not serializable` Spark exceptions when joining in Scala 2.10. ## How was this patch tested? Manual. (After building with Scala 2.10, test on bin/spark-shell and bin/pyspark.) Author: Dongjoon Hyun <[email protected]> Closes #13902 from dongjoon-hyun/SPARK-16173-branch-1.6.
…ndexOutOfBoundsException. I have found the bug and tested the solution. ## What changes were proposed in this pull request? Just adjust the size of an array in line 58 so it does not cause an ArrayIndexOutOfBoundsException in line 66. ## How was this patch tested? Manual tests. I have recompiled the entire project with the fix, it has been built successfully, and I have run the code, also with good results. Line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1 crashes because trueWeights has length "nfeatures + 1" while "x" has length "nfeatures", and they should have the same length. To fix this, just make trueWeights the same length as x. I have recompiled the project with the change and it is working now: [spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test And it generates the data successfully now in the specified folder. Author: José Antonio <[email protected]> Closes #13895 from j4munoz/patch-2. (cherry picked from commit a3c7b41) Signed-off-by: Sean Owen <[email protected]>
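The fix reduces to giving the two arrays the same length so the `ddot`-style dot product is well-defined. A plain-Java sketch of the corrected shapes (hypothetical values, no MLlib/BLAS):

```java
import java.util.Random;

// After the fix, trueWeights and x both have length nfeatures, so the
// ddot-style dot product never reads past the end of x.
public class GeneratorShapes {
    static double dot(double[] x, double[] w) {
        double sum = 0;
        for (int i = 0; i < w.length; i++) {
            sum += x[i] * w[i]; // safe only because x.length == w.length
        }
        return sum;
    }

    public static void main(String[] args) {
        int nfeatures = 3;
        Random rnd = new Random(42);
        double[] trueWeights = new double[nfeatures]; // was nfeatures + 1
        double[] x = new double[nfeatures];
        for (int i = 0; i < nfeatures; i++) {
            trueWeights[i] = rnd.nextGaussian();
            x[i] = rnd.nextDouble();
        }
        // equivalent of blas.ddot(trueWeights.length, x, 1, trueWeights, 1)
        double y = dot(x, trueWeights) + rnd.nextGaussian() * 0.1;
        System.out.println("label: " + y);
    }
}
```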
…g tests ## What changes were proposed in this pull request? Make spill tests wait until job has completed before returning the number of stages that spilled ## How was this patch tested? Existing Jenkins tests. Author: Sean Owen <[email protected]> Closes #13896 from srowen/SPARK-16193. (cherry picked from commit e877415) Signed-off-by: Sean Owen <[email protected]>
## What changes were proposed in this pull request? reduce the denominator of SparkPi by 1 ## How was this patch tested? integration tests Author: 杨浩 <[email protected]> Closes #13910 from yanghaogn/patch-1. (cherry picked from commit b452026) Signed-off-by: Sean Owen <[email protected]>
…oot` module ending up failure of Python tests ## What changes were proposed in this pull request? This PR fixes incorrect checking for the `root` module (meaning all tests). I realised that #13806 is failing because of this. The PR corrects two files in `sql` and `core`. It seems fixing the `core` module triggers all tests via the `root` value from `determine_modules_for_files`. So, `changed_modules` becomes as below: ``` ['root', 'sql'] ``` and `module.dependent_modules` becomes as below: ``` ['pyspark-mllib', 'pyspark-ml', 'hive-thriftserver', 'sparkr', 'mllib', 'examples', 'pyspark-sql'] ``` Now, `modules_to_test` does not include `root` and this check is skipped, but then both `changed_modules` and `modules_to_test` are merged after that. So, this includes the `root` module in the tests. This ends up failing with the message below (e.g. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60990/consoleFull): ``` Error: unrecognized module 'root'. Supported modules: pyspark-core, pyspark-sql, pyspark-streaming, pyspark-ml, pyspark-mllib ``` ## How was this patch tested? N/A Author: hyukjinkwon <[email protected]> Closes #13845 from HyukjinKwon/fix-build-1.6.
… Executor ID ## What changes were proposed in this pull request? Previously, the TaskLocation implementation would not allow for executor ids which include underscores. This tweaks the string split used to get the hostname and executor id, allowing for underscores in the executor id. This addresses the JIRA found here: https://issues.apache.org/jira/browse/SPARK-16148 This is moved over from a previous PR against branch-1.6: #13857 ## How was this patch tested? Ran existing unit tests for core and streaming. Manually ran a simple streaming job with an executor whose id contained underscores and confirmed that the job ran successfully. This is my original work and I license the work to the project under the project's open source license. Author: Tom Magrino <[email protected]> Closes #13858 from tmagrino/fixtasklocation. (cherry picked from commit ae14f36) Signed-off-by: Shixiong Zhu <[email protected]>
…n NewHadoopRDD to branch 1.6 ## What changes were proposed in this pull request? This PR backports #13759. (`SqlNewHadoopRDDState` was renamed to `InputFileNameHolder` and `spark` API does not exist in branch 1.6) ## How was this patch tested? Unit tests in `ColumnExpressionSuite`. Author: hyukjinkwon <[email protected]> Closes #13806 from HyukjinKwon/backport-SPARK-16044.
….6.3.
## What changes were proposed in this pull request?
- Adds 1.6.2 and 1.6.3 as supported Spark versions within the bundled spark-ec2 script.
- Makes the default Spark version 1.6.3 to keep in sync with the upcoming release.
- Does not touch the newer spark-ec2 scripts in the separate amplabs repository.
## How was this patch tested?
- Manual script execution:
```
export AWS_SECRET_ACCESS_KEY=_snip_
export AWS_ACCESS_KEY_ID=_snip_
$SPARK_HOME/ec2/spark-ec2 \
  --key-pair=_snip_ \
  --identity-file=_snip_ \
  --region=us-east-1 \
  --vpc-id=_snip_ \
  --slaves=1 \
  --instance-type=t1.micro \
  --spark-version=1.6.2 \
  --hadoop-major-version=yarn \
  launch test-cluster
```
- Result: Successful creation of a 1.6.2-based Spark cluster.
This contribution is my original work and I license the work to the project under the project's open source license.
Author: Brian Uri <[email protected]>
Closes #13947 from briuri/branch-1.6-bug-spark-16257.
…cess.destroyForcibly() if and only if Process.destroy() fails ## What changes were proposed in this pull request? Utils.terminateProcess should `destroy()` first and only fall back to `destroyForcibly()` if it fails. It's kind of bad that we're force-killing executors -- and only in Java 8; see the JIRA for an example of the impact (no shutdown). While here: `Utils.waitForProcess` should use the Java 8 method if available instead of a custom implementation. ## How was this patch tested? Existing tests, which cover the force-kill case, and Amplab tests, which will cover both Java 7 and Java 8 eventually. However I tested locally on Java 8, and the PR builder will try Java 7 here. Author: Sean Owen <[email protected]> Closes #13973 from srowen/SPARK-16182. (cherry picked from commit 2075bf8) Signed-off-by: Sean Owen <[email protected]>
…hon3 ## What changes were proposed in this pull request? I would like to use IPython with Python 3.5. It is annoying when it fails with "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" when I have a version greater than 2.7. ## How was this patch tested? It now works with IPython and Python 3. Author: MechCoder <[email protected]> Closes #13503 from MechCoder/spark-15761. (cherry picked from commit 66283ee) Signed-off-by: Sean Owen <[email protected]>
… No Column #14040 #### What changes were proposed in this pull request? Star expansion over a table containing zero columns does not work since 1.6. However, it works in Spark 1.5.1. This PR is to fix the issue in the master branch. For example, ```scala val rddNoCols = sqlContext.sparkContext.parallelize(1 to 10).map(_ => Row.empty) val dfNoCols = sqlContext.createDataFrame(rddNoCols, StructType(Seq.empty)) dfNoCols.registerTempTable("temp_table_no_cols") sqlContext.sql("select * from temp_table_no_cols").show ``` Without the fix, users will get the following exception: ``` java.lang.IllegalArgumentException: requirement failed at scala.Predef$.require(Predef.scala:221) at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199) ``` #### How was this patch tested? Tests are added Author: gatorsmile <[email protected]> Closes #14042 from gatorsmile/starExpansionEmpty.
Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-16353 ## What changes were proposed in this pull request? The javadoc options for the java unidoc generation are ignored when generating the java unidoc. For example, the generated `index.html` has the wrong HTML page title. This can be seen at http://spark.apache.org/docs/latest/api/java/index.html. I changed the relevant setting scope from `doc` to `(JavaUnidoc, unidoc)`. ## How was this patch tested? I ran `docs/jekyll build` and verified that the java unidoc `index.html` has the correct HTML page title. Author: Michael Allman <[email protected]> Closes #14031 from mallman/spark-16353. (cherry picked from commit 7dbffcd) Signed-off-by: Sean Owen <[email protected]>
…er is no longer published on Apache mirrors ## What changes were proposed in this pull request? Download Maven 3.3.9 instead of 3.3.3 because the latter is no longer published on Apache mirrors ## How was this patch tested? Jenkins Author: Sean Owen <[email protected]> Closes #14066 from srowen/Maven339Branch16.
… log ## What changes were proposed in this pull request? The free memory size displayed in the log is wrong (it shows the used memory); fix it to be correct. Backported to 1.6. ## How was this patch tested? N/A Author: jerryshao <[email protected]> Closes #14043 from jerryshao/memory-log-fix-1.6-backport.
## What changes were proposed in this pull request? The following Java code because of type erasing: ```Java JavaRDD<Vector> rows = jsc.parallelize(...); RowMatrix mat = new RowMatrix(rows.rdd()); QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true); ``` We should use retag to restore the type to prevent the following exception: ```Java java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector; ``` ## How was this patch tested? Java unit test Author: Xusen Yin <[email protected]> Closes #14051 from yinxusen/SPARK-16372. (cherry picked from commit 4c6f00d) Signed-off-by: Sean Owen <[email protected]>
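The mechanism behind the reported ClassCastException can be reproduced standalone: with erased generics, the runtime array is `Object[]`, and `Object[]` cannot be cast to a more specific array type even if every element matches (which is why the fix carries the element class via retag):

```java
public class ErasureSketch {
    static boolean castFails(Object[] erased) {
        try {
            String[] typed = (String[]) erased; // checkcast to [Ljava.lang.String;
            return typed.length < 0;            // unreachable when the cast succeeds
        } catch (ClassCastException e) {
            return true; // exactly the error quoted above
        }
    }

    public static void main(String[] args) {
        Object[] erased = new Object[]{"a", "b"}; // runtime class [Ljava.lang.Object;
        System.out.println(castFails(erased)); // true

        // The remedy is to carry the element class (as retag does in Spark)
        // or to copy into a correctly typed array:
        String[] copied = java.util.Arrays.copyOf(erased, erased.length, String[].class);
        System.out.println(copied[0]); // a
    }
}
```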
This reverts commit 45dda92.
…eflection. Using "Method.invoke" causes an exception to be thrown, not an error, so Utils.waitForProcess() was always throwing an exception when run on Java 7. Author: Marcelo Vanzin <[email protected]> Closes #14056 from vanzin/SPARK-16385. (cherry picked from commit 59f9c1b) Signed-off-by: Sean Owen <[email protected]>
## What changes were proposed in this pull request? RegexExtract and RegexReplace currently crash on non-nullable input due to the use of a hard-coded local variable name (e.g. compilation fails with `java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 85, Column 26: Redefinition of local variable "m" `). This changes those variables to use fresh names, and also in a few other places. ## How was this patch tested? Unit tests. rxin Author: Eric Liang <[email protected]> Closes #14168 from ericl/sc-3906. (cherry picked from commit 1c58fa9) Signed-off-by: Reynold Xin <[email protected]>
…rtitionBy This patch fixes a variable namespace collision bug in pmod and partitionBy Regression test for one possible occurrence. A more general fix in `ExpressionEvalHelper.checkEvaluation` will be in a subsequent PR. Author: Sameer Agarwal <[email protected]> Closes #14144 from sameeragarwal/codegen-bug. (cherry picked from commit 9cc74f9) Signed-off-by: Reynold Xin <[email protected]> (cherry picked from commit 6892614) Signed-off-by: Reynold Xin <[email protected]>
…n code generation In code generation, it is incorrect for expressions to reuse variable names across different instances of itself. As an example, SPARK-16488 reports a bug in which pmod expression reuses variable name "r". This patch updates ExpressionEvalHelper test harness to always project two instances of the same expression, which will help us catch variable reuse problems in expression unit tests. This patch also fixes the bug in crc32 expression. This is a test harness change, but I also created a new test suite for testing the test harness. Author: Reynold Xin <[email protected]> Closes #14146 from rxin/SPARK-16489. (cherry picked from commit c377e49) Signed-off-by: Reynold Xin <[email protected]>
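The fresh-name remedy these entries describe can be sketched as a simple counter (Spark's real `freshName` lives in its codegen context; this stand-in only shows the idea): each instance of an expression gets a distinct variable name, so the generated Java no longer hits "Redefinition of local variable".

```java
// Counter-based fresh-name generation: two instances of the same
// expression ask for "m" and receive "m1" and "m2" respectively.
public class FreshNames {
    private int counter = 0;

    String freshName(String base) {
        counter += 1;
        return base + counter;
    }

    public static void main(String[] args) {
        FreshNames ctx = new FreshNames();
        String first = ctx.freshName("m");  // m1
        String second = ctx.freshName("m"); // m2
        System.out.println(first + " " + second); // distinct names, no collision
    }
}
```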
…signed to numSkippedTasks ## What changes were proposed in this pull request? I fixed a misassigned var, numCompletedTasks was assigned to numSkippedTasks in the convertJobData method ## How was this patch tested? dev/run-tests Author: Alex Bozarth <[email protected]> Closes #14141 from ajbozarth/spark16375. (cherry picked from commit f156136) Signed-off-by: Sean Owen <[email protected]>
…g OoM for long runs ## What changes were proposed in this pull request? Unpersist broadcasted vars in Word2Vec.fit for more timely / reliable resource cleanup ## How was this patch tested? Jenkins tests Author: Sean Owen <[email protected]> Closes #14153 from srowen/SPARK-16440. (cherry picked from commit 51ade51) Signed-off-by: Sean Owen <[email protected]>
…ons in file listing ## What changes were proposed in this pull request? Spark silently drops exceptions during file listing. This is very bad behavior because it can mask legitimate errors and the resulting plan will silently have 0 rows. This patch changes it to not silently drop the errors. After making partition discovery not silently drop exceptions, HiveMetastoreCatalog can trigger partition discovery on empty tables, which causes FileNotFoundExceptions (these exceptions were dropped by partition discovery silently). To address this issue, this PR introduces two **hacks** to work around the issues. These two hacks try to avoid triggering partition discovery on empty tables in HiveMetastoreCatalog. ## How was this patch tested? Manually tested. **Note:** This is a backport of #13987. Author: Yin Huai <[email protected]> Closes #14139 from yhuai/SPARK-16313-branch-1.6.
## What changes were proposed in this pull request? Forgotten broadcast variables were unpersisted in a previous PR (#14153). This PR turns those `unpersist()` calls into `destroy()` so that memory is freed even on the driver. ## How was this patch tested? Unit tests in Word2VecSuite were run locally. This contribution is done on behalf of Criteo, according to the terms of the Apache license 2.0. Author: Anthony Truchet <[email protected]> Closes #14268 from AnthonyTruchet/SPARK-16440. (cherry picked from commit 0dc79ff) Signed-off-by: Sean Owen <[email protected]>