[SPARK-13071] Coalescing HadoopRDD overwrites existing input metrics
This issue is causing tests to fail consistently in master with Hadoop 2.6 / 2.7. This is because for Hadoop 2.5+ we overwrite existing values of `InputMetrics#bytesRead` in each call to `HadoopRDD#compute`. In the case of coalesce, e.g.
```
sc.textFile(..., 4).coalesce(2).count()
```
we will call `compute` multiple times in the same task, overwriting `bytesRead` values from previous calls to `compute`.
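
The overwrite can be illustrated with a small stand-alone model. This is only a sketch, not Spark's code: `InputMetricsModel`, `CoalesceMetricsSketch`, and `runTask` are hypothetical names, and it assumes the bytes-read callback reports only the bytes read during the current `compute` call.

```scala
// Hypothetical stand-in for Spark's InputMetrics; not the real class.
final class InputMetricsModel {
  private var _bytesRead: Long = 0L
  def bytesRead: Long = _bytesRead
  def setBytesRead(v: Long): Unit = { _bytesRead = v }
}

object CoalesceMetricsSketch {
  // Simulates one task that computes several partitions in turn, as happens
  // after coalesce. Each element of partitionSizes is the byte count that the
  // (assumed) per-compute callback reports for that partition.
  def runTask(partitionSizes: Seq[Long], accumulate: Boolean): Long = {
    val inputMetrics = new InputMetricsModel
    partitionSizes.foreach { size =>
      // Snapshot taken at the start of each simulated compute() call.
      val existingBytesRead = inputMetrics.bytesRead
      val getBytesRead: () => Long = () => size
      if (accumulate) {
        // Add to what earlier partitions in this task already recorded.
        inputMetrics.setBytesRead(existingBytesRead + getBytesRead())
      } else {
        // Replace the metric, discarding earlier partitions' bytes.
        inputMetrics.setBytesRead(getBytesRead())
      }
    }
    inputMetrics.bytesRead
  }

  def main(args: Array[String]): Unit = {
    val sizes = Seq(100L, 50L)
    println(s"overwrite:  ${runTask(sizes, accumulate = false)}") // prints 50
    println(s"accumulate: ${runTask(sizes, accumulate = true)}")  // prints 150
  }
}
```

With two partitions of 100 and 50 bytes in one task, overwriting reports only the last partition's 50 bytes, while adding to the value captured at the start of each call reports all 150.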

For a regression test, see `InputOutputMetricsSuite.input metrics for old hadoop with coalesce`. I did not add a new regression test because it's impossible without significant refactoring; there's a lot of existing duplicate code in this corner of Spark.

This was caused by apache#10835.

Author: Andrew Or <[email protected]>

Closes apache#10973 from andrewor14/fix-input-metrics-coalesce.
Andrew Or committed Jan 30, 2016
commit 12252d1da90fa7d2dffa3a7c249ecc8821dee130
7 changes: 6 additions & 1 deletion core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
@@ -215,6 +215,7 @@ class HadoopRDD[K, V](
       // TODO: there is a lot of duplicate code between this and NewHadoopRDD and SqlNewHadoopRDD

       val inputMetrics = context.taskMetrics().registerInputMetrics(DataReadMethod.Hadoop)
+      val existingBytesRead = inputMetrics.bytesRead

       // Sets the thread local variable for the file's name
       split.inputSplit.value match {
@@ -230,9 +231,13 @@ class HadoopRDD[K, V](
         case _ => None
       }

+      // For Hadoop 2.5+, we get our input bytes from thread-local Hadoop FileSystem statistics.
+      // If we do a coalesce, however, we are likely to compute multiple partitions in the same
+      // task and in the same thread, in which case we need to avoid override values written by
+      // previous partitions (SPARK-13071).
       def updateBytesRead(): Unit = {
         getBytesReadCallback.foreach { getBytesRead =>
-          inputMetrics.setBytesRead(getBytesRead())
+          inputMetrics.setBytesRead(existingBytesRead + getBytesRead())
         }
       }

7 changes: 6 additions & 1 deletion core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
@@ -130,6 +130,7 @@ class NewHadoopRDD[K, V](
       val conf = getConf

       val inputMetrics = context.taskMetrics().registerInputMetrics(DataReadMethod.Hadoop)
+      val existingBytesRead = inputMetrics.bytesRead

       // Find a function that will return the FileSystem bytes read by this thread. Do this before
       // creating RecordReader, because RecordReader's constructor might read some bytes
@@ -139,9 +140,13 @@ class NewHadoopRDD[K, V](
         case _ => None
       }

+      // For Hadoop 2.5+, we get our input bytes from thread-local Hadoop FileSystem statistics.
+      // If we do a coalesce, however, we are likely to compute multiple partitions in the same
+      // task and in the same thread, in which case we need to avoid override values written by
+      // previous partitions (SPARK-13071).
       def updateBytesRead(): Unit = {
         getBytesReadCallback.foreach { getBytesRead =>
-          inputMetrics.setBytesRead(getBytesRead())
+          inputMetrics.setBytesRead(existingBytesRead + getBytesRead())
         }
       }

@@ -127,6 +127,7 @@ private[spark] class SqlNewHadoopRDD[V: ClassTag](
       val conf = getConf(isDriverSide = false)

       val inputMetrics = context.taskMetrics().registerInputMetrics(DataReadMethod.Hadoop)
+      val existingBytesRead = inputMetrics.bytesRead

       // Sets the thread local variable for the file's name
       split.serializableHadoopSplit.value match {
@@ -142,9 +143,13 @@ private[spark] class SqlNewHadoopRDD[V: ClassTag](
         case _ => None
       }

+      // For Hadoop 2.5+, we get our input bytes from thread-local Hadoop FileSystem statistics.
+      // If we do a coalesce, however, we are likely to compute multiple partitions in the same
+      // task and in the same thread, in which case we need to avoid override values written by
+      // previous partitions (SPARK-13071).
       def updateBytesRead(): Unit = {
         getBytesReadCallback.foreach { getBytesRead =>
-          inputMetrics.setBytesRead(getBytesRead())
+          inputMetrics.setBytesRead(existingBytesRead + getBytesRead())
         }
       }
