Merged
Changes from 1 commit
Commits
896 commits
df0e318
Fixed error in scaladoc of convertToCanonicalEdges
gauravkumar37 Nov 12, 2015
4fe99c7
[SPARK-11191][SQL] Looks up temporary function using execution Hive c…
liancheng Nov 12, 2015
f5a9526
[SPARK-10113][SQL] Explicit error message for unsigned Parquet logica…
HyukjinKwon Nov 12, 2015
d292f74
[SPARK-11420] Updating Stddev support via Imperative Aggregate
JihongMA Nov 12, 2015
767d288
[SPARK-11655][CORE] Fix deadlock in handling of launcher stop().
Nov 12, 2015
f0d3b58
[SPARK-11290][STREAMING][TEST-MAVEN] Fix the test for maven build
zsxwing Nov 12, 2015
380dfcc
[SPARK-11671] documentation code example typo
snowch Nov 12, 2015
74c3004
[SPARK-2533] Add locality levels on stage summary view
jbonofre Nov 12, 2015
cf38fc7
[SPARK-11670] Fix incorrect kryo buffer default value in docs
Nov 12, 2015
12a0784
[SPARK-11667] Update dynamic allocation docs to reflect supported clu…
Nov 12, 2015
68ef61b
[SPARK-11658] simplify documentation for PySpark combineByKey
snowch Nov 12, 2015
bc09296
[SPARK-11709] include creation site info in SparkContext.assertNotSto…
mengxr Nov 13, 2015
dcb896f
[SPARK-11712][ML] Make spark.ml LDAModel be abstract
jkbradley Nov 13, 2015
41bbd23
[SPARK-11654][SQL] add reduce to GroupedDataset
marmbrus Nov 13, 2015
0f1d00a
[SPARK-11663][STREAMING] Add Java API for trackStateByKey
zsxwing Nov 13, 2015
7786f9c
[SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog…
brkyvz Nov 13, 2015
e4e46b2
[SPARK-11681][STREAMING] Correctly update state timestamp even when s…
tdas Nov 13, 2015
e71c075
[SPARK-11672][ML] flaky spark.ml read/write tests
mengxr Nov 13, 2015
ed04846
[SPARK-11263][SPARKR] lintr Throws Warnings on Commented Code in Docu…
felixcheung Nov 13, 2015
2035ed3
[SPARK-11717] Ignore R session and history files from git
Lewuathe Nov 13, 2015
ea5ae27
[SPARK-11629][ML][PYSPARK][DOC] Python example code for Multilayer Pe…
yanboliang Nov 13, 2015
ad96088
[SPARK-8029] Robust shuffle writer
Nov 13, 2015
ec80c0c
[SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cann…
zsxwing Nov 13, 2015
7b5d905
[SPARK-11678][SQL] Partition discovery should stop at the root path o…
yhuai Nov 13, 2015
61a2848
[SPARK-11445][DOCS] Replaced example code in mllib-ensembles.md using…
rishabhbhardwaj Nov 13, 2015
99693fe
[SPARK-11723][ML][DOC] Use LibSVM data source rather than MLUtils.loa…
yanboliang Nov 13, 2015
a244779
[SPARK-11690][PYSPARK] Add pivot to python api
aray Nov 13, 2015
23b8188
[SPARK-11654][SQL][FOLLOW-UP] fix some mistakes and clean up
cloud-fan Nov 13, 2015
d7b2b97
[SPARK-11727][SQL] Split ExpressionEncoder into FlatEncoder and Produ…
cloud-fan Nov 13, 2015
2d2411f
[SPARK-11672][ML] Set active SQLContext in MLlibTestSparkContext.befo…
mengxr Nov 13, 2015
912b943
[SPARK-11336] Add links to example codes
yinxusen Nov 13, 2015
bdfbc1d
[MINOR][ML] remove MLlibTestsSparkContext from ImpuritySuite
mengxr Nov 13, 2015
c939c70
[SPARK-7970] Skip closure cleaning for SQL operations
nitindexter Nov 14, 2015
139c15b
[SPARK-11694][SQL] Parquet logical types are not being tested properly
HyukjinKwon Nov 14, 2015
9a73b33
[MINOR][DOCS] typo in docs/configuration.md
vectorijk Nov 14, 2015
9461f5e
[SPARK-11573] Correct 'reflective access of structural type member meth…
gliptak Nov 14, 2015
22e96b8
Typo in comment: use 2 seconds instead of 1
Nov 14, 2015
d83c2f9
[SPARK-11736][SQL] Add monotonically_increasing_id to function registry.
yhuai Nov 15, 2015
d22fc10
[SPARK-11734][SQL] Rename TungstenProject -> Project, TungstenSort ->…
rxin Nov 15, 2015
64e5551
[SPARK-11672][ML] set active SQLContext in JavaDefaultReadWriteSuite
mengxr Nov 15, 2015
3e2e187
[SPARK-11738] [SQL] Making ArrayType orderable
yhuai Nov 15, 2015
72c1d68
[SPARK-10181][SQL] Do kerberos login for credentials during hive clie…
yolandagao Nov 15, 2015
d7d9fa0
[SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested l…
zero323 Nov 16, 2015
835a79d
[SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwri…
Nov 16, 2015
b58765c
[SPARK-9928][SQL] Removal of LogicalLocalTable
gatorsmile Nov 16, 2015
fd50fa4
Revert "[SPARK-11572] Exit AsynchronousListenerBus thread when stop()…
JoshRosen Nov 16, 2015
42de525
[SPARK-11745][SQL] Enable more JSON parsing options
rxin Nov 16, 2015
7f8eb3b
[SPARK-11044][SQL] Parquet writer version fixed as version1
HyukjinKwon Nov 16, 2015
e388b39
[SPARK-11692][SQL] Support for Parquet logical types, JSON and BSON …
HyukjinKwon Nov 16, 2015
0e79604
[SPARK-11522][SQL] input_file_name() returns "" for external tables
xwu0226 Nov 16, 2015
06f1fdb
[SPARK-11752] [SQL] fix timezone problem for DateTimeUtils.getSeconds
cloud-fan Nov 16, 2015
b0c3fd3
[SPARK-11743] [SQL] Add UserDefinedType support to RowEncoder
viirya Nov 16, 2015
de5e531
[SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by d…
brkyvz Nov 16, 2015
ace0db4
[SPARK-6328][PYTHON] Python API for StreamingListener
djalova Nov 16, 2015
24477d2
[SPARK-11718][YARN][CORE] Fix explicitly killed executor dies silentl…
jerryshao Nov 16, 2015
b1a9662
[SPARK-11754][SQL] consolidate `ExpressionEncoder.tuple` and `Encoder…
cloud-fan Nov 16, 2015
985b38d
[SPARK-11390][SQL] Query plan with/without filterPushdown indistingui…
Nov 16, 2015
3c02508
Revert "[SPARK-11271][SPARK-11016][CORE] Use Spark BitSet instead of …
davies Nov 16, 2015
bcea0bf
[SPARK-11742][STREAMING] Add the failure info to the batch lists
zsxwing Nov 16, 2015
3129662
[SPARK-11553][SQL] Primitive Row accessors should not convert null to…
Nov 16, 2015
75ee12f
[SPARK-8658][SQL] AttributeReference's equals method compares all the…
gatorsmile Nov 16, 2015
fd14936
[SPARK-11625][SQL] add java test for typed aggregate
cloud-fan Nov 16, 2015
ea6f53e
[SPARKR][HOTFIX] Disable flaky SparkR package build test
shivaram Nov 17, 2015
30f3cfd
[SPARK-11480][CORE][WEBUI] Wrong callsite is displayed when using Asy…
sarutak Nov 17, 2015
33a0ec9
[SPARK-11710] Document new memory management model
Nov 17, 2015
bd10eb8
[EXAMPLE][MINOR] Add missing awaitTermination in click stream example
jerryshao Nov 17, 2015
1c5475f
[SPARK-11612][ML] Pipeline and PipelineModel persistence
jkbradley Nov 17, 2015
540bf58
[SPARK-11617][NETWORK] Fix leak in TransportFrameDecoder.
Nov 17, 2015
fbad920
[SPARK-11768][SPARK-9196][SQL] Support now function in SQL (alias for…
rxin Nov 17, 2015
75d2020
[SPARK-11694][FOLLOW-UP] Clean up imports, use a common function for …
HyukjinKwon Nov 17, 2015
e01865a
[SPARK-11447][SQL] change NullType to StringType during binaryCompari…
kevinyu98 Nov 17, 2015
d79d8b0
[MINOR] [SQL] Fix randomly generated ArrayData in RowEncoderSuite
viirya Nov 17, 2015
fa13301
[SPARK-11191][SQL][FOLLOW-UP] Cleans up unnecessary anonymous HiveFun…
liancheng Nov 17, 2015
7276fa9
[SPARK-11751] Doc describe error in the "Spark Streaming Programming …
wypb Nov 17, 2015
15cc36b
[SPARK-11779][DOCS] Fix reference to deprecated MESOS_NATIVE_LIBRARY
philipphoffmann Nov 17, 2015
6fc2740
[SPARK-11744][LAUNCHER] Fix print version throw exception when using …
jerryshao Nov 17, 2015
cc567b6
[SPARK-11695][CORE] Set s3a credentials
Nov 17, 2015
21fac54
[SPARK-11766][MLLIB] add toJson/fromJson to Vector/Vectors
mengxr Nov 17, 2015
e8833dd
[SPARK-11679][SQL] Invoking method " apply(fields: java.util.List[Str…
jackieMaKing Nov 17, 2015
7b1407c
[SPARK-11089][SQL] Adds option for disabling multi-session in Thrift …
liancheng Nov 17, 2015
0158ff7
[SPARK-8658][SQL][FOLLOW-UP] AttributeReference's equals method compa…
gatorsmile Nov 17, 2015
d925149
[SPARK-10186][SQL] support postgre array type in JDBCRDD
cloud-fan Nov 17, 2015
d98d1cb
[SPARK-11769][ML] Add save, load to all basic Transformers
jkbradley Nov 17, 2015
5aca6ad
[SPARK-11767] [SQL] limit the size of caced batch
Nov 17, 2015
fa603e0
[SPARK-11732] Removes some MiMa false positives
thunterdb Nov 17, 2015
328eb49
[SPARK-11729] Replace example code in ml-linear-methods.md using incl…
yinxusen Nov 17, 2015
6eb7008
[SPARK-11763][ML] Add save,load to LogisticRegression Estimator
jkbradley Nov 17, 2015
3e9e638
[SPARK-11764][ML] make Param.jsonEncode/jsonDecode support Vector
mengxr Nov 17, 2015
936bc0b
[SPARK-11786][CORE] Tone down messages from akka error monitor.
Nov 17, 2015
928d631
[SPARK-11740][STREAMING] Fix the race condition of two checkpoints in…
zsxwing Nov 17, 2015
965245d
[SPARK-9552] Add force control for killExecutors to avoid false killi…
GraceH Nov 17, 2015
e29656f
[MINOR] Correct comments in JavaDirectKafkaWordCount
Nov 17, 2015
3720b14
[SPARK-11790][STREAMING][TESTS] Increase the connection timeout
zsxwing Nov 17, 2015
52c734b
[SPARK-11771][YARN][TRIVIAL] maximum memory in yarn is controlled by …
holdenk Nov 17, 2015
b362d50
[SPARK-11726] Throw exception on timeout when waiting for REST server…
jacek-lewandowski Nov 17, 2015
75a2922
[SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API
jerryshao Nov 18, 2015
ed8d153
[SPARK-11793][SQL] Dataset should set the resolved encoders internall…
rxin Nov 18, 2015
bf25f9b
[SPARK-11016] Move RoaringBitmap to explicit Kryo serializer
Nov 18, 2015
e33053e
[SPARK-11583] [CORE] MapStatus Using RoaringBitmap More Properly
yaooqinn Nov 18, 2015
98be816
[SPARK-11737] [SQL] Fix serialization of UTF8String with Kyro
Nov 18, 2015
91f4b6f
[SPARK-11797][SQL] collect, first, and take should use encoders for s…
rxin Nov 18, 2015
8fb775b
[SPARK-11755][R] SparkR should export "predict"
yanboliang Nov 18, 2015
446738e
[SPARK-11761] Prevent the call to StreamingContext#stop() in the list…
tedyu Nov 18, 2015
67a5132
[SPARK-7013][ML][TEST] Add unit test for spark.ml StandardScaler
RoyGao Nov 18, 2015
2f191c6
[SPARK-11643] [SQL] parse year with leading zero
Nov 18, 2015
9154f89
[SPARK-11728] Replace example code in ml-ensembles.md using include_e…
yinxusen Nov 18, 2015
8019f66
[SPARK-10186][SQL][FOLLOW-UP] simplify test
cloud-fan Nov 18, 2015
5e2b444
[SPARK-11802][SQL] Kryo-based encoder for opaque types in Datasets
rxin Nov 18, 2015
1714350
[SPARK-11792][SQL] SizeEstimator cannot provide a good size estimatio…
yhuai Nov 18, 2015
b8f4379
[SPARK-10946][SQL] JDBC - Use Statement.executeUpdate instead of Prep…
somideshmukh Nov 18, 2015
e62820c
[SPARK-6541] Sort executors by ID (numeric)
jbonofre Nov 18, 2015
9631ca3
[SPARK-11652][CORE] Remote code execution with InvokerTransformer
srowen Nov 18, 2015
1429e0a
rmse was wrongly calculated
Nov 18, 2015
3a6807f
[SPARK-11804] [PYSPARK] Exception raise when using Jdbc predicates opt…
zjffdu Nov 18, 2015
a97d6f3
[SPARK-11281][SPARKR] Add tests covering the issue.
zero323 Nov 18, 2015
224723e
[SPARK-11773][SPARKR] Implement collection functions in SparkR.
Nov 18, 2015
3cca5ff
[SPARK-11195][CORE] Use correct classloader for TaskResultGetter
Nov 18, 2015
cffb899
[SPARK-11803][SQL] fix Dataset self-join
cloud-fan Nov 18, 2015
33b8373
[SPARK-11725][SQL] correctly handle null inputs for UDF
cloud-fan Nov 18, 2015
dbf428c
[SPARK-11795][SQL] combine grouping attributes into a single NamedExp…
cloud-fan Nov 18, 2015
90a7519
[MINOR][BUILD] Ignore ensime cache
jodersky Nov 18, 2015
6f99522
[SPARK-11792] [SQL] [FOLLOW-UP] Change SizeEstimation to KnownSizeEst…
yhuai Nov 18, 2015
94624ea
[SPARK-11739][SQL] clear the instantiated SQLContext
Nov 18, 2015
31921e0
[SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method sh…
BryanCutler Nov 18, 2015
a416e41
[SPARK-11809] Switch the default Mesos mode to coarse-grained mode
rxin Nov 18, 2015
7c5b641
[SPARK-10745][CORE] Separate configs between shuffle and RPC
zsxwing Nov 18, 2015
09ad953
[SPARK-11720][SQL][ML] Handle edge cases when count = 0 or 1 for Stat…
JihongMA Nov 18, 2015
045a4f0
[SPARK-6790][ML] Add spark.ml LinearRegression import/export
fayeshine Nov 18, 2015
2acdf10
[SPARK-6789][ML] Add Readable, Writable support for spark.ml ALS, ALS…
jkbradley Nov 18, 2015
e391abd
[SPARK-11813][MLLIB] Avoid serialization of vocab in Word2Vec
hhbyyh Nov 18, 2015
e222d75
[SPARK-11684][R][ML][DOC] Update SparkR glm API doc, user guide and e…
yanboliang Nov 18, 2015
603a721
[SPARK-11820][ML][PYSPARK] PySpark LiR & LoR should support weightCol
yanboliang Nov 18, 2015
54db797
[SPARK-11544][SQL] sqlContext doesn't use PathFilter
dilipbiswal Nov 18, 2015
5df0894
[SPARK-11810][SQL] Java-based encoder for opaque types in Datasets.
rxin Nov 18, 2015
7e987de
[SPARK-6787][ML] add read/write to estimators under ml.feature (1)
mengxr Nov 18, 2015
3a98519
[SPARK-11649] Properly set Akka frame size in SparkListenerSuite test
JoshRosen Nov 18, 2015
c07a50b
[SPARK-10930] History "Stages" page "duration" can be confusing
Nov 18, 2015
4b11712
[SPARK-11495] Fix potential socket / file handle leaks that were foun…
JoshRosen Nov 19, 2015
a402c92
[SPARK-11814][STREAMING] Add better default checkpoint duration
tdas Nov 19, 2015
921900f
[SPARK-11791] Fix flaky test in BatchedWriteAheadLogSuite
brkyvz Nov 19, 2015
59a5013
[SPARK-11636][SQL] Support classes defined in the REPL with Encoders
marmbrus Nov 19, 2015
e99d339
[SPARK-11839][ML] refactor save/write traits
mengxr Nov 19, 2015
e61367b
[SPARK-11833][SQL] Add Java tests for Kryo/Java Dataset encoders
rxin Nov 19, 2015
6d0848b
[SPARK-11787][SQL] Improve Parquet scan performance when using flat s…
nongli Nov 19, 2015
9c0654d
Revert "[SPARK-11544][SQL] sqlContext doesn't use PathFilter"
yhuai Nov 19, 2015
67c7582
[SPARK-11816][ML] fix some style issue in ML/MLlib examples
hhbyyh Nov 19, 2015
fc3f77b
[SPARK-11614][SQL] serde parameters should be set only when all param…
navis Nov 19, 2015
d02d5b9
[SPARK-11842][ML] Small cleanups to existing Readers and Writers
jkbradley Nov 19, 2015
1a93323
[SPARK-11339][SPARKR] Document the list of functions in R base packag…
felixcheung Nov 19, 2015
f449992
[SPARK-11849][SQL] Analyzer should replace current_date and current_t…
rxin Nov 19, 2015
9628788
[SPARK-11840][SQL] Restore the 1.5's behavior of planning a single di…
yhuai Nov 19, 2015
72d150c
[SPARK-11830][CORE] Make NettyRpcEnv bind to the specified host
zsxwing Nov 19, 2015
276a7e1
[SPARK-11633][SQL] LogicalRDD throws TreeNode Exception : Failed to C…
gatorsmile Nov 19, 2015
7d4aba1
[SPARK-11848][SQL] Support EXPLAIN in DataSet APIs
gatorsmile Nov 19, 2015
47d1c23
[SPARK-11750][SQL] revert SPARK-11727 and code clean up
cloud-fan Nov 19, 2015
4700074
[SPARK-11778][SQL] parse table name before it is passed to lookupRela…
Nov 19, 2015
599a8c6
[SPARK-11812][PYSPARK] invFunc=None works properly with python's redu…
dtolpin Nov 19, 2015
014c0f7
[SPARK-11858][SQL] Move sql.columnar into sql.execution.
rxin Nov 19, 2015
90d384d
[SPARK-11831][CORE][TESTS] Use port 0 to avoid port conflicts in tests
zsxwing Nov 19, 2015
3bd77b2
[SPARK-11799][CORE] Make it explicit in executor logs that uncaught e…
Nov 19, 2015
f7135ed
[SPARK-11828][CORE] Register DAGScheduler metrics source after app id…
Nov 19, 2015
01403aa
[SPARK-11746][CORE] Use cache-aware method dependencies
suyanNone Nov 19, 2015
37cff1b
[SPARK-11275][SQL] Incorrect results when using rollup/cube
aray Nov 19, 2015
880128f
[SPARK-4134][CORE] Lower severity of some executor loss logs.
Nov 20, 2015
b2cecb8
[SPARK-11845][STREAMING][TEST] Added unit test to verify TrackStateRD…
tdas Nov 20, 2015
ee21407
[SPARK-11864][SQL] Improve performance of max/min
Nov 20, 2015
7ee7d5a
[SPARK-11544][SQL][TEST-HADOOP1.0] sqlContext doesn't use PathFilter
dilipbiswal Nov 20, 2015
4114ce2
[SPARK-11846] Add save/load for AFTSurvivalRegression and IsotonicReg…
yinxusen Nov 20, 2015
3b7f056
[SPARK-11829][ML] Add read/write to estimators under ml.feature (II)
yanboliang Nov 20, 2015
7216f40
[SPARK-11875][ML][PYSPARK] Update doc for PySpark HasCheckpointInterval
yanboliang Nov 20, 2015
0fff8eb
[SPARK-11869][ML] Clean up TempDirectory properly in ML tests
jkbradley Nov 20, 2015
3e1d120
[SPARK-11867] Add save/load for kmeans and naive bayes
yinxusen Nov 20, 2015
a66142d
[SPARK-11877] Prevent agg. fallback conf. from leaking across test su…
JoshRosen Nov 20, 2015
9ace2e5
[SPARK-11852][ML] StandardScaler minor refactor
yanboliang Nov 20, 2015
e359d5d
[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml
hhbyyh Nov 20, 2015
bef361c
[SPARK-11876][SQL] Support printSchema in DataSet API
gatorsmile Nov 20, 2015
60bfb11
[SPARK-11817][SQL] Truncating the fractional seconds to prevent inser…
viirya Nov 20, 2015
3b9d2a3
[SPARK-11819][SQL] nice error message for missing encoder
cloud-fan Nov 20, 2015
652def3
[SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite test
JoshRosen Nov 20, 2015
9ed4ad4
[SPARK-11724][SQL] Change casting between int and timestamp to consis…
nongli Nov 20, 2015
be7a2cf
[SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in Transform…
zsxwing Nov 20, 2015
89fd9bd
[SPARK-11887] Close PersistenceEngine at the end of PersistenceEngine…
JoshRosen Nov 20, 2015
03ba56d
[SPARK-11716][SQL] UDFRegistration just drops the input type when re-…
jbonofre Nov 20, 2015
a6239d5
[SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help…
felixcheung Nov 20, 2015
4b84c72
[SPARK-11636][SQL] Support classes defined in the REPL with Encoders
marmbrus Nov 20, 2015
ed47b1e
[SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.…
Nov 20, 2015
58b4e4f
[SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch.
nongli Nov 20, 2015
968acf3
[SPARK-11889][SQL] Fix type inference for GroupedDataset.agg in REPL
marmbrus Nov 20, 2015
68ed046
[SPARK-11890][SQL] Fix compilation for Scala 2.11
marmbrus Nov 20, 2015
4781587
[HOTFIX] Fix Java Dataset Tests
marmbrus Nov 21, 2015
a2dce22
Revert "[SPARK-11689][ML] Add user guide and example code for LDA und…
mengxr Nov 21, 2015
7d3f922
[SPARK-11819][SQL][FOLLOW-UP] fix scala 2.11 build
cloud-fan Nov 21, 2015
54328b6
[SPARK-11900][SQL] Add since version for all encoders
rxin Nov 21, 2015
5967102
[SPARK-11901][SQL] API audit for Aggregator.
rxin Nov 21, 2015
ff442bb
[SPARK-11899][SQL] API audit for GroupedDataset.
rxin Nov 21, 2015
426004a
[SPARK-11908][SQL] Add NullType support to RowEncoder
viirya Nov 22, 2015
fe89c18
[SPARK-11895][ML] rename and refactor DatasetExample under mllib/exam…
mengxr Nov 23, 2015
a6fda0b
[SPARK-6791][ML] Add read/write for CrossValidator and Evaluators
jkbradley Nov 23, 2015
fc4b792
[SPARK-11835] Adds a sidebar menu to MLlib's documentation
thunterdb Nov 23, 2015
d9cf9c2
[SPARK-11912][ML] ml.feature.PCA minor refactor
yanboliang Nov 23, 2015
4be360d
[SPARK-11902][ML] Unhandled case in VectorAssembler#transform
BenFradet Nov 23, 2015
94ce65d
[SPARK-11628][SQL] support column datatype of char(x) to recognize Hi…
xguo27 Nov 23, 2015
1a5baaa
[SPARK-11894][SQL] fix isNull for GetInternalRowField
cloud-fan Nov 23, 2015
f2996e0
[SPARK-11921][SQL] fix `nullable` of encoder schema
cloud-fan Nov 23, 2015
946b406
[SPARK-11913][SQL] support typed aggregate with complex buffer schema
cloud-fan Nov 23, 2015
5fd86e4
[SPARK-7173][YARN] Add label expression support for application master
jerryshao Nov 23, 2015
5231cd5
[SPARK-11762][NETWORK] Account for active streams when couting outsta…
Nov 23, 2015
98d7ec7
[SPARK-11920][ML][DOC] ML LinearRegression should use correct dataset…
yanboliang Nov 23, 2015
f6dcc6e
[SPARK-11837][EC2] python3 compatibility for launching ec2 m3 instances
mortada Nov 23, 2015
1b6e938
[SPARK-4424] Remove spark.driver.allowMultipleContexts override in tests
JoshRosen Nov 23, 2015
1d91202
[SPARK-11836][SQL] udf/cast should not create new SQLContext
Nov 23, 2015
242be7d
[SPARK-11910][STREAMING][DOCS] Update twitter4j dependency version
lresende Nov 23, 2015
7cfa4c6
[SPARK-11865][NETWORK] Avoid returning inactive client in TransportCl…
Nov 23, 2015
c2467da
[SPARK-11140][CORE] Transfer files using network lib when using Netty…
Nov 23, 2015
9db5f60
[SPARK-9866][SQL] Speed up VersionsSuite by using persistent Ivy cache
JoshRosen Nov 24, 2015
1057456
[SPARK-10560][PYSPARK][MLLIB][DOCS] Make StreamingLogisticRegressionW…
BryanCutler Nov 24, 2015
026ea2e
Updated sql programming guide to include jdbc fetch size
sksamuel Nov 24, 2015
8d57524
[SPARK-11933][SQL] Rename mapGroup -> mapGroups and flatMapGroup -> f…
rxin Nov 24, 2015
6cf51a7
[SPARK-11903] Remove --skip-java-test
nchammas Nov 24, 2015
4021a28
[SPARK-10707][SQL] Fix nullability computation in union output
mbautin Nov 24, 2015
12eea83
[SPARK-11897][SQL] Add @scala.annotations.varargs to sql functions
xguo27 Nov 24, 2015
800bd79
[SPARK-11906][WEB UI] Speculation Tasks Cause ProgressBar UI Overflow
saurfang Nov 24, 2015
d4a5e6f
[SPARK-11043][SQL] BugFix:Set the operator log in the thrift server.
SaintBacchus Nov 24, 2015
5889880
[SPARK-11592][SQL] flush spark-sql command line history to history file
adrian-wang Nov 24, 2015
be9dd15
[SPARK-11818][REPL] Fix ExecutorClassLoader to lookup resources from …
HeartSaVioR Nov 24, 2015
e5aaae6
[SPARK-11942][SQL] fix encoder life cycle for CoGroup
cloud-fan Nov 24, 2015
56a0aba
[SPARK-11952][ML] Remove duplicate ml examples
yanboliang Nov 24, 2015
9e24ba6
[SPARK-11521][ML][DOC] Document that Logistic, Linear Regression summ…
jkbradley Nov 24, 2015
52bc25c
[SPARK-11847][ML] Model export/import for spark.ml: LDA
hhbyyh Nov 24, 2015
19530da
[SPARK-11926][SQL] unify GetStructField and GetInternalRowField
cloud-fan Nov 24, 2015
8101254
[SPARK-11872] Prevent the call to SparkContext#stop() in the listener…
tedyu Nov 24, 2015
f315272
[SPARK-11946][SQL] Audit pivot API for 1.6.
rxin Nov 24, 2015
e6dd237
[SPARK-11929][CORE] Make the repl log4j configuration override the ro…
Nov 24, 2015
58d9b26
[SPARK-11805] free the array in UnsafeExternalSorter during spilling
Nov 24, 2015
34ca392
Added a line of comment to explain why the extra sort exists in pivot.
rxin Nov 24, 2015
c7f95df
[SPARK-11783][SQL] Fixes execution Hive client when using remote Hive…
liancheng Nov 24, 2015
238ae51
[SPARK-11914][SQL] Support coalesce and repartition in Dataset APIs
gatorsmile Nov 24, 2015
25bbd3c
[SPARK-11967][SQL] Consistent use of varargs for multiple paths in Da…
rxin Nov 25, 2015
4d6bbbc
[SPARK-11947][SQL] Mark deprecated methods with "This will be removed…
rxin Nov 25, 2015
a5d9887
[STREAMING][FLAKY-TEST] Catch execution context race condition in `Fi…
brkyvz Nov 25, 2015
151d7c2
[SPARK-10621][SQL] Consistent naming for functions in SQL, Python, Scala
rxin Nov 25, 2015
2169886
[SPARK-11979][STREAMING] Empty TrackStateRDD cannot be checkpointed a…
tdas Nov 25, 2015
2610e06
[SPARK-11970][SQL] Adding JoinType into JoinWith and support Sample i…
gatorsmile Nov 25, 2015
a0f1a11
[SPARK-11981][SQL] Move implementations of methods back to DataFrame …
rxin Nov 25, 2015
6385002
[SPARK-11686][CORE] Issue WARN when dynamic allocation is disabled du…
Nov 25, 2015
b9b6fbe
[SPARK-11860][PYSAPRK][DOCUMENTATION] Invalid argument specification …
zjffdu Nov 25, 2015
0a5aef7
[SPARK-10666][SPARK-6880][CORE] Use properties from ActiveJob associa…
markhamstra Nov 25, 2015
c1f85fc
[SPARK-11956][CORE] Fix a few bugs in network lib-based file transfer.
Nov 25, 2015
faabdfa
[SPARK-11984][SQL][PYTHON] Fix typos in doc for pivot for scala and p…
felixcheung Nov 25, 2015
6b78157
[SPARK-11974][CORE] Not all the temp dirs had been deleted when the J…
pzzs Nov 25, 2015
Revert "[SPARK-11271][SPARK-11016][CORE] Use Spark BitSet instead of RoaringBitmap to reduce memory usage"

This reverts commit e209fa2.
davies committed Nov 16, 2015
commit 3c025087b58f475a9bcb5c8f4b2b2df804915b2b
4 changes: 4 additions & 0 deletions core/pom.xml
@@ -177,6 +177,10 @@
<groupId>net.jpountz.lz4</groupId>
<artifactId>lz4</artifactId>
</dependency>
+<dependency>
+  <groupId>org.roaringbitmap</groupId>
+  <artifactId>RoaringBitmap</artifactId>
+</dependency>
<dependency>
<groupId>commons-net</groupId>
<artifactId>commons-net</artifactId>
13 changes: 7 additions & 6 deletions core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
@@ -19,8 +19,9 @@ package org.apache.spark.scheduler

import java.io.{Externalizable, ObjectInput, ObjectOutput}

+import org.roaringbitmap.RoaringBitmap

import org.apache.spark.storage.BlockManagerId
-import org.apache.spark.util.collection.BitSet
import org.apache.spark.util.Utils

/**
@@ -132,7 +133,7 @@ private[spark] class CompressedMapStatus(
private[spark] class HighlyCompressedMapStatus private (
private[this] var loc: BlockManagerId,
private[this] var numNonEmptyBlocks: Int,
-private[this] var emptyBlocks: BitSet,
+private[this] var emptyBlocks: RoaringBitmap,
private[this] var avgSize: Long)
extends MapStatus with Externalizable {

@@ -145,7 +146,7 @@ private[spark] class HighlyCompressedMapStatus private (
override def location: BlockManagerId = loc

override def getSizeForBlock(reduceId: Int): Long = {
-if (emptyBlocks.get(reduceId)) {
+if (emptyBlocks.contains(reduceId)) {
0
} else {
avgSize
@@ -160,7 +161,7 @@ private[spark] class HighlyCompressedMapStatus private (

override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException {
loc = BlockManagerId(in)
-emptyBlocks = new BitSet
+emptyBlocks = new RoaringBitmap()
emptyBlocks.readExternal(in)
avgSize = in.readLong()
}
@@ -176,15 +177,15 @@ private[spark] object HighlyCompressedMapStatus {
// From a compression standpoint, it shouldn't matter whether we track empty or non-empty
// blocks. From a performance standpoint, we benefit from tracking empty blocks because
// we expect that there will be far fewer of them, so we will perform fewer bitmap insertions.
+val emptyBlocks = new RoaringBitmap()
val totalNumBlocks = uncompressedSizes.length
-val emptyBlocks = new BitSet(totalNumBlocks)
while (i < totalNumBlocks) {
var size = uncompressedSizes(i)
if (size > 0) {
numNonEmptyBlocks += 1
totalSize += size
} else {
-emptyBlocks.set(i)
+emptyBlocks.add(i)
}
i += 1
}
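The hunks above restore RoaringBitmap as the container tracking empty blocks in HighlyCompressedMapStatus. The compression idea itself is independent of which bitmap class is used: remember exactly which blocks are empty, and report one average size for everything else. A minimal stand-alone sketch of that idea (a toy class using `java.util.BitSet` as a stand-in — not Spark's API):

```java
import java.util.Arrays;
import java.util.BitSet;

// Toy sketch of the HighlyCompressedMapStatus idea (not Spark's API):
// record which blocks are empty in a bitmap, and report a single average
// size for every non-empty block instead of storing every block size.
public class MapStatusSketch {
    static long[] estimate(long[] uncompressedSizes) {
        BitSet emptyBlocks = new BitSet(uncompressedSizes.length);
        long totalSize = 0;
        int numNonEmpty = 0;
        for (int i = 0; i < uncompressedSizes.length; i++) {
            if (uncompressedSizes[i] > 0) {
                numNonEmpty++;
                totalSize += uncompressedSizes[i];
            } else {
                emptyBlocks.set(i); // analogous to emptyBlocks.add(i) in the patch
            }
        }
        long avgSize = numNonEmpty > 0 ? totalSize / numNonEmpty : 0;
        long[] out = new long[uncompressedSizes.length];
        for (int i = 0; i < out.length; i++) {
            // analogous to getSizeForBlock: 0 for a known-empty block, avgSize otherwise
            out[i] = emptyBlocks.get(i) ? 0 : avgSize;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(estimate(new long[]{0, 100, 0, 300})));
        // prints [0, 200, 0, 200]
    }
}
```

As the in-code comment in the patch notes, tracking empty rather than non-empty blocks pays off because most shuffle blocks are expected to be non-empty, so far fewer bitmap insertions are needed.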
@@ -30,6 +30,7 @@ import com.esotericsoftware.kryo.io.{Input => KryoInput, Output => KryoOutput}
import com.esotericsoftware.kryo.serializers.{JavaSerializer => KryoJavaSerializer}
import com.twitter.chill.{AllScalaRegistrar, EmptyScalaKryoInstantiator}
import org.apache.avro.generic.{GenericData, GenericRecord}
+import org.roaringbitmap.{ArrayContainer, BitmapContainer, RoaringArray, RoaringBitmap}

import org.apache.spark._
import org.apache.spark.api.python.PythonBroadcast
@@ -38,7 +39,7 @@ import org.apache.spark.network.util.ByteUnit
import org.apache.spark.scheduler.{CompressedMapStatus, HighlyCompressedMapStatus}
import org.apache.spark.storage._
import org.apache.spark.util.{Utils, BoundedPriorityQueue, SerializableConfiguration, SerializableJobConf}
-import org.apache.spark.util.collection.{BitSet, CompactBuffer}
+import org.apache.spark.util.collection.CompactBuffer

/**
* A Spark serializer that uses the [[https://code.google.com/p/kryo/ Kryo serialization library]].
@@ -362,7 +363,12 @@ private[serializer] object KryoSerializer {
classOf[StorageLevel],
classOf[CompressedMapStatus],
classOf[HighlyCompressedMapStatus],
-classOf[BitSet],
+classOf[RoaringBitmap],
+classOf[RoaringArray],
+classOf[RoaringArray.Element],
+classOf[Array[RoaringArray.Element]],
+classOf[ArrayContainer],
+classOf[BitmapContainer],
classOf[CompactBuffer[_]],
classOf[BlockManagerId],
classOf[Array[Byte]],
28 changes: 3 additions & 25 deletions core/src/main/scala/org/apache/spark/util/collection/BitSet.scala
@@ -17,21 +17,14 @@

package org.apache.spark.util.collection

-import java.io.{Externalizable, ObjectInput, ObjectOutput}

-import org.apache.spark.util.{Utils => UUtils}


/**
* A simple, fixed-size bit set implementation. This implementation is fast because it avoids
* safety/bound checking.
*/
-class BitSet(private[this] var numBits: Int) extends Externalizable {
+class BitSet(numBits: Int) extends Serializable {

-private var words = new Array[Long](bit2words(numBits))
-private def numWords = words.length

-def this() = this(0)
+private val words = new Array[Long](bit2words(numBits))
+private val numWords = words.length

/**
* Compute the capacity (number of bits) that can be represented
@@ -237,19 +230,4 @@ class BitSet(private[this] var numBits: Int) extends Externalizable {

/** Return the number of longs it would take to hold numBits. */
private def bit2words(numBits: Int) = ((numBits - 1) >> 6) + 1

-override def writeExternal(out: ObjectOutput): Unit = UUtils.tryOrIOException {
-out.writeInt(numBits)
-words.foreach(out.writeLong(_))
-}
-
-override def readExternal(in: ObjectInput): Unit = UUtils.tryOrIOException {
-numBits = in.readInt()
-words = new Array[Long](bit2words(numBits))
-var index = 0
-while (index < words.length) {
-words(index) = in.readLong()
-index += 1
-}
-}
}
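After the revert, BitSet.scala is back to a simple fixed-size class that packs bits into 64-bit words; the only non-obvious part is the index arithmetic. A minimal Java stand-in (a hypothetical `FixedBitSet`, mirroring the word math in the diff above, not Spark's class):

```java
// Minimal fixed-size bit set, mirroring the word arithmetic in BitSet.scala:
// bit i lives in word (i >> 6), at position (i & 0x3f) inside that 64-bit word.
public class FixedBitSet {
    private final long[] words;

    public FixedBitSet(int numBits) {
        words = new long[bit2words(numBits)];
    }

    // Number of longs needed to hold numBits; same as ((numBits - 1) >> 6) + 1.
    static int bit2words(int numBits) {
        return ((numBits - 1) >> 6) + 1;
    }

    public void set(int index) {
        words[index >> 6] |= 1L << (index & 0x3f);
    }

    public boolean get(int index) {
        return (words[index >> 6] & (1L << (index & 0x3f))) != 0;
    }

    public static void main(String[] args) {
        FixedBitSet bits = new FixedBitSet(100);
        bits.set(0);
        bits.set(90);
        System.out.println(bits.get(90) + " " + bits.get(1));      // prints true false
        System.out.println(bit2words(64) + " " + bit2words(65));   // prints 1 2
    }
}
```

As the Scaladoc in the diff says, speed comes from skipping safety/bound checks: an out-of-range index here throws or silently corrupts rather than growing the set.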
@@ -322,6 +322,12 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext {
val conf = new SparkConf(false)
conf.set("spark.kryo.registrationRequired", "true")

+// these cases require knowing the internals of RoaringBitmap a little. Blocks span 2^16
+// values, and they use a bitmap (dense) if they have more than 4096 values, and an
+// array (sparse) if they use less. So we just create two cases, one sparse and one dense.
+// and we use a roaring bitmap for the empty blocks, so we trigger the dense case w/ mostly
+// empty blocks

val ser = new KryoSerializer(conf).newInstance()
val denseBlockSizes = new Array[Long](5000)
val sparseBlockSizes = Array[Long](0L, 1L, 0L, 2L)
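The 4096-value threshold mentioned in the test comment above falls out of simple arithmetic, assuming (as the comment describes) that a sparse array container stores one 16-bit entry per value while a dense bitmap container is a fixed 2^16-bit bitmap. A back-of-envelope sketch:

```java
// Why RoaringBitmap switches containers at 4096 values per 2^16-value block
// (assumed layout, per the test comment): a sparse array container costs
// 2 bytes per stored value; a dense bitmap container is a fixed
// 2^16 bits = 8192 bytes regardless of how many bits are set.
public class ContainerCost {
    static int arrayBytes(int n) {
        return 2 * n; // one 16-bit entry per stored value
    }

    static int bitmapBytes() {
        return (1 << 16) / 8; // 8192 bytes, independent of cardinality
    }

    public static void main(String[] args) {
        System.out.println(arrayBytes(4095) < bitmapBytes()); // prints true: sparse wins
        System.out.println(arrayBytes(4096) < bitmapBytes()); // prints false: 8192 == 8192
    }
}
```

So below 4096 values the array is strictly smaller; at 4096 the two representations cost the same 8192 bytes, and beyond that the bitmap wins, which is why the test builds one sparse and one dense case around that boundary.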
@@ -17,10 +17,7 @@

package org.apache.spark.util.collection

-import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

import org.apache.spark.SparkFunSuite
-import org.apache.spark.util.{Utils => UUtils}

class BitSetSuite extends SparkFunSuite {

@@ -155,50 +152,4 @@ class BitSetSuite extends SparkFunSuite {
assert(bitsetDiff.nextSetBit(85) === 85)
assert(bitsetDiff.nextSetBit(86) === -1)
}

-test("read and write externally") {
-val tempDir = UUtils.createTempDir()
-val outputFile = File.createTempFile("bits", null, tempDir)
-
-val fos = new FileOutputStream(outputFile)
-val oos = new ObjectOutputStream(fos)
-
-// Create BitSet
-val setBits = Seq(0, 9, 1, 10, 90, 96)
-val bitset = new BitSet(100)
-
-for (i <- 0 until 100) {
-assert(!bitset.get(i))
-}
-
-setBits.foreach(i => bitset.set(i))
-
-for (i <- 0 until 100) {
-if (setBits.contains(i)) {
-assert(bitset.get(i))
-} else {
-assert(!bitset.get(i))
-}
-}
-assert(bitset.cardinality() === setBits.size)
-
-bitset.writeExternal(oos)
-oos.close()
-
-val fis = new FileInputStream(outputFile)
-val ois = new ObjectInputStream(fis)
-
-// Read BitSet from the file
-val bitset2 = new BitSet(0)
-bitset2.readExternal(ois)
-
-for (i <- 0 until 100) {
-if (setBits.contains(i)) {
-assert(bitset2.get(i))
-} else {
-assert(!bitset2.get(i))
-}
-}
-assert(bitset2.cardinality() === setBits.size)
-}
}
5 changes: 5 additions & 0 deletions pom.xml
@@ -634,6 +634,11 @@
</exclusion>
</exclusions>
</dependency>
+<dependency>
+  <groupId>org.roaringbitmap</groupId>
+  <artifactId>RoaringBitmap</artifactId>
+  <version>0.4.5</version>
+</dependency>
<dependency>
<groupId>commons-net</groupId>
<artifactId>commons-net</artifactId>