Merged
Changes from 1 commit
Commits
189 commits
ae9e424
[SQL][MINOR][TEST] Set spark.unsafe.exceptionOnMemoryLeak to true
gatorsmile Aug 17, 2017
6aad02d
[SPARK-18394][SQL] Make an AttributeSet.toSeq output order consistent
maropu Aug 17, 2017
bfdc361
[SPARK-16742] Mesos Kerberos Support
Aug 17, 2017
7ab9518
[SPARK-21677][SQL] json_tuple throws NullPointException when column i…
jmchung Aug 17, 2017
2caaed9
[SPARK-21767][TEST][SQL] Add Decimal Test For Avro in VersionSuite
gatorsmile Aug 17, 2017
310454b
[SPARK-21739][SQL] Cast expression should initialize timezoneId when …
DonnyZone Aug 18, 2017
07a2b87
[SPARK-21778][SQL] Simpler Dataset.sample API in Scala / Java
rxin Aug 18, 2017
23ea898
[SPARK-21213][SQL] Support collecting partition-level statistics: row…
mbasmanova Aug 18, 2017
7880909
[SPARK-21743][SQL][FOLLOW-UP] top-most limit should not cause memory …
cloud-fan Aug 18, 2017
a2db5c5
[MINOR][TYPO] Fix typos: runnning and Excecutors
ash211 Aug 18, 2017
10be018
[SPARK-21566][SQL][PYTHON] Python method for summary
aray Aug 19, 2017
72b738d
[SPARK-21790][TESTS] Fix Docker-based Integration Test errors.
wangyum Aug 19, 2017
73e04ec
[MINOR] Correct validateAndTransformSchema in GaussianMixture and AFT…
sharp-pixel Aug 20, 2017
41e0eb7
[SPARK-21773][BUILD][DOCS] Installs mkdocs if missing in the path in …
HyukjinKwon Aug 20, 2017
28a6cca
[SPARK-21721][SQL][FOLLOWUP] Clear FileSystem deleteOnExit cache when…
viirya Aug 20, 2017
77d046e
[SPARK-21782][CORE] Repartition creates skews when numPartitions is a…
megaserg Aug 21, 2017
b3a0752
[SPARK-21718][SQL] Heavy log of type: "Skipping partition based on st…
srowen Aug 21, 2017
988b84d
[SPARK-21468][PYSPARK][ML] Python API for FeatureHasher
Aug 21, 2017
ba84329
[SPARK-21790][TESTS][FOLLOW-UP] Add filter pushdown verification back.
wangyum Aug 21, 2017
84b5b16
[SPARK-21617][SQL] Store correct table metadata when altering schema …
Aug 21, 2017
c108a5d
[SPARK-19762][ML][FOLLOWUP] Add necessary comments to L2Regularization.
yanboliang Aug 22, 2017
751f513
[SPARK-21070][PYSPARK] Attempt to update cloudpickle again
rgbkrk Aug 22, 2017
5c9b301
[SPARK-21584][SQL][SPARKR] Update R method for summary to call new im…
aray Aug 22, 2017
be72b15
[SPARK-21803][TEST] Remove the HiveDDLCommandSuite
gatorsmile Aug 22, 2017
3ed1ae1
[SPARK-20641][CORE] Add missing kvstore module in Laucher and SparkSu…
jerryshao Aug 22, 2017
43d71d9
[SPARK-21499][SQL] Support creating persistent function for Spark UDA…
gatorsmile Aug 22, 2017
01a8e46
[SPARK-21769][SQL] Add a table-specific option for always respecting …
gatorsmile Aug 22, 2017
d56c262
[SPARK-21681][ML] fix bug of MLOR do not work correctly when featureS…
WeichenXu123 Aug 22, 2017
41bb1dd
[SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Values from Esti…
BryanCutler Aug 23, 2017
3c0c2d0
[SPARK-21765] Set isStreaming on leaf nodes for streaming plans.
Aug 23, 2017
3429619
[ML][MINOR] Make sharedParams update.
yanboliang Aug 23, 2017
d58a350
[SPARK-19326] Speculated task attempts do not get launched in few sce…
janewangfb Aug 23, 2017
d6b30ed
[SPARK-12664][ML] Expose probability in mlp model
WeichenXu123 Aug 23, 2017
1662e93
[SPARK-21501] Change CacheLoader to limit entries based on memory foo…
Aug 23, 2017
6942aee
[SPARK-21603][SQL][FOLLOW-UP] Change the default value of maxLinesPer…
maropu Aug 23, 2017
b8aaef4
[SPARK-21807][SQL] Override ++ operation in ExpressionSet to reduce c…
eatoncys Aug 24, 2017
43cbfad
[SPARK-21805][SPARKR] Disable R vignettes code on Windows
felixcheung Aug 24, 2017
ce0d3bb
[SPARK-21694][MESOS] Support Mesos CNI network labels
susanxhuynh Aug 24, 2017
846bc61
[MINOR][SQL] The comment of Class ExchangeCoordinator exist a typing …
figo77 Aug 24, 2017
95713eb
[SPARK-21804][SQL] json_tuple returns null values within repeated col…
jmchung Aug 24, 2017
dc5d34d
[SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments sh…
HyukjinKwon Aug 24, 2017
9e33954
[SPARK-21745][SQL] Refactor ColumnVector hierarchy to make ColumnVect…
ueshin Aug 24, 2017
183d4cb
[SPARK-21759][SQL] In.checkInputDataTypes should not wrongly report u…
viirya Aug 24, 2017
2dd37d8
[SPARK-21826][SQL] outer broadcast hash join should not throw NPE
cloud-fan Aug 24, 2017
d3abb36
[SPARK-21788][SS] Handle more exceptions when stopping a streaming query
zsxwing Aug 24, 2017
763b83e
[SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF` and ` SO_SN…
Aug 24, 2017
05af2de
[SPARK-21830][SQL] Bump ANTLR version and fix a few issues.
hvanhovell Aug 24, 2017
f3676d6
[SPARK-21108][ML] convert LinearSVC to aggregator framework
YY-OnCall Aug 25, 2017
7d16776
[SPARK-21255][SQL][WIP] Fixed NPE when creating encoder for enum
mike0sv Aug 25, 2017
574ef6c
[SPARK-21527][CORE] Use buffer limit in order to use JAVA NIO Util's …
caneGuy Aug 25, 2017
de7af29
[MINOR][BUILD] Fix build warnings and Java lint errors
srowen Aug 25, 2017
1f24cee
[SPARK-21832][TEST] Merge SQLBuilderTest into ExpressionSQLBuilderSuite
dongjoon-hyun Aug 25, 2017
1813c4a
[SPARK-21714][CORE][YARN] Avoiding re-uploading remote resources in y…
jerryshao Aug 25, 2017
628bdea
[SPARK-17742][CORE] Fail launcher app handle if child process exits w…
Aug 25, 2017
51620e2
[SPARK-21756][SQL] Add JSON option to allow unquoted control characters
vinodkc Aug 25, 2017
1a598d7
[SPARK-21837][SQL][TESTS] UserDefinedTypeSuite Local UDTs not actuall…
srowen Aug 25, 2017
522e1f8
[SPARK-21831][TEST] Remove `spark.sql.hive.convertMetastoreOrc` confi…
dongjoon-hyun Aug 26, 2017
3b66b1c
[MINOR][DOCS] Minor doc fixes related with doc build and uses script …
HyukjinKwon Aug 26, 2017
07142cf
[SPARK-21843] testNameNote should be "(minNumPostShufflePartitions: 5)"
iamhumanbeing Aug 27, 2017
0456b40
[SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSummarizer.vari…
WeichenXu123 Aug 28, 2017
24e6c18
[SPARK-21798] No config to replace deprecated SPARK_CLASSPATH config …
Aug 28, 2017
73e64f7
[SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit Test coverage …
erenavsarogullari Aug 28, 2017
c7270a4
[SPARK-17139][ML] Add model summary for MultinomialLogisticRegression
WeichenXu123 Aug 28, 2017
32fa0b8
[SPARK-21781][SQL] Modify DataSourceScanExec to use concrete ColumnVe…
ueshin Aug 29, 2017
8fcbda9
[SPARK-21848][SQL] Add trait UserDefinedExpression to identify user-d…
gengliangwang Aug 29, 2017
6327ea5
[SPARK-21255][SQL] simplify encoder for java enum
cloud-fan Aug 29, 2017
6077e3e
[SPARK-21801][SPARKR][TEST] unit test randomly fail with randomforest
felixcheung Aug 29, 2017
840ba05
[MINOR][ML] Document treatment of instance weights in logreg summary
jkbradley Aug 29, 2017
d7b1fcf
[SPARK-21728][CORE] Allow SparkSubmit to use Logging.
Aug 29, 2017
fba9cc8
[SPARK-21813][CORE] Modify TaskMemoryManager.MAXIMUM_PAGE_SIZE_BYTES …
Geek-He Aug 29, 2017
3d0e174
[SPARK-21845][SQL] Make codegen fallback of expressions configurable
gatorsmile Aug 30, 2017
e47f48c
[SPARK-20886][CORE] HadoopMapReduceCommitProtocol to handle FileOutpu…
steveloughran Aug 30, 2017
66338b8
Merge remote-tracking branch 'origin/master' into palantir-master
ash211 Aug 30, 2017
228560c
Resolve conflicts
ash211 Aug 30, 2017
90b58ac
Fix broken Kubernetes code
ash211 Aug 30, 2017
d4895c9
[MINOR][TEST] Off -heap memory leaks for unit tests
10110346 Aug 30, 2017
8f0df6b
[SPARK-21873][SS] - Avoid using `return` inside `CachedKafkaConsumer.…
Aug 30, 2017
734ed7a
[SPARK-21806][MLLIB] BinaryClassificationMetrics pr(): first point (0…
srowen Aug 30, 2017
b30a11a
[SPARK-21764][TESTS] Fix tests failures on Windows: resources not bei…
HyukjinKwon Aug 30, 2017
4133c1b
[SPARK-21469][ML][EXAMPLES] Adding Examples for FeatureHasher
BryanCutler Aug 30, 2017
32d6d9d
Revert "[SPARK-21845][SQL] Make codegen fallback of expressions confi…
gatorsmile Aug 30, 2017
235d283
[MINOR][SQL][TEST] Test shuffle hash join while is not expected
heary-cao Aug 30, 2017
8a7acba
Fix Java style bugs
ash211 Aug 30, 2017
6949a9c
[SPARK-21834] Incorrect executor request in case of dynamic allocation
Aug 30, 2017
d8f4540
[SPARK-21839][SQL] Support SQL config for ORC compression
dongjoon-hyun Aug 30, 2017
313c6ca
[SPARK-21875][BUILD] Fix Java style bugs
ash211 Aug 31, 2017
cd5d0f3
[SPARK-11574][CORE] Add metrics StatsD sink
Aug 31, 2017
37c725d
Our newer parquet is 40bytes more efficient: 1.9.1-palantir3 vs 1.8.2
ash211 Aug 30, 2017
4482ff2
[SPARK-17321][YARN] Avoid writing shuffle metadata to disk if NM reco…
jerryshao Aug 31, 2017
ecf437a
[SPARK-21534][SQL][PYSPARK] PickleException when creating dataframe f…
viirya Aug 31, 2017
964b507
[SPARK-21583][SQL] Create a ColumnarBatch from ArrowColumnVectors
BryanCutler Aug 31, 2017
19b0240
[SPARK-21878][SQL][TEST] Create SQLMetricsTestUtils
gatorsmile Aug 31, 2017
9696580
[SPARK-21886][SQL] Use SparkSession.internalCreateDataFrame to create…
jaceklaskowski Aug 31, 2017
fc45c2c
[SPARK-20812][MESOS] Add secrets support to the dispatcher
Aug 31, 2017
501370d
[SPARK-21583][HOTFIX] Removed intercept in test causing failures
BryanCutler Aug 31, 2017
7ce1108
[SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown rule for Union
gatorsmile Aug 31, 2017
cba69ae
[SPARK-21110][SQL] Structs, arrays, and other orderable datatypes sho…
aray Aug 31, 2017
96028e3
[SPARK-17139][ML][FOLLOW-UP] Add convenient method `asBinary` for cas…
WeichenXu123 Aug 31, 2017
f5e10a3
[SPARK-21862][ML] Add overflow check in PCA
WeichenXu123 Aug 31, 2017
5cd8ea9
[SPARK-21779][PYTHON] Simpler DataFrame.sample API in Python
HyukjinKwon Sep 1, 2017
648a862
[SPARK-21789][PYTHON] Remove obsolete codes for parsing abstract sche…
HyukjinKwon Sep 1, 2017
0bdbefe
[SPARK-21728][CORE] Follow up: fix user config, auth in SparkSubmit l…
Sep 1, 2017
12f0d24
[SPARK-21880][WEB UI] In the SQL table page, modify jobs trace inform…
Geek-He Sep 1, 2017
12ab7f7
[SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add…
srowen Sep 1, 2017
aba9492
[SPARK-21895][SQL] Support changing database in HiveClient
gatorsmile Sep 1, 2017
900f14f
[SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to e…
WeichenXu123 Sep 2, 2017
acb7fed
[SPARK-21891][SQL] Add TBLPROPERTIES to DDL statement: CREATE TABLE U…
gatorsmile Sep 2, 2017
07fd68a
[SPARK-21897][PYTHON][R] Add unionByName API to DataFrame in Python a…
HyukjinKwon Sep 3, 2017
9f30d92
[SPARK-21654][SQL] Complement SQL predicates expression description
viirya Sep 4, 2017
ca59445
[SPARK-21418][SQL] NoSuchElementException: None.get in DataSourceScan…
srowen Sep 4, 2017
4e7a29e
[SPARK-21913][SQL][TEST] withDatabase` should drop database with CASCADE
dongjoon-hyun Sep 5, 2017
7f3c6ff
[SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.
HyukjinKwon Sep 5, 2017
02a4386
[SPARK-20978][SQL] Bump up Univocity version to 2.5.4
HyukjinKwon Sep 5, 2017
2974406
[SPARK-21845][SQL][TEST-MAVEN] Make codegen fallback of expressions c…
gatorsmile Sep 5, 2017
8c954d2
[SPARK-21925] Update trigger interval documentation in docs with beha…
brkyvz Sep 5, 2017
fd60d4f
[SPARK-21652][SQL] Fix rule confliction between InferFiltersFromConst…
jiangxb1987 Sep 5, 2017
9e451bc
[MINOR][DOC] Update `Partition Discovery` section to enumerate all av…
dongjoon-hyun Sep 5, 2017
6a23254
[SPARK-18061][THRIFTSERVER] Add spnego auth support for ThriftServer …
jerryshao Sep 6, 2017
445f179
[SPARK-9104][CORE] Expose Netty memory metrics in Spark
jerryshao Sep 6, 2017
4ee7dfe
[SPARK-21924][DOCS] Update structured streaming programming guide doc
Sep 6, 2017
16c4c03
[SPARK-19357][ML] Adding parallel model evaluation in ML tuning
BryanCutler Sep 6, 2017
64936c1
[SPARK-21903][BUILD][FOLLOWUP] Upgrade scalastyle-maven-plugin and sc…
HyukjinKwon Sep 6, 2017
f2e22ae
[SPARK-21835][SQL] RewritePredicateSubquery should not produce unreso…
viirya Sep 6, 2017
36b48ee
[SPARK-21801][SPARKR][TEST] set random seed for predictable test
felixcheung Sep 6, 2017
acdf45f
[SPARK-21765] Check that optimization doesn't affect isStreaming bit.
jose-torres Sep 6, 2017
fa0092b
[SPARK-21901][SS] Define toString for StateOperatorProgress
jaceklaskowski Sep 6, 2017
aad2125
Fixed pandoc dependency issue in python/setup.py
Sep 7, 2017
ce7293c
[SPARK-21835][SQL][FOLLOW-UP] RewritePredicateSubquery should not pro…
viirya Sep 7, 2017
eea2b87
[SPARK-21912][SQL] ORC/Parquet table should not create invalid column…
dongjoon-hyun Sep 7, 2017
b9ab791
[SPARK-21890] Credentials not being passed to add the tokens
Sep 7, 2017
e00f1a1
[SPARK-13656][SQL] Delete spark.sql.parquet.cacheMetadata from SQLCon…
dongjoon-hyun Sep 7, 2017
c26976f
[SPARK-21939][TEST] Use TimeLimits instead of Timeouts
dongjoon-hyun Sep 8, 2017
57bc1e9
[SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTests2 should s…
ueshin Sep 8, 2017
f62b20f
[SPARK-21949][TEST] Tables created in unit tests should be dropped af…
10110346 Sep 8, 2017
6e37524
[SPARK-21726][SQL] Check for structural integrity of the plan in Opti…
viirya Sep 8, 2017
dbb8241
[SPARK-21936][SQL] backward compatibility test framework for HiveExte…
cloud-fan Sep 8, 2017
0dfc1ec
[SPARK-21726][SQL][FOLLOW-UP] Check for structural integrity of the p…
viirya Sep 8, 2017
8a4f228
[SPARK-21946][TEST] fix flaky test: "alter table: rename cached table…
kiszk Sep 8, 2017
8598d03
[SPARK-15243][ML][SQL][PYTHON] Add missing support for unicode in Par…
HyukjinKwon Sep 8, 2017
31c74fe
[SPARK-19866][ML][PYSPARK] Add local version of Word2Vec findSynonyms…
keypointt Sep 8, 2017
8a5eb50
[SPARK-21941] Stop storing unused attemptId in SQLTaskMetrics
ash211 Sep 9, 2017
6b45d7e
[SPARK-21954][SQL] JacksonUtils should verify MapType's value type in…
viirya Sep 9, 2017
e4d8f9a
[MINOR][SQL] Correct DataFrame doc.
yanboliang Sep 9, 2017
f767905
[SPARK-4131] Support "Writing data into the filesystem from queries"
janewangfb Sep 9, 2017
520d92a
[SPARK-20098][PYSPARK] dataType's typeName fix
szalai1 Sep 10, 2017
6273a71
[SPARK-21610][SQL] Corrupt records are not handled properly when crea…
jmchung Sep 11, 2017
828fab0
[BUILD][TEST][SPARKR] add sparksubmitsuite to appveyor tests
felixcheung Sep 11, 2017
4bab8f5
[SPARK-21856] Add probability and rawPrediction to MLPC for Python
chunshengji Sep 11, 2017
dc74c0e
[MINOR][SQL] remove unuse import class
heary-cao Sep 11, 2017
e2ac2f1
[SPARK-21976][DOC] Fix wrong documentation for Mean Absolute Error.
FavioVazquez Sep 12, 2017
dd78167
[SPARK-14516][ML] Adding ClusteringEvaluator with the implementation …
mgaido91 Sep 12, 2017
7d0a3ef
[SPARK-21610][SQL][FOLLOWUP] Corrupt records are not handled properly…
jmchung Sep 12, 2017
9575582
[DOCS] Fix unreachable links in the document
sarutak Sep 12, 2017
515910e
[SPARK-17642][SQL] support DESC EXTENDED/FORMATTED table column commands
wzhfy Sep 12, 2017
720c94f
[SPARK-21027][ML][PYTHON] Added tunable parallelism to one vs. rest i…
ajaysaini725 Sep 12, 2017
b9b54b1
[SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query files.
sarutak Sep 12, 2017
c5f9b89
[SPARK-18608][ML] Fix double caching
zhengruifeng Sep 12, 2017
1a98574
[SPARK-21979][SQL] Improve QueryPlanConstraints framework
gengliangwang Sep 12, 2017
371e4e2
[SPARK-21513][SQL] Allow UDF to_json support converting MapType to json
goldmedal Sep 13, 2017
f6c5d8f
[SPARK-21027][MINOR][FOLLOW-UP] add missing since tag
WeichenXu123 Sep 13, 2017
dd88fa3
[BUILD] Close stale PRs
srowen Sep 13, 2017
a1d98c6
[SPARK-21982] Set locale to US
Gschiavon Sep 13, 2017
4fbf748
[SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile
srowen Sep 13, 2017
ca00cc7
[SPARK-21963][CORE][TEST] Create temp file should be delete after use
heary-cao Sep 13, 2017
0fa5b7c
[SPARK-21690][ML] one-pass imputer
zhengruifeng Sep 13, 2017
b6ef1f5
[SPARK-21970][CORE] Fix Redundant Throws Declarations in Java Codebase
original-brownbear Sep 13, 2017
21c4450
[SPARK-21980][SQL] References in grouping functions should be indexed…
DonnyZone Sep 13, 2017
8c7e19a
[SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.scala
janewangfb Sep 13, 2017
17edfec
[SPARK-20427][SQL] Read JDBC table use custom schema
wangyum Sep 13, 2017
8be7e6b
[SPARK-21973][SQL] Add an new option to filter queries in TPC-DS
maropu Sep 14, 2017
dcbb229
[MINOR][SQL] Only populate type metadata for required types such as C…
dilipbiswal Sep 14, 2017
8d8641f
[SPARK-21854] Added LogisticRegressionTrainingSummary for Multinomial…
Sep 14, 2017
66cb72d
[MINOR][DOC] Add missing call of `update()` in examples of PeriodicGr…
zhengruifeng Sep 14, 2017
c76153c
[SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpark OneVsRest.
yanboliang Sep 14, 2017
4e6fc69
[SPARK-4131][FOLLOW-UP] Support "Writing data into the filesystem fro…
gatorsmile Sep 14, 2017
4b88393
[SPARK-21922] Fix duration always updating when task failed but statu…
caneGuy Sep 14, 2017
ddd7f5e
[SPARK-17642][SQL][FOLLOWUP] drop test tables and improve comments
Sep 14, 2017
054ddb2
[SPARK-21988] Add default stats to StreamingExecutionRelation.
jose-torres Sep 14, 2017
a28728a
[SPARK-21513][SQL][FOLLOWUP] Allow UDF to_json support converting Map…
goldmedal Sep 15, 2017
8866174
[SPARK-22018][SQL] Preserve top-level alias metadata when collapsing …
tdas Sep 15, 2017
22b111e
[SPARK-21902][CORE] Print root cause for BlockManager#doPut
caneGuy Sep 15, 2017
4decedf
[SPARK-22002][SQL] Read JDBC table use custom schema support specify …
wangyum Sep 15, 2017
3c6198c
[SPARK-21987][SQL] fix a compatibility issue of sql event logs
cloud-fan Sep 15, 2017
6197903
Merge branch 'master' into resync-apache
Sep 15, 2017
0e7940b
Resolve conflicts
Sep 15, 2017
51a2d0d
manifset
Sep 15, 2017
3d67f87
checkstyle
Sep 15, 2017
5e65a21
order
Sep 15, 2017
388cb48
correctly resolve conflicts
Sep 15, 2017
[SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behind a profile
## What changes were proposed in this pull request?

Put Kafka 0.8 support behind a kafka-0-8 profile.

## How was this patch tested?

Existing tests; however, until the PR builder and Jenkins configs are updated, the effect here is that Kafka 0.8 support is neither built nor tested at all.
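
For local verification, a minimal sketch of one way to exercise the opt-in path (the profile name, sbt test goal, and `ENABLE_KAFKA_0_8_TESTS` flag come from the changes below; the exact commands are an assumption, not part of this patch):

    # Build with Kafka 0.8 support explicitly enabled via the new profile
    ./build/mvn -Pkafka-0-8 -DskipTests clean package

    # Run the Kafka 0.8 streaming tests through sbt; the env flag mirrors dev/sparktestsupport/modules.py
    ENABLE_KAFKA_0_8_TESTS=1 build/sbt -Pkafka-0-8 "streaming-kafka-0-8/test"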

Author: Sean Owen <[email protected]>

Closes apache#19134 from srowen/SPARK-21893.
srowen committed Sep 13, 2017
commit 4fbf748bf85b18f32a2cd32b1b1881d24360626e
32 changes: 20 additions & 12 deletions dev/create-release/release-build.sh
@@ -80,8 +80,17 @@ NEXUS_PROFILE=d63f592e7eac0 # Profile for Spark staging uploads
BASE_DIR=$(pwd)

MVN="build/mvn --force"
PUBLISH_PROFILES="-Pmesos -Pyarn -Phive -Phive-thriftserver"
PUBLISH_PROFILES="$PUBLISH_PROFILES -Pspark-ganglia-lgpl -Pkinesis-asl"

# Hive-specific profiles for some builds
HIVE_PROFILES="-Phive -Phive-thriftserver"
# Profiles for publishing snapshots and release to Maven Central
PUBLISH_PROFILES="-Pmesos -Pyarn $HIVE_PROFILES -Pspark-ganglia-lgpl -Pkinesis-asl"
# Profiles for building binary releases
BASE_RELEASE_PROFILES="-Pmesos -Pyarn -Psparkr"
# Scala 2.11 only profiles for some builds
SCALA_2_11_PROFILES="-Pkafka-0-8"
# Scala 2.12 only profiles for some builds
SCALA_2_12_PROFILES="-Pscala-2.12"

rm -rf spark
git clone https://git-wip-us.apache.org/repos/asf/spark.git
@@ -235,10 +244,9 @@ if [[ "$1" == "package" ]]; then

# We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
# share the same Zinc server.
FLAGS="-Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos"
make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" "withr" &
make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" "withpip" &
make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" &
make_binary_release "hadoop2.6" "-Phadoop-2.6 $HIVE_PROFILES $SCALA_2_11_PROFILES $BASE_RELEASE_PROFILES" "3035" "withr" &
make_binary_release "hadoop2.7" "-Phadoop-2.7 $HIVE_PROFILES $SCALA_2_11_PROFILES $BASE_RELEASE_PROFILES" "3036" "withpip" &
make_binary_release "without-hadoop" "-Phadoop-provided $SCALA_2_11_PROFILES $BASE_RELEASE_PROFILES" "3038" &
wait
rm -rf spark-$SPARK_VERSION-bin-*/

@@ -304,10 +312,10 @@ if [[ "$1" == "publish-snapshot" ]]; then
# Generate random point for Zinc
export ZINC_PORT=$(python -S -c "import random; print random.randrange(3030,4030)")

$MVN -DzincPort=$ZINC_PORT --settings $tmp_settings -DskipTests $PUBLISH_PROFILES deploy
$MVN -DzincPort=$ZINC_PORT --settings $tmp_settings -DskipTests $SCALA_2_11_PROFILES $PUBLISH_PROFILES deploy
#./dev/change-scala-version.sh 2.12
#$MVN -DzincPort=$ZINC_PORT -Pscala-2.12 --settings $tmp_settings \
# -DskipTests $PUBLISH_PROFILES clean deploy
#$MVN -DzincPort=$ZINC_PORT --settings $tmp_settings \
# -DskipTests $SCALA_2_12_PROFILES $PUBLISH_PROFILES clean deploy

# Clean-up Zinc nailgun process
/usr/sbin/lsof -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill
@@ -340,11 +348,11 @@ if [[ "$1" == "publish-release" ]]; then
# Generate random point for Zinc
export ZINC_PORT=$(python -S -c "import random; print random.randrange(3030,4030)")

$MVN -DzincPort=$ZINC_PORT -Dmaven.repo.local=$tmp_repo -DskipTests $PUBLISH_PROFILES clean install
$MVN -DzincPort=$ZINC_PORT -Dmaven.repo.local=$tmp_repo -DskipTests $SCALA_2_11_PROFILES $PUBLISH_PROFILES clean install

#./dev/change-scala-version.sh 2.12
#$MVN -DzincPort=$ZINC_PORT -Dmaven.repo.local=$tmp_repo -Pscala-2.12 \
# -DskipTests $PUBLISH_PROFILES clean install
#$MVN -DzincPort=$ZINC_PORT -Dmaven.repo.local=$tmp_repo \
# -DskipTests $SCALA_2_12_PROFILES $PUBLISH_PROFILES clean install

# Clean-up Zinc nailgun process
/usr/sbin/lsof -P |grep $ZINC_PORT | grep LISTEN | awk '{ print $2; }' | xargs kill
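For reference, a sketch of what `$SCALA_2_11_PROFILES $PUBLISH_PROFILES` expands to on a snapshot-publish command line (the Zinc port and settings file below are illustrative assumptions, not values fixed by the script):

    # Expanded Maven invocation, assuming ZINC_PORT=3036 and a temporary settings file
    build/mvn --force -DzincPort=3036 --settings $tmp_settings -DskipTests \
      -Pkafka-0-8 -Pmesos -Pyarn -Phive -Phive-thriftserver \
      -Pspark-ganglia-lgpl -Pkinesis-asl deploy
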
2 changes: 1 addition & 1 deletion dev/mima
@@ -24,7 +24,7 @@ set -e
FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
cd "$FWDIR"

SPARK_PROFILES="-Pmesos -Pyarn -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver -Phive"
SPARK_PROFILES="-Pmesos -Pkafka-0-8 -Pyarn -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver -Phive"
TOOLS_CLASSPATH="$(build/sbt -DcopyDependencies=false "export tools/fullClasspath" | tail -n1)"
OLD_DEPS_CLASSPATH="$(build/sbt -DcopyDependencies=false $SPARK_PROFILES "export oldDeps/fullClasspath" | tail -n1)"

1 change: 1 addition & 0 deletions dev/scalastyle
@@ -23,6 +23,7 @@ ERRORS=$(echo -e "q\n" \
| build/sbt \
-Pkinesis-asl \
-Pmesos \
-Pkafka-0-8 \
-Pyarn \
-Phive \
-Phive-thriftserver \
6 changes: 6 additions & 0 deletions dev/sparktestsupport/modules.py
@@ -249,6 +249,12 @@ def __hash__(self):
"external/kafka-0-8",
"external/kafka-0-8-assembly",
],
build_profile_flags=[
"-Pkafka-0-8",
],
environ={
"ENABLE_KAFKA_0_8_TESTS": "1"
},
sbt_test_goals=[
"streaming-kafka-0-8/test",
]
2 changes: 1 addition & 1 deletion dev/test-dependencies.sh
@@ -29,7 +29,7 @@ export LC_ALL=C
# TODO: This would be much nicer to do in SBT, once SBT supports Maven-style resolution.

# NOTE: These should match those in the release publishing script
HADOOP2_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pyarn -Phive"
HADOOP2_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkafka-0-8 -Pyarn -Phive"
MVN="build/mvn"
HADOOP_PROFILES=(
hadoop-2.6
9 changes: 9 additions & 0 deletions docs/building-spark.md
@@ -90,6 +90,15 @@ like ZooKeeper and Hadoop itself.
## Building with Mesos support

./build/mvn -Pmesos -DskipTests clean package

## Building with Kafka 0.8 support

Kafka 0.8 support must be explicitly enabled with the `kafka-0-8` profile.
Note: Kafka 0.8 support is deprecated as of Spark 2.3.0.

./build/mvn -Pkafka-0-8 -DskipTests clean package

Kafka 0.10 support is still automatically built.

## Building submodules individually

23 changes: 10 additions & 13 deletions docs/streaming-kafka-0-8-integration.md
@@ -2,6 +2,9 @@
layout: global
title: Spark Streaming + Kafka Integration Guide (Kafka broker version 0.8.2.1 or higher)
---

**Note: Kafka 0.8 support is deprecated as of Spark 2.3.0.**

Here we explain how to configure Spark Streaming to receive data from Kafka. There are two approaches to this - the old approach using Receivers and Kafka's high-level API, and a new approach (introduced in Spark 1.3) without using Receivers. They have different programming models, performance characteristics, and semantics guarantees, so read on for more details. Both approaches are considered stable APIs as of the current version of Spark.

## Approach 1: Receiver-based Approach
@@ -28,8 +31,7 @@ Next, we discuss how to use this approach in your streaming application.
val kafkaStream = KafkaUtils.createStream(streamingContext,
[ZK quorum], [consumer group id], [per-topic number of Kafka partitions to consume])

You can also specify the key and value classes and their corresponding decoder classes using variations of `createStream`. See the [API docs](api/scala/index.html#org.apache.spark.streaming.kafka.KafkaUtils$)
and the [example]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala).
You can also specify the key and value classes and their corresponding decoder classes using variations of `createStream`. See the [API docs](api/scala/index.html#org.apache.spark.streaming.kafka.KafkaUtils$).
</div>
<div data-lang="java" markdown="1">
import org.apache.spark.streaming.kafka.*;
@@ -38,8 +40,7 @@ Next, we discuss how to use this approach in your streaming application.
KafkaUtils.createStream(streamingContext,
[ZK quorum], [consumer group id], [per-topic number of Kafka partitions to consume]);

You can also specify the key and value classes and their corresponding decoder classes using variations of `createStream`. See the [API docs](api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html)
and the [example]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java).
You can also specify the key and value classes and their corresponding decoder classes using variations of `createStream`. See the [API docs](api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html).

</div>
<div data-lang="python" markdown="1">
@@ -48,8 +49,7 @@ Next, we discuss how to use this approach in your streaming application.
kafkaStream = KafkaUtils.createStream(streamingContext, \
[ZK quorum], [consumer group id], [per-topic number of Kafka partitions to consume])

By default, the Python API will decode Kafka data as UTF8 encoded strings. You can specify your custom decoding function to decode the byte arrays in Kafka records to any arbitrary data type. See the [API docs](api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils)
and the [example]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/python/streaming/kafka_wordcount.py).
By default, the Python API will decode Kafka data as UTF8 encoded strings. You can specify your custom decoding function to decode the byte arrays in Kafka records to any arbitrary data type. See the [API docs](api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils).
</div>
</div>

@@ -71,7 +71,7 @@ Next, we discuss how to use this approach in your streaming application.
./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION_SHORT}} ...

Alternatively, you can also download the JAR of the Maven artifact `spark-streaming-kafka-0-8-assembly` from the
[Maven repository](http://search.maven.org/#search|ga|1|a%3A%22spark-streaming-kafka-0-8-assembly_{{site.SCALA_BINARY_VERSION}}%22%20AND%20v%3A%22{{site.SPARK_VERSION_SHORT}}%22) and add it to `spark-submit` with `--jars`.
[Maven repository](https://search.maven.org/#search|ga|1|a%3A%22spark-streaming-kafka-0-8-assembly_{{site.SCALA_BINARY_VERSION}}%22%20AND%20v%3A%22{{site.SPARK_VERSION_SHORT}}%22) and add it to `spark-submit` with `--jars`.

## Approach 2: Direct Approach (No Receivers)
This new receiver-less "direct" approach has been introduced in Spark 1.3 to ensure stronger end-to-end guarantees. Instead of using receivers to receive data, this approach periodically queries Kafka for the latest offsets in each topic+partition, and accordingly defines the offset ranges to process in each batch. When the jobs to process the data are launched, Kafka's simple consumer API is used to read the defined ranges of offsets from Kafka (similar to read files from a file system). Note that this feature was introduced in Spark 1.3 for the Scala and Java API, in Spark 1.4 for the Python API.
@@ -105,8 +105,7 @@ Next, we discuss how to use this approach in your streaming application.
streamingContext, [map of Kafka parameters], [set of topics to consume])

You can also pass a `messageHandler` to `createDirectStream` to access `MessageAndMetadata` that contains metadata about the current message and transform it to any desired type.
See the [API docs](api/scala/index.html#org.apache.spark.streaming.kafka.KafkaUtils$)
and the [example]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala).
See the [API docs](api/scala/index.html#org.apache.spark.streaming.kafka.KafkaUtils$).
</div>
<div data-lang="java" markdown="1">
import org.apache.spark.streaming.kafka.*;
@@ -117,17 +116,15 @@ Next, we discuss how to use this approach in your streaming application.
[map of Kafka parameters], [set of topics to consume]);

You can also pass a `messageHandler` to `createDirectStream` to access `MessageAndMetadata` that contains metadata about the current message and transform it to any desired type.
See the [API docs](api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html)
and the [example]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/streaming/JavaDirectKafkaWordCount.java).
See the [API docs](api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html).

</div>
<div data-lang="python" markdown="1">
from pyspark.streaming.kafka import KafkaUtils
directKafkaStream = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})

You can also pass a `messageHandler` to `createDirectStream` to access `KafkaMessageAndMetadata` that contains metadata about the current message and transform it to any desired type.
By default, the Python API will decode Kafka data as UTF8 encoded strings. You can specify your custom decoding function to decode the byte arrays in Kafka records to any arbitrary data type. See the [API docs](api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils)
and the [example]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/python/streaming/direct_kafka_wordcount.py).
By default, the Python API will decode Kafka data as UTF8 encoded strings. You can specify your custom decoding function to decode the byte arrays in Kafka records to any arbitrary data type. See the [API docs](api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils).
</div>
</div>

11 changes: 6 additions & 5 deletions docs/streaming-kafka-integration.md
@@ -3,10 +3,11 @@ layout: global
title: Spark Streaming + Kafka Integration Guide
---

[Apache Kafka](http://kafka.apache.org/) is publish-subscribe messaging rethought as a distributed, partitioned, replicated commit log service. Please read the [Kafka documentation](http://kafka.apache.org/documentation.html) thoroughly before starting an integration using Spark.
[Apache Kafka](https://kafka.apache.org/) is publish-subscribe messaging rethought as a distributed, partitioned, replicated commit log service. Please read the [Kafka documentation](https://kafka.apache.org/documentation.html) thoroughly before starting an integration using Spark.

The Kafka project introduced a new consumer api between versions 0.8 and 0.10, so there are 2 separate corresponding Spark Streaming packages available. Please choose the correct package for your brokers and desired features; note that the 0.8 integration is compatible with later 0.9 and 0.10 brokers, but the 0.10 integration is not compatible with earlier brokers.
The Kafka project introduced a new consumer API between versions 0.8 and 0.10, so there are 2 separate corresponding Spark Streaming packages available. Please choose the correct package for your brokers and desired features; note that the 0.8 integration is compatible with later 0.9 and 0.10 brokers, but the 0.10 integration is not compatible with earlier brokers.

**Note: Kafka 0.8 support is deprecated as of Spark 2.3.0.**

<table class="table">
<tr><th></th><th><a href="streaming-kafka-0-8-integration.html">spark-streaming-kafka-0-8</a></th><th><a href="streaming-kafka-0-10-integration.html">spark-streaming-kafka-0-10</a></th></tr>
@@ -16,9 +17,9 @@ The Kafka project introduced a new consumer api between versions 0.8 and 0.10, s
<td>0.10.0 or higher</td>
</tr>
<tr>
<td>Api Stability</td>
<td>API Maturity</td>
<td>Deprecated</td>
<td>Stable</td>
<td>Experimental</td>
</tr>
<tr>
<td>Language Support</td>
@@ -41,7 +42,7 @@ The Kafka project introduced a new consumer api between versions 0.8 and 0.10, s
<td>Yes</td>
</tr>
<tr>
<td>Offset Commit Api</td>
<td>Offset Commit API</td>
<td>No</td>
<td>Yes</td>
</tr>
6 changes: 3 additions & 3 deletions docs/streaming-programming-guide.md
@@ -401,14 +401,14 @@ some of the common ones are as follows.

<table class="table">
<tr><th>Source</th><th>Artifact</th></tr>
<tr><td> Kafka </td><td> spark-streaming-kafka-0-8_{{site.SCALA_BINARY_VERSION}} </td></tr>
<tr><td> Kafka </td><td> spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}} </td></tr>
<tr><td> Flume </td><td> spark-streaming-flume_{{site.SCALA_BINARY_VERSION}} </td></tr>
<tr><td> Kinesis<br/></td><td>spark-streaming-kinesis-asl_{{site.SCALA_BINARY_VERSION}} [Amazon Software License] </td></tr>
<tr><td></td><td></td></tr>
</table>

For an up-to-date list, please refer to the
[Maven repository](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%22{{site.SPARK_VERSION_SHORT}}%22)
[Maven repository](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%22{{site.SPARK_VERSION_SHORT}}%22)
for the full list of supported sources and artifacts.

***
@@ -1899,7 +1899,7 @@ To run a Spark Streaming applications, you need to have the following.
if your application uses [advanced sources](#advanced-sources) (e.g. Kafka, Flume),
then you will have to package the extra artifact they link to, along with their dependencies,
in the JAR that is used to deploy the application. For example, an application using `KafkaUtils`
will have to include `spark-streaming-kafka-0-8_{{site.SCALA_BINARY_VERSION}}` and all its
will have to include `spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}` and all its
transitive dependencies in the application JAR.

- *Configuring sufficient memory for the executors* - Since the received data must be stored in
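As a concrete illustration of the packaging note above, one way to link the 0.10 artifact at submit time (the Scala binary version and Spark version shown are placeholder assumptions):

    ./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:<spark-version> ...
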
2 changes: 1 addition & 1 deletion examples/pom.xml
@@ -86,7 +86,7 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-8_${scala.binary.version}</artifactId>
<artifactId>spark-streaming-kafka-0-10_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
@@ -26,11 +26,13 @@

import scala.Tuple2;

import kafka.serializer.StringDecoder;
import org.apache.kafka.clients.consumer.ConsumerRecord;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import org.apache.spark.streaming.Durations;

/**
@@ -65,22 +67,17 @@ public static void main(String[] args) throws Exception {
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));

Set<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));
Map<String, String> kafkaParams = new HashMap<>();
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", brokers);

// Create direct kafka stream with brokers and topics
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(
jssc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
kafkaParams,
topicsSet
);
LocationStrategies.PreferConsistent(),
ConsumerStrategies.Subscribe(topicsSet, kafkaParams));

// Get the lines, split them into words, count the words and print
JavaDStream<String> lines = messages.map(Tuple2::_2);
JavaDStream<String> lines = messages.map(ConsumerRecord::value);
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(x)).iterator());
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(s -> new Tuple2<>(s, 1))
.reduceByKey((i1, i2) -> i1 + i2);