Merged
Changes from 1 commit
1286 commits
517bdf3
[doc][streaming] Fixed broken link in mllib section
BenFradet Apr 20, 2015
ce7ddab
[SPARK-6368][SQL] Build a specialized serializer for Exchange operator.
yhuai Apr 21, 2015
c736220
[SPARK-6635][SQL] DataFrame.withColumn should replace columns with id…
viirya Apr 21, 2015
8136810
[SPARK-6490][Core] Add spark.rpc.* and deprecate spark.akka.*
zsxwing Apr 21, 2015
ab9128f
[SPARK-6949] [SQL] [PySpark] Support Date/Timestamp in Column expression
Apr 21, 2015
1f2f723
[SPARK-5990] [MLLIB] Model import/export for IsotonicRegression
yanboliang Apr 21, 2015
5fea3e5
[SPARK-6985][streaming] Receiver maxRate over 1000 causes a StackOver…
Apr 21, 2015
c035c0f
[SPARK-5360] [SPARK-6606] Eliminate duplicate objects in serialized C…
kayousterhout Apr 21, 2015
c25ca7c
SPARK-3276 Added a new configuration spark.streaming.minRememberDuration
emres Apr 21, 2015
45c47fa
[SPARK-6845] [MLlib] [PySpark] Add isTranposed flag to DenseMatrix
MechCoder Apr 21, 2015
04bf34e
[SPARK-7011] Build(compilation) fails with scala 2.11 option, because…
ScrapCodes Apr 21, 2015
2e8c6ca
[SPARK-6994] Allow to fetch field values by name in sql.Row
Apr 21, 2015
03fd921
[SQL][minor] make it more clear that we only need to re-throw GetFiel…
cloud-fan Apr 21, 2015
6265cba
[SPARK-6969][SQL] Refresh the cached table when REFRESH TABLE is used
yhuai Apr 21, 2015
2a24bf9
[SPARK-6996][SQL] Support map types in java beans
Apr 21, 2015
7662ec2
[SPARK-5817] [SQL] Fix bug of udtf with column names
chenghao-intel Apr 21, 2015
f83c0f1
[SPARK-3386] Share and reuse SerializerInstances in shuffle paths
JoshRosen Apr 21, 2015
a70e849
[minor] [build] Set java options when generating mima ignores.
Apr 21, 2015
7fe6142
[SPARK-6065] [MLlib] Optimize word2vec.findSynonyms using blas calls
MechCoder Apr 21, 2015
686dd74
[SPARK-7036][MLLIB] ALS.train should support DataFrames in PySpark
mengxr Apr 21, 2015
ae036d0
[Minor][MLLIB] Fix a minor formatting bug in toString method in Node.…
Apr 21, 2015
b063a61
Avoid warning message about invalid refuse_seconds value in Mesos >=0…
Apr 22, 2015
e72c16e
[SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races.
Apr 22, 2015
3134c3f
[SPARK-6953] [PySpark] speed up python tests
rxin Apr 22, 2015
41ef78a
Closes #5427
rxin Apr 22, 2015
a0761ec
[SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XX…
texasmichelle Apr 22, 2015
3a3f710
[SPARK-6490][Docs] Add docs for rpc configurations
zsxwing Apr 22, 2015
70f9f8f
[MINOR] Comment improvements in ExternalSorter.
pwendell Apr 22, 2015
607eff0
[SPARK-6113] [ML] Small cleanups after original tree API PR
jkbradley Apr 22, 2015
bdc5c16
[SPARK-6889] [DOCS] CONTRIBUTING.md updates to accompany contribution…
srowen Apr 22, 2015
33b8562
[SPARK-7052][Core] Add ThreadUtils and move thread methods from Utils…
zsxwing Apr 22, 2015
cdf0328
[SQL] Rename some apply functions.
rxin Apr 22, 2015
fbe7106
[SPARK-7039][SQL]JDBCRDD: Add support on type NVARCHAR
szheng79 Apr 22, 2015
baf865d
[SPARK-7059][SQL] Create a DataFrame join API to facilitate equijoin.
rxin Apr 22, 2015
f4f3998
[SPARK-6827] [MLLIB] Wrap FPGrowthModel.freqItemsets and make it cons…
yanboliang Apr 23, 2015
04525c0
[SPARK-6967] [SQL] fix date type convertion in jdbcrdd
adrian-wang Apr 23, 2015
b69c4f9
Disable flaky test: ReceiverSuite "block generator throttling".
rxin Apr 23, 2015
1b85e08
[MLlib] UnaryTransformer nullability should not depend on PrimitiveType.
rxin Apr 23, 2015
d206860
[SPARK-7066][MLlib] VectorAssembler should use NumericType not Native…
rxin Apr 23, 2015
03e85b4
[SPARK-7046] Remove InputMetrics from BlockResult
kayousterhout Apr 23, 2015
d9e70f3
[HOTFIX][SQL] Fix broken cached test
viirya Apr 23, 2015
2d33323
[MLlib] Add support for BooleanType to VectorAssembler.
rxin Apr 23, 2015
29163c5
[SPARK-7068][SQL] Remove PrimitiveType
rxin Apr 23, 2015
f60bece
[SPARK-7069][SQL] Rename NativeType -> AtomicType.
rxin Apr 23, 2015
a7d65d3
[HOTFIX] [SQL] Fix compilation for scala 2.11.
ScrapCodes Apr 23, 2015
975f53e
[minor][streaming]fixed scala string interpolation error
Apr 23, 2015
cc48e63
[SPARK-7044] [SQL] Fix the deadlock in script transformation
chenghao-intel Apr 23, 2015
534f2a4
[SPARK-6752][Streaming] Allow StreamingContext to be recreated from c…
tdas Apr 23, 2015
c1213e6
[SPARK-7055][SQL]Use correct ClassLoader for JDBC Driver in JDBCRDD.g…
Apr 23, 2015
6afde2c
[SPARK-7058] Include RDD deserialization time in "task deserializatio…
JoshRosen Apr 23, 2015
3e91cc2
[SPARK-7085][MLlib] Fix miniBatchFraction parameter in train method c…
Apr 23, 2015
baa83a9
[SPARK-6879] [HISTORYSERVER] check if app is completed before clean i…
WangTaoTheTonic Apr 23, 2015
6d0749c
[SPARK-7087] [BUILD] Fix path issue change version script
Apr 23, 2015
1ed46a6
[SPARK-7070] [MLLIB] LDA.setBeta should call setTopicConcentration.
mengxr Apr 23, 2015
6220d93
[SQL] Break dataTypes.scala into multiple files.
rxin Apr 23, 2015
73db132
[SPARK-6818] [SPARKR] Support column deletion in SparkR DataFrame API.
Apr 23, 2015
336f7f5
[SPARK-7037] [CORE] Inconsistent behavior for non-spark config proper…
Apr 24, 2015
2d010f7
[SPARK-7060][SQL] Add alias function to python dataframe
yhuai Apr 24, 2015
67bccbd
Update sql-programming-guide.md
kgeis Apr 24, 2015
d3a302d
[SQL] Fixed expression data type matching.
rxin Apr 24, 2015
4c722d7
Fixed a typo from the previous commit.
rxin Apr 24, 2015
8509519
[SPARK-5894] [ML] Add polynomial mapper
yinxusen Apr 24, 2015
78b39c7
[SPARK-7115] [MLLIB] skip the very first 1 in poly expansion
mengxr Apr 24, 2015
6e57d57
[SPARK-6528] [ML] Add IDF transformer
yinxusen Apr 24, 2015
ebb77b2
[SPARK-7033] [SPARKR] Clean usage of split. Use partition instead whe…
Apr 24, 2015
caf0136
[SPARK-6852] [SPARKR] Accept numeric as numPartitions in SparkR.
Apr 24, 2015
438859e
[SPARK-6122] [CORE] Upgrade tachyon-client version to 0.6.3
calvinjia Apr 24, 2015
d874f8b
[PySpark][Minor] Update sql example, so that can read file correctly
Sephiroth-Lin Apr 25, 2015
59b7cfc
[SPARK-7136][Docs] Spark SQL and DataFrame Guide fix example file and…
dbsiegel Apr 25, 2015
cca9905
update the deprecated CountMinSketchMonoid function to TopPctCMS func…
caikehe Apr 25, 2015
a61d65f
Revert "[SPARK-6752][Streaming] Allow StreamingContext to be recreate…
pwendell Apr 25, 2015
a7160c4
[SPARK-6113] [ML] Tree ensembles for Pipelines API
jkbradley Apr 25, 2015
aa6966f
[SQL] Update SQL readme to include instructions on generating golden …
yhuai Apr 25, 2015
a11c868
[SPARK-7092] Update spark scala version to 2.11.6
ScrapCodes Apr 25, 2015
f5473c2
[SPARK-6014] [CORE] [HOTFIX] Add try-catch block around ShutDownHook
nishkamravi2 Apr 26, 2015
9a5bbe0
[MINOR] [MLLIB] Refactor toString method in MLLIB
Apr 26, 2015
ca55dc9
[SPARK-7152][SQL] Add a Column expression for partition ID.
rxin Apr 26, 2015
d188b8b
[SQL][Minor] rename DataTypeParser.apply to DataTypeParser.parse
scwf Apr 27, 2015
82bb7fd
[SPARK-6505] [SQL] Remove the reflection call in HiveFunctionWrapper
baishuo Apr 27, 2015
998aac2
[SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact
chernetsov Apr 27, 2015
7078f60
[SPARK-6856] [R] Make RDD information more useful in SparkR
Jeffrharr Apr 27, 2015
ef82bdd
SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputfor…
tedyu Apr 27, 2015
ca9f4eb
[SPARK-6991] [SPARKR] Adds support for zipPartitions.
hlin09 Apr 27, 2015
b9de9e0
[SPARK-7103] Fix crash with SparkContext.union when RDD has no partit…
stshe Apr 27, 2015
8e1c00d
[SPARK-6738] [CORE] Improve estimate the size of a large array
shenh062326 Apr 27, 2015
5d45e1f
[SPARK-3090] [CORE] Stop SparkContext if user forgets to.
Apr 27, 2015
ab5adb7
[SPARK-7145] [CORE] commons-lang (2.x) classes used instead of common…
srowen Apr 27, 2015
62888a4
[SPARK-7162] [YARN] Launcher error in yarn-client
witgo Apr 27, 2015
4d9e560
[SPARK-7090] [MLLIB] Introduce LDAOptimizer to LDA to further improve…
hhbyyh Apr 28, 2015
874a2ca
[SPARK-7174][Core] Move calling `TaskScheduler.executorHeartbeatRecei…
zsxwing Apr 28, 2015
29576e7
[SPARK-6829] Added math functions for DataFrames
brkyvz Apr 28, 2015
9e4e82b
[SPARK-5946] [STREAMING] Add Python API for direct Kafka stream
jerryshao Apr 28, 2015
bf35edd
[SPARK-7187] SerializationDebugger should not crash user code
Apr 28, 2015
d94cd1a
[SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs.
rxin Apr 28, 2015
e13cd86
[SPARK-6352] [SQL] Custom parquet output committer
Apr 28, 2015
7f3b3b7
[SPARK-7168] [BUILD] Update plugin versions in Maven build and centra…
srowen Apr 28, 2015
75905c5
[SPARK-7100] [MLLIB] Fix persisted RDD leak in GradientBoostTrees
Apr 28, 2015
268c419
[SPARK-6435] spark-shell --jars option does not add all jars to class…
tsudukim Apr 28, 2015
6a827d5
[SPARK-5253] [ML] LinearRegression with L1/L2 (ElasticNet) using OWLQN
Apr 28, 2015
b14cd23
[SPARK-7140] [MLLIB] only scan the first 16 entries in Vector.hashCode
mengxr Apr 28, 2015
52ccf1d
[Core][test][minor] replace try finally block with tryWithSafeFinally
liyezhang556520 Apr 28, 2015
8aab94d
[SPARK-4286] Add an external shuffle service that can be run as a dae…
dragos Apr 28, 2015
2d222fb
[SPARK-5932] [CORE] Use consistent naming for size properties
Apr 28, 2015
8009810
[SPARK-6314] [CORE] handle JsonParseException for history server
liyezhang556520 Apr 28, 2015
53befac
[SPARK-5338] [MESOS] Add cluster mode support for Mesos
tnachen Apr 28, 2015
28b1af7
[MINOR] [CORE] Warn users who try to cache RDDs with dynamic allocati…
Apr 28, 2015
f0a1f90
[SPARK-7201] [MLLIB] move Identifiable to ml.util
mengxr Apr 28, 2015
555213e
Closes #4807
mengxr Apr 28, 2015
d36e673
[SPARK-6965] [MLLIB] StringIndexer handles numeric input.
mengxr Apr 29, 2015
5c8f4bd
[SPARK-7138] [STREAMING] Add method to BlockGenerator to add multiple…
tdas Apr 29, 2015
a8aeadb
[SPARK-7208] [ML] [PYTHON] Added Matrix, SparseMatrix to __all__ list…
jkbradley Apr 29, 2015
5ef006f
[SPARK-6756] [MLLIB] add toSparse, toDense, numActives, numNonzeros, …
mengxr Apr 29, 2015
271c4c6
[SPARK-7215] made coalesce and repartition a part of the query plan
brkyvz Apr 29, 2015
f98773a
[SPARK-7205] Support `.ivy2/local` and `.m2/repositories/` in --packages
brkyvz Apr 29, 2015
8dee274
MAINTENANCE: Automated closing of pull requests.
pwendell Apr 29, 2015
fe917f5
[SPARK-7188] added python support for math DataFrame functions
brkyvz Apr 29, 2015
1fd6ed9
[SPARK-7204] [SQL] Fix callSite for Dataframe and SQL operations
pwendell Apr 29, 2015
f49284b
[SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use managed memory for aggr…
JoshRosen Apr 29, 2015
baed3f2
[SPARK-6918] [YARN] Secure HBase support.
deanchen Apr 29, 2015
687273d
[SPARK-7223] Rename RPC askWithReply -> askWithReply, sendWithReply -…
rxin Apr 29, 2015
3df9c5d
Better error message on access to non-existing attribute
ksonj Apr 29, 2015
81ea42b
[SQL][Minor] fix java doc for DataFrame.agg
cloud-fan Apr 29, 2015
c0c0ba6
Fix a typo of "threshold"
yinxusen Apr 29, 2015
1868bd4
[SPARK-7056] [STREAMING] Make the Write Ahead Log pluggable
tdas Apr 29, 2015
a9c4e29
[SPARK-6752] [STREAMING] [REOPENED] Allow StreamingContext to be recr…
tdas Apr 29, 2015
3a180c1
[SPARK-6629] cancelJobGroup() may not work for jobs whose job groups …
JoshRosen Apr 29, 2015
15995c8
[SPARK-7222] [ML] Added mathematical derivation in comment and compre…
Apr 29, 2015
c9d530e
[SPARK-6529] [ML] Add Word2Vec transformer
yinxusen Apr 29, 2015
d7dbce8
[SPARK-7156][SQL] support RandomSplit in DataFrames
brkyvz Apr 29, 2015
7f4b583
[SPARK-7181] [CORE] fix inifite loop in Externalsorter's mergeWithAgg…
chouqin Apr 29, 2015
3fc6cfd
[SPARK-7155] [CORE] Allow newAPIHadoopFile to support comma-separated…
yongtang Apr 29, 2015
f8cbb0a
[SPARK-7229] [SQL] SpecificMutableRow should take integer type as int…
chenghao-intel Apr 29, 2015
b1ef6a6
[SPARK-7259] [ML] VectorIndexer: do not copy non-ML metadata to outpu…
jkbradley Apr 29, 2015
1fdfdb4
[SQL] [Minor] Print detail query execution info when spark answer is …
scwf Apr 30, 2015
114bad6
[SPARK-7176] [ML] Add validation functionality to Param
jkbradley Apr 30, 2015
1b7106b
[SPARK-6862] [STREAMING] [WEBUI] Add BatchPage to display details of …
zsxwing Apr 30, 2015
7143f6e
[SPARK-7234][SQL] Fix DateType mismatch when codegen on.
Apr 30, 2015
5553198
[SPARK-7156][SQL] Addressed follow up comments for randomSplit
brkyvz Apr 30, 2015
ba49eb1
Some code clean up.
Apr 30, 2015
4459514
[SPARK-7225][SQL] CombineLimits optimizer does not work
pzzs Apr 30, 2015
254e050
[SPARK-1406] Mllib pmml model export
selvinsource Apr 30, 2015
47bf406
[HOTFIX] Disabling flaky test (fix in progress as part of SPARK-7224)
pwendell Apr 30, 2015
7dacc08
[SPARK-7224] added mock repository generator for --packages tests
brkyvz Apr 30, 2015
6c65da6
[SPARK-5342] [YARN] Allow long running Spark apps to run on secure YA…
harishreedharan Apr 30, 2015
adbdb19
[SPARK-7207] [ML] [BUILD] Added ml.recommendation, ml.regression to S…
jkbradley Apr 30, 2015
e0628f2
Revert "[SPARK-5342] [YARN] Allow long running Spark apps to run on s…
pwendell Apr 30, 2015
6702324
[SPARK-7196][SQL] Support precision and scale of decimal type for JDBC
viirya Apr 30, 2015
07a8620
[SPARK-7288] Suppress compiler warnings due to use of sun.misc.Unsafe…
JoshRosen Apr 30, 2015
77cc25f
[SPARK-7267][SQL]Push down Project when it's child is Limit
pzzs Apr 30, 2015
fa01bec
[Build] Enable MiMa checks for SQL
JoshRosen Apr 30, 2015
1c3e402
[SPARK-7279] Removed diffSum which is theoretical zero in LinearRegre…
Apr 30, 2015
149b3ee
[SPARK-7242][SQL][MLLIB] Frequent items for DataFrames
brkyvz Apr 30, 2015
ee04413
[SPARK-7280][SQL] Add "drop" column/s on a data frame
rakeshchalasani May 1, 2015
0797338
[SPARK-7093] [SQL] Using newPredicate in NestedLoopJoin to enable cod…
scwf May 1, 2015
a0d8a61
[SPARK-7109] [SQL] Push down left side filter for left semi join
scwf May 1, 2015
e991255
[SPARK-6913][SQL] Fixed "java.sql.SQLException: No suitable driver fo…
SlavikBaranov May 1, 2015
3ba5aaa
[SPARK-5213] [SQL] Pluggable SQL Parser Support
chenghao-intel May 1, 2015
473552f
[SPARK-7123] [SQL] support table.star in sqlcontext
scwf May 1, 2015
beeafcf
Revert "[SPARK-5213] [SQL] Pluggable SQL Parser Support"
pwendell May 1, 2015
69a739c
[SPARK-7282] [STREAMING] Fix the race conditions in StreamingListener…
zsxwing May 1, 2015
b5347a4
[SPARK-7248] implemented random number generators for DataFrames
brkyvz May 1, 2015
36a7a68
[SPARK-6479] [BLOCK MANAGER] Create off-heap block storage API
zhzhan May 1, 2015
a9fc505
HOTFIX: Disable buggy dependency checker
pwendell May 1, 2015
0a2b15c
[SPARK-4550] In sort-based shuffle, store map outputs in serialized form
sryza May 1, 2015
7cf1eb7
[SPARK-7287] enabled fixed test
brkyvz May 1, 2015
14b3288
[SPARK-7291] [CORE] Fix a flaky test in AkkaRpcEnvSuite
zsxwing May 1, 2015
c24aeb6
[SPARK-6257] [PYSPARK] [MLLIB] MLlib API missing items in Recommendation
MechCoder May 1, 2015
7fe0f3f
[SPARK-3468] [WEBUI] Timeline-View feature
sarutak May 1, 2015
3052f49
[SPARK-4705] Handle multiple app attempts event logs, history server.
May 1, 2015
3b514af
[SPARK-3066] [MLLIB] Support recommendAll in matrix factorization model
May 1, 2015
7630213
[SPARK-5891] [ML] Add Binarizer ML Transformer
viirya May 1, 2015
c8c481d
Limit help option regex
May 1, 2015
27de6fe
changing persistence engine trait to an abstract class
nirandaperera May 1, 2015
7d42722
[SPARK-5854] personalized page rank
dwmclary May 1, 2015
1262e31
[SPARK-6846] [WEBUI] [HOTFIX] return to GET for kill link in UI since…
srowen May 1, 2015
1686032
[SPARK-7183] [NETWORK] Fix memory leak of TransportRequestHandler.str…
viirya May 1, 2015
3753776
[SPARK-7274] [SQL] Create Column expression for array/struct creation.
rxin May 1, 2015
58d6584
Revert "[SPARK-7287] enabled fixed test"
pwendell May 1, 2015
c6d9a42
Revert "[SPARK-7224] added mock repository generator for --packages t…
pwendell May 1, 2015
f53a488
[SPARK-7213] [YARN] Check for read permissions before copying a Hadoo…
nishkamravi2 May 1, 2015
7b5dd3e
[SPARK-7281] [YARN] Add option to set AM's lib path in client mode.
May 1, 2015
4dc8d74
[SPARK-7240][SQL] Single pass covariance calculation for dataframes
brkyvz May 1, 2015
b1f4ca8
[SPARK-5342] [YARN] Allow long running Spark apps to run on secure YA…
harishreedharan May 1, 2015
5c1faba
Ignore flakey test in SparkSubmitUtilsSuite
pwendell May 1, 2015
41c6a44
[SPARK-7312][SQL] SPARK-6913 broke jdk6 build
yhuai May 1, 2015
e6fb377
[SPARK-7304] [BUILD] Include $@ in call to mvn consistently in make-d…
May 2, 2015
98e7045
[SPARK-6999] [SQL] Remove the infinite recursive method (useless)
chenghao-intel May 2, 2015
ebc25a4
[SPARK-7309] [CORE] [STREAMING] Shutdown the thread pools in Received…
zsxwing May 2, 2015
b88c275
[SPARK-7112][Streaming][WIP] Add a InputInfoTracker to track all the …
jerryshao May 2, 2015
4786484
[SPARK-2808][Streaming][Kafka] update kafka to 0.8.2
koeninger May 2, 2015
ae98eec
[SPARK-3444] Provide an easy way to change log level
holdenk May 2, 2015
099327d
[SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a…
sryza May 2, 2015
2022193
[SPARK-7216] [MESOS] Add driver details page to Mesos cluster UI.
tnachen May 2, 2015
b4b43df
[SPARK-6443] [SPARK SUBMIT] Could not submit app in standalone cluste…
WangTaoTheTonic May 2, 2015
8f50a07
[SPARK-2691] [MESOS] Support for Mesos DockerInfo
hellertime May 2, 2015
38d4e9e
[SPARK-6229] Add SASL encryption to network library.
May 2, 2015
b79aeb9
[SPARK-7317] [Shuffle] Expose shuffle handle
May 2, 2015
2e0f357
[SPARK-7242] added python api for freqItems in DataFrames
brkyvz May 2, 2015
7394e7a
[SPARK-7120] [SPARK-7121] Closure cleaner nesting + documentation + t…
May 2, 2015
ecc6eb5
[SPARK-7315] [STREAMING] [TEST] Fix flaky WALBackedBlockRDDSuite
tdas May 2, 2015
856a571
[SPARK-3444] Fix typo in Dataframes.py introduced in []
deanchen May 2, 2015
da30352
[SPARK-7323] [SPARK CORE] Use insertAll instead of insert while mergi…
May 2, 2015
bfcd528
[SPARK-6030] [CORE] Using simulated field layout method to compute cl…
advancedxy May 2, 2015
82c8c37
[MINOR] [HIVE] Fix QueryPartitionSuite.
May 2, 2015
5d6b90d
[SPARK-5213] [SQL] Pluggable SQL Parser Support
chenghao-intel May 2, 2015
ea841ef
[SPARK-7255] [STREAMING] [DOCUMENTATION] Added documentation for spar…
BenFradet May 2, 2015
49549d5
[SPARK-7031] [THRIFTSERVER] let thrift server take SPARK_DAEMON_MEMOR…
WangTaoTheTonic May 2, 2015
f4af925
[SPARK-7022] [PYSPARK] [ML] Add ML.Tuning.ParamGridBuilder to PySpark
May 3, 2015
daa70bf
[SPARK-6907] [SQL] Isolated client for HiveMetastore
marmbrus May 3, 2015
9e25b09
[SPARK-7302] [DOCS] SPARK building documentation still mentions build…
srowen May 3, 2015
1ffa8cb
[SPARK-7329] [MLLIB] simplify ParamGridBuilder impl
mengxr May 4, 2015
9646018
[SPARK-7241] Pearson correlation for DataFrames
brkyvz May 4, 2015
3539cb7
[SPARK-5563] [MLLIB] LDA with online variational inference
hhbyyh May 4, 2015
343d3bf
[SPARK-5100] [SQL] add webui for thriftserver
tianyi May 4, 2015
5a1a107
[MINOR] Fix python test typo?
May 4, 2015
e0833c5
[SPARK-5956] [MLLIB] Pipeline components should be copyable.
mengxr May 4, 2015
f32e69e
[SPARK-7319][SQL] Improve the output from DataFrame.show()
May 4, 2015
fc8b581
[SPARK-6943] [SPARK-6944] DAG visualization on SparkUI
May 4, 2015
8055411
[SPARK-7243][SQL] Contingency Tables for DataFrames
brkyvz May 5, 2015
678c4da
[SPARK-7266] Add ExpectsInputTypes to expressions when possible.
rxin May 5, 2015
8aa5aea
[SPARK-7236] [CORE] Fix to prevent AkkaUtils askWithReply from sleepi…
BryanCutler May 5, 2015
e9b16e6
[SPARK-7314] [SPARK-3524] [PYSPARK] upgrade Pyrolite to 4.4
mengxr May 5, 2015
da738cf
[MINOR] Renamed variables in SparkKMeans.scala, LocalKMeans.scala and…
pippobaudos May 5, 2015
c5790a2
[MINOR] [BUILD] Declare ivy dependency in root pom.
May 5, 2015
1854ac3
[SPARK-7139] [STREAMING] Allow received block metadata to be saved to…
tdas May 5, 2015
8776fe0
[HOTFIX] [TEST] Ignoring flaky tests
tdas May 5, 2015
8436f7e
[SPARK-7113] [STREAMING] Support input information reporting for Dire…
jerryshao May 5, 2015
4d29867
[SPARK-7341] [STREAMING] [TESTS] Fix the flaky test: org.apache.spark…
zsxwing May 5, 2015
fc8feaa
[SPARK-6653] [YARN] New config to specify port for sparkYarnAM actor …
May 5, 2015
4222da6
[SPARK-5112] Expose SizeEstimator as a developer api
sryza May 5, 2015
51f4620
[SPARK-7357] Improving HBaseTest example
JihongMA May 5, 2015
d497358
[SPARK-3454] separate json endpoints for data in the UI
squito May 5, 2015
b83091a
[MINOR] Minor update for document
viirya May 5, 2015
5ffc73e
[SPARK-5074] [CORE] [TESTS] Fix the flakey test 'run shuffle with map…
zsxwing May 5, 2015
c6d1efb
[SPARK-7350] [STREAMING] [WEBUI] Attach the Streaming tab when callin…
zsxwing May 5, 2015
5ab652c
[SPARK-7202] [MLLIB] [PYSPARK] Add SparseMatrixPickler to SerDe
MechCoder May 5, 2015
5995ada
[SPARK-6612] [MLLIB] [PYSPARK] Python KMeans parity
FlytxtRnD May 5, 2015
9d250e6
Closes #5591
mengxr May 5, 2015
d4cb38a
[MLLIB] [TREE] Verify size of input rdd > 0 when building meta data
May 5, 2015
1fdabf8
[SPARK-7237] Many user provided closures are not actually cleaned
May 5, 2015
57e9f29
[SPARK-7318] [STREAMING] DStream cleans objects that are not closures
May 5, 2015
9f1f9b1
[SPARK-7007] [CORE] Add a metric source for ExecutorAllocationManager
jerryshao May 5, 2015
18340d7
[SPARK-7243][SQL] Reduce size for Contingency Tables in DataFrames
brkyvz May 5, 2015
ee374e8
[SPARK-7333] [MLLIB] Add BinaryClassificationEvaluator to PySpark
mengxr May 5, 2015
47728db
[SPARK-5888] [MLLIB] Add OneHotEncoder as a Transformer
sryza May 5, 2015
489700c
[SPARK-6939] [STREAMING] [WEBUI] Add timeline and histogram graphs fo…
zsxwing May 5, 2015
735bc3d
[SPARK-7294][SQL] ADD BETWEEN
May 5, 2015
fec7b29
[SPARK-7351] [STREAMING] [DOCS] Add spark.streaming.ui.retainedBatche…
zsxwing May 5, 2015
3059291
[SQL][Minor] make StringComparison extends ExpectsInputTypes
scwf May 5, 2015
c688e3c
[SPARK-7230] [SPARKR] Make RDD private in SparkR.
shivaram May 5, 2015
[SPARK-5342] [YARN] Allow long running Spark apps to run on secure YARN/HDFS

Currently, Spark apps running on secure YARN/HDFS cannot write data to HDFS after 7 days,
since delegation tokens cannot be renewed beyond that point. This means Spark Streaming apps
cannot keep running on secure YARN.

This commit adds basic functionality to fix this issue. In this patch:
- new parameters are added - principal and keytab, which can be used to log in to a KDC
- the client logs in, and then gets tokens to start the AM
- the keytab is copied to the staging directory
- the AM waits until 60% of the tokens' validity period has elapsed, and then logs in again using the keytab
- after each such interval, new tokens are created and sent to the executors (a sketch of this scheduling follows the list)
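As a rough illustration of the scheduling described above (a minimal sketch, not the patch's actual AM code; the object and method names and the callback are hypothetical, and only the 60%-of-validity rule is taken from this description):

import java.util.concurrent.{Executors, TimeUnit}

object TokenRenewalSketch {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Re-login from the keytab and create fresh tokens once 60% of the current
  // tokens' validity period has elapsed, then schedule the next renewal.
  def scheduleRenewal(tokenValidityMs: Long)(loginAndCreateTokens: () => Unit): Unit = {
    val delayMs = (0.6 * tokenValidityMs).toLong
    scheduler.schedule(new Runnable {
      override def run(): Unit = {
        loginAndCreateTokens() // log in via the keytab, write new tokens for executors
        scheduleRenewal(tokenValidityMs)(loginAndCreateTokens)
      }
    }, delayMs, TimeUnit.MILLISECONDS)
  }
}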

Currently, to avoid complicating the architecture, we set the keytab and principal in the
SparkHadoopUtil singleton, and schedule a login. Once the login is completed, a callback is scheduled.

This is being posted so I can gather feedback on the general implementation.

There are currently a bunch of things to do:
- [x] logging
- [x] testing - I plan to test this manually soon. If you have ideas for adding unit tests, please comment.
- [x] add code to ensure that if these params are set in non-YARN cluster mode, we complain
- [x] documentation
- [x] Have the executors request credentials from the AM, so that retries are possible.

Author: Hari Shreedharan <[email protected]>

Closes apache#4688 from harishreedharan/kerberos-longrunning and squashes the following commits:

36eb8a9 [Hari Shreedharan] Change the renewal interval config param. Fix a bunch of comments.
611923a [Hari Shreedharan] Make sure the namenodes are listed correctly for creating tokens.
09fe224 [Hari Shreedharan] Use token.renew to get token's renewal interval rather than using hdfs-site.xml
6963bbc [Hari Shreedharan] Schedule renewal in AM before starting user class. Else, a restarted AM cannot access HDFS if the user class tries to.
072659e [Hari Shreedharan] Fix build failure caused by thread factory getting moved to ThreadUtils.
f041dd3 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
42eead4 [Hari Shreedharan] Remove RPC part. Refactor and move methods around, use renewal interval rather than max lifetime to create new tokens.
ebb36f5 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
bc083e3 [Hari Shreedharan] Overload RegisteredExecutor to send tokens. Minor doc updates.
7b19643 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
8a4f268 [Hari Shreedharan] Added docs in the security guide. Changed some code to ensure that the renewer objects are created only if required.
e800c8b [Hari Shreedharan] Restore original RegisteredExecutor message, and send new tokens via NewTokens message.
0e9507e [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
7f1bc58 [Hari Shreedharan] Minor fixes, cleanup.
bcd11f9 [Hari Shreedharan] Refactor AM and Executor token update code into separate classes, also send tokens via akka on executor startup.
f74303c [Hari Shreedharan] Move the new logic into specialized classes. Add cleanup for old credentials files.
2f9975c [Hari Shreedharan] Ensure new tokens are written out immediately on AM restart. Also, pick up the latest suffix from HDFS if the AM is restarted.
61b2b27 [Hari Shreedharan] Account for AM restarts by making sure lastSuffix is read from the files on HDFS.
62c45ce [Hari Shreedharan] Relogin from keytab periodically.
fa233bd [Hari Shreedharan] Adding logging, fixing minor formatting and ordering issues.
42813b4 [Hari Shreedharan] Remove utils.sh, which was re-added due to merge with master.
0de27ee [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
55522e3 [Hari Shreedharan] Fix failure caused by Preconditions ambiguity.
9ef5f1b [Hari Shreedharan] Added explanation of how the credentials refresh works, some other minor fixes.
f4fd711 [Hari Shreedharan] Fix SparkConf usage.
2debcea [Hari Shreedharan] Change the file structure for credentials files. I will push a followup patch which adds a cleanup mechanism for old credentials files. The credentials files are small and few enough not to cause issues on HDFS.
af6d5f0 [Hari Shreedharan] Cleaning up files where changes weren't required.
f0f54cb [Hari Shreedharan] Be more defensive when updating the credentials file.
f6954da [Hari Shreedharan] Got rid of Akka communication to renew, instead the executors check a known file's modification time to read the credentials.
5c11c3e [Hari Shreedharan] Move tests to YarnSparkHadoopUtil to fix compile issues.
b4cb917 [Hari Shreedharan] Send keytab to AM via DistributedCache rather than directly via HDFS
0985b4e [Hari Shreedharan] Write tokens to HDFS and read them back when required, rather than sending them over the wire.
d79b2b9 [Hari Shreedharan] Make sure correct credentials are passed to FileSystem#addDelegationTokens()
8c6928a [Hari Shreedharan] Fix issue caused by direct creation of Actor object.
fb27f46 [Hari Shreedharan] Make sure principal and keytab are set before CoarseGrainedSchedulerBackend is started. Also schedule re-logins in CoarseGrainedSchedulerBackend#start()
41efde0 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
d282d7a [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
bcfc374 [Hari Shreedharan] Fix Hadoop-1 build by adding no-op methods in SparkHadoopUtil, with impl in YarnSparkHadoopUtil.
f8fe694 [Hari Shreedharan] Handle None if keytab-login is not scheduled.
2b0d745 [Hari Shreedharan] [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
ccba5bc [Hari Shreedharan] WIP: More changes wrt kerberos
77914dd [Hari Shreedharan] WIP: Add kerberos principal and keytab to YARN client.
harishreedharan authored and tgravescs committed Apr 30, 2015
commit 6c65da6bb7d1213e6a4a9f7fd1597d029d87d07c
105 changes: 105 additions & 0 deletions core/src/main/scala/org/apache/spark/deploy/ExecutorDelegationTokenUpdater.scala
@@ -0,0 +1,105 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.spark.deploy

import java.util.concurrent.{Executors, TimeUnit}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

import org.apache.spark.{Logging, SparkConf}
import org.apache.spark.util.{ThreadUtils, Utils}

import scala.util.control.NonFatal

private[spark] class ExecutorDelegationTokenUpdater(
    sparkConf: SparkConf,
    hadoopConf: Configuration) extends Logging {

  @volatile private var lastCredentialsFileSuffix = 0

  private val credentialsFile = sparkConf.get("spark.yarn.credentials.file")

  private val delegationTokenRenewer =
    Executors.newSingleThreadScheduledExecutor(
      ThreadUtils.namedThreadFactory("Delegation Token Refresh Thread"))

  // On the executor, this thread wakes up and picks up new tokens from HDFS, if any.
  private val executorUpdaterRunnable =
    new Runnable {
      override def run(): Unit = Utils.logUncaughtExceptions(updateCredentialsIfRequired())
    }

  def updateCredentialsIfRequired(): Unit = {
    try {
      val credentialsFilePath = new Path(credentialsFile)
      val remoteFs = FileSystem.get(hadoopConf)
      SparkHadoopUtil.get.listFilesSorted(
        remoteFs, credentialsFilePath.getParent,
        credentialsFilePath.getName, SparkHadoopUtil.SPARK_YARN_CREDS_TEMP_EXTENSION)
        .lastOption.foreach { credentialsStatus =>
          val suffix = SparkHadoopUtil.get.getSuffixForCredentialsPath(credentialsStatus.getPath)
          if (suffix > lastCredentialsFileSuffix) {
            logInfo("Reading new delegation tokens from " + credentialsStatus.getPath)
            val newCredentials = getCredentialsFromHDFSFile(remoteFs, credentialsStatus.getPath)
            lastCredentialsFileSuffix = suffix
            UserGroupInformation.getCurrentUser.addCredentials(newCredentials)
            logInfo("Tokens updated from credentials file.")
          } else {
            // Check every hour to see if new credentials arrived.
            logInfo("Updated delegation tokens were expected, but the driver has not updated the " +
              "tokens yet, will check again in an hour.")
            delegationTokenRenewer.schedule(executorUpdaterRunnable, 1, TimeUnit.HOURS)
            return
          }
        }
      val timeFromNowToRenewal =
        SparkHadoopUtil.get.getTimeFromNowToRenewal(
          sparkConf, 0.8, UserGroupInformation.getCurrentUser.getCredentials)
      if (timeFromNowToRenewal <= 0) {
        executorUpdaterRunnable.run()
      } else {
        logInfo(s"Scheduling token refresh from HDFS in $timeFromNowToRenewal millis.")
        delegationTokenRenewer.schedule(
          executorUpdaterRunnable, timeFromNowToRenewal, TimeUnit.MILLISECONDS)
      }
    } catch {
      // Since the file may get deleted while we are reading it, catch the Exception and come
      // back in an hour to try again
      case NonFatal(e) =>
        logWarning("Error while trying to update credentials, will try again in 1 hour", e)
        delegationTokenRenewer.schedule(executorUpdaterRunnable, 1, TimeUnit.HOURS)
    }
  }

  private def getCredentialsFromHDFSFile(remoteFs: FileSystem, tokenPath: Path): Credentials = {
    val stream = remoteFs.open(tokenPath)
    try {
      val newCredentials = new Credentials()
      newCredentials.readTokenStorageStream(stream)
      newCredentials
    } finally {
      stream.close()
    }
  }

  def stop(): Unit = {
    delegationTokenRenewer.shutdown()
  }

}
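This class only reads credentials; the writing side lives in the YARN AM and is not part of this file. A minimal sketch of that counterpart, assuming the suffix and ".tmp" conventions defined in SparkHadoopUtil below (the method name and signature are hypothetical):

private def writeNewCredentials(
    remoteFs: FileSystem,
    credentialsFile: Path,
    suffix: Int,
    credentials: Credentials): Unit = {
  // Write to a ".tmp" path first: listFilesSorted excludes that extension, so
  // executors never observe a partially written credentials file.
  val finalPath = new Path(credentialsFile.getParent,
    credentialsFile.getName + SparkHadoopUtil.SPARK_YARN_CREDS_COUNTER_DELIM + suffix)
  val tempPath = finalPath.suffix(SparkHadoopUtil.SPARK_YARN_CREDS_TEMP_EXTENSION)
  val stream = remoteFs.create(tempPath, true)
  try {
    credentials.writeTokenStorageToStream(stream)
  } finally {
    stream.close()
  }
  // Publish the complete file under its final, suffixed name.
  remoteFs.rename(tempPath, finalPath)
}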
69 changes: 67 additions & 2 deletions core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -17,12 +17,16 @@

package org.apache.spark.deploy

import java.io.{ByteArrayInputStream, DataInputStream}
import java.lang.reflect.Method
import java.security.PrivilegedExceptionAction
import java.util.{Arrays, Comparator}

import com.google.common.primitives.Longs
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path, PathFilter}
import org.apache.hadoop.fs.FileSystem.Statistics
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapreduce.JobContext
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
@@ -32,14 +36,16 @@ import org.apache.spark.annotation.DeveloperApi
import org.apache.spark.util.Utils

import scala.collection.JavaConversions._
import scala.concurrent.duration._

/**
 * :: DeveloperApi ::
 * Contains util methods to interact with Hadoop from Spark.
 */
@DeveloperApi
class SparkHadoopUtil extends Logging {
  val conf: Configuration = newConfiguration(new SparkConf())
  private val sparkConf = new SparkConf()
  val conf: Configuration = newConfiguration(sparkConf)
  UserGroupInformation.setConfiguration(conf)
/**
@@ -201,6 +207,61 @@ class SparkHadoopUtil extends Logging {
    if (baseStatus.isDir) recurse(basePath) else Array(baseStatus)
  }

  /**
   * Lists all the files in a directory whose names start with the specified prefix and do not
   * end with the given suffix. The returned [[FileStatus]] instances are sorted by the
   * modification times of the respective files.
   */
  def listFilesSorted(
      remoteFs: FileSystem,
      dir: Path,
      prefix: String,
      exclusionSuffix: String): Array[FileStatus] = {
    val fileStatuses = remoteFs.listStatus(dir,
      new PathFilter {
        override def accept(path: Path): Boolean = {
          val name = path.getName
          name.startsWith(prefix) && !name.endsWith(exclusionSuffix)
        }
      })
    Arrays.sort(fileStatuses, new Comparator[FileStatus] {
      override def compare(o1: FileStatus, o2: FileStatus): Int = {
        Longs.compare(o1.getModificationTime, o2.getModificationTime)
      }
    })
    fileStatuses
  }
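  // For example (an illustrative directory listing, not from the patch): with files
  // "credentials-1", "credentials-2" and "credentials-3.tmp" present,
  // listFilesSorted(fs, dir, "credentials", ".tmp") returns the statuses of
  // "credentials-1" and "credentials-2", oldest modification time first.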

  /**
   * How much time is remaining (in millis) from now until (fraction * renewal time) for the
   * token that is valid the longest?
   * This will return a negative (or zero) value if that fraction of the validity period has
   * already elapsed.
   */
  def getTimeFromNowToRenewal(
      sparkConf: SparkConf,
      fraction: Double,
      credentials: Credentials): Long = {
    val now = System.currentTimeMillis()

    val renewalInterval =
      sparkConf.getLong("spark.yarn.token.renewal.interval", (24 hours).toMillis)

    credentials.getAllTokens.filter(_.getKind == DelegationTokenIdentifier.HDFS_DELEGATION_KIND)
      .map { t =>
        val identifier = new DelegationTokenIdentifier()
        identifier.readFields(new DataInputStream(new ByteArrayInputStream(t.getIdentifier)))
        (identifier.getIssueDate + fraction * renewalInterval).toLong - now
      }.foldLeft(0L)(math.max)
  }
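  // Worked example (illustrative numbers, not from the patch): with fraction = 0.8 and the
  // default 24-hour renewal interval (86,400,000 ms), a token issued 10 hours
  // (36,000,000 ms) ago yields (issueDate + 0.8 * 86400000) - now
  // = 69,120,000 - 36,000,000 = 33,120,000 ms, i.e. renewal is due roughly 9.2 hours from now.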


  private[spark] def getSuffixForCredentialsPath(credentialsPath: Path): Int = {
    val fileName = credentialsPath.getName
    fileName.substring(
      fileName.lastIndexOf(SparkHadoopUtil.SPARK_YARN_CREDS_COUNTER_DELIM) + 1).toInt
  }
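  // For example, a credentials path ending in "credentials-12" (counter delimiter "-")
  // yields the suffix 12, which the executor compares against lastCredentialsFileSuffix.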


  private val HADOOP_CONF_PATTERN = "(\\$\\{hadoopconf-[^\\}\\$\\s]+\\})".r.unanchored

  /**
@@ -251,6 +312,10 @@ object SparkHadoopUtil {
    }
  }

  val SPARK_YARN_CREDS_TEMP_EXTENSION = ".tmp"

  val SPARK_YARN_CREDS_COUNTER_DELIM = "-"

  def get: SparkHadoopUtil = {
    hadoop
  }
4 changes: 4 additions & 0 deletions core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -401,6 +401,10 @@ object SparkSubmit {
      OptionAssigner(args.archives, YARN, CLUSTER, clOption = "--archives"),
      OptionAssigner(args.jars, YARN, CLUSTER, clOption = "--addJars"),

      // Yarn client or cluster
      OptionAssigner(args.principal, YARN, ALL_DEPLOY_MODES, clOption = "--principal"),
      OptionAssigner(args.keytab, YARN, ALL_DEPLOY_MODES, clOption = "--keytab"),

      // Other options
      OptionAssigner(args.executorCores, STANDALONE, ALL_DEPLOY_MODES,
        sysProp = "spark.executor.cores"),
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@@ -63,6 +63,8 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
  var action: SparkSubmitAction = null
  val sparkProperties: HashMap[String, String] = new HashMap[String, String]()
  var proxyUser: String = null
  var principal: String = null
  var keytab: String = null

  // Standalone cluster mode only
  var supervise: Boolean = false
@@ -393,6 +395,12 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
      case PROXY_USER =>
        proxyUser = value

      case PRINCIPAL =>
        principal = value

      case KEYTAB =>
        keytab = value

      case HELP =>
        printUsageAndExit(0)
@@ -506,6 +514,13 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
        |  --num-executors NUM         Number of executors to launch (Default: 2).
        |  --archives ARCHIVES         Comma separated list of archives to be extracted into the
        |                              working directory of each executor.
        |  --principal PRINCIPAL       Principal to be used to login to KDC, while running on
        |                              secure HDFS.
        |  --keytab KEYTAB             The full path to the file that contains the keytab for the
        |                              principal specified above. This keytab will be copied to
        |                              the node running the Application Master via the Secure
        |                              Distributed Cache, for renewing the login tickets and the
        |                              delegation tokens periodically.
      """.stripMargin
    )
    SparkSubmit.exitFn()
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -26,7 +26,7 @@ import scala.util.{Failure, Success}
import org.apache.spark.rpc._
import org.apache.spark._
import org.apache.spark.TaskState.TaskState
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.deploy.{ExecutorDelegationTokenUpdater, SparkHadoopUtil}
import org.apache.spark.deploy.worker.WorkerWatcher
import org.apache.spark.scheduler.TaskDescription
import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
@@ -168,6 +168,16 @@ private[spark] object CoarseGrainedExecutorBackend extends Logging {
          driverConf.set(key, value)
        }
      }
      var tokenUpdaterOption: Option[ExecutorDelegationTokenUpdater] = None
      if (driverConf.contains("spark.yarn.credentials.file")) {
        logInfo("Will periodically update credentials from: " +
          driverConf.get("spark.yarn.credentials.file"))
        // Periodically update the credentials for this user to ensure HDFS tokens get updated.
        tokenUpdaterOption =
          Some(new ExecutorDelegationTokenUpdater(driverConf, SparkHadoopUtil.get.conf))
        tokenUpdaterOption.get.updateCredentialsIfRequired()
      }

      val env = SparkEnv.createExecutorEnv(
        driverConf, executorId, hostname, port, cores, isLocal = false)

@@ -183,6 +193,7 @@ private[spark] object CoarseGrainedExecutorBackend extends Logging {
env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
}
env.rpcEnv.awaitTermination()
tokenUpdaterOption.foreach(_.stop())
}
}

core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
@@ -68,6 +68,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp

  class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
    extends ThreadSafeRpcEndpoint with Logging {

    override protected def log = CoarseGrainedSchedulerBackend.this.log

    private val addressToExecutorId = new HashMap[RpcAddress, String]
@@ -112,6 +113,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
            // Ignoring the task kill since the executor is not registered.
            logWarning(s"Attempted to kill task $taskId for unknown executor $executorId.")
        }

    }

    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
@@ -122,7 +124,6 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
        } else {
          logInfo("Registered executor: " + executorRef + " with ID " + executorId)
          context.reply(RegisteredExecutor)

          addressToExecutorId(executorRef.address) = executorId
          totalCoreCount.addAndGet(cores)
          totalRegisteredExecutors.addAndGet(1)
@@ -243,6 +244,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
        properties += ((key, value))
      }
    }

    // TODO (prashant) send conf instead of properties
    driverEndpoint = rpcEnv.setupEndpoint(
      CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties))
2 changes: 2 additions & 0 deletions docs/security.md
@@ -32,6 +32,8 @@ SSL must be configured on each node and configured for each component involved i
### YARN mode
The key-store can be prepared on the client side and then distributed and used by the executors as part of the application. This is possible because the user can deploy files before the application is started in YARN by using the `spark.yarn.dist.files` or `spark.yarn.dist.archives` configuration settings. The responsibility for encrypting these files during transfer lies on the YARN side and has nothing to do with Spark.

For long-running apps like Spark Streaming apps to be able to write to HDFS, it is possible to pass a principal and keytab to `spark-submit` via the `--principal` and `--keytab` parameters respectively. The keytab passed in will be copied over to the machine running the Application Master via the Hadoop Distributed Cache (securely, if YARN is configured with SSL and HDFS encryption is enabled). The Kerberos login will be periodically renewed using this principal and keytab, and the delegation tokens required for HDFS will be regenerated periodically, so the application can continue writing to HDFS.
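An illustrative invocation (the principal, keytab path, and application names below are placeholders, not part of this patch): `./bin/spark-submit --master yarn-cluster --principal user@EXAMPLE.COM --keytab /path/to/user.keytab --class MyStreamingApp my-app.jar`.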

### Standalone mode
The user needs to provide key-stores and configuration options for the master and workers. They have to be set by attaching appropriate Java system properties to the `SPARK_MASTER_OPTS` and `SPARK_WORKER_OPTS` environment variables, or just to `SPARK_DAEMON_JAVA_OPTS`. In this mode, the user may allow the executors to use SSL settings inherited from the worker that spawned the executor. This can be accomplished by setting `spark.ssl.useNodeLocalConf` to `true`. If that parameter is set, the settings provided by the user on the client side are not used by the executors.

launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java
@@ -69,8 +69,10 @@ class SparkSubmitOptionParser {
  // YARN-only options.
  protected final String ARCHIVES = "--archives";
  protected final String EXECUTOR_CORES = "--executor-cores";
  protected final String QUEUE = "--queue";
  protected final String KEYTAB = "--keytab";
  protected final String NUM_EXECUTORS = "--num-executors";
  protected final String PRINCIPAL = "--principal";
  protected final String QUEUE = "--queue";

  /**
   * This is the canonical list of spark-submit options. Each entry in the array contains the
@@ -96,11 +98,13 @@ class SparkSubmitOptionParser {
    { EXECUTOR_MEMORY },
    { FILES },
    { JARS },
    { KEYTAB },
    { KILL_SUBMISSION },
    { MASTER },
    { NAME },
    { NUM_EXECUTORS },
    { PACKAGES },
    { PRINCIPAL },
    { PROPERTIES_FILE },
    { PROXY_USER },
    { PY_FILES },