[SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive information

## What changes were proposed in this pull request?

This PR backports SPARK-18535 and SPARK-19720 to Spark 2.1.

It redacts sensitive information, as selected by configuration, from the Spark UI and the spark-submit console logs.

It is based on Mark Grover's original PRs.

## How was this patch tested?

The same tests from the original PRs are applied.

Author: Mark Grover <[email protected]>

Closes #18802 from dmvieira/feature-redact.
markgrover authored and Marcelo Vanzin committed Aug 7, 2017
commit 444cca14d7ac8c5ab5d7e9d080b11f4d6babe3bf
@@ -670,7 +670,8 @@ object SparkSubmit {
     if (verbose) {
       printStream.println(s"Main class:\n$childMainClass")
       printStream.println(s"Arguments:\n${childArgs.mkString("\n")}")
-      printStream.println(s"System properties:\n${sysProps.mkString("\n")}")
+      // sysProps may contain sensitive information, so redact before printing
+      printStream.println(s"System properties:\n${Utils.redact(sysProps).mkString("\n")}")
       printStream.println(s"Classpath elements:\n${childClasspath.mkString("\n")}")
       printStream.println("\n")
     }
@@ -84,9 +84,15 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
     // scalastyle:off println
     if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
     Option(propertiesFile).foreach { filename =>
-      Utils.getPropertiesFromFile(filename).foreach { case (k, v) =>
+      val properties = Utils.getPropertiesFromFile(filename)
+      properties.foreach { case (k, v) =>
         defaultProperties(k) = v
-        if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
       }
+      // Property files may contain sensitive information, so redact before printing
+      if (verbose) {
+        Utils.redact(properties).foreach { case (k, v) =>
+          SparkSubmit.printStream.println(s"Adding default property: $k=$v")
+        }
+      }
     }
     // scalastyle:on println
@@ -318,7 +324,7 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
     |
     |Spark properties used, including those specified through
     | --conf and those from the properties file $propertiesFile:
-    |${sparkProperties.mkString(" ", "\n ", "\n")}
+    |${Utils.redact(sparkProperties).mkString(" ", "\n ", "\n")}
     """.stripMargin
   }

@@ -220,4 +220,13 @@ package object config {
       " bigger files.")
     .longConf
     .createWithDefault(4 * 1024 * 1024)
+
+  private[spark] val SECRET_REDACTION_PATTERN =
+    ConfigBuilder("spark.redaction.regex")
+      .doc("Regex to decide which Spark configuration properties and environment variables in " +
+        "driver and executor environments contain sensitive information. When this regex matches " +
+        "a property, its value is redacted from the environment UI and various logs like YARN " +
+        "and event logs.")
+      .stringConf
+      .createWithDefault("(?i)secret|password")
 }
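To make the default pattern concrete, here is a minimal sketch that applies `(?i)secret|password` to a few property names using only `scala.util.matching.Regex`. The object `RedactionPatternSketch` and its `isSensitive` helper are hypothetical names introduced for illustration; the sketch does not use Spark's `ConfigBuilder` and is not part of the patch.

```scala
import scala.util.matching.Regex

// Hypothetical standalone sketch; not part of the patch and not a Spark API.
object RedactionPatternSketch {
  // Same default value as spark.redaction.regex in this change.
  val defaultPattern: Regex = "(?i)secret|password".r

  // A key counts as sensitive when the pattern matches anywhere in its name.
  // findFirstIn is unanchored, so a substring match is enough.
  def isSensitive(key: String): Boolean =
    defaultPattern.findFirstIn(key).isDefined
}
```

With this sketch, `spark.executorEnv.HADOOP_CREDSTORE_PASSWORD` and `spark.my.sECreT` are both flagged because `(?i)` makes the match case-insensitive, while `spark.regular.property` is left alone.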
@@ -153,7 +153,9 @@ private[spark] class EventLoggingListener(

   override def onTaskEnd(event: SparkListenerTaskEnd): Unit = logEvent(event)

-  override def onEnvironmentUpdate(event: SparkListenerEnvironmentUpdate): Unit = logEvent(event)
+  override def onEnvironmentUpdate(event: SparkListenerEnvironmentUpdate): Unit = {
+    logEvent(redactEvent(event))
+  }

   // Events that trigger a flush
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
@@ -231,6 +233,15 @@ private[spark] class EventLoggingListener(
     }
   }

+  private[spark] def redactEvent(
+      event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = {
+    // "Spark Properties" entry will always exist because the map is always populated with it.
+    val redactedProps = Utils.redact(sparkConf, event.environmentDetails("Spark Properties"))
+    val redactedEnvironmentDetails = event.environmentDetails +
+      ("Spark Properties" -> redactedProps)
+    SparkListenerEnvironmentUpdate(redactedEnvironmentDetails)
+  }
+
 }

 private[spark] object EventLoggingListener extends Logging {
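The `redactEvent` change above rewrites only the "Spark Properties" section of the environment details and leaves the other sections as they are. The following sketch shows that map update in isolation, assuming the details are a `Map[String, Seq[(String, String)]]` as the diff suggests. `EnvDetailsSketch`, `Section`, and `redactSparkProperties` are hypothetical names, and the pattern is hard-coded to the default instead of being read from a `SparkConf`.

```scala
// Hypothetical standalone sketch; EnvDetailsSketch and its members are not Spark APIs.
object EnvDetailsSketch {
  type Section = Seq[(String, String)]

  val Replacement = "*********(redacted)"
  private val pattern = "(?i)secret|password".r

  // Replace the value of every pair whose key matches the pattern.
  def redact(kvs: Section): Section =
    kvs.map { case (k, v) =>
      if (pattern.findFirstIn(k).isDefined) (k, Replacement) else (k, v)
    }

  // Adding a pair whose key already exists replaces that entry in the
  // immutable map; every other section is carried over unchanged.
  // Like the patch, this assumes the "Spark Properties" key is present.
  def redactSparkProperties(details: Map[String, Section]): Map[String, Section] =
    details + ("Spark Properties" -> redact(details("Spark Properties")))
}
```

Because the map is immutable, the original event is never modified; the listener logs a redacted copy.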
@@ -22,21 +22,17 @@ import javax.servlet.http.HttpServletRequest
 import scala.xml.Node

 import org.apache.spark.ui.{UIUtils, WebUIPage}
+import org.apache.spark.util.Utils

 private[ui] class EnvironmentPage(parent: EnvironmentTab) extends WebUIPage("") {
   private val listener = parent.listener

-  private def removePass(kv: (String, String)): (String, String) = {
-    if (kv._1.toLowerCase.contains("password") || kv._1.toLowerCase.contains("secret")) {
-      (kv._1, "******")
-    } else kv
-  }
-
   def render(request: HttpServletRequest): Seq[Node] = {
     val runtimeInformationTable = UIUtils.listingTable(
       propertyHeader, jvmRow, listener.jvmInformation, fixedWidth = true)
-    val sparkPropertiesTable = UIUtils.listingTable(
-      propertyHeader, propertyRow, listener.sparkProperties.map(removePass), fixedWidth = true)
+    val sparkPropertiesTable = UIUtils.listingTable(propertyHeader, propertyRow,
+      Utils.redact(parent.conf, listener.sparkProperties), fixedWidth = true)
+
     val systemPropertiesTable = UIUtils.listingTable(
       propertyHeader, propertyRow, listener.systemProperties, fixedWidth = true)
     val classpathEntriesTable = UIUtils.listingTable(
@@ -23,6 +23,7 @@ import org.apache.spark.ui._

 private[ui] class EnvironmentTab(parent: SparkUI) extends SparkUITab(parent, "environment") {
   val listener = parent.environmentListener
+  val conf = parent.conf
   attachPage(new EnvironmentPage(this))
 }

33 changes: 32 additions & 1 deletion core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -38,6 +38,7 @@ import scala.io.Source
 import scala.reflect.ClassTag
 import scala.util.Try
 import scala.util.control.{ControlThrowable, NonFatal}
+import scala.util.matching.Regex

 import _root_.io.netty.channel.unix.Errors.NativeIoException
 import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
@@ -55,7 +56,7 @@ import org.slf4j.Logger
 import org.apache.spark._
 import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.internal.Logging
-import org.apache.spark.internal.config.{DYN_ALLOCATION_INITIAL_EXECUTORS, DYN_ALLOCATION_MIN_EXECUTORS, EXECUTOR_INSTANCES}
+import org.apache.spark.internal.config._
 import org.apache.spark.network.util.JavaUtils
 import org.apache.spark.serializer.{DeserializationStream, SerializationStream, SerializerInstance}

@@ -2571,6 +2572,36 @@ private[spark] object Utils extends Logging {
       sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
     }
   }
+
+  private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
+
+  def redact(conf: SparkConf, kvs: Seq[(String, String)]): Seq[(String, String)] = {
+    val redactionPattern = conf.get(SECRET_REDACTION_PATTERN).r
+    redact(redactionPattern, kvs)
+  }
+
+  private def redact(redactionPattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
+    kvs.map { kv =>
+      redactionPattern.findFirstIn(kv._1)
+        .map { _ => (kv._1, REDACTION_REPLACEMENT_TEXT) }
+        .getOrElse(kv)
+    }
+  }
+
+  /**
+   * Looks up the redaction regex from within the key value pairs and uses it to redact the rest
+   * of the key value pairs. No care is taken to make sure the redaction property itself is not
+   * redacted. So theoretically, the property itself could be configured to redact its own value
+   * when printing.
+   */
+  def redact(kvs: Map[String, String]): Seq[(String, String)] = {
+    val redactionPattern = kvs.getOrElse(
+      SECRET_REDACTION_PATTERN.key,
+      SECRET_REDACTION_PATTERN.defaultValueString
+    ).r
+    redact(redactionPattern, kvs.toArray)
+  }
+
 }

 private[util] object CallerContext extends Logging {
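The Scaladoc on `redact(kvs: Map[String, String])` notes that the regex is read from the same key-value pairs it then redacts, so the redaction property can end up hiding its own value. Here is a minimal sketch of that behavior, assuming plain Scala collections in place of Spark's config machinery. `SelfRedactSketch`, `RegexKey`, and `DefaultRegex` are hypothetical names that stand in for the patch's `SECRET_REDACTION_PATTERN.key` and `SECRET_REDACTION_PATTERN.defaultValueString`.

```scala
import scala.util.matching.Regex

// Hypothetical standalone sketch; it mirrors the shape of the patch but is not Spark's Utils object.
object SelfRedactSketch {
  val RegexKey = "spark.redaction.regex"
  val DefaultRegex = "(?i)secret|password"
  val Replacement = "*********(redacted)"

  // Replace the value of every pair whose key matches the pattern.
  private def redact(pattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (k, v) =>
      if (pattern.findFirstIn(k).isDefined) (k, Replacement) else (k, v)
    }

  // Read the pattern from the pairs themselves, falling back to the default
  // when the redaction property is not set.
  def redact(kvs: Map[String, String]): Seq[(String, String)] =
    redact(kvs.getOrElse(RegexKey, DefaultRegex).r, kvs.toSeq)
}
```

For example, a map that sets `spark.redaction.regex` to `(?i)regex|token` has that very entry masked, since the key contains "regex". This is the self-redaction case the Scaladoc describes.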
@@ -95,6 +95,18 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
     }
   }

+  test("Event logging with password redaction") {
+    val key = "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD"
+    val secretPassword = "secret_password"
+    val conf = getLoggingConf(testDirPath, None)
+      .set(key, secretPassword)
+    val eventLogger = new EventLoggingListener("test", None, testDirPath.toUri(), conf)
+    val envDetails = SparkEnv.environmentDetails(conf, "FIFO", Seq.empty, Seq.empty)
+    val event = SparkListenerEnvironmentUpdate(envDetails)
+    val redactedProps = eventLogger.redactEvent(event).environmentDetails("Spark Properties").toMap
+    assert(redactedProps(key) == "*********(redacted)")
+  }
+
   test("Log overwriting") {
     val logUri = EventLoggingListener.getLogPath(testDir.toURI, "test", None)
     val logPath = new URI(logUri).getPath
20 changes: 20 additions & 0 deletions core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
@@ -975,4 +975,24 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {

     assert(pValue > threshold)
   }
+
+  test("redact sensitive information") {
+    val sparkConf = new SparkConf
+
+    // Set some secret keys
+    val secretKeys = Seq(
+      "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD",
+      "spark.my.password",
+      "spark.my.sECreT")
+    secretKeys.foreach { key => sparkConf.set(key, "secret_password") }
+    // Set a non-secret key
+    sparkConf.set("spark.regular.property", "not_a_secret")
+
+    // Redact sensitive information
+    val redactedConf = Utils.redact(sparkConf, sparkConf.getAll).toMap
+
+    // Assert that secret information got redacted while the regular property remained the same
+    secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
+    assert(redactedConf("spark.regular.property") === "not_a_secret")
+  }
 }
9 changes: 9 additions & 0 deletions docs/configuration.md
@@ -350,6 +350,15 @@ Apart from these, the following properties are also available, and may be useful
     process. The user can specify multiple of these to set multiple environment variables.
   </td>
 </tr>
+<tr>
+  <td><code>spark.redaction.regex</code></td>
+  <td>(?i)secret|password</td>
+  <td>
+    Regex to decide which Spark configuration properties and environment variables in driver and
+    executor environments contain sensitive information. When this regex matches a property, its
+    value is redacted from the environment UI and various logs like YARN and event logs.
+  </td>
+</tr>
 <tr>
   <td><code>spark.python.profile</code></td>
   <td>false</td>
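One consequence of the documented property is worth spelling out: a user-supplied `spark.redaction.regex` replaces the default pattern rather than extending it, because the code in this patch reads a single regex value. The sketch below illustrates this with plain `scala.util.matching.Regex`. `CustomPatternSketch`, `sensitiveKeys`, and the sample keys are hypothetical names chosen for illustration.

```scala
import scala.util.matching.Regex

// Hypothetical standalone sketch; CustomPatternSketch is not a Spark API.
object CustomPatternSketch {
  // Returns the keys that the given pattern would cause to be redacted.
  def sensitiveKeys(pattern: String, keys: Seq[String]): Seq[String] = {
    val regex: Regex = pattern.r
    keys.filter(k => regex.findFirstIn(k).isDefined)
  }
}
```

Under this sketch, the pattern `(?i)token` flags `spark.my.token` but no longer flags `spark.my.password`. To keep the default coverage while adding a new term, the default alternatives have to be repeated, as in `(?i)secret|password|token`.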
@@ -36,7 +36,6 @@ import org.apache.hadoop.yarn.ipc.YarnRPC
 import org.apache.hadoop.yarn.util.{ConverterUtils, Records}

 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
-import org.apache.spark.deploy.yarn.config._
 import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config._
 import org.apache.spark.launcher.YarnCommandBuilderUtils
@@ -75,7 +74,7 @@ private[yarn] class ExecutorRunnable(
     |===============================================================================
     |YARN executor launch context:
     | env:
-    |${env.map { case (k, v) => s" $k -> $v\n" }.mkString}
+    |${Utils.redact(sparkConf, env.toSeq).map { case (k, v) => s" $k -> $v\n" }.mkString}
     | command:
     | ${commands.mkString(" \\ \n ")}
     |