Closed
Changes from 1 commit (436 commits)
4fcd52b
[SPARK-20506][DOCS] 2.2 migration guide
May 19, 2017
3aad598
[SPARK-20781] the location of Dockerfile in docker.properties.templat…
liu-zhaokun May 19, 2017
cfd1bf0
[SPARK-20792][SS] Support same timeout operations in mapGroupsWithSta…
tdas May 21, 2017
41d8d21
[SPARK-19089][SQL] Add support for nested sequences
michalsenkyr May 22, 2017
af1ff8b
[SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash when convert…
ghoto May 22, 2017
50dba30
[SPARK-20506][DOCS] Add HTML links to highlight list in MLlib guide f…
May 22, 2017
c4b16dc
[SPARK-20813][WEB UI] Fixed Web UI executor page tab search by status…
May 22, 2017
81f63c8
[SPARK-20801] Record accurate size of blocks in MapStatus when it's a…
May 22, 2017
a575532
[SPARK-20831][SQL] Fix INSERT OVERWRITE data source tables with IF NO…
gatorsmile May 22, 2017
a0bf5c4
[SPARK-20764][ML][PYSPARK] Fix visibility discrepancy with numInstanc…
May 22, 2017
2fd6138
[SPARK-20756][YARN] yarn-shuffle jar references unshaded guava
markgrover May 22, 2017
d8328d8
[SPARK-20814][MESOS] Restore support for spark.executor.extraClassPath.
May 22, 2017
ddc199e
[SPARK-20815][SPARKR] NullPointerException in RPackageUtils#checkMani…
jrshust May 23, 2017
5e9541a
[SPARK-20727] Skip tests that use Hadoop utils on CRAN Windows
shivaram May 23, 2017
06c985c
[SPARK-20399][SQL][FOLLOW-UP] Add a config to fallback string literal…
viirya May 23, 2017
dbb068f
[MINOR][SPARKR][ML] Joint coefficients with intercept for SparkR line…
yanboliang May 23, 2017
d20c646
[SPARK-20857][SQL] Generic resolved hint node
rxin May 23, 2017
00dee39
[SPARK-20861][ML][PYTHON] Delegate looping over paramMaps to estimators
MrBago May 24, 2017
ee9d597
[SPARK-18406][CORE] Race between end-of-task and completion iterator …
jiangxb1987 May 24, 2017
e936a96
[SPARK-20764][ML][PYSPARK][FOLLOWUP] Fix visibility discrepancy with …
May 24, 2017
1d10724
[SPARK-20631][FOLLOW-UP] Fix incorrect tests.
zero323 May 24, 2017
83aeac9
[SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape i…
MrBago May 24, 2017
c59ad42
[SPARK-20848][SQL] Shutdown the pool after reading parquet files
viirya May 24, 2017
b7a2a16
[SPARK-20867][SQL] Move hints from Statistics into HintInfo class
rxin May 24, 2017
2405afc
[SPARK-20872][SQL] ShuffleExchange.nodeName should handle null coordi…
rednaxelafx May 25, 2017
ae65d30
[SPARK-16202][SQL][DOC] Follow-up to Correct The Description of Creat…
jaceklaskowski May 25, 2017
3f82d65
[SPARK-20403][SQL] Modify the instructions of some functions
10110346 May 25, 2017
e0aa239
[SPARK-20848][SQL][FOLLOW-UP] Shutdown the pool after reading parquet…
viirya May 25, 2017
b52a06d
[SPARK-20250][CORE] Improper OOM error when a task been killed while …
ConeyLiu May 25, 2017
8896c4e
[SPARK-19659] Fetch big blocks to disk when shuffle-read.
May 25, 2017
9cbf39f
[SPARK-19281][FOLLOWUP][ML] Minor fix for PySpark FPGrowth.
yanboliang May 25, 2017
e01f1f2
[SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PyS…
facaiy May 25, 2017
022a495
[SPARK-20741][SPARK SUBMIT] Added cleanup of JARs archive generated b…
liorregev May 25, 2017
5ae1c65
[SPARK-19707][SPARK-18922][TESTS][SQL][CORE] Fix test failures/the in…
HyukjinKwon May 25, 2017
7a21de9
[SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to exam…
zsxwing May 25, 2017
289dd17
[SPARK-20888][SQL][DOCS] Document change of default setting of spark.…
May 26, 2017
fafe283
[SPARK-20868][CORE] UnsafeShuffleWriter should verify the position af…
cloud-fan May 26, 2017
f99456b
[SPARK-20393][WEBU UI] Strengthen Spark to prevent XSS vulnerabilities
n-marion May 10, 2017
92837ae
[SPARK-19372][SQL] Fix throwing a Java exception at df.fliter() due t…
kiszk May 16, 2017
2b59ed4
[SPARK-20844] Remove experimental from Structured Streaming APIs
marmbrus May 26, 2017
30922de
[SPARK-20694][DOCS][SQL] Document DataFrameWriter partitionBy, bucket…
zero323 May 26, 2017
fc799d7
[SPARK-10643][CORE] Make spark-submit download remote files to local …
loneknightpy May 26, 2017
39f7665
[SPARK-19659][CORE][FOLLOW-UP] Fetch big blocks to disk when shuffle-…
cloud-fan May 27, 2017
f2408bd
[SPARK-20843][CORE] Add a config to set driver terminate timeout
zsxwing May 27, 2017
25e87d8
[SPARK-20897][SQL] cached self-join should not fail
cloud-fan May 27, 2017
dc51be1
[SPARK-20908][SQL] Cache Manager: Hint should be ignored in plan matc…
gatorsmile May 28, 2017
26640a2
[SPARK-20907][TEST] Use testQuietly for test suites that generate lon…
kiszk May 29, 2017
3b79e4c
[SPARK-8184][SQL] Add additional function description for weekofyear
wangyum May 29, 2017
f6730a7
[SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of…
ScrapCodes May 30, 2017
5fdc7d8
[SPARK-20924][SQL] Unable to call the function registered in the not-…
gatorsmile May 30, 2017
287440d
[SPARK-20275][UI] Do not display "Completed" column for in-progress a…
jerryshao May 31, 2017
3cad66e
[SPARK-20877][SPARKR][WIP] add timestamps to test runs
felixcheung May 31, 2017
3686c2e
[SPARK-20790][MLLIB] Correctly handle negative values for implicit fe…
May 31, 2017
f59f9a3
[SPARK-20876][SQL][BACKPORT-2.2] If the input parameter is float type…
10110346 May 31, 2017
a607a26
[SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateExcep…
zsxwing Jun 1, 2017
14fda6f
[SPARK-20244][CORE] Handle incorrect bytesRead metrics when using PyS…
jerryshao Jun 1, 2017
4ab7b82
[MINOR][SQL] Fix a few function description error.
wangyum Jun 1, 2017
6a4e023
[SPARK-20941][SQL] Fix SubqueryExec Reuse
gatorsmile Jun 1, 2017
b81a702
[SPARK-20365][YARN] Remove local scheme when add path to ClassPath.
lycplus Jun 1, 2017
4cba3b5
[SPARK-20922][CORE] Add whitelist of classes that can be deserialized…
Jun 1, 2017
bb3d900
[SPARK-20854][SQL] Extend hint syntax to support expressions
bogdanrdc Jun 1, 2017
25cc800
[SPARK-20942][WEB-UI] The title style about field is error in the his…
Jun 2, 2017
ae00d49
[SPARK-20967][SQL] SharedState.externalCatalog is not really lazy
cloud-fan Jun 2, 2017
f36c3ee
[SPARK-20946][SQL] simplify the config setting logic in SparkSession.…
cloud-fan Jun 2, 2017
7f35f5b
[SPARK-20955][CORE] Intern "executorId" to reduce the memory usage
zsxwing Jun 2, 2017
9a4a8e1
[SPARK-19236][SQL][BACKPORT-2.2] Added createOrReplaceGlobalTempView …
gatorsmile Jun 2, 2017
cc5dbd5
Preparing Spark release v2.2.0-rc3
pwendell Jun 2, 2017
0c42279
Preparing development version 2.2.0-SNAPSHOT
pwendell Jun 2, 2017
6c628e7
[MINOR][SQL] Update the description of spark.sql.files.ignoreCorruptF…
gatorsmile Jun 2, 2017
b560c97
Revert "[SPARK-20946][SQL] simplify the config setting logic in Spark…
yhuai Jun 2, 2017
377cfa8
Preparing Spark release v2.2.0-rc4
pwendell Jun 3, 2017
478874e
Preparing development version 2.2.1-SNAPSHOT
pwendell Jun 3, 2017
c8bbab6
[SPARK-20974][BUILD] we should run REPL tests if SQL module has code …
cloud-fan Jun 3, 2017
acd4481
[SPARK-20790][MLLIB] Remove extraneous logging in test
Jun 3, 2017
1388fdd
[SPARK-20926][SQL] Removing exposures to guava library caused by dire…
Jun 5, 2017
421d8ec
[SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManage…
zsxwing Jun 5, 2017
3f93d07
[SPARK-20854][TESTS] Removing duplicate test case
bogdanrdc Jun 7, 2017
9a4341b
[MINOR][DOC] Update deprecation notes on Python/Hadoop/Scala.
dongjoon-hyun Jun 7, 2017
2f5eaa9
[SPARK-20914][DOCS] Javadoc contains code that is invalid
srowen Jun 8, 2017
02cf178
[SPARK-19185][DSTREAM] Make Kafka consumer cache configurable
markgrover Jun 8, 2017
3f6812c
[SPARK-20954][SQL][BRANCH-2.2][EXTENDED] DESCRIBE ` result should be …
dongjoon-hyun Jun 9, 2017
714153c
Fixed broken link
coreywoodfield Jun 9, 2017
869af5b
Fix bug in JavaRegressionMetricsExample.
masterwugui Jun 9, 2017
815a082
[SPARK-21042][SQL] Document Dataset.union is resolution by position
rxin Jun 10, 2017
0b0be47
[SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN
felixcheung Jun 11, 2017
26003de
[SPARK-20877][SPARKR][FOLLOWUP] clean up after test move
felixcheung Jun 11, 2017
a4d78e4
[DOCS] Fix error: ambiguous reference to overloaded definition
ZiyueHuang Jun 12, 2017
e677394
[SPARK-21041][SQL] SparkSession.range should be consistent with Spark…
dongjoon-hyun Jun 12, 2017
92f7c8f
[SPARK-17914][SQL] Fix parsing of timestamp strings with nanoseconds
Jun 12, 2017
a6b7875
[SPARK-20345][SQL] Fix STS error handling logic on HiveSQLException
dongjoon-hyun Jun 12, 2017
580ecfd
[SPARK-21059][SQL] LikeSimplification can NPE on null pattern
rxin Jun 12, 2017
48a843b
[SPARK-21050][ML] Word2vec persistence overflow bug fix
jkbradley Jun 12, 2017
dae1a98
[TEST][SPARKR][CORE] Fix broken SparkSubmitSuite
felixcheung Jun 13, 2017
24836be
[SPARK-20920][SQL] ForkJoinPool pools are leaked when writing hive ta…
srowen Jun 13, 2017
039c465
[SPARK-21060][WEB-UI] Css style about paging function is error in the…
Jun 13, 2017
2bc2c15
[SPARK-21064][CORE][TEST] Fix the default value bug in NettyBlockTran…
Jun 13, 2017
220943d
[SPARK-20979][SS] Add RateSource to generate values for tests and ben…
zsxwing Jun 12, 2017
53212c3
[SPARK-12552][CORE] Correctly count the driver resource when recoveri…
jerryshao Jun 14, 2017
42cc830
[SPARK-20986][SQL] Reset table's statistics after PruneFileSourcePart…
lianhuiwang Jun 14, 2017
9bdc835
[SPARK-21085][SQL] Failed to read the partitioned table created by Sp…
gatorsmile Jun 14, 2017
6265119
[SPARK-20211][SQL][BACKPORT-2.2] Fix the Precision and Scale of Decim…
gatorsmile Jun 14, 2017
3dda682
[SPARK-21089][SQL] Fix DESC EXTENDED/FORMATTED to Show Table Properties
gatorsmile Jun 14, 2017
e02e063
Revert "[SPARK-20941][SQL] Fix SubqueryExec Reuse"
gatorsmile Jun 14, 2017
af4f89c
[SPARK-20980][SQL] Rename `wholeFile` to `multiLine` for both CSV and…
gatorsmile Jun 15, 2017
b5504f6
[SPARK-20980][DOCS] update doc to reflect multiLine change
felixcheung Jun 15, 2017
76ee41f
[SPARK-16251][SPARK-20200][CORE][TEST] Flaky test: org.apache.spark.r…
jiangxb1987 Jun 15, 2017
a585c87
[SPARK-21111][TEST][2.2] Fix the test failure of describe.sql
gatorsmile Jun 16, 2017
9909be3
[SPARK-21072][SQL] TreeNode.mapChildren should only apply to the chil…
ConeyLiu Jun 16, 2017
653e6f1
[SPARK-12552][FOLLOWUP] Fix flaky test for "o.a.s.deploy.master.Maste…
jerryshao Jun 16, 2017
d3deeb3
[MINOR][DOCS] Improve Running R Tests docs
wangyum Jun 16, 2017
8747f8e
[SPARK-21126] The configuration which named "spark.core.connection.au…
liu-zhaokun Jun 18, 2017
c0d4acc
[MINOR][R] Add knitr and rmarkdown packages/improve output for versio…
HyukjinKwon Jun 18, 2017
d3c79b7
[SPARK-21090][CORE] Optimize the unified memory manager code
10110346 Jun 19, 2017
fab070c
[SPARK-21132][SQL] DISTINCT modifier of function arguments should not…
gatorsmile Jun 19, 2017
f7fcdec
[SPARK-19688][STREAMING] Not to read `spark.yarn.credentials.file` fr…
Jun 19, 2017
7b50736
[SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream sou…
Jun 19, 2017
e329bea
[MINOR][BUILD] Fix Java linter errors
dongjoon-hyun Jun 19, 2017
cf10fa8
[SPARK-21138][YARN] Cannot delete staging dir when the clusters of "s…
Jun 19, 2017
8bf7f1e
[SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeExternal throw…
wangyum Jun 20, 2017
514a7e6
[SPARK-20929][ML] LinearSVC should use its own threshold param
jkbradley Jun 20, 2017
b8b80f6
[SPARK-21150][SQL] Persistent view stored in Hive metastore should be…
cloud-fan Jun 20, 2017
62e442e
Preparing Spark release v2.2.0-rc5
pwendell Jun 20, 2017
e883498
Preparing development version 2.2.1-SNAPSHOT
pwendell Jun 20, 2017
529c04f
[MINOR][DOCS] Add lost <tr> tag for configuration.md
wangyum Jun 21, 2017
198e3a0
[SPARK-18016][SQL][CATALYST][BRANCH-2.2] Code Generation: Constant Po…
Jun 22, 2017
6ef7a5b
[SPARK-21167][SS] Decode the path generated by File sink to handle sp…
zsxwing Jun 22, 2017
d625734
[SQL][DOC] Fix documentation of lpad
actuaryzhang Jun 22, 2017
b99c0e9
Revert "[SPARK-18016][SQL][CATALYST][BRANCH-2.2] Code Generation: Con…
cloud-fan Jun 23, 2017
b6749ba
[SPARK-21165] [SQL] [2.2] Use executedPlan instead of analyzedPlan in…
gatorsmile Jun 23, 2017
9d29808
[SPARK-21144][SQL] Print a warning if the data schema and partition s…
maropu Jun 23, 2017
f160267
[SPARK-21181] Release byteBuffers to suppress netty error messages
dhruve Jun 23, 2017
3394b06
[MINOR][DOCS] Docs in DataFrameNaFunctions.scala use wrong method
ongmingyang Jun 23, 2017
a3088d2
[SPARK-20555][SQL] Fix mapping of Oracle DECIMAL types to Spark types…
Jun 24, 2017
96c04f1
[SPARK-21159][CORE] Don't try to connect to launcher in standalone cl…
Jun 24, 2017
ad44ab5
[SPARK-21203][SQL] Fix wrong results of insertion of Array of Struct
gatorsmile Jun 24, 2017
d8e3a4a
[SPARK-21079][SQL] Calculate total size of a partition table as a sum…
mbasmanova Jun 25, 2017
970f68c
[SPARK-19104][SQL] Lambda variables in ExternalMapToCatalyst should b…
viirya Jun 27, 2017
17a04b9
[SPARK-21210][DOC][ML] Javadoc 8 fixes for ML shared param traits
Jun 29, 2017
20cf511
[SPARK-21253][CORE] Fix a bug that StreamCallback may not be notified…
zsxwing Jun 30, 2017
8de67e3
[SPARK-21253][CORE] Disable spark.reducer.maxReqSizeShuffleToMem
zsxwing Jun 30, 2017
c6ba647
[SPARK-21176][WEB UI] Limit number of selector threads for admin ui p…
IngoSchuster Jun 30, 2017
d16e262
[SPARK-21253][CORE][HOTFIX] Fix Scala 2.10 build
zsxwing Jun 30, 2017
8b08fd0
[SPARK-21258][SQL] Fix WindowExec complex object aggregation with spi…
hvanhovell Jun 30, 2017
29a0be2
[SPARK-21129][SQL] Arguments of SQL function call should not be named…
gatorsmile Jun 30, 2017
a2c7b21
Preparing Spark release v2.2.0-rc6
pwendell Jun 30, 2017
85fddf4
Preparing development version 2.2.1-SNAPSHOT
pwendell Jun 30, 2017
6fd39ea
[SPARK-21170][CORE] Utils.tryWithSafeFinallyAndFailureCallbacks throw…
Jul 1, 2017
db21b67
[SPARK-20256][SQL] SessionState should be created more lazily
dongjoon-hyun Jul 4, 2017
770fd2a
[SPARK-21300][SQL] ExternalMapToCatalyst should null-check map key pr…
ueshin Jul 5, 2017
6e1081c
[SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream
Jul 6, 2017
4e53a4e
[SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp check…
tdas Jul 6, 2017
576fd4c
[SPARK-21267][SS][DOCS] Update Structured Streaming Documentation
tdas Jul 7, 2017
ab12848
[SPARK-21069][SS][DOCS] Add rate source to programming guide.
ScrapCodes Jul 8, 2017
7d0b1c9
[SPARK-21228][SQL][BRANCH-2.2] InSet incorrect handling of structs
bogdanrdc Jul 8, 2017
a64f108
[SPARK-21345][SQL][TEST][TEST-MAVEN] SparkSessionBuilderSuite should …
dongjoon-hyun Jul 8, 2017
c8d7855
[SPARK-20342][CORE] Update task accumulators before sending task end …
Jul 8, 2017
964332b
[SPARK-21343] Refine the document for spark.reducer.maxReqSizeShuffle…
Jul 8, 2017
3bfad9d
[SPARK-21083][SQL][BRANCH-2.2] Store zero size and row count when ana…
Jul 9, 2017
40fd0ce
[SPARK-21342] Fix DownloadCallback to work well with RetryingBlockFet…
Jul 10, 2017
a05edf4
[SPARK-21272] SortMergeJoin LeftAnti does not update numOutputRows
juliuszsompolski Jul 10, 2017
edcd9fb
[SPARK-21369][CORE] Don't use Scala Tuple2 in common/network-*
zsxwing Jul 11, 2017
399aa01
[SPARK-21366][SQL][TEST] Add sql test for window functions
jiangxb1987 Jul 11, 2017
cb6fc89
[SPARK-21219][CORE] Task retry occurs on same executor due to race co…
Jul 12, 2017
39eba30
[SPARK-18646][REPL] Set parent classloader as null for ExecutorClassL…
taroplus Jul 13, 2017
cf0719b
Revert "[SPARK-18646][REPL] Set parent classloader as null for Execut…
cloud-fan Jul 13, 2017
bfe3ba8
[SPARK-21376][YARN] Fix yarn client token expire issue when cleaning …
jerryshao Jul 13, 2017
1cb4369
[SPARK-21344][SQL] BinaryType comparison does signed byte array compa…
kiszk Jul 15, 2017
8e85ce6
[SPARK-21267][DOCS][MINOR] Follow up to avoid referencing programming…
srowen Jul 15, 2017
0ef98fd
[SPARK-21321][SPARK CORE] Spark very verbose on shutdown
Jul 17, 2017
83bdb04
[SPARK-21332][SQL] Incorrect result type inferred for some decimal ex…
Jul 18, 2017
99ce551
[SPARK-21445] Make IntWrapper and LongWrapper in UTF8String Serializable
brkyvz Jul 18, 2017
df061fd
[SPARK-21457][SQL] ExternalCatalog.listPartitions should correctly ha…
cloud-fan Jul 18, 2017
5a0a76f
[SPARK-21414] Refine SlidingWindowFunctionFrame to avoid OOM.
Jul 19, 2017
4c212ee
[SPARK-21441][SQL] Incorrect Codegen in SortMergeJoinExec results fai…
DonnyZone Jul 19, 2017
86cd3c0
[SPARK-21464][SS] Minimize deprecation warnings caused by ProcessingT…
tdas Jul 19, 2017
308bce0
[SPARK-21446][SQL] Fix setAutoCommit never executed
DFFuture Jul 19, 2017
9949fed
[SPARK-21333][DOCS] Removed invalid joinTypes from javadoc of Dataset…
coreywoodfield Jul 19, 2017
88dccda
[SPARK-21243][CORE] Limit no. of map outputs in a shuffle fetch
dhruve Jul 21, 2017
da403b9
[SPARK-21434][PYTHON][DOCS] Add pyspark pip documentation.
holdenk Jul 21, 2017
62ca13d
[SPARK-20904][CORE] Don't report task failures to driver during shutd…
Jul 23, 2017
e5ec339
[SPARK-21383][YARN] Fix the YarnAllocator allocates more Resource
Jul 25, 2017
c91191b
[SPARK-21447][WEB UI] Spark history server fails to render compressed
Jul 25, 2017
1bfd1a8
[SPARK-21494][NETWORK] Use correct app id when authenticating to exte…
Jul 26, 2017
06b2ef0
[SPARK-21538][SQL] Attribute resolution inconsistency in the Dataset API
Jul 27, 2017
9379031
[SPARK-21306][ML] OneVsRest should support setWeightCol
facaiy Jul 28, 2017
df6cd35
[SPARK-21508][DOC] Fix example code provided in Spark Streaming Docum…
Jul 29, 2017
24a9bac
[SPARK-21555][SQL] RuntimeReplaceable should be compared semantically…
viirya Jul 29, 2017
66fa6bd
[SPARK-19451][SQL] rangeBetween method should accept Long value as bo…
jiangxb1987 Jul 29, 2017
e2062b9
Revert "[SPARK-19451][SQL] rangeBetween method should accept Long val…
gatorsmile Jul 30, 2017
1745434
[SPARK-21522][CORE] Fix flakiness in LauncherServerSuite.
Aug 1, 2017
79e5805
[SPARK-21593][DOCS] Fix 2 rendering errors on configuration page
srowen Aug 1, 2017
67c60d7
[SPARK-21339][CORE] spark-shell --packages option does not add jars t…
Aug 1, 2017
397f904
[SPARK-21597][SS] Fix a potential overflow issue in EventTimeStats
zsxwing Aug 2, 2017
467ee8d
[SPARK-21546][SS] dropDuplicates should ignore watermark when it's no…
zsxwing Aug 2, 2017
690f491
[SPARK-12717][PYTHON][BRANCH-2.2] Adding thread-safe broadcast pickle…
BryanCutler Aug 3, 2017
1bcfa2a
Fix Java SimpleApp spark application
christiam Aug 3, 2017
f9aae8e
[SPARK-21330][SQL] Bad partitioning does not allow to read a JDBC tab…
aray Aug 4, 2017
841bc2f
[SPARK-21580][SQL] Integers in aggregation expressions are wrongly ta…
10110346 Aug 5, 2017
098aaec
[SPARK-21588][SQL] SQLContext.getConf(key, null) should return null
vinodkc Aug 6, 2017
7a04def
[SPARK-21621][CORE] Reset numRecordsWritten after DiskBlockObjectWrit…
ConeyLiu Aug 7, 2017
4f0eb0c
[SPARK-21647][SQL] Fix SortMergeJoin when using CROSS
gatorsmile Aug 7, 2017
43f9c84
[SPARK-21374][CORE] Fix reading globbed paths from S3 into DF with di…
Aug 5, 2017
fa92a7b
[SPARK-21565][SS] Propagate metadata in attribute replacement.
Aug 7, 2017
a1c1199
[SPARK-21648][SQL] Fix confusing assert failure in JDBC source when p…
gatorsmile Aug 7, 2017
86609a9
[SPARK-21567][SQL] Dataset should work with type alias
viirya Aug 8, 2017
e87ffca
Revert "[SPARK-21567][SQL] Dataset should work with type alias"
cloud-fan Aug 8, 2017
d023314
[SPARK-21503][UI] Spark UI shows incorrect task status for a killed E…
Aug 9, 2017
7446be3
[SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in …
WeichenXu123 Aug 9, 2017
f6d56d2
[SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the…
zsxwing Aug 9, 2017
3ca55ea
[SPARK-21663][TESTS] test("remote fetch below max RPC message size") …
wangjiaochun Aug 9, 2017
c909496
[SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog
rxin Aug 11, 2017
406eb1c
[SPARK-21595] Separate thresholds for buffering and spilling in Exter…
tejasapatil Aug 11, 2017
7b98077
[SPARK-21563][CORE] Fix race condition when serializing TaskDescripti…
ash211 Aug 14, 2017
48bacd3
[SPARK-21696][SS] Fix a potential issue that may generate partial sna…
zsxwing Aug 14, 2017
d9c8e62
[SPARK-21721][SQL] Clear FileSystem deleteOnExit cache when paths are…
viirya Aug 15, 2017
f1accc8
[SPARK-21723][ML] Fix writing LibSVM (key not found: numFeatures)
Aug 16, 2017
f5ede0d
[SPARK-21656][CORE] spark dynamic allocation should not idle timeout …
Aug 16, 2017
2a96975
[SPARK-18464][SQL][BACKPORT] support old table which doesn't store sc…
cloud-fan Aug 16, 2017
fdea642
[SPARK-21739][SQL] Cast expression should initialize timezoneId when …
DonnyZone Aug 18, 2017
6c2a38a
[MINOR] Correct validateAndTransformSchema in GaussianMixture and AFT…
sharp-pixel Aug 20, 2017
0f640e9
[SPARK-21721][SQL][FOLLOWUP] Clear FileSystem deleteOnExit cache when…
viirya Aug 20, 2017
526087f
[SPARK-21617][SQL] Store correct table metadata when altering schema …
Aug 21, 2017
236b2f4
[SPARK-21805][SPARKR] Disable R vignettes code on Windows
felixcheung Aug 24, 2017
a585367
[SPARK-21826][SQL] outer broadcast hash join should not throw NPE
cloud-fan Aug 24, 2017
2b4bd79
[SPARK-21681][ML] fix bug of MLOR do not work correctly when featureS…
WeichenXu123 Aug 24, 2017
0d4ef2f
[SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSummarizer.vari…
WeichenXu123 Aug 28, 2017
59bb7eb
[SPARK-21798] No config to replace deprecated SPARK_CLASSPATH config …
Aug 28, 2017
59529b2
[SPARK-21714][CORE][BACKPORT-2.2] Avoiding re-uploading remote resour…
jerryshao Aug 29, 2017
917fe66
Revert "[SPARK-21714][CORE][BACKPORT-2.2] Avoiding re-uploading remot…
Aug 29, 2017
a6a9944
[SPARK-21254][WEBUI] History UI performance fixes
2ooom Aug 30, 2017
d10c9dc
[SPARK-21714][CORE][BACKPORT-2.2] Avoiding re-uploading remote resour…
jerryshao Aug 30, 2017
14054ff
[SPARK-21834] Incorrect executor request in case of dynamic allocation
Aug 30, 2017
50f86e1
[SPARK-21884][SPARK-21477][BACKPORT-2.2][SQL] Mark LocalTableScanExec…
gatorsmile Sep 1, 2017
fb1b5f0
[SPARK-21418][SQL] NoSuchElementException: None.get in DataSourceScan…
srowen Sep 4, 2017
1f7c486
[SPARK-21925] Update trigger interval documentation in docs with beha…
brkyvz Sep 5, 2017
7da8fbf
[MINOR][DOC] Update `Partition Discovery` section to enumerate all av…
dongjoon-hyun Sep 5, 2017
9afab9a
[SPARK-21924][DOCS] Update structured streaming programming guide doc
Sep 6, 2017
342cc2a
[SPARK-21901][SS] Define toString for StateOperatorProgress
jaceklaskowski Sep 6, 2017
49968de
Fixed pandoc dependency issue in python/setup.py
Sep 7, 2017
0848df1
[SPARK-21890] Credentials not being passed to add the tokens
Sep 7, 2017
4304d0b
[SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTests2 should s…
ueshin Sep 8, 2017
781a1f8
[SPARK-21915][ML][PYSPARK] Model 1 and Model 2 ParamMaps Missing
marktab Sep 8, 2017
08cb06a
[SPARK-21936][SQL][2.2] backward compatibility test framework for Hiv…
cloud-fan Sep 8, 2017
9ae7c96
[SPARK-21946][TEST] fix flaky test: "alter table: rename cached table…
kiszk Sep 8, 2017
9876821
[SPARK-21128][R][BACKPORT-2.2] Remove both "spark-warehouse" and "met…
HyukjinKwon Sep 8, 2017
182478e
[SPARK-21954][SQL] JacksonUtils should verify MapType's value type in…
viirya Sep 9, 2017
b1b5a7f
[SPARK-20098][PYSPARK] dataType's typeName fix
szalai1 Sep 10, 2017
10c6836
[SPARK-21976][DOC] Fix wrong documentation for Mean Absolute Error.
FavioVazquez Sep 12, 2017
63098dc
[DOCS] Fix unreachable links in the document
sarutak Sep 12, 2017
b606dc1
[SPARK-18608][ML] Fix double caching
zhengruifeng Sep 12, 2017
[SPARK-20844] Remove experimental from Structured Streaming APIs
Now that Structured Streaming has been out for several Spark releases and has large production use cases, the `Experimental` label is no longer appropriate. I've left `InterfaceStability.Evolving`, however, as I think we may make a few changes to the pluggable Source & Sink API in Spark 2.3.

Author: Michael Armbrust <[email protected]>

Closes #18065 from marmbrus/streamingGA.
marmbrus authored and zsxwing committed May 26, 2017
commit 2b59ed4f1d4e859d5987b6eaaee074260b2a12f8
4 changes: 2 additions & 2 deletions docs/structured-streaming-programming-guide.md
@@ -1,6 +1,6 @@
---
layout: global
-displayTitle: Structured Streaming Programming Guide [Experimental]
+displayTitle: Structured Streaming Programming Guide
title: Structured Streaming Programming Guide
---

@@ -10,7 +10,7 @@ title: Structured Streaming Programming Guide
# Overview
Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express your streaming computation the same way you would express a batch computation on static data. The Spark SQL engine will take care of running it incrementally and continuously and updating the final result as streaming data continues to arrive. You can use the [Dataset/DataFrame API](sql-programming-guide.html) in Scala, Java, Python or R to express streaming aggregations, event-time windows, stream-to-batch joins, etc. The computation is executed on the same optimized Spark SQL engine. Finally, the system ensures end-to-end exactly-once fault-tolerance guarantees through checkpointing and Write Ahead Logs. In short, *Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.*

-**Structured Streaming is still ALPHA in Spark 2.1** and the APIs are still experimental. In this guide, we are going to walk you through the programming model and the APIs. First, let's start with a simple example - a streaming word count.
+In this guide, we are going to walk you through the programming model and the APIs. First, let's start with a simple example - a streaming word count.

# Quick Example
Let’s say you want to maintain a running word count of text data received from a data server listening on a TCP socket. Let’s see how you can express this using Structured Streaming. You can see the full code in
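In code, the word count the guide builds toward is only a few lines of PySpark. A minimal sketch — the host, port, and application name below are illustrative stand-ins:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StructuredNetworkWordCount").getOrCreate()

# Lines arriving on a TCP socket are treated as an unbounded table with one 'value' column.
lines = (spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load())

# Split each line into words and count them; the result updates as data arrives.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
wordCounts = words.groupBy("word").count()

# 'complete' mode re-emits the whole aggregated table on every trigger.
query = wordCounts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()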
4 changes: 2 additions & 2 deletions python/pyspark/sql/context.py
@@ -470,7 +470,7 @@ def readStream(self):
Returns a :class:`DataStreamReader` that can be used to read data streams
as a streaming :class:`DataFrame`.

-.. note:: Experimental.
+.. note:: Evolving.

:return: :class:`DataStreamReader`

@@ -486,7 +486,7 @@ def streams(self):
"""Returns a :class:`StreamingQueryManager` that allows managing all the
:class:`StreamingQuery` StreamingQueries active on `this` context.

-.. note:: Experimental.
+.. note:: Evolving.
"""
from pyspark.sql.streaming import StreamingQueryManager
return StreamingQueryManager(self._ssql_ctx.streams())
6 changes: 3 additions & 3 deletions python/pyspark/sql/dataframe.py
@@ -209,7 +209,7 @@ def writeStream(self):
Interface for saving the content of the streaming :class:`DataFrame` out into external
storage.

-.. note:: Experimental.
+.. note:: Evolving.

:return: :class:`DataStreamWriter`
"""
@@ -285,7 +285,7 @@ def isStreaming(self):
:func:`collect`) will throw an :class:`AnalysisException` when there is a streaming
source present.

-.. note:: Experimental
+.. note:: Evolving
"""
return self._jdf.isStreaming()

@@ -359,7 +359,7 @@ def withWatermark(self, eventTime, delayThreshold):
latest record that has been processed in the form of an interval
(e.g. "1 minute" or "5 hours").

-.. note:: Experimental
+.. note:: Evolving

>>> sdf.select('name', sdf.time.cast('timestamp')).withWatermark('time', '10 minutes')
DataFrame[name: string, time: timestamp]
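A watermark is typically paired with an event-time window aggregation, which lets the engine discard state for windows older than the threshold. A minimal sketch, assuming a streaming DataFrame `events` with a timestamp column `time`:

from pyspark.sql.functions import window

# Drop state for windows more than 10 minutes behind the latest event time seen.
windowedCounts = (events
    .withWatermark("time", "10 minutes")
    .groupBy(window(events.time, "5 minutes"))
    .count())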
4 changes: 2 additions & 2 deletions python/pyspark/sql/session.py
@@ -586,7 +586,7 @@ def readStream(self):
Returns a :class:`DataStreamReader` that can be used to read data streams
as a streaming :class:`DataFrame`.

-.. note:: Experimental.
+.. note:: Evolving.

:return: :class:`DataStreamReader`
"""
@@ -598,7 +598,7 @@ def streams(self):
"""Returns a :class:`StreamingQueryManager` that allows managing all the
:class:`StreamingQuery` StreamingQueries active on `this` context.

-.. note:: Experimental.
+.. note:: Evolving.

:return: :class:`StreamingQueryManager`
"""
42 changes: 21 additions & 21 deletions python/pyspark/sql/streaming.py
@@ -41,7 +41,7 @@ class StreamingQuery(object):
A handle to a query that is executing continuously in the background as new data arrives.
All these methods are thread-safe.

-.. note:: Experimental
+.. note:: Evolving

.. versionadded:: 2.0
"""
@@ -197,7 +197,7 @@ def exception(self):
class StreamingQueryManager(object):
"""A class to manage all the :class:`StreamingQuery` StreamingQueries active.

-.. note:: Experimental
+.. note:: Evolving

.. versionadded:: 2.0
"""
@@ -283,7 +283,7 @@ class DataStreamReader(OptionUtils):
(e.g. file systems, key-value stores, etc). Use :func:`spark.readStream`
to access this.

-.. note:: Experimental.
+.. note:: Evolving.

.. versionadded:: 2.0
"""
@@ -300,7 +300,7 @@ def _df(self, jdf):
def format(self, source):
"""Specifies the input data source format.

-.. note:: Experimental.
+.. note:: Evolving.

:param source: string, name of the data source, e.g. 'json', 'parquet'.

@@ -317,7 +317,7 @@ def schema(self, schema):
By specifying the schema here, the underlying data source can skip the schema
inference step, and thus speed up data loading.

-.. note:: Experimental.
+.. note:: Evolving.

:param schema: a :class:`pyspark.sql.types.StructType` object

@@ -340,7 +340,7 @@ def option(self, key, value):
in the JSON/CSV datasources or partition values.
If it isn't set, it uses the default value, session local timezone.

-.. note:: Experimental.
+.. note:: Evolving.

>>> s = spark.readStream.option("x", 1)
"""
@@ -356,7 +356,7 @@ def options(self, **options):
in the JSON/CSV datasources or partition values.
If it isn't set, it uses the default value, session local timezone.

-.. note:: Experimental.
+.. note:: Evolving.

>>> s = spark.readStream.options(x="1", y=2)
"""
@@ -368,7 +368,7 @@ def options(self, **options):
def load(self, path=None, format=None, schema=None, **options):
"""Loads a data stream from a data source and returns it as a :class`DataFrame`.

-.. note:: Experimental.
+.. note:: Evolving.

:param path: optional string for file-system backed data sources.
:param format: optional string for format of the data source. Default to 'parquet'.
@@ -411,7 +411,7 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
If the ``schema`` parameter is not specified, this function goes
through the input once to determine the input schema.

-.. note:: Experimental.
+.. note:: Evolving.

:param path: string represents path to the JSON dataset,
or RDD of Strings storing JSON objects.
@@ -488,7 +488,7 @@ def parquet(self, path):
Parquet part-files. This will override ``spark.sql.parquet.mergeSchema``. \
The default value is specified in ``spark.sql.parquet.mergeSchema``.

-.. note:: Experimental.
+.. note:: Evolving.

>>> parquet_sdf = spark.readStream.schema(sdf_schema).parquet(tempfile.mkdtemp())
>>> parquet_sdf.isStreaming
@@ -511,7 +511,7 @@ def text(self, path):

Each line in the text file is a new row in the resulting DataFrame.

-.. note:: Experimental.
+.. note:: Evolving.

:param paths: string, or list of strings, for input path(s).

@@ -539,7 +539,7 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
``inferSchema`` is enabled. To avoid going through the entire data once, disable
``inferSchema`` option or specify the schema explicitly using ``schema``.

-.. note:: Experimental.
+.. note:: Evolving.

:param path: string, or list of strings, for input path(s).
:param schema: an optional :class:`pyspark.sql.types.StructType` for the input schema.
@@ -637,7 +637,7 @@ class DataStreamWriter(object):
(e.g. file systems, key-value stores, etc). Use :func:`DataFrame.writeStream`
to access this.

-.. note:: Experimental.
+.. note:: Evolving.

.. versionadded:: 2.0
"""
@@ -665,7 +665,7 @@ def outputMode(self, outputMode):
written to the sink every time there are some updates. If the query doesn't contain
aggregations, it will be equivalent to `append` mode.

-.. note:: Experimental.
+.. note:: Evolving.

>>> writer = sdf.writeStream.outputMode('append')
"""
@@ -678,7 +678,7 @@ def outputMode(self, outputMode):
def format(self, source):
"""Specifies the underlying output data source.

-.. note:: Experimental.
+.. note:: Evolving.

:param source: string, name of the data source, which for now can be 'parquet'.

@@ -696,7 +696,7 @@ def option(self, key, value):
timestamps in the JSON/CSV datasources or partition values.
If it isn't set, it uses the default value, session local timezone.

-.. note:: Experimental.
+.. note:: Evolving.
"""
self._jwrite = self._jwrite.option(key, to_str(value))
return self
@@ -710,7 +710,7 @@ def options(self, **options):
timestamps in the JSON/CSV datasources or partition values.
If it isn't set, it uses the default value, session local timezone.

-.. note:: Experimental.
+.. note:: Evolving.
"""
for k in options:
self._jwrite = self._jwrite.option(k, to_str(options[k]))
@@ -723,7 +723,7 @@ def partitionBy(self, *cols):
If specified, the output is laid out on the file system similar
to Hive's partitioning scheme.

-.. note:: Experimental.
+.. note:: Evolving.

:param cols: name of columns

@@ -739,7 +739,7 @@ def queryName(self, queryName):
:func:`start`. This name must be unique among all the currently active queries
in the associated SparkSession.

-.. note:: Experimental.
+.. note:: Evolving.

:param queryName: unique name for the query

@@ -756,7 +756,7 @@ def trigger(self, processingTime=None, once=None):
"""Set the trigger for the stream query. If this is not set it will run the query as fast
as possible, which is equivalent to setting the trigger to ``processingTime='0 seconds'``.

-.. note:: Experimental.
+.. note:: Evolving.

:param processingTime: a processing time interval as a string, e.g. '5 seconds', '1 minute'.

@@ -794,7 +794,7 @@ def start(self, path=None, format=None, outputMode=None, partitionBy=None, query
If ``format`` is not specified, the default data source configured by
``spark.sql.sources.default`` will be used.

-.. note:: Experimental.
+.. note:: Evolving.

:param path: the path in a Hadoop supported file system
:param format: the format used to save
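The `DataStreamWriter` methods documented above compose into a single chain ending in `start()`. A sketch — the checkpoint location, query name, and paths are illustrative:

query = (sdf.writeStream
    .format("parquet")
    .outputMode("append")                       # emit only rows added since the last trigger
    .option("checkpointLocation", "/tmp/ckpt")  # required for file sinks
    .queryName("events_to_parquet")             # must be unique in the session
    .trigger(processingTime="10 seconds")       # micro-batch every 10 seconds
    .start("/data/out"))                        # returns a StreamingQuery handle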
@@ -22,14 +22,11 @@
import org.apache.spark.sql.catalyst.streaming.InternalOutputModes;

/**
-* :: Experimental ::
-*
* OutputMode is used to what data will be written to a streaming sink when there is
* new data available in a streaming DataFrame/Dataset.
*
* @since 2.0.0
*/
-@Experimental
@InterfaceStability.Evolving
public class OutputMode {

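In practice the three output modes map onto query shapes roughly as follows — a sketch, where `raw` is a non-aggregated streaming DataFrame and `counts` an aggregated one:

raw.writeStream.outputMode("append")       # only rows appended since the last trigger
counts.writeStream.outputMode("complete")  # the full result table on every trigger
counts.writeStream.outputMode("update")    # only rows updated since the last trigger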
@@ -21,22 +21,18 @@

import scala.concurrent.duration.Duration;

-import org.apache.spark.annotation.Experimental;
import org.apache.spark.annotation.InterfaceStability;
import org.apache.spark.sql.execution.streaming.OneTimeTrigger$;

/**
-* :: Experimental ::
* Policy used to indicate how often results should be produced by a [[StreamingQuery]].
*
* @since 2.0.0
*/
-@Experimental
@InterfaceStability.Evolving
public class Trigger {

/**
-* :: Experimental ::
* A trigger policy that runs a query periodically based on an interval in processing time.
* If `interval` is 0, the query will run as fast as possible.
*
@@ -47,7 +43,6 @@ public static Trigger ProcessingTime(long intervalMs) {
}

/**
-* :: Experimental ::
* (Java-friendly)
* A trigger policy that runs a query periodically based on an interval in processing time.
* If `interval` is 0, the query will run as fast as possible.
@@ -64,7 +59,6 @@ public static Trigger ProcessingTime(long interval, TimeUnit timeUnit) {
}

/**
-* :: Experimental ::
* (Scala-friendly)
* A trigger policy that runs a query periodically based on an interval in processing time.
* If `duration` is 0, the query will run as fast as possible.
@@ -80,7 +74,6 @@ public static Trigger ProcessingTime(Duration interval) {
}

/**
-* :: Experimental ::
* A trigger policy that runs a query periodically based on an interval in processing time.
* If `interval` is effectively 0, the query will run as fast as possible.
*
2 changes: 0 additions & 2 deletions sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -2695,13 +2695,11 @@ class Dataset[T] private[sql](
}

/**
-* :: Experimental ::
* Interface for saving the content of the streaming Dataset out into external storage.
*
* @group basic
* @since 2.0.0
*/
-@Experimental
@InterfaceStability.Evolving
def writeStream: DataStreamWriter[T] = {
if (!isStreaming) {
@@ -17,10 +17,9 @@

package org.apache.spark.sql

-import org.apache.spark.annotation.{Experimental, InterfaceStability}
+import org.apache.spark.annotation.InterfaceStability

/**
-* :: Experimental ::
* A class to consume data generated by a `StreamingQuery`. Typically this is used to send the
* generated data to external systems. Each partition will use a new deserialized instance, so you
* usually should do all the initialization (e.g. opening a connection or initiating a transaction)
@@ -66,7 +65,6 @@ import org.apache.spark.annotation.{Experimental, InterfaceStability}
* }}}
* @since 2.0.0
*/
-@Experimental
@InterfaceStability.Evolving
abstract class ForeachWriter[T] extends Serializable {

2 changes: 0 additions & 2 deletions sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -505,7 +505,6 @@ class SQLContext private[sql](val sparkSession: SparkSession)


/**
-* :: Experimental ::
* Returns a `DataStreamReader` that can be used to read streaming data in as a `DataFrame`.
* {{{
* sparkSession.readStream.parquet("/path/to/directory/of/parquet/files")
@@ -514,7 +513,6 @@ class SQLContext private[sql](val sparkSession: SparkSession)
*
* @since 2.0.0
*/
-@Experimental
@InterfaceStability.Evolving
def readStream: DataStreamReader = sparkSession.readStream

@@ -636,7 +636,6 @@ class SparkSession private(
def read: DataFrameReader = new DataFrameReader(self)

/**
-* :: Experimental ::
* Returns a `DataStreamReader` that can be used to read streaming data in as a `DataFrame`.
* {{{
* sparkSession.readStream.parquet("/path/to/directory/of/parquet/files")
Expand All @@ -645,7 +644,6 @@ class SparkSession private(
*
* @since 2.0.0
*/
-@Experimental
@InterfaceStability.Evolving
def readStream: DataStreamReader = new DataStreamReader(self)
