Closed
Changes from 1 commit
Commits
1161 commits
a36c1a6
[SPARK-23668][K8S] Added missing config property in running-on-kubern…
liyinan926 Jun 2, 2018
de4feae
[SPARK-24356][CORE] Duplicate strings in File.path managed by FileSeg…
misha-cloudera Jun 3, 2018
a2166ec
[SPARK-24455][CORE] fix typo in TaskSchedulerImpl comment
Jun 4, 2018
416cd1f
[SPARK-24369][SQL] Correct handling for multiple distinct aggregation…
cloud-fan Jun 4, 2018
1d9338b
[SPARK-23786][SQL] Checking column names of csv headers
MaxGekk Jun 4, 2018
0be5aa2
[SPARK-23903][SQL] Add support for date extract
wangyum Jun 4, 2018
7297ae0
[SPARK-21896][SQL] Fix StackOverflow caused by window functions insid…
Jun 4, 2018
b24d3db
[SPARK-24290][ML] add support for Array input for instrumentation.log…
lu-wang-dl Jun 4, 2018
ff0501b
[SPARK-24300][ML] change the way to set seed in ml.cluster.LDASuite.g…
lu-wang-dl Jun 4, 2018
dbb4d83
[SPARK-24215][PYSPARK] Implement _repr_html_ for dataframes in PySpark
xuanyuanking Jun 5, 2018
b3417b7
[SPARK-16451][REPL] Fail shell if SparkSession fails to start.
Jun 5, 2018
e8c1a0c
[SPARK-15784] Add Power Iteration Clustering to spark.ml
WeichenXu123 Jun 5, 2018
2c2a86b
[SPARK-24453][SS] Fix error recovering from the failure in a no-data …
tdas Jun 5, 2018
93df3cd
[SPARK-22384][SQL] Refine partition pruning when attribute is wrapped…
Jun 5, 2018
e9efb62
[SPARK-24187][R][SQL] Add array_join function to SparkR
huaxingao Jun 6, 2018
e76b012
[SPARK-23803][SQL] Support bucket pruning
Jun 6, 2018
1462bba
[SPARK-24119][SQL] Add interpreted execution to SortPrefix expression
bersprockets Jun 8, 2018
2c10020
[SPARK-24224][ML-EXAMPLES] Java example code for Power Iteration Clus…
shahidki31 Jun 8, 2018
a5d775a
[SPARK-24191][ML] Scala Example code for Power Iteration Clustering
shahidki31 Jun 8, 2018
173fe45
[SPARK-24477][SPARK-24454][ML][PYTHON] Imports submodule in ml/__init…
HyukjinKwon Jun 8, 2018
1a644af
[SPARK-23984][K8S] Initial Python Bindings for PySpark on K8s
ifilonenko Jun 8, 2018
b070ded
[SPARK-17756][PYTHON][STREAMING] Workaround to avoid return type mism…
HyukjinKwon Jun 8, 2018
f433ef7
[SPARK-23010][K8S] Initial checkin of k8s integration tests.
ssuchter Jun 8, 2018
36a3409
[SPARK-24412][SQL] Adding docs about automagical type casting in `isi…
raptond Jun 9, 2018
f07c506
[SPARK-24468][SQL] Handle negative scale when adjusting precision for…
mgaido91 Jun 9, 2018
3e5b4ae
[SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from…
e-dorigatti Jun 11, 2018
a99d284
[SPARK-19826][ML][PYTHON] add spark.ml Python API for PIC
huaxingao Jun 11, 2018
9b6f242
[MINOR][CORE] Log committer class used by HadoopMapRedCommitProtocol
ejono Jun 11, 2018
2dc047a
[SPARK-24520] Double braces in documentations
Jun 11, 2018
f5af86e
[SPARK-24134][DOCS] A missing full-stop in doc "Tuning Spark".
XD-DENG Jun 11, 2018
0481977
[SPARK-22144][SQL] ExchangeCoordinator combine the partitions of an 0…
liutang123 Jun 12, 2018
dc22465
[SPARK-23732][DOCS] Fix source links in generated scaladoc.
Jun 12, 2018
01452ea
[SPARK-24502][SQL] flaky test: UnsafeRowSerializerSuite
cloud-fan Jun 12, 2018
1d7db65
docs: fix typo
tomsaleeba Jun 12, 2018
5d6a53d
[SPARK-15064][ML] Locale support in StopWordsRemover
dongjinleekr Jun 12, 2018
2824f14
[SPARK-24531][TESTS] Remove version 2.2.0 from testing versions in Hi…
mgaido91 Jun 12, 2018
3af1d3e
[SPARK-24416] Fix configuration specification for killBlacklisted exe…
Jun 12, 2018
f0ef1b3
[SPARK-23931][SQL] Adds arrays_zip function to sparksql
DylanGuedes Jun 12, 2018
cc88d7f
[SPARK-24216][SQL] Spark TypedAggregateExpression uses getSimpleName …
Jun 12, 2018
ada28f2
[SPARK-23933][SQL] Add map_from_arrays function
kiszk Jun 12, 2018
0d3714d
[SPARK-23010][BUILD][FOLLOWUP] Fix java checkstyle failure of kuberne…
jiangxb1987 Jun 12, 2018
f53818d
[SPARK-24506][UI] Add UI filters to tabs added after binding
mgaido91 Jun 12, 2018
9786ce6
[SPARK-22239][SQL][PYTHON] Enable grouped aggregate pandas UDFs as wi…
icexelloss Jun 13, 2018
3352d6f
[SPARK-24466][SS] Fix TextSocketMicroBatchReader to be compatible wit…
HeartSaVioR Jun 13, 2018
4c388bc
[SPARK-24485][SS] Measure and log elapsed time for filesystem operati…
HeartSaVioR Jun 13, 2018
7703b46
[SPARK-24479][SS] Added config for registering streamingQueryListeners
arunmahadevan Jun 13, 2018
299d297
[SPARK-24500][SQL] Make sure streams are materialized during Tree tra…
hvanhovell Jun 13, 2018
1b46f41
[SPARK-24235][SS] Implement continuous shuffle writer for single read…
jose-torres Jun 13, 2018
3bf7691
[SPARK-24531][TESTS] Replace 2.3.0 version with 2.3.1
mgaido91 Jun 13, 2018
534065e
[MINOR][CORE][TEST] Remove unnecessary sort in UnsafeInMemorySorterSuite
jiangxb1987 Jun 14, 2018
fdadc4b
[SPARK-24495][SQL] EnsureRequirement returns wrong plan when reorderi…
mgaido91 Jun 14, 2018
d3eed8f
[SPARK-24563][PYTHON] Catch TypeError when testing existence of HiveC…
icexelloss Jun 14, 2018
b8f27ae
[SPARK-24543][SQL] Support any type as DDL string for from_json's schema
MaxGekk Jun 14, 2018
18cb0c0
[SPARK-24319][SPARK SUBMIT] Fix spark-submit execution where no main …
gaborgsomogyi Jun 14, 2018
270a9a3
[SPARK-24248][K8S] Use level triggering and state reconciliation in s…
mccheah Jun 14, 2018
22daeba
[SPARK-24478][SQL] Move projection and filter push down to physical c…
rdblue Jun 15, 2018
6567fc4
[PYTHON] Fix typo in serializer exception
rberenguel Jun 15, 2018
495d8cf
[SPARK-24490][WEBUI] Use WebUI.addStaticHandler in web UIs
jaceklaskowski Jun 15, 2018
b5ccf0d
[SPARK-24396][SS][PYSPARK] Add Structured Streaming ForeachWriter for…
tdas Jun 15, 2018
90da7dc
[SPARK-24452][SQL][CORE] Avoid possible overflow in int add or multiple
kiszk Jun 15, 2018
e4fee39
[SPARK-24525][SS] Provide an option to limit number of rows in a Memo…
Jun 15, 2018
c7c0b08
add one supported type missing from the javadoc
ispot-james Jun 16, 2018
b0a9352
[SPARK-24573][INFRA] Runs SBT checkstyle after the build to work arou…
HyukjinKwon Jun 18, 2018
e219e69
[SPARK-23772][SQL] Provide an option to ignore column of all null val…
maropu Jun 18, 2018
bce1775
[SPARK-24526][BUILD][TEST-MAVEN] Spaces in the build dir causes failu…
Jun 18, 2018
8f225e0
[SPARK-24548][SQL] Fix incorrect schema of Dataset with tuple encoders
viirya Jun 18, 2018
1737d45
[SPARK-24478][SQL][FOLLOWUP] Move projection and filter push down to …
cloud-fan Jun 19, 2018
9a75c18
[SPARK-24542][SQL] UDF series UDFXPathXXXX allow users to pass carefu…
gatorsmile Jun 19, 2018
a78a904
[SPARK-24521][SQL][TEST] Fix ineffective test in CachedTableSuite
icexelloss Jun 19, 2018
9dbe53e
[SPARK-24556][SQL] Always rewrite output partitioning in ReusedExchan…
yucai Jun 19, 2018
13092d7
[SPARK-24534][K8S] Bypass non spark-on-k8s commands
rimolive Jun 19, 2018
2cb9763
[SPARK-24565][SS] Add API for in Structured Streaming for exposing ou…
tdas Jun 19, 2018
bc0498d
[SPARK-24583][SQL] Wrong schema type in InsertIntoDataSourceCommand
maryannxue Jun 19, 2018
bc11146
[SPARK-23778][CORE] Avoid unneeded shuffle when union gets an empty RDD
mgaido91 Jun 20, 2018
c8ef923
[MINOR][SQL] Remove invalid comment from SparkStrategies
HeartSaVioR Jun 20, 2018
c5a0d11
[SPARK-24575][SQL] Prohibit window expressions inside WHERE and HAVIN…
Jun 20, 2018
3f4bda7
[SPARK-24578][CORE] Cap sub-region's size of returned nio buffer
WenboZhao Jun 20, 2018
15747cf
[SPARK-24547][K8S] Allow for building spark on k8s docker images with…
Jun 21, 2018
9de11d3
[SPARK-23912][SQL] add array_distinct
huaxingao Jun 21, 2018
54fcaaf
[SPARK-24571][SQL] Support Char literals
MaxGekk Jun 21, 2018
7236e75
[SPARK-24574][SQL] array_contains, array_position, array_remove and e…
chongguang Jun 21, 2018
c0cad59
[SPARK-24614][PYSPARK] Fix for SyntaxWarning on tests.py
rekhajoshm Jun 21, 2018
b56e9c6
[SPARK-16630][YARN] Blacklist a node if executors won't launch on it
attilapiros Jun 21, 2018
c8e909c
[SPARK-24589][CORE] Correctly identify tasks in output commit coordin…
Jun 21, 2018
b9a6f74
[SPARK-24613][SQL] Cache with UDF could not be matched with subsequen…
maryannxue Jun 21, 2018
dc8a6be
[SPARK-24588][SS] streaming join should require HashClusteredPartitio…
cloud-fan Jun 21, 2018
92c2f00
[SPARK-23934][SQL] Adding map_from_entries function
mn-mikke Jun 22, 2018
39dfaf2
[SPARK-24519] Make the threshold for highly compressed map status con…
Jun 22, 2018
33e77fa
[SPARK-24518][CORE] Using Hadoop credential provider API to store pas…
jerryshao Jun 22, 2018
4e7d867
[SPARK-24372][BUILD] Add scripts to help with preparing releases.
Jun 22, 2018
c7e2742
[SPARK-24190][SQL] Allow saving of JSON files in UTF-16 and UTF-32
MaxGekk Jun 24, 2018
98f363b
[SPARK-24206][SQL] Improve FilterPushdownBenchmark benchmark code
maropu Jun 24, 2018
a5849ad
[SPARK-24324][PYTHON] Pandas Grouped Map UDF should assign result col…
BryanCutler Jun 24, 2018
f596ebe
[SPARK-24327][SQL] Verify and normalize a partition column name based…
maropu Jun 25, 2018
6e0596e
[SPARK-23931][SQL][FOLLOW-UP] Make `arrays_zip` in function.scala `@s…
ueshin Jun 25, 2018
8ab8ef7
Fix minor typo in docs/cloud-integration.md
Jun 25, 2018
bac50aa
[SPARK-24596][SQL] Non-cascading Cache Invalidation
maryannxue Jun 25, 2018
594ac4f
[SPARK-24633][SQL] Fix codegen when split is required for arrays_zip
mgaido91 Jun 25, 2018
5264164
[SPARK-24648][SQL] SqlMetrics should be threadsafe
dbkerkela Jun 25, 2018
baa01c8
[INFRA] Close stale PR.
Jun 25, 2018
6d16b98
[SPARK-24552][CORE][SQL] Use task ID instead of attempt number for wr…
Jun 25, 2018
d48803b
[SPARK-24324][PYTHON][FOLLOWUP] Grouped Map positional conf should ha…
BryanCutler Jun 26, 2018
4c059eb
[SPARK-23776][DOC] Update instructions for running PySpark after buil…
bersprockets Jun 26, 2018
c7967c6
[SPARK-24418][BUILD] Upgrade Scala to 2.11.12 and 2.12.6
dbtsai Jun 26, 2018
e07aee2
[SPARK-24636][SQL] Type coercion of arrays for array_join function
mn-mikke Jun 26, 2018
dcaa49f
[SPARK-24658][SQL] Remove workaround for ANTLR bug
wangyum Jun 26, 2018
02f8781
[SPARK-24423][SQL] Add a new option for JDBC sources
dilipbiswal Jun 26, 2018
16f2c3e
[SPARK-6237][NETWORK] Network-layer changes to allow stream upload.
squito Jun 26, 2018
1b9368f
[SPARK-24659][SQL] GenericArrayData.equals should respect element typ…
rednaxelafx Jun 27, 2018
d08f53d
[SPARK-24605][SQL] size(null) returns null instead of -1
MaxGekk Jun 27, 2018
2669b4d
[SPARK-23927][SQL] Add "sequence" expression
wajda Jun 27, 2018
9a76f23
[SPARK-23927][SQL][FOLLOW-UP] Fix a build failure.
ueshin Jun 27, 2018
a1a64e3
[SPARK-21335][DOC] doc changes for disallowed un-aliased subquery use…
cnZach Jun 27, 2018
6a0b77a
[SPARK-24215][PYSPARK][FOLLOW UP] Implement eager evaluation for Data…
xuanyuanking Jun 27, 2018
78ecb6d
[SPARK-24446][YARN] Properly quote library path for YARN.
Jun 27, 2018
c04cb2d
[SPARK-21687][SQL] Spark SQL should set createTime for Hive partition
debugger87 Jun 27, 2018
776befb
[SPARK-24660][SHS] Show correct error pages when downloading logs
mgaido91 Jun 27, 2018
221d03a
[SPARK-24533] Typesafe rebranded to lightbend. Changing the build dow…
Jun 27, 2018
893ea22
[SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFileFormat
maropu Jun 27, 2018
c5aa54d
[SPARK-24553][WEB-UI] http 302 fixes for href redirect
SJKallman Jun 27, 2018
bd32b50
[SPARK-24645][SQL] Skip parsing when csvColumnPruning enabled and par…
maropu Jun 28, 2018
1c9acc2
[SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBenchmark benchmar…
maropu Jun 28, 2018
6a97e8e
[SPARK-24603][SQL] Fix findTightestCommonType reference in comments
Jun 28, 2018
5b05966
[SPARK-24564][TEST] Add test suite for RecordBinaryComparator
jiangxb1987 Jun 28, 2018
524827f
[SPARK-14712][ML] LogisticRegressionModel.toString should summarize m…
jiayue-zhang Jun 28, 2018
a95a4af
[SPARK-23120][PYSPARK][ML] Add basic PMML export support to PySpark
holdenk Jun 28, 2018
e1d3f80
[SPARK-24408][SQL][DOC] Move abs function to math_funcs group
jaceklaskowski Jun 28, 2018
2224861
[SPARK-24439][ML][PYTHON] Add distanceMeasure to BisectingKMeans in P…
huaxingao Jun 28, 2018
f6e6899
[SPARK-24386][SS] coalesce(1) aggregates in continuous processing
jose-torres Jun 28, 2018
f71e8da
[SPARK-24566][CORE] Fix spark.storage.blockManagerSlaveTimeoutMs defa…
xueyumusic Jun 29, 2018
03545ce
[SPARK-24638][SQL] StringStartsWith support push down
wangyum Jun 30, 2018
797971e
[SPARK-24696][SQL] ColumnPruning rule fails to remove extra Project
maryannxue Jun 30, 2018
d54d8b8
simplify rand in dsl/package.scala
gatorsmile Jun 30, 2018
f825847
[SPARK-24654][BUILD] Update, fix LICENSE and NOTICE, and specialize f…
srowen Jul 1, 2018
8f91c69
[SPARK-24665][PYSPARK] Use SQLConf in PySpark to manage all sql configs
xuanyuanking Jul 2, 2018
8008f9c
[SPARK-24715][BUILD] Override jline version as 2.14.3 in SBT
viirya Jul 2, 2018
f599cde
[SPARK-24507][DOCUMENTATION] Update streaming guide
rekhajoshm Jul 2, 2018
4281554
[SPARK-24683][K8S] Fix k8s no resource
mccheah Jul 2, 2018
85fe129
[SPARK-24428][K8S] Fix unused code
Jul 2, 2018
a7c8f0c
[SPARK-24385][SQL] Resolve self-join condition ambiguity for EqualNul…
mgaido91 Jul 3, 2018
5585c57
[SPARK-24420][BUILD] Upgrade ASM to 6.1 to support JDK9+
dbtsai Jul 3, 2018
776f299
[SPARK-24709][SQL] schema_of_json() - schema inference from an example
MaxGekk Jul 4, 2018
b42fda8
[SPARK-23698] Remove raw_input() from Python 2
Jul 4, 2018
5bf95f2
[BUILD] Close stale PRs
srowen Jul 4, 2018
7c08eb6
[SPARK-24732][SQL] Type coercion between MapTypes.
ueshin Jul 4, 2018
772060d
[SPARK-24704][WEBUI] Fix the order of stages in the DAG graph
stanzhai Jul 4, 2018
b2deef6
[SPARK-24727][SQL] Add a static config to control cache size for gene…
maropu Jul 4, 2018
021145f
[SPARK-24716][SQL] Refactor ParquetFilters
wangyum Jul 4, 2018
1a2655a
[SPARK-24635][SQL] Remove Blocks class from JavaCode class hierarchy
viirya Jul 4, 2018
ca8243f
[MINOR][ML] Minor correction in the powerIterationSuite
shahidki31 Jul 4, 2018
bf764a3
[SPARK-22384][SQL][FOLLOWUP] Refine partition pruning when attribute …
cloud-fan Jul 5, 2018
489a529
[SPARK-17213][SPARK-17213][FOLLOW-UP] Improve the test of
gatorsmile Jul 5, 2018
f997be0
[SPARK-24698][PYTHON] Fixed typo in pyspark.ml's Identifiable class.
mcteo Jul 5, 2018
4be9f0c
[SPARK-24673][SQL] scala sql function from_utc_timestamp second argum…
agilelab-tmnd1991 Jul 5, 2018
32cfd3e
[SPARK-24361][SQL] Polish code block manipulation API
viirya Jul 5, 2018
e58dadb
[SPARK-23820][CORE] Enable use of long form of callsite in logs
michaelmior Jul 5, 2018
7bd6d54
[SPARK-24711][K8S] Fix tags for integration tests
Jul 5, 2018
ac78bcc
[SPARK-24743][EXAMPLES] Update the JavaDirectKafkaWordCount example t…
cluo512 Jul 5, 2018
33952cf
[SPARK-24675][SQL] Rename table: validate existence of new location
gengliangwang Jul 5, 2018
e71e93a
[SPARK-24694][K8S] Pass all app args to integration tests
Jul 5, 2018
01fcba2
[SPARK-24737][SQL] Type coercion between StructTypes.
ueshin Jul 6, 2018
bf67f70
[SPARK-24692][TESTS] Improvement FilterPushdownBenchmark
wangyum Jul 6, 2018
141953f
[SPARK-24535][SPARKR] fix tests on java check error
felixcheung Jul 6, 2018
a381bce
[SPARK-24673][SQL][PYTHON][FOLLOWUP] Support Column arguments in time…
maropu Jul 6, 2018
4de0425
[SPARK-24569][SQL] Aggregator with output type Option should produce …
viirya Jul 7, 2018
fc43690
[SPARK-24749][SQL] Use sameType to compare Array's element type in Ar…
viirya Jul 7, 2018
74f6a92
[SPARK-24739][PYTHON] Make PySpark compatible with Python 3.7
HyukjinKwon Jul 7, 2018
044b33b
[SPARK-24740][PYTHON][ML] Make PySpark's tests compatible with NumPy …
HyukjinKwon Jul 7, 2018
79c6689
[SPARK-24757][SQL] Improving the error message for broadcast timeouts
MaxGekk Jul 7, 2018
e2c7e09
[SPARK-24646][CORE] Minor change to spark.yarn.dist.forceDownloadSche…
jerryshao Jul 9, 2018
034913b
[SPARK-23936][SQL] Implement map_concat
bersprockets Jul 9, 2018
1bd3d61
[SPARK-24268][SQL] Use datatype.simpleString in error messages
mgaido91 Jul 9, 2018
aec966b
Revert "[SPARK-24268][SQL] Use datatype.simpleString in error messages"
gatorsmile Jul 9, 2018
eb6e988
[SPARK-24759][SQL] No reordering keys for broadcast hash join
gatorsmile Jul 9, 2018
4984f1a
[MINOR] Add Sphinx into dev/requirements.txt
HyukjinKwon Jul 10, 2018
a289009
[SPARK-24706][SQL] ByteType and ShortType support pushdown to parquet
wangyum Jul 10, 2018
6fe3286
[SPARK-24678][SPARK-STREAMING] Give priority in use of 'PROCESS_LOCAL…
Jul 10, 2018
e0559f2
[SPARK-21743][SQL][FOLLOWUP] free aggregate map when task ends
cloud-fan Jul 10, 2018
32cb508
[SPARK-24662][SQL][SS] Support limit in structured streaming
mukulmurthy Jul 10, 2018
6078b89
[SPARK-24730][SS] Add policy to choose max as global watermark when s…
tdas Jul 11, 2018
1f94bf4
[SPARK-24530][PYTHON] Add a control to force Python version in Sphinx…
HyukjinKwon Jul 11, 2018
74a8d63
[SPARK-24165][SQL] Fixing conditional expressions to handle nullabili…
mn-mikke Jul 11, 2018
5ff1b9b
[SPARK-23529][K8S] Support mounting volumes
Jul 11, 2018
006e798
[SPARK-23461][R] vignettes should include model predictions for some …
huaxingao Jul 11, 2018
592cc84
[SPARK-24562][TESTS] Support different configs for same test in SQLQu…
mgaido91 Jul 11, 2018
ebf4bfb
[SPARK-24208][SQL] Fix attribute deduplication for FlatMapGroupsInPandas
mgaido91 Jul 11, 2018
290c30a
[SPARK-24470][CORE] RestSubmissionClient to be robust against 404 & n…
rekhajoshm Jul 11, 2018
59c3c23
[SPARK-23254][ML] Add user guide entry and example for DataFrame mult…
WeichenXu123 Jul 11, 2018
ff7f6ef
[SPARK-24697][SS] Fix the reported start offsets in streaming query p…
tdas Jul 11, 2018
e008ad1
[SPARK-24782][SQL] Simplify conf retrieval in SQL expressions
mgaido91 Jul 12, 2018
3ab48f9
[SPARK-24761][SQL] Adding of isModifiable() to RuntimeConfig
MaxGekk Jul 12, 2018
5ad4735
[SPARK-24529][BUILD][TEST-MAVEN] Add spotbugs into maven build process
kiszk Jul 12, 2018
301bff7
[SPARK-23914][SQL] Add array_union function
kiszk Jul 12, 2018
e6c6f90
[SPARK-24691][SQL] Dispatch the type support check in FileFormat impl…
gengliangwang Jul 12, 2018
9fa4a1e
[SPARK-20168][STREAMING KINESIS] Setting the timestamp directly would…
yashs360 Jul 12, 2018
1055c94
[SPARK-24610] fix reading small files via wholeTextFiles
dhruve Jul 12, 2018
395860a
[SPARK-24768][SQL] Have a built-in AVRO data source implementation
gengliangwang Jul 12, 2018
07704c9
[SPARK-23007][SQL][TEST] Add read schema suite for file-based data so…
dongjoon-hyun Jul 12, 2018
1138489
[SPARK-24208][SQL][FOLLOWUP] Move test cases to proper locations
mgaido91 Jul 12, 2018
7572505
[SPARK-24790][SQL] Allow complex aggregate expressions in Pivot
maryannxue Jul 12, 2018
e0f4f20
[SPARK-24537][R] Add array_remove / array_zip / map_from_arrays / arr…
huaxingao Jul 13, 2018
0ce11d0
[SPARK-23486] cache the function name from the external catalog for l…
kevinyu98 Jul 13, 2018
0f24c6f
[SPARK-24713] AppMatser of spark streaming kafka OOM if there are hund…
Jul 13, 2018
dfd7ac9
[SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort migh…
viirya Jul 13, 2018
c1b62e4
[SPARK-24776][SQL] Avro unit test: use SQLTestUtils and replace depre…
gengliangwang Jul 13, 2018
3bcb1b4
Revert "[SPARK-24776][SQL] Avro unit test: use SQLTestUtils and repla…
gatorsmile Jul 13, 2018
3b6005b
[SPARK-23528][ML] Add numIter to ClusteringSummary
mgaido91 Jul 13, 2018
a75571b
[SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader
wangyum Jul 13, 2018
f1a99ad
[SPARK-23984][K8S][TEST] Added Integration Tests for PySpark on Kuber…
ifilonenko Jul 14, 2018
e1de341
[SPARK-17091][SQL] Add rule to convert IN predicate to equivalent Par…
wangyum Jul 14, 2018
8aceb96
[SPARK-24754][ML] Minhash integer overflow
srowen Jul 14, 2018
43e4e85
[SPARK-24718][SQL] Timestamp support pushdown to parquet data source
wangyum Jul 15, 2018
3e7dc82
[SPARK-24776][SQL] Avro unit test: deduplicate code and replace depre…
gengliangwang Jul 15, 2018
6999321
[SPARK-24807][CORE] Adding files/jars twice: output a warning and add…
MaxGekk Jul 15, 2018
9603087
[SPARK-24800][SQL] Refactor Avro Serializer and Deserializer
gengliangwang Jul 15, 2018
5d62a98
Doc fix: The Imputer is an Estimator
zoltanctoth Jul 15, 2018
bbc2ffc
[SPARK-24813][TESTS][HIVE][HOTFIX] HiveExternalCatalogVersionsSuite s…
srowen Jul 16, 2018
bcf7121
[TRIVIAL][ML] GMM unpersist RDD after training
Jul 16, 2018
d463533
[SPARK-24676][SQL] Project required data from CSV parsed data when co…
maropu Jul 16, 2018
9f92945
[SPARK-24810][SQL] Fix paths to test files in AvroSuite
MaxGekk Jul 16, 2018
2603ae3
[SPARK-24558][CORE] wrong Idle Timeout value is used in case of the c…
sandeep-katta Jul 16, 2018
9549a28
[SPARK-24549][SQL] Support Decimal type push down to the parquet data…
wangyum Jul 16, 2018
cf97045
[SPARK-18230][MLLIB] Throw a better exception, if the user or product…
shahidki31 Jul 16, 2018
b045315
[SPARK-24734][SQL] Fix type coercions and nullabilities of nested dat…
ueshin Jul 16, 2018
b0c95a1
[SPARK-23901][SQL] Removing masking functions
mn-mikke Jul 16, 2018
ba437fc
[SPARK-24805][SQL] Do not ignore avro files without extensions by def…
MaxGekk Jul 16, 2018
0f0d186
[SPARK-24402][SQL] Optimize `In` expression when only one element in …
dbtsai Jul 16, 2018
d57a267
[SPARK-23259][SQL] Clean up legacy code around hive external catalog …
Jul 17, 2018
f876d3f
[SPARK-20220][DOCS] Documentation Add thrift scheduling pool config t…
mrchristine Jul 17, 2018
0ca16f6
Revert "[SPARK-24402][SQL] Optimize `In` expression when only one ele…
HyukjinKwon Jul 17, 2018
4cf1bec
[SPARK-24305][SQL][FOLLOWUP] Avoid serialization of private fields in…
mn-mikke Jul 17, 2018
5215344
[SPARK-24813][BUILD][FOLLOW-UP][HOTFIX] HiveExternalCatalogVersionsSu…
srowen Jul 17, 2018
7688ce8
[SPARK-21590][SS] Window start time should support negative values
KevinZwx Jul 17, 2018
912634b
[SPARK-24747][ML] Make Instrumentation class more flexible
MrBago Jul 17, 2018
2a4dd6f
[SPARK-24681][SQL] Verify nested column names in Hive metastore
maropu Jul 17, 2018
681845f
[SPARK-24402][SQL] Optimize `In` expression when only one element in …
dbtsai Jul 18, 2018
fc2e189
[SPARK-24529][BUILD][TEST-MAVEN][FOLLOW-UP] Set spotbugs-maven-plugin…
wangyum Jul 18, 2018
3b59d32
[SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
dongjoon-hyun Jul 18, 2018
34cb3b5
[SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-java and Scala 2…
ueshin Jul 18, 2018
2694dd2
[MINOR][CORE] Add test cases for RDD.cartesian
NiharS Jul 18, 2018
002300d
[SPARK-24804] There are duplicate words in the test title in the Data…
httfighter Jul 18, 2018
ebe9e28
[SPARK-24628][DOC] Typos of the example code in docs/mllib-data-types.md
huangweizhe123 Jul 18, 2018
fc0c8c9
[SPARK-24825][K8S][TEST] Kubernetes integration tests build the whole…
mccheah Jul 18, 2018
c8bee93
[SPARK-24677][CORE] Avoid NoSuchElementException from MedianHeap
cxzl25 Jul 18, 2018
1272b20
[SPARK-22151] PYTHONPATH not picked up from the spark.yarn.appMaste…
Jul 18, 2018
cd203e0
[SPARK-24163][SPARK-24164][SQL] Support column list as the pivot colu…
maryannxue Jul 18, 2018
d404e54
[SPARK-24129][K8S] Add option to pass --build-arg's to docker-image-t…
Jul 18, 2018
753f115
[SPARK-21261][DOCS][SQL] SQL Regex document fix
srowen Jul 18, 2018
cd5d93c
[SPARK-24854][SQL] Gathering all Avro options into the AvroOptions class
MaxGekk Jul 19, 2018
1a4fda8
[INFRA] Close stale PR
HyukjinKwon Jul 19, 2018
[SPARK-23010][K8S] Initial checkin of k8s integration tests.
These tests were developed in the https://github.com/apache-spark-on-k8s/spark-integration repo
by several contributors. This is a copy of the current state into the main apache spark repo.
The only changes from the current spark-integration repo state are:
* Move the files from the repo root into resource-managers/kubernetes/integration-tests
* Add a reference to these tests in the root README.md
* Fix a path reference in dev/dev-run-integration-tests.sh
* Add a TODO in include/util.sh

## What changes were proposed in this pull request?

Incorporation of Kubernetes integration tests.

## How was this patch tested?

This code has its own unit tests, but the main purpose is to provide the integration tests.
I tested this on my laptop by running dev/dev-run-integration-tests.sh --spark-tgz ~/spark-2.4.0-SNAPSHOT-bin--.tgz

The spark-integration tests have already been running for months in AMPLab; here is an example:
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-scheduled-spark-integration-master/

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Sean Suchter <[email protected]>
Author: Sean Suchter <[email protected]>

Closes #20697 from ssuchter/ssuchter-k8s-integration-tests.
ssuchter authored and mccheah committed Jun 8, 2018
commit f433ef786770e48e3594ad158ce9908f98ef0d9a
2 changes: 2 additions & 0 deletions README.md
@@ -81,6 +81,8 @@ can be run using:
Please see the guidance on how to
[run tests for a module, or individual tests](http://spark.apache.org/developer-tools.html#individual-tests).

There is also a Kubernetes integration test; see resource-managers/kubernetes/integration-tests/README.md

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
2 changes: 1 addition & 1 deletion dev/tox.ini
@@ -16,4 +16,4 @@
[pycodestyle]
ignore=E402,E731,E241,W503,E226,E722,E741,E305
max-line-length=100
exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*
exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*,dist/*
1 change: 1 addition & 0 deletions pom.xml
@@ -2705,6 +2705,7 @@
<id>kubernetes</id>
<modules>
<module>resource-managers/kubernetes/core</module>
<module>resource-managers/kubernetes/integration-tests</module>
</modules>
</profile>

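With the integration-tests module added to the `kubernetes` Maven profile above, it is included in the build whenever that profile is activated; a sketch of a full build with the profile enabled (assuming a standard Maven setup):

    build/mvn -Pkubernetes -DskipTests package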
52 changes: 52 additions & 0 deletions resource-managers/kubernetes/integration-tests/README.md
@@ -0,0 +1,52 @@
---
layout: global
title: Spark on Kubernetes Integration Tests
---

# Running the Kubernetes Integration Tests

Note that the integration test framework is currently being heavily revised and
is subject to change. Currently, the integration tests only run with Java 8.

The simplest way to run the integration tests is to install and run Minikube, then run the following:

dev/dev-run-integration-tests.sh

The minimum tested version of Minikube is 0.23.0. The kube-dns addon must be enabled. Minikube should
run with a minimum of 3 CPUs and 4G of memory:

minikube start --cpus 3 --memory 4096

You can download Minikube [here](https://github.com/kubernetes/minikube/releases).

# Integration test customization

The integration test runtime is configured by passing arguments to the test script. The most useful options are outlined below.

## Re-using Docker Images

By default, the test framework will build new Docker images on every test execution. A unique image tag is generated
and written to the file `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker image tag
that you have already built by other means, pass the tag to the test script:

dev/dev-run-integration-tests.sh --image-tag <tag>

or, to reuse the images previously built by the test framework:

dev/dev-run-integration-tests.sh --image-tag $(cat target/imageTag.txt)
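Images produced outside the test framework can be reused the same way; for instance, a sketch using Spark's `bin/docker-image-tool.sh` from a Spark distribution (the repo and tag below are placeholders):

    bin/docker-image-tool.sh -r docker.io/kubespark -t dev build
    dev/dev-run-integration-tests.sh --image-repo docker.io/kubespark --image-tag dev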

## Spark Distribution Under Test

The Spark code to test is handed to the integration test system via a tarball. Here is the option that is used to specify the tarball:

* `--spark-tgz <path-to-tgz>` - set `<path-to-tgz>` to point to a tarball containing the Spark distribution to test.

TODO: Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the current tree.
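For reference, such a tarball can be produced from a source checkout with `dev/make-distribution.sh`; a sketch (the profiles and the resulting file name depend on your build):

    dev/make-distribution.sh --tgz -Pkubernetes
    dev/dev-run-integration-tests.sh --spark-tgz <path-to-generated-tgz>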

## Customizing the Namespace and Service Account

* `--namespace <namespace>` - set `<namespace>` to the namespace in which the tests should be run.
* `--service-account <service account name>` - set `<service account name>` to the name of the Kubernetes service account to
use in the namespace specified by `--namespace`. The service account is expected to have permissions to get, list, watch,
and create pods. For clusters with RBAC turned on, it's important that the right permissions are granted to the service account
in the namespace through an appropriate role and role binding. A reference RBAC configuration is provided in `dev/spark-rbac.yaml`.
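As an illustration, using the objects defined in `dev/spark-rbac.yaml`, the namespace and service account can be created and then passed to the script; a sketch assuming `kubectl` is pointed at the target cluster:

    kubectl apply -f dev/spark-rbac.yaml
    dev/dev-run-integration-tests.sh --namespace spark --service-account spark-sa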
93 changes: 93 additions & 0 deletions resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
@@ -0,0 +1,93 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
TEST_ROOT_DIR=$(git rev-parse --show-toplevel)/resource-managers/kubernetes/integration-tests

cd "${TEST_ROOT_DIR}"

DEPLOY_MODE="minikube"
IMAGE_REPO="docker.io/kubespark"
SPARK_TGZ="N/A"
IMAGE_TAG="N/A"
SPARK_MASTER=
NAMESPACE=
SERVICE_ACCOUNT=

# Parse arguments
while (( "$#" )); do
  case $1 in
    --image-repo)
      IMAGE_REPO="$2"
      shift
      ;;
    --image-tag)
      IMAGE_TAG="$2"
      shift
      ;;
    --deploy-mode)
      DEPLOY_MODE="$2"
      shift
      ;;
    --spark-tgz)
      SPARK_TGZ="$2"
      shift
      ;;
    --spark-master)
      SPARK_MASTER="$2"
      shift
      ;;
    --namespace)
      NAMESPACE="$2"
      shift
      ;;
    --service-account)
      SERVICE_ACCOUNT="$2"
      shift
      ;;
    *)
      break
      ;;
  esac
  shift
done

cd "$TEST_ROOT_DIR"

properties=(
  -Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ \
  -Dspark.kubernetes.test.imageTag=$IMAGE_TAG \
  -Dspark.kubernetes.test.imageRepo=$IMAGE_REPO \
  -Dspark.kubernetes.test.deployMode=$DEPLOY_MODE
)

# Quote the variables so each -n test is false when the corresponding option was not given.
if [ -n "$NAMESPACE" ];
then
  properties=( ${properties[@]} -Dspark.kubernetes.test.namespace=$NAMESPACE )
fi

if [ -n "$SERVICE_ACCOUNT" ];
then
  properties=( ${properties[@]} -Dspark.kubernetes.test.serviceAccountName=$SERVICE_ACCOUNT )
fi

if [ -n "$SPARK_MASTER" ];
then
  properties=( ${properties[@]} -Dspark.kubernetes.test.master=$SPARK_MASTER )
fi

../../../build/mvn integration-test ${properties[@]}
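For example, a hypothetical invocation like the one below (the tarball path is a placeholder) is translated by the script into the corresponding `-Dspark.kubernetes.test.*` properties on the Maven command line:

    dev/dev-run-integration-tests.sh \
      --spark-tgz /tmp/spark-dist.tgz \
      --deploy-mode minikube \
      --namespace spark
    # roughly equivalent to:
    #   build/mvn integration-test -Dspark.kubernetes.test.sparkTgz=/tmp/spark-dist.tgz \
    #     -Dspark.kubernetes.test.deployMode=minikube -Dspark.kubernetes.test.namespace=spark ...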
52 changes: 52 additions & 0 deletions resource-managers/kubernetes/integration-tests/dev/spark-rbac.yaml
@@ -0,0 +1,52 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: v1
kind: Namespace
metadata:
  name: spark
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-sa
  namespace: spark
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: spark-role
rules:
- apiGroups:
  - ""
  resources:
  - "pods"
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: spark-role-binding
subjects:
- kind: ServiceAccount
  name: spark-sa
  namespace: spark
roleRef:
  kind: ClusterRole
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
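Once these resources are applied, the binding can be sanity-checked with `kubectl auth can-i` (a hypothetical check using the names above; adjust them if you edited the file):

    kubectl auth can-i create pods --as=system:serviceaccount:spark:spark-sa -n spark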
155 changes: 155 additions & 0 deletions resource-managers/kubernetes/integration-tests/pom.xml
@@ -0,0 +1,155 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.11</artifactId>
<version>2.4.0-SNAPSHOT</version>
<relativePath>../../../pom.xml</relativePath>
</parent>

<artifactId>spark-kubernetes-integration-tests_2.11</artifactId>
<groupId>spark-kubernetes-integration-tests</groupId>
<properties>
<download-maven-plugin.version>1.3.0</download-maven-plugin.version>
<exec-maven-plugin.version>1.4.0</exec-maven-plugin.version>
<extraScalaTestArgs></extraScalaTestArgs>
<kubernetes-client.version>3.0.0</kubernetes-client.version>
<scala-maven-plugin.version>3.2.2</scala-maven-plugin.version>
<scalatest-maven-plugin.version>1.0</scalatest-maven-plugin.version>
<sbt.project.name>kubernetes-integration-tests</sbt.project.name>
<spark.kubernetes.test.unpackSparkDir>${project.build.directory}/spark-dist-unpacked</spark.kubernetes.test.unpackSparkDir>
<spark.kubernetes.test.imageTag>N/A</spark.kubernetes.test.imageTag>
<spark.kubernetes.test.imageTagFile>${project.build.directory}/imageTag.txt</spark.kubernetes.test.imageTagFile>
<spark.kubernetes.test.deployMode>minikube</spark.kubernetes.test.deployMode>
<spark.kubernetes.test.imageRepo>docker.io/kubespark</spark.kubernetes.test.imageRepo>
<test.exclude.tags></test.exclude.tags>
</properties>
<packaging>jar</packaging>
<name>Spark Project Kubernetes Integration Tests</name>

<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.fabric8</groupId>
<artifactId>kubernetes-client</artifactId>
<version>${kubernetes-client.version}</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>${exec-maven-plugin.version}</version>
<executions>
<execution>
<id>setup-integration-test-env</id>
<phase>pre-integration-test</phase>
<goals>
<goal>exec</goal>
</goals>
<configuration>
<executable>scripts/setup-integration-test-env.sh</executable>
<arguments>
<argument>--unpacked-spark-tgz</argument>
<argument>${spark.kubernetes.test.unpackSparkDir}</argument>

<argument>--image-repo</argument>
<argument>${spark.kubernetes.test.imageRepo}</argument>

<argument>--image-tag</argument>
<argument>${spark.kubernetes.test.imageTag}</argument>

<argument>--image-tag-output-file</argument>
<argument>${spark.kubernetes.test.imageTagFile}</argument>

<argument>--deploy-mode</argument>
<argument>${spark.kubernetes.test.deployMode}</argument>

<argument>--spark-tgz</argument>
<argument>${spark.kubernetes.test.sparkTgz}</argument>
</arguments>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<!-- Triggers scalatest plugin in the integration-test phase instead of
the test phase. -->
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
<version>${scalatest-maven-plugin.version}</version>
<configuration>
<reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
<junitxml>.</junitxml>
<filereports>SparkTestSuite.txt</filereports>
<argLine>-ea -Xmx3g -XX:ReservedCodeCacheSize=512m ${extraScalaTestArgs}</argLine>
<stderr/>
<systemProperties>
<log4j.configuration>file:src/test/resources/log4j.properties</log4j.configuration>
<java.awt.headless>true</java.awt.headless>
<spark.kubernetes.test.imageTagFile>${spark.kubernetes.test.imageTagFile}</spark.kubernetes.test.imageTagFile>
<spark.kubernetes.test.unpackSparkDir>${spark.kubernetes.test.unpackSparkDir}</spark.kubernetes.test.unpackSparkDir>
<spark.kubernetes.test.imageRepo>${spark.kubernetes.test.imageRepo}</spark.kubernetes.test.imageRepo>
<spark.kubernetes.test.deployMode>${spark.kubernetes.test.deployMode}</spark.kubernetes.test.deployMode>
<spark.kubernetes.test.master>${spark.kubernetes.test.master}</spark.kubernetes.test.master>
<spark.kubernetes.test.namespace>${spark.kubernetes.test.namespace}</spark.kubernetes.test.namespace>
<spark.kubernetes.test.serviceAccountName>${spark.kubernetes.test.serviceAccountName}</spark.kubernetes.test.serviceAccountName>
</systemProperties>
<tagsToExclude>${test.exclude.tags}</tagsToExclude>
</configuration>
<executions>
<execution>
<id>test</id>
<goals>
<goal>test</goal>
</goals>
<configuration>
<!-- The negative pattern below prevents integration tests such as
KubernetesSuite from running in the test phase. -->
<suffixes>(?&lt;!Suite)</suffixes>
</configuration>
</execution>
<execution>
<id>integration-test</id>
<phase>integration-test</phase>
<goals>
<goal>test</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>

</build>

</project>
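Because the scalatest plugin above is bound to the `integration-test` phase, the suite can also be launched through Maven directly rather than via the dev script; a sketch (the property values are placeholders, and the `kubernetes` profile must be active so the module is included in the reactor):

    build/mvn -Pkubernetes -pl resource-managers/kubernetes/integration-tests -am integration-test \
      -Dspark.kubernetes.test.sparkTgz=<path-to-tgz> \
      -Dspark.kubernetes.test.deployMode=minikube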