[SPARK-47986][CONNECT][FOLLOW-UP] Unable to create a new session when the default session is closed by the server

### What changes were proposed in this pull request?

This is a Scala port of apache#46221 and apache#46435.

A client is unaware of a server restart, or of the server having closed its session, until it receives an error. At that point, however, the client is unable to create a new session to the same Connect endpoint, because the stale session is still recorded as the active and default session.

With this change, when the server reports via a gRPC error that the session has changed, the session and its client are marked as stale, allowing a new default connection to be created via the session builder.
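
As a rough illustration of the stale-marking idea, the sketch below replaces a default session only when none is set or the current one has been flagged as unusable. `FakeClient`, `FakeSession`, `DefaultSessionRegistry`, and `markStale` are hypothetical stand-ins invented for this example, not Spark Connect classes, and plain `get`/`set` are used here instead of the acquire/release accessors that appear in the diff.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

// Hypothetical stand-in for a Connect client: it only tracks validity.
final class FakeClient {
  private val valid = new AtomicBoolean(true)
  def isSessionValid: Boolean = valid.get()
  // Called when the server signals that the session has changed.
  def markStale(): Unit = valid.set(false)
}

// Hypothetical stand-in for a session that wraps a client.
final class FakeSession(val client: FakeClient)

object DefaultSessionRegistry {
  private val defaultSession = new AtomicReference[FakeSession](null)

  // Install `session` as the default only if there is no default yet
  // or the current default has been marked stale.
  def setDefaultIfUnusable(session: FakeSession): Unit = {
    val current = defaultSession.get()
    if (current == null || !current.client.isSessionValid) {
      // A failed compareAndSet means another thread installed a default
      // in the meantime; that newer default is kept.
      defaultSession.compareAndSet(current, session)
    }
  }

  // A stale default is reported as absent.
  def getDefault: Option[FakeSession] =
    Option(defaultSession.get()).filter(_.client.isSessionValid)
}
```

Comparing against the previously read value, rather than against `null`, is what lets a stale default be swapped out without clobbering a default that another thread has just installed.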

In some cases, particularly when running against older versions of the Spark cluster (3.5), the error instead manifests as a mismatch in the observed server-side session ID between calls. This fix also detects and handles that case.
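
The session ID mismatch case can be pictured with a simplified validator: it remembers the first server-side session ID it sees and flags the session as invalid when a later response carries a different one. `SessionIdValidator` and `observe` are hypothetical names for this sketch; the validator in the diff reads the ID from a protobuf response field instead of taking it as a plain string.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified validator; not the class changed in this PR.
final class SessionIdValidator {
  private var lastSeenId: Option[String] = None
  private val active = new AtomicBoolean(true)

  def isSessionValid: Boolean = active.get()

  // Records the first observed ID. On a later mismatch, marks the
  // session as invalid before throwing, so callers can check validity.
  def observe(serverSideSessionId: String): Unit = synchronized {
    lastSeenId match {
      case Some(id) if id != serverSideSessionId =>
        active.set(false)
        throw new IllegalStateException(
          s"Server side session ID changed from $id to $serverSideSessionId")
      case Some(_) => () // same ID as before, nothing to do
      case None => lastSeenId = Some(serverSideSessionId)
    }
  }
}
```

Setting the flag before throwing matters: the exception still reaches the caller, but any later validity check already sees the session as unusable.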

### Why are the changes needed?

Being unable to use getOrCreate() after an error is unacceptable and should be fixed.
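
For context, the recovery path for `getOrCreate()` might look roughly like the sketch below: if the cached session for an endpoint has gone stale, the entry is dropped and a fresh one is created. `StubSession` and `SessionCache` are hypothetical, and a `ConcurrentHashMap` stands in for whatever cache the real builder uses; that choice is an assumption made only for this example.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a session; not a Spark Connect class.
final class StubSession(val endpoint: String) {
  private val valid = new AtomicBoolean(true)
  def isSessionValid: Boolean = valid.get()
  def markStale(): Unit = valid.set(false)
}

object SessionCache {
  private val sessions = new ConcurrentHashMap[String, StubSession]()

  // Returns the cached session for `endpoint`, replacing it first
  // if it has been marked stale.
  def getOrCreate(endpoint: String): StubSession = {
    var existing = sessions.computeIfAbsent(endpoint, (e: String) => new StubSession(e))
    if (!existing.isSessionValid) {
      // Remove the stale entry only if it is still the one we saw,
      // then load a fresh session for the same key.
      sessions.remove(endpoint, existing)
      existing = sessions.computeIfAbsent(endpoint, (e: String) => new StubSession(e))
    }
    existing
  }
}
```

Under these assumptions, a second call after the session has been marked stale returns a different, valid instance, which is the behavior the new end-to-end test asserts for the real builder.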

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

./build/sbt testOnly *SparkSessionE2ESuite

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#47008 from changgyoopark-db/SPARK-47986.

Authored-by: Changgyoo Park <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Changgyoo Park authored and HyukjinKwon committed Jun 19, 2024
commit 5d6e9dd6b1212823dd3aa148935723151027f911
@@ -829,10 +829,16 @@ object SparkSession extends Logging {

   /**
    * Set the (global) default [[SparkSession]], and (thread-local) active [[SparkSession]] when
-   * they are not set yet.
+   * they are not set yet or the associated [[SparkConnectClient]] is unusable.
    */
   private def setDefaultAndActiveSession(session: SparkSession): Unit = {
-    defaultSession.compareAndSet(null, session)
+    val currentDefault = defaultSession.getAcquire
+    if (currentDefault == null || !currentDefault.client.isSessionValid) {
+      // Update `defaultSession` if it is null or the contained session is not valid. There is a
+      // chance that the following `compareAndSet` fails if a new default session has just been set,
+      // but that does not matter since that event has happened after this method was invoked.
+      defaultSession.compareAndSet(currentDefault, session)
+    }
     if (getActiveSession.isEmpty) {
       setActiveSession(session)
     }
@@ -972,7 +978,7 @@ object SparkSession extends Logging {
     def appName(name: String): Builder = this

     private def tryCreateSessionFromClient(): Option[SparkSession] = {
-      if (client != null) {
+      if (client != null && client.isSessionValid) {
         Option(new SparkSession(client, planIdGenerator))
       } else {
         None
@@ -1024,19 +1030,30 @@ object SparkSession extends Logging {
      */
     def getOrCreate(): SparkSession = {
       val session = tryCreateSessionFromClient()
-        .getOrElse(sessions.get(builder.configuration))
+        .getOrElse({
+          var existingSession = sessions.get(builder.configuration)
+          if (!existingSession.client.isSessionValid) {
+            // If the cached session has become invalid, e.g., due to a server restart, the cache
+            // entry is invalidated.
+            sessions.invalidate(builder.configuration)
+            existingSession = sessions.get(builder.configuration)
+          }
+          existingSession
+        })
       setDefaultAndActiveSession(session)
       applyOptions(session)
       session
     }
   }

   /**
-   * Returns the default SparkSession.
+   * Returns the default SparkSession. If the previously set default SparkSession becomes
+   * unusable, returns None.
    *
    * @since 3.5.0
    */
-  def getDefaultSession: Option[SparkSession] = Option(defaultSession.get())
+  def getDefaultSession: Option[SparkSession] =
+    Option(defaultSession.get()).filter(_.client.isSessionValid)

   /**
    * Sets the default SparkSession.
@@ -1057,11 +1074,13 @@ object SparkSession extends Logging {
   }

   /**
-   * Returns the active SparkSession for the current thread.
+   * Returns the active SparkSession for the current thread. If the previously set active
+   * SparkSession becomes unusable, returns None.
    *
    * @since 3.5.0
    */
-  def getActiveSession: Option[SparkSession] = Option(activeThreadSession.get())
+  def getActiveSession: Option[SparkSession] =
+    Option(activeThreadSession.get()).filter(_.client.isSessionValid)

   /**
    * Changes the SparkSession that will be returned in this thread and its children when
@@ -382,4 +382,43 @@ class SparkSessionE2ESuite extends ConnectFunSuite with RemoteSparkSession {
       .create()
     }
   }
+
+  test("SPARK-47986: get or create after session changed") {
+    val remote = s"sc://localhost:$serverPort"
+
+    SparkSession.clearDefaultSession()
+    SparkSession.clearActiveSession()
+
+    val session1 = SparkSession
+      .builder()
+      .remote(remote)
+      .getOrCreate()
+
+    assert(session1 eq SparkSession.getActiveSession.get)
+    assert(session1 eq SparkSession.getDefaultSession.get)
+    assert(session1.range(3).collect().length == 3)
+
+    session1.client.hijackServerSideSessionIdForTesting("-testing")
+
+    val e = intercept[SparkException] {
+      session1.range(3).analyze
+    }
+
+    assert(e.getMessage.contains("[INVALID_HANDLE.SESSION_CHANGED]"))
+    assert(!session1.client.isSessionValid)
+    assert(SparkSession.getActiveSession.isEmpty)
+    assert(SparkSession.getDefaultSession.isEmpty)
+
+    val session2 = SparkSession
+      .builder()
+      .remote(remote)
+      .getOrCreate()
+
+    assert(session1 ne session2)
+    assert(session2.client.isSessionValid)
+    assert(session2 eq SparkSession.getActiveSession.get)
+    assert(session2 eq SparkSession.getDefaultSession.get)
+    assert(session2.range(3).collect().length == 3)
+  }
+
 }
@@ -16,7 +16,10 @@
  */
 package org.apache.spark.sql.connect.client

+import java.util.concurrent.atomic.AtomicBoolean
+
 import com.google.protobuf.GeneratedMessageV3
+import io.grpc.{Status, StatusRuntimeException}
 import io.grpc.stub.StreamObserver

 import org.apache.spark.internal.Logging
@@ -30,6 +33,12 @@ class ResponseValidator extends Logging {
   // do not use server-side streaming.
   private var serverSideSessionId: Option[String] = None

+  // Indicates whether the client and the client information on the server correspond to each other
+  // This flag being false means that the server has restarted and lost the client information, or
+  // there is a logic error in the code; both cases, the user should establish a new connection to
+  // the server. Access to the value has to be synchronized since it can be shared.
+  private val isSessionActive: AtomicBoolean = new AtomicBoolean(true)
+
   // Returns the server side session ID, used to send it back to the server in the follow-up
   // requests so the server can validate it session id against the previous requests.
   def getServerSideSessionId: Option[String] = serverSideSessionId
@@ -42,8 +51,25 @@ class ResponseValidator extends Logging {
     serverSideSessionId = Some(serverSideSessionId.getOrElse("") + suffix)
   }

+  /**
+   * Returns true if the session is valid on both the client and the server.
+   */
+  private[sql] def isSessionValid: Boolean = {
+    // An active session is considered valid.
+    isSessionActive.getAcquire
+  }
+
   def verifyResponse[RespT <: GeneratedMessageV3](fn: => RespT): RespT = {
-    val response = fn
+    val response =
+      try {
+        fn
+      } catch {
+        case e: StatusRuntimeException
+            if e.getStatus.getCode == Status.Code.INTERNAL &&
+              e.getMessage.contains("[INVALID_HANDLE.SESSION_CHANGED]") =>
+          isSessionActive.setRelease(false)
+          throw e
+      }
     val field = response.getDescriptorForType.findFieldByName("server_side_session_id")
     // If the field does not exist, we ignore it. New / Old message might not contain it and this
     // behavior allows us to be compatible.
@@ -54,6 +80,7 @@ class ResponseValidator extends Logging {
         serverSideSessionId match {
           case Some(id) =>
             if (value != id) {
+              isSessionActive.setRelease(false)
               throw new IllegalStateException(
                 s"Server side session ID changed from $id to $value")
             }
@@ -71,6 +71,17 @@ private[sql] class SparkConnectClient(
     stubState.responseValidator.hijackServerSideSessionIdForTesting(suffix)
   }

+  /**
+   * Returns true if the session is valid on both the client and the server. A session becomes
+   * invalid if the server side information about the client, e.g., session ID, does not
+   * correspond to the actual client state.
+   */
+  private[sql] def isSessionValid: Boolean = {
+    // The last known state of the session is store in `responseValidator`, because it is where the
+    // client gets responses from the server.
+    stubState.responseValidator.isSessionValid
+  }
+
   private[sql] val artifactManager: ArtifactManager = {
     new ArtifactManager(configuration, sessionId, bstub, stub)
   }