[SPARK-49666][SQL] Add feature flag for trim collation feature #48222

jovanpavl-db · 2024-09-24T07:52:28Z

What changes were proposed in this pull request?

Introduce a new specifier for trim collations (both leading and trailing trimming). These are initial changes so that the trim specifier is recognized and placed behind a feature flag (all code paths are blocked while the flag is off).
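As a rough illustration of the idea (not the code in this PR), the sketch below recognizes a trim specifier in a collation name and rejects it unless a feature flag is on. The class name, the `_LTRIM`/`_RTRIM` suffixes, the enum, and the `trimCollationEnabled` parameter are all hypothetical; the actual specifier syntax and config key are defined in the PR's changes to CollationFactory.java and SQLConf.scala.

```java
import java.util.Locale;

public class TrimCollationGate {
    /** Hypothetical trimming modes; the PR text only mentions leading and trailing trimming. */
    enum SpaceTrimming { NONE, LTRIM, RTRIM }

    /** Extracts the trim specifier from a collation name (the suffix naming is an assumption). */
    static SpaceTrimming parseTrimSpecifier(String collationName) {
        String upper = collationName.toUpperCase(Locale.ROOT);
        if (upper.endsWith("_LTRIM")) return SpaceTrimming.LTRIM;
        if (upper.endsWith("_RTRIM")) return SpaceTrimming.RTRIM;
        return SpaceTrimming.NONE;
    }

    /** Recognizes the specifier, but blocks it unless the (hypothetical) feature flag is enabled. */
    static SpaceTrimming resolve(String collationName, boolean trimCollationEnabled) {
        SpaceTrimming trimming = parseTrimSpecifier(collationName);
        if (trimming != SpaceTrimming.NONE && !trimCollationEnabled) {
            throw new UnsupportedOperationException(
                "Trim collation is not enabled: " + collationName);
        }
        return trimming;
    }
}
```

With the flag off, `resolve("UTF8_BINARY", false)` still succeeds because no specifier is present, while a name carrying a trim specifier is rejected. Keeping the parsing and the gate separate means the specifier can be recognized (and tested) before any trimming behavior is enabled.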

Why are the changes needed?

Support for trailing space trimming is one of the features requested by users.

Does this PR introduce any user-facing change?

This is guarded by a feature flag.

How was this patch tested?

Added tests to CollationSuite, SQLConfSuite and QueryCompilationErrorsSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

stefankandic

Nice work! Left some minor comments but looks good overall

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala

common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

stefankandic

LGTM pending scalastyle fixes!

common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java

sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

...catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala

common/utils/src/main/resources/error/error-conditions.json

common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

cloud-fan · 2024-09-30T09:43:11Z

thanks, merging to master!

…ationNameToId` outside of cases

What changes were proposed in this pull request?

This PR addresses the UTF8_BINARY performance regression first identified in #48721. The regression was traced back to #48222, where it first appeared; however, that PR is not the actual source of the performance degradation.

Why are the changes needed?

PR #48222 caused the regression because it changed the `collationNameToId` function and made it slightly slower by removing a short-circuit for fetching the UTF8_BINARY collation. However, this function should be called a fixed number of times per query, and at most once from the benchmark framework. That was not the case, and the repeated calls were the largest contributor to the performance regression. This PR changes the benchmarking framework so that it calls the function once per test case rather than at each expression.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing testing surface, benchmarks.

Was this patch authored or co-authored using generative AI tooling?

No

Closes #48804 from stevomitric/stevomitric/fix-utf8_binary-regression.

Authored-by: Stevo Mitric
Signed-off-by: Max Gekk
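The two effects described in the commit message above can be sketched in isolation: a short-circuit for the common UTF8_BINARY name, and hoisting the name lookup out of a per-expression loop. This is a minimal sketch under stated assumptions, not Spark's implementation; the class, the id values, the `lookups` counter, and both loop methods are hypothetical and exist only to make the call counts visible.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CollationLookupSketch {
    static final int UTF8_BINARY_ID = 0;                 // hypothetical id value
    static final Map<String, Integer> IDS = new HashMap<>();
    static int lookups = 0;                              // counts calls to nameToId

    static {
        IDS.put("UTF8_BINARY", UTF8_BINARY_ID);
        IDS.put("UTF8_LCASE", 1);                        // hypothetical id value
    }

    /** Hypothetical stand-in for a name-to-id lookup. */
    static int nameToId(String name) {
        lookups++;
        // Short-circuit: skip normalization and the map lookup for the default collation.
        if ("UTF8_BINARY".equals(name)) return UTF8_BINARY_ID;
        Integer id = IDS.get(name.toUpperCase(Locale.ROOT));
        if (id == null) throw new IllegalArgumentException("Unknown collation: " + name);
        return id;
    }

    /** Resolves the name on every iteration, as a per-expression call would. */
    static int sumPerExpression(String name, int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) acc += nameToId(name);
        return acc;
    }

    /** Resolves the name once, then reuses the id inside the loop. */
    static int sumHoisted(String name, int n) {
        int id = nameToId(name);
        int acc = 0;
        for (int i = 0; i < n; i++) acc += id;
        return acc;
    }
}
```

Both loop methods return the same value, but `sumPerExpression` performs `n` lookups while `sumHoisted` performs one. That is why a small slowdown inside the lookup only becomes visible when the caller resolves the name repeatedly.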

jovanpavl-db added 11 commits September 20, 2024 16:24

introduce space trimming flag in collation factory.

b5a37e7

refactor CollationFactory.

954c6c1

fix collation name to id.

803d187

minor fix.

a880e02

fix default space trimming.

01628d6

feature flag implementation

5192476

Trim collation banned with collate builder

a3c5e8c

implement collation trim block without collate builder call.

e65232e

add test for set collation.

9635ab7

add tests for recognition of space trimming.

87cc344

minor fix

587cd2b

github-actions bot added the SQL label Sep 24, 2024

stefankandic suggested changes Sep 24, 2024

jovanpavl-db added 3 commits September 24, 2024 12:22

address comments

27a3058

simplify switch statement.

58aab95

fix test, address rest of the comments.

16c1674

vladimirg-db reviewed Sep 24, 2024

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (outdated, resolved)

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala (outdated, resolved)

stefankandic approved these changes Sep 25, 2024

jovanpavl-db added 3 commits September 25, 2024 10:29

nit fixes.

bc99ecb

fix scala style.

5328ab7

fix java style.

156728c