Commit a7dc020
[SPARK-48681][SQL] Use ICU in Lower/Upper expressions for UTF8_BINARY strings
### What changes were proposed in this pull request?
Update the `Lower` & `Upper` Spark expressions to use ICU case mappings for the UTF8_BINARY collation, instead of the JVM case mappings used so far. This behaviour is gated by the `ICU_CASE_MAPPINGS_ENABLED` flag in SQLConf, which is `true` by default.
### Why are the changes needed?
To keep collations consistent: all collations should use ICU-based case mappings, including the UTF8_BINARY collation.
### Does this PR introduce _any_ user-facing change?
Yes, the behaviour of the `lower` & `upper` string functions for UTF8_BINARY will now rely on ICU-based case mappings. However, by turning the `ICU_CASE_MAPPINGS_ENABLED` flag off, users can get the old JVM-based case mappings back. Note that the two differ only in subtle cases.
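For context, here is a minimal Java sketch (not code from this patch) of the JVM-based case mappings that the flag falls back to. It illustrates the kind of non-trivial input where case-mapping implementations can plausibly disagree. The class name `JvmCaseMappingDemo` and the choice of `Locale.ROOT` are assumptions made for illustration; the ICU-based path is not shown because it needs the ICU4J library on the classpath.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Spark.
public class JvmCaseMappingDemo {
    // JVM-based upper-casing; Locale.ROOT is an assumption made for reproducibility.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // JVM-based lower-casing; Locale.ROOT is an assumption made for reproducibility.
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+00DF (sharp s) has no single-character uppercase form,
        // so the JVM expands it to two characters.
        System.out.println(upper("stra\u00DFe")); // STRASSE

        // U+0130 (capital I with dot above) lowercases to 'i' followed by
        // U+0307 (combining dot above), so the result has two chars.
        System.out.println(lower("\u0130").length()); // 2
    }
}
```

Inputs like these, where the mapping changes the string length, are useful candidates when comparing the output of `lower` and `upper` with the flag on and off.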
### How was this patch tested?
Existing tests, with extended `CollationSupport` unit tests for Lower/Upper to verify both ICU and JVM behaviour.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #47043 from uros-db/change-lower-upper.
Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 4663b84 commit a7dc020
File tree
4 files changed, +47 -15 lines changed
- common/unsafe/src
  - main/java/org/apache/spark/sql/catalyst/util
  - test/java/org/apache/spark/unsafe/types
- sql/catalyst/src/main/scala/org/apache/spark/sql
  - catalyst/expressions
  - internal
Lines changed: 17 additions & 9 deletions
Lines changed: 10 additions & 2 deletions
Lines changed: 12 additions & 4 deletions
Lines changed: 8 additions & 0 deletions