Closed
Changes from 12 commits
Commits
80 commits
b34544a
Implicit casting on collated expressions
mihailomilosevic2001 Mar 5, 2024
fdbfa44
Fix doc files
mihailomilosevic2001 Mar 5, 2024
ce9b027
Fix contains, startWith, endWith tests
mihailomilosevic2001 Mar 5, 2024
e537190
Fix imports
mihailomilosevic2001 Mar 5, 2024
b5a79c1
Fix docs and incorporate changes
mihailomilosevic2001 Mar 6, 2024
8321d0c
Fix tests in CollationSuite
mihailomilosevic2001 Mar 6, 2024
d178233
Add test and incorporate changes
mihailomilosevic2001 Mar 7, 2024
a4b9be7
Fix godlen files
mihailomilosevic2001 Mar 7, 2024
a6e7662
Incorporate StringType in findWiderCommonType
mihailomilosevic2001 Mar 8, 2024
e1d7ad5
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 8, 2024
b3b1356
Fix ArrayType(StringType, _) casting in findWiderCommonType
mihailomilosevic2001 Mar 11, 2024
7773d13
Fix type mismatch error
mihailomilosevic2001 Mar 11, 2024
198a728
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Mar 11, 2024
255b1ab
Incorporate changes and fix errors
mihailomilosevic2001 Mar 11, 2024
9ce417f
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 12, 2024
50f3aa2
Fix errors
mihailomilosevic2001 Mar 12, 2024
ca0c84d
Rework casting
mihailomilosevic2001 Mar 13, 2024
880a1b1
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 13, 2024
56d6c7c
Fix failing tests
mihailomilosevic2001 Mar 14, 2024
94e5259
Fix array cast errors
mihailomilosevic2001 Mar 14, 2024
ccb52ba
Fix additional errors
mihailomilosevic2001 Mar 14, 2024
9b1387b
Fix explicit collation search
mihailomilosevic2001 Mar 17, 2024
c9974e1
Fix scala style errors
mihailomilosevic2001 Mar 18, 2024
fca9a65
Add support for ImplicitCastInputTypes
mihailomilosevic2001 Mar 18, 2024
660d664
Fix accidental change in license header
mihailomilosevic2001 Mar 18, 2024
c8edd93
Fix null casting
mihailomilosevic2001 Mar 19, 2024
a91490b
Fix failing tests
mihailomilosevic2001 Mar 19, 2024
49a8d61
Move implicit casting when strings present
mihailomilosevic2001 Mar 19, 2024
4c4cd84
Fix unintentional changes
mihailomilosevic2001 Mar 19, 2024
66122a6
improve types.py
mihailomilosevic2001 Mar 20, 2024
50f46e4
Refactor code
mihailomilosevic2001 Mar 21, 2024
cc86a87
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 21, 2024
c01e80c
Fix imports and failing tests
mihailomilosevic2001 Mar 21, 2024
cc797a2
Disable casting of StructTypes
mihailomilosevic2001 Mar 21, 2024
5d001ee
Fix imports
mihailomilosevic2001 Mar 21, 2024
c68fc7d
Fix concat tests
mihailomilosevic2001 Mar 21, 2024
1c926ab
Fix unnecessary repetition
mihailomilosevic2001 Mar 21, 2024
dec39bf
Remove Elt test
mihailomilosevic2001 Mar 21, 2024
e808446
Remove tests for Repeat
mihailomilosevic2001 Mar 21, 2024
ca1a23a
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 21, 2024
116931c
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Mar 22, 2024
af487a2
Fix failing tests
mihailomilosevic2001 Mar 22, 2024
4ba7055
Fix nullability for StringType->StringType
mihailomilosevic2001 Mar 22, 2024
e490e42
Improve comments and switch tests from E2E to unit tests
mihailomilosevic2001 Mar 24, 2024
00e88e7
Add new tests and remove compatibility test
mihailomilosevic2001 Mar 25, 2024
85b4d16
Fix conflict resolution mistake
mihailomilosevic2001 Mar 25, 2024
30f7225
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Mar 25, 2024
e89a354
Add indeterminate collation tests
mihailomilosevic2001 Mar 26, 2024
788dc06
Fix test
mihailomilosevic2001 Mar 26, 2024
75c0140
Block Alias on Indeterminate
mihailomilosevic2001 Mar 27, 2024
2918413
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailomilosevic2001 Mar 28, 2024
f6ed55a
Remove introduction of indeterminate collation
mihailomilosevic2001 Mar 28, 2024
98960c0
Fix import problem
mihailomilosevic2001 Mar 28, 2024
de623c8
Fix failing tests
mihailomilosevic2001 Mar 28, 2024
a92b4e1
Fix pyspark error
mihailomilosevic2001 Mar 28, 2024
f7f3011
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Mar 28, 2024
f67808e
Fix errors
mihailomilosevic2001 Mar 29, 2024
815ce42
Fix schema error
mihailomilosevic2001 Mar 29, 2024
7fca38a
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailomilosevic2001 Mar 29, 2024
b19b0eb
Fix collated tests
mihailomilosevic2001 Mar 29, 2024
a111f03
Add isExplicit flag
mihailomilosevic2001 Mar 29, 2024
55bdd9b
Fix import error
mihailomilosevic2001 Mar 29, 2024
a7228be
Fix imports in TypeCoercion
mihailomilosevic2001 Mar 31, 2024
27a72c6
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailomilosevic2001 Apr 1, 2024
18ada04
Add support for explicit propagation in arrays
mihailomilosevic2001 Apr 1, 2024
38670af
Fix tests to follow recent changes
mihailomilosevic2001 Apr 1, 2024
01d891e
Incorporate changes
mihailomilosevic2001 Apr 1, 2024
c5daf86
Fix error
mihailomilosevic2001 Apr 1, 2024
9ac5678
Change var to val in StringType
mihailomilosevic2001 Apr 1, 2024
0f1757d
Fix import style
mihailomilosevic2001 Apr 1, 2024
506c8c0
Revert explicit flag addition
mihailomilosevic2001 Apr 1, 2024
f743cf8
Narrow down expressions casting
mihailomilosevic2001 Apr 2, 2024
4f8fe1d
Incorporate minor changes
mihailomilosevic2001 Apr 2, 2024
52bf4dc
Incorporate changes
mihailomilosevic2001 Apr 2, 2024
7cbeafe
Special case expressions
mihailomilosevic2001 Apr 3, 2024
3e92e92
Return new line
mihailomilosevic2001 Apr 3, 2024
b23e106
Remove indentation cosmetic
mihailomilosevic2001 Apr 3, 2024
880ebed
Add more cosmetic changes
mihailomilosevic2001 Apr 3, 2024
f96ecd9
Incorporate changes
mihailomilosevic2001 Apr 3, 2024
e1e0cf4
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Apr 3, 2024
@@ -109,6 +109,7 @@ public Collation(
private static final Collation[] collationTable = new Collation[4];
private static final HashMap<String, Integer> collationNameToIdMap = new HashMap<>();

+public static final int INDETERMINATE_COLLATION_ID = -1;
public static final int DEFAULT_COLLATION_ID = 0;
public static final int LOWERCASE_COLLATION_ID = 1;

29 changes: 24 additions & 5 deletions common/utils/src/main/resources/error/error-classes.json
@@ -475,6 +475,24 @@
],
"sqlState" : "42704"
},
+"COLLATION_MISMATCH" : {
+"message" : [
+"Could not determine which collation to use for string comparison."
+],
+"subClass" : {
+"EXPLICIT" : {
+"message" : [
+"Error occurred due to the mismatch between explicit collations: <explicitTypes>. Decide on a single explicit collation and remove others."
+]
+},
+"IMPLICIT" : {
+"message" : [
+"Error occurred due to the mismatch between multiple implicit non-default collations. Use COLLATE function to set the collation explicitly."
+]
+}
+},
+"sqlState" : "42P21"
+},
"COLLECTION_SIZE_LIMIT_EXCEEDED" : {
"message" : [
"Can't create array with <numberOfElements> elements which exceeding the array size limit <maxRoundedArrayLength>,"
@@ -696,11 +714,6 @@
"To convert values from <srcType> to <targetType>, you can use the functions <functionNames> instead."
]
},
-"COLLATION_MISMATCH" : {
-"message" : [
-"Collations <collationNameLeft> and <collationNameRight> are not compatible. Please use the same collation for both strings."
-]
-},
"CREATE_MAP_KEY_DIFF_TYPES" : {
"message" : [
"The given keys of function <functionName> should all be the same type, but they are <dataType>."
@@ -1574,6 +1587,12 @@
],
"sqlState" : "22003"
},
+"INDETERMINATE_COLLATION" : {
+"message" : [
+"Function called requires knowledge of the collation it should apply, but indeterminate collation was found. Use COLLATE function to set the collation explicitly."
+],
+"sqlState" : "42P22"
+},
"INDEX_ALREADY_EXISTS" : {
"message" : [
"Cannot create the index <indexName> on table <tableName> because it already exists."
41 changes: 41 additions & 0 deletions docs/sql-error-conditions-collation-mismatch-error-class.md
@@ -0,0 +1,41 @@
+---
+layout: global
+title: COLLATION_MISMATCH error class
+displayTitle: COLLATION_MISMATCH error class
+license: |
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+---
+
+<!--
+DO NOT EDIT THIS FILE.
+It was generated automatically by `org.apache.spark.SparkThrowableSuite`.
+-->
+
+[SQLSTATE: 42P21](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
+
+Could not determine which collation to use for string comparison.
+
+This error class has the following derived error classes:
+
+## EXPLICIT
+
+Error occurred due to the mismatch between explicit collations: `<explicitTypes>`. Decide on a single explicit collation and remove others.
+
+## IMPLICIT
+
+Error occurred due to the mismatch between multiple implicit non-default collations. Use COLLATE function to set the collation explicitly.
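
Editorial note, not part of the diff: the two subclasses above suggest a precedence order, which the following Java sketch tries to make concrete. It is only an illustration under stated assumptions, not the PR's implementation. The `CollationMismatchSketch` class, the `Operand` model, and the `resolve` method are hypothetical, the collation names other than `UCS_BASIC` are placeholders, and the rule that the default collation yields to a single implicit non-default collation is inferred from the wording "implicit non-default collations".

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CollationMismatchSketch {
    // Default collation name, per the StringType comment elsewhere in this diff.
    static final String DEFAULT = "UCS_BASIC";

    // Hypothetical operand model: a collation name plus whether COLLATE set it explicitly.
    static final class Operand {
        final String collation;
        final boolean explicit;

        Operand(String collation, boolean explicit) {
            this.collation = collation;
            this.explicit = explicit;
        }
    }

    // Assumed precedence: explicit collations win; otherwise a single
    // non-default implicit collation wins; otherwise the default is used.
    static String resolve(List<Operand> operands) {
        Set<String> explicit = new LinkedHashSet<>();
        Set<String> implicit = new LinkedHashSet<>();
        for (Operand o : operands) {
            if (o.explicit) {
                explicit.add(o.collation);
            } else if (!o.collation.equals(DEFAULT)) {
                implicit.add(o.collation);
            }
        }
        if (explicit.size() > 1) {
            throw new IllegalStateException("COLLATION_MISMATCH.EXPLICIT: " + explicit);
        }
        if (explicit.size() == 1) {
            return explicit.iterator().next();
        }
        if (implicit.size() > 1) {
            throw new IllegalStateException("COLLATION_MISMATCH.IMPLICIT");
        }
        return implicit.isEmpty() ? DEFAULT : implicit.iterator().next();
    }

    public static void main(String[] args) {
        // Default plus one implicit non-default collation: the non-default one is chosen.
        System.out.println(resolve(List.of(
            new Operand("UCS_BASIC", false),
            new Operand("UNICODE", false))));
    }
}
```

Under these assumptions, two different explicit collations raise the EXPLICIT variant regardless of what the implicit operands carry, while the IMPLICIT variant only appears when no explicit collation is present to break the tie.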


4 changes: 0 additions & 4 deletions docs/sql-error-conditions-datatype-mismatch-error-class.md
@@ -76,10 +76,6 @@ If you have to cast `<srcType>` to `<targetType>`, you can set `<config>` as `<c
cannot cast `<srcType>` to `<targetType>`.
To convert values from `<srcType>` to `<targetType>`, you can use the functions `<functionNames>` instead.

-## COLLATION_MISMATCH
-
-Collations `<collationNameLeft>` and `<collationNameRight>` are not compatible. Please use the same collation for both strings.
-
## CREATE_MAP_KEY_DIFF_TYPES

The given keys of function `<functionName>` should all be the same type, but they are `<dataType>`.
14 changes: 14 additions & 0 deletions docs/sql-error-conditions.md
@@ -398,6 +398,14 @@ Cannot find a short name for the codec `<codecName>`.

The value `<collationName>` does not represent a correct collation name. Suggested valid collation name: [`<proposal>`].

+### [COLLATION_MISMATCH](sql-error-conditions-collation-mismatch-error-class.html)
+
+[SQLSTATE: 42P21](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
+
+Could not determine which collation to use for string comparison.
+
+For more details see [COLLATION_MISMATCH](sql-error-conditions-collation-mismatch-error-class.html)
+
### [COLLECTION_SIZE_LIMIT_EXCEEDED](sql-error-conditions-collection-size-limit-exceeded-error-class.html)

[SQLSTATE: 54000](sql-error-conditions-sqlstates.html#class-54-program-limit-exceeded)
@@ -945,6 +953,12 @@ For more details see [INCONSISTENT_BEHAVIOR_CROSS_VERSION](sql-error-conditions-

Max offset with `<rowsPerSecond>` rowsPerSecond is `<maxSeconds>`, but 'rampUpTimeSeconds' is `<rampUpTimeSeconds>`.

+### INDETERMINATE_COLLATION
+
+[SQLSTATE: 42P22](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
+
+Function called requires knowledge of the collation it should apply, but indeterminate collation was found. Use COLLATE function to set the collation explicitly.
+
### INDEX_ALREADY_EXISTS

[SQLSTATE: 42710](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
@@ -41,14 +41,22 @@ class StringType private(val collationId: Int) extends AtomicType with Serializa
*/
def isBinaryCollation: Boolean = CollationFactory.fetchCollation(collationId).isBinaryCollation

+/**
+* Returns whether the collation is indeterminate. An indeterminate collation is
+* a result of combination of conflicting non-default implicit collations.
+*/
+def isIndeterminateCollation: Boolean = collationId == CollationFactory.INDETERMINATE_COLLATION_ID
+
/**
* Type name that is shown to the customer.
* If this is an UCS_BASIC collation output is `string` due to backwards compatibility.
*/
override def typeName: String =
if (isDefaultCollation) "string"
+else if (isIndeterminateCollation) s"string COLLATE INDETERMINATE_COLLATION"
else s"string COLLATE ${CollationFactory.fetchCollation(collationId).collationName}"
+

override def equals(obj: Any): Boolean =
obj.isInstanceOf[StringType] && obj.asInstanceOf[StringType].collationId == collationId

@@ -17,6 +17,7 @@

package org.apache.spark.sql.catalyst.analysis

+import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{castStringType, hasStringType}
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
@@ -90,6 +91,7 @@ object AnsiTypeCoercion extends TypeCoercionBase {
Division ::
IntegralDivision ::
ImplicitTypeCasts ::
+CollationTypeCasts ::
DateTimeOperations ::
WindowFrameCoercion ::
GetDateFieldOperations:: Nil) :: Nil
@@ -138,21 +140,31 @@ object AnsiTypeCoercion extends TypeCoercionBase {
@scala.annotation.tailrec
private def findWiderTypeForString(dt1: DataType, dt2: DataType): Option[DataType] = {
(dt1, dt2) match {
-case (StringType, _: IntegralType) => Some(LongType)
-case (StringType, _: FractionalType) => Some(DoubleType)
-case (StringType, NullType) => Some(StringType)
+case (_: StringType, _: IntegralType) => Some(LongType)
+case (_: StringType, _: FractionalType) => Some(DoubleType)
+case (st: StringType, NullType) => Some(st)
// If a binary operation contains interval type and string, we can't decide which
// interval type the string should be promoted as. There are many possible interval
// types, such as year interval, month interval, day interval, hour interval, etc.
-case (StringType, _: AnsiIntervalType) => None
-case (StringType, a: AtomicType) => Some(a)
-case (other, StringType) if other != StringType => findWiderTypeForString(StringType, other)
+case (_: StringType, _: AnsiIntervalType) => None
+case (_: StringType, a: AtomicType) => Some(a)
+case (other, st: StringType) if !other.isInstanceOf[StringType] =>
+findWiderTypeForString(st, other)
case _ => None
}
}

-override def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
-types.foldLeft[Option[DataType]](Some(NullType))((r, c) =>
+override def findWiderCommonType(exprs: Seq[Expression],
+failOnIndeterminate: Boolean = false): Option[DataType] = {
+(if (exprs.map(_.dataType).partition(hasStringType)._1.distinct.size > 1) {
+val collationId = CollationTypeCasts.getOutputCollation(exprs, failOnIndeterminate)
+exprs.map(e =>
+if (hasStringType(e.dataType)) {
+castStringType(e.dataType, collationId)
+e
+}
+else e)
+} else exprs).map(_.dataType).foldLeft[Option[DataType]](Some(NullType))((r, c) =>
r match {
case Some(d) => findWiderTypeForTwo(d, c)
Contributor

I find this pretty weird.
Why can't we just rely on the fold + findWiderTypeForTwo logic? I think type checks should remain foldable even with the collation concept, i.e. we should always be able to determine the output collation by comparing two expressions at a time and folding to the result.

Contributor Author

findWiderTypeForTwo only accepts DataTypes. To tell whether a collation is explicit, we need the expression itself. I will look into findWiderTypeForTwo and whether it makes sense to change it to accept expressions as well.
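
Editorial note, not part of the thread: one way to read this exchange is that a pairwise fold stays possible as long as the accumulator carries more than a bare type. The Java sketch below illustrates that idea under stated assumptions; it is not code from this PR. The `CollationFoldSketch` class, the `State` accumulator, `combine`, and `fold` are hypothetical names, the collation ID `2` is a placeholder, and the ID constants are the ones shown in the `CollationFactory` hunk. An eager error on conflicting implicit collations would make the result depend on operand order, so the sketch records the conflict as an indeterminate state that a later explicit collation can still override.

```java
import java.util.List;

class CollationFoldSketch {
    // IDs as shown in the CollationFactory hunk of this diff.
    static final int INDETERMINATE = -1;
    static final int DEFAULT = 0;

    // Hypothetical accumulator: a collation ID plus whether COLLATE set it explicitly.
    static final class State {
        final int id;
        final boolean explicit;

        State(int id, boolean explicit) {
            this.id = id;
            this.explicit = explicit;
        }
    }

    // Binary step of the fold; only ever looks at two states at a time.
    static State combine(State a, State b) {
        if (a.explicit && b.explicit) {
            if (a.id != b.id) {
                throw new IllegalStateException("COLLATION_MISMATCH.EXPLICIT");
            }
            return a;
        }
        if (a.explicit) return a;
        if (b.explicit) return b;
        // Both implicit: equal IDs agree, and the default yields to anything else.
        if (a.id == b.id || b.id == DEFAULT) return a;
        if (a.id == DEFAULT) return b;
        // Conflicting non-default implicit collations: remember the conflict, do not fail yet.
        return new State(INDETERMINATE, false);
    }

    static int fold(List<State> operands, boolean failOnIndeterminate) {
        State acc = new State(DEFAULT, false);
        for (State s : operands) {
            acc = combine(acc, s);
        }
        if (acc.id == INDETERMINATE && failOnIndeterminate) {
            throw new IllegalStateException("INDETERMINATE_COLLATION");
        }
        return acc.id;
    }

    public static void main(String[] args) {
        // Two conflicting implicit collations followed by an explicit one.
        System.out.println(fold(List.of(
            new State(1, false), new State(2, false), new State(1, true)), true));
    }
}
```

The design point is the extra `explicit` flag: with only `DataType` values in hand, as in the current `findWiderTypeForTwo` signature, the fold cannot tell the last case apart from a plain implicit conflict.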

case _ => None
@@ -173,6 +185,8 @@ object AnsiTypeCoercion extends TypeCoercionBase {
inType: DataType,
expectedType: AbstractDataType): Option[DataType] = {
(inType, expectedType) match {
+case (_: StringType, st: StringType) =>
+Some(st)
// If the expected type equals the input type, no need to cast.
case _ if expectedType.acceptsType(inType) => Some(inType)

@@ -188,21 +202,21 @@ object AnsiTypeCoercion extends TypeCoercionBase {

// This type coercion system will allow implicit converting String type as other
// primitive types, in case of breaking too many existing Spark SQL queries.
-case (StringType, a: AtomicType) =>
+case (_: StringType, a: AtomicType) =>
Some(a)

// If the target type is any Numeric type, convert the String type as Double type.
-case (StringType, NumericType) =>
+case (_: StringType, NumericType) =>
Some(DoubleType)

// If the target type is any Decimal type, convert the String type as the default
// Decimal type.
-case (StringType, DecimalType) =>
+case (_: StringType, DecimalType) =>
Some(DecimalType.SYSTEM_DEFAULT)

// If the target type is any timestamp type, convert the String type as the default
// Timestamp type.
-case (StringType, AnyTimestampType) =>
+case (_: StringType, AnyTimestampType) =>
Some(AnyTimestampType.defaultConcreteType)

case (DateType, AnyTimestampType) =>