Closed
Changes from 1 commit
80 commits
b34544a
Implicit casting on collated expressions
mihailomilosevic2001 Mar 5, 2024
fdbfa44
Fix doc files
mihailomilosevic2001 Mar 5, 2024
ce9b027
Fix contains, startWith, endWith tests
mihailomilosevic2001 Mar 5, 2024
e537190
Fix imports
mihailomilosevic2001 Mar 5, 2024
b5a79c1
Fix docs and incorporate changes
mihailomilosevic2001 Mar 6, 2024
8321d0c
Fix tests in CollationSuite
mihailomilosevic2001 Mar 6, 2024
d178233
Add test and incorporate changes
mihailomilosevic2001 Mar 7, 2024
a4b9be7
Fix godlen files
mihailomilosevic2001 Mar 7, 2024
a6e7662
Incorporate StringType in findWiderCommonType
mihailomilosevic2001 Mar 8, 2024
e1d7ad5
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 8, 2024
b3b1356
Fix ArrayType(StringType, _) casting in findWiderCommonType
mihailomilosevic2001 Mar 11, 2024
7773d13
Fix type mismatch error
mihailomilosevic2001 Mar 11, 2024
198a728
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Mar 11, 2024
255b1ab
Incorporate changes and fix errors
mihailomilosevic2001 Mar 11, 2024
9ce417f
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 12, 2024
50f3aa2
Fix errors
mihailomilosevic2001 Mar 12, 2024
ca0c84d
Rework casting
mihailomilosevic2001 Mar 13, 2024
880a1b1
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 13, 2024
56d6c7c
Fix failing tests
mihailomilosevic2001 Mar 14, 2024
94e5259
Fix array cast errors
mihailomilosevic2001 Mar 14, 2024
ccb52ba
Fix additional errors
mihailomilosevic2001 Mar 14, 2024
9b1387b
Fix explicit collation search
mihailomilosevic2001 Mar 17, 2024
c9974e1
Fix scala style errors
mihailomilosevic2001 Mar 18, 2024
fca9a65
Add support for ImplicitCastInputTypes
mihailomilosevic2001 Mar 18, 2024
660d664
Fix accidental change in license header
mihailomilosevic2001 Mar 18, 2024
c8edd93
Fix null casting
mihailomilosevic2001 Mar 19, 2024
a91490b
Fix failing tests
mihailomilosevic2001 Mar 19, 2024
49a8d61
Move implicit casting when strings present
mihailomilosevic2001 Mar 19, 2024
4c4cd84
Fix unintentional changes
mihailomilosevic2001 Mar 19, 2024
66122a6
improve types.py
mihailomilosevic2001 Mar 20, 2024
50f46e4
Refactor code
mihailomilosevic2001 Mar 21, 2024
cc86a87
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 21, 2024
c01e80c
Fix imports and failing tests
mihailomilosevic2001 Mar 21, 2024
cc797a2
Disable casting of StructTypes
mihailomilosevic2001 Mar 21, 2024
5d001ee
Fix imports
mihailomilosevic2001 Mar 21, 2024
c68fc7d
Fix concat tests
mihailomilosevic2001 Mar 21, 2024
1c926ab
Fix unnecessary repetition
mihailomilosevic2001 Mar 21, 2024
dec39bf
Remove Elt test
mihailomilosevic2001 Mar 21, 2024
e808446
Remove tests for Repeat
mihailomilosevic2001 Mar 21, 2024
ca1a23a
Merge branch 'master' into SPARK-47210
mihailomilosevic2001 Mar 21, 2024
116931c
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Mar 22, 2024
af487a2
Fix failing tests
mihailomilosevic2001 Mar 22, 2024
4ba7055
Fix nullability for StringType->StringType
mihailomilosevic2001 Mar 22, 2024
e490e42
Improve comments and switch tests from E2E to unit tests
mihailomilosevic2001 Mar 24, 2024
00e88e7
Add new tests and remove compatibility test
mihailomilosevic2001 Mar 25, 2024
85b4d16
Fix conflict resolution mistake
mihailomilosevic2001 Mar 25, 2024
30f7225
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Mar 25, 2024
e89a354
Add indeterminate collation tests
mihailomilosevic2001 Mar 26, 2024
788dc06
Fix test
mihailomilosevic2001 Mar 26, 2024
75c0140
Block Alias on Indeterminate
mihailomilosevic2001 Mar 27, 2024
2918413
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailomilosevic2001 Mar 28, 2024
f6ed55a
Remove introduction of indeterminate collation
mihailomilosevic2001 Mar 28, 2024
98960c0
Fix import problem
mihailomilosevic2001 Mar 28, 2024
de623c8
Fix failing tests
mihailomilosevic2001 Mar 28, 2024
a92b4e1
Fix pyspark error
mihailomilosevic2001 Mar 28, 2024
f7f3011
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Mar 28, 2024
f67808e
Fix errors
mihailomilosevic2001 Mar 29, 2024
815ce42
Fix schema error
mihailomilosevic2001 Mar 29, 2024
7fca38a
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailomilosevic2001 Mar 29, 2024
b19b0eb
Fix collated tests
mihailomilosevic2001 Mar 29, 2024
a111f03
Add isExplicit flag
mihailomilosevic2001 Mar 29, 2024
55bdd9b
Fix import error
mihailomilosevic2001 Mar 29, 2024
a7228be
Fix imports in TypeCoercion
mihailomilosevic2001 Mar 31, 2024
27a72c6
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailomilosevic2001 Apr 1, 2024
18ada04
Add support for explicit propagation in arrays
mihailomilosevic2001 Apr 1, 2024
38670af
Fix tests to follow recent changes
mihailomilosevic2001 Apr 1, 2024
01d891e
Incorporate changes
mihailomilosevic2001 Apr 1, 2024
c5daf86
Fix error
mihailomilosevic2001 Apr 1, 2024
9ac5678
Change var to val in StringType
mihailomilosevic2001 Apr 1, 2024
0f1757d
Fix import style
mihailomilosevic2001 Apr 1, 2024
506c8c0
Revert explicit flag addition
mihailomilosevic2001 Apr 1, 2024
f743cf8
Narrow down expressions casting
mihailomilosevic2001 Apr 2, 2024
4f8fe1d
Incorporate minor changes
mihailomilosevic2001 Apr 2, 2024
52bf4dc
Incorporate changes
mihailomilosevic2001 Apr 2, 2024
7cbeafe
Special case expressions
mihailomilosevic2001 Apr 3, 2024
3e92e92
Return new line
mihailomilosevic2001 Apr 3, 2024
b23e106
Remove indentation cosmetic
mihailomilosevic2001 Apr 3, 2024
880ebed
Add more cosmetic changes
mihailomilosevic2001 Apr 3, 2024
f96ecd9
Incorporate changes
mihailomilosevic2001 Apr 3, 2024
e1e0cf4
Merge branch 'apache:master' into SPARK-47210
mihailomilosevic2001 Apr 3, 2024
Incorporate changes
mihailomilosevic2001 committed Apr 3, 2024
commit f96ecd9c45e4aad1c37ce08fbd50d7f113b5917c
@@ -21,27 +21,30 @@ import javax.annotation.Nullable

import scala.annotation.tailrec

import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType}
import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring}
import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType, haveSameType}
import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least}
import org.apache.spark.sql.errors.QueryCompilationErrors
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType}
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

object CollationTypeCasts extends TypeCoercionRule {
  override val transform: PartialFunction[Expression, Expression] = {
    case e if !e.childrenResolved => e

    case ifExpr: If =>
      ifExpr.withNewChildren(
        ifExpr.predicate +: collateToSingleType(Seq(ifExpr.trueValue, ifExpr.falseValue)))
    case caseWhenExpr: CaseWhen =>
      val newValues = collateToSingleType(
        caseWhenExpr.branches.map(b => b._2) ++ caseWhenExpr.elseValue)
      caseWhenExpr.withNewChildren(
        interleave(Seq.empty, caseWhenExpr.branches.map(b => b._1), newValues))
    case substrExpr: Substring =>
      // This case is necessary for changing Substring input to implicit collation
      substrExpr.withNewChildren(
        collateToSingleType(Seq(substrExpr.str)) :+ substrExpr.pos :+ substrExpr.len)

    case caseWhenExpr: CaseWhen if !haveSameType(caseWhenExpr.inputTypesForMerging) =>
      val outputStringType =
        getOutputCollation(caseWhenExpr.branches.map(_._2) ++ caseWhenExpr.elseValue)
      val newBranches = caseWhenExpr.branches.map { case (condition, value) =>
        (condition, castStringType(value, outputStringType).getOrElse(value))
      }
      val newElseValue =
        caseWhenExpr.elseValue.map(e => castStringType(e, outputStringType).getOrElse(e))
      CaseWhen(newBranches, newElseValue)

    case otherExpr @ (
      _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: Greatest | _: Least |
      _: Coalesce | _: BinaryExpression | _: ConcatWs) =>
@@ -67,7 +70,7 @@ object CollationTypeCasts extends TypeCoercionRule {
  def castStringType(expr: Expression, st: StringType): Option[Expression] =
    castStringType(expr.dataType, st).map { dt => Cast(expr, dt)}

  private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = {
  private def castStringType(inType: DataType, castType: StringType): Option[DataType] = {
    @Nullable val ret: DataType = inType match {
      case st: StringType if st.collationId != castType.collationId => castType
      case ArrayType(arrType, nullable) =>
Contributor


I disagree with special-casing the array type. The code looks broken: it assumes the children of the given expression can have both string type and array-of-string type, and then tries to find a common collation between the string-typed child and the array elements. That makes no sense without knowing the semantics of the given expression.

Contributor Author


A simple example is ConcatWs. It can take ArrayType(StringType, _) for the input strings and StringType for the separator. Which collation do we want in that case? We need to cast the ArrayType to the proper collation if the separator's collation is explicit.
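
To make the array case concrete, here is a minimal, self-contained sketch of the recursion under discussion. The types are toy stand-ins, not Spark's real `DataType` classes, and the body of the `ArrayType` branch is an assumption, since the diff above cuts off at that line.

```scala
// Toy stand-ins for Spark's type hierarchy (hypothetical, for illustration only).
sealed trait DataType
final case class StringType(collationId: Int) extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
case object IntegerType extends DataType

object CastSketch {
  // Returns Some(newType) only when a cast is actually needed, None otherwise.
  def castStringType(inType: DataType, castType: StringType): Option[DataType] =
    inType match {
      // A string with a different collation is replaced by the target type.
      case st: StringType if st.collationId != castType.collationId => Some(castType)
      // Assumed behaviour: recurse into the element type and rebuild the array.
      case ArrayType(elementType, containsNull) =>
        castStringType(elementType, castType).map(ArrayType(_, containsNull))
      // Anything else, including a string that already matches, is left alone.
      case _ => None
    }
}
```

In this model an array of strings is recast element-wise, while non-string types and already-matching strings produce no cast at all.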

Contributor


Then please match the ConcatWs expression explicitly to handle this case. What I disagree with is doing this blindly for all expressions.
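
One way this suggestion could look, sketched with toy expression classes rather than Spark's Catalyst API. All names below are hypothetical, and taking the separator's collation as the target is an assumption made for illustration; a real rule would also have to weigh explicit versus implicit collations.

```scala
// Toy model (hypothetical types, not Spark's): only ConcatWs gets the array special case.
sealed trait DataType
final case class StringType(collationId: Int) extends DataType
final case class ArrayType(elementType: DataType) extends DataType

sealed trait Expr { def dataType: DataType }
final case class Column(name: String, dataType: DataType) extends Expr
final case class Cast(child: Expr, dataType: DataType) extends Expr
final case class Upper(child: Expr) extends Expr {
  def dataType: DataType = child.dataType
}
final case class ConcatWs(separator: Expr, inputs: Seq[Expr]) extends Expr {
  def dataType: DataType = separator.dataType
}

object ConcatWsSketch {
  // Wrap an input in a Cast when its (element) collation differs from the target.
  private def withCollation(e: Expr, target: StringType): Expr = e.dataType match {
    case st: StringType if st != target => Cast(e, target)
    case ArrayType(st: StringType) if st != target => Cast(e, ArrayType(target))
    case _ => e
  }

  def coerce(e: Expr): Expr = e match {
    // The semantics are known here: inputs may be strings or arrays of strings.
    case ConcatWs(sep, inputs) =>
      sep.dataType match {
        case target: StringType => ConcatWs(sep, inputs.map(withCollation(_, target)))
        case _ => e
      }
    // Every other expression is returned untouched; no blind array handling.
    case other => other
  }
}
```

Because the array branch lives inside the `ConcatWs` case, an unrelated expression over an array of strings passes through unchanged.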

@@ -122,10 +125,4 @@ object CollationTypeCasts extends TypeCoercionRule {
}
}
}

  @tailrec
  final def interleave[A](base: Seq[A], a: Seq[A], b: Seq[A]): Seq[A] = a match {
    case elt :: aTail => interleave(base :+ elt, b, aTail)
    case _ => base ++ b
  }
}
@@ -114,7 +114,6 @@ case class Collate(child: Expression, collationName: String)
// scalastyle:on line.contains.tab
case class Collation(child: Expression)
  extends UnaryExpression with RuntimeReplaceable with ExpectsInputTypes {
  override def dataType: DataType = SQLConf.get.defaultStringType
  override protected def withNewChildInternal(newChild: Expression): Collation = copy(newChild)
  override def replacement: Expression = {
    val collationId = child.dataType.asInstanceOf[StringType].collationId