Closed
address comments
gengliangwang committed Feb 20, 2021
commit ae5caa2b824979db3a877561885d2624c90fb601
@@ -55,7 +55,7 @@ import org.apache.spark.sql.types._
   * the following operators. The data type of the operator is the tightest common precedent
   * data type.
   * * In
-  * * Except(odd)
+  * * Except
   * * Intersect
   * * Greatest
   * * Least
@@ -107,19 +107,6 @@ abstract class TypeCoercionBase {
    case _ => None
  }
Member

Could we move findCommonTypeDifferentOnlyInNullFlags into object TypeCoercion?

Member

hasStringType is only used for TypeCoercion, too?

Member Author

Yes, let me move them


- /**
-  * The method finds a common type for data types that differ only in nullable flags, including
-  * `nullable`, `containsNull` of [[ArrayType]] and `valueContainsNull` of [[MapType]].
-  * If the input types are different besides nullable flags, None is returned.
-  */
- def findCommonTypeDifferentOnlyInNullFlags(t1: DataType, t2: DataType): Option[DataType] = {
-   if (t1 == t2) {
-     Some(t1)
-   } else {
-     findTypeForComplex(t1, t2, findCommonTypeDifferentOnlyInNullFlags)
-   }
- }
-
  def findCommonTypeDifferentOnlyInNullFlags(types: Seq[DataType]): Option[DataType] = {
    if (types.isEmpty) {
      None
@@ -170,17 +157,6 @@ abstract class TypeCoercionBase {
    })
  }

-
- /**
-  * Whether the data type contains StringType.
-  */
- def hasStringType(dt: DataType): Boolean = dt match {
-   case StringType => true
-   case ArrayType(et, _) => hasStringType(et)
-   // Add StructType if we support string promotion for struct fields in the future.
-   case _ => false
- }
-
  /**
   * Check whether the given types are equal ignoring nullable, containsNull and valueContainsNull.
   */
@@ -202,30 +178,32 @@ abstract class TypeCoercionBase {
  }

  /**
-  * Widens numeric types and converts strings to numbers when appropriate.
+  * Widens the data types of the children of Union/Except/Intersect.
+  * 1. When ANSI mode is off:
+  * Loosely based on rules from "Hadoop: The Definitive Guide" 2nd edition, by Tom White
   *
-  * Loosely based on rules from "Hadoop: The Definitive Guide" 2nd edition, by Tom White
+  * The implicit conversion rules can be summarized as follows:
+  * - Any integral numeric type can be implicitly converted to a wider type.
+  * - All the integral numeric types, FLOAT, and (perhaps surprisingly) STRING can be
+  *   implicitly converted to DOUBLE.
+  * - TINYINT, SMALLINT, and INT can all be converted to FLOAT.
+  * - BOOLEAN types cannot be converted to any other type.
+  * - Any integral numeric type can be implicitly converted to decimal type.
+  * - two different decimal types will be converted into a wider decimal type for both of them.
+  * - decimal type will be converted into double if there float or double together with it.
   *
-  * The implicit conversion rules can be summarized as follows:
-  * - Any integral numeric type can be implicitly converted to a wider type.
-  * - All the integral numeric types, FLOAT, and (perhaps surprisingly) STRING can be implicitly
-  *   converted to DOUBLE.
-  * - TINYINT, SMALLINT, and INT can all be converted to FLOAT.
-  * - BOOLEAN types cannot be converted to any other type.
-  * - Any integral numeric type can be implicitly converted to decimal type.
-  * - two different decimal types will be converted into a wider decimal type for both of them.
-  * - decimal type will be converted into double if there float or double together with it.
+  * All types when UNION-ed with strings will be promoted to
+  * strings. Other string conversions are handled by PromoteStrings.
   *
-  * Additionally when ANSI mode is off, all types when UNION-ed with strings will be promoted to
-  * strings. Other string conversions are handled by PromoteStrings.
+  * Widening types might result in loss of precision in the following cases:
+  * - IntegerType to FloatType
+  * - LongType to FloatType
+  * - LongType to DoubleType
+  * - DecimalType to Double
   *
-  * Widening types might result in loss of precision in the following cases:
-  * - IntegerType to FloatType
-  * - LongType to FloatType
-  * - LongType to DoubleType
-  * - DecimalType to Double
-  *
-  * This rule is only applied to Union/Except/Intersect
+  * 2. When ANSI mode is on:
+  * The implicit conversion is determined by the closest common data type from the precedent
+  * lists from left and right child. See the comments of Object `AnsiTypeCoercion` for details.
   */
  object WidenSetOperationTypes extends TypeCoercionRule {
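
The four lossy widenings listed in the doc comment come from the significand widths of IEEE 754 floats and doubles, not from anything specific to Spark. The sketch below is plain Scala with no Spark dependency; the object name and value names are hypothetical and exist only for illustration. It round-trips the smallest positive integers that do not survive each conversion, plus one decimal with more digits than a double can hold.

```scala
// Illustrative only: plain Scala, not Spark code. All names are hypothetical.
object WideningPrecisionLoss {
  // Float has a 24-bit significand, so 2^24 + 1 is not representable.
  val smallInt: Int = 16777217                           // 2^24 + 1
  val intViaFloat: Int = smallInt.toFloat.toInt          // rounds to 2^24

  // The same 24-bit limit applies when a Long is widened to Float.
  val longViaFloat: Long = 16777217L.toFloat.toLong      // rounds to 2^24

  // Double has a 53-bit significand, so 2^53 + 1 is not representable.
  val bigLong: Long = 9007199254740993L                  // 2^53 + 1
  val longViaDouble: Long = bigLong.toDouble.toLong      // rounds to 2^53

  // A decimal with 23 significant digits cannot round-trip through Double.
  val dec: BigDecimal = BigDecimal("12345678901234567890.123")
  val decimalSurvives: Boolean = BigDecimal(dec.toDouble) == dec

  def main(args: Array[String]): Unit = {
    println(s"Int -> Float:      $smallInt -> $intViaFloat")
    println(s"Long -> Float:     16777217 -> $longViaFloat")
    println(s"Long -> Double:    $bigLong -> $longViaDouble")
    println(s"Decimal -> Double: round trip exact? $decimalSurvives")
  }
}
```
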

@@ -1044,6 +1022,29 @@ object TypeCoercion extends TypeCoercionBase {
    Option(ret)
  }

+ /**
+  * The method finds a common type for data types that differ only in nullable flags, including
+  * `nullable`, `containsNull` of [[ArrayType]] and `valueContainsNull` of [[MapType]].
+  * If the input types are different besides nullable flags, None is returned.
+  */
+ def findCommonTypeDifferentOnlyInNullFlags(t1: DataType, t2: DataType): Option[DataType] = {
+   if (t1 == t2) {
+     Some(t1)
+   } else {
+     findTypeForComplex(t1, t2, findCommonTypeDifferentOnlyInNullFlags)
+   }
+ }
+
+ /**
+  * Whether the data type contains StringType.
+  */
+ def hasStringType(dt: DataType): Boolean = dt match {
+   case StringType => true
+   case ArrayType(et, _) => hasStringType(et)
+   // Add StructType if we support string promotion for struct fields in the future.
+   case _ => false
+ }
+
  /**
   * Promotes strings that appear in arithmetic expressions.
   */
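
To make the behaviour of the two relocated helpers concrete, here is a minimal sketch on a toy type hierarchy. It does not use Spark's `DataType` classes or `findTypeForComplex`: the `DT`, `ArrayT`, and `MapT` types and both function names are hypothetical stand-ins. Merging the null flags with a logical OR is an assumption made for this sketch, and struct types are left out entirely.

```scala
// Toy stand-ins for Spark's DataType hierarchy; every name here is hypothetical.
object NullFlagSketch {
  sealed trait DT
  case object StringT extends DT
  case object IntT extends DT
  final case class ArrayT(element: DT, containsNull: Boolean) extends DT
  final case class MapT(key: DT, value: DT, valueContainsNull: Boolean) extends DT

  // Returns a common type when t1 and t2 differ only in null flags, else None.
  // Assumption: flags are merged with logical OR (the more permissive side wins).
  def commonIgnoringNullFlags(t1: DT, t2: DT): Option[DT] =
    if (t1 == t2) Some(t1)
    else (t1, t2) match {
      case (ArrayT(e1, n1), ArrayT(e2, n2)) =>
        commonIgnoringNullFlags(e1, e2).map(ArrayT(_, n1 || n2))
      case (MapT(k1, v1, n1), MapT(k2, v2, n2)) =>
        for {
          k <- commonIgnoringNullFlags(k1, k2)
          v <- commonIgnoringNullFlags(v1, v2)
        } yield MapT(k, v, n1 || n2)
      case _ => None
    }

  // Same shape as hasStringType in the diff: only strings and arrays are inspected.
  def hasString(dt: DT): Boolean = dt match {
    case StringT => true
    case ArrayT(et, _) => hasString(et)
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    println(commonIgnoringNullFlags(ArrayT(IntT, false), ArrayT(IntT, true)))
    println(commonIgnoringNullFlags(ArrayT(IntT, false), ArrayT(StringT, false)))
    println(hasString(ArrayT(ArrayT(StringT, true), false)))
  }
}
```

In this sketch `hasString` returns `false` for a map of strings, because, like `hasStringType` in the diff, it only matches the string type itself and array element types.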