-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-21247][SQL] Type comparison should respect case-sensitive SQL conf #18460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
037120b
cded719
180e1b3
2343d97
8fa884e
b2d1338
c72aa18
67a037c
52d19d3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,6 +17,7 @@ | |
|
|
||
| package org.apache.spark.sql.catalyst.analysis | ||
|
|
||
| import java.util.Locale | ||
| import javax.annotation.Nullable | ||
|
|
||
| import scala.annotation.tailrec | ||
|
|
@@ -100,6 +101,17 @@ object TypeCoercion { | |
| case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) => | ||
| Some(TimestampType) | ||
|
|
||
| case (t1 @ StructType(fields1), t2 @ StructType(fields2)) if t1.sameType(t2) => | ||
| Some(StructType(fields1.zip(fields2).map { case (f1, f2) => | ||
| // Since `t1.sameType(t2)` is true, two StructTypes have the same DataType | ||
| // except `name` (in case of `spark.sql.caseSensitive=false`) and `nullable`. | ||
| // - Different names: use a lower case name because findTightestCommonType is commutative. | ||
| // - Different nullabilities: `nullable` is true iff one of them is nullable. | ||
| val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT) | ||
| val dataType = findTightestCommonType(f1.dataType, f2.dataType).get | ||
| StructField(name, dataType, nullable = f1.nullable || f2.nullable) | ||
|
||
| })) | ||
|
|
||
| case _ => None | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,6 +26,7 @@ import org.json4s.jackson.JsonMethods._ | |
|
|
||
| import org.apache.spark.annotation.InterfaceStability | ||
| import org.apache.spark.sql.catalyst.expressions.Expression | ||
| import org.apache.spark.sql.internal.SQLConf | ||
| import org.apache.spark.util.Utils | ||
|
|
||
| /** | ||
|
|
@@ -80,7 +81,11 @@ abstract class DataType extends AbstractDataType { | |
| * (`StructField.nullable`, `ArrayType.containsNull`, and `MapType.valueContainsNull`). | ||
| */ | ||
| private[spark] def sameType(other: DataType): Boolean = | ||
| DataType.equalsIgnoreNullability(this, other) | ||
| if (SQLConf.get.caseSensitiveAnalysis) { | ||
| DataType.equalsIgnoreNullability(this, other) | ||
| } else { | ||
| DataType.equalsIgnoreCaseAndNullability(this, other) | ||
|
||
| } | ||
|
|
||
| /** | ||
| * Returns the same data type but set all nullability fields are true | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the output nested column name is lower case? Is Hive behaving like this?
In addition, could you add one more test and check whether we also respect case sensitivity conf when we resolve the queries that contain the nested column in the references?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Hive does like the following.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a test case, does
nested column in the referencesmeanWHEREclause?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do both
tab.Col1.aandtab.Col1.Awork well when case sensitivity is off?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When case sensitive is off, Spark considers them in lower cases.
For example, in the test case, we need table name
struct1andstruct2. In case ofa = A, it raises ambiquous column exceptions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, does the case sensitivity conf works in nest column name resolution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Hive, your example works like the following.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I thought so. Let me check that again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It works. I updated the test cases.