[SPARK-24734][SQL] Fix type coercions and nullabilities of nested data types of some functions. #21704
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression { | |
| } | ||
| } | ||
| - override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType) | ||
| + override def dataType: DataType = { | ||
| +   val dataTypes = children.map(_.dataType) | ||
| +   dataTypes.headOption.map { | ||
| +     case ArrayType(et, _) => | ||
| +       ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull)) | ||
| +     case dt => dt | ||
| +   }.getOrElse(StringType) | ||
| + } | ||
Member
Can't we handle this case in type coercion (analysis phase)?
Member
Author
Actually, E.g.,
Member
Author
Added a test to show the wrong nullability.
Member
Aha, I see. But, I just have a hunch that
Member
Author
Hmm, that also makes sense. I'm not sure we can remove the simplification, though. cc @gatorsmile @cloud-fan
Member
Author
Yeah,
Contributor
This can work, but my point is we should not add the cast to change
Contributor
Oh, see. In that case, it would be nice to introduce a method that will resolve the output DataType and merges
Contributor
SGTM
Member
+1 for the @mn-mikke idea
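To make the nullability problem in this thread concrete, here is a minimal sketch of the old and new `dataType` logic from the diff. It uses simplified stand-in types defined inside the snippet, not Spark's `org.apache.spark.sql.types` classes, and the names `oldDataType` and `newDataType` are hypothetical. It also uses a pattern match where the diff uses `asInstanceOf`, so it is an illustration of the idea rather than the PR's code.

```scala
// Simplified stand-ins for Spark's type classes (assumption: not the real API).
object ConcatNullabilitySketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  // Before the change: the first child's type is taken verbatim,
  // so its containsNull flag wins even if later children allow nulls.
  def oldDataType(childTypes: Seq[DataType]): DataType =
    childTypes.headOption.getOrElse(StringType)

  // After the change: for arrays, containsNull is true if any child array
  // may contain nulls. Non-array types pass through unchanged.
  def newDataType(childTypes: Seq[DataType]): DataType =
    childTypes.headOption.map {
      case ArrayType(et, _) =>
        ArrayType(et, childTypes.exists {
          case ArrayType(_, containsNull) => containsNull
          case _ => false
        })
      case dt => dt
    }.getOrElse(StringType)
}
```

For children typed `ArrayType(IntegerType, false)` and `ArrayType(IntegerType, true)`, `oldDataType` reports `containsNull = false` while `newDataType` reports `containsNull = true`.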
| lazy val javaType: String = CodeGenerator.javaType(dataType) | ||
do we support array of array in concat?
Yes, it should work (see a test for it). Did we miss anything?
then shall we fix the `containsNull` for the inner array?
yeah, + `valueContainsNull` for `MapType` and `nullable` for `StructField`
I guess the inner nullabilities are coerced during type-coercion? If the inner nullabilities are different, type coercion adds casts and they will remain.
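The nested flags mentioned above (`containsNull`, `valueContainsNull`, and `nullable` on `StructField`) could in principle be merged recursively. The following is a sketch only, under the assumption that two types with the same shape are combined by OR-ing each flag at every nesting level. The types are simplified stand-ins declared in the snippet and `mergeNullability` is a hypothetical helper, not a method from the PR or from Spark.

```scala
// Simplified stand-ins for Spark's type classes (assumption: not the real API).
object MergeNullabilitySketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
  final case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
    extends DataType
  final case class StructField(name: String, dataType: DataType, nullable: Boolean)
  final case class StructType(fields: Seq[StructField]) extends DataType

  // Hypothetical helper: merges two types of the same shape, OR-ing every
  // nullability flag on the way down. Returns None when the shapes differ.
  def mergeNullability(left: DataType, right: DataType): Option[DataType] =
    (left, right) match {
      case (ArrayType(le, ln), ArrayType(re, rn)) =>
        mergeNullability(le, re).map(ArrayType(_, ln || rn))
      case (MapType(lk, lv, ln), MapType(rk, rv, rn)) =>
        for {
          k <- mergeNullability(lk, rk)
          v <- mergeNullability(lv, rv)
        } yield MapType(k, v, ln || rn)
      case (StructType(lf), StructType(rf)) if lf.length == rf.length =>
        val merged = lf.zip(rf).map { case (l, r) =>
          if (l.name == r.name)
            mergeNullability(l.dataType, r.dataType)
              .map(StructField(l.name, _, l.nullable || r.nullable))
          else None
        }
        if (merged.forall(_.isDefined)) Some(StructType(merged.flatten)) else None
      case (l, r) if l == r => Some(l)
      case _ => None
    }
}
```

With this approach the result is nullable at a given level whenever either input is, so no information about possible nulls is lost in inner arrays, map values, or struct fields.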
@ueshin For `Concat`, `Coalesce`, etc. that seems to be the case, since a coercion rule is executed if there is any nullability difference on any level of nesting. But it's not the case for the `CaseWhenCoercion` rule, since the `sameType` method is used for comparison.
I'm wondering: if the goal is to avoid generation of extra `Cast` expressions, shouldn't other coercion rules utilize the `sameType` method as well? Let's assume that the result of `concat` is subsequently used by `flatten`; wouldn't it lead to generation of extra null safe checks as mentioned here?
Please ignore the part of my previous comment regarding the `flatten` function. The output data type of `concat`, etc. will be the same regardless of what resolves the `null` flags.
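The distinction drawn in this exchange can be illustrated with a small sketch, assuming that `sameType` compares two types while ignoring nullability flags. The types below are simplified stand-ins declared in the snippet, and `sameTypeIgnoringNullability` is a hypothetical name rather than Spark's method.

```scala
// Simplified stand-ins for Spark's type classes (assumption: not the real API).
object SameTypeSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  // Hypothetical comparison: same shape and element types at every level,
  // with the containsNull flags ignored.
  def sameTypeIgnoringNullability(left: DataType, right: DataType): Boolean =
    (left, right) match {
      case (ArrayType(le, _), ArrayType(re, _)) => sameTypeIgnoringNullability(le, re)
      case (l, r) => l == r
    }
}
```

Two array-of-array types that differ only in the inner `containsNull` flag are unequal under `==` but equal under this comparison. A coercion rule that checks plain equality would therefore see a mismatch and add a cast, while one that uses a nullability-insensitive comparison would not.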