Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
local var
  • Loading branch information
kings129 committed Apr 19, 2023
commit 2f43ef41b6c6f3446db8fefb6cbe0176ed5b1eda
Original file line number Diff line number Diff line change
Expand Up @@ -114,9 +114,12 @@ object ExpressionEncoder {
If(IsNull(input), Literal.create(null, result.dataType), result)
}

val serializer = nullSafe(newSerializerInput, newSerializer)
val deserializer = nullSafe(newDeserializerInput, newDeserializer)

new ExpressionEncoder[Any](
nullSafe(newSerializerInput, newSerializer),
nullSafe(newDeserializerInput, newDeserializer),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's kind of we push down the null check to the children deserializers. Why is the serializer fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is intended to create a deserializer type newinstance(class scala.Tuple*) that can convert to a single null value. This behavior is the same as before the commit introduced the regression.
Regarding the serializer, in the new unit test added in this pull request, when the tuple is not null, named_struct is created for each element, and null is handled there.

if (isnull(input[0, scala.Tuple2, true])) null else named_struct(_1, if (isnull(input[0, scala.Tuple2, true]._1)) null else named_struct(a, if (input[0, scala.Tuple2, true]._1.isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(input[0, scala.Tuple2, true]._1, 0, a), StringType, ObjectType(class java.lang.String)), true, false, true), b, assertnotnull(validateexternaltype(getexternalrowfield(input[0, scala.Tuple2, true]._1, 1, b), IntegerType, ObjectType(class java.lang.Integer)).intValue)) AS _1#18, _2, if (isnull(input[0, scala.Tuple2, true]._2)) null else named_struct(a, if (input[0, scala.Tuple2, true]._2.isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(input[0, scala.Tuple2, true]._2, 0, a), StringType, ObjectType(class java.lang.String)), true, false, true), b, assertnotnull(validateexternaltype(getexternalrowfield(input[0, scala.Tuple2, true]._2, 1, b), IntegerType, ObjectType(class java.lang.Integer)).intValue)) AS _2#19)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan does my comment answer your question? PTAL, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks correct to me to add the null check for the children deserializers. But I don't quite understand why this PR removes the outermost null check. After looking at the code, I think it doesn't matter, as the outermost null check will be removed anyway: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala#L274

Since this is unrelated to this PR, let's not touch it. If you do want to fix it (adding null check and removing it later is useless), let's fix the serializer as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan, thanks for the explanation! You're right; it doesn't matter whether to keep the outermost null check. (null check for deserializer was also added in refactor commit)

I also prefer making minimal changes to fix the target issue. I added back the outermost null check for the deserializer.

serializer,
deserializer,
ClassTag(cls))
}

Expand Down