Commit 8b8ea60
[SPARK-47927][SQL] Fix nullability attribute in UDF decoder
### What changes were proposed in this pull request?
This PR fixes a correctness issue by moving the batch that resolves UDF decoders to after the `UpdateNullability` batch. The decoder is now derived from attributes whose nullability has already been updated, so it no longer treats a nullable input as non-nullable.
I think the issue has existed since apache#28645, when support for case class arguments in UDFs was added, so it should be present in all currently supported versions.
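The ordering problem can be illustrated with a toy model. This is only a sketch: `Attr`, `Plan`, `updateNullability`, and `resolveDecoder` are hypothetical stand-ins and not Spark classes or analyzer rules. It shows that a rule which snapshots nullability captures a stale value if it runs before the rule that updates it.

```scala
// Toy model only: none of these names exist in Spark.
final case class Attr(name: String, nullable: Boolean)
final case class Plan(attr: Attr, decoderAssumesNullable: Option[Boolean])

object BatchOrderSketch {
  // Stand-in for the nullability update: an outer join makes the attribute nullable.
  val updateNullability: Plan => Plan =
    p => p.copy(attr = p.attr.copy(nullable = true))

  // Stand-in for decoder resolution: it snapshots the nullability it sees at that moment.
  val resolveDecoder: Plan => Plan =
    p => p.copy(decoderAssumesNullable = Some(p.attr.nullable))

  // Apply the rules in order, like batches that each run once.
  def run(batches: Seq[Plan => Plan], plan: Plan): Plan =
    batches.foldLeft(plan)((p, rule) => rule(p))

  def main(args: Array[String]): Unit = {
    val start = Plan(Attr("value", nullable = false), None)

    // Old order: the decoder is resolved before nullability is updated.
    val oldOrder = run(Seq(resolveDecoder, updateNullability), start)
    // New order: nullability is updated first, then the decoder is resolved.
    val newOrder = run(Seq(updateNullability, resolveDecoder), start)

    assert(oldOrder.decoderAssumesNullable.contains(false)) // stale snapshot
    assert(newOrder.decoderAssumesNullable.contains(true))  // up-to-date snapshot
  }
}
```

In the old order the plan's attribute ends up nullable while the decoder still assumes it is not, which corresponds to the mismatch described above.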
### Why are the changes needed?
Currently the following code
```
scala> val ds1 = Seq(1).toDS()
| val ds2 = Seq[Int]().toDS()
| val f = udf[Tuple1[Option[Int]],Tuple1[Option[Int]]](identity)
| ds1.join(ds2, ds1("value") === ds2("value"), "left_outer").select(f(struct(ds2("value")))).collect()
val ds1: org.apache.spark.sql.Dataset[Int] = [value: int]
val ds2: org.apache.spark.sql.Dataset[Int] = [value: int]
val f: org.apache.spark.sql.expressions.UserDefinedFunction = SparkUserDefinedFunction($Lambda$2481/0x00007f7f50961f086b1a2c9f,StructType(StructField(_1,IntegerType,true)),List(Some(class[_1[0]: int])),Some(class[_1[0]: int]),None,true,true)
val res0: Array[org.apache.spark.sql.Row] = Array([[0]])
```
results in a row containing `0`. This is incorrect, as the value should be `null`. Removing the UDF call
```
scala> ds1.join(ds2, ds1("value") === ds2("value"), "left_outer").select(struct(ds2("value"))).collect()
val res1: Array[org.apache.spark.sql.Row] = Array([[null]])
```
gives the correct value.
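The expected behavior could be checked outside the REPL with something like the sketch below. It is not the unit test added by this PR. The object name `Spark47927Check`, the app name, and the local master setting are assumptions made for illustration, and it needs `spark-sql` on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, udf}

// Hypothetical standalone check; names and settings are illustrative only.
object Spark47927Check {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-47927-check")
      .getOrCreate()
    import spark.implicits._

    try {
      val ds1 = Seq(1).toDS()
      val ds2 = Seq[Int]().toDS()
      // Same UDF as in the description: identity over a struct with an optional int.
      val f = udf[Tuple1[Option[Int]], Tuple1[Option[Int]]](identity)

      val rows = ds1
        .join(ds2, ds1("value") === ds2("value"), "left_outer")
        .select(f(struct(ds2("value"))))
        .collect()

      // The left outer join has no match, so the struct field should be null.
      val inner = rows.head.getStruct(0)
      assert(inner.isNullAt(0), s"expected a null field, got $inner")
    } finally {
      spark.stop()
    }
  }
}
```

On a build that has the bug described above, the assertion would be expected to fail because the field comes back as `0`.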
### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue when using ScalaUDFs.
### How was this patch tested?
Existing and new unit tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes apache#46156 from eejbyfeldt/SPARK-47927.
Authored-by: Emil Ejbyfeldt <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 76ce6b0 commit 8b8ea60
File tree
2 files changed, +13 -2 lines changed
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis
- sql/core/src/test/scala/org/apache/spark/sql
Lines changed: 2 additions & 2 deletions
Hunk around lines 339-349: two lines removed at original lines 345-346 and two lines added at new lines 342-343.
Lines changed: 11 additions & 0 deletions
Hunk around lines 1183-1197: 11 lines added at new lines 1186-1196.