Commit eeef48f
[SPARK-37784][SQL] Correctly handle UDTs in CodeGenerator.addBufferedState()
### What changes were proposed in this pull request?
This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).
The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.
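To illustrate why buffering needs a copy, here is a minimal, self-contained sketch. It does not use Spark's row classes: `MutableRow`, `ReusingIterator`, and `BufferingDemo` are hypothetical stand-ins, and the only assumption carried over is that the input iterator reuses one mutable object for every row.

```scala
// Hypothetical stand-in for a mutable row object that an iterator reuses.
final class MutableRow(var values: Array[Int]) {
  // Returns an independent row backed by its own array.
  def copy(): MutableRow = new MutableRow(values.clone())
}

// Hands out the same MutableRow instance on every call to next(),
// overwriting its contents in place.
final class ReusingIterator(data: Seq[Array[Int]]) extends Iterator[MutableRow] {
  private val shared = new MutableRow(Array.empty[Int])
  private var index = 0

  def hasNext: Boolean = index < data.length

  def next(): MutableRow = {
    shared.values = data(index)
    index += 1
    shared
  }
}

object BufferingDemo {
  // Returns (value buffered by reference, value buffered by copy),
  // both read after the iterator has advanced to the second row.
  def run(): (Seq[Int], Seq[Int]) = {
    val input = Seq(Array(1, 2), Array(3, 4))

    // Buffer the first row by reference, then advance the iterator.
    val it1 = new ReusingIterator(input)
    val byReference = it1.next()
    it1.next()

    // Buffer the first row by copy, then advance the iterator.
    val it2 = new ReusingIterator(input)
    val byCopy = it2.next().copy()
    it2.next()

    (byReference.values.toSeq, byCopy.values.toSeq)
  }
}
```

In this sketch the value buffered by reference silently changes to the second row's contents once the iterator advances, while the copied value still holds the first row.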
The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.
This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.
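The idea behind the fix can be sketched with a toy type model. This is not Spark's code: the `DataType` hierarchy, `CopyRules`, and its method names below are hypothetical, simplified stand-ins, and the sketch only models the decision of whether a value needs copying, not the generated code itself.

```scala
// Hypothetical, simplified stand-ins for SQL data types.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
final case class ArrayType(element: DataType) extends DataType
final case class StructType(fields: List[DataType]) extends DataType
final case class MapType(key: DataType, value: DataType) extends DataType
// A user-defined type whose values are stored using an underlying SQL type.
final case class UserDefinedType(sqlType: DataType) extends DataType

object CopyRules {
  // Unwraps one level of UDT to its underlying type; other types are unchanged.
  def unwrap(dt: DataType): DataType = dt match {
    case UserDefinedType(underlying) => underlying
    case other => other
  }

  // Copy rule for the types named in the description above.
  private def needsCopyForPlainType(dt: DataType): Boolean = dt match {
    case StringType | _: ArrayType | _: StructType | _: MapType => true
    case _ => false
  }

  // Models the behavior described as buggy: a UDT hits the default branch.
  def needsCopyBeforeFix(dt: DataType): Boolean = needsCopyForPlainType(dt)

  // Models the fix: decide based on the UDT's underlying SQL type.
  def needsCopyAfterFix(dt: DataType): Boolean = needsCopyForPlainType(unwrap(dt))
}
```

With this model, `CopyRules.needsCopyBeforeFix(UserDefinedType(ArrayType(IntType)))` is `false`, whereas `needsCopyAfterFix` returns `true` for the same type. A UDT backed by a primitive type still needs no copy under either rule.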
### Why are the changes needed?
To fix a correctness issue: UDT values whose underlying SQL type requires copying were being buffered without a copy.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784
So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).
Closes apache#35066 from JoshRosen/SPARK-37784.
Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
Parent commit: 08fd501
1 file changed (1 addition, 1 deletion) under sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen