Skip to content

Conversation

@JoshRosen
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.

The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.

This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.

Why are the changes needed?

Fix a correctness issue.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784

So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).

@JoshRosen JoshRosen changed the title [SPARK-37784] Correctly handle UDTs in CodeGenerator.addBufferedState() [SPARK-37784][SQL] Correctly handle UDTs in CodeGenerator.addBufferedState() Dec 30, 2021
@github-actions github-actions bot added the SQL label Dec 30, 2021
@JoshRosen
Copy link
Contributor Author

Please let me know if you have suggestions for good ways to write a regression test for this bug. So far I've been unable to adapt my existing reproduction into something which fails in CI.

Given enough time, I might be able to contrive a failing regression test by manually instantiating a SortMergeJoinExec operator and controlling its input iterators such that the non-copied values are mutated when the iterator advances (I'd use the SparkPlanTest helpers for this).

OTOH this particular helper function changes very infrequently, so I think the risk of future regression might be small enough that it might be okay to forgo writing the more complicated test. If anyone has strong opinions here then please let me know.


I'm now curious about whether there could be other similar UDT-related bugs in our code generation. I plan to search through the code for all other places where we generate copy() / clone() logic to check whether they properly handle UDTs.

@JoshRosen
Copy link
Contributor Author

I'm going to merge this to master, branch-3.2, branch-3.1, and branch-3.0.

@JoshRosen JoshRosen closed this in eeef48f Jan 4, 2022
JoshRosen added a commit that referenced this pull request Jan 4, 2022
…State()

### What changes were proposed in this pull request?

This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.

The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.

This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784

So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).

Closes #35066 from JoshRosen/SPARK-37784.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit eeef48f)
Signed-off-by: Josh Rosen <[email protected]>
JoshRosen added a commit that referenced this pull request Jan 4, 2022
…State()

### What changes were proposed in this pull request?

This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.

The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.

This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784

So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).

Closes #35066 from JoshRosen/SPARK-37784.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit eeef48f)
Signed-off-by: Josh Rosen <[email protected]>
JoshRosen added a commit that referenced this pull request Jan 4, 2022
…State()

### What changes were proposed in this pull request?

This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.

The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.

This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784

So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).

Closes #35066 from JoshRosen/SPARK-37784.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit eeef48f)
Signed-off-by: Josh Rosen <[email protected]>
@JoshRosen JoshRosen deleted the SPARK-37784 branch January 4, 2022 19:05
@dongjoon-hyun
Copy link
Member

Thank you!

fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
…State()

### What changes were proposed in this pull request?

This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.

The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.

This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784

So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).

Closes apache#35066 from JoshRosen/SPARK-37784.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit eeef48f)
Signed-off-by: Josh Rosen <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…State()

### What changes were proposed in this pull request?

This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.

The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.

This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784

So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).

Closes apache#35066 from JoshRosen/SPARK-37784.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit eeef48f)
Signed-off-by: Josh Rosen <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…State()

### What changes were proposed in this pull request?

This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.

The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.

This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784

So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).

Closes apache#35066 from JoshRosen/SPARK-37784.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit eeef48f)
Signed-off-by: Josh Rosen <[email protected]>
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…State()

### What changes were proposed in this pull request?

This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

The addBufferedState() method generates code for buffering values that come from a row in an operator's input iterator, performing any necessary copying so that the buffered values remain correct after the input iterator advances to the next row.

The current logic does not correctly handle UDTs: these fall through to the match statement's default branch, causing UDT values to be buffered without copying. This is problematic if the UDT's underlying SQL type is an array, map, struct, or string type (since those types require copying). Failing to copy values can lead to correctness issues or crashes.

This patch's fix is simple: when the dataType is a UDT, use its underlying sqlType for determining whether values need to be copied. I used an existing helper function to perform this type unwrapping.

### Why are the changes needed?

Fix a correctness issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually tested this change by re-running a workload which failed with a segfault prior to this patch. See JIRA for more details: https://issues.apache.org/jira/browse/SPARK-37784

So far I have been unable to come up with a CI-runnable regression test which would have failed prior to this change (my only working reproduction runs in a pre-production environment and does not fail in my development environment).

Closes apache#35066 from JoshRosen/SPARK-37784.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit eeef48f)
Signed-off-by: Josh Rosen <[email protected]>
(cherry picked from commit 45b7b7e)
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants