Skip to content

Conversation

@eejbyfeldt
Copy link
Contributor

What changes were proposed in this pull request?

In #35170 SPARK-37855 and #32301 SPARK-35194 introduced conditions for ExtractValues that can currently not be handled. The considtion is introduced after collectRootReferenceAndExtractValue and just removes these candidates. This is problematic since these expressions might have contained AttributeReference that needed to not do an incorrect aliasing. This fixes this family of bugs by moving the conditions into the function collectRootReferenceAndExtractValue.

Why are the changes needed?

The current code leads to IllegalStateException runtime failures.

Does this PR introduce any user-facing change?

Yes, fixes a bug.

How was this patch tested?

Existing and new unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 27, 2024
@eejbyfeldt
Copy link
Contributor Author

eejbyfeldt commented May 31, 2024

@cloud-fan @dongjoon-hyun @viirya @ulysses-you have been involved in earlier PRs on this code. Do you know who would be a good reviewer of this change?

Emil Ejbyfeldt added 2 commits June 24, 2024 14:48
In apache#35170 SPARK-37855 and apache#32301 SPARK-35194 introduced conditions for
ExtractValues that can currently not be handled. The considtion is
introduced after `collectRootReferenceAndExtractValue` and just removes
these candidates. This is problematic since these expressions might have
contained `AttributeReference` that needed to not do an incorrect
rewrite. This fixes these family of bugs by moving the conditions into
the function `collectRootReferenceAndExtractValue`.
.select(
GetStructField(GetStructField(CreateStruct(Seq($"id", $"employer")), 1), 0),
$"employer.id")
.analyze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I haven't touched this part of code for a while. Can you give a detailed explanation about how IllegalStateException can be thrown, with this test case, to help reviewers understand the bug?

Copy link
Contributor Author

@eejbyfeldt eejbyfeldt Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the test case with the code on master collectRootReferenceAndExtractValue will collect to candidates for aliasing $"employer.id" and GetStructField(GetStructField(CreateStruct(Seq($"id", $"employer")), 1), 0) and no root references. The second one gets filtered out due to this condition
https://github.com/apache/spark/blob/v3.5.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L253

Then the logic after this will only consider that we are accessing only $"employer.id" and will miss/be unaware that we are accessing $employer inside the other expression. This will cause it to perform the transformation/optimization incorrectly which leads to the IllegalStateException.

With the fix collectRootReferenceAndExtractValue will return $"employer.id" as a candidate and $id and $employer as root references. This means that when evaluating $"employer.id this time it will be discarded due to this filter: https://github.com/apache/spark/blob/v3.5.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala#L264

Does that help understand the issue?

@eejbyfeldt eejbyfeldt changed the title SPARK-48428: Fix IllegalStateException in NestedColumnAliasing [SPARK-48428][SQL]: Fix IllegalStateException in NestedColumnAliasing Jun 25, 2024
@cloud-fan
Copy link
Contributor

@eejbyfeldt thanks for the fix! Can you re-trigger the GA jobs? The failure seems flaky.

@eejbyfeldt
Copy link
Contributor Author

@eejbyfeldt thanks for the fix! Can you re-trigger the GA jobs? The failure seems flaky.

@cloud-fan done.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.5!

@cloud-fan cloud-fan closed this in b11608c Jun 27, 2024
cloud-fan pushed a commit that referenced this pull request Jun 27, 2024
### What changes were proposed in this pull request?

In #35170 SPARK-37855 and #32301 SPARK-35194 introduced conditions for ExtractValues that can currently not be handled. The considtion is introduced after `collectRootReferenceAndExtractValue` and just removes these candidates. This is problematic since these expressions might have contained `AttributeReference` that needed to not do an incorrect aliasing. This fixes this family of bugs by moving the conditions into the function `collectRootReferenceAndExtractValue`.

### Why are the changes needed?

The current code leads to `IllegalStateException` runtime failures.

### Does this PR introduce _any_ user-facing change?

Yes, fixes a bug.

### How was this patch tested?

Existing and new unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46756 from eejbyfeldt/SPARK-48428.

Authored-by: Emil Ejbyfeldt <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b11608c)
Signed-off-by: Wenchen Fan <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
### What changes were proposed in this pull request?

In apache#35170 SPARK-37855 and apache#32301 SPARK-35194 introduced conditions for ExtractValues that can currently not be handled. The considtion is introduced after `collectRootReferenceAndExtractValue` and just removes these candidates. This is problematic since these expressions might have contained `AttributeReference` that needed to not do an incorrect aliasing. This fixes this family of bugs by moving the conditions into the function `collectRootReferenceAndExtractValue`.

### Why are the changes needed?

The current code leads to `IllegalStateException` runtime failures.

### Does this PR introduce _any_ user-facing change?

Yes, fixes a bug.

### How was this patch tested?

Existing and new unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46756 from eejbyfeldt/SPARK-48428.

Authored-by: Emil Ejbyfeldt <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b11608c)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants