Skip to content

Conversation

@liuzqt
Copy link
Contributor

@liuzqt liuzqt commented Aug 29, 2024

What changes were proposed in this pull request?

Remove cleanupResource() fromEmptyRelationExec

Why are the changes needed?

This bug was introduced in #46830 : cleanupResources might be executed on the executor where logical is null.

After revisiting cleanupResources relevant code paths, I think EmptyRelationExec doesn't need to anything here.

  • for driver side cleanup, we have this code path to cleanup each AQE query stage.
  • for executor side cleanup, so far we only have SortMergeJoinExec which invoke cleanupResource during its execution, so upon the time when EmptyRelationExec is created, it's guaranteed necessary cleanup has been done.

After all, EmptyRelationExec is only a never-execute wrapper for materialized physical query stages, it should not be responsible for any cleanup invocation.

So I'm removing cleanupResources implementation from EmptyRelationExec.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

New unit test.

Was this patch authored or co-authored using generative AI tooling?

NO

@github-actions github-actions bot added the SQL label Aug 29, 2024
@liuzqt
Copy link
Contributor Author

liuzqt commented Aug 29, 2024

@LuciferYang Thank you for the review! updated

@liuzqt liuzqt requested a review from LuciferYang August 29, 2024 15:34
@liuzqt
Copy link
Contributor Author

liuzqt commented Aug 29, 2024

I would like to hold on this PR a bit, want to dig more into when cleanupResources should be called and whether we should somehow cleanup this query stage in executor

@liuzqt
Copy link
Contributor Author

liuzqt commented Aug 30, 2024

After revisiting cleanupResources relevant code paths, I think EmptyRelationExec doesn't need to anything here.

  • for driver side cleanup, we have this code path to cleanup each AQE query stage.
  • for executor side cleanup, so far we only have SortMergeJoinExec which invoke cleanupResource during its execution, so upon the time when EmptyRelationExec is created, it's guaranteed necessary cleanup has been done.

After all, EmptyRelationExec is only a never-execute wrapper for materialized physical query stages, it should not be responsible for any cleanup invocation.

So I'm removing cleanupResources implementation from EmptyRelationExec.

@LuciferYang
Copy link
Contributor

[error] /home/runner/work/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/EmptyRelationExec.scala:25:48: Unused import
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=org.apache.spark.sql.execution
[error] import org.apache.spark.sql.execution.adaptive.LogicalQueryStage
[error]                                                ^

@LuciferYang
Copy link
Contributor

After revisiting cleanupResources relevant code paths, I think EmptyRelationExec doesn't need to anything here.

  • for driver side cleanup, we have this code path to cleanup each AQE query stage.
  • for executor side cleanup, so far we only have SortMergeJoinExec which invoke cleanupResource during its execution, so upon the time when EmptyRelationExec is created, it's guaranteed necessary cleanup has been done.

After all, EmptyRelationExec is only a never-execute wrapper for materialized physical query stages, it should not be responsible for any cleanup invocation.

So I'm removing cleanupResources implementation from EmptyRelationExec.

Could you update the title and description of the PR to make them more aligned with the current modifications?

@liuzqt liuzqt changed the title [SPARK-49460][SQL] Fix NPE in EmptyRelationExec [SPARK-49460][SQL] Remove cleanupResource() from EmptyRelationExec Aug 30, 2024
@liuzqt
Copy link
Contributor Author

liuzqt commented Aug 31, 2024

@LuciferYang Updated. Could you pls help merge this PR? Thanks!

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@LuciferYang
Copy link
Contributor

Merged into master. Thanks @liuzqt

@LuciferYang
Copy link
Contributor

@liuzqt I would like to confirm, does this issue really exist in 3.5.x? It seems that EmptyRelationExec does not exist in branch-3.5.

@liuzqt
Copy link
Contributor Author

liuzqt commented Aug 31, 2024

@LuciferYang sorry my bad, it doesn't exist in 3.5, so merging into master should be enough. I also updated the Jira ticket. Thank you for the reivew!

@LuciferYang
Copy link
Contributor

@LuciferYang sorry my bad, it doesn't exist in 3.5, so merging into master should be enough. I also updated the Jira ticket. Thank you for the reivew!

Thanks for your confirmation

HyukjinKwon pushed a commit that referenced this pull request Sep 21, 2024
### What changes were proposed in this pull request?

Fixed potential NPE risk in `EmptyRelationExec.logical`

### Why are the changes needed?

This is a follow up for #47931, I've checked other callsites of `EmptyRelationExec.logical`, which we can not assure it's driver-only. So we should fix those potential risks.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Existing UT

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #48191 from liuzqt/SPARK-49460.

Authored-by: Ziqi Liu <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants