@liuzqt liuzqt commented May 31, 2024

What changes were proposed in this pull request?

Add a dedicated node to represent an empty relation (AQE only, i.e., in AQEPropagateEmptyRelation.scala).

  • logical.EmptyRelation
  • execution.EmptyRelationExec

Both are leaf nodes and store the eliminated logical plan as a field.

To display the plan in the Spark UI, I extended SparkPlanInfo to support a mix of logical and physical plans.

Spark UI

[Screenshot: Spark UI, 2024-06-07]

String representation

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   EmptyRelation [plan_id=260]
   +- Join Inner, (key#3 = a#23)
      :- LogicalQueryStage SerializeFromObject [knownnotnull(assertnotnull(input[0, TestData, true])).key AS key#3, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, TestData, true])).value, true, false, true) AS value#4], ShuffleQueryStage 0
      +- LogicalQueryStage Filter (isnotnull(b#24) AND (b#24 = 1)), ShuffleQueryStage 1
+- == Initial Plan ==
   SortMergeJoin [key#3], [a#23], Inner
   :- Sort [key#3 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(key#3, 200), ENSURE_REQUIREMENTS, [plan_id=204]
   :   ...

Why are the changes needed?

Currently, empty relation propagation replaces the eliminated plan with a LocalTableScan, which loses the information about the original query plan and makes the result less human-readable. The idea is to create a dedicated EmptyRelation node that is a leaf node but wraps the original query plan inside.
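The difference between the two replacements can be illustrated with a toy plan tree. This is only a sketch: `Plan`, `Scan`, `Join`, `LocalTableScan`, `EmptyRelation`, and `treeString` below are hypothetical stand-ins defined for the example, not Spark's classes, and the printed layout only loosely imitates the string representation shown above.

```scala
// Toy model only: these are NOT Spark's classes, just enough structure to show the idea.
object EmptyRelationSketch {
  sealed trait Plan {
    def name: String
    def children: Seq[Plan]
  }

  final case class Scan(table: String) extends Plan {
    val name: String = s"Scan $table"
    val children: Seq[Plan] = Nil
  }

  final case class Join(left: Plan, right: Plan) extends Plan {
    val name: String = "Join Inner"
    val children: Seq[Plan] = Seq(left, right)
  }

  // Replacement that keeps no trace of the plan that was eliminated.
  case object LocalTableScan extends Plan {
    val name: String = "LocalTableScan <empty>"
    val children: Seq[Plan] = Nil
  }

  // Still a leaf (no children), but remembers the eliminated plan as a field.
  final case class EmptyRelation(eliminated: Plan) extends Plan {
    val name: String = "EmptyRelation"
    val children: Seq[Plan] = Nil
  }

  // Renders a plan; for EmptyRelation it also descends into the wrapped plan,
  // even though that plan is not a child in the tree.
  def treeString(plan: Plan, depth: Int = 0): String = {
    val indent = if (depth == 0) "" else ("   " * (depth - 1)) + "+- "
    val nested = plan match {
      case EmptyRelation(inner) => Seq(treeString(inner, depth + 1))
      case other => other.children.map(child => treeString(child, depth + 1))
    }
    ((indent + plan.name) +: nested).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val eliminated = Join(Scan("testData"), Scan("testData2"))
    println(treeString(LocalTableScan))            // original plan is lost
    println(treeString(EmptyRelation(eliminated))) // original plan is still visible
  }
}
```

In this toy, tree traversals that rely on `children` still see a leaf, while display code can opt in to showing the wrapped plan.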

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Existing tests

Was this patch authored or co-authored using generative AI tooling?

NO

@github-actions github-actions bot added the SQL label May 31, 2024
@HyukjinKwon HyukjinKwon changed the title [SPARK-48466] Create dedicated node for EmptyRelation in AQE [SPARK-48466][SQL] Create dedicated node for EmptyRelation in AQE Jun 3, 2024
liuzqt commented Jun 3, 2024

@cloud-fan @maryannxue

  object AQEPropagateEmptyRelation extends PropagateEmptyRelationBase {
    override protected def isEmpty(plan: LogicalPlan): Boolean =
-     super.isEmpty(plan) || (!isRootRepartition(plan) && getEstimatedRowCount(plan).contains(0))
+     super.isEmpty(plan) || plan.isInstanceOf[EmptyRelation] ||

Do we need to check the type explicitly? It has a row count, which should be sufficient.

@liuzqt liuzqt Jun 12, 2024

@cloud-fan we still need to check this explicitly. getEstimatedRowCount only matches specific patterns. Before this change, it worked because super.isEmpty explicitly matches an empty LocalRelation.

Alternatively, we can match EmptyRelation in getEstimatedRowCount. I don't have a preference.

I think it's more reasonable to make getEstimatedRowCount recognize this new EmptyRelation.
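The trade-off discussed in this thread can be sketched with a toy model. Everything below is hypothetical and defined only for the example: `baseIsEmpty` stands in for `super.isEmpty`, `estimatedRowCount` stands in for `getEstimatedRowCount`, and the `isRootRepartition` check is left out for brevity. It shows why a new node type is missed unless one of the two functions learns about it.

```scala
// Toy model only: hypothetical stand-ins, not Spark's classes or method bodies.
object RowCountSketch {
  sealed trait Plan
  final case class LocalRelation(rows: Seq[Int]) extends Plan
  final case class QueryStage(materializedRowCount: Option[BigInt]) extends Plan
  final case class Filter(child: Plan) extends Plan
  final case class EmptyRelation(eliminated: Plan) extends Plan

  // Stand-in for super.isEmpty: only recognizes an empty LocalRelation.
  def baseIsEmpty(plan: Plan): Boolean = plan match {
    case LocalRelation(rows) => rows.isEmpty
    case _ => false
  }

  // Stand-in for getEstimatedRowCount: matches only specific patterns.
  // The EmptyRelation case is the option preferred in the review.
  def estimatedRowCount(plan: Plan): Option[BigInt] = plan match {
    case QueryStage(count) => count
    case EmptyRelation(_) => Some(BigInt(0))
    case _ => None
  }

  // With EmptyRelation handled above, no explicit isInstanceOf check is needed here.
  def isEmpty(plan: Plan): Boolean =
    baseIsEmpty(plan) || estimatedRowCount(plan).contains(BigInt(0))

  def main(args: Array[String]): Unit = {
    val wrapped = EmptyRelation(Filter(LocalRelation(Seq(1, 2))))
    println(isEmpty(wrapped))                          // recognized via the row count
    println(isEmpty(Filter(LocalRelation(Seq(1, 2))))) // no pattern matches
  }
}
```

Putting the case in the row-count function keeps a single source of truth for "this node produces zero rows", which any other caller of that function also benefits from.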

@liuzqt liuzqt requested a review from cloud-fan June 12, 2024 19:59
@cloud-fan
The failed PySpark test is unrelated. Thanks, merging to master!

@cloud-fan cloud-fan closed this in 2fe0692 Jun 19, 2024
gengliangwang pushed a commit that referenced this pull request Jun 25, 2024
…onExec

### What changes were proposed in this pull request?

Fixed a missing pattern match introduced in #46830

Sorry for the silly mistake...

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Existing tests

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #47089 from liuzqt/SPARK-48466.

Authored-by: Ziqi Liu <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
LuciferYang pushed a commit that referenced this pull request Aug 31, 2024
### What changes were proposed in this pull request?
Remove `cleanupResources()` from `EmptyRelationExec`.

### Why are the changes needed?

This bug was introduced in #46830: `cleanupResources` might be executed on an executor, where `logical` is null.

After revisiting the code paths relevant to `cleanupResources`, I think `EmptyRelationExec` doesn't need to do anything here.

- For driver-side cleanup, we have [this code path](https://github.com/apache/spark/blob/0602020eb3b346a8c50ad32eeda4e6dabb70c584/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala) to clean up each AQE query stage.
- For executor-side cleanup, so far only `SortMergeJoinExec` invokes `cleanupResources` during its execution, so by the time `EmptyRelationExec` is created, the necessary cleanup is guaranteed to have been done.

After all, `EmptyRelationExec` is only a never-executed wrapper for materialized physical query stages; it should not be responsible for invoking any cleanup.

So I'm removing the `cleanupResources` implementation from `EmptyRelationExec`.
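One way a field can end up null on the receiving side is sketched below. It assumes the field is not shipped with the serialized node, for example because it is marked `@transient`; that is an assumption made for illustration and is not stated in this commit message. `LogicalPlanStub`, `EmptyRelationExecStub`, `unsafeCleanup`, and `roundTrip` are hypothetical names, and plain Java serialization stands in for whatever mechanism ships the plan.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Toy model only: hypothetical stand-ins, not Spark's classes.
object TransientFieldSketch {
  // Deliberately not Serializable, so it cannot travel with the node.
  final class LogicalPlanStub(val description: String)

  // ASSUMPTION: the wrapped logical plan is a transient field.
  final case class EmptyRelationExecStub(@transient logical: LogicalPlanStub) {
    // A cleanup hook that dereferences the wrapped plan.
    def unsafeCleanup(): String = logical.description
  }

  // Serializes and deserializes a value, imitating a driver-to-executor hop.
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val onDriver = EmptyRelationExecStub(new LogicalPlanStub("Join Inner"))
    val onExecutor = roundTrip(onDriver)
    println(onDriver.unsafeCleanup())                         // works where the field is set
    println(onExecutor.logical == null)                       // transient field was not restored
    println(scala.util.Try(onExecutor.unsafeCleanup()).isFailure) // dereferencing null fails
  }
}
```

Under that assumption, any method that touches the wrapped plan is only safe on the side that constructed the node, which is consistent with dropping the cleanup hook entirely rather than guarding it.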

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
New unit test.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #47931 from liuzqt/SPARK-49460.

Authored-by: Ziqi Liu <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
cloud-fan pushed a commit that referenced this pull request Mar 24, 2025
…kPlanInfo

### What changes were proposed in this pull request?

Fix `SparkPlanInfo.fromLogicalPlan` to handle nested empty relation.

Before:
<img width="294" alt="Screenshot 2025-03-21 at 11 30 56 AM" src="https://github.com/user-attachments/assets/d03f6b88-13ad-4b67-bc4d-18b532e4dea2" />

After:

<img width="390" alt="Screenshot 2025-03-20 at 5 51 21 PM" src="https://github.com/user-attachments/assets/0e4f775c-b9cf-4955-af17-5e47fa44e44b" />

### Why are the changes needed?

A follow-up to #46830. In the original PR I forgot to handle nested empty relations.
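What "nested" means here can be sketched with a toy converter from a logical tree to display nodes. This is not the actual `SparkPlanInfo.fromLogicalPlan`: `Logical`, `Relation`, `Union`, `EmptyRelation`, `PlanInfo`, and `fromLogical` below are hypothetical names defined for the example. The point is only that the conversion has to recurse into the plan wrapped by an empty relation at any depth, not just at the root.

```scala
// Toy model only: hypothetical stand-ins, not Spark's SparkPlanInfo.
object PlanInfoSketch {
  sealed trait Logical
  final case class Relation(name: String) extends Logical
  final case class Union(children: Seq[Logical]) extends Logical
  // A leaf in the plan tree that remembers the plan it replaced.
  final case class EmptyRelation(eliminated: Logical) extends Logical

  // Display node, as a UI graph would consume it.
  final case class PlanInfo(nodeName: String, children: Seq[PlanInfo])

  def fromLogical(plan: Logical): PlanInfo = plan match {
    case Relation(name) => PlanInfo(s"Relation $name", Nil)
    case Union(children) => PlanInfo("Union", children.map(fromLogical))
    // Descend into the wrapped plan wherever an EmptyRelation appears,
    // including inside another EmptyRelation's eliminated plan.
    case EmptyRelation(inner) => PlanInfo("EmptyRelation", Seq(fromLogical(inner)))
  }

  // Counts display nodes, to check that nothing was dropped.
  def size(info: PlanInfo): Int = 1 + info.children.map(size).sum

  def main(args: Array[String]): Unit = {
    val nested = EmptyRelation(Union(Seq(Relation("a"), EmptyRelation(Relation("b")))))
    println(size(fromLogical(nested)))
  }
}
```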

### Does this PR introduce _any_ user-facing change?
Yes, UI change

### How was this patch tested?
Verified in the Spark UI.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #50350 from liuzqt/SPARK-48466.

Authored-by: liuzqt <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Mar 24, 2025
(cherry picked from commit b8e0246)
Signed-off-by: Wenchen Fan <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
(cherry picked from commit 4661833)
Signed-off-by: Wenchen Fan <[email protected]>