Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

Ideally it's OK to always propagate metadata columns, as column pruning will kick in later and prune them aways if they are not used. However, it may cause problems in cases like CTE. #39081 fixed such a bug.

This PR only propagates metadata columns if they are used, to keep the analyzed plan simple and reliable.

Why are the changes needed?

avoid potential bugs.

Does this PR introduce any user-facing change?

no

How was this patch tested?

new tests

@cloud-fan cloud-fan marked this pull request as ready for review December 21, 2022 07:33
@github-actions github-actions bot added the SQL label Dec 21, 2022
@cloud-fan
Copy link
Contributor Author

cc @gengliangwang @viirya

plan: LogicalPlan,
isRequired: Attribute => Boolean): LogicalPlan = plan match {
case s: ExposesMetadataColumns if s.metadataOutput.exists(isRequired) =>
s.withMetadataColumns()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are doing this, shall we propagate the required metadata columns only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, but we need to change the ExposesMetadataColumns.withMetadataColumns API to pass required attributes, which may break custom logical plans. Since the benefit is small, this may not worth.

s.withMetadataColumns()
case p: Project if p.metadataOutput.exists(isRequired) =>
val newProj = p.copy(
projectList = p.projectList ++ p.metadataOutput,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto: shall we propagate the required metadata columns only?

@gengliangwang
Copy link
Member

LGTM except minor comments

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule only adds metadata columns when a node is resolved but is missing input from its
children. This ensures that metadata columns are not added to the plan unless they are used.

Based on the rule description, I think this change makes it more correct now.

@cloud-fan
Copy link
Contributor Author

thanks for review, merging to master!

@cloud-fan cloud-fan closed this in bd864a0 Dec 21, 2022
gengliangwang added a commit that referenced this pull request Jan 6, 2023
…tadataColumnSuite

### What changes were proposed in this pull request?

Move the new test case for Metadata column in #39081 to `MetadataColumnSuite`

### Why are the changes needed?

All metadata column related test cases should go into `MetadataColumnSuite`. For example:

- #37758
- #39152

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA tests

Closes #39425 from gengliangwang/moveTest.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
huaxingao added a commit that referenced this pull request Apr 21, 2023
### What changes were proposed in this pull request?
backporting #39152 to 3.3

### Why are the changes needed?
bug fixing

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes #40889 from huaxingao/metadata.

Authored-by: huaxingao <[email protected]>
Signed-off-by: huaxingao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants