Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,9 @@ object ColumnPruning extends Rule[LogicalPlan] {
p
}

// Eliminate the Projects with empty projectList
case p @ Project(projectList, child) if projectList.isEmpty => child
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of the correctness of this rule. Actually this is not column pruning, but add more columns, as child may have more one columns.

And why this rule case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child can't work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because OneRowRelation has no output. So its output is different to its parent Project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But a Project with empty projectList also has no output right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    case p @ Project(_, l: LeafNode) => p 

There is another case above it. Thus, it will stop here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

case p @ Project(_, l: LeafNode) if !l.isInstanceOf[OneRowRelation] => p 

Then, we do not need the first line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea. As I posted before. I added a new rule that has side-effect to fix this issue too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya @cloud-fan !

I am not sure which way is better.

case p @ Project(_, l: LeafNode) if !l.isInstanceOf[OneRowRelation] => p 

My concern is the above line looks more hacky than the current PR fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me respond the original question by @cloud-fan
We will not see an empty Project, if the child has more than one columns. The empty Project only happens after PruningColumns. I am fine, if we want to add an extra rule for eliminating Project only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we just move that case ahead? It seems always safe to apply case p @ Project(projectList, child) if sameOutput(child.output, p.output) => child

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we intentionally did it in this way. I am not 100% sure if we might hit any issue because of it. Let me try it and check if we will hit any test case failure.


// Can't prune the columns on LeafNode
case p @ Project(_, l: LeafNode) => p

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,14 @@ class ColumnPruningSuite extends PlanTest {
comparePlans(Optimize.execute(query), expected)
}

test("Eliminate the Project with an empty projectList") {
val input = OneRowRelation
val query =
Project(Literal(1).as("1") :: Nil, Project(Literal(1).as("1") :: Nil, input)).analyze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you test empty projectList?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running Optimize.execute(query), the second Project's projectList is pruned to empty at first. Then, the second Project will be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add another case with an empty List too.

val expected = Project(Literal(1).as("1") :: Nil, input).analyze
comparePlans(Optimize.execute(query), expected)
}

test("column pruning for group") {
val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
val originalQuery =
Expand Down