@ConeyLiu
Contributor

@ConeyLiu ConeyLiu commented Jul 4, 2023

When merging into an Iceberg table, Spark reads many unnecessary metadata columns. The problem appears to be caused by the AddMetadataColumns rule in Spark 3.3, which has been fixed in Spark 3.4.
This PR adds a rule that removes the unnecessary metadata column reads, fixing the problem for Spark 3.3.
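
To make the idea concrete, here is a toy Python sketch. It is not Spark's Catalyst API and not the rule added in this PR. It only models a scan's output as a list of column names and drops the metadata columns that nothing above the scan refers to. The function name `prune_metadata_columns` and the example column lists are hypothetical; the metadata column names follow Iceberg's `_file`, `_pos`, `_spec_id`, `_partition`, and `_deleted`.

```python
# Toy model only: this is not Spark's Catalyst API or the rule from this PR.
# Iceberg metadata column names, used here purely for illustration.
METADATA_COLUMNS = {"_file", "_pos", "_spec_id", "_partition", "_deleted"}


def prune_metadata_columns(scan_output, referenced):
    """Keep every data column, plus only the metadata columns that are referenced."""
    return [
        col
        for col in scan_output
        if col not in METADATA_COLUMNS or col in referenced
    ]


# Hypothetical scan that carries more metadata columns than the plan needs.
scan_output = ["id", "data", "_file", "_pos", "_spec_id", "_partition"]
# Hypothetical set of columns the plan above the scan actually uses.
referenced = {"id", "data", "_file", "_pos"}

pruned = prune_metadata_columns(scan_output, referenced)
print(pruned)
```

In this sketch data columns are always kept, so only the unreferenced metadata columns (`_spec_id` and `_partition` here) are removed from the scan output.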

Before this PR:
image

After this PR:
image

@github-actions github-actions bot added the spark label Jul 4, 2023
@ConeyLiu
Contributor Author

ConeyLiu commented Jul 4, 2023

Hi @rdblue @szehon-ho @aokolnychyi @RussellSpitzer @Fokko, could you please review this when you have time? Thanks a lot.

@RussellSpitzer
Member

@huaxingao I believe you did the Spark fix for this?

@huaxingao
Contributor

I think the problem has already been fixed in Spark 3.3 by this PR

@ConeyLiu
Contributor Author

ConeyLiu commented Jul 4, 2023

Thanks @RussellSpitzer @huaxingao. I see, the fix is not available yet because Spark 3.3.3 has not been released.

@ConeyLiu
Contributor Author

ConeyLiu commented Jul 5, 2023

Closing this since the problem has already been fixed in Spark 3.3.

@ConeyLiu ConeyLiu closed this Jul 5, 2023

3 participants