Skip to content

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Oct 28, 2024

What changes were proposed in this pull request?

This pr makes the following changes to the maven-shade-plugin rules for the sql/core module:

  1. To avoid being influenced by the parent pom.xml, use combine.self = "override" in the <configuration> of the maven-shade-plugin for the sql/core module. Before this configuration was added, the relocation result was incorrect, and protobuf-java was not relocated. We can unzip the packaging result to confirm this issue.

We can use IntelliJ's "Show Effective POM" feature to view the result of this parameter, the result is equivalent to the effective POM log with --debug printing added during the Maven compilation:

Before

image

We can see that an unexpected

<includes>
  <include>org.eclipse.jetty.**</include>
</includes>

has been added to the relocation rule.

After

image

We can see that the extra <includes> in the relocation rule is no longer present.

  1. Before SPARK-48755 | [SPARK-48755][SS][PYTHON] transformWithState pyspark base implementation and ValueState support #47133 overwrote the maven-shade-plugin rules for sql/core, it inherited the rules from the parent pom.xml and shaded org.spark-project.spark:unused. This behavior changed after SPARK-48755, so this pr restores it.

  2. The relocation rules for Guava should be retained and follow the configuration in the parent pom.xml, which relocates com.google.common to ${spark.shade.packageName}.guava. This PR restores this configuration.

  3. For protobuf-java, which is under the com.google.protobuf package, the already shaded protobuf-java in the core module can be reused instead of shading it again in sql/core module. Therefore, this pr only configures the corresponding relocation rule for it: com.google.protobuf -> ${spark.shade.packageName}.spark_core.protobuf.

  4. Regarding the ServicesResourceTransformer configuration, it is used to merge META-INF/services resources. This is not needed for Guava and protobuf-java, so this pr removes it.

Why are the changes needed?

Fix shade and relocation rule of sql/core module

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass Github Aciton
  • Manually inspect the packaging result: Extract spark-sql_2.13-4.0.0-SNAPSHOT.jar to a separate directory, then execute grep "org.sparkproject.guava" -R * and grep "org.sparkproject.spark_core.protobuf" -R * to confirm the successful relocation.
  • Maven test passed: https://github.com/LuciferYang/spark/runs/32278520082
image

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft October 28, 2024 06:03
<!-- Guava is relocated to ${spark.shade.packageName}.connect.guava -->
<exclude>com.google.common.**</exclude>
</excludes>
<pattern>com.google.common</pattern>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the effective relocation rules are very confusing, seems to be affected by parent pom

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use combine.self = "override" to resolve this issue.

@github-actions github-actions bot added the INFRA label Oct 28, 2024
@LuciferYang LuciferYang changed the title Test fix sql core shade rule [BUILD] Fix shade and relocation rule of sql/core module Oct 30, 2024
@LuciferYang LuciferYang changed the title [BUILD] Fix shade and relocation rule of sql/core module [SPARK-50166][SQL][BUILD] Fix shade and relocation rule of sql/core module Oct 30, 2024
@github-actions github-actions bot removed the INFRA label Oct 30, 2024
@github-actions github-actions bot added the INFRA label Oct 30, 2024
@LuciferYang LuciferYang marked this pull request as ready for review October 31, 2024 03:07
@github-actions github-actions bot removed the INFRA label Oct 31, 2024
Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this.

@LuciferYang
Copy link
Contributor Author

@LuciferYang
Copy link
Contributor Author

Thanks @pan3793 ~

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the PR description, this is a regression due to SPARK-48755 ?

Thank you for discovering this regression and fixing before Apache Spark 4.0.0 release, @LuciferYang .

@dongjoon-hyun
Copy link
Member

BTW, to @LuciferYang , we need to wait for SPARK-48755 people as you pinged already.

@LuciferYang
Copy link
Contributor Author

According to the PR description, this is a regression due to SPARK-48755 ?

Thank you for discovering this regression and fixing before Apache Spark 4.0.0 release, @LuciferYang .

Yes

@LuciferYang
Copy link
Contributor Author

BTW, to @LuciferYang , we need to wait for SPARK-48755 people as you pinged already.

Got it ~

@dongjoon-hyun
Copy link
Member

Gentle ping once more, @bogao007 and @HeartSaVioR , because the change is reported at #47133 .

@bogao007
Copy link
Contributor

bogao007 commented Nov 1, 2024

I don't have specific concerns as long as the existing CI passes, thanks for fixing this!

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 thanks for the fix!

@LuciferYang
Copy link
Contributor Author

Merged into master for Spark 4.0. Thanks @pan3793 @dongjoon-hyun @HeartSaVioR and @bogao007 ~

@LuciferYang LuciferYang deleted the sql-core-shade branch May 2, 2025 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants