Review comments, fix table, and add menu link

apache · szehon-ho · May 25, 2024 · Jun 11, 2024 · Jun 12, 2024 · Jun 12, 2024
commit e1205a471d525059a0ae43f9a8d21799ceb658a0
diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
@@ -63,6 +63,8 @@
       url: sql-performance-tuning.html#optimizing-the-join-strategy
     - text: Adaptive Query Execution
       url: sql-performance-tuning.html#adaptive-query-execution
+    - text: Storage Partition Join
+      url: sql-performance-tuning.html#storage-partition-join
 - text: Distributed SQL Engine
   url: sql-distributed-sql-engine.html
   subitems:

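The configuration table fixed in the docs change that follows lists dependencies between the Storage Partition Join flags. As a reading aid, here is a minimal sketch of a Spark SQL session that satisfies those dependencies. It assumes a Spark 4.0 session and a V2 source that reports its partitioning, such as Iceberg. It uses only property names that appear in the table.

```sql
-- Sketch only: assumes Spark 4.0 and a V2 data source (e.g. Iceberg) that reports partitioning.
-- Base flags: every other SPJ option in the table requires both of these to be true.
SET spark.sql.sources.v2.bucketing.enabled=true;
SET spark.sql.sources.v2.bucketing.pushPartValues.enabled=true;

-- Optional: avoid shuffle when the join/MERGE condition covers only some partition columns.
-- The table states this also requires requireAllClusterKeysForCoPartition to be false.
SET spark.sql.requireAllClusterKeysForCoPartition=false;
SET spark.sql.sources.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled=true;

-- Optional: avoid shuffle when partition transforms are compatible but not identical.
SET spark.sql.sources.v2.bucketing.allowCompatibleTransforms.enabled=true;

-- Optional: avoid shuffle on one side by reusing the partitioning reported by the other side.
SET spark.sql.sources.v2.bucketing.shuffle.enabled=true;
```

To check whether the join actually ran as a Storage Partition Join, run `EXPLAIN` on the query. Per the docs text, the plan should then contain no Exchange nodes before the join.
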
diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
@@ -435,7 +435,7 @@ Storage Partition Join (SPJ) is an optimization technique in Spark SQL that make
 
 This is a generalization of the concept of Bucket Joins, which is only applicable for [bucketed](sql-data-sources-load-save-functions.html#bucketing-sorting-and-partitioning) tables, to tables partitioned by functions registered in FunctionCatalog. Storage Partition Joins are currently supported for compatible V2 DataSources.
 
-The following SQL properties enable Storage Partition Join.
+The following SQL properties enable Storage Partition Join and its optimizations for different kinds of join queries.
 
   <table class="spark-config">
     <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
@@ -468,34 +468,38 @@ The following SQL properties enable Storage Partition Join.
       <td>false</td>
       <td>
         When true, and when the join is not a full outer join, enable skew optimizations to handle partitions with large amounts of data when avoiding shuffle. One side will be chosen as the big table based on table statistics, and the splits on this side will be partially-clustered. The splits of the other side will be grouped and replicated to match. This config requires both <code>spark.sql.sources.v2.bucketing.enabled</code> and <code>spark.sql.sources.v2.bucketing.pushPartValues.enabled</code> to be true.
+      </td>
       <td>3.4.0</td>
     </tr>
     <tr>
       <td><code>spark.sql.sources.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled</code></td>
       <td>false</td>
       <td>
         When enabled, try to avoid shuffle if join or MERGE condition does not include all partition columns. This config requires both <code>spark.sql.sources.v2.bucketing.enabled</code> and <code>spark.sql.sources.v2.bucketing.pushPartValues.enabled</code> to be true, and <code>spark.sql.requireAllClusterKeysForCoPartition</code> to be false.
+      </td>
       <td>4.0.0</td>
     </tr>
     <tr>
       <td><code>spark.sql.sources.v2.bucketing.allowCompatibleTransforms.enabled</code></td>
       <td>false</td>
       <td>
-        When enabled, try to avoid shuffle if partition transforms are compatible but not identical. This config requires both <code>spark.sql.sources.v2.bucketing.enabled</code> and <code>spark.sql.sources.v2.bucketing.pushPartValues.enabled</code> to be true.</td>
+        When enabled, try to avoid shuffle if partition transforms are compatible but not identical. This config requires both <code>spark.sql.sources.v2.bucketing.enabled</code> and <code>spark.sql.sources.v2.bucketing.pushPartValues.enabled</code> to be true.
+      </td>
       <td>4.0.0</td>
     </tr>
     <tr>
       <td><code>spark.sql.sources.v2.bucketing.shuffle.enabled</code></td>
       <td>false</td>
       <td>
         When enabled, try to avoid shuffle on one side of the join, by recognizing the partitioning reported by a V2 data source on the other side.
+      </td>
       <td>4.0.0</td>
     </tr>
   </table>
 
 If Storage Partition Join is performed, the query plan will not contain Exchange nodes prior to the join.
 
-The following example uses Iceberg (https://iceberg.apache.org/docs/nightly/spark-getting-started/), a Spark V2 DataSource that supports Storage Partition Join.
+The following example uses [Iceberg](https://iceberg.apache.org/docs/latest/spark-getting-started/), a Spark V2 DataSource that supports Storage Partition Join.
 ```sql
 CREATE TABLE prod.db.target (id INT, salary INT, dep STRING)
 USING iceberg