[SPARK-18067] Avoid shuffling child if join keys are superset of child's partitioning keys #19054

tejasapatil · 2017-08-25T15:14:16Z

Jira: https://issues.apache.org/jira/browse/SPARK-18067

What problem is being addressed in this PR?

Currently, shuffle-based joins require their children to be shuffled over all the columns in the join condition. If a child is already distributed over a subset of the join columns (e.g. the input is bucketed, or it is the output of a subquery), this shuffle is not needed. Avoiding the shuffle makes the join run faster and more stably, since it executes as a single stage.

To dive deeper, let's look at an example. Both input tables, table1 and table2, are bucketed on columns i and j into 8 buckets. The query joins the two tables on i, j and k. With bucketing, all rows with the same values of i and j reside in the same bucket of both inputs. So sorting the corresponding buckets on the join columns and joining them bucket by bucket is enough to satisfy the join's requirements.

partitions   table1 (i,j,k) values      table2 (i,j,k) values
bucket 0     (0,0,1) (0,0,2)            (0,0,1) (0,0,3)
bucket 1     (1,0,2) (1,0,4) (1,1,1)    (1,0,1) (1,0,2) (1,1,2)
bucket 2     (0,1,8) (0,1,6)            (0,1,1)
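The co-location argument can be checked with a small Spark-free Scala sketch. It assumes a hypothetical bucket function that hashes only (i, j) modulo the bucket count; this stands in for Spark's real bucket hash, so the bucket numbers it produces will not match the illustrative table. The names BucketCoLocation, Rec, bucketOf, bucketedJoin and globalJoin are all hypothetical.

```scala
object BucketCoLocation {
  // One input row with join columns (i, j, k).
  final case class Rec(i: Int, j: Int, k: Int)

  // Hypothetical bucket function: hash only the bucketing columns (i, j).
  // It stands in for Spark's real bucket hash; any deterministic function
  // of (i, j) has the property illustrated here.
  def bucketOf(r: Rec, numBuckets: Int): Int =
    Math.floorMod((r.i, r.j).hashCode, numBuckets)

  // Join by comparing only rows that landed in the same bucket index.
  def bucketedJoin(left: Seq[Rec], right: Seq[Rec], numBuckets: Int): Set[(Rec, Rec)] = {
    val leftBuckets = left.groupBy(bucketOf(_, numBuckets))
    val rightBuckets = right.groupBy(bucketOf(_, numBuckets))
    (for {
      (bucket, ls) <- leftBuckets.toSeq
      rs = rightBuckets.getOrElse(bucket, Seq.empty[Rec])
      l <- ls
      r <- rs
      if l == r // a.i = b.i AND a.j = b.j AND a.k = b.k
    } yield (l, r)).toSet
  }

  // Reference result: compare every left row with every right row.
  def globalJoin(left: Seq[Rec], right: Seq[Rec]): Set[(Rec, Rec)] =
    (for { l <- left; r <- right if l == r } yield (l, r)).toSet
}
```

Every matching pair agrees on (i, j), so both rows hash to the same bucket index and the bucket-local join finds exactly the matches the global join finds, without moving any rows between buckets.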

What changes were proposed in this pull request?

Both shuffled hash join and sort merge join now keep track of the keys on which their children should be distributed. Initially, these are the same as the join keys. The rule ReorderJoinPredicates is modified to detect whether a child's output partitioning is over a subset of the join keys and, if so, to revise the join operator's distribution keys accordingly.
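A minimal sketch of how the distribution keys could be derived from a child's hash partitioning, written in Scala over plain column-name strings instead of Catalyst Expressions. The object and method names are hypothetical, duplicate join keys are not handled, and this is not the PR's implementation. The result follows the child's partitioning order, because hashing (j, i) generally places rows differently than hashing (i, j).

```scala
object DistributionKeys {
  /**
   * Hypothetical helper: given the columns a child is hash-partitioned on and
   * the aligned join keys of both sides, return the (left, right) keys the
   * join should require its children to be distributed on.
   *
   * If every partitioning column appears among the left join keys, only that
   * subset is required, paired with the right keys at the same positions.
   * Otherwise all join keys are required, as before.
   */
  def distributionKeys(
      childPartitioning: Seq[String],
      leftKeys: Seq[String],
      rightKeys: Seq[String]): (Seq[String], Seq[String]) = {
    require(leftKeys.length == rightKeys.length, "join keys must be pairwise aligned")
    // Position of each partitioning column within the left join keys (-1 if absent).
    val positions = childPartitioning.map(col => leftKeys.indexOf(col))
    if (childPartitioning.nonEmpty && positions.forall(_ >= 0))
      (positions.map(leftKeys(_)), positions.map(rightKeys(_)))
    else
      (leftKeys, rightKeys)
  }
}
```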

How was this patch tested?

Added a unit test.
Manual test:

Query:

-- both the input tables are bucketed over columns `i` and `j`
SELECT * FROM table1 a JOIN table2 b ON a.i = b.i AND a.j = b.j AND a.k = b.k

BEFORE

SortMergeJoin [i#5, j#6, k#7], [i#8, j#9, k#10], Inner
:- *Sort [i#5 ASC NULLS FIRST, j#6 ASC NULLS FIRST, k#7 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(i#5, j#6, k#7, 200)
:     +- *FileScan orc default.table1[i#5,j#6,k#7] Batched: false, Format: ORC, Location: InMemoryFileIndex[file:warehouse/table1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<i:int,j:int,k:string>
+- *Sort [i#8 ASC NULLS FIRST, j#9 ASC NULLS FIRST, k#10 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(i#8, j#9, k#10, 200)
      +- *FileScan orc default.table2[i#8,j#9,k#10] Batched: false, Format: ORC, Location: InMemoryFileIndex[file:warehouse/table2], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<i:int,j:int,k:string>

AFTER

SortMergeJoin [i#5, j#6, k#7], [i#8, j#9, k#10], [i#5, j#6], [i#8, j#9], Inner
:- *Sort [i#5 ASC NULLS FIRST, j#6 ASC NULLS FIRST, k#7 ASC NULLS FIRST], false, 0
:  +- *FileScan orc default.table1[i#5,j#6,k#7] Batched: false, Format: ORC, Location: InMemoryFileIndex[file:warehouse/table1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<i:int,j:int,k:string>
+- *Sort [i#8 ASC NULLS FIRST, j#9 ASC NULLS FIRST, k#10 ASC NULLS FIRST], false, 0
   +- *FileScan orc default.table2[i#8,j#9,k#10] Batched: false, Format: ORC, Location: InMemoryFileIndex[file:warehouse/table2], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<i:int,j:int,k:string>
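To see why the Exchange nodes disappear, here is a deliberately simplified model in Scala: a child can skip the shuffle only if it is already hash-partitioned on exactly the required keys, in the same order, and both sides have the same number of partitions. HashPart and needsShuffle are hypothetical names, and Spark's actual EnsureRequirements logic covers many more cases.

```scala
object ExchangeDecision {
  // Simplified stand-in for a child that is hash-partitioned (e.g. bucketed).
  final case class HashPart(columns: Seq[String], numPartitions: Int)

  // Simplified rule: skip the shuffle only when each child is already
  // hash-partitioned on exactly its required keys (same order) and both
  // children have the same number of partitions.
  def needsShuffle(
      left: HashPart,
      right: HashPart,
      leftRequired: Seq[String],
      rightRequired: Seq[String]): Boolean = {
    val leftOk = left.columns == leftRequired
    val rightOk = right.columns == rightRequired
    !(leftOk && rightOk && left.numPartitions == right.numPartitions)
  }
}
```

With the BEFORE requirement (i, j, k) the model reports a shuffle for tables bucketed on (i, j); with the narrowed AFTER requirement (i, j) it does not, mirroring the two plans.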

SparkQA · 2017-08-25T17:49:27Z

Test build #81131 has finished for PR 19054 at commit 9ba8add.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tejasapatil · 2017-08-25T18:06:29Z

cc @hvanhovell @cloud-fan for review

SparkQA · 2017-09-08T17:20:17Z

Test build #81562 has finished for PR 19054 at commit ec8bd80.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-01T03:24:41Z

Test build #84366 has finished for PR 19054 at commit b0db6aa.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-01T06:29:35Z

Test build #84368 has finished for PR 19054 at commit 69e288e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-11T21:30:49Z

Test build #85985 has finished for PR 19054 at commit c689ff1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tejasapatil · 2018-01-18T23:14:55Z

cc @hvanhovell @cloud-fan for review

cloud-fan · 2018-01-19T07:16:19Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

We should add some documentation to explain what the return value is.

Added more documentation. I wasn't sure how to make it easier to understand; hopefully the example helps.

cloud-fan · 2018-01-19T07:20:41Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

If leftPartitioning is HashPartitioning, we don't need to care about rightPartitioning at all?

Given that this was only applied to SortMergeJoinExec and ShuffledHashJoinExec, where both partitionings are HashPartitioning, things worked fine. I have changed this to use a stricter check.
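The question above is whether checking only the left side's partitioning is enough. One possible shape of a stricter check is sketched here in Scala, using simplified stand-in types (ChildPartitioning, HashPartitioned, NotHashPartitioned) rather than Spark's Partitioning classes. Requiring equal column counts and equal partition counts is an assumption for this sketch, not something the thread states.

```scala
object PartitioningCheck {
  // Simplified stand-ins for a child's output partitioning (not Spark's classes).
  sealed trait ChildPartitioning
  final case class HashPartitioned(columns: Seq[String], numPartitions: Int)
    extends ChildPartitioning
  case object NotHashPartitioned extends ChildPartitioning

  // Stricter check: narrow the join's distribution keys only when BOTH children
  // are hash-partitioned, with the same number of columns and partitions
  // (the last two conditions are assumptions of this sketch).
  def canNarrowKeys(left: ChildPartitioning, right: ChildPartitioning): Boolean =
    (left, right) match {
      case (l: HashPartitioned, r: HashPartitioned) =>
        l.columns.length == r.columns.length && l.numPartitions == r.numPartitions
      case _ => false
    }
}
```

Matching on the pair makes the "left is hash-partitioned but right is not" case fall through to false explicitly instead of being silently accepted.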


tejasapatil · 2018-01-20T01:41:47Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

   */
  private def reorderJoinPredicates(plan: SparkPlan): SparkPlan = {
    plan.transformUp {
-      case BroadcastHashJoinExec(leftKeys, rightKeys, joinType, buildSide, condition, left,


Removal of BroadcastHashJoinExec is intentional. Its children are expected to have BroadcastDistribution or UnspecifiedDistribution, so this method won't help there (this optimization only helps shuffle-based joins).

SparkQA · 2018-01-20T04:46:21Z

Test build #86406 has finished for PR 19054 at commit 00bb14b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

eyalfa · 2018-02-04T22:19:24Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

-    val rightKeysBuffer = ArrayBuffer[Expression]()
+      expectedOrderOfKeys: Seq[Expression], // comes from child's output partitioning
+      currentOrderOfKeys: Seq[Expression]): // comes from join predicate
+  (Seq[Expression], Seq[Expression], Seq[Expression], Seq[Expression]) = {


Can you please add a comment describing the return type? A Tuple4 is not a very descriptive type 😃
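One way to address this comment is a small named type instead of the Tuple4. The field meanings below are an assumption inferred from the AFTER plan, where the join carries both its join keys and a narrower pair of distribution keys; ReorderedJoinKeys is a hypothetical name, and the type parameter E stands in for Catalyst's Expression.

```scala
/**
 * Hypothetical named result type replacing the Tuple4.
 * Field meanings are assumed, not taken from the PR.
 */
final case class ReorderedJoinKeys[E](
    leftKeys: Seq[E],             // all left join keys, reordered
    rightKeys: Seq[E],            // right join keys, aligned with leftKeys
    leftDistributionKeys: Seq[E], // keys the left child must be distributed on
    rightDistributionKeys: Seq[E] // matching keys for the right child
) {
  require(leftKeys.length == rightKeys.length, "join keys must be pairwise aligned")
  require(
    leftDistributionKeys.length == rightDistributionKeys.length,
    "distribution keys must be pairwise aligned")
}
```

Named fields document the result at every call site, and the require checks turn a misaligned result into an immediate IllegalArgumentException.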

eyalfa · 2018-02-04T22:56:06Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

-      rightKeysBuffer.append(rightKeys(index))
+      val index = currentOrderOfKeys.zipWithIndex.find { case (currKey, i) =>
+        !processedIndicies.contains(i) && currKey.semanticEquals(expression)
+      }.get._2


Is the find guaranteed to always succeed?
If so, it's worth a comment on the method's pre/post conditions.

A getOrElse(sys.error("...")) might also be a good way of documenting this.
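The reviewer's suggestion can be sketched as follows: the lookup states its precondition in a doc comment and fails with a descriptive error via getOrElse(sys.error(...)) instead of calling .get on an Option. This is a hypothetical, simplified version over strings that uses == where the quoted diff uses semanticEquals, and it returns only the keys at the positions named by expectedOrder.

```scala
object ReorderKeys {
  /**
   * Returns the left keys in expectedOrder, with the right keys aligned to them.
   *
   * Precondition: every key in expectedOrder occurs in currentLeft, at least as
   * many times as it occurs in expectedOrder. A violation fails with a
   * descriptive RuntimeException instead of a bare NoSuchElementException.
   */
  def reorder(
      expectedOrder: Seq[String],
      currentLeft: Seq[String],
      currentRight: Seq[String]): (Seq[String], Seq[String]) = {
    val used = scala.collection.mutable.Set.empty[Int] // indices already matched
    val positions = expectedOrder.toList.map { key =>
      val idx = currentLeft.indices
        .find(i => !used.contains(i) && currentLeft(i) == key)
        .getOrElse(sys.error(s"key $key not found among join keys $currentLeft"))
      used += idx
      idx
    }
    (positions.map(currentLeft(_)), positions.map(currentRight(_)))
  }
}
```

Tracking consumed indices lets a key that appears twice in expectedOrder match two distinct positions, which is presumably why the quoted diff keeps processedIndicies.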

github-actions · 2020-01-16T00:08:25Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

tejasapatil mentioned this pull request Aug 25, 2017

[WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuffle if join predicates have non partitioned columns #15605

Closed


tejasapatil mentioned this pull request Sep 6, 2017

[SPARK-21417][SQL] Infer join conditions using propagated constraints #18692

Closed

tejasapatil force-pushed the SPARK-18067_take2 branch from 9ba8add to ec8bd80 on September 8, 2017 14:40

tejasapatil force-pushed the SPARK-18067_take2 branch from ec8bd80 to b0db6aa on December 1, 2017 03:15

tejasapatil force-pushed the SPARK-18067_take2 branch from b0db6aa to 69e288e on December 1, 2017 03:43

tejasapatil force-pushed the SPARK-18067_take2 branch from 69e288e to c689ff1 on January 11, 2018 18:31

cloud-fan reviewed Jan 19, 2018


tejasapatil added 3 commits January 19, 2018 08:59

[SPARK-18067] Avoid shuffling child if join keys are superset of child's partitioning keys (1fcce49)

bug fix: use Seq instead of Set (32b976b)

rebase + updates (00bb14b)

tejasapatil force-pushed the SPARK-18067_take2 branch from c689ff1 to 00bb14b on January 20, 2018 01:35

tejasapatil commented Jan 20, 2018


eyalfa reviewed Feb 4, 2018


dongjoon-hyun added the SQL label Jun 14, 2019

github-actions bot added the Stale label Jan 16, 2020

github-actions bot closed this Jan 17, 2020

andyvanyperenAM mentioned this pull request Sep 11, 2020

[SPARK-32806][SQL] SortMergeJoin with partial hash distribution can be optimized to remove shuffle #29655

Closed
