[SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions #14425

ericl · 2016-07-30T22:04:17Z

What changes were proposed in this pull request?

This fixes a bug where the file scan operator does not take partition pruning into account in its implementation of sameResult(). As a result, exchange reuse can treat two scans over different sets of partitions as identical, and executions may be incorrect on self-joins over the same base file relation.

The patch here is minimal, but we should reconsider relying on metadata for implementing sameResult() in the future, as string representations may not be uniquely identifying.
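To illustrate the failure mode described above, here is a minimal toy sketch. It does not use Spark's classes: `Scan`, its fields, and both comparison methods are hypothetical names invented for this example, and the actual fix in this patch may be structured differently. The sketch only shows why a comparison that ignores the pruned partition set can wrongly equate the two sides of a self-join.

```scala
// Toy model only: these names are hypothetical and are not Spark's operators.
final case class Scan(
    relation: String,               // base file relation being read
    metadata: Map[String, String],  // human-readable description of the scan
    partitions: Set[String]) {      // partition directories left after pruning

  // Comparison that ignores which partitions survive pruning.
  // Two scans over different partitions of the same relation look identical.
  def sameResultIgnoringPartitions(other: Scan): Boolean =
    relation == other.relation && metadata == other.metadata

  // Stricter comparison: scans match only if they also read the same partitions.
  def sameResultWithPartitions(other: Scan): Boolean =
    sameResultIgnoringPartitions(other) && partitions == other.partitions
}

object ScanReuseSketch {
  def main(args: Array[String]): Unit = {
    val meta = Map("Format" -> "parquet", "ReadSchema" -> "struct<id:int>")
    val left = Scan("t", meta, Set("p=1"))   // one side of a self-join
    val right = Scan("t", meta, Set("p=2"))  // other side, different partition

    // The loose check would let one scan's output be reused for the other.
    assert(left.sameResultIgnoringPartitions(right))
    // The strict check keeps the two scans distinct.
    assert(!left.sameResultWithPartitions(right))
  }
}
```

In this toy model the metadata map is identical for both scans, so any check built only on it cannot tell them apart. That is the same concern raised above about string representations not being uniquely identifying.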

cc @rxin

How was this patch tested?

Unit tests.

rxin · 2016-07-30T22:30:15Z

...core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala

+      def getPlan(df: DataFrame): SparkPlan = {
+        df.queryExecution.executedPlan
+      }
+      assert(getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 2"))))


Did you verify this would fail without your patch?

rxin · 2016-07-30T22:31:54Z

LGTM (assuming the test case would fail without the fix)

ericl · 2016-07-30T23:06:01Z

Yep, both fail prior to the fix.


SparkQA · 2016-07-30T23:49:21Z

Test build #63047 has finished for PR 14425 at commit a254540.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-07-31T05:47:36Z

Merging in master/2.0.

rxin · 2016-07-31T05:49:21Z

@ericl there is a conflict with branch-2.0. Can you create a pull request for branch-2.0?

…sets of partitions

This fixes a bug where the file scan operator does not take into account partition pruning in its implementation of `sameResult()`. As a result, executions may be incorrect on self-joins over the same base file relation.

The patch here is minimal, but we should reconsider relying on `metadata` for implementing `sameResult()` in the future, as string representations may not be uniquely identifying.

cc rxin

Unit tests.

Author: Eric Liang <[email protected]>

Closes apache#14425 from ericl/spark-16818.

Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala

ericl · 2016-07-31T05:56:21Z

Done, see #14427

…sets of partitions

#14425 rebased for branch-2.0

Author: Eric Liang <[email protected]>

Closes #14427 from ericl/spark-16818-br-2.

ericl added 2 commits July 30, 2016 15:02

Sat Jul 30 15:02:48 PDT 2016

e7e545f

Sat Jul 30 15:06:12 PDT 2016

a254540

rxin reviewed Jul 30, 2016

asfgit closed this in 957a8ab Jul 31, 2016

ericl mentioned this pull request Jul 31, 2016

[SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions #14427

Closed

asfgit pushed a commit that referenced this pull request Aug 2, 2016

[SPARK-16818] Exchange reuse incorrectly reuses scans over different …

5fbf5f9
