Skip to content

Conversation

@ericl
Copy link
Contributor

@ericl ericl commented Jul 30, 2016

What changes were proposed in this pull request?

This fixes a bug wherethe file scan operator does not take into account partition pruning in its implementation of sameResult(). As a result, executions may be incorrect on self-joins over the same base file relation.

The patch here is minimal, but we should reconsider relying on metadata for implementing sameResult() in the future, as string representations may not be uniquely identifying.

cc @rxin

How was this patch tested?

Unit tests.

def getPlan(df: DataFrame): SparkPlan = {
df.queryExecution.executedPlan
}
assert(getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 2"))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you verify this would fail without your patch?

@rxin
Copy link
Contributor

rxin commented Jul 30, 2016

LGTM (assuming the test case would fail without the fix)

@ericl
Copy link
Contributor Author

ericl commented Jul 30, 2016

Yep, both fail prior to the fix.

On Sat, Jul 30, 2016, 3:32 PM Reynold Xin [email protected] wrote:

LGTM (assuming the test case would fail without the fix)


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#14425 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAA6SgCkQqr5hfHqbT7FCL0ttYdqWOtRks5qa9EQgaJpZM4JY51Y
.

@SparkQA
Copy link

SparkQA commented Jul 30, 2016

Test build #63047 has finished for PR 14425 at commit a254540.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jul 31, 2016

Merging in master/2.0.

@rxin
Copy link
Contributor

rxin commented Jul 31, 2016

@ericl there is a conflict with branch-2.0. Can you create a pull request for branch-2.0?

@asfgit asfgit closed this in 957a8ab Jul 31, 2016
ericl added a commit to ericl/spark that referenced this pull request Jul 31, 2016
…sets of partitions

This fixes a bug wherethe file scan operator does not take into account partition pruning in its implementation of `sameResult()`. As a result, executions may be incorrect on self-joins over the same base file relation.

The patch here is minimal, but we should reconsider relying on `metadata` for implementing sameResult() in the future, as string representations may not be uniquely identifying.

cc rxin

Unit tests.

Author: Eric Liang <[email protected]>

Closes apache#14425 from ericl/spark-16818.

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
@ericl
Copy link
Contributor Author

ericl commented Jul 31, 2016

Done, see #14427

asfgit pushed a commit that referenced this pull request Aug 2, 2016
…sets of partitions

#14425 rebased for branch-2.0

Author: Eric Liang <[email protected]>

Closes #14427 from ericl/spark-16818-br-2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants