@cloud-fan
Contributor

This PR is an improvement for #5189.

The resolution rule for ORDER BY is: first resolve based on what comes from the select clause and then fall back on its child only when this fails.

There are 2 steps. First, try to resolve Sort in ResolveReferences based on the select clause, ignoring exceptions. Second, try to resolve Sort in ResolveSortReferences and add the missing projection.
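The two steps above can be sketched with a toy model. This is not Spark's analyzer code: `resolve_sort`, `select_output` and `child_output` are hypothetical names, and columns are modelled as plain strings.

```python
from typing import List, Optional, Tuple


def resolve_sort(order_by: str,
                 select_output: List[str],
                 child_output: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (resolved column, columns that must be added to the projection)."""
    # Step 1: prefer what the select clause produces.
    if order_by in select_output:
        return order_by, []
    # Step 2: fall back on the child and record the missing projection.
    if order_by in child_output:
        return order_by, [order_by]
    # Still unresolved: leave it for a later analysis check to report.
    return None, []
```

For example, `resolve_sort("b", ["a"], ["a", "b"])` falls through to the second step and reports that `b` has to be projected before sorting.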

However, the way we resolve SortOrder is wrong. We only resolve the UnresolvedAttribute and use that result to decide whether the SortOrder can be resolved. But the UnresolvedAttribute is only part of a GetField chain (broken by GetItem), so we need to walk the whole chain to decide whether the SortOrder can be resolved.
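A rough illustration of why checking only the leading attribute is not enough, assuming a schema modelled as nested dicts. The names `head_resolves` and `chain_resolves` are made up for this sketch, and GetItem is not modelled at all.

```python
from typing import Any, Dict, List

# Hypothetical schema model: field name -> nested dict (struct) or a type name.
Schema = Dict[str, Any]


def head_resolves(path: List[str], schema: Schema) -> bool:
    """Check only the leading attribute (the incomplete check)."""
    return path[0] in schema


def chain_resolves(path: List[str], schema: Schema) -> bool:
    """Walk the whole field-access chain, failing at the first missing field."""
    current: Any = schema
    for name in path:
        if not isinstance(current, dict) or name not in current:
            return False
        current = current[name]
    return True


# The select clause exposes a struct `a` with a single field `b`.
select_output: Schema = {"a": {"b": "int"}}
```

Here `head_resolves(["a", "c"], select_output)` is true because `a` exists, while `chain_resolves(["a", "c"], select_output)` is false because `a.c` does not, which is the case where the sort should fall back on the child.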

With this change, we can also avoid re-throwing the GetField exception in CheckAnalysis, which is a little ugly.

@AmplabJenkins

Can one of the admins verify this patch?

@marmbrus
Contributor

ok to test

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30861 has started for PR 5659 at commit ef6039c.

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30861 has finished for PR 5659 at commit ef6039c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30861/
Test PASSed.

@AmplabJenkins

Can one of the admins verify this patch?

@cloud-fan
Contributor Author

ping @marmbrus

@cloud-fan
Contributor Author

Retest this please.

Contributor

Two suggestions here:

  • Can we share the code with the block below and only add a try/catch around it?
  • I think we can probably avoid the changed optimization. The rule executor and transform already check whether the plan changed, which avoids churn when it does not. Either way, I think it's better to keep rules simple even if there is a small performance penalty.
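
The point about avoiding churn can be pictured with a generic fixed-point loop. This is a sketch of the general idea only, not Spark's rule executor: `Plan`, `run_to_fixed_point` and `drop_noop` are hypothetical, and a plan is modelled as a tuple of node names.

```python
from typing import Callable, List, Tuple

Plan = Tuple[str, ...]          # hypothetical immutable plan representation
Rule = Callable[[Plan], Plan]


def run_to_fixed_point(plan: Plan, rules: List[Rule],
                       max_iterations: int = 100) -> Tuple[Plan, int]:
    """Apply all rules repeatedly; stop once a full pass leaves the plan unchanged."""
    for iteration in range(1, max_iterations + 1):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            # Nothing changed in this pass, so further passes would be wasted work.
            return plan, iteration
        plan = new_plan
    return plan, max_iterations


def drop_noop(plan: Plan) -> Plan:
    """Example rule: remove 'noop' nodes; an equal plan comes back if none exist."""
    return tuple(node for node in plan if node != "noop")
```

Under this model a rule does not need its own "did anything change" shortcut: a rule that returns an equal plan simply lets the loop terminate on the next comparison.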

@cloud-fan force-pushed the order-by branch 3 times, most recently from 8c2e600 to d75cef0 (May 14, 2015 04:47)

@cloud-fan
Contributor Author

Retest this please.

@cloud-fan
Contributor Author

cc @marmbrus

@cloud-fan
Contributor Author

ping @marmbrus

@cloud-fan
Contributor Author

cc @marmbrus , is it OK to test?

@marmbrus
Contributor

add to whitelist

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 15, 2015

Test build #34909 has started for PR 5659 at commit 2ac76ea.

@SparkQA

SparkQA commented Jun 15, 2015

Test build #34909 has finished for PR 5659 at commit 2ac76ea.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 15, 2015

Test build #34910 has started for PR 5659 at commit d71f022.

@SparkQA

SparkQA commented Jun 15, 2015

Test build #34910 has finished for PR 5659 at commit d71f022.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #34984 has started for PR 5659 at commit e04b0e5.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #34984 has finished for PR 5659 at commit e04b0e5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class IsNull(child: Expression) extends UnaryExpression with Predicate
    • case class IsNotNull(child: Expression) extends UnaryExpression with Predicate

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #34989 has started for PR 5659 at commit cfa79f8.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #34989 has finished for PR 5659 at commit cfa79f8.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@cloud-fan
Contributor Author

retest it please.

@marmbrus
Contributor

test this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 17, 2015

Test build #35056 has started for PR 5659 at commit cfa79f8.

@SparkQA

SparkQA commented Jun 17, 2015

Test build #35056 has finished for PR 5659 at commit cfa79f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@marmbrus
Contributor

Thanks! Merging to master.

@asfgit closed this in 7f05b1f on Jun 17, 2015

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Author: Wenchen Fan <[email protected]>

Closes apache#5659 from cloud-fan/order-by and squashes the following commits:

cfa79f8 [Wenchen Fan] update test
3245d28 [Wenchen Fan] minor improve
465ee07 [Wenchen Fan] address comment
1fc41a2 [Wenchen Fan] fix SPARK-7067