
Conversation

Contributor

@AngersZhuuuu AngersZhuuuu commented Jul 8, 2019

What changes were proposed in this pull request?

With the current Catalyst structure, we can't add user-defined hints to the Analyzer. Catalyst resolves hints first when analyzing a LogicalPlan, and after it resolves its own hints it calls RemoveAllHints, so any hint rule added through SparkSessionExtensions is ignored.
This PR is a small extension to SparkSessionExtensions that enables users to add user-defined hint rules to the Analyzer.

FAQ:
Q: Why do we need this extension?
A: For example, Databricks' Delta platform adds more hints for better join behavior, such as RANK_JOIN and SKEW_JOIN; through these hints and their parameters, Catalyst can choose a better strategy for unhealthy join operators. We can also add hints that restrict user behavior, for example only allowing a query like SELECT * FROM TABLE when it carries a specific hint flag.
With this extension, you have more freedom to customize Catalyst.
And with a hint extension, you can combine your code with SparkSessionExtensions, which is safer.

Q: How do you write your own hints?
A: Below is a simple example; its function is to use a hint to apply a LIMIT to a subquery:

import java.util.Locale

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions.{IntegerLiteral, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{GlobalLimit, LocalLimit, LogicalPlan, UnresolvedHint}
import org.apache.spark.sql.catalyst.rules.Rule

object MyHint extends Rule[LogicalPlan] {
  // Your hint name: a Set of strings that are all treated as the same hint.
  private val MY_HINT_NAME = Set("MY_HINT")

  def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
    // Match when the hint appears in your SQL.
    case h: UnresolvedHint if MY_HINT_NAME.contains(h.name.toUpperCase(Locale.ROOT)) =>
      // Resolve the hint parameters.
      val limits = h.parameters match {
        case Seq(IntegerLiteral(n)) => n
        case Seq(n: Int) => n
        case _ =>
          throw new AnalysisException(s"${h.name} hint expects a limit number as its parameter")
      }
      // The resulting LogicalPlan: here it just adds a LIMIT to the current subquery.
      GlobalLimit(Literal(limits), LocalLimit(Literal(limits), h.child))
  }
}

SQL =>
SELECT /*+ MY_HINT(10) */ * FROM TEST_TABLE

behaves the same as
SELECT * FROM TEST_TABLE LIMIT 10
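
To make the example end-to-end, here is a rough sketch of registering the rule through the injectResolutionHint method added by this patch (the extension class name below is only for illustration and is not part of the patch):

import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}

// Hypothetical extension class, for illustration only.
class MyHintExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // RuleBuilder is SparkSession => Rule[LogicalPlan]; the session is not needed here.
    extensions.injectResolutionHint(_ => MyHint)
  }
}

val spark = SparkSession.builder()
  .master("local[*]")
  .withExtensions(new MyHintExtension)
  .getOrCreate()

spark.sql("SELECT /*+ MY_HINT(10) */ * FROM TEST_TABLE").show()

The same extension class could also be registered through the spark.sql.extensions configuration instead of withExtensions.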

How was this patch tested?

Added unit test.

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-28292] Enable inject user-defined Hint [WIP][SPARK-28292] Enable inject user-defined Hint Jul 8, 2019
@AngersZhuuuu AngersZhuuuu changed the title [WIP][SPARK-28292] Enable inject user-defined Hint [WIP][SPARK-28292][SQL] Enable inject user-defined Hint Jul 8, 2019
@AngersZhuuuu AngersZhuuuu changed the title [WIP][SPARK-28292][SQL] Enable inject user-defined Hint [SPARK-28292][SQL] Enable inject user-defined Hint Jul 8, 2019
* Inject an analyzer resolution `Hint` builder into the [[SparkSession]]. These analyzer
* rules will be executed as part of the resolution phase of analysis.
*/
def injectResolutionHint(builder: RuleBuilder): Unit = {
Member

Did you try to use def injectResolutionRule(builder: RuleBuilder)? Is it insufficient for your use-case?

Contributor Author

@dongjoon-hyun
For hints, the Analyzer calls ResolveHints.RemoveAllHints after it handles all of Spark's own hints.
So adding a hint rule through
def injectResolutionRule(builder: RuleBuilder)
won't work, since all hints have already been cleared by then. That's why I added a new method specifically for hints.
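Concretely, the intent is that the extended hint rules are inserted into the Analyzer's Hints batch before RemoveAllHints, so they still see the UnresolvedHint nodes. A rough sketch of the ordering (see the diff for the exact code):

Batch("Hints", fixedPoint,
  (new ResolveHints.ResolveJoinStrategyHints(conf) +:
    ResolveHints.ResolveCoalesceHints +:
    extendedResolutionHints) :+
    new ResolveHints.RemoveAllHints(conf): _*)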

new ResolveHints.RemoveAllHints(conf)),
new ResolveHints.ResolveJoinStrategyHints(conf) +:
ResolveHints.ResolveCoalesceHints +:
extendedResolutionHints :+
Member
Oh, I see.

@gatorsmile and @cloud-fan, what do you think about this extension?

Member

cc: @maryannxue, too. (I think this kind of extension is basically bug-prone and we always need a better design...)

Contributor Author

@maropu @maryannxue Any advice on how to improve this feature?

@dongjoon-hyun
Member

Thank you for your contribution, @AngersZhuuuu. Please add a test case.

@dongjoon-hyun
Member

ok to test

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107431 has finished for PR 25071 at commit e3db168.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

Thank you for your contribution, @AngersZhuuuu. Please add a test case.

I am looking for a place to add a test case.
By the way, this PR has been working well in our production environment.

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107432 has finished for PR 25071 at commit 23a8bcc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107462 has finished for PR 25071 at commit 4ea911a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 10, 2019

Test build #107464 has finished for PR 25071 at commit e283ad9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

@AngersZhuuuu . The regression test is used to protect your patch. Without that, your contribution can be disabled accidentally in the future. :)

By the way, this PR has been working well in our production environment.

@AngersZhuuuu
Contributor Author

AngersZhuuuu commented Jul 11, 2019

@AngersZhuuuu . The regression test is used to protect your patch. Without that, your contribution can be disabled accidentally in the future. :)

By the way, this PR has been working well in our production environment.

I know.
I'm not familiar with Spark's unit tests, so sometimes it takes time to find the right place and the right way to write them. I am learning how to write good unit tests for these.
Thank you very much for your guidance.

@SparkQA

SparkQA commented Jul 11, 2019

Test build #107496 has finished for PR 25071 at commit 8a60a22.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@AngersZhuuuu . The regression test is used to protect your patch. Without that, your contribution can be disabled accidentally in the future. :)

By the way, this PR has been working well in our production environment.

Unit test added and all tests pass.

@dongjoon-hyun
Member

Retest this please.

@maropu
Member

maropu commented Jul 16, 2019

Hi, @AngersZhuuuu, could you show us use cases for this extension and add them to the PR description? (I think this is one of the FAQs for these kinds of extensions.)

@AngersZhuuuu
Contributor Author

Hi, @AngersZhuuuu, could you show us use cases for this extension and add them to the PR description? (I think this is one of the FAQs for these kinds of extensions.)

I will do this later after work.

@SparkQA

SparkQA commented Jul 16, 2019

Test build #107717 has finished for PR 25071 at commit 8a60a22.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 16, 2019

retest this please

@AngersZhuuuu
Contributor Author

Hi, @AngersZhuuuu, could you show us use cases for this extension and add them to the PR description? (I think this is one of the FAQs for these kinds of extensions.)

I have updated the description; please review it and let me know what else we can add.

@SparkQA

SparkQA commented Jul 16, 2019

Test build #107740 has finished for PR 25071 at commit 8a60a22.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110419 has finished for PR 25071 at commit 5ca207c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maryannxue
Contributor

maryannxue commented Sep 10, 2019

Sorry to chime in late. First of all, I don't think it's a safe or meaningful change at this point. After join hint refactoring, we now have the assumption that there is no ResolvedHint node in the optimizer, all hints will either become a specific real operator (like repartitioning) or be associated with a Join operator. This means your custom hint-handling rule can only deal with the information at UnresolvedHint level, and I don't see a strong reason for doing that.

@AngersZhuuuu
Contributor Author

Sorry to chime in late. First of all, I don't think it's a safe or meaningful change at this point. After join hint refactoring, we now have the assumption that there is no ResolvedHint node in the optimizer, all hints will either become a specific real operator (like repartitioning) or be associated with a Join operator. This means your custom hint-handling rule can only deal with the information at UnresolvedHint level, and I don't see a strong reason for doing that.

In our environment, we use this feature to combine some hints we defined ourselves; different hints are combined for different Spark environments and different goals.

Maybe there is some threshold to using hints, but injecting unsafe rules or strategies can also cause problems.

@maryannxue
Contributor

maryannxue commented Sep 10, 2019

Maybe there is some threshold to using hints, but injecting unsafe rules or strategies can also cause problems.

Not sure if I understand you. But if you are saying that Spark already has some sort of unsafe extension interface so it shouldn't matter to add more, I would strongly disagree.

@AngersZhuuuu
Contributor Author

Maybe there is some threshold to using hints, but injecting unsafe rules or strategies can also cause problems.

Not sure if I understand you. But if you are saying that Spark already has some sort of unsafe extension interface so it shouldn't matter to add more, I would strongly disagree.

I think safety is not the point we need to be concerned about, but demand. For our environment, we need this to organize different hints.

@maryannxue
Contributor

I think safety is not the point we need to be concerned about, but demand.

If that were true, there would be no point of having interfaces at all, nor should we have this conversation here.

I suggest you either come up with a more complete solution to this demand of yours, or have your own fork.

@gatorsmile
Member

gatorsmile commented Sep 10, 2019

Adding new extensions is not needed. See the WIP PR: #25746

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110423 has finished for PR 25071 at commit af759fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

Closing this in favour of #25746

