[SPARK-28292][SQL] Enable inject user-defined Hint #25071
Conversation
 * Inject an analyzer resolution `Hint` builder into the [[SparkSession]]. These analyzer
 * rules will be executed as part of the resolution phase of analysis.
 */
def injectResolutionHint(builder: RuleBuilder): Unit = {
Did you try to use def injectResolutionRule(builder: RuleBuilder)? Is it insufficient for your use-case?
@dongjoon-hyun
For hints, the analyzer calls ResolveHints.RemoveAllHints after it has handled all of Spark's own hints.
So a hint rule added through
def injectResolutionRule(builder: RuleBuilder)
won't work, because all hints have already been cleared by then. That's why I added a new method for hints.
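The ordering problem described here can be illustrated with a toy model in plain Scala. This is only a sketch under stated assumptions: `Plan`, `Table`, `Hint`, `Marked`, `transformUp`, and `runBatches` are hypothetical stand-ins, not Spark's Catalyst classes, and the model assumes only what this thread states, namely that a cleanup rule strips every remaining hint before the injected resolution rules run.

```scala
// Toy model of analyzer batch ordering. None of these types are Spark's.
sealed trait Plan
case class Table(name: String) extends Plan
case class Hint(name: String, child: Plan) extends Plan
case class Marked(child: Plan) extends Plan // stands for "the custom hint took effect"

object BatchOrderingSketch {
  type Rule = Plan => Plan

  // Rewrite children first, then try the partial function on the node itself.
  def transformUp(plan: Plan)(pf: PartialFunction[Plan, Plan]): Plan = {
    val withNewChildren = plan match {
      case Hint(n, c) => Hint(n, transformUp(c)(pf))
      case Marked(c)  => Marked(transformUp(c)(pf))
      case t: Table   => t
    }
    pf.applyOrElse(withNewChildren, (p: Plan) => p)
  }

  // Stand-in for the cleanup rule: drops every hint node that is still present.
  val removeAllHints: Rule = plan => transformUp(plan) { case Hint(_, child) => child }

  // Hypothetical user-defined hint rule.
  val resolveMyHint: Rule = plan =>
    transformUp(plan) { case Hint("MY_HINT", child) => Marked(child) }

  // Run each batch in order, and each rule within a batch in order.
  def runBatches(batches: Seq[Seq[Rule]], plan: Plan): Plan =
    batches.flatten.foldLeft(plan)((p, rule) => rule(p))

  val query: Plan = Hint("MY_HINT", Table("TEST_TABLE"))

  // User rule injected into the later resolution batch: the hint is already gone.
  val viaResolutionRule: Plan =
    runBatches(Seq(Seq(removeAllHints), Seq(resolveMyHint)), query)

  // User rule placed in the hint batch, ahead of the cleanup rule.
  val viaHintBatch: Plan =
    runBatches(Seq(Seq(resolveMyHint, removeAllHints), Seq.empty), query)
}
```

In this model, `viaResolutionRule` is the bare `Table("TEST_TABLE")` because the custom rule never sees the hint, while `viaHintBatch` is `Marked(Table("TEST_TABLE"))`.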
new ResolveHints.RemoveAllHints(conf)),
new ResolveHints.ResolveJoinStrategyHints(conf) +:
ResolveHints.ResolveCoalesceHints +:
extendedResolutionHints :+
Oh, I see.
@gatorsmile and @cloud-fan, what do you think about this extension?
cc: @maryannxue, too. (I think this kind of extension is basically bug-prone and we always need a better design...)
@maropu @maryannxue Any advice on improving this feature?
Thank you for your contribution, @AngersZhuuuu. Please add a test case.
ok to test
Test build #107431 has finished for PR 25071 at commit
I am looking for a place to add the test case.
Test build #107432 has finished for PR 25071 at commit
Test build #107462 has finished for PR 25071 at commit
Test build #107464 has finished for PR 25071 at commit
@AngersZhuuuu, the regression test is there to protect your patch. Without it, your contribution could be accidentally broken in the future. :)
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
I understand.
Test build #107496 has finished for PR 25071 at commit
Unit test added, and all tests pass.
Retest this please.
Hi, @AngersZhuuuu, could you show us use cases for this extension and add them to the PR description? (I think this is a frequently asked question for these kinds of extensions.)
I will do this later, after work.
Test build #107717 has finished for PR 25071 at commit
retest this please
I have updated the description. Please review it and let me know what else we should add.
Test build #107740 has finished for PR 25071 at commit
Test build #110419 has finished for PR 25071 at commit
Sorry to chime in late. First of all, I don't think it's a safe or meaningful change at this point. After the join hint refactoring, we now have the assumption that there is no
In our environment, we use this feature to combine hints that we define ourselves. Different Spark environments combine different hints for different goals. There may be some barrier to using hints, but injecting unsafe rules or strategies can cause problems too.
Not sure if I understand you. But if you are saying that Spark already has some sort of unsafe extension interface so it shouldn't matter to add more, I would strongly disagree.
I think the point to consider here is demand, not safety. In our environment, we need this to organize different hints.
If that were true, there would be no point in having interfaces at all, nor should we have this conversation here. I suggest you either come up with a more complete solution to this demand of yours, or have your own fork.
Adding new extensions is not needed. See the WIP PR: #25746
Test build #110423 has finished for PR 25071 at commit
Can one of the admins verify this patch?
Closing this in favour of #25746 |
What changes were proposed in this pull request?
With the current Catalyst design, we can't add user-defined hints to the Analyzer. Catalyst resolves hints first when it analyzes a LogicalPlan; after it resolves its own hints, it calls RemoveAllHints, so a hint rule added through SparkSessionExtensions is ignored.
This PR is a small extension to SparkSessionExtensions. It enables users to add user-defined hints to the Analyzer.
FAQ:
Q: Why do we need this extension?
A: Databricks, for example, adds more hints in its Delta platform for better join behavior, such as RANK_JOIN and SKEW_JOIN. Through these hints and their parameters, Catalyst can choose a better way to handle an unhealthy join operator. We can also add other hints to restrict user behavior, for example only allowing a query like SELECT * FROM TABLE to be submitted when it carries a specific hint flag.
With this extension, you have more freedom to customize Catalyst.
And with a hint extension, you can combine your code with SparkSessionExtensions, which is safer.
Q: How do you write your hints?
A: Below is a simple example. It uses a hint to pass a LIMIT to a query:
SQL =>
SELECT /*+ MY_HINT(10) */ * FROM TEST_TABLE
It will return the same result as
SELECT * FROM TEST_TABLE LIMIT 10
How was this patch tested?
Added unit test.
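Returning to the MY_HINT example in the FAQ, the rewrite such a rule performs might look roughly like the following plain-Scala sketch. `Node`, `Scan`, `HintNode`, `LimitNode`, and `MyHintSketch` are hypothetical stand-ins for Catalyst plan nodes, used so the example runs without Spark. A real rule would extend Catalyst's `Rule[LogicalPlan]` and would be registered through the `injectResolutionHint` method proposed in this PR.

```scala
// Toy plan nodes. These are illustrative only and are not Spark classes.
sealed trait Node
case class Scan(table: String) extends Node
case class HintNode(name: String, params: Seq[Any], child: Node) extends Node
case class LimitNode(n: Int, child: Node) extends Node

object MyHintSketch {
  // Replace MY_HINT(n) with a limit of n rows on the hinted subtree.
  // Hints with another name or an unexpected parameter list are left alone.
  def resolveMyHint(plan: Node): Node = plan match {
    case HintNode(name, Seq(n: Int), child) if name.equalsIgnoreCase("MY_HINT") =>
      LimitNode(n, resolveMyHint(child))
    case HintNode(name, params, child) =>
      HintNode(name, params, resolveMyHint(child))
    case LimitNode(n, child) =>
      LimitNode(n, resolveMyHint(child))
    case s: Scan => s
  }

  // SELECT /*+ MY_HINT(10) */ * FROM TEST_TABLE, as a toy plan.
  val hinted: Node = HintNode("MY_HINT", Seq(10), Scan("TEST_TABLE"))

  // SELECT * FROM TEST_TABLE LIMIT 10, as a toy plan.
  val expected: Node = LimitNode(10, Scan("TEST_TABLE"))
}
```

In this sketch, `MyHintSketch.resolveMyHint(MyHintSketch.hinted)` equals `MyHintSketch.expected`. A hint whose parameters do not match, such as `MY_HINT("ten")`, passes through unchanged.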