[SPARK-16475][SQL] Broadcast hint for SQL Queries #16925
Closed
Changes from 1 commit
7 commits
539782d [SPARK-16475][SQL] Broadcast Hint for SQL Queries (dongjoon-hyun)
318bc03 Merge pull request #14426 from dongjoon-hyun/SPARK-16475-HINT (rxin)
c702e3e Get rid of the merge leftover. (rxin)
a095df3 Move rule out to its own file. (rxin)
617f8a2 Rewrote the PR. (rxin)
51a73d5 Separate rule. (rxin)
0d42978 CR (rxin)
Separate rule.
commit 51a73d510a5faf3bbed8bb76ffe5590c68a67e2c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,62 +24,80 @@ import org.apache.spark.sql.catalyst.trees.CurrentOrigin | |
|
|
||
|
|
||
| /** | ||
| * Substitute Hints. | ||
| * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters. | ||
| * Collection of rules related to hints. The only hint currently available is broadcast join hint. | ||
| * | ||
| * In the case of broadcast hint, we find the frontier of | ||
| * | ||
| * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations` | ||
| * rule is applied. Here are two reasons. | ||
| * - To support `MetastoreRelation` in Hive module. | ||
| * - To reduce the effect of `Hint` on the other rules. | ||
| * | ||
| * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan. | ||
| * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here. | ||
| * Note that this is separated into two rules because in the future we might introduce new hint | ||
| * rules that have different ordering requirements from broadcast. | ||
| */ | ||
| class SubstituteHints(conf: CatalystConf) extends Rule[LogicalPlan] { | ||
| private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN") | ||
| object SubstituteHints { | ||
|
|
||
| /** | ||
| * Substitute Hints. | ||
| * | ||
| * The only hint currently available is broadcast join hint. | ||
| * | ||
| * For broadcast hint, we accept "BROADCAST", "BROADCASTJOIN", and "MAPJOIN", and a sequence of | ||
| * relation aliases can be specified in the hint. A broadcast hint plan node will be inserted | ||
| * on top of any relation (that is not aliased differently), subquery, or common table expression | ||
| * that match the specified name. | ||
| * | ||
| * The hint resolution works by recursively traversing down the query plan to find a relation or | ||
| * subquery that matches one of the specified broadcast aliases. The traversal does not go | ||
| * beyond any existing broadcast hints or subquery aliases. | ||
| * | ||
| * This rule must happen before common table expressions. | ||
| */ | ||
| class SubstituteBroadcastHints(conf: CatalystConf) extends Rule[LogicalPlan] { | ||
| private val BROADCAST_HINT_NAMES = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN") | ||
|
|
||
| def resolver: Resolver = conf.resolver | ||
| def resolver: Resolver = conf.resolver | ||
|
|
||
| private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = { | ||
| // Whether to continue recursing down the tree | ||
| var recurse = true | ||
| private def applyBroadcastHint(plan: LogicalPlan, toBroadcast: Set[String]): LogicalPlan = { | ||
| // Whether to continue recursing down the tree | ||
| var recurse = true | ||
|
|
||
| val newNode = CurrentOrigin.withOrigin(plan.origin) { | ||
| plan match { | ||
| case r: UnresolvedRelation => | ||
| val alias = r.alias.getOrElse(r.tableIdentifier.table) | ||
| if (toBroadcast.exists(resolver(_, alias))) BroadcastHint(plan) else plan | ||
| case r: SubqueryAlias => | ||
| if (toBroadcast.exists(resolver(_, r.alias))) { | ||
| BroadcastHint(plan) | ||
| } else { | ||
| // Don't recurse down subquery aliases if there is no match. | ||
| val newNode = CurrentOrigin.withOrigin(plan.origin) { | ||
| plan match { | ||
| case r: UnresolvedRelation => | ||
| val alias = r.alias.getOrElse(r.tableIdentifier.table) | ||
| if (toBroadcast.exists(resolver(_, alias))) BroadcastHint(plan) else plan | ||
| case r: SubqueryAlias => | ||
| if (toBroadcast.exists(resolver(_, r.alias))) { | ||
| BroadcastHint(plan) | ||
| } else { | ||
| // Don't recurse down subquery aliases if there is no match. | ||
| recurse = false | ||
| plan | ||
| } | ||
| case _: BroadcastHint => | ||
| // Found a broadcast hint; don't change the plan but also don't recurse down. | ||
| recurse = false | ||
| plan | ||
| } | ||
| case _: BroadcastHint => | ||
| // Found a broadcast hint; don't change the plan but also don't recurse down. | ||
| recurse = false | ||
| plan | ||
| case _ => | ||
| plan | ||
| case _ => | ||
| plan | ||
| } | ||
| } | ||
|
|
||
| if ((plan fastEquals newNode) && recurse) { | ||
| newNode.mapChildren(child => applyBroadcastHint(child, toBroadcast)) | ||
| } else { | ||
| newNode | ||
| } | ||
| } | ||
|
|
||
| if ((plan fastEquals newNode) && recurse) { | ||
| newNode.mapChildren(child => applyBroadcastHint(child, toBroadcast)) | ||
| } else { | ||
| newNode | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { | ||
| case h: Hint if BROADCAST_HINT_NAMES.contains(h.name.toUpperCase) => | ||
| applyBroadcastHint(h.child, h.parameters.toSet) | ||
| } | ||
| } | ||
|
|
||
| def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { | ||
| case h: Hint if BROADCAST_HINT_NAMES.contains(h.name.toUpperCase) => | ||
| applyBroadcastHint(h.child, h.parameters.toSet) | ||
|
|
||
| // Remove unrecognized hints | ||
| case h: Hint => h.child | ||
| /** | ||
| * Removes all the hints. This must be executed after all the other hint rules are executed. | ||
|
||
| */ | ||
| object RemoveAllHints extends Rule[LogicalPlan] { | ||
| def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { | ||
| case h: Hint => h.child | ||
| } | ||
| } | ||
|
|
||
| } | ||
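To make the traversal in this diff easier to follow, here is a minimal sketch of the same idea on a toy plan tree with no Spark dependency. The case classes, the `ToyHints` object, the `matches` helper (standing in for `conf.resolver`) and the hand-written `transformUp` are all assumptions made for illustration; they are not the Catalyst classes touched by this PR. The sketch wraps matching relations and subquery aliases in a broadcast marker, stops at non-matching subquery aliases and at existing broadcast hints, and strips any leftover hints in a separate step.

```scala
// Toy plan model: illustrative stand-ins, NOT Spark's Catalyst classes.
sealed trait Plan
case class Relation(name: String, alias: Option[String] = None) extends Plan
case class SubqueryAlias(alias: String, child: Plan) extends Plan
case class Join(left: Plan, right: Plan) extends Plan
case class BroadcastHint(child: Plan) extends Plan
case class Hint(name: String, parameters: Seq[String], child: Plan) extends Plan

object ToyHints {
  private val BroadcastHintNames = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")

  // Case-insensitive name matching; an assumed stand-in for conf.resolver.
  private def matches(a: String, b: String): Boolean = a.equalsIgnoreCase(b)

  // Walk down from the hint and wrap the first matching relation or subquery
  // on each path. Non-matching subquery aliases and existing broadcast hints
  // end the traversal on that path.
  def applyBroadcastHint(plan: Plan, toBroadcast: Set[String]): Plan = plan match {
    case r: Relation =>
      val alias = r.alias.getOrElse(r.name)
      if (toBroadcast.exists(matches(_, alias))) BroadcastHint(r) else r
    case s: SubqueryAlias =>
      if (toBroadcast.exists(matches(_, s.alias))) BroadcastHint(s) else s
    case b: BroadcastHint => b
    case Join(l, r) =>
      Join(applyBroadcastHint(l, toBroadcast), applyBroadcastHint(r, toBroadcast))
    case Hint(n, p, c) => Hint(n, p, applyBroadcastHint(c, toBroadcast))
  }

  // Bottom-up rewrite: rewrite the children first, then try the rule on the node.
  private def transformUp(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val rewritten: Plan = plan match {
      case Join(l, r) => Join(transformUp(l)(rule), transformUp(r)(rule))
      case SubqueryAlias(a, c) => SubqueryAlias(a, transformUp(c)(rule))
      case BroadcastHint(c) => BroadcastHint(transformUp(c)(rule))
      case Hint(n, p, c) => Hint(n, p, transformUp(c)(rule))
      case r: Relation => r
    }
    if (rule.isDefinedAt(rewritten)) rule(rewritten) else rewritten
  }

  // First step: replace recognized broadcast hints with broadcast markers.
  def substituteBroadcastHints(plan: Plan): Plan = transformUp(plan) {
    case Hint(name, params, child) if BroadcastHintNames.contains(name.toUpperCase) =>
      applyBroadcastHint(child, params.toSet)
  }

  // Second step: drop whatever hints are left, recognized or not.
  def removeAllHints(plan: Plan): Plan = transformUp(plan) {
    case Hint(_, _, child) => child
  }
}
```

In this toy model, a `MAPJOIN` hint naming `t` over `Join(Relation("t"), SubqueryAlias("s", Relation("t")))` marks only the left relation, because the traversal stops at the non-matching alias `s`. Keeping removal as its own step matches the motivation given in the doc comment: a later hint rule could run between the two steps.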
In the case of a self-join, we may broadcast both sides. Is that expected?
That's fine. Both being broadcastable doesn't mean we broadcast both.
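For readers following the exchange above, here is a small sketch of the distinction between marking and broadcasting, again on a toy model. Everything in it is hypothetical: `SelfJoinSketch`, `mark`, `chooseBuildSide` and the prefer-right policy are invented for illustration and are not Spark's planner logic. It only shows that a self-join can end up with both sides marked broadcastable while a broadcast join still uses a single build side.

```scala
// Toy model: illustrative stand-ins, NOT Spark's Catalyst or planner classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class BroadcastHint(child: Plan) extends Plan
case class Join(left: Plan, right: Plan) extends Plan

sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

object SelfJoinSketch {
  // Simplified hint rule: mark every relation whose name was listed in the hint.
  def mark(plan: Plan, toBroadcast: Set[String]): Plan = plan match {
    case r: Relation => if (toBroadcast.contains(r.name)) BroadcastHint(r) else r
    case b: BroadcastHint => b
    case Join(l, r) => Join(mark(l, toBroadcast), mark(r, toBroadcast))
  }

  private def broadcastable(plan: Plan): Boolean = plan.isInstanceOf[BroadcastHint]

  // Hypothetical policy: a broadcast join has one build side, so pick at most
  // one. Preferring the right side is an arbitrary choice made for this sketch.
  def chooseBuildSide(join: Join): Option[BuildSide] =
    if (broadcastable(join.right)) Some(BuildRight)
    else if (broadcastable(join.left)) Some(BuildLeft)
    else None
}
```

With `t` joined to itself and a hint naming `t`, `mark` wraps both sides, yet `chooseBuildSide` still returns a single side.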