Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
polish the comment for why execute the OptimizaLocalShuffleReader rul…
…e twice and before ReduceNumShufflePartitions
  • Loading branch information
JkSelf committed Oct 23, 2019
commit b372636f8820ece343e7fd7c1e1a0d1f60fadb8f
Original file line number Diff line number Diff line change
Expand Up @@ -92,13 +92,14 @@ case class AdaptiveSparkPlanExec(
// optimizations should be stage-independent.
@transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
ReuseAdaptiveSubquery(conf, subqueryCache),
// We will revert the all local shuffle reader node in OptimizeLocalShuffleReader rule
// when introduce additional shuffle. It may be too conservative. So we also re-execute this
// rule when creating new query stage.
// Why need put the OptimizeLocalShuffleReader rule before ReduceNumShufflePartitions rule:
// After we optimize the shuffle reader to local shuffle reader(leaf node),
// it does not need to further optimize the reduce number. So we need to put
// the OptimizeLocalShuffleReader rule before ReduceNumShufflePartitions rule.

// When adding local shuffle readers in 'OptimizeLocalShuffleReader`, we revert all the local
// readers if additional shuffles are introduced. This may be too conservative: maybe there is
// only one local reader that introduces shuffle, and we can still keep other local readers.
// Here we re-execute this rule with the sub-plan-tree of a query stage, to make sure necessary
// local readers are added before executing the query stage.
// This rule must be executed before `ReduceNumShufflePartitions`, as local shuffle readers
// can't change number of partitions.
OptimizeLocalShuffleReader(conf),
ReduceNumShufflePartitions(conf),
ApplyColumnarRulesAndInsertTransitions(session.sessionState.conf,
Expand Down