[SPARK-22614] Dataset API: repartitionByRange(...) #19828
In `RepartitionByExpression` (Catalyst):

```diff
@@ -839,8 +839,6 @@ case class RepartitionByExpression(
   require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
   require(partitionExpressions.nonEmpty, "At least one partition-by expression must be specified.")
 
   val partitioning: Partitioning = {
     val (sortOrder, nonSortOrder) = partitionExpressions.partition(_.isInstanceOf[SortOrder])
```
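The split on `_.isInstanceOf[SortOrder]` is the core of how the operator chooses a partitioning. A Spark-free sketch of that combinator (the `Expression` and `SortOrder` classes below are simplified stand-ins for illustration, not Catalyst's actual types):

```scala
// Simplified stand-ins for Catalyst expressions (illustration only, not Spark code).
sealed trait Expression
case class AttributeReference(name: String) extends Expression
case class SortOrder(child: Expression, ascending: Boolean) extends Expression

object PartitionSplitDemo extends App {
  val partitionExpressions: Seq[Expression] =
    Seq(SortOrder(AttributeReference("a"), ascending = true), AttributeReference("b"))

  // Same shape as the snippet above: SortOrders on the left, everything else on the right.
  val (sortOrder, nonSortOrder) =
    partitionExpressions.partition(_.isInstanceOf[SortOrder])

  println(s"sortOrder = $sortOrder")       // contains the SortOrder over "a"
  println(s"nonSortOrder = $nonSortOrder") // contains the bare reference to "b"
}
```

In the real operator a non-empty `sortOrder` side is what triggers range partitioning; a mix of `SortOrder` and plain expressions, as constructed here for illustration, is not a meaningful request.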
**Member:** Still the same question: what happens when the `SortOrder` is not at the root node?

**Contributor (author):** It's going to follow the
In `Dataset.repartition`:

```diff
@@ -2732,16 +2732,18 @@ class Dataset[T] private[sql](
    * @since 2.0.0
    */
   @scala.annotation.varargs
-  def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T] = withTypedPlan {
+  def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T] = {
     // The underlying `LogicalPlan` operator special-cases all-`SortOrder` arguments.
-    // However, we don't want to complicate the semantics of this API method. Instead, let's
-    // give users a friendly error message, pointing them to the new method.
+    // However, we don't want to complicate the semantics of this API method.
+    // Instead, let's give users a friendly error message, pointing them to the new method.
     val sortOrders = partitionExprs.filter(_.expr.isInstanceOf[SortOrder])
     if (sortOrders.nonEmpty) throw new IllegalArgumentException(
       s"""Invalid partitionExprs specified: $sortOrders
          |For range partitioning use repartitionByRange(...) instead.
        """.stripMargin)
-    RepartitionByExpression(partitionExprs.map(_.expr), logicalPlan, numPartitions)
+    withTypedPlan {
+      RepartitionByExpression(partitionExprs.map(_.expr), logicalPlan, numPartitions)
+    }
   }

   /**
```
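After this change, `repartition` rejects sort-order expressions up front instead of silently special-casing them. A hedged usage sketch (assumes a local `SparkSession` and Spark 2.3+ on the classpath; not run here):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

val ds = Seq((1, "a"), (2, "b"), (3, "c")).toDS()

// Plain column expressions: hash partitioning, unchanged behavior.
val hashed = ds.repartition(4, col("_1"))

// A SortOrder argument now fails fast with IllegalArgumentException,
// directing the caller to repartitionByRange(...):
// ds.repartition(4, col("_1").asc)
```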
```diff
@@ -2763,27 +2765,32 @@ class Dataset[T] private[sql](
    * Returns a new Dataset partitioned by the given partitioning expressions into
    * `numPartitions`. The resulting Dataset is range partitioned.
    *
+   * At least one partition-by expression must be specified.
+   * When no explicit sort order is specified, "ascending nulls first" is assumed.
+   *
    * @group typedrel
    * @since 2.3.0
    */
   @scala.annotation.varargs
-  def repartitionByRange(numPartitions: Int, partitionExprs: Column*): Dataset[T] = withTypedPlan {
-    val sortOrder: Seq[SortOrder] = partitionExprs.map { col =>
-      col.expr match {
-        case expr: SortOrder =>
-          expr
-        case expr: Expression =>
-          SortOrder(expr, Ascending)
-      }
-    }
-    RepartitionByExpression(sortOrder, logicalPlan, numPartitions)
+  def repartitionByRange(numPartitions: Int, partitionExprs: Column*): Dataset[T] = {
+    require(partitionExprs.nonEmpty, "At least one partition-by expression must be specified.")
+    val sortOrder: Seq[SortOrder] = partitionExprs.map(_.expr match {
+      case expr: SortOrder => expr
+      case expr: Expression => SortOrder(expr, Ascending)
+    })
+    withTypedPlan {
+      RepartitionByExpression(sortOrder, logicalPlan, numPartitions)
+    }
   }
```

**Member** (on the scaladoc): Could you update this to describe the latest change?
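Usage of the new method, as a hedged sketch (assumes an in-scope `SparkSession` named `spark`; not run here):

```scala
import spark.implicits._

val ds = Seq((3, "c"), (1, "a"), (2, "b")).toDS()

// Explicit sort order: range-partition descending on _1.
val byRangeDesc = ds.repartitionByRange(4, $"_1".desc)

// Bare columns default to "ascending nulls first", per the scaladoc.
val byRangeAsc = ds.repartitionByRange(4, $"_1", $"_2")

// No expressions at all is rejected by the new require(...):
// ds.repartitionByRange(4)  // IllegalArgumentException
```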
The scaladoc for the overload that defaults to `spark.sql.shuffle.partitions`:

```scala
  /**
   * Returns a new Dataset partitioned by the given partitioning expressions, using
   * `spark.sql.shuffle.partitions` as number of partitions.
   * The resulting Dataset is range partitioned.
   *
   * At least one partition-by expression must be specified.
   * When no explicit sort order is specified, "ascending nulls first" is assumed.
   *
   * @group typedrel
   * @since 2.3.0
   */
```
**Reviewer:** Just for safety, also keep this change?

**Author:** That would change the current behavior of `.repartition(numPartitions, Seq.empty: _*)`, and I'd like to avoid that. In fact, I've just raised a separate ticket about the latter: SPARK-22665.
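For context on the author's objection, a hedged sketch of the call shape that a `nonEmpty` requirement on `repartition` would break (the semantics of the empty case are exactly what SPARK-22665 questions; `ds` is assumed to be an in-scope `Dataset`):

```scala
import org.apache.spark.sql.Column

// Currently legal: an empty varargs list reaches RepartitionByExpression
// with no partition expressions. Adding require(partitionExprs.nonEmpty, ...)
// to repartition would turn this existing call pattern into a runtime error.
val repartitioned = ds.repartition(4, Seq.empty[Column]: _*)
```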