[SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation #8038

andrewor14 · 2015-08-07T20:44:09Z

This is the sister patch to #8011, but for aggregation.

In a nutshell: create the TungstenAggregationIterator before computing the parent partition. Internally this creates a BytesToBytesMap which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.
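
To make the ordering concrete, here is a minimal Scala sketch. It is a toy model, not Spark code: `PagePool`, `GreedyParent`, `EagerAggregator`, and `StarvationDemo` are hypothetical names standing in for the memory manager, the parent partition, and the aggregation iterator. It assumes the parent takes every free page while it is computed, and that the operator reserves one page in its constructor and throws if it cannot.

```scala
// Toy model only: these classes are hypothetical and do not mirror
// Spark's memory manager or TungstenAggregationIterator.
final class PagePool(totalPages: Int) {
  private var free = totalPages

  /** Hands out one page if any is left; returns false otherwise. */
  def tryAcquire(): Boolean =
    if (free > 0) { free -= 1; true } else false

  def freePages: Int = free
}

/** Stands in for the parent partition: greedily takes every page still free. */
final class GreedyParent(pool: PagePool) {
  def compute(): Iterator[Int] = {
    while (pool.tryAcquire()) {}
    Iterator(1, 2, 3)
  }
}

/** Stands in for the aggregation iterator: reserves one page on construction. */
final class EagerAggregator(pool: PagePool) {
  // Fail fast if even a single page is unavailable.
  if (!pool.tryAcquire()) {
    throw new IllegalStateException("unable to acquire memory")
  }

  def sum(rows: Iterator[Int]): Int = rows.sum
}

object StarvationDemo {
  /** Parent computed first: the aggregator is constructed too late and fails. */
  def parentFirst(pages: Int): Either[String, Int] = {
    val pool = new PagePool(pages)
    val rows = new GreedyParent(pool).compute()
    try Right(new EagerAggregator(pool).sum(rows))
    catch { case e: IllegalStateException => Left(e.getMessage) }
  }

  /** Aggregator constructed first: its page is already reserved. */
  def aggregatorFirst(pages: Int): Either[String, Int] = {
    val pool = new PagePool(pages)
    try {
      val agg = new EagerAggregator(pool)
      val rows = new GreedyParent(pool).compute()
      Right(agg.sum(rows))
    } catch { case e: IllegalStateException => Left(e.getMessage) }
  }

  def main(args: Array[String]): Unit = {
    println(parentFirst(4))     // Left(unable to acquire memory)
    println(aggregatorFirst(4)) // Right(6)
  }
}
```

With four pages in the pool, `parentFirst` starves the aggregator, while `aggregatorFirst` succeeds because the page was reserved before the parent ran. If the pool is already empty, construction itself fails immediately.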

@rxin @yhuai

…emory-agg

yhuai · 2015-08-07T21:17:55Z

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala

What will happen if there is no memory space left to reserve?

We'll fail fast with an "unable to acquire memory" exception.

…emory-agg Conflicts: core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala

SparkQA · 2015-08-07T23:36:26Z

Test build #40190 has finished for PR 8038 at commit ca1b44c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-08T00:22:16Z

Test build #40195 has finished for PR 8038 at commit 355a9bd.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class JoinedRow extends InternalRow

rxin · 2015-08-09T07:58:53Z

@andrewor14 can you bring this up to date?

…emory-agg Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala

In TungstenAggregate, we fall back to sort-based aggregation if the hash-based approach cannot request more memory. To do this, we create a new sorter from an existing unsafe map destructively. Because this is largely in place, we don't need to reserve a page in the sorter's constructor.
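
The fallback can be sketched roughly as follows. This is a toy model in plain Scala, not the actual operator: `FallbackAggregation`, `AggResult`, and the `maxKeys` budget are hypothetical and only stand in for the unsafe map, the sorter, and the memory limit. It does not model pages; it only shows the map's contents being handed over to the sort path and the map being emptied.

```scala
import scala.collection.mutable

// Toy model only: all names here are hypothetical, and "maxKeys" stands in
// for the point at which the hash map cannot request more memory.
final case class AggResult(groups: Vector[(String, Long)], usedSortFallback: Boolean)

object FallbackAggregation {
  /** Sums values per key, switching from hashing to sorting once the map is full. */
  def sumByKey(rows: Iterator[(String, Long)], maxKeys: Int): AggResult = {
    val map = mutable.HashMap.empty[String, Long]
    while (rows.hasNext) {
      val (key, value) = rows.next()
      if (map.contains(key)) {
        map(key) += value
      } else if (map.size < maxKeys) {
        map(key) = value
      } else {
        // The map cannot grow: move its partial sums into the sort buffer
        // and empty the map (the "destructive" hand-over).
        val buffer = mutable.ArrayBuffer.empty[(String, Long)]
        buffer ++= map
        map.clear()
        buffer += ((key, value))
        buffer ++= rows // remaining input goes straight to the sort path
        return AggResult(mergeSorted(buffer.sortBy(_._1).toVector), usedSortFallback = true)
      }
    }
    AggResult(map.toVector.sortBy(_._1), usedSortFallback = false)
  }

  /** Merges adjacent entries that share a key in a key-sorted sequence. */
  private def mergeSorted(sorted: Vector[(String, Long)]): Vector[(String, Long)] = {
    val out = mutable.ArrayBuffer.empty[(String, Long)]
    for ((key, value) <- sorted) {
      if (out.nonEmpty && out.last._1 == key) {
        out(out.length - 1) = (key, out.last._2 + value)
      } else {
        out += ((key, value))
      }
    }
    out.toVector
  }
}
```

Both paths produce the same per-key sums; only the `usedSortFallback` flag differs, depending on whether the key budget was exhausted.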

yhuai · 2015-08-10T20:40:51Z

test this please.

yhuai · 2015-08-10T20:48:52Z

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala

It seems we should move this comment to the else block. This branch is instead used when there are no input rows and no grouping expressions.

good catch.

…emory-agg

yhuai · 2015-08-10T21:40:05Z

LGTM. Let's wait for jenkins.

SparkQA · 2015-08-10T21:51:54Z

Test build #40328 has finished for PR 8038 at commit b4d3633.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2015-08-10T21:55:32Z

test this please.

SparkQA · 2015-08-10T23:40:44Z

Test build #40317 timed out for PR 8038 at commit 4d416d0 after a configured wait of 175m.

yhuai · 2015-08-10T23:43:26Z

test this please.

SparkQA · 2015-08-11T00:56:26Z

Test build #40334 timed out for PR 8038 at commit b4d3633 after a configured wait of 175m.

yhuai · 2015-08-11T01:57:40Z

...c/test/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIteratorSuite.scala

I feel the SparkContext we are creating here messed up the following tests. How about we comment it out and try the PR builder?

Actually, is it possible to create the taskMemoryManager and shuffleMemoryManager without creating a new SparkContext?

yeah I can figure something out

(this is why we shouldn't have singleton SQLContexts!)

SparkQA · 2015-08-11T02:38:37Z

Test build #1428 timed out for PR 8038 at commit b4d3633 after a configured wait of 175m.

SparkQA · 2015-08-11T20:14:26Z

Test build #40472 timed out for PR 8038 at commit b10a4f3 after a configured wait of 175m.

…emory-agg Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala

andrewor14 · 2015-08-11T21:30:16Z

retest this please

SparkQA · 2015-08-11T21:47:00Z

Test build #1454 has finished for PR 8038 at commit 94ca5de.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2015-08-11T21:58:53Z

retest this please

yhuai · 2015-08-11T22:33:23Z

I just triggered another build as a backup.

SparkQA · 2015-08-11T23:22:14Z

Test build #40525 has finished for PR 8038 at commit 94ca5de.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2015-08-11T23:24:45Z

compilation failed?

yhuai · 2015-08-11T23:34:01Z

[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/core/src/test/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIteratorSuite.scala:42: not enough arguments for constructor TungstenAggregationIterator: (groupingExpressions: Seq[org.apache.spark.sql.catalyst.expressions.NamedExpression], nonCompleteAggregateExpressions: Seq[org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression2], completeAggregateExpressions: Seq[org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression2], initialInputBufferOffset: Int, resultExpressions: Seq[org.apache.spark.sql.catalyst.expressions.NamedExpression], newMutableProjection: (Seq[org.apache.spark.sql.catalyst.expressions.Expression], Seq[org.apache.spark.sql.catalyst.expressions.Attribute]) => () => org.apache.spark.sql.catalyst.expressions.MutableProjection, originalInputAttributes: Seq[org.apache.spark.sql.catalyst.expressions.Attribute], testFallbackStartsAt: Option[Int], numInputRows: org.apache.spark.sql.execution.metric.LongSQLMetric, numOutputRows: org.apache.spark.sql.execution.metric.LongSQLMetric)org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.
[error] Unspecified value parameters numInputRows, numOutputRows.
[error]       iter = new TungstenAggregationIterator(
[error]              ^
[info] Done updating.
[info] Compiling 2 Scala sources to /home/jenkins/workspace/SparkPullRequestBuilder@2/repl/target/scala-2.10/test-classes...
[error] one error found

SparkQA · 2015-08-11T23:44:18Z

Test build #1463 has finished for PR 8038 at commit 94ca5de.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2015-08-12T00:48:45Z

OK, I fixed it. Jenkins, retest this please?

SparkQA · 2015-08-12T03:49:11Z

Test build #1470 has finished for PR 8038 at commit d4dc9ca.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class SQLTransformer (override val uid: String) extends Transformer
- case class EqualNullSafe(attribute: String, value: Any) extends Filter

yhuai · 2015-08-12T04:03:02Z

[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/sql/core/src/test/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIteratorSuite.scala:43: constructor LongSQLMetric in class LongSQLMetric cannot be accessed in class TungstenAggregationIteratorSuite
[error]       val dummyAccum = new LongSQLMetric("dummy")
[error]                        ^
[info] Compiling 2 Scala sources to /home/jenkins/workspace/NewSparkPullRequestBuilder/repl/target/scala-2.10/test-classes...
[error] one error found

…emory-agg

SparkQA · 2015-08-12T09:19:25Z

Test build #40597 timed out for PR 8038 at commit 7ebf6b9 after a configured wait of 175m.

SparkQA · 2015-08-12T10:19:45Z

Test build #1477 timed out for PR 8038 at commit 7ebf6b9 after a configured wait of 175m.

SparkQA · 2015-08-12T10:20:03Z

Test build #1478 timed out for PR 8038 at commit 7ebf6b9 after a configured wait of 175m.

andrewor14 · 2015-08-12T16:18:56Z

So weird... this finished running all python tests successfully too but still timed out somehow?

rxin · 2015-08-12T17:08:20Z

I'm merging this since the test timeout is most likely caused by a separate issue (all the tests did run).

This is the sister patch to #8011, but for aggregation.

In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or
Closes #8038 from andrewor14/unsafe-starve-memory-agg.

(cherry picked from commit e011079)
Signed-off-by: Reynold Xin

SparkQA · 2015-08-12T19:01:43Z

Test build #1495 has finished for PR 8038 at commit 7ebf6b9.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

Since we no longer need to reserve a page before calling compute(), MapPartitionsWithPreparationRDD is not needed anymore. This PR basically reverts #8543, #8511, #8038, and #8011.

Author: Davies Liu
Closes #9381 from davies/remove_prepare2.

Andrew Or added 4 commits August 7, 2015 12:36

Reserve memory in advance in TungstenAggregate

27f2e7f

Actually request the memory in constructor + add tests

6549654

Merge branch 'master' of github.com:apache/spark into unsafe-starve-m…

995be3d

…emory-agg

Minor: Update comment

ca1b44c

andrewor14 changed the title ~~[SPARK-9747] Avoid starving an unsafe operator in aggregation~~ [SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation Aug 7, 2015

yhuai reviewed Aug 7, 2015

Andrew Or added 2 commits August 10, 2015 13:08

yhuai reviewed Aug 10, 2015

Andrew Or added 2 commits August 10, 2015 14:24

Merge branch 'master' of github.com:apache/spark into unsafe-starve-m…

70b8500

…emory-agg

Address comments

b4d3633

yhuai reviewed Aug 11, 2015

Fix tests

d4dc9ca

Andrew Or added 2 commits August 11, 2015 23:12

Fix tests again

19f2e1b

Merge branch 'master' of github.com:apache/spark into unsafe-starve-m…

7ebf6b9

…emory-agg

asfgit closed this in e011079 Aug 12, 2015

andrewor14 deleted the unsafe-starve-memory-agg branch August 12, 2015 17:45

davies mentioned this pull request Oct 30, 2015

[SPARK-11423] remove MapPartitionsWithPreparationRDD #9381

Closed
