[SPARK-17406][WEB UI] limit timeline executor events #14969
Conversation
Test build #64970 has finished for PR 14969 at commit
[error] * method executorIdToData()scala.collection.mutable.HashMap in class org.apache.spark.ui.exec.ExecutorsListener does not have a correspondent in current version

I have removed "executorIdToData"; why does it fail?
val executorIdToData = HashMap[String, ExecutorUIData]()
var executorEvents = new mutable.ListBuffer[SparkListenerEvent]()

val MAX_EXECUTOR_LIMIT = conf.getInt("spark.ui.timeline.executors.maximum", 1000)
It's not really executors but tasks, right? We already have a property for limiting tasks, spark.ui.timeline.tasks.maximum. It would be reasonable to apply it here, wouldn't it?
No, it is about executors (SparkListenerExecutorAdded and SparkListenerExecutorRemoved).
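For context, here is a minimal sketch of the approach under discussion: buffer the executor added/removed events that the timeline renders and drop the oldest ones once the configured cap is exceeded. The class and method names are illustrative assumptions, not the exact ExecutorsListener code in this PR; only the config key mirrors the snippet above.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.scheduler.{SparkListenerEvent, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

// Illustrative sketch only; not the exact ExecutorsListener change in this PR.
class ExecutorTimelineSketch(conf: SparkConf) {
  // Upper bound on executor events kept for the job page timeline.
  private val maxTimelineExecutors =
    conf.getInt("spark.ui.timeline.executors.maximum", 1000)

  val executorEvents = new mutable.ListBuffer[SparkListenerEvent]()

  private def record(event: SparkListenerEvent): Unit = {
    executorEvents += event
    // Drop the oldest events so the buffer never grows past the cap.
    if (executorEvents.size > maxTimelineExecutors) {
      executorEvents.remove(0, executorEvents.size - maxTimelineExecutors)
    }
  }

  def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = record(event)
  def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = record(event)
}
```

Whether the limit should reuse the existing task property or get its own key is exactly the question in this thread; the sketch only shows where such a cap would plug in.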
Test build #64976 has finished for PR 14969 at commit

Test build #65029 has finished for PR 14969 at commit

Test build #65045 has finished for PR 14969 at commit

Test build #65048 has finished for PR 14969 at commit

Test build #65074 has finished for PR 14969 at commit

Test build #65106 has finished for PR 14969 at commit
@srowen I removed the parallel maps; please review the latest code. Thank you!
executorToShuffleWrite(eid) =
  executorToShuffleWrite.getOrElse(eid, 0L) + metrics.shuffleWriteMetrics.bytesWritten
executorToJvmGCTime(eid) = executorToJvmGCTime.getOrElse(eid, 0L) + metrics.jvmGCTime
executorToTaskSummary(eid).inputBytes =
In many places in this PR, you can write "x += a" instead of "x = x + a". That is more compact when 'x' is a complex expression, as it is here.
Also, don't you want to just retrieve executorToTaskSummary(eid) once and mutate it?
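To illustrate both suggestions, here is a small self-contained sketch; the summary class is a simplified stand-in for ExecutorTaskSummary with an illustrative field set, not the real class.

```scala
import scala.collection.mutable.LinkedHashMap

object SummaryUpdateSketch extends App {
  // Simplified stand-in for the real ExecutorTaskSummary; the fields are illustrative.
  case class TaskSummarySketch(var inputBytes: Long = 0L, var jvmGCTime: Long = 0L)

  val executorToTaskSummary = LinkedHashMap[String, TaskSummarySketch]()
  val eid = "1"

  // Retrieve the per-executor summary once, then mutate it with += rather than
  // repeating "executorToTaskSummary(eid).x = executorToTaskSummary(eid).x + a".
  val summary = executorToTaskSummary.getOrElseUpdate(eid, TaskSummarySketch())
  summary.inputBytes += 4096L
  summary.jvmGCTime += 12L

  println(executorToTaskSummary(eid))
}
```

Looking the entry up once avoids repeated key lookups and makes the += updates read naturally.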
Test build #65191 has finished for PR 14969 at commit
}
}

case class ExecutorTaskSummary(
One more thing occurred to me: do we need to make this private[ui]? I think so.
Test build #65203 has finished for PR 14969 at commit

Test build #65204 has finished for PR 14969 at commit
var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
var executorEvents = new ListBuffer[SparkListenerEvent]()

private val maxTimelineExecutors = conf.getInt("spark.ui.timeline.executors.maximum", 1000)
Getting close now, but what about spark.ui.timeline.retainedExecutors? That would be more consistent. Then what about spark.ui.timeline.retainedDeadExecutors?
spark.ui.timeline.executors.maximum is similar to spark.ui.timeline.tasks.maximum. It is a configuration for the ExecutorAdded and ExecutorRemoved events, so spark.ui.timeline.retainedDeadExecutors is not suitable.
OK on spark.ui.timeline.executors.maximum. The dead executor config isn't relevant to the timeline?
executorToTaskSummary is used by ExecutorsPage, and dead executors are still retained in ExecutorsPage, so I can't remove an executor's information immediately after it is removed.
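A rough sketch of that separation, using simplified names that are assumptions rather than the PR's exact code: on executor removal the summary entry is only flagged as dead so the Executors page can keep showing it, while the timeline's event buffer is what the maximum applies to.

```scala
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListenerEvent, SparkListenerExecutorRemoved}

// Simplified stand-ins; the real ExecutorTaskSummary carries many more metrics.
case class TaskSummarySketch(executorId: String, var isAlive: Boolean = true)

class ExecutorsStateSketch {
  // Backs the Executors page: entries must survive executor removal.
  val executorToTaskSummary = mutable.LinkedHashMap[String, TaskSummarySketch]()
  // Backs the job timeline: this buffer is the one that gets capped.
  val executorEvents = new mutable.ListBuffer[SparkListenerEvent]()

  def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
    // Keep the summary for the Executors page; just mark it as no longer alive.
    executorToTaskSummary.get(event.executorId).foreach(_.isAlive = false)
    executorEvents += event
  }
}
```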
Looking quite good. The code is significantly simpler after this change too.

Test build #65261 has finished for PR 14969 at commit
LGTM. Will leave this open a short time for any comments, especially a double check on the property names that are introduced here. I like the cleanup as well as the functionality.

OK

Merged to master

Ah, hm, though this passed the PR builder, for some reason it fails to build because of the MiMa checks. You handled a lot of these, so it's not quite clear what happened, but I'll hotfix it. These are in fact false positives we should exclude. Fixing it in #15110.

Ah, I was confused by MimaExcludes.scala. I asked @liancheng, and he told me to just add these to MimaExcludes.scala. I see your HOTFIX; you just removed what I added. If I don't add these changes to MimaExcludes.scala, I can't compile the project. Do you know the right way?

I actually just moved them, and added more. Yes, they're needed, but for some reason more excludes were needed even though the PR builder passed.

OK, so this could still happen again, because SparkQA can't tell whether a developer has added all the necessary excludes.
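For reference, Spark's MiMa excludes live in project/MimaExcludes.scala and are written with ProblemFilters. Below is a hypothetical sketch of the kind of entry involved here, assuming the generic MissingMethodProblem filter applies; the exact problem type MiMa reports may differ, which is apparently how extra excludes slipped past the PR builder.

```scala
// Hypothetical sketch, not the actual contents of project/MimaExcludes.scala.
import com.typesafe.tools.mima.core._

object TimelineExcludesSketch {
  // ExecutorsListener is private[ui], so the reported incompatibility for the
  // executorIdToData method removed in this PR is a false positive.
  lazy val excludes = Seq(
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.ui.exec.ExecutorsListener.executorIdToData")
  )
}
```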
## What changes were proposed in this pull request?

Following #14969 for some reason the MiMa excludes weren't complete, but still passed the PR builder. This adds 3 more excludes from https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.2/1749/consoleFull

It also moves the excludes to their own Seq in the build, as they probably should have been. Even though this is merged to 2.1.x only / master, I left the exclude in for 2.0.x in case we back port. It's a private API so is always a false positive.

## How was this patch tested?

Jenkins build

Author: Sean Owen <[email protected]>

Closes #15110 from srowen/SPARK-17406.2.
## What changes were proposed in this pull request?

The job page will be too slow to open when there are thousands of executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData never removes elements, so it grows over time. Before this PR, it looks like [timeline1.png](https://issues.apache.org/jira/secure/attachment/12827112/timeline1.png). After this PR, it looks like [timeline2.png](https://issues.apache.org/jira/secure/attachment/12827113/timeline2.png) (we can set how many executor events will be displayed).

Author: cenyuhai <[email protected]>

Closes apache#14969 from cenyuhai/SPARK-17406.
val deadExecutors = executorToTaskSummary.filter(e => !e._2.isAlive)
if (deadExecutors.size > retainedDeadExecutors) {
  val head = deadExecutors.head
  executorToTaskSummary.remove(head._1)
Here we remove only one element each time, so we would remove one element per newly added executor. Could we remove more elements at once?
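A sketch of one way to do that, trimming all excess dead entries in a single pass; the helper and the summary type are simplified assumptions, and retainedDeadExecutors mirrors the value referenced in the snippet above.

```scala
import scala.collection.mutable

object DeadExecutorTrimSketch {
  // Simplified stand-in for the real ExecutorTaskSummary.
  case class TaskSummarySketch(executorId: String, var isAlive: Boolean = true)

  // Removes as many dead-executor entries as needed in one pass, oldest first
  // (LinkedHashMap preserves insertion order), instead of one entry per new executor.
  def trimDeadExecutors(
      executorToTaskSummary: mutable.LinkedHashMap[String, TaskSummarySketch],
      retainedDeadExecutors: Int): Unit = {
    val dead = executorToTaskSummary.filter { case (_, summary) => !summary.isAlive }
    if (dead.size > retainedDeadExecutors) {
      dead.keys.take(dead.size - retainedDeadExecutors)
        .foreach(executorToTaskSummary.remove)
    }
  }
}
```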
What changes were proposed in this pull request?

The job page will be too slow to open when there are thousands of executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData never removes elements, so it grows over time. Before this PR, it looks like timeline1.png. After this PR, it looks like timeline2.png (we can set how many executor events will be displayed).