
Conversation

@cenyuhai (Contributor) commented Sep 6, 2016

What changes were proposed in this pull request?

The job page becomes too slow to open when there are thousands of executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData never removes elements, so it grows over time. Before this PR it looks like timeline1.png; after this PR it looks like timeline2.png (we can configure how many executor events are displayed).
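The approach described above can be sketched as follows. This is a minimal illustration, not the actual PR diff: `ExecutorEvent`, `recordEvent`, and `maxRetained` are hypothetical names standing in for `SparkListenerEvent`, the listener callback, and the `spark.ui.timeline.executors.maximum` setting.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for SparkListenerExecutorAdded/Removed events.
case class ExecutorEvent(executorId: String, added: Boolean)

// Analogous to spark.ui.timeline.executors.maximum in the PR.
val maxRetained = 1000
val executorEvents = new ListBuffer[ExecutorEvent]()

// Append each new event, dropping the oldest once the cap is exceeded,
// so the buffer (and the timeline rendered from it) stays bounded.
def recordEvent(e: ExecutorEvent): Unit = {
  executorEvents += e
  if (executorEvents.size > maxRetained) {
    executorEvents.remove(0)
  }
}
```

With the cap in place, rendering cost is bounded by `maxRetained` regardless of how many executors have come and gone over the application's lifetime.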

@SparkQA commented Sep 6, 2016

Test build #64970 has finished for PR 14969 at commit c368f88.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cenyuhai (Contributor Author) commented Sep 6, 2016

[error] * method executorIdToData()scala.collection.mutable.HashMap in class org.apache.spark.ui.exec.ExecutorsListener does not have a correspondent in current version
[error] filter with: ProblemFilters.excludeDirectMissingMethodProblem

I have removed "executorIdToData"; why did it fail?

@cenyuhai cenyuhai changed the title [SPARK-17406][WEB-UI] limit timeline executor events [SPARK-17406][WEB UI] limit timeline executor events Sep 6, 2016
val executorIdToData = HashMap[String, ExecutorUIData]()
var executorEvents = new mutable.ListBuffer[SparkListenerEvent]()

val MAX_EXECUTOR_LIMIT = conf.getInt("spark.ui.timeline.executors.maximum", 1000)
Member

It's not really executors but tasks, right? We already have a property for limiting tasks, spark.ui.timeline.tasks.maximum. Wouldn't it be reasonable to apply it here?

Contributor Author

No, it is about executors (SparkListenerExecutorAdded and SparkListenerExecutorRemoved).

@SparkQA commented Sep 6, 2016

Test build #64976 has finished for PR 14969 at commit ba17918.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 7, 2016

Test build #65029 has finished for PR 14969 at commit 9169901.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ExecutorTaskSummary(

@SparkQA commented Sep 7, 2016

Test build #65045 has finished for PR 14969 at commit 2d445cb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 7, 2016

Test build #65048 has finished for PR 14969 at commit a7e261c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 8, 2016

Test build #65074 has finished for PR 14969 at commit 4b865e5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 8, 2016

Test build #65106 has finished for PR 14969 at commit a7f0ec3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cenyuhai (Contributor Author) commented Sep 9, 2016

@srowen I removed the parallel maps; please review the latest code. Thank you!

executorToShuffleWrite(eid) =
executorToShuffleWrite.getOrElse(eid, 0L) + metrics.shuffleWriteMetrics.bytesWritten
executorToJvmGCTime(eid) = executorToJvmGCTime.getOrElse(eid, 0L) + metrics.jvmGCTime
executorToTaskSummary(eid).inputBytes =
Member

In many places in this PR, you can write "x += a" instead of "x = x + a". It would be more compact when 'x' is complex like here.

@srowen (Member) Sep 9, 2016

Also, don't you want to just retrieve executorToTaskSummary(eid) once and mutate it?
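The two review suggestions above (use `+=`, and look the summary up once rather than repeatedly) can be sketched like this. `TaskSummary` and `onTaskEnd` are hypothetical stand-ins for the PR's `ExecutorTaskSummary` and the listener's task-end handler, not the actual Spark code.

```scala
import scala.collection.mutable

// Hypothetical summary with mutable counters, mirroring ExecutorTaskSummary.
case class TaskSummary(var inputBytes: Long = 0L, var jvmGCTime: Long = 0L)

val executorToTaskSummary = mutable.LinkedHashMap[String, TaskSummary]()

def onTaskEnd(eid: String, inputDelta: Long, gcDelta: Long): Unit = {
  // Retrieve the summary once (creating it if absent), then mutate with +=,
  // instead of indexing the map on every assignment.
  val summary = executorToTaskSummary.getOrElseUpdate(eid, TaskSummary())
  summary.inputBytes += inputDelta
  summary.jvmGCTime += gcDelta
}
```

Besides being more compact, this avoids repeated hash lookups on the same key within one event.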

@SparkQA commented Sep 10, 2016

Test build #65191 has finished for PR 14969 at commit 0080f14.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

case class ExecutorTaskSummary(
Member

One more thing occurred to me: do we need to make this private[ui]? I think so.

@SparkQA commented Sep 10, 2016

Test build #65203 has finished for PR 14969 at commit 4dda55c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 10, 2016

Test build #65204 has finished for PR 14969 at commit c725891.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
var executorEvents = new ListBuffer[SparkListenerEvent]()

private val maxTimelineExecutors = conf.getInt("spark.ui.timeline.executors.maximum", 1000)
Member

Getting close now, but what about spark.ui.timeline.retainedExecutors? That would be more consistent. Then what about spark.ui.timeline.retainedDeadExecutors?

Contributor Author

spark.ui.timeline.executors.maximum is similar to spark.ui.timeline.tasks.maximum. It is a configuration about ExecutorAdded event and ExecutorRemoved event, so spark.ui.timeline.retainedDeadExecutors is not suitable.

Member

OK on spark.ui.timeline.executors.maximum. The dead executor config isn't relevant to the timeline?

Contributor Author

executorToTaskSummary is used by ExecutorsPage. Dead executors are still retained in ExecutorsPage. So I can't remove this executor's information immediately after it is removed.

@srowen (Member) commented Sep 12, 2016

Looking quite good. The code is significantly simpler after this change too.

@SparkQA commented Sep 12, 2016

Test build #65261 has finished for PR 14969 at commit ac99524.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Sep 13, 2016

LGTM. Will leave this open a short time for any comments, especially a double check on the property names that are introduced here. I like the cleanup as well as the functionality.

@cenyuhai (Contributor Author):
OK

@srowen (Member) commented Sep 15, 2016

Merged to master

@asfgit asfgit closed this in ad79fc0 Sep 15, 2016
@srowen (Member) commented Sep 15, 2016

Ah, hm, though this passed the PR builder, for some reason it fails to build because of MiMa checks:

[error]  * method executorToTotalCores()scala.collection.mutable.HashMap in class org.apache.spark.ui.exec.ExecutorsListener does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ui.exec.ExecutorsListener.executorToTotalCores")
[error]  * method executorToTasksMax()scala.collection.mutable.HashMap in class org.apache.spark.ui.exec.ExecutorsListener does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ui.exec.ExecutorsListener.executorToTasksMax")
[error]  * method executorToJvmGCTime()scala.collection.mutable.HashMap in class org.apache.spark.ui.exec.ExecutorsListener does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]

You handled a lot of these, so it's not quite clear what happened, but I'll hotfix it. These are in fact false positives we should exclude.

Fixing it in #15110
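Based on the filter hints in the error log above, the MiMa exclusions would look roughly like the following fragment for MimaExcludes.scala. This is a sketch following the suggested `filter with:` lines, not the exact contents of the hotfix:

```scala
// Exclusions for private[ui] members removed by SPARK-17406; names taken
// from the MiMa error log's "filter with:" suggestions.
Seq(
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.ui.exec.ExecutorsListener.executorToTotalCores"),
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.ui.exec.ExecutorsListener.executorToTasksMax"),
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.ui.exec.ExecutorsListener.executorToJvmGCTime")
)
```

Since ExecutorsListener is a private API, these removals are false positives that MiMa cannot detect as safe on its own.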

@cenyuhai (Contributor Author) commented Sep 15, 2016

Ah, I was confused by MimaExcludes.scala. I asked @liancheng, and he told me to just add these to MimaExcludes.scala. I see your HOTFIX; you just removed what I added. If I don't add these changes to MimaExcludes.scala, I can't compile the project. Do you know the right way?

@srowen (Member) commented Sep 15, 2016

I actually just moved them, and added more. Yes, they're needed, but for some reason more excludes were needed even though the PR builder passed.

@cenyuhai (Contributor Author):
OK, but it's still not certain that this will never happen again, because SparkQA can't tell whether the developer has added all the necessary excludes.

@cenyuhai cenyuhai deleted the SPARK-17406 branch September 15, 2016 12:06
asfgit pushed a commit that referenced this pull request Sep 15, 2016
## What changes were proposed in this pull request?

Following #14969 for some reason the MiMa excludes weren't complete, but still passed the PR builder. This adds 3 more excludes from https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.2/1749/consoleFull

It also moves the excludes to their own Seq in the build, as they probably should have been.
Even though this is merged to 2.1.x only / master, I left the exclude in for 2.0.x in case we back port. It's a private API so is always a false positive.

## How was this patch tested?

Jenkins build

Author: Sean Owen <[email protected]>

Closes #15110 from srowen/SPARK-17406.2.
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
## What changes were proposed in this pull request?
The job page becomes too slow to open when there are thousands of executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData never removes elements, so it grows over time. Before this PR it looks like [timeline1.png](https://issues.apache.org/jira/secure/attachment/12827112/timeline1.png). After this PR it looks like [timeline2.png](https://issues.apache.org/jira/secure/attachment/12827113/timeline2.png) (we can configure how many executor events are displayed).

Author: cenyuhai <[email protected]>

Closes apache#14969 from cenyuhai/SPARK-17406.
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
## What changes were proposed in this pull request?

Following apache#14969 for some reason the MiMa excludes weren't complete, but still passed the PR builder. This adds 3 more excludes from https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.2/1749/consoleFull

It also moves the excludes to their own Seq in the build, as they probably should have been.
Even though this is merged to 2.1.x only / master, I left the exclude in for 2.0.x in case we back port. It's a private API so is always a false positive.

## How was this patch tested?

Jenkins build

Author: Sean Owen <[email protected]>

Closes apache#15110 from srowen/SPARK-17406.2.
val deadExecutors = executorToTaskSummary.filter(e => !e._2.isAlive)
if (deadExecutors.size > retainedDeadExecutors) {
val head = deadExecutors.head
executorToTaskSummary.remove(head._1)
Contributor

Here we remove only one element each time, so we remove a single element whenever a new executor is added.
Could we remove more elements at once?
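One way to address this comment is to trim all excess dead executors in a single pass instead of removing one entry per new executor. A minimal sketch, with a hypothetical `TaskSummary` standing in for the PR's `ExecutorTaskSummary`:

```scala
import scala.collection.mutable

// Hypothetical summary type; only the liveness flag matters here.
case class TaskSummary(var isAlive: Boolean = true)

val retainedDeadExecutors = 100
val executorToTaskSummary = mutable.LinkedHashMap[String, TaskSummary]()

// Remove all dead executors beyond the retention limit in one pass.
// LinkedHashMap preserves insertion order, so the oldest dead entries
// are the first ones returned by the filter and get dropped first.
def trimDeadExecutors(): Unit = {
  val dead = executorToTaskSummary.filter { case (_, s) => !s.isAlive }
  if (dead.size > retainedDeadExecutors) {
    dead.keys.take(dead.size - retainedDeadExecutors)
      .foreach(executorToTaskSummary.remove)
  }
}
```

Trimming in bulk keeps the map at the limit even if many executors die between additions, at the cost of a full scan per call.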
