[SPARK-17406][WEB UI] limit timeline executor events #14969
Conversation
Test build #64970 has finished for PR 14969 at commit
[error] * method executorIdToData()scala.collection.mutable.HashMap in class org.apache.spark.ui.exec.ExecutorsListener does not have a correspondent in current version

I have removed "executorIdToData"; why does it fail?
val executorIdToData = HashMap[String, ExecutorUIData]()
var executorEvents = new mutable.ListBuffer[SparkListenerEvent]()

val MAX_EXECUTOR_LIMIT = conf.getInt("spark.ui.timeline.executors.maximum", 1000)
It's not really executors but tasks, right? We already have a property for limiting tasks, spark.ui.timeline.tasks.maximum. It would be reasonable to apply it here, wouldn't it?
No, it is about executors (SparkListenerExecutorAdded and SparkListenerExecutorRemoved).
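For context, here is a minimal sketch of the approach under discussion: buffer the executor added/removed events that the timeline renders and drop the oldest ones once the configured cap is exceeded. The class and method names are illustrative assumptions, not the exact ExecutorsListener code in this PR; only the config key mirrors the snippet above.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.scheduler.{SparkListenerEvent, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

// Illustrative sketch only; not the exact ExecutorsListener change in this PR.
class ExecutorTimelineSketch(conf: SparkConf) {
  // Upper bound on executor events kept for the job page timeline.
  private val maxTimelineExecutors =
    conf.getInt("spark.ui.timeline.executors.maximum", 1000)

  val executorEvents = new mutable.ListBuffer[SparkListenerEvent]()

  private def record(event: SparkListenerEvent): Unit = {
    executorEvents += event
    // Drop the oldest events so the buffer never grows past the cap.
    if (executorEvents.size > maxTimelineExecutors) {
      executorEvents.remove(0, executorEvents.size - maxTimelineExecutors)
    }
  }

  def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = record(event)
  def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = record(event)
}
```

Whether the limit should reuse the existing task property or get its own key is exactly the question in this thread; the sketch only shows where such a cap would plug in.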
Test build #64976 has finished for PR 14969 at commit

Test build #65029 has finished for PR 14969 at commit

Test build #65045 has finished for PR 14969 at commit

Test build #65048 has finished for PR 14969 at commit

Test build #65074 has finished for PR 14969 at commit

Test build #65106 has finished for PR 14969 at commit
@srowen I removed the parallel maps; please review the latest code. Thank you!
executorToShuffleWrite(eid) =
  executorToShuffleWrite.getOrElse(eid, 0L) + metrics.shuffleWriteMetrics.bytesWritten
executorToJvmGCTime(eid) = executorToJvmGCTime.getOrElse(eid, 0L) + metrics.jvmGCTime
executorToTaskSummary(eid).inputBytes =
In many places in this PR, you can write "x += a" instead of "x = x + a". That is more compact when 'x' is a complex expression, as it is here.
Also, don't you want to just retrieve executorToTaskSummary(eid) once and mutate it?
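To illustrate both suggestions, here is a small self-contained sketch; the summary class is a simplified stand-in for ExecutorTaskSummary with an illustrative field set, not the real class.

```scala
import scala.collection.mutable.LinkedHashMap

object SummaryUpdateSketch extends App {
  // Simplified stand-in for the real ExecutorTaskSummary; the fields are illustrative.
  case class TaskSummarySketch(var inputBytes: Long = 0L, var jvmGCTime: Long = 0L)

  val executorToTaskSummary = LinkedHashMap[String, TaskSummarySketch]()
  val eid = "1"

  // Retrieve the per-executor summary once, then mutate it with += rather than
  // repeating "executorToTaskSummary(eid).x = executorToTaskSummary(eid).x + a".
  val summary = executorToTaskSummary.getOrElseUpdate(eid, TaskSummarySketch())
  summary.inputBytes += 4096L
  summary.jvmGCTime += 12L

  println(executorToTaskSummary(eid))
}
```

Looking the entry up once avoids repeated key lookups and makes the += updates read naturally.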
Test build #65191 has finished for PR 14969 at commit
}
}

case class ExecutorTaskSummary(
One more thing occurred to me: do we need to make this private[ui]? I think so.
Test build #65203 has finished for PR 14969 at commit

Test build #65204 has finished for PR 14969 at commit
var executorToTaskSummary = LinkedHashMap[String, ExecutorTaskSummary]()
var executorEvents = new ListBuffer[SparkListenerEvent]()

private val maxTimelineExecutors = conf.getInt("spark.ui.timeline.executors.maximum", 1000)
Getting close now, but what about spark.ui.timeline.retainedExecutors? That would be more consistent. Then what about spark.ui.timeline.retainedDeadExecutors?
spark.ui.timeline.executors.maximum is similar to spark.ui.timeline.tasks.maximum. It is a configuration for the ExecutorAdded and ExecutorRemoved events, so spark.ui.timeline.retainedDeadExecutors is not suitable.
OK on spark.ui.timeline.executors.maximum. The dead executor config isn't relevant to the timeline?
executorToTaskSummary is used by ExecutorsPage, and dead executors are still retained in ExecutorsPage, so I can't remove an executor's information immediately after it is removed.
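A rough sketch of that separation, using simplified names that are assumptions rather than the PR's exact code: on executor removal the summary entry is only flagged as dead so the Executors page can keep showing it, while the timeline's event buffer is what the maximum applies to.

```scala
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListenerEvent, SparkListenerExecutorRemoved}

// Simplified stand-ins; the real ExecutorTaskSummary carries many more metrics.
case class TaskSummarySketch(executorId: String, var isAlive: Boolean = true)

class ExecutorsStateSketch {
  // Backs the Executors page: entries must survive executor removal.
  val executorToTaskSummary = mutable.LinkedHashMap[String, TaskSummarySketch]()
  // Backs the job timeline: this buffer is the one that gets capped.
  val executorEvents = new mutable.ListBuffer[SparkListenerEvent]()

  def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
    // Keep the summary for the Executors page; just mark it as no longer alive.
    executorToTaskSummary.get(event.executorId).foreach(_.isAlive = false)
    executorEvents += event
  }
}
```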
Looking quite good. The code is significantly simpler after this change too.

Test build #65261 has finished for PR 14969 at commit
LGTM. Will leave this open a short time for any comments, especially a double check on the property names that are introduced here. I like the cleanup as well as the functionality.

OK

Merged to master

Ah, hm, though this passed the PR builder, for some reason it fails to build because of the MiMa checks. You handled a lot of these, so it's not quite clear what happened, but I'll hotfix it. These are in fact false positives we should exclude. Fixing it in #15110.

Ah, I was confused by MimaExcludes.scala. I asked @liancheng, and he told me to just add these to MimaExcludes.scala. I see your HOTFIX; you just removed what I added. If I don't add these changes to MimaExcludes.scala, I can't compile the project. Do you know the right way?

I actually just moved them, and added more. Yes, they're needed, but for some reason more excludes were needed even though the PR builder passed.

OK, so this could still happen again, because SparkQA can't tell whether a developer has added all the necessary excludes.
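For reference, Spark's MiMa excludes live in project/MimaExcludes.scala and are written with ProblemFilters. Below is a hypothetical sketch of the kind of entry involved here, assuming the generic MissingMethodProblem filter applies; the exact problem type MiMa reports may differ, which is apparently how extra excludes slipped past the PR builder.

```scala
// Hypothetical sketch, not the actual contents of project/MimaExcludes.scala.
import com.typesafe.tools.mima.core._

object TimelineExcludesSketch {
  // ExecutorsListener is private[ui], so the reported incompatibility for the
  // executorIdToData method removed in this PR is a false positive.
  lazy val excludes = Seq(
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.ui.exec.ExecutorsListener.executorIdToData")
  )
}
```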
## What changes were proposed in this pull request?

Following #14969 for some reason the MiMa excludes weren't complete, but still passed the PR builder. This adds 3 more excludes from https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.2/1749/consoleFull

It also moves the excludes to their own Seq in the build, as they probably should have been. Even though this is merged to 2.1.x only / master, I left the exclude in for 2.0.x in case we back port. It's a private API so is always a false positive.

## How was this patch tested?

Jenkins build

Author: Sean Owen <[email protected]>

Closes #15110 from srowen/SPARK-17406.2.
## What changes were proposed in this pull request?

The job page will be too slow to open when there are thousands of executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData never removes elements, so it grows over time. Before this PR, it looks like [timeline1.png](https://issues.apache.org/jira/secure/attachment/12827112/timeline1.png). After this PR, it looks like [timeline2.png](https://issues.apache.org/jira/secure/attachment/12827113/timeline2.png) (we can set how many executor events will be displayed).

Author: cenyuhai <[email protected]>

Closes apache#14969 from cenyuhai/SPARK-17406.
val deadExecutors = executorToTaskSummary.filter(e => !e._2.isAlive)
if (deadExecutors.size > retainedDeadExecutors) {
  val head = deadExecutors.head
  executorToTaskSummary.remove(head._1)
Here we remove only one element each time, so we would remove one element per newly added executor. Could we remove more elements at once?
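A sketch of one way to do that, trimming all excess dead entries in a single pass; the helper and the summary type are simplified assumptions, and retainedDeadExecutors mirrors the value referenced in the snippet above.

```scala
import scala.collection.mutable

object DeadExecutorTrimSketch {
  // Simplified stand-in for the real ExecutorTaskSummary.
  case class TaskSummarySketch(executorId: String, var isAlive: Boolean = true)

  // Removes as many dead-executor entries as needed in one pass, oldest first
  // (LinkedHashMap preserves insertion order), instead of one entry per new executor.
  def trimDeadExecutors(
      executorToTaskSummary: mutable.LinkedHashMap[String, TaskSummarySketch],
      retainedDeadExecutors: Int): Unit = {
    val dead = executorToTaskSummary.filter { case (_, summary) => !summary.isAlive }
    if (dead.size > retainedDeadExecutors) {
      dead.keys.take(dead.size - retainedDeadExecutors)
        .foreach(executorToTaskSummary.remove)
    }
  }
}
```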
What changes were proposed in this pull request?

The job page will be too slow to open when there are thousands of executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData never removes elements, so it grows over time. Before this PR, it looks like timeline1.png. After this PR, it looks like timeline2.png (we can set how many executor events will be displayed).