[SPARK-25170][DOC] Add list and short description of Spark Executor Task Metrics to the documentation #22167
Changes from all commits
|
|
@@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that | |
| Note that the garbage collection takes place on playback: it is possible to retrieve | ||
| more entries by increasing these values and restarting the history server. | ||
|
|
||
| ### Executor Task Metrics | ||
|
|
||
| The REST API exposes the values of the Task Metrics collected by Spark executors at the | ||
| task execution level. These metrics can be used for performance troubleshooting. | ||
| The available metrics are listed below, each with a short description; an example of querying them through the REST API follows the table. | ||
|
|
||
| <table class="table"> | ||
| <tr><th>Spark Executor Task Metric name</th> | ||
| <th>Short description</th> | ||
| </tr> | ||
| <tr> | ||
| <td>executorRunTime</td> | ||
| <td>Time the executor spent running this task. This includes time fetching shuffle data. | ||
| The value is expressed in milliseconds.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>executorCpuTime</td> | ||
| <td>CPU time the executor spent running this task. This includes time fetching shuffle data. | ||
| The value is expressed in nanoseconds.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>executorDeserializeTime</td> | ||
| <td>Time taken on the executor to deserialize this task. | ||
| The value is expressed in milliseconds.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>executorDeserializeCpuTime</td> | ||
| <td>CPU time taken on the executor to deserialize this task. | ||
| The value is expressed in nanoseconds.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>resultSize</td> | ||
| <td>The number of bytes this task transmitted back to the driver as the TaskResult.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>jvmGCTime</td> | ||
| <td>Amount of time the JVM spent in garbage collection while executing this task. | ||
| The value is expressed in milliseconds.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>resultSerializationTime</td> | ||
| <td>Amount of time spent serializing the task result. | ||
| The value is expressed in milliseconds.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>memoryBytesSpilled</td> | ||
| <td>The number of in-memory bytes spilled by this task.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>diskBytesSpilled</td> | ||
| <td>The number of on-disk bytes spilled by this task.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>peakExecutionMemory</td> | ||
| <td>Peak memory used by internal data structures created during shuffles, aggregations and | ||
| joins. The value of this accumulator should be approximately the sum of the peak sizes | ||
| across all such data structures created in this task. For SQL jobs, this only tracks all | ||
| unsafe operators and ExternalSort.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>inputMetrics.* | ||
| </td> | ||
| <td>Metrics related to reading data from <code>org.apache.spark.rdd.HadoopRDD</code> | ||
| or from persisted data.</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .bytesRead</td> | ||
| <td>Total number of bytes read.</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .recordsRead | ||
| </td> | ||
| <td>Total number of records read.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>outputMetrics.* | ||
| </td> | ||
| <td>Metrics related to writing data externally (e.g. to a distributed filesystem), defined only | ||
| in tasks with output.</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .bytesWritten</td> | ||
| <td>Total number of bytes written</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .recordsWritten</td> | ||
| <td>Total number of records written</td> | ||
| </tr> | ||
| <tr> | ||
| <td>shuffleReadMetrics.*</td> | ||
| <td>Metrics related to shuffle read operations.</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .recordsRead</td> | ||
| <td>Number of records read in shuffle operations</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .remoteBlocksFetched</td> | ||
| <td>Number of remote blocks fetched in shuffle operations</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .localBlocksFetched</td> | ||
| <td>Number of blocks fetched locally in shuffle operations | ||
| (as opposed to blocks fetched from a remote executor)</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .totalBlocksFetched</td> | ||
| <td>Number of blocks fetched in shuffle operations (both local and remote)</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .remoteBytesRead</td> | ||
| <td>Number of remote bytes read in shuffle operations</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .localBytesRead</td> | ||
| <td>Number of bytes read from local disk in shuffle operations | ||
| (as opposed to bytes read from a remote executor)</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .totalBytesRead</td> | ||
| <td>Number of bytes read in shuffle operations (both local and remote)</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .remoteBytesReadToDisk</td> | ||
| <td>Number of remote bytes read to disk in shuffle operations. | ||
| Large blocks are fetched to disk in shuffle read operations, as opposed to | ||
| being read into memory, which is the default behavior.</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .fetchWaitTime</td> | ||
| <td>Time the task spent waiting for remote shuffle blocks. | ||
| This only includes the time blocking on shuffle input data. | ||
| For instance if block B is being fetched while the task is still not finished | ||
| processing block A, it is not considered to be blocking on block B. | ||
| The value is expressed in milliseconds.</td> | ||
| </tr> | ||
| <tr> | ||
| <td>shuffleWriteMetrics.*</td> | ||
| <td>Metrics related to operations writing shuffle data.</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .bytesWritten</td> | ||
| <td>Number of bytes written in shuffle operations</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .recordsWritten</td> | ||
| <td>Number of records written in shuffle operations</td> | ||
| </tr> | ||
| <tr> | ||
| <td> .writeTime</td> | ||
| <td>Time spent blocking on writes to disk or buffer cache. | ||
| The value is expressed in nanoseconds.</td> | ||
| </tr> | ||
| </table> | ||
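
The per-task values of these metrics are returned by the monitoring REST API, for example through the documented `/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskList` endpoint. The sketch below is only an illustration of reading them programmatically; the host, port, application ID, stage ID and attempt ID are placeholders that depend on your deployment (port 4040 for a running application's UI, or the history server for completed applications).

```python
import json
import urllib.request

# Placeholders: substitute the UI host/port of a running application (default 4040)
# or your history server, plus a real application ID, stage ID and attempt ID.
base_url = "http://localhost:4040/api/v1"
app_id = "app-20180820123456-0001"   # hypothetical application ID
stage_id, stage_attempt = 1, 0

# Each element returned by taskList describes one task and carries a
# "taskMetrics" object with the fields documented in the table above.
url = f"{base_url}/applications/{app_id}/stages/{stage_id}/{stage_attempt}/taskList"
with urllib.request.urlopen(url) as response:
    tasks = json.load(response)

for task in tasks:
    metrics = task.get("taskMetrics", {})
    print(task["taskId"],
          "executorRunTime(ms):", metrics.get("executorRunTime"),
          "executorCpuTime(ns):", metrics.get("executorCpuTime"))
```

For spotting stragglers, the related `taskSummary` endpoint exposes the same metrics as percentile distributions across the tasks of a stage attempt, which is often more convenient than iterating over the full task list.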
|
|
||
|
|
||
|
|
||
| ### API Versioning Policy | ||
|
|
||
| These endpoints have been strongly versioned to make it easier to develop applications on top. | ||
|
|
||
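
Because the endpoints are rooted at a versioned path (`/api/v1`), client code can pin that prefix; a minimal sketch with a placeholder history-server host:

```python
# The monitoring REST API is rooted at a versioned path, /api/v1.
# Pinning the version in client code isolates it from future API revisions.
API_ROOT = "http://localhost:18080/api/v1"   # placeholder: history server default port

# /applications lists the applications known to the server.
applications_url = f"{API_ROOT}/applications"
```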
Review comment: Does `Time` mean `elapsed time` or other `time`?