diff --git a/docs/monitoring.md b/docs/monitoring.md
index 2717dd091c751..f6d52ef4597e9 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -388,6 +388,158 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
 Note that the garbage collection takes place on playback: it is possible to retrieve
 more entries by increasing these values and restarting the history server.
 
+### Executor Task Metrics
+
+The REST API exposes the values of the Task Metrics collected by Spark executors with the granularity
+of task execution. The metrics can be used for performance troubleshooting and workload characterization.
+A list of the available metrics, with a short description:
+
+<table class="table">
+  <tr>
+    <th>Spark Executor Task Metric name</th>
+    <th>Short description</th>
+  </tr>
+  <tr>
+    <td>executorRunTime</td>
+    <td>Elapsed time the executor spent running this task. This includes time fetching shuffle data.
+    The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>executorCpuTime</td>
+    <td>CPU time the executor spent running this task. This includes time fetching shuffle data.
+    The value is expressed in nanoseconds.</td>
+  </tr>
+  <tr>
+    <td>executorDeserializeTime</td>
+    <td>Elapsed time spent deserializing this task. The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>executorDeserializeCpuTime</td>
+    <td>CPU time taken on the executor to deserialize this task. The value is expressed
+    in nanoseconds.</td>
+  </tr>
+  <tr>
+    <td>resultSize</td>
+    <td>The number of bytes this task transmitted back to the driver as the TaskResult.</td>
+  </tr>
+  <tr>
+    <td>jvmGCTime</td>
+    <td>Elapsed time the JVM spent in garbage collection while executing this task.
+    The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>resultSerializationTime</td>
+    <td>Elapsed time spent serializing the task result. The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>memoryBytesSpilled</td>
+    <td>The number of in-memory bytes spilled by this task.</td>
+  </tr>
+  <tr>
+    <td>diskBytesSpilled</td>
+    <td>The number of on-disk bytes spilled by this task.</td>
+  </tr>
+  <tr>
+    <td>peakExecutionMemory</td>
+    <td>Peak memory used by internal data structures created during shuffles, aggregations and
+    joins. The value of this accumulator should be approximately the sum of the peak sizes
+    across all such data structures created in this task. For SQL jobs, this only tracks all
+    unsafe operators and ExternalSort.</td>
+  </tr>
+  <tr>
+    <td>inputMetrics.*</td>
+    <td>Metrics related to reading data from <code>org.apache.spark.rdd.HadoopRDD</code>
+    or from persisted data.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.bytesRead</td>
+    <td>Total number of bytes read.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.recordsRead</td>
+    <td>Total number of records read.</td>
+  </tr>
+  <tr>
+    <td>outputMetrics.*</td>
+    <td>Metrics related to writing data externally (e.g. to a distributed filesystem),
+    defined only in tasks with output.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.bytesWritten</td>
+    <td>Total number of bytes written.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.recordsWritten</td>
+    <td>Total number of records written.</td>
+  </tr>
+  <tr>
+    <td>shuffleReadMetrics.*</td>
+    <td>Metrics related to shuffle read operations.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.recordsRead</td>
+    <td>Number of records read in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.remoteBlocksFetched</td>
+    <td>Number of remote blocks fetched in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.localBlocksFetched</td>
+    <td>Number of local blocks fetched in shuffle operations (as opposed to blocks
+    read from a remote executor).</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.totalBlocksFetched</td>
+    <td>Number of blocks fetched in shuffle operations (both local and remote).</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.remoteBytesRead</td>
+    <td>Number of remote bytes read in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.localBytesRead</td>
+    <td>Number of bytes read in shuffle operations from local disk (as opposed to
+    read from a remote executor).</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.totalBytesRead</td>
+    <td>Number of bytes read in shuffle operations (both local and remote).</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.remoteBytesReadToDisk</td>
+    <td>Number of remote bytes read to disk in shuffle operations.
+    Large blocks are fetched to disk in shuffle read operations, as opposed to
+    being read into memory, which is the default behavior.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.fetchWaitTime</td>
+    <td>Time the task spent waiting for remote shuffle blocks.
+    This only includes the time blocking on shuffle input data.
+    For instance, if block B is being fetched while the task is still
+    processing block A, it is not considered to be blocking on block B.
+    The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>shuffleWriteMetrics.*</td>
+    <td>Metrics related to operations writing shuffle data.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.bytesWritten</td>
+    <td>Number of bytes written in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.recordsWritten</td>
+    <td>Number of records written in shuffle operations.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.writeTime</td>
+    <td>Time spent blocking on writes to disk or buffer cache. The value is expressed
+    in nanoseconds.</td>
+  </tr>
+</table>
+
+
 ### API Versioning Policy
 
 These endpoints have been strongly versioned to make it easier to develop applications on top.
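
Note on units: the table added in this hunk mixes milliseconds (`executorRunTime`, `jvmGCTime`, `fetchWaitTime`) with nanoseconds (`executorCpuTime`, `executorDeserializeCpuTime`, `writeTime`), so the values need normalizing before they are compared. The sketch below is one possible way a consumer might do that. It is not part of the patch. It assumes the metrics arrive as a nested dictionary keyed by the names in the table (the JSON actually returned by the REST API may spell or nest some keys differently), and the helper names and sample values are hypothetical.

```python
NANOS_PER_MILLI = 1_000_000

# Metrics the table documents in nanoseconds; the other time metrics are in milliseconds.
NANOSECOND_METRICS = {
    "executorCpuTime",
    "executorDeserializeCpuTime",
    "shuffleWriteMetrics.writeTime",
}


def flatten(metrics, prefix=""):
    """Flatten nested metric groups into dotted names, e.g. shuffleReadMetrics.fetchWaitTime."""
    flat = {}
    for name, value in metrics.items():
        key = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{key}."))
        else:
            flat[key] = value
    return flat


def to_millis(flat):
    """Return a copy in which the nanosecond metrics are converted to milliseconds."""
    return {
        key: (value / NANOS_PER_MILLI if key in NANOSECOND_METRICS else value)
        for key, value in flat.items()
    }


def time_breakdown(metrics):
    """Fractions of executorRunTime spent on CPU, in GC, and waiting on shuffle fetches."""
    ms = to_millis(flatten(metrics))
    run_time = ms["executorRunTime"]
    if run_time <= 0:
        return {"cpu": 0.0, "gc": 0.0, "fetch_wait": 0.0}
    return {
        "cpu": ms["executorCpuTime"] / run_time,
        "gc": ms.get("jvmGCTime", 0) / run_time,
        "fetch_wait": ms.get("shuffleReadMetrics.fetchWaitTime", 0) / run_time,
    }


# Hand-written sample values for illustration, not output from a real job.
sample = {
    "executorRunTime": 2000,                           # milliseconds
    "executorCpuTime": 1_500_000_000,                  # nanoseconds
    "jvmGCTime": 100,                                  # milliseconds
    "shuffleReadMetrics": {"fetchWaitTime": 200},      # milliseconds
    "shuffleWriteMetrics": {"writeTime": 50_000_000},  # nanoseconds
}

breakdown = time_breakdown(sample)
print(breakdown)
```

A low `cpu` fraction alongside a high `fetch_wait` or `gc` fraction is only a rough hint about where a task spent its time; the ratios are heuristics, not metrics that Spark itself reports.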