diff --git a/docs/monitoring.md b/docs/monitoring.md
index 72e4f47e197d..036a575bb861 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -609,7 +609,150 @@ A list of the available metrics, with a short description:
+### Executor Metrics
+
+Executor-level metrics are sent from each executor to the driver as part of the Heartbeat to describe the performance metrics of the Executor itself, such as JVM heap memory and GC information. The `peakExecutorMetrics.*` metrics are only enabled if `spark.eventLog.logStageExecutorMetrics.enabled` is true.
+A list of the available metrics, with a short description:
+
+| Executor Level Metric name | Short description |
+| --- | --- |
+| `totalGCTime` | Elapsed time the JVM spent in garbage collection, summed in this Executor. The value is expressed in milliseconds. |
+| `totalInputBytes` | Total input bytes summed in this Executor. |
+| `totalShuffleRead` | Total shuffle read bytes summed in this Executor. |
+| `totalShuffleWrite` | Total shuffle write bytes summed in this Executor. |
+| `maxMemory` | Total amount of memory available for storage, in bytes. |
+| `memoryMetrics.*` | Current value of memory metrics: |
+| `.usedOnHeapStorageMemory` | On-heap memory currently used for storage, in bytes. |
+| `.usedOffHeapStorageMemory` | Off-heap memory currently used for storage, in bytes. |
+| `.totalOnHeapStorageMemory` | Total on-heap memory available for storage, in bytes. This amount can vary over time, depending on the MemoryManager implementation. |
+| `.totalOffHeapStorageMemory` | Total off-heap memory available for storage, in bytes. This amount can vary over time, depending on the MemoryManager implementation. |
+| `peakMemoryMetrics.*` | Peak value of memory (and GC) metrics: |
+| `.JVMHeapMemory` | Peak memory usage of the heap that is used for object allocation. The heap consists of one or more memory pools. The used and committed sizes of the returned memory usage are the sums of those values across all heap memory pools, whereas the init and max sizes represent the setting of the heap memory, which may not be the sum of those of all heap memory pools. The amount of used memory in the returned memory usage is the amount of memory occupied by both live objects and garbage objects that have not been collected, if any. |
+| `.JVMOffHeapMemory` | Peak memory usage of non-heap memory that is used by the Java virtual machine. The non-heap memory consists of one or more memory pools. The used and committed sizes of the returned memory usage are the sums of those values across all non-heap memory pools, whereas the init and max sizes represent the setting of the non-heap memory, which may not be the sum of those of all non-heap memory pools. |
+| `.OnHeapExecutionMemory` | Peak on-heap execution memory in use, in bytes. |
+| `.OffHeapExecutionMemory` | Peak off-heap execution memory in use, in bytes. |
+| `.OnHeapStorageMemory` | Peak on-heap storage memory in use, in bytes. |
+| `.OffHeapStorageMemory` | Peak off-heap storage memory in use, in bytes. |
+| `.OnHeapUnifiedMemory` | Peak on-heap memory (execution and storage). |
+| `.OffHeapUnifiedMemory` | Peak off-heap memory (execution and storage). |
+| `.DirectPoolMemory` | Peak memory that the JVM is using for the direct buffer pool (`java.lang.management.BufferPoolMXBean`). |
+| `.MappedPoolMemory` | Peak memory that the JVM is using for the mapped buffer pool (`java.lang.management.BufferPoolMXBean`). |
+| `.ProcessTreeJVMVMemory` | Virtual memory size in bytes. Enabled if `spark.eventLog.logStageExecutorProcessTreeMetrics.enabled` is true. |
+| `.ProcessTreeJVMRSSMemory` | Resident Set Size: number of pages the process has in real memory. This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out. Enabled if `spark.eventLog.logStageExecutorProcessTreeMetrics.enabled` is true. |
+| `.ProcessTreePythonVMemory` | Virtual memory size for Python in bytes. Enabled if `spark.eventLog.logStageExecutorProcessTreeMetrics.enabled` is true. |
+| `.ProcessTreePythonRSSMemory` | Resident Set Size for Python. Enabled if `spark.eventLog.logStageExecutorProcessTreeMetrics.enabled` is true. |
+| `.ProcessTreeOtherVMemory` | Virtual memory size for other kinds of processes in bytes. Enabled if `spark.eventLog.logStageExecutorProcessTreeMetrics.enabled` is true. |
+| `.ProcessTreeOtherRSSMemory` | Resident Set Size for other kinds of processes. Enabled if `spark.eventLog.logStageExecutorProcessTreeMetrics.enabled` is true. |
+| `.MinorGCCount` | Total minor GC count. For example, the garbage collector is one of Copy, PS Scavenge, ParNew, G1 Young Generation and so on. |
+| `.MinorGCTime` | Elapsed total minor GC time. The value is expressed in milliseconds. |
+| `.MajorGCCount` | Total major GC count. For example, the garbage collector is one of MarkSweepCompact, PS MarkSweep, ConcurrentMarkSweep, G1 Old Generation and so on. |
+| `.MajorGCTime` | Elapsed total major GC time. The value is expressed in milliseconds. |
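
The `ProcessTree*VMemory` and `ProcessTree*RSSMemory` rows describe a virtual memory size in bytes and a resident set size counted in pages. As a rough illustration of how such values can be read from a `/proc/[pid]/stat` line, here is a minimal Python sketch. It is not Spark's implementation: the function name `parse_proc_stat`, the sample line `SAMPLE_STAT`, and the 4096-byte default page size are all assumptions made for this example. The field positions (`vsize` is field 23, `rss` is field 24) follow the proc(5) manual page.

```python
# Hypothetical sample of a /proc/[pid]/stat line, truncated after field 24.
# The values are made up for illustration only.
SAMPLE_STAT = (
    "1234 (java worker) S 1 1234 1234 0 -1 4194560 100 0 0 0 "
    "50 20 0 0 20 0 30 0 5000 8589934592 262144"
)


def parse_proc_stat(stat_line: str, page_size: int = 4096) -> dict:
    """Return virtual memory size and RSS, both in bytes, from one stat line.

    page_size is an assumption; on a real system it could be obtained with
    os.sysconf("SC_PAGE_SIZE").
    """
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or
    # parentheses itself, so cut at the last ')' before splitting the rest.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so field N lives at index N - 3.
    vsize_bytes = int(fields[23 - 3])  # field 23: virtual memory size, bytes
    rss_pages = int(fields[24 - 3])    # field 24: resident set size, pages
    return {"vmem_bytes": vsize_bytes, "rss_bytes": rss_pages * page_size}


if __name__ == "__main__":
    print(parse_proc_stat(SAMPLE_STAT))
```

Summing these per-process values over a process tree would give totals comparable in spirit to the JVM, Python, and "other" breakdown above; how Spark classifies and aggregates processes is not covered by this sketch.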
+
+The computation of RSS and Vmem is based on [proc(5)](http://man7.org/linux/man-pages/man5/proc.5.html).
+
 ### API Versioning Policy