From c158ac0c276c671ef5b0ab9eaf37e8f9c27787d1 Mon Sep 17 00:00:00 2001
From: Lantao Jin
Date: Thu, 14 Mar 2019 16:46:57 +0800
Subject: [PATCH 1/3] [SPARK-27157][DOCS] Add Executor level metrics to monitoring doc

---
 docs/monitoring.md | 148 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index 72e4f47e197d..1ba54b6a9121 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -609,7 +609,155 @@ A list of the available metrics, with a short description:
+### Executor Metrics
+Executor-level metrics are sent from each executor to the driver as part of the Heartbeat to describe the performance metrics of the Executor itself, such as JVM heap memory and GC information.
+A list of the available metrics, with a short description:
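+<table class="table">
+  <tr><th>Executor Level Metric name</th>
+      <th>Short description</th>
+  </tr>
+  <tr>
+    <td>totalGCTime</td>
+    <td>Elapsed time the JVM spent in garbage collection summed in this Executor.
+    The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>totalInputBytes</td>
+    <td>Total input bytes summed in this Executor.</td>
+  </tr>
+  <tr>
+    <td>totalShuffleRead</td>
+    <td>Total shuffle read bytes summed in this Executor.</td>
+  </tr>
+  <tr>
+    <td>totalShuffleWrite</td>
+    <td>Total shuffle write bytes summed in this Executor.</td>
+  </tr>
+  <tr>
+    <td>maxMemory</td>
+    <td>Total amount of memory available for storage, in bytes.</td>
+  </tr>
+  <tr>
+    <td>memoryMetrics.*</td>
+    <td>Current value of memory metrics:</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.usedOnHeapStorageMemory</td>
+    <td>On-heap memory currently used for storage, in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.usedOffHeapStorageMemory</td>
+    <td>Off-heap memory currently used for storage, in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.totalOnHeapStorageMemory</td>
+    <td>Total available on-heap memory for storage, in bytes. This amount can vary over time, depending on the MemoryManager implementation.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.totalOffHeapStorageMemory</td>
+    <td>Total available off-heap memory for storage, in bytes. This amount can vary over time, depending on the MemoryManager implementation.</td>
+  </tr>
+  <tr>
+    <td>peakMemoryMetrics.*</td>
+    <td>Peak value of memory (and GC) metrics:</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.JVMHeapMemory</td>
+    <td>Peak memory usage of the heap that is used for object allocation.
+    The heap consists of one or more memory pools. The used and committed size of the returned memory usage is the sum of those values of all heap memory pools whereas the init and max size of the returned memory usage represents the setting of the heap memory which may not be the sum of those of all heap memory pools.
+    The amount of used memory in the returned memory usage is the amount of memory occupied by both live objects and garbage objects that have not been collected, if any.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.JVMOffHeapMemory</td>
+    <td>Peak memory usage of non-heap memory that is used by the Java virtual machine. The non-heap memory consists of one or more memory pools. The used and committed size of the returned memory usage is the sum of those values of all non-heap memory pools whereas the init and max size of the returned memory usage represents the setting of the non-heap memory which may not be the sum of those of all non-heap memory pools.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.OnHeapExecutionMemory</td>
+    <td>Peak on-heap execution memory in use, in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.OffHeapExecutionMemory</td>
+    <td>Peak off-heap execution memory in use, in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.OnHeapStorageMemory</td>
+    <td>Peak on-heap storage memory in use, in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.OffHeapStorageMemory</td>
+    <td>Peak off-heap storage memory in use, in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.OnHeapUnifiedMemory</td>
+    <td>Peak on-heap memory (execution and storage).</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.OffHeapUnifiedMemory</td>
+    <td>Peak off-heap memory (execution and storage).</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.DirectPoolMemory</td>
+    <td>Peak memory that the JVM is using for the direct buffer pool (<code>java.lang.management.BufferPoolMXBean</code>).</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.MappedPoolMemory</td>
+    <td>Peak memory that the JVM is using for the mapped buffer pool (<code>java.lang.management.BufferPoolMXBean</code>).</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreeJVMVMemory</td>
+    <td>Virtual memory size in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreeJVMRSSMemory</td>
+    <td>Resident Set Size: number of pages the process has
+      in real memory. This is just the pages which count
+      toward text, data, or stack space. This does not
+      include pages which have not been demand-loaded in,
+      or which are swapped out.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreePythonVMemory</td>
+    <td>Virtual memory size for Python in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreePythonRSSMemory</td>
+    <td>Resident Set Size for Python.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreeOtherVMemory</td>
+    <td>Virtual memory size for other kinds of processes in bytes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreeOtherRSSMemory</td>
+    <td>Resident Set Size for other kinds of processes.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.MinorGCCount</td>
+    <td>Total minor GC count. For example, the garbage collector is one of Copy, PS Scavenge, ParNew, G1 Young Generation and so on.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.MinorGCTime</td>
+    <td>Elapsed total minor GC time.
+    The value is expressed in milliseconds.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.MajorGCCount</td>
+    <td>Total major GC count. For example, the garbage collector is one of MarkSweepCompact, PS MarkSweep, ConcurrentMarkSweep, G1 Old Generation and so on.</td>
+  </tr>
+  <tr>
+    <td>&nbsp;&nbsp;&nbsp;&nbsp;.MajorGCTime</td>
+    <td>Elapsed total major GC time.
+    The value is expressed in milliseconds.</td>
+  </tr>
+</table>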
+The computation of RSS and Vmem is based on [proc(5)](http://man7.org/linux/man-pages/man5/proc.5.html)
 ### API Versioning Policy

From 4049689b4033ac74248ab528da378501f1694b23 Mon Sep 17 00:00:00 2001
From: Lantao Jin
Date: Thu, 14 Mar 2019 18:47:46 +0800
Subject: [PATCH 2/3] remove duping tr

---
 docs/monitoring.md | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index 1ba54b6a9121..43fe6dd35a7f 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -618,33 +618,28 @@ A list of the available metrics, with a short description:
 Executor Level Metric name
 Short description
-
+
 totalGCTime
 Elapsed time the JVM spent in garbage collection summed in this Executor.
 The value is expressed in milliseconds.
-
-
+
 totalInputBytes
 Total input bytes summed in this Executor.
-
-
+
 totalShuffleRead
 Total shuffle read bytes summed in this Executor.
-
-
+
 totalShuffleWrite
 Total shuffle write bytes summed in this Executor.
-
-
+
 maxMemory
 Total amount of memory available for storage, in bytes.
-
-
+
 memoryMetrics.*
 Current value of memory metrics:
@@ -668,7 +663,7 @@ A list of the available metrics, with a short description:
 peakMemoryMetrics.*
 Peak value of memory (and GC) metrics:
-
+
 &nbsp;&nbsp;&nbsp;&nbsp;.JVMHeapMemory
 Peak memory usage of the heap that is used for object allocation.
 The heap consists of one or more memory pools. The used and committed size of the returned memory usage is the sum of those values of all heap memory pools whereas the init and max size of the returned memory usage represents the setting of the heap memory which may not be the sum of those of all heap memory pools.
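
The series states that the `ProcessTree*VMemory` and `ProcessTree*RSSMemory` values are computed from proc(5). As an illustration of what that can look like, here is a minimal Python sketch that reads the `vsize` (field 23, bytes) and `rss` (field 24, pages) entries of a `/proc/[pid]/stat` line and converts RSS to bytes. The function name `parse_proc_stat` and the returned keys are hypothetical, and this is not claimed to be the code Spark itself runs; only the field positions come from the proc(5) manual page.

```python
import os


def parse_proc_stat(stat_line: str, page_size: int) -> dict:
    """Return virtual memory and RSS in bytes from one /proc/[pid]/stat line.

    Hypothetical helper for illustration only.
    """
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split only the text after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    vsize_bytes = int(rest[23 - 3])  # field 23: vsize, already in bytes
    rss_pages = int(rest[24 - 3])    # field 24: rss, counted in pages
    return {"vmem_bytes": vsize_bytes, "rss_bytes": rss_pages * page_size}


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    # Linux only: inspect the current Python process.
    with open("/proc/self/stat") as f:
        print(parse_proc_stat(f.read(), os.sysconf("SC_PAGE_SIZE")))
```

Summing these values over a process and its children, grouped by JVM, Python and other processes, would give figures in the spirit of the three `ProcessTree*` groups, though how the grouping is actually done is not specified in this series.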

From 4779f7f269a8c30d07e0dd53dc2d4b4db0732949 Mon Sep 17 00:00:00 2001
From: Lantao Jin
Date: Fri, 15 Mar 2019 20:39:59 +0800
Subject: [PATCH 3/3] Add metrics collection condition

---
 docs/monitoring.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index 43fe6dd35a7f..036a575bb861 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -611,7 +611,7 @@ A list of the available metrics, with a short description:
 ### Executor Metrics
-Executor-level metrics are sent from each executor to the driver as part of the Heartbeat to describe the performance metrics of the Executor itself, such as JVM heap memory and GC information.
+Executor-level metrics are sent from each executor to the driver as part of the Heartbeat to describe the performance metrics of the Executor itself, such as JVM heap memory and GC information. Metrics `peakExecutorMetrics.*` are only enabled if `spark.eventLog.logStageExecutorMetrics.enabled` is true.
 A list of the available metrics, with a short description:
@@ -707,7 +707,7 @@ A list of the available metrics, with a short description:
   </tr>
   <tr>
     <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreeJVMVMemory</td>
-    <td>Virtual memory size in bytes.</td>
+    <td>Virtual memory size in bytes. Enabled if spark.eventLog.logStageExecutorProcessTreeMetrics.enabled is true.</td>
   </tr>
   <tr>
     <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreeJVMRSSMemory</td>
@@ -715,23 +715,23 @@ A list of the available metrics, with a short description:
       in real memory. This is just the pages which count
       toward text, data, or stack space. This does not
       include pages which have not been demand-loaded in,
-      or which are swapped out.</td>
+      or which are swapped out. Enabled if spark.eventLog.logStageExecutorProcessTreeMetrics.enabled is true.</td>
   </tr>
   <tr>
     <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreePythonVMemory</td>
-    <td>Virtual memory size for Python in bytes.</td>
+    <td>Virtual memory size for Python in bytes. Enabled if spark.eventLog.logStageExecutorProcessTreeMetrics.enabled is true.</td>
   </tr>
   <tr>
     <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreePythonRSSMemory</td>
-    <td>Resident Set Size for Python.</td>
+    <td>Resident Set Size for Python. Enabled if spark.eventLog.logStageExecutorProcessTreeMetrics.enabled is true.</td>
   </tr>
   <tr>
     <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreeOtherVMemory</td>
-    <td>Virtual memory size for other kinds of processes in bytes.</td>
+    <td>Virtual memory size for other kinds of processes in bytes. Enabled if spark.eventLog.logStageExecutorProcessTreeMetrics.enabled is true.</td>
   </tr>
   <tr>
     <td>&nbsp;&nbsp;&nbsp;&nbsp;.ProcessTreeOtherRSSMemory</td>
-    <td>Resident Set Size for other kinds of processes.</td>
+    <td>Resident Set Size for other kinds of processes. Enabled if spark.eventLog.logStageExecutorProcessTreeMetrics.enabled is true.</td>
   </tr>
   <tr>
     <td>&nbsp;&nbsp;&nbsp;&nbsp;.MinorGCCount</td>
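
For reviewers who want to check the documented names against a running application, the following Python sketch pulls selected `peakMemoryMetrics.*` values per executor. It assumes the metrics are served by the REST endpoint `/api/v1/applications/[app-id]/executors` as a JSON list whose entries carry an `id` and, when collection is enabled, a `peakMemoryMetrics` object keyed by the names in the table. The base URL, the application id, and the helper names (`executors_url`, `select_peak_metrics`, `fetch_executors`) are hypothetical.

```python
import json
from urllib.request import urlopen


def executors_url(base_url: str, app_id: str) -> str:
    """Build the executors endpoint URL (path assumed from the REST API docs)."""
    return f"{base_url.rstrip('/')}/api/v1/applications/{app_id}/executors"


def select_peak_metrics(executors: list, names: list) -> dict:
    """Map executor id -> {metric name: value} for the requested peak metrics.

    Executors without a peakMemoryMetrics block are skipped; a requested
    metric that is absent from the block is reported as None.
    """
    selected = {}
    for executor in executors:
        peaks = executor.get("peakMemoryMetrics")
        if not peaks:
            continue
        selected[executor["id"]] = {name: peaks.get(name) for name in names}
    return selected


def fetch_executors(base_url: str, app_id: str) -> list:
    """Fetch and decode the executor list from a live UI or history server."""
    with urlopen(executors_url(base_url, app_id)) as response:
        return json.load(response)
```

A call such as `select_peak_metrics(fetch_executors("http://localhost:4040", "app-1"), ["JVMHeapMemory", "ProcessTreeJVMRSSMemory"])` would then list the two peaks per executor. Per the last patch, the `ProcessTree*` values are only collected when `spark.eventLog.logStageExecutorProcessTreeMetrics.enabled` is true.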