[SPARK-21934][CORE] Expose Shuffle Netty memory usage to MetricsSystem #19160
```diff
@@ -19,19 +19,19 @@ package org.apache.spark.deploy
 import javax.annotation.concurrent.ThreadSafe

-import com.codahale.metrics.MetricRegistry
+import com.codahale.metrics.{MetricRegistry, MetricSet}

 import org.apache.spark.metrics.source.Source
-import org.apache.spark.network.shuffle.ExternalShuffleBlockHandler

 /**
  * Provides metrics source for external shuffle service
  */
 @ThreadSafe
-private class ExternalShuffleServiceSource
-    (blockHandler: ExternalShuffleBlockHandler) extends Source {
+private class ExternalShuffleServiceSource extends Source {
```
Contributor: Just for my own understanding, not directly related to this change -- I hadn't realized that the ExternalShuffleBlockHandler had its own ShuffleMetrics already. Some of those metrics really seem like they should be part of the regular shuffle server in the executor, e.g. openBlockRequestLatencyMillis. Do you know why it's separate?

Contributor (author): I'm not sure; maybe it should be part of the regular shuffle server.
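For reference, the ShuffleMetrics referred to above is just a Dropwizard MetricSet maintained inside ExternalShuffleBlockHandler. Below is a rough Scala rendering of its shape; the real class is Java and the exact metric names may differ, so treat this as a sketch rather than the actual implementation:

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

import com.codahale.metrics.{Meter, Metric, MetricSet, Timer}

// Approximate sketch of the ShuffleMetrics set that already lives inside
// ExternalShuffleBlockHandler (names are from memory, not copied from the PR).
// The point is that it is an ordinary Dropwizard MetricSet, which is why
// registerAll()/registerMetricSet() can absorb it wholesale.
class ShuffleMetricsSketch extends MetricSet {
  // time to process an OpenBlocks request
  val openBlockRequestLatencyMillis = new Timer()
  // time to process a RegisterExecutor request
  val registerExecutorRequestLatencyMillis = new Timer()
  // rate at which block data is sent back to clients
  val blockTransferRateBytes = new Meter()

  override def getMetrics: JMap[String, Metric] = {
    val metrics = new JHashMap[String, Metric]()
    metrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis)
    metrics.put("registerExecutorRequestLatencyMillis", registerExecutorRequestLatencyMillis)
    metrics.put("blockTransferRateBytes", blockTransferRateBytes)
    metrics
  }
}
```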
```diff
   override val metricRegistry = new MetricRegistry()
   override val sourceName = "shuffleService"

-  metricRegistry.registerAll(blockHandler.getAllMetrics)
+  def registerMetricSet(metricSet: MetricSet): Unit = {
+    metricRegistry.registerAll(metricSet)
+  }
 }
```
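With the blockHandler constructor parameter gone, the wiring now happens at the call site: the caller pushes in whatever MetricSets it wants exposed and then registers the source with the MetricsSystem. Those call-site changes (in ExternalShuffleService) are not shown in this excerpt, so the following is only a hedged sketch of that wiring; the method and parameter names are stand-ins, not code from the PR:

```scala
import com.codahale.metrics.MetricSet

import org.apache.spark.metrics.MetricsSystem

// Hypothetical call site that would live next to ExternalShuffleServiceSource in
// org.apache.spark.deploy: feed the source the block handler's ShuffleMetrics and
// the Netty transport's memory metrics, then expose everything via the MetricsSystem.
def exposeShuffleServiceMetrics(
    metricsSystem: MetricsSystem,
    blockHandlerMetrics: MetricSet,
    nettyMemoryMetrics: MetricSet): Unit = {
  val source = new ExternalShuffleServiceSource
  source.registerMetricSet(blockHandlerMetrics)
  source.registerMetricSet(nettyMemoryMetrics)
  metricsSystem.registerSource(source)
}
```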
```diff
@@ -29,10 +29,13 @@ import scala.reflect.ClassTag
 import scala.util.Random
 import scala.util.control.NonFatal

+import com.codahale.metrics.{MetricRegistry, MetricSet}
+
 import org.apache.spark._
 import org.apache.spark.executor.{DataReadMethod, ShuffleWriteMetrics}
 import org.apache.spark.internal.{config, Logging}
 import org.apache.spark.memory.{MemoryManager, MemoryMode}
+import org.apache.spark.metrics.source.Source
 import org.apache.spark.network._
 import org.apache.spark.network.buffer.ManagedBuffer
 import org.apache.spark.network.netty.SparkTransportConf
```
```diff
@@ -248,6 +251,16 @@ private[spark] class BlockManager(
     logInfo(s"Initialized BlockManager: $blockManagerId")
   }

+  def shuffleMetricsSource: Source = {
+    import BlockManager._
+
+    if (externalShuffleServiceEnabled) {
+      new ShuffleMetricsSource("ExternalShuffle", shuffleClient.shuffleMetrics())
+    } else {
+      new ShuffleMetricsSource("NettyBlockTransfer", shuffleClient.shuffleMetrics())
```
Contributor: Do you think we really need to distinguish these two cases? Whether or not you have the external shuffle service, this memory is still owned by the executor JVM (it's really only "external" on the remote end).

Contributor (author): For the external shuffle, we only have the transport client on the executor side, while for …

Contributor: OK, I guess I haven't seen enough setups to know how users subscribe to these and whether they'd actually want that distinction (especially since it seems like some of the distinction probably shouldn't be there, e.g. the openBlockLatency metric). But I don't think there is any clear right answer here; if you think this is the best naming, that is fine with me.
```diff
+    }
+  }
+
   private def registerWithExternalShuffleServer() {
     logInfo("Registering executor with local external shuffle service.")
     val shuffleConfig = new ExecutorShuffleInfo(
```
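On the executor side, shuffleMetricsSource only produces visible metrics once something registers it with the executor's MetricsSystem; that registration point is not part of this excerpt. A minimal sketch of what it could look like, assuming code running inside Spark internals (the helper name here is made up):

```scala
import org.apache.spark.SparkEnv

// Hedged sketch (the actual registration site is not shown in this diff).
// Once registered, the metrics appear under the executor's namespace, e.g.
// <appId>.<executorId>.NettyBlockTransfer.* or <appId>.<executorId>.ExternalShuffle.*.
def registerShuffleMetrics(env: SparkEnv): Unit = {
  env.metricsSystem.registerSource(env.blockManager.shuffleMetricsSource)
}
```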
```diff
@@ -1526,4 +1539,12 @@ private[spark] object BlockManager {
     }
     blockManagers.toMap
   }
+
+  private class ShuffleMetricsSource(
+      override val sourceName: String,
+      metricSet: MetricSet) extends Source {
+
+    override val metricRegistry = new MetricRegistry
+    metricRegistry.registerAll(metricSet)
+  }
 }
```
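The PR title is about Netty memory usage, but none of the hunks shown here define those metrics; they arrive through shuffleClient.shuffleMetrics() as an opaque MetricSet built in the network module. A hedged sketch of what such a MetricSet could look like if built directly on Netty's pooled-allocator metrics; the class and metric names are illustrative, not taken from the PR:

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

import com.codahale.metrics.{Gauge, Metric, MetricSet}
import io.netty.buffer.PooledByteBufAllocator

// Illustrative only: the kind of MetricSet shuffleClient.shuffleMetrics() might
// return for the Netty transport. Netty 4.1+ exposes allocator-level usage via
// PooledByteBufAllocator.metric(); here we wrap two of those readings as gauges.
class NettyMemoryMetricsSketch(allocator: PooledByteBufAllocator) extends MetricSet {
  override def getMetrics: JMap[String, Metric] = {
    val metrics = new JHashMap[String, Metric]()
    metrics.put("usedDirectMemory", new Gauge[Long] {
      override def getValue: Long = allocator.metric().usedDirectMemory()
    })
    metrics.put("usedHeapMemory", new Gauge[Long] {
      override def getValue: Long = allocator.metric().usedHeapMemory()
    })
    metrics
  }
}
```

Registered through registerMetricSet or wrapped in a ShuffleMetricsSource as above, these gauges end up in the same registry as the existing latency timers and flow to whatever sinks the MetricsSystem is configured with.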
Contributor: Not related to the change here -- but should we also checkInit in the close() function?

Contributor (author): Seems it should, but it looks like we have never touched this before.
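For the checkInit question, the pattern under discussion is a plain initialization guard; a generic illustration of the idea (not the actual class from the PR, which is not shown in this excerpt):

```scala
// Generic sketch of guarding close() with the same initialization check used by
// other methods, so calling it before init() fails loudly instead of operating
// on half-constructed state.
class InitGuardedResource {
  @volatile private var initialized = false

  def init(): Unit = {
    initialized = true
  }

  private def checkInit(): Unit = {
    if (!initialized) {
      throw new IllegalStateException("init() must be called first")
    }
  }

  def close(): Unit = {
    checkInit()
    // release resources here
  }
}
```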