[DCOS-53535] Spark StatsD Metrics Reporter #515
Conversation
When we start working on filtering, we will need the MetricAttribute enum. I think it would be a good idea to use it here as well.
histograms.forEach((name, histogram) -> {
    Snapshot snapshot = histogram.getSnapshot();
    send(socket,
        metricFormatter.buildMetricString(name, "count", histogram.getCount(), GAUGE),
Instead of using the attribute directly, could we use the com.codahale.metrics.MetricAttribute enum for specifying the attribute here?
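A rough sketch of how that could look, assuming a metrics-core version that ships the enum (4.0+); MetricAttribute.COUNT.getCode() resolves to the string "count", and the buildMetricString signature is taken from the snippet above:

```java
import com.codahale.metrics.MetricAttribute;
import com.codahale.metrics.Snapshot;

// Sketch only: same structure as the quoted snippet, but attribute names
// come from the MetricAttribute enum instead of hard-coded strings.
histograms.forEach((name, histogram) -> {
    Snapshot snapshot = histogram.getSnapshot();
    send(socket,
        metricFormatter.buildMetricString(name, MetricAttribute.COUNT.getCode(), histogram.getCount(), GAUGE),
        metricFormatter.buildMetricString(name, MetricAttribute.MAX.getCode(), snapshot.getMax(), GAUGE));
});
```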
Sounds great, but we depend on Spark's transitive Dropwizard dependency, which doesn't have it in 3.1.5. Depending on Spark this way allows us to avoid version conflicts in libraries.
In this case, we won't be able to use the latest version of dropwizard-metrics for filtering. It uses the built-in MetricAttribute enum for defining the set of attributes.
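For context, this is roughly how attribute-based filtering looks in metrics-core 4.x; ConsoleReporter just stands in here for whatever reporter we end up with:

```java
import java.util.EnumSet;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricAttribute;
import com.codahale.metrics.MetricRegistry;

MetricRegistry registry = new MetricRegistry();

// Reporters built against metrics-core 4.x can disable individual
// attributes (e.g. noisy percentiles) instead of whole metrics.
ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
        .disabledMetricAttributes(EnumSet.of(
                MetricAttribute.P98, MetricAttribute.P99, MetricAttribute.P999))
        .build();
reporter.start(10, TimeUnit.SECONDS);
```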
Good point. It looks like this requires at least metrics-core 4.0.2.
I suggest doing the following: separately bump the version of metrics-core in mesosphere/spark to the latest 4.0.5 (as per https://github.com/dropwizard/metrics/releases), and then add the necessary dependencies to statsd-reporter to minimize the impact on the Spark codebase itself.
Review threads (resolved):
- spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/InstanceType.java (outdated)
- spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/MetricFormatter.java
- spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/Configuration.java (outdated)
@rpalaznik, @alembiewski, I've fixed the issues you pointed out during the review, and all CI builds are green, so I'm merging this PR.
What changes were proposed in this pull request?
Resolves DCOS-53535
Dedicated StatsD metric reporter with tagging and metric name standardization support.
Original problem:

Spark assigns metric names using spark.app.id and spark.executor.id as a part of them. Thus the number of metrics is continuously growing, because those IDs are unique between executions whereas the metrics themselves report the same thing. Another issue which arises here is how to use constantly changing metric names in dashboards.

For example, jvm_heap_used is reported by all Spark instances (components):
- jvm_heap_used (Dispatcher)
- <spark.app.id>_driver_jvm_heap_used (driver)
- <spark.app.id>_<spark.executor.id>_jvm_heap_used (executor)

This PR provides a PoC implementation of a StatsD reporter which removes variable parts of metric names and moves them to tags, to ease the pressure on the underlying time series database and to provide metric names consistent across executions.
Example:

<spark.app.id>_driver_jvm_heap_used (driver)
after: driver_jvm_heap_used,instance_type=driver,instance_id=<spark.app.id>

<spark.app.id>_<spark.executor.id>_jvm_heap_used (executor)
after: executor_jvm_heap_used,instance_type=executor,instance_id=<spark.executor.id>
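For illustration, the renaming boils down to stripping the variable id prefix from the raw metric name and re-attaching it as tags. The helper below is a hypothetical sketch, not the actual MetricFormatter implementation:

```java
// Hypothetical sketch of the renaming scheme shown above; the method name,
// inputs, and "." separator convention are assumptions for illustration.
static String toTaggedName(String rawName, String appId, String executorId) {
    boolean isDriver = "driver".equals(executorId);
    String instanceType = isDriver ? "driver" : "executor";
    String instanceId = isDriver ? appId : executorId;
    // e.g. "<appId>.driver.jvm.heap.used" -> "jvm.heap.used"
    String metric = rawName
            .replace(appId + ".", "")
            .replace(executorId + ".", "");
    return instanceType + "_" + metric.replace('.', '_')
            + ",instance_type=" + instanceType
            + ",instance_id=" + instanceId;
}
```

This way the metric name itself stays stable across executions, while the run-specific identifiers remain available as tags for filtering in dashboards.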
How were these changes tested?

By manually running Spark jobs submitted via Dispatcher/Metronome/Marathon/spark-submit and a sample Grafana Dashboard:

Release Notes