[DCOS-53535] Spark StatsD Metrics Reporter #515

akirillov · 2019-05-11T02:54:28Z

What changes were proposed in this pull request?

Dedicated StatsD metric reporter with tagging and metric name standardization support.

Original problem:
Spark builds metric names that embed spark.app.id and spark.executor.id. Because those IDs differ on every execution, the number of distinct metric names grows continuously even though the metrics themselves report the same thing. Constantly changing metric names are also hard to use in dashboards.

For example, jvm_heap_used as reported by each Spark instance (component):

jvm_heap_used (Dispatcher)
<spark.app.id>_driver_jvm_heap_used (driver)
<spark.app.id>_<spark.executor.id>_jvm_heap_used (executor)

This PR provides a PoC implementation of a StatsD reporter that removes the variable parts of metric names and moves them to tags. This eases the pressure on the underlying time series database and keeps metric names consistent across executions.

Example:

before: <spark.app.id>_driver_jvm_heap_used (driver)
after: driver_jvm_heap_used,instance_type=driver,instance_id=<spark.app.id>
before: <spark.app.id>_<spark.executor.id>_jvm_heap_used (executor)
after: executor_jvm_heap_used,instance_type=executor,instance_id=<spark.executor.id>
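
The renaming shown above could be sketched roughly as follows. This is only an illustration, not the MetricFormatter from this PR: the class and method names are hypothetical, the underscore-separated input format is taken from the examples in this description, and the sketch assumes the application and executor IDs are already known to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch; not the MetricFormatter added in this PR. */
final class MetricNameSketch {

    /**
     * Moves run-specific IDs out of the metric name and into tags,
     * producing "name,tag=value,..." as in the before/after examples.
     * appId and executorId are assumed to be known to the caller.
     */
    static String normalize(String rawName, String appId, String executorId) {
        String driverPrefix = appId + "_driver_";
        String executorPrefix = appId + "_" + executorId + "_";
        Map<String, String> tags = new LinkedHashMap<>();
        String name;

        if (rawName.startsWith(driverPrefix)) {
            name = "driver_" + rawName.substring(driverPrefix.length());
            tags.put("instance_type", "driver");
            tags.put("instance_id", appId);
        } else if (executorId != null && rawName.startsWith(executorPrefix)) {
            name = "executor_" + rawName.substring(executorPrefix.length());
            tags.put("instance_type", "executor");
            tags.put("instance_id", executorId);
        } else {
            // No run-specific prefix (e.g. Dispatcher metrics): leave as-is.
            return rawName;
        }

        StringJoiner line = new StringJoiner(",");
        line.add(name);
        tags.forEach((key, value) -> line.add(key + "=" + value));
        return line.toString();
    }

    public static void main(String[] args) {
        // Sample IDs below are made up for the demonstration.
        System.out.println(normalize("app-1_driver_jvm_heap_used", "app-1", "7"));
        System.out.println(normalize("app-1_7_jvm_heap_used", "app-1", "7"));
        System.out.println(normalize("jvm_heap_used", "app-1", "7"));
    }
}
```

With this shape, the metric name stays the same across runs and only the tag values change, which is what lets a dashboard query a fixed name.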

How were these changes tested?

By manually running Spark jobs submitted via Dispatcher/Metronome/Marathon/spark-submit and viewing the reported metrics in a sample Grafana dashboard.

Release Notes

Consistent metric naming and tagging support for DCOS monitoring.

farhan5900

When we add metric filtering, we will need the MetricAttribute enum. I think it would be a good idea to use it here as well.

farhan5900 · 2019-05-13T08:13:52Z

spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/StatsdReporter.java

+        histograms.forEach((name, histogram) -> {
+            Snapshot snapshot = histogram.getSnapshot();
+            send(socket,
+                    metricFormatter.buildMetricString(name, "count", histogram.getCount(), GAUGE),


Instead of using the attribute name directly, could we use the com.codahale.metrics.MetricAttribute enum to specify the attribute here?

Sounds great, but we depend on Spark's transitive Dropwizard dependency, which doesn't have it in 3.1.5. Depending on Spark this way lets us avoid version conflicts between libraries.

In that case, we won't be able to use the latest version of dropwizard-metrics for filtering. It uses the built-in MetricAttribute to define the set of attributes.

Good point. It looks like dropwizard-metrics requires at least metrics-core:4.0.2.

I suggest the following: separately bump the version of metrics-core in mesosphere/spark to the latest 4.0.5 as per https://github.com/dropwizard/metrics/releases, and then add the necessary dependencies to statsd-reporter to minimize the impact on the Spark codebase itself.
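
To illustrate the idea discussed in this thread, here is a minimal sketch of naming attributes through an enum and filtering on them. It deliberately uses a local stand-in enum rather than com.codahale.metrics.MetricAttribute, since that class is not available in the Dropwizard 3.1.5 dependency mentioned above. The enum constants, their string codes, and the method names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of enum-based attribute naming and filtering. */
final class AttributeFilterSketch {

    /** Local stand-in for an attribute enum; the codes are assumptions. */
    enum Attribute {
        COUNT("count"), MIN("min"), MAX("max"), MEAN("mean"), P99("p99");

        private final String code;

        Attribute(String code) {
            this.code = code;
        }

        String code() {
            return code;
        }
    }

    /** Returns one metric name per attribute that is not disabled. */
    static List<String> metricNames(String name, Set<Attribute> disabled) {
        List<String> names = new ArrayList<>();
        for (Attribute attribute : Attribute.values()) {
            if (!disabled.contains(attribute)) {
                names.add(name + "_" + attribute.code());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Report everything for a histogram except min and p99.
        Set<Attribute> disabled = EnumSet.of(Attribute.MIN, Attribute.P99);
        System.out.println(metricNames("driver_task_duration", disabled));
    }
}
```

Compared with passing literals such as "count" at each call site, an enum keeps the attribute names in one place and gives the filter a closed set of values to check against.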

spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/InstanceType.java

spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/MetricFormatter.java

spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/Configuration.java

akirillov · 2019-05-16T03:59:00Z

@rpalaznik, @alembiewski, I've fixed the issues you pointed out during the review and all CI builds are green, so I'm merging this PR.

akirillov added the wip label May 11, 2019

akirillov requested review from alembiewski, farhan5900 and rpalaznik May 11, 2019 02:54

farhan5900 reviewed May 13, 2019

akirillov force-pushed the DCOS-53535-POC-for-statsd-reporter branch from d1f9cf3 to fe3319a May 14, 2019 02:10

akirillov added ready for review and removed wip labels May 14, 2019

akirillov requested a review from samvantran May 14, 2019 02:23

akirillov mentioned this pull request May 14, 2019

Improved naming for Mesos metrics and metric sources d2iq-archive/spark#58

Merged

akirillov force-pushed the DCOS-53535-POC-for-statsd-reporter branch 5 times, most recently from 14f0ef0 to 0b36455 May 14, 2019 05:30

rpalaznik reviewed May 14, 2019

spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/InstanceType.java (outdated, resolved)

spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/MetricFormatter.java (resolved)

alembiewski reviewed May 15, 2019

spark-statsd-reporter/src/main/java/org/apache/spark/metrics/sink/statsd/Configuration.java (outdated, resolved)

rpalaznik approved these changes May 15, 2019

DCOS-53535-StatsD-mterics-reporter-for-Spark

5dfe172

akirillov force-pushed the DCOS-53535-POC-for-statsd-reporter branch from 0b36455 to 5dfe172 May 15, 2019 17:15

akirillov merged commit 91a9074 into master May 16, 2019

5 participants
