Skip to content
This repository was archived by the owner on Dec 4, 2024. It is now read-only.

Conversation

@akirillov
Copy link
Contributor

@akirillov akirillov commented May 11, 2019

What changes were proposed in this pull request?

Resolves DCOS-53535

Dedicated StatsD metric reporter with tagging and metric name standardization support.

Original problem:
Spark assigns metric names using spark.app.id and spark.executor.id as a part of them. Thus the number of metrics is continuously growing because those IDs are unique between executions whereas the metrics themselves report the same thing. Another issue which arises here is how to use constantly changing metric names in dashboards.

For example, jvm_heap_used reported by all Spark instances (components):

  • jvm_heap_used (Dispatcher)
  • <spark.app.id>_driver_jvm_heap_used (driver)
  • <spark.app.id>_<spark.executor.id>_jvm_heap_used (executor)

This PR provides PoC implementation of StatsD reporter which removes variable parts of metric names and moves them to tags to ease the pressure on underlying time series database and provide metric names consistent across executions.

Example:

  • before: <spark.app.id>_driver_jvm_heap_used (driver)
    after: driver_jvm_heap_used,instance_type=driver,instance_id=<spark.app.id>
  • before: <spark.app.id>_<spark.executor.id>_jvm_heap_used (executor)
    after: executor_jvm_heap_used,instance_type=executor,instance_id=<spark.executor.id>

How were these changes tested?

By manually running Spark jobs submitted via Dispatcher/Metronome/Marathon/spark-submit and a sample Grafana Dashboard:

image

image

Release Notes

  • consistent metrics naming and tagging support for DCOS monitoring

Copy link
Contributor

@farhan5900 farhan5900 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we try to put filtering work, we need the MetricAttribute enum. I think it would be a good idea to use it here also.

histograms.forEach((name, histogram) -> {
Snapshot snapshot = histogram.getSnapshot();
send(socket,
metricFormatter.buildMetricString(name, "count", histogram.getCount(), GAUGE),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intead of using attribute directly, could we use com.codahale.metrics.MetricAttribute enum for specifying attribute here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great, but we depend on Spark transitive Dropwizard dependency which doesn't have it in 3.1.5. Depending on Spark this way allows us to avoid any version conflicts in libraries.

Copy link
Contributor

@farhan5900 farhan5900 May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we won't be able to use latest version of dropwizard-metrics for filtering. It uses MetricAttribute in-built for defining set of attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It looks like dropwizard-metrics requires at least metrics-core:4.02

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest doing the following: separately bump the version of metrics-core in mesosphere/spark to the latest 4.0.5 as per https://github.com/dropwizard/metrics/releases and then add necessary dependencies to statsd-reporter to minimize the impact on Spark codebase itself.

@akirillov akirillov force-pushed the DCOS-53535-POC-for-statsd-reporter branch from d1f9cf3 to fe3319a Compare May 14, 2019 02:10
@akirillov akirillov requested a review from samvantran May 14, 2019 02:23
@akirillov akirillov force-pushed the DCOS-53535-POC-for-statsd-reporter branch 5 times, most recently from 14f0ef0 to 0b36455 Compare May 14, 2019 05:30
@akirillov akirillov force-pushed the DCOS-53535-POC-for-statsd-reporter branch from 0b36455 to 5dfe172 Compare May 15, 2019 17:15
@akirillov
Copy link
Contributor Author

@rpalaznik, @alembiewski, I've fixed the issues you've pointed out at during the review and all CI builds are green so I'm merging this PR.

@akirillov akirillov merged commit 91a9074 into master May 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants