Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add comment about monotonic gauge.
  • Loading branch information
Roman S. Borschel committed Aug 12, 2020
commit 3ea8dd9fb7f23af09cf51201dc937e0c40fa01b5
2 changes: 2 additions & 0 deletions client/network/src/service.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1133,6 +1133,8 @@ struct Metrics {
kbuckets_num_nodes: GaugeVec<U64>,
listeners_local_addresses: Gauge<U64>,
listeners_errors_total: Counter<U64>,
// Note: `network_bytes_total` is a monotonic gauge obtained by
// sampling an existing counter.
network_bytes_total: GaugeVec<U64>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
network_bytes_total: GaugeVec<U64>,
network_bytes_total: CounterVec<U64>,

Gauge is for metrics that go up and down, and Counter is for metrics that only ever increase over time except for restarts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm mistaken, but from the prometheus crate docs, counters have essentially inc() and inc_by(). Gauges can be set, which is what we need to do for these totals, i.e. these are counter values from existing counters (the running totals) reported as a gauge over time, so this is a monotonic gauge. I'm happy to be corrected.

Copy link
Contributor

@tomaka tomaka Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The problem is that this is not only a change in the programmatic API of the prometheus library.
Whether a metric is a counter or a gauge is ultimately reported to the Prometheus server running as a separate process, and then (I think?) to Grafana.

Since we know that the value can only every increase, we could use Counter::get to get the current value, then call inc_by.
Doing this would be racy if there were multiple clones of that Counter, but since we only ever access that Counter from a single location, we know that nothing will modify the counter between the call to get and the call to inc_by.

@mxinden Any opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem of updating a counter from an existing counter is often discussed in the Prometheus ecosystem. In general we suggest to implement a custom collector, see robustperception for general information, tikv/rust-prometheus#303 specifically for the Rust client and my comment at the very end tikv/rust-prometheus#303 (comment).

The reason why Counter does not implement set is, that it allows newcomers to easily shoot themselfes in the foot and thus it was not deemed worth the additional ergonomic for this advanced use case.

I see 3 ways moving forward:

  • Keep it as a Gauge. This would expose the metric with the TYPE gauge header. As @tomaka mentioned above this is ingested by Prometheus and forwarded to Grafana, but only used for query suggestions, thus not a big issue.

  • Use a Counter. As mentioned by @tomaka this is potentially racy if it ever gets set concurrently.

  • Use a custom impl Collector. Quite verbose.

I would be fine with using a Gauge here for now. I would just ask to add a comment above stating that it is monotonic. In the future we can look into a custom Collector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mxinden Thanks for your input.

I would be fine with using a Gauge here for now. I would just ask to add a comment above stating that it is monotonic. In the future we can look into a custom Collector.

I added a comment to the code. I don't have a strong preference in any case.

Out of curiosity, if gauges "must go up and down" and counters "must only increase", what do you use for a monotonically decreasing metric? I'm not sure I understand the significance of the discussions of counter vs gauge. It would almost seem more sensible to directly offer and use Atomic for both use-cases, designating it as a "gauge" or "counter" based on the semantic meaning of the metric. I guess I just don't find the distinction between gauge and counter very useful, technically, especially if the counter can "jump", both forward and back to 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you use for a monotonically decreasing metric

So far I have never come across a use case for a monotonically decreasing metric. Can you come up with something that can not also be modeled by inverting a monotonically increasing counter?

I'm not sure I understand the significance of the discussions of counter vs gauge.

The benefit of differentiating the two comes into play only at query (aggregation time). E.g. one can apply the rate function to a counter knowing that the underlying data never breaks monotonicity. I think the best summary is given by robust perception here.

Let me know if the above makes sense @romanb. More than happy to put more effort into a detailed explanation.

Copy link
Contributor Author

@romanb romanb Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I have never come across a use case for a monotonically decreasing metric. [..]

A metric for a bounded resource with fixed capacity and its exhaustion over time? I don't know. The restriction not to be able to do so just seems a bit arbitrary.

The benefit of differentiating the two comes into play only at query (aggregation time). E.g. one can apply the rate function to a counter knowing that the underlying data never breaks monotonicity. I think the best summary is given by robust perception here.

Ok, so there are special query functions with specific characteristics for gauges and counters, like rate() (counter) vs deriv() for gauges, both essentially yielding the derivative but the former makes special use of counter properties to better handle things like resets on restarts. Thanks for the pointers @mxinden.

notifications_sizes: HistogramVec,
notifications_streams_closed_total: CounterVec<U64>,
Expand Down