Change the instance name for standard pod scraping to be unique
Any of the potentially many containers in a pod can expose one or more
ports with Prometheus metrics. However, with our current target
labels, all of these targets get the same instance label (just the pod
name), which leads to the dreaded `PrometheusOutOfOrderTimestamps`
alert; see grafana/deployment_tools#3441.

(In fact, if we get the alert, we are already lucky, because the
problem can go unnoticed until someone actually needs one of the time
series that receive samples from different targets, rendering them
useless.)

In practice, we rarely have more than one port to scrape per pod, but
it does happen, and it's totally within the intended usage pattern of
K8s, which means it can become more common at any time.

The two examples I'm aware of:

- Kube-state-metrics (KSM) has only one container in its pod, but that
  container exposes two metrics ports (http-metrics and self-metrics).

- Consul pods run a container with the consul-exporter and a container
  with the statsd-exporter, each exposing their metrics on a different
  port. Both ports are named http-metrics, which is possible because
  they are exposed by different containers. (This is the case that
  triggered the above linked issue.)
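
To illustrate the collision, here is a minimal Python sketch that groups
targets by the old instance label (pod name only). The pod and container
names are hypothetical; only the port names come from the two cases above.

```python
from collections import defaultdict

# Hypothetical scrape targets as (pod, container, port) tuples.
# Pod and container names are made up for illustration; the port
# names are the ones mentioned in the two examples above.
TARGETS = [
    ("kube-state-metrics-0", "kube-state-metrics", "http-metrics"),
    ("kube-state-metrics-0", "kube-state-metrics", "self-metrics"),
    ("consul-0", "consul-exporter", "http-metrics"),
    ("consul-0", "statsd-exporter", "http-metrics"),
]


def colliding_instances(targets):
    """Return old-style instance values (pod name) shared by several targets."""
    by_instance = defaultdict(list)
    for pod, container, port in targets:
        by_instance[pod].append((container, port))
    return {inst: eps for inst, eps in by_instance.items() if len(eps) > 1}


collisions = colliding_instances(TARGETS)
for instance, endpoints in collisions.items():
    print(instance, "is shared by", endpoints)
```

Under these assumptions, both pods end up with two targets behind a single
instance value, which is the situation that leads to the alert.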

To avoid the metric duplication, we could add a container and a port
label, but it is a Prometheus convention that the instance label alone
should be unique within a job.

Which brings us to what I'm proposing in this commit: Create the
instance label by joining pod name, container name, and port name with
`:` in between. In most cases, the resulting instance value will
appear redundant, but I believe the consistency has some
value. Applying some magic to shorten the instance label when possible
would add complexity and remove the consistency.
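
As a rough model of what the proposed rule does, the following Python
sketch mimics a `replace` relabel action: join the source label values
with the separator, match the result against an anchored regex, and write
the replacement to the target label. It is a simplification, not
Prometheus's actual implementation; `relabel_replace` is a hypothetical
helper, and the pod and container names are assumptions.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    separator=";", regex="(.*)", replacement="$1"):
    """Simplified model of a 'replace' relabel action (hypothetical helper)."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)
    if match is None:
        # No match: the label set is left untouched.
        return dict(labels)
    # Translate $1-style references into Python's \g<1> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


SOURCE_LABELS = [
    "__meta_kubernetes_pod_name",
    "__meta_kubernetes_pod_container_name",
    "__meta_kubernetes_pod_container_port_name",
]


def instance_for(pod, container, port):
    """Build the instance label for one target using the proposed rule."""
    meta = dict(zip(SOURCE_LABELS, (pod, container, port)))
    return relabel_replace(meta, SOURCE_LABELS, "instance", separator=":")["instance"]


# Hypothetical Consul pod: two containers, same port name.
a = instance_for("consul-0", "consul-exporter", "http-metrics")
b = instance_for("consul-0", "statsd-exporter", "http-metrics")
print(a)
print(b)
```

With these inputs the two Consul targets get different instance values
even though they share a pod name and a port name.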
beorn7 committed May 15, 2020
commit 26402f2538cc1304bfd197c3a771fd2322608b76
23 changes: 17 additions & 6 deletions prometheus-ksonnet/lib/prometheus-config.libsonnet
@@ -141,10 +141,16 @@
  target_label: 'namespace',
  },

- // Rename instances to be the pod name
+ // Rename instances to the concatenation of pod:container:port.
+ // All three components are needed to guarantee a unique instance label.
  {
- source_labels: ['__meta_kubernetes_pod_name'],
+ source_labels: [
+ '__meta_kubernetes_pod_name',
+ '__meta_kubernetes_pod_container_name',
+ '__meta_kubernetes_pod_container_port_name',
+ ],
  action: 'replace',
+ separator: ':',
  target_label: 'instance',
  },

@@ -192,11 +198,16 @@
  action: 'keep',
  },

- // Rename instances to be the pod name.
- // As the scrape two ports of KSM, include the port name in the instance
- // name. Otherwise alerts about scrape failures and timeouts won't work.
+ // Rename instances to the concatenation of pod:container:port.
+ // In the specific case of KSM, we could leave out the container
+ // name and still have a unique instance label, but we leave it
+ // in here for consistency with the normal pod scraping.
  {
- source_labels: ['__meta_kubernetes_pod_name', '__meta_kubernetes_pod_container_port_name'],
+ source_labels: [
+ '__meta_kubernetes_pod_name',
+ '__meta_kubernetes_pod_container_name',
+ '__meta_kubernetes_pod_container_port_name',
+ ],
  action: 'replace',
  separator: ':',
  target_label: 'instance',