Commit 63974c9

keps/../20200415-cardinality-enforcement.md: Adjust wording and clarify a few points
Also removes open questions as those were solved in the discussion.
1 parent 6551a76

keps/sig-instrumentation/20200415-cardinality-enforcement.md
1 file changed: 42 additions, 61 deletions
@@ -6,12 +6,12 @@ owning-sig: sig-instrumentation
 participating-sigs:
   - sig-instrumentation
 reviewers:
-  - todo
+  - sig-instrumentation
 approvers:
-  - todo
+  - sig-instrumentation
 editor: todo
 creation-date: 2020-04-15
-last-updated: 2020-04-15
+last-updated: 2020-05-19
 status: provisional
 ---
 
@@ -33,33 +33,21 @@ status: provisional
 
 ## Summary
 
-TLDR; metrics with unbounded dimensions can cause memory issues in the components they instrument.
-
-The simple solution to this problem is to say "don't do that". SIG instrumentation has already explicitly stated this in our instrumentation guidelines: which says that ["one should know a comprehensive list of all possible values for a label at instrumentation time."](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md#dimensionality--cardinality).
-
-The problem is more complicated. First, SIG Instrumentation doesn't have a way to validate adherence to SIG instrumentation guidelines outside of reviewing PRs with instrumentation changes. Not only is this highly manual, and error-prone, we do not have a terrific existing procedure for ensuring SIG Instrumentation is tagged on relevant PRs. Even if we do have such a mechanism, it isn't a fully sufficient solution because:
-
-1. metrics changes can be seemingly innocous, even to the most diligent of code reviewers (i.e these issues are hard to catch)
-2. metrics cardinality issues exist latently; in other words, they're all over the place in Kubernetes already (so even if we could prevent 100% of these occurrence from **now on**, that wouldn't guarantee Kubernetes is free of these classes of issues).
-
+Metrics with unbounded dimensions can cause memory issues in the components they instrument. Our proposed solution is to *dynamically configure* an allowlist of label values for a metric *at runtime*.
 
 ## Motivation
 
-TLDR; Having metrics turn into memory leaks sucks and it sucks even more when we can't fix these issues without re-releasing the entire Kubernetes binary.
-
-**Q:** *How have we approached these issues historically?*
-
-**A:** __Unfortunately, not consistently.__
+Having metrics turn into memory leaks is a problem, and an even bigger problem is that we can't fix these issues without re-releasing the entire Kubernetes binary.
 
-Sometimes, a [metric label dimension is intended to be bound to some known sets of values but coding mistakes cause IDs to be thrown in as a label value](https://github.com/kubernetes/kubernetes/issues/53485).
+Historically, we have approached these issues in various, inconsistent ways. A few examples:
 
-In anther case, [we opt to delete the entire metric](https://github.com/kubernetes/kubernetes/pull/74636) because it basically can't be used in a meaningfully correct way.
+- Sometimes, a [metric label dimension is intended to be bound to some known sets of values but coding mistakes cause IDs to be thrown in as a label value](https://github.com/kubernetes/kubernetes/issues/53485).
 
-Recently we've opted to both (1) [wholesale delete a metric label](https://github.com/kubernetes/kubernetes/pull/87669) and (2) [retroactively introduce a set of acceptable values for a metric label](https://github.com/kubernetes/kubernetes/pull/87913).
+- In another case, [we opted to delete the entire metric](https://github.com/kubernetes/kubernetes/pull/74636) because it basically couldn't be used in a meaningfully correct way.
 
-Fixing these issues is a currently a manual process, both laborious and time-consuming.
+- Recently we've opted to both (1) [wholesale delete a metric label](https://github.com/kubernetes/kubernetes/pull/87669) and (2) [retroactively introduce a set of acceptable values for a metric label](https://github.com/kubernetes/kubernetes/pull/87913).
 
-We don't have a standard prescription for resolving this class of issue. This is especially bad when you consider that this class of issue is so totally predictable (in general).
+Fixing these issues is currently a manual process, both laborious and time-consuming. We don't have a standard prescription for resolving this class of issue.
 
 ### Goals
 
@@ -71,19 +59,22 @@ We will expose the machinery and tools to bind a metric's labels to a discrete s
 
 It is *not a goal* to implement and plumb this solution for each Kubernetes component (there are many SIGs and a number of verticals, which may have their own preferred way of doing things). As such it will be up to component owners to leverage this functionality that we provide, by feeding configuration data through whatever mechanism deemed appropriate (i.e. command line flags or reading from a file).
 
+These flags are really only meant to be used as escape hatches, and should not be used to create extremely customized Kubernetes setups where our existing dashboards and alerting rule definitions no longer apply generally.
+
 ## Proposal
 
-TLDR; we want to be able to *dynamically configure* a whitelist of label values for a metric.
+The simple solution to this problem would be to keep unbounded dimensions in mind for each metric added and prevent the problem from happening. SIG Instrumentation has already explicitly stated this in our instrumentation guidelines, which say that ["one should know a comprehensive list of all possible values for a label at instrumentation time."](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md#dimensionality--cardinality). The problem is more complicated, though. First, SIG Instrumentation doesn't have a way to validate adherence to its guidelines outside of reviewing PRs with instrumentation changes. Not only is this highly manual and error-prone, we also do not have a reliable procedure for ensuring SIG Instrumentation is tagged on relevant PRs. Even if we did have such a mechanism, it wouldn't be a fully sufficient solution because:
 
-By *dynamically configure*, we mean configure a whitelist *at runtime* rather than during build/compile step (this is so ugh).
+1. metrics changes can be seemingly innocuous, even to the most diligent of code reviewers (i.e. these issues are hard to catch)
+2. metrics cardinality issues exist latently; in other words, they're all over the place in Kubernetes already (so even if we could prevent 100% of these occurrences from **now on**, that wouldn't guarantee Kubernetes is free of this class of issues).
 
-And by *at runtime*, we mean, more specifically, during the boot sequence for a Kubernetes component (and we mean daemons here, not cli tools like kubectl).
+Instead, the proposed solution is to be able to *dynamically configure* an allowlist of label values for a metric. By *dynamically configure*, we mean configure an allowlist *at runtime* rather than during the build/compile step. And by *at runtime*, we mean, more specifically, during the boot sequence for a Kubernetes component (and we mean daemons here, not CLI tools like kubectl).
 
-Brief aside: a Kubernetes component (which is a daemon) is an executable, which you can launch from the command line manually if you so desired. Components take a number of start-up configuration flags, which are passed into the component to modify execution paths (if curious, you can check out the [zillion flags we have on the kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/)). It is also possible to read configuration data from files (like yamls) during the component boot sequence. This KEP does not have an opinion on the specific mechanism used to load config data into a Kubernetes binary during the boot sequence. What we *actually* care about, is just the fact that it is possible.
+Brief aside: a Kubernetes component (which is a daemon) is an executable, which can be launched from the command line manually if desired. Components take a number of start-up configuration flags, which are passed into the component to modify execution paths (if curious, check out the [large number of flags we have on the kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/)). It is also possible to read configuration data from files (like YAML) during the component boot sequence. This KEP does not have an opinion on the specific mechanism used to load config data into a Kubernetes binary during the boot sequence. What we *actually* care about is just the fact that it is possible.
 
-Our design, is thus config-ingestion agnostic.
+Our design is thus config-ingestion agnostic.
 
-Our design is also based on the premise that metrics can be uniquely identified (i.e. by their metric descriptor). Fortunately for us, this is actually a built in constraint of prometheus clients (which is how we instrument the Kubernetes components). This metric ID is resolvable to a unique string (this is pretty evident when you look at a raw prometheus client endpoint).
+Our design is also based on the premise that metrics can be uniquely identified (i.e. by their metric descriptor). Fortunately for us, this is actually a built-in constraint of prometheus clients (which is how we instrument the Kubernetes components). This metric ID is resolvable to a unique string (this is pretty evident when looking at a raw prometheus client endpoint).
 
 We want to provide a data structure which basically maps a unique metric ID and one of its labels to a bounded set of values.
 
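For readers skimming the diff, the mapping described at the end of the hunk above (unique metric ID and label name to a bounded set of values) could be sketched in Go roughly as follows; the type and method names here are illustrative, not part of the KEP:

```golang
// Illustrative sketch only: metric ID -> label name -> allowed label values.
type labelValueSet map[string]struct{}

type metricLabelAllowlists map[string]map[string]labelValueSet

// allowed reports whether a label value may be kept as-is; metrics and
// labels without a registered allowlist stay unconstrained.
func (a metricLabelAllowlists) allowed(metricID, label, value string) bool {
	labels, ok := a[metricID]
	if !ok {
		return true
	}
	values, ok := labels[label]
	if !ok {
		return true
	}
	_, ok = values[value]
	return ok
}
```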
@@ -119,7 +110,7 @@ And we want to express a set of expected values:
 {
     "metric-id": "some_metric",
     "label": "label_too_many_values",
-    "labelValueWhitelist": [
+    "labelValueAllowlist": [
         "1",
         "2",
         "3"
@@ -129,13 +120,13 @@ And we want to express a set of expected values:
 
 ```
 
-Since we already have an interception layer built into our Kubernetes monitoring stack (from the metrics stability effort), we can leverage existing wrappers to provide a global entrypoint for intercepting and enforcing rules for individual metrics.
+Since we already have an interception layer built into our Kubernetes monitoring stack (from the metrics stability effort), we can leverage existing wrappers to provide a global entrypoint for intercepting and enforcing rules for individual metrics.
 
-**Q:** *But won't this invalidate our metric data?*
+As a result, metric data will not be invalid; the data will just be more coarse when the metric type is a counter. In the case of a gauge-type metric the data will be invalid, but the number of series would be bounded.
 
-**A:** __It shouldn't invalidate metric data. But metric data may be more coarse.__
+The following example demonstrates the proposed solution.
 
-It's easier to demonstrate our strategy than explain stuff out in written word, so we're going to just do that. Let's say we have some client code somewhere which instantiates our metric from above:
+Let's say we have some client code somewhere which instantiates our metric from above:
 
 ```golang
 SomeMetric = metrics.NewCounterVec(
@@ -169,7 +160,7 @@ some_metric{label_too_many_values="1"} 1
 some_metric{label_too_many_values="2"} 1
 some_metric{label_too_many_values="3"} 1
 ```
-This would not change. What would change is if we encounter label values outside our explicit whitelist of values. So if where to encounter this malicious piece of code:
+This would not change. What would change is if we encounter label values outside our explicit allowlist of values. So if we were to encounter this malicious piece of code:
 
 
 ```golang
@@ -180,7 +171,7 @@ This would not change. What would change is if we encounter label values outside
 }
 ```
 
-Then in existing Kubernetes components, we would have a terrible memory leak, since we would effectively create a million metrics (one per unique label value). If we curled our prometheus endpoint it would thus look something like this:
+Then in existing Kubernetes components, we would have a terrible memory leak, since we would effectively create a million metrics (one per unique label value). If we curled our metrics endpoint it would thus look something like this:
 
 ```prometheus
 # HELP some_metric alalalala
@@ -190,11 +181,14 @@ some_metric{label_too_many_values="2"} 1
 some_metric{label_too_many_values="3"} 1
 some_metric{label_too_many_values="4"} 1
 ... //
+... //
 ... // zillion metrics here
 some_metric{label_too_many_values="1000003"} 1
 ```
 
-That would suck. With our cardinality enforcer in place, we would expect the output to look like this:
+In total we would have more than a million series for one metric, which is known to cause memory problems.
+
+With our cardinality enforcer in place, we would expect the output to look like this:
 
 ```prometheus
 # HELP some_metric alalalala
@@ -205,7 +199,8 @@ some_metric{label_too_many_values="3"} 1
 some_metric{label_too_many_values="unexpected"} 1000000
 ```
 
-We can effect this change by registering a whitelist to our prom registries during the boot-up sequence. Since we intercept metric registration events and have a wrapper around each of the primitive prometheus metric types, we can effectively add some code that looks like this (disclaimer: the below is pseudocode for demonstration):
+We can effect this change by registering an allowlist with our prometheus registries during the boot-up sequence. Since we intercept metric registration events and have a wrapper around each of the primitive prometheus metric types, we can effectively add some code that looks like this (disclaimer: the below is pseudocode for demonstration):
+
 
 ```golang
 const (
@@ -220,11 +215,11 @@ func (v *CounterVec) WithLabelValues(lvs ...string) CounterMetric {
 	newLabels = make([]string, len(lvs))
 	for i, l := range lvs {
 
-		// do we have a whitelist on any of the labels for this metric?
-		if metricLabelWhitelist, ok := MetricsLabelWhitelist[BuildFQName(v.CounterOpts.Namespace, v.CounterOpts.Subsystem, v.CounterOpts.Name)]; ok {
-			// do we have a whitelist for this label on this metric?
-			if whitelist, ok := metricLabelWhitelist[v.originalLabels[i]]; ok {
-				if whitelist.Has(l) {
+		// do we have an allowlist on any of the labels for this metric?
+		if metricLabelAllowlist, ok := MetricsLabelAllowlist[BuildFQName(v.CounterOpts.Namespace, v.CounterOpts.Subsystem, v.CounterOpts.Name)]; ok {
+			// do we have an allowlist for this label on this metric?
+			if allowlist, ok := metricLabelAllowlist[v.originalLabels[i]]; ok {
+				if allowlist.Has(l) {
 					newLabels[i] = l
 					continue
 				}
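The hunk above is explicitly pseudocode, so as a sanity check, here is a self-contained sketch of the same interception idea that compiles on its own; the plain maps and the `unexpected` sentinel are assumptions standing in for the real wrapper types and const block in the metrics stability framework:

```golang
package main

import "fmt"

// Assumed sentinel; stands in for the placeholder const block in the diff.
const unexpectedLabelValue = "unexpected"

// metric ID -> label name -> allowed values (flat maps for brevity).
var metricsLabelAllowlist = map[string]map[string]map[string]bool{
	"some_metric": {
		"label_too_many_values": {"1": true, "2": true, "3": true},
	},
}

// sanitizeLabelValues mirrors the loop in WithLabelValues: any value outside
// a registered allowlist is coerced to the sentinel, bounding the series count.
func sanitizeLabelValues(metricID string, labelNames, lvs []string) []string {
	newLabels := make([]string, len(lvs))
	for i, l := range lvs {
		if labelAllowlist, ok := metricsLabelAllowlist[metricID]; ok {
			if allowlist, ok := labelAllowlist[labelNames[i]]; ok && !allowlist[l] {
				newLabels[i] = unexpectedLabelValue
				continue
			}
		}
		newLabels[i] = l
	}
	return newLabels
}

func main() {
	names := []string{"label_too_many_values"}
	fmt.Println(sanitizeLabelValues("some_metric", names, []string{"2"}))      // [2]
	fmt.Println(sanitizeLabelValues("some_metric", names, []string{"424242"})) // [unexpected]
}
```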
@@ -238,42 +233,27 @@ func (v *CounterVec) WithLabelValues(lvs ...string) CounterMetric {
 }
 ```
 
-This design allows us to optionally adopt @lilic's excellent idea about simplifying the interface for component owners, who can then opt to just specify a metric and label pair *without* having to specify a whitelist. Personally, I like that idea since it simplifies how a component owner can implement our cardinality enforcing helpers without having to necessary plumb through complicated maps. This would make it considerably easier to feed this data in through the command line since you could do something like this:
+This design allows us to optionally adopt the idea of simplifying the interface for component owners, who can then opt to just specify a metric and label pair *without* having to specify an allowlist. The good part of this idea is that it simplifies how a component owner can implement our cardinality-enforcing helpers without having to necessarily plumb through complicated maps. This would make it considerably easier to feed this data in through the command line, since it can be done like this:
 
 ```bash
 $ kube-apiserver --accepted-metric-labels "some_metric=label_too_many_values"
 ```
 
-..which would then be interpreted by our machinery as this:
+This would then be interpreted by our machinery as follows:
 
 
 ```json
 [
     {
         "metric-id": "some_metric",
         "label": "label_too_many_values",
-        "labelValueWhitelist": []
+        "labelValueAllowlist": []
     }
 ]
 
 ```
 
 ## Open-Question
-_(Discussion Points which need to be resolved prior to merge)_
-
-- @dashpole
-
-> Should have labels with a specific set of values, should we start enforcing that all metrics have a whitelist at compile-time?
-
-- @x13n
-
-> ... instead of getting label_too_many_values right away, [enforcing the cardinality limit directly] would still work until certain label cardinality limit is reached. Whitelisting would guarantee a value will not be dropped, but other values wouldn't be dropped either unless there is too many of them. Cluster admin can configure the per metric and per label limits once and can get alerted on "some metric labels are dropped" instead of "your metrics storage is getting blown up".
-
-- @brancz/@lilic
-
-> Potentially we would want to treat buckets completely separately (as in a separate flag just for bucket configuration of histograms). @bboreham opened the original PR for apiserver request duration bucket reduction, maybe he has some input as well.
->
-> My biggest concern I think with all of this is, it's going to be super easy to have extremely customized kubernetes setups where our existing dashboards and alerting rule definitions are just not going to apply generally anymore. I'd like to make sure we emphasize that these flags are really only meant to be used as escape hatches, and we must always strive to truly fix the root of the issue.
 
 
 ## Graduation Criteria
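For the `--accepted-metric-labels` flag shown in the hunk above, the translation into the empty-allowlist JSON could plausibly look like the sketch below; the parser, its comma-separated multi-pair convention, and all names here are assumptions for illustration, not behavior specified by the KEP:

```golang
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// acceptedMetricLabel matches the JSON shape shown in the hunk above.
type acceptedMetricLabel struct {
	MetricID            string   `json:"metric-id"`
	Label               string   `json:"label"`
	LabelValueAllowlist []string `json:"labelValueAllowlist"`
}

// parseAcceptedMetricLabels turns "metric=label" pairs (comma-separated, an
// assumed convention) into entries with an empty allowlist, mirroring the
// JSON interpretation shown in the diff.
func parseAcceptedMetricLabels(flagValue string) ([]acceptedMetricLabel, error) {
	var out []acceptedMetricLabel
	for _, pair := range strings.Split(flagValue, ",") {
		parts := strings.SplitN(pair, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("expected metric=label, got %q", pair)
		}
		out = append(out, acceptedMetricLabel{
			MetricID:            parts[0],
			Label:               parts[1],
			LabelValueAllowlist: []string{},
		})
	}
	return out, nil
}

func main() {
	entries, err := parseAcceptedMetricLabels("some_metric=label_too_many_values")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	json.NewEncoder(os.Stdout).Encode(entries)
}
```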
@@ -287,4 +267,5 @@ todo
 
 ## Implementation History
 
-todo
+todo
+