keps/sig-instrumentation/20200415-cardinality-enforcement.md
Lines changed: 42 additions & 61 deletions
@@ -6,12 +6,12 @@ owning-sig: sig-instrumentation
participating-sigs:
  - sig-instrumentation
reviewers:
-  - todo
+  - sig-instrumentation
approvers:
-  - todo
+  - sig-instrumentation
editor: todo
creation-date: 2020-04-15
-last-updated: 2020-04-15
+last-updated: 2020-05-19
status: provisional
---
@@ -33,33 +33,21 @@ status: provisional
## Summary

-TLDR; metrics with unbounded dimensions can cause memory issues in the components they instrument.
-
-The simple solution to this problem is to say "don't do that". SIG instrumentation has already explicitly stated this in our instrumentation guidelines: which says that ["one should know a comprehensive list of all possible values for a label at instrumentation time."](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md#dimensionality--cardinality).
-
-The problem is more complicated. First, SIG Instrumentation doesn't have a way to validate adherence to SIG instrumentation guidelines outside of reviewing PRs with instrumentation changes. Not only is this highly manual, and error-prone, we do not have a terrific existing procedure for ensuring SIG Instrumentation is tagged on relevant PRs. Even if we do have such a mechanism, it isn't a fully sufficient solution because:
-
-1. metrics changes can be seemingly innocous, even to the most diligent of code reviewers (i.e these issues are hard to catch)
-2. metrics cardinality issues exist latently; in other words, they're all over the place in Kubernetes already (so even if we could prevent 100% of these occurrence from **now on**, that wouldn't guarantee Kubernetes is free of these classes of issues).
-
+Background for this KEP is that metrics with unbounded dimensions can cause memory issues in the components they instrument. Our proposed solution is to be able to *dynamically configure* an allowlist of label values for a metric *at runtime*.

## Motivation

-TLDR; Having metrics turn into memory leaks sucks and it sucks even more when we can't fix these issues without re-releasing the entire Kubernetes binary.
-
-**Q:** *How have we approached these issues historically?*
-
-**A:** __Unfortunately, not consistently.__
+Having metrics turn into memory leaks is a problem, but an even bigger problem is that we can't fix these issues without re-releasing the entire Kubernetes binary.

-Sometimes, a [metric label dimension is intended to be bound to some known sets of values but coding mistakes cause IDs to be thrown in as a label value](https://github.com/kubernetes/kubernetes/issues/53485).
+Historically, we have approached these issues in inconsistent ways. A few examples:

-In anther case, [we opt to delete the entire metric](https://github.com/kubernetes/kubernetes/pull/74636) because it basically can't be used in a meaningfully correct way.
+- Sometimes, a [metric label dimension is intended to be bound to some known set of values but coding mistakes cause IDs to be thrown in as a label value](https://github.com/kubernetes/kubernetes/issues/53485).

-Recently we've opted to both (1) [wholesale delete a metric label](https://github.com/kubernetes/kubernetes/pull/87669) and (2) [retroactively introduce a set of acceptable values for a metric label](https://github.com/kubernetes/kubernetes/pull/87913).
+- In another case, [we opted to delete the entire metric](https://github.com/kubernetes/kubernetes/pull/74636) because it basically couldn't be used in a meaningfully correct way.

-Fixing these issues is a currently a manual process, both laborious and time-consuming.
+- Recently, we've opted to both (1) [wholesale delete a metric label](https://github.com/kubernetes/kubernetes/pull/87669) and (2) [retroactively introduce a set of acceptable values for a metric label](https://github.com/kubernetes/kubernetes/pull/87913).

-We don't have a standard prescription for resolving this class of issue. This is especially bad when you consider that this class of issue is so totally predictable (in general).
+Fixing these issues is currently a manual process, both laborious and time-consuming. We don't have a standard prescription for resolving this class of issue.

### Goals
@@ -71,19 +59,22 @@ We will expose the machinery and tools to bind a metric's labels to a discrete s
It is *not a goal* to implement and plumb this solution for each Kubernetes component (there are many SIGs and a number of verticals, which may have their own preferred ways of doing things). As such, it will be up to component owners to leverage the functionality we provide, by feeding configuration data through whatever mechanism is deemed appropriate (i.e. command line flags or reading from a file).

+These flags are really only meant to be used as escape hatches, and should not be used to build extremely customized Kubernetes setups where our existing dashboards and alerting rule definitions no longer apply generally.
+

## Proposal

-TLDR; we want to be able to *dynamically configure* a whitelist of label values for a metric.
+The simple solution to this problem would be to keep unbounded dimensions in mind for each metric added and prevent the problem from happening. SIG instrumentation has already explicitly stated this in our instrumentation guidelines, which say that ["one should know a comprehensive list of all possible values for a label at instrumentation time."](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/instrumentation.md#dimensionality--cardinality) The problem is more complicated, though. First, SIG Instrumentation doesn't have a way to validate adherence to SIG instrumentation guidelines outside of reviewing PRs with instrumentation changes. Not only is this highly manual and error-prone, we also don't have a great existing procedure for ensuring SIG Instrumentation is tagged on relevant PRs. Even if we did have such a mechanism, it wouldn't be a fully sufficient solution because:

-By *dynamically configure*, we mean configure a whitelist *at runtime* rather than during build/compile step (this is so ugh).
+1. metrics changes can be seemingly innocuous, even to the most diligent of code reviewers (i.e. these issues are hard to catch)
+2. metrics cardinality issues exist latently; in other words, they're all over the place in Kubernetes already (so even if we could prevent 100% of these occurrences from **now on**, that wouldn't guarantee Kubernetes is free of this class of issues).

-And by *at runtime*, we mean, more specifically, during the boot sequence for a Kubernetes component (and we mean daemons here, not cli tools like kubectl).
+Instead, the proposed solution is to be able to *dynamically configure* an allowlist of label values for a metric. By *dynamically configure*, we mean configure an allowlist *at runtime* rather than during the build/compile step. And by *at runtime*, we mean, more specifically, during the boot sequence for a Kubernetes component (and we mean daemons here, not CLI tools like kubectl).

-Brief aside: a Kubernetes component (which is a daemon) is an executable, which you can launch from the command line manually if you so desired. Components take a number of start-up configuration flags, which are passed into the component to modify execution paths (if curious, you can check out the [zillion flags we have on the kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/)). It is also possible to read configuration data from files (like yamls) during the component boot sequence. This KEP does not have an opinion on the specific mechanism used to load config data into a Kubernetes binary during the boot sequence. What we *actually* care about, is just the fact that it is possible.
+Brief aside: a Kubernetes component (which is a daemon) is an executable, which can be launched from the command line manually if desired. Components take a number of start-up configuration flags, which are passed into the component to modify execution paths (if curious, check out the [large number of flags we have on the kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/)). It is also possible to read configuration data from files (e.g. YAML) during the component boot sequence. This KEP does not have an opinion on the specific mechanism used to load config data into a Kubernetes binary during the boot sequence. What we *actually* care about is just the fact that it is possible.

-Our design, is thus config-ingestion agnostic.
+Our design is thus config-ingestion agnostic.

-Our design is also based on the premise that metrics can be uniquely identified (i.e. by their metric descriptor). Fortunately for us, this is actually a built in constraint of prometheus clients (which is how we instrument the Kubernetes components). This metric ID is resolvable to a unique string (this is pretty evident when you look at a raw prometheus client endpoint).
+Our design is also based on the premise that metrics can be uniquely identified (i.e. by their metric descriptor). Fortunately for us, this is actually a built-in constraint of Prometheus clients (which is how we instrument the Kubernetes components). This metric ID is resolvable to a unique string (this is pretty evident when looking at a raw Prometheus client endpoint).

We want to provide a data structure which basically maps a unique metric ID and one of its labels to a bounded set of values.
@@ -119,7 +110,7 @@ And we want to express a set of expected values:
{
  "metric-id": "some_metric",
  "label": "label_too_many_values",
-  "labelValueWhitelist": [
+  "labelValueAllowlist": [
    "1",
    "2",
    "3"
@@ -129,13 +120,13 @@ And we want to express a set of expected values:
```
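
To make the shape of this mapping concrete, here is a minimal Go sketch of the structure the JSON above describes (the type and field names are illustrative, not defined by the KEP):

```golang
package main

import (
	"encoding/json"
	"fmt"
)

// MetricLabelAllowlist pairs a uniquely identifying metric ID and one of its
// labels with the bounded set of values we are willing to accept for it.
type MetricLabelAllowlist struct {
	MetricID            string   `json:"metric-id"`
	Label               string   `json:"label"`
	LabelValueAllowlist []string `json:"labelValueAllowlist"`
}

func main() {
	raw := `{"metric-id": "some_metric", "label": "label_too_many_values", "labelValueAllowlist": ["1", "2", "3"]}`
	var rule MetricLabelAllowlist
	if err := json.Unmarshal([]byte(raw), &rule); err != nil {
		panic(err)
	}
	// Prints: {MetricID:some_metric Label:label_too_many_values LabelValueAllowlist:[1 2 3]}
	fmt.Printf("%+v\n", rule)
}
```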
-Since we already have an interception layer built into our Kubernetes monitoring stack (from the metrics stability effort), we can leverage existing wrappers to provide a global entrypoint for intercepting and enforcing rules for individual metrics.
+Since we already have an interception layer built into our Kubernetes monitoring stack (from the metrics stability effort), we can leverage existing wrappers to provide a global entrypoint for intercepting and enforcing rules for individual metrics.
-**Q:** *But won't this invalidate our metric data?*
+As a result, metric data will not be invalid; the data will just be more coarse when the metric type is a counter. In the case of a gauge-type metric, the data will be invalid, but the number of series will be bounded.

-**A:** __It shouldn't invalidate metric data. But metric data may be more coarse.__
+Following is an example to demonstrate the proposed solution.
-It's easier to demonstrate our strategy than explain stuff out in written word, so we're going to just do that. Let's say we have some client code somewhere which instantiates our metric from above:
+Let's say we have some client code somewhere which instantiates our metric from above:
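
The instantiation itself is unchanged context in this diff, so it isn't shown; a representative sketch using the plain Prometheus Go client (the Kubernetes wrappers mirror this shape; the metric matches the example above) might be:

```golang
package main

import "github.com/prometheus/client_golang/prometheus"

// someMetric is a counter vector with a label whose value set we intend to
// bound via the allowlist machinery described in this KEP.
var someMetric = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "some_metric",
		Help: "An example metric with a label prone to unbounded values.",
	},
	[]string{"label_too_many_values"},
)

func init() {
	// Register with the default registry so the metric shows up on /metrics.
	prometheus.MustRegister(someMetric)
}
```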
-This would not change. What would change is if we encounter label values outside our explicit whitelist of values. So if where to encounter this malicious piece of code:
+This would not change. What would change is if we encounter label values outside our explicit allowlist of values. So if we were to encounter this malicious piece of code:
```golang
@@ -180,7 +171,7 @@ This would not change. What would change is if we encounter label values outside
}
```
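
The body of this block is elided by the diff; a sketch of the kind of loop that triggers the problem (assuming the `someMetric` counter vector from the sketch above) would be:

```golang
package main

import "fmt"

// explode demonstrates the failure mode: every distinct label value mints a
// brand-new time series, so looping over unique values manufactures a million
// series from the single someMetric vector defined above.
func explode() {
	for i := 0; i < 1000000; i++ {
		someMetric.WithLabelValues(fmt.Sprintf("%d", i)).Inc()
	}
}
```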
-Then in existing Kubernetes components, we would have a terrible memory leak, since we would effectively create a million metrics (one per unique label value). If we curled our prometheus endpoint it would thus look something like this:
+Then in existing Kubernetes components, we would have a terrible memory leak, since we would effectively create a million metric series (one per unique label value). If we curled our metrics endpoint, it would thus look something like this:
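
The sample output is elided by the diff; abbreviated, the scrape would look roughly like this (values illustrative, standard Prometheus exposition format):

```
# HELP some_metric An example metric with a label prone to unbounded values.
# TYPE some_metric counter
some_metric{label_too_many_values="0"} 1
some_metric{label_too_many_values="1"} 1
some_metric{label_too_many_values="2"} 1
...
some_metric{label_too_many_values="999999"} 1
```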
-We can effect this change by registering a whitelist to our prom registries during the boot-up sequence. Since we intercept metric registration events and have a wrapper around each of the primitive prometheus metric types, we can effectively add some code that looks like this (disclaimer: the below is pseudocode for demonstration):
+We can effect this change by registering an allowlist with our Prometheus registries during the boot-up sequence. Since we intercept metric registration events and have a wrapper around each of the primitive Prometheus metric types, we can effectively add some code that looks like this (disclaimer: the below is pseudocode for demonstration):
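
The KEP's pseudocode is elided by the diff; a minimal sketch of the interception idea, wrapping the Prometheus counter vector type (all names here, including the `unexpected` overflow value, are illustrative assumptions rather than decisions made by this KEP):

```golang
package metrics

import "github.com/prometheus/client_golang/prometheus"

// allowlistedCounterVec wraps a *prometheus.CounterVec and constrains one of
// its labels to a configured allowlist before any series is created.
type allowlistedCounterVec struct {
	inner     *prometheus.CounterVec
	labelIdx  int             // position of the constrained label
	allowlist map[string]bool // permitted values for that label
}

// WithLabelValues intercepts label values on their way to the underlying
// vector. Values outside the allowlist are coerced into a single overflow
// bucket, so the number of series stays bounded no matter what callers pass.
func (v *allowlistedCounterVec) WithLabelValues(values ...string) prometheus.Counter {
	if v.labelIdx < len(values) && !v.allowlist[values[v.labelIdx]] {
		values[v.labelIdx] = "unexpected"
	}
	return v.inner.WithLabelValues(values...)
}
```

With the rule from the JSON example registered, `WithLabelValues("4")` would be recorded under `label_too_many_values="unexpected"`, collapsing all out-of-allowlist traffic into one series.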
-This design allows us to optionally adopt @lilic's excellent idea about simplifying the interface for component owners, who can then opt to just specify a metric and label pair *without* having to specify a whitelist. Personally, I like that idea since it simplifies how a component owner can implement our cardinality enforcing helpers without having to necessary plumb through complicated maps. This would make it considerably easier to feed this data in through the command line since you could do something like this:
+This design allows us to optionally adopt the idea of simplifying the interface for component owners, who can then opt to just specify a metric and label pair *without* having to specify an allowlist. The benefit of this idea is that it simplifies how a component owner can implement our cardinality-enforcing helpers without necessarily having to plumb through complicated maps. This would make it considerably easier to feed this data in through the command line, since it could be done like this:
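
The flag example itself is elided by the diff; purely as an illustration (the flag name and syntax here are hypothetical, not settled by this KEP), specifying just a metric/label pair might look like:

```
--bounded-metric-labels=some_metric/label_too_many_values
```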
-..which would then be interpreted by our machinery as this:
+This would then be interpreted by our machinery as this:
```json
[
  {
    "metric-id": "some_metric",
    "label": "label_too_many_values",
-    "labelValueWhitelist": []
+    "labelValueAllowlist": []
  }
]
```
## Open-Question
-_(Discussion Points which need to be resolved prior to merge)_
-
-- @dashpole
-
-> Should have labels with a specific set of values, should we start enforcing that all metrics have a whitelist at compile-time?
-
-- @x13n
-
-> ... instead of getting label_too_many_values right away, [enforcing the cardinality limit directly] would still work until certain label cardinality limit is reached. Whitelisting would guarantee a value will not be dropped, but other values wouldn't be dropped either unless there is too many of them. Cluster admin can configure the per metric and per label limits once and can get alerted on "some metric labels are dropped" instead of "your metrics storage is getting blown up".
-
-- @brancz/@lilic
-
-> Potentially we would want to treat buckets completely separately (as in a separate flag just for bucket configuration of histograms). @bboreham opened the original PR for apiserver request duration bucket reduction, maybe he has some input as well.
->
-> My biggest concern I think with all of this is, it's going to be super easy to have extremely customized kubernetes setups where our existing dashboards and alerting rule definitions are just not going to apply generally anymore. I'd like to make sure we emphasize that these flags are really only meant to be used as escape hatches, and we must always strive to truly fix the root of the issue.