Enable Auth for the metrics endpoints for the controllers #81
Conversation
Force-pushed 6173cc9 to 7a4c32a
/test e2e-aws

/retest
Force-pushed ae7f6e3 to df673d6
/test e2e-aws-operator

/test e2e-aws
Force-pushed df673d6 to 7124468
```yaml
apiVersion: servicecertsigner.config.openshift.io/v1alpha1
kind: APIServiceCABundleInjectorConfig
caBundleFile: /var/run/configmaps/signing-cabundle/ca-bundle.crt
authentication:
```
@stlaz Is this just a configuration change that prompts library code to ensure the metrics endpoint is secured? If so, why would that require operator-specific e2e coverage?
It would require a library extended test, or at least an integration one, but we would still want to check that the operator is actually behaving the intended way and publishes its metrics as desired, no?
I finally got to browse through the code - it was not my point to try to check that the endpoints are authenticated, but that they are actually showing something
Couldn't that be validated in a unit test?
Edit: Likely not a unit test, but if the metrics endpoint is exposed by a library function, would it make sense to implement a library test that operators could trivially reuse? i.e.

```go
func TestMetricsEndpoint(t *testing.T) {
	libraryTestFunc(t, myEndpointCfg, func(result []byte) {
		// validate the output in an operator-specific way
	})
}
```
I think that would be a fine general approach to test an endpoint, yes. I wonder how many endpoints we'd like to test this way, though.
I've identified 10 operators in the service ca compatibility audit that should be tested this way, and there are likely more.
What @marun suggested makes sense to me.
Force-pushed 70a153d to b43a5b5
test/e2e/e2e_test.go
Outdated
```go
TypeMeta: metav1.TypeMeta{
	Kind:       "Route",
	APIVersion: "route.openshift.io/v1",
},
```
No need to specify TypeMeta, I think the client takes care of it
test/e2e/e2e_test.go
Outdated
```go
TypeMeta: metav1.TypeMeta{
	Kind:       "Service",
	APIVersion: "v1",
},
```
ditto
test/e2e/e2e_test.go
Outdated
```go
ObjectMeta: metav1.ObjectMeta{
	Name:      s.ObjectMeta.Name,
	Namespace: s.ObjectMeta.Namespace,
	Labels:    labels,
},
Spec: routev1.RouteSpec{
	To: routev1.RouteTargetReference{
		Kind: "Service",
		Name: s.ObjectMeta.Name,
	},
	Port: &routev1.RoutePort{
		TargetPort: s.Spec.Ports[0].TargetPort,
	},
```
Just use constant names
test/e2e/e2e_test.go
Outdated
```go
}
defer res.Body.Close()
if res.StatusCode != 403 {
	t.Fatalf("The metrics endpoint is not secured: %v", err)
```
You may want to see what was actually returned in order to realize what went wrong in the test + start with lowercase letter 🙂
test/e2e/e2e_test.go
Outdated
```go
url := "http://" + route.Spec.Host + "/metrics"

res, err := http.Get(url)
```
This check is fine, but I would like you to also see that the metrics endpoint actually returns some metrics
Force-pushed 15456f5 to 6e971b8
/test e2e-aws
Force-pushed 6e971b8 to 4e8662b
/test e2e-aws-operator
Force-pushed b95b27f to 1ccfb97
/test e2e-aws
test/e2e/e2e_test.go
Outdated
```go
}

// generateRoute creates an OpenShift route
func generateRoute(routeClient *routeclient.RouteV1Client, s *v1.Service, namespace string) (*routev1.Route, error) {
```
there's still the naming discrepancy with createService, call this createRouteForService
test/e2e/e2e_test.go
Outdated
```go
var header string
for k, v := range res.Header {
	header = fmt.Sprintf("Header field %q, Value %q, Statuscode %q\n", k, v, strconv.Itoa(res.StatusCode))
```
are you just rewriting the header variable for every key in the res.Header map?
test/e2e/e2e_test.go
Outdated
```go
for k, v := range res.Header {
	header = fmt.Sprintf("Header field %q, Value %q, Statuscode %q\n", k, v, strconv.Itoa(res.StatusCode))
}
fmt.Println(header)
```
use t.Logf(), and only in the error case here
test/e2e/e2e_test.go
Outdated
```go
	t.Fatalf("error getting route client: %v", err)
}

// Test case to validate if metrics endpoint for the apiservice-cabundle-injector controller is secured.
```
Use `for _, controller := range []string{"apiservice-cabundle-injector", "configmap-cabundle-injector", "service-serving-cert-signer"}`, derive your service name for testMetricsEndpoint() from the controller name; the last argument of that function is constant.
Improve the test case name, at least to `<controller-name> /metrics`.
Force-pushed cb78062 to 9229697
/test e2e-aws-operator
test/e2e/e2e_test.go
Outdated
```go
}

// createRouteForService creates an OpenShift route
func createRouteForService(routeClient *routeclient.RouteV1Client, s *v1.Service, namespace string) (*routev1.Route, error) {
```
Exposing the metrics endpoint with a route should not be necessary here - way too complicated. The endpoint should be checked from inside the cluster
Either is fine, I think. I noticed people from console doing this and it looked like a good idea to me, for testing.
A common pattern in upstream kube when needing to validate a cluster-internal endpoint is to launch a busybox pod that connects to the endpoint. This pattern limits the scope of test dependency. Why is it desirable for validation of internal connectivity to depend on external connectivity?
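For illustration, the busybox pattern described above might look like the manifest below. Everything here is an assumption for the sketch: the pod and namespace names, the image, the wget flags, and the service URL. A real test would create this pod with client-go, wait for completion, and inspect its logs; it would also mount the service ca bundle rather than skipping certificate verification.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: metrics-check          # illustrative name
  namespace: openshift-service-ca
spec:
  restartPolicy: Never
  containers:
  - name: check
    image: busybox             # assumes an image whose wget supports TLS
    command:
    - wget
    - -qO-
    # the cluster-internal Service DNS name of one controller's metrics endpoint:
    - https://service-serving-cert-signer.openshift-service-ca.svc/metrics
```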
```yaml
- apiGroups:
  - ""
  resources:
  - configmaps
```
Is it necessary for the controller to access configmaps across the entire cluster, or only in the service-ca namespace?
I have added that change after seeing the following error:

```
configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:openshift-service-ca:apiservice-cabundle-injector-sa" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
```
That suggests limiting the scope of authorization to that resource in that namespace rather than granting global access to configmaps.
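A sketch of the narrower grant being suggested: a namespaced Role limited to the one configmap the error names, instead of a cluster-wide rule on all configmaps. The role name here is illustrative; upstream Kubernetes ships a similar `extension-apiserver-authentication-reader` role in kube-system that the controller's service account could be bound to instead of defining a new one.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: auth-configmap-reader   # illustrative name
  namespace: kube-system        # scoped to the namespace from the error
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["extension-apiserver-authentication"]  # only this configmap
  verbs: ["get"]
```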
test/e2e/e2e_test.go
Outdated
```go
}

// Test case to validate if metrics endpoint for the controllers are secured.
for _, controller := range []string{"apiservice-cabundle-injector", "configmap-cabundle-injector", "service-serving-cert-signer"} {
```
What about the operator?
The test cases are added in conjunction with the changes done in this PR. I can add another test case if that serves the purpose.
A metrics endpoint is a metrics endpoint - it is exposed by the controller framework that both the operator and its controllers use. It should be possible to test any metrics endpoint with the same code.
test/e2e/e2e_test.go
Outdated
```go
httpclient := &http.Client{
	Transport: &http.Transport{
		TLSClientConfig: &tls.Config{
			InsecureSkipVerify: true,
```
- This needs to be false.
- A metrics endpoint needs to be secured with a service ca-provided serving cert.
  - The metrics endpoint is automatically secured by a serving cert provided the pod template has the serving cert secret mounted to `/var/run/secrets/serving-cert` (this is already true for the operator).
  - The controller framework will generate an insecure self-signed certificate on startup and will exit the pod when the cert files are updated, so even the service ca controllers and operators can be secured with serving certs.
  - Configuring each controller with a service ca-provided ca bundle will require creating an annotated service (`service.beta.openshift.io/serving-cert-secret-name`) that exposes the metrics endpoint.
- The client to access the metrics endpoint needs to be configured with a service ca-provided ca bundle (see the rotation tests for how to accomplish this).

@stlaz Who is going to watch these new endpoints, and how do they discover them?
Those endpoints are not new; they are being scraped by the monitoring operator, which most probably uses its service account for that.
The InsecureSkipVerify is here because you're not trying to see whether the endpoint is secured by a valid certificate (it always should be, otherwise scraping the data would not work); it merely tests that you can't scrape it unauthenticated.
The use of InsecureSkipVerify is masking a bug. Metrics endpoints are not secured automatically by a valid certificate, as per my previous comment, and anyone trying to connect to the endpoint enabled by the current state of this PR is going to be unable to validate the cert.
Can you please point me to how the monitoring operator knows how to read from these endpoints? Given that the endpoint cert is not secure in the current iteration of this PR, I am interested to know how to verify on the monitoring operator side whether the endpoints it is charged with collecting from are configured properly. I found the monitoring doc instructions and posted them in channel.
Force-pushed b7196b2 to 879aa7b
/test e2e-aws-operator
Given that these metrics endpoints are not currently being used by the cluster, I think there are 2 potential paths forward:
I do not believe that enabling authn/authz on the metrics endpoints and writing e2e tests to verify that configuration, as currently proposed, is worth pursuing. Testing what is essentially a function of library-go (configuration of metrics endpoints) should be out of scope for this operator. If the endpoints are intended to be used, testing that prometheus monitoring is properly configured to use them would ensure proper configuration without requiring bespoke testing. There is already at least one example of testing that prometheus is receiving alerts, and this pattern could probably be generalized for inclusion in library-go (cc: @sallyom).
Force-pushed 879aa7b to 7a4c32a
Force-pushed 08b23df to 032ddaa
Force-pushed 032ddaa to 6f2a866
LGTM. @stlaz I asked for the e2e test to be removed. Ensuring that the generic operator config doesn't disable the auth is sufficient to ensure that the endpoint isn't accessible to unauthorized access. A follow-on PR can configure prometheus to read from the endpoint and leverage a generic test for prometheus metric configuration that @sallyom and @sanchezl are working on adding to library-go.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sohankunkerkar, stlaz

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
No description provided.