Skip to content

Conversation

@sohankunkerkar
Copy link
Member

No description provided.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 7, 2019
@sohankunkerkar
Copy link
Member Author

/test e2e-aws

@stlaz
Copy link
Contributor

stlaz commented Nov 8, 2019

/retest

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 10, 2019
@sohankunkerkar sohankunkerkar force-pushed the fix_metrics_endpoint branch 4 times, most recently from ae7f6e3 to df673d6 Compare November 11, 2019 15:24
@stlaz
Copy link
Contributor

stlaz commented Nov 11, 2019

/test e2e-aws-operator

@sohankunkerkar
Copy link
Member Author

/test e2e-aws

apiVersion: servicecertsigner.config.openshift.io/v1alpha1
kind: APIServiceCABundleInjectorConfig
caBundleFile: /var/run/configmaps/signing-cabundle/ca-bundle.crt
authentication:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stlaz Is this just a configuration change that prompts library code to ensure the metrics endpoint is secured? If so, why would that require operator-specific e2e coverage?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would require a library extended test, or at least an integration one, but we would still want to check that the operator is actually behaving the intended way and publishes it's metrics as desired, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally got to browse through the code - it was not my point to try to check that the endpoints are authenticated, but that they are actually showing something

Copy link
Contributor

@marun marun Nov 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't that be validated in a unit test?

Edit: Likely not a unit test, but if the metrics endpoints is exposed by a library function, would it make sense to implement a library test that operators could trivially reuse? i.e.

func TestMetricsEndpoint(t *testing.T) {
  libraryTestFunc(t, myEndpointCfg, func(result []byte) {
    # validate the output in an operator-specific way
  })
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that that would be a fine general approach to test an endpoint, yes. I wonder how many endpoints we'd like to test this way though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've identified 10 operators in the service ca compatibility audit that should be tested this way, and there are likely more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what @marun suggested makes sense to me.

@sohankunkerkar sohankunkerkar force-pushed the fix_metrics_endpoint branch 3 times, most recently from 70a153d to b43a5b5 Compare November 12, 2019 04:50
TypeMeta: metav1.TypeMeta{
Kind: "Route",
APIVersion: "route.openshift.io/v1",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to specify TypeMeta, I think the client takes care of it

Comment on lines 338 to 358
TypeMeta: metav1.TypeMeta{
Kind: "Service",
APIVersion: "v1",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines 309 to 321
ObjectMeta: metav1.ObjectMeta{
Name: s.ObjectMeta.Name,
Namespace: s.ObjectMeta.Namespace,
Labels: labels,
},
Spec: routev1.RouteSpec{
To: routev1.RouteTargetReference{
Kind: "Service",
Name: s.ObjectMeta.Name,
},
Port: &routev1.RoutePort{
TargetPort: s.Spec.Ports[0].TargetPort,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use constant names

apiVersion: servicecertsigner.config.openshift.io/v1alpha1
kind: APIServiceCABundleInjectorConfig
caBundleFile: /var/run/configmaps/signing-cabundle/ca-bundle.crt
authentication:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally got to browse through the code - it was not my point to try to check that the endpoints are authenticated, but that they are actually showing something

}
defer res.Body.Close()
if res.StatusCode != 403 {
t.Fatalf("The metrics endpoint is not secured: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to see what was actually returned in order to realize what went wrong in the test + start with lowercase letter 🙂


url := "http://" + route.Spec.Host + "/metrics"

res, err := http.Get(url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is fine, but I would like you to also see that the metrics endpoint actually returns some metrics

@sohankunkerkar sohankunkerkar force-pushed the fix_metrics_endpoint branch 4 times, most recently from 15456f5 to 6e971b8 Compare November 13, 2019 17:36
@sohankunkerkar
Copy link
Member Author

/test e2e-aws

@sohankunkerkar
Copy link
Member Author

/test e2e-aws-operator

@sohankunkerkar sohankunkerkar force-pushed the fix_metrics_endpoint branch 2 times, most recently from b95b27f to 1ccfb97 Compare November 19, 2019 16:26
@sohankunkerkar
Copy link
Member Author

/test e2e-aws

}

// generateRoute creates a OpenShift routes
func generateRoute(routeClient *routeclient.RouteV1Client, s *v1.Service, namespace string) (*routev1.Route, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's still the naming discrepancy with createService, call this createRouteForService


var header string
for k, v := range res.Header {
header = fmt.Sprintf("Header field %q, Value %q, Statuscode %q\n", k, v, strconv.Itoa(res.StatusCode))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you just rewriting the header variable for every key in the res.Header map?

for k, v := range res.Header {
header = fmt.Sprintf("Header field %q, Value %q, Statuscode %q\n", k, v, strconv.Itoa(res.StatusCode))
}
fmt.Println(header)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use t.Logf(), and only in the error case here

t.Fatalf("error getting route client: %v", err)
}

// Test case to validate if metrics endpoint for the apiservice-cabundle-injector controller is secured.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for _, controller := []string{"apiservice-cabundle-injector", "configmap-cabundle-injector", "service-serving-cert-signer"}, derive your service name for testMetricsEndpoint() from the controller name, last argument of that function is constant.

Improve the test case name, at leat to <controller-name> /metrics

@sohankunkerkar sohankunkerkar force-pushed the fix_metrics_endpoint branch 2 times, most recently from cb78062 to 9229697 Compare November 20, 2019 14:34
@sohankunkerkar
Copy link
Member Author

/test e2e-aws-operator

}

// createRouteForService creates a OpenShift routes
func createRouteForService(routeClient *routeclient.RouteV1Client, s *v1.Service, namespace string) (*routev1.Route, error) {
Copy link
Contributor

@marun marun Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing the metrics endpoint with a route should not be necessary here - way too complicated. The endpoint should be checked from inside the cluster

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that sounds reasonable to me. I have designed this logic after having some discussion with @stlaz
@stlaz wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either is fine I think, I noticed people from console doing this and it looked like a good idea to me, for testing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A common pattern in upstream kube when needing to validate a cluster-internal endpoint is to launch a busybox pod that connects to the endpoint. This pattern limits the scope of test dependency. Why is it desirable for validation of internal connectivity to depend on external connectivity?

- apiGroups:
- ""
resources:
- configmaps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary for the controller to access configmaps across the entire cluster, or only in the service-ca namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added that change after seeing the following error:

configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:openshift-service-ca:apiservice-cabundle-injector-sa" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That suggests limiting the scope of authorization to that resource in that namespace rather than granting global access to configmaps.

}

// Test case to validate if metrics endpoint for the controllers are secured.
for _, controller := range []string{"apiservice-cabundle-injector", "configmap-cabundle-injector", "service-serving-cert-signer"} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the operator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test cases are added in conjunction with the changes done in this PR. I can add another test case if that serves the purpose.

Copy link
Contributor

@marun marun Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A metrics endpoint is a metrics endpoint - it is exposed by the controller framework that both the operator and its controllers use. It should be possible to test any metrics endpoint with the same code.

httpclient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{
InsecureSkipVerify: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This needs to be false.
  • A metrics endpoint needs to be secured with a service ca-provided serving cert.
    • The metrics endpoint is automatically secured by a serving cert provided the pod template has the serving cert secret mounted to /var/run/secrets/serving-cert (this is already true for the operator).
    • The controller framework will generate an insecure self-signed certificate on startup and will exit the pod when the cert files are updated, so even the service ca controllers and operators can be secured with serving certs.
    • Configuring each controller with a service ca-provided ca bundle will require creating an annotated service ( service.beta.openshift.io/serving-cert-secret-name) that exposes the metrics endpoint.
  • The client to access the metrics endpoint needs to be configured with a service ca-provided ca bundle (see the rotation tests for how to accomplish this).

@stlaz Who is going to watch these new endpoints, and how do they discover them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those endpoints are not new, they are being scraped by monitoring operator, they most probably use their service account for that.

The InsecureSkipVerify is here as you're not trying to see whether the endpoint is secured by a valid certificate, that it should always be, otherwise scraping the data would not work, it merely tested that you can't scrape it unauthenticated.

Copy link
Contributor

@marun marun Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of InsecureSkipVerify is masking a bug. Metrics endpoints are not secured automatically by a valid certificate, as per my previous comment, and anyone trying to connect to the endpoint enabled by the current state of this PR is going to be unable to validate the cert.

Can you please point me to how the monitoring operator knows how to read from these endpoints? Given that the endpoint cert is not secure in the current iteration of this PR, I am interested to know how to verify on the monitoring operator side whether the endpoints it is charged with collecting from are configured properly. I found the monitoring doc instructions and posted them in channel.

@sohankunkerkar sohankunkerkar force-pushed the fix_metrics_endpoint branch 2 times, most recently from b7196b2 to 879aa7b Compare November 20, 2019 18:44
@sohankunkerkar
Copy link
Member Author

/test e2e-aws-operator

@marun
Copy link
Contributor

marun commented Nov 20, 2019

Given that these metrics endpoints are not currently being used by the cluster, I think there are 2 potential paths forward:

  • Disable the endpoints
    • Who is intended to use them in their current form?
      • They are using insecure self-signed certificates.
  • Configure prometheus monitoring
    • According to the monitoring doc this is the responsibility of all operators anyway.
      • The doc suggests that instrumenting an operator is likely to take a few hours.

I do not believe that enabling authn/authz on the metrics endpoints and writing e2e tests to verify that configuration, as currently proposed, is worth pursuing. Testing what is essentially a function of library go (configuration of metrics endpoints) should be out-of-scope for this operator. If the endpoints are intended to be used, testing that prometheus monitoring is properly configured to use them would ensure proper configuration without requiring bespoke testing. There is already at least one example of testing that prometheus is receiving alerts and this pattern could probably be generalized for inclusion in library-go (cc: @sallyom).

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 21, 2019
@sohankunkerkar sohankunkerkar changed the title Enable Auth for metrics endpoint for controllers Enable Auth for the metrics endpoints for the controllers Nov 21, 2019
@marun
Copy link
Contributor

marun commented Nov 22, 2019

LGTM

@stlaz I asked for the e2e test be removed. Ensuring that the generic operator config doesn't disable the auth is sufficient to ensure that the endpoint isn't accessible to unauthorized access. A follow-on PR can configure prometheus to read from the endpoint and leverage a generic test for prometheus metric configuration that @sallyom and @sanchezl are working on adding to library-go.

@stlaz
Copy link
Contributor

stlaz commented Nov 22, 2019

/lgtm
@marun sounds good to me

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 22, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sohankunkerkar, stlaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2019
@openshift-merge-robot openshift-merge-robot merged commit 1e69be2 into openshift:master Nov 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants