-
Notifications
You must be signed in to change notification settings - Fork 249
add helper functions for running prometheus query #602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add helper functions for running prometheus query #602
Conversation
8a89b34 to
e5000e7
Compare
e5000e7 to
7578d03
Compare
7578d03 to
d583fb2
Compare
|
@marun thanks for the review, I've updated, ready for another review. Also, see this functioning in the cluster-kube-controller-operator PR. |
d583fb2 to
60688ba
Compare
marun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to the separation of concerns you are pursuing.
What do you think of adding an e2e test that exercises the code? Some of my comments (e.g. exiting polling on recoverable errors) would be challenging to discover from code alone but would become immediately apparent when executed (e.g. failing immediately when configmap doesn't contain cabundle). I'd also prefer to have these helpers be maintainable in library-go vs requiring co-developement with one or more operators.
e3c1277 to
dd5e01f
Compare
soltysh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sallyom since these are meant as testing helpers please move them to pkg/test/library/metrics I don't want anyone accidentally figuring this one out and using it for other purpose than testing.
ccef27b to
be5ca46
Compare
test/library/metrics/query.go
Outdated
| } | ||
| for _, secret := range secrets.Items { | ||
| if secret.Type != corev1.SecretTypeServiceAccountToken || | ||
| !strings.HasPrefix(secret.Name, "prometheus-") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to specify the more explicit prometheus-k8s- prefix? Tokens sourced from secrets prefixed with both prometheus-k8s- and prometheus-operator- support querying today, but I'm not sure that will always be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
@sallyom Nice work! I've updated openshift/service-ca-operator#90 to use it and tests are passing locally. Once this PR merges I'll add another helper that checks that metrics are being collected for a given namespace (e.g. |
test/library/metrics/query.go
Outdated
| return nil, err | ||
| } | ||
|
|
||
| route, err := rc.RouteV1().Routes("openshift-monitoring").Get("prometheus-k8s", metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not something to block merge since the code works as-is., but I seem to recall @s-urbaniak mentioning that it would be preferable to query thanos-querier. However, my naive attempts to use the thanos-querier route were unsuccessful with tokens sourced from prometheus-k8s- and thanos-querier- prefixed secrets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the prometheus-k8s is route is still maintained, but effectively deprecated. Thanos Querier is gated by the same oauth-proxy sidecar as prometheus-k8s, hence the same token secrets should apply, else we have a bug on our side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@s-urbaniak No bug. I was requesting targets via the thanos-querier route and it wasn't succeeding, but I realize now that it was a 404 rather than an auth failure presumably because thanos doesn't support querying targets. Performing an up query instead works fine.
@sallyom Please update to use the thanos-querier route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hit a snag, the thanos-querier route works for a query, but when searching for an alert, that route does not work, have to use the prometheus-k8s route for alerts (like I'm doing in the kube-controller-manager-operator PR). I added a switch and a param to the NewPrometheusClient to pass either 'query' or 'alert'.
@s-urbaniak is that expected? thanos-querier route not for Alerts? I get a 404 for /alerts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that you can pass the entire alert as a query, like so:
ALERTS{alertname="PodDisruptionBudgetAtLimit",alertstate="pending",namespace="pdbns",poddisruptionbudget="pdbname",prometheus="openshift-monitoring/k8s",service="kube-state-metrics",severity="warning"}==1
instead of using the deprecated prometheus-k8s route. From what I can find, you have to pass the full alert, since partial responses return an error w/ how we have thanos setup - that is stretching my knowledge of prometheus/thanos, correct me if you can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sallyom do you need to pass all of the properties in that query or only those you care about? Seems quite cumbersome at first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sallyom indeed, thanos querier does not support /alerts as it is "only" responsible for the query part. It doesn't scrape targets either as it is "just" a proxy in front of prometheus, hence /targets doesn't exist here either. Your ALERTS query is fine from my point of view, but passing just ALERTS{alertname="PodDisruptionBudgetAtLimit"} should work too.
b7a7638 to
af5efab
Compare
test/library/metrics/query.go
Outdated
| } | ||
| host = route.Status.Ingress[0].Host | ||
| case "alert": | ||
| route, err := rc.RouteV1().Routes("openshift-monitoring").Get("prometheus-k8s", metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@s-urbaniak This code suggests that thanos-querier doesn't support querying for alerts via the Alerts method of the prometheus client. Is there another way for tests to query for alerts similar to how an up query can replace use of the Targets method of the prometheus client? Or is testing of alerts dependent on the deprecated prometheus-k8s route?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^ lol I found that out also, see above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned above you can use the ALERTS query to get the desired result. Indeed, the /alerts endpoint is not accessible via Thanos Querier, as it is "just" responsible for querying metrics.
564ed58 to
6ed7cdc
Compare
|
LGTM |
|
/lgtm |
test/library/metrics/query.go
Outdated
| KeepAlive: 30 * time.Second, | ||
| }).DialContext, | ||
| TLSHandshakeTimeout: 10 * time.Second, | ||
| TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot send a bearer token to a destination without verifying the server you're speaking to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the potential risk(s) you are concerned with? As per the test prefix of this path, this code is intended only to support e2e testing of metrics collection.
|
/lgtm cancel |
6ed7cdc to
9b4817f
Compare
9b4817f to
23320a9
Compare
|
@deads2k ptal, thanks! and thanks for pointing me in the direction of the correct router-ca configmap, that did the trick, I was initially using the service-ca-operator generated CA, not the correct router-ca |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, mfojtik, sallyom, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
add helper functions for running prometheus query
@marun @sanchezl this PR provides helper functions for prometheus query
I placed a fake bump/testing here:
/cc @sttts