feat sicedar: added readiness prober#1395
Conversation
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
e460685 to
3b2d411
Compare
bwplotka
left a comment
There was a problem hiding this comment.
Some suggestions (: Thanks overall!
cmd/thanos/sidecar.go
Outdated
| func registerSidecar(m map[string]setupFunc, app *kingpin.Application, name string) { | ||
| cmd := app.Command(name, "sidecar for Prometheus server") | ||
| func registerSidecar(m map[string]setupFunc, app *kingpin.Application) { | ||
| comp := component.Sidecar |
There was a problem hiding this comment.
I would maybe inline those, would be more verbose.
There was a problem hiding this comment.
I inlined it in the registerSidecar function, not sure if you want to inline it everywhere even in the runSidecar ?
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
3eaf82a to
cdc30f4
Compare
|
@povilasv thanks! there are some new changes just so you don't approve something you didn't see so maybe check that out :) @bwplotka Thanks for the suggestions! I tried some refactor of the sidecar liveness so PTAL. |
|
those failing test are weird, not sure if related to my changes? |
bwplotka
left a comment
There was a problem hiding this comment.
I think it generally make sense.. I am bit worried that those values for sinceLastSuccessfulHeartbeatLimit might be generally nice.. but might not match some unique use cases user may have. Also this means any flake on connection to Prometheus (single heartbeat failed) might mean unavailability. I am a bit skeptical, I would mark it as non ready after 3 heartbeats or something (?)
|
Let's chat offline @FUSAKLA but overall problem with those probes is that we need to be careful. @povilasv mentioned that all of this can be configured anyone on Kubernetes side, but that's not that simple. Problems:
|
|
Regarding healthyness: https://twitter.com/miekg/status/1152911058627153920 We are I think on the same page that we should do it ONLY set healthyness to NOT healthy if we are sure that restart will help. |
|
Yeah, so should we get back to the original state of just modifying the readiness based on the halthchecks? I wanted originally to avoid touching the liveness probe because it felt bit harsh but I understood that this was what you suggested. But I probably did not clearly understood what you meant, sorry for the confusement 😕 |
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
326450a to
153cd3f
Compare
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
35633e7 to
d351d03
Compare
|
Along with the offline agreement, the readiness probe is only set after the initial Prometheus external labels fetch from the Prometheus. Once they are fetched it stays ready all the time. If we hit any case in the future when more complex readiness handling will be needed it can be added then. |
bwplotka
left a comment
There was a problem hiding this comment.
This looks good 👍
One important thing before merge. We never set prober to be healthy. And in prober constructor we set healthyness to be initialError. Maybe we should be healthy by default in NewProber?
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
|
But I found one bug with opposite condition on error logging in the Prober, fixed in 60897a8 🙄 |
pkg/prober/prober.go
Outdated
| func NewProber(component component.Component, logger log.Logger, reg prometheus.Registerer) *Prober { | ||
| initialErr := fmt.Errorf(initialErrorFmt, component) | ||
|
|
||
| // From Kubernetes documentation https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/ : |
There was a problem hiding this comment.
Minor nit but this should part of struct TBH (for GoDocs)
There was a problem hiding this comment.
good point, is it ok this way?
79d3507 to
480fff5
Compare
Signed-off-by: Martin Chodur <m.chodur@seznam.cz>
480fff5 to
210075a
Compare
|
Thanks @FUSAKLA ! |
|
Great, thank you all for the comments and going through the discussions around the readiness! There is only few components left 😄 |
* feat sicedar: added readiness prober Signed-off-by: Martin Chodur <m.chodur@seznam.cz>

Signed-off-by: Martin Chodur m.chodur@seznam.cz
Changes
Next PR in the from the series of adding readiness and liveness probes.
Added readiness and liveness prober to the thanos sidecar.
It follows the logic of setting the
promUpmetric so when it successfully fetches config of the Prometheus it becomes ready.The question is should it take into account status of the GRPC interface also?
Verification