store: Start metric and status probe HTTP server as earlier as possible#1656
store: Start metric and status probe HTTP server as earlier as possible#1656bwplotka merged 10 commits intothanos-io:masterfrom
Conversation
2611d22 to
816c2fc
Compare
|
Looking at the code changes, it seems to me that the only thing that has changed is the lexicographic ordering of the function calls but that the server is still started at exactly the same time. Note that the HTTP servers are assigned to the run group and that they only actually start serving requests when the run group itself is started. |
|
@squat Thanks for pointing it. I somehow missed that. I'll have another hard look at it. |
|
Wonder if we can also add more logs to the startup as well, since you are touching this part as per: #1655 |
|
@bwplotka Of course, I can I'll check the related conversation. |
4fc298f to
5bb1f5f
Compare
bwplotka
left a comment
There was a problem hiding this comment.
I don't understand really why we move around all of this scheduleHTTP servers but happy with that if there is a reason (:
2 comments, otherwise LGTM! Thanks.
I think we really need to advertise now to NOT add readiness probe on metric. Users do that and now they will have false impression store that store is ready, but it actually still just loading the blocks...
Wonder how to communicate that ):
I will add more logging to initial sync in separare PR (:
5bb1f5f to
0c784c0
Compare
|
@bwplotka @squat I removed unnecessary changes, fixed emphasized issues, updated CHANGELOG and store documentation. Documentation definitely needs some help, I'm open to suggestions.
I couldn't get to this one yet, however it would probably better since you have more context on the component, thanks a lot. |
c731615 to
7a1453f
Compare
b32a709 to
e88291a
Compare
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Co-Authored-By: Martin Chodur <m.chodur@seznam.cz> Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
e88291a to
7004a30
Compare
bwplotka
left a comment
There was a problem hiding this comment.
Nice! Finally we have metrics on Thanos startup!
Let's make sure we have proper logging and metrics on sync - happy to attack this on Monday.
|
Good work @kakkoyun thanks! ❤️ |
…le (#1656) * Start metric and status probe server as soon as possible Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Update changelog Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Schedule a separate goroutine to start server Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add InitSync to the rungroup Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix linter pointed issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Move InitSync to alreay existed run.Group Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Remove unnecessary changes and update CHANGELOG Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add simple explanation for probes Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Make requested changes Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Update CHANGELOG.md Co-Authored-By: Martin Chodur <m.chodur@seznam.cz> Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
As a result of a couple of issues that we had by running Thanos Store with liveness probes on Kubernetes, this PR attempts to fix the liveness probe issues by moving them earlier in the start-up sequence. Now, metrics and status probe HTTP server starts as earlier as possible and returns success from
/-/healthyendpoint and sets its status ready when everything else properly bootstrapped.A good read on the subject: https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html
Changes
Verification
make local-testMINIO_ENABLED=1 ./scripts/quickstart.shwhile debug log enabled, and observed start sequence.Thanos Store start sequence logs: