Add thanos query frontend sub command#2973
Conversation
1df74b4 to
884a415
Compare
49842f5 to
d944b89
Compare
I haven't had the chance to review this PR in itself but I think since this work will be on-going I think I would prefer to merge a minimum state that works good enough ™️, and then iterate on it. So to be explicit, I am for this, but in a follow up PR. |
33d71b9 to
115499a
Compare
ff6d5a7 to
de35949
Compare
de35949 to
2c21459
Compare
bwplotka
left a comment
There was a problem hiding this comment.
This looks epic for a start 👍
Let's rebase on master since we merged the upgrade PR, wdyt? (:
Otherwise is good for first iteration 💪
The only thing is that I would try to create config as we have for other caches in store so we are ready for other cache providers (:
cmd/thanos/query-frontend.go
Outdated
| BoolVar(&c.cacheResults) | ||
|
|
||
| cmd.Flag("query-range.split-interval", "Split queries by an interval and execute in parallel, 0 disables it."). | ||
| Default("24h").DurationVar(&c.splitInterval) |
There was a problem hiding this comment.
Let's leave it for now but probably we need expose it better (:
cmd/thanos/query-frontend.go
Outdated
|
|
||
| func (c *responseCacheConfig) registerFlag(cmd *kingpin.CmdClause) { | ||
| c.fifoCache.registerFlag(cmd) | ||
| cmd.Flag("query-range.response-cache-max-freshness", "Most recent allowed cacheable result, to prevent caching very recent results that might still be in flux."). |
There was a problem hiding this comment.
I believe this should be overall general config not fifo? 🤔
There was a problem hiding this comment.
Yes, you are right. It is a general cache config so I named it query-range.response-cache-max-freshness and it is available for all caches.
cmd/thanos/query-frontend.go
Outdated
|
|
||
| // fifoCacheConfig defines configurations for Cortex fifo cache. | ||
| type fifoCacheConfig struct { | ||
| maxSizeBytes units.Base2Bytes |
There was a problem hiding this comment.
BTW I think we should have those as cache client now and use our cache flags like in:
Line 54 in 82cca56
cmd/thanos/query-frontend.go
Outdated
| m[comp.String()] = func(g *run.Group, logger log.Logger, reg *prometheus.Registry, _ opentracing.Tracer, _ <-chan struct{}, _ bool) error { | ||
|
|
||
| return runQueryFrontend( | ||
| g, |
There was a problem hiding this comment.
no need for this syle - I think it can fit a line (:
cmd/thanos/query-frontend.go
Outdated
| conf.queryRangeConfig.respCacheConfig.cacheMaxFreshness, | ||
| ) | ||
|
|
||
| // TODO(yeya24): support other cache when available. |
|
Some build errors got after rebasing master. Seems the Prometheus dependency needs to be updated on the Cortex side as well, |
|
That might be true. Here comes cyclic deps (: |
|
Let's propose a PR on their side cc @pracucci |
I will prepare a pr. |
pkg/api/queryfrontend/v1.go
Outdated
|
|
||
| r.Get("/labels", instr("labels", handleFunc)) | ||
| r.Post("/labels", instr("labels", handleFunc)) | ||
| } |
There was a problem hiding this comment.
Is it necessary to add stores and rules APIs here?
There was a problem hiding this comment.
well depends how we handle this - I think query frontend was doing pass through for all not defined endpoints, but we should test against it. If that's the case then why we define labels, series, values if not cached?
2c21459 to
c590906
Compare
7789133 to
3e755e2
Compare
|
I have updated my pr to support the response cache config file. PTAL when you have time. For the Cortex dependency error, I opened a pr cortexproject/cortex#3000 already. |
eda511a to
67c5609
Compare
bwplotka
left a comment
There was a problem hiding this comment.
Hey, some suggestions mostly around diffetent paths. The current implementation is super vague what is cached, what is splitted and retries etc. What if some path is not in api.go? (:
Can we make it explicit? For example if no retry, splitting is expected for ALL but query_range, can we just remove all but query_range from API? maybe leaving query instant makes sense as slow query log is nice for those (: WDYT?
Beside that it's amazing! 💪 Great job! We are missing docs, but we can add those later (:
| - [#2865](https://github.com/thanos-io/thanos/pull/2865) ui: Migrate Thanos Ruler UI to React | ||
| - [#2964](https://github.com/thanos-io/thanos/pull/2964) Query: Add time range parameters to label APIs. Add `start` and `end` fields to Store API `LabelNamesRequest` and `LabelValuesRequest`. | ||
| - [#2996](https://github.com/thanos-io/thanos/pull/2996) Sidecar: Add `reloader_config_apply_errors_total` metric. Add new flags `--reloader.watch-interval`, and `--reloader.retry-interval`. | ||
| - [#2973](https://github.com/thanos-io/thanos/pull/2973) Add Thanos Query Frontend component. |
There was a problem hiding this comment.
Docs, would be nice, maybe in next PR
| cmd.Flag("query-range.max-query-parallelism", "Maximum number of queries will be scheduled in parallel by the frontend."). | ||
| Default("14").IntVar(&c.maxQueryParallelism) | ||
|
|
||
| cmd.Flag("query-range.response-cache-max-freshness", "Most recent allowed cacheable result, to prevent caching very recent results that might still be in flux."). |
There was a problem hiding this comment.
We need better help flag description - in separate PR is ok
There was a problem hiding this comment.
essentially let's describe why this is needed
There was a problem hiding this comment.
Make sense, we can also help update the flag description in Cortex as well.
pkg/api/queryfrontend/v1.go
Outdated
|
|
||
| r.Get("/labels", instr("labels", handleFunc)) | ||
| r.Post("/labels", instr("labels", handleFunc)) | ||
| } |
There was a problem hiding this comment.
well depends how we handle this - I think query frontend was doing pass through for all not defined endpoints, but we should test against it. If that's the case then why we define labels, series, values if not cached?
pkg/api/queryfrontend/v1.go
Outdated
|
|
||
| r.Get("/labels", instr("labels", handleFunc)) | ||
| r.Post("/labels", instr("labels", handleFunc)) | ||
| } |
| // TestRoundTripRetryMiddleware tests the retry middleware. | ||
| func TestRoundTripRetryMiddleware(t *testing.T) { | ||
| testRequest := &queryrange.PrometheusRequest{ | ||
| Path: "/api/v1/query_range", |
There was a problem hiding this comment.
What if path is totally different? not query range? Can we test it?
pkg/queryfrontend/roundtrip_test.go
Outdated
| { | ||
| name: "disable split", | ||
| req: &queryrange.PrometheusRequest{ | ||
| Path: "/api/v1/query_range", |
There was a problem hiding this comment.
again, can we test different paths? just to see what to expect
|
Also still somehow tests flakes, wonder if it's SWIFT flakiness |
Not passing through. Please see cortexproject/cortex#2742. TBH I am not sure if this is the behavior we want to have. Only query range results are cached. The workflow is:
Umm, I agree we should separate labels for other endpoints as well. Cortex does the same thing |
|
Seems we got another flaky test https://app.circleci.com/pipelines/github/thanos-io/thanos/2764/workflows/74964958-9ce1-4041-b3b9-4ea659a1764e/jobs/10997 |
|
In this case let's do following:
|
Hello, I agree that we shouldn't limit the endpoints here and it is better to pass through all other routes to downstream querier to make sure everything works fine (like Grafana). But I am not sure why a separate handler is needed? Can I just reuse the downstream roundtripper implemented in Cortex frontend? https://github.com/cortexproject/cortex/blob/master/pkg/querier/frontend/frontend.go#L118. This works but TBH I don't know whether this is a good pattern or not. I just found it not as easy as I thought to deal with the default route with return func(next http.RoundTripper) http.RoundTripper {
queryRangeTripper := queryrange.NewRoundTripper(next, codec, queryRangeMiddleware...)
return frontend.RoundTripFunc(func(r *http.Request) (*http.Response, error) {
switch r.URL.Path {
case "/api/v1/query":
if r.Method == http.MethodGet || r.Method == http.MethodPost {
queriesCount.WithLabelValues(labelQuery).Inc()
}
case "/api/v1/query_range":
if r.Method == http.MethodGet || r.Method == http.MethodPost {
queriesCount.WithLabelValues(labelQueryRange).Inc()
return queryRangeTripper.RoundTrip(r)
}
default:
}
return next.RoundTrip(r)
})
}, nil |
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
… panic Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
Signed-off-by: Ben Ye <yb532204897@gmail.com>
67c5609 to
56caec6
Compare
|
PR is updated and I added more test cases. PTAL tomorrow |
|
Happy with whatever makes sense more (: Will look today |
bwplotka
left a comment
There was a problem hiding this comment.
Amazing.
I like it. What I am only missing here are some docs (can do in next PR) and tests for different pass through items. Again good for next PR (:
| }) | ||
| return hf | ||
| } | ||
| srv.Handle("/", injectf(fe.Handler().ServeHTTP)) |
|
|
||
| t.Run("same range query, cache hit.", func(t *testing.T) { | ||
| // Run the same range query again, the result can be retrieved from cache directly. | ||
| rangeQuery( |
There was a problem hiding this comment.
can we check that other endpoints are pass through?
There was a problem hiding this comment.
I already tested labelNames and labelValues here https://github.com/thanos-io/thanos/blob/master/test/e2e/query_frontend_test.go#L99-L118. I am not sure whether this is what you want or not.
Signed-off-by: Ben Ye yb532204897@gmail.com
Changes
The current code is almost the same as the Cortex frontend. I just removed the sharding middleware because seems that is specific for Cortex.
TODO:
dedup,max_source_resolution,partial_response, etc. These params should be part of the cache key because these affect the query results. We need to implement Thanos codec to decode/encode Thanos query requests. Supporting Thanos codec to parse Thanos specific query parameters yeya24/thanos#159Verification