Add thanos query frontend sub command by yeya24 · Pull Request #2973 · thanos-io/thanos

yeya24 · 2020-08-04T03:48:21Z

Signed-off-by: Ben Ye yb532204897@gmail.com

I added CHANGELOG entry for this change.
Change is not relevant to the end user.

Changes

The current code is almost the same as the Cortex frontend. I only removed the sharding middleware because it seems to be specific to Cortex.

TODO:

Implement Cortex cache.Cache interface, this should be done in another pr.
Unit tests and E2E tests.
Support parsing Thanos-specific query parameters such as dedup, max_source_resolution, and partial_response. These parameters affect the query results, so they should be part of the cache key. This requires a Thanos codec to decode and encode Thanos query requests; see yeya24/thanos#159 (Supporting Thanos codec to parse Thanos specific query parameters).

Verification

cmd/thanos/query-frontend.go

brancz · 2020-08-06T07:23:52Z

IMHO Thanos should implement its own Request type just like the loki request. Is it better for me to include it in this pr, or in another pr?

I haven't had the chance to review this PR itself, but since this work will be ongoing, I would prefer to merge a minimum state that works good enough ™️, and then iterate on it. So to be explicit, I am for this, but in a follow-up PR.

bwplotka

This looks epic for a start 👍

Let's rebase on master since we merged the upgrade PR, wdyt? (:

Otherwise is good for first iteration 💪

The only thing is that I would try to create config as we have for other caches in store so we are ready for other cache providers (:

bwplotka · 2020-08-04T17:12:15Z

cmd/thanos/query-frontend.go

+		BoolVar(&c.cacheResults)
+
+	cmd.Flag("query-range.split-interval", "Split queries by an interval and execute in parallel, 0 disables it.").
+		Default("24h").DurationVar(&c.splitInterval)


Let's leave it for now but probably we need expose it better (:
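
For readers unfamiliar with what the split interval does, the following Go sketch shows one way a query range could be cut at interval boundaries. The `timeRange` type and `splitByInterval` function are hypothetical names, and timestamps are assumed to be Unix milliseconds. This is not the Cortex code; a real implementation would also need to account for the query step so that boundary samples are not evaluated twice.

```go
package main

// timeRange is a hypothetical type for illustration; timestamps are Unix milliseconds.
type timeRange struct{ start, end int64 }

// splitByInterval cuts [start, end] into sub-ranges that end on multiples of
// interval. A non-positive interval disables splitting, mirroring the
// "0 disables it" wording of the flag.
func splitByInterval(start, end, interval int64) []timeRange {
	if interval <= 0 || start >= end {
		return []timeRange{{start, end}}
	}
	var out []timeRange
	for cur := start; cur < end; {
		// Next interval boundary strictly after cur, capped at end.
		next := (cur/interval + 1) * interval
		if next > end {
			next = end
		}
		out = append(out, timeRange{cur, next})
		cur = next
	}
	return out
}
```

Each resulting sub-range can then be sent downstream independently, which is what allows the parts to run in parallel.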

bwplotka · 2020-08-07T20:54:13Z

cmd/thanos/query-frontend.go

+
+func (c *responseCacheConfig) registerFlag(cmd *kingpin.CmdClause) {
+	c.fifoCache.registerFlag(cmd)
+	cmd.Flag("query-range.response-cache-max-freshness", "Most recent allowed cacheable result, to prevent caching very recent results that might still be in flux.").


I believe this should be overall general config not fifo? 🤔

Yes, you are right. It is a general cache config so I named it query-range.response-cache-max-freshness and it is available for all caches.

bwplotka · 2020-08-07T20:55:10Z

cmd/thanos/query-frontend.go

+
+// fifoCacheConfig defines configurations for Cortex fifo cache.
+type fifoCacheConfig struct {
+	maxSizeBytes units.Base2Bytes


BTW I think we should have those as cache client now and use our cache flags like in:

thanos/cmd/thanos/store.go

Line 54 in 82cca56

indexCacheConfig := extflag.RegisterPathOrContent(cmd, "index-cache.config",

bwplotka · 2020-08-07T20:55:30Z

cmd/thanos/query-frontend.go

+	m[comp.String()] = func(g *run.Group, logger log.Logger, reg *prometheus.Registry, _ opentracing.Tracer, _ <-chan struct{}, _ bool) error {
+
+		return runQueryFrontend(
+			g,


no need for this style - I think it can fit on one line (:

bwplotka · 2020-08-07T20:55:50Z

cmd/thanos/query-frontend.go

+		conf.queryRangeConfig.respCacheConfig.cacheMaxFreshness,
+	)
+
+	// TODO(yeya24): support other cache when available.


yeya24 · 2020-08-07T21:48:14Z

I got some build errors after rebasing onto master.

# github.com/cortexproject/cortex/pkg/querier/queryrange
../../../go/pkg/mod/github.com/cortexproject/cortex@v1.2.1-0.20200805064754-d8edc95e2c91/pkg/querier/queryrange/value.go:92:2: invalid case parser.ValueTypeVector in switch on promRes.Data.ResultType (mismatched types parser.ValueType and string)
../../../go/pkg/mod/github.com/cortexproject/cortex@v1.2.1-0.20200805064754-d8edc95e2c91/pkg/querier/queryrange/value.go:92:2: invalid case parser.ValueTypeMatrix in switch on promRes.Data.ResultType (mismatched types parser.ValueType and string)
!! command failed: build -o /home/yeya24/go/bin/thanos -ldflags -X github.com/prometheus/common/version.Version=0.14.0 -X github.com/prometheus/common/version.Revision=f3b29ab18ede95baad53cda564b490916efb5407 -X github.com/prometheus/common/version.Branch=add-thanos-query-frontend -X github.com/prometheus/common/version.BuildUser=yeya24@yeya24 -X github.com/prometheus/common/version.BuildDate=20200807-21:34:12  -extldflags '-static' -a -tags netgo github.com/thanos-io/thanos/cmd/thanos: exit status 2
make: *** [Makefile:114: build] Error 1

It seems the Prometheus dependency needs to be updated on the Cortex side as well.

bwplotka · 2020-08-07T22:59:01Z

That might be true. Here comes cyclic deps (:

bwplotka · 2020-08-07T22:59:22Z

Let's propose a PR on their side cc @pracucci

yeya24 · 2020-08-08T11:16:25Z

Let's propose a PR on their side cc @pracucci

I will prepare a pr.

yeya24 · 2020-08-08T18:49:08Z

pkg/api/queryfrontend/v1.go

+
+	r.Get("/labels", instr("labels", handleFunc))
+	r.Post("/labels", instr("labels", handleFunc))
+}


Is it necessary to add stores and rules APIs here?

well depends how we handle this - I think query frontend was doing pass through for all not defined endpoints, but we should test against it. If that's the case then why we define labels, series, values if not cached?

Can we check it? (:

yeya24 · 2020-08-09T04:18:14Z

I have updated my pr to support the response cache config file. PTAL when you have time. For the Cortex dependency error, I opened a pr cortexproject/cortex#3000 already.

bwplotka

Hey, some suggestions, mostly around different paths. The current implementation is super vague about what is cached, what is split, what is retried, etc. What if some path is not in api.go? (:

Can we make it explicit? For example, if no retry or splitting is expected for anything but query_range, can we just remove everything but query_range from the API? Maybe leaving instant query makes sense, as the slow query log is nice for those (: WDYT?

Beside that it's amazing! 💪 Great job! We are missing docs, but we can add those later (:

bwplotka · 2020-08-10T17:23:22Z

CHANGELOG.md

 - [#2865](https://github.com/thanos-io/thanos/pull/2865) ui: Migrate Thanos Ruler UI to React
 - [#2964](https://github.com/thanos-io/thanos/pull/2964) Query: Add time range parameters to label APIs. Add `start` and `end` fields to Store API `LabelNamesRequest` and `LabelValuesRequest`.
 - [#2996](https://github.com/thanos-io/thanos/pull/2996) Sidecar: Add `reloader_config_apply_errors_total` metric. Add new flags `--reloader.watch-interval`, and `--reloader.retry-interval`.
+- [#2973](https://github.com/thanos-io/thanos/pull/2973) Add Thanos Query Frontend component.


Docs would be nice, maybe in the next PR

bwplotka · 2020-08-10T17:25:16Z

cmd/thanos/query-frontend.go

+	cmd.Flag("query-range.max-query-parallelism", "Maximum number of queries will be scheduled in parallel by the frontend.").
+		Default("14").IntVar(&c.maxQueryParallelism)
+
+	cmd.Flag("query-range.response-cache-max-freshness", "Most recent allowed cacheable result, to prevent caching very recent results that might still be in flux.").


We need better help flag description - in separate PR is ok

essentially let's describe why this is needed

Makes sense. We can also help update the flag description in Cortex.
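
To make the intent of a parallelism limit concrete, here is a small Go sketch that runs sub-queries with a bounded number in flight. `runLimited` and the use of `func() int` as a stand-in for a sub-query are assumptions for illustration; this is not how the frontend schedules work internally.

```go
package main

import "sync"

// runLimited is a hypothetical helper. It executes every job with at most
// limit running at once and returns the results in input order.
func runLimited(jobs []func() int, limit int) []int {
	if limit < 1 {
		limit = 1
	}
	results := make([]int, len(jobs))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit jobs are in flight
		go func(i int, job func() int) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = job() // each goroutine writes its own slot
		}(i, job)
	}
	wg.Wait()
	return results
}
```

A limit like this matters once a long range query is split into many sub-queries, since otherwise a single request could flood the downstream querier.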

bwplotka · 2020-08-10T17:28:50Z

pkg/api/queryfrontend/v1.go

+
+	r.Get("/labels", instr("labels", handleFunc))
+	r.Post("/labels", instr("labels", handleFunc))
+}


well depends how we handle this - I think query frontend was doing pass through for all not defined endpoints, but we should test against it. If that's the case then why we define labels, series, values if not cached?

pkg/queryfrontend/cache/inmemory.go

bwplotka · 2020-08-10T17:29:56Z

pkg/api/queryfrontend/v1.go

+
+	r.Get("/labels", instr("labels", handleFunc))
+	r.Post("/labels", instr("labels", handleFunc))
+}


Can we check it? (:

pkg/queryfrontend/roundtrip.go

bwplotka · 2020-08-10T17:35:19Z

pkg/queryfrontend/roundtrip_test.go

+// TestRoundTripRetryMiddleware tests the retry middleware.
+func TestRoundTripRetryMiddleware(t *testing.T) {
+	testRequest := &queryrange.PrometheusRequest{
+		Path:  "/api/v1/query_range",


What if path is totally different? not query range? Can we test it?

bwplotka · 2020-08-10T17:35:46Z

pkg/queryfrontend/roundtrip_test.go

+		{
+			name: "disable split",
+			req: &queryrange.PrometheusRequest{
+				Path:  "/api/v1/query_range",


again, can we test different paths? just to see what to expect

test/e2e/query_frontend_test.go

bwplotka · 2020-08-10T17:40:12Z

Also, the tests still flake somehow; I wonder if it's SWIFT flakiness

yeya24 · 2020-08-10T18:06:40Z

well depends how we handle this - I think query frontend was doing pass through for all not defined endpoints, but we should test against it. If that's the case then why we define labels, series, values if not cached??

It does not pass through. Please see cortexproject/cortex#2742. TBH I am not sure if this is the behavior we want to have.
As in Cortex, only defined endpoints are accessible now. That's why the frontend needs to register the other endpoints.

Only query range results are cached. The workflow is:

Defined endpoints other than query range -> passed through to the downstream URL, so no middlewares are applied.
Query range -> goes through all defined middlewares and finally to the downstream URL.

This is weird. I think we put this tripperware on query and query range, but also on /labels, /series, and /label/.../values?

Umm, I agree we should separate labels for other endpoints as well. Cortex does the same thing

yeya24 · 2020-08-10T18:08:47Z

Seems we got another flaky test https://app.circleci.com/pipelines/github/thanos-io/thanos/2764/workflows/74964958-9ce1-4041-b3b9-4ea659a1764e/jobs/10997

bwplotka · 2020-08-10T22:33:31Z

In this case let's do following:

Use handler in query_range and query since we will soon modify it to use our tripperware
For everything else let's have (SEPARATE!) passthrough handler. WDYT? (: Also I am surprised it does not work as our UI works just fine on query frontend 🤔
TBH I don't feel like we need allowlist all of endpoints. Maybe we can catch query + query range for now and pass through rest?
Make it more transparent (:

yeya24 · 2020-08-11T02:07:56Z

In this case let's do following:

Use handler in query_range and query since we will soon modify it to use our tripperware

For everything else let's have (SEPARATE!) passthrough handler. WDYT? (: Also I am surprised it does not work as our UI works just fine on query frontend 🤔
TBH I don't feel like we need allowlist all of endpoints. Maybe we can catch query + query range for now and pass through rest?
Make it more transparent (:

Hello, I agree that we shouldn't limit the endpoints here and it is better to pass through all other routes to downstream querier to make sure everything works fine (like Grafana).

But I am not sure why a separate handler is needed? Can I just reuse the downstream roundtripper implemented in Cortex frontend? https://github.com/cortexproject/cortex/blob/master/pkg/querier/frontend/frontend.go#L118. This works but TBH I don't know whether this is a good pattern or not.

I just found it not as easy as I thought to deal with the default route with github.com/prometheus/common/route package, so I removed the router and added the routing logic to the tripperware. WDYT? This way we don't need to have other routers.

	return func(next http.RoundTripper) http.RoundTripper {
		queryRangeTripper := queryrange.NewRoundTripper(next, codec, queryRangeMiddleware...)
		return frontend.RoundTripFunc(func(r *http.Request) (*http.Response, error) {
			switch r.URL.Path {
			case "/api/v1/query":
				if r.Method == http.MethodGet || r.Method == http.MethodPost {
					queriesCount.WithLabelValues(labelQuery).Inc()
				}
			case "/api/v1/query_range":
				if r.Method == http.MethodGet || r.Method == http.MethodPost {
					queriesCount.WithLabelValues(labelQueryRange).Inc()
					return queryRangeTripper.RoundTrip(r)
				}
			default:
			}
			return next.RoundTrip(r)
		})
	}, nil

yeya24 · 2020-08-11T04:51:47Z

PR is updated and I added more test cases. PTAL tomorrow

bwplotka · 2020-08-11T06:32:15Z

Happy with whatever makes sense more (: Will look today

bwplotka

Amazing.

I like it. What I am only missing here are some docs (can do in next PR) and tests for different pass through items. Again good for next PR (:

bwplotka · 2020-08-11T12:13:44Z

cmd/thanos/query-frontend.go

+			})
+			return hf
+		}
+		srv.Handle("/", injectf(fe.Handler().ServeHTTP))


bwplotka · 2020-08-11T12:14:34Z

test/e2e/query_frontend_test.go

+
+	t.Run("same range query, cache hit.", func(t *testing.T) {
+		// Run the same range query again, the result can be retrieved from cache directly.
+		rangeQuery(


can we check that other endpoints are pass through?

I already tested labelNames and labelValues here https://github.com/thanos-io/thanos/blob/master/test/e2e/query_frontend_test.go#L99-L118. I am not sure whether this is what you want or not.

yeya24 force-pushed the add-thanos-query-frontend branch from 1df74b4 to 884a415 Compare August 4, 2020 13:46

bwplotka self-requested a review August 4, 2020 16:44

bwplotka reviewed Aug 4, 2020

bwplotka reviewed Aug 4, 2020

yeya24 mentioned this pull request Aug 5, 2020

Bump vendored cortex version #2981

Merged

2 tasks

yeya24 force-pushed the add-thanos-query-frontend branch from 49842f5 to d944b89 Compare August 6, 2020 03:53

yeya24 force-pushed the add-thanos-query-frontend branch 4 times, most recently from 33d71b9 to 115499a Compare August 7, 2020 15:22

yeya24 changed the title ~~WIP: Add thanos query frontend sub command~~ Add thanos query frontend sub command Aug 7, 2020

yeya24 marked this pull request as ready for review August 7, 2020 20:17

yeya24 force-pushed the add-thanos-query-frontend branch from ff6d5a7 to de35949 Compare August 7, 2020 20:18

yeya24 requested a review from bwplotka August 7, 2020 20:28

yeya24 force-pushed the add-thanos-query-frontend branch from de35949 to 2c21459 Compare August 7, 2020 20:29

bwplotka approved these changes Aug 7, 2020

yeya24 commented Aug 8, 2020

yeya24 force-pushed the add-thanos-query-frontend branch from 2c21459 to c590906 Compare August 8, 2020 20:18

yeya24 mentioned this pull request Aug 8, 2020

Bump prometheus and thanos to master cortexproject/cortex#3000

Merged

3 tasks

yeya24 force-pushed the add-thanos-query-frontend branch from 7789133 to 3e755e2 Compare August 9, 2020 04:16

yeya24 force-pushed the add-thanos-query-frontend branch 3 times, most recently from eda511a to 67c5609 Compare August 10, 2020 15:47

bwplotka requested changes Aug 10, 2020

yeya24 added 13 commits August 11, 2020 00:47

add thanos query frontend sub command

d9bbdaf

Signed-off-by: Ben Ye <yb532204897@gmail.com>

remove disable-step-align flag

db870da

Signed-off-by: Ben Ye <yb532204897@gmail.com>

check empty downstream url

4a312e2

Signed-off-by: Ben Ye <yb532204897@gmail.com>

remove userID in cache key

44d0534

Signed-off-by: Ben Ye <yb532204897@gmail.com>

add flags max-query-length and max-query-parallelelism

e2525ff

Signed-off-by: Ben Ye <yb532204897@gmail.com>

Add E2E test for query frontend

d55e2f0

Signed-off-by: Ben Ye <yb532204897@gmail.com>

add unit tests for roundtripper

bf7e782

Signed-off-by: Ben Ye <yb532204897@gmail.com>

add changelog

12aa431

Signed-off-by: Ben Ye <yb532204897@gmail.com>

add config flags for Cortex fifo cache

2203a06

Signed-off-by: Ben Ye <yb532204897@gmail.com>

add response cache config

5f27fd1

Signed-off-by: Ben Ye <yb532204897@gmail.com>

don't create cache middleware when splitQueryInterval == 0 to prevent…

7cdfa8b

… panic Signed-off-by: Ben Ye <yb532204897@gmail.com>

fix windows build failure

87580cd

Signed-off-by: Ben Ye <yb532204897@gmail.com>

refactor handler; add more test cases to e2e and unit tests

56caec6

Signed-off-by: Ben Ye <yb532204897@gmail.com>

yeya24 force-pushed the add-thanos-query-frontend branch from 67c5609 to 56caec6 Compare August 11, 2020 04:50

bwplotka approved these changes Aug 11, 2020

bwplotka merged commit 2ea2c2b into thanos-io:master Aug 11, 2020

yeya24 deleted the add-thanos-query-frontend branch August 11, 2020 12:41
