Support max source resolution for instant queries by 0robustus1 · Pull Request #1431 · thanos-io/thanos

0robustus1 · 2019-08-16T13:57:00Z

[] Added the thanos query flag --query.instant.default.max_source_resolution to set a default value for maxSourceResolution on instant queries (as used, for example, by the thanos rule component)
[] Added support for the max_source_resolution query parameter on /api/v1/query

Reasoning

We have configured our compactor with different retentions for different resolution levels:

thanos
  compact
  --log.level=debug
  --data-dir=/var/thanos/store
  --objstore.config-file=/s3/thanos.yaml
  --retention.resolution-raw=7d
  --retention.resolution-5m=30d
  --retention.resolution-1h=180d
  --wait

This caused a thanos rule recording rule to take only the last 7-8 days of
data into account, because instant queries always used a maxSourceResolution
of 0 and therefore read only raw-resolution data.

In our case the query looked like this and was executed on a thanos ruler
because our Prometheus instances only have a relatively short retention of a few days:

(100 -
  sum_over_time (
    some_metric:errors_count:increase1d[30d]
  ) /
  sum_over_time (
    some_metric:total_count:increase1d[30d]
  ) * 100
)

The thanos rule component runs an instant query in order to retrieve a single
datapoint to set for the recording rule metric. Our query requires that data
from at least 30 days is taken into account. But by default it would only select
raw data and therefore only provide data from the last 7 days.
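The effect described above can be illustrated with a small sketch. It is not Thanos code: `visible_window` is a hypothetical helper, and it assumes that a max source resolution of 0 admits only raw data while larger values also admit the coarser downsampled data, with retention values copied from the compactor flags above.

```python
from datetime import timedelta

# Retention per stored resolution, copied from the compactor flags above.
# Keys are the resolution of the stored data (0 = raw).
RETENTION = {
    timedelta(0): timedelta(days=7),
    timedelta(minutes=5): timedelta(days=30),
    timedelta(hours=1): timedelta(days=180),
}


def visible_window(lookback, max_source_resolution, retention=RETENTION):
    """Hypothetical helper: how far back a query can actually read data.

    Only resolutions up to max_source_resolution are considered, and the
    longest retention among them bounds the usable window.
    """
    allowed = [kept for res, kept in retention.items() if res <= max_source_resolution]
    return min(lookback, max(allowed))


# With the old behaviour (max source resolution 0) a [30d] selector is cut short.
raw_only = visible_window(timedelta(days=30), timedelta(0))
# Allowing data up to 1h resolution restores the full 30 days.
with_1h = visible_window(timedelta(days=30), timedelta(hours=1))
```

Under these assumptions `raw_only` is 7 days and `with_1h` is the full 30 days, which matches the behaviour we observed with the recording rule.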

Changes

This allows /api/v1/query to accept the ?max_source_resolution query parameter
(prior to this, the documentation did not clearly state that the parameter was
only accepted for /api/v1/query_range).
We also added a command line argument that sets a default for when the query parameter is not set.
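As a rough illustration of the client side, the sketch below builds an instant query URL that carries the parameter. The host, port, and the helper name `instant_query_url` are made up for the example, and passing the value as a duration string such as 1h is an assumption about the accepted format.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, query, max_source_resolution=None):
    """Hypothetical helper: build an /api/v1/query URL.

    max_source_resolution is only added when given, so the server-side
    default applies otherwise.
    """
    params = {"query": query}
    if max_source_resolution is not None:
        params["max_source_resolution"] = max_source_resolution
    return f"{base_url}/api/v1/query?{urlencode(params)}"


# Example endpoint; replace with the address of your own querier.
url = instant_query_url(
    "http://thanos-query.example:10902",
    "sum_over_time(some_metric:errors_count:increase1d[30d])",
    max_source_resolution="1h",
)
```

The resulting URL can be fetched with any HTTP client; leaving out the third argument produces a plain instant query without the parameter.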

We also propose the following (for later PRs):

Amend the UI to allow setting the max_source_resolution for "Console"
queries (which are executing instant queries).
Add a new field to the rule-yaml for thanos ruler to set the
max_source_resolution on a per-recording-rule basis.

Verification

We've used a fork to fire individual instant queries with the max_source_resolution
parameter set and compared it with the last result we got from a range query.
We also deployed a version with the default value changed and observed that the recording
rule was now producing the expected value.

brancz · 2019-08-20T18:54:09Z

This looks right to me, but it'd be good if @bwplotka and/or @GiedriusS have a look as well.

pkg/query/api/v1.go

cmd/thanos/query.go

GiedriusS

The idea to add that parameter there is good but not sure about the flag. Perhaps we could re-use --query.auto-downsampling here and automatically select it depending on the range vectors?

0robustus1 · 2019-08-21T13:32:15Z

Hi @GiedriusS, that might be a significantly more comprehensive and powerful solution.
It sounds however like a larger-scale change (that should then be adopted for all query types).

There is just one concern that comes to mind. Wouldn't the querier need to be aware of the retention configured for the different compaction levels? How else would it have translated the 30d range selector to the correct maxSourceResolution (1h) in a retention setup like this:

thanos
  compact
  --log.level=debug
  --data-dir=/var/thanos/store
  --objstore.config-file=/s3/thanos.yaml
  --retention.resolution-raw=7d
  --retention.resolution-5m=11d
  --retention.resolution-1h=365d
  --wait

Configuring a maxSourceResolution via the flag would enable that.
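To make the concern concrete, here is a sketch of the mapping the querier would have to perform. It is purely illustrative: `min_resolution_for` is a hypothetical function, and the retention table is exactly the knowledge the querier does not have today.

```python
from datetime import timedelta


def min_resolution_for(lookback, retention):
    """Hypothetical helper: finest resolution whose retention covers lookback.

    retention maps stored resolution (0 = raw) to how long that data is kept.
    Returns None when no resolution is retained long enough.
    """
    for resolution in sorted(retention):
        if retention[resolution] >= lookback:
            return resolution
    return None


# Retention from the compactor flags in this comment.
retention = {
    timedelta(0): timedelta(days=7),
    timedelta(minutes=5): timedelta(days=11),
    timedelta(hours=1): timedelta(days=365),
}

needed = min_resolution_for(timedelta(days=30), retention)
```

With this retention a 30d selector needs 1h data, whereas with the 7d/30d/180d setup from the description 5m data would already be enough, so the answer depends on compactor settings rather than on the query alone.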

povilasv

Hmm, maybe we should change the default of --query.instant.default.max_source_resolution from 0s to 1h.

The 0s default behaves in a somewhat buggy way if you configure retention like in the example.

Thoughts?

0robustus1 · 2019-08-23T09:48:14Z

That's what we set it to in our setup now and would probably leave it at. I just wasn't sure whether to include it in the Pull-Request.

GiedriusS · 2019-08-25T09:31:01Z

> Hi @GiedriusS, that might be a significantly more comprehensive and powerful solution.
> It sounds however like a larger-scale change (that should then be adopted for all query types).
>
> There is just one concern that comes to mind. Wouldn't the querier need to be aware of the retention configured for the different compaction levels? How else would it have translated the 30d range selector to the correct maxSourceResolution (1h) in a retention setup like this:
>
> thanos
>   compact
>   --log.level=debug
>   --data-dir=/var/thanos/store
>   --objstore.config-file=/s3/thanos.yaml
>   --retention.resolution-raw=7d
>   --retention.resolution-5m=11d
>   --retention.resolution-1h=365d
>   --wait
>
> Configuring a maxSourceResolution via the flag would enable that.

Indeed, it would be a much bigger change, but I'm not sure that introducing a new flag is the best solution. It's always easiest to do that, but after some time you end up with something that has hundreds of knobs, which is bad.

I think my original suggestion would work out because we actually do downsampling at fixed intervals. The options that you've pasted only control the retention part. I outlined this in another issue I opened about how Thanos Store should always prefer higher resolution data.

I still would very much like @bwplotka to comment on this.

bwplotka

👋 Thanks for this all!

I am fine with unblocking you @0robustus1 on this, if we make this flag hidden for now. Once that is done LGTM 👍 (: Essentially, for the long-term direction we need a bit more design/discussion. IMO there might be some work on PromQL to improve this flow (more below).

Overall, I agree with @GiedriusS and I would vote for instant queries that ask for a range vector using exactly the same logic as the range query. I would really reuse --query.auto-downsampling and the same flow.

> There is just one concern that comes to mind. Wouldn't the querier need to be aware of the retention configured for the different compaction levels? How else would it have translated the 30d

Technically it would be nice to control all by the step you provide to the query... which you cannot on the instant query. I think there were discussions on enabling step for range vector selector [ .. ] and that would be helpful. I would really love to follow on that with Prometheus team since at some point we want to propose downsampling to Prometheus itself.

Anyway, from the Thanos perspective, IMO the querier should not be aware of anything on the storage layer, especially retention. All it knows about is the StoreAPI.

> The options that you've pasted only control the retention part. I outlined this in another issue I opened about how Thanos Store should always prefer higher resolution data.

This is an interesting one. Especially for range vectors and functions, it might be really faster and equally accurate (: in most cases. (: But I don't think for all cases. I think still having granular control for each range vector selector would be the solution here, but let's discuss.

cmd/thanos/query.go

pkg/query/api/v1.go

pkg/query/api/v1_test.go

0robustus1 · 2019-08-26T14:17:11Z

I adjusted the code to the remaining review comments. The flag is now hidden.

This adds support for the ?max_source_resolution query param for instant queries. Instant queries had the issue that when a range selector requested a wide data range (e.g. sum_over_time(metric_name[30d])) and the retention of raw data was, for example, only 7d, the query would (silently) only take data from the last 7 days into account, ignoring the downsampled 5m and 1h time series that were available with longer retention. We default to a max_source_resolution of 1h for now, as this covers the coarsest resolution available and should include all retention. It is configurable per query (although no known clients send it). Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
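The precedence described here (explicit query parameter first, configured default otherwise) can be sketched as follows. This is an illustration only, not the handler from pkg/query/api/v1.go; the function name and the plain-dict representation of the request parameters are assumptions.

```python
# Default applied to instant queries when the client sends no parameter.
DEFAULT_INSTANT_MAX_SOURCE_RESOLUTION = "1h"


def effective_max_source_resolution(query_params, default=DEFAULT_INSTANT_MAX_SOURCE_RESOLUTION):
    """Hypothetical helper: pick the resolution limit for an instant query.

    query_params is a plain dict of request parameters. A missing or empty
    max_source_resolution falls back to the configured default.
    """
    value = query_params.get("max_source_resolution")
    return value if value else default


# A client that does not send the parameter gets the default.
from_default = effective_max_source_resolution({"query": "up"})
# A client that sends it overrides the default for that query only.
from_param = effective_max_source_resolution({"query": "up", "max_source_resolution": "5m"})
```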


bwplotka · 2019-08-27T17:16:54Z

Actually, I was super wrong. All the things I mentioned about resolution per range work right now: https://prometheus.io/blog/2019/01/28/subquery-support/#subqueries

I think we might then need to slightly change the implementation in the querier and use the per-selection Step int64 // Query step size in milliseconds. instead of the global, query-level one.

bwplotka · 2019-08-27T19:53:39Z

Failed quick experiment here so merging this "hidden" workaround for now.

Thanks for doing this! Would love to have your input on further work in this area @0robustus1

0robustus1 force-pushed the support_max_source_resolution_for_instant_queries branch 2 times, most recently from dc7a3c3 to 765ac90 on August 16, 2019 14:06

bwplotka requested review from GiedriusS, brancz, fabxc and povilasv August 16, 2019 15:49

brancz approved these changes Aug 20, 2019

povilasv requested changes Aug 21, 2019

GiedriusS reviewed Aug 21, 2019

0robustus1 force-pushed the support_max_source_resolution_for_instant_queries branch 2 times, most recently from a430f7e to 4765b2a on August 21, 2019 08:16

povilasv approved these changes Aug 23, 2019

bwplotka approved these changes Aug 25, 2019

bwplotka reviewed Aug 25, 2019

bwplotka reviewed Aug 25, 2019

bwplotka reviewed Aug 25, 2019

0robustus1 force-pushed the support_max_source_resolution_for_instant_queries branch from 8e2c093 to 1fc56e5 on August 26, 2019 14:00

0robustus1 force-pushed the support_max_source_resolution_for_instant_queries branch from 1fc56e5 to 5e69042 on August 27, 2019 07:40

0robustus1 added 7 commits August 27, 2019 14:04

add flag for defaultMaxSourceResolution

2f0322c

The flag only affects instant queries. Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>

update docs (forgot the make docs).

9cf0be5

Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>

use step/5 directly within query_range

adb02ed

Instead of defining an additional function we do the adjustment of maxSourceResolution directly in query_range. Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>

add instant-query flag example case documentation.

82338ae

Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>

adjust naming & comments to review comments.

298a929

Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>

hide --query.instant.default.max_source_resolution.

4dc1856

Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>

remove hidden instant-query resolution flag from docs.

2433524

Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
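The "use step/5 directly within query_range" commit above concerns how range queries pick their resolution limit. A minimal sketch of that rule follows, assuming an explicit parameter wins, auto-downsampling otherwise yields a fifth of the step, and raw data is the fallback. The function and argument names are invented for the example and do not mirror the Go code.

```python
from datetime import timedelta


def range_max_source_resolution(step, requested=None, auto_downsampling=False):
    """Hypothetical helper: resolution limit for a range query.

    requested models an explicit max_source_resolution parameter and
    auto_downsampling models the --query.auto-downsampling flag.
    """
    if requested is not None:
        return requested
    if auto_downsampling:
        # A fifth of the step, so several samples still land in each step.
        return step / 5
    return timedelta(0)  # raw data only


auto = range_max_source_resolution(timedelta(minutes=25), auto_downsampling=True)
```

An instant query has no step to divide, which is why it needs either the query parameter or the hidden default flag.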

0robustus1 force-pushed the support_max_source_resolution_for_instant_queries branch from 5e69042 to 2433524 on August 27, 2019 12:06

bwplotka mentioned this pull request Aug 27, 2019

Choose lowest downsampling resolution with granular control. #1465

Closed

bwplotka merged commit 27b4705 into thanos-io:master Aug 27, 2019

rekup mentioned this pull request Apr 4, 2023

"unhide" query.instant.default.max_source_resolution Flag #6261

Open

5 participants
