Support max source resolution for instant queries#1431
Conversation
dc7a3c3 to
765ac90
Compare
|
This looks right to me, but it'd be good if @bwplotka and or @GiedriusS have a look as well. |
GiedriusS
left a comment
There was a problem hiding this comment.
The idea to add that parameter there is good but not sure about the flag. Perhaps we could re-use --query.auto-downsampling here and automatically select it depending on the range vectors?
a430f7e to
4765b2a
Compare
|
Hi @GiedriusS, that might be a significantly more comprehensive and powerful solution. There is just one concern that comes to mind. Wouldn't the querier need to be aware of the retention configured for the different compaction levels? How else would it have translated the Configuring a |
|
That's what we set it to in our setup now and would probably leave it at. I just wasn't sure whether to include it in the Pull-Request. |
Indeed, it would be a much bigger change but not sure that introducing a new flag is the best solution. It's always easiest to do that but then after some time you end with a thing which has hundreds of knobs which is bad. I think my original suggestion would work out because actually we do downsampling at fixed intervals. The options that you've pasted only control the retention part. I have outlined this in my other issue what I have started about how Thanos Store should always prefer higher resolution data. I still would very much like @bwplotka to comment on this. |
bwplotka
left a comment
There was a problem hiding this comment.
👋 Thanks for this all!
I am fine with unblocking you @0robustus1 on this, if we make this flag hidden for now. Once that is done LGTM 👍 (: Essentially for long term direction we need bit more design/discussion. IMO there might be some work on PromQL to improve this flow (more below).
Overall, I agree with @GiedriusS and I would vote for query instant asking for range vector using exactly the same logic as the range query. I would really reuse --query.auto-downsampling and same flow.
There is just one concern that comes to mind. Wouldn't the querier need to be aware of the retention configured for the different compaction levels? How else would it have translated the 30d
Technically it would be nice to control all by the step you provide to the query... which you cannot on the instant query. I think there were discussions on enabling step for range vector selector [ .. ] and that would be helpful. I would really love to follow on that with Prometheus team since at some point we want to propose downsampling to Prometheus itself.
Anyway, from Thanos perspective IMO querier should not be aware of anything on storage layer, especially retention - all it knows about is StoreAPI
The options that you've pasted only control the retention part. I have outlined this in my other issue what I have started about how Thanos Store should always prefer higher resolution data.
This is an interesting one. Especially for range vectors and functions, it might be really faster and equally accurate (: in most cases. (: But I don't think for all cases. I think still having granular control for each range vector selector would be the solution here, but let's discuss.
8e2c093 to
1fc56e5
Compare
|
I adjusted the code to the remaining review comments. The flag is now hidden. |
1fc56e5 to
5e69042
Compare
This adds support for the ?max_source_resolution query param for instant queries. Instant queries had the issue that when there was a subquery requesting wide data ranges e.g. sum_over_time(metric_name[30d])) and the retention of raw data was for example only 7d, the query would (silently) only take data from the last 7 days into account (ignoring the downsampled 5m and 1h timeseries that were available with longer retention). We default to 1h max_source_resolution for now as this will cover the highest/broadest resolution available and should include all retention. It is configurable per query (although no known clients send it). Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
The flag only affects instant queries. Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
Instead of defining an additional function we do the adjustment of maxSourceResolution directly in query_range. Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
Signed-off-by: Tim Reddehase <tim.reddehase@xing.com>
5e69042 to
2433524
Compare
|
Actually, I was super wrong.. All things I mentioned with resolution per range works right now: https://prometheus.io/blog/2019/01/28/subquery-support/#subqueries I might think we need to change slightly implementation then for querier and use per selection |
|
Failed quick experiment here so merging this "hidden" workaround for now. Thanks for doing this! Would love to have your input on further work in this area @0robustus1 |
thanos query --query.instant.default.max_source_resolutionflag to set a default value for maxSourceResolution on instant queries (as are for example used by thethanos rulecomponent)/api/v1/queryReasoning
We have configured our compactor with different retentions for different resolution levels:
This causes a situation where the result of a
thanos rulerecording rule wasonly taking the last 7-8 days of data into account, because instant queries were
always using a maxSourceResolution of 0 and therefore only using raw resolution
data.
In our case the query looked like this and was executed on a thanos ruler
because our prometheuses only have a relatively short retention of a few days:
The
thanos rulecomponent runs an instant query in order to retrieve a singledatapoint to set for the recording rule metric. Our query requires that data
from at least 30 days is taken into account. But by default it would only select
raw data and therefore only provide data from the last 7 days.
Changes
This allows for
/api/v1/queryto accept the?max_source_resolutionquery parameter(the documentation did not clearly state that it was only accepted for
/api/v1/query_rangeprior to this).We also added a command line argument to set a default should the query parameter not be set.
We also propose the following (for later PRs):
max_source_resolutionfor "Console"queries (which are executing instant queries).
max_source_resolutionon a per-recording-rule basis.Verification
max_source_resolutionparameter set and compared it with the last result we got from a range query.
rule was now producing the expected value.