query/querier: fix sum() inflated values problem #1278
Merged
brancz merged 7 commits into thanos-io:master from Jun 27, 2019
Add a test for a typical setup of one Sidecar connected + one or more Thanos Store nodes. Testing how the whole thing really works.
added 3 commits
June 25, 2019 16:15
It is to be expected that Prometheus code will select the latest value in any time window because otherwise the implicit conversion between raw and pre-aggregated would not work.
This is not needed.
Nice find. lgtm 👍
brancz
approved these changes
Jun 26, 2019
Where is my approve button? I want to hit it. :-) Thanks!
I think we've reached sufficient consensus that this is correct. @bwplotka feel free to still review, but I'll go ahead and merge :)
bwplotka
reviewed
Jun 28, 2019
bwplotka left a comment
Thanks! I think this fix makes sense, but I am worried there is more to it. E.g., I am not sure whether sum_ shouldn't be handled the same way here. I need to dive into the overall caller logic as well to tell.
        return []storepb.Aggr{storepb.Aggr_COUNT}, resAggrCount
    }
    if f == "sum" || strings.HasPrefix(f, "sum_") {
    // f == "sum" falls through here since we want the actual samples
Missing trailing period.
Also I don't understand this comment itself - it makes sense after reading this PR, but it's otherwise not clear. I would add more explanation here. (:
        return time.Unix(int64(s), int64(ns*float64(time.Second)))
    }

    st := ptm("0")
I would use clearer variable names here.
Fixes the problem with `sum` and inflated values as outlined here: #922.

The problem was the following: `sum` selects the last value of each time series in each window and adds the different dimensions up. However, our code asks for the downsampled sum value, which is equal to all samples added up in either a 5m or 1h window, and adding those up obviously results in an inflated value. The practical result was as if we had applied `sum_over_time(...[5m])` on top. So the fix is to ask for the last sample in each window in the case of `sum` instead of the aggregated value.

Also adds an E2E test for a typical setup of one Sidecar connected + one or more Thanos Store nodes. It tests and shows how the whole thing with `sum` really works.

Testing: wrote a query like `sum(kafka_log_log_value{topic="iam"})` with identical Sidecar/Store nodes and selected Max 5m/1h downsampling. With this fix the values look sane; without it they do not (the result jumps up very high once the downsampled data comes into play).