query/querier: fix sum() inflated values problem by GiedriusS · Pull Request #1278 · thanos-io/thanos

GiedriusS · 2019-06-25T11:57:19Z

Fixes the problem with sum and inflated values as outlined here: #922.

The problem was the following: sum selects the last value of each time series in each window and adds the different dimensions up. However, our code asked for the downsampled sum value, which equals all samples in a 5m or 1h window added together, and adding those up across series obviously results in an inflated value. The practical effect was as if we had applied sum_over_time(...[5m]) on top. So the fix is to ask for the last sample in each window in the case of sum, instead of the aggregated value.
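To make the inflation concrete, here is a minimal, self-contained sketch of the arithmetic. It is not Thanos code: the window type, its methods, and sumAcrossSeries are hypothetical names invented for illustration, and the sample values are made up.

```go
package main

import "fmt"

// window is a hypothetical stand-in for one downsampled window (e.g. 5m)
// of a single series, holding the raw samples that fell into it.
type window struct {
	samples []float64
}

// sumAggregate mimics the downsampled "sum" aggregate: all raw samples
// in the window added together.
func (w window) sumAggregate() float64 {
	total := 0.0
	for _, v := range w.samples {
		total += v
	}
	return total
}

// last returns the most recent sample in the window, which is the value
// sum() is described as using for each series.
func (w window) last() float64 {
	return w.samples[len(w.samples)-1]
}

// sumAcrossSeries mimics sum() at one evaluation step: it picks one value
// per series and adds the series together.
func sumAcrossSeries(series []window, pick func(window) float64) float64 {
	total := 0.0
	for _, w := range series {
		total += pick(w)
	}
	return total
}

func main() {
	// Two series, each with five samples in the window (made-up values).
	series := []window{
		{samples: []float64{10, 10, 10, 10, 10}},
		{samples: []float64{4, 4, 4, 4, 4}},
	}

	inflated := sumAcrossSeries(series, window.sumAggregate) // per-window sums added up
	expected := sumAcrossSeries(series, window.last)         // last samples added up

	fmt.Println(inflated, expected) // prints "70 14"
}
```

With five samples per window, the first result is five times the second, which matches the description of the bug behaving like an extra sum_over_time(...[5m]).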

Also adds an E2E test for a typical setup: one connected Sidecar plus one or more Thanos Store nodes. The test exercises, and shows, how the whole thing with sum really works.

Testing: wrote a query like sum(kafka_log_log_value{topic="iam"}) with identical Sidecar/Store nodes and selected Max 5m/1h downsampling. With this fix the values look sane; without it they do not (the result jumps up very high once the downsampled data comes into play).

bwplotka

Nice, looking good so far.

GiedriusS · 2019-06-26T18:04:41Z

cc @jjneely. It's funny that we've arrived at the same conclusion even though I hadn't looked at the comment that you made in #922.

brancz · 2019-06-26T19:06:58Z

Nice find. lgtm 👍

povilasv

LGTM

jjneely · 2019-06-27T20:35:30Z

Where is my approve button? I want to hit it. :-) Thanks!

brancz · 2019-06-27T20:38:53Z

I think we've reached sufficient consensus that this is correct. @bwplotka feel free to still review, but I'll go ahead and merge :)

bwplotka

Thanks! I think this fix makes sense, but I am worried there is more to it. E.g., I am not sure whether sum_ shouldn't be handled the same way here. I need to dive into the overall caller logic as well to tell.

bwplotka · 2019-06-27T23:48:12Z

pkg/query/querier.go

 		return []storepb.Aggr{storepb.Aggr_COUNT}, resAggrCount
 	}
-	if f == "sum" || strings.HasPrefix(f, "sum_") {
+	// f == "sum" falls through here since we want the actual samples


Missing trailing period.

Also I don't understand this comment itself - it makes sense after reading this PR, but it's otherwise not clear. I would add more explanation here. (:
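As a rough illustration of what the changed branch does, the sketch below mirrors the shape of the diff with stand-in types. Everything here is hypothetical: aggrHint, its constants, and hintFor are invented names, not the storepb API. It also assumes that the sum_ prefix keeps requesting the sum aggregate, which is inferred from the review discussion and not visible in the quoted diff.

```go
package main

import (
	"fmt"
	"strings"
)

// aggrHint is a hypothetical stand-in for the aggregate type requested
// from the store for a given PromQL function name.
type aggrHint string

const (
	hintCount       aggrHint = "COUNT"
	hintSum         aggrHint = "SUM"
	hintFallThrough aggrHint = "FALL_THROUGH" // placeholder for whatever later branches choose
)

// hintFor sketches the selection logic after the change.
func hintFor(f string) aggrHint {
	if f == "count" || strings.HasPrefix(f, "count_") {
		return hintCount
	}
	// Plain "sum" is deliberately not matched here: it adds up one value
	// per series, so asking for the per-window sum aggregate would count
	// every raw sample in the window and inflate the result.
	// Assumption: functions with the "sum_" prefix still request the
	// pre-aggregated sum.
	if strings.HasPrefix(f, "sum_") {
		return hintSum
	}
	return hintFallThrough
}

func main() {
	for _, f := range []string{"count", "sum", "sum_over_time"} {
		fmt.Printf("%s -> %s\n", f, hintFor(f))
	}
}
```

A comment along these lines, stating why sum is excluded and not only that it falls through, is the sort of extra explanation the review asks for.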

bwplotka · 2019-06-27T23:49:22Z

pkg/query/querier_test.go

+		return time.Unix(int64(s), int64(ns*float64(time.Second)))
+	}
+
+	st := ptm("0")


I would use clearer variable names here.
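One possible shape for this, sketched under assumptions: only the return statement of the helper is visible in the diff, so the parsing steps below are a guess at what it does, and parseUnixSeconds and queryStart are hypothetical names, not the ones used in the PR.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"time"
)

// parseUnixSeconds is a hypothetical, more descriptively named variant of
// the ptm helper: it turns a string of (possibly fractional) Unix seconds
// into a time.Time.
func parseUnixSeconds(value string) (time.Time, error) {
	seconds, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return time.Time{}, err
	}
	// Split into whole seconds and the fractional remainder.
	wholeSeconds, fraction := math.Modf(seconds)
	nanoseconds := int64(fraction * float64(time.Second))
	return time.Unix(int64(wholeSeconds), nanoseconds), nil
}

func main() {
	queryStart, err := parseUnixSeconds("0")
	if err != nil {
		panic(err)
	}
	fmt.Println(queryStart.Unix())
}
```

Returning the parse error, instead of handling it inside the closure, is a design choice of this sketch; a test helper could equally fail the test directly.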

query/querier_test: add minimal test for typical setup

1601ab1

Add a test for a typical setup of one Sidecar connected + one or more Thanos Store nodes. Testing how the whole thing really works.

bwplotka reviewed Jun 25, 2019

Giedrius Statkevičius added 3 commits June 25, 2019 16:15

query/querier_test: fix test

971bf63

It is to be expected that Prometheus code will select the latest value in any time window because otherwise the implicit conversion between raw and pre-aggregated would not work.

query: improve test, switch around iter

0a6a75b

querier: fix sum() data request

53e940c

GiedriusS changed the title ~~query/querier_test: add minimal test for typical setup~~ query/querier: fix sum() inflated values problem Jun 26, 2019

query/iter: revert change

9ae7489

This is not needed.

GiedriusS marked this pull request as ready for review June 26, 2019 07:40

GiedriusS requested review from brancz, bwplotka and povilasv June 26, 2019 07:40

Giedrius Statkevičius added 2 commits June 26, 2019 10:42

query/querier: add explanatory comment

49c2517

query/querier_test: fix tests

e168d2d

brancz approved these changes Jun 26, 2019

povilasv approved these changes Jun 27, 2019

brancz merged commit 19a51b1 into thanos-io:master Jun 27, 2019

bwplotka reviewed Jun 28, 2019

brancz merged 7 commits into thanos-io:master from GiedriusS:mixed_chunks