store/proxy: Deduplicate chunks on StoreAPI level. Recommend chunk sorting for StoreAPI + Optimized iter chunk dedup.#2710

Merged
bwplotka merged 4 commits into master from dedup-same-chunks
Jun 3, 2020

bwplotka (Member) commented Jun 3, 2020

This does #2546 properly.

This is a rebased version of #2603, which was wrongly merged into a chained, already-merged PR, plus a small fix to avoid .String().

This also has to be cherry-picked to 0.13.

bwplotka added 3 commits June 3, 2020 17:41
…ng for StoreAPI.

Also: Merge the same series together at the proxy level instead of in select. This allows better dedup efficiency.

Partially fixes: #2303

Cases like overlapping data from store and sidecar, and 1:1 duplicates, are now optimized as early as possible.
This case was highly visible on GitLab repro data and exists in most Thanos setups.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
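
To illustrate the idea in the commit message above, here is a minimal sketch of merging chunk lists for one series from two stores and dropping exact duplicates. It is not the code from this PR: the `chunk` type, `compare`, and `mergeDedup` are hypothetical names invented for the example, and the real StoreAPI chunk type has more fields.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// chunk is a hypothetical, simplified stand-in for a StoreAPI chunk.
type chunk struct {
	MinTime, MaxTime int64
	Data             []byte
}

// compare orders chunks by MinTime, then MaxTime, then raw bytes.
// It returns 0 only for exact (1:1) duplicates.
func compare(a, b chunk) int {
	switch {
	case a.MinTime != b.MinTime:
		if a.MinTime < b.MinTime {
			return -1
		}
		return 1
	case a.MaxTime != b.MaxTime:
		if a.MaxTime < b.MaxTime {
			return -1
		}
		return 1
	}
	return bytes.Compare(a.Data, b.Data)
}

// mergeDedup combines the chunk lists received for the same series from
// several stores, sorts them, and drops exact duplicates.
func mergeDedup(lists ...[]chunk) []chunk {
	var all []chunk
	for _, l := range lists {
		all = append(all, l...)
	}
	if len(all) == 0 {
		return nil
	}
	sort.SliceStable(all, func(i, j int) bool { return compare(all[i], all[j]) < 0 })

	// After sorting, duplicates are adjacent, so one pass is enough.
	ret := all[:1]
	for _, c := range all[1:] {
		if compare(ret[len(ret)-1], c) == 0 {
			continue
		}
		ret = append(ret, c)
	}
	return ret
}

func main() {
	// The sidecar and the store both hold the [0, 10] chunk.
	sidecar := []chunk{{0, 10, []byte("a")}, {10, 20, []byte("b")}}
	store := []chunk{{0, 10, []byte("a")}, {20, 30, []byte("c")}}
	for _, c := range mergeDedup(sidecar, store) {
		fmt.Printf("[%d, %d] %s\n", c.MinTime, c.MaxTime, c.Data)
	}
}
```

If each store already returns its chunks sorted, as this PR recommends for StoreAPI, the sort step could in principle be replaced by a streaming merge; the sketch sorts only to stay short.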
@bwplotka bwplotka requested review from brancz, pracucci and yeya24 June 3, 2020 16:50
bwplotka (Member Author) commented Jun 3, 2020

See benchmarks here: #2603 (comment)

Also, we desperately need Query benchmarks (there are some in an old, forgotten PR...)

SuperQ (Contributor) left a comment

😹


  for _, c := range chks[1:] {
-     if ret[len(ret)-1].String() == c.String() {
+     if ret[len(ret)-1].Compare(c) == 0 {
bwplotka (Member Author) commented:

@brancz solving regression here.

Never use proto .String() in fast path!
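
As a rough illustration of why the review swaps the `.String()` comparison for `.Compare`, the sketch below compares fields directly instead of formatting both messages into strings on every loop iteration. The `rawChunk` type, its `String` and `Compare` methods, and `dedupAdjacent` are hypothetical stand-ins written for this example, not the generated protobuf code or the actual Thanos implementation.

```go
package main

import (
	"bytes"
	"fmt"
)

// rawChunk is a hypothetical stand-in for a generated protobuf message.
type rawChunk struct {
	Type int32
	Data []byte
}

// String mimics a generated String(): it formats the whole payload and
// allocates a new string on every call, which is costly in a hot loop.
func (c rawChunk) String() string {
	return fmt.Sprintf("type:%d data:%x", c.Type, c.Data)
}

// Compare checks fields directly without building any strings.
// It returns 0 when both chunks are identical.
func (c rawChunk) Compare(o rawChunk) int {
	if c.Type != o.Type {
		if c.Type < o.Type {
			return -1
		}
		return 1
	}
	return bytes.Compare(c.Data, o.Data)
}

// dedupAdjacent drops chunks equal to their predecessor. It reuses the
// input's backing array, so the caller should not use chks afterwards.
func dedupAdjacent(chks []rawChunk) []rawChunk {
	if len(chks) == 0 {
		return chks
	}
	ret := chks[:1]
	for _, c := range chks[1:] {
		if ret[len(ret)-1].Compare(c) == 0 {
			continue
		}
		ret = append(ret, c)
	}
	return ret
}

func main() {
	chks := []rawChunk{{1, []byte{0xa}}, {1, []byte{0xa}}, {1, []byte{0xb}}}
	fmt.Println(len(dedupAdjacent(chks)))
}
```

Both comparisons give the same answer for equality here; the difference is only the per-call formatting and allocation cost that the string version pays.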

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@bwplotka bwplotka force-pushed the dedup-same-chunks branch from 37af4c2 to 00c71a4 Compare June 3, 2020 18:11
@bwplotka bwplotka merged commit 2000451 into master Jun 3, 2020
@bwplotka bwplotka deleted the dedup-same-chunks branch June 3, 2020 18:28
bwplotka added a commit that referenced this pull request Jun 3, 2020
…rting for StoreAPI + Optimized iter chunk dedup. (#2710)

* Deduplicate chunk dups on proxy StoreAPI level. Recommend chunk sorting for StoreAPI.

Also: Merge the same series together at the proxy level instead of in select. This allows better dedup efficiency.

Partially fixes: #2303

Cases like overlapping data from store and sidecar, and 1:1 duplicates, are now optimized as early as possible.
This case was highly visible on GitLab repro data and exists in most Thanos setups.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Optimized algorithm to combine series only on start.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Optimized chunk comparison for overlaps.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Optimized deduplication for deduplicated chunk on query level as well.

Never use proto .String() in fast path!

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
# Conflicts:
#	CHANGELOG.md
#	pkg/store/storepb/custom.go
#	pkg/store/storepb/custom_test.go
brancz pushed a commit that referenced this pull request Jun 4, 2020
…rting for StoreAPI + Optimized iter chunk dedup. (#2710) (#2711)
paulfantom added a commit to paulfantom/thanos that referenced this pull request Jul 8, 2020
openshift/master

* upstream/release-0.13:
  Cut release v0.13.0
  shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2 (thanos-io#2765)
  Cut 0.13.0 release. (thanos-io#2762)
  Cut release 0.13.0-rc.1 (thanos-io#2720)
  Store: `irate` and `resets` use now counter downsampling aggregations. (thanos-io#2719)
  deps: Updated minio-go dependency to v6.0.56 to add two region endpoints (thanos-io#2705) (thanos-io#2718)
  store/proxy: Deduplicate chunks on StoreAPI level. Recommend chunk sorting for StoreAPI + Optimized iter chunk dedup. (thanos-io#2710) (thanos-io#2711)
  Allow using multiple memcached clients at the same time. (thanos-io#2648) (thanos-io#2698)
  Updated Prometheus as little as possible to include Isolation fix. (thanos-io#2697)
  Release fix attempt2.
  Fixed test job. (thanos-io#2650)
  Fixed promu build to build in compatible directory that crossbuild understands.
  Cut v0.13.0-rc.0 (thanos-io#2628)