shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2 #2765
bwplotka merged 2 commits into release-0.13 from
brancz
left a comment
lgtm, just wondering though .. what's the use case of not uploading oldest to newest? do we really need a flag?
I mentioned this in flag help. Why I created an option to still enable it:
Those arguments are not strong, I agree, and I am happy to reconsider this... Maybe moving this flag to
Also, technically we should do rc.3 with this, not a full release. Thoughts?
squat
left a comment
looks pretty good! Just some language nits
CHANGELOG.md
Outdated
- [#2416](https://github.com/thanos-io/thanos/pull/2416) Bucket: Fixed issue #2416 bug in `inspect --sort-by` doesn't work correctly in all cases.
- [#2719](https://github.com/thanos-io/thanos/pull/2719) Query: `irate` and `resets` use now counter downsampling aggregations.
- [#2705](https://github.com/thanos-io/thanos/pull/2705) minio-go: Added support for `af-south-1` and `eu-south-1` regions.
- [#2753](https://github.com/thanos-io/thanos/issues/2753) Sidecar,Receive,Rule: Fixed cause for compactor overlapping blocks in upload error cases.
nit on comment formatting: we should have spaces between component names
cmd/thanos/receive.go
Outdated
walCompression := cmd.Flag("tsdb.wal-compression", "Compress the tsdb WAL.").Default("true").Bool()

allowOutOfOrderUpload := cmd.Flag("shipper.allow-out-of-order-uploads",
	"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
Let's keep these as full sentences to make it more readable.
"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
"If true, shipper will skip failed block uploads in the given iteration and retry later. This means that some newer blocks might be uploaded sooner than older blocks."+
cmd/thanos/receive.go
Outdated
allowOutOfOrderUpload := cmd.Flag("shipper.allow-out-of-order-uploads",
	"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
	"This will trigger compaction without those blocks, as a resulted create 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"This will trigger compaction without those blocks, as a resulted create 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"This will trigger compaction without those blocks and as a result will create a 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
just for clarification, what is a 'valid overlap situation'?
I changed to just an overlap situation
docs/operating/troubleshooting.md
Outdated
### Reasons

- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there was possibility to have overlaps caused by compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)
- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there was possibility to have overlaps caused by compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)
- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there is a possibility to have overlaps caused by the compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)
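As a rough illustration of how such an overlap can arise, the sketch below models blocks as half-open time ranges. It is a simplified model, not the compactor's planning logic: `timeRange`, `overlaps` and `span` are hypothetical helpers, and "compaction" is reduced to taking the span of the blocks the compactor can see.

```go
package main

import "fmt"

// timeRange is a hypothetical half-open interval [Min, Max).
type timeRange struct{ Min, Max int64 }

// overlaps reports whether two half-open ranges share any instant.
func overlaps(a, b timeRange) bool {
	return a.Min < b.Max && b.Min < a.Max
}

// span returns the smallest range covering all inputs.
// It expects at least one range.
func span(rs ...timeRange) timeRange {
	out := rs[0]
	for _, r := range rs[1:] {
		if r.Min < out.Min {
			out.Min = r.Min
		}
		if r.Max > out.Max {
			out.Max = r.Max
		}
	}
	return out
}

func main() {
	a, b, c := timeRange{0, 2}, timeRange{2, 4}, timeRange{4, 6}

	// Block b failed to upload, so only a and c are visible and get merged.
	merged := span(a, c)

	// When b is finally uploaded, it overlaps the already compacted block.
	fmt.Println(merged, overlaps(merged, b)) // {0 6} true
}
```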
if err := s.upload(ctx, m); err != nil {
	level.Error(s.logger).Log("msg", "shipping failed", "block", m.ULID, "err", err)
	if !s.allowOutOfOrderUploads {

var metas []*metadata.Meta
// blockMetasFromOldest returns the block meta of each block found in dir
// sorted by minTime asc.
func (s *Shipper) blockMetasFromOldest() (metas []*metadata.Meta, _ error) {
nice, I think this is a good simplification
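For readers without the diff at hand, the ordering that `blockMetasFromOldest` is documented to return can be sketched as follows. This only illustrates "sorted by minTime asc": `blockMeta` and `sortFromOldest` are hypothetical names, not the types or functions in the Thanos code base.

```go
package main

import (
	"fmt"
	"sort"
)

// blockMeta is a hypothetical, trimmed-down stand-in for metadata.Meta.
type blockMeta struct {
	ULID    string
	MinTime int64
	MaxTime int64
}

// sortFromOldest orders metas by MinTime ascending, in place.
func sortFromOldest(metas []blockMeta) {
	sort.Slice(metas, func(i, j int) bool {
		return metas[i].MinTime < metas[j].MinTime
	})
}

func main() {
	metas := []blockMeta{{"c", 200, 300}, {"a", 0, 100}, {"b", 100, 200}}
	sortFromOldest(metas)
	for _, m := range metas {
		fmt.Println(m.ULID, m.MinTime)
	}
}
```

Sorting once up front lets the upload loop walk the blocks in order and simply stop at the first failure when strict ordering is required.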
Yes, I think rc.3 is the right thing to do here. I think a hidden flag for now sounds good.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
openshift/master
* upstream/release-0.13:
  Cut release v0.13.0
  shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2 (thanos-io#2765)
  Cut 0.13.0 release. (thanos-io#2762)
  Cut release 0.13.0-rc.1 (thanos-io#2720)
  Store: `irate` and `resets` use now counter downsampling aggregations. (thanos-io#2719)
  deps: Updated minio-go dependency to v6.0.56 to add two region endpoints (thanos-io#2705) (thanos-io#2718)
  store/proxy: Deduplicate chunks on StoreAPI level. Recommend chunk sorting for StoreAPI + Optimized iter chunk dedup. (thanos-io#2710) (thanos-io#2711)
  Allow using multiple memcached clients at the same time. (thanos-io#2648) (thanos-io#2698)
  Updated Prometheus as little as possible to include Isolation fix. (thanos-io#2697)
  Release fix attempt2. Fixed test job. (thanos-io#2650)
  Fixed promu build to build in compatible directory that crossbuild understands.
  Cut v0.13.0-rc.0 (thanos-io#2628)
This is actually quite a real cause of potential overlaps in a Thanos system, so we are fixing it before 0.13.
Fixes #2753
Thanks @gburek-fastly for all the pointers, they helped us to narrow this down 💪
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>