Skip to content

shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2#2765

Merged
bwplotka merged 2 commits intorelease-0.13from
fix2753
Jun 15, 2020
Merged

shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2#2765
bwplotka merged 2 commits intorelease-0.13from
fix2753

Conversation

@bwplotka
Copy link
Member

This is actually a quite real case for potential overlaps in Thanos system, so fixing before 0.13.

Fixes #2753

Thanks @gburek-fastly for all pointers, it helped us to narrow this down 💪

Signed-off-by: Bartlomiej Plotka bwplotka@gmail.com

Copy link
Member

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just wondering though .. what's the use case of not uploading oldest to newest? do we really need a flag?

@bwplotka
Copy link
Member Author

I mentioned this in flag help. Why I created an option to still enable it:

  • In case if you don't have the persistent disk you kind of wants to upload as soon as possible. Also this allows you to have block with malformed metadata and it will be just ignored if you set this flag to true (well ignored + error log, metric). Also with upload-compacted you might want best effort upload as soon as possible.
  • For all those releases we were allowing out of order uploads, so someone might have rely on it (?)

Those arguments are not strong I agree, I am happy to reconsider this... Maybe moving this flag to Hidden for now?

Also technically we should do rc.3 with this not full release. Thoughts?

Copy link
Member

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good! Just some language nits

CHANGELOG.md Outdated
- [#2416](https://github.com/thanos-io/thanos/pull/2416) Bucket: Fixed issue #2416 bug in `inspect --sort-by` doesn't work correctly in all cases.
- [#2719](https://github.com/thanos-io/thanos/pull/2719) Query: `irate` and `resets` use now counter downsampling aggregations.
- [#2705](https://github.com/thanos-io/thanos/pull/2705) minio-go: Added support for `af-south-1` and `eu-south-1` regions.
- [#2753](https://github.com/thanos-io/thanos/issues/2753) Sidecar,Receive,Rule: Fixed cause for compactor overlapping blocks in upload error cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit on comment formatting: we should have spaces between component names

walCompression := cmd.Flag("tsdb.wal-compression", "Compress the tsdb WAL.").Default("true").Bool()

allowOutOfOrderUpload := cmd.Flag("shipper.allow-out-of-order-uploads",
"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep these as full sentences to make it more readable.

Suggested change
"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
"If true, shipper will skip failed block uploads in the given iteration and retry later. This means that some newer blocks might be uploaded sooner than older blocks."+


allowOutOfOrderUpload := cmd.Flag("shipper.allow-out-of-order-uploads",
"If true shipper will skip failed block uploads in given iteration and retry later. This means that some newer blocks might uploaded sooner than older."+
"This will trigger compaction without those blocks, as a resulted create 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"This will trigger compaction without those blocks, as a resulted create 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+
"This will trigger compaction without those blocks and as a result will create a 'valid overlap situation'. Set it to true if you have vertical compaction enabled and wish to upload blocks as soon as possible without caring"+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for clarification, what is a 'valid overlap situation'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to just an overlap situation


### Reasons

- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there was possibility to have overlaps caused by compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there was possibility to have overlaps caused by compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)
- You are running Thanos (sidecar, ruler or receive) older than 0.13.0. During transient upload errors there is a possibility to have overlaps caused by the compactor not being aware of all blocks See: [this](https://github.com/thanos-io/thanos/issues/2753)


if err := s.upload(ctx, m); err != nil {
level.Error(s.logger).Log("msg", "shipping failed", "block", m.ULID, "err", err)
if !s.allowOutOfOrderUploads {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

var metas []*metadata.Meta
// blockMetasFromOldest returns the block meta of each block found in dir
// sorted by minTime asc.
func (s *Shipper) blockMetasFromOldest() (metas []*metadata.Meta, _ error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I think this is a good simplification

@brancz
Copy link
Member

brancz commented Jun 15, 2020

Also technically we should do rc.3 with this not full release. Thoughts?

Yes I think rc.3 is the right thing to do here.

I think a hidden flag for now sounds good.

bwplotka added 2 commits June 15, 2020 15:26
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@bwplotka bwplotka changed the title shipper: Be strict about upload order unless it's specified so. shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2 Jun 15, 2020
@bwplotka bwplotka merged commit 3bf1397 into release-0.13 Jun 15, 2020
@bwplotka bwplotka deleted the fix2753 branch June 15, 2020 14:53
paulfantom added a commit to paulfantom/thanos that referenced this pull request Jul 8, 2020
openshift/master

* upstream/release-0.13:
  Cut release v0.13.0
  shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2 (thanos-io#2765)
  Cut 0.13.0 release. (thanos-io#2762)
  Cut release 0.13.0-rc.1 (thanos-io#2720)
  Store: `irate` and `resets` use now counter downsampling aggregations. (thanos-io#2719)
  deps: Updated minio-go dependency to v6.0.56 to add two region endpoints (thanos-io#2705) (thanos-io#2718)
  store/proxy: Deduplicate chunks on StoreAPI level. Recommend chunk sorting for StoreAPI + Optimized iter chunk dedup. (thanos-io#2710) (thanos-io#2711)
  Allow using multiple memcached clients at the same time. (thanos-io#2648) (thanos-io#2698)
  Updated Prometheus as little as possible to include Isolation fix. (thanos-io#2697)
  Release fix attempt2.
  Fixed test job. (thanos-io#2650)
  Fixed promu build to build in compatible directory that crossbuild understands.
  Cut v0.13.0-rc.0 (thanos-io#2628)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants