Compact: skip compaction for blocks with no samples#904
Compact: skip compaction for blocks with no samples#904bwplotka merged 9 commits intothanos-io:masterfrom
Conversation
devnev
left a comment
There was a problem hiding this comment.
IIUC, the case we were observing was essentially a corrupted block, where we expect data based on the block but see none in the chunks. As we were discussing, this probably shouldn't be silenced entirely, rather the error reporting should not affect the processing of independent blocks. @bwplotka ?
|
Wow, ok I think we have 2-3 separate cases here.
|
…ndle from bucket compactor
bwplotka
left a comment
There was a problem hiding this comment.
Generally OK, just some suggestions.
pkg/compact/compact.go
Outdated
| ) | ||
|
|
||
| var blockTooFreshSentinelError = errors.New("Block too fresh") | ||
| var emptyBlockSentinelULID = ulid.MustNew(123, nil) |
There was a problem hiding this comment.
This sentinel is for noMoreCompactionForGroup I guess? can we name it like this or something similar?
There was a problem hiding this comment.
OK so I think there is misformation here , I would prefer something opposite like noMoreCompactionForGroup instead of this? What do you think? OR even bool flag return argument to make it more explicit (:
There was a problem hiding this comment.
I used a separate bool flag as I think that's the clearest what's happening. The naming was a little awkward but hopefully it makes sense, let me know what you think :)
| // Compact plans and runs a single compaction against the group. The compacted result | ||
| // is uploaded into the bucket the blocks were retrieved from. | ||
| func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (ulid.ULID, error) { | ||
| func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (bool, ulid.ULID, error) { |
There was a problem hiding this comment.
I originally tried to remove the ulid.ULID but had to add it back in as we need it for compact_e2e_test.go (as written right now). In the case when the compactor actually does something, the test asserts that the new block was actually uploaded, and it needs the id for that.
I thought about changing the test, but it seemed worthwhile keeping the coverage for the upload stage.
There was a problem hiding this comment.
sure, let's not end in rabbit whole here.
bwplotka
left a comment
There was a problem hiding this comment.
LGTM, thanks.
One small nit, but we can fix it later on, not a blocker
| // Compact plans and runs a single compaction against the group. The compacted result | ||
| // is uploaded into the bucket the blocks were retrieved from. | ||
| func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (ulid.ULID, error) { | ||
| func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (bool, ulid.ULID, error) { |
There was a problem hiding this comment.
sure, let's not end in rabbit whole here.
| } | ||
|
|
||
| compID, err := cg.compact(ctx, subDir, comp) | ||
| shouldRerun, compID, err := cg.compact(ctx, subDir, comp) |
There was a problem hiding this comment.
I don't like this as this indicate that something wrong happen and we need to retry.. non empty ulid was actually more sensible maybe? Or at least we should change var name... Not a blocker.
* skip compaction for blocks with no samples * update to actually delete the empty input blocks, and to correctly handle from bucket compactor * warn on error deleting empty block * use ULID instead of error for emptyBlockSentinel * don't use a global variable * full stop at end of comment * use boolean to indicate whether there is more compaction work * rename variables * fix test
Changes
During the Thanos group compactor, in the case when the Prometheus compactor has found that the resulting block would have no data, delete the source blocks (after checking they are indeed empty) and exit early. Introduce a sentinel error to mark this case, to force the caller (bucket compactor) to continue compacting the rest of that group.
It is necessary to handle the empty compaction case separately because the later parts of the group compactor result in an error if we attempt to continue.
Note that the Prometheus compactor uses the empty ULID for the "block with no samples" case:
Verification
Ran the compactor against a bucket where some of the compacted blocks would end up empty.