Make SafeDelete faster for large blocks; TSDB block repair case#1056
Make SafeDelete faster for large blocks; TSDB block repair case#1056bwplotka merged 29 commits intothanos-io:masterfrom
Conversation
When we have label sets that are not in the correct order, fixing that changes the order of the series in the index. So the index must be rewritten in that new order. This makes this repair tool take up a bunch more memory, but produces blocks that verify correctly.
The directory name must be the block ID name exactly to verify. A temp directory or random name will not work here.
Pointer/reference logic error was eliminating all chunks for a series in a given TSDB block that wasn't the first chunk. Chunks are now referenced correctly via pointers.
If we have just downloaded a TSDB block for repair, we create a brand new block that is fixed. The original block can be used as the data we upload in SafeDelete() to save time/bandwidth rather than re-download it from the storage bucket.
Some linters catch errors.Errorf() as its not really part of the errors package.
Where duplicate series means two series with an identical label set. This code just drops any duplicate series that is encountered after the first. No merging or anything fancy.
We're comparing items by pointers, using Go's range variables is misleading here and we need not fall into the same trap.
This prevents us from having to re-implement label sorting.
54b0722 to
8225c79
Compare
8225c79 to
0a01898
Compare
pkg/block/index.go
Outdated
| if err := metadata.Write(logger, resdir, &resmeta); err != nil { | ||
| return resid, err | ||
| } | ||
| // TSDB may rewrite metadata in bdir |
There was a problem hiding this comment.
The TSDB code handily rewrites the block metadata even though we don't write or change read block here. This nukes the extra metadata Thanos uses. Write it back.
There was a problem hiding this comment.
So this is also a bug that you are trying to fix? Perhaps we should add it to the changelog and/or the PR messsage.
There was a problem hiding this comment.
Not a bug, this is part of the initial patches to be able to reuse downloaded blocks. The TSDB library isn't aware of the additional meta data Thanos provides and was removing it. Therefore, when a previously downloaded block is reused it would have incomplete metadata without this change.
There was a problem hiding this comment.
So this is being changed now: prometheus-junkyard/tsdb#637 so this won't be needed TBH (once we use latest TSDB).
Also this is true what @GiedriusS said - splitting PRs into focused changes helps a lot. However this change relies on this fix. I would be more explicit in description. This will help us to understand this early on (:
There was a problem hiding this comment.
Let's add TODO to update TSDB and remove this as it will be not needed in newest TSDB. For now we need this, thanks for fixing this 👍
There was a problem hiding this comment.
Todo added, thanks!
| lastSet := labels.Labels{} | ||
| // Build a new TSDB block. | ||
| for _, s := range series { | ||
| if labels.Compare(lastSet, s.lset) == 0 { |
There was a problem hiding this comment.
The series are sorted based on their labelSet. What happens if two labelSets are identical? For one they will be consecutive in the slice. So if the labelSet we have now is the same as the previous labelSet we have a duplicate series which the TSDB code will produce an error on. So we just drop that series.
There was a problem hiding this comment.
Let's add a comment on the implied behavior here. I believe you when you say you've seen this behavior, but it worries me, as something went terribly wrong wherever these blocks were produced. We should at least document that the heuristic here is that on collision we keep the first series found and drop consecutive ones.
There was a problem hiding this comment.
Sorry one more thing. As this shouldn't generally happen, let's add a metric so we can monitor when it does happen.
There was a problem hiding this comment.
The repair process batch job only which makes metrics less useful. I had some similar comments in #964
Perhaps I should leave a comment if that were to change?
There was a problem hiding this comment.
Yes let's in that case put a TODO in there
pkg/verifier/safe_delete.go
Outdated
| tempdir, err := ioutil.TempDir("", "safe-delete") | ||
| if err != nil { | ||
| return err | ||
| if tempdir == "" { |
There was a problem hiding this comment.
If we already have the block, don't download it again. Blocks can be large. 30GiB or close to it in my environment.
|
I've tested this several times on my own TSDB blocks. I've been wondering how to get better tests integrated into the code as unit tests, but I cannot make these TSDB blocks of mine public. But I've repaired several hundred GiB of data with this. |
pkg/verifier/safe_delete.go
Outdated
| // source bucket. It returns error if block dir already exists in backup | ||
| // bucket (blocks should be immutable) or any of the operation fails. If | ||
| // the block has already been downloaded it must exist in tempdir, else it | ||
| // will be downloaded to be copied to the backup bucket. tempdir set to the |
There was a problem hiding this comment.
I'd prefer not to hide this behavior in tempdir values. Maybe "safe delete modes" (as in "force download", "try local copy") or a bool (as in just "forceDownload") at the very least that makes it very explicit what's going to happen.
There was a problem hiding this comment.
SafeDelete() doesn't know the temporary directory where a previous function may have downloaded the block, so we need to pass in a path somehow. Perhaps rename this argument something like useExisting? Which would mean we use the existing block if found under that path, otherwise a new temporary directory is created and the block is downloaded.
There was a problem hiding this comment.
sorry I meant let's pass two parameters to explicitly describe the intention
There was a problem hiding this comment.
Ok, committed a patch for makes this explicit. I've tested the repair process with my collection of broken blocks I've started keeping and had the behavior expected. (No repeated downloads.)
There was a problem hiding this comment.
Had thanos-compact cut a 450GiB TSDB block today which I referenced in #1152 which is part of the motivation for this.
pkg/verifier/safe_delete.go
Outdated
|
|
||
| if _, err := os.Stat(dir); err != nil { | ||
| // TSDB block not already present, download it | ||
| err = os.Mkdir(dir, 0755) |
There was a problem hiding this comment.
this should only be done if os.IsNotExist(err)
pkg/verifier/safe_delete.go
Outdated
| return errors.Wrap(err, "download from source") | ||
| } | ||
| } else { | ||
| level.Info(logger).Log("msg", "using previously downloaded tsdb block", |
There was a problem hiding this comment.
In what kind of situation would we end up in this path? And if the answer is a (half or maybe even fully) finished previous download, then how do we prevent that the directory is there but the content is not?
There was a problem hiding this comment.
My blocks (and I'm sure others) easily grow above 30GiB when compacted. So if I need to repair block A, Thanos will download block A, create a repaired block B, upload block B. So at this point, we've created, successfully, a new block from A and that block B now passes the VerifyIndex() test.
At this point we call SafeDelete() which downloads block A, uploads block A to the backup bucket, then deletes block A from the main bucket.
This process can be time consuming (and perhaps expensive). So if blocks are immutable, and the repair process verifies that A is a legitimate block (if malformed) we should be able to skip the redundant download of A and backup the existing downloaded copy.
| lastSet := labels.Labels{} | ||
| // Build a new TSDB block. | ||
| for _, s := range series { | ||
| if labels.Compare(lastSet, s.lset) == 0 { |
There was a problem hiding this comment.
Let's add a comment on the implied behavior here. I believe you when you say you've seen this behavior, but it worries me, as something went terribly wrong wherever these blocks were produced. We should at least document that the heuristic here is that on collision we keep the first series found and drop consecutive ones.
pkg/verifier/safe_delete.go
Outdated
| if err := os.RemoveAll(tempdir); err != nil { | ||
| level.Warn(logger).Log("msg", "failed to delete dir", "dir", tempdir, "err", err) | ||
|
|
||
| if _, err := os.Stat(dir); os.IsNotExist(err) { |
There was a problem hiding this comment.
we're swallowing any error that is not "IsNotExist" here
There was a problem hiding this comment.
I think we should just do here os.Mkdir and ignore the error if it is saying that the directory already exists. Otherwise, we will have racy behaviour here (of course, there's barely any chance that someone will run two of these at the same time but still)
There was a problem hiding this comment.
I added some additional logic here to handle any other possible error from os.Stat().
The pre-existence of the directory triggers the code path where we can avoid having to re-download a possibly large TSDB block again. Races with the os.Mkdir() are not anticipated as if the temporary directory isn't supplied ioutil.TempDir()is used to generate a value. When we do supply tempdir in pkg/verifier/index_issue.go is it also generated with ioutil.Tempdir():
tmpdir, err := ioutil.TempDir("", fmt.Sprintf("index-issue-block-%s-", id))
When we want to SafeDelete an already downloaded TSDB block, be very explicit about that use case.
pkg/verifier/safe_delete.go
Outdated
| // previously downloaded block found in tempdir to avoid repeated downloads. | ||
| // If tempdir is an empty string a unique temporary directory is created for | ||
| // scratch space, otherwise the given directory is used. | ||
| func SafeDelete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, backupBkt objstore.Bucket, id ulid.ULID, useExisting bool, tempdir string) error { |
There was a problem hiding this comment.
sorry for nitpicking on this, but can we name this useExistingTempdir? I realize that's long, but we'll understand in 3 months what this means that way :)
There was a problem hiding this comment.
Not a problem. Fixed.
pkg/verifier/safe_delete.go
Outdated
| if err := block.Download(ctx, logger, bkt, id, dir); err != nil { | ||
| return errors.Wrap(err, "download from source") | ||
| if useExisting { | ||
| if _, err := os.Stat(filepath.Join(dir, "meta.json")); err != nil { |
There was a problem hiding this comment.
Is this check really sufficient? Shouldn't we check that all the downloaded content is equal to what is in object storage, ideally by their hash?
There was a problem hiding this comment.
There don't appear to be tools for this in the existing code base. So this seems like a bit of scope creep. Glad to add additional checks here, but these features sound like another GitHub Issue/PR.
In fact, why isn't there a checksum in meta.json? It doesn't have to be cryptographically secure, but it would serve to verify that the block was intact. As we move from "metric data is ephemeral" to "metric data is important long term" that would be quite handy.
pkg/verifier/safe_delete.go
Outdated
| tempdir, err := ioutil.TempDir("", "safe-delete") | ||
| if err != nil { | ||
| return err | ||
| if tempdir == "" { |
There was a problem hiding this comment.
Maybe I wasn't clear, I would expect that when useExisting is set, that tempDir must be set, so I would expect an error from this function if that is not the case, and this if statement to be combined with the parts below, I think that'll also make the code a lot more easy to understand.
There was a problem hiding this comment.
Ok, refactored this a bit. It will return an error if you ask it to use an existing tempdir and do not provide one, and the logic is a little simpler.
Be extra explicit about using existing tempdirs. Added note about future improvements for verifying if we have an intact tsdb block.
|
cluster_test.go:129 fails on me again... |
|
gossip has been removed, if you rebase I think those flakes should go away :) |
|
Sorry for the silence, I'm working off the backlog that accumulated over kubecon, once rebased and CI is green I'm happy to give this another pass |
|
No problem. I know there are a lot of PRs as well. Does rebasing help you move this forward? |
|
The test that kept flaking is gone due to the rebase, that's all :) |
|
Updates on getting this merged? |
brancz
left a comment
There was a problem hiding this comment.
lgtm, but as this is easy to miss something I'd be good if we can get a second approval from @bwplotka or @GiedriusS
pkg/block/index.go
Outdated
| if labels.Compare(lastSet, s.lset) == 0 { | ||
| level.Warn(logger).Log("msg", | ||
| "dropping duplicate series in tsdb block found", | ||
| "labelset", fmt.Sprintf("%s", s.lset), |
There was a problem hiding this comment.
You should use s.lset.String() here. The new meta linter will complain about this 😄
pkg/block/index.go
Outdated
| if err := metadata.Write(logger, resdir, &resmeta); err != nil { | ||
| return resid, err | ||
| } | ||
| // TSDB may rewrite metadata in bdir |
There was a problem hiding this comment.
So this is also a bug that you are trying to fix? Perhaps we should add it to the changelog and/or the PR messsage.
pkg/verifier/safe_delete.go
Outdated
| // previously downloaded block found in tempdir to avoid repeated downloads. | ||
| // If tempdir is an empty string a unique temporary directory is created for | ||
| // scratch space, otherwise the given directory is used. | ||
| func SafeDelete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, backupBkt objstore.Bucket, id ulid.ULID, useExistingTempdir bool, tempdir string) error { |
There was a problem hiding this comment.
Couldn't we just get rid of useExistingTempdir here and use tempdir to control how this function works? It seems like we are making this unnecessarily complex. Perhaps I am missing something.
There was a problem hiding this comment.
I felt like giving a particular value of tempdir special meaning was not obvious enough. I think I prefer this over tempdir being empty. I do agree that below the if doesn't need the tempdir == "" case.
There was a problem hiding this comment.
If you feel strongly about this @GiedriusS I'm ok with the other behavior, just feel I will have forgotten the implicit behavior of the empty string in 3 months, hence I wanted it to be more explicit.
There was a problem hiding this comment.
Hm, I feel strongly about this - boolean in random place of the functions IMO smells bad. Same with 2 flags controlling same thing in very strict way.Essentially it does not make sense to set useExistingTempdir = true and tempdir = "" so we need to validate that etc. Interface is broader then needed.
Also if userExistingTempdir = true and tempdir is specified you literally don't use bkt so it's even worse in terms of readability and surprises ):
I would propose:
BackupAndDelete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, backupBkt objstore.Bucket, id ulid.ULID)as it was.BackupAndDeleteFromLocal(ctx context.Context, logger log.Logger, bdir string, backupBkt objstore.Bucket)
This will ensure separation of concerns AND in fact BackupAndDelete can use BackupAndDeleteFromLocal internally. Right now what I can see is 2 function baked in one function (:
pkg/verifier/safe_delete.go
Outdated
| err = os.Mkdir(dir, 0755) | ||
| if err != nil { | ||
| return err | ||
| if useExistingTempdir && tempdir == "" { |
There was a problem hiding this comment.
It seems like they both can be set or unset which proves my point.
There was a problem hiding this comment.
I touched up the comments for this function again so that they better describe the behavior of the current code. No code changes though.
There was a problem hiding this comment.
I proposed different approach - splitting this into 2 function which will solve this issue (:
There was a problem hiding this comment.
This is now committed. The function signatures didn't quite work out as we would have liked, but I totally aggree, this feels much, much nicer.
The tempdir value is ignored if useExistingTempdir is false.
|
Updates? |
bwplotka
left a comment
There was a problem hiding this comment.
@jjneely Thanks for this!
I think it makes a lot of sense, but propose a split in SafeDelete to ensure readability and address @GiedriusS concerns (: Let me know if that makes sense.
Otherwise LGTM!
pkg/block/index.go
Outdated
| if err := metadata.Write(logger, resdir, &resmeta); err != nil { | ||
| return resid, err | ||
| } | ||
| // TSDB may rewrite metadata in bdir |
There was a problem hiding this comment.
So this is being changed now: prometheus-junkyard/tsdb#637 so this won't be needed TBH (once we use latest TSDB).
Also this is true what @GiedriusS said - splitting PRs into focused changes helps a lot. However this change relies on this fix. I would be more explicit in description. This will help us to understand this early on (:
pkg/block/index.go
Outdated
| if err := metadata.Write(logger, resdir, &resmeta); err != nil { | ||
| return resid, err | ||
| } | ||
| // TSDB may rewrite metadata in bdir |
There was a problem hiding this comment.
Also missing trailing period in comment.
| lastSet := labels.Labels{} | ||
| // Build a new TSDB block. | ||
| for _, s := range series { | ||
| // The TSDB library will throw an error if we add a series with |
There was a problem hiding this comment.
Can we remove double spaces in this comment?
There was a problem hiding this comment.
Old habits die hard...fixed.
There was a problem hiding this comment.
Can I ask where is this habit from? Sorry for my ignorance - super curious (:
There was a problem hiding this comment.
Lol... Search google for "double space after period" you'll learn more than you bargained for. Like, my age. :-)
There was a problem hiding this comment.
TIL never heard of this either. But actually the rule could apply here since code formatting is using monospaced fonts mostly :)
pkg/block/index.go
Outdated
| if err := metadata.Write(logger, resdir, &resmeta); err != nil { | ||
| return resid, err | ||
| } | ||
| // TSDB may rewrite metadata in bdir |
There was a problem hiding this comment.
Let's add TODO to update TSDB and remove this as it will be not needed in newest TSDB. For now we need this, thanks for fixing this 👍
pkg/verifier/safe_delete.go
Outdated
| // previously downloaded block found in tempdir to avoid repeated downloads. | ||
| // If tempdir is an empty string a unique temporary directory is created for | ||
| // scratch space, otherwise the given directory is used. | ||
| func SafeDelete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, backupBkt objstore.Bucket, id ulid.ULID, useExistingTempdir bool, tempdir string) error { |
There was a problem hiding this comment.
Hm, I feel strongly about this - boolean in random place of the functions IMO smells bad. Same with 2 flags controlling same thing in very strict way.Essentially it does not make sense to set useExistingTempdir = true and tempdir = "" so we need to validate that etc. Interface is broader then needed.
Also if userExistingTempdir = true and tempdir is specified you literally don't use bkt so it's even worse in terms of readability and surprises ):
I would propose:
BackupAndDelete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, backupBkt objstore.Bucket, id ulid.ULID)as it was.BackupAndDeleteFromLocal(ctx context.Context, logger log.Logger, bdir string, backupBkt objstore.Bucket)
This will ensure separation of concerns AND in fact BackupAndDelete can use BackupAndDeleteFromLocal internally. Right now what I can see is 2 function baked in one function (:
pkg/verifier/safe_delete.go
Outdated
| err = os.Mkdir(dir, 0755) | ||
| if err != nil { | ||
| return err | ||
| if useExistingTempdir && tempdir == "" { |
There was a problem hiding this comment.
I proposed different approach - splitting this into 2 function which will solve this issue (:
|
And sorry for massive delay in review! |
Co-Authored-By: Bartek Płotka <bwplotka@gmail.com>
Co-Authored-By: Bartek Płotka <bwplotka@gmail.com>
Co-Authored-By: Bartek Płotka <bwplotka@gmail.com>
bwplotka
left a comment
There was a problem hiding this comment.
Happy to put LGTM once we address the https://github.com/improbable-eng/thanos/pull/1056/files#r296908203 (:
pkg/block/index.go
Outdated
| return resid, err | ||
| } | ||
| // TSDB may rewrite metadata in bdir. | ||
| // TODO: This is not needed in newer TSDB code. |
There was a problem hiding this comment.
PR number would be nice for easier trackngm but not a blocker (:
| lastSet := labels.Labels{} | ||
| // Build a new TSDB block. | ||
| for _, s := range series { | ||
| // The TSDB library will throw an error if we add a series with |
There was a problem hiding this comment.
Can I ask where is this habit from? Sorry for my ignorance - super curious (:
bwplotka
left a comment
There was a problem hiding this comment.
One more nit and LGTM, thanks for this!
pkg/verifier/safe_delete.go
Outdated
| return errors.Errorf("%s dir seems to exists in backup bucket. Remove this block manually if you are sure it is safe to do", id) | ||
| } | ||
|
|
||
| // Now we call backupAndDeleteDownloaded() to handle the work. |
There was a problem hiding this comment.
I think this comment is not needed.
pkg/verifier/safe_delete.go
Outdated
| // backupAndDeleteDownloaded is a helper function that uploads a TSDB block | ||
| // found on disk to the backup bucket and, on success, removes the TSDB block | ||
| // from the source bucket. | ||
| func backupAndDeleteDownloaded(ctx context.Context, logger log.Logger, bdir string, bkt, backupBkt objstore.Bucket, id ulid.ULID) error { |
There was a problem hiding this comment.
Sorry for being picky but there is some improvement we can make here.
I like BackupAndDelete vs BackupAndDeleteDownloaded, but it's not clear what's the difference between backupAndDeleteDownloaded and BackupAndDeleteDownloaded is. Sure they are heavily commented, but comment should have less priority than naming. I think here the proper naming and small possible signature of function would do the work.
So what about:
| func backupAndDeleteDownloaded(ctx context.Context, logger log.Logger, bdir string, bkt, backupBkt objstore.Bucket, id ulid.ULID) error { | |
| func backupDownloaded(ctx context.Context, logger log.Logger, bdir string, bktToRemove, backupBkt objstore.Bucket, id ulid.ULID) error { |
and then if no error do on the caller in both BackupAndDeleteDownloaded and BackupAndDelete just.
// Block uploaded, so we are ok to remove from src bucket.
if err := block.Delete(ctx, bkt, id); err != nil {
return errors.Wrap(err, "delete from source")
}
This will make the signature of BackupAndDeleteDownloaded and backupDowloaded smaller (thus reducing complexity) as you can drop bkt argument.
What do you think? (:
There was a problem hiding this comment.
Adjusted as requested, and went through another round of testing.
|
I think netlify issue is because of not rebased PR - nothing changed for netlify here so merging. |
|
Awesome, thanks! |
This makes SafeDelete faster and less error prone when dealing with large blocks. This prevents SafeDelete from re-downloading a block that was just downloaded.
This PR also addresses a TSDB block repair case when duplicate series are detected.
Changes
Verification
I run these patches in production, and have repaired quite a few hundred gig of blocks with these.