receive, rule: Lock TSDB directories #2915
Conversation
|
For multi-tsdb this lgtm, but I'm not entirely sure about the ruler, why do we think we need this there? I think it would be ok to have configurable, so people can make this decision themselves, much like with Prometheus (this goes for both receive and ruler), as locking like this may already be handled by kubernetes for example. |
|
@brancz It was just premature optimization for Ruler, I can revert it. We can add a flag to control this behaviour. What should be the default? Prometheus uses lock file by default, should we as well? |
|
I believe creating the lock by default is correct, with the option to opt out. As long this is clear in the changelog, this is fine in my opinion. Would be good to be confirmed by @bwplotka though. |
bwplotka
left a comment
There was a problem hiding this comment.
Nice, Generally LGTM, just some suggestions 🤗
Thanks! 💪
| walCompression := cmd.Flag("tsdb.wal-compression", "Compress the tsdb WAL.").Default("true").Bool() | ||
| noLockFile := cmd.Flag("tsdb.no-lockfile", "Do not create lockfile in TSDB data directory.").Default("false").Bool() | ||
|
|
||
| ignoreBlockSize := cmd.Flag("shipper.ignore-unequal-block-size", "If true receive will not require min and max block size flags to be set to the same value. Only use this if you want to keep long retention and compaction enabled, as in the worst case it can result in ~2h data loss for your Thanos bucket storage.").Default("false").Hidden().Bool() |
There was a problem hiding this comment.
| ignoreBlockSize := cmd.Flag("shipper.ignore-unequal-block-size", "If true receive will not require min and max block size flags to be set to the same value. Only use this if you want to keep long retention and compaction enabled, as in the worst case it can result in ~2h data loss for your Thanos bucket storage.").Default("false").Hidden().Bool() | |
| ignoreBlockSize := cmd.Flag("shipper.ignore-unequal-block-size", "If true receive will not require min and max block size flags to be set to the same value. Only use this if you want to keep long retention and compaction enabled, as in the worst case it can result in ~2h data loss for your Thanos bucket storage.").Default("false").Hidden().Bool() |
Hm... this is actually something we can improve. Let's add TODO and ensure there is issue for this. We can leverage vertical compaciton just fine in this case.
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>
bwplotka
left a comment
There was a problem hiding this comment.
LGTM, but some tiny nits to improve, we can do them in next iter, feel free to merge (:
| } | ||
| return err | ||
| } | ||
| level.Info(logger).Log("msg", "a leftover lockfile found and removed") |
| if !fi.IsDir() { | ||
| continue | ||
| } | ||
| if err := os.Remove(filepath.Join(t.defaultTenantDataDir(fi.Name()), "lock")); err != nil { |
There was a problem hiding this comment.
I would reuse the removeLockfileIfAny, because... why not? (:
This PR attempts to prevent any concurrent access to TSDB directories on a single session.
Changes
TODO
Verification
make test-localmake test-e2e