Change KV storage proposal to guarantee progress #3091
@arielshaqed I think we should add it to the proposal content -- adding more people to the review for their input.
Great proposal 💯 It will allow us to serve concurrent committers better, like in the scenario of a Kafka streaming integration. I do think that we shouldn't support it for the first phase of the KV implementation.
Thanks! I agree that we should not block KV on this (the existing implementation most likely does even worse under load, see #2405). BUT: note that when load is sufficient for livelock, it will happen all at once! Ideally we should add monitoring to the code, e.g. a histogram of time-to-commit and/or number of failures, as well as an estimate of the number of concurrent commits (can be done by bumping atomic counters at entry to and exit from commit operations). If commit is implemented to retry, then time-to-commit is a great metric; otherwise we might be able to do without it. And please also load-test this, a lot :-/
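A minimal sketch of the monitoring suggested above, in Go (lakeFS's language), assuming Prometheus-style metrics; the metric names and the instrumented commit hook are hypothetical, not the actual lakeFS code:

```go
// commit_metrics.go: a sketch of the monitoring suggested above.
// Metric names and instrumentedCommit are hypothetical.
package sketch

import (
	"sync/atomic"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// concurrentCommits is an atomic gauge of in-flight commits,
// bumped at entry to and exit from commit operations.
var concurrentCommits int64

var (
	timeToCommit = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "commit_duration_seconds", // hypothetical metric name
		Help: "Time to complete a commit, including retries.",
	})
	concurrentGauge = prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{
			Name: "concurrent_commits", // hypothetical metric name
			Help: "Commits currently in flight.",
		},
		func() float64 { return float64(atomic.LoadInt64(&concurrentCommits)) },
	)
)

func init() {
	prometheus.MustRegister(timeToCommit, concurrentGauge)
}

// instrumentedCommit wraps a commit operation with both metrics.
func instrumentedCommit(commit func() error) error {
	atomic.AddInt64(&concurrentCommits, 1)
	defer atomic.AddInt64(&concurrentCommits, -1)

	start := time.Now()
	err := commit()
	timeToCommit.Observe(time.Since(start).Seconds())
	return err
}
```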
arielshaqed left a comment:
Thanks! Pulling, glad to have this as part of the proposal :-)
Force-merging because the failure is in an unreported, unrelated test -- this PR only changes docs, and the test tests code.
Here's a similar liveness issue in Iceberg that is still open. So this is not a theoretical issue :-/
This handles these comments on the original design review proposal:
- Proposal for running lakeFS on top of a key/value store #2466 (discussion_r713713932)
- Proposal for running lakeFS on top of a key/value store #2466 (discussion_r725608233) -- this one actually goes into more detail than the current proposal; reviewers, please let me know if you prefer the phrasing there!
Lock freedom 🔓 for liveness
Commits need to be lock-free.
Why is this important? Consider the current proposal running on a loaded system where commits are not lock-free (i.e., it is possible for no commits to progress for an arbitrarily long period of time). This can happen when ingesting data automatically, say with a Kafka streaming integration. For instance, suppose we have a commit every second. For some reason the system introduces a delay of 10 seconds (maybe the KV store went down for a while), and now we have 10 commits queued. They all race to finish their commits (by writing the old staging token with a shorter new sealed-tokens queue) against incoming new commits (which try to write a new staging token and push a longer sealed-tokens queue). Now there is no progress guarantee: we can have more and more new commits admitted. Request timeouts don't really help here, because new commit requests arrive at the same rate at which timeouts occur. So this is at best a steady state -- and commits finish by a random walk, i.e. at best we recover in time quadratic in the number of pending commits.
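To make the race concrete, here is a minimal sketch of the contended update, assuming a versioned SetIf-style conditional put on the KV store; the types and names are illustrative, not the actual lakeFS API:

```go
// livelock_sketch.go: illustrates the race described above.
// The KV interface and branch record layout are illustrative
// assumptions, not the actual lakeFS API.
package sketch

import (
	"context"
	"errors"
)

var ErrPredicateFailed = errors.New("predicate failed")

// Branch is a simplified branch record: a staging token plus a queue
// of sealed tokens still waiting to be committed.
type Branch struct {
	StagingToken string
	SealedTokens []string
}

type KV interface {
	Get(ctx context.Context, key string) (Branch, int64, error) // value + version
	SetIf(ctx context.Context, key string, val Branch, version int64) error
}

// finishCommit tries to write the branch back with a *shorter*
// sealed-tokens queue. startCommit (not shown) races with it, pushing
// a *longer* queue. Under load each SetIf can keep failing against
// the other writer, so no individual commit is guaranteed to make
// progress: a livelock.
func finishCommit(ctx context.Context, kv KV, branch, committed string) error {
	for {
		b, ver, err := kv.Get(ctx, branch)
		if err != nil {
			return err
		}
		shorter := make([]string, 0, len(b.SealedTokens))
		for _, t := range b.SealedTokens {
			if t != committed {
				shorter = append(shorter, t)
			}
		}
		b.SealedTokens = shorter
		err = kv.SetIf(ctx, branch, b, ver)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrPredicateFailed) {
			return err
		}
		// Lost the race; retry with no progress guarantee.
	}
}
```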
We can only talk about lock-free (and wait-free) operations for
constant-time operations, so let's separate the "write metarange" operation
(which is trivially concurrent but takes time linear in the number of
changes) from the "update commit" operation (which can run in constant
time). Also, we cannot assume that the KV store itself is lock-free, so for
our purposes we either assume that it is, or count time in the number of
operations on the KV store. (This is the same as in any discussion of
nonblocking operations, which invariably assumes the memory is lock-free
even though it probably is not...)
Becoming lock-free
With the process above, commits are still not lock-free. But we can make them so! The key is to use sealed_tokens as a "store" of pending commits. Every commit tries to commit all sealed_tokens that it sees, so in particular it helps previous commits terminate! (In those cases we can choose either to re-order the older commits -- keeping them empty -- or to fail them; this is a business (product) logic decision that has nothing to do with correctness!)
Now when trying (or possibly retrying) to commit, just add the desired commit record to the bag for that branch, and periodically probe to see whether the commit succeeded. Meanwhile, every start-commit and every finish-commit probe first repeatedly runs GetAny and tries to commit, until the bag is empty.
Now, if there is a pool of pending commits, then after all start-commits finish, every committer (including new committers) is trying to end a commit! The only way a committer is forced to retry is that another committer succeeded, so we continue to make progress until the bag is (momentarily) empty.
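A minimal sketch of the helping scheme, assuming a per-branch bag with Add/GetAny/Remove operations; all names are illustrative, not the actual lakeFS API:

```go
// helping_sketch.go: the lock-free "helping" scheme described above.
// Bag, GetAny, and the apply callback are illustrative assumptions.
package sketch

import (
	"context"
	"errors"
)

var ErrEmpty = errors.New("bag is empty")

// PendingCommit is a commit record parked in the per-branch bag.
type PendingCommit struct {
	ID           string
	StagingToken string
}

// Bag holds the pending commits of one branch.
type Bag interface {
	Add(ctx context.Context, c PendingCommit) error
	GetAny(ctx context.Context) (PendingCommit, error) // ErrEmpty when empty
	Remove(ctx context.Context, id string) error
}

// helpCommits repeatedly picks *any* pending commit and applies it,
// until the bag is momentarily empty. Every committer runs this, so a
// committer can only be forced to retry because some other commit
// succeeded: the system as a whole always makes progress (lock-free).
// apply must be idempotent, since several helpers may race to apply
// the same pending commit.
func helpCommits(ctx context.Context, bag Bag, apply func(PendingCommit) error) error {
	for {
		c, err := bag.GetAny(ctx)
		if errors.Is(err, ErrEmpty) {
			return nil
		}
		if err != nil {
			return err
		}
		if err := apply(c); err != nil {
			return err
		}
		if err := bag.Remove(ctx, c.ID); err != nil {
			return err
		}
	}
}

// commit parks a record in the bag, then helps until its own commit
// (and everyone else's) is done.
func commit(ctx context.Context, bag Bag, c PendingCommit, apply func(PendingCommit) error) error {
	if err := bag.Add(ctx, c); err != nil {
		return err
	}
	return helpCommits(ctx, bag, apply)
}
```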
Becoming wait-free
Wait-freedom is (much) harder. I think we could "just" use a queue as the bag above... but that is quite a powerful operation to expect our KV store to have, so it involves either implementing a lock-free queue (algorithms exist) or using a KV store that has one (Redis).
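Extending the sketch above, this is roughly the interface a queue-backed bag would need; a Redis list (LPUSH / LRANGE / LREM) is one possible backing store, assumed here rather than prescribed:

```go
// A FIFO queue in place of the unordered bag gives wait-freedom:
// helpers always apply the oldest pending commit first, so each commit
// waits for at most the bounded number of commits that were ahead of
// it when it was enqueued.
type Queue interface {
	Enqueue(ctx context.Context, c PendingCommit) error
	PeekOldest(ctx context.Context) (PendingCommit, error) // ErrEmpty when empty
	Remove(ctx context.Context, id string) error
}
```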
This is what the proposed implementation will look like!