Skip to content

Conversation

@arielshaqed
Copy link
Contributor

This handles these comments on the original design review proposal:

Lock freedom 🔓 for liveness

Commits need to be lock-free .

Why is this important? Consider the current proposal running on a loaded system where commits are not
lock-free (i.e., it is possible for no commits to progress for an
arbitrarily long period of time). This can happen when ingesting data
automatically, say for Kafka streaming integration. For instance, let's say
we have a commit every second. For some reason the system introduces a
delay of 10 seconds (maybe the KV store went down for a while), and now we
have 10 commits queued. They all race to finish the commit (by writing the
old staging token with a shorter new sealed tokens queue) with incoming new
commits (which try to write a new staging token and push a longer sealed
tokens queue). Now there is no progress guarantee, we can have more and
more new commits admitted. Request timeouts don't really help here, because
new commit requests arrive at the same rate at which timeouts occur. So
this is at best a steady-state -- and commits finish by a random walk,
i.e. at best we recover in time quadratic in the number of pending
commits.

We can only talk about lock-free (and wait-free operations) for
constant-time operations, so let's separate the "write metarange" operation
(which is trivially concurrent but take time linear in the number of
changes) from the "update commit" (which can run in constant time). Also we
cannot assume that the KV store is lock-free, so for our purposes we assume
either that it is or that we count time in number of operations on the KV.
(This is the same as any discussion of nonblocking operations, which
invariably assume the memory is lock-free even though it probably is
not...).

Becoming lock-free

With the process above, commits are still not lock-free. But we can make them so! The key is to use sealed_tokens as a "store" of pending commits. Every commit tries to commit all sealed_tokens that it sees, so in particular it helps previous commits terminate! (In those cases we can chose either to re-order the older commits -- keeping them empty -- or fail them; this is a business (product) logic decision that has nothing to do with correctness!)

Now when trying (or possibly retrying) to commit, just add the desired commit record to the bag for that branch, and periodically probe to see if the commit succeeded. Meanwhile, every start commit and every finish-commit probe first repeatedly runs GetAny and tries to commit, until the bag is empty.

Now, if there is a pool of pending commits, then after all start-commits finish every committer (including new committers) is trying to end a commit! The only way a committer has to retry is that another committer succeeded, so we continue to make progress until the bag is (momentarily) empty.

Becoming wait-free

Wait-freedom is (much) harder. I think we could "just" use a queue as a bag above... but that is quite a powerful operation to expect our KV store to have, so it involves either implementing a lock-free queue (algorithms exist) or using a KV that has one (Redis).

This is what the proposed implementation will look like!

@arielshaqed arielshaqed requested a review from nopcoder March 22, 2022 10:14
This handles these comments on the original design review proposal:

* #2466#discussion_r713713932

* #2466#discussion_r725608233 (this one actually goes into _more_ detail
  than the current proposal -- reviewers please let me know if you prefer
  the phrasing there!)
@arielshaqed arielshaqed force-pushed the docs/propose-lockfree-staging branch from 76387ad to d2311d7 Compare March 22, 2022 10:16
@nopcoder nopcoder assigned nopcoder and unassigned nopcoder Mar 28, 2022
@nopcoder nopcoder added the team/versioning-engine Team versioning engine label Mar 28, 2022
@nopcoder
Copy link
Contributor

@arielshaqed think we should add it to the proposal content - adding more people to the review for their input.

@itaiad200
Copy link
Contributor

Great proposal 💯 It will allow us to serve concurrent committers better, like in the scenario of the a Kafka streaming integration. I do think that we shouldn't support it for the first phase of the refstore as it's adding complexity to an already
non-trivial design. Until the scenario of concurrent committers becomes a reality for lakeFS (say, native Kafka integration isn't usable without it), we can handle that by retrying/failing after X tries to commit. It will not make us lock-free/wait-free, but it's a debt I think we should take until we realise the current design isn't serving us well and we need to optimize it.

@nopcoder nopcoder added the exclude-changelog PR description should not be included in next release changelog label Mar 28, 2022
@arielshaqed
Copy link
Contributor Author

Great proposal 100 It will allow us to serve concurrent committers better, like in the scenario of the a Kafka streaming integration. I do think that we shouldn't support it for the first phase of the refstore as it's adding complexity to an already non-trivial design. Until the scenario of concurrent committers becomes a reality for lakeFS (say, native Kafka integration isn't usable without it), we can handle that by retrying/failing after X tries to commit. It will not make us lock-free/wait-free, but it's a debt I think we should take until we realise the current design isn't serving us well and we need to optimize it.

Thanks!

I agree that we should not block KV on this (the existing implementation most likely does even worse under load, see #2405).

BUT: Please note however that when load is sufficient for livelock it will happen all at once! Ideally we should add monitoring to the code, e.g. histogram of time-to-commit and/or number of failures, as well as an estimate of the number of concurrent commits (can be done by bumping atomic counters at entry and exit to commit operations). If commit is implemented to retry then time-to-commit is a great metric, otherwise we might be able to do without it.

And also load-test this please, a lot :-/

Copy link
Contributor Author

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Pulling, glad to have this as part of the proposal :-)

@arielshaqed
Copy link
Contributor Author

Merge by force because this is an unreported unrelated test -- this PR only changes docs, the test tests code.

@arielshaqed arielshaqed merged commit 6b1b389 into master Mar 31, 2022
@arielshaqed arielshaqed deleted the docs/propose-lockfree-staging branch March 31, 2022 08:22
@arielshaqed
Copy link
Contributor Author

Here's a similar liveness issue in Iceberg that is still open. So not a theoretical issue :-/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

exclude-changelog PR description should not be included in next release changelog team/versioning-engine Team versioning engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants