This repository was archived by the owner on Nov 15, 2023. It is now read-only.
Merged
Changes from 1 commit
51 commits
a5ff4ca Don't import backing statements directly (eskimor, Jun 16, 2022)
0155ab9 Get rid of dead code. (eskimor, Jun 16, 2022)
550aa6b Dont send approval vote (eskimor, Jun 18, 2022)
c2d2b1d Make it pass CI (eskimor, Jun 18, 2022)
7f43db1 Merge branch 'rk-only-chain-vote-import' into rk-perf-test (eskimor, Jun 18, 2022)
f97e43b Bring back tests for fixing them later. (eskimor, Jul 14, 2022)
c49214d Explicit signature check. (eskimor, Jul 14, 2022)
7ee2e14 Resurrect approval-voting tests (not fixed yet) (eskimor, Jul 17, 2022)
6a99120 Send out approval votes in dispute-distribution. (eskimor, Jul 17, 2022)
5ae234e Bring back an important warning. (eskimor, Jul 17, 2022)
7f950a9 Fix approval voting tests. (eskimor, Jul 18, 2022)
986e01d Don't send out dispute message on import + test (eskimor, Jul 19, 2022)
32cabb4 Guide changes. (eskimor, Jul 19, 2022)
19ecc78 WIP: guide changes. (eskimor, Jul 28, 2022)
19833e6 Finish guide changes about dispute-coordinator (eskimor, Aug 2, 2022)
32e67a8 Finish guide changes for now. (eskimor, Aug 3, 2022)
98d4389 Remove own approval vote import logic. (eskimor, Aug 4, 2022)
e5fb770 Implement logic for retrieving approval-votes (eskimor, Aug 4, 2022)
24275ba Update roadmap/implementers-guide/src/node/disputes/dispute-coordinat… (eskimor, Aug 5, 2022)
dd0ac1d Review feedback. (eskimor, Aug 8, 2022)
912daf3 Incorporate Review Remarks (eskimor, Aug 8, 2022)
39e913a Merge remote-tracking branch 'origin/rk-fast-dispute-coordinator' int… (eskimor, Aug 8, 2022)
11ab0e3 Get rid of superfluous space. (eskimor, Aug 8, 2022)
a890970 Tidy up import logic a bit. (eskimor, Aug 9, 2022)
8395724 We don't need/have empty imports. (eskimor, Aug 10, 2022)
31e0769 Fix tests and bugs. (eskimor, Aug 10, 2022)
2011f35 Remove error prone redundancy. (eskimor, Aug 10, 2022)
6624bc6 Import approval votes on dispute initiated/concluded. (eskimor, Aug 10, 2022)
e3be4c2 Add test for approval vote import. (eskimor, Aug 10, 2022)
24537b6 Make guide checker happy (hopefully) (eskimor, Aug 11, 2022)
793666b Another sanity check + better logs. (eskimor, Aug 11, 2022)
943b064 Reasoning about boundedness. (eskimor, Aug 11, 2022)
4287137 Use `CandidateIndex` as opposed to `CoreIndex`. (eskimor, Aug 11, 2022)
8ae6797 Remove redundant import. (eskimor, Aug 11, 2022)
c126b76 Review remarks. (eskimor, Aug 12, 2022)
b11eadf Add metric for calls to request signatures (eskimor, Aug 12, 2022)
b356acc More review remarks. (eskimor, Aug 12, 2022)
f8cea8a Add metric on imported approval votes. (eskimor, Aug 12, 2022)
d46b03a Include candidate hash in logs. (eskimor, Aug 12, 2022)
1cdf49c Merge branch 'master' into rk-fast-dispute-coordinator (eskimor, Aug 13, 2022)
73669dc More trace log (eskimor, Aug 13, 2022)
4ac86f4 Break cycle. (eskimor, Aug 15, 2022)
e6754bb Add some tracing. (eskimor, Aug 15, 2022)
3ec17cc Cleanup allowed messages. (eskimor, Aug 15, 2022)
2c7fa5c fmt (eskimor, Aug 15, 2022)
2b44d94 Tracing + timeout for get inherent data. (eskimor, Aug 15, 2022)
de0e85f Better error. (eskimor, Aug 15, 2022)
23f1ee0 Break cycle in all places. (eskimor, Aug 15, 2022)
0ed7a1a Clarified comment some more. (eskimor, Aug 15, 2022)
4466d30 Typo. (eskimor, Aug 15, 2022)
a002cb6 Break cycle approval-distribution - approval-voting. (eskimor, Aug 15, 2022)
Finish guide changes for now.
eskimor committed Aug 3, 2022
commit 32e67a8d53a9219e9df396338770482591159791
134 changes: 74 additions & 60 deletions roadmap/implementers-guide/src/node/disputes/dispute-coordinator.md
@@ -8,51 +8,51 @@ In particular the dispute-coordinator is responsible for:

- Ensuring that the node is able to raise a dispute in case an invalid candidate
is found during approval checking.
- Ensuring malicious approval votes will be recorded, so nodes can get slashed
properly.
- Coordinating actual participation in a dispute, ensuring that the node
participates in any justified dispute in a way that ensures resolution of
disputes on the network even in the case of many disputes raised (flood/DoS
scenario).
- Provide an API for chain selection, so we can prevent finalization of any
chain which has included candidates for which a dispute is either ongoing or
concluded invalid and avoid building on chains with an included invalid
candidate.
- Provide an API for retrieving (resolved) disputes, including all votes, both
implicit (approval, backing) and explicit dispute votes, so that validators can get
rewarded/slashed accordingly.

## Ensuring That Disputes Can Be Raised

If a candidate turns out invalid in approval checking, the `approval-voting`
subsystem will try to issue a dispute. For this, it will send a message
`DisputeCoordinatorMessage::IssueLocalStatement` to the dispute coordinator,
instructing it to cast an explicit invalid vote. On reception of such a message,
it is the responsibility of the dispute coordinator to create and sign that
explicit invalid vote and to trigger a dispute if none is already ongoing.

In order to raise a dispute, a node has to be able to provide two opposing votes.
Given that the purpose of the backing phase is to have validators with skin in
the game, the opposing valid vote will very likely be a backing vote. It could
also be some already cast approval vote, but the significant point here is: As
long as we have backing votes available, any node will be able to raise a
dispute.
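To make the "two opposing votes" requirement concrete, here is a minimal sketch. It assumes hypothetical simplified types (`VoteKind`, plain `u32` validator indices) and a hypothetical helper `opposing_votes`, none of which are the coordinator's actual API. Given the votes already known for a candidate and our freshly signed explicit invalid vote, it picks an opposing valid vote, preferring a backing vote as argued above.

```rust
/// Hypothetical, simplified vote kinds; real dispute statements carry
/// signatures, session indices and more.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VoteKind {
    Backing,         // implicit valid vote from the backing phase
    Approval,        // implicit valid vote from approval checking
    ExplicitValid,   // explicit dispute vote: valid
    ExplicitInvalid, // explicit dispute vote: invalid
}

impl VoteKind {
    fn is_valid(self) -> bool {
        !matches!(self, VoteKind::ExplicitInvalid)
    }
}

/// Pairs our local explicit invalid vote with a known valid vote, so that a
/// dispute can be raised. Backing votes are preferred over any other valid vote.
/// Returns `None` if no valid vote is known.
fn opposing_votes(
    known_votes: &[(u32, VoteKind)],
    local_invalid: (u32, VoteKind),
) -> Option<((u32, VoteKind), (u32, VoteKind))> {
    debug_assert!(!local_invalid.1.is_valid(), "local vote must be an invalid vote");
    let valid = known_votes
        .iter()
        .copied()
        .filter(|(_, kind)| kind.is_valid())
        // Sort key 0 for backing votes, 1 for everything else.
        .min_by_key(|(_, kind)| if *kind == VoteKind::Backing { 0 } else { 1 })?;
    Some((valid, local_invalid))
}
```

If no valid vote is known at all, the sketch returns `None`: there is nothing to dispute against, which is exactly why backing votes need to be available.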

Therefore a vital responsibility of the dispute coordinator is to make sure backing
votes are available for all candidates that might still get disputed. To
accomplish this task in an efficient way, the dispute-coordinator relies on chain
scraping. Whenever a candidate gets backed on chain, we record its backing votes
in chain storage (the record is overwritten on every block). We provide a
runtime API for querying those votes. The dispute coordinator makes sure to
query those votes for any non-finalized blocks: In case of missed blocks, it
will do chain traversal as necessary.
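The chain traversal for missed blocks could look roughly like the following sketch. It assumes a hypothetical `u64` block hash and a `parent_of` map that stands in for real header lookups. The runtime API call that fetches the backing votes for each returned block is deliberately left out.

```rust
use std::collections::{HashMap, HashSet};

type BlockHash = u64; // hypothetical stand-in for a real block hash

/// Walks back from `new_head` via `parent_of` until it reaches a block that was
/// already scraped or the last finalized block. Returns the blocks that still
/// need scraping, oldest first; backing votes would then be queried for each.
fn blocks_to_scrape(
    new_head: BlockHash,
    parent_of: &HashMap<BlockHash, BlockHash>,
    already_scraped: &HashSet<BlockHash>,
    last_finalized: BlockHash,
) -> Vec<BlockHash> {
    let mut to_scrape = Vec::new();
    let mut current = new_head;
    while current != last_finalized && !already_scraped.contains(&current) {
        to_scrape.push(current);
        match parent_of.get(&current) {
            Some(parent) => current = *parent,
            None => break, // unknown parent: stop the traversal here
        }
    }
    to_scrape.reverse(); // oldest block first
    to_scrape
}
```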

Relying on chain scraping is very efficient for two reasons:

1. Votes are already batched. We import all available backing votes for a
candidate all at once. If instead we imported votes from candidate-backing as
they came along, we would import each vote individually, which is
inefficient in the current dispute coordinator implementation (quadratic
complexity).
2. We also import fewer votes in total, as we avoid importing statements for
@@ -61,33 +61,33 @@ Relying on chain scraping is very efficient for two reasons:
It is also secure, because disputes are only ever raised in the approval voting
phase. A node only starts the approval process after it has seen a candidate
included on some chain, and for that to happen it must have been backed previously.
Therefore backing votes are available at that point in time. Signals are
processed first, so even if a block is skipped and we only start importing
backing votes on the including block, we will have seen the backing votes by the
time we process messages from approval voting.

In summary, to make it possible for a dispute to be raised, recording backing
votes from chain is sufficient and efficient. In particular, there is no need to
preemptively import approval votes, which has proven to be a very inefficient
process (quadratic complexity adds up, with 35 votes in total per candidate).
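A rough cost model illustrates why batching matters. This sketch assumes, purely for illustration, that every import re-processes all votes already stored for the candidate plus the newly imported ones. `import_cost` is a hypothetical helper, not a measurement of the real implementation.

```rust
/// Hypothetical cost model: each import re-processes every vote stored so far
/// (including the new batch). Returns the total number of vote-processing steps
/// for importing `n` votes in batches of at most `batch_size` votes.
fn import_cost(n: u64, batch_size: u64) -> u64 {
    assert!(batch_size > 0, "batch size must be positive");
    let mut stored = 0;
    let mut cost = 0;
    while stored < n {
        let batch = batch_size.min(n - stored);
        stored += batch;
        cost += stored; // re-process everything stored so far
    }
    cost
}
```

Under this model, importing 35 votes one at a time costs 1 + 2 + ... + 35 = 630 steps, while importing them in one batch costs 35.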

Approval votes are nonetheless very relevant, as we are going to see in the next
section.

## Ensuring Malicious Approval Votes Will Be Recorded

While there is no need to record approval votes in the dispute coordinator
preemptively, we do need to make sure they are recorded when a dispute
actually happens. This is because only votes recorded by the dispute
coordinator will be considered for slashing. While the backing group always gets
slashed, a serious attack attempt will likely also consist of malicious approval
checkers which will cast approval votes, although the candidate is invalid. If
we did not import those votes, those nodes would likely cast an `invalid` explicit
vote as part of the dispute in addition to their approval vote and thus avoid a
slash. With the 2/3rd honest assumption it seems unrealistic that malicious
actors will keep sending approval votes once they become aware of a raised
dispute. Hence the most crucial approval votes to import are the early ones
(tranche 0). To take network latencies and such into account, we still want to
import approval votes at a later point in time as well (in particular we need to
make sure the dispute can conclude, but more on that later).

@@ -114,7 +114,7 @@ was raised.

Instead of the dispute coordinator telling approval-voting that a dispute is
ongoing for approval-voting to start sending votes to the dispute coordinator,
it would make more sense if the dispute-coordinator just asked
approval-voting for votes of candidates that are currently disputed. This way
the dispute-coordinator can also pick the time when to ask and we can therefore
maximize the amount of batching.
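This pull-based interaction could be sketched as a request that carries its own reply channel. It is a toy model built on stated assumptions. `GetApprovalVotes`, `spawn_approval_voting` and `request_approval_votes` are hypothetical names. Std `mpsc` channels and a thread stand in for the real subsystem messaging. Hashes, indices and signatures are simplified placeholder types.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

type CandidateHash = u64; // hypothetical stand-ins for the real types
type ValidatorIndex = u32;
type Signature = Vec<u8>;
type ApprovalVotes = HashMap<CandidateHash, HashMap<ValidatorIndex, Signature>>;

/// Hypothetical request: all approval signatures known for these candidates.
struct GetApprovalVotes {
    candidates: Vec<CandidateHash>,
    reply: mpsc::Sender<ApprovalVotes>,
}

/// Toy approval-voting task answering requests from its local vote store.
fn spawn_approval_voting(store: ApprovalVotes) -> mpsc::Sender<GetApprovalVotes> {
    let (tx, rx) = mpsc::channel::<GetApprovalVotes>();
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for req in rx {
            let answer: ApprovalVotes = req
                .candidates
                .iter()
                .filter_map(|c| store.get(c).map(|votes| (*c, votes.clone())))
                .collect();
            // The requester may have gone away; ignore send errors.
            let _ = req.reply.send(answer);
        }
    });
    tx
}

/// Dispute-coordinator side: one batched request for all disputed candidates,
/// sent at a time of the coordinator's choosing.
fn request_approval_votes(
    approval_voting: &mpsc::Sender<GetApprovalVotes>,
    disputed: Vec<CandidateHash>,
) -> ApprovalVotes {
    let (reply_tx, reply_rx) = mpsc::channel();
    approval_voting
        .send(GetApprovalVotes { candidates: disputed, reply: reply_tx })
        .expect("approval-voting task should be alive");
    reply_rx.recv().unwrap_or_default()
}
```

Because the coordinator sends a single request covering every candidate it currently considers disputed, it controls the timing and receives one batched answer.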
@@ -126,14 +126,14 @@ votes. Still we would like to have a record of all, if possible. So what are
other points in time we might query approval votes?

In fact, for slashing it is only relevant to have them once the dispute
concluded, so we can query approval voting the moment the dispute concludes!
There are two potential caveats with this though:

1. Timing: We would like to rely as little as possible on implementation details
of approval voting. In particular, if the dispute is ongoing for a long time,
do we have any guarantees that approval votes are kept around long enough by
approval voting? So will approval votes still be present by the time the
dispute concludes in all cases? The answer should luckily be yes: As long as
the chain is not finalized, which has to be the case once we have an ongoing
dispute, approval votes have to be kept around (and distributed), otherwise we
might not be able to finalize in case the validator set changes, for example.
@@ -146,7 +146,7 @@ There are two potential caveats with this though:
already mentioned, approval voting and disputes are running concurrently, but
not only that, they race with each other! A node might simultaneously start
participating in a dispute via the dispute coordinator, due to learning about
a dispute via dispute-distribution, while also participating in
approval voting. So if we don't import approval votes before the dispute
concluded, we actually are making sure that no local vote is present and any
honest node will cast an explicit vote in addition to its approval vote: The
@@ -164,6 +164,23 @@ everyday performance - second, even if we imported approval votes, those doubled
work is still present as disputes and approvals are racing. Every time
participation is faster than approval, a node would do double work anyway.

One gotcha remains: We could be receiving our own approval vote via
dispute-distribution (or dispute chain scraping), because some (likely
malicious) node picked it as the opposing valid vote, e.g. in an attempt to
prevent the dispute from concluding (by sending it only to us).
The solution is simple though: When checking for an existing own vote to
determine whether or not to participate, we will instruct `dispute-distribution`
to distribute an already existing own approval vote. This way a dispute will
always be able to conclude, even with these kinds of attacks. Alternatively, or
in addition to be doubly safe, we could also choose to simply drop (own)
approval votes from any import that was not requested by the
dispute-coordinator itself.

Side note: In fact, with both of these we would already be triply safe, because
the dispute coordinator also scrapes any votes from ongoing disputes off chain.
Therefore, as soon as the current node becomes a block producer it will put its
own approval vote on chain, and all other honest nodes will retrieve it from
there.
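The "drop own approval votes" option could be sketched as a filter applied on every import. The `ImportSource`, `Kind` and `Vote` types and the `filter_import` function are hypothetical. The sketch only captures the rule that our own approval votes are accepted solely when the dispute-coordinator requested them itself.

```rust
/// Hypothetical origin of a vote import.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ImportSource {
    DisputeDistribution,
    ChainScraping,
    /// Votes the dispute-coordinator explicitly requested from approval-voting.
    RequestedFromApprovalVoting,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Approval,
    Explicit,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Vote {
    validator: u32,
    kind: Kind,
}

/// Drops our own approval votes from any import we did not request ourselves,
/// so that a peer cannot make us believe we already voted in a dispute.
/// All other votes, including our own explicit votes, are kept.
fn filter_import(votes: Vec<Vote>, source: ImportSource, own_index: Option<u32>) -> Vec<Vote> {
    if source == ImportSource::RequestedFromApprovalVoting {
        return votes;
    }
    votes
        .into_iter()
        .filter(|v| !(Some(v.validator) == own_index && v.kind == Kind::Approval))
        .collect()
}
```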

## Coordinating Actual Dispute Participation

@@ -174,7 +191,8 @@ The dispute coordinator learns about a dispute by importing votes from either
chain scraping or from dispute-distribution. If it finds opposing votes (always
the case when coming from dispute-distribution), it records the presence of a
dispute. Then, in case it does not find any local vote for that dispute already,
it needs to trigger participation in the dispute (see previous section for
considerations when the found local vote is an approval vote).
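The decision taken after an import can be summarized in a small sketch. It assumes a hypothetical `ImportState` summary and `Action` enum. It deliberately ignores the spam-protection criteria for participation discussed below and only shows how an existing own approval vote is treated differently from having no local vote at all.

```rust
/// Hypothetical summary of what we know locally about a candidate after an import.
struct ImportState {
    has_valid_vote: bool,
    has_invalid_vote: bool,
    own_vote: Option<OwnVote>,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum OwnVote {
    Approval,
    Explicit,
}

#[derive(PartialEq, Eq, Debug)]
enum Action {
    /// No opposing votes: nothing is disputed.
    NoDispute,
    /// Dispute recorded and no local vote yet: recover availability, re-validate the PoV.
    Participate,
    /// Dispute recorded and we only have an approval vote: have dispute-distribution
    /// send it out, so the dispute can still conclude.
    DistributeOwnApproval,
    /// Dispute recorded and we already cast an explicit vote.
    AlreadyVoted,
}

fn on_import(state: &ImportState) -> Action {
    // A dispute needs opposing votes: at least one valid and one invalid.
    if !(state.has_valid_vote && state.has_invalid_vote) {
        return Action::NoDispute;
    }
    match state.own_vote {
        None => Action::Participate,
        Some(OwnVote::Approval) => Action::DistributeOwnApproval,
        Some(OwnVote::Explicit) => Action::AlreadyVoted,
    }
}
```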

Participation means recovering availability and re-evaluating the PoV. The
result of that validation (either valid or invalid) will be the node's vote on
@@ -192,10 +210,10 @@ result in a slash of the offenders. Therefore we need to make sure that this
slash is actually happening. Attackers could try to prevent the slashing from
taking place, by overwhelming validators with disputes in such a way that no
single dispute ever concludes, because nodes are busy processing newly incoming
ones. Other attacks are imaginable as well, like raising disputes for candidates
that don't exist, just filling up everyone's disk slowly or, worse, making nodes
try to participate, which will result in lots of network requests for recovering
availability.

The last point brings up a significant consideration in general: Disputes are
about escalation: Every node will suddenly want to check, instead of only a few.
@@ -204,22 +222,22 @@ work and will cause lots of network traffic and messages. Hence the
dispute system is very susceptible to being a brutal amplifier for DoS attacks,
resulting in DoS attacks becoming very easy and cheap, if we are not careful.

One countermeasure we are taking is making the raising of disputes costly:
If you raise a dispute, because you claim a candidate is invalid, although it is
in fact valid, you will get slashed; hence you pay for consuming those
resources. The issue is: This only works if the dispute concerns a candidate
that actually exists!

If a node raises a dispute for a candidate that never got included (became
available) on any chain, then the dispute can never conclude, hence nobody gets
slashed. It makes sense to point out that this is less bad than it might sound
at first, as trying to participate in a dispute for a non-existing candidate is
"relatively" cheap. Each node will send out a few hundred tiny request messages
for availability chunks, which all will end up in a tiny response "NoSuchChunk"
and then no participation will actually happen as there is nothing to
participate in. Malicious nodes could provide chunks, which would make things
more costly, but at the full expense of the attacker's bandwidth: no
amplification here. I am bringing that up for completeness only: Triggering a
thousand nodes to send out a thousand tiny network messages by just sending out
a single garbage message is still a significant amplification and is nothing to
ignore; this could absolutely be used to cause harm!
@@ -246,28 +264,25 @@ participation at all on any _vote import_ if any of the following holds true:
- The dispute is already confirmed: Meaning that 1/3+1 nodes already
participated, as this suggests in our threat model that there was at least one
honest node that already voted, so the dispute must be genuine.

Note: A node might be out of sync with the chain and we might only learn about a
block including a candidate after we learned about the dispute. This means we
have to re-evaluate participation decisions on block import!
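A minimal sketch of the participation gate rests on two assumptions. First, "1/3+1 nodes" is read as `f + 1` voters, with the byzantine threshold `f = (n - 1) / 3` (integer division). Second, the inclusion-related criteria that this excerpt elides are collapsed into a single hypothetical `seen_included` flag. The function names are illustrative only.

```rust
/// Maximum number of faulty validators tolerated for `n` validators,
/// assuming n >= 3f + 1.
fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// "Confirmed": more distinct voters than could all be malicious, so at least
/// one honest node has voted and the dispute must be genuine.
fn is_confirmed(distinct_voters: usize, n: usize) -> bool {
    distinct_voters > byzantine_threshold(n)
}

/// Hypothetical participation gate. `seen_included` stands in for the
/// inclusion-based criteria not shown in this excerpt.
fn should_participate(seen_included: bool, distinct_voters: usize, n: usize) -> bool {
    seen_included || is_confirmed(distinct_voters, n)
}
```

Since `seen_included` can flip to `true` when a block including the candidate is imported late, a real implementation would need to run such a check again on block import, as noted above.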

With these rules, nodes won't waste significant resources on completely made up
candidates. The next step is to process dispute participation in a (globally)
ordered fashion, meaning a majority of validators should arrive at (at least
roughly) the same ordering of participation, for disputes to get resolved one
after another. This order is only relevant if there are lots of disputes, so we
only need to worry about order if participations start queuing up.

We treat participation for candidates that we have seen included with priority
and put them on a priority queue which sorts participation based on the block
number of the relay parent of that candidate and for candidates with the same
relay parent height further by the `CandidateHash`. This ordering is globally
unique and also prioritizes older candidates.

The latter property makes sense, because if an older candidate turns out invalid,
we can roll back the full chain at once. If we resolved earlier disputes first
and they turned out invalid as well, we might need to roll back a couple of
times instead of just once to the oldest offender. This is obviously a good
@@ -285,11 +300,11 @@ nodes that are able to sort based on the relay parent block height.
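The priority-queue ordering described above maps naturally onto an ordered set keyed by (relay parent block number, candidate hash). This sketch uses a 32-byte array as a stand-in for `CandidateHash`, along with hypothetical `ParticipationKey` and `ParticipationQueue` types. It only models the priority queue for candidates we have seen included.

```rust
use std::collections::BTreeSet;

type BlockNumber = u32;
type CandidateHash = [u8; 32]; // hypothetical stand-in for the real hash type

/// Derived `Ord` compares fields in declaration order: relay parent block
/// number first, candidate hash as tie-breaker.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct ParticipationKey {
    relay_parent_number: BlockNumber,
    candidate_hash: CandidateHash,
}

#[derive(Default)]
struct ParticipationQueue {
    prioritized: BTreeSet<ParticipationKey>,
}

impl ParticipationQueue {
    fn push(&mut self, relay_parent_number: BlockNumber, candidate_hash: CandidateHash) {
        self.prioritized.insert(ParticipationKey { relay_parent_number, candidate_hash });
    }

    /// Next participation: the candidate with the oldest relay parent.
    fn pop_next(&mut self) -> Option<ParticipationKey> {
        let next = *self.prioritized.iter().next()?;
        self.prioritized.remove(&next);
        Some(next)
    }
}
```

Any two nodes with the same view of relay parent heights will pop candidates from such a queue in the same order, which is what allows disputes to get resolved one after another.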

#### Import

In the last section we looked at how to queue participations to handle
heavy dispute load well. This already ensures that honest nodes won't amplify
cheap DoS attacks. There is one minor issue remaining: Even if we delay
participation until we have some confirmation of the authenticity of the
dispute, we should also not blindly import all arriving votes into the
database as this might be used to just slowly fill up disk space, until the node
is no longer functional. This leads to our last protection mechanism at the
dispute coordinator level (dispute-distribution also has its own), which is spam
@@ -300,12 +315,11 @@ The reason this works is because we only need to worry about actual dispute
votes. Imports of backing votes are already rate limited and concern only real
candidates. For approval votes a similar argument holds (if they come from
approval-voting), but we also don't import them until a dispute already
concluded. For actual dispute votes, we need two opposing votes, so there must be
an explicit `invalid` vote in the import. Only a third of the validators can be
malicious, so spam disk usage is limited to `2*vote_size*n/3*NUM_SPAM_SLOTS`, with
n being the number of validators.
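The disk-usage bound can be checked with simple arithmetic. This sketch reads the factor `2` as the two opposing votes every spam import must carry, and `n/3` (integer division) as the number of possibly malicious validators. `vote_size` and `NUM_SPAM_SLOTS` are free parameters here. The numbers in the usage note are purely illustrative and do not reflect any network's actual values.

```rust
/// Upper bound on disk usage caused by spam votes, following
/// `2*vote_size*n/3*NUM_SPAM_SLOTS`: at most n/3 malicious validators, each
/// able to fill `num_spam_slots` slots, with two opposing votes per import.
fn spam_disk_bound_bytes(vote_size: u64, n_validators: u64, num_spam_slots: u64) -> u64 {
    let max_malicious = n_validators / 3;
    2 * vote_size * max_malicious * num_spam_slots
}
```

For example, `spam_disk_bound_bytes(100, 1000, 50)` evaluates to 3,330,000 bytes (roughly 3.3 MB) for a hypothetical 100-byte vote, 1000 validators and 50 spam slots.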

More reasoning behind spam considerations can be found on
[this](https://github.com/paritytech/srlabs_findings/issues/179) sr-lab ticket.
