address some more comments
alindima committed Nov 14, 2023
commit a196ddc78a4a8e4642c3e19e3adae52d527c54e7
105 changes: 80 additions & 25 deletions text/0047-random-assignment-of-availability-chunks.md
@@ -3,14 +3,15 @@
| | |
| --------------- | ------------------------------------------------------------------------------------------- |
| **Start Date** | 03 November 2023 |
| **Description** | An evenly-distributing indirection layer between availability chunks and validators. |
| **Authors** | Alin Dima |

## Summary

Propose a way of randomly permuting the availability chunk indices assigned to validators for a given core and relay
chain block, in the context of
[recovering available data from systematic chunks](https://github.com/paritytech/polkadot-sdk/issues/598), with the
purpose of fairly distributing network bandwidth usage.

## Motivation

@@ -33,9 +34,36 @@ Relay chain node core developers.

An erasure coding algorithm is considered systematic if it preserves the original unencoded data as part of the
resulting code.
[The implementation of the erasure coding algorithm used for polkadot's availability data](https://github.com/paritytech/reed-solomon-novelpoly) is systematic.
Roughly speaking, the first N_VALIDATORS/3 chunks of data can be cheaply concatenated to retrieve the original data,
without running the resource-intensive and time-consuming reconstruction algorithm.

Here's the concatenation procedure of systematic chunks for polkadot's erasure coding algorithm
(minus error handling, for brevity):
```rust
pub fn reconstruct_from_systematic<T: Decode>(
    n_validators: usize,
    chunks: Vec<&[u8]>,
) -> T {
    // Systematic recovery uses the first `threshold` chunks, where `threshold`
    // is rounded down to a power of two, as the algorithm requires.
    let mut threshold = (n_validators - 1) / 3;
    if !is_power_of_two(threshold) {
        threshold = next_lower_power_of_2(threshold);
    }

    let shard_len = chunks.iter().next().unwrap().len();

    let mut systematic_bytes = Vec::with_capacity(shard_len * threshold);

    // The code operates on 2-byte words interleaved across the first `threshold`
    // chunks, so concatenating them column-wise restores the original byte order.
    for i in (0..shard_len).step_by(2) {
        for chunk in chunks.iter().take(threshold) {
            systematic_bytes.push(chunk[i]);
            systematic_bytes.push(chunk[i + 1]);
        }
    }

    Decode::decode(&mut &systematic_bytes[..]).expect("systematic chunks should decode to the original data")
}
```
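
As a toy illustration (not the real `reed-solomon-novelpoly` encoder), the following self-contained sketch distributes the 2-byte words of a payload round-robin across a hypothetical set of `threshold` systematic chunks, which is the layout the concatenation above assumes, and checks that the column-wise concatenation restores the payload:

```rust
// Toy illustration of the systematic layout only: no parity chunks are produced
// and the chunk sizes are chosen for convenience.
fn main() {
    let threshold = 4usize; // stand-in for the systematic recovery threshold
    let payload: Vec<u8> = (0u8..40).collect(); // length divisible by 2 * threshold

    // Distribute consecutive 2-byte words round-robin across `threshold` chunks,
    // mirroring how the systematic part of the code lays out the original data.
    let mut chunks = vec![Vec::new(); threshold];
    for (word_index, word) in payload.chunks(2).enumerate() {
        chunks[word_index % threshold].extend_from_slice(word);
    }

    // Concatenate column-wise, exactly as `reconstruct_from_systematic` does.
    let shard_len = chunks[0].len();
    let mut recovered = Vec::with_capacity(shard_len * threshold);
    for i in (0..shard_len).step_by(2) {
        for chunk in chunks.iter().take(threshold) {
            recovered.push(chunk[i]);
            recovered.push(chunk[i + 1]);
        }
    }

    assert_eq!(recovered, payload);
}
```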

### Availability recovery now

@@ -56,13 +84,13 @@ work is under way to modify the Availability Recovery subsystem by leveraging systematic chunks. See
[this comment](https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099) for preliminary
performance results.

In this scheme, the relay chain node will first attempt to retrieve the ~N/3 systematic chunks from the validators that
should hold them, before falling back to recovering from regular chunks, as before.
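
Below is an illustrative outline of this two-phase strategy. It is a hedged sketch, not the actual Availability Recovery subsystem API: `concatenate_systematic`, `reconstruct` and the `fetch` callback are hypothetical stand-ins, and chunk indices are elided for brevity.

```rust
// Sketch only. `fetch` stands in for a network request to one validator,
// returning the chunk it holds, or None on failure or timeout.
fn recover_available_data<F>(
    systematic_holders: &[usize], // validators assigned chunk indices 0..threshold
    other_validators: &[usize],   // remaining validators, used in the fallback
    recovery_threshold: usize,    // number of chunks needed for full reconstruction
    mut fetch: F,
) -> Option<Vec<u8>>
where
    F: FnMut(usize) -> Option<Vec<u8>>,
{
    // Phase 1: try to get every systematic chunk; concatenating them is cheap.
    let systematic: Vec<Vec<u8>> = systematic_holders.iter().filter_map(|v| fetch(*v)).collect();
    if systematic.len() == systematic_holders.len() {
        return Some(concatenate_systematic(systematic));
    }

    // Phase 2 (fallback): collect enough chunks from any validators and run the
    // resource-intensive reconstruction algorithm, as is done today.
    let mut chunks = systematic;
    for v in other_validators {
        if chunks.len() >= recovery_threshold {
            break;
        }
        if let Some(chunk) = fetch(*v) {
            chunks.push(chunk);
        }
    }

    if chunks.len() >= recovery_threshold { Some(reconstruct(chunks)) } else { None }
}

// Hypothetical stand-ins so the sketch compiles; the real operations are the
// systematic concatenation shown earlier and the full Reed-Solomon recovery.
fn concatenate_systematic(chunks: Vec<Vec<u8>>) -> Vec<u8> { chunks.concat() }
fn reconstruct(_chunks: Vec<Vec<u8>>) -> Vec<u8> { unimplemented!("full recovery") }
```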

### Chunk assignment function

#### Properties
The function that decides the chunk index for a validator should be parameterized by at least
`(validator_index, session_index, block_number, core_index)`
and have the following properties:
1. deterministic
@@ -73,39 +101,66 @@ the function should describe a random permutation of the chunk indices
1. considering `session_index` and `block_number` as fixed arguments, the validators that map to the first N/3 chunk indices should have as little overlap as possible for different cores.

Contributor:

Thinking more about my argument to use session_index here, I think it's a very niche edge-case that's not worth worrying about. An alternative to (session_index, block_number) would be block hash which would also sidestep that issue.

Contributor Author:

Sounds fair. Another issue I just thought of with using the block number is that for disputes happening on unimported forks, the ChainAPI call for getting the block number would also fail.


In other words we want a uniformly distributed, deterministic mapping from `ValidatorIndex` to `ChunkIndex` per block per core.

#### Proposed runtime API

Pseudocode:

```rust
pub fn get_chunk_index(
    n_validators: u32,
    validator_index: ValidatorIndex,
    session_index: SessionIndex,
    block_number: BlockNumber,
    core_index: CoreIndex
) -> ChunkIndex {
    let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
    let seed = derive_seed(session_index, block_number);
    let mut rng: ChaCha8Rng = SeedableRng::from_seed(seed);
    let mut chunk_indices: Vec<ChunkIndex> = (0..n_validators).map(Into::into).collect();
    chunk_indices.shuffle(&mut rng);

    let core_start_pos = threshold * core_index.0;

    chunk_indices[(core_start_pos + validator_index) % n_validators]
}
```

Contributor:

Why do we need this? Can we not deterministically arrive at an assignment based on block number and para ids, in a way that perfectly evens out load?

For example: We use block number % n_validators as a starting index. Then we take threshold validators starting from that position to be the systematic chunk indices for the first para in that block. The next para starts at an additional offset of threshold, so (block_number + threshold) % n_validators, and so on.

We could shuffle validator indices before doing that, but I don't see how this gains us anything.

Now, using information that is not available as part of the candidate receipt was part of the problem we wanted to avoid. Which para is the "next" one is information that is not necessarily available in disputes. Essentially this means we are using the core number.

But:

  1. It usually is: most validators should have seen a block that got disputed, otherwise the candidates could never have gotten included.
  2. Disputes are not the hot path. They are an exceptional event that should barely ever happen. It should not be an issue if disputes are not always (or even ever) able to use systematic chunks.

There are other cases, e.g. collators recovering availability because the block author is censoring, but those should also be exceptional. If systematic chunk recovery were not possible here, it would not be a huge problem either. On top of the fact that this should not be a common case, the recovery here is also not done by validators, so worse recovery performance would not be an issue for the network.

Summary:

Systematic chunk recovery should only be important/relevant in approval voting, where we have to recover a whole lot of data every single relay chain block.
Therefore it would be good if systematic chunks always worked, but I would consider it totally acceptable if it were an optimization that is only supported in approval voting.

Comment:

Avoid paraid here imho, not overly keen on relay parent either.

Instead use era/session, slot, and chain spec to define the validator sequence, and then core index to define the start position in the validator sequence.

Contributor:

Validators are already randomized at the beginning of the session. Core index for start position makes sense. What is wrong with using the block number in addition?

The reason I would like to have it dependent on the block (it could also be the slot, I just don't see the benefit) is that by having a start position per core index, we ensure an equal distribution of systematic chunks across a block, but paras are not all equal. Some could be heavier than others, hence it would be beneficial to change the mapping each block.

In my opinion we could also use the hash instead of the block number; here I think anything is likely better than a static mapping.

@burdges (Nov 23, 2023):

We want slot or block I think. We're not overly adversarial here, but you can manipulate block numbers easily, while slot numbers represent an opportunity for doing something, so you always pay 19 DOTs or more to choose another one. I'd say slot, not block.

We randomize the validator list based upon the randomness two epochs/sessions ago? Cool. Any idea if we similarly randomize the map from paraids to cores per era/session too? If yes, then maybe we could just progress sequentially through them?

```rust
let k = num_validators / num_cores;
let first_validator_index_for_core = ((core_id - slot) * k % num_validators) as u32;
```

We'd prefer the randomization of core_id for this because otherwise you could still make hot spots. We could have hot spots even with this scheme, but it's not so easy to make them. We'd avoid those if we randomize per slot.

Also, this exact computation does not work due to signed vs unsigned, and the fact that it suggests things progress backwards as time progresses, which again tried to avoid hot spots.

Contributor Author:

> Any idea if we similarly randomize the map from paraids to cores per era/session too?

AFAICT from the scheduler code, we don't (at least for the parachain auction model; for on-demand, I see more complicated logic which takes into account core affinities).

> We're not overly adversarial here, but you can manipulate block numbers easily, while slot numbers represent an opportunity for doing something, so you always pay 19 DOTs or more to choose another one. I'd say slot, not block.

I'm having a hard time understanding the advantage of slot number vs block number. This may be simply because I don't know that much about slots. AFAICT, slots are equally useful as block number for the mapping function (monotonically increasing by 1), except that there may be slots that go unoccupied (if the chain is stalled for example) and are therefore skipped. Is that correct?

> so you always pay 19 DOTs or more to choose another one

What is this fee? Where can I read more about this?

Contributor:

@burdges I don't get your arguments. How can block numbers be manipulated? They are always increasing by 1; you clearly must be talking about forks. So an adversary could create multiple forks, all with the same load distribution, and try to overload validators this way? If we create forks or reversions, we already have a performance problem anyway.

Really not getting how slot numbers are better here. I also don't get your argument about hot spots; my proposal above was precisely to avoid hot spots (by not using randomness). What do you mean by hot spots and how would randomness help here?

A single validator not providing its systematic chunk would be enough to break systematic recovery. I don't see how randomization schemes help here. If we were smart, we could somehow track the validators withholding systematic chunks and then make an assignment where they are all pooled together into one candidate. This way, at least only one systematic recovery fails. (We could equally well just remove them entirely.)

But honestly, I would not worry too much about this here. If we ever found that validators try to mess with this on purpose, the threat is low enough that social (calling them out) and governance would be adequate measures to deal with them.

Contributor:

Actually, the ideal would be to use the relay parent hash directly. Then we don't even need to be able to look up the block number to determine systematic chunk indices. We will eventually have the core index in the candidate receipt; it would be really good to have this self-contained, at least once we have that.

Obviously hashes can be influenced trivially by block authors ... but is this a real issue? Assuming we have enough cores, load will be pretty much evenly distributed among validators no matter the hash. The only thing that changes is to which core one gets assigned. I don't mind too much if this can be influenced ... Are there any real concerns here?*)

*) As the system matures, I would assume (especially with CoreJam or similar developments) that candidates will be pretty evened out in load (maxed out). So a validator should not gain much by picking which parachain it wants to have a systematic chunk for.

Contributor Author (@alindima, Dec 4, 2023):

> A single validator not providing its systematic chunk would be enough to break systematic recovery

Slightly off-topic: in my WIP PR I added functionality to request up to 5 systematic chunks from the backing group as a backup solution, so that a couple of validators not returning their systematic chunks would not invalidate the entire procedure.

Comment:

We could permit requesting them all from the backing group, but make the availability providers the first choice, meaning maybe: 1st, try all systematic chunk providers. 2nd, try remaining systematic chunks from backers. 3rd, fetch random non-systematic chunks. The concern is just that we overload the backers.

It'll also make rewards somewhat more fragile, but likely worth the difficulty for the performance.


The function should be implemented as a runtime API, because:

1. it's critical to the consensus protocol that all validators have a common view of the Validator->Chunk mapping.
1. it enables further atomic changes to the shuffling algorithm.
1. it enables alternative client implementations (in other languages) to use it.
1. considering how critical it is for parachain consensus that all validators have a common view of the Validator->Chunk
mapping, this mitigates the problem of third-party libraries changing the implementations of the `ChaCha8Rng` or the `rand::shuffle`
that could be introduced in future versions. This would stall parachains if only a portion of validators upgraded their node.

Additionally, so that client code is able to efficiently get the mapping from the runtime, another API will be added
for retrieving chunk indices in bulk for all validators at a given block and core:

```rust
pub fn get_chunk_indices(
    n_validators: u32,
    session_index: SessionIndex,
    block_number: BlockNumber,
    core_index: CoreIndex
) -> Vec<ChunkIndex> {
    let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
    let seed = derive_seed(session_index, block_number);
    let mut rng: ChaCha8Rng = SeedableRng::from_seed(seed);
    let mut chunk_indices: Vec<ChunkIndex> = (0..n_validators).map(Into::into).collect();
    chunk_indices.shuffle(&mut rng);

    let core_start_pos = threshold * core_index.0;

    chunk_indices
        .into_iter()
        .cycle()
        .skip(core_start_pos)
        .take(n_validators)
        .collect()
}
```
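
As a usage illustration (with `runtime_api_get_chunk_indices` as a hypothetical stand-in for the bulk runtime API above, and `ChunkIndex` assumed to be an index newtype), a recovering node could derive which validators to contact for a core's systematic chunks by filtering this mapping:

```rust
// Illustrative only: returns the validator indices that are assigned the
// systematic chunk indices (0..threshold) for the given block and core.
fn systematic_chunk_holders(
    n_validators: u32,
    session_index: SessionIndex,
    block_number: BlockNumber,
    core_index: CoreIndex,
    threshold: usize, // roughly n_validators / 3, as given by the erasure coding
) -> Vec<u32> {
    // chunk_indices[v] is the chunk index assigned to validator v for this core.
    let chunk_indices: Vec<ChunkIndex> =
        runtime_api_get_chunk_indices(n_validators, session_index, block_number, core_index);

    chunk_indices
        .iter()
        .enumerate()
        .filter(|(_, chunk_index)| (chunk_index.0 as usize) < threshold)
        .map(|(validator_index, _)| validator_index as u32)
        .collect()
}
```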

#### Upgrade path
