
RFC-0047: Random assignment of availability chunks to validators

Start Date: 03 November 2023
Description: An evenly-distributing indirection layer between availability chunks and validators.
Authors: Alin Dima

Summary

Propose a way of randomly permuting the availability chunk indices assigned to validators for a given core and relay chain block, in the context of recovering available data from systematic chunks, with the purpose of fairly distributing network bandwidth usage.

Motivation

Currently, the ValidatorIndex is always identical to the ChunkIndex. Since the validator array is only shuffled once per session, naively using the ValidatorIndex as the ChunkIndex would place unreasonable stress on the first N/3 validators for an entire session, when favouring availability recovery from systematic chunks.

Therefore, the relay chain node needs a deterministic way of evenly distributing the first ~(N_VALIDATORS / 3) systematic availability chunks to different validators, based on the session, relay chain block number and core. The main purpose is to ensure fair distribution of network bandwidth usage for availability recovery in general and in particular for systematic chunk holders.

Stakeholders

Relay chain node core developers.

Explanation

Systematic erasure codes

An erasure coding algorithm is considered systematic if it preserves the original unencoded data as part of the resulting code. The implementation of the erasure coding algorithm used for Polkadot's availability data is systematic. Roughly speaking, the first N_VALIDATORS/3 chunks of data can be cheaply concatenated to retrieve the original data, without running the resource-intensive and time-consuming reconstruction algorithm.

Here's the concatenation procedure of systematic chunks for Polkadot's erasure coding algorithm (with most error handling elided, for brevity):

pub fn reconstruct_from_systematic<T: Decode>(
	n_validators: usize,
	chunks: Vec<&[u8]>,
) -> Result<T, Error> {
	// The systematic threshold is (n_validators - 1) / 3, rounded down
	// to a power of two.
	let mut threshold = (n_validators - 1) / 3;
	if !is_power_of_two(threshold) {
		threshold = next_lower_power_of_2(threshold);
	}

	let shard_len = chunks.iter().next().unwrap().len();

	let mut systematic_bytes = Vec::with_capacity(shard_len * threshold);

	// Interleave the chunks two bytes at a time, reversing the
	// column-wise layout produced by the encoder.
	for i in (0..shard_len).step_by(2) {
		for chunk in chunks.iter().take(threshold) {
			systematic_bytes.push(chunk[i]);
			systematic_bytes.push(chunk[i + 1]);
		}
	}

	Decode::decode(&mut &systematic_bytes[..]).map_err(|err| Error::Decode(err))
}
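The helpers `is_power_of_two` and `next_lower_power_of_2` are not shown above; a minimal sketch of what they might look like, together with a worked threshold computation, follows (these are assumed implementations, not the actual library code):

```rust
/// Returns true if `n` is a power of two (assumed helper, matching
/// the usage in the snippet above).
fn is_power_of_two(n: usize) -> bool {
    n != 0 && (n & (n - 1)) == 0
}

/// Rounds `n` down to the nearest power of two (assumed behaviour).
fn next_lower_power_of_2(n: usize) -> usize {
    assert!(n > 0);
    let mut p = 1;
    while p * 2 <= n {
        p *= 2;
    }
    p
}

fn main() {
    // With 1000 validators, (1000 - 1) / 3 = 333, which is not a power
    // of two, so the systematic threshold rounds down to 256 chunks.
    let threshold = (1000usize - 1) / 3;
    assert_eq!(threshold, 333);
    assert!(!is_power_of_two(threshold));
    assert_eq!(next_lower_power_of_2(threshold), 256);
}
```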

Availability recovery now

At this moment, the availability recovery process differs based on the estimated size of the available data:

(a) for small PoVs (up to 128 KiB), sequentially try requesting the unencoded data from the backing group, in a random order. If this fails, fall back to option (b).

(b) for large PoVs (over 128 KiB), launch N parallel requests for the erasure-coded chunks (currently, N has an upper limit of 50), until enough chunks have been recovered. Validators are tried in a random order. Then, reconstruct the original data.

Availability recovery from systematic chunks

As part of the effort of increasing Polkadot's resource efficiency, scalability and performance, work is under way to modify the Availability Recovery subsystem by leveraging systematic chunks. See this comment for preliminary performance results.

In this scheme, the relay chain node will first attempt to retrieve the ~N/3 systematic chunks from the validators that should hold them, before falling back to recovering from regular chunks, as before.
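The resulting strategy ordering could be sketched as follows. The enum, function names and thresholds here are illustrative assumptions, not the actual subsystem types:

```rust
// Illustrative sketch of the recovery strategy ordering described
// above; names and thresholds are assumptions for this sketch.
#[derive(Debug, PartialEq)]
enum RecoveryStrategy {
    FullFromBackers,  // (a): request unencoded data from the backing group
    SystematicChunks, // new: fetch the first ~N/3 chunks and concatenate
    RegularChunks,    // (b): fetch any threshold of chunks and reconstruct
}

const SMALL_POV_LIMIT: usize = 128 * 1024; // 128 KiB

/// Returns the ordered list of strategies to attempt, each one acting
/// as a fallback for the previous.
fn strategy_order(pov_size_estimate: usize) -> Vec<RecoveryStrategy> {
    if pov_size_estimate <= SMALL_POV_LIMIT {
        vec![
            RecoveryStrategy::FullFromBackers,
            RecoveryStrategy::SystematicChunks,
            RecoveryStrategy::RegularChunks,
        ]
    } else {
        vec![
            RecoveryStrategy::SystematicChunks,
            RecoveryStrategy::RegularChunks,
        ]
    }
}

fn main() {
    // Small PoVs still try the backing group first.
    assert_eq!(strategy_order(64 * 1024)[0], RecoveryStrategy::FullFromBackers);
    // Large PoVs go straight to systematic chunks.
    assert_eq!(strategy_order(1024 * 1024)[0], RecoveryStrategy::SystematicChunks);
}
```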

Chunk assignment function

Properties

The function that decides the chunk index for a validator should be parameterized by at least (validator_index, session_index, block_number, core_index) and have the following properties:

  1. Deterministic.
  2. Pseudo-random.
  3. Relatively quick to compute and resource-efficient.
  4. When considering the other parameters besides validator_index as fixed, the function should describe a random permutation of the chunk indices.
  5. Considering session_index and block_number as fixed arguments, the validators that map to the first N/3 chunk indices should have as little overlap as possible for different cores.

In other words, we want a uniformly distributed, deterministic mapping from ValidatorIndex to ChunkIndex per block, per core.
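Property 4 can be checked mechanically: for a fixed session, block and core, collecting the mapping over all validators must yield every chunk index exactly once. A sketch of such a check, using a toy stand-in bijection in place of the real seeded shuffle:

```rust
// Checks that a Validator -> Chunk mapping is a permutation: every
// chunk index in 0..n_validators must appear exactly once.
fn is_permutation(mapping: &[u32], n_validators: u32) -> bool {
    let mut seen = vec![false; n_validators as usize];
    for &chunk_index in mapping {
        let i = chunk_index as usize;
        if i >= seen.len() || seen[i] {
            // Out of range or duplicate: not a permutation.
            return false;
        }
        seen[i] = true;
    }
    mapping.len() as u32 == n_validators
}

fn main() {
    let n = 10u32;
    // Stand-in "shuffle": any bijection over 0..n satisfies the check.
    let mapping: Vec<u32> = (0..n).rev().collect();
    assert!(is_permutation(&mapping, n));
    // A mapping with a duplicated chunk index fails the check.
    assert!(!is_permutation(&[0, 0, 1], 3));
}
```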

Proposed runtime API

Pseudocode:

pub fn get_chunk_index(
  n_validators: u32,
  validator_index: ValidatorIndex,
  session_index: SessionIndex,
  block_number: BlockNumber,
  core_index: CoreIndex
) -> ChunkIndex {
  let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
  let seed = derive_seed(session_index, block_number);
  let mut rng: ChaCha8Rng = SeedableRng::from_seed(seed);
  let mut chunk_indices: Vec<ChunkIndex> = (0..n_validators).map(Into::into).collect();
  chunk_indices.shuffle(&mut rng);

  let core_start_pos = threshold * core_index.0;

  chunk_indices[(core_start_pos + validator_index) % n_validators]
}
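A dependency-free, runnable version of the same logic is sketched below. ChaCha8Rng and rand's shuffle come from third-party crates; this sketch substitutes a splitmix64-based Fisher-Yates shuffle purely for illustration, so the structure matches the pseudocode even though the exact permutation does not:

```rust
// Sketch of the assignment logic. splitmix64 stands in for ChaCha8Rng;
// production code would use the ChaCha8-based shuffle described above.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E3779B97F4A7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

/// Fisher-Yates shuffle driven by the seeded PRNG.
fn shuffle(indices: &mut [u32], seed: u64) {
    let mut state = seed;
    for i in (1..indices.len()).rev() {
        let j = (splitmix64(&mut state) % (i as u64 + 1)) as usize;
        indices.swap(i, j);
    }
}

fn get_chunk_index(
    n_validators: u32,
    validator_index: u32,
    seed: u64, // stands in for derive_seed(session_index, block_number)
    core_index: u32,
) -> u32 {
    let threshold = n_validators / 3; // stand-in for systematic_threshold
    let mut chunk_indices: Vec<u32> = (0..n_validators).collect();
    shuffle(&mut chunk_indices, seed);
    let core_start_pos = threshold * core_index;
    chunk_indices[((core_start_pos + validator_index) % n_validators) as usize]
}

fn main() {
    let n = 12u32;
    // For a fixed seed and core, the mapping over all validators is a
    // permutation of 0..n.
    let mut seen: Vec<u32> = (0..n).map(|v| get_chunk_index(n, v, 42, 0)).collect();
    seen.sort();
    assert_eq!(seen, (0..n).collect::<Vec<u32>>());
    // Different cores see rotations of the same permutation, so the
    // holders of the first ~n/3 systematic chunks differ per core.
    assert_eq!(get_chunk_index(n, 0, 42, 1), get_chunk_index(n, 4, 42, 0));
}
```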

The function should be implemented as a runtime API, because:

  1. It enables further changes to the shuffling algorithm to be enacted atomically.
  2. It enables alternative client implementations (in other languages) to use it.
  3. Considering how critical it is for parachain consensus that all validators share a common view of the Validator->Chunk mapping, it mitigates the risk of third-party libraries changing the implementation of ChaCha8Rng or rand's shuffle in future versions, which would stall parachains if only a portion of validators upgraded their node.

Additionally, so that client code is able to efficiently get the mapping from the runtime, another API will be added for retrieving chunk indices in bulk for all validators at a given block and core:

pub fn get_chunk_indices(
  n_validators: u32,
  session_index: SessionIndex,
  block_number: BlockNumber,
  core_index: CoreIndex
) -> Vec<ChunkIndex> {
  let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
  let seed = derive_seed(session_index, block_number);
  let mut rng: ChaCha8Rng = SeedableRng::from_seed(seed);
  let mut chunk_indices: Vec<ChunkIndex> = (0..n_validators).map(Into::into).collect();
  chunk_indices.shuffle(&mut rng);

  let core_start_pos = threshold * core_index.0;
  
  chunk_indices
    .into_iter()
    .cycle()
    .skip(core_start_pos)
    .take(n_validators)
    .collect()
}
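The `cycle().skip().take()` chain above is simply a rotation of the shuffled vector, so the bulk result at position `validator_index` agrees with the single-index lookup from get_chunk_index. A small sketch checking that equivalence (the fixed permutation here stands in for the seeded shuffle):

```rust
// Checks that rotating a shuffled vector (as get_chunk_indices does
// via cycle/skip/take) agrees with the modular lookup in
// get_chunk_index.
fn rotate(indices: &[u32], start: usize) -> Vec<u32> {
    indices
        .iter()
        .cycle()
        .skip(start)
        .take(indices.len())
        .copied()
        .collect()
}

fn main() {
    let n_validators = 7usize;
    // Any fixed permutation stands in for the seeded shuffle.
    let chunk_indices: Vec<u32> = vec![3, 0, 6, 2, 5, 1, 4];
    let core_start_pos = 4; // threshold * core_index, for some core

    let bulk = rotate(&chunk_indices, core_start_pos);
    for validator_index in 0..n_validators {
        // Single-index lookup, as in get_chunk_index.
        let single = chunk_indices[(core_start_pos + validator_index) % n_validators];
        assert_eq!(bulk[validator_index], single);
    }
    assert_eq!(bulk, vec![5, 1, 4, 3, 0, 6, 2]);
}
```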

Upgrade path

Considering that the Validator->Chunk mapping is critical to parachain consensus, the change needs to be enacted atomically via governance, only after all validators have upgraded their nodes to a version that is aware of this mapping. It needs to be explicitly stated that, after the runtime upgrade and governance enactment, validators running older client versions that don't support this mapping will not be able to participate in parachain consensus.

Getting access to core_index

Availability recovery can currently be triggered by the following processes in the Polkadot protocol:

  1. During the approval-voting process.
  2. By other collators of the same parachain.
  3. During disputes.

The core_index refers to the index of the core that the candidate was occupying while it was pending availability (from backing to inclusion). Getting the right core index for a candidate can prove troublesome. Here's a breakdown of how different parts of the node implementation can get access to this data:

  1. The approval-voting process for a candidate begins after observing that the candidate was included. Therefore, the node has easy access to the block where the candidate got included (and also the core that it occupied).

  2. The pov_recovery task of the collators starts availability recovery in response to noticing a candidate getting backed, which enables easy access to the core index the candidate started occupying.

  3. Disputes may be initiated on a number of occasions:

    3.a. Initiated by a validator as a result of finding an invalid candidate while participating in the approval-voting protocol. In this case, availability recovery is not needed, since the validator already issued their vote.

    3.b. Initiated by a validator noticing dispute votes recorded on-chain. In this case, we can safely assume that the backing event for that candidate has been recorded and kept in memory.

    3.c. Initiated as a result of getting a dispute statement from another validator. It is possible that the dispute is happening on a fork that was not yet imported by this validator, so the subsystem may not have seen this candidate being backed.

As a solution to 3.c, a new version of the disputes request-response networking protocol will be added. This message type will include the hash of the relay block where the candidate was included, which will be used to query the runtime API and retrieve the core index that the candidate was occupying.
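The shape of the extended request might look something like the sketch below. The type name, field names and simplified types are hypothetical; the actual protocol definition lives in the node's networking code:

```rust
// Hypothetical shape of the V2 dispute request, extended with the hash
// of the relay block where the candidate was included. `Hash` is a
// simplified stand-in for the real polkadot-node primitive.
type Hash = [u8; 32];

#[allow(dead_code)]
struct DisputeRequestV2 {
    candidate_hash: Hash,
    session_index: u32,
    // New in V2: lets the receiver query the runtime API for the core
    // index the candidate occupied while pending availability.
    included_relay_block: Hash,
    // ... plus the valid/invalid votes carried by the existing request.
}

fn main() {
    let req = DisputeRequestV2 {
        candidate_hash: [0u8; 32],
        session_index: 1,
        included_relay_block: [1u8; 32],
    };
    // The receiver reads the included block hash to resolve the core.
    assert_eq!(req.included_relay_block, [1u8; 32]);
}
```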

Enabling the systematic data availability recovery feature will therefore also require all nodes to use the V2 disputes networking protocol.

Alternatives to using core_index

As an alternative to core_index, the ParaId could be used. It has the advantage of being readily available in the CandidateReceipt, which would leave the dispute communication protocol unchanged and simplify the implementation. However, in the context of CoreJam, ParaIds will no longer exist (at least not in their current form).

Using the candidate hash as a random seed for a shuffle is another option.

Drawbacks

This proposal involves a fair amount of technical complexity:

  • It introduces another runtime API that will be queried by multiple subsystems. With adequate client-side caching, this should be acceptable.

  • It requires a networking protocol upgrade on the disputes request-response protocol.

Testing, Security, and Privacy

Extensive testing will be conducted, both automated and manual. This proposal doesn't affect security or privacy.

Performance, Ergonomics, and Compatibility

Performance

This is a necessary DA optimisation, as Reed-Solomon erasure coding has proven to be a top consumer of CPU time in Polkadot as we scale up the parachain block size and number of availability cores.

With this optimisation, preliminary performance results show that CPU time used for Reed-Solomon coding can be halved and total PoV recovery time can decrease by 80% for large PoVs. See more here.

Ergonomics

Not applicable.

Compatibility

This is a breaking change. See the "Upgrade path" section above. All validators need to have upgraded their node versions before the feature is enabled via a runtime upgrade and governance call.

Prior Art and References

See comments on the tracking issue and the in-progress PR.

Unresolved Questions

  • Is it a future-proof idea to use the core index as a parameter of the chunk index computation function? Is there a better alternative that avoids complicating the implementation?
  • Is there a better upgrade path that would preserve backwards compatibility?

Future Directions and Related Material

This enables future optimisations for the performance of availability recovery, such as retrieving batched systematic chunks from backers/approval-checkers.