Offchain Workers: deterministic bookkeeping #3722
Description
Offchain workers are a feature of Substrate allowing us to provide code in the Runtime which may be non-deterministic, and is intended to be executed for each new block to perform tasks such as:
- querying a price feed
- making an HTTP request
- creating a DHT entry
- replicating a file
For nondeterministic tasks such as the above, it would be wasteful to execute the offchain logic on ancient blocks while performing a major synchronization: you might end up re-executing logic initially triggered years ago, to absolutely no effect. Because of this, offchain workers are designed so that a full node does not execute them for every block in the chain.
I've encountered two use-cases which fall into another category of execution: deterministic bookkeeping. These are situations where the computation is deterministic but data-heavy, and we want to move the data (typically trie nodes) out of the chain state, where only the trie roots are kept. For these use-cases, the current operation of offchain workers does not seem to be sufficient.
Example 1: Merkle Mountain Ranges (MMR)
#2053 , https://github.com/mimblewimble/grin/blob/master/doc/mmr.md
For many kinds of auxiliary blockchain protocols, it's important to be able to prove that some ancient block header is an ancestor of the finalized chain head. MMRs provide a good way of doing that.
We want to write a runtime module to keep track of the peaks (roots) of a bunch of different Merkle trees - there will be at most about log2(N) of these for N blocks (and on the order of N tree nodes in total). You can add to the MMR with only the peaks, but you can only prove ancestry if you have all the nodes.
We'd want full nodes to keep track of all of the MMR nodes by keeping them in offchain storage. However, if even one block in the chain is not executed, it is possible to end up in a situation where ancestry can no longer be proven.
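To make the split between peaks and nodes concrete, here is a minimal sketch of an MMR append that only reads the peaks. Everything in it is an assumption for illustration: the `Mmr` type, the `push` method, and the use of `DefaultHasher` as a stand-in for a real cryptographic hash are not taken from any Substrate module.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in 64-bit hash (assumption); a real runtime would use a cryptographic hash.
type NodeHash = u64;

fn hash_leaf(data: &[u8]) -> NodeHash {
    let mut h = DefaultHasher::new();
    0u8.hash(&mut h); // domain separator for leaves
    data.hash(&mut h);
    h.finish()
}

fn hash_parent(left: NodeHash, right: NodeHash) -> NodeHash {
    let mut h = DefaultHasher::new();
    1u8.hash(&mut h); // domain separator for inner nodes
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// What the chain state would hold: only the peaks, as (height, hash),
/// ordered from the tallest tree to the shortest.
#[derive(Default)]
struct Mmr {
    peaks: Vec<(u32, NodeHash)>,
    leaf_count: u64,
}

impl Mmr {
    /// Appends one leaf using nothing but the current peaks.
    /// Returns every node created by this append (the leaf plus any merged
    /// parents); these are what a full node would persist off-chain.
    fn push(&mut self, data: &[u8]) -> Vec<NodeHash> {
        let mut current = (0u32, hash_leaf(data));
        let mut created = vec![current.1];
        // Merge with the rightmost peak while both trees have equal height.
        while let Some(&(height, left)) = self.peaks.last() {
            if height != current.0 {
                break;
            }
            self.peaks.pop();
            current = (height + 1, hash_parent(left, current.1));
            created.push(current.1);
        }
        self.peaks.push(current);
        self.leaf_count += 1;
        created
    }
}
```

The nodes returned by `push` are exactly the data that is lost if the bookkeeping for a block is skipped: the peaks stay correct on-chain, but the off-chain node set has a hole, and proofs passing through it can no longer be built.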
Example 2: Historical Slashing
srml-staking and srml-session are designed so that validators and nominators remain slashable for a long bonding duration while they wait for their funds to become withdrawable. Keeping months' worth of historical validator sets, session keys, and nominator assignments on-chain is too heavy, so we instead keep a trie root encoding the historical validator sets for every session. Full nodes are intended to keep the trie nodes behind this root.
For slashing, the situation isn't as severe. However, for security it would be best to have as many full nodes as possible be able to report misbehavior. If a full node doesn't execute the off-chain worker, it may not have the trie nodes necessary to issue a report of a misbehavior that it witnesses - reducing the effectiveness of fishermen.
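The asymmetry described here can be sketched with a plain binary Merkle tree: checking a membership proof needs only the root, but producing one needs the underlying nodes. The sketch below is an assumption-laden illustration, not the trie format used by srml-session; `build_levels`, `prove`, `verify`, and the `DefaultHasher` stand-in hash are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in hashes (assumption); a real chain would use a cryptographic hash.
fn hash_leaf(validator: &str) -> u64 {
    let mut s = DefaultHasher::new();
    validator.hash(&mut s);
    s.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut s = DefaultHasher::new();
    (left, right).hash(&mut s);
    s.finish()
}

/// Builds every level of the tree, leaves first. An unpaired node at the
/// end of a level is hashed with itself. This is the off-chain data.
fn build_levels(validators: &[&str]) -> Vec<Vec<u64>> {
    let mut levels = vec![validators.iter().map(|v| hash_leaf(v)).collect::<Vec<u64>>()];
    while levels.last().unwrap().len() > 1 {
        let prev = levels.last().unwrap();
        let next: Vec<u64> = prev
            .chunks(2)
            .map(|c| hash_pair(c[0], *c.last().unwrap()))
            .collect();
        levels.push(next);
    }
    levels
}

/// Produces a proof for the leaf at `index` as (sibling hash, sibling is on
/// the left). Requires all levels; panics if `index` is out of range.
fn prove(levels: &[Vec<u64>], mut index: usize) -> Vec<(u64, bool)> {
    let mut proof = Vec::new();
    for level in &levels[..levels.len() - 1] {
        let sibling = if index % 2 == 0 {
            (index + 1).min(level.len() - 1) // unpaired node is its own sibling
        } else {
            index - 1
        };
        proof.push((level[sibling], index % 2 == 1));
        index /= 2;
    }
    proof
}

/// Checks a proof against the root alone, as on-chain logic could.
fn verify(root: u64, validator: &str, proof: &[(u64, bool)]) -> bool {
    let computed = proof.iter().fold(hash_leaf(validator), |acc, &(sib, sib_left)| {
        if sib_left { hash_pair(sib, acc) } else { hash_pair(acc, sib) }
    });
    computed == root
}
```

A node that skipped the bookkeeping for a session holds the root (it is in the chain state) and can run `verify`, but it has no `levels` to pass to `prove`, so it cannot build the proof a misbehavior report would need.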
Final notes
For these kinds of deterministic bookkeeping tasks, it would be really useful to have a category of offchain execution which is guaranteed to be run on every block. This could also be done by having an alternate set of storage APIs available to on-chain execution, which places storage into the off-chain DB.
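As a rough sketch of the second option, the interface below gives on-chain execution two write paths: one into consensus state and one into a node-local off-chain DB. The trait `BookkeepingExt`, its methods, the `Node` struct, and `execute_block` are hypothetical names made up for this illustration and do not correspond to an existing Substrate API; `toy_digest` is FNV-1a standing in for a real hash.

```rust
use std::collections::HashMap;

/// Hypothetical storage interface exposed to on-chain execution.
trait BookkeepingExt {
    /// Writes to consensus state (would be covered by the state root).
    fn set_state(&mut self, key: &[u8], value: &[u8]);
    /// Writes to the node-local off-chain DB (not covered by the state root).
    fn set_offchain(&mut self, key: &[u8], value: &[u8]);
}

/// Toy node with both stores modelled as in-memory maps.
#[derive(Default)]
struct Node {
    chain_state: HashMap<Vec<u8>, Vec<u8>>,
    offchain_db: HashMap<Vec<u8>, Vec<u8>>,
}

impl BookkeepingExt for Node {
    fn set_state(&mut self, key: &[u8], value: &[u8]) {
        self.chain_state.insert(key.to_vec(), value.to_vec());
    }
    fn set_offchain(&mut self, key: &[u8], value: &[u8]) {
        self.offchain_db.insert(key.to_vec(), value.to_vec());
    }
}

/// FNV-1a, used only as a stand-in digest for the sketch.
fn toy_digest(data: &[u8]) -> u64 {
    data.iter().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ *b as u64).wrapping_mul(0x100000001b3)
    })
}

/// Deterministic block logic: the small digest goes on-chain, the heavy
/// record goes off-chain. Because the off-chain write happens during block
/// execution, every node that executes the block ends up with the record.
fn execute_block<E: BookkeepingExt>(ext: &mut E, number: u64, record: &[u8]) {
    ext.set_state(b"latest_digest", &toy_digest(record).to_le_bytes());
    let mut key = b"record:".to_vec();
    key.extend_from_slice(&number.to_le_bytes());
    ext.set_offchain(&key, record);
}
```

The design point is that the off-chain write is a side effect of block import rather than of a separately scheduled worker, so it cannot be skipped independently of executing the block.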
Warp sync also plays a big role in the usability of a blockchain client. We don't want to end up in a situation where only nodes which have performed a full sync have all of the bookkeeping trie data. In the MMR case, it would mean that only those nodes could give out ancestry proofs. In the Historical Slashing case, it would mean that recently warp-synced nodes could not report misbehavior.
Given that this data is all trie-based, with roots in the runtime, it would be nice to be able to warp sync it as well. This may not be too difficult with the right runtime APIs, but it is something to keep in mind.