This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Deprecation and Removal of Substrate Native Runtime Optimization #7288

@pepyakin


With this issue, I want to start a design discussion about a potential major change for Substrate: deprecating and removing the Substrate Native Runtime Optimization and relying exclusively on wasm for runtime execution. It is not a call to action, nor a design document, but rather a request for comments.

The Substrate Native Runtime Optimization leverages the fact that both the runtime and the node are written in Rust and, with some effort, can be cross-compiled.

Basically, we take the runtime's Rust source code and use it as a regular dependency of the Substrate node. This optimization can lead to more than 2x speedups in runtime code execution.
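For illustration, this is roughly how a node declares the compiled-in runtime today (a sketch; the exact macro and its arguments vary between Substrate versions):

use sc_executor::native_executor_instance;

// Declare a native executor backed by the runtime crate the node links in.
// `api::dispatch` is generated by the runtime's `impl_runtime_apis!`, and
// `native_version` is defined in the runtime crate itself.
native_executor_instance!(
    pub Executor,
    node_runtime::api::dispatch,
    node_runtime::native_version,
);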

This design decision was made at the very beginning of Substrate (back when it was still Polkadot), AFAIR.

However, I'd like to argue that this optimization doesn't deliver on its promises. Take the following two aspects:

  1. Sync speed.
  2. Transaction throughput.

One of the key features of Substrate is forkless upgrades, i.e. a chain can upgrade itself without resorting to forking. Typically, a healthy chain lives through several runtime upgrades. However, note that the native runtime is compiled only once, into a particular node binary.

From the point of view of syncing, that means that across the whole chain history only the portion executed with the runtime version that happened to be compiled into the node gets the speed-up. Block production doesn't really benefit from the native runtime either, since wasm is the canonical runtime and validators use it.

Costs

It turns out that the cost we pay for supporting the native runtime is non-negligible. I identify two major groups of costs:

  • The first group of problems is the set of slight differences between the Substrate wasm environment and the compiled-in (native) one.
  • The second group is the complexity that supporting the native runtime requires.

The first category is essentially abstraction leakage caused by translating the same high-level code into two very different environments. The second category is the complexity we introduce to bridge this gap.

Here are some instances of such differences:

  1. x86_64 vs. wasm32
  2. std vs. no_std
  3. multithreaded (and multitasked) vs. exclusively singlethreaded
  4. panic=unwind vs. panic=abort
  5. shared address space vs. an address space exclusively owned by the sandbox

These might seem small, but their significance should not be underestimated, since they bear the risk of consensus errors.

Let's examine each of these.

memory and allocator

The native runtime essentially has access to an unlimited amount of memory, and the allocator doesn't matter for it.

The wasm runtime has access to a finite amount of memory. Moreover, the amount of memory actually available to the wasm runtime is unpredictable because of inefficiencies in the allocator.

The exact amount of memory available matters less than the fact that one environment has a sharp limit and the other doesn't. The wasm runtime hitting this sharp edge is a potential consensus issue.
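A hypothetical illustration (not actual Substrate code) of why that sharp edge matters:

// Natively this allocation will almost certainly succeed; in wasm it may exhaust the
// sandbox's linear memory (or run into allocator limitations) and trap, tearing down
// the instance. Same logic, different failure behavior.
fn collect_working_set(items: impl Iterator<Item = [u8; 1024]>) -> Vec<[u8; 1024]> {
    // No explicit bound on the number of items: fine natively,
    // a latent consensus hazard in wasm.
    items.collect()
}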

behavior of mutable globals

From the perspective of wasm, Rust global variables (be they thread_locals, statics, or what have you) are essentially compiled down to plain globals in a single-threaded context, or thread-locals in a multithreaded one.

From the perspective of the native runtime, however, the translation is direct: a thread_local is translated to a thread_local, and a static global ends up as a static global. That's a problem, since runtime authors have to be careful and respect the threading aspect.

A more worrying difference, though, is that globals in wasm are always restored, i.e. when the wasm runtime receives control it can assume that all globals are set to their initial values.

In the native runtime, the behavior depends on the exact kind of global. In the case of a thread_local, it holds whatever value the last call on that thread left in it, and you had better not use static globals in the native runtime at all.
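A sketch of the hazard (a hypothetical runtime-side counter, not code from Substrate):

// Compiled to wasm, this static lives in the instance's linear memory and is back at 0
// every time a fresh instance handles a call. Compiled natively, it is a process-wide
// static shared between threads and keeps its value across calls.
static mut CALLS: u64 = 0;

fn bump_and_get() -> u64 {
    // Tolerable in single-threaded wasm, but a data race natively if two calls run
    // concurrently: exactly the kind of care a runtime author has to take.
    unsafe {
        CALLS += 1;
        CALLS
    }
}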

word size differences

While we try to avoid any dependence on usize in our codebase, the difference can still be observed in some edge cases.

For example, there was a recent discussion about whether sort and sort_unstable give the same results. Different people gave different answers.

AFAIR, somebody pointed out that sort_unstable uses pattern-defeating quicksort, which defeats patterns by random shuffling and thus cannot be used in a deterministic environment. I was surprised at the time (how would it obtain entropy in wasm?) and at first thought it wasn't a problem.

My investigation then showed that there is indeed a PRNG in action, seeded deterministically. However, it generates a usize using different code paths for 32-bit and 64-bit platforms. I haven't dug deeper to find out whether this would actually lead to a problem, especially since we migrated away from sort_unstable just in case.

The thing I want to draw your attention to is how subtle this difference is and what a treacherous trick libcore played on us here.

A more worrying issue, though, is that someone else pointed out that the same results are not guaranteed across platforms. I guess that extends even to different versions of rustc (or rather libcore) behaving differently, and the compilers do differ between the native and wasm runtimes.
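A contrived but concrete illustration of how the word size can leak (hypothetical code, not taken from Substrate):

// Anything computed through usize can diverge between wasm32 and x86_64.
fn bucket_for(key: u64) -> usize {
    // On wasm32, `key as usize` truncates to the low 32 bits and the multiplication
    // wraps at 2^32; on x86_64 neither happens. Same input, different bucket.
    (key as usize).wrapping_mul(2_654_435_761) % 97
}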

panics

The coding guidelines state that panics in the runtime must be avoided at all costs: an exploitable panic is a potential DoS vector. It is not game over, though, since there are additional mitigations in place. For instance, IIUC we ban a peer that sent us a panicking transaction.

That, however, also has its cost: we must compile the Substrate node with panic=unwind. While it doesn't have a direct impact on performance (the mechanism is designed to be zero-cost), it has every chance of affecting performance indirectly through code bloat and thrashing the icache.

My very quick and dirty benchmark shows that if you compile the node with panic=abort, syncing gets slightly faster (0.8.24 on rustc 1.48.0-nightly (fc2daaae6 2020-09-28)).

Apart from performance, panics also suffer from abstraction bleeding. We compile Rust code to wasm with panic=abort; a panic translates into a wasm trap, which in turn tears down the instance safely. For the native runtime we emulate this behavior by wrapping calls into it in panic::catch_unwind, so a panic raised inside the native runtime is caught there. Simple, in theory.

The first complication is that a double panic aborts. During a panic, the call stack is unwound to the nearest enclosing catch_unwind, destructing all values found on the stack and calling their Drop implementations if any. If such a drop panics, the whole process is brought down at once. It might sound unlikely, but this has indeed happened.
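A minimal sketch of the catch_unwind emulation described above (not the actual sc-executor code):

use std::panic::{self, AssertUnwindSafe};

// Wrap a call into the native runtime so that a runtime panic is converted into an
// error, mirroring the wasm trap that safely tears down a wasm instance.
fn call_native_runtime<R>(call: impl FnOnce() -> R) -> Result<R, String> {
    panic::catch_unwind(AssertUnwindSafe(call))
        // If a Drop impl panics again while the stack is being unwound here, the
        // process aborts before this error path is ever reached.
        .map_err(|_| "native runtime panicked; treated like a wasm trap".to_string())
}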

The second complication arises from the fact that we seek to present the user with something like the following:

Version: 0.7.0-3778e05-x86_64-macos

   0: backtrace::backtrace::trace
   1: backtrace::capture::Backtrace::new
   2: sp_panic_handler::set::{{closure}}
   3: std::panicking::rust_panic_with_hook
   4: std::panicking::begin_panic
   5: frame_executive::Executive<System,Block,Context,UnsignedValidator,AllModules,COnRuntimeUpgrade>::execute_block
   ... <snip>
  36: tokio::runtime::context::enter
  37: std::sys_common::backtrace::__rust_begin_short_backtrace
  38: core::ops::function::FnOnce::call_once{{vtable.shim}}
  39: std::sys::unix::thread::Thread::new::thread_start
  40: __pthread_start


Thread 'tokio-runtime-worker' panicked at 'Storage root must match that calculated.', /Users/kun/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/macros.rs:13

This is a bug. Please report it at:

	https://github.com/paritytech/substrate/issues/new

Hash: given=1fb606cbe8cf369d3ff130647d53ff61f6a677d0288b6b2c1ac6fb9ed87dc3cc, expected=f7e930bcbf0380e9c1c30b8125e471f2756680b4b37d7f9e94798c144e7821ab

It works like this: there is a process-wide hook maintained by sp-panic-handler. Whenever a panic occurs, the hook prints the message. Apart from that, the panic handler either exits the process or not depending on a special thread-local flag.

By default the flag is set to abort. Before entering the native runtime, however, it is set to just unwind, so the mechanism mentioned above handles the panic appropriately. We then need to set a special guard again when the native runtime calls back into the node through the Substrate Runtime Interface: we assume that the runtime interface implementations don't panic (they have a dedicated path for returning errors to the node), so if one does panic we want to treat it as a node error and abort.

For example, we expect that the backend can always return the storage entries requested by the runtime. We even have this proof:

const EXT_NOT_ALLOWED_TO_FAIL: &str = "Externalities not allowed to fail within runtime";

Except this can indeed happen in a light client. Think of a light client that re-executes a runtime call with a witness that (inadvertently or maliciously) lacks some trie nodes. In that case the backend legitimately returns an error, but because of EXT_NOT_ALLOWED_TO_FAIL we bring down the whole node. (We also cannot change the interface of the storage functions, since that would be even worse: the wasm runtime would have to deal with inherently unrecoverable errors.)

To mitigate this, another flag state called never_abort was introduced, used for exactly this case. So in the end we have a tri-state panic handler with quite non-obvious and far-reaching semantics. I assume that should a legitimate error occur in the host function implementations, it will be attributed to the untrusted backend.
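A sketch of that tri-state mechanism (hypothetical types; the real logic lives in sp-panic-handler and the executor):

use std::cell::Cell;

// What the panic hook should do, tracked per thread.
#[derive(Clone, Copy)]
enum OnPanic {
    Abort,      // default: a panic anywhere in the node kills the process
    Unwind,     // inside the native runtime: let catch_unwind handle it
    NeverAbort, // serving a possibly incomplete backend, e.g. a light-client witness
}

thread_local! {
    static ON_PANIC: Cell<OnPanic> = Cell::new(OnPanic::Abort);
}

// RAII guard that flips the flag for the duration of a scope and restores the
// previous mode on drop.
struct PanicGuard(OnPanic);

impl PanicGuard {
    fn set(mode: OnPanic) -> Self {
        PanicGuard(ON_PANIC.with(|f| f.replace(mode)))
    }
}

impl Drop for PanicGuard {
    fn drop(&mut self) {
        let previous = self.0;
        ON_PANIC.with(|f| f.set(previous));
    }
}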

with_std and without_std

Because of the differences between the environments, the way the node interacts with the runtime (and the runtime with the node) differs depending on which environment we are dealing with.

The code paths do differ between the environments. For instance, the way parameters are passed across the boundary in runtime_interface may be different in each.

Ironically, though, most of our tests go through the native path. We will touch on this point more later.
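For illustration, this is roughly how such an interface is declared (a sketch modeled on the hashing interface in sp-io; the trait name here is made up):

// The runtime_interface macro expands this single declaration into two very different
// things: a direct function call for the native runtime, and an FFI shim that marshals
// the parameters across the wasm boundary for the wasm runtime.
#[sp_runtime_interface::runtime_interface]
pub trait Example {
    /// Hash the given data on the node side.
    fn hash(data: &[u8]) -> [u8; 32] {
        sp_core::hashing::blake2_256(data)
    }
}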

runtime_version

The hassle around bumping the runtime_version is primarily needed to answer the question: can I pass control to the native runtime to handle this call?

If the runtime version didn't alter any behavior, we could demote it to something more convenient that would still do the job: e.g. maintain a simple upgrade counter, use the block number of the upgrade, or simply fetch the crate version from the runtime crate's Cargo.toml.
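For context, this is roughly what that versioning looks like today (abridged from a typical FRAME runtime; the exact field set and values vary by Substrate version):

use sp_runtime::create_runtime_str;
use sp_version::RuntimeVersion;

pub const VERSION: RuntimeVersion = RuntimeVersion {
    spec_name: create_runtime_str!("my-chain"),
    impl_name: create_runtime_str!("my-chain-node"),
    authoring_version: 1,
    // The native runtime is only eligible to execute a call when this matches the
    // version of the on-chain wasm blob, hence the bumping ritual on every change.
    spec_version: 260,
    impl_version: 0,
    // Generated elsewhere by `impl_runtime_apis!`.
    apis: RUNTIME_API_VERSIONS,
    transaction_version: 1,
};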

Life without the native runtime optimization

Hopefully I have managed to convince you that the costs of the native runtime are far from trivial.

But what would we gain if we removed the native runtime optimization?

Compile Times

First of all, we wouldn't have to compile the runtime dependency graph twice, which should be a nice improvement.

All the more so considering that the runtime is heavy on generics.

Decouple Runtime Releases from Node Releases

Removing the native runtime would allow us to decouple runtime upgrades from node upgrades.

Apart from that, we might gain the ability to introduce The Substrate Node, i.e. a pre-compiled universal node (aka Bring Your Own Runtime) that would serve as a go-to solution for blockchains that are happy with the default out-of-the-box FRAME experience.

Complications

Testing

One thing that won't let us get rid of all the complexity associated with the native runtime environment is that our unit testing for runtime code happens primarily in native.

While that doesn't stop us from proceeding with ripping out the native code, it does mean we will have to keep supporting the native environment in some form. At the very least, we could reserve native/std within the runtime context exclusively for testing. That means we could simplify and de-optimize these code paths, perhaps trading performance for better diagnostics, since testing doesn't require utmost efficiency.

However, the fact that we exercise the native code paths during testing and not the wasm paths is a bit alarming. It was actually a big source of errors in the early days of the seal contracts pallet, when it finally started to be used within wasm runtimes.

In an ideal world, we would have a way to write runtime tests with an experience indistinguishable from, or better than, what we have right now, but which exercises more of the parts actually present on the production paths, i.e. runs in wasm.

That can be introduced incrementally though.

RPC

Theoretically, a Substrate RPC gateway backed by a node with the latest runtime version compiled in can handle 2x+ more throughput than a node that relies exclusively on the wasm runtime. I don't have a good solution for this problem.

Offchain Workers

To be honest I am not sure about this, but I think one of the selling points of offchain workers is that they are upgradable.

In that case, one would think it's not a big problem to remove the native runtime; however, in presentations offchain workers are described as being able to opt out of wasm.

I guess this still has some advantages over external processes, e.g. close access to the trie, but I am not sure.

For instance, there is substrate-archive, an external process that parses all the data and stuffs it into a relational database. In order to access the blockchain's data it goes... directly to RocksDB.

Which makes me wonder whether we should provide special APIs for those use cases, and leave offchain workers wasm-only.

Discussion

Now, I have probably missed some things and gotten others wrong. This decision permeates Substrate, and perhaps some things have ossified around it. So please speak up or add anything you have.

The goal of this discussion is ultimately to agree whether this would be a good change or not, and under which circumstances. Only if we reach consensus can we start talking about the particular steps to achieve it.
