Profile native compiled wasm #1845

cmichi · 2019-02-21T14:06:44Z

This addresses #1519 and this PR is not supposed to be merged, it's just here for discussion and can be closed afterwards.

The idea is to gain insight into how much optimization room we have for calls into wasm. To get an idea of the "ground truth" I use wasm2c to convert our node_runtime.wasm to C, which I then compile as a shared library, which I link to the substrate executable.

I then timed invocations of call_in_wasm_module in a very simple manner and ran cargo run --release -- --dev for 30 blocks. I did the same for wasm calls (here is the timed master branch with forced wasm method invocations: https://github.com/paritytech/substrate/tree/cmichi-time-wasm-calls) and for Rust native method invocations (https://github.com/paritytech/substrate/tree/cmichi-time-native-calls). No transactions took place on the chain, only empty blocks were produced (though there already happens a lot of stuff, even for empty blocks).

This is only a very rough benchmark! I'm totally aware of that. For example, system interruptions by the OS scheduler are not accounted for and could skew the results. But: a better benchmark ‒ e.g. benchmarking --import-blocks of a chain dump and monitoring via perf-stat(1) ‒ would require a lot more work. This is because substrate handles multiple wasmi ModuleInstances and this branch uses shared memory with the so, which basically means only one "instance" can be run at a time.

Either way, I ran the measurements on an otherwise idling system and I think the results are sufficient to give a rough idea how much headroom there is for the different methods.

wasm2c_wo_bc is the shared library compiled with explicit bounds checking disabled, and it can be considered as more realistic, since a real wasm VM implementation most likely will have signal based bounds checking mechanism which is zero cost for the non-exceptional case.

# Core_execute_block
  rust_native:        457 µs (100.0%)
 wasm2c_wo_bc:       1958 µs (23.3%)
       wasm2c:       1894 µs (24.1%)
        wasmi:       7168 µs (6.4%)

# Core_version
  rust_native:          5 µs (100.0%)
 wasm2c_wo_bc:         10 µs (52.0%)
       wasm2c:         10 µs (49.9%)
        wasmi:        155 µs (3.4%)

# Core_authorities
  rust_native:         14 µs (100.0%)
 wasm2c_wo_bc:         32 µs (43.4%)
       wasm2c:         32 µs (42.7%)
        wasmi:        210 µs (6.6%)

# Core_initialise_block
  rust_native:        161 µs (100.0%)
 wasm2c_wo_bc:       1190 µs (13.5%)
       wasm2c:       1092 µs (14.7%)
        wasmi:       4896 µs (3.3%)

# BlockBuilder_apply_extrinsic
  rust_native:         44 µs (100.0%)
 wasm2c_wo_bc:        314 µs (14.0%)
       wasm2c:        310 µs (14.2%)
        wasmi:       1979 µs (2.2%)

# BlockBuilder_finalise_block
  rust_native:        467 µs (100.0%)
 wasm2c_wo_bc:        832 µs (56.2%)
       wasm2c:        801 µs (58.4%)
        wasmi:       1012 µs (46.2%)

# BlockBuilder_inherent_extrinsics
  rust_native:         18 µs (100.0%)
 wasm2c_wo_bc:         93 µs (19.6%)
       wasm2c:         88 µs (20.9%)
        wasmi:        615 µs (3.0%)

# AuraApi_slot_duration
  rust_native:         24 µs (100.0%)
 wasm2c_wo_bc:         29 µs (82.8%)
       wasm2c:         31 µs (77.4%)
        wasmi:         70 µs (34.3%)

# GrandpaApi_grandpa_authorities
  rust_native:         20 µs (100.0%)
 wasm2c_wo_bc:         34 µs (58.8%)
       wasm2c:         33 µs (60.6%)
        wasmi:        118 µs (16.9%)

# GrandpaApi_grandpa_pending_change
  rust_native:          2 µs (100.0%)
 wasm2c_wo_bc:          8 µs (21.7%)
       wasm2c:          9 µs (20.6%)
        wasmi:         73 µs (2.4%)

Taken from https://github.com/WebAssembly/wabt/tree/master/wasm2c.

eira-fransham · 2019-02-21T18:00:02Z

This is really fantastic - it's a great ideal case to aim towards with our Wasm if nothing else. One way to mitigate the cost of functions that will be called many hundreds of thousands of times like the runtime functions is to patch the calls directly instead of using the relocation table, but I don't know if that's necessary since for the runtime we expect users wanting the most speed to use native code directly anyway. A couple questions:

What's going on with AuraApi_slot_duration?
I don't think I quite follow the testing methodology - you're trying to have it exit immediately by processing no transactions so you only test the calling cost? That doesn't take into account inefficiencies like stack overflow checking or even just poor compilation of the Wasm itself even in the small amount of code that gets executed.
As @pepyakin pointed out: why did you not do --release?

cmichi · 2019-02-21T19:00:38Z

The initial version of the numbers in the first post was gathered without --release, I have updated them with everything re-executed in release mode.

@Vurich Thanks!

What's going on with AuraApi_slot_duration?

No idea, for AuraApi_slot_duration and GrandpaApi_grandpa_pending_change wasmi strangely performs better than the wasm2c version. I'll need to look at the implementation to see if something comes up.

I don't think I quite follow the testing methodology - you're trying to have it exit immediately by processing no transactions so you only test the calling cost? That doesn't take into account inefficiencies like stack overflow checking or even just poor compilation of the Wasm itself even in the small amount of code that gets executed.

Yeah, I totally agree, processing transactions would of course provide better insights. This limitation was not done intentionally though, it's just the lowest hanging fruit to test. Getting it to work for a more complex chain would require a more extensive implementation.

Regarding "so you only test the calling cost?": the implementation behind these methods is actually executed (e.g. for Core_execute_block).

rphmeier · 2019-02-21T20:00:52Z

What's going on with AuraApi_slot_duration

It's only called at genesis, FWIW

Run `cargo run --release -- purge-chain --dev && cargo run --release -- --dev`. Besides this nothing works (no tests, no import, etc.).

`make default` has been executed in this commit. wasm2c is available at https://github.com/WebAssembly/wabt/tree/master/wasm2c.

Run it this way: git checkout cmichi-time-native-calls cargo run --release -- purge-chain --dev cargo run --release -- --dev > /tmp/rust_native 2>&1 git checkout cmichi-time-wasm-calls cargo run --release -- purge-chain --dev cargo run --release -- --dev > /tmp/wasmi 2>&1 git checkout cmichi-profile-native-compiled-wasm-2nd-approach cargo run --release -- purge-chain --dev cargo run --release -- --dev > /tmp/wasm2 2>&1 git checkout cmichi-profile-native-compiled-wasm-2nd-approach sed -i '/MEMCHECK(mem, addr, t1);/d' node_runtime.c gcc -fPIC -rdynamic -shared -o libnode_runtime.so node_runtime.c wasm-rt-impl.c cp -f libnode_runtime.so target/release/deps/ cargo run --release -- purge-chain --dev cargo run --release -- --dev > /tmp/wasm2c_wo_bc 2>&1 ./compile-stats.sh

cmichi · 2019-02-22T08:05:16Z

Found the issue with AuraApi_slot_duration and GrandpaApi_grandpa_pending_change ‒ I moved the timer in front of the immediate invocation of the method in wasmi/wasm2c. In the prior version I had timed the call_in_wasm_module method, which did a bit more then just invoke the method externally.

I updated the first post with the re-run metrics.

…8888bd9

pepyakin · 2019-02-27T13:56:09Z

I'm going to close this to clear up the PR list.

cmichi added 3 commits February 21, 2019 11:14

Add wasm2c files

7f39648

Taken from https://github.com/WebAssembly/wabt/tree/master/wasm2c.

Ensure wasm is always used as executor

a088c7c

Expose pointer to start of data in shared lib

6e19331

cmichi force-pushed the cmichi-profile-native-compiled-wasm-2nd-approach branch from 37e7c84 to 497080b Compare February 21, 2019 14:08

pepyakin added the A3-in_progress Pull request is in progress. No review needed at this stage. label Feb 21, 2019

devops-parity added the A4-gotissues label Feb 21, 2019

pepyakin added A0-please_review Pull request needs code review. A1-onice and removed A3-in_progress Pull request is in progress. No review needed at this stage. A4-gotissues labels Feb 21, 2019

cmichi force-pushed the cmichi-profile-native-compiled-wasm-2nd-approach branch from 4bad507 to f735373 Compare February 21, 2019 14:26

devops-parity added the A4-gotissues label Feb 21, 2019

cmichi force-pushed the cmichi-profile-native-compiled-wasm-2nd-approach branch from 5a03290 to ead0efa Compare February 21, 2019 16:47

gavofyork removed A0-please_review Pull request needs code review. A4-gotissues labels Feb 21, 2019

cmichi added 5 commits February 22, 2019 08:30

Execute wasm calls in shared library

56f72cd

Run `cargo run --release -- purge-chain --dev && cargo run --release -- --dev`. Besides this nothing works (no tests, no import, etc.).

Convert wasm to C and compile as shared lib

228731d

`make default` has been executed in this commit. wasm2c is available at https://github.com/WebAssembly/wabt/tree/master/wasm2c.

Add simple timer for duration of wasm calls

46d9386

Move measure to immediate execution

726d95a

cmichi force-pushed the cmichi-profile-native-compiled-wasm-2nd-approach branch from 6d04269 to 726d95a Compare February 22, 2019 07:59

devops-parity updated wasm runtime blobs b5a9f3a2 and merged in master …

603c15e

…8888bd9

devops-parity added the A4-gotissues label Feb 22, 2019

cmichi removed the A4-gotissues label Feb 23, 2019

pepyakin closed this Feb 27, 2019

cmichi mentioned this pull request Feb 27, 2019

Estimate performance gains using compilation #1519

Closed

pepyakin mentioned this pull request May 20, 2019

Wasm execution engine for runtime #2634

Closed

arkpar deleted the cmichi-profile-native-compiled-wasm-2nd-approach branch April 23, 2020 18:10

cmichi mentioned this pull request Sep 28, 2022

Member Request polkadot-fellows/seeding#19

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Profile native compiled wasm #1845

Profile native compiled wasm #1845

Uh oh!

cmichi commented Feb 21, 2019 •

edited

Loading

Uh oh!

eira-fransham commented Feb 21, 2019 •

edited

Loading

Uh oh!

cmichi commented Feb 21, 2019 •

edited

Loading

Uh oh!

rphmeier commented Feb 21, 2019

Uh oh!

cmichi commented Feb 22, 2019 •

edited

Loading

Uh oh!

pepyakin commented Feb 27, 2019 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Profile native compiled wasm #1845

Profile native compiled wasm #1845

Uh oh!

Conversation

cmichi commented Feb 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eira-fransham commented Feb 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmichi commented Feb 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rphmeier commented Feb 21, 2019

Uh oh!

cmichi commented Feb 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pepyakin commented Feb 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

cmichi commented Feb 21, 2019 •

edited

Loading

eira-fransham commented Feb 21, 2019 •

edited

Loading

cmichi commented Feb 21, 2019 •

edited

Loading

cmichi commented Feb 22, 2019 •

edited

Loading

pepyakin commented Feb 27, 2019 •

edited

Loading