Profile native compiled wasm #1845
Conversation
Force-pushed from 37e7c84 to 497080b.
Force-pushed from 4bad507 to f735373.
Force-pushed from 5a03290 to ead0efa.
This is really fantastic: it's a great ideal case to aim towards with our Wasm if nothing else. One way to mitigate the cost of functions that will be called many hundreds of thousands of times, like the runtime functions, is to patch the calls directly instead of going through the relocation table. But I don't know whether that's necessary, since for the runtime we expect users wanting the most speed to use native code directly anyway. A couple of questions:
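(For illustration, a minimal C sketch of the trade-off described above: a call routed through a relocation-style function-pointer table versus a direct call of the kind a patching step could emit. The names `execute_block`, `reloc_table`, `call_indirect`, and `call_direct` are hypothetical stand-ins, not Substrate's or wasm2c's actual mechanism.)

```c
/* Indirect call through a relocation-style table vs. a direct call.
 * Hypothetical stand-ins only; not actual Substrate/wasm2c code. */
#include <stdio.h>

static int execute_block(int n) { return n + 1; }

/* Relocation-style: the call site loads the target from a table first. */
static int (*reloc_table[1])(int) = { execute_block };

int call_indirect(int n) { return reloc_table[0](n); }

/* Patched: the target address is baked into the call instruction itself. */
int call_direct(int n) { return execute_block(n); }

int main(void) {
    printf("%d %d\n", call_indirect(41), call_direct(41));
    return 0;
}
```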
The initial version of the numbers in the first post was gathered without … @Vurich Thanks!
No idea, for …
Yeah, I totally agree: processing transactions would of course provide better insights. This limitation was not intentional though; it's just the lowest-hanging fruit to test. Getting it to work for a more complex chain would require a more extensive implementation. Regarding "so you only test the calling cost?": the implementation behind these methods is actually executed (e.g. for …).
It's only called at genesis, FWIW
Run `cargo run --release -- purge-chain --dev && cargo run --release -- --dev`. Besides this, nothing works (no tests, no import, etc.).
`make default` has been executed in this commit. wasm2c is available at https://github.com/WebAssembly/wabt/tree/master/wasm2c.
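(For context: the conversion step itself is roughly `wasm2c node_runtime.wasm -o node_runtime.c`, and the compiled result is an ordinary shared object. The sketch below shows one way such a library could be loaded dynamically; note that this PR links it at build time instead, and the export name `init` is a hypothetical placeholder, since the real symbol names depend on wasm2c's name mangling.)

```c
/* Minimal sketch: load a wasm2c-produced shared library and resolve one
 * symbol. "init" is a hypothetical placeholder name.
 * Build with: gcc main.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *lib = dlopen("./libnode_runtime.so", RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    void (*init_fn)(void) = (void (*)(void))dlsym(lib, "init");
    if (init_fn) {
        init_fn();
    }
    dlclose(lib);
    return 0;
}
```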
Run it this way:

git checkout cmichi-time-native-calls
cargo run --release -- purge-chain --dev
cargo run --release -- --dev > /tmp/rust_native 2>&1

git checkout cmichi-time-wasm-calls
cargo run --release -- purge-chain --dev
cargo run --release -- --dev > /tmp/wasmi 2>&1

git checkout cmichi-profile-native-compiled-wasm-2nd-approach
cargo run --release -- purge-chain --dev
cargo run --release -- --dev > /tmp/wasm2 2>&1

git checkout cmichi-profile-native-compiled-wasm-2nd-approach
sed -i '/MEMCHECK(mem, addr, t1);/d' node_runtime.c
gcc -fPIC -rdynamic -shared -o libnode_runtime.so node_runtime.c wasm-rt-impl.c
cp -f libnode_runtime.so target/release/deps/
cargo run --release -- purge-chain --dev
cargo run --release -- --dev > /tmp/wasm2c_wo_bc 2>&1

./compile-stats.sh
Force-pushed from 6d04269 to 726d95a.
Found the issue with …. I updated the first post with the re-run metrics.
I'm going to close this to clear up the PR list.
This addresses #1519. This PR is not supposed to be merged; it's just here for discussion and can be closed afterwards.
The idea is to gain insight into how much optimization room we have for calls into wasm. To get an idea of the "ground truth" I use wasm2c to convert our `node_runtime.wasm` to C, which I then compile as a shared library and link to the substrate executable.

I then timed invocations of `call_in_wasm_module` in a very simple manner and ran `cargo run --release -- --dev` for 30 blocks. I did the same for wasm calls (here is the timed master branch with forced wasm method invocations: https://github.com/paritytech/substrate/tree/cmichi-time-wasm-calls) and for Rust native method invocations (https://github.com/paritytech/substrate/tree/cmichi-time-native-calls). No transactions took place on the chain; only empty blocks were produced (though a lot already happens even for empty blocks).

This is only a very rough benchmark! I'm totally aware of that. For example, system interruptions by the OS scheduler are not accounted for and could skew the results. But a better benchmark (e.g. benchmarking `--import-blocks` of a chain dump and monitoring via `perf-stat(1)`) would require a lot more work. This is because substrate handles multiple wasmi `ModuleInstance`s, while this branch uses shared memory with the .so, which basically means only one "instance" can be run at a time. Either way, I ran the measurements on an otherwise idling system, and I think the results are sufficient to give a rough idea of how much headroom there is for the different methods.
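(To make "timed invocations in a very simple manner" concrete, here is a rough C sketch of the measurement idea. The actual instrumentation in these branches is Rust code around `call_in_wasm_module`; `call_into_runtime` is a hypothetical stand-in.)

```c
/* Wall-clock a single call and print the elapsed nanoseconds.
 * call_into_runtime is a hypothetical stand-in for the real call. */
#include <stdio.h>
#include <time.h>

static void call_into_runtime(void) {
    /* stand-in for the actual invocation of the compiled runtime */
}

int main(void) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    call_into_runtime();
    clock_gettime(CLOCK_MONOTONIC, &end);
    long long ns = (end.tv_sec - start.tv_sec) * 1000000000LL
                 + (end.tv_nsec - start.tv_nsec);
    printf("call took %lld ns\n", ns);
    return 0;
}
```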
`wasm2c_wo_bc` is the shared library compiled with explicit bounds checking disabled. It can be considered more realistic, since a real wasm VM implementation will most likely use a signal-based bounds checking mechanism, which is zero-cost in the non-exceptional case.
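(For readers unfamiliar with the `MEMCHECK` lines that the sed invocation above strips: below is a simplified illustration of an explicit bounds check on a linear-memory load versus the unchecked variant. This shows the shape of the cost, not wasm2c's actual macro. A signal-based scheme instead reserves guard pages around the linear memory so the MMU faults on out-of-bounds accesses, making the in-bounds path free.)

```c
/* Explicit vs. implicit bounds checking on a wasm-style linear memory.
 * Simplified illustration; not wasm2c's actual MEMCHECK macro. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *data;
    size_t size;
} linear_mem;

/* With explicit checking: a compare and branch on every access. */
static uint32_t load_u32_checked(linear_mem *mem, size_t addr) {
    if (addr + sizeof(uint32_t) > mem->size) abort(); /* trap */
    uint32_t v;
    memcpy(&v, mem->data + addr, sizeof v);
    return v;
}

/* What remains after stripping the check (the wasm2c_wo_bc build). */
static uint32_t load_u32_unchecked(linear_mem *mem, size_t addr) {
    uint32_t v;
    memcpy(&v, mem->data + addr, sizeof v);
    return v;
}

int main(void) {
    uint8_t buf[8] = {1, 0, 0, 0, 2, 0, 0, 0};
    linear_mem mem = { buf, sizeof buf };
    printf("%u %u\n", load_u32_checked(&mem, 0),
                      load_u32_unchecked(&mem, 4));
    return 0;
}
```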