
Conversation

@kentosugama
Contributor

No description provided.

@github-actions

github-actions bot commented Mar 14, 2023

Warning
The flamegraph link only works after you merge.

Heartbeat / Timer

Measure the cost of empty heartbeat and timer job.

  • setTimer measures both the setTimer(0) call and the execution of the empty job.
  • It is not easy to reliably capture both events in one flamegraph, because implementation details
    of the replica can affect how we measure this. A correct flamegraph typically contains both the setTimer and canister_global_timer functions; if either is missing, we may need to adjust the script.

Heartbeat

| | binary_size | heartbeat |
| --- | --- | --- |
| Motoko | 121_445 | 11_271 |
| Rust | 26_919 | 830 |

Timer

| | binary_size | setTimer | cancelTimer |
| --- | --- | --- | --- |
| Motoko | 135_274 | 31_898 | 1_718 |
| Rust | 421_265 | 50_847 | 9_660 |

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain
the same elements and receive exactly the same queries. The columns in the table measure:

  • generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, this usually triggers the GC; the other columns are unlikely to trigger GC.
  • max mem. For Motoko, reports rts_max_live_size after the generate call; for Rust, reports the Wasm's memory page * 32Kb.
  • batch_get 50. Find 50 elements in the collection.
  • batch_put 50. Insert 50 elements into the collection.
  • batch_remove 50. Remove 50 elements from the collection.
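The fixed-seed setup can be sketched with a minimal linear congruential generator. This is an illustration of the principle, not the benchmark's actual RNG:

```rust
// Toy deterministic RNG (LCG). A fixed seed yields an identical element
// sequence, so every collection under test receives the same workload.
struct Lcg(u64);

impl Lcg {
    fn new(seed: u64) -> Self { Lcg(seed) }
    // Constants from a common 64-bit LCG; any full-period LCG works here.
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

fn main() {
    let mut a = Lcg::new(42);
    let mut b = Lcg::new(42);
    let xs: Vec<u32> = (0..5).map(|_| a.next_u32()).collect();
    let ys: Vec<u32> = (0..5).map(|_| b.next_u32()).collect();
    // Same seed => same elements and queries for every collection.
    assert_eq!(xs, ys);
    println!("{:?}", xs);
}
```

Determinism matters here because a different element order could change hash collisions or tree shapes, and with it the instruction counts being compared.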

💎 Takeaways

  • The platform only charges for instruction count. Data structures that exploit caching and locality gain no cost advantage.
  • There is a limit on the maximal cycles per round, so asymptotic behavior matters less than performance up to a fixed N. In extreme cases, an O(10000 n log n) algorithm may hit the limit while an O(n^2) algorithm runs just fine.
  • Amortized algorithms and GC may need to be more eager to avoid hitting the cycle limit in a particular round.
  • Rust costs more cycles to process complicated Candid data, but is more efficient at core computations.
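The second point can be made concrete with a little arithmetic. The sketch below (using illustrative cost formulas, not measured data) finds the input size at which the quadratic algorithm actually becomes the worse choice:

```rust
// With a per-round cycle limit, constant factors can dominate asymptotics:
// an O(10000 * n log n) algorithm costs more than an O(n^2) one for every
// n below the crossover point.
fn cost_nlogn(n: f64) -> f64 { 10_000.0 * n * n.log2() }
fn cost_quadratic(n: f64) -> f64 { n * n }

fn main() {
    // First n where the quadratic algorithm overtakes the n log n one.
    let crossover = (2..)
        .map(|n| n as f64)
        .find(|&n| cost_quadratic(n) > cost_nlogn(n))
        .unwrap();
    println!("quadratic overtakes only at n = {crossover}");
    // For any workload below the crossover (well past 100k elements here),
    // the "worse" O(n^2) algorithm is the cheaper one.
    assert!(cost_quadratic(50_000.0) < cost_nlogn(50_000.0));
}
```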

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in these measurements.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Hopefully, once deterministic time slicing is ready, we can measure performance on larger memory footprints.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, which offers the same interface as std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of the Motoko interpreter. External developers can ignore this table for now.
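The hashmap spike in batch_put 50 is the classic amortized-growth pattern. A toy model of a doubling-backed array (not the benchmark's actual hashmap) shows why the average put is cheap while one unlucky put pays for a full copy:

```rust
// Toy model of amortized doubling: when length hits capacity, every
// existing element is copied into a larger backing array.
fn main() {
    let mut capacity = 8usize;
    let mut len = 0usize;
    let mut copies = 0usize; // elements moved during grow operations

    for _ in 0..50_000 {
        if len == capacity {
            copies += len; // grow: copy every existing element
            capacity *= 2;
        }
        len += 1;
    }
    // Total copy work stays within ~2x the inserts: amortized O(1) per put.
    assert!(copies < 2 * len);
    // But the single put that lands exactly at capacity pays for a full
    // copy, which is what a per-round cycle limit can expose.
    println!("inserts = {len}, copied elements = {copies}");
}
```

This is also why the takeaway above suggests amortized structures may need to grow more eagerly: spreading the copy work out keeps any single round under the cycle limit.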

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- | --- |
| hashmap | 162_540 | 2_097_038_276 | 9_102_052 | 1_115_747 | 609_004_394 | 1_057_139 |
| triemap | 166_619 | 2_021_341_114 | 9_716_008 | 773_985 | 1_855_492 | 1_035_118 |
| rbtree | 164_895 | 1_881_678_586 | 10_102_184 | 699_641 | 1_693_062 | 938_158 |
| splay | 163_477 | 2_087_719_583 | 9_302_108 | 1_142_246 | 1_958_463 | 1_142_695 |
| btree | 192_198 | 1_918_261_233 | 8_157_968 | 873_973 | 1_758_607 | 946_667 |
| zhenya_hashmap | 157_503 | 1_636_688_227 | 9_301_800 | 645_183 | 1_444_666 | 649_719 |
| btreemap_rs | 418_565 | 120_378_963 | 1_638_400 | 59_680 | 135_119 | 60_771 |
| hashmap_rs | 407_805 | 49_413_271 | 1_835_008 | 19_785 | 58_550 | 21_067 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
| --- | --- | --- | --- | --- | --- |
| heap | 149_533 | 690_802_078 | 1_400_024 | 369_948 | 722_102 |
| heap_rs | 386_575 | 4_975_609 | 819_200 | 50_696 | 20_838 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- | --- |
| hashmap | 162_540 | 419_471_830 | 1_820_844 | 1_114_027 | 122_731_309 | 1_054_914 |
| hashmap_rs | 407_805 | 10_188_331 | 950_272 | 19_116 | 57_878 | 20_009 |
| imrc_hashmap_rs | 417_209 | 19_067_936 | 1_572_864 | 30_490 | 114_539 | 37_517 |
| movm_rs | 1_694_121 | 1_081_538_103 | 2_654_208 | 2_678_181 | 6_831_749 | 5_331_044 |
| movm_dynamic_rs | 1_851_191 | 545_039_960 | 2_129_920 | 2_144_206 | 2_943_937 | 2_123_781 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
| --- | --- | --- | --- | --- | --- | --- |
| Motoko | 141_478 | 129_115 | 18_627 | 8_487 | 14_604 | 3_663 |
| Rust | 455_851 | 506_613 | 58_042 | 39_026 | 81_523 | 46_201 |

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO,
    with heartbeat disabled to make profiling easier; heartbeat performance is measured in a separate benchmark.
  • DIP721 NFT

Note

  • The cost difference is mainly due to Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, serde dynamically deserializes data based on the types on the wire.
  • We could improve performance on the Rust side by using parser combinators, but it is a challenge to maintain the ergonomics provided by serde.
  • Real-world applications tend to send small data on each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
| --- | --- | --- | --- | --- | --- |
| Motoko | 229_963 | 41_021 | 18_070 | 12_751 | 14_944 |
| Rust | 726_247 | 499_376 | 92_870 | 115_070 | 125_850 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
| --- | --- | --- | --- | --- |
| Motoko | 183_305 | 12_164 | 22_456 | 4_747 |
| Rust | 767_457 | 136_720 | 352_205 | 86_991 |

Motoko Garbage Collection

Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call. The cycle costs reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit.

  • default. Compile with the default GC option. With the current GC scheduler, generate triggers the copying GC; the other methods do not trigger GC.
  • copying. Compile with --force-gc --copying-gc.
  • compacting. Compile with --force-gc --compacting-gc.
  • generational. Compile with --force-gc --generational-gc.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- |
| default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
| copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
| compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
| generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |

@github-actions

github-actions bot commented Mar 16, 2023

Note
Diffing the performance result against the published result from the main branch.

Heartbeat

| | binary_size | heartbeat |
| --- | --- | --- |
| Motoko | 121_445 ($\textcolor{green}{-17.45\%}$) | 11_271 ($\textcolor{green}{-5.59\%}$) |
| Rust | 26_919 ($\textcolor{green}{-24.49\%}$) | 830 ($\textcolor{red}{41.40\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
| --- | --- | --- | --- |
| Motoko | 135_274 ($\textcolor{green}{-16.98\%}$) | 31_898 ($\textcolor{green}{-7.91\%}$) | 1_718 ($\textcolor{green}{-10.66\%}$) |
| Rust | 421_265 ($\textcolor{green}{-19.86\%}$) | 50_847 ($\textcolor{green}{-8.89\%}$) | 9_660 ($\textcolor{green}{-8.36\%}$) |

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- | --- |
| hashmap | 162_540 ($\textcolor{green}{-16.91\%}$) | 2_097_038_276 ($\textcolor{green}{-12.15\%}$) | 9_102_052 | 1_115_747 ($\textcolor{green}{-13.74\%}$) | 609_004_394 ($\textcolor{green}{-11.64\%}$) | 1_057_139 ($\textcolor{green}{-13.71\%}$) |
| triemap | 166_619 ($\textcolor{green}{-17.27\%}$) | 2_021_341_114 ($\textcolor{green}{-11.71\%}$) | 9_716_008 | 773_985 ($\textcolor{green}{-13.33\%}$) | 1_855_492 ($\textcolor{green}{-12.28\%}$) | 1_035_118 ($\textcolor{green}{-13.12\%}$) |
| rbtree | 164_895 ($\textcolor{green}{-17.38\%}$) | 1_881_678_586 ($\textcolor{green}{-11.14\%}$) | 10_102_184 | 699_641 ($\textcolor{green}{-15.05\%}$) | 1_693_062 ($\textcolor{green}{-11.70\%}$) | 938_158 ($\textcolor{green}{-13.29\%}$) |
| splay | 163_477 ($\textcolor{green}{-17.28\%}$) | 2_087_719_583 ($\textcolor{green}{-11.53\%}$) | 9_302_108 | 1_142_246 ($\textcolor{green}{-12.50\%}$) | 1_958_463 ($\textcolor{green}{-12.00\%}$) | 1_142_695 ($\textcolor{green}{-12.53\%}$) |
| btree | 192_198 ($\textcolor{green}{-18.31\%}$) | 1_918_261_233 ($\textcolor{green}{-11.59\%}$) | 8_157_968 | 873_973 ($\textcolor{green}{-13.19\%}$) | 1_758_607 ($\textcolor{green}{-12.07\%}$) | 946_667 ($\textcolor{green}{-13.11\%}$) |
| zhenya_hashmap | 157_503 ($\textcolor{green}{-16.72\%}$) | 1_636_688_227 ($\textcolor{green}{-11.78\%}$) | 9_301_800 | 645_183 ($\textcolor{green}{-13.55\%}$) | 1_444_666 ($\textcolor{green}{-12.54\%}$) | 649_719 ($\textcolor{green}{-13.67\%}$) |
| btreemap_rs | 418_565 ($\textcolor{green}{-18.90\%}$) | 120_378_963 ($\textcolor{green}{-2.76\%}$) | 1_638_400 | 59_680 ($\textcolor{green}{-0.07\%}$) | 135_119 ($\textcolor{green}{-3.67\%}$) | 60_771 ($\textcolor{green}{-2.12\%}$) |
| hashmap_rs | 407_805 ($\textcolor{green}{-19.12\%}$) | 49_413_271 ($\textcolor{green}{-7.18\%}$) | 1_835_008 | 19_785 ($\textcolor{green}{-7.38\%}$) | 58_550 ($\textcolor{green}{-8.22\%}$) | 21_067 ($\textcolor{green}{-7.51\%}$) |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
| --- | --- | --- | --- | --- | --- |
| heap | 149_533 ($\textcolor{green}{-17.11\%}$) | 690_802_078 ($\textcolor{green}{-13.26\%}$) | 1_400_024 | 369_948 ($\textcolor{green}{-12.08\%}$) | 722_102 ($\textcolor{green}{-13.46\%}$) |
| heap_rs | 386_575 ($\textcolor{green}{-18.64\%}$) | 4_975_609 ($\textcolor{green}{-1.31\%}$) | 819_200 | 50_696 ($\textcolor{green}{-5.35\%}$) | 20_838 ($\textcolor{green}{-6.48\%}$) |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- | --- |
| hashmap | 162_540 ($\textcolor{green}{-16.91\%}$) | 419_471_830 ($\textcolor{green}{-12.15\%}$) | 1_820_844 | 1_114_027 ($\textcolor{green}{-13.74\%}$) | 122_731_309 ($\textcolor{green}{-11.63\%}$) | 1_054_914 ($\textcolor{green}{-13.71\%}$) |
| hashmap_rs | 407_805 ($\textcolor{green}{-19.12\%}$) | 10_188_331 ($\textcolor{green}{-7.08\%}$) | 950_272 | 19_116 ($\textcolor{green}{-7.54\%}$) | 57_878 ($\textcolor{green}{-8.28\%}$) | 20_009 ($\textcolor{green}{-7.66\%}$) |
| imrc_hashmap_rs | 417_209 ($\textcolor{green}{-19.23\%}$) | 19_067_936 ($\textcolor{green}{-4.00\%}$) | 1_572_864 | 30_490 ($\textcolor{green}{-4.18\%}$) | 114_539 ($\textcolor{green}{-4.72\%}$) | 37_517 ($\textcolor{green}{-1.06\%}$) |
| movm_rs | 1_694_121 ($\textcolor{green}{-16.76\%}$) | 1_081_538_103 ($\textcolor{green}{-1.57\%}$) | 2_654_208 | 2_678_181 ($\textcolor{green}{-2.40\%}$) | 6_831_749 ($\textcolor{green}{-1.61\%}$) | 5_331_044 ($\textcolor{green}{-1.58\%}$) |
| movm_dynamic_rs | 1_851_191 ($\textcolor{green}{-17.79\%}$) | 545_039_960 ($\textcolor{green}{-1.90\%}$) | 2_129_920 | 2_144_206 ($\textcolor{green}{-1.95\%}$) | 2_943_937 ($\textcolor{green}{-2.20\%}$) | 2_123_781 ($\textcolor{green}{-1.95\%}$) |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
| --- | --- | --- | --- | --- | --- | --- |
| Motoko | 141_478 ($\textcolor{green}{-17.63\%}$) | 129_115 ($\textcolor{green}{-17.71\%}$) | 18_627 ($\textcolor{green}{-5.17\%}$) | 8_487 ($\textcolor{green}{-7.20\%}$) | 14_604 ($\textcolor{green}{-6.00\%}$) | 3_663 ($\textcolor{green}{-8.45\%}$) |
| Rust | 455_851 ($\textcolor{green}{-19.18\%}$) | 506_613 ($\textcolor{green}{-27.24\%}$) | 58_042 ($\textcolor{green}{-8.46\%}$) | 39_026 ($\textcolor{green}{-9.64\%}$) | 81_523 ($\textcolor{green}{-8.79\%}$) | 46_201 ($\textcolor{green}{-8.59\%}$) |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
| --- | --- | --- | --- | --- | --- |
| Motoko | 229_963 ($\textcolor{green}{-20.99\%}$) | 41_021 ($\textcolor{green}{-8.12\%}$) | 18_070 ($\textcolor{green}{-10.00\%}$) | 12_751 ($\textcolor{green}{-9.95\%}$) | 14_944 ($\textcolor{green}{-11.16\%}$) |
| Rust | 726_247 ($\textcolor{green}{-23.10\%}$) | 499_376 ($\textcolor{green}{-7.86\%}$) | 92_870 ($\textcolor{green}{-9.36\%}$) | 115_070 ($\textcolor{green}{-8.59\%}$) | 125_850 ($\textcolor{green}{-9.34\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
| --- | --- | --- | --- | --- |
| Motoko | 183_305 ($\textcolor{green}{-22.08\%}$) | 12_164 ($\textcolor{green}{-9.08\%}$) | 22_456 ($\textcolor{green}{-9.00\%}$) | 4_747 ($\textcolor{green}{-11.39\%}$) |
| Rust | 767_457 ($\textcolor{green}{-23.11\%}$) | 136_720 ($\textcolor{green}{-7.09\%}$) | 352_205 ($\textcolor{green}{-7.61\%}$) | 86_991 ($\textcolor{green}{-8.88\%}$) |

Motoko Garbage Collection

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- |
| default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
| copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
| compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
| generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |

@kentosugama kentosugama mentioned this pull request Apr 6, 2023
mergify bot pushed a commit to dfinity/sdk that referenced this pull request Apr 21, 2023
[LANG-124](https://dfinity.atlassian.net/browse/LANG-124)

[Profiling data](https://docs.google.com/document/d/1ICXF083-hfRZr2OfRUk8OxmZqYDQL4XINHQ0FXuDqTA/edit) suggests that it would be useful to take advantage of the performance and binary size wasm optimizations offered by [`wasm-opt`](https://github.com/WebAssembly/binaryen).

See also results in the [canister profiling repo](dfinity/canister-profiling#35).

This tool has been integrated with `ic-wasm` in this PR: dfinity/ic-wasm#28. The tool is added under the `shrink` command that is currently used in `dfx` to perform binary size reduction of canisters.

The next step is to expose this feature by letting users opt into the optimizer and specify the optimization level in `dfx.json`, and then invoking `ic-wasm` in `dfx` according to that setting, similarly to `shrink`.

Note: This tool has been used in the past to perform binary size reductions for canisters, but was replaced because it deletes the custom metadata sections from wasm modules. The `ic-wasm` feature invokes `wasm-opt` while preserving these sections.

Example usage in `dfx.json`:
```json
"canisters": {
  "backend": {
    "optimize": "cycles"
  }
}
```
@kentosugama kentosugama deleted the wasm-opt-3 branch May 9, 2023 16:34
