
@kentosugama kentosugama commented Jun 2, 2023

I think it would be good to merge this so that we can measure performance improvements beyond wasm-opt and not reimplement optimizations already included in the optimizer.

Note that these benchmarks use ic-wasm directly instead of the optimize: "cycles" feature in dfx, in order to preserve the wasm name sections for the flame graphs. For any users reading this: in the general case we recommend using the optimizer through dfx instead, as the binary size reductions are larger when the name sections are dropped.

For future reference: dfinity/sdk#3090
See also #50 for previous discussions.


github-actions bot commented Jun 2, 2023

Note
Diffing the performance result against the published result from main branch.
Unchanged benchmarks are omitted.

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 169_982 ($\textcolor{green}{-13.37\%}$) | 2_097_113_506 ($\textcolor{green}{-12.15\%}$) | 9_102_052 | 1_115_399 ($\textcolor{green}{-13.74\%}$) | 609_254_124 ($\textcolor{green}{-11.61\%}$) | 1_056_869 ($\textcolor{green}{-13.70\%}$) |
| triemap | 174_030 ($\textcolor{green}{-13.76\%}$) | 2_020_134_416 ($\textcolor{green}{-11.65\%}$) | 9_715_900 | 773_637 ($\textcolor{green}{-13.26\%}$) | 1_853_794 ($\textcolor{green}{-12.21\%}$) | 1_033_460 ($\textcolor{green}{-12.98\%}$) |
| rbtree | 171_127 ($\textcolor{green}{-13.99\%}$) | 1_797_995_532 ($\textcolor{green}{-11.20\%}$) | 8_902_160 | 670_401 ($\textcolor{green}{-14.90\%}$) | 1_623_975 ($\textcolor{green}{-11.70\%}$) | 859_340 ($\textcolor{green}{-13.34\%}$) |
| splay | 170_477 ($\textcolor{green}{-13.84\%}$) | 2_040_395_523 ($\textcolor{green}{-11.50\%}$) | 8_702_096 | 1_102_393 ($\textcolor{green}{-12.39\%}$) | 1_915_542 ($\textcolor{green}{-11.93\%}$) | 1_103_332 ($\textcolor{green}{-12.42\%}$) |
| btree | 198_636 ($\textcolor{green}{-15.60\%}$) | 1_875_401_612 ($\textcolor{green}{-11.63\%}$) | 7_556_172 | 813_525 ($\textcolor{green}{-13.14\%}$) | 1_718_273 ($\textcolor{green}{-12.11\%}$) | 862_047 ($\textcolor{green}{-13.07\%}$) |
| zhenya_hashmap | 165_325 ($\textcolor{green}{-13.20\%}$) | 1_642_423_605 ($\textcolor{green}{-11.77\%}$) | 9_301_800 | 647_832 ($\textcolor{green}{-13.50\%}$) | 1_447_024 ($\textcolor{green}{-12.52\%}$) | 652_030 ($\textcolor{green}{-13.63\%}$) |
| btreemap_rs | 438_979 ($\textcolor{green}{-14.72\%}$) | 112_676_543 ($\textcolor{green}{-2.86\%}$) | 1_638_400 | 59_465 ($\textcolor{red}{0.05\%}$) | 133_080 ($\textcolor{green}{-3.46\%}$) | 60_509 ($\textcolor{green}{-2.08\%}$) |
| hashmap_rs | 428_466 ($\textcolor{green}{-14.78\%}$) | 49_363_168 ($\textcolor{green}{-7.45\%}$) | 1_835_008 | 19_572 ($\textcolor{green}{-7.11\%}$) | 58_237 ($\textcolor{green}{-8.43\%}$) | 20_805 ($\textcolor{green}{-7.47\%}$) |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
|---|---|---|---|---|---|
| heap | 156_998 ($\textcolor{green}{-13.61\%}$) | 688_335_838 ($\textcolor{green}{-13.23\%}$) | 1_400_024 | 338_619 ($\textcolor{green}{-12.12\%}$) | 711_943 ($\textcolor{green}{-13.47\%}$) |
| heap_rs | 406_219 ($\textcolor{green}{-14.20\%}$) | 4_975_528 ($\textcolor{green}{-1.31\%}$) | 819_200 | 48_902 ($\textcolor{green}{-8.15\%}$) | 20_578 ($\textcolor{green}{-6.85\%}$) |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 169_982 ($\textcolor{green}{-13.37\%}$) | 419_486_900 ($\textcolor{green}{-12.14\%}$) | 1_820_844 | 1_113_679 ($\textcolor{green}{-13.74\%}$) | 122_781_037 ($\textcolor{green}{-11.60\%}$) | 1_054_639 ($\textcolor{green}{-13.70\%}$) |
| hashmap_rs | 428_466 ($\textcolor{green}{-14.78\%}$) | 10_178_230 ($\textcolor{green}{-7.34\%}$) | 950_272 | 18_903 ($\textcolor{green}{-7.27\%}$) | 57_565 ($\textcolor{green}{-8.49\%}$) | 19_747 ($\textcolor{green}{-7.61\%}$) |
| imrc_hashmap_rs | 435_292 ($\textcolor{green}{-15.31\%}$) | 19_062_328 ($\textcolor{green}{-4.30\%}$) | 1_572_864 | 29_764 ($\textcolor{green}{-5.57\%}$) | 113_802 ($\textcolor{green}{-5.33\%}$) | 36_791 ($\textcolor{green}{-2.20\%}$) |
| movm_rs | 1_760_914 ($\textcolor{green}{-15.84\%}$) | 999_676_261 ($\textcolor{green}{-1.73\%}$) | 2_654_208 | 2_424_874 ($\textcolor{green}{-2.80\%}$) | 6_357_705 ($\textcolor{green}{-1.84\%}$) | 5_013_896 ($\textcolor{green}{-1.81\%}$) |
| movm_dynamic_rs | 1_943_858 ($\textcolor{green}{-15.31\%}$) | 485_763_587 ($\textcolor{green}{-2.12\%}$) | 2_129_920 | 1_909_424 ($\textcolor{green}{-2.18\%}$) | 2_642_175 ($\textcolor{green}{-2.49\%}$) | 1_907_002 ($\textcolor{green}{-2.21\%}$) |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 242_539 ($\textcolor{green}{-16.79\%}$) | 41_042 ($\textcolor{green}{-7.78\%}$) | 18_026 ($\textcolor{green}{-9.51\%}$) | 12_678 ($\textcolor{green}{-10.71\%}$) | 14_924 ($\textcolor{green}{-11.16\%}$) |
| Rust | 751_374 ($\textcolor{green}{-20.11\%}$) | 500_487 ($\textcolor{green}{-7.56\%}$) | 93_345 ($\textcolor{green}{-8.90\%}$) | 114_984 ($\textcolor{green}{-8.37\%}$) | 124_724 ($\textcolor{green}{-8.98\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---|---|---|---|
| Motoko | 200_814 ($\textcolor{green}{-17.91\%}$) | 12_164 ($\textcolor{green}{-9.08\%}$) | 22_455 ($\textcolor{green}{-9.01\%}$) | 4_747 ($\textcolor{green}{-11.40\%}$) |
| Rust | 801_533 ($\textcolor{green}{-20.30\%}$) | 134_675 ($\textcolor{green}{-6.58\%}$) | 348_766 ($\textcolor{green}{-7.22\%}$) | 86_803 ($\textcolor{green}{-8.39\%}$) |

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 135_630 ($\textcolor{green}{-13.51\%}$) | 8_461 ($\textcolor{green}{-5.76\%}$) |
| Rust | 28_624 ($\textcolor{green}{-19.61\%}$) | 830 ($\textcolor{green}{-26.35\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 142_158 ($\textcolor{green}{-13.50\%}$) | 17_762 ($\textcolor{green}{-8.80\%}$) | 1_706 ($\textcolor{green}{-10.54\%}$) |
| Rust | 447_452 ($\textcolor{green}{-14.67\%}$) | 49_589 ($\textcolor{green}{-10.09\%}$) | 9_514 ($\textcolor{green}{-8.67\%}$) |

Garbage Collection

Note
Same as main branch, skipping.

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 289_202 ($\textcolor{green}{-12.66\%}$) | 748_768 ($\textcolor{green}{-10.18\%}$) | 5_609 ($\textcolor{green}{-9.36\%}$) | 5_988 ($\textcolor{green}{-8.33\%}$) |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 156_672 ($\textcolor{green}{-13.66\%}$) | 143_547 ($\textcolor{green}{-13.84\%}$) | 15_760 ($\textcolor{green}{-5.31\%}$) | 8_489 ($\textcolor{green}{-7.17\%}$) | 11_737 ($\textcolor{green}{-6.39\%}$) | 3_665 ($\textcolor{green}{-8.40\%}$) |
| Rust | 478_372 ($\textcolor{green}{-14.79\%}$) | 527_123 ($\textcolor{green}{-24.33\%}$) | 57_647 ($\textcolor{green}{-8.18\%}$) | 38_523 ($\textcolor{green}{-9.27\%}$) | 81_062 ($\textcolor{green}{-7.86\%}$) | 45_691 ($\textcolor{green}{-7.98\%}$) |


github-actions bot commented Jun 2, 2023

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain the same elements and that the queries are exactly the same. Below we explain the measurement in each column of the table:

  • generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, this usually triggers the GC; the operations in the other columns are unlikely to trigger GC.
  • max mem. For Motoko, it reports rts_max_live_size after the generate call; for Rust, it reports the Wasm memory page count × 32 KiB.
  • batch_get 50. Find 50 elements from the collection.
  • batch_put 50. Insert 50 elements to the collection.
  • batch_remove 50. Remove 50 elements from the collection.
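The fixed-seed setup described above can be sketched as follows. The benchmark's actual generator is not named here, so this sketch assumes a simple xorshift32 generator as a hypothetical stand-in; it only shows why two collections filled from the same seed receive exactly the same keys.

```rust
/// Hypothetical stand-in for the benchmark's seeded RNG: Marsaglia's xorshift32.
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    fn new(seed: u32) -> Self {
        // xorshift must not start from 0, otherwise it stays at 0 forever.
        assert!(seed != 0, "seed must be non-zero");
        XorShift32 { state: seed }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

/// Generate `n` keys for a "generate" step (e.g. n = 50_000) from a fixed seed.
fn generate_keys(seed: u32, n: usize) -> Vec<u32> {
    let mut rng = XorShift32::new(seed);
    (0..n).map(|_| rng.next_u32()).collect()
}

fn main() {
    // Two collections seeded identically see exactly the same insert sequence.
    let a = generate_keys(42, 50_000);
    let b = generate_keys(42, 50_000);
    assert_eq!(a, b);
    println!("first key: {}", a[0]);
}
```

Because the sequence depends only on the seed, every collection under test (Motoko or Rust) can be fed identical inputs, which keeps the per-column numbers comparable.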

💎 Takeaways

  • The platform charges only for instruction count, so data structures that exploit caching and memory locality gain no cost advantage.
  • There is a limit on the maximum number of cycles per round. This means asymptotic behavior matters less than usual; what counts is performance up to a fixed N. In extreme cases, an O(10000 n log n) algorithm may hit the limit while an O(n^2) algorithm runs just fine.
  • Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
  • Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
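To make the fixed-N point concrete, here is a small cost comparison. The two cost models use the illustrative constants from the takeaway above (10000 · n · log₂ n versus n²); they are not measured instruction counts.

```rust
/// Illustrative cost models taken from the takeaway above; these are
/// not measured instruction counts.
fn cost_nlogn(n: f64) -> f64 {
    10_000.0 * n * n.log2()
}

fn cost_quadratic(n: f64) -> f64 {
    n * n
}

/// Smallest n >= 2 at which the quadratic model costs more than the
/// n log n model. Below this point the asymptotically "worse" algorithm is cheaper.
fn crossover() -> u64 {
    let mut n: u64 = 2;
    while cost_quadratic(n as f64) <= cost_nlogn(n as f64) {
        n += 1;
    }
    n
}

fn main() {
    // At the benchmark's N = 50_000, the O(n^2) model is still cheaper.
    let n = 50_000.0;
    println!("n log n model: {:.3e}", cost_nlogn(n));
    println!("n^2 model:     {:.3e}", cost_quadratic(n));
    println!("quadratic becomes more expensive at n = {}", crossover());
}
```

With these constants the quadratic model only overtakes the n log n model somewhere above 170k elements, well beyond the collection sizes used in this benchmark.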

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so batch_put 50 costs much more than it does for the other data structures.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of Motoko interpreter. External developers can ignore this table for now.
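The amortized-growth effect behind hashmap's expensive batch_put 50 can be sketched with a doubling array that counts element copies. GrowableArray is hypothetical and not taken from the benchmark code, and the real Motoko hashmap may rebuild its table differently; the sketch only illustrates the cost profile: most inserts are cheap, while the insert that crosses the capacity boundary pays for the whole collection.

```rust
/// Hypothetical growable array that doubles its capacity when full and
/// reports how many existing elements each push had to copy.
struct GrowableArray {
    data: Vec<u32>,  // logical contents
    capacity: usize, // simulated capacity, tracked explicitly
}

impl GrowableArray {
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        GrowableArray { data: Vec::with_capacity(capacity), capacity }
    }

    /// Appends `value` and returns the number of element copies this push caused.
    fn push(&mut self, value: u32) -> usize {
        let mut copies = 0;
        if self.data.len() == self.capacity {
            // Full: allocate a buffer twice as large and copy everything over.
            let mut bigger = Vec::with_capacity(self.capacity * 2);
            bigger.extend_from_slice(&self.data);
            copies = self.data.len();
            self.data = bigger;
            self.capacity *= 2;
        }
        self.data.push(value);
        copies
    }
}

fn main() {
    let mut arr = GrowableArray::with_capacity(4);
    // Only the pushes that hit a full buffer pay for a copy.
    let costs: Vec<usize> = (0..9).map(|i| arr.push(i)).collect();
    println!("{:?}", costs);
}
```

Averaged over many inserts the copying cost stays constant per element, but a benchmark that measures a small batch right after the capacity boundary sees the full copy.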

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 169_982 | 2_097_113_506 | 9_102_052 | 1_115_399 | 609_254_124 | 1_056_869 |
| triemap | 174_030 | 2_020_134_416 | 9_715_900 | 773_637 | 1_853_794 | 1_033_460 |
| rbtree | 171_127 | 1_797_995_532 | 8_902_160 | 670_401 | 1_623_975 | 859_340 |
| splay | 170_477 | 2_040_395_523 | 8_702_096 | 1_102_393 | 1_915_542 | 1_103_332 |
| btree | 198_636 | 1_875_401_612 | 7_556_172 | 813_525 | 1_718_273 | 862_047 |
| zhenya_hashmap | 165_325 | 1_642_423_605 | 9_301_800 | 647_832 | 1_447_024 | 652_030 |
| btreemap_rs | 438_979 | 112_676_543 | 1_638_400 | 59_465 | 133_080 | 60_509 |
| hashmap_rs | 428_466 | 49_363_168 | 1_835_008 | 19_572 | 58_237 | 20_805 |
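For hashmap_rs, the point of fxhash is a hasher without a random per-process seed, so the same keys always land in the same buckets. The sketch below shows the same idea using only the standard library: std::collections::HashMap parameterized by BuildHasherDefault and a small FNV-1a hasher. FNV-1a is swapped in here as a stand-in; it is not the FxHash algorithm that the fxhash crate implements, and DeterministicMap is a hypothetical alias.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hasher};

/// Simple FNV-1a hasher used as a deterministic stand-in for FxHasher.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // 64-bit FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // 64-bit FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// HashMap with a fixed, seedless hasher (unlike the default RandomState).
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Hash a key with a freshly built hasher; always the same result for the same key.
fn hash_of(key: u32) -> u64 {
    let build = BuildHasherDefault::<Fnv1a>::default();
    let mut h = build.build_hasher();
    h.write_u32(key);
    h.finish()
}

fn main() {
    let mut map: DeterministicMap<u32, u32> = HashMap::default();
    for k in 0..50u32 {
        map.insert(k, k * 2);
    }
    assert_eq!(map.get(&7), Some(&14));
    // Two independently built hashers agree, so runs are reproducible.
    assert_eq!(hash_of(12345), hash_of(12345));
}
```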

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
|---|---|---|---|---|---|
| heap | 156_998 | 688_335_838 | 1_400_024 | 338_619 | 711_943 |
| heap_rs | 406_219 | 4_975_528 | 819_200 | 48_902 | 20_578 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 169_982 | 419_486_900 | 1_820_844 | 1_113_679 | 122_781_037 | 1_054_639 |
| hashmap_rs | 428_466 | 10_178_230 | 950_272 | 18_903 | 57_565 | 19_747 |
| imrc_hashmap_rs | 435_292 | 19_062_328 | 1_572_864 | 29_764 | 113_802 | 36_791 |
| movm_rs | 1_760_914 | 999_676_261 | 2_654_208 | 2_424_874 | 6_357_705 | 5_013_896 |
| movm_dynamic_rs | 1_943_858 | 485_763_587 | 2_129_920 | 1_909_424 | 2_642_175 | 1_907_002 |

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
  • DIP721 NFT

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust we use serde to deserialize dynamically, driven by the data on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but it would be a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
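The static-versus-dynamic distinction can be illustrated with a toy wire format. This is not Candid's encoding and not how serde is implemented; the tag values and the Value enum are hypothetical. The specialized decoder knows at compile time that the payload is two little-endian u32 values, while the generic decoder has to read a type tag before every value and dispatch on it, which is the kind of per-value work the note above attributes to the dynamic approach.

```rust
use std::convert::TryInto;

/// Hypothetical self-describing value produced by a generic decoder.
#[derive(Debug, PartialEq)]
enum Value {
    U32(u32),
    Bool(bool),
}

/// Specialized decoder: the layout (u32, u32) is fixed at compile time,
/// so it reads the bytes directly with no per-value type checks.
fn decode_pair_static(bytes: &[u8]) -> Option<(u32, u32)> {
    let a = u32::from_le_bytes(bytes.get(0..4)?.try_into().ok()?);
    let b = u32::from_le_bytes(bytes.get(4..8)?.try_into().ok()?);
    Some((a, b))
}

/// Generic decoder: every value is preceded by a type tag read from the wire
/// (hypothetical tags: 0 = u32, 1 = bool), and decoding dispatches on it.
fn decode_dynamic(mut bytes: &[u8]) -> Option<Vec<Value>> {
    let mut out = Vec::new();
    while let Some((&tag, rest)) = bytes.split_first() {
        match tag {
            0 => {
                let v = u32::from_le_bytes(rest.get(0..4)?.try_into().ok()?);
                out.push(Value::U32(v));
                bytes = &rest[4..];
            }
            1 => {
                let v = *rest.first()? != 0;
                out.push(Value::Bool(v));
                bytes = &rest[1..];
            }
            _ => return None, // unknown tag
        }
    }
    Some(out)
}

fn main() {
    let wire_static = [1u8, 0, 0, 0, 2, 0, 0, 0];
    assert_eq!(decode_pair_static(&wire_static), Some((1, 2)));

    // Same two numbers, but each is prefixed by a tag the decoder must inspect.
    let wire_dynamic = [0u8, 1, 0, 0, 0, 0, 2, 0, 0, 0];
    assert_eq!(
        decode_dynamic(&wire_dynamic),
        Some(vec![Value::U32(1), Value::U32(2)])
    );
}
```

The dynamic decoder is more flexible (it can decode any sequence of tagged values), which mirrors the ergonomics trade-off mentioned above.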

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 242_539 | 41_042 | 18_026 | 12_678 | 14_924 |
| Rust | 751_374 | 500_487 | 93_345 | 114_984 | 124_724 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---|---|---|---|
| Motoko | 200_814 | 12_164 | 22_455 | 4_747 |
| Rust | 801_533 | 134_675 | 348_766 | 86_803 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

  • setTimer measures both the setTimer(0) call and the execution of the empty job.
  • It is not easy to reliably capture both events in one flamegraph, as implementation details of the replica can affect the measurement. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If they are missing, the script may need to be adjusted.

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 135_630 | 8_461 |
| Rust | 28_624 | 830 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 142_158 | 17_762 | 1_706 |
| Rust | 447_452 | 49_589 | 9_514 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

  • Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after generate call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases due to how the optimizer affects function names, making profiling trickier.

    • default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
    • copying. Compile with --force-gc --copying-gc.
    • compacting. Compile with --force-gc --compacting-gc.
    • generational. Compile with --force-gc --generational-gc.
  • Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 247_113_104 | 15_539_816 | 50 | 50 | 50 |
| copying | 247_113_054 | 15_539_816 | 247_107_545 | 247_259_605 | 247_259_929 |
| compacting | 409_743_010 | 15_539_816 | 308_335_419 | 367_295_137 | 351_658_670 |
| generational | 625_110_580 | 15_540_080 | 56_690 | 1_100_091 | 622_657 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 289_202 | 748_768 | 5_609 | 5_988 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 156_672 | 143_547 | 15_760 | 8_489 | 11_737 | 3_665 |
| Rust | 478_372 | 527_123 | 57_647 | 38_523 | 81_062 | 45_691 |

@kentosugama kentosugama changed the title "Keep name section" to "Enable wasm optimizer from dfx 0.14.0" Jun 28, 2023
@kentosugama kentosugama self-assigned this Jun 28, 2023

@chenyan-dfinity chenyan-dfinity left a comment


LGTM. Let's add a paragraph to the top-level README.md to explain the use of optimizers.

@kentosugama

Just updated the README.md

@kentosugama kentosugama merged commit 38b2a73 into main Jun 30, 2023
@kentosugama kentosugama deleted the keep-names branch June 30, 2023 17:38