
Conversation

@kentosugama
Contributor

No description provided.

@github-actions

github-actions bot commented Mar 14, 2023

Warning
The flamegraph link only works after you merge.

Heartbeat / Timer

Measure the cost of empty heartbeat and timer job.

  • setTimer measures both the setTimer(0) call and the execution of the empty job.
  • It is not easy to reliably capture both events in one flamegraph, because implementation details
    of the replica can affect how we measure this. A correct flamegraph typically contains both the setTimer and canister_global_timer functions; if either is missing, we may need to adjust the script.

Heartbeat

| | binary_size | heartbeat |
| --- | --- | --- |
| Motoko | 121_445 | 11_271 |
| Rust | 26_919 | 830 |

Timer

| | binary_size | setTimer | cancelTimer |
| --- | --- | --- | --- |
| Motoko | 135_274 | 31_898 | 1_718 |
| Rust | 421_265 | 50_847 | 9_660 |

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain
the same elements and receive exactly the same queries. The columns in the table measure:

  • generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, this usually triggers the GC; the other columns are unlikely to trigger GC.
  • max mem. For Motoko, reports rts_max_live_size after the generate call; for Rust, reports the Wasm's memory page * 32Kb.
  • batch_get 50. Find 50 elements in the collection.
  • batch_put 50. Insert 50 elements into the collection.
  • batch_remove 50. Remove 50 elements from the collection.
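The fixed-seed setup can be sketched with a minimal linear congruential generator. This is an illustration of the principle, not the benchmark's actual RNG:

```rust
// Toy deterministic RNG (LCG). A fixed seed yields an identical element
// sequence, so every collection under test receives the same workload.
struct Lcg(u64);

impl Lcg {
    fn new(seed: u64) -> Self { Lcg(seed) }
    // Constants from a common 64-bit LCG; any full-period LCG works here.
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

fn main() {
    let mut a = Lcg::new(42);
    let mut b = Lcg::new(42);
    let xs: Vec<u32> = (0..5).map(|_| a.next_u32()).collect();
    let ys: Vec<u32> = (0..5).map(|_| b.next_u32()).collect();
    // Same seed => same elements and queries for every collection.
    assert_eq!(xs, ys);
    println!("{:?}", xs);
}
```

Determinism matters here because a different element order could change hash collisions or tree shapes, and with it the instruction counts being compared.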

💎 Takeaways

  • The platform only charges for instruction count. Data structures that exploit caching and locality gain no cost advantage.
  • There is a limit on the maximal cycles per round, so asymptotic behavior matters less than performance up to a fixed N. In extreme cases, an O(10000 n log n) algorithm may hit the limit while an O(n^2) algorithm runs just fine.
  • Amortized algorithms and GC may need to be more eager to avoid hitting the cycle limit in a particular round.
  • Rust costs more cycles to process complicated Candid data, but is more efficient at core computations.
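The second point can be made concrete with a little arithmetic. The sketch below (using illustrative cost formulas, not measured data) finds the input size at which the quadratic algorithm actually becomes the worse choice:

```rust
// With a per-round cycle limit, constant factors can dominate asymptotics:
// an O(10000 * n log n) algorithm costs more than an O(n^2) one for every
// n below the crossover point.
fn cost_nlogn(n: f64) -> f64 { 10_000.0 * n * n.log2() }
fn cost_quadratic(n: f64) -> f64 { n * n }

fn main() {
    // First n where the quadratic algorithm overtakes the n log n one.
    let crossover = (2..)
        .map(|n| n as f64)
        .find(|&n| cost_quadratic(n) > cost_nlogn(n))
        .unwrap();
    println!("quadratic overtakes only at n = {crossover}");
    // For any workload below the crossover (well past 100k elements here),
    // the "worse" O(n^2) algorithm is the cheaper one.
    assert!(cost_quadratic(50_000.0) < cost_nlogn(50_000.0));
}
```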

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in these measurements.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Hopefully, once deterministic time slicing is ready, we can measure performance on larger memory footprints.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, which offers the same interface as std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of the Motoko interpreter. External developers can ignore this table for now.
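The hashmap spike in batch_put 50 is the classic amortized-growth pattern. A toy model of a doubling-backed array (not the benchmark's actual hashmap) shows why the average put is cheap while one unlucky put pays for a full copy:

```rust
// Toy model of amortized doubling: when length hits capacity, every
// existing element is copied into a larger backing array.
fn main() {
    let mut capacity = 8usize;
    let mut len = 0usize;
    let mut copies = 0usize; // elements moved during grow operations

    for _ in 0..50_000 {
        if len == capacity {
            copies += len; // grow: copy every existing element
            capacity *= 2;
        }
        len += 1;
    }
    // Total copy work stays within ~2x the inserts: amortized O(1) per put.
    assert!(copies < 2 * len);
    // But the single put that lands exactly at capacity pays for a full
    // copy, which is what a per-round cycle limit can expose.
    println!("inserts = {len}, copied elements = {copies}");
}
```

This is also why the takeaway above suggests amortized structures may need to grow more eagerly: spreading the copy work out keeps any single round under the cycle limit.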

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- | --- |
| hashmap | 162_540 | 2_097_038_276 | 9_102_052 | 1_115_747 | 609_004_394 | 1_057_139 |
| triemap | 166_619 | 2_021_341_114 | 9_716_008 | 773_985 | 1_855_492 | 1_035_118 |
| rbtree | 164_895 | 1_881_678_586 | 10_102_184 | 699_641 | 1_693_062 | 938_158 |
| splay | 163_477 | 2_087_719_583 | 9_302_108 | 1_142_246 | 1_958_463 | 1_142_695 |
| btree | 192_198 | 1_918_261_233 | 8_157_968 | 873_973 | 1_758_607 | 946_667 |
| zhenya_hashmap | 157_503 | 1_636_688_227 | 9_301_800 | 645_183 | 1_444_666 | 649_719 |
| btreemap_rs | 418_565 | 120_378_963 | 1_638_400 | 59_680 | 135_119 | 60_771 |
| hashmap_rs | 407_805 | 49_413_271 | 1_835_008 | 19_785 | 58_550 | 21_067 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
| --- | --- | --- | --- | --- | --- |
| heap | 149_533 | 690_802_078 | 1_400_024 | 369_948 | 722_102 |
| heap_rs | 386_575 | 4_975_609 | 819_200 | 50_696 | 20_838 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- | --- |
| hashmap | 162_540 | 419_471_830 | 1_820_844 | 1_114_027 | 122_731_309 | 1_054_914 |
| hashmap_rs | 407_805 | 10_188_331 | 950_272 | 19_116 | 57_878 | 20_009 |
| imrc_hashmap_rs | 417_209 | 19_067_936 | 1_572_864 | 30_490 | 114_539 | 37_517 |
| movm_rs | 1_694_121 | 1_081_538_103 | 2_654_208 | 2_678_181 | 6_831_749 | 5_331_044 |
| movm_dynamic_rs | 1_851_191 | 545_039_960 | 2_129_920 | 2_144_206 | 2_943_937 | 2_123_781 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
| --- | --- | --- | --- | --- | --- | --- |
| Motoko | 141_478 | 129_115 | 18_627 | 8_487 | 14_604 | 3_663 |
| Rust | 455_851 | 506_613 | 58_042 | 39_026 | 81_523 | 46_201 |

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO,
    with heartbeat disabled to make profiling easier; heartbeat performance is measured in a separate benchmark.
  • DIP721 NFT

Note

  • The cost difference is mainly due to Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, serde dynamically deserializes data based on the types on the wire.
  • We could improve performance on the Rust side by using parser combinators, but it is a challenge to maintain the ergonomics provided by serde.
  • Real-world applications tend to send small data on each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
| --- | --- | --- | --- | --- | --- |
| Motoko | 229_963 | 41_021 | 18_070 | 12_751 | 14_944 |
| Rust | 726_247 | 499_376 | 92_870 | 115_070 | 125_850 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
| --- | --- | --- | --- | --- |
| Motoko | 183_305 | 12_164 | 22_456 | 4_747 |
| Rust | 767_457 | 136_720 | 352_205 | 86_991 |

Motoko Garbage Collection

Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call. The cycle costs reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit.

  • default. Compile with the default GC option. With the current GC scheduler, generate triggers the copying GC; the other methods do not trigger GC.
  • copying. Compile with --force-gc --copying-gc.
  • compacting. Compile with --force-gc --compacting-gc.
  • generational. Compile with --force-gc --generational-gc.

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- |
| default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
| copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
| compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
| generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |

@github-actions

github-actions bot commented Mar 16, 2023

Note
Diffing the performance result against the published result from the main branch.

Heartbeat

| | binary_size | heartbeat |
| --- | --- | --- |
| Motoko | 121_445 ($\textcolor{green}{-17.45\%}$) | 11_271 ($\textcolor{green}{-5.59\%}$) |
| Rust | 26_919 ($\textcolor{green}{-24.49\%}$) | 830 ($\textcolor{red}{41.40\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
| --- | --- | --- | --- |
| Motoko | 135_274 ($\textcolor{green}{-16.98\%}$) | 31_898 ($\textcolor{green}{-7.91\%}$) | 1_718 ($\textcolor{green}{-10.66\%}$) |
| Rust | 421_265 ($\textcolor{green}{-19.86\%}$) | 50_847 ($\textcolor{green}{-8.89\%}$) | 9_660 ($\textcolor{green}{-8.36\%}$) |

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- | --- |
| hashmap | 162_540 ($\textcolor{green}{-16.91\%}$) | 2_097_038_276 ($\textcolor{green}{-12.15\%}$) | 9_102_052 | 1_115_747 ($\textcolor{green}{-13.74\%}$) | 609_004_394 ($\textcolor{green}{-11.64\%}$) | 1_057_139 ($\textcolor{green}{-13.71\%}$) |
| triemap | 166_619 ($\textcolor{green}{-17.27\%}$) | 2_021_341_114 ($\textcolor{green}{-11.71\%}$) | 9_716_008 | 773_985 ($\textcolor{green}{-13.33\%}$) | 1_855_492 ($\textcolor{green}{-12.28\%}$) | 1_035_118 ($\textcolor{green}{-13.12\%}$) |
| rbtree | 164_895 ($\textcolor{green}{-17.38\%}$) | 1_881_678_586 ($\textcolor{green}{-11.14\%}$) | 10_102_184 | 699_641 ($\textcolor{green}{-15.05\%}$) | 1_693_062 ($\textcolor{green}{-11.70\%}$) | 938_158 ($\textcolor{green}{-13.29\%}$) |
| splay | 163_477 ($\textcolor{green}{-17.28\%}$) | 2_087_719_583 ($\textcolor{green}{-11.53\%}$) | 9_302_108 | 1_142_246 ($\textcolor{green}{-12.50\%}$) | 1_958_463 ($\textcolor{green}{-12.00\%}$) | 1_142_695 ($\textcolor{green}{-12.53\%}$) |
| btree | 192_198 ($\textcolor{green}{-18.31\%}$) | 1_918_261_233 ($\textcolor{green}{-11.59\%}$) | 8_157_968 | 873_973 ($\textcolor{green}{-13.19\%}$) | 1_758_607 ($\textcolor{green}{-12.07\%}$) | 946_667 ($\textcolor{green}{-13.11\%}$) |
| zhenya_hashmap | 157_503 ($\textcolor{green}{-16.72\%}$) | 1_636_688_227 ($\textcolor{green}{-11.78\%}$) | 9_301_800 | 645_183 ($\textcolor{green}{-13.55\%}$) | 1_444_666 ($\textcolor{green}{-12.54\%}$) | 649_719 ($\textcolor{green}{-13.67\%}$) |
| btreemap_rs | 418_565 ($\textcolor{green}{-18.90\%}$) | 120_378_963 ($\textcolor{green}{-2.76\%}$) | 1_638_400 | 59_680 ($\textcolor{green}{-0.07\%}$) | 135_119 ($\textcolor{green}{-3.67\%}$) | 60_771 ($\textcolor{green}{-2.12\%}$) |
| hashmap_rs | 407_805 ($\textcolor{green}{-19.12\%}$) | 49_413_271 ($\textcolor{green}{-7.18\%}$) | 1_835_008 | 19_785 ($\textcolor{green}{-7.38\%}$) | 58_550 ($\textcolor{green}{-8.22\%}$) | 21_067 ($\textcolor{green}{-7.51\%}$) |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
| --- | --- | --- | --- | --- | --- |
| heap | 149_533 ($\textcolor{green}{-17.11\%}$) | 690_802_078 ($\textcolor{green}{-13.26\%}$) | 1_400_024 | 369_948 ($\textcolor{green}{-12.08\%}$) | 722_102 ($\textcolor{green}{-13.46\%}$) |
| heap_rs | 386_575 ($\textcolor{green}{-18.64\%}$) | 4_975_609 ($\textcolor{green}{-1.31\%}$) | 819_200 | 50_696 ($\textcolor{green}{-5.35\%}$) | 20_838 ($\textcolor{green}{-6.48\%}$) |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- | --- |
| hashmap | 162_540 ($\textcolor{green}{-16.91\%}$) | 419_471_830 ($\textcolor{green}{-12.15\%}$) | 1_820_844 | 1_114_027 ($\textcolor{green}{-13.74\%}$) | 122_731_309 ($\textcolor{green}{-11.63\%}$) | 1_054_914 ($\textcolor{green}{-13.71\%}$) |
| hashmap_rs | 407_805 ($\textcolor{green}{-19.12\%}$) | 10_188_331 ($\textcolor{green}{-7.08\%}$) | 950_272 | 19_116 ($\textcolor{green}{-7.54\%}$) | 57_878 ($\textcolor{green}{-8.28\%}$) | 20_009 ($\textcolor{green}{-7.66\%}$) |
| imrc_hashmap_rs | 417_209 ($\textcolor{green}{-19.23\%}$) | 19_067_936 ($\textcolor{green}{-4.00\%}$) | 1_572_864 | 30_490 ($\textcolor{green}{-4.18\%}$) | 114_539 ($\textcolor{green}{-4.72\%}$) | 37_517 ($\textcolor{green}{-1.06\%}$) |
| movm_rs | 1_694_121 ($\textcolor{green}{-16.76\%}$) | 1_081_538_103 ($\textcolor{green}{-1.57\%}$) | 2_654_208 | 2_678_181 ($\textcolor{green}{-2.40\%}$) | 6_831_749 ($\textcolor{green}{-1.61\%}$) | 5_331_044 ($\textcolor{green}{-1.58\%}$) |
| movm_dynamic_rs | 1_851_191 ($\textcolor{green}{-17.79\%}$) | 545_039_960 ($\textcolor{green}{-1.90\%}$) | 2_129_920 | 2_144_206 ($\textcolor{green}{-1.95\%}$) | 2_943_937 ($\textcolor{green}{-2.20\%}$) | 2_123_781 ($\textcolor{green}{-1.95\%}$) |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
| --- | --- | --- | --- | --- | --- | --- |
| Motoko | 141_478 ($\textcolor{green}{-17.63\%}$) | 129_115 ($\textcolor{green}{-17.71\%}$) | 18_627 ($\textcolor{green}{-5.17\%}$) | 8_487 ($\textcolor{green}{-7.20\%}$) | 14_604 ($\textcolor{green}{-6.00\%}$) | 3_663 ($\textcolor{green}{-8.45\%}$) |
| Rust | 455_851 ($\textcolor{green}{-19.18\%}$) | 506_613 ($\textcolor{green}{-27.24\%}$) | 58_042 ($\textcolor{green}{-8.46\%}$) | 39_026 ($\textcolor{green}{-9.64\%}$) | 81_523 ($\textcolor{green}{-8.79\%}$) | 46_201 ($\textcolor{green}{-8.59\%}$) |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
| --- | --- | --- | --- | --- | --- |
| Motoko | 229_963 ($\textcolor{green}{-20.99\%}$) | 41_021 ($\textcolor{green}{-8.12\%}$) | 18_070 ($\textcolor{green}{-10.00\%}$) | 12_751 ($\textcolor{green}{-9.95\%}$) | 14_944 ($\textcolor{green}{-11.16\%}$) |
| Rust | 726_247 ($\textcolor{green}{-23.10\%}$) | 499_376 ($\textcolor{green}{-7.86\%}$) | 92_870 ($\textcolor{green}{-9.36\%}$) | 115_070 ($\textcolor{green}{-8.59\%}$) | 125_850 ($\textcolor{green}{-9.34\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
| --- | --- | --- | --- | --- |
| Motoko | 183_305 ($\textcolor{green}{-22.08\%}$) | 12_164 ($\textcolor{green}{-9.08\%}$) | 22_456 ($\textcolor{green}{-9.00\%}$) | 4_747 ($\textcolor{green}{-11.39\%}$) |
| Rust | 767_457 ($\textcolor{green}{-23.11\%}$) | 136_720 ($\textcolor{green}{-7.09\%}$) | 352_205 ($\textcolor{green}{-7.61\%}$) | 86_991 ($\textcolor{green}{-8.88\%}$) |

Motoko Garbage Collection

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
| --- | --- | --- | --- | --- | --- |
| default | 247_115_881 | 15_539_984 | 50 | 50 | 50 |
| copying | 247_115_831 | 15_539_984 | 247_110_319 | 247_262_382 | 247_262_501 |
| compacting | 409_365_425 | 15_539_984 | 308_339_012 | 348_775_445 | 352_663_118 |
| generational | 624_423_107 | 15_540_260 | 57_009 | 1_390_483 | 1_060_163 |

@kentosugama kentosugama mentioned this pull request Apr 6, 2023
mergify bot pushed a commit to dfinity/sdk that referenced this pull request Apr 21, 2023
[LANG-124](https://dfinity.atlassian.net/browse/LANG-124)

[Profiling data](https://docs.google.com/document/d/1ICXF083-hfRZr2OfRUk8OxmZqYDQL4XINHQ0FXuDqTA/edit) suggests that it would be useful to take advantage of the performance and binary size wasm optimizations offered by [`wasm-opt`](https://github.com/WebAssembly/binaryen).

See also results in the [canister profiling repo](dfinity/canister-profiling#35).

This tool has been integrated with `ic-wasm` in this PR: dfinity/ic-wasm#28. The tool is added under the `shrink` command that is currently used in `dfx` to perform binary size reduction of canisters.

The next step is to expose this feature by letting users opt into the optimizer and specify the optimization level in `dfx.json`, and then invoking `ic-wasm` in `dfx` according to that setting, similarly to `shrink`.

Note: This tool has been used in the past to perform binary size reductions for canisters, but was replaced because it deletes the custom metadata sections from wasm modules. The `ic-wasm` feature invokes `wasm-opt` while preserving these sections.

Example usage in `dfx.json`:
```json
"canisters": {
  "backend": {
    "optimize": "cycles"
  }
}
```
@kentosugama kentosugama deleted the wasm-opt-3 branch May 9, 2023 16:34
