Enable wasm optimizer from `dfx 0.14.0` #55

kentosugama · 2023-06-02T21:12:46Z

I think it would be good to merge this so that we can measure performance improvements beyond what wasm-opt already provides, and avoid reimplementing optimizations the optimizer already performs.

Note that these benchmarks use `ic-wasm` directly instead of the `optimize: "cycles"` option in dfx, in order to preserve the Wasm name sections for the flame graphs. For any users reading this: in the general case we recommend using the optimizer through dfx instead, as the binary size reductions are larger when the name sections are dropped.

For future reference: dfinity/sdk#3090
See also #50 for previous discussions.

github-actions · 2023-06-02T21:26:59Z

Note
Diffing the performance result against the published result from main branch.
Unchanged benchmarks are omitted.
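Each diff cell below pairs the new value with its relative change against main. A minimal Rust sketch of how such a cell could be produced, assuming the change is computed as (new - base) / base and that any increase (a regression) is shown in red. The helper names `group_digits`, `pct_change`, `color` and `format_cell` are hypothetical and not taken from the benchmark scripts:

```rust
/// Format an integer with `_` every three digits, as in the tables (e.g. 169_982).
fn group_digits(n: u64) -> String {
    let s = n.to_string();
    let mut out = String::with_capacity(s.len() + s.len() / 3);
    for (i, ch) in s.chars().enumerate() {
        // Insert a separator whenever the number of remaining digits is a multiple of 3.
        if i > 0 && (s.len() - i) % 3 == 0 {
            out.push('_');
        }
        out.push(ch);
    }
    out
}

/// Relative change of `new` against `base`, in percent (assumed formula).
fn pct_change(base: u64, new: u64) -> f64 {
    (new as f64 - base as f64) / base as f64 * 100.0
}

/// Lower is better for size, cycles and memory, so an increase is flagged red.
fn color(pct: f64) -> &'static str {
    if pct > 0.0 { "red" } else { "green" }
}

/// Render a cell such as `150 (-25.00%)` together with the color it would be shown in.
fn format_cell(base: u64, new: u64) -> (String, &'static str) {
    let pct = pct_change(base, new);
    (format!("{} ({:.2}%)", group_digits(new), pct), color(pct))
}
```

For example, `format_cell(2_000, 2_001)` yields `"2_001 (0.05%)"` flagged red, the same shape as the btreemap_rs batch_get 50 cell.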

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	169_982 ($\textcolor{green}{-13.37\%}$)	2_097_113_506 ($\textcolor{green}{-12.15\%}$)	9_102_052	1_115_399 ($\textcolor{green}{-13.74\%}$)	609_254_124 ($\textcolor{green}{-11.61\%}$)	1_056_869 ($\textcolor{green}{-13.70\%}$)
triemap	174_030 ($\textcolor{green}{-13.76\%}$)	2_020_134_416 ($\textcolor{green}{-11.65\%}$)	9_715_900	773_637 ($\textcolor{green}{-13.26\%}$)	1_853_794 ($\textcolor{green}{-12.21\%}$)	1_033_460 ($\textcolor{green}{-12.98\%}$)
rbtree	171_127 ($\textcolor{green}{-13.99\%}$)	1_797_995_532 ($\textcolor{green}{-11.20\%}$)	8_902_160	670_401 ($\textcolor{green}{-14.90\%}$)	1_623_975 ($\textcolor{green}{-11.70\%}$)	859_340 ($\textcolor{green}{-13.34\%}$)
splay	170_477 ($\textcolor{green}{-13.84\%}$)	2_040_395_523 ($\textcolor{green}{-11.50\%}$)	8_702_096	1_102_393 ($\textcolor{green}{-12.39\%}$)	1_915_542 ($\textcolor{green}{-11.93\%}$)	1_103_332 ($\textcolor{green}{-12.42\%}$)
btree	198_636 ($\textcolor{green}{-15.60\%}$)	1_875_401_612 ($\textcolor{green}{-11.63\%}$)	7_556_172	813_525 ($\textcolor{green}{-13.14\%}$)	1_718_273 ($\textcolor{green}{-12.11\%}$)	862_047 ($\textcolor{green}{-13.07\%}$)
zhenya_hashmap	165_325 ($\textcolor{green}{-13.20\%}$)	1_642_423_605 ($\textcolor{green}{-11.77\%}$)	9_301_800	647_832 ($\textcolor{green}{-13.50\%}$)	1_447_024 ($\textcolor{green}{-12.52\%}$)	652_030 ($\textcolor{green}{-13.63\%}$)
btreemap_rs	438_979 ($\textcolor{green}{-14.72\%}$)	112_676_543 ($\textcolor{green}{-2.86\%}$)	1_638_400	59_465 ($\textcolor{red}{0.05\%}$)	133_080 ($\textcolor{green}{-3.46\%}$)	60_509 ($\textcolor{green}{-2.08\%}$)
hashmap_rs	428_466 ($\textcolor{green}{-14.78\%}$)	49_363_168 ($\textcolor{green}{-7.45\%}$)	1_835_008	19_572 ($\textcolor{green}{-7.11\%}$)	58_237 ($\textcolor{green}{-8.43\%}$)	20_805 ($\textcolor{green}{-7.47\%}$)

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	156_998 ($\textcolor{green}{-13.61\%}$)	688_335_838 ($\textcolor{green}{-13.23\%}$)	1_400_024	338_619 ($\textcolor{green}{-12.12\%}$)	711_943 ($\textcolor{green}{-13.47\%}$)
heap_rs	406_219 ($\textcolor{green}{-14.20\%}$)	4_975_528 ($\textcolor{green}{-1.31\%}$)	819_200	48_902 ($\textcolor{green}{-8.15\%}$)	20_578 ($\textcolor{green}{-6.85\%}$)

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	169_982 ($\textcolor{green}{-13.37\%}$)	419_486_900 ($\textcolor{green}{-12.14\%}$)	1_820_844	1_113_679 ($\textcolor{green}{-13.74\%}$)	122_781_037 ($\textcolor{green}{-11.60\%}$)	1_054_639 ($\textcolor{green}{-13.70\%}$)
hashmap_rs	428_466 ($\textcolor{green}{-14.78\%}$)	10_178_230 ($\textcolor{green}{-7.34\%}$)	950_272	18_903 ($\textcolor{green}{-7.27\%}$)	57_565 ($\textcolor{green}{-8.49\%}$)	19_747 ($\textcolor{green}{-7.61\%}$)
imrc_hashmap_rs	435_292 ($\textcolor{green}{-15.31\%}$)	19_062_328 ($\textcolor{green}{-4.30\%}$)	1_572_864	29_764 ($\textcolor{green}{-5.57\%}$)	113_802 ($\textcolor{green}{-5.33\%}$)	36_791 ($\textcolor{green}{-2.20\%}$)
movm_rs	1_760_914 ($\textcolor{green}{-15.84\%}$)	999_676_261 ($\textcolor{green}{-1.73\%}$)	2_654_208	2_424_874 ($\textcolor{green}{-2.80\%}$)	6_357_705 ($\textcolor{green}{-1.84\%}$)	5_013_896 ($\textcolor{green}{-1.81\%}$)
movm_dynamic_rs	1_943_858 ($\textcolor{green}{-15.31\%}$)	485_763_587 ($\textcolor{green}{-2.12\%}$)	2_129_920	1_909_424 ($\textcolor{green}{-2.18\%}$)	2_642_175 ($\textcolor{green}{-2.49\%}$)	1_907_002 ($\textcolor{green}{-2.21\%}$)

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	242_539 ($\textcolor{green}{-16.79\%}$)	41_042 ($\textcolor{green}{-7.78\%}$)	18_026 ($\textcolor{green}{-9.51\%}$)	12_678 ($\textcolor{green}{-10.71\%}$)	14_924 ($\textcolor{green}{-11.16\%}$)
Rust	751_374 ($\textcolor{green}{-20.11\%}$)	500_487 ($\textcolor{green}{-7.56\%}$)	93_345 ($\textcolor{green}{-8.90\%}$)	114_984 ($\textcolor{green}{-8.37\%}$)	124_724 ($\textcolor{green}{-8.98\%}$)

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	200_814 ($\textcolor{green}{-17.91\%}$)	12_164 ($\textcolor{green}{-9.08\%}$)	22_455 ($\textcolor{green}{-9.01\%}$)	4_747 ($\textcolor{green}{-11.40\%}$)
Rust	801_533 ($\textcolor{green}{-20.30\%}$)	134_675 ($\textcolor{green}{-6.58\%}$)	348_766 ($\textcolor{green}{-7.22\%}$)	86_803 ($\textcolor{green}{-8.39\%}$)

Heartbeat

	binary_size	heartbeat
Motoko	135_630 ($\textcolor{green}{-13.51\%}$)	8_461 ($\textcolor{green}{-5.76\%}$)
Rust	28_624 ($\textcolor{green}{-19.61\%}$)	830 ($\textcolor{green}{-26.35\%}$)

Timer

	binary_size	setTimer	cancelTimer
Motoko	142_158 ($\textcolor{green}{-13.50\%}$)	17_762 ($\textcolor{green}{-8.80\%}$)	1_706 ($\textcolor{green}{-10.54\%}$)
Rust	447_452 ($\textcolor{green}{-14.67\%}$)	49_589 ($\textcolor{green}{-10.09\%}$)	9_514 ($\textcolor{green}{-8.67\%}$)

Garbage Collection

Note
Same as main branch, skipping.

Actor class

	binary size	put new bucket	put existing bucket	get
Map	289_202 ($\textcolor{green}{-12.66\%}$)	748_768 ($\textcolor{green}{-10.18\%}$)	5_609 ($\textcolor{green}{-9.36\%}$)	5_988 ($\textcolor{green}{-8.33\%}$)

Publisher & Subscriber

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	156_672 ($\textcolor{green}{-13.66\%}$)	143_547 ($\textcolor{green}{-13.84\%}$)	15_760 ($\textcolor{green}{-5.31\%}$)	8_489 ($\textcolor{green}{-7.17\%}$)	11_737 ($\textcolor{green}{-6.39\%}$)	3_665 ($\textcolor{green}{-8.40\%}$)
Rust	478_372 ($\textcolor{green}{-14.79\%}$)	527_123 ($\textcolor{green}{-24.33\%}$)	57_647 ($\textcolor{green}{-8.18\%}$)	38_523 ($\textcolor{green}{-9.27\%}$)	81_062 ($\textcolor{green}{-7.86\%}$)	45_691 ($\textcolor{green}{-7.98\%}$)

github-actions · 2023-06-02T21:27:01Z

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain
the same elements and that the queries are exactly the same. Below we explain the measurement in each column of the table:

generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
max mem. For Motoko, it reports rts_max_live_size after the generate call; for Rust, it reports the number of Wasm memory pages * 32Kb.
batch_get 50. Find 50 elements from the collection.
batch_put 50. Insert 50 elements to the collection.
batch_remove 50. Remove 50 elements from the collection.
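Reproducibility rests on the fixed-seed generator mentioned above. The suite's actual generator is not named in this thread, so the sketch below uses xorshift32 purely as an illustrative stand-in; `XorShift32` and `keys` are hypothetical names. The point it shows is that the same seed always yields the same sequence of Nat32 (`u32`) keys, so every collection receives identical inserts and queries:

```rust
/// xorshift32 (Marsaglia): a tiny deterministic PRNG, used here only as a stand-in.
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    fn new(seed: u32) -> Self {
        // A zero state would make xorshift emit zeros forever.
        assert!(seed != 0, "xorshift32 seed must be non-zero");
        XorShift32 { state: seed }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

/// Generate `n` keys from a fixed seed; identical seeds give identical key sequences.
fn keys(seed: u32, n: usize) -> Vec<u32> {
    let mut rng = XorShift32::new(seed);
    (0..n).map(|_| rng.next_u32()).collect()
}
```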

💎 Takeaways

The platform only charges for instruction count. Data structures that exploit caching and locality therefore gain no cost advantage from them.
There is a limit on the maximum number of cycles per round, so asymptotic behavior matters less than usual: what we care about is performance up to a fixed N. In extreme cases, an O(10000 n log n) algorithm may hit the limit while an O(n^2) algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
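To make the O(10000 n log n) versus O(n^2) takeaway concrete, the two cost models can be compared directly. This is only arithmetic on the formulas from the takeaway (constant factors taken literally, log base 2 assumed), not a measurement; the function names are hypothetical:

```rust
/// Cost of the hypothetical O(10000 n log n) algorithm (log base 2 assumed).
fn nlogn_cost(n: f64) -> f64 {
    10_000.0 * n * n.log2()
}

/// Cost of the hypothetical O(n^2) algorithm.
fn quadratic_cost(n: f64) -> f64 {
    n * n
}

/// Smallest power of two at which the quadratic model becomes the more expensive one.
fn crossover_pow2() -> u64 {
    let mut n: u64 = 2;
    while quadratic_cost(n as f64) <= nlogn_cost(n as f64) {
        n *= 2;
    }
    n
}
```

With these constants the crossover lies between 2^17 and 2^18, so an input of 50k elements (as in generate 50k) is still in the range where the quadratic algorithm is cheaper.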

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, once deterministic time slicing is ready, we can measure performance with a larger memory footprint.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
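The batch_put 50 spike comes from paying for the whole copy inside one call. The toy cost model below is not the Motoko hashmap's actual code, and `stop_the_world_costs` and `incremental_costs` are hypothetical names. It counts per-put work (one write plus any element copies) for a doubling array that copies everything at once, versus one that migrates at most `k` old elements per put. The second strategy is one way to bound per-call work, in the spirit of the takeaway that amortized algorithms may need to be more eager:

```rust
/// Per-put cost when the backing array doubles and copies everything at once.
fn stop_the_world_costs(n_puts: usize, init_cap: usize) -> Vec<usize> {
    assert!(init_cap >= 1, "capacity must be positive to double");
    let (mut cap, mut len) = (init_cap, 0);
    let mut costs = Vec::with_capacity(n_puts);
    for _ in 0..n_puts {
        let mut cost = 1; // writing the new element
        if len == cap {
            cost += len; // copy every existing element in this single put
            cap *= 2;
        }
        len += 1;
        costs.push(cost);
    }
    costs
}

/// Same workload, but old elements are migrated at most `k` per put.
fn incremental_costs(n_puts: usize, init_cap: usize, k: usize) -> Vec<usize> {
    assert!(init_cap >= 1, "capacity must be positive to double");
    assert!(k >= 1, "must migrate at least one element per put to keep up");
    let (mut cap, mut len, mut pending) = (init_cap, 0, 0);
    let mut costs = Vec::with_capacity(n_puts);
    for _ in 0..n_puts {
        let mut cost = 1;
        if len == cap {
            // With doubling and k >= 1, the previous migration has finished by now.
            debug_assert_eq!(pending, 0);
            pending = len; // old elements still to be moved
            cap *= 2;
        }
        let moved = pending.min(k);
        pending -= moved;
        cost += moved;
        len += 1;
        costs.push(cost);
    }
    costs
}
```

Both strategies do the same total work over 1,000 puts, but the worst single put costs 513 in the first and at most `1 + k` in the second.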

hashmap_rs uses the fxhash crate, whose FxHashMap is std::collections::HashMap with a deterministic hasher. This ensures reproducible results.
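What matters is that the hasher carries no per-instance random state. The sketch below shows the idea with the standard library only: std's `DefaultHasher` stands in for fxhash's `FxHasher` so the block needs no external crate, and `DeterministicMap`, `build` and `iteration_order` are hypothetical names. Two maps built from the same inserts then iterate in the same order:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// A HashMap whose hasher is built from fixed keys rather than a random seed.
/// (DefaultHasher's algorithm may change between Rust releases, so this is
/// deterministic for a given toolchain only.)
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn build(keys: &[u32]) -> DeterministicMap<u32, u32> {
    let mut m: DeterministicMap<u32, u32> = HashMap::default();
    for &k in keys {
        m.insert(k, k.wrapping_mul(2));
    }
    m
}

/// Keys in the map's internal iteration order.
fn iteration_order(m: &DeterministicMap<u32, u32>) -> Vec<u32> {
    m.keys().copied().collect()
}
```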

btree comes from Byron Becker's stable BTreeMap library.

zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.

The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.

Map

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	169_982	2_097_113_506	9_102_052	1_115_399	609_254_124	1_056_869
triemap	174_030	2_020_134_416	9_715_900	773_637	1_853_794	1_033_460
rbtree	171_127	1_797_995_532	8_902_160	670_401	1_623_975	859_340
splay	170_477	2_040_395_523	8_702_096	1_102_393	1_915_542	1_103_332
btree	198_636	1_875_401_612	7_556_172	813_525	1_718_273	862_047
zhenya_hashmap	165_325	1_642_423_605	9_301_800	647_832	1_447_024	652_030
btreemap_rs	438_979	112_676_543	1_638_400	59_465	133_080	60_509
hashmap_rs	428_466	49_363_168	1_835_008	19_572	58_237	20_805

Priority queue

	binary_size	heapify 50k	mem	pop_min 50	put 50
heap	156_998	688_335_838	1_400_024	338_619	711_943
heap_rs	406_219	4_975_528	819_200	48_902	20_578

MoVM

	binary_size	generate 10k	max mem	batch_get 50	batch_put 50	batch_remove 50
hashmap	169_982	419_486_900	1_820_844	1_113_679	122_781_037	1_054_639
hashmap_rs	428_466	10_178_230	950_272	18_903	57_565	19_747
imrc_hashmap_rs	435_292	19_062_328	1_572_864	29_764	113_802	36_791
movm_rs	1_760_914	999_676_261	2_654_208	2_424_874	6_357_705	5_013_896
movm_dynamic_rs	1_943_858	485_763_587	2_129_920	1_909_424	2_642_175	1_907_002

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust we use serde to deserialize dynamically, driven by the data on the wire.

We could improve the performance on the Rust side by using parser combinators, but it would be challenging to keep the ergonomics that serde provides.

For real-world applications, we tend to send small payloads to each endpoint, which makes the Candid overhead in Rust tolerable.
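The static versus dynamic contrast can be illustrated with a toy self-describing format in which a type tag byte precedes each value. This is not Candid's wire format, and not how serde or Motoko's generated code work internally; `Value`, `Transfer`, `decode_dynamic` and `decode_transfer` are hypothetical. The dynamic decoder branches on a tag for every field, while the specialized one already knows the expected shape and only validates it:

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
enum Value {
    U32(u32),
    Bool(bool),
}

/// Dynamic: inspects the type tag on the wire for every field.
fn decode_dynamic(bytes: &[u8]) -> Result<Vec<Value>, String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let tag = bytes[i];
        i += 1;
        match tag {
            0 => {
                let chunk = bytes.get(i..i + 4).ok_or("truncated u32")?;
                out.push(Value::U32(u32::from_le_bytes(chunk.try_into().unwrap())));
                i += 4;
            }
            1 => {
                let b = *bytes.get(i).ok_or("truncated bool")?;
                out.push(Value::Bool(b != 0));
                i += 1;
            }
            t => return Err(format!("unknown tag {}", t)),
        }
    }
    Ok(out)
}

#[derive(Debug, PartialEq)]
struct Transfer {
    amount: u32,
    approved: bool,
}

/// Specialized: the shape (u32, bool) is fixed at compile time, so the decoder
/// checks the layout once and reads fixed offsets.
fn decode_transfer(bytes: &[u8]) -> Result<Transfer, String> {
    if bytes.len() != 7 || bytes[0] != 0 || bytes[5] != 1 {
        return Err("unexpected layout".to_string());
    }
    let amount = u32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    Ok(Transfer { amount, approved: bytes[6] != 0 })
}
```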

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal
Motoko	242_539	41_042	18_026	12_678	14_924
Rust	751_374	500_487	93_345	114_984	124_724

DIP721 NFT

	binary_size	init	mint_token	transfer_token
Motoko	200_814	12_164	22_455	4_747
Rust	801_533	134_675	348_766	86_803

Heartbeat / Timer

Measure the cost of empty heartbeat and timer jobs.

setTimer measures both the setTimer(0) call and the execution of the empty job.
It is not easy to reliably capture both events in one flamegraph, as implementation details
of the replica can affect the measurement. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If they are missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	135_630	8_461
Rust	28_624	830

Timer

	binary_size	setTimer	cancelTimer
Motoko	142_158	17_762	1_706
Rust	447_452	49_589	9_514

Motoko Specific Benchmarks

Measure various features only available in Motoko.

Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases because it affects function names, which makes profiling trickier.
- default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
- copying. Compile with --force-gc --copying-gc.
- compacting. Compile with --force-gc --compacting-gc.
- generational. Compile with --force-gc --generational-gc.
Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

	generate 80k	max mem	batch_get 50	batch_put 50	batch_remove 50
default	247_113_104	15_539_816	50	50	50
copying	247_113_054	15_539_816	247_107_545	247_259_605	247_259_929
compacting	409_743_010	15_539_816	308_335_419	367_295_137	351_658_670
generational	625_110_580	15_540_080	56_690	1_100_091	622_657

Actor class

	binary size	put new bucket	put existing bucket	get
Map	289_202	748_768	5_609	5_988

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	156_672	143_547	15_760	8_489	11_737	3_665
Rust	478_372	527_123	57_647	38_523	81_062	45_691

chenyan-dfinity

LGTM. Let's add a paragraph at the top-level README.md to explain the use of optimizers.

kentosugama · 2023-06-30T17:14:03Z

Just updated the README.md

keep name section

a94da9f

kentosugama and others added 7 commits June 2, 2023 14:29

install wasm-opt

da8ef93

switch to ic-wasm

bdc51f9

bug

ff5d31d

try alternative install

4ab6a30

ic-wasm PR merged

ed5b6ce

try enabling gc opts

098888d

Typo

68a3908

kentosugama changed the title from "Keep name section" to "Enable wasm optimizer from `dfx 0.14.0`" Jun 28, 2023

kentosugama self-assigned this Jun 28, 2023

kentosugama added 2 commits June 28, 2023 18:36

Disable GC cases

b0d5d06

Update README.md

0a12f19

kentosugama requested a review from chenyan-dfinity June 29, 2023 02:09

kentosugama mentioned this pull request Jun 29, 2023

Bump dfx to 0.14.0 and enable wasm optimizer #50

Closed

chenyan-dfinity reviewed Jun 30, 2023


kentosugama added 2 commits June 30, 2023 10:05

Add section in readme about optimizers

af5f220

l

2c850d9

chenyan-dfinity approved these changes Jun 30, 2023


kentosugama merged commit 38b2a73 into main Jun 30, 2023

kentosugama deleted the keep-names branch June 30, 2023 17:38
