Tweak parallelism and the instantiation benchmark #3775

alexcrichton · 2022-02-07T21:36:48Z

Currently the "sequential" and "parallel" benchmarks reports somewhat
different timings. For sequential it's time-to-instantiate, but for
parallel it's time-to-instantiate-10k instances. The parallelism in the
parallel benchmark can also theoretically be affected by rayon's
work-stealing. For example if rayon doesn't actually do any work
stealing at all then this ends up being a sequential test again.
Otherwise though it's possible for some threads to finish much earlier
as rayon isn't guaranteed to keep threads busy.

This commit applies a few updates to the benchmark:

First an InstancePre<T> is now used instead of a Linker<T> to
front-load type-checking and avoid that on each instantiation (and
this is generally the fastest path to instantiate right now).
Next the instantiation benchmark is changed to measure one
instantiation-per-iteration to measure per-instance instantiation to
better compare with sequential numbers.
Finally rayon is removed in favor of manually creating background
threads that infinitely do work until we tell them to stop. These
background threads are guaranteed to be working for the entire time
the benchmark is executing and should theoretically exhibit what the
situation that there's N units of work all happening at once.

I also applied some minor updates here such as having the parallel
instantiation defined conditionally for multiple modules as well as
upping the limits of the pooling allocator to handle a large module
(rustpython.wasm) that I threw at it.

Currently the "sequential" and "parallel" benchmarks reports somewhat different timings. For sequential it's time-to-instantiate, but for parallel it's time-to-instantiate-10k instances. The parallelism in the parallel benchmark can also theoretically be affected by rayon's work-stealing. For example if rayon doesn't actually do any work stealing at all then this ends up being a sequential test again. Otherwise though it's possible for some threads to finish much earlier as rayon isn't guaranteed to keep threads busy. This commit applies a few updates to the benchmark: * First an `InstancePre<T>` is now used instead of a `Linker<T>` to front-load type-checking and avoid that on each instantiation (and this is generally the fastest path to instantiate right now). * Next the instantiation benchmark is changed to measure one instantiation-per-iteration to measure per-instance instantiation to better compare with sequential numbers. * Finally rayon is removed in favor of manually creating background threads that infinitely do work until we tell them to stop. These background threads are guaranteed to be working for the entire time the benchmark is executing and should theoretically exhibit what the situation that there's N units of work all happening at once. I also applied some minor updates here such as having the parallel instantiation defined conditionally for multiple modules as well as upping the limits of the pooling allocator to handle a large module (rustpython.wasm) that I threw at it.

cfallin · 2022-02-07T23:55:34Z

Will go ahead and hit 'merge' here so I can rebase on top of it in the lazy-tables PR...

) Currently the "sequential" and "parallel" benchmarks reports somewhat different timings. For sequential it's time-to-instantiate, but for parallel it's time-to-instantiate-10k instances. The parallelism in the parallel benchmark can also theoretically be affected by rayon's work-stealing. For example if rayon doesn't actually do any work stealing at all then this ends up being a sequential test again. Otherwise though it's possible for some threads to finish much earlier as rayon isn't guaranteed to keep threads busy. This commit applies a few updates to the benchmark: * First an `InstancePre<T>` is now used instead of a `Linker<T>` to front-load type-checking and avoid that on each instantiation (and this is generally the fastest path to instantiate right now). * Next the instantiation benchmark is changed to measure one instantiation-per-iteration to measure per-instance instantiation to better compare with sequential numbers. * Finally rayon is removed in favor of manually creating background threads that infinitely do work until we tell them to stop. These background threads are guaranteed to be working for the entire time the benchmark is executing and should theoretically exhibit what the situation that there's N units of work all happening at once. I also applied some minor updates here such as having the parallel instantiation defined conditionally for multiple modules as well as upping the limits of the pooling allocator to handle a large module (rustpython.wasm) that I threw at it.

alexcrichton mentioned this pull request Feb 7, 2022

Implement lazy funcref table and anyfunc initialization. #3733

Merged

peterhuene approved these changes Feb 7, 2022

View reviewed changes

cfallin merged commit 43b3794 into bytecodealliance:main Feb 7, 2022

alexcrichton deleted the update-instantiation-benchmark branch February 8, 2022 00:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Tweak parallelism and the instantiation benchmark #3775

Tweak parallelism and the instantiation benchmark #3775

Uh oh!

alexcrichton commented Feb 7, 2022

Uh oh!

cfallin commented Feb 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Tweak parallelism and the instantiation benchmark #3775

Tweak parallelism and the instantiation benchmark #3775

Uh oh!

Conversation

alexcrichton commented Feb 7, 2022

Uh oh!

cfallin commented Feb 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants