@alexcrichton
Member

Currently the "sequential" and "parallel" benchmarks report somewhat
different timings. The sequential benchmark measures the time to
instantiate once, but the parallel benchmark measures the time to
instantiate 10k instances. The parallelism in the parallel benchmark can
also theoretically be affected by rayon's work-stealing. For example, if
rayon doesn't actually do any work stealing at all, then this ends up
being a sequential test again. Otherwise, it's possible for some threads
to finish much earlier, as rayon isn't guaranteed to keep threads busy.

This commit applies a few updates to the benchmark:

  • First an InstancePre<T> is now used instead of a Linker<T> to
    front-load type-checking and avoid that on each instantiation (and
    this is generally the fastest path to instantiate right now).

  • Next the instantiation benchmark is changed to measure one
    instantiation per iteration, so that the per-instance numbers
    compare better with the sequential numbers.

  • Finally rayon is removed in favor of manually creating background
    threads that do work in a loop until we tell them to stop. These
    background threads are guaranteed to be working for the entire time
    the benchmark is executing and should theoretically model the
    situation where there are N units of work all happening at once.
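
As a rough illustration of the background-thread approach in the last bullet, here is a minimal sketch using only the Rust standard library. It is not the code from this commit: `unit_of_work` and `measure_under_load` are hypothetical names, and `unit_of_work` is a trivial stand-in for an actual instantiation.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a single instantiation.
fn unit_of_work() -> u64 {
    let mut acc = 0u64;
    for i in 0..1_000u64 {
        acc = acc.wrapping_add(std::hint::black_box(i));
    }
    acc
}

/// Times `iters` units of work on the calling thread while `background`
/// other threads repeat the same work until told to stop.
/// Returns (mean time per measured iteration, total background iterations).
fn measure_under_load(background: usize, iters: u32) -> (Duration, u64) {
    assert!(iters > 0, "need at least one measured iteration");

    let stop = Arc::new(AtomicBool::new(false));
    let completed = Arc::new(AtomicU64::new(0));

    // Each background thread loops until the stop flag is set, so it is
    // busy for the whole measurement window rather than relying on a
    // scheduler to hand it work.
    let handles: Vec<_> = (0..background)
        .map(|_| {
            let stop = Arc::clone(&stop);
            let completed = Arc::clone(&completed);
            thread::spawn(move || {
                while !stop.load(Ordering::Relaxed) {
                    std::hint::black_box(unit_of_work());
                    completed.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // Measure one unit of work per iteration on the calling thread.
    let start = Instant::now();
    for _ in 0..iters {
        std::hint::black_box(unit_of_work());
    }
    let elapsed = start.elapsed();

    // Tell the background threads to stop and wait for them to exit.
    stop.store(true, Ordering::Relaxed);
    for handle in handles {
        handle.join().expect("background thread panicked");
    }

    (elapsed / iters, completed.load(Ordering::Relaxed))
}
```

Calling `measure_under_load(0, n)` gives a sequential baseline, and `measure_under_load(k - 1, n)` gives the per-iteration time with `k` threads working at once, so both report the same unit: time per single unit of work.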

I also applied some minor updates here such as having the parallel
instantiation defined conditionally for multiple modules as well as
upping the limits of the pooling allocator to handle a large module
(rustpython.wasm) that I threw at it.

@cfallin
Member

cfallin commented Feb 7, 2022

Will go ahead and hit 'merge' here so I can rebase on top of it in the lazy-tables PR...

@cfallin cfallin merged commit 43b3794 into bytecodealliance:main Feb 7, 2022
@alexcrichton alexcrichton deleted the update-instantiation-benchmark branch February 8, 2022 00:04
mpardesh pushed a commit to avanhatt/wasmtime that referenced this pull request Mar 17, 2022