@steveej steveej commented Nov 26, 2019

First, this fixes a typo where the hardcoded `Inherited` visibility
wasn't properly set due to a variable name mixup.

Second, this adds support for a visibility attribute to the `wrap_impl`
macro.

I tested the _pub_ modifier, and only manually; all other modifiers are
completely untested.
@yurydelendik
Owner

Thank you

yurydelendik pushed a commit that referenced this pull request Jul 28, 2020
We often see patterns like:

```
    mov w2, #0xffff_ffff   // uses ORR with logical immediate form
    add w0, w1, w2
```

which is just `w0 := w1 - 1`. It would be much better to recognize when
the negation of an immediate fits in a 12-bit immediate field even though
the immediate itself does not, and flip add to subtract (and vice versa),
so we can instead generate:

```
    sub w0, w1, #1
```
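
As a rough illustration of the selection rule described above, here is a minimal Rust sketch. It is not the code from this commit: `AluOp`, `fits_imm12`, and `select_add_sub_imm` are hypothetical names, the operand is assumed to be 32 bits wide, and the optional 12-bit left shift of the AArch64 arithmetic immediate is left out for brevity.

```rust
/// Hypothetical ALU opcode, used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AluOp {
    Add,
    Sub,
}

/// True if `value` fits an unshifted 12-bit unsigned immediate field.
/// (Assumption: the shifted-by-12 form is ignored here.)
pub fn fits_imm12(value: u32) -> bool {
    value <= 0xfff
}

/// Choose an opcode and immediate for `op rd, rn, #imm` on 32-bit registers.
/// If `imm` does not fit but its two's-complement negation does, flip
/// add <-> sub and use the negated immediate. Returns `None` when neither
/// form fits, in which case the constant must be materialized in a register.
pub fn select_add_sub_imm(op: AluOp, imm: u32) -> Option<(AluOp, u32)> {
    if fits_imm12(imm) {
        return Some((op, imm));
    }
    // x + imm == x - (-imm) modulo 2^32, so the negated form is equivalent.
    let negated = imm.wrapping_neg();
    if fits_imm12(negated) {
        let flipped = match op {
            AluOp::Add => AluOp::Sub,
            AluOp::Sub => AluOp::Add,
        };
        return Some((flipped, negated));
    }
    None
}
```

With these definitions, `select_add_sub_imm(AluOp::Add, 0xffff_ffff)` yields `Some((AluOp::Sub, 1))`, which corresponds to the `sub w0, w1, #1` form.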

We see this pattern in e.g. `bz2`, where this commit makes the following
difference (counting instructions with `perf stat`, filling in the
wasmtime cache first then running again to get just runtime):

pre:

```
        992.762250      task-clock (msec)         #    0.998 CPUs utilized
               109      context-switches          #    0.110 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,035      page-faults               #    0.005 M/sec
     3,224,119,134      cycles                    #    3.248 GHz
     4,000,521,171      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,573,755      branch-misses

       0.995072322 seconds time elapsed
```

post:

```
        993.853850      task-clock (msec)         #    0.998 CPUs utilized
               123      context-switches          #    0.124 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,072      page-faults               #    0.005 M/sec
     3,201,278,337      cycles                    #    3.221 GHz
     3,917,061,340      instructions              #    1.22  insn per cycle
   <not supported>      branches
        28,410,633      branch-misses

       0.996008047 seconds time elapsed
```

In other words, a 2.1% reduction in instruction count on `bz2`.
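
The percentage can be checked from the two `instructions` counters in the `perf stat` output above. A small sketch of the arithmetic, where `percent_reduction` is a hypothetical helper written for this check:

```rust
/// Percentage by which `after` is smaller than `before`.
/// Hypothetical helper; converts to f64 first so a regression
/// (after > before) yields a negative value instead of underflowing.
pub fn percent_reduction(before: u64, after: u64) -> f64 {
    (before as f64 - after as f64) / before as f64 * 100.0
}

/// Reduction in instruction count on `bz2`, using the counters quoted above.
pub fn bz2_instruction_reduction() -> f64 {
    percent_reduction(4_000_521_171, 3_917_061_340)
}
```

The difference is 83,459,831 instructions, about 2.09% of the original count, which rounds to the 2.1% quoted.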
yurydelendik pushed a commit that referenced this pull request Sep 11, 2020
…ance#2174)

* Don't subtract 1 from end_addr in line program writing

Fixes bytecodealliance#2173

* add testcase for end_sequence having offset past retq (#1)

* Update tests/all/debug/translate.rs

Co-authored-by: Gabor Greif <[email protected]>