perf(interpreter): improve i256 instructions #630

DaniPopes · 2023-08-22T18:23:21Z

Split off from #582

Summary:

i256_sign:
- is split into its two previous variants (formerly selected by the const bool), so we can use it with an immutable reference, avoiding copies in i256_cmp (see the slt and sgt impls). i256_sign_compl gets optimized away like i256_sign::<true> did before.
- by sorting the Sign variants we can leverage the derived Eq and Ord impls, which can be further optimized by the compiler. Also, by using -1, 0, 1 as discriminants, Sign has the exact same values as cmp::Ordering and the signum function result, which helps in the sign function (where we transmute num != 0 directly to Sign).

i256_div: an if condition generates better code than a fully exhaustive match (branchless vs. multiple branches). I don't know why; I would assume the two to be identical ... godbolt: https://godbolt.org/z/939obPrsj

Other small changes are for readability: overflowing_*().0 is equivalent to wrapping_*(), and >> is equivalent to wrapping_shr.
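
For readers following along, here is a minimal sketch of the Sign layout described above. It is not the code from the PR: u64 stands in for U256, and the function name `sign` is hypothetical. It only illustrates that with discriminants -1, 0, 1 the derived ordering matches cmp::Ordering and signum, and that a bool can be reinterpreted as Zero or Plus.

```rust
/// Sign of a two's-complement integer. The discriminants are chosen to match
/// `core::cmp::Ordering` (Less = -1, Equal = 0, Greater = 1) and `signum`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(i8)]
pub enum Sign {
    Minus = -1,
    Zero = 0,
    Plus = 1,
}

/// Hypothetical stand-in for the sign function: u64 replaces U256 here.
/// Takes an immutable reference, so the caller's value is never copied or modified.
pub fn sign(val: &u64) -> Sign {
    if *val >> 63 == 1 {
        Sign::Minus
    } else {
        // SAFETY: `bool` and `Sign` (repr(i8)) are both one byte wide;
        // false is 0 (Zero) and true is 1 (Plus), both valid discriminants.
        unsafe { core::mem::transmute::<bool, Sign>(*val != 0) }
    }
}
```

A plain `if`/`else` returning `Sign::Plus` or `Sign::Zero` would be the safe equivalent of the transmute; whether the compiler emits the same code for both is not something this sketch demonstrates.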

gakonst

overall supportive, defer to dragan for more

crates/interpreter/src/instructions/bitwise.rs

gakonst · 2023-08-22T19:47:48Z

crates/interpreter/src/instructions/bitwise.rs

-            Sign::Minus => {
-                let shifted = ((op2.overflowing_sub(U256::from(1)).0) >> shift)
-                    .overflowing_add(U256::from(1))
-                    .0;
-                two_compl(shifted)
-            }
+            Sign::Plus | Sign::Zero => op2.wrapping_shr(shift),
+            Sign::Minus => two_compl(op2.wrapping_sub(ONE).wrapping_shr(shift).wrapping_add(ONE)),


Seems fine if we are OK with going from overflowing -> wrapping

rakita · 2023-08-26T22:41:51Z

We can simplify this to ((op2-ONE)>>shift)+1, as the checks for op2 == zero and op1 >= 256 are already done, and shr calls overflowing_shr in the background inside ruint.

DaniPopes · 2023-08-26

Makes sense given that ruint's operators are overloaded to be wrapping, but that might not be obvious without this knowledge. I wrote them with explicit wrapping methods; normally I'd use operators, but this is one of the very few instances where I'd rather use explicit methods.

rakita · 2023-08-27

wrapping_* are nicer here than overflowing_*, so I will just merge the PR.
But I am curious why you think this is one of those instances, bearing in mind that we already check for op2 == zero and op1 >= 256.
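
To make the point about the existing checks concrete, here is a rough sketch of the same shape on a 64-bit stand-in (u64 instead of U256, so the bound is 64 rather than 256). The function name `sar` and its layout are assumptions for illustration, not the interpreter's code. Once the value is known to be negative and non-zero and the shift is in range, neither the subtraction nor the addition can overflow, so plain operators and wrapping methods give the same result.

```rust
/// Arithmetic shift right of a two's-complement value, with u64 standing in for U256.
/// Hypothetical illustration: handle the sign separately and shift the magnitude.
pub fn sar(value: u64, shift: u32) -> u64 {
    let negative = value >> 63 == 1;
    // Counterpart of the `op2 == zero` and `op1 >= 256` checks mentioned in the thread.
    if value == 0 || shift >= 64 {
        return if negative { u64::MAX } else { 0 };
    }
    if !negative {
        return value >> shift;
    }
    // Two's complement of a negative value: a magnitude in 1..=2^63.
    let magnitude = value.wrapping_neg();
    // magnitude >= 1, so `- 1` cannot underflow; the shifted value is at most
    // 2^63 - 1, so `+ 1` cannot overflow. Plain operators are therefore safe here.
    let rounded_up = ((magnitude - 1) >> shift) + 1;
    // Negate again to get back to the two's-complement representation.
    rounded_up.wrapping_neg()
}
```

Note that on Rust's primitive integers `-` and `+` panic on overflow in debug builds, which is different from the wrapping operators the thread attributes to ruint; the sketch relies on the checks, not on wrapping.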

crates/interpreter/src/instructions/i256.rs

gakonst · 2023-08-22T19:51:03Z

crates/interpreter/src/instructions/i256.rs

+    match first_sign.cmp(&second_sign) {
+        // note: adding `if first_sign != Sign::Zero` to short circuit zero comparisons performs
+        // slower on average, as of #582
+        Ordering::Equal => first.cmp(second),
+        o => o,
    }
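
A self-contained sketch of the comparison in this hunk, again with u64 standing in for U256 and with hypothetical names (`sign`, `signed_cmp`). The idea it illustrates: compare the signs first, and only when they are equal fall back to an unsigned comparison, which agrees with the signed order for two's-complement values of the same sign.

```rust
use core::cmp::Ordering;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Sign {
    Minus = -1,
    Zero = 0,
    Plus = 1,
}

/// Sign of a two's-complement value (u64 stand-in for U256).
fn sign(val: &u64) -> Sign {
    if *val >> 63 == 1 {
        Sign::Minus
    } else if *val != 0 {
        Sign::Plus
    } else {
        Sign::Zero
    }
}

/// Signed comparison built from the derived `Ord` on `Sign`.
pub fn signed_cmp(first: &u64, second: &u64) -> Ordering {
    match sign(first).cmp(&sign(second)) {
        // Same sign: the unsigned order of the raw bits is also the signed order.
        Ordering::Equal => first.cmp(second),
        // Different signs: the sign order alone decides.
        o => o,
    }
}
```

Both arguments are taken by reference and never modified, which is what makes the non-mutating sign function useful here.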


rakita

lgtm! Had a few small nits, really like what you did with Sign

rakita · 2023-08-25T10:24:14Z

crates/interpreter/src/instructions/bitwise.rs

-                    .0;
-                two_compl(shifted)
-            }
+            Sign::Plus | Sign::Zero => op2.wrapping_shr(shift),


The shift size is already checked, so it is okay to use *op2 >> shift without wrapping_shr:
https://doc.rust-lang.org/std/primitive.u8.html#method.wrapping_shl

wrapping_shr calls overflowing_shr, and even the ordinary shr calls wrapping_shr.
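
As a small aside on the linked documentation: it describes Rust's primitive integers, where the shift variants only differ once the amount reaches the bit width. The helper below (the name `shr_variants` is made up for this note) shows that for u8. Whether ruint's U256 behaves identically at the boundary is not something this thread establishes, so treat this as an illustration of the std semantics only.

```rust
/// Returns the results of the three std shift-right variants for a u8.
/// For in-range shift amounts they all agree; they differ only when
/// `shift` is greater than or equal to the bit width (8).
pub fn shr_variants(x: u8, shift: u32) -> (u8, (u8, bool), Option<u8>) {
    (
        x.wrapping_shr(shift),    // masks the shift amount to the bit width
        x.overflowing_shr(shift), // same value, plus a flag if the amount was out of range
        x.checked_shr(shift),     // None if the amount was out of range
    )
}
```

With a shift amount that has already been validated, all three variants agree, which is the point made above.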

DaniPopes · 2023-08-27T00:00:36Z

Noticed one more thing (a774b29): we can drop the zero check in div and mod after doing the operation, since dropping it allows for more optimizations at function return and results in better codegen overall.
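
A minimal sketch of why a zero check after the division is redundant, assuming the check was an early return for a zero quotient (the commit itself is not shown here). u64 stands in for U256 and `sdiv` is a hypothetical name. The two's complement of zero is zero, so negating a zero quotient is harmless and no special case is needed.

```rust
/// Signed division on two's-complement values, with u64 standing in for U256.
/// Division by zero yields zero, as in the EVM's SDIV.
pub fn sdiv(first: u64, second: u64) -> u64 {
    if second == 0 {
        return 0;
    }
    let first_neg = first >> 63 == 1;
    let second_neg = second >> 63 == 1;
    // Work on magnitudes; wrapping_neg is the two's complement.
    let first_mag = if first_neg { first.wrapping_neg() } else { first };
    let second_mag = if second_neg { second.wrapping_neg() } else { second };
    let quotient = first_mag / second_mag;
    // No `quotient == 0` check: 0.wrapping_neg() == 0, so the sign fix-up below
    // already returns the right value. A single `if` on "signs differ" replaces
    // an exhaustive match over both signs.
    if first_neg != second_neg {
        quotient.wrapping_neg()
    } else {
        quotient
    }
}
```

The minimum value divided by -1 also falls out without a special case in this stand-in: the magnitude 2^63 divided by 1 is 2^63, which is the bit pattern of the minimum value again.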

* perf(interpreter): improve i256 instructions
* chore: remove unused code, address review
* perf: drop zero check after dividing

perf(interpreter): improve i256 instructions

075684e

gakonst approved these changes Aug 22, 2023

DaniPopes mentioned this pull request Aug 25, 2023

perf: refactor interpreter internals (take 2) #582

Merged

rakita approved these changes Aug 26, 2023

DaniPopes added 2 commits August 27, 2023 01:56

chore: remove unused code, address review

d6c485a

perf: drop zero check after dividing

a774b29

rakita merged commit 37b0192 into bluealloy:main Aug 27, 2023

DaniPopes deleted the ir-i256 branch August 27, 2023 10:10

Evalir pushed a commit to Evalir/revm that referenced this pull request Sep 14, 2023

perf(interpreter): improve i256 instructions (bluealloy#630)

6c2aa26

* perf(interpreter): improve i256 instructions
* chore: remove unused code, address review
* perf: drop zero check after dividing
