perf(interpreter): improve i256 instructions #630
gakonst
left a comment
overall supportive, defer to dragan for more
Sign::Minus => {
    let shifted = ((op2.overflowing_sub(U256::from(1)).0) >> shift)
        .overflowing_add(U256::from(1))
        .0;
    two_compl(shifted)
}
Sign::Plus | Sign::Zero => op2.wrapping_shr(shift),
Sign::Minus => two_compl(op2.wrapping_sub(ONE).wrapping_shr(shift).wrapping_add(ONE)),
Seems fine if we are OK with going from overflowing -> wrapping
We can simplify this to ((op2 - ONE) >> shift) + 1, since the checks for op2 == zero and op1 >= 256 are already done and shr calls overflowing_shr in the background inside ruint.
That makes sense given that ruint's operators are overloaded to be wrapping, but it might not be obvious without that knowledge. I wrote them with explicit wrapping. Normally I'd use operators, but this is one of the very few instances where I'd rather use explicit methods.
wrapping_* are nicer here than overflowing_*, so I will just merge the PR.
But I am curious why you think this is one of those instances, bearing in mind that we already check for op2 == zero and op1 >= 256.
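For readers following the thread, here is a minimal sketch of the magnitude-based arithmetic shift right that the diff above implements. It is an illustration only: `u64` stands in for `U256`, and the name `sar_u64` is hypothetical, not taken from the PR. The negative branch computes -(((|v| - 1) >> shift) + 1), which rounds toward negative infinity the way a signed shift does.

```rust
/// Arithmetic shift right on a 64-bit two's-complement word, computed on the
/// magnitude of the value. `u64` is a stand-in for `U256` (assumption).
fn sar_u64(value: u64, shift: u32) -> u64 {
    let negative = (value >> 63) == 1;
    // Out-of-range shifts saturate to 0, or to all ones (-1) for negatives.
    if shift >= 64 {
        return if negative { u64::MAX } else { 0 };
    }
    if !negative {
        return value >> shift;
    }
    // Two's complement gives |value|; it is at least 1 because value is negative.
    let magnitude = value.wrapping_neg();
    // ((|v| - 1) >> shift) + 1, written with explicit wrapping methods.
    let shifted = magnitude.wrapping_sub(1).wrapping_shr(shift).wrapping_add(1);
    // Negate again to get back to the two's-complement representation.
    shifted.wrapping_neg()
}
```

Because the magnitude is nonzero and the shift is in range by the time the negative branch runs, none of the wrapping calls can actually wrap in this sketch, which is the point raised in the review: plain operators would behave the same here.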
match first_sign.cmp(&second_sign) {
    // note: adding `if first_sign != Sign::Zero` to short circuit zero comparisons performs
    // slower on average, as of #582
    Ordering::Equal => first.cmp(second),
    o => o,
}
seems fine
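As a rough illustration of the comparison in the snippet above, the following sketch compares two's-complement words by sign first and falls back to an unsigned comparison only when the signs match. It assumes `u64` as a stand-in for `U256`; `sign_of` and `signed_cmp` are hypothetical names, and the safe `if` chain replaces whatever the PR does internally to compute the sign.

```rust
use std::cmp::Ordering;

/// Variant order (and discriminants) make the derived `Ord` sort
/// Minus < Zero < Plus.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Sign {
    Minus = -1,
    Zero = 0,
    Plus = 1,
}

/// Sign of a two's-complement word; the top bit marks negatives.
fn sign_of(word: u64) -> Sign {
    if word >> 63 == 1 {
        Sign::Minus
    } else if word == 0 {
        Sign::Zero
    } else {
        Sign::Plus
    }
}

/// Signed comparison built from the derived `Ord` on `Sign`.
fn signed_cmp(first: u64, second: u64) -> Ordering {
    match sign_of(first).cmp(&sign_of(second)) {
        // Same sign: the raw unsigned order agrees with the signed order.
        Ordering::Equal => first.cmp(&second),
        o => o,
    }
}
```

When both signs are equal, two's-complement words sort the same way signed and unsigned, so no extra branch for zero is needed for correctness.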
rakita
left a comment
lgtm! Had a few small nits, really like what you did with Sign
    .0;
    two_compl(shifted)
}
Sign::Plus | Sign::Zero => op2.wrapping_shr(shift),
The shift size is already checked, so it is okay to use *op2 >> shift without wrapping_shr:
https://doc.rust-lang.org/std/primitive.u8.html#method.wrapping_shl
wrapping_shr calls overflowing_shr, and even the ordinary shr calls wrapping_shr.
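To make the linked standard-library behaviour concrete, here is a small sketch on `u8` showing how the shift variants relate. It only covers the std primitive from the link; how ruint implements its own shifts is described in the comment above and is not asserted here. The helper name `shr_variants` is hypothetical.

```rust
/// Returns the results of the three std shift-right variants on a `u8`:
/// (wrapping_shr, overflowing_shr, checked_shr).
fn shr_variants(value: u8, shift: u32) -> (u8, (u8, bool), Option<u8>) {
    (
        // Masks the shift amount to the bit width (shift & 7 for u8).
        value.wrapping_shr(shift),
        // Same masked result, plus a flag telling whether masking happened.
        value.overflowing_shr(shift),
        // None when the shift amount is out of range.
        value.checked_shr(shift),
    )
}
```

For an in-range shift such as `shr_variants(0x80, 3)` all variants agree with `0x80u8 >> 3`. They only diverge once the shift reaches the bit width, which is why an up-front range check makes the plain operator safe.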
Noticed one more thing (a774b29): we can drop the zero check in div and mod after doing the operation, since that allows more optimizations at function return and results in better codegen overall.
* perf(interpreter): improve i256 instructions
* chore: remove unused code, address review
* perf: drop zero check after dividing
Split off from #582
Summary:
* i256_sign: … i256_cmp (see slt, sgt impls). i256_sign_compl gets optimized away like i256_sign::<true> before. … Sign variants we can leverage derived Eq, Ord impls, which can be further optimized by the compiler. Also, by using -1, 0, 1 the values are exactly those of cmp::Ordering and of the signum function result, which helps in the sign function (where we transmute num != 0 directly to Sign).
* i256_div: an if condition generates better code than a fully exhaustive match (branchless vs multiple branches). I don't know why; I would assume the two to be identical. godbolt: https://godbolt.org/z/939obPrsj
* overflowing().0 == wrapping(), >> == wrapping_shr
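The last two claims in the summary can be checked on std primitives with a short sketch. Everything here is illustrative: the `Sign` definition mirrors the -1, 0, 1 layout described above but is not copied from the PR, `sign_of_magnitude` and `wrapping_matches_overflowing` are hypothetical names, and a safe `match` is used where the summary mentions a transmute.

```rust
/// Discriminants chosen to line up with `cmp::Ordering` and `signum`.
#[repr(i8)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Sign {
    Minus = -1,
    Zero = 0,
    Plus = 1,
}

/// Sign of a non-negative magnitude. `(num != 0) as i8` is 0 or 1, which is
/// exactly the discriminant of `Zero` or `Plus`.
fn sign_of_magnitude(num: u64) -> Sign {
    match (num != 0) as i8 {
        0 => Sign::Zero,
        _ => Sign::Plus,
    }
}

/// `overflowing_*().0` and `wrapping_*()` yield the same value; the
/// overflowing form only adds a flag that the old code discarded.
fn wrapping_matches_overflowing(a: u64, b: u64) -> bool {
    a.overflowing_sub(b).0 == a.wrapping_sub(b)
        && a.overflowing_add(b).0 == a.wrapping_add(b)
}
```

Casting each `Sign` variant and the matching `std::cmp::Ordering` variant to `i8` gives the same number, which is what lets the compiler treat the two interchangeably in a sketch like this.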