feat(benchmark): add SLOAD/SSTORE benchmark test with multi-contract support #2256

CPerezz · 2025-10-03T09:20:36Z

🗒️ Description

Add test_sload_empty_erc20_balanceof to benchmark SLOAD operations on non-existing storage slots using ERC20 balanceOf() queries.

The idea of this benchmark is to exploit, within a single contract or a series of N contracts, calls that query non-existing addresses. This way, we force clients to resolve as many tree branches as possible.

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

marioevz

Did a quick pass and it looks good to me overall.

I left a couple of questions as comments. Thanks!

marioevz · 2025-10-03T22:20:23Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+    pre: Alloc,
+    fork: Fork,
+    gas_benchmark_value: int,
+    address_stubs,


I didn't quite expect the fixture to be directly used by the test tbh, it's an interesting workaround!

A couple things:

It will produce unexpected behavior for someone running execute without knowing the inner workings of this test, because it will try to use all the stubs, including those that are meant to be used by other tests.

The test will change its behavior depending on the stubs passed to the parameter in execute, which is not inherently a bad thing, but I think we should really think about it.

Yes. I basically did that because a lot of the tests we will do will just be the same code, but changing the contract against which we run it.
Thus I thought it would not make sense to duplicate code all the time. And instead, I should just reuse the same code for any number of contracts that share interface.

So it's the stubs that actually determine how the contract behaves.

I spoke with @kamilchodola about this and he's also not sure if this way will be the best for his tool. Nevertheless, IMO it's the best in terms of code to maintain and simplicity.

marioevz · 2025-10-03T22:23:05Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+#                 3. Most addresses have zero balance → empty storage slots
+#
+# WHY IT STRESSES CLIENTS:
+#   - Each balanceOf() call forces a cold SLOAD on a likely-empty slot


How different would this be from a benchmark test that optimizes for SLOADs instead of trying to mimic the balanceOf() behavior?
The things that come to mind are the extra jumps, keccak operations, and the fact that balanceOf() (iirc) only does a single SLOAD per subcall.

The problem we have here is the following:

The attack wants to stress calling big contracts and doing path resolution on them (at least usually).

The biggest contracts (that share interface) are ERC20. And balanceOf doesn't really add a ton of overhead.

If we had to deploy contracts that allow us to abuse SLOAD, we would need a ton of time to bloat lots of contracts with the same interface and make them 5-20 GB of storage each.

Thus, for this iteration, it just seems significantly easier to go this route.

Do you mean that, in your benchmarking process, the pre-deployed contracts already contain randomized storage values (such as balances or approvals), and you’re benchmarking the SLOAD operation based on that?

I’ve read the recent state analysis report, which shows that USDT is one of the largest contracts in terms of storage state. In your case, would this be similar to benchmarking state operations using such a large contract as a reference?

They don't contain it. But they target the biggest ERC20 contracts on the chain. And ERC20 gives me a common interface to call them which I exploit here.

Otherwise, I'd need to deploy and bloat contracts all the time to perform these tests (which I might do in the future, but not now).

marioevz · 2025-10-03T22:25:14Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+        # RETURN costs 0 gas
+    )
+
+    num_contracts = len(address_stubs.root)


I think we have to do the pattern (erc20_contract_*) discrimination here to have a proper number.

Why is that? I was under the assumption that all ERC20s share the same interface. Though it's true some might have overridden it, modifying balanceOf does not seem to make sense.

Could you elaborate a bit?

For example, if we run execute for all tests, and therefore pass stub contracts that are needed for other tests (like xen_contract for example), these other contracts are going to be included in address_stubs unconditionally, and we are going to try to send a balanceOf to those other contracts that are not ERC20 contracts.
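The concern can be sketched as follows. This is only an illustration: it assumes the stubs behave like a name-to-address mapping (the diff iterates `address_stubs.root` by name), and `filter_erc20_stubs` as well as the sample names and addresses are hypothetical, not part of the framework.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for address_stubs.root: stub name -> contract address.
stubs = {
    "erc20_contract_a": "0x" + "11" * 20,
    "erc20_contract_b": "0x" + "22" * 20,
    "xen_contract": "0x" + "33" * 20,  # needed by another test, not an ERC20 target
}


def filter_erc20_stubs(
    all_stubs: dict[str, str], pattern: str = "erc20_contract_*"
) -> dict[str, str]:
    """Keep only the stubs whose name matches the ERC20 naming pattern."""
    return {
        name: address
        for name, address in all_stubs.items()
        if fnmatchcase(name, pattern)
    }


erc20_stubs = filter_erc20_stubs(stubs)
num_contracts = len(erc20_stubs)  # counts only the ERC20 stubs
```

Without the filter, `len(stubs)` would also count `xen_contract`, so the gas budget would be split three ways and one share would be spent sending balanceOf to a non-ERC20 contract.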

marioevz · 2025-10-03T22:25:37Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+    # In execute mode: stubs point to already-deployed contracts on chain
+    # In fill mode: empty bytecode is deployed as placeholder
+    erc20_addresses = []
+    for stub_name in address_stubs.root:


This loop also needs to discriminate using the pattern.

I have the same question here. Can you elaborate on this a bit?

CPerezz · 2025-10-06T07:43:44Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+    BloatNet SSTORE benchmark using ERC20 approve to write to storage.
+
+    This test:
+    1. Auto-discovers ERC20 contracts from stubs (pattern: erc20_contract_*)


I think we might not need the pattern at all.

LouisTsai-Csie

Some small suggestions and questions; feel free to ignore these comments if they don't make sense!

LouisTsai-Csie · 2025-10-06T09:22:53Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+                    + Op.MSTORE(offset=64, value=Op.MLOAD(0))
+                    # Store amount at memory[96] (use counter as amount)
+                    + Op.MSTORE(offset=96, value=Op.MLOAD(0))


Just a random idea, not a proposed change, but maybe Op.GAS could also be used for the spender address and amount? It’s non-sequential and might be slightly cheaper than MLOAD(0).

Note that this won't work, because here MLOAD(0) is acting as both:

- the loop counter (decremented each iteration)
- the address/spender value

Using GAS would give non-deterministic values, and it seems much harder to make deterministic IMO.

LMK if I misunderstood you.

Add test_sload_empty_erc20_balanceof to benchmark SLOAD operations on non-existing storage slots using ERC20 balanceOf() queries. The idea of this benchmark is to exploit, within a single contract or a series of N contracts, calls that query non-existing addresses. This way, we force clients to resolve as many tree branches as possible.

Add test_sstore_erc20_approve that benchmarks SSTORE operations by calling approve(spender, amount) on pre-deployed ERC20 contracts.

Follows the same pattern as the SLOAD benchmark:
- Auto-discovers ERC20 contracts from stubs
- Splits gas budget evenly across all discovered contracts
- Uses counter as both spender address and amount
- Forces SSTOREs to allowance mapping storage slots

The test measures client performance when writing to many storage slots across multiple contracts, stressing state-handling write operations.

Fixed gas calculation for test_sstore_erc20_approve to ensure accurate gas usage prediction and prevent transaction reverts.

Key fixes:
- Added memory expansion cost (15 gas per contract)
- Corrected G_LOW gas values in comments (5 gas, not 3)
- Separated per-contract overhead from per-iteration costs
- Improved cost calculation clarity with detailed opcode breakdown

Gas calculation (10M gas, 3 contracts):
- Intrinsic: 21,000
- Overhead per contract: 38
- Cost per iteration: 20,226
- Calls per contract: 164
- Expected gas used: 9,972,306 (99.72% utilization)
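The figures quoted in this commit message can be reproduced with a few lines of integer arithmetic. The sketch below is one arrangement of the calculation that matches the quoted numbers; the function names are hypothetical and the actual test may structure the computation differently.

```python
def calls_per_contract(
    gas_budget: int,
    num_contracts: int,
    intrinsic: int,
    overhead_per_contract: int,
    cost_per_iteration: int,
) -> int:
    """Number of loop iterations each contract can afford (rounded down)."""
    gas_per_contract = (gas_budget - intrinsic) // num_contracts
    return (gas_per_contract - overhead_per_contract) // cost_per_iteration


def expected_gas_used(
    num_contracts: int,
    intrinsic: int,
    overhead_per_contract: int,
    cost_per_iteration: int,
    calls: int,
) -> int:
    """Total gas: intrinsic cost plus overhead and loop cost for every contract."""
    per_contract = overhead_per_contract + calls * cost_per_iteration
    return intrinsic + num_contracts * per_contract


# Values quoted in the commit message (10M gas budget, 3 contracts).
GAS_BUDGET = 10_000_000
calls = calls_per_contract(GAS_BUDGET, 3, 21_000, 38, 20_226)
gas_used = expected_gas_used(3, 21_000, 38, 20_226, calls)
utilization_pct = round(100 * gas_used / GAS_BUDGET, 2)

print(calls, gas_used, utilization_pct)  # 164 9972306 99.72
```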

…atios

Add test_mixed_sload_sstore to test_multi_opcode.py that combines SLOAD and SSTORE operations with parameterized gas distribution ratios (50-50, 70-30, 90-10).

The test stresses clients with mixed read/write workloads by:
- Dividing gas budget evenly across all discovered ERC20 contract stubs
- Splitting each contract's allocation by the specified percentage ratio
- Executing balanceOf (cold SLOAD on empty slots) for the SLOAD portion
- Executing approve (SSTORE to new allowance slots) for the SSTORE portion

Verified gas calculations for 10M gas budget with 3 contracts (50-50 ratio):
- SLOAD operations: ~2,312 gas/iteration → 719 calls per contract
- SSTORE operations: ~20,226 gas/iteration → 82 calls per contract
- Total operations: 2,403 state operations (2,157 SLOADs + 246 SSTOREs)
- Gas usage: 9.98M / 10M (16K buffer, no out-of-gas errors)

This benchmark enables testing different read/write ratios to identify client performance characteristics under varying state operation mixes.
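As a rough check of the 50-50 figures, the split can be sketched as below. This is an approximation under stated assumptions: it uses the rounded per-iteration costs from the commit message, ignores per-contract setup overhead, and `split_calls` is a hypothetical helper rather than code from the test.

```python
def split_calls(
    gas_budget: int,
    num_contracts: int,
    intrinsic: int,
    sload_pct: int,
    sload_cost: int,
    sstore_cost: int,
) -> tuple[int, int]:
    """Return (balanceOf calls, approve calls) per contract for a given ratio."""
    gas_per_contract = (gas_budget - intrinsic) // num_contracts
    sload_gas = gas_per_contract * sload_pct // 100
    sstore_gas = gas_per_contract - sload_gas
    return sload_gas // sload_cost, sstore_gas // sstore_cost


GAS_BUDGET = 10_000_000
INTRINSIC = 21_000
NUM_CONTRACTS = 3
SLOAD_COST = 2_312    # approximate cost per balanceOf iteration (from the commit message)
SSTORE_COST = 20_226  # approximate cost per approve iteration (from the commit message)

sload_calls, sstore_calls = split_calls(
    GAS_BUDGET, NUM_CONTRACTS, INTRINSIC, 50, SLOAD_COST, SSTORE_COST
)
total_sloads = NUM_CONTRACTS * sload_calls
total_sstores = NUM_CONTRACTS * sstore_calls
gas_used = INTRINSIC + total_sloads * SLOAD_COST + total_sstores * SSTORE_COST

print(sload_calls, sstore_calls)                            # 719 82
print(total_sloads + total_sstores, GAS_BUDGET - gas_used)  # 2403 16420
```

The same helper can be called with `sload_pct=70` or `sload_pct=90` to explore the other parameterized ratios.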

…back

Address review comments by optimizing loop efficiency:

1. Move function selector MSTORE outside loops (Comment ethereum#2)
   - BALANCEOF_SELECTOR and APPROVE_SELECTOR now stored once per contract
   - Saves 3 gas (G_VERY_LOW) per iteration
   - Total savings: ~6,471 gas for 50-50 ratio with 10M budget and 3 contracts
2. Remove unused return data from CALL operations (Comment ethereum#1)
   - Changed ret_offset=96/128, ret_size=32 to ret_offset=0, ret_size=0
   - Eliminates unnecessary memory expansion
   - Minor gas savings, cleaner implementation

Skipped Comment ethereum#3 (use Op.GAS for addresses):
- Would lose determinism (GAS varies per iteration)
- Adds complexity for minimal benefit
- Counter still needed for loop control

Changes applied to:
- test_sload_empty_erc20_balanceof
- test_sstore_erc20_approve
- test_mixed_sload_sstore (both SLOAD and SSTORE loops)

LouisTsai-Csie

@CPerezz, I’ve left some suggestions. Please take a look and let me know if they’re unclear or not practical. These changes might not reduce gas usage much, but I wonder if they could help simplify the layout a bit.

I've not yet reviewed test_multi_opcode.py, but I believe it would be quick if we have consensus on the other test cases!

LouisTsai-Csie · 2025-10-07T03:26:30Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+    )
+
+    # Build attack code that loops through each contract
+    attack_code: Bytecode = Op.JUMPDEST  # Entry point


I have a suggestion for the attack loop: (1) the current memory layout duplicates the same counter value in two locations (e.g. MEM[0] and MEM[64]). (2) The memory storage for balance selector could further be taken out of the for-loop, as it is always a constant.

    # The selector is constant, so it does not need to be stored inside the for loop
    attack_code = Op.MSTORE(offset=0, value=BALANCE_SELECTOR)
    for erc20_address in erc20_addresses:
        attack_code += Op.MSTORE(offset=32, value=calls_per_contract) + While(
            condition=Op.MLOAD(32) + Op.ISZERO + Op.ISZERO,  # Continue while counter > 0
            body=(
                Op.CALL(
                    address=erc20_address,
                    args_offset=28,
                    args_size=36,
                )
                + Op.POP
                + Op.MSTORE(offset=32, value=Op.SUB(Op.MLOAD(32), 1))
            ),
        )

In this implementation, we use MEM[32] for the counter, and only store the balance selector once. Do you think this works in the current scenario?

The offset of CALL's parameter might be slightly different, please see comments below

LouisTsai-Csie · 2025-10-07T03:41:38Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+                    + Op.CALL(
+                        address=erc20_address,
+                        value=0,
+                        args_offset=32,
+                        args_size=36,
+                        ret_offset=0,
+                        ret_size=0,
+                    )


IIUC, a typical Solidity calldata pattern consists of a function selector followed by ABI-encoded arguments.

Considering only the first iteration of the while loop, the memory layout would be:

    MSTORE(0, counter)
    MSTORE(32, BALANCE_SELECTOR)
    MSTORE(64, counter)

Assuming the counter value is 3, I tried out this memory sequence on evm.codes (the playground with mnemonic input):

    PUSH4 0x70A08231
    PUSH1 0x20
    MSTORE
    PUSH2 0x0003
    PUSH0
    MSTORE
    PUSH0
    MLOAD
    PUSH1 0x40
    MSTORE

And I get this memory layout:

    0000000000000000000000000000000000000000000000000000000000000003
    0000000000000000000000000000000000000000000000000000000070a08231
    0000000000000000000000000000000000000000000000000000000000000003

It seems each stored word is left-padded with zeros, so the 4-byte selector sits in the last 4 bytes of its 32-byte word. The correct starting offset for the external call might therefore be 32 + 32 - 4 = 60, rather than 32.

Similarly, the starting offset of external call for the previous comment is 28, not 0.
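The offset arithmetic can be checked with a small simulation of EVM memory. This is a minimal sketch, not framework code: `mstore` is a hypothetical helper that mimics MSTORE by writing a 32-byte big-endian word, and the counter value 3 is taken from the example above.

```python
BALANCEOF_SELECTOR = 0x70A08231  # balanceOf(address)


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Mimic MSTORE: write `value` as a 32-byte big-endian word at `offset`."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))  # zero-fill newly touched memory
    memory[offset:end] = value.to_bytes(32, "big")


selector_bytes = BALANCEOF_SELECTOR.to_bytes(4, "big")
counter_word = (3).to_bytes(32, "big")

# Layout from the diff: counter at 0, selector at 32, counter copy at 64.
mem = bytearray()
mstore(mem, 0, 3)
mstore(mem, 32, BALANCEOF_SELECTOR)
mstore(mem, 64, 3)

calldata_at_32 = bytes(mem[32:32 + 36])  # args_offset=32: starts with zero padding
calldata_at_60 = bytes(mem[60:60 + 36])  # args_offset=60: selector comes first

# Suggested layout: selector at 0, single counter at 32, args_offset=28.
mem2 = bytearray()
mstore(mem2, 0, BALANCEOF_SELECTOR)
mstore(mem2, 32, 3)
calldata_at_28 = bytes(mem2[28:28 + 36])

print(calldata_at_32[:4].hex())  # 00000000
print(calldata_at_60[:4].hex())  # 70a08231
print(calldata_at_28 == calldata_at_60)  # True
```

With `args_offset=32` the first four calldata bytes are zero padding, so the callee would not see the balanceOf selector; offsets 60 (original layout) and 28 (suggested layout) both yield the selector followed by the 32-byte argument.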

LouisTsai-Csie · 2025-10-07T03:54:10Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+                    + Op.CALL(
+                        address=erc20_address,
+                        value=0,
+                        args_offset=32,
+                        args_size=68,  # 4 bytes selector + 32 bytes spender + 32 bytes amount
+                        ret_offset=0,
+                        ret_size=0,
+                    )


I have the same question about the args_offset here, should it start from 60, not 32 here?
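The same reasoning extends to the 68-byte approve call. The sketch below assumes the selector word is stored at MEM[32] (inferred from `args_offset=32` in the diff, not shown directly), with the spender at MEM[64] and the amount at MEM[96] as in the earlier diff; `mstore` is a hypothetical helper mimicking MSTORE.

```python
APPROVE_SELECTOR = 0x095EA7B3  # approve(address,uint256)


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Mimic MSTORE: write `value` as a 32-byte big-endian word at `offset`."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))  # zero-fill newly touched memory
    memory[offset:end] = value.to_bytes(32, "big")


counter = 3
mem = bytearray()
mstore(mem, 0, counter)            # loop counter
mstore(mem, 32, APPROVE_SELECTOR)  # assumed selector location
mstore(mem, 64, counter)           # spender (counter reused as address)
mstore(mem, 96, counter)           # amount (counter reused as amount)

# 4-byte selector + 32-byte spender + 32-byte amount = 68 bytes, starting at 60.
approve_calldata = bytes(mem[60:60 + 68])
expected = (
    APPROVE_SELECTOR.to_bytes(4, "big")
    + counter.to_bytes(32, "big")
    + counter.to_bytes(32, "big")
)
print(approve_calldata == expected)  # True
```

Starting the slice at 32 instead would shift every field by 28 bytes, so neither the selector nor the arguments would decode as intended.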

LouisTsai-Csie · 2025-10-07T03:55:12Z

tests/benchmark/stateful/bloatnet/test_single_opcode.py

+    )
+
+    # Build attack code that loops through each contract
+    attack_code: Bytecode = Op.JUMPDEST  # Entry point


Same suggestion here:

Do you think we could simplify the memory layout here?

Could we move the SELECTOR memory operation out of the for-loop?

…alldata encoding
- Move selector MSTORE outside for-loop (saves gas per contract)
- Use single counter at MEM[32] instead of duplicate at MEM[0] and MEM[64]
- Fix calldata encoding by using args_offset=28 for correct ABI format
- Selector now properly positioned at start of calldata

…calldata encoding
- Move selector MSTORE outside for-loop (saves gas per contract)
- Use single counter at MEM[32] instead of duplicate at MEM[0]
- Fix calldata encoding by using args_offset=28 for correct ABI format
- Selector now properly positioned at start of calldata

…x calldata encoding
- Move selectors MSTORE outside for-loop (saves gas per contract)
- Use separate memory regions for balanceOf and approve to avoid conflicts
- Fix calldata encoding by using correct args_offset for proper ABI format
- Selectors now properly positioned at start of calldata

…stently
- Reuse MEM[0] for both selectors (sequential operations, no conflict)
- Reuse MEM[32] for both counters (balanceOf then approve)
- Reuse MEM[64] and MEM[96] for parameters
- Consistent args_offset=28 for both operations (was 28 and 128)
- Matches single-opcode test pattern for easier understanding
- Reduces memory footprint from 196 bytes to 96 bytes

marioevz reviewed Oct 3, 2025

CPerezz commented Oct 6, 2025

LouisTsai-Csie reviewed Oct 6, 2025

CPerezz added 5 commits October 6, 2025 17:42

CPerezz force-pushed the feat/bloatnet-sload-sstore-benchmarks branch from e0ae1ee to 552638e Compare October 6, 2025 15:48

CPerezz requested review from marioevz and LouisTsai-Csie October 6, 2025 15:54

LouisTsai-Csie requested changes Oct 7, 2025

LouisTsai-Csie mentioned this pull request Oct 7, 2025

Add automatic gas cost calculation for opcode sequences #2273

Open

CPerezz changed the title ~~feat(benchmark): add SLOAD benchmark test with multi-contract support~~ feat(benchmark): add SLOAD/SSTORE benchmark test with multi-contract support Oct 7, 2025

CPerezz added 4 commits October 8, 2025 14:25
