Skip to content

Use packed representation for StoredNibbles and StoredNibblesSubkey #17843

@Rjected

Description

@Rjected

Describe the feature

Right now StoredNibbles and StoredNibblesSubkey have inefficient encodings.

First, StoredNibbles uses the unpacked representation on disk:

#[cfg(any(test, feature = "reth-codec"))]
impl reth_codecs::Compact for StoredNibbles {
fn to_compact<B>(&self, buf: &mut B) -> usize
where
B: bytes::BufMut + AsMut<[u8]>,
{
for i in self.0.iter() {
buf.put_u8(i);
}
self.0.len()
}
fn from_compact(mut buf: &[u8], len: usize) -> (Self, &[u8]) {
use bytes::Buf;
let nibbles = &buf[..len];
buf.advance(len);
(Self(Nibbles::from_nibbles_unchecked(nibbles)), buf)
}
}

Next, StoredNibblesSubkey pads everything to 65 bytes total, with the format being unpacked nibbles || zeros (up to 64 bytes) || len:

#[cfg(any(test, feature = "reth-codec"))]
impl reth_codecs::Compact for StoredNibblesSubKey {
fn to_compact<B>(&self, buf: &mut B) -> usize
where
B: bytes::BufMut + AsMut<[u8]>,
{
assert!(self.0.len() <= 64);
// right-pad with zeros
for i in self.0.iter() {
buf.put_u8(i);
}
static ZERO: &[u8; 64] = &[0; 64];
buf.put_slice(&ZERO[self.0.len()..]);
buf.put_u8(self.0.len() as u8);
64 + 1
}
fn from_compact(buf: &[u8], _len: usize) -> (Self, &[u8]) {
let len = buf[64] as usize;
(Self(Nibbles::from_nibbles_unchecked(&buf[..len])), &buf[65..])
}
}

We can improve this by using an efficient encoding for both values on-disk. This would be len || packed nibbles.

Here are some stats on path length in existing mainnet account tables:

Length Count Percentage
1 16 0.00%
2 256 0.00%
3 4096 0.02%
4 65536 0.27%
5 1048576 4.33%
6 16726496 69.11%
7 6315318 26.09%
8 41590 0.17%
9 167 0.00%
10 2 0.00%
--- Most Common Nibble Lengths ---
1. Length 6 nibbles: 16726496 entries (69.11%)
2. Length 7 nibbles: 6315318 entries (26.09%)
3. Length 5 nibbles: 1048576 entries (4.33%)
4. Length 4 nibbles: 65536 entries (0.27%)
5. Length 8 nibbles: 41590 entries (0.17%)

And here is the distribution for storage tries:

Length Count Percentage
1 6502357 5.47%
2 17094720 14.38%
3 32187105 27.08%
4 27030971 22.74%
5 25716256 21.63%
6 10112317 8.51%
7 219679 0.18%
8 974 0.00%
9 5 0.00%
--- Most Common Nibble Lengths ---
1. Length 3 nibbles: 32187105 entries (27.08%)
2. Length 4 nibbles: 27030971 entries (22.74%)
3. Length 5 nibbles: 25716256 entries (21.63%)
4. Length 2 nibbles: 17094720 entries (14.38%)
5. Length 6 nibbles: 10112317 entries (8.51%)

Given we use 64 bytes for the path part of the subkey, which is mostly zeroes and has no compression, we would be able to get rid of over 90% of the space used by the key field.

Unsolved problems / TODO

This would probably change the ordering of the storage keys. This would impact where things are on disk, and impact performance. Given lexicographic order, this would cause nodes with the same level to be closer on disk, due to the length prefix. However this might break iterators that rely on next values with the current format

This also requires backwards compatibility - this is a more difficult problem as we would still have padded nodes in older DBs, which we would have to disambiguate. We would have to support both formats unless we did some sort of online migration (which still requires this backwards compatibility work), or decide to do a breaking change. Luckily backwards compatibility can be easily tested with existing DB data.

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-dbRelated to the databaseC-enhancementNew feature or requestC-perfA change motivated by improving speed, memory usage or disk footprintM-prevent-stalePrevents old inactive issues/PRs from being closed due to inactivityS-needs-designThis issue requires design work to think about how it would best be accomplished

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions