-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
Describe the feature
Right now StoredNibbles and StoredNibblesSubkey have inefficient encodings.
First, StoredNibbles uses the unpacked representation on disk:
reth/crates/trie/common/src/nibbles.rs
Lines 25 to 44 in b2d9cc1
| #[cfg(any(test, feature = "reth-codec"))] | |
| impl reth_codecs::Compact for StoredNibbles { | |
| fn to_compact<B>(&self, buf: &mut B) -> usize | |
| where | |
| B: bytes::BufMut + AsMut<[u8]>, | |
| { | |
| for i in self.0.iter() { | |
| buf.put_u8(i); | |
| } | |
| self.0.len() | |
| } | |
| fn from_compact(mut buf: &[u8], len: usize) -> (Self, &[u8]) { | |
| use bytes::Buf; | |
| let nibbles = &buf[..len]; | |
| buf.advance(len); | |
| (Self(Nibbles::from_nibbles_unchecked(nibbles)), buf) | |
| } | |
| } |
Next, StoredNibblesSubkey pads everything to 65 bytes total, with the format being unpacked nibbles || zeros (up to 64 bytes) || len:
reth/crates/trie/common/src/nibbles.rs
Lines 73 to 96 in b2d9cc1
| #[cfg(any(test, feature = "reth-codec"))] | |
| impl reth_codecs::Compact for StoredNibblesSubKey { | |
| fn to_compact<B>(&self, buf: &mut B) -> usize | |
| where | |
| B: bytes::BufMut + AsMut<[u8]>, | |
| { | |
| assert!(self.0.len() <= 64); | |
| // right-pad with zeros | |
| for i in self.0.iter() { | |
| buf.put_u8(i); | |
| } | |
| static ZERO: &[u8; 64] = &[0; 64]; | |
| buf.put_slice(&ZERO[self.0.len()..]); | |
| buf.put_u8(self.0.len() as u8); | |
| 64 + 1 | |
| } | |
| fn from_compact(buf: &[u8], _len: usize) -> (Self, &[u8]) { | |
| let len = buf[64] as usize; | |
| (Self(Nibbles::from_nibbles_unchecked(&buf[..len])), &buf[65..]) | |
| } | |
| } |
We can improve this by using an efficient encoding for both values on-disk. This would be len || packed nibbles.
Here are some stats on path length in existing mainnet account tables:
| Length | Count | Percentage |
|---|---|---|
| 1 | 16 | 0.00% |
| 2 | 256 | 0.00% |
| 3 | 4096 | 0.02% |
| 4 | 65536 | 0.27% |
| 5 | 1048576 | 4.33% |
| 6 | 16726496 | 69.11% |
| 7 | 6315318 | 26.09% |
| 8 | 41590 | 0.17% |
| 9 | 167 | 0.00% |
| 10 | 2 | 0.00% |
--- Most Common Nibble Lengths ---
1. Length 6 nibbles: 16726496 entries (69.11%)
2. Length 7 nibbles: 6315318 entries (26.09%)
3. Length 5 nibbles: 1048576 entries (4.33%)
4. Length 4 nibbles: 65536 entries (0.27%)
5. Length 8 nibbles: 41590 entries (0.17%)
And here is the distribution for storage tries:
| Length | Count | Percentage |
|---|---|---|
| 1 | 6502357 | 5.47% |
| 2 | 17094720 | 14.38% |
| 3 | 32187105 | 27.08% |
| 4 | 27030971 | 22.74% |
| 5 | 25716256 | 21.63% |
| 6 | 10112317 | 8.51% |
| 7 | 219679 | 0.18% |
| 8 | 974 | 0.00% |
| 9 | 5 | 0.00% |
--- Most Common Nibble Lengths ---
1. Length 3 nibbles: 32187105 entries (27.08%)
2. Length 4 nibbles: 27030971 entries (22.74%)
3. Length 5 nibbles: 25716256 entries (21.63%)
4. Length 2 nibbles: 17094720 entries (14.38%)
5. Length 6 nibbles: 10112317 entries (8.51%)
Given we use 64 bytes for the path part of the subkey, which is mostly zeroes and has no compression, we would be able to get rid of over 90% of the space used by the key field.
Unsolved problems / TODO
This would probably change the ordering of the storage keys. This would impact where things are on disk, and impact performance. Given lexicographic order, this would cause nodes with the same level to be closer on disk, due to the length prefix. However this might break iterators that rely on next values with the current format
This also requires backwards compatibility - this is a more difficult problem as we would still have padded nodes in older DBs, which we would have to disambiguate. We would have to support both formats unless we did some sort of online migration (which still requires this backwards compatibility work), or decide to do a breaking change. Luckily backwards compatibility can be easily tested with existing DB data.
Additional context
No response
Metadata
Metadata
Assignees
Labels
Type
Projects
Status