I've been testing the server functionality by using Redis' client to connect to it. To run this server and test with one or more clients do the following:
Ensure the Redis CLI is installed:
mise use --global redisClone this repository and cd to the root directory:
git clone [email protected]:ev-the-dev/redis-go-clone.git && cd redis-go-cloneRun the server from the root directory:
go run .OR if tesing with an existing local rdb file:
go run . --dir ./ --dbfilename dump.rdbOpen a new shell to connect to the server from the previous step (this step can be repeated to simulate multiple connected clients):
redis-cli -p 6379Each RDB file is broken up into a few sections:
- Header
- Metadata / Auxiliary Fields
- Database Selection
- Footer
To fully understand how section parsing and key:value pairs in RDB files work, this section is necessary.
Some caveats:
- Timestamps and Checksums seem to be in little-endian, so when evaluating them be cognizant of reversing the bytes.
- It also seems like the metadata section uses little-endian for its integers.
- This might be true of any ints as strings regardless of section (the special case string encoded).
There are some reserved hex codes for specific sections and sub-sections:
0xFA: Beginning of a new Auxiliary field. Followed by length-encoded key:value pair.0xFE: Beginning of a new Database. Followed by length-encoded value describing DB number.0xFB: Proceeds0xFEand describes hash table sizes for main keyspace and expires.- NOTE: It's my understanding that for newer versions of REDIS, the next two bytes will always be
02 01. This is because REDIS now uses lazy resizing and does not need precise initial sizing.
- NOTE: It's my understanding that for newer versions of REDIS, the next two bytes will always be
0xFD&0xFC: Mutually exclusive (I think) codes representing the proceeding DB field's expire time (FD for seconds & FC for milliseconds).0xFD: Following 4 bytes represent uint Unix timestamp in seconds.0xFC: Following 8 bytes represent unsigned long Unix timestamp in milliseconds.
0xFF: Signifies the end of the RDB file.
When parsing a length-encoded descriptor, you need to think about the underlying bits of the hexadecimal value. The first two bits in the byte (significant bits) determine how to parse the rest of the length-encoded descriptor as well as the field it describes and how many bytes comprise that field. Here are the 4 types of significant bit pairs:
00: Next 6 bits represent length.01: Next 6 bits plus the next byte represent length (14 bits total).10: Discard remaining 6 bits. Next 4 bytes represent length.11: Special format. Next 6 bits describe format. Can be used to store numbers or strings using String Encoding.
Warning
Length Encoding has a significant nuance to it. The first 3 cases (00, 01, and 10) simply return how many bytes to read for the default value type -- this type being determined by external context, such as strings in metadata, or the Value Type flag in the database sections. The 4th case (11) provides some extra, internal context. It, paired with that same external context, explains whether the proceding value is of an integer or LZF type (See: String Encoding).
Example of a length-encoded descriptor and value:
00 05 68 65 6c 6c 6f
00: Value type:0in binary represents a string-encoded value type.05: Convert from HEX->Binary:05->00000101.- First 2 bits are
00, thus remaining 6 bits determine the length in bytes. - Remaining 6 bits are
000101or5, so read the next 5 bytes as UTF-8 (due to non-special format string value type).
- First 2 bits are
68 65 6c 6c 6f: Convert from HEX->String:68 65 6c 6c 6f->h e l l o.
The header is extremely simple, it contains two key pieces of information:
- A "magic" string that spells out
REDIS.- This is represented as HEX->UTF-8:
52 45 44 49 53->R E D I S.
- This is represented as HEX->UTF-8:
- An UTF-8 RDB version number, i.e.
0012.- This is represanted as HEX->UTF-8:
30 30 31 32->0 0 1 2. - NOTE: This is the RDB version, NOT the REDIS version.
- This is represanted as HEX->UTF-8:
This section should have a fixed amount of entries. I say should because there could be unknown fields, but these should be ignored by the parser.
Note
Each key:value pair is preceded by the 0xFA op code and are string-encoded value types.
Because they'll always be of a string value type, there's no need for the extra preceeding byte to tell us what the value type is, like we see in the database section.
These are the supported fields:
redis-ver: REDIS version that wrote the RDB file.redis-bits: Bit architecture of OS that wrote the RDB (32 or 64).ctime: Creation time of the RDB file.used-mem: Used memory of instance that wrote the RDB file.
Here's an example of how one of these fields look like in the RDB file with the op code prefix:
fa 09 72 65 64 69 73 2d 76 65 72 05 37 2e 34 2e 32
fa: Indicates new aux field.09: length-encoded descriptor for the key of the key:value pair.- Converting HEX->Binary:
09->00001001.- First 2 bits are
00, thus remaining 6 bits describe length, in bytes, of proceeding key.
- First 2 bits are
001001=9, thus next 9 bytes are the key.
- Converting HEX->Binary:
72 65 64 69 73 2d 76 65 72: The 9 bytes converted to UTF-8 read as ->redis-ver.05: length encoded descriptor for the value of the key:value pair.- Converting HEX->Binary:
05->00000101. 000101=5, thus 5 bytes are the value.
- Converting HEX->Binary:
37 2e 34 2e 32: the 5 bytes converted to UTF-8->7.4.2.
Tip
Putting everything in this example together we see that there is an aux field named redis-ver with a value of 7.4.2.
There can be n number of DB selectors. Each section starts with 0xFE op code followed by a byte signifying the DB number -- i.e. fe 00 = DB number 00.
After the 0xFE code, there will be an 0xFB op code to indicate the hash and expire hash table sizes subsequently.
Each DB section will contain series of records with a specific order of data:
0xFCor0xFD: Expire times. Represented in little-endian I believe (reverse order of bytes to read the number value properly). Optional?- Value Type: 1 byte flag indicating type (string, list, hash).
- Key: string-encoded.
- Value: Parsed according to previously read Value Type (see #2 above).
Here's an example of a DB record:
fc 7d ab e7 4f 96 01 00 00 00 05 68 65 6c 6c 6f 05 77 6f 72 6c 64
Let's break that up a bit so it's easier to read/parse:
fc 7d ab e7 4f 96 01 00 00 00 05 68 65 6c 6c 6f 05 77 6f 72 6c 64
- Expire time:
fc 7d ab e7 4f 96 01 00 00.- The
fcindicates an expire time in milliseconds, hence the following 8 bytes allocated for that timestamp. - Since these timestamps are in a little-endian format, to convert them to the timestamp they need to be reversed, ergo when computing it would look like this:
00 00 01 96 4f e7 ab 7d.
- The
- Value Type:
00.- String value type.
- Might indicate String Encoded type instead -- in which case the value ultimately is determined by the length encoded significant bits (all cases would be a string type, unless significant bits are
11).
- Key:
05 68 65 6c 6c 6f.05HEX->Binary:05->00000101.- First 2 bytes are
00, thus only the next 6 bits describe size. - Remaining bits equal
000101, or5, thus the key comprises the next 5 bytes.
- First 2 bytes are
- Converting HEX->UTF-8:
68 65 6c 6c 6f->h e l l o
- Value:
05 77 6f 72 6c 64.05HEX->Binary:05->00000101.- First 2 bytes are
00, thus only the next 6 bits describe size. - Remaining bits equal
000101, or5, thus the key comprises the next 5 bytes.
- First 2 bytes are
- Converting HEX->UTF-8:
77 6f 72 6c 64->w o r l d
Tip
Putting everything in this example together we see that there is a DB field with an expiry time in milliseconds equalling 1745097304957, the value type is a string-encoded represented as 00, the key is 5 bytes long and spells hello, the value is 5 bytes long and is a string stating world.
The footer is pretty basic, it just contains two things:
0xFF: EOF indicator op code.- Checksum: little-endian(?) 8 bytes of CRC64 checksum of the entire file.
Value types are generally 1-byte flags. I say generally, because I still use the term "value type" for implicitly defined types, like those for string encoded keys in the metadata section.
The value of the 1-byte flag correlates to a specific type:
0: String Encoding1: List Encoding2: Set Encoding3: Sorted Set Encoding4: Hash Encoding9: Zipmap Encoding10: Ziplist Encoding11: Intset Encoding12: Ziplist Sorted Set Encoding13: Ziplist Hashmap Encoding14: Quicklist List Encoding
There are three types of Strings in a Redis RDB file:
- Length prefixed strings
- 8, 16, or 32 bit integer
- LZF compressed string
Length Prefixed String:
- Length of the string, in bytes, is encoded using Length Encoding. Then the following raw bytes of the string are stored.
- This is the case when the length-encoded significant bits are either
00,01, or10AND the Value Type is00(string encoded) -- whether implied, as in the metadata fields; or explicit, as in the database sections.
Integers as Strings:
- After a length-encoded value produced
11as the significant bits, the remaining 6 bits are read to determine whether the integer is 8, 16, or 32 bits long.0: indicates an 8 bit integer follows. Read the next byte for the int value.1: indicates a 16 bit integer follows. Read the next 2 bytes for the int value.2: indicates a 32 bit integer follows. Read the next 4 bytes for the int value.
Compressed Strings:
- Like "Integers as Strings" above, after the length-encoded value produces
11, the remaining 6 bits are read:3: indicates an LZF string follows.
- The compressed string is read as follows:
- Compressed length
clenis read using length-encoding. - Uncompressed length is read using length-encoding.
- Next
clenbytes are read. - Finally, the read
clenbytes are decompressed using theLZFalgorithm.
- Compressed length