Skip to content

Conversation

@Tpt
Copy link
Contributor

@Tpt Tpt commented Jan 31, 2023

No description provided.

@Tpt Tpt force-pushed the simd_find_entry branch 2 times, most recently from 5427684 to 54d8e85 Compare January 31, 2023 21:49
src/index.rs Outdated
let current = _mm_unpacklo_epi32(first_two, last_two);
let cmp = _mm_movemask_epi8(_mm_cmpeq_epi32(current, target));
if cmp != 0 {
// We found a match, let's validate it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we can simply return entry at i + cmp.leading_zeroes() / 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky thing is that the shift might be of 30 bits i.e. are left with a 34 bit number that we cast to 32 bits. So, this means we need to check again, sadly. But we can definitely be smarter and skip not useful comparison.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the idea to just skip two bits of check in this case? (but then need to shift 32 bits instead of 30, so we check the 32 bit of the start, the 2 last unchecked bits will be checked with the rest of the key on value access).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly for the skipping. Currently we check the last 32 bits, and check again the remaining 2 bits before returning. We can indeed simplfy the operations by just checking the first 32 bits (i.e. the high weight part of the entry if I am not mistaken).

Now, there is the question of which two bits are the most discriminative, the higher 2 or the lower 2?) Because the partial_key is build with u64::from_be_bytes I would tend to think the higher two might be more relevant but I am not sure. What do you think about it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spend time remembering/reunderstanding this today. might still be wrong, but my conclusion was the higher 2 so we in case of index 16 we would need to move >> 32 instead of 30. (the other way we would miss checking key content as the part of the key stored and check in index is 26 bytes containing the last two bits, then we indeed got 16 bit check from the access to the index, 32 from the index key content and 26 * 8 from the value table, should be 256).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'm going to update the PR to use the high 32 bits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous version was actually more correct. We can't assume that partial key is always 32 bits, so shifts are still needed. We can ignore the last two bits for the cases when it is 33 or 34 bytes however. A bit of explanation:

Keys are always 256 bits. Key bits [48..256] are stored in the value table. So if find_entry returns a match, these bits are compared to make sure the rest of the key matches . Therefore find_entry must only match the first 48 bits of the key. Current version actually matches 50, so that's a bit of an overkill.

The first index_bits of the key are simply the offset of the chunk in the file. All entries in the chunk share the same first index_bits of the key. They are not stored in the entries. Bits [index_bits..50] is what's actually stored in the entry and that's called partial_key. The rest of the entry bits is address. As index_bits grows, so does address_bits, and partial_key shrinks. But the chunk index bits grows as well, so we can always match 50 bits of the key in the index file.

Indexes start with index_bits==16. With index_bits==16 partial_key is 34, but the last two bits can be ignored and we only need to match 48 bits.
With index_bits==21 however partial_key is only 29 bits, so the last 3 of the first 32 entry bits should be excluded from comparison.

Copy link
Contributor Author

@Tpt Tpt Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your detailed explanation. I indeed forgot that the index_bit can grow further. I have applied @cheme comments to fix the implementation.

I also added a test to cover this case because it was not already done by the existing tests.

@arkpar
Copy link
Member

arkpar commented Jan 31, 2023

Great stuff. Curious to see if it has any affect on the query performance.

@arkpar arkpar requested a review from cheme February 1, 2023 10:03
@Tpt Tpt force-pushed the simd_find_entry branch from 54d8e85 to d11dc96 Compare February 1, 2023 19:53
@Tpt Tpt force-pushed the simd_find_entry branch from d11dc96 to 5818de7 Compare February 1, 2023 20:12
@Tpt
Copy link
Contributor Author

Tpt commented Feb 1, 2023

@arkpar @cheme Thank you for your reviews. I have written a version 2 using the assumption that the high 32bits are part of the partial key.

I ran a quick benchmark on my laptop. The SSE2 version seems to be slightly faster but the numbers are not stable enough to tell more than that. We might also try an AVX version.

@Tpt Tpt force-pushed the simd_find_entry branch from 90ac43d to 28ed988 Compare February 2, 2023 07:35
@Tpt Tpt force-pushed the simd_find_entry branch from 28ed988 to a976dd4 Compare February 2, 2023 07:39
@Tpt Tpt force-pushed the simd_find_entry branch from 232e6ed to 21f85bb Compare February 2, 2023 21:00
@Tpt Tpt force-pushed the simd_find_entry branch from 21f85bb to e555604 Compare February 2, 2023 21:12
Copy link
Collaborator

@cheme cheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, could do :

diff --git a/src/index.rs b/src/index.rs
index 5bd78b9..f5e94bf 100644
--- a/src/index.rs
+++ b/src/index.rs
@@ -235,12 +235,14 @@ impl IndexTable {
                Ok(try_io!(Ok(&map[offset..offset + CHUNK_LEN])))
        }
 
+       #[cfg(target_feature = "sse2")]
        fn find_entry(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
-               if cfg!(target_feature = "sse2") {
-                       self.find_entry_sse2(key_prefix, sub_index, chunk)
-               } else {
-                       self.find_entry_base(key_prefix, sub_index, chunk)
-               }
+               self.find_entry_sse2(key_prefix, sub_index, chunk)
+       }
+
+       #[cfg(not(target_feature = "sse2"))]
+       fn find_entry(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
+               self.find_entry_base(key_prefix, sub_index, chunk)
        }
 
        #[cfg(target_feature = "sse2")]
@@ -284,6 +286,7 @@ impl IndexTable {
                (Entry::empty(), 0)
        }
 
+       #[cfg(any(not(target_feature = "sse2"), test))]
        fn find_entry_base(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
                assert!(chunk.len() >= CHUNK_ENTRIES * 8);
                let partial_key = Entry::extract_key(key_prefix, self.id.index_bits());
@@ -608,19 +611,19 @@ mod test {
 
        #[test]
        fn test_find_entries() {

or expose a dummy find_entry_sse2 function when no feature sse2 (method may be missing if no traget).

@arkpar arkpar merged commit fc1250f into paritytech:master Feb 3, 2023
@arkpar arkpar mentioned this pull request Feb 6, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants