@guillaumemichel
Collaborator

Part of #1095

Description

The KeyStore is a datastore-backed state that keeps track of the keys (multihashes) that should be periodically reprovided to the DHT swarm.

The keys are stored by DHT keyspace locality: they are grouped by a constant-size prefix and indexed by this prefix in the datastore. Note that the key by which they are addressed is their Kademlia identifier, and not the hash from the multihash.
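The grouping described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `bitPrefix` and `groupByPrefix` are hypothetical helpers, the map stands in for the datastore, and the sketch assumes the Kademlia identifier is the SHA-256 digest of the raw key bytes.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// bitPrefix returns the first n bits of id as a string of '0' and '1'
// characters (n must be between 0 and 256).
func bitPrefix(id [32]byte, n int) string {
	out := make([]byte, n)
	for i := 0; i < n; i++ {
		if id[i/8]&(byte(0x80)>>uint(i%8)) != 0 {
			out[i] = '1'
		} else {
			out[i] = '0'
		}
	}
	return string(out)
}

// groupByPrefix buckets raw keys under the first n bits of their identifier.
// Assumption: the identifier is the SHA-256 digest of the raw key bytes.
func groupByPrefix(keys [][]byte, n int) map[string][][]byte {
	groups := make(map[string][][]byte)
	for _, k := range keys {
		p := bitPrefix(sha256.Sum256(k), n)
		groups[p] = append(groups[p], k)
	}
	return groups
}

func main() {
	keys := [][]byte{[]byte("key-a"), []byte("key-b"), []byte("key-c")}
	for prefix, group := range groupByPrefix(keys, 4) {
		fmt.Printf("%s -> %d key(s)\n", prefix, len(group))
	}
}
```

With this layout, reading every key under one prefix is a single lookup, while adding or removing one key means rewriting the whole group stored under that prefix.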

Operations

Readers (Get()) only make prefix requests, since the caller wants to retrieve all keys that fall under the keyspace prefix shared by a specific set of peers in the network.

Writers (Put()) are expected to write random keys (not necessarily close in the keyspace).

Keys can be removed using Delete().

Since it is tricky for kubo to determine whether a key should stop being provided or not, a garbage collection (reset) function is available to remove keys that shouldn't be reprovided anymore. The caller can provide a GC function that supplies the list of all keys that the KeyStore is expected to hold. If the GC function is set, the KeyStore will periodically purge all its keys and repopulate itself with the keys supplied by the GC/reset function.
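A minimal sketch of the purge-and-repopulate behaviour, assuming an in-memory set in place of the datastore. The names `memStore`, `keySource`, `reset`, and `runGC` are hypothetical and do not come from the PR.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// keySource is a hypothetical callback returning every key the store
// is expected to hold.
type keySource func() []string

// memStore is an in-memory stand-in for the datastore-backed KeyStore.
type memStore struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func newMemStore() *memStore {
	return &memStore{keys: make(map[string]struct{})}
}

func (s *memStore) put(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.keys[k] = struct{}{}
}

func (s *memStore) has(k string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.keys[k]
	return ok
}

func (s *memStore) size() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.keys)
}

// reset drops every stored key and repopulates the store from src.
func (s *memStore) reset(src keySource) {
	fresh := make(map[string]struct{})
	for _, k := range src() {
		fresh[k] = struct{}{}
	}
	s.mu.Lock()
	s.keys = fresh
	s.mu.Unlock()
}

// runGC calls reset once per interval until ctx is cancelled.
func (s *memStore) runGC(ctx context.Context, interval time.Duration, src keySource) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			s.reset(src)
		}
	}
}

func main() {
	s := newMemStore()
	s.put("stale")
	s.put("kept")
	s.reset(func() []string { return []string{"kept", "new"} })
	fmt.Println(s.has("stale"), s.has("kept"), s.has("new")) // false true true
}
```

Building the fresh set before swapping it in keeps the window during which the store is empty as short as possible. A datastore-backed version would need a similar approach to avoid serving partial results during a reset.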

@guillaumemichel guillaumemichel requested a review from gammazero July 9, 2025 08:50
@guillaumemichel guillaumemichel requested a review from a team as a code owner July 9, 2025 08:50
This was referenced Jul 9, 2025
Contributor

@gammazero gammazero left a comment

I think the storage strategy could be improved. Currently a whole list of multihashes is stored under a prefix key. This means that if a CID is added or removed, then the whole buffer containing the list of multihashes must be loaded, modified, and then saved again. Also, if the number of prefix bits changes, then all the data must be remapped to new keys.

Consider storing each multihash as an individual record with its key being the bits in the multihash, or at least all the ones we might ever want to use as a prefix. Then prefix querying can be used to retrieve any set of multihashes for any bit-length prefix. Also, removing an individual value can be done by specifying all the bits that make its key unique.

I have not thought through what is the best way to represent the key, but I think there is probably a way to store multihashes in a way that allows an arbitrary length prefix to look up all multihashes that have those prefix bits.
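One possible shape of the per-record idea, sketched with an in-memory sorted slice standing in for a datastore that supports ordered prefix scans. The `recordStore` type and its methods are hypothetical, and the keys are bit strings chosen purely for readability.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// recordStore keeps one record per key, with keys held in sorted order
// so that every key sharing a prefix is contiguous.
type recordStore struct {
	keys []string
}

// put inserts k at its sorted position; duplicates are ignored.
func (s *recordStore) put(k string) {
	i := sort.SearchStrings(s.keys, k)
	if i < len(s.keys) && s.keys[i] == k {
		return
	}
	s.keys = append(s.keys, "")
	copy(s.keys[i+1:], s.keys[i:])
	s.keys[i] = k
}

// remove deletes the single record whose full key is k, if present.
func (s *recordStore) remove(k string) {
	i := sort.SearchStrings(s.keys, k)
	if i < len(s.keys) && s.keys[i] == k {
		s.keys = append(s.keys[:i], s.keys[i+1:]...)
	}
}

// withPrefix returns every key starting with p, for any prefix length.
func (s *recordStore) withPrefix(p string) []string {
	var out []string
	for i := sort.SearchStrings(s.keys, p); i < len(s.keys) && strings.HasPrefix(s.keys[i], p); i++ {
		out = append(out, s.keys[i])
	}
	return out
}

func main() {
	s := &recordStore{}
	for _, k := range []string{"0101", "0110", "1100", "0111"} {
		s.put(k)
	}
	fmt.Println(s.withPrefix("01"))  // [0101 0110 0111]
	fmt.Println(s.withPrefix("011")) // [0110 0111]
	s.remove("0110")
	fmt.Println(s.withPrefix("011")) // [0111]
}
```

In this layout, adding or removing a key touches only its own record, and changing the prefix length requires no remapping because the prefix is applied at query time.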

@guillaumemichel
Collaborator Author

guillaumemichel commented Jul 14, 2025

> I think the storage strategy could be improved. Currently a whole list of multihashes is stored under a prefix key. This means that if a CID is added or removed, then the whole buffer containing the list of multihashes must be loaded, modified, and then saved again. Also, if the number of prefix bits changes, then all the data must be remapped to new keys.

I also share these concerns. @gammazero If you have an idea on how to improve it, feel free to take over the PR.

The multihashes stored in the KeyStore must be indexed by their associated bit256.Key, which is the Kademlia identifier. This identifier corresponds to the hash digest of the multihash (see the conversion func).
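To illustrate why the Kademlia identifier differs from the digest carried inside the multihash, here is a small sketch. It assumes the conversion is a SHA-256 hash over the full multihash bytes, which the linked conversion func should be checked against. `sha256Multihash`, `embeddedDigest`, and `kadID` are hypothetical helpers written with the standard library only.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// sha256Multihash builds a sha2-256 multihash by hand:
// hash function code 0x12, digest length 0x20, then the 32-byte digest.
func sha256Multihash(content []byte) []byte {
	digest := sha256.Sum256(content)
	return append([]byte{0x12, 0x20}, digest[:]...)
}

// embeddedDigest returns the digest carried inside a sha2-256 multihash.
func embeddedDigest(mh []byte) []byte {
	return mh[2:]
}

// kadID hashes the full multihash bytes with SHA-256.
// Assumption: this mirrors the conversion to a 256-bit Kademlia key.
func kadID(mh []byte) [32]byte {
	return sha256.Sum256(mh)
}

func main() {
	mh := sha256Multihash([]byte("hello"))
	id := kadID(mh)
	fmt.Printf("embedded digest: %x\n", embeddedDigest(mh))
	fmt.Printf("kademlia id:     %x\n", id)
	fmt.Println("identical:", bytes.Equal(embeddedDigest(mh), id[:]))
}
```

Because the identifier is derived by hashing again, two multihashes whose embedded digests share leading bits will generally not share a keyspace prefix, so the store has to index on the derived identifier.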

In practice, we only perform prefix queries, so when reading, it makes sense to have multihashes grouped by prefix. However, as you mentioned, grouping like in the current implementation is bad for writing.

@guillaumemichel
Collaborator Author

We may want to store a timestamp to remember when the multihash was reprovided for the last time (as suggested in this comment).

We don't need to do it now, but it would be great if it is easy to add later on.

@guillaumemichel
Collaborator Author

Opened #1112 to track this improvement opportunity.

Merging this PR to the provider branch after discussion with @gammazero.

@guillaumemichel guillaumemichel merged commit 7069b91 into provider Jul 23, 2025
7 checks passed
guillaumemichel added a commit that referenced this pull request Aug 19, 2025
* keystore
* renamed prefixLen to prefixBits
* remove long lived context from KeyStore
guillaumemichel added a commit to guillaumemichel/go-libp2p-kad-dht that referenced this pull request Sep 17, 2025
* keystore
* renamed prefixLen to prefixBits
* remove long lived context from KeyStore
guillaumemichel added a commit that referenced this pull request Sep 17, 2025
* keystore
* renamed prefixLen to prefixBits
* remove long lived context from KeyStore
guillaumemichel added a commit that referenced this pull request Sep 18, 2025
* keystore
* renamed prefixLen to prefixBits
* remove long lived context from KeyStore