Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
I need to enrich logs, and the enrichment data I have is quite large (~100GB). It's in a key-value format.
Attempted Solutions
The currently supported enrichment tables (csv, mmdb) aren't a good fit for this use case, for two reasons:
- Both tables load the entire dataset in memory. For large datasets (~100GB) this becomes impractical. There may be an opportunity to use `mmap`, but it will bring in some complexity (likely some `unsafe`).
- The underlying file formats don't support general-purpose indexes for efficient lookups. While `mmdb` does have an index, it's a specialized one for IP addresses (a Patricia trie), not for generic keys.
Proposal
Add enrichment tables backed by a general-purpose key-value db. As a starting point, I am considering these two candidates:
A. redb
Pros: Pure Rust, with minimal unsafe (which I suspect can't be avoided entirely because of mmap)
Cons: Not a widely adopted format, tooling is specific to Rust
B. sqlite
Pros: Universally known and adopted storage engine, extensive ecosystem
Cons: Brings in compilation baggage
Another flavor of B is Limbo, but it's still in beta.
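To make option B more concrete, here is a minimal sketch of a key-value lookup backed by sqlite, written in Python with the standard-library `sqlite3` module. It is only an illustration of the idea, not Vector's API: the table name `enrichment`, the helper names `build_table`, `lookup`, and `enrich`, and the sample data are all assumptions made up for this example. The point is that the `PRIMARY KEY` gives an on-disk B-tree index on the key, so a lookup does not require loading the whole dataset into memory.

```python
import sqlite3


def build_table(path, rows):
    """Create (or open) a key-value table and load rows into it.

    The PRIMARY KEY on `key` is what provides the index; WITHOUT ROWID
    stores the rows directly in that index's B-tree.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrichment ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL"
        ") WITHOUT ROWID"
    )
    with conn:  # commit all inserts in a single transaction
        conn.executemany(
            "INSERT OR REPLACE INTO enrichment (key, value) VALUES (?, ?)",
            rows,
        )
    return conn


def lookup(conn, key):
    """Return the value stored for `key`, or None if the key is absent."""
    row = conn.execute(
        "SELECT value FROM enrichment WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def enrich(conn, event, field):
    """Return a copy of `event` with an `enrichment` entry when a match exists."""
    value = lookup(conn, event.get(field))
    if value is None:
        return dict(event)
    return {**event, "enrichment": value}


# Hypothetical sample data; a real deployment would pass a file path
# instead of ":memory:" so the data stays on disk.
conn = build_table(":memory:", [("10.0.0.1", "web-01"), ("10.0.0.2", "db-01")])
event = {"host_ip": "10.0.0.2", "message": "connection reset"}
enriched = enrich(conn, event, "host_ip")
```

With a file-backed database, sqlite reads only the pages needed to resolve each key, which is the property the csv and mmdb tables lack for generic keys.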
References
No response
Version
No response