Embedded DB support for enrichment tables #24476

@rohitmanohar

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

I need to enrich logs, and the enrichment data I have is quite large (~100GB). It is in a key-value format.

Attempted Solutions

The currently supported enrichment tables (csv, mmdb) aren't a good fit for this use case, for two reasons:

  1. Both tables load the entire dataset into memory. For large datasets (~100GB) this becomes impractical. Memory-mapping the file (mmap) might be an option, but it would add complexity (and likely some unsafe code).

  2. The underlying file formats don't support general-purpose indexes for efficient lookups. While mmdb does have an index, it's a specialized one for IP addresses (patricia trie), not for generic keys.
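
To illustrate the first point, here is a minimal Python sketch of the "load everything, then look up" pattern. It is not Vector's actual implementation; the sample data, the column names `key` and `value`, and the `load_table` helper are all made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a much larger key-value CSV file.
SAMPLE_CSV = "key,value\nhost-1,us-east\nhost-2,eu-west\nhost-3,ap-south\n"


def load_table(csv_text: str) -> dict:
    """Read every row up front and keep all of them in a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["key"]: row["value"] for row in reader}


table = load_table(SAMPLE_CSV)

# Lookups are fast, but only because the whole dataset is resident in memory.
region = table.get("host-2")
missing = table.get("host-99")  # None: key not present
```

With three rows this is harmless; with ~100GB of data the dict (or any equivalent in-memory structure) has to hold all of it before the first lookup can be served.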

Proposal

Add enrichment tables backed by a general-purpose key-value DB. As a starting point, I am considering these two candidates:

A. redb

Pros: Pure Rust, with minimal unsafe (which I suspect can't be avoided entirely for mmap)
Cons: Not a widely adopted format; tooling is specific to Rust

B. sqlite

Pros: Universally known and adopted storage engine, extensive ecosystem
Cons: Brings in compilation baggage (a C library to build and link)

Another flavor of B is limbo (a SQLite-compatible engine written in Rust), but it's still in beta.
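
For option B, a rough sketch of what a key-value lookup against an on-disk sqlite file could look like, using Python's standard `sqlite3` module purely for illustration. The table name `enrichment`, its two-column schema, and the `build_store` / `lookup` helpers are assumptions, not a proposed design for the Vector side.

```python
import os
import sqlite3
import tempfile

# Assumed location for the database file; a real deployment would point at
# a pre-built file on disk.
db_path = os.path.join(tempfile.mkdtemp(), "enrichment.db")


def build_store(path, rows):
    """Create (or open) the store and write key-value pairs into it."""
    conn = sqlite3.connect(path)
    # The PRIMARY KEY gives sqlite a B-tree index on `key`.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrichment ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO enrichment (key, value) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def lookup(conn, key):
    """Point lookup by key; returns None when the key is absent."""
    row = conn.execute(
        "SELECT value FROM enrichment WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


conn = build_store(db_path, [("host-1", "us-east"), ("host-2", "eu-west")])
region = lookup(conn, "host-2")
missing = lookup(conn, "host-99")

# Ask sqlite how it would run the lookup: an index search, not a full scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT value FROM enrichment WHERE key = ?",
    ("host-2",),
).fetchall()
```

The point of the sketch is that each lookup walks the on-disk index and reads only the pages it needs, so the full dataset never has to be loaded into memory.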

References

No response

Version

No response

Labels

type: feature (a value-adding code addition that introduces new functionality)
