Skip to content

Compaction: Common prefix listing on a branch is linear in # of uncommitted entries #2092

@ozkatz

Description

@ozkatz

common prefix listing that includes uncommitted changes on a branch (i.e. first screen seen in the UI when clicking on a repo) is actually o(n) where n is the amount of uncommitted changes.

The problem here is recognizing common prefixes that were deleted.
Given an underlying commit and a list of uncommitted change, the way to tell whether a common prefix exists, is by making sure not all entries within it are tombstoned in the list of uncommitted changes.

See this: https://github.com/treeverse/lakeFS/blob/v0.42.0/pkg/graveler/combined_iterator.go#L87
This is the iterator that combines a set of changes (iterA) and an underlying commit (iterB). Assuming prefix "a/" contains 1m entries in the underlying commit, and 999,999 tombstones in the diff, this iterator will scan all of them when calling Next().

Not sure if current data model allows doing anything better, but probably worth exploring and understanding the tradeoff of doing something "better".

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1bugSomething isn't workingno staleUsing this label will prevent items from being marked as staleperformanceteam/versioning-engineTeam versioning engine

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions