block: assume that we do not unmark a block for deletion #8464

Merged
GiedriusS merged 1 commit into main from assume_unmark_upstream on Sep 3, 2025

@GiedriusS
Member

Just like we assume that the meta.json file doesn't change, let's also assume that we do not unmark a block for deletion.

This solves a critical issue in Thanos Store where there is a race between deletion in Compactor and the loading of a block:

  • Compactor deletes meta.json first and the deletion marker last
  • Store sees that block and loads it using the local in-memory and disk cache
  • By the time the deletion marker filtering function is executed, the marker has already been deleted by Compactor
  • Store happily tries to load that block. If lazy index headers are enabled, this is even worse because the 404s appear during querying.

The root cause is that we list blocks and check markers in two or more separate steps. Since that is inevitable, we need to assume that the marker won't disappear while the block is still there. This is the case when everything is working normally.
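To illustrate the assumption, here is a minimal sketch of a filter that never "unmarks" a block once it has seen a deletion marker for it. This is not the diff from this PR: `bucket`, `stickyDeletionFilter`, and `Filter` are hypothetical names, the bucket is a plain in-memory map, and `deletion-mark.json` is assumed to be the marker file name.

```go
package main

import "fmt"

// bucket is a hypothetical in-memory stand-in for object storage.
// Keys are object paths such as "<block>/meta.json".
type bucket map[string][]byte

// stickyDeletionFilter remembers every block that was ever observed with a
// deletion marker. Illustrative only; this is not the Thanos API.
type stickyDeletionFilter struct {
	marked map[string]struct{}
}

func newStickyDeletionFilter() *stickyDeletionFilter {
	return &stickyDeletionFilter{marked: make(map[string]struct{})}
}

// Filter returns the listed block IDs that are safe to load. A block is
// dropped if its marker exists now or was seen during any earlier call.
func (f *stickyDeletionFilter) Filter(bkt bucket, listed []string) []string {
	var keep []string
	for _, id := range listed {
		// Assumed marker path; record the mark the first time it is seen.
		if _, ok := bkt[id+"/deletion-mark.json"]; ok {
			f.marked[id] = struct{}{}
		}
		// Once marked, always marked: the mark is never forgotten.
		if _, ok := f.marked[id]; ok {
			continue
		}
		keep = append(keep, id)
	}
	return keep
}

func main() {
	bkt := bucket{
		"block-a/meta.json":          []byte("{}"),
		"block-a/deletion-mark.json": []byte("{}"),
		"block-b/meta.json":          []byte("{}"),
	}
	f := newStickyDeletionFilter()

	// First sync: the marker is visible, so block-a is dropped and remembered.
	fmt.Println(f.Filter(bkt, []string{"block-a", "block-b"}))

	// Compactor finishes deleting block-a; the marker is removed last.
	delete(bkt, "block-a/meta.json")
	delete(bkt, "block-a/deletion-mark.json")

	// A stale listing (for example from a local cache) still names block-a,
	// but the filter keeps excluding it even though the marker is gone.
	fmt.Println(f.Filter(bkt, []string{"block-a", "block-b"}))
}
```

A stateless check would let block-a through on the second call, which is the race described above. The remembered set grows with every marked block, so a real implementation would presumably need to prune entries for blocks that no longer appear in any listing.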

GiedriusS force-pushed the assume_unmark_upstream branch from d243dd2 to 3d2baf0 on September 2, 2025 15:12
pull-request-size bot added size/L and removed size/M labels on Sep 2, 2025
GiedriusS force-pushed the assume_unmark_upstream branch 2 times, most recently from 6bb6f70 to c7411f5 on September 3, 2025 08:04
pull-request-size bot added size/M and removed size/L labels on Sep 3, 2025
Just like we assume that the meta.json file doesn't change, let's also
assume that we do not unmark a block for deletion.

This solves a critical issue in Thanos Store where there is a race
between deletion in Compactor and the loading of a block:
- Compactor deletes meta.json first and the deletion marker last
- Store sees that block and loads it using the local in-memory and disk
  cache
- By the time the deletion marker filtering function is executed, the
  marker has already been deleted by Compactor
- Store happily tries to load that block

The root cause is that we list blocks and check markers in two or more
separate steps. Since that is inevitable, we need to assume that the
marker won't disappear while the block is still there. This is the
case when everything is working normally.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
GiedriusS force-pushed the assume_unmark_upstream branch from c7411f5 to 6e231d0 on September 3, 2025 08:34
GiedriusS merged commit cafefb8 into main on Sep 3, 2025
45 of 48 checks passed