Skip to content

block/compact: rework consistency check, make writers only write#8401

Merged
GiedriusS merged 2 commits intomainfrom
rework_only_write
Jul 31, 2025
Merged

block/compact: rework consistency check, make writers only write#8401
GiedriusS merged 2 commits intomainfrom
rework_only_write

Conversation

@GiedriusS
Copy link
Member

@GiedriusS GiedriusS commented Jul 30, 2025

  • It's weird that on upload errors, we try to clean everything and only then write again. It's an extra operation we don't need since whether a block exists or not hinges on the existence of meta.json. We don't need to delete old, same objects before trying to upload them again.
  • Consequently, we need to always use the upload time, not block creation time when checking for consistency or when deleting partially uploaded blocks. Directories as such don't exist in object storages, it's a client-side "illusion", so we need to iterate through the partial block's directory to fetch the last modified date.
  • Finally, it's a good principle in general to make writers only write - this makes it easier to reason about how the whole system works!

Based on old #2883, #2585

@GiedriusS GiedriusS force-pushed the rework_only_write branch from 6666f78 to e06bf51 Compare July 30, 2025 10:04
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 30, 2025
@GiedriusS GiedriusS marked this pull request as ready for review July 30, 2025 10:29
@GiedriusS GiedriusS force-pushed the rework_only_write branch from e06bf51 to 18c9974 Compare July 30, 2025 10:29
- It's weird that on upload errors, we try to clean everything and only
  then write again. It's an extra operation we don't need since whether
  a block exists or not hinges on the existence of meta.json. We don't
  need to delete old, same files before trying to upload them again.
- Consequently, we need to always use the _upload_ time, not block
  creation time when checking for consistency or when deleting partially
  uploaded blocks. Directories as such don't exist in object storages,
  it's a client-side "illusion", so we need to iterate through the
  partial block's directory to fetch the last modified date.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS GiedriusS force-pushed the rework_only_write branch from 18c9974 to d42ecb5 Compare July 30, 2025 11:05
MichaHoffmann
MichaHoffmann previously approved these changes Jul 30, 2025
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS GiedriusS force-pushed the rework_only_write branch from b72f4da to 57031c7 Compare July 31, 2025 07:01
@GiedriusS GiedriusS merged commit 1e740e3 into main Jul 31, 2025
21 of 22 checks passed
@GiedriusS GiedriusS deleted the rework_only_write branch July 31, 2025 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants