
@ethan-tyler commented Jan 5, 2026

Summary

Adds delete file support to SnapshotProducer, enabling atomic commits of position and equality delete files alongside data files.

Changes

  • Accept delete files via new added_delete_files field
  • Validate delete files: V2+ only, correct content type, equality_ids for equality deletes, partition spec compatibility
  • Write delete manifests with ManifestContentType::Deletes
  • Populate snapshot summary with delete file metrics
  • Detect duplicates across both data and delete files (including cross-type collisions)
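The validation rules listed above can be sketched roughly as follows. This is a minimal sketch, assuming simplified stand-in types (FormatVersion, DataContentType, DeleteFile) and a hypothetical free function validate_delete_files; the PR's actual validate_added_delete_files() lives on SnapshotProducer, works against the real iceberg-rust types, and returns the crate's own error type rather than String. Treating an empty equality_ids list as missing is also an assumption of this sketch.

```rust
// Hypothetical, simplified stand-ins; NOT the real iceberg-rust definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FormatVersion {
    V1,
    V2,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataContentType {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

#[derive(Debug, Clone)]
pub struct DeleteFile {
    pub path: String,
    pub content: DataContentType,
    pub equality_ids: Option<Vec<i32>>,
    pub partition_spec_id: i32,
}

/// Hypothetical helper: checks every added delete file against the rules
/// described in the PR (V2+ only, delete content type, equality_ids usage,
/// default partition spec).
pub fn validate_delete_files(
    format_version: FormatVersion,
    default_spec_id: i32,
    files: &[DeleteFile],
) -> Result<(), String> {
    // Delete files do not exist in format version 1.
    if format_version == FormatVersion::V1 && !files.is_empty() {
        return Err("delete files require format version 2 or later".to_string());
    }
    for f in files {
        match f.content {
            DataContentType::Data => {
                return Err(format!("{}: expected a delete file, got a data file", f.path));
            }
            DataContentType::EqualityDeletes => {
                // Equality deletes must name the field ids that identify a row.
                match &f.equality_ids {
                    Some(ids) if !ids.is_empty() => {}
                    _ => return Err(format!("{}: equality delete file needs equality_ids", f.path)),
                }
            }
            DataContentType::PositionDeletes => {
                // Position deletes identify rows by (file_path, pos), so equality_ids must be unset.
                if f.equality_ids.is_some() {
                    return Err(format!("{}: position delete file must not set equality_ids", f.path));
                }
            }
        }
        // Current limitation: only the table's default partition spec is accepted.
        if f.partition_spec_id != default_spec_id {
            return Err(format!(
                "{}: partition spec {} is not the default spec {}",
                f.path, f.partition_spec_id, default_spec_id
            ));
        }
    }
    Ok(())
}
```

Returning on the first violation keeps the error message focused on one file; a real implementation might prefer to collect all violations before failing.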

Tests

  • Validation: V1 rejection, content type, equality_ids requirements, position delete constraints
  • Duplicate detection: cross-type duplicates (same path as data and delete)
  • Preconditions: empty delete files error

Motivation

Foundation for RowDelta, Delete, and Overwrite operations that atomically commit data and delete files in a single snapshot.

Limitations

Delete files must use the default partition spec (manifest-per-spec grouping deferred)

Related:

@ethan-tyler marked this pull request as draft January 5, 2026 01:58
@ethan-tyler changed the title from "feat(iceberg): add delete file support to SnapshotProducer" to "feat: add delete file support to SnapshotProducer" Jan 5, 2026
@CTTY (Collaborator) commented Jan 5, 2026

Hi @ethan-tyler, before you dig too deep: there is an existing PR (#1606) that already covers some of these changes. I'm planning to open an epic issue to track this effort in a more organized fashion.

This enables SnapshotProducer to accept and process delete files
(position deletes and equality deletes) alongside data files.

Changes:
- Add added_delete_files field to SnapshotProducer
- Add validate_added_delete_files() for delete file validation:
  - Rejects delete files in V1 format
  - Validates content types (PositionDeletes, EqualityDeletes)
  - Requires equality_ids for equality delete files
  - Validates partition spec compatibility
- Add write_delete_manifest() to write delete manifests with
  ManifestContentType::Deletes
- Update manifest_file() to include delete manifests
- Update summary() to populate delete file metrics
- Enhance validate_duplicate_files() for both data and delete files
- Add comprehensive unit tests

This lays the groundwork for operations like RowDelta that need to
atomically commit both data files and delete files.
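Routing added files into data and delete manifests and counting them for the snapshot summary can be sketched as below. This is a minimal sketch, assuming simplified hypothetical types (AddedFile, DataContentType, ManifestContentType) and hypothetical helpers group_by_manifest and summary_metrics; it is not the PR's write_delete_manifest() or summary() code. The summary key names follow the Iceberg table spec's snapshot summary metrics, but the real iceberg-rust summary collector may compute more fields (sizes, totals) than shown here.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified types; NOT the iceberg-rust definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataContentType {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ManifestContentType {
    Data,
    Deletes,
}

#[derive(Debug, Clone)]
pub struct AddedFile {
    pub path: String,
    pub content: DataContentType,
    pub record_count: u64,
}

/// Data files go to a Data manifest; position and equality deletes share a
/// Deletes manifest. Empty groups produce no manifest.
pub fn group_by_manifest(files: &[AddedFile]) -> Vec<(ManifestContentType, Vec<&AddedFile>)> {
    let (data, deletes): (Vec<&AddedFile>, Vec<&AddedFile>) =
        files.iter().partition(|f| f.content == DataContentType::Data);
    let mut out = Vec::new();
    if !data.is_empty() {
        out.push((ManifestContentType::Data, data));
    }
    if !deletes.is_empty() {
        out.push((ManifestContentType::Deletes, deletes));
    }
    out
}

/// Counts added files and rows per content type, keyed by summary metric name.
pub fn summary_metrics(files: &[AddedFile]) -> BTreeMap<&'static str, u64> {
    let mut m = BTreeMap::new();
    for f in files {
        let (file_key, rows_key) = match f.content {
            DataContentType::Data => ("added-data-files", "added-records"),
            DataContentType::PositionDeletes => ("added-position-delete-files", "added-position-deletes"),
            DataContentType::EqualityDeletes => ("added-equality-delete-files", "added-equality-deletes"),
        };
        *m.entry(file_key).or_insert(0) += 1;
        *m.entry(rows_key).or_insert(0) += f.record_count;
        // Both kinds of delete file also count toward the combined total.
        if f.content != DataContentType::Data {
            *m.entry("added-delete-files").or_insert(0) += 1;
        }
    }
    m
}
```

Keeping deletes in a separate manifest matters because the manifest's content type tells readers whether its entries are rows to add or rows to remove.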
…pend

- Add cross-type duplicate check: reject same path in data and delete files
- Reject equality_ids on position delete files (spec compliance)
- Remove unreachable V1 code path in write_delete_manifest
- Add TODO for partition spec validation strictness (partition evolution)
- Wire validate_added_delete_files into FastAppendAction
- Add tests for cross-type duplicates and position delete validation
- Apply rustfmt formatting
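The cross-type duplicate check from this commit can be sketched with a single set shared by both lists. This is a minimal sketch using a hypothetical helper check_duplicate_paths over plain path strings; it only checks files staged in one commit and does not look at files already referenced by the table's existing manifests.

```rust
use std::collections::HashSet;

/// Hypothetical helper: rejects a commit if any file path appears twice,
/// whether within the data files, within the delete files, or across both.
pub fn check_duplicate_paths(data_paths: &[&str], delete_paths: &[&str]) -> Result<(), String> {
    let mut seen: HashSet<&str> = HashSet::new();
    // One shared set means a path used as both data and delete is caught too.
    for path in data_paths.iter().chain(delete_paths).copied() {
        // HashSet::insert returns false when the value was already present.
        if !seen.insert(path) {
            return Err(format!("duplicate file path in commit: {path}"));
        }
    }
    Ok(())
}
```

Using one set rather than two per-type sets is what makes the cross-type collision (same path as data and delete) fail.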
@ethan-tyler force-pushed the feat/snapshot-producer-delete-files branch from 39debc5 to 26003b0 January 5, 2026 06:12
@ethan-tyler (Author)


Hey @CTTY, thanks for the heads-up. I looked at #1606, and there's definitely overlap in the delete file handling.

This PR is intentionally focused on just the SnapshotProducer piece with validation. It could serve as a clean base that #1606 rebases onto, and it's intended to unlock the DML actions.

Happy to sync on how this fits with your epic and help out where needed. I'm planning to continue with RowDelta, Delete, and Overwrite after this feature lands.

Let me know if I should hold off or if landing this first helps.

// TODO: This validation is too strict for partition evolution scenarios where delete
// files may reference older partition specs. Once manifest-per-spec is implemented,
// relax this to check that the spec_id exists rather than matching the default.
if self.table.metadata().default_partition_spec_id() != delete_file.partition_spec_id {
Contributor


Should a ticket be filed for this TODO? Not sure if one already exists.

Author


Will file one once this PR is ready to merge. Waiting to hear from @CTTY on how this fits with the DML epic and next steps.
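The relaxation described in the TODO (accept any partition spec the table knows about, then group delete files per spec) could look roughly like this. This is a minimal sketch under stated assumptions: TableMetadataView, strict_spec_check, relaxed_spec_check and group_by_spec are hypothetical names, and the real iceberg-rust TableMetadata exposes its partition specs through its own accessors rather than a spec_ids set.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical, simplified view of table metadata; NOT the iceberg-rust type.
pub struct TableMetadataView {
    pub default_spec_id: i32,
    pub spec_ids: HashSet<i32>,
}

/// Current behavior: only the default partition spec is accepted.
pub fn strict_spec_check(meta: &TableMetadataView, spec_id: i32) -> bool {
    spec_id == meta.default_spec_id
}

/// Relaxed behavior from the TODO: any spec that exists in the table is accepted,
/// so delete files for data written under an older spec remain valid.
pub fn relaxed_spec_check(meta: &TableMetadataView, spec_id: i32) -> bool {
    meta.spec_ids.contains(&spec_id)
}

/// Manifest-per-spec grouping: one bucket of file paths per partition spec id.
pub fn group_by_spec<'a>(files: &[(&'a str, i32)]) -> BTreeMap<i32, Vec<&'a str>> {
    let mut groups: BTreeMap<i32, Vec<&'a str>> = BTreeMap::new();
    for &(path, spec_id) in files {
        groups.entry(spec_id).or_default().push(path);
    }
    groups
}
```

The relaxed check is only safe once each spec's files are written to their own manifest, since a manifest records a single partition spec.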

@ethan-tyler (Author)

Hey @CTTY, I'm going to mark this ready for review to keep things moving. Happy to coordinate if you want to rebase #1606 on top of this once it lands, or to collaborate on the DML epic. Let me know.

@ethan-tyler marked this pull request as ready for review January 6, 2026 16:09

Labels

None yet

Projects

None yet


3 participants