Skip to content

Conversation

@xixipi-lining
Copy link
Contributor

This PR removes the newLastColumnID validation check in the MetadataBuilder.AddSchema method. The previous validation required that the new schema's highest field ID must be greater than or equal to the current lastColumnId, but this constraint was incorrect because when columns are deleted from a schema, the HighestFieldID() can legitimately be smaller than the existing lastColumnId.

@xixipi-lining
Copy link
Contributor Author

Hi @twuebi
I noticed that PR #495 modified this interface. Could you please help review whether this change is fine?

@twuebi
Copy link
Contributor

twuebi commented Aug 20, 2025

Hi @xixipi-lining, sure I'll take a look.

Copy link
Contributor

@twuebi twuebi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we have a bug in addSchema, iceberg-java does this:

    public Builder addSchema(Schema schema) {
      addSchemaInternal(schema, Math.max(lastColumnId, schema.highestFieldId()));
      return this;
    }

...

private int addSchemaInternal(Schema schema, int newLastColumnId) {
      Preconditions.checkArgument(
          newLastColumnId >= lastColumnId,
          "Invalid last column ID: %s < %s (previous last column ID)",
          newLastColumnId,
          lastColumnId);

iceberg-rust got rid of the check:

pub fn add_schema(mut self, schema: Schema) -> Self {
        let new_schema_id = self.reuse_or_create_new_schema_id(&schema);
        let schema_found = self.metadata.schemas.contains_key(&new_schema_id);

        if schema_found {
            if self.last_added_schema_id != Some(new_schema_id) {
                self.changes.push(TableUpdate::AddSchema {
                    schema: schema.clone(),
                });
                self.last_added_schema_id = Some(new_schema_id);
            }

            return self;
        }

        // New schemas might contain only old columns. In this case last_column_id should not be
        // reduced.
        self.metadata.last_column_id =
            std::cmp::max(self.metadata.last_column_id, schema.highest_field_id());

        // Set schema-id
        let schema = match new_schema_id == schema.schema_id() {
            true => schema,
            false => schema.with_schema_id(new_schema_id),
        };

        self.metadata
            .schemas
            .insert(new_schema_id, schema.clone().into());

        self.changes.push(TableUpdate::AddSchema { schema });

        self.last_added_schema_id = Some(new_schema_id);

        self
    }

your PR aligns us to iceberg-rust.

@zeroshade
Copy link
Member

Shouldn't we (and iceberg-rust) align to iceberg-java as the reference implementation though, which does the check?

@twuebi
Copy link
Contributor

twuebi commented Aug 21, 2025

According to this PR (apache/iceberg-rust#587) in iceberg-rust, there's a todo in Java to get rid of the last column id argument which makes the check not needed.

I'll have to recheck but if my memory serves me right then that parameter was redundant if you checked all call-sites.

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@zeroshade zeroshade merged commit 59b611a into apache:main Aug 21, 2025
11 checks passed
@xixipi-lining xixipi-lining deleted the fix-add_schema branch August 25, 2025 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants