-
Notifications
You must be signed in to change notification settings - Fork 139
refactor(metadata): remove newLastColumnID check in AddSchema method #539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @xixipi-lining, sure I'll take a look. |
twuebi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, we have a bug in addSchema, iceberg-java does this:
public Builder addSchema(Schema schema) {
addSchemaInternal(schema, Math.max(lastColumnId, schema.highestFieldId()));
return this;
}
...
private int addSchemaInternal(Schema schema, int newLastColumnId) {
Preconditions.checkArgument(
newLastColumnId >= lastColumnId,
"Invalid last column ID: %s < %s (previous last column ID)",
newLastColumnId,
lastColumnId);iceberg-rust got rid of the check:
pub fn add_schema(mut self, schema: Schema) -> Self {
let new_schema_id = self.reuse_or_create_new_schema_id(&schema);
let schema_found = self.metadata.schemas.contains_key(&new_schema_id);
if schema_found {
if self.last_added_schema_id != Some(new_schema_id) {
self.changes.push(TableUpdate::AddSchema {
schema: schema.clone(),
});
self.last_added_schema_id = Some(new_schema_id);
}
return self;
}
// New schemas might contain only old columns. In this case last_column_id should not be
// reduced.
self.metadata.last_column_id =
std::cmp::max(self.metadata.last_column_id, schema.highest_field_id());
// Set schema-id
let schema = match new_schema_id == schema.schema_id() {
true => schema,
false => schema.with_schema_id(new_schema_id),
};
self.metadata
.schemas
.insert(new_schema_id, schema.clone().into());
self.changes.push(TableUpdate::AddSchema { schema });
self.last_added_schema_id = Some(new_schema_id);
self
}your PR aligns us to iceberg-rust.
|
Shouldn't we (and iceberg-rust) align to iceberg-java as the reference implementation though, which does the check? |
|
According to this PR (apache/iceberg-rust#587) in iceberg-rust, there's a todo in Java to get rid of the last column id argument which makes the check not needed. I'll have to recheck but if my memory serves me right then that parameter was redundant if you checked all call-sites. |
zeroshade
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
This PR removes the newLastColumnID validation check in the MetadataBuilder.AddSchema method. The previous validation required that the new schema's highest field ID must be greater than or equal to the current lastColumnId, but this constraint was incorrect because when columns are deleted from a schema, the HighestFieldID() can legitimately be smaller than the existing lastColumnId.