PARQUET-1335: Logical type names in parquet-mr are not consistent with parquet-format #496
gszadovszky left a comment:
+1
```diff
 String message =
     "message StringMessage {\n" +
-    " required binary string (UTF8);\n" +
+    " required binary string (STRING);\n" +
```
Why is this test change needed?
@gszadovszky, I think we might need to remove this commit. It looks like this changes the Parquet schema format. Is that correct?
@rdblue, this is not a breaking change; it introduces the new logical types. Both …
@rdblue, as @gszadovszky said, the change shouldn't be a breaking change, since the "old" types are still honored. However, the change I made to the test is misleading: I should have added a new test case for STRING and left the UTF8 one untouched. I've created a new PR, #503.
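A minimal sketch of the split suggested above: keep the existing UTF8 case untouched and add a separate STRING case, each asserting the same resolved annotation. This is not parquet-mr code; `AnnotationAliases`, `resolve`, `annotationOf`, `testUtf8Annotation`, and `testStringAnnotation` are hypothetical stand-ins for the real schema parser and test suite. The one assumption encoded here is that the legacy converted-type name UTF8 and the new logical-type name STRING are accepted as aliases for the same string annotation.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical model: maps the annotation in a text schema field such as
// "required binary string (UTF8);" to one canonical annotation.
final class AnnotationAliases {
    enum Annotation { STRING, ENUM, JSON }

    // UTF8 is the legacy converted-type name; STRING is the new logical-type name.
    private static final Map<String, Annotation> BY_NAME = Map.of(
            "UTF8", Annotation.STRING,
            "STRING", Annotation.STRING,
            "ENUM", Annotation.ENUM,
            "JSON", Annotation.JSON);

    static Annotation resolve(String name) {
        Annotation a = BY_NAME.get(name.toUpperCase(Locale.ROOT));
        if (a == null) {
            throw new IllegalArgumentException("unknown annotation: " + name);
        }
        return a;
    }

    // Extracts the annotation between parentheses in a single field declaration.
    static Annotation annotationOf(String fieldDecl) {
        int open = fieldDecl.indexOf('(');
        int close = fieldDecl.indexOf(')', open + 1);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("no annotation in: " + fieldDecl);
        }
        return resolve(fieldDecl.substring(open + 1, close).trim());
    }

    // Original test, left untouched: the legacy UTF8 spelling must keep working.
    static void testUtf8Annotation() {
        check(annotationOf("required binary string (UTF8);") == Annotation.STRING);
    }

    // New, separate test for the STRING spelling.
    static void testStringAnnotation() {
        check(annotationOf("required binary string (STRING);") == Annotation.STRING);
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("annotation check failed");
        }
    }

    public static void main(String[] args) {
        testUtf8Annotation();
        testStringAnnotation();
        System.out.println("both spellings resolve to STRING");
    }
}
```

Keeping two tests rather than editing one means a regression in either spelling fails on its own, which is the point of leaving the UTF8 case intact.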
Thanks, it's good to hear the old types still work. Since the new logical type code changes schema serialization, is this a forward-incompatible change? Will old readers still be able to read files written after this change?
The new API writes both the logicalType and the converted_type fields for each SchemaElement. Therefore old readers, which only know about converted_type, will be able to read files written by new writers. What older Parquet versions won't be able to interpret are the changes to the schema language, i.e. the text representation parseable by MessageParser. New logical types, such as timestamps, have new type parameters that the old parser can't parse. Fortunately, as far as I know, the text schema representation is not written into the file, so files written by new writers should be readable by old readers. @rdblue, does this answer your concern?
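The compatibility argument above can be sketched as a toy model. `SchemaElement` here is a hypothetical plain Java stand-in for the Thrift struct of the same name, not the generated class. The "new writer" fills both `convertedType` and `logicalType`, while the "old reader" looks only at `convertedType`, mirroring how an old Thrift reader skips field IDs it does not know. `oldParse` is likewise a hypothetical stand-in for an old text-schema parser that accepts only bare annotation names; the parameterized spelling `TIMESTAMP(MILLIS,true)` is an assumed example of the new syntax.

```java
import java.util.Map;

// Toy model of the forward-compatibility argument; not parquet-mr code.
final class DualWriteSketch {
    // Hypothetical stand-in for the Thrift SchemaElement; null means "field not set".
    static final class SchemaElement {
        final String name;
        final String convertedType; // legacy field, understood by old readers
        final String logicalType;   // new field, unknown to old readers

        SchemaElement(String name, String convertedType, String logicalType) {
            this.name = name;
            this.convertedType = convertedType;
            this.logicalType = logicalType;
        }
    }

    // Assumed equivalences between new logical types and legacy converted types.
    private static final Map<String, String> LEGACY = Map.of(
            "STRING", "UTF8",
            "TIMESTAMP(MILLIS,true)", "TIMESTAMP_MILLIS");

    // New writer: sets logicalType and, where an equivalent exists, convertedType too.
    static SchemaElement write(String name, String logicalType) {
        return new SchemaElement(name, LEGACY.get(logicalType), logicalType);
    }

    // Old reader: never looks at logicalType.
    static String oldReaderType(SchemaElement e) {
        return e.convertedType;
    }

    // Old text-schema parser: accepts only bare names such as UTF8,
    // so the parameterized syntax of the new logical types is rejected.
    static String oldParse(String annotation) {
        if (!annotation.matches("[A-Z0-9_]+")) {
            throw new IllegalArgumentException("cannot parse annotation: " + annotation);
        }
        return annotation;
    }

    public static void main(String[] args) {
        SchemaElement ts = write("ts", "TIMESTAMP(MILLIS,true)");
        System.out.println(oldReaderType(ts)); // prints TIMESTAMP_MILLIS
        try {
            oldParse(ts.logicalType);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch isolates why the two concerns differ: file compatibility rests on the SchemaElement fields, where the legacy value is still present, whereas only the text form carries the new parameterized syntax, and that form is, per the comment above, not stored in the file.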
…h parquet-format (apache#496)