PARQUET-1335: Logical type names in parquet-mr are not consistent with parquet-format #496

nandorKollar · 2018-06-22T12:16:05Z

No description provided.

gszadovszky

+1

rdblue · 2018-07-03T22:27:18Z

parquet-column/src/test/java/org/apache/parquet/parser/TestParquetParser.java

    String message =
        "message StringMessage {\n" +
-        "  required binary string (UTF8);\n" +
+        "  required binary string (STRING);\n" +


Why is this test change needed?

rdblue · 2018-07-03T22:27:26Z

@gszadovszky, I think we might need to remove this commit. It looks like this changes the Parquet schema format. Is that correct?

gszadovszky · 2018-07-04T07:45:02Z

@rdblue, this is not a breaking change; it introduces the new logical types. Both UTF8 and STRING should work in the schema definition. See PR #463 for the whole change.
You are right, though, that this particular test should not have been modified, so that UTF8 stays tested. I don't think we need to revert this change; we can fix the tests in an additional commit.
@nandorKollar, could you please modify the tests so that both the original type UTF8 and the new logical type STRING are tested? Please also make sure that all the other original types and logical types are tested in the schema definition language.
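To illustrate the point that both spellings should be accepted in the schema definition, here is a minimal, self-contained sketch. It is a toy model, not parquet-mr's actual parser: the class `AnnotationAliasSketch`, the alias table, and the regular expression are all hypothetical and exist only to show the idea of mapping a legacy annotation name onto the new logical type name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration only; this is NOT parquet-mr's schema parser. */
public class AnnotationAliasSketch {

    // Hypothetical alias table: legacy annotation name -> new logical type name.
    private static final Map<String, String> LEGACY_TO_LOGICAL = Map.of("UTF8", "STRING");

    // Matches a bare annotation in parentheses before the closing semicolon,
    // e.g. the "(UTF8);" part of "required binary string (UTF8);".
    private static final Pattern ANNOTATION = Pattern.compile("\\((\\w+)\\)\\s*;");

    /** Returns the annotation of a field line, with legacy names mapped to the new name. */
    static String canonicalAnnotation(String fieldLine) {
        Matcher m = ANNOTATION.matcher(fieldLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no annotation in: " + fieldLine);
        }
        String name = m.group(1);
        // Unknown names pass through unchanged; known legacy names are translated.
        return LEGACY_TO_LOGICAL.getOrDefault(name, name);
    }

    public static void main(String[] args) {
        // Both spellings from the test diff resolve to the same annotation.
        System.out.println(canonicalAnnotation("  required binary string (UTF8);"));
        System.out.println(canonicalAnnotation("  required binary string (STRING);"));
    }
}
```

A test in this spirit would keep one case per spelling, so that neither the original name nor the new one can regress unnoticed.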

nandorKollar · 2018-07-04T08:35:04Z

@rdblue, as @gszadovszky said, this shouldn't be a breaking change, since the "old" types are still honored. The change I made to the test is misleading, though: I should have added a new test case for STRING and left UTF8 untouched. I created a new PR, #503.

rdblue · 2018-07-04T18:01:36Z

Thanks, it's good to hear the old types still work. Since the new logical type code changes schema serialization, is this a forward-incompatible change? Will old readers still be able to read files written after this change?

nandorKollar · 2018-07-04T19:57:53Z

The new API writes both the logicalType and converted_type fields for each SchemaElement. Therefore old readers, which only know about converted_type, will be able to read files written by new writers.
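The compatibility argument above can be sketched as follows. This is a toy model under stated assumptions, not the Thrift-generated `SchemaElement` or any parquet-mr API: the nested class, the reader methods, and the fallback rule in the "new reader" are hypothetical. Only the two field names and the idea of writing both come from the comment above.

```java
/** Toy model of dual annotation fields; NOT the real parquet-format SchemaElement. */
public class DualAnnotationSketch {

    /** Hypothetical stand-in for a schema element carrying both annotations. */
    static final class SchemaElement {
        final String name;
        final String convertedType; // legacy field, the only one old readers know
        final String logicalType;   // new field, may be null in files from old writers

        SchemaElement(String name, String convertedType, String logicalType) {
            this.name = name;
            this.convertedType = convertedType;
            this.logicalType = logicalType;
        }
    }

    /** A new writer fills in both fields for a string column. */
    static SchemaElement writeWithNewApi(String name) {
        return new SchemaElement(name, "UTF8", "STRING");
    }

    /** An old reader looks only at the legacy field and ignores the new one. */
    static String oldReaderAnnotation(SchemaElement e) {
        return e.convertedType;
    }

    /** Assumed behaviour of a new reader: prefer the new field, fall back to the legacy one. */
    static String newReaderAnnotation(SchemaElement e) {
        return e.logicalType != null ? e.logicalType : e.convertedType;
    }

    public static void main(String[] args) {
        SchemaElement written = writeWithNewApi("string");
        System.out.println(oldReaderAnnotation(written));
        System.out.println(newReaderAnnotation(written));
    }
}
```

Because the legacy field is always populated, the old reader in this sketch never needs to understand the new one.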

What old Parquet versions won't be able to interpret are the changes in the schema language, that is, the text representation parsed by MessageParser. New logical types, such as timestamps, have new type parameters that the old parser can't parse. Fortunately, as far as I know, the text schema representation is not written into the file, so files written by the new writer should be readable by old readers. @rdblue, does this answer your concern?

PARQUET-1335: Logical type names in parquet-mr are not consistent wit…

eee3906

…h parquet-format

gszadovszky approved these changes Jun 22, 2018

gszadovszky merged commit 33ee549 into apache:master Jun 25, 2018

rdblue reviewed Jul 3, 2018

ghost pushed a commit to RMS/parquet-mr that referenced this pull request Aug 18, 2018

PARQUET-1335: Logical type names in parquet-mr are not consistent wit…

96a7c21

…h parquet-format (apache#496)
