Skip to content

Conversation

@nandorKollar
Copy link
Contributor

This PR introduces nanosecond precision time and timestamp types to parquet-mr. It depends on apache/parquet-format#102, should be committed once this format change is released.

Nanosecond precision is introduced only into the new logical type API, the old version (ConvertedType in format and OriginalType in mr) doesn't have a corresponding enum value for nanoseconds. Therefore in the Thrift schema these fields will be null, only new releases can interpret nanosecond precision, older readers can only see the physical type.

In addition to nanosecond precision, I also refactored the modules to use the new logical type API for internal decisions (e.g.: replaced the OriginalType-based switch cases, for example in type builder when checking if the proper annotation is present on physical type). This part (
commit with title: Refactor modules to use the new logical type API) doesn't need new format release, and can be split out from this PR if needed.

@gszadovszky
Copy link
Contributor

I would suggest implement the refactor part as a separate JIRA. It is more a follow-up task for PARQUET-1253.
It also can be committed separately (as it does not require any format change).

@nandorKollar
Copy link
Contributor Author

Ok, created a separate PR for the refactor: #520

"TIMESTAMP_NANOS_STRINGIFIER", "yyyy-MM-dd'T'HH:mm:ss.SSS") {
@Override
public String stringify(long value) {
return super.stringify(value) + String.format("%06d", Math.abs(value % 1000_000));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: 1_000_000 (also a few lines below)

cal.clear();
cal.set(2053, Calendar.JULY, 10, 22, 13, 24);
cal.set(Calendar.MILLISECOND, 84);
long micros = cal.getTimeInMillis() * 1000_000 + 1900;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename the variable to nanos and use 1_000_000 instead of 1000_000.

cal.set(1848, Calendar.MARCH, 15, 9, 23, 59);
cal.set(Calendar.MILLISECOND, 765);
micros = cal.getTimeInMillis() * 1000_000 - 1;
assertEquals("1848-03-15T09:23:59.765000001", stringifier.stringify(micros));
Copy link
Contributor

@zivanfi zivanfi Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp from the calendar is 1848-03-15T09:23:59.765. You subtract 1 nanosec from it, so it should be 1 nanosec earlier than 1848-03-15T09:23:59.765, which is 1848-03-15T09:23:59.764999999 and not 1848-03-15T09:23:59.765000001. Why does this pass?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants