Skip to content

Conversation

@ptkool
Copy link
Contributor

@ptkool ptkool commented Nov 11, 2019

No description provided.

@cla-bot cla-bot bot added the cla-signed label Nov 11, 2019
@findepi
Copy link
Member

findepi commented Nov 11, 2019

@ptkool thanks!
looks reasonable.
please add a test in TestTupleDomainParquetPredicate.

i still need to check whether min/max is always in microseconds.

@ryanrupp
Copy link
Member

ryanrupp commented Nov 14, 2019

I think you will have to inspect the logical type to determine whether its milliseconds or microseconds (think limited range nano may be supported as well?) on a long. See the PR comments here unfortunately I rebased/forced push out the timestamp changes but the discussion there is probably applicable. Two things:

  1. Considering logical types to determine how to interpret the value (millis vs micros vs nanos?)
  2. See @findepi following comment in the linked PR about timestamp semantics in Presto and if that matters here (this was why the change was originally backed out).

@ryanrupp
Copy link
Member

ryanrupp commented Nov 14, 2019

Looks like the code has changed around slightly since I looked at this last but you'll have to pass along the RichColumnDescriptor (which has Parquet column info) then you can inspect richColumnDescriptor.getPrimitiveType().getOriginalType(). Also I think getOriginalType() can return null (not sure if this is just due to older versions that didn't populate this, forget the exact scenario where this could happen).

@ptkool ptkool force-pushed the use_timestamp_statistics branch from 3d34313 to 370cd61 Compare November 17, 2019 13:28
ParquetIntegerStatistics parquetIntegerStatistics =
statistics.type().getOriginalType() == TIMESTAMP_MICROS
? new ParquetIntegerStatistics(MICROSECONDS.toMillis(min), MICROSECONDS.toMillis(max))
: new ParquetIntegerStatistics(min, max);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For TIMESTAMP_MILLIS I think do an explicit check here on original type because it looks like timestamp nanosecond support was added in apache/parquet-java#519 and it's using the new LogicalTypeAnnotation which will replace OriginalType in the next release of parquet-mr (see apache/parquet-java#463). So to be forward compatible if the check is explicit here it won't accidentally go into this path if a newer file is using nanoseconds.

@tooptoop4
Copy link
Contributor

are you still working on this @ptkool ?

@bitsondatadev
Copy link
Member

👋 @ptkool - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that.
We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@bitsondatadev
Copy link
Member

@ryanrupp and @ptkool, is this a duplicate of #4104?

@ryanrupp
Copy link
Member

ryanrupp commented Oct 7, 2022

@bitsondatadev yes I believe so

@mosabua
Copy link
Member

mosabua commented Oct 12, 2022

Closing as confirmed duplicate.

@mosabua mosabua closed this Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants