Closed
Changes from 1 commit
Commits
25 commits
03c3bd9
Refactors Parquet read path to implement backwards-compatibility rules
liancheng Jul 5, 2015
0525346
Removes old Parquet record converters
liancheng Jul 5, 2015
a74fb2c
More comments
liancheng Jul 5, 2015
bcac49f
Removes the 16-byte restriction of decimals
liancheng Jul 5, 2015
6437d4b
Assembles requested schema from Parquet file schema
liancheng Jul 5, 2015
1781dff
Adds test case for SPARK-8811
liancheng Jul 5, 2015
7fb21f1
Reverts an unnecessary debugging change
liancheng Jul 5, 2015
38fe1e7
Adds explicit return type
liancheng Jul 6, 2015
802cbd7
Fixes bugs related to schema merging and empty requested columns
liancheng Jul 6, 2015
884d3e6
Fixes styling issue and reverts unnecessary changes
liancheng Jul 6, 2015
0cc1b37
Fixes MiMa checks
liancheng Jul 6, 2015
a099d3e
More comments
liancheng Jul 6, 2015
06cfe9d
Adds comments about TimestampType handling
liancheng Jul 6, 2015
13b9121
Adds ParquetAvroCompatibilitySuite
liancheng Jul 7, 2015
440f7b3
Adds generated files to .rat-excludes
liancheng Jul 7, 2015
1d390aa
Adds parquet-thrift compatibility test
liancheng Jul 7, 2015
f2208cd
Adds README.md for Thrift/Avro code generation
liancheng Jul 7, 2015
a8f13bb
Using Parquet writer API to do compatibility tests
liancheng Jul 7, 2015
3d7ab36
Fixes .rat-excludes
liancheng Jul 7, 2015
7946ee1
Fixes Scala styling issues
liancheng Jul 7, 2015
926af87
Simplifies Parquet compatibility test suites
liancheng Jul 8, 2015
598c3e8
Adds extra Maven repo for hadoop-lzo, which is a transitive dependenc…
liancheng Jul 8, 2015
b8c1295
Excludes the whole parquet package from MiMa
liancheng Jul 8, 2015
c6fbc06
Removes WIP file committed by mistake
liancheng Jul 8, 2015
360fe18
Adds ParquetHiveCompatibilitySuite
liancheng Jul 8, 2015
Adds comments about TimestampType handling
liancheng committed Jul 8, 2015
commit 06cfe9de612c41a20e1633513fb0b07be48bc261
@@ -146,6 +146,7 @@ private[parquet] class CatalystRowConverter(
new CatalystStringConverter(updater)

case TimestampType =>
+ // TODO Implement `TIMESTAMP_MICROS` once parquet-mr supports it.
new PrimitiveConverter {
override def addBinary(value: Binary): Unit = {
assert(
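
For context on what this converter has to decode, here is a minimal sketch of turning an `INT96` timestamp into microseconds since the Unix epoch. It assumes the layout commonly used by Impala and Hive (8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number) and takes a plain byte array rather than a parquet-mr `Binary`. The object and method names are hypothetical and are not taken from this PR.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper, not part of the PR: decodes INT96 timestamps
// under the assumed layout (nanos-of-day, then Julian day, little-endian).
object Int96TimestampSketch {
  // Julian day number assigned to 1970-01-01 (assumed Impala/Hive convention).
  val JulianDayOfEpoch: Long = 2440588L
  val MicrosPerDay: Long = 24L * 60 * 60 * 1000 * 1000

  /** Decodes a 12-byte INT96 value into microseconds since the Unix epoch. */
  def int96ToMicros(bytes: Array[Byte]): Long = {
    require(bytes.length == 12, s"INT96 timestamps are 12 bytes, got ${bytes.length}")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buf.getLong // first 8 bytes
    val julianDay = buf.getInt   // last 4 bytes
    // Sub-microsecond digits are truncated.
    (julianDay - JulianDayOfEpoch) * MicrosPerDay + nanosOfDay / 1000L
  }

  /** Inverse of `int96ToMicros`, included only to build sample input. */
  def microsToInt96(micros: Long): Array[Byte] = {
    val day = Math.floorDiv(micros, MicrosPerDay)
    val microsOfDay = Math.floorMod(micros, MicrosPerDay)
    ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
      .putLong(microsOfDay * 1000L)
      .putInt((day + JulianDayOfEpoch).toInt)
      .array()
  }
}
```

Using floor division in the encoder keeps the nanosecond field non-negative for instants before 1970, which is what the decoder above expects.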
@@ -358,9 +358,24 @@ private[parquet] class CatalystSchemaConverter(
case DateType =>
Types.primitive(INT32, repetition).as(DATE).named(field.name)

- // NOTE: !! This timestamp type is not specified in Parquet format spec !!
- // However, Impala and older versions of Spark SQL use INT96 to store timestamps with
- // nanosecond precision (not TIME_MILLIS or TIMESTAMP_MILLIS described in the spec).
+ // NOTE: Spark SQL TimestampType is NOT a well-defined type in the Parquet format spec.
+ //
+ // As stated in PARQUET-323, Parquet `INT96` was originally introduced to represent nanosecond
+ // timestamps in Impala for historical reasons. It's not recommended for any other type and
+ // will probably be deprecated in a future Parquet format spec. That's why the Parquet format
+ // spec only defines `TIMESTAMP_MILLIS` and `TIMESTAMP_MICROS`, which are both logical types
+ // annotating `INT64`.
+ //
+ // Originally, Spark SQL used the same nanosecond timestamp type as Impala and Hive. Starting
+ // from Spark 1.5.0, we resort to a timestamp type with 100 ns precision so that we can store
+ // a timestamp in a `Long`. This design decision is subject to change, though; for example,
+ // we may resort to microsecond precision in the future.
+ //
+ // For Parquet, we plan to write all `TimestampType` values as `TIMESTAMP_MICROS`, but this is
+ // not implemented yet because parquet-mr 1.7.0 (the version we're currently using) doesn't
+ // support `TIMESTAMP_MICROS`.
+ //
+ // TODO Implement `TIMESTAMP_MICROS` once parquet-mr supports it.
case TimestampType =>
Types.primitive(INT96, repetition).named(field.name)
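
The comment above describes storing a timestamp in a `Long` at 100 ns precision and possibly writing microseconds later. A minimal sketch of that arithmetic, using only `java.sql.Timestamp`, might look like the following. The object and method names are hypothetical and are not taken from this PR; the sketch only illustrates the unit conversions, not how Spark SQL implements them.

```scala
import java.sql.Timestamp

// Hypothetical helper, not part of the PR: 100 ns "ticks" since the Unix epoch.
object TimestampTicksSketch {
  val TicksPerSecond: Long = 10000000L // 1 s = 10^7 ticks of 100 ns

  /** Converts a Timestamp to 100 ns ticks since the Unix epoch. */
  def toTicks(ts: Timestamp): Long = {
    // getTime carries milliseconds that getNanos also covers, so take
    // whole seconds from getTime and the full fraction from getNanos.
    val seconds = Math.floorDiv(ts.getTime, 1000L)
    seconds * TicksPerSecond + ts.getNanos / 100L
  }

  /** Drops the last decimal digit: the value a microsecond type would hold. */
  def ticksToMicros(ticks: Long): Long = Math.floorDiv(ticks, 10L)

  /** Rebuilds a Timestamp from ticks; precision below 100 ns is already gone. */
  def fromTicks(ticks: Long): Timestamp = {
    val seconds = Math.floorDiv(ticks, TicksPerSecond)
    val ticksOfSecond = Math.floorMod(ticks, TicksPerSecond)
    val ts = new Timestamp(seconds * 1000L)
    ts.setNanos((ticksOfSecond * 100L).toInt)
    ts
  }
}
```

At this resolution a signed 64-bit value spans roughly 29,000 years on either side of the epoch, whereas nanosecond resolution in the same `Long` would cover only about 292 years.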
