Merged
116 changes: 59 additions & 57 deletions content/en/docs/File Format/implementationstatus.md
@@ -29,14 +29,14 @@ Implementations:

| Data type | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
| BOOLEAN | | | | | ✅ |
| INT32 | | | | | ✅ |
| INT64 | | | | | ✅ |
| INT96 (1) | | | | | ✅ |
| FLOAT | | | | | ✅ |
| DOUBLE | | | | | ✅ |
| BYTE_ARRAY | | | | | ✅ |
| FIXED_LEN_BYTE_ARRAY | | | | | ✅ |
| BOOLEAN | | | | | ✅ |
| INT32 | | | | | ✅ |
| INT64 | | | | | ✅ |
| INT96 (1) | | | | | ✅ |
| FLOAT | | | | | ✅ |
| DOUBLE | | | | | ✅ |
| BYTE_ARRAY | | | | | ✅ |
| FIXED_LEN_BYTE_ARRAY | | | | | ✅ |

* \(1) This type is deprecated, but as of 2024 it's common in currently produced parquet files

@@ -45,64 +45,66 @@ Implementations:

| Data type | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
| STRING | | | | | ✅ |
| ENUM | | | | | ❌ |
| UUID | | | | | ❌ |
| 8, 16, 32, 64 bit signed and unsigned INT | | | | | ✅ |
| DECIMAL (INT32) | | | | | ✅ |
| DECIMAL (INT64) | | | | | ✅ |
| DECIMAL (BYTE_ARRAY) | | | | | ✅ |
| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | ✅ |
| DATE | | | | | ✅ |
| TIME (INT32) | | | | | ✅ |
| TIME (INT64) | | | | | ✅ |
| TIMESTAMP (INT64) | | | | | ✅ |
| INTERVAL | | | | | ❌ |
| JSON | | | | | ❌ |
| BSON | | | | | ❌ |
| LIST | | | | | ✅ |
| MAP | | | | | ✅ |
| UNKNOWN (always null) | | | | | ✅ |
| FLOAT16 | | | | | ✅ |
| STRING | ✅ | ✅ | | | ✅ |
| ENUM | ❌ | ✅ | | | ❌ |
| UUID | ❌ | ✅ | | | ❌ |
| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | | | ✅ |
| DECIMAL (INT32) | ✅ | ✅ | | | ✅ |
| DECIMAL (INT64) | ✅ | ✅ | | | ✅ |
| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | | | ✅ |
| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | | | ✅ |
| DATE | ✅ | ✅ | | | ✅ |
| TIME (INT32) | ✅ | ✅ | | | ✅ |
| TIME (INT64) | ✅ | ✅ | | | ✅ |
| TIMESTAMP (INT64) | ✅ | ✅ | | | ✅ |
Contributor

Should we split this out on the unit as well?

Suggested change
| TIMESTAMP (INT64) | ✅ | ✅ | | | ✅ |
| TIMESTAMP (INT64, MICROS) | ✅ | ✅ | | | ✅ |

Member Author

Why is that? Any unsupported unit in parquet-java?

Contributor

No, but I'm not sure if all the other implementations support nanos since that was added later on.

Member Author

It was added almost 7 years ago: apache/parquet-format@b879065. We can be explicit if we really see an unsupported implementation then.

| INTERVAL | ✅ | ✅(*)| | | ❌ |
| JSON | ✅ | ✅(*)| | | ❌ |
| BSON | ❌ | ✅(*)| | | ❌ |
| LIST | ✅ | ✅ | | | ✅ |
| MAP | ✅ | ✅ | | | ✅ |
| UNKNOWN (always null) | ✅ | ✅ | | | ✅ |
| FLOAT16 | ✅ | ✅(*)| | | ✅ |

(*): Only supported to use its annotated physical type
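
For readers following the review thread on TIMESTAMP units: the unit (MILLIS, MICROS, or NANOS) determines how a reader must interpret the same raw INT64 value, which is why support can differ per unit. The following is a minimal sketch using only the Python standard library, not the API of any Parquet implementation. The names `TimeUnit`, `EPOCH`, and `decode_timestamp` are hypothetical, and the sketch assumes UTC-adjusted timestamps counted from the Unix epoch.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class TimeUnit(Enum):
    """Ticks per second for each TIMESTAMP unit (hypothetical helper type)."""
    MILLIS = 1_000
    MICROS = 1_000_000
    NANOS = 1_000_000_000


# Assumption: values count units since the Unix epoch, adjusted to UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_timestamp(raw: int, unit: TimeUnit) -> tuple[datetime, int]:
    """Interpret a raw INT64 value under the given unit.

    Returns a (datetime, leftover_nanoseconds) pair. Python's datetime
    only has microsecond resolution, so the sub-microsecond part of a
    NANOS value is returned separately instead of being dropped.
    """
    # Floor division keeps pre-epoch (negative) values correct.
    seconds, fraction = divmod(raw, unit.value)
    # Scale the fractional ticks to nanoseconds.
    nanos = fraction * (1_000_000_000 // unit.value)
    micros, leftover_nanos = divmod(nanos, 1_000)
    instant = EPOCH + timedelta(seconds=seconds, microseconds=micros)
    return instant, leftover_nanos


if __name__ == "__main__":
    # The same instant needs a different raw value for each unit.
    print(decode_timestamp(1_700_000_000_123, TimeUnit.MILLIS))
    print(decode_timestamp(1_700_000_000_123_456, TimeUnit.MICROS))
    print(decode_timestamp(1_700_000_000_123_456_789, TimeUnit.NANOS))
```

A reader that only handles MILLIS and MICROS would misread a NANOS column by a factor of 1,000 or more, which is the kind of gap a per-unit row in the table would make visible.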

### Encodings

| Encoding | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
| PLAIN | | | | | ✅ |
| PLAIN_DICTIONARY | | | | | ✅ |
| RLE_DICTIONARY | | | | | ✅ |
| RLE | | | | | ✅ |
| BIT_PACKED (deprecated) | | | | | (R) |
| DELTA_BINARY_PACKED | | | | | ✅ |
| DELTA_LENGTH_BYTE_ARRAY | | | | | ✅ |
| DELTA_BYTE_ARRAY | | | | | ✅ |
| BYTE_STREAM_SPLIT | | | | | ✅ |
| PLAIN | | | | | ✅ |
| PLAIN_DICTIONARY | | | | | ✅ |
| RLE_DICTIONARY | | | | | ✅ |
| RLE | | | | | ✅ |
| BIT_PACKED (deprecated) | | | | | (R) |
| DELTA_BINARY_PACKED | | | | | ✅ |
| DELTA_LENGTH_BYTE_ARRAY | | | | | ✅ |
| DELTA_BYTE_ARRAY | | | | | ✅ |
| BYTE_STREAM_SPLIT | | | | | ✅ |

### Compressions

| Compression | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
| UNCOMPRESSED | | | | | ✅ |
| BROTLI | | | | | (R) |
| GZIP | | | | | (R) |
| LZ4 (deprecated) | | | | | ❌ |
| LZ4_RAW | | | | | ✅ |
| LZO | | | | | ❌ |
| SNAPPY | | | | | ✅ |
| ZSTD | | | | | ✅ |
| UNCOMPRESSED | | | | | ✅ |
| BROTLI | | | | | (R) |
| GZIP | | | | | (R) |
| LZ4 (deprecated) | | | | | ❌ |
| LZ4_RAW | | | | | ✅ |
| LZO | | | | | ❌ |
| SNAPPY | | | | | ✅ |
| ZSTD | | | | | ✅ |

### Other format level features

| | C++ | Java | Go | Rust | cuDF |
| ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
| xxHash-based bloom filters | | | | | (R) |
| Bloom filter length (1) | | | | | (R) |
| Statistics min_value, max_value | | | | | ✅ |
| Page index | | | | | ✅ |
| Page CRC32 checksum | | | | | ❌ |
| Modular encryption | | | | | ❌ |
| Size statistics (2) | | | | | ✅ |
| xxHash-based bloom filters | (R) | | | | (R) |
| Bloom filter length (1) | (R) | | | | (R) |
| Statistics min_value, max_value | | | | | ✅ |
| Page index | | | | | ✅ |
| Page CRC32 checksum | | | | | ❌ |
| Modular encryption | | | | | ❌ |
| Size statistics (2) | | | | | ✅ |


* \(1) In parquet.thrift: ColumnMetaData->bloom_filter_length
@@ -113,12 +115,12 @@ Implementations:

| Format | C++ | Java | Go | Rust | cuDF |
| -------------------------------------------- | ----- | ----- | ----- | ----- | ----- |
| External column data (1) | | | | | (W) |
| Row group "Sorting column" metadata (2) | | | | | (W) |
| Row group pruning using statistics | | | | | ✅ |
| Row group pruning using bloom filter | | | | | ✅ |
| Reading select columns only | | | | | ✅ |
| Page pruning using statistics | | | | | ❌ |
| External column data (1) | | | | | (W) |
| Row group "Sorting column" metadata (2) | | | | | (W) |
| Row group pruning using statistics | | | | | ✅ |
| Row group pruning using bloom filter | | | | | ✅ |
| Reading select columns only | | | | | ✅ |
| Page pruning using statistics | | | | | ❌ |


* \(1) In parquet.thrift: ColumnChunk->file_path