diff --git a/content/en/docs/File Format/implementationstatus.md b/content/en/docs/File Format/implementationstatus.md
index 8b32876b..3bd1d23d 100644
--- a/content/en/docs/File Format/implementationstatus.md
+++ b/content/en/docs/File Format/implementationstatus.md
@@ -29,14 +29,14 @@ Implementations:
 
 | Data type | C++ | Java | Go | Rust | cuDF |
 | ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| BOOLEAN | | | | | ✅ |
-| INT32 | | | | | ✅ |
-| INT64 | | | | | ✅ |
-| INT96 (1) | | | | | ✅ |
-| FLOAT | | | | | ✅ |
-| DOUBLE | | | | | ✅ |
-| BYTE_ARRAY | | | | | ✅ |
-| FIXED_LEN_BYTE_ARRAY | | | | | ✅ |
+| BOOLEAN | ✅ | ✅ | | | ✅ |
+| INT32 | ✅ | ✅ | | | ✅ |
+| INT64 | ✅ | ✅ | | | ✅ |
+| INT96 (1) | ✅ | ✅ | | | ✅ |
+| FLOAT | ✅ | ✅ | | | ✅ |
+| DOUBLE | ✅ | ✅ | | | ✅ |
+| BYTE_ARRAY | ✅ | ✅ | | | ✅ |
+| FIXED_LEN_BYTE_ARRAY | ✅ | ✅ | | | ✅ |
 
 * \(1) This type is deprecated, but as of 2024 it's common in currently produced parquet files
 
@@ -45,64 +45,66 @@ Implementations:
 
 | Data type | C++ | Java | Go | Rust | cuDF |
 | ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| STRING | | | | | ✅ |
-| ENUM | | | | | ❌ |
-| UUID | | | | | ❌ |
-| 8, 16, 32, 64 bit signed and unsigned INT | | | | | ✅ |
-| DECIMAL (INT32) | | | | | ✅ |
-| DECIMAL (INT64) | | | | | ✅ |
-| DECIMAL (BYTE_ARRAY) | | | | | ✅ |
-| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | | ✅ |
-| DATE | | | | | ✅ |
-| TIME (INT32) | | | | | ✅ |
-| TIME (INT64) | | | | | ✅ |
-| TIMESTAMP (INT64) | | | | | ✅ |
-| INTERVAL | | | | | ❌ |
-| JSON | | | | | ❌ |
-| BSON | | | | | ❌ |
-| LIST | | | | | ✅ |
-| MAP | | | | | ✅ |
-| UNKNOWN (always null) | | | | | ✅ |
-| FLOAT16 | | | | | ✅ |
+| STRING | ✅ | ✅ | | | ✅ |
+| ENUM | ❌ | ✅ | | | ❌ |
+| UUID | ❌ | ✅ | | | ❌ |
+| 8, 16, 32, 64 bit signed and unsigned INT | ✅ | ✅ | | | ✅ |
+| DECIMAL (INT32) | ✅ | ✅ | | | ✅ |
+| DECIMAL (INT64) | ✅ | ✅ | | | ✅ |
+| DECIMAL (BYTE_ARRAY) | ✅ | ✅ | | | ✅ |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | ✅ | ✅ | | | ✅ |
+| DATE | ✅ | ✅ | | | ✅ |
+| TIME (INT32) | ✅ | ✅ | | | ✅ |
+| TIME (INT64) | ✅ | ✅ | | | ✅ |
+| TIMESTAMP (INT64) | ✅ | ✅ | | | ✅ |
+| INTERVAL | ✅ | ✅(*)| | | ❌ |
+| JSON | ✅ | ✅(*)| | | ❌ |
+| BSON | ❌ | ✅(*)| | | ❌ |
+| LIST | ✅ | ✅ | | | ✅ |
+| MAP | ✅ | ✅ | | | ✅ |
+| UNKNOWN (always null) | ✅ | ✅ | | | ✅ |
+| FLOAT16 | ✅ | ✅(*)| | | ✅ |
+
+(*): Only supported to use its annotated physical type
 
 ### Encodings
 
 | Encoding | C++ | Java | Go | Rust | cuDF |
 | ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| PLAIN | | | | | ✅ |
-| PLAIN_DICTIONARY | | | | | ✅ |
-| RLE_DICTIONARY | | | | | ✅ |
-| RLE | | | | | ✅ |
-| BIT_PACKED (deprecated) | | | | | (R) |
-| DELTA_BINARY_PACKED | | | | | ✅ |
-| DELTA_LENGTH_BYTE_ARRAY | | | | | ✅ |
-| DELTA_BYTE_ARRAY | | | | | ✅ |
-| BYTE_STREAM_SPLIT | | | | | ✅ |
+| PLAIN | ✅ | ✅ | | | ✅ |
+| PLAIN_DICTIONARY | ✅ | ✅ | | | ✅ |
+| RLE_DICTIONARY | ✅ | ✅ | | | ✅ |
+| RLE | ✅ | ✅ | | | ✅ |
+| BIT_PACKED (deprecated) | ✅ | ✅ | | | (R) |
+| DELTA_BINARY_PACKED | ✅ | ✅ | | | ✅ |
+| DELTA_LENGTH_BYTE_ARRAY | ✅ | ✅ | | | ✅ |
+| DELTA_BYTE_ARRAY | ✅ | ✅ | | | ✅ |
+| BYTE_STREAM_SPLIT | ✅ | ✅ | | | ✅ |
 
 ### Compressions
 
 | Compression | C++ | Java | Go | Rust | cuDF |
 | ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| UNCOMPRESSED | | | | | ✅ |
-| BROTLI | | | | | (R) |
-| GZIP | | | | | (R) |
-| LZ4 (deprecated) | | | | | ❌ |
-| LZ4_RAW | | | | | ✅ |
-| LZO | | | | | ❌ |
-| SNAPPY | | | | | ✅ |
-| ZSTD | | | | | ✅ |
+| UNCOMPRESSED | ✅ | ✅ | | | ✅ |
+| BROTLI | ✅ | ✅ | | | (R) |
+| GZIP | ✅ | ✅ | | | (R) |
+| LZ4 (deprecated) | ✅ | ❌ | | | ❌ |
+| LZ4_RAW | ✅ | ✅ | | | ✅ |
+| LZO | ❌ | ❌ | | | ❌ |
+| SNAPPY | ✅ | ✅ | | | ✅ |
+| ZSTD | ✅ | ✅ | | | ✅ |
 
 ### Other format level features
 
 | | C++ | Java | Go | Rust | cuDF |
 | ----------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| xxHash-based bloom filters | | | | | (R) |
-| Bloom filter length (1) | | | | | (R) |
-| Statistics min_value, max_value | | | | | ✅ |
-| Page index | | | | | ✅ |
-| Page CRC32 checksum | | | | | ❌ |
-| Modular encryption | | | | | ❌ |
-| Size statistics (2) | | | | | ✅ |
+| xxHash-based bloom filters | (R) | ✅ | | | (R) |
+| Bloom filter length (1) | (R) | ✅ | | | (R) |
+| Statistics min_value, max_value | ✅ | ✅ | | | ✅ |
+| Page index | ✅ | ✅ | | | ✅ |
+| Page CRC32 checksum | ✅ | ✅ | | | ❌ |
+| Modular encryption | ✅ | ✅ | | | ❌ |
+| Size statistics (2) | ✅ | ✅ | | | ✅ |
 
 * \(1) In parquet.thrift: ColumnMetaData->bloom_filter_length
 
@@ -113,12 +115,12 @@ Implementations:
 
 | Format | C++ | Java | Go | Rust | cuDF |
 | -------------------------------------------- | ----- | ----- | ----- | ----- | ----- |
-| External column data (1) | | | | | (W) |
-| Row group "Sorting column" metadata (2) | | | | | (W) |
-| Row group pruning using statistics | | | | | ✅ |
-| Row group pruning using bloom filter | | | | | ✅ |
-| Reading select columns only | | | | | ✅ |
-| Page pruning using statistics | | | | | ❌ |
+| External column data (1) | ✅ | ✅ | | | (W) |
+| Row group "Sorting column" metadata (2) | ✅ | ❌ | | | (W) |
+| Row group pruning using statistics | ❌ | ✅ | | | ✅ |
+| Row group pruning using bloom filter | ❌ | ✅ | | | ✅ |
+| Reading select columns only | ✅ | ✅ | | | ✅ |
+| Page pruning using statistics | ❌ | ✅ | | | ❌ |
 
 * \(1) In parquet.thrift: ColumnChunk->file_path