104 changes: 104 additions & 0 deletions docs/source/status.rst
@@ -348,3 +348,107 @@ Notes:
* \(1) Through JNI bindings. (Provided by ``org.apache.arrow.orc:arrow-orc``)

* \(2) Through JNI bindings to Arrow C++ Datasets. (Provided by ``org.apache.arrow:arrow-dataset``)


Parquet format public API details
=================================

+-------------------------------------------+-------+--------+--------+-------+-------+
| Format | C++ | Python | Java | Go | Rust |
Member:
The Java column could be misleading here. In the arrow repo, there is a Java dataset reader that supports reading from Parquet datasets. If this is for parquet-mr, then it can easily get out of sync.

| | | | | | |
+===========================================+=======+========+========+=======+=======+
| Basic compression | | | | | |
Contributor:
I wonder if we could have separate tables for supported physical types, encodings and compression.

Member:
+1 for this.
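
For orientation, a minimal sketch of what the compression rows mean for the Python (pyarrow) column; the codec names are the ones pyarrow accepts, and the output path is illustrative:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": [1, 2, 3]})

# Snappy and gzip fall under "Basic compression"; Brotli, LZ4 and ZSTD are the
# codecs tracked by the dedicated rows below.
for codec in ("snappy", "gzip", "brotli", "lz4", "zstd"):
    pq.write_table(table, f"/tmp/example_{codec}.parquet", compression=codec)
```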

+-------------------------------------------+-------+--------+--------+-------+-------+
| Brotli, LZ4, ZSTD | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| LZ4_RAW | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Hive-style partitioning | | | | | |
Contributor:
I'm not sure I'd consider this a feature of the Parquet implementation; it is more a detail of the query engine, IMO.

Contributor Author:
While arrow-rs needs DataFusion for this functionality, Arrow handles it without Acero. I don't have a strong opinion, though.

Member:
I agree with @tustvold; partitioning is more of a high-level use case on top of the file format.
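
As a point of reference for the discussion above, a minimal sketch of hive-style partitioned reads through pyarrow.dataset (the directory layout and column names are hypothetical):

```python
import pyarrow.dataset as ds

# Reads a directory laid out as .../year=2023/month=06/part-0.parquet and
# exposes "year" and "month" as regular columns.
dataset = ds.dataset("/tmp/sales", format="parquet", partitioning="hive")
table = dataset.to_table(filter=ds.field("year") == 2023)
```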

+-------------------------------------------+-------+--------+--------+-------+-------+
| File metadata | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| RowGroup metadata | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Column metadata | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
Comment on lines +367 to +373

Contributor:
You can't not support this metadata, as otherwise the Parquet file can't be read?

Member:
Are these intended to track the completeness of the fields defined in the metadata? If so, they are probably worth a separate table indicating the state of each field, but that sounds too complicated.

Member:
Is the intention to indicate that the metadata is available through a public API, rather than saying whether or not it is supported in general? As @tustvold says, you have to support the metadata, otherwise the file can't be read.
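
For illustration, one way the "available through a public API" reading could look in pyarrow (the file path is hypothetical):

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("/tmp/example.parquet")

file_meta = pf.metadata            # FileMetaData
rg_meta = file_meta.row_group(0)   # RowGroupMetaData
col_meta = rg_meta.column(0)       # ColumnChunkMetaData
print(file_meta.num_row_groups, rg_meta.num_rows, col_meta.statistics)
```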

| Chunk metadata | | | | | |
Contributor:
I'm not sure what this is and how it differs from ColumnChunk.

+-------------------------------------------+-------+--------+--------+-------+-------+
| Sorting column | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| ColumnIndex statistics | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Page statistics | | | | | |
Contributor:
What is this referring to?

Contributor Author:
Like I said, there is a good chance I made a mistake here. I saw this in the Thrift spec: ColumnChunk->ColumnMetadata->Statistics.

Member:
Could we organize these items in a layered fashion? Maybe this is a good starting point: https://arrow.apache.org/docs/cpp/parquet.html#supported-parquet-features

+-------------------------------------------+-------+--------+--------+-------+-------+
| Statistics min_value | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| xxHash based bloom filter | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| bloom filter length | | | | | |
Contributor:
What is this?

Contributor:
OMG, they finally added it - amazing, will get that incorporated into the rust writer/reader.

Member:
> OMG, they finally added it - amazing, will get that incorporated into the rust writer/reader

I just added it recently :) Please note that the latest format is not released yet, so parquet-mr does not know about bloom_filter_length yet.

+-------------------------------------------+-------+--------+--------+-------+-------+
| Modular encryption | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| External column data | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Nanosecond support | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| FIXED_LEN_BYTE_ARRAY | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Complete Delta encoding support | | | | | |
Contributor (@tustvold, Jun 11, 2023):
I think it would be clearer if you listed the actual encodings, perhaps in a separate table.
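
For context, a rough sketch of how specific encodings can be requested from Python, assuming a pyarrow version that exposes the column_encoding option (column names and path are illustrative):

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"ints": [1, 2, 3], "floats": [0.1, 0.2, 0.3]})

# Dictionary encoding has to be disabled for columns that are given an
# explicit encoding.
pq.write_table(
    table,
    "/tmp/encodings.parquet",
    use_dictionary=False,
    column_encoding={"ints": "DELTA_BINARY_PACKED", "floats": "BYTE_STREAM_SPLIT"},
)
```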

+-------------------------------------------+-------+--------+--------+-------+-------+
| Complete RLE support | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| BYTE_STREAM_SPLIT | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Partition pruning on the partition column | | | | | |
Contributor:
Again, this is a detail of the query engine, not the Parquet implementation, IMO.

Contributor Author:
Same; it's part of the current API, but I agree it's not consistent across implementations.

+-------------------------------------------+-------+--------+--------+-------+-------+
| RowGroup pruning using statistics | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| RowGroup pruning using bloom filter | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Page pruning using projection pushdown | | | | | |
Contributor:
Suggested change:

| Page pruning using projection pushdown | | | | | |
| Column Pruning using projection pushdown | | | | | |

Member:
Isn't this also a detail of the engine choosing which columns to read? Or is the intent here to indicate that rows/values can be pruned based on projection directly in the Parquet library?
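
For reference, what projection (and predicate) pushdown looks like through the pyarrow reader; the column names and path are hypothetical:

```python
import pyarrow.parquet as pq

# Only the selected column chunks are decoded; the filter lets the reader
# additionally prune row groups using their statistics.
table = pq.read_table(
    "/tmp/example.parquet",
    columns=["year", "amount"],
    filters=[("year", "=", 2023)],
)
```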

+-------------------------------------------+-------+--------+--------+-------+-------+
| Page pruning using statistics | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Page pruning using bloom filter | | | | | |
Contributor:
I don't think this is supported by the format; bloom filters are per column chunk.

+-------------------------------------------+-------+--------+--------+-------+-------+
| Partition append / delete | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| RowGroup append / delete | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Page append / delete | | | | | |
Contributor (@tustvold, Jun 11, 2023):
I don't think any of them support page appending; the semantics would be peculiar for things like dictionary pages. The Rust implementation does support appending column chunks, though.

Contributor Author:
Yes, likely some / most of the Page references should be ColumnChunk. I'll read about this more.

Member:
Isn't Parquet itself a write-once format that can't be appended to? I'm not sure what these are supposed to indicate. The inability to append/delete without re-writing a Parquet file is why table formats like Iceberg and Delta have proliferated.

+-------------------------------------------+-------+--------+--------+-------+-------+
| Page CRC32 checksum | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Parallel partition processing | | | | | |
Contributor:
IMO this is a query engine detail, not a detail of the file format?

Contributor Author:
It's part of the Arrow API in Python.
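
For reference, the Python-side knobs the author is likely referring to; a minimal sketch with hypothetical paths:

```python
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Multi-threaded decoding of a single file (columns / row groups in parallel).
table = pq.read_table("/tmp/example.parquet", use_threads=True)

# Multi-threaded scan across the files of a (possibly partitioned) dataset.
dataset = ds.dataset("/tmp/sales", format="parquet", partitioning="hive")
table = dataset.to_table(use_threads=True)
```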

+-------------------------------------------+-------+--------+--------+-------+-------+
| Parallel RowGroup processing | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Parallel Page processing | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Storage-aware defaults (1) | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Adaptive concurrency (2) | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| Adaptive IO when pruning used (3) | | | | | |
Comment on lines +428 to +432

Contributor (@tustvold, Jun 11, 2023):
I'm not sure which Parquet reader these features are based on, but my 2 cents is that they indicate a problematic IO abstraction that relies on prefetching heuristics instead of pushing vectored IO down into the IO subsystem (which the Rust and the proprietary Databricks implementations do).

Contributor Author:
I wanted to capture the IO pushdown section of https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/#io-pushdown but also added more. This is likely out of scope, as none of the implementations goes into detail or provides an API.

Contributor:
Perhaps just "Vectorized IO Pushdown". I believe there are efforts to add such an API to parquet-mr.
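
Loosely related to the discussion above, pyarrow exposes a knob that coalesces the byte ranges needed for the selected columns before issuing reads; a minimal sketch, assuming the pre_buffer option of pq.read_table (the location and column name are illustrative):

```python
import pyarrow.parquet as pq

# pre_buffer asks the reader to coalesce and prefetch the byte ranges of the
# selected columns, which mainly helps on high-latency stores such as S3.
table = pq.read_table(
    "s3://bucket/example.parquet",
    columns=["amount"],
    pre_buffer=True,
)
```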

+-------------------------------------------+-------+--------+--------+-------+-------+
| Arrow schema metadata (4) | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+
| RLE / REE support (5) | | | | | |
+-------------------------------------------+-------+--------+--------+-------+-------+


Notes:

* *R* = Read supported

* *W* = Write supported

* \(1) In-memory or memory-mapped files, SSD direct IO, HDD, NFS, and local and remote S3 all need different concurrency and buffer size setups

* \(2) Depending on the encoding, compression and row group sizes, different task sizes might be ideal

* \(3) Automatic balancing of prefetched / block reading and Page pruning

* \(4) By default, the Arrow schema is serialized and stored in the Parquet file metadata (in the “ARROW:schema” key). When reading the file, if this key is available, it will be used to more faithfully recreate the original Arrow data (a sketch follows these notes).

* \(5) Parquet supports RLE encoding of dictionary *data*. Reading and writing a similar structure (e.g. Arrow REE) without allocating the expanded values might be supported by individual implementations.
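
A minimal sketch of note (4), assuming pyarrow; the path is illustrative, and the timezone is one example of a detail that only the stored Arrow schema preserves:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Parquet's TIMESTAMP logical type only records whether values are UTC-adjusted;
# the exact zone name survives the roundtrip via the "ARROW:schema" entry.
table = pa.table({"ts": pa.array([1, 2, 3], type=pa.timestamp("us", tz="Europe/Paris"))})
pq.write_table(table, "/tmp/schema_roundtrip.parquet")

md = pq.ParquetFile("/tmp/schema_roundtrip.parquet").metadata.metadata
print(b"ARROW:schema" in md)   # True: the serialized Arrow schema is stored
print(pq.read_table("/tmp/schema_roundtrip.parquet").schema.field("ts").type)
# timestamp[us, tz=Europe/Paris]
```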