BUG: writing categorical of intervals (pd.cut() output) to Parquet is not supported #62235

@jorisvandenbossche

Description

This might eventually be something to support in PyArrow, but I think it is good to have an issue about this on the pandas side as well (and I was surprised to not find an existing one).

When trying to write the result of pd.cut() to Parquet (it returns a column of categorical dtype with Interval categories), you get the following error:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"col": np.random.randn(100)})
>>> df["bins"] = pd.cut(df["col"], bins=10)
>>> df.to_parquet("test_category_interval.parquet")
...
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/parquet/core.py:1115, in ParquetWriter.write_table(self, table, row_group_size)
   1110     msg = ('Table schema does not match schema used to create file: '
   1111            '\ntable:\n{!s} vs. \nfile:\n{!s}'
   1112            .format(table.schema, self.schema))
   1113     raise ValueError(msg)
-> 1115 self.writer.write_table(table, row_group_size=row_group_size)

File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/_parquet.pyx:2226, in pyarrow._parquet.ParquetWriter.write_table()

File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowNotImplementedError: Unsupported cast from dictionary<values=extension<pandas.interval<ArrowIntervalType>>, indices=int8, ordered=1> to struct using function cast_struct
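
Until this is supported, one possible stopgap is to replace the Interval categories with their string representation before writing. The sketch below is only an illustration, not a proposed fix: the names `as_str` and `df_out` are made up for the example, the intervals do not round-trip as Interval objects, and the final `to_parquet` call is left commented out because it assumes pyarrow is installed.

```python
import numpy as np
import pandas as pd

# Same setup as the reproducer above, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({"col": rng.standard_normal(100)})
df["bins"] = pd.cut(df["col"], bins=10)

# The problematic column: categorical dtype whose categories are intervals.
assert isinstance(df["bins"].dtype, pd.CategoricalDtype)
assert isinstance(df["bins"].cat.categories, pd.IntervalIndex)

# Stopgap (an assumption, not an official recommendation): turn each
# Interval category into its string form, e.g. "(0.1, 0.5]". The column
# stays categorical and keeps its ordering, but the bin edges are now
# only available as text.
as_str = df["bins"].cat.rename_categories(str)
df_out = df.assign(bins=as_str)

# With string categories the column is an ordinary dictionary-encoded
# string column on the Arrow side (requires pyarrow):
# df_out.to_parquet("test_category_interval.parquet")
```

If the numeric bin edges matter downstream, storing them in separate float columns instead of strings may be a better fit.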
