Merge main
datapythonista committed Jun 16, 2025
commit 9e71a9d41cdd8d721f8ab32fecbce1ef99a73fd4
92 changes: 16 additions & 76 deletions web/pandas/community/ecosystem.md
@@ -158,6 +158,19 @@ Plotly can be used as a pandas plotting backend via:
pd.set_option("plotting.backend", "plotly")
```

### IO engines

The table below lists the third-party [IO engines](https://pandas.pydata.org/docs/development/extending.html#io-engines)
available to `read_*` functions and `DataFrame.to_*` methods.

| Engine name | Library | Supported formats |
| ----------------|------------------------------------------------------ | ------------------------------- |
| | | |

IO engines can be used by specifying the engine when calling a reader or writer
(e.g. `pd.read_csv("myfile.csv", engine="myengine")`).
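
As a minimal sketch of how the `engine` argument is passed, the example below uses only the `"c"` and `"python"` parser engines that ship with pandas, since the table above lists no third-party engines yet. The name `"myengine"` from the text is a placeholder, not a real package, so it appears only in a comment.

```python
import io

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"

# "c" and "python" are parser engines bundled with pandas itself.
df_c = pd.read_csv(io.StringIO(csv_text), engine="c")
df_python = pd.read_csv(io.StringIO(csv_text), engine="python")

# Both engines parse this simple input into the same frame.
assert df_c.equals(df_python)

# A third-party engine would be selected the same way. "myengine" is a
# placeholder name, so this call is left commented out:
# df = pd.read_csv("myfile.csv", engine="myengine")
```

The reader call stays the same regardless of which engine does the parsing; only the `engine` value changes.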


## Domain specific pandas extensions

#### [Geopandas](https://github.com/geopandas/geopandas)
@@ -424,82 +437,9 @@ Pyjanitor provides a clean API for cleaning data, using method chaining.

#### [Hamilton](https://github.com/dagworks-inc/hamilton)

db-dtypes provides extension types for working with types like
DATE, TIME, and JSON from database systems. This package is used
by pandas-gbq to provide natural dtypes for BigQuery data types without
a natural numpy type.

### [Pandas-Genomics](https://pandas-genomics.readthedocs.io/en/latest/)

Pandas-Genomics provides an extension type and extension array for working
with genomics data. It also includes `genomics` accessors for many useful properties
and methods related to QC and analysis of genomics data.

### [Physipandas](https://github.com/mocquin/physipandas)

Physipandas provides an extension for manipulating physical quantities
(like scalar and numpy.ndarray) in association with a physical unit
(like meter or joule) and additional features for integration of
`physipy` accessors with pandas Series and DataFrame.

### [Pint-Pandas](https://github.com/hgrecco/pint-pandas)

Pint-Pandas provides an extension type for storing numeric arrays with units.
These arrays can be stored inside pandas' Series and DataFrame. Operations
between Series and DataFrame columns which use pint's extension array are then
unit-aware.

### [Text Extensions](https://ibm.biz/text-extensions-for-pandas)

Text Extensions for Pandas provides extension types to cover common data structures for representing natural language data, plus library integrations that convert the outputs of popular natural language processing libraries into pandas DataFrames.

## Accessors

A directory of projects providing
[extension accessors](https://pandas.pydata.org/docs/development/extending.html#registering-custom-accessors).
This is for users to discover new accessors and for library
authors to coordinate on the namespace.

| Library | Accessor | Classes |
| -------------------------------------------------------------------- | ---------- | --------------------- |
| [awkward-pandas](https://awkward-pandas.readthedocs.io/en/latest/) | `ak` | `Series` |
| [pdvega](https://altair-viz.github.io/pdvega/) | `vgplot` | `Series`, `DataFrame` |
| [pandas-genomics](https://pandas-genomics.readthedocs.io/en/latest/) | `genomics` | `Series`, `DataFrame` |
| [pint-pandas](https://github.com/hgrecco/pint-pandas) | `pint` | `Series`, `DataFrame` |
| [physipandas](https://github.com/mocquin/physipandas) | `physipy` | `Series`, `DataFrame` |
| [composeml](https://github.com/alteryx/compose) | `slice` | `DataFrame` |
| [gurobipy-pandas](https://github.com/Gurobi/gurobipy-pandas) | `gppd` | `Series`, `DataFrame` |
| [staircase](https://www.staircase.dev/) | `sc` | `Series`, `DataFrame` |
| [woodwork](https://github.com/alteryx/woodwork) | `slice` | `Series`, `DataFrame` |
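
For readers unfamiliar with how the libraries in the table expose their namespaces, here is a minimal sketch of registering a custom accessor with `pd.api.extensions.register_dataframe_accessor`. The accessor name `demo` and the method `numeric_columns` are hypothetical and chosen only for illustration; they do not correspond to any library listed above.

```python
import pandas as pd


# "demo" is a hypothetical accessor name used only in this sketch.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes the DataFrame the accessor is called on.
        self._obj = pandas_obj

    def numeric_columns(self) -> list:
        """Return the names of the numeric columns."""
        return list(self._obj.select_dtypes(include="number").columns)


df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"], "z": [0.5, 1.5]})
numeric = df.demo.numeric_columns()
```

Because every accessor is attached under a single attribute name on `Series` or `DataFrame`, two libraries registering the same name would collide, which is why a shared directory of names is useful.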

## IO engines

The table below lists the third-party [IO engines](https://pandas.pydata.org/docs/development/extending.html#io-engines)
available to `read_*` functions and `DataFrame.to_*` methods.

| Engine name | Library | Supported formats |
| ----------------|------------------------------------------------------ | ------------------------------- |
| | | |

IO engines can be used by specifying the engine when calling a reader or writer
(e.g. `pd.read_csv("myfile.csv", engine="myengine")`).

## Development tools

### [pandas-stubs](https://github.com/VirtusLab/pandas-stubs)

While the pandas repository is partially typed, the package itself doesn't expose this information for external use.
Install pandas-stubs to enable basic type coverage of the pandas API.

Learn more by reading through issues [14468](https://github.com/pandas-dev/pandas/issues/14468),
[26766](https://github.com/pandas-dev/pandas/issues/26766), [28142](https://github.com/pandas-dev/pandas/issues/28142).

See installation and usage instructions on the [GitHub page](https://github.com/VirtusLab/pandas-stubs).

### [Hamilton](https://github.com/dagworks-inc/hamilton)

Hamilton is a declarative dataflow framework that came out of Stitch Fix. It was designed to help one manage a
Pandas code base, specifically with respect to feature engineering for machine learning models.
Hamilton is a declarative dataflow framework that came out of Stitch Fix. It was
designed to help one manage a Pandas code base, specifically with respect to
feature engineering for machine learning models.

It prescribes an opinionated paradigm that ensures all code is:

You are viewing a condensed version of this merge commit. You can view the full changes here.