ENH: Allow third-party packages to register IO engines #61642
Merged
Commits (all by datapythonista):
f33778c New third-party IO engines
555459b Add tests and fix bugs
1ca77c1 Finishing docs and tests
d388101 typo in doc label and typing issues
e333510 Merge branch 'main' into io_engines
cb82ffb Fix link in markdown
088e5de Merge branch 'io_engines' of github.com:datapythonista/pandas into io…
9e71a9d Merge main
ebfc20c Fix link
a4b6cdc Update doc/source/development/extending.rst
0b3b00c Update doc/source/development/extending.rst
776e04c Update pandas/core/frame.py
Finishing docs and tests
commit 1ca77c1cce8e67364f5b9f43b3321cd66723354f
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -489,6 +489,69 @@ registers the default "matplotlib" backend as follows. | |
| More information on how to implement a third-party plotting backend can be found at | ||
| https://github.com/pandas-dev/pandas/blob/main/pandas/plotting/__init__.py#L1. | ||
|
|
||
| .. _extending.io-engines: | ||
|
|
||
| IO engines | ||
| ----------- | ||
|
|
||
| pandas provides several IO connectors such as :func:`read_csv` or :meth:`DataFrame.to_parquet`, and many | ||
| of those support multiple engines. For example, :func:`read_csv` supports the ``python``, ``c`` | ||
| and ``pyarrow`` engines, each with its advantages and disadvantages, making each more appropriate | ||
| for certain use cases. | ||
|
|
||
| Third-party package developers can implement engines for any of the pandas readers and writers. | ||
| When a ``pandas.read_*`` function or ``DataFrame.to_*`` method is called with an ``engine="<name>"`` | ||
| that is not known to pandas, pandas will look into the entry points registered in the group | ||
| ``pandas.io_engine`` by the packages in the environment, and will call the corresponding method | ||
| of the engine registered under that name. | ||
|
|
||
| An engine is a simple Python class which implements one or more of the pandas readers and writers | ||
| as class methods: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
|     import pandas as pd | ||
|
|
||
|     class EmptyDataEngine: | ||
|         @classmethod | ||
|         def read_json(cls, path_or_buf=None, **kwargs): | ||
|             # Ignore the input and always return an empty DataFrame | ||
|             return pd.DataFrame() | ||
|
|
||
|         @classmethod | ||
|         def to_json(cls, path_or_buf=None, **kwargs): | ||
|             with open(path_or_buf, "w") as f: | ||
|                 f.write("{}")  # write an empty JSON object | ||
|
|
||
|         @classmethod | ||
|         def read_clipboard(cls, sep='\\s+', dtype_backend=None, **kwargs): | ||
|             return pd.DataFrame() | ||
|
|
||
| A single engine can support multiple readers and writers. When possible, it is good practice for | ||
| an engine to provide both a reader and a writer for each supported format, but it is possible to | ||
| provide just one of them. | ||
|
|
||
| The package implementing the engine needs to create an entry point for pandas to be able to discover | ||
| it. This is done in ``pyproject.toml``: | ||
|
|
||
| .. code-block:: toml | ||
|
|
||
|     [project.entry-points."pandas.io_engine"] | ||
|     empty = "empty_data:EmptyDataEngine" | ||
|
|
||
| The first line should always be the same, creating the entry point in the ``pandas.io_engine`` group. | ||
| In the second line, ``empty`` is the name of the engine, and ``empty_data:EmptyDataEngine`` is where | ||
| to find the engine class in the package (``empty_data`` is the module name in this case). | ||
|
|
||
| If a user has the example package installed, then it would be possible to use: | ||
|
||
|
|
||
| .. code-block:: python | ||
|
|
||
| pd.read_json("myfile.json", engine="empty") | ||
|
|
||
| When pandas detects that no ``empty`` engine exists for the ``read_json`` reader in pandas, it will | ||
| look at the entry points, find the ``EmptyDataEngine`` engine, and call its ``read_json`` | ||
| method with the arguments provided by the user (except the ``engine`` parameter). | ||
|
|
||
| To avoid conflicts in the names of engines, we keep an "IO engines" section in our | ||
| `Ecosystem page <https://pandas.pydata.org/community/ecosystem.html#io-engines>`_. | ||
|
||
|
|
||
| .. _extending.pandas_priority: | ||
|
|
||
| Arithmetic with 3rd party types | ||
|
|
||
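The lookup that the new docs describe can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: the helper names `load_engine` and `call_engine` are hypothetical, the stand-in engine returns a plain dict rather than a DataFrame so the sketch runs without pandas, and the "installed package" is faked by registering a module in `sys.modules`.

```python
import sys
import types
from importlib.metadata import EntryPoint, entry_points

GROUP = "pandas.io_engine"  # entry point group named in the docs above


def load_engine(name, candidates=None):
    """Return the engine class registered as ``name`` (hypothetical helper)."""
    if candidates is None:
        # Selecting by keyword requires Python 3.10 or later.
        candidates = entry_points(group=GROUP)
    for ep in candidates:
        if ep.name == name:
            return ep.load()  # imports the module and returns the class
    raise ValueError(f"No IO engine named {name!r} in group {GROUP!r}")


def call_engine(engine, method_name, *args, candidates=None, **kwargs):
    """Call e.g. ``read_json`` on the engine, without the ``engine`` argument."""
    engine_cls = load_engine(engine, candidates)
    method = getattr(engine_cls, method_name, None)
    if method is None:
        raise ValueError(f"Engine {engine!r} does not implement {method_name!r}")
    return method(*args, **kwargs)


# Stand-in for an installed third-party package, so the sketch runs on its own.
class EmptyDataEngine:
    @classmethod
    def read_json(cls, path_or_buf=None, **kwargs):
        # A real engine would return a pandas DataFrame here.
        return {"path": path_or_buf, "rows": []}


fake_module = types.ModuleType("empty_data")
fake_module.EmptyDataEngine = EmptyDataEngine
sys.modules["empty_data"] = fake_module

# Equivalent to the ``empty = "empty_data:EmptyDataEngine"`` declaration.
fake_ep = EntryPoint(name="empty", value="empty_data:EmptyDataEngine", group=GROUP)
result = call_engine("empty", "read_json", "myfile.json", candidates=[fake_ep])
```

Passing `candidates` explicitly is only there to keep the example self-contained; with a real package installed, `load_engine("empty")` would read the entry points from the environment instead.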
I'm not 100% sure if this can happen, but what if the project isn't using `pyproject.toml` for some reason? Is there another way to do the configuration, or is using `pyproject.toml` required?
Entry points existed before `pyproject.toml`, and can also be added in `setup.py`. It makes no difference how the package defines them: pip or conda will add the entry point to the environment registry, and pandas will be able to find it regardless of how the project created it.
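To illustrate why the declaration style does not matter, here is a small sketch under the assumption that the installer records the entry point in the package's `entry_points.txt` metadata file, in the INI-style format shown below. The helper `parse_entry_points` is hypothetical and only mimics what a reader of that metadata would do; the `setup.py` form in the comment is the setuptools `entry_points` argument.

```python
import configparser
from importlib.metadata import EntryPoint

# Assumed contents of the installed package's ``entry_points.txt``.
# The same text results from either declaration style, e.g. with setuptools:
#   setup(..., entry_points={"pandas.io_engine": ["empty = empty_data:EmptyDataEngine"]})
ENTRY_POINTS_TXT = """\
[pandas.io_engine]
empty = empty_data:EmptyDataEngine
"""


def parse_entry_points(text):
    """Turn entry_points.txt-style text into EntryPoint objects (hypothetical helper)."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    return [
        EntryPoint(name=name, value=value, group=group)
        for group in parser.sections()
        for name, value in parser.items(group)
    ]


engines = parse_entry_points(ENTRY_POINTS_TXT)
```

The resulting `EntryPoint` carries only the name, the `module:attribute` target and the group, so nothing about the build tool survives into what pandas would see.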
The language here suggests that the only way to add the entry point is via `pyproject.toml`. If this is the recommended way, we can say that. Or if other ways are supported, we should show that too.
`pyproject.toml` is the way to do it; `setup.py` is how it was done in the past. I'm sure people reading this will be able to figure out how this was done in the past if their code is still using `setup.py`.
Would making the entry point a module variable, like in the UDF engine PR, address some of the concerns about using `pyproject.toml`?