diff --git a/docs/proposal_pytorch_distribution.md b/docs/proposal_pytorch_distribution.md new file mode 100644 index 000000000000..b0e282783c39 --- /dev/null +++ b/docs/proposal_pytorch_distribution.md @@ -0,0 +1,391 @@ +# A PyTorch conda "distribution" + +This proposal addresses the need for a PyTorch conda distribution, meaning a +collection of integration-tested packages that can be installed from a single +channel, to enable package authors to release packages that depend on PyTorch +and let users install them in a reliable way. + + +## Motivation and Scope + +For developers of libraries that depend on PyTorch, it is currently (Nov'20) +quite difficult to express that dependency in a way that makes their package +easily installable with `conda` (or `pip`) by end users. With the PyTorch +ecosystem growing and the dependency graphs of sets of packages users use in +a single environment becoming more complex, streamlining the package +distribution and installation experience is important. + +Examples of packages for which there's interest in making them more easily +available to end users: + +- [fastai](https://docs.fast.ai/): Jeremy Howard expressed interest, and + plans to copy `pytorch` and other dependencies of fastai over to the `fastai` + channel in case this proposal doesn't work out. +- [fairseq](https://github.com/pytorch/fairseq): a fairseq developer inquired + about being added to the `pytorch` channel + [here](https://github.com/pytorch/builder/issues/563), and a conda-forge + contributor wanted to package both PyTorch and fairseq in conda-forge, see + [here](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7#issuecomment-688467743). +- [TorchANI](https://github.com/aiqm/torchani): see a TorchANI user's recent + attempt to add a conda-forge package + [here](https://github.com/conda-forge/torchani-feedstock/pull/1). + +In scope for this proposal are: + +- Processes related to adding new packages to the `pytorch` conda channel. +- CI infrastructure needed for integration testing and moving already built + packages to the `pytorch` channel. + +_Note: using the `pytorch` channel seems like the most obvious choice for a +single integration channel; using a new channel is also possible, it won't +change the rest of this proposal materially._ + +Out of scope are: + +- Changes related to how libraries are built or packages for conda are created. +- Updating PyTorch packaging in `defaults` or `conda-forge`. +- Improvements to installing with pip or wheel builds. + + +### The current state of affairs + +PyTorch is packaged in the `pytorch` channel; users must either add that +channel to the channels list globally or in an environment (using, e.g., +`conda config --env --add channels pytorch`), or add `-c pytorch` to every +`conda` command they run. Note that the channels method is preferred over `-c +pytorch` but installation instructions invariably use the latter, which can +lead to problems when it's forgotten by the user at some point. + +PyTorch is also packaged in `defaults`, but it's really outdated (1.4.0 for +CUDA-enabled packages, 1.5.0 for CPU-only). The `conda-forge` channel doesn't +have PyTorch packages - there's a desire to add them, however it's unclear if +and how that will happen. + +Authors of _pure Python packages_ tend to use their own conda channel to +distribute their own package. Installation instructions will then have both +the `pytorch` and their own channel in them. For example for fastai and +BoTorch: + +``` +conda install -c fastai -c pytorch fastai +``` + +``` +conda install botorch -c pytorch -c gpytorch +``` + +When a user needs multiple packages, that becomes unwieldy quickly with each +package adding its own channel. Note: alternatively, pure Python packages can +choose to distribute on PyPI only (see the _PyPI, pip and wheels_ section +further down) - Kornia is an example of a package that does this. + +Authors of _packages containing C++ or CUDA code_ which use the PyTorch C++ +API have an additional issue: they need to release new package versions in +sync with PyTorch itself, because there's no stable ABI that would allow +depending on multiple PyTorch versions. For example, the torchvision +`install_requires` dependency is determined like: + +```python +pytorch_dep = 'torch' +if os.getenv('PYTORCH_VERSION'): + pytorch_dep += "==" + os.getenv('PYTORCH_VERSION') + +requirements = [ + 'numpy', + pytorch_dep, +] +``` +and its build script ensure a one-to-one correspondence of `pytorch` and +`torchvision` versions of packages. + +The `pytorch` channel currently already contains other packages that depend +on PyTorch. Those fall into two categories: needed dependencies (e.g., +`magma-cuda`, `ffmpeg`) , and PyTorch-branded and Facebook-owned projects +like `torchvision`, `torchtext`, `torchaudio`, `captum`, `faiss`, `ignite`, etc. +See https://anaconda.org/pytorch/repo for a complete list. + +Those packages maintain their own build and packaging scripts (see +[this comment](https://github.com/pytorch/builder/issues/563#issuecomment-722667815)), +and the integration testing and uploading to the `pytorch` conda channel is done +via scripts in the [pytorch/builder](https://github.com/pytorch/builder) repo. + +There's more integration testing happening already: +- The `test_community_repos/` directory in the `builder` repo contains a + significantly larger set of packages that's tested in addition to the packages + that are distributed on the `pytorch` conda channel. +- The [pytorch-integration-testing](https://github.com/pytorch/pytorch-integration-testing) + repo contains tooling to test PyTorch release candidates. +- An overview of integration test results from the `builder` repo (last updated Oct'19, + so perhaps no longer maintained) can be found + [here](http://ossci-integration-test-results.s3-website-us-east-1.amazonaws.com/test-results.html). + + +## Usage and Impact + +### End users + +The intended outcome for end users is that they will be able to install many +of the most commonly packages easily with `conda` from a single channel, +e.g.: + +``` +conda install pytorch torchvision kornia fastai mmf -c pytorch +``` + +or, a little more complete: + +``` +# Use a new environment for a new project +conda create -n myenv +conda activate myenv +# Add channel to env, so all conda commands will now pick up packages +# in the pytorch channel: +conda config --env --add channels pytorch +conda install pytorch torchvision kornia fastai mmf +``` + +### Maintainers of packages depending on PyTorch + +The intended outcome for maintainers is that: + +1. They have clear documentation on how to add their package to the `pytorch` channel, + including the criteria their packages should meet, how to run integration tests, + and how to release new versions. +2. They can declare their dependencies correctly +3. They will still need their own channel or some staging channel to host packages + before they get `anaconda copy`'d to the `pytorch` channel. +4. They can provide a single install command to their users, `conda install mypkg -c pytorch`, + that will work reliably. + + +## Processes + +### Proposing a new package for inclusion + +Prerequisites for a package being considered for inclusion in the `pytorch` channel are: + +1. The package naturally belongs in the PyTorch ecosystem. I.e., PyTorch is a + key dependency, and the package is focused on an area like deep learning, + machine learning or scientific computing. +2. All runtime dependencies of the package are available in the `defaults` or + `pytorch` channel, or adding them to the `pytorch` is possible with a + reasonable amount of effort. +3. A working recipe for creating a conda package is available. + +A GitHub repository (working name `conda-distro`) will be used for managing +proposals for new packages as well as integration configuration and tooling. +To propose a new package, open an issue and fill out the instructions in the +GitHub issue template. When a maintainer approves the request, the proposer +can open a PR to that same repo to add the package to the integration +testing. + + +### Integration testing infrastructure + +The CI connected to the `conda-distro` repo has to do the following: + +1. Trigger on PRs that add or update an individual package, running the tests + for that package _and_ downstream dependencies of that package. +2. If tests for (1) are successful, sync the conda packages in question to + the `pytorch` channel with `anaconda copy`. +3. Provide a way to run the tests of all packages together. +4. Send notifications if a package releases requires an update (e.g. a + version bump) to a downstream package. + +The individual packages have to do the following: + +1. Ensure there are _upper bounds on dependency versions_, so new releases of + PyTorch or another dependency cannot break already released versions of + the individual package in question. Note that that does mean that a new + PyTorch releases requires version bumps on existing packages - more detail + in strategy will be needed here. +2. Tests for a package should be _runnable in a standardized way_, via + `conda-build --test`. This is easy to achieve via either a `test:` section + in the recipe (`meta.yaml`) or a `run_test.py` file. See [this section of + the conda-build docs](https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#test-section) + for details. An advantage of this method is that `conda-build` is already + aware of channels and dependencies, so it should work with very little + extra effort. + + +### What happens when a new PyTorch release is made? + +For minor or major versions of PyTorch, new releases of downstream packages +will also be necessary. A number of packages, such as `torchvision`, +`torchaudio` and `torchtext`, are anyway released in sync. Other packages in +the `pytorch` channel may need to be manually released via a PR to the +`conda-distro` repo). + +Version constraints should be set such that a bugfix release of PyTorch does +not require any new downstream package releases. + + +### Dealing with packages that aren't maintained + +Proposing a package for inclusion in the `pytorch` channel implies a +commitment to keep maintaining the package. There wil be a place to list one +or more maintainers for each package so they can be pinged if needed. In case +a package is not up-to-date or broken and it does not get fixed, after a +certain duration (length TBD) it may be removed from the channel. + + +## Alternatives + +### Conda-forge + +The main alternative to making the `pytorch` channel an integration channel +that distributes many packages that depend on PyTorch is to have a +(GPU-enabled) PyTorch package in conda-forge, and tell users and package +authors that that is the place to go. It will require working with +conda-forge in order to ensure that the `pytorch` package is of high quality, +either by copying over the binaries from the `pytorch` channel or by +migrating recipes and keeping them in sync. See +[this very long discussion](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7) +for details (and issues). + +Advantages of this alternative are: + +- Conda-forge has a lot of packages, so it will be easier to install PyTorch + in combination with other non-deep learning packages (e.g. the geo-science + stack). +- Conda-forge already has established tools and processes for adding and + updating them. Which means it's less likely for there to be issues with + dependencies (e.g. packages with many or unusual dependencies may not be + accepted into the `pytorch` channel, while `conda-forge` will be fine with + them). +- Users are likely already familiar with using the `conda-forge` channel. + +Disadvantages of this alternative are: + +- As of today, conda-forge doesn't have GPU hardware. Building is stil + possible using CUDA stubs, however testing cannot really be done inside CI, + only manually (which is a pain, especially when having to test multiple + hardware and OS platforms). + _Note that there are packages that follow this approach (mostly without + problems so far), for example `arrow-cpp` and `cupy`. To obtain a full list of packages, clone https://github.com/conda-forge/feedstocks and run + `grep 'compiler(' feedstocks/*/meta.yaml | grep cuda`._ +- `conda-forge` and `defaults` aren't guaranteed to be compatible, so + standardizing on `conda-forge` may cause problems for people who prefer + `defaults`. +- Exotic hardware support may be difficult. PyTorch has support for TPUs (via + XLA), AMD ROCm, Linux on ARM64, Vulkan, Metal, Android NNAPI - this list + will continue to grow. Most of this is experimental and hence not present + in official binaries (and/or in the C++/Java packages which aren't + distributed with conda), but this is likely to change and present issues + with compilers or dependencies not present in conda-forge. + For more details, see [this comment by Soumith](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7#issuecomment-538253388). +- Release coordination is more difficult. For a PyTorch release, packages for + `pytorch`, `torchvision`, `torchtext`, `torchaudio` will all be built + together and then released. There may be manual quality assurance steps + before uploading the packages. + Building a set of packages like that depend on each other and releasing + them in a coordinated fashion is hard to do on conda-forge, given that if + everything is in feedstocks, the new pytorch package must already be + available before the next build can start. It may be possible to do this + with channel labels (build sequentially, then move all packages to the + `main` label at once), but either way all the released artifacts will be + publicly visible before the official release. + +Other points: + +- If the PyTorch team does not package for conda-forge, someone else will do + that at some point. +- Conda-forge no longer uses a single compiler toolchain for all packages it + builds for a given platform - it is now possible to use a newer compiler, + which itself is built with an older glibc/binutils (that does need to be + common). See + [this example](https://github.com/conda-forge/omniscidb-feedstock/blob/master/recipe/conda_build_config.yaml) + for how to specify using GCC 8. So not having a recent enough compiler + available is unlikely to be a relevant concern. +- Mirroring packages in the `pytorch` channel to the `conda-forge` channel + would alleviate worries about the disadvantages here, however there's no + conda-forge tooling currently to verify ABI compatibility of the packages, + which is the main worry of the conda-forge team with this approach. + + +### DIY for every package + +Letting authors of every package depending on PyTorch find their own solution +is basically the status quo of today. The most likely outcome longer-term is +that PyTorch plus those packages depending on it will be packaged in +conda-forge independently. At that point there are two competing `pytorch` +packages, one in the `pytorch` and one in the `conda-forge` channel. And +users who need a prebuilt version of other packages not available in the +`pytorch` channel will likely migrate to `conda-forge`. + +The advantage is: no need to do any work to implement this proposal. The +disadvantage is: depending on PyTorch will remain difficult for downstream +packages. + + +## Related work and issues + +### Conda channels + +Mixing multiple conda channels is rarely a good idea. It isn't even +completely clear what a channel is for, opinions of conda and conda-forge +maintainers differ - see +https://github.com/conda-forge/conda-forge.github.io/issues/883. + + +### RAPIDS + +RAPIDS has a really complex setup for distributing conda packages. Its install instructions currently look like: +``` +conda create -n rapids-0.16 -c rapidsai -c nvidia -c conda-forge \ + -c defaults rapids=0.16 python=3.7 cudatoolkit=10.1 +``` + +Depending on a user's config (e.g. having `channel_priority: strict` in +`.condarc`), this may not work even in a clean environment. If one would add +the `pytorch` channel as well, for users that need both PyTorch and RAPIDS, +it's even less likely to work - the conda solver cannot handle that many +channels and will fail to find a solution. + + +### Cudatoolkit + +CUDA libraries are distributed for conda users via the `cudatoolkit` package. +That package is only available in the `nvidia`, `defaults` and `conda-forge` +channels. The license of the package prohibits redistribution, and an +exception is difficult to obtain. Therefore it should not be added to the +`pytorch` channel (also not necessary, obtaining it from `defaults` is fine). + + +### PyPI, pip and wheels + +The experience installing PyTorch with `pip` is suboptimal, mainly because +there's no way to control CUDA versions via `pip`, so the user gets whatever +the default CUDA version is (10.2 at the time of writing) when running `pip +install torch`. In case the user needs a different CUDA version or the +CPU-only package, the install instruction looks like: +``` +pip install torch==1.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html +``` +There's the [pytorch-pip-shim](https://github.com/pmeier/pytorch-pip-shim) +tool to handle auto-detecting CUDA versions and retrieving the right wheel. +It relies on monkeypatching pip though, so it may break when new versions of +pip are released. + +For package authors wanting to add a dependency on PyTorch, the above +usability issue is a serious problem. If they add a runtime dependency on +PyTorch (via `install_requires` in `setup.py` or via `pyproject.toml`), the +only thing they can add is `torch` and there's no good way of signalling to +the user that there's a CUDA version issue or how to deal with it. + +Finally note that `pip` and `conda` work together reasonably well, so for +package authors that want to release packages that _do not contain C++ or +CUDA code_, releasing on PyPI only and telling their users to install PyTorch +with `conda` and their package with `pip` will work best. As soon as C++/CUDA +code gets added, that's no longer reliable though. + + +## Effort estimate + +TODO + +### Initial setup + + +### Ongoing effort +