Commit a959a46 — RFC-0006: a PyTorch conda distribution (#10)

* RFC-0006: a PyTorch conda distribution
* Add status Rejected to RFC-0006: after discussion with @malfet we agreed that, given the progress conda-forge is making and GPU hardware for use in CI being added in the near future, it's better to help out with that effort.

1 parent 22cc388

# A PyTorch conda "distribution"

|         |              |
| ------- | ------------ |
| Authors | Ralf Gommers |
| Status  | Rejected     |
| Type    | Process      |
| Created | 2020-11-26   |

This proposal addresses the need for a PyTorch conda distribution, meaning a
collection of integration-tested packages that can be installed from a single
channel, to enable package authors to release packages that depend on PyTorch
and let users install them in a reliable way.

## Motivation and Scope

For developers of libraries that depend on PyTorch, it is currently (Nov'20)
quite difficult to express that dependency in a way that makes their package
easily installable with `conda` (or `pip`) by end users. With the PyTorch
ecosystem growing and the dependency graphs of sets of packages users use in
a single environment becoming more complex, streamlining the package
distribution and installation experience is important.

Examples of packages for which there's interest in making them more easily
available to end users:

- [fastai](https://docs.fast.ai/): Jeremy Howard expressed interest, and
  plans to copy `pytorch` and other dependencies of fastai over to the `fastai`
  channel in case this proposal doesn't work out.
- [fairseq](https://github.com/pytorch/fairseq): a fairseq developer inquired
  about being added to the `pytorch` channel
  [here](https://github.com/pytorch/builder/issues/563), and a conda-forge
  contributor wanted to package both PyTorch and fairseq in conda-forge, see
  [here](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7#issuecomment-688467743).
- [TorchANI](https://github.com/aiqm/torchani): see a TorchANI user's recent
  attempt to add a conda-forge package
  [here](https://github.com/conda-forge/torchani-feedstock/pull/1).

In scope for this proposal are:

- Processes related to adding new packages to the `pytorch` conda channel.
- CI infrastructure needed for integration testing and moving already built
  packages to the `pytorch` channel.

_Note: using the `pytorch` channel seems like the most obvious choice for a
single integration channel; using a new channel is also possible, and it
won't change the rest of this proposal materially._

Out of scope are:

- Changes related to how libraries are built or how conda packages are created.
- Updating PyTorch packaging in `defaults` or `conda-forge`.
- Improvements to installing with pip or to wheel builds.

### The current state of affairs

PyTorch is packaged in the `pytorch` channel; users must either add that
channel to the channels list globally or in an environment (using, e.g.,
`conda config --env --add channels pytorch`), or add `-c pytorch` to every
`conda` command they run. Note that the channels method is preferred over
`-c pytorch`, but installation instructions invariably use the latter, which
can lead to problems when a user forgets it at some point.

PyTorch is also packaged in `defaults`, but those packages are quite outdated
(1.4.0 for CUDA-enabled packages, 1.5.0 for CPU-only). The `conda-forge`
channel doesn't have PyTorch packages; there's a desire to add them, however
it's unclear if and how that will happen.

Authors of _pure Python packages_ tend to use their own conda channel to
distribute their own package. Installation instructions will then have both
the `pytorch` and their own channel in them. For example, for fastai and
BoTorch:

```
conda install -c fastai -c pytorch fastai
```

```
conda install botorch -c pytorch -c gpytorch
```

When a user needs multiple packages, that becomes unwieldy quickly with each
package adding its own channel. Note: alternatively, pure Python packages can
choose to distribute on PyPI only (see the _PyPI, pip and wheels_ section
further down) - Kornia is an example of a package that does this.

Authors of _packages containing C++ or CUDA code_ which use the PyTorch C++
API have an additional issue: they need to release new package versions in
sync with PyTorch itself, because there's no stable ABI that would allow
depending on multiple PyTorch versions. For example, the torchvision
`install_requires` dependency is determined like this:

```python
import os

pytorch_dep = 'torch'
if os.getenv('PYTORCH_VERSION'):
    pytorch_dep += "==" + os.getenv('PYTORCH_VERSION')

requirements = [
    'numpy',
    pytorch_dep,
]
```

and its build script ensures a one-to-one correspondence of `pytorch` and
`torchvision` package versions.

The `pytorch` channel currently already contains other packages that depend
on PyTorch. Those fall into two categories: needed dependencies (e.g.,
`magma-cuda`, `ffmpeg`), and PyTorch-branded and Facebook-owned projects
like `torchvision`, `torchtext`, `torchaudio`, `captum`, `faiss`, `ignite`, etc.
See https://anaconda.org/pytorch/repo for a complete list.

Those packages maintain their own build and packaging scripts (see
[this comment](https://github.com/pytorch/builder/issues/563#issuecomment-722667815)),
and the integration testing and uploading to the `pytorch` conda channel is done
via scripts in the [pytorch/builder](https://github.com/pytorch/builder) repo.

There's more integration testing happening already:

- The `test_community_repos/` directory in the `builder` repo contains a
  significantly larger set of packages that's tested in addition to the packages
  that are distributed on the `pytorch` conda channel.
- The [pytorch-integration-testing](https://github.com/pytorch/pytorch-integration-testing)
  repo contains tooling to test PyTorch release candidates.
- An overview of integration test results from the `builder` repo (last updated Oct'19,
  so perhaps no longer maintained) can be found
  [here](http://ossci-integration-test-results.s3-website-us-east-1.amazonaws.com/test-results.html).


## Usage and Impact

### End users

The intended outcome for end users is that they will be able to install many
of the most commonly used packages easily with `conda` from a single channel,
e.g.:

```
conda install pytorch torchvision kornia fastai mmf -c pytorch
```

or, a little more complete:

```
# Use a new environment for a new project
conda create -n myenv
conda activate myenv
# Add channel to env, so all conda commands will now pick up packages
# in the pytorch channel:
conda config --env --add channels pytorch
conda install pytorch torchvision kornia fastai mmf
```

### Maintainers of packages depending on PyTorch

The intended outcome for maintainers is that:

1. They have clear documentation on how to add their package to the `pytorch` channel,
   including the criteria their packages should meet, how to run integration tests,
   and how to release new versions.
2. They can declare their dependencies correctly.
3. They will still need their own channel or some staging channel to host packages
   before those get `anaconda copy`'d to the `pytorch` channel.
4. They can provide a single install command to their users, `conda install mypkg -c pytorch`,
   that will work reliably.


## Processes

### Proposing a new package for inclusion

Prerequisites for a package being considered for inclusion in the `pytorch` channel are:

1. The package naturally belongs in the PyTorch ecosystem. I.e., PyTorch is a
   key dependency, and the package is focused on an area like deep learning,
   machine learning or scientific computing.
2. All runtime dependencies of the package are available in the `defaults` or
   `pytorch` channel, or adding them to the `pytorch` channel is possible with
   a reasonable amount of effort.
3. A working recipe for creating a conda package is available.

A GitHub repository (working name `conda-distro`) will be used for managing
proposals for new packages as well as integration configuration and tooling.
To propose a new package, open an issue and fill out the instructions in the
GitHub issue template. When a maintainer approves the request, the proposer
can open a PR to that same repo to add the package to the integration
testing.

### Integration testing infrastructure

The CI connected to the `conda-distro` repo has to do the following:

1. Trigger on PRs that add or update an individual package, running the tests
   for that package _and_ downstream dependencies of that package.
2. If tests for (1) are successful, sync the conda packages in question to
   the `pytorch` channel with `anaconda copy`.
3. Provide a way to run the tests of all packages together.
4. Send notifications if a package release requires an update (e.g. a
   version bump) to a downstream package.

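Step 2 above could be a simple promotion step in CI. A minimal sketch, assuming a GitHub Actions-style workflow and hypothetical package and staging-channel names (this proposal does not specify the actual CI system):

```yaml
# Hypothetical CI promotion step; "mystaging", "mypkg" and the version
# are placeholders, not part of this proposal.
- name: Promote tested package to the pytorch channel
  if: success()
  run: |
    # anaconda-client's `copy` moves an already-built binary between
    # channels without rebuilding it.
    anaconda copy mystaging/mypkg/1.2.0 --to-owner pytorch
```
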

The individual packages have to do the following:

1. Ensure there are _upper bounds on dependency versions_, so new releases of
   PyTorch or another dependency cannot break already released versions of
   the individual package in question. Note that this does mean that a new
   PyTorch release requires version bumps on existing packages - more detail
   on strategy will be needed here.
2. Tests for a package should be _runnable in a standardized way_, via
   `conda-build --test`. This is easy to achieve via either a `test:` section
   in the recipe (`meta.yaml`) or a `run_test.py` file. See [this section of
   the conda-build docs](https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#test-section)
   for details. An advantage of this method is that `conda-build` is already
   aware of channels and dependencies, so it should work with very little
   extra effort.
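
A minimal recipe sketch tying these two requirements together; the package name and version bounds are hypothetical:

```yaml
# meta.yaml (sketch; "mypkg" and the pins are illustrative assumptions)
package:
  name: mypkg
  version: "1.2.0"

requirements:
  run:
    - python
    # Upper bound so a future PyTorch release cannot break this build:
    - pytorch >=1.7,<1.8

test:
  # Picked up by `conda-build --test`; no extra tooling needed.
  imports:
    - mypkg
  commands:
    - python -c "import mypkg"
```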


### What happens when a new PyTorch release is made?

For minor or major versions of PyTorch, new releases of downstream packages
will also be necessary. A number of packages, such as `torchvision`,
`torchaudio` and `torchtext`, are released in sync anyway. Other packages in
the `pytorch` channel may need to be manually released via a PR to the
`conda-distro` repo.

Version constraints should be set such that a bugfix release of PyTorch does
not require any new downstream package releases.

### Dealing with packages that aren't maintained

Proposing a package for inclusion in the `pytorch` channel implies a
commitment to keep maintaining the package. There will be a place to list one
or more maintainers for each package so they can be pinged if needed. In case
a package is not up-to-date or broken and it does not get fixed, after a
certain duration (length TBD) it may be removed from the channel.

## Alternatives

### Conda-forge

The main alternative to making the `pytorch` channel an integration channel
that distributes many packages that depend on PyTorch is to have a
(GPU-enabled) PyTorch package in conda-forge, and tell users and package
authors that that is the place to go. It will require working with
conda-forge in order to ensure that the `pytorch` package is of high quality,
either by copying over the binaries from the `pytorch` channel or by
migrating recipes and keeping them in sync. See
[this very long discussion](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7)
for details (and issues).

Advantages of this alternative are:

- Conda-forge has a lot of packages, so it will be easier to install PyTorch
  in combination with other non-deep learning packages (e.g. the geo-science
  stack).
- Conda-forge already has established tools and processes for adding packages
  and updating them, which means it's less likely for there to be issues with
  dependencies (e.g. packages with many or unusual dependencies may not be
  accepted into the `pytorch` channel, while `conda-forge` will be fine with
  them).
- Users are likely already familiar with using the `conda-forge` channel.

Disadvantages of this alternative are:

- As of today, conda-forge doesn't have GPU hardware. Building is still
  possible using CUDA stubs, however testing cannot really be done inside CI,
  only manually (which is a pain, especially when having to test multiple
  hardware and OS platforms).
  _Note that there are packages that follow this approach (mostly without
  problems so far), for example `arrow-cpp` and `cupy`. To obtain a full list
  of packages, clone https://github.com/conda-forge/feedstocks and run
  `grep 'compiler(' feedstocks/*/meta.yaml | grep cuda`._
- `conda-forge` and `defaults` aren't guaranteed to be compatible, so
  standardizing on `conda-forge` may cause problems for people who prefer
  `defaults`.
- Exotic hardware support may be difficult. PyTorch has support for TPUs (via
  XLA), AMD ROCm, Linux on ARM64, Vulkan, Metal, and Android NNAPI - this list
  will continue to grow. Most of this is experimental and hence not present
  in official binaries (and/or in the C++/Java packages, which aren't
  distributed with conda), but this is likely to change and present issues
  with compilers or dependencies not present in conda-forge.
  For more details, see [this comment by Soumith](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7#issuecomment-538253388).
- Release coordination is more difficult. For a PyTorch release, packages for
  `pytorch`, `torchvision`, `torchtext` and `torchaudio` will all be built
  together and then released. There may be manual quality assurance steps
  before uploading the packages.
  Building a set of packages that depend on each other and releasing them in
  a coordinated fashion is hard to do on conda-forge, given that if
  everything is in feedstocks, the new pytorch package must already be
  available before the next build can start. It may be possible to do this
  with channel labels (build sequentially, then move all packages to the
  `main` label at once), but either way all the released artifacts will be
  publicly visible before the official release.

Other points:

- If the PyTorch team does not package for conda-forge, someone else will do
  that at some point.
- Conda-forge no longer uses a single compiler toolchain for all packages it
  builds for a given platform - it is now possible to use a newer compiler,
  which itself is built with an older glibc/binutils (that does need to be
  common). See
  [this example](https://github.com/conda-forge/omniscidb-feedstock/blob/master/recipe/conda_build_config.yaml)
  for how to specify using GCC 8. So not having a recent enough compiler
  available is unlikely to be a relevant concern.
- Mirroring packages in the `pytorch` channel to the `conda-forge` channel
  would alleviate worries about the disadvantages here; however, there is
  currently no conda-forge tooling to verify ABI compatibility of the
  packages, which is the main worry of the conda-forge team with this
  approach.

### DIY for every package

Letting authors of every package depending on PyTorch find their own solution
is basically the status quo of today. The most likely outcome longer-term is
that PyTorch plus those packages depending on it will be packaged in
conda-forge independently. At that point there are two competing `pytorch`
packages, one in the `pytorch` channel and one in the `conda-forge` channel.
And users who need a prebuilt version of other packages not available in the
`pytorch` channel will likely migrate to `conda-forge`.

The advantage is: no need to do any work to implement this proposal. The
disadvantage is: depending on PyTorch will remain difficult for downstream
packages.

## Related work and issues

### Conda channels

Mixing multiple conda channels is rarely a good idea. It isn't even
completely clear what a channel is for; opinions of conda and conda-forge
maintainers differ - see
https://github.com/conda-forge/conda-forge.github.io/issues/883.

### RAPIDS

RAPIDS has a really complex setup for distributing conda packages. Its
install instructions currently look like:

```
conda create -n rapids-0.16 -c rapidsai -c nvidia -c conda-forge \
    -c defaults rapids=0.16 python=3.7 cudatoolkit=10.1
```

Depending on a user's config (e.g. having `channel_priority: strict` in
`.condarc`), this may not work even in a clean environment. If users that
need both PyTorch and RAPIDS add the `pytorch` channel as well, it's even
less likely to work - the conda solver cannot handle that many channels and
will fail to find a solution.
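
A minimal sketch of the kind of user config involved (the channel list is illustrative, not a recommendation):

```yaml
# ~/.condarc (illustrative): with strict priority, the solver only considers
# a package from the highest-priority channel that provides it, so mixing
# this many channels can easily become unsolvable.
channel_priority: strict
channels:
  - rapidsai
  - nvidia
  - conda-forge
  - defaults
```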


### Cudatoolkit

CUDA libraries are distributed for conda users via the `cudatoolkit` package.
That package is only available in the `nvidia`, `defaults` and `conda-forge`
channels. The license of the package prohibits redistribution, and an
exception is difficult to obtain. Therefore it should not be added to the
`pytorch` channel (nor is that necessary; obtaining it from `defaults` is
fine).


### PyPI, pip and wheels

The experience of installing PyTorch with `pip` is suboptimal, mainly because
there's no way to control CUDA versions via `pip`, so the user gets whatever
the default CUDA version is (10.2 at the time of writing) when running `pip
install torch`. In case the user needs a different CUDA version or the
CPU-only package, the install instruction looks like:

```
pip install torch==1.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
```

There's the [pytorch-pip-shim](https://github.com/pmeier/pytorch-pip-shim)
tool to handle auto-detecting CUDA versions and retrieving the right wheel.
It relies on monkeypatching pip though, so it may break when new versions of
pip are released.

For package authors wanting to add a dependency on PyTorch, the above
usability issue is a serious problem. If they add a runtime dependency on
PyTorch (via `install_requires` in `setup.py` or via `pyproject.toml`), the
only thing they can add is `torch`, and there's no good way of signalling to
the user that there's a CUDA version issue or how to deal with it.

Finally, note that `pip` and `conda` work together reasonably well, so for
package authors that want to release packages that _do not contain C++ or
CUDA code_, releasing on PyPI only and telling their users to install PyTorch
with `conda` and their package with `pip` will work best. As soon as C++/CUDA
code gets added, though, that's no longer reliable.

## Effort estimate

TODO

### Initial setup

### Ongoing effort