# A PyTorch conda "distribution"

|         |              |
| ------- | ------------ |
| Authors | Ralf Gommers |
| Status  | Rejected     |
| Type    | Process      |
| Created | 2020-11-26   |

This proposal addresses the need for a PyTorch conda distribution, meaning a
collection of integration-tested packages that can be installed from a single
channel, so that package authors can release packages that depend on PyTorch
and users can install them in a reliable way.


## Motivation and Scope

For developers of libraries that depend on PyTorch, it is currently (Nov'20)
quite difficult to express that dependency in a way that makes their package
easily installable with `conda` (or `pip`) by end users. With the PyTorch
ecosystem growing and the dependency graphs of the package sets users combine
in a single environment becoming more complex, streamlining the package
distribution and installation experience is important.

Examples of packages that people are interested in making more easily
available to end users:

- [fastai](https://docs.fast.ai/): Jeremy Howard expressed interest, and
  plans to copy `pytorch` and other dependencies of fastai over to the `fastai`
  channel in case this proposal doesn't work out.
- [fairseq](https://github.com/pytorch/fairseq): a fairseq developer inquired
  about being added to the `pytorch` channel
  [here](https://github.com/pytorch/builder/issues/563), and a conda-forge
  contributor wanted to package both PyTorch and fairseq in conda-forge, see
  [here](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7#issuecomment-688467743).
- [TorchANI](https://github.com/aiqm/torchani): see a TorchANI user's recent
  attempt to add a conda-forge package
  [here](https://github.com/conda-forge/torchani-feedstock/pull/1).

In scope for this proposal are:

- Processes related to adding new packages to the `pytorch` conda channel.
- CI infrastructure needed for integration testing and for moving already built
  packages to the `pytorch` channel.

_Note: using the `pytorch` channel seems like the most obvious choice for a
single integration channel; using a new channel is also possible and would not
change the rest of this proposal materially._

Out of scope are:

- Changes to how libraries are built or how conda packages are created.
- Updating PyTorch packaging in `defaults` or `conda-forge`.
- Improvements to installing with pip or to wheel builds.


### The current state of affairs

PyTorch is packaged in the `pytorch` channel; users must either add that
channel to the channels list globally or in an environment (using, e.g.,
`conda config --env --add channels pytorch`), or add `-c pytorch` to every
`conda` command they run. The channels method is preferred over `-c pytorch`,
but installation instructions invariably use the latter, which can lead to
problems when the user forgets it at some point.

PyTorch is also packaged in `defaults`, but those packages are quite outdated
(1.4.0 for CUDA-enabled packages, 1.5.0 for CPU-only). The `conda-forge`
channel doesn't have PyTorch packages; there's a desire to add them, but it's
unclear if and how that will happen.

Authors of _pure Python packages_ tend to use their own conda channel to
distribute their package. Installation instructions then include both the
`pytorch` channel and their own channel, for example for fastai and BoTorch:

```
conda install -c fastai -c pytorch fastai
```

```
conda install botorch -c pytorch -c gpytorch
```

When a user needs multiple packages, that becomes unwieldy quickly, with each
package adding its own channel. Note: alternatively, pure Python packages can
choose to distribute on PyPI only (see the _PyPI, pip and wheels_ section
further down); Kornia is an example of a package that does this.
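
For instance, a user who wants both of the packages above in a single
environment already ends up with a command along the lines of (a hypothetical
combination, just to illustrate the channel stacking):

```
conda install -c fastai -c gpytorch -c pytorch fastai botorch
```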

Authors of _packages containing C++ or CUDA code_ which use the PyTorch C++
API have an additional issue: they need to release new package versions in
sync with PyTorch itself, because there's no stable ABI that would allow
depending on multiple PyTorch versions. For example, the torchvision
`install_requires` dependency is determined like this:
```python
import os

# If PYTORCH_VERSION is set by the build scripts, pin torch to exactly that
# version; otherwise accept any torch version.
pytorch_dep = 'torch'
if os.getenv('PYTORCH_VERSION'):
    pytorch_dep += "==" + os.getenv('PYTORCH_VERSION')

requirements = [
    'numpy',
    pytorch_dep,
]
```
and its build scripts ensure a one-to-one correspondence between `pytorch`
and `torchvision` package versions.

The `pytorch` channel already contains other packages that depend on PyTorch.
Those fall into two categories: needed dependencies (e.g., `magma-cuda`,
`ffmpeg`), and PyTorch-branded and Facebook-owned projects like `torchvision`,
`torchtext`, `torchaudio`, `captum`, `faiss`, `ignite`, etc.
See https://anaconda.org/pytorch/repo for a complete list.

Those packages maintain their own build and packaging scripts (see
[this comment](https://github.com/pytorch/builder/issues/563#issuecomment-722667815)),
and the integration testing and uploading to the `pytorch` conda channel is done
via scripts in the [pytorch/builder](https://github.com/pytorch/builder) repo.

There's more integration testing happening already:

- The `test_community_repos/` directory in the `builder` repo contains a
  significantly larger set of packages that is tested in addition to the
  packages distributed on the `pytorch` conda channel.
- The [pytorch-integration-testing](https://github.com/pytorch/pytorch-integration-testing)
  repo contains tooling to test PyTorch release candidates.
- An overview of integration test results from the `builder` repo (last updated Oct'19,
  so perhaps no longer maintained) can be found
  [here](http://ossci-integration-test-results.s3-website-us-east-1.amazonaws.com/test-results.html).


## Usage and Impact

### End users

The intended outcome for end users is that they will be able to install many
of the most commonly used packages easily with `conda` from a single channel,
e.g.:

```
conda install pytorch torchvision kornia fastai mmf -c pytorch
```

or, a little more complete:

```
# Use a new environment for a new project
conda create -n myenv
conda activate myenv
# Add channel to env, so all conda commands will now pick up packages
# in the pytorch channel:
conda config --env --add channels pytorch
conda install pytorch torchvision kornia fastai mmf
```

### Maintainers of packages depending on PyTorch

The intended outcome for maintainers is that:

1. They have clear documentation on how to add their package to the `pytorch` channel,
   including the criteria their packages should meet, how to run integration tests,
   and how to release new versions.
2. They can declare their dependencies correctly.
3. They will still need their own channel or some staging channel to host packages
   before they get `anaconda copy`'d to the `pytorch` channel.
4. They can provide a single install command to their users, `conda install mypkg -c pytorch`,
   that will work reliably.


## Processes

### Proposing a new package for inclusion

Prerequisites for a package to be considered for inclusion in the `pytorch` channel are:

1. The package naturally belongs in the PyTorch ecosystem, i.e., PyTorch is a
   key dependency and the package is focused on an area like deep learning,
   machine learning or scientific computing.
2. All runtime dependencies of the package are available in the `defaults` or
   `pytorch` channel, or adding them to the `pytorch` channel is possible with
   a reasonable amount of effort.
3. A working recipe for creating a conda package is available.

A GitHub repository (working name `conda-distro`) will be used for managing
proposals for new packages as well as integration configuration and tooling.
To propose a new package, open an issue and fill out the GitHub issue
template. When a maintainer approves the request, the proposer can open a PR
to that same repo to add the package to the integration testing.


### Integration testing infrastructure

The CI connected to the `conda-distro` repo has to do the following:

1. Trigger on PRs that add or update an individual package, running the tests
   for that package _and_ for the packages that depend on it.
2. If the tests in (1) pass, sync the conda packages in question to the
   `pytorch` channel with `anaconda copy` (see the sketch after this list).
3. Provide a way to run the tests of all packages together.
4. Send notifications if a package release requires an update (e.g. a
   version bump) to a downstream package.
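
As an illustration of steps 1 and 2, the per-package CI job might boil down to
something like the following sketch (the package name, version and staging
channel are hypothetical placeholders, not an agreed-upon layout):

```
# Run the package's tests as defined in its recipe, resolving dependencies
# from the pytorch and defaults channels
conda build --test mypkg-1.2.3-py38_0.tar.bz2 -c pytorch -c defaults

# On success, copy the already-built package from the maintainer's staging
# channel to the pytorch channel
anaconda copy mypkg-staging/mypkg/1.2.3 --to-owner pytorch
```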

The individual packages have to do the following:

1. Ensure there are _upper bounds on dependency versions_, so new releases of
   PyTorch or another dependency cannot break already released versions of
   the package in question. Note that this does mean that a new PyTorch
   release requires version bumps in existing packages; more detail on
   strategy will be needed here.
2. Tests for a package should be _runnable in a standardized way_, via
   `conda-build --test`. This is easy to achieve via either a `test:` section
   in the recipe (`meta.yaml`) or a `run_test.py` file. See [this section of
   the conda-build docs](https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#test-section)
   for details. An advantage of this method is that `conda-build` is already
   aware of channels and dependencies, so it should work with very little
   extra effort. A hypothetical recipe fragment illustrating both points is
   sketched below.
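
As a rough illustration only (the package name, versions and test hook are
placeholders, not a prescribed layout), a downstream package's `meta.yaml`
could combine the version upper bound and the `test:` section like this:

```yaml
package:
  name: mypkg          # placeholder name
  version: "1.2.3"

requirements:
  run:
    - python
    # Upper bound: PyTorch 1.7.x bugfix releases stay in range, while a new
    # minor release (1.8) requires a new mypkg release with a bumped pin.
    - pytorch >=1.7.0,<1.8

test:
  requires:
    - pytest
  imports:
    - mypkg
  commands:
    - pytest --pyargs mypkg
```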


### What happens when a new PyTorch release is made?

For minor or major releases of PyTorch, new releases of downstream packages
will also be necessary. A number of packages, such as `torchvision`,
`torchaudio` and `torchtext`, are released in sync with PyTorch anyway. Other
packages in the `pytorch` channel may need to be released manually via a PR
to the `conda-distro` repo.

Version constraints should be set such that a bugfix release of PyTorch does
not require any new downstream package releases.


### Dealing with packages that aren't maintained

Proposing a package for inclusion in the `pytorch` channel implies a
commitment to keep maintaining the package. There will be a place to list one
or more maintainers for each package, so they can be pinged if needed. If a
package is outdated or broken and does not get fixed within a certain period
(length TBD), it may be removed from the channel.


## Alternatives

### Conda-forge

The main alternative to making the `pytorch` channel an integration channel
that distributes many packages depending on PyTorch is to have a
(GPU-enabled) PyTorch package in conda-forge, and to tell users and package
authors that that is the place to go. This would require working with
conda-forge to ensure that the `pytorch` package is of high quality, either
by copying over the binaries from the `pytorch` channel or by migrating
recipes and keeping them in sync. See
[this very long discussion](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7)
for details (and issues).

Advantages of this alternative are:

- Conda-forge has a lot of packages, so it will be easier to install PyTorch
  in combination with other, non-deep-learning packages (e.g. the geoscience
  stack).
- Conda-forge has established tools and processes for adding and updating
  packages, which makes dependency issues less likely (e.g. packages with
  many or unusual dependencies may not be accepted into the `pytorch`
  channel, while `conda-forge` will be fine with them).
- Users are likely already familiar with using the `conda-forge` channel.

Disadvantages of this alternative are:

- As of today, conda-forge doesn't have GPU hardware. Building is still
  possible using CUDA stubs; however, testing cannot really be done in CI,
  only manually (which is a pain, especially when having to test multiple
  hardware and OS platforms).
  _Note that there are packages that follow this approach (mostly without
  problems so far), for example `arrow-cpp` and `cupy`. To obtain a full list
  of such packages, clone https://github.com/conda-forge/feedstocks and run
  `grep 'compiler(' feedstocks/*/meta.yaml | grep cuda`._
- `conda-forge` and `defaults` aren't guaranteed to be compatible, so
  standardizing on `conda-forge` may cause problems for people who prefer
  `defaults`.
- Exotic hardware support may be difficult. PyTorch has support for TPUs (via
  XLA), AMD ROCm, Linux on ARM64, Vulkan, Metal, and Android NNAPI - this
  list will continue to grow. Most of this is experimental and hence not
  present in official binaries (and/or only in the C++/Java packages, which
  aren't distributed with conda), but this is likely to change and may
  present issues with compilers or dependencies not present in conda-forge.
  For more details, see [this comment by Soumith](https://github.com/conda-forge/pytorch-cpu-feedstock/issues/7#issuecomment-538253388).
- Release coordination is more difficult. For a PyTorch release, packages for
  `pytorch`, `torchvision`, `torchtext` and `torchaudio` will all be built
  together and then released. There may be manual quality assurance steps
  before uploading the packages.
  Building a set of packages that depend on each other and releasing them in
  a coordinated fashion is hard to do on conda-forge: if everything is in
  feedstocks, the new `pytorch` package must already be available before the
  next build can start. It may be possible to do this with channel labels
  (build sequentially, then move all packages to the `main` label at once),
  but either way the released artifacts will be publicly visible before the
  official release.

Other points:

- If the PyTorch team does not package PyTorch for conda-forge, someone else
  will do so at some point.
- Conda-forge no longer uses a single compiler toolchain for all packages it
  builds for a given platform - it is now possible to use a newer compiler,
  which itself is built with an older glibc/binutils (that does need to be
  common). See
  [this example](https://github.com/conda-forge/omniscidb-feedstock/blob/master/recipe/conda_build_config.yaml)
  for how to specify using GCC 8 (a sketch of such an override follows after
  this list). So not having a recent enough compiler available is unlikely to
  be a relevant concern.
- Mirroring packages from the `pytorch` channel to the `conda-forge` channel
  would alleviate worries about the disadvantages listed here; however, there
  is currently no conda-forge tooling to verify ABI compatibility of the
  packages, which is the conda-forge team's main worry with this approach.
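
As a rough sketch of the kind of override referred to above, assuming the
standard conda-forge `c_compiler_version`/`cxx_compiler_version` keys
(illustrative only, not a vetted configuration for any particular feedstock):

```yaml
# recipe/conda_build_config.yaml
c_compiler_version:    # [linux]
  - 8                  # [linux]
cxx_compiler_version:  # [linux]
  - 8                  # [linux]
```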


### DIY for every package

Letting the authors of every package that depends on PyTorch find their own
solution is basically today's status quo. The most likely outcome longer-term
is that PyTorch plus the packages depending on it will be packaged in
conda-forge independently. At that point there are two competing `pytorch`
packages, one in the `pytorch` and one in the `conda-forge` channel, and
users who need a prebuilt version of other packages not available in the
`pytorch` channel will likely migrate to `conda-forge`.

The advantage is: no need to do any work to implement this proposal. The
disadvantage is: depending on PyTorch will remain difficult for downstream
packages.


## Related work and issues

### Conda channels

Mixing multiple conda channels is rarely a good idea. It isn't even
completely clear what a channel is for; opinions of conda and conda-forge
maintainers differ - see
https://github.com/conda-forge/conda-forge.github.io/issues/883.


### RAPIDS

RAPIDS has a rather complex setup for distributing conda packages. Its
install instructions currently look like:
```
conda create -n rapids-0.16 -c rapidsai -c nvidia -c conda-forge \
    -c defaults rapids=0.16 python=3.7 cudatoolkit=10.1
```

Depending on a user's config (e.g. having `channel_priority: strict` in
`.condarc`), this may not work even in a clean environment. If the `pytorch`
channel were added as well, for users who need both PyTorch and RAPIDS, it is
even less likely to work: the conda solver cannot handle that many channels
and will fail to find a solution.
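
To illustrate why this is fragile (an illustrative `.condarc` only, with the
channel list mirroring the RAPIDS instructions above): with strict channel
priority, a package name found in a higher-priority channel masks the same
name in all lower-priority channels, so stacking many channels quickly makes
environments unsolvable.

```yaml
# ~/.condarc
channel_priority: strict
channels:
  - rapidsai
  - nvidia
  - conda-forge
  - defaults
```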


### Cudatoolkit

CUDA libraries are distributed for conda users via the `cudatoolkit` package.
That package is only available in the `nvidia`, `defaults` and `conda-forge`
channels. The license of the package prohibits redistribution, and an
exception is difficult to obtain. Therefore it should not be added to the
`pytorch` channel (nor is that necessary; obtaining it from `defaults` is
fine).

### PyPI, pip and wheels

The experience of installing PyTorch with `pip` is suboptimal, mainly because
there's no way to select a CUDA version via `pip`, so the user gets whatever
the default CUDA version is (10.2 at the time of writing) when running `pip
install torch`. In case the user needs a different CUDA version or the
CPU-only package, the install command looks like:
```
pip install torch==1.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
```
There's the [pytorch-pip-shim](https://github.com/pmeier/pytorch-pip-shim)
tool to handle auto-detecting CUDA versions and retrieving the right wheel.
It relies on monkeypatching pip though, so it may break when new versions of
pip are released.

For package authors wanting to add a dependency on PyTorch, the above
usability issue is a serious problem. If they add a runtime dependency on
PyTorch (via `install_requires` in `setup.py` or via `pyproject.toml`), the
only thing they can add is `torch`, and there's no good way of signalling to
the user that there's a CUDA version issue or how to deal with it.

Finally, note that `pip` and `conda` work together reasonably well, so for
package authors who want to release packages that _do not contain C++ or
CUDA code_, releasing on PyPI only and telling their users to install PyTorch
with `conda` and their package with `pip` will work best. As soon as C++/CUDA
code gets added, though, that is no longer reliable.


## Effort estimate

TODO

### Initial setup


### Ongoing effort