0.2.4
What's Changed
- cleaning `skipif` for past Torch dev versions by @Borda in #2125
- fix missing images when released on PyPI by @Borda in #2130
- Add custom decompositions for cross entropy loss for the nvfuser executor by @protonu in #2043
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2139
- Remove qualified access to methods from autodiff by @riccardofelluga in #2147
- add non-None check for `torch.utils.collect_env.get_pip_packages` outputs by @crcrpar in #2124
- Print repro command when `test_core_vs_torch_consistency` fails with sample index specified by @crcrpar in #2131
- Set input_quantizer.internal to True by @beverlylytle in #2146
- Bump transformers from 4.50.3 to 4.52.4 by @dependabot in #2160
- Update ipython[all] requirement from ~=8.36.0 to ~=8.37.0 by @dependabot in #2159
- Update coverage requirement from ~=7.6.8 to ~=7.8.2 by @dependabot in #2158
- TE: update test to be more stable by @kshitij12345 in #2156
- Bump pytest-timeout from 2.3.1 to 2.4.0 by @dependabot in #2161
- Update dependabot - reviewers by @Borda in #2162
- Fix autodiff joint trace dataflow and in-place ops in higher order functions by @riccardofelluga in #2143
- Reduces the test time by @kiya00 in #2077
- add a use_hf option to benchmarking by @t-vi in #2154
- Bump pytest-xdist from 3.6.1 to 3.7.0 by @dependabot in #2164
- Update hypothesis requirement from ~=6.131.9 to ~=6.133.0 by @dependabot in #2165
- Update snowballstemmer requirement from <3 to <4 by @dependabot in #2168
- nvFuser Executor: Ensure cross-entropy loss fwd is not recomputed when computing bwd by @protonu in #2180
- Add parity check of shape/dtype/device of runtime and trace by @crcrpar in #2069
- Add docstrings for recipes by @KaelanDt in #2185
- sdpa_ex: relax test tolerances by @kshitij12345 in #2178
- removing nv_enable_embedding by @jjsjann123 in #2057
- Improve error reporting in benchmark job and add cleanup logic by @Borda in #2176
- Add mode flag to TorchCompileExecutor by @t-vi in #2188
- Use `to_dtype` and `to_torch_dtype` not `_torch_to_thunder_dtype_map` and `_thunder_to_torch_dtype_map` by @crcrpar in #2181
- use hf recipe in quickstart by @t-vi in #2191
- Remove kwarg construction from FusionDefinitionWrapper.call by @IvanYashchuk in #1871
- Add tests for HFTransformers recipe with static cache by @KaelanDt in #2179
- bump: PyTorch to be latest `2.7.1` by @Borda in #2193
- `prims.where` ignores shape/device of `pred` if it's a CPU scalar tensor by @crcrpar in #2135
- add decomposition for repeat interleave by @t-vi in #2194
- fix traceback in with / try: finally: for Python 3.10 by @t-vi in #2195
- Use joint trace in transform_for_execution by @beverlylytle in #2102
- Handle proxy objects in the cuDNN SDPA checker by @kiya00 in #2073
- default to hf recipe in thunder.compile for hf models by @KaelanDt in #2199
- add autocast lookaside to hf recipe for tracing on meta device by @t-vi in #2200
- make test_networks.py not rely on HF downloads by @KaelanDt in #2202
- Add plugins documentation by @KaelanDt in #2207
- implement partial, avoid tuple addition, test partialmethod by @t-vi in #2209
- Add `bitwise_left_shift` and `bitwise_right_shift` by @crcrpar in #2210
- [thunderfx] Avoid split at `Tensor.__eq__` by registering it in `thunder.torch` by @crcrpar in #2211
- fixed installing NCCL for CUDA by @Borda in #2208
- Enable `ruff-check` in pre-commit by @crcrpar in #2192
- unxfail passing test by @t-vi in #2220
- Add #2192 to `.git-blame-ignore-revs` by @crcrpar in #2219
- fix cache validity issue, tighten assert by @t-vi in #2223
- Avoid negative number rhs values to bitwise shift tests by @crcrpar in #2227
- Add missing opinfo for `bitwise_right_shift` to `elementwise_binary_ops` by @crcrpar in #2214
- Enable ruff format in pre-commit by @crcrpar in #2142
- Fixes baddbmm() got an unexpected keyword argument 'batch1' by @kiya00 in #2228
- Only register cudnn executor if it is available by @KaelanDt in #2174
- Representing DTensor in thunder traces by @kshitij12345 in #1907
- install cudnn in quickstarts by @KaelanDt in #2235
- add experimental by @t-vi in #2236
- DTensor: don't error if torch.distributed is unavailable by @kshitij12345 in #2243
- [pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2245
- bump: OS versions for CI by @Borda in #2249
- Plumbing the topk to the nvFuser executor by @protonu in #2237
- bump: bitsandbytes to its next compatible release by @Borda in #2248
- ci: test with latest dependencies by @Borda in #2122
- Adds empty_like, rand_like by @kiya00 in #2225
- Support converting SymTypes Node to input proxy by @kiya00 in #2171
- fix test_reports_benchmark timeout by @kiya00 in #2229
- Fix torch.gather function signature to accept `input` passed as keyword argument by @kiya00 in #2250
- Fix `bitsandbytes` dependency conditions to use `platform_machine` instead of `sys_platform` by @Borda in #2257
- add float exception to assertion in jit_ext by @KaelanDt in #2256
- register softmax fudge function for stacklevel by @t-vi in #2259
- include message in NotImplementedError in proxy methods by @t-vi in #2260
- add support for full with tensor input by @t-vi in #2262
- Remove W291, W293, E702, and F722 from `ignore` by @crcrpar in #2267
- Add #2142 of ruff format integration to `.git-blame-ignore-revs` by @crcrpar in #2266
- split getitem into basic and "purely" advanced indexing by @t-vi in #2258
- TE: fix related to delayed forward-backward split by @kshitij12345 in #2222
- bump version to 0.2.4 for release by @t-vi in #2273
Full Changelog: 0.2.3...0.2.4