 # Changelog
 
+## Thrust 2.1.0
+
+### New Features
+
+- NVIDIA/thrust#1805: Add default constructors to `transform_output_iterator`
+  and `transform_input_output_iterator`. Thanks to Mark Harris (@harrism) for this contribution.
+- NVIDIA/thrust#1836: Enable construction of vectors from `std::initializer_list`.
+
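The two feature entries above are small API additions. Below is a minimal sketch of the usage they enable, assuming Thrust 2.1.0 or newer; the vector names and the `empty` iterator are illustrative only:

```cpp
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/transform_output_iterator.h>

int main()
{
  // NVIDIA/thrust#1836: vectors can now be constructed directly from a std::initializer_list.
  thrust::host_vector<int>   h = {1, 2, 3, 4};
  thrust::device_vector<int> d = {5, 6, 7, 8};

  // NVIDIA/thrust#1805: transform_output_iterator is now default constructible,
  // so an "empty" iterator of the same type can be declared before assignment.
  auto out = thrust::make_transform_output_iterator(d.begin(), thrust::negate<int>());
  decltype(out) empty;

  (void) h; (void) empty;
  return 0;
}
```
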
+### Bug Fixes
+
+- NVIDIA/thrust#1768: Fix type conversion warning in the `thrust::complex` utilities. Thanks to
+  Zishi Wu (@zishiwu123) for this contribution.
+- NVIDIA/thrust#1809: Fix some warnings about usage of `__host__` functions in `__device__` code.
+- NVIDIA/thrust#1825: Fix Thrust's CMake install rules. Thanks to Robert Maynard (@robertmaynard)
+  for this contribution.
+- NVIDIA/thrust#1827: Fix `thrust::reduce_by_key` when using non-default-initializable iterators.
+- NVIDIA/thrust#1832: Fix a bug in device-side CDP `thrust::reduce` when using a large number of
+  inputs.
+
+### Other Enhancements
+
+- NVIDIA/thrust#1815: Update Thrust's libcu++ git submodule to version 1.8.1.
+- NVIDIA/thrust#1841: Fix invalid code in an execution policy documentation example. Thanks to Raphaël
+  Frantz (@Eren121) for this contribution.
+- NVIDIA/thrust#1848: Improve error messages when attempting to launch a kernel on a device that is
+  not supported by the compiled PTX versions. Thanks to Zahra Khatami (@zkhatami) for this contribution.
+- NVIDIA/thrust#1855: Remove usage of deprecated CUDA error codes.
+
+## Thrust 2.0.1
+
+### Other Enhancements
+
+- Disable CDP parallelization of device-side invocations of Thrust algorithms on SM90+. The removal
+  of device-side synchronization support in recent architectures makes Thrust's fork-join model
+  unimplementable on device, so a serial implementation will be used instead. Host-side invocations
+  of Thrust algorithms are not affected.
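
A minimal sketch of the kind of device-side invocation this note covers; the kernel and variable names are illustrative. On SM90+ the call below now runs sequentially on the calling thread rather than launching child kernels, while the same algorithm invoked from host code is unaffected:

```cpp
#include <thrust/execution_policy.h>
#include <thrust/sort.h>

// Illustrative kernel: a Thrust algorithm invoked from device code.
__global__ void sort_on_device(int* data, int n)
{
  if (blockIdx.x == 0 && threadIdx.x == 0)
  {
    // Previously eligible for CDP parallelization; per the note above, on SM90+
    // this executes serially on the calling GPU thread.
    thrust::sort(thrust::device, data, data + n);
  }
}
```
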
+
 ## Thrust 2.0.0
 
 ### Summary
@@ -26,7 +63,7 @@ several minor bugfixes and cleanups.
 - `THRUST_INCLUDE_HOST_CODE`: Replace with `NV_IF_TARGET`.
 - `THRUST_INCLUDE_DEVICE_CODE`: Replace with `NV_IF_TARGET`.
 - `THRUST_DEVICE_CODE`: Replace with `NV_IF_TARGET`.
-- NVIDIA/thrust#1661: Thrust’s CUDA Runtime support macros have been updated to
+- NVIDIA/thrust#1661: Thrust's CUDA Runtime support macros have been updated to
   support `NV_IF_TARGET`. They are now defined consistently across all
   host/device compilation passes. This should not affect most usages of these
   macros, but may require changes for some edge cases.
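
For the macro migrations listed above, a minimal sketch of the `NV_IF_TARGET` replacement from libcu++; the function is illustrative only:

```cpp
#include <nv/target>  // NV_IF_TARGET / NV_IS_DEVICE, provided by libcu++

#include <cstdio>

__host__ __device__ void report_target()
{
  // Code that used to be guarded by THRUST_DEVICE_CODE / THRUST_INCLUDE_*_CODE
  // preprocessor checks dispatches on the compilation target instead.
  NV_IF_TARGET(NV_IS_DEVICE,
               (printf("compiled for the device target\n");),
               (printf("compiled for the host target\n");));
}
```
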
@@ -59,7 +96,7 @@ several minor bugfixes and cleanups.
 - CMake builds that use the Thrust packages via CPM, `add_subdirectory`,
   or `find_package` are not affected.
 - NVIDIA/thrust#1760: A compile-time error is now emitted when a `__device__`
--only lambda’s return type is queried from host code (requires libcu++ ≥
+-only lambda's return type is queried from host code (requires libcu++ ≥
   1.9.0).
   - Due to limitations in the CUDA programming model, the result of this query
     is unreliable, and will silently return an incorrect result. This leads to
@@ -83,7 +120,7 @@ several minor bugfixes and cleanups.
   to `thrust::make_zip_function`. Thanks to @mfbalin for this contribution.
 - NVIDIA/thrust#1722: Remove CUDA-specific error handler from code that may be
   executed on non-CUDA backends. Thanks to @dkolsen-pgi for this contribution.
-- NVIDIA/thrust#1756: Fix `copy_if` for output iterators that don’t support copy
+- NVIDIA/thrust#1756: Fix `copy_if` for output iterators that don't support copy
   assignment. Thanks to @mfbalin for this contribution.
 
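As context for the `thrust::make_zip_function` entry above, a minimal usage sketch; the `axpy` functor and vector names are illustrative only:

```cpp
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/transform.h>
#include <thrust/tuple.h>
#include <thrust/zip_function.h>

// An N-ary functor; make_zip_function adapts it so it can be called with the
// tuples produced by a zip_iterator, without manual thrust::get<> unpacking.
struct axpy
{
  __host__ __device__ float operator()(float a, float x, float y) const
  {
    return a * x + y;
  }
};

int main()
{
  thrust::device_vector<float> A(4, 2.0f), X(4, 3.0f), Y(4, 1.0f), R(4);

  auto first = thrust::make_zip_iterator(thrust::make_tuple(A.begin(), X.begin(), Y.begin()));
  auto last  = thrust::make_zip_iterator(thrust::make_tuple(A.end(), X.end(), Y.end()));

  thrust::transform(first, last, R.begin(), thrust::make_zip_function(axpy{}));
  return 0;
}
```
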
 ### Other Enhancements
@@ -157,7 +194,7 @@ numerous bugfixes and stability improvements.
 
 #### New `thrust::cuda::par_nosync` Execution Policy
 
-Most of Thrust’s parallel algorithms are fully synchronous and will block the
+Most of Thrust's parallel algorithms are fully synchronous and will block the
 calling CPU thread until all work is completed. This design avoids many pitfalls
 associated with asynchronous GPU programming, resulting in simpler and
 less error-prone usage for new CUDA developers. Unfortunately, this improvement
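
A minimal sketch of the policy this section introduces; the stream handling is illustrative only. `par_nosync` allows Thrust to skip its trailing synchronization when the algorithm permits it, so results must not be read before an explicit synchronization:

```cpp
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/transform.h>

#include <cuda_runtime.h>

int main()
{
  thrust::device_vector<int> in(1 << 20, 1), out(1 << 20);

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // May return before the transform has finished executing on `stream`.
  thrust::transform(thrust::cuda::par_nosync.on(stream),
                    in.begin(), in.end(), out.begin(), thrust::negate<int>());

  // The contents of `out` are only guaranteed after synchronizing.
  cudaStreamSynchronize(stream);
  cudaStreamDestroy(stream);
  return 0;
}
```
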
@@ -222,12 +259,12 @@ on the calling GPU thread instead of launching a device-wide kernel.
 
 ### Enhancements
 
-- NVIDIA/thrust#1511: Use CUB’s new `DeviceMergeSort` API and remove Thrust’s
+- NVIDIA/thrust#1511: Use CUB's new `DeviceMergeSort` API and remove Thrust's
   internal implementation.
 - NVIDIA/thrust#1566: Improved performance of `thrust::shuffle`. Thanks to
   @djns99 for this contribution.
 - NVIDIA/thrust#1584: Support user-defined `CMAKE_INSTALL_INCLUDEDIR` values in
-  Thrust’s CMake install rules. Thanks to @robertmaynard for this contribution.
+  Thrust's CMake install rules. Thanks to @robertmaynard for this contribution.
 
 ### Bug Fixes
 
@@ -239,7 +276,7 @@ on the calling GPU thread instead of launching a device-wide kernel.
 - NVIDIA/thrust#1597: Fix some collisions with the `small` macro defined
   in `windows.h`.
 - NVIDIA/thrust#1599, NVIDIA/thrust#1603: Fix some issues with version handling
-  in Thrust’s CMake packages.
+  in Thrust's CMake packages.
 - NVIDIA/thrust#1614: Clarify that scan algorithm results are non-deterministic
   for pseudo-associative operators (e.g. floating-point addition).
 
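As a brief illustration of the NVIDIA/thrust#1614 note above (sizes and values are arbitrary): floating-point addition is only pseudo-associative, so a parallel scan combines elements in a different order than a sequential one and the resulting prefix sums may differ slightly between runs or configurations:

```cpp
#include <thrust/device_vector.h>
#include <thrust/scan.h>

int main()
{
  thrust::device_vector<float> x(1 << 20, 0.1f);

  // The exact prefix sums depend on the order in which the parallel scan
  // combines elements, which is not fixed for non-associative floating-point
  // addition (see NVIDIA/thrust#1614 above).
  thrust::inclusive_scan(x.begin(), x.end(), x.begin());
  return 0;
}
```
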
@@ -752,7 +789,7 @@ Starting with the upcoming 1.10.0 release, C++03 support will be dropped
   passing a size.
   This was necessary to enable usage of Thrust caching MR allocators with
   synchronous Thrust algorithms.
-  This change has allowed NVC++’s C++17 Parallel Algorithms implementation to
+  This change has allowed NVC++'s C++17 Parallel Algorithms implementation to
   switch to use Thrust caching MR allocators for device temporary storage,
   which gives a 2x speedup on large multi-GPU systems such as V100 and A100
   DGX where `cudaMalloc` is very slow.