Conversation
c065b3a to
94d78b3
Compare
06366b2 to
8983abe
Compare
8983abe to
d588658
Compare
| append_option_if_available("-Wgnu" cxx_compile_options) | ||
| # Calling a variadic macro with zero args is a GNU extension until C++20, | ||
| # but the THRUST_PP_ARITY macro is used with zero args. Need to see if this | ||
| # is a real problem worth fixing. |
There was a problem hiding this comment.
It shouldn't be a problem as every compiler under the sun supports the extension, including NVRTC I believe.
|
You should re-run this with CUB in |
|
The whole |
cub/grid/grid_even_share.cuh
Outdated
| this->block_end = num_items_; // Initialize past-the-end | ||
| this->num_items = num_items_; | ||
| this->total_tiles = (num_items_ + tile_items - 1) / tile_items; | ||
| this->grid_size = CUB_MIN(static_cast<int>(total_tiles), max_grid_size); |
There was a problem hiding this comment.
This could overflow, because total_tiles is OffsetT which could be 64bit.
There was a problem hiding this comment.
I think I'd prefer us to either make sure all the static casts don't overflow, or move them to a separate PR.
There was a problem hiding this comment.
Fixed this and similar issues in the thrust/cub kernel dispatch code with a new cub::DivideAndRoundUp method that avoids these (n + d - 1) / d overflows.
cub/grid/grid_even_share.cuh
Outdated
| OffsetT avg_tiles_per_block = total_tiles / grid_size; | ||
| this->big_shares = total_tiles - (avg_tiles_per_block * grid_size); // leftover grains go to big blocks | ||
| // leftover grains go to big blocks | ||
| this->big_shares = static_cast<int>(total_tiles - (avg_tiles_per_block * grid_size)); |
There was a problem hiding this comment.
Is there any reason big_share can't be an OffsetT?
There was a problem hiding this comment.
I took a closer look and if anything, total_tiles should be changed to a 32-bit int since it's used as a block dimension and those can't exceed INT32_MAX. The only usages of the variable assume it is a 32-bit int.
I'll change this so that total_tiles is appropriately sized, which will eliminate the need for this (and a few other) static_casts.
| #pragma GCC diagnostic push | ||
| #pragma GCC diagnostic ignored "-Wdeprecated-declarations" | ||
| #endif | ||
|
|
There was a problem hiding this comment.
I would deprecate or remove this entirely.
| ir |= ia >> (24 - 11); | ||
| ia = ia << (32 - (24 - 11)); | ||
| ir = ir + ((14 + shift) << 10); | ||
| ir = static_cast<uint16_t>(ir + ((14 + shift) << 10)); |
There was a problem hiding this comment.
This is probably okay because it's just in the test code, but I'm pretty sure this is actually a bug due to potential overflow.
| TestDispatch<T, OffsetT, LengthT>(num_items); | ||
| unsigned int num; | ||
| RandomBits(num); | ||
| num = (unsigned int) ((double(num) * double(10000000)) / double(max_int)); |
There was a problem hiding this comment.
Since we're changing this already, maybe static_cast here instead?
test/test_device_scan.cu
Outdated
| TestOp<InputT>(num_items, identity, initial_value); | ||
| unsigned int num; | ||
| RandomBits(num); | ||
| num = (unsigned int) ((double(num) * double(10000000)) / double(max_int)); |
There was a problem hiding this comment.
Maybe static_cast here as a drive-by fix?
brycelelbach
left a comment
There was a problem hiding this comment.
This looks fine, I'd prefer to split the static_cast into a separate PR, or make sure they don't overflow, maybe add a comment to each explaining why they're fine.
Updated #1175 with this recommendation. |
After fixing #221, I reviewed the remaining usages and all are safe. Added comments to any non-obviously-safe ones. |
Anonymous structs are C features. In C++, they're non-portable compiler extensions. These only seemed to pop up in CUB-style `TempStorage` objects, I just picked some reasonable sounding names for them.
We need to deprecate this class since the underlying CUDA APIs are deprecated. This suppression is a temporary workaround. Tracked by NVIDIA#191.
Changing to a `static_cast` fixes this warning.
The `cuda_std_17` compile feature is broken for MSVC when CMake < 3.18.3.
d588658 to
91e9ac7
Compare
91e9ac7 to
0fb8228
Compare
Users have been reporting that device algorithms return invalid `temp_storage_bytes` values when `num_items` is close to -- but not over -- INT32_MAX. This is caused by an overflow in the numerator of the pattern `num_tiles = (num_items + items_per_tile - 1) / items_per_tile`. The new function implements the same calculation but protects against overflow. Fixes NVIDIA#221. Bug 3075796
This value will always be representable with an int, and all usages of it treat it as a 32-bit int. Changing the type avoids some casts at usage sites.
0fb8228 to
e0a6736
Compare
Fixes #228.