Comparing changes

nvc++ will stop defining __NVCOMPILER_CUDA_ARCH__ soon, removing the ability to determine the PTX arch at compile time. This updates agents and collective algorithms to no longer require the PTX_ARCH template parameter, and changes CUB_WARP_SIZE(PTX_ARCH) and similar helpers to ignore their argument. These macros only differed on obsolete arches, so the change has no effect on currently supported architectures.
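A minimal sketch of the pattern described above, not the actual CUB definitions. The `EXAMPLE_*` macros and `ExampleBlockLayout` are hypothetical names made up for illustration; the only fact assumed is the 32-thread warp on currently supported architectures. The macro still accepts an argument so existing call sites keep compiling, but the argument never appears in the expansion.

```cpp
#include <cassert>

// Hypothetical stand-ins for arch-dependent helper macros (not CUB's own).
// The parameter is kept for source compatibility and is never expanded.
#define EXAMPLE_LOG_WARP_THREADS(unused) (5)
#define EXAMPLE_WARP_THREADS(unused) (1 << EXAMPLE_LOG_WARP_THREADS(0))

// Hypothetical collective-style helper. Previously this kind of type would
// carry an extra `int PTX_ARCH` template parameter; here it is dropped.
template <int BLOCK_THREADS>
struct ExampleBlockLayout
{
  static constexpr int WARP_THREADS = EXAMPLE_WARP_THREADS(0);
  // Number of warps needed to cover the block, rounded up.
  static constexpr int WARPS =
    (BLOCK_THREADS + WARP_THREADS - 1) / WARP_THREADS;
};

// Any argument yields the same value, so old call sites behave identically.
static_assert(EXAMPLE_WARP_THREADS(350) == EXAMPLE_WARP_THREADS(800),
              "argument must be ignored");

// Runtime self-check mirroring the static_assert above.
inline void example_self_check()
{
  assert(EXAMPLE_WARP_THREADS(123) == 32);
}
```

Keeping the parameter in the macro signature means downstream code that still passes an arch value does not need to change in the same commit.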

This fixes the issue reported in #299. There's no clear reason why this should use `RandomBits` unconditionally.

The merge sort test with pow2 > 20 fails on GTX 1650. Detect bad_alloc failures and skip those tests. Tests for smaller problem sizes will still fail if a bad_alloc occurs.
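The skip logic could look roughly like the following sketch. `ExampleResult` and `run_or_skip` are hypothetical names, not the test harness used in this change, and the test body is passed in as a callable so the sketch does not depend on any device code. It assumes the allocation failure surfaces as `std::bad_alloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>

// Hypothetical outcome type for a single test invocation.
enum class ExampleResult { Passed, Skipped };

// Hypothetical helper: runs `test_body` on 2^pow2 items. A bad_alloc is
// treated as "skip" only when pow2 exceeds `skip_threshold`; for smaller
// problem sizes the exception is rethrown so the test still fails.
inline ExampleResult run_or_skip(
  int pow2,
  int skip_threshold,
  const std::function<void(std::size_t)>& test_body)
{
  const std::size_t num_items = std::size_t{1} << pow2;
  try
  {
    test_body(num_items);
  }
  catch (const std::bad_alloc&)
  {
    if (pow2 > skip_threshold)
    {
      return ExampleResult::Skipped; // large problem: out of memory is expected
    }
    throw; // small problem: out of memory is a real failure
  }
  return ExampleResult::Passed;
}
```

With a threshold of 20, a call such as `run_or_skip(21, 20, body)` reports a skip when `body` runs out of memory, while `run_or_skip(10, 20, body)` propagates the exception.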

Commits on May 16, 2022
