Mark functions that take the address of a __global__ function as host… #831
cc @Artem-B
Thanks Justin. This won't quite work for us because Thrust does actually launch kernels from I think what we ought to do to make this work for Clang is to retain the annotations of these functions but also guard these places where the address of a Does Clang also provide this macro? |
Interesting, I would have expected this not to compile with clang either, then. (We are building with sm_35.) In any case, let's figure out that macro, if it exists. I don't see it in the nvcc docs [1], and I can't figure out how to print all
[1] http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/
nvcc defines the __CUDACC_RDC__ macro in separable compilation mode (-rdc=true). Together with __CUDA_ARCH__ >= 350, it indicates that device-side kernel launches are permitted. Thrust has unit tests that check device-side launches [1]. Device-side launches are used only with -rdc=true -arch=sm_35 (or higher). A macro like this [2] can be used to annotate a function: if device-side launches are supported, it annotates the function as __host__ __device__; otherwise it is just __host__.
[1] https://github.com/thrust/thrust/blob/master/testing/backend/cuda/for_each.cu#L63
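As a rough illustration of the annotation macro described above, here is a minimal sketch. All EXAMPLE_* names are hypothetical and are not Thrust's actual macros. It assumes that the host pass (where __CUDA_ARCH__ is undefined) may always launch kernels, and it falls back to empty qualifiers when the file is built by a plain C++ compiler.

```cpp
// Hypothetical names throughout; a sketch, not Thrust's real configuration.
// With a non-CUDA compiler the qualifiers expand to nothing, so the file
// still compiles as ordinary C++.
#if defined(__CUDACC__)
#  define EXAMPLE_HOST        __host__
#  define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#  define EXAMPLE_HOST
#  define EXAMPLE_HOST_DEVICE
#endif

// Assumption: launching is always possible in the host pass. In a device
// pass it additionally needs separable compilation (-rdc=true) and sm_35+.
#if !defined(__CUDA_ARCH__) || (defined(__CUDACC_RDC__) && __CUDA_ARCH__ >= 350)
#  define EXAMPLE_CAN_LAUNCH 1
#  define EXAMPLE_LAUNCHER   EXAMPLE_HOST_DEVICE
#else
#  define EXAMPLE_CAN_LAUNCH 0
#  define EXAMPLE_LAUNCHER   EXAMPLE_HOST
#endif

// A function annotated with the macro: __host__ __device__ where kernel
// launches are possible, __host__ only elsewhere.
EXAMPLE_LAUNCHER inline bool example_can_launch_here()
{
  return EXAMPLE_CAN_LAUNCH != 0;
}
```

Keying the annotation and the capability flag off the same condition keeps the two from drifting apart, which is the kind of mismatch this thread is trying to avoid.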
Thanks Evghenii. It will probably be less disruptive on the overall code to retain the annotations (I'm worried that will lead to attempting to call a
…-only. clang is about to get pickier about disallowing references to things from host+device code when it won't work on either host or device. clang doesn't currently support launching kernels from the device side, thus these __host__ __device__ functions that take a function pointer to __global__ functions when __CUDACC_RDC__ is not defined are no good.
157bedb to 98b4e16
WFM, updated the patch.
{
  // Don't try to take the address of launch_by_value from the device side if
  // we don't support launching kernels from __device__ functions.
#if !defined(__CUDA_ARCH__) || defined(__CUDACC_RDC__)
The correct condition should be
#if !defined(__CUDA_ARCH__) || (defined(__CUDACC_RDC__) && __CUDA_ARCH__ >= 350)
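To show where such a guard sits, the following sketch wraps the only reference to a kernel's address in the stricter condition. The kernel, the typedef, and the helper function are hypothetical stand-ins and are not code from this patch. The qualifiers are defined away outside of a CUDA compiler so the snippet also builds as plain C++.

```cpp
// Hypothetical stand-ins for illustration; not the code under review.
#if defined(__CUDACC__)
#  define EXAMPLE_GLOBAL __global__
#  define EXAMPLE_HD     __host__ __device__
#else
#  define EXAMPLE_GLOBAL
#  define EXAMPLE_HD
#endif

// Stand-in for a kernel such as launch_by_value.
EXAMPLE_GLOBAL void example_kernel(int* out) { *out = 1; }

typedef void (*example_kernel_ptr)(int*);

EXAMPLE_HD inline example_kernel_ptr example_kernel_address()
{
  // Reference the kernel only in the host pass, or in a device pass that
  // has both separable compilation and sm_35 or newer.
#if !defined(__CUDA_ARCH__) || (defined(__CUDACC_RDC__) && __CUDA_ARCH__ >= 350)
  return &example_kernel;
#else
  // Device pass without launch support: the kernel is never named here.
  return nullptr;
#endif
}
```

With only defined(__CUDACC_RDC__) in the guard, a build using -rdc=true on an architecture below sm_35 would still reach the address-taking branch in the device pass, which is the case the extra __CUDA_ARCH__ >= 350 check excludes.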
Sorry, I should have waited for Evghenii's review before merging this. @jlebar: Could you resubmit a PR with the corrected guard?
…-only. clang is about to get pickier about disallowing references to things from host+device code when it won't work on either host or device. clang doesn't currently support launching kernels from the device side, thus these __host__ __device__ functions that take a function pointer to __global__ functions when __CUDACC_RDC__ is not defined are no good. Originally landed as 98b4e16, reverted in 884d199 because the condition was wrong. See NVIDIA#831 (review)
…-only.
clang is about to get pickier about disallowing references to things from host+device code when it won't work on either host or device. clang doesn't currently support launching kernels from the device side, thus these __host__ __device__ functions that take a function pointer to __global__ functions are no good.
It doesn't look like Thrust tries to use nested kernels, so the __device__ attribute appears unnecessary here. But if you like we could make these functions __host__ __device__ for compilers other than clang.