Mark functions that take the address of a __global__ function as host… #835
…-only. clang is about to get pickier about disallowing references to things from host+device code when they won't work on either host or device. clang doesn't currently support launching kernels from the device side, so these __host__ __device__ functions that take a function pointer to __global__ functions when __CUDA_RDC__ is not defined are no good. Originally landed as 98b4e16, reverted in 884d199 because the condition was wrong. See NVIDIA#831 (review)
|
LGTM, but let me run a unit tester with -rdc=true to verify the correctness. Will report here. |
|
All tests pass. |
|
Great, thanks! |
|
Thank you, folks! |
|
Bugger. This change must be reverted. I only tested it with -rdc=true -arch=sm_35 or higher. It will fail otherwise because kernel invocation, or address taking, must not be guarded by __CUDA_ARCH__. I overlooked this aspect. |
|
If it is guarded by __CUDA_ARCH__, device compilation will never specialize the kernel. kaboom! |
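A minimal sketch of the pattern described above, not the actual Thrust code. The names `fill_kernel`, `kernel_ptr`, `get_kernel_guarded`, and `get_kernel_unguarded` are hypothetical, and the shim macros exist only so the snippet also builds with a plain C++ compiler. Under nvcc, `__CUDA_ARCH__` is defined only during the device compilation pass, so the guarded version hides the only reference to the kernel template from that pass.

```cpp
// Shims (assumption, for illustration): let this sketch compile without a CUDA compiler.
#ifndef __CUDACC__
#define __global__
#define __host__
#define __device__
#endif

// Hypothetical kernel template; it is only instantiated where it is referenced.
template <typename T>
__global__ void fill_kernel(T* out, T value) { *out = value; }

template <typename T>
using kernel_ptr = void (*)(T*, T);

// Problematic pattern: the address is taken only when __CUDA_ARCH__ is
// undefined, i.e. only in the host pass. The device pass never sees a
// reference to fill_kernel<T>, so no device code is generated for it.
template <typename T>
__host__ __device__ kernel_ptr<T> get_kernel_guarded()
{
#ifndef __CUDA_ARCH__
  return &fill_kernel<T>;
#else
  return nullptr;
#endif
}

// Unguarded pattern: both passes see the reference, so the kernel is
// instantiated consistently. This is the form a stricter clang may reject
// in __host__ __device__ code.
template <typename T>
__host__ __device__ kernel_ptr<T> get_kernel_unguarded()
{
  return &fill_kernel<T>;
}
```

In a host-only build both functions return the same pointer; the difference only shows up in nvcc's device pass, where the guarded form leaves the kernel uninstantiated.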
|
Thanks for the analysis. It may be that this guard needs to be made clang-specific. |
|
This may be easier if you guys write the patch? I am happy to test it. |
|
Wouldn't it be easier if clang would not disallow taking address of a kernel in a device code? |
Possibly, but clang emphasizes being a sound compiler. :) Taking the address of a function you cannot call isn't allowed in C++. For example, you can't take the address of a private function you don't have access to. Indeed taking addresses of functions from device code should probably be disallowed entirely, because indirect calls are not supported on the GPU. We're not there yet, but this is a step in that direction. |
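The private-function analogy can be illustrated in plain C++. This is only a sketch; the `Widget` class and its members are hypothetical names chosen for the example.

```cpp
class Widget {
 public:
  using fn_ptr = int (*)();

  // Inside the class, access to secret() is granted, so taking its
  // address is well-formed.
  static fn_ptr get() { return &Widget::secret; }

 private:
  static int secret() { return 42; }
};

// Outside the class, the same expression is ill-formed because the caller
// has no access to the function it names:
//
//   Widget::fn_ptr p = &Widget::secret;  // error: 'secret' is private
```

The comment argues that a `__global__` function named from device code is in a similar position: code that cannot call the function should not be able to form a pointer to it either.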
|
Any progress on this? |
|
I am happy to write another patch, but at this point I'm pretty confused about what the guard should be. We checked in the code that makes this fail in clang a few days ago. |
|
Can we please get this fixed as soon as possible? Would changing the guard to only disable the code in question for |
|
Since clang doesn't support Dynamic Parallelism, a simple guard should suffice: @jlebar Please submit PR and I will test it. |
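As a rough illustration only, and not the guard proposed in this thread, one possible shape for a clang-specific guard is sketched below. It assumes that clang defines both `__clang__` and `__CUDA__` when compiling CUDA code; the macro `SKETCH_HOST_DEVICE` and the names `noop_kernel`, `noop_kernel_ptr`, and `get_noop_kernel` are hypothetical.

```cpp
// Shims (assumption, for illustration): let this sketch compile without a CUDA compiler.
#ifndef __CUDACC__
#define __global__
#define __host__
#define __device__
#endif

// Assumption: clang compiling CUDA defines both __clang__ and __CUDA__.
// In that case the function is marked host-only; other compilers keep the
// original __host__ __device__ annotation.
#if defined(__clang__) && defined(__CUDA__)
#define SKETCH_HOST_DEVICE __host__
#else
#define SKETCH_HOST_DEVICE __host__ __device__
#endif

// Hypothetical kernel, used only to have an address to take.
__global__ void noop_kernel(int*) {}

using noop_kernel_ptr = void (*)(int*);

// Takes the address of a __global__ function. Under clang this is
// host-only; under nvcc it stays visible to both compilation passes.
SKETCH_HOST_DEVICE inline noop_kernel_ptr get_noop_kernel()
{
  return &noop_kernel;
}
```

Keying the guard on the compiler rather than on `__CUDA_ARCH__` keeps nvcc's host and device passes seeing the same code, which is the property the earlier revert was about.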