-
Notifications
You must be signed in to change notification settings - Fork 761
use overloading with clang to avoid taking a reference to a __global_… #842
use overloading with clang to avoid taking a reference to a __global_… #842
Conversation
…_ function in __device__ code
|
You beat me on the idea :) Having came to same idea independently probably tells it is not such a bad one after all. I'd suggest one slight improvement though. It would be good to fix macro in https://github.com/thrust/thrust/blob/master/thrust/detail/config/compiler.h#L77 to set It appears we use in many places in Thrust, and it would be a nightmare to change if the signature of clang device compiler will change. @andrewcorrigan can you please adjust https://github.com/thrust/thrust/blob/master/thrust/detail/config/compiler.h#L77 correspondingly, so that we can just type Thanks |
|
working on it. |
|
The change looks good to me. Unit tests pass with nvcc in both separable and whole program compilation mode. |
We would never dream of such a thing. :) I'm happy we were able to settle on something we're all happy with. |
|
|
||
| __device__ | ||
| static global_function_pointer_t global_function_pointer() | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment saying that clang doesn't support dynamic parallelism, and add assert(0) or something like this to terminate execution should this code path be ever taken at runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is assert(0) ok in __device__ code? How about bulk::detail::terminate() instead? If you want analogous changes in thrust/system/cuda/detail/detail/launch_closure.inl, can that call directly into bulk::detail::terminate() too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(0) works in nvcc, don't know about clang. bulk::detail::terminate() is better I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert works in clang too, fwiw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used bulk::detail::terminate() in bulk and assert(0) otherwise. Anything else?
jaredhoberock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. There's also terminate_with_message(). If you wanted, you could add a message indicating why the program had to be terminated.
| #if defined(__CUDA__) | ||
| #define THRUST_DEVICE_COMPILER THRUST_DEVICE_COMPILER_NVCC | ||
| #else | ||
| #if defined(__CUDA__) && defined(__clang__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if defined __CUDACC__ on line 66 would still set compiler to THRUST_DEVICE_COMPILER_NVCC and we'll never make it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, if THRUST_HOST_COMPILER == THRUST_HOST_COMPILER_CLANG, then it will trigger static asserts in number of places that expect compiler for CUDA code to be NVCC.
For instance there's a static assert in thrust/system/cuda/detail/for_each.inl:109
…_ function in device code
"So the canonical way of solving this in clang, I think, is to write a host and a device overload of the function in question." #840 (comment)
Since Clang allows overloading, how about making use of that? @jaredhoberock @egaburov @jlebar