This repository was archived by the owner on Mar 21, 2024. It is now read-only.

@andrewcorrigan
Contributor

…_ function in device code

"So the canonical way of solving this in clang, I think, is to write a host and a device overload of the function in question." #840 (comment)

Since Clang allows overloading, how about making use of that? @jaredhoberock @egaburov @jlebar
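As a rough illustration of the overloading idea, the sketch below defines one function three ways depending on the compiler. The name `example_on_device` is hypothetical and is not taken from Thrust or from this pull request. It assumes, as the quoted comment suggests, that clang in CUDA mode accepts separate `__host__` and `__device__` overloads with the same signature, while nvcc needs a single `__host__ __device__` function that branches on `__CUDA_ARCH__`.

```cpp
#include <cassert>

// Hypothetical example; the function name is an assumption for illustration.
#if defined(__clang__) && defined(__CUDA__)
// clang compiling CUDA: the target attributes take part in overload
// resolution, so a host overload and a device overload can coexist.
__host__ inline bool example_on_device() { return false; }
__device__ inline bool example_on_device() { return true; }
#elif defined(__CUDACC__)
// nvcc: a single __host__ __device__ function that branches on __CUDA_ARCH__.
__host__ __device__ inline bool example_on_device()
{
#ifdef __CUDA_ARCH__
  return true;
#else
  return false;
#endif
}
#else
// Plain C++ compiler: only the host path exists.
inline bool example_on_device() { return false; }
#endif
```

Called from host code, `example_on_device()` returns `false` under all three branches; only a call from device code would take the device definition.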

@3gx
Contributor

3gx commented Oct 11, 2016

You beat me to the idea :) Having come to the same idea independently probably means it is not such a bad one after all.

I'd suggest one slight improvement though. It would be good to fix the macro in https://github.com/thrust/thrust/blob/master/thrust/detail/config/compiler.h#L77 to set THRUST_DEVICE_COMPILER to THRUST_DEVICE_COMPILER_CLANG

It appears we use

#if (defined(__clang__) && defined(__CUDA__))

in many places in Thrust, and it would be a nightmare to change if the signature of the clang device compiler changes.

@andrewcorrigan can you please adjust https://github.com/thrust/thrust/blob/master/thrust/detail/config/compiler.h#L77 correspondingly, so that we can just type

#if THRUST_DEVICE_COMPILER == THRUST_DEVICE_COMPILER_CLANG
<clang code>
#else
<default code>
#endif

Thanks
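A minimal sketch of what such a central detection macro could look like follows. All `EXAMPLE_*` names are hypothetical stand-ins, not the contents of Thrust's compiler.h. The sketch assumes that clang in CUDA mode defines `__CUDACC__` as well as `__CUDA__`, so the clang test has to come before the generic `__CUDACC__` test.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical identifiers for illustration only.
#define EXAMPLE_DEVICE_COMPILER_UNKNOWN 0
#define EXAMPLE_DEVICE_COMPILER_NVCC    1
#define EXAMPLE_DEVICE_COMPILER_CLANG   2

// Test for clang first: it is assumed to define __CUDACC__ in CUDA mode too,
// so checking __CUDACC__ first would classify it as nvcc.
#if defined(__clang__) && defined(__CUDA__)
#define EXAMPLE_DEVICE_COMPILER EXAMPLE_DEVICE_COMPILER_CLANG
#elif defined(__CUDACC__)
#define EXAMPLE_DEVICE_COMPILER EXAMPLE_DEVICE_COMPILER_NVCC
#else
#define EXAMPLE_DEVICE_COMPILER EXAMPLE_DEVICE_COMPILER_UNKNOWN
#endif

// Call sites then test one macro instead of repeating the raw compiler checks.
inline const char* example_device_compiler_name()
{
#if EXAMPLE_DEVICE_COMPILER == EXAMPLE_DEVICE_COMPILER_CLANG
  return "clang";
#elif EXAMPLE_DEVICE_COMPILER == EXAMPLE_DEVICE_COMPILER_NVCC
  return "nvcc";
#else
  return "none";
#endif
}
```

Built as ordinary C++ without any CUDA compiler, the function reports "none". If the way clang identifies itself ever changed, only the detection block would need updating.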

@andrewcorrigan
Contributor Author

working on it.

@3gx
Contributor

3gx commented Oct 11, 2016

The change looks good to me. Unit tests pass with nvcc in both separable and whole program compilation mode.

@jlebar
jlebar commented Oct 11, 2016

it would be a nightmare to change if the signature of the clang device compiler changes.

We would never dream of such a thing. :)

I'm happy we were able to settle on something we're all happy with.


__device__
static global_function_pointer_t global_function_pointer()
{

Contributor

@3gx 3gx Oct 11, 2016

Can we add a comment saying that clang doesn't support dynamic parallelism, and add assert(0) or something like it to terminate execution should this code path ever be taken at runtime?

Contributor Author

Is assert(0) ok in __device__ code? How about bulk::detail::terminate() instead? If you want analogous changes in thrust/system/cuda/detail/detail/launch_closure.inl, can that call directly into bulk::detail::terminate() too?

Contributor

assert(0) works in nvcc, don't know about clang. bulk::detail::terminate() is better I think.

Copy link

assert works in clang too, fwiw.

Contributor Author

I used bulk::detail::terminate() in bulk and assert(0) otherwise. Anything else?

Contributor

@jaredhoberock jaredhoberock left a comment

This looks good to me. There's also terminate_with_message(). If you wanted, you could add a message indicating why the program had to be terminated.
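The fail-loudly pattern discussed in this thread can be sketched on the host side as below. This is only an analogue under stated assumptions: `example_terminate_with_message`, `example_kernel` and `example_kernel_pointer` are hypothetical names, the helper is not bulk's `terminate_with_message()`, and the runtime `supported` flag stands in for what would be a compile-time choice in the real code. Device code would need device-compatible calls in place of `std::fprintf` and `std::abort`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: print the reason, then stop the program.
inline void example_terminate_with_message(const char* message)
{
  std::fprintf(stderr, "%s\n", message);
  std::abort();
}

typedef void (*example_kernel_ptr_t)();

// Stand-in for the function whose address would be taken.
inline void example_kernel() {}

// Returns the function pointer when the path is supported. Otherwise it
// terminates with an explanation rather than continuing silently.
inline example_kernel_ptr_t example_kernel_pointer(bool supported)
{
  if(!supported)
  {
    example_terminate_with_message(
      "this code path requires dynamic parallelism, which is not supported here");
  }
  return &example_kernel;
}
```

With `supported` set to `true` the call simply returns the address of `example_kernel`; with `false` the message is written to stderr before the program aborts.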

@jaredhoberock jaredhoberock merged commit 7a8ea01 into NVIDIA:master Oct 11, 2016

#if defined(__CUDA__)
#define THRUST_DEVICE_COMPILER THRUST_DEVICE_COMPILER_NVCC
#else
#if defined(__CUDA__) && defined(__clang__)
Copy link

#if defined __CUDACC__ on line 66 would still set compiler to THRUST_DEVICE_COMPILER_NVCC and we'll never make it here.

Copy link

Furthermore, if THRUST_HOST_COMPILER == THRUST_HOST_COMPILER_CLANG, then it will trigger static asserts in a number of places that expect the compiler for CUDA code to be NVCC.
For instance there's a static assert in thrust/system/cuda/detail/for_each.inl:109

3gx added a commit to 3gx/thrust that referenced this pull request Oct 17, 2016
andrewcorrigan pushed a commit to andrewcorrigan/thrust that referenced this pull request Oct 23, 2016
andrewcorrigan added a commit to andrewcorrigan/thrust that referenced this pull request Oct 24, 2016
andrewcorrigan added a commit to andrewcorrigan/thrust that referenced this pull request Oct 24, 2016
3gx pushed a commit that referenced this pull request Oct 24, 2016
@andrewcorrigan andrewcorrigan deleted the issue_385_or_how_about_overloading_in_clang branch October 29, 2016 11:04
brycelelbach pushed a commit that referenced this pull request May 16, 2020
brycelelbach pushed a commit that referenced this pull request May 16, 2020