Skip to content
This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Conversation

@jlebar
Copy link

@jlebar jlebar commented Sep 15, 2016

…-only.

clang is about to get pickier about disallowing references to things
from host+device code when it won't work on either host or device.

clang doesn't currently support launching kernels from the device side,
thus these host device functions that take a function pointer to
__global
functions when CUDA_RDC is not defined are no good.

Originally landed as 98b4e16, reverted in 884d199 because the
condition was wrong. See
#831 (review)

…-only.

clang is about to get pickier about disallowing references to things
from host+device code when it won't work on either host or device.

clang doesn't currently support launching kernels from the device side,
thus these __host__ __device functions that take a function pointer to
__global__ functions when __CUDA_RDC__ is not defined are no good.

Originally landed as 98b4e16, reverted in 884d199 because the
condition was wrong. See
NVIDIA#831 (review)
@3gx
Copy link
Contributor

3gx commented Sep 15, 2016

LGTM, but let me run a unit tester with -rdc=true to verify the correctness. Will report here.

@3gx
Copy link
Contributor

3gx commented Sep 15, 2016

All tests pass.

@jaredhoberock
Copy link
Contributor

Great, thanks!

@jaredhoberock jaredhoberock merged commit 62df72e into NVIDIA:master Sep 15, 2016
@jlebar
Copy link
Author

jlebar commented Sep 15, 2016

Thank you, folks!

@3gx
Copy link
Contributor

3gx commented Sep 16, 2016

Bugger. This change must be reverted. I only tested it with -rdc=true -arch=sm_35 or higher. It will fail otherwise because kernel invocation, or address taking, must not be guarded by CUDA_ARCH. I overlooked this aspect.

@3gx
Copy link
Contributor

3gx commented Sep 16, 2016

If it is guarded by CUDA_ARCH device compilation will never specialize kernel. kaboom!

@jaredhoberock
Copy link
Contributor

Thanks for the analysis. It may be that this guard needs to be made clang-specific.

@jlebar
Copy link
Author

jlebar commented Sep 16, 2016

This may be easier if you guys write the patch? I am happy to test it.

@3gx
Copy link
Contributor

3gx commented Sep 17, 2016

Wouldn't it be easier if clang would not disallow taking address of a kernel in a device code?

@jlebar
Copy link
Author

jlebar commented Sep 17, 2016

Wouldn't it be easier if clang would not disallow taking address of a kernel in a device code?

Possibly, but clang emphasizes being a sound compiler. :)

Taking the address of a function you cannot call isn't allowed in C++. For example, you can't take the address of a private function you don't have access to.

Indeed taking addresses of functions from device code should probably be disallowed entirely, because indirect calls are not supported on the GPU. We're not there yet, but this is a step in that direction.

@gnzlbg
Copy link

gnzlbg commented Oct 9, 2016

Any progress on this?

@jlebar
Copy link
Author

jlebar commented Oct 9, 2016

I am happy to write another patch, but at this point I'm pretty confused about what the guard should be.

We checked in the code that makes this fail in clang a few days ago.

@andrewcorrigan
Copy link
Contributor

Can we please get this fixed as soon as possible? Would changing the guard to only disable the code in question for clang, while leaving it alone for nvcc, be acceptable to everyone?

#if !(defined(__clang__) && defined(__CUDA__)) && (!defined(__CUDA_ARCH__) || (defined(__CUDACC_RDC__) && __CUDA_ARCH__ >= 350))

@3gx
Copy link
Contributor

3gx commented Oct 10, 2016

Since clang doesn't support Dynamic Parallelism, a simple guard should suffice:

#if !(defined(__clang__) && defined(__CUDA__))

@jlebar Please submit PR and I will test it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants