Mark functions that take the address of a global function as host… #831

jlebar · 2016-09-14T20:43:54Z

…-only.

clang is about to get pickier about disallowing references to things
from host+device code when it won't work on either host or device.

clang doesn't currently support launching kernels from the device side,
thus these __host__ __device__ functions that take a function pointer to
__global__ functions are no good.

It doesn't look like Thrust tries to use nested kernels, so the
__device__ attribute appears unnecessary here. But if you like we could
make these functions __host__ __device__ for compilers other than clang.

jlebar · 2016-09-14T20:44:14Z

cc @Artem-B

jaredhoberock · 2016-09-14T20:58:11Z

Thanks Justin. This won't quite work for us because Thrust does actually launch kernels from __device__ functions :-) It's possible to use nested Thrust algorithms when the compiler generates relocatable device code and the SM arch is >= sm_35.

I think what we ought to do to make this work for Clang is to retain the annotations of these functions but also guard these places where the address of a __global__ function is taken with a preprocessor macro. I think nvcc already defines a preprocessor symbol that indicates whether kernel launches are enabled for __device__ functions. Basically this macro (whose name I can't recall at the moment) indicates whether "Dynamic Parallelism" is available.

Does Clang also provide this macro?

jlebar · 2016-09-14T21:11:10Z

This won't quite work for us because Thrust does actually launch kernels from device functions :-)

Interesting, I would have expected this not to compile with clang either, then. (We are building with sm_35.)

In any case, let's figure out that macro, if it exists. I don't see it in the nvcc docs [1], and I can't figure out how to print all #defines from nvcc [2]. Any other ideas?

[1] http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/
[2] With clang, it's -E -x clang -dM /dev/null

3gx · 2016-09-14T21:49:50Z

nvcc defines CUDACC_RDC macro in a separable compilation mode (-rdc=true). Together with CUDA_ARCH >= 350, it indicates that device-side kernel launches are permitted.

Thrust has unit tests that check device-side launches [1]. Device-side launches will be used only with -rdc=true -arch=sm_35 (or higher).

A macro like this [2] can be used to annotate a function. If device-side launches are supported, it will annotate a function with host device, otheriwse it will be just host

[1] https://github.com/thrust/thrust/blob/master/testing/backend/cuda/for_each.cu#L63
[2] https://github.com/NVlabs/cub/blob/1.5.2/cub/util_arch.cuh#L58

jaredhoberock · 2016-09-14T22:09:23Z

Thanks Evghenii.

I will probably be less disruptive on the overall code to retain the annotations (I'm worried that will lead to attempting to call a __host__ function from a __host__ __device__ function, for example). So I think all we need to do is to guard taking the address with something like #if defined(__CUDACC_RDC__). The #else branch can just return 0.

…-only. clang is about to get pickier about disallowing references to things from host+device code when it won't work on either host or device. clang doesn't currently support launching kernels from the device side, thus these __host__ __device functions that take a function pointer to __global__ functions when __CUDA_RDC__ is not defined are no good.

jlebar · 2016-09-14T22:18:11Z

I will probably be less disruptive on the overall code to retain the annotations (I'm worried that will lead to attempting to call a host function from a host device function, for example). So I think all we need to do is to guard taking the address with something like #if defined(CUDACC_RDC). The #else branch can just return 0.

WFM, updated the patch.

3gx · 2016-09-14T22:22:47Z

thrust/system/cuda/detail/bulk/detail/cuda_launcher/triple_chevron_launcher.hpp

  {
+    // Don't try to take the address of launch_by_value from the device side if
+    // we don't support launching kernels from __device__ functions.
+#if !defined(__CUDA_ARCH__) || defined(__CUDACC_RDC__)


The correct condition should be

#if !defined(__CUDA_ARCH__) || (defined(__CUDACC_RDC__) && __CUDA_ARCH >= 350)

jaredhoberock · 2016-09-14T23:15:09Z

Sorry, I should have waited for Evghenii's review before merging this.

@jlebar: Could you resubmit a PR with the corrected guard?

…-only. clang is about to get pickier about disallowing references to things from host+device code when it won't work on either host or device. clang doesn't currently support launching kernels from the device side, thus these __host__ __device functions that take a function pointer to __global__ functions when __CUDA_RDC__ is not defined are no good. Originally landed as 98b4e16, reverted in 884d199 because the condition was wrong. See NVIDIA#831 (review)

jlebar · 2016-09-15T20:36:50Z

#835

…-only. clang is about to get pickier about disallowing references to things from host+device code when it won't work on either host or device. clang doesn't currently support launching kernels from the device side, thus these __host__ __device functions that take a function pointer to __global__ functions when __CUDA_RDC__ is not defined are no good. Originally landed as 98b4e16, reverted in 884d199 because the condition was wrong. See #831 (review)

jlebar force-pushed the launch-host-only branch from 157bedb to 98b4e16 Compare September 14, 2016 22:17

jaredhoberock merged commit 1ca24ce into NVIDIA:master Sep 14, 2016

3gx reviewed Sep 14, 2016

View reviewed changes

jaredhoberock mentioned this pull request Sep 14, 2016

Revert "Mark functions that take the address of a __global__ function as host…" #832

Merged

jlebar mentioned this pull request Sep 15, 2016

Mark functions that take the address of a __global__ function as host… #835

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mark functions that take the address of a global function as host… #831

Mark functions that take the address of a global function as host… #831

Uh oh!

jlebar commented Sep 14, 2016 •

edited

Loading

Uh oh!

jlebar commented Sep 14, 2016

Uh oh!

jaredhoberock commented Sep 14, 2016

Uh oh!

jlebar commented Sep 14, 2016

Uh oh!

3gx commented Sep 14, 2016

Uh oh!

jaredhoberock commented Sep 14, 2016 •

edited

Loading

Uh oh!

jlebar commented Sep 14, 2016

Uh oh!

3gx Sep 14, 2016

Uh oh!

jaredhoberock commented Sep 14, 2016

Uh oh!

jlebar commented Sep 15, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Mark functions that take the address of a __global__ function as host… #831

Mark functions that take the address of a __global__ function as host… #831

Uh oh!

Conversation

jlebar commented Sep 14, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jlebar commented Sep 14, 2016

Uh oh!

jaredhoberock commented Sep 14, 2016

Uh oh!

jlebar commented Sep 14, 2016

Uh oh!

3gx commented Sep 14, 2016

Uh oh!

jaredhoberock commented Sep 14, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jlebar commented Sep 14, 2016

Uh oh!

3gx Sep 14, 2016

Choose a reason for hiding this comment

Uh oh!

jaredhoberock commented Sep 14, 2016

Uh oh!

jlebar commented Sep 15, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Mark functions that take the address of a global function as host… #831

Mark functions that take the address of a global function as host… #831

jlebar commented Sep 14, 2016 •

edited

Loading

jaredhoberock commented Sep 14, 2016 •

edited

Loading