Skip to content

RuntimeError: CUDA error: no kernel image is available for execution on the device #235

@gunshi

Description

@gunshi

Hi, I'm using detectron2 on a computing cluster and thus have various gpus that the code will be run on as per allocation. detectron was installed successfully and i'm able to import it from python.

However I get the following error on certain(most) gpus:

RuntimeError: CUDA error: no kernel image is available for execution on the device (ROIAlign_forward_cuda at /network/home/guptagun/od/detectron2_repo/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:361)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f2459bbc687 in /network/home/guptagun/anaconda3/envs/detectron/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: detectron2::ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xa24 (0x7f23f419189c in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: detectron2::ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xb6 (0x7f23f4132f66 in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x4ec8f (0x7f23f4144c8f in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x49750 (0x7f23f413f750 in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
<omitting python frames>
frame #9: THPFunction_apply(_object*, _object*) + 0x8d6 (0x7f245a4abe96 in /network/home/guptagun/anaconda3/envs/detectron/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

While on some gpus (one of them being Geforce GTX) the code runs as expected.

I was trying to run the demo.py file through:

python detectron2_repo/demo/demo.py --config-file detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input ./leftImg8bit.png --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

Environment

output of python -m detectron2.utils.collect_env.

------------------------  --------------------------------------------------
sys.platform              linux
Python                    3.7.5 (default, Oct 25 2019, 15:51:11) [GCC 7.3.0]
Numpy                     1.15.4
Detectron2 Compiler       GCC 7.4
Detectron2 CUDA Compiler  10.0
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.3.0
PyTorch Debug Build       False
torchvision               0.4.1a0+d94043a
CUDA available            True
GPU 0                     GeForce GTX TITAN X
CUDA_HOME                 None
Pillow                    5.3.0
cv2                       4.1.0
------------------------  --------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

when i build detectron using : python setup.py build develop
TORCH_CUDA_ARCH_LIST was set empty, and so it should have been compiled for all architectures? (acc to #62 (comment))
What can I do while compiling so that I'm able to use detectron on most gpus, or is this an issue with the compute node I'm using?

Thanks,
Gunshi

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions