Skip to content

Model takes very long to load and tutorial script fails #7

@DragonDuck

Description

@DragonDuck

If you do not know the root cause of the problem / bug, and wish someone to help you, please
include:

When I try to run the code from the Detectron2 Tutorial Collab, the model takes an extremely long time to load and then crashes with a CUDA Error / Segmentation fault.

To Reproduce

  1. what changes you made / what code you wrote: None
  2. what command you run: Taken from Detectron2 Tutorial Collab
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2

im = cv2.imread("./input.jpg")

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
cfg = get_cfg()
cfg.merge_from_file("configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
  1. what you observed (full logs are preferred):
  • The step DefaultPredictor(cfg) takes an extremely long time (>15 minutes). Specifically, the command build_model(cfg) within the class __init__() takes this long to complete.
  • Minimal usage of the GPU and maximal usage of the CPU (top lists the python process as taking up 100% of CPU power and approx. 3.5% of memory while the GPU takes only approximately 500MB out of available 16GB). It appears that PyTorch is attempting to execute everything on the CPU.
  • Once the model is finally built, attempting to run prediction results in an error:
WARNING [10/11 07:45:38 d2.config.compat]: Config 'configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    outputs = predictor(im)
  File "/home/jan/miniconda3/envs/detectron/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/jan/detectron2/detectron2/engine/defaults.py", line 171, in __call__
    height, width = original_image.shape[:2]
AttributeError: 'NoneType' object has no attribute 'shape'

However, I've checked all CUDA versions and everything points to CUDA 10.1, so I don't think this is a version mismatch:

$ conda list cuda
# packages in environment at /home/jan/miniconda3/envs/detectron:
#
# Name                    Version                   Build  Channel
cudatoolkit               10.1.168                      0  
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
$ echo $LD_LIBRARY_PATH 
:/usr/local/cuda-10.1.back/lib64
$ nvidia-smi 
Fri Oct 11 07:45:41 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P0    41W / 250W |    269MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   51C    P0    41W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     16135      C   python                                       259MiB |
+-----------------------------------------------------------------------------+

All required versions as per the install page are fulfilled:

$ python --version
Python 3.6.9 :: Anaconda, Inc.

$ conda list
# packages in environment at /home/jan/miniconda3/envs/detectron:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
absl-py                   0.8.1                    pypi_0    pypi
backcall                  0.1.0                    py36_0  
blas                      1.0                         mkl  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2019.8.28                     0  
cairo                     1.14.12              h8948797_3  
certifi                   2019.9.11                py36_0  
cffi                      1.12.3           py36h2e261b9_0  
cloudpickle               1.2.2                    pypi_0    pypi
cudatoolkit               10.1.168                      0  
cycler                    0.10.0                   pypi_0    pypi
cython                    0.29.13                  pypi_0    pypi
decorator                 4.4.0                    py36_1  
detectron2                0.1                       dev_0    <develop>
ffmpeg                    4.0                  hcdf2ecd_0  
fontconfig                2.13.0               h9420a91_0  
freeglut                  3.0.0                hf484d3e_5  
freetype                  2.9.1                h8a8886c_1  
fvcore                    0.1                      pypi_0    pypi
glib                      2.56.2               hd408876_0  
graphite2                 1.3.13               h23475e2_0  
grpcio                    1.24.1                   pypi_0    pypi
harfbuzz                  1.8.8                hffaf4a1_0  
hdf5                      1.10.2               hba1933b_1  
icu                       58.2                 h9c2bf20_1  
intel-openmp              2019.4                      243  
ipython                   7.8.0            py36h39e3cac_0  
ipython_genutils          0.2.0                    py36_0  
jasper                    2.0.14               h07fcdf6_1  
jedi                      0.15.1                   py36_0  
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.1.0                    pypi_0    pypi
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libglu                    9.0.0                hf484d3e_1  
libopencv                 3.4.2                hb342d67_1  
libopus                   1.3                  h7b6447c_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.0.10               h2733197_2  
libuuid                   1.0.3                h1bed415_2  
libvpx                    1.7.0                h439df22_0  
libxcb                    1.13                 h1bed415_1  
libxml2                   2.9.9                hea5a465_1  
markdown                  3.1.1                    pypi_0    pypi
matplotlib                3.1.1                    pypi_0    pypi
mkl                       2019.4                      243  
mkl-service               2.3.0            py36he904b0f_0  
mkl_fft                   1.0.14           py36ha843d7b_0  
mkl_random                1.1.0            py36hd6b4f25_0  
ncurses                   6.1                  he6710b0_1  
ninja                     1.9.0            py36hfd86e86_0  
numpy                     1.17.2           py36haad9e8e_0  
numpy-base                1.17.2           py36hde5b4d6_0  
olefile                   0.46                     py36_0  
opencv                    3.4.2            py36h6fd60c2_1  
openssl                   1.1.1d               h7b6447c_2  
parso                     0.5.1                      py_0  
pcre                      8.43                 he6710b0_0  
pexpect                   4.7.0                    py36_0  
pickleshare               0.7.5                    py36_0  
pillow                    6.2.0            py36h34e0f95_0  
pip                       19.2.3                   py36_0  
pixman                    0.38.0               h7b6447c_0  
portalocker               1.5.1                    pypi_0    pypi
prompt_toolkit            2.0.10                     py_0  
protobuf                  3.10.0                   pypi_0    pypi
ptyprocess                0.6.0                    py36_0  
py-opencv                 3.4.2            py36hb342d67_1  
pycocotools               2.0                      pypi_0    pypi
pycparser                 2.19                     py36_0  
pygments                  2.4.2                      py_0  
pyparsing                 2.4.2                    pypi_0    pypi
python                    3.6.9                h265db76_0  
python-dateutil           2.8.0                    pypi_0    pypi
pytorch                   1.3.0           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
pyyaml                    5.1.2                    pypi_0    pypi
readline                  7.0                  h7b6447c_5  
setuptools                41.4.0                   py36_0  
shapely                   1.6.4.post2              pypi_0    pypi
six                       1.12.0                   py36_0  
sqlite                    3.30.0               h7b6447c_0  
tensorboard               2.0.0                    pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
tk                        8.6.8                hbc83047_0  
torchvision               0.4.1                py36_cu101    pytorch
tqdm                      4.36.1                   pypi_0    pypi
traitlets                 4.3.3                    py36_0  
wcwidth                   0.1.7                    py36_0  
werkzeug                  0.16.0                   pypi_0    pypi
wheel                     0.33.6                   py36_0  
xz                        5.2.4                h14c3975_4  
yacs                      0.1.6                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0  

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609

Expected behavior

PyTorch appears to run primarily on the CPU instead of the GPU. I expect it to run primarily on the GPU. As I've made no edits to the code, I also expect it to run error-free.

Environment

$ python -m detectron2.utils.collect_env
---------------------  -------------------------------------------------------------------
Python                 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Detectron2 Compiler    GCC 5.4
DETECTRON2_ENV_MODULE  <not set>
PyTorch                1.3.0
PyTorch Debug Build    False
CUDA available         True
GPU 0,1                Tesla P100-PCIE-16GB
Pillow                 6.2.0
cv2                    3.4.2
---------------------  -------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_50,code=compute_50
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions