
Exception: process 2 terminated with signal SIGFPE #3

@lzrobots

Description

  • what changes you made / what code you wrote: None

  • what command you ran: python tools/train_net.py --num-gpus 8 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml

  • what you observed (full logs are preferred):

(detectron2) [engs1870@arcus-htc-dgxmaxq004 detectron2]$ python tools/train_net.py --num-gpus 8 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml

Command Line Args: Namespace(config_file='configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml', dist_url='tcp://127.0.0.1:54401', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False)
Process group URL: tcp://127.0.0.1:54401
Traceback (most recent call last):
  File "tools/train_net.py", line 154, in <module>
    args=(args,),
  File "/data/engs-tvg-lz/engs1870/projects/Det/detectron2/detectron2/engine/launch.py", line 49, in launch
    daemon=False,
  File "/data/engs-tvg-lz/engs1870/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/data/engs-tvg-lz/engs1870/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 107, in join
    (error_index, name)
Exception: process 2 terminated with signal SIGFPE
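
The traceback above comes from the parent process, which only reports that worker 2 died; it says nothing about where inside the worker the fault happened. Below is a minimal sketch of two standard-library aids that might help narrow this down. It assumes the usual convention that a child killed by signal N reports exit code -N, and it uses the `faulthandler` module, which can dump a Python stack on fatal signals including SIGFPE. The helper names `describe_exitcode` and `enable_crash_dumps` are hypothetical and are not part of detectron2 or PyTorch.

```python
import faulthandler
import signal


def describe_exitcode(exitcode):
    """Turn a multiprocessing-style exit code into a readable string.

    By standard-library convention, a negative exit code -N means the
    child process was terminated by signal N.
    """
    if exitcode is None:
        return "still running"
    if exitcode >= 0:
        return f"exited with code {exitcode}"
    try:
        name = signal.Signals(-exitcode).name
    except ValueError:
        # Signal number not known on this platform.
        name = str(-exitcode)
    return f"terminated with signal {name}"


def enable_crash_dumps():
    """Hypothetical helper: call at the start of each worker process.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS
    and SIGILL and writes the Python stack of all threads to stderr
    before the process dies.
    """
    faulthandler.enable(all_threads=True)
    return faulthandler.is_enabled()
```

Calling something like `enable_crash_dumps()` at the top of the per-process entry point (or running with `python -X faulthandler ...`) should make the failing worker print its own Python stack, which is usually more informative than the parent's traceback.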

## Environment

(detectron2) [engs1870@arcus-htc-dgxmaxq004 detectron2]$ python -m detectron2.utils.collect_env


Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Detectron2 Compiler GCC 5.4
DETECTRON2_ENV_MODULE
PyTorch 1.3.0
PyTorch Debug Build False
CUDA available False
Pillow 6.2.0
cv2 4.1.1


PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
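
One thing that stands out in the dump above: it reports `CUDA available False`, while the training command asks for `--num-gpus 8`. Whether this is related to the SIGFPE is not established here, but it seems worth ruling out first. A rough pre-flight check, as a sketch only: `check_gpu_request` is a hypothetical helper, and the only PyTorch calls used are `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
def check_gpu_request(num_gpus_requested, cuda_available, device_count):
    """Return a list of human-readable problems; empty if the request looks sane."""
    problems = []
    if num_gpus_requested > 0 and not cuda_available:
        problems.append("CUDA is not available to PyTorch, but GPUs were requested")
    if num_gpus_requested > device_count:
        problems.append(
            f"requested {num_gpus_requested} GPUs but only {device_count} are visible"
        )
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        # 8 mirrors the --num-gpus value from the command in this report.
        issues = check_gpu_request(
            8, torch.cuda.is_available(), torch.cuda.device_count()
        )
        for issue in issues:
            print("WARNING:", issue)
        if not issues:
            print("GPU request looks consistent with what PyTorch can see")
```

If this prints warnings on the same node where training is launched, the mismatch between the requested GPU count and what PyTorch can see is worth resolving before digging into the signal itself.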

Hi, any thoughts on the above error? Thanks.
