- What changes you made / what code you wrote: None
- What command you ran:
python tools/train_net.py --num-gpus 8 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml
- What you observed (full logs are preferred):
(detectron2) [engs1870@arcus-htc-dgxmaxq004 detectron2]$ python tools/train_net.py --num-gpus 8 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml
Command Line Args: Namespace(config_file='configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml', dist_url='tcp://127.0.0.1:54401', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False)
Process group URL: tcp://127.0.0.1:54401
Traceback (most recent call last):
  File "tools/train_net.py", line 154, in <module>
    args=(args,),
  File "/data/engs-tvg-lz/engs1870/projects/Det/detectron2/detectron2/engine/launch.py", line 49, in launch
    daemon=False,
  File "/data/engs-tvg-lz/engs1870/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/data/engs-tvg-lz/engs1870/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 107, in join
    (error_index, name)
Exception: process 2 terminated with signal SIGFPE
## Environment
(detectron2) [engs1870@arcus-htc-dgxmaxq004 detectron2]$ python -m detectron2.utils.collect_env
Python 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Detectron2 Compiler GCC 5.4
DETECTRON2_ENV_MODULE
PyTorch 1.3.0
PyTorch Debug Build False
CUDA available False
Pillow 6.2.0
cv2 4.1.1
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
Hi, any thoughts on the above error? Thanks.
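One detail in the report above that may be worth ruling out: the environment dump shows `CUDA available False`, while the command asks for `--num-gpus 8`. Whether this is related to the SIGFPE is not established here. A minimal sketch of a preflight check follows. The helper names `gpu_request_problem` and `check_with_torch` are hypothetical and not part of detectron2; the only library calls used are `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
from typing import Optional


def gpu_request_problem(requested: int, cuda_available: bool, device_count: int) -> Optional[str]:
    """Return a description of the mismatch, or None if the request looks satisfiable.

    Hypothetical helper: it only compares the requested GPU count with what
    the runtime reports; it does not diagnose the SIGFPE itself.
    """
    if requested <= 0:
        # CPU-only run, nothing to check.
        return None
    if not cuda_available:
        return f"{requested} GPU(s) requested but CUDA is not available"
    if device_count < requested:
        return f"{requested} GPU(s) requested but only {device_count} visible"
    return None


def check_with_torch(requested: int) -> Optional[str]:
    """Query PyTorch for the actual CUDA state and run the comparison."""
    import torch  # imported lazily so the pure helper above works without PyTorch

    return gpu_request_problem(
        requested,
        torch.cuda.is_available(),
        torch.cuda.device_count(),
    )
```

Calling `check_with_torch(8)` in the same conda environment and on the same node used for training would show whether PyTorch can see the GPUs before `tools/train_net.py` spawns its worker processes.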