-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Description
Without any code changes, I get errors on the data loading part when I try to train the model.
The followings are the error log and my environment
How To Reproduce the Issue
[11/27 11:20:35 d2.data.build]: Using training sampler RepeatFactorTrainingSampler
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
File "tools/train_net.py", line 161, in
args=(args,),
File "/detectron2_repo/detectron2/engine/launch.py", line 49, in launch
daemon=False,
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/detectron2_repo/detectron2/engine/launch.py", line 84, in _distributed_worker
main_func(*args)
File "/detectron2_repo/tools/train_net.py", line 143, in main
trainer = Trainer(cfg)
File "/detectron2_repo/detectron2/engine/defaults.py", line 234, in init
super().init(model, data_loader, optimizer)
File "/detectron2_repo/detectron2/engine/train_loop.py", line 194, in init
self._data_loader_iter = iter(data_loader)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 278, in iter
return _MultiProcessingDataLoaderIter(self)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 682, in init
w.start()
File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 59, in _launch
cmd, self._fds)
File "/usr/lib/python3.6/multiprocessing/util.py", line 417, in spawnv_passfds
False, False, None)
OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Environment
sys.platform linux
Python 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
Numpy 1.16.4
Detectron2 Compiler GCC 7.4
Detectron2 CUDA Compiler 10.1
DETECTRON2_ENV_MODULE
PyTorch 1.3.1
PyTorch Debug Build False
torchvision 0.4.2
CUDA available True
GPU 0,1,2,3,4,5,6,7 GeForce GTX 1080 Ti
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.1, V10.1.243
TORCH_CUDA_ARCH_LIST Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing
Pillow 6.2.1
cv2 4.1.0
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,