unpickling error on dataloading

Without any code changes, I get errors on the data loading part when I try to train the model.
The followings are the error log and my environment

## How To Reproduce the Issue

[11/27 11:20:35 d2.data.build]: Using training sampler RepeatFactorTrainingSampler
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Traceback (most recent call last):
  File "tools/train_net.py", line 161, in <module>
    args=(args,),
  File "/detectron2_repo/detectron2/engine/launch.py", line 49, in launch
    daemon=False,
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/detectron2_repo/detectron2/engine/launch.py", line 84, in _distributed_worker
    main_func(*args)
  File "/detectron2_repo/tools/train_net.py", line 143, in main
    trainer = Trainer(cfg)
  File "/detectron2_repo/detectron2/engine/defaults.py", line 234, in __init__
    super().__init__(model, data_loader, optimizer)
  File "/detectron2_repo/detectron2/engine/train_loop.py", line 194, in __init__
    self._data_loader_iter = iter(data_loader)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 682, in __init__
    w.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 59, in _launch
    cmd, self._fds)
  File "/usr/lib/python3.6/multiprocessing/util.py", line 417, in spawnv_passfds
    False, False, None)
OSError: [Errno 12] Cannot allocate memory

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated

## Environment

------------------------  -------------------------------------------------------------
sys.platform              linux
Python                    3.6.9 (default, Nov  7 2019, 10:44:02) [GCC 8.3.0]
Numpy                     1.16.4
Detectron2 Compiler       GCC 7.4
Detectron2 CUDA Compiler  10.1
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.3.1
PyTorch Debug Build       False
torchvision               0.4.2
CUDA available            True
GPU 0,1,2,3,4,5,6,7       GeForce GTX 1080 Ti
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.1, V10.1.243
TORCH_CUDA_ARCH_LIST      Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing
Pillow                    6.2.1
cv2                       4.1.0
------------------------  -------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

unpickling error on dataloading #410

How To Reproduce the Issue

Environment

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

unpickling error on dataloading #410

Description

How To Reproduce the Issue

Environment

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions