## ❓ Questions and Help

Thank you for such a wonderful job! My question is: when the model is trained not on 8 GPUs but on fewer devices, such as 4 or 2 GPUs, what should we be careful about? For example, in the original maskrcnn-benchmark, `POST_NMS_TOPK_TRAIN` should be set to `1000 * batch_per_gpu`.

Thanks for your attention! I look forward to your reply.
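To make the question concrete, here is a minimal sketch of the kind of adjustment I mean, assuming detectron2-style config keys (`get_cfg`, `SOLVER.*`, `MODEL.RPN.POST_NMS_TOPK_TRAIN`) and purely illustrative numbers, not recommended values. The last line mirrors the per-batch behavior I described from maskrcnn-benchmark; whether it still applies here is exactly what I am unsure about.

```python
# Sketch: adjusting a training config when moving from 8 GPUs to fewer.
# All numbers are illustrative; key names follow detectron2's config.
from detectron2.config import get_cfg

num_gpus = 4       # instead of the reference 8
batch_per_gpu = 2  # images per GPU

cfg = get_cfg()
cfg.SOLVER.IMS_PER_BATCH = num_gpus * batch_per_gpu

# Linear scaling rule: scale the learning rate with the total batch size
# relative to the 8-GPU reference, and stretch the schedule accordingly.
reference_batch = 16
scale = cfg.SOLVER.IMS_PER_BATCH / reference_batch
cfg.SOLVER.BASE_LR = 0.02 * scale
cfg.SOLVER.MAX_ITER = int(90000 / scale)
cfg.SOLVER.STEPS = tuple(int(s / scale) for s in (60000, 80000))

# The adjustment my question is about: in the original maskrcnn-benchmark
# this threshold was applied per batch, so it scaled with batch_per_gpu.
# I do not know whether the same scaling is needed here.
cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN = 1000 * batch_per_gpu
```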