update

yjh0410 · yjh0410 · commit 41c7bc928162 · 2024-02-17T10:17:59.000+08:00
diff --git a/config/plain_detr_config.py b/config/plain_detr_config.py
@@ -9,7 +9,7 @@
         'backbone_norm': 'FrozeBN',
         'res5_dilation': False,
         'pretrained': True,
-        'pretrained_weight': 'spark_resnet50',
+        'pretrained_weight': 'spark_resnet50',  # Cls: imagenet1k_v2; MIM: spark_resnet50
         'max_stride': 32,
         'out_stride': 16,
         # Transformer Ecndoer
diff --git a/config/rtdetr_config.py b/config/rtdetr_config.py
@@ -180,12 +180,13 @@
         'lr_epoch': [33],     # 1x
         # ----------------- Input -----------------
         ## Transforms
-        'train_min_size': [640, 640],   # short edge of image
+        'train_min_size': [[640, 640]],   # short edge of image
         'train_min_size2': [[400, 400], [500, 500], [600, 600]],
         'train_max_size': 640,
-        'test_min_size': [640, 640],
+        'test_min_size': [[640, 640]],
         'test_max_size': 640,
         'random_crop_size': [320, 600],
+        'random_size': [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800],
         ## Pixel mean & std
         'pixel_mean': [0.485, 0.456, 0.406],
         'pixel_std':  [0.229, 0.224, 0.225],
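
The hunk above wraps `train_min_size` and `test_min_size` in an outer list, matching the list-of-pairs shape of `train_min_size2`, and adds a flat `random_size` list. The commit does not show the transform code that reads these keys, so the following is only a minimal sketch of why a consumer might need the nested shape. The helper names `sample_short_edge` and `sample_random_size` are hypothetical and not taken from the repository.

```python
import random

# Config fragment mirroring the keys touched in this commit.
cfg = {
    'train_min_size': [[640, 640]],
    'train_min_size2': [[400, 400], [500, 500], [600, 600]],
    'random_size': [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800],
}


def sample_short_edge(size_ranges, rng):
    """Hypothetical helper: pick one [low, high] pair, then draw a size in it."""
    low, high = rng.choice(size_ranges)
    return rng.randint(low, high)


def sample_random_size(sizes, rng):
    """Hypothetical helper: pick one candidate size for multi-scale training."""
    return rng.choice(sizes)


rng = random.Random(0)

# With the nested form, both keys can go through the same code path.
short_edge = sample_short_edge(cfg['train_min_size'], rng)    # always 640 here
short_edge2 = sample_short_edge(cfg['train_min_size2'], rng)  # 400, 500, or 600
batch_size_px = sample_random_size(cfg['random_size'], rng)   # one listed size

# The old flat form [640, 640] would fail in this helper, because
# rng.choice returns a plain int that cannot be unpacked into (low, high).
try:
    sample_short_edge([640, 640], rng)
    flat_form_ok = True
except TypeError:
    flat_form_ok = False
```

Under these assumptions, the nested form lets a single helper serve both `train_min_size` and `train_min_size2`, which is one plausible reason for the change.
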
diff --git a/models/detectors/plain_detr/README.md b/models/detectors/plain_detr/README.md
@@ -0,0 +1,57 @@
+# PlainDETR
+
+Our `PlainDETR-R50-1x` baseline on COCO-val:
+```Shell
+```
+
+## Results on COCO
+
+| Model           |  Scale     |  Pretrained  |  FPS  | AP<sup>val</sup><br>0.5:0.95 | AP<sup>val</sup><br>0.5 | Weight | Logs  |
+| --------------- | ---------- | ------------ | ----- | ---------------------- |  ---------------  | ------ | ----- |
+| PlainDETR-R50   |  800,1333  |   IN1K-Cls   |       |                        |                   |  |  |
+| PlainDETR-R50   |  800,1333  |   IN1K-MIM   |       |                        |                   |  |  |
+
+- We explore whether PlainDETR remains strong when it uses a ResNet backbone.
+- We set up two comparative experiments: one uses a ResNet-50 pre-trained on the IN1K classification task as the PlainDETR backbone, and the other uses a ResNet-50 pre-trained on IN1K with masked image modeling (MIM). For the latter, we use the MIM pre-trained ResNet-50 provided by [SparK](https://github.com/keyu-tian/SparK).
+
+
+## Train PlainDETR
+### Single GPU
+For example, to train **PlainDETR** on COCO:
+```Shell
+python main.py --cuda -d coco --root path/to/coco -m plain_detr_r50 --batch_size 16 --eval_epoch 2
+```
+
+### Multi GPU
+For example, to train **PlainDETR** on COCO with 8 GPUs:
+```Shell
+python -m torch.distributed.run --nproc_per_node=8 train.py --cuda -dist -d coco --root path/to/coco -m plain_detr_r50 --batch_size 16 --eval_epoch 2 
+```
+
+## Test PlainDETR
+For example, to test **PlainDETR** on COCO-val:
+```Shell
+python test.py --cuda -d coco --root path/to/coco -m plain_detr_r50 --weight path/to/plain_detr_r50.pth -vt 0.4 --show 
+```
+
+## Evaluate PlainDETR
+For example, to evaluate **PlainDETR** on COCO-val:
+```Shell
+python main.py --cuda -d coco --root path/to/coco -m plain_detr_r50 --resume path/to/plain_detr_r50.pth --eval_first
+```
+
+## Demo
+### Detect with Image
+```Shell
+python demo.py --mode image --path_to_img path/to/image_dirs/ --cuda -m plain_detr_r50 --weight path/to/weight -vt 0.4 --show
+```
+
+### Detect with Video
+```Shell
+python demo.py --mode video --path_to_vid path/to/video --cuda -m plain_detr_r50 --weight path/to/weight -vt 0.4 --show --gif
+```
+
+### Detect with Camera
+```Shell
+python demo.py --mode camera --cuda -m plain_detr_r50 --weight path/to/weight -vt 0.4 --show --gif
+```
diff --git a/models/detectors/rtdetr/README.md b/models/detectors/rtdetr/README.md
@@ -0,0 +1,54 @@
+# Real-time DETR
+
+Our `RT-DETR` baseline on COCO-val:
+```Shell
+```
+
+## Results on COCO
+
+| Model         |  Scale     |  FPS  | AP<sup>val</sup><br>0.5:0.95 | AP<sup>val</sup><br>0.5 | Weight | Logs  |
+| ------------- | ---------- | ----- | ---------------------- |  ---------------  | ------ | ----- |
+| RT-DETR-R18   |  800,1333  |       |                        |                   |  |  |
+| RT-DETR-R50   |  800,1333  |       |                        |                   |  |  |
+
+
+## Train RT-DETR
+### Single GPU
+For example, to train **RT-DETR** on COCO:
+```Shell
+python main.py --cuda -d coco --root path/to/coco -m rtdetr_r50 --batch_size 16 --eval_epoch 2
+```
+
+### Multi GPU
+For example, to train **RT-DETR** on COCO with 8 GPUs:
+```Shell
+python -m torch.distributed.run --nproc_per_node=8 train.py --cuda -dist -d coco --root path/to/coco -m rtdetr_r50 --batch_size 16 --eval_epoch 2 
+```
+
+## Test RT-DETR
+For example, to test **RT-DETR** on COCO-val:
+```Shell
+python test.py --cuda -d coco --root path/to/coco -m rtdetr_r50 --weight path/to/rtdetr_r50.pth -vt 0.4 --show 
+```
+
+## Evaluate RT-DETR
+For example, to evaluate **RT-DETR** on COCO-val:
+```Shell
+python main.py --cuda -d coco --root path/to/coco -m rtdetr_r50 --resume path/to/rtdetr_r50.pth --eval_first
+```
+
+## Demo
+### Detect with Image
+```Shell
+python demo.py --mode image --path_to_img path/to/image_dirs/ --cuda -m rtdetr_r50 --weight path/to/weight -vt 0.4 --show
+```
+
+### Detect with Video
+```Shell
+python demo.py --mode video --path_to_vid path/to/video --cuda -m rtdetr_r50 --weight path/to/weight -vt 0.4 --show --gif
+```
+
+### Detect with Camera
+```Shell
+python demo.py --mode camera --cuda -m rtdetr_r50 --weight path/to/weight -vt 0.4 --show --gif
+```