Jiao Zhan, Yarong Luo, Chi Guo, Yejun Wu, Jingnan Liu
- If you find our work useful, please cite this paper: Zhan J, Luo Y, Guo C, et al. YOLOPX: Anchor-free multi-task learning network for panoptic driving perception[J]. Pattern Recognition, 2024, 148: 110152.
- 2023-4-27: We have uploaded the experiment results along with some code; the full code will be released soon!
- 2023-9-15: We have uploaded part of the code; the full code will be released soon!
- 2024-10-24: We have released the full code!
Panoptic driving perception encompasses traffic object detection, drivable area segmentation, and lane detection. Existing methods typically utilize anchor-based multi-task learning networks to complete this task. While these methods yield promising results, they suffer from the inherent limitations of anchor-based detectors. In this paper, we propose YOLOPX, a simple and efficient anchor-free multi-task learning network for panoptic driving perception. To the best of our knowledge, this is the first work to employ an anchor-free detection head in panoptic driving perception. This anchor-free manner simplifies training by avoiding anchor-related heuristic tuning, and enhances the adaptability and scalability of our multi-task learning network. In addition, YOLOPX incorporates a novel lane detection head that combines multi-scale high-resolution features and long-distance contextual dependencies to improve segmentation performance. Beyond structure optimization, we propose optimization improvements to enhance network training, enabling our multi-task learning network to achieve optimal performance through simple end-to-end training. Experimental results on the challenging BDD100K dataset demonstrate the state-of-the-art (SOTA) performance of YOLOPX: it achieves 93.7% recall and 83.3% mAP50 on traffic object detection, 93.2% mIoU on drivable area segmentation, and 88.6% accuracy and 27.2% IoU on lane detection. Moreover, YOLOPX has faster inference speed compared to the lightweight network YOLOP. Consequently, YOLOPX is a powerful solution for panoptic driving perception problems. The code is available at https://github.com/jiaoZ7688/YOLOPX.
- We use BDD100K as our dataset, and experiments are run on an NVIDIA Tesla V100.
- Model: trained on the BDD100K train set and tested on the BDD100K val set.
- Our network has excellent robustness and generalization! Even on new datasets (KITTI) with different image sizes and application scenarios, our network performs well. This is helpful for related research in SLAM.
- Note: the raw videos come from KITTI. The results of our experiments are as follows:
- Note: the raw videos come from YOLOP and HybridNets. The results of our experiments are as follows:
- The results on the BDD100K val set:
We compare YOLOPX with the currently open-source YOLOP and HybridNets on an NVIDIA RTX 3080. For real-time performance, we compare the inference speed (excluding data pre-processing and NMS) at batch size 1.
| Model | Backbone | Params | Speed (fps) | Anchor |
|---|---|---|---|---|
| YOLOP | CSPDarknet | 7.9M | 39 | √ |
| HybridNets | EfficientNet | 12.8M | 17 | √ |
| YOLOPX | ELANNet | 32.9M | 47 | × |
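The fps numbers above follow a simple protocol: time only the forward pass at batch size 1, leaving pre-processing and NMS outside the timed region. A minimal, generic timing harness sketching that protocol is below; the `forward` callable and inputs are placeholders, not YOLOPX itself.

```python
import time

def measure_fps(forward, inputs, warmup=3, repeats=20):
    """Measure forward-pass throughput at batch size 1.

    `forward` is any callable standing in for the network's forward pass;
    pre-processing and NMS are deliberately left outside the timed region,
    matching the comparison protocol above.
    """
    for x in inputs[:warmup]:          # warm-up iterations are not timed
        forward(x)
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            forward(x)
    elapsed = time.perf_counter() - start
    return repeats * len(inputs) / elapsed

# Usage with a stand-in "model" (a trivial function, not the real network):
fps = measure_fps(lambda x: sum(x), [[1.0] * 1000 for _ in range(8)])
print(f"{fps:.1f} fps")
```

With a real PyTorch model you would additionally call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.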
├─inference
│ ├─image # inference images
│ ├─image_output # inference results
├─lib
│ ├─config/default # configuration of training and validation
│ ├─core
│ │ ├─activations.py # activation functions
│ │ ├─evaluate.py # metric calculation
│ │ ├─function.py # training and validation of the model
│ │ ├─general.py # metric calculation, NMS, data-format conversion, visualization
│ │ ├─loss.py # loss function
│ │ ├─postprocess.py # post-processing (refines da-seg and ll-seg, unrelated to the paper)
│ ├─dataset
│ │ ├─AutoDriveDataset.py # superclass dataset, general functions
│ │ ├─bdd.py # subclass dataset, BDD100K-specific functions
│ │ ├─convect.py
│ │ ├─DemoDataset.py # demo dataset (image, video and stream)
│ ├─models
│ │ ├─YOLOP.py # setup and configuration of the model
│ │ ├─YOLOX_Head.py # YOLOX's decoupled head
│ │ ├─YOLOX_Loss.py # YOLOX's detection loss
│ │ ├─commom.py # calculation modules
│ ├─utils
│ │ ├─augmentations.py # data augmentation
│ │ ├─autoanchor.py # auto anchor (k-means)
│ │ ├─split_dataset.py # (campus scene, unrelated to the paper)
│ │ ├─plot.py # plot boxes and masks
│ │ ├─utils.py # logging, device selection, time measurement, optimizer selection, model saving & initialization, distributed training
│ ├─run
│ │ ├─dataset/training time # visualization, logging and model saving
├─tools
│ ├─demo.py # demo (folder, camera)
│ ├─test.py
│ ├─train.py
├─weights # pretrained models

This codebase has been developed with Python 3.7, PyTorch 1.12+ and torchvision 0.13+.
```shell
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
```
or
```shell
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
```
See requirements.txt for additional dependencies and version requirements.
```shell
pip install -r requirements.txt
```
You can get the pre-trained model from Baidu or Google. Baidu extraction code: fvuc
For BDD100K: imgs, det_annot, da_seg_annot, ll_seg_annot
We recommend the dataset directory structure to be the following:
# Images and annotations are matched by their shared id
├─dataset root
│ ├─images
│ │ ├─train
│ │ ├─val
│ ├─det_annotations
│ │ ├─train
│ │ ├─val
│ ├─da_seg_annotations
│ │ ├─train
│ │ ├─val
│ ├─ll_seg_annotations
│ │ ├─train
│ │ ├─val
Update your dataset path in ./lib/config/default.py.
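Before training, it can be worth verifying that your dataset root actually matches the layout above. The sketch below is a small hypothetical helper (not part of this repository) that reports any missing sub-directories; the root path is a placeholder.

```python
import os

# Sub-directories expected under the dataset root, per the structure above.
EXPECTED = [
    "images/train", "images/val",
    "det_annotations/train", "det_annotations/val",
    "da_seg_annotations/train", "da_seg_annotations/val",
    "ll_seg_annotations/train", "ll_seg_annotations/val",
]

def missing_dirs(root):
    """Return the expected BDD100K sub-directories missing under `root`."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

# Usage: an empty result means the layout matches the structure above.
# missing = missing_dirs("/path/to/dataset_root")
```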
Training:
```shell
python tools/train.py
```
Evaluation:
```shell
python tools/test.py --weights weights/epoch-195.pth
```
Demo: you can store images or videos in --source, and the inference results will be saved to --save-dir.
```shell
python tools/demo.py --weights weights/epoch-195.pth \
    --source inference/image \
    --save-dir inference/image_output \
    --conf-thres 0.3 \
    --iou-thres 0.45
```
YOLOPX is released under the MIT License.
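For reference, the two demo thresholds play distinct roles: --conf-thres discards detections scoring below 0.3, and --iou-thres is the overlap threshold used by NMS to suppress duplicates. A minimal pure-Python sketch of greedy NMS illustrating how they interact (this is an illustration, not the repository's implementation):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, conf_thres=0.3, iou_thres=0.45):
    """Greedy NMS: drop low-confidence boxes, then suppress high overlaps."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i], reverse=True,
    )
    keep = []
    for i in order:
        # Keep box i only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep
```

Lowering --conf-thres surfaces more (noisier) detections; raising --iou-thres allows more overlapping boxes to survive suppression.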
Our work would not be complete without the wonderful work of the following authors: