DRIPNet: Decoupled Region of Interest Pooling Feature Network Based on Diffusion Model for UAV Small Object Detection

doi:10.21203/rs.3.rs-4244827/v1

Research Article

https://doi.org/10.21203/rs.3.rs-4244827/v1

This work is licensed under a CC BY 4.0 License

Version 1

Small object detection has long been one of the most challenging tasks in computer vision. To date, object detection in Unmanned Aerial Vehicle (UAV) images has typically relied on prior bounding boxes (anchors). However, in many object detection algorithms the anchors must be preset and are not optimal for the training data. In 2022, the diffusion model was introduced into object detection in a method that employs random boxes. Inspired by this approach and by the characteristics of UAV images, we see great potential for diffusion models in UAV image detection and propose the Decoupled Region of Interest Pooling Feature Diffusion Network (DRIPNet). First, we design a decoupled region of interest pooling (DRIP) feature extraction module, which decouples feature extraction across scales to make full use of the features at each level of the pyramid. This eliminates the negative effects of unreasonable bounding box assignments and thereby improves overall performance. Second, we propose a high-resolution scale-varying robust backbone (HSRB), whose convolution modules use atrous convolution with switchable atrous rates and Pixel-Shuffle upsampling to mitigate the negative effects of scale variation and downsampling. Finally, we apply loss functions with normalized Wasserstein distance (NWD) terms, where NWD measures the similarity between the predicted box and the ground truth box; the aim is to remove the influence of positional sensitivity on the matching between the two. The model achieves its best results of 27.91% mAP on the VisDrone dataset and 8.42% mAP on the TinyPerson dataset, which demonstrates its effectiveness.
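The abstract does not state the NWD formula, so the following is only a minimal sketch of the commonly used definition, in which each box is modeled as a 2D Gaussian and the second-order Wasserstein distance between the two Gaussians is mapped to a similarity in (0, 1]. The `(cx, cy, w, h)` box format, the function names, and the normalizing constant `c` are assumptions made for illustration and are not taken from the paper. The sketch also contrasts NWD with IoU for two tiny boxes that are adjacent but do not overlap, which is the positional sensitivity the abstract refers to.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).

    Each box is treated as a Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). The constant c is a dataset-dependent scale;
    12.8 is a placeholder value chosen for this illustration.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_sq) / c)


def iou(box_a, box_b):
    """Plain IoU for boxes given as (cx, cy, w, h), used here for comparison."""
    ax1, ay1 = box_a[0] - box_a[2] / 2.0, box_a[1] - box_a[3] / 2.0
    ax2, ay2 = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx1, by1 = box_b[0] - box_b[2] / 2.0, box_b[1] - box_b[3] / 2.0
    bx2, by2 = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# Two 4x4 boxes whose centers are 4 pixels apart: they touch but do not overlap.
pred = (10.0, 10.0, 4.0, 4.0)
gt = (14.0, 10.0, 4.0, 4.0)
iou_value = iou(pred, gt)   # no overlap, so IoU carries no gradient signal
nwd_value = nwd(pred, gt)   # still a smooth, non-zero similarity
```

For such a pair the IoU is zero however close the boxes are, while NWD decays smoothly with the offset, so a loss term of the form `1 - nwd(pred, gt)` would still distinguish a near miss from a distant one.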

Diffusion model

Small object detection

UAV image

Switchable atrous convolution

Pixel-Shuffle

No competing interests reported.
