Small object detection has long been one of the most challenging tasks in computer vision. To date, prior (anchor) boxes have commonly been used for object detection in Unmanned Aerial Vehicle (UAV) images. However, in many object detection algorithms the anchors must be preset and are not optimal for the training data. In 2022, the diffusion model was introduced to object detection in a method that employs random boxes. Inspired by this approach and by the characteristics of UAV images, we see great potential for diffusion models in UAV image detection and propose the Decoupled Region of Interest Pooling Feature Diffusion Network. First, we design a decoupled region of interest pooling (DRIP) feature extraction module, which decouples the feature extraction process across scales to make full use of the features at each level of the pyramid. This eliminates the negative effects of unreasonable bounding box assignments and thereby improves overall performance. Second, we propose a high-resolution scale-varying robust backbone (HSRB), whose convolution modules use atrous convolution with switchable atrous rates and Pixel-Shuffle upsampling to mitigate the negative effects of scale variation and downsampling. Finally, we apply loss functions with normalized Wasserstein distance (NWD) terms, in which NWD measures the similarity between the predicted box and the ground truth box; the purpose is to eliminate the influence of positional sensitivity on the matching between predicted and ground truth boxes. The best results of 27.91% mAP on the VisDrone dataset and 8.42% mAP on the TinyPerson dataset demonstrate the effectiveness of the proposed model.
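To illustrate why an NWD term is less position-sensitive than an overlap-based measure for tiny boxes, the following minimal sketch compares the two on a pair of small boxes. It assumes the commonly used NWD formulation, in which each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The exact loss terms of the proposed model are not given here, and the function names and the constant `c` are illustrative assumptions.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    Assumption: boxes are modeled as 2D Gaussians N((cx, cy), diag(w^2/4, h^2/4)).
    `c` is a dataset-dependent normalizing constant; 12.8 is only a placeholder.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-math.sqrt(w2_squared) / c)


def iou(box_a, box_b):
    """Intersection over union of two (cx, cy, w, h) boxes, for comparison."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    inter_w = max(0.0, min(cxa + wa / 2, cxb + wb / 2) - max(cxa - wa / 2, cxb - wb / 2))
    inter_h = max(0.0, min(cya + ha / 2, cyb + hb / 2) - max(cya - ha / 2, cyb - hb / 2))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    gt = (10.0, 10.0, 6.0, 6.0)        # tiny 6x6 ground truth box
    shifted = (13.0, 10.0, 6.0, 6.0)   # prediction off by 3 pixels
    disjoint = (20.0, 10.0, 6.0, 6.0)  # prediction with no overlap at all
    for name, pred in [("shifted", shifted), ("disjoint", disjoint)]:
        print(f"{name}: IoU={iou(gt, pred):.3f}  NWD={nwd(gt, pred):.3f}")
```

In this sketch, a shift of a few pixels sharply reduces the IoU of a tiny box, and IoU drops to zero once the boxes no longer overlap, whereas NWD decays smoothly and stays positive, so it can still rank non-overlapping candidates during matching.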