Computer Science ›› 2023, Vol. 50 ›› Issue (11): 143-150. doi: 10.11896/jsjkx.230600028
徐放, 苗夺谦, 张红云
XU Fang, MIAO Duoqian, ZHANG Hongyun
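Abstract: Unlike objects at other scales, small objects carry little semantic information and have few training samples. As a result, current object detection algorithms achieve low accuracy on small objects. To address this problem, a multi-granularity Transformer object detection algorithm is proposed. First, following the multi-granularity idea, a new Transformer serialization method is designed that predicts object locations one granularity at a time, from coarse to fine, which improves the model's localization. Then, based on the three-way decision idea, small-object samples and regular-scale object samples are mined at fine granularity, which increases the number of small-object samples and hard negative samples. Finally, experiments on the COCO dataset show that the algorithm reaches a small-object detection accuracy (APs) of 31.5% and a mean average precision (mAP) of 49.1%; compared with the baseline model, APs improves by 1.4% and mAP by 2.2%. The improved algorithm effectively improves small-object detection and significantly raises overall detection accuracy.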
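The abstract does not spell out how the three-way decision step partitions samples, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes candidate boxes are scored by IoU against a ground-truth box and split into an accept region (positives), a reject region (negatives), and a boundary region deferred for finer-grained inspection. The function names `iou` and `three_way_assign` and the thresholds `alpha` and `beta` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def three_way_assign(
    candidates: List[Box],
    gt: Box,
    alpha: float = 0.5,  # hypothetical accept threshold
    beta: float = 0.3,   # hypothetical reject threshold
) -> Tuple[List[Box], List[Box], List[Box]]:
    """Split candidates into (positive, boundary, negative) regions.

    IoU >= alpha  -> accept as a positive sample
    IoU <= beta   -> reject as a negative sample
    otherwise     -> boundary region, deferred for a finer-grained decision
    """
    positive: List[Box] = []
    boundary: List[Box] = []
    negative: List[Box] = []
    for box in candidates:
        score = iou(box, gt)
        if score >= alpha:
            positive.append(box)
        elif score <= beta:
            negative.append(box)
        else:
            boundary.append(box)
    return positive, boundary, negative


if __name__ == "__main__":
    gt_box = (0.0, 0.0, 10.0, 10.0)
    boxes = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 4.0), (20.0, 20.0, 30.0, 30.0)]
    pos, bnd, neg = three_way_assign(boxes, gt_box)
    print(len(pos), len(bnd), len(neg))
```

In a sketch like this, the boundary region is where additional small-object positives or hard negatives could be mined at a finer granularity; how the paper actually resolves that region is not stated in the abstract.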