Computer Science, 2023, Vol. 50, Issue (11): 143-150. doi: 10.11896/jsjkx.230600028

• Computer Graphics & Multimedia •

Transformer Object Detection Algorithm Based on Multi-granularity

XU Fang, MIAO Duoqian, ZHANG Hongyun

  1. College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2023-06-03  Revised: 2023-09-18  Online: 2023-11-15  Published: 2023-11-06
  • Corresponding author: MIAO Duoqian (dqmiao@tongji.edu.cn)
  • About author: XU Fang, born in 1999, postgraduate. His main research interests include object detection and granular computing. MIAO Duoqian, born in 1964, professor, Ph.D. supervisor. His main research interests include rough sets and machine learning.
  • Author e-mail: 2132959@tongji.edu.cn
  • Supported by:
    National Key Research and Development Program of China (2022YFB3104700), National Natural Science Foundation of China (62006172, 61976158, 61976160, 62076182, 62163016) and Key Project of the Natural Science Foundation of Jiangxi Province, China (20212ACB202001).

Abstract: Unlike objects at other scales, small objects carry less semantic information and have fewer training samples. As a result, current object detection algorithms achieve low detection accuracy on small objects. To address this problem, a Transformer object detection algorithm based on multi-granularity is proposed. First, following the idea of multi-granularity, a new Transformer serialization method is designed that predicts object positions one granularity at a time, from coarse to fine, which improves the model's object localization. Then, based on three-way decision, small-object samples and regular-scale object samples are mined at a fine granularity, which increases the number of small-object samples and hard negative samples. Finally, experimental results on the COCO dataset show that the algorithm reaches an average precision on small objects (APs) of 31.5% and a mean average precision (mAP) of 49.1%. Compared with the baseline model, APs is improved by 1.4% and mAP by 2.2%. The improved algorithm effectively improves small object detection and significantly raises the overall accuracy of object detection.

Key words: Small object detection, Multi-granularity, Three-way decision, Transformer, Deep learning

CLC number: TP389.1
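The abstract does not specify how the coarse-to-fine serialization encodes a position. As an illustration only, the following Python sketch assumes one plausible scheme: a normalized coordinate is written as a short sequence of tokens, where each token picks one of `bins` equal sub-intervals inside the interval chosen by the previous tokens (a base-`bins` expansion). The function names and the `levels`/`bins` parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def encode_coarse_to_fine(x: float, levels: int = 3, bins: int = 4) -> List[int]:
    """Serialize a normalized coordinate x in [0, 1] into `levels` tokens.

    Hypothetical scheme: token i selects one of `bins` equal sub-intervals
    of the interval fixed by tokens 0..i-1, so the first token is the
    coarsest granularity and the last token the finest.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [0, 1]")
    tokens: List[int] = []
    lo, width = 0.0, 1.0
    for _ in range(levels):
        width /= bins
        # Clamp so that x == 1.0 (or float round-off) stays inside a valid bin.
        idx = min(max(int((x - lo) / width), 0), bins - 1)
        tokens.append(idx)
        lo += idx * width
    return tokens


def decode_coarse_to_fine(tokens: Sequence[int], bins: int = 4) -> float:
    """Return the center of the finest interval selected by `tokens`.

    Decoding only a prefix of the tokens yields a coarser estimate.
    """
    lo, width = 0.0, 1.0
    for idx in tokens:
        width /= bins
        lo += idx * width
    return lo + width / 2


def serialize_box(box: Sequence[float], levels: int = 3, bins: int = 4) -> List[List[int]]:
    """Serialize a normalized (x1, y1, x2, y2) box one coordinate at a time."""
    return [encode_coarse_to_fine(c, levels, bins) for c in box]


if __name__ == "__main__":
    tokens = encode_coarse_to_fine(0.3)
    print(tokens)                              # [1, 0, 3]
    print(decode_coarse_to_fine(tokens[:1]))   # 0.375 (coarse estimate)
    print(decode_coarse_to_fine(tokens))       # 0.3046875 (fine estimate)
```

Under this assumed scheme, each additional token shrinks the position uncertainty by a factor of `bins`, which is one way to read "from coarse to fine"; the paper's actual token vocabulary and decoding order may differ.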
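Three-way decision splits a set of objects into positive, boundary and negative regions using a threshold pair (alpha, beta) with 0 <= beta < alpha <= 1. The abstract does not state which score or thresholds the authors use, so the Python sketch below assumes that each candidate sample (for example a query or anchor) is scored by its IoU with the nearest ground-truth box, and that small and regular-scale objects get separate threshold pairs. The values in `HYPOTHETICAL_THRESHOLDS` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

# Hypothetical (alpha, beta) pairs per object scale; not values from the paper.
HYPOTHETICAL_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "small": (0.5, 0.3),    # looser thresholds keep more small-object positives
    "regular": (0.7, 0.4),
}


@dataclass
class ThreeWayRegions:
    positive: List[int] = field(default_factory=list)  # accept
    boundary: List[int] = field(default_factory=list)  # defer / re-examine
    negative: List[int] = field(default_factory=list)  # reject


def three_way_partition(scores: Sequence[float], alpha: float, beta: float) -> ThreeWayRegions:
    """Assign each sample index to the positive, boundary or negative region.

    score >= alpha        -> positive region
    score <= beta         -> negative region
    beta < score < alpha  -> boundary region
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("require 0 <= beta < alpha <= 1")
    regions = ThreeWayRegions()
    for i, s in enumerate(scores):
        if s >= alpha:
            regions.positive.append(i)
        elif s <= beta:
            regions.negative.append(i)
        else:
            regions.boundary.append(i)
    return regions


def partition_by_scale(scores: Sequence[float], scale: str) -> ThreeWayRegions:
    """Apply the hypothetical threshold pair for the given object scale."""
    alpha, beta = HYPOTHETICAL_THRESHOLDS[scale]
    return three_way_partition(scores, alpha, beta)


if __name__ == "__main__":
    ious = [0.82, 0.55, 0.35, 0.10]
    print(partition_by_scale(ious, "small"))
    print(partition_by_scale(ious, "regular"))
```

With these made-up values, an IoU of 0.55 is a positive for a small object but a boundary sample for a regular-scale object; boundary samples are natural candidates for re-examination or hard-example mining.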
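APs in the abstract refers to COCO average precision computed only over small objects. COCO buckets objects by area: small below 32² pixels, medium from 32² up to 96², and large above 96². The Python helper below applies these cut-offs to a box's width and height as a rough guide; the official COCO evaluator (pycocotools) instead uses each ground-truth annotation's `area` field, which is the mask area for instance annotations, so its bucket assignment can differ slightly.

```python
from collections import Counter
from typing import Iterable, Tuple

SMALL_MAX_AREA = 32 ** 2   # 1024 square pixels
MEDIUM_MAX_AREA = 96 ** 2  # 9216 square pixels


def coco_scale(width: float, height: float) -> str:
    """Bucket a box into COCO's small / medium / large scale by its box area."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def scale_histogram(boxes: Iterable[Tuple[float, float]]) -> Counter:
    """Count how many (width, height) boxes fall into each scale bucket."""
    return Counter(coco_scale(w, h) for w, h in boxes)


if __name__ == "__main__":
    boxes = [(20, 20), (31, 33), (50, 50), (100, 100)]
    print(scale_histogram(boxes))
```

Small boxes are also far more sensitive to localization error: shifting a 16×16 box by 4 pixels drops its IoU with the original to 192/320 = 0.6, while the same shift on a 128×128 box still gives 15872/16896 ≈ 0.94.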