Decoupling and Interaction: task coordination in single-stage object detection

Ma, Jia-Wei; Tian, Shu; Man, Haixia; Chen, Song-Lu; Qin, Jingyan; Yin, Xu-Cheng

doi:10.1007/s11042-024-19257-x

Published: 30 April 2024

Multimedia Tools and Applications

Jia-Wei Ma ORCID: orcid.org/0000-0002-7628-6047¹,
Shu Tian¹,
Haixia Man²,
Song-Lu Chen¹,
Jingyan Qin¹ &
…
Xu-Cheng Yin¹

Abstract

In computer vision, general single-stage object detection methods employ two separate subnets within the detection head, one for classification and one for localization. However, because the distinctions and associations between the two tasks are not modeled explicitly, it is difficult to align their spatial feature perception, which leads to sub-optimal detection performance. Although some methods use classification to evaluate localization, this is a compromise rather than multi-task optimization. In this paper, we propose a Task-coordinated Single-stage Object Detector (TSOD) to enhance the coordination of multiple tasks. First, we introduce a Task-decoupled Feature Alignment Mechanism (TFAM), which adaptively provides compatible features for different tasks by decoupling spatial information: for classification and localization, the network adaptively samples from category-sensitive regions and boundary-separable regions, respectively. Second, we propose a Task-interactive Enhancement Mechanism (TEM), which explicitly combines different task-sensitive features for joint classification score prediction and selects samples with high task consistency for training. This interaction strengthens the consistency between tasks. We conduct extensive experiments on the COCO, Cityscapes, CrowdHuman, and WiderFace datasets to evaluate TSOD. The results show that our model outperforms several state-of-the-art detectors, achieving a 2.0 AP improvement over the baseline on COCO minival and 50.4 AP with single-model, single-scale testing on COCO test-dev. In addition, our model, equipped with ResNet-50, performs significantly better than other representative detectors on the Cityscapes, CrowdHuman, and WiderFace datasets, demonstrating its robustness and generalizability.
Our study contributes a new perspective to the design of single-stage object detectors by emphasizing decoupling and interaction, both of which are crucial for task coordination. The experimental results validate the effectiveness of the proposed TSOD and its potential as a leading approach in the field. Code is available at https://github.com/Majiawei/tsod-complete.
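
As an illustration only, and not the authors' implementation, the notion of consistency between classification and localization can be probed by checking whether a detector's classification scores rise and fall with the IoU of its predicted boxes. The sketch below assumes boxes in (x1, y1, x2, y2) format and uses the Pearson correlation coefficient as the consistency measure; the helper names `box_iou`, `pearson`, and `task_consistency`, as well as the toy values, are hypothetical and chosen for this example.

```python
from math import sqrt


def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def task_consistency(scores, pred_boxes, gt_box):
    """Correlate classification scores with the IoU of each predicted box.

    Assumption for this sketch: all predictions are matched to one
    ground-truth box, so a single IoU per prediction is enough.
    """
    ious = [box_iou(p, gt_box) for p in pred_boxes]
    return pearson(scores, ious)


# Toy data (hypothetical): three predictions for one ground-truth box.
gt = (0, 0, 10, 10)
preds = [(0, 0, 10, 10), (0, 0, 10, 5), (5, 5, 15, 15)]
aligned = task_consistency([0.9, 0.6, 0.2], preds, gt)     # scores follow IoU
misaligned = task_consistency([0.2, 0.6, 0.9], preds, gt)  # scores oppose IoU
print(aligned > 0, misaligned < 0)
```

A correlation close to 1 means the highest-scoring boxes are also the best-localized ones, which is the property that score-based ranking in NMS relies on; a low or negative value indicates the kind of misalignment between the two tasks that the abstract describes.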

Data availability and access

The data that support the findings of this study are openly available in MS-COCO at https://cocodataset.org/, reference number [25], and in Cityscapes at https://www.cityscapes-dataset.com/, reference number [27].

Abbreviations

TSOD: Task-coordinated Single-stage Object Detector
TFAM: Task-decoupled Feature Alignment Mechanism
TEM: Task-interactive Enhancement Mechanism
JD: Joint Decision
TS: Task-consistent Sampling
AP: Average Precision
COCO: Common Objects in Context
NMS: Non-Maximum Suppression
IoU: Intersection over Union
IACS: IoU-aware Classification Score
FPN: Feature Pyramid Network
DecAdaDconv: Decoupled Adaptive Deformable Convolution
FPS: Frames Per Second
FLOPs: Floating Point Operations
GT: Ground Truth
PCC: Pearson Correlation Coefficient

References

Feng D, Haase-Schütz C, Rosenbaum L, Hertlein H, Glaeser C, Timm F, Wiesbeck W, Dietmayer K (2020) Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges. IEEE Trans Intell Transp Syst 22(3):1341–1360
Naqvi SMA, Shabaz M, Khan MA, Hassan SI (2023) Adversarial attacks on visual objects using the fast gradient sign method. J Grid Comput 21(4):52. https://doi.org/10.1007/S10723-023-09684-9
Qadeer N, Shah JH, Sharif M, Khan MA, Muhammad G, Zhang Y (2022) Intelligent tracking of mechanically thrown objects by industrial catching robot for automated in-plant logistics 4.0. Sensors 22(6):2113
Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J Photogrammetry Remote Sens 159:296–307
Redmon J, Divvala SK, Girshick RB, Farhadi A (2016) You only look once: unified real-time object detection. In: CVPR, pp 779–788
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. CoRR arXiv:1804.02767
Liu W, Anguelov D, Erhan D, et al (2016) SSD: single shot multibox detector. In: ECCV, vol 9905, pp 21–37
Lin T, Goyal P, Girshick RB, et al (2017) Focal loss for dense object detection. In: ICCV, pp 2999–3007
Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: ICCV, pp 9626–9635
Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: ECCV, vol 11218, pp 765–781
Duan K, Bai S, Xie L, et al (2019) Centernet: keypoint triplets for object detection. In: ICCV, pp 6568–6577
Yang Z, Liu S, Hu H, et al (2019) Reppoints: point set representation for object detection. In: ICCV, pp 9656–9665
Zhang S, Chi C, Yao Y, et al (2020) Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: CVPR, pp 9756–9765
Tang Z, Yang J, Pei Z, Song X (2021) Coordinate-based anchor-free module for object detection. Appl Intell 51(12):9066–9080
Wen G, Cao P, Wang H, Chen H, Liu X, Xu J, Zaïane OR (2023) MS-SSD: multi-scale single shot detector for ship detection in remote sensing images. Appl Intell 53(2):1586–1604
Li Y, Zhou S, Chen H (2022) Attention-based fusion factor in FPN for object detection. Appl Intell 52(13):15547–15556
Ren S, He K, Girshick RB, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: NeurIPS, pp 91–99
Cai Z, Vasconcelos N (2021) Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans Pattern Anal Mach Intell 43(5):1483–1498
Pang J, Chen K, Shi J, et al (2019) Libra R-CNN: towards balanced learning for object detection. In: CVPR, pp 821–830
Song G, Liu Y, Wang X (2020) Revisiting the sibling head in object detector. In: CVPR, pp 11560–11569
Wu Y, Chen Y, Yuan L, et al (2020) Rethinking classification and localization for object detection. In: CVPR, pp 10183–10192
Kim K, Lee HS (2020) Probabilistic anchor assignment with iou prediction for object detection. In: ECCV, vol 12370, pp 355–371
Li X, Wang W, Wu L, et al (2020) Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection. In: NeurIPS
Zhang H, Wang Y, Dayoub F, Sunderhauf N (2021) Varifocalnet: an iou-aware dense object detector. In: CVPR, pp 8514–8523
Lin T, Maire M, Belongie SJ, et al (2014) Microsoft COCO: common objects in context. In: ECCV, vol 8693, pp 740–755
Jiang B, Luo R, Mao J, et al (2018) Acquisition of localization confidence for accurate object detection. In: ECCV, vol 11218, pp 816–832
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp 3213–3223
Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: CVPR, pp 840–849
Wang J, Chen K, Yang S, et al (2019) Region proposal by guided anchoring. In: CVPR, pp 2965–2974
Wang C, Bochkovskiy A, Liao HM (2022) Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. CoRR abs/2207.02696
Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C (2020) Centripetalnet: pursuing high-quality keypoint pairs for object detection. In: CVPR, pp 10516–10525
Kong T, Sun F, Liu H, et al (2020) Foveabox: beyond anchor-based object detection. IEEE Trans Image Process 29:7389–7398
Vu T, Jang H, Pham TX, Yoo CD (2019) Cascade RPN: delving into high-quality region proposal network with adaptive convolution. In: NeurIPS, pp 1430–1440
Qiu H, Ma Y, Li Z, Liu S, Sun J (2020) Borderdet: border feature for dense object detection. In: ECCV, vol 12346, pp 549–564
Chen Z, Yang C, Li Q, Zhao F, Zha Z, Wu F (2021) Disentangle your dense object detector. In: ACM multimedia conference, pp 4939–4948
Dai X, Chen Y, Xiao B, et al (2021) Dynamic head: unifying object detection heads with attentions. In: CVPR, pp 7373–7382
Vandenhende S, Georgoulis S, Van Gansbeke W, Proesmans M, Dai D, Van Gool L (2021) Multi-task learning for dense prediction tasks: a survey. IEEE Trans Pattern Anal Mach Intell 44(7):3614–3633
Masood H, Zafar A, Ali MU, Hussain T, Khan MA, Tariq U, Damasevicius R (2022) Tracking of a fixed-shape moving object based on the gradient descent method. Sensors 22(3):1098
Hussain N, Khan MA, Kadry S, Tariq U, Mostafa RR, Choi J-I, Nam Y (2021) Intelligent deep learning and improved whale optimization algorithm based framework for object recognition. Hum Cent Comput Inf Sci 11(34):2021
Rashid M, Khan MA, Alhaisoni M, Wang S-H, Naqvi SR, Rehman A, Saba T (2020) A sustainable deep learning framework for object recognition using multi-layers deep features fusion and selection. Sustainability 12(12):5037
Tychsen-Smith L, Petersson L (2018) Improving object localization with fitness NMS and bounded iou loss. In: CVPR, pp 6877–6885
Li X, Wang W, Hu X, et al (2021) Generalized focal loss V2: learning reliable localization quality estimation for dense object detection. In: CVPR, pp 11632–11641
Feng C, Zhong Y, Gao Y, Scott MR, Huang W (2021) TOOD: task-aligned one-stage object detection. arXiv:2108.07755
Oksuz K, Cam BC, Akbas E, Kalkan S (2020) A ranking-based, balanced loss function unifying classification and localisation in object detection. In: NeurIPS
Chen K, Lin W, Li J, See J, Wang J, Zou J (2021) Ap-loss for accurate one-stage object detection. IEEE Trans Pattern Anal Mach Intell 43(11):3782–3798
Lin T, Dollár P, Girshick RB, et al (2017) Feature pyramid networks for object detection. In: CVPR, pp 936–944
Zhu X, Hu H, Lin S, Dai J (2019) Deformable convnets V2: more deformable, better results. In: CVPR, pp 9308–9316
Ma Y, Liu S, Li Z, Sun J (2021) Iqdet: instance-wise quality distribution sampling for object detection. In: CVPR, pp 1717–1725
Rezatofighi H, Tsoi N, Gwak J, et al (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: CVPR, pp 658–666
He K, Gkioxari G, Dollar P, Girshick RB (2017) Mask R-CNN. In: ICCV, pp 2980–2988
Gao Z, Wang L, Wu G (2021) Mutual supervision for dense object detection. In: ICCV, pp 3621–3630
Li Y, Chen Y, Wang N, Zhang Z (2019) Scale-aware trident networks for object detection. In: ICCV, pp 6053–6062
Zhang H, Chang H, Ma B, Wang N, Chen X (2020) Dynamic R-CNN: towards high quality object detection via dynamic training. In: ECCV, vol 12360, pp 260–275
Zhang X, Wan F, Liu C, Ji R, Ye Q (2019) Freeanchor: learning to match anchors for visual object detection. In: NeurIPS, pp 147–155
Ke W, Zhang T, Huang Z, Ye Q, Liu J, Huang D (2020) Multiple anchor learning for visual object detection. In: CVPR, pp 10203–10212
Zhu C, Chen F, Shen Z, Savvides M (2020) Soft anchor-point object detection. In: ECCV, vol 12354, pp 91–107
Chen Y, Zhang Z, Cao Y, Wang L, Lin S, Hu H (2020) Reppoints v2: verification meets regression for object detection. In: NeurIPS
Ge Z, Liu S, Li Z, Yoshie O, Sun J (2021) OTA: optimal transport assignment for object detection. In: CVPR, pp 303–312
Li S, He C, Li R, Zhang L (2022) A dual weighting label assignment scheme for object detection. arXiv:2203.09730
Chen K, Wang J, Pang J, et al (2019) Mmdetection: open mmlab detection toolbox and benchmark. arXiv:1906.07155
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp 770–778
Deng J, Dong W, Socher R, et al (2009) Imagenet: a large-scale hierarchical image database. In: CVPR, pp 248–255
Li S, Li M, Li R, He C, Zhang L (2023) One-to-few label assignment for end-to-end dense detection. In: CVPR, IEEE, pp 7350–7359
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: ECCV, vol 12346, pp 213–229
Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2021) Deformable DETR: deformable transformers for end-to-end object detection. In: ICLR
Sun P, Zhang R, Jiang Y, Kong T, Xu C, Zhan W, Tomizuka M, Li L, Yuan Z, Wang C, Luo P (2021) Sparse R-CNN: end-to-end object detection with learnable proposals. In: CVPR, pp 14454–14463
Gao Z, Wang L, Han B, Guo S (2022) Adamixer: a fast-converging query-based object detector. In: CVPR, pp 5354–5363
Shao S, Zhao Z, Li B, Xiao T, Yu G, Zhang X, Sun J (2018) Crowdhuman: a benchmark for detecting human in a crowd. arXiv:1805.00123
Yang S, Luo P, Loy CC, Tang X (2016) WIDER FACE: a face detection benchmark. In: CVPR, IEEE Computer Society, pp 5525–5533

Funding

This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0109700, in part by the National Science Fund for Distinguished Young Scholars under Grant 62125601, by the National Social Science Fund of China under Grant 21FZXB020, and in part by the National Natural Science Foundation of China under Grant 62076024 and Grant 62006018.

Author information

Authors and Affiliations

Department of Computer Science and Technology, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, 100083, Beijing, China
Jia-Wei Ma, Shu Tian, Song-Lu Chen, Jingyan Qin & Xu-Cheng Yin
School of Foreign Studies, University of Science and Technology Beijing, No. 30 Xueyuan Road, Haidian District, 100083, Beijing, China
Haixia Man

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Jia-Wei Ma, Shu Tian and Haixia Man. The first draft of the manuscript was written by Jia-Wei Ma and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Haixia Man or Jingyan Qin.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Ethical and informed consent for data used

Ethical approval and informed consent for the data used in this paper were obtained from University of Science and Technology Beijing and all authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, JW., Tian, S., Man, H. et al. Decoupling and Interaction: task coordination in single-stage object detection. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19257-x

Received: 03 November 2023
Revised: 28 March 2024
Accepted: 15 April 2024
Published: 30 April 2024
DOI: https://doi.org/10.1007/s11042-024-19257-x
