Decoupling and Interaction: task coordination in single-stage object detection

Multimedia Tools and Applications

Abstract

In computer vision, typical single-stage object detectors use two separate subnets within the detection head, one for classification and one for localization. However, because the distinctions and associations between these two tasks are not explicitly modeled, it is difficult to align their spatial feature perception, which leads to sub-optimal detection performance. Although some methods use the classification score to evaluate localization quality, this is a compromise rather than genuine multi-task optimization. In this paper, we propose a Task-coordinated Single-stage Object Detector (TSOD) to improve the coordination between tasks. First, we introduce a Task-decoupled Feature Alignment Mechanism (TFAM), which adaptively provides task-compatible features by decoupling spatial information: for classification, the network samples from category-sensitive regions, and for localization, from boundary-separable regions. Second, we propose a Task-interactive Enhancement Mechanism (TEM), which explicitly combines task-sensitive features to predict a joint classification score and selects samples with high task consistency for training. This interaction strengthens the consistency between the tasks. We conduct extensive experiments on the COCO, Cityscapes, CrowdHuman and WiderFace datasets to evaluate TSOD. The results show that our model outperforms several state-of-the-art detectors, improving AP by 2.0 over the baseline on COCO minival and reaching 50.4 AP with single-model, single-scale testing on COCO test-dev. In addition, with a ResNet-50 backbone, our model performs significantly better than other representative detectors on the Cityscapes, CrowdHuman and WiderFace datasets, demonstrating its robustness and generalizability.
Our study offers a new perspective on the design of single-stage object detectors by emphasizing decoupling and interaction, both of which are crucial for task coordination. The experimental results validate the effectiveness of TSOD and its potential as a leading approach in the field. Code is available at https://github.com/Majiawei/tsod-complete.
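
The abstract states that TEM selects training samples with high task consistency but does not give the criterion. As a rough illustration only, the sketch below scores each candidate with the task-alignment metric of TOOD [43], t = s^α · u^β, where s is the candidate's predicted score for the ground-truth class and u is the IoU between its predicted box and the ground-truth box, and keeps the k highest-scoring candidates as positives. This metric is swapped in from related work and is not necessarily TSOD's rule; the function name `select_consistent_samples` is hypothetical, and the defaults α = 1, β = 6, k = 13 are values commonly used with TOOD's assigner, not values reported for TSOD.

```python
import numpy as np

def select_consistent_samples(cls_scores, ious, k=13, alpha=1.0, beta=6.0):
    """Return indices of the k candidates whose two tasks agree best.

    Hypothetical helper. cls_scores[i] is candidate i's predicted score for
    the ground-truth class; ious[i] is the IoU of its predicted box with the
    ground-truth box. Uses a TOOD-style alignment metric, not TSOD's rule.
    """
    s = np.asarray(cls_scores, dtype=float)
    u = np.asarray(ious, dtype=float)
    # t = s^alpha * u^beta is large only when classification AND localization
    # are both good; a large beta penalizes poorly localized boxes strongly.
    alignment = s ** alpha * u ** beta
    k = max(0, min(k, alignment.size))
    # Sort by descending alignment; a stable sort keeps ties in input order.
    order = np.argsort(-alignment, kind="stable")
    top = order[:k]
    return top, alignment[top]

# Toy example: candidate 0 is confident but badly localized.
scores = [0.9, 0.8, 0.3, 0.7]
ious = [0.5, 0.9, 0.95, 0.85]
idx, vals = select_consistent_samples(scores, ious, k=2)
print(idx)  # -> [1 3]
```

In this toy example, candidate 0 has the highest classification score but a poorly aligned box, so the product metric ranks it last; this is the kind of classification–localization mismatch that task-consistent sampling aims to suppress.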

Figures 1–10 and Algorithm 1 are available in the full article.

Data availability and access

The data that support the findings of this study are openly available in MS-COCO [25] at https://cocodataset.org/ and in Cityscapes [27] at https://www.cityscapes-dataset.com/.

Abbreviations

TSOD: Task-coordinated Single-stage Object Detector
TFAM: Task-decoupled Feature Alignment Mechanism
TEM: Task-interactive Enhancement Mechanism
JD: Joint Decision
TS: Task-consistent Sampling
AP: Average Precision
COCO: Common Objects in Context
NMS: Non-Maximum Suppression
IoU: Intersection over Union
IACS: IoU-aware Classification Score
FPN: Feature Pyramid Network
DecAdaDconv: Decoupled Adaptive Deformable Convolution
FPS: Frames Per Second
FLOPs: Floating Point Operations
GT: Ground Truth
PCC: Pearson Correlation Coefficient
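
Several of the abbreviations above (IoU, IACS, PCC) concern how well a detector's classification scores agree with its localization quality. The following is a minimal sketch, not the paper's evaluation code: it assumes boxes in (x1, y1, x2, y2) format, computes IoU as intersection area divided by union area, and reports the Pearson correlation coefficient between per-detection scores and IoUs, where a value close to 1 means that confident detections also tend to be well localized. The helper names `box_iou` and `score_iou_pcc` and the toy boxes are hypothetical.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def score_iou_pcc(scores, ious):
    """Pearson correlation coefficient between scores and IoUs."""
    s = np.asarray(scores, dtype=float)
    u = np.asarray(ious, dtype=float)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(s, u).
    return float(np.corrcoef(s, u)[0, 1])

gt = (0, 0, 10, 10)
preds = [(0, 0, 10, 10), (0, 0, 10, 5), (5, 5, 15, 15)]
scores = [0.9, 0.6, 0.2]
ious = [box_iou(p, gt) for p in preds]
pcc = score_iou_pcc(scores, ious)
print(round(pcc, 2))  # -> 0.98
```

Here the three predictions have IoUs of 1.0, 0.5 and 1/7 with the ground truth; because their scores decrease in the same order, the correlation is strongly positive.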

References

  1. Feng D, Haase-Schütz C, Rosenbaum L, Hertlein H, Glaeser C, Timm F, Wiesbeck W, Dietmayer K (2020) Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges. IEEE Trans Intell Transp Syst 22(3):1341–1360

  2. Naqvi SMA, Shabaz M, Khan MA, Hassan SI (2023) Adversarial attacks on visual objects using the fast gradient sign method. J Grid Comput 21(4):52. https://doi.org/10.1007/S10723-023-09684-9

  3. Qadeer N, Shah JH, Sharif M, Khan MA, Muhammad G, Zhang Y (2022) Intelligent tracking of mechanically thrown objects by industrial catching robot for automated in-plant logistics 4.0. Sensors 22(6):2113

  4. Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J Photogrammetry Remote Sens 159:296–307

  5. Redmon J, Divvala SK, Girshick RB, Farhadi A (2016) You only look once: unified, real-time object detection. In: CVPR, pp 779–788

  6. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. CoRR arXiv:1804.02767

  7. Liu W, Anguelov D, Erhan D, et al (2016) SSD: single shot multibox detector. In: ECCV, vol 9905, pp 21–37

  8. Lin T, Goyal P, Girshick RB, et al (2017) Focal loss for dense object detection. In: ICCV, pp 2999–3007

  9. Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: ICCV, pp 9626–9635

  10. Law H, Deng J (2018) CornerNet: detecting objects as paired keypoints. In: ECCV, vol 11218, pp 765–781

  11. Duan K, Bai S, Xie L, et al (2019) CenterNet: keypoint triplets for object detection. In: ICCV, pp 6568–6577

  12. Yang Z, Liu S, Hu H, et al (2019) RepPoints: point set representation for object detection. In: ICCV, pp 9656–9665

  13. Zhang S, Chi C, Yao Y, et al (2020) Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: CVPR, pp 9756–9765

  14. Tang Z, Yang J, Pei Z, Song X (2021) Coordinate-based anchor-free module for object detection. Appl Intell 51(12):9066–9080

  15. Wen G, Cao P, Wang H, Chen H, Liu X, Xu J, Zaïane OR (2023) MS-SSD: multi-scale single shot detector for ship detection in remote sensing images. Appl Intell 53(2):1586–1604

  16. Li Y, Zhou S, Chen H (2022) Attention-based fusion factor in FPN for object detection. Appl Intell 52(13):15547–15556

  17. Ren S, He K, Girshick RB, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: NeurIPS, pp 91–99

  18. Cai Z, Vasconcelos N (2021) Cascade R-CNN: high quality object detection and instance segmentation. IEEE Trans Pattern Anal Mach Intell 43(5):1483–1498

  19. Pang J, Chen K, Shi J, et al (2019) Libra R-CNN: towards balanced learning for object detection. In: CVPR, pp 821–830

  20. Song G, Liu Y, Wang X (2020) Revisiting the sibling head in object detector. In: CVPR, pp 11560–11569

  21. Wu Y, Chen Y, Yuan L, et al (2020) Rethinking classification and localization for object detection. In: CVPR, pp 10183–10192

  22. Kim K, Lee HS (2020) Probabilistic anchor assignment with IoU prediction for object detection. In: ECCV, vol 12370, pp 355–371

  23. Li X, Wang W, Wu L, et al (2020) Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection. In: NeurIPS

  24. Zhang H, Wang Y, Dayoub F, Sunderhauf N (2021) VarifocalNet: an IoU-aware dense object detector. In: CVPR, pp 8514–8523

  25. Lin T, Maire M, Belongie SJ, et al (2014) Microsoft COCO: common objects in context. In: ECCV, vol 8693, pp 740–755

  26. Jiang B, Luo R, Mao J, et al (2018) Acquisition of localization confidence for accurate object detection. In: ECCV, vol 11218, pp 816–832

  27. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp 3213–3223

  28. Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: CVPR, pp 840–849

  29. Wang J, Chen K, Yang S, et al (2019) Region proposal by guided anchoring. In: CVPR, pp 2965–2974

  30. Wang C, Bochkovskiy A, Liao HM (2022) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. CoRR abs/2207.02696

  31. Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C (2020) CentripetalNet: pursuing high-quality keypoint pairs for object detection. In: CVPR, pp 10516–10525

  32. Kong T, Sun F, Liu H, et al (2020) FoveaBox: beyound anchor-based object detection. IEEE Trans Image Process 29:7389–7398

  33. Vu T, Jang H, Pham TX, Yoo CD (2019) Cascade RPN: delving into high-quality region proposal network with adaptive convolution. In: NeurIPS, pp 1430–1440

  34. Qiu H, Ma Y, Li Z, Liu S, Sun J (2020) BorderDet: border feature for dense object detection. In: ECCV, vol 12346, pp 549–564

  35. Chen Z, Yang C, Li Q, Zhao F, Zha Z, Wu F (2021) Disentangle your dense object detector. In: ACM multimedia conference, pp 4939–4948

  36. Dai X, Chen Y, Xiao B, et al (2021) Dynamic head: unifying object detection heads with attentions. In: CVPR, pp 7373–7382

  37. Vandenhende S, Georgoulis S, Van Gansbeke W, Proesmans M, Dai D, Van Gool L (2021) Multi-task learning for dense prediction tasks: a survey. IEEE Trans Pattern Anal Mach Intell 44(7):3614–3633

  38. Masood H, Zafar A, Ali MU, Hussain T, Khan MA, Tariq U, Damasevicius R (2022) Tracking of a fixed-shape moving object based on the gradient descent method. Sensors 22(3):1098

  39. Hussain N, Khan MA, Kadry S, Tariq U, Mostafa RR, Choi J-I, Nam Y (2021) Intelligent deep learning and improved whale optimization algorithm based framework for object recognition. Hum Cent Comput Inf Sci 11(34):2021

  40. Rashid M, Khan MA, Alhaisoni M, Wang S-H, Naqvi SR, Rehman A, Saba T (2020) A sustainable deep learning framework for object recognition using multi-layers deep features fusion and selection. Sustainability 12(12):5037

  41. Tychsen-Smith L, Petersson L (2018) Improving object localization with fitness NMS and bounded IoU loss. In: CVPR, pp 6877–6885

  42. Li X, Wang W, Hu X, et al (2021) Generalized focal loss V2: learning reliable localization quality estimation for dense object detection. In: CVPR, pp 11632–11641

  43. Feng C, Zhong Y, Gao Y, Scott MR, Huang W (2021) TOOD: task-aligned one-stage object detection. arXiv:2108.07755

  44. Oksuz K, Cam BC, Akbas E, Kalkan S (2020) A ranking-based, balanced loss function unifying classification and localisation in object detection. In: NeurIPS

  45. Chen K, Lin W, Li J, See J, Wang J, Zou J (2021) AP-loss for accurate one-stage object detection. IEEE Trans Pattern Anal Mach Intell 43(11):3782–3798

  46. Lin T, Dollar P, Girshick RB, et al (2017) Feature pyramid networks for object detection. In: CVPR, pp 936–944

  47. Zhu X, Hu H, Lin S, Dai J (2019) Deformable ConvNets v2: more deformable, better results. In: CVPR, pp 9308–9316

  48. Ma Y, Liu S, Li Z, Sun J (2021) IQDet: instance-wise quality distribution sampling for object detection. In: CVPR, pp 1717–1725

  49. Rezatofighi H, Tsoi N, Gwak J, et al (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: CVPR, pp 658–666

  50. He K, Gkioxari G, Dollar P, Girshick RB (2017) Mask R-CNN. In: ICCV, pp 2980–2988

  51. Gao Z, Wang L, Wu G (2021) Mutual supervision for dense object detection. In: ICCV, pp 3621–3630

  52. Li Y, Chen Y, Wang N, Zhang Z (2019) Scale-aware trident networks for object detection. In: ICCV, pp 6053–6062

  53. Zhang H, Chang H, Ma B, Wang N, Chen X (2020) Dynamic R-CNN: towards high quality object detection via dynamic training. In: ECCV, vol 12360, pp 260–275

  54. Zhang X, Wan F, Liu C, Ji R, Ye Q (2019) FreeAnchor: learning to match anchors for visual object detection. In: NeurIPS, pp 147–155

  55. Ke W, Zhang T, Huang Z, Ye Q, Liu J, Huang D (2020) Multiple anchor learning for visual object detection. In: CVPR, pp 10203–10212

  56. Zhu C, Chen F, Shen Z, Savvides M (2020) Soft anchor-point object detection. In: ECCV, vol 12354, pp 91–107

  57. Chen Y, Zhang Z, Cao Y, Wang L, Lin S, Hu H (2020) RepPoints v2: verification meets regression for object detection. In: NeurIPS

  58. Ge Z, Liu S, Li Z, Yoshie O, Sun J (2021) OTA: optimal transport assignment for object detection. In: CVPR, pp 303–312

  59. Li S, He C, Li R, Zhang L (2022) A dual weighting label assignment scheme for object detection. arXiv:2203.09730

  60. Chen K, Wang J, Pang J, et al (2019) MMDetection: open MMLab detection toolbox and benchmark. arXiv:1906.07155

  61. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp 770–778

  62. Deng J, Dong W, Socher R, et al (2009) ImageNet: a large-scale hierarchical image database. In: CVPR, pp 248–255

  63. Li S, Li M, Li R, He C, Zhang L (2023) One-to-few label assignment for end-to-end dense detection. In: CVPR, pp 7350–7359

  64. Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: ECCV, vol 12346, pp 213–229

  65. Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2021) Deformable DETR: deformable transformers for end-to-end object detection. In: ICLR

  66. Sun P, Zhang R, Jiang Y, Kong T, Xu C, Zhan W, Tomizuka M, Li L, Yuan Z, Wang C, Luo P (2021) Sparse R-CNN: end-to-end object detection with learnable proposals. In: CVPR, pp 14454–14463

  67. Gao Z, Wang L, Han B, Guo S (2022) AdaMixer: a fast-converging query-based object detector. In: CVPR, pp 5354–5363

  68. Shao S, Zhao Z, Li B, Xiao T, Yu G, Zhang X, Sun J (2018) CrowdHuman: a benchmark for detecting human in a crowd. arXiv:1805.00123

  69. Yang S, Luo P, Loy CC, Tang X (2016) WIDER FACE: a face detection benchmark. In: CVPR, pp 5525–5533

Funding

This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0109700, in part by the National Science Fund for Distinguished Young Scholars under Grant 62125601, by the National Social Science Fund of China under Grant 21FZXB020, and in part by the National Natural Science Foundation of China under Grant 62076024 and Grant 62006018.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Jia-Wei Ma, Shu Tian and Haixia Man. The first draft of the manuscript was written by Jia-Wei Ma and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Haixia Man or Jingyan Qin.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Ethical and informed consent for data used

Ethical approval and informed consent for the data used in this paper were obtained from the University of Science and Technology Beijing and all authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, JW., Tian, S., Man, H. et al. Decoupling and Interaction: task coordination in single-stage object detection. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19257-x

  • DOI: https://doi.org/10.1007/s11042-024-19257-x
