Reconciling global and local optimal label assignments for heavily occluded pedestrian detection

Liu, Chongwei; Li, Haojie; Wang, Zhihui; Xu, Rui

doi:10.1007/s00530-024-01304-0

Regular Paper
Published: 29 March 2024

Volume 30, article number 100 (2024)

Multimedia Systems

Chongwei Liu¹, Haojie Li², Zhihui Wang¹ & Rui Xu¹

Abstract

Heavily occluded pedestrian detection remains challenging for CNN detectors. Recent methods such as OTA and simOTA use optimal transport for label assignment but still struggle with local occlusion. To address this, we investigate the relationship between data assignment algorithms and the label assignment problem. We propose a theoretical framework that explains the underlying causes of suboptimal label assignments in heavily occluded regions and identifies the ideal assignment method. In pursuit of that method, we propose two label assignment methods: the K-means method (KMM) and the LAPJV method (LAM), which correspond to the clustering algorithm and the linear assignment problem, respectively. KMM assigns anchors based on the lowest cost, similar to K-means clustering. LAM applies LAPJV iteratively to occluded regions for local optimization while maintaining global optimality in non-occluded regions. LAM also reduces execution time by 30% compared to OTA. We provide both theoretical analysis and experimental validation to demonstrate that LAM is the ideal method within our theoretical framework. It reconciles global and local optimal assignments efficiently, achieving the highest Average Precision (AP) and Recall on five datasets: CrowdHuman, WiderPerson, CityPersons, COCOPersons, and COCO.
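To make the contrast between the two assignment families concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small hypothetical cost matrix with one row per ground-truth pedestrian and one column per anchor. The function names `lowest_cost_assignment` and `one_to_one_assignment` are illustrative, and an exhaustive search stands in for the LAPJV solver, which would be used in practice for speed. The first function mimics the clustering view (each anchor joins its cheapest ground truth), while the second solves a tiny linear assignment problem (each ground truth receives a distinct anchor at minimum total cost).

```python
from itertools import permutations

def lowest_cost_assignment(cost):
    """Clustering-style rule: each anchor (column) picks the ground truth
    (row) with the lowest cost, like points joining the nearest centroid."""
    num_gt, num_anchors = len(cost), len(cost[0])
    return [min(range(num_gt), key=lambda g: cost[g][a])
            for a in range(num_anchors)]

def one_to_one_assignment(cost):
    """Linear assignment by exhaustive search (a stand-in for LAPJV):
    every ground truth gets a distinct anchor, minimizing the total cost."""
    num_gt, num_anchors = len(cost), len(cost[0])
    if num_anchors < num_gt:
        raise ValueError("need at least as many anchors as ground truths")
    best_total, best_perm = float("inf"), None
    # perm[g] is the anchor index given to ground truth g.
    for perm in permutations(range(num_anchors), num_gt):
        total = sum(cost[g][perm[g]] for g in range(num_gt))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total

# Hypothetical costs for two overlapping pedestrians and three anchors.
# Both pedestrians find anchor 0 cheapest, which is the kind of conflict
# that arises in occluded regions.
cost = [
    [0.2, 0.3, 0.9],  # ground truth 0
    [0.1, 0.4, 0.8],  # ground truth 1
]

print(lowest_cost_assignment(cost))  # ground-truth index chosen per anchor
print(one_to_one_assignment(cost))   # (anchor index per ground truth, total cost)
```

In this toy case the one-to-one solver gives anchor 1 to the first pedestrian and anchor 0 to the second, so the shared best anchor is not claimed twice. Exhaustive search grows factorially with problem size and is shown only for clarity.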

Data availability

No datasets were generated or analyzed during the current study.

Notes

In general, different outputs of a detector are responsible for detecting objects of different scales. Therefore, assigning ground truths with similar scales to anchors from the same output is crucial for effective training.
www.github.com/gatagat/lap
COCOPersons is a subset of COCO, where only annotations of “person” are considered for training and evaluation.
The test set does not provide annotations and the server is no longer accessible.

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61932020 and by the Taishan Scholar Program of Shandong Province (tstp20221128).

Author information

Authors and Affiliations

DUT School of Software Technology & DUT-RU International School of Information Science and Engineering, Dalian University of Technology, Dalian, China
Chongwei Liu, Zhihui Wang & Rui Xu
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
Haojie Li

Contributions

C.L. wrote the main manuscript text and prepared figures and tables. All the authors reviewed the manuscript.

Corresponding author

Correspondence to Haojie Li.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by T. Li.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, C., Li, H., Wang, Z. et al. Reconciling global and local optimal label assignments for heavily occluded pedestrian detection. Multimedia Systems 30, 100 (2024). https://doi.org/10.1007/s00530-024-01304-0

Received: 06 January 2024
Accepted: 22 February 2024
Published: 29 March 2024
DOI: https://doi.org/10.1007/s00530-024-01304-0
