MGSNet: A multi-scale and gated spatial attention network for crowd counting

Applied Intelligence

Abstract

Crowd counting via density map estimation has been widely studied in recent years. However, several issues remain open, such as large scale variation within the crowd, complex background noise, and perspective distortion. Large scale variation of heads limits the performance of crowd counting approaches, and complex background noise causes background regions, such as leaves and mesh, to be incorrectly recognized as heads. To handle large scale variation and generate a high-quality estimated density map, we propose a novel multi-scale fusion, scale-aware attention network called the multi-scale and gated spatial attention network (MGSNet). In MGSNet, the first 10 layers of VGG16 with Batch Normalization (BN) are used as the backbone, which is followed by two branches: a large-scale branch and a scale-aware attention branch. The large-scale branch addresses the large scale variation of heads in crowd images; it employs a Scale Information Aggregation Block (SIAB) that extracts multi-scale features using dilated convolutions with different receptive fields. The scale-aware attention branch addresses complex background noise in crowd scenes; it employs a Gated Spatial Attention Block (GSAB), inspired by Long Short-Term Memory (LSTM) networks, that fuses the preceding information at different scales and retains the appropriate scale information of crowds. We evaluate the proposed method on the ShanghaiTech (Part A and Part B), UCF-CC-50 and UCF-QNRF datasets. The experimental results show its effectiveness compared with state-of-the-art methods.
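
As a rough illustration of two ideas in the abstract, the sketch below computes (a) how dilation widens the span of a convolution kernel, which is how dilated convolutions reach different receptive fields, and (b) how a head count is read off a density map by summing its values. The function names, the 3×3 kernels, the dilation rates 1, 2 and 3, and the toy density map are assumptions chosen for illustration; the abstract does not state the actual SIAB configuration.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span, in input pixels, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    """
    receptive_field = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective size - 1).
        receptive_field += effective_kernel_size(kernel_size, dilation) - 1
    return receptive_field


def count_from_density(density_map):
    """Estimated head count: the sum of all density-map values."""
    return sum(sum(row) for row in density_map)


# Hypothetical dilation rates, not taken from the paper.
spans = [effective_kernel_size(3, d) for d in (1, 2, 3)]

# Toy 2x2 density map; a real one has the resolution of the network output.
toy_count = count_from_density([[0.25, 0.25], [0.5, 0.0]])
```

Under these assumptions, 3×3 kernels with dilation 1, 2 and 3 span 3, 5 and 7 input pixels without adding parameters, which is why parallel or stacked dilated convolutions are a common way to gather features at several scales.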

Acknowledgements

This work was supported by National Natural Science Foundation of China (No. 61971073).

Author information

Corresponding author

Correspondence to Jun Sang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shi, Y., Sang, J., Wu, Z. et al. MGSNet: A multi-scale and gated spatial attention network for crowd counting. Appl Intell 52, 15436–15446 (2022). https://doi.org/10.1007/s10489-022-03263-3

  • DOI: https://doi.org/10.1007/s10489-022-03263-3
