LRCN-RetailNet: A recurrent neural network architecture for accurate people counting

Massa, Lucas; Barbosa, Adriano; Oliveira, Krerley; Vieira, Thales

doi:10.1007/s11042-020-09971-7

Published: 07 October 2020

Volume 80, pages 5517–5537 (2021)

Multimedia Tools and Applications

Lucas Massa¹, Adriano Barbosa², Krerley Oliveira³ & Thales Vieira¹ (ORCID: orcid.org/0000-0001-7775-5258)


Abstract

Measuring and analyzing the flow of customers in retail stores is essential for retailers to better understand customer behavior and to support decision-making. Nevertheless, little attention has been given to the development of novel technologies for automatic people counting. We introduce LRCN-RetailNet, a recurrent neural network architecture capable of learning a non-linear regression model and accurately predicting the people count from videos captured by low-cost surveillance cameras. The input video follows the recently proposed RGBP image format, which combines color and people (foreground) information. Our architecture considers two relevant aspects: spatial features, extracted from the RGBP images by convolutional layers; and the temporal coherence of the problem, which is exploited by recurrent layers. We show that, through a supervised learning approach, the trained models predict the people count with high accuracy. Additionally, we demonstrate that a straightforward modification of the methodology is effective at excluding salespeople from the people count. Comprehensive experiments were conducted to validate, evaluate, and compare the proposed architecture. The results corroborate that LRCN-RetailNet remarkably outperforms both the previous RetailNet architecture, which was limited to evaluating a single image per iteration, and two state-of-the-art neural networks for object detection. Finally, computational performance experiments confirmed that the entire methodology can estimate the people count in real time.
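To make the input format described in the abstract more concrete, the following is a minimal sketch of how RGBP frames and frame sequences might be assembled before being fed to a convolutional-recurrent regressor. It is not the authors' implementation. Several details are assumptions made for illustration only: that the people (P) information is stored as a fourth per-pixel value next to R, G and B, that the foreground mask is already available, and that each sequence is labeled with the people count of its last frame. The function names `to_rgbp` and `sliding_windows` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel3 = Tuple[int, int, int]
Pixel4 = Tuple[int, int, int, int]
Frame3 = List[List[Pixel3]]
Frame4 = List[List[Pixel4]]


def to_rgbp(rgb_frame: Frame3, people_mask: List[List[int]]) -> Frame4:
    """Append a people (foreground) value to every RGB pixel.

    Assumption: P is a fourth channel; the abstract only states that RGBP
    holds color and people (foreground) information.
    """
    if len(rgb_frame) != len(people_mask):
        raise ValueError("frame and mask heights differ")
    rgbp: Frame4 = []
    for rgb_row, mask_row in zip(rgb_frame, people_mask):
        if len(rgb_row) != len(mask_row):
            raise ValueError("frame and mask widths differ")
        rgbp.append([(r, g, b, p) for (r, g, b), p in zip(rgb_row, mask_row)])
    return rgbp


def sliding_windows(frames: Sequence, counts: Sequence[int], length: int):
    """Group consecutive frames into fixed-length sequences.

    Assumption: each sequence is paired with the count annotated for its
    last frame, so a recurrent model sees the preceding frames as context.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if len(frames) != len(counts):
        raise ValueError("frames and counts must have the same length")
    samples = []
    for end in range(length, len(frames) + 1):
        samples.append((list(frames[end - length:end]), counts[end - 1]))
    return samples


if __name__ == "__main__":
    frame = [[(10, 20, 30), (40, 50, 60)]]   # a 1x2 RGB image
    mask = [[0, 255]]                        # second pixel is foreground
    print(to_rgbp(frame, mask))
    print(sliding_windows(["f0", "f1", "f2", "f3"], [1, 2, 2, 3], length=3))
```

In a setup like this, a single-image model such as the earlier RetailNet would consume one RGBP frame at a time, whereas a recurrent model would consume each window of consecutive frames as one training sample.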

Acknowledgements

The authors would like to thank CNPq/PIBITI/UFAL for the first author’s scholarship and PRMB Comércio e Distribuidora de Calçados LTDA for partially financing this research.

Author information

Authors and Affiliations

Institute of Computing, Federal University of Alagoas, Maceió, AL, Brazil
Lucas Massa & Thales Vieira
Faculty of Exact Sciences and Technology, Federal University of Grande Dourados, Dourados, MS, Brazil
Adriano Barbosa
Institute of Mathematics, Federal University of Alagoas, Maceió, AL, Brazil
Krerley Oliveira

Corresponding author

Correspondence to Thales Vieira.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Massa, L., Barbosa, A., Oliveira, K. et al. LRCN-RetailNet: A recurrent neural network architecture for accurate people counting. Multimed Tools Appl 80, 5517–5537 (2021). https://doi.org/10.1007/s11042-020-09971-7

Received: 12 May 2020
Revised: 27 August 2020
Accepted: 24 September 2020
Published: 07 October 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11042-020-09971-7

