
LRCN-RetailNet: A recurrent neural network architecture for accurate people counting

Multimedia Tools and Applications

Abstract

Measuring and analyzing the flow of customers in retail stores is essential for retailers to better understand customer behavior and to support decision-making. Nevertheless, little attention has been given to developing new technologies for automatic people counting. We introduce LRCN-RetailNet: a recurrent neural network architecture that learns a non-linear regression model and accurately predicts the people count from videos captured by low-cost surveillance cameras. The input video follows the recently proposed RGBP image format, which combines color and people (foreground) information. Our architecture takes two aspects into account: spatial features, extracted from the RGBP images by convolutional layers; and the temporal coherence of the problem, which is exploited by recurrent layers. We show that, through a supervised learning approach, the trained models predict the people count with high accuracy. Additionally, we show that a straightforward modification of the methodology effectively excludes salespeople from the people count. Comprehensive experiments were conducted to validate, evaluate and compare the proposed architecture. The results show that LRCN-RetailNet substantially outperforms both the previous RetailNet architecture, which was limited to evaluating a single image per iteration, and two state-of-the-art neural networks for object detection. Finally, computational performance experiments confirmed that the entire methodology can estimate the people count in real time.
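To make the input side of this description concrete, the following is a minimal sketch, assuming NumPy, of how RGBP frames might be assembled and grouped into fixed-length clips for a recurrent model. The channel layout (three color channels followed by one binary people/foreground channel), the hypothetical helper names `make_rgbp_frame` and `make_clips`, the window length and the last-frame target are illustrative assumptions, not the authors' implementation. The foreground mask is assumed to come from a separate foreground-detection step such as background subtraction.

```python
import numpy as np


def make_rgbp_frame(rgb: np.ndarray, foreground: np.ndarray) -> np.ndarray:
    """Stack an H x W x 3 color frame and an H x W foreground mask into H x W x 4.

    Hypothetical helper: the fourth ("P") channel is assumed to be a binary
    people/foreground mask, scaled to [0, 1] like the color channels.
    """
    if rgb.shape[:2] != foreground.shape:
        raise ValueError("color frame and mask must have the same height and width")
    color = rgb.astype(np.float32) / 255.0          # scale 8-bit color to [0, 1]
    people = (foreground > 0).astype(np.float32)    # binarize the foreground mask
    return np.concatenate([color, people[..., None]], axis=-1)


def make_clips(frames: np.ndarray, counts: np.ndarray, window: int):
    """Group T RGBP frames into overlapping clips of `window` consecutive frames.

    Returns clips of shape (T - window + 1, window, H, W, 4) and, as an assumed
    regression target, the people count of the last frame in each clip.
    """
    if len(frames) != len(counts) or len(frames) < window:
        raise ValueError("need matching frames/counts and at least `window` frames")
    starts = range(len(frames) - window + 1)
    clips = np.stack([frames[s:s + window] for s in starts])
    targets = np.asarray([counts[s + window - 1] for s in starts], dtype=np.float32)
    return clips, targets


if __name__ == "__main__":
    # Synthetic stand-in data: 10 frames of 48 x 64 pixels with random masks.
    rng = np.random.default_rng(0)
    T, H, W = 10, 48, 64
    video = rng.integers(0, 256, size=(T, H, W, 3), dtype=np.uint8)
    masks = rng.integers(0, 2, size=(T, H, W), dtype=np.uint8)
    counts = rng.integers(0, 5, size=T)

    frames = np.stack([make_rgbp_frame(v, m) for v, m in zip(video, masks)])
    clips, targets = make_clips(frames, counts, window=4)
    print(clips.shape, targets.shape)  # (7, 4, 48, 64, 4) (7,)
```

Under the general LRCN pattern, each clip could then be passed through a per-frame convolutional encoder, followed by recurrent layers and a single linear output unit that regresses the count. Using the last frame's count as the target is only one possible choice; averaging the counts over the window would be another.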

Figs. 1–7

Notes

  1. https://ic.ufal.br/professor/thales/retailnet/

  2. http://www.ic.ufal.br/professor/thales/retailnet/

  3. https://www.youtube.com/watch?v=BkChGr9nrVg

  4. https://github.com/Purkialo/CrowdDet

References

  1. Amaral L, Júnior GLN, Vieira T, Vieira T (2019) Evaluating deep models for dynamic Brazilian sign language recognition. In: Vera-Rodriguez R, Fierrez J, Morales A (eds) Progress in pattern recognition, image analysis, computer vision, and applications. Springer International Publishing, Cham, pp 930–937

  2. Boominathan L, Kruthiventi SS, Babu RV (2016) CrowdNet: a deep convolutional network for dense crowd counting. In: Proceedings of the 24th ACM international conference on Multimedia. ACM, pp 640–644

  3. Bouwmans T (2014) Traditional and recent approaches in background modeling for foreground detection: an overview. Comput Sci Rev 11:31–66

  4. Chan AB, Liang ZSJ, Vasconcelos N (2008) Privacy preserving crowd monitoring: Counting people without people models or tracking. In: 2008 IEEE Conference on computer vision and pattern recognition. IEEE, pp 1–7

  5. Chen K, Loy CC, Gong S, Xiang T (2012) Feature mining for localised crowd counting. In: Proceedings of the British Machine Vision Conference. BMVA Press, pp 21.1–21.11. https://doi.org/10.5244/C.26.21

  6. Chollet F, et al. (2015) Keras. https://keras.io

  7. Chu X, Zheng A, Zhang X, Sun J (2020) Detection in crowded scenes: One proposal, multiple predictions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12214–12223

  8. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: CVPR, vol 1. IEEE Computer Society, pp 886–893

  9. Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2625–2634

  10. Felzenszwalb PF, Girshick R, McAllester D, Ramanan D (2009) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645

  11. Gao C, Liu J, Feng Q, Lv J (2016) People-flow counting in complex environments by combining depth and color information. Multimed Tools Appl 75(15):9315–9331

  12. Hawkins DI, Mothersbaugh DL (2015) Consumer behavior: Building marketing strategy. McGraw-Hill Education

  13. Huang X, Ge Z, Jie Z, Yoshie O (2020) NMS by representative region: Towards crowded pedestrian detection by proposal pairing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10750–10759

  14. Jiang X, Zhang L, Xu M, Zhang T, Lv P, Zhou B, Yang X, Pang Y (2020) Attention scaling for crowd counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4706–4715

  15. Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980

  16. Lam S, Vandenbosch M, Pearce M (1998) Retail sales force scheduling based on store traffic forecasting. J Retail 74(1):61–88

  17. Lempitsky V, Zisserman A (2010) Learning to count objects in images. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1, NIPS’10. Curran Associates Inc., Red Hook, pp 1324–1332

  18. Li M, Zhang Z, Huang K, Tan T (2008) Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection. In: 2008 19th international conference on pattern recognition, pp 1–4

  19. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: Common objects in context. In: European conference on computer vision. Springer, pp 740–755

  20. Liu J, Liu Y, Cui Y, Chen YQ (2013) Real-time human detection and tracking in complex environments using single RGB-D camera. In: 2013 IEEE International conference on image processing. IEEE, pp 3088–3092

  21. Liu G, Yin Z, Jia Y, Xie Y (2017) Passenger flow estimation based on convolutional neural network in public transportation system. Knowl-Based Syst 123:102–115

  22. Liu J, Gu Y, Kamijo S (2017) Customer behavior classification using surveillance camera for marketing. Multimed Tools Appl 76(5):6595–6622

  23. Loy CC, Chen K, Gong S, Xiang T (2013) Crowd counting and profiling: Methodology and evaluation. In: Modeling, simulation and visual analysis of crowds. Springer, pp 347–382

  24. Marana AN, Costa LF, Lotufo RA, Velastin SA (1998) On the efficacy of texture analysis for crowd monitoring. In: Proceedings SIBGRAPI’98. International Symposium on Computer Graphics, Image Processing, and Vision (Cat. No.98EX237), pp 354–361

  25. Nogueira V, Oliveira H, Silva JA, Vieira T, Oliveira K (2019) RetailNet: a deep learning approach for people counting and hot spots detection in retail stores. In: 2019 32nd SIBGRAPI conference on graphics, patterns and images (SIBGRAPI). IEEE, pp 155–162

  26. Paragios N, Ramesh V (2001) An MRF-based approach for real-time subway monitoring. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol 1, pp I–I

  27. Pham VQ, Kozakaya T, Yamaguchi O, Okada R (2015) Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation. In: The IEEE international conference on computer vision (ICCV)

  28. Rauter M (2013) Reliable human detection and tracking in top-view depth images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 529–534

  29. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: CVPR

  30. Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. arXiv

  31. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

  32. Sabzmeydani P, Mori G (2007) Detecting pedestrians by learning shapelet features. In: 2007 IEEE Conference on computer vision and pattern recognition. IEEE, pp 1–8

  33. Sindagi VA, Patel VM (2018) A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recogn Lett 107:3–16

  34. Sun S, Akhtar N, Song H, Zhang C, Li J, Mian A (2019) Benchmark data and method for real-time people counting in cluttered scenes using depth sensors. IEEE Trans Intell Transp Syst 20(10):3599–3612

  35. Viola P, Jones M (2004) Robust real-time face detection. Int J Comput Vis 57:137–154. https://doi.org/10.1023/B:VISI.0000013087.49260.fb

  36. Wang Y, Zou Y (2016) Fast visual object counting via example-based density estimation. In: 2016 IEEE International conference on image processing (ICIP). IEEE, pp 3653–3657

  37. Wei X, Du J, Liang M, Ye L (2019) Boosting deep attribute learning via support vector regression for fast moving crowd counting. Pattern Recogn Lett 119:12–23

  38. Wu B, Nevatia R (2005) Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In: Tenth IEEE international conference on computer vision (ICCV’05), vol 1. IEEE, pp 90–97

  39. Xu B, Qiu G (2016) Crowd density estimation based on rich features and random projection forest. In: 2016 IEEE Winter conference on applications of computer vision (WACV). IEEE, pp 1–8

  40. Yang Y, Li G, Wu Z, Su L, Huang Q, Sebe N (2020) Reverse perspective network for perspective-aware object counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4374–4383

  41. Zhang X, Yan J, Feng S, Lei Z, Yi D, Li SZ (2012) Water filling: Unsupervised people counting via vertical Kinect sensor. In: 2012 IEEE Ninth international conference on advanced video and signal-based surveillance. IEEE, pp 215–220

  42. Zhang K, Zhang L, Liu Q, Zhang D, Yang MH (2014) Fast visual tracking via dense spatio-temporal context learning. In: European conference on computer vision. Springer, pp 127–141

  43. Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016) Single-image crowd counting via multi-column convolutional neural network. In: CVPR

  44. Zhao T, Nevatia R, Wu B (2008) Segmentation and tracking of multiple humans in crowded environments. IEEE Trans Pattern Anal Mach Intell 30:1198–1211. https://doi.org/10.1109/TPAMI.2007.70770

Acknowledgements

The authors would like to thank CNPq/PIBITI/UFAL for the first author’s scholarship and PRMB Comércio e Distribuidora de Calçados LTDA for partially financing this research.

Author information

Corresponding author

Correspondence to Thales Vieira.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Massa, L., Barbosa, A., Oliveira, K. et al. LRCN-RetailNet: A recurrent neural network architecture for accurate people counting. Multimed Tools Appl 80, 5517–5537 (2021). https://doi.org/10.1007/s11042-020-09971-7

  • DOI: https://doi.org/10.1007/s11042-020-09971-7
