
Pedestrian tracking in surveillance video based on modified CNN

Multimedia Tools and Applications

Abstract

With the prevalence of surveillance video, surveillance data can be used in a wide variety of applications, and moving object detection, object recognition and pedestrian tracking have become significant fields of research. Pedestrian tracking in particular has become an urgent problem. This paper proposes a novel method for pedestrian tracking based on a convolutional neural network, called the Matching-Siamese network. First, pedestrians are detected in surveillance videos with Faster R-CNN and numbered sequentially. Second, the Matching-Siamese network is designed by modifying the structure of the traditional Siamese network to calculate the similarity of two input images. Third, the image similarity is used to determine whether the probe image of the target pedestrian and each detected pedestrian image have the same identity. Finally, the target pedestrian is tracked across all videos using the identity matches between the probe image and the pedestrian images. The results show that the proposed method outperforms most popular algorithms in terms of accuracy, overlap rate and computational efficiency, especially when the object disappears and reappears. In addition, the method can use the latest probe image of a pedestrian to track that pedestrian in videos from randomly selected times and regions.
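
The four steps in the abstract can be sketched as a short matching loop. This is a minimal illustration, not the authors' implementation: the `Detection` class, the `match_probe` and `track` functions, the 0.8 threshold, and the cosine similarity over precomputed feature vectors are all assumptions introduced here. In the paper, detections come from Faster R-CNN and the similarity score comes from the Matching-Siamese network.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


@dataclass
class Detection:
    """One detected pedestrian: sequential number, bounding box, features."""
    number: int
    box: Box
    features: Vector


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Stand-in similarity score (assumption); the paper uses a learned network."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_probe(
    probe: Vector,
    detections: List[Detection],
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
    threshold: float = 0.8,  # hypothetical decision threshold
) -> Optional[Detection]:
    """Return the detection most similar to the probe, or None if no
    detection reaches the threshold (target not visible in this frame)."""
    best, best_score = None, threshold
    for det in detections:
        score = similarity(probe, det.features)
        if score >= best_score:
            best, best_score = det, score
    return best


def track(
    probe: Vector,
    frames: List[List[Detection]],
    threshold: float = 0.8,
) -> Dict[int, Box]:
    """Map frame index to the target's box for every frame with a match."""
    trajectory: Dict[int, Box] = {}
    for index, detections in enumerate(frames):
        match = match_probe(probe, detections, threshold=threshold)
        if match is not None:
            trajectory[index] = match.box
    return trajectory
```

Because each frame is matched against the probe independently, a frame with no match is simply skipped and the target is picked up again once it reappears, which is the disappearing and reappearing case the abstract highlights.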



Acknowledgements

This work is supported by Anhui Science and Technology Department project (No. 1401b042001) and Security and Campus Management of USTC.

Author information

Corresponding author

Correspondence to Dong Yin.

About this article

Cite this article

Luo, Y., Yin, D., Wang, A. et al. Pedestrian tracking in surveillance video based on modified CNN. Multimed Tools Appl 77, 24041–24058 (2018). https://doi.org/10.1007/s11042-018-5728-8

  • DOI: https://doi.org/10.1007/s11042-018-5728-8
