Person retrieval in surveillance videos using attribute recognition

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

In person attribute recognition (PAR), an individual is described by his or her appearance. PAR-based person retrieval is a cross-modal problem: the input is a textual description of the person's appearance and the output is an image of the person. The paper develops a PAR model by merging the large-scale RAP dataset with the person retrieval benchmark dataset of AVSS 2018 challenge II. A single deep network detects the person's attributes. The proposed approach uses five attributes: age, upper body (uBody) clothing color, uBody clothing type, lower body (lBody) clothing color, and lBody clothing type. Mask R-CNN is used for person detection, and the approach weighs each attribute to generate a ranking score for every detected person. Unlike existing approaches, the proposed method uses a single deep network and fewer attributes, and achieves a state-of-the-art average intersection-over-union (IoU) of 66.7%, a retrieval rate of 85.6% at IoU \(\ge\) 0.4, and an average true positive rate (TPR) of 85.30%. This improves on the existing state of the art in attribute-based person retrieval by 10.80% in average IoU, 5.94% in retrieval at IoU \(\ge\) 0.4, and 3.85% in TPR.
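The ranking and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weight values, the score formula, or the data layout, so the names `Detection`, `WEIGHTS`, `ranking_score`, and `retrieve`, the equal weights, and the weighted-sum form of the score are all assumptions made for illustration. Only the IoU function follows the standard definition (intersection area divided by union area of two boxes).

```python
from dataclasses import dataclass

# Hypothetical weights: the abstract says each attribute is weighted
# but does not state the values, so equal weights are assumed here.
WEIGHTS = {
    "age": 1.0,
    "ubody_color": 1.0,
    "ubody_type": 1.0,
    "lbody_color": 1.0,
    "lbody_type": 1.0,
}


@dataclass
class Detection:
    """One detected person (illustrative structure, not from the paper)."""
    box: tuple   # (x1, y1, x2, y2) bounding box in pixels
    probs: dict  # attribute name -> {label: predicted probability}


def ranking_score(det, query, weights=WEIGHTS):
    """Assumed score: weighted sum of the probabilities that the
    attribute network assigns to the labels named in the query."""
    return sum(
        w * det.probs.get(attr, {}).get(query[attr], 0.0)
        for attr, w in weights.items()
        if attr in query
    )


def retrieve(detections, query, weights=WEIGHTS):
    """Return the detection with the highest ranking score."""
    return max(detections, key=lambda d: ranking_score(d, query, weights))


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

Under these assumptions, a frame is processed by scoring every detected person against the query, taking the top-ranked detection, and comparing its box with the ground-truth box; a retrieval counts toward the IoU \(\ge\) 0.4 figure when `iou(predicted_box, ground_truth_box) >= 0.4`.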

Notes

  1. https://github.com/simondenman/SemanticSearchChallengeAVSS18.

References

  • Chen D, Li H, Liu X, Shen Y, Shao J, Yuan Z, Wang X (2018) Improving deep visual representation for person re-identification by global and local image-language association. In: Proceedings of the European conference on computer vision (ECCV), pp 54–70

  • Denman S, Fookes C, Bialkowski A, Sridharan S (2009) Soft-biometrics: unconstrained authentication in a surveillance environment. In: Proceedings 2009 digital image computing: techniques and applications (DICTA). IEEE, pp 196–203

  • Denman S, Halstead M, Fookes C, Sridharan S (2015) Searching for people using semantic soft biometric descriptions. Pattern Recognit Lett 68(2):306–315. https://doi.org/10.1016/j.patrec.2015.06.015

  • Galiyawala H, Raval MS (2021) Person retrieval in surveillance using textual query: a review. Multim Tools Appl 80(18):27343–27383. https://doi.org/10.1007/s11042-021-10983-0

  • Galiyawala H, Shah K, Gajjar V, Raval MS (2018) Person retrieval in surveillance video using height, color and gender. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6

  • Galiyawala H, Raval MS, Dave S (2019) Visual appearance based person retrieval in unconstrained environment videos. Image Vis Comput. https://doi.org/10.1016/j.imavis.2019.10.002

  • Galiyawala H, Raval MS, Laddha A (2020) Person retrieval in surveillance videos using deep soft biometrics. In: Richard J, Chang-Tsun L, Danny C, Weizhi M, Christophe R (eds) Deep biometrics. Springer, pp 191–214

  • Galiyawala H, Raval MS, Savaliya D (2021) Dsa-pr: discrete soft biometric attribute-based person retrieval in surveillance videos. In: 2021 17th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–7

  • Gao S, Cheng M, Zhao K, Zhang X, Yang M, Torr P (2019) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662. https://doi.org/10.1109/TPAMI.2019.2938758

  • Halstead M, Denman S, Sridharan S, Fookes C (2014) Locating people in video from semantic descriptions: a new database and approach. In: 2014 22nd international conference on pattern recognition (ICPR). IEEE, pp 4501–4506

  • Halstead M, Denman S, Fookes C, Tian Y, Nixon MS (2018) Semantic person retrieval in surveillance using soft biometrics: AVSS 2018 challenge II. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 770–778

  • He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2961–2969

  • Huang G, Liu Z, Van Der Maaten L, Weinberger K (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4700–4708

  • Jain AK, Dass SC, Nandakumar K (2004) Can soft biometric traits assist user recognition? In: Biometric technology for human identification, vol 5404, pp 561–572

  • Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25

  • Li D, Chen X, Huang K (2015) Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios. In: 2015 3rd IAPR Asian conference on pattern recognition (ACPR). IEEE, pp 111–115

  • Li D, Zhang Z, Chen X, Huang K (2018) A richly annotated pedestrian dataset for person retrieval in real surveillance scenarios. IEEE Trans Image Process 28(4):1575–1590. https://doi.org/10.1109/TIP.2018.2878349

  • Lin T, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV). IEEE, pp 2980–2988

  • Sakib S, Deb K, Dhar P, Kwon O (2022) A framework for pedestrian attribute recognition using deep learning. Appl Sci 12(2):622. https://doi.org/10.3390/app12020622

  • Schumann A, Specker A, Beyerer J (2018) Attribute-based person retrieval and search in video sequences. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6

  • Shah P, Raval MS, Pandya S, Chaudhary S, Laddha A, Galiyawala H (2017) Description based person identification: use of clothes color and type. In: National conference on computer vision, pattern recognition, image processing, and graphics. Springer, pp 457–469

  • Shah P, Garg A, Gajjar V (2021) Per-vis: Person retrieval in video surveillance using semantic description. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision (WACV), pp 41–50

  • Specker A, Beyerer J (2021) Improving attribute-based person retrieval by using a calibrated, weighted, and distribution-based distance metric. In: 2021 IEEE international conference on image processing (ICIP). IEEE, pp 2378–2382

  • Sudowe P, Spitzer H, Leibe B (2015) Person attribute recognition with a jointly-trained holistic CNN model. In: Proceedings of the IEEE international conference on computer vision workshops. IEEE, pp 87–95

  • Tsai R (1987) A versatile camera calibration technique for high-accuracy 3d machine vision metrology using off-the-shelf tv cameras and lenses. IEEE J Robot Autom 3(4):323–344. https://doi.org/10.1109/JRA.1987.1087109

  • Yaguchi T, Nixon MS (2018) Transfer learning based approach for semantic person retrieval. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6

  • Zhao Y, Shen C, Yu X, Chen H, Gao Y, Xiong S (2021) Learning deep part-aware embedding for person retrieval. Pattern Recognit. https://doi.org/10.1016/j.patcog.2021.107938

  • Zhao Y, Yam G, Lu J, Bian Z, Tian J (2022) Flsrnet: pedestrian attribute recognition using focal label smoothing regularization. Signal Image Video Process. https://doi.org/10.1007/s11760-021-02099-7

  • Zhen L, Hu P, Wang X, Peng D (2019) Deep supervised cross-modal retrieval. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 10394–10403

Acknowledgements

The authors acknowledge NVIDIA Corporation’s support by way of a donation of the Quadro K5200 GPU used for this research. We would also like to thank the AVSS 2018 challenge II organizers and Mr. Dhyey Savaliya for providing inputs at various stages.

Author information

Corresponding author

Correspondence to Hiren Galiyawala.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Galiyawala, H., Raval, M.S. & Patel, M. Person retrieval in surveillance videos using attribute recognition. J Ambient Intell Human Comput 15, 291–303 (2024). https://doi.org/10.1007/s12652-022-03891-0

  • DOI: https://doi.org/10.1007/s12652-022-03891-0
