Person retrieval in surveillance videos using attribute recognition

Galiyawala, Hiren; Raval, Mehul S.; Patel, Meet

doi:10.1007/s12652-022-03891-0

Original Research
Published: 20 May 2022

Volume 15, pages 291–303 (2024)

Journal of Ambient Intelligence and Humanized Computing

Abstract

In person attribute recognition (PAR), an individual is described by his or her appearance. PAR-based person retrieval is a cross-modal problem in which the input is a textual description of a person’s appearance and the output is an image of that person. The paper develops a PAR model by merging the large-scale RAP dataset with the person retrieval benchmark dataset of AVSS 2018 challenge II. It uses a single deep network to detect a person’s attributes. The proposed approach uses five attributes: age, upper body (uBody) clothing color, uBody clothing type, lower body (lBody) clothing color, and lBody clothing type. Mask R-CNN is used for person detection, and the approach weighs each attribute to generate a ranking score for every detected person. Unlike existing approaches, the proposed method uses a single deep network and fewer attributes to achieve a state-of-the-art average intersection-over-union (IoU) of 66.7%, retrieval with IoU \(\ge\) 0.4 of 85.6%, and an average true positive rate (TPR) of 85.30%. This improves on the existing state of the art in attribute-recognition-based person retrieval by 10.80% in average IoU, 5.94% in retrieval with IoU \(\ge\) 0.4, and 3.85% in TPR.
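
The ranking and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the attribute weights, the form of the score, or the data layout, so the weight values, the attribute keys, the `Detection` class, and the choice of a weighted sum of predicted probabilities are all assumptions made here for illustration. The `iou` function is the standard intersection-over-union of two axis-aligned boxes, as used in the reported metrics.

```python
from dataclasses import dataclass

# Assumed weights for the five attributes named in the abstract.
# The actual values used in the paper are not stated there.
WEIGHTS = {
    "age": 0.10,
    "ubody_color": 0.30,
    "ubody_type": 0.20,
    "lbody_color": 0.25,
    "lbody_type": 0.15,
}


@dataclass
class Detection:
    """Hypothetical container for one detected person."""
    box: tuple   # (x1, y1, x2, y2) in pixels
    probs: dict  # attribute name -> {label: predicted probability}


def ranking_score(det, query, weights=WEIGHTS):
    """Weighted sum of the predicted probabilities of the queried labels.

    This linear form is an assumption; the abstract only says that each
    attribute is weighted to produce a ranking score.
    """
    return sum(
        weights.get(attr, 0.0) * det.probs.get(attr, {}).get(label, 0.0)
        for attr, label in query.items()
    )


def retrieve(detections, query):
    """Return the detection with the highest ranking score."""
    return max(detections, key=lambda d: ranking_score(d, query))


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Illustrative query and detections (made-up values, not from the paper).
query = {
    "age": "adult",
    "ubody_color": "red",
    "ubody_type": "shirt",
    "lbody_color": "blue",
    "lbody_type": "jeans",
}
match = Detection(
    box=(0, 0, 10, 10),
    probs={
        "age": {"adult": 0.9},
        "ubody_color": {"red": 0.8},
        "ubody_type": {"shirt": 0.7},
        "lbody_color": {"blue": 0.6},
        "lbody_type": {"jeans": 0.5},
    },
)
other = Detection(
    box=(50, 50, 60, 60),
    probs={
        "age": {"adult": 0.9},
        "ubody_color": {"red": 0.1},
        "ubody_type": {"shirt": 0.7},
        "lbody_color": {"blue": 0.6},
        "lbody_type": {"jeans": 0.5},
    },
)
best = retrieve([other, match], query)
ground_truth = (5, 0, 15, 10)
retrieved_ok = iou(best.box, ground_truth) >= 0.4
```

Under these assumptions, a retrieval would count toward the IoU \(\ge\) 0.4 figure only when the top-ranked box overlaps the ground-truth box by at least that ratio; in the toy data above the overlap is one third, so `retrieved_ok` is false.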

Notes

https://github.com/simondenman/SemanticSearchChallengeAVSS18.

References

Chen D, Li H, Liu X, Shen Y, Shao J, Yuan Z, Wang X (2018) Improving deep visual representation for person re-identification by global and local image-language association. In: Proceedings of the European conference on computer vision (ECCV), pp 54–70
Denman S, Fookes C, Bialkowski A, Sridharan S (2009) Soft-biometrics: unconstrained authentication in a surveillance environment. In: Proceedings 2009 digital image computing: techniques and applications (DICTA). IEEE, pp 196–203
Denman S, Halstead M, Fookes C, Sridharan S (2015) Searching for people using semantic soft biometric descriptions. Pattern Recognit Lett 68(2):306–315. https://doi.org/10.1016/j.patrec.2015.06.015
Galiyawala H, Raval MS (2021) Person retrieval in surveillance using textual query: a review. Multim Tools Appl 80(18):27343–27383. https://doi.org/10.1007/s11042-021-10983-0
Galiyawala H, Shah K, Gajjar V, Raval MS (2018) Person retrieval in surveillance video using height, color and gender. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6
Galiyawala H, Raval MS, Dave S (2019) Visual appearance based person retrieval in unconstrained environment videos. Image Vis Comput. https://doi.org/10.1016/j.imavis.2019.10.002
Galiyawala H, Raval MS, Laddha A (2020) Person retrieval in surveillance videos using deep soft biometrics. In: Richard J, Chang-Tsun L, Danny C, Weizhi M, Christophe R (eds) Deep biometrics. Springer, pp 191–214
Galiyawala H, Raval MS, Savaliya D (2021) Dsa-pr: discrete soft biometric attribute-based person retrieval in surveillance videos. In: 2021 17th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–7
Gao S, Cheng M, Zhao K, Zhang X, Yang M, Torr P (2019) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662. https://doi.org/10.1109/TPAMI.2019.2938758
Halstead M, Denman S, Sridharan S, Fookes C (2014) Locating people in video from semantic descriptions: a new database and approach. In: 2014 22nd international conference on pattern recognition (ICPR). IEEE, pp 4501–4506
Halstead M, Denman S, Fookes C, Tian Y, Nixon MS (2018) Semantic person retrieval in surveillance using soft biometrics: AVSS 2018 challenge II. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 770–778
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2961–2969
Huang G, Liu Z, Van Der Maaten L, Weinberger K (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4700–4708
Jain AK, Dass SC, Nandakumar K (2004) Can soft biometric traits assist user recognition? In: Biometric technology for human identification, vol 5404, pp 561–572
Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25
Li D, Chen X, Huang K (2015) Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios. In: 2015 3rd IAPR Asian conference on pattern recognition (ACPR). IEEE, pp 111–115
Li D, Zhang Z, Chen X, Huang K (2018) A richly annotated pedestrian dataset for person retrieval in real surveillance scenarios. IEEE Trans Image Process 28(4):1575–1590. https://doi.org/10.1109/TIP.2018.2878349
Lin T, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV). IEEE, pp 2980–2988
Sakib S, Deb K, Dhar P, Kwon O (2022) A framework for pedestrian attribute recognition using deep learning. Appl Sci 12(2):622. https://doi.org/10.3390/app12020622
Schumann A, Specker A, Beyerer J (2018) Attribute-based person retrieval and search in video sequences. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6
Shah P, Raval MS, Pandya S, Chaudhary S, Laddha A, Galiyawala H (2017) Description based person identification: use of clothes color and type. In: National conference on computer vision, pattern recognition, image processing, and graphics. Springer, pp 457–469
Shah P, Garg A, Gajjar V (2021) Per-vis: Person retrieval in video surveillance using semantic description. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision (WACV), pp 41–50
Specker A, Beyerer J (2021) Improving attribute-based person retrieval by using a calibrated, weighted, and distribution-based distance metric. In: 2021 IEEE international conference on image processing (ICIP). IEEE, pp 2378–2382
Sudowe P, Spitzer H, Leibe B (2015) Person attribute recognition with a jointly-trained holistic CNN model. In: Proceedings of the IEEE international conference on computer vision workshops. IEEE, pp 87–95
Tsai R (1987) A versatile camera calibration technique for high-accuracy 3d machine vision metrology using off-the-shelf tv cameras and lenses. IEEE J Robot Autom 3(4):323–344. https://doi.org/10.1109/JRA.1987.1087109
Yaguchi T, Nixon MS (2018) Transfer learning based approach for semantic person retrieval. In: 2018 15th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–6
Zhao Y, Shen C, Yu X, Chen H, Gao Y, Xiong S (2021) Learning deep part-aware embedding for person retrieval. Pattern Recognit. https://doi.org/10.1016/j.patcog.2021.107938
Zhao Y, Yam G, Lu J, Bian Z, Tian J (2022) Flsrnet: pedestrian attribute recognition using focal label smoothing regularization. Signal Image Video Process. https://doi.org/10.1007/s11760-021-02099-7
Zhen L, Hu P, Wang X, Peng D (2019) Deep supervised cross-modal retrieval. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 10394–10403

Acknowledgements

The authors acknowledge NVIDIA Corporation’s support by way of a donation of the Quadro K5200 GPU used for this research. We would also like to thank the AVSS 2018 challenge II organizers and Mr. Dhyey Savaliya for providing inputs at various stages.

Author information

Authors and Affiliations

School of Engineering and Applied Science, Ahmedabad University, Ahmedabad, Gujarat, 380009, India
Hiren Galiyawala & Mehul S. Raval
L. D. College of Engineering, Ahmedabad, Gujarat, 380015, India
Meet Patel

Corresponding author

Correspondence to Hiren Galiyawala.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Galiyawala, H., Raval, M.S. & Patel, M. Person retrieval in surveillance videos using attribute recognition. J Ambient Intell Human Comput 15, 291–303 (2024). https://doi.org/10.1007/s12652-022-03891-0

Received: 06 July 2021
Accepted: 28 April 2022
Published: 20 May 2022
Issue Date: January 2024
DOI: https://doi.org/10.1007/s12652-022-03891-0
