Pedestrian Attribute Recognition in Video Surveillance Scenarios Based on View-attribute Attention Localization

Chen, Wei-Chen; Yu, Xin-Yi; Ou, Lin-Lin

doi:10.1007/s11633-022-1321-8

Research Article
Published: 07 January 2022

Machine Intelligence Research, Volume 19, pages 153–168 (2022)

Abstract

Pedestrian attribute recognition in surveillance scenarios remains challenging because specific attributes are localized inaccurately. In this paper, we propose a novel view-attribute localization method based on attention (VALA), which uses view information to guide the recognition process toward specific attributes and an attention mechanism to localize the areas corresponding to those attributes. Concretely, the view prediction branch leverages view information to generate four view weights that represent the confidences for attributes from different views. The view weights are then fed back to compose specific view-attributes, which participate in and supervise deep feature extraction. To explore the spatial location of a view-attribute, regional attention is introduced to aggregate spatial information and encode inter-channel dependencies of the view feature. Subsequently, a fine attentive attribute-specific region is localized, and the regional attention yields regional weights for the view-attribute at different spatial locations. The final view-attribute recognition outcome is obtained by combining the view weights with the regional weights. Experiments on three widely used datasets (richly annotated pedestrian (RAP), richly annotated pedestrian v2 (RAPv2), and PA-100K) demonstrate the effectiveness of our approach compared with state-of-the-art methods.
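
The abstract gives no formulas, so the following is only a minimal sketch of how view weights and regional weights might be combined. It is not the authors' implementation. Every name in it (`view_attribute_logits`, `regional_logits`, `predict`), the view order, the softmax normalization, and the additive fusion followed by a sigmoid are assumptions made purely for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def view_attribute_logits(view_logits, per_view_attr_logits):
    """Weight per-view attribute logits by softmax view confidences.

    view_logits: four raw scores, one per view (the order front, back,
        left, right is hypothetical; the abstract only says "four").
    per_view_attr_logits: four rows, each with one logit per attribute.
    """
    view_weights = softmax(view_logits)
    num_attrs = len(per_view_attr_logits[0])
    fused = [
        sum(w * row[a] for w, row in zip(view_weights, per_view_attr_logits))
        for a in range(num_attrs)
    ]
    return view_weights, fused


def regional_logits(region_scores, region_attr_logits):
    """Pool region-level logits with attention weights, per attribute.

    region_scores[a]: raw attention scores over regions for attribute a.
    region_attr_logits[a]: logit of attribute a at each region.
    """
    pooled = []
    for scores, logits in zip(region_scores, region_attr_logits):
        weights = softmax(scores)  # regional weights for this attribute
        pooled.append(sum(w * l for w, l in zip(weights, logits)))
    return pooled


def predict(view_logits, per_view_attr_logits, region_scores, region_attr_logits):
    """Assumed fusion: add both logits, then apply a sigmoid per attribute."""
    _, view_part = view_attribute_logits(view_logits, per_view_attr_logits)
    region_part = regional_logits(region_scores, region_attr_logits)
    return [sigmoid(v + r) for v, r in zip(view_part, region_part)]


if __name__ == "__main__":
    # Two attributes, four views, three regions; all numbers are made up.
    probs = predict(
        view_logits=[2.0, 0.1, 0.1, 0.1],
        per_view_attr_logits=[[1.5, -1.0], [0.2, 0.3], [0.0, 0.0], [0.0, 0.0]],
        region_scores=[[3.0, 0.0, 0.0], [0.0, 0.0, 3.0]],
        region_attr_logits=[[2.0, -1.0, -1.0], [-1.0, -1.0, 0.5]],
    )
    print(probs)
```

In this sketch the view weights decide how much each view-specific prediction counts, and the regional weights decide how much each spatial location counts. Both weightings shape the final per-attribute probability.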

References

P. Sudowe, H. Spitzer, B. Leibe. Person attribute recognition with a jointly-trained holistic CNN model. In Proceedings of IEEE International Conference on Computer Vision Workshop, IEEE, Santiago, Chile, pp. 329–337, 2015. DOI: https://doi.org/10.1109/ICCVW.2015.51.
D. W. Li, X. T. Chen, Z. Zhang, K. Q. Huang. Pose guided deep model for pedestrian attribute recognition in surveillance scenarios. In Proceedings of IEEE International Conference on Multimedia and Expo, IEEE, San Diego, USA, pp. 1–6, 2018. DOI: https://doi.org/10.1109/ICME.2018.8486604.
L. Bourdev, S. Maji, J. Malik. Describing people: A poselet-based approach to attribute classification. In Proceedings of International Conference on Computer Vision, IEEE, Barcelona, Spain, pp. 1543–1550, 2011. DOI: https://doi.org/10.1109/ICCV.2011.6126413.
P. Z. Liu, X. H. Liu, J. J. Yan, J. Shao. Localization guided learning for pedestrian attribute recognition, [Online], Available: https://arxiv.org/abs/1808.09102, 2018.
D. W. Li, Z. Zhang, X. T. Chen, K. Q. Huang. A richly annotated pedestrian dataset for person retrieval in real surveillance scenarios. IEEE Transactions on Image Processing, vol. 28, no. 4, pp. 1575–1590, 2019. DOI: https://doi.org/10.1109/TIP.2018.2878349.
N. Sarafianos, X. Xu, I. A. Kakadiaris. Deep imbalanced attribute classification using visual attention aggregation. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 708–725, 2018. DOI: https://doi.org/10.1007/978-3-030-01252-6_42.
E. Yaghoubi, D. Borza, J. Neves, A. Kumar, H. Proença. An attention-based deep learning model for multiple pedestrian attributes recognition. Image and Vision Computing, vol. 102, Article number 103981, 2020. DOI: https://doi.org/10.1016/j.imavis.2020.103981.
M. D. Wu, D. Huang, Y. F. Guo, Y. H. Wang. Distraction-aware feature learning for human attribute recognition via coarse-to-fine attention mechanism. In Proceedings of AAAI Conference on Artificial Intelligence, vol. 34, no. 7, pp. 12394–12401, 2020. DOI: https://doi.org/10.1609/aaai.v34i07.6925.
J. Hu, L. Shen, G. Sun. Squeeze-and-excitation networks. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp. 7132–7141, 2018. DOI: https://doi.org/10.1109/CVPR.2018.00745.
S. Woo, J. Park, J. Y. Lee, I. S. Kweon. CBAM: Convolutional block attention module. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 3–19, 2018. DOI: https://doi.org/10.1007/978-3-030-01234-2_1.
K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun. Deep residual learning for image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 770–778, 2016. DOI: https://doi.org/10.1109/CVPR.2016.90.
D. W. Li, Z. Zhang, X. T. Chen, H. B. Ling, K. Q. Huang. A richly annotated dataset for pedestrian attribute recognition, [Online], Available: https://arxiv.org/abs/1603.07054, April 27, 2016.
X. H. Liu, H. Y. Zhao, M. Q. Tian, L. Sheng, J. Shao, S. Yi, J. J. Yan, X. G. Wang. HydraPlus-Net: Attentive deep features for pedestrian analysis. In Proceedings of IEEE International Conference on Computer Vision, IEEE, Venice, Italy, pp. 350–359, 2017. DOI: https://doi.org/10.1109/ICCV.2017.46.
C. Su, S. L. Zhang, J. L. Xing, W. Gao, Q. Tian. Deep attributes driven multi-camera person re-identification. In Proceedings of the 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 475–491, 2016. DOI: https://doi.org/10.1007/978-3-319-46475-6_30.
Y. T. Lin, L. Zheng, Z. D. Zheng, Y. Wu, Z. L. Hu, C. G. Yan, Y. Yang. Improving person re-identification by attribute and identity learning. Pattern Recognition, vol. 95, pp. 151–161, 2019. DOI: https://doi.org/10.1016/j.patcog.2019.06.006.
Z. D. Zheng, X. D. Yang, Z. D. Yu, L. Zheng, Y. Yang, J. Kautz. Joint discriminative and generative learning for person re-identification. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 2133–2142, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00224.
Y. L. Tian, P. Luo, X. G. Wang, X. O. Tang. Pedestrian detection aided by deep learning semantic tasks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 5079–5087, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7299143.
X. B. Liu, Y. L. Xu, L. Zhu, Y. D. Mu. A stochastic attribute grammar for robust cross-view human tracking. IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 10, pp. 2884–2895, 2018. DOI: https://doi.org/10.1109/TCSVT.2017.2781738.
X. W. Wang, T. Zhang, D. R. Tretter, Q. Lin. Personal clothing retrieval on photo collections by color and attributes. IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 2035–2045, 2013. DOI: https://doi.org/10.1109/TMM.2013.2279658.
R. Feris, R. Bobbitt, L. Brown, S. Pankanti. Attribute-based people search: Lessons learnt from a practical surveillance system. In Proceedings of International Conference on Multimedia Retrieval, ACM, Glasgow, UK, pp. 153–160, 2014. DOI: https://doi.org/10.1145/2578726.2578732.
J. E. Liu, B. Kuipers, S. Savarese. Recognizing human actions by attributes. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Colorado Springs, USA, pp. 3337–3344, 2011. DOI: https://doi.org/10.1109/CVPR.2011.5995353.
X. F. Ji, Q. Q. Wu, Z. J. Ju, Y. Y. Wang. Study of human action recognition based on improved spatio-temporal features. International Journal of Automation and Computing, vol. 11, no. 5, pp. 500–509, 2014. DOI: https://doi.org/10.1007/s11633-014-0831-4.
L. F. Wu, Q. Wang, M. Jian, Y. Qiao, B. X. Zhao. A comprehensive review of group activity recognition in videos. International Journal of Automation and Computing, vol. 18, no. 3, pp. 334–350, 2021. DOI: https://doi.org/10.1007/s11633-020-1258-8.
Z. W. Xu, X. J. Wu, J. Kittler. STRNet: Triple-stream spatiotemporal relation network for action recognition. International Journal of Automation and Computing, vol. 18, no. 5, pp. 718–730, 2021. DOI: https://doi.org/10.1007/s11633-021-1289-9.
M. Fayyaz, J. Gall. SCT: Set constrained temporal transformer for set supervised action segmentation. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 498–507, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00058.
J. Li, S. Todorovic. Set-constrained viterbi for set-supervised action segmentation. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 10817–10826, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.01083.
Y. F. Huang, Y. Sugano, Y. Sato. Improving action segmentation via graph-based temporal reasoning. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 14021–14031, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.01404.
J. Chen, Z. H. Li, J. B. Luo, C. L. Xu. Learning a weakly-supervised video actor-action segmentation model with a wise selection. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 9898–9908, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00992.
N. Dalal, B. Triggs. Histograms of oriented gradients for human detection. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, San Diego, USA, pp. 886–893, 2005. DOI: https://doi.org/10.1109/CVPR.2005.177.
R. Layne, T. Hospedales, S. G. Gong. Person re-identification by attributes. In Proceedings of British Machine Vision Conference, Surrey, UK, Article number 24, 2012. DOI: https://doi.org/10.5244/C.26.24.
D. W. Li, X. T. Chen, K. Q. Huang. Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios. In Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition, IEEE, Kuala Lumpur, Malaysia, pp. 111–115, 2015. DOI: https://doi.org/10.1109/ACPR.2015.7486476.
J. J. Zhang, P. Y. Ren, J. M. Li. Deep template matching for pedestrian attribute recognition with the auxiliary supervision of attribute-wise keypoints, [Online], Available: https://arxiv.org/abs/2011.06798, November 13, 2020.
J. Y. Wang, X. T. Zhu, S. G. Gong, W. Li. Attribute recognition by joint recurrent learning of context and correlation. In Proceedings of IEEE International Conference on Computer Vision, IEEE, Venice, Italy, pp. 531–540, 2017. DOI: https://doi.org/10.1109/ICCV.2017.65.
X. Zhao, L. F. Sang, G. G. Ding, Y. C. Guo, X. M. Jin. Grouping attribute recognition for pedestrian with joint recurrent learning. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI, Stockholm, Sweden, pp. 3177–3183, 2018. DOI: https://doi.org/10.24963/ijcai.2018/441.
C. F. Tang, L. Sheng, Z. X. Zhang, X. L. Hu. Improving pedestrian attribute recognition with weakly-supervised multi-scale attribute-specific localization. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Korea, pp. 4996–5005, 2019. DOI: https://doi.org/10.1109/ICCV.2019.00510.
C. L. Zitnick, P. Dollár. Edge boxes: Locating object proposals from edges. In Proceedings of the 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 391–405, 2014. DOI: https://doi.org/10.1007/978-3-319-10602-1_26.
Z. X. Feng, J. H. Lai, X. H. Xie. Learning view-specific deep networks for person re-identification. IEEE Transactions on Image Processing, vol. 27, no. 7, pp. 3472–3483, 2018. DOI: https://doi.org/10.1109/TIP.2018.2818438.
S. S. Farfade, M. J. Saberian, L. J. Li. Multi-view face detection using deep convolutional neural networks. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, ACM, Shanghai, China, pp. 643–650, 2015. DOI: https://doi.org/10.1145/2671188.2749408.
H. Sadr, M. M. Pedram, M. Teshnehlab. Multi-view deep network: A deep model based on learning features from heterogeneous neural networks for sentiment analysis. IEEE Access, vol. 8, pp. 86984–86997, 2020. DOI: https://doi.org/10.1109/ACCESS.2020.2992063.
F. Zhu, H. S. Li, W. L. Ouyang, N. H. Yu, X. G. Wang. Learning spatial regularization with image-level supervisions for multi-label image classification. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Honolulu, USA, pp. 2027–2036, 2017. DOI: https://doi.org/10.1109/CVPR.2017.219.
Z. C. Tan, Y. Yang, J. Wan, H. Y. Hang, G. D. Guo, S. Z. Li. Attention-based pedestrian attribute analysis. IEEE Transactions on Image Processing, vol. 28, no. 12, pp. 6126–6140, 2019. DOI: https://doi.org/10.1109/TIP.2019.2919199.
C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI Press, San Francisco, USA, pp. 4278–4284, 2017.
C. Szegedy, W. Liu, Y. Q. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich. Going deeper with convolutions. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 1–9, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7298594.
S. Ioffe, C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, JMLR, Lille, France, pp. 448–456, 2015.
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna. Rethinking the inception architecture for computer vision. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 2818–2826, 2016. DOI: https://doi.org/10.1109/CVPR.2016.308.
H. Cai, C. Gan, T. Z. Wang, Z. K. Zhang, S. Han. Once-for-all: Train one network and specialize it for efficient deployment, [Online], Available: https://arxiv.org/abs/1908.09791, 2019.
A. Howard, M. Sandler, B. Chen, W. J. Wang, L. C. Chen, M. X. Tan, G. Chu, V. Vasudevan, Y. K. Zhu, R. M. Pang, H. Adam, Q. Le. Searching for MobileNetV3. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Korea, pp. 1314–1324, 2019. DOI: https://doi.org/10.1109/ICCV.2019.00140.
Y. B. Deng, P. Luo, C. C. Loy, X. O. Tang. Pedestrian attribute recognition at far distance. In Proceedings of the 22nd ACM International Conference on Multimedia, ACM, Orlando, USA, pp. 789–792, 2014. DOI: https://doi.org/10.1145/2647868.2654966.
M. S. Sarfraz, A. Schumann, Y. Wang, R. Stiefelhagen. Deep view-sensitive pedestrian attribute inference in an end-to-end model, [Online], Available: https://arxiv.org/abs/1707.06089, July 19, 2017.
H. Guo, K. Zheng, X. C. Fan, H. K. Yu, S. Wang. Visual attention consistency under image transforms for multi-label image classification. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 729–739, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00082.
J. Jia, H. J. Huang, W. J. Yang, X. T. Chen, K. Q. Huang. Rethinking of pedestrian attribute recognition: Realistic datasets with efficient method, [Online], Available: https://arxiv.org/abs/2005.11909, May 26, 2020.
H. T. Zeng, H. Z. Ai, Z. J. Zhuang, L. Chen. Multi-task learning via co-attentive sharing for pedestrian attribute recognition. In Proceedings of IEEE International Conference on Multimedia and Expo, IEEE, London, UK, pp. 1–6, 2020. DOI: https://doi.org/10.1109/ICME46284.2020.9102757.
X. Y. Yu, W. C. Chen, Y. F. Jin, L. L. Ou. Pedestrian View-attribute Location and Recognition Method in Video Surveillance Scene Based on Attention Mechanism, CN113361336A, September 2021. (in Chinese)

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2018YFB1308000) and the Natural Science Foundation of Zhejiang Province (No. LY21F030018).

Author information

Authors and Affiliations

College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
Wei-Chen Chen, Xin-Yi Yu & Lin-Lin Ou

Corresponding authors

Correspondence to Xin-Yi Yu or Lin-Lin Ou.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Wei-Chen Chen received the B. Eng. degree in new energy science and engineering from Tianjin University of Technology, China in 2020. She is currently a master's student at Zhejiang University of Technology, China.

Her research interests include person detection, person re-identification, person tracking and pedestrian attribute recognition.

Xin-Yi Yu received the B. Eng. and Ph. D. degrees in mechanical design and theory from Harbin Institute of Technology, China in 2003 and 2009, respectively. He was engaged in postdoctoral work at Foshan Enterprise Postdoctoral Workstation, China from 2009 to 2012. He has been with College of Information Engineering, Zhejiang University of Technology, China, as a lecturer since 2012. He was a recipient of the China Machinery Industry Technology.

His research interests include human-robot integration and intelligent manufacturing systems.

Lin-Lin Ou received the B. Eng. and Ph. D. degrees in control science and engineering from Shanghai Jiao Tong University, China in 2001 and 2006, respectively. She was with College of Information Engineering, Zhejiang University of Technology, China, as a lecturer from 2006 to 2007 and later as an associate professor from 2008 to 2012. She has been a professor since 2013. She was a recipient of the China Machinery Industry Science and Technology.

Her research interests include intelligent learning and robot systems, multi-robot collaborative control, and human-robot integration.

About this article

Cite this article

Chen, WC., Yu, XY. & Ou, LL. Pedestrian Attribute Recognition in Video Surveillance Scenarios Based on View-attribute Attention Localization. Mach. Intell. Res. 19, 153–168 (2022). https://doi.org/10.1007/s11633-022-1321-8

Received: 20 June 2021
Accepted: 03 December 2021
Published: 07 January 2022
Issue Date: April 2022
DOI: https://doi.org/10.1007/s11633-022-1321-8
