Abstract
The performance of multi-person pose estimation has improved greatly with the rapid development of deep learning. However, the problems of self-occlusion, mutual occlusion and complex backgrounds have not yet been effectively solved. To address these problems, we design a novel Global and Local Content-aware Feature Boosting Network (GLCFBNet) that comprises an Intra-layer Feature Residual-like Module (IFRM), an Input Feature Aggregation Module (IFAM) and a Spatial and Channel Feature Hourglass Attention Module (SCFHAM). The IFRM expands the receptive field of each convolution layer through feature aggregation. The IFAM fully extracts the edge information of the input image and effectively addresses the negative impact of the background. The SCFHAM accurately determines the location of occluded keypoints, assesses the global information of plausible keypoints, and extracts the features that are effective for joint localization from redundant feature information. We evaluate the effectiveness of the proposed method on the MSCOCO keypoint detection dataset, the MPII Human Pose dataset and the CrowdPose dataset.
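The claim that feature aggregation inside a layer expands the receptive field can be illustrated with a small calculation. The abstract does not describe the internals of the IFRM, so the sketch below assumes a hypothetical Res2Net-style split, in which each channel group receives the output of the previous group before its own 3x3 convolution. The function names and the branch layout are illustrative assumptions, not the authors' implementation.

```python
def stacked_receptive_field(kernel_sizes, strides=None):
    """Receptive field, in input pixels, of a chain of convolutions."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump = spacing between adjacent outputs, in input pixels
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def residual_like_branch_fields(num_branches, kernel=3):
    """Receptive field of each branch in an ASSUMED Res2Net-style split.

    Branch 0 is passed through unchanged. Branch i (i >= 1) adds the output
    of branch i - 1 to its own group and applies one kernel x kernel
    convolution, so its features have passed through i convolutions.
    """
    return [stacked_receptive_field([kernel] * i) for i in range(num_branches)]


if __name__ == "__main__":
    # Four branches with 3x3 convolutions give fields of 1, 3, 5 and 7 pixels.
    print(residual_like_branch_fields(4))
```

Under this assumption, concatenating the branches gives a single layer access to several receptive-field sizes at once, which is one plausible reading of "expanding the receptive field through feature aggregation".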
Data Availability
The datasets analysed in this study are available from the following public domain resources. The MPII Human Pose dataset: http://human-pose.mpi-inf.mpg.de/. The MSCOCO keypoint detection dataset: https://cocodataset.org/. The CrowdPose dataset: https://github.com/Jeff-sjtu/CrowdPose
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Zou, X., Bi, X. & Yu, C. Improving Human Pose Estimation Based on Stacked Hourglass Network. Neural Process Lett 55, 9521–9544 (2023). https://doi.org/10.1007/s11063-023-11212-5
DOI: https://doi.org/10.1007/s11063-023-11212-5