Human pose estimation model based on DiracNets and integral pose regression

Xu, Xinzheng; Guo, Yanyan; Wang, Xin

doi:10.1007/s11042-023-15057-x

Published: 14 March 2023

Volume 82, pages 36019–36039, (2023)

Multimedia Tools and Applications

Abstract

Human pose estimation has made great progress in recent years. However, many methods enlarge the receptive field of the network by applying max pooling, average pooling, or simple downsampling through strided convolution to the feature map, which loses original feature information and introduces quantization errors. To address these problems, we propose the RM-IPR-DDHPE model. Specifically, we first propose the DDHPE model, which uses Mask R-CNN as the backbone network. In this model, we replace the residual module with an improved Dirac network module (DiracNets) to adaptively learn deeper features. In addition, we adopt detail-preserving pooling (DPP), which amplifies spatial changes, to counter the loss of key details in traditional pooling methods. Building on these improvements, we construct the RM-IPR-DDHPE model, which combines the Ranger optimizer, the Mish activation function, and integral pose regression to avoid quantization errors and to improve gradient propagation and the network structure. We validate the classification ability of DDHPE on the CIFAR datasets and the keypoint prediction performance of RM-IPR-DDHPE on the MSCOCO2014 and MPII datasets. The results of DDHPE on CIFAR-10 and CIFAR-100 are 95.27 and 77.51, respectively. On MSCOCO2014, RM-IPR-DDHPE achieves AP, AP⁵⁰, AP⁷⁵, AP^M, and AP^L of 78.0, 93.9, 85.4, 74.3, and 84.9, respectively. On MPII, the mean accuracy (mAP) over all keypoints is 94.1. The results show that DDHPE has good feature extraction ability, and that RM-IPR-DDHPE improves prediction accuracy while resolving the quantization error in the joint point estimation of the DDHPE network.
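
The quantization issue mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It only contrasts a hard argmax over a heatmap, which can return integer grid positions only, with the expectation over a softmax-normalized heatmap that integral regression generally relies on. The function names, the 8x8 grid, the synthetic Gaussian heatmap, and the keypoint location (3.3, 2.6) are all assumptions chosen for illustration. A scalar Mish function, x * tanh(softplus(x)), is included for reference.

```python
import math


def mish(x):
    """Mish activation for a scalar: x * tanh(softplus(x))."""
    # Numerically stable softplus: log(1 + e^x) = max(x, 0) + log1p(e^-|x|)
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def hard_argmax_2d(heatmap):
    """Return the (x, y) grid cell with the highest score (integer-valued)."""
    _, col, row = max(
        (value, c, r)
        for r, heat_row in enumerate(heatmap)
        for c, value in enumerate(heat_row)
    )
    return col, row


def soft_argmax_2d(heatmap):
    """Return the expected (x, y) under a softmax over the whole heatmap."""
    peak = max(max(heat_row) for heat_row in heatmap)
    # Subtract the peak before exponentiating to avoid overflow.
    weights = [[math.exp(v - peak) for v in heat_row] for heat_row in heatmap]
    total = sum(sum(w_row) for w_row in weights)
    x = sum(w * c for w_row in weights for c, w in enumerate(w_row)) / total
    y = sum(w * r for r, w_row in enumerate(weights) for w in w_row) / total
    return x, y


# Hypothetical 8x8 heatmap of log-scores for a keypoint at (3.3, 2.6).
TRUE_X, TRUE_Y, SIGMA = 3.3, 2.6, 0.8
heatmap = [
    [-((c - TRUE_X) ** 2 + (r - TRUE_Y) ** 2) / (2 * SIGMA ** 2) for c in range(8)]
    for r in range(8)
]

hard_xy = hard_argmax_2d(heatmap)  # snaps to the nearest grid cell
soft_xy = soft_argmax_2d(heatmap)  # continuous, sub-pixel estimate

print("hard argmax:", hard_xy)
print("soft argmax:", soft_xy)
print("mish(1.0):", mish(1.0))
```

In this toy setting the hard argmax snaps to the nearest grid cell, while the soft estimate stays close to the assumed sub-pixel location. Because the expectation is differentiable, it can also be trained end to end, unlike the argmax.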

Data availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Acknowledgements

We thank anonymous reviewers for valuable suggestions.

Funding

This work is supported by the National Natural Science Foundation of China (No. 61976217), the Fundamental Research Funds for the Central Universities (No. 2019XKQYMS87), and the Science and Technology Planning Project of Xuzhou (No. KC21193).

Author information

Authors and Affiliations

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
Xinzheng Xu, Yanyan Guo & Xin Wang

Corresponding author

Correspondence to Xinzheng Xu.

Ethics declarations

Conflict of Interest

The authors have no conflict of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, X., Guo, Y. & Wang, X. Human pose estimation model based on DiracNets and integral pose regression. Multimed Tools Appl 82, 36019–36039 (2023). https://doi.org/10.1007/s11042-023-15057-x

Received: 28 April 2022
Revised: 04 October 2022
Accepted: 27 February 2023
Published: 14 March 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11042-023-15057-x
