Abstract
Human pose estimation has made great progress in recent years. However, many methods enlarge the receptive field of the network by applying max pooling, average pooling, or simple downsampling in the form of strided convolution to the feature map. This causes loss of the original feature information and introduces quantization errors. To solve these problems, we propose the RM-IPR-DDHPE model. Specifically, we first propose the DDHPE model, which uses Mask R-CNN as the backbone network. In this model, we replace the residual module with an improved Dirac network module (DiracNets) to adaptively learn deeper features. In addition, we adopt the detail-preserving pooling (DPP) method, which amplifies spatial changes, to address the loss of key details in traditional pooling methods. Building on these improvements, we construct the RM-IPR-DDHPE model from the Ranger optimizer, the Mish activation function and integral pose regression. This model avoids quantization errors and improves both the gradient propagation and the structure of the network. We validate the classification ability of DDHPE on the CIFAR datasets, and we validate the keypoint prediction performance of RM-IPR-DDHPE on the MSCOCO2014 and MPII datasets. DDHPE achieves classification accuracies of 95.27% on CIFAR-10 and 77.51% on CIFAR-100. On MSCOCO2014, RM-IPR-DDHPE obtains 78.0 AP, 93.9 AP50, 85.4 AP75, 74.3 APM and 84.9 APL. On MPII, its mean accuracy (mAP) over all keypoints is 94.1. The results show that DDHPE has good feature extraction ability. They also show that RM-IPR-DDHPE improves prediction accuracy while eliminating the quantization error in DDHPE's joint point estimation.
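The abstract replaces the residual module with an improved DiracNets module. In the original DiracNets formulation (Zagoruyko and Komodakis), the identity shortcut of a residual block is folded into the convolution weights as Ŵ = diag(a)·I + diag(b)·W_norm, where I is a Dirac-delta (identity) kernel. The NumPy sketch below shows only that original parameterization. It is not the authors' improved module, whose details are not given here. The helper names (dirac_kernel, conv2d_same, dirac_weight) are hypothetical, and W_norm is assumed to be each output filter rescaled to unit L2 norm.

```python
import numpy as np


def dirac_kernel(channels: int, k: int = 3) -> np.ndarray:
    """Identity ("Dirac delta") kernel of shape (C, C, k, k).

    Output channel i copies input channel i through the kernel centre,
    so a 'same'-padded convolution with this kernel returns its input.
    """
    w = np.zeros((channels, channels, k, k))
    c = k // 2
    for i in range(channels):
        w[i, i, c, c] = 1.0
    return w


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' cross-correlation with stride 1 and odd kernel size.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) weights.
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))          # zero padding
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]       # (C_in, H, W)
            out += np.einsum("oi,ihw->ohw", w[:, :, dy, dx], patch)
    return out


def dirac_weight(w: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Dirac parameterisation W_hat = diag(a) I + diag(b) W_norm.

    W_norm rescales each output filter of w to unit L2 norm (an assumption here);
    a and b are per-output-channel scales of shape (C,).
    """
    norms = np.sqrt((w ** 2).sum(axis=(1, 2, 3), keepdims=True))
    w_norm = w / norms
    eye = dirac_kernel(w.shape[0], w.shape[2])
    return a[:, None, None, None] * eye + b[:, None, None, None] * w_norm


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))          # toy feature map: 4 channels, 8x8
w = rng.standard_normal((4, 4, 3, 3))
y = conv2d_same(x, dirac_weight(w, a=np.ones(4), b=np.full(4, 0.1)))
print(y.shape)  # (4, 8, 8)
```

Convolution is linear, so convolving with Ŵ gives a·x + b·(W_norm ∗ x). This has the same form as a scaled residual connection, but it uses a single kernel and no separate skip branch.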
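Detail-preserving pooling (Saeedan et al.) computes each pooled value as a weighted average over the pooling window. The weights are ω[p,q] = α + ρ_λ(I[q] − Ĩ[p]), so a pixel's weight grows with its deviation from a smoothed guide value Ĩ[p]. As a result, strong local changes are amplified rather than averaged away.

The sketch below is a simplified NumPy illustration under several assumptions:
- It uses non-overlapping 2×2 windows.
- It uses the plain window mean as the guide Ĩ[p], a simplification of the smoothed, downscaled guide in the original formulation.
- It uses the penalty ρ_λ(d) = (d² + ε²)^(λ/2), with an optional asymmetric variant that only rewards values above the guide.
- α and λ are fixed numbers here, not trained parameters.

The function name dpp_2x2 is hypothetical.

```python
import numpy as np


def dpp_2x2(img, alpha=0.0, lam=2.0, eps=1e-3, asymmetric=False):
    """Detail-preserving-style pooling over non-overlapping 2x2 windows.

    Each output is a weighted mean of its window with weights
    alpha + rho_lam(I[q] - guide), where rho_lam(d) = (d**2 + eps**2) ** (lam / 2).
    The guide is the plain window mean (a simplification for illustration).
    """
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    # (h2, w2, 4): the four pixels of each 2x2 window; an odd last row/column is dropped
    win = (img[:2 * h2, :2 * w2]
           .reshape(h2, 2, w2, 2)
           .transpose(0, 2, 1, 3)
           .reshape(h2, w2, 4))
    guide = win.mean(axis=-1, keepdims=True)
    diff = win - guide
    if asymmetric:
        diff = np.maximum(diff, 0.0)   # only reward values above the guide
    weights = alpha + (diff ** 2 + eps ** 2) ** (lam / 2.0)
    return (weights * win).sum(axis=-1) / weights.sum(axis=-1)


# A single bright pixel in an otherwise dark 2x2 window.
spot = np.array([[0.0, 0.0], [0.0, 1.0]])
print(dpp_2x2(spot, lam=0.0))            # plain average of the window
print(dpp_2x2(spot))                     # pulled toward the bright detail
print(dpp_2x2(spot, asymmetric=True))    # close to max pooling
```

With λ = 0 every weight equals α + 1, so the function reduces to average pooling. With the asymmetric penalty and α = 0, it approaches max pooling for an isolated bright pixel. The parameters therefore let the operator move between the two traditional pooling methods.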
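Ranger is commonly described as the combination of two optimizers:
- RAdam is Adam with a rectification term that damps the adaptive step while the second-moment estimate is still unreliable.
- Lookahead wraps the inner optimizer and, every k steps, pulls a set of slow weights toward the fast weights.

The NumPy sketch below implements that combination on a plain parameter vector. The hyperparameter values are illustrative assumptions, not the settings used for RM-IPR-DDHPE. ranger_minimize and grad_fn are hypothetical names: grad_fn is a user-supplied function returning the gradient.

```python
import numpy as np


def ranger_minimize(grad_fn, theta0, lr=0.1, betas=(0.95, 0.999), eps=1e-8,
                    k=6, alpha=0.5, steps=1800):
    """Ranger-style minimisation: RAdam inner updates wrapped in Lookahead.

    grad_fn(theta) must return the gradient at theta (hypothetical helper).
    Returns the Lookahead "slow" weights after the final synchronisation.
    """
    b1, b2 = betas
    fast = np.asarray(theta0, dtype=float).copy()
    slow = fast.copy()
    m = np.zeros_like(fast)          # first-moment estimate
    v = np.zeros_like(fast)          # second-moment estimate
    rho_inf = 2.0 / (1.0 - b2) - 1.0
    for t in range(1, steps + 1):
        g = grad_fn(fast)
        m = b1 * m + (1.0 - b1) * g
        v = b2 * v + (1.0 - b2) * g * g
        m_hat = m / (1.0 - b1 ** t)
        # length of the approximated simple moving average used by RAdam
        rho_t = rho_inf - 2.0 * t * b2 ** t / (1.0 - b2 ** t)
        if rho_t > 4.0:
            # variance is tractable: adaptive step scaled by the rectification r_t
            v_hat = np.sqrt(v / (1.0 - b2 ** t))
            r_t = np.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf /
                          ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
            fast -= lr * r_t * m_hat / (v_hat + eps)
        else:
            # first few steps: un-adapted momentum SGD
            fast -= lr * m_hat
        if t % k == 0:
            # Lookahead: move slow weights toward fast weights, then restart from them
            slow += alpha * (fast - slow)
            fast = slow.copy()
    return slow


# Minimise 0.5 * ||theta - target||^2, whose gradient is theta - target.
target = np.array([1.0, -1.0])
theta = ranger_minimize(lambda th: th - target, np.zeros(2))
print(theta)
```

In a network, the same per-parameter update would be applied to every weight tensor. The slow weights give Lookahead its smoothing effect on the trajectory of the fast RAdam iterates.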
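The Mish activation (Misra) is defined as mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ). Unlike ReLU, it is smooth and non-monotonic, with a small negative region. Its derivative is also nonzero for negative inputs, which is the property usually cited for better gradient flow.

The NumPy sketch below implements the function and its analytic derivative. It is not taken from the authors' code. The function names are illustrative.

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus: ln(1 + e^x)."""
    return np.logaddexp(0.0, x)


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * np.tanh(softplus(x))


def mish_grad(x):
    """Analytic derivative of Mish.

    d/dx mish(x) = tanh(sp) + x * (1 - tanh(sp)**2) * sigmoid(x),
    using d/dx softplus(x) = sigmoid(x).
    """
    t = np.tanh(softplus(x))
    sigmoid = np.exp(-np.logaddexp(0.0, -x))   # stable 1 / (1 + e^-x)
    return t + x * (1.0 - t ** 2) * sigmoid


xs = np.linspace(-3.0, 3.0, 7)
print(mish(xs))
print(mish_grad(xs))
```

For large positive x, mish(x) approaches x. Its minimum is about −0.31, near x ≈ −1.19. Unlike ReLU, the gradient does not vanish for moderately negative inputs.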
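Integral pose regression (Sun et al.) replaces the argmax over a heatmap with its expectation:
1. A softmax normalizes the raw heatmap scores into a probability map p.
2. The joint location is computed as the expected position Σ p(x, y)·(x, y).

This is differentiable and is not restricted to integer grid positions, which is how it avoids the quantization error of argmax decoding.

The NumPy sketch below assumes a single 2-D heatmap of raw scores (logits). It compares the two decoders on a synthetic peak placed between grid cells. The function names are illustrative, and the sketch does not show how RM-IPR-DDHPE itself produces its heatmaps.

```python
import numpy as np


def soft_argmax_2d(logits):
    """Integral (soft-argmax) decoding of one heatmap of raw scores.

    Softmax-normalises the map into probabilities p[y, x] and returns
    the expected coordinates (sum of x * p, sum of y * p).
    """
    h, w = logits.shape
    p = np.exp(logits - logits.max())   # subtract max for numerical stability
    p /= p.sum()
    x = float((p.sum(axis=0) * np.arange(w)).sum())   # marginal distribution over x (columns)
    y = float((p.sum(axis=1) * np.arange(h)).sum())   # marginal distribution over y (rows)
    return x, y


def hard_argmax_2d(logits):
    """Conventional decoding: grid position of the maximum (integer-valued)."""
    y, x = np.unravel_index(np.argmax(logits), logits.shape)
    return float(x), float(y)


# Synthetic peak centred between grid cells, given as Gaussian log-scores.
ys, xs = np.mgrid[0:64, 0:64]
true_x, true_y, sigma = 20.3, 33.7, 2.0
logits = -((xs - true_x) ** 2 + (ys - true_y) ** 2) / (2 * sigma ** 2)
print(hard_argmax_2d(logits))   # (20.0, 34.0): snapped to the nearest grid cell
print(soft_argmax_2d(logits))   # sub-pixel estimate
```

On this synthetic peak at (20.3, 33.7), the argmax snaps to (20, 34), an error of 0.3 heatmap pixels per axis. The soft-argmax recovers the sub-pixel location almost exactly. If heatmaps are predicted at, for example, one quarter of the input resolution, the 0.3-pixel error becomes 1.2 pixels in the input image.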
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2D Human Pose Estimation: New Benchmark and State of the Art Analysis. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3686–3693. https://doi.org/10.1109/CVPR.2014.471
Bin Y, Chen ZM, Wei XS, Chen X, Gao C, Sang N (2020) Structure-aware human pose estimation with graph convolutional networks. Pattern Recogn 106:107410. https://doi.org/10.1016/j.patcog.2020.107410
Cao Z, Simon T, Wei SE, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7291–7299. https://doi.org/10.1109/cvpr.2017.143
Cao D, Liu W, Xing W, Wei X (2022) Human pose estimation based on feature enhancement and multi-scale feature fusion. SIViP:1–8. https://doi.org/10.1007/s11760-022-02271-7
Carreira J, Agrawal P, Fragkiadaki K, Malik J (2016) Human Pose Estimation with Iterative Error Feedback. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4733–4742. https://doi.org/10.1109/CVPR.2016.512
Chen X, Yuille A (2014) Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations. https://doi.org/10.48550/arXiv.1407.3399
Chen Y, Shen C, Wei X-S, Liu L, Yang J (2017) Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation. https://doi.org/10.48550/arXiv.1705.00389
Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multi-person pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7103–7112. https://doi.org/10.48550/arXiv.1711.07319
Chen J, Lei B, Song Q, Ying H, Chen DZ, Wu J (2020) A hierarchical graph network for 3D object detection on point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 392–401. https://doi.org/10.1109/CVPR42600.2020.00047
Cheng B, Xiao B, Wang J, Shi H, Huang TS, Zhang L (2020) HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 5385–5394. https://doi.org/10.1109/CVPR42600.2020.00543
Chu X, Yang W, Ouyang W, Ma C, Yuille AL, Wang X (2017) Multi-Context Attention for Human Pose Estimation. https://doi.org/10.48550/arXiv.1702.07432
Clevert DA, Unterthiner T, Hochreiter S (2015) Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289. https://doi.org/10.48550/arXiv.1511.07289
Fang H-S, Xie S, Tai Y-W, Lu C (2016) RMPE: Regional Multi-person Pose Estimation. https://doi.org/10.48550/arXiv.1612.00137
Felzenszwalb PF, Huttenlocher DP (2005) Pictorial Structures for Object Recognition. Int J Comput Vis 61:55–79. https://doi.org/10.1023/B:VISI.0000042934.15159.49
He K, Zhang X, Ren S, Sun J (2016) Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 2961–2969. https://doi.org/10.48550/arXiv.1703.06870
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 2980–2988. https://doi.org/10.1109/ICCV.2017.322
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7132–7141. https://doi.org/10.48550/arXiv.1709.01507
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4700–4708. https://doi.org/10.48550/arXiv.1608.06993
Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. In: Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol 9910. Springer Verlag. https://doi.org/10.1007/978-3-319-46466-4_3
Kreiss S, Bertoni L, Alahi A (2019) PifPaf: Composite Fields for Human Pose Estimation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 11969–11978. https://doi.org/10.1109/CVPR.2019.01225
Kumar M, Jindal MK, Kumar M (2022) Distortion, rotation and scale invariant recognition of hollow Hindi characters. Sādhanā 47(2):1–6. https://doi.org/10.1007/s12046-022-01847-w
Li W, Wang Z, Yin B, Peng Q, Du Y, Xiao T, ..., Sun J (2019) Rethinking on multi-stage networks for human pose estimation. arXiv preprint arXiv:1901.00148. https://doi.org/10.48550/arXiv.1901.00148
Li R, Huang H, Zheng Y (2022) Human Pose Estimation Based on Lite HRNet with Coordinate Attention. In: 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), pp 1166–1170. https://doi.org/10.1109/ICSP54964.2022.9778346
Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint arXiv:1312.4400. https://doi.org/10.48550/arXiv.1312.4400
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3431–3440. https://doi.org/10.1109/TPAMI.2016.2572683
Luvizon DC, Tabia H, Picard D (2017) Human Pose Regression by Combining Indirect Part Detection and Contextual Information. https://doi.org/10.1016/j.cag.2019.09.002
Marusov A, Kaprielova M, Neychev R (2022) Enhancing Human Pose Estimation with Privileged Learning. In: 2022 31st Conference of Open Innovations Association (FRUCT), pp 174–180. https://doi.org/10.23919/FRUCT54823.2022.9770903
Misra D (2019) Mish: A self regularized non-monotonic neural activation function. arXiv preprint arXiv:1908.08681. https://doi.org/10.48550/arXiv.1908.08681
Newell A, Yang K, Deng J (2016) Stacked Hourglass Networks for Human Pose Estimation. https://doi.org/10.1007/978-3-319-46484-8_29
Newell A, Huang Z, Deng J (2017) Associative embedding: End-to-end learning for joint detection and grouping. Adv Neural Inf Proces Syst 30. https://doi.org/10.48550/arXiv.1611.05424
Ou Z, Luo Y, Chen J, Chen G (2022) SRFNet: selective receptive field network for human pose estimation. J Supercomput 78(1):691–711. https://doi.org/10.1007/s11042-017-5537-5
Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4903–4911. https://doi.org/10.48550/arXiv.1701.01779
Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler P, Schiele B (2016) DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4929–4937. https://doi.org/10.1109/CVPR.2016.533
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inf Proces Syst 28:1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550. https://doi.org/10.48550/arXiv.1412.6550
Saeedan F, Weber N, Goesele M, Roth S (2018) Detail-Preserving Pooling in Deep Networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 9108–9116. https://doi.org/10.1109/CVPR.2018.00949
Sarafianos N, Boteanu B, Ionescu B, Kakadiaris IA (2016) 3D Human pose estimation: A review of the literature and analysis of covariates. Comput Vis Image Underst 152:1–20. https://doi.org/10.1016/j.cviu.2016.09.002
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. https://doi.org/10.48550/arXiv.1409.1556
Sun X, Shang J, Liang S, Wei Y (2017) Compositional Human Pose Regression. https://doi.org/10.48550/arXiv.1704.00159. Accessed 18 July 2022
Sun X, Xiao B, Wei F, Liang S, Wei Y (2018) Integral human pose regression. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 529–545. https://doi.org/10.48550/arXiv.1711.08229
Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 5693–5703. https://doi.org/10.1109/CVPR.2019.00584
Tong Q, Liang G, Bi J (2022) Calibrating the adaptive learning rate to improve convergence of ADAM. Neurocomputing 481:333–356. https://doi.org/10.1016/j.neucom.2022.01.014
Toshev A, Szegedy C (2014) DeepPose: Human Pose Estimation via Deep Neural Networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1653–1660. https://doi.org/10.1109/CVPR.2014.214
Wang M, Tighe J, Modolo D (2020) Combining Detection and Tracking for Human Pose Estimation in Videos. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 11085–11093. https://doi.org/10.1109/CVPR42600.2020.01110
Wang Z, Lu Y, Ni W, Song L (2021) An RGB-D Based Approach for Human Pose Estimation. In: 2021 International Conference on Networking Systems of AI (INSAI), pp 166–170. https://doi.org/10.1109/INSAI54028.2021.00039
Wang W, Zhang K, Ren H, Wei D, Gao Y, Liu J (2022) UULPN: An ultra-lightweight network for human pose estimation based on unbiased data processing. Neurocomputing 480:220–233. https://doi.org/10.1016/j.neucom.2021.12.083
Wei S-E, Ramakrishna V, Kanade T, Sheikh Y (2016) Convolutional Pose Machines. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4724–4732. https://doi.org/10.1109/CVPR.2016.511
Wu X, Chen Z, Liu H (2021) Learning Delicate Pixel-Level Representations for Bottom-Up Human Pose Estimation. In: 2021 IEEE International Conference on Engineering, Technology & Education (TALE), pp 1–6. https://doi.org/10.1109/TALE52509.2021.9678618
Xiang X, Zong W, Li G (2022) Learnable Upsampling-Based Point Cloud Semantic Segmentation. In: 2022 7th International Conference on Image, Vision and Computing (ICIVC), pp 340–347. https://doi.org/10.1109/ICIVC55077.2022.9886287
Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1492–1500. https://doi.org/10.48550/arXiv.1611.05431
Xiong S, Qu Z, Wang Y, Wang X, Xia H (2021) MLP-Pose: Human Pose Estimation by MLP-Mixer. In: 2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS), pp 183–187. https://doi.org/10.1109/CCIS53392.2021.9754658
Zagoruyko S, Komodakis N (2017) DiracNets: Training Very Deep Neural Networks Without Skip-Connections. https://doi.org/10.48550/arXiv.1706.00388
Acknowledgements
We thank anonymous reviewers for valuable suggestions.
Funding
This work is supported by the National Natural Science Foundation of China (No. 61976217), the Fundamental Research Funds for the Central Universities (No. 2019XKQYMS87), and the Science and Technology Planning Project of Xuzhou (No. KC21193).
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, X., Guo, Y. & Wang, X. Human pose estimation model based on DiracNets and integral pose regression. Multimed Tools Appl 82, 36019–36039 (2023). https://doi.org/10.1007/s11042-023-15057-x
DOI: https://doi.org/10.1007/s11042-023-15057-x