Sign language recognition based on skeleton and SK3D-Residual network

Han, Qing; Huangfu, Zhanlu; Min, Weidong; Ding, TianQi; Liao, Yanqiu

doi:10.1007/s11042-023-16117-y

Published: 22 July 2023

Volume 83, pages 18059–18072, (2024)

Multimedia Tools and Applications

Qing Han^1,2,3, Zhanlu Huangfu^4, Weidong Min^1,2,3 (ORCID: orcid.org/0000-0003-2526-2181), TianQi Ding^1 & Yanqiu Liao^1

Abstract

Most existing deep-learning methods for dynamic sign language recognition use the entire RGB video sequence directly, rather than only the part of the sequence that represents the change of the gesture. This makes it difficult to achieve high recognition accuracy. To address this problem, this paper proposes a sign language recognition method based on skeleton data and an SK3D-Residual network. Within the SK3D-Residual network, a key frame optimization algorithm based on mutual information is designed for skeleton sequences. A 3D-LSTM module then extracts spatiotemporal features from the skeleton key frame sequence, analyzes the features of each action in the sequence, and recognizes the sign. The method achieves an experimental accuracy of 88.6%, and combining RGB and skeleton information raises the accuracy to 93.2%, which indicates good recognition performance.
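The abstract names a mutual-information-based key frame optimization step but does not describe how it works. The Python sketch below shows one hypothetical way such a step could be built; it is not the authors' algorithm. It assumes each frame is a list of 2D joint coordinates normalized to [0, 1], quantizes the joints onto a `bins × bins` grid, estimates normalized mutual information (NMI) between two frames by treating the joint index as the sample index, and greedily keeps a frame as a new key frame when its NMI with the most recent key frame falls below `threshold`. The function names, the grid quantization, and the default `bins` and `threshold` values are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical frame format (an assumption, not from the paper):
# one (x, y) pair per joint, coordinates normalized to [0, 1].
Frame = Sequence[Tuple[float, float]]


def quantize(frame: Frame, bins: int) -> List[int]:
    """Map each joint to the index of its cell on a bins x bins grid."""
    symbols = []
    for x, y in frame:
        bx = min(int(x * bins), bins - 1)  # clamp x == 1.0 into the last cell
        by = min(int(y * bins), bins - 1)
        symbols.append(bx * bins + by)
    return symbols


def entropy(symbols: List[int]) -> float:
    """Shannon entropy (in nats) of the empirical symbol distribution."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def mutual_information(a: List[int], b: List[int]) -> float:
    """MI between two quantized frames; joint j of a is paired with joint j of b."""
    if len(a) != len(b):
        raise ValueError("frames must contain the same number of joints")
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (sa, sb), c in pab.items():
        p_joint = c / n
        mi += p_joint * math.log(p_joint / ((pa[sa] / n) * (pb[sb] / n)))
    return mi


def normalized_mi(a: List[int], b: List[int]) -> float:
    """Symmetric NMI = 2 * I(A; B) / (H(A) + H(B)), in [0, 1]."""
    h = entropy(a) + entropy(b)
    # Two frames with all joints in a single cell carry no information; treat them as identical.
    return 1.0 if h == 0 else 2.0 * mutual_information(a, b) / h


def select_key_frames(frames: Sequence[Frame], bins: int = 4,
                      threshold: float = 0.8) -> List[int]:
    """Greedy key frame selection (hypothetical sketch, not the paper's method).

    The first frame is always a key frame; a later frame becomes a key frame
    when its NMI with the last selected key frame drops below `threshold`.
    """
    if not frames:
        return []
    symbols = [quantize(f, bins) for f in frames]
    keys = [0]
    for t in range(1, len(frames)):
        if normalized_mi(symbols[keys[-1]], symbols[t]) < threshold:
            keys.append(t)
    return keys
```

For example, with `bins=2`, a sequence that alternates between two distinct hand configurations A, A, B, B, A keeps frames 0, 2 and 4. One design caveat: MI does not change under any one-to-one relabeling of grid cells, so this measure responds to changes in how the joints group together (the hand shape) rather than to a rigid translation of the whole hand; a practical system would likely pair it with a motion cue.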

Data availability

No new datasets were generated in this paper. The datasets used in the experiments are publicly available.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62076117, 61762061 and 62166026) and the Jiangxi Key Laboratory of Smart City (Grant No. 20192BCD40002).

Author information

Authors and Affiliations

School of Mathematics and Computer Science, Nanchang University, Xuefu Avenue, Nanchang, 330031, Jiangxi Province, China
Qing Han, Weidong Min, TianQi Ding & Yanqiu Liao
Jiangxi Key Laboratory of Smart City, Nanchang University, Xuefu Avenue, Nanchang, 330031, Jiangxi Province, China
Qing Han & Weidong Min
Institute of Metaverse, Nanchang University, Xuefu Avenue, Nanchang, 330031, Jiangxi Province, China
Qing Han & Weidong Min
School of Software, Nanchang University, Nanjing East Road, Nanchang, 330047, Jiangxi Province, China
Zhanlu Huangfu

Corresponding author

Correspondence to Weidong Min.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, Q., Huangfu, Z., Min, W. et al. Sign language recognition based on skeleton and SK3D-Residual network. Multimed Tools Appl 83, 18059–18072 (2024). https://doi.org/10.1007/s11042-023-16117-y

Received: 21 January 2022
Revised: 30 March 2023
Accepted: 26 June 2023
Published: 22 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-16117-y
