Sign language recognition based on skeleton and SK3D-Residual network

Multimedia Tools and Applications

Abstract

Most existing deep-learning methods for dynamic sign language recognition operate directly on the whole RGB video sequence rather than on only the frames that represent the change of gesture, which makes it difficult to achieve good accuracy. To address this problem, this paper proposes a sign language recognition method based on skeleton data and an SK3D-Residual network. Within the SK3D-Residual network, a key frame optimization algorithm based on mutual information is designed for skeleton sequences. A 3D-LSTM module then extracts spatiotemporal features from the skeleton key frame sequences, analyzes the features of each action in the sequence, and recognizes the sign. The method achieves an experimental accuracy of 88.6%; combining RGB and skeleton information gives an accuracy of 93.2%.
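The abstract does not spell out the key frame optimization algorithm, so the following is only a minimal sketch of the general idea of selecting key frames from a skeleton sequence with mutual information. It is not the authors' Algorithm 1. The function names, the `bins` and `threshold` parameters, the assumption that coordinates are normalized to [0, 1], and the rule "keep a frame when its mutual information with the last kept frame drops below a threshold" are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def quantize(frame, bins=8, lo=0.0, hi=1.0):
    """Map each coordinate of a flattened skeleton frame to a bin index.

    Assumes coordinates are normalized to [lo, hi] (an illustrative assumption).
    """
    width = (hi - lo) / bins
    return [min(bins - 1, max(0, int((v - lo) / width))) for v in frame]

def mutual_information(a, b):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in pab.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (pa[x] * pb[y]))
    return mi

def select_key_frames(frames, threshold, bins=8):
    """Return indices of frames kept as key frames.

    Hypothetical rule: the first frame is always kept; a later frame is kept
    when its mutual information with the last kept frame is below `threshold`,
    i.e. when it shares little information with the current key frame.
    """
    if not frames:
        return []
    q = [quantize(f, bins) for f in frames]
    keys = [0]
    for i in range(1, len(frames)):
        if mutual_information(q[keys[-1]], q[i]) < threshold:
            keys.append(i)
    return keys

# Toy sequence of four frames with four coordinates each.
a = [0.1, 0.1, 0.6, 0.6]
b = [0.1, 0.6, 0.1, 0.6]
print(select_key_frames([a, a, b, b], threshold=0.5, bins=2))
```

In this toy sequence the repeated frames share all of their information with the current key frame and are skipped, while the first frame of the new pose is kept. Mutual information measures statistical dependence rather than geometric distance, so a real pipeline would need many more joint coordinates per frame and a tuned bin count for the estimate to be meaningful.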

Data availability

No new datasets were generated in this paper. The datasets used in the experiments are existing, available datasets.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62076117, 61762061 and 62166026) and the Jiangxi Key Laboratory of Smart City (Grant No. 20192BCD40002).

Author information

Corresponding author

Correspondence to Weidong Min.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, Q., Huangfu, Z., Min, W. et al. Sign language recognition based on skeleton and SK3D-Residual network. Multimed Tools Appl 83, 18059–18072 (2024). https://doi.org/10.1007/s11042-023-16117-y

  • DOI: https://doi.org/10.1007/s11042-023-16117-y
