Sign Language Recognition (SLR): A Brisk Paired Deep Metric Attention Learning (BPDMAL) Model for Video Data Applications

  • Original Research
SN Computer Science

Abstract

Existing sign language recognition (SLR) models have been shown to lack precision in identifying a sign because they cannot adequately discriminate between classes. Specifically, trained SLR models are sensitive to small variations in hand movements and finger shapes across signs in a video sequence. To overcome this problem, this work proposes to learn a class label by computing a metric variable that squeezes the displacement between within-class and across-class labels. In general, metric learning on video data is considerably slower than other deep SLR classification architectures because of the triplet pairing process: in traditional triplet pairing, all frames in all classes participate in training in each episode. In contrast, this paper proposes a self-sourced singular pairing process between the anchor and positive frames, combined with an attention mechanism, resulting in the Brisk Paired Deep Metric Attention Learning (BPDMAL) model. BPDMAL, integrated with standard deep learning architectures, is evaluated on our 2D video sign language dataset, KL2DSL, and on two other benchmark video-based sign language datasets. The proposed BPDMAL improves performance over traditional deep metric learning (DML) and state-of-the-art deep learning SLR models, with an incremental reduction in training and inference times, making it a useful model for real-time deployment.
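
To make the cost argument concrete, the following minimal Python sketch contrasts exhaustive triplet pairing with a reduced scheme in which each anchor receives a single positive and a single negative. It is an illustration only: the abstract does not specify the BPDMAL pairing rule or its attention mechanism, so the function names (`exhaustive_triplets`, `single_pair_triplets`), the selection rule (first same-class frame, first other-class frame), the margin value, and the toy labels are all assumptions. The loss shown is the standard triplet margin loss, not necessarily the loss used in the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def exhaustive_triplets(labels):
    """All (anchor, positive, negative) index triplets over every frame.

    This mirrors traditional pairing, where every frame of every class
    takes part in training.
    """
    n = len(labels)
    return [
        (a, p, q)
        for a in range(n)
        for p in range(n)
        for q in range(n)
        if a != p and labels[a] == labels[p] and labels[a] != labels[q]
    ]


def single_pair_triplets(labels):
    """Hypothetical reduced scheme: one positive and one negative per anchor.

    The selection rule here (first same-class frame, first other-class frame)
    is an assumption made for illustration, not the paper's rule.
    """
    n = len(labels)
    triplets = []
    for a in range(n):
        same = [i for i in range(n) if i != a and labels[i] == labels[a]]
        other = [i for i in range(n) if labels[i] != labels[a]]
        if same and other:
            triplets.append((a, same[0], other[0]))
    return triplets


# Toy example: three frames for each of two hypothetical sign classes.
labels = ["hello"] * 3 + ["thanks"] * 3
print(len(exhaustive_triplets(labels)))   # number of triplets with exhaustive pairing
print(len(single_pair_triplets(labels)))  # number of triplets with one pair per anchor
```

With exhaustive pairing the number of triplets grows roughly cubically with the number of frames, whereas one pairing per anchor grows linearly, which is the kind of saving the abstract attributes to a singular pairing process.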

Availability of Data and Materials

Data will be made available on reasonable request.

Author information

Corresponding author

Correspondence to P. V. V. Kishore.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest of any kind in relation to this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kishore, P.V.V., Anil Kumar, D. & Srinivasa Rao, K. Sign Language Recognition (SLR): A Brisk Paired Deep Metric Attention Learning (BPDMAL) Model for Video Data Applications. SN COMPUT. SCI. 5, 419 (2024). https://doi.org/10.1007/s42979-024-02793-6

  • DOI: https://doi.org/10.1007/s42979-024-02793-6
