Sign Language Recognition (SLR): A Brisk Paired Deep Metric Attention Learning (BPDMAL) Model for Video Data Applications

Kishore, P. V. V.; Anil Kumar, D.; Srinivasa Rao, K.

doi:10.1007/s42979-024-02793-6

Original Research
Published: 10 April 2024

SN Computer Science, Volume 5, article number 419 (2024)

P. V. V. Kishore¹, D. Anil Kumar² & K. Srinivasa Rao³

Abstract

Existing sign language recognition (SLR) models have been shown to lack precision in identifying a sign because they cannot reliably discriminate between classes. Specifically, trained SLR models are sensitive to small variations in hand movements and finger shapes across signs in a video sequence. To overcome this problem, this work proposes to learn a class label by computing a metric variable that squeezes the displacement between within-class and across-class labels. Metric learning is generally considerably slower than other deep SLR classification architectures on video data because of the triplet pairing process: in traditional triplet pairing, all frames in all classes participate in training in each episode. In contrast, this paper proposes a self-sourced singular pairing process between the anchor and positive frames, combined with an attention mechanism, resulting in the Brisk Paired Deep Metric Attention Learning (BPDMAL) model. BPDMAL, integrated with standard deep learning architectures, is evaluated on our 2D video sign language dataset, KL2DSL, and on two other benchmark video-based sign language datasets. The proposed BPDMAL improves on the performance of traditional deep metric learning (DML) and state-of-the-art deep learning SLR models, with an incremental reduction in training and inference times, making it a useful model for real-time deployment.
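
To make the cost argument concrete, the following is a minimal Python sketch of the standard triplet margin loss, together with a comparison of how many triplets exhaustive pairing produces against a reduced scheme that uses a single positive per anchor. It is illustrative only: the full text is not available in this preview, so the function names, the margin value, and the random choice of one positive and one negative per anchor are assumptions and not the authors' BPDMAL procedure. The attention mechanism is not modelled.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 1.0 is an arbitrary illustrative default.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def exhaustive_triplets(labels):
    """All (anchor, positive, negative) index triples over every frame.

    This mirrors conventional pairing, where all frames in all classes
    take part, so the count grows roughly cubically with the frame count.
    """
    n = len(labels)
    return [
        (a, p, q)
        for a in range(n)
        for p in range(n)
        for q in range(n)
        if a != p and labels[a] == labels[p] and labels[a] != labels[q]
    ]


def single_positive_triplets(labels, rng):
    """Hypothetical reduced pairing: one positive and one negative per anchor.

    Assumption for illustration only; the paper's self-sourced singular
    pairing may select frames differently.
    """
    triplets = []
    for a, label in enumerate(labels):
        positives = [i for i, l in enumerate(labels) if l == label and i != a]
        negatives = [i for i, l in enumerate(labels) if l != label]
        if positives and negatives:
            triplets.append((a, rng.choice(positives), rng.choice(negatives)))
    return triplets


if __name__ == "__main__":
    frame_labels = [0, 0, 0, 1, 1, 1]  # two sign classes, three frames each
    rng = random.Random(0)
    print(len(exhaustive_triplets(frame_labels)))
    print(len(single_positive_triplets(frame_labels, rng)))
```

For two classes with three frames each, exhaustive pairing yields 36 triplets while the reduced scheme yields 6, one per anchor. The gap widens quickly as the number of frames grows, which is the kind of saving the abstract attributes to singular pairing.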

Availability of Data and Materials

Data will be made available on reasonable request.

Author information

D. Anil Kumar and K. Srinivasa Rao have contributed equally to this work.

Authors and Affiliations

Department of Electronics and Communication Engineering, Biomechanics and Vision Computing Research Center, Koneru Lakshmaiah Education Foundation, Deemed to Be University, Vaddeswaram, Guntur, Andhra Pradesh, 522302, India
P. V. V. Kishore
Department of Electronics and Communication Engineering, PACE Institute of Technology and Sciences, Vallur Village, Ongole, Andhra Pradesh, 523272, India
D. Anil Kumar
Department of Electronics and Communication Engineering, Dhanekula Institute of Engineering and Technology, Ganguru, Vijayawada, Andhra Pradesh, 521139, India
K. Srinivasa Rao

Corresponding author

Correspondence to P. V. V. Kishore.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest in any form for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kishore, P.V.V., Anil Kumar, D. & Srinivasa Rao, K. Sign Language Recognition (SLR): A Brisk Paired Deep Metric Attention Learning (BPDMAL) Model for Video Data Applications. SN COMPUT. SCI. 5, 419 (2024). https://doi.org/10.1007/s42979-024-02793-6

Received: 17 December 2023
Accepted: 14 March 2024
Published: 10 April 2024
DOI: https://doi.org/10.1007/s42979-024-02793-6
