Cross-Sign Language Transfer Learning Using Domain Adaptation with Multi-scale Temporal Alignment

  • 1230: Sentient Multimedia Systems and Visual Intelligence
  • Multimedia Tools and Applications

Abstract

Sign language is a vital means of communication for individuals with hearing impairments, yet recognition resources for the more than 100 distinct sign languages are severely lacking. In response, we present our work on sign language recognition using transfer learning and the domain adaptation method TA3N, which uses the Temporal Relational Network (TRN) module to align multi-scale temporal relations. Our findings show that domain adaptation outperforms neural network-based transfer learning, particularly in improving recognition of American Sign Language (ASL). We also find that aligning shorter-term temporal features between the source and target domains is effective. In addition to RGB, we ran experiments using the Optical Flow modality of the sign language samples and found that RGB outperforms Optical Flow in the majority of cases. Our work aims to improve accessibility and communication for individuals who rely on sign language as their primary mode of communication.
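To make the phrase "multi-scale temporal relations" concrete, the following is a minimal sketch and not the authors' implementation or the TA3N code. It assumes that a video is represented by a short list of per-frame feature vectors, and it replaces the learned relation function of a TRN-style module with plain concatenation followed by averaging. The function names `frame_relations` and `multiscale_relation_features` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def frame_relations(num_segments: int) -> Dict[int, List[Tuple[int, ...]]]:
    """Return the ordered frame-index tuples for every scale n = 2..num_segments."""
    if num_segments < 2:
        raise ValueError("need at least two segments to form a temporal relation")
    # combinations() emits indices in increasing order, so each tuple keeps
    # the temporal order of the sampled frames.
    return {
        n: list(combinations(range(num_segments), n))
        for n in range(2, num_segments + 1)
    }


def multiscale_relation_features(
    frames: Sequence[Sequence[float]],
) -> Dict[int, List[float]]:
    """Compute one relation feature per scale (assumption: no learned weights)."""
    features: Dict[int, List[float]] = {}
    for n, tuples in frame_relations(len(frames)).items():
        # Stand-in for the learned relation function: concatenate the n frames.
        concatenated = [[x for i in idx for x in frames[i]] for idx in tuples]
        # Average over all n-frame tuples to obtain a single vector for scale n.
        features[n] = [sum(col) / len(concatenated) for col in zip(*concatenated)]
    return features
```

Small n corresponds to shorter-term relations and large n to longer-term ones. Keeping one feature per scale, rather than a single pooled vector, is what would allow an alignment step to treat the scales separately.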

Availability of data and materials

All data generated or analyzed during this study are included in the published articles [2, 4, 26, 27, 28] (and their supplementary information files). The subsets we used are detailed in Section 4.1. For additional guidance on extracting the subsets from their originating datasets, please contact the authors.

Code Availability

The code used for domain adaptation is based on TA3N [24]. Our modifications consist of setting the batch size to 20, the learning mode to supervised learning, and the value of num_segments to the N of the N-multiscale TRN. The code for converting videos into RGB and Optical Flow frames is available from this repository: https://doi.org/10.6084/m9.figshare.20223444. For additional guidance, please contact the authors.
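As a rough illustration of how these settings fit together, the sketch below collects them in one place and shows one common way to pick num_segments frames from a video. It is an assumption-laden example, not the TA3N argument parser or the authors' scripts: `RunConfig` and `segment_center_indices` are hypothetical names, the default of 5 segments is illustrative only, and segment-center sampling is one plausible choice among several.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical settings container (not part of TA3N)."""

    batch_size: int = 20                # value stated in the text
    learning_mode: str = "supervised"   # value stated in the text
    num_segments: int = 5               # illustrative N; the text gives no number


def segment_center_indices(num_frames: int, num_segments: int) -> List[int]:
    """Split the video into equal segments and take the center frame of each."""
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    tick = num_frames / num_segments  # segment length in frames (may be fractional)
    return [int(tick / 2 + tick * k) for k in range(num_segments)]


config = RunConfig()
indices = segment_center_indices(num_frames=100, num_segments=config.num_segments)
print(indices)  # [10, 30, 50, 70, 90]
```

Tying num_segments to N means that the number of sampled frames fixes the largest temporal relation that can be formed, so changing one without the other would leave some relation scales empty.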

References

  1. Farnebäck G (2003) Two-Frame Motion Estimation Based on Polynomial Expansion. SCIA 363-370

  2. Ronchetti F, Quiroga F, Estrebou C, Lanzarini L, Rosete A (2016) LSA64: A Dataset of Argentinian Sign Language. XXII Congreso Argentino de Ciencias de la Computación (CACIC). 794–803

  3. Wang H, Chai X, Hong X, Zhao G, Chen X (2016) Isolated Sign Language Recognition with Grassmann Covariance Matrices. ACM Transactions on Accessible Computing 8(4):1–21. https://doi.org/10.1145/2897735

  4. Li D, Rodriguez C, Yu X, Li H (2020) Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. The IEEE Winter Conference on Applications of Computer Vision. 1459–1469

  5. Farhadi A, Forsyth D, White R (2007) Transfer Learning in Sign language. IEEE Conference on Computer Vision and Pattern Recognition 2007:1–8. https://doi.org/10.1109/cvpr.2007.383346

  6. Mocialov B, Turner G, Hastie HF (2020) Transfer Learning for British Sign Language Modelling. CoRR abs/2006.02144. https://arxiv.org/abs/2006.02144

  7. Morocho-Cayamcela ME, Lim W (2019) Fine-tuning a pre-trained Convolutional Neural Network Model to translate American Sign Language in Real-time. 2019 International Conference on Computing, Networking and Communications (ICNC), 100–104

  8. Nishat ZK, Shopon M (2020) Unsupervised Pretraining and Transfer Learning-Based Bangla Sign Language Recognition. Proceedings of International Joint Conference on Computational Intelligence Algorithms for Intelligent Systems 529–540. https://doi.org/10.1007/978-981-15-3607-6_42

  9. Rathi D (2018) Optimization of Transfer Learning for Sign Language Recognition Targeting Mobile Platform. Int J Recent Innov Trends Comput Commun 6(4):198–203

  10. Bird JJ, Ekárt A, Faria DR (2020) British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language. Sensors 20:5151

  11. Simonyan K, Zisserman A (2015) Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR abs/1409.1556

  12. Li D, Opazo CR, Yu X, Li H (2020) Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV) https://doi.org/10.1109/wacv45572.2020.9093512

  13. He K, Zhang X, Ren S, Sun J (2016) Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:770–778

  14. Kocmi T (2020) Exploring Benefits of Transfer Learning in Neural Machine Translation. ArXiv abs/2001.01622

  15. Kocmi T, Bojar O (2018) Trivial Transfer Learning for Low-Resource Neural Machine Translation. WMT

  16. Wang H, Stefan A, Athitsos V (2009) A Similarity Measure for Vision-Based Sign Recognition. HCI

  17. Krishnan R, Sarkar S (2013) Similarity Measure between Two Gestures Using Triplets. IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2013:506–513

  18. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861

  19. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception Architecture for Computer Vision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:2818–2826

  20. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning Transferable Architectures for Scalable Image Recognition. IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018:8697–8710

  21. Bragg D, Koller O, Bellard M, Berke L, Boudreault P, Braffort A, Caselli NK, Huenerfauth M, Kacorri H, Verhoef T, Vogler C, Morris M (2019) Sign Language Recognition, Generation, and Translation: An Interdisciplinary Perspective. The 21st International ACM SIGACCESS Conference on Computers and Accessibility

  22. Sevilla-Lara L, Liao Y, Güney F, Jampani V, Geiger A, Black MJ (2018) On the Integration of Optical Flow and Action Recognition. GCPR 281–297

  23. Virk JS, Bathula DR (2021) Domain-Specific, Semi-Supervised Transfer Learning for Medical Imaging. 8th ACM IKDD CODS and 26th COMAD

  24. Chen MH, Kira Z, Al-Regib G, Yoo J, Chen R, Zheng J (2019) Temporal Attentive Alignment for Large-Scale Video Domain Adaptation. IEEE/CVF International Conference on Computer Vision (ICCV) 2019:6320–6329

  25. Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? arXiv:1411.1792 [cs.LG]

  26. Zhang J, Zhou W, Xie C, Pu J, Li H (2016) Chinese sign language recognition with adaptive HMM. IEEE International Conference on Multimedia and Expo (ICME) 2016:1–6. https://doi.org/10.1109/ICME.2016.7552950

  27. Pu J, Zhou W, Li H (2016) Sign Language Recognition with Multi-modal Features. In: PCM 252–261

  28. Huang J, Zhou W, Zhang Q, Li H, Li W (2018) Video-Based Sign Language Recognition without Temporal Segmentation. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. New Orleans, Louisiana, USA AAAI’18/IAAI’18/EAAI’18, 2257–2264

  29. Kumar A, Thankachan K, Dominic MM (2016) Sign language recognition. 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), 422–428

  30. Sultani W, Saleemi I (2014) Human Action Recognition across Datasets by Foreground-Weighted Histogram Decomposition. IEEE Conference on Computer Vision and Pattern Recognition 2014:764–771

  31. Xu T, Zhu F, Wong EK, Fang Y (2016) Dual many-to-one-encoder-based transfer learning for cross-dataset human action recognition. Image Vis Comput 55:127–137

  32. Jamal A, Namboodiri VP, Deodhare D, Venkatesh KS (2018) Deep Domain Adaptation in Action Space. BMVC

  33. Sahoo A, Shah R, Panda R, Saenko K, Das A (2021) Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing. In: Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Vaughan J (eds.) Advances in Neural Information Processing Systems 34:23386–23400

  34. Soomro K, Zamir AR, Shah M (2012) UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

  35. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: A large video database for human motion recognition. International Conference on Computer Vision 2011:2556–2563. https://doi.org/10.1109/ICCV.2011.6126543

  36. Zhou B, Andonian A, Oliva A, Torralba A (2018) Temporal Relational Reasoning in Videos. European Conference on Computer Vision, 831–846

  37. Wang Y, Quanming Y, Tin-Yau Kwok J, Ni LM (2020) Generalizing from a Few Examples. ACM Computing Surveys (CSUR) 53:1–34

  38. Halvardsson G, Peterson J, Soto-Valero C, Baudry B (2021) Interpretation of Swedish Sign Language using Convolutional Neural Networks and Transfer Learning. SN Computer Science 207. https://doi.org/10.1007/s42979-021-00612-w

  39. Rahman MM, Mdrafi R, Gurbuz AC, Malaia E, Crawford C, Griffin D, Gurbuz SZ (2021) Word-level Sign Language Recognition Using Linguistic Adaptation of 77 GHz FMCW Radar Data, 2021 IEEE Radar Conference (RadarConf21), 1–6 https://doi.org/10.1109/RadarConf2147009.2021.9455190

  40. Abner N, Geraci C, Yu S, Lettieri J, Mertz J, Salgat A (2020) Getting the Upper Hand on Sign Language Families: Historical Analysis and Annotation Methods. FEAST. Formal and Experimental Advances in Sign language Theory. 3:17–29

  41. Vázquez-Enríquez M, Alba-Castro JL, Docío-Fernández L, Rodríguez-Banga E (2021) Isolated Sign Language Recognition with Multi-Scale Spatial-Temporal Graph Convolutional Networks. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2021:3457–3466. https://doi.org/10.1109/CVPRW53098.2021.00385

  42. Zakariah M, Alotaibi YA, Koundal D, Guo Y, Elahi MM (2022) Sign Language Recognition for Arabic Alphabets Using Transfer Learning Technique. Computational Intelligence and Neuroscience, 2022

  43. Shania S, Naufal MF, Prasetyo VR, Azmi MSB (2022) Translator of Indonesian Sign Language Video using Convolutional Neural Network with Transfer Learning. Indones J Inf Syst

  44. Abdullayeva GG, Alishzade NO (2022) Transfer learning for Azerbaijani Sign Language Recognition. Informatics and Control Problems

  45. Thakar S, Shah S, Shah B, Nimkar AV (2022) Sign Language to Text Conversion in Real Time using Transfer Learning. 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT) 1–5

  46. Das S, Imtiaz MS, Neom N, Siddique N, Wang H (2022) A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier. Expert Syst Appl 213:118914

  47. Jiang X, Hu B, Satapathy SC, Wang S, Zhang Y (2020) Fingerspelling Identification for Chinese Sign Language via AlexNet-Based Transfer Learning and Adam Optimizer. Sci Program 2020:3291426

  48. Sharma CM, Tomar K, Mishra RK, Chariar VM (2021) Indian Sign Language Recognition Using Fine-tuned Deep Transfer Learning Model. SSRN Electron J

  49. Suharjito, Thiracitta N, Gunawan H (2021) SIBI Sign Language Recognition Using Convolutional Neural Network Combined with Transfer Learning and non-trainable Parameters. Procedia Comput Sci 179:72–80

Funding

This research was funded by the Shenzhen Science and Technology Innovation Commission (JCYJ20210324135011030), Science and Technology Innovation Committee of Shenzhen-Platform and Carrier (International Science and Technology Information Center), High-end Foreign Expert Talent Introduction Plan (G2021032022L), Guangdong Pearl River Plan (2019QN01X890), and National Natural Science Foundation of China (Grant No. 71971127).

Author information

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Keren Artiaga, Yang Li, Ercan Engin Kuruoglu, and Wai Kin (Victor) Chan. The first draft of the manuscript was written by Keren Artiaga and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Keren Artiaga.

Ethics declarations

Conflict of interest/Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

Not applicable

Consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Artiaga, K., Li, Y., Kuruoglu, E.E. et al. Cross-Sign Language Transfer Learning Using Domain Adaptation with Multi-scale Temporal Alignment. Multimed Tools Appl 83, 37025–37051 (2024). https://doi.org/10.1007/s11042-023-16703-0

  • DOI: https://doi.org/10.1007/s11042-023-16703-0
