A Comprehensive Analysis of Speech Recognition Systems in Healthcare: Current Research Challenges and Future Prospects

  • Review Article
  • SN Computer Science

Abstract

The primary purpose of this article is to explore and highlight the benefits of automatic speech recognition systems in the healthcare sector. Healthcare is one of many industries that have benefited from breakthroughs in pervasive computing, and the rising use of speech-processing technologies is among the most notable of these. In this work, we discuss several ways in which speech recognition can benefit healthcare, including the use of voice signal analysis to aid early identification of illness and to track its progression over time. The article gives a comprehensive overview of work accomplished on speech recognition systems in the healthcare domain and a broad view of this research area. It also covers the speech recognition software used in healthcare for various purposes and the datasets that researchers have used for healthcare applications. It aims to present a comprehensive analysis of healthcare applications of speech recognition and to synthesize the research findings, including the speech feature extraction techniques applied to health speech data for recognition and transcription. The article is intended as a practical, constructive resource for researchers working on speech recognition systems in healthcare.

Availability of Data and Materials

Not applicable.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Yogesh Kumar.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kumar, Y. A Comprehensive Analysis of Speech Recognition Systems in Healthcare: Current Research Challenges and Future Prospects. SN COMPUT. SCI. 5, 137 (2024). https://doi.org/10.1007/s42979-023-02466-w

  • DOI: https://doi.org/10.1007/s42979-023-02466-w
