
Optimal Fractal Feature Selection and Estimation for Speech Recognition Under Mismatched Conditions


Part of the book series: Signals and Communication Technology (SCT)

Abstract

Children are likely to have trouble pronouncing vowels in real-life conversations. Handling the correct pronunciation of every vowel is therefore necessary to improve the efficiency of Automatic Speech Recognition (ASR) systems. In this research, the linguistic behaviour of native speakers and their auditory inconsistency were studied by extracting efficient front-end speech vectors using three fractal dimensions (FD): Higuchi FD, Katz FD, and Petrosian FD. These fractal measures rest on the precise estimation of FD, a key parameter of fractal geometry, which represents complex shapes in an input signal more readily than conventional speech parameters. Experimental results on pooling these short-term fractal components with Mel-frequency cepstral coefficients (MFCC) were recorded, with modest gains, using hidden Markov models (HMM). Optimal features were selected by augmenting the child data through adaptation measures on adult data, which allowed the new features to be examined under mismatched conditions and yielded an overall improvement of 11.54% in the performance of the proposed ASR framework.
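The three estimators named in the abstract have standard closed-form definitions. The sketch below (Python with NumPy) follows those textbook formulas; it is not the authors' implementation, and the frame-level pooling with MFCCs described in the chapter is omitted.

```python
import numpy as np

def katz_fd(x):
    """Katz FD: log10(n) / (log10(n) + log10(d / L)), where L is the total
    curve length, d the maximum distance from the first sample, and n the
    number of steps."""
    x = np.asarray(x, dtype=float)
    L = np.abs(np.diff(x)).sum()          # total waveform length
    d = np.max(np.abs(x - x[0]))          # planar extent of the waveform
    n = len(x) - 1
    return np.log10(n) / (np.log10(n) + np.log10(d / L))

def petrosian_fd(x):
    """Petrosian FD based on the number of sign changes in the first
    difference of the signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = np.diff(x)
    n_delta = np.sum(diff[1:] * diff[:-1] < 0)  # sign changes of the derivative
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))

def higuchi_fd(x, k_max=8):
    """Higuchi FD: slope of log(mean curve length L(k)) versus log(1/k)
    over delays k = 1..k_max."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, k_max + 1)
    log_lengths = []
    for k in ks:
        lk = []
        for m in range(k):                 # one subseries per offset m
            idx = np.arange(m, n, k)
            dist = np.abs(np.diff(x[idx])).sum()
            # Higuchi normalisation: (N-1) / (intervals * k), then divide by k
            lk.append(dist * (n - 1) / ((len(idx) - 1) * k) / k)
        log_lengths.append(np.log(np.mean(lk)))
    slope, _ = np.polyfit(np.log(1.0 / ks), log_lengths, 1)
    return slope
```

All three estimators return 1 for a smooth line and values above 1 for irregular signals; in a pipeline such as the one described, they would be computed per analysis frame and appended to the MFCC vector of that frame.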



Conflict of Interest

The authors declare that they have no conflict of interest.

Author information


Corresponding author

Correspondence to Virender Kadyan.



Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Bawa, P., Kadyan, V., Mantri, A., Kumar, V. (2021). Optimal Fractal Feature Selection and Estimation for Speech Recognition Under Mismatched Conditions. In: Kadyan, V., Singh, A., Mittal, M., Abualigah, L. (eds) Deep Learning Approaches for Spoken and Natural Language Processing. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-79778-2_3


  • DOI: https://doi.org/10.1007/978-3-030-79778-2_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79777-5

  • Online ISBN: 978-3-030-79778-2

  • eBook Packages: Computer Science, Computer Science (R0)
