Abstract
Children often mispronounce vowels in natural, real-life conversation, so an Automatic Speech Recognition (ASR) system must model the correct pronunciation of every vowel to perform well on children's speech. In this research, a linguistic study of native speakers and their auditory inconsistency was carried out by extracting efficient front-end speech vectors using three fractal dimension (FD) measures: Higuchi FD, Katz FD, and Petrosian FD. These fractal measurements rest on a precise estimate of the FD, a key parameter of fractal geometry, and therefore represent the complex shapes in an input signal more compactly than conventional speech parameters. Experimental results show modest gains when these short-term fractal components are pooled with Mel-frequency cepstral coefficients (MFCC) and modelled with hidden Markov models (HMM). Optimal feature selection was enabled by augmenting the child data through adaptation measures on adult data, which allowed the new features to be examined under mismatched conditions and yielded an overall improvement of 11.54% in the performance of the proposed ASR framework.
Conflict of Interest
The authors declare that they have no conflict of interest.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Bawa, P., Kadyan, V., Mantri, A., Kumar, V. (2021). Optimal Fractal Feature Selection and Estimation for Speech Recognition Under Mismatched Conditions. In: Kadyan, V., Singh, A., Mittal, M., Abualigah, L. (eds) Deep Learning Approaches for Spoken and Natural Language Processing. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-79778-2_3
Print ISBN: 978-3-030-79777-5
Online ISBN: 978-3-030-79778-2