Abstract
Children often mispronounce vowels in natural, real-life conversation, so an Automatic Speech Recognition (ASR) system must model the correct pronunciation of every vowel to perform well on children's speech. In this research, a linguistic study of native speakers and their auditory inconsistency was carried out by extracting efficient front-end speech vectors using three fractal dimension (FD) measures: Higuchi FD, Katz FD, and Petrosian FD. These fractal measurements rest on a precise estimate of the FD, a key parameter of fractal geometry, and therefore represent the complex shapes in an input signal more compactly than conventional speech parameters. Experimental results show modest gains when these short-term fractal components are pooled with Mel-frequency cepstral coefficients (MFCC) and modelled with hidden Markov models (HMM). Optimal feature selection was enabled by augmenting the child data through adaptation measures on adult data, which allowed the new features to be examined under mismatched conditions and yielded an overall improvement of 11.54% in the performance of the proposed ASR framework.
Conflict of Interest
The authors declare that they have no conflict of interest.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Bawa, P., Kadyan, V., Mantri, A., Kumar, V. (2021). Optimal Fractal Feature Selection and Estimation for Speech Recognition Under Mismatched Conditions. In: Kadyan, V., Singh, A., Mittal, M., Abualigah, L. (eds) Deep Learning Approaches for Spoken and Natural Language Processing. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-79778-2_3
Print ISBN: 978-3-030-79777-5
Online ISBN: 978-3-030-79778-2