Abstract
Spoken digit recognition is an active research topic. Researchers have already achieved excellent results for widely spoken languages such as English and Chinese. However, very little work exists on digit recognition for regional languages. Moreover, pronunciation in a regional language varies considerably from one area to another, and the performance of existing recognizers is comparatively low. In this paper we work on digit recognition for a regional language, Bengali, also referred to as Bangla. For our proposed isolated digit and word recognition system, we created a small speech database containing the ten Bangla digits zero to nine (pronounced ‘sunno’ to ‘noi’) and four Bangla words (English equivalents: Right, Left, Above, and Below), with 100 samples for each class. A pre-processing phase is followed by extraction of a 39-dimensional feature vector consisting of Mel Frequency Cepstral Coefficients (MFCC), \(\varDelta \) MFCC, and \(\varDelta \varDelta \) MFCC. Finally, we use Dynamic Time Warping (DTW) to classify the test samples. The system achieves a highest accuracy of 93% in classifying the Bangla words and digits, which is a satisfactory result. A comparative analysis with existing methods is given in Sect. 5.
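The pipeline summarized in the abstract (static features, appended \(\varDelta \) and \(\varDelta \varDelta \) features, then DTW matching against stored templates) can be illustrated with a minimal sketch. It is not the chapter's implementation. Several things here are assumptions made only for illustration: the 13 + 13 + 13 split of the 39 dimensions, the use of `np.gradient` as a stand-in for the delta computation, the synthetic sinusoidal "features" that replace real MFCCs, a single template per class, and all function names.

```python
import numpy as np


def add_deltas(static):
    """Stack static, delta and delta-delta features: (T, 13) -> (T, 39).

    Assumption: np.gradient along time stands in for the delta
    computation; the chapter's exact delta formula is not given here.
    """
    delta = np.gradient(static, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([static, delta, delta2])


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of shape (T, D)."""
    n, m = len(a), len(b)
    # Euclidean distance between every frame of a and every frame of b.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed steps: insertion, deletion, match.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]


def classify(test, templates):
    """Return the label of the template with the smallest DTW cost."""
    costs = [(dtw_distance(test, seq), label) for label, seq in templates]
    return min(costs)[0:2][1]


def toy_static(length, freq):
    """Hypothetical stand-in for 13 static MFCCs over `length` frames."""
    t = np.linspace(0.0, 1.0, length)[:, None]
    return np.sin(2 * np.pi * freq * t + np.arange(13))


# One synthetic template per class; labels borrowed from the abstract.
templates = [
    ("sunno", add_deltas(toy_static(30, 1.0))),
    ("noi", add_deltas(toy_static(30, 3.0))),
]

# The same "word" as the first template, spoken more slowly (40 frames).
test = add_deltas(toy_static(40, 1.0))
print(classify(test, templates))
```

In this toy setup the test sequence has a different length from both templates, which is the situation DTW is meant to handle: the warping path absorbs the difference in speaking rate before the costs are compared. A real system would replace `toy_static` with MFCCs computed from pre-processed audio and would normally keep several templates per class.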
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Paul, B., Paul, R., Bera, S., Phadikar, S. (2023). Isolated Bangla Spoken Digit and Word Recognition Using MFCC and DTW. In: Gyei-Kark, P., Jana, D.K., Panja, P., Abd Wahab, M.H. (eds) Engineering Mathematics and Computing. Studies in Computational Intelligence, vol 1042. Springer, Singapore. https://doi.org/10.1007/978-981-19-2300-5_16
DOI: https://doi.org/10.1007/978-981-19-2300-5_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2299-2
Online ISBN: 978-981-19-2300-5
eBook Packages: Engineering, Engineering (R0)