Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture

Priyamvada, R.; Govind, D.; Menon, Vijay Krishna; Premjith, B.; Soman, K. P.

doi:10.1007/978-981-16-6624-7_5

R. Priyamvada (ORCID: orcid.org/0000-0003-4455-4104),
D. Govind,
Vijay Krishna Menon,
B. Premjith (ORCID: orcid.org/0000-0003-1188-1838) &
K. P. Soman

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 266)

Abstract

Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems rely on two key components: language modeling and acoustic modeling. The language model generates a lexicon, which is a pronunciation dictionary. A lexicon can be created using a variety of approaches. For low-resource languages, the lexicon is typically built with rule-based methods. However, because the available corpus is often small, such rules do not capture every possible pronunciation variation. Low-resource languages like Malayalam therefore need a method that keeps the lexicon comprehensive as the corpus grows. In this work, we explore deep learning based encoder-decoder models for grapheme-to-phoneme (G2P) conversion in Malayalam. The encoder was built with Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) networks at varying embedding dimensions. Model performance was measured using the Word Error Rate (WER) and Phoneme Error Rate (PER). With an embedding dimension of 1024, the BiLSTM encoder performed best: 98.04% accuracy and 2.57% PER at the phoneme level, and 90.58% accuracy and 9.42% WER at the word level.
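The abstract does not spell out how WER and PER were computed, so the following is only a minimal sketch under the definitions commonly used for G2P evaluation: PER as the total Levenshtein edit distance between predicted and reference phoneme sequences divided by the total number of reference phonemes, and WER as the share of words whose predicted phoneme sequence is not an exact match. The function names and the toy phoneme sequences are hypothetical illustrations, not data or code from the chapter.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[str], Sequence[str]]  # (reference phonemes, predicted phonemes)


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs: List[Pair]) -> float:
    """Total edits over total reference phonemes, as a percentage."""
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_ref = sum(len(ref) for ref, _ in pairs)
    return 100.0 * total_edits / total_ref


def word_error_rate(pairs: List[Pair]) -> float:
    """Share of words with at least one phoneme error, as a percentage."""
    wrong = sum(1 for ref, hyp in pairs if list(ref) != list(hyp))
    return 100.0 * wrong / len(pairs)


# Hypothetical toy data: each entry is one word, written as placeholder phoneme symbols.
pairs: List[Pair] = [
    (["k", "a", "l", "a"], ["k", "a", "l", "a"]),       # exact match
    (["m", "a", "r", "a", "m"], ["m", "a", "r", "a"]),  # one deletion
    (["p", "a", "l"], ["b", "a", "l"]),                 # one substitution
    (["n", "i", "l", "a"], ["n", "i", "l", "a"]),       # exact match
]

per = phoneme_error_rate(pairs)
wer = word_error_rate(pairs)
print(f"PER = {per:.2f}%")  # 2 edits over 16 reference phonemes -> 12.50%
print(f"WER = {wer:.2f}%")  # 2 of 4 words wrong -> 50.00%
```

Under these definitions word-level accuracy is simply 100% minus WER, which agrees with the reported 90.58% and 9.42%. PER counts insertions as well as substitutions and deletions, so it need not be the exact complement of a phoneme-level accuracy figure.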

Author information

Authors and Affiliations

Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
R. Priyamvada, D. Govind, Vijay Krishna Menon, B. Premjith & K. P. Soman

Corresponding author

Correspondence to B. Premjith.

Editor information

Editors and Affiliations

School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Peter Peer
College of Computing, Michigan Technological University, Michigan, MI, USA
Jinshan Tang
Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM), Lucknow, India
Vikrant Bhateja
Department of Electronics and Communication Engineering, National Institute of Technology (NIT) Mizoram, Aizawl, Mizoram, India
Anumoy Ghosh

About this paper

Cite this paper

Priyamvada, R., Govind, D., Menon, V.K., Premjith, B., Soman, K.P. (2022). Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture. In: Satapathy, S.C., Peer, P., Tang, J., Bhateja, V., Ghosh, A. (eds) Intelligent Data Engineering and Analytics. Smart Innovation, Systems and Technologies, vol 266. Springer, Singapore. https://doi.org/10.1007/978-981-16-6624-7_5

DOI: https://doi.org/10.1007/978-981-16-6624-7_5
Published: 28 February 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6623-0
Online ISBN: 978-981-16-6624-7
eBook Packages: Intelligent Technologies and Robotics (R0)
