Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture

Conference paper in Intelligent Data Engineering and Analytics

Abstract

Language modeling and acoustic modeling are the two key components of Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems. Language modeling depends on a lexicon, a pronunciation dictionary that maps each word to its phoneme sequence. A lexicon can be built in several ways, and for low-resource languages rule-based methods are typically used. However, because the available corpus is often small, this approach does not account for all possible pronunciation variations. Low-resource languages like Malayalam therefore need a method that can build a comprehensive lexicon as the corpus grows. In this work, we explored deep-learning-based encoder-decoder models for grapheme-to-phoneme (G2P) conversion in Malayalam. Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) encoders with varying embedding dimensions were evaluated. Performance was measured using Word Error Rate (WER) and Phoneme Error Rate (PER). The BiLSTM encoder with 1024 embedding dimensions performed best: at the phoneme level it achieved the highest accuracy (98.04%) and the lowest PER (2.57%), and at the word level the highest accuracy (90.58%) and the lowest WER (9.42%).
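The abstract reports results as PER and WER. The paper's scoring code is not given here, so the following is a minimal Python sketch under common G2P conventions, which is an assumption: PER is the Levenshtein edit distance between predicted and reference phoneme sequences, summed over the test set and divided by the total number of reference phonemes, and WER is the percentage of words whose predicted phoneme sequence is not an exact match. The function names and the toy phoneme sequences are hypothetical and are not taken from the paper.

```python
"""Sketch of PER/WER scoring for G2P output (assumed conventions, not the paper's code)."""
from typing import Hashable, List, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total phoneme edits divided by total reference phonemes, as a percentage."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of words")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


def word_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Percentage of words whose predicted pronunciation differs from the reference."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of words")
    wrong = sum(list(r) != list(h) for r, h in zip(refs, hyps))
    return 100.0 * wrong / len(refs)


# Hypothetical toy data: two words as phoneme lists (not from the paper's corpus).
refs = [["k", "a", "d", "a", "l"], ["m", "a", "l", "a"]]
hyps = [["k", "a", "t", "a", "l"], ["m", "a", "l", "a"]]

print(f"PER = {phoneme_error_rate(refs, hyps):.2f}%")  # PER = 11.11%
print(f"WER = {word_error_rate(refs, hyps):.2f}%")     # WER = 50.00%
```

On the toy data, the single substitution (d → t) gives a PER of 1/9 ≈ 11.1%, while one wrong word out of two gives a WER of 50%. Under this word-level definition, word accuracy is simply 100% minus WER. PER also counts insertions, so it is not necessarily the complement of a per-phoneme accuracy.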

Author information

Correspondence to B. Premjith.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Priyamvada, R., Govind, D., Menon, V.K., Premjith, B., Soman, K.P. (2022). Grapheme to Phoneme Conversion for Malayalam Speech Using Encoder-Decoder Architecture. In: Satapathy, S.C., Peer, P., Tang, J., Bhateja, V., Ghosh, A. (eds) Intelligent Data Engineering and Analytics. Smart Innovation, Systems and Technologies, vol 266. Springer, Singapore. https://doi.org/10.1007/978-981-16-6624-7_5
