
Incorporation of Language Discriminative Information into Recurrent Neural Networks Models to LID Tasks

  • Conference paper
  • First Online:
Smart Technologies, Systems and Applications (SmartTech-IC 2019)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1154))


Abstract

Language Identification (LID) is an essential research topic in the Automatic Speech Recognition area. One of the most important characteristics of a language is its context information. This article proposes a novel technique that introduces such context information through a phonotactic approach based on phonetic units called “phone-grams”. Language discriminative information is incorporated into the generation of Recurrent Neural Network Language Models (RNNLMs) at the weight initialization stage to improve the Language Identification task. The technique has been evaluated on the KALAKA-3 database, which contains 108 h of audio for the six languages to be recognized, using the Average Detection Cost metric (Cavg). Phone-grams of two elements (“2phone-grams”) and three elements (“3phone-grams”) were used to incorporate context information into the features that train the RNNLM, obtaining relative improvements of up to 17% and 15.44%, respectively, over the results obtained with standard RNNLMs.
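To make the idea concrete, the following is a minimal, hypothetical sketch of the two ingredients the abstract describes: forming 2- and 3-element “phone-gram” units from a phone decoding, and seeding an RNNLM embedding matrix with a language-discriminative profile instead of purely random values. The function names, the toy data, and the per-language frequency scoring are illustrative assumptions, not the authors’ implementation.

```python
# Hypothetical sketch: phone-gram extraction and language-discriminative
# initialization of RNNLM input embeddings. Not the paper's actual code.
from collections import Counter
import numpy as np

def phone_grams(phones, n):
    """Join n consecutive phones into a single 'phone-gram' token."""
    return ["_".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def discriminative_init(vocab, counts_per_language, dim, scale=0.1, seed=0):
    """Build one embedding row per phone-gram.

    The first len(counts_per_language) dimensions carry a normalized
    per-language frequency profile (the 'discriminative' part); the
    remaining dimensions keep the usual small random initialization.
    """
    rng = np.random.default_rng(seed)
    langs = sorted(counts_per_language)
    emb = scale * rng.standard_normal((len(vocab), dim))
    for row, unit in enumerate(vocab):
        freqs = np.array([counts_per_language[l][unit] for l in langs], float)
        if freqs.sum() > 0:
            emb[row, :len(langs)] = freqs / freqs.sum()
    return emb

# Toy usage with two languages and 2phone-gram units.
decodings = {"es": ["a", "s", "t", "a"], "en": ["th", "e", "k", "ae", "t"]}
counts = {lang: Counter(phone_grams(seq, 2)) for lang, seq in decodings.items()}
vocab = sorted(set().union(*counts.values()))
weights = discriminative_init(vocab, counts, dim=8)
print(weights.shape)  # (number of phone-grams, embedding dimension)
```

Under these assumptions, the resulting matrix would replace the random initialization of the RNNLM embedding layer before training, so that units frequent in only one language start from a point that already separates the languages.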



Author information


Corresponding author

Correspondence to Christian Salamea.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Salamea, C., Cordoba, R., D’Haro, L., Romero, D. (2020). Incorporation of Language Discriminative Information into Recurrent Neural Networks Models to LID Tasks. In: Narváez, F., Vallejo, D., Morillo, P., Proaño, J. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2019. Communications in Computer and Information Science, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-030-46785-2_14


  • DOI: https://doi.org/10.1007/978-3-030-46785-2_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46784-5

  • Online ISBN: 978-3-030-46785-2

  • eBook Packages: Computer Science, Computer Science (R0)
