Incorporation of Language Discriminative Information into Recurrent Neural Networks Models to LID Tasks

Salamea, Christian; Cordoba, Ricardo; D’Haro, Luis; Romero, David

doi:10.1007/978-3-030-46785-2_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1154)

Included in the following conference series:

International Conference on Smart Technologies, Systems and Applications

Abstract

Language Identification (LID) is an essential research topic in the field of Automatic Speech Recognition. Context information is one of the most important characteristics of a language. This article proposes a novel technique for introducing such context information within a phonotactic approach that uses phonetic units called “phone-grams”. Language discriminative information is incorporated into the generation of Recurrent Neural Network Language Models (RNNLMs), at the weight initialization stage, in order to improve the Language Identification task. The technique has been evaluated on the KALAKA-3 database, which contains 108 hours of audio covering the six languages to be recognized. The metric used in this work is the Average Detection Cost, C_avg. To incorporate context information into the features used to train the RNNLMs, phone-grams of two elements (“2phone-grams”) and of three elements (“3phone-grams”) have been considered, yielding relative improvements of up to 17% and 15.44%, respectively, over the results obtained using RNNLMs.
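The abstract does not spell out how phone-grams are built. As a minimal sketch, assuming that an n-element phone-gram is simply n consecutive phones joined into one unit with a sliding window of step 1, the 2phone-gram and 3phone-gram sequences could be derived from a phone recognizer output as follows. The function name `phone_grams`, the `_` separator, and the example phone sequence are hypothetical and are not taken from the paper.

```python
from typing import List


def phone_grams(phones: List[str], n: int, sep: str = "_") -> List[str]:
    """Join every run of n consecutive phones into a single unit.

    Assumption (not from the paper): a sliding window with step 1,
    so a sequence of length L yields L - n + 1 phone-grams.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sep.join(phones[i:i + n]) for i in range(len(phones) - n + 1)]


# Hypothetical output of a phone recognizer for one utterance.
phones = ["s", "a", "l", "a", "m"]

two_phone_grams = phone_grams(phones, 2)    # units of two elements
three_phone_grams = phone_grams(phones, 3)  # units of three elements

print(two_phone_grams)
print(three_phone_grams)
```

Under this assumption, each phone-gram acts as a single vocabulary item for the language model, so a unit already carries the identity of its neighbouring phones.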

Author information

Authors and Affiliations

Interaction, Robotics and Automation Research Group, Universidad Politécnica Salesiana, Calle Vieja 12-30 y Elia Liut, Cuenca, Ecuador
Christian Salamea & David Romero
Speech Technology Group, Information and Telecommunications Center, Universidad Politécnica de Madrid, Ciudad Universitaria Av. Complutense, 30, 28040, Madrid, Spain
Christian Salamea, Ricardo Cordoba & Luis D’Haro

Corresponding author

Correspondence to Christian Salamea.

Editor information

Editors and Affiliations

Universidad Politécnica Salesiana, Quito, Ecuador
Fabián R. Narváez
Universidad Politécnica Salesiana, Quito, Ecuador
Diego F. Vallejo
Universidad Politécnica Salesiana, Quito, Ecuador
Paulina A. Morillo
Universidad Politécnica Salesiana, Quito, Ecuador
Julio R. Proaño

Cite this paper

Salamea, C., Cordoba, R., D’Haro, L., Romero, D. (2020). Incorporation of Language Discriminative Information into Recurrent Neural Networks Models to LID Tasks. In: Narváez, F., Vallejo, D., Morillo, P., Proaño, J. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2019. Communications in Computer and Information Science, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-030-46785-2_14

DOI: https://doi.org/10.1007/978-3-030-46785-2_14
Published: 01 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46784-5
Online ISBN: 978-3-030-46785-2
eBook Packages: Computer Science, Computer Science (R0)
