Abstract
Language Identification (LID) is an essential research topic in the area of Automatic Speech Recognition. One of the most important characteristics of a language is its context information. In this article, a novel technique is proposed within a phonotactic approach that uses phonetic units called “phone-grams” to introduce such context information. Language-discriminative information is incorporated into the generation of Recurrent Neural Network Language Models (RNNLMs) at the weight-initialization stage to improve the Language Identification task. The technique has been evaluated on the KALAKA-3 database, which contains 108 h of audio in the six languages to be recognized, using the Average Detection Cost metric (Cavg). The phone-grams used to incorporate context information into the features that train the RNNLMs consist of two elements (“2phone-grams”) and three elements (“3phone-grams”), yielding relative improvements of up to 17% and 15.44%, respectively, compared with the results obtained using RNNLMs.
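The abstract does not spell out how phone-grams are built from the phone recognizer's output. The following is a minimal sketch that assumes each phone-gram is simply n consecutive phones joined into a single token using overlapping windows. The `_` separator and the function name `phone_grams` are hypothetical and are not taken from the paper.

```python
from typing import List


def phone_grams(phones: List[str], n: int, sep: str = "_") -> List[str]:
    """Turn a phone sequence into overlapping n-element composite tokens.

    Hypothetical sketch: the separator and the overlapping-window choice are
    assumptions, not the authors' documented procedure.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # Slide a window of n phones over the sequence and join each window
    # into one token; sequences shorter than n yield an empty list.
    return [sep.join(phones[i:i + n]) for i in range(len(phones) - n + 1)]


if __name__ == "__main__":
    phones = ["s", "i", "l", "a"]
    print(phone_grams(phones, 2))  # 2phone-grams
    print(phone_grams(phones, 3))  # 3phone-grams
```

Under this assumption, each phone-gram token becomes one vocabulary item for the language model, so context of two or three phones is encoded directly in the input units rather than being left for the RNNLM to learn.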
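Results are reported with Cavg. The sketch below assumes the NIST LRE-style closed-set definition, Cavg = (1/N_L) Σ_{L_T} [C_miss·P_target·P_miss(L_T) + Σ_{L_N≠L_T} C_FA·(1−P_target)/(N_L−1)·P_FA(L_T, L_N)], with C_miss = C_FA = 1 and P_target = 0.5 as defaults. The exact cost parameters used with KALAKA-3 are an assumption here, and the functions `c_avg` and `relative_improvement` are hypothetical helpers.

```python
from typing import Dict, Sequence, Tuple


def c_avg(
    languages: Sequence[str],
    p_miss: Dict[str, float],
    p_fa: Dict[Tuple[str, str], float],
    c_miss: float = 1.0,
    c_fa: float = 1.0,
    p_target: float = 0.5,
) -> float:
    """Average detection cost over a closed set of languages (assumed NIST LRE style).

    p_miss[lt]     : miss rate when lt is the target language.
    p_fa[(lt, ln)] : rate at which segments of non-target ln are accepted as lt.
    """
    n = len(languages)
    if n < 2:
        raise ValueError("need at least two languages")
    # The non-target prior is split evenly across the n - 1 other languages.
    p_non_target = (1.0 - p_target) / (n - 1)
    total = 0.0
    for lt in languages:
        # Miss cost for target language lt.
        cost = c_miss * p_target * p_miss[lt]
        # False-alarm cost for every non-target language ln tested against lt.
        for ln in languages:
            if ln != lt:
                cost += c_fa * p_non_target * p_fa[(lt, ln)]
        total += cost
    # Average the per-target costs over all languages.
    return total / n


def relative_improvement(baseline: float, system: float) -> float:
    """Relative reduction of a cost, in percent (positive means the system is better)."""
    return 100.0 * (baseline - system) / baseline
```

The `relative_improvement` helper computes the kind of relative gain the abstract reports, (baseline − new) / baseline. The abstract does not give the absolute Cavg values, so none are reproduced here.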
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Salamea, C., Cordoba, R., D’Haro, L., Romero, D. (2020). Incorporation of Language Discriminative Information into Recurrent Neural Networks Models to LID Tasks. In: Narváez, F., Vallejo, D., Morillo, P., Proaño, J. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2019. Communications in Computer and Information Science, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-030-46785-2_14
DOI: https://doi.org/10.1007/978-3-030-46785-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46784-5
Online ISBN: 978-3-030-46785-2
eBook Packages: Computer Science, Computer Science (R0)