A lazy learning-based language identification from speech using MFCC-2 features

Mukherjee, Himadri; Obaidullah, Sk Md; Santosh, K. C.; Phadikar, Santanu; Roy, Kaushik

doi:10.1007/s13042-019-00928-3

Original Article
Published: 28 January 2019

Volume 11, pages 1–14, (2020)
International Journal of Machine Learning and Cybernetics

Himadri Mukherjee¹,
Sk Md Obaidullah²,
K. C. Santosh ORCID: orcid.org/0000-0003-4176-0236³,
Santanu Phadikar⁴ &
…
Kaushik Roy¹

626 Accesses
35 Citations

Abstract

Developing an automatic speech recognition system for multilingual countries such as India is challenging because speakers habitually use multiple languages in conversation. Language identification from speech is therefore an essential step before recognition. This paper proposes a system for language identification from multilingual speech signals. A new second-level Mel frequency cepstral coefficient (MFCC)-based feature named MFCC-2, which handles the large and uneven dimensionality of MFCC, is used to distinguish among English, Bangla and Hindi. The system was tested on recordings of 12,000 utterances of numerals and 41,884 clips extracted from YouTube videos, covering background music, data from multiple environments, no noise suppression, and keywords from different languages within a single phrase. For the YouTube data, the highest and average accuracies (for the Top-3 classifiers from a pool of nine classifiers) were 98.09% and 95.54%, respectively.
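The "uneven dimensionality" problem and the lazy-learning idea can be illustrated with a small sketch. Clips of different durations yield different numbers of coefficient frames, so they must be reduced to a common length before a distance-based classifier can compare them. The abstract does not define MFCC-2, so the summary used here (per-coefficient mean and standard deviation) is only a stand-in assumption, and the synthetic clips, the 13-coefficient frame size, the function names and the choice of a plain k-nearest-neighbour vote are all hypothetical rather than the authors' method.

```python
import math
import random
from collections import Counter


def summarize(frames):
    """Collapse a variable-length list of frames into one fixed-length vector.

    Stand-in for a second-level feature (NOT the paper's MFCC-2):
    per-coefficient mean followed by per-coefficient standard deviation.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds


def knn_predict(train, query, k=3):
    """Lazy learner: nothing is fitted in advance; the stored examples
    are searched only when a query arrives."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fake_clip(rng, centre, n_frames, dims=13):
    """Hypothetical synthetic clip: n_frames frames of `dims` coefficients."""
    return [[rng.gauss(centre, 1.0) for _ in range(dims)] for _ in range(n_frames)]


rng = random.Random(0)
# Assumed, well-separated class centres so the toy data are easy to classify.
centres = {"English": -3.0, "Bangla": 0.0, "Hindi": 3.0}

# Training clips have between 40 and 200 frames each (uneven lengths).
train = [
    (summarize(fake_clip(rng, centre, rng.randint(40, 200))), language)
    for language, centre in centres.items()
    for _ in range(10)
]

# A 75-frame query clip drawn from the "Hindi" centre.
query = summarize(fake_clip(rng, 3.0, 75))
predicted = knn_predict(train, query)
print(predicted)
```

Every clip, whatever its frame count, ends up as a 26-value vector (13 means plus 13 standard deviations), which is what allows the distance computation in `knn_predict` to work across clips of different durations.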

Acknowledgements

The authors sincerely thank Mr. Chayan Halder, Miss Ankita Dhar and Miss Payel Rakshit of the Department of Computer Science, West Bengal State University, for their help whenever it was needed throughout this work.

Author information

Authors and Affiliations

Department of Computer Science, West Bengal State University, Kolkata, India
Himadri Mukherjee & Kaushik Roy
Department of Computer Science and Engineering, Aliah University, Kolkata, India
Sk Md Obaidullah
Department of Computer Science, The University of South Dakota, Vermillion, SD, 57069, USA
K. C. Santosh
Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
Santanu Phadikar

Corresponding author

Correspondence to K. C. Santosh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mukherjee, H., Obaidullah, S.M., Santosh, K.C. et al. A lazy learning-based language identification from speech using MFCC-2 features. Int. J. Mach. Learn. & Cyber. 11, 1–14 (2020). https://doi.org/10.1007/s13042-019-00928-3

Received: 15 May 2018
Accepted: 14 January 2019
Published: 28 January 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s13042-019-00928-3
