Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure

Selva Nidhyananthan, S.; Shantha Selva Kumari, R.; Senthur Selvi, T.

doi:10.1007/s11277-016-3530-3

Published: 08 August 2016

Volume 91, pages 1321–1333, (2016)

Wireless Personal Communications

S. Selva Nidhyananthan¹,
R. Shantha Selva Kumari¹ &
T. Senthur Selvi¹ (ORCID: orcid.org/0000-0003-1249-9956)

Abstract

This paper motivates the use of Relative Spectra–Mel Frequency Cepstral Coefficient (RASTA–MFCC) features, extracted with a newly designed Quadrilateral filter bank structure, together with a Gaussian Mixture Model–Universal Background Model (GMM–UBM) for improved text-independent speaker identification in noisy environments. Unlike a neural network model, which requires retraining on the entire database when a new sample is added, the GMM–UBM model does not, which makes processing easier and faster. MFCC is an efficient feature for speaker identification because it captures speaker-specific information, and RASTA processing of speech improves recognizer performance in the presence of convolutional and additive noise. This work combines the best of these two processes to yield the RASTA–MFCC feature, which is found to be more robust to noisy environments than the traditional MFCC method. It also proposes a new Quadrilateral filter bank structure that approximates the response of the cochlear membrane of the human ear to capture the feature vectors effectively. With RASTA–MFCC features and GMM–UBM modeling, the proposed Quadrilateral filter bank structure outperforms triangular and Gaussian filter banks and achieves a speaker identification accuracy of 97.67% on the MEPCO noisy speech database with 50 speakers.
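
As a rough illustration of why RASTA processing helps against convolutional noise, the sketch below applies the RASTA band-pass filter to a single time trajectory of log filter-bank energies. A fixed channel multiplies the spectrum, so in the log domain it appears as a constant offset, which a filter with zero DC gain suppresses. This is not the authors' implementation: the filter coefficients follow the transfer function published by Hermansky and Morgan (1994), written here in causal form, while the 100 frames/s rate, the 4 Hz test modulation, the offset value, and the function name `rasta_filter` are assumptions made for illustration. The Quadrilateral filter bank itself is not shown, since the abstract does not specify its shape.

```python
import math

# RASTA band-pass filter (Hermansky & Morgan, 1994), causal form:
#   H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1)
NUMERATOR = (0.2, 0.1, 0.0, -0.1, -0.2)  # sums to zero, so DC gain is zero
POLE = 0.98

def rasta_filter(trajectory):
    """Filter one band's log-energy trajectory (one value per frame)."""
    out = []
    prev = 0.0
    for t in range(len(trajectory)):
        # FIR part: weighted sum of the current and four previous frames.
        fir = 0.0
        for k, b in enumerate(NUMERATOR):
            if t - k >= 0:
                fir += b * trajectory[t - k]
        # IIR part: single-pole leaky integrator.
        prev = fir + POLE * prev
        out.append(prev)
    return out

# Synthetic log-energy trajectory: 4 Hz modulation at an assumed 100 frames/s.
frames = 1500
clean = [math.sin(2 * math.pi * 4.0 * t / 100.0) for t in range(frames)]

# Assumed channel effect: a constant offset in the log domain.
channel_offset = 3.0
distorted = [v + channel_offset for v in clean]

clean_out = rasta_filter(clean)
distorted_out = rasta_filter(distorted)

# After the start-up transient decays, the two outputs should agree.
residual = abs(distorted_out[-1] - clean_out[-1])
print("difference at frame 3:   ", abs(distorted_out[3] - clean_out[3]))
print("difference at last frame:", residual)
```

The offset is visible only during the start-up transient, which decays at the rate set by the 0.98 pole; the speech-rate modulation passes through. In a full RASTA–MFCC front end the same filter would be applied to every filter-bank channel before the cepstral transform.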

Author information

Authors and Affiliations

Department of ECE, Mepco Schlenk Engineering College, Sivakasi, Tamil Nadu, India
S. Selva Nidhyananthan, R. Shantha Selva Kumari & T. Senthur Selvi

Corresponding author

Correspondence to T. Senthur Selvi.

About this article

Cite this article

Selva Nidhyananthan, S., Shantha Selva Kumari, R. & Senthur Selvi, T. Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure. Wireless Pers Commun 91, 1321–1333 (2016). https://doi.org/10.1007/s11277-016-3530-3

Issue Date: December 2016
