Abstract
This paper motivates the use of Relative Spectra–Mel Frequency Cepstral Coefficient (RASTA–MFCC) features, extracted with a newly designed Quadrilateral filter bank structure, together with a Gaussian Mixture Model–Universal Background Model (GMM–UBM) for improved text-independent speaker identification in noisy environments. Unlike a neural network model, which requires retraining on the entire database when a new sample is added, the GMM–UBM model does not, which makes processing easier and faster. RASTA–MFCC is found to be more robust to noisy environments than the traditional MFCC method. MFCC is an efficient feature for identifying the speaker because it captures speaker-specific information. RASTA processing of speech improves recognizer performance in the presence of convolutional and additive noise. This work combines the strengths of these two processes to yield a RASTA–MFCC feature that is robust to noise, and it also proposes a new Quadrilateral filter bank structure that approximates the response of the cochlear membrane of the human ear in order to capture the feature vectors effectively. The proposed Quadrilateral filter bank structure with RASTA–MFCC features and GMM–UBM modeling outperforms triangular and Gaussian filter banks and achieves a speaker identification accuracy of 97.67% on the MEPCO noisy speech database with 50 speakers.
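To illustrate why RASTA processing helps against convolutional noise, the following is a minimal sketch of a RASTA-style band-pass filter applied to one log filter-bank energy trajectory. It is not the authors' implementation. The coefficients follow the commonly cited RASTA transfer function, and the pole value of 0.98, the 100 frames-per-second rate, and the 4 Hz test modulation are assumptions chosen for illustration. A fixed channel multiplies the spectrum, so it adds a constant in the log domain; because the filter taps sum to zero, that constant is suppressed once the start-up transient has decayed.

```python
import math

# Numerator taps of the RASTA filter. They sum to zero, so a constant
# (DC) component in the log-energy trajectory is blocked.
RASTA_NUMERATOR = [0.2, 0.1, 0.0, -0.1, -0.2]


def rasta_filter(trajectory, pole=0.98):
    """Filter one band's log-energy sequence along the frame (time) axis.

    `pole` is an assumed value; implementations differ on this setting.
    """
    filtered = []
    previous_output = 0.0
    for n in range(len(trajectory)):
        # FIR part: a weighted slope over the current and four earlier frames.
        fir = 0.0
        for k, tap in enumerate(RASTA_NUMERATOR):
            if n - k >= 0:
                fir += tap * trajectory[n - k]
        # IIR part: a single leaky integrator.
        previous_output = fir + pole * previous_output
        filtered.append(previous_output)
    return filtered


if __name__ == "__main__":
    frames = 600  # 6 seconds at an assumed 100 frames per second
    # Hypothetical log-energy trajectory with a 4 Hz (speech-like) modulation.
    clean = [math.sin(2 * math.pi * 4.0 * n / 100.0) for n in range(frames)]
    # A fixed channel adds a constant offset in the log-spectral domain.
    shifted = [value + 5.0 for value in clean]

    out_clean = rasta_filter(clean)
    out_shifted = rasta_filter(shifted)

    # Compare the two outputs after the start-up transient has died away.
    gap = max(abs(a - b) for a, b in zip(out_clean[-100:], out_shifted[-100:]))
    print("max difference over the last 100 frames:", gap)
```

In a RASTA–MFCC front end, a filter of this kind would be applied to every filter-bank channel before the cepstral transform. The slow 4 Hz modulation passes nearly unchanged while the constant channel offset is removed.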
Cite this article
Selva Nidhyananthan, S., Shantha Selva Kumari, R. & Senthur Selvi, T. Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure. Wireless Pers Commun 91, 1321–1333 (2016). https://doi.org/10.1007/s11277-016-3530-3
DOI: https://doi.org/10.1007/s11277-016-3530-3