Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure

Wireless Personal Communications

Abstract

This paper motivates the use of Relative Spectra–Mel Frequency Cepstral Coefficient (RASTA–MFCC) features, extracted with a newly designed Quadrilateral filter bank structure and modeled with a Gaussian Mixture Model–Universal Background Model (GMM–UBM), for improved text-independent speaker identification in noisy environments. Unlike a neural network model, which requires retraining on the entire database whenever a new sample is added, the GMM–UBM model does not, which makes processing easier and faster. MFCC is an efficient feature for speaker identification because it captures speaker-specific information, and RASTA processing of speech improves recognizer performance in the presence of convolutional and additive noise. RASTA–MFCC is found to be more robust to noisy environments than the traditional MFCC method. This work combines the strengths of these two processes to yield a noise-robust RASTA–MFCC feature, and also proposes a new Quadrilateral filter bank structure that approximates the response of the cochlear membrane of the human ear to capture the feature vectors effectively. The proposed Quadrilateral filter bank structure with RASTA–MFCC features and GMM–UBM modeling outperforms triangular and Gaussian filter banks and achieves a speaker identification accuracy of 97.67% on the MEPCO noisy speech database with 50 speakers.
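
To make the feature pipeline more concrete, the following is a minimal sketch of one common way to compute RASTA-filtered MFCCs, not the authors' implementation. The abstract does not specify the Quadrilateral filter shape, so the sketch uses the triangular mel filter bank that the paper treats as a baseline. The RASTA coefficients follow the widely quoted band-pass filter of Hermansky and Morgan. All function names, the sampling rate, the frame sizes, and the filter counts are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_mel_filterbank(n_filters, n_fft, sample_rate):
    """Baseline triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def rasta_filter(log_energies, pole=0.98):
    """Band-pass filter each channel's trajectory over time (axis 0).

    Numerator 0.1 * (2 + z^-1 - z^-3 - 2 z^-4), single pole at `pole`.
    The pole value is an assumption; some implementations use 0.94.
    """
    numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
    out = np.zeros_like(log_energies)
    for t in range(log_energies.shape[0]):
        acc = np.zeros(log_energies.shape[1])
        for k, b in enumerate(numer):
            if t - k >= 0:
                acc += b * log_energies[t - k]
        if t > 0:
            acc += pole * out[t - 1]
        out[t] = acc
    return out

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II along the last axis, keeping n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T

def rasta_mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
               n_fft=256, n_filters=20, n_coeffs=13):
    """Frame, window, mel-filter, log, RASTA-filter, then DCT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2
    bank = triangular_mel_filterbank(n_filters, n_fft, sample_rate)
    log_energies = np.log(power @ bank.T + 1e-10)  # small floor avoids log(0)
    return dct_ii(rasta_filter(log_energies), n_coeffs)
```

Because the RASTA numerator sums to zero, a constant offset in the log filter-bank energies decays to zero at the filter output. A fixed channel gain appears as exactly such an offset, which is the usual explanation for the robustness to convolutional noise. Swapping `triangular_mel_filterbank` for another filter shape changes only the `bank` matrix and leaves the rest of the pipeline untouched.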

Corresponding author

Correspondence to T. Senthur Selvi.

Cite this article

Selva Nidhyananthan, S., Shantha Selva Kumari, R. & Senthur Selvi, T. Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure. Wireless Pers Commun 91, 1321–1333 (2016). https://doi.org/10.1007/s11277-016-3530-3

  • DOI: https://doi.org/10.1007/s11277-016-3530-3
