LP spectra vs. Mel spectra for identification of professional mimics in Indian languages

International Journal of Speech Technology

Abstract

Automatic Speaker Recognition (ASR) is an economical tool for voice biometrics because low-cost, powerful processors are available. For an ASR system to succeed in practical environments, it must have high mimic resistance, i.e., it should not be defeated by determined mimics, who may be either identical twins or professional mimics. In this paper, we demonstrate the effectiveness of Linear Prediction (LP)-based features, viz., Linear Prediction Coefficients (LPC) and Linear Prediction Cepstral Coefficients (LPCC), over filterbank-based features such as Mel-Frequency Cepstral Coefficients (MFCC) and the newly proposed Teager energy-based MFCC (T-MFCC) for the identification of professional mimics in Indian languages. Results are reported for real and fictitious experiments. On the whole, LP-based features perform better than filterbank-based features: the average improvement is 23.21% and 31.43% for fictitious experiments with a professional mimic in Marathi and Hindi, respectively, and 1.64% for real experiments with a professional mimic in Hindi. We believe that this is the first time such results on the identification of professional mimics in ASR have been obtained. The results are analyzed using the Mean Square Error (MSE) between training and testing utterances for the mimic's imitations of target speakers and for the target speakers' normal voices. Fourier spectra and the corresponding LP spectra for a target speaker and for the impersonations of that speaker by the professional mimic are shown to justify the results. Finally, the dependence of LPC on the physiological characteristics of the vocal tract, and its relation to the problem addressed in this paper, is studied.
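For readers unfamiliar with the LP-based features named in the abstract, the following is a minimal sketch of how LPC and LPCC are commonly computed (autocorrelation method with the Levinson-Durbin recursion, followed by the standard LPC-to-cepstrum recursion), together with a plain MSE between two feature vectors. It is not taken from the paper: the function names, the predictor sign convention x̂[n] = Σ a_k x[n−k], and the absence of pre-emphasis and windowing are all assumptions made for illustration, and the abstract does not state the frame length or LP order actually used.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not the authors' implementation.

def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame (no windowing here)."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (a[1..order], residual energy).

    Convention assumed: x_hat[n] = sum_k a[k] * x[n-k].
    """
    if r[0] <= 0.0:
        raise ValueError("frame has zero energy")
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def lpc_to_lpcc(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of 1/A(z); the gain term c[0] is omitted."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]

def mse(u, v):
    """Mean square error between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)

# Toy check on a first-order decaying signal x[n] = 0.9**n (not speech data).
toy_frame = [0.9 ** n for n in range(200)]
toy_lpc, toy_err = levinson_durbin(autocorr(toy_frame, 1), 1)
toy_lpcc = lpc_to_lpcc(toy_lpc, 3)
```

In such a setup, `mse` would be applied to feature vectors from a training utterance and a testing utterance; a smaller value indicates that the two utterances look more alike to the chosen feature set.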



Author information

Corresponding author

Correspondence to Hemant A. Patil.

About this article

Cite this article

Patil, H.A., Basu, T.K. LP spectra vs. Mel spectra for identification of professional mimics in Indian languages. Int J Speech Technol 11, 1–16 (2008). https://doi.org/10.1007/s10772-009-9031-y

  • DOI: https://doi.org/10.1007/s10772-009-9031-y
