Abstract
Automatic Speaker Recognition (ASR) is an economical tool for voice biometrics because low-cost, powerful processors are widely available. For an ASR system to succeed in practical environments, it must have high mimic resistance, i.e., the system should not be defeated by determined mimics, who may be either identical twins or professional mimics. In this paper, we demonstrate the effectiveness of Linear Prediction (LP)-based features, viz., Linear Prediction Coefficients (LPC) and Linear Prediction Cepstral Coefficients (LPCC), over filterbank-based features such as Mel-Frequency Cepstral Coefficients (MFCC) and the newly proposed Teager energy-based MFCC (T-MFCC) for the identification of professional mimics in Indian languages. Results are reported for real and fictitious experiments. On the whole, LP-based features perform better than filterbank-based features: there is an average improvement of 23.21% and 31.43% for fictitious experiments with a professional mimic in Marathi and Hindi, respectively, and an average improvement of 1.64% for real experiments with a professional mimic in Hindi. We believe that this is the first time such results on the identification of professional mimics in ASR have been obtained. The results are analyzed using the Mean Square Error (MSE) between training and testing utterances, both for the mimic's imitations of the target speakers and for the target speakers' normal voices. Fourier spectra and the corresponding LP spectra for a target speaker and for the professional mimic's impersonations of that speaker are shown to justify the results. Finally, the dependence of LPC on the physiological characteristics of the vocal tract, and its relevance to the problem addressed in this paper, is studied.
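For readers unfamiliar with the LP-based features named above, the following is a minimal sketch of how LPC and LPCC can be computed for one speech frame. It is not taken from the paper: the abstract does not state the frame length, window, or predictor order, so the sketch assumes the textbook autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion. The function names `lpc` and `lpcc` are hypothetical.

```python
import numpy as np


def lpc(frame, order):
    """Estimate LP coefficients a_1..a_p with the autocorrelation method.

    Assumed convention (not from the paper): x[n] ~ sum_k a_k * x[n-k],
    i.e. the all-pole model is G / (1 - sum_k a_k z^-k).
    Returns (a, err), where err is the final prediction-error energy.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Autocorrelation lags r[0..order] of the (already windowed) frame.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for the step from order i to order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        prev = a[:i].copy()
        a[:i] = prev - k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpcc(a, n_ceps):
    """Convert LP coefficients to cepstral coefficients c_1..c_n_ceps.

    Uses the standard recursion for an all-pole model; the gain term c_0
    is omitted in this sketch.
    """
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

As a sanity check under these assumptions, a frame equal to the impulse response of a single-pole filter, `0.5 ** np.arange(64)`, should yield a first LP coefficient very close to 0.5 and a second close to zero, and the cepstral coefficients should follow `0.5**n / n`.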
Cite this article
Patil, H.A., Basu, T.K. LP spectra vs. Mel spectra for identification of professional mimics in Indian languages. Int J Speech Technol 11, 1–16 (2008). https://doi.org/10.1007/s10772-009-9031-y
DOI: https://doi.org/10.1007/s10772-009-9031-y