LP spectra vs. Mel spectra for identification of professional mimics in Indian languages

Patil, Hemant A.; Basu, T. K.

doi:10.1007/s10772-009-9031-y

Published: 19 May 2009

Volume 11, pages 1–16 (2008)

International Journal of Speech Technology

Hemant A. Patil¹ &
T. K. Basu²

Abstract

Automatic Speaker Recognition (ASR) is an economical tool for voice biometrics because low-cost, powerful processors are readily available. For an ASR system to succeed in practical environments, it must have high mimic resistance, i.e., the system should not be defeated by determined mimics, who may be either identical twins or professional mimics. In this paper, we demonstrate the effectiveness of Linear Prediction (LP)-based features, viz., Linear Prediction Coefficients (LPC) and Linear Prediction Cepstral Coefficients (LPCC), over filterbank-based features such as Mel-Frequency Cepstral Coefficients (MFCC) and the newly proposed Teager energy-based MFCC (T-MFCC) for the identification of professional mimics in Indian languages. Results are reported for real and fictitious experiments. On the whole, LP-based features perform better than filterbank-based features: the average jump is 23.21% and 31.43% for fictitious experiments with a professional mimic in Marathi and Hindi, respectively, and 1.64% for real experiments with a professional mimic in Hindi. We believe that this is the first time such results on the identification of professional mimics in ASR have been obtained. The results are analyzed with the help of the Mean Square Error (MSE) between training and testing utterances for the mimic's imitations of target speakers and for the target speakers' normal voices. Fourier spectra and the corresponding LP spectra for a target speaker and the impersonations of that speaker provided by the professional mimic are shown to justify the results. Finally, the dependence of LPC on the physiological characteristics of the vocal tract, and its relation to the problem addressed in this paper, is studied.
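To make the two LP-based feature sets named in the abstract concrete, the following is a minimal sketch of the textbook autocorrelation method (Levinson-Durbin recursion) for LPC, followed by the standard LPC-to-cepstrum recursion for LPCC. It is not taken from the paper: the abstract does not state the LP order, frame length, windowing, or pre-emphasis the authors used, so the function names (`autocorrelation`, `lpc`, `lpcc`) and all parameters here are illustrative assumptions.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin recursion).

    Returns (a, err), where the predictor is
        s_hat[n] = a[0]*s[n-1] + a[1]*s[n-2] + ... + a[order-1]*s[n-order]
    and err is the final prediction-error energy.
    Assumption: the sign convention above; other texts negate a.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros); stop early
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        err *= 1.0 - k * k
    return a, err


def lpcc(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1/A(z).

    Uses the recursion c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k},
    with a_n = 0 for n > len(a). The gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

As a sanity check, a frame that decays as 0.9^n behaves like the impulse response of a single-pole filter, so `lpc(frame, 2)` should return coefficients close to [0.9, 0] and the resulting cepstrum should follow 0.9^n / n. In a real front end each frame would typically be pre-emphasized and windowed first; those steps are left out here because the abstract does not specify them.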

Author information

Authors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India
Hemant A. Patil
Department of Electrical Engineering, Indian Institute of Technology, 721302, Kharagpur, India
T. K. Basu

Corresponding author

Correspondence to Hemant A. Patil.

About this article

Cite this article

Patil, H.A., Basu, T.K. LP spectra vs. Mel spectra for identification of professional mimics in Indian languages. Int J Speech Technol 11, 1–16 (2008). https://doi.org/10.1007/s10772-009-9031-y

Received: 28 October 2006
Accepted: 21 April 2009
Published: 19 May 2009
Issue Date: March 2008
DOI: https://doi.org/10.1007/s10772-009-9031-y
