Abstract
The embedded linear transformation is a popular technique that integrates a feature transformation and a diagonal-covariance Gaussian mixture model (GMM) into a unified framework to improve speaker recognition performance. However, the number of mixture components in the GMM must be specified before model training. The cluster expectation-maximization (EM) algorithm is a well-known technique that instead treats the mixture number as a parameter to be estimated. This paper presents a new model that integrates an improved cluster algorithm into the estimation of a GMM with the embedded transformation. In this approach, the transformation matrix, the mixture number, and the other traditional model parameters are estimated simultaneously under a maximum likelihood criterion. The proposed method is evaluated on a database of three data sessions for text-independent speaker identification. The experiments show that it outperforms the traditional GMM trained with the cluster EM algorithm.
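The idea of treating the mixture number as an estimated parameter can be illustrated with a minimal sketch. This is not the method proposed in the paper: it omits the embedded transformation, works on one-dimensional data, and simply fits a GMM by EM for each candidate mixture number instead of merging clusters. The candidates are compared with an MDL-style penalized likelihood. All function names, the quantile initialization, and the variance floor are assumptions made for illustration.

```python
import math


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in data:
        density = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(max(density, 1e-300))  # guard against underflow
    return total


def fit_gmm_1d(data, k, iters=100, var_floor=1e-3):
    """Fit a k-component 1-D GMM by EM (hypothetical helper, not from the paper)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialization: means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in data:
            p = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            s = max(sum(p), 1e-300)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and floored variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return weights, means, variances, log_likelihood(data, weights, means, variances)


def mdl_score(loglik, k, n):
    """Negative log-likelihood plus a penalty on the free parameter count."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -loglik + 0.5 * n_params * math.log(n)


def select_mixture_number(data, k_max):
    """Return (best_k, fitted_params) minimizing the MDL-style score."""
    best = None
    for k in range(1, k_max + 1):
        params = fit_gmm_1d(data, k)
        score = mdl_score(params[3], k, len(data))
        if best is None or score < best[0]:
            best = (score, k, params)
    return best[1], best[2]
```

Calling `select_mixture_number(samples, k_max)` on a list of floats returns the chosen mixture number together with the fitted weights, means, variances, and log-likelihood. The exhaustive search over candidate values keeps the sketch short; a cluster-style algorithm would instead start from a large mixture and merge components.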
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Xu, L., Tang, Z., He, K., Qian, B. (2007). Transformation-Based GMM with Improved Cluster Algorithm for Speaker Identification. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_113
DOI: https://doi.org/10.1007/978-3-540-71701-0_113
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)