Abstract
This paper investigates approaches to segmenting media streams into speaker-homogeneous segments and to clustering speakers within a speaker diarization system for broadcast audio. All evaluated segmentation approaches are based on the widely used Bayesian Information Criterion (BIC). They differ in how the length of the analysis window is chosen (fixed or variable) and in how the decision threshold is estimated (fixed or adaptive). We further compare two bottom-up clustering approaches: traditional BIC-based clustering and an approach based on a distance measure between GMMs estimated from each cluster's data by Maximum A Posteriori (MAP) adaptation.
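To make the BIC-based change detection mentioned above concrete, the following is a minimal sketch of the generic ΔBIC test for a single analysis window. It is not the authors' implementation: the abstract does not give their window or threshold strategies, so the function names, the `margin` and `lam` parameters, and the use of diagonal covariances (chosen only to keep the example free of external libraries) are all assumptions made for illustration.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of the ML diagonal covariance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor the variance to avoid log(0)
    return total


def delta_bic(frames, i, lam=1.0):
    """Delta-BIC for a hypothesized speaker change before frame i.

    Positive values favor modelling the window with two Gaussians
    (a change at i) over a single Gaussian. Assumption: diagonal
    covariances, so each Gaussian has 2 * dim free parameters.
    """
    n = len(frames)
    dim = len(frames[0])
    left, right = frames[:i], frames[i:]
    gain = 0.5 * (
        n * log_det_diag(frames)
        - len(left) * log_det_diag(left)
        - len(right) * log_det_diag(right)
    )
    penalty = lam * 0.5 * (2 * dim) * math.log(n)
    return gain - penalty


def best_change_point(frames, margin=10, lam=1.0):
    """Return (index, score) of the best change candidate, or (None, score) if none is positive."""
    best_i, best_score = None, float("-inf")
    for i in range(margin, len(frames) - margin + 1):
        score = delta_bic(frames, i, lam)
        if score > best_score:
            best_i, best_score = i, score
    if best_score > 0:
        return best_i, best_score
    return None, best_score


if __name__ == "__main__":
    # Synthetic two-"speaker" window: the mean shifts after frame 100.
    random.seed(0)
    first = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
    second = [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(100)]
    print(best_change_point(first + second))
```

In this sketch the penalty weight `lam` plays the role of a fixed decision threshold, and the length of `frames` corresponds to a fixed window; the variable-window and adaptive-threshold strategies compared in the paper would change how those two quantities are chosen.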
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prazak, J., Silovsky, J. (2011). Comparison of Segmentation and Clustering Methods for Speaker Diarization of Broadcast Stream Audio. In: Esposito, A., Vinciarelli, A., Vicsi, K., Pelachaud, C., Nijholt, A. (eds) Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues. Lecture Notes in Computer Science, vol 6800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25775-9_21
DOI: https://doi.org/10.1007/978-3-642-25775-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25774-2
Online ISBN: 978-3-642-25775-9
eBook Packages: Computer Science, Computer Science (R0)