Comparison of Segmentation and Clustering Methods for Speaker Diarization of Broadcast Stream Audio

Prazak, Jan; Silovsky, Jan

doi:10.1007/978-3-642-25775-9_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6800)

Abstract

This paper investigates approaches to segmenting media streams into speaker-homogeneous segments and approaches to clustering speakers within a speaker diarization system for broadcast audio. All evaluated segmentation approaches are based on the widely used Bayesian Information Criterion (BIC). They differ in how the length of the analysis window is chosen (fixed or variable) and in how the decision threshold is estimated (fixed or adaptive). We also compare two bottom-up clustering approaches: traditional BIC-based clustering, and an approach based on a measure of the distance between GMMs estimated from each cluster's data by Maximum A Posteriori (MAP) adaptation.
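To make the BIC-based segmentation idea concrete, the following is a minimal sketch of the commonly used delta-BIC test for a single speaker change point inside one fixed-length window. It is not the authors' implementation. It assumes diagonal-covariance Gaussians (so it runs with the Python standard library only), a fixed penalty weight, and a fixed window; the function names `log_det_diag`, `delta_bic`, and `find_change_point` and the parameters `margin` and `penalty_weight` are hypothetical names chosen for illustration.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of the ML diagonal covariance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # variance floor avoids log(0)
    return total


def delta_bic(frames, i, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` at index i (two Gaussians vs. one).

    A positive value favours the two-model hypothesis, i.e. a change at i.
    """
    n = len(frames)
    dim = len(frames[0])
    left, right = frames[:i], frames[i:]
    # Log-likelihood gain of modelling the two halves separately.
    gain = 0.5 * (
        n * log_det_diag(frames)
        - len(left) * log_det_diag(left)
        - len(right) * log_det_diag(right)
    )
    # Extra parameters of the second model: dim means + dim variances
    # (assumption: diagonal covariance; full covariance has more).
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def find_change_point(frames, margin=10, penalty_weight=1.0):
    """Return (index, score) of the best change point, or (None, 0.0) if none is positive."""
    best_i, best_score = None, 0.0
    for i in range(margin, len(frames) - margin):
        score = delta_bic(frames, i, penalty_weight)
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


if __name__ == "__main__":
    # Synthetic two-"speaker" window: the mean shifts after frame 100.
    rng = random.Random(0)
    first = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    second = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(100)]
    print(find_change_point(first + second))
```

In this sketch the penalty weight plays the role of the decision threshold. The fixed and adaptive threshold strategies and the fixed and variable window strategies named in the abstract would replace the constant `penalty_weight` and the single fixed window used here.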

Author information

Authors and Affiliations

Institute of Information Technology and Electronics, Faculty of Mechatronics, Technical University of Liberec, Studentska 2, CZ 46117, Liberec, Czech Republic
Jan Prazak & Jan Silovsky

Editor information

Editors and Affiliations

Dept. of Psychology and IIASS, International Institute for Advanced Scientific Studies, Second University of Naples, Vietri sul Mare, SA, Italy
Anna Esposito
School of Computing Science, University of Glasgow, Glasgow, UK
Alessandro Vinciarelli
Department of Telecommunication and Media Informatics, Laboratory of Speech Acoustics, Budapest University of Technology and Economics, 1117, Budapest, Hungary
Klára Vicsi
TELECOM ParisTech, CNRS-LTCI UMR 5141, 75014, Paris, France
Catherine Pelachaud
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7500 AE, Enschede, The Netherlands
Anton Nijholt

About this paper

Cite this paper

Prazak, J., Silovsky, J. (2011). Comparison of Segmentation and Clustering Methods for Speaker Diarization of Broadcast Stream Audio. In: Esposito, A., Vinciarelli, A., Vicsi, K., Pelachaud, C., Nijholt, A. (eds) Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues. Lecture Notes in Computer Science, vol 6800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25775-9_21

DOI: https://doi.org/10.1007/978-3-642-25775-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25774-2
Online ISBN: 978-3-642-25775-9
eBook Packages: Computer Science, Computer Science (R0)
