
Comparison of Segmentation and Clustering Methods for Speaker Diarization of Broadcast Stream Audio

Conference paper in Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6800)


Abstract

This paper investigates approaches to segmenting media streams into speaker-homogeneous segments and approaches to clustering speakers within a speaker diarization system for broadcast audio. The evaluated segmentation approaches are all based on the widely used Bayesian Information Criterion (BIC). They differ in the strategy for choosing the window length (fixed or variable) and in the strategy for estimating the decision threshold (fixed or adaptive). Further, we compare two bottom-up clustering approaches: traditional BIC-based clustering, and an approach based on a distance measure between GMMs estimated for the cluster data by Maximum A Posteriori (MAP) adaptation.
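The BIC-based segmentation and clustering steps and the GMM-distance alternative described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes each segment or cluster is modelled by a single full-covariance Gaussian (the ΔBIC formulation of Chen and Gopalakrishnan, 1998) and shows only a fixed-length window with a fixed threshold of zero; the variable-window and adaptive-threshold variants compared in the paper are not shown. The function names, window sizes, penalty weight `lam` and relevance factor `r` are hypothetical illustrative choices. For the GMM-based clustering it assumes a diagonal-covariance background model (UBM) supplied by the caller, mean-only MAP adaptation in the style of Reynolds et al. (2000), and a weighted Mahalanobis-type distance between adapted means of the kind proposed by Ben et al. (2004).

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC for "x and y come from two Gaussians" vs. "one Gaussian".

    Both hypotheses use full-covariance Gaussians with ML estimates.
    A positive value favours two sources (speaker change / do not merge).
    """
    z = np.vstack([x, y])
    n, d = z.shape

    def logdet(a):
        # Log-determinant of the ML (biased) covariance estimate.
        return np.linalg.slogdet(np.cov(a, rowvar=False, bias=True))[1]

    r = 0.5 * (n * logdet(z) - len(x) * logdet(x) - len(y) * logdet(y))
    # Penalty: parameter count of one extra full-covariance Gaussian.
    p = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return r - lam * p


def detect_changes(feats, win=200, step=10, margin=20, lam=1.0):
    """Fixed-length window segmentation with a fixed threshold of 0.

    The defaults (in frames) are hypothetical illustrative values.
    """
    changes, start = [], 0
    while start + win <= len(feats):
        w = feats[start:start + win]
        cands = range(margin, win - margin, step)
        scores = [delta_bic(w[:i], w[i:], lam) for i in cands]
        best = int(np.argmax(scores))
        if scores[best] > 0:
            cp = start + cands[best]
            changes.append(cp)
            start = cp  # restart the window at the detected change
        else:
            start += win // 2  # no change found: shift by half a window
    return changes


def agglomerate(segments, score, threshold):
    """Bottom-up clustering: merge the lowest-scoring pair while its
    score is below `threshold`. Returns lists of segment indices."""
    clusters = [np.asarray(s) for s in segments]
    labels = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = score(clusters[i], clusters[j])
                if s < best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        labels[i] += labels[j]
        del clusters[j], labels[j]
    return labels


def map_adapt_means(w, mu, var, x, r=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to data x.

    r is the relevance factor (an assumed value, not from the paper).
    """
    # Log of weighted component likelihoods, shape (frames, components).
    ll = np.log(w) - 0.5 * (((x[:, None, :] - mu) ** 2 / var).sum(-1)
                            + np.log(2 * np.pi * var).sum(-1))
    post = np.exp(ll - ll.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)  # component posteriors
    n = post.sum(axis=0)  # soft frame counts per component
    ex = post.T @ x / np.maximum(n, 1e-10)[:, None]  # posterior-weighted means
    alpha = (n / (n + r))[:, None]  # data-dependent adaptation weight
    return alpha * ex + (1 - alpha) * mu


def gmm_mean_distance(mu1, mu2, w, var):
    """Weighted Mahalanobis-type distance between two mean-adapted GMMs
    that share the UBM weights and variances."""
    return float(np.sqrt((w[:, None] * (mu1 - mu2) ** 2 / var).sum()))
```

Traditional BIC clustering corresponds to `agglomerate(segments, delta_bic, 0.0)`. The GMM variant instead passes a score that MAP-adapts the UBM to each of the two clusters and returns `gmm_mean_distance` of the adapted means; its merge threshold has no natural zero point and would have to be tuned on development data.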

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Prazak, J., Silovsky, J. (2011). Comparison of Segmentation and Clustering Methods for Speaker Diarization of Broadcast Stream Audio. In: Esposito, A., Vinciarelli, A., Vicsi, K., Pelachaud, C., Nijholt, A. (eds) Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues. Lecture Notes in Computer Science, vol 6800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25775-9_21


  • DOI: https://doi.org/10.1007/978-3-642-25775-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25774-2

  • Online ISBN: 978-3-642-25775-9

  • eBook Packages: Computer Science, Computer Science (R0)
