The LIA RT’07 Speaker Diarization System

Fredouille, Corinne; Evans, Nicholas

doi:10.1007/978-3-540-68585-2_48

Corinne Fredouille¹ & Nicholas Evans¹,²

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4625)

Abstract

This paper presents the LIA submission to the speaker diarization task of the 2007 NIST Rich Transcription (RT’07) evaluation campaign. We report a system optimised for conference meeting recordings and experiments on all three RT’07 subdomains and microphone conditions. Results show that, despite state-of-the-art performance for the single distant microphone (SDM) condition, in its current form the system is not effective in utilising the additional information that is available with the multiple distant microphone (MDM) condition. With post-evaluation tuning we achieve a diarization error rate (DER) of 19% on the MDM task with conference meeting data. Some early experimental work highlights both the limitations and potential of utilising between-channel delay features for diarization.
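
To make the notion of a between-channel delay feature concrete, the following is a minimal sketch of estimating the time delay between two microphone channels. The abstract does not say how the LIA system computes its delays; the sketch assumes GCC-PHAT (generalised cross-correlation with phase transform), a commonly used estimator, and the function name `gcc_phat_delay` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_delay(ref, sig, max_lag):
    """Estimate the delay of `sig` relative to `ref`, in samples.

    Assumption: GCC-PHAT is used here for illustration only; it is not
    confirmed to be the estimator used in the paper.
    A positive result means `sig` lags behind `ref`.
    """
    ref = np.asarray(ref, dtype=float)
    sig = np.asarray(sig, dtype=float)

    # Zero-pad so the FFT-based (circular) correlation does not wrap around.
    n = len(ref) + len(sig)
    ref_spec = np.fft.rfft(ref, n=n)
    sig_spec = np.fft.rfft(sig, n=n)

    # Cross-power spectrum, normalised to unit magnitude (PHAT weighting),
    # so that only phase information contributes to the correlation peak.
    cross = sig_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12

    cc = np.fft.irfft(cross, n=n)

    # Reorder so the array covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

In a diarization setting, one such delay would typically be computed per analysis window for each channel pair, and the resulting delay vector used as a feature alongside, or instead of, acoustic features. The window length and lag range are tuning choices not specified here.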

Author information

Authors and Affiliations

LIA-University of Avignon, BP1228, 84911, Avignon Cedex 9, France
Corinne Fredouille & Nicholas Evans
University of Wales Swansea, Singleton Park, Swansea, SA2 8PP, UK
Nicholas Evans

Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

Cite this paper

Fredouille, C., Evans, N. (2008). The LIA RT’07 Speaker Diarization System. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. CLEAR 2007, RT 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_48

DOI: https://doi.org/10.1007/978-3-540-68585-2_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)
