Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition

Huijbregts, Marijn; Ordelman, Roeland; de Jong, Franciska

doi:10.1007/978-3-540-77051-0_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4816)

Abstract

This paper reports on the setup and evaluation of the components of a robust speech recognition system geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed to generate speech transcripts for the NIST/TRECVID-2007 test collection, which is part of a Dutch real-life archive of news-related genres. Performance figures for this type of content are compared with figures for broadcast news test data.

Author information

Authors and Affiliations

University of Twente, Dept. of Electrical Engineering, Mathematics and Computer Science, P.O. Box 217, 7500 AE, Enschede, The Netherlands
Marijn Huijbregts, Roeland Ordelman & Franciska de Jong

Editor information

Bianca Falcidieno, Michela Spagnuolo, Yannis Avrithis, Ioannis Kompatsiaris, Paul Buitelaar

About this paper

Cite this paper

Huijbregts, M., Ordelman, R., de Jong, F. (2007). Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition. In: Falcidieno, B., Spagnuolo, M., Avrithis, Y., Kompatsiaris, I., Buitelaar, P. (eds) Semantic Multimedia. SAMT 2007. Lecture Notes in Computer Science, vol 4816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77051-0_8

DOI: https://doi.org/10.1007/978-3-540-77051-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77033-6
Online ISBN: 978-3-540-77051-0
eBook Packages: Computer Science, Computer Science (R0)
