Abstract
This paper reports on the setup and evaluation of the components of a robust speech recognition system geared towards transcript generation for heterogeneous, real-life media collections. The system is deployed to generate speech transcripts for the NIST/TRECVID-2007 test collection, which is part of a Dutch real-life archive of news-related genres. Performance figures for this type of content are compared with figures for broadcast news test data.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huijbregts, M., Ordelman, R., de Jong, F. (2007). Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition. In: Falcidieno, B., Spagnuolo, M., Avrithis, Y., Kompatsiaris, I., Buitelaar, P. (eds) Semantic Multimedia. SAMT 2007. Lecture Notes in Computer Science, vol 4816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77051-0_8
DOI: https://doi.org/10.1007/978-3-540-77051-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77033-6
Online ISBN: 978-3-540-77051-0
eBook Packages: Computer Science, Computer Science (R0)