This paper reports recent work at ORL on the segmentation of digital audio/video recordings. First, we describe an audio segmentation algorithm that partitions a soundtrack into segments of a manageable size for speech recognition. Second, we present an algorithm for detecting the locations of camera shot breaks in the video. The outputs of these two algorithms are combined to produce a semantically meaningful segmentation of audio/video content that is suitable for information retrieval. We report the success of both algorithms in the context of television news retrieval.
Cite as: Pye, D., Hollinghurst, N.J., Mills, T.J., Wood, K.R. (1998) Audio-visual segmentation for content-based retrieval. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0517, doi: 10.21437/ICSLP.1998-598
@inproceedings{pye98_icslp, author={David Pye and Nicholas J. Hollinghurst and Timothy J. Mills and Kenneth R. Wood}, title={{Audio-visual segmentation for content-based retrieval}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0517}, doi={10.21437/ICSLP.1998-598} }