Abstract
We present a method for motion-based video segmentation and segment classification as a step towards video summarization. The sequential segmentation of the video is performed by detecting changes in the dominant image motion, assumed to be related to camera motion and represented by a 2D affine model. The detection is achieved by analysing the temporal variations of coefficients of the robustly estimated 2D affine model. The obtained video segments supply reasonable temporal units to be further classified. For the second stage, we adopt a statistical representation of the residual motion content of the video scene, relying on the distribution of temporal co-occurrences of local motion-related measurements. Pre-identified classes of dynamic events are learned off-line from a training set of video samples of the genre of interest. Each video segment is then classified according to a Maximum Likelihood criterion. Finally, excerpts of the relevant classes can be selected for video summarization. Experiments on both steps of the method are reported for different video genres, yielding very encouraging results even though only low-level motion information is exploited.
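The main ingredients of the pipeline can be sketched in a few lines. The snippet below is an illustrative sketch, not the authors' implementation: it shows (i) the standard six-parameter 2D affine motion model used to represent dominant (camera) motion, (ii) a temporal co-occurrence histogram over quantized local motion-related measurements, and (iii) segment classification by a Maximum Likelihood criterion against class distributions assumed to have been learned off-line. All function names and the quantization scheme are hypothetical.

```python
import numpy as np

def affine_flow(params, x, y):
    """Six-parameter 2D affine motion model:
    u(x, y) = a1 + a2*x + a3*y,  v(x, y) = a4 + a5*x + a6*y."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y

def temporal_cooccurrence(q_prev, q_curr, n_levels):
    """Normalized distribution of temporal co-occurrences of quantized
    local motion measurements between two successive frames."""
    h = np.zeros((n_levels, n_levels))
    for i, j in zip(q_prev.ravel(), q_curr.ravel()):
        h[i, j] += 1.0
    return h / h.sum()

def classify_segment(hist, class_models, eps=1e-9):
    """Maximum Likelihood classification: pick the event class whose learned
    co-occurrence distribution best explains the observed histogram."""
    scores = {c: float(np.sum(hist * np.log(m + eps)))
              for c, m in class_models.items()}
    return max(scores, key=scores.get)

# Toy usage: a pure translation (a1=1, a4=2) predicts the same flow everywhere.
u, v = affine_flow((1, 0, 0, 2, 0, 0), np.array([3.0]), np.array([4.0]))

# Two tiny quantized "residual motion" maps and a two-class ML decision.
q_prev = np.array([[0, 1], [1, 0]])
q_curr = np.array([[1, 1], [0, 0]])
hist = temporal_cooccurrence(q_prev, q_curr, n_levels=2)
models = {"static": np.full((2, 2), 0.25),
          "dynamic": np.array([[0.7, 0.1], [0.1, 0.1]])}
label = classify_segment(np.array([[1.0, 0.0], [0.0, 0.0]]), models)
```

In the paper's first stage, shot changes are detected from the temporal variations of the estimated affine coefficients; the sketch above only illustrates the model being estimated, not the change-detection statistics themselves.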
Cite this article
Peyrard, N., Bouthemy, P. Motion-Based Selection of Relevant Video Segments for Video Summarization. Multimed Tools Appl 26, 259–276 (2005). https://doi.org/10.1007/s11042-005-0891-0