Elsevier

Signal Processing

Volume 80, Issue 6, June 2000, Pages 1049-1067

A fuzzy video content representation for video summarization and content-based retrieval

https://doi.org/10.1016/S0165-1684(00)00019-0

Abstract

In this paper, a fuzzy representation of visual content is proposed, which is useful for the new emerging multimedia applications, such as content-based image indexing and retrieval, video browsing and summarization. In particular, a multidimensional fuzzy histogram is constructed for each video frame based on a collection of appropriate features, extracted using video sequence analysis techniques. This approach is then applied both for video summarization, in the context of a content-based sampling algorithm, and for content-based indexing and retrieval. In the first case, video summarization is accomplished by discarding shots or frames of similar visual content so that only a small but meaningful amount of information is retained (key-frames). In the second case, a content-based retrieval scheme is investigated, so that the most similar images to a query are extracted. Experimental results and comparison with other known methods are presented to indicate the good performance of the proposed scheme on real-life video recordings.

Introduction

The increasing amount of digital image and video data has stimulated new technologies for efficiently searching, indexing, retrieving and managing multimedia databases. The traditional approach of keyword annotation for accessing image or video information has the drawback that, apart from the large effort required to develop annotations, text alone cannot efficiently characterize rich visual content. For this reason, content-based retrieval algorithms have recently been proposed and have attracted great research interest in the image processing community [10], [19]. Examples of content-based retrieval systems, either academic or in the first stage of commercial exploitation, include the QBIC [8], Virage [11] and VisualSeek [21] prototypes. In this framework, the Moving Picture Experts Group (MPEG) is currently defining the new MPEG-7 standard [17], which specifies a set of descriptors for an efficient interface to multimedia information.

The aforementioned systems are mainly restricted to still images and cannot easily be applied to video databases [4]. This is because the standard representation of video as a sequence of consecutive frames results in significant temporal redundancy of visual content, so performing queries on every video frame is very inefficient and time-consuming. Furthermore, most video databases are located on distributed platforms and impose both large storage and transmission bandwidth requirements, even when compressed. Such a linear representation of video sequences is also inadequate for the new emerging multimedia applications, such as video browsing and content-based indexing and retrieval. For this reason, a content-based sampling algorithm is usually applied to video data to extract a small but “meaningful” amount of the video information [3], [13]. This results in a video summarization scheme similar to that used in document search engines, where a brief text summary corresponds to one or more documents.

However, efficient implementation of content-based retrieval algorithms and video summarization schemes requires a more meaningful representation of visual content than the traditional pixel-based one. This is due to the fact that there is a lack of semantic information at the pixel level. For this reason, several works have been presented in the literature towards a more efficient image/video representation. A hidden Markov model has been investigated in [15] for color image retrieval, while in [5] an approach of image retrieval based on user sketches has been reported. A hierarchical color clustering method has been presented in [22]. For video summarization, construction of a compact image map or image mosaics has been described in [13], while a pictorial summary of video sequences based on story units has been presented in [24].

In the context of this paper, a fuzzy representation of visual content is proposed for both video summarization and content-based indexing and retrieval. This representation increases the flexibility of content-based retrieval systems, since it provides an interpretation closer to human perception [14]. It also yields a more robust description of visual content, since possible instabilities of the segmentation used to describe the visual content are reduced. In particular, the adopted fuzzy representation is applied to both video summarization and content-based retrieval. In the first case, a small set of key-frames is extracted which provides an efficient description of visual content. This is performed by minimizing a cross-correlation criterion among the video frames by means of a genetic algorithm. The correlation is computed over several features, extracted through a color/motion segmentation, on a fuzzy feature vector basis. In the second case, the user provides queries in the form of images or sketches, which are analyzed in the same way as video frames in the video summarization scheme. A metric distance or similarity measure is then used to find the set of frames that best match the user's query.
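The key-frame selection objective described above can be made concrete with a small sketch: score a candidate subset of frames by the average pairwise correlation of their feature vectors, and search for the least-correlated subset. The exhaustive search and the toy feature vectors below are illustrative stand-ins only; the paper uses a genetic algorithm over segment-based fuzzy features to make the search tractable.

```python
import itertools

import numpy as np


def pairwise_correlation(vecs):
    """Average absolute pairwise correlation of a set of feature vectors."""
    pairs = list(itertools.combinations(range(len(vecs)), 2))
    return sum(abs(np.corrcoef(vecs[i], vecs[j])[0, 1]) for i, j in pairs) / len(pairs)


def select_key_frames(features, k):
    """Pick the k frames whose feature vectors are least correlated.

    Exhaustive search for illustration only; for realistic shot lengths
    the paper minimizes this criterion with a genetic algorithm.
    """
    best, best_score = None, float("inf")
    for combo in itertools.combinations(range(len(features)), k):
        score = pairwise_correlation([features[i] for i in combo])
        if score < best_score:
            best, best_score = combo, score
    return best
```

With three toy frames where the first two are perfectly correlated and the third is uncorrelated with the first, `select_key_frames(features, 2)` prefers the uncorrelated pair, which is exactly the "discard similar content" behavior the scheme aims at.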

This paper is organized as follows. In Section 2, the video sequences are analyzed by applying a color/motion segmentation algorithm. The extracted features of each color or motion segment are fuzzy-classified, as presented in Section 3. Application of the proposed fuzzy representation scheme to video summarization is discussed in Section 4, while its application to content-based retrieval is discussed in Section 5. Several practical implementation issues, such as selected parameters and numerical values, are also covered in these sections. Experimental results on a large image/video database are presented in Section 6, along with comparisons with other known techniques that show the good performance of the proposed scheme. Finally, Section 7 concludes the paper.

Section snippets

Video sequence analysis

Semantic segmentation, i.e., extraction of meaningful entities, is essential in a content-based retrieval environment. However, it remains one of the most difficult problems in the image analysis community, especially if no constraints are imposed on the kind of video sequences examined [6], [7], [9]. For this reason, a color/motion segmentation algorithm is applied in this paper for visual content description.

A multiresolution implementation of the recursive shortest spanning tree (RSST)
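The core RSST idea, iteratively merging the pair of neighboring regions whose features are most similar, can be sketched in one dimension. A 1-D signal stands in for an image here, and the mean-value merge criterion is an illustrative assumption; it is not the paper's multiresolution implementation.

```python
def rsst_segments(values, n_segments):
    """Merge adjacent regions with the smallest mean-value difference
    until n_segments remain (a 1-D sketch of RSST region merging)."""
    # each region: [sum of values, pixel count, member indices]
    regions = [[v, 1, [i]] for i, v in enumerate(values)]
    while len(regions) > n_segments:
        # find the adjacent pair whose region means are closest
        diffs = [abs(regions[i][0] / regions[i][1] - regions[i + 1][0] / regions[i + 1][1])
                 for i in range(len(regions) - 1)]
        j = diffs.index(min(diffs))
        a, b = regions[j], regions[j + 1]
        # merge the winning pair into a single region
        regions[j:j + 2] = [[a[0] + b[0], a[1] + b[1], a[2] + b[2]]]
    return [r[2] for r in regions]
```

On the toy signal `[1, 1, 1, 9, 9, 9]` with `n_segments=2`, the merges naturally recover the two homogeneous regions, which is the behavior the full 2-D algorithm exploits to form color and motion segments.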

Fuzzy visual content representation

The size, location and average color components of all color segments are used as color properties. In a similar way, motion properties include the size, location and average motion vectors of all motion segments. Since the segment number is not constant for each video frame, the aforementioned properties cannot be directly included in a feature vector, because the size of this vector is not constant. Thus, direct comparison between vectors of different frames is practically impossible. For
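One way to see how fuzzy classification turns a variable number of segments into a fixed-size descriptor: each segment's feature value is spread, weighted by segment size, over a fixed set of overlapping fuzzy partitions, and the accumulated memberships form the histogram. The triangular membership functions and the single 1-D feature below are illustrative assumptions standing in for the paper's multidimensional color/motion features.

```python
def triangular_memberships(x, centers):
    """Membership of x in each overlapping triangular fuzzy set;
    the outermost sets have flat shoulders."""
    mu = []
    last = len(centers) - 1
    for i, c in enumerate(centers):
        if x == c:
            mu.append(1.0)
        elif x < c:
            if i == 0:
                mu.append(1.0)  # left shoulder
            else:
                left = centers[i - 1]
                mu.append(max(0.0, (x - left) / (c - left)))
        else:
            if i == last:
                mu.append(1.0)  # right shoulder
            else:
                right = centers[i + 1]
                mu.append(max(0.0, (right - x) / (right - c)))
    return mu


def fuzzy_histogram(segments, centers):
    """Accumulate size-weighted fuzzy memberships of segment features.

    segments: list of (feature_value, size) pairs -- a 1-D stand-in for
    the paper's multidimensional segment feature vectors. The result has
    len(centers) bins regardless of how many segments a frame contains.
    """
    hist = [0.0] * len(centers)
    total = sum(size for _, size in segments)
    for value, size in segments:
        for i, m in enumerate(triangular_memberships(value, centers)):
            hist[i] += m * size / total
    return hist
```

Because adjacent memberships overlap, a small perturbation of a segment's feature shifts mass smoothly between neighboring bins instead of jumping between them, which is the robustness-to-segmentation-instability argument made above.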

Video summarization

Fig. 5 depicts the block diagram of the proposed video summarization scheme. Since a video sequence is a collection of different shots, each of which corresponds to a continuous action of a single camera operation, a shot cut detection algorithm is first applied, to identify video frames of similar visual content. The algorithm proposed in [22] has been adopted for this purpose, since it presents high accuracy and low computational complexity compared to other techniques [3]. Then, analyzing
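As a minimal stand-in for the shot-cut detection step (the paper adopts the more elaborate detector of [22]), the classic baseline flags a cut wherever the L1 distance between consecutive frame histograms exceeds a threshold; the threshold value below is an illustrative assumption.

```python
def detect_shot_cuts(frame_hists, threshold=0.5):
    """Flag a cut at frame t when the L1 distance between the
    histograms of frames t-1 and t exceeds the threshold."""
    cuts = []
    for t in range(1, len(frame_hists)):
        d = sum(abs(a - b) for a, b in zip(frame_hists[t - 1], frame_hists[t]))
        if d > threshold:
            cuts.append(t)
    return cuts
```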

Content-based retrieval

The problem of content-based retrieval from image and video databases is discussed in this section. Particularly, for content-based video retrieval the aforementioned video summarization scheme is applied so that all the redundant temporal video information is discarded. At this point, the problem of content-based retrieval from a video database has actually reduced to still image retrieval since video queries are applied on the selected key-frames. The proposed fuzzy representation scheme is
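Once every key-frame carries a fixed-size fuzzy histogram, retrieval reduces to ranking key-frames by their distance to the query's histogram. The L1 metric in the sketch below is an assumption for illustration; the paper's similarity measure may differ.

```python
def retrieve(query_hist, keyframe_hists, top_k=3):
    """Return the indices of the top_k key-frames whose fuzzy
    histograms are closest to the query's (smaller L1 distance
    means more similar visual content)."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    ranked = sorted(range(len(keyframe_hists)),
                    key=lambda i: l1(query_hist, keyframe_hists[i]))
    return ranked[:top_k]
```

A query image or sketch would first be analyzed into the same fuzzy histogram form as the key-frames, so the same function serves both image and video retrieval.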

Experimental results

The proposed fuzzy representation of visual content has been evaluated both for video summarization and content-based indexing and retrieval, using a large database consisting of MPEG coded video sequences and several images compressed in JPEG format. The Optibase Fusion MPEG encoder at a bit-rate of 2 Mbits/s has been used to encode the video sequences.

Fig. 11 illustrates a shot used to demonstrate the performance of the key-frame extraction algorithm. The shot comes from an educational series

Conclusions

A new approach for efficient visual content representation has been presented in this paper. In particular, in the proposed framework, the traditional pixel-based representation of visual content is transformed to a fuzzy feature-based one, which is more suitable for the new emerging multimedia applications, such as video browsing, content-based image indexing and retrieval and video summarization. First, an analysis of video sequences is performed by applying a color/motion segmentation

Acknowledgements

The authors would like to thank Georgios Akrivas, for providing them with an efficient implementation of the key-frame selection technique presented in [23].

References (24)

  • A. Alatan, L. Onural, M. Wollborn, R. Mech, E. Tuncel, T. Sikora, Image sequence analysis for emerging interactive...
  • Y. Avrithis, A. Doulamis, N.D. Doulamis, S. Kollias, An adaptive approach to video indexing and retrieval using fuzzy...
  • Y. Avrithis, A. Doulamis, N. Doulamis, S. Kollias, A stochastic framework for optimal key frame extraction from MPEG...
  • S.-F. Chang, W. Chen, H.J. Meng, H. Sundaram, D. Zhong, A fully automated content-based video search engine supporting...
  • A. Del Bimbo, P. Pala, Visual image retrieval by elastic matching of user sketches, IEEE Trans. Pattern Anal. Mach....
  • N. Doulamis, A. Doulamis, D. Kalogeras, S. Kollias, Very low bit-rate coding of image sequences using adaptive regions...
  • A. Doulamis, N. Doulamis, S. Kollias, On line retrainable neural networks: improving the performance of neural networks...
  • M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D....
  • L. Garrido, F. Marques, M. Pardas, P. Salembier, V. Vilaplana, A hierarchical technique for image sequence analysis, in...
  • V.N. Gudivada, J.V. Raghavan (Eds.), Special Issue on Content-Based Image Retrieval Systems, IEEE Comput. Mag. 28 (9)...
  • A. Hamrapur, A. Gupta, B. Horowitz, C.F. Shu, C. Fuller, J. Bach, M. Gorkani, R. Jain, Virage video engine, SPIE...
  • S. Haykin, Adaptive Filter Theory (1996)