Elsevier

Pattern Recognition

Volume 39, Issue 10, October 2006, Pages 1892-1904
Multivariate image similarity in the compressed domain using statistical graph matching

https://doi.org/10.1016/j.patcog.2006.04.015

Abstract

We address the problem of image similarity in the compressed domain, using a multivariate statistical test for comparing color distributions. Our approach is based on the multivariate Wald–Wolfowitz test, a nonparametric test that assesses the commonality between two different sets of multivariate observations. Using pre-selected feature attributes, the similarity measure provides a comprehensive estimate of the match between different images, based on graph theory and the notion of the minimal spanning tree (MST). Features are extracted directly from the JPEG discrete cosine transform (DCT) domain, without full decompression or inverse DCT. Based on the zig-zag scheme, a novel selection technique is introduced that enhances the invariance of the image representation to geometric transformations. To demonstrate the performance of the proposed method, its application to a diverse collection of images has been systematically studied in a query-by-example image retrieval task. Experimental results show that a powerful measure of similarity between compressed images can emerge from the statistical comparison of their pattern representations.

Introduction

Over the last two decades, the problem of image similarity has become a challenging task in the machine vision community. The estimation of (dis)similarity between color images or parts of images has been studied across different application domains, such as image retrieval and indexing, classification and unsupervised segmentation [1]. In these directions, several (dis)similarity measures have been developed and applied to empirical estimates of the distribution of image features, confirming that distribution-based measures exhibit excellent performance in all of these areas. Estimating the similarity between color images involves three steps: a feature space has to be chosen; low-level features need to be extracted so as to construct a theoretically sound distribution; and, finally, a measure needs to be chosen that properly assesses the difference or commonality of the represented distributions [1].

Multidimensional distributions are often used in computer vision applications to represent the extracted color and texture features of a given image in a properly defined feature space. The most suitable representation for color information is the color histogram [2], a nonparametric estimate of the feature distribution which statistically denotes the joint probability of the intensities of the different color channels. In general, histogram techniques provide useful clues for the subsequent expression of similarity between images, due to their robustness to background complications and object distortion. Moreover, they are translation, scale and rotation invariant and very simple to implement, and systems employing histograms exhibit fast retrieval responses that make real-time implementation easier. Texture, on the other hand, is characterized by responses to specially tuned spatial and orientation filters in a neighborhood of a pixel position and is also represented using histogram-based methodologies [1]. Regarding similarity, a large number of measures have been proposed for computing the distance between histograms, which can be classified into four categories [1]: heuristic histogram distances, nonparametric test statistics, information-theoretic divergences and ground-distance measures. Histogram-based similarity techniques, however, exhibit several drawbacks. The necessary trade-off in the binning procedure has been recognized as their major cause. With too few bins, only a crude description is obtained, which is insufficient for complicated images. With too many bins, on the other hand, simple images receive a sparse representation in which the majority of bins remain empty.
To overcome this limitation, the Earth Mover's Distance (EMD) [3] was recently introduced; it combines the benefits of a distribution distance with a flexible description of visual content, adapting its resolution to individual images. Apart from the efficiency issue, research interest continues in the representation scheme it incorporates and in the subsequent computation of the EMD. This interest concerns the reliability of comparisons between distributions: there is a lack of supporting evidence in the field of statistics that the EMD is indeed an appropriate measure for comparing multivariate distributions, except for the simplified case of equal masses, where the EMD is proven to be exactly the same as the Mallows distance [4].

Nowadays, more than 90% of digital images provided on the Web are stored in JPEG (Joint Photographic Experts Group) format [5]. Similarity of compressed-domain images has recently become a very active research area [6]. In particular, the JPEG compression standard applies the discrete cosine transform (DCT) in order to achieve a large amount of compression, significantly reducing the image size. Such compression is suitable for Internet-based applications, reducing the storage space while increasing the downloading speed. Thus, measuring image similarity directly in the compressed domain becomes more and more beneficial, compared to measuring it in the pixel domain. In order to extend a feature extraction scheme to compressed images, conventional approaches first need to decode the images to the pixel domain, before carrying out image processing and analysis techniques. This means that full decompression is a prerequisite for such image retrieval systems, which is both time consuming and computationally expensive. To bridge the gap between the compressed domain and the pixel domain, where the majority of image-processing algorithms are developed, recent research has begun to develop content feature extraction algorithms that operate directly in the compressed domain [7], [8], [9], [10], [11], [12]. As the inverse DCT (IDCT) is an embedded part of the JPEG decoder [5] and the DCT itself is one of the best filters for feature extraction, working directly in the DCT domain remains the most promising direction for compressed-image processing and similarity-based retrieval. The DCT domain, to a certain extent, has unique scale invariance and zooming characteristics [9], which can provide insight into object and texture identification [6]. In addition, the DCT exhibits a set of good properties such as energy compaction and image data decorrelation. Therefore, it is naturally considered to be a potential domain for mining visual information.
Thus, direct feature extraction from the DCT domain can provide better solutions for characterizing the image content, in addition to eliminating the need to decompress the image and detect its features in the pixel domain.
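
As a rough illustration of the DC term and the energy-compaction property mentioned above, the following sketch computes an orthonormal 2-D DCT-II of a single smooth 8×8 block in plain Python. The helper name `dct2`, the synthetic ramp block and the 2×2 low-frequency window are assumptions made for illustration only; JPEG's level shift, quantization and entropy coding are ignored, and a real compressed-domain system would read the coefficients from the JPEG bitstream rather than recompute them.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Normalization that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


# Hypothetical smooth block: a gentle horizontal intensity ramp.
block = [[100.0 + 2.0 * y for y in range(8)] for _ in range(8)]
coeffs = dct2(block)

dc = coeffs[0][0]
block_mean = sum(map(sum, block)) / 64.0

# Energy is preserved by an orthonormal transform (Parseval).
pixel_energy = sum(p * p for row in block for p in row)
total_energy = sum(c * c for row in coeffs for c in row)

# Share of the energy held by the 2x2 lowest-frequency coefficients.
low_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
low_share = low_energy / total_energy

print(dc, 8 * block_mean, low_share)
```

With this normalization the DC coefficient of an 8×8 block equals eight times the block mean, which is why it can stand in for the average value of the block in each color plane.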

In this work, a novel methodology for estimating the similarity of compressed JPEG images is presented, using multivariate statistical graph matching. The proposed method consists of two distinct algorithmic steps. First, color and texture features are directly extracted from the DCT-compressed domain, in the form of an ensemble of feature vectors. In line with the JPEG and MPEG standards, the YCrCb tri-chromatic model [6] is used for representing color images. Color features are obtained by keeping the DC component of each N×N block, operating separately on each of the Y, Cr and Cb frames. In addition, texture (or color–texture) features are extracted by first selecting k vectors from the diagonal zig-zag lines inside an image block and then estimating the k magnitudes of the AC components contained along the corresponding zig-zag vectors. This indexing scheme was found to be robust when similarity under image rotation is considered. Once the appropriate features have been extracted from the images to be compared, they are matched using a properly defined similarity measure. Since color and texture attributes are the low-level features selected from the compressed domain, data similarity needs to be assessed using appropriate representations. In our study, we followed a pattern-analytic and graph-theoretic approach. The visual content of each image is described by means of a vectorial distribution directly in the compressed-domain feature space. The comparison between two such images incorporates the computation of a distributional difference and therefore shares the invariant characteristics of histogram-like methods. At the core of the proposed method lies a nonparametric test dealing with the “Multivariate Two-Sample Problem” [13], which has been adopted here for expressing visual image similarity.
This test is a multivariate extension of the classical Wald–Wolfowitz test (WW-test) and compares two different samples of vectorial observations (i.e., two sets of points in R^p) by checking whether they form different branches in the overall minimal spanning tree (MST) [14]. The output of this test can be expressed as the probability that the two point samples come from the same distribution. Its great advantage is that no a priori assumption about the distribution of points in the two samples is required [15].
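
The feature-extraction step outlined in this paragraph can be sketched as follows. This is one possible reading, not the authors' exact procedure: the function names `block_features` and `image_ensemble` are hypothetical, the "magnitude" of a zig-zag line is assumed to be the Euclidean norm of the AC coefficients on the anti-diagonal u + v = d, and the DCT coefficient blocks of one color plane are assumed to be already available from a decoder that stops before the inverse DCT.

```python
import math


def block_features(coeffs, k):
    """Return (dc, mags) for one N x N block of DCT coefficients.

    mags[d - 1] is the Euclidean norm of the AC coefficients lying on the
    anti-diagonal u + v = d, for d = 1..k (an assumed definition).
    """
    n = len(coeffs)
    dc = coeffs[0][0]
    mags = []
    for d in range(1, k + 1):
        diagonal = [coeffs[u][d - u] for u in range(n) if 0 <= d - u < n]
        mags.append(math.sqrt(sum(c * c for c in diagonal)))
    return dc, mags


def image_ensemble(blocks, k):
    """Stack one feature vector [dc, m_1, ..., m_k] per coefficient block."""
    ensemble = []
    for coeffs in blocks:
        dc, mags = block_features(coeffs, k)
        ensemble.append([dc] + mags)
    return ensemble


# Hypothetical 4x4 coefficient block with two non-zero AC terms.
example = [
    [80.0, 3.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# Transposing the pixel block transposes its DCT coefficients.
transposed = [list(row) for row in zip(*example)]

print(block_features(example, 2))
print(image_ensemble([example, transposed], 2))
```

Under this assumed definition, transposing a block or flipping the signs of its coefficients, which is what mirroring and 90° rotations do to block DCT coefficients, leaves the diagonal magnitudes unchanged. This may be one reason why grouping coefficients along zig-zag diagonals behaves well under rotation.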

The remainder of the paper is organized as follows. In Section 2, the proposed similarity measure is fully justified, including the graph-theoretic framework of the MST and the multivariate WW-test. The feature extraction process is analyzed in Section 3, where color and texture attributes are directly obtained from the DCT-compressed domain. In addition, the JPEG compression scheme is briefly outlined. In order to test our algorithm in terms of efficiency and effectiveness, a query-by-example image retrieval system is built, and several experiments are performed and presented thoroughly in Section 4. Simulation results and comparisons with other (dis)similarity measures are provided, along with a short discussion of the visual experimental observations. Finally, conclusions are drawn in Section 5, including an outline of our future research objectives.

Section snippets

Multivariate statistical graph matching

In the following, the nonparametric multivariate generalization of the Wald–Wolfowitz two-sample test is presented and justified as a similarity measure. Our analysis requires the MST as the graph-theoretic description of the image content, and thus we begin by reviewing some terms from graph theory.
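
A minimal sketch of an MST-based runs test is given below. It follows the Friedman–Rafsky formulation of the multivariate Wald–Wolfowitz test, which is assumed here to be the variant the paper builds on. The function names `mst_edges` and `ww_test` are hypothetical, Euclidean distance between feature vectors is an assumption, and the p-value uses the large-sample normal approximation. The code needs Python 3.8 or later (for `math.dist`) and at least four pooled points.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        i = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        in_tree[i] = True
        if parent[i] >= 0:
            edges.append((parent[i], i))
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[i], points[j])
                if d < best[j]:
                    best[j] = d
                    parent[j] = i
    return edges


def ww_test(x, y):
    """Return (R, W, p): runs count, standardized statistic, lower-tail p-value."""
    m, n = len(x), len(y)
    total = m + n
    labels = [0] * m + [1] * n
    edges = mst_edges(list(x) + list(y))

    # Removing every MST edge that joins the two samples leaves R subtrees.
    runs = 1 + sum(labels[i] != labels[j] for i, j in edges)

    # C: number of MST edge pairs that share a common node.
    degree = [0] * total
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    c = sum(d * (d - 1) // 2 for d in degree)

    mean = 2.0 * m * n / total + 1.0
    var = (2.0 * m * n / (total * (total - 1))) * (
        (2.0 * m * n - total) / total
        + (c - total + 2.0) / ((total - 2) * (total - 3))
        * (total * (total - 1) - 4.0 * m * n + 2.0)
    )
    w = (runs - mean) / math.sqrt(var)
    # Few runs (very negative W) suggest the samples come from different distributions.
    p = 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
    return runs, w, p


# Two well-separated hypothetical point clouds in R^2.
sample_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
sample_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(ww_test(sample_a, sample_b))
```

For image comparison, `x` and `y` would be the two ensembles of feature vectors, and a larger W (the two samples well mixed in the MST) would indicate more similar images.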

Feature extraction in the compressed domain

Since human perception is less sensitive to the high-frequency components of the spectral energy in an image, most compression algorithms transform images into the frequency domain so as to separate low- and high-frequency components. Since the majority of images and videos stored on the Web are in JPEG compressed format [5], we limit our discussion and feature extraction scheme to the DCT-based compression technique. By looking at the DCT compressed domain, we extract the appropriate features

Experimental analysis

In order to test the proposed similarity measure, using color and texture features directly from the compressed JPEG domain, a query-by-example image retrieval system was built. The image database included in the retrieval scheme is part of the Corel Collection, containing D=1000 JPEG images of sizes [192×128] or [128×192] pixels of different topics such as animals, plants, views, natural images, etc. The dataset was formed by pre-assigning the images into 20 distinct classes of S=50 similar

Conclusions and future objectives

In this paper, a novel approach for measuring similarity between JPEG images was introduced, based on the multivariate WW-test. The proposed method consists of two algorithmic steps. First, color and texture features are directly extracted from the DCT-compressed domain in the form of an ensemble of feature vectors; afterwards, commonality is assessed between pairs of image representations in a properly formed space. The great advantage of the WW approach is that since it involves a

Acknowledgment

This work was supported by the European Social Fund (ESF), Operational Program for Educational and Vocational Training II (EPEAEK II), under the Program HERAKLEITOS.

References (29)

  • M.J. Swain et al., Color indexing, Int. J. Comput. Vision (1991)
  • Y. Rubner et al., The earth mover's distance as a metric for image retrieval, Int. J. Comput. Vision (2000)
  • E. Levina et al., The earth mover's distance is the Mallows distance: some insights from statistics
  • G.K. Wallace, The JPEG still picture compression standard, Commun. ACM (1991)