Spatially consistent partial matching for intra- and inter-image prototype selection

https://doi.org/10.1016/j.image.2008.04.017

Abstract

This paper describes a method of introducing spatial consistency constraints into the process of matching set-based descriptors extracted from digital images. The proposed matching technique is guided by a rule that can be summarized as follows: a descriptor is important for the match if it is similar to some descriptor from the other image and its spatial neighbors are important. The resulting match is partial in the sense that it deliberately avoids the complexity of searching for one-to-one correspondences among particular descriptors, and instead establishes affinity among groups of descriptors.

Formally, the proposed method is expressed as an eigenvalue problem, where the principal eigenvector's components render the importance values of individual descriptors, while the corresponding eigenvalue represents an estimate of the overall strength of affinity between images being matched. These measures of descriptor importance and image affinity are shown to provide a natural basis for intra- and inter-image prototype selection. Several variations of the proposed technique are empirically evaluated on the task of content-based image retrieval, demonstrating encouraging results.

Introduction

This work is situated in the domain of content-based image retrieval, where digital images are represented by a variable number of feature vectors commonly referred to as set-based descriptors. The main contribution of this work is a spatially consistent partial matching method designed to establish correspondence between groups of descriptors rather than between individual descriptors themselves. The described method is empirically evaluated as an intra- and inter-image prototype selection approach. In the former case, the method finds the descriptors that contribute most to the match, whereas in the latter case, it selects descriptors that are representative of a group of images belonging to a certain class.

Our choice of set-based descriptors is motivated by their superior performance [11] and applicability in a wide range of content-based image retrieval applications. In general, most of these descriptors characterize local image features extracted at certain interest points within an image. Examples of these descriptors include maximally stable extremal regions (MSER) [10], scale-invariant feature transform (SIFT) [9], and speeded up robust features (SURF) [1], as well as their extensions such as PCA-SIFT [7], SIFT with global context [12], and gradient location-orientation histogram (GLOH) [11].

Further, our additional motivation in this work is to improve the following two characteristics of popular matching and retrieval techniques for set-based descriptors: spatial consistency and many-to-many correspondence.

Spatial consistency. The former aspect relates to the fact that the spatial configuration of descriptors and their positions relative to each other are seldom considered when a match is calculated. We believe that ignoring this information may be detrimental to overall performance, since images often contain a large number of highly similar descriptors that are not localized on the visual objects of interest and thus lead to an erroneous match (see Section 3.1 for such an example).

Certainly, there exists earlier work that attempts to add some context to the information content of local feature descriptors, but these contributions have drawbacks alongside their advantages. For instance, in [12] the authors add only the shape context information to that of the SIFT descriptor, while the semi-local constraints proposed in [13] require threshold tuning for neighbor match percentage and impose complicated restrictions on admissible angles between neighboring descriptors being matched. The proposed approach described below strives to take into account both spatial proximity and feature-based similarity information during the matching process, while at the same time trying to avoid the above-mentioned pitfalls.

Many-to-many correspondence. The latter aspect concerns the way the correspondence is established between matching descriptors in two images being compared. In the proposed method, we focus on deriving a common quality estimate (importance) for each descriptor in an image, and use this value to decide whether a given descriptor belongs to a group that constitutes a match. Thus, a many-to-many correspondence is found between groups of image descriptors, without the need to resort to model- (e.g., RANSAC [3]) and graph-based (e.g., bipartite graph matching, assignment problem [16]) techniques that can be quite costly from the computational complexity point of view.
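The idea of deciding group membership from a per-descriptor importance value can be illustrated with a minimal sketch. The decision rule used here (keep every descriptor whose importance reaches a fixed fraction of the maximum) is an assumption made for illustration only; the function name select_prototypes and the parameter keep_ratio are hypothetical and do not come from the article.

```python
def select_prototypes(importance, keep_ratio=0.5):
    """Return indices of descriptors treated as part of the match.

    Assumed rule (not from the article): a descriptor is kept when its
    importance is at least keep_ratio times the largest importance.
    """
    if not importance:
        return []
    cutoff = keep_ratio * max(importance)
    return [i for i, value in enumerate(importance) if value >= cutoff]


# Toy importance values for four descriptors of one image.
toy_importance = [0.7, 0.6, 0.2, 0.05]
print(select_prototypes(toy_importance, keep_ratio=0.5))
```

Because the selection acts on each descriptor's own score, no one-to-one assignment between the two images is ever computed, which is the property the paragraph above emphasizes.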

In the section that follows, we will introduce the method of spatially consistent descriptor matching. In so doing, we will provide a detailed description of how it calculates descriptor importance and image affinity, and show how these two measures may serve as a basis for various intra- and inter-image prototype selection techniques. Then, in Section 3, we will present the experimental results which include exploratory data analysis and an empirical evaluation of the proposed method on the task of content-based image retrieval. The article will conclude with a summary of the developed techniques and their important properties.

Spatially consistent descriptor matching

This section presents the method of spatially consistent partial matching of local image descriptors. Here, we detail the problem formulation of the proposed method and provide an illustrative example of its usage. We also show how the descriptor importance and image affinity measures computed by the proposed approach are applied to the problem of intra- and inter-image prototype selection.
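As an illustration of the eigenvalue formulation, the sketch below applies power iteration to a small hand-made matrix. The article's exact matrix is not reproduced in this excerpt, so the construction used here, M[i][j] = s[i] * P[i][j] (the best similarity of descriptor i to the other image multiplied by the proximity between descriptors i and j of the same image), is only an assumption that mirrors the guiding rule. The function names and the toy values are likewise hypothetical.

```python
import math


def importance_matrix(best_similarity, proximity):
    """Assumed construction: M[i][j] = s[i] * P[i][j]."""
    n = len(best_similarity)
    return [[best_similarity[i] * proximity[i][j] for j in range(n)]
            for i in range(n)]


def principal_eigenpair(matrix, iterations=500, tol=1e-12):
    """Power iteration for a square matrix with nonnegative entries.

    Returns (eigenvalue estimate, unit-length eigenvector estimate).
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        nxt = [x / norm for x in nxt]
        change = max(abs(a - b) for a, b in zip(nxt, vec))
        vec, value = nxt, norm
        if change < tol:
            break
    return value, vec


# Toy data: descriptors 0 and 1 are close neighbors, 2 and 3 are isolated.
toy_similarity = [0.9, 0.8, 0.9, 0.2]
toy_proximity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]
affinity, importance = principal_eigenpair(
    importance_matrix(toy_similarity, toy_proximity))
print(affinity, importance)
```

In this toy example, descriptor 2 is as similar to the other image as descriptor 0, but its neighbors carry little importance, so it ends up ranked below descriptors 0 and 1, which reinforce each other.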

Experimental results

Here we present the details of the experimental results obtained while evaluating the proposed technique. Throughout the experiments, we use SIFT [9] descriptors to provide a set-based representation of image contents. The proximity and similarity functions are the same as those defined in Eqs. (7) and (9), respectively, except that dgrid is the Euclidean distance measured in pixels and normalized by the size of the image, and dRGB is replaced by dSIFT, the Euclidean distance between SIFT descriptors.
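The two distances named above can be sketched as follows. Eqs. (7) and (9) are not reproduced in this excerpt, so the Gaussian kernel that turns a distance into a proximity or similarity score is only a stand-in, and normalizing by the image diagonal is one possible reading of "normalized by the size of the image". All function names and parameters are hypothetical.

```python
import math


def grid_distance(p, q, width, height):
    """Pixel distance between two keypoints, divided by the image diagonal.

    The diagonal is an assumed normalizer; the article only says the
    distance is normalized by the size of the image.
    """
    return math.hypot(p[0] - q[0], p[1] - q[1]) / math.hypot(width, height)


def sift_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gaussian_kernel(distance, sigma):
    """Stand-in kernel (assumption): maps a distance to a score in (0, 1]."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


# Toy usage: two keypoints in a 300 x 400 image and two short vectors.
d_grid = grid_distance((0, 0), (30, 40), width=300, height=400)
d_sift = sift_distance([0.0, 3.0], [4.0, 0.0])
print(gaussian_kernel(d_grid, sigma=0.2), gaussian_kernel(d_sift, sigma=10.0))
```

Normalizing the pixel distance makes the proximity score comparable across images of different resolution, which is presumably why the experiments do so.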

Conclusion

We have introduced a spatially consistent descriptor matching method and demonstrated its possible application in the domain of content-based image retrieval. The developed approach incorporates descriptor proximity data when the match is computed, so that the quality of the match for a given descriptor depends on both the descriptor itself and its neighbors.

The proposed matching method has been formulated and shown to be equivalent to a standard eigenvalue problem, where the principal

References (16)

  • M.A. Fischler et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography
  • H. Bay, T. Tuytelaars, L.J.V. Gool, SURF: speeded up robust features, in: ECCV, vol. 1, 2006, pp....
  • R. Fergus et al., Object class recognition by unsupervised scale-invariant learning
  • E. Fix, J. Hodges, Discriminatory analysis: nonparametric discrimination: consistency properties, Technical Report 4,...
  • G.H. Golub et al., Matrix Computations (1989)
  • R.A. Horn et al., Matrix Analysis (1986)
  • Y. Ke et al., PCA-SIFT: a more distinctive representation for local image descriptors, CVPR (2004)
  • T. Lindeberg, Feature detection with automatic scale selection, Int. J. Comput. Vision (1998)
There are more references available in the full text version of this article.

This work is supported by the Advanced Media Management group at Intel Corp. and the Swiss NCCR Interactive Multi-modal Information Management (IM2).

1 Now at Google, Zürich, Switzerland.
