
Image and Vision Computing

Volume 21, Issue 1, 10 January 2003, Pages 87-97

An observation-constrained generative approach for probabilistic classification of image regions

https://doi.org/10.1016/S0262-8856(02)00125-7

Abstract

In this paper, we propose a probabilistic region classification scheme for natural scene images. In conventional generative methods, a generative model is learnt for each class using all the available training data belonging to that class. However, if an input image has been generated from only a subset of the model support, using the full model to assign generative probabilities can produce serious artifacts in the probability assignments. This problem arises mainly when the different classes have multimodal distributions with considerable overlap in the feature space. We propose an approach that constrains the class generative probability of a set of newly observed data by exploiting the distribution of the new data itself, using linear weighted mixing. A Kullback–Leibler divergence (KLD)-based fast model selection procedure is also proposed for learning mixture models in a low-dimensional feature space. Preliminary results on natural scene images support the effectiveness of the proposed approach.

Introduction

Automatic extraction of the semantic context of a scene is useful for image indexing and retrieval, robotic navigation, surveillance, robust object detection and recognition, auto-albuming, etc. Recent literature reveals increasing attention to this field [18], [20], [21]. However, most attempts extract the context of a scene at a high level of the abstraction hierarchy. For example, Torralba et al. [20] represent the context using the power spectra at different spatial frequencies, while Vailaya [21] uses edge coherence histograms to differentiate between natural and urban scenes. Limited attention has been paid to the task of generating a specific context from a scene (scene classification), e.g. whether a scene is a beach or an office. The main hurdle in such context generation is that it requires not only knowledge of the regions and objects in the image, but also the semantic information contained in their spatial arrangement.

The motivation for our work comes from the following paradox of scene classification. In the absence of any a priori information, the scene classification task requires knowledge of the regions and objects contained in the image. On the other hand, it is increasingly being recognized in the vision community that context information is necessary for reliable extraction of image regions and objects [18], [20]. To resolve this paradox, an iterative feedback scheme can be envisaged, which refines the scene context and image region hypotheses iteratively. Under this paradigm, an obvious choice for the region classification scheme is one that allows easy modification of the initial classification without requiring the regions to be classified afresh. Probabilistic classification of image regions provides great flexibility for future refinement using the Bayesian approach, as the context information can be encoded as improved priors.

In this paper, we deal with the probabilistic classification of image regions belonging to scenes primarily containing natural objects, e.g. water, sky, sand, skin, etc., as a priming step for the problem of scene context generation. Several techniques have been proposed to classify image regions into distinct categories, e.g. [2], [17]. However, these are primarily based on discriminative approaches that lead to hard class assignments, and hence they are not suitable for the iterative refinement scheme mentioned above.

In conventional schemes [8], [12], [14], [15], a generative model for each class is learnt using all the available training data belonging to that class, and newly observed data are assigned probabilities based on the learnt model. However, it is possible that an input image has been generated from only a subset of the full generative model support, and using the full model to assign generative probabilities can produce serious artifacts in the probability assignments. For example, in the training set, pixels associated with the class water may show wide variations in color and texture depending on location, illumination conditions, and scale. A subset of the training image set in Fig. 1 shows these variations. A generative model for the class water will try to capture all these variations in the same model. Hence, while assigning probabilities to newly observed data, the learnt generative model may assign high probability not only to pixels belonging to the class water, but also to those that merely bear some resemblance to water data (under some arbitrary combination of illumination and other physical conditions). Similar problems can arise with the other classes as well. This problem arises mainly when different classes have multimodal distributions that overlap considerably in the feature space.
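For concreteness, here is a minimal sketch of this conventional scheme, using scikit-learn's GaussianMixture as a stand-in for the class-conditional mixture models of [8], [12], [14], [15]; the component count, covariance type, and function names are illustrative assumptions rather than the configuration used in the paper.

```python
from sklearn.mixture import GaussianMixture

def fit_class_models(train_data, n_components=5):
    """Conventional scheme: fit one Gaussian mixture per class on ALL
    training data of that class (n_components is an assumed setting)."""
    models = {}
    for label, X in train_data.items():          # X: (n_samples, n_features)
        models[label] = GaussianMixture(n_components=n_components,
                                        covariance_type='full',
                                        random_state=0).fit(X)
    return models

def class_log_likelihoods(models, X_test):
    """Per-sample log p(x | class) under each full class model. As argued
    above, pixels that merely resemble some mode of a class (e.g. sky
    vs. a water mode) can receive spuriously high values here."""
    return {label: gmm.score_samples(X_test) for label, gmm in models.items()}
```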

Another problem with generative models is that they tend to give more weight to regions in feature space that contain more data, instead of emphasizing the discriminative boundary between the data belonging to different classes. This implies that the data near the boundary will be assigned similar probabilities irrespective of their class affiliations.

We propose to alleviate these problems by using the simple observation that newly observed data are usually generated from a small support of the overall generative model. In our earlier water example, this means that the data belonging to the water class in a new test image are usually generated at the same location as well as under relatively homogeneous illumination and other physical conditions. Thus, the distribution of the newly observed data can be used to constrain the overall generative model when computing the generative class density maps for that data. This is why the proposed technique has been named the ‘observation-constrained generative approach’. Preliminary results of this approach were presented in Ref. [9].
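As a hedged sketch of this constraining idea in its simplest form, the linear weighted mixing mentioned in the abstract can be realized by blending the full class model with a small mixture fitted to the newly observed data themselves; the mixing weight alpha, the local component count, and all names below are illustrative assumptions, not the exact combination derived in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def constrained_log_density(full_model, X_new, alpha=0.5, n_local=2):
    """Observation-constrained density (sketch):
        p_c(x) = alpha * p_full(x) + (1 - alpha) * p_new(x),
    where p_new is a small mixture fitted to the new observations,
    restricting mass to the support actually present in the image."""
    local = GaussianMixture(n_components=n_local, covariance_type='full',
                            random_state=0).fit(X_new)
    p_full = np.exp(full_model.score_samples(X_new))
    p_new = np.exp(local.score_samples(X_new))
    return np.log(alpha * p_full + (1.0 - alpha) * p_new)
```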

This idea can be illustrated through scatter plots of the class data. Fig. 2(a) shows the distribution of the water data from all the training images in a 2D feature space (normalized color space). Given an input test image (Fig. 4(a)), Fig. 2(b) shows the distribution of the water data contained in this image superimposed on the overall water data of Fig. 2(a). Similarly, Fig. 2(c) adds the distribution of the sky data from the input image to the data of Fig. 2(b), and shows that the sky data lies within the distribution of the overall water data. Thus, the sky regions in the input image will be assigned a high probability of being generated from the class water. However, the distributions of the water and sky data from the input test image are fairly separable. Furthermore, it is clear from Fig. 2(b) that the distribution of the input water data is contained within a small support of the distribution of the overall water data. Thus, for the given test image, if the distribution of the input water data could be used to constrain the full generative model for the class water, the potential errors in assigning probabilities to the sky data could be significantly reduced.

Section snippets

Generative approach

In the first stage of our approach, we use two well-known techniques, i.e. supervised learning applied to labeled training data, and unsupervised learning applied to test data. The main contribution of our approach lies in the next stage, where the outputs of these two techniques are merged using a Bayesian scheme. We begin by briefly reviewing the unsupervised and supervised learning methods applied in the present context.

Let Xω be the set of training data associated with class ω, where Xω = {xkω | k = 1, …, Nω} and Nω is the number of training samples for class ω.
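To make the merge of the two stages concrete, the following sketch combines them for one test image: an unsupervised mixture over the image's own pixels, the supervised per-class models, and a Bayesian weighting of clusters by class. The marginalization shown (and the assumption that class_models and priors are dictionaries with matching key order) is an illustrative reading of the scheme, not a transcription of the paper's equations.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def merged_class_posteriors(class_models, priors, X_img, n_clusters=4):
    """P(class | x) for every pixel of one test image (sketch).
    1. Unsupervised stage: cluster the image's own pixels.
    2. Supervised stage: full per-class mixtures (class_models).
    3. Merge: weight each cluster by the responsibility-weighted mean
       class likelihood of its members, then marginalize over clusters."""
    clus = GaussianMixture(n_components=n_clusters, covariance_type='full',
                           random_state=0).fit(X_img)
    r = clus.predict_proba(X_img)                   # (n_pix, n_clusters)
    lik = np.stack([np.exp(m.score_samples(X_img))  # (n_classes, n_pix)
                    for m in class_models.values()])
    w = (r.T @ lik.T) * np.array(list(priors.values()))  # (n_clusters, n_classes)
    w /= w.sum(axis=1, keepdims=True)               # P(class | cluster)
    post = r @ w                                    # (n_pix, n_classes)
    return post / post.sum(axis=1, keepdims=True)
```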

Model selection

In the proposed generative approach, we need to know the number of components in the Gaussian mixture models in Eq. (1) as well as in Eq. (4), which amounts to a model selection problem. The maximum likelihood approach is not appropriate for this task, as it always favors more components. Several techniques have been proposed for model selection [5], [6], [7], [13]. Full Bayesian techniques provide a more principled method of model selection and generally use a parametric or …
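Although the snippet is truncated here, the flavor of a KLD-based selection procedure can be sketched: grow the number of components until an extra component no longer changes the fitted density, measured by a Monte-Carlo estimate of the symmetrized KL divergence between successive fits. The estimator, sample size, and stopping threshold below are assumptions for illustration, not the paper's exact criterion.

```python
from sklearn.mixture import GaussianMixture

def select_n_components(X, k_max=10, tol=0.01, seed=0):
    """Pick a mixture order by growing k until the symmetrized KLD
    between successive fits (Monte-Carlo estimate) falls below tol."""
    prev = GaussianMixture(1, covariance_type='full', random_state=seed).fit(X)
    for k in range(2, k_max + 1):
        cur = GaussianMixture(k, covariance_type='full', random_state=seed).fit(X)
        xs_p, _ = prev.sample(2000)   # samples from each model for the
        xs_c, _ = cur.sample(2000)    # two directed divergence estimates
        d = ((prev.score_samples(xs_p) - cur.score_samples(xs_p)).mean()
             + (cur.score_samples(xs_c) - prev.score_samples(xs_c)).mean())
        if d < tol:                   # extra component changed little
            return k - 1
        prev = cur
    return k_max
```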

Feature extraction

The type of features to be extracted from an image depends on the nature of the scene classification task. In the present work, we deal with scene images primarily containing natural regions. Although not sufficient by themselves, low-level features such as color and texture have good representational power for the region classification of natural scenes.
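As an illustration only, a per-pixel extractor in this spirit might pair the normalized color coordinates (the 2D chromaticity space used for the scatter plots of Fig. 2) with a crude texture cue; the local-standard-deviation texture measure and the window size are stand-ins, since the paper's actual texture features are given in the full text.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def pixel_features(rgb, win=5):
    """Per-pixel features: normalized (r, g) chromaticity plus local
    luminance standard deviation as a simple texture stand-in."""
    rgb = rgb.astype(np.float64)                 # (H, W, 3) image
    s = rgb.sum(axis=2) + 1e-8
    r, g = rgb[..., 0] / s, rgb[..., 1] / s      # normalized color
    lum = rgb.mean(axis=2)
    mu = uniform_filter(lum, win)
    sigma = np.sqrt(np.maximum(uniform_filter(lum ** 2, win) - mu ** 2, 0))
    return np.stack([r, g, sigma], axis=-1).reshape(-1, 3)
```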

Results and discussion

This section is divided into two subsections. Section 5.1 discusses the simulation results of the proposed KLD-based model selection scheme on synthetic data, and Section 5.2 contains the results of the proposed observation-constrained generative approach applied to real natural scenes.

Conclusions

We have proposed and successfully demonstrated the use of an observation-constrained generative approach for the probabilistic classification of image regions. A probabilistic approach towards clustering and classification leads to a useful technique that is capable of refining the classification results obtained using the generative models. The proposed scheme is robust to errors in clustering. A KLD-based fast component selection procedure has been proposed for natural scene images. In the …

Acknowledgements

We would like to thank Amit Singhal and Zhao Hui Sun from Eastman Kodak Company, Rochester, and Daniel Huber and Goksel Dedeoglu from Carnegie Mellon University, Pittsburgh for their useful suggestions.


Cited by (16)

  • Roadside vegetation segmentation with Adaptive Texton Clustering Model

    2019, Engineering Applications of Artificial Intelligence
Citation excerpt:

However, a problem of these works is that the contextual and spatial constraints are generally based on knowledge learnt either from the training data or a specific environment, and thus their performance is difficult to generalize to new data. There are also studies that generated statistical models, such as Gaussian mixture models (Kumar et al., 2003) and probability density functions (Bosch et al., 2007) of newly observed data, to refine pixel-level classification results. As collective classification over spatial regions has shown higher accuracy than pixel-level classification (Zhang et al., 2015a; Achanta et al., 2012), the ATCM utilizes superpixel-level voting to enforce a spatial constraint on the smoothness of class labels in neighboring pixels.

  • Segmentation and description of natural outdoor scenes

    2007, Image and Vision Computing
Citation excerpt:

At the classification stage, we take advantage of the information provided by the test image. In [18] they constrain the object model using the distribution of the newly observed data, while in our approach we do the reverse: we extend the object model using the new test data distribution. Moreover, in [21] they used active region segmentation to obtain the regions of the image.

  • Object Detection: Its Progress and Principles

    2023, Proceedings of the 2023 12th International Conference on System Modeling and Advancement in Research Trends, SMART 2023
  • Discriminative graphical models for context-based classification

    2010, Studies in Computational Intelligence