An observation-constrained generative approach for probabilistic classification of image regions
Introduction
Automatic extraction of the semantic context of a scene is useful for image indexing and retrieval, robotic navigation, surveillance, robust object detection and recognition, auto-albuming, etc. Recent literature reveals increasing attention to this field [18], [20], [21]. However, most attempts extract the context of a scene at a high level in the abstraction hierarchy. For example, Torralba et al. [20] represent the context using power spectra at different spatial frequencies, while Vailaya [21] uses edge coherence histograms to differentiate between natural and urban scenes. Limited attention has been paid to the task of generating a specific context from a scene (scene classification), e.g. deciding whether a scene is a beach or an office. The main hurdle in such context generation is that it requires not only knowledge of the regions or objects in the image, but also the semantic information contained in their spatial arrangement.
The motivation for our work comes from the following paradox of scene classification. In the absence of any a priori information, the scene classification task requires knowledge of the regions and objects contained in the image. On the other hand, it is increasingly being recognized in the vision community that context information is necessary for reliable extraction of image regions and objects [18], [20]. To resolve this paradox, an iterative feedback scheme can be envisaged, which refines the scene context and image region hypotheses iteratively. Under this paradigm, an obvious choice for the region classification scheme is one that allows easy modification of the initial classification without requiring the regions to be classified afresh. Probabilistic classification of image regions provides great flexibility for such future refinement under the Bayesian approach, as the context information can be encoded as improved priors.
In this paper, we deal with the probabilistic classification of image regions belonging to scenes primarily containing natural objects, e.g. water, sky, sand, skin, etc., as a priming step for the problem of scene context generation. Several techniques have been proposed to classify image regions into distinct categories, e.g. [2], [17]. However, these are primarily based on discriminative approaches leading to hard class assignments, and hence are not suitable for the iterative refinement scheme mentioned above.
In conventional schemes [8], [12], [14], [15], a generative model for each class is learnt using all the available training data belonging to that class, and newly observed data is assigned a probability based on the learnt model. However, it is possible that an input image has been generated from only a subset of the full generative model's support, and using the full model to assign generative probabilities can produce serious artifacts in the probability assignments. For example, in the training set, the pixels associated with the class water may vary widely in color and texture depending on the location, illumination conditions, and scale. A subset of the training image set in Fig. 1 shows these variations. A generative model for the class water will try to capture all these variations in the same model. Hence, while assigning probabilities to newly observed data, the learnt generative model may assign high probability not only to the pixels belonging to the class water, but also to those that merely resemble water data (under some arbitrary combination of illumination and other physical conditions). Similar problems can arise with the other classes as well. This problem arises mainly when different classes have multimodal distributions that are close in feature space.
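The artifact described above can be reproduced in a few lines. The sketch below is a minimal illustration using synthetic 2D stand-ins for color features (not the paper's data or features): a two-mode "water" mixture is learnt from all training data, and sky-like test data then receives high density under the full model because one water mode happens to resemble sky.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for 2D color features (illustrative only):
# "water" training data is multimodal (e.g. different illumination
# conditions), and one of its modes overlaps the "sky" data.
water_train = np.vstack([
    rng.normal([0.2, 0.6], 0.03, (500, 2)),   # dark water mode
    rng.normal([0.5, 0.8], 0.03, (500, 2)),   # bright, sky-like water mode
])
sky_test = rng.normal([0.5, 0.8], 0.02, (200, 2))

# Full generative model for class "water", learnt from all training data.
water_model = GaussianMixture(n_components=2, random_state=0).fit(water_train)

# The full model assigns high density to the sky-like test points,
# because one water component resembles sky: the artifact in question.
mean_loglik = water_model.score(sky_test)
print(f"mean log-likelihood of sky data under full water model: {mean_loglik:.2f}")
```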
Another problem with generative models is that they tend to give more weight to regions in feature space that contain more data, instead of emphasizing the discriminative boundary between the data belonging to different classes. This implies that the data near the boundary will be assigned similar probabilities irrespective of their class affiliations.
We propose to alleviate these problems by using the simple observation that the newly observed data are usually generated from a small support of the overall generative model. In our water example, this means that the data belonging to the water class in a new test image is usually generated at the same location, as well as under relatively homogeneous illumination and other physical conditions. Thus, the distribution of the newly observed data can be used to constrain the overall generative model when computing the generative class density maps for that data. This is why the proposed technique is named the 'observation-constrained generative approach'. Preliminary results on this approach were presented in Ref. [9].
This idea can be illustrated through scatter plots of the class data. Fig. 2(a) shows the distribution of the water data from all the training images in a 2D feature space (normalized color space). Given an input test image (Fig. 4(a)), the distribution of the water data contained in this image, superimposed on the overall water data of Fig. 2(a), is shown in Fig. 2(b). Similarly, the distribution of the sky data from the input image, along with the data of Fig. 2(b), is given in Fig. 2(c). Fig. 2(c) shows that the sky data is contained within the distribution of the overall water data; thus, the sky regions in the input image will be assigned a high probability of having been generated from the class water. However, the distributions of the water and sky data from the input test image are fairly separable. Furthermore, it is clear from Fig. 2(b) that the distribution of the input water data is contained within a small support of the distribution of the overall water data. Thus, for the given test image, if the distribution of the input water data could be used to constrain the full generative model for the class water, the potential errors in the probability assignment to the sky data could be significantly reduced.
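Loosely, constraining the full class model by the observed distribution can be sketched as reweighting the mixture components by how well they explain the test-image data. This is only an illustrative approximation of the idea: the paper's actual Bayesian merging rule is not given in this excerpt, and the helper below (its name and the floor parameter included) is our own construction.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def constrain_mixture(class_gmm, observed_data, floor=1e-6):
    """Reweight the components of a learnt class mixture by how well each
    component explains the data actually observed in the test image.

    A sketch of the observation-constraining idea only; it does not
    reproduce the paper's exact merging scheme.
    """
    # Average responsibility of each component for the observed data:
    # components far from the observed support get negligible weight.
    resp = class_gmm.predict_proba(observed_data).mean(axis=0)
    new_weights = np.maximum(resp, floor)
    new_weights /= new_weights.sum()

    # Clone the mixture with the same components but constrained weights.
    constrained = GaussianMixture(n_components=class_gmm.n_components)
    constrained.weights_ = new_weights
    constrained.means_ = class_gmm.means_
    constrained.covariances_ = class_gmm.covariances_
    constrained.precisions_cholesky_ = class_gmm.precisions_cholesky_
    return constrained
```

With the earlier water/sky example, passing the test image's water pixels as `observed_data` would suppress the components that cover the sky-like mode, lowering the density assigned to sky regions.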
Section snippets
Generative approach
In the first stage of our approach, we use two well-known techniques, i.e. supervised learning applied to labeled training data, and unsupervised learning applied to test data. The main contribution of our approach lies in the next stage, where the outputs of these two techniques are merged using a Bayesian scheme. We begin by briefly reviewing the unsupervised and supervised learning methods applied in the present context.
Let Xω be the set of training data associated with class ω, where
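The Bayesian refinement mentioned earlier reduces to combining class-conditional densities with class priors; improved context priors can then be folded in without re-running the per-class generative models. A minimal sketch (the function name and array layout are ours, not the paper's):

```python
import numpy as np

def class_posteriors(log_densities, priors):
    """Combine class-conditional log-densities log p(x | class) with class
    priors p(class) into posteriors p(class | x) via Bayes' rule.

    `log_densities` has shape (n_samples, n_classes). Context feedback can
    refine the result simply by supplying improved priors, leaving the
    generative models untouched.
    """
    log_post = log_densities + np.log(priors)        # unnormalized log posterior
    log_post -= log_post.max(axis=1, keepdims=True)  # for numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

For example, if two classes explain a pixel equally well, the posterior simply follows the (context-refined) prior.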
Model selection
In the proposed generative approach, we need to know the number of components in the Gaussian mixture models of Eq. (1) as well as Eq. (4), which amounts to a problem of model selection. The maximum likelihood approach is not appropriate for this task, as it would always favor more components. Several techniques have been proposed for model selection [5], [6], [7], [13]. Full Bayesian techniques provide a more principled method of model selection and generally use a parametric or
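The general shape of such a component-selection loop can be sketched as follows. Since the paper's fast KLD-based criterion is not detailed in this excerpt, the standard BIC criterion is used here only as a well-known stand-in; like the paper's criterion, it penalizes extra components, unlike plain maximum likelihood.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_num_components(data, max_components=8, random_state=0):
    """Choose the number of mixture components by minimizing BIC.

    A stand-in for the paper's KLD-based selection scheme: maximum
    likelihood alone would always favor more components, so a complexity
    penalty is needed.
    """
    bics = []
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=random_state).fit(data)
        bics.append(gmm.bic(data))  # BIC = -2 log L + (penalty on parameters)
    return int(np.argmin(bics)) + 1
```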
Feature extraction
The type of features to be extracted from an image depends on the nature of the scene classification task. In the present work, we deal with scene images primarily containing natural regions. Although not sufficient on their own, low-level features such as color and texture have good representational power for the region classification of natural scenes.
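As an illustration of such low-level features, the sketch below computes per-pixel normalized color chromaticities (the normalized color space used in the scatter plots of Fig. 2) plus a crude local-variance texture cue. The paper's exact feature set is not reproduced here; this is an assumed, simplified version.

```python
import numpy as np

def pixel_features(rgb, eps=1e-8):
    """Per-pixel features for natural-region classification: normalized
    color (r, g chromaticities) plus a crude texture measure (local
    intensity variance over a 3x3 window). Illustrative sketch only.
    """
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2) + eps
    r = rgb[..., 0] / total            # normalized red chromaticity
    g = rgb[..., 1] / total            # normalized green chromaticity

    # Local variance of intensity as a cheap texture cue.
    intensity = rgb.mean(axis=2)
    padded = np.pad(intensity, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    texture = windows.var(axis=(2, 3))

    return np.dstack([r, g, texture])  # shape (H, W, 3)
```

Normalized chromaticities discount overall illumination intensity, which is one reason they suit natural regions such as water and sky.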
Results and discussion
This section is divided into two subsections. Section 5.1 discusses the simulation results of the proposed KLD-based model selection scheme on synthetic data, and Section 5.2 contains the results of the proposed observation-constrained generative approach applied to real natural scenes.
Conclusions
We have proposed and successfully demonstrated the use of an observation-constrained generative approach for the probabilistic classification of image regions. A probabilistic approach towards clustering and classification leads to a useful technique, which is capable of refining the classification results obtained using the generative models. The proposed scheme is robust to errors in clustering. A KLD-based fast component selection procedure has been proposed for natural scene images. In the
Acknowledgements
We would like to thank Amit Singhal and Zhao Hui Sun from Eastman Kodak Company, Rochester, and Daniel Huber and Goksel Dedeoglu from Carnegie Mellon University, Pittsburgh for their useful suggestions.
References (21)
- S. Kumar et al., Probabilistic classification of image regions using an observation-constrained generative approach, The ECCV Workshop on Generative Model Based Vision, Copenhagen, 2002.
- J. Mao, A.K. Jain, Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern Recognition (1992).
- J. Rissanen, Modeling by shortest data description, Automatica (1978).
- C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
- The automatic classification of outdoor images, International Conference on Engineering Applications of Neural Networks, 1996.
- C. Carson, S. Belongie, H. Greenspan, J. Malik, Region-based image querying, CVPR'97, Workshop on Content-based Access...
- A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (1977).
- C. Fraley, A.E. Raftery, How many clusters? Which clustering method? Answers via model-based cluster analysis,...
- M.H. Hansen, B. Yu, Model selection and the principle of minimum description length, JASA (1998).
- D. Heckerman, D. Chickering, A comparison of scientific and engineering criteria for Bayesian model selection,...