Elsevier

Neurocomputing

Volume 167, 1 November 2015, Pages 390-405

Salient object detection via contrast information and object vision organization cues

https://doi.org/10.1016/j.neucom.2015.04.055

Abstract

Saliency detection has attracted considerable research interest because of its valuable applications in computer vision and image processing. In this paper, we propose to delineate saliency by considering both contrast and object vision organization, in two stages. In the first stage, the primary element, contrast saliency, is obtained by measuring color contrast and color distribution with a background prior and a center prior, which address the uniqueness and compactness of salient regions. In the second stage, inspired by the Gestalt principles of grouping from the study of visual perception, we take into account the properties of closure, proximity and similarity for object vision organization, and then apply object vision saliency filtering to emphasize homogeneous saliency across similar and object-like regions. For this purpose, a map called object coverage confidence is presented to express closure by characterizing the probability of complete object areas with refined profiles; it is constructed by fusing multiple information prediction maps that indicate probable closure areas of objects in different layers of an image. Experimental results on five publicly available benchmarks demonstrate that our model outperforms state-of-the-art methods.

Introduction

Saliency detection, with its capability to locate important regions covering valuable content in an image, is regarded as one of the vital mechanisms of selective visual attention [1]. Relying on such preattentive processing in visual perception, humans can rapidly direct their gaze to the parts of a scene that carry the information they need [2]. Since the underlying mechanism is difficult to understand clearly [3], much effort has been devoted to studying the human visual system (HVS) [4], [5]. Encouragingly, successive achievements in this field have proven their worth in solving many computer vision problems, including image classification [6], [7], recognition [8], [9], retrieval [10], segmentation [11], [12], and compression [13]. These successes suggest that saliency detection can benefit a wide range of practical applications.

Since the feature integration theory (FIT) was proposed by Treisman and Gelade [14], it has been regarded as one of the most significant principles in visual attention for decades. Building on the concept of a saliency map that expresses how conspicuous scene locations are to humans, proposed by Koch and Ullman [4], Itti et al. [5] constructed the milestone model of visual attention, the first complete implementation incorporating FIT to detect saliency. Inspired by this success, the field has attracted a great deal of research interest, and a variety of models have been developed since then [15], [16], [17], [18], [19]. They address the task in two main modes: the bottom-up manner [20] and the top-down manner [21]. Bottom-up attention is stimulus-driven and based on characteristics of the scene, while top-down attention is task-driven and mainly determined by cognitive phenomena [3]. Meanwhile, according to their purposes, existing models estimate saliency for one of two aims: eye-fixation prediction [22] or salient object detection [23]. As the names imply, the former predicts locations that easily attract human attention in a scene, whereas the latter focuses on identifying whole salient objects or regions in an image, usually in preparation for applications such as object segmentation [24] and recognition [25]. In this paper, we focus on salient object detection in a bottom-up manner, since it is independent of specific tasks, reflects a more general pattern of preattentive processing [2], and is therefore more broadly applicable in computer vision.

For bottom-up processing, early works measure saliency from various perspectives, including biologically inspired imitation [5], information theory [15], [26], [27], discriminant mechanisms [18], [28], the frequency domain [17], [29], graph-based models [16], [30], Bayesian frameworks [31], [32], and sparse decomposition [33]. These approaches achieve good results to some extent; however, they often fail on complex scenes. This is understandable, because traditional saliency is mostly measured by contrast alone, to highlight feature rarity or irregularity. In fact, the interior of a foreground object may contain various features (see Fig. 1(a)), which invalidates contrast-based measurement. To overcome this shortcoming, object-based attention [34], [35], [36] has recently been introduced into saliency modeling; it argues that saliency should be computed on regions formed by perceptual grouping according to basic Gestalt principles [37], [30]. Under this guidance, saliency detection can be improved by the more complete constraint that object organization provides [38], [39], [40].

In this paper, we propose a new salient object detection model inspired by the Gestalt principles [37]. Although this concept is not novel and has been used before in measuring saliency [41], [42], our formulation is completely different. Specifically, saliency is estimated by combining contrast information and object vision organization cues to characterize the visual grouping of objects. The model emphasizes homogeneous saliency across similar and object-like regions via an optimization that considers the properties of contrast, closure, proximity and similarity. To this end, the proposed model consists of two stages: the first expresses contrast and the second describes object vision organization.

In the first stage, we first weight regions adjacent to the image boundaries as the spatial background prior, following [43], [44]. Then, we compute color contrast by measuring the color dissimilarity between each region and the background regions in feature space, which reflects the uniqueness of salient regions, and we calculate color distribution by evaluating the spatial dispersion of color locations, which addresses the compactness of salient regions. A contrast saliency (CS) map is then generated by integrating color contrast and color distribution with a center bias. CS is the primary element of saliency measurement. In the second stage, we first present a map called object coverage confidence (OCC) to express closure by characterizing the probability of complete object areas with refined profiles. It is constructed by fusing multiple information prediction maps that indicate probable closure areas of objects in different layers of an image. These information maps are computed from multi-layer segmentations produced by a bipartite graph partitioning approach [45]. Because it covers regions that are likely to be entire objects, OCC captures the visual character of closure, which holds that individuals habitually perceive objects as whole [37]. Then, given CS and OCC, we propose object vision saliency filtering (OVSF) to assign homogeneous saliency values across similar and object-like regions. Specifically, OVSF contains three terms: a constraint on contrast (with CS), a constraint on closure (with OCC), and a smoothing term. The smoothing term forces homogeneous saliency on close and similar regions, in accordance with visual proximity and similarity in human perception [37]. Accordingly, OVSF accounts for the effects of contrast, closure, proximity and similarity simultaneously. Finally, the saliency map is obtained by minimizing an energy function with an iterative solution.
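The OVSF optimization described above can be illustrated as a small quadratic energy over image regions. The sketch below is not the authors' formulation: the exact form of each term, the weights `alpha` and `beta`, the Gaussian color affinity `w_ij` between adjacent regions, and the function names `color_affinity` and `ovsf_jacobi` are all assumptions made for illustration. It takes per-region CS and OCC values as given inputs and minimizes a contrast term, a closure term and a smoothing term with Jacobi iterations.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def color_affinity(
    colors: Sequence[Sequence[float]],
    adjacency: Sequence[Pair],
    sigma: float = 0.1,  # assumed color bandwidth
) -> Dict[Pair, float]:
    """Hypothetical Gaussian affinity w_ij for each adjacent region pair.

    Only spatially adjacent regions get a weight (proximity), and the
    weight decays with squared color distance (similarity).
    """
    weights = {}
    for i, j in adjacency:
        d2 = sum((a - b) ** 2 for a, b in zip(colors[i], colors[j]))
        weights[(i, j)] = math.exp(-d2 / (2.0 * sigma ** 2))
    return weights


def ovsf_jacobi(
    cs: Sequence[float],         # contrast saliency per region (given)
    occ: Sequence[float],        # object coverage confidence per region (given)
    weights: Dict[Pair, float],  # each unordered pair (i, j) listed once
    alpha: float = 1.0,          # assumed closure weight, alpha >= 0
    beta: float = 1.0,           # assumed smoothing weight, beta >= 0
    max_iters: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Minimize the hypothetical energy
        E(s) = sum_i (s_i - cs_i)^2                 # contrast term
             + alpha * sum_i (s_i - occ_i)^2        # closure term
             + beta * sum_(i,j) w_ij (s_i - s_j)^2  # smoothing term
    Setting dE/ds_k = 0 gives
        (1 + alpha + beta*d_k) s_k - beta * sum_j w_kj s_j = cs_k + alpha*occ_k
    with d_k = sum_j w_kj. The system is strictly diagonally dominant for
    non-negative alpha, beta and weights, so Jacobi iteration converges.
    """
    n = len(cs)
    nbrs: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (i, j), w in weights.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    degree = [sum(w for _, w in nb) for nb in nbrs]

    s = list(cs)  # initialize with the contrast saliency
    for _ in range(max_iters):
        new_s = [
            (cs[k] + alpha * occ[k] + beta * sum(w * s[j] for j, w in nbrs[k]))
            / (1.0 + alpha + beta * degree[k])
            for k in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(new_s, s))
        s = new_s
        if change < tol:
            break
    return s
```

For example, `ovsf_jacobi([1.0, 0.2, 0.0], [1.0, 1.0, 0.0], {(0, 1): 1.0, (1, 2): 0.0})` converges to approximately `[0.9, 0.7, 0.0]`: region 1 has low contrast saliency, but its closure value and its strong link to region 0 pull it up. Jacobi iteration is used here only because it needs no external library; a sparse direct or conjugate-gradient solver would minimize the same energy.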
Fig. 1 shows saliency results of several typical models and ours: compared with other maps, which assign heterogeneous values within an object, our saliency maps highlight objects as a whole even in complex scenes. For an overview, Fig. 2 shows the framework of the proposed model.

In summary, the main contributions of this work are fourfold:

  1. A novel measurement, object coverage confidence (OCC), is presented to express the visual character of closure; it characterizes the probability of complete object areas with refined profiles, indicating entire closed objects as whole.

  2. An optimization, object vision saliency filtering (OVSF), is proposed to estimate object-level saliency; inspired by the Gestalt principles [37], it integrates contrast and object vision organization cues, including the properties of contrast, closure, proximity and similarity.

  3. A new contrast saliency (CS) is presented by combining color contrast, color distribution, a background prior and a center prior, and it serves as the primary saliency for OVSF.

  4. Owing to the consideration of both contrast and object vision organization, our model achieves respectable performance and encouraging results against state-of-the-art models on five publicly available benchmarks.
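Contribution 3 combines four cues into CS. One way to do this at region level is sketched below, assuming each region (for example, a superpixel) already has a mean color, a centroid normalized to [0, 1], and a flag saying whether it touches the image boundary. The specific formulas are assumptions, not the paper's equations: uniqueness as the mean color distance to boundary regions (the background prior), compactness as a color-weighted spatial variance, the center prior as an isotropic Gaussian, and a product to combine them. The names `contrast_saliency`, `sigma_c` and `sigma_p` are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _sqdist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _minmax(values: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi - lo < 1e-12:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def contrast_saliency(
    colors: Sequence[Vec],      # mean color per region (e.g. normalized Lab)
    centers: Sequence[Vec],     # region centroid (x, y) normalized to [0, 1]
    on_border: Sequence[bool],  # True if the region touches the image boundary
    sigma_c: float = 0.25,      # assumed color bandwidth for compactness
    sigma_p: float = 0.25,      # assumed spatial bandwidth for the center prior
) -> List[float]:
    n = len(colors)
    # Background prior: treat boundary-touching regions as background.
    background = [i for i in range(n) if on_border[i]]
    if not background:
        raise ValueError("background prior needs at least one boundary region")

    # Uniqueness: mean color distance from each region to the background regions.
    uniqueness = [
        sum(math.sqrt(_sqdist(colors[i], colors[b])) for b in background) / len(background)
        for i in range(n)
    ]

    # Compactness: spatial variance of the positions of color-similar regions;
    # a small spread means the color is compact, hence more likely salient.
    spread = []
    for i in range(n):
        w = [math.exp(-_sqdist(colors[i], colors[j]) / (2.0 * sigma_c ** 2)) for j in range(n)]
        total = sum(w)
        mean = [sum(wj * p[d] for wj, p in zip(w, centers)) / total for d in range(2)]
        spread.append(sum(wj * _sqdist(p, mean) for wj, p in zip(w, centers)) / total)
    compactness = [1.0 - v for v in _minmax(spread)]

    # Center prior: isotropic Gaussian around the image center (0.5, 0.5).
    center_prior = [math.exp(-_sqdist(p, (0.5, 0.5)) / (2.0 * sigma_p ** 2)) for p in centers]

    cs = [u * c * g for u, c, g in zip(_minmax(uniqueness), compactness, center_prior)]
    return _minmax(cs)
```

With four gray boundary regions and one red region at the image center, the red region receives CS = 1 and the gray regions receive 0, since their color matches the background and is spread over the whole image. A product is used so that a region must satisfy all cues at once; a weighted sum would be an equally plausible assumption.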

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the computation of contrast saliency. Section 4 describes the construction of object coverage confidence and then details the formulation of object vision saliency filtering, along with its solution for obtaining the final saliency map. Experiments on publicly available benchmarks are reported in Section 5. Section 6 discusses application and failure cases of the proposed model. Finally, Section 7 concludes the paper.

Section snippets

Related Work

In this section, we review work related to our model. Specifically, we introduce several influential bottom-up techniques from two perspectives: measuring saliency by contrast and by object organization cues.

Proposed contrast saliency

Contrast is one of the crucial cues for saliency [49], [18], [47]. In our model, we use it as the primary element. However, unlike traditional approaches, we consider how contrast information is expressed and then incorporate it into the construction of the contrast saliency (CS) map. In total, four principles are taken into account to compute CS: uniqueness, compactness, the background prior and the center prior. They are described as follows:

  • Uniqueness:

Object vision saliency filtering

The contrast cue cannot capture the entirety of an object because its search is limited to local regions. Saliency estimation relying on contrast alone often loses distinctive details, as in the saliency maps shown in Fig. 1(b)–(d). This is because, in general scenes, objects may consist of various features; that is, large differences may exist among regions inside an object. For instance, in the last row of Fig. 1(a), the black hair and the white blouse both belong to the same

Experiments and results

In the experiments, we first verify the function of each element considered in the proposed model (contrast, center bias, closure, proximity and similarity). Then, we present an evaluation and analysis of our model against state-of-the-art methods on five publicly available benchmarks to demonstrate the effectiveness of the proposed saliency model, especially of its object vision organization. More details are given in the rest of this section.

Discussions

Before concluding, we discuss application and failure cases of our model. As described in detail above, the proposed model consists mainly of CS, OCC and the smoothing term in OVSF, which characterize contrast, closure, similarity and proximity. Contrast is a primary element in measuring saliency, highlighting regions with rare and irregular features. In our work, it takes effect when foregrounds have unique colors that differ from the background. However, for images shown

Conclusions

In this paper, we first argue that the accuracy of saliency measurement can be improved by the visual grouping of objects, and then present a salient object detection model that combines contrast information and object vision organization cues. To express contrast, color contrast and color distribution are integrated with a background prior and a center prior to address the uniqueness and compactness of salient regions; the result serves as the primary saliency. To describe the object vision

Acknowledgment

This work was supported by the National Natural Science Foundation of China (NSFC) under the Grants 61273279 and 61273241.


References (67)

  • X. Bai et al.

    Saliency-SVM: an automatic approach for image segmentation

    Neurocomputing

    (2014)
  • Y. Sun et al.

    Object-based attention for computer vision

    Artif. Intell.

    (2003)
  • G. Papari et al.

    Edge and line oriented contour detection: state of the art

    Image Vis. Comput.

    (2011)
  • L. Itti et al.

    Computational modeling of visual attention

    Nat. Rev. Neurosci.

    (2001)
  • C. Healey et al.

    Attention and visual memory in visualization and computer graphics

    IEEE Trans. Vis. Comput. Graph.

    (2012)
  • A. Borji et al.

    State-of-the-art in visual attention modeling

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • C. Koch et al.

    Shifts in selective visual attention: towards the underlying neural circuitry

    Matters Intell.

    (1987)
  • L. Itti et al.

    A model of saliency-based visual attention for rapid scene analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1998)
  • G. Sharma, F. Jurie, C. Schmid, Discriminative spatial saliency for image classification, in: Proceedings of IEEE...
  • K. Huang et al.

    Biologically inspired features for scene classification in video surveillance

    IEEE Trans. Syst. Man Cybern. Part B: Cybern.

    (2011)
  • A. Oikonomopoulos et al.

    Spatiotemporal salient points for visual recognition of human actions

    IEEE Trans. Syst. Man Cybern. Part B: Cybern.

    (2005)
  • U. Rutishauser, D. Walther, C. Koch, P. Perona, Is bottom-up attention useful for object recognition? in: Proceedings...
  • X. Wang, W. Ma, X. Li, Data-driven approach for bridging the cognitive gap in image retrieval, in: Proceedings of IEEE...
  • Z. Liu et al.

    Unsupervised salient object segmentation based on kernel density estimation and two-phase graph cut

    IEEE Trans. Multimed.

    (2012)
  • C. Guo et al.

    A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression

    IEEE Trans. Image Process.

    (2010)
  • A. Treisman et al.

    A feature integration theory of attention

    Cogn. Psychol.

    (1980)
  • N. Bruce, J. Tsotsos, Saliency based on information maximization, in: Proceedings of Advances in Neural Information...
  • J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in: Proceedings of Advances in Neural Information Processing...
  • X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: Proceedings of IEEE Conference on Computer...
  • D. Gao, N. Vasconcelos, Bottom-up saliency is a discriminant process, in: Proceedings of IEEE International Conference...
  • N. Murray, M. Vanrell, X. Otazu, C. Parraga, Saliency estimation using a non-parametric low-level vision model, in:...
  • H. Seo, P. Milanfar, Nonparametric bottom-up saliency detection by self-resemblance, in: Proceedings of IEEE Conference...
  • J. Yang, M. Yang, Top-down visual saliency via joint CRF and dictionary learning, in: Proceedings of IEEE Conference on...
  • A. Borji et al.

    Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study

    IEEE Trans. Image Process.

    (2013)
  • A. Borji, D. Sihite, Salient object detection: a benchmark, in: Proceedings of European Conference on Computer Vision,...
  • H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, S. Li, Automatic salient object segmentation based on context and shape...
  • Z. Ren et al.

    Region-based saliency detection and its application in object recognition

    IEEE Trans. Circuits Syst. Video Technol.

    (2014)
  • X. Hou, L. Zhang, Dynamic visual attention: searching for coding length increments, in: Proceedings of Advances in...
  • N. Bruce, J. Tsotsos, Saliency, attention, and visual search: an information theoretic approach, Journal of Vision 9...
  • D. Gao, V. Mahadevan, N. Vasconcelos, The discriminant center-surround hypothesis for bottom-up saliency, in:...
  • R. Achanta, S. Hemami, F. Estrada, S. Susstrunk, Frequency-tuned salient region detection, in: Proceedings of IEEE...
  • J. Yu et al.

    Maximal entropy random walk for region-based visual saliency

    IEEE Trans. Syst. Man Cybern. Part B: Cybern.

    (2014)
  • L. Zhang, M. Tong, T. Marks, H. Shan, G. Cottrell, SUN: a Bayesian framework for saliency using natural statistics,...

    Shengxiang Qi received the B.S. degree at the School of Automation from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2010. He is now pursuing the Ph.D. degree at the School of Automation at Huazhong University of Science and Technology (HUST), Wuhan, China. His current research interests mainly include sparse coding and visual saliency modeling with its applications in object detection and recognition.

    Jin-Gang Yu received the B.S. degree from Xi'an Jiaotong University, Xi'an, China, in 2005, and the M.S. and Ph.D. degrees from Huazhong University of Science and Technology, Wuhan, China, in 2007 and 2014, respectively. From 2007 to 2010, he was a Research and Development Engineer in industry. He is currently a Postdoctoral Research Associate with the University of Nebraska-Lincoln, U.S.A. His research interests include visual saliency modeling, object matching, sparse representation, and graphical models.

    Jie Ma received the Ph.D. degree in pattern recognition and intelligent systems from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2004. From 2005 to 2006, he was a Post-Doctoral Staff with the Department of Electronics and Information, HUST. He is currently a Professor with the School of Automation, HUST. His research interests include computer vision, pattern recognition, navigation and guidance.

    Yansheng Li received the B.S. degree in the School of Mathematics and Statistics from Shandong University, Weihai, China, in 2010. He is now pursuing the Ph.D. degree at the School of Automation at Huazhong University of Science and Technology (HUST), Wuhan, China. His current research interests mainly include visual saliency modeling, machine learning, infrared target detection, and automatic information extraction from remote sensing images.

    Jinwen Tian received the Ph.D. degree in pattern recognition and intelligent systems from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1998. He is currently a Professor with the School of Automation, HUST. His current research interests include remote sensing image analysis, wavelet analysis, image compression and fractal geometry.
