Unsupervised image saliency detection with Gestalt-laws guided optimization and visual attention based refinement
Introduction
For human beings, the visual attention system comprises both bottom-up and top-down attention mechanisms, which enable us to allocate attention to the most salient stimulus, location, or feature, i.e. the one that evokes stronger neural activation than others in a natural scene [5], [6], [7]. Bottom-up attention gathers information from separate feature maps, e.g. color or spatial measurements, which are then combined into a global contrast map representing the most salient objects/regions that pop out from their surroundings [11]. Top-down attention modulates the bottom-up attentional signals and helps us voluntarily focus on specific targets/objects, e.g. faces and cars [15]. However, due to the high level of subjectivity and the lack of a formal mathematical representation, it remains very challenging for computers to imitate the characteristics of our visual attention mechanisms. In [11], it is found that the two attentional functions have distinct neural mechanisms yet constantly influence each other. To this end, we aim to build a cognitive framework in which separate models for the two attentional mechanisms are integrated to determine visual attention for salient object detection.
To extract features at the bottom level, color plays an important role, since it is a central component of the human visual system and facilitates both scene segmentation and visual memory [22]. Color is particularly useful for object identification because it is invariant under different viewpoints: we can move or even rotate an object, yet the color we see remains unchanged, because the light reflected from the object onto the retina stays the same. As a result, salient regions/objects can be intuitively recognized by their high contrast against the surrounding background.
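As a generic illustration of this color-contrast principle (not the specific model proposed in this paper), a histogram-based global color contrast score can be sketched as follows; the quantization level (12 bins per channel) and the Euclidean color distance are illustrative choices only:

```python
import numpy as np

def color_contrast_saliency(image, bins=12):
    """Histogram-based global color contrast (a generic sketch).

    Each pixel's saliency is the sum of its color distance to all other
    quantized colors, weighted by how frequently those colors occur.
    `image` is an H x W x 3 float array with values in [0, 1].
    """
    # Quantize each channel into `bins` levels -> one color index per pixel.
    q = np.clip((image * bins).astype(int), 0, bins - 1)
    idx = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]

    # Occupancy probability and mean color of each occupied bin.
    labels, counts = np.unique(idx, return_counts=True)
    probs = counts / counts.sum()
    centers = np.stack([image.reshape(-1, 3)[idx.ravel() == l].mean(0)
                        for l in labels])

    # Contrast of bin i = sum_j p(j) * ||c_i - c_j||.
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    contrast = dist @ probs

    # Map bin contrast back to pixels and normalize to [0, 1].
    lut = dict(zip(labels, contrast))
    sal = np.vectorize(lut.get)(idx).astype(float)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```

On a synthetic image with a small red patch on a green background, the rarer, more distinct patch receives the higher score, which matches the pop-out intuition described above.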
In addition to color features, our visual perception system is also sensitive to spatial signals, as the retinal ganglion cells transmit the spatial information within natural images to the brain [25]. Consequently, we pay more attention to objects and regions that not only have dominant colors but also exhibit close and compact spatial distributions. The main objective of saliency detection is therefore to computationally group perceptual objects in the way our visual perception system does.
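This spatial compactness intuition can also be sketched generically: for each color cluster, the spatial variance of its pixel positions is computed, and tightly grouped clusters score higher than scattered ones. The exponential mapping and its bandwidth are illustrative assumptions, not the measure defined in this paper:

```python
import numpy as np

def spatial_compactness(idx):
    """Spatial compactness cue (illustrative sketch).

    `idx` is an H x W integer map assigning each pixel to a color
    cluster (e.g. from coarse color quantization).  A cluster whose
    pixels are tightly grouped gets a high score; widely scattered
    clusters (typically background) get a low one.
    """
    h, w = idx.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Normalize coordinates so the variance is resolution-independent.
    ys /= h
    xs /= w
    scores = np.zeros((h, w))
    for label in np.unique(idx):
        mask = idx == label
        # Total spatial variance of the cluster's pixel positions.
        var = ys[mask].var() + xs[mask].var()
        scores[mask] = np.exp(-var / 0.05)  # 0.05: illustrative bandwidth
    return scores
```

A compact central blob thus scores higher than a label spread over the whole frame, mirroring the preference for close and compact spatial distributions noted above.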
Although color and spatial features have been widely used for salient object detection, their efficacy can still be fragile, especially when dealing with large objects and/or complicated backgrounds [23]. The salient object often cannot be extracted as a whole (see examples in Fig. 1), even though it is still relatively easy for the human visual system (HVS) to identify the full extent of the salient objects. This reveals a gap between existing approaches and an ideal one that better exploits the potential of the HVS for more accurate salient object detection. To this end, we propose a Gestalt-law guided cognitive approach to compute bottom-up attention. As Gestalt laws characterize the capability of the HVS to form whole objects from groups of simple and even unrelated visual elements [27], e.g. edges and regions, we employ these laws to guide and improve the process of salient object detection.
For modelling top-down attention, Al-Aidroos et al. [28] proposed a theory named 'background connectivity' to describe the stimulus-evoked response of the visual cortex, finding that focusing on scenes rather than objects increases the background connectivity. Inspired by this theory, we employ a robust background detection model to represent the background connectivity of top-down attention as a post-processing step that further refines the saliency maps obtained from the Gestalt-law guided processing.
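A widely used robust background cue of this kind is the boundary-connectivity measure (see 'Saliency optimization from robust background detection' in the reference list). The following is a simplified region-level sketch of that idea, not necessarily the exact model employed here: a region touching a large portion of the image border relative to its size is likely background.

```python
import numpy as np

def boundary_connectivity(labels):
    """Boundary-connectivity style background cue (simplified sketch).

    `labels` is an H x W map of region (e.g. superpixel) indices.
    BndCon(R) = |border pixels of R| / sqrt(|R|): high for regions
    heavily attached to the image boundary (likely background), zero
    for interior regions (likely foreground).
    """
    h, w = labels.shape
    border = np.zeros((h, w), bool)
    border[0, :] = border[-1, :] = True
    border[:, 0] = border[:, -1] = True
    scores = {}
    for r in np.unique(labels):
        mask = labels == r
        scores[r] = (mask & border).sum() / np.sqrt(mask.sum())
    return scores
```

An interior region never touches the border and so scores zero, while a frame-like background region scores highly; the saliency map can then be refined by down-weighting high-scoring regions.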
Fig. 1 shows several examples in which the salient objects exhibit poor color and/or spatial contrast. In such cases, conventional approaches either fail to detect the object as a whole or produce massive false alarms. Within the proposed cognitive framework, the salient objects are successfully detected whilst false alarms are significantly suppressed. The proposed saliency model and its implementation are detailed in Sections 3 and 4.
The main contributions of this paper can be highlighted as follows:
- 1) We propose a Gestalt-laws guided optimization and visual attention based refinement framework (GLGOV) for unsupervised salient object detection, in which bottom-up and top-down mechanisms are combined to fully characterize the HVS and to effectively form objects as a whole;
- 2) We introduce a new background suppression model guided by the Gestalt law of figure and ground, where superpixel-level color quantization and adaptive thresholding determine the object-level foreground and background used to compute a background correlation term and a spatial compactness term, further suppressing the background and highlighting the salient objects;
- 3) We have carried out comprehensive experiments on five challenging and complex datasets, benchmarking against eight state-of-the-art saliency detection models, from which useful discussions and conclusions are drawn.
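The adaptive thresholding step in contribution 2) can be illustrated generically. The rule below (foreground = saliency above a multiple of the mean, with k = 2 as a common heuristic) is an assumption for illustration, not necessarily the exact threshold used in this paper:

```python
import numpy as np

def split_fg_bg(saliency, k=2.0):
    """Adaptive thresholding of a saliency map (generic sketch).

    Pixels whose saliency exceeds k times the map's mean are treated
    as candidate object-level foreground, the rest as background.
    k = 2 is a common heuristic, assumed here for illustration.
    """
    thresh = k * saliency.mean()
    fg = saliency >= thresh
    return fg, ~fg
```

The resulting foreground/background masks are the kind of object-level partition from which background correlation and spatial compactness terms can then be computed.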
The rest of this paper is organized as follows. Section 2 summarizes the related work on saliency detection. The proposed framework combining bottom-up and top-down HVS mechanisms for saliency detection is presented in Section 3, and its implementation details are discussed in Section 4. Section 5 presents the experimental results and performance analysis. Finally, some concluding remarks are drawn in Section 6.
Related work
In the past decades, a number of salient object detection methods have been developed to identify salient regions in terms of a saliency map and to capture as much human perceptual attention as possible. In general, saliency detection methods can be categorized into two classes, i.e. supervised and unsupervised approaches. Most supervised methods, including those using deep learning [29], [30], [31], [32], [33], are able to obtain good saliency maps, where high performance computers even with
The proposed GLGOV framework for unsupervised saliency detection
A new saliency detection framework inspired by the Gestalt laws of the HVS is proposed. The framework contains six main modules, i.e. homogeneity, similarity and proximity, figure and ground, background connectivity, two-stage refinement, and performance evaluation. The overall diagram of our saliency detection framework is illustrated in Fig. 2, where the corresponding Gestalt laws and the visual psychology used in the different modules are specified and detailed below.
The homogeneity module aims
Implementation detail of the proposed GLGOV framework
In this section, the implementation of the proposed saliency detection framework is detailed below in five stages, i.e. homogeneity, similarity and proximity, figure and ground, background connectivity, and two-stage refinement.
Experimental results
For performance evaluation of our proposed saliency detection method, in total 10 state-of-the-art algorithms are used for benchmarking, listed below in alphabetical order by method name. They are selected for two main reasons, i.e. high citation and wide acknowledgement in the community, and/or recent publication within the last 3-5 years. The datasets and evaluation criteria, as well as the relevant results and discussions, are presented in detail in this section.
- Bayesian
Conclusions
Inspired by both Gestalt-laws optimization and the background connectivity theory, in this paper we proposed GLGOV as a cognitive framework that combines bottom-up and top-down vision mechanisms for unsupervised saliency detection. Experimental results over five publicly available datasets have shown that our method produces the best overall accuracy and average accuracy when benchmarked against a number of state-of-the-art unsupervised techniques. Additional assessments in terms of the PR
Acknowledgements
This work was supported by the Natural Science Foundation of China (61672008, 61772144), the Fundamental Research Funds for the Central Universities (18CX05030A), the Natural Science Foundation of Guangdong Province (2016A030311013), Guangdong Provincial Application-oriented Technical Research and Development Special fund project (2016B010127006), and International Scientific and Technological Cooperation Projects of Guangdong Province (2017A050501039).
Yijun Yan received the M.E. degree from University of Strathclyde, Glasgow, UK, in 2013. He is currently a PhD student in the Department of Electronic and Electrical Engineering at University of Strathclyde, Glasgow, UK. His research interests include image retrieval, saliency detection and object tracking.
References (80)
- Visual attention: Bottom-up versus top-down, Curr. Biol., 2004
- How does the brain solve visual object recognition?, Neuron, 2012
- Complex networks driven salient region detection based on superpixel segmentation, Pattern Recognit., 2017
- Bottom-up saliency detection with sparse representation of learnt texture atoms, Pattern Recognit., 2016
- Diversity induced matrix decomposition model for salient object detection, Pattern Recognit., 2017
- Learning feature fusion strategies for various image types to detect salient objects, Pattern Recognit., 2016
- Attention-driven image interpretation with application to image retrieval, Pattern Recognit., 2006
- Visual attention guided bit allocation in video compression, Image Vision Comput., 2011
- Fusing disparate object signatures for salient object detection in video, Pattern Recognit., 2017
- A feature-integration theory of attention, Cognit. Psychol., 1980
- Computational gestalts and perception thresholds, J. Physiol.-Paris
- Attentional modulation of background connectivity between ventral visual cortex and the medial temporal lobe, Neurobiol. Learn. Mem.
- A fast MPEG-7 dominant color extraction with new similarity measure for image retrieval, J. Visual Commun. Image Represent.
- Bayesian saliency via low and mid level cues, IEEE Trans. Image Process.
- Minimum barrier salient object detection at 80 fps
- Global contrast based salient region detection, IEEE Trans. Pattern Anal. Mach. Intell.
- A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell.
- Guided Search 2.0: A revised model of visual search, Psychonom. Bull. Rev.
- Shifts in selective visual attention: towards the underlying neural circuitry, in: Matters of Intelligence
- Neural mechanisms of selective visual attention, Annu. Rev. Neurosci.
- Salient region detection via high-dimensional color transform
- Contrast-based image attention analysis by using fuzzy growing
- Bottom-up and top-down attention: different processes and overlapping neural systems, Neuroscientist
- Saliency detection via graph-based manifold ranking
- Saliency detection: a spectral residual approach
- Superpixel-based saliency detection
- Salient region detection and segmentation, in: Computer Vision Systems
- Saliency detection via dense and sparse reconstruction
- Frequency-tuned salient region detection
- Saliency detection using maximum symmetric surround
- Saliency optimization from robust background detection
- Segmenting salient objects from images and videos, in: European Conference on Computer Vision
- Cortical mechanisms of colour vision, Nat. Rev. Neurosci.
- Visual saliency detection based on multiscale deep CNN features, IEEE Trans. Image Process.
- Global contrast based salient region detection
- Efficient coding of spatial information in the primate retina, J. Neurosci.
- Deep contrast learning for salient object detection
- Psychology: The Science of Behavior
- Top-down attention switches coupling between low-level and high-level areas of human visual cortex, Proc. Natl. Acad. Sci.
Jinchang Ren received his PhD in Electronic Imaging and Media Communication from Bradford University, U.K. Currently he is a Senior Lecturer (Associate Professor) with University of Strathclyde, Glasgow, U.K. His research interests focus mainly on visual computing and multimedia signal processing, especially on semantic content extraction for video analysis and understanding and hyperspectral imaging.
Genyun Sun received the B.S. degree from Wuhan University, China, in 2003 and PhD in Institute of Remote Sensing Applications, Chinese Academy of Sciences in 2008. He is currently an Associate Professor with China University of Petroleum, Qingdao, China. His research interests include remote sensing image processing, hyperspectral and high resolution remote sensing, and intelligent optimization algorithms.
Huimin Zhao received the Ph.D. degree in electrical engineering from the Sun Yat-sen University in 2001. At present, he is a professor of the Guangdong Polytechnic Normal University. His research interests include image, video and information security technology.
Junwei Han received the Ph.D. degree in pattern recognition and intelligent systems from the School of Automation, Northwestern Polytechnical University, Xi'an, China, in 2003. He is currently a Professor with Northwestern Polytechnical University. His current research interests include multimedia processing and brain imaging analysis.
Xuelong Li is a full professor with the Center for OPTical IMagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, P.R. China. He is a Fellow of the IEEE.
Stephen Marshall received the BSc degree from the University of Nottingham and the PhD degree from the University of Strathclyde, U.K. He is a Professor with the Department of Electronic and Electrical Engineering at Strathclyde, and a Fellow of the IET. His research focuses on nonlinear image processing and hyperspectral imaging.
Jin Zhan received B.S. and Ph.D. degrees from Sun Yat-sen University in 2004 and 2015, respectively, and she is currently a Lecturer in the School of Computer Sciences, Guangdong Polytechnic Normal University. Her research interests include image/video analysis, computer vision, machine learning and applications.