A Saliency Dispersion Measure for Improving Saliency-Based Image Quality Metrics

Objective image quality metrics (IQMs) potentially benefit from the addition of visual saliency. However, challenges to optimizing the performance of saliency-based IQMs remain. A previous eye-tracking study has shown that gaze is concentrated in fewer places in images with highly salient features than in images lacking salient features. From this, it can be inferred that the former are more likely to benefit from adding a saliency term to an IQM. To understand whether these ideas still hold when using computational saliency instead of eye-tracking data, we first conducted a statistical evaluation using 15 state-of-the-art saliency models and 10 well-known IQMs. We then used the results to devise an algorithm, which adaptively incorporates saliency in IQMs for natural scenes, based on saliency dispersion. Experimental results demonstrate that this can give significant improvements.

Publisher's page: http://dx.doi.org/10.1109/TCSVT.2017.2650910. See http://orca.cf.ac.uk/policies.html for usage policies. Copyright and moral rights for publications made available in ORCA are retained by the copyright holders.
Wei Zhang, Ralph R. Martin, and Hantao Liu

I. INTRODUCTION
Image quality metrics (IQMs) lie at the heart of algorithms that automatically predict perceived image quality [1]. Reliably predicting image quality as perceived by humans, however, remains challenging [2]. A significant trend in IQM research [3]-[12] is to investigate the role of visual attention, an important mechanism of the human visual system that allows effective selection of the most relevant information in a visual scene [13].
Psychophysical studies have been undertaken to understand visual attention in relation to image quality assessment [3]-[7]. Integrating visual attention data obtained from eye tracking improves the predictive ability of IQMs. Eye tracking is cumbersome and impractical in many circumstances, however. A more realistic way to integrate visual attention into IQMs is to use computational saliency. State-of-the-art saliency-based IQMs [8]-[12] generally weight local distortions with local saliency, resulting in a more sophisticated means of quality prediction. However, determining the optimal use of computational saliency in IQMs is not straightforward [11], [12].
Our previous research based on eye tracking [14] revealed that the inter-observer agreement (IOA) for human fixations (the degree of agreement between observers freely viewing the same visual stimulus) is strongly dependent on image content. Furthermore, this measure predicts the extent to which a given image may profit from adding saliency information to an IQM. As also observed in the eye-tracking studies in [15] and [16], if an image has highly salient objects, then most viewers will concentrate their fixations around them, whereas if there is no obvious object of interest, viewers' fixations will form a more evenly distributed pattern. Thus, images with salient objects tend to have less variation in fixations between viewers (i.e., higher IOA) than images without salient objects. As illustrated in Fig. 1, images of the former kind [see Fig. 1(a)] have concentrated saliency maps, while the latter [see Fig. 1(b)] have more dispersed maps. When saliency is spread throughout the scene, incorporating saliency in an IQM is less likely to benefit image quality prediction [3], [14], as different observers tend to look at different parts of the image. In such cases, saliency weighting may assign a low weight to a region with high distortion and thereby unhelpfully downplay the importance of the distortion in that region. To make better use of saliency in IQMs, a sophisticated integration strategy is needed, taking into account image content, particularly in terms of the dispersion of saliency.
The contributions of this paper are: 1) a statistical evaluation of whether conclusions concerning the content-dependent nature of the benefits of adding saliency information to an IQM, determined from eye-tracking data, still hold when computational saliency is used in its place; and 2) an algorithm that provides a reliable proxy for IOA, for use in a content-adaptive IQM that incorporates computational saliency.

II. EFFECT OF IMAGE CONTENT DEPENDENCE
Image content dependence of the improvement to IQMs by incorporating saliency has been demonstrated by use of eye-tracking data in [14]. In that study, ground truth saliency and IOA [calculated as the average correlation coefficient (CC) between the mean saliency map and each observer's saliency map] were measured for the LIVE database [17]; based on IOA for scene content, the entire database was divided into three subsets: images with low, intermediate, and high IOA. The measured saliency was integrated into three IQMs to assess the quality of images within individual subsets. The result was that the average performance gain of IQMs increases as IOA increases.
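The IOA definition quoted above (the average CC between the mean saliency map and each observer's map) can be sketched as follows. This is only an illustrative sketch: the function names, the flat-list layout of the maps, and the handling of constant maps are assumptions made here and are not taken from [14].

```python
import math


def pearson_cc(a, b):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Assumption of this sketch: a constant map is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def inter_observer_agreement(observer_maps):
    """Average CC between the mean saliency map and each observer's map.

    observer_maps: one flat list of saliency values per observer
    (hypothetical layout chosen for this sketch).
    """
    n_obs = len(observer_maps)
    n_pix = len(observer_maps[0])
    mean_map = [sum(m[i] for m in observer_maps) / n_obs for i in range(n_pix)]
    return sum(pearson_cc(mean_map, m) for m in observer_maps) / n_obs
```

In this sketch, observers who fixate the same location yield an IOA close to 1, while observers who each fixate a different location yield a much lower value.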
A realistic IQM, however, will use a computational model of saliency rather than eye tracking. To determine whether content dependence still remains significant, and potentially useful, we conducted a statistical evaluation using 15 state-of-the-art saliency models and 10 of the best-known IQMs. The methodology established in [3] and [12] was used to assess the added value of computational saliency in IQMs: saliency is incorporated by weighting the distortion map calculated by an IQM using the saliency map computed from the original scene. For each subset of images, we quantified the performance gain of a saliency-based IQM over its original form without saliency. The IQMs comprised six full-reference metrics, namely PSNR [1], UQI [18], SSIM [19], MS-SSIM [20], VIF [21], and FSIM [22], and four no-reference (NR) metrics, namely GBIM [23], NBAM [24], NPBM [25], and JNBM [26]. The IQMs used were implemented in the spatial domain. Note that other well-known IQMs formulated in the transform domain (e.g., [27], [28]) were not included in our study but could be considered in future work. Also note that specific advanced IQMs (see [9]) already incorporate well-established saliency aspects. Adding saliency information to these IQMs is not very meaningful, as it would duplicate some aspects of how they already work. As suggested in [12], for all NR metrics, saliency was computed from the original scene rather than the distorted scene. Such saliency was either assumed to be practically available (e.g., as side information, in which case the framework is analogous to the reduced-reference case) or considered to be plausibly approximated from the distorted image (e.g., by filtering out distortion). The 15 saliency models were AIM [29], AWS [30], CA [31], CBS [32], DVA [33], GBVS [34], ITTI [35], PQFT [36], SDCD [37], SDFS [38], SDSR [39], SR [40], SUN [41], SVO [42], and Torralba [43], representing the best-performing saliency models in terms of their capability to improve the performance of IQMs [12].
The study thus resulted in 150 possible combinations (10 IQMs × 15 saliency models). The performance of an IQM was quantified by the Pearson linear CC and the Spearman rank-order CC (SROCC) between the IQM's output and the subjective quality ratings [44]. Fig. 2 illustrates the performance gain averaged over all 150 cases for different degrees of IOA. Results of a t-test (preceded by a kurtosis test for the assumption of normality [3]) show that the difference in performance gain between each pair of subsets is statistically significant, with p < 0.05 at the 95% confidence level. This confirms that the benefits of including computational saliency in IQMs depend on image content. For images with low IOA, incorporating saliency runs the risk of reducing an IQM's performance (i.e., the performance gain can be negative, as shown in Fig. 2).
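As an illustration of how the performance gain described above might be computed for one IQM and one saliency model, the following sketch evaluates the Pearson CC and SROCC against subjective ratings and takes the difference in CC. The function names and the tie-handling choice (average ranks) are assumptions of this sketch, not details given in the text.

```python
import math


def pearson_cc(a, b):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: constant input treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def srocc(a, b):
    """Spearman rank-order CC, computed as the Pearson CC of the ranks."""
    return pearson_cc(ranks(a), ranks(b))


def performance_gain(plain_scores, saliency_scores, subjective):
    """Gain in CC of the saliency-based IQM over the IQM without saliency."""
    return pearson_cc(saliency_scores, subjective) - pearson_cc(plain_scores, subjective)
```

A negative return value from `performance_gain` corresponds to the case noted above in which adding saliency reduces an IQM's performance.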

III. PROPOSED SALIENCY DISPERSION MEASURE
To optimize the saliency integration by incorporating the above observation, we propose an algorithm to measure the saliency dispersion and use that as a proxy for the variation in human fixation (i.e., IOA) on natural scenes. Reliably quantifying saliency dispersion in agreement with IOA is very challenging, despite research on the topic. Existing methods have either limited sophistication (e.g., the simple saliency coverage measure in [15]) or limited applicability to real-world systems (e.g., the complex approaches in [45] and [46]). We have thus devised our own simple, but reliable, method.
Our method is based on Shannon entropy, which is a measure of the randomness or uncertainty of a variable [47]. We analyze saliency maps as realizations of random variables. Fig. 1(a) shows a saliency map for a natural scene. The entropy of a saliency map is calculated from the histogram of its saliency intensity values as

$$H = -\sum_{i} p_i \log_2 p_i \qquad (1)$$

where $p_i$ is the probability of intensity value $i$, estimated from the histogram. For the saliency map in Fig. 1(a), $H$ is 6.04 bits, compared with 7.26 bits for the more dispersed map in Fig. 1(b).

Note, however, that even a single large salient object may also lead to a spread-out saliency map. For example, the saliency map in Fig. 3(b) is more concentrated than the saliency map in Fig. 3(a), but the entropy values are similar [i.e., $H = 7.26$ for Fig. 3(a) and $H = 6.99$ for Fig. 3(b)]. This is because entropy is a single value summarizing the whole image; it does not consider spatial characteristics and relations of fixation patterns [48]. To perform a more refined saliency dispersion analysis, we use a multilevel approach to entropy calculation. To do so, the saliency map is partitioned at level $P$ into $P \times P$ nonoverlapping blocks of equal size (see Fig. 3). At $P = 2$ the original map is subdivided into four equal quadrants, at $P = 3$ into nine equal partitions, and so on. We then define the multilevel entropy of the saliency map by adding the entropies computed at each level of partition (see Fig. 3); here $P_{\max}$ is the finest level of division, $N_{\max} = P_{\max}^2$, and $B$ runs over each block. In the case illustrated in Fig. 3, the disparity in entropy between the saliency maps increases as the number of partitions increases, which allows the multilevel entropy to distinguish the two saliency maps better than the whole-image entropy does, giving the more compact saliency map a lower entropy.
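The multilevel idea can be illustrated with a small sketch. The histogram entropy follows the Shannon definition, but the way block entropies are aggregated here (the mean block entropy at each level, summed over levels) is an assumption made for illustration only, since the exact normalization is not reproduced in this text. The function names are likewise hypothetical, and the sketch assumes that the map dimensions are divisible by each level $P$.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the histogram of the given intensity values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def block_entropies(sal_map, p):
    """Entropies of the p x p equal, nonoverlapping blocks of a 2-D saliency map.

    Sketch simplification: rows and columns are assumed divisible by p.
    """
    rows, cols = len(sal_map), len(sal_map[0])
    bh, bw = rows // p, cols // p
    entropies = []
    for bi in range(p):
        for bj in range(p):
            block = [sal_map[r][c]
                     for r in range(bi * bh, (bi + 1) * bh)
                     for c in range(bj * bw, (bj + 1) * bw)]
            entropies.append(shannon_entropy(block))
    return entropies


def multilevel_entropy(sal_map, p_max=4):
    """Assumed aggregation: sum over levels 1..p_max of the mean block entropy."""
    total = 0.0
    for p in range(1, p_max + 1):
        h = block_entropies(sal_map, p)
        total += sum(h) / len(h)
    return total
```

Under this assumed aggregation, a map whose nonzero saliency is confined to a single block receives a lower multilevel entropy than a map whose values vary throughout, which is the qualitative behavior the multilevel measure is intended to capture.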
To determine the number of levels to use, we use an empirical approach, based on quantifying the correlation between the estimated saliency dispersion and its ground truth counterpart (i.e., IOA). Fig. 4 plots the absolute value of the Pearson correlation between $H$, for different choices of $P_{\max}$, and ground truth IOA values determined for the same set of images from three independent eye-tracking databases [14]. While correlation increases with $P_{\max}$, saturation starts to occur at about $P_{\max} = 4$. Hypothesis testing was performed to verify whether there is a significant difference between the use of $P_{\max} = 4$ and a higher level of $P_{\max}$. A Wilcoxon signed rank test (i.e., a nonparametric version of the t-test for the case of nonnormality) based on the residuals between $H$ and IOA [12] was conducted, and the results (i.e., p > 0.05 at the 95% confidence level) showed that there was no statistically significant difference between $P_{\max} = 4$ and $P_{\max} = 5$, or between $P_{\max} = 4$ and $P_{\max} = 6$. We therefore use $P_{\max} = 4$ in our experiments.

Fig. 4. Absolute value of the Pearson correlation (as shown for each data point) between estimated saliency dispersion, $H$, and its ground truth counterpart IOA, for different choices of $P_{\max}$. IOA values were determined for the same set of images from three independent eye-tracking databases [14].

IV. PROPOSED OPTIMIZATION METHOD
We now consider how to use the above formula for assessing saliency dispersion to improve saliency-based IQMs.
Suppose we are given a particular saliency model and an IQM. For an input scene of size $M \times N$, we can compute a saliency map together with its degree of dispersion $H$. The key idea is to include saliency in the computation of image quality only if the dispersion is not too large, in line with the observation that using saliency in cases of low IOA may be of no benefit to, or may even reduce, the IQM's performance.
In principle, we wish to do the following. If $H$ is below a threshold $T$, saliency is combined with the pre-existing IQM to provide a modified method of quality assessment, in which the distortion map $D$ measured by the IQM is weighted by the saliency map $S$ generated by the saliency model. If the saliency dispersion is large, the saliency of the scene contains much uncertainty, and so is ignored: the pre-existing IQM is used directly without saliency. However, using a hard threshold would lead to a discontinuous IQM, and two very similar scenes whose saliency dispersions are just above and just below the threshold might end up with significantly different quality scores. To avoid such sudden changes, instead of using a step function to switch between using saliency or not, a sigmoid function $\sigma(\cdot)$ is applied to smooth the IQM near the transition region. Our integrated image quality metric $I''$ thus combines the original IQM value $I$ with the saliency-weighted value through $\sigma(H)$, where $\sigma(x)$ is a sigmoid function in which $T$ is the threshold value and $\tau$ controls the steepness.
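A minimal sketch of this adaptive integration is given below. The specific forms used are assumptions made for illustration: the saliency-weighted score is taken as the saliency-weighted average of the distortion map, the sigmoid as a logistic function of $\tau(H - T)$, and the final score as a convex combination of the two scores. The function names are hypothetical, and the sketch assumes a saliency map with a nonzero sum.

```python
import math


def saliency_weighted_score(distortion, saliency):
    """Assumed pooling: sum(D * S) / sum(S) over all pixels of 2-D lists D and S."""
    num = sum(d * s
              for d_row, s_row in zip(distortion, saliency)
              for d, s in zip(d_row, s_row))
    den = sum(s for s_row in saliency for s in s_row)
    return num / den  # assumes the saliency map is not identically zero


def saliency_weight(h, t, tau):
    """Assumed logistic weight: close to 1 when H << T, close to 0 when H >> T."""
    z = tau * (h - t)
    z = max(-700.0, min(700.0, z))  # clamp to keep math.exp within float range
    return 1.0 / (1.0 + math.exp(z))


def adaptive_score(plain_score, distortion, saliency, h, t, tau=20.0):
    """Smooth switch between the saliency-weighted score and the plain IQM score."""
    w = saliency_weight(h, t, tau)
    weighted = saliency_weighted_score(distortion, saliency)
    return w * weighted + (1.0 - w) * plain_score
```

With this form, a scene whose dispersion is well below the threshold is scored almost entirely by the saliency-weighted value, a scene well above it by the original IQM value, and a scene exactly at $H = T$ by the average of the two, so that nearby values of $H$ produce nearby scores.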
As different saliency models lead to intrinsically different scales of entropy measurements (i.e., different ranges of $H$ values), $T$ should be individually determined for each saliency model. To ensure generality of the technique and to determine reliable parameters more rigorously, $\tau$ and $T$ were empirically determined from a separate, larger-scale saliency database, distinct from that used in our experiments; we used the MIT300 database [49], which contains 300 natural scenes with a wide diversity of content. Fig. 5 gives $H$ for these scenes, ordered from lowest to highest value, for the 15 saliency models considered in Section II. The median $H$ value for each saliency model was used as the corresponding threshold $T$ (e.g., $T = 4.38$ for AIM), while the slope of the envelope of the values between the 25th and 75th percentiles was used to determine an appropriate value of the steepness control $\tau$; in practice, these were similar, so we used $\tau = 20$ for all saliency models. Note that other saliency databases (see [50], [51]) may be used to estimate these parameters, but we do not expect this to change the results significantly.
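The per-model threshold selection can be sketched as follows. The dictionary layout and function name are hypothetical; only the use of the median $H$ value as $T$ comes from the text, and the estimation of $\tau$ from the percentile envelope is not attempted here.

```python
from statistics import median


def calibrate_thresholds(h_by_model):
    """Return the median H value per saliency model, used as its threshold T.

    h_by_model: mapping from a model name to the list of H values measured
    on a calibration set (hypothetical layout for this sketch).
    """
    return {model: median(values) for model, values in h_by_model.items()}
```

For example, `calibrate_thresholds({"model_a": [1.0, 3.0, 2.0]})` yields a threshold of 2.0 for the placeholder model `model_a`.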

V. EXPERIMENTAL RESULTS
The performance of each IQM was evaluated against three recognized image quality databases: CSIQ [28], TID2013 [52], and LIVE. In each case, we compared its performance between no use of saliency, fixed use of saliency, and adaptive use of saliency according to saliency dispersion. Table I shows the performance (in terms of CC) in each case, averaged over 15 saliency models (SROCC values exhibit similar trends and thus are not presented here). Following the approach taken in [12], CC values are reported without nonlinear fitting in order to better visualize differences in IQM performance. As can be seen, the adaptive approach outperforms fixed use of saliency in all cases. On average (over all databases), VIF and FSIM do not benefit from fixed use of saliency, but are improved by using adaptive saliency. Note that VIF and FSIM obtain a relatively small gain from adding saliency. This is probably because some well-established saliency aspects are already embedded in VIF and FSIM, which causes a saturation effect in saliency optimization. More detailed results are given in Fig. 6, which shows the performance gain (i.e., the increase in correlation when using the fixed or adaptive saliency approach), averaged over all IQMs, for individual saliency models. On average, the gain achieved by adaptive use of saliency is more than double that of always using saliency. As well as the observed relative difference in gain, Fig. 6 also gives the absolute gain of the adaptive approach for individual saliency models; this can easily be used to decide which of these models are more useful for IQMs. For example, applying a threshold of CC = 0.04 across all databases picks out PQFT, SDSR, and SR as the good models. However, we again note that the purpose of this paper is not to find the best IQM (or to target specific IQMs), but rather to compare fixed use of saliency to adaptive use of saliency according to saliency dispersion.

Fig. 6. Comparison of performance gain (i.e., CC) between saliency-augmented IQMs using fixed and adaptive use of saliency for each saliency model.
A paired-sample t-test analysis (preceded by a test for the assumption of normality) was performed, with the integration strategy as the independent variable and the performance as the dependent variable. Using the 150 × 2 × 3 data points contained in Table I, the test demonstrated, with p < 0.01 at the 95% confidence level, that the adaptive strategy is statistically significantly better than fixed inclusion of saliency.

VI. CONCLUSION
This paper considered how to reliably measure saliency dispersion in natural scenes and how it can be used to adaptively incorporate computational saliency into image quality metrics. Results show that adaptive use of saliency according to saliency dispersion significantly outperforms the fixed use of saliency in improving IQMs. As future work, we intend to investigate how the gain depends on the saliency model, in order to maximize the IQM's performance.

Fig. 1. Natural scenes, their ground truth saliency maps, corresponding IOA scores, and entropy calculated from the histogram of saliency intensity values. (a) Image with a few highly salient objects; IOA is high. (b) Image lacking salient objects; IOA is low. IOA values and saliency maps were determined from human eye fixations of 20 observers [3]. Entropy $H$ of the saliency map was computed using (1).

Fig. 3. Calculation of multilevel entropy $H$. At each level the saliency map is partitioned into blocks of equal size. $H$ is found by adding the entropies computed at each level of partition. $P_{\max}$ is the level with finest partitioning.
Entropy values for Fig. 1: the entropy of the saliency map in Fig. 1(a) is 6.04 bits, and the entropy calculated for the saliency map of the different natural scene shown in Fig. 1(b) is 7.26 bits. Saliency in Fig. 1(a) is more concentrated in fewer areas than in Fig. 1(b), which results in a smaller value of entropy.

Fig. 5. $H$ calculated for 300 scenes from the MIT300 database [49], using saliency values generated by 15 state-of-the-art saliency models. $H$ values are ordered from lowest to highest for each model.

TABLE I
PERFORMANCE FOR 10 IQMs (IN TERMS OF CC, WITHOUT NONLINEAR REGRESSION) ON ALL IMAGES OF THREE DATABASES, USING VERSIONS WHICH DID NOT USE SALIENCY, ALWAYS USED SALIENCY, OR ADAPTIVELY USED SALIENCY ACCORDING TO SALIENCY DISPERSION. THE 95% CONFIDENCE INTERVALS FOR CC VALUES RANGE FROM 0.001 TO 0.006. NOTE THE LIVE RESULTS ARE BASED ON THE REALIGNED SUBJECTIVE DATA.