Abstract

This work presents a no-reference image sharpness metric based on human blur perception for JPEG2000 compressed images. The metric mainly uses a ringing measure, and a blurring measure is used for compensation when the blur is so severe that ringing artifacts are concealed. We use anisotropic diffusion to obtain a preliminary ringing map and refine it by considering the property of ringing structure. The ringing detection of the proposed metric does not depend on edge detection, which makes it suitable for highly degraded images. The characteristics of the ringing and blurring measures are analyzed and validated theoretically and experimentally. The performance of the proposed metric is tested and compared with that of some existing JPEG2000 sharpness metrics on three widely used databases. The experimental results show that the proposed metric is accurate and reliable in predicting the sharpness of JPEG2000 images.

1. Introduction

Images are usually degraded by various factors such as defocusing and compression. Thus, assessing image quality is increasingly necessary. The most reliable approach to image quality assessment is subjective evaluation. The mean opinion score method is commonly used: subjective ratings are collected and then processed statistically to derive the mean opinion score (MOS). However, subjective assessment is time-consuming, costly, and impractical. Hence, there has recently been increasing interest from the research community and industry in developing objective assessment techniques.

The objective metrics can be divided into three categories: full reference (FR), reduced reference (RR), and no reference (NR) [1]. FR utilizes all information of the reference image while RR uses the detected features. However, the reference image or its features cannot be obtained sometimes. NR needs no reference information. It is widely used and challenging.

As the data volume is increasing apace, the limitation of the bandwidth becomes critical. It is more necessary to compress images. Different compression techniques introduce very different distortions. The discrete cosine transform (DCT) [2] based techniques, for example, JPEG and MPEG, lead to blockiness, whereas the JPEG2000 compression [3, 4] involving wavelet transform [5] mainly introduces blurring and ringing artifacts [6]. The particular interest of this work is NR sharpness assessment for JPEG2000 compressed images.

The existing metrics for JPEG2000 images can be generally classified into two categories. The first category consists of general sharpness metrics. They can be used to assess both JPEG2000 images and Gaussian blurred images. The second category consists of metrics particularly designed for JPEG2000 images. In these metrics, the ringing effect is taken into account. We first give an overview of the metrics of the first category. The metrics proposed in [7–11] evaluate the phase coherence. It is shown that precisely localized features such as step edges result in strong local phase coherence across scales and space in the complex wavelet domain, whereas blur leads to a loss of such coherence. The metrics proposed in [12–14] use the kurtosis information. The metrics based on perceptual blur [6, 15–17] use the edge spreading width to assess the image sharpness. Saad et al. [18] developed a general-purpose method for NR image quality assessment using a natural scene statistics (NSS) model.

The metrics in the second category employ the characteristics of JPEG2000 images. Some metrics are based on the neural network. The metric proposed in [19] uses the probabilities of the coefficients being nonzero in different subbands as features. Sazzad et al. [20] used pixel distortions and edge information. The metric proposed in [21] extracts some local gradient distribution features. The feature extraction is built on calculating the degree of blur at the edges. Sheikh et al. [22] used the NSS model to quantify the departure of an image and to make the prediction about its quality. The metric in [23] utilizes a PCA method to extract features at each edge pixel. And the probabilities of a given edge pixel being “distorted” and “undistorted” are calculated.

Besides the network-based metrics, there are some metrics based on two-phase ringing estimation in the second category. The two-phase ringing estimation includes the ringing regions detection and the ringing annoyances estimation. Marziliano et al. [6] proposed an FR and NR blurring metric and an FR ringing metric for JPEG2000 images. The blurring metric measures the width of edges, and the ringing metric measures the oscillations around edges. Barland and Saadane proposed in [24] an NR metric for JPEG2000 images. It used a blurring measure, a ringing measure, and an edge measure. Liu et al. [25] proposed an NR metric for perceived ringing artifacts in images. Oguz et al. proposed in [26] a measure of visible ringing that captures the ringing artifacts around strong edges. The ringing measures in the Barland metric, Liu metric, and Oguz metric are derived from calculating the activity of ringing region, specifically, the local variance.

Some other works, such as [27, 28], mainly introduce filtering-based methods, for example, bilateral filter [29] and anisotropic diffusion [30], to conceal ringing artifacts. However, these methods do not aim at image quality assessment. The metric in [31] predicts several artifacts, such as blurring, noise, and ringing.

The existing general sharpness metrics do not take ringing effects into account. They perform unsatisfactorily under moderate compression, where the ringing artifact is most visible. The existing training-based metrics usually extract a set of features for training and prediction, and some already existing sharpness metrics may be directly adopted as features. This is computationally inefficient. In addition, they lack a model of image sharpness, which hinders further research. The existing two-phase metrics do not take the property of ringing structure into account. As a result, ringing, noise, and textures are usually confused in the estimation of ringing annoyance. Further, the detection of the ringing region is not reliable unless the image shows a simple scene and is only slightly degraded.

In this paper, a novel metric is proposed to evaluate the sharpness of JPEG2000 images. It mainly uses the ringing measure. To obtain the preliminary ringing map, we use anisotropic diffusion. Then, the final ringing map is derived by considering the property of ringing structure. However, the ringing artifact may be concealed by extreme blur in highly compressed images. Thus, a blurring measure based on a traditional method is used for compensation. The complementarity between the ringing measure and the blurring measure is studied and the sharpness metric is derived.

The main contributions of this paper can be summarized as follows. (i) We propose a new method to detect the ringing artifacts. This method involves anisotropic diffusion and a refining phase which uses the prior ringing structures and the HVS characteristics. (ii) An NR sharpness metric is proposed. The metric mainly depends on the ringing measure, and it uses a blurring measure for compensation when the blur is very severe (in highly compressed images). We show that the ringing term is sufficiently monotonic with respect to the perceptual sharpness of JPEG2000 images, provided that the blurring is not so severe that it conceals the ringing artifacts.

This paper is organized as follows. Section 2 describes the proposed metric. The experimental results are illustrated in Section 3. And we conclude this paper in Section 4.

2. The Proposed Algorithm

JPEG2000 compression mainly introduces blurring and ringing artifacts. We find in this paper that the ringing term is sufficiently monotonic with respect to the perceptual sharpness of JPEG2000 images if the image is under moderate compression. However, the ringing artifact may be concealed by extreme blur in highly compressed images (detailed in Section 2.3). Thus, our metric mainly depends on the ringing measure, and it uses a blurring measure for compensation when the blur is very severe (in highly compressed images).

For ringing measure, an anisotropic diffusion is employed to detect the preliminary ringing map. Then, a refinement based on ringing properties is applied and the final ringing map is derived. The ringing measure is obtained by a weighted summation. After the ringing measure, we compute the blurring measure based on the perceptual blur based metric [6]. Our metric is derived mainly depending on the ringing measure, while it uses the blurring measure for compensation in the case of severe blur. A block diagram summarizing the computation of the proposed sharpness metric is given in Figure 1.

2.1. Ringing Measure

An anisotropic diffusion is used for extracting the preliminary ringing map. Then, a refinement based on the property of ringing structure is employed to derive the final ringing map. The ringing measure is derived from a weighted summation of the final ringing map.

2.1.1. Preliminary Ringing Extraction with Anisotropic Diffusion

The anisotropic diffusion is applied to obtain the preliminary ringing map. By the nature of anisotropic diffusion, the ringing artifacts are mostly filtered out while the edge structures are retained. The anisotropic diffusion model proposed by Perona and Malik [30] is adopted. It is formulated as

$$\frac{\partial I_t}{\partial t} = \operatorname{div}\big(c(\|\nabla I_t\|)\,\nabla I_t\big), \qquad I_{t=0} = I_0, \tag{1}$$

where $\nabla$ is the gradient operator, $I_0$ is the original image, $I_t$ is the evolving image at time $t$, and $c(\cdot)$ is the diffusion function (2), which decreases with the gradient magnitude and is close to 1 where the gradient is small.

Considering the general range of gradient magnitude in ringing regions, we set the parameter $K$ of the diffusion function to a constant, without adapting it over the iterations as is done in [32]. Without adaptation, excessive smoothing may happen (see [32] for details). As a result, the preliminary ringing map may exceed the actual amount of ringing. However, excessive smoothing tends to occur in highly blurred regions, because the gradient there is relatively small and thus leads to a large diffusion function (close to 1 according to (2)). Excessive smoothing causes more ringing artifacts to be detected, since the preliminary ringing map is obtained by subtracting the diffused image from the original one. This results in a closer correlation of the proposed metric with human perception. Indeed, with this setting the ringing measure correlates well with the HVS perception of ringing artifacts.

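The image $I$ is used as the initial input $I_0$. The evolving process is implemented as described in (1). As suggested by [30], a discrete scheme is adopted. It is formulated as

$$I^{n+1} = I^{n} + \lambda \sum_{q \in \eta} c\big(|\nabla_q I^{n}|\big)\,\nabla_q I^{n}, \qquad n \in \mathbb{N},\ 0 \le n < N, \tag{3}$$

where $\eta$ is the set of the four nearest-neighbor directions, $\nabla_q I^{n}$ is the difference between the neighbor in direction $q$ and the current pixel, $n$ is the iteration index, $\lambda$ is the step length, $N$ is the number of total iterations, and $\mathbb{N}$ is the set of natural numbers. The diffused image $I^{N}$ is obtained as the output of the evolution. The preliminary ringing map $R_p$ is calculated as the difference between the original image and the diffused image:

$$R_p = I - I^{N}. \tag{4}$$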
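The diffusion and subtraction steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the rational form of the diffusion function is one of the two forms proposed by Perona and Malik and is assumed here, and the default value of `kappa` is a placeholder because the value used in the paper is not recoverable from the text.

```python
import numpy as np


def diffusion_coefficient(grad, kappa):
    # Assumed form (one of the two Perona-Malik functions): close to 1 for
    # small gradients, close to 0 across strong edges.
    return 1.0 / (1.0 + (grad / kappa) ** 2)


def anisotropic_diffusion(image, n_iter=8, kappa=10.0, step=0.25):
    """Explicit four-neighbour scheme; step should lie in [0, 0.25]."""
    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Replicated border: differences across the border are zero (no flux).
        p = np.pad(u, 1, mode="edge")
        diffs = (
            p[:-2, 1:-1] - u,  # north neighbour minus current pixel
            p[2:, 1:-1] - u,   # south
            p[1:-1, :-2] - u,  # west
            p[1:-1, 2:] - u,   # east
        )
        u = u + step * sum(
            diffusion_coefficient(np.abs(d), kappa) * d for d in diffs
        )
    return u


def preliminary_ringing_map(image, **kwargs):
    # Signed difference between the input image and its diffused version.
    image = np.asarray(image, dtype=np.float64)
    return image - anisotropic_diffusion(image, **kwargs)
```

Because the scheme only redistributes intensity between neighbouring pixels, the diffused image keeps the mean of the input, and the preliminary map is exactly zero on a constant image.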

Figure 2 shows a sample of the obtaining of the preliminary ringing map.

2.1.2. Refinement Based on Morphological Operation

In the preliminary ringing map, we found that, besides the traditional ringing artifacts around strong edges, there existed another type of ringing artifact. It appears at tiny structures in highly compressed images. This is caused by highly concealing effect at tiny structures. They are generally presented as horizontal or vertical tiny strips. Because they degrade the image quality, they can also be interpreted as ringing artifacts. And, in general, the traditional ringing artifacts around edges also consist of horizontal and vertical tiny strips. Figure 3(a) shows traditional ringing artifacts near strong edges and Figure 3(d) shows that the other “ringing” artifacts appear at tiny structures under high compression.

The preliminary ringing map contains not only ringing but also some image inherent textures and noise. A refinement is implemented using the property of ringing structure. The refinement employs a morphological opening operation. Generally, opening operation can be comprehended as an “extracting” process. It extracts image structures that contain the structuring element (SE) used in it. This process is roughly shown in Figure 3 with some typical samples.

Two SEs, the horizontal strip $B_h$ and the vertical one $B_v$ (detailed subsequently), are used in the opening operation. The horizontal ringing map, $R_h$, is extracted as

$$R_h(x,y) = \big[\max(R_p, 0) \circ B_h\big](x,y), \tag{5}$$

where $R_p$ is the preliminary ringing map and the symbol $\circ$ indicates the opening operator. The function max(·) is used to remove the negative entries of the preliminary ringing map. The opening operation is applied to the whole image (not to a single pixel), so the pixel coordinate $(x,y)$ is written as an index on the result of the opening.

Similarly, the vertical ringing structures are extracted with the vertical SE $B_v$ as

$$R_v(x,y) = \big[\max(R_p, 0) \circ B_v\big](x,y). \tag{6}$$

For each pixel $(x,y)$, three cases exist. (i) It is on horizontal ringing artifacts; then, $R_h(x,y) > 0$ (the more visible the ringing artifact, the greater the value of $R_h(x,y)$). (ii) It is on vertical ringing artifacts; then, $R_v(x,y) > 0$. (iii) It is a point where no ringing artifact exists; then, $R_h(x,y) = 0$ and $R_v(x,y) = 0$. Hence, the final ringing map $R$ that contains ringing artifacts of both directions can be computed by choosing the entrywise maximum as

$$R(x,y) = \max\big(R_h(x,y),\, R_v(x,y)\big). \tag{7}$$

Examples of the extracted ringing artifacts are shown in Figures 3(c), 3(f), 3(i), and 3(l).
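The refinement step can be illustrated with a flat grey-scale opening written directly in NumPy. The sketch below is an assumption-laden illustration rather than the authors' code: the helper names are hypothetical, the structuring elements are taken to be flat rectangles given as (rows, columns), with the 7 × 5 vertical element stated in the text and its transpose for the horizontal one, and the border handling is a choice made here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _flat_filter(img, shape, reducer, fill):
    # Min or max over a centred rectangular window; `fill` pads the border so
    # that padded samples never win the reduction.
    ph, pw = shape[0] // 2, shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="constant",
                    constant_values=fill)
    return reducer(sliding_window_view(padded, shape), axis=(-2, -1))


def grey_opening(img, shape):
    # Opening = erosion (local minimum) followed by dilation (local maximum).
    eroded = _flat_filter(img, shape, np.min, np.inf)
    return _flat_filter(eroded, shape, np.max, -np.inf)


def final_ringing_map(prelim, se_h=(5, 7), se_v=(7, 5)):
    # Drop negative entries, open with each SE, keep the entrywise maximum.
    positive = np.maximum(np.asarray(prelim, dtype=np.float64), 0.0)
    r_h = grey_opening(positive, se_h)
    r_v = grey_opening(positive, se_v)
    return np.maximum(r_h, r_v)
```

With these shapes, a bright patch of 5 rows by 7 columns survives the horizontal opening unchanged, while an isolated bright pixel (noise-like) and all negative entries are removed.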

Determining the SEs $B_h$ and $B_v$. The SEs $B_h$ and $B_v$ are used for extracting the horizontal and the vertical artifacts, respectively. Their widths are used as the “cut-off” periods of ringing structures. The sensitivity of the human visual system (HVS) with respect to spatial frequency is taken into account. Evidence from grating and other experiments suggests that the HVS contains band-pass filters with a bandwidth of 1 octave [33]. The contrast sensitivity function (CSF) [34] shows a typical band-pass shape peaking at around 4 cycles per degree (cpd), with sensitivity dropping off on either side of the peak. According to the Rayleigh criterion [35], for the optical wavelength $\lambda_{\mathrm{opt}}$ and the pupil diameter $D$, the limit of angular resolution of the HVS is

$$\theta = 1.22\,\frac{\lambda_{\mathrm{opt}}}{D}. \tag{8}$$

It is about 1.22 × 550 nm / 2.5 mm = $2.684 \times 10^{-4}$ rad, corresponding to a frequency of about 60 cpd. However, from the CSF, the contrast sensitivity at this frequency is so low that the HVS can hardly sense any signal. Thus, the cut-off is set using four times the limit resolution. The corresponding cut-off spatial period $T_c$ on the screen depends on the viewing distance $d$ between an observer and the screen. For a general image height of 700 pixels, the viewing distance is typically about 4200 pixels (six times the image height). Then, $T_c$ is about 4.62 pixels. The spatial period $T_c$ can be regarded as the characteristic width of the ringing period. We round it to the integer $T_c = 5$. It is used as the width of the SE $B_h$ to extract, by the opening operation, the ringing structures with a spatial period larger than it. The length of the SE should be greater than the width, and it is simply set to 7. By the property of the morphological opening operation, choosing a longer length decreases the amount of detected ringing artifacts. Similarly, the SE $B_v$ is used to extract the vertical ringing structures. Hence, $B_v$ is directly assigned the transpose of $B_h$, that is, the uniform array of size 7 × 5.
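The numbers quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below only reproduces the quantities stated explicitly in the text (the Rayleigh limit, the viewing distance, and the SE sizes). The conversion to cycles per degree assumes one cycle per resolvable angle, which is an assumption made here, and all variable names are hypothetical.

```python
import math

wavelength_m = 550e-9       # optical wavelength quoted in the text
pupil_diameter_m = 2.5e-3   # pupil diameter quoted in the text

# Rayleigh criterion: smallest resolvable angle, in radians.
theta = 1.22 * wavelength_m / pupil_diameter_m

# Assumption: one cycle per resolvable angle, expressed in cycles per degree.
cycles_per_degree = math.radians(1.0) / theta

# Viewing distance of six image heights, expressed in pixels.
image_height_px = 700
viewing_distance_px = 6 * image_height_px

# Flat structuring elements as (rows, columns): width 5, length 7.
se_vertical = (7, 5)
se_horizontal = se_vertical[::-1]

print(f"theta = {theta:.4e} rad, about {cycles_per_degree:.0f} cpd")
print(f"viewing distance = {viewing_distance_px} px")
```

The computed frequency is of the same order as the roughly 60 cpd quoted in the text; the exact figure depends on how a cycle is related to the resolvable angle.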

2.1.3. The Ringing Measure

The ringing measure $M_r$ is derived from a weighted summation of the final ringing map $R$ as

$$M_r = \sum_{x,y} W(x,y)\,R(x,y), \tag{10}$$

where $W$ is a weighting matrix. It involves two factors. One is the luminance contrast sensitivity, meaning that the HVS is sensitive to contrast rather than to absolute luminance; the other is a location weight, motivated by the HVS salience property that more attention is given to the center of an image. Let $W_c$ and $W_l$ be the contrast sensitivity weighting matrix and the location weighting matrix (detailed subsequently); then, the weight is

$$W(x,y) = W_c(x,y)\,W_l(x,y). \tag{11}$$

The product is employed because it is the form in which each of the two subweights can act as a weight individually.

In the rest of this subsection, we give the details of the two subweighting matrices $W_c$ and $W_l$. According to Weber’s Law [36], the just noticeable difference in luminance between two regions is approximately proportional to the background luminance. The ringing map can be regarded as the luminance difference caused by ringing structures, and the local average luminance can be regarded as the background luminance. The background luminance $L_b$ is derived from a local average filtering; that is,

$$L_b = I * h, \tag{12}$$

where $*$ is the convolution operator and $h$ is the filtering kernel. Specifically, $h$ is a disk patch with radius 5. With this radius, it is large enough (with a spatial extent of 11 pixels) to cover two adjacent ringing structures, which makes it suitable for calculating the background luminance. A larger radius also works but is less efficient to compute.

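From Weber’s Law, the ratio of the ringing map to the background luminance, $R(x,y)/L_b(x,y)$, is the visibility index of ringing artifacts at $(x,y)$. Considering this, as well as the form in (10) and (11), we adopt the reciprocal of the background luminance as the luminance contrast weight $W_c$; that is,

$$W_c(x,y) = \frac{1}{\alpha\,L_b(x,y)}, \tag{13}$$

where $\alpha$ is the normalization factor.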

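For the location weight $W_l$, the HVS salience property is taken into account. Specifically, $W_l$ is formulated as a 2D Gaussian function,

$$W_l(x,y) = \exp\!\left[-\left(\frac{(x-x_c)^2}{2\sigma_x^2} + \frac{(y-y_c)^2}{2\sigma_y^2}\right)\right], \tag{14}$$

where $(x_c, y_c)$ is the image center and $\sigma_x$ and $\sigma_y$ are set to one-sixth of the image width and height, respectively. This ensures that the principal part ($\pm 3\sigma$) of the weighting matrix is located exactly in the image domain.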
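The weighting and summation of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the normalization factor `norm` and the small constant `eps` (which avoids division by zero) are placeholders introduced here, the Gaussian is left unnormalized, and the disk filter replicates the image border.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def disk_kernel(radius=5):
    # Normalized disk so that the filter computes a local average.
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = (x ** 2 + y ** 2 <= radius ** 2).astype(np.float64)
    return k / k.sum()


def background_luminance(image, radius=5):
    k = disk_kernel(radius)
    padded = np.pad(np.asarray(image, dtype=np.float64), radius, mode="edge")
    windows = sliding_window_view(padded, k.shape)
    # The disk is symmetric, so correlation and convolution coincide.
    return np.einsum("ijkl,kl->ij", windows, k)


def contrast_weight(image, radius=5, norm=1.0, eps=1e-6):
    # Reciprocal of the background luminance (Weber-type weighting).
    return 1.0 / (norm * (background_luminance(image, radius) + eps))


def location_weight(shape):
    # 2D Gaussian centred on the image, sigma = one-sixth of each dimension.
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
    sy, sx = h / 6.0, w / 6.0
    return np.exp(-((x - xc) ** 2 / (2 * sx ** 2)
                    + (y - yc) ** 2 / (2 * sy ** 2)))


def ringing_measure(ringing_map, image):
    weight = contrast_weight(image) * location_weight(np.shape(image))
    return float(np.sum(weight * ringing_map))
```

With this weighting, the same ringing map yields a smaller measure on a brighter background, which mirrors the Weber-type argument above.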

2.2. Blurring Measure

A blurring measure based on perceptual blur [6] is employed, with two modifications. First, we detect the edge profile along the edge normal instead of along the horizontal or vertical direction. Second, we do not use the whole edge transition but only its middle section, where the absolute derivative is larger than a proportion (one half) of that at the corresponding edge pixel. The blurring measure is computed by averaging the spreading widths over all edges (see [6] for details).
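The modified width computation can be illustrated on one-dimensional edge profiles. The sketch below assumes the profiles have already been sampled along the edge normals and that the edge pixel index in each profile is known; both steps are outside this illustration, and the function names are hypothetical.

```python
import numpy as np


def edge_spread_width(profile, edge_index, proportion=0.5):
    # Absolute derivative of the luminance profile across the edge.
    slope = np.abs(np.gradient(np.asarray(profile, dtype=np.float64)))
    threshold = proportion * slope[edge_index]
    left = right = edge_index
    # Grow the middle section while the slope stays above the threshold.
    while left > 0 and slope[left - 1] > threshold:
        left -= 1
    while right < len(slope) - 1 and slope[right + 1] > threshold:
        right += 1
    return right - left + 1


def blurring_measure(profiles, edge_indices):
    # Average spreading width over all edges.
    widths = [edge_spread_width(p, e) for p, e in zip(profiles, edge_indices)]
    return float(np.mean(widths))


sharp = [0, 0, 0, 0, 100, 100, 100, 100]
blurred = [0, 0, 10, 30, 50, 70, 90, 100, 100]
print(edge_spread_width(sharp, 4), edge_spread_width(blurred, 4))  # 2 5
```

The blurred ramp gives a larger width than the sharp step, so the average width grows with the amount of blur.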

2.3. The Proposed Sharpness Metric

Blurring and ringing artifacts usually appear simultaneously in JPEG2000 compressed images, yet we found that the ringing measure alone can normally be used to evaluate the sharpness, provided that the image is not under extremely high compression. Under extremely high compression, ringing artifacts are concealed by blurring (see Figure 3(g) for an example), and the blurring measure is needed for compensation. For conciseness, we refer to these two cases as ring-dominating and blur-dominating, respectively. We take the blurring measure as the criterion to distinguish one case from the other. In fact, the blurring measure rises rapidly in the blur-dominating case, whereas it is not reliable in the ring-dominating case (normal compression) due to the disturbance of ringing artifacts. Let $g(M_b)$ be a general term with regard to the blurring measure $M_b$. The proposed metric (15) is generally expressed as the combination of two terms: a ringing term given by the ringing measure $M_r$ and a blurring term $u(M_b)\,g(M_b)$, where $u(\cdot)$ is an indicating function (described subsequently) with respect to $M_b$. A psychometric function, specifically a saturated exponential function, is employed for $u$ as

$$u(M_b) = 1 - \exp\!\left[-\left(\frac{M_b}{T}\right)^{k}\right], \tag{16}$$

where $T$ is the threshold and $k$ is the index that controls the steepness of $u$ when $M_b$ crosses $T$. It can be regarded as a soft Heaviside step function indicating the blur-dominating case (ringing is concealed by blurring). A sample of $u$ is shown in Figure 4.

In the case of normal compression, the proposed metric depends almost entirely on the ringing measure $M_r$. For extremely compressed images, our metric depends on the second term. Considering that the HVS perception is in a limited range, we set $g$ to a constant function; that is, $g(M_b) = C$ (thus the blurring term is the scaled psychometric function). This is used for the compensation in extreme blur. The constant $C$ and the threshold $T$ are determined in a small-scale (twenty-image) experiment as described in the next section.
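A minimal sketch of how the two measures might be combined is given below. It rests on two assumptions that are not confirmed by the text: the indicating function is taken to be a Weibull-type saturated exponential, and the ringing and blurring terms are simply added. The function names are hypothetical; the default constant (18), threshold (8), and index (4) are the values reported in Section 3.

```python
import math


def indicator(blur, threshold=8.0, index=4.0):
    # Assumed saturated exponential: near 0 well below the threshold and
    # near 1 well above it; `index` controls how steep the transition is.
    return 1.0 - math.exp(-((blur / threshold) ** index))


def sharpness_metric(ringing, blur, constant=18.0, threshold=8.0, index=4.0):
    # Assumed additive combination: ringing term plus a blurring term that
    # is the psychometric function scaled by a constant.
    return ringing + constant * indicator(blur, threshold, index)


# Ring-dominating case: the blurring term is negligible.
print(sharpness_metric(ringing=5.0, blur=1.0))
# Blur-dominating case: the blurring term saturates near the constant.
print(sharpness_metric(ringing=2.0, blur=20.0))
```

Under these assumptions the metric follows the ringing measure for small blur values and is lifted by roughly the constant once the blurring measure is far above the threshold.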

3. Performance Results

In this section, the performance of the proposed metric is tested on three databases. Additionally, two auxiliary experiments are conducted to show the properties of the ringing measure and the blurring measure individually.

In the anisotropic diffusion, the step length $\lambda$ should lie in $[0, 1/4]$ for the discrete scheme to remain stable [30], and its value is chosen for rapidity. With the parameter $K$ specified in the diffusion function and the step length $\lambda$, the total iteration number $N$ should stay within a bounded interval in which the ringing artifacts are generally removed while the image structures are preserved. We set $N$ to 8, which lies in this interval. The exact value of $N$ is not critical as long as it stays in this interval. The index $k$ in the indicating function (16) is set to 4, as is usual for a psychometric function.

The constant $C$ and the threshold $T$ are determined in a twenty-image experiment. The first twenty images from the CSIQ database [40] are used. It should be noted that these twenty images are independent in terms of their scenes and are not used in the experiments for performance tests. According to the small-scale experiment, the constant $C$ and the threshold $T$ are set to 18 and 8, respectively.

3.1. Testing Set

The involved databases are LIVE [41], TID2008 [42], and CSIQ [40]. They are widely used in the researches of image quality assessment.

The LIVE database consists of 29 reference images. These images are distorted using five different distortion types: JPEG2000, JPEG, Gaussian blur in RGB components, white noise in the RGB components, and bit errors in the JPEG2000 bitstream when transmitted over a simulated fast-fading Rayleigh channel. There are 227 JPEG2000 images in this database. Each image was rated by about 20–29 subjects. The subjects rated the images on a continuous linear scale which was divided into five different regions, namely, “bad,” “poor,” “fair,” “good,” and “excellent.”

The TID2008 database consists of 25 reference images and 1700 distorted images. The images are distorted by 17 types of distortions. As one type of them, the JPEG2000 subdatabase contains 100 images. The subjective tests were conducted using a pair-comparing manner. A reference image at the bottom and a pair of distorted images were simultaneously presented to the subjects. The subjects selected a distorted image that differed less from the reference image. The subjects were preliminarily instructed and trained on a set of distorted images before carrying out the actual experiments. The experiments were carried out by a total of 838 observers from three countries.

The CSIQ database consists of 30 reference images distorted using six types of distortions at four or five different levels. The distortions used in CSIQ are JPEG compression, JPEG2000 compression, global contrast decrements, additive pink Gaussian noise, additive white Gaussian noise, and Gaussian blurring. It contains 150 JPEG2000 images. CSIQ images were subjectively rated based on a linear displacement of the images across four calibrated LCD monitors placed side by side with equal viewing distance to the observer. All of the distorted versions of an original image were viewed simultaneously on the monitor array and placed in relation to one another according to the overall quality. Across-image ratings were realigned according to a separate experiment in which observers place subsets of all the images linearly in space. The database contains 5000 subjective ratings from 25 observers, and the ratings were reported in the form of DMOS.

These databases were chosen because of the diversity of the procedures of the subjective evaluations. In detail, the distorted image/images were presented singlewise, pairwise, and image setwise, for LIVE database, TID2008 database, and CSIQ database, respectively. In our experiment, the JPEG2000 subsets of these databases were used.

3.2. Correlations for Comparison

To evaluate the correlation between the objective metrics and the MOS of the databases used, we followed the suggestions of the VQEG report [43]. First, a 4-parameter logistic fitting between the objective and the subjective scores is adopted,

$$\mathrm{MOS}_i = \frac{\beta_1 - \beta_2}{1 + \exp\!\big(-(s_i - \beta_3)/|\beta_4|\big)} + \beta_2,$$

where $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are model parameters and $\mathrm{MOS}_i$ / $s_i$ denotes the subjective/objective score of the $i$th image.

The parameters are obtained by optimizing the fitting. Figure 5 shows a sample fitting curve of the proposed metric on the CSIQ JPEG2000 images. The predicted MOS values are derived from the fitted parameters and are used to evaluate the performance of the metrics. As suggested in [43], the Spearman correlation coefficient (SPCC), Pearson correlation coefficient (PCC), root mean squared error (RMSE), mean absolute prediction error (MAE), and outlier ratio (OR) are used. Note that a good metric corresponds to high SPCC and PCC but low RMSE, MAE, and OR.
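The evaluation indices listed above can be computed as in the following sketch. It is an illustration only: the logistic form is a commonly used 4-parameter variant and is assumed here, the parameter fitting itself and the outlier ratio are omitted, the rank computation does not handle ties, and all names are hypothetical.

```python
import numpy as np


def logistic4(x, b1, b2, b3, b4):
    # Assumed 4-parameter logistic mapping from objective to predicted MOS.
    return (b1 - b2) / (1.0 + np.exp(-(x - b3) / abs(b4))) + b2


def _ranks(values):
    # Ranks 0..n-1 by ascending value (ties are not averaged).
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(len(values))
    return ranks


def evaluate(mos, predicted):
    mos = np.asarray(mos, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    err = mos - predicted
    return {
        "PCC": float(np.corrcoef(mos, predicted)[0, 1]),
        "SPCC": float(np.corrcoef(_ranks(mos), _ranks(predicted))[0, 1]),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }


scores = evaluate([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
print(scores)
```

Since SPCC depends only on the ordering of the scores, it is unaffected by the monotonic logistic mapping, whereas PCC, RMSE, and MAE are computed on the predicted MOS after fitting.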

3.3. Performance Result for the Proposed Metric

Tables 1, 2, and 3 show the performance of the proposed metric as well as some leading metrics such as CPBD metric [17], JNBM metric [15], local kurtosis based metric (LKM) [13], local phase coherence metric (LPC) [8], Marziliano metric [6], Laplacian metric [37], Marichal metric [38], Shaked-Tastl metric [39], BLIINDS-II metric [18], Liu metric [25], Barland metric [24], and FR metric PSNR.

It can be seen from Tables 1–3 that the proposed metric performs slightly better than or competitively with the CPBD metric and the BLIINDS-II metric and is significantly superior to the others. It is slightly inferior to the CPBD metric on the TID2008 database, but it considerably outperforms the CPBD metric on LIVE and CSIQ. The proposed metric and the BLIINDS-II metric are almost the same in terms of performance. It should be noted that BLIINDS-II is a general-purpose image quality metric that is not limited to JPEG2000 degradation. In fact, BLIINDS-II is a commendable general-purpose quality metric. However, it is computationally very expensive because it extracts and utilizes many features, such as multiscale and multiorientation features in the DCT domain in a sliding window, and works in a neural network-based framework. More than ten hours are needed for the assessment (the testing process only) of the JPEG2000 subset of one database, whereas only a few minutes are needed by the proposed metric and the other metrics on the same computer with MATLAB. The proposed metric is designed for JPEG2000 images and is highly reliable on all three databases. In detail, almost all SPCC and PCC values of the proposed metric are higher than 0.9 on these databases, except for the SPCC on the CSIQ database, which is still very close to 0.9.

Except for Liu metric and Barland metric, all other existing metrics do not exploit the ringing effect explicitly, although they are claimed to be suitable for JPEG2000 images. Liu metric and Barland metric explicitly introduce the ringing measure. However, they did not take the structuring properties of ringing into account. In fact, the ringing measure in Liu metric and Barland metric is derived from calculating the activity of ringing region, specifically, the local variance. As a result, they are likely to confuse structures, noise, and textures. The performance results of these two metrics are not satisfactory, especially for LIVE database. However, we should note that Liu metric is actually for ringing annoyance assessment not for sharpness (or quality) assessment. Thus, the straight comparison is not fair to it. However, experiment results show that the metrics based on ringing region detection and activity-based ringing annoyance measure are not robust at all.

Some metrics, such as the JNBM metric and the LPC metric, do not take ringing into account. As a result, their performance is not robust. It can be seen from Table 1 that these metrics perform unsatisfactorily on the LIVE database. This is because almost all JPEG2000 images in the LIVE database are in the ring-dominating case (not extremely compressed). This is further validated subsequently in this section.

3.4. Auxiliary Experiments for Individual Measures

Two auxiliary experiments were conducted to test the individual components: the blurring measure $M_b$ and the ringing measure $M_r$. These experiments demonstrate the characteristics of the ringing measure and the blurring measure.

The results of the auxiliary experiments are shown in Figure 6. It shows the dot plots of the blurring measure, the ringing measure, and the proposed sharpness metric. The two measures are plotted against the MOS, which is convenient for showing their behavior with respect to the MOS. Under normal compression, the ringing measure is adequate for the sharpness assessment (see the leftmost figure in Figure 6(b)). Note that the blurring measure is not reliable in this compression range due to the disturbance of the ringing artifacts. However, for extremely compressed images, the blurring is so severe that the ringing is concealed. The corresponding ringing measure decreases (see the two figures on the right in Figure 6(b)), and the blurring measure is used for compensation. The blurring measure rises so rapidly once the blur becomes severe that it can be stably used as the indicating parameter.

Figure 6(c) shows that the sharpness metric derived from $M_r$ and $M_b$ combines the advantages of both. The ringing measure performs well in the ring-dominating case, but it falls back in the blur-dominating case. Fortunately, the blurring measure can be used to indicate which of the two cases applies, because its value is quite large in the blur-dominating case. Compensated by the blurring measure, the proposed metric becomes much more monotonic.

What should be noted is that the ringing measure performs well itself on LIVE database (see the leftmost figure of Figure 6(b)). This is because all images in the database are in normal compression. It validates the accuracy of the proposed ringing measure.

Comparison with the Blurring Metric in [6]. The proposed metric adopts the modified blurring metric of [6] as the secondary parameter, which is used in the indicating function. The proposed ringing measure dominates when the blurring measure is below the threshold (note that this is a considerably wide range). We compare the proposed metric with the blurring metric in [6]. The performance is listed in Tables 1–3 and the dot plots of the metric in [6] are shown in Figure 7. The dot plots of the proposed metric have been shown in Figure 6(c). From Figure 6(c) and Figure 7, we can see that the proposed metric is much more monotonic than the metric in [6] (note that the axes in Figure 7 are swapped relative to Figure 6), which mainly results from the monotonicity of the proposed ringing measure in the ring-dominating case.

4. Conclusions

An NR image sharpness metric for JPEG2000 compressed images is proposed in this paper. The metric mainly utilizes a structure-based ringing measure. In the case of extreme blurring, a blurring measure is used for compensation. One major contribution of this paper is the ringing detection, which involves anisotropic diffusion and a refining phase that uses the prior ringing structures and the HVS characteristics. We show that the ringing term is sufficiently monotonic with respect to the perceptual sharpness (quality) of JPEG2000 images, provided that the blurring is not so severe that it conceals the ringing artifacts. In fact, the ringing detection method is quite effective (see Figure 3). The highly visible ringing artifacts are detected while small image structures (or noise) are discarded.

The proposed metric is tested on three widely used databases. And quite a few existing leading metrics are tested for comparison. The experimental results show that the proposed metric is superior or at least competitive to the existing metrics.

One future direction is to employ a salience measure to directly replace the location weight of the proposed ringing measure. Then, the artifacts in the background can be distinguished more clearly.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.