Difference of Visual Information Metric Based on Entropy of Primitive

Abstract: Image sparse representation is a method for efficient compression and coding of image signals in digital image processing. Representing an image sparsely improves the transmission efficiency of the image signal. The Entropy of Primitive (EoP) is a statistic of an image's sparse representation that captures the probability with which each basis element (primitive) is used. Based on the EoP, this paper presents an image quality evaluation method, the Difference of Visual Information Metric (DVIM), whose principle is to evaluate image quality by the difference in visual information between the original image and the distorted image. Comparative experiments between DVIM, PSNR, and SSIM show a marked improvement in the evaluation of geometrically transformed images. The method is an effective image quality evaluation method that, to a certain extent, overcomes the weakness of other quality evaluation methods on geometrically transformed images and agrees more closely with subjective human observation.


Introduction
The signals a computer receives from imaging devices such as cameras must be sampled and quantized, converting the analog signals into digital signals for analysis and processing [Bi (2017)]. In digital image transmission, the RGB three-primary-color principle is commonly used to represent an image. A color image of size m×n is stored in memory as an m×n×3 array, so each pixel requires 3×8 = 24 bits; for 720p video at 60 fps, one frame occupies about 2.64 MB, and one second of data is 60×2.64 MB = 158.4 MB. Because the amount of data in digital images is so large, images and videos must be compressed to improve efficiency. The information removed when compressing an image for transmission is redundant, and this redundancy comes in several kinds, the most important being spatial redundancy, temporal redundancy, and information-entropy redundancy. By analyzing the image, removing these correlations, and then compressing the data, a better compression result can be achieved.

Peak signal to noise ratio
The basic principle of objective image quality evaluation is to compare the original image and the distorted image pixel by pixel to obtain pixel-level error values, and to use the resulting statistics as the evaluation standard. Objective quality evaluation methods mainly include Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), and Mean Absolute Error (MAE).
(1) Peak Signal-to-Noise Ratio (PSNR) expresses the ratio between the maximum possible power of a signal and the power of the distorting noise that affects its fidelity [Lee and Lim (2016)], as shown in Eq. (1):

PSNR = 10 log10( L^2 / MSE )  (1)

where L represents the dynamic range of a pixel (255 for 8-bit images). The Mean Squared Error (MSE) is shown in Eq. (2):

MSE = (1/N) Σ_{i=1}^{N} (x_i − y_i)^2  (2)

where N represents the total number of pixels, x_i the pixel value of the original image, and y_i the pixel value of the distorted image.
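As a concrete illustration (not from the paper), Eqs. (1) and (2) can be sketched in a few lines of NumPy:

```python
import numpy as np

def mse(x, y):
    """Mean squared error of Eq. (2) between original x and distorted y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def psnr(x, y, L=255):
    """Peak signal-to-noise ratio of Eq. (1), in dB.
    L is the dynamic range of a pixel (255 for 8-bit images)."""
    e = mse(x, y)
    if e == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(L ** 2 / e)
```

For identical images the MSE is zero and the PSNR is infinite; heavier distortion drives the PSNR down.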

Objective quality evaluation method based on structural similarity
To obtain an evaluation method consistent with the subjective perception of the human eye, one cannot rely solely on full-pixel statistics; characteristics of the human visual system and of psychology must be incorporated into the evaluation technique. The structural similarity measure correlates strongly with the perception of images [Méndezaguilar, Kellypérez, Berrielvaldos et al. (2017)], behaving similarly to the human eye. The mathematical model of SSIM is shown in Eq. (3):

SSIM(X, Y) = ( (2 μ_X μ_Y + C_1)(2 σ_XY + C_2) ) / ( (μ_X^2 + μ_Y^2 + C_1)(σ_X^2 + σ_Y^2 + C_2) )  (3)

where X and Y represent the input signals, μ_X and μ_Y their average luminance, σ_X and σ_Y their standard deviations (signal contrast), σ_XY their covariance, and C_1 and C_2 are constants.
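A minimal single-window sketch of Eq. (3) may make the combination of terms concrete. Standard SSIM averages the statistic over local sliding windows; the global version below, with the commonly used default constants K1 = 0.01 and K2 = 0.03 assumed, is only for illustration:

```python
import numpy as np

def ssim_global(x, y, L=255, K1=0.01, K2=0.03):
    """Single-window SSIM of Eq. (3), combining luminance (means),
    contrast (variances) and structure (covariance) of the two signals."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = float(((x - mu_x) * (y - mu_y)).mean())
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```

For identical inputs the numerator equals the denominator and the score is exactly 1.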

Primitive entropy evaluation method
The primitive entropy is a statistic of the sparse representation that captures the probability with which each primitive is used. Combining the usage statistics of the dictionary primitives in the sparse representation with Shannon's concept of information entropy yields the Entropy of Primitive [Ramos and Mercère (2016)]. Shannon entropy indicates the maximum amount of information a source can carry and is a concise expression of information content. Shannon's information entropy is shown in Eq. (4):

H(x) = − Σ P(x) log2 P(x)  (4)

where x represents a random variable and P(x) its probability function. Any input image Y is divided into several blocks y_1, y_2, …, y_N of the same size (8×8 in this experiment). These blocks are used as input signals for the K-Singular Value Decomposition (K-SVD) algorithm, which trains the overcomplete dictionary D. The sparse representation vectors {x_i} are then obtained by the Orthogonal Matching Pursuit (OMP) algorithm, where x_i corresponds to block y_i. After the sparse representation vectors {x_i} are obtained, each primitive whose coefficient in x_i is nonzero is counted as used once, as shown in Eq. (5):

t_j = Σ_{i=1}^{N} u_{i,j}  (5)

where N is the number of image blocks, t_j is the number of times primitive j is used, and u_{i,j} indicates whether the i-th image block uses the j-th primitive. u_{i,j} is computed as in Eq. (6):

u_{i,j} = 1 if x_{i,j} ≠ 0, and 0 otherwise  (6)

where x_{i,j} is the j-th coefficient of the sparse representation vector of the i-th block. This gives the usage count of every primitive in the dictionary, t_j, j = 1, 2, …, K. The probability that each primitive is used then follows, as shown in Eq. (7):

p_j = t_j / Σ_{k=1}^{K} t_k  (7)

Finally, analogous to the entropy calculation, the Entropy of Primitive (EoP) is defined in Eq. (8):

EoP = − Σ_{j=1}^{K} p_j log2 p_j  (8)

Figure 1: PSNR, SSIM, and EoP of the Lena image as the number of primitives L varies.

As can be seen from Fig. 1, when the EoP and SSIM curves flatten (at about L = 7), the subjective quality of the image stabilizes and the human eye barely notices further change. As can be seen from the figures for the three test images, as the number of primitives L increases, the reconstructed image quality (PSNR, SSIM) keeps increasing [Rehna and Kumar (2014)], and the Entropy of Primitive (EoP) increases as well. When L reaches a certain value (about 7), both the SSIM curve and the EoP curve level off. To support this subjective observation mathematically, the Pearson linear correlation coefficient (PLCC) over 30 test images was calculated to measure the correlation of EoP with PSNR and SSIM. First, a nonlinear model such as Eq. (9) is used to fit PSNR-EoP and SSIM-EoP; after the model parameters are obtained, predicted values of EoP are computed by substituting into Eq. (9).
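Given the sparse coefficient vectors produced by OMP, Eqs. (5)-(8) reduce to counting nonzero coefficients and summing an entropy. A minimal sketch (the coefficient-matrix layout, primitives along rows and blocks along columns, is an assumption made for illustration):

```python
import numpy as np

def entropy_of_primitive(coeffs):
    """Entropy of Primitive, Eqs. (5)-(8).
    `coeffs` is a K x N matrix of sparse-representation coefficients:
    K dictionary primitives, N image blocks (e.g. OMP output against
    a K-SVD dictionary)."""
    used = np.asarray(coeffs) != 0            # u_{i,j}: block uses primitive
    counts = used.sum(axis=1)                 # t_j: usage count per primitive
    total = counts.sum()
    p = counts[counts > 0] / total            # p_j; unused primitives drop out
    return float(-(p * np.log2(p)).sum())     # EoP
```

With four primitives each used equally often the probabilities are 1/4 and the EoP is 2 bits; if only one primitive is ever used, the EoP is 0.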
Then, the PLCC between the EoP predicted values and the EoP measured values is calculated with Eq. (10):

PLCC = Σ_i (a_i − ā)(b_i − b̄) / sqrt( Σ_i (a_i − ā)^2 · Σ_i (b_i − b̄)^2 )  (10)

As can be seen from Fig. 4, EoP has a strong correlation with SSIM. In most cases the PLCC of SSIM-EoP is larger than that of PSNR-EoP and is close to 1 [Shi, Jiang and Zhao (2016)], which means that EoP and SSIM are nearly linearly correlated and that EoP can evaluate the subjective quality of an image. To further explain why the EoP stabilizes as the number of primitives L increases, this paper examines the primitive-usage histograms of the Lena, Einstein, and Plane images under different values of L, shown in Figs. 5-7. The histograms can be summarized as follows: (1) as L increases, the total number of times the primitives are used grows, and their distribution gradually stabilizes, showing strong regularity; (2) when L = 1, each image block can be represented by only one primitive, so some primitives are used heavily; (3) many similar image blocks repeat, and such blocks most likely use the same primitive.
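The PLCC of Eq. (10) is a direct computation; a small sketch:

```python
import numpy as np

def plcc(a, b):
    """Pearson linear correlation coefficient of Eq. (10)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    da, db = a - a.mean(), b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))
```

Perfectly linearly related sequences give +1 (or −1 for a decreasing relation), which is the behavior the SSIM-EoP fit approaches.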
(4) The reason many primitives are not used may be local similarity within the image, such as in background areas; (5) as L increases, the distribution of primitive usage gradually becomes more uniform. In terms of the Entropy of Primitive (EoP), as L grows from 1 the EoP increases significantly; however, once L reaches a certain value, although the usage counts (the magnitudes of the histogram) still grow, the probability distribution gradually stabilizes, so the rate of change of the EoP slows and eventually levels off. This verifies that the EoP stabilizes as the number of primitives L increases.

Primitive entropy model
The Entropy of Primitive (EoP) treats all image blocks equally: every block contributes the same weight to the primitive usage counts. However, the human eye attends to different regions of an image to different degrees, which places higher demands on the processing of video images.
The experiments yielded results similar to EoP. Fig. 8 shows the attention-weighted EoP (AEoP) versus L curves for the Lena, Einstein, and Plane images based on a human-eye attention model. AEoP was found to have similar statistical properties and to be strongly correlated with SSIM and with subjective quality.

Visual information system
The above experiments show that the Entropy of Primitive (EoP) is a robust indicator of image quality. Exploiting this property of the primitive entropy, this paper proposes the notion of Visual Information (VI). Vision is a human cognitive faculty and an important way of understanding the objective world. Fig. 9 shows the EoP curves of a set of Lena images JPEG-compressed with different quantization factors (QF) [Vaksman, Zibulevsky and Elad (2016)]. The EoP curve of each distorted image follows a similar trend, stabilizing as L increases, but the EoP peaks differ across quantization factors: as the quantization factor grows, the visual appearance of the image approaches the original, and the EoP peak rises toward that of the original image. A threshold l̃ is chosen as the smallest L beyond which the change in the EoP curve stays below a constant ε, set to 0.01 in this experiment; past l̃ the EoP curve is essentially flat. The EoP value at the threshold point l̃ is taken as the visual information of the image [Zhang, Wang, Ma et al. (2014)], and the Visual Information (VI) is defined in Eq. (13):

VI = EoP(l̃)  (13)
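The thresholding idea described above (stop at the first L where the EoP increment falls below ε) can be sketched as follows; the paper's exact stopping rule may differ, so this is only an illustration:

```python
def visual_information(eop_curve, eps=0.01):
    """Estimate VI as the EoP value at the first L where the curve has
    flattened: the increment |EoP(L+1) - EoP(L)| drops below eps.
    `eop_curve[i]` is the EoP measured with L = i + 1 primitives."""
    for i in range(len(eop_curve) - 1):
        if abs(eop_curve[i + 1] - eop_curve[i]) < eps:
            return eop_curve[i]          # EoP(l~), the VI of Eq. (13)
    return eop_curve[-1]                 # never flattened: use the last value
```

On a curve that keeps rising and then saturates, this returns the EoP value at the knee rather than the final sample.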
Next, statistics on the experiment were compiled. The visual information (VI) of JPEG-compressed versions of each image under different quantization factors was computed; results for four images are shown in Fig. 10. As the figure shows, as the quantization factor (QF) increases, image quality increases and the estimated visual information (VI) stabilizes, gradually approaching the visual information of the original image.

JND model based on primitive entropy
Just-Noticeable Distortion (JND) is a threshold that indicates the distortion tolerance of the human eye. Because the response of the human eye to different stimuli is not additive, a nonlinear relationship is used to express the superposition of the brightness and texture masking effects. The JND expression is shown in Eq. (14):

JND(x, y) = L(x, y) + T(x, y) − C · min{ L(x, y), T(x, y) }  (14)
L(x, y) represents the brightness masking effect, T(x, y) the texture masking effect, and C the degree of superposition between the two factors. The value of C determines the strength of the superposition effect: the larger C is, the stronger the superposition between the brightness mask and the texture mask; when C = 1 the superposition effect reaches its maximum, and when C = 0 it is zero. In many cases the superposition effect lies between the two extremes, i.e., 0 < C < 1. The Perceptually Lossless Profile (PLP) refers to an image that maximizes distortion while remaining lossless in visual information. The image Ỹ reconstructed with the threshold number of primitives l̃ is defined as the PLP. The experiments in the previous section showed that beyond the threshold l̃ the visual information no longer changes significantly, so the image reconstructed at this threshold point is considered visually indistinguishable from the original. With the PLP, the maximum tolerable loss of visual information, i.e., the smallest error recognizable by the HVS, can be effectively estimated, which parallels the concept of JND. The new EoP-based JND model is then established as in Eq. (15):

EJND(x, y) = abs( Y(x, y) − Ỹ(x, y) )  (15)
In the formula, abs denotes the absolute value; EJND is the residual image between the PLP and the original image. Both spatial-domain JND and frequency-domain JND exploit HVS features while treating the HVS as an unknown black box, a "bottom-up" modeling approach.
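Eq. (15) is a per-pixel absolute residual; a minimal sketch, assuming the PLP image has already been reconstructed:

```python
import numpy as np

def ejnd(original, plp):
    """EoP-based JND map of Eq. (15): the absolute residual between the
    original image Y and its perceptually lossless profile (PLP)."""
    return np.abs(np.asarray(original, dtype=np.float64) -
                  np.asarray(plp, dtype=np.float64))
```

Each entry of the resulting map is the largest per-pixel error the HVS is assumed to tolerate at that location.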

Visual information difference metrics
Since the primitive entropy correlates strongly with subjective perception, it can be used to measure the visual information an image conveys to the human eye. Based on these properties, this paper proposes the Difference of Visual Information Metric (DVIM) model as follows: (1) compute the visual information value (VI) of the original image and of the distorted image; (2) take the difference between the two VI values; (3) use this difference as the quality measure. This indicator, the difference of visual information, measures the gap in visual information between the images: as the degree of distortion grows, the difference in VI between the original and distorted images increases; conversely, as the distorted image approaches the original, the difference decreases. The model of the visual information difference metric (DVIM) is shown in Eq. (16):

DVI = abs( VI_o − VI_d )  (16)
VI_o and VI_d represent the visual information of the original image and the distorted image, respectively, and DVI is the difference between the two. The Difference of Visual Information Metric (DVIM) is a reduced-reference quality evaluation method: it does not require all the information of the original image, only the visual information value computed from it, and the difference between the VI of the original and distorted images suffices to evaluate image quality. This simplifies the calculation and improves the efficiency of video processing. Likewise, the visual information difference between two distorted images can serve as an indicator of image quality.
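Given the two VI values, Eq. (16) is a one-line computation; a sketch:

```python
def dvim(vi_original, vi_distorted):
    """Difference of Visual Information Metric, Eq. (16): the gap between
    the VI of the original image and that of the distorted image.
    Smaller values mean more visual information is preserved."""
    return abs(vi_original - vi_distorted)
```

Only the scalar VI of the reference image needs to be transmitted alongside the distorted image, which is what makes the metric reduced-reference.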

Comparison experimental design and results
To verify the feasibility of the Difference of Visual Information Metric (DVIM) for image quality evaluation, this paper applies the method to the LIVE image database in a large number of experiments, comparing it against the SSIM method and the Wavelet-marginal method on the same image library. The experiment is as follows: on the LIVE database, assumed to contain N images, the results were computed with five correlation measures and are shown in Tab. 1. As the final data in the table show, the evaluation performance of DVIM is on par with the Wavelet-marginal method, and in some respects improves on it; that is, the DVIM method can perform the same evaluation function as currently popular image quality evaluation methods. (B) To further verify the effectiveness of the method, scatter plots of SSIM, Wavelet, and DVIM against the subjective scores were drawn, as shown in Fig. 12. Figure 12: Scatter distributions of the three objective quality assessment methods. From the figure: (1) the Wavelet fit is the most linear, as shown in Fig. 12(b), but its distribution is too scattered to fit the subjective predictions well.
(2) The DVIM scatter is more concentrated, especially where the points are dense and the values small, showing a good linear relationship with the subjective scores. This indicates that the predicted results are similar to the subjective evaluation results, as shown in Fig. 12(c); the method can therefore predict the quality level of an image well, and DVIM can effectively evaluate image quality. (C) From the final results, the PSNR and SSIM methods do not adapt to translation, rotation, and scaling transformations as well as DVIM: the scores they give for these transformations are very low, while their score for the JPEG-compressed image is much higher than for the translation, rotation, and zoom changes, which is obviously inconsistent with the subjective quality judgment of the human eye. (D) The result calculated by the DVIM method reflects the difference in the amount of information contained in the two images and is independent of pixel position, so it effectively avoids the mismatch of corresponding pixel positions caused by geometric transformations and better matches the subjective perception of the human eye. Therefore, the visual information difference metric (DVIM) can match the subjective judgment of the human eye to a certain extent.

Conclusion
This paper discusses an image quality assessment method that approximates the subjective perception of the human eye: the Difference of Visual Information Metric (DVIM). Across several sets of test experiments, the method shows a large improvement over currently popular quality evaluation methods on geometrically transformed images. Because the method is independent of pixel position, it effectively avoids the problem of mismatched pixel correspondences caused by geometric transformations and agrees better with the subjective perception of the human eye, demonstrating that it is an effective image quality evaluation method.