A Survey on Quantitative Metrics for Assessing the Quality of Fused Medical Images

Fused images derived from multimodality, multi-focus, multi-view and multidimensional sources are gaining much attention in recent research for real-world applications in medical imaging, remote sensing, satellite imaging, machine vision, etc. It is therefore important to validate the fused image from different perspectives, such as information, edge, structure, noise and contrast, for quality analysis. In this respect, the fused image should carry more information than the source images, without loss of information and without false information. This survey analyzes the various quantitative metrics used in the literature to measure the enhanced information of a fused image when it is compared to the source/reference images. The objective of this study is to group the metrics under different categories, such as information, noise, error, correlation and structural similarity measures, for discussion and analysis. In practice, the calculated metric values are useful in determining the suitable fusion technique for a particular dataset, given the perspective required as an outcome of the fusion process.


INTRODUCTION
Image quality assessment plays an important role in many domains such as compression, fusion, registration and reconstruction. The assessment is carried out either with the help of domain experts or using statistical parameters, and is further used for comparing different image processing algorithms depending on the requirement. The measures related to image quality evaluation can be classified as quantitative (objective) and qualitative (subjective) measures. In qualitative measures, users rate the images based on the effect of degradation, and the rating varies from user to user, whereas quantitative metrics find the difference in the images caused by the process (Petrovic, 2007). This study is focused on the metrics related to one of the active research domains of the recent era, image fusion.
Image fusion is the process of combining complementary information from different source images into a single image without loss of information and without false information (Image Fusion, 2014). During the process, image data of the same dimension is registered for convenience in the fusion process and in post-processing analysis. The fusion process is broadly classified into spatial and transform-based techniques, each of which comprises different fusion methods that are widely discussed in (James and Dasarathy, 2014; Mitchell, 2010). Fusion plays a vital role in many applications, such as disease diagnosis and computer-assisted surgery using medical imaging, biometric authentication using bimodal acquisition and automatic target detection using remote sensing; among these, medical imaging is one of the prominent research areas of the recent era. In (Kavitha and Thyagharajan, 2014; Zhang et al., 2015; Radhika and Kavitha, 2013; Singh and Khare, 2013; Rockinger, 2013; Rajkumar and Kavitha, 2010; Yong Yang et al., 2014), different image fusion methods related to pixel- and region-based approaches are discussed. Also, the suitable fusion process for a specific combination of datasets is identified from the benchmark datasets using subjective and objective measures (Brain Images Source, 2014b; Brain Images Source, 2014a). With this brief introduction, we move on to the metrics for analyzing the fused medical image, together with a short introduction to remote sensing images and their metrics; the fusion techniques themselves are outside the scope of this study. The fused image is analyzed from different perspectives, such as information, edge, contrast, shape, structure, correlation, noise and error, based on the requirements of users or applications.
The assessment of fused image is broadly divided into two categories: reference based (bivariate) and non reference (univariate) based assessment. In reference based assessment, a fused image is evaluated against the reference image which serves as a ground truth (Larson and Chandler, 2010). In assessment without reference images, the fused images are evaluated against the original source images for similarity and improvement in information.
Some of the important reference metrics reported in the literature are Wang's Image Quality Index (2002), the Root Mean Square Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Mean Absolute Error (MAE), the Structural Similarity Index (SSIM) proposed by Wang et al. (2005), Image Fidelity (IF) and the Average Gradient Index (AG) (Zhu, 2002). In Eskicioglu and Fisher (1995), correlation metrics are discussed for reconstructed images, and these are applicable to fusion as well. In medical imaging, the images acquired from different modalities or sensors for a patient at the same time are considered as reference images in validation.
The non-reference assessment measures are generally of two types. One type determines the image quality by extracting features from the fused image itself. Metrics such as standard deviation, entropy and SNR estimation, as given by Zhang and Blum (1999), fall under this category. The other type utilizes features extracted from both the fused image and the source images to determine the amount of useful information transferred from the source images to the fused image. Metrics such as Mutual Information (MI), Cross Entropy (CE), the objective edge-based measure (Xydeas and Petrovic, 2000), the Universal Index based measure (UI) proposed by Piella and Heijmans (2003), the Tsallis entropy (Cvejic et al., 2006) and the Renyi entropy measure put forward by Zheng et al. (2008) define the quality of the fused output using the source images as the reference images.
Apart from the classification discussed above, some authors have grouped the metrics according to their own requirements, as listed below:
• In Lin and Jay Kuo (2011), six metrics are defined under perceptual visual quality metrics (PVQMs) for the prediction of picture quality according to human perception. The metrics under this group are SSIM, VSNR, IFC, VIF, MSVD and PSNR.
• In , the metrics are classified into four groups: statistical feature related (SNR, PSNR, MSE), mutual information related (MI, FS, FF), correlation information related (NCC, NCIE) and information deviation measures.
• In Eskicioglu and Fisher (1995), metrics related to information, structure, correlation and error are listed for assessing the quality of the reconstructed image with respect to the source image.
A short introduction to the metrics related to remote sensing and colour images is given here for further reading in that domain. These methods can also be preferred for retaining the spectral information of medical images. In the remote sensing domain, the panchromatic (PAN) image and the multispectral (MS) image are the two modalities acquired and used for analysis; PAN has high spatial and low spectral resolution, whereas MS has high spectral and low spatial resolution. The fusion of a high spatial resolution PAN image with a high spectral resolution MS image is an important issue for many remote sensing and mapping applications (Chang, 2000; Vijayaraj et al., 2006). Generally, PAN sharpening algorithms are developed to improve the spatial resolution of the MS image while simultaneously retaining its spectral information. The spectral information of the fused image is validated using the following quantitative measures (Rahmani et al., 2010; Ranchin and Wald, 2000; Zhang, 2008):
• Spectral Angle Mapper (SAM): calculates the average change in angle of all spectral vectors.
• Spectral Information Divergence (SID): views each pixel spectrum as a random variable and then measures the discrepancy of probabilistic behaviours between spectra.
• Relative Average Spectral Error (RASE): characterizes the average performance of the image fusion method in the spectral bands.
• Correlation Coefficient (CC): calculates the spectral distortion by comparing the CC between the original multispectral bands and the bands of the final fused image.
• RMSE: the root mean square error between the fused image and the multispectral image.
• Relative dimensionless global error in synthesis (ERGAS): a normalized version of RMSE designed to calculate the spectral distortion.
Some other spectral measures rarely used in the literature are Spectral Distance Similarity (SDS), Pearson Spectral Correlation Similarity (SCS), Spectral Similarity Value (SSV), Modified Spectral Angle Similarity (MSAS) and the Constrained Energy Minimizing (CEM) technique. In Choi et al. (2014), the metrics are classified into two groups, namely spatial metrics (EN, UIQI, SNR, AG) and spectral metrics (CC, RASE, SAM, SID, ERGAS). The review of Jagalingam and Hegdeb (2015) discusses the measures related to PAN and MS. Gao et al. (2013) introduced a new contrast-based greyscale image quality measure named Root Mean Enhancement (RME). For colour images, they discuss the colour RME contrast measure (CRME), which explores the three-dimensional contrast relationships of the RGB colour channels, and Colour Quality Enhancement (CQE), a metric based on the linear combination of colourfulness, sharpness and contrast. These colour measures can be used to retain the tumor region in colour contrast for PET or SPECT images in the fusion process.
In practical applications, however, neither qualitative nor quantitative assessment alone satisfies the needs perfectly. Given the complexity of specific applications, an assessment paradigm combining both qualitative and quantitative assessment is most appropriate for achieving the best assessment result. The ultimate goal of this study on fused image quality evaluation is to list and explain the measures used in the literature from different perspectives, as motivation for developing a new measure that can be used consistently across many applications (Anusha and Kavitha, 2015). This section introduced some important metrics and their classifications from various authors' perspectives; each metric is discussed in detail in the following sections.
The objective of this study is to group the quantitative metrics of fused images under different categories, such as information, noise, error, correlation and structural similarity measures, for discussion and analysis. In addition, a short description of qualitative measures is given. Further, the importance of the metrics in statistical analysis is illustrated using fused brain images.

RESEARCH SURVEY
Qualitative (subjective) measures: Subjective evaluation by domain experts is still a commonly used method for measuring image quality. During a subjective test, the experts focus on the differences between the fused image and the original images and, while grading, note where information loss cannot be accepted. The measure in the subjective case is the Opinion Score (OS), with two kinds of rules, namely absolute and relative. The absolute categories are excellent, good, fair, bad and very bad, whereas the relative categories are best in the group, better than the average, average of the group, worse than the average and worst in the group. Each category is assigned a numeric value for ease of understanding. For images of special applications, a conclusion drawn by a professional carries more weight than one drawn by an amateur.
Quantitative (objective) metrics: Quantitative metrics are derived from statistical parameters based on the application or requirement. In this study, the metrics stated in the literature are collected and grouped into five categories based on their outcome: information, error, noise, correlation and structural similarity. Table 1 lists the metrics that measure the information in a single (fused) image and the metrics that measure the enhancement of the fused image over the source images. Table 2 lists the metrics of the remaining four groups (Xydeas and Petrovic, 2000; Naidu and Raol, 2008).
The metrics listed in Tables 1 and 2 are defined and discussed in the following subsections.
Information metrics: Information metrics are related to image information (texture) or information gained in the fused image when compared to the source images, in various aspects namely contrast, visibility, edge information, luminance etc.

Information Entropy (IE):
Information entropy is a statistical measure of randomness that can be used to characterize the texture of the image. It is measured using Eq. (1) and (2), where L is the number of grey levels. A higher value for the entropy signifies better information of the fused image.

Spatial Frequency (SF):
This frequency in the spatial domain indicates the overall activity level (row wise, column wise) of the fused image and it is calculated as shown in Eq. (3) to (5).

Standard deviation (SD):
SD measures the contrast of the image. This metric is more efficient in the absence of noise. An image with high contrast would have a high standard deviation, as specified in Eq. (6) and (7), where $h_{I_f}(i)$ is the normalized histogram of the fused image $I_f(x, y)$ and L is the number of frequency bins in the histogram.
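The three single-image measures above can be illustrated with a small dependency-free sketch. Because Eq. (1) to (7) are not reproduced here, the code assumes the commonly used textbook definitions: Shannon entropy of the grey-level histogram, spatial frequency as the root of the summed squared row and column differences, and the population standard deviation. The function names and the 2x2 test image are hypothetical, and images are assumed to be lists of rows of integer grey levels.

```python
import math
from collections import Counter


def information_entropy(img):
    """Shannon entropy (in bits) of the grey-level histogram of a 2-D image."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    # Normalized histogram: probability of each grey level that occurs.
    probs = [count / n for count in Counter(pixels).values()]
    return -sum(p * math.log2(p) for p in probs)


def spatial_frequency(img):
    """Combine row and column frequency into one activity-level value."""
    m, n = len(img), len(img[0])
    # Squared differences between horizontally adjacent pixels (row frequency).
    row_sq = sum((img[i][j] - img[i][j - 1]) ** 2
                 for i in range(m) for j in range(1, n))
    # Squared differences between vertically adjacent pixels (column frequency).
    col_sq = sum((img[i][j] - img[i - 1][j]) ** 2
                 for i in range(1, m) for j in range(n))
    return math.sqrt((row_sq + col_sq) / (m * n))


def standard_deviation(img):
    """Population standard deviation of the pixel intensities."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


checkerboard = [[0, 255], [255, 0]]  # hypothetical 2x2 test image
print(information_entropy(checkerboard))
print(spatial_frequency(checkerboard))
print(standard_deviation(checkerboard))
```

A flat image gives zero for all three measures, which matches the reading that higher values indicate more texture, activity and contrast.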

Image variance (VA):
The simplest focus measure is the variance of image gray levels. The higher the variance, the higher the image information. The computation of variance for an M × N image is given in Eq. (8), where µ is the mean value, given in Eq. (9).

Uniform parameter (d):
This measure computes the pixel intensity distribution of either a block or an image, represented as a two-dimensional array of pixels in which the pixel in the i-th row and j-th column is denoted by I(i, j). The uniform parameter d of an image is calculated as given in Eq. (10), where $\mu_k$ is the mean of the block. This measure is also used in the analysis of focused images.

Average gradient (AG):
This measure is related to the edge information content of an image. After the fusion process, it checks whether the edge information is retained, enhanced or lost when compared to the source images (Kim et al., 2010). The equation for measuring the average gradient is given in Eq. (11). The higher the average gradient, the more clear-cut the image.

Tenengrad sharpness measure (T):
This measure gives a higher value for images with sharp edges/regions, using the directional gradient parameters $F_x$ and $F_y$. The corresponding equation is given in Eq. (12), where m × n is the total number of pixels in F and x, y denote the directional gradient operations.
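As an illustration of the average gradient, the sketch below assumes one widely used form: the mean, over all pixels that have a right and a lower neighbour, of sqrt((Δx² + Δy²) / 2), where Δx and Δy are first differences. Eq. (11) is not reproduced here and may differ in normalization; the function name and the forward-difference choice are assumptions made for illustration only.

```python
import math


def average_gradient(img):
    """Mean local gradient magnitude of a 2-D image given as a list of rows."""
    m, n = len(img), len(img[0])
    total = 0.0
    for i in range(m - 1):
        for j in range(n - 1):
            dx = img[i][j + 1] - img[i][j]  # horizontal forward difference
            dy = img[i + 1][j] - img[i][j]  # vertical forward difference
            total += math.sqrt((dx * dx + dy * dy) / 2)
    return total / ((m - 1) * (n - 1))


ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # hypothetical horizontal ramp
print(average_gradient(ramp))
```

Comparing the value for the fused image against the values for the source images gives a simple check of whether edge content was retained or lost.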

Energy of image Gradient (EOG):
This measure is computationally similar to the average gradient and returns the overall edge information. A higher value of EOG indicates an image with better information. The corresponding formula is given in Eq. (13).

Energy of Laplacian of an image (EOL):
It is used for analyzing the high spatial frequencies associated with image sharpness, as specified in Eq. (14). This value should be high for an image of good quality.

Universal image quality index (Q):
This quality measure gives the overall quality of the image as a single value, as defined in Eq. (15). The Q measure has been modified to represent three factors, namely loss of correlation, luminance distortion and contrast distortion, between the fused image and the source images, as shown in Eq. (16). The overall range of this metric is -1 to +1. When the three factors are calculated separately, the range for luminance and contrast is 0 to 1. Here, x is the source image (image 1 or image 2) and y is the fused image.
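A minimal sketch of the universal image quality index follows, assuming the Wang-Bovik form Q = 4·σ_xy·x̄·ȳ / ((σ_x² + σ_y²)(x̄² + ȳ²)) computed once over the whole image. Published implementations usually evaluate it over sliding windows and average the results, so this global version is a simplification, and the function name is hypothetical.

```python
def universal_quality_index(x, y):
    """Global UIQI between two equally sized 2-D images (lists of rows)."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Unbiased variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in xs) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in ys) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)
    # Undefined (division by zero) if both images are constant or both means are zero.
    return 4 * cov_xy * mean_x * mean_y / ((var_x + var_y) * (mean_x ** 2 + mean_y ** 2))


source = [[1, 2], [3, 4]]    # hypothetical source image
inverted = [[4, 3], [2, 1]]  # same values in reverse order
print(universal_quality_index(source, source))
print(universal_quality_index(source, inverted))
```

The two calls exercise the ends of the stated range: an image compared with itself, and an image compared with its intensity-reversed copy.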
$Q^{AB/F}$ metric: This measure is related to edge information, was proposed by Xydeas and Petrovic (2000) and is given in Eq. (17). It returns a high value when the image has variations in edge strength and orientation. Here, $\lambda(w)$ is computed over a window; $c(w) = \max(\sigma^2_{I_1}, \sigma^2_{I_2})$ over a window; and $QI(I_1, I_f \mid w)$ is the quality index over a window for a given source image and fused image. The c(w) can also be computed from the saliencies of the source images.

Resembility (XSD):
This measures the approximation between the original image and the fused (reconstructed) image, as given in Eq. (20). A higher value indicates that a better approximation is obtained (Cosman et al., 1994). Here, f is the fused image and g is the input image.

Fidelity (BZD):
The approximation between the fused (reconstructed) image and the original image is given by BZD, as specified in Eq. (21). A higher value of BZD indicates a good approximation between the images (Sheikh et al., 2005). Here, f is the fused image and g is a source image.

Average Difference (AD):
This metric gives the average difference between the input and fused images. Its computation is given in Eq. (22).

Overall Cross Entropy (CE):
Cross entropy evaluates the similarity in information content between the input images and the fused image. Fused and input images containing the same information would have a comparatively low value. To find the overall cross entropy, take the average of the cross entropy between $I_1$ and $I_f$ and that between $I_2$ and $I_f$, as specified in the corresponding equation. The measure can also be calculated from pixel intensities, as given in Eq. (1).

Overall Mutual Information (MI): MI measures how much information is obtained after the fusion of the source images, that is, the degree of dependence of two images. It is measured using Eq. (24) or Eq. (25); a high MI value indicates a better fusion process.
If A and B are the registered images, then MI is defined by the following equation:

MI(A, B) = IE(A) + IE(B) - JE(A, B)
where IE(A) and IE(B) denote the information entropy of image A and image B, respectively, and JE(A, B) is the joint information entropy of the two images.
The MI can also be calculated using the histogram representation, as given in Eq. (25).
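The identity MI(A, B) = IE(A) + IE(B) - JE(A, B) stated above translates directly into code. The sketch below estimates each entropy from grey-level counts and the joint entropy from counts of co-located pixel pairs; the function names and the tiny test images are hypothetical, and real images would normally be binned into a fixed number of grey levels first.

```python
import math
from collections import Counter


def _entropy(symbols):
    """Shannon entropy (bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(a, b):
    """MI(A, B) = IE(A) + IE(B) - JE(A, B) for two registered 2-D images."""
    pa = [p for row in a for p in row]
    pb = [p for row in b for p in row]
    joint = list(zip(pa, pb))  # grey-level pairs at the same pixel position
    return _entropy(pa) + _entropy(pb) - _entropy(joint)


image_a = [[0, 1], [2, 3]]      # hypothetical image with four distinct levels
left_right = [[0, 0], [1, 1]]   # varies only by row
up_down = [[0, 1], [0, 1]]      # varies only by column
print(mutual_information(image_a, image_a))
print(mutual_information(left_right, up_down))
```

When an image is compared with itself, the joint entropy equals the marginal entropy, so MI reduces to IE(A); for the two statistically independent patterns the three terms cancel.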

UIQI with gradient (Q g ):
The quality index proposed by Wang and Bovik has proven very efficient for image fusion performance evaluation, as it considers three factors, namely correlation, luminance and contrast, which are crucial in image quality measurement. Besides these three factors, many studies show that in the Human Visual System (HVS), gradient (edge) information plays an important role when a human subject judges the quality of an image. For this reason, local gradient information was added to the UIQI metric by Blasch et al. (2008). The gradient information of an image is computed for each image pixel using an edge detection method (Sobel) and is denoted as g. The new UIQI metric with gradient is given in Eq. (26). The measure for the resultant fused image F from the source images A and B is then computed using Q.
Summary: The information metrics related to the fused image and the metrics that compare the source images to the fused image are discussed. Basically, the term quality conveys information in terms of texture and edge. In some metrics, structure, spectral and correlation information are added along with texture information; these metrics can therefore also be classified in the structure or correlation group. For all these measures, the value calculated for the fused (reconstructed) image, or for the fused image relative to the source images, should be higher than that of the source images alone. The overall MI metric is represented as Fusion Factor (FF) by some authors, and Fusion Symmetry (FS) is used to check the degree of symmetry. From these two measures (FF and FS), the Fusion Index is proposed as the ratio of MI(A, F) to MI(B, F). Some authors classify the visual information measures, namely VIF, IFC, VSNR, MAD, etc., under this group (Chandler and Hemami, 2007; Sheikh and Bovik, 2006; Wang and Li, 2011).
Error metrics: Error-related measures are useful in finding the difference in various aspects, preferably between the reference image and the fused image. However, they can be used without a reference image as well. For better information, the computed error value should be low (Wang et al., 2004).

Root Mean Square Error (RMSE):
The RMSE portrays the average change in a pixel caused by image processing algorithms. It is the average sum of distortion in each pixel of the reconstructed fused image, as given in Eq. (29), where X(i, j) and Y(i, j) are the source and fused images. It is zero when the source and fused images are equal.
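A short sketch of the RMSE computation, assuming the usual definition: the square root of the mean squared pixel-wise difference between two equally sized images. The function name and the test values are hypothetical.

```python
import math


def rmse(x, y):
    """Root mean square error between two equally sized 2-D images."""
    diffs = [(a - b) ** 2
             for row_x, row_y in zip(x, y)
             for a, b in zip(row_x, row_y)]
    return math.sqrt(sum(diffs) / len(diffs))


reference = [[0, 0], [0, 0]]  # hypothetical reference image
fused = [[3, 4], [0, 0]]      # hypothetical fused image
print(rmse(reference, fused))
print(rmse(fused, fused))
```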
where, norm is the operator to compute the largest singular value.

Normalized Absolute Error (NAE):
Normalized absolute error is a measure of the difference between the reconstructed image and the original image, with a value of zero indicating a perfect fit. The corresponding equation is given in Eq. (34).
Summary: The error metrics are generally applied when reference images (ground truth) exist for analysis. Here, source images from different modalities are considered as reference images, and the fusion technique that yields the lowest error value is considered the suitable fusion technique for that dataset, since in medical imaging the outcome depends strongly on the dataset.
Noise metrics: Noise metrics are used to measure the artefacts generated through the fusion process.
Signal to Noise Ratio (SNR): The SNR of an image is defined as the ratio of the mean pixel value to the standard deviation of the pixel values; its formula is presented in Eq. (35):

SNR = Mean/Standard Deviation (35)
Peak Signal to Noise Ratio (PSNR): PSNR is the ratio between the maximum possible intensity value of the pixels and the power of the corrupting noise that affects the fidelity of the representation. It is measured using Eq. (36), where MSE is the Mean Square Error and I is the maximum possible pixel value. The signal in this case is the original data and the noise is the error introduced by fusion. PSNR is modified in Wang and Li (2011) and proposed as Information Content Weighted PSNR, as given in Eq. (37). In addition, Contrast Weighted PSNR (CTW-PSNR), Saliency Weighted PSNR (SW-PSNR) and Distortion Weighted PSNR (DW-PSNR) are discussed there, using seven benchmark datasets, along with their significance in applications.
Summary: The value of the noise metrics should be higher than that of the source images, which indicates that artefacts are suppressed. These measures are more suitable for signals than for images. At present, PSNR modifications are proposed and used on datasets from various domains (Sheikh and Bovik, 2006; Wang and Li, 2011).
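The two noise metrics of this section can be sketched as follows. SNR follows Eq. (35) as stated (mean divided by standard deviation). For PSNR, Eq. (36) is not reproduced here, so the code assumes the standard form 10·log10(I² / MSE); some papers use a different scaling. The function names, the `max_val` parameter and the test values are hypothetical.

```python
import math


def snr(img):
    """SNR = mean / standard deviation of the pixel values, as in Eq. (35)."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return mean / std  # undefined for a constant image (std == 0)


def psnr(reference, fused, max_val=255):
    """PSNR in dB, assuming the standard form 10 * log10(max_val^2 / MSE)."""
    diffs = [(a - b) ** 2
             for row_r, row_f in zip(reference, fused)
             for a, b in zip(row_r, row_f)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images: no noise introduced
    return 10 * math.log10(max_val ** 2 / mse)


print(snr([[1, 3]]))
print(psnr([[0, 0]], [[1, 1]], max_val=10))
```

Returning infinity for identical images is a design choice of this sketch; callers that aggregate PSNR over many image pairs may prefer to cap the value instead.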

Structural metrics:
These metrics measure the similarity of the source images to the fused image and play a vital role in human visual system analysis (Wang et al., 2004; Zhou, 2009).

Structural Similarity Index Metric (SSIM):
Structural similarity is a method for measuring the similarity between two images through their pixel intensities. It compares local patterns of pixel intensities that have been normalized for luminance and contrast, based on the universal index. The SSIM index returns a decimal value that typically lies between 0 and 1 (Wang et al., 2005), and its mathematical representation is given in Eq. (38), where $\mu_A$ and $\mu_B$ are the mean intensities, $\sigma_A$ and $\sigma_B$ are the standard deviations, $\sigma_{AB}$ is the covariance of A and B, and $C_1$ and $C_2$ are small constants. It defines the link between the structural information changes in images and the perceived distortions of the images. When $C_1 = C_2 = 0$, it corresponds to the universal image quality index (Q). The SSIM metric is related to structure, quality and the constant values. When it is measured for a window, it is termed FSM. In addition, SSIM has been modified in various ways and proposed as new metrics, namely Multiscale SSIM, Mean SSIM, Discrete Wavelet Transform SSIM, Complex Wavelet SSIM, Edge based SSIM, gradient based SSIM, Information content weighted SSIM and Contourlet SSIM. F-blind and P-blind are metrics related to pixel-based fusion, which are proposed and evaluated in (Wang et al., 2005; Wang and Li, 2011). An implementation of SSIM is available online (Zhou, 2009).
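To make the roles of the means, variances, covariance and constants concrete, the sketch below evaluates the commonly published SSIM expression, (2·µ_A·µ_B + C_1)(2·σ_AB + C_2) / ((µ_A² + µ_B² + C_1)(σ_A² + σ_B² + C_2)), once over the whole image. This is a simplification: reference implementations compute it over local windows and average the results. The defaults C_1 = (0.01·L)² and C_2 = (0.03·L)² with L = 255 are conventional choices, and the function and parameter names are hypothetical.

```python
def global_ssim(a, b, max_val=255, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized 2-D images."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    n = len(xs)
    mu_a, mu_b = sum(xs) / n, sum(ys) / n
    var_a = sum((x - mu_a) ** 2 for x in xs) / (n - 1)
    var_b = sum((y - mu_b) ** 2 for y in ys) / (n - 1)
    cov_ab = sum((x - mu_a) * (y - mu_b) for x, y in zip(xs, ys)) / (n - 1)
    # Small constants that stabilize the ratio when means or variances are near zero.
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov_ab + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


source = [[10, 20], [30, 40]]  # hypothetical source image
fused = [[12, 18], [33, 41]]   # hypothetical fused image
print(global_ssim(source, source))
print(global_ssim(source, fused))
print(global_ssim(source, fused, k1=0, k2=0))  # reduces to the universal index Q
```

Setting `k1 = k2 = 0` reproduces the relationship to the universal image quality index noted in the text, at the cost of a possible division by zero for flat images.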

Discrete Wavelet Transform SSIM (DWT-SSIM):
In DWT-SSIM (edge and gradient based SSIM), the third parameter of the basic SSIM, that is, structure, is altered, while the first two parameters (luminance and correlation) remain the same (Yang et al., 2008b). Here, f stands for the fused image, while a and b are the inputs. Initially, the FSSIM between the fused image and each input image is computed, and then the summation and average are taken to derive the final value. A larger value means that stronger features from the input images are detected in the fused image.

P-blind metric:
It is a feature-based metric using SSIM without reference images. Initially, the phase congruency maps of the input and fused images are calculated. A third feature map Mpc is derived by point-by-point maximum selection of the two input maps Apc and Bpc, retaining the larger feature points in Mpc. The evaluation index P-blind is the average over all the blocks; the procedure is shown diagrammatically as a flowchart in Liu et al. (2008) for further reference.

Fusion Similarity Metric (FSM):
It considers the similarity between the source and fused image blocks at the same spatial position, as given in Eq. (48) and (49).

Summary:
The nine modifications of the basic SSIM, which address quality with different parameters, are explained in this section along with their references. Structure and quality play an important role in the diagnosis of tumor grade, whereas luminance and correlation help to diagnose growth over time.
Correlation metrics: Correlation metrics are used to calculate the deviation of the fused image from the source images. The commonly used metrics are listed below; variations with specific constraints also exist. For example, Wei and Blum (2009) modified the correlation measure with weighted averaging as a constraint.

Correlation metric (CORR):
This measure shows the correlation between the source and fused images, as defined in Eq. (51). The ideal value is one, obtained when the source and fused images are exactly alike; the value falls below one as the dissimilarity increases.

Spearman's Rank Correlation Coefficient (SRCC):
SRCC is a nonparametric rank-based correlation metric, as defined in Eq. (57), where $d_i$ is the difference between the i-th image's ranks in the subjective and objective evaluations (VQEG, 2000). It is independent of any monotonic nonlinear mapping between subjective and objective scores.
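The SRCC can be sketched with its classical closed form, 1 - 6·Σd_i² / (N(N² - 1)), which is assumed here to match Eq. (57) and is valid only when there are no tied scores. The helper and function names and the score lists are hypothetical.

```python
def _ranks(values):
    """Rank positions (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rcc(subjective, objective):
    """Spearman rank correlation between subjective and objective scores."""
    n = len(subjective)
    rank_s, rank_o = _ranks(subjective), _ranks(objective)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_s, rank_o))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


opinion_scores = [1, 2, 3, 4, 5]            # hypothetical subjective scores
metric_values = [0.1, 0.4, 0.5, 0.8, 0.9]   # hypothetical objective scores
print(spearman_rcc(opinion_scores, metric_values))
print(spearman_rcc(opinion_scores, metric_values[::-1]))
```

Because only ranks enter the formula, any monotonic rescaling of the objective scores leaves the result unchanged, which is the independence property mentioned in the text.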

Kendall's Rank Correlation Coefficient (KRCC):
KRCC is also a nonparametric rank correlation metric given in , where $N_c$ and $N_d$ are the numbers of concordant and discordant pairs in the data set, respectively. The corresponding equation is given in Eq. (58).
Summary: In this section, seven correlation metrics are discussed. These metrics are used to find the deviation between the source images and the fused image. Hence, a higher value indicates better correlation between the images.
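For the KRCC described in this section, a minimal sketch counts concordant and discordant pairs directly and assumes the tau-a form (N_c - N_d) / (N(N - 1)/2), which may differ from Eq. (58) in how ties are handled. The function name and the score lists are hypothetical.

```python
from itertools import combinations


def kendall_rcc(subjective, objective):
    """Kendall rank correlation from concordant and discordant pair counts."""
    n = len(subjective)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both score lists order the pair the same way.
        sign = (subjective[i] - subjective[j]) * (objective[i] - objective[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (0.5 * n * (n - 1))


print(kendall_rcc([1, 2, 3], [1, 2, 3]))
print(kendall_rcc([1, 2, 3], [1, 3, 2]))
```

The pairwise loop is quadratic in the number of images, which is acceptable for the small benchmark sets typically used when validating a quality metric against opinion scores.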

Application of fusion metrics in brain image analysis:
In medical imaging, fusion methods are used more effectively for brain images, acquired from different modalities for a patient at the same time, than for images of other organs. The appropriate fusion method can be selected based on the constraints required for diagnosis, and the outcome is then analysed using the related metrics, as given in Table 3.

CONCLUSION AND RECOMMENDATIONS
The study of criteria for image quality evaluation is a meaningful but complicated task. The criteria are used to evaluate the fusion algorithm and also to guide its design. We have classified the quantitative image quality measures into five groups and explained the measures with their formulas. In addition, the modifications of the measures within a specific group are illustrated and cited. The constraints and appropriate metrics for the validation of fused brain images are summarized at the end, since this review is intended for validating fused images derived from multimodality medical imaging. A short introduction to remote sensing images is also given. As far as the subjective measures are concerned, they should be studied in depth and improved using expert opinion. At the same time, to devise a new quantitative measure or to modify an existing one, the fused image should be analyzed in relation to applications of medical imaging such as tumor diagnosis, computer-assisted surgery and treatment planning. Finally, the selection of metrics for brain imaging towards tumor analysis with different