Spiking Cortical Model Based Multimodal Medical Image Fusion by Combining Entropy Information with Weber Local Descriptor

Multimodal medical image fusion (MIF) plays an important role in clinical diagnosis and therapy. Existing MIF methods tend to introduce artifacts, lose image details, or produce low-contrast fused images. To address these problems, a novel spiking cortical model (SCM) based MIF method is proposed in this paper. The proposed method can generate high-quality fused images using a weighting fusion strategy based on the firing times of the SCM. In the weighting fusion scheme, the weight is determined by combining the entropy information of the pulse outputs of the SCM with the Weber local descriptor operating on the firing mapping images produced from the pulse outputs. Extensive experiments on multimodal medical images show that, compared with numerous state-of-the-art MIF methods, the proposed method preserves image details very well and effectively avoids the introduction of artifacts, and thus it significantly improves the quality of fused images in terms of human vision and objective evaluation criteria such as mutual information, edge preservation index, structural similarity based metric, fusion quality index, fusion similarity metric and standard deviation.


Introduction
With the development of medical imaging technology, various imaging modalities such as ultrasound (US) imaging, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET) and single-photon emission computed tomography (SPECT) are finding a range of applications in the diagnosis and assessment of medical conditions that affect the brain, breast, lungs, soft tissues, bones and so on [1]. Owing to the differences in imaging mechanisms and the high complexity of human histology, medical images of different modalities provide a variety of complementary information about the human body. For example, CT is well-suited for imaging dense structures like non-metallic implants and bones with relatively little distortion. Likewise, MRI can visualize pathological soft tissues better, whereas PET can measure the amount of metabolic activity at a site in the body. Multimodal medical image fusion (MIF) aims to integrate complementary information from multimodal images into a single new image to improve the understanding of the clinical information in a new space. Thus, MIF plays an important role in the diagnosis and treatment of diseases and has found wide clinical applications, such as US-MRI for prostate biopsy [1], PET-CT in lung cancer [2], MRI-PET in brain disease [3] and SPECT-CT in breast cancer [4].
Numerous image fusion algorithms have been proposed by working at pixel level, feature level or decision level. Among these methods, the pixel-level fusion scheme has been investigated most

Spiking Cortical Model
The spiking cortical model (SCM) is derived from several other visual cortex models such as Eckhorn's model [26,27]. The SCM has been specially designed for image processing applications. The structural model of the SCM is presented in Figure 1. As shown in Figure 1, each neuron N_{i,j} at (i,j) corresponds to one pixel in an input image, receiving its normalized intensity as the feeding input O_{i,j} and the local stimuli from its neighboring neurons as the linking input. The feeding input and the linking input are combined together as the internal activity F_{i,j} of N_{i,j}. The neuron N_{i,j} will fire and a pulse output Y_{i,j} will be generated if F_{i,j} exceeds a dynamic threshold Θ_{i,j}. The above process can be expressed by [40]:

F_{i,j}[n] = f F_{i,j}[n−1] + O_{i,j} ∑_{(k,l)} W_{i,j,k,l} Y_{k,l}[n−1] + O_{i,j}   (1)

Y_{i,j}[n] = 1, if F_{i,j}[n] > Θ_{i,j}[n−1]; 0, otherwise   (2)

Θ_{i,j}[n] = g Θ_{i,j}[n−1] + h Y_{i,j}[n]   (3)

where f and g are decay constants less than 1; h is a scalar of large value; n denotes the number of iterations (1 ≤ n ≤ N_max, where N_max is the maximum number of iterations); and W_{i,j,k,l} is the synaptic weight between N_{i,j} and its linking neuron N_{k,l}, defined as:

W_{i,j,k,l} = 1 / ((i − k)² + (j − l)²), (k, l) ≠ (i, j)   (4)
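To make the recursion described above concrete, the following sketch runs the SCM on a normalized image. The 3 × 3 inverse-distance synaptic weight matrix, the zero padding at the borders and the initial threshold value are implementation assumptions, not details fixed by the text.

```python
import numpy as np

def scm_pulses(S, f=0.9, g=0.3, h=20.0, n_max=20):
    """Run the SCM on a 2-D image S normalized to [0, 1] and return the
    binary pulse images Y[n] for n = 1..n_max.

    The 3x3 synaptic weights W (inverse-distance) and the initial
    threshold of 1 are assumptions; the update follows the F/Y/Theta
    recursion described in the text.
    """
    W = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])
    F = np.zeros_like(S, dtype=float)       # internal activity F_ij
    Theta = np.ones_like(S, dtype=float)    # dynamic threshold Theta_ij
    Y = np.zeros_like(S, dtype=float)       # pulse output Y_ij
    pulses = []
    for _ in range(n_max):
        P = np.pad(Y, 1)                    # zero-pad so border pixels have neighbors
        link = sum(W[a, b] * P[a:a + S.shape[0], b:b + S.shape[1]]
                   for a in range(3) for b in range(3))
        F = f * F + S * link + S            # feeding input plus linking input
        Y = (F > Theta).astype(float)       # neuron fires when F exceeds Theta
        Theta = g * Theta + h * Y           # threshold decays, then jumps on firing
        pulses.append(Y.copy())
    return pulses
```

Summing the returned pulse images over n yields the firing times used later to build the firing mapping images.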

Through iterative computation, the SCM neurons output a temporal series of binary pulse images. The temporal series contains much useful information about the input image. To explain this point better, Figure 2 shows the temporal series produced by the SCM with f = 0.9, g = 0.3, h = 20 and N_max = 7 operating on an input MR image shown in Figure 2a. In Figure 2, we can see that during the various iterations the output binary images contain different image information, and the outputs of the SCM typically represent such important information as the segments and edges of the input image. The observation from Figure 2 indicates that the SCM can describe human visual perception. Therefore, the pulse outputs of the SCM can be utilized for image fusion.

SCM Based Image Fusion
The weighting fusion framework of the proposed SCM-M method is given in Figure 3. The key components of this method include the fusion rule and the weight computation. In the proposed method, the fusion rule is established based on the firing times of pulse outputs generated by the SCM. The weight is computed based on the similarity between the two source images, which is determined by combining the entropy information of pulse outputs from the SCM with the WLD operating on the resultant firing mapping image (FMI).




Fusion Rule
The two source images, A and B, are normalized and fed into two SCMs as the external stimulus. By running the SCMs for N_max times, we obtain the firing times T^A_{i,j} and T^B_{i,j} for each pixel at (i,j) in the source images as:

T^A_{i,j} = ∑_{n=1}^{N_max} Y^A_{i,j}[n]   (5)

T^B_{i,j} = ∑_{n=1}^{N_max} Y^B_{i,j}[n]   (6)

Based on the firing times T^A_{i,j} and T^B_{i,j}, two FMIs T^A and T^B will be produced. From Equations (5) and (6), we can see that the FMI is actually the sum of the temporal series of pulse outputs produced by the SCM.

Figure 4 shows the FMIs generated by the SCM with f = 0.9, g = 0.3, h = 20 and different N_max operating on a pair of MR images. It should be noted that here the FMIs have been scaled linearly to fit the range [0, 255]. In Figure 4, we can see that the FMI provides a means of representing the information of source images. The representation ability is related to the parameters of the SCM, especially N_max. A too small N_max (e.g., N_max = 10) or a too large one (e.g., N_max = 50) tends to cause the loss of important image details, as shown in Figure 4b,d. A proper N_max for the SCM facilitates representing the image details of the source images very well, which is of great significance for medical image fusion.
For each pixel at (i,j) in the two FMIs T^A and T^B, two image patches of size (2L_p + 1) × (2L_p + 1) centered at this pixel will be considered in order to represent the local statistical characteristics, which are more advantageous for the effective fusion of source images than the characteristics of individual pixels.
The statistical characteristics of the two image patches will be characterized by the local energies E^A_{i,j} and E^B_{i,j} defined as:

E^A_{i,j} = ∑_{k=−L_p}^{L_p} ∑_{l=−L_p}^{L_p} (T^A_{i+k,j+l})²   (7)

E^B_{i,j} = ∑_{k=−L_p}^{L_p} ∑_{l=−L_p}^{L_p} (T^B_{i+k,j+l})²   (8)

According to the relationship between E^A_{i,j} and E^B_{i,j}, the following fusion rule will be established, and correspondingly the intensity of the pixel at (i,j) in the fused image U will be expressed as:

U_{i,j} = ω_{i,j} I^A_{i,j} + (1 − ω_{i,j}) I^B_{i,j}, if E^A_{i,j} ≥ E^B_{i,j}; U_{i,j} = (1 − ω_{i,j}) I^A_{i,j} + ω_{i,j} I^B_{i,j}, otherwise   (9)

where I^A_{i,j} and I^B_{i,j} denote the pixel intensity at (i,j) in the source images A and B, respectively, and ω_{i,j} denotes the weight, which has an important influence on the quality of the fused image because it determines the contribution of the source images to the fused result. To ensure a good image fusion effect, ω_{i,j} will generally take a relatively big value to highlight the contribution of I^A_{i,j}, and otherwise it will take a relatively small value to underscore the contribution of I^B_{i,j}.
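As a concrete illustration of this fusion rule, the sketch below builds an FMI as the sum of pulse images, computes a patch-wise local energy (the sum of squared FMI values over the patch, an assumed but common definition), and applies the energy-gated weighted sum; the weight w is left as a free parameter here.

```python
import numpy as np

def firing_map(pulses):
    # FMI: the per-pixel sum of the temporal series of pulse outputs
    return np.sum(pulses, axis=0)

def local_energy(T, Lp=1):
    # sum of squared FMI values over the (2Lp+1) x (2Lp+1) patch; this
    # specific definition of E_ij is an assumption
    k = 2 * Lp + 1
    P = np.pad(np.asarray(T, dtype=float) ** 2, Lp)
    return sum(P[a:a + T.shape[0], b:b + T.shape[1]]
               for a in range(k) for b in range(k))

def weighted_fusion(IA, IB, EA, EB, w):
    # the source whose patch has the larger local energy receives the weight w
    return np.where(EA >= EB, w * IA + (1 - w) * IB,
                    (1 - w) * IA + w * IB)
```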

Weight Computation
To obtain the suitable weight ω i,j , the local neighborhood (i.e., image patch) Q i,j of size (2L p + 1) × (2L p + 1) centered at (i,j) in any source image will be considered. It is desirable to compute this weight based on the similarity between two image patches Q A i,j and Q B i,j . To determine this similarity effectively, the gray-level information and the saliency of Q i,j will be utilized simultaneously.
Here, the entropy of pulse outputs and the Weber local descriptor proposed in [42] will be adopted to characterize the gray-level distribution of Q i,j and its saliency, respectively.

Similarity Computation Based on the Entropy Information
For any pixel at (i,j) in each pulse image at the n-th iteration, the image patch G_{i,j}[n] of size (2L_p + 1) × (2L_p + 1) centered at this pixel will be considered. Let P^1_{i,j}[n] and P^0_{i,j}[n] denote the probability of the 1's and 0's in G_{i,j}[n], respectively. The Shannon entropy H_{i,j}[n] of G_{i,j}[n] is defined as:

H_{i,j}[n] = −P^1_{i,j}[n] log₂ P^1_{i,j}[n] − P^0_{i,j}[n] log₂ P^0_{i,j}[n]   (10)

Here, the probability P^1_{i,j}[n] is defined as:

P^1_{i,j}[n] = K_{i,j}[n] / (2L_p + 1)²   (11)

where K_{i,j}[n] denotes the number of 1's in G_{i,j}[n]. The Shannon entropies from the various iterations will form the feature vector V_{i,j} = [H_{i,j}[1], H_{i,j}[2], ..., H_{i,j}[N_max]]. From Equation (10), we can see that if the image patch Q_{i,j} in any source image is a homogeneous region, the Shannon entropy H_{i,j}[n] for all the iterations will be zero because P^1_{i,j}[n] = 1 or P^0_{i,j}[n] = 1, thereby producing a zero vector V_{i,j}. Otherwise, because H_{i,j}[n] will not be zero for some iterations, V_{i,j} will include nonzero elements, whose values will depend on the gray-level distribution of Q_{i,j}. The above analysis indicates that the gray-level information of Q_{i,j} can be characterized by V_{i,j}. Accordingly, V_{i,j} can be considered as the feature extracted from Q_{i,j}.
The difference D_{i,j} between the features of two image patches Q^A_{i,j} and Q^B_{i,j} is calculated as:

D_{i,j} = ||V^A_{i,j} − V^B_{i,j}||₂   (12)

where ||·||₂ denotes the Euclidean norm. Based on the difference D_{i,j}, the similarity S^En_{i,j} between Q^A_{i,j} and Q^B_{i,j} based on the entropy information will be defined by Equation (13), where CS₁ is a constant.
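The entropy feature and its distance can be sketched as follows; the per-pixel patch entropy is computed with a sliding-window count of 1's, and edge padding at the image borders is an implementation assumption (the final mapping from D to the similarity S^En is not reproduced here).

```python
import numpy as np

def patch_entropy(pulse, Lp=1):
    """Per-pixel Shannon entropy of the 1's and 0's in the
    (2Lp+1) x (2Lp+1) patch of one binary pulse image.
    Edge padding at the borders is an assumption."""
    k = 2 * Lp + 1
    P = np.pad(np.asarray(pulse, dtype=float), Lp, mode='edge')
    ones = sum(P[a:a + pulse.shape[0], b:b + pulse.shape[1]]
               for a in range(k) for b in range(k))
    p1 = ones / (k * k)                     # probability of 1's in the patch
    p0 = 1.0 - p1
    with np.errstate(divide='ignore', invalid='ignore'):
        H = -(np.where(p1 > 0, p1 * np.log2(p1), 0.0)
              + np.where(p0 > 0, p0 * np.log2(p0), 0.0))
    return H

def entropy_distance(pulses_a, pulses_b, Lp=1):
    # stack H_ij[n] over all iterations into the feature vectors V_ij and
    # take the per-pixel Euclidean distance D_ij between them
    VA = np.stack([patch_entropy(p, Lp) for p in pulses_a])
    VB = np.stack([patch_entropy(p, Lp) for p in pulses_b])
    return np.sqrt(np.sum((VA - VB) ** 2, axis=0))
```

As the text notes, a homogeneous patch yields zero entropy at every iteration and hence a zero feature vector.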

Similarity Computation Based on the WLD
In this paper, the WLD is adopted to extract the salient features of an image patch of interest in the firing mapping image, which can be utilized to represent the saliency of the image patch Q_{i,j} in any source image. The WLD is chosen for its high computational efficiency and its excellent ability to find local salient patterns within an image, simulating the pattern perception of human beings [42]. Indeed, there are many other sparse and dense descriptors, such as the scale invariant feature transform (SIFT) and the local binary pattern (LBP). Compared with the SIFT and the LBP, the WLD is computed around a relatively small square region and it extracts the local salient patterns by means of the differential excitation [42]. Therefore, the WLD can capture more local salient patterns than the SIFT and the LBP.
The computation of the WLD stems from Weber's Law, which states that the ratio of the increment threshold (i.e., a just noticeable difference) to the background intensity is a constant. For the current pixel at (i,j) in the FMI T^A or T^B, the difference R_{i,j} between this pixel and its neighbors in an image patch of size (2L_p + 1) × (2L_p + 1) is given by:

R_{i,j} = ∑_{k=−L_p}^{L_p} ∑_{l=−L_p}^{L_p} (T_{i+k,j+l} − T_{i,j})   (14)

It can be seen from Equation (14) that the computation of R_{i,j} is very similar to the Laplacian operation. Following the hints in Weber's Law, the differential excitation ξ_{i,j} of the current pixel for the WLD is computed as [42]:

ξ_{i,j} = arctan(R_{i,j} / T_{i,j})   (15)

where the arctangent function is used to prevent the output from increasing or decreasing too quickly when the input becomes larger or smaller [42].

As discussed in [42], the WLD can indicate the saliency of the local neighborhood very well because of its powerful ability to represent such important features as edges and textures. To explain this point better, Figure 5 shows the results of the WLD operating on the FMIs shown in Figure 4c,d,g,h. The comparison between Figures 4 and 5 shows that both the strong and weak edges in Figure 4 have become more salient in the results of the WLD than in the FMIs. Therefore, the WLD operating on the FMIs can bring out the local image structural features of the source images very well, which is highly beneficial for medical diagnosis based on different imaging modalities. It is desirable to utilize these extracted salient image features to determine the similarity S^WLD_{i,j} between Q^A_{i,j} and Q^B_{i,j} in the source images via Equation (16), where CS₂ is a constant.
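The differential excitation of Equations (14) and (15) can be sketched as follows; the edge padding at the borders and the small eps guarding division by zero are implementation assumptions.

```python
import numpy as np

def differential_excitation(T, Lp=1, eps=1e-6):
    """WLD differential excitation of an FMI.

    R_ij sums the differences between each neighbor in the
    (2Lp+1) x (2Lp+1) patch and the center pixel (a Laplacian-like
    operation); the arctangent then bounds the Weber ratio R/I.
    """
    k = 2 * Lp + 1
    Tf = np.asarray(T, dtype=float)
    P = np.pad(Tf, Lp, mode='edge')
    R = sum(P[a:a + Tf.shape[0], b:b + Tf.shape[1]] - Tf
            for a in range(k) for b in range(k))
    return np.arctan(R / (Tf + eps))
```

On a constant region the excitation is zero, while edges and textures produce large (but bounded) responses, which is why the WLD highlights the salient structures of the FMIs.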

Weight Determining Based on the Combined Similarity
By combining S^En_{i,j} and S^WLD_{i,j}, the weight ω_{i,j} in Equation (9) will be determined as given in Equation (17). From Equation (17), we can see that if the two image patches centered at (i,j) in the source images have the same local structure, which means the same WLD, ω_{i,j} will be determined by S^En_{i,j}, which is related to the intensity distributions of the image patches in the source images. Likewise, ω_{i,j} will be determined by S^WLD_{i,j}, which is related to the local image structure, if the two image patches have the same gray-level distributions. The above analysis shows that ω_{i,j} can represent the similarity between two image patches effectively by combining their gray-level information with their local structure.

It should be noted that ω_{i,j} could also be computed using such non-Euclidean similarity measures as the cosine distance and the Pearson correlation, which is scale and translation invariant. When the Pearson correlation is used to measure the similarity between the Shannon entropy feature vectors of two considered image patches, it can address scale and translation changes of the feature vectors. However, this correlation requires that the variables follow a bivariate normal distribution. The possibility of utilizing non-Euclidean similarity measures for the similarity computation in the weighting fusion strategy will be explored in depth in future work.


Implementation of the SCM-M Method
The implementation of the proposed SCM-M method can be summarized as the following steps:
(1) The two source images A and B are input into two SCMs. After running the SCMs for N_max times, the series of binary pulse images will be obtained for the source images using Equations (1)-(4).
(2) For each pixel at (i,j) in A and B, the Shannon entropy from the various iterations is computed on the output pulse images using Equation (10) to generate two feature vectors V^A_{i,j} and V^B_{i,j}. Based on the difference between the two feature vectors, the similarity S^En_{i,j} between the two image patches Q^A_{i,j} and Q^B_{i,j} centered at (i,j) is computed using Equation (13).
(3) The output pulse images are utilized to generate the firing mapping images for the two source images. For any pixel at (i,j) in the two FMIs, the local energies E^A_{i,j} and E^B_{i,j} are computed on the two image patches centered at this pixel using Equations (7) and (8), respectively. Meanwhile, the WLD is computed for the two image patches using Equation (15) to determine the similarity S^WLD_{i,j} between Q^A_{i,j} and Q^B_{i,j} using Equation (16).
(4) The weight ω_{i,j} is determined by S^En_{i,j} and S^WLD_{i,j} using Equation (17).
(5) According to the relationship between E^A_{i,j} and E^B_{i,j}, the fused image is produced by the weighted sum of the two source images using Equation (9).
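The steps above can be put together in a minimal end-to-end sketch. To keep it self-contained, the entropy/WLD-based weight of steps (2)-(4) is replaced by a fixed placeholder weight w, and the 3 × 3 synaptic weights, padding and local-energy definition are assumptions; this illustrates only the overall flow, not the full SCM-M weighting.

```python
import numpy as np

def _scm(S, f=0.9, g=0.3, h=20.0, n_max=20):
    # step (1): SCM iteration; 3x3 inverse-distance weights are an assumption
    W = np.array([[0.5, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.5]])
    F = np.zeros_like(S, dtype=float)
    Theta = np.ones_like(S, dtype=float)
    Y = np.zeros_like(S, dtype=float)
    pulses = []
    for _ in range(n_max):
        P = np.pad(Y, 1)
        link = sum(W[a, b] * P[a:a + S.shape[0], b:b + S.shape[1]]
                   for a in range(3) for b in range(3))
        F = f * F + S * link + S
        Y = (F > Theta).astype(float)
        Theta = g * Theta + h * Y
        pulses.append(Y)
    return pulses

def _box_sum(X, Lp=1):
    k = 2 * Lp + 1
    P = np.pad(X, Lp)
    return sum(P[a:a + X.shape[0], b:b + X.shape[1]]
               for a in range(k) for b in range(k))

def fuse_images(A, B, n_max=20, Lp=1, w=0.7):
    """End-to-end sketch of the SCM-M pipeline with a placeholder weight."""
    TA = np.sum(_scm(A, n_max=n_max), axis=0)   # FMI of A, step (3)
    TB = np.sum(_scm(B, n_max=n_max), axis=0)   # FMI of B
    EA = _box_sum(TA ** 2, Lp)                  # assumed local energy
    EB = _box_sum(TB ** 2, Lp)
    # step (5): the source with larger local energy receives the weight w
    return np.where(EA >= EB, w * A + (1 - w) * B, (1 - w) * A + w * B)
```

Since w lies in [0, 1], each fused pixel is a convex combination of the two source pixels, so the result always stays between them.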

Experimental Results and Discussions
To demonstrate the effectiveness of the proposed SCM-M method, extensive experiments have been done on eight groups of CT and MR images shown in Figure 6. All the images are chosen from the website [43]. Each image is of size 256 × 256, and the two images in each pair contain complementary information. Here, Groups 1-3 are three pairs of CT and MR images of different regions in the brain of a patient with acute stroke. Group 4 includes transaxial MR images of the normal brain. Groups 5 and 6 are MR images of the brains of patients with vascular dementia and AIDS dementia, respectively. Groups 7 and 8 are two pairs of CT and MR images of the brains of patients with cerebral toxoplasmosis and fatal stroke, respectively. Please note that intensity standardization and inhomogeneity correction have been performed on all MR images by the above website.

To verify the advantage of the SCM-M method, it has been compared with the discrete wavelet transform (DWT), the NSCT [21], the combination of NSCT and sparse representation (NSCT-SR) [21], m-PCNN [29], PCNN-NSCT [32] and SCM-F [41] based fusion methods. The code of the DWT and PCNN-NSCT methods is available on the websites [44,45], respectively. The code of the NSCT and NSCT-SR methods can be found on the website [46].

Parameter Settings
For the DWT fusion method, the wavelet and the number of decomposition levels are chosen to be DBSS(2,2) and 4, respectively. For the m-PCNN method and the SCM-F method, all the involved parameters are chosen as suggested in [29,41], respectively. For the NSCT and NSCT-SR methods, we have used the "pyrexc" as the pyramid filter and the "vk" as the directional filter. The number of directions of the four decomposition levels from coarse to fine is selected as 2, 3, 3, 4, respectively. For the PCNN-NSCT method, we have chosen the decay constant α L = 0.5, the linking strength β = 3 and the amplitude gain V θ = 0.5 in the PCNN while keeping other parameters to be the same as those in [32]. As regards the proposed method, we have fixed f = 0.9, g = 0.3, h = 20, L p = 1, CS 1 = 3 × N max , CS 2 = 5 × π and chosen N max to be close to 20.


Visual Comparisons of Fused Results
By comparison, the SCM-M method not only provides high contrast for the fused images, but also maintains important information from the various source images in the fused results effectively. In particular, the proposed method can preserve fine image details very well, as shown by the red boxes in Figures 9g and 10g, without introducing artifacts or leading to edge blurring. The above comparisons demonstrate the superiority of the SCM-M method over the other compared methods in that the fused images obtained by this method are clearer, more informative, and have higher contrast.

To further verify the advantage of the proposed SCM-M method in multimodal image fusion, Figures 11 and 12 show the enlarged views of the fused results for all evaluated methods operating on regions of interest (ROIs) denoted by the red boxes in Groups 1 and 6 in Figure 6, respectively. Figure 13 shows the enlarged views of the fused results for the proposed method, the m-PCNN method and the SCM-F method operating on ROIs denoted by the red boxes in Groups 7 and 8 shown in Figure 6. In Figures 11 and 12, we can see that the SCM-M method can maintain the salient information in the source images and provide better visual perception with less loss in luminance or contrast than the other compared methods. To explain this point better, some edges and regions have been chosen from Figures 11g and 12g. It can be seen from Figure 11 that the SCM-M method provides better edge preservation than all other methods, as pointed out by the three red arrows. Meanwhile, compared with the DWT, NSCT and NSCT-SR methods, the SCM-M method maintains the information in the MR image shown in Figure 6f better without introducing artifacts, as indicated by the two red boxes. In Figure 12, we can see that the proposed method keeps the integrity of the edge marked by the red arrow best among all evaluated methods. Likewise, as pointed out by the green arrow, the edge is preserved very well by the proposed method while it has been damaged seriously by the other methods. Besides, the sharpness of the region shown by the red box is maintained better by the proposed method than by the compared methods. Furthermore, it can be seen in Figure 13 that, compared with the m-PCNN and SCM-F methods, the SCM-M method preserves fine image details and maintains image contrast better.

Quantitative Comparison of Fused Results
The performance of the compared methods is assessed in terms of quantitative indexes including mutual information (Q_M) [47], edge preservation index (Q_E) [48], structural similarity (SSIM) [49] based metric (Q_S) [50], fusion quality index (Q_L) [51], fusion similarity metric (Q_T) [52] and standard deviation (STD). Higher values for these indexes indicate better fusion results.
(1) Q_M
The metric Q_M reflects the total amount of information that the fused image contains about the two source images, and it is defined as:

Q_M = MI(A, U) + MI(B, U)

where MI(A, U) = ∑_{a,u} p(a, u) log₂ [p(a, u) / (p(a) p(u))] and MI(B, U) is defined analogously; p(a, u) and p(b, u) are the discrete joint probabilities, and p(a), p(b) and p(u) are the marginal discrete probabilities of A, B and U, obtained by summing the joint probabilities over the remaining variable.
(2) Q_E
The metric Q_E measures the similarity between the edges transferred during the fusion process, and it is defined as:

Q_E = [∑_{m=1}^{Y} ∑_{n=1}^{Z} (Q^{A,U}_{m,n} w^A_{m,n} + Q^{B,U}_{m,n} w^B_{m,n})] / [∑_{m=1}^{Y} ∑_{n=1}^{Z} (w^A_{m,n} + w^B_{m,n})]

where Q^{∗,U}_{m,n} = Q^{∗,U}_{g,m,n} Q^{∗,U}_{α,m,n}, with Q^{∗,U}_{g,m,n} and Q^{∗,U}_{α,m,n} denoting the edge strength and orientation preservation values at (m, n) for the image A or B, respectively; w^A_{m,n} and w^B_{m,n} denote the weights for Q^{A,U}_{m,n} and Q^{B,U}_{m,n}, respectively; and Y and Z are the width and the height of U, respectively.
(3) Q_S
The metric Q_S employs the local SSIM between the source images as a match measure, according to which different operations are applied to the evaluation of different local regions [50]. In its definition, w_s is a sliding window; λ(w_s) is the local weight; W_s is the family of all sliding windows and |W_s| is its cardinality; SSIM(A, U|w_s) is a measure of the similarity between the sliding window w_s in A and that in U, and similar definitions apply to SSIM(B, U|w_s) and SSIM(A, B|w_s).
For the computation of Q_S, all the involved parameters are kept the same as those in [50], except that the two constants C₁ and C₂ are chosen to be 2 × 10⁻⁶ for the computation of the SSIM.
(4) Q_L
The fusion quality index Q_L is computed on the edge images of A, B and U. The term Q(A, B, U|w_s) is defined with C(w_s) representing the overall saliency of the sliding window w_s, chosen as C(w_s) = max(s(A|w_s) + s_max, s(B|w_s) + s_max), where s(A|w_s) and s(B|w_s) denote the variance of the window w_s in the image A and that in the image B, respectively, and s_max denotes the difference between the maximum variance of all sliding windows in A and that in B.
(5) Q_T

The fusion similarity metric Q_T is computed as

Q_T = \frac{1}{|W_s|} \sum_{w_s \in W_s} \left[ \mathrm{sim}(A, B, U \mid w_s)\, Q(A, U \mid w_s) + \left(1 - \mathrm{sim}(A, B, U \mid w_s)\right) Q(B, U \mid w_s) \right],

where Q(A, U | w_s) and Q(B, U | w_s) are computed based on the universal image quality index [53], and sim(A, B, U | w_s) depends on the similarity in the spatial domain between the input images and the fused image; it is defined as a piecewise function presented in [52].

(6) STD
The metric STD measures the contrast of the fused image, and it is defined as

STD = \sqrt{ \frac{1}{YZ} \sum_{m=1}^{Y} \sum_{n=1}^{Z} \left( U(m, n) - \bar{U} \right)^2 },

where \bar{U} is the mean intensity of the fused image.

Tables 1-6 list Q_M, Q_E, Q_S, Q_L, Q_T and STD of the fused results for the seven evaluated methods operating on the six groups of multimodal medical images, respectively. In these tables, the "bold" value denotes the highest one for each metric. From these tables, we can see that among all evaluated methods, the DWT method produces the lowest Q_M values and the NSCT method provides the smallest Q_S and Q_T for all test images. For the majority of medical image pairs, the m-PCNN method provides the lowest Q_E and STD values, while the PCNN-NSCT method produces the lowest Q_L values. Compared with the other evaluated methods, the SCM-M method achieves higher values for all six metrics in all cases, except that it is outperformed by the NSCT and NSCT-SR methods in terms of Q_E for Groups 7 and 8 and by the SCM-F method in terms of Q_L for Group 6. These comparisons demonstrate the superiority of the proposed method over the compared fusion methods in maintaining the information of the source images, preserving the local image structure and image details, and ensuring the contrast of the fused image.

To further demonstrate the superiority of the proposed method over the other compared methods, paired t-tests have been performed based on the data in Tables 1-6. The test results are listed in Table 7. The p values in Table 7 indicate that there is a highly significant difference between the proposed method and the other evaluated methods in terms of mutual information, the structural similarity based metric, the fusion similarity metric and standard deviation. As regards the edge preservation index, there is no significant difference between the proposed method and the NSCT and NSCT-SR methods, but a significant difference remains between the proposed method and the DWT, PCNN-NSCT and SCM-F methods.
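For completeness, the STD metric above is simply the root-mean-square deviation of the fused image's intensities about their mean, e.g. in Python (the function name is ours):

```python
import numpy as np

def std_metric(u):
    """Contrast metric: standard deviation of the fused image's intensities."""
    u = np.asarray(u, dtype=float)
    return float(np.sqrt(np.mean((u - u.mean()) ** 2)))
```

Higher values indicate a wider spread of gray levels and hence a higher-contrast fused image.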
As for the fusion quality index, the difference between the SCM-M method and most of the other evaluated methods is also significant.

Here we briefly analyze why the proposed method generally outperforms the m-PCNN and SCM-F methods, which are similar to our method in their use of third-generation neural networks. For the m-PCNN method, the fused image is produced based on the internal activity, which is only related to the pulse output of the individual neuron. In other words, only the individual neurons corresponding to individual pixels in the two source images are used for image fusion, while the characteristics of the neighboring neurons of the considered neuron, which can help represent the local image structure, have not been considered. The same problem exists for the SCM-F method, in which only the firing times of individual neurons are utilized to generate the fused image using a simple choosing-and-averaging strategy. By comparison, in the proposed SCM-M method, the characteristics of the neurons in a local neighborhood are considered both for the construction of the fusion rule and for the determination of the fusion weight. As regards the fusion rule, the firing times of all the neurons in a neighborhood provide a more effective basis for establishing the fusion rule than those of an individual neuron. For the fusion weight, the WLD operating on the firing mapping image and the entropy information of the pulse outputs of the SCM are computed in a local neighborhood to produce the weight. Combining the WLD with the entropy information facilitates determining the fusion weight effectively, since together they describe the local image structure and the gray-level information of the source images very well.

Conclusions
A novel spiking cortical model based medical image fusion method has been presented in this paper. The proposed method utilizes the pulse outputs of the SCM to realize pixel-level image fusion. By combining the gray-level image information represented by the entropy of the pulse outputs with the local image structure represented by the Weber local descriptor operating on the firing mapping image, the proposed method realizes an effective weighted fusion of the source images. Extensive experiments on various CT and MR images demonstrate that the proposed method produces clearer, more informative and higher-contrast fused images than numerous existing methods in terms of human visual perception. Meanwhile, the objective comparison indicates that the proposed method outperforms the compared methods in terms of mutual information, the edge preservation index, the structural similarity based metric and standard deviation. Future work will focus on extending our method to the fusion of multi-spectral medical images such as PET and SPECT images.