Adopting Quaternion Wavelet Transform to Fuse Multi-Modal Medical Images

Medical image fusion plays an important role in clinical applications such as image-guided surgery, image-guided radiotherapy, noninvasive diagnosis, and treatment planning. In this paper, we propose a novel multi-modal medical image fusion method based on simplified pulse-coupled neural network and quaternion wavelet transform. The proposed fusion algorithm is capable of combining not only pairs of computed tomography (CT) and magnetic resonance (MR) images, but also pairs of CT and proton-density-weighted MR images, and multi-spectral MR images such as T1 and T2. Experiments on six pairs of multi-modal medical images are conducted to compare the proposed scheme with four existing methods. The performances of various methods are investigated using mutual information metrics and comprehensive fusion performance characterization (total fusion performance, fusion loss, and modified fusion artifacts criteria). The experimental results show that the proposed algorithm not only extracts more important visual information from source images, but also effectively avoids introducing artificial information into fused medical images. It significantly outperforms existing medical image fusion methods in terms of subjective performance and objective evaluation metrics.


Introduction
Various medical image modalities are available, including magnetic resonance imaging (MRI), computed tomography (CT), ultrasonography, magnetic resonance angiography (MRA), positron emission tomography (PET), single-photon emission CT (SPECT), and functional MRI (fMRI) [1]. These modalities provide different information about human organs and have their respective application ranges. For example, CT and MRI can show anatomical information of the viscera with high spatial resolution. Despite the low spatial resolution of PET, it can provide function information on the viscera's metabolism. There are some differences between CT and MRI. CT images can show dense structures such as bones and implants with less distortion, but they cannot be used to detect physiological changes. Whereas MRI can provide normal and pathological soft tissue information, but it cannot show information regarding bones. Hence, in clinical applications, a single modality is insufficient to provide doctors with sufficient information to diagnose a patient's condition. It is necessary to combine several imaging modalities into one image to obtain more accurate information. The physician can accurately diagnose a disease and choose an appropriate therapeutic schedule according to comprehensive information of diseased tissue or organs in fused images. Therefore, it is necessary to fuse multi-modal medical images [2,3]. Image fusion can be broadly defined as the process of combing multiple input images or some of their features into a single image without the introduction of distortion or loss of information [4]. Recently, a lot of medical image fusion methods based on multiscale geometry analysis have been proposed. Examples include medical image fusion algorithms based on wavelets [5], ripplet transform [6], contourlet transform [7], nonsubsampled contourlet transform (NSCT) [8], and shearlet transform [9]. The NSCT, as a fully shiftinvariant form of the contourlet transform, leads to better frequency selectivity and regularity. The shearlet transform forms a tight frame at various scales and directions, and is optimally sparse for representing images with edges. Although the NSCT and the shearlet transform are fully shiftinvariant, they have redundancies and thus fusion methods based on NSCT and Shearlet are slower than other multiscale decomposition (MSD) methods, such as those based on the contourlet and wavelet transforms. The contourlet transform offers flexible multi-resolution and multi-directional decomposition for images. The discrete wavelet transform (DWT) preserves different frequency information in stable form and allows good localization both in the time and spatial frequency domains. However, one of the major drawbacks of DWT and the contourlet transform is lack of shift invariance. Therefore, image fusion method [5][6][7] performance quickly deteriorates when there is slight object movement or when the source multi-modal images cannot be perfectly registered. The quaternion wavelet transform (QWT) [10], proposed by Corrochano, is a new multiscale analysis tool for capturing the geometry features of an image. The QWT coefficients have three phases and a magnitude [11]. The first two QWT phases encode the shift of image features in the horizontal and vertical coordinate system, respectively. The third phase encodes the edge orientation mixtures and texture information. Hence, the QWT is nearly shift-invariant. The QWT also inherits many other interesting and useful theoretical properties, such as the quaternion phase representation and symmetry properties. The QWT subbands, as the quaternion analytic signals, contain rich geometric information of source medial images. Furthermore, the QWT can be computed using a two-dimensional (2-D) dual-tree filter bank with linear computational complexity.
Except for the MSD tool, the fusion rule is regarded as the most important factor that observably influences fusion performance. Generally, the classic fusion rules include standard deviation, spatial frequency, and average gradient. Fusion rules based on principal component analysis methods lead to pixel distortion in fused multi-modal medical images [6]. In [12], a visibility feature method was proposed to fuse the quaternion wavelet coefficient of source medical images. However, the fused images had lower contrast because the visibility feature does not effectively select the correct coefficients. Yong [13] proposed the Log-Gabor energy as a fusion rule in the NSCT domain. Phase congruency and directive contrast were introduced into fusing NSCT coefficients of medical images [8]. In [14], the weighted sum-modified Laplacian and maximum local energy were utilized to select secondgeneration contourlet transform coefficients. Although these fusion rules produce high-quality images, they also lead to loss of information and pixel distortion due to the nonlinear operations of fusion rules. The pulse-coupled neural network (PCNN) [15,16], proposed by Eckhorn, is a kind of neural network model based on the visual principle of a cat. PCNN-based methods well conform to the human visual system (HVS) in image fusion.
To overcome the aforementioned disadvantage, an improved medical image fusion model is proposed here based on the PCNN in the QWT domain. After the MSD of QWT, the fusion rule based on the simplified PCNN is applied to high-frequency subbands. The rule uses the number of output pulses from the PCNN's neurons to select fusion coefficients. The subband coefficients corresponding to the most frequently spiking neurons of the PCNN are selected to recombine a new image. The low-frequency coefficients of source medical images with the max value are selected as the fused coefficients. Finally, an inverse QWT is applied to the new fused coefficients to reconstruct the fused image. Some experiments are performed to compare the proposed algorithm with other start-of-state medical image fusion methods. The experimental results demonstrate that the proposed fusion rule is more effective than these methods.

Quaterion Wavelet Transform
The QWT is an extension of the complex wavelet transform that provides a richer scale-space analysis for 2-D signal geometric structure. Compared with the traditional DWT, the QWT provides a magnitude-phases local analysis of images with a magnitude and three phases. It is nearly shift-invariant [17].

Concepts of Quaternion Algebra
In 1843, Hamilton invented quaternion algebra, which is a generalization of complex algebra. The quaternion algebra over R, denoted by H, is a four-dimensional algebra that is associative and non-commutative: where the orthogonal imaginary numbers i; j; k satisfy the following calculation rules: The polar representation for a quaternion is q ¼ q j je iu e jh e kw , where q j j denotes the magnitude and (u; h; w) are the three local phases. The computational formula is:

Quaternion Wavelet Transform
The quaternion analytic signal can be calculated from its partial Hilbert transform (H 1 and H 2 ) and total Hilbert transform (H T ) [17]: where impulse sheets along the x and y axes, respectively. ** denotes 2-D convolution. We start with mother wavelets w H ; w V ; w D and real separable scaling function u. With regard to a separable wavelet, w x; y According to the definition of the quaternionic analytic signal, the QWT can be constructed as follows: For a separable wavelet, w x; y ð Þ ¼ w h x ð Þw h y ð Þ. The 2-D Hilbert transform is equal to twice the one-dimensional (1-D) Hilbert transform along the rows and columns, respectively [18]. Suppose that there is a 1-D Hilbert transform pair of wavelets (w h ; w g ¼ Hw h ) and a 1-D Hilbert transform pair of scaling functions (u h ; u g ¼ Hu h ). Then, the 2-D analytic wavelets can be derived from Eq. (5) as the product form of the 1-D separate wavelet: Four real DWTs are adopted in the QWT. The first DWT is used to form the real part of the QWT. The other three DWTs obtained using the Hilbert transform can be applied to generate the three imaginary parts of the QWT. Each subband of the QWT can be seen as the analytic signal associated with a narrow-band part of an image. This QWT decomposition heavily depends on the position of the image with respect to the x and y axes (rotation-variance), and the wavelet is not isotropic. However, the advantage is an easy computation with separable filter banks. The QWT magnitude q j j is shift-invariant and represents features at any space position in each frequency subband. The three phase angles (u; h; w) describe the 'structure' of those features.

Pulse-Coupled Neural Network
PCNN is a visual-cortex-inspired neural network characterized by the global coupling and pulse synchronization of neurons. PCNN, a recently developed artificial neural network model [19], has been efficiently applied to image processing in applications such as image segmentation, image restoration, and image recognition [20]. Furthermore, it has been observed that PCNN-based image fusion methods outperform conventional image fusion [21,22]. Since PCNN simulates the biological activity of the HVS, the fused image has a more natural visual appearance and can satisfy the requirements of the HVS. PCNN is a singlelayer, 2-D, laterally connected neural network of pulsecoupled neurons. PCNN is a kind of feedback network that consists of many PCNN neurons. The PCNN neuron model has three compartments: receptive field, modulation field, and pulse generator. Figure 1 shows the structure of a PCNN neuron. The neuron receives the input signals from feeding and linking inputs. Feeding input is the primary input from the neuron's receptive area. The neuron receptive area consists of the neighboring pixels of corresponding pixel in the input image. Linking input is the secondary input of lateral connections with neighboring neurons [23]. The difference between these inputs is that

Receptive field
Modulation field Pulse generator Linking field Fig. 1 Basic model of single neuron in simplified PCNN the feeding connections have a slower characteristic response time constant than that of the linking connections. The mathematical model of PCNN can be described as Eq. (7). In the simplified PCNN model, the variables above satisfy the following mathematical model: where the indices u and v refer to a location in the source image I and t is the number of the current iteration. W ab u; v ð Þ denotes the synaptic gain strength and subscripts a and b refer to the dislocation in a symmetric neighborhood around one pixel in the PCNN. F t u; v ð Þ, the feeding input of the PCNN, is set to Iðu; vÞ. L t u; v ð Þ is the linking input. U t u; v ð Þ is the internal activity of a neuron, and h t u; v ð Þ is the dynamic threshold. Y t u; v ð Þ is the pulse output of a neuron (either 0 or 1). a L is the attenuation time constant of L t u; v ð Þ. b is the linking strength. V L denotes the inherent voltage potential of Y tÀ1 u; v ð Þ.
4 Proposed Scheme

General Medical Image Fusion Framework
Generally, medical image fusion algorithms can be divided into two categories: spatial domain methods and MSD domain methods. Spatial domain methods directly select pixels or regions from clear parts in the spatial domain to compose fused images. Fusion in the spatial domain leads to undesired side effects such as reduced contrast. MSDbased image fusion methods overcome these disadvantages because coefficients in subbands, not pixels or blocks in a spatial region, are considered as image details and selected as the coefficients of the fused image. Multi-modal medical image fusion methods based on MSD can be summarized as a general medical image fusion framework, illustrated in Fig. 2. The key step in these approaches is to apply a multiscale transform to each source image to produce lowand high-frequency coefficients. Then, different rules are adopted to effectively select the coefficients for the final merged coefficient for every subband (e.g., low-and highfrequency subbands). The inverse MSD is then applied to the fused coefficient to produce the fused medical image.
MSD and fusion rules are important to final fused image quality. In this paper, the QWT is used to decompose the source images. The MSD and its inverse in Fig. 2 are replaced with QWT and its inverse, respectively. Selecting or acquiring coefficients in the multiscale transform domain is vital to improve fusion performance apart from the multiscale transform for the fusion methods based on MSD. Different frequency coefficients can represent different features of source medical images. In this paper, the absolute-max-method is used to select the low-frequency coefficients and the PCNN-based method is used to fuse the high-frequency coefficients. The proposed image fusion rules are described in Sects. 4.2 and 4.3, respectively. The proposed fusion approach is then summarized.

Low-Frequency Subband Fusion Rule
The low-frequency band at a coarse resolution level represents the context information of the original image. Most information in source images is in the low-frequency band. The absolute-max method is adopted to fuse the low-frequency subbands. In the absolute-max method, frequency coefficients from the source images with bigger absolute values are chosen as the fused coefficients. The fusion process can be expressed as: where Q l;k;m;n u; v ð Þ is the nth quaternion coefficient at row u and column v in the subband indexed by scale l, direction k, and phase angle m. The superscripts F, A, and B represent the fusion image, source image A, and source image B, respectively. When l is set to 0, Q l;k;m;n u; v ð Þ represents the low-frequency coefficients of the QWT.

High-Frequency Subband Fusion Rule
The details of an image are mainly in the high-frequency coefficients. Therefore, it is important to find an appropriate decision map to merge the details of input images. The following PCNN fusion rule is effective for fusing high-frequency subbands of the QWT.
(1) Make Q A l;k;m;n ðu; vÞ and Q B l;k;m;n ðu; vÞ separate feeding inputs of two PCNNs. That is, F t ðu; vÞ is set to Q A l;k;m;n ðu; vÞ or Q B l;k;m;n ðu; vÞ in Eq. (7). Q A l;k;m;n ðu; vÞ and Q B l;k;m;n ðu; vÞ are separate nth quaternion coefficients in the subband indexed by scale l (positive integer), direction k, and phase angle m.

Outline of Proposed Algorithm
The steps of the proposed fusion approach can be briefly summarized as follows: (1) Multi-modal medical images are registered to ensure that the corresponding pixels are aligned.
(2) The source images are decomposed using the QWT into low-and high-frequency coefficients.

Mutual Information
Mutual information (MI), proposed by Piella [24] where s and F denote the source image (A or B) and the fused image, respectively, h s,F is the joint gray-level histogram of s and F, h s and h F are the normalized gray-level histograms of s and F, respectively, and L is the number of bins. A larger MI indicates that the fused image contains more information from images A and B.

Fusion Performance Characterization
Petrovic proposed an objective image fusion performance characterization [25] based on gradient information to evaluate fused medical images. The metrics are total fusion performance Q AB=F , fusion loss L AB=F , and modified fusion artifacts N AB=F (artificial information created). The Q AB=F metric considers the amount of edge information transferred from the input images to the fused image. This method uses a Sobel edge detector to calculate the strength and orientation information at each pixel in source images A and B and fused image F. A higher Q AB=F indicates better results. Fusion loss L AB=F can be used to evaluate the information lost during the fusion process (i.e., information available in the source images but not in the fused image). Fusion artifacts N AB=F represent visual information introduced into the fused image by the fusion process that has no corresponding features in any of the inputs. The fusion artifacts are essentially false information that directly detracts from the usefulness of the fused image, and can have serious consequences in certain fusion applications. According to [25], a smaller L AB=F indicates better results. Similarly, a smaller N AB=F m meas better results. Q AB=F , L AB=F , and N AB=F are complimentary; i.e., their sum should be unity.
6 Experiments and Discussion

Experimental Setup
To evaluate the performance of the proposed fusion method, experiments were performed on six pairs of multimodal medical images, as shown in Fig. 4. To conveniently describe these pairs of images, the images were divided into groups a, b, c, d, e, and f, respectively. In Fig. 4, x1 and x2 (x 2 ½a; b; c; d; e; f ) indicate pairs of source images.  Fig. 4c1 and T1-weighted MR-GAD image in Fig. 4c2 show several focal lesions involving the basal ganglia. Figure 4d1, d2 are MRA and T1-weighted MRI images, respectively. Figure 4e1, f1 are T1-MRI images, respectively. Figure 4e2, f2 are T2-MRI images, respectively.
The performance of the proposed method is compared with the PCNN method motivated by spatial frequency of NSCT coefficients proposed by Xiaobo [26], Sudeb's method [27] of the PCNN motivated by the modified spatial frequency of NSCT coefficients, the ripplet method [6], and the QWT method based on visibility features proposed by Peng [12]. In Qu's method and Sudeb's scheme based on NSCT, the pyramid filter and the direction filter were set to 'pyrexc' and 'vk', respectively. In Qu's method, the decomposition levels were set to [0,1,3,3,4]. In Sudeb's NSCT method, the decomposition levels were set to [1,2,4] in accord with [27]. In Sudeb's ripplet method, the '9/7' filter, 'pkva' filter, and levels = [0,1,2,3] were used to decompose the source images using the ripplet transform. In Peng's method, the parameters given in [12] were adopted for the comparison.

Subjective Evaluation Analysis
To evaluate the performance of the proposed method in multi-modal medical image fusion, extensive experiments were performed on the six groups of images shown in Figs. 5, 6 and 7, respectively. From the fusion results of the five algorithms in Fig. 5a-e, all fused images contain both bone information and tissue information. However, careful observation reveals some differences. Figure 5a, b show a block effect and image degradation. There is higher contrast in the fused images obtained using the proposed method and Sudeb's NSCT method than in those obtained using the other three methods. However, the label regions in Fig. 4e are clearer than those in Fig. 4c.
From Fig. 5f-j, the visual effect in Fig. 5f is the worst. The external outline of Fig. 5g shows some image degradation. Furthermore, the upper outline in Fig. 5h Fig. 5h. However, the corresponding regions are clear in Fig. 5j. The label regions in Fig. 5j are clearer than those in Fig. 5h. The image fused using the proposed method, Fig. 5h, has higher contrast than that in Fig. 5i. The white outline in Fig. 4c1 is lost in Fig. 6a. There are clear artifacts along the boundary in Fig. 6b. The outline information in Fig. 6e is more obvious than that in Fig. 6c, d. A block effect and artifacts are clearly visible in Fig. 6f, h, and especially in Fig. 6g. The image quality in Fig. 6i, j are similar. Figure 7a shows that a lot of image information in Fig. 4e1, e2 is lost after the fusion process. Dark blocks appear in the middle part of Fig. 7b. The accuracy and image information in Fig. 7e are better than those in Fig. 7c, d. The lower-right corner of Fig. 7j shows more details of tissues compared to Fig. 7f-i. In summary, the fusion image obtained using the proposed algorithm has more accuracy and necessary information compared to those obtained using the other four methods while having fewer blocks and artifacts.

Objective Evaluation Analysis
Objective evaluation metrics are used to demonstrate the differences in the fused images. MI, Q AB=F , L AB=F , and N AB=F are used to compare the proposed method with four algorithms. The objective performances of the fused images of groups a and b in Fig. 4 are listed in Table 1. Table 2 shows the MI, Q AB=F , L AB=F , and N AB=F values of the fused images in groups c and d in Fig. 4, respectively. values of the proposed scheme are the smallest (i.e., our method introduced the fewest artifacts). The objective evaluation results well coincide with the subjective effect evaluation with minor exceptions. However, for these exceptions, the subjective effect of the proposed method is better than that of the other methods. From the subjective and objective metric comparisons, it can be concluded that the proposed algorithm works well for combinations of CT with MRI, CT with PD-weighted MR, CT with T1-weighted MR-GAD, MRA and T1weighted MRI, and T1-weighted MR with T2-weighted MR. The proposed scheme is more effective than some state-of-the-art methods.

Conclusion
This study applied the QWT to multi-modal medical image fusion. The proposed method adopts the PCNN fusion rule and abosolute-max fusion rule to select high-and lowfrequency coefficients, respectively. The experimental results illustrate that the proposed scheme is better than some existing fusion methods in terms of both objective and subjective metrics.