Multimodal Medical Image Registration and Fusion for Quality Enhancement

Abstract: For the last two decades, physicians and clinical experts have relied on a single imaging modality to identify normal and abnormal structures of the human body. However, medical experts are often unable to accurately analyze and examine the limited information a single imaging modality provides. To overcome this problem, a multimodal approach is adopted to increase the qualitative and quantitative medical information, which helps doctors diagnose diseases at an early stage. In the proposed method, a Multi-resolution Rigid Registration (MRR) technique is used for multimodal image registration, while the Discrete Wavelet Transform (DWT) along with Principal Component Averaging (PCAv) is utilized for image fusion. The proposed MRR method provides more accurate results than Single Rigid Registration (SRR), while the proposed DWT-PCAv fusion process adds constructive information at a lower computational cost. The proposed method is tested on CT and MRI brain imaging modalities of the HARVARD dataset, and its fusion results are compared with existing fusion techniques. Quality assessment metrics such as Mutual Information (MI), Normalized Cross-Correlation (NCC) and Feature Mutual Information (FMI) are computed for statistical comparison of the proposed method. The proposed methodology provides more accurate results, better image quality and valuable information for medical diagnoses.

Medical imaging is widely used for the diagnosis of various health disorders. A single imaging modality, however, cannot provide all the anatomical and functional information required to diagnose normal and abnormal structures. A single modality carries limited information: for example, anatomical information about bones can be acquired from X-ray and CT images, whereas functional and soft-tissue information can be obtained from MRI images. Similarly, body function and cancerous-cell information can be extracted from PET and SPECT images. All of this functional and anatomical information can be brought onto a single platform using a multimodal approach. Multimodal medical imaging combines two or more imaging sources to provide extended medical information that is not visible in any single imaging modality. Lesions, fractures, cancerous cells, brain hemorrhages, and tumors are more visible in multimodal medical imaging [1][2][3]. A resultant image containing maximum information can be obtained through multimodal image registration and fusion. Many diseases, such as Alzheimer's, neoplastic disease, and Coronary Artery Disease (CAD), cannot be diagnosed properly at an early stage from a single imaging modality. To overcome this limitation, registration and fusion techniques are used to diagnose such diseases more accurately [4][5][6]. Image registration is the first step: it aligns the geometrical coordinates of two images and matches their intensity values. It is followed by image fusion, which overlaps the two images without any loss of medical information; the resultant fused image then contains both anatomical and functional information [7,8]. The research trend of multimodal approaches can be seen in Fig. 1, which reflects that research in this area is increasing tremendously. The statistics shown in Fig. 1 are collected from PubMed, an online medical database [9].
It is observed that the number of publications increases each year. Fig. 1 (PubMed results on medical image registration and fusion) represents the articles published from 1990 to the third quarter (Q3) of 2020. The general block diagram of multimodal medical image registration and fusion techniques used for image information enhancement is shown in Fig. 2, which consists of two major steps: image registration and fusion. In this research article, a multimodal medical image registration and fusion technique is presented. The main motivation of this work is to diagnose brain diseases at an early stage with the help of registration and fusion techniques; surgeons and medical experts can also perform surgery more precisely using multimodality. The contribution of the proposed methodology is to visualize brain anatomical and functional information more effectively in a single modality. The remainder of this paper is arranged as follows: Section 2 contains the related works. The proposed methodology is discussed in Section 3. The dataset and experimental details are discussed in Section 4. The results and discussion of the proposed methodology are elaborated in Section 5. Section 6 concludes the research work.

Related Works
Many multimodal medical image registration and fusion approaches are presented in the literature. Das et al. [10] used the affine MRR technique for multimodal image registration, with CT and MRI images of the human brain as input. The registered images were obtained by maximizing the correlation function between the two input images; Particle Swarm Optimization (PSO) and a Genetic Algorithm (GA) were used to maximize the value of this similarity function. The dataset contained MRI-T1, MRI-T2, and CT images, and correlation similarity was used as the performance parameter to compare results before and after registration. In this work, better accuracy and robustness were achieved with PSO than with GA. Joshi et al. [11] performed a rigid registration of CT and MRI images using GA; the Mean Square Difference (MSD) similarity metric was computed for statistical comparison. Leng et al. [12] proposed a novel interpolation-based approach that works on multi-resolution registration. In this method, the registration procedure was divided into two stages: registration of the medical images and intensity interpolation. A bicubic B-spline vector-valued transformation function was used for feature-based registration between the two input medical images, and the B-spline control points were evaluated at each resolution level for better feature matching. In the second step, the intensity values of the input images were matched using linear/cubic interpolation. The experimental results showed that this approach was suitable for deformable medical images; the registration results were evaluated using the MSD quality assessment metric.
Mohammed et al. [13] implemented an intensity-based rigid registration, after which a Wavelet Transform was applied to fuse the CT and MRI images. The SRR method was easy to implement and time-efficient. The correlation coefficient similarity metric was used for matching both input images, and the registration results were determined from the correlation of the images before and after registration; the correlation value was observed to be higher after registration. In the image fusion step, a third-level decomposition with an eighth-order Daubechies (db) wavelet was used to determine the coefficients of the input images. Nandish et al. [14] proposed a multimodal B-spline deformable and MRR registration technique. The B-spline method was implemented on 2D and 3D brain images; it gave good results on 2D medical images but produced noise in 3D images, whereas the multi-resolution technique performed much better and produced less noise on 3D brain MRI and CT images. The spatial information from the source image was determined and mapped onto the target image using multi-resolution registration, with a three-level multi-resolution approach. After registration, the fusion step took place. The statistical performance of registration and fusion was determined using MI, and the visual results were verified from radiologist feedback.
Palanivel et al. [15] proposed a novel method for 3D multimodal medical image registration in which the volumes of the input images were registered using volumetric multifractal characteristics, which served as features in the 3D registration. The methodology was implemented on 3D medical brain images and synthetic phantom images; brain CT and MRI images of seven different patients were taken from the RIRE database. Initially, the multifractal characteristics of the input CT and MRI images were derived using Hölder exponents and Hausdorff dimensions. Presti et al. [16] proposed a local affine transformation rigid/non-rigid registration technique that works on the Empirical Mutual Information (EMI) similarity metric; a gradient-descent optimization algorithm was used to maximize the similarity metric between the two input images. This method was implemented on brain and knee images. Cui et al. [17] presented a novel multichannel non-rigid registration technique. The authors used a novel parameter-reduced cost function to optimize the weighting parameter and also improved the inflexible solid boundary. The method was implemented on multi-scale CT and SPECT lung images, with the main focus of diagnosing chronic obstructive pulmonary disease (COPD) in the lungs.
Gaurav et al. [18] proposed a multimodal fusion algorithm based on Directive Contrast NSCT. The NSCT-based fusion technique decomposes the input images into low- and high-frequency components. Two fusion rules, directive contrast and phase congruency, were applied to fuse the low- and high-frequency components. Finally, the resultant fused image was obtained by taking the inverse NSCT. The experimental results were obtained from brain images of different persons with diseases such as tumor, Alzheimer's, and brain stroke. Statistical results were verified using parameters such as the Edge Based Similarity Measure (ESM), Structural Similarity Index Metric (SSIM), and Normalized Mutual Information (NMI).
Sahu et al. [19] implemented multimodal brain image fusion using a Laplacian pyramid and the Discrete Cosine Transform (DCT). As the pyramid level increases during decomposition of the input images, the quality of the fused image also increases. The visual and statistical results showed that this method preserved image edges better and retained maximum information; the methodology was compared with the Daubechies complex wavelet transform (DCxWT). Bashir et al. [20] proposed multimodal fusion methods based on the Stationary Wavelet Transform (SWT) and Principal Component Analysis (PCA). The two fusion methods were tested on satellite images, multimodal medical images, stereo images, and infrared-visible images, and were compared with each other. The fused image quality was determined using fusion quality metrics such as entropy, MI, NCC, and RMSE. It was observed that SWT performed well on multimodal and multi-sensor images, while PCA performed well on multimodal images with higher contrast. He et al. [21] fused PET and MRI brain images by applying the Intensity-Hue-Saturation (IHS) technique and PCA. The advantage of these techniques is that they maintain spatial information and acquire the spatial features present in the input images without color distortion.
Yuhui et al. [22] utilized a multi-wavelet transform fusion technique on PET and CT chest images; the image coefficients were obtained by wavelet decomposition, and the fusion results were evaluated using different assessment metrics. Xu et al. [23] proposed a multimodal image fusion method based on an adaptive Pulse Coupled Neural Network (PCNN), with Quantum PSO (Q-PSO) used to determine the maximum value of the similarity measure. Arif et al. [24] proposed a new fusion method based on the Fast Curvelet Transform along with a Genetic Algorithm (FCT-GA), implemented on 825 CT, MRI, MRA, PET, and SPECT brain images. The dataset was collected from CMH hospital Rawalpindi, and another set of images was acquired from the freely available AANLIB dataset; the statistical results were verified on eight different performance metrics. Maqsood et al. [25] presented a new technique that performs two-scale image decomposition using sparse representation. The authors first decomposed the input images into base and detail layers, with the SSGSM method used to extract the detail layers; CT and MRI brain images were used for testing the method.
Some related work employs deep learning. Bhattacharya et al. [26] presented a deep learning survey article mainly focused on COVID-19 disease detection. Gadekallu et al. [27] combined the PCA technique with a deep learning model, although that work addressed tomato plant disease detection. Deep learning approaches are now widely used, but they vary from case to case: Gadekallu et al. [28] used deep learning to predict retinopathy, and Reddy et al. [29] recently implemented a deep neural network combined with Antlion resampling techniques to classify a multimodal stroke brain imaging dataset taken from Kaggle. Many researchers have already worked on fusion and registration methods, but accurate registration of two different imaging modalities remains a challenging task due to intensity variations, and image fusion for gathering useful medical information into a single image is another open problem. Most of the existing work is based on either registration or fusion alone; only a limited number of researchers have combined both for useful results. In the proposed methodology, the combination of the registration and fusion processes is used to enhance the medical information in a single image for ease of diagnosis. The typical SRR method is tested and compared with the MRR method, followed by PCAv fusion, which improves the results significantly as discussed in Section 4.

Proposed Methodology
The proposed method is based on the MRR and DWT-PCAv techniques. The MRR technique is more accurate than single-resolution registration, and the DWT-PCAv fusion technique then adds valuable details to the resultant image. CT and MRI medical images are used as input. The initial image registration step applies the MRR technique to align the images and match their intensity values; it is noticed that MRR gives better results than SRR, and the resultant image is well aligned and contains more valuable information for diagnosis purposes. After registration, the DWT-PCAv fusion technique is applied to fuse both images. The framework of the proposed method is shown in Fig. 3.

Multi-Resolution Rigid Registration (MRR)
In the proposed MRR technique, the input medical images are converted into multiple resolution levels. Fig. 4 shows the multi-resolution pyramid model for registration. Resolution decreases as images go from the base of the pyramid to its top: the base image has the highest resolution and the top image the lowest. The images are divided into multiple levels, and the registration process is performed at each level. When the input images are of the same size but different resolutions, the registration process produces better results, which helps in diagnosing abnormalities. The original input images lie at the base level of the pyramid; if these images are of size N × N, then the next level up is N/2 × N/2, the third level N/4 × N/4, and so on. The MRR procedure can be implemented either top-down or bottom-up. In the proposed methodology, an affine rigid geometric transformation is applied, which includes scaling, rotation, and shearing, to best align the source image to the target image. Among the many available similarity metrics, the Cross-Correlation metric is chosen for its non-complex nature and time efficiency. Automatic multi-resolution registration is achieved when the maximum similarity value between the source and target images is reached by the optimizer. The optimizer continuously evaluates the similarity value until the similarity between the two images is maximized and both images become perfectly aligned. The similarity metric is a non-convex function; in the proposed methodology, a gradient descent optimizer is utilized. The role of the interpolator is to determine the position and value of each pixel and its neighboring pixels in both the moving and fixed images.
The optimizer is responsible for correct registration of the images based on the similarity metric values. The similarity function is not smooth and contains multiple local optima due to the intensity variations between the images in the multimodal registration process; the images are best overlapped and registered when the global optimum is reached. The MRR technique takes more time than SRR because it registers the images iteratively: MRR builds multiple resolutions from each input image, matches the intensity values level by level, and stops when the source and target images are matched at every level. The final registered image has better geometrical alignment with the target image; registration takes more time, but accuracy in terms of alignment is better. Moreover, some of this time is recovered in the subsequent fusion process, because with the MRR technique the images arrive already registered with minimal alignment error.
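The coarse-to-fine idea behind MRR can be sketched in a few lines of Python. This is a deliberately simplified, translation-only illustration, not the paper's method: the paper uses a full affine transform, a gradient descent optimizer, and MATLAB, whereas this sketch builds a pyramid by block averaging and exhaustively scores candidate integer shifts with NCC at each level, doubling and refining the estimate from coarse to fine. All function names are illustrative.

```python
import numpy as np

def build_pyramid(img, levels):
    """Multi-resolution pyramid by 2x2 block averaging (index 0 = full resolution)."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        prev = prev[:h, :w]
        pyramid.append(prev.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyramid

def ncc(a, b):
    """Normalized cross-correlation similarity between two same-size images."""
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return (a * b).sum() / denom if denom else 0.0

def register_translation(moving, fixed, levels=3, search=2):
    """Coarse-to-fine search for the integer shift (dy, dx) maximizing NCC.

    At the coarsest level a small window of candidate shifts is scored;
    the best shift is doubled and refined at each finer level, mimicking
    the multi-resolution registration loop.
    """
    mov_pyr = build_pyramid(moving, levels)
    fix_pyr = build_pyramid(fixed, levels)
    dy = dx = 0
    for mov, fix in zip(mov_pyr[::-1], fix_pyr[::-1]):  # coarse -> fine
        dy, dx = 2 * dy, 2 * dx  # upscale the previous level's estimate
        best = (-np.inf, dy, dx)
        for ddy in range(-search, search + 1):
            for ddx in range(-search, search + 1):
                shifted = np.roll(np.roll(mov, dy + ddy, axis=0), dx + ddx, axis=1)
                score = ncc(shifted, fix)
                if score > best[0]:
                    best = (score, dy + ddy, dx + ddx)
        _, dy, dx = best
    return dy, dx
```

Because only a small window is searched per level, the total number of similarity evaluations stays modest even for large shifts, which is exactly the efficiency argument for the pyramid approach.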

DWT-PCAv Fusion
After the MRR process, the input source images are first decomposed into multiple scales and orientations using the DWT. This makes it possible to visualize the input images at different resolutions, with each decomposition level carrying different information. After multi-scale decomposition, principal component analysis is performed on the coefficients of each level. Then, the principal components are averaged at each decomposed level, and weights are assigned to the coefficient elements of the images according to the fusion rule. The input images are decomposed into coefficient subbands: Low-Low (LL), Low-High (LH), High-Low (HL), and High-High (HH). LH, HL, and HH are the detail coefficient subbands, while LL holds the approximation coefficients. The LL coefficient elements taken from the two source images are used as the input to PCA; from the LL coefficients, the highest principal components determine the new coefficient weights m1 and m2. Similarly, the other detail coefficient elements are processed to calculate their principal components. Then, the principal components of the approximation and detail coefficients are averaged to obtain the average components m1 and m2, and these two average principal components are used to fuse the final image. The complete step-by-step fusion procedure is displayed in Fig. 6. The major steps can be summarized as follows: • Initially, the CT and MRI input images are decomposed into two or three levels using the DWT.
• Then, the detail coefficient components and approximation components are obtained using PCA. • Sort every principal component of the corresponding coefficient elements from both image sources. • Evaluate the average coefficient components using PCAv.
• Implement principal component averaging fusion via the mean and averaging of the principal components.
• Evaluate the quality of the fused image using fusion quality assessment metrics.
• Consider Y1_i and Y2_i as the approximation coefficients taken from the LL decomposition level of the two medical image sources; they can be represented as the columns of a matrix, where the elements of the first medical image are Y1_i and the elements of the second image are Y2_i:

Y = [Y1_1 Y2_1; Y1_2 Y2_2; . . . ; Y1_k Y2_k] (1)

where i ranges over 1, 2, 3, . . ., k, and k denotes the number of approximation coefficients. The covariance between the two vectors is given in Eq. (2):

C = cov(Y1, Y2) (2)

The mean of all pixel values can be calculated as:

μ = (1/k) Σ_i Y_i (3)

Two matrices are then defined: D, a diagonal matrix containing the eigenvalues of C, and E, a matrix containing the corresponding eigenvectors. These two matrices are evaluated first, after which the normalized components m1 and m2 are determined as described in Eqs. (4) and (5), where n denotes the number of decomposition levels.

If D(1,1) > D(2,2):
m1 = E(1,1) / (E(1,1) + E(2,1)), m2 = E(2,1) / (E(1,1) + E(2,1)) (4)
Otherwise:
m1 = E(1,2) / (E(1,2) + E(2,2)), m2 = E(2,2) / (E(1,2) + E(2,2)) (5)

After m1 and m2 are computed from the approximation coefficients, they are likewise calculated from the detail coefficient elements. When all the m1 and m2 components have been calculated, their means are computed. In the final step, PCAv is used to fuse the final medical image containing useful diagnostic information.
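The decomposition and weighting steps described above can be sketched in Python. The fragment below is a minimal, hedged reading of the DWT-PCAv rule using a hand-rolled one-level Haar DWT and NumPy's eigendecomposition; the paper's multi-level, Daubechies-wavelet variant and its exact averaging rule may differ, and all function names here are illustrative.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: returns the (LL, LH, HL, HH) subbands."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0  # approximation
    lh = (a + b - c - d) / 4.0  # horizontal detail
    hl = (a - b + c - d) / 4.0  # vertical detail
    hh = (a - b - c + d) / 4.0  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = ll + lh + hl + hh
    out[0::2, 1::2] = ll + lh - hl - hh
    out[1::2, 0::2] = ll - lh + hl - hh
    out[1::2, 1::2] = ll - lh - hl + hh
    return out

def pca_weights(s1, s2):
    """Weights m1, m2 for one subband pair: the eigenvector of the 2x2
    covariance matrix with the largest eigenvalue, normalized to sum 1."""
    cov = np.cov(np.vstack([s1.ravel(), s2.ravel()]))
    _, vecs = np.linalg.eigh(cov)  # eigh returns eigenvalues ascending
    v = np.abs(vecs[:, -1])        # principal eigenvector
    return v[0] / v.sum(), v[1] / v.sum()

def dwt_pcav_fuse(img1, img2):
    """Fuse two registered images: per-subband PCA weights, averaged
    across subbands (the averaging step of PCAv), then inverse DWT."""
    sub1, sub2 = haar_dwt2(img1), haar_dwt2(img2)
    ws = [pca_weights(a, b) for a, b in zip(sub1, sub2)]
    m1 = np.mean([w[0] for w in ws])
    m2 = np.mean([w[1] for w in ws])
    fused = [m1 * a + m2 * b for a, b in zip(sub1, sub2)]
    return haar_idwt2(*fused)
```

Because the weights are normalized to sum to one, fusing an image with itself returns the image unchanged, a simple sanity check on the rule.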

Experimental Results and Discussion
This section consists of the details of the dataset and performance parameters used for validation of the proposed methodology. The visual and statistical results along with the discussion are also provided.

Dataset
The proposed model is tested on the Harvard Atlas brain dataset obtained from http://www.med.harvard.edu/AANLIB/home.html. The Harvard medical dataset is mainly classified into two categories: normal and abnormal brain images. It contains modalities including MRI (MR, MR-T1, MR-T2, MR-PD, MR-GAD), CT, and SPECT/PET brain images. In the normal-brain section, the dataset has added new 3D anatomical brain structure images of the MRI and PET modalities, provided in three different viewing planes: transaxial, sagittal, and coronal. The normal-brain category contains about one hundred labeled brain structures. Both normal and abnormal brain images of the selected dataset are incorporated in the simulation to obtain statistical and visual results. The dataset is intended to cover a wide range of neuroanatomy, focusing on the anatomy of many emerging central nervous system diseases, and illustrates a variety of brain abnormalities. It contains several substantial examples of brain conditions with various combinations of imaging modalities and imaging frequencies. The Harvard dataset is further divided into four sets: normal brain images, cerebral toxoplasmosis disease, cerebral hemorrhage disease, and acute stroke disease brain images.

Performance Parameters
Image quality assessment metrics such as Mutual Information (MI), Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Metric (SSIM), Feature Mutual Information (FMI), Root Mean Square Error (RMSE), and Normalized Cross-Correlation (NCC) are computed for validation of the proposed model. MI measures the information shared between the source images and the resultant registered or fused image [30]. The MI between the source and resultant images is zero if the two images are independent [31,32]; a higher MI means that more information is shared between the source and resultant images. The formula for MI is given below.
MI_xy = Hx + Hy − Hxy (6)

where MI_xy is the mutual information between the source and resultant images, Hx is the entropy of image X, Hy is the entropy of image Y, and Hxy is the joint entropy of images X and Y. Similarly, Eq. (7) describes the MI of the fused image as the sum of the mutual information between the fused image F and each source:

MI_F = MI_FX + MI_FY (7)

PSNR is a quantitative measure based on the RMSE: it relates the maximum intensity level of the medical images to the error in the corresponding pixels of the resultant image. A higher PSNR indicates superior image fusion or registration:

PSNR = 20 log10 (fmax / RMSE) (8)

where fmax indicates the maximum gray-level value in the fused image. The SSIM determines the resemblance between two regions w_x and w_y in the images X and Y [33]:

SSIM(x, y) = ((2 μ_wx μ_wy + c1)(2 σ_wxwy + c2)) / ((μ_wx^2 + μ_wy^2 + c1)(σ_wx^2 + σ_wy^2 + c2)) (9)

where μ and σ denote the local means and (co)variances of the regions and c1, c2 are small stabilizing constants. FMI measures the features in the resultant fused image: the number of edges, curves, and other features transferred from the source images into the fused image. A higher FMI value indicates a higher-quality fused image. Mathematically, FMI can be expressed as follows [34]:

FMI_F = FMI_FX + FMI_FY (10)

where FMI_F is the feature information of the resultant image transferred from the source images X and Y, and FMI_FX and FMI_FY are the feature mutual information between the fused image and images X and Y, respectively. RMSE computes the quality of the final fused image by comparing it with the ground-truth image; for good fusion results, its value should be near zero [35].
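For illustration, the entropy-based MI, PSNR, and NCC metrics described above can be computed directly with NumPy. The snippet below is a minimal sketch, not the paper's evaluation code; the histogram bin count, base-2 logarithms, and fmax = 255 are assumptions chosen for the example.

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """MI_xy = Hx + Hy - Hxy, estimated from a joint histogram of two images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()          # joint probability table
    px = pxy.sum(axis=1)               # marginal of x
    py = pxy.sum(axis=0)               # marginal of y

    def entropy(p):
        p = p[p > 0]                   # ignore empty bins (0 log 0 = 0)
        return -(p * np.log2(p)).sum()

    return entropy(px) + entropy(py) - entropy(pxy)

def psnr(reference, test, fmax=255.0):
    """PSNR in dB, computed from the RMSE against a reference image."""
    rmse = np.sqrt(np.mean((reference.astype(float) - test.astype(float)) ** 2))
    return float('inf') if rmse == 0 else 20.0 * np.log10(fmax / rmse)

def ncc(x, y):
    """Normalized cross-correlation in [-1, 1]; 1 means identical up to gain/offset."""
    x, y = x - x.mean(), y - y.mean()
    return (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())
```

As expected from the definitions, an image compared with itself maximizes MI and NCC, and a constant intensity offset lowers PSNR by a predictable amount since the RMSE equals the offset.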

Multi-Resolution Rigid Registration (MRR) Results
Registration is the initial and important step after preprocessing, and the quality of the fusion also depends on it. For multimodal image registration, intensity-based registration is suitable. Two registration methods, SRR and MRR, are implemented on the CT and MRI brain images. These methods were selected because SRR has no time-complexity issue and achieves complete image alignment in a very short time, but its alignment and intensity matching are poor: the SRR method sacrifices registration quality. The MRR process, on the other hand, significantly improves the image quality at the cost of time complexity. Both visual and statistical registration results are demonstrated for comparison and validation. In the fusion process, the MRR images are used as the input to the fusion method.
The visual and statistical results are computed and evaluated on each set of brain images. All the image registration and fusion experiments are implemented in MATLAB 2018a on an HP ProBook 430 G1 with an Intel Core i3 4010G CPU at 1.7 GHz and 4 GB RAM. The MRR and SRR techniques are applied to four sets of brain images, as shown in Figs. 7-10; each set is divided into a pair of moving and fixed images. In Fig. 7, image (a) contains MR-PD as the moving image and MR-T2 as the fixed/target image of slice 20. The goal is to align and match the differing intensities of the moving image onto the fixed image, yielding the target images (b) and (c). Image (b) shows the SRR result (left side) together with the absolute difference between the registered and fixed images (right side). If the difference image is small, the source and target images are well aligned and matched; if the difference is large, the registered image is not well aligned. Similarly, image (d) contains a pair of MR-PD and MR-T2 images of slice 35, and images (e) and (f) are the registered results of the SRR and MRR processes, respectively. The visualization results of single- and multi-resolution registration on cerebral toxoplasmosis disease images are shown in Fig. 8.
The visualization results of the SRR and MRR approaches on cerebral hemorrhage and acute stroke disease images are shown in Figs. 9 and 10, respectively. To analyze the registration results, the visual results of the registered brain images are shown together with their absolute differences; these results are good for human perception. In addition, statistical results are computed to show which technique performs the better registration, and it is observed that MRR performs better in most cases. Seven registration quality assessment metrics are selected for validation: MI, CC, SSIM, NCC, PSNR, SSD, and RMSE. The computation time of both registration methods is also measured. For better registration, the values of MI, CC, SSIM, NCC, and PSNR should be high, while the values of SSD, RMSE, and computation time should be low. The statistical results for normal brain image registration are shown in Tab. 1. The second set contains brain images with cerebral toxoplasmosis disease; the MRR method shows good image quality results, although its computation time is high due to the increased number of iterations. Tab. 2 shows the statistical results for the cerebral toxoplasmosis images. Similarly, the statistical results for the third set (cerebral hemorrhage disease) and the fourth set (stroke disease) of the Harvard dataset are shown in Tabs. 3 and 4, respectively. It is observed that the MRR statistical results are more promising, but the computation time of the MRR method is higher than that of the SRR method on each dataset.

DWT-PCAv Image Fusion Results
In the fusion process, the MRR images are used as input because the resultant image is more accurately registered with its source moving image. The DWT-PCAv method is utilized for the fusion process. The proposed fusion results are compared with recent fusion methods from the literature: Discrete Wavelet Transform with Principal Component Averaging (DWT-PCA) [36], the Guided Image Filter based on Image Statistics (GIF-IS) [37], Fast Filtering Image Fusion (FFIF) [34], and the Non-Subsampled Contourlet Transform using Phase Congruency (pc-NSCT) [38]. It is observed that the proposed methodology produces better results, as reflected in both the visual and statistical results.
The visual fusion results of the normal brain images are shown in Fig. 11.

Conclusion and Future Direction
In the proposed methodology, the MRR and DWT-PCAv techniques are presented. The proposed MRR overcomes the limitation of SRR: SRR is time-efficient but suffers from misregistration. The registered MRR image is used as the input to the fusion step. The PCAv fusion technique improves the image quality by fusing the two brain images while preserving valuable information; a further advantage of PCAv fusion is its time efficiency. The proposed methodology is implemented on four sets of brain images, one containing normal brain images and the other three containing abnormal brain images, and is compared with existing fusion methods. The image registration and fusion results are presented both visually and statistically, and it is observed that the proposed methodology gives promising results compared with other existing methods. Researchers can extend this work to non-rigid registration and to other imaging modalities such as PET and SPECT, and can target many recent brain diseases to identify a patient's condition at an early stage. This research can also be combined with machine learning models such as fast and compact 3-D Convolutional Neural Networks [40] to obtain better results, and the state-of-the-art work can be applied to current diseases such as COVID-19 and its impact on brain psychology.

Funding Statement:
The authors received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.