Multi-modality MRI fusion for Alzheimer’s disease detection using deep learning

Diffusion tensor imaging (DTI) is a recent magnetic resonance imaging technology that allows us to observe the fine structure of the human body in vivo and non-invasively. It identifies the microstructure of white matter (WM) connectivity by estimating the movement of water molecules at each voxel. This makes it possible to identify the damage to WM integrity caused by Alzheimer's disease (AD) at its early stage, called mild cognitive impairment (MCI). Furthermore, gray matter (GM) atrophy characterizes the main structural changes in AD and can be sensitively detected by the structural MRI (sMRI) modality. In this research, we develop a novel multi-modality MRI (DTI and sMRI) fusion strategy to detect WM alterations and GM atrophy in AD patients. The approach is based on a 2-dimensional deep convolutional neural network (CNN) feature extractor and a Support Vector Machine (SVM) classifier. The fusion framework merges features extracted with the 2D-CNN from the DTI scalar metrics (fractional anisotropy (FA) and mean diffusivity (MD)) and from GM, and feeds them to the SVM to classify AD vs. cognitively normal (CN), AD vs. MCI, and MCI vs. CN. Our novel multimodal AD method demonstrates superior performance, with accuracies of 99.79%, 99.6%, and 97.00% for AD/CN, AD/MCI, and MCI/CN respectively.


Introduction
Alzheimer's disease (AD) is an irreversible, progressive neurodegenerative disorder that affects people over the age of 65 and accounts for around 60% of dementia cases worldwide. It is caused by damage to nerve cells in certain brain regions, affecting a person's memory and cognitive abilities and disrupting their daily life. The Alzheimer's Association reports that AD is the sixth leading cause of death in the USA; around 50 million people were diagnosed with this disease in 2018, and by 2050 this number will have tripled (1). At present, no effective treatment or prevention exists. Moreover, disease management is prohibitively costly. Early screening for this disease is of primordial importance for researchers to slow down its progression and optimize treatment. In this context, advances in neuroimaging, primarily magnetic resonance imaging (MRI), have shown the potential to improve the early diagnosis of AD.
AD is characterized by a progressive loss of gray matter (GM) that occurs pre-symptomatically in certain neuro-anatomical structures (2). Structural MRI (sMRI) is the most used neuroimaging modality to detect brain atrophy. It has already highlighted many biomarkers of Alzheimer's disease, in particular the atrophy of structures such as the hippocampus, the amygdala, and the thalamus (3). In fact, hippocampal atrophy in prodromal patients has proved to be the best structural predictor of Alzheimer's disease progression (4). However, it is associated with a large number of neurodegenerative pathologies, thereby limiting its specificity to Alzheimer's disease (5).
Within this frame of reference, many studies on the AD prodromal phase, called mild cognitive impairment (MCI), have focused on the hippocampus. Nevertheless, other structures appear interesting, such as the amygdala, whose volume could be a structural predictor as powerful as, or even more efficient than, the volume of the hippocampus to predict MCI (6; 7). Furthermore, there are changes in white matter that precede gray matter atrophy but are not detectable by sMRI (8). The introduction of diffusion tensor imaging (DTI) allows identification of these changes while the patient still presents with MCI (9). MCI is the transitional phase between cognitively normal (CN) aging and AD or another dementia. DTI conventionally studies white matter microstructural integrity based on the estimation of water molecule diffusion in all directions (at least six) (10). The degree of anisotropy of water diffusion is represented by the fractional anisotropy (FA), while the mean diffusivity (MD) represents its magnitude. Studies have shown the importance of measuring these two DTI indices (FA and MD) to describe physiological aging in the MCI patient phase (11). Increased MD and decreased FA were reported in AD patients compared to CN. Higher MD in MCI patients was observed in both hippocampi (12). Indeed, a considerable increase in MD and decrease in FA indicates a progressive loss of the barriers restricting the motion of water molecules in tissue compartments, associated with neuronal loss in AD (13). It therefore seems important to measure the DTI indices, as they can provide additional information on the pathophysiology of the disease.
The introduction of machine learning and deep learning techniques has greatly contributed to the diagnosis and prognosis of AD based on neuroimaging data (14). Numerous research works have been published on AD classification using DTI, where FA and MD were the metrics most frequently used as features. The most popular classifiers among these machine learning-based methods are the Support Vector Machine (SVM) and Random Forest (RF) (15; 16; 17; 18; 19). Most of them used the tract-based spatial statistics (TBSS) algorithm (37) to extract the white matter skeleton from FA and MD. They selected only the pertinent WM skeleton information to perform binary or multi-class classification using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. The difference lay in the classification task, where Maggipinto (18) used Random Forest and Lella (19) proposed concatenating the best results from different classifiers (SVM, RF, and Multi-layer Perceptron (MLP)) over all feature groups (FA, MD, radial diffusivity (RD), longitudinal diffusivity (LD)). The use of DTI-based machine learning shows impressive performance. However, it is necessary to extract features and subsequently select the relevant ones to perform classification tasks, which is difficult and time-consuming.
Deep learning is a state-of-the-art machine learning method (20). Classification techniques using deep convolutional neural networks (CNN) have revealed higher AD detection performance (21). Most approaches in the literature have used sMRI-based CNNs to classify the different stages of Alzheimer's disease. CNNs can handle automatic low- to high-level feature extraction from complex structures. Some authors have proposed a new CNN architecture (22), reaching promising results with an accuracy of 99.9%. Others have reported excellent results using transfer learning methods (23; 24; 25). Still others have suggested extracting deep discriminative features based on transfer learning methods and classifying them with an SVM (26; 27).
In recent years, DTI indices, principally MD, combined with sMRI information have been adopted by many researchers, who proposed different techniques to combine DTI and sMRI. Massalimova et al. (28) tried a multi-modal ResNet-18 network (sMRI and DTI) to classify CN, MCI, and AD on the OASIS-3 dataset. They suggested that classification performed by the softmax layer could be preferable to another classifier, in contrast to Kang et al. (26). Kang et al. (26) proposed a fusion technique consisting of merging slices with the same index of the T1w, FA, and MD images into an RGB slice. After that, the pre-trained VGG16 network is used to extract the features and an SVM classifier to discriminate MCI patients from CN using the ADNI dataset. Aderghal et al. (29) proposed a LeNet-like CNN based on sMRI and DTI-MD images. They selected the median hippocampal slice and its two neighbors in each projection (axial, sagittal, and coronal). The proposed CNN was pre-trained on the MNIST database; they first retrained the model on sMRI and then on DTI-MD. They achieved a classification accuracy of 86.83% for AD vs. CN, 69.85% for MCI vs. CN, and 71.75% for AD vs. MCI. Marzban et al. (30) proposed a simple 2DCNN based on a single convolution layer. They trained the model on diffusion scalar metrics (FA, MO, and MD) and GM. The cascaded MD and GM volumes achieved overall accuracies of 88.9% and 79.6% for AD vs. CN and MCI vs. CN respectively. Ahmed et al. (31) extracted visual features from the hippocampus ROI in both sMRI and MD images. The extracted features and the amount of CSF calculated on the sMRI are combined and classified using multi-kernel learning (MKL).
Assessment of pathophysiological changes by neuroimaging is essential to predict AD. A single modality cannot provide enough information; therefore, multiple modalities must be combined to detect AD. sMRI and DTI have received increasing attention in recent years to study the progression of Alzheimer's disease. These two modalities are complementary: sMRI detects the shrinkage of gray matter and changes in brain volume, while DTI is a useful predictive marker for WM deterioration. In this context, we aim to detect patterns of micro- and macro-structural changes in the different AD stages using a multi-modality MRI (sMRI and DTI) fusion process. We propose a new methodology consisting of a new CNN to extract the salient visual features from the DTI measurements and the GM images separately. After that, these features are merged and transmitted to an SVM to identify AD from MCI, AD from CN, and MCI from CN.

Database
The dataset used in this work was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The objectives of the ADNI study are the identification of biomarkers for clinical use and the early detection of AD (32). The selected balanced dataset includes both diffusion-weighted images (DWI) and sMRI brain scans from 150 individuals of both genders (50 AD, 50 CN, and 50 MCI), with ages varying from 55 to 90, acquired on GE Medical Systems scanners. The 50 MCI subjects comprise 25 early MCI and 25 late MCI. The selected subjects come from the ADNI-GO and ADNI-2 phases.
In addition to these images, 5 T2-weighted images without diffusion weighting (b = 0) are used as reference scans. More information about the acquisition parameters can be found in the ADNI-2 protocol.

Methodology
Our proposed strategy consists of pre-processing, 2D slice selection, feature extraction, and classification. We work on the DTI measurements (FA, MD) and the GM segmented from T1-weighted sMRI to classify (CN vs. AD), (AD vs. MCI), and (CN vs. MCI). A new 2DCNN architecture is trained on a slice-level dataset (only the 32 relevant slices selected from the FA, MD, and GM images) to extract the salient features from the DTI maps and GM. The optimal FA-CNN, MD-CNN, and GM-CNN models are saved according to the lowest loss value during the training process, then adapted to extract features from the last fully connected layer. After that, the features of each slice in the subject-level dataset (FA, MD, GM) are extracted by their optimal model (FA-CNN, MD-CNN, and GM-CNN). These features are merged and fed to the SVM classifier to improve the performance, as illustrated in figure 1. A detailed description is given in the following subsections.

Fig. 1 Flowchart of the proposed multi-modality fusion system using the 2DCNN-SVM approach for AD identification.
The pre-processing steps of the raw sMRI volumes to segment the GM are performed with the CAT12 toolbox (http://www.neuro.uni-jena.de/cat/), an extension of the SPM12 software (33). In short, all T1-weighted 3D sMRI are normalized by the DARTEL algorithm (Diffeomorphic Anatomical Registration Through Exponentiated Lie algebra) using an affine transformation followed by a nonlinear registration, corrected for bias-field inhomogeneities, and then segmented into GM and WM components. DWI volumes are preprocessed using the Functional Magnetic Resonance Imaging of the Brain (FMRIB) Software Library (FSL) (34). First, DWI scans are corrected for eddy-current distortions and susceptibility artefacts with FSL eddy_correct. FSL's Brain Extraction Tool is used to remove the skull. The diffusion tensor is then fitted with FSL dtifit at each voxel of the corrected DWI scans. The eigenvalues of the diffusion tensor (λ1, λ2, λ3) are used to obtain scalar anisotropy and diffusivity maps. Several diffusion metrics can be calculated; the most widely used are fractional anisotropy (FA) and mean diffusivity (MD). FA is calculated using equation 1, and MD represents the magnitude of diffusion, obtained by averaging the three eigenvalues as given in equation 2:

FA = √(3/2) · √(((λ1 − MD)² + (λ2 − MD)² + (λ3 − MD)²) / (λ1² + λ2² + λ3²))   (1)

MD = (λ1 + λ2 + λ3) / 3   (2)

Finally, FA and MD are co-registered with the corresponding sMRI scans (each scan containing 121×145×121 voxels) using SPM12.
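Assuming the eigenvalue maps are available as NumPy arrays, equations 1 and 2 can be sketched as follows (an illustrative re-implementation for checking the formulas, not the FSL dtifit code itself):

```python
import numpy as np

def compute_md(evals):
    """Mean diffusivity: the average of the three eigenvalues (equation 2)."""
    return evals.mean(axis=-1)

def compute_fa(evals):
    """Fractional anisotropy from the eigenvalues (lambda1..lambda3) (equation 1)."""
    md = compute_md(evals)[..., None]
    num = np.sum((evals - md) ** 2, axis=-1)
    den = np.sum(evals ** 2, axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        fa = np.sqrt(1.5 * num / den)
    return np.nan_to_num(fa)  # background voxels (all-zero eigenvalues) -> 0

# Sanity checks: isotropic diffusion gives FA = 0; a single nonzero
# eigenvalue (fully anisotropic diffusion) gives FA = 1.
iso = np.array([1.0, 1.0, 1.0])
aniso = np.array([1.0, 0.0, 0.0])
```

The same functions apply voxel-wise when `evals` is a (121, 145, 121, 3) array of eigenvalue maps.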

2D slice selection
Each FA, MD, and GM volume is decomposed into 2D slices along the axial view to highlight the most distinctive features and ensure improved classification efficiency. We select 32 slices from each subject based on higher entropy information (slices with indices 34–65). The selected slices cover most of the deteriorated AD brain regions mentioned in the literature, such as the hippocampus, the entorhinal cortex, and the thalamus. As a result, a total of 1600 (32×50) slices per class (CN, MCI, and AD) is selected. More details are shown in table 1.
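The entropy-based selection can be sketched as follows; since the text does not specify the criterion beyond "higher entropy", the histogram bin count (64) and the ranking step are assumptions for illustration:

```python
import numpy as np

def slice_entropy(sl, bins=64):
    """Shannon entropy of a 2D slice's intensity histogram."""
    hist, _ = np.histogram(sl, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # ignore empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

def select_top_slices(volume, n_slices=32):
    """Indices of the n_slices axial slices with the highest entropy."""
    scores = [slice_entropy(volume[:, :, k]) for k in range(volume.shape[2])]
    top = np.argsort(scores)[::-1][:n_slices]
    return np.sort(top)  # keep anatomical order

# Toy volume standing in for a registered FA/MD/GM map.
rng = np.random.default_rng(0)
volume = rng.random((16, 16, 40))
idx = select_top_slices(volume, n_slices=8)
```

Here `idx` contains the 8 selected slice indices in anatomical order; on the real 121×145×121 maps the same routine would return 32 indices.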

Feature extraction using 2DCNN
Handcrafted feature extraction, which is hard and time-consuming, was the main problem of traditional machine learning algorithms. CNNs can perform this task automatically without human intervention. The CNN is the most common deep learning model among neural networks.
It is inspired by the human visual system. A typical CNN architecture principally comprises an input layer, convolution layers, pooling layers, fully connected layers, and a classification layer. The convolution layer automatically extracts features from the input FA, MD, or GM images by element-wise multiplication with a filter. The pooling layer aims to reduce redundant information by taking the average or the maximum of a region. The fully connected layer reduces and transforms the feature maps into a column feature vector. The classifiers are finally used for AD prediction.
In short, the 2DCNN architecture consists of three convolutional layers with 3×3 filters. Each convolutional layer is followed by a ReLU layer, a batch normalization (BN) layer, and a max-pooling layer, followed by two fully connected layers, a softmax layer, and an output layer. The ReLU layer sets negative values to zero, and BN accelerates the training process. More details are tabulated in table 2.
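Since table 2 is not reproduced here, the padding and pooling parameters below are assumptions; with 'same'-padded 3×3 convolutions and 2×2 stride-2 max pooling, the spatial sizes through the three blocks can be tracked as follows:

```python
def conv_out(h, w, k=3, pad=1, stride=1):
    """Spatial output size of a convolution layer (3x3, 'same' padding assumed)."""
    return (h + 2 * pad - k) // stride + 1, (w + 2 * pad - k) // stride + 1

def pool_out(h, w, k=2, stride=2):
    """Spatial output size of a max-pooling layer (2x2, stride 2 assumed)."""
    return (h - k) // stride + 1, (w - k) // stride + 1

h, w = 121, 145  # axial slice size after registration (from the text)
for _ in range(3):          # three conv + ReLU + BN + max-pool blocks
    h, w = conv_out(h, w)   # 'same' convolution keeps the spatial size
    h, w = pool_out(h, w)   # pooling roughly halves it

print(h, w)  # 15 18
```

Under these assumptions, the final 15×18 feature maps would then be flattened into the two fully connected layers.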

Classification using support vector machine (SVM)
SVM is a widely applied supervised learning method suited to small, high-dimensional data; it finds a maximal-margin hyperplane to separate classes and solve a binary classification problem (35). SVM is considered better to use than the softmax layer, as mentioned in previously published studies (36; 37). The trained FA-CNN, MD-CNN, and GM-CNN are adopted to extract the features. These features are then transmitted to the SVM classifier instead of the softmax layer for AD classification. The features extracted from the FA, MD, and GM images form a matrix whose size is the number of slices multiplied by the number of features selected from each slice. For the 32 slices of each subject, the feature representation has dimension 32×2. For all 100 subjects, the output of each model is a set of 100 matrices of size 32×2, which are then concatenated into a total feature matrix of dimension 3200×2. The SVM classifier is trained and tested using these deep extracted features, as shown in figure 2.

Fig. 2 The pipeline of the proposed GM-CNN with SVM method to distinguish between AD and CN.

Multi-modality MRI fusion process.
The automatic AD screening fusion algorithm developed using multi-modality MRI is illustrated in figure 1. The three optimal CNNs (FA-CNN, MD-CNN, and GM-CNN) are used to extract features. We tried several fusion experiments, (FA and MD), (FA and GM), (MD and GM), and (FA and MD and GM), to choose the best model score. The fusion process consists of merging the features extracted from FA, MD, and GM into a global feature vector. Accordingly, the size of the fused FA + MD + GM feature matrix is 3200×6.
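Under the dimensions stated above (100 subjects, 32 slices each, 2 features per slice per model), the fusion step amounts to a column-wise concatenation; a minimal sketch with placeholder feature matrices:

```python
import numpy as np

n_subjects, n_slices, n_feats = 100, 32, 2  # per the text

# Placeholder outputs standing in for the FA-CNN, MD-CNN, and GM-CNN
# feature extractors (each yields one 3200 x 2 matrix).
fa_feats = np.zeros((n_subjects * n_slices, n_feats))
md_feats = np.ones((n_subjects * n_slices, n_feats))
gm_feats = np.full((n_subjects * n_slices, n_feats), 2.0)

# Feature-level fusion: concatenate along the feature axis.
fused = np.hstack([fa_feats, md_feats, gm_feats])
print(fused.shape)  # (3200, 6)
```

Dropping one of the three matrices from the `hstack` call yields the pairwise 3200×4 variants (FA+MD, FA+GM, MD+GM).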

Experiments
In this work, several experiments are carried out to validate the effectiveness of our proposed method in classifying (AD vs. CN), (CN vs. MCI), and (AD vs. MCI). In the first experiment, we performed a direct unimodal classification of features extracted from FA, MD, and GM. This gives us information about the best modality and map. In the second experiment, we study whether multi-modality increases performance and allows better discrimination between the different classes. This is achieved by studying the impact of merging the features of two modalities. The proposed 2DCNN-SVM was implemented using MATLAB ver. R2019a running on a 3.1 GHz Intel i7 processor with 16 GB of RAM. The CNN model was trained using stochastic gradient descent with momentum (SGDM), with the back-propagation algorithm and cross-entropy as the loss function. The batch size is 64 and the learning rate is 0.0001 for 25 epochs. There is a total of 3200 images for each map (FA, MD, and GM), 1600 images per class. The dataset is divided into 70% for training, 15% for validation, and the remaining 15% for testing the SVM. The same CNN architecture is used to train the FA, MD, and GM slices. For the SVM classifier, the extracted data is categorized into training, validation, and test data. We used the extracted features from 2720 images for training and 480 images for testing.
The best SVM classification score, using a radial basis function (RBF, Gaussian) kernel, was obtained by 10-fold cross-validation. The optimal hyperparameters (cost and gamma) were determined using the grid-search technique, which finds the best model among different combinations of parameters; the cost controls the error penalty, and gamma controls the curvature of the decision boundary.
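The RBF kernel and the grid-search loop can be sketched as follows; `cv_score` is a hypothetical stand-in for the 10-fold cross-validated SVM accuracy, not part of the authors' MATLAB pipeline:

```python
import numpy as np
from itertools import product

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def grid_search(cv_score, costs, gammas):
    """Exhaustively score every (cost, gamma) pair and keep the best one."""
    return max(product(costs, gammas), key=lambda cg: cv_score(*cg))

# Small demo: two points at distance 1 under gamma = 0.5.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X, gamma=0.5)

# Toy score peaking at cost = 1, gamma = 0.1 (stand-in for CV accuracy).
best = grid_search(lambda c, g: -(c - 1.0) ** 2 - (g - 0.1) ** 2,
                   costs=[0.1, 1.0, 10.0], gammas=[0.01, 0.1, 1.0])
```

Larger gamma values make `K` decay faster with distance, i.e. a more curved decision boundary, which is why gamma is tuned jointly with the cost.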

Evaluation
The performance of our method was validated using the accuracy and the area under the receiver operating characteristic curve (AUC). The validation results are illustrated in table 3, and the ROC curves of the 10-fold cross-validation are shown in figures 3, 4, and 5. The fused FA, MD, and GM improved the results and outperformed the single-modality and sMRI+MD fusion procedures adopted in many previous studies (26; 29; 30). We tested our method using 240 AD images, 240 CN images, and 240 MCI images. The evaluation metrics used are the accuracy, sensitivity, and specificity, determined from the confusion matrices. In the confusion matrices, the sensitivity is shown in the last row and the specificity in the last column. The diagonal boxes indicate the numbers and percentages of correctly classified samples, and the last box shows the overall accuracy of the model. Examples of the confusion matrices of the fused FA, MD, and GM features are shown in figures 6, 7, and 8, and all test results are summarized in Table 4. Table 4 shows that FA, MD, and GM are all important to discriminate the different AD stages. Using FA, MD, and GM independently, we report that MD obtained the best result in the case of AD vs. CN with an accuracy of 98.96%. However, GM yields better results in classifying AD vs. MCI and CN vs. MCI, with accuracies of 96.88% and 93.50% respectively.
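The metrics read off these confusion matrices can be computed as follows (the counts below are illustrative placeholders, not the paper's actual matrix):

```python
import numpy as np

def binary_metrics(cm):
    """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix
    laid out as [[TN, FP], [FN, TP]] (negative class first)."""
    tn, fp = cm[0]
    fn, tp = cm[1]
    acc = (tp + tn) / cm.sum()
    sens = tp / (tp + fn)  # true-positive rate (last row of the figure)
    spec = tn / (tn + fp)  # true-negative rate (last column of the figure)
    return acc, sens, spec

# Illustrative counts for a 240-vs-240 test set (not the reported matrix).
cm = np.array([[235, 5], [4, 236]])
acc, sens, spec = binary_metrics(cm)
```

With these placeholder counts the function returns an accuracy of 471/480, a sensitivity of 236/240, and a specificity of 235/240.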
We investigated the best combination of features (FA and MD, FA and GM, and MD and GM). Fused FA and MD outperformed the other combined features with accuracies of 99.98% and 98.33% in classifying AD vs. CN and AD vs. MCI. On the other hand, fused GM and MD achieved higher results in classifying CN vs. MCI, with an accuracy of 97.00%, a sensitivity of 97.20%, and a

Discussion
To validate the performance and efficiency of our novel workflow, we compared it to the previous approaches presented in the literature and dealing with the same databases (ADNI) and the same modalities (sMRI and DTI).
Our results gained higher accuracy in AD detection compared to other studies, as shown in Table 5.
In general, our results concerning early AD detection imply the existence of distinct pathophysiological processes. In fact, the hippocampus is known to be one of the earliest and most severely damaged structures affected by AD. However, other structures are involved in AD detection, such as the amygdala, thalamus, and putamen. The selection of relevant slices seems a more powerful and easier method than segmenting the hippocampus or other brain regions, which requires a human expert. Our network learns the complex patterns of brain atrophy from relevant sections that contain almost all of the AD-affected regions mentioned in the literature, for each patient. This eliminates the process of segmenting the hippocampus and other regions of the brain. Moreover, a subsequent selection of the most discriminating characteristics is avoided in our approach.
Our results confirm the effectiveness of the DTI measurements FA and MD in the classification of AD vs. CN, AD vs. MCI, and CN vs. MCI, which is consistent with previous works (19; 18). In addition, the GM atrophy in sMRI is of great interest to researchers for early AD detection. sMRI-based transfer learning has shown impressive results (23; 25). Generally, the VGG16 and VGG19 models have gained higher accuracy than other pre-trained models (24). Recently, some authors (26; 27) succeeded in using a pre-trained VGG16 model for automatic feature extraction and an SVM for classification, achieving higher accuracy. However, the transfer learning technique generally relies on natural images, with models trained on the ImageNet database (39). Conversely, our simple networks learn and extract the most pertinent features from scratch.
In the past few years, multi-modality (DTI-MD and sMRI) approaches have been reported by many researchers, who proposed different combination techniques to ensure the best classification. Aderghal et al. (29) suggested a transfer learning technique to perform the fusion process, and Marzban et al. (30) adopted a cascaded CNN. However, they achieved lower accuracy than ours, which is over 97%. This is probably due to the smaller sample size we used compared to theirs, the fact that we did not work on a specific ROI, or the impact of adding FA.
In summary, both the diffusion scalar metrics and the GM are powerful and important elements for AD stage discrimination. The multi-modality fusion process (FA+MD+GM) appears to be the best technique to improve AD classification performance.

Conclusion
In this paper, we have proposed a 2DCNN-SVM classification approach based on DTI scalar metrics (FA and MD) and GM segmented from T1w images from the ADNI database for AD detection and diagnosis. The fusion of features extracted from FA, MD, and GM by the proposed 2DCNN demonstrates the effectiveness of our method, achieving classification accuracies of 99.79%, 99.85%, and 97.00% for AD/CN, AD/MCI, and CN/MCI respectively. In conclusion, the use of DTI-FA, DTI-MD, and GM separately gives lower results than fusing them together.
• Authors' contributions: All authors were involved in the work leading up to the manuscript. All sources used are properly disclosed (correct citations).

Fig. 9
Fig. 9 Comparison of the performance of the proposed technique for binary classification of AD vs. CN, AD vs. MCI, and CN vs. MCI.

Table 1
Sample sizes resulting from the preprocessing and slice-selection process.

Table 2
Layer properties of the proposed 2DCNN architecture.

Table 3
The performance of the validation dataset.
Confusion matrix of AD vs. CN.

Table 4
Confusion matrix of CN vs. MCI.
Performance evaluation of the proposed 2DCNN-SVM technique on the test dataset.

Table 5
Comparison of results with state-of-the-art techniques applied to AD detection.