DeepCurvMRI: Deep Convolutional Curvelet Transform-Based MRI Approach for Early Detection of Alzheimer’s Disease

Alzheimer’s Disease (AD) is the most common form of dementia. It usually manifests through progressive loss of cognitive function and memory, subsequently impairing the person’s ability to live without assistance and causing a tremendous impact on the affected individuals and society. Currently, AD diagnosis relies on cognitive tests, blood tests, behavior assessments, brain imaging, and medical history analysis. However, these procedures are subjective and inconsistent, making an accurate prediction for the early stages of AD difficult. This paper introduces a curvelet transform (CT)-based convolutional neural network (CNN) model, DeepCurvMRI, for improving the accuracy of early-stage AD detection from magnetic resonance imaging (MRI) images. The MRI images were first pre-processed using CT, and then a CNN model was trained using the new image representation. In this study, we used the Alzheimer’s MRI images dataset hosted on the Kaggle platform to train DeepCurvMRI for multi-class and binary classification tasks. DeepCurvMRI achieved an accuracy, sensitivity, specificity, and F1 score of $98.62\% \pm 0.10\%$, $99.05\% \pm 0.10\%$, $98.50\% \pm 0.03\%$, and $99.21 \pm 0.08$, respectively, using the leave-one-group-out (LOGO) cross-validation approach in the multi-class classification task. The highest accuracy obtained in binary classification is $98.71\% \pm 0.05\%$. In addition to LOGO, DeepCurvMRI was tested using randomly stratified 10-fold and 5-fold cross validation.
These encouraging results are superior to those reported for related methods, showcasing the potential of DeepCurvMRI to capture the key anatomical changes in MRI images that differentiate between the various stages of Alzheimer’s disease.


I. INTRODUCTION
Alzheimer's disease (AD), the most common type of dementia, is a neurodegenerative disease that deteriorates brain connections, leading to memory impairment and decline in other cognitive functions [1]. In 2018, it was estimated that over 50 million people worldwide were living with dementia, and this number is expected to reach 152 million by 2050 [2]. The average life expectancy after AD diagnosis is 3-9 years [3], as currently, there is no cure for AD, and in the past 20 years, the Food and Drug Administration (FDA) has approved only two types of drugs to treat some symptoms of AD [4]. The progression of AD can be divided into two stages: Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD). The MCI stage can be subdivided further into Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI). Individuals with MCI face a significant risk of progressing into the late stages of Alzheimer's [5]. MCI patients experience a mild decline in memory and other cognitive functions. At a later stage, the patient would be unable to respond to the environment or carry on a conversation. Therefore, early AD detection would significantly contribute to preventive treatment and help delay cognitive deterioration [6]. Accurate diagnosis of the disease requires a series of examinations: cognitive tests, blood tests, behavior assessments, brain imaging, and medical history analysis [7], [8], [9]. However, the current examination relies explicitly on behavioral assessments and the patient's medical history as pieces of evidence, both of which demand multiple testing sessions by expert doctors over a long period. This increases the diagnosis cost and introduces subjectivity and variability into the diagnostic outcome [10]. As a result, a more efficient and cost-effective diagnostic system is crucial.

The associate editor coordinating the review of this manuscript and approving it for publication was Marco Giannelli.
Recently, with the advancement in technology, several imaging techniques have been developed, such as Magnetic Resonance Imaging (MRI) [11], Positron Emission Tomography (PET) [12], and Computed Tomography (CT) [13]. These techniques are non-invasive, rapid, and accurate, and are widely used to obtain additional information for AD diagnosis. At the same time, artificial intelligence (AI) has developed significantly in recent years and offers substantial advantages in computer-based diagnostic systems [14], [15], [16]. Over the past years, various efficient machine learning (ML) algorithms have been designed to improve disease diagnosis accuracy [17]. Research interests in this domain include both Support Vector Machine (SVM) [18], [19], [20], [21] and Deep Learning (DL) models [22]. SVM and regular neural networks have been criticized for their poor classification performance when trained on raw, un-preprocessed data [23], [24], [25]. A series of feature preprocessing algorithms combined with the classifier is necessary for improving the classifier accuracy. For example, Kamal et al. [26] preprocessed MRI images using an adaptive mean filter and histogram equalization. Afterwards, image features were extracted using the Haar Transform for the binary classification of AD using SVM. Additionally, Wang et al. [27] trained AdaBoost as a classifier for AD diagnosis, while intermediate features were processed and selected from brain gray-matter images using kernel principal component analysis (KPCA). In short, such methods heavily depend on a series of feature processing algorithms to improve the ML performance.
In contrast, DL models, such as Convolutional Neural Networks (CNNs) [28], [29], Recurrent Neural Networks (RNNs) [30], and Multi-Layer Perceptrons (MLP) [31], can take the raw data as input and find discriminating features in the training dataset during model training. CNN models are frequently used to extract features from PET or MRI images for their ability to detect essential attributes accurately and automatically with high processing speed. MRI images are easier to access than PET images, as they require less processing time and are less expensive. AlSaeed and Omar [32] utilized ResNet50, a pre-trained CNN model, to automatically extract AD diagnosis features from MRI images. They obtained an accuracy ranging from 73% to 99%. Hogan and Christoforou [33] developed a 3D CNN model to identify biological markers of AD from MRI images, giving an accuracy of 80% on the testing dataset. Moreover, it has been shown that a feature extraction approach in combination with CNN classification can improve the final prediction result and decrease the training time compared to the ML approach [34], [35], [36], [37].
Recently, Anitha et al. [38] proposed a WT-CNN model for AD image classification, in which wavelet transform (WT) is applied as a feature extraction method prior to training the CNN model. The WT-CNN model achieved an accuracy of 91.87%, which is 1.63% higher than the CNN model. WT can detect features overlooked by other feature extraction methods, such as breakdown points and discontinuities. Several other studies have also utilized WT as a tool for feature extraction in the form of wavelet coefficients from MRI images [26], [39], [40], [41]. However, WT's major limitation is its inability to identify curved edges, which in some cases causes false alarms. A more advanced approach is utilizing Curvelet Transform (CT) as a feature extraction method for its ability to capture both linear and curved edges along multiple scales and orientations [42]. In this regard, several studies have applied CT in various computer vision tasks, namely tumor detection [43], [44], image segmentation [45], [46], [47], signature verification [48], [49], and face recognition [50], [51], [52]. However, despite its advantages, only a limited number of studies have reported using CT as a feature extraction tool for AD detection using MRI images [53], [54].
In this article, we propose a novel CT-based CNN model named DeepCurvMRI that improves AD stage prediction accuracy using MRI images. The model incorporates Fast Discrete CT (FDCT) for feature extraction across multiple scales and orientations, followed by a shallow CNN for multi-class classification (Non-Demented (ND) vs. Very Mild Demented (VMD) vs. Mild Demented (MID) vs. Moderate Demented (MOD)) and binary classification (ND vs. VMD). The major contributions of the paper are summarized as follows:
• A novel Curvelet Transform-based Convolutional Neural Network approach is proposed, which provides a more effective and faster method for AD diagnosis.
• Fast Discrete Curvelet Transform is applied as a feature extraction tool for AD MRI image classification for the first time.
• In comparison with other models, DeepCurvMRI requires fewer training parameters and gives a high classification accuracy in a short period.
• DeepCurvMRI shows better accuracy in comparison to VGG-16 and AlexNet.
The rest of the paper is organized as follows. Section II provides details of the data used in this research and introduces the proposed DeepCurvMRI approach. Section III evaluates the performance of DeepCurvMRI and discusses the results. Section IV concludes the paper.

II. THE PROPOSED DeepCurvMRI FRAMEWORK
The overall flow of the DeepCurvMRI is illustrated in Fig. 1. The model consists of three main steps: Data pre-processing, feature extraction using Curvelet Transform, and classification. Each step is discussed below.

A. DATA DESCRIPTION AND PRE-PROCESSING
The AD MRI dataset used here was obtained from the open-source platform Kaggle and consists of 6400 MRI images of four classes, i.e., Non-Demented (ND), Very Mild Demented (VMD), Mild Demented (MID), and Moderate Demented (MOD). The dataset contains 200 subjects, with 32 horizontal slices of the brain for each subject. To avoid information leakage, the training and testing sets were merged, and leave-one-group-out and k-fold cross validation were performed. The original image size is 176 × 208. All images were resized to 208 × 208. Fig. 2 shows typical brain MRI samples for each class. Table 1 provides the distribution of images across the classes in the obtained dataset.
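The leakage concern above can be made concrete with a minimal sketch of subject-wise splitting. It assumes, purely for illustration, that the 6400 slices are stored subject by subject (32 consecutive slices each); the paper does not state the storage order, and the helper names `make_group_ids` and `leave_one_group_out` are hypothetical.

```python
import numpy as np

def make_group_ids(n_subjects=200, slices_per_subject=32):
    # Assumption: slices are ordered subject by subject, so slice i
    # belongs to subject i // slices_per_subject.
    return np.repeat(np.arange(n_subjects), slices_per_subject)

def leave_one_group_out(groups):
    # Yield (train_idx, test_idx) pairs in which all slices of exactly
    # one subject form the test set and never appear in training.
    indices = np.arange(len(groups))
    for g in np.unique(groups):
        test_mask = groups == g
        yield indices[~test_mask], indices[test_mask]

groups = make_group_ids()
splits = list(leave_one_group_out(groups))
```

Because every slice of a held-out subject is excluded from training, near-identical neighboring slices of the same brain cannot appear on both sides of a split.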

B. FEATURE EXTRACTION USING FAST DISCRETE CURVELET TRANSFORM
Curvelet is an excellent Multiscale Geometric Analysis (MGA) approach. CT preserves the same decomposition benefits reported with the WT, but has the additional advantage of compact representation of edges and singularities on curves along multiple scales and directions [42], [55]. As a matter of fact, CT is commonly used to obtain sparse representations of smooth objects with discontinuities along curves. In this work, Fast Discrete Curvelet Transform (FDCT) is applied to AD MRI images to detect low-level features and reduce roughness and noise amplification within the decomposed images. This allows the detection of local and regional differences in brain images between AD and control subjects. The input to FDCT is a Cartesian array $f[x_1, x_2]$ (representing an image), where $0 \le x_1, x_2 < n$. This results in a collection of curvelet coefficients computed in the 2D Fourier plane, as indicated in equation 1,

$$c(j, l, k) = \sum_{0 \le x_1, x_2 < n} f[x_1, x_2]\, \overline{\varphi_{j,l,k}[x_1, x_2]} \tag{1}$$

in which $\varphi_{j,l,k}$ denotes the curvelet basis function indexed by orientation $l$, scale $j$, and spatial positions $k = (k_1, k_2)$.
In the theory of CT, two main approaches can be used to obtain the curvelet coefficients, namely the Unequally Spaced Fast Fourier Transform (USFFT) method and the Wrapping-based method [55]. USFFT generates the coefficients by sampling the Fourier image samples in an irregular manner, making the frequency curvelet response appear to be a trapezoidal wedge. Furthermore, the coefficients for all scales and orientations are generated in ascending order. With the Wrapping-based method, on the other hand, the wedge is wrapped into a rectangular shape to perform the inverse Fourier transform. The wrapping is applied via periodic tiling of the spectrum using the rectangular wedge to collect the coefficients. Both Wrapping-based and USFFT methods produce identical results. However, the Wrapping-based method is applied here, as it is more time efficient and requires fewer computational resources than USFFT. Fig. 3 illustrates the curvelet wrapping architecture. If $\hat{f}[x_1, x_2]$ denotes the 2D discrete Fourier transform of the Cartesian array, then the construction of the FDCT via wrapping is as follows:
1) Apply the 2D Fast Fourier Transform to generate the Fourier samples $\hat{f}[x_1, x_2]$.
2) If $\tilde{U}_{j,l}[x_1, x_2]$ is the discrete localizing window, then form the product $\tilde{U}_{j,l}[x_1, x_2]\,\hat{f}[x_1, x_2]$ for each scale ($j$) and orientation ($l$).
3) Wrap the product around the origin to obtain $\tilde{f}_{j,l}[x_1, x_2] = W(\tilde{U}_{j,l}\hat{f})[x_1, x_2]$, where the ranges for $x_1$ and $x_2$ are $0 \le x_1 < 2^{j}$ and $0 \le x_2 < 2^{j/2}$, respectively.
4) Apply the inverse 2D Fast Fourier Transform to all $\tilde{f}_{j,l}$ in order to obtain the discrete curvelet coefficients $c(j, l, k)$, with $k = (k_1, k_2)$.
The number of scales represents resolution, which depends on the size of the original image. The maximum image size in the dataset is 176 × 208 pixels; thus, the maximum number of scales to be used is 5. Scales 1 and 5 contain 1 orientation each, scale 2 contains 16, and scales 3 and 4 contain 32 each, giving a total of 82 subbands. Figure 4 illustrates the decomposed images for an ND brain (Fig. 4(a)) and a VMD brain (Fig. 4(c)).
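The scale and subband counts quoted above can be reproduced with a short arithmetic sketch. It assumes the defaults commonly used by wrapping-based FDCT implementations, which the paper does not spell out: the number of scales is ceil(log2(min(height, width)) − 3), the second scale has 16 orientations, the orientation count doubles at every other scale, and the finest scale is a single wavelet-like band. The function names are hypothetical.

```python
import math

def default_num_scales(height, width):
    # Assumed default: ceil(log2(min(height, width)) - 3).
    return math.ceil(math.log2(min(height, width)) - 3)

def orientations_per_scale(num_scales, coarse_angles=16):
    # Scale 1 (coarsest) is a single low-pass band.
    counts = [1]
    # Intermediate scales: the angle count doubles at every other scale.
    for j in range(2, num_scales):
        counts.append(coarse_angles * 2 ** math.ceil((j - 2) / 2))
    # Assumption: the finest scale is represented by one wavelet-like band.
    counts.append(1)
    return counts

num_scales = default_num_scales(176, 208)
counts = orientations_per_scale(num_scales)
total_subbands = sum(counts)
```

Under these assumptions the sketch gives 5 scales with 1, 16, 32, 32, and 1 orientations, matching the total of 82 subbands stated in the text.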
The Cartesian concentric coronae are characterized by the coarse level, the dyadic spatial square, and the fine levels surrounding the center, representing higher frequencies. The selection of specific scales and orientations is essential to avoid redundancy in information. The first scale of MRI images in the curvelet domain corresponds to the general information in the images. As the scale increases, the noise content increases. Hence, for this type of image, it is sufficient to utilize the coarse scale, as the original image resolution is low. Thus, increasing scales does not necessarily lead to improvements in the classification accuracy.

VOLUME 11, 2023

C. CURVELET DOMAIN DENOISING USING KURTOSIS
Due to its multiscale and multidirectional advantages, CT is an effective tool for extracting meaningful information and suppressing noise in seismic data [56], [57]. Before applying deep learning, it is necessary to remove coefficients associated with noise by setting a proper threshold. Signals produced by curvelet transform are normally found in lower scales and specific orientations, while noise can be distributed over all scales and directions. As the scale increases, the noise present within the curvelet matrices increases. Thus, curvelet coefficients can be processed with a scale-dependent threshold as follows:

$$\hat{c}(j, l, k) = \begin{cases} 0, & |c(j, l, k)| < Thr_j \\ c(j, l, k), & \text{else} \end{cases} \tag{5}$$

where $\hat{c}(j, l, k)$ denotes the thresholded curvelet coefficients and $Thr_j$ is the threshold value. In curvelet domain denoising, a crucial step is to estimate a threshold from the noisy curvelet coefficients. Donoho and Johnstone [58] proposed a multi-scale threshold value ($Thr_j$) expressed in terms of $\sigma$, $N$, and $J$, where $\sigma$ is the standard deviation, $N$ is the total number of coefficients, and $J$ denotes the total number of decomposition scales. Lin et al. [59] improved on the above threshold value and proposed incorporating kurtosis statistics. Kurtosis is a measure of the non-Gaussian character of a random variable.
Noise is generically non-Gaussian in nature.
In image processing, noise found in images can be highly non-Gaussian. Thus, one possible way to remove noisy coefficients is by thresholding based on kurtosis. A high kurtosis value indicates the presence of coefficients that carry crucial information, while a low kurtosis value indicates noise. The multi-scale threshold is therefore weighted according to the kurtosis matrix of the coefficients, using $K(k)$, the kurtosis of the curvelet coefficients calculated over a sample block, and $K_{max}(k)$, the maximum kurtosis found among all sample blocks [59]. Kurtosis is calculated over a sample block using the following equation:

$$K = \frac{E\left[(x - \mu)^4\right]}{\xi^4}$$

where $x$ denotes a curvelet coefficient, $E$ is the expectation, $\mu$ is the mean, and $\xi$ is the standard deviation. A sliding window with a size of (3,3) is applied in this work. Fig. 5 represents the reconstructed curvelet coefficients of an AD MRI image in the coarse scale before and after kurtosis thresholding.
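As an illustration of the two building blocks described above, the following sketch computes the block-wise kurtosis $E[(x-\mu)^4]/\xi^4$ over a 3 × 3 sliding window and applies the hard threshold of equation (5). It is not the authors' implementation: the functions `local_kurtosis` and `hard_threshold` are hypothetical names, windows are taken without padding (an assumption), and the kurtosis-weighted threshold of Lin et al. [59] is deliberately left out, so the threshold is passed in as a plain number.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_kurtosis(coeffs, window=(3, 3)):
    # Kurtosis E[(x - mu)^4] / xi^4 of every full window (no padding),
    # so the output is smaller than the input by (window - 1) per axis.
    coeffs = np.asarray(coeffs, dtype=float)
    blocks = sliding_window_view(coeffs, window)
    mu = blocks.mean(axis=(-2, -1), keepdims=True)
    centered = blocks - mu
    var = (centered ** 2).mean(axis=(-2, -1))
    fourth = (centered ** 4).mean(axis=(-2, -1))
    out = np.zeros_like(var)
    # Constant blocks have zero variance; their kurtosis is reported as 0.
    np.divide(fourth, var ** 2, out=out, where=var > 0)
    return out

def hard_threshold(coeffs, thr):
    # Equation (5): zero every coefficient whose magnitude is below thr.
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) < thr, 0.0, coeffs)

block = np.zeros((3, 3))
block[1, 1] = 1.0
k_map = local_kurtosis(block)
kept = hard_threshold([0.5, -2.0, 3.0], thr=1.0)
```

A block with a single isolated non-zero coefficient yields a high kurtosis, which is consistent with the text's reading of high kurtosis as structure rather than noise.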

D. CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORKS
After feature extraction with Curvelet Transform, specific angles and subbands are fed into a CNN model to determine the areas most affected by Alzheimer's within the MRI images. To classify Alzheimer's stages, the CNN model is built from scratch. The proposed model consists of two convolutional layers with the Rectified Linear Unit (ReLU) activation function, two batch normalization layers, two max-pooling layers, a global average pooling layer, a dropout layer, a dense layer, and a Softmax classification layer. Fig. 6 illustrates the architecture of the DeepCurvMRI model. Table 2 presents the DeepCurvMRI architecture details. The DeepCurvMRI parameters are listed in Table 3.

1) INPUT LAYER
The input layer is the first layer in the DeepCurvMRI model, where the thresholded curvelet matrices at scale 1 for all images are given as input. The thresholded curvelet matrices at the coarse scale have a size of 17 × 17, thus making the input image size 17 × 17.
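The 17 × 17 input size can be cross-checked with a one-line calculation. The sizing rule below is an assumption about how wrapping-based FDCT implementations size the coarsest band (a centered low-pass window of half-width $m = n / (3 \cdot 2^{J-1})$ and side length $2\lfloor 2m \rfloor + 1$); it is not stated in the paper, and `coarse_band_size` is a hypothetical helper.

```python
def coarse_band_size(n, num_scales):
    # Assumed rule: half-width m = n / (3 * 2**(num_scales - 1)) and
    # side length 2 * floor(2 * m) + 1. Integer division computes
    # floor(2 * m) exactly, without floating-point rounding.
    return 2 * ((2 * n) // (3 * 2 ** (num_scales - 1))) + 1

side = coarse_band_size(208, 5)
```

For the resized 208 × 208 images and 5 scales, this rule reproduces the 17 × 17 coarse-scale matrix reported above.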

2) CONVOLUTIONAL LAYERS
Convolutional layers are the primary building block of the DeepCurvMRI model. They receive an image as an input and convolve it with filters to produce an output image. The output at each channel is known as a feature response or map, and it is calculated as follows:

$$x_n = I * W_n + B_n, \quad n = 1, 2, \ldots, F$$

where $*$ denotes convolution, $I$ is the input, $x_n$ is the output of the $n$th filter, $W_n$ is the weight of the $n$th filter, $B_n$ is the bias of the $n$th filter, and $F$ is the number of filters. In the DeepCurvMRI model, the numbers of filters for the first and second convolutions are 9 and 12, respectively, and both have a filter size of 3 × 3 (see Table 2). In this work, the ReLU activation function has been applied directly to the feature map output.
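The feature-map formula can be illustrated with a small NumPy sketch for a single-channel input. It assumes no padding and a stride of 1, neither of which is given in the text, and, like most deep learning frameworks, it slides the filter without flipping it (cross-correlation). The function name `conv2d_relu` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_relu(image, weights, biases):
    # image: (H, W); weights: (F, kh, kw); biases: (F,)
    # Returns F feature maps of shape (H - kh + 1, W - kw + 1).
    _, kh, kw = weights.shape
    windows = sliding_window_view(image, (kh, kw))   # (H', W', kh, kw)
    # x_n = I * W_n + B_n for every filter n, evaluated at every window.
    maps = np.einsum("hwij,fij->fhw", windows, weights)
    maps = maps + biases[:, None, None]
    # ReLU applied directly to the feature maps.
    return np.maximum(maps, 0.0)

image = np.ones((17, 17))
weights = np.ones((9, 3, 3))     # 9 filters of size 3 x 3, as in the first layer
feature_maps = conv2d_relu(image, weights, np.zeros(9))
```

With these assumptions, a 17 × 17 input and nine 3 × 3 filters give nine 15 × 15 feature maps; with "same" padding the spatial size would instead stay at 17 × 17.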

3) BATCH NORMALIZATION LAYERS
Batch normalization layers are used after convolutional layers to reduce the effect of initialization and speed up the process of training by recentering and rescaling. Batch normalization applies a transformation that keeps the output standard deviation close to 1 and the mean output close to 0. The values are normalized according to the following equation:

$$y_i = \alpha \hat{x}_i + \beta$$

in which $y_i$ represents the output values, $\hat{x}_i$ the normalized input values, $\alpha$ the scale, and $\beta$ the offset factor.
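A minimal training-mode sketch of this transformation follows. The small constant `eps` added to the variance is a standard numerical safeguard that is assumed here rather than taken from the paper, and `batch_norm` is a hypothetical name.

```python
import numpy as np

def batch_norm(x, alpha=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature (column) over the batch axis, then apply
    # the learnable scale alpha and offset beta: y = alpha * x_hat + beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return alpha * x_hat + beta

batch = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
normalized = batch_norm(batch)
```

With `alpha = 1` and `beta = 0`, each output feature has a mean of approximately 0 and a standard deviation of approximately 1, as described above.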

4) POOLING LAYERS
Pooling layers are applied after convolution layers to reduce the size of the feature map, thus decreasing the number of parameters required and lowering the computational cost. Max-pooling layers pick the maximum pixel values in the region of the feature map covered by the kernel. The result is a feature map containing the most prominent features of the convolutional layer's output feature map. Maximum pooling is computed from the input $X$, the max-pooling window size $F$, and the stride $s$. After max pooling, a global average pool (GAP) is applied to reduce the dimensions of the feature map and produce a 2D feature vector. Unlike a flatten layer, GAP considers the spatial information, making it more robust to spatial translations of the input.
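The two pooling operations can be sketched in NumPy as follows. The sketch assumes no padding, in which case each spatial dimension of the max-pooled output has length floor((X − F) / s) + 1; the names `max_pool2d` and `global_average_pool` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool2d(x, size, stride):
    # x: (..., H, W). Take all size x size windows over the last two axes,
    # keep every stride-th window, and reduce each one to its maximum.
    windows = sliding_window_view(x, (size, size), axis=(-2, -1))
    windows = windows[..., ::stride, ::stride, :, :]
    return windows.max(axis=(-2, -1))

def global_average_pool(x):
    # Collapse each (H, W) feature map to its mean value.
    return x.mean(axis=(-2, -1))

feature_map = np.arange(16.0).reshape(4, 4)
pooled = max_pool2d(feature_map, size=2, stride=2)
gap = global_average_pool(np.stack([feature_map, 2 * feature_map]))
```

Here GAP turns a stack of two feature maps into a vector with one value per map, which is what is passed on to the dropout and dense layers.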

5) DROPOUT LAYER
Dropout is performed to randomly drop a fraction of the neurons in the GAP layer to prevent some variables from being repeatedly accepted during training. This layer also aids in reducing over-fitting. The dropout value applied here is 0.5.

6) DENSE LAYER
The fully connected or dense layer is a standard feedforward layered network that includes input neurons, hidden neurons, and a softmax regression unit. The output of the GAP layer is fed to the dense layer. In the proposed method, 64 neurons are used in the dense layer, after which a softmax layer is applied to classify the classes. The softmax layer generates the probability distribution of the classification results for each input image.
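For reference, the softmax step can be written in a few lines. The sketch subtracts the row-wise maximum before exponentiating, a standard stability trick that does not change the result; the logit values are made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability, then
    # normalize the exponentials so each row sums to 1.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# One row of illustrative logits for the four classes ND, VMD, MID, MOD.
logits = np.array([[2.0, 1.0, 0.5, -1.0]])
probs = softmax(logits)
predicted_class = int(probs.argmax(axis=-1)[0])
```

The predicted class is the index of the largest probability in each row.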

E. MODEL EVALUATION
Due to the limited number of patients in the dataset, DeepCurvMRI performance was assessed using two different methods: leave-one-group-out cross-validation (LOGOCV) and stratified k-fold cross validation. LOGOCV involves using $N_g - 1$ groups ($N_s = 200$ subjects) as training sets and one group as a test set for validation. This process is repeated $N_g$ times until each group has been used as the test set. On the other hand, stratified k-fold cross validation randomly selects a fraction $\frac{1}{k} \times 100\%$ of the data for testing purposes and uses the remaining data for training. The classification model is reinitialized in each iteration, and the subjects from the previous iteration are included in the training. This process is repeated for k iterations. To assure the robustness of the developed model, we performed the randomly stratified k-fold cross validation approach with two values of k, i.e., 10 and 5. We evaluated the performance of DeepCurvMRI using the following metrics: Accuracy, F1-score (F1), Specificity, and Sensitivity.

This subsection presents the experimental results of the proposed DeepCurvMRI model for the multiclass and binary classification of AD. The Adam optimizer was utilized with a learning rate of $10^{-3}$. The diagnostic abilities of the proposed DeepCurvMRI were evaluated on the aforementioned 200 subjects (see Section II) included in this study using a leave-one-group-out (LOGO) cross-validation approach and different k-fold approaches (5-fold and 10-fold). We evaluated the performance of our model using overall accuracy, sensitivity, specificity, and F1-score.
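The four metrics can be derived from a confusion matrix as sketched below. The paper does not say how per-class values are aggregated in the multi-class setting, so the one-vs-rest macro average used here is an assumption, and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # Rows index the true class, columns the predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def safe_div(num, den):
    # Element-wise division that returns 0 where the denominator is 0.
    out = np.zeros_like(num, dtype=float)
    np.divide(num, den, out=out, where=den > 0)
    return out

def summary_metrics(cm):
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    return {
        "accuracy": tp.sum() / cm.sum(),
        # Assumption: per-class one-vs-rest values are macro-averaged.
        "sensitivity": safe_div(tp, tp + fn).mean(),
        "specificity": safe_div(tn, tn + fp).mean(),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn).mean(),
    }

cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
metrics = summary_metrics(cm)
```

In a cross-validation setting, these values would be computed once per fold and then reported as a mean with its spread, in the form used for the results in this paper.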

B. DISCUSSION AND COMPARISON
This research aims to provide a detection system for Alzheimer's Disease stages using MRI images to improve the diagnostic accuracy in medical centers. The first step in the proposed model is applying curvelet transformation as a feature extraction function on the collected images. The next step is implementing kurtosis thresholding to remove curvelet coefficients associated with noise. Afterward, the images were reconstructed from the thresholded curvelet coefficients and were classified using a CNN classifier. The results in Table 5 demonstrate the ability of the curvelet transformation to capture the key anatomical changes in the brain MRI images, which can be utilized to differentiate between the ND, VMD, MID, and MOD classes. By feeding the deep learning network the output from the FDCT, a new space of the MRI representation is created, giving a high classification accuracy. Table 5 compares the performance of DeepCurvMRI with other models. The existing methods were trained for binary and multi-class classification on the same dataset utilized in this article. DeepCurvMRI is compared to models such as VGG-16 [60], AlexNet [64], DEMENT [65], SVM [61], HTLML [63], and Feed-forward LPQNet [62]. LPQNet achieved slightly higher accuracy than DeepCurvMRI for the binary classification of AD. However, cross-validation has yet to be applied in LPQNet to account for the possibility of leakage within the Kaggle dataset, as there are 32 MRI slices for each patient. DeepCurvMRI outperforms all the other models in terms of accuracy and F1-score, as evidenced by the results of classifying four classes with 51,797 parameters. The performance of DeepCurvMRI is 28.41% higher than VGG-16 and 4.62% higher than AlexNet, even though both of those models utilize millions of parameters. This is attributed to curvelet transformation and its ability to represent smooth objects with discontinuities along curves. A better image representation yields significantly better results within a shorter period.
Moreover, thresholding curvelet coefficients using kurtosis removes coefficients associated with noise, providing a more precise representation of the MRI images. Within the Kaggle dataset, a 0.35% increase in accuracy is observed for the multiclass classification of AD. Kurtosis thresholding can be more advantageous based on the clarity of the input images.

IV. CONCLUSION
This work proposes a curvelet-based CNN structure for the multi-class and binary classification of AD MRI images. DeepCurvMRI is trained and tested using the Kaggle database to classify Alzheimer's disease stages. FDCT with the wrapping method is used to decompose the MRI image into scales and sub-bands. The obtained curvelet coefficients are then processed and thresholded using kurtosis to extract prominent features. Our model achieved an overall accuracy, sensitivity, specificity, and F1 score of 98.62% ± 0.10%, 99.05% ± 0.10%, 98.50% ± 0.03%, and 99.21 ± 0.08, respectively, using LOGOCV for the multiclass classification of AD, and an accuracy, sensitivity, specificity, and F1 score of 98.71% ± 0.05%, 98.84% ± 0.03%, 98.50% ± 0.03%, and 99.25 ± 0.01, respectively, for the binary classification of ND/VMD. DeepCurvMRI surpassed the performance of the existing methods. Hence, the results showcase the potential of the proposed DeepCurvMRI to efficiently identify brain regions associated with AD in MRI images, serving as a fast and easy-to-implement tool for assisting physicians in AD diagnosis. As for future work, DeepCurvMRI will be trained and tested on various datasets for Alzheimer's disease diagnosis. Moreover, metadata such as clinical biomarkers and demographics can be included and combined to create a holistic approach to AD diagnosis.