A novel AI-based System for Detection and Severity Prediction of Dementia using MRI

Dementia is a symptom of Alzheimer’s Disease (A.D.) that affects many people around the globe each year. There is no effective cure for the disease, and it can be fatal if it is left undetected or untreated. In this paper, the authors propose a novel DCGAN-based Augmentation and Classification (D-BAC) model that identifies dementia in MRI scans and classifies it into categories according to its prominence and severity. The detection of the early onset of dementia, also referred to as Mild Cognitive Impairment (MCI), is also studied with the help of a novel GAN-augmented dataset. The proposed model predicts MCI with an accuracy of 74% and classifies dementia into four categories depending on its prominence in the MRI scan. The authors also use visual Explainable A.I. (XAI), specifically Grad-CAM, to visualize the internal working of the model. This helps verify the differentiating features of the MRI scans that the CNN model learns during training. Three datasets, namely the original dataset, a geometrically transformed dataset, and a GAN-augmented dataset, are used for performance analysis. The performance of the CNN model is compared across these datasets, and the superior results obtained with the novel GAN-augmented dataset are discussed. Progressive resizing is also applied to the GAN-augmented dataset, and several CNN architectures are evaluated to achieve better performance. The final model achieves a training accuracy of 97% and a testing accuracy of 82% with a conventional CNN architecture, and testing accuracies of 84% and 87% with the VGG-16 and VGG-19 architectures, respectively.


I. INTRODUCTION
Advancements in medicine have led to longer life expectancy around the world. However, a growing number of elderly citizens now have dementia. The symptoms of this disease include memory loss, impaired brain function, impairment in Activities of Daily Living (ADL), and difficulties in communication and expression [1]. Dementia typically begins as Mild Cognitive Impairment (MCI) in elderly patients and slowly progresses to a much more severe stage called Serious Dementia. Serious dementia is life-threatening, and there is still no effective cure for it. The only known way to detect and prevent the disease is at the early MCI stage. However, existing State-Of-The-Art (SOTA) models and systems fail to predict MCI accurately, so a substantial number of elderly patients globally progress to Serious Dementia [2]. This paper discusses dementia and its early detection, with the aim of preventing its progression to the later stages for which there is no cure. Estimates show that 32 million patients were affected by dementia in the 15 years between 2000 and 2015 [2]. It is further estimated that more than 135 million patients will be affected by dementia globally by 2050. In addition, the financial cost is substantial and unrealistically burdensome for both patients and hospitals. Research shows that timely intervention in the early stages of the disease could save about 7.9 trillion dollars globally [2]. It is therefore imperative to detect MCI and stop the disease from progressing.
Deep learning models such as Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN) are advantageous over conventional A.I. algorithms like Support Vector Machines (SVM) [32], hand-crafted feature learning machines [33], Ensemble Learning models [34], and Random Forest (RF) models [35], because these conventional algorithms require manual feature extraction, which is difficult and time-consuming. Despite the meticulous work involved, these models [32]-[35] do not guarantee SOTA results [13]. This is why it has become increasingly necessary to opt for A.I. algorithms like DNNs and CNNs for Computer Vision (CV) and image-processing tasks [14]. This paper proposes such a CNN model: it takes Magnetic Resonance Imaging (MRI) scans as input and applies operations such as convolution, pooling, and dropout to extract features from them without any manual intervention from the researchers. The extracted features are mapped to their corresponding class labels, from which MCI and the severity of dementia are detected.
Researchers in the medical domain face a significant problem: the unavailability of good-quality, class-balanced open-source datasets [15]. Such datasets are either not readily available for research, or their classes are highly imbalanced, with different numbers of samples per class [15]. This leads to underfitting and bias [15]. Researchers must often collect MRI scans manually from medical institutions and design their own dataset to implement a dementia classification model [16]. Medical institutions are hesitant to disclose the sensitive patient data required for balancing a dataset, so the class imbalance problem persists. Researchers therefore resort to conventional image augmentation techniques such as flipping, rotating, and scaling images to balance the dataset and convert it into a machine-learnable form [4]-[21], [23]-[30]. Although conventional augmentation works well for balancing datasets of other kinds of images, it is not recommended for medical images. Medical images need to maintain their spatial conformity, which is altered when they are cropped, rotated, or flipped. Such augmented images can be used for binary classification, i.e., when the model only distinguishes between A.D. and non-A.D. patients, but they do not work well when more than two classes are involved. A multi-class CNN model differentiates between MRI scans based on the tumor size present in the scan. When conventional augmentation is applied, this tumor can be improperly cropped or rescaled, resulting in mislabeled class samples.
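To make the cropping problem concrete, the following minimal NumPy sketch applies three conventional augmentations (flip, 90° rotation, and crop-and-rescale) to a toy 64×64 array. The function names and the toy "scan" with a small bright region near one edge are hypothetical illustrations, not the authors' data or code.

```python
import numpy as np

def horizontal_flip(img: np.ndarray) -> np.ndarray:
    """Mirror a 2-D scan left to right."""
    return img[:, ::-1]

def rotate_90(img: np.ndarray, k: int = 1) -> np.ndarray:
    """Rotate a 2-D scan by k * 90 degrees counter-clockwise."""
    return np.rot90(img, k)

def crop_and_rescale(img: np.ndarray, crop: int) -> np.ndarray:
    """Cut `crop` pixels from every border, then scale the remainder back
    to the original size with nearest-neighbour sampling (a zoom)."""
    h, w = img.shape
    cropped = img[crop:h - crop, crop:w - crop]
    rows = np.arange(h) * cropped.shape[0] // h
    cols = np.arange(w) * cropped.shape[1] // w
    return cropped[np.ix_(rows, cols)]

# Toy 64x64 "scan" with a small bright region of interest near one edge.
scan = np.zeros((64, 64), dtype=np.uint8)
scan[4:8, 4:8] = 255

flipped = horizontal_flip(scan)
rotated = rotate_90(scan)
zoomed = crop_and_rescale(scan, crop=8)

# Count how many pixels of the region survive each augmentation.
for name, img in [("original", scan), ("flipped", flipped),
                  ("rotated", rotated), ("zoomed", zoomed)]:
    print(name, int((img == 255).sum()))
# original 16, flipped 16, rotated 16, zoomed 0
```

Flipping and rotation keep the region but move it to a different location, while the crop-and-rescale "zoom" removes it entirely, so an image labeled by the presence or size of that region would end up mislabeled.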

D-BAC model:
The D-BAC (DCGAN-based Augmentation and Classification) model proposed in this paper addresses these issues with a GAN-based augmentation approach in which the spatial conformity of the augmented images remains unaltered. This helps train the CNN model smoothly and reduces the misclassification of image samples, which leads to better performance metric scores. The classified images are visualized using XAI to analyze the progression of dementia in a patient. Progressive resizing is also applied to the dataset to improve the performance metrics and to evaluate scalability across different CNN models. This paper makes the following contributions to the field of AI-assisted dementia research:
1. It addresses the problem of improper augmentation of medical images by conventional augmentation techniques, and proposes a novel class-balanced, multi-class dataset of MRI images augmented using a custom DCGAN.
2. It predicts the early onset of dementia (MCI), which is largely overlooked in the existing literature. Models are trained on all three datasets and compared using various metrics.
3. It proposes a multi-class CNN-based classifier that assigns dementia to four discrete classes according to the severity of the disease. Existing models mostly perform binary classification, which cannot predict the stage and severity of dementia.
4. It visualizes the features learned by the CNN models using Grad-CAM, giving a visual perspective of tumor progression in the MRI scans of patients with dementia.
5. It progressively resizes the samples of the novel dataset and applies different open-source architectures to them to improve the accuracy and metric scores of the proposed model.
The rest of the manuscript is organized into four sections. Section II reviews the existing literature and advancements in dementia detection. Section III describes the methodology and workflow proposed in this paper. Section IV discusses the results and their implications, and Section V concludes the paper and discusses the future scope of this research. The references used in this research are listed at the end.

II. LITERATURE SURVEY
Dementia is a disease that affects elderly citizens and is associated with a lack of exercise, an inconsistent routine, and limited social interaction. It directly affects the brain of a patient. Its symptoms can include memory loss, difficulty concentrating on a particular task, confusion leading to forgetfulness while performing daily tasks such as brushing teeth, bathing, and cleaning, and frequent mood swings. A primary method for detecting the disease is an MRI scan of the patient's brain. There is no cure for this disease, and the only way to stop it from progressing is to detect it in its early developmental stage, known as Mild Cognitive Impairment (MCI). From the MCI stage onwards, dementia progresses to more severe forms. Various methods and models have been proposed for the early detection of dementia and for preventing it altogether. These approaches are based on patients' MRI scans and on other indicators of dementia, such as gait analysis, handwriting analysis, and consistency in performing Activities of Daily Living (ADLs).

A. MEDICAL DATA-BASED MODELS
Davis-Owusu et al. [1] proposed a novel approach to preventing dementia in elderly citizens by building a bidirectional activity-based system that coordinated the actions and routines of elderly citizens and their caregivers. The system created a sense of belonging and attachment in elderly citizens and averted feelings of detachment and loneliness, which helped prevent dementia. Pundane et al. [3] proposed another approach in which they studied various Ambient Assisted Living (AAL) systems. These systems assist elderly citizens in their Activities of Daily Living (ADLs) so that they can keep to their routine and perform their daily activities with ease. They studied mobility-based systems that give elderly people a sense of freedom, so that they do not have to depend on others to move around and exercise. They also studied social-interaction-based systems, which help elderly people stay socially and emotionally connected to their relatives and to other elderly citizens. As proposed in [2], such systems give elderly citizens a sense of freedom and independence, which helps avert dementia and related diseases.

Sabo et al. [36] proposed a novel gait-based approach for predicting dementia in the residents of a residential community. They used a two-stage pipeline built around a spatial-temporal graph convolutional network (ST-GCN). This approach detected dementia by analyzing abnormalities in the residents' gait patterns relative to the gait patterns of normal subjects. Various pose-estimation libraries were used for this purpose, but the approach was complex and challenging to implement in a real-life setting. Jian Ma et al. [37] proposed a combined finger-tapping and gait-based approach for dementia analysis and prevention. Finger-tapping data such as time intervals, frequency, and speed were collected and combined with the patients' gait data, and this data was fed to an ML model to improve fall-risk predictions in dementia patients. The approach required extensive data collection and thorough data cleaning before it could be implemented. Luz et al. [38] proposed a speech-based approach to detect and analyze the rate of cognitive decline, using a combination of acoustic and linguistic data to monitor patients' cognitive status. This required extensive cleaning of audio samples and routine, timely collection of audio data from patients, both of which were time-consuming and laborious. Rohanian et al. [39] presented another audio-based approach that combined audio recordings with text data to predict cognitive impairment, using both regression and classification models to monitor the rate of cognitive decline. Collecting data from diverse patients and then cleaning it to create the dataset was very laborious. Cohen et al. [40] proposed a linguistic model trained on the DementiaBank dataset, which contains linguistic samples from both healthy controls and dementia patients, and the model could predict dementia effectively. Because it used a readily available, pre-cleaned dataset, its implementation was simpler. However, creating such a dataset from scratch would be tedious, and implementing a comparable model on it even more so.

B. MRI IMAGE-BASED MODELS
After reviewing and analyzing the drawbacks of the above literature, the authors examined another well-established approach: predicting dementia from MRI brain scans of affected patients. This approach applies classification algorithms such as CNNs, LSTMs, and RNNs to MRI scans to differentiate and classify different levels of dementia. These papers and their drawbacks are reviewed below.
Habes et al. [4] proposed a novel approach to classify dementia into two categories, Normal Cognition (N.C.) and Alzheimer's Disease (A.D.), using a CNN. They used the CNN's feature maps, together with additional A.D.-related features, to build a time-to-event model for classifying dementia [4]. The dataset was a combination of ADNI and AIBL. Model training was done with a CNN [4], and a regression model was also built using the features learned by the CNN. However, deriving results from two different models has two significant drawbacks: training separate models is time-consuming, and integrating them is difficult for untrained individuals. S. Basaia et al. [5] proposed a deep learning model to predict A.D. and MCI. Simple data augmentation techniques (scaling, rotation, translation, etc.) were applied to address overfitting, and they achieved exceptional results on popular datasets such as ADNI and MILAN [5]. However, these results held only for a limited number of unseen datasets, and the progression of dementia to later stages was not considered. M. Amin-Naji et al. [6] built a Siamese Convolutional Neural Network (SCNN) to detect dementia using feature maps generated by a ResNet model. They used an unsupervised approach to classify dementia into two categories, which is helpful when the research involves unlabelled datasets and annotating each sample is difficult [6]. The drawback of this approach was that the SCNN could only classify dementia into two classes, which limited its functionality. The SCNN also failed to predict early dementia (MCI).

M. Kavitha et al. [7] proposed a U-Net-like Convolutional Neural Network (CNN) to segment images and classify dementia into four classes, obtaining remarkable results on the ADNI dataset [7]. The main limitation of this approach was that the training data was small [7], and no augmentation techniques were applied to increase the number of image samples. As a result, the model may not generalize well to unseen images, leading to misclassification of dementia and a poor accuracy score. M. Liu et al. [8] proposed a combination of classification and regression models to predict the diagnosis of dementia. Feature maps from a CNN were used together with the clinical scores obtained from the regression model to classify dementia into four classes [8] on the ADNI dataset. However, the regression accuracy was limited and saturated after a certain point, and the input samples were sparse, which posed an overfitting problem because no augmentation was applied [8].
The authors of [9] proposed a method to detect A.D. using a Deep Convolutional Autoencoder (DCA). The DCA comprised an encoder and a decoder, which down-sample and up-sample the MRI scans to learn their underlying features [9]. Its limitation was that it failed to learn the statistical patterns needed to map input MRI images to different categories, leading to inferior accuracy scores. X. Hong et al. [10] proposed a model that combined a 2D-CNN and a 3D-CNN; the final layer of the combined model averaged the results derived from the feature maps of both models to predict the diagnosis [10]. However, this work depended heavily on a readily available balanced dataset with many MRI images, which is rarely available in the medical domain, and no augmentation techniques were used to address the class imbalance of medical datasets.
Shaker El Sappagh et al. [41] proposed a machine learning model using random forest classification, with experiments conducted on the ADNI dataset. The model consisted of two layers. In the first layer, the Mini-Mental State Examination (MMSE) was the most important feature for the Alzheimer's disease class, and the Clinical Dementia Rating Sum of Boxes (CDRSB) was the most important predictor for the C.N. and MCI classes. In the second layer, the Functional Assessment Questionnaire was the most important feature for the stable MCI (sMCI) and progressive MCI (pMCI) classes. The first layer achieved a cross-validation accuracy of 93.95% and an F1-score of 93.94%, and the second layer achieved a cross-validation accuracy of 87.08% and an F1-score of 87.09%. One limitation was the use of RF classification, which can be slow for real-time predictions and raises questions about the model's practicality; in addition, the classification covered only mild cognitive impairment and not the other stages based on the intensity of the disease. Janani Venugopalan et al. [42] showed, using deep learning models, that predictions from multi-modality data are better than those from single-modality data. Three setups were implemented and discussed on the ADNI dataset: electronic health records (EHR) + single nucleotide polymorphisms (SNP), EHR + imaging + SNP, and EHR + imaging. In the external test performance: The EHR produced Accuracy: 0. A limitation of the model is the lack of visualization, which is mentioned in its future scope. The data was taken from the ADNI dataset, but no augmentation was applied.
Changhee Han et al. [44] proposed an unsupervised medical anomaly detection GAN (MADGAN) to detect brain anomalies at different stages on multi-sequence structural MRI. The method reliably predicted the next three MRI slices from the previous three slices of unseen healthy images and detected subtle anatomical anomalies and lesions. With respect to the dataset used in this study, its drawbacks are that it requires a sizeable balanced dataset and does not detect the different stages of dementia. Simeon Spasov et al. [45] proposed a parameter-efficient method that combines MRI, demographic, neuropsychological, and APOe4 genotyping data to simultaneously predict MCI-to-A.D. conversion and perform A.D. vs. healthy classification. This method effectively predicted conversion from mild cognitive impairment to Alzheimer's disease within three years. However, without a sizeable balanced dataset, it could not predict the exact stage of a patient's dementia.

C. AUGMENTATION AND OPTIMIZATION-BASED MODELS
Additional literature focuses on conventional data augmentation methods, such as rotation, cropping, scaling, and resizing, and on the optimization techniques that produce the best results on medical images.
A.M. Taqi et al. [11] studied data augmentation and various optimization techniques for medical images, and found that the RMSProp and Adam optimizers gave the best results compared with other optimizers [11]. U. Senanayake et al. [12] worked on A.D. diagnosis by fusing different deep networks such as ResNet, GoogLeNet, and DenseNet to reduce the complexity of the datasets and enable effective fusion. They achieved good results for binary classification, but did not study the classification of dementia by severity or unbalanced datasets [12]. Binary classification was performed between different categories of dementia, such as N.D. vs. MCI and N.D. vs. S.D., and the results were compared with SOTA models. The results obtained with the different CNN models were significantly better than those of other conventional models, but the approach does not work well in real-world scenarios [12]: its complexity grows with the number of binary classifiers, and it becomes difficult for untrained medical officials to decide which classifier to use. This model also could not predict early dementia. Daniel et al. [13] proposed a 1D temporal convolutional network to classify dementia using hand-engineered features. They achieved good accuracy, but extracting features manually is time-consuming and laborious and does not always guarantee good accuracy. Early dementia was not studied in [13].
A. Fedorov et al. [14] proposed an unsupervised model called Deep InfoMax (DIM) to predict the progression of A.D., using it to explore brain structure in a flexible, non-linear manner. Although it performed reasonably well, it was a new methodology and did not perform as well as other existing models [14]. J. M. Valverde et al. [15] performed a systematic literature review of articles that applied transfer learning to MRI brain scans, categorized them, and extracted relevant information, identifying patterns that showed which models worked best with transfer learning on MRI data [15]. R. Gupta et al. [16] proposed a GAN that generates new data with the same statistical patterns as the training set, using the generator to improve resolution and the discriminator to train the generator better. This helped create super-resolution MRI images [16]. However, the model was very resource-intensive and could not produce usable results on limited hardware. D. Lu et al. [17] proposed the early diagnosis of A.D. through patch-wise feature extraction from structural MR and FDG-PET images. They found that classifiers built on a combination of FDG-PET and structural MRI images performed better than those built on structural MRI or FDG-PET scans alone [17]. However, the model performed well only on extensive datasets, which are rarely available in the medical field, and early onset was not studied. H. Shin et al. [18] proposed a GAN with discriminator-adaptive loss fine-tuning that uses PET and MRI scans for A.D. diagnosis. Unlike other GANs, this model incorporates A.D. diagnosis into the GAN training process to improve A.D. classification performance [18]. However, other deep CNNs and GANs produced better results on the testing data than the model proposed in [18]. T. Jo et al. [19] proposed a deep learning model that classifies A.D. using a combination of fluid biomarkers and multimodal neuroimaging data, and used it to detect structural and functional biomarkers of A.D. in MRI scans. It produced good results but was limited to binary classification of dementia, which reduced its functionality [19]. Early dementia was not studied in [19].
X. W. Gao et al. [20] proposed a 3D-like approach that uses deep learning to extract information from 2D slices and 3D blocks of CT scans of the brain, using an advanced CNN that integrated both 2D and 3D CNN networks [20]. It produced high-quality results when predicting A.D. but was limited by the lack of readily available CT images. Z. Cui et al. [21] proposed an enhanced version of the Inception (V3) neural network for MRI scans, using a method called "Multi-Attention Combination" to learn the tumor size efficiently [21] and thereby enhance the performance of the existing Inception model. However, its accuracy was limited by the lack of available data, and the authors did not use any augmentation techniques to increase the dataset size [21]. Early onset of dementia was not studied in [21]. Leonardo Rundo et al. [45] proposed a Progressive Growing GAN (PGGAN) to augment brain MRI, using a multi-stage generative training method to generate an unlimited number of 256×256 MR images for tumor detection. Although this method was very effective at generating brain MRI, it required a large balanced dataset for training, without which the generated images were subpar. Chee Keong Chong et al. [46] proposed a method that uses a Super-Resolution GAN to learn 3D shape variations in adult brains and a pix2pix GAN to refine image slices with realistic local contrast patterns, in order to synthesize 3D brain MRI images. This method effectively generated realistic 3D MRI images with high accuracy, its only limitation being the representation of features between slices.

III. METHODOLOGY
This section explores the implementation methodology followed in this research, which is shown in Fig. 1 and Fig. 2.

B. DATA PRE-PROCESSING
Input data samples were pre-processed using two methods: cleaning and normalization.

CLEANING:
The GAN-augmented dataset was checked manually for deviant samples, i.e., samples that deviated from the distribution of the other samples belonging to the same class label. Such samples are effectively mislabeled and reduce the accuracy of the model, so it is necessary to discard them. Each class contained some deviant samples that did not represent the normal distribution of that class. For example, the tumor in MCI scans is small, so some GAN-generated MCI samples did not capture it correctly: the tumor was absent in these augmented scans, so they actually belonged to the N.D. class but were labeled as MCI. Such samples were removed from the MCI category, and similar samples were removed from the other categories as well.
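The authors performed this cleaning by hand. As a hedged illustration only, and not part of the described procedure, a simple automated pre-screen could rank the samples of one class by their distance from the class-mean image and flag unusually distant ones for manual review. The function name `flag_deviant_samples` and the z-score threshold of 2.0 are hypothetical choices.

```python
import numpy as np

def flag_deviant_samples(images: np.ndarray, z_thresh: float = 2.0) -> np.ndarray:
    """Return indices of images that lie unusually far from the class-mean
    image (distance z-score above z_thresh). `images` has shape (n, h, w)
    and should contain samples of a single class label."""
    flat = images.reshape(len(images), -1).astype(np.float64)
    mean_image = flat.mean(axis=0)
    dists = np.linalg.norm(flat - mean_image, axis=1)
    z = (dists - dists.mean()) / (dists.std() + 1e-12)  # avoid divide-by-zero
    return np.flatnonzero(z > z_thresh)

# Hypothetical class of 21 scans: 20 show a bright region, one does not
# (mimicking a GAN sample that "lost" the feature defining its class).
blob = np.zeros((32, 32))
blob[12:20, 12:20] = 1.0
mci_like = np.stack([blob] * 20 + [np.zeros((32, 32))])
print(flag_deviant_samples(mci_like))  # [20]
```

A pixel-space distance is a crude proxy; flagged samples would still need the human check the text describes before being discarded.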

NORMALIZATION:
Normalization was applied to the input dataset, which consisted of 8,000 images. Raw pixel intensities lie in the range [0, 255], a wide scale that causes high variance in the input features, and the model may start to overfit the dataset if it is not normalized. Normalization scales the feature values down to the range [0, 1], which reduces this variance and helps the CNN model generalize to unseen samples (samples not used for training). Overfitting typically yields high training accuracy but much lower test accuracy, which causes a significant drop in the performance of the model after deployment. Normalization was therefore applied to scale the features down to a much narrower range.
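The scaling described above amounts to dividing every pixel by 255. A minimal NumPy sketch, assuming 8-bit grayscale input (as the [0, 255] range implies) and a hypothetical `normalize_scans` helper:

```python
import numpy as np

def normalize_scans(scans: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel intensities from [0, 255] to [0.0, 1.0]."""
    # Cast first so the division is done in floating point, not integers.
    return scans.astype(np.float32) / 255.0

# Hypothetical batch of two 2x2 grayscale scans.
batch = np.array([[[0, 64], [128, 255]],
                  [[10, 20], [30, 40]]], dtype=np.uint8)
normalized = normalize_scans(batch)
print(normalized.dtype, normalized.min(), normalized.max())  # float32 0.0 1.0
```

Using `float32` rather than NumPy's default `float64` halves memory for a dataset of thousands of images and matches the precision most deep learning frameworks train in.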

C. CONVOLUTIONAL NEURAL NETWORK (CNN) MODEL
A conventional Convolutional Neural Network (CNN) model was used to classify the disease into different classes. Six convolutional layers with a kernel size of 1 and a stride of 3 were used for feature extraction and learning, and five pooling layers with a pool size of 2 were used to discard excess features and speed up computation. Batch normalization and dropout regularization with a rate of 0.5 were applied after every convolutional layer to reduce variance. Finally, a flatten layer converts the feature maps into a single vector, and four dense layers map this vector to the output classes. This particular architecture was selected because the authors wanted to assess the performance of a simple, rudimentary CNN model on the custom dataset; more sophisticated and optimized CNN models and their performance on the dataset are discussed later in the paper. The final layer has four output nodes, one for each class of dementia. The model has 2,418,660 parameters. A Rectified Linear Unit (ReLU) activation function, which outputs its input when the input is positive and zero otherwise, was used in every layer except the last. The last layer uses a SoftMax activation function, which converts the outputs into a probability for each class; the class with the highest probability is taken as the prediction. A categorical cross-entropy loss function was used to compute the error, and the Adaptive Moment Estimation (Adam) optimizer was used to train the model. Fig. 4 shows the CNN architecture used in this study, and Fig. 5 shows the final flattening and dense layers. Table 2 lists the layers of the CNN model along with their parameters.
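The output-side computations named above (ReLU, SoftMax over four classes, argmax prediction, and categorical cross-entropy) can be sketched in plain NumPy. This is an illustration under assumptions, not the authors' implementation: the text does not name its deep learning framework, the class order is taken from the four labels used elsewhere in the paper, and the logits are made up.

```python
import numpy as np

CLASSES = ["N.D.", "MCI", "MD", "S.D."]  # one output node per class (assumed order)

def relu(x: np.ndarray) -> np.ndarray:
    """Pass positive values through unchanged; clamp negatives to zero."""
    return np.maximum(x, 0.0)

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw scores into class probabilities (each row sums to 1).
    Subtracting the row maximum avoids overflow in exp()."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(probs: np.ndarray, one_hot: np.ndarray) -> float:
    """Mean negative log-probability assigned to the true class."""
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(np.sum(one_hot * np.log(probs + eps), axis=-1)))

# Hypothetical final-layer scores for two scans.
logits = np.array([[2.0, 0.5, -1.0, -2.0],
                   [0.1, 0.2, 3.0, 0.4]])
labels = np.eye(4)[[0, 2]]  # one-hot true classes: N.D. and MD

probs = softmax(logits)
predicted = [CLASSES[i] for i in probs.argmax(axis=-1)]
loss = categorical_cross_entropy(probs, labels)
print(predicted)  # ['N.D.', 'MD']
print(relu(np.array([-1.0, 0.0, 2.5])))
print(round(loss, 3))
```

Note that SoftMax itself only produces probabilities; selecting the most probable class is a separate argmax step.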

D. GRADCAM FEATURE VISUALIZATION
Gradient-weighted Class Activation Mapping (Grad-CAM) uses the gradients of a class score with respect to the feature maps of the final convolutional layer to generate a heatmap highlighting the regions of the input MRI scan that contribute most to the prediction. This heatmap represents the different spatial conformities that the CNN model learns for different classes of MRI scans during the training phase. The learned differentiation is then used to predict the class of an unseen MRI test scan. Fig. 6 shows the heatmaps of some MRI scans taken from the test set used in this research.
The blue color is most prominent in the N.D. scan and least prominent in the S.D. scan. This pattern, referred to as the "blue shift", reflects how the CNN model differentiates between different classes of MRI scans.
The heatmaps represent the spatial conformity learned by the CNN model, which helps it distinguish between different classes of test samples and make appropriate predictions, including the early prediction of dementia (MCI). The CNN model takes a training image as input and applies operations such as convolution, pooling, and dropout to learn, extract, and discard features. The arrangement and conformity of these features in the image plane represent a particular class and severity of dementia, and they vary with class-specific features such as the tumor size. The different colors in the heatmaps represent the different spatial feature distributions that the model uses to distinguish between samples, which helps in making informed predictions about the early onset and severity of dementia. Each MRI scan is presented in Fig. 6 along with its intensity value (IV).
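In the standard Grad-CAM formulation, each feature map A^k of the last convolutional layer is weighted by the spatial average of the class-score gradient over that map, the weighted maps are summed, and only positive values are kept. The minimal NumPy sketch below assumes the activations and gradients have already been extracted from the trained model (how this is done depends on the framework, which the text does not specify); `grad_cam` is a hypothetical helper name.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one image.

    feature_maps: last-conv-layer activations A^k, shape (K, H, W)
    gradients:    d(class score) / dA^k, same shape
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel weights: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))                  # shape (K,)
    # Weighted sum of feature maps, keeping only positive evidence (ReLU).
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)
    # Scale for display; an all-zero map stays all zero.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: two 3x3 channels.
A = np.zeros((2, 3, 3))
A[0, 1, 1] = 1.0  # channel 0 fires at the centre
A[1, 0, 0] = 1.0  # channel 1 fires in a corner
dA = np.stack([np.ones((3, 3)), -np.ones((3, 3))])  # score rises with channel 0 only
heatmap = grad_cam(A, dA)
print(heatmap[1, 1], heatmap[0, 0])  # 1.0 0.0
```

In practice the low-resolution heatmap is upsampled to the input size and overlaid on the MRI scan with a color map, which is what produces figures like the blue-to-red heatmaps described above.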

E. PROGRESSIVE RESIZING (PRO-RESIZING)
Pro-resizing was applied to readily available open-source CNN architectures, namely ResNet-18, ResNet-34, VGG-16, and VGG-19. These models were trained on the custom dataset proposed in this study, and the results with and without pro-resizing were compared. Progressive resizing trains a model on progressively larger versions of the input images: training starts with images of size 32×32, continues with the images resized to 64×64, and finishes at 128×128. At each stage, the model starts from the weights learned at the previous, smaller resolution, so the features learned on low-resolution images carry over to higher resolutions. The fastai library was used to apply progressive resizing and to analyze the performance of several commonly used pre-trained CNN architectures on the custom dataset proposed in this study. VGG-19 performed exceptionally well compared with the other CNN architectures. Various values of batch size, number of epochs, and resolution were tried, and the best results are reported in this study.
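The authors used fastai, where progressive resizing is commonly done by rebuilding the data loaders at the new image size and continuing to train the same learner. The framework-free sketch below shows only the resolution schedule; `resize_nearest`, `progressive_resizing`, and the `train_stage` callback are hypothetical stand-ins for the real data pipeline and training loop.

```python
import numpy as np

SIZES = (32, 64, 128)  # resolution schedule described in the text

def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D image to (size, size)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]

def progressive_resizing(images, train_stage, sizes=SIZES):
    """Train in stages of increasing resolution.

    `train_stage(batch, state)` is a hypothetical callback that continues
    training the same model on `batch`, starting from `state` (e.g. its
    current weights), and returns the updated state. Each stage therefore
    reuses what the previous, lower-resolution stage learned.
    """
    state = None
    for size in sizes:
        batch = np.stack([resize_nearest(img, size) for img in images])
        state = train_stage(batch, state)
    return state

# Usage with a dummy "trainer" that just records the batch shapes it sees.
def record_shapes(batch, state):
    return (state or []) + [batch.shape]

rng = np.random.default_rng(0)
scans = [rng.random((150, 150)) for _ in range(3)]  # hypothetical scans
print(progressive_resizing(scans, record_shapes))
# [(3, 32, 32), (3, 64, 64), (3, 128, 128)]
```

Reusing one set of weights across resolutions only works for architectures whose head does not depend on the input size, for example those ending in adaptive or global pooling, as the pre-trained models in fastai typically do.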

F. PROGRESSIVE-GROWING GAN (PRO-GAN)
Pro-GAN (Progressive Growing GAN) is used in the D-BAC system as an extension of the DCGAN model and as an additional measure to stabilize the generated images and ensure that they maintain their spatial conformity. Similar in spirit to progressive resizing, Pro-GAN first trains the generator and discriminator at a low resolution and then progressively adds layers that increase the image resolution, until the desired size and resolution are reached. This approach proved very effective in this study for generating high-quality synthetic images from the original MRI scans.
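In the standard Pro-GAN formulation, each new, doubled resolution is "faded in": during a transition phase, the output of the newly added layers is blended with an upsampled copy of the previous stage's output, with a weight alpha that ramps from 0 to 1. The text does not state whether the D-BAC implementation used this fade-in, so the NumPy sketch below (hypothetical helper names) illustrates the general technique only, at the image level.

```python
import numpy as np

def upsample2x(img: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W) image."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def faded_output(low_res: np.ndarray, high_res: np.ndarray, alpha: float) -> np.ndarray:
    """Blend the upsampled output of the previous (low-resolution) stage
    with the output of the newly added higher-resolution layers.
    alpha = 0 uses only the old path; alpha = 1 uses only the new one."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * upsample2x(low_res) + alpha * high_res

# Toy example: a 2x2 image from the old stage, a 4x4 image from the new one.
low = np.array([[0.0, 1.0],
                [2.0, 3.0]])
high = np.full((4, 4), 10.0)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, faded_output(low, high, alpha)[0, 0])
```

Fading the new layers in gradually avoids the sudden shock to an already-trained generator and discriminator that adding full-strength layers would cause.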

G. DATASET
The Dementia dataset was acquired from a Kaggle repository [27], and Fig. 8 represents this dataset. Its categories were highly imbalanced. DCGAN-based augmentation was applied to synthesize new images, which were appended to the original dataset after data cleaning. Fig. 9 represents the custom-augmented dataset. The novel Dementia dataset proposed in this research consists of 2500 samples for each class. This gives every category the same number of samples and addresses the problem of bias toward the majority classes. This novel dataset can also be used for future research in this field.
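The balancing step amounts to simple arithmetic. For each class, the number of synthetic images to add is the target size (2500 per class in this study) minus the number of real images kept after cleaning. The sketch below makes this explicit. The per-class counts in the usage example are hypothetical placeholders, not the counts of the Kaggle dataset. A negative value would mean a class already exceeds the target and would have to be subsampled, a case the text does not discuss.

```python
TARGET_PER_CLASS = 2500  # class size of the balanced dataset described in the text


def images_to_generate(class_counts, target=TARGET_PER_CLASS):
    """Return, per class, how many GAN-generated images are needed to reach `target`.

    A negative value means the class has a surplus of real images instead.
    """
    return {label: target - count for label, count in class_counts.items()}


# Hypothetical counts of real images per class after data cleaning (illustrative only).
real_counts = {"N.D.": 1800, "MCI": 1200, "MD": 600, "S.D.": 50}
plan = images_to_generate(real_counts)
print(plan)  # {'N.D.': 700, 'MCI': 1300, 'MD': 1900, 'S.D.': 2450}
print(sum(real_counts.values()) + sum(plan.values()))  # 10000
```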

H. COMPUTATIONAL COMPLEXITY
The

IV. SYSTEM ARCHITECTURE
This paper proposes a novel architecture that combines a DCGAN model for image augmentation and a CNN model for classification. Fig. 10 represents this architecture.
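In a DCGAN generator, each transposed convolution typically doubles the spatial resolution. The sketch below applies the standard output-size formula for a transposed convolution (the one documented for PyTorch's `ConvTranspose2d`) to a hypothetical generator. A 4x4 kernel with stride 1 and no padding maps the 1x1 latent vector to 4x4. Each following layer, with kernel 4, stride 2, and padding 1, doubles the size, as in the PyTorch DCGAN tutorial. The layer settings and the 128x128 target are assumptions, because the source does not report the generator configuration.

```python
def conv_transpose_out(size, kernel, stride, padding, output_padding=0, dilation=1):
    """Spatial output size of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def generator_sizes(n_upsampling_layers):
    """Feature-map sizes of a hypothetical DCGAN generator, starting from a 1x1 latent vector."""
    sizes = [1]
    sizes.append(conv_transpose_out(sizes[-1], kernel=4, stride=1, padding=0))  # 1 -> 4
    for _ in range(n_upsampling_layers):
        sizes.append(conv_transpose_out(sizes[-1], kernel=4, stride=2, padding=1))  # doubles
    return sizes


print(generator_sizes(5))  # [1, 4, 8, 16, 32, 64, 128]
```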
The original dataset was acquired from a Kaggle repository [27] and balanced using DCGAN-based augmentation. The balanced dataset obtained after augmentation was analyzed manually, and mislabeled samples were discarded. Normalization was then applied, and the resulting dataset was used for training. This dataset consisted of 8000 training images and 2000 testing images. This initial stage is termed the "Data acquisition and augmentation" stage.
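The split reported above (8000 training and 2000 testing images from 2500 images per class) corresponds to holding out 20% of each class. The stratified split sketched below keeps both subsets balanced. Per-image min-max scaling to [0, 1] is shown as one common form of normalization. The text does not specify which normalization was used, and the fixed seed and the integer placeholders standing in for images are assumptions.

```python
import random

CLASSES = ("N.D.", "MCI", "MD", "S.D.")


def min_max_normalize(image):
    """Rescale the pixel intensities of one image (list of rows) to [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero for constant images
    return [[(p - lo) / span for p in row] for row in image]


def stratified_split(samples_by_class, test_fraction=0.2, seed=0):
    """Hold out the same fraction of every class so train and test stay balanced."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test += [(s, label) for s in shuffled[:n_test]]
        train += [(s, label) for s in shuffled[n_test:]]
    return train, test


# Integer IDs stand in for the 2500 images per class.
dataset = {label: list(range(2500)) for label in CLASSES}
train_set, test_set = stratified_split(dataset)
print(len(train_set), len(test_set))  # 8000 2000
print(min_max_normalize([[0, 128], [255, 255]])[1])  # [1.0, 1.0]
```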
The model was trained for 50 epochs on the dataset generated in the previous stage. This number was chosen because the model began to overfit beyond 50 epochs, which yielded high accuracy on the training set but low accuracy on the test set. Training for fewer than 50 epochs underfit the data and yielded inadequate training and testing accuracy. A CNN model was used to extract essential features, and the loss was computed and minimized using the Adam optimizer. The output of this stage was the final ML model, which can predict the early onset of dementia and classify MRI scans into four categories depending upon the tumor size: N.D., MCI, MD, and S.D. This stage is termed the "Training and Optimization" stage.
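For reference, the Adam update rule keeps exponential moving averages of the gradient and of its square, corrects their bias toward zero in early steps, and scales each step by their ratio. The scalar sketch below minimizes the toy loss (theta - 3)^2. The learning rate of 0.1 and the number of steps are assumptions chosen for this toy problem, while beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the commonly used defaults. The paper does not report the optimizer hyperparameters it used.

```python
import math


def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with Adam, given a function that returns its gradient."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # moving average of the gradient
        v = beta2 * v + (1 - beta2) * g * g    # moving average of the squared gradient
        m_hat = m / (1 - beta1 ** t)           # bias correction (m and v start at zero)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy loss L(theta) = (theta - 3)^2 with gradient 2 * (theta - 3); minimum at theta = 3.
theta_star = adam_minimize(lambda th: 2 * (th - 3), theta=0.0)
print(round(theta_star, 4))
```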
The third stage involved Grad-CAM-based feature visualization. The feature maps of the last convolutional layer, weighted by the gradients of the predicted class score with respect to those maps, were visualized using Gradient-weighted Class Activation Mapping (Grad-CAM). This stage is termed the "Model Visualization" stage.
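The Grad-CAM computation described above takes only a few lines once two inputs are available: the activations of the last convolutional layer and the gradients of the class score with respect to them. In practice, both come from the deep-learning framework through a backward pass. The sketch below uses hypothetical 2x2 feature maps and follows the standard Grad-CAM recipe. It averages each map's gradients to get a channel weight, forms the weighted sum of the maps, applies ReLU, and rescales the result to [0, 1]. Upsampling the heatmap to the input resolution for overlay is omitted.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image and one target class.

    activations[k][i][j]: value of feature map k of the last conv layer at (i, j).
    gradients[k][i][j]:   derivative of the class score w.r.t. activations[k][i][j].
    """
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of each feature map's gradients.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted combination of the feature maps; ReLU keeps only evidence for the class.
    cam = [
        [max(0.0, sum(a * fmap[i][j] for a, fmap in zip(alphas, activations))) for j in range(w)]
        for i in range(h)
    ]
    # Rescale to [0, 1] so the map can be shown as a heatmap.
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Two hypothetical 2x2 feature maps: map 0 supports the class, map 1 argues against it.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```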
The final stage involved testing and making predictions on the test dataset, which consisted of 2000 MRI scans divided equally among the four categories (500 per category). This is called the "Testing and Prediction" stage, in which the trained model makes predictions on the unseen test data using what it learned during training.

V. RESULTS AND DISCUSSION
The results of the two augmentation techniques, simple augmentation and GAN-based augmentation, are compared below. Simple augmentation creates new images through conventional operations such as flipping, scaling, and cropping. This technique yields poor results because the spatial conformity of the images is altered during augmentation. The tumor size is crucial in predicting early dementia and determining its severity, but it is altered during augmentation, which leads to poor results. Simple augmentation can be used for binary classification, where classification does not depend on the tumor size and only the presence of a tumor matters. However, it cannot be used for multi-class classification, because a multi-class CNN model differentiates the severity of dementia depending upon the size of the tumor. Therefore, the D-BAC system uses GAN-based augmentation, which maintains the spatial conformity of the original images and retains the tumor size in the augmented images. This helps to predict MCI accurately along with the severity of dementia.
This method yields significantly better results than simple augmentation. The CNN model achieved 97% accuracy on the training set and a SOTA accuracy of 80.5% on the testing set. This is an improvement of over 30% compared to M. Liu's DSML model [8] and of over 25% compared to M. Liu's enhanced DSML model, the DM2L model [8], which are the existing SOTA models in multi-class dementia classification. Table 4 compares the D-BAC model proposed in this paper with other SOTA models in multi-class dementia classification (in Table 4, "-" represents an unknown value and "±" represents standard deviation). The dataset augmented using DCGAN gives the best results, with a classification accuracy of 80.5%, because the spatial conformity and the standard data distribution of the augmented images remain unaltered.
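The argument that geometric augmentation alters the apparent tumor size can be checked on a toy binary mask. The sketch below zooms an 8x8 mask by a factor of 2 (nearest neighbour) and center-crops it back to 8x8, as a zoom-type augmentation would. The region of interest grows from 4 to 16 pixels. The mask and the zoom factor are illustrative assumptions, not data from this study.

```python
def zoom2x_center_crop(image):
    """Zoom a 2-D image by 2 (nearest neighbour), then center-crop back to the original size."""
    h, w = len(image), len(image[0])
    big = [[image[i // 2][j // 2] for j in range(2 * w)] for i in range(2 * h)]
    top, left = h // 2, w // 2
    return [row[left:left + w] for row in big[top:top + h]]


def region_area(mask):
    """Number of pixels belonging to the region of interest (value 1)."""
    return sum(sum(row) for row in mask)


# Toy 8x8 mask with a 2x2 region of interest near the centre.
mask = [[1 if 3 <= i <= 4 and 3 <= j <= 4 else 0 for j in range(8)] for i in range(8)]
zoomed = zoom2x_center_crop(mask)
print(region_area(mask), region_area(zoomed))  # 4 16
```

The image keeps its size, but the region inside it quadruples in area. A size-based class label attached to the original scan would no longer describe the augmented one.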
Simple augmentation yields a poor accuracy score because it alters the conformity of the images by scaling, cropping, and resizing them. Table 4 also compares the original imbalanced dataset with the other models. It has the lowest accuracy score, 43.5%, because the model is biased towards the densely populated labels and does not generalize well to samples from the sparsely populated labels. Table 5 and Fig. 12
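Precision is the fraction of the samples predicted as a given class that truly belong to that class. It decreases as the number of false positives (FP) grows relative to the number of true positives (TP). Eq. (1) depicts the formula for precision.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$
Recall is the fraction of the samples of a given class that the model correctly identifies. It decreases as the number of false negatives (FN) grows. Eq. (2) depicts the formula for recall.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$
The F1-score is the harmonic mean of precision and recall, as depicted in Eq. (3).
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$
Eq. (4) depicts the formula for accuracy, where TN denotes the number of true negatives.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
Table 6 compares the evaluation metrics (Precision, Recall, F1-score, and Accuracy) for all three datasets studied in this research. The metric values were significantly better for the GAN-augmented dataset than for the other datasets. After CNN classification, the total number of false positives was low and the total number of true positives was high. The CNN model overfits the imbalanced dataset because the sample density varies across the four labels. This results in a high number of false positives and a low number of true positives, and hence in low metric values. Table 7 represents the confusion matrix (C.F.) used for a 4-class classifier. Furthermore, the ROC curves and the C.F. matrices for the three datasets are presented in Figs. 16-21. The ROC curve for the original imbalanced dataset is presented in Fig. 16.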
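The precision, recall, F1-score, and accuracy definitions extend to the four-class setting by treating each class in turn as the positive class (one-vs-rest) and reading TP, FP, and FN from the confusion matrix. Overall multi-class accuracy is the sum of the diagonal divided by the total number of samples. The sketch below assumes that rows hold true classes and columns hold predicted classes. It uses a small made-up 4x4 matrix, not the confusion matrices reported in this paper. The text does not state how the per-class scores were averaged, so a macro average is shown as one option.

```python
def per_class_metrics(cm):
    """Precision, recall and F1 per class from a square confusion matrix.

    cm[i][j] = number of samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores


def overall_accuracy(cm):
    """Correct predictions (diagonal) divided by all predictions."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))


# Made-up 4-class confusion matrix (classes N.D., MCI, MD, S.D.; 10 test samples each).
cm = [
    [8, 1, 1, 0],
    [1, 7, 2, 0],
    [0, 2, 7, 1],
    [0, 0, 1, 9],
]
per_class = per_class_metrics(cm)
macro_f1 = sum(f1 for _, _, f1 in per_class) / len(per_class)
print(overall_accuracy(cm))  # 0.775
print([round(p, 3) for p, _, _ in per_class])
```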

Limitations and threats to validity for the original dataset:
The obtained results would no longer hold if additional MRI scans were added to the original dataset so that it became balanced. A balanced dataset would yield significantly better results and would also address the problem of overfitting during training.
The ROC curve for the dataset augmented using geometric transformations is presented in Fig. 18. Although this curve is better than that of the original dataset, the model still fails to predict the early onset of dementia and its severity accurately. The accuracy score obtained by the CNN model on this dataset is only 57%. The authors attribute this to the way this kind of augmentation alters the feature distribution of the images, which increases the False Positive Rate (FPR) when the model is evaluated on the test set. The confusion matrix (C.F.) for the dataset augmented using geometric transformations is plotted in Fig. 19. The FPR has dropped significantly compared to the C.F. of the original dataset, because the dataset is now balanced and more image samples are available for training. However, an accuracy of only 57% is not acceptable in the medical field for predicting early dementia or its severity. The accuracy score for predicting the early onset of dementia (MCI) on this dataset is only 42%, which is not a significant improvement over the original dataset. Hence, a novel GAN-augmented dataset is proposed to address this problem and to better predict MCI and dementia severity.
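A one-vs-rest ROC curve for a single class, for example MCI, can be obtained by sweeping a decision threshold over the predicted probabilities for that class. At each threshold, the FPR and the true positive rate (TPR, i.e., recall) are recorded. The sketch below does this and computes the area under the curve with the trapezoidal rule. The scores and labels are made-up values. The text does not state whether its ROC curves were computed one-vs-rest, so that is an assumption here.

```python
def roc_points(scores, labels):
    """One-vs-rest ROC points (FPR, TPR), sweeping the threshold from high to low.

    scores: predicted probability of the class of interest for each sample.
    labels: 1 if the sample truly belongs to that class, otherwise 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for idx, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Tied scores share one threshold, so emit a point only after the last of them.
        if idx == len(ranked) - 1 or ranked[idx + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6]  # hypothetical P(MCI) for four test scans
labels = [1, 0, 1, 0]          # 1 = scan truly labeled MCI
curve = roc_points(scores, labels)
print(curve)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(curve))  # 0.75
```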

Limitations and threats to validity for the dataset augmented using geometric transformations:
These results would no longer hold under either of two conditions. The first is that geometrically distorted samples are prevented during augmentation. The second is that they are identified and discarded during pre-processing after careful analysis of the dataset. Such samples are effectively mislabeled: the tumor size is altered during augmentation, so the samples end up in the wrong class. The results would likewise not hold if augmentation techniques other than geometric transformations were used that do not produce such distorted samples.

C. DCGAN-augmented dataset
The ROC curve obtained for the novel D-BAC system is presented in Fig. 20 Fig. 17 and Fig. 19, contributing to the substantial accuracy improvement. The prediction accuracy of MCI (class 1) is 77% for the D-BAC model, which is the highest compared to existing literature and the highest among the three datasets. The obtained results could become invalid if the DCGAN model used for synthesizing and augmenting images did not generalize well to the input dataset. It would then produce poor-quality images that differ substantially from the actual images supplied. This could lead to mislabeled images, which in turn can lead to a poorly trained or even underfitted model. Each image generated by the DCGAN therefore needs to be assessed for its affinity to its intended class.
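The class-affinity check called for above could be partly automated. Each synthetic image would be screened with a reference classifier and kept only if it is assigned to its intended class with high confidence. This is a hypothetical sketch, not the procedure used in this study, where the dataset was inspected manually. `classify` stands in for any model trained on real scans only, and the 0.9 threshold is an arbitrary assumption.

```python
def filter_generated(images, classify, intended_label, min_confidence=0.9):
    """Keep generated images that a reference classifier assigns to the intended class.

    classify(image) -> dict mapping each class label to a probability (hypothetical).
    """
    kept = []
    for image in images:
        probs = classify(image)
        predicted = max(probs, key=probs.get)
        if predicted == intended_label and probs[predicted] >= min_confidence:
            kept.append(image)
    return kept


# Stand-in classifier: a lookup table of made-up probabilities for three synthetic images.
fake_outputs = {
    "gen_001": {"MCI": 0.95, "MD": 0.05},
    "gen_002": {"MCI": 0.60, "MD": 0.40},  # right class, but not confident enough
    "gen_003": {"MCI": 0.10, "MD": 0.90},  # looks like the wrong class
}
print(filter_generated(list(fake_outputs), fake_outputs.get, intended_label="MCI"))  # ['gen_001']
```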
Furthermore, Table 8 presents the evaluation metric scores for each label using the novel D-BAC model. All the values obtained are SOTA when compared to existing literature and models. The images augmented using the Progressive-growing GAN are displayed in Fig. 25. Pro-GAN is another augmentation technique that was applied to the original MRI scans to ensure the stability of the generated images. This shows that other GAN-based augmentation techniques can also generate a new dataset from the original dataset.

VI. CONCLUSION
Multi-class classification of dementia needs to be explored with a sufficient number of MRI images. To this end, this paper presented a novel D-BAC system that uses a GAN-based data augmentation technique for training and for classifying dementia into various categories depending upon severity. The paper addressed the critical problem of imbalanced datasets by using GAN augmentation to balance the class labels, which produced a new balanced dataset. Balancing was performed with two techniques: geometric transformations and GAN-based augmentation. The GAN-based dataset proved superior to geometric transformations because the spatial conformity of the images is unaltered, which helps the CNN model generalize well to unseen test samples. The GAN-augmented dataset achieved an accuracy of 81% using a conventional CNN model and 86% using VGG-19. This is a substantial improvement over the existing state-of-the-art models, which achieved only 49% and 52% accuracy. The early onset of dementia (MCI) was also studied, and the proposed model could predict MCI with an accuracy of 74%. A novel GradCAM-based approach was also used to analyze the CNN model visually. The "blue shift" presented in this paper displays the progressive features in different stages of dementia, which could help medical practitioners make decisions. Finally, progressive resizing was applied to the GAN-augmented dataset, and dementia classification was performed using ResNet-18, ResNet-34, VGG-16, and VGG-19. Of these, VGG-16 achieved an accuracy of 83% and VGG-19 an accuracy of 86%.
The future scope of this research includes three directions. The first is applying other data augmentation techniques to improve the accuracy of the proposed model. The second is optimizing the total number of features to reduce computation time. The third is tuning hyperparameters to find better optimizers and features for early dementia prediction.