Early Detection of Cognitive Decline Using Machine Learning Algorithm and Cognitive Ability Test

Department of Computational Intelligence, SRM Institute of Science and Technology, Chennai, India
Department of Computer Science and Engineering, Saveetha Engineering College, Chennai, India
Department of Computer Science and Engineering, Annamacharya Institute of Technology and Sciences, Rajampet, Andhra Pradesh, India
Department of Computer Science and Engineering, Pandit Deendayal Energy University, Gandhinagar, India


Introduction
Physical health and mental health are equally important in human life. Elderly people are commonly affected by cardiovascular disease, cancer, diabetes, arthritis, depression, kidney disease, pulmonary disease, dementia, and Alzheimer's disease. Dementia is a decline in cognitive ability that severely affects routine life. A person suffering from dementia constantly needs someone to help with everyday activities, since the disease affects cognitive function in multiple domains. Alzheimer's Disease (AD) is one of the most common neurodegenerative cortical dementias.
This incurable neurodegenerative disorder primarily affects the elderly population. It gradually progresses from mild cognitive impairment to Alzheimer's and other kinds of dementia. The projections are especially high in South Asian countries such as India and China. The rise in AD is proportionate to the elderly population, and it is projected that 5% to 7% of elders are affected by dementia. By 2050, 1 in 5 persons in low- and middle-income countries will be above 60 years of age, which may increase the number of people with the disease [1].
Dementia will be an inevitable result of demographic transition, and it causes damage to brain cells. The stages of dementia range from no cognitive decline to severe decline. The different types of dementia include Alzheimer's Disease (AD), Vascular Dementia (VaD), Frontotemporal Dementia, etc.
This impairment affects the capacity of synapses to communicate with one another, which in turn affects a person's thinking, emotion, and behavior. Various sorts of dementia are associated with the decay of specific types of brain cells in particular brain regions. High levels of specific proteins inside and outside of synapses make it difficult for brain cells to remain healthy and to connect with others. The first region to be affected is the hippocampus, which is the center of learning and memory in the cerebrum. This is why cognitive decline is perhaps the initial indication of Alzheimer's Disease. There is no effective treatment available for the disease. The feasible option is to educate the population about the related risk factors and protective factors.
The number of people affected by diabetes is growing exponentially, and it is expected that 640 million people will be affected by the year 2040 [2,3]. According to the World Alzheimer Report 2014, people who had hypertension in midlife (around 40-64 years of age) were more likely to develop vascular dementia in later life [4]. Blocked or reduced blood flow to the brain is the basic symptom of dementia. Many people with diabetes have brain changes that are a hallmark of Alzheimer's disease. Hypertension damages the heart and blood vessels; it occurs when the force of blood pushing against the inside of the vessels is too high.
This causes the cells to work harder, which makes them less effective. A recent study in the journal Neurology shows that elderly people with higher average BP are more likely to develop tangles and plaques in the brain. There is some evidence for a relation between systolic blood pressure (SBP) and AD, specifically tangles [5,6].
Multifactor analysis predicts Alzheimer's disease more precisely by extracting the heterogeneous information present in health records. It is feasible to predict AD using administrative and clinical information rather than images. Machine learning algorithms are well suited to large volumes of health data [7]. The focus of our proposed work is twofold: (i) predicting which people are likely to develop Alzheimer's in late life through careful analysis of the various risk factors associated with Alzheimer's; (ii) conducting a neuropsychological test called the Cognitive Ability Test (CAT) to assess a person's cognitive decline [8].
The proposed work considers general health data available in the "Data World" repository. We apply a two-stage classifier. In the first stage, the Support Vector Machine and Random Forest algorithms are used to find the associated risk factors of individuals. In the second stage, to enhance the prediction accuracy, a cognitive ability test was conducted among the people identified by the stage 1 classifier. The cognitive ability of a person is estimated using the CAT, which contains simple yes-or-no questions and yields scores ranging from 0 to 30. The CAT results are fed to Multinomial Logistic Regression to classify the severity of the disease. A score between 25 and 30 is classified as "No Dementia," between 13 and 24 as "Uncertain Dementia," and less than 13 as "Severe Dementia." The proposed work combines multiple factors associated with Alzheimer's to predict the possibility of disease more accurately. The paper is organized as follows: Related work on dementia and Alzheimer's disease is explored in chapter 2. Chapter 3 describes the proposed work, including the relation between Alzheimer's and the type 2 diabetes and hypertension dataset, which supports our claim. The application of multinomial logistic regression to CAT results to enhance the prediction process is also explored. Chapter 4 presents the results and relevant discussion. The conclusion of the present work and its extension are given in chapter 5.

Literature Survey
Mild Cognitive Impairment (MCI) leads to Alzheimer's and various kinds of dementia in later life. The severe intellectual disability of patients with Alzheimer's disease places a heavy burden on family members and the public. It has a physical, psychological, social, and economic impact. A careful review was conducted across aspects such as the cause of disease, the different tests applied, clinical diagnosis procedures, statistical techniques, and AI/machine learning techniques in order to find the research gap. The research findings are tabulated in Table 1. The summary leads to an understanding of the correlation between diseases such as diabetes, hypertension, and depression and cognitive impairment. A few drugs are also identified as a main cause of Alzheimer's and related dementia. Various statistical techniques such as ANOVA, the t-test, Kaplan-Meier estimates (a survival estimation function), and QUADAS-2 (diagnostic test against a reference value) are used to analyze the data. The main issue to be addressed when using a statistical tool is the sampling error present in the dataset, which can lead to wrong conclusions. Applying a suitable machine learning algorithm can provide a better solution. The extensive review shows the importance of analyzing the cognitive level of the patient and the role of machine learning algorithms in prediction and classification. Our proposed research work considers the diabetes and hypertension details of patients and applies a suitable machine learning algorithm and a cognitive ability test to predict the risk of Alzheimer's in a person's late life.

Proposed Work
Table 1 (continued): Research findings.

(continued from the preceding entry)
Findings: It is perceived that the duration of disease has a direct association with a decline in cognition.

Zhao et al. [13], BMC Endocrine Disorders, 2020
Methodology: The authors conducted a cohort study among type 2 diabetes patients aged >55 years. The data set is divided into three groups according to the level of HbA1c: HbA1c < 7.7%, HbA1c between 7.8% and 8%, and HbA1c greater than 8%. Univariate and multivariate regression analysis was done to find the correlation between the level of HbA1c and cognitive decline.
Findings: It is noted that HbA1c greater than 8% is an important factor in determining the level of cognitive decline.

Moore et al. [14], PLoS One, 2019
Methodology: The data set of the TADPOLE grand challenge, in association with ADNI, is taken in their case study. ADAS-13 score and normalized ventricles volume were used to analyze the severity of Alzheimer's disease. The random forest model is simulated in their study.
Findings: The outcome of the model was effective and comparable with other methods. However, image processing adds overhead.

Altaf et al. [15], Biomedical Signal Processing and Control, 2018
Methodology: Clinical data and MR imaging available in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset are analyzed using a multiclass classification algorithm. The MMSE test is included finally.
Findings: The images are classified into three different classes: AD, normal, and MCI. Overhead due to image processing needs to be addressed.

Exalto et al. [16], Alzheimer's Dement, 2014
Methodology: A retrospective cohort study was conducted among 9480 Kaiser Permanente members from 1964 to 1973, aged 40-55 (CAIDE). The midlife vascular risk factors are analyzed using the C statistic and Kaplan-Meier estimates to predict dementia.
Findings: Disease prevention strategies need to take a life-course perspective on maintaining vascular health.

The proposed model aims at early prediction of cognitive decline using cognitive, clinical, and physical data from a person's history. Dementia has a lengthy preclinical period during which there are no perceptible cognitive impairments, but neurodegenerative changes are happening. Therefore, it is essential to identify individuals at high risk of dementia at an early stage to protect them from the possibility of disease in late life [16]. A population-based study within a precise age group supports our understanding with less potential bias. Studies looking at mid-age people with chronic diseases are particularly helpful, since the chances of dementia development are higher for them. Midlife hypertension increases the risk of lacunar infarcts and stroke, which in turn increases the risk of VaD. The existing system requires imaging data or fluid collection, which delays early detection. The huge volume of Electronic Health Records available in structured and unstructured form supports timely diagnosis and decisions. Collecting administrative and electronic medical data requires less time. Making viable use of information and attaining precise outcomes is a major challenge in many fields, particularly the medical field. Machine learning is used in almost all fields, such as image processing, language automation, computer vision, and e-business. Predictive machine learning models can be applied to these valuable digitized health records for early risk prediction of VaD and AD.
Chronic diseases such as diabetes, high blood pressure, heart problems, and kidney infection are increasing worldwide. Diabetes and blood pressure have been observed to have a strong relation with cognitive decline in elderly people [24,25]. Poor diabetic control and poor adherence to physician instructions are the primary reasons for the rise of AD or dementia in late life [26,27]. Early detection helps to prevent AD through proper diabetic control, drugs, cognitive training, and so forth. The research findings in [28] state that there is a direct connection between glucose dysregulation and neurodegeneration. Diabetes is viewed as a key risk factor for cognitive impairment, and a few investigations show that cognitive dysfunction affects both older and younger persons with diabetes [29,30]. Type 2 diabetes patients ought to be [...] The CAT helps the physician to assess the cognitive function of the patient at an early stage [13].
The proposed work applies a support vector machine learning algorithm to identify people with a high risk of cognitive impairment in late life, who are then given the CAT screening test [31,32]. The test results are analyzed with the help of Multinomial Logistic Regression to classify them as "Severe Dementia," "Uncertain Dementia," and "No Dementia."

Data Set.
The primary focus of the proposed work is to provide health care service to the elderly population residing in resource-poor areas. People aged between 40 and 65 years are considered mid-age people in our case study. The proposed work mainly considers hypertension and diabetes as the most common risk factors for cognitive decline. An appropriate data set available in the "Data World" repository is taken and filtered for the required features. A plausible cross-sectional examination provides the best technique for analyzing a causal connection between diabetes, blood pressure (BP), and the occurrence of dementia. To enhance the analysis, we consider two age classifications, namely midlife (<65 years) and late life (>65 years). The emphasis on midlife is especially relevant for dementia prevention for two reasons: (i) midlife is sufficiently early to establish an association between risk factors and Alzheimer's before the onset of neurodegeneration; (ii) a few studies have shown a connection between raised BP in midlife (age 40-64 years) and the onset of dementia and AD in late life. This study considers general health data available in "Data World," the world's largest collaborative data community. The database consists of 19 features and 2361 patient records, a snippet of which is shown in Table 2.

Feature Selection.
The data set contains 19 features describing age, cholesterol, glucose, etc. Since Hemoglobin A1c (HbA1c) is an important measure of long-term glucose control in the body, it was the main factor considered in the early identification of AD [30]. Along with HbA1c, the patient's systolic and diastolic BP were examined. The list of features and their descriptions are given in Table 1. An HbA1c value greater than 6.5 is considered diabetes positive. The data set is separated into two distinct sets by age to assess midlife attributes and their association with AD. Exploratory data analysis is performed to summarize the main characteristics of the data and to find important features with the help of visual aids. Multivariate analysis is performed to understand the different features and their interactions. Table 3 summarizes the feature descriptions. The correlation coefficient associated with each pair of features helps to extract relevant attributes for study. Highly influential factors such as age, gender, HbA1c, glucose, systolic pressure, diastolic pressure, and cholesterol are considered in our case study.
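As a rough illustration of correlation-based feature selection, the sketch below computes the Pearson correlation between each feature and a binary risk label and keeps the features whose absolute correlation reaches a cutoff. The toy records, the `risk` label, the column names, and the 0.3 cutoff are all assumptions made for illustration; none of them come from the study's data set.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, target, cutoff=0.3):
    """Return names of columns whose |correlation| with target is >= cutoff."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= cutoff]

# Hypothetical toy data (not taken from the study's data set).
columns = {
    "hba1c":    [5.0, 5.4, 6.8, 7.2, 8.1, 5.1],
    "systolic": [118, 122, 145, 150, 160, 115],
    "const":    [1, 1, 1, 1, 1, 1],
}
risk = [0, 0, 1, 1, 1, 0]  # hypothetical binary label

selected = select_features(columns, risk)
print(selected)
```

In practice the same idea would be applied to the full feature table, and the cutoff would be chosen by inspecting the correlation values rather than fixed in advance.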

Process Flow.
The process flow of the proposed model to predict the level of cognitive decline is shown in Figure 1.
The diabetes and blood pressure data set collected from "Data World" is preprocessed to remove redundant information and missing values. The most influential features are extracted with the help of correlation values. We apply a two-stage classification model to determine cognitive decline more accurately. In the first stage, the selected set of features is applied to the classifier algorithm to identify the associated risk among the population. Support Vector Machine and Random Forest algorithms are used for risk classification. In the second stage, we apply the CAT among the people identified in stage 1. The Multinomial Logistic Regression algorithm examines the CAT results, and medical care is provided for the people predicted as "Severe Alzheimer".

Algorithm
Step 1: Input: Patients with Blood Pressure, Diabetes dataset. Filter 40-65 age group data. Handle inconsistent and missing data. Output: Preprocessed data.
Step 2: Identify correlation between features of the dataset using multivariate analysis.
Step 3: Do initial classification for Alzheimer's Disease using the stage 1 classifier.
Step 4: If AD is possible, perform the CAT on those patients.
Step 5: Perform second-level classification to find AD Dementia, Uncertain Dementia, or No Dementia using the stage 2 classifier.
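The steps of the algorithm can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the record fields (`id`, `age`, `hba1c`, `systolic`, `diastolic`) are hypothetical names, Step 2 (correlation analysis) is omitted, a simple threshold rule stands in for the trained stage 1 classifier (SVM or Random Forest), and fixed score bands stand in for the stage 2 multinomial logistic regression. The thresholds mirror the wording of the text (HbA1c above 6.5; systolic above 140 mm Hg and diastolic above 90 mm Hg; CAT bands 25-30, 13-24, below 13).

```python
def preprocess(records):
    """Step 1: keep the 40-65 age group and drop records with missing values."""
    required = ("age", "hba1c", "systolic", "diastolic")
    return [r for r in records
            if all(r.get(k) is not None for k in required)
            and 40 <= r["age"] <= 65]

def stage1_at_risk(record):
    """Step 3 stand-in: a threshold rule instead of a trained SVM / Random Forest."""
    diabetic = record["hba1c"] > 6.5
    hypertensive = record["systolic"] > 140 and record["diastolic"] > 90
    return diabetic or hypertensive

def stage2_label(cat_score):
    """Step 5 stand-in: score bands instead of multinomial logistic regression."""
    if cat_score >= 25:
        return "No Dementia"
    if cat_score >= 13:
        return "Uncertain Dementia"
    return "Severe Dementia"

def run_pipeline(records, cat_scores):
    """Steps 1, 3, 4, 5: return {patient id: label} for at-risk patients only."""
    results = {}
    for r in preprocess(records):
        if stage1_at_risk(r):  # Step 4: the CAT is given only to at-risk patients
            results[r["id"]] = stage2_label(cat_scores[r["id"]])
    return results

# Hypothetical records and CAT scores, for illustration only.
records = [
    {"id": "p1", "age": 52, "hba1c": 7.1, "systolic": 130, "diastolic": 85},
    {"id": "p2", "age": 47, "hba1c": 5.2, "systolic": 118, "diastolic": 76},
    {"id": "p3", "age": 70, "hba1c": 8.0, "systolic": 150, "diastolic": 95},
    {"id": "p4", "age": 60, "hba1c": None, "systolic": 150, "diastolic": 95},
    {"id": "p5", "age": 58, "hba1c": 5.5, "systolic": 152, "diastolic": 96},
]
cat_scores = {"p1": 27, "p5": 11}

pipeline_result = run_pipeline(records, cat_scores)
print(pipeline_result)
```

Here `p3` is dropped by the age filter, `p4` by the missing-value check, and `p2` is not flagged by the stage 1 rule, so only `p1` and `p5` reach the second stage.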

Flowchart.
The process flow of the proposed model to predict the level of cognitive decline is represented as a flowchart in Figure 2.

Security and Communication Networks

Support Vector Machine.
SVM is a commonly used supervised classifier, which classifies data in N-dimensional space using a hyperplane. It has been applied in a large number of healthcare applications to predict diseases from structured data [33]. Figure 3 shows the classification graph of SVM. The line function y = ax + b helps to easily separate linearly separable data. The SVM transforms the line equation into a hyperplane, which is applied in the prediction process. The model tries to find the optimal bias and variance for both the training and test data sets. The comprehensive review has shown that the support vector machine provides good performance for big data and healthcare applications.
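To make the hyperplane idea concrete, the sketch below evaluates a linear decision function of the form w·x + b, which generalizes the line y = ax + b to several features; the sign of the result tells which side of the hyperplane a point lies on. The weights, bias, and feature order (HbA1c, systolic BP) are made-up values for illustration, not parameters learned in this study. A real SVM would learn them from training data by maximizing the margin between the classes.

```python
def decision_value(weights, bias, x):
    """Signed score w.x + b; its sign indicates the side of the hyperplane."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    """Return 1 (at risk) if x lies on the non-negative side, else 0."""
    return 1 if decision_value(weights, bias, x) >= 0 else 0

# Hypothetical hyperplane over (HbA1c, systolic BP); not learned from data.
weights = [1.0, 0.05]
bias = -13.5

high = predict(weights, bias, [7.2, 150])  # score is roughly 7.2 + 7.5 - 13.5
low = predict(weights, bias, [5.1, 118])   # score is roughly 5.1 + 5.9 - 13.5
print(high, low)
```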

Random Forest.
The Random Forest algorithm trains n different decision trees with different data subsets and tuning parameters [34]. It combines the outputs of all n trees with the help of a voting mechanism. Hence, it is a form of ensemble learning. The working principle of the Random Forest algorithm is depicted in Figure 4.
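The voting mechanism can be sketched in a few lines. In this minimal example each "tree" is a hypothetical one-rule function rather than a trained decision tree, and the feature names and thresholds are assumptions chosen only to show how individual votes are combined.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the trees' predictions."""
    return Counter(predictions).most_common(1)[0][0]

def forest_predict(trees, x):
    """Ask every tree (a callable x -> label) and combine by majority vote."""
    return majority_vote([tree(x) for tree in trees])

# Three hypothetical one-rule "trees"; a real forest would train each tree
# on a different bootstrap sample of the data.
trees = [
    lambda r: "risk" if r["hba1c"] > 6.5 else "no risk",
    lambda r: "risk" if r["systolic"] > 140 else "no risk",
    lambda r: "risk" if r["hba1c"] > 6.0 and r["systolic"] > 130 else "no risk",
]

vote_a = forest_predict(trees, {"hba1c": 6.8, "systolic": 128})
vote_b = forest_predict(trees, {"hba1c": 7.0, "systolic": 150})
print(vote_a, vote_b)
```

Using an odd number of trees for a two-class problem avoids tied votes.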

Mental Ability Test.
The cognitive decay of a person ranges from mild to severe. The primary causes include medications, blood vessel disorders, depression, and dementia. Dementia represents a severe loss of mental functioning, and the most common type is Alzheimer's. Cognition includes a blend of processes in the brain involved in all facets of a person's life. It includes memory capacity, thinking skill, language, and the ability to learn new things. A cognitive ability test is performed to examine the cognitive impairment of a person. Based on a detailed review of methods for screening cognitive function, we framed multiple questions to check the decline in mental function; the test questionnaire is given in Table 4 [35]. CAT scores from 25 to 30 are considered normal [36]. The tasks considered in the CAT to evaluate cognitive status address orientation, memory, attention, recall, naming objects, responding to verbal and written commands, writing a sentence, and copying a figure. An informant accompanied each patient, and the questions were administered to the informant without unduly alarming the patient. The maximum CAT score is 30 points. A score of 25 to 30 suggests no cognitive decline, 13 to 24 suggests moderate decline, and less than 13 indicates severe cognitive decline. Each year, the CAT score of a person with Alzheimer's disease declines by about two to four points on average. A snippet of the CAT dataset is shown in Table 5.
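As a worked example of the score bands and the yearly decline quoted above, the sketch below projects the range of CAT scores after a number of years and maps each end of the range to a band. The linear extrapolation is an assumption made for illustration (the text only gives an average yearly decline for persons with Alzheimer's disease), the lower band is taken as scores below 13, and the starting score of 26 is hypothetical.

```python
def cat_band(score):
    """Map a CAT score (0-30) to the band described in the text."""
    if score >= 25:
        return "no cognitive decline"
    if score >= 13:
        return "moderate decline"
    return "severe cognitive decline"

def projected_range(score, years, low_rate=2, high_rate=4):
    """Range of CAT scores after `years`, assuming a 2-4 point yearly decline."""
    worst = max(0, score - high_rate * years)
    best = max(0, score - low_rate * years)
    return worst, best

start = 26  # hypothetical current score
for years in (0, 2, 4):
    worst, best = projected_range(start, years)
    print(years, (worst, best), cat_band(worst), "/", cat_band(best))
```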

Multinomial Logistic Regression on CAT.
The multinomial logistic regression model is applied to predict the severity of illness by assigning the dependent variable to one of the classes "Severe Dementia," "Uncertain Dementia," and "No Dementia." Multinomial logistic regression is applicable to problems that have more than two outcomes. Our proposed model has three different outcomes. For N different outcomes, N-1 models are developed as a set of independent binary regressions. One outcome is chosen as the pivot (reference) class, and the others are regressed against it. The probabilities for the N categories are estimated from the explanatory variables:

\[ \Pr(Y_i = k \mid X_i;\ \beta_1, \beta_2, \ldots, \beta_n) \]

where Y is the dependent variable, X is the set of explanatory variables, and βk is the regression coefficient for the k-th category of Y. Based on the estimated probability, the algorithm categorizes the output with reference to the threshold.

Figure 2: Flow chart.
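The pivot-class formulation can be sketched as follows, using the standard form in which each non-pivot class k has probability exp(βk·x) / (1 + Σ exp(βj·x)) and the pivot class has probability 1 / (1 + Σ exp(βj·x)). The coefficient values, the choice of "No Dementia" as pivot, and the use of the CAT score as the only explanatory variable are assumptions for illustration; they are not fitted values from the study.

```python
import math

def class_probabilities(x, betas):
    """Pivot-class multinomial logit.

    betas holds one coefficient vector per non-pivot class; the pivot class
    has an implicit score of 0. Returns the probabilities of the non-pivot
    classes in order, followed by the pivot class.
    """
    scores = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)  # pivot class
    return probs

classes = ["Severe Dementia", "Uncertain Dementia", "No Dementia"]  # last = pivot

# Hypothetical coefficients for x = (1, CAT score); the leading 1 is the intercept.
betas = [
    [9.0, -0.5],   # Severe Dementia vs. pivot
    [6.0, -0.25],  # Uncertain Dementia vs. pivot
]

def predict_class(cat_score):
    """Return the most probable class for a given CAT score."""
    probs = class_probabilities([1.0, cat_score], betas)
    return classes[probs.index(max(probs))]

for cat_score in (8, 20, 29):
    print(cat_score, predict_class(cat_score))
```

With three outcomes, only two coefficient vectors are needed, which matches the N-1 binary regressions described in the text.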

Stage 1 Classifier.
In stage 1 we train the SVM and Random Forest algorithms to diagnose chronic disease and to identify the associated risk. The data set contains 2361 records of mid-age people. The glyhb values between 4% and 5.6% are considered normal, and values between 5.7% and 6.4% indicate a higher chance of developing diabetes. Values above 6.5% indicate diabetes. The patient's systolic and diastolic pressures are the other important factors to be considered in the development of Alzheimer's. Systolic pressure less than 120 mm Hg and diastolic pressure less than 80 mm Hg are considered normal, and the ranges 120-139 mm Hg systolic and 80-89 mm Hg diastolic are prehypertension values. Persons with systolic pressure >140 mm Hg and diastolic pressure >90 mm Hg are considered to have hypertension. The chosen data set contains 526 records of persons with neither diabetes nor high blood pressure, 1187 records of persons having diabetes or high blood pressure, and 648 records of patients having both. Since the presence of either condition increases the chance of dementia, all 1835 patients having diabetes, high blood pressure, or both were given the CAT to assess their cognitive ability. The details of the records are visually represented in Figure 5.

Table 4 (excerpt): CAT questionnaire.
No. | Question | Response | Score
 | Does the patient's comfort level change when they are in new places? | Y | 1
4 | Is the patient able to manage their medication schedule? | Y | 2
5 | Is the patient able to manage time while doing tasks? | Y | 1
6 | Does the patient get confused about certain things? | Y | 2
7 | Is the patient able to understand context? | Y | 1
8 | Does the patient get confused when identifying known persons? | Y | 2
9 | Does the patient experience difficulty recognizing people familiar to them? | Y | 1
10 | Is the patient's behavior different from their earlier stages? | Y | 1
11 | Does the patient have imaginations? | Y | 2
12 | Does the patient forget to do regular tasks? | Y | 2
13 | Does the patient have problems counting numbers or figures? | Y | 2
14 | Is the patient able to manage position or direction? | Y | 1
15 | Has the patient shown less priority or interest towards a hobby or passion? | Y | 1
16 | Does the patient understand situations or explanations? | N | 1
17 | Does the patient forget recent activities? | N | 1
18 | Has the patient had any cognitive issues previously? | Y | 2
19 | Is the patient unable to recall main or important occasions? | Y | 2
20 | Is the patient unable to recollect some important days in his life? | Y | 2

(iii) Specificity. Specificity, or True Negative Rate, is given as follows:

\[ \text{Specificity} = \frac{TN}{TN + FP} \]

The terms TN, TP, FP, and FN denote True Negative (a person with no chance of Alzheimer's is identified as 'No Alzheimer'), True Positive (a person with Alzheimer's is predicted as 'Alzheimer'), False Positive (a healthy person is detected as 'Alzheimer'), and False Negative (a person suffering from Alzheimer's is identified as healthy), respectively.
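Using the TP, TN, FP, and FN counts defined above, the usual evaluation measures can be computed as in the sketch below. Only specificity is described in the surviving text; the accuracy and sensitivity formulas shown here are the standard definitions and are included as an assumption about which measures were used. The counts are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """True Positive Rate: fraction of actual positives that were detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True Negative Rate: fraction of actual negatives that were cleared."""
    return tn / (tn + fp)

# Hypothetical confusion counts, for illustration only.
tp, tn, fp, fn = 45, 40, 10, 5

print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
```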

Performance Analysis of Stage 1 Classifier.
The proposed SVM classifier performs better, with an AUC of 0.90, whereas the AUC for Random Forest is 0.74. The probabilistic classifier shows the tradeoff between sensitivity and specificity. Table 2 shows the performance comparison of the SVM and Random Forest algorithms used in our case study. NDP represents patients with No Diabetes and No Pressure, CRD represents patients having either diabetes or pressure, and HDP represents patients Having both Diabetes and Pressure. The performance comparison of each measure is given in Table 6. The same is visually represented in Figures 6-8.
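For readers unfamiliar with AUC, one way to compute it is as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as one half. The sketch below implements that pairwise definition; the labels and scores are hypothetical and are unrelated to the AUC values reported for SVM and Random Forest.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = at risk) and classifier scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]

auc_value = auc(labels, scores)
print(round(auc_value, 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for a data set of a few thousand records; a rank-based computation would be preferable for much larger data.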
In the Random Forest algorithm, it is important to consider the subsampling of data points in the tree construction process. Excessive subsampling or no subsampling gives inconsistent results. It is possible to enhance the accuracy of the Random Forest algorithm by varying its parameters. Due to the unavailability of a sample data set, it was not possible to fine-tune the parameters for RF in our case study.

Stage 2 Classifier.
The CAT result dataset covers a minimum age of 47 years and a maximum of 96 years. The Clinical Dementia Rating (CDR) is a numeric scale used to quantify the severity of dementia symptoms; its score ranges from 0 (none) to 3 (severe). A summary of the CAT data set is provided in Table 7 and visually represented in Figure 9.
The confusion matrix of multinomial logistic regression is considered for analysis. The True Positive Rates (TPR) for the three output classes are: TPR(No Alzheimer) = 98%, TPR(Uncertain Alzheimer) = 85%, and TPR(Alzheimer) = 81%. The True Negative Rates (TNR) for the three classes are: TNR(No Alzheimer) = 86%, TNR(Uncertain Alzheimer) = 73%, and TNR(Alzheimer) = 75%. The results show that the proposed model can predict with improved accuracy, provided an ample amount of data is available for training.
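Per-class rates of this kind are typically derived from a multiclass confusion matrix by treating each class in turn as "positive" and all other classes as "negative" (one-vs-rest). The sketch below shows that computation; the matrix values are hypothetical and do not reproduce the percentages reported above.

```python
def per_class_rates(matrix, labels):
    """One-vs-rest TPR and TNR per class.

    matrix[i][j] is the number of cases of true class i predicted as class j.
    Returns {label: (tpr, tnr)}.
    """
    total = sum(sum(row) for row in matrix)
    rates = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp   # predicted k, true class differs
        tn = total - tp - fn - fp
        rates[label] = (tp / (tp + fn), tn / (tn + fp))
    return rates

labels = ["No Alzheimer", "Uncertain Alzheimer", "Alzheimer"]

# Hypothetical confusion matrix (rows = true class, columns = predicted class).
matrix = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]

rates = per_class_rates(matrix, labels)
for label, (tpr, tnr) in rates.items():
    print(label, round(tpr, 2), round(tnr, 2))
```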

Analysis with Bench Mark Models.
The study of neurodegenerative diseases caused by the ageing of brain systems necessitates brain banking. The Brazilian Aging Brain Study Group's Brain Bank collects a large number of elderly brains and their related disorders. It encourages researchers to look at a variety of aspects of ageing brain processes and related neurodegenerative illnesses. Table 8 compares the performance of our proposed model with existing benchmark models. The table details the data sets used by different authors and the machine learning algorithms employed. Reference [13] considers people above 65 years from 2002 to 2010. The main features considered in their case study include International Classification of Diseases (ICD-10) codes, laboratory results, medication codes, sociodemographics, and the illnesses of a person and his family [37]. They trained and tested the dataset with random forest, logistic regression, and SVM to predict Alzheimer's incidence in 1, 2, 3, and 4 subsequent years. For comparison, the average over the 4 years is taken. The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge is organized in association with the Alzheimer's Disease Neuroimaging Initiative (ADNI) to find people at risk of Alzheimer's. The historical measurements of the participants were considered to predict future outcomes. The TADPOLE challenge facilitates early identification of Alzheimer's disease with the help of an appropriate algorithm. Reference [14] used data from the TADPOLE grand challenge and compared their result with a benchmark SVM, which produces 62% AUC and a classification accuracy of 52%. Reference [15]

Conclusion
Automated healthcare techniques support physicians in making correct decisions on patient care in resource-poor rural areas. Timely identification of risk factors with the help of AI-based models helps safeguard a person from late-life Alzheimer's. Obtaining an appropriate dataset with relevant attributes is a cumbersome part of developing a more accurate model. The proposed method supports the statistically significant diagnosis of persons at risk for Alzheimer's disease based simply on administrative health records. It allows earlier and more accurate screening for further clinical testing. Our proposed work analyzes the influence of hypertension and diabetes on Alzheimer's disease. The Support Vector Machine algorithm is more suitable when the dataset is not continually distributed. The performance of SVM is relatively good due to its convex optimization nature. A survey of cognitive assessment conducted among the population with chronic disease indicates the degree of cognitive decline in the community. The CAT results are analyzed with the help of multinomial logistic regression to identify the possibility of Alzheimer's in a patient's late life. To achieve optimum accuracy of the model, a large sample size is essential. In the future, the proposed work may be extended with more classifiers by accumulating a larger volume of samples and an increased number of CAT surveys. A time-series CAT survey among the population would further improve the precision of prediction.

Data Availability
The datasets used and/or analyzed during the current study are available in the following repository: https://staff.pubhealth.ku.dk/~tag/Teaching/share/data/Diabetes.html.

Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.

Authors' Contributions
A. Revathi was responsible for conceptualization, data curation, formal analysis, methodology, software, and writing-original draft; R. Kala Devi was responsible for supervision, writing-review and editing, project administration, and visualization; Kadiyala Ramana was responsible for software, validation, writing-original draft, methodology, and supervision; Rutvij H. Jhaveri was responsible for supervision, writing-review and editing, and visualization; Madapuri Rudra Kumar was responsible for data curation, investigation, resources, and software; M. Sankara Prasanna Kumar was responsible for visualization, investigation, formal analysis, and software.