A detailed investigation in determining Alzheimer’s disease and its risk factors using different classification techniques

The prevalence of genetic disorders has risen markedly in recent years. Neurodegenerative complications, specifically, place physical and mental stress on parents and caretakers. Such complications are seen in dementia. Alzheimer’s disease is the most common type of dementia, accounting for 60 to 80 per cent of dementia cases. Detecting the illness at an early stage is a critical task, as it helps the affected person maintain a decent quality of life. Automated techniques such as data mining have therefore become a necessary strategy for early diagnosis and for assessing the risk factors associated with Alzheimer’s. Interest in devising novel approaches for classifying the disease has grown rapidly in recent years. However, there is still a pressing need for effective approaches to prognosis and classification. In this work, data mining is carried out with different machine learning approaches to assess the risk factors for Alzheimer’s disease. We compare several classification methods, namely Decision Tree, Linear SVM, KNN, Logistic Regression, Radial SVM, and Random Forest, and report the best-performing approach in terms of accuracy.


INTRODUCTION
Approximately 44 million people have dementia (Shree et al., 2014), and 38 million people are living with Alzheimer's disease. Alzheimer's disease is one form of dementia (Viswanathan et al., 2009; Sosa et al., 2009). It was first described in 1906 by the German physician Alois Alzheimer (Sandeep et al., 2015). Multiple distinct risk factors contribute to the progression of the disease (Shree et al., 2014), including height, Down syndrome, alcohol consumption and smoking, diet, and cholesterol. Symptoms of the disorder include impaired interpersonal coordination and decision making, severe memory loss, failure of gestures, poor judgment, and irregular moods. The three steps of AD diagnosis are a visit to a general practitioner, neuropsychological assessments, and MRI scans (Saling et al., 2007). Medical specialists such as physicians differ significantly from one another, and practitioners do not reveal how they arrive at a prediction for a specific illness. This problem could therefore be overcome by a prediction approach that captures expert experience and yields reliable disease prediction outcomes. In this research, we use several kinds of machine learning algorithms.

Literature survey
Alzheimer's disease accounts for 60 to 80% of dementia cases, and such disorders often remain undiagnosed at an early stage (Sandeep et al., 2017a,b). Diagnosis through a general practitioner involves three main stages.
The first stage is a consultation, the second consists of multiple neuropsychological assessments, and MRI scans are taken in the third stage (Thies and Bleiler, 2013). AD requires a screening test that can be applied to subjects regardless of culture, gender, education, and religion. The 10/66 Dementia Research Group formed a network in 1998 and dedicated itself to research of the highest standard in these areas. The assessment phase also depends on the psychologist's mood, and human error is difficult to prevent. Machine-based analysis could resolve this problem, so researchers have turned to a data mining approach to extract information.
Data mining can be performed using methods such as analytics, artificial intelligence, and machine learning. Various scholars have applied data mining to the study of different diseases (Shree et al., 2014). Decision tree and Bayesian classification have been used to evaluate datasets of patients with heart disease (Soni et al., 2011). Classification algorithms have been used to classify Parkinson's disease (Tarigoppula et al., 2013). A machine learning approach has been used to classify Alzheimer's disease, vascular dementia, and Parkinson's disease (Joshi et al., 2010). That work shows that risk factors are highly significant for the correct classification of AD, VD, and PD, a conclusion drawn from assessing 180 related investigations. The study reported a precision of 99.33 per cent obtained using a multilayer perceptron and random forest (Tarigoppula et al., 2013). Machine learning investigations of Alzheimer's disease were discussed by Escudero et al. (2013).

Architectural framework
The workflow of the present study is shown in Figure 1.

Dataset Array
The data are the most significant component of this study. The dataset comprises 750 medical reports obtained from various neuropsychologists. The reports cover four age ranges: 65-69, 70-75, 76-79, and over 80 years of age.

Preprocessing
This is the stage at which missing and incorrect values are identified. No data preprocessing is carried out here, as the dataset contains no missing values.
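The paper does not describe how missing or incorrect values were checked. As a minimal sketch, assuming each medical report is stored as a Python dict, the hypothetical schema `ALLOWED` below (its attribute names and value codes are invented for illustration and are not taken from the dataset) can be used to flag empty or out-of-range entries:

```python
from typing import Dict, List, Optional, Set

# Hypothetical attribute names and allowed codes; the paper does not publish its schema.
ALLOWED: Dict[str, Set[str]] = {
    "age_range": {"65-69", "70-75", "76-79", "80+"},
    "gender": {"M", "F"},
    "family_history": {"yes", "no"},
}

def find_problems(records: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return a description of every missing or out-of-range value."""
    problems = []
    for i, rec in enumerate(records):
        for attr, allowed in ALLOWED.items():
            value = rec.get(attr)
            if value is None or value == "":
                problems.append(f"record {i}: missing {attr}")
            elif value not in allowed:
                problems.append(f"record {i}: invalid {attr}={value!r}")
    return problems
```

An empty result corresponds to the situation described above, in which no preprocessing is needed.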

Attribute Selection
Attribute selection is a key stage, since certain attributes make a large difference in decision making. The dataset comprises 8 attributes representing the main risk factors for AD: family history, age, environmental toxins, gender, head injury, factors including high BP and cholesterol level, low education level, and lifestyle.
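The paper does not state how attributes were ranked, although the results refer to correlation features and a correlation matrix. One possible approach, sketched here under the assumption that every attribute is numerically encoded and the AD label is 0/1, ranks attributes by the absolute Pearson correlation with the label. Attribute names used with it are illustrative.

```python
import math
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def rank_attributes(columns: Dict[str, List[float]],
                    label: List[float]) -> List[Tuple[str, float]]:
    """Sort attributes by absolute correlation with the label, strongest first."""
    scores = [(name, pearson(col, label)) for name, col in columns.items()]
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)
```

Correlation only captures linear association with the label, so a low score does not prove an attribute is irrelevant.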

WEKA Tool
The next stage is classification, which is performed to understand how the data are categorized. The WEKA tool is used for this research. The classification algorithm is run several times to maximize precision. WEKA provides two components for evaluating learning: the first is the classifier and the second is cross-validation.
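WEKA performs cross-validation internally; the sketch below only illustrates the idea of k-fold cross-validation and is not WEKA's API. It assumes a trainer is any function that takes training data and returns a predict function; `majority_trainer` is a hypothetical baseline included only to exercise the routine.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]
Trainer = Callable[[List[Sequence[float]], List[int]], Model]

def k_fold_accuracy(X: List[Sequence[float]], y: List[int], k: int,
                    train: Trainer) -> List[float]:
    """Return the accuracy on each of k contiguous folds (no shuffling)."""
    n = len(X)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    scores, start = [], 0
    for size in fold_sizes:
        test_idx = range(start, start + size)
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        model = train(train_X, train_y)  # fit on the other k - 1 folds
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / size)
        start += size
    return scores

def majority_trainer(X, y):
    """Hypothetical baseline: always predict the most frequent training label."""
    label = max(set(y), key=y.count)
    return lambda x: label
```

In practice the data would usually be shuffled (and often stratified by class) before splitting; contiguous folds keep the example deterministic.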

Linear SVM (Linear support vector machine)
Linear SVM is a classification technique well suited to data mining on large datasets. Compared with the other techniques evaluated in this study, Linear SVM performed best.
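To make the idea concrete, the sketch below trains a linear SVM with a Pegasos-style stochastic subgradient method on the regularized hinge loss. This is one common way to train a linear SVM, assumed here for illustration; it is not necessarily the solver used in the study. Labels must be -1 or +1, and the values of `lam` and `epochs` are arbitrary choices.

```python
from typing import List, Sequence

def train_linear_svm(X: List[Sequence[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Pegasos-style subgradient descent on lam/2*||w||^2 + mean hinge loss.

    A constant 1.0 feature is appended so the last weight acts as the bias.
    Points are visited in a fixed order to keep the sketch deterministic.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(data, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularization term
            if margin < 1:  # hinge loss is active for this point
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w: List[float], x: Sequence[float]) -> int:
    """Sign of the linear decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```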

Radial SVM (Radial support vector machine)
Radial SVM is an SVM that uses the radial basis function (RBF) kernel, a popular kernel function in various kernelized learning algorithms. It is commonly used in Support Vector Machine methodology (Chang et al., 2010), for example to optimize an SVM model that forecasts bankruptcy. While the RBF kernel is commonly used for fitting data because of its stability, other common kernels, such as the polynomial or sigmoid kernel, are also used (Joshi et al., 2010).
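The RBF kernel is defined as K(x, z) = exp(-gamma * ||x - z||^2). The sketch below computes it and evaluates a kernel SVM decision function for already-known support vectors and coefficients; finding those coefficients requires an SVM solver, which is not shown. The value `gamma=0.5` and the name `rbf_decision` are illustrative assumptions.

```python
import math
from typing import List, Sequence

def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2); equals 1 when x == z."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def rbf_decision(support_vectors: List[Sequence[float]], dual_coefs: List[float],
                 bias: float, x: Sequence[float], gamma: float = 0.5) -> float:
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; the class is the sign of f(x)."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias
```

Because the kernel decays with distance, each support vector mainly influences predictions in its own neighbourhood; a larger `gamma` makes that neighbourhood smaller.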

Logistic Regression
Logistic regression is a statistical method for predicting binary classes. The target variable is dichotomous, meaning there are only two possible classes. The method calculates the probability that an event occurs.
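A minimal sketch of how that probability is obtained: the model passes a linear score through the sigmoid function, and the weights are fitted here by batch gradient descent on the mean log-loss. Gradient descent, the learning rate `lr`, and `epochs` are assumptions for illustration; WEKA and other libraries may use different optimizers and regularization.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; y holds 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w: List[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

A class label is obtained by thresholding the probability, typically at 0.5.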

KNN (K Nearest Neighbor)
The K Nearest Neighbor algorithm has been used in various data analyses because of its simplicity and high accuracy (Xiong et al., 2007). It has been recognized as one of the top 10 algorithms in data mining. With the value of k estimated by 10-fold cross-validation, an accuracy of 97.4% has been obtained (Wu et al., 2008).
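The algorithm itself is short: a new record receives the majority label of its k closest training records. The sketch below assumes numerically encoded attributes and Euclidean distance (Python 3.8+ for `math.dist`); in the setting described above, k would be chosen by cross-validation rather than fixed at the illustrative default of 3.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

def knn_predict(train_X: List[Sequence[float]], train_y: List[Hashable],
                x: Sequence[float], k: int = 3) -> Hashable:
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

There is no training phase; all the work happens at prediction time, which is why KNN is simple but can be slow on large datasets.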

Decision Tree
A decision tree is a flowchart-like tree structure in which each internal node represents a test on a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome. This flowchart-like structure supports decision making. Like a flowchart diagram, it is a visualization that imitates human-level reasoning, which makes decision trees easy to understand and interpret.
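A minimal sketch of tree construction for categorical risk-factor attributes, assuming records stored as dicts: at each node the attribute with the lowest weighted Gini impurity is chosen, and growth stops when a node is pure or no attribute helps. Gini impurity is one common splitting criterion, used here as an assumption (WEKA's J48, for example, uses an information-based criterion instead); attribute names in the test are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]

def gini(labels: List[Any]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(rows: List[Row], labels: List[Any], attr: str) -> float:
    """Weighted Gini impurity after branching on every value of attr."""
    groups: Dict[Any, List[Any]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

def build_tree(rows: List[Row], labels: List[Any], attrs: List[str]):
    """Return a leaf label, or an internal node (attr, {value: subtree}, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = min(attrs, key=lambda a: split_impurity(rows, labels, a))
    if split_impurity(rows, labels, best) >= gini(labels):
        return majority  # no attribute reduces impurity: make a leaf
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (best, children, majority)

def classify(tree, row: Row):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, children, default = tree
        if row.get(attr) not in children:
            return default  # unseen value: fall back to the node's majority label
        tree = children[row[attr]]
    return tree
```

Leaves are plain labels and internal nodes are tuples, so this sketch assumes class labels are not themselves tuples.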

Random Forest
Random Forest is a supervised learning algorithm used for both classification and regression, and it is one of the most versatile and simple algorithms. Just as a forest is composed of trees, a random forest is composed of decision trees. It is also the basis of the Boruta algorithm, which selects the essential attributes of a dataset.
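A simplified sketch of the idea, assuming categorical attributes stored in dicts: each tree is trained on a bootstrap sample (drawn with replacement) using a random subset of attributes, and the forest predicts by majority vote. For brevity each "tree" here is a one-level decision stump rather than a full tree, so this is a simplification of Random Forest and not WEKA's implementation; `n_trees`, `n_attrs`, and `seed` are illustrative parameters.

```python
import random
from collections import Counter
from typing import Any, Dict, List

def gini(labels: List[Any]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def _groups(rows, labels, attr) -> Dict[Any, List[Any]]:
    out: Dict[Any, List[Any]] = {}
    for r, l in zip(rows, labels):
        out.setdefault(r[attr], []).append(l)
    return out

def fit_stump(rows, labels, attrs):
    """One-level tree: split on the attr with the lowest weighted Gini impurity."""
    n = len(labels)
    best = min(attrs, key=lambda a: sum(len(g) / n * gini(g)
                                        for g in _groups(rows, labels, a).values()))
    leaf = {v: Counter(g).most_common(1)[0][0]
            for v, g in _groups(rows, labels, best).items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen values
    return best, leaf, default

def fit_forest(rows, labels, attrs, n_trees=25, n_attrs=1, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    n, forest = len(rows), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        sub_attrs = rng.sample(attrs, n_attrs)      # random attribute subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], sub_attrs))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(leaf.get(row.get(attr), default) for attr, leaf, default in forest)
    return votes.most_common(1)[0][0]
```

Bootstrap sampling and random attribute subsets make the individual trees differ, and averaging their votes is what reduces the variance of a single tree.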

RESULTS AND DISCUSSION
The classification methods were applied to datasets of the risk factors that are major contributors to AD. The accuracy of the classification algorithms before standardization is reported in Table 1. The accuracy of a classifier on a given test set is the percentage of test-set tuples that the classifier identifies correctly. The post-standardization precision and the selected correlation features are reported in Table 2, and the corresponding cross-validation scores in Table 3. From these results, the model with the best accuracy was determined. The correlation matrix of the AD dataset based on the major risk factors is shown in Figure 2.
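The paper does not specify which standardization was used. A minimal sketch, assuming z-score standardization of each numeric attribute, together with the accuracy measure as defined above (percentage of test tuples classified correctly):

```python
import statistics
from typing import Hashable, List, Sequence

def standardize(column: Sequence[float]) -> List[float]:
    """Z-score: subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant attribute: nothing to scale
    return [(v - mean) / std for v in column]

def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Percentage of test tuples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)
```

In a careful evaluation, the mean and standard deviation would be computed on the training portion only and then applied to the test portion, so that no information from the test set leaks into training.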

CONCLUSIONS
Different data mining classification methods have been compared and ranked, and the performance of each procedure has been observed. The classifiers used to predict Alzheimer's disease risk factors are Linear Support Vector Machine, Radial Support Vector Machine, Logistic Regression, K Nearest Neighbor, Decision Tree, and Random Forest. Among them, Linear SVM achieved better accuracy than the other classifiers. This study suggests that Linear SVM classification is a strong protocol for predicting genetic disorders, and we conclude that the linear SVM classification technique could also be applied to risk prediction for other genetic diseases.