Predictive data mining approaches in medical diagnosis: A review of some diseases prediction

Article history: Received: October 18, 2018; Received in revised format: December 20, 2018; Accepted: January 8, 2019; Available online: January 8, 2019

Due to increasing technological advances in all fields, a considerable amount of data has been collected to be processed for different purposes. Data mining is the process of discovering and analyzing hidden information from different perspectives to obtain useful knowledge. Data mining has many applications, one of which is medical diagnosis. Today, many diseases are regarded as dangerous and deadly; heart disease, breast cancer, and diabetes are among the most dangerous. This paper investigates 168 articles on the implementation of data mining for diagnosing these diseases. The study concentrates on 85 selected papers from 1997 to 2018 that have received more attention. All algorithms, data mining models, and evaluation methods are reviewed in detail. The study attempts to determine the most efficient data mining methods used for medical diagnosis. Another significant result of this study is the identification of research gaps in the application of data mining in health care.

© 2019 by the authors; licensee Growing Science, Canada.


Introduction
We live in a world where large volumes of data are collected every day, and analyzing such data plays an essential role in business management (Han et al., 2011). In the past, data were analyzed with traditional methods that relied on manual operations. Such analysis was time-consuming and frustrating, and in many cases it was impractical. Knowledge discovery is considered a significant challenge. Its purpose is to discover useful knowledge, and data mining is one of the steps in knowledge discovery used to obtain useful information. Data mining is the process of detecting and extracting hidden information, patterns, and specific relationships in data for prediction purposes. It is a new discipline with diverse applications and is known as one of the ten leading sciences influencing technology. Wherever data exist, data mining is meaningful, for instance in Market Basket Analysis, Education, Manufacturing Engineering, Customer Relationship Management, Fraud Detection, Intrusion Detection, Lie Detection, Customer Segmentation, Financial-Banking, Corporate Surveillance, Research Analysis, Criminal Investigation, Telecommunication, and Healthcare.
Today, the healthcare industry generates large amounts of complex data on patients, hospital resources, disease diagnoses, electronic patient records, and medical devices. Such large amounts of data are an essential resource for data mining. Healthcare data mining has vast potential, and some of its most critical applications are prediction and diagnosis, treatment effectiveness, healthcare management, fraud and abuse, customer relationship management, and the medical device industry (Koh & Tan, 2011). Choosing the wrong treatment not only wastes time and money but can also cause adverse effects, including the death of patients. A method for diagnosing diseases and selecting the appropriate treatment is therefore essential, and data mining can help predict and identify diseases in this area. Given the importance of early detection, 168 articles on heart disease, breast cancer, and diabetes were selected in this study to review their performance in prediction. After an initial review, 85 studies published between 1997 and 2018 were chosen for analysis. We hope the present study will be helpful for future studies. The paper is organized as follows: Section 2 explains knowledge discovery in databases and data mining concepts. Section 3 describes the research strategy used in this study. Sections 4-6 evaluate and report the review results for heart disease, breast cancer, and diabetes mellitus. Finally, Section 7 presents the conclusion and recommendations for future work.

Knowledge Discovery in Databases
Knowledge discovery in databases (KDD) is the process of identifying useful knowledge in a collection of data. Following the knowledge extraction steps is necessary to obtain essential knowledge; applying data mining blindly can easily lead to meaningless patterns, which is very dangerous. Fig. 1 displays the knowledge discovery steps.
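Fig. 1 is not reproduced here, so the following sketch assumes the widely used five-stage formulation of KDD (selection, preprocessing, transformation, data mining, interpretation/evaluation). It is a minimal Python illustration of how each stage feeds the next; the records, field names, and the trivial per-class summary standing in for the "mining" step are hypothetical placeholders, not a method from any reviewed study.

```python
from statistics import mean

# Hypothetical toy records; the field names are illustrative assumptions.
RAW_RECORDS = [
    {"id": 1, "age": 63, "chol": 233, "target": 1},
    {"id": 2, "age": 37, "chol": None, "target": 0},
    {"id": 3, "age": 41, "chol": 204, "target": 0},
    {"id": 4, "age": 56, "chol": 236, "target": 1},
]

def select(records, fields):
    """Selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Preprocessing: drop records with missing values (one simple cleaning rule)."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, field):
    """Transformation: min-max scale one numeric attribute to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return [{**r, field: (r[field] - lo) / span} for r in records]

def mine(records, field):
    """Data mining placeholder: mean scaled value of `field` for each class."""
    classes = sorted({r["target"] for r in records})
    return {c: mean(r[field] for r in records if r["target"] == c) for c in classes}

def interpret(model):
    """Interpretation/evaluation: turn the mined result into readable statements."""
    return [f"class {c}: mean scaled value {v:.2f}" for c, v in model.items()]

data = select(RAW_RECORDS, ["age", "chol", "target"])
data = preprocess(data)
data = transform(data, "age")
model = mine(data, "age")
for line in interpret(model):
    print(line)
```

Each stage is a separate function so that a real cleaning rule, scaler, or mining algorithm could replace the placeholder without changing the rest of the pipeline.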

Data Mining
Data mining is one of the steps of knowledge discovery in databases and aims to gather helpful information. Several major data mining techniques have been developed and applied in projects, including classification, clustering, and association rules.
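To make the association-rule idea concrete, here is a minimal Apriori-style sketch in Python. It assumes the usual definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X → Y is support(X ∪ Y) / support(X). The transactions and item names are hypothetical, and the code favors clarity over efficiency.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    frequent = {}
    level = {frozenset([i]) for t in sets for i in t}
    level = {c for c in level if support(c) >= min_support}
    k = 1
    while level:
        for c in level:
            frequent[c] = support(c)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

def rules(frequent, min_confidence):
    """Rules X -> Y with confidence = support(X | Y) / support(X)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # lhs is frequent by the Apriori property
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), sup, conf))
    return out

# Hypothetical patient "transactions" (sets of observed findings).
TRANSACTIONS = [
    {"chest_pain", "high_chol", "heart_disease"},
    {"chest_pain", "heart_disease"},
    {"high_chol", "heart_disease"},
    {"chest_pain", "high_chol", "heart_disease"},
    {"high_chol"},
]

freq = apriori(TRANSACTIONS, min_support=0.4)
for lhs, rhs, sup, conf in rules(freq, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```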

Research Strategy
In this paper, 168 articles on heart disease, breast cancer, and diabetes were selected. After the initial searches, 85 studies published between 1997 and 2018 were selected for analysis and final examination. Fig. 2 shows the number of these articles in each area. The studies were identified using databases such as IEEE Xplore, Google Scholar, Science Direct, and Springer Link.

Heart Diseases
Cardiovascular (heart) diseases are conditions of the heart that include diseased vessels, structural problems, and blood clots. Heart disease is so significant that many researchers have investigated its early diagnosis and the effective treatment of cardiovascular diseases. Applying data mining to information about heart patients can create valuable knowledge for improving heart disease diagnosis. The studies on heart disease reviewed here were published between 2008 and 2018. Of these 40 studies, five are review papers and 35 are application papers.

Literature Review
The application of data mining opens a new dimension in cardiovascular disease prediction. Several data mining techniques are used to identify and extract valuable information from clinical datasets (Srinivas et al., 2010). Researchers have investigated numerous ways to implement data mining in healthcare to achieve high prediction accuracy.

   
Table 1 presents comprehensive information about all the implemented methods, and each paper is discussed as follows. Palaniappan and Awang (2008) developed a prototype intelligent heart disease prediction system using data mining techniques, namely Decision Trees, Naive Bayes, and Neural Network. They used the CRISP-DM methodology to build the mining models. The results showed that each method has its own strength in realizing the defined mining goals. Ensemble approaches, which combine multiple data mining algorithms, have proven to be an effective technique for improving classification accuracy. Das et al. (2009) introduced a methodology for diagnosing heart disease. They proposed a Neural Network ensemble method, built with SAS Base software, that combines three independent Neural Network models, and obtained 89.01% classification accuracy from this ensemble model. Tu et al. (2009) proposed using the Bagging algorithm to diagnose heart disease and compared its effectiveness with that of the Decision Tree algorithm. Their results show that Bagging increases accuracy and performs better and more efficiently than the Decision Tree. Rajkumar and Reena (2010) ... 2016) compared the performance of C4.5, CART, and RIPPER as fuzzy rule generators for a fuzzy expert system. The combination of data mining and fuzzy expert systems was successfully used in this research to diagnose coronary heart disease. Joshi et al. (2016) presented a Decision Tree-based classification technique for accurate heart disease prediction. Their results indicate that the proposed method is more accurate than the other methods discussed in that paper. Malav et al. (2017) suggested an efficient hybrid combination of the K-Means clustering algorithm and an Artificial Neural Network. They compared Naive Bayes and K-Nearest Neighbor models with the hybrid method, and the hybrid approach gave a higher accuracy rate. Samuel et al. (2017) developed a fuzzy analytic hierarchy process technique that computes global weights for the attributes based on their contribution. The performance of the newly proposed Decision Support System was evaluated on 297 records with 13 attributes of heart disease patients.
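Tu et al. (2009) report that Bagging outperforms a single Decision Tree. The sketch below shows the general bagging idea rather than their setup: each base learner is trained on a bootstrap sample (n draws with replacement) and the predictions are combined by majority vote. For brevity it assumes one-level decision trees (decision stumps) as base learners, 0/1 labels, and a hypothetical toy dataset.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-level decision tree (decision stump) with the fewest training errors.

    The stump predicts `polarity` when row[f] > thr and 1 - polarity otherwise,
    so labels are assumed to be 0/1.
    """
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, 0):
                errors = sum((polarity if row[f] > thr else 1 - polarity) != t
                             for row, t in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, polarity)
    _, f, thr, polarity = best
    return lambda row: polarity if row[f] > thr else 1 - polarity

def bagging_fit(X, y, n_estimators=25, seed=0):
    """Bootstrap aggregating: fit one stump per bootstrap sample of size n."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Combine the base learners by majority vote (an odd n_estimators avoids ties)."""
    return Counter(m(row) for m in models).most_common(1)[0][0]

# Hypothetical two-attribute records with 0/1 labels (illustrative only).
X_TRAIN = [[1.0, 5.0], [2.0, 4.0], [1.5, 6.0], [6.0, 1.0], [7.0, 2.0], [6.5, 1.5]]
Y_TRAIN = [0, 0, 0, 1, 1, 1]

MODELS = bagging_fit(X_TRAIN, Y_TRAIN)
print(bagging_predict(MODELS, [0.5, 6.0]), bagging_predict(MODELS, [8.0, 1.0]))
```

Because each stump sees a slightly different resample, individual stumps disagree near the class boundary; the vote smooths out that variance, which is the usual explanation for why bagging helps unstable learners such as decision trees.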
Al-Maqaleh and Abdullah (2017) proposed an intelligent predictive system for heart disease diagnosis using classification techniques, namely the J48 Decision Tree, Naive Bayes, and a Multi-Layer Perceptron Neural Network. The experimental results were evaluated with common performance metrics such as accuracy, F-measure, and the ROC graph. Dekamin and Sheibatolhamdi (2017) provided a clustering-based data preparation method with higher efficiency and fewer errors, using Naive Bayes, KNN, and Decision Tree for classification. According to their results, the proposed method is highly successful. Babu et al. (2017) provided a prototype for heart disease diagnosis using data mining techniques such as the Genetic algorithm, the K-Means algorithm, the MAFIA algorithm, and Decision Tree classification. The results show that the Decision Tree is highly efficient after a Genetic algorithm is applied. Bhargava et al. (2017) conducted an experiment applying the CART mining algorithm to predict heart attacks and compared it with the best available prediction method. They evaluated the performance of CART by calculating the time taken, the confusion matrix, F-measure, recall, precision, and prediction accuracy. Singh et al. (2018a) tried to devise a model that gives highly accurate predictions of heart disease by combining the Genetic and Naive Bayes techniques; they developed a hybrid model of the two techniques on the Python 3.6 platform. Kulkarni et al. (2018) used the Decision Tree classification algorithm to assess events related to heart disease. Their work was mainly concerned with developing a data mining model with the Random Forest classification algorithm; it is also partly a review paper that discusses some other classifiers. Shirwalkar et al. (2018) showed that each algorithm has specific functions that are helpful in diagnosing heart disease. Their work is a review paper focused on data mining classification and prediction methods using Naive Bayes and an improved K-Means algorithm. Singh et al. (2018b) developed a heart disease prediction system that uses a Neural Network to predict the risk level of heart disease; they report that the designed diagnostic system predicts the risk level with 100% accuracy. Wadhawan (2018) developed a system prototype that can help identify and extract hidden knowledge related to heart disease. The proposed technique combines rule mining using the Apriori and MAFIA algorithms with classification using the K-Nearest Neighbors algorithm to predict heart disease efficiently. Kurian and Lakshmi (2018) introduced an ensemble classifier that combines three classifiers, namely K-Nearest Neighbor, Decision Tree, and Naive Bayes; the ensemble model can give more accurate predictions than the individual classifiers. ... 2017) proposed a systematic review of studies performed in cardiology using data mining techniques. Of 407 papers published between 2000 and 2015 that were identified, 149 studies were finally selected. The results showed that hybrid approaches appear to be of greater interest to researchers.
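Several of the studies above report accuracy, precision, recall (sensitivity), specificity, and F-measure, all of which derive from the confusion matrix. The following minimal sketch computes them for a binary problem; the label vectors are hypothetical, and treating label 1 as the positive (diseased) class is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def _ratio(num, den):
    """Safe division: 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def evaluate(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also reported as sensitivity
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical ground truth and classifier output (1 = diseased, 0 = healthy).
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
Y_PRED = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

for name, value in evaluate(Y_TRUE, Y_PRED).items():
    print(f"{name}: {value:.3f}")
```

With these vectors the counts are TP = 3, FP = 2, TN = 4, FN = 1, so accuracy is 0.7 while precision is only 0.6; reporting several measures side by side exposes trade-offs that accuracy alone hides.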

Classification Technique Analysis
Classification is one of the main data mining techniques and is used in all of the studies. Table 2 and Fig. 3 compare the classification methods used in heart disease diagnosis. The Decision Tree and the Bayesian Classifier are used more than the other methods.
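Since the Bayesian Classifier is one of the two most used methods, a minimal categorical Naive Bayes sketch may help. It assumes that attributes are conditionally independent given the class, uses Laplace smoothing (alpha = 1) so that an unseen attribute value does not force a probability to zero, and sums log-probabilities to avoid numerical underflow. The attributes and records are hypothetical.

```python
import math
from collections import Counter, defaultdict

def nb_fit(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[c][j][v]
    values = defaultdict(set)  # distinct values seen for attribute j
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha

def nb_predict(model, row):
    """Return the class maximising log P(c) + sum_j log P(x_j | c)."""
    class_counts, counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior
        for j, v in enumerate(row):
            # Laplace-smoothed conditional probability P(x_j = v | c)
            score += math.log((counts[c][j][v] + alpha) / (nc + alpha * len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical records: (chest_pain, smoker) with a diagnosis label.
ROWS = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
        ("no", "no"), ("no", "no"), ("no", "yes")]
LABELS = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]

MODEL = nb_fit(ROWS, LABELS)
print(nb_predict(MODEL, ("yes", "yes")), nb_predict(MODEL, ("no", "no")))
```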

Decision Tree Method
The Decision Tree algorithm is based on conditional probabilities. Decision Trees generate rules; a rule is a conditional statement that humans can easily understand and that can be used within a database to identify a set of records (Oracle, 2008). Unfortunately, some studies do not name the model used in the Decision Tree method.
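The review finds CART and C4.5 to be the most frequently used Decision Tree models. Their split criteria differ: CART typically uses the Gini index, while C4.5 uses the gain ratio, i.e. information gain (the reduction in entropy) divided by the split information. The sketch below scores candidate attributes with these measures on a hypothetical toy dataset; it evaluates single splits only and does not grow a full tree.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(rows, labels, attr):
    """Group the class labels by the value each row takes for `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return list(groups.values())

def information_gain(rows, labels, attr):
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(rows, labels, attr))
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attr):
    """C4.5-style criterion: information gain normalised by split information."""
    n = len(labels)
    sizes = [len(g) for g in partition(rows, labels, attr)]
    split_info = -sum(s / n * math.log2(s / n) for s in sizes)
    return information_gain(rows, labels, attr) / split_info if split_info else 0.0

def weighted_gini(rows, labels, attr):
    """CART-style criterion: size-weighted Gini impurity after the split (lower is better)."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partition(rows, labels, attr))

# Hypothetical records; 1 = heart disease, 0 = healthy.
ROWS = [
    {"chest_pain": "yes", "smoker": "yes"}, {"chest_pain": "yes", "smoker": "no"},
    {"chest_pain": "yes", "smoker": "yes"}, {"chest_pain": "no", "smoker": "no"},
    {"chest_pain": "yes", "smoker": "yes"}, {"chest_pain": "no", "smoker": "no"},
    {"chest_pain": "no", "smoker": "yes"}, {"chest_pain": "no", "smoker": "no"},
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

for attr in ("chest_pain", "smoker"):
    print(attr,
          round(information_gain(ROWS, LABELS, attr), 4),
          round(gain_ratio(ROWS, LABELS, attr), 4),
          weighted_gini(ROWS, LABELS, attr))
```

Here both criteria prefer `chest_pain` as the root split, since `smoker` leaves each branch evenly mixed; on real data the two criteria can disagree, which is one reason CART and C4.5 may build different trees.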

Clustering Technique Analysis
Clustering finds groups (clusters) of data objects that are similar to one another in some sense (Oracle, 2008). Table 6 and Fig. 6 compare the clustering methods used in heart disease diagnosis.
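K-Means, which the review identifies as the most common clustering method, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The minimal sketch below implements this loop (Lloyd's algorithm) with Euclidean distance; the seeded random initialization, the stopping rule, and the two-dimensional points are assumptions chosen for illustration.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(max_iter):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

# Hypothetical two-attribute records forming two well-separated groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
CENTROIDS, LABELS = kmeans(POINTS, k=2)
print(CENTROIDS, LABELS)
```

The result depends on the initial centroids; on less clearly separated data it is common to run K-Means several times with different seeds and keep the solution with the smallest within-cluster distance.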

Table 8
The overall review of the evaluation methods

Breast Cancer Diseases
Breast cancer forms in the cells of the breasts and can occur in both men and women, but it is far more common in women. Survival rates have increased, and the number of deaths associated with this disease is declining, due to factors such as earlier detection (MayoClinic, 2018a). The studies on breast cancer reviewed here were published between 1997 and 2018. Of these 23 studies, three are review papers and 20 are application papers.

Literature Review
The use of data mining opens a new dimension in breast cancer prediction. Many data mining techniques are used to recognize and obtain valuable information from clinical datasets (Srinivas et al., 2010). Researchers have studied various ways to implement data mining in healthcare to reach high prediction accuracy. Table 9 presents complete information about all the implemented methods, and each paper is reviewed as follows. Burke et al. (1997) compared the prediction accuracy of the TNM staging system with that of Artificial Neural Network statistical models; the Artificial Neural Network predictions were more accurate than those of the TNM staging system. Kuo et al. (2001) built a new system for classifying breast cancers using the Decision Tree technique. Prediction accuracy, sensitivity, and specificity are among the evaluation measures used to estimate the performance of the proposed system. Hassanien and Ali (2004) presented a Rough Set method for generating classification rules, and their study showed that Rough Set theory seems to be a useful tool. Bellaachia and Guven (2006) analyzed the prediction of the survivability rate of breast cancer patients using the Naive Bayes, Back-Propagated Neural Network, and C4.5 Decision Tree algorithms; the results showed that C4.5 performs better than the other techniques. Chang and Liou (2008)

Classification Technique Analysis
Table 10 and Fig

Table 11
The overall review of the classification methods

Artificial Neural Network Method
Table 13 and Fig. 10 compare the Artificial Neural Network models. Unfortunately, some studies do not name the model used in the Artificial Neural Network method.

Clustering Technique Analysis
Table 14 and Fig. 11 compare the clustering methods used in breast cancer diagnosis. Unfortunately, some studies do not name the method used in this technique.

Table 16
The overall review of the evaluation methods

Diabetes Disease
Diabetes mellitus refers to a group of diseases that affect how the body uses blood sugar (glucose). Glucose is vital to health because it is an important source of energy for the cells that make up the muscles and tissues. Diabetes conditions include type 1 and type 2 diabetes (MayoClinic, 2018b). The studies on diabetes mellitus reviewed here were published between 2013 and 2018. Of these 22 studies, two are review papers and 20 are application papers. The use of data mining opens a new way to predict diabetes. Many data mining techniques are used to identify and collect helpful knowledge from clinical datasets (Srinivas et al., 2010).
Researchers have studied different approaches to implementing data mining in healthcare to reach excellent prediction accuracy. Santhanam and Padmavathi (2015) used the K-Means method to remove noisy data and Genetic algorithms to find the optimal set of features, with a Support Vector Machine as the classifier. Prajwala (2015) discussed two classification algorithms, Decision Trees and Random Forests, using 256 data samples; the experimental results show that the redistribution error rate of the Random Forest is lower than that of the Decision Tree. Thirumal and Nagarajan (2015) discussed several data mining algorithms, such as Naive Bayes, Decision Trees, K-Nearest Neighbor, and Support Vector Machine; the experimental results show that K-Nearest Neighbor provides lower accuracy than the other algorithms. Perveen et al. (2016) used the AdaBoost and Bagging ensemble techniques with the J48 Decision Tree as the base learner to classify patients with diabetes mellitus, and concluded that the overall performance of AdaBoost is better than that of Bagging. Shukla and Arora (2016)

Classification Technique Analysis
Table 18 and Fig. 13 compare the classification methods used in diabetes diagnosis. The Decision Tree, Bayesian Classifier, and K-Nearest Neighbors are more common than the other methods.
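K-Nearest Neighbors classifies a record by majority vote among the K closest training records. The sketch below assumes Euclidean distance and uses hypothetical (glucose, BMI) pairs with 0/1 diabetes labels; the leave-one-out helper shows how the choice of K alone can change results on the same data.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k training records closest to `query` (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: classify each record using all the others."""
    correct = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        correct += pred == y[i]
    return correct / len(X)

# Hypothetical (glucose mg/dL, BMI) records; 1 = diabetic, 0 = not diabetic.
TRAIN_X = [(85, 22.0), (90, 24.5), (95, 23.0), (160, 33.0), (170, 35.5), (155, 31.0)]
TRAIN_Y = [0, 0, 0, 1, 1, 1]

print(knn_predict(TRAIN_X, TRAIN_Y, (92, 23.5)))
for k in (1, 3, 5):
    print(k, loo_accuracy(TRAIN_X, TRAIN_Y, k))
```

On this toy set, leave-one-out accuracy is 1.0 for K = 1 and K = 3 but 0.0 for K = 5: each class has only three records, so with five neighbors the held-out record's own class is always outvoted. The features are used unscaled here, so glucose dominates the distance; in practice attributes are usually normalized first.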

Clustering Technique Analysis
Table 22 and Fig. 16 compare the clustering methods used in diabetes diagnosis. Unfortunately, some studies do not name the model used in this technique.

Evaluation Technique Analysis
Table 23 and Fig. 17

Table 24
The overall review of the evaluation methods

Conclusion
This paper reviewed predictive data mining approaches in heart disease, breast cancer, and diabetes diagnosis. A total of 168 articles on the implementation of data mining for medical diagnosis between 1997 and 2018 were identified, and after initial investigation, 85 empirical studies were selected for the final review. The results reveal that a significant number of studies used classification techniques. Researchers have also achieved better prediction accuracy with hybrid and ensemble models. Furthermore, most studies compare the performance of different data mining models with one another. A comparison of the clustering methods shows that K-Means is the most common clustering method. Likewise, a comparison of the classification methods shows that Decision Tree, Bayesian Network, and Neural Network are the three most widely used classification methods. The most frequently used Decision Tree models are CART and C4.5, and prediction accuracy is the most widely used measure for evaluating and comparing models. This paper recommends using large datasets to ensure the performance of prediction models. Further, model performance improvement techniques such as Ant Colony Optimization and Particle Swarm Optimization are rarely used, and wider use of these techniques is advisable. Since hybrid and ensemble models give better prediction accuracy, their use is recommended in future studies. With these research gaps in mind, new studies on predictive data mining approaches in medical diagnosis can be carried out in this field.

Fig. 1. Steps of knowledge discovery in databases

Fig. 2. The area and the number of selected articles

Fig. 5. Comparison of the Artificial Neural Network models

Fig. 7. Comparison of the evaluation methods

Table 10 and Fig. 8 compare the classification methods used in breast cancer diagnosis. The Decision Tree and the Artificial Neural Network are used more than the other methods.

Fig. 8. Comparison of the classification methods

Fig. 9 compares the Decision Tree models used in classification. Unfortunately, some studies do not name the model used in the Decision Tree method.

Fig. 9. Comparison of the Decision Tree models

Fig. 10. Comparison of the Artificial Neural Network models

Fig. 12. Comparison of the evaluation methods

Fig. 15. Comparison of the Artificial Neural Network models

Fig. 16. Comparison of the clustering methods

Table 23 and Fig. 17 compare the evaluation methods used in diabetes diagnosis. Prediction accuracy is more common than the other methods.

Fig. 17.Comparison of the evaluation methods

Table 1
The overall review of the data mining techniques in heart disease diagnosis

Alizadehsani et al. (2012) ... arning algorithms such as Naive Bayes, K-Nearest Neighbor, and Decision List. The results were compared with the Tanagra tool and confirm that the Naive Bayes algorithm has the best processing time and prediction accuracy. Shouman et al. (2011) recommended a model that outperforms the J48 Decision Tree, Voting, and Bagging algorithms in the early prediction of heart disease; one of their results shows that applying the Voting algorithm increases the efficiency of the Decision Tree. Alizadehsani et al. (2012) attempted to find a way to identify the lesioned vessel when electrocardiogram changes are insufficient. ... Means clustering for classification by adjusting their related parameters and measures. They also selected the Principal Component Analysis (PCA) algorithm to reduce the attribute dimension. Baihaqi et al. ( ... 014) proposed a review paper on various Decision Tree algorithms for classifying and predicting heart disease, studying different works that use some useful techniques. Patel et al. (2017) presented a review paper in which they described a prototype using data mining techniques, mainly Naive Bayes and the Weighted Associative Classifier, and explained these two techniques in full. Shouman et al. (2012) offered a review paper that identifies gaps in research on heart disease diagnosis; one of its results shows that hybrid data mining techniques have shown promising outcomes in heart disease diagnosis. Kumari and Godara (2011) presented a review paper on data mining classification techniques, namely the RIPPER Classifier, Decision Tree, Artificial Neural Networks, and Support Vector Machine, and compared these techniques through the lift chart, error rate, sensitivity, specificity, and accuracy. Kadi et al. (

Table 2
Comparison of the classification methods

Fig. 3. Comparison of the classification methods

Table 3
The overall review of the classification methods

Table 4
Comparison of the Decision Tree models

Fig. 4. Comparison of the Decision Tree models

Artificial Neural Network Method
An Artificial Neural Network is an algorithm based on biological neural networks that is used to estimate or approximate functions that depend on a large number of generally unknown inputs (Oracle, 2008). Unfortunately, some studies do not name the model used in the Artificial Neural Network method.
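An Artificial Neural Network is built from units that compute a weighted sum of their inputs and pass it through an activation function. The sketch below trains a single sigmoid unit by stochastic gradient descent on the cross-entropy loss. This is the building block of the Multi-Layer Perceptrons mentioned in the reviewed studies (and is equivalent to logistic regression), not a full MLP with backpropagation. The binary risk-factor inputs and the AND-like target are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(X, y, lr=0.5, epochs=2000, seed=0):
    """Stochastic gradient descent on the cross-entropy loss of one sigmoid unit."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in X[0]]  # small random initial weights
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = out - t  # d(loss)/d(pre-activation) for sigmoid + cross-entropy
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Threshold the unit's output probability at 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical binary risk factors (high_bp, high_chol); label 1 only when both are present.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 0, 0, 1]

W, B = train_neuron(X, Y)
print(W, B, [predict(W, B, x) for x in X])
```

A single unit can only separate classes with a straight line (a hyperplane); stacking such units in hidden layers, as in an MLP, is what lets a network model non-linear relationships between attributes.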

Table 5
Comparison of the Artificial Neural Network models

Table 6
Comparison of the clustering methods

Table 7
Comparison of the evaluation methods

Table 9
The overall review of the data mining techniques in breast cancer diagnosis

Oskouei et al. (2017) ... 013) ... r predicting breast cancers. They used a Decision Tree, Neural Network, Genetic algorithm, and Logistic Regression to diagnose breast cancer. The results showed that the Decision Tree had the lowest prediction accuracy and Logistic Regression had a higher accuracy rate. Sarvestani et al. (2010) evaluated several Neural Network structures: the performance of the RBF Network, the General Regression Neural Network, and the Probabilistic Neural Network was tested and investigated for the breast cancer diagnosis problem. Anunciaçao et al. (2010) explored the applicability of Decision Trees. In their work, they first generated association rules by default and then built a questionnaire based on those rules and on important defined factors that may be related to cancer. Einipour (2011) proposed a model combining Fuzzy Systems with the Ant Colony Optimization algorithm, and the conclusions showed that the proposed approach can classify cancer instances with a high accuracy rate. GhassemPour et al. (2012) compared a model-based data mining technique with a Neural Network classification technique and showed that adding an ensemble approach can improve the results. They also used evaluation measures to compare the performance of these models with others. Rajesh and Anand (2012) applied the C4.5 classification algorithm to a breast cancer dataset to classify patients and compared its performance with that of other classification techniques. Raad et al. (2012) proposed a Neural Network approach, specifically MLP and RBF. A detailed comparison of the two showed that the model constructed with the RBF Neural Network is much more efficient than the other models. Hota (2013) applied various intelligent techniques, including Artificial Neural Network, Support Vector Machine, Bayesian Network, and Decision Tree, to classify breast cancer healthcare data with 699 records. Experimental results revealed that the ensemble model is more accurate than any single model. Yadav et al. (2013) described a procedure that uses Support Vector Machines and Decision Trees to classify 100 breast cancer patients into two classes; the Support Vector Machine achieved 98% prediction accuracy. Sumbaly et al. (2014) presented a Decision Tree data mining technique for the early detection of breast cancer using the Weka tool, and experimental results confirm the effectiveness of the proposed model. Senturk and Kara (2014) applied seven algorithms, namely KNN, Decision Tree, Naive Bayes, Logistic Regression, MLP, Discriminant Analysis, and Support Vector Machine, to diagnose breast cancer, and used evaluation measures such as accuracy to assess model performance. Joshi et al. (2014) compared various classification rules to find the best classifier. The authors state that they used 47 classification algorithms to distinguish healthy people from patients, and their experimental results showed that approximately 13 of these 47 techniques gave the same results. Majali et al. (2014) presented a system for diagnosing cancer using the Frequent Pattern growth (FP-growth) algorithm and also used the Decision Tree algorithm to predict the possibility of cancer. Coutinho and das (2017) presented new hybrid fuzzy clustering algorithms. This research used three kinds of fuzzy clustering, and the results obtained with the proposed hybrid methods indicate that the performance of conventional fuzzy clustering algorithms can be increased. Chaurasia et al. (2018) used three popular data mining algorithms, namely Naive Bayes, RBF, and Decision Tree J48, to develop prediction models on a large dataset; Naive Bayes performed best, with a classification accuracy of 97.36%. Cherif (2018) investigated a novel approach for classifying breast cancers that selects the most reliable attributes and then weights them according to their level of reliability; the research also speeds up KNN by means of a clustering method. Kharya (2012) presented a review paper on applying different classification techniques to breast cancer diagnosis, covering DT, Bayesian Network, Logistic Regression, SVM, Naive Bayes, Association Rule Mining, and Artificial Neural Network. Shrivastava et al. (2013) gave an overview of the use of data mining techniques on breast cancer data and observed that Neural Network and Decision Tree approaches are the ones most used by researchers to create predictive models. Oskouei et al. (2017) reviewed several research works on the diagnosis, treatment, or prognosis of breast cancer. They studied 125 references and found that most research works are concerned with comparing the accuracy rates of various data mining algorithms or techniques.

Table 10
Comparison of the classification methods

Table 12
Comparison of the Decision Tree models

Table 13
Comparison of the Artificial Neural Network models

Table 14
Comparison of the clustering methods

Table 15 and Fig. 12 compare the evaluation methods used in breast cancer diagnosis. Prediction accuracy is clearly more common than the other methods.

Table 15
Comparison of the evaluation methods

Table 17
The overall review of the data mining techniques in diabetes diagnosis

Dewangan and Agrawal (2015) ... 2015) ... on about all the implemented ways and methods, and each paper is reviewed as follows. Meng et al. (2013) compared the performance of Artificial Neural Network, Logistic Regression, and C5 Decision Tree models for predicting diabetes; the C5 Decision Tree model performed best in classification accuracy. Saxena et al. (2014) diagnosed diabetes mellitus using the K-Nearest Neighbor algorithm in MATLAB. They report that as the value of K increases, both the accuracy rate and the error rate increase. Kandhasamy and Balamurali (2015) compared machine learning classifiers, namely J48 Decision Tree, KNN, Random Forest, and SVM, to classify patients with diabetes mellitus using eight essential attributes. Dewangan and Agrawal (2015) attempted to build an ensemble hybrid model by combining Bayesian classification and multilayer perceptron techniques; the results show that the hybrid model gives higher accuracy than the individual models. Rani and Kautish (2018) ... 8) ... ongside the data mining procedure scaled conjugate gradient to predict diabetes mellitus. This paper combines Random Forest tree and scaled conjugate gradient computations. ... diabetic ... is a life-threatening complication. Meza-Palacios et al. (2016) proposed the development of a fuzzy expert system, a new and innovative proposal to help doctors. Garg et al. (2017) compared different classification algorithms using the Weka tool, including Naive Bayes, Bayes Network, Decision Tree J48, the Sequential Minimal Optimization (SMO) classifier, and Random Forest. The experimental results suggest that the SMO classifier has the best performance. Xu et al. (2017) proposed a prediction model based on Random Forest. By extracting a clear and understandable model for type 2 diabetes from a medical database, the method can help significantly reduce the risk of disease, and the results show that Random Forest yields better prediction accuracy. Nilashi et al. (2017) proposed a new system for diabetes prediction using clustering, noise removal, and prediction techniques. The research uses the CART method to generate the fuzzy rules, and EM and PCA were also used for clustering. Khaleel et al. (2017) used the One-Attribute-Rule algorithm to adjust attribute weights and proposed a new classification algorithm that improves the accuracy of the K-Nearest Neighbor algorithm. Sambyal et al. (2018) compared six data mining algorithms. The system was trained and tested in Microsoft Azure, and the resulting intelligent system was deployed as a web service using Python. Lakshmi et al. (2018) introduced a system that uses the Decision Tree and K-Nearest Neighbor algorithms, but they give no information about the results. Das et al. (2018) studied the J48 Decision Tree and Naive Bayesian techniques, aiming to support a quicker and more efficient method of diagnosing diabetes. Wu et al. (2018) proposed a hybrid model based on data mining techniques that uses an improved K-Means algorithm and the Logistic Regression algorithm to achieve higher prediction accuracy. Sisodia and Sisodia (2018) designed a model that can predict the likelihood of diabetes with maximum accuracy, using three machine learning classification algorithms, namely Decision Tree, Support Vector Machine, and Naive Bayes, to detect diabetes at an early stage. Patil and Tamane (2018) combined techniques such as feature selection with K-Nearest Neighbor and Naive Bayes to develop a predictive model. Joshi and Alehegn (2017) studied and reviewed various data mining techniques such as K-Nearest Neighbor, Naive Bayes, Random Forest, and J48. Rani and Kautish (2018) reviewed the most cited research papers in top journals to investigate the data mining techniques generally used to predict chronic diseases such as diabetes.

Table 18
Comparison of the classification methods

Table 19
The overall review of the classification methods

Table 20 and Fig. 14 compare the Decision Tree classification models. Unfortunately, some studies do not name the model used in the Decision Tree method.

Table 20
Comparison of the Decision Tree models

Fig. 14. Comparison of the Decision Tree models

Artificial Neural Network Method

Table 21 and Fig. 15 compare the Artificial Neural Network classification models. Unfortunately, some studies do not name the model used in this method.

Table 21
Comparison of the Artificial Neural Network models

Table 22
Comparison of the clustering methods

Table 23
Comparison of the evaluation methods