Classification Models on Cardiovascular Disease Prediction using Data Mining Techniques


Introduction
In the modern world, cardiovascular disease is the most lethal disease [1]. According to the World Health Organization, more than 12 million deaths occur worldwide every year due to heart problems [2]. Since the turn of the century, cardiovascular diseases (CVDs) have become the leading cause of mortality in India [3]. The term "cardiovascular disease" covers a wide range of conditions that affect the heart and the blood vessels and the way blood is pumped and circulated through the body [4]. The disease can strike so suddenly that the patient has hardly any time to be treated. One of the best ways to diagnose heart disease is echocardiography. Echocardiography, or echo, is a painless test that uses sound waves to create pictures of the heart. The test gives information about the size and shape of the heart and how well the heart chambers and valves are working [5].
The test can also identify areas of heart muscle that are not contracting normally because of poor blood flow or injury from a previous heart attack [6,7]. Diagnosing patients correctly and on a timely basis remains the most challenging task for the medical fraternity. The healthcare industry today generates huge amounts of complex data about patients, disease diagnosis, hospital resources and medical devices, which are difficult to process by manual methods [8]. Data mining provides a set of tools and techniques to find patterns and extract knowledge in support of better patient care; it combines statistical analysis, machine learning and database technology to extract hidden patterns and relationships from large databases [9]. Detecting heart disease from various factors or symptoms is a multi-layered problem that is not free from false presumptions and is often accompanied by unpredictable effects. Effective and efficient automated heart disease prediction can benefit the healthcare sector, saving both cost and time [10]. This paper highlights the utility and application of three data mining classification models for the prediction of cardiovascular disease, to assist experts in the healthcare domain [11][12][13].

Methods
A total of 336 records with 24 attributes were obtained from the echocardiography database; the attributes are listed in Table 1. The attribute "Diagnosis" was identified as the predictable attribute, with value "1" for patients with heart disease and value "0" for patients with no heart disease. The present study was conducted using the simple random sampling (SRS) method [14,15]; with SRS, each patient has an equal chance of being chosen. Every patient who came for an echo was included, and paediatric patients were excluded from the study. Patients' personal information was collected, such as Age, Sex, Smoking, Alcohol
Table 2 shows a confusion matrix for a two-class classification problem. It is a contingency table that contains information about the actual and predicted classifications made by a classification system, and so summarizes the prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized as counts and broken down by class. This is the key to the confusion matrix.
The entries of the confusion matrix, which can be calculated from the coincidence matrix, are defined as follows:
• True Negative (TN) is the number of correct predictions that an instance is negative.
• False Positive (FP) is the number of incorrect predictions that an instance is positive.
• False Negative (FN) is the number of incorrect predictions that an instance is negative.
• True Positive (TP) is the number of correct predictions that an instance is positive.
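The four counts defined above, and the accuracy, sensitivity and specificity derived from them, can be illustrated with a minimal Python sketch. The function names and the toy labels are assumptions made for illustration; they are not the study's data or the software used by the authors. Labels follow the "Diagnosis" coding (1 = heart disease, 0 = no heart disease).

```python
def confusion_counts(actual, predicted):
    """Return (TN, FP, FN, TP) for binary labels: 1 = disease, 0 = no disease."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1   # correctly predicted positive
        elif a == 0 and p == 0:
            tn += 1   # correctly predicted negative
        elif a == 0 and p == 1:
            fp += 1   # negative instance predicted as positive
        else:
            fn += 1   # positive instance predicted as negative
    return tn, fp, fn, tp


def summary_rates(tn, fp, fn, tp):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels invented for illustration only (not the echocardiography data).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

counts = confusion_counts(actual, predicted)
rates = summary_rates(*counts)
print(counts, rates)
```

Sensitivity is the share of diseased patients that the classifier detects, and specificity is the share of healthy patients it correctly clears.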

Results
The experimental results show that the Neural Network outperformed the J48 Decision Tree and Naïve Bayes in predicting heart disease cases. Three experiments were conducted on the echocardiography report dataset, designed to evaluate the performance of the J48 Decision Tree, Neural Network and Naïve Bayes classifiers and to investigate the effect of attribute selection on the models. The Neural Network proved to be a powerful classifier in terms of accuracy (97.91%), sensitivity (97.2%) and specificity (98.4%), which makes it a good classifier for classification and prediction in the medical field.

Discussion
In this research, the data mining classifiers J48 Decision Tree, Naïve Bayes and Neural Network are compared in classifying and diagnosing heart disease on the patient data set obtained from medical practitioners. For better understanding, the confusion matrix results for all three algorithms are given in Table 3.
The classification matrix displays the frequency of correct and incorrect predictions [17]. It compares the actual values in the test dataset with the values predicted by the trained model. Table 3 shows the results of the classification matrix for the three algorithms (J48 Decision Tree, Neural Network and Naïve Bayes, respectively): 88%, 97% and 50% of patients who had the disease were correctly predicted as having it, while 12%, 3% and 50% of patients were wrongly diagnosed as not having the disease although they had it, which is very dangerous.
The performances of the models in this study were evaluated using the standard metrics of accuracy, precision and F-measure, which were calculated from the predictive classification table; ROC area was also used to compare the performances of the classifiers [18-20]. The results are given in Table 4.
Three experiments were conducted on the dataset of 336 instances and 24 attributes using three algorithms: J48 Decision Tree, Naïve Bayes and Neural Network, which took 0.02, 1.81 and 0.02 seconds, respectively, to build the models. The true positive rate was 0.87 for the J48 Decision Tree, 0.97 for the Neural Network and 0.5 for Naïve Bayes; the Neural Network thus performed best and Naïve Bayes worst on this measure. The true negative rate was 0.96 for the J48 Decision Tree, 0.98 for the Neural Network and 0.93 for Naïve Bayes, so all three algorithms performed well in true negative rate. The models are therefore good at identifying negative cases. The ROC curves of the classifiers were compared based on the risk of heart disease. The Neural Network outperformed the J48 Decision Tree and Naïve Bayes with an area under the curve (AUC) of 0.97; the AUC was 0.94 for the J48 Decision Tree and 0.79 for Naïve Bayes. Overall, the AUC results reveal the better performance of the Neural Network.
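As a rough illustration of how precision and F-measure follow from the classification table, the sketch below computes them from true positive, false positive and false negative counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not taken from Table 3 or Table 4.

```python
def precision_recall_f_measure(tp, fp, fn):
    """Compute precision, recall and F-measure (harmonic mean of the two).

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for illustration only (not from the study's tables).
precision, recall, f_measure = precision_recall_f_measure(tp=3, fp=1, fn=3)
print(precision, recall, f_measure)
```

Precision penalizes false alarms, recall penalizes missed cases, and the F-measure is high only when both are high.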
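The area under the ROC curve can be read as the probability that a randomly chosen diseased patient receives a higher predicted score than a randomly chosen healthy one. The sketch below computes AUC through this pairwise (Mann-Whitney) interpretation. It is an assumption about how such a value can be obtained, not a description of the tool used in the study, and the labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels and classifier scores, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes. The pairwise loop is quadratic in the number of cases, which is acceptable for a dataset of a few hundred records.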

Conclusion
The analysis shows that the Neural Network performed best in predicting heart disease, with an accuracy of 97.91%. The model has a high true negative rate, which makes it a handy tool for junior cardiologists and echo technicians to screen out patients who have a high probability of having the disease and refer those patients to senior cardiologists for further analysis.

Conflicts of Interest
There are no conflicts of interest for the present study.