Heart Disease Prediction Using Machine Learning

Heart disease cases are rising day by day, and it is very important to predict the disease before it causes more harm to human lives. Diagnosing heart disease is a complex task that must be performed very carefully. The work in this paper focuses on identifying which patients are more likely to suffer from heart disease based on medical features such as chest pain. We propose a heart disease prediction system that uses a patient's previous medical features to diagnose whether the patient has the disease. Two machine learning algorithms, support vector machine and k-nearest neighbor, are used to classify patients with heart disease. Both models gave satisfactory results and were capable of predicting heart disease, with good accuracy compared with algorithms used in previous research, such as naive Bayes.


I. INTRODUCTION
Heart disease is a disease that affects the heart or blood vessels. It is also called cardiovascular disease, and it is the main cause of death in adults in this era. It is a major cause of death worldwide, and identification at an early stage is key to successful treatment and cure. Cardiovascular disease (CVD), in all its forms, is a matter of life and death across the planet. It is a combination of multiple diseases and injuries that affect a person's heart [1].
AI has several domains, one of which is machine learning (ML). An ML system learns from the data given to it as input: it goes through that data and finds patterns. Little human intervention is required. The computer is not told explicitly how to perform a task; instead, it is given data, detects the patterns in it, and comes to a conclusion. Moreover, ML plays a very important role in healthcare, where data is generated in electronic form [2], [27]-[34], [57]-[60]. In healthcare, ML algorithms find patterns that are very difficult for an individual to detect. As machine learning is used more widely in healthcare, providers need an effective approach to obtain better results [3]. ML algorithms extract the most influential features from the data, which helps doctors and physicians make better decisions and take steps to improve the patient's health [4], [35]-[45].
The use of machine learning in healthcare has far greater effects than one might expect. Doctors cannot be replaced, but machine learning offers better support for a person's health. It is a very important area as we move towards smart cities and smart health systems [5]. It is used for better diagnosis of severe diseases. In our project, it helps detect whether a person is likely to have heart disease based on their medical features [6].
If someone has chest pain or high blood pressure, he or she can be diagnosed with fewer tests. In the past, before machine learning was introduced, doctors performed many tests just to confirm whether a patient had a certain disease. This was a difficult task, but machine learning algorithms [46]-[50] are now used instead. Healthcare serves millions or even billions of people, which is why it is becoming one of the highest-earning sectors in many countries [7].

II. LITERATURE REVIEW
High blood pressure is a leading cause of heart disease and stroke because it damages the lining of the arteries, making them more susceptible to the buildup of plaque, which narrows the arteries. Early detection of the disease is essential for timely cure. Chang et al. [6] proposed a detection system that uses a Random Forest classifier to identify heart disease. The model, implemented in Python and published in November 2022, achieved an accuracy of 83%. Sonam Nikhar et al. [7] provided a detailed description of the naive Bayes and decision tree algorithms as applied to heart disease prediction, and concluded that the decision tree has much better accuracy. Rishab et al. [8] used four machine learning algorithms: logistic regression with an accuracy of 82.89%, naive Bayes with 80.43%, decision tree with 80.43%, and support vector machine with 81.57%. Irfan Javid et al. [9] used several algorithms, including random forest, SVM, and KNN, along with the LSTM and GRU deep learning models, as their proposed methodology, and reported an accuracy of 85.71%. Jyoti Soni et al. [10] used three algorithms: naive Bayes with 86.53% accuracy, decision tree with 89%, and KNN with 85.53%.
Md. Julker Nayeem et al. [11] used the KNN, naive Bayes, and random forest algorithms, of which random forest gave the best accuracy, 95.63%. Malavika et al. [12] used naive Bayes, logistic regression, random forest, SVM, decision tree, and k-nearest neighbor. The reported accuracies were 86.88% for logistic regression, 86.88% for k-nearest neighbor, 88.52% for support vector machine, 78.68% for decision tree, 88.52% for naive Bayes, and 91.80% for random forest.
Chithambaram et al. [13] used KNN, SVM, random forest, and decision tree. The best accuracy, 98.83%, was given by the decision tree. Pooja Anbuselvan [14] used logistic regression, naive Bayes, SVM, KNN, decision tree, random forest, and the boosting ensemble technique XGBoost. The accuracies were 75.41% for logistic regression, 77.05% for naive Bayes, 73.77% for support vector machine, 57.83% for k-nearest neighbor, 77.05% for decision tree, 86.89% for random forest, and 78.69% for XGBoost.
Gunturu Deepthi et al. [15] used SVM, naive Bayes, decision tree, random forest, logistic regression, AdaBoost, and XGBoost. The dataset was taken from UCI; of its 76 attributes, only 14 were selected for prediction. The best accuracy, 81.3%, was given by XGBoost. The accuracy of SVM was 80.2%, naive Bayes 76.9%, logistic regression 79.1%, decision tree 75.8%, random forest 79.1%, and AdaBoost 73.6%.
To enhance security and risk mitigation in heart disease detection, control systems and convolutional neural networks can be used to improve the accuracy and efficiency of diagnosis and treatment [67]-[69]. Animesh Hazra et al. [16] used naive Bayes, KNN, and decision tree, which gave accuracies of 52.33%, 45.67%, and 52%, respectively.

III. METHODOLOGY AND IMPLEMENTATION
The proposed model provides a forecast for the early identification of heart disease. The dataset is taken from the Kaggle website and is easily available. Because the dataset contains no missing values, outliers, or categorical data, only one preprocessing step, feature selection, is performed. Support vector machine (SVM) is one of the most popular supervised learning algorithms. It can be used for both classification and regression problems, but in ML it is primarily used for classification. SVM can achieve good accuracy. It is a relatively recent ML method, developed within statistical learning theory. Because of its high accuracy, SVM became a main focus of the ML community [22], [51]-[56]. It has been applied in many areas, from time series prediction to face recognition to medical prediction [23].
SVM has advantages for both separable and non-separable problems. Separable problems include linearly separable and nonlinearly separable problems [24]. SVM maps the data into a high-dimensional space so that the data points can be categorized even when the original data is not linearly separable. After the data is transformed, the separator between the categories is drawn as a hyperplane. SVM performs well when there is a clear gap between the classes. SVM is also memory efficient, and it remains effective when the number of dimensions exceeds the number of samples [25].
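The mapping and hyperplane described above correspond to the standard soft-margin SVM formulation, sketched here for reference. The notation is an assumption introduced for illustration and is not taken from the paper: x_i are the feature vectors, y_i in {-1, +1} are the class labels, \phi is the feature map into the high-dimensional space, C is the regularization parameter, \xi_i are slack variables, \alpha_i are the dual coefficients, and K is the kernel.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0,\quad i = 1,\dots,n, \\[4pt]
f(x) &= \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i\,y_i\,K(x_i, x) + b\Bigr),
  \qquad K(x, x') = \phi(x)^{\top}\phi(x').
\end{aligned}
```

The margin between the classes has width 2/\lVert w\rVert, so minimizing \lVert w\rVert widens the gap, and the kernel K lets the classifier operate in the high-dimensional space without computing \phi explicitly.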
Fisher linear discriminant analysis and quadratic discriminant analysis can also be used in heart disease prediction. These algorithms can analyze patterns in medical data and provide insights that help doctors with diagnosis and treatment planning [56]. Further, heart disease prediction may be improved by using mobile agents, blockchain technology, and machine learning together [26], [61]-[66].
K-nearest neighbor (KNN) is a very productive ML algorithm. It gives efficient results for both regression and classification problems, but it is mainly used for classification problems with two categories. KNN groups the data into clusters; when new data arrives, it checks the data and assigns it to the cluster at the minimum distance from its neighbors [27]. It computes the distance between the query and every example in the data, selects the k neighbors closest to the query, and then takes a vote for classification problems or an average for regression problems. It is very easy to understand and implement. KNN makes no assumptions about the underlying data, so it is well suited to non-linear problems. It also handles multiple classes naturally [28].
Once the libraries and dataset are imported, the system is first trained on the data, followed by a testing phase. To read the CSV files, we use the Pandas library, and we visualize the data as plots using Matplotlib and Plotly. Figure 3 shows the SVM hyperplane, and Table 2 shows the results of the algorithm. Figure 4 shows the training plot of KNN on the chosen dataset, and Table 3 shows the results of KNN on the selected dataset.
Figure 4: KNN training
The heart is a major organ of the human body; if it stops working, the person dies.
As the death rate from this disease is increasing, it has become necessary to develop a system that accurately predicts the presence or absence of the disease. As it is a matter of life and death, the system must be highly accurate. Heart disease is often not detected at an early stage because its symptoms appear ordinary. We proposed a system that identifies the disease accurately at an early stage, implemented with ML algorithms. The analysis is based on the confusion matrix and compares the accuracies of the models; it finds that KNN is the best algorithm, as its accuracy is higher than that of SVM. Many researchers have used different methods, such as KNN, logistic regression, SVM, decision tree, and random forest, and all of these algorithms gave good accuracy. In our research, KNN and SVM gave accuracies greater than 82%, which is almost equal to or greater than the accuracies reported in other research. In short, our accuracy could be improved further by including more medical features of a patient.
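The KNN steps described in the methodology (compute the distance from the query to every example, pick the k closest, take a vote) and the confusion-matrix comparison mentioned above can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: the function names are hypothetical, and the two-feature data points are made up for illustration and do not come from the Kaggle dataset.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training example, smallest first.
    ranked = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of correct predictions: (TP + TN) / total."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Made-up, already-scaled two-feature examples (NOT the Kaggle dataset).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (3.0, 3.0), (2.9, 3.1), (1.0, 0.9)]
test_y = [0, 1, 1, 1]  # the last point is deliberately atypical for its class

predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
counts = confusion_counts(test_y, predictions)
print(predictions, counts, accuracy(counts))
```

An odd k avoids tied votes in two-class problems. In practice the features should be scaled to comparable ranges before distances are computed, since Euclidean distance is otherwise dominated by the features with the largest values.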