Smartphone-Based Heart Disease Classification Using Machine Learning Techniques

Patients with heart disease can face considerable delays in diagnosis, and an inaccurate diagnosis made without the assistance of a medical professional can be fatal. To address this, the study applies a variety of machine learning algorithms to heart disease datasets to predict the presence of the disease. These techniques include Naive Bayes (NB), k-Nearest Neighbor (KNN), Artificial Neural Network (ANN), Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Decision Tree Classifier (DT), and XGBoost Classifier (XGB). The Cleveland, Hungarian, Swiss, Long Beach VA, and Statlog (Heart) datasets, a total of 1190 instances with 11 features, were used in the study. The combined data were split into training and test sets in an 80:20 ratio. Recall, precision, accuracy, and F1 score were used to assess the models' performance. The XGBoost Classifier outperformed the other seven models, with an accuracy of 93.7%. The trained model was then integrated into a mobile application, built with Android Studio, for classifying heart disease.


INTRODUCTION
In Bhutan, coronary heart disease was responsible for 799 deaths, or 19.00% of all deaths (WHO, 2018). With an age-adjusted death rate of 147.58 per 100,000 people, Bhutan is ranked 66th in the world (World Life Expectancy, 2018). Symptoms of heart disease include a fast heart rate, dizziness, difficulty breathing, and chest pain. Obesity, hypertension, and high cholesterol are among the causes. Diagnosing heart disease requires specialized cardiologists and entails a rigorous process to decide the best course of action. In underdeveloped countries, patients with heart disease often face significant delays in diagnosis and must travel long distances for treatment, creating a substantial burden. An accurate prediction can help prevent heart disease, whereas an inaccurate one can be fatal. Early diagnosis and precautions such as regular exercise, healthy eating, and avoiding tobacco can help to prevent heart disease. Furthermore, machine learning algorithms can support diagnosis in the absence of medical personnel.

According to data released by the Ministry of Information and Communications, mobile cellular subscribers in Bhutan increased by 24,558 during the second quarter of 2020 (Subba, 2020). Smartphones are among the most frequently used technologies in the modern world, and a variety of smartphone-based health applications help individuals. The development of mobile apps to predict heart disease will benefit doctors and medical staff.

The objective of this work is to create a smartphone application that classifies heart disease using machine learning algorithms. The major goal is to develop a heart disease classifier using several machine learning techniques and to connect it with an Android application. To achieve the goals of the research, the following question needs to be addressed:

Research Question 1: Which machine learning algorithms give high accuracy?

Motivation:
The question is motivated by the goal of identifying the machine learning algorithms that offer the best classification accuracy for heart disease.

RELATED WORK
Numerous prior studies have used the UCI Machine Learning (ML) dataset extensively to predict heart disease. Various levels of accuracy have been obtained using various machine learning methods. Begum et al. proposed six machine learning algorithms (LR, RF, XGB, SVM, ANN, and KNN) to determine whether or not a patient has heart problems (Begum et al., 2021). Using data from the UCI library, they found that, of the six methods, the random forest classifier performed flawlessly on both the training and test sets. Bharti et al. recommended employing machine learning and deep learning techniques for classifying heart disease using the UCI heart disease dataset (Bharti et al., 2021). The researchers adopted three different approaches (including feature selection and outlier detection when there are no outliers, and feature selection and outlier detection when there are outliers) and found that deep learning algorithms outperformed machine learning algorithms, with an accuracy of 94.2%.

A machine learning-based diagnosis technique for heart disease detection using the UCI repository was presented by Li et al. (Amin et al., 2019). Classification techniques including ANN, KNN, LR, DT, SVM, and NB are used to identify heart disease (HD). The following methods were used in the feature selection process: Relief, LASSO, mRMR, Fast Conditional Mutual Information (FCMIM), and local learning-based feature selection (LLBFS). The leave-one-subject-out cross-validation (LOSO) technique is also used to identify the optimal hyperparameters for selecting the best model. The suggested methodology was evaluated on the Cleveland heart disease dataset. The experimental findings showed that, in comparison to the conventional feature selection methods, the suggested feature selection algorithm picks features that give higher classification accuracy and efficiency. Nevertheless, optimization methods could be used to raise the HD diagnosis accuracy of a prediction system.

Raihan et al. presented an Android system to predict ischemic heart disease (IHD) using data mining techniques (Raihan et al., 2017). They gathered 917 cases with 10 features from two cardiac hospitals and achieved an accuracy of 86% with a decision tree. Alalawi and Alsuwat used a variety of machine learning algorithms in their 2021 study to diagnose cardiac problems using datasets for cardiovascular and heart disease. These methods included KNN, RF, DT, SVM, ANN, Gradient Boosting Classifier, Voting Classifier, LR, and NB (Alalawi & Alsuwat, 2021). The RT was the best classifier on the heart disease dataset (73% accuracy), but the Gradient Boosting Classifier fared better with all features (94% accuracy) on the cardiovascular disease datasets. Performance was evaluated using the models' F1 score, accuracy, recall, and precision. Padmaja et al. studied the Cleveland data repository to classify heart disease using nine machine learning algorithms (Padmaja et al., 2021). The random forest classifier achieved 94% accuracy and performed better than the other models.

Many researchers have recently used ensemble learning techniques for classification problems. Ensemble learning combines several machine learning methods to improve predictive performance. Lakshmanarao et al. (2021) proposed feature selection and ensemble learning techniques to classify heart disease from the UCI and Kaggle datasets. They used the ANOVA F-value and mutual information as feature selection methods to reduce the number of features. For classification, random forest, AdaBoost, a stacking classifier, and a voting classifier were used. The stacking classifier gave an accuracy of 99%.

METHOD

System Overview
Machine learning is an area of study within computer science and artificial intelligence that focuses on using data and algorithms to learn and improve automatically from experience, without being explicitly programmed. Machine learning algorithms are divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. This study uses supervised machine learning methods. Supervised algorithms take labeled data as input and learn to predict the labels of new, unseen data. Hyperparameters were tuned based on test accuracy. The model with the highest accuracy was integrated into the mobile application.

Dataset
A combination of five well-known heart disease datasets was used for training: the Cleveland, Hungarian, Swiss, Long Beach VA, and Statlog (Heart) data sets (Siddhartha, 2020). Together they contain 1190 instances with 11 features.
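The study splits the combined data 80:20 into training and test sets. The following is a minimal sketch of such a split, assuming a plain random shuffle; the source does not state whether the split was stratified or which library or seed was used, so the function name `train_test_split`, the seed, and the placeholder rows are illustrative assumptions only.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows standing in for the 1190 instances (not real patient data).
rows = list(range(1190))
train, test = train_test_split(rows)
print(len(train), len(test))
```

With 1190 instances, a 20% test share corresponds to 238 test instances and 952 training instances.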

Evaluation Metrics
In classification problems, a confusion matrix (also called a classification matrix) is used to assess the overall performance of the model. It characterizes the model's performance on test data for which the actual values are known ahead of time. The model's F1-score, precision, and recall can be computed from the confusion matrix (Table 2).

Precision:
Precision is the ratio of correctly predicted positive instances to all instances predicted as positive for a class. This measure works well when each class in the dataset has the same number of examples. The precision formula is given in Equation 2, where TP is the number of true positives and FP the number of false positives.

$$\text{Precision}\ (P) = \frac{TP}{TP + FP} \tag{2}$$

Recall:
Recall is the ratio of correctly predicted positive instances to all instances that actually belong to the positive class. Equation 3 gives the formula for computing recall from the confusion matrix, where FN is the number of false negatives.

$$\text{Recall}\ (R) = \frac{TP}{TP + FN} \tag{3}$$

F1-Score:
Plain accuracy may not be a dependable measure of the model's performance when the classes contain unequal numbers of examples, because classes with more instances can dominate the count of correct predictions. The F1-score handles such class imbalance. Unlike plain accuracy, it takes both false positives and false negatives into account by combining precision and recall. Equation 4 provides the formula for the F1-score.

$$F1 = \frac{2 \times P \times R}{P + R} \tag{4}$$
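As a worked illustration of the metrics above, the sketch below computes precision, recall, F1-score, and accuracy from the four confusion-matrix counts. The counts are made up for the example and are not results from the study; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Illustrative counts only (not taken from the paper's tables).
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here precision is 90/100 = 0.9 and recall is 90/120 = 0.75, so the F1-score (about 0.818) lies between the two and closer to the smaller value.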

Naive Bayes
The Naive Bayes classification algorithm relies on Bayes' theorem and assumes that all features used to predict the target class are independent of one another. The term "naive" refers to this feature-independence assumption, which states that the value of one feature is neither affected by nor dependent on the presence or characteristics of other features. The confusion matrix and classification report obtained from Naive Bayes are given in Tables 5 and 6.
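The independence assumption can be made concrete with a small sketch. The version below assumes a Gaussian likelihood for each numeric feature, which is one common variant; the source does not say which Naive Bayes variant was used. The function names and the toy data (two made-up features per row) are illustrative assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature mean/variance for every class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add up.
        for value, m, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        return total

    return max(model, key=lambda label: log_posterior(*model[label]))


# Toy data (made up): two numeric features per row, label 1 = disease.
X = [[45, 120], [50, 130], [48, 125], [65, 160], [70, 170], [68, 165]]
y = [0, 0, 0, 1, 1, 1]
nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, [47, 122]), predict_gaussian_nb(nb_model, [69, 168]))
```

Because the features are treated as independent, the joint likelihood factorizes into a product over features, which becomes the sum of log terms in `log_posterior`.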

k-Nearest Neighbor (KNN)
The KNN algorithm is one of the most basic machine learning methods. It stores every training instance as a point in n-dimensional space. KNN has no separate training phase: the stored training instances themselves serve as the model. When a new data point needs to be classified, the algorithm searches the stored instances for the k nearest ones, as measured by a distance function (Euclidean, Manhattan, or Hamming distance), and assigns the majority class among them. Tables 7 and 8 provide the classification report and confusion matrix obtained from k-Nearest Neighbor.
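The nearest-neighbor lookup described above can be sketched in a few lines. This is a minimal illustration using Euclidean distance and a majority vote; the value k = 3, the function name `knn_predict`, and the toy points are assumptions, not settings reported in the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-dimensional points (made up for illustration).
train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [1.5, 1.5]))
print(knn_predict(train_X, train_y, [6.5, 6.5]))
```

Every prediction scans all stored instances, which is why KNN is cheap to set up but becomes slower at prediction time as the dataset grows.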

Support Vector Machine
SVM was created to recognize and classify patterns (Hearst et al., 1998). The goal of the SVM technique is to find the best line or decision boundary that divides an n-dimensional space into classes, so that new data points can be classified quickly in the future. This decision boundary is called a hyperplane. On smaller datasets, SVMs are typically faster to train and can perform better than neural networks. Tables 9 and 10 provide the classification report and confusion matrix obtained by the SVM.
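To illustrate how a hyperplane is used once it has been found, the sketch below classifies points by the side of a linear boundary w·x + b = 0 on which they fall. It does not train an SVM; the weights `w` and bias `b` are hand-picked, hypothetical values, and the function names are assumptions for the example.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm


# Hypothetical hyperplane x1 - x2 = 0 (not learned from data).
w, b = [1.0, -1.0], 0.0
print(classify(w, b, [3.0, 1.0]), classify(w, b, [1.0, 3.0]))
print(round(distance_to_hyperplane(w, b, [3.0, 1.0]), 4))
```

Training an SVM amounts to choosing `w` and `b` so that the distance from the hyperplane to the closest training points (the margin) is as large as possible.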

Decision Tree Classifier
A decision tree is a supervised machine learning technique that performs well on both regression and classification tasks, although it is typically used to solve classification problems. In this structure, internal nodes represent dataset features, branches represent decision rules, and each terminal node indicates an outcome. The root node sits at the top of the tree and initiates the decision-making process. Humans can readily comprehend and analyze a decision tree because it is visualized as a flowchart-like diagram, which closely resembles human thought processes (Navlani, 2018). Tables 11 and 12 present the confusion matrix and classification report generated by the Decision Tree Classifier.

XGBoost Classifier
XGBoost builds trees sequentially, with each successive tree aiming to reduce the errors of the preceding one. This is in contrast to random forest classifiers, which build trees independently of one another (Jamtsho & Riyamongkol, 2019). The XGBoost classification technique includes early stopping to avoid overfitting and allows parameters such as the learning rate to be adjusted as needed. Tables 13 and 14 present the confusion matrix and classification report generated by the XGBoost Classifier.

Random Forest Classifier
Random Forest is a collection of decision trees built on ensemble learning, which combines multiple classifiers to tackle complex problems. Rather than depending on a single tree, the random forest classifier gathers predictions from each decision tree and uses a majority vote to determine the final outcome. Increasing the number of trees makes overfitting less likely and tends to improve accuracy. Tables 15 and 16 provide the classification report and confusion matrix obtained by the Random Forest.
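The majority vote used by a random forest can be sketched as follows. The three "trees" here are hand-written single-rule stand-ins rather than trees learned from data, and the feature names and thresholds are hypothetical values chosen only to make the example run.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent prediction among the trees."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, patient):
    """Collect one prediction per tree and combine them by majority vote."""
    return majority_vote([tree(patient) for tree in trees])


# Hypothetical one-rule trees (thresholds are made up, not clinical guidance).
trees = [
    lambda p: 1 if p["age"] > 55 else 0,
    lambda p: 1 if p["cholesterol"] > 240 else 0,
    lambda p: 1 if p["max_heart_rate"] < 130 else 0,
]

patient = {"age": 60, "cholesterol": 220, "max_heart_rate": 120}
print(forest_predict(trees, patient))
```

For this patient the individual trees vote 1, 0, and 1, so the forest outputs class 1 even though one tree disagrees.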

Artificial Neural Network
Artificial neural networks (ANNs) are biologically inspired, brain-like models within artificial intelligence. An ANN comprises three types of layers: an input layer, hidden layers, and an output layer. The input layer feeds the initial data into the network so that the hidden layers can process it. The output layer generates the final prediction. In the experiment, heart disease was predicted using four hidden layers and sigmoid activation. The model was trained for 100 epochs with a batch size of eight.
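The layer structure can be illustrated with a forward pass through a small fully connected network. This sketch assumes 11 input features, four hidden layers of eight units each, sigmoid activation in every layer, and a single output unit; the hidden-layer width and the random, untrained weights are assumptions, since the source gives only the number of hidden layers and the activation.

```python
import math
import random


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer followed by sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer (untrained)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Pass the input through every layer in order."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


rng = random.Random(0)
sizes = [11, 8, 8, 8, 8, 1]  # input, four hidden layers (width assumed), output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probability = forward([0.5] * 11, layers)[0]
print(0.0 < probability < 1.0)
```

Training would adjust the weights over repeated passes through the data (epochs), updating them after each small group of samples (a batch); that step is omitted here.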

CONCLUSION
This study developed a heart disease classifier using eight distinct machine learning algorithms. Most studies have relied on the UCI machine learning heart dataset to diagnose heart disease. Little to no work has been done on the combined heart disease dataset, a compilation of datasets from five different repositories, which this study uses. The dataset was split 80:20 into training and test sets. In the experiment, the XGBoost classifier outperformed the other algorithms with an accuracy of 93.7%. A simple mobile application was developed on the Android Studio platform to make the model usable. In the future, data can be collected from the Bhutanese health sector to develop a heart disease classifier. Moreover, the datasets can be merged to design a more robust classifier.

Figure 1
Figure 1 depicts an overview of the proposed framework. Within the framework, the dataset is split 80:20 into training and test sets. Eight algorithms were trained on the heart disease dataset: Naive Bayes, k-Nearest Neighbor, Support Vector Machine, Decision Tree Classifier, Random Forest Classifier, XGBoost Classifier, Artificial Neural Network, and Logistic Regression.

Table 19. Summary of testing accuracy (columns: Algorithm, Test Accuracy)
The Android Studio platform was used. Android Studio is the official integrated development environment (IDE) for creating Android apps, which are written in Java and Kotlin. The mobile application lets the user enter the feature values, which are then sent to a Django REST API for classification of the disease. The Django REST API is deployed on a Heroku server. The sample snapshot is given in Figure