Cloud-Based Diabetes Decision Support System Using Machine Learning Fusion

Abstract: Diabetes mellitus, generally known as diabetes, is one of the most common diseases worldwide. It is a metabolic disease characterized by insulin deficiency, or glucose (blood sugar) levels that exceed 200 mg/dL (11.1 mmol/L) for prolonged periods, and may lead to death if left uncontrolled by medication or insulin injections. Diabetes is categorized into two main types, type 1 and type 2, both of which feature glucose levels above "normal," defined as 140 mg/dL. Diabetes is triggered by malfunction of the pancreas, which releases insulin, a natural hormone responsible for controlling glucose levels in blood cells. Diagnosis and comprehensive analysis of this potentially fatal disease necessitate application of techniques with minimal rates of error. The primary purpose of this research study is to assess the potential role of machine learning in predicting a person's risk of developing diabetes. Previous research has examined the use of various machine-learning algorithms, such as naïve Bayes, decision trees, and artificial neural networks, for early diagnosis of diabetes. However, to achieve maximum accuracy and minimal error in diagnostic predictions, there remains an immense need for further research to improve the machine-learning tools and techniques available to healthcare professionals. Therefore, in this paper, we propose a novel cloud-based machine-learning fusion technique involving synthesis of three machine-learning algorithms and use of fuzzy systems for collective generation of highly accurate final decisions regarding early diagnosis of diabetes. The training layer of the proposed system begins with data pre-processing activities (data cleaning and normalization) and is followed by data splitting for classification. In our study, we divided the dataset for training and testing at a ratio of 70:30 to optimize classification techniques and yield more accurate results in the validation data.
After pre-processing, we executed the classification process, which involved training of the three classification techniques (ANN, NB, and DT) followed by validation on our selected dataset. We optimized these techniques until maximum accuracy was achieved. Finally, using a fuzzy system, we synthesized the three prediction results from the three classification techniques to generate the final prediction output. In our study, our proposed system achieved an accuracy rate of 95.2%, outperforming previously applied machine-learning techniques for diabetes diagnosis.


Introduction
Diabetes mellitus, widely known as diabetes, is an increasingly common physiological health issue. A patient with diabetes, or a diabetic, suffers from a critical shortage of insulin, resulting in an inability to adequately process glucose (sugar) [1]. Diabetes is generally classified into two types: type 1 and type 2. Type-1 diabetes is characterized by insulin dependency, while type-2 diabetes is characterized by insulin deficiency. Insulin is one of the vital hormones produced by the pancreas, the organ responsible for regulating glucose (blood sugar) levels in the human body. The primary underlying causes of diabetes are an imbalanced diet (i.e., one high in sugary foods), obesity, and genetic inheritance. Recent industrial and technological advancements have significantly affected the average human lifestyle, leading to the higher standard of living and accompanying decrease in physical activity commonly observed in developed countries. Accordingly, rates of diabetes have increased, and clinical analysis and effective diagnosis of diabetes have become key subjects of healthcare studies. Traditionally, diabetes has been diagnosed via clinical tests of glucose tolerance levels in patients [2]. Like many other metabolic diseases, diabetes is associated with severe complications such as heart failure, kidney problems, and eyesight issues including complete blindness [3]. An alarming report issued by the Diabetes Research Centre stated that the prevalence of diabetes has increased at a rate of 7% annually and doubled globally during the last decade, with more than 200 million now diagnosed. Research studies have indicated that 8% of the population aged 25-65 suffer from ailments linked to pancreatic dysfunction, and in a sample of 2.2 million of such patients, 17% were adults; most of these patients have a high risk of developing diabetes in the near future [4]. Diabetes can be fatal and otherwise can lead to severe, often irreparable damage to multiple organs.
There is an immense need for tools and technologies enabling efficient, accurate investigation and diagnosis to support the decision making of health experts in managing this disease.
Recent studies indicate that accurate and timely diagnosis may prevent 80% of complications in patients with type-2 diabetes. Accurate and timely diagnosis provides a solid basis for effective treatment, helping to minimize cost of treatment and other difficulties for patients [5]. These are the key success factors for prevention of diabetes complications and development of effective treatment strategies. Healthcare professionals can implement such strategies to reduce long-term damage caused by this disease. Due to its significant advantages, early detection has become a top priority among healthcare professionals. Notably, detection of type-2 diabetes requires a higher level of medical expertise, as this disease is more complex than type-1 diabetes. One of the most promising new methods for accurate early diagnosis is the use of an artificial neural network (ANN). ANN is one of a number of recently developed machine-learning methods being implemented to predict disease earlier and more accurately. According to M. S. Shanker in his research paper "Using neural networks to predict the onset of diabetes mellitus" [6], ANN is considered a more suitable approach to early diagnosis than other machine-learning methods, particularly when one considers the factor of network topology. However, parameter optimization presents a major issue when utilizing ANN. The multi-layer perceptron (MLP), a type of deep neural network (DNN), has offered an effective resolution to this problem. DNN are increasingly recommended to support diagnostic processes for diverse diseases [7], as DNN facilitate disease identification and diagnosis while minimizing human error [8]. When utilizing neural networks for diagnosis, it is vital to attain a high level of accuracy, which is achieved via sufficient training and testing on patient datasets. DNN have shown particular promise for achieving maximum accuracy and minimal error through training and testing on datasets.
Machine-learning models are commonly used for diabetes prognosis and have produced improved results. Among machine-learning models, one of the most widely used methods for classification is the decision tree (DT). In machine-learning methods for disease diagnosis, the results of multiple DT can be synthesized to generate a random forest (RF) that yields a single collective final result, that is, a final diagnostic decision. The authors used RF in parallel with principal component analysis (PCA). RF obtains approximately 80% accuracy. Historically, the primary objective of diabetes diagnosis was simply to help control the development of the disease. With support from machine learning, early diagnosis has become possible. High-risk individuals may now take precautionary measures to avoid consequences of the disease for as long as possible. Successful early diagnosis largely depends on accurate selection of classifiers and related features. Researchers have been experimenting with various machine-learning methods, testing different algorithms with the aim of achieving superior rates of prediction accuracy. Previously explored algorithms include support-vector machines (SVM), J48, naïve Bayes, and DT; studies of these algorithms have shown that machine-learning methods achieve superior diagnostic results [9]. The real strength of these algorithms lies in their flexibility to integrate data from varying sources [10].
In this study, we propose a new DNN approach for generating highly accurate predictions of type-2 diabetes. Our approach utilizes a cloud-based decision support system for early identification of diabetic patients. The proposed system uses real-time patient data as input to predict whether a particular patient has diabetes. We apply three popular machine-learning algorithms and a fuzzy system to achieve final diagnostic results with accuracy rates higher than those achieved in similar past studies.

Related Research
Researchers in [11] presented a hybrid framework for detection of type-2 diabetes that uses two techniques: K-means and C4.5. They used the clustering algorithm to identify class labels and C4.5 for classification. Their experiment on the Pima Indians diabetes dataset (PIDD) yielded a 92.38% accuracy rate. Researchers in [12] proposed a model using fuzzy C-means clustering techniques to diagnose type-2 diabetes. They used 768 records with nine features in their experiment, achieving 94.3% accuracy. In [13], researchers performed a comparative analysis of various classification and clustering techniques for diabetes diagnosis. They conducted tests to evaluate the performance of applied data-mining techniques. Their results indicated that the J48 classifier outperformed all other techniques in Weka with an accuracy rate of 81.33%. Researchers in [14] proposed a framework to diagnose diabetes using DT along with a fuzzy decision boundary system. The proposed framework achieved an accuracy of 75.8%. Researchers in [15] presented a system to detect diabetes using generalized discriminant analysis and least-squares SVM. Their proposed system demonstrated 82.50% accuracy. Researchers in [16] presented a diabetes detection system using a modified artificial bee colony (ABC) optimization technique with fuzzy rules. Their proposed system showed an accuracy rate of 82.68%. Researchers in [17] proposed a model for diabetes detection that integrated ANN and SVM using a stacked ensemble technique. They applied their model to the PIDD and achieved an accuracy rate of 88.04%. In [18], researchers presented an ensemble classification model based on data streams. The proposed model was able to perform classification tasks in a data-streaming environment. Researchers in [19] also presented an ensemble classification model; theirs was designed to detect diabetic retinopathy. They used fuzzy RF and applied Dominance-based Rough Sets Theory. Their experiment used the SRJUH dataset and showed an accuracy rate of 77%. Researchers in [20] presented a heterogeneous ensemble classification model that included a fuzzy rule inference engine to tackle the issue of uncertainty in the results of base classifiers.

Materials and Methods
Early diagnosis of type-2 diabetes can offer patients the opportunity to improve their lifestyles and dietary habits. Moreover, early detection can guide patients to start taking proper medication before the disease worsens. In our study, we present a method for early detection of diabetes that uses a cloud-based intelligent framework empowered by supervised machine-learning techniques and fuzzy systems as shown in Fig. 1. Our framework consists of two layers: Training and testing. Each layer further consists of multiple stages. The training layer begins with the selection of a proper dataset. In the present study, we selected a pre-labeled dataset of diabetes patients [21] for the implementation of our proposed framework. This dataset consists of 15,000 instances and a total of 10 features, of which nine features are independent and one, the output class, is dependent. The pre-processing layer of our proposed framework involves two stages: 1) Data cleaning and normalization and 2) data splitting. Data cleaning fills in missing values using the mean imputation method, while normalization brings the values of all features into a certain range. Both activities help the classification process achieve higher accuracy. After data cleaning and normalization, the dataset is divided into training data and test data at a ratio of 70:30 on the basis of class split.
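The pre-processing stages described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes min-max scaling to [0, 1] for the normalization step (the text only says "a certain range"), reads "class split" as a per-class (stratified) 70:30 split, and uses hypothetical toy data and function names.

```python
import random
from statistics import mean

def impute_mean(rows):
    """Replace None entries in each feature column with that column's mean."""
    n_cols = len(rows[0])
    col_means = [mean(r[c] for r in rows if r[c] is not None) for c in range(n_cols)]
    return [[col_means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]

def min_max_normalize(rows):
    """Rescale every feature column to [0, 1] (assumed range); constant columns map to 0."""
    n_cols = len(rows[0])
    lows = [min(r[c] for r in rows) for c in range(n_cols)]
    highs = [max(r[c] for r in rows) for c in range(n_cols)]
    return [
        [(r[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
         for c in range(n_cols)]
        for r in rows
    ]

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split row indices class by class so both parts keep the class proportions."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx

# Hypothetical toy data: two features, some missing values, balanced classes.
features = [[float(i), None if i % 5 == 0 else float(i * 2)] for i in range(20)]
labels = [0] * 10 + [1] * 10

clean = min_max_normalize(impute_mean(features))
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 14 6
```

Splitting per class keeps the ratio of diabetic to non-diabetic records the same in the training and test sets, which matters when one class is twice as frequent as the other.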
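After pre-processing comes the classification process, which consists of training three widely used supervised classification techniques: ANN, DT, and naïve Bayes (NB). This layer receives the training set and test set from the pre-processing stage and provides three prediction results for the next stage. All three classification algorithms must be optimized to achieve maximum accuracy. For the ANN configuration, we used one hidden layer with 10 neurons and the backpropagation technique to tune the weights. We used a multi-layer perceptron with at least one hidden layer besides the input and output layers. The steps involved in backpropagation are as follows: initialization of weights, feed forward, backpropagation of error, and updating of weights and biases. Every neuron present in the hidden layer has an activation function such as $f(x) = \mathrm{Sigmoid}(x)$.

Let $x_i$ ($i = 1, \dots, m$) denote the inputs, $\omega_{ij}$ the weights between the input and hidden layers, $v_{jk}$ the weights between the hidden and output layers, and $b_1$ and $b_2$ the biases. The sigmoid function for the input and the hidden layer of the proposed BPNN can be written as
$$\psi_j = b_1 + \sum_{i=1}^{m} \omega_{ij}\, x_i, \qquad \varphi_j = \frac{1}{1 + e^{-\psi_j}}, \quad j = 1, \dots, n.$$
The input to the output layer is
$$\psi_k = b_2 + \sum_{j=1}^{n} v_{jk}\, \varphi_j.$$
The output layer activation function is
$$\varphi_k = \frac{1}{1 + e^{-\psi_k}}.$$
The backpropagation error is
$$E = \frac{1}{2} \sum_{k} (\tau_k - \varphi_k)^2,$$
where $\tau_k$ and $\varphi_k$ represent the desired output and estimated output, respectively. In Eq. (6), the rate of change in weight for the output layer is written as
$$\Delta v_{jk} = -\epsilon\, \frac{\partial E}{\partial v_{jk}}. \tag{6}$$
After applying the chain rule, the above equation can be stated as
$$\Delta v_{jk} = -\epsilon\, \frac{\partial E}{\partial \varphi_k} \cdot \frac{\partial \varphi_k}{\partial \psi_k} \cdot \frac{\partial \psi_k}{\partial v_{jk}}. \tag{7}$$
By substituting the values in Eq. (7), the weight change can be obtained as presented in Eq. (8):
$$\Delta v_{jk} = \epsilon\, (\tau_k - \varphi_k)\, \varphi_k (1 - \varphi_k)\, \varphi_j = \epsilon\, \xi_k\, \varphi_j, \tag{8}$$
where $\xi_k = (\tau_k - \varphi_k)\, \varphi_k (1 - \varphi_k)$. Then, we apply the chain rule for the updating of weights between the input and hidden layers:
$$\Delta \omega_{ij} = -\epsilon \left[ \sum_{k} \frac{\partial E}{\partial \varphi_k} \cdot \frac{\partial \varphi_k}{\partial \psi_k} \cdot \frac{\partial \psi_k}{\partial \varphi_j} \right] \frac{\partial \varphi_j}{\partial \psi_j} \cdot \frac{\partial \psi_j}{\partial \omega_{ij}},$$
where $\epsilon$ represents the constant (the learning rate):
$$\Delta \omega_{ij} = \epsilon \left[ \sum_{k} (\tau_k - \varphi_k)\, \varphi_k (1 - \varphi_k)\, v_{jk} \right] \varphi_j (1 - \varphi_j)\, x_i.$$
After simplification, the above equation can be stated as
$$\Delta \omega_{ij} = \epsilon\, \xi_j\, x_i,$$
where $\xi_j = \left[ \sum_{k} \xi_k\, v_{jk} \right] \varphi_j (1 - \varphi_j)$. Eq. (10) is used for updating the weights between the hidden layer and the output layer:
$$v_{jk}^{+} = v_{jk} + \Delta v_{jk}. \tag{10}$$
Eq. (11) is used for updating the weights between the input and hidden layers:
$$\omega_{ij}^{+} = \omega_{ij} + \Delta \omega_{ij}. \tag{11}$$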
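The four backpropagation steps (weight initialization, feed forward, backpropagation of error, weight and bias update) can be illustrated with a small pure-Python network. This is a sketch under stated assumptions rather than the authors' code: the class name `TinyBPNN`, the per-neuron biases, the learning rate of 0.5, the uniform initialization range, and the toy OR dataset are all illustrative choices. It uses a sigmoid activation in both layers and a squared-error loss, as described in the text.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyBPNN:
    """One-hidden-layer perceptron trained by per-sample backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.lr = lr  # learning-rate constant (assumed value)
        # Step 1: initialize weights (input->hidden w, hidden->output v) and biases.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.v = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.b2 = [0.0] * n_out

    def forward(self, x):
        # Step 2: feed forward through the sigmoid hidden and output layers.
        hidden = [sigmoid(self.b1[j] + sum(self.w[i][j] * x[i] for i in range(len(x))))
                  for j in range(len(self.b1))]
        out = [sigmoid(self.b2[k] + sum(self.v[j][k] * hidden[j] for j in range(len(hidden))))
               for k in range(len(self.b2))]
        return hidden, out

    def train_step(self, x, target):
        hidden, out = self.forward(x)
        # Step 3: backpropagate the error (deltas use the pre-update weights v).
        xi_k = [(target[k] - out[k]) * out[k] * (1 - out[k]) for k in range(len(out))]
        xi_j = [sum(xi_k[k] * self.v[j][k] for k in range(len(out)))
                * hidden[j] * (1 - hidden[j]) for j in range(len(hidden))]
        # Step 4: update weights and biases.
        for j in range(len(hidden)):
            for k in range(len(out)):
                self.v[j][k] += self.lr * xi_k[k] * hidden[j]
            self.b1[j] += self.lr * xi_j[j]
            for i in range(len(x)):
                self.w[i][j] += self.lr * xi_j[j] * x[i]
        for k in range(len(out)):
            self.b2[k] += self.lr * xi_k[k]

    def error(self, data):
        """Sum of half squared errors over a dataset."""
        return sum(0.5 * sum((t[k] - o) ** 2 for k, o in enumerate(self.forward(x)[1]))
                   for x, t in data)

# Toy OR problem, used only to show that training reduces the error.
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
net = TinyBPNN(n_in=2, n_hidden=3, n_out=1)
error_before = net.error(data)
for _ in range(2000):
    for x, t in data:
        net.train_step(x, t)
error_after = net.error(data)
print(error_before, error_after)
```

The deltas for the hidden layer are computed before the hidden-to-output weights are changed, so both updates use gradients from the same forward pass.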
In DT, we used three optimizers one by one: Random search, Bayesian optimization, and grid search. Bayesian optimization performed well and was hence selected for this framework.
The Gini index is
$$\mathrm{Gini}(S) = 1 - \sum_{i=1}^{c} p_i^2,$$
and information gain is
$$\mathrm{IG}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v), \qquad H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i,$$
where $p_i$ is the proportion of instances in $S$ that belong to class $i$ and $S_v$ is the subset of $S$ for which attribute $A$ takes the value $v$. In machine learning, information gain is used to determine the order in which attributes are examined so that the uncertainty about $S$ is reduced most rapidly. DT depicts how each stage depends on the outcome of the analysis of the previous attribute; applied in the area of machine learning, this is known as decision-tree learning. An attribute with high mutual information should be preferred over other attributes.

For hyper-parameter optimization, the objective is
$$n^{*} = \arg\min_{n \in N} f(n). \tag{15}$$
Here, $f(n)$ is the error rate to be minimized, or root mean squared error (RMSE), assessed on the validation set. $n$ can take on any value from domain $N$, and $n^{*}$ is the set of hyper-parameters that yields the lowest value of the score. In simple terms, we aimed to find the model hyper-parameters that would deliver the best score on the validation set metric. Bayesian optimization builds a probability model of the objective function. This model is known as a "surrogate," which is represented as $p(z \mid n)$, for the objective function:
$$\mathrm{EI}_{z^{*}}(n) = \int_{-\infty}^{z^{*}} (z^{*} - z)\, p(z \mid n)\, dz.$$
We intended to optimize expected improvement with respect to the proposed set of hyper-parameters $n$. Here, $z^{*}$ is a threshold value of the objective function, whereas $z$ depicts the actual value of the objective function using hyper-parameters $n$, and $p(z \mid n)$ is the surrogate probability model stating the probability of $z$ given $n$. This suggests the best hyper-parameters under the function $p(z \mid n)$.

The hyper-parameters are not expected to produce any improvement if $p(z \mid n)$ is zero everywhere that $z < z^{*}$. On the other hand, the hyper-parameters $n$ are expected to produce better results than the threshold value if the integral is positive. The $p(n \mid z)$ function is expressed as
$$p(n \mid z) = \begin{cases} l(n), & z < z^{*} \\ g(n), & z \geq z^{*} \end{cases}$$
where $l(n)$ is the distribution of the hyper-parameters when the score is lower than the threshold $z^{*}$, and $g(n)$ is the distribution when the score is higher than $z^{*}$. $z^{*}$ is the minimum observed true objective function score, whereas $z$ stands for new scores. To maximize the expected improvement under the Gaussian Process model, the new score $z$ must be less than the current minimum score ($z < z^{*}$); hence $\max(z^{*} - z, 0)$ can be a large positive number where $z < z^{*}$, which indicates a lower value of the objective function than the threshold.

Our rationale for this equation is that we have two different distributions for the hyper-parameters: the first, $l(n)$, covers the cases where the value of the objective function is less than the threshold, and the other, $g(n)$, covers the cases where the value of the objective function is greater than the threshold.
To increase expected improvement, points with high probability under l(n) and low probability under g(n) might be chosen as the next hyper-parameter.
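The selection rule in the last sentence can be sketched as follows. The l(n)/g(n) formulation is the one used by tree-structured Parzen estimators; the code below is a one-dimensional illustration only, with assumed details that are not in the paper: Gaussian kernel density estimates with a fixed bandwidth, a threshold taken from the sorted scores, a hypothetical objective f(n) = (n - 3)^2, and the helper names `kde` and `suggest_next`.

```python
import math

def kde(points, bandwidth=1.0):
    """Return a 1-D Gaussian kernel density estimate built from the given points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density

def suggest_next(history, candidates, n_good=3):
    """Pick the candidate with the highest l(n) / g(n) ratio.

    history is a list of (hyper-parameter, score) pairs; lower scores are better.
    """
    scores = sorted(z for _, z in history)
    z_star = scores[n_good]                          # threshold score
    good = [n for n, z in history if z < z_star]     # samples behind l(n)
    bad = [n for n, z in history if z >= z_star]     # samples behind g(n)
    if not good or not bad:
        raise ValueError("need observations on both sides of the threshold")
    l, g = kde(good), kde(bad)
    return max(candidates, key=lambda n: l(n) / g(n))

# Hypothetical objective evaluated at ten integer hyper-parameter values.
history = [(float(n), float((n - 3) ** 2)) for n in range(10)]
candidates = [i * 0.5 for i in range(19)]            # grid from 0.0 to 9.0
best = suggest_next(history, candidates)
print(best)  # 3.0
```

Candidates near previously good settings get a high density under l(n) and a low density under g(n), so the ratio steers the search toward the region where the objective has been low.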
In NB, three kernel types are used: Box, Gaussian, and Triangle.

Posterior Probability = (Likelihood of Evidence × Prior Probability) / Probability of Evidence
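The rule above can be illustrated with a short naïve Bayes posterior calculation. This sketch assumes Gaussian class-conditional likelihoods (the Gaussian kernel option) and feature independence; the two features and their per-class means and standard deviations are hypothetical values chosen for illustration, while the priors follow the 7,000 negative and 3,500 positive training records reported in the Results section.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density used as the class-conditional likelihood of one feature."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(sample, class_stats, priors):
    """Return P(class | sample) for every class via Bayes' rule.

    class_stats[c] is a list of (mean, std) pairs, one per feature.
    """
    joint = {}
    for c, stats in class_stats.items():
        likelihood = 1.0
        for value, (mu, sigma) in zip(sample, stats):
            likelihood *= gaussian_pdf(value, mu, sigma)   # naive independence assumption
        joint[c] = likelihood * priors[c]                  # likelihood x prior
    evidence = sum(joint.values())                         # probability of the evidence
    return {c: j / evidence for c, j in joint.items()}

# Priors from the reported training split: 7,000 negative, 3,500 positive.
priors = {0: 7000 / 10500, 1: 3500 / 10500}
# Hypothetical (mean, std) pairs for two illustrative features; not taken from the dataset.
class_stats = {0: [(100.0, 15.0), (25.0, 4.0)], 1: [(150.0, 20.0), (32.0, 5.0)]}

result = posterior((160.0, 33.0), class_stats, priors)
print(result)
```

Dividing by the evidence only rescales the scores so that they sum to one; the predicted class is the one with the largest product of likelihood and prior.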
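The traditional NB classifier estimates probabilities by approximating the data with a function, such as a Gaussian distribution:
$$P(S_t = s \mid z) = \frac{1}{\sqrt{2\pi}\,\sigma_z} \exp\left(-\frac{(s - \mu_t)^2}{2\sigma_z^2}\right),$$
where $\mu_t$ represents the mean of the values of attribute $S_t$ averaged over training points with class label $z$, and $\sigma_z$ represents the standard deviation. The one-parameter Box-Cox transformations are defined as
$$y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln y_i, & \lambda = 0 \end{cases}$$
and the two-parameter Box-Cox transformations as
$$y_i^{(\boldsymbol{\lambda})} = \begin{cases} \dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \lambda_1 \neq 0 \\ \ln (y_i + \lambda_2), & \lambda_1 = 0 \end{cases}$$

After optimization, each optimized model is stored in the cloud. The next stage of the training layer in our proposed framework deals with the creation and implementation of fuzzy logic on the results of the optimized classification algorithms as shown in Fig. 2. This layer receives the results of ANN, DT, and NB and generates the output using fuzzy rules as shown in Figs. 3 and 4; this output is again stored in the cloud.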
Fuzzy logic is built from conditional (if-then) statements. On the basis of these statements, fuzzy rules are constructed as follows:

IF (NeuralNetwork is yes and NaïveBayes is yes and DecisionTree is yes) THEN (Diabetes is yes).

IF (NeuralNetwork is no and NaïveBayes is yes and DecisionTree is no) THEN (Diabetes is no).

From the formulation of the rules, it is evident that if any two of the three supervised classification techniques are true, then diabetes is true; otherwise, diabetes is false. Fig. 2 shows the proposed fused ML rule surface of diabetes with respect to the neural network and naïve Bayes results. If both the neural network and naïve Bayes solutions predict no diabetes, then the resultant fused ML also predicts no diabetes; otherwise, the fused ML predicts diabetes. Fig. 3 shows that if the neural network diagnoses no diabetes and the remaining algorithms, naïve Bayes and decision tree, both diagnose diabetes, then the fused ML diagnoses the patient with diabetes. Fig. 4 shows that if all three algorithms (neural network, naïve Bayes, and decision tree) diagnose no diabetes, then the fused ML also diagnoses no diabetes.
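The two-out-of-three rule base can be sketched as a small fuzzy inference routine. The details below are assumptions made for illustration and are not taken from the paper: each classifier output is treated as a degree of "yes" in [0, 1], the AND of a rule is the minimum of its antecedents, rules with the same conclusion are aggregated with the maximum, and ties resolve to "yes". The function name `fuse` is hypothetical. With crisp 0/1 inputs the routine reduces to a majority vote.

```python
from itertools import product

def fuse(nn, nb, dt):
    """Fuse three classifier outputs (degrees of 'yes' in [0, 1]) into 'yes' or 'no'."""
    inputs = (nn, nb, dt)
    strength = {"yes": 0.0, "no": 0.0}
    # Enumerate the 8 if-then rules: one per yes/no pattern of the three classifiers.
    for pattern in product((True, False), repeat=3):
        # Rule firing strength: fuzzy AND (minimum) over the three antecedents.
        firing = min(x if is_yes else 1.0 - x for x, is_yes in zip(inputs, pattern))
        # Rule conclusion: diabetes is 'yes' when at least two antecedents are 'yes'.
        conclusion = "yes" if sum(pattern) >= 2 else "no"
        # Aggregate rules with the same conclusion using the maximum.
        strength[conclusion] = max(strength[conclusion], firing)
    return "yes" if strength["yes"] >= strength["no"] else "no"

print(fuse(1, 1, 1))  # yes
print(fuse(0, 1, 0))  # no
print(fuse(0, 1, 1))  # yes
```

Accepting degrees rather than hard labels lets the same rule base handle classifiers that report a confidence score, at the cost of having to choose a tie-breaking convention.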
The second layer of the proposed framework deals with the real-time classification of diabetic patients. Real-time patient data can be given as input to the proposed fused machine-learning model, and appointments can be made on the basis of the results. If a patient is predicted to be diabetic, then he or she is assigned an early slot on an emergency basis; if the patient is predicted to be non-diabetic, then he or she can be given an appointment following the regular schedule.

Results and Discussion
To implement the proposed framework, we used a dataset [21] consisting of 10 features and 15,000 instances as shown in Tab. 1. The first nine features were independent features used as inputs to calculate and predict the tenth feature, the output class indicating whether the particular patient is suffering from diabetes or not. If the value of this feature is 1, the patient is diabetic, and if the value is 0, the patient is non-diabetic.

Tab. 1 (excerpt): feature 9 is the age of the patient, with numeric values (21-77); feature 10 is the diabetic class, with Yes = 1 and No = 0.

We divided the dataset into two parts, 70% training data (10,500) and 30% test data (4,500). We performed the pre-processing activities of cleaning and normalization on the dataset prior to classification. For classification of the dataset, we used three machine learning algorithms: ANN, DT, and NB. We optimized these techniques iteratively until we achieved maximum performance. We applied various statistical measures to assess the performance of the classification techniques. In these measures, RO 0, RO 1, EO 0, and EO 1 represent the predicted positive output, predicted negative output, expected positive output, and expected negative output, respectively.
False Positive Ratio = 1 − Specificity (28)

First, we used ANN to classify the dataset. We used one hidden layer consisting of nine neurons while designing the structure of the neural network. We used 70% of the dataset, consisting of 10,500 records, for training the model and the remaining 30% of the dataset, consisting of 4,500 records, for testing. Of the 10,500 records reserved for training, 7,000 were negative and 3,500 were positive. During the training process with ANN, 6,801 records were classified as negative and 3,273 were classified as positive. After comparing the expected results with the output results shown in Tab. 2, we achieved 96% accuracy with a 4% miss rate. In testing with ANN, 2,831 records were classified as negative and 1,285 were classified as positive (Tab. 2). The accuracy rate of ANN in the testing stage was 91.5% and the miss rate was 8.5%.

During the training process with DT, 6,801 records were classified as negative and 3,273 were classified as positive. After comparison of the expected negative and positive records with the output results of the training process with DT (Tab. 3), we achieved an accuracy rate of 95.9% and a miss rate of 4.1%. During the testing process with DT, 2,898 records were classified as negative while 1,404 were classified as positive (Tab. 3). After comparing the expected output with the output of the testing process with DT, we achieved an accuracy rate of 94.9% and a miss rate of 5.1%.

During training with NB, 6,647 records were classified as negative and 3,109 were classified as positive. After comparing the achieved output of NB in the training stage with the expected output (Tab. 4), we achieved 92.91% accuracy and a miss rate of 7.09%. During the testing process, we used 4,500 records (30% of the dataset) for validation. Of these records, 3,000 were negative and 1,500 were positive. NB classified 2,828 records as negative and 1,348 as positive. After comparison with the expected output (Tab. 4), the proposed model achieved an accuracy rate of 92.8% and a miss rate of 7.2%.

Finally, we fed all of the records of the test data into the fuzzy system along with the output class for the final decision. The fuzzy system classified 2,903 records as negative and 1,380 as positive (Tab. 5). After comparing the expected output with the fuzzy system output, we achieved 95.2% accuracy with a miss rate of 4.8%. Tab. 7 reflects the detailed results of our proposed fused model along with input and output. We can observe that the real-time input parameters of the patients were given to the decision support system, where the three classifiers individually predicted the diabetes diagnosis and the fuzzy inference system then formulated the final result.
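As a worked check of the fused-model figures, the reported test counts can be turned into the performance measures directly. The sketch assumes that the 2,903 negative and 1,380 positive records are the correctly classified ones (this reading reproduces the reported 95.2% accuracy) and that miss rate means 1 minus accuracy, as the reported pairs of values suggest; the function name `summarize` is hypothetical.

```python
def summarize(expected_neg, expected_pos, correct_neg, correct_pos):
    """Compute basic performance measures from expected and correctly classified counts."""
    total = expected_neg + expected_pos
    accuracy = (correct_neg + correct_pos) / total
    specificity = correct_neg / expected_neg
    return {
        "accuracy": accuracy,
        "miss_rate": 1 - accuracy,
        "sensitivity": correct_pos / expected_pos,
        "specificity": specificity,
        "false_positive_ratio": 1 - specificity,   # Eq. (28)
    }

# Fused (fuzzy) system on the 4,500 test records: 3,000 negative, 1,500 positive.
fused = summarize(expected_neg=3000, expected_pos=1500, correct_neg=2903, correct_pos=1380)
for name, value in fused.items():
    print(f"{name}: {value * 100:.1f}%")
# accuracy: 95.2%
# miss_rate: 4.8%
# sensitivity: 92.0%
# specificity: 96.8%
# false_positive_ratio: 3.2%
```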

Conclusion
Early diagnosis of diabetes using machine-learning techniques is a challenging task. In this paper, we proposed a novel cloud-based decision-support system for diabetes prediction using a fused machine-learning technique. Our proposed system integrates the classification accuracy of three supervised machine-learning techniques (ANN, NB, and DT) with a fuzzy inference system to generate accurate predictions. Our system consists of two layers: training and testing. The training layer begins with data pre-processing activities (data cleaning and normalization) and is followed by data splitting for classification. In our study, we divided the dataset for training and testing at a ratio of 70:30 to optimize classification techniques and yield more accurate results in the validation data. After pre-processing, we executed the classification process, which involved training of the three classification techniques (ANN, NB, and DT) followed by validation on our selected dataset. We optimized these techniques until maximum accuracy was achieved. Finally, using a fuzzy system, we synthesized the three prediction results from the three classification techniques to generate the final prediction output. In our study, our proposed system achieved an accuracy rate of 95.2%, outperforming previously applied machine-learning techniques for diabetes diagnosis.