Intelligent Chronic Kidney Disease Diagnosis System using Cloud Centric Optimal Feature Subset Selection with a Novel Data Classification Model

The Internet of Things (IoT) and cloud computing offer diverse applications in the medical sector through the integration of sensing and therapeutic gadgets. As medical expenses rise gradually and new diseases emerge globally, it becomes essential to transform healthcare facilities from a hospital-centric to a patient-centric platform. To provide effective remote healthcare services to patients, this paper introduces an optimal IoT and cloud based decision support system for Chronic Kidney Disease (CKD) diagnosis. The proposed method makes use of simulated annealing (SA) based feature selection (FS) with a Root Mean Square Propagation (RMSProp) optimizer based Logistic Regression (LR) model, called SA-RMSPO-LR, to classify the existence of CKD from medical data. The proposed model involves four subprocesses: data collection, preprocessing, FS, and classification. The inclusion of SA for FS helps to improve the classifier results of the SA-RMSPO-LR model. The effectiveness of the SA-RMSPO-LR model has been validated on a benchmark CKD dataset. The experimental results indicate that the proposed SA-RMSPO-LR model leads to effective CKD classification with a maximum sensitivity of 98.41%, specificity of 97.99%, accuracy of 98.25%, F-score of 98.60% and kappa value of 96.26%, showing that it detects and classifies CKD more proficiently than the compared methods.


Introduction
The Internet of Things (IoT) is a significant paradigm that concentrates on the modeling and interconnection of Internet-linked things through computer systems. IoT is mainly applied across diverse applications using a large number of low-power devices such as wrist bands, fridges, umbrellas, and so forth, instead of a small number of high-power devices like computers, tablets, smartphones, etc. [1]. IoT and Cloud Computing (CC) are advantageous when combined to develop novel techniques [2]. An observation model has been designed on the integration of IoT and CC to observe a patient's condition and effectively collect details even in remote areas, which can be helpful for medical physicians. In several cases, the IoT method is operated with the help of the CC environment to enhance efficiency with respect to productive resource application, data storage, computation and processing abilities. Besides, CC earns the merits of IoT by extending its value in tackling real-world issues and dynamically providing new facilities.
The combination of IoT and CC is a reliable platform; it serves quite well when compared with conventional CC based performance [3]. Major application areas such as healthcare, armed forces, consumer appliances and banking exploit the combination of IoT and CC. Among these domains, medicine and healthcare pose challenging tasks that drive the development of various techniques in medical tools and screening gadgets [4]. Usually, medical costs are expensive and several diseases exist around the world, so it is significant to transform the healthcare service from a hospital-centric to a patient-centric platform. Earlier studies have trained Neural Networks (NN) with the Cuckoo Search (CS) algorithm, which is involved in local search based learning techniques. The CS approach is applicable for selecting the input weight vector of the NN in order to provide appropriate training on the data. The classification process in a deployed model concentrates on providing optimal performance. A modified NN-CS approach (NN-MCS) [14] has been established for resolving the local-optimum issue of the NN-CS technique. The primary weights of the neuron links govern the function of the NN, and the projected model applies the MCS scheme to reduce the Root Mean Square Error (RMSE) measure used at the time of NN training. The attained results show that the NN-MCS model reached an optimized function when compared with the NN-CS approach.
Chen, Z. et al. [15] introduced two fuzzy classification models, the fuzzy rule-building expert system (FuRES) and Fuzzy Optimal Associative Memory (FOAM), which were applied to find the presence of CKD. FuRES produces a classification tree that includes a minimal NN and develops classification rules by computing the weight vector with lower fuzzy entropy. The two fuzzy classifiers were utilized to diagnose CKD patients. Additionally, FuRES was compared with FOAM on training and forecasting tasks at the same intensity of noise. FuRES and FOAM both accomplished an optimal function in the CKD analysis; at the same time, FuRES performed better than FOAM.
Arasu, S. D., & Thirumalaiselvi, R. proposed a new technique termed Weighted Average Ensemble Learning Imputation (WAELI) [16]. The missing values present in a dataset minimize the accuracy level of CKD diagnosis. Since the traditional approaches are applied with a data preprocessing process, the data cleaning task is essential to fill the missing values and eliminate ineffective scores. A revaluation strategy is projected for every CKD phase, in which missing values are estimated and placed in the corresponding locations. Even though the conventional models are productive, a professional disease diagnosis system is still required on healthcare datasets to assure the CKD values.
Here, the FS task is treated as a vital portion of data classification, applied for identifying a small set of rules from a training dataset with fixed objectives. Various models, such as Artificial Intelligence (AI) and bio-inspired mechanisms, are utilized in the FS process. Tan, K. C., et al. projected a wrapper technique that hybridizes a GA with a Support Vector Machine (SVM), known as the GA-SVM approach, to select the feature subset in an optimized manner [17]. The minimization of repeated features in the presented system enhances the classification task, which has been verified on five different disease datasets. In addition, Chetty, N., et al. deployed a wrapper scheme for identifying CKD in three phases: a technique produced from DM, a wrapper subset attribute evaluator, and a best-first search model applied for selecting attributes and the classifier [18]. The experimental results stated that the accuracy increased on the reduced dataset compared to the actual dataset. P. Arulanthu and E. Perumal suggested classifiers for effective CKD classification and prediction with reduced attribute information [19].
Wibawa, M. S., et al. introduced an approach to improve the quality of CKD diagnosis [20]. This model contributes three steps: FS, ensemble learning, and classification. The combination of Correlation-based FS (CFS) and k-nearest neighbour (kNN) classification results in maximum classification accuracy. Polat, H., et al. applied an alternate CKD identification model using filter and wrapper methodologies [21]. The results attained from this method reveal that a reduced number of features alone could not ensure the efficiency of the classification task. Pramila Arulanthu and Eswaran Perumal proposed an efficient online CKD diagnosis method using an IoT and cloud support system for easy identification [22].
An intelligent prediction and classification model for the healthcare sector using Density based Feature Selection (DFS) with Ant Colony based Optimization (D-ACO) for CKD has been developed [11]. The presented model makes use of DFS to select features and applies ACO for data classification.
An IoT enabled cloud based disease diagnosis model for CKD has been presented in [23]. A Deep Neural Network (DNN) classifier is used to predict CKD and its severity level. Besides, the PSO algorithm is applied to increase the classification results by selecting the required features.
A set of two ensemble models, Bagging and Random Subspace, over three base learners (k-Nearest Neighbours, Naïve Bayes and Decision Tree) has been presented in [24] to improve the classifier outcome. The presented model involves data preprocessing for handling the missing values and data scaling for the normalization of the range of the independent variables.

Proposed CKD Diagnosis Model
The entire process of the proposed SA-RMSPO-LR model is shown in Figure 1. As shown in the figure, the data collection process takes place in diverse ways. Next, data preprocessing takes place, and the preprocessed data is provided to the SA-FS model. The SA-FS model chooses the optimal subset of features, and then the classification process is carried out by the RMSPO-LR model. These processes are discussed in the following subsections.
Figure 1. Overall process of the proposed CKD diagnosis system.

Data Collection
In the first stage, the data collection process takes place, where the data is gathered from IoT gadgets linked to patients, a standard benchmark medical dataset, patient health data and the hospital management server. The medical data collected by IoT tools linked to a person are the most essential. In general, a sensor linked to a human being assembles the specific medical data at regular time intervals. The deployed SA-RMSPO-LR model exploits the 4G network for transmitting the monitored data to the CDSS. Finally, a well-known benchmark real-time CKD dataset has been applied for disease diagnosis [26]. The medicinal dataset is composed of patient data gathered from hospitals and saved in the hospital management server. A data collection tool then collects the required data and transmits it to the CDSS.

Pre-processing
In this phase, the CKD data is transformed into a suitable form in three steps. In the initial stage, a format conversion task is carried out in which the raw data is converted into .csv file format. Next, the categorical values in the dataset, such as Yes/No or Absent/Present, are converted into arithmetic scores of '0' and '1'. Finally, the missing values in the data are filled using the median of the corresponding attribute.
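The three preprocessing steps above can be sketched as follows. This is a minimal illustration over a list-of-dicts table (e.g. from csv.DictReader); the column names used in the test are hypothetical, not taken from the dataset description:

```python
from statistics import median

# Categorical values such as yes/no and present/absent are mapped to 1/0.
BINARY_MAP = {"yes": 1, "no": 0, "present": 1, "absent": 0}

def preprocess(rows):
    """Map categorical values to 0/1, parse numerics, and fill missing
    values (empty strings) with the column median."""
    table = [dict(r) for r in rows]
    columns = table[0].keys()
    for col in columns:
        # Step 2: categorical -> 0/1; other strings parsed as floats.
        for r in table:
            v = r[col]
            if isinstance(v, str):
                v = v.strip().lower()
                if not v:
                    r[col] = None            # missing entry
                elif v in BINARY_MAP:
                    r[col] = BINARY_MAP[v]
                else:
                    r[col] = float(v)
        # Step 3: median imputation for the missing entries.
        known = [r[col] for r in table if r[col] is not None]
        med = median(known)
        for r in table:
            if r[col] is None:
                r[col] = med
    return table
```

Reading the .csv file itself (step 1) is omitted; any CSV reader that yields one dict per record can feed this function.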

Simulated Annealing (SA):
SA is defined as one of the random global optimization approaches; it makes no assumption about the differentiability of the objective function and can cope with multimodality. It mimics the annealing operation of a physical system studied in statistical mechanics. Figure 2 shows the general process of the SA method. The physical analogue is that a substance is melted and cooled at a minimal speed to ensure that thermal equilibrium is reached at every temperature; when the temperature approaches 0, the substance attains the crystalline lattice of lowest energy, which is named the ground state. When the maximum temperature is lower than the melting point, it solidifies into a sub-optimal configuration that does not have the lowest energy. In addition, the evolution of a substance at a fixed temperature is simulated using Monte Carlo approaches. Given a current state i of the substance with energy Ei, the subsequent state j with energy Ej is produced by a tiny random perturbation of state i. When Ej is less than or equal to Ei, state j is accepted as the current state. Otherwise, state j is accepted with the probability provided below:

P = exp(-(Ej - Ei) / (kB * T))

where T denotes the present temperature and kB implies the Boltzmann constant. The acceptance rule above is named the Metropolis criterion, and a model operated with it is called the Metropolis technique. Since the temperature decreases gradually, the substance attains thermal equilibrium at every temperature. Thermal equilibrium is characterized by the Boltzmann distribution, which gives the probability of the substance being in state i with energy Ei at temperature T:

P(Ei) = exp(-Ei / (kB * T)) / Σj exp(-Ej / (kB * T))

where the denominator sums over the energy of every feasible state j at temperature T.
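The Metropolis acceptance rule can be sketched as a small function. This is a minimal illustration with energies in arbitrary units and the Boltzmann constant kB absorbed into the temperature scale, as is common in SA implementations:

```python
import math
import random

def metropolis_accept(e_current, e_candidate, temperature):
    """Accept or reject a candidate state per the Metropolis criterion."""
    if e_candidate <= e_current:
        return True                      # downhill moves are always accepted
    # Uphill moves are accepted with the Boltzmann probability
    # exp(-(Ej - Ei) / T), with kB folded into T.
    return random.random() < math.exp(-(e_candidate - e_current) / temperature)
```

At high temperatures nearly every move is accepted; as the temperature falls, uphill moves become increasingly unlikely, which is what lets the search settle into a low-energy configuration.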

Optimal Feature Subset Selection Problem:
The FS issue for CKD is assumed to be a combinatorial optimization problem. The count of viable feature combinations from a feature set of 24 features is computed as 2^24 - 1. This means it is not feasible to evaluate every feature combination exhaustively. Hence, the optimal feature subset selection problem can be described in the following way: Definition 1 (Optimal feature subset selection problem). Given a feature set F and a cost function C, identify the feature subset(s) F' ⊆ F for which the measure of the cost function is minimized.

Feature Selection Approach based on SA:
Here, an FS technique for CKD diagnosis has been developed. The presented FS method depends upon the SA method, which has been widely applied to combinatorial optimization tasks. The SA model is considered a major searching technique. While a naïve local search algorithm applies a greedy approach to identify the best solution, SA is a probabilistic scheme that can leave a local optimum in order to identify better solutions. For this reason, the SA model tends to obtain better solutions than a naïve local search algorithm.

i) Solutions
A solution applied for the FS method is shown as a binary vector with a length of 24, as shown in Eqn. (3). The value 1 is allocated to a selected feature, whereas 0 is assigned to an unselected feature.
Various types of search modules employed for handling the optimization issue require an initial solution. A possible solution is selected in a random manner and applied as the initial solution. Meanwhile, the neighboring solutions are the binary vectors that differ from the current solution in a single bit. For instance, for the set of 24 features, the neighborhood of a solution is the set of 24 binary vectors, each obtained by flipping a single bit.

ii) Cost function
A major aspect of the optimization process of heuristic approaches like SA is the cost function, which is used to evaluate individual solutions. The performance of such a technique is highly dependent on how the cost function is defined. The cost for a given record from the training data is set to 0 when the class computed by the classification model equals the actual class of the record; otherwise, it is set to 1. The cost for a given solution is then evaluated over the whole training data set with the application of Eqn. (5).
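The record-level cost described above can be sketched as follows. Since the concrete classifier and the exact form of Eqn. (5) are not reproduced in this section, `classify` is a hypothetical stand-in and the solution cost is taken as the misclassification rate over the training set:

```python
def solution_cost(solution, records, labels, classify):
    """Cost of a feature subset: fraction of misclassified training records.

    `classify(selected, record)` is a stand-in for the trained classifier;
    each record contributes 0 when the predicted class matches the actual
    class and 1 otherwise, summed over the training set (cf. Eqn. (5)).
    """
    selected = [i for i, bit in enumerate(solution) if bit == 1]
    errors = sum(
        0 if classify(selected, rec) == lab else 1
        for rec, lab in zip(records, labels)
    )
    return errors / len(records)
```

Lower cost therefore means better classification on the training data, which is the quantity SA minimizes when comparing feature subsets.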

iii) Some other parameters
The SA model applies a cooling procedure for discovering the best solutions while escaping local optima during the exploration of the solution space. Basically, the cooling method defines the way an optimal solution is searched for. Attributes such as the initial temperature, a temperature reduction function, and stopping criteria have to be specified.
The initial temperature should be high enough for transitions to be approved. A value of 100,000 was allocated as the initial temperature T0, which is higher than the size of the training data set. The temperature reduction function is a simple iteration in which T is multiplied by a constant α, where the value of α is set to 0.9. Finally, the termination criterion is that the method terminates when the value of T falls below 0.001.

iv) Procedure of algorithm
Generally, an initial solution is selected in a random manner and considered as the current optimal solution. Consecutively, the cost of the initial solution is estimated using the cost function. While the temperature does not meet the termination criterion, a neighboring solution of the current optimal solution is chosen and its cost is evaluated. When the cost of the selected neighboring solution is less than or equal to that of the current optimal solution, the current optimal solution is replaced by the new neighboring solution. When the cost of the neighboring solution is higher than that of the current optimal solution, an arbitrary value is selected in the range (0, 1). At this point, the replacement of the best solution is activated when the random value is smaller than the Metropolis acceptance probability. Once the temperature is decreased by Eq. (6), the process repeats until the temperature meets the termination criterion.
The pseudo-code for the SA-based FS technique is given below:
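The procedure can be expressed as a minimal Python sketch, assuming a caller-supplied `cost` function over binary feature vectors. The default parameter values follow the text (T0 = 100,000, α = 0.9, termination below 0.001); smaller values are used in the test for speed:

```python
import math
import random

def sa_feature_selection(cost, n_features=24, t0=100_000.0, alpha=0.9, t_min=0.001):
    """Simulated-annealing feature selection.

    `cost(solution)` evaluates a binary feature vector (lower is better).
    Returns the best solution found and its cost.
    """
    best = [random.randint(0, 1) for _ in range(n_features)]  # random start
    best_cost = cost(best)
    t = t0
    while t >= t_min:
        # Pick a random single-bit-flip neighbour of the current solution.
        nb = best.copy()
        i = random.randrange(n_features)
        nb[i] = 1 - nb[i]
        nb_cost = cost(nb)
        # Accept better neighbours outright; worse ones with probability
        # exp(-(C_nb - C_best) / T) (Metropolis criterion).
        if nb_cost <= best_cost or random.random() < math.exp(-(nb_cost - best_cost) / t):
            best, best_cost = nb, nb_cost
        t *= alpha                       # cooling schedule, Eq. (6)
    return best, best_cost
```

With the stated settings the loop runs until T drops from 100,000 below 0.001, i.e. roughly 175 cooling steps; in practice the cost function (classifier training and evaluation) dominates the running time.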

LR-based classification model
The classification task develops a representation that maps data items onto the provided classes based on the available data. Extracting the required data substance from a method is applied for detecting the nature of the data. In most cases, the LR model relies on a binary dependent variable to perform binary classification. The key intention of this technique is to predict the presence of CKD, which is naturally processed as a binary classification problem. In addition, LR methods are often applied for identifying diseases, in DM, and for the classification of healthcare data. LR predicts the existence or absence of CKD. The LR model builds on the linear regression approach: the probability of the positive class is modeled as p = 1 / (1 + exp(-(β0 + β1x1 + … + βnxn))). Here, the classification is implemented over the negative and positive classes, y denotes the presence of CKD, and x1, …, xn are the independent variables. Each independent variable is allocated a coefficient value βi that specifies its weight. As identified by the LR model, the database values are combined with the weight values; different weights specify different links between y and x. The parameters of LR can be modified to reach an optimized classification outcome, and here the RMSProp model is used to choose the parameters of LR.
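The LR prediction step can be sketched as follows, with β0 and βi as in the text; the fitted coefficient values themselves are not reproduced here, so the inputs in the test are illustrative:

```python
import math

def predict_ckd(x, beta0, beta):
    """Logistic-regression probability of CKD for input vector x.

    Returns sigmoid(beta0 + sum_i beta_i * x_i); a probability above 0.5
    is read as the positive (CKD present) class.
    """
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Training consists of adjusting beta0 and beta to minimize the classification loss over the training data, which is where the RMSProp optimizer of the next subsection comes in.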

RMSProp model
The RMSProp model develops a weighted average of gradients, like Gradient Descent (GD) with momentum, but with different update parameters. Consider, for instance, optimizing a cost function whose contours form an ellipse, where the red dot marks the position of the local optimum. GD begins from point 'A' and the first iteration of GD ends at point 'B', on the other side of the ellipse, as illustrated in Figure 3. The following step of GD ends at point 'C'. Over the iterations, GD moves towards the local optimum while oscillating up and down. When a higher learning rate is applied, the vertical oscillation has a high magnitude; this vertical oscillation slows GD down and prevents the employment of a higher learning rate. The bias b is responsible for the vertical oscillations, while the weight W drives the motion in the horizontal direction. Damping the updates of the bias decreases the vertical oscillation even while the weights take larger steps. In backward propagation (BP), the dW and db gradients are employed to update W and b.
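The RMSProp update itself is not spelled out above, so the following is a standard-form sketch: a running average of squared gradients is kept per parameter, and each step is scaled by its root mean square. The hyperparameter names (lr, rho, eps) are conventional, not taken from the paper:

```python
def rmsprop_step(params, grads, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSProp update over a list of scalar parameters.

    `cache` holds the exponentially weighted average of squared gradients;
    dividing each step by its square root damps directions (like the bias b)
    that see large, oscillating gradients.
    """
    new_params, new_cache = [], []
    for p, g, s in zip(params, grads, cache):
        s = rho * s + (1 - rho) * g * g      # running average of g^2
        p = p - lr * g / (s ** 0.5 + eps)    # normalised gradient step
        new_params.append(p)
        new_cache.append(s)
    return new_params, new_cache
```

Applied to the W/b picture above, b accumulates a large squared-gradient average from its oscillations and so takes smaller steps, while W keeps moving steadily toward the optimum.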

Performance validation
This section discusses the effectiveness of the proposed SA-RMSPO-LR model on the benchmark CKD dataset. A detailed comparative examination with the existing methods takes place to verify the superiority of the presented model. The proposed model is simulated using the MATLAB tool. The parameter settings of SA are: maximum number of iterations = 20, maximum number of sub-iterations = 5, initial temperature T0 = 10, and temperature reduction rate α = 0.99.

Dataset Description
For the classification task of the SA-RMSPO-LR approach, a standard CKD dataset has been applied [25]. The dataset description and the available features are depicted in Table 1. The CKD dataset comprises a total of 400 instances with 24 features. Among the 400 instances, 250 are labeled as CKD present and the remaining 150 are labeled as CKD absent. The sample frequency distribution and class distribution of the 24 features are illustrated in Figure 4. On the other side, the features which affect CKD are given in Figure 5. For experimentation, 10-fold cross validation is employed to assess the efficiency of the projected technique.

Table 2 shows the outcome of the FS models on the applied CKD dataset, and Figure 6 shows the best cost analysis of the presented SA-FS model. The table values indicate that the CFS model exhibited inferior FS results with a best cost of 0.79. The Principal Component Analysis (PCA) model offered a lower best cost of 0.04570 than CFS, but not lower than the other models. The GA-FS and PSO-FS models outperformed the earlier models and attained near-identical best costs of 0.03440 and 0.03656 respectively. However, the proposed SA-FS model chose a set of 13 features with a best cost of 0.01053. This minimum best cost offered by the SA-FS model clearly confirms its effective performance over the existing models.

Table 3 shows the results attained by the SA-RMSPO-LR model and the existing models with respect to different measures. Figure 12 offers a comparative investigation of the results provided by the SA-RMSPO-LR model in terms of sensitivity and specificity. The figure indicates that OlexGA shows the worst performance, attaining the least sensitivity and specificity values of 80% and 66.66% respectively.
In addition, LR reaches slightly higher sensitivity and specificity values of 83% and 82% respectively. The XGBoost model reached identical sensitivity and specificity values of 83%. Moreover, the PSO algorithm offered sensitivity and specificity values of 88% and 80% respectively.

Result analysis
Furthermore, the ACO model leads to effective results with sensitivity and specificity values of 88.88% and 84.61%. The DT model results in a slightly better performance with sensitivity and specificity values of 90.38% and 89.28%. In the same way, the MLP model reaches acceptable classifier results with sensitivity and specificity values of 92.30% and 92.86%. Simultaneously, the FNC model shows moderate results with sensitivity and specificity values of 95.68% and 95.86%. The D-ACO model offered sensitivity and specificity values of 96% and 93.33% respectively. In line with this, the RMSPO-LR model led to a competitive classifier outcome with sensitivity and specificity values of 98.37% and 94.80%. Overall, the proposed SA-RMSPO-LR model resulted in the maximum sensitivity and specificity values of 98.41% and 97.99% respectively.

Figure 13 provides a comparative analysis of the results attained by the SA-RMSPO-LR method with respect to accuracy and F-score. The figure shows that OlexGA performs worst, reaching the lowest accuracy and F-score values of 75% and 80% correspondingly. LR accomplishes slightly higher accuracy and F-score values of 82% and 82%. The XGBoost approach reached accuracy and F-score values of 83% and 80% respectively. Furthermore, the PSO technique outperformed the previous models by attaining an accuracy and F-score of 85% and 88% respectively. Moreover, the ACO model produced accuracy and F-score measures of 87.50% and 90.56% respectively. Similarly, the DT scheme provides a slightly more reasonable performance with accuracy and F-score values of 90% and 92.15% correspondingly.
Likewise, the MLP approach accomplishes a manageable classifier outcome with accuracy and F-score values of 92.50% and 94.11% respectively. The D-ACO approach provides accuracy and F-score values of 95% and 96.63%, while the FNC model shows comparable results with accuracy and F-score values of 95.5% and 96.63%. Likewise, the RMSPO-LR model resulted in a competitive classification outcome with accuracy and F-score values of 97% and 97.58% respectively. Finally, the projected SA-RMSPO-LR model provides the optimal accuracy and F-score values of 98.25% and 98.60% correspondingly.

Figure 14. Performance of the Kappa Analysis.

Figure 14 shows a relative examination of the results achieved by the SA-RMSPO-LR technique by means of the Kappa measure. The figure implies that OlexGA performs poorly, reaching the lowest Kappa value of 46.66%. The PSO model gives a Kappa value of 68%, and the ACO model provides an improved Kappa value of 72.06%. Additionally, LR attains a slightly better Kappa value of 74.60%, while the XGBoost model accomplishes a similar Kappa value of 75.42%. In line with this, the DT model reached a slightly more appreciable result with a Kappa value of 78.37%. Similarly, the MLP model attains acceptable classifier results with a Kappa value of 83.78%. The D-ACO approach offered a Kappa value of 89.33%, and the FNC model showed a Kappa value of 90.87%. The RMSPO-LR model provides a competitive classification outcome with a Kappa value of 93.63%. Consequently, the proposed SA-RMSPO-LR system provided the highest Kappa value of 96.26%.
Looking at the above tables and figures, it is ensured that the SA-RMSPO-LR model is an appropriate tool for CKD diagnosis and can be implemented in a real-time environment.
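For completeness, the metrics reported above can be derived from a binary confusion matrix as in the following sketch (standard definitions, not code from the paper; the counts in the test are illustrative):

```python
def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, F-score and Cohen's kappa
    from confusion-matrix counts (positive class = CKD present)."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)
    return sensitivity, specificity, accuracy, f_score, kappa
```

Kappa discounts the agreement expected by chance, which is why it sits below accuracy on an imbalanced dataset such as this one (250 positive vs. 150 negative instances).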

Conclusion
This paper has presented an optimal IoT and cloud based decision support system for CKD diagnosis using the SA-RMSPO-LR model. Initially, the data collection process takes place, which collects the patient's data through medical gadgets. Then, preprocessing is carried out to transform the collected data for further processing. Afterwards, the SA-FS process is executed to choose a subset of features, which are provided to the RMSPO-LR based classifier. The proposed classifier effectively classified the existence of CKD. A detailed experimental analysis took place and the results were validated on the benchmark CKD dataset. The simulation results were examined under a varying number of epochs. The experimental results indicated that the proposed SA-RMSPO-LR model leads to effective CKD classification with a maximum sensitivity of 98.41%, specificity of 97.99%, accuracy of 98.25%, F-score of 98.60% and kappa value of 96.26%. The attained results clearly portray the enhanced classification performance over the compared methods. As part of future work, the performance of the SA-RMSPO-LR CKD diagnosis model can be improved by the inclusion of clustering techniques.