Performance based Evaluation of Algorithms on Chronic Kidney Disease using Hybrid Ensemble Model in Machine Learning

In medical data science, data classification, pattern generation, data analysis and improving classification accuracy are important current issues. The main objective of this research is to enhance classification accuracy by combining each of four feature selection techniques separately with a Neural Network classifier. The neural network is analyzed for chronic kidney disease with the help of feature reduction and relevance techniques. In the experiment, we used the neural network as an ensemble model with the following feature techniques: Pearson Correlation, Chi-Square, Extra Tree and Lasso regularization. We trained the model on 300 (75%) instances of chronic kidney disease attributes and tested it on 100 (25%) instances. We tested the dataset over different numbers of epochs and calculated the accuracy and error rate. In summary, we used 400 instances with 26 attributes of Chronic Kidney Disease, and the highest accuracy (99.98%), with a low error rate over several epochs, was obtained by the Neural Network ensemble with the Lasso model.

Chronic kidney failure often goes unnoticed until kidney function deteriorates. If kidney function is assessed only once it is severely impaired, kidney transplantation may be the only remaining way to save the patient's life.
Some symptoms arise when the kidney is unhealthy. On the basis of our previous analyses [2-6], we calculated high accuracy using an ensemble method and majority voting. Machine learning algorithms provide an environment that makes the study of a dataset much easier for the analyst. Machine learning offers different algorithms for different patterns in the data. Some algorithms describe the relationships between attributes and the types of attributes present in the dataset, while others identify the intensity of their distribution.
Nithya A et al. [2020] discussed normal and abnormal kidney disease using a neural network. The authors used ultrasound images, a neural network, multi-kernel k-means clustering, GLCM features, segmentation, classification and a bilateral filter for better prediction. They used linear and quadratic based segmentation and obtained better accuracy (99.61%) compared with other machine learning algorithms [7].
Verma AK et al. [2020] analyzed skin disease with six different machine learning algorithms. The authors used bagging, AdaBoost and gradient boosting meta classifiers to predict the class variable. They obtained an accuracy of 99.68% after applying a feature selection method and training the gradient boosting algorithm [8].
Harimoorthy Yadav DC and Pal S [2020] discussed the lack of cardiovascular centres in rural areas. In this paper the authors used a heart data sample from the UCI repository. The authors used cluster-based DT learning at various levels for class set combination. They calculated an accuracy of 88.90% with a cluster-based random forest machine learning algorithm [12].
Chaurasia V et al. [2020] identified chronic lower back pain as pain of the muscles, nerves and bones. They used Genetic Algorithm (GA)-based feature selection to improve classification accuracy and applied seven classification algorithms. The authors found that k-Nearest Neighbors gave better accuracy (85.2%) compared with the other machine learning algorithms [13].
Alloghani M et al. [2020] analyzed the high risk of cardiovascular disease and complications in kidney problems. The authors used decision tree, boosted decision tree, CN2 rule, logistic regression (Ridge and Lasso), neural network and support vector machine, and found that the support vector machine gave the highest accuracy (91.7%) [14].
Shon HS et al. [2020] discussed kidney cancer prognosis for 1157 patients and calculated classification accuracy by machine learning [15].
The main goal of this research is to enhance classification accuracy by combining each of four feature selection techniques separately with a Neural Network classifier. The neural network is analyzed for chronic kidney disease with the help of feature reduction and relevance techniques.

Material and Methods
In this section, we describe the experiments with Neural Networks combined with Extra Tree feature importance, Lasso regularization and Pearson correlation feature extraction methods. In this study, we examined epochs, error rate, accuracy and their improvement on a medical dataset. The medical dataset, with its features and correlated features, was obtained from the UCI repository. In this experiment, we used the Python and R languages together with the Weka tool.

Data Description
We analyzed 400 instances with 26 attributes of Chronic Kidney Disease, with the true and false classes encoded as 0 and 1. The attributes of the chronic kidney disease dataset are: age: age (numeric); bp: blood pressure; sg: specific gravity; al: albumin; su: sugar; rbc: red blood cells; pc: pus cell; pcc: pus cell clumps; ba: bacteria; bgr: blood glucose random (numeric); bu: blood urea; sc: serum creatinine (numeric); sod: sodium; pot: potassium; hemo: hemoglobin (numeric); pcv: packed cell volume; wc: white blood cell count; rc: red blood cell count; htn: hypertension; dm: diabetes mellitus; cad: coronary artery disease; appet: appetite; pe: pedal edema; ane: anemia; classification: class. We measured the density of each attribute with respect to the target variable classification.

Epoch
In this paper, an epoch is one complete pass through the chronic kidney disease training dataset [16]. In this analysis we used between 100 and 600 epochs to check the error rate and accuracy at various levels.

Error Rate
In this research, the error rate is the inaccuracy of the predicted output values [17]. When the target values are categorical, the error is expressed in the form of an error rate.

Accuracy
In this experiment, accuracy measures how well the correct class is predicted. It supports decisions in the diagnosis of chronic kidney disease [18]. It is calculated as per the equation:

$$\text{Accuracy} = \frac{\text{Correctly Predicted Class}}{\text{Total Testing Class}} \times 100 \tag{1}$$

In this research, we also studied [19-34] for accuracy and error rate on various diseases to find how instances and features relate closely to each other.
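As a minimal sketch of Equation (1), the snippet below computes accuracy as a percentage and the error rate as its complement. The function names and the label values are hypothetical illustrations, and treating the error rate as the fraction of misclassified test instances is an assumption, since the text does not give an explicit formula for it.

```python
def accuracy_percent(y_true, y_pred):
    """Equation (1): correctly predicted test instances / total test instances * 100."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of misclassified test instances (assumed definition)."""
    return 1.0 - accuracy_percent(y_true, y_pred) / 100.0


# Hypothetical labels for illustration only: 1 = CKD, 0 = not CKD.
y_true = [1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]

print(accuracy_percent(y_true, y_pred))  # 6 of 8 predictions are correct
print(error_rate(y_true, y_pred))
```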

Proposed Method
In this research paper, we used a Neural Network as the classifier of the input variables. We used four feature-based algorithms for better prediction: Extra Tree, Pearson Correlation, Lasso model and Chi-Square. We trained the model on 300 (75%) instances of chronic kidney disease attributes and tested it on 100 (25%) instances. We built various Neural Network prediction models, which gave higher and lower classification accuracies with different error rates.
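The 75%/25% partition described above can be sketched as follows. This is only an illustration: the text does not state whether the split was random, stratified or sequential, so the shuffled split, the function name `train_test_split_indices` and the seed are assumptions.

```python
import random


def train_test_split_indices(n_instances, test_fraction=0.25, seed=0):
    """Shuffle instance indices and cut them into training and testing parts."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_instances * test_fraction)
    return indices[n_test:], indices[:n_test]


# 400 instances, as in the chronic kidney disease dataset used here.
train_idx, test_idx = train_test_split_indices(400)
print(len(train_idx), len(test_idx))  # 300 100
```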
The low and high relevance of all the chronic kidney disease features was calculated as follows:

Fig. 3. Representation of Pearson Correlation with target variable classification
We combined the neural network separately with Extra Tree, Pearson Correlation, Chi-Square and Lasso regularization, and found that performance improved in the experimental results. Classification accuracy increased steadily and the error rate decreased steadily as the number of epochs increased.
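To illustrate how an error rate can be tracked over epochs, the sketch below trains a single sigmoid neuron with full-batch gradient descent and records the error rate before training and after every epoch. It is a deliberately small stand-in, not the network used in this study: the one-feature toy data, the learning rate and all function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def error_rate(w, b, xs, ys):
    """Fraction of instances whose thresholded prediction differs from the label."""
    wrong = sum(1 for x, y in zip(xs, ys)
                if (1 if sigmoid(w * x + b) >= 0.5 else 0) != y)
    return wrong / len(xs)


def train_and_track(xs, ys, epochs=100, lr=0.1):
    """One epoch = one full pass over the training data (full-batch update)."""
    w, b = 0.0, 0.0
    history = [error_rate(w, b, xs, ys)]  # error rate before any training
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(error_rate(w, b, xs, ys))
    return history


# Toy, linearly separable data (hypothetical values, one feature).
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
history = train_and_track(xs, ys, epochs=100)
print(history[0], history[-1])
```

On real data the curve is usually less abrupt than on this toy example, which is why the error rate is worth recording at several epoch counts rather than only at the end.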

Results
In this section, the neural network, Extra Tree, Lasso model, Pearson correlation and Chi-Square are applied as function-based classification algorithms. The neural network acts as an ensemble model with the Lasso model. In each experiment the instances of kidney disease are divided into two parts, training and testing, with 75% and 25% of all instances respectively. The results were computed only at the class level, so we determined the number of parts into which the input variables had to be divided.
Table 2 lists the selected highly correlated features (cor_target > 0.5); the Pearson correlation measures the relationship between two variables on a scale from -1 to +1. A positive correlation means that both variables increase or decrease in the same direction. Conversely, a negative correlation means that the variables move inversely. A value of zero means no correlation between the variables. Table 3 gives the correlation matrix, a square matrix with the same variables in the rows and columns. The values of 1.00 run along the diagonal from top left to bottom right, and the matrix is symmetric about this diagonal, as shown in Figure 3.
Figure 3 shows the correlations of the selected features: dark green values represent high correlation and dark red values represent weak correlation. In the first row, attribute ID is highly correlated with itself and weakly correlated with attribute Classification. In the last row, attribute Classification is highly correlated with itself and weakly correlated with attribute ID.
Table 4 lists the calculated feature importances in the dataset. These values are plotted in Figure 4: attribute Classification has the value 0.40625492, and the other attributes are assigned much smaller values. Figure 4 does not show the small values of all attributes, but Table 4 gives these small decimal values.
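The correlation-based filter described above (keeping features with cor_target > 0.5) can be sketched in plain Python as follows. The column values are made-up toy data rather than values from the dataset, the helper names are hypothetical, and comparing the absolute value of the correlation against the threshold is an assumption, so that strongly negative correlations are also kept.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treat as none
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(columns, target, threshold=0.5):
    """Keep features whose absolute correlation with the target exceeds the threshold."""
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    return {name: r for name, r in scores.items() if abs(r) > threshold}


# Toy binary-coded columns (hypothetical values, for illustration only).
target = [1, 1, 1, 0, 0, 0]
columns = {
    "htn":   [1, 1, 1, 0, 0, 0],  # moves with the target
    "appet": [0, 0, 0, 1, 1, 1],  # moves against the target
    "ba":    [1, 0, 1, 0, 1, 0],  # only weakly related to the target
}
selected = select_by_correlation(columns, target)
print(selected)
```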
The Extra Tree feature selection method uses the whole original sample (instead of a bootstrap sample) to reduce bias, and randomly selects the split point at each node to reduce variance. This feature selection technique, applied to the kidney disease data, gave the highest values for the following selected attributes: Classification, dm, htn, rc, rbc, id, hemo, sod, al, pcv, pe, bu, pot and ane.
The LASSO feature selection method shrinks the coefficients and removes some of them, which can reduce variance without a substantial increase in bias. The shrinking process penalizes the coefficients of the regression variables and sets some of them to zero; the variables that keep a non-zero coefficient after shrinking are selected. The Lasso method reports the non-penalized variables with their value ranges; it picked 6 variables and eliminated the other 19 variables:
Best alpha using built-in LassoCV: 0.437812
Best score using built-in LassoCV: 0.776112
Chi-Square was calculated with k-fold cross-validation (k = 10) to obtain the attribute scores. The Chi-Square test compares observed with expected frequencies and is highly sensitive to sample size. The main objective of this feature selection method is to assess goodness of fit, that is, to measure how well the observed distribution of the data fits the distribution expected if the variables are independent. With the selected features, we found an improvement in classification accuracy and a reduction in the error rate, as described in the discussion section.
Figures 6-10 show the testing model on 100 instances of CKD attributes: the error rate against the number of epochs passed for the neural network with each feature method, which generates different prediction models. The neural network determines the nature of the data and generates a trained model for the medical dataset. The experimental setup recorded the final values of the error rate and of the epochs passed.
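The shrinking step of the Lasso described above can be made concrete with a small hand-written coordinate descent. This is an illustrative sketch only, not the LassoCV procedure that produced the reported alpha and score: the toy design matrix, the penalty alpha = 0.5, the absence of an intercept and all function names are assumptions. It minimises (1/(2n))·||y − Xw||² + alpha·||w||₁, and features whose weight is shrunk to exactly zero are the eliminated ones.

```python
def soft_threshold(value, alpha):
    """Shrink value towards zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Cyclic coordinate descent for the Lasso objective (no intercept term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Toy data: y depends strongly on feature 0 and only weakly on feature 1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]  # equals 2.0 * x0 + 0.1 * x1

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

In this toy case the weak feature is removed entirely while the strong one is kept with a shrunken weight, which is the behaviour used for feature selection.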
The error rates of all algorithms differ considerably over epochs 100-400, but after that we found only minor changes (almost none) in the error rate over epochs 500-600. The error rates were: neural network 0.12, Extra Tree 0.36, and the ensemble models of the neural network with Extra Tree 0.16, Lasso model 0.0001, Pearson correlation 0.6 and Chi-Square 0.07. After epochs 100-600, we also observed epochs 700-1000 but did not find major differences in the error rate or the calculated accuracy.
In this research paper, we used data from the UCI Repository: 400 instances with 26 attributes of Chronic Kidney Disease. The results show that the highest accuracy (99.98%) was obtained by the Neural Network ensemble with the Lasso model, which gave the highest accuracy at every epoch. This ensemble model produced the minimum error rate, although the difference from the other algorithms is small. The Neural Network without an ensemble gave a low error rate compared with the other algorithms, but also a lower accuracy. The error rates of the Neural Network ensembles differ at the second decimal place, so the differences from the plain Neural Network were minor. Overall, the Neural Network with the Lasso model gave high accuracy and a low error rate, and therefore performed better than the other algorithms. In future work, we will combine feature extraction with feature selection as a hybrid approach for various applications.

Conflict of Interest
The authors have no conflict of interest.

Funding
This study was not funded.

Acknowledgements
The author is grateful to Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, for providing financial support through a Post-Doctoral Research Fellowship.