Rule-Based Approach for Prediction of Chronic Kidney Disease: A Comparative Study

Chronic Kidney Disease (CKD) is a major public health problem that poses growing challenges for early diagnosis, timely prevention and effective treatment. The present dataset on Chronic Kidney Disease consists of 24 predictive parameters. The study performs a comparative analysis of rule-based classifiers in order to generate human-interpretable rules for diagnosing CKD. The rule-based approaches compared in the paper are JRip, CART, Conjunctive Rule, C4.5, NNge, OneR, Ridor, PART, and the Decision Table-Naive Bayes (DTNB) hybrid classifier. The study concludes that, among all the conventional classifiers cited, DTNB is the best rule-based classifier, with the highest area under the ROC curve (0.999) and the lowest False Positive Rate (0.011).


INTRODUCTION
Chronic Kidney Disease (CKD) is a worldwide health concern that is rising at a very fast pace. According to the National Kidney Foundation 1, ten percent of the total world population is affected by CKD, leading to an increased mortality rate. In developing countries such as China and India, where the elderly population is growing as life expectancy increases, cases of kidney failure are expected to rise disproportionately. In its final stage, CKD is termed end-stage renal disease. Kidney disease can be classified into five stages 2. At the first stage the kidneys function normally, but function declines at every succeeding stage. During the transition from stage 3 to stage 4, renal function is severely reduced. End-stage renal failure occurs at the last stage of CKD. CKD detected at stage 5 requires renal replacement therapy such as dialysis. With early diagnosis and treatment, the progression of kidney disease can be stopped and the cost of treatment substantially reduced. In this way, early prediction of CKD can lead to an improved quality of life.
The stage of CKD and the level of kidney function are estimated by the Glomerular Filtration Rate (GFR) 3. GFR is commonly estimated from the blood level of creatinine, a waste product of muscle activity. When the kidneys are working well, creatinine is removed from the blood. As kidney function slows down, blood levels of creatinine rise. Stage 1 CKD is reported with high GFR (> 90 mL/min) and end-stage CKD is detected with very low GFR (< 15 mL/min).
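As an illustration of how a GFR value maps to the five stages, the following minimal Python sketch may help. Only the stage 1 and stage 5 thresholds (90 and 15 mL/min) are stated in the text; the intermediate cut-offs of 60 and 30 mL/min are the commonly cited ones and are an assumption here, as is the hypothetical function name `ckd_stage`. The sketch treats a GFR of exactly 90 as stage 1.

```python
def ckd_stage(gfr: float) -> int:
    """Return the CKD stage (1-5) for a GFR value given in mL/min.

    The 90 and 15 thresholds follow the text; 60 and 30 are the
    commonly cited intermediate cut-offs (an assumption of this sketch).
    """
    if gfr < 0:
        raise ValueError("GFR cannot be negative")
    if gfr >= 90:
        return 1  # normal or high filtration
    if gfr >= 60:
        return 2  # mildly reduced
    if gfr >= 30:
        return 3  # moderately reduced
    if gfr >= 15:
        return 4  # severely reduced
    return 5      # end-stage renal failure
```

Ordering the checks from the highest threshold downwards keeps each branch to a single comparison.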
The kidneys are two bean-shaped organs lying opposite each other on either side of the spine. The right kidney sits slightly lower than the left kidney to accommodate the liver. Their core functions include extraction of waste from the blood, balancing of body fluids, urine formation, blood pressure regulation, red blood cell regulation, and acid regulation 5. Figure 1 shows the location of the kidneys in the human body. This figure is adapted and modified from the reference.
The kidneys are multi-functional powerhouses of activity and aid in performing many important functions of the human body. When they do not work properly, harmful toxins and excess fluids build up in the body, which can cause kidney failure 6.
Exosomes are cell-derived membrane vesicles, 40-100 nm in diameter, occurring in all eukaryotic fluids including blood, urine and cell culture media. Exosomes are larger than low-density lipoproteins (LDL) and smaller than red blood cells (RBCs) 7. They play a critical role in various processes such as coagulation, intercellular signaling and waste management. Their physiological roles include serving as a source of protein and RNA biomarkers and as potential therapeutic agents. The failure of exosome-based therapy in endothelial cells can cause chronic kidney diseases and other health problems such as atherosclerosis and hypertension. Exosomes can therefore potentially be used for prognosis, in therapy, and as biomarkers for identifying health disorders.

Related Work
In the recent past, various classification methods and their applications have been developed to predict CKD. In what follows, we review related work on classification methodologies for kidney diagnosis.
In the field of medical diagnosis of chronic kidney disease, Noia et al. 8 present a classifier based on an ensemble of ten artificial neural networks, using data collected over a period of 38 years. A software tool that predicts the end-stage kidney disease (ESKD) risk of patients has been developed as an online web application as well as an Android mobile application.
Gunasundari et al. 9 propose two modified Boolean Particle Swarm Optimization (BoPSO) algorithms, viz. Velocity bounded BoPSO (VbBoPSO) and Improved Velocity bounded BoPSO (IVbBoPSO), for solving the problem of feature selection. Both algorithms have been tested on 28 benchmark datasets. The proposed system selects exclusive features from the datasets to achieve high classification accuracy.
To detect CKD, Salekin and Stankovic 10 combine feature selection with three classifiers, namely K-nearest neighbour, random forest and neural networks. Using two feature reduction methods, the wrapper method and LASSO regularization, 12 of the 24 attributes are selected to detect CKD with high accuracy. CKD detection is further performed with the number of attributes reduced to 5.
Using the WEKA data mining tool, Arora and Sharma 11 focus on CKD detection with eight classification methods, namely SGD, Random Subspace, SMO, JRip rules, Hoeffding tree, NaiveBayes, Locally weighted learning, and OneR. Different performance measures of the resulting models are also presented.
Rubini and Eswaran 12 propose three classifiers, viz. radial basis function networks, multilayer perceptron, and logistic regression, for the prediction of CKD. Various performance metrics are also calculated for the CKD dataset.
Gadaras and Mikhailov 13 provide a fuzzy classification method that extracts fuzzy rule sets from a dataset in order to build a medical diagnosis framework. The proposed method is compared with existing methods on three medical datasets, namely the Wisconsin breast cancer dataset, the Pima Indian diabetes dataset, and the BUPA liver dataset.
Krause et al. 14 provide an overview of the role of exosomes in kidney growth and in diseases such as renal cancer. The review also covers recent research on the importance of exosomes as diagnostic markers and their therapeutic use in kidney diseases and cancers. The authors also describe the proteins found in human urinary exosomes that occur in the regions of the kidney, as well as the significant genes found in exosomes of various species. Jella et al. 15 examined the regulation of epithelial sodium channels (ENaC) in mpkCCD cells. The glycolytic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was located within exosomes derived from proximal tubule LLC-PK1 cells. The study illustrates the importance of exosomes in the modulation of ENaC activity in collecting duct cells.

MATERIALS AND METHODS
This section describes the dataset and the abbreviations used throughout the paper. The terms used throughout the paper are shown in Table 1.

Dataset
The Chronic Kidney Disease (CKD) dataset used in this study is obtained from the UCI repository 16. Each attribute and its description are shown in Table 2. The dataset contains missing values for several of its attributes. Missing values of nominal attributes have been replaced by the mode, and those of numeric attributes by the mean, computed from the training data. Each instance is classified with respect to kidney disease as ckd or notckd. The Weka classification algorithms available at 17 were used for the comparative study among rule-based learners. The performance of each learner was tested using 10-fold cross-validation. The CKD dataset was loaded in the WEKA classification module and trained several times to maximize the classification performance.
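The replacement of missing values described above can be sketched in a few lines of Python. This is a minimal illustration and not the WEKA filter itself: the helper names `fit_imputer` and `impute`, the use of `None` to mark a missing value, and the toy rows are all assumptions made for the example.

```python
from statistics import mean, mode

def fit_imputer(train_rows, numeric_cols, nominal_cols):
    """Learn one fill value per column from the training rows only."""
    fill = {}
    for col in numeric_cols:
        # Mean of the observed (non-missing) numeric values
        fill[col] = mean(r[col] for r in train_rows if r[col] is not None)
    for col in nominal_cols:
        # Most frequent observed category
        fill[col] = mode(r[col] for r in train_rows if r[col] is not None)
    return fill

def impute(rows, fill):
    """Return copies of the rows with missing (None) entries replaced."""
    return [
        {col: (fill[col] if val is None and col in fill else val)
         for col, val in row.items()}
        for row in rows
    ]

# Toy rows with invented values; None marks a missing entry
train = [
    {"age": 48, "htn": "yes"},
    {"age": None, "htn": "no"},
    {"age": 52, "htn": "no"},
    {"age": 50, "htn": None},
]
fill = fit_imputer(train, numeric_cols=["age"], nominal_cols=["htn"])
completed = impute(train, fill)
```

Learning the fill values from the training rows alone, and then applying them to any other rows, avoids leaking information from the test fold into the model.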

Methodology
The prevalence of ckd in the dataset was about 62.5 % and that of notckd 37.5 %. We have chosen a rule-based classification method. The step-by-step workflow of the detection process is depicted in Figure 2. To perform 10-fold cross-validation, the dataset is divided into ten folds; in each round, nine folds form the training set and the remaining fold forms the testing set. The classification model derived from the rule-based learner is applied to the testing set to determine the performance metrics. The whole process is repeated 10 times and the performance parameters are averaged. Finally, a set of rules is generated by the rule-based learners with a minimal set of attributes for CKD prediction.
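The fold construction behind 10-fold cross-validation can be sketched as follows. This is a plain-Python illustration, not the WEKA implementation: stratification by class, the helper names `stratified_folds` and `train_test_splits`, and the toy label list of 250 ckd and 150 notckd instances (chosen only to match the stated 62.5 % / 37.5 % prevalence) are assumptions.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=1):
    """Assign each instance index to one of k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin over the folds
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def train_test_splits(labels, k=10, seed=1):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Toy labels matching the stated prevalence (an assumption of this sketch)
labels = ["ckd"] * 250 + ["notckd"] * 150
splits = list(train_test_splits(labels))
```

Each instance appears in exactly one testing fold, so every instance is used for testing once and for training nine times.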

Performance evaluation measures
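The classifiers have been evaluated on this dataset using six performance parameters: accuracy as defined in equation (1), true positive rate as defined in equation (2), false positive rate as defined in equation (3), precision as defined in equation (4), F-measure as defined in equation (5), and area under the ROC curve (AUC). Here TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

$$\text{TP rate} = \frac{TP}{TP + FN} \quad (2)$$

$$\text{FP rate} = \frac{FP}{FP + TN} \quad (3)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (4)$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{TP rate}}{\text{Precision} + \text{TP rate}} \quad (5)$$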
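The five count-based measures listed above can be computed directly from the confusion matrix of a binary classifier. The sketch below uses their standard definitions in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN); the function name `rule_metrics` and the example counts are invented for illustration and are not results from the study. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def rule_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, TP rate, FP rate, precision and F-measure from counts.

    No guard against empty classes is included, so a zero denominator
    raises ZeroDivisionError.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tp_rate = tp / (tp + fn)    # also called recall or sensitivity
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {
        "accuracy": accuracy,
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
    }

# Hypothetical counts, not taken from the paper
metrics = rule_metrics(tp=90, fp=5, tn=95, fn=10)
```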

Comparative analysis of the rule-based learners
In this section, the rule-based learners used in the study are described and compared in terms of the performance evaluation measures: accuracy, true positive rate, false positive rate, precision, and F-measure.
JRip: JRip 18 implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER). This algorithm is competitive with C4.5 rules in terms of error rate and is more efficient on large and noisy datasets. It achieves an accuracy of 0.960 and an ROC area of 0.959.
CART: The Classification and Regression Tree algorithm was developed by Breiman et al. 19 and uses the Gini index as its splitting function. CART achieves an accuracy of 0.968 and an area under the curve of 0.966 on the CKD dataset.
Conjunctive Rule: This method implements a single conjunctive rule learner that makes predictions for both numeric and nominal class labels. The rule-based learner attains an accuracy of 0.915 and an ROC area of 0.924.
C4.5: The rules generated by the C4.5 algorithm 20 use the gain ratio as the splitting criterion to determine the goodness of a split. The accuracy achieved by C4.5 is 0.968 and the ROC area is 0.976.
NNge: This instance-based learning method 21 makes use of non-nested generalized exemplars to improve the performance of the nearest neighbor classifier. This rule-based learner attains an accuracy of 0.988 and an area under the curve of 0.983.
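To make the two splitting criteria named above concrete, the following sketch computes the Gini-based impurity reduction used by CART and the gain ratio used by C4.5 for one candidate split. It uses the textbook definitions and is not taken from the WEKA source; the function names and the eight-instance toy split on a hypothetical binary attribute are assumptions for illustration.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini_gain(parent, partitions):
    """Reduction in Gini impurity obtained by splitting parent into partitions."""
    n = len(parent)
    weighted = sum(len(p) / n * gini(p) for p in partitions)
    return gini(parent) - weighted

def gain_ratio(parent, partitions):
    """Information gain divided by the split information of the partition sizes."""
    n = len(parent)
    info_gain = entropy(parent) - sum(len(p) / n * entropy(p) for p in partitions)
    split_info = -sum((len(p) / n) * log2(len(p) / n) for p in partitions if p)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy split on a hypothetical binary attribute (values invented)
parent = ["ckd"] * 5 + ["notckd"] * 3
left = ["ckd"] * 4                   # attribute = yes
right = ["ckd"] + ["notckd"] * 3     # attribute = no
cart_score = gini_gain(parent, [left, right])
c45_score = gain_ratio(parent, [left, right])
```

Both criteria reward splits that produce purer child nodes; the gain ratio additionally normalizes by the split information, which penalizes attributes that fragment the data into many small partitions.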

9) and (htn = no) and (dm = no) and (al = 5) => notckd

Interpretation of the results
Various rule-based classification strategies have been presented for classifying cases into two classes, ckd or notckd. The JRip rule learner uses three attributes (hemo, bgr and al) across two rules to predict ckd or notckd. This learner achieves an accuracy of 96 % and an AUC of 0.959. Another rule-based learner, C4.5, employs six attributes and 11 rules for detecting CKD. The accuracy achieved by this learner is 96.8 % with an AUC of 0.976. A comparative analysis among all the rule-based algorithms shows that DTNB, i.e. the Decision Table-Naïve Bayes classifier, achieves the lowest FP rate (0.011) and the highest area under the curve (0.999). Accordingly, this study concludes that among rule-based learners DTNB is the most effective in terms of ROC area.

CONCLUSION
In this paper, a comparative study of different rule-based classification approaches has been carried out. The performance measures of these rule-extraction methods on the CKD dataset have been presented. The rules extracted by the different rule-based algorithms can be used as a second opinion for diagnosing CKD and for identifying persons in higher risk groups during the progression of the disease. The study concludes that the rule-based classifier DTNB, i.e. the Decision Table-Naïve Bayes learner, achieves the highest AUC of 99.9 % and the lowest FP rate of 1.1 %. The significance of our comparative study lies in the reduced set of features from which simple and comprehensible rules are generated. The findings of this comparative analysis of rule-based learners can be used as a diagnostic tool in the prediction of chronic kidney disease.

Fig. 1: The location of kidneys in human body 4

Fig. 2: Workflow of the CKD prediction