A Novel Approach for the Diagnosis of Diabetes and Liver Cancer using ANFIS and Improved KNN

Abstract: Cancer and diabetes are multi-factorial, chronic and severe diseases. An abnormal level of glucose in the body can lead to heart attack, kidney disease, renal failure and cancer. Many studies have shown that several types of cancer are possible in diabetes patients with high blood sugar. Many approaches have been proposed in the past to diagnose both cancer and diabetes. Even though the existing approaches are efficient, their classification accuracy is poor. An enhanced approach is proposed to achieve higher efficiency and lower complexity. An Adaptive Neuro Fuzzy Inference System (ANFIS) is used to classify the dataset with the help of adaptive group-based KNN. The Pima Indian diabetes dataset is used as the input dataset and is classified based on the attribute information. The experimental results show that the classification accuracy is better than that of existing approaches such as FLANN, ANN with FUZZYKNN.


INTRODUCTION
Diabetes mellitus is a disease that arises when the production of insulin, which controls the glucose level of the body, is affected. Diabetes mellitus is one of the main causes of nerve damage, kidney disease and also cancer. Much research is ongoing in this area, and some statistical reports indicate that most affected people die because a high glucose level leads to either cancer or heart disease. Diabetes mellitus cannot be cured fully, but it can be controlled by medicines such as insulin and by diet. The diagnosis of diabetes is an important classification problem (Vigneri et al., 2009). Some types of cancer, such as pancreatic, liver and endometrial cancer, are commonly seen in diabetes patients.
Many studies have recently been conducted to find the relationship between cancer and diabetes mellitus in diabetes patients. Diabetes is a group of metabolic disorders and reinterpretation is required, so the possibility of cancer in diabetes patients is high (Vigneri et al., 2009). Metabolic and hormonal disorders affect diabetes patients in different ways. Some of the risk factors shared by cancer and diabetes are physical activity, diet, alcohol, smoking and obesity. The biological mechanisms linking cancer and diabetes are hyperinsulinemia, hyperglycemia and inflammation. Certain diabetic patients are affected by cancer because of exposure to the treatment of diabetes.
Many new techniques are used for diagnosing both cancer and diabetes. Diagnosis using machine learning is one of the existing techniques that offers transparent diagnostic knowledge. Machine learning is classified into connectionist learning and symbolic learning. The user can easily understand the rules of symbolic learning techniques, so symbolic learning is considered a comprehensible technique. The best example of a symbolic technique is rule induction, which is used extensively for medical diagnosis (Richards et al., 2000; Lisboa et al., 2000). The artificial neural network is an incomprehensible technique and the best example of connectionist learning. In this technique the connections and information are hidden from the user. Artificial neural networks have been applied in the medical field for various tasks (Kononenko, 2001; Andrews et al., 1995).
Artificial neural networks (ANN) have been applied to various processes such as pattern recognition and data classification in the medical field. They have attracted many researchers and have become a multi-objective solution to various problems (Asada et al., 1990). The working process of an ANN is based on the neurons in the human brain. It is used to find the connection between the input data given by the user and the output data, based on information merged from a large number of cases (Grenier et al., 1994). Computer aided diagnosis (CAD) with Bayesian methods is used for diagnosing lung diseases. Chest radiography, chest CT and clinical data values are the main parameters for diagnosing lung disease. Even though computer aided diagnosis is considered a supplementary technique for diagnosis, the effect of CAD on the performance of radiologists has not been evaluated.
In recent years the K-Nearest-Neighbour (KNN) technique has commonly been used in data mining for pattern recognition and classification problems (Moreno-Seco et al., 2003). This study examines the diagnosis of cancer and diabetes on the benchmark dataset using ANFIS trained with adaptive group-based KNN (AGKNN). The proposed approach is compared with other data mining techniques applied to the same dataset. The study also examines whether integrating ANFIS with AGKNN can increase accuracy in the diagnosis of cancer and diabetes in patients. The rest of the study is divided into related works, proposed approach and performance evaluation.

Diabetes:
A method for diagnosing diabetes using the back propagation neural network algorithm was proposed by Siti and Dannawaty (2005). Blood pressure, glucose concentration in blood, serum insulin, BMI, number of times a person has been pregnant and age are the main parameters for diagnosing diabetes. The main drawback of this framework is missing values in the input. Jayalakshmi and Santhakumaran (2010) improved this framework to overcome this drawback. The input dataset is reformed to account for the missing values, which improves the framework by enhancing the classification precision. A data preprocessing technique is also proposed in that study to improve the speed of the framework. Dey et al. (2008) proposed a technique using a back propagation neural network along with binary classification. The inputs to this framework are blood sugar test, post plasma and age. The reported performance of this framework is 92.5% when compared with the previous framework. An artificial neural network is also used to predict the future glucose level based on the current level; the ANN is trained using these parameters to predict the glucose level.
The Pima Indian diabetes dataset is used for this research work. Frameworks with 22 different classifiers have been used for accurate classification with the Pima Indian diabetes dataset as input. The performance of a framework based on k-means and KNN was compared with the existing frameworks, and the accuracy of the frameworks ranges from 66.6 to 77.7% (Michie et al., 1994). A model using hybrid k-means and decision tree with the help of support vector machine and GDA reaches a classification accuracy of 82.05%. Different classification accuracies were achieved for frameworks with various combinations of ANN, DT_ANN and cascaded GA_CFS_ANN (Humar and Novruz, 2008; Patil et al., 2010; Karegowda et al., 2011).
Insulin usage can be predicted using neuro fuzzy systems with the help of invasive blood tests (Dazzi et al., 2001). The neuro fuzzy system is based on the back propagation network and fuzzy logic and is used to find the connection between variables and rules. The neuro fuzzy system was trained using 1000 BG values and tested with 400 BG values, which were used to build the nomogram. Compared with conventional control systems, the neuro-fuzzy system shows better results in insulin variation and maintains a constant body glucose level.

Cancer:
The nuclear imaging method has attracted some researchers in the medical field. Most studies are based on information provided by patients along with images, and the remaining studies are based on images only. Computer aided diagnosis is used to distinguish affected cells from normal ones and to predict cancer. Combinations of detection and segmentation tasks are used in the tumor localization problem. Even though standard segmentation methods are used for this problem, they have drawbacks such as a lack of sensitivity and specificity. Some reports state that the image and data are mainly used for feasibility studies. Ying et al. (2004) proposed a method that gives better visual performance but fails to present quantitative results when compared with other methods. Even though Guan et al. (2006) give good sensitivity and specificity compared with other methods, the CT/PET images and 2D scale images used lead to false detections. Saradhi et al. (2009) compare previous image processing methods with supervised classification schemes.

METHODOLOGY
Proposed system for diagnosis of diabetes and cancer:
The proposed approach diagnoses both cancer and diabetes using ANFIS and adaptive group-based KNN.

ANFIS architecture:
A neuro fuzzy inference system is used in this model to enhance the learning and adaptation of the adaptive system. The ANFIS architecture uses a first order fuzzy inference system based on if-then rules (Karlık et al., 2003):

Rule 1: If $x$ is $A_1$ and $y$ is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: If $x$ is $A_2$ and $y$ is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$

where $x$ and $y$ are the inputs, $A_i$ and $B_i$ are the fuzzy sets, $f_i$ are the outputs within the fuzzy region specified by the fuzzy rule, and $p_i$, $q_i$ and $r_i$ are the design parameters that are determined during the training process. The ANFIS architecture to implement these two rules is shown in Fig. 1, in which a circle indicates a fixed node. In the first layer, all the nodes are adaptive nodes. The outputs of layer 1 are the fuzzy membership grades of the inputs, which are given by:

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2$$
$$O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4$$

where $\mu_{A_i}(x)$ and $\mu_{B_{i-2}}(y)$ can adopt any fuzzy membership function. For example, if the bell shaped membership function is employed, $\mu_{A_i}(x)$ is given by:

$$\mu_{A_i}(x) = \frac{1}{1 + \left| \frac{x - c_i}{a_i} \right|^{2 b_i}}$$

where $a_i$, $b_i$ and $c_i$ are the parameters of the membership function, governing the bell shaped functions accordingly.
In the second layer, the nodes are fixed nodes. They are labeled with M, indicating that they perform as a simple multiplier. The outputs of this layer can be represented as:

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2$$

which are the so-called firing strengths of the rules.
The above diagram depicts a typical adaptive neuro fuzzy inference system with 5 layers. In each layer a circle depicts a fixed node and a square depicts an adaptive node, and each layer serves a different purpose. To obtain the preferred performance, the number, type and parameters of the fuzzy membership functions and the rules are selected by trial and error. Although better performance can be achieved this way, these parameters are hard to choose in some situations. To overcome this problem, ANFIS is trained to obtain optimal premise and consequent parameters.
In the third layer, the nodes are also fixed nodes. They are labeled with N, indicating that they play a normalization role to the firing strengths from the previous layer. The outputs of this layer can be represented as:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2$$

which are the so-called normalized firing strengths.
In the fourth layer, the nodes are adaptive nodes. The output of each node in this layer is simply the product of the normalized firing strength and a first order polynomial (for a first order Sugeno model). Thus, the outputs of this layer are given by:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i), \quad i = 1, 2$$

In the fifth layer, there is only one single fixed node labeled with S. This node performs the summation of all incoming signals. Hence, the overall output of the model is given by:

$$O_5 = \sum_{i=1}^{2} \bar{w}_i f_i = \frac{\sum_{i=1}^{2} w_i f_i}{w_1 + w_2}$$
It can be observed that there are two adaptive layers in this ANFIS architecture, namely the first layer and the fourth layer. In the first layer, there are three modifiable parameters $\{a_i, b_i, c_i\}$, which are related to the input membership functions. These parameters are the so-called premise parameters. In the fourth layer, there are also three modifiable parameters $\{p_i, q_i, r_i\}$, pertaining to the first order polynomial. These parameters are the so-called consequent parameters.
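The five layers described above can be sketched as a single forward pass. The following is a minimal Python illustration for the two-rule, two-input first order Sugeno case with generalized bell membership functions; the function names (`bell_mf`, `anfis_forward`) and all parameter values are assumptions chosen for illustration and are not taken from the study.

```python
def bell_mf(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass of a first order Sugeno ANFIS with one rule per (A_i, B_i) pair.

    premise_a, premise_b: lists of (a, b, c) bell parameters for inputs x and y.
    consequent: list of (p, q, r) parameters, one triple per rule.
    Returns the overall output and the normalized firing strengths.
    """
    # Layer 1: membership grades of the inputs (adaptive nodes).
    mu_a = [bell_mf(x, *params) for params in premise_a]
    mu_b = [bell_mf(y, *params) for params in premise_b]
    # Layer 2: firing strength of each rule (product nodes).
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: normalized strength times the first order polynomial.
    weighted = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: summation of all incoming signals.
    return sum(weighted), w_bar


# Hypothetical parameters, for illustration only.
premise = [(2.0, 2.0, 0.0), (2.0, 2.0, 4.0)]
rules = [(1.0, 1.0, 0.0), (2.0, -1.0, 1.0)]
output, strengths = anfis_forward(0.0, 0.0, premise, premise, rules)
print(output, strengths)
```

Training would then adjust the premise triples `(a, b, c)` and the consequent triples `(p, q, r)`; the sketch covers only inference.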

ANFIS classifier:
The adaptive neuro fuzzy inference system combines a fuzzy inference system with the learning power of an artificial neural network. It mainly aims to incorporate the best features of fuzzy systems and neural networks. Algorithms such as gradient descent and back propagation are used to train the artificial neural network by regulating the membership functions and the weights of the defuzzification.
The GUI tools used to regulate the fuzzy inference system are listed below:
• Fuzzy Inference System (FIS) Editor (handles issues such as the numbers of input and output variables, their names etc.)
• Membership Function Editor (defines the shapes of all the membership functions associated with each variable)
• Rule Editor (used for editing the list of rules that define the behavior of the system)
• Rule Viewer (used for looking at the FIS to help diagnose the behavior of specific rules or study the effect of changing input variables)
The adaptive neuro fuzzy inference system process is as follows. In ANFIS, an assumed model structure relates the membership functions, rules, inputs and outputs in a cyclic process. A collection of input and output data is used to train the fuzzy inference model by regulating the membership functions based on the data processed.
$C_j(i)$ is the categorization result of the $i$th document by the $j$th group. An example of the categorization calculation for groups is diagnosing liver cancer with the parameters {number of times pregnant, plasma glucose level, blood pressure, skin rashes and age} indexed as {1, 2, 3, 4, 5}. $C_3(5)$ means that the third parameter is considered, i.e., blood pressure is going to be discussed. $C$ is the average value of the different categories calculated by feature distance in groups. Sample data: {number of times pregnant, plasma glucose level, blood pressure, skin rashes and age}.
The adaptive training group is determined as follows. The grouping is adjusted according to the variance of the sample data across the different groups. If the variance of the grouping data is higher than the threshold, the categorization results are inaccurate, because more groups are required for the final decision. If the variance is low, the sample groups are merged without any disputes in the classification results. The threshold value can be calculated using the lower and upper bounds ($1/C$ and $C$). The value of $k$ is calculated adaptively, starting from a random initial value of $k$. The random value can be tested by the system to check whether it is suitable or not. To obtain exact categorization results, the value of $k$ should be adjusted.
The $k$ value can be tested by the system to check whether it is suitable for the group or not, and it can be set by the algorithm with the help of the training set. The groups are adjusted in real time based on the variance of the categorization results of the different groups (Chun-Hong and Wei, 2009; Coomans and Massart, 1982). The runtime complexity for $n$ elements is $3n - 1$, and the computational complexity is calculated as:
Training of the network, i.e., error correction, is stopped when the value of k has become sufficiently small and lies within the required limits (Duch et al., 2000). The total error for the observations of the data set and the $j$ neurons in the output layer can be computed as:
where $t$ represents the desired target output, $y$ represents the output predicted by the system and $E$ denotes the error correction.
The problems of k nearest neighbor are reduced in adaptive group-based KNN (AGKNN). Compared with traditional KNN, the proposed algorithm shows higher efficiency and robustness by solving the experience-dependent problem, and it gives accurate results by solving the category balance problem.
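The grouping idea described above (classify within each group, compare the group results through their variance, and regroup when the variance exceeds a threshold) can be illustrated with a small sketch. This is not the authors' AGKNN implementation: the way the groups are formed, the regrouping rule, the default threshold and all names (`knn_predict`, `group_knn_predict`) are assumptions made for illustration, and class labels are assumed to be 0 or 1.

```python
import math
from collections import Counter
from statistics import pvariance


def knn_predict(train, query, k):
    """Plain k-NN majority vote; train is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def group_knn_predict(train, query, k=3, groups=2, max_groups=4, threshold=0.2):
    """Vote per group and add groups while the group votes disagree too much.

    Returns (majority label, variance of the group votes, number of groups used).
    """
    while True:
        # Assumed grouping rule: every groups-th sample goes to the same group.
        parts = [train[i::groups] for i in range(groups)]
        votes = [knn_predict(part, query, min(k, len(part))) for part in parts if part]
        spread = pvariance(votes)  # variance of the 0/1 votes across groups
        if spread <= threshold or groups >= max_groups:
            return Counter(votes).most_common(1)[0][0], spread, groups
        groups += 1  # high variance: use more groups for the final decision


# Toy two-class data, for illustration only.
toy = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 0),
       ((10, 10), 1), ((10, 9), 1), ((9, 10), 1), ((9, 9), 1)]
print(group_knn_predict(toy, (0.5, 0.5)))
```

A low variance means the groups agree, so their votes can be merged into one decision; a high variance triggers regrouping up to `max_groups`.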
The training data set is used to train the neural network, which is implemented in a client-server architecture. The neural network contains 8 nodes in the input layer, one per input attribute. The Pima Indian diabetes dataset is used as input, and the adaptive group-based k nearest neighbor algorithm is applied to train the network. The training samples are divided into multiple groups. The data are classified simultaneously in each group with a random value of k and the results are compared. The same algorithm is applied to the same network without propagation of errors and obtains efficient results. The proposed algorithm, i.e., the combination of ANFIS and AGKNN, is compared with the previous methods and outperforms them in classification accuracy. The accuracy of the proposed algorithm is calculated using 10-fold cross validation (CV) in the Weka classifier. The calculated accuracy is compared with existing approaches such as FLANN, ANN with FUZZYKNN.
In the training data set, unwanted or less significant values are removed by preprocessing the data in order to obtain a classification accuracy higher than that of the existing methods.
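The 10-fold cross validation used for the accuracy estimate can be sketched in a few lines. The study relies on Weka for this step; the Python below only illustrates the procedure, and the helper names (`k_fold_indices`, `cross_val_accuracy`) and the `fit_predict` callback are hypothetical.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Return (train_idx, test_idx) pairs; every sample is held out exactly once."""
    base, extra = divmod(n_samples, n_folds)
    splits, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        splits.append((train_idx, test_idx))
        start += size
    return splits


def cross_val_accuracy(samples, fit_predict, n_folds=10):
    """Pooled accuracy over all held-out samples.

    samples: list of (features, label) pairs.
    fit_predict(train, features): returns the predicted label for `features`
    using only the training samples of the current fold.
    """
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(samples), n_folds):
        train = [samples[i] for i in train_idx]
        for i in test_idx:
            features, label = samples[i]
            correct += fit_predict(train, features) == label
    return correct / len(samples)
```

The sketch keeps the original sample order; in practice the data would usually be shuffled (and often stratified by class) before the folds are formed.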

DATASET DESCRIPTION AND RESULTS
In our study we have used Pima Indian Diabetes data sets (Ngoc and Edward, year) for training and testing the neural network model.

Attribute information:
The 8 input parameters are:
• Number of times pregnant
• Plasma glucose level
• Diastolic blood pressure
• Triceps skin fold thickness
• 2-h serum insulin
• Body mass index
• Diabetes pedigree function
• Age
The output parameter name is class. A positive class value is interpreted as "tested positive for diabetes and cancer" or else "tested positive for diabetes", and a negative class value is interpreted as "tested negative for diabetes and cancer".
All the input parameters have numeric values. The first parameter is the total number of times the patient was pregnant. The second parameter is the value of the oral glucose tolerance test, which is used to find the glucose level in the blood. The third parameter is the diastolic blood pressure, measured in millimeters of mercury (mm Hg). The fourth parameter is the triceps skin fold thickness, measured in millimeters. The fifth parameter is the 2-h serum insulin test value; the test is conducted to find the amount of insulin produced in the patient's body. The sixth parameter is the patient's body mass index, calculated by the following formula:

$$\text{Body mass index} = \frac{\text{patient weight in kg}}{(\text{patient height in m})^2}$$

The seventh parameter is the diabetes pedigree function value, which describes the family history of diabetes, and the eighth parameter is the age of the person.
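As a small worked example of the body mass index attribute, the formula can be written as a one-line function. The function name and the sample values are illustrative assumptions, not data from the study.

```python
def body_mass_index(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# A hypothetical patient of 70 kg and 1.75 m.
print(round(body_mass_index(70.0, 1.75), 2))  # 22.86
```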
Experimental results:
ANFIS converges relatively fast due to its learning strategy and is easy to interpret. It is a more transparent model and its behavior can be explained in human understandable terms, such as linguistic terms and linguistic rules.
The ANFIS classifiers were trained with the adaptive group-based KNN method in combination with the least squares method. Attributes representing the patient details, such as number of times pregnant, plasma glucose level, diastolic blood pressure, triceps skin fold thickness, 2-h serum insulin and body mass index, were used as inputs.
The fuzzy rule architecture of the ANFIS classifiers was designed using a generalized bell shaped membership function. Each ANFIS classifier was implemented using the MATLAB software package (MATLAB version 7.0 with fuzzy logic toolbox). The data were divided into two separate data sets: the training data set and the testing data set. The adequate functioning of the ANFIS depends on the sizes of the training set and the test set.
The training data set was used to train the ANFIS model, whereas the testing data set was used to verify the accuracy and the effectiveness of the trained ANFIS model for classification.
The test performance of the classifiers can be determined by computing the total classification accuracy and the RMSE. The graphical representation of the root mean square error is shown in the corresponding figure. The total classification accuracy is defined as the ratio of the number of correct decisions to the total number of cases (Table 1):

$$\text{Total classification accuracy} = \frac{\text{number of correct decisions}}{\text{total number of cases}}$$

The graphical representation of the sensitivity and specificity comparison is shown in Fig. 5. The performance of the proposed classifier is better than that of other existing classifiers such as the naive Bayes network. In conclusion, these results show that the ANFIS-based AGKNN approach provides the highest total classification accuracy in the detection of cancer and diabetes.
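The evaluation measures named above (total classification accuracy, RMSE, sensitivity and specificity) can be computed from predictions as in the following sketch. The function names and the toy labels are assumptions for illustration; they are not results from the study.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives and false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (TPR), specificity and false positive rate.

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # correct decisions / all cases
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
    }


def rmse(targets, outputs):
    """Root mean square error between target and predicted outputs."""
    squared = [(t - o) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared) / len(squared))


# Toy labels, for illustration only.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(truth, predicted))
print(rmse(truth, predicted))
```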

CONCLUSION
Much research has been conducted to predict both diabetes and cancer. The dataset used here for predicting both cancer and diabetes is the Pima Indian diabetes dataset. A novel approach based on the adaptive neuro fuzzy inference system is proposed, in which the adaptive group-based k nearest neighbor algorithm (AGKNN) is used to train the neural network. The input nodes of the neural network are constructed based on the input attributes. The hidden nodes are used to classify the given input based on the training dataset, with the help of AGKNN, by grouping the training dataset. The experimental results of the proposed approach show that its classification accuracy is better than that of the existing approaches. The proposed model has lower complexity, achieves higher efficiency and performs pattern classification better than the traditional methods.

Fig. 2: Diagnosis of cancer and diabetes using ANFIS with KNN

3. Sensitivity vs. Specificity comparison:
TP: Number of true positives
FP: Number of false positives
TN: Number of true negatives
FN: Number of false negatives

True Positive Rate (TPR) (also called sensitivity, hit rate and recall):
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

False Positive Rate (FPR) (also called false alarm rate):
$$\text{FPR} = \frac{FP}{FP + TN}$$

Table 1: Performance of Pima Indian diabetes model