Fuzzy Unordered Rule Using Greedy Hill Climbing Feature Selection Method: An Application to Diabetes Classification

Diabetes classification is one of the most crucial applications of healthcare diagnosis. Although various studies have been conducted on this application, the classification problem remains challenging. Fuzzy logic techniques have recently achieved impressive results in different application domains, especially medical diagnosis. However, fuzzy logic techniques have difficulty dealing with data containing a large number of input variables when constructing a classification model. In this research, a fuzzy logic technique using the greedy hill climbing feature selection method was proposed for the classification of diabetes. A dataset of 520 patients from the Hospital of Sylhet in Bangladesh was used to train and evaluate the proposed classifier. Six classification criteria were used to evaluate the results of the proposed classifier. Comparative analysis demonstrated the effectiveness of the proposed classifier against Naive Bayes, support vector machine, K-nearest neighbor, decision tree, and multilayer perceptron neural network classifiers. The results of the proposed classifier demonstrated the potential of fuzzy logic for analyzing diabetes patterns across all classification criteria.


INTRODUCTION
Medical diagnosis is the process of determining which condition explains a person's status and classifying it as a distinct and separate disease (Sisodia & Sisodia, 2018). The medical context is frequently implicit. With technological development, different devices can be used to monitor and collect medical data about a specific disease (Vitabile et al., 2019). These data can later support medical decisions regarding prognosis and treatment, improving diagnostic accuracy, reliability, and speed.
Diabetes may affect one in four people over the age of 65 years (Morgan, 2018). According to the World Health Organization, this disease has affected over 246 million people worldwide, and this number is expected to increase to 380 million by 2025 (Durgadevi & Kalpana, 2018). It is well known that some people are unaware that they have the disease. Diabetes has been categorized as the fifth deadliest disease in the world, with no cure in sight. Diabetes can be controlled when detected early. By contrast, late diagnosis can lead to complications that, over time, may include kidney disease, stroke, heart disease, foot problems, eye problems, and nerve damage (Centers for Disease Control and Prevention, 2017). With the rise of artificial intelligence and its continued adoption in healthcare and medical diagnosis, diabetes cases and their symptoms can be detected and controlled more effectively.
Artificial intelligence (or machine intelligence) in the medical field refers to simulating the intelligence of medical experts in machines or computers programmed to act like experts and imitate their experience in medical diagnosis (David, 2016). Artificial intelligence is widely used to derive interesting patterns from medical data (Sharif et al., 2019; Ngan et al., 1999), and experts use various artificial intelligence methods to analyze diabetes. Different data mining algorithms based on artificial intelligence, machine learning, and statistical methods have been used to detect diabetes, such as K-nearest neighbor (KNN) (Saxena et al., 2014), ant colony optimization (ACO) (Ganji & Abadeh, 2011), support vector machine (SVM) (Barakat et al., 2010), artificial neural network (ANN) (Smith et al., 1988), and decision tree (Kaur & Chhabra, 2014). Different studies show the success of these classification algorithms; nevertheless, the classification models created by many of these classifiers are complex mathematical models that are considered incomprehensible and opaque to humans. This weakness prevents their use in various real-life domains where both comprehensibility and classification accuracy are needed. To the best of the authors' knowledge, no comprehensive research has combined the fuzzy unordered rule induction algorithm (FURIA) (Hühn & Hüllermeier, 2009) with the greedy hill climbing feature selection method (Venkatesh & Anuradha, 2019) to detect early-stage diabetes. This research therefore combines the transparency and clarity of FURIA (i.e., fuzzy rule induction) with the feature reduction ability of the greedy hill climbing method to improve the classification performance and detection accuracy for diabetes.
In this paper, a comparative study between popular classification algorithms and the proposed technique is presented. This research aims to achieve high classification accuracy and to provide a comprehensive comparison of different classification techniques for diabetes. The rest of this paper is structured as follows. Related work on classification techniques for diabetes is described in the Related Works section. Then, the proposed hybrid fuzzy unordered rule with greedy hill climbing feature selection method is presented. Next, the evaluation of the proposed method is explained, followed by the analysis of the results in the Results and Discussion section. The final section concludes the research and suggests possible future research directions.

Related Works
Diabetes has been studied in the literature by various researchers from many aspects. As diabetes is a life-threatening disease, early diagnosis could save many lives. Artificial intelligence has been widely applied in medical diagnosis and diabetes prediction to produce accurate and meaningful results, using either pattern recognition or classification systems. Fuzzy logic has also been used to handle noisy, irrelevant, and ambiguous data to improve the classification performance of many feature selection algorithms.
Fuzzy systems have been used to construct extensive knowledge for diabetes detection. Vieira et al. (2012) proposed a fuzzy extension criterion as a search strategy that maps data more flexibly into fuzzy space, enabling a variety of features to be considered during the optimization process. The extension criteria address a multi-objective optimization problem: minimizing the number of features can reduce classification accuracy. The research therefore provides new (fuzzy) objective functions with wide flexibility for solving the multi-objective feature selection problem. UCI datasets with diverse numbers of features (from 9 to 279) and sample sizes (between 178 and 699) were used to evaluate the proposed fuzzy objective functions, and the fuzzy approach exhibited high classification performance on most of the datasets. Cai et al. (2016) applied a fuzzy criterion in multi-objective unsupervised learning using a hybrid filter-wrapper approach. This method actively selected features from the data and avoided misinterpreting overlapping features in an unsupervised multi-objective clustering problem. Experiments on benchmark datasets showed that the method performed better in terms of both the number of features and accuracy. Jalali, Nasiri, and Minaei (2009) presented a wrapper-based feature selection method based on a consistency measure function and fuzzy logic. The work projected the full dataset into fuzzy space, after which the consistency measure selected the best feature subset. The method was evaluated on nine real-world datasets. The number of features was reduced in all datasets, with higher classification performance in five of the nine datasets and equal classification performance in the remaining datasets.
Nosrati and Eftekhari (2014) integrated fuzzy similarity measures with a multi-objective genetic algorithm (GA) to find the optimal feature subset for feature selection and classification. The method was evaluated on UCI datasets, and the results showed better performance than correlation-based feature selection methods. Hedjazi et al. (2010) applied fuzzy logic to improve the efficiency and operational safety of sensors in industrial plants. Fuzzy logic was used in the learning algorithm for identifying sensor situations and for feature selection, choosing the optimal number of sensors, which were either operational or faulty. The work used a centered binomial membership function for attribute selection, and the results showed highly accurate performance.
El-Alfy and Al-Obeidat (2014) developed a multi-criteria fuzzy classifier hybridized with a greedy attribute selection method for network anomaly detection. This hybrid technique had a significant impact on the performance of intrusion detection systems: it enhanced the detection rates for different kinds of intrusions and reduced the number of selected attributes to about 74 percent. Ganji and Abadeh (2011) developed a swarm intelligence classifier called FCS-ANTMINER for diabetes diagnosis. The classifier combined fuzzy logic and ACO to extract a predictive model. Its classification performance was compared with state-of-the-art classifiers from the literature, and it achieved a classification result of 84.24 percent.
Another classification system combined ACO and fuzzy logic for diabetes classification (Tnv & Gundabathina, 2016). The results showed that the method outperformed the baseline classifiers in terms of classification accuracy, and the research also provided an expert system for diabetes detection. Beloufa and Chikh (2013) presented a swarm intelligence algorithm called the modified artificial bee colony (ABC) algorithm for diabetes classification. They used this modified algorithm to create an optimal fuzzy classifier by simultaneously searching for optimal fuzzy rules and membership functions on the basis of classification accuracy and high readability. The experimental results showed that the fuzzy classifier based on the modified ABC algorithm could be a helpful tool for diabetes diagnosis (Beloufa & Chikh, 2013). Jain and Raheja (2015) incorporated a fuzzy verdict mechanism into a fuzzy logic system for diabetes diagnosis. The mechanism was introduced to improve accuracy by deciding whether patients were suffering from diabetes and to produce a comprehensive result. This research considered urine an important parameter for diabetes, and the obtained classification result was promising at 87.02 percent.
Other artificial intelligence methods have also been applied in medical diagnosis and diabetes prediction. Smith et al. (1988) proposed an ANN model for diabetes prediction; ANN is considered one of the most effective algorithms for data classification problems. The research was conducted on Pima Indians, a population at high risk of diabetes, and achieved a classification accuracy of 76 percent. Temurtas et al. (2009) used a multilayer ANN to solve the classification problem and obtained 79.62 percent accuracy. A multilayer perceptron (MLP) contains more than one layer in its structure, which helps it perform well. Kahramanli and Allahverdi (2008) developed a hybrid system consisting of an ANN and a fuzzy neural network. Two medical datasets (Cleveland heart disease and Pima Indian diabetes) were used to evaluate this hybridization, which improved the classification accuracy for diabetes to 84.24 percent. Jaganathan et al. (2007) proposed an improved quick reduct method for data pre-processing (i.e., reduction) and then applied swarm-based ACO algorithms for diabetes prediction. Classification accuracy improved to 76.58 percent compared with the original ACO algorithm. This pre-processing stage improved data quality by selecting the attributes most related to classification.
Mei et al. (2017) demonstrated personalized hypoglycemic medication classification for diabetes. The dataset came from an Electronic Health Records (EHR) repository in China and covered 21,796 patients with diabetes. This research used a hierarchical recurrent neural network and compared its performance with that of logistic regression; the proposed technique outperformed logistic regression. Saxena et al. (2014) applied the KNN classification algorithm to a dataset from the Stanford University repository comprising 11 patient attributes. Different K values were tested, and the best result, 75 percent prediction accuracy, was obtained with K = 5. Sisodia and Sisodia (2018) used three classification algorithms for diabetes diagnosis: Naive Bayes, SVM, and decision tree. The highest classification accuracy, 76.30 percent, was obtained with the Naive Bayes classifier. Perveen et al. (2016) used the traditional J48 decision tree with AdaBoost ensemble and bagging methods on a database collected by the Canadian Primary Care Sentinel Surveillance Network. The results showed that AdaBoost with decision tree performed better than bagging with decision tree. Another study was conducted on Egyptian patients with diabetes (Karim et al., 2016). It added a new factor, age, and aimed to classify whether people would develop the disease, as an early warning before they reached the critical phase. The decision tree classification algorithm achieved 84 percent accuracy; however, the result was not compared with other classification algorithms. Rahman et al. (2014) collected real data from 100 patients to predict diabetes types. A healthcare system consisting of the AdaBoost method with a random committee classifier was developed to provide a service for patients with diabetes. The system achieved 81.0 percent accuracy, and the stated future direction was to add a feedback mechanism to increase user satisfaction. Aishwarya and Anto (2014) proposed a hybrid of GA and extreme learning machine for a medical expert system, in which GA was used for feature selection. The accuracy of the proposed system was promising at 89.54 percent compared with existing results from previous studies. Kaur and Kumari (2019) developed five classification models for diabetes diagnosis: multifactor dimensionality reduction, KNN, ANN, SVM with a linear kernel (SVM-linear), and SVM with a radial basis function kernel. The research used R data tools on the Pima Indian dataset, and the best classification results were obtained with SVM-linear and KNN. For future research, the authors proposed using the Boruta wrapper algorithm for feature selection before building the diabetes prediction model. A recent study by Faniqul et al. (2019) considered new factors for early-stage diabetes detection. The dataset was collected from patients with diabetes at the Hospital of Sylhet in Bangladesh and contained information on 520 patients, including 16 main factors. Classification accuracy was investigated using three classifiers, namely random forest, logistic regression, and Naive Bayes, and random forest achieved the best accuracy on this dataset. Mishra et al. (2020) proposed a hybrid classification technique using a GA variant based on the multilayer perceptron, called enhanced and adaptive GA multilayer perceptron (EAGA-MLP). This work introduced a new fitness function called chromosome swapping and a mutation variant called Restrics mutate. EAGA-MLP outperformed other MLP variants, namely GA-MLP, E-GA-MLP, and A-GA-MLP.
In summary, the classification techniques used in the literature fall into two types, as summarized in Table 1. The first type consists of black-box models (e.g., SVM, ANN, KNN, random forest, logistic regression, Naive Bayes, and random committee). These techniques achieve high classification accuracy, but their classification models are not understandable. The second type (i.e., decision tree and fuzzy logic) produces understandable models but with lower classification accuracy. This weakness prevents the use of these classifiers for diabetes, where both comprehensibility and classification accuracy are needed. Thus, FURIA, an extension of the RIPPER rule learner, was introduced, and it outperformed the other classifiers of the second type (Hühn & Hüllermeier, 2009). In addition, the greedy hill climbing feature selection method can find the main factors and explain the relationships between them. Therefore, this research introduces a new FURIA with the greedy hill climbing feature selection method to diagnose diabetes. Testing was performed on a dataset of diabetes symptoms collected from the Hospital of Sylhet in Bangladesh by Faniqul et al. (2019). The data were collected directly from patients who had recently been diagnosed with diabetes or who did not have diabetes but showed some symptoms.

Proposed Hybrid Fuzzy Unordered Rule with Greedy Hill Climbing Feature Selection Method
A general framework of the proposed diabetes classification model is presented in Figure 1. The hybrid of the fuzzy unordered rule algorithm and the greedy hill climbing feature selection method was used to select the most relevant features from the data for diabetes diagnosis and to produce a simple classification model. The model has two main stages: i) feature selection, in which greedy hill climbing used a correlation-based filter to eliminate irrelevant features; and ii) classification, which was responsible for extracting patterns.

Figure 1
General Framework for the Proposed Diabetes Classification Model
The outline of the proposed classifier algorithm, which consists of 24 steps, is shown in Algorithm 1. [...] {υ1, υ2, υ3, ..., υn}, which determined the probability value of each attribute. The best attribute subset consisted of attributes highly correlated with the classification class (target class) and uncorrelated with each other. The third step measured the uncertainty and unpredictability in the classification model using entropy. The entropy of an attribute A and its conditional entropy given another variable X are given by Equations 1 and 2 (Venkatesh & Anuradha, 2019):

$$H(A) = -\sum_{i} P(a_i) \log_2 P(a_i) \tag{1}$$

$$H(A \mid X) = -\sum_{j} P(x_j) \sum_{i} P(a_i \mid x_j) \log_2 P(a_i \mid x_j) \tag{2}$$

where $H(A \mid X)$ is the entropy of A after observing another variable X.
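Equations 1 and 2 can be computed directly from value counts. The following is a minimal sketch, assuming discrete (e.g., Yes/No) attributes like those in the Sylhet dataset; the function names are illustrative and not taken from the authors' implementation. It also computes symmetric uncertainty, SU = 2[H(A) - H(A|X)] / [H(A) + H(X)], a normalized correlation commonly used by correlation-based filters; whether the proposed method uses exactly this measure is an assumption.

```python
import math
from collections import Counter


def entropy(values):
    """Equation 1: H(A) = -sum_i P(a_i) * log2 P(a_i)."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(a, x):
    """Equation 2: H(A | X) = sum_j P(x_j) * H(A | X = x_j)."""
    n = len(x)
    total = 0.0
    for x_value, count in Counter(x).items():
        subset = [a_i for a_i, x_i in zip(a, x) if x_i == x_value]
        total += (count / n) * entropy(subset)
    return total


def symmetric_uncertainty(a, x):
    """Normalized information gain in [0, 1]: 0 = independent, 1 = fully predictive."""
    h_a, h_x = entropy(a), entropy(x)
    if h_a + h_x == 0:
        return 0.0
    return 2.0 * (h_a - conditional_entropy(a, x)) / (h_a + h_x)


# Hypothetical toy data: one binary symptom and the class label.
polyuria = ["Yes", "Yes", "No", "No", "Yes", "No"]
label = ["Positive", "Positive", "Negative", "Negative", "Positive", "Negative"]
print(symmetric_uncertainty(polyuria, label))  # 1.0: the symptom fully predicts the class
```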
In the fourth step, the algorithm began iterating; the iteration terminated when either the limit on the number of iterations was reached or no improvement was achieved for a predefined number of iterations. The fifth and sixth steps searched the neighborhood for the worst attribute and removed it, creating a new subset derived from the current one. In the seventh step, the new list of features, denoted AttributesSet {}*, represented the set of features after the selected attribute was deleted. The eighth step checked whether deleting the attribute increased the quality (i.e., attribute fitness) of the current subset. In the ninth step, the new subset was saved as the current subset if and only if the quality increased, and the algorithm proceeded to the tenth step.
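The search in the fourth to ninth steps can be sketched as greedy backward elimination. This illustrative sketch assumes that subset quality is the standard correlation-based feature selection (CFS) merit, k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), computed from precomputed feature-class and feature-feature correlations (e.g., symmetric uncertainty). It also stops at the first non-improving step rather than after a predefined number of non-improving iterations, since a deterministic neighborhood scan would simply repeat. The correlation values are hypothetical.

```python
import math
from itertools import combinations


def cfs_merit(subset, r_class, r_pair):
    """CFS merit of a subset: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)).

    r_class[f]                 -- correlation of feature f with the class
    r_pair[frozenset({f, g})]  -- correlation between features f and g
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_class[f] for f in subset) / k
    pairs = [frozenset(p) for p in combinations(subset, 2)]
    mean_ff = sum(r_pair[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def hill_climb_backward(features, merit, max_iter=100):
    """Greedy hill climbing by backward elimination.

    Each iteration evaluates every subset obtained by deleting one attribute
    and moves to the best one only if it strictly improves the merit.
    """
    current = set(features)
    current_score = merit(current)
    for _ in range(max_iter):
        if len(current) <= 1:
            break
        best_subset, best_score = None, current_score
        for f in sorted(current):  # sorted for a deterministic tie-break
            candidate = current - {f}
            score = merit(candidate)
            if score > best_score:
                best_subset, best_score = candidate, score
        if best_subset is None:  # no neighbour improves: local optimum
            break
        current, current_score = best_subset, best_score
    return current, current_score


# Hypothetical correlations (e.g., symmetric uncertainty values).
r_class = {"polyuria": 0.8, "polydipsia": 0.7, "itching": 0.05}
r_pair = {frozenset(p): 0.1 for p in combinations(r_class, 2)}
subset, score = hill_climb_backward(r_class, lambda s: cfs_merit(s, r_class, r_pair))
print(sorted(subset))  # ['polydipsia', 'polyuria']
```

Removing the weakly correlated attribute raises the merit from about 0.82 to about 1.01, after which no single deletion helps, so the search stops at a local optimum.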
The fuzzy unordered rule classification stage began at the 12th step. The fuzzy unordered rule algorithm adopted the separate-and-conquer approach and is a modified version of the well-known repeated incremental pruning to produce error reduction (RIPPER) algorithm (Cohen, 1995). RIPPER, which learns patterns for rule construction, is an extension of the incremental reduced error pruning (IREP) rule induction classifier (Mohamed et al., 2012). RIPPER improves on IREP in several respects; for example, it can deal with multi-class classification problems (Hühn & Hüllermeier, 2010).
Because FURIA learns an unordered rule set, the classes are not ordered, so the default rule that ordered rule learners assign to the majority class is not used. In the 12th step, FURIA selected a specific class and created fuzzy rules for it on the basis of the list of covered training instances.
In the 13th step, fuzzification was conducted to create a fuzzy rule using the fuzzy logic concept: FURIA computed the fuzzification of the membership functions that gave the best purity. In addition, FURIA selected the terms added to the rule by maximizing the information gain (IG) criterion, which measures the improvement in rule performance for the specific class. IG was measured according to Equation 3 (Hühn & Hüllermeier, 2010):

$$IG = p_r \left( \log_2 \frac{p_r}{p_r + n_r} - \log_2 \frac{P}{P + N} \right) \tag{3}$$

where $p_r$ and $n_r$ represent the numbers of positive and negative instances, respectively, covered by the rule under construction. Similarly, $P$ and $N$ represent the numbers of positive and negative instances, respectively, covered by the default rule.
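Equation 3 translates directly into code. This sketch assumes the coverage counts are already available; the candidate antecedents and their counts in the usage lines are hypothetical.

```python
import math


def information_gain(p_r, n_r, p, n):
    """Equation 3: IG = p_r * (log2(p_r / (p_r + n_r)) - log2(p / (p + n))).

    p_r, n_r -- positive / negative training instances covered by the rule
    p, n     -- positive / negative instances covered by the default rule
    """
    if p_r == 0:
        return 0.0  # a rule covering no positive instance gains nothing
    return p_r * (math.log2(p_r / (p_r + n_r)) - math.log2(p / (p + n)))


# Hypothetical coverage counts for two candidate antecedents.
P, N = 200, 120
candidates = {"polyuria = Yes": (150, 10), "age >= 40": (190, 60)}
best = max(candidates, key=lambda name: information_gain(*candidates[name], P, N))
print(best)  # polyuria = Yes
```

The gain rewards both purity and coverage: the second candidate covers more positives, but the first is so much purer that it wins (about 87.7 versus 53.6).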
The 14th to 19th steps formed the main FURIA loop (while), which continued as long as instances of the selected class remained uncovered by its fuzzy rules. In these steps, if the error rate of a newly created rule was 0.5 or higher, the stopping criterion was set to true and the rule was deleted. In the 20th step, fuzzification was applied, starting with the antecedent whose fuzzification yielded the largest purity, and was repeated until all antecedents were fuzzified. Fuzzification plays a crucial role in classifying new instances, even when it does not change the purity on the training data. Because the set of training instances relevant to an antecedent may change in this step, purity was recalculated in each iteration. In the 21st step, all rules were evaluated by computing their confidence degrees on the basis of a certainty factor.
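The purity-driven fuzzification described above can be illustrated for a single one-sided antecedent of the form x >= b. This simplified sketch assumes that the crisp bound b becomes the core of a fuzzy set whose membership falls linearly to 0 at a support bound chosen from the training values below b. It also assumes that purity is the summed membership of positive instances divided by the summed membership of all instances. It omits details of the full algorithm, such as restricting the data to instances covered by the other antecedents and fuzzifying both interval bounds; the ages and labels are hypothetical.

```python
def membership_geq(x, support, core):
    """Fuzzy version of the crisp test x >= core.

    Membership is 1 at or above the core, 0 at or below the support, and
    rises linearly in between (support == core gives the crisp test).
    """
    if x >= core:
        return 1.0
    if x <= support:
        return 0.0
    return (x - support) / (core - support)


def purity(values, labels, support, core, positive):
    """Purity = p_I / (p_I + n_I), where p_I and n_I are the summed
    memberships of positive and negative instances."""
    p_i = n_i = 0.0
    for x, y in zip(values, labels):
        mu = membership_geq(x, support, core)
        if y == positive:
            p_i += mu
        else:
            n_i += mu
    return p_i / (p_i + n_i) if p_i + n_i > 0 else 0.0


def fuzzify_lower_bound(values, labels, core, positive):
    """Choose the support bound with the highest purity.

    Candidates are the crisp bound itself and every training value below it;
    a candidate replaces the current best only if purity strictly improves.
    """
    best_support = core
    best_purity = purity(values, labels, core, core, positive)
    for s in sorted({v for v in values if v < core}, reverse=True):
        pur = purity(values, labels, s, core, positive)
        if pur > best_purity:
            best_support, best_purity = s, pur
    return best_support, best_purity


# Hypothetical ages and labels for a crisp antecedent "age >= 40".
ages = [30, 36, 38, 40, 45, 50]
labels = ["Negative", "Positive", "Positive", "Negative", "Positive", "Positive"]
print(fuzzify_lower_bound(ages, labels, 40, "Positive"))  # support 30, purity about 0.773
```

Here the crisp rule has purity 2/3; extending the support down to 30 gives the positive cases at 36 and 38 partial membership, raising purity to 3.4/4.4.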
The rule stretching method was applied in the 22nd step to exploit the antecedents of the rule. This method involved two factors. For the first factor, each rule was treated as a list of antecedents <α1, α2, ..., αm> instead of a set of antecedents {α1, α2, ..., αm}, so that the order reflects the most important antecedents in the rule. The rule is generalized to the prefix <α1, α2, ..., αk>, where k ≤ m, k = j − 1, and αj is the first antecedent not satisfied by the query instance. The generalized rule was evaluated using Equation 4 (Hühn & Hüllermeier, 2009):

$$\frac{T_p + 1}{T_p + N + 2} \cdot \frac{k + 1}{m + 2} \tag{4}$$

where $T_p$ is the number of true positive instances and $N$ the number of negative instances covered by the generalized rule. The second factor accounted for the degree of generalization: overly short rules were penalized, because removing antecedents reduces a rule's relevance. The Laplace correction therefore determined the quality of the rule on the basis of the number of retained antecedents, giving preference to longer rules. In the 23rd step, pruning was applied to remove unnecessary antecedents from a rule, except when pruning would delete all antecedents from the rule and thereby create a default rule covering the whole set of instances.
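The stretching step can be sketched as follows. The sketch assumes that Equation 4 has the Laplace-corrected form (Tp + 1) / (Tp + N + 2) multiplied by the length factor (k + 1) / (m + 2), and, for brevity, it uses crisp Boolean antecedents instead of fuzzy ones; the rule, attribute names, and data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Instance = Dict[str, float]
Antecedent = Callable[[Instance], bool]


def stretch(rule: Sequence[Antecedent], query: Instance) -> List[Antecedent]:
    """Keep alpha_1..alpha_k, the antecedents before the first one (alpha_j) the query fails."""
    kept: List[Antecedent] = []
    for antecedent in rule:
        if not antecedent(query):
            break
        kept.append(antecedent)
    return kept


def stretched_score(rule: Sequence[Antecedent], query: Instance,
                    data: Sequence[Instance], labels: Sequence[str], target: str) -> float:
    """Assumed Equation 4: (Tp + 1) / (Tp + N + 2) * (k + 1) / (m + 2)."""
    m = len(rule)
    kept = stretch(rule, query)
    k = len(kept)
    covered = [y for x, y in zip(data, labels) if all(a(x) for a in kept)]
    tp = sum(1 for y in covered if y == target)
    neg = len(covered) - tp
    return (tp + 1) / (tp + neg + 2) * (k + 1) / (m + 2)


# Hypothetical rule "IF polyuria = 1 AND age >= 40 THEN Positive".
rule = [lambda x: x["polyuria"] == 1, lambda x: x["age"] >= 40]
data = [{"polyuria": 1, "age": 50}, {"polyuria": 1, "age": 30},
        {"polyuria": 0, "age": 45}, {"polyuria": 1, "age": 60}]
labels = ["Positive", "Positive", "Negative", "Negative"]
query = {"polyuria": 1, "age": 35}  # satisfies alpha_1 but not alpha_2, so k = 1
print(stretched_score(rule, query, data, labels, "Positive"))  # 0.3 = (3/5) * (2/4)
```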

EVALUATION OF THE PROPOSED METHOD
This section describes the research methodology of this study. The state-of-the-art diabetes classification algorithms are explained first, followed by the parameters and their values used in the algorithms. Lastly, the datasets and their characteristics are described. Figure 2 shows the main steps for developing a classification model.

State-of-the-art Diabetes Classification Algorithms
Classification assigns each data case to one of a set of predefined classes. It is a data mining task that aims to predict the target class of each item accurately. The classification algorithms most commonly used for diabetes prediction are briefly described below.
The Naive Bayes classification algorithm, a statistical classification method based on Bayes' theorem, estimates the probability that a case belongs to the target class (Faniqul et al., 2019; Irfan et al., 2018). Support vector machine (SVM) is a supervised classification algorithm that analyzes a set of input cases and generates non-probabilistic binary linear models, which it uses to recognize patterns that predict the target class (Aydin et al., 2011; Barakat et al., 2010). The multilayer perceptron neural network (MLPNN) is a feedforward ANN and a nonlinear statistical data modeling method. It can find patterns in the data by modeling complex relationships between inputs and outputs (Temurtas et al., 2009; Tkáč & Verner, 2015).
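To make the Naive Bayes description concrete, here is a minimal from-scratch sketch for categorical attributes, such as the Yes/No symptoms in the diabetes dataset, with add-one (Laplace) smoothing. It is an illustration only, not the Weka implementation used in the experiments, and the toy rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    domains = defaultdict(set)     # attribute index -> values seen in training
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(y, i)][v] += 1
            domains[i].add(v)
    return priors, counts, domains


def predict_proba(model, row):
    """Posterior P(class | row) under the naive independence assumption,
    with add-one (Laplace) smoothing of the conditional probabilities."""
    priors, counts, domains = model
    n = sum(priors.values())
    log_post = {}
    for c, n_c in priors.items():
        score = math.log(n_c / n)
        for i, v in enumerate(row):
            score += math.log((counts[(c, i)][v] + 1) / (n_c + len(domains[i])))
        log_post[c] = score
    # Normalize in log space to avoid underflow with many attributes.
    top = max(log_post.values())
    z = sum(math.exp(s - top) for s in log_post.values())
    return {c: math.exp(s - top) / z for c, s in log_post.items()}


# Hypothetical rows of (polyuria, polydipsia) answers.
rows = [("Yes", "Yes"), ("Yes", "No"), ("No", "No"), ("No", "No")]
labels = ["Positive", "Positive", "Negative", "Negative"]
model = train_naive_bayes(rows, labels)
print(predict_proba(model, ("Yes", "Yes")))  # Positive about 0.857 (= 6/7)
```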
Decision tree is a popular data mining method that predicts the target class from multiple input variables. In a decision tree model, each internal node corresponds to an input variable, and each edge leaving a node corresponds to a possible value of that variable. Each leaf represents a value of the target variable, and each decision pattern corresponds to a path from the root to a leaf (Hssina et al., 2014; Perveen et al., 2016). KNN classifies data cases on the basis of similar cases that were previously classified. It determines the target class of new data by examining its K most similar neighbors and assigning their majority class (Kaur & Kumari, 2019; Liao & Vemuri, 2002).
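The KNN rule can likewise be sketched in a few lines. Using the number of mismatching attribute values (Hamming distance) is an assumption that suits categorical symptoms; the default k and the toy vectors are arbitrary, and ties in distance are broken by training order.

```python
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=5):
    """Assign the majority class among the k training cases nearest to query.

    Distance = number of attributes whose values differ (Hamming distance);
    equal distances are ordered by position in the training set.
    """
    by_distance = sorted(
        (sum(a != b for a, b in zip(row, query)), idx)
        for idx, row in enumerate(train_rows)
    )
    votes = Counter(train_labels[idx] for _, idx in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical binary symptom vectors (1 = Yes, 0 = No).
train = [(1, 1, 0), (1, 1, 1), (1, 0, 1), (0, 0, 0), (0, 0, 1)]
labels = ["Positive", "Positive", "Positive", "Negative", "Negative"]
print(knn_predict(train, labels, (0, 0, 0), k=3))  # Negative (2 Negative votes, 1 Positive)
```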

Common Parameters
The input parameter values used for all classification algorithms in the diabetes experiments are shown in Table 2.

Dataset Details
The diabetes dataset was obtained from patient information at the Hospital of Sylhet in Bangladesh. This dataset has been used to investigate patients with early-stage diabetes according to the World Health Organization criteria. The dataset consists of 520 cases with 16 attributes (i.e., features) and is divided into positive and negative classes. The target of this research was to identify the main attributes (factors) expected to be highly related to the hidden development of diabetes. The details of the dataset are shown in Table 3, including the name of each feature, the number of values of each attribute, the number of values in each class, and the attribute types. Further experiments were performed on nine small and medium-sized UCI medical benchmark datasets in Weka's ARFF format to test the performance of the proposed classifier. These datasets are popular in the medical literature; they have between 9 and 19 attributes and between 106 and 768 instances, and they include both binary and multi-class problems. The main characteristics of the medical benchmark datasets are listed in Table 4.

Results and Discussion
This section evaluates the performance of the proposed classification algorithm. First, it presents the experimental methods, the evaluation criteria, and the diabetes classification algorithms used for comparison. Second, experiments were conducted to examine how the values of the input parameters of the proposed technique affect the classification results. Finally, the results were compared with those of baseline classification algorithms for diabetes diagnosis. The well-known k-fold cross-validation test was used to partition the dataset into ten parts (Gupta et al., 2016; Hairuddin et al., 2020). In each test, one part was used as the testing set and the remaining parts as the training set. The test was repeated ten times, with a different part used for testing each time, and the average classification performance over the ten runs indicated the performance of the classifier. For a more robust analysis, six evaluation criteria were used: true positive (TP) rate, false positive (FP) rate, precision, recall, F-measure, and accuracy. In addition, the input parameters of the algorithm are as follows:
- Folds: determines the number of instances used for pruning the fuzzy rules.
- MinNo: the minimum total weight of the instances covered by each discovered rule.
- Optimizations: the number of optimization runs.
- Uncovered instance methods: the methods for dealing with uncovered instances.
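The 10-fold protocol and the six criteria can be sketched as below. This assumes a binary problem and a user-supplied fit_predict function (a hypothetical interface standing in for any classifier). It uses contiguous folds for brevity, whereas a real run would normally shuffle and stratify the data first, and it averages per-fold results as described above. Note that the TP rate and recall are the same quantity, TP / (TP + FN).

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true, y_pred, positive="Positive"):
    """The six criteria: TP rate, FP rate, precision, recall, F-measure, accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "tp_rate": recall,  # TP rate and recall are both TP / (TP + FN)
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": (tp + tn) / len(pairs),
    }


def cross_validate(rows, labels, fit_predict, k=10, positive="Positive"):
    """Average each criterion over k folds.

    fit_predict(train_rows, train_labels, test_rows) -> list of predicted labels
    is a hypothetical interface standing in for any classifier.
    """
    totals = {}
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        preds = fit_predict([rows[i] for i in train_idx],
                            [labels[i] for i in train_idx],
                            [rows[i] for i in test_idx])
        fold = binary_metrics([labels[i] for i in test_idx], preds, positive)
        for name, value in fold.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: value / k for name, value in totals.items()}
```

With the 520 Sylhet cases, k_fold_indices(520, 10) yields ten folds of 52 cases each.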
The folds parameter determines the number of instances used for pruning the fuzzy rules. In the experiments, this parameter changed the classification accuracy by at most 1 percent. Figure 3 shows the results for different numbers of folds; the best value was obtained with an intermediate number of folds. Figure 4 presents the classification accuracy when the minimum total weight of the instances in a rule (minNo) was set to each value from 1 to 10, with the folds parameter fixed at 5. The best classification result was obtained with minNo = 2.

Figure 3
Instance Numbers Used for Pruning the Fuzzy Rule

Figure 4
Minimum Total Weight of the Instances in a Rule

Figure 5 depicts the effect of the number of optimization runs. This parameter mainly controls the learning process and determines how the execution time grows during learning. In this experiment, the folds parameter was fixed at 5 and minNo at 2. The best classification result was obtained with four optimization runs. Figure 6 portrays the classification results of the three methods for handling uncovered instances: rule stretching, voting for the most frequent class, and rejecting the decision (abstaining). In this experiment, the folds, minNo, and optimizations parameters were fixed at 5, 2, and 4, respectively. The best result was obtained by voting for the most frequent class. The choice of method changed the classification accuracy by at most 1.73 percent.

Figure 5
Number of Optimizations Run

Figure 6
Methods Performed for Uncovered Instances

This section compares the results of the hybrid classifier with those of the state-of-the-art classification algorithms, namely Naive Bayes, SVM, MLPNN, KNN, and decision tree. An experiment on early-warning diabetes detection using data from the Hospital of Sylhet in Bangladesh was conducted for all classification algorithms. In the first evaluation stage, Table 5 reports the average prediction performance in 10-fold cross-validation, with the best result for each criterion shown in bold. As Table 5 shows, the proposed method was the best among all classifiers in all evaluation criteria: it achieved the highest TP rate, precision, recall, F-measure, and classification accuracy and the lowest FP rate. Furthermore, the proposed method identified 11 significant factors out of the 16 available in the dataset: age, gender, polyuria, polydipsia, sudden weight loss, polyphagia, itching, irritability, delayed healing, muscle stiffness, and alopecia. The best results and the discovered factors were achieved by using the greedy hill climbing feature selection method, which uses information theory (i.e., entropy) to find the most effective factors for diabetes. Thus, the proposed technique was more suitable for diabetes diagnosis than the other classifiers. In addition, Figure 7 shows the fuzzy rules obtained in this experiment for diabetes detection.

Figure 7
Example of Fuzzy Classification Model Constructed by the Proposed Method
To further assess the proposed hybrid classifier, the commonly used classification techniques were also evaluated on the other medical benchmark datasets. Table 6 shows that the proposed hybrid classifier performed better than MLPNN, KNN, and decision tree on all datasets, better than SVM on seven datasets, and better than Naive Bayes on six of the nine datasets. Compared with all classifiers, the proposed hybrid classifier achieved the highest results on five datasets and the second-best performance on four datasets. Naive Bayes was the second-best classifier, achieving the best results on three datasets, and SVM achieved the best results on two datasets. MLPNN, KNN, and decision tree obtained the lowest results across the datasets.
In terms of the overall average ranking of classification accuracy in Table 6, the proposed classifier performed best. This can be attributed to combining the greedy hill climbing feature selection method, which chooses the most relevant features, with fuzzy logic, which together increased the classification accuracy.
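An overall average ranking of this kind can be computed as below. The accuracies in the usage lines are hypothetical, not the values in Table 6, and letting tied accuracies share the average of their rank positions is an assumption about how ties are handled.

```python
def average_ranks(accuracy_table):
    """Mean rank of each classifier over datasets (rank 1 = most accurate).

    accuracy_table maps dataset -> {classifier: accuracy}; classifiers with
    equal accuracy on a dataset share the average of the tied rank positions.
    """
    totals = {}
    for scores in accuracy_table.values():
        ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
            for name, _ in ordered[i:j + 1]:
                totals[name] = totals.get(name, 0.0) + shared_rank
            i = j + 1
    return {name: total / len(accuracy_table) for name, total in totals.items()}


# Hypothetical accuracies for two datasets (not the values in Table 6).
table = {
    "dataset_1": {"Proposed": 0.90, "NaiveBayes": 0.80, "SVM": 0.80},
    "dataset_2": {"Proposed": 0.70, "NaiveBayes": 0.75, "SVM": 0.60},
}
print(average_ranks(table))  # {'Proposed': 1.5, 'NaiveBayes': 1.75, 'SVM': 2.75}
```

A lower mean rank is better; ranking per dataset keeps one dataset with unusually high or low accuracies from dominating the comparison.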

CONCLUSION
Diabetes detection is a serious real-world medical problem, and early-stage detection plays an important role in treatment. In this research, a new hybrid classifier based on feature selection was proposed for diabetes classification. The proposed classifier comprises two main steps: a) the greedy hill climbing feature selection method; and b) generation of a fuzzy unordered rule list for diabetes pattern classification. The simulation results showed that the proposed hybrid classifier can deal with diabetes diagnosis problems at an early stage. The proposed classifier also selected good features that improved the classification accuracy, demonstrating the importance of feature selection in diabetes detection and showing that classification performance improved when feature selection was applied. For validation, the proposed classifier outperformed other state-of-the-art classifiers, namely Naive Bayes, SVM, MLPNN, KNN, and decision tree. In future research, stochastic local search algorithms (e.g., variable neighborhood search, guided local search, and iterated local search) together with swarm intelligence algorithms (e.g., particle swarm optimization, ACO, and ABC) could be used to improve the feature selection method, increase the classification rate, and reduce the error rate. Another research direction is to collect additional information to discover new potential factors to be integrated.