Local Search-based Enhanced Multi-objective Genetic Algorithm and Its Application to the Gestational Diabetes Diagnosis

In evolutionary computation, several multi-objective genetic algorithms (MOGAs) have been widely used to solve multi-objective optimization problems (MOOPs). NSGA-II, developed by Deb et al., is a widely used package that applies a population-based genetic algorithm to optimization problems with multiple objectives subject to constraints. This study proposes an enhanced version of NSGA-II, termed LS-EMOGA herein. LS-EMOGA replaces the crossover and mutation operators of the original NSGA-II with an extended intermediate crossover and a non-uniform mutation. It also incorporates a local search (LS) procedure to improve the fine-tuning ability of the solution search. The performance of the proposed LS-EMOGA is assessed on five benchmark MOOPs, and the computed solutions are compared with those obtained using NSGA-II and using the proposed MOGA without the local search procedure (the EMOGA version). Moreover, the proposed LS-EMOGA is combined with a k-means clustering algorithm and applied to the diagnosis of gestational diabetes.

Among these algorithms, the GA is widely applied to multi-objective optimization problems (MOOPs) because it is simple to implement. Generally speaking, the GA can be applied to problems with single or multiple objective functions. A single-objective optimization problem has only one global optimum, whereas an MOOP has a set of optimal solutions, called non-dominated solutions. As shown in Fig. 1, the optimal solution marked in the figure can be obtained with a single-objective optimization algorithm by optimizing the objective function $f_1$ alone. In multi-objective optimization, by contrast, demanding a higher accuracy at the optimum incurs a higher cost, so the objectives of predicted accuracy ($f_1$) and cost ($f_2$) clearly conflict with each other. If a threshold of acceptable accuracy ($\mathrm{Accept}_{\mathrm{threshold}}$) is set as a constraint, a Pareto front can be formed between points "p" and "q". Each solution on the Pareto front is optimal and represents a tradeoff between the objectives $f_1$ and $f_2$. The most popular version of MOGA is NSGA-II [5]. It contains a fast non-dominated sorting procedure with a computational complexity of $O(MN^2)$ instead of $O(MN^3)$, where M is the number of objectives and N is the population size. NSGA-II also includes a tournament selection operator based on a crowded comparison to obtain a uniform distribution of non-dominated solutions. To improve the performance of NSGA-II, Liu et al. [6] introduced an enhanced MOGA that adopts Deb et al.'s fast non-dominated sorting and tournament selection operator. It combines these with an extended intermediate crossover and a non-uniform mutation to obtain non-dominated solutions with good diversity and a uniform spread along the approximated Pareto front. In experiments on test cases and on homogeneous and heterogeneous base station placement problems, this enhanced MOGA gave better results than NSGA-II.
Basically, the GA is a gradient-free search method and hence has good potential for global search. However, it is somewhat ineffective at local search, so the computed solution is often trapped in a local optimum. Therefore, Do et al. [7], Sindhya et al. [8] and Soam et al. [9] incorporated three different kinds of local search techniques into their evolutionary algorithms to improve convergence speed and solution accuracy. Their experiments showed that a local search procedure helps convergence toward a more accurate Pareto-optimal front. Accordingly, this study adds an iterative local search procedure to EMOGA, termed LS-EMOGA herein, and combines it with a k-means clustering algorithm for the diagnosis of gestational diabetes.

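A. Mathematical Formulations of Multi-objective Optimization Problem
Generally, a multi-objective optimization problem can be formulated as the minimization of the objective vector

$$\min_{\mathbf{x}} \; \mathbf{F}(\mathbf{x}) = \left[f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_M(\mathbf{x})\right]^{T}$$

where M is the number of objectives. In the minimization process, if solutions $\mathbf{x}_a$ and $\mathbf{x}_b$ satisfy the following two conditions,

(1) $f_i(\mathbf{x}_a) \le f_i(\mathbf{x}_b)$ for all $i = 1, 2, \ldots, M$, and
(2) $f_j(\mathbf{x}_a) < f_j(\mathbf{x}_b)$ for at least one $j \in \{1, 2, \ldots, M\}$,

we say that "$\mathbf{x}_a$ dominates $\mathbf{x}_b$". As the optimization proceeds, dominated solutions are gradually replaced by non-dominated ones. When the optimal solutions $\mathbf{x}^{*}$ are found, i.e., at the optimal state, they form an estimated Pareto-optimal front.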
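The two dominance conditions, and the way solutions are separated into successive non-dominated fronts, can be illustrated with a short Python sketch. It follows the fast non-dominated sorting bookkeeping of Deb et al. [5] for minimization. The function names and the four example objective vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    """True if objective vector fa dominates fb (minimization): fa is no
    worse in every objective and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(fa, fb))
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better


def fast_non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Split solutions into fronts F1, F2, ... (indices, best front first)."""
    n = len(objs)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # solutions p dominates
    dom_count = [0] * n                                     # how many dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_by[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)           # rank 1: nobody dominates p
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:     # all dominators already ranked
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]                    # drop the trailing empty front


# Hypothetical (f1, f2) values for four candidate solutions
objs = [(0.2, 5.0), (0.3, 3.0), (0.25, 6.0), (0.4, 4.0)]
print(fast_non_dominated_sort(objs))  # → [[0, 1], [2, 3]]
```

Solutions 0 and 1 trade off against each other and form the first front; solution 2 is dominated by 0 and solution 3 by 1, so they form the second front.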

B. Objective Functions for the Medical Diagnosis
This study uses the GA to find the best possible attribute weights $\mathbf{w}$ and then applies these weights in the Manhattan-distance similarity computation of the k-means algorithm. The k-means clustering divides the given data set into k clusters. When the disease data set has n attributes, the similarity between a sample $\mathbf{x}_i$ and the centroid of the j-th cluster, $\mathbf{x}_{c_j}$, can be expressed as

$$d(\mathbf{x}_i, \mathbf{x}_{c_j}) = \sum_{l=1}^{n} w_l \,\lvert x_{i,l} - x_{c_j,l} \rvert$$

The classification errors and false negatives produced by k-means are then fed back to the GA as two objective functions, which are evolved to find the best solution $\mathbf{w}_{opt}$. The objectives used in this study are (1) minimizing the classification error between the predicted class $C_{pred}$ and the actual class $C_{actual}$, and (2) minimizing the number of false negatives (fn), i.e., the number of patients who have the disease but are diagnosed as not having it.
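A minimal Python sketch of the weighted Manhattan similarity and of the two objectives follows. Several details are assumptions for illustration rather than details stated in the text: the 0/1 class encoding (1 = diseased), expressing the classification error as a fraction of misclassified cases, and the function names. The text only states that the classification error and the number of false negatives are minimized.

```python
from typing import Sequence, Tuple


def weighted_manhattan(x: Sequence[float], c: Sequence[float],
                       w: Sequence[float]) -> float:
    """Weighted Manhattan distance between sample x and centroid c,
    with one GA-evolved weight per attribute in w."""
    return sum(wl * abs(xl - cl) for wl, xl, cl in zip(w, x, c))


def diagnosis_objectives(pred: Sequence[int],
                         actual: Sequence[int]) -> Tuple[float, int]:
    """Return (classification error, false negatives).
    Assumed encoding: 1 = has the disease, 0 = healthy. A false negative is
    a diseased case (actual 1) predicted as healthy (pred 0)."""
    errors = sum(p != a for p, a in zip(pred, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(pred, actual))
    return errors / len(actual), fn


print(weighted_manhattan([1.0, 4.0], [2.0, 1.0], [0.5, 0.1]))
print(diagnosis_objectives([1, 0, 0, 1], [1, 1, 0, 0]))  # → (0.5, 1)
```

Both values would be returned to the GA as the objective vector of one weight vector $\mathbf{w}$.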
 

A. Enhanced Multi-objective Genetic Algorithm
As mentioned above, non-dominated sorting is essential for building the Pareto front. This study adopts the procedure of Deb et al. [5] to obtain each individual's rank $i_{rank}$. After all individuals have been sorted, the crowding distance $i_{distance}$ of each non-dominated solution is computed from the sorted values of each objective function [5]. A tournament selection operator based on the crowded comparison is then applied: between two solutions i and j, the one with the lower rank (better fitness) is selected, and when both lie on the same front, the one with the larger crowding distance is selected.
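The crowding distance and the crowded-comparison rule can be sketched in Python as follows, following the scheme of Deb et al. [5]. Boundary solutions of a front receive an infinite distance, and interior solutions accumulate the normalized gap between their two neighbours in each objective. The function names and the example front are illustrative assumptions.

```python
from typing import List, Sequence


def crowding_distance(front: List[Sequence[float]]) -> List[float]:
    """Crowding distance of every member of one non-dominated front."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(front[0])):                 # loop over objectives
        order = sorted(range(n), key=lambda i: front[i][m])
        f_min, f_max = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep the extremes
        if f_max == f_min:
            continue                                # no spread in this objective
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (f_max - f_min)
    return dist


def crowded_better(rank_i: int, dist_i: float, rank_j: int, dist_j: float) -> bool:
    """Crowded comparison: prefer the lower rank; on the same front,
    prefer the larger crowding distance (the less crowded solution)."""
    return rank_i < rank_j or (rank_i == rank_j and dist_i > dist_j)


front = [(0.0, 1.0), (0.2, 0.6), (0.5, 0.4), (1.0, 0.0)]
print(crowding_distance(front))
```

In a binary tournament, `crowded_better` decides which of the two sampled individuals enters the mating pool.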
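The process described above also includes an elitist strategy to ensure that the best individuals survive into the next generation. Crossover is then performed after the tournament selection. This study adopts an extended intermediate crossover [10], expressed as

$$\mathbf{x}_{\mathrm{child}} = \mathbf{x}_{p1} + \alpha\,(\mathbf{x}_{p2} - \mathbf{x}_{p1})$$

where α is a random parameter in $[-d, 1+d]$. The offspring are then altered by a non-uniform mutation:

$$x_k' = \begin{cases} x_k + \Delta(t, UB_k - x_k), & \text{if a random binary digit is } 0 \\ x_k - \Delta(t, x_k - LB_k), & \text{otherwise} \end{cases} \qquad \Delta(t, y) = y\left(1 - r^{(1 - t/T)^{b}}\right)$$

where $UB_k$ and $LB_k$ are the upper and lower bounds of the k-th variable; t and r represent the generation number and a random value in [0, 1], respectively; T is the maximum number of generations; and b (b = 0.9 in this study) is a parameter determining the degree of non-uniformity.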
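The two variation operators can be sketched in Python as below. The crossover draws α independently for each gene from $[-d, 1+d]$, and the mutation uses the non-uniform perturbation with t, T, r and b. Several choices here are assumptions not taken from the text. The per-gene α, d = 0.25 in the usage line, and the coin flip that picks the mutation direction are assumed. Every gene is mutated here, whereas in the GA mutation would be triggered with probability $p_m$. No bound repair is applied after crossover.

```python
import random
from typing import List


def extended_intermediate_crossover(p1: List[float], p2: List[float],
                                    d: float, rng: random.Random) -> List[float]:
    """Child gene k = p1[k] + alpha_k * (p2[k] - p1[k]), alpha_k ~ U[-d, 1 + d].
    A child may fall slightly outside the segment between its parents;
    no bound repair is done in this sketch."""
    return [a + rng.uniform(-d, 1.0 + d) * (b - a) for a, b in zip(p1, p2)]


def nonuniform_mutation(x: List[float], lower: List[float], upper: List[float],
                        t: int, T: int, b: float, rng: random.Random) -> List[float]:
    """Non-uniform mutation: delta(t, y) = y * (1 - r ** ((1 - t / T) ** b)),
    so the perturbation shrinks as generation t approaches the maximum T."""
    def delta(y: float) -> float:
        r = rng.random()                    # r in [0, 1)
        return y * (1.0 - r ** ((1.0 - t / T) ** b))

    child = []
    for xk, lo, hi in zip(x, lower, upper):
        if rng.random() < 0.5:              # coin flip picks the direction
            child.append(xk + delta(hi - xk))   # step toward the upper bound
        else:
            child.append(xk - delta(xk - lo))   # step toward the lower bound
    return child


rng = random.Random(0)
child = extended_intermediate_crossover([0.2, 0.8], [0.6, 0.4], d=0.25, rng=rng)
mutant = nonuniform_mutation(child, [0.0, 0.0], [1.0, 1.0], t=10, T=20, b=0.9, rng=rng)
print(child, mutant)
```

At t = T the exponent $(1 - t/T)^b$ becomes 0, so Δ vanishes and the mutation leaves the individual unchanged. This is the fine-tuning behaviour the non-uniform operator is designed for.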

B. Local Search Procedure
As mentioned in the Introduction, incorporating a local search procedure into an evolutionary algorithm can improve both the convergence speed and the solution accuracy [7] [8] [9] [11]. Accordingly, this study incorporates an iterative local search procedure, the feasible direction method, into EMOGA to enhance solution accuracy. The feasible direction method is chosen because it usually converges rapidly to a near-optimum design. The calculation is performed iteratively as follows:

$$\mathbf{x}_{i+1} = \mathbf{x}_i + \alpha^{*}\mathbf{S}_i \qquad (10)$$

Here, the subscript i denotes the iteration number, the vector $\mathbf{S}$ is the search direction in the n-dimensional design space, and $\alpha^{*}$ is the step length that determines the amount of movement along the search direction. In Eq. (10), the search direction must be both usable and feasible. Hence, with f the objective function being minimized and $g_j$ the active constraints, $\mathbf{S}_i$ should satisfy

$$\nabla f(\mathbf{x}_i)^{T}\mathbf{S}_i \le 0 \quad \text{(usability)}$$
$$\nabla g_j(\mathbf{x}_i)^{T}\mathbf{S}_i \le 0 \quad \text{for each active constraint } j \quad \text{(feasibility)}$$

The golden-section method is then applied to find the step length that minimizes the objective function along $\mathbf{S}_i$. After the local search, some individuals attain better fitness values. Accordingly, the parents, the offspring, and these improved individuals are all placed into the sorting pool for non-dominated sorting, so that the next generation is selected from all three groups, as shown in Fig. 2. The improved individuals are often selected as part of the next generation.
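The line search and the update of Eq. (10) can be sketched in Python. This is a simplified, unconstrained sketch and not the full feasible direction method. The search direction is taken as $\mathbf{S}_i = -\nabla f(\mathbf{x}_i)$, which always satisfies the usability condition, and no active constraints are handled. The step-length bracket [0, 1] is assumed, and the quadratic test function and all names are hypothetical. The golden-section search finds the step length (written α* in the code comments) along $\mathbf{S}_i$.

```python
import math
from typing import Callable, List

Vector = List[float]


def golden_section(phi: Callable[[float], float], a: float, b: float,
                   tol: float = 1e-6) -> float:
    """Minimize a unimodal 1-D function phi on [a, b] by golden-section search."""
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0   # ~0.618
    c, d = b - inv_gr * (b - a), a + inv_gr * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):                 # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_gr * (b - a)
        else:                               # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_gr * (b - a)
    return (a + b) / 2.0


def local_search_step(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
                      x: Vector, alpha_max: float = 1.0) -> Vector:
    """One update x_{i+1} = x_i + alpha* S_i with S_i = -grad f(x_i)
    (usable direction; constraint feasibility is not handled here)."""
    s = [-g for g in grad(x)]
    if all(abs(v) < 1e-12 for v in s):
        return list(x)                      # stationary point: no usable direction

    def phi(alpha: float) -> float:         # objective along the search direction
        return f([xi + alpha * si for xi, si in zip(x, s)])

    alpha_star = golden_section(phi, 0.0, alpha_max)
    return [xi + alpha_star * si for xi, si in zip(x, s)]


def quad(x: Vector) -> float:               # hypothetical test objective
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


def quad_grad(x: Vector) -> Vector:
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]


x = [0.0, 0.0]
for _ in range(20):
    x = local_search_step(quad, quad_grad, x)
print(x)                                    # approaches the minimizer (1, -0.5)
```

In LS-EMOGA such a step would be applied to selected individuals, and the improved individuals would then join the sorting pool together with parents and offspring.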

C. The Combination of LS-EMOGA and K-means Clustering Algorithm
The data set in this study is divided into two groups (k = 2): cases with and without gestational diabetes. This work combines EMOGA and the k-means algorithm for the diabetes diagnosis, and the computing procedure of LS-EMOGA with the k-means clustering algorithm is displayed in Fig. 3.
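The clustering step can be sketched as a k-means loop whose assignment uses the GA-evolved weights inside a Manhattan distance. This is a minimal sketch under assumptions that are not details given in the text. The function name `weighted_kmeans`, the mean-based centroid update, the caller-supplied initial centroids, and the "assignments unchanged" stopping test are all our choices. How the two clusters are mapped to the "with" and "without" disease classes is also left out.

```python
from typing import List, Sequence, Tuple


def weighted_kmeans(data: Sequence[Sequence[float]], w: Sequence[float],
                    init: Sequence[Sequence[float]],
                    max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Cluster `data` around len(init) centroids (k = 2 in the diagnosis case).
    Assignment uses the weighted Manhattan distance; centroids are updated as
    cluster means (assumed update rule)."""
    def dist(x: Sequence[float], c: Sequence[float]) -> float:
        return sum(wl * abs(xl - cl) for wl, xl, cl in zip(w, x, c))

    centroids = [list(c) for c in init]    # copy so the caller's list is untouched
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: dist(x, centroids[j]))
                      for x in data]
        if new_labels == labels:
            break                          # assignments unchanged: converged
        labels = new_labels
        for j in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:                    # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-attribute data: the weights decide which attribute drives clustering
data = [[0.0, 10.0], [0.0, 0.0], [1.0, 10.0], [1.0, 0.0]]
print(weighted_kmeans(data, [1.0, 0.0], [[0.0, 5.0], [1.0, 5.0]])[0])   # → [0, 0, 1, 1]
print(weighted_kmeans(data, [0.0, 1.0], [[0.5, 0.0], [0.5, 10.0]])[0])  # → [1, 0, 1, 0]
```

The two calls show why evolving $\mathbf{w}$ matters. The same data split along different attributes depending on the weights, which in turn changes the classification error and false-negative count returned to the GA.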

A. Performance Evaluation of the Proposed LS-EMOGA
First, five benchmark cases, ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6 [5], are used to evaluate the performance of the proposed LS-EMOGA. Their objective functions and dimensions (n) are listed in Table I. The population size and the number of function evaluations were 100 and 2000, respectively, and the crossover and mutation probabilities were $p_c = 0.9$ and $p_m = 0.1$. The five problems were solved using NSGA-II, EMOGA and the proposed LS-EMOGA, and the results are shown in Fig. 4. The solutions obtained by all three algorithms were very close to the true Pareto fronts for ZDT1 to ZDT4 (Figs. 4(a)-(d)). However, for ZDT6 the NSGA-II solutions deviated from the true Pareto front (Fig. 4(e)), as was also reported for NSGA-II and PAES in [5].
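Table I is not reproduced in this text. As an illustration, the standard ZDT1 definition (two objectives, $x_i \in [0, 1]$, usually n = 30) can be coded as follows. Whether Table I uses exactly these settings is an assumption.

```python
import math
from typing import Sequence, Tuple


def zdt1(x: Sequence[float]) -> Tuple[float, float]:
    """ZDT1: f1 = x1, g = 1 + 9 * sum(x2..xn) / (n - 1), f2 = g * (1 - sqrt(f1 / g)).
    The true Pareto front f2 = 1 - sqrt(f1) is reached when x2..xn = 0."""
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2


print(zdt1([0.25] + [0.0] * 29))  # → (0.25, 0.5), a point on the true front
```

Any nonzero $x_2, \ldots, x_n$ raises g above 1 and pushes the point away from the true front. This is the gap that the proximity metric GD measures.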
To examine the differences in performance among the three MOGAs more closely, two performance metrics, GD and Δ, are used [5]. GD evaluates the proximity of the obtained non-dominated solutions to the true Pareto-optimal solutions, and Δ evaluates the uniformity of their spread. Table II lists the two metrics obtained using NSGA-II, EMOGA and LS-EMOGA for the five test cases over 20 repeated runs. Table II shows that, for most of the cases, LS-EMOGA converged closely to the true Pareto-optimal solutions with a good distribution of non-dominated solutions. For ZDT6, the mean values of both GD and Δ obtained using LS-EMOGA were clearly better than those obtained using NSGA-II. Overall, LS-EMOGA gave good GD values for four test cases (ZDT1, ZDT2, ZDT3, and ZDT6), whereas NSGA-II did so only for ZDT4. The spread and uniformity of the non-dominated solutions from the three algorithms were evaluated with the spread metric Δ, for which a lower value indicates better performance. Table II shows that LS-EMOGA outperforms NSGA-II and EMOGA on this metric for all the cases.
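The two metrics can be sketched in Python for two-objective fronts. GD is implemented here in the common form $\sqrt{\sum_i d_i^2}/n$, where $d_i$ is the distance from each obtained solution to the nearest point of a sampled true front. Δ follows Deb et al.'s form $(d_f + d_l + \sum_i |d_i - \bar d|)/(d_f + d_l + (N-1)\bar d)$. This exact GD form, and taking $d_f$ and $d_l$ as distances to the true front's extreme points, are assumptions, since the text only cites [5] for the definitions.

```python
import math
from typing import Sequence, Tuple

Point2 = Tuple[float, float]


def generational_distance(front: Sequence[Point2], true_front: Sequence[Point2]) -> float:
    """GD = sqrt(sum_i d_i^2) / n, d_i = distance to the nearest true-front sample."""
    d = [min(math.dist(p, q) for q in true_front) for p in front]
    return math.sqrt(sum(di * di for di in d)) / len(front)


def spread(front: Sequence[Point2], extreme_a: Point2, extreme_b: Point2) -> float:
    """Spread metric for two objectives; lower is better, and 0 means evenly
    spaced solutions that reach both extremes of the true front."""
    pts = sorted(front)                                  # order along f1
    gaps = [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    d_mean = sum(gaps) / len(gaps)
    d_f = math.dist(extreme_a, pts[0])                   # gap to one extreme
    d_l = math.dist(extreme_b, pts[-1])                  # gap to the other extreme
    num = d_f + d_l + sum(abs(g - d_mean) for g in gaps)
    return num / (d_f + d_l + len(gaps) * d_mean)


# Hypothetical obtained front compared with a sampled ZDT1 true front
true_front = [(i / 100, 1 - math.sqrt(i / 100)) for i in range(101)]
obtained = [(0.0, 1.05), (0.25, 0.52), (1.0, 0.01)]
print(generational_distance(obtained, true_front),
      spread(obtained, true_front[0], true_front[-1]))
```

Averaging these values over the 20 repeated runs gives the kind of mean values compared in Table II.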

B. Medical Diagnosis of Gestational Diabetic Disease
In this study, the "pima-indians-diabetes" data set from the UCI repository [12] was used for the medical diagnosis. A randomly chosen 614 cases (80% of the 768 cases) were used as the training set and the remaining 154 cases as the test set. Two objective functions were minimized for the gestational diabetes diagnosis. The first is the classification inaccuracy $f_1(\mathbf{w})$, subject to $\mathrm{Accept}_{\mathrm{threshold}} = 60\%$. The second is the number of false negatives $f_2(\mathbf{w})$. The diagnostic results are reported using the performance parameters accuracy (acc), sensitivity (true positive rate, tpr), specificity (true negative rate, tnr), F-measure (f-m), cost, and number of false negatives (fn). Fig. 5 displays the Pareto fronts, from point "p" to point "q", obtained using NSGA-II, EMOGA and LS-EMOGA. On these fronts, the solution at point "p" has fewer false negatives but lower classification accuracy, whereas the solution at point "q" is more accurate in classification but has more false negatives. At the best classification accuracy (point "q"), the LS-EMOGA solution had fewer false negatives than those of NSGA-II and EMOGA. The non-dominated solutions obtained using LS-EMOGA were also more uniformly distributed than those of EMOGA and NSGA-II. Table III lists the mean classification accuracy and its standard deviation over 20 repeated runs of NSGA-II, EMOGA and LS-EMOGA. At point "q", the mean and standard deviation obtained using LS-EMOGA were better than those of NSGA-II and EMOGA. As Table IV shows, the best classification accuracies at point "q" obtained using NSGA-II, EMOGA and LS-EMOGA were 78.34%, 78.66% and 78.50%, respectively. Moreover, the total costs obtained using NSGA-II, EMOGA and LS-EMOGA were 425, 419 and 412, respectively. Table IV also reveals that the solution at point "p" had higher sensitivity and lower cost, whereas the solution at point "q" had higher classification accuracy, specificity, and F-measure. For LS-EMOGA, the numbers of false negatives at points "p" and "q" were 36 and 70, respectively. Table V further shows that, in the solutions at point "q" obtained using NSGA-II and LS-EMOGA, the attributes "Pl", "Bm" and "Pd" had a pronounced effect, whereas "Sr" and "Ag" had the least effect.

Table IV (excerpt)
Par.       NSGA-II "p"   NSGA-II "q"   EMOGA "p"   EMOGA "q"   LS-EMOGA "p"   LS-EMOGA "q"
acc        62.05%        78…

Table V (excerpt)
Attribute  NSGA-II "p"   NSGA-II "q"   EMOGA "p"   EMOGA "q"   LS-EMOGA "p"   LS-EMOGA "q"
w1 (Nu)    0.0181        0.2334        0.0418      0.3574      0.0288         0.3363
w2 (Pl)    0.…

This study proposed a local search-based enhanced multi-objective genetic algorithm, termed LS-EMOGA, and combined it with the k-means algorithm to achieve the best classification accuracy and the lowest number of false negatives in medical diagnosis. The performance of LS-EMOGA was evaluated on the benchmark problems ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6, and the results showed that LS-EMOGA outperformed NSGA-II on ZDT1, ZDT2, ZDT3, and ZDT6. The proximity and spread metrics further showed that LS-EMOGA provides accurate Pareto fronts. Furthermore, this study incorporated the k-means clustering algorithm into LS-EMOGA to diagnose gestational diabetes. In the analyses of the data set with LS-EMOGA, EMOGA and NSGA-II, LS-EMOGA gave better solutions than EMOGA and NSGA-II in terms of the compared performance metrics.

Figure 2. Modified sorting pool of individuals for non-dominated sorting.

Figure 3. Flow chart of the proposed LS-EMOGA and K-means clustering algorithm.

Figure 5. Comparison of solution distributions for medical diagnosis.

TABLE I .
MATHEMATICAL FORMULAS AND DIMENSIONS OF FIVE TEST CASES

TABLE II .
COMPARISONS OF METRICS GD AND Δ FOR FIVE TEST CASES (20 REPEATED RUNS)

TABLE III .
COMPARISONS OF AVERAGE CLASSIFICATION ACCURACY

TABLE IV .
COMPARISONS OF PERFORMANCE METRICS

TABLE V .
COMPARISONS OF THE OPTIMAL WEIGHTS OF ATTRIBUTES