Multi-objective single agent stochastic search in non-dominated sorting genetic algorithm

A hybrid multi-objective optimization algorithm based on a genetic algorithm and stochastic local search is developed and evaluated. The single agent stochastic search local optimization algorithm has been modified to make it suitable for multi-objective optimization, where the local optimization is performed towards non-dominated points. The presented algorithm has been experimentally investigated by solving a set of well-known test problems and evaluated according to several metrics for measuring the performance of multi-objective optimization algorithms. The results of the experimental investigation are presented and discussed.


Introduction
Global optimization problems can be found in various fields of science and industry, e.g. mechanics, economics, operational research, control engineering and project management. In general, global optimization is a branch of applied mathematics that deals with finding "the best available" (usually minimum or maximum) values of a given objective function according to a single criterion. Without loss of generality, we focus on minimization, since any maximization problem can easily be transformed into a minimization one.
Mathematically, a global optimization problem with $d$ variables is to find the minimum value $f^*$ and a decision vector $x^*$ such that
$$f^* = \min_{x \in D} f(x), \qquad f(x^*) = f^*,$$
where $f(x)$ is the objective function subject to minimization, and the decision vector $x = (x_1, x_2, \ldots, x_d)$ varies in a search space $D \subseteq \mathbb{R}^d$, with $d$ the number of variables.
Real-world optimization problems often involve more than one objective function, and these objectives conflict with each other: an improvement in one objective can lead to a deterioration in another. Problems of this type are known as Multi-objective Optimization Problems (MOP).
Mathematically, a MOP with $d$ variables and $m$ objectives is to find decision vectors $x \in D$ that minimize the objective vector
$$F(x) = \left(f_1(x), f_2(x), \ldots, f_m(x)\right).$$
Since the objectives conflict, in general no single solution is best with respect to all of them. However, a set of Pareto optimal solutions can be found: solutions that cannot be improved in any objective without degrading at least one other objective.
In multi-objective optimization, two different decision vectors can be compared by their dominance relation: either one dominates the other, or neither is dominated by the other [1].
Consider two decision vectors $x$ and $y$ from the set $D$. Decision vector $x$ dominates decision vector $y$, denoted $x \succ y$, if
• $x$ is not worse than $y$ in any objective, and
• $x$ is strictly better than $y$ in at least one objective,
that is,
$$x \succ y \iff f_i(x) \le f_i(y)\ \ \forall i \in \{1, \ldots, m\} \ \text{ and } \ f_j(x) < f_j(y) \ \text{ for some } j \in \{1, \ldots, m\}. \tag{5}$$
If (5) holds, then $x$ is a dominator of $y$. A decision vector that has no dominators is called non-dominated, or optimal in the Pareto sense. If neither $x \succ y$ nor $y \succ x$ holds, the decision vectors $x$ and $y$ are called indifferent, denoted $x \sim y$.
The set of all decision vectors that are non-dominated with respect to some subset $\tilde D \subset D$ is called a Pareto set, and the set of decision vectors that are non-dominated with respect to the whole search space $D$ is called the Pareto-optimal set. The corresponding sets of objective vectors are called the Pareto front and the Pareto-optimal front, respectively.
Finding the Pareto-optimal front is usually hard and time-consuming; therefore, many multi-objective optimization algorithms aim to approximate it.
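The dominance relation defined above and the extraction of a non-dominated set translate directly into code. The following minimal Python sketch assumes minimization and represents each decision vector by its objective vector $F(x)$ as a tuple. The function names are hypothetical and are not part of the algorithms described in this paper.

```python
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """True if objective vector fx dominates fy (minimization): fx is not worse
    in any objective and strictly better in at least one."""
    not_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return not_worse and strictly_better


def indifferent(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """True if neither vector dominates the other (x ~ y)."""
    return not dominates(fx, fy) and not dominates(fy, fx)


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Members of `points` that have no dominators within `points`.
    A point never dominates itself, so no self-exclusion is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(non_dominated(points))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

The quadratic pairwise check is sufficient for illustration. Here $(3, 3)$ is removed because $(2, 2)$ dominates it, while the remaining three points are mutually indifferent.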

Evolutionary multi-objective optimization
One well-known class of optimization algorithms is Evolutionary Algorithms (EAs). They are well suited to multi-objective optimization problems, as they are fundamentally based on biological processes, which are inherently multi-objective. Multi-Objective EAs (MOEAs) can yield a whole set of potential solutions, all of which are optimal in some sense, and make it possible to assess the trade-offs between different solutions. Furthermore, EAs require little knowledge about the problem being solved, and they are robust, easy to implement and inherently parallel. The only requirement for solving a given optimization problem is the ability to evaluate the objective functions for a given set of input parameters.
Since the first works on multi-objective evolutionary optimization [2], several different algorithms have been proposed and successfully applied to various problems.
Although there are many techniques for designing MOEAs, each with different characteristics, the advantages of a particular technique can be enhanced by hybridizing two or more different techniques, for example, by combining different search methods, search and updating methods, or different search methods in different search phases. A special case of hybrid MOEA is the memetic MOEA [3], which incorporates local optimization techniques into an evolutionary algorithm.
In this paper, we focus on one of the most popular multi-objective evolutionary algorithms, the Non-dominated Sorting Genetic Algorithm (NSGA) [1], and on the local multi-objective optimization technique Multi-Objective Single Agent Stochastic Search (MOSASS) [4], which is derived from the single-objective optimization algorithm Single Agent Stochastic Search (SASS) [5].
The Non-dominated Sorting Genetic Algorithm (NSGA) was proposed in [1] and was one of the first MOEAs. Since then, NSGA has been applied to various problems [6,7]. An updated version of NSGA, called NSGA-II, was proposed in [8].
The algorithm begins with an initial parent population consisting of decision vectors randomly generated over the search space. The algorithm then continues with the following generational process (see Fig. 1):
1. A new child population is created by applying genetic operators (selection, crossover and mutation) to the elements of the parent population. Usually, the child population has the same size as the parent population.
2. Parent and child populations are combined into one population.
3. The population is sorted according to the number of dominators.
4. The obtained population is reduced to the size of the parent population by removing the most dominated elements. If two or more decision vectors are equally dominated, the crowding distance estimator [8] is used to choose the most promising decision vector.
5. The reduced population is used as the parent population in the next generation.

Hybrid multi-objective optimization
The local search component is based on the Single Agent Stochastic Search (SASS) algorithm [5]. SASS begins with an initial solution $x$, which is assumed to be the best solution found so far, and a new candidate solution $x'$ is generated by
$$x'_i = x_i + (x_i^+ - x_i^-)\,\xi_i, \quad i = 1, \ldots, d, \tag{6}$$
$$\xi_i \sim \mathcal{N}(b_i, \sigma^2), \quad i = 1, \ldots, d, \tag{7}$$
where $\xi = (\xi_1, \ldots, \xi_d)$ is a vector of random numbers drawn from a Gaussian distribution with means $b_i$, $i = 1, \ldots, d$, and standard deviation $\sigma$. Here $x_i^-$ and $x_i^+$ are the lower and upper bounds of the value of variable $x_i$. If
$$f(x') < f(x), \tag{8}$$
then $x'$ is accepted as $x$, the best solution found so far, and the algorithm continues to the next iteration. Otherwise, the opposite solution
$$x''_i = x_i - (x_i^+ - x_i^-)\,\xi_i, \quad i = 1, \ldots, d, \tag{9}$$
is investigated and accepted as $x$ if
$$f(x'') < f(x). \tag{10}$$
If either (8) or (10) is satisfied, the iteration is considered successful and the mean values are recalculated by
$$b_i = \begin{cases} 0.2\, b_i + 0.4\, \xi_i, & \text{if (8) holds},\\ b_i - 0.4\, \xi_i, & \text{if (10) holds}, \end{cases} \quad i = 1, \ldots, d. \tag{11}$$
Otherwise, the iteration is considered failed and the mean values are recalculated by
$$b_i = 0.5\, b_i, \quad i = 1, \ldots, d. \tag{12}$$
If the number of consecutive successful iterations (scnt) reaches the given value Scnt, the standard deviation is expanded by
$$\sigma = e \cdot \sigma. \tag{13}$$
Analogously, if the number of consecutive failed iterations (fcnt) reaches the given value Fcnt, the standard deviation is contracted by
$$\sigma = c \cdot \sigma. \tag{14}$$
If the standard deviation $\sigma$ falls below the given minimum value $\sigma_{\min}$, it is reset to the given value $\sigma_{\max}$.
The expansion and contraction coefficients $e > 1$ and $c \in (0, 1)$ are given as input parameters. So are the values Scnt and Fcnt, the lower and upper limits of the standard deviation ($\sigma_{\min}$ and $\sigma_{\max}$), and the number of iterations to be performed ($N_{iters}$). The recommended parameter values are $e = 2$, $c = 0.5$, Scnt = 5, Fcnt = 3, $\sigma_{\min} = 10^{-5}$ and $\sigma_{\max} = 1.0$ [5].
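To make the SASS iteration concrete, the following Python sketch minimizes a single objective using the update rules and recommended parameter values described above. It is an illustration under stated assumptions, not the authors' implementation. The text does not specify three of its choices: the initial value of $\sigma$ (set here to $\sigma_{\max}$), the clipping of candidates to the box $[x^-, x^+]$, and the resetting of the success or failure counter after each expansion or contraction.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def sass(f: Callable[[Sequence[float]], float], x0: Sequence[float],
         lower: Sequence[float], upper: Sequence[float], n_iters: int = 1000,
         e: float = 2.0, c: float = 0.5, scnt_max: int = 5, fcnt_max: int = 3,
         sigma_min: float = 1e-5, sigma_max: float = 1.0,
         seed: Optional[int] = None) -> Tuple[List[float], float]:
    """Minimal SASS sketch (minimization); returns the best point and its value."""
    rng = random.Random(seed)
    d = len(x0)
    span = [upper[i] - lower[i] for i in range(d)]

    def clip(v: List[float]) -> List[float]:
        # Assumption: keep candidates inside the box constraints.
        return [min(max(v[i], lower[i]), upper[i]) for i in range(d)]

    x, fx = list(x0), f(x0)
    b = [0.0] * d          # means of the Gaussian perturbation
    sigma = sigma_max      # assumption: start with the largest step size
    scnt = fcnt = 0        # consecutive successes / failures
    for _ in range(n_iters):
        xi = [rng.gauss(b[i], sigma) for i in range(d)]            # eq. (7)
        x1 = clip([x[i] + span[i] * xi[i] for i in range(d)])      # eq. (6)
        f1 = f(x1)
        if f1 < fx:                                                # eq. (8)
            x, fx, success = x1, f1, True
            b = [0.2 * b[i] + 0.4 * xi[i] for i in range(d)]
        else:
            x2 = clip([x[i] - span[i] * xi[i] for i in range(d)])  # opposite solution
            f2 = f(x2)
            if f2 < fx:                                            # eq. (10)
                x, fx, success = x2, f2, True
                b = [b[i] - 0.4 * xi[i] for i in range(d)]
            else:
                success = False
                b = [0.5 * bi for bi in b]
        scnt, fcnt = (scnt + 1, 0) if success else (0, fcnt + 1)
        if scnt == scnt_max:
            sigma, scnt = e * sigma, 0      # expansion
        elif fcnt == fcnt_max:
            sigma, fcnt = c * sigma, 0      # contraction
        if sigma < sigma_min:
            sigma = sigma_max               # restart the step size
    return x, fx


def sphere(v: Sequence[float]) -> float:
    return sum(t * t for t in v)


best_x, best_f = sass(sphere, [3.0, -2.0], [-5.0, -5.0], [5.0, 5.0],
                      n_iters=2000, seed=42)
```

Because a candidate replaces $x$ only when it strictly improves $f$, the returned value never exceeds the starting value. The reset of $\sigma$ to $\sigma_{\max}$ periodically reopens large steps after the search has contracted.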
The SASS algorithm has been successfully used in evolutionary algorithms for single-criterion optimization problems [9,10,11]. To apply it to multi-objective optimization problems, a new version called Multi-Objective Single Agent Stochastic Search (MOSASS) has been developed. The initial version of MOSASS and its evaluation on several test functions were presented in our previous work [4]. Below we present a modified version of MOSASS, in which the modifications mainly concern the use of probability in the generation of neighbor solutions.
The MOSASS algorithm begins with an initial solution $x$, an empty archive $A$, and a constant $N_A$ that defines the maximum number of solutions that can be stored in the archive. The previously defined SASS parameters Scnt, Fcnt, $e$, $c$, $\sigma_{\min}$, $\sigma_{\max}$ and $N_{iters}$ are also given as input parameters of MOSASS.
Once a new solution $x'$ has been generated following equation (6) and its objective vector $F(x')$ has been calculated, the dominance relation between $x'$ and $x$ is evaluated. If $x' \succ x$, the present solution $x$ is replaced by $x'$ and the algorithm continues to the next iteration. Otherwise, if $x'$ does not dominate $x$ and is not dominated by any solution in $A \cup \{x\}$, the archive $A$ is supplemented with $x'$ and the algorithm goes to the next iteration. If $x'$ is dominated by any member of $A \cup \{x\}$, then $x'$ is rejected and the opposite solution $x''$ is investigated in the same way as $x'$. If the archive size exceeds the limit $N_A$, half of its elements are removed using the crowding distance operator [8].
If either the present solution $x$ is replaced or the archive is supplemented, the iteration is considered successful; otherwise, it is considered failed.
The mean and standard deviation parameters are dynamically adjusted as in SASS algorithm.
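The Python sketch below shows how the dominance tests in MOSASS decide between replacing $x$, extending the archive and rejecting a candidate. It reuses the SASS step-size logic. It should be read as a hedged illustration, not the authors' code. The crowding-distance routine is a common NSGA-II-style implementation computed once over the overflowing archive. Keeping the half with the largest crowding distance, clipping to bounds, and starting with $\sigma = \sigma_{\max}$ are our assumptions. The test function `F` is hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Objs = Tuple[float, ...]


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))


def crowding_distance(objs: List[Sequence[float]]) -> List[float]:
    n, m = len(objs), len(objs[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep boundary points
        if hi == lo:
            continue
        for j in range(1, n - 1):
            dist[order[j]] += (objs[order[j + 1]][k] - objs[order[j - 1]][k]) / (hi - lo)
    return dist


def mosass(F: Callable[[Sequence[float]], Objs], x0: Sequence[float],
           lower: Sequence[float], upper: Sequence[float], n_iters: int = 500,
           n_archive: int = 20, e: float = 2.0, c: float = 0.5, scnt_max: int = 5,
           fcnt_max: int = 3, sigma_min: float = 1e-5, sigma_max: float = 1.0,
           seed: Optional[int] = None):
    rng = random.Random(seed)
    d = len(x0)
    span = [upper[i] - lower[i] for i in range(d)]
    x, fx = list(x0), F(x0)
    archive: List[Tuple[List[float], Objs]] = []

    def try_candidate(cand: List[float]) -> bool:
        """Apply the MOSASS acceptance rules; return True on success."""
        nonlocal x, fx
        cand = [min(max(cand[i], lower[i]), upper[i]) for i in range(d)]
        fc = F(cand)
        if dominates(fc, fx):                     # x' dominates x: replace x
            x, fx = cand, fc
            return True
        if not dominates(fx, fc) and not any(dominates(fa, fc) for _, fa in archive):
            archive.append((cand, fc))            # non-dominated in A and by x
            if len(archive) > n_archive:          # drop half by crowding distance
                cd = crowding_distance([fa for _, fa in archive])
                keep = sorted(range(len(archive)), key=lambda i: -cd[i])
                keep = keep[: len(archive) - len(archive) // 2]
                archive[:] = [archive[i] for i in sorted(keep)]
            return True
        return False

    b, sigma, scnt, fcnt = [0.0] * d, sigma_max, 0, 0
    for _ in range(n_iters):
        xi = [rng.gauss(b[i], sigma) for i in range(d)]
        if try_candidate([x[i] + span[i] * xi[i] for i in range(d)]):
            b, ok = [0.2 * b[i] + 0.4 * xi[i] for i in range(d)], True
        elif try_candidate([x[i] - span[i] * xi[i] for i in range(d)]):
            b, ok = [b[i] - 0.4 * xi[i] for i in range(d)], True
        else:
            b, ok = [0.5 * bi for bi in b], False
        scnt, fcnt = (scnt + 1, 0) if ok else (0, fcnt + 1)
        if scnt == scnt_max:
            sigma, scnt = e * sigma, 0
        elif fcnt == fcnt_max:
            sigma, fcnt = c * sigma, 0
        if sigma < sigma_min:
            sigma = sigma_max
    return (x, fx), archive


def F(v: Sequence[float]) -> Objs:
    # Hypothetical bi-objective test problem (Schaffer-type), both minimized.
    return (v[0] ** 2, (v[0] - 2.0) ** 2)


(best, f_best), arch = mosass(F, [4.0], [-5.0], [5.0], n_iters=300, n_archive=10, seed=1)
```

Following the text, the sketch does not purge archive members that a newly added solution dominates. Only the size limit triggers pruning.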
The strategy used to select a neighbor solution can play an important role in random search techniques. In SASS, as well as in MOSASS, a neighbor solution is generated by (6). Since the values of all variables of $x$ are changed without any probabilistic choice, the neighbor solution $x'$ or $x''$ is very likely to differ from its precursor $x$ in all variables. However, it is sometimes necessary to make only a slight modification of the current solution, altering only one variable, to obtain a neighbor solution that dominates its precursor or is at least not dominated by other solutions. This is especially important at a later stage of the algorithm, when almost optimal solutions are used to produce new ones. To test the usefulness of such slight modifications, we modified MOSASS by introducing a probability into the generation of a neighbor decision vector. We call the modified algorithm MOSASS/P. The modification replaces expression (7) with
$$\xi_i = \begin{cases} \mathcal{N}(b_i, \sigma^2), & \text{if } r \le p,\\ 0, & \text{otherwise}, \end{cases} \quad i = 1, \ldots, d,$$
where $r$ is a random number uniformly generated in $[0, 1]$ for each coordinate and $p \in (0, 1]$ is a predefined probability value. The larger the value of $p$, the higher the probability that a particular coordinate will be changed. If $p = 1$, the algorithm behaves as MOSASS. Fig. 2 shows the main difference between MOSASS and MOSASS/P in generating a neighbor decision vector. It illustrates the most likely neighbor decision vectors $x'$ generated by MOSASS (left) and MOSASS/P (right), with $x_i = 0.5$, $p = 0.5$, $\sigma = 0.1$ and $b_i = 0$, $i = 1, 2$.
It is also possible that all coordinates remain unchanged. In that case, the generation of the new decision vector is repeated until at least one coordinate is changed.
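The MOSASS/P perturbation, including the redraw when no coordinate changes, can be sketched in Python as follows. Drawing a separate uniform number $r$ for every coordinate is our reading of the text, and the function name `perturbation_p` is hypothetical. The resulting vector $\xi$ would replace the ordinary SASS/MOSASS perturbation in the candidate generation step.

```python
import random
from typing import List, Sequence


def perturbation_p(b: Sequence[float], sigma: float, p: float,
                   rng: random.Random) -> List[float]:
    """MOSASS/P perturbation: coordinate i gets a Gaussian N(b_i, sigma^2) value
    with probability p and stays 0 otherwise; the draw is repeated until at
    least one coordinate is non-zero (i.e. at least one variable changes)."""
    while True:
        xi = [rng.gauss(b[i], sigma) if rng.random() <= p else 0.0
              for i in range(len(b))]
        if any(v != 0.0 for v in xi):
            return xi


d = 10
rng = random.Random(0)
samples = [perturbation_p([0.0] * d, 0.1, 1.0 / d, rng) for _ in range(2000)]
avg_changed = sum(sum(v != 0.0 for v in s) for s in samples) / len(samples)
# With p = 1/d the expected number of changed coordinates is
# 1 / (1 - 0.9**10), roughly 1.54, instead of d = 10 for plain MOSASS.
print(round(avg_changed, 2))
```

With $p = 1/d$, as used later in the experiments, most neighbors differ from their precursor in one or two coordinates. Setting $p = 1$ recovers the all-coordinate perturbation of MOSASS.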

Memetic algorithm
To improve some performance metrics of the original NSGA-II algorithm, a memetic algorithm based on performing local search towards a set of non-dominated decision vectors has been developed. The memetic algorithm begins with an initial parent population $P$ consisting of $N$ decision vectors randomly generated over the search space. The algorithm then continues with the following processes:
www.mii.lt/NA
1. A new child population of size $N$ is generated by applying genetic operators to the elements of the parent population. Uniform crossover, which combines pairs of parent population elements, and mutation with a mutation rate of $1/d$ are used.
2. Parent and child populations are combined into one population of size $2N$, and each decision vector is evaluated by counting the number of its dominators.
3. The obtained population is reduced to size $N$ by removing the required number of the most dominated elements. If decision vectors are equally dominated, the crowding distance operator [8] is used to choose the most promising ones.
4. The counter of NSGA-II generations $G$ is increased by one. If the value of $G$ does not exceed the predefined number, the algorithm returns to the first step; otherwise, it continues to the next step.
5. An auxiliary set $P_L$ of $k$ decision vectors is created from the population $P$; non-dominated decision vectors are chosen for inclusion in $P_L$.
6. Local search (MOSASS or MOSASS/P) is performed starting from each decision vector in $P_L$. All the sets of solutions obtained in this way are combined with the population $P$ into one set, which is then reduced to size $N$ using the same scheme as in the third step.
7. The algorithm returns to the first step, resetting the generation counter $G$ to 0 and using the obtained population as the parent population for the next NSGA-II generation.
Depending on the local search algorithm used (MOSASS or MOSASS/P), we denote the derived memetic algorithm as NSGA/LS or NSGA/LSP, respectively. The number of NSGA-II generations after which local search is performed, the size of the auxiliary set $P_L$, and the number of local search iterations are given by the user as input parameters.
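Steps 2 and 3 of the memetic algorithm rank the combined population by the number of dominators and break ties with the crowding distance. Steps 3 and 4 of NSGA-II, as described above, do the same. The Python sketch below illustrates that reduction under our own assumptions. The crowding distance is computed only among the equally dominated vectors competing for the last places, and ties in crowding distance keep the original order. It is not the reference NSGA-II implementation, which sorts by non-dominated fronts. `reduce_population` is a hypothetical name.

```python
from typing import List, Sequence


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))


def crowding_distance(objs: List[Sequence[float]]) -> List[float]:
    n, m = len(objs), len(objs[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")  # boundary points are kept
        if hi == lo:
            continue
        for j in range(1, n - 1):
            dist[order[j]] += (objs[order[j + 1]][k] - objs[order[j - 1]][k]) / (hi - lo)
    return dist


def reduce_population(objs: List[Sequence[float]], n: int) -> List[int]:
    """Return the sorted indices of the n members kept (assumes 1 <= n <= len(objs))."""
    counts = [sum(dominates(q, p) for q in objs) for p in objs]  # number of dominators
    order = sorted(range(len(objs)), key=lambda i: counts[i])
    threshold = counts[order[n - 1]]          # dominator count at the cut
    kept = [i for i in order if counts[i] < threshold]
    tied = [i for i in order if counts[i] == threshold]
    cd = crowding_distance([objs[i] for i in tied])
    best = sorted(range(len(tied)), key=lambda j: -cd[j])[: n - len(kept)]
    return sorted(kept + [tied[j] for j in best])


combined = [(1, 5), (2, 3), (3, 2), (5, 1), (2.5, 4), (4, 4), (6, 6)]
print(reduce_population(combined, 3))  # [0, 1, 3]
print(reduce_population(combined, 5))  # [0, 1, 2, 3, 4]
```

In the first call, four vectors have no dominators but only three places remain. The two extremes get infinite crowding distance, and $(2, 3)$ and $(3, 2)$ tie at 1.25, so the earlier one is kept.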

Experimental investigation
Description of experiments
The proposed memetic algorithms NSGA/LS and NSGA/LSP have been experimentally investigated by solving a set of multi-objective optimization problems which are listed in Table 1.

Performance metrics
The following performance metrics were used in the experimental investigation to evaluate the algorithms:
• Pareto front size (PS): the number of non-dominated objective vectors in the obtained Pareto front.
• Coverage of the Pareto-optimal front (C): the number of decision vectors in the obtained approximation that are non-dominated with respect to the members of the Pareto-optimal front. This metric is based on the coverage metric proposed in [12]:
$$C(\tilde P, P^*) = \left|\{\, x \in \tilde P : \nexists\, y \in P^* \text{ such that } y \succ x \,\}\right|,$$
where $P^*$ denotes the Pareto-optimal front and $\tilde P$ the obtained approximation. Naturally, $C \le |\tilde P|$, and a larger C value implies a better quality of $\tilde P$. The value $C(\tilde P, P^*) = 0$ means that every decision vector in $\tilde P$ is dominated by at least one decision vector in $P^*$. Conversely, $C(\tilde P, P^*) = |\tilde P|$ means that all decision vectors in $\tilde P$ are indifferent to the decision vectors in $P^*$.
• Hypervolume (HV): the volume of the objective-space region dominated by the members of the obtained Pareto front and bounded by a given reference point (Ref) [19]. A larger HV value indicates a better quality of the Pareto front. Fig. 3 presents two examples of HV. The Pareto front on the left has more points and a better (closer to uniform) distribution than the Pareto front on the right; therefore, the space dominated by the front on the left is noticeably larger.
• Inverted Generational Distance (IGD): the average distance from the elements of the Pareto-optimal front $P^*$ to the nearest element of the approximated Pareto front $\tilde P$:
$$IGD(\tilde P, P^*) = \frac{1}{|P^*|} \sum_{x \in P^*} \min_{y \in \tilde P} \lVert F(x) - F(y) \rVert.$$
Since the goal is to find decision vectors as close as possible to the members of the Pareto-optimal front, a lower IGD value implies a better approximation.
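• Pareto spread (∆) shows how well the members of the Pareto front are spread. The original metric is based on the distance between consecutive solutions and is dedicated to bi-objective problems. We use an extended version of the spread metric, based on the distance from a point to its nearest neighbor [19]:
$$\Delta(\tilde P, P^*) = \frac{\sum_{i=1}^{m} d(e_i, \tilde P) + \sum_{x \in \tilde P} \left| d(x, \tilde P) - \bar d \right|}{\sum_{i=1}^{m} d(e_i, \tilde P) + |\tilde P|\,\bar d},$$
where $\tilde P$ is the obtained approximation of the Pareto front, $P^*$ is the Pareto-optimal front, and $e_1, \ldots, e_m$ are $m$ decision vectors corresponding to the extreme objective vectors in $P^*$. The distances are defined by
$$d(x, \tilde P) = \min_{y \in \tilde P,\, y \ne x} \lVert F(x) - F(y) \rVert, \qquad \bar d = \frac{1}{|\tilde P|} \sum_{x \in \tilde P} d(x, \tilde P).$$
If the members of the approximated Pareto front are well distributed and include the extreme solutions, then $\Delta = 0$.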
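The formula-based metrics C, IGD, HV and ∆ can be computed directly from lists of objective vectors. The Python sketch below is illustrative only and rests on several assumptions. It uses Euclidean distance in objective space. It restricts the hypervolume to the bi-objective case, since general $m$-dimensional hypervolume needs a dedicated algorithm. In ∆, it interprets $d(e_i, \tilde P)$ as the distance from $e_i$ to its closest member of $\tilde P$, so an approximation containing the extreme points contributes zero to that term. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def coverage(approx: List[Point], optimal: List[Point]) -> int:
    """C: members of the approximation not dominated by any member of P*."""
    return sum(1 for p in approx if not any(dominates(q, p) for q in optimal))


def igd(approx: List[Point], optimal: List[Point]) -> float:
    """IGD: mean distance from each member of P* to its nearest approximation member."""
    return sum(min(math.dist(q, p) for p in approx) for q in optimal) / len(optimal)


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """HV for two minimized objectives: area dominated by `front`, bounded by `ref`."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, prev_y = 0.0, ref[1]
    for x, y in pts:              # sweep by increasing first objective
        if y < prev_y:            # skip points dominated within the sweep
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area


def spread(approx: List[Point], extremes: List[Point]) -> float:
    """Extended spread Delta; needs at least two points in `approx`."""
    n = len(approx)
    d = [min(math.dist(approx[i], approx[j]) for j in range(n) if j != i)
         for i in range(n)]
    d_mean = sum(d) / n
    d_ext = sum(min(math.dist(e, p) for p in approx) for e in extremes)
    return (d_ext + sum(abs(di - d_mean) for di in d)) / (d_ext + n * d_mean)


optimal = [(t / 10, 1 - t / 10) for t in range(11)]   # linear front f2 = 1 - f1
approx = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
print(coverage(approx, optimal))                        # 3
print(hypervolume_2d(approx, (1.1, 1.1)))
print(igd(approx, optimal), spread(approx, [(0.0, 1.0), (1.0, 0.0)]))
```

In this toy case, the evenly spaced three-point approximation lies on the true front and contains both extremes. Its coverage therefore equals its size, and its spread is (numerically) zero. Its IGD is still positive because the front is sampled only sparsely.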

Results
The first set of experiments was aimed at choosing an appropriate number ($E_G$) of function evaluations after which local search is performed, and the number ($E_L$) of function evaluations to be performed during each local search. A set of 36 different combinations of the parameters $(E_G, E_L)$ was used. Table 2 presents the results, giving for each combination $(E_G, E_L)$ the number of test functions for which it was the best according to the hypervolume metric. The table shows that it is worthwhile to perform local search every 500-2000 function evaluations, and that each local search should perform 300-500 function evaluations.
We chose the median point (1000, 400) as the parameter set $(E_G, E_L)$ for further experiments. The sensitivity of the algorithm to the values of $(E_G, E_L)$ depends on the test function being solved. When choosing the worst combination of parameters, the loss of HV was less than 5% for 5 test functions, less than 10% for 15 test functions, and less than 20% for 20 test functions. The algorithm was most sensitive to the parameter values when solving test functions ZDT4, UP8, OKA2, ZDT6 and UP10, with HV losses of 28%, 35%, 37%, 56% and 91%, respectively.
The second experiment investigated the performance of the proposed algorithms NSGA/LS and NSGA/LSP. The algorithms were compared with each other and with the classical NSGA-II, implemented as described in the literature [8]. Each of the three algorithms was run 100 times independently on each of the 26 test functions. A crossover probability of 0.8, a mutation rate of 1/d and a population size of 100 were used for NSGA-II. Local optimization was performed for 400 function evaluations after every 10 NSGA generations (1000 function evaluations), as this combination of parameters gave the best statistical results in the previous experiment. A set of 10 decision vectors was selected (see Section 3.2) for local optimization. The probability p used in generating a new decision vector in NSGA/LSP was set to 1/d, the same as the mutation rate in NSGA-II.
The results of the experimental investigation are presented in Tables 3-7, which give the values of the PS, C, HV, IGD and ∆ metrics, respectively. The tables report average values and standard deviations over the 100 independent runs performed for each test function, and the best value of each performance metric is marked in bold. Each algorithm was evaluated by counting the number of test problems for which it gives the best result according to a particular performance metric, and all three algorithms were evaluated according to all five metrics. The results are illustrated in Fig. 4. The vertical axis represents the number of problems solved best and the horizontal axis the performance metric. The columns in each group correspond to NSGA-II, NSGA/LS and NSGA/LSP, respectively. As the figure shows, NSGA-II gives the best Pareto size for 7 test problems, NSGA/LS for 8, and NSGA/LSP for 11. According to the coverage metric, NSGA-II was the best for 5 test problems, NSGA/LS for 5, and NSGA/LSP for 14, while for 1 test problem all three algorithms give a zero coverage value. According to the hypervolume metric, NSGA-II solves 5 test problems best, NSGA/LS 3, and NSGA/LSP 18. According to the IGD metric, NSGA-II solves 3 test problems best, while NSGA/LS and NSGA/LSP solve 8 and 15, respectively. According to the spread metric, NSGA-II solves 7 test problems best, NSGA/LS 6, and NSGA/LSP 13. Thus, according to every performance metric, NSGA/LSP solves the largest number of test problems best. One test problem (VLMOP2) was solved best by NSGA-II according to all metrics, although its advantage was insignificant. In contrast, 9 test problems were solved best by NSGA/LSP according to all metrics.
Although NSGA/LSP improves the performance metrics for most of the investigated test problems, some performance metrics were worse for some test problems. Moreover, counting the problems solved best gives no information about how much a performance metric was improved or worsened. To quantify this, algorithms NSGA/LS and NSGA/LSP were evaluated by the percentage improvement of their performance metric values over those obtained by the classical NSGA-II. The results of these evaluations are illustrated in Figs. 5-8. The C metric was excluded from this evaluation. NSGA-II produced zero C values for most test problems, which would leave the percentage improvement undefined. Instead, the improvement in the number of decision vectors is presented in Fig. 9. The results presented in the figures show that the performance metrics were improved, to varying degrees, for most of the investigated test problems.
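The paper does not give the formula behind the percentage improvement. The following Python helper is therefore only one plausible definition, stated as an assumption: the relative change of a metric value with respect to the NSGA-II value, with the sign flipped for metrics where lower is better (IGD and ∆). It also shows why a zero baseline, as with the C values of NSGA-II, leaves the percentage undefined.

```python
def percentage_improvement(value: float, baseline: float, higher_is_better: bool) -> float:
    """Hypothetical helper: improvement of `value` over the NSGA-II `baseline`,
    in percent. Positive results mean the hybrid algorithm is better."""
    if baseline == 0:
        raise ValueError("percentage improvement over a zero baseline is undefined")
    change = (value - baseline) / abs(baseline) * 100.0
    return change if higher_is_better else -change


print(percentage_improvement(0.66, 0.60, higher_is_better=True))   # HV-like metric, about +10
print(percentage_improvement(0.09, 0.10, higher_is_better=False))  # IGD-like metric, about +10
```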
Several approximations of Pareto fronts obtained by NSGA-II and NSGA/LSP are shown in Fig. 10.
The third experiment compared the performance of NSGA/LSP with that of the Multi-objective Evolutionary Algorithm based on Decomposition (MOEA/D), presented by Zhang and Li in [20]. MOEA/D decomposes a multi-objective optimization problem into a number of scalar optimization subproblems and optimizes them simultaneously. MOEA/D achieved the best performance on test functions UP1-UP10 in the CEC 2009 Special Session and Competition [14]. The implementation of the algorithm was downloaded from the MOEA/D Homepage. Both algorithms, NSGA/LSP and MOEA/D, were run for a fixed time period of 20 seconds, within which at least one of the algorithms provides a valuable solution. The population size in NSGA/LSP was set to 200.
Local search was performed for 800 function evaluations after every 2000 function evaluations of global search. The parameters of MOEA/D were chosen to be the same as in the CEC 2009 Special Session and Competition. Both algorithms were run 100 times independently on test functions UP1, UP2, UP3, UP4, UP7, UP8 and UP10. The performance of the algorithms was evaluated by the hypervolume metric, measured at discrete time moments: the 3rd, 5th, 10th, 15th and 20th second of the experiment. The results are presented in Fig. 11. The horizontal axis in each graph represents the time moments and the vertical axis the hypervolume value. Different graphs correspond to different test functions, and different curves to different algorithms. The results show that NSGA/LSP produces significantly better results for almost all test functions within the fixed time period. MOEA/D was significantly better for only one test function (UP3), whereas it was unable to find any evaluable approximation of the Pareto front of test function UP10 within 20 seconds.
The last experiment investigated the ability of NSGA/LSP to solve hard test problems: DTLZ1 to DTLZ4 with 10 objectives. As suggested in [13], the number of variables was set to 14 for test problems DTLZ1 and DTLZ3, and to 20 for DTLZ2 and DTLZ4. The algorithm was run for $1 \times 10^6$ function evaluations with a population size of 1000. Local search was performed for 5000 function evaluations after every 10000 function evaluations of global search. Each test function was optimized in 100 independent runs. Performance was measured by the IGD metric and compared with that of NSGA-II. The results, presented in Table 8, show that NSGA/LSP gives a significantly better IGD value for all test functions investigated.

Conclusions
The paper proposes a hybrid multi-objective optimization algorithm based on the Non-dominated Sorting Genetic Algorithm (NSGA-II) and an improved Multi-Objective Single Agent Stochastic Search algorithm. The developed algorithm NSGA/LSP has been experimentally investigated by solving a set of 26 test problems and measuring five different performance metrics. The results show that incorporating the presented local search method into the evolutionary algorithm NSGA-II gives a significant advantage for most test functions according to different performance metrics. For most test problems, the performance metrics are significantly improved by modifying Multi-Objective Single Agent Stochastic Search so that only a randomly chosen part of the coordinate values is changed.

Fig. 2. Illustration of the difference in the generation of a neighbor decision vector in MOSASS and MOSASS/P.

Fig. 4. Comparison of the algorithms by the number of problems solved best.

Fig. 11. HV values for different test functions, obtained by algorithms MOEA/D and NSGA/LSP at discrete time moments.

Table 1. List of test problems used in the experimental investigation.

Table 2. Numbers of test functions for which particular values of $(E_G, E_L)$ were the best by HV value.

Table 4. Coverage of Pareto front.

Table 8. IGD values obtained when solving test problems with 10 objectives.