Regularity Model for Noisy Multiobjective Optimization

Regularity models have been used in dealing with noise-free multiobjective optimization problems. This paper studies the behavior of a regularity model in noisy environments and argues that it is well suited to noisy multiobjective optimization. We propose to embed the regularity model in an existing multiobjective evolutionary algorithm for handling noise. The proposed algorithm works well in terms of both convergence and diversity. In our experimental studies, we compare several state-of-the-art algorithms with the proposed algorithm on benchmark problems with different noise levels. The experimental results show the effectiveness of the regularity model on noisy problems, but degraded performance on some noise-free problems.

A multiobjective optimization problem (MOP) can be stated as

$$\min\; F(x) = (f_1(x), \ldots, f_m(x)) \quad \text{s.t.}\; x \in \Omega \qquad (1)$$

where Ω is the decision space and F is the objective function consisting of m individual objectives f_1, ..., f_m. Let x, y ∈ Ω; x is said to dominate y if and only if f_i(x) ≤ f_i(y) for all i ∈ {1, 2, ..., m} and F(x) ≠ F(y). A solution x* ∈ Ω is Pareto optimal if no x ∈ Ω dominates x*. The set of all the Pareto optimal solutions is called the Pareto set (PS), and the set of their objective vectors is called the Pareto front (PF).
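The dominance relation defined above can be sketched in a few lines. This is a minimal illustration for minimization problems, not code from the paper; the function names `dominates` and `nondominated_mask` are introduced here for illustration only.

```python
import numpy as np

def dominates(fx, fy):
    """Return True if objective vector fx Pareto-dominates fy (minimization)."""
    fx = np.asarray(fx, dtype=float)
    fy = np.asarray(fy, dtype=float)
    # No worse in every objective and strictly better in at least one.
    return bool(np.all(fx <= fy) and np.any(fx < fy))

def nondominated_mask(F):
    """Boolean mask marking the rows of F (N x m) that no other row dominates."""
    F = np.asarray(F, dtype=float)
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        for j in range(len(F)):
            if i != j and dominates(F[j], F[i]):
                mask[i] = False
                break
    return mask
```

The quadratic loop is chosen for readability; the faster nondominated sorts cited later in the paper serve the same purpose on large populations.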
Multiobjective evolutionary algorithms (MOEAs) have been regarded as a major tool for approximating the PF [1]-[7]. In many real-world applications, environmental and measurement noises are inevitable, and thus exact objective function evaluation is impossible. Moreover, noises on different objectives might be of different scales and distributions. In this paper, we assume that each function evaluation can only obtain F(x) + ε, where ε ∼ N(0, σ²I) and the standard deviation σ represents the noise level.
Noises could significantly deteriorate the performance of MOEAs if no extra measures are taken for handling them. Noises can lead to wrong ranking and thus mislead the search. Noises can also make diversity maintenance difficult. Existing approaches for dealing with noises in MOEAs include the following.
1) Ranking: To reduce the effect of noises, nondominated sorting genetic algorithm-II (NSGA-II)-A [8] adopts α-dominance and uses the support vector machine for establishing a confidence model for ranking. Hypothesis testing [9] and fuzzy theory [10] have also been used to build new dominance relations for noisy MOPs. Probabilistic Pareto ranking [11] uses an error function in estimating the dominance probability between two solutions for noisy MOPs in [9], [12], and [13].
2) Averaging: It is for de-noising. The most straightforward method is to perform the objective function evaluation several times independently and then average the obtained values. The deviation of the mean value (i.e., average value) decreases as the number of evaluations increases. Noise-tolerant strength Pareto evolutionary algorithm (SPEA) (NTSPEA) [14] assigns different evaluation numbers to different individual solutions according to their dominance relations in a population. Some researchers (see [15]) also consider the standard deviation of noisy objective function values. To reduce the number of function evaluations, an average of noisy function values of several close solutions can be used as in [16].
3) Modeling: Noises may affect a single solution significantly. However, their effect on a model can be minimal. As a solution set can be described as a model, estimation of distribution algorithms (EDAs) have been recently used for dealing with noisy MOPs. For example, restricted Boltzmann machine [17], [18] and univariate marginal distribution algorithm [19] have been adopted for handling noisy MOPs.
The above approaches can deal with noisy MOPs to a certain extent, but their performance is not always satisfactory. The existing ranking methods cannot output ranks with a high accuracy for noisy problems. That is the reason why the multi-objective probabilistic selection evolutionary algorithm (MOSPEA) [11] with the probabilistic Pareto ranking is not satisfactory on noisy MOPs. The averaging methods need multiple function evaluations for a single solution or similar solutions, which increases their computational cost. Even though some methods for saving function evaluations have been brought into the averaging-based algorithms (NTSPEA for instance), their efficiency is not high enough. The modeling-based algorithms are more effective than the averaging-based algorithms, but few modeling-based algorithms have considered the characteristics of MOPs.
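The cost of averaging noted above can be made concrete with a small numerical sketch. Under the noise model ε ∼ N(0, σ²I), averaging k independent evaluations reduces the standard deviation of each objective to σ/√k, so halving the noise requires four times as many evaluations. The objective `toy_objectives` and all function names below are hypothetical and serve only as an illustration.

```python
import numpy as np

def toy_objectives(x):
    # Hypothetical two-objective function, used only for illustration.
    return np.array([x[0], 1.0 - x[0]])

def noisy_eval(f, x, sigma, rng):
    """One noisy observation F(x) + eps with eps ~ N(0, sigma^2 I)."""
    fx = np.asarray(f(x), dtype=float)
    return fx + rng.normal(0.0, sigma, size=fx.shape)

def averaged_eval(f, x, sigma, k, rng):
    """Average of k independent noisy observations at the same x."""
    return np.mean([noisy_eval(f, x, sigma, rng) for _ in range(k)], axis=0)

def empirical_std(f, x, sigma, k, repeats, rng):
    """Empirical std of the first objective after averaging k evaluations."""
    vals = [averaged_eval(f, x, sigma, k, rng)[0] for _ in range(repeats)]
    return float(np.std(vals))
```

With σ = 0.2, the empirical standard deviation for k = 16 should be close to 0.2/4 = 0.05, at sixteen times the evaluation cost of a single observation.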
This paper focuses on modeling approaches for continuous noisy MOPs. Under mild conditions, the PS of a continuous MOP is a piecewise (m − 1)-D manifold. Although this regularity property has been successfully used for solving noise-free MOPs [3], it has not yet been considered in noisy MOPs. This paper advocates using this regularity model in MOEAs for dealing with noisy MOPs. Through some analysis in this paper, we argue that the regularity model is efficient for de-noising. We also give guidance on how to use the regularity model in existing MOEAs.
The rest of this paper is organized as follows. Section II introduces the regularity model briefly. Section III analyzes the reason why the regularity model can de-noise efficiently and gives the motivation of this paper. The implementation of the regularity model is shown in Section IV. Section V shows how to use the regularity model in MOEAs. Section VI shows the experimental results and discussions. Finally, the conclusion is given in Section VII.

II. REGULARITY MODEL
Under some mild smoothness conditions, it can be proven by using the well-known Karush-Kuhn-Tucker condition that the PS of a continuous MOP is an (m − 1)-D piecewise continuous manifold [20], [21]. This regularity property was first used in MOEAs in [3].
Fig. 1 illustrates a simple way of using the regularity property to obtain a model (called the regularity model) of a population in MOEAs. Circular points are a population found during the previous search, and the solid curve is the true PS. One can assume that the population scatters around the PS; in other words, the PS can be thought of as a central (m − 1)-D manifold of these solutions. To model the PS, one can divide these solutions into several (three in Fig. 1) clusters. Then each cluster is approximated by an (m − 1)-D linear model ζ.
Each linear model ζ in one cluster can be defined as

$$\zeta = \left\{ \bar{x} + \sum_{i=1}^{m-1} \theta_i U_i \right\} \qquad (2)$$

and it can be obtained by principal component analysis (PCA), which minimizes the distances from the solutions to their projections on ζ [3], [22]. In (2), x̄ is the center of cluster C, U_i (1 ≤ i ≤ n) (sorted by the eigenvalues λ_i of the covariance matrix in descending order) are the principal components of this cluster, and θ_i is the free variable in the ith principal subspace.
An MOP with two decision variables and two objectives in Fig. 2 is used as an example to explain how to obtain ζ. To simplify the problem, its true PS is x_2 = 0.5, and 30 solutions around the true PS are classified into one cluster. First, the center x̄ of those solutions is calculated. Then, the principal components U_i are obtained by PCA. As ζ is (m − 1)-D, only the first m − 1 principal components U_i (i = 1, ..., m − 1) are considered, as shown in (2). In this example, x̄ and U_i (i = 1, ..., m − 1) construct ζ: the former determines the location of ζ, and the latter determines its orientation.
By the above "modeling" step, an analytical model for the PS is obtained. Such a model cannot be used directly in population-based MOEAs. A sampling step is therefore necessary: a population is sampled from the model, which acts as an operator for generating new solutions in MOEAs.
Fig. 3 illustrates two kinds of sampling from a regularity model. In Fig. 3(a), a population is sampled exactly on the regularity model. In Fig. 3(b), a population is sampled around the regularity model by adding noise ε to ζ, which has been employed in the regularity model-based multiobjective estimation of distribution algorithm (RM-MEDA) [3] to add diversity. The process of sampling from ζ in cluster C can be done by uniformly sampling the free variables θ_i (1 ≤ i ≤ m − 1) in (2). The sampling on θ_i should be in the interval a_i ≤ θ_i ≤ b_i as in (3), where α_i is a uniformly random value in [0, 1], and a_i and b_i [shown in (4) and (5)] are the boundaries of sampling in the ith principal subspace of the cluster [3].
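The modeling and sampling steps described in this section can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the sampling rule is taken to be θ_i = a_i + α_i(b_i − a_i), and the bounds a_i and b_i are assumed to be the minimum and maximum projections of the cluster members onto U_i, which is one plausible reading of (3)-(5). The function names and the parameter `noise_std` are introduced here for illustration.

```python
import numpy as np

def build_linear_model(X, m):
    """Fit an (m-1)-D linear model to one cluster X (N x n) by PCA."""
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=0)                     # location of the model
    cov = np.cov(X - center, rowvar=False)      # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    U = eigvecs[:, order[: m - 1]]              # first m-1 principal components
    theta = (X - center) @ U                    # projections of cluster members
    a, b = theta.min(axis=0), theta.max(axis=0) # assumed sampling bounds
    return center, U, a, b

def sample_model(center, U, a, b, n_samples, rng, noise_std=0.0):
    """Sample on the model (noise_std = 0) or around it (noise_std > 0)."""
    alpha = rng.random((n_samples, U.shape[1])) # uniform in [0, 1]
    theta = a + alpha * (b - a)                 # assumed form of the sampling rule
    X_new = center + theta @ U.T
    if noise_std > 0:
        X_new = X_new + rng.normal(0.0, noise_std, size=X_new.shape)
    return X_new
```

Calling `sample_model` with `noise_std=0.0` corresponds to Fig. 3(a), and a positive `noise_std` corresponds to Fig. 3(b).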

III. ANALYSIS AND MOTIVATION
In this section, we consider the major issue that motivates us to use the regularity model for dealing with noisy MOPs, namely, how well the regularity model can reduce the effect of noises. We first analyze the effect of noises on selection and explain why models can help to reduce noises. Then, through a comparison with other reproduction operators, we study the characteristics of the regularity model in noisy environments.
In this paper, we consider the following four different types of MOPs.
1) Type I: The objective functions are uni-modal and the mapping F (i.e., the objective function) is nearly symmetrical around the PS. In other words, two solutions that are symmetric about the PS have the same distance to the PF.
2) Type II: The objective functions are multimodal and the mapping F is nearly symmetrical around the PS.
3) Type III: The objective functions are uni-modal and the mapping F is not very symmetrical around the PS. In other words, two solutions that are symmetric about the PS do not have the same distance to the PF.
4) Type IV: The objective functions are multimodal and the mapping F is not very symmetrical around the PS.
Table I lists four test instances of different types used in our studies. The PS of F1 and F2 is 0 ≤ x_1 ≤ 1 and x_i = 0.5, i = 2, ..., n. The PS of F3 and F4 is 0 ≤ x_1 ≤ 1 and x_i = 0, i = 2, ..., n. All these instances are for minimization.

A. Effects of Noises on Selection
To show the effect of noise level (i.e., σ = 0, σ = 0.1, and σ = 0.2) on selection, we conduct the following experiments on F1-F4 with two decision variables.
1) Evenly generate 51 × 51 solutions in [0, 1]² as an initial population Pop_i from the search space.
2) Use a nondominated sort (e.g., fast nondominated rank sort [1], nondominated rank sort [23], deductive sort [24], corner sort [25], and efficient nondominated sort [26]) to select 51 solutions as the selected population Pop_s.
Fig. 4 presents the experimental results. All the solutions in the selected population Pop_s are very close to the true PS in the noise-free case (i.e., σ = 0). The deviation of the points in the selected population increases as σ grows. Solutions closer to the true PS have more chances to generate optimal solutions in future generations. They can promote the evolutionary process and can be viewed as good solutions. However, good solutions cannot be selected due to noises in the objective space. It is clear that the effect of noises varies from instance to instance.
A majority of selected solutions on uni-modal F1 and F3 (types I and III) are close to the PS. In contrast, quite a number of selected solutions on multimodal F2 and F4 (types II and IV) are close to some local optima.
On F1 and F2 (types I and II), selected solutions are distributed on both sides of the PS. In contrast, on F3 and F4 (types III and IV), most solutions are distributed on one side of the PS due to their less-symmetrical mappings around the true PS.

B. Models for Selected Population
Now, we study whether one can build a good model for the PS from the selected solutions. We perform PCA on these solutions to build a linear model of their distribution.
Fig. 5 presents the experimental results. In the noise-free case, the obtained model approximates the PS very closely. This is not surprising because the selected solutions are very close to the PS. As the noise level σ increases, the model quality gets worse.
On F1 and F2 (types I and II) with symmetrical mappings, the model approximates the true PS reasonably well, particularly when σ is small. However, the models are very poor on F3 and F4 even when σ is small. These experimental results suggest that models are useful for noisy MOPs of types I and II, but are not effective for types III and IV. The degree of symmetry of the mapping F affects the de-noising performance of the regularity model. The two kinds of F in these test instances are both extreme cases: F1 and F2 have a perfectly symmetrical mapping, whereas F3 and F4 have a completely nonsymmetrical mapping. In practice, the degree of symmetry of most problems lies between these two extremes. Therefore, the results on F1 and F2 show the best performance of the regularity model for noisy MOPs, whereas the results on F3 and F4 show its worst performance.

C. Efficiency of Regularity Model
The efficiency of a de-noising approach means how well it can guide the population to approximate the true PS, which can be quantitatively measured by the average distance to the true PS in the decision space. That distance can be calculated by averaging the shortest distances from the members of Pop_s to a set of uniform samples from the true PS. Now, we study how well a regularity model works in noisy environments by comparing it with EDA (a different model-based approach that has been used for de-noising). We use the EDA with the univariate marginal product model [19] as a compared approach. SBX [27], which has no de-noising ability, is also compared as a reference. We conduct the following experiments on F1-F4 with 2-30 decision variables at two different noise levels (σ = 0.1 and σ = 0.2).
1) Randomly generate 500 solutions from the search space.
2) Find the nondominated solutions as the selected population Pop_s.
3) Build a regularity model from Pop_s as in the modeling step in [3], and sample 100 solutions exactly on the obtained model as in Fig. 3(a).
4) Build a univariate marginal product model [19] from Pop_s, and sample 100 solutions from the obtained model.
5) Use SBX [27] to generate 100 solutions from Pop_s.
These compared approaches have different outputs. The regularity model outputs analytical models (ζ with x̄ and U_i (i = 1, ..., m − 1)), whereas EDA and SBX produce solutions. Therefore, we use solutions sampled from the obtained regularity model as in Fig. 3(a) for a fair comparison with EDA and SBX. Fig. 6 presents the average distance to the true PS of the 100 solutions generated by each approach on the four test instances with different numbers of decision variables under two different noise levels (σ = 0.1 and σ = 0.2). The average distance of the solutions in Pop_s to the true PS is also presented in this figure as a reference. It is clear that on the instances with two or three decision variables, all these approaches perform similarly. Their obtained solutions have about the same average distance to the true PS. However, on all the four instances with more than four decision variables, the average distance of new solutions generated by the regularity model to the true PS is smaller than that of the solutions in Pop_s. Recalling that Pop_s was selected in noisy environments, this suggests that the regularity model does have a de-noising ability. In contrast, the average distance of the new solutions generated by SBX is about the same as that of the solutions in Pop_s. Therefore, SBX is unable to de-noise. The EDA approach can de-noise on F3 and F4 with a large number of decision variables, but it is not as good as the regularity model approach. A major reason is that the EDA approach does not make use of the PS regularity property. With the PS regularity property, the dimension of the model is reduced from n to m − 1. Therefore, when n is larger than m − 1, which often happens in MOPs, the regularity model, as a de-noising tool, is more effective than the EDA approach.
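The efficiency measure used in this comparison, the average of the shortest distances from a population to uniform samples of the true PS, can be sketched as below. The helper `ps_samples_f1` encodes only the PS stated for F1 and F2 (0 ≤ x_1 ≤ 1, x_i = 0.5 for i ≥ 2); both function names and the default number of samples are assumptions made for illustration.

```python
import numpy as np

def average_distance_to_ps(pop, ps_samples):
    """Mean over pop of the Euclidean distance to the nearest PS sample."""
    pop = np.asarray(pop, dtype=float)
    ps = np.asarray(ps_samples, dtype=float)
    # Pairwise distance matrix of shape (len(pop), len(ps)).
    d = np.linalg.norm(pop[:, None, :] - ps[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def ps_samples_f1(n_vars, n_points=1001):
    """Uniform samples of the PS stated for F1/F2: x_1 in [0, 1], x_i = 0.5."""
    S = np.full((n_points, n_vars), 0.5)
    S[:, 0] = np.linspace(0.0, 1.0, n_points)
    return S
```

The measure is only as accurate as the sampling density of the PS, so the number of PS samples should be large relative to the distances being compared.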

D. Motivation
With the advantages of models and the regularity property, the regularity model is more effective than other model-based methods in de-noising. Having this in mind, we believe that a regularity model on the obtained nondominated solution set can help existing MOEAs for noise-free problems to deal with noisy MOPs. Therefore, we will focus on how the regularity model works for noisy problems in Section IV and how to use the regularity model for improving existing MOEAs in Section V.

IV. PROPOSED REGULARITY MODEL FOR DE-NOISING
A simple implementation of the regularity model has been proposed for noise-free MOPs in [3]. In this section, we modify it for noisy MOPs.
Local PCA [22] rather than K-means [28] is employed to classify the population into K clusters (i.e., K linear models), because it suits the task of manifold division. Considering a PS with K manifold segments, each cluster C_j contains one (m − 1)-D manifold, i.e., the contribution of the principal components should concentrate on the first (m − 1) components as

$$\frac{\sum_{i=1}^{m-1} \lambda_i^j}{\sum_{i=1}^{n} \lambda_i^j} \ge P_0 \qquad (6)$$

where λ_i^j is the ith eigenvalue in C_j and P_0 is a parameter of PCA. P_0 is usually set to 0.7, 0.8, or 0.9 [29]; we set it to 0.9 in this paper. However, a clustering method with a fixed number of clusters K cannot suit all different distributions well. In view of this, we design a self-adaptive method in Algorithm 1. The self-adaptive local PCA tries to cluster the population with an increasing K and stops when all the clusters satisfy (6) [i.e., the population is divided into K (m − 1)-D manifolds].
The minimal cluster size needed to build an (m − 1)-D linear model is m. That is the reason why we iterate K from 1 to N/m in Algorithm 1. Since the PS is a piecewise (m − 1)-D manifold, there would not be as many as N/m segments in most cases. Our method stops once K (m − 1)-D manifolds are found. Therefore, it does not incur any heavy computational cost.
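The self-adaptive loop over K can be sketched as follows. Two assumptions are made and should be read as such: condition (6) is taken to be the cumulative eigenvalue ratio of the first m − 1 components being at least P_0, and a plain K-means routine is swapped in for Local PCA [22] purely to keep the sketch self-contained. All names below are illustrative.

```python
import numpy as np

def explained_ratio(X, m):
    """Share of variance captured by the first m-1 principal components."""
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative values
    total = eigvals.sum()
    return 1.0 if total == 0 else float(eigvals[: m - 1].sum() / total)

def kmeans(X, K, rng, iters=50):
    """Plain K-means, used here only as a stand-in for Local PCA."""
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels  # labels from the last assignment step

def adaptive_clusters(X, m, p0=0.9, seed=0):
    """Increase K until every cluster has >= m points and meets the ratio test."""
    X = np.asarray(X, dtype=float)
    if len(X) < m:
        raise ValueError("need at least m solutions")
    rng = np.random.default_rng(seed)
    for K in range(1, len(X) // m + 1):
        labels = kmeans(X, K, rng)
        clusters = [X[labels == k] for k in range(K)]
        if all(len(c) >= m and explained_ratio(c, m) >= p0 for c in clusters):
            return K, labels
    return K, labels  # fall back to the largest K tried
```

Because K-means partitions by distance to centroids rather than by distance to local subspaces, this stand-in may need a larger K than Local PCA on curved manifolds.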

B. Sampling
Aiming at noisy MOPs, we improve the sampling step especially in two aspects [endpoint maintenance and uniform sampling matrix (USM)].
1) Endpoint Maintenance: In the regularity model, only the first (m − 1) principal components U_i (i = 1, ..., m − 1) of ζ are considered. We define either point x_i^min or x_i^max [shown in (7) and (8)] as an endpoint P in the decision space, which contributes to the spread of the PS. Because of noises, the solution set changes considerably from generation to generation. That is the reason why endpoints (the spread of the PS) are hard to maintain. There are two cases between two generations: the model ζ changes either significantly or slightly. When ζ changes significantly, its location and shape change significantly, and the endpoints of the previous generation may not lie on the new ζ; thus, they do not need to be maintained. However, when ζ changes slightly, its location and shape change slightly but the spread may still change. In this case, the endpoints of the previous generation may lie on the new ζ, and they need to be kept in the new ζ to maximize the spread of the PS.
The changing degree of ζ between generations can be quantitatively calculated (see Fig. 7). We record endpoints in every generation and maintain them as in Algorithm 2. When an endpoint from the previous generation is on ζ, the model has changed only slightly between generations, and the endpoint needs to be retained for the calculation of the points x_i^min and x_i^max for the tth generation by (7) and (8). Thus, the spread of the PS can be kept well between different generations.
2) Uniform Sampling Matrix: As shown in (3), the sampling step samples uniformly in an (m − 1)-D cubic space. Uniform sampling from the regularity model can provide better diversity for the population. We randomly sample a matrix rand(N, m − 1) in every generation and accumulate these matrices to form a relatively uniform sampling matrix. Our algorithm adds new random samples and deletes extra samples in every generation so that the USM becomes more uniform. Some diversity maintenance strategies in MOEAs can achieve the deleting task, such as the crowding distance in NSGA-II [1], environment selection in SPEA2 [4], and the harmonic distance in [30].
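The add-then-delete update of the USM can be sketched as follows. The deletion rule used here, repeatedly removing the row whose nearest neighbor is closest, is a simple stand-in chosen for brevity; it is not the crowding distance, SPEA2 environment selection, or harmonic distance cited above. The function name `update_usm` is an assumption.

```python
import numpy as np

def update_usm(usm, n_rows, rng):
    """Add n_rows random rows, then prune the most crowded rows back to n_rows."""
    dim = usm.shape[1]
    pool = np.vstack([usm, rng.random((n_rows, dim))])  # accumulate new samples
    while len(pool) > n_rows:
        d = np.linalg.norm(pool[:, None, :] - pool[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)           # ignore self-distances
        nearest = d.min(axis=1)               # nearest-neighbor distance per row
        pool = np.delete(pool, np.argmin(nearest), axis=0)
    return pool
```

A typical use would initialize `usm = rng.random((N, m - 1))` and call `update_usm(usm, N, rng)` once per generation, so that the smallest gaps between rows are progressively removed.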

V. IMPROVE EXISTING MOEAS BY REGULARITY MODEL
MOEAs have been developed to handle different MOPs. However, existing MOEAs for noisy problems have not taken full advantage of MOEAs designed for different types of MOPs. Therefore, the main purpose of this paper is to improve the performance of any existing MOEA on a wide range of MOPs with noises by adding the regularity model as a part of it. To show how to embed the regularity model in existing MOEAs, we use NSGA-II [1] as an example in the following sections.

A. Example MOEA Embedded With Regularity Model
In Section III, we have shown that the regularity model can efficiently reduce noises in the population. Building a regularity model for the nondominated solution set in every generation can help MOEAs to de-noise. Thus, the nondominated solution set is modeled by a regularity model in every generation. To be compatible with population-based MOEAs, the regularity model samples a population from the model as shown in Section IV-B. The final output is a set of samples from the regularity model obtained in the last generation.
As shown in Fig. 8, the regularity model can be embedded in existing MOEAs as an additional reproduction operator, which is similar to that of [31]. The other parts of the MOEA are not affected by the regularity model. In this paper, we embed the regularity model in NSGA-II; the resulting algorithm is referred to as RM-NSGA-II.

B. Extra De-Noising
To provide more precise sample points for the regularity model, we add a preprocessing step [an extra de-noising (ED) strategy] to RM-NSGA-II. Inspired by extended averaging [16], similar individuals in the decision space can be viewed as having approximately equal objective values. As shown in Fig. 9, the observed objective values are far from their true objective values due to noises, but the average over neighbors (gray dots) in the decision space approximates the true objective value.
Therefore, a set is kept to record the solutions evaluated during the optimization process. After the objectives of a new solution are observed, its decision variables are compared with those of the existing solutions in the set to check whether any similar solutions have been visited before. If there are similar solutions in the set, the averages of both the decision variables and the objective values are used for the solution. We use the L1-norm-based distance as the similarity measure.
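The ED strategy described above can be sketched with a small archive class. The similarity threshold `radius` is a hypothetical parameter, since the text does not state how "similar" is quantified beyond the use of an L1-norm-based distance; the class and method names are likewise illustrative.

```python
import numpy as np

class NeighborAverager:
    """Archive that averages a new solution with its recorded L1-neighbors."""

    def __init__(self, radius):
        self.radius = radius   # hypothetical similarity threshold (L1 distance)
        self.X = []            # recorded decision vectors
        self.F = []            # recorded observed objective vectors

    def process(self, x, f_obs):
        x = np.asarray(x, dtype=float)
        f_obs = np.asarray(f_obs, dtype=float)
        X_old = np.array(self.X).reshape(len(self.X), x.size)
        F_old = np.array(self.F).reshape(len(self.F), f_obs.size)
        # L1-norm distance from the new solution to every recorded one.
        near = np.abs(X_old - x).sum(axis=1) <= self.radius
        self.X.append(x)
        self.F.append(f_obs)
        if not near.any():
            return x, f_obs    # no similar solution has been recorded yet
        # Average decision variables and objectives over neighbors and itself.
        x_avg = np.vstack([X_old[near], x]).mean(axis=0)
        f_avg = np.vstack([F_old[near], f_obs]).mean(axis=0)
        return x_avg, f_avg
```

On noise-free problems this averaging mixes the exact objective values of distinct solutions, which is consistent with the degraded noise-free performance reported in the experiments.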

A. Test Problems, Evaluation Metrics, and Parameter Settings
In order to evaluate our new idea for noisy MOPs, we employ the ZDT [32], DTLZ [33], and WFG [34] problems with different noise levels as the test problems in our experiments. The details of those test problems are shown in Table II. We adopt generational distance (GD) [35], minimal spacing [36], and inverted generational distance (IGD) [37] to evaluate the performance of different algorithms. GD is the average distance from the obtained PF to the true PF, which describes the convergence of the obtained PF. Minimal spacing is a metric for uniformity, which uses nonduplicated distances for its final calculation. IGD is the average distance from the true PF to the obtained PF, whose value reflects both convergence and diversity of the obtained PF. In the following experiments, the stopping criterion is set as 50 000 function evaluations, and the reproduction methods in the compared algorithms are set as SBX (η = 15 and probability = 1) and polynomial mutation (η = 15 and probability = 0.1). All the experiments are repeated independently 30 times.
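The two distance-based metrics can be sketched directly from the descriptions above, as plain averages of nearest-neighbor Euclidean distances. Note that this follows the wording "average distance" used here; some formulations of GD in the literature use a root of summed squared distances instead, so this sketch should not be taken as the exact formula of [35].

```python
import numpy as np

def _mean_min_dist(A, B):
    """Mean over rows of A of the Euclidean distance to the nearest row of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def gd(obtained_pf, true_pf):
    """Generational distance: from the obtained PF to the true PF."""
    return _mean_min_dist(obtained_pf, true_pf)

def igd(obtained_pf, true_pf):
    """Inverted generational distance: from the true PF to the obtained PF."""
    return _mean_min_dist(true_pf, obtained_pf)
```

A single converged point yields a GD of zero but a large IGD, which is why IGD is said to reflect diversity as well as convergence.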

B. Experiments for Extra De-Noising
The ED strategy is a special noise handling operation in RM-NSGA-II, whose effect on RM-NSGA-II is analyzed in this section. We compare the proposed algorithm with and without the ED strategy on ZDT2 and DTLZ2. The results on GD, minimal spacing, and IGD are shown in Table III and are analyzed by the Wilcoxon signed-rank test [38]. For the noise-free problems, the algorithm without the ED strategy performs better than that with the ED strategy. This is because averaging in neighborhoods adds uncertainty to the function observations of noise-free problems. However, for the noisy problems, the ED strategy helps the algorithm to build an accurate model.
From the results in Table III, we find that the ED strategy improves the performance of RM-NSGA-II for noisy MOPs on both convergence and diversity. Although the improvement on the two-objective problem ZDT2 is small, the improvement on the three-objective problem DTLZ2 is significant. In short, the ED strategy can help the proposed algorithm to obtain a better model in noisy environments.

C. Experiments for Endpoint Maintenance
In this section, we analyze the effect of the endpoint maintenance strategy in the regularity model. We compare the proposed algorithm with and without the endpoint maintenance strategy on ZDT2 and DTLZ2 (σ = 0.1 and σ = 0.2). The distances to the true endpoints over generations, with and without the endpoint maintenance strategy, are shown in Fig. 10. With the endpoint maintenance strategy, the proposed algorithm can obtain solutions close to the true endpoints in a noisy environment, which leads to a larger PS spread than that without the endpoint maintenance strategy.

D. Experiments for Uniform Sampling Matrix
In this section, we analyze the USM in the regularity model. We compare the proposed algorithm with and without the USM on ZDT2 and DTLZ2. The results on GD, minimal spacing, and IGD are shown in Table IV and are analyzed by the Wilcoxon signed-rank test [38]. As the USM has no relation to the modeling in RM-NSGA-II, GD is not influenced by the USM. For minimal spacing, the improvement is significant, especially on ZDT2. The reason is that, over generations, the USM becomes increasingly more uniform than a random sampling matrix. Additionally, the value of IGD is significantly improved on both ZDT2 and DTLZ2 because of the better diversity provided by the USM.

E. Comparative Experiments
In order to test the performance of RM-NSGA-II on noisy MOPs, we employ several different MOEAs as compared algorithms in our experiments (shown in Table V). They are NSGA-II, which is one of the most well-known MOEAs and the base of RM-NSGA-II; RM-MEDA, which is a regularity model-based MOEA [3]; and MOSPEA [11] and NTSPEA [14], which are MOEAs for noisy problems.
Table VI shows the IGD values of the compared algorithms on the ZDT problems. For the noise-free ZDT problems, NSGA-II and RM-MEDA outperform the other algorithms. For noisy ZDT1-3, RM-NSGA-II outperforms the other compared algorithms, but for noisy ZDT4, RM-NSGA-II cannot perform better than the MOEAs for noisy problems (NTSPEA and MOSPEA). As ZDT4 is a multimodal problem, noises make algorithms easily trapped in local optima, and it is hard for the regularity model to jump out of local optima due to its low exploration ability. In contrast, NTSPEA and MOSPEA have strategies to improve the convergence for noisy MOPs. That is the reason why RM-NSGA-II has poor performance on ZDT4, which is similar to the case of F4 in Section III-C.
Table VII shows the IGD values of the compared algorithms on the DTLZ problems. For noise-free DTLZ problems, both NSGA-II and RM-MEDA are the winners. The results on noisy DTLZ problems are similar to those on ZDT. RM-NSGA-II can outperform the other compared algorithms on noisy DTLZ2 and DTLZ4, but cannot outperform them on noisy DTLZ1 and DTLZ3, which are multimodal, because the regularity model concentrates on local optima. The less satisfactory IGD of RM-NSGA-II on DTLZ4 comes from poor diversity. The mapping of DTLZ4 from PS to PF is not uniform, but RM-NSGA-II samples uniformly on the PS, which leads to poor diversity on the PF.
Table VIII shows the IGD values of the compared algorithms on the WFG problems. For the noise-free WFG problems, NSGA-II outperforms the other algorithms on WFG1, WFG4, WFG5, and WFG8, and RM-MEDA outperforms the other algorithms on WFG2 and WFG7. RM-NSGA-II outperforms the other algorithms on most noisy WFG problems except for WFG2 and WFG7, where RM-MEDA is the winner.
To explore the limit of the noise level that the compared algorithms can deal with, we test these five algorithms on ZDT2 and DTLZ2 with higher noise levels (up to σ = 0.5) for 30 independent runs. The curves of the IGD value versus the σ value are shown in Fig. 11.
The IGD values of all the compared algorithms increase as the σ value increases, and RM-NSGA-II has the smallest IGD values. RM-NSGA-II can handle problems with noises, large or small, better than all other algorithms.
2) Discussion: From the results in the last section, we can summarize the characteristic behaviors of the compared algorithms. Fig. 12 shows the GD values of RM-NSGA-II, NSGA-II, RM-MEDA, MOSPEA, and NTSPEA over generations on DTLZ2 (σ = 0, σ = 0.05, σ = 0.1, and σ = 0.2). For the noise-free DTLZ2, all the compared algorithms have similar convergence performance except for RM-MEDA. However, the situation changes when σ increases. In the cases with σ = 0.05 and σ = 0.1, MOEAs for noisy MOPs (RM-NSGA-II, MOSPEA, and NTSPEA) have better convergence performance than MOEAs for noise-free MOPs (NSGA-II and RM-MEDA). MOSPEA converges fast in the first 10 000 function evaluations, but RM-NSGA-II outperforms MOSPEA after 10 000 function evaluations; NTSPEA has a smaller GD value than NSGA-II but a larger GD value than RM-NSGA-II and MOSPEA. When σ grows to 0.2, the advantage of RM-NSGA-II over MOSPEA on GD increases, and NTSPEA cannot outperform NSGA-II due to the large number of re-evaluations required at the high noise level.
NSGA-II searches for individuals to form the nondominated solution set, while RM-NSGA-II obtains a regularity model of the nondominated solution set. Therefore, NSGA-II can obtain a small number of solutions that are very close to the true PFs but distributed randomly. As NSGA-II has no de-noising strategy, it cannot obtain the observed objective values precisely or maintain diversity reasonably, which is the reason why the performance of NSGA-II on noisy problems is unstable. In contrast, RM-NSGA-II considers the solution set as a whole, which helps de-noising and obtaining a solution set with both satisfactory convergence and diversity. However, building (m − 1)-D models in the decision space may limit RM-NSGA-II. Because of the mapping from PS to PF, a large spread of the PS might not lead to a large spread of the PF. This is the reason why RM-NSGA-II cannot have a larger spread of the PF than NSGA-II on some WFG problems, WFG2 for instance (their solution sets are more complicated than those of the ZDT problems, and NSGA-II emphasizes extreme points in the objective space). That is also the reason why RM-NSGA-II loses its performance on the multimodal problem DTLZ3.
RM-MEDA uses the regularity model as RM-NSGA-II does, but the results show that it is not suitable for noisy MOPs. Table IX lists the significant differences between RM-NSGA-II and RM-MEDA.
For noisy MOPs, RM-NSGA-II only uses nondominated solutions rather than the whole population, as RM-MEDA does, to build the model, because RM-NSGA-II aims to capture the manifold of the nondominated set. In contrast, RM-MEDA aims to learn the manifold of the population by the regularity model to promote optimization. Because of these different aims, RM-MEDA introduces noise ε in the model and extends the manifold to add diversity, whereas RM-NSGA-II only keeps the manifold strictly from the obtained model. RM-NSGA-II also adds special strategies to maintain endpoints and to de-noise, which RM-MEDA never takes into account. Additionally, RM-NSGA-II identifies the number of clusters adaptively and maintains the sampling matrix as a USM. All the above differences make RM-NSGA-II more effective than RM-MEDA in solving noisy MOPs.
For noise-free problems, the ED strategy in RM-NSGA-II still averages the exact objective values of different solutions in a small neighborhood, which can lead to inaccurate objective values for RM-NSGA-II; thus, the performance of RM-NSGA-II is lowered (shown in Section VI-B). That is the reason why RM-NSGA-II cannot perform better than RM-MEDA on the problems without any noises.
MOSPEA is a representative of MOEAs for noisy MOPs with the probabilistic Pareto ranking, which results in its good convergence ability on noisy problems.That is the reason why MOSPEA performs better on hard problems such as DTLZ1, DTLZ3, and ZDT4.However, probabilistic Pareto ranking cannot provide satisfactory diversity in MOSPEA, especially when the size of noises increases.Both RM-NSGA-II and MOSPEA aim at noisy MOPs, hence we only compare their performance on noisy problems.Probabilistic Pareto ranking performs better than building a model on convergence when the structure of PS is simple (see the results on the ZDT and DTLZ problems).However, when the PS becomes complicated, building a model works better than probabilistic Pareto ranking.
NTSPEA adopts re-evaluation, which reduces the effect of noise. With relatively accurate objective evaluations, NTSPEA can obtain solutions with better convergence even on DTLZ3, whereas RM-NSGA-II cannot obtain approximated models on that problem. However, the model construction in RM-NSGA-II works on the WFG problems, where it leads to better convergence than NTSPEA.
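The effect of re-evaluation can be sketched as follows: averaging n independent evaluations with noise N(0, σ²) reduces the standard deviation of the estimate to σ/√n, at the cost of n function evaluations per solution. The helper `reevaluate` and the toy objective are hypothetical and only illustrate averaging in general, not the specific scheme of NTSPEA.

```python
import numpy as np

def reevaluate(f, x, sigma, n_samples, rng):
    """Average n_samples noisy evaluations F(x) + eps, eps ~ N(0, sigma^2 I)."""
    fx = np.asarray(f(x), dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_samples, fx.size))
    return (fx + noise).mean(axis=0)

def toy_objectives(x):
    # Hypothetical bi-objective function used only for this illustration.
    return np.array([x[0], 1.0 - x[0]])

rng = np.random.default_rng(0)
x = np.array([0.3])
sigma, n_samples = 0.2, 4

# Repeat the averaged evaluation many times to measure its spread.
estimates = np.array(
    [reevaluate(toy_objectives, x, sigma, n_samples, rng) for _ in range(2000)]
)
empirical_std = estimates.std(axis=0)
expected_std = sigma / np.sqrt(n_samples)  # 0.1 for sigma = 0.2, n = 4
```

Under a fixed evaluation budget, each re-evaluated solution consumes `n_samples` evaluations, so the gain in accuracy is paid for with fewer distinct solutions explored.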
In summary, existing MOEAs such as NSGA-II and RM-MEDA are insufficient for noisy MOPs because they were not designed to cope with noise. The two compared MOEAs for noisy MOPs, MOSPEA and NTSPEA, both fail to maintain diversity well because they focus on single individuals rather than on a model of the solution set. Compared with these algorithms, RM-NSGA-II improves NSGA-II on noisy MOPs. The regularity model can learn efficiently from noisy environments, but its performance on the spread may be limited by the (m − 1)-D model. Therefore, it is not good at multimodal problems.

VII. CONCLUSION
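In this paper, we have analyzed the de-noising performance of the regularity model in noisy environments and the behavior of the regularity model embedded in NSGA-II (RM-NSGA-II) for noisy MOPs. Owing to the effectiveness of the regularity model, RM-NSGA-II can obtain a solution set with both satisfactory convergence and diversity in noisy environments, as shown by our experiments. The contributions of this paper are summarized as follows.
1) De-Noising Ability of Regularity Model: Although the regularity model has been applied in RM-MEDA, this is the first time it has been used for handling noisy MOPs. In this paper, we find that the regularity model is effective on noisy MOPs. The reason is twofold: one is the natural de-noising characteristic of modeling, and the other is the complexity reduced through dimension reduction by considering the features of MOPs.
2) Improving Existing MOEAs on Noisy Problems: The regularity model is highly portable across existing MOEAs. Thus, MOEAs that are not good at noisy problems can be improved by embedding the regularity model. With the regularity model acting like a patch in the system of MOEAs, existing MOEAs can solve noisy MOPs.
Although the regularity model can help existing MOEAs to solve noisy MOPs satisfactorily, there are still several issues that should be studied in the future.
1) The convergence of RM-NSGA-II on multimodal problems such as DTLZ3 is not good enough.
2) In the regularity model, only an even disturbance around the PS is assumed. Hence, the regularity model may not perform well for discontinuous MOPs or MOPs with an asymmetrical mapping from the decision space to the objective space.
3) The diversity exploration ability of the regularity model on the WFG problems should be improved.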

Fig. 3. Sampling from the regularity model. (a) Sampling on the regularity model. (b) Sampling around the regularity model.

Fig. 6. Average distance of the selected population $Pop_s$ and the samples obtained by SBX, EDA, and the regularity model to the true PS on F1-F4 with 2-30 decision variables under different noise levels (σ = 0.1 and σ = 0.2).

Algorithm 1 Pseudo Code of the Self-Adaptive Local PCA
1: Parameters: Pop: the nondominated solution set in the current population; N: the size of Pop; m: the number of objectives.
2: For K = 1 : N/m
3: Divide Pop into K clusters by local PCA [3], [22].
as an example, endpoints A and B are two endpoints in the previous generation, and ζ is the model in this generation. Whether these two points lie on ζ can be determined from their projections onto ζ. The projection distance $d_p^A$ of A is much longer than its offset distance $d_\perp^A$, thus A is on ζ. By adding A, the spread of ζ increases. However, the situation of B is different, thus B is not on ζ. Therefore, the ratio $d_p^2/(d_p^2 + d_\perp^2)$ of an endpoint $x_e$ shows the degree of change: when the ratio is higher than 0.95, ζ changes only slightly and $x_e$ needs to be maintained. The distances $d_p$ and $d_\perp$ of $x_e$ can be calculated as shown below
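The ratio test can be sketched in code under explicit assumptions. The defining equations for $d_p$ and $d_\perp$ are not reproduced in this excerpt, so the sketch assumes that ζ is represented by a center point and an orthonormal basis of its principal directions (as produced by PCA), that $d_p$ is the length of the component of $x_e$ − center inside that subspace, and that $d_\perp$ is the length of the orthogonal remainder. The names `endpoint_ratio` and `keep_endpoint` are hypothetical.

```python
import numpy as np

def endpoint_ratio(x_e, center, basis):
    """Return d_p^2 / (d_p^2 + d_perp^2) for endpoint x_e.

    basis: (k, n) array whose rows are orthonormal principal directions
    of the model (an assumed representation of the model zeta).
    """
    v = np.asarray(x_e, dtype=float) - np.asarray(center, dtype=float)
    coords = np.asarray(basis, dtype=float) @ v   # coordinates along the model
    d_p2 = float(coords @ coords)                 # squared projection distance
    d_perp2 = max(float(v @ v) - d_p2, 0.0)       # squared offset distance
    total = d_p2 + d_perp2
    # A point coinciding with the center is treated as lying on the model.
    return d_p2 / total if total > 0.0 else 1.0

def keep_endpoint(x_e, center, basis, threshold=0.95):
    """Maintain the endpoint when the ratio exceeds the 0.95 threshold."""
    return endpoint_ratio(x_e, center, basis) > threshold

# A 1-D model along the first axis in a 2-D decision space.
center = np.zeros(2)
basis = np.array([[1.0, 0.0]])
A = np.array([10.0, 0.5])   # far along the model, small offset
B = np.array([1.0, 1.0])    # offset comparable to its projection
```

In this toy setting A gives a ratio close to 1 and is maintained, while B gives 0.5 and is discarded, mirroring the roles of A and B in the description above.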

Algorithm 2 Pseudo Code of the Endpoint Maintenance for an Obtained Model ζ in the t-th Generation
1: Parameters: $P^l_{t-1}$: the l-th endpoint in the (t − 1)-th generation; Pop: the solution set for building the model; ζ: the obtained model in the t-th generation.
2: For each $P^l$

7: Calculate $a_i$ and $b_i$ as in Equations (4) and (5), where x ∈ Pop, i = 1, ..., m − 1.
8: Add the points $x^i_{\min}$ and $x^i_{\max}$ (shown in Equations (

Fig. 8. Flow-chart of embedding the regularity model in MOEAs. The solid line represents the general flow of MOEAs, and the dotted line represents the flow of building the regularity model on the nondominated solution set (modeling after the clustering by self-adaptive local PCA, endpoint maintenance, and sampling by USM).

Fig. 10. Distances to the true endpoints of the proposed algorithms with and without the endpoint maintenance strategy over generations on ZDT2 and DTLZ2 (σ = 0.1 and σ = 0.2).

TABLE II CHARACTERISTICS OF TEST PROBLEMS

TABLE III RESULTS OF THE PROPOSED ALGORITHMS WITH AND WITHOUT THE ED STRATEGY ANALYZED BY WILCOXON SIGNED-RANK TEST. THE SIGNIFICANT RESULTS ARE IN BOLD FACE (SIGNIFICANCE LEVEL = 0.05)

TABLE IV RESULTS OF THE PROPOSED ALGORITHMS WITH AND WITHOUT THE USM ANALYZED BY WILCOXON SIGNED-RANK TEST. THE SIGNIFICANT RESULTS ARE IN BOLD FACE (SIGNIFICANCE LEVEL = 0.05)

TABLE V EXPLANATIONS FOR COMPARED ALGORITHMS

model-based MOEA, MOSPEA, which is a probabilistic Pareto ranking-based MOEA for noisy MOPs, and NTSPEA, which is an averaging-based MOEA for noisy MOPs. We conduct the experiments on the ZDT, DTLZ, and WFG problems. All the experiments are terminated after 50 000 function evaluations and repeated for 30 independent runs. Other parameter settings have been shown in Section VI-A. The comparative results are shown in Tables VI-VIII. 1) Results: Table

TABLE VI IGD VALUES OF RM-NSGA-II, NSGA-II, RM-MEDA, MOSPEA, AND NTSPEA ANALYZED BY WILCOXON SIGNED-RANK TEST ON THE ZDT PROBLEMS. THE SIGNIFICANT RESULTS ARE IN BOLD FACE (SIGNIFICANCE LEVEL = 0.05)

TABLE VII IGD VALUES OF RM-NSGA-II, NSGA-II, RM-MEDA, MOSPEA, AND NTSPEA ANALYZED BY WILCOXON SIGNED-RANK TEST ON THE DTLZ PROBLEMS. THE SIGNIFICANT RESULTS ARE IN BOLD FACE (SIGNIFICANCE LEVEL = 0.05)

TABLE VIII IGD VALUES OF RM-NSGA-II, NSGA-II, RM-MEDA, MOSPEA, AND NTSPEA ANALYZED BY WILCOXON SIGNED-RANK TEST ON THE WFG PROBLEMS. THE SIGNIFICANT RESULTS ARE IN BOLD FACE (SIGNIFICANCE LEVEL = 0.05)

TABLE IX COMPARISON BETWEEN RM-NSGA-II AND RM-MEDA

1) De-Noising Ability of Regularity Model: Although the regularity model has been applied in RM-MEDA, this is the first time it has been used for handling noisy MOPs. In this paper, we find that the regularity model is effective on noisy MOPs. The reason is twofold: one is the natural de-noising characteristic of modeling, and the other is the complexity reduced through dimension reduction by considering the features of MOPs. 2) Improving Existing MOEAs on Noisy Problems: The regularity model is highly portable across existing MOEAs. Thus, MOEAs that are not good at noisy problems can be improved by embedding the regularity model. With the regularity model acting like a patch in the system of MOEAs, existing MOEAs can solve noisy MOPs. Although the regularity model can help existing MOEAs to solve noisy MOPs satisfactorily, there are still several issues that should be studied in the future.