On the Benefits and Risks of Using Fitness Sharing for Multimodal Optimisation

Fitness sharing is a well-known diversity mechanism inspired by the idea that individuals in the population that are close to each other have to share their fitness, similarly to how species in nature occupying the same ecological environment have to share resources. Thus, by derating the fitness of close individuals one hopes to encourage the population to spread out. Previous runtime analyses of fitness sharing studied a variant where selection was based on populations instead of individuals. We study the conventional fitness sharing mechanism based on individuals and use runtime analysis to highlight its benefits and dangers on the well-known bimodal test problem TwoMax, where diversity is crucial for finding both optima. In contrast to population-based sharing, a (2+1) evolutionary algorithm (EA) with conventional fitness sharing is not guaranteed to find both optima in polynomial time even when problem-specific knowledge is used to estimate the distance between individuals; however, a (μ+1) EA with μ ≥ 3 always succeeds in expected polynomial time. We further show theoretically and empirically that large offspring populations in (μ+λ) EAs can be detrimental, as creating too many offspring in one particular area of the search space can make all individuals in this area go extinct. We conclude the paper with an empirical study indicating that similar conclusions may be drawn when using the genotypic distance that has to be relied upon when no problem-specific knowledge is available.


Introduction
Many real-world optimisation problems are multimodal by nature, i.e., they have a number of different local optima and may have more than one global optimum. Nature-inspired techniques have proven to be very popular and powerful for tackling these types of problems [2], and different optimisation goals have been discussed in the literature [3]. Taking a global perspective, one is for example interested in locating a single (local or global) optimum. However, in practice it is often more important to identify a multitude of different optima, either in a simultaneous or sequential fashion. Our analyses therefore concentrate on the multi-local aspect of multimodal optimisation, i.e., where the goal is that the set of local optima is contained in the population by the end of the run.
In evolutionary computation, diversity mechanisms are commonly used to tackle multimodal optimisation problems [4][5][6], particularly in the context of a multi-local perspective. The main idea is to try and introduce niches in the population to prevent the algorithm from converging to a single solution, such that different niches explore different peaks of the fitness landscape. Thus, in this context, niches are often understood as narrow, connected areas of the search space. In contrast to the numerous and widespread applications, the amount of theoretical research rigorously proving the effectiveness of diversity mechanisms is limited. Nevertheless, some previous theoretical work on diversity mechanisms for multimodal optimisation using nature-inspired techniques exists. Most notably, runtime analyses are available where the performance of diversity mechanisms is evaluated in terms of their optimisation time (i.e., the number of fitness function evaluations required to find the global optimum or a set of optima).
Friedrich et al. [7] showed that diversity mechanisms may be necessary by analysing population-based evolutionary algorithms (EAs) for a bimodal function called TwoMax. TwoMax is a function of unitation, that is, the fitness only depends on the number of 1-bits |x| in the considered search point x. Hence, while the bit string x is the genotype, the number of ones |x| may be considered the corresponding phenotype, from which the fitness max{|x|, n − |x|} is then derived (where n is the length of the bit string). The function is easy from the perspective of localising the two local optima in a sequential fashion, for example by using local search coupled with a restart strategy. On the other hand, the function is very challenging from a multi-local perspective, since the two local optima are as far away from each other as possible. Friedrich et al. [7] proved that a population-based EA with realistic population size (i.e., at most sublinear in the problem size) would fail to locate both optima of TwoMax efficiently if no diversity mechanism is used. They then showed that some diversity mechanisms make the algorithm efficient (i.e., fitness sharing, which we consider here in more detail, and deterministic crowding, where offspring compete for survival only with their parents) while others do not (i.e., avoiding genotype duplicates and avoiding duplicates of equal fitness). Recently, it has been proven that also the clearing mechanism, where resources are only assigned to the best individual of a subpopulation, makes population-based EAs efficient for the TwoMax function [8].
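For concreteness, the TwoMax fitness described above can be sketched in a few lines of Python; the function name `twomax` and the list-of-bits representation are our own choices for illustration.

```python
def twomax(x):
    """TwoMax fitness of a bit string x (a sequence of 0s and 1s): the
    better of OneMax (|x|) and ZeroMax (n - |x|).  Both the all-zeros
    and the all-ones string are global optima with fitness n."""
    ones = sum(x)  # |x|, the number of 1-bits -- the phenotype
    return max(ones, len(x) - ones)
```

Note that the fitness depends on x only through the phenotype `sum(x)`, which is exactly what makes TwoMax a function of unitation.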
Diversity mechanisms have also been shown to enhance the capabilities of the recombination operator by favouring the emergence of dissimilar individuals. Fischer and Wegener [9] were the first to rigorously study this effect by analysing the performance of a genetic algorithm (GA) using fitness sharing on colouring problems inspired by the Ising model. They showed that the diversity mechanism helps the exploration of large plateaus and proved a speedup of order n over the simple (1+1) EA on one-dimensional Ising models with n nodes. Sudholt [10] further proved an exponential speedup for a GA for the Ising model on trees, using the fact that fitness sharing is powerful enough to allow the algorithm to tunnel through shallow fitness valleys. Recently, Dang et al. [11] showed that several diversity mechanisms allow an exponential speedup in the time required to escape from the local optima via recombination for the standard Jump_k benchmark function. Diversity mechanisms can also enhance the capabilities of the recombination operator for hillclimbing the OneMax function.
A (2+1) GA with genotype diversity optimises the function in half the expected time (i.e., (e/2 + o(1))n log n) required by EAs only using standard bit mutation with fixed mutation rate [12], while GAs without diversity have been experimentally shown to be slower, and the best known upper bound on their expected runtime is (3/4 + o(1))en log n [13]. It is also worth noting that an analysis for the Balance benchmark function has shown that diversity mechanisms allow the efficient optimisation of deceptive functions in the context of dynamic optimisation [14].
Fitness sharing [15,16] is amongst the best known diversity mechanisms. It is featured in many surveys [4][5][6] and used in many practical applications (see, e.g., [17][18][19][20][21], to name just a few). In this scheme niche formation is induced by using a sharing function that derates the fitness of an individual by an amount related to its 'distance' to the rest of the population. It is inspired by the idea that individuals in the same niche of the search space, i.e., individuals that are close to each other, have to share resources (their fitness), similarly to how species in nature occupying the same ecological environment have to share resources. As a result, individuals are encouraged to increase their distance from other individuals, and thus to spread out in the search space.
The effectiveness of fitness sharing may vary considerably according to which measure is used to define the distance between individuals. In particular, different fitness sharing functions are obtained according to how the distance between individuals is defined [4]. Fitness sharing can use distances defined on a genotypic or phenotypic level [4][5][6][16][22]. Genotypic sharing [4] uses genotypic distances like the Hamming distance to measure how close individuals of the population are to each other. Phenotypic sharing refers to distances in the decoded parameter space [22], which in turn depends on the encoding used. For example, if the genotype encodes a vector of real values, the phenotypic distance is commonly defined as the Euclidean distance between two such vectors [22]. In our case of functions of unitation, the phenotypes correspond to the number of ones in a search point, hence the number of ones may be used as distance measure to optimise the class of functions of unitation [23]. Note that when using phenotypic sharing in this way, we are in fact using problem-specific knowledge: we exploit the fact that we are dealing with a function of unitation. Such knowledge may not be available in a general black-box setting, and in this case only genotypic sharing may be used.
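The distinction between the two distance notions can be made concrete with a small sketch (the function names are our own). For functions of unitation, two strings can be far apart genotypically while being almost identical phenotypically, which is exactly why the choice of distance matters for sharing.

```python
def hamming_distance(x, y):
    """Genotypic distance: the number of bit positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def phenotypic_distance(x, y):
    """Phenotypic distance for functions of unitation: | |x| - |y| |,
    i.e. the difference in the number of 1-bits."""
    return abs(sum(x) - sum(y))
```

For instance, (1,1,0) and (0,0,1) have maximal Hamming distance 3 but phenotypic distance only 1, so phenotypic sharing would treat them as members of the same niche.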
For the theoretical work presented in this paper we will use the phenotypic distance, for three main reasons. The first is that this allows us to highlight problems that may be encountered when using fitness sharing, even when problem knowledge is incorporated in the diversity mechanism. The second is that using phenotypic sharing allows for easier comparisons with previous results available in the literature, since these used the same distance measure [7,8]. The third reason is that the analysis is considerably simplified compared to genotypic sharing, while similar conclusions on algorithmic behaviour may be observed. This is indicated by the empirical analysis for genotypic sharing presented in Section 7.
Previous theoretical work on fitness sharing has concentrated on a somewhat unusual implementation of the sharing mechanism. Rather than selecting individuals based on their shared fitness f(x, P), selection was done on the level of populations, creating a population that maximises the overall shared fitness of the population [7][9][10][11][14]. While maximising the shared fitness of the population is indeed what is sought in fitness sharing, this approach has the drawback that the fitness of all the possible combinations of individuals needs to be examined. For large populations this is prohibitive, as the number of populations that need to be examined is (μ+λ choose λ) (see Section 2.1 for a detailed discussion). In this paper we analyse the performance of the conventional fitness sharing approach based on individuals, to match the approach taught in surveys and tutorials [4][5][6] and the way that fitness sharing is used in practice. As pointed out by Goldberg and Richardson [15], shared fitness values can be used with any selection mechanism. However, to allow for comparison with previous work on the effectiveness of fitness sharing for multimodal optimisation we use the same analytical framework, i.e., a standard (μ+λ) EA using the shared fitness values within the selection for replacement, and the same example function, i.e., the simple bimodal function TwoMax consisting of two different symmetric branches [7,24,25].
In the context of multimodal optimisation one crucial parameter of the algorithm is the population size, since this determines the number of local optima that can be found simultaneously.Lower bounds for the population size have been investigated in different settings [16,26] and this work further adds to the understanding of the influence of this parameter.
A (μ+1) EA using the unconventional approach (i.e., maximising the phenotypic shared fitness of the population) can efficiently optimise TwoMax for any population size μ ≥ 2 [7]. The reason is that, in any population, the individuals with the smallest and the largest number of ones are always accepted for the next generation. Our analysis shows that using the conventional (phenotypic) sharing approach leads to considerably different behaviours of evolutionary algorithms.
We first concentrate on the effects of the parent population in Section 4. A population of size μ = 2 is not sufficient to guarantee that the (μ+1) EA finds both optima in polynomial time. If the two individuals are initialised on the same branch, then there is a high probability that they will both find the same local optimum. Furthermore, there is a chance that the algorithm also fails when the two individuals are initialised on opposite branches. This leads to a worse failure probability than that of a simple crowding algorithm or that of a (1+1) EA that is restarted twice. On the other hand, Section 5 shows that for μ ≥ 3, once the population is close enough to one optimum, individuals descending the branch heading towards the other optimum are accepted. This threshold, which allows successful runs with probability 1, lies further away from the local optimum as the population size increases.
Concerning the effects of the offspring population, in Section 6 we show that large values of λ can be detrimental.
We rigorously prove that increasing the offspring population of a (μ+1) EA to a (μ+λ) EA, with μ = 2 and λ ≥ 2 a constant, results in overcrowding that can make a (sub-)population go extinct. For the special case of λ = 2 we also prove an increased failure probability. We complement this result with an empirical analysis which suggests that the (μ+λ) EA is successful if λ < μ/2 and that it almost always fails for λ ≥ μ. We conclude the paper with an empirical analysis indicating that similar algorithmic behaviour to that proven theoretically also occurs if no problem-specific knowledge is available and genotypic sharing is used. A preliminary version of this work, with parts of the results and without most of the proofs, can be found in [1].

Analytical framework
In our analyses, we consider a simple bimodal function consisting of two different symmetric branches (i.e., OneMax and ZeroMax), where both 0^n and 1^n are global optima (see Fig. 1). Formally, TwoMax(x) := max{|x|, n − |x|}. Moreover, we consider a standard (μ+λ) EA as shown in Algorithm 1, using standard bit mutation with mutation probability 1/n, uniform random selection of parents and truncation selection for replacement. However, instead of the raw fitness, it uses the shared fitness value in the truncation selection.
Algorithm 1 ((μ+λ) EA with fitness sharing).
1: Let t = 0 and initialise P_0 as a population of μ individuals chosen uniformly at random from {0, 1}^n.
2: repeat
3:   for i = 1, …, λ do
4:     Select a parent x ∈ P_t uniformly at random from the population.
5:     Let x_i := x. Flip each bit in x_i independently with probability 1/n.
6:   end for
7:   Create a new population P_{t+1} by selecting the μ best individuals according to their shared fitness in P_t ∪ {x_1, …, x_λ}, breaking ties towards favouring offspring over parents, breaking remaining ties uniformly at random.
8:   Let t := t + 1.
9: until stopping criterion met
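A minimal executable sketch of Algorithm 1 might look as follows. The function `mu_plus_lambda_ea` and its signature are our own choices; `shared_fitness(x, P)` is assumed to be supplied externally (e.g. implementing Goldberg and Richardson's sharing scheme), and the random tie-break of step 7 is replaced by a deterministic preference for offspring.

```python
import random

def mu_plus_lambda_ea(n, mu, lam, shared_fitness, generations, seed=None):
    """Sketch of a (mu+lambda) EA with standard bit mutation (rate 1/n),
    uniform parent selection, and truncation selection on shared fitness.
    `shared_fitness(x, P)` must return the shared fitness of x within the
    extended population P of parents and offspring."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = rng.choice(pop)  # uniform random parent
            # standard bit mutation: flip each bit independently w.p. 1/n
            offspring.append([b ^ (rng.random() < 1 / n) for b in parent])
        # Offspring placed first: a stable descending sort then breaks
        # shared-fitness ties in favour of offspring over parents.
        extended = offspring + pop
        scores = [shared_fitness(x, extended) for x in extended]
        order = sorted(range(len(extended)), key=scores.__getitem__, reverse=True)
        pop = [extended[i] for i in order[:mu]]
    return pop
```

Sorting indices by precomputed scores (rather than sorting the population with a key that re-evaluates shared fitness) ensures each shared fitness is evaluated once per generation against the full extended population.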
We consider fitness sharing as introduced by Goldberg and Richardson [15]. Throughout this work, |x| denotes the number of 1-bits in x. The shared fitness of an individual x in a population P and the sharing function are

f(x, P) := f(x) / Σ_{y∈P} sh(x, y),    sh(x, y) := max{0, 1 − (d(x, y)/σ)^α}.

Here, d(x, y) is the distance between the two individuals x and y and σ is the sharing distance beyond which individuals do not share fitness. More precisely, if d(x, y) < σ then sh(x, y) > 0 and the shared fitness of x and y is lower than their true fitness. We say that then x and y share fitness. If d(x, y) ≥ σ then sh(x, y) = 0 and x and y do not share fitness. We consider fitness sharing with phenotypic sharing as in [7], where the distance between individuals is based on the number of ones: d(x, y) := ||x| − |y||. Note that d is a distance metric in phenotype space, that is, d(x, y) = 0 implies that x and y have identical phenotypes, even though their genotypes might be very different. We use σ = n/2 (as in [7]) as this is the smallest distance that allows us to discriminate between the two branches. The parameter α is a constant, typically set to 1, that regulates the shape of the sharing function. We use the standard value α = 1 and obtain sh(x, y) = max{0, 1 − 2 d(x, y)/n}.
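Under these definitions (α = 1, phenotypic distance, sharing distance σ), the sharing function and the shared fitness can be sketched as follows; the function names are our own.

```python
def sh(x, y, sigma):
    """Sharing function with alpha = 1: sh(x, y) = max(0, 1 - d(x, y)/sigma),
    using the phenotypic distance d(x, y) = | |x| - |y| |."""
    d = abs(sum(x) - sum(y))
    return max(0.0, 1.0 - d / sigma)

def shared_fitness(x, population, f, sigma):
    """f(x, P) = f(x) / sum_{y in P} sh(x, y).  Since x is itself a member
    of P and sh(x, x) = 1, the denominator is always at least 1."""
    return f(x) / sum(sh(x, y, sigma) for y in population)
```

For example, with n = 4 and σ = n/2 = 2, the two optima (1,1,1,1) and (0,0,0,0) have phenotypic distance 4 ≥ σ and therefore do not derate each other's fitness at all.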
For s := μ + λ, let P := {x_1, x_2, …, x_s} denote the extended population of current search points and the new offspring, labelled such that |x_1| ≤ |x_2| ≤ … ≤ |x_s|, and let

D_j := Σ_{i=1}^{s} min(d(x_j, x_i), n/2)

denote the sum of phenotypic distances of x_j to all other members of the extended population. Individual distances are capped at the sharing distance n/2, so that the shared fitness can be written as

f(x_j, P) = f(x_j) / (s − 2D_j/n).

Since we are particularly interested in the multi-local perspective and aim at analysing the global exploration capabilities of the population-based EA, we call a run successful if it manages to find both optima of TwoMax (i.e., a population is reached that contains both 0^n and 1^n) efficiently. The expected number of generations for this to happen is called the expected running time.
In the remainder we say that an event happens with overwhelming probability (w.o.p.) if it occurs with probability at least 1 − 2^{−Ω(n^ε)} for some constant ε > 0.

On the time complexity of implementing fitness sharing
Before analysing the optimisation time, we discuss the overhead from implementing fitness sharing in terms of the classical notion of computation time. To this end, we assume that fitness values f(x) are already known and accessible in time O(1).

Computing sharing function values sh(x, y).
In what follows we denote by T(n) the time to compute a sharing function value sh(x, y). A naive implementation would give T(n) = Θ(n) for both phenotypic and genotypic distances. If the phenotype |x| is stored when computing f(x), the phenotypic sharing function can be computed in additional time O(1). Another approach that works for phenotypic and genotypic sharing is to update sh(x, y) according to the respective value of x's parent, checking only the bits flipped during mutation. Since in expectation only a constant number of bits have to be reconsidered, this leads to a constant expected time (and O(n) preprocessing time) for each value sh(x, y).
With both population-based and the conventional individual-based fitness sharing we need to compute or maintain sh(x, y) for all individuals x, y from the union of parents and offspring. These values can be stored in a (μ+λ) × (μ+λ) matrix that takes time Θ((μ+λ)² T(n)) to compute initially, but can be updated in time Θ(λ(μ+λ)T(n)) in each generation, as only distances between the λ offspring and the other μ+λ−1 search points need to be computed.
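The update step can be sketched as follows. Here the function name is ours, `sh` is passed in, and individuals are represented by their phenotype values (plain integers) purely for illustration; only the rows and columns of the λ offspring are recomputed, which corresponds to the per-generation update cost discussed above.

```python
def refresh_sharing_matrix(S, extended, lam, sh):
    """Update a symmetric (mu+lam) x (mu+lam) matrix S of sh-values after a
    generation: only the rows/columns of the lam new offspring (stored as
    the last lam entries of `extended`) are recomputed, i.e. lam*(mu+lam)
    calls to `sh`, while entries among surviving parents are kept."""
    s = len(extended)
    for j in range(s - lam, s):   # each new offspring
        for i in range(s):        # against every member, including itself
            S[i][j] = S[j][i] = sh(extended[i], extended[j])
    return S
```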
Lemma 1. Let T(n) be the time to compute sh(x, y) for any two search points x, y. Then for a population P of μ parents and λ offspring in a (μ+λ) EA, a (μ+λ) × (μ+λ) matrix of all values sh(x, y) can be created in time Θ((μ+λ)² T(n)) and updated in time Θ(λ(μ+λ)T(n)) per generation.

Computing shared fitness values f(x, P). In order to compute a shared fitness f(x, P) from f(x), we need to compute Σ_{y∈P} sh(x, y). Given the matrix of sh-values, this sum can be computed from scratch in time O(μ+λ). The preprocessing time at the start of the run is O((μ+λ)²). Along with Lemma 1, we obtain the following time bounds.

Theorem 2. Let T(n) be the time to compute sh(x, y) for any two search points x, y. Then the overhead from individual-based fitness sharing in one generation of the (μ+λ) EA is O(λ(μ+λ)T(n)), with an additional preprocessing time at the start of the run of O((μ+λ)² T(n)).
Time complexity of population-based fitness sharing. Given a population P of μ parents and λ offspring, population-based fitness sharing looks for a subpopulation P′ ⊆ P of size |P′| = μ that maximises the shared fitness of the population, f(P′) = Σ_{x∈P′} f(x, P′). Note that there are (μ+λ choose μ) possibilities to choose P′, and we are not aware of an efficient algorithm that is faster than computing all (μ+λ choose μ) shared population fitnesses.
We describe the most efficient way we could find, based on computing f(P′) values incrementally. We iterate over all possible populations of size μ that can be formed from the μ+λ parents and offspring. Chase's Twiddle algorithm [27] outputs a sequence P_1, P_2, … of all such size-μ populations in time O((μ+λ choose μ)), and this sequence has the property that two subsequent populations only differ in one element. Now consider two populations P_i, P_{i+1}, both of size μ, such that P_{i+1} differs from P_i in just one element: P_{i+1} = (P_i \ {z}) ∪ {w}. Further assume the sums Σ_{y∈P_i} sh(x, y) are stored for all x ∈ P_i with O(1) access time. Then for all x ∈ P_i ∪ P_{i+1}, the sum Σ_{y∈P_{i+1}} sh(x, y) = Σ_{y∈P_i} sh(x, y) − sh(x, z) + sh(x, w) can be computed in time O(1). So if f(P_i) and all Σ_{y∈P_i} sh(x, y) are known, f(P_{i+1}) can be computed in time O(μ).
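One possible sketch of this incremental step is given below; the function name is our own, individuals are represented by hashable stand-ins (here, phenotype values), and duplicate individuals are ignored for simplicity.

```python
def swap_update(niche, f_val, P, z, w, sh):
    """Incremental step for population-based sharing, where the next
    population is (P minus {z}) plus {w}.  `niche[x]` stores
    sum_{y in P} sh(x, y) for every x in P; each such sum changes by
    -sh(x, z) + sh(x, w), so only the newcomer w needs one full sum.
    Returns the new population, its niche counts, and its population
    shared fitness sum_{x} f_val(x) / niche_next[x]."""
    P_next = [x for x in P if x != z] + [w]  # assumes z occurs once in P
    niche_next = {}
    for x in P_next:
        base = niche[x] if x in niche else sum(sh(x, y) for y in P)
        niche_next[x] = base - sh(x, z) + sh(x, w)
    f_pop = sum(f_val(x) / niche_next[x] for x in P_next)
    return P_next, niche_next, f_pop
```

Each call performs O(μ) work, so iterating over the whole Twiddle sequence costs O(μ · (μ+λ choose μ)) sharing-function evaluations overall.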

General results
Phenotypic fitness sharing, along with the shape of the TwoMax function, implies that an individual with a better fitness than that of any other individual in the population will always survive, as it has a better fitness than the individual with the closest number of ones, and it has a larger phenotypic distance to other individuals.This means that in a (μ+1) EA the current best fitness never decreases; this also holds if multiple individuals have the same current best fitness, as only one individual is removed by selection.
As a result, the (μ+1) EA never decreases its current best fitness and finds at least one optimum in expected time O (μn log n).
Proof. We prove the first statement. The second statement will follow by symmetry, swapping the meaning of zeros and ones. By definition of phenotypic fitness sharing, it suffices to compare the distance sums to prove the statement. This follows by definition of D_j since, according to how the individuals are labelled, for all 3 ≤ i ≤ μ+1, if x_1 shares fitness with x_i then x_2 also shares fitness with x_i. The time bound follows from standard fitness level arguments. We consider the case |x| = i: to improve the fitness it suffices to flip one of the remaining i 1-bits and leave all other bits unchanged. The probability for this event is (i/n)(1 − 1/n)^{n−1} ≥ i/(en). Since the probability of selecting x as parent is 1/μ, the probability for a fitness improvement during a generation is at least i/(eμn). Since the waiting times are geometrically distributed, we get an upper bound of Σ_{i=1}^{n/2} eμn/i = O(μn log n) for the expected number of fitness evaluations to increase the fitness from n/2 to n. The case |x| = n − i is proven by considering the remaining i 0-bits in the very same way. □

The symmetry between f(x_1, P) vs. f(x_2, P) and f(x_{s−1}, P) vs. f(x_s, P) follows from swapping the meaning of zeros and ones. This also applies to further statements, where for simplicity we omit symmetric statements.
The following Main Lemma gives sufficient and necessary conditions on when the shared fitness of one individual is better than another.

Lemma 5 (Main Lemma). Let
The same holds if all inequalities "≥" are replaced by strict inequalities ">". Moreover, for such i all pairs of individuals do share fitness.

Comparing D_i and D_{i+1}, for the latter the distance to x_1, …, x_{i−1} is higher by |x_{i+1}| − |x_i|, and the distance to x_{i+2}, …, x_s is lower by |x_{i+1}| − |x_i|. In the last step we used h > 0. The same calculations hold if "≥" is replaced by ">" throughout. The second statement follows by simply applying the first statement. In general, the conditions from Lemma 5 are true for x_{s−1} and x_s if |x_{s−1}| < n/2 and two individuals are in the optimum 0^n.

Population size μ = 2 is not enough
We first investigate the case of the (2+1) EA, showing that a population size of μ = 2 is not sufficient to guarantee finding both optima.The following lemma gives sufficient and necessary conditions for a single individual on a branch to survive.
For |x 3 | = |x 2 | the statement implies that x 1 survives if the distance from n/2 to x 2 is less than around 3/2 times the distance from n/2 to x 1 .The condition for survival sharpens when |x 3 | > |x 2 |; however, as x 2 and x 3 are likely to result from a mutation of one another, |x 3 | − |x 2 | is bounded from above by the number of bits flipped in that mutation.
Proof of Lemma 7. We use the shorthand x_i for |x_i|. The claim follows from Lemma 4 if f(x_1) > f(x_2), hence we assume in the following that f(x_1) ≤ f(x_2). Then f(x_2, P) < f(x_1, P), and the right-hand side terms of the resulting inequality can be simplified accordingly.

The following theorem states that with a probability greater than 1/2, the (2+1) EA will end up with both individuals in the same optimum, leading to an exponential running time from there. We remark that two independent runs of a (1+1) EA as well as a (2+1) EA with deterministic crowding are more efficient, as they both find both optima with probability exactly 1/2, leading to expected runtimes of O(n log n) [7].
Theorem 8. The (2+1) EA with fitness sharing will, with probability 1/2 + Ω(1), reach a population with both members in the same optimum, and then the expected time for finding both optima from there is Ω(n^{n/2}).
By symmetry, with probability 1/2 − o(1), x_1 and x_2 are on the same branch. Since at least n^{1/3} bits would have to be flipped in one mutation, the probability of a mutation jumping from one branch to the other is then at most 1/(n^{1/3}!) = 2^{−Ω(n^{1/3} log n)}, and the probability of this happening in expected polynomial time is still of the same order. This implies that w.o.p. no individuals on the opposite branch will be created in polynomial time as long as no offspring of decreasing fitness are ever accepted on the current branch. In the following we prove by contradiction that such offspring are always rejected.
Assuming both search points and the offspring are all on the same branch, w.l.o.g. the left branch, and labelling them such that |x_1| ≤ |x_2| ≤ |x_3|, we have, if n is large enough, that Lemma 7 implies f(x_1, P) < f(x_2, P) = f(x_3, P); hence x_1 will be removed. Then we are in the same situation as when initialising two individuals on the same branch. □

However, there is still a constant probability that the (2+1) EA finds both optima in polynomial expected time. This holds if the EA is initialised with its two search points on different branches, and if these two search points maintain similar fitness values throughout the run.

Theorem 9. The (2+1) EA with fitness sharing will, with probability Ω(1), find both optima in time O(n log n).
Since the proof of Theorem 9 is quite long, we first provide a sketch of the proof. Let d_1 and d_2 denote the distances of |x_1| and |x_2| from n/2, respectively. Now, assume w.l.o.g. that when a new offspring is created and the population contains x_1, x_2, x_3 in order of their numbers of ones, that x_2 and x_3 are on the same branch.
It is easy to derive from Lemma 7 and further arguments that x_1 survives if d_1 ≥ (2/3)d_2. Intuitively, this means that if x_1 and x_2 have a similar fitness (d_1 and d_2 being within a factor of 2/3), then x_1 is guaranteed to survive.
We then define a potential function g(P) that indicates a distance to a population where the lower-fitness individual is at risk of dying. This ensures that g(P) ≥ 0 ⇒ f(x_1, P) > f(x_2, P). The potential of the initial population P_0 is comfortably large. For any k ∈ N, the potential increases by k if d_1 increases by k; however, the potential only decreases by (2/3)k if d_2 increases by k. Moreover, increasing d_1 is easier than increasing d_2, as the former individual contains more "incorrect" bits (cf. Lemma 13 in [28]). This shows that, whenever the potential changes, it increases in expectation by 1/3.
A straightforward application of the simplified drift theorem [29,30] shows that with overwhelming probability the potential never decreases below √n/24 in 2^{Ω(√n)} steps. So, with overwhelming probability, x_1 survives until both optima are reached.
These arguments are made rigorous in the following proof.
Proof of Theorem 9. Let x_1, x_2 be the two initial search points and define condition (4) such that x_1 and x_2 are on opposite branches and have similar fitness. The probability of these inequalities holding for x_1 is Ω(1), where the last step follows from bounding the binomial coefficient from below by Ω(2^n/√n) [31, Lemma 8]. By symmetry, the same holds for d_2, and hence the probability of (4) is Ω(1) · Ω(1) = Ω(1). Now, assume w.l.o.g. that when a new offspring is created and the population contains x_1, x_2, x_3 in order of their numbers of ones, that x_2 and x_3 are on the same branch. The case where x_1 and x_2 are on the same branch is symmetric.
In the following we assume f(x_1) ≤ f(x_2). The probability of flipping at least √n/6 bits in one mutation is at most 1/((√n/6)!) = 2^{−Ω(√n log n)}, and the probability that this happens in expected polynomial time is still of the same order. So in the following we work under the assumption that such a mutation does not happen. The shared fitness of x_2 is smaller, for n large enough, even in the best case where x_2 does not share fitness with x_1. This establishes (5) as a sufficient condition for the survival of x_1. For a current population P = {x_1, x_2} define a potential g(P). Intuitively, the potential indicates a distance to a population where the lower-fitness individual is at risk of dying. Using Lemma 7, we now show that the potential with high probability never decreases to 0, which implies that x_1 survives until both optima are reached eventually.
For the initial population P_0 the potential g(P_0) is comfortably large. As we do not allow jumps of this length, if d_1 < d_2 then the same will hold for the distances in the next generation. In other words, the roles of d_1 and d_2 in the min and max terms of (6) do not change.
If P_t is the current population at generation t, finding an improvement by d is easier for x_1 than for x_2, as the former contains more "incorrect" bits. Formally, this follows from Lemma 13 in [28] along with the symmetry of TwoMax. So we get E(g(P_{t+1}) − g(P_t)) > 0, using that there is a sufficiently large probability of selecting x_1 as parent and increasing its number of zeros.
Using the simplified drift theorem [29,30] with a := √n/24 and b := √n/12, we see that the first condition is satisfied for ε := 1/6. The second condition on jump lengths follows by standard arguments for all d ∈ N. This shows that with probability 1 − 2^{−Ω(√n)} the potential never decreases below a = √n/24. If individuals on both branches survive, by standard arguments (cf. Theorem 4 in [7]) both optima will be reached in expected time cn log n for some constant c > 0. By Markov's inequality, the probability of not having done so after c′ · cn log n generations is at most 1/c′. Choosing c′ large enough and taking into account all failure probabilities, the (2+1) EA finds both optima in time O(n log n) with probability Ω(1). □

Population size μ ≥ 3 always finds both optima
A population of size μ = 2 may fail, but we show that a (μ+1) EA with fitness sharing and μ ≥ 3 always finds both optima in expected time O (μn log n).
The following lemma is an extension of the Main Lemma (Lemma 5) to the case where an individual x_{μ+1} is on the other branch compared to the rest of the population. In particular, a stronger condition is given such that x_{μ+1} will survive selection when f(x_μ) > f(x_{μ+1}). The proof is similar to the one for the Main Lemma.
Proof. By bounding the summands min(d(x_{μ+1}, x_i), n/2) of D_{μ+1} for all 1 ≤ i ≤ μ−1, we obtain a bound on D_{μ+1} as claimed.

The following lemma states that if there is a bounded number r of individuals in one optimum, then they will have better shared fitness than the next sub-optimal individual. This implies that r such individuals survive in the (μ+1) EA; the same holds if there are more than r such individuals in the extended population, as only one individual is being removed.

Lemma 11. Let
In particular, if the current population of the (μ+1) EA contains at least two copies of the optimum 0^n, two such individuals always survive.
Proof.As f (x 1 , P ) = • • • = f (x r , P ), we only need to show the claim for i = 1.
If |x_{r+1}| < n/2, we assume pessimistically that x_{r+1} shares fitness with the same individuals as x_1, …, x_r, namely x_1, …, x_ℓ for some ℓ ≥ r + 1. From (7) we see that the resulting term is strictly increasing in |x_{r+1}|; hence, along with |x_{r+1}| < n, the claim follows.

With these lemmas we are ready to prove the main result of this section.
Theorem 12. Let μ ≥ 3. The (μ + 1) EA with fitness sharing will find both optima of TwoMax with probability 1 in expected time O (μn log n).
Proof. By Lemma 4, in expected time O(μn log n) one of the two optima is found. W.l.o.g. we assume the 0^n optimum is found. In expected time O(μ) a clone of 0^n is created (i.e., |x_2| = 0) and by Lemma 11 x_1 and x_2 (or clones thereof) will survive for the rest of the run. We then show that the individual with the largest number of ones, x_{μ+1} (or a clone thereof), will always survive.

Our analysis has revealed two very different behaviours. It is possible that the whole population climbs up one branch. But once a sufficiently large overall fitness value has been obtained (at the latest when two individuals have found an optimum), the population expands towards lower fitness values, as then the individuals with the smallest and the largest numbers of 1-bits always survive.

Too large offspring population sizes
Fitness sharing works for the (μ+1) EA, but for larger offspring populations it can have undesirable effects: if a cluster of individuals creates too many offspring, sharing decreases the shared fitness of all individuals in the cluster, and the cluster may go extinct. We consider this problem of overpopulation for μ = 2 and λ ≥ μ with λ = O(1). In this setting we can no longer guarantee convergence to populations containing both optima, i.e., depending on λ we can lose one or even both optima.
Following the same argumentation, we lose both optima if λ ≥ 6: if mutation creates λ − 2 copies and two points with distance 1 to the optimum (which also happens with probability Ω(1)), we have … for λ ≥ 6.
In exactly the same way we show that both optima are lost with probability Ω(1) if λ ≥ 6 even if they are on different branches, i.e., when we create ⌈λ/2⌉ offspring on the left branch and ⌊λ/2⌋ on the right branch, where exactly one offspring on each branch has distance 1 to the optimum and the remaining offspring are copies.
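The shared-fitness computation driving these extinction effects can be sketched as follows; the triangular sharing function sh(d) = max(0, 1 − d/σ) with σ = n/2 and the phenotypic distance ||x| − |y|| are assumptions of this sketch, not necessarily the exact scheme analysed above.

```python
# Sketch: phenotypic fitness sharing on TwoMax (the triangular sharing
# function below is an illustrative assumption, not the paper's exact scheme).

def twomax(ones: int, n: int) -> int:
    """TwoMax fitness, computed from the number of 1-bits."""
    return max(ones, n - ones)

def sharing(d: float, sigma: float) -> float:
    """Sharing contribution of two individuals at phenotypic distance d."""
    return max(0.0, 1.0 - d / sigma)

def shared_fitness(ones: list[int], n: int, sigma: float) -> list[float]:
    """f(x, P) = f(x) / sum_{y in P} sh(d(x, y)) for each x in P."""
    return [twomax(oi, n) / sum(sharing(abs(oi - oj), sigma) for oj in ones)
            for oi in ones]

# Three clustered optimal individuals share their fitness three ways, while a
# lone individual in the other optimum keeps its full fitness of n = 100.
print(shared_fitness([0, 0, 0, 100], n=100, sigma=50.0))
```

Creating many offspring inside one cluster inflates the sharing denominator of every cluster member, which is exactly the overpopulation effect exploited above.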
Offspring populations can also decrease diversity in the following way.
Lemma 13. With probability 1 − o(1), the (2+λ) EA with fitness sharing, λ ≥ 2 and λ = O(1), will, at some point of time before an optimum is reached, obtain a population with both members on the same branch.
The following proof mainly uses that in a single iteration, with probability Ω(1), only copies of x_1 and x_2 are created. We then show that if f(x_1) ≠ f(x_2) and we have a surplus of offspring on the branch with smaller fitness (which also happens with probability Ω(1)), this branch goes extinct. If f(x_1) = f(x_2) in iteration t, we have f(x_1) ≠ f(x_2) in iteration t + 1 with probability Ω(1), and if f(x_1) ≠ f(x_2) in iteration t, we still have f(x_1) ≠ f(x_2) in iteration t + 1 with probability Ω(1). Thus, with probability 1 − 2^{−Ω(n)} there are Ω(n) iterations with f(x_1) ≠ f(x_2) before an optimum is reached and consequently, with probability 1 − 2^{−Ω(n)}, one branch takes over the whole population before an optimum is reached.
Proof. Let x_1, x_2 be the individuals of the current population. As in Theorem 8, with probability 1 − o(1) we have … after initialisation and thus the probability to create an offspring on the other branch is 2^{−Ω(n^{1/3} log n)} = o(1).
Assuming that we have two individuals on different branches after initialisation (otherwise there is nothing to prove), we now show that with probability 1 − 2^{−Ω(n)} we lose the individual on one of the two branches before an optimum is reached.
We use that with probability Ω(1) only copies of x_1 and x_2 are created in an iteration. Thus, all individuals on the same branch have the same fitness value. Let x_L, x_R denote an individual on the left and right branch, and δ_L, δ_R the number of offspring on the left and right branch, respectively. Let d_{i,j} = min{n/2, ||x_i| − |x_j||}. We observe that d_{L,R} is the same for all pairs of x_L and x_R, and that … We further observe that δ_L = δ_R = λ/2 holds with probability Ω(1) if λ is even. If λ is odd, we have δ_L = ⌈λ/2⌉ and δ_R = ⌊λ/2⌋ with probability Ω(1).
We first consider the case f(x_L) > f(x_R). If λ is even, the above observation implies D_L = D_R and thus f(x_L, P) > f(x_R, P). For odd λ we conclude D_L > D_R and thus f(x_L, P) > f(x_R, P). Hence, only individuals on the left branch survive with probability Ω(1).

Now consider the case f(x_L) = f(x_R). For odd λ we use exactly the same argument as above: the branch with ⌈λ/2⌉ offspring has lower shared fitness and thus only individuals on the other (i.e., the left) branch survive. For even λ we need to be more careful, since from the above argumentation we can only conclude f(x_L, P) = f(x_R, P), and we pessimistically assume that we select individuals on two different branches in this case. However, we see that a successful mutation occurs with probability at least …, which is Ω(1) as long as k = Ω(n). Thus, with probability Ω(1) we create λ − 1 copies and one improved offspring. Since this offspring has larger shared fitness, we have f(x_L) ≠ f(x_R) in the next iteration.

In summary: if f(x_1) ≠ f(x_2) in iteration t, then f(x_1) ≠ f(x_2) in iteration t + 1 with probability Ω(1) (since it suffices to only create copies of x_1 and x_2). We conclude that with probability 1 − 2^{−Ω(n)} there are Ω(n) iterations with f(x_1) ≠ f(x_2) before an optimum is reached. Since in this situation one branch takes over the whole population with probability Ω(1), this happens with probability 1 − 2^{−Ω(n)} before an optimum is reached. □

In order to show that the (2+λ) EA also reaches a population with both members in the same optimum, we additionally need to show that the population does not get stuck somewhere on the branch and that individuals cannot traverse back to the other branch. We consider this for the special case of λ = 2.

Theorem 14. With probability 1 − o(1), the (2+2) EA with fitness sharing will, at some point of time, reach a population with both members in the same optimum. The expected time for finding both optima from there is n^{n/2}.
Proof. Assume that both individuals are on the same branch. This happens with probability 1 − o(1) before an optimum is reached (see Lemma 13).
We first show that a current best individual is never lost. If there is a single best individual in the population, it will never be lost since f(x_1, P) > f(x_2, P) > f(x_3, P) (Lemmas 4 and 5, as discussed above). If there are 3 or 4 best individuals, we are guaranteed to select at least one of them for the next generation since μ = 2. In case there are 2 best individuals, we again use the above argumentation to prove that f(x_3, P) < f(x_2, P). Thus, we are guaranteed to select at least one of the two best individuals for the next generation.
Since due to the above argumentation we never lose a single best individual, a single improved offspring of a best individual will always be accepted. Thus, we will reach a population with both members in the same optimum. The claim about the expected time to find both optima follows as in Theorem 8. □

Experiments for phenotypic and genotypic fitness sharing
We first present a set of experiments, shown in Table 1, where we ran (μ+λ) EAs for n = 100 bits and varying values of 2 ≤ μ ≤ 12 and 1 ≤ λ ≤ 12. We recorded the success rate as the percentage of runs in which both optima were found within 100000 generations. The table shows a clear distinction between efficient and inefficient behaviour: for λ < μ/2 runs were always successful, whereas runs for λ ≥ μ always failed (except for one run with λ = μ = 11).
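A minimal version of this experimental setup can be sketched as follows; standard bit mutation with rate 1/n, triangular sharing, and survivor selection that repeatedly deletes the individual of worst shared fitness are assumptions of this sketch, not necessarily the exact implementation behind Table 1.

```python
import random

def twomax(x):
    ones = sum(x)
    return max(ones, len(x) - ones)

def shared_fitness(pop, sigma):
    # f(x, P) = f(x) / sum_y sh(||x| - |y||) with triangular sharing (assumed)
    fits = []
    for x in pop:
        denom = sum(max(0.0, 1.0 - abs(sum(x) - sum(y)) / sigma) for y in pop)
        fits.append(twomax(x) / denom)
    return fits

def run(mu, lam, n, budget, rng):
    """Run a (mu+lam) EA with phenotypic sharing (sigma = n/2) on TwoMax and
    report whether both optima were found within the generation budget."""
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    for _ in range(budget):
        off = []
        for _ in range(lam):
            child = rng.choice(pop)[:]
            for i in range(n):                 # standard bit mutation, rate 1/n
                if rng.random() < 1.0 / n:
                    child[i] = 1 - child[i]
            off.append(child)
        pool = pop + off
        while len(pool) > mu:                  # remove worst shared fitness,
            fits = shared_fitness(pool, n / 2)  # recomputing after each removal
            pool.pop(fits.index(min(fits)))
        pop = pool
        counts = {sum(x) for x in pop}
        if 0 in counts and n in counts:        # both 0^n and 1^n present
            return True
    return False

print(run(mu=4, lam=1, n=20, budget=2000, rng=random.Random(1)))
```

For μ ≥ 3 and λ = 1 this sketch typically succeeds quickly on small n, in line with the λ < μ/2 pattern of Table 1.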
We further ran experiments to test the performance of genotypic fitness sharing, that is, we repeated the above experiments using the Hamming distance as distance measure in the (μ+λ) EAs. Table 2 shows the resulting success rates with sharing radius σ = n/2 to match the setting from Table 1. Apart from the (2+1) EA, (3+1) EA, and (4+1) EA, all algorithms were unable to find both peaks. The reason could be that the sharing radius needs to be chosen differently: since two uniform random individuals have Hamming distance n/2 in expectation, with σ = n/2 any two initial individuals will either not share fitness at all, or share so little that the effect of fitness sharing is negligible.
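The difference can be made concrete with a small sketch (the triangular sharing function is again an assumed illustration): under σ = n/2, two uniform random strings sit near the sharing radius and contribute little or nothing to each other's sharing denominators.

```python
import random

def hamming(x, y):
    """Genotypic distance: number of differing bit positions."""
    return sum(a != b for a, b in zip(x, y))

def sharing(d, sigma):
    """Triangular sharing function (assumed): positive only for d < sigma."""
    return max(0.0, 1.0 - d / sigma)

n = 100
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(n)]
y = [rng.randint(0, 1) for _ in range(n)]

# Two uniform random strings have expected Hamming distance n/2 = sigma,
# so their mutual sharing contribution is (close to) zero.
print(hamming(x, y), sharing(hamming(x, y), n / 2))
```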
Table 3 shows success rates when repeating the experiment with a sharing radius of σ = n, where all individuals always share fitness. The success rates show a similar pattern to Table 1 for phenotypic sharing, albeit with generally smaller numbers. For the (μ+1) EA the success rates seem to converge to 1 with increasing μ, but a few runs still fail. We suspect that this is due to runs that are initialised with all individuals on one branch.
To test this, we also ran experiments with a modified, favourable initialisation: we drew μ individuals independently and uniformly at random, and then checked whether the population contains at least one individual with at least n/2 + √n ones and at least one individual with at least n/2 + √n zeros. If this was not the case, the population was discarded and μ new individuals were drawn independently and uniformly at random. The term n/2 + √n was chosen such that two individuals are firmly placed on their respective branches, from which mutations to the other branch are unlikely. We note without giving a formal proof that the probability of an individual having at least n/2 + √n ones is bounded from below by a positive constant c > 0, and hence the probability of initialising a population as described is at least 1 − 2(1 − c)^μ = 1 − 2^{−Ω(μ)}. This means that for μ not too small, only a small fraction of initial populations is discarded. Table 4 shows that when unlucky initialisations are excluded, success rates of 100% are achieved for small λ.

Table 4
Success rates as percentages of the (μ+λ) EA with genotypic fitness sharing and σ = n on TwoMax in 100 runs, stopped after 100000 generations or once both optima were found. Here all runs were initialised using rejection sampling so that the initial population contains at least one individual with at least n/2 + √n ones and at least one individual with at least n/2 + √n zeros.
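The rejection-sampling initialisation described above can be sketched as follows (function and parameter names are illustrative):

```python
import math
import random

def init_population(mu, n, rng):
    """Resample the whole population until it contains one individual with at
    least n/2 + sqrt(n) ones and one with at least n/2 + sqrt(n) zeros."""
    threshold = n / 2 + math.sqrt(n)
    while True:
        pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
        ones = [sum(x) for x in pop]
        if max(ones) >= threshold and min(ones) <= n - threshold:
            return pop  # one parent firmly placed on each branch

pop = init_population(mu=5, n=100, rng=random.Random(42))
print(sorted(sum(x) for x in pop))
```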

Conclusions
This work sheds light on advantages and disadvantages of fitness sharing in multimodal optimisation, particularly from a multi-local perspective where we are interested in locating different global or local optima. To allow for easy comparison with previous work, we used a common analytical framework (the (μ+λ) EA) and example problem (TwoMax).
Our main contribution is a rigorous theoretical analysis of the conventional fitness sharing mechanism, which selects individuals based on their shared fitness (rather than performing selection on the level of populations as done in previous theoretical work), when phenotypic sharing is used. We concentrated on the influence of the population sizes μ and λ as crucial parameters. Regarding the parent population, our analyses show that a population size μ of at least 3 is required to guarantee finding both optima of TwoMax in polynomial time. We also prove that large offspring population sizes λ can cause overpopulation, which results in the extinction of whole clusters of search points. The latter results are accompanied by experiments suggesting that the (μ+λ) EA is successful if λ < μ/2 and that it almost always fails for λ ≥ μ. These findings highlight the risks of using fitness sharing with inappropriate parameters and the need for a better understanding of algorithm parameters. We concluded the paper with an empirical analysis of the genotypic sharing that has to be used when no problem-specific knowledge is available. The experiments indicate that similar conclusions about algorithmic performance may be drawn when the Hamming distance is used. We leave rigorous theoretical proofs of this as an open problem for future work.
In the future it would also be interesting to extend the analyses of fitness sharing and other diversity mechanisms to problems beyond TwoMax. Promising candidates for such work are the set of theory-affine multimodal benchmark functions introduced in [32] and dynamic problems.

Lemma 5 implies the following structural insight: if the population is located on one branch and the shared fitness values of two neighbouring (in the number of 1-bits) search points compare favourably for the higher search point, then the shared fitness strictly increases for all search points further up the branch. More precisely, Lemma 5 gives a condition for the individual of lowest raw fitness (i.e., x_s) to be accepted by selection. Concerning the (μ+1) EA, the condition clearly shows that for μ = 2 at least n/2 bits have to flip (i.e., |x_3| − |x_2| ≥ n/2). On the other hand, for μ ≥ 3 offspring with lower fitness values are accepted once the population is close enough to the optimum 0^n. This threshold moves further away from the optimum as the population size increases. If mutation were only allowed to flip one bit and μ = 3, then it would be necessary that both x_1 and x_2 reach the local optimum before decreasing moves are accepted (i.e., |x_1| + |x_2| = 0). For μ = 4, the sum of 1-bits in the first 4 individuals can be up to |x_1| + |x_2| + |x_3| + |x_4| ≤ n/2 for a decreasing move to be accepted by the (μ+1) EA.
Let x_1, x_2 be the two initial search points and d_1 := n/2 − |x_1| and d_2 := |x_2| − n/2. With probability Ω(1), x_1 and x_2 are on opposite branches and have similar fitness: …

The sum Σ_{y∈P} sh(x, y) can be computed in time O(|P|), assuming that the sharing values are available from a table with O(1) access time. It can further be computed more efficiently using incremental steps: if we have stored Σ_{y∈P} sh(x, y) for a population P with O(1) access time, we can compute Σ_{y∈P'} sh(x, y) = Σ_{y∈P} sh(x, y) − Σ_{y∈P\P'} sh(x, y) + Σ_{y∈P'\P} sh(x, y) in time O(1 + |P \ P'| + |P' \ P|). This is O(1) if P and P' only differ in one element, and O(λ) if they differ in at most λ elements.

Time complexity of individual-based fitness sharing. The conventional individual-based fitness sharing computes f(x, P) = f(x)/Σ_{y∈P} sh(x, y) for the same population P of μ parents and λ offspring. Using (2) and the arguments from the previous paragraph, given a matrix of all sh(x, y) values, all μ + λ values f(x, P) can be computed incrementally in time …

… the elements in the last sum can be computed in total time O(μ) and f(w, P_{i+1}) can be computed in time O(μ) as well. So we can compute shared population fitness values for all size-μ populations and find a best one in time O(λ(μ+λ)T(n) + (μ+λ choose μ)·μ) per generation, with initial preprocessing time O((μ+λ)² T(n)).

Theorem 3. Let T(n) be the time to compute sh(x, y) for any two search points x, y. Population-based fitness sharing in a (μ+λ) EA can be implemented in such a way that the overhead from fitness sharing is time O((μ+λ)² T(n)) for preprocessing and time O(λ(μ+λ)T(n) + (μ+λ choose μ)·μ) per generation.
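The incremental update described above can be sketched as follows; the one-dimensional toy distance and σ = 2 are assumptions chosen only to keep the example small. Caching S(x, P) = Σ_{y∈P} sh(x, y) turns a full O(|P|) recomputation into an O(|P \ P'| + |P' \ P|) adjustment.

```python
def sh(x, y, sigma=2.0):
    """Triangular sharing on a toy one-dimensional distance (an assumption)."""
    return max(0.0, 1.0 - abs(x - y) / sigma)

def sharing_sum(x, pop):
    """S(x, P) = sum over y in P of sh(x, y), computed from scratch."""
    return sum(sh(x, y) for y in pop)

P = [0, 1, 3]
x = 1
cached = sharing_sum(x, P)      # S(x, P), stored with O(1) access

# Move to P' by removing 3 and adding 2; update incrementally:
# S(x, P') = S(x, P) - sum over P \ P' + sum over P' \ P
removed, added = [3], [2]
updated = cached - sum(sh(x, y) for y in removed) + sum(sh(x, y) for y in added)

assert updated == sharing_sum(x, [0, 1, 2])   # matches full recomputation
print(cached, updated)
```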

Table 1
Success rates as percentages of the (μ+λ) EA with phenotypic fitness sharing on TwoMax in 1000 runs, stopped after 100000 generations or once both optima were found.