Article

Non-Stationary Stochastic Global Optimization Algorithms

Departamento de Ingeniería de Sistemas e Industrial, Facultad de Ingeniería, Universidad Nacional de Colombia, Bogotá 11001, Colombia
*
Author to whom correspondence should be addressed.
Algorithms 2022, 15(10), 362; https://doi.org/10.3390/a15100362
Submission received: 8 July 2022 / Revised: 31 August 2022 / Accepted: 6 September 2022 / Published: 29 September 2022
(This article belongs to the Special Issue Optimization under Uncertainty 2022)

Abstract
Studying the theoretical properties of optimization algorithms such as genetic algorithms and evolutionary strategies allows us to determine when they are suitable for solving a particular type of optimization problem. Such a study consists of three main steps. The first step is to consider such algorithms as Stochastic Global Optimization Algorithms (SGoals), i.e., iterative algorithms that apply stochastic operations to a set of candidate solutions. The second step is to define a formal characterization of the iterative process in terms of measure theory and to define some of these stochastic operations as stationary Markov kernels (defined in terms of transition probabilities that do not change over time). The third step is to characterize non-stationary SGoals, i.e., SGoals whose stochastic operations have transition probabilities that may change over time. In this paper, we develop the third step of this study. First, we generalize the sufficient convergence conditions from stationary to non-stationary Markov processes. Second, we introduce the theory required to define kernels for arithmetic operations between measurable functions. Third, we develop Markov kernels for some selection and recombination schemes. Finally, we formalize the simulated annealing algorithm and evolutionary strategies using this systematic formal approach.

1. Introduction

In global optimization studies, a general question is which types of algorithms are better suited to which types of optimization problems; by understanding the theoretical properties of an evolutionary algorithm and characterizing its observable behavior, we can decide when to use it to solve a given class of optimization problems [1].
According to Zilinskas and Zhigljavsky in [2], stochastic global optimization algorithms (SGoals for short) are inseparable from their presentation and analysis. Several researchers have actively developed the field for a long time, among them Torn and Zilinskas in [3], Mockus and Zilinskas in [4], and Neimark and Strongin in [5]. It remains an active field of research, including the mathematical analysis of problems.
SGoals have also been studied from a Markovian perspective: Zhigljavsky and Zilinskas (Sections 3.3 and 3.4 in [6]) and Tikhomirov in [7] studied the convergence rate of some homogeneous Markov monotone random search optimization algorithms. Al-Mharmah et al. in [8] studied some random non-adaptive algorithms for finding the maximum of a continuous function on the unit interval. An analysis of selection algorithms was carried out by Chakraborty et al. in [9], and an analysis of evolutionary strategies for global minimization was presented by François in [10].
As can be noticed, these studies either do not use measure theory to formalize probabilistic concepts or are developed around a specific optimization problem. Gomez in [11] describes a formal and systematic approach for characterizing stochastic global optimization algorithms. There, the probability theory required to characterize SGoals (measure theory, Markov kernels, operations between kernels, products, and conditions to study convergence) is presented. In addition, it is proved that some algorithmic functions, such as projection and sorting, can be represented by kernels.
Moreover, the notion of a join-kernel is introduced in that paper as a way to characterize the combination of stochastic methods, and a formal structure of an optimization space for studying SGoals is defined. Finally, Gomez formalizes algorithms whose next-population stochastic method does not change transition probabilities over time. Such algorithms can be viewed from the perspective of stationary Markov processes. This viewpoint applies, among others, to standard versions of hill-climbing, parallel hill-climbing, steady-state genetic, generational genetic, and differential evolution algorithms.
This work continues that systematic formal approach. First, we review the theory developed by Gomez in [11]. Next, we generalize the sufficient convergence conditions (Lemma 71 in [11]) from stationary to non-stationary Markov processes. Third, we develop arithmetic kernels to characterize arithmetic operations between measurable functions, and we develop Markov kernels for some selection and recombination schemes. Finally, in order to show some applications of the concepts developed, we formalize both simulated annealing and evolutionary strategies using the systematic formal approach; these are classical algorithms for which several studies can be found in the literature, such as Romeijn et al. in [12] or Weise in [13].

2. Preliminaries

This section provides a brief introduction to the systematic formalization proposed by Gomez in [11]. This systematic formalization of SGoals is carried out in terms of Markov kernels. It formalizes SGoals with a stationary next-population stochastic method, i.e., SGoals that can be characterized as stationary Markov processes and do not change the transition probabilities of the next population over time. That is the case for the hill-climbing [14], parallel hill-climbing, generational genetic [15,16,17], steady-state genetic [18], and differential evolution [19,20] algorithms. However, SGoals such as simulated annealing [21], evolutionary strategies [22], or any algorithm using parameter control/adaptation techniques [23] cannot be characterized as stationary Markov processes.
We clarify that in this section we only review the concepts, not the proofs; the proof of each result can be found in [11].

2.1. Systematic Formalization Theory

We review the concepts used to characterize SGoals, together with the concepts necessary to extend the theory to characterize adaptive SGoals.
We consider an optimization problem with an objective function f: Φ → ℝ defined over a feasible region Ω ⊆ Φ, where Φ is the solution space, and we look for a global optimizer p* described by:
min f: Φ → ℝ = { p* ∈ Ω ⊆ Φ | ∀p ∈ Ω, f(p*) ≤ f(p) }.

2.1.1. Stochastic Global Optimization Algorithm

In this work, we focus on algorithms that are stochastic rather than deterministic, e.g., simulated annealing and evolutionary strategies. A generic way to describe such algorithms in pseudocode is given by Algorithm 1. Here, the main difference between algorithms is the way the NextPop operation is carried out.
The NextPop method generates new populations, the InitPop(n) method generates an initial population of size n, and Best(P_t) chooses the best set of individuals from P_t.
Algorithm 1. Stochastic Global Optimization Algorithm
SGoal(n)
1. t = 0
2. P_0 = InitPop(n)
3. while ¬End(P_t, t) do
4.   P_{t+1} = NextPop(P_t)
5.   t = t + 1
6. return Best(P_t)
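As an illustration, the loop of Algorithm 1 can be sketched in Python. The concrete NextPop, InitPop, Best, and End instantiations below (an elitist Gaussian mutation minimizing f(x) = x²) are hypothetical examples chosen for the sketch, not methods taken from the paper:

```python
import random

def sgoal(n, next_pop, init_pop, best, end):
    """Generic SGoal loop of Algorithm 1."""
    t = 0
    p = init_pop(n)                  # initial population P_0 of size n
    while not end(p, t):
        p = next_pop(p)              # stochastic next-population method
        t += 1
    return best(p)

# Hypothetical instantiation: minimize f(x) = x^2 with elitist Gaussian mutation.
f = lambda x: x * x
init_pop = lambda n: [random.uniform(-10, 10) for _ in range(n)]

def next_pop(p):
    # Mutate each individual; keep the better of parent and child (elitist).
    children = [x + random.gauss(0, 0.5) for x in p]
    return [c if f(c) < f(x) else x for x, c in zip(p, children)]

best = lambda p: min(p, key=f)
end = lambda p, t: t >= 200          # fixed iteration budget

random.seed(1)
result = sgoal(5, next_pop, init_pop, best, end)
```

Since the next-population method here applies the same transition probabilities at every step, this instance corresponds to the stationary case reviewed in this section.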

2.1.2. Measure and Probability Theory

Gomez in [11] used probabilistic kernels to formalize SGoals. This work is an extension of Gomez's work; we review the concepts that Gomez uses and that we will need to characterize adaptive SGoals.

Definitions of Measure Theory

Probability theory uses measure theory to formalize its concepts. Measure theory defines a set of elementary events Ω and a system of observable events A ⊆ 2^Ω, where A is a family of sets. These concepts translate to the SGoal context: the set of elementary events contains all possible populations, and a family of sets of populations is a system of observable events.
Probability theory operates over sets to measure the probability of some observable event. The structure used to measure subsets of Ω is a σ-algebra Σ that meets the following conditions:
1. Ω ∈ Σ (Ω is considered a universal set).
2. Σ is closed under complement.
3. Σ is closed under countable unions.
When we deal with a continuous space, a topological space is used. If (Ω, τ) is a topological space, the sigma-algebra (σ-algebra) B(Ω) ≡ B(Ω, τ) ≡ σ(τ) is the Borel σ-algebra on Ω. For instance, if Ω = ℝ, then B(ℝ) is the Borel σ-algebra on ℝ. In this paper, a tuple of the form (Ω, Σ) refers to a measurable space, where Σ is a σ-algebra.
In probability theory, we generally need to operate between measurable spaces using certain functions, namely measurable functions. Let (Ω_1, Σ_1) and (Ω_2, Σ_2) be two measurable spaces. A function f: Ω_1 → Ω_2 is defined as measurable if f⁻¹(B) maps every measurable set back to a measurable subset of the domain, i.e., if B ∈ Σ_2 then f⁻¹(B) ∈ Σ_1.
We call this function f measurable because it allows us to define a measure μ: Σ → ℝ̄ on (Ω_2, Σ_2) in terms of (Ω_1, Σ_1), where μ satisfies the following conditions:
1. μ(∅) = 0.
2. μ(B) ≥ 0 for all B ∈ Σ.
3. μ(⋃_{i∈I} B_i) = Σ_{i∈I} μ(B_i) for every countable disjoint family {B_i ∈ Σ : i ∈ I}.
If (Ω, Σ) is a measurable space and μ is a measure, we call (Ω, Σ, μ) a measure space; if μ is a probability measure, i.e., μ(Ω) = 1, then (Ω, Σ, μ) is a probability space.
Let (Ω_1, Σ_1, Pr) be a probability space and (Ω_2, Σ_2) be a measurable space. If X: Ω_1 → Ω_2 is a measurable function, then X is called a random variable with values in (Ω_2, Σ_2).

2.1.3. Kernel

Stochastic processes can model the process of generating one population from another population. A transition kernel is used to characterize each iteration in a stochastic process and is given by:
K(x, A) = P(x, A) = Pr(X_t ∈ A | X_{t−1} = x),
where x is the current population and A is a set of possible populations.
Definition 1. 
(Markov kernel) Let (Ω_1, Σ_1) and (Ω_2, Σ_2) be measurable spaces. A function K: Ω_1 × Σ_2 → [0, 1] is called a (Markov) kernel if the following two conditions hold:
1. The function K_x: A ↦ K(x, A) is a probability measure for each fixed x ∈ Ω_1.
2. The function K_A: x ↦ K(x, A) is a measurable function for each fixed A ∈ Σ_2.
The transition kernel associated with a transition density k: Ω_1 × Ω_2 → [0, 1] is defined by:
K(x, A) = ∫_A k(x, y) dy.
Kernels linked with deterministic methods used by SGoals play a significant role in developing a systematic formal theory. Next, we review several characterizations of stochastic methods using transition kernels.

Deterministic Kernel

Let (Ω_1, Σ_1) and (Ω_2, Σ_2) be measurable spaces, and let f: Ω_1 → Ω_2 be Σ_1-Σ_2 measurable. The function 1_f: Ω_1 × Σ_2 → [0, 1] defined by
1_f(x, A) = 1 if f(x) ∈ A, and 0 otherwise,
is a kernel.
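A minimal sketch of the deterministic kernel 1_f, in which measurable sets are represented by membership predicates (an illustrative simplification of our own, not part of the formal theory):

```python
def deterministic_kernel(f):
    """Deterministic kernel 1_f(x, A) = 1 if f(x) in A, else 0.
    Measurable sets A are represented by membership predicates (illustrative)."""
    def kernel(x, A):
        return 1.0 if A(f(x)) else 0.0
    return kernel

k = deterministic_kernel(lambda x: 2 * x)      # f(x) = 2x
interval = lambda y: 0 <= y <= 5               # the set A = [0, 5]
print(k(2, interval))   # 1.0, since f(2) = 4 is in [0, 5]
print(k(3, interval))   # 0.0, since f(3) = 6 is not
```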

Kernel Indicator

Let (Ω, Σ) be a measurable space. The indicator function 1: Ω × Σ → [0, 1], defined as 1(x, A) = 1_id(x, A) with id(x) = x, is a kernel.

Random Scan (Mixing)

The mixing update mechanism for a set of n Markov transition kernels K_1, …, K_n, each with a probability p_1, p_2, …, p_n of being picked (Σ_i p_i = 1), is defined by:
(Σ_{i=1}^n p_i K_i)(x, A) = Σ_{i=1}^n p_i ∫_A k_i(x, y) dy.
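The mixing mechanism can be sketched as a convex combination of kernels. The two deterministic "move" kernels below are illustrative, with measurable sets again represented by membership predicates:

```python
def mixing_kernel(kernels, probs):
    """Mixture kernel K(x, A) = sum_i p_i * K_i(x, A): apply K_i with prob p_i."""
    def kernel(x, A):
        return sum(p * K(x, A) for p, K in zip(probs, kernels))
    return kernel

# Two illustrative deterministic kernels on the integers: move left or right by 1.
K_left = lambda x, A: 1.0 if A(x - 1) else 0.0
K_right = lambda x, A: 1.0 if A(x + 1) else 0.0
K = mixing_kernel([K_left, K_right], [0.3, 0.7])

A = lambda y: y > 0            # the event "next state is positive"
print(K(0, A))                 # 0.7: only the right move lands in A
```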

Composition

The composition update mechanism is built using the kernel multiplication operator, i.e., on the concept of applying one kernel after another. The composition of kernels K_1 and K_2 is defined by:
(K_2 ∘ K_1)(x, A) = ∫ K_2(y, A) K_1(x, dy).
The composition of update mechanisms corresponding to a set of n transition kernels K_1, …, K_n is defined as the product kernel K_n ∘ K_{n−1} ∘ ⋯ ∘ K_1; this is well defined because kernel multiplication is an associative operation.

Transition Kernel Iteration

The transition probability of iterating (applying) a Markovian kernel K for t steps describes the probability of transiting to a set A ∈ Σ within t steps when starting at state x ∈ Ω, as defined by:
K^t(x, A) = K(x, A) if t = 1, and K^t(x, A) = ∫_Ω K^{t−1}(y, A) K(x, dy) if t > 1.
Let p: Σ → [0, 1] be the initial distribution of subsets; in that case, the probability that the Markov process is in set A ∈ Σ at step t ≥ 0 is given by:
Pr(X_t ∈ A) = p(A) if t = 0, and Pr(X_t ∈ A) = ∫_Ω K^t(x, A) p(dx) if t > 0.
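On a finite state space the kernel becomes a transition matrix and the iterate K^t is a matrix power; a small sketch (the two-state chain and its probabilities are illustrative choices of ours):

```python
def kernel_power(K, t):
    """t-step transition probabilities K^t on a finite state space,
    where K[i][j] = Pr(X_t = j | X_{t-1} = i)."""
    n = len(K)
    result = [row[:] for row in K]
    for _ in range(t - 1):
        result = [[sum(result[i][k] * K[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return result

# Illustrative 2-state chain: state 1 is absorbing, and the chain enters it
# from state 0 with probability 0.5 at each step.
K = [[0.5, 0.5],
     [0.0, 1.0]]
K3 = kernel_power(K, 3)
print(K3[0][1])  # 0.875 = 1 - (1 - 0.5)**3
```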

2.1.4. Kernels on Cartesian Products

SGoals work with populations, so we need probability-theoretic concepts that can handle them. To do so, we use a probability space (Ω^n, Σ^n, ⊗_{i=1}^n μ_i), where Ω^n is the set of elementary events, i.e., the set of all possible populations of size n, Σ^n is the product σ-algebra, and ⊗_{i=1}^n μ_i is a product probability measure [11].
Probability theory on Cartesian products lets us build several new kernels by joining simple kernels; hence, we can join the stochastic methods used in SGoals to generate new populations from other populations.

Swap Kernel

We can define a kernel 1_⇋: (Ω_1 × Ω_2) × (Σ_2 ⊗ Σ_1) → [0, 1] that characterizes the Swap method, a deterministic method often used by SGoals when we need to select individuals, where the swap function ⇋ is defined as follows:
⇋: Ω_1 × Ω_2 → Ω_2 × Ω_1, (x, y) ↦ (y, x).

Projection Kernel

We can define a kernel 1_{π_I}: (⊗_{i=1}^n Ω_i) × (⊗_{i=1}^m Σ_{k_i}) → [0, 1] that characterizes a method that selects m individuals from a population of size n, where the projection π_I is defined as follows:
π_I: ⊗_{i=1}^n Ω_i → ⊗_{i=1}^m Ω_{k_i}, (x_1, …, x_n) ↦ (x_{k_1}, …, x_{k_m}).
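A sketch of the projection π_I as plain tuple indexing (note that Python positions are 0-based, unlike the 1-based indices k_i in the text):

```python
def projection(indices):
    """Deterministic projection pi_I: pick the individuals at the given
    positions (0-based here, unlike the 1-based k_i in the text)."""
    def project(population):
        return tuple(population[i] for i in indices)
    return project

pi = projection([0, 2])
print(pi(('a', 'b', 'c', 'd')))   # ('a', 'c')
```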

Join Kernel

The join of methods F_1, …, F_m, each generating a subpopulation of the next population, can be characterized by a join-kernel K: Ω^η × Σ^υ → [0, 1] with υ = Σ_{k=1}^m υ_k, where the joined stochastic method is defined as follows:
F: Ω^η → Ω^υ, (x_1, …, x_η) ↦ ⊕_{i=1}^m F_i(x_1, …, x_η).

Permutation Kernel

Methods that generate permutations of populations can be characterized by the kernel K_I: Ω^n × Σ^n → [0, 1] defined as K_I = ⊗_{k=1}^n π_{i_k}, where I = (i_1, i_2, …, i_n) is a fixed permutation of the set {1, 2, …, n}. If we let P be a set of permutations of {1, 2, …, n}, then the kernel K_P: Ω^n × Σ^n → [0, 1] defined as K_P = (1/|P|) Σ_{I∈P} π_I characterizes a stochastic method that generates a population from another one by drawing from a fixed set of permutations.

Sorting Kernel

Methods that sort populations according to their objective-function values are commonly used in SGoals, so we characterize them with a kernel s_{n,n−1}: Ω^n × Σ^n → [0, 1], as defined in Proposition 63 in [11].

VR Kernel

A common pattern in most SGoals is that they can be described by two consecutive stochastic methods: a variation method V: Ω^η → Ω^ϖ and a replacement method R: Ω^{η+ϖ} → Ω^υ. Gomez in [11] combined these methods into a single variation-replacement method F: Ω^η → Ω^υ, named the VR method. These methods can be characterized by kernels K_R: Ω^{η+ϖ} × Σ^υ → [0, 1] and K_V: Ω^η × Σ^ϖ → [0, 1]; hence K_VR = K_R ∘ (1_{Ω^η} ⊗ K_V) and K_VR = K_R ∘ (K_V ⊗ 1_{Ω^η}).

2.2. Characterization of a SGoal Using Probability Theory

Definition 2. 
(optimality) d(x) = f(Best(x)) − f* (where f* is the optimal value of the objective function f in Ω).
Gomez in [11] defines an optimization space using sets that include optimal elements; these sets can be related to the concept of a level set. We consider optimal individuals to be individuals whose objective-function value is within ϵ ∈ ℝ⁺ of the optimum. The optimal elements are defined as follows:
1. (ϵ-state) x is a strict ϵ-optimum element if d(x) < ϵ,
2. (ϵ̄-state) x is an ϵ-optimum element if d(x) ≤ ϵ, and
3. (ϵ̂-state) x is an ϵ-element if d(x) = ϵ.
Gomez in [11] defines an f-optimization σ-algebra as a σ-algebra that contains the strict ϵ-optimum states: {Ω_ϵ : ϵ > 0} ⊆ Σ. This finally allows defining an optimization space (Ω, Σ, f) to study the convergence of SGoals, where Ω is the feasible region, Σ is an f-optimization σ-algebra, and f is an objective function.

2.3. Kernels on Optimization Spaces

Elitist Stochastic Methods

Some SGoals use elitist stochastic methods that guarantee that the best υ solutions in the next generation are equal to or better than the best υ solutions of the current generation; this captures the notion of improving the solution.
Definition 3. 
(elitist method) A stochastic method F: Ω^η → Ω^υ is called elitist if f(Best(F(P))) ≤ f(Best(P)).
Definition 4. 
(elitist kernel) A kernel K: Ω^η × Σ^υ → [0, 1] is called elitist if K(x, A) = 0 for each A ∈ Σ^υ such that d(x) < d(y) for all y ∈ A.
Lemma 1. 
If K: Ω^η × Σ^υ → [0, 1] is elitist then
1. K(x, (Ω̄^υ_{d(x)})^c) = 0 and K(x, Ω̄^υ_{d(x)}) = 1.
2. Let x ∈ Ω^η; if d(x) < α ∈ ℝ then K(x, (Ω̄^υ_α)^c) = 0 and K(x, Ω̄^υ_α) = 1.
Definition 5. 
(optimal strictly bounded from zero) A kernel K: Ω^η × Σ^υ → [0, 1] is called optimal strictly bounded from zero iff K(x, Ω_ϵ) ≥ δ_ϵ > 0 for all ϵ > 0.

2.4. Convergence of a SGoal

2.4.1. Convergence

Let (D_t) be a random sequence, i.e., a sequence of random variables defined on a probability space (Ω, Σ, P). Then (D_t) is said to
1. converge completely to zero, denoted as D_t →ᶜ 0, if for every ϵ > 0
lim_{t→∞} Σ_{i=1}^t Pr(D_i > ϵ) < ∞;
2. converge in probability to zero, denoted as D_t →ᵖ 0, if for every ϵ > 0
lim_{t→∞} Pr(D_t > ϵ) = 0.
Gomez in [11] follows the approach proposed by Günter Rudolph in [24] to study the convergence properties of a SGoal. This concept is also studied by Zhigljavsky and Zilinskas in [6], which contains an extensive study of SGoals. We clarify that in the rest of this paper, Σ is an optimization σ-algebra. First, Rudolph defines a convergence property for a SGoal in terms of the objective function.
Definition 6. 
(SGoal convergence). Let P_t ∈ Ω^n be the population maintained by a SGoal A at iteration t. Then A converges to the global optimum if the random sequence D_t = (d(P_t) : t ≥ 0) converges completely to zero.
Lemma 2. 
(Lemma 1 in [24]) If, for ϵ > 0, there exists δ > 0 such that K(x, Ω_ϵ) ≥ δ for all x ∈ Ω_ϵ^c and K(x, Ω_ϵ) = 1 for all x ∈ Ω_ϵ, then for t ≥ 1:
K^t(x, Ω_ϵ) ≥ 1 − (1 − δ)^t.
Using Lemma 2, Rudolph establishes a theorem for the convergence of evolutionary algorithms, which is specialized to SGoals in Theorem 1.
Theorem 1. 
(Theorem 1 in Rudolph [24])
Let a SGoal fulfill the everywhere-dense sampling condition of Lemma 2. Then it will converge to the global optimum (f*) of a real-valued function f: Φ → ℝ with f > −∞, defined on an arbitrary space Ω ⊆ Φ, regardless of the initial distribution p(·).

2.4.2. Convergence of a VR-SGoal

Gomez [11] follows the approach used by Günter Rudolph in [24] to study the convergence properties of VR-SGoals, but rewrites it in terms of the variation-replacement (VR) kernels.
Theorem 2.
A VR-SGoal with K_V an optimal-strictly-bounded-from-zero variation kernel and K_R an elitist replacement kernel will converge to the global optimum of the objective function.

3. Materials and Methods

In this section, we begin by generalizing the theory developed by Gomez in [11] from stationary Markov processes to non-stationary ones. Next, we develop the theory necessary to characterize arithmetic methods, which are useful in some recombination and mutation schemes described in [25].

3.1. Generalization to Non-Stationary Algorithms

For a non-stationary (or non-homogeneous) Markov process, the transition probabilities (kernel) may change over time [26]. Suppose that K_t is the transition kernel applied at time t > 0 of a non-stationary Markov process. Then the transition kernel of such a non-stationary Markov process after t steps is K^t = K_t ∘ K_{t−1} ∘ ⋯ ∘ K_1. Clearly, we can rewrite Equation (7): the transition kernel of a non-stationary Markov process is given by:
K^t(x, A) = K_1(x, A) if t = 1, and K^t(x, A) = ∫_Ω K^{t−1}(y, A) K_t(x, dy) if t > 1.
Now we are in the position of generalizing Lemma 71 in [11] to non-stationary Markov processes.
Lemma 3.
If there exists δ > 0 such that, for all t, K_t(x, Ω_ϵ) ≥ δ > 0 for all x ∈ Ω_ϵ^c and K_t(x, Ω_ϵ) = 1 for all x ∈ Ω_ϵ, then K^t(x, Ω_ϵ) ≥ 1 − (1 − δ)^t holds for t ≥ 1.
Proof. 
We rewrite the proof of Lemma 71 in [11] (Gomez uses induction on t), taking care of the non-stationary property of the Markov process. For t = 1 we have K^1(x, Ω_ϵ) = K_1(x, Ω_ϵ) (Equation (7)), so K^1(x, Ω_ϵ) ≥ δ (condition of the lemma), and therefore K^1(x, Ω_ϵ) ≥ 1 − (1 − δ)^1 (t = 1 and numeric operations). In the following we use the notation (as Gomez did) K^t_y(Ω_ϵ) = K^t(y, Ω_ϵ) to reduce the visual length of the equations.
K^{t+1}_x(Ω_ϵ)
= ∫_Ω K^t_y(Ω_ϵ) K_{t+1}(x, dy)   (Equation (7))
= ∫_{Ω_ϵ} K^t_y(Ω_ϵ) K_{t+1}(x, dy) + ∫_{Ω_ϵ^c} K^t_y(Ω_ϵ) K_{t+1}(x, dy)   (Ω = Ω_ϵ ∪ Ω_ϵ^c)
= ∫_{Ω_ϵ} K_{t+1}(x, dy) + ∫_{Ω_ϵ^c} K^t_y(Ω_ϵ) K_{t+1}(x, dy)   (if y ∈ Ω_ϵ, K^t_y(Ω_ϵ) = 1)
= K_{t+1}(x, Ω_ϵ) + ∫_{Ω_ϵ^c} K^t_y(Ω_ϵ) K_{t+1}(x, dy)   (def. kernel)
≥ K_{t+1}(x, Ω_ϵ) + (1 − (1 − δ)^t) ∫_{Ω_ϵ^c} K_{t+1}(x, dy)   (induction hypothesis)
= K_{t+1}(x, Ω_ϵ) + (1 − (1 − δ)^t) K_{t+1}(x, Ω_ϵ^c)   (def. kernel)
= K_{t+1}(x, Ω_ϵ) + K_{t+1}(x, Ω_ϵ^c) − (1 − δ)^t K_{t+1}(x, Ω_ϵ^c)
= 1 − (1 − δ)^t (1 − K_{t+1}(x, Ω_ϵ))   (probability)
≥ 1 − (1 − δ)^t (1 − δ)   (condition of the lemma)
= 1 − (1 − δ)^{t+1}.
   □
Finally, Theorem 72 in [11] also holds for non-stationary Markov processes; therefore, in order to show the convergence of a non-stationary SGoal, it is sufficient to prove that the SGoal satisfies the conditions of Lemma 3.
Theorem 3. 
(Theorem 72 in [11], a corrected version of Theorem 1 in [24]) A SGoal whose stochastic kernel satisfies K^t(x, Ω_ϵ) ≥ 1 − (1 − δ)^t for all t ≥ 1 will converge to the global optimum (f*) of a well-defined real-valued function f: Φ → ℝ, defined on an arbitrary space Ω ⊆ Φ, regardless of the initial distribution p(·).
Proof. 
See proof of Theorem 72 in [11].   □
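The bound of Lemma 3 can be checked numerically. A small sketch of our own, assuming for illustration that the kernel applied at step t places exactly mass δ_t ≥ δ on the optimal set from outside it; the probability of having reached Ω_ϵ within t steps then dominates 1 − (1 − δ)^t:

```python
def hit_probability(deltas):
    """Probability of having reached the optimal set within t steps, when the
    kernel applied at step t places mass deltas[t-1] on it from outside,
    and the set is never left once reached (elitist behavior)."""
    miss = 1.0
    probs = []
    for d in deltas:
        miss *= (1.0 - d)          # probability of still missing the set
        probs.append(1.0 - miss)
    return probs

# Illustrative time-varying masses, all >= delta = 0.1 (the non-stationary case).
probs = hit_probability([0.1, 0.2, 0.15, 0.3])
delta = 0.1
for t, p in enumerate(probs, start=1):
    # Lemma 3 bound: K^t(x, Omega_eps) >= 1 - (1 - delta)**t
    assert p >= 1 - (1 - delta) ** t
```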

3.2. Arithmetic between Measurable Functions

Arithmetic operations can be found in several mutation and recombination schemes [25], so to characterize an algorithm entirely with kernels we must characterize all methods that can alter the generation of new populations.
According to Theorem 22 in [11], to characterize arithmetic methods as deterministic kernels it is enough to prove that these methods are measurable. Proposition 1 provides sufficient conditions for a function f: Ω → ℝ to be measurable.
Proposition 1.
Let (Ω, Σ) be a measurable space; then f: Ω → ℝ is Σ-B(ℝ) measurable if and only if one of the following conditions holds:
1. {x ∈ Ω : f(x) < b} ∈ Σ for every b ∈ ℝ;
2. {x ∈ Ω : f(x) ≤ b} ∈ Σ for every b ∈ ℝ;
3. {x ∈ Ω : f(x) ≥ b} ∈ Σ for every b ∈ ℝ;
4. {x ∈ Ω : f(x) > b} ∈ Σ for every b ∈ ℝ.
Proof. 
Note that {x ∈ Ω : f(x) < b} = f⁻¹((−∞, b)), and use the family {(−∞, b) : b ∈ ℝ} to generate B(ℝ). For details, see Proposition 1 in Chapter 3 of [27].    □
Lemma 4 gives a useful equality between sets to characterize arithmetic methods.
Lemma 4.
Let (Ω, Σ) be a measurable space and f: Ω → ℝ and g: Ω → ℝ be Σ-B(ℝ) measurable; then
{(x, y) ∈ Ω × Ω : g(x) + f(y) < c} = ⋃_{q∈ℚ} L_{q,c} × R_{q,c},
where
L_{q,c} = {x ∈ Ω : g(x) < c − q}
and
R_{q,c} = {y ∈ Ω : f(y) < q}.
Proof. 
[⊇] Consider a tuple (x, y) such that:
(x, y) ∈ ⋃_{q∈ℚ} L_{q,c} × R_{q,c};
then, for some q ∈ ℚ, we have g(x) < c − q and f(y) < q. Hence, adding the two inequalities, we have f(y) + g(x) < c, so (x, y) ∈ {(x, y) ∈ Ω × Ω : f(y) + g(x) < c}.
[⊆]
Let (x, y) ∈ {(x, y) ∈ Ω × Ω : f(y) + g(x) < c}. Then f(y) < c − g(x), and by the density of the rational numbers there exists q ∈ ℚ such that f(y) < q < c − g(x); applying some arithmetic, f(y) < q and g(x) < c − q. From
(x, y) ∈ ⋃_{q∈ℚ} ({x ∈ Ω : g(x) < c − q} × {y ∈ Ω : f(y) < q})
it follows that
{(x, y) ∈ Ω × Ω : g(x) + f(y) < c} ⊆ ⋃_{q∈ℚ} L_{q,c} × R_{q,c}.
   □

3.2.1. Method Product by a Scalar

Proposition 2.
Let (Ω, Σ) be a measurable space and f: Ω → ℝ be Σ-B(ℝ) measurable; then h: Ω → ℝ defined as h(x) = αf(x), where α ∈ ℝ, is Σ-B(ℝ) measurable.
Proof. 
For every α ∈ ℝ and c ∈ ℝ we want to show that the preimages of h are in Σ. We prove this by cases.
[α = 0] h is constantly 0, so h⁻¹((−∞, c)) = {x ∈ Ω : 0·f(x) < c} equals Ω when c > 0 and ∅ when c ≤ 0; both sets belong to Σ by the definition of a σ-algebra.
[α > 0] h⁻¹((c, ∞)) = {x ∈ Ω : αf(x) > c} = {x ∈ Ω : f(x) > c/α} ∈ Σ, measurable by Proposition 1.
[α < 0] h⁻¹((−∞, c)) = {x ∈ Ω : αf(x) < c} = {x ∈ Ω : f(x) > c/α} ∈ Σ, measurable by Proposition 1 (the inequality is reversed when dividing by a negative α).
   □

3.2.2. Method Addition

Proposition 3.
Let (Ω, Σ) be a measurable space and f: Ω → ℝ and g: Ω → ℝ be Σ-B(ℝ) measurable functions; then h: Ω × Ω → ℝ defined as h(x, y) = f(y) + g(x) is Σ⊗Σ-B(ℝ) measurable.
Proof. 
We want to show that h⁻¹((−∞, c)) ∈ Σ⊗Σ for all c ∈ ℝ, according to Proposition 1.
Now, note that
{(x, y) : f(y) + g(x) < c} = h⁻¹((−∞, c)),
and using Lemma 4 we establish that:
{(x, y) ∈ Ω × Ω : g(x) + f(y) < c} = ⋃_{q∈ℚ} L_{q,c} × R_{q,c},
where
L_{q,c} = {x ∈ Ω : g(x) < c − q}
and
R_{q,c} = {y ∈ Ω : f(y) < q}.
So, if we show that ⋃_{q∈ℚ} L_{q,c} × R_{q,c} ∈ Σ⊗Σ, then h is measurable:
{x ∈ Ω : g(x) < c − q} ∈ Σ   (measurable by Proposition 1)
{y ∈ Ω : f(y) < q} ∈ Σ   (measurable by Proposition 1)
L_{q,c} × R_{q,c} ∈ Σ × Σ   (family-product definition in [11])
L_{q,c} × R_{q,c} ∈ Σ⊗Σ   (Σ × Σ ⊆ Σ⊗Σ)
⋃_{q∈ℚ} L_{q,c} × R_{q,c} ∈ Σ⊗Σ   (Σ⊗Σ is closed under countable unions)
So h⁻¹((−∞, c)) ∈ Σ⊗Σ, since h⁻¹((−∞, c)) = ⋃_{q∈ℚ} L_{q,c} × R_{q,c}.
   □

3.2.3. Method Product

Lemma 5.
Let (Ω, Σ) be a measurable space and f: Ω → ℝ be a Σ-B(ℝ) measurable function; then h: Ω → ℝ defined as h(x) = f²(x) is Σ-B(ℝ) measurable.
Proof. 
We need to show that h is measurable, i.e., that h⁻¹((c, ∞)) ∈ Σ for all c ∈ ℝ; note that h⁻¹((c, ∞)) = {x ∈ Ω : f²(x) > c}. We prove this by cases.
[c ≥ 0]
{x ∈ Ω : f²(x) > c} = {x ∈ Ω : f(x) > √c} ∪ {x ∈ Ω : f(x) < −√c}   (inequality)
{x ∈ Ω : f(x) > √c} ∈ Σ   (measurable by Proposition 1)
{x ∈ Ω : f(x) < −√c} ∈ Σ   (measurable by Proposition 1)
{x ∈ Ω : f(x) > √c} ∪ {x ∈ Ω : f(x) < −√c} ∈ Σ   (Σ is closed under countable unions)
[c < 0]
{x ∈ Ω : f²(x) > c} = Ω   (all values of f²(x) are non-negative)
Ω ∈ Σ   (definition of σ-algebra)
So h⁻¹((c, ∞)) ∈ Σ, since {x ∈ Ω : f²(x) > c} = h⁻¹((c, ∞)).
   □
Proposition 4.
Let (Ω, Σ) be a measurable space and f: Ω → ℝ and g: Ω → ℝ be Σ-B(ℝ) measurable functions; then h: Ω × Ω → ℝ defined as h(x, y) = f(y)g(x) is Σ⊗Σ-B(ℝ) measurable.
Proof. 
We can observe that the product of two functions can be expressed as follows:
f(y)g(x) = ½[(f(y) + g(x))² − f²(y) − g²(x)].
This expression is measurable because it is written in terms of addition, squares of measurable functions, and multiplication by a scalar; the result follows from Propositions 2 and 3 and Lemma 5.    □
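The identity underlying Proposition 4 can be checked numerically (the sample values are illustrative):

```python
def product_via_identity(fy, gx):
    """fy * gx written via addition, squaring, and scalar multiplication,
    as in the proof of Proposition 4."""
    return 0.5 * ((fy + gx) ** 2 - fy ** 2 - gx ** 2)

for a, b in [(1.5, -2.0), (0.0, 3.0), (-4.0, -0.25)]:
    assert abs(product_via_identity(a, b) - a * b) < 1e-12
```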

3.2.4. Arithmetic Kernels

Now we proceed to characterize the arithmetic methods using deterministic kernels as defined in Section 2.1.3, which only require the underlying function to be measurable.
Theorem 4. 
(Addition kernel) Let h(x, y) = f(x) + g(y) be a measurable function as defined in Proposition 3. The addition function 1_+: (Ω × Ω) × B(ℝ) → [0, 1] defined as 1_+((x, y), A) = 1_h((x, y), A) is a kernel.
Proof. 
Since this is a deterministic kernel as defined in Section 2.1.3, it is sufficient to prove that h(x, y) is measurable, which is done in Proposition 3.    □
Theorem 5. 
(Product-scalar kernel) Let h(x) = αf(x) be a measurable function as defined in Proposition 2. The product-scalar function 1_α: Ω × B(ℝ) → [0, 1] defined as 1_α(x, A) = 1_h(x, A) is a kernel.
Proof. 
Since this is a deterministic kernel as defined in Section 2.1.3, it is sufficient to prove that h(x) is measurable, which is done in Proposition 2.    □

3.2.5. Product Kernel

Theorem 6. 
(Product kernel) Let h(x, y) = f(x)g(y) be a measurable function as defined in Proposition 4. The product function 1_*: (Ω × Ω) × B(ℝ) → [0, 1] defined as 1_*((x, y), A) = 1_h((x, y), A) is a kernel.
Proof. 
Since this is a deterministic kernel as defined in Section 2.1.3, it is sufficient to prove that h(x, y) is measurable, which is done in Proposition 4.    □
Remark 1.
For the sake of simplicity, in the rest of this paper, whenever we refer to an arithmetic kernel or a combination of arithmetic kernels, we will use the symbol 1_+.

4. Results

4.1. Selection Scheme Formalization

A selection scheme is a method for selecting a group of individuals from a population [28]; studies of these schemes can be found in [29,30,31]. Many schemes define an individual selection mechanism s1: Ω^λ → Ω and select a group of individuals by repeatedly applying s1. In this paper, we study the uniform, fitness-proportional, tournament [32], roulette, ranking, stud, and over-selection schemes:
1. A uniform scheme (Uniform1: Ω^λ → Ω) gives each candidate solution i = 1, 2, …, λ the same selection probability p(x_i) = 1/λ.
2. A fitness-proportional scheme (Proportional1: Ω^λ → Ω) gives each candidate solution i = 1, 2, …, λ a selection probability p(x_i) such that p(x_i) < p(x_j) if f(x_j) is better than f(x_i), and p(x_i) = p(x_j) if f(x_i) = f(x_j).
3. A tournament scheme (Tournament1_m: Ω^λ → Ω) of size m chooses m individuals using a uniform scheme and selects one individual from these using a Proportional1 scheme: Tournament1_m = Proportional1 ∘ Uniform^m.
4. A roulette scheme (Roulette1: Ω^λ → Ω) is a fitness-proportional one where p(x_i) = rate(x_i) / Σ_{j=1}^λ rate(x_j), with rate(x_i) < rate(x_j) if f(x_j) is better than f(x_i) and rate(x_i) = rate(x_j) if f(x_i) = f(x_j). If f(x_i) ≥ 0 for all i = 1, 2, …, λ and the problem is a maximization one, then rate(x_i) can be set to f(x_i).
5. A ranking scheme (Ranking1: Ω^λ → Ω) is a roulette one with
rate(x_i) = 1 + |{x_k : f(x_i) is better than or equal to f(x_k)}|.
6. A stud scheme (Stud: Ω^λ → Ω) chooses the best candidate and can be characterized by the kernel K_{R_{μ,μ+λ}} = π_1 ∘ s_{λ,λ−1}.
7. An over-selection scheme, as defined in [33] (OSelection1: Ω^λ → Ω), is a roulette one that splits the population into two rank-based groups of sizes λ·0.68 and λ·0.32, assigning probability p(x_i) = 0.8/(λ·0.68) to individuals in the first group and p(x_i) = 0.2/(λ·0.32) to individuals in the second, where
ranking(x_i) = 1 + |{x_k : f(x_i) is better than or equal to f(x_k)}|.
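As an illustration, Roulette1 and the ranking rate can be sketched in Python. The function names are ours, and a maximization setting with non-negative fitness is assumed:

```python
import random

def roulette1(population, fitness, rate=None):
    """Roulette selection of one individual:
    p(x_i) = rate(x_i) / sum_j rate(x_j); by default rate = fitness,
    which is valid when maximizing and fitness is non-negative."""
    rate = rate or fitness
    rates = [rate(x) for x in population]
    total = sum(rates)
    r = random.uniform(0, total)
    acc = 0.0
    for x, w in zip(population, rates):
        acc += w
        if r <= acc:
            return x
    return population[-1]

def ranking_rate(population, fitness):
    """Ranking scheme rate: rate(x_i) = 1 + |{x_k : f(x_i) >= f(x_k)}|."""
    return lambda x: 1 + sum(1 for y in population if fitness(x) >= fitness(y))
```

Selecting a group of μ individuals, as in Proposition 5, amounts to calling roulette1 independently μ times.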
Proposition 5.
If s1: Ω^λ → Ω is a selection scheme with kernel K_{s1}, then s: Ω^λ → Ω^μ has kernel K_s = ⊗_{i=1}^μ K_{s1}.
Corollary 1.
If s 1 is based on a probability function then K s is a kernel.
Corollary 2.
The Uniform, Proportional, Tournament, Roulette, Ranking, Stud, and OSelection selection schemes have Markov kernels.

4.2. Recombination Scheme Formalization

Recombination schemes use information from one or more parents to generate offspring that share information with their parents. Details of each scheme can be found in [25,34,35,36].
In the following characterizations, each individual of a population belongs to Ω^n (the set of elementary events), where n is the dimension. Keep in mind that all the theory developed in [11] is applicable, since it generalizes from tuples of tuples to a single tuple; see Proposition 32 and Corollary 33 in [11].
1. A single-point crossover method (Spc1_d: Ω^n × Ω^n → Ω^n × Ω^n) is described in Algorithm 2 and can be characterized by the following kernel:
K_{SPC,Q_l} = π_{{1…d}}(A) ⊕ π_{{d+1…n}}(B)
K_{SPC,Q_r} = π_{{1…d}}(B) ⊕ π_{{d+1…n}}(A)
K_{SPC} = K_{SPC,Q_l} ⊗ K_{SPC,Q_r}.
Proof. 
K S P C is defined in terms of projection kernel and join-kernels.    □
Algorithm 2. Single Point Crossover-Spc1
Spc1_d(A, B)
1: A_l = π_{1,…,d}(A)
2: A_r = π_{d+1,…,n}(A)
3: B_l = π_{1,…,d}(B)
4: B_r = π_{d+1,…,n}(B)
5: Q_l = Join(A_l, B_r)
6: Q_r = Join(B_l, A_r)
7: return (Q_l, Q_r)
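For tuple-encoded individuals, Algorithm 2 reduces to slicing and concatenation; a sketch, where the cut point d counts the genes taken from the first parent:

```python
def spc1(a, b, d):
    """Single-point crossover at cut point d (Algorithm 2):
    Q_l = Join(A_l, B_r), Q_r = Join(B_l, A_r)."""
    ql = a[:d] + b[d:]     # first d genes of A, remaining genes of B
    qr = b[:d] + a[d:]     # first d genes of B, remaining genes of A
    return ql, qr

ql, qr = spc1((1, 1, 1, 1), (0, 0, 0, 0), 2)
print(ql, qr)  # (1, 1, 0, 0) (0, 0, 1, 1)
```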
2. A multiple-point crossover scheme (Mpc1_D: Ω^n × Ω^n → Ω^n × Ω^n). Let D = (1, d_1, d_2, …, d_m, n) be an ordered list of m + 2 ∈ ℕ⁺ integers that indicate the m crossover positions plus the first and last positions. This formalization only considers the case where m is an odd number. Algorithm 3 describes the method, which can be characterized by the following kernel, where l is the length of D:
K_{MPC,Q_l} = ⊕_{i=1}^{l/2} (π_{{D_{2i−1}…D_{2i}}}(A) ⊕ π_{{D_{2i}+1…D_{2i+1}}}(B))
K_{MPC,Q_r} = ⊕_{i=1}^{l/2} (π_{{D_{2i−1}…D_{2i}}}(B) ⊕ π_{{D_{2i}+1…D_{2i+1}}}(A))
K_{MPC} = K_{MPC,Q_l} ⊗ K_{MPC,Q_r}.
Proof. 
K_MPC is defined in terms of projection and join kernels.    □
Algorithm 3. Multiple Point Crossover-MultiplePoint1
D = [1, d_1, d_2, …, d_m, n]
MultiplePoint1_D(A, B)
1: Q_l = {}
2: Q_r = {}
3: for i = 1 to length(D)/2 do
4:     Q_l^i = Join(π_{D_{2i−1},…,D_{2i}}(A), π_{D_{2i}+1,…,D_{2i+1}}(B))
5: for i = 2 to length(D)/2 do
6:     Q_r^i = Join(π_{D_{2i−1},…,D_{2i}}(B), π_{D_{2i}+1,…,D_{2i+1}}(A))
7: return (Q_l, Q_r)
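A runnable sketch of the multi-point idea: segments between consecutive cut points alternate between the two parents. The function name `mpc` and the representation of the interior cut positions as a sorted list `cuts` are our own simplifications of Algorithm 3:

```python
def mpc(a, b, cuts):
    """Multi-point crossover sketch: gene segments delimited by the cut
    points alternate between parents a and b; `cuts` holds the interior
    crossover positions d_1 < ... < d_m."""
    n = len(a)
    bounds = [0] + list(cuts) + [n]
    q_l, q_r = [], []
    for i in range(len(bounds) - 1):
        lo, hi = bounds[i], bounds[i + 1]
        # even segments come from the first source, odd ones from the second
        src_l, src_r = (a, b) if i % 2 == 0 else (b, a)
        q_l.extend(src_l[lo:hi])
        q_r.extend(src_r[lo:hi])
    return q_l, q_r
```

With one cut this reduces to single-point crossover.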
3.
A Multi-Parent Crossover scheme (MultiParentC1 : Ω^{nb} → Ω^n) can be considered a generalization of Uniform Crossover; its definition is given in Algorithm 4. There, the method Size : Ω^{nb} → N × N in line 1 computes the number of features of each individual and the number of parents, (n, b) respectively. The method GenerateListIndex1 : N × N → N^n creates a list of length n where each position holds an integer indicating some parent; this assignment follows a rule defined in the design of the algorithm. Finally, the method Crossover1_D : Ω^{nb} → Ω^n assigns each element from the parents to a new individual according to the values of D. In this characterization, P ∈ Ω^{nb}; we use this representation in order to apply the theory developed in Section 3 of [11], which allows us to move from tuples of tuples to a single tuple.
The method Crossover1(D, n) as defined in Algorithm 4 can be characterized by a kernel K_Crossover1 : Ω^{nb} × Σ^n → [0, 1] defined as:
K_Crossover1 = ⊕_{i=1}^{n} π_{n(D_i − 1) + i}.
Proof. 
K_Crossover1 is defined in terms of projection and join kernels.    □
Algorithm 4. MultiParent Crossover-MultiParentC1
Crossover1_{D,n}(P)
1: Q = {}
2: for i = 1 to n do
3:     Q_i = π_{n(D_i − 1) + i}(P)
4: return (Q)
MultiParentC1(P)
1: n, b = Size(P)
2: D = GenerateListIndex(n, b)
3: Q = Crossover1_{D,n}(P)
4: return (Q)
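A runnable sketch of the Crossover1 step, representing the population as a list of parent lists rather than the flattened tuple used in the formalization (the name `crossover1` mirrors the paper's method; the list-of-lists representation is our simplification):

```python
def crossover1(parents, d):
    """Multi-parent crossover sketch (cf. Algorithm 4): gene i of the child
    is copied from parent D_i; `d` uses 1-based parent indices as in the paper."""
    return [parents[d[i] - 1][i] for i in range(len(d))]
```

Here `d` plays the role of the index list produced by GenerateListIndex.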
4.
A Shuffle Crossover scheme (ShuffleC1 : Ω^n × Ω^n → Ω^n × Ω^n). We start by permuting each parent; next, we apply one of the schemes studied above to obtain the children; finally, we undo the permutation applied at the beginning of the method. The definition can be seen in Algorithm 5, where the method RandomPermutation : N → N^n generates a permutation of the index set corresponding to the number of features of each parent; ConvPermutation_{Lper} : Ω^n → Ω^n sorts the features of the parents according to the index list obtained from RandomPermutation; Segmented1_p : Ω^n × Ω^n → Ω^n × Ω^n is as defined above; and ConvPermutationInv_{Lper} : Ω^n → Ω^n undoes the permutation after the children are obtained. This method can be characterized by the following kernels:
K_RandomPermutation = K_P
K_ConvPermutation = ⊕_{i=1}^{n} π_{Lper_i}
K_Segmented1p = K_SC
K_ConvPermutationInv = ⊕_{i=1}^{n} π_i
K_ShuffleC = [⊕_{i=1}^{n} π_i ⊕ ⊕_{i=1}^{n} π_i] ∘ K_SC ∘ [⊕_{i=1}^{n} π_{Lper_i} ⊕ ⊕_{i=1}^{n} π_{Lper_i}] ∘ K_P.
Proof. 
K_ShuffleC is defined in terms of projection kernels, join kernels and kernel composition.    □
Algorithm 5. Shuffle Crossover-ShuffleC1
ConvPermutation1_{Lper}(P)
1: Q = {}
2: for i = 1 to n do
3:     Q_i = π_{Lper_i}(P)
4: return (Q)
ConvPermutationInv1_{Lper}(P)
1: Q = {}
2: for i = 1 to n do
3:     Q_{Lper_i} = π_i(P)
4: return (Q)
ShuffleC1(A, B)
1: Lper = RandomPermutation(n)
2: A_per = ConvPermutation_{Lper}(A)
3: B_per = ConvPermutation_{Lper}(B)
4: Q1_per, Q2_per = Segmented1_p(A_per, B_per)
5: Q1 = ConvPermutationInv_{Lper}(Q1_per)
6: Q2 = ConvPermutationInv_{Lper}(Q2_per)
7: return (Q1, Q2)
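A runnable sketch of Algorithm 5, parameterized by any pairwise inner crossover (the function name `shuffle_crossover` and the fixed default seed are ours, for reproducibility of the sketch):

```python
import random

def shuffle_crossover(a, b, crossover, rng=None):
    """Shuffle crossover sketch (cf. Algorithm 5): permute both parents with
    the same random permutation, apply an inner crossover scheme, and undo
    the permutation on the children."""
    rng = rng or random.Random(0)
    n = len(a)
    perm = list(range(n))
    rng.shuffle(perm)                  # RandomPermutation(n)
    a_p = [a[j] for j in perm]         # ConvPermutation
    b_p = [b[j] for j in perm]
    q1_p, q2_p = crossover(a_p, b_p)   # e.g., Segmented1_p
    q1, q2 = [None] * n, [None] * n
    for i, j in enumerate(perm):       # ConvPermutationInv
        q1[j] = q1_p[i]
        q2[j] = q2_p[i]
    return q1, q2
```

Because the same permutation is applied and undone on both parents, each child gene still comes from one of the two parents at the same position.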
5.
Flat Crossover and Arithmetic Crossover schemes can be used when the features are real-valued (FlatC1 : Ω^n × Ω^n → Ω^n, ArithmeticC1 : Ω^n × Ω^n → Ω^n × Ω^n). The definitions can be seen in Algorithm 6. These methods can be characterized by the kernels K_FlatC1 : (Ω^n × Ω^n) × Σ^n → [0, 1] and K_ArithmeticC1 : (Ω^n × Ω^n) × (Σ^n × Σ^n) → [0, 1], defined as:
K_FlatC1 = ⊕_{i=1}^{n} [ α π_i(A) + (1 − α) π_i(B) ], with α ∼ U[0, 1]
K_ArithmeticC1 = [ ⊕_{i=1}^{n} [ α π_i(A) + (1 − α) π_i(B) ] ] ⊕ [ ⊕_{i=1}^{n} [ (1 − α) π_i(A) + α π_i(B) ] ].
Proof. 
K_ArithmeticC1 is defined in terms of projection kernels, join kernels and kernel composition.    □
6.
A Blended Crossover scheme can be seen as a generalization of FlatC1. The scheme is represented by the function BlendedC1_α : Ω^n × Ω^n → Ω^n, and its definition can be seen in Algorithm 7. The method can be characterized by the kernel K_BlendedC1 : (Ω^n × Ω^n) × Σ^n → [0, 1], defined by:
K_min^i = π_1 ∘ s_2 [ π_i(A) ⊕ π_i(B) ]
K_max^i = π_2 ∘ s_2 [ π_i(A) ⊕ π_i(B) ]
K_BlendedC1 = ⊕_{i=1}^{n} [ [ K_min^i ⊕ K_max^i ] ⊕ U[0, 1] ].
Proof. 
K_BlendedC1 is defined in terms of projection kernels, join kernels, kernel composition, arithmetic kernels and the sorting kernel.    □
Algorithm 6. Flat and Arithmetic Crossover-FlatC1, ArithmeticC1
FlatC1(A, B)
1: Q = {}
2: for i = 1 to n do
3:     α ∼ U[0, 1]
4:     Q_i = α π_i(A) + (1 − α) π_i(B)
5: return (Q)
ArithmeticC1(A, B)
1: Q_l = {}
2: Q_r = {}
3: for i = 1 to n do
4:     α ∼ U[0, 1]
5:     Q_l^i = α π_i(A) + (1 − α) π_i(B)
6:     Q_r^i = (1 − α) π_i(A) + α π_i(B)
7: return (Q_l, Q_r)
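A runnable sketch of both real-coded schemes in Algorithm 6 (function names and the fixed default seeds are ours):

```python
import random

def flat_crossover(a, b, rng=None):
    """Flat crossover sketch: one child whose genes are random convex
    combinations of the parents' genes, with alpha drawn per gene."""
    rng = rng or random.Random(1)
    q = []
    for ai, bi in zip(a, b):
        alpha = rng.random()                    # alpha ~ U[0,1]
        q.append(alpha * ai + (1 - alpha) * bi)
    return q

def arithmetic_crossover(a, b, rng=None):
    """Arithmetic crossover sketch: two children built from complementary
    convex combinations of the parents' genes."""
    rng = rng or random.Random(2)
    q_l, q_r = [], []
    for ai, bi in zip(a, b):
        alpha = rng.random()
        q_l.append(alpha * ai + (1 - alpha) * bi)
        q_r.append((1 - alpha) * ai + alpha * bi)
    return q_l, q_r
```

Since the combinations are complementary, the two arithmetic children sum, gene by gene, to the sum of the parents.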
Algorithm 7. Blended Crossover-BlendedC1
BlendedC1_α(A, B)
1: Q = {}
2: for i = 1 to n do
3:     x_min = min(π_i(A), π_i(B))
4:     x_max = max(π_i(A), π_i(B))
5:     dx = x_max − x_min
6:     β ∼ U[0, 1]
7:     Q_i = β (x_min − α dx) + (1 − β)(x_max + α dx)
8: return (Q)
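A runnable sketch of Algorithm 7, often known as BLX-α (the function name and fixed default seed are ours):

```python
import random

def blended_crossover(a, b, alpha=0.5, rng=None):
    """Blended crossover sketch (cf. Algorithm 7): each child gene is a
    random point of the widened interval [x_min - alpha*dx, x_max + alpha*dx]."""
    rng = rng or random.Random(3)
    q = []
    for ai, bi in zip(a, b):
        x_min, x_max = min(ai, bi), max(ai, bi)
        dx = x_max - x_min
        beta = rng.random()                     # beta ~ U[0,1]
        q.append(beta * (x_min - alpha * dx)
                 + (1 - beta) * (x_max + alpha * dx))
    return q
```

With alpha = 0, this reduces to flat crossover over the interval between the parents' genes.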
7.
A Linear Crossover scheme (LinearC1 : Ω^n × Ω^n → Ω^n × Ω^n × Ω^n). The definition can be seen in Algorithm 8. This method can be characterized by the kernel K_LinearC1 : (Ω^n × Ω^n) × (Σ^n × Σ^n × Σ^n) → [0, 1], defined by:
K_LinearC1 = ⊕_{i=1}^{n} [ [ (1/2) π_i(A) + (1/2) π_i(B) ] ⊕ [ (3/2) π_i(A) − (1/2) π_i(B) ] ⊕ [ −(1/2) π_i(A) + (3/2) π_i(B) ] ].
Proof. 
K_LinearC1 is defined in terms of projection kernels, join kernels, kernel composition and the addition kernel.    □
Algorithm 8. Linear Crossover-LinearC1
LinearC1(A, B)
1: Q1 = {}
2: Q2 = {}
3: Q3 = {}
4: for i = 1 to n do
5:     Q1_i = (1/2) π_i(A) + (1/2) π_i(B)
6:     Q2_i = (3/2) π_i(A) − (1/2) π_i(B)
7:     Q3_i = −(1/2) π_i(A) + (3/2) π_i(B)
8: return (Q1, Q2, Q3)
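A runnable sketch of Algorithm 8, which is deterministic: the three children are fixed linear combinations of the parents (the function name is ours):

```python
def linear_crossover(a, b):
    """Linear crossover sketch (cf. Algorithm 8): the midpoint of the
    parents plus two extrapolated children on either side."""
    q1 = [0.5 * ai + 0.5 * bi for ai, bi in zip(a, b)]
    q2 = [1.5 * ai - 0.5 * bi for ai, bi in zip(a, b)]
    q3 = [-0.5 * ai + 1.5 * bi for ai, bi in zip(a, b)]
    return q1, q2, q3
```

Unlike flat and arithmetic crossover, the extrapolated children can leave the interval spanned by the parents.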

4.3. Simulated Annealing (sa)

4.3.1. Concept

The Simulated Annealing algorithm (sa) is based on the process of heating and cooling a material to recrystallize it, see Algorithm 9. When the temperature decreases, the material settles into a more ordered state, and the state into which it settles is not always the same. This state tends to have low energy compared to the state of the material at high temperature ([25]). If we regard energy as a cost function, we can use this approach to minimize cost functions. Therefore, sa is a single-individual stochastic algorithm: it generates a single candidate solution x (parent) and sets a high temperature to explore the search space. Then, a variation mechanism generates a new candidate solution y (child) and measures its cost. A replacement policy, based on the fitness function and the temperature, picks one individual between the parent and the child. Finally, a process decreases the temperature, looking for each new solution to have less energy.
Clearly, the replacement policy in Algorithm 9 (lines 6, …, 11) is not elitist. This allows sa to expand the search but can lead to the loss of some good candidate-solutions. In practice, it is normal to keep track of the best solution found so far [25]. If this is done, the replacement policy is an elitist one.
Algorithm 9. Simulated Annealing [25]
SimulatedAnnealing
1: T = initial temperature > 0
2: α(T) = cooling function: α(T) ∈ [0, T] for all T
3: Initialize a candidate solution x_0 to the minimization problem f(x)
4: while ¬TerminationCondition() do
5:     Generate a candidate solution x
6:     if f(x) < f(x_0) then
7:         x_0 = x
8:     else
9:         r = U[0, 1]
10:        if r < exp[(f(x_0) − f(x))/T] then
11:            x_0 = x
12:    T = α(T)
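A minimal runnable sketch of Algorithm 9 for minimization, with geometric cooling and with tracking of the best solution found so far, as suggested in the text (the function name, parameter names, and default values are illustrative choices, not from the paper):

```python
import math
import random

def simulated_annealing(f, x0, neighbor, t0=10.0, cooling=0.95,
                        steps=2000, rng=None):
    """Simulated annealing sketch (cf. Algorithm 9) for minimizing f.
    `neighbor(x, rng)` generates a candidate from the current solution;
    a worse candidate is accepted with probability exp((f(x) - f(y)) / T).
    The best solution seen so far is tracked, making the policy elitist."""
    rng = rng or random.Random(4)
    x, fx = x0, f(x0)
    best, f_best = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        fy = f(y)
        # accept if better, or with Boltzmann probability if worse
        if fy < fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < f_best:
                best, f_best = x, fx
        t = cooling * t                     # T = alpha(T)
    return best, f_best
```

A typical usage is minimizing a one-dimensional cost with a small random perturbation as the variation mechanism.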

4.3.2. Formalization

To formalize and characterize sa, we use the approach proposed by [11]: we rewrite Algorithm 9 in terms of individual non-stationary stochastic methods, see Algorithm 10. This new algorithm is expressed in terms of Variation-Replacement methods. Observe that Algorithms 9 and 10 are equivalent: line 5 of Algorithm 9 corresponds to the method Variate_SA (line 1) of Algorithm 10; lines 6 to 11 of Algorithm 9 correspond to the method Replace_SA (line 2) of Algorithm 10; finally, line 12 of Algorithm 9 and the method UpdateParameters (line 3) perform the same task.
Algorithm 10. Simulated Annealing in terms of VR methods
NextPop_SA(x)
1: y = Variate_SA(x)
2: y = Replace_SA^T(y, x)
3: UpdateParameters(T)
4: return y
Now, we focus on characterizing (sa) as a VR stochastic method and analyzing its convergence through non-stationary Markov kernels.
Proposition 6.
If Replace_SA(x, y) is an elitist method, then it can be characterized by the Markov kernel R_SA : Ω^2 × Σ → [0, 1] defined as:
K_RSA = π_1 ∘ s_2.
Proof. 
K_RSA is defined in the same way as the RHC method in [11], so the proof uses the same argument as Lemma 75 in [11].    □
Proposition 7.
If the stochastic method Variate_SA can be characterized by a non-stationary Markov kernel V_SA(t) : Ω × Σ → [0, 1] and the conditions of Proposition 6 are fulfilled, then the method NextPop_SA(x) can be described by a VR non-stationary Markov kernel defined as
K_SA(t) = K_R ∘ K_VSA(t).
Proof. 
K S A ( t ) is a kernel composition under the given conditions.    □
Proposition 8.
If Replace_SA is an elitist method, then NextPop_SA can be characterized by an elitist non-stationary Markov kernel.
Proof. 
This proof uses the same argument as Proposition 77 in [11].    □

4.3.3. Convergence

Corollary 3.
If the conditions of Propositions 6–8 are fulfilled and the method Variate_SA is optimal strictly bounded from zero, then NextPop_SA is optimal strictly bounded from zero.
Proof. 
Follows from Definition 67, Lemma 68, and Definition 69 in [11], and from Proposition 8, which states that NextPop_SA can be characterized by an elitist kernel, which is optimal strictly bounded from zero.    □
Theorem 7.
sa will converge to the global optimum if Replace_SA is elitist and Variate_SA is optimal strictly bounded from zero.
Proof. 
Follows from Corollary 3, and Propositions 6–8.    □

4.4. Evolutionary Strategies (es)

4.4.1. Concept

Evolutionary Strategies (μ/ρ +, λ)-es are a type of Evolutionary Algorithm that applies mutation, recombination, and selection operators to a population of individuals [22], see Algorithm 11. Every individual has two parts: the candidate solution (x) and the set of endogenous strategy parameters (s) used to control the mutation operator ([22]). An es randomly initializes the population (Line 2) and evolves both parts of each individual (Lines 5–9) until a termination condition is fulfilled (Line 3). The set of endogenous parameters is exposed to evolution (Lines 6 and 8) before a child candidate solution is produced (Lines 7 and 9), in order to introduce variety. Each new individual is a composition of a set of selected candidate solutions (Line 5), and es generates λ new individuals each generation (Line 4). Finally, es selects the next population using one of two possible approaches: the (μ + λ)-es approach selects the best μ individuals among the μ parents and λ children, while the (μ, λ)-es approach selects the best μ individuals from the λ children alone (notice that λ ≥ μ in this case). In this work, we study both of them.
Algorithm 11. Evolutionary strategies described by [22]
ES_{μ/ρ +, λ}
1: g = 0
2: initialize(P_q^(0) := {(y_m^(0), s_m^(0), F(y_m^(0))), m = 1, …, μ})
3: while ¬TerminationCondition() do
4:     for l = 1 to λ do
5:         a_l = Marriage(P_q^(g), ρ)
6:         s_l = Recombination_s(a_l)
7:         y_l = Recombination_y(a_l)
8:         s_l = Mutation_s(s_l)
9:         y_l = Mutation_y(y_l, s_l)
10:        F_l = F(y_l)
11:    P_o^(g) = {(y_l, s_l, F_l), l = 1, …, λ}
12:    if (μ, λ) then
13:        P_q^(g+1) = Selection(P_o^(g), μ)
14:    else (μ + λ)
15:        P_q^(g+1) = Selection(P_o^(g), P_q^(g), μ)
16:    g = g + 1
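A minimal runnable sketch in the spirit of Algorithm 11, restricted to the (μ + λ) variant with ρ = 2 parents per child and a single self-adapted step size per individual as the endogenous strategy parameter. The function name, the learning rate τ, and all default values are illustrative assumptions, not the paper's prescription:

```python
import math
import random

def es_plus(f, mu, lam, dim, generations=100, rng=None):
    """(mu + lambda)-es sketch: intermediate recombination of two parents,
    log-normal self-adaptation of the step size, Gaussian mutation of the
    candidate solution, and elitist truncation selection. Minimizes f over
    R^dim with initial solutions drawn from [-5, 5]^dim."""
    rng = rng or random.Random(5)
    tau = 1.0 / math.sqrt(2.0 * dim)        # learning rate for Mutation_s
    # individual = (candidate solution y, step size s, fitness F(y))
    pop = []
    for _ in range(mu):
        y = [rng.uniform(-5, 5) for _ in range(dim)]
        pop.append((y, 1.0, f(y)))
    for _ in range(generations):
        children = []
        for _ in range(lam):
            p1, p2 = rng.sample(pop, 2)                       # Marriage, rho = 2
            s = (p1[1] + p2[1]) / 2                           # Recombination_s
            y = [(u + v) / 2 for u, v in zip(p1[0], p2[0])]   # Recombination_y
            s *= math.exp(tau * rng.gauss(0, 1))              # Mutation_s
            y = [yi + s * rng.gauss(0, 1) for yi in y]        # Mutation_y
            children.append((y, s, f(y)))
        # (mu + lambda) selection: best mu among parents and children
        pop = sorted(pop + children, key=lambda ind: ind[2])[:mu]
    return pop[0]
```

Because parents compete with children, this variant is elitist: the best fitness in the population never worsens between generations.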

4.4.2. Formalization

To formalize and characterize (μ/ρ +, λ)-es, we rewrite Algorithm 11 in terms of individual non-stationary stochastic methods, see Algorithm 12. This follows the approach in [11], which expresses the algorithms in terms of Variation-Replacement methods in order to study their convergence properties.
Notice that Algorithms 11 and 12 are equivalent: Lines 4–11 in Algorithm 11 correspond to the method Variate(P) (Line 1) in the NextPop method of Algorithm 12, and Lines 12–15 in Algorithm 11 correspond to Line 2 in the NextPop method of Algorithm 12. Using this characterization, we proceed to characterize each method of Algorithm 12 through non-stationary Markov kernels.
To characterize (μ/ρ +, λ)-es, we need to establish some non-stationary Markov kernels. First, we study the Variate method (Line 1, method NextPop, Algorithm 12).
Following Definition 55 in [11], we can express the variation method Variate : Ω^μ → Ω^λ as a joined stochastic method
Variate(P) = ⊕_{i=1}^{λ} NextSubPop_i(P)
where NextSubPop : Ω^μ → Ω chooses ρ individuals from the population, combines them, generates a child, and finally mutates the strategy and the child.
Algorithm 12. Evolutionary strategies algorithm-NextPop method described in terms of VR methods
NextSubPop_i(P)
1: a = PickParents(P)
2: q = Xover_a(P)
3: UpdateStrategies_a(s, i)
4: d = Variate_s(q)
5: return d
UpdateStrategies_a(s, i)
1: z = XoverStrategie_a(s)
2: s_i = VariateStrategie(z)
Variate(P)
1: for i = 1 to λ do
2:     Q_i = NextSubPop_i(P)
3: return Q
NextPop_Ψ(P)
1: Y = Variate(P)
2: Q = Replace_Ψ(P, Y)
3: return Q
Proposition 9.
If Lines 1 and 2 of the method UpdateStrategies of Algorithm 12 can be characterized by non-stationary kernels X_S : R^ρ × B(R) → [0, 1] and V_S(t) : R × B(R) → [0, 1], respectively, then UpdateStrategies can be characterized by a non-stationary kernel U_S(t) : R^ρ × B(R) → [0, 1] defined as:
K_US(t) = K_VS(t) ∘ K_XS.
Proof. 
K_US(t) is a kernel composition; this follows from Definition 25 in [11]. □
Proposition 10.
If Lines 2 and 4 of Algorithm 12 can be characterized by non-stationary Markov kernels Xover_a : Ω^ρ × Σ → [0, 1] and Variate_s : Ω × Σ → [0, 1], respectively, then the method NextSubPop can be characterized by the non-stationary kernel NextSubPop : Ω^μ × Σ → [0, 1] defined as:
K_NextSubPop = K_Variate_s(t) ∘ K_Xover ∘ π_{1,…,ρ} ∘ K_P.
Proof. 
K_NextSubPop is a kernel composition; this follows from Definition 25 in [11]. □
Proposition 11.
If NextSubPop can be characterized by a non-stationary Markov kernel, then the stochastic method Variate(t) can be characterized by a kernel V : Ω^μ × Σ^λ → [0, 1] defined as
K_Variate(t) = ⊕_{i=1}^{λ} K_NextSubPop_i.
Proof. 
K_Variate(t) is a joined stochastic method; this follows from Definition 55 and Proposition 56 in [11]. □
Proposition 12.
The stochastic method Replace_(μ+λ) used in Line 2 of the method NextPop can be characterized by the kernel R_{μ,μ+λ} : Ω^{μ+λ} × Σ^μ → [0, 1] defined as K_{R_{μ,μ+λ}} = π_{1,…,μ} ∘ s_{μ+λ}, and the stochastic method Replace_(μ,λ) can be characterized by the kernel R_{μ,λ} : Ω^λ × Σ^μ → [0, 1] defined as K_{R_{μ,λ}} = π_{1,…,μ} ∘ s_λ.
Proof. 
K_{R_{μ,λ}} and K_{R_{μ,μ+λ}} are kernel compositions; this follows from Definition 25 in [11]. □
Corollary 4.
If the methods PickParents, Xover_a, XoverStrategie_a, VariateStrategie, and Variate_s can be described by Markov kernels fulfilling the conditions of Propositions 9 and 10, then Evolutionary Strategies can be described by a VR kernel
K_ES = K_R ∘ K_V
where K_V = K_Variate and K_R = K_{R_{μ,λ}} or K_R = K_{R_{μ,μ+λ}}.
Proof. 
Follows from Propositions 9–12. □

4.4.3. Convergence

Proposition 13.
The NextPop_{(μ/ρ+λ)-ES} method is an elitist stochastic method that can be characterized by an elitist stochastic kernel.
Proof. 
Let k ∈ [1, μ] be the index of the best individual in population P, so f(Best(P)) = f(P_k). Since P ⊆ P ∪ Variate(P) and the method Replace is elitist, it is clear that f(Best(P ∪ Variate(P))) ≤ f(P_k). □
Corollary 5.
If the conditions of Propositions 9 and 10 are satisfied and Variate_s is optimal strictly bounded from zero, then the method NextPop_{μ+λ} is optimal strictly bounded from zero.
Proof. 
Follows from Definition 67, Lemma 68, and Definition 69 of [11], and from Proposition 13, which establishes that an elitist kernel is optimal strictly bounded from zero. □
Theorem 8. 
(μ/ρ + λ)-es will converge to the global optimum if the methods PickParents and Variate_s(t) can be characterized by stationary or non-stationary Markov kernels and Variate_s is optimal strictly bounded from zero.
Proof. 
Follows from Theorem 3 and Corollary 5. □

5. Discussion

We have generalized the conditions for convergence to the global optimum from stationary to non-stationary Markov processes, as presented in the systematic approach to stochastic global optimization algorithms proposed in [11]. We studied the theory necessary to describe some arithmetic methods with kernels. To do so, it was necessary to use concepts from real analysis, such as arithmetic between measurable functions. However, the literature we found only studies the case of operating two functions of the same variable, not of two different variables. Hence, the concepts of product sigma-algebras presented by Gomez in [11] were used to prove that arithmetic operations between measurable functions of two different variables are also measurable.
We formalized some selection and recombination schemes to generalize the theory so that it covers as many variations of each algorithm as possible. We found that most of these methods can be characterized using the kernels studied in [11] together with the new kernels studied in this paper, which suggests that other schemes in the literature could be easily adapted to the concepts developed here.
In this paper, we have formalized and characterized the simulated annealing algorithm and evolutionary strategies using the developed theory (both have been formalized in terms of Variation-Replacement kernels). A wide variety of non-stationary algorithms described algorithmically can be found in the literature; however, the theory described in this work cannot be applied to them directly. For that reason, the first step is to write such algorithms in terms of Variation-Replacement methods, as shown in Section 4.3 and Section 4.4. This approach can also be seen in [1], where the class of hybrid adaptive evolutionary algorithms is characterized.
We also formulated a set of conditions that the sa and es algorithms should fulfill to achieve global convergence. After characterizing these algorithms by a Variation-Replacement kernel, it was proven that they converge to the global optimum if the particular implementation of the variation method is optimal strictly bounded from zero, which depends on the way each algorithm is implemented.
Our future work will focus on using the proposed approach to formalize as many stationary and non-stationary SGoals as possible, and on extending the theory for the particular methods (mutation, recombination, and selection) that can be considered in SGoals. Moreover, we will study new convergence conditions, not only for the global optimum.

Author Contributions

Conceptualization, J.G. and A.R.; methodology, J.G. and A.R.; validation, J.G. and A.R.; formal analysis, J.G. and A.R.; investigation, J.G. and A.R.; writing—original draft preparation, J.G. and A.R.; writing—review and editing, J.G. and A.R.; supervision, J.G.; project administration, J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universidad Nacional de Colombia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SGoal: Stochastic Global Optimization Algorithm
SGoals: Stochastic Global Optimization Algorithms
ES: Evolutionary Strategies
SA: Simulated Annealing

References

  1. Gómez, J.; León, E. On the class of hybrid adaptive evolutionary algorithms (chavela). Nat. Comput. 2021, 20, 377–394. [Google Scholar] [CrossRef]
  2. Žilinskas, A.; Zhigljavsky, A. Stochastic global optimization: A review on the occasion of 25 years of Informatica. Informatica 2016, 27, 229–256. [Google Scholar] [CrossRef]
  3. Törn, A.; Žilinskas, A. Global Optimization; Springer: Berlin/Heidelberg, Germany, 1989. [Google Scholar]
  4. Mockus, J.; Tiesis, V.; Zilinskas, A. The application of Bayesian methods for seeking the extremum. Towards Glob. Optim. 1978, 2, 2. [Google Scholar]
  5. Neimark, J.; Strongin, R. Function extremum search with the use of information maximum principle. Autom. Remote Control 1966, 27, 101–105. [Google Scholar]
  6. Zhigljavsky, A.; Zilinskas, A. Stochastic Global Optimization; Springer Science & Business Media: New York, NY, USA, 2007; Volume 9. [Google Scholar]
  7. Tikhomirov, A.S. On the convergence rate of the Markov homogeneous monotone optimization method. Comput. Math. Math. Phys. 2007, 47, 780–790. [Google Scholar] [CrossRef]
  8. Al-Mharmah, H.; Calvin, J.M. Optimal random non-adaptive algorithm for global optimization of Brownian motion. J. Glob. Optim. 1996, 8, 81–90. [Google Scholar] [CrossRef]
  9. Chakraborty, U.K.; Deb, K.; Chakraborty, M. Analysis of selection algorithms: A Markov chain approach. Evol. Comput. 1996, 4, 133–167. [Google Scholar] [CrossRef]
  10. François, O. An evolutionary strategy for global minimization and its Markov chain analysis. IEEE Trans. Evol. Comput. 1998, 2, 77–90. [Google Scholar] [CrossRef]
  11. Gomez, J. Stochastic global optimization algorithms: A systematic formal approach. Inf. Sci. 2019, 472, 53–76. [Google Scholar] [CrossRef]
  12. Romeijn, H.E.; Smith, R.L. Simulated annealing and adaptive search in global optimization. Probab. Eng. Inform. Sci. 1994, 8, 571–590. [Google Scholar] [CrossRef]
  13. Weise, T. Global optimization algorithms-theory and application. Self-Publ. Thomas Weise 2009, 361. [Google Scholar]
  14. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice Hall Press: Upper Saddle River, NJ, USA, 2009. [Google Scholar]
  15. De Jong, K. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D. Thesis, University of Michigan, Ann Arbor, MI, USA, 1975. [Google Scholar]
  16. Holland, J.H. Adaptation in Natural and Artificial Systems; The University of Michigan Press: Ann Arbor, MI, USA, 1975. [Google Scholar]
  17. Mitchell, M. An Introduction to Genetic Algorithms; MIT Press: Cambridge, MA, USA, 1996. [Google Scholar]
  18. Goldberg, D.E.; Deb, K. A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms; Morgan Kaufmann: Burlington, MA, USA, 1991; pp. 69–93. [Google Scholar]
  19. Das, S.; Suganthan, P.N. Differential Evolution: A Survey of the State-of-the-Art. IEEE Trans. Evol. Comput. 2011, 15, 4–31. [Google Scholar] [CrossRef]
  20. Storn, R.; Price, K. Differential Evolution–A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Glob. Optim. 1997, 11, 341–359. [Google Scholar] [CrossRef]
  21. Kirkpatrick, S.; Gelatt, C.D.; Vecchi, M.P. Optimization by Simulated Annealing. Science 1983, 220, 671–680. [Google Scholar] [CrossRef] [PubMed]
  22. Beyer, H.G.; Schwefel, H.P. Evolution strategies–A comprehensive introduction. Nat. Comput. 2002, 1, 3–52. [Google Scholar] [CrossRef]
  23. Eiben, A.E.; Hinterding, R.; Michalewicz, Z. Parameter Control in Evolutionary Algorithms. IEEE Trans. Evol. Comput. 1999, 3, 124–141. [Google Scholar] [CrossRef]
  24. Rudolph, G. Convergence of Evolutionary Algorithms in General Search Spaces. In Third IEEE Conference on Evolutionary Computation; IEEE Press: Piscataway, NJ, USA, 1996; pp. 50–54. [Google Scholar]
  25. Simon, D. Evolutionary Optimization algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
  26. Bowerman, B.L. Nonstationary Markov Decision Processes and Related Topics in Nonstationary Markov Chains. Ph.D. Thesis, University of Iowa, Iowa City, IA, USA, 1974. [Google Scholar]
  27. Royden, H.L.; Fitzpatrick, P. Real Analysis, 4th ed.; Pearson Education Press: New York, NY, USA, 2010. [Google Scholar]
  28. Blickle, T.; Thiele, L. A Comparison of Selection Schemes Used in Evolutionary Algorithms. Evol. Comput. 1996, 4, 361–394. [Google Scholar] [CrossRef]
  29. Baker, J.E. Reducing bias and inefficiency in the selection algorithm. In Proceedings of the Second International Conference on Genetic Algorithms, Cambridge, MA, USA, 28–31 July 1987; Erlbaum Associates Inc.: Mahwah, NJ, USA, 1987; Volume 206, pp. 14–21. [Google Scholar]
  30. Angeline, P.J. Genetic Programming: On the Programming of Computers by Means of Natural Selection: John R. Koza, a Bradford Book; Elsevier: Amsterdam, The Netherlands, 1994; ISBN 0-262-11170-5. [Google Scholar]
  31. Whitley, L.D. The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. In Icga; Citeseer, 1989; Volume 89, pp. 116–123. [Google Scholar]
  32. Miller, B.L.; Miller, B.L.; Goldberg, D.E.; Goldberg, D.E. Genetic Algorithms, Tournament Selection, and the Effects of Noise. Complex Syst. 1995, 9, 193–212. [Google Scholar]
  33. Koza, J. Genetic Programming: On the Programming of Computers by Means of Natural Selection; The MIT Press: Cambridge, MA, USA, 1992. [Google Scholar]
  34. Eiben, A.E.; Bäck, T. Empirical investigation of multiparent recombination operators in evolution strategies. Evol. Comput. 1997, 5, 347–365. [Google Scholar] [CrossRef]
  35. Herrera, F.; Lozano, M.; Verdegay, J.L. Tackling real-coded genetic algorithms: Operators and tools for behavioural analysis. Artif. Intell. Rev. 1998, 12, 265–319. [Google Scholar] [CrossRef]
  36. Michalewicz, Z.; Dasgupta, D.; Le Riche, R.G.; Schoenauer, M. Evolutionary algorithms for constrained engineering problems. Comput. Ind. Eng. 1996, 30, 851–870. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
