
Information Sciences

Volume 297, 10 March 2015, Pages 216-235

Cluster-Based Population Initialization for differential evolution frameworks

https://doi.org/10.1016/j.ins.2014.11.026

Abstract

This article proposes a procedure to perform an intelligent initialization for population-based algorithms. The proposed pre-processing procedure, namely Cluster-Based Population Initialization (CBPI), consists of three consecutive stages. At the first stage, the individuals belonging to a randomly sampled population undergo two subsequent local search algorithms, i.e. a simple local search that performs moves along the axes and the Rosenbrock algorithm. At the second stage, the solutions processed by the two local searches undergo the K-means clustering algorithm and are grouped into sets on the basis of their Euclidean distance. At the third stage, the best individuals belonging to each cluster are saved into the initial population of a generic optimization algorithm. If the population has not yet been filled, the remaining individuals of the population are sampled within the clusters by using a fitness-based probabilistic criterion. This three-stage procedure implicitly performs an initial screening of the problem features in order to roughly estimate the most interesting regions of the decision space. The proposed CBPI has been tested on multiple classical and modern Differential Evolution variants, on a wide array of test problems and dimensionality values, as well as on a real-world problem. The proposed intelligent sampling appears to have a significant impact on the algorithmic functioning as it consistently enhances the performance of the algorithms with which it is integrated.

Introduction

Engineering and natural sciences often require the solution of multiple optimization problems. This fact makes the study of optimization methods extremely important in fields such as design and control engineering. Since only a very limited number of real-world optimization problems can be solved by exact methods, in the vast majority of cases an optimizer that does not require specific hypotheses on the problem must be used. Over the past decades, computer scientists have designed a multitude of these types of algorithms for addressing real-world problems where an exact approach is almost never applicable. These methods, known as metaheuristics, do not offer guarantees regarding convergence, but are still capable of detecting high-quality solutions that can be of great interest for engineers and practitioners. Among the plethora of metaheuristics are Evolutionary Algorithms (EAs) [26], Swarm Intelligence (SI) [25], and Memetic Computing (MC) [50].

For about two decades, i.e. from the 1970s to the 1990s, computer scientists put much effort into designing metaheuristics with the intention of finding an algorithm that could outperform all the others. After the publication of the No Free Lunch (NFL) Theorems [71], the view of scientists and practitioners on optimization underwent a radical modification. The NFL Theorems prove that all optimization algorithms, under the hypotheses that they search within a finite set of candidate solutions and never visit the same point/candidate solution twice, display the same performance when averaged over all possible optimization problems. As an immediate consequence, it was clear that it was no longer useful to discuss which algorithm was universally better or worse. Despite the fact that the hypotheses of the NFL Theorems are often not realistic (for example, it is very unlikely that an EA never generates the same point twice during a run), a large portion of the algorithmic design community started to propose algorithms tailored to specific problems, see e.g. [64], [14], [53], instead of trying to propose universally applicable ones. On the other hand, using the non-realism of the NFL Theorems' hypotheses as an argument, another portion of the optimization community has in recent years attempted to push towards the outer limit of these theorems by proposing relatively flexible algorithmic structures that combine (to some extent) robustness and high performance on various problems. This tendency is especially clear in continuous optimization and for those algorithms characterized by adaptively coordinated heterogeneous algorithmic components; for these two sub-fields the NFL Theorems have been shown not to hold, see e.g. [3], [58], respectively.

Since modern algorithms for continuous optimization are often composed of multiple adaptively coordinated operators, these two sub-fields are not disjoint. For example, in the context of Differential Evolution (DE), the optimizer proposed in [59] combines and coordinates multiple mutation strategies by making use of a learning period and a randomized success-based logic (see also [22], [52]). In [44] another DE-based strategy, namely the ensemble, has been presented on the basis of the strategy used in Evolutionary Programming in [43]. In the ensemble, multiple mutation and crossover strategies, as well as the related parameters, are encoded within the solutions and evolve with them. Other harmonic self-adaptive combinations of components within the DE framework are proposed in [8], [7]. In the context of Particle Swarm Optimization (PSO), a harmonic coordination of multiple components is also a popular option to enhance the algorithmic robustness over a range of problems. An emblematic example of this strategy is the so-called Frankenstein's PSO [45]. A more elegant algorithm that coordinates, in a simple way, a perturbation logic with a variable decomposition is proposed in [37].

Some studies focus on coordination techniques in order to achieve a robust behavior of the algorithm. Several nomenclatures are used in different contexts to express fairly similar concepts. The term portfolio usually refers to algorithmic frameworks composed of optimizers that are alternately selected during the run time. The selection criterion can be a simple schedule or a more sophisticated adaptive system. Some examples in the context of continuous optimization are given in [68], [57]. In the context of combinatorial optimization, and more specifically for the maximum satisfiability problem, a popular portfolio named the SATzilla platform has been proposed, see [72], [32]. The difficulty of finding a trade-off among the search algorithms, with the aim of determining an automatic coordination system, is studied in [33]. A model of the behavior of optimizers used to predict their run time is presented in [34]. Very closely related to the concept of portfolio, hyper-heuristics are composed of multiple algorithms usually coordinated by a machine learning algorithm which takes a supervisory role. This term is applied, in the vast majority of cases, to combinatorial problems. Famous examples of hyper-heuristics have been proposed in [20], [11] in the field of timetabling and rostering, while in [12] graph coloring heuristics are coupled with a random ordering heuristic. An important concept in hyper-heuristic implementation is the choice function, i.e. a criterion that assigns a reward score to the most promising heuristic, see [20]. More sophisticated coordination schemes present in the literature make use of reinforcement learning, in a stand-alone or combined fashion, see e.g. [11], [23], and memory-based mechanisms, see [10]. Elegant learning schemes coupled with multiple operators (multi-agents) for addressing complex optimization problems are presented in [2], [1].

Closely related to hyper-heuristics and portfolio algorithms, Memetic Algorithms (MAs) are optimization algorithms composed of an evolutionary framework and a set of local searchers activated within the generation cycle, see [46], [30]. In MAs, as in the related algorithmic families, optimization is carried out by multiple components/sub-algorithms but, unlike them, MAs emphasize the global and local search roles of their components. Although one may argue that there is no clear definition of global and local search (e.g. a DE with a proper tuning can be used as a local search), the term MA is broadly used to refer to population-based hybrid algorithms. Moreover, modern MA implementations ignore the original requirement that the population-based framework should be evolutionary, and also refer as MAs to those algorithms based on an SI framework, see e.g. [69], [66]. Recently, the concept of MA has been extended to single-solution algorithms and, more generally, to any algorithm composed of multiple/heterogeneous components. In the latter case the subject is termed, by a part of the computer science community, Memetic Computing (MC) and its implementations MC structures, see e.g. [50], [49], [54], [55].

Regardless of the nomenclature used, an important issue, which is also the focus of this paper, is the generation of an initial population in population-based hybrid algorithms. Nearly all population-based metaheuristics start with the random sampling of a prefixed amount of points within the decision space. This choice can be explained by the motivation, “since we have no a priori knowledge on the problem, we give to each possible candidate solution the same chance to be in the starting population”. Obviously, there is nothing wrong in this way of reasoning. Moreover, this initialization has the undoubted advantage of being computationally cheap, as it requires neither objective function evaluations nor other complex operations. On the other hand, for every problem there likely exist many other strategies that can lead to much better results. Similar in motivation, but very different in implementation, a fully deterministic procedure that spreads the points in the decision space is also possible, see [41]. In the latter case, the motivation can be summarized as “since we have no a priori knowledge on the problem, we try to sample the initial points so that they cover the decision space as much as possible”. Besides being computationally expensive, this choice carries an implicit drawback: the minimum amount of points necessary to cover the decision space grows exponentially with the dimensionality of the problem. Hence, in high dimensions an unreasonably large number of points is required to obtain a representative coverage of the search space. Some studies on the degree of randomization of the initial population have been reported in the literature, especially for EAs, see [42], [61]. It is shown that in many cases a deterministic initial sampling can lead to a performance deterioration. On the other hand, a random sampling within mapped areas of the decision space, i.e. a quasi-random sampling, leads to a robust algorithmic behavior without excessively jeopardizing the algorithmic performance with respect to a simple (pseudo-)random sampling.
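To make the coverage drawback mentioned above concrete, consider a full-factorial grid with k samples per axis: it contains k^n points, so the required sample size grows exponentially with the dimensionality n. The short calculation below is added here purely as an illustration (the numbers are not taken from the paper):

```python
# Illustrative only: size of a full-factorial grid with a fixed number of
# samples per axis in an n-dimensional decision space.
def grid_size(samples_per_axis: int, dimensions: int) -> int:
    return samples_per_axis ** dimensions

for n in (2, 10, 30, 50):
    # Even a coarse grid (5 samples per axis) quickly becomes intractable.
    print(f"n = {n:>2}: {grid_size(5, n):.2e} points")
```

With only 5 samples per axis, 10 dimensions already require roughly 10^7 points and 50 dimensions roughly 10^35, which explains why deterministic space-filling initializations do not scale.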

Whenever there is some knowledge about the problem, a sampling that uses this knowledge can be employed to enhance the algorithmic performance, see [24]. For example, in control engineering an initial tuning of the control parameters usually allows an estimation of the instability region and a rough estimation of the region of interest of the algorithm. An initial sampling in this region of the decision space can bias the search towards a quick detection of the optimum, see e.g. [14].

Although in the vast majority of cases an a priori knowledge of the problem is not available, there is always the possibility to perform a problem characterization at runtime in order to extract some features to be exploited in the subsequent stages of the optimization, see [18]. A pioneering study in this direction proposed selective sampling [5]. This procedure consists of an initial random initialization containing a large number of points, followed by a tournament selection that shrinks the population to those individuals displaying the best performance.
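A minimal sketch of this selective-sampling idea is given below; the function and parameter names (e.g. n_large, tournament) are illustrative assumptions of ours and are not taken from [5]:

```python
import numpy as np

def selective_sampling(f, low, high, n_large=500, n_pop=50, tournament=4, rng=None):
    """Oversample at random, then shrink to n_pop individuals via tournaments
    (minimization is assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    low, high = np.asarray(low, float), np.asarray(high, float)
    candidates = rng.uniform(low, high, size=(n_large, low.size))
    fitness = np.array([f(x) for x in candidates])
    population = []
    for _ in range(n_pop):
        # Pick the winner (lowest objective value) of a random tournament.
        idx = rng.choice(n_large, size=tournament, replace=False)
        population.append(candidates[idx[np.argmin(fitness[idx])]])
    return np.array(population)
```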

Within the context of DE, the sampling of extra points according to a central symmetry (opposition-based points [60]) appears to be beneficial to the algorithmic performance. Another approach consists of applying a local search to one or more solutions and then inserting these improved solutions into the initial population of an optimizer. The scheme that improves one solution and inserts it into a DE initial population is termed super-fit and has displayed a very good performance with respect to the same algorithm making use of a random population and whose budget is entirely devoted to DE, see [15], [35], [19].
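As a rough illustration of the opposition-based idea, for every sampled point x in [a, b] the centrally symmetric point a + b − x is also evaluated and the fittest points of the combined set are retained. The following sketch is a simplified version under these assumptions, not the exact scheme of [60]:

```python
import numpy as np

def opposition_based_init(f, low, high, pop_size, rng=None):
    """Sample pop_size points, add their 'opposite' points low + high - x,
    and keep the pop_size fittest of the combined set (minimization)."""
    rng = np.random.default_rng() if rng is None else rng
    low, high = np.asarray(low, float), np.asarray(high, float)
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    opposite = low + high - pop              # centrally symmetric counterparts
    both = np.vstack((pop, opposite))
    fitness = np.array([f(x) for x in both])
    return both[np.argsort(fitness)[:pop_size]]
```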

This article proposes a novel algorithmic component for pre-processing the initial solutions and generating an initial population for DE algorithms. The proposed component does not require any assumption on the optimization problem (except it being continuous). More specifically, an initial screening of the problem is implicitly performed in order to detect the most promising regions of the decision space. This result is achieved by a multiple-stage procedure. At first, a set of points is sampled at random. Subsequently, two local searchers with very different features are consecutively applied to these points with a shallow depth. The resulting points are then clustered. The population of the optimization algorithm is then composed of those individuals belonging to each cluster that display the highest performance and of other points, sampled from their neighborhood according to a probabilistic criterion. Thus, the initial population is composed of points displaying a good performance and spread over different basins of attraction. A graphical representation of the entire framework is presented in Fig. 1.
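The sketch below illustrates such a pipeline under simplifying assumptions: a single coordinate-wise local search stands in for the two local searchers of CBPI, scikit-learn's KMeans is used for the clustering stage, and a global fitness-proportional draw stands in for the within-cluster probabilistic sampling. It is not the authors' implementation, which is detailed in Section 2.

```python
import numpy as np
from sklearn.cluster import KMeans

def axis_local_search(f, x, low, high, step=0.1, iters=10):
    """Shallow local search performing exploratory moves along each axis
    (a simple stand-in for the two local searchers used by CBPI)."""
    best, f_best = x.copy(), f(x)
    for _ in range(iters):
        improved = False
        for i in range(best.size):
            for delta in (step, -step):
                trial = best.copy()
                trial[i] = np.clip(trial[i] + delta * (high[i] - low[i]), low[i], high[i])
                f_trial = f(trial)
                if f_trial < f_best:
                    best, f_best, improved = trial, f_trial, True
        if not improved:
            step *= 0.5          # shrink the step when no axis move helps
    return best, f_best

def cbpi_like_init(f, low, high, n_samples=100, n_clusters=10, pop_size=30, rng=None):
    """Three-stage initialization in the spirit of CBPI (simplified sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    low, high = np.asarray(low, float), np.asarray(high, float)
    # Stage 1: random sample, refined by a shallow local search.
    samples = rng.uniform(low, high, size=(n_samples, low.size))
    refined = np.array([axis_local_search(f, x, low, high)[0] for x in samples])
    fitness = np.array([f(x) for x in refined])
    # Stage 2: group the refined points by Euclidean distance (k-means).
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(refined)
    # Stage 3: the best individual of each cluster enters the population ...
    population = [refined[np.where(labels == c)[0][np.argmin(fitness[labels == c])]]
                  for c in range(n_clusters)]
    # ... and the remaining slots are filled by a fitness-based probabilistic draw.
    weights = (fitness.max() - fitness) + 1e-12
    extra = rng.choice(n_samples, size=pop_size - n_clusters,
                       replace=False, p=weights / weights.sum())
    return np.vstack([population, refined[extra]])
```

Here n_samples, n_clusters, and the per-point local-search budget are illustrative parameters only.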

The combination of clustering techniques with the DE framework for global optimization has been investigated in different ways. For example, paper [70] makes use of a clustering technique over the individuals of a DE population in order to prevent a diversity loss and premature convergence. Paper [13] uses one-step k-means clustering as a multi-parent crossover. This idea is developed in [39], where the k-means clustering is associated with two novel crossover operators. In the context of dynamic optimization problems, the algorithm in [29] uses multiple populations, where each sub-population covers a different area of the decision space. The number of clustered populations is dynamically varied during the optimization by means of an adaptive logic.

The remainder of this article is organized in the following way. Section 2 gives a description of the proposed initialization procedure. Section 3 shows, for a large set of problems, the effect of the proposed initialization on multiple and diverse optimizers. Finally, Section 4 gives the conclusions of this work.


The proposed Cluster-Based Population Initialization

Without loss of generality, in order to clarify the notation in this paper, we refer to the minimization problem of an objective function (or fitness) f(x), where the candidate solution x is a vector of n design variables (or genes) in a decision space D. Thus, the optimization problem considered in this paper consists of the detection of that solution x* ∈ D such that f(x*) < f(x), ∀x ∈ D. Array variables are highlighted in bold face throughout this paper.
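As a toy instance of this notation (purely illustrative, not one of the benchmarks used in this study), a sphere objective over D = [−5, 5]^n can be written as follows:

```python
import numpy as np

# Toy minimization problem in the notation above: decision space D = [-5, 5]^n,
# objective f(x) = sum_i x_i^2, whose optimum x* is the origin with f(x*) = 0.
n = 10
low, high = np.full(n, -5.0), np.full(n, 5.0)

def f(x):
    return float(np.sum(np.asarray(x) ** 2))
```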

Before entering into the

Numerical results

In order to test the validity and potentials of the proposed CBPI, the following testbeds have been taken into account:

  • The CEC2013 benchmark described in [38] in 10, 30, and 50 dimensions (28 test problems).

  • The BBOB2010 benchmark described in [48] in 100 dimensions (24 test problems).

  • The CEC2010 benchmark described in [65] in 1000 dimensions (20 test problems).

In addition, one real-world problem from [21] is also studied. In total, 129 problems over 5 dimensionality values have been considered.

Conclusion

This article proposes a software module that processes a population randomly sampled within a decision space and performs an intelligent sampling to detect the most interesting/promising areas of the domain. This software module is composed of three sub-modules that consecutively act on the sampled points. At first, two local search algorithms characterized by different search logics are applied to each solution with a limited budget. During the second stage, the improved solutions are

Acknowledgement

This research is supported by the Academy of Finland, Akatemiatutkija 130600, “Algorithmic design issues in Memetic Computing”.

References (73)

  • R. Mukherjee et al., Cluster-based differential evolution with crowding archive for niching in dynamic environments, Inf. Sci. (2014)
  • F. Neri et al., Memetic algorithms and memetic computing optimization: a literature review, Swarm Evol. Comput. (2012)
  • F. Neri et al., Compact particle swarm optimization, Inf. Sci. (2013)
  • A. Reese, Random number generators in genetic algorithms for unconstrained and constrained optimization, Nonlinear Anal.: Theory Methods Appl. (2009)
  • P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. (1987)
  • H. Wang et al., A memetic particle swarm optimization algorithm for multimodal optimization problems, Inf. Sci. (2012)
  • Y.-J. Wang et al., A dynamic clustering based differential evolution algorithm for global optimization, Eur. J. Oper. Res. (2007)
  • G. Acampora et al., A multi-agent memetic system for human-based knowledge selection, IEEE Trans. Syst. Man Cybern. – Part A (2011)
  • G. Acampora et al., Hierarchical optimization of personalized experiences for e-learning systems through evolutionary models, Neural Comput. Appl. (2011)
  • A. Auger et al., Continuous lunches are free!
  • M.S. Bazaraa et al., Nonlinear Programming: Theory and Algorithms (2006)
  • M.F. Bramlette, Initialization, mutation and selection methods in genetic algorithms for function optimization, in: ...
  • J. Brest et al., Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark problems, IEEE Trans. Evol. Comput. (2006)
  • J. Brest et al., Differential evolution and differential ant-stigmergy on dynamic optimisation problems, Int. J. Syst. Sci. (2013)
  • J. Brest et al., Self-adaptive differential evolution algorithm using population size reduction and three strategies, Soft Comput. (2011)
  • J. Brest et al., Population size reduction for the differential evolution algorithm, Appl. Intell. (2008)
  • E.K. Burke, Y. Bykov, in: PATAT '08 Proceedings of the 7th International Conference on the Practice and Theory of ...
  • E.K. Burke et al., A tabu search hyperheuristic for timetabling and rostering, J. Heuristics (2003)
  • A. Caponio et al., A fast adaptive memetic algorithm for on-line and off-line control design of PMSM drives, IEEE Trans. Syst. Man Cybern. – Part B (2007)
  • A. Caponio et al., Super-fit control adaptation in memetic differential evolution frameworks, Soft Comput. – Fusion Found. Methodol. Appl. (2009)
  • F. Caraffini et al., The importance of being structured: a comparative study on multi stage memetic approaches
  • F. Caraffini, F. Neri, I. Poikolainen, Micro-differential evolution with extra moves along the axes, in: Proceedings of ...
  • P. Cowling et al., A hyperheuristic approach to scheduling a sales summit
  • S. Das et al., Problem Definitions and Evaluation Criteria for CEC 2011 Competition on Testing Evolutionary Algorithms on Real World Optimization Problems (2010)
  • S. Das et al., Differential evolution: a survey of the state-of-the-art, IEEE Trans. Evol. Comput. (2011)
  • J. Du et al., Memetic algorithms, domain knowledge, and financial investing, Memetic Comput. (2012)