
Information Sciences

Volume 297, 10 March 2015, Pages 191-201

Regarding the rankings of optimization heuristics based on artificially-constructed benchmark functions

https://doi.org/10.1016/j.ins.2014.11.023

Abstract

Novel Evolutionary Algorithms are usually tested on sets of artificially-constructed benchmark problems. Such problems are often created to make the search for a single global extremum (usually the minimum) tricky. In this paper it is shown that benchmarking heuristics on either minimization or maximization of the same set of artificially-created functions (with equal bounds and number of allowed function calls) may lead to very different rankings of the tested algorithms. As Evolutionary Algorithms and other heuristic optimizers are developed in order to be applicable to real-world problems, such a result may raise doubts about the practical meaning of benchmarking them on artificial functions, as there is little reason why searching for the minimum of such functions should be more important than searching for their maximum.

Thirty optimization heuristics are tested in the paper, including a number of variants of Differential Evolution, other kinds of Evolutionary Algorithms, Particle Swarm Optimization, Direct Search methods and, following an idea borrowed from the No Free Lunch theorems, pure random search. The choice between the mean and the median performance for comparison is also discussed, and a short debate on the overall performance of particular methods is given.

Introduction

Although plenty of novel single-objective Evolutionary Algorithms (EAs) are proposed each year, comparing their performance in a trustworthy manner is not an easy task. As the ultimate goal of creating novel EAs should be their application to real-world problems that may be of interest to researchers from different fields of science, the problem of meaningful testing of novel approaches is of high importance.

It is widely accepted that the search for an optimization method that would perform well for any problem cannot be successful. Since the proofs of the No Free Lunch (NFL) theorems for optimization [79], [80], [41], [17], it is known that the performance of all non-revisiting heuristic algorithms (the ones that guide the further search using only information about function values measured at previously visited points), averaged over all possible problems, would be equal if no distinction in the significance of different problems is made. The number of such problems is finite when both the domain and the codomain are finite, which is always the case when digital computers are used [80], irrespective of the frequent claim that one is interested in optimization in a continuous domain. In [63] it was proven that NFL holds for any subset of all possible problems that is closed under permutation (c.u.p.). However, it was also shown that the vast majority of problems belonging to a c.u.p. subset cannot in fact be of interest to anyone [35], [36]. Most such problems are simply random ones, without any pattern in the function values; the visualized fitness landscape of such problems would look like a hyper-dimensional blur. In addition, some problems within any c.u.p. subset must have the global minimum and the global maximum located next to each other, making any optimization effort non-robust [35], [36]. Such problems are sometimes called "of interest to no one", or "not searchable in practice" [9]. As the problems that may indeed be of interest to someone form only a small fraction of all problems that are c.u.p. [36], contrary to some popular beliefs the NFL theorems do not dismiss the search for heuristics that could be successfully applied to a wide variety of practical problems.
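
To make the NFL statement above concrete, the following minimal Python sketch (not taken from the paper; the toy domain of four points, three possible function values and the two arbitrary search rules are illustrative assumptions) enumerates every function on a tiny finite domain and shows that a non-adaptive sweep and an adaptive, non-revisiting rule attain exactly the same best-so-far values once averaged over all problems:

```python
import itertools

DOMAIN_SIZE, VALUES, BUDGET = 4, range(3), 4   # toy setting: 3**4 = 81 possible problems

def sweep(f, budget):
    """Non-adaptive search: visit the points 0, 1, 2, ... in a fixed order."""
    best, history = float("inf"), []
    for x in range(budget):
        best = min(best, f[x])
        history.append(best)
    return history

def adaptive(f, budget):
    """Adaptive, non-revisiting search: the next point depends on the values seen so far."""
    visited, best, history, x = set(), float("inf"), [], 0
    for _ in range(budget):
        visited.add(x)
        best = min(best, f[x])
        history.append(best)
        remaining = [p for p in range(len(f)) if p not in visited]
        if not remaining:
            break
        # arbitrary adaptive rule: after a "good" value probe the farthest
        # unvisited point, otherwise the nearest one
        x = remaining[-1] if f[x] <= 1 else remaining[0]
    return history

totals = {"sweep": [0.0] * BUDGET, "adaptive": [0.0] * BUDGET}
problems = list(itertools.product(VALUES, repeat=DOMAIN_SIZE))   # all functions on the domain
for f in problems:
    for name, algorithm in (("sweep", sweep), ("adaptive", adaptive)):
        for step, best in enumerate(algorithm(f, BUDGET)):
            totals[name][step] += best
for name, sums in totals.items():
    print(name, [s / len(problems) for s in sums])   # identical averaged trajectories
```

Averaged over all 81 problems, both rules report the same best-so-far value after every number of function calls, which is what NFL predicts for any performance measure computed over a c.u.p. problem set.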

Although some methods have been proposed to develop problems that would diversify the performance of various heuristics [43], novel EAs are usually compared with state-of-the-art algorithms on published collections of benchmark problems. A number of such collections have been proposed for single-objective optimization methods [81], [70], [72], [45], but the majority of them are composed of artificially-constructed benchmark functions. Some exceptions, like the CEC 2011 set [13], are still rarely used in practice. The goal of the tested algorithm is usually to find the point in the search space (frequently within box-constraints) with the extreme (usually minimum) function value. Such artificial functions are often created according to some "concept" of how to make the search tricky enough to test particular properties of algorithms. The question arises whether such a way of constructing and testing novel EAs is not hampered by a human-induced bias that may unfavorably affect the comparison of different methods. It may easily turn out that a comparison based on such conceptually-created benchmarks does not allow researchers to point out the optimizers that would then perform best for other problems, including real-world ones. An important point is that many real-world, as well as some artificially-created, problems are deceptive, which means that the features of their fitness landscape "lure" the majority of EAs into local optima [28], [78]. The algorithms that cope better with deceptive problems are rarely the most efficient for the more "classical" ones, and vice versa.

It is not difficult to agree that, when various sets of artificially-constructed problems (see [6]) or different maximum numbers of function calls (as shown, e.g., in [58]) are used, different algorithms may be pointed out as the best ones, which unfortunately makes the introduction of novel (of course claimed to be better) methods easier (for a critical discussion of some novel heuristics see [67]). However, the major goal of this paper is to go a step further and verify the meaningfulness of classical benchmarks by testing EAs on the same artificial functions twice. The first test follows the constructors' idea and is based on searching for the global minimum of the specified functions. The second test is performed on the same functions but, contrary to the inventors' suggestion, consists in searching for their maximum. The very popular collection of CEC 2005 benchmark functions [70] is chosen for the study.

Obviously the difficulty (irrespective of how it is defined and measured [11], [46]), the number of local optima and the deceptive features of the two kinds of problems, namely minimization and maximization of the CEC 2005 functions, are different. The global optima are also more frequently located on, or close to, the bounds in the case of maximization than in the case of minimization problems. However, the fitness landscape, box-constraints, initialization ranges and maximum number of function calls remain the same in both tests. The simple idea is to throw away the "concept" that is hidden behind the artificially-created problems and verify the performance of algorithms in cases when the fox-holes or narrow valleys (for which most researchers tune their optimizers) lose their meaning. As there is little reason why searching for the minimum of artificially-constructed benchmark functions should be "more important" than searching for their maximum, this simple approach may verify the relevance of using such collections of benchmark problems to show the superiority of some EAs over others.
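
As a self-contained illustration of this test design (a sketch only, not the paper's code: the paper uses the 25 CEC 2005 functions, thirty heuristics and a full function-call budget, whereas here a single Rastrigin-type function, two toy optimizers and a small budget stand in for them), the Python snippet below runs the same optimizers on the same landscape, bounds and budget twice, once minimizing and once maximizing, and ranks them by the median of the final objective values:

```python
import numpy as np

DIM, BUDGET, BOUNDS = 10, 10_000, (-5.0, 5.0)    # illustrative choices, not the CEC 2005 setup
rastrigin = lambda x: 10 * DIM + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def random_search(f, bounds, budget, rng):
    """Pure random search within the box constraints."""
    lo, hi = bounds
    return min(f(rng.uniform(lo, hi, size=DIM)) for _ in range(budget))

def de_rand_1_bin(f, bounds, budget, rng, pop_size=20, F=0.5, CR=0.9):
    """A bare-bones DE/rand/1/bin (base vectors may coincide with the target; kept minimal)."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, DIM))
    fit = np.array([f(x) for x in pop])
    evals = pop_size
    while evals < budget:
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(DIM) < CR
            cross[rng.integers(DIM)] = True          # ensure at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            evals += 1
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
            if evals >= budget:
                break
    return fit.min()

def ranking(goal):
    """Rank the optimizers by the median best value over a few runs ('min' or 'max')."""
    sign = 1.0 if goal == "min" else -1.0
    objective = lambda x: sign * rastrigin(x)        # maximization = minimization of -f
    medians = {name: np.median([alg(objective, BOUNDS, BUDGET, np.random.default_rng(s))
                                for s in range(5)])
               for name, alg in (("random search", random_search), ("DE", de_rand_1_bin))}
    return sorted(medians, key=medians.get)          # best (lowest median) first

print("minimization ranking:", ranking("min"))
print("maximization ranking:", ranking("max"))
```

The two printed rankings need not coincide even though the fitness landscape, the box-constraints and the budget are identical; the paper performs this comparison systematically on the full CEC 2005 set with thirty heuristics.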

The present paper may also raise another question. Imagine that one needs to find both the best and the worst solution of some problem: can the same heuristic algorithm be successfully used for both tasks? Probably most practitioners would hope so, if only because few would be satisfied with taking on the burden of finding a proper optimization method twice.

Alongside the main goal, the choice between the mean and the median performance for comparison is discussed in the paper. The differences in the rankings of algorithms obtained for benchmark sets of various dimensionalities are also examined. Finally, some debate regarding the overall performance of particular algorithms is given. As optimization algorithms are usually compared in papers in which novel methods are introduced, the present study may be a chance to make such a comparison without any hidden temptation to show that the just-introduced algorithm is indeed worth publication.
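
The mean-versus-median issue can be made concrete with a small, purely hypothetical example (the error values below are invented for illustration and are not results from the paper): an algorithm that is near-optimal in most runs but occasionally fails completely is ranked differently depending on which statistic is reported.

```python
import numpy as np

# Hypothetical final errors from 25 runs of two algorithms on one benchmark problem.
# Algorithm A is near-optimal in 22 runs but fails badly in three of them;
# algorithm B delivers a mediocre result in every run.
errors_A = np.array([0.01] * 22 + [1.0e4] * 3)
errors_B = np.full(25, 10.0)

print("mean error:   A = %.2f, B = %.2f" % (errors_A.mean(), errors_B.mean()))          # the mean favours B
print("median error: A = %.2f, B = %.2f" % (np.median(errors_A), np.median(errors_B)))  # the median favours A
```

Which of the two statistics better reflects practical performance depends on whether occasional complete failures are acceptable, which is exactly why the choice deserves explicit discussion.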

Section snippets

Problems, algorithms and methods used for comparison

Among a number of collections of benchmark problems based on artificially-created functions [81], [70], [72], [45], the CEC 2005 set [70] is selected for this study due to its wide popularity in testing various EAs during recent years [75], [76], [38], [23], [5], [30], [22], [62], [66], [56], [6]. The basic properties of the 25 classical CEC 2005 minimization problems are given in Table 1. The standard 10-, 30-, and 50-dimensional versions are used in this paper and the maximum number of function calls …
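
A minimal sketch of the evaluation loop that such a protocol implies is given below; it is not the paper's Matlab code, the `problem.f`, `problem.bounds` and `problem.f_star` attributes are placeholders for a CEC 2005 problem definition, and the budget of 10,000 × D function calls with 25 independent runs is assumed to follow the standard CEC 2005 criteria.

```python
import numpy as np

def evaluate(algorithm, problem, dim, runs=25):
    """Run one optimizer on one benchmark problem and summarize the final errors.

    `problem` is assumed to expose an objective `f`, box `bounds` and the known
    optimal value `f_star`; `algorithm(f, bounds, budget, rng)` is assumed to
    return the best objective value found. All names are placeholders.
    """
    budget = 10_000 * dim                  # assumed CEC 2005 budget: 10,000 x D calls
    errors = []
    for seed in range(runs):
        best_value = algorithm(problem.f, problem.bounds, budget,
                               rng=np.random.default_rng(seed))
        errors.append(abs(best_value - problem.f_star))    # error w.r.t. the known optimum
    return float(np.mean(errors)), float(np.median(errors))
```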

Results

The discussion in this section is divided into three parts. First, the performance of all algorithms on minimization of the CEC 2005 functions is highlighted (Section 3.1), allowing a guess at how particular methods would be ranked in a "classical" test. Then the results achieved for maximization of the same functions are discussed and the algorithm rankings based on minimization and maximization problems are compared (Section 3.2). Finally, some discussion of the performance of "historical" methods …

Conclusions

Novel EAs are usually tested on selected sets of artificially-created benchmark functions, which are often developed in order to make their minimization tricky. In this paper it is shown that the rankings of optimization algorithms may noticeably differ when the problems to be solved are defined as minimization or as maximization of the same artificially-created benchmark functions.

According to the empirical results achieved for a set of CEC 2005 problems that aim at minimization of 10-, 30- and 50-dimensional …

Acknowledgments

This work was financed from the Polish public budget for science (2013–2015) by MNiSW, Grant No. IP2012 040672.

The author would like to thank Prof. Ponnuthurai N. Suganthan for providing the Matlab codes of the MDE_pBX, DCMA and CLPSO algorithms, Prof. Jasper A. Vrugt for providing the Matlab code of AMALGAM, and Dr. Wei Chu for providing the Matlab code of SP-UCI. The author is also grateful to Prof. Ferrante Neri for his valuable comments on the SFMDE algorithm.

References (83)

  • S. Ghosh et al., A differential covariance matrix adaptation evolutionary algorithm for real parameter optimization, Inf. Sci. (2012)
  • W.Y. Gong et al., Adaptive strategy selection in differential evolution for numerical optimization: an empirical study, Inf. Sci. (2011)
  • G. Iacca et al., Ockham's razor in memetic computing: three stage optimal memetic exploration, Inf. Sci. (2012)
  • C. Igel et al., On classes of functions for which no free lunch results hold, Inf. Process. Lett. (2003)
  • D.L. Jia et al., An effective memetic differential evolution algorithm based on chaotic local search, Inf. Sci. (2011)
  • K.M. Malan et al., A survey of techniques for characterising fitness landscapes and some possible ways forward, Inf. Sci. (2013)
  • R. Mallipeddi et al., Differential evolution algorithm with ensemble of parameters and mutation strategies, Appl. Soft Comput. (2011)
  • F. Neri et al., Compact particle swarm optimization, Inf. Sci. (2013)
  • Q.K. Pan et al., A differential evolution algorithm with self-adapting strategy and control parameters, Comput. Oper. Res. (2011)
  • A.P. Piotrowski et al., Differential evolution algorithm with separated groups for multi-dimensional optimization problems, Eur. J. Oper. Res. (2012)
  • A.P. Piotrowski et al., Corrigendum to: "Differential evolution algorithm with separated groups for multi-dimensional optimization problems" [Eur. J. Oper. Res. 216, 33–46], Eur. J. Oper. Res. (2012)
  • A.P. Piotrowski, Adaptive memetic differential evolution with global and local neighborhood-based mutation operators, Inf. Sci. (2013)
  • A.P. Piotrowski et al., How novel is the "novel" black hole optimization approach?, Inf. Sci. (2014)
  • A.P. Piotrowski et al., Comparing large number of metaheuristics for artificial neural networks training to predict water temperature in a natural river, Comput. Geosci. (2014)
  • D. Simon et al., Linearized biogeography-based optimization with re-initialization and local search, Inf. Sci. (2014)
  • K. Tang et al., Population-based algorithm portfolios with automated constituent algorithms selection, Inf. Sci. (2014)
  • A. Ulas et al., Cost-conscious comparison of supervised learning algorithms over multiple data sets, Pattern Recogn. (2012)
  • G.G. Wang et al., Chaotic krill herd algorithm, Inf. Sci. (2014)
  • Y. Zhou et al., A differential evolution algorithm with intersect mutation operator, Appl. Soft Comput. (2013)
  • G. Bergmann et al., Improvements of general multiple test procedures for redundant systems of hypotheses
  • J. Brest et al., Self-adaptive differential evolution algorithm using population size reduction and three strategies, Soft. Comput. (2011)
  • A. Caponio et al., Super-fit control adaptation in memetic differential evolution frameworks, Soft. Comput. (2009)
  • W.N. Chen et al., Particle swarm optimization with an aging leader and challengers, IEEE Trans. Evol. Comput. (2013)
  • S. Christensen, F. Oppacher, What can we learn from no free lunch? A first attempt to characterize the concept of a...
  • M. Clerc, Particle Swarm Optimization (2006)
  • S. Das et al., Differential evolution using a neighborhood-based mutation operator, IEEE Trans. Evol. Comput. (2009)
  • S. Das, P.N. Suganthan, Problem Definitions and Evaluation Criteria for CEC 2011 Competition on Testing Evolutionary...
  • S. Das et al., Differential evolution: a survey of the state-of-the-art, IEEE Trans. Evol. Comput. (2011)
  • J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006)
  • E.A. Duenez-Guzman et al., No free lunch and benchmarks, Evol. Comput. (2013)
  • O.J. Dunn, Multiple comparisons among means, J. Am. Stat. Assoc. (1961)