Regarding the rankings of optimization heuristics based on artificially-constructed benchmark functions
Introduction
Although plenty of novel single-objective Evolutionary Algorithms (EAs) are proposed each year, comparing their performance in a trustworthy manner is not an easy task. As the ultimate goal of creating novel EAs should be their application to real-world problems that may be of interest to researchers from different fields of science, the problem of meaningful testing of novel approaches is of high importance.
It is widely accepted that the search for an optimization method that would perform well on any problem cannot succeed. Since the proofs of the No Free Lunch (NFL) theorems for optimization [79], [80], [41], [17], it has been known that the performance of all non-revisiting heuristic algorithms (those that guide the further search using only the function values measured at previously visited points), averaged over all possible problems, is equal, provided no distinction is made in the significance of different problems. The number of such problems is finite when both the domain and the codomain are finite, which is always the case when digital computers are used [80], even though one frequently claims to be interested in optimization in a continuous domain. In [63] it was proven that NFL holds for any subset of all possible problems that is closed under permutation (c.u.p.). However, it was also shown that the vast majority of problems belonging to a c.u.p. subset cannot in fact be of interest to anyone [35], [36]. Most such problems are simply random, without any pattern in the function values – the visualized fitness landscape of such a problem would look like a hyper-dimensional blur. In addition, some problems within any c.u.p. subset must have the global minimum and the global maximum located next to each other, making any optimization effort non-robust [35], [36]. Problems of this kind are sometimes called "of interest to no one", or "not searchable in practice" [9]. As the problems that may indeed be of interest to someone form only a small fraction of all problems in a c.u.p. set [36], contrary to some popular beliefs the NFL theorems do not dismiss the search for heuristics that could be successfully applied to a wide variety of practical problems.
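The averaging argument behind the NFL theorems can be illustrated with a toy computation. The sketch below is illustrative only (the formal theorems also cover adaptive, value-dependent strategies): it enumerates every function from a three-point domain to a two-value codomain and checks that two fixed non-revisiting search orders achieve the same average best-so-far value for every evaluation budget.

```python
from itertools import product

# Toy NFL illustration: average the best-so-far value of two fixed,
# non-revisiting search orders over ALL |Y|^|X| functions f: X -> Y.
X = [0, 1, 2]        # three search points
Y = [0, 1]           # two possible function values
order_a = [0, 1, 2]  # strategy A's visiting order
order_b = [2, 0, 1]  # strategy B's visiting order

def best_after_k(order, f, k):
    """Best (minimum) value seen after k evaluations along `order`."""
    return min(f[x] for x in order[:k])

all_functions = list(product(Y, repeat=len(X)))  # f stored as tuple (f[0], f[1], f[2])
for k in (1, 2, 3):
    avg_a = sum(best_after_k(order_a, f, k) for f in all_functions)
    avg_b = sum(best_after_k(order_b, f, k) for f in all_functions)
    assert avg_a == avg_b  # identical averages at every budget k
```

The same equality extends to adaptive strategies over any c.u.p. problem set, which is the content of the result in [63].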
Although some methods have been proposed to develop problems that would diversify the performance of various heuristics [43], novel EAs are usually compared with state-of-the-art algorithms on published collections of benchmark problems. A number of such collections have been proposed for single-objective optimization methods [81], [70], [72], [45], but the majority of them are composed of artificially-constructed benchmark functions. Some exceptions, like the CEC 2011 set [13], are still rarely used in practice. The goal of the tested algorithm is usually to find the point in the search space (frequently within box-constraints) with the extreme (usually minimum) function value. Such artificial functions are often created according to some "concept" of how to make the search tricky enough to test specific properties of algorithms. The question arises whether such a way of constructing and testing novel EAs is not hampered by human-induced bias that may unfavorably affect the comparison of different methods. It may easily turn out that a comparison based on such conceptually-created benchmarks does not allow researchers to point out the optimizers that would then perform best on other problems, including real-world ones. An important point is that many real-world, as well as some artificially created, problems are deceptive, which means that the features of their fitness landscape "allure" the majority of EAs to local optima [28], [78]. The algorithms that cope better with deceptive problems are rarely the most efficient on the more "classical" ones, and vice versa.
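As a concrete illustration of deception, consider the hypothetical 1-D landscape below (made up for this sketch, not taken from any benchmark set): a wide parabolic basin "allures" the search toward its bottom at x = 4, while the global minimum hides in a narrow notch near x = 0 that most local searches never encounter.

```python
import numpy as np

# Hypothetical deceptive 1-D landscape: a wide basin with its bottom at
# x = 4 dominates the view, while a deep but very narrow Gaussian notch
# near x = 0 holds the true global minimum (value -4 at x = 0).
def deceptive(x):
    broad = (x - 4.0) ** 2                    # wide basin, minimum 0 at x = 4
    notch = -20.0 * np.exp(-(x ** 2) / 0.01)  # deep spike of width ~0.1
    return broad + notch

xs = np.linspace(-2.0, 8.0, 200001)  # grid fine enough to resolve the notch
x_best = xs[np.argmin(deceptive(xs))]
assert abs(x_best) < 0.1  # global minimum sits in the notch, not the basin
```

A hill-climber started almost anywhere in the domain converges to x ≈ 4 with value ≈ 0, never seeing the notch's lower value near the origin.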
It is not difficult to agree that, using various sets of artificially constructed problems (see [6]) or different maximum numbers of function calls (as shown in e.g. [58]), different algorithms may be pointed out as the best ones – which unfortunately makes the introduction of novel (of course, claimed to be better) methods easier (for a critical discussion of some novel heuristics see [67]). But the major goal of this paper is to go a step further and verify the meaningfulness of classical benchmarks by testing EAs on the same artificial functions twice. The first test follows the constructors' idea and is based on searching for the global minimum of the specified functions. The second test is performed on the same functions but, contrary to the inventors' suggestion, by searching for their maximum. The very popular collection of CEC 2005 benchmark functions [70] is chosen for the study.
Obviously the difficulty (irrespective of how it is defined and measured [11], [46]), the number of local optima, and the deceptive features of the two kinds of problems, namely minimization and maximization of the CEC 2005 functions, are different. The global optima are also more frequently located on, or close to, the bounds in the case of maximization than in the case of minimization problems. However, the fitness landscape, box-constraints, initialization ranges and maximum number of function calls remain the same in both tests. The simple idea is to throw off the "concept" hidden behind artificially-created problems and verify the performance of algorithms in cases when fox-holes or narrow valleys (for which most researchers tune their optimizers) lose their meaning. As there is little reason why searching for the minimum of artificially-constructed benchmark functions should be "more important" than searching for their maximum, this simple approach may verify the relevance of using such collections of benchmark problems to show the superiority of some EAs over others.
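Mechanically, the protocol amounts to running the same optimizer twice on the same landscape, box and budget, with maximization obtained by negating the objective. The sketch below uses SciPy's differential evolution and a stand-in Rastrigin function (not an actual CEC 2005 implementation) purely to show these mechanics.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Stand-in multimodal landscape (Rastrigin), NOT the official CEC 2005 code.
def rastrigin(x):
    x = np.asarray(x)
    return 10.0 * x.size + float(np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)))

bounds = [(-5.12, 5.12)] * 10  # identical box-constraints for both runs

# Run 1: the "classical" test - minimize f.
res_min = differential_evolution(rastrigin, bounds, maxiter=100, seed=1)
# Run 2: same landscape, box and budget - maximize f by minimizing -f.
res_max = differential_evolution(lambda x: -rastrigin(x), bounds,
                                 maxiter=100, seed=1)

print("best minimum found:", res_min.fun)
print("best maximum found:", -res_max.fun)
```

Nothing in this setup privileges one direction of search; only the "concept" baked into the landscape does.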
The present paper also raises another question. Imagine that one needs to find both the best and the worst solution of some problem – can the same heuristic algorithm be successfully used for both tasks? Probably most practitioners would hope so, at least because few would be satisfied with taking on the burden of finding the proper optimization method twice.
Alongside the main goal, the choice between the mean and the median performance as a basis for comparison is discussed in the paper. The differences in the rankings of algorithms obtained for benchmark sets of various dimensionalities are also discussed. Finally, some debate regarding the overall performance of particular algorithms is given. As optimization algorithms are usually compared in papers in which novel methods are introduced, the present study may be a chance to make such a comparison without any hidden temptation to show that the just-introduced algorithm is indeed worth publication.
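Why the mean-versus-median choice matters can be shown with a deliberately made-up toy example: an algorithm that is usually excellent but occasionally fails badly is ranked first by the median and last by the mean.

```python
import statistics

# Made-up final-error samples from 5 runs of two hypothetical algorithms.
errors_a = [0.1, 0.1, 0.1, 0.1, 50.0]  # usually excellent, one blow-up
errors_b = [1.0, 1.0, 1.0, 1.0, 1.0]   # consistently mediocre

# The median ranks A first (0.1 < 1.0); the mean ranks B first
# (~10.1 > 1.0), because a single failed run dominates A's average.
assert statistics.median(errors_a) < statistics.median(errors_b)
assert statistics.mean(errors_a) > statistics.mean(errors_b)
```

Which ranking is "right" depends on whether rare catastrophic runs matter to the practitioner, which is exactly why the choice deserves explicit discussion.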
Problems, algorithms and methods used for comparison
Among a number of collections of benchmark problems based on artificially-created functions [81], [70], [72], [45] the CEC 2005 set [70] is selected for this study due to its wide popularity in testing various EAs during recent years [75], [76], [38], [23], [5], [30], [22], [62], [66], [56], [6]. The basic properties of 25 classical CEC 2005 minimization problems are given in Table 1. The standard 10-, 30-, and 50-dimensional versions are used in this paper and the maximum number of function
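The construction of such problems can be sketched in a few lines. The fragment below mimics the shifted-sphere form of CEC 2005's F1 with an arbitrary illustrative shift vector (the real benchmark ships fixed shift data files; f_bias = −450 is F1's reported bias).

```python
import numpy as np

# Sketch of a CEC 2005-style shifted problem (F1 is a shifted sphere).
# The shift vector `o` is random here, for illustration only; the
# official benchmark uses fixed, published shift data.
rng = np.random.default_rng(0)
o = rng.uniform(-80.0, 80.0, size=10)  # shifted optimum inside [-100, 100]^10
f_bias = -450.0                        # F1's bias value

def f1_shifted_sphere(x):
    z = np.asarray(x) - o              # shift the search space
    return float(np.sum(z**2)) + f_bias

# The minimum value f_bias is attained exactly at the shift vector.
assert f1_shifted_sphere(o) == f_bias
```

Shifting (and, for other functions in the set, rotating) the base function prevents algorithms from exploiting an optimum at the origin or on a coordinate axis.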
Results
The discussion in this section is divided into three parts. First, the performance of all algorithms for the minimization of the CEC 2005 functions is highlighted (Section 3.1), allowing a guess at how particular methods could be ranked in a "classical" test. Then the results achieved for maximization of the same functions are discussed, and the algorithm rankings based on the minimization and maximization problems are compared (Section 3.2). Finally, some discussion on the performance of "historical" methods
Conclusions
Novel EAs are usually tested on a selected set of artificially-created benchmark functions, which are often developed so as to make their minimization tricky. In this paper it is shown that the ranking of optimization algorithms may noticeably differ depending on whether the problems to be solved are defined as minimization or maximization of the same artificially-created benchmark functions.
According to the empirical results achieved for a set of CEC 2005 problems that aim at minimization of 10-, 30- and
Acknowledgments
This work was financed from the Polish public budget for science (2013–2015) by MNiSW, Grant No. IP2012 040672.
The author would like to thank Prof. Ponnuthurai N. Suganthan for providing the Matlab codes of the MDE_pBX, DCMA and CLPSO algorithms, Prof. Jasper A. Vrugt for providing the Matlab code of AMALGAM, and Dr. Wei Chu for providing the Matlab code of SP-UCI. The author is also grateful to Prof. Ferrante Neri for his valuable comments on the SFMDE algorithm.
References (83)
- et al., A clustering-based differential evolution for global optimization, Appl. Soft Comput. (2011)
- et al., Parallel memetic structures, Inf. Sci. (2013)
- et al., An analysis on separability for memetic computing automatic design, Inf. Sci. (2014)
- et al., Enhancing distributed differential evolution with multicultural migration for global numerical optimization, Inf. Sci. (2013)
- et al., A new evolutionary search strategy for global optimization of high-dimensional problems, Inf. Sci. (2011)
- et al., An adaptive invasion-based model for distributed differential evolution, Inf. Sci. (2014)
- et al., A new genetic algorithm for solving optimization problems, Eng. Appl. Artif. Intell. (2014)
- et al., Self-adaptive mix of particle swarm methodologies for constrained optimization, Inf. Sci. (2014)
- et al., Evolving cognitive and social experience in particle swarm optimization through differential evolution: a hybrid approach, Inf. Sci. (2012)
- et al., Advanced nonparametric tests for multiple comparison in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci. (2010)