Invited paper
A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms

https://doi.org/10.1016/j.swevo.2011.02.002

Abstract

The interest in nonparametric statistical analysis has grown recently in the field of computational intelligence. In many experimental studies, the lack of the properties required for a proper application of parametric procedures (independence, normality, and homoscedasticity) leaves to nonparametric procedures the task of performing a rigorous comparison among algorithms.

In this paper, we discuss the basics and give a survey of a complete set of nonparametric procedures developed to perform both pairwise and multiple comparisons for multi-problem analysis. The test problems of the CEC’2005 special session on real parameter optimization will help to illustrate the use of the tests throughout this tutorial, analyzing the results of a set of well-known evolutionary and swarm intelligence algorithms. The tutorial concludes with a compilation of considerations and recommendations which will guide practitioners when using these tests to contrast their experimental results.

Introduction

In recent years, the use of statistical tests to improve the evaluation of the performance of a new method has become a widespread practice in computational intelligence. Such tests are usually employed within the framework of an experimental analysis to decide whether one algorithm can be considered better than another. This task, which may not be trivial, has become necessary to confirm whether a newly proposed method offers a significant improvement over the existing methods for a given problem.

Statistical procedures developed to perform statistical analyses can be categorized into two classes, parametric and nonparametric, depending on the type of data employed [1]. Parametric tests have been commonly used in the analysis of experiments in computational intelligence. Unfortunately, they are based on assumptions (independence, normality, and homoscedasticity) which are most probably violated when analyzing the performance of stochastic algorithms based on computational intelligence [2], [3]. To overcome this problem, our interest is focused on nonparametric statistical procedures, which provide the researcher with a practical tool to use when the previous assumptions cannot be satisfied, especially in multi-problem analysis.
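
Before turning to the nonparametric procedures themselves, it can be useful to verify these conditions explicitly. The following sketch (Python with NumPy/SciPy, using invented per-run error values for two hypothetical algorithms on a single problem) applies the Shapiro-Wilk test for normality and Levene's test for homoscedasticity; when either null hypothesis is rejected, the nonparametric route described in this tutorial is the safer choice.

    import numpy as np
    from scipy import stats

    # Hypothetical error values of two algorithms over 25 independent runs on one problem.
    rng = np.random.default_rng(0)
    runs_a = rng.lognormal(mean=0.0, sigma=1.0, size=25)  # skewed sample, unlikely to pass normality
    runs_b = rng.normal(loc=1.5, scale=0.5, size=25)

    # Shapiro-Wilk test: null hypothesis = "the sample comes from a normal distribution".
    for name, sample in (("A", runs_a), ("B", runs_b)):
        stat, p = stats.shapiro(sample)
        print(f"Shapiro-Wilk for {name}: W={stat:.3f}, p={p:.4f}")

    # Levene test: null hypothesis = "both samples have equal variances" (homoscedasticity).
    stat, p = stats.levene(runs_a, runs_b)
    print(f"Levene: W={stat:.3f}, p={p:.4f}")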

In this paper, the use of several nonparametric procedures for pairwise and multiple comparison procedures is illustrated. Our objectives are as follows.

  • To give a comprehensive and useful tutorial on the use of nonparametric statistical tests in computational intelligence, using tests already proposed in several papers in the literature [2], [3], [4], [5]. Through several examples of application, we show their properties and how the use of this complete framework can improve the way in which researchers and practitioners contrast the results achieved in their experimental studies.

  • To analyze the lessons learned through their use, providing a wide list of guidelines that can help users of these tests when selecting procedures for a given case study.

For each kind of test, a complete case of application is shown. The contest held in the CEC’2005 special session on real parameter optimization defined a complete suite of benchmark functions (publicly available; see [6]), covering several well-known domains of real parameter optimization. These benchmark functions will be used to compare several evolutionary and swarm intelligence continuous optimization techniques, whose differences will be contrasted through the use of nonparametric procedures.

The rest of this paper is organized as follows. Section 2 presents the experimental framework considered for the application of the statistical methods and gives some preliminary background. Section 3 describes the nonparametric tests for pairwise comparisons. Section 4 deals with multiple comparisons by designating a control method, whereas Section 5 deals with multiple comparisons among all methods. Section 6 surveys several recommendations and considerations on the use of nonparametric tests. Finally, Section 7 concludes this tutorial.

Preliminaries

In this section, the benchmark functions (Section 2.1) and the evolutionary and swarm intelligence algorithms considered for our case study (Section 2.2) are presented. Furthermore, some basic concepts on inferential statistics are introduced (Section 2.3), providing the necessary background for properly presenting the statistical procedures included in this tutorial.

Pairwise comparisons

Pairwise comparisons are the simplest kind of statistical test that a researcher can apply within the framework of an experimental study. Such tests compare the performance of two algorithms over a common set of problems. In multi-problem analysis, a value for each algorithm/problem pair is required (often an average value over several runs).

In this section, we first focus our attention on a quick and easy, yet not very powerful, procedure which can provide a first insight into the comparison: the Sign test. The more powerful Wilcoxon signed-ranks test is then described.
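
As an illustration, the sketch below applies both pairwise procedures to hypothetical per-problem average errors of two algorithms (the arrays are invented for illustration; lower values are better). The Sign test reduces to a binomial test on the number of wins, while the Wilcoxon signed-ranks test, which also exploits the magnitude of the per-problem differences, is taken directly from scipy.stats.

    import numpy as np
    from scipy import stats

    # Hypothetical average errors of algorithms A and B over eight benchmark problems.
    avg_a = np.array([1.2e-9, 3.4e-2, 5.1e+1, 2.0e+0, 7.7e-1, 4.3e+2, 9.9e-3, 1.1e+1])
    avg_b = np.array([2.5e-9, 2.9e-2, 6.8e+1, 3.1e+0, 9.0e-1, 5.2e+2, 1.4e-2, 1.0e+1])

    # Sign test: count the wins of A over B and test them against Binomial(n, 0.5)
    # (ties would normally be split evenly or discarded).
    wins_a = int(np.sum(avg_a < avg_b))
    n = len(avg_a)
    sign_p = stats.binomtest(wins_a, n, p=0.5, alternative="two-sided").pvalue  # SciPy >= 1.7
    print(f"Sign test: {wins_a} wins out of {n}, p={sign_p:.4f}")

    # Wilcoxon signed-ranks test over the same per-problem values.
    t_stat, wilcoxon_p = stats.wilcoxon(avg_a, avg_b)
    print(f"Wilcoxon signed-ranks: T={t_stat:.2f}, p={wilcoxon_p:.4f}")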

Multiple comparisons with a control method

One of the most frequent situations where the use of statistical procedures is requested is the joint analysis of the results achieved by various algorithms. The groups of differences between these methods (also called blocks) are usually associated with the problems met in the experimental study. For example, in a multiple-problem comparison, each block corresponds to the results offered over a specific problem. When referring to multiple comparison tests, a block is composed of three or more results, one for each algorithm included in the comparison.
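
The sketch below illustrates the computations typically involved when a control method is designated: average Friedman ranks over a problems-by-algorithms matrix, z statistics of each algorithm against the control, and Holm's step-down adjustment of the resulting p-values. The data are randomly generated for illustration only, and Holm is just one of the post-hoc procedures that can be applied at this stage.

    import numpy as np
    from scipy import stats

    # Hypothetical (problems x algorithms) matrix of average errors; column 0 is the
    # control algorithm and lower values are better.
    algorithms = ["Control", "Alg1", "Alg2", "Alg3"]
    rng = np.random.default_rng(1)
    results = rng.random((25, 4)) * np.array([1.0, 1.3, 1.6, 2.0])

    n, k = results.shape
    ranks = np.apply_along_axis(stats.rankdata, 1, results)  # rank the algorithms on each problem
    avg_ranks = ranks.mean(axis=0)                            # Friedman average ranks

    # z_i = (R_i - R_0) / sqrt(k (k + 1) / (6 n)), compared against the standard normal.
    se = np.sqrt(k * (k + 1) / (6.0 * n))
    z = (avg_ranks[1:] - avg_ranks[0]) / se
    p_unadj = 2.0 * stats.norm.sf(np.abs(z))  # two-sided unadjusted p-values

    # Holm's step-down procedure: compare the i-th smallest p-value with alpha / (m - i);
    # once one hypothesis is retained, all remaining hypotheses are retained as well.
    order = np.argsort(p_unadj)
    m, alpha = len(p_unadj), 0.05
    stop = False
    for i, idx in enumerate(order):
        threshold = alpha / (m - i)
        if stop or p_unadj[idx] > threshold:
            stop = True
            verdict = "retained"
        else:
            verdict = "rejected"
        print(f"{algorithms[idx + 1]}: rank={avg_ranks[idx + 1]:.2f}, "
              f"p={p_unadj[idx]:.4f}, threshold={threshold:.4f} -> H0 {verdict}")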

Multiple comparisons among all methods

Friedman’s test is an omnibus test which can be used to carry out this type of comparison. It allows us to detect differences considering the global set of algorithms. Once Friedman’s test rejects the null hypothesis, we can proceed with a post-hoc test in order to find the concrete pairwise comparisons which produce differences. In the previous section, we focused on procedures that control the family-wise error rate (FWER) when comparing with a control algorithm, arguing that the objective of a study is often to test whether a newly proposed method outperforms the existing ones; here, instead, the differences between every pair of algorithms are of interest.
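
A minimal sketch of this omnibus stage, reusing the same kind of hypothetical problems-by-algorithms matrix as above: scipy.stats.friedmanchisquare receives one column per algorithm, and only if its null hypothesis is rejected does the all-pairwise post-hoc analysis proceed.

    import numpy as np
    from scipy import stats

    # Hypothetical (problems x algorithms) matrix of average errors; lower is better.
    rng = np.random.default_rng(1)
    results = rng.random((25, 4)) * np.array([1.0, 1.3, 1.6, 2.0])

    stat, p = stats.friedmanchisquare(*[results[:, j] for j in range(results.shape[1])])
    print(f"Friedman: chi2={stat:.3f}, p={p:.4f}")

    # If p is below the significance level, the hypothesis of equivalent performance is
    # rejected, and a post-hoc procedure (e.g. Holm or Shaffer over all k*(k-1)/2 pairwise
    # hypotheses) can then locate the concrete pairs of algorithms that differ.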

Considerations and recommendations on the use of nonparametric tests

This section notes some considerations and recommendations concerning the nonparametric tests presented in this tutorial. Their characteristics, as well as suggestions on some of their aspects and details of the multiple comparison tests, are presented. With this aim, some general considerations and recommendations are given first (Section 6.1). Then, some advanced guidelines for multiple comparisons with a control method (Section 6.2) and multiple comparisons among all methods (Section 6.3) are provided.

Conclusions

In this work, we have shown a complete set of nonparametric statistical procedures and their application to contrast the results obtained in experimental studies of continuous optimization algorithms. The wide set of methods considered, ranging from basic techniques such as the Sign test or Contrast Estimation to more advanced approaches such as the Friedman Aligned Ranks and Quade tests, includes tools which can help practitioners in many situations in which the results of an experimental study need to be contrasted.

Acknowledgements

This work was supported by Project TIN2008-06681-C06-01. J. Derrac holds a research scholarship from the University of Granada.

References

  • P. Suganthan, N. Hansen, J. Liang, K. Deb, Y. Chen, A. Auger, S. Tiwari, Problem definitions and evaluation criteria...
  • J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of IV IEEE International Conference on Neural...
  • A. Auger, N. Hansen, A restart CMA evolution strategy with increasing population size, in: Proceedings of the 2005 IEEE...
  • C. Fernandes, A. Rosa, A study of non-random matching and varying population size in genetic algorithm using a royal...
  • H. Mühlenbein et al., Predictive models for the breeding genetic algorithm in continuous parameter optimization, Evolutionary Computation (1993).
  • M. Laguna et al., Scatter Search: Methodology and Implementation in C (2003).
  • K.V. Price et al., Differential Evolution: A Practical Approach to Global Optimization (2005).
  • A.K. Qin, P.N. Suganthan, Self-adaptive differential evolution algorithm for numerical optimization, in: Proceedings of...
  • T. Bartz-Beielstein, Experimental Research in Evolutionary Computation: The New Experimentalism (2006).
  • D. Ortiz-Boyer et al., Improving crossover operators for real-coded genetic algorithms using virtual parents, Journal of Heuristics (2007).
  • W. Conover, Practical Nonparametric Statistics (1998).
  • M. Hollander et al., Nonparametric Statistical Methods (1999).
  • R.A. Fisher, Statistical Methods and Scientific Inference (1959).
  • D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures (2006).
  • C. García-Martínez et al., Evaluating a local genetic algorithm as context-independent local search operator for metaheuristics, Soft Computing (2010).
  • J.D. Gibbons et al., Nonparametric Statistical Inference (2010).