Generalized Simulated Annealing

Many problems in mathematics, statistics, finance, biology, pharmacology, physics, applied mathematics, economics, and chemistry involve the determination of the global minimum of multidimensional real-valued functions. Simulated annealing methods have been widely used for different global optimization problems. Multiple versions of simulated annealing have been developed, including classical simulated annealing (CSA), fast simulated annealing (FSA), and generalized simulated annealing (GSA). After revisiting the basic idea of GSA using Tsallis statistics, we implemented a modified GSA approach using the R package GenSA. This package was designed to solve complicated nonlinear objective functions with a large number of local minima. In this chapter, we provide a brief introduction to this R package and demonstrate its utility by solving non-convexoptimization problems in different fields: physics, environmental science, and finance. We performed a comprehensive comparison between GenSA and other widely used R packages, including rgenoud and DEoptim. GenSA is useful and can provide a solution that is comparable with or even better than that provided by other widely used R packages for optimization.


Introduction
Determining the global minimum of a multidimensional function is the focus of many problems in statistics, biology, physics, applied mathematics, economics, and chemistry [1][2][3][4][5][6]. Although there is a wide spectrum of problems, computing the global minimum remains a challenging task, because, for example, modern problem dimensionality is increasing. the quasi-Newton method. These methods can also provide reasonable results for the study of simple non-convex functions with only a few dimensions and well-separated local minima.
Deterministic methods are usually faster than stochastic methods although they tend to be trapped into a local minimum. To overcome this particular issue, stochastic methods have been widely developed and can determine a good approximation of the global minimum with a modest computational cost. Among stochastic methods, genetic algorithms [7], evolution algorithms [8], simulated annealing (SA) [9], and taboo search [10][11][12] have been successfully applied.
Among popular approaches, genetic algorithms [7] mimic the process of natural DNA evolution. In this approach, a population of randomly generated solutions is generated. The solutions are encoded as strings and evolve over many iterations toward better solutions. In each generation, the fitness of each individual in the population is evaluated, and in the next generation, strings are generated by crossover, mutation, and selection, based on their fitness. Differential evolution belongs to such genetic algorithms.
Ant colony optimization (ACO) [13] is another set of stochastic optimization methods, which is inspired by ants wandering to find food for the colony. An ant starts wandering randomly while laying down pheromone trails that will influence other ants because they will be attracted (increase in probability) by the trail, and if they eventually locate food, will return and reinforce the trail. To avoid the algorithm converging to local minima, the pheromone trail is set to evaporate proportionally to the time it takes to traverse the trail to decrease its attractiveness. As a consequence, the pheromone density of short paths becomes higher than that of longer paths. The design of ACO perfectly matches graph-based optimization (e.g., traveling salesman problem), but it can be adapted to determine the global minimum of realvalued functions [14] by allowing local random moves in the neighborhood of the current states of the ant.
The SA algorithm was inspired by the annealing process that takes place in metallurgy, whereby annealing a molten metal causes it to achieve its global minimum in terms of thermodynamic energy (crystalline state) [9]. In the SA algorithm, the objective function is treated as the energy function of a molten metal, and one or more artificial temperatures are introduced and gradually cooled, which is analogous to the annealing technique, to attempt to achieve the global minimum. To escape from local minima, this artificial temperature (or set of temperatures) acts as a source of stochasticity. Following the metallurgy analogy, at the end of the process, the system is foreseen to reside inside the attractive basin of the global minimum (or in one of the global minima if more than one global minimum exists). In classical simulated annealing (CSA), the visiting distribution is a Gaussian function (a local search distribution) for each temperature. It has been observed that this distribution is not optimal for moving across the entire search space [5]. Generalized simulated annealing (GSA) was developed to overcome this issue by using a distorted Cauchy-Lorentz distribution.
For a more extensive review of stochastic optimization algorithms, see the review provided by Fouskakis and Draper [15].
The R language and environment for statistical computing will be the language of choice in this chapter because it enables a fast implementation of algorithms, access to a variety of statistical modeling packages, and easy-to-use plotting functionalities. These advantages make the use of R preferable in many situations to other programming languages, such as Java, C++, Fortran, and Pascal [16].
In this chapter, we elaborate further on the background and improvements of GSA and the use of the R package GenSA [17], which is an implementation of a modified GSA. We will also discuss the performance of GenSA and show that it outperforms the genetic algorithm (R package rgenoud) and differential evolution (R package DEoptim) in an extensive testbed comprising 134 testing functions based on the success rate and number of function calls. The core function of GenSA is written in C++ to ensure that the package runs as efficiently as possible. The utility of this R package and its use will be presented by way of several applications, such as the famous Thomson problem in physics, non-convex portfolio optimization in finance, and kinetic modeling of pesticide degradation in environmental science.

Method
As mentioned above, SA methods attempt to determine the global minimum of any objective function by simulating the annealing process of a molten metal. Given an objective function f ðxÞ with x ¼ ðx 1 , x 2 , …, x n Þ T , we attempt to determine its global minimum using SA. The general procedure for SA is as follows: 1. Generate an initial state x 0 ¼ ðx 0 1 , x 0 2 , …, x 0 n Þ T randomly and obtain its function value E 0 ¼ f ðx 0 Þ. An initial temperature T 0 is set. imax is set to be any big integer.

For step
• The temperature T i is decreased according to some cooling function.
• Generate a new state x i ¼ x i−1 þ Δx, where Δx follows a predefined visiting distribution (e.g., Gaussian distribution).
• Calculate the probability p of x i−1 ! x i .If p < randomð0, 1Þ, x i is set back to its previous state x i−1 and E i is also set back to E i−1 .

3.
Output the final state x imax and its function value E imax .
We provide more details of SA methods as follows.

Classical simulated annealing (CSA)
According to the process of cooling and the visiting distribution, SA methods can be classified into several categories, amongwhich CSA [9], fast simulated annealing (FSA) [18], and the GSA [19] are the most common.
In CSA, proposed by Kirkpatrick et al., the visiting distribution is a Gaussian function, which is a local search distribution [5,19]: where Δx is the trial jump distance of variable x and T is an artificial temperature in the reduced unit. In a local search distribution, for example, a Gaussian distribution Δx is always localized around zero. The jump is accepted if it is downhill of the energy/fitness/objective function. If the jump is uphill, it might be accepted according to an acceptance probability, which is computed using the Metropolis algorithm [20]: Geman and Geman [21] showed that for the classical case, a necessary and sufficient condition for having probability 1 of ending at the global minimum is that the temperature decreases logarithmically with the simulation time, which is impossible in practice because this would dramatically increase the computational time.

Fast simulated annealing (FSA)
In 1987, Szu and Hartley proposed a method called FSA [18], in which the Cauchy-Lorentz visiting distribution, that is, a semi-local search distribution, is introduced: where D is the dimension of the variable space. In a semi-local search distribution, for example, the Cauchy-Lorentz distribution, the jumps Δx are frequently local, but can occasionally be quite long. The temperature T in FSA decreases with the inverse of the simulation time, and the acceptance algorithm is the Metropolis algorithm shown in Eq.(2).

Introduction to GSA
A generalization of classical statistical mechanics was proposed by Tsallis and Stariolo [19]. In the Tsallis formalism, a generalized statistic is built from generalized entropy: where q is a real number, i is the index of the energy spectrum, and s q tends to information entropy: when q ! 1. Maximizing the Tsallis entropy with the constraints where ε i is the energy spectrum, and the generalized probability distribution is determined to be where z q is the normalization constant that ensures that p i integrates to 1. This distribution pointwise converges to the Gibbs-Boltzmann distribution, where q tends to 1.
The GSA algorithm [19] refers to the generalization of both CSA and FSA according to Tsallis statistics. It uses the Tsallis-Stariolo form of the Cauchy-Lorentz visiting distribution, whose shape is controlled by the parameter q v : Please note that q v also controls the rate of cooling: where T q v is the visiting temperature. In turn, a generalized Metropolis algorithm is used for the acceptance probability: where β ¼ 1=ðKT q a Þ. The acceptance probability is controlled by the artificial temperature, T q a . When q v ¼ 1 and q a ¼ 1, then GSA is exactly CSA. Another special instance is given by q v ¼ 2 and q a ¼ 1 for which GSA is exactly FSA. Asymptotically, GSA has a similar behavior thanthe stochastic steepest descentas T ! 0. A faster cooling than CSA and FSA is achieved when It has been shown in a few instances that GSA is superior to FSA and CSA. Xiang et al. established that a pronounced reduction in the fluctuation of energy and a faster convergence to the global minimum are achieved in the optimization of the Thomson problem and Nickel cluster structure [4][5][6]. Lemes et al. [22] observed a similar trend when optimizing the structure of a silicon cluster.

Improvement of GSA
A simple technique to accelerate convergence is as follows: where T q a is the acceptance temperature, T q v is the visiting temperature, and t is the number of time steps. Our testing shows that this simple technique accelerates convergence [6].
The performance of GSA can be further improved by combining GSA with a local search method, large-scale bound-constrained Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [23] for a smooth function, or Nelder-Mead [24] for a non-smooth function. A local search is performed at the end of each Markov chain for GSA.

Results
The GenSA R package has been developed and added to the toolkit for solving optimization problems in the Comprehensive R Archive Network (CRAN) R packages repository. The package GenSA has proven to be a useful tool for solving global optimization problems [17].

Usage
The GenSA R package provides a unique function called GenSA whose arguments were described in the associated help [25]: par: Numeric vector. Initial values for the parameters to be optimized over. Default value is NULL, in which case, the initial values will be generated automatically.
lower: Numeric vector with a length of par. Lower bounds for the parameters to be optimized over.
upper: Numeric vector with a length of par. Upper bounds for the parameters to be optimized over.
fn: A function to be minimized, where the first argument is the vector of parameters over which minimization is to take place. It should return a scalar result. The function has to always return a numerical value; however, not applicable (NA) and infinity are supported.
...: Allows the user to pass additional arguments into fn.
control: A list of control parameters, discussed below. The control argument is a list that can be used to control the behavior of the algorithm. Some components of control are the following: • maxit: Integer. Maximum number of iterations of the algorithm. Default value is 5000, which could be increased by the user for the optimization of a very complicated objective function.
• threshold.stop: Numeric. The program will stop when the objective function value is≤ threshold.stop. Default value is NULL.
• smooth: Logical. TRUE when the objective function is smooth, or differentiable almost everywhere, and FALSE otherwise. Default value is TRUE.
• max.call: Integer. Maximum number of calls of the objective function. Default value is 10,000,000.
• max.time: Numeric.Maximum running time in seconds.
• temperature: Numeric. Initial value for the temperature.
• visiting.param: Numeric.Parameter for the visiting distribution.
• acceptance.param: Numeric.Parameter for the acceptance distribution.
• verbose: Logical. TRUE means that messages from the algorithm are shown. Default value is FALSE.
• trace.mat: Logical. Default value is TRUE, which means that the trace matrix will be available in the returned value of the GenSA call.
The default values of the control components are set for a complicated optimization problem.
For typical optimization problems with medium complexity, GenSA can determine a reasonable solution quickly; thus,using the variable threshold.stop to the expected function value is recommended to make GenSA stop at an earlier stage. A maximum run time can be also set by max.time argument or max.call argument for setting the maximum run time or number of calls, respectively.

Performance
An extensive performance comparison of currently available R packages for continuous global optimization problems has been published [26]. In this comparison, 48 benchmark functions were used to compare 18 R packages for continuous global optimization. Performance was measured in terms of the quality of the solutions determined and speed. The author concluded that GenSA and rgenoud are recommended in general for continuous global optimization [26]. Based on this conclusion, we set out to perform a more extensive performance test by including more benchmark functions and the additional algorithm DEoptim. The SciPy Python scientific toolkit provides an extensive set of 196 benchmark functions. Because these 196 benchmark functions are coded in Python, we had to convert the Python code to R code. As a result, a subset containing 134 functions was available for this test. One hundred runs using a random initial starting point were performed for every combination of the 134 benchmark functions and the aforementioned three methods. We used a local search method to further refine the best solution provided by Deoptim, because this technique provides a more accurate final result [17]. The default values of the parameters of every package were used in this comparison. A tolerance of 1e-8 was used to establish whether the algorithm determines the global minimum value.
The reliability of a method can be measured by the success rate%, which is defined as the number of successful runs over 100 runs. For each testing function, the number of function calls required to achieve the global minimum was recorded for every method, and we refer to this as the number of function calls. Please note that rgenoud required a longer time to complete 100 runs compared with GenSA and DEoptim. Table 1 shows the success rate% and the average number of function calls (in parentheses).
The mean of the success rate% over all the benchmark functions is 92, 85, and 86% for GenSA, DEoptim, and rgenoud, respectively. Because the number of function calls changes dramatically, the median rather than the mean of the number of function calls is provided: 244.3 for GenSA, 1625.9 for Deoptim, and 1772.1 for rgenoud. A heatmap of the success rate% for GenSA, DEoptim, and rgenoud is displayed in Figure 1.
The color scaling from red to green represents the success rate% from 0 to 100. Clearly, GenSA has a larger green region (high success rate%) than DEoptim and rgenoud.
Both the success rate% and number of function calls show that GenSA performed better than DEoptim and rgenoud in the testbed composed of 134 benchmark functions.

The Thomson problem in physics
The physicist J.J. Thomson posed the famous Thomson problem after proposing his plum pudding atomic model, based on his knowledge of the existence of negatively charged electrons within neutrally charged atoms [27]. The objective of the Thomson problem is to determine the minimum electrostatic potential energy configuration of N equal point charges constrained at the surface of a unit sphere that repel each other with a force given by Coulomb's law. The Thomson model has been widely studied in physics [28][29][30][31]. In the Thomson model, the energy function is The number of metastable structures (local minima) of the Thomson problem grows exponentially with N [28]. The region containing the global minimum is often small compared with those of other minima [30]. The Thomson problem has been selected as a benchmark for global optimization algorithms in a number of previous studies [4,5,28,32]. The Thomson problem has been solved by both deterministic methods, including steepest descent [28], and stochastic methods, including (but not limited to) constrained global optimization (CGO) [29], the GSA algorithm [4,5], genetic algorithms [30], and the Monte Carlo method [31,33]. Typically, deterministic methods with multiple starts can provide a good solution when there are fewer point charges, whereas stochastic methods have to be used when N is large.  We can define an R function for the Thomson [ 2],])^2)) }) res<-sum(rdist) return(res) } In this example, we chose six point charges because our purpose is only to show how to use our package GenSA. The global energy minimum of six equal point charges on a unit sphere is 9.98528137 [28].
The global minimum 9.98528137 for six point charges is determined within 600 function calls.

Kinetic modeling of pesticide degradation
Various types of pesticides have been widely used in modern agriculture. It is important to calculate the concentration of a pesticide in groundwater and surface water. We will show how GenSA can be used to fit a degradation model for a parent compound with one transformation product. All the data and models are from the R packages mkin [34] and FOCUS (2006) [35].

Discussion and conclusions
The discrete optimization problem, in particular, the feature selection problem, exists extensively. GSA can also be used for the discrete problem. Please refer to [37] for details.
GSA is a powerful method for the non-convex global optimization problem. We developed an R package GenSA based on Tsallis statistics and GSA. In an extensive performance testbed composed of 134 benchmark functions, GenSA provided a higher average success rate% and a smaller median of the number of function calls compared with two widely recognized R packages: DEoptim and rgenoud. GenSA is useful and can provide a solution that is comparable with or even better than that provided by other widely used R packages for optimization.
R is very good for program prototype. When there is a need for heavy computation, other computational languages, such as C/C++, Fortran, Java, and Python,are recommended. Considering both speed and usability, aPython version of GSA, PyGenSA, is being developed and will be released within the SciPy scientific toolkitat the end of 2016.