GA : A Package for Genetic Algorithms in R

Genetic algorithms (GAs) are stochastic search algorithms inspired by the basic principles of biological evolution and natural selection. GAs simulate the evolution of living organisms, where the fittest individuals dominate over the weaker ones, by mimicking the biological mechanisms of evolution, such as selection, crossover and mutation. GAs have been successfully applied to solve optimization problems, both for continuous (whether differentiable or not) and discrete functions. This paper describes the R package GA, a collection of general purpose functions that provide a flexible set of tools for applying a wide range of genetic algorithm methods. Several examples are discussed, ranging from mathematical functions in one and two dimensions known to be hard to optimize with standard derivative-based methods, to some selected statistical problems which require the optimization of user defined objective functions. (This paper contains animations that can be viewed using the Adobe Acrobat PDF viewer.)


Introduction
Genetic algorithms (GAs) are a class of evolutionary algorithms made popular by John Holland and his colleagues during the 1970s (Holland 1975), and which have been applied to find exact or approximate solutions to optimization and search problems (Goldberg 1989; Sivanandam and Deepa 2007). Compared with other evolutionary algorithms, the distinguishing features in the original proposal were: (i) bit string representation; (ii) proportional selection; and (iii) crossover as the main genetic operator. Since then, several other representations have been formulated in addition to binary strings. Further methods have been proposed for crossover, while mutation has been introduced as a fundamental genetic operator. Therefore, nowadays GAs belong to the larger family of evolutionary algorithms (EAs), and the two terms are often used interchangeably.
Following Spall (2004), the problem of maximizing a scalar-valued objective function f : S → R can be formally represented as finding the set

Θ* = arg max_{θ ∈ Θ} f(θ),  (1)

where Θ ⊆ S. The set S ⊆ R^p defines the search space, i.e., the domain of the parameters θ = (θ_1, …, θ_p), where each θ_i varies between the corresponding lower and upper bounds. The set Θ indicates the feasible search space, which may be defined as the intersection of S and a set of m ≥ 0 additional constraints:

g_j(θ) ≤ 0 for j = 1, …, q, and h_j(θ) = 0 for j = q + 1, …, m.
The solution set Θ* in (1) may be a unique point, a countable (finite or infinite) collection of points, or a set containing an uncountable number of points.
While the formal problem representation in (1) refers to maximization of an objective function, minimizing a loss function can be trivially converted to a maximization problem by changing the sign of the objective function.
Typically, for differentiable continuous functions f the optimization problem is solved by root-finding, i.e., by looking for θ* such that ∂f(θ*)/∂θ_i = 0 for every i = 1, …, p. However, care is needed because such a root may not correspond to a global optimum of the objective function. Different techniques are required if we constrain θ to lie in a connected subset of R^p (constrained optimization) or if we constrain θ to lie in a discrete set (discrete optimization).
In the latter case, also known as combinatorial optimization, the set of feasible solutions is discrete or can be reduced to a discrete set. R (R Core Team 2012) includes some built-in optimization algorithms. The function optim provides implementations of three deterministic methods: the Nelder-Mead algorithm, a quasi-Newton algorithm (also called a variable metric algorithm), and the conjugate gradient method. Box-constrained optimization is also available. A stochastic search is provided by optim using simulated annealing. The function nlm performs minimization of a given function using a Newton-type algorithm. The golden section search for one-dimensional continuous functions is available through the optimize function. Many other packages deal with different aspects of function optimization. A comprehensive listing of available packages is contained in the CRAN task view on "Optimization and Mathematical Programming" (Theussl 2013).
Packages gafit (Tendys 2002), galts (Satman 2012a) and mcga (Satman 2012b) offer some limited options for using optimization routines based on genetic algorithms. The package rgenoud (Mebane Jr. and Sekhon 2011) combines evolutionary algorithm methods with a derivative-based (quasi-Newton) method to solve optimization problems. genalg (Willighagen 2005) attempts to provide a genetic algorithm framework for both binary and floating-point problems, but it is limited in scope and flexibility. DEoptim (Mullen, Ardia, Gil, Windover, and Cline 2011) implements the differential evolution algorithm for global optimization of a real-valued function.
The aim in writing the GA package was to provide a flexible, general-purpose R package for implementing genetic algorithm searches in both the continuous and discrete case, whether constrained or not. Users can easily define their own objective function depending on the problem at hand. Several genetic operators are available and can be combined to explore the best settings for the current task. Furthermore, users can define new genetic operators and easily evaluate their performance. The package is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=GA.
In the next section, we briefly review the basic ideas behind GAs. Then, we present the GA package in Section 3, followed by several examples of its usage in Section 4. Such examples range from mathematical functions in one and two dimensions known to be hard to optimize with standard derivative-based methods, to some selected statistical problems which require the optimization of user defined objective functions.

Genetic algorithms
Genetic algorithms are stochastic search algorithms which are able to solve optimization problems of the type described in Equation 1, both for continuous (whether differentiable or not) and discrete functions (Affenzeller and Winkler 2009; Back, Fogel, and Michalewicz 2000a,b; Coley 1999; Eiben and Smith 2003; Haupt and Haupt 2004; Spall 2003). Constraints on the parameter space can also be included (Yu and Gen 2010).
GAs use evolutionary strategies inspired by the basic principles of biological evolution. At a certain stage of evolution, a population is composed of a number of individuals, also called strings or chromosomes. These are made of units (genes, features, characters) which control the inheritance of one or several characters. Genes of certain characters are located along the chromosome, and the corresponding string positions are called loci. Each genotype represents a potential solution to the problem.
The decision variables, or phenotypes, in a GA are obtained by applying some mapping from the chromosome representation into the decision variable space; the resulting phenotypes represent potential solutions to the optimization problem. A suitable decoding function may be required for mapping chromosomes onto phenotypes.
The fitness of each individual is evaluated, and only the fittest individuals reproduce, passing their genetic information to their offspring. Thus, with the selection operator, GAs mimic the behavior of natural organisms in a competitive environment, in which only the most qualified and their offspring survive. Two important issues in the evolution process of a GA search are exploration and exploitation. Exploration is the creation of population diversity by exploring the search space, and is obtained by genetic operators such as mutation and crossover. Crossover forms new offspring from two parent chromosomes by combining part of the genetic information from each. In contrast, mutation randomly alters the values of genes in a parent chromosome. Exploitation aims at reducing the diversity in the population by selecting at each stage the individuals with higher fitness.
Often an elitist strategy is also employed, allowing the best-fitted individuals to persist into the next generation even if they would not otherwise survive the selection process.
The evolution process is terminated on the basis of some convergence criterion. Usually a maximum number of generations is defined. Alternatively, a GA is stopped when a sufficiently large number of generations have passed without any improvement in the best fitness value, or when a population statistic achieves a pre-defined bound.
Figure 1 shows the flow-chart of a typical genetic algorithm. A user must first define the type of variables and their encoding for the problem at hand. Then the fitness function is defined, which is often simply the objective function to be optimized. More generally, it can be any function which assigns a value of relative merit to an individual. Genetic operators, such as crossover and mutation, are applied stochastically at each step of the evolution process, so their probabilities of occurrence must be set. Finally, convergence criteria must be supplied.
The evolution process starts with the generation of an initial random population of size n, so for step k = 0 we may write {θ_1^(0), …, θ_n^(0)}. The fitness of each member of the population at step k, f(θ_i^(k)), is computed, and probabilities p_i^(k) are assigned to each individual in the population, usually proportional to their fitness. The reproducing population is formed (selection) by drawing with replacement a sample in which each individual has probability of surviving equal to p_i^(k). A new population {θ_1^(k+1), …, θ_n^(k+1)} is then formed from the reproducing population using the crossover and mutation operators. Then, set k = k + 1 and the algorithm returns to the fitness evaluation step. When the convergence criteria are met the evolution stops, and the algorithm delivers arg max_{θ_i^(k)} f(θ_i^(k)) as the optimum.

Overview of the GA package
The GA package implements genetic algorithms using S4 object-oriented programming (OOP). For an introduction to OOP in the S language see Venables and Ripley (2000), while for a more thorough treatment of the subject specifically for R see Chambers (2008) and Gentleman (2009). The proponents of OOP argue that it allows for easier design, writing and maintenance of software code. However, the actual internal implementation should be transparent to the end user, and in the following we describe the use of the package from a user perspective.
Genetic algorithm searches are performed through the function ga, whose first argument, type, specifies the encoding to use: "binary", "real-valued" or "permutation". The remaining main arguments are the following:
fitness The fitness function, any allowable R function which takes as input an individual (string) representing a potential solution and returns a numerical value describing its "fitness".
... Additional arguments to be passed to the fitness function. This allows one to write fitness functions that keep some variables fixed during the search.
min A vector of length equal to the number of decision variables providing the lower bounds of the search space, in the case of real-valued or permutation encoded optimizations.
max A vector of length equal to the number of decision variables providing the upper bounds of the search space, in the case of real-valued or permutation encoded optimizations.
nBits A value specifying the number of bits to be used in binary encoded optimizations.
population The string name or an R function for randomly generating an initial population.
selection The string name or an R function performing selection, i.e., a function which generates a new population of individuals from the current population probabilistically according to individual fitness.
crossover The string name or an R function performing crossover, i.e., a function which forms offspring by combining part of the genetic information from their parents.
mutation The string name or an R function performing mutation, i.e., a function which randomly alters the values of some genes in a parent chromosome.
popSize The population size.
pcrossover The probability of crossover between pairs of chromosomes. Typically this is a large value; by default it is set to 0.8.
pmutation The probability of mutation in a parent chromosome. Usually mutation occurs with a small probability; by default it is set to 0.1.
elitism The number of best fitness individuals to survive at each generation. By default the top 5% of individuals will survive at each iteration.
monitor An R function which takes as input the current state of the ga object and shows the evolution of the search. By default, the function gaMonitor prints the average and best fitness values at each iteration. If set to plot, this information is plotted on a graphical device. Other functions can be written by the user and supplied as an argument.
maxiter The maximum number of iterations to run before the GA search is halted.
run The number of consecutive generations without any improvement in the best fitness value before the GA is stopped.
maxfitness The upper bound on the fitness function; once this bound is attained, the GA search is interrupted.
names A vector of character strings providing the names of decision variables.
suggestions A matrix of solution strings to be included in the initial population.
seed An integer vector containing the random number generator state.This argument can be used to replicate the results of a GA search.
A call to the ga function should contain at least the arguments type and fitness. Furthermore, for binary search the argument nBits is required, whereas min and max are needed for real-valued or permutation encoding.
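As a minimal illustration, consider a hypothetical one-dimensional real-valued problem; the function fitness and the constant a below are our own illustrative choices, not part of the package. The extra argument a is kept fixed during the search by passing it through the ... argument:

```r
library(GA)

# Hypothetical fitness: a concave parabola centered at 'a'; the value of
# 'a' is held fixed during the search by passing it through '...'.
fitness <- function(x, a) -(x - a)^2

set.seed(1)
GA <- ga(type = "real-valued", fitness = fitness, a = 3,
         min = -10, max = 10, maxiter = 50, monitor = NULL)
summary(GA)
```

The search should locate the maximum near x = 3, the value fixed through the ... mechanism.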
Default settings for the genetic operators are given by the R function gaControl, which is described in detail in Section 3.1. Users can choose different operators among those already available and discussed in Section 3.1, or define their own genetic operators, as illustrated with an example in Section 4.9.
The function ga returns an S4 object of class "ga". This object contains slots that report most of the arguments provided in the function call, as well as the following slots: iter A numerical value giving the current iteration of the search.
population A matrix of dimension object@popSize times the number of decision variables, containing the current population.
fitness The evaluated fitness function for the current population of individuals.
best The "best" fitness value at each iteration of the GA search.
mean The average fitness value at each iteration of the GA search.
fitnessValue The "best" fitness value found by the GA search. At convergence of the algorithm this is the fitness evaluated at the solution string(s).
solution A matrix of solution strings, with as many rows as the number of solutions found, and as many columns as the number of decision variables.
The GA package is byte-compiled, as are all standard (base and recommended) R packages.
For a simple vectorized fitness function, byte-compiling may marginally reduce the computational time required. However, if the fitness function is not vectorized and must perform complex calculations, byte-compiling should significantly reduce the computational time.

Functions and genetic operators
Several R functions for generating the initial population and for applying genetic operators are contained in the GA package. The naming of these functions follows the scheme ga<type>_<operator>, where <type> can be one of bin, real or perm, according to the type of GA problem, and <operator> identifies the genetic operator to be employed.
Note that this naming scheme is just a convention we thought was useful to adopt, but, in principle, any name could be used.
Hereafter, we briefly introduce the available operators for each GA type. Interested readers may find detailed descriptions of such operators in, for instance, Back et al. (2000a,b), Yu and Gen (2010) and Eiben and Smith (2003).

Population
For generating the initial population, the available R functions are: gabin_Population Generate a random population of object@nBits binary values.
gareal_Population Generate a random (uniform) population of real values in the range [object@min, object@max].
gaperm_Population Generate a random (uniform) population of integer values in the range [object@min, object@max].
All these functions take as input an object of class "ga" and return a matrix of dimension object@popSize times the number of decision variables.

Selection
The following R functions are available for the selection genetic operator: gabin_lrSelection, gareal_lrSelection, gaperm_lrSelection Linear-rank selection.
gareal_lsSelection Fitness proportional selection with fitness linear scaling.
gareal_sigmaSelection Fitness proportional selection with Goldberg's sigma truncation scaling.
The above functions take as arguments an object of class "ga" and, possibly, other parameters controlling the genetic operator. They all return a list with two elements: population A matrix of dimension object@popSize times the number of decision variables, containing the selected individuals or strings.
fitness A vector of length object@popSize containing the fitness values for the selected individuals.

Crossover
Available R functions for the crossover genetic operator are: gabin_spCrossover, gareal_spCrossover Single-point crossover.
These functions take as arguments an object of class "ga" and a two-row matrix whose values index the parents in the current population. They all return a list with two elements: children A matrix of dimension 2 times the number of decision variables, containing the generated offspring.
fitness A vector of length 2 containing the fitness values of the offspring. A value of NA is returned whenever an offspring differs (as is usually the case) from both of its parents.

Mutation
Available R functions for the mutation genetic operator are: gabin_raMutation, gareal_raMutation Uniform random mutation.
gareal_rsMutation Random mutation around the solution.
gaperm_swMutation Exchange mutation or swap mutation.
These functions take as arguments an object of class "ga" and the parent from the current population in which mutation should occur. They all return a vector of values containing the mutated string.
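A user-defined mutation operator follows the same calling convention: the built-in operators receive the ga object and the position of the parent in the population matrix. The sketch below is a hypothetical operator for real-valued chromosomes (the name myMutation and the Gaussian perturbation are our illustrative choices, not built-ins): it perturbs one randomly chosen gene and clips the result to the search bounds.

```r
# Hypothetical mutation operator for real-valued GAs: perturb one randomly
# chosen gene of the parent with Gaussian noise, then clip it to the
# bounds stored in the object's min/max slots.
myMutation <- function(object, parent, sd = 0.1) {
  mutate <- as.vector(object@population[parent, ])
  j <- sample(length(mutate), size = 1)
  mutate[j] <- mutate[j] + rnorm(1, sd = sd)
  mutate[j] <- min(max(mutate[j], object@min[j]), object@max[j])
  mutate
}
```

Such a function could then be supplied directly via ga(..., mutation = myMutation).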

Default settings
The function ga uses a set of default settings for genetic operators. These can be retrieved or set with the function gaControl. Its usage depends on the arguments provided. A call with no arguments returns a list containing the current values:

R> gaControl()

A call to gaControl with a single string specifying the name of a component returns the current value(s):

R> gaControl("binary")

In this case the function returns the current genetic operators used in "binary" GA searches.
To change the default values, a named component must be followed by a single value (in the case of "eps") or a list of component(s) specifying the name of the function for a genetic operator. For instance, the following code saves the current default values, and then sets tournament selection as the new default for binary GAs:

R> defaultControl <- gaControl()
R> gaControl("binary" = list(selection = "gabin_tourSelection"))

Any value set by gaControl remains in effect for the rest of the session. To restore the previously saved package defaults:

R> gaControl(defaultControl)

Examples
Many examples concerning optimization tasks are provided in this section. In particular, we present the optimization of well-known benchmark mathematical functions, as well as applications of genetic algorithms to a variety of statistical problems.
Hereafter, we assume that the GA package is already installed and loaded in the current R session, for example by entering the following command:

R> library("GA")

Function optimization on one dimension
We start with a simple one-dimensional minimization, considering the function f(x) = |x| + cos(x), which attains min f(0) = 1 for −∞ < x < +∞ (see test function F1 in Haupt and Haupt 2004). Here we restrict our attention to x ∈ [−20, 20], so this function can be defined and plotted in R as follows:

R> f <- function(x) abs(x) + cos(x)
R> min <- -20; max <- 20
R> curve(f, min, max)

We can define the fitness function, which in this case is simply minus the function to minimize, and run the genetic algorithm with the code:

R> fitness <- function(x) -f(x)
R> GA <- ga(type = "real-valued", fitness = fitness, min = min, max = max)

Here we specified type = "real-valued" for a real-valued function optimization, using the R function fitness as the objective function to be maximized over the range provided by the arguments min and max. By default the ga function monitors the search by printing the mean and the best fitness values at each iteration. At the end of the search an S4 object of class "ga" is returned, which can be printed and plotted as follows:

R> GA
An object of class "ga"

Call: ga(type = "real-valued", fitness = fitness, min = min, max = max)

Available slots:
 [1] "call"         "type"         "min"          "max"
 [5] "nBits"        "names"        "popSize"      "iter"
 [9] "run"          "maxiter"      "suggestions"  "population"
[13] "elitism"      "pcrossover"   "pmutation"    "fitness"
[17] "best"         "mean"         "fitnessValue" "solution"

R> plot(GA)
R> summary(GA)

The plot method produces the graph in Figure 2b, where the best and average fitness values along the iterations are shown.
Figure 2a contains an animation of the GA search, which shows the evolution of the population units and the corresponding function values at each generation. This has been obtained by defining a new monitor function and then passing this function as an optional argument to ga:

R> monitor <- function(obj)
+ {
+   curve(f, min, max, main = paste("iteration =", obj@iter), font.main = 1)
+   points(obj@population, -obj@fitness, pch = 20, col = 2)
+   rug(obj@population, col = 2)
+   Sys.sleep(0.2)
+ }
The final GA result can be compared with the solutions provided by two other optimization algorithms available in R: optimize, which uses a combination of golden section search and successive parabolic interpolation, and nlm, which uses a Newton-type algorithm. The results shown in Figure 3b make clear that the latter two optimization algorithms are both trapped in local maxima, while the GA is able to identify the global maximum. The code used to obtain this graph is the following:

R> opt.sol <- optimize(f, lower = min, upper = max, maximum = TRUE)
R> nlm.sol <- nlm(function(...) -f(...), 0, typsize = 0.1)
R> curve(f, min, max)

Function optimization on two dimensions
The Rastrigin function is a non-convex function often used as a test problem for optimization algorithms, because its large number of local minima makes it a difficult problem. In two dimensions it is defined as

f(x1, x2) = 20 + x1^2 + x2^2 − 10[cos(2π x1) + cos(2π x2)],

with x_i ∈ [−5.12, 5.12] for i = 1, 2. It has a global minimum at (0, 0), where f(0, 0) = 0.
Figure 4 shows a perspective plot and a contour plot of the Rastrigin function, obtained as follows (the function persp3D, included in the GA package, is an enhanced version of the base persp function):

R> x1 <- x2 <- seq(-5.12, 5.12, by = 0.1)
R> f <- outer(x1, x2, Rastrigin)
R> persp3D(x1, x2, f, theta = 50, phi = 20)
R> filled.contour(x1, x2, f, color.palette = jet.colors)

The optimization of this function, with monitoring of the space searched at each GA iteration (see Figure 4b), can be obtained by supplying a suitable monitor function to ga.
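A self-contained sketch of such a run is the following; the Rastrigin definition is repeated here so that the snippet stands alone, and the settings (popSize, maxiter) are illustrative choices rather than prescriptions. Since ga maximizes, the fitness is the negated Rastrigin function:

```r
library(GA)

# Two-dimensional Rastrigin function (global minimum f(0, 0) = 0),
# repeated here so the sketch is self-contained.
Rastrigin <- function(x1, x2) {
  20 + x1^2 + x2^2 - 10 * (cos(2 * pi * x1) + cos(2 * pi * x2))
}

set.seed(1)
GA <- ga(type = "real-valued",
         fitness = function(x) -Rastrigin(x[1], x[2]),
         min = c(-5.12, -5.12), max = c(5.12, 5.12),
         popSize = 50, maxiter = 100, monitor = NULL)
summary(GA)
```

With these settings the GA should approach the global minimum at the origin, although the exact solution varies with the random seed.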

Robust regression

The Andrews Sine function and the fitness function to be used in the GA (recall that we need to maximize the fitness) are defined as:

R> AndrewsSineFunction <- function(x, a = 1.5)
+   ifelse(abs(x) > pi * a, 2 * a^2, a^2 * (1 - cos(x/a)))
R> rob <- function(b, s = 1)
+   -sum(AndrewsSineFunction((y - X %*% b)/s))

We apply this robust fitting procedure to the well-known stackloss dataset available in the datasets package.
R> data("stackloss", package = "datasets")

The range of the search space can be obtained from a preliminary OLS estimation of the coefficients and their standard errors:

R> OLS <- lm(stack.loss ~ ., data = stackloss)
R> y <- model.response(model.frame(OLS))
R> X <- model.matrix(OLS)
R> se.coef <- sqrt(diag(vcov(OLS)))
R> min <- coef(OLS) - 3 * se.coef
R> max <- coef(OLS) + 3 * se.coef

We can now run the GA search, this time using a large number of possible iterations and an increased probability of mutation, to ensure that a vast portion of the parameter space is explored.
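One possible call, repeating the setup above so the snippet is self-contained, is sketched below; the specific values of popSize, maxiter, run and pmutation are illustrative choices, not the only reasonable ones:

```r
library(GA)

# Setup repeated from the text: Andrews Sine objective on the stackloss data.
AndrewsSineFunction <- function(x, a = 1.5)
  ifelse(abs(x) > pi * a, 2 * a^2, a^2 * (1 - cos(x / a)))
data("stackloss", package = "datasets")
OLS <- lm(stack.loss ~ ., data = stackloss)
y <- model.response(model.frame(OLS))
X <- model.matrix(OLS)
se.coef <- sqrt(diag(vcov(OLS)))
min <- coef(OLS) - 3 * se.coef
max <- coef(OLS) + 3 * se.coef
rob <- function(b, s = 1) -sum(AndrewsSineFunction((y - X %*% b) / s))

# Illustrative settings: a generous iteration budget and a larger-than-default
# mutation probability, so that a vast portion of the parameter space is explored.
set.seed(1)
GA <- ga(type = "real-valued", fitness = rob, min = min, max = max,
         popSize = 100, maxiter = 500, run = 200, pmutation = 0.2,
         monitor = NULL)
```

The solution slot of GA then holds robust estimates of the four regression coefficients.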

Subset selection
A typical application of binary GAs in statistical modeling is subset selection (see e.g., the R package glmulti; Calcagno and de Mazancourt 2010). Given a set of p predictors, subset selection aims at identifying those predictors which are most relevant for explaining the variation of a response variable. This allows one to achieve parsimony of unknown parameters, yielding both better estimation and clearer interpretation of regression coefficients. The problem of subset selection can be naturally treated by GAs using a binary string, with 1 indicating the presence of a predictor and 0 its absence from a given candidate subset. The fitness of a candidate subset can be measured by one of several model selection criteria, such as AIC, BIC, etc. Bozdogan (2004) discussed the use of GAs for subset selection in linear regression models, and in the following we present an application closely following his analysis, but using Akaike's information criterion (AIC; Akaike 1973). We start by loading the dataset from the UsingR package (Verzani 2005) and then fit a linear regression model by OLS. The design matrix (without the intercept) and the response variable are extracted from the fitted model object. Then, the fitness function to be maximized can be defined as follows:

R> fitness <- function(string)
+ {
+   inc <- which(string == 1)
+   X <- cbind(1, x[, inc])
+   mod <- lm.fit(X, y)
+   class(mod) <- "lm"
+   -AIC(mod)
+ }

This simply estimates the regression model using the predictors identified by a 1 in the corresponding position of string, and returns the negative of the chosen criterion. Note that an intercept term is always included, and that we employ the basic lm.fit function to speed up calculations. The following R code runs the GA:

R> GA <- ga("binary", fitness = fitness, nBits = ncol(x),
+           names = colnames(x), monitor = plot)
R> plot(GA)
R> summary(GA)

A graphical summary of the GA search is shown in Figure 5.
The linear regression model fit obtained using the best subset found by the GA can then be examined. Compared to Bozdogan's (2004) solution, which used the ICOMP(IFIM) criterion for evaluating the subsets, the GA solution with fitness based on AIC selects one more predictor, namely weight. Because of its strong collinearity with hip (r = 0.94), this extra predictor is not statistically significant (p-value = 0.1594). This result is not surprising, since it is known that AIC tends to overestimate the number of predictors required, while ICOMP(IFIM) is able to protect against multicollinearity.
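The same pipeline can be tried end-to-end on a built-in dataset; below mtcars merely stands in for the body-fat data used in the text (a hypothetical substitution, so the selected subset will of course differ):

```r
library(GA)

# mtcars stands in for the body-fat data analysed in the text.
mod <- lm(mpg ~ ., data = mtcars)
x <- model.matrix(mod)[, -1]            # design matrix without the intercept
y <- model.response(model.frame(mod))

fitness <- function(string) {           # negative AIC of the candidate subset
  inc <- which(string == 1)
  X <- cbind(1, x[, inc])
  mod <- lm.fit(X, y)
  class(mod) <- "lm"                    # so that AIC() dispatches on lm
  -AIC(mod)
}

set.seed(1)
GA <- ga("binary", fitness = fitness, nBits = ncol(x),
         names = colnames(x), monitor = NULL)
```

The binary string in GA@solution then flags the predictors retained by the AIC-based search.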

Acceptance sampling
Acceptance sampling is an area of applied statistics where sampling is used to determine whether to accept or reject a production lot of material (raw materials, semifinished products, or finished products). An introduction to acceptance sampling is contained in the textbook by Montgomery (2009), while a monograph devoted to the topic is Schilling and Neubauer (2009).
In acceptance sampling for attributes, only the presence or absence of a characteristic in the inspected item is recorded. Among the available sampling plans, the single-sampling plan involves taking a random sample of size n from a lot of size N. The number d of defective items found is compared to an acceptance number c, and the lot is accepted if d ≤ c. The probability of acceptance Pa can be computed by assuming a Binomial distribution for the number of defectives in a lot (the so-called type B sampling). Thus, such probability is given by

Pa = Pr(d ≤ c) = ∑_{d=0}^{c} C(n, d) p^d (1 − p)^(n−d),

where p is the fraction of defective items in the lot. A plot of Pa versus p is called the operating characteristic (OC) curve, and expresses the probability of acceptance as a function of lot quality.
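In R the acceptance probability above is just the Binomial distribution function, so an OC curve can be computed directly with pbinom (the wrapper name Pa below is our own, and the plan (n = 87, c = 2) is used purely as an illustration):

```r
# Probability of accepting a lot with defective fraction p under a
# single-sampling plan (n, c): Pa = Pr(d <= c) with d ~ Binomial(n, p).
Pa <- function(p, n, c) pbinom(c, n, p)

# OC curve for an illustrative plan
p <- seq(0, 0.2, by = 0.001)
plot(p, Pa(p, n = 87, c = 2), type = "l",
     xlab = "p", ylab = "Probability of acceptance")
```

As expected, the curve decreases monotonically from 1 (a perfect lot is always accepted) towards 0 as the defective fraction grows.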
In practical applications, a single-sampling plan requires the specification of the sample size n and the acceptance number c. This is usually pursued by specifying two points on the OC curve and solving the resulting system of equations:

Pa(p1) = 1 − α,    Pa(p2) = β,    (2)

where p1 is typically set at the acceptable quality level (AQL) and p2 at the lot tolerance percent defective (LTPD), with α and β the corresponding producer's and consumer's risks. The system of equations in (2) is nonlinear and no direct solution is available. Traditionally, a graph called a nomogram is consulted to obtain the pair (n, c) that solves (2), at least approximately given the discreteness of the parameters. Below we present a simple solution to this problem using GAs.
Given that both n and c should be positive integer values, we may use binary GAs with Gray encoding. This eliminates the well-known Hamming cliff problem associated with standard binary coding. As an example, consider a five-bit encoding using the standard binary coding. Two consecutive integers, for instance 15 and 16, are encoded as:

R> decimal2binary(15, 5)
[1] 0 1 1 1 1
R> decimal2binary(16, 5)
[1] 1 0 0 0 0

so moving from 15 to 16 (or vice versa) requires all five bits to change. On the other hand, using Gray encoding:

R> binary2gray(decimal2binary(15, 5))
[1] 0 1 0 0 0
R> binary2gray(decimal2binary(16, 5))
[1] 1 1 0 0 0

the two binary strings differ by one bit only. Thus, in Gray encoding the number of bit differences between any two consecutive strings is one, whereas for binary strings this is not always true. The R functions binary2decimal and gray2binary are also available for moving from one type of encoding to the other.
Returning to our problem, a decoding function can be defined which takes as input a solution string of binary values in Gray representation and transforms it into the decimal representation of the pair (n, c), where l1 and l2 are the numbers of bits used to separately encode the two parameters.
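A sketch of such a decoding function is shown below; the bit lengths l1 and l2 are illustrative choices (they simply need to be large enough to cover the plausible ranges of n and c):

```r
library(GA)

l1 <- 7   # bits used to encode the sample size n (illustrative choice)
l2 <- 3   # bits used to encode the acceptance number c (illustrative choice)

# Gray-encoded binary string -> decimal pair (n, c).
decode <- function(string) {
  string <- gray2binary(string)
  n <- binary2decimal(string[1:l1])
  c <- binary2decimal(string[(l1 + 1):(l1 + l2)])
  c(n, c)
}
```

For instance, applying decode to the Gray encoding of the concatenated binary representations of 87 and 2 recovers the pair (87, 2).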
The fundamental step for solving (2) via GAs is to define a quadratic loss function that evaluates a proposed solution pair against the two fixed points on the OC curve. Once the binary GA has been run on the Gray-encoded representation, the final solution is decoded to obtain the pair (n, c):

R> decode(GA@solution)
[1] 87 2

that is, (n = 87, c = 2). The corresponding OC curve is shown in Figure 6b. This is obtained from Figure 6a by adding the solution OC curve as follows:

R> n <- 87
R> c <- 2
R> p <- seq(0, 0.2, by = 0.001)
R> Pa <- pbinom(2, 87, p)
R> lines(p, Pa, col = 2)

An advantage of using GAs for identifying the parameters of an acceptance sampling plan is that this approach can be easily extended to more complicated plans, for instance double-sampling plans, by suitably modifying the functions fitness and decode.

Constrained optimization
The knapsack problem considers the maximization of a weighted profit subject to a constraint on the knapsack's capacity. Formally, given a set of weights w_i, profits p_i, and knapsack capacity W, find a binary vector x = (x_1, …, x_n) such that ∑_{i=1}^{n} x_i p_i is maximized under the constraint ∑_{i=1}^{n} x_i w_i ≤ W. The solution to this problem is a binary string of length n, where x_i = 1 if the i-th item is selected for the knapsack, and x_i = 0 otherwise.
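A common way to handle the capacity constraint in a binary GA is a penalised fitness that subtracts a multiple of the excess weight when the constraint is violated. The instance below (profits, weights, capacity, and the penalty coefficient 100) is entirely illustrative and of our own making:

```r
library(GA)

# Illustrative knapsack instance: profits p, weights w, capacity W.
p <- c(6, 5, 8, 9, 6, 7, 3)
w <- c(2, 3, 6, 7, 5, 9, 4)
W <- 9

# Penalised fitness: total profit, minus a heavy penalty proportional to
# the excess weight whenever the capacity constraint is violated.
knapsack <- function(x) {
  f <- sum(x * p)
  penalty <- max(sum(x * w) - W, 0)
  f - 100 * penalty
}

set.seed(1)
GA <- ga("binary", fitness = knapsack, nBits = length(p),
         maxiter = 100, monitor = NULL)
```

The penalty coefficient must dominate any attainable profit, otherwise an infeasible selection could outscore the best feasible one.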
Consider the allele frequency estimation problem in Lange (2004, pp. 123-125). For the three alleles A, B, and O, there are four observable phenotypes: A, B, AB, and O. This is because each individual inherits two alleles from its parents, and alleles A and B are genetically dominant over allele O. This approach yields a solution very close to that provided by Lange (2004). An unconstrained optimization can be pursued via parameter transformation using the inverse multinomial logit transformation:

pA = exp(θ1)/(1 + exp(θ1) + exp(θ2)), pB = exp(θ2)/(1 + exp(θ1) + exp(θ2)), pO = 1 − pA − pB.

The GA search using unconstrained maximization yields an improved solution, and it also requires far fewer iterations.
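The transformation above can be written as a short helper (the name mlogit_inv is our own); it maps any unconstrained vector (θ1, θ2) to allele frequencies on the probability simplex, which is what allows the GA to search R^2 without constraints:

```r
# Inverse multinomial logit: maps unconstrained (theta1, theta2) to
# frequencies (pA, pB, pO) that are strictly positive and sum to one.
mlogit_inv <- function(theta) {
  e <- exp(theta)
  c(e, 1) / (1 + sum(e))
}
```

Any real-valued chromosome therefore corresponds to a valid frequency vector, so no penalty or repair step is needed.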

Traveling salesperson problem
The traveling salesperson problem (TSP) is one of the most widely discussed problems in combinatorial optimization. In its simplest form, given a set of n cities with known symmetric intra-distances, the TSP involves finding an optimal route that visits all the cities and returns to the starting point, such that the total distance traveled is minimized. The set of feasible solutions is given by the total number of possible routes, which is equal to (n − 1)!/2, a value which quickly becomes enormous. Several algorithms for solving the TSP have been proposed in the literature, and some of them are available in the R package TSP (Hahsler and Hornik 2007).
Several different representations and genetic operators for solving the TSP with GAs are available (for a review, see Larranaga, Kuijpers, Murga, Inza, and Dizdarevic 1999). The most natural representation is called the path representation. Here, the n cities are put in order in a list of n elements, so that if city i is the j-th element of the list, city i is the j-th city to be visited. For example, given 5 cities, the list (B, D, A, C, E) corresponds to the tour that first visits city B, then D, etc., ending with city E.
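Under the path representation, a permutation GA for the TSP only needs a fitness returning the negative tour length. The sketch below uses a small random instance; the distance matrix D, the instance size, and all GA settings are illustrative choices of our own:

```r
library(GA)

# Tour length under the path representation: the chromosome is a
# permutation of 1..n, and the tour returns to the starting city.
tourLength <- function(tour, D) {
  route <- c(tour, tour[1])
  sum(D[cbind(route[-length(route)], route[-1])])
}

# Illustrative instance: n random cities in the unit square.
set.seed(123)
n <- 8
xy <- matrix(runif(2 * n), ncol = 2)
D <- as.matrix(dist(xy))

# ga() maximizes, hence the negative tour length as fitness.
GA <- ga(type = "permutation",
         fitness = function(tour) -tourLength(tour, D),
         min = 1, max = n, popSize = 50, maxiter = 200, monitor = NULL)
```

Each row of GA@solution is then a permutation of 1..n describing a shortest tour found by the search.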

Conclusion
In this paper we discussed the R package GA for applying genetic algorithm methods to optimization problems. The package is flexible enough to allow users to define their own objective function to be optimized, either using the built-in standard genetic operators, or by defining and exploring new operators.
According to the no-free-lunch theorem (Wolpert and Macready 1997), which roughly speaking states that no optimization algorithm is uniformly better than the others on average, genetic algorithms are not a panacea for all types of optimization searches. In general, GAs are slower than derivative-based algorithms. However, the latter may be unable to find any optimum at all, whereas GAs can succeed when the fitness function is not smooth or has many local optima. Furthermore, their use in practical problems may serve to highlight a set of candidate solutions which, albeit not optimal, may at least be worthwhile to consider.
Finally, we think that the GA package may serve the community by providing a simple, accurate, and extensible tool for exploring the potential of genetic algorithms in statistical applications.

Figure 1 :
Figure 1: Flow-chart of a genetic algorithm.

Figure 2 :
Figure 2: One-dimensional test function: f(x) = |x| + cos(x). Panel (a) shows the function; if you are viewing this in Acrobat, click on the image to see an animation of fitness evaluation at each iteration. Panel (b) shows best and average fitness values at each GA generation step.

Figure 3 :
Figure 3: One-dimensional test function: f(x) = (x^2 + x) cos(x). Panel (a) shows best and average fitness values at each GA generation step. Panel (b) shows the solutions found by the GA and two other numerical optimization algorithms available in R.

Figure 4 :
Figure 4: Panel (a) shows a perspective plot of the Rastrigin test function, while panel (b) shows the corresponding contours.If you are viewing this in Acrobat, click on the panel (b) image to see an animation of fitness evaluation at each GA iteration.

+Figure 5 :
Figure 5: Plot of best and average fitness values at each step of the GA search.If you are viewing this in Acrobat, click on the image to see an animation of fitness evaluation during the GA iterations.

Figure 6 :
Figure 6: OC curve for single acceptance sampling plan.Panel (a) shows the two fixed points for which a solution is sought.Panel (b) shows the OC curve for the solution (n = 87, c = 2) found by GAs.

Figure 7 :
Figure 7: Map of European cities with optimal TSP tour found by GA.

Figure 8 :
Figure 8: GA search paths using Boltzmann selection at different values of the α parameter.