Parameter Estimation in Ordinary Differential Equations Modeling via Particle Swarm Optimization

Researchers using ordinary differential equations to model phenomena face two main challenges among others: implementing the appropriate model and optimizing the parameters of the selected model. The latter often proves difficult or computationally expensive. Here, we implement Particle Swarm Optimization, which draws inspiration from the optimizing behavior of insect swarms in nature, as it is a simple and efficient method for fitting models to data. We demonstrate its efficacy by showing that it outstrips evolutionary computing methods previously used to analyze an epidemic model


Introduction
In mathematical biology, parameter estimation is one of the most important components of model fitting procedure.With poorly estimated parameters, even the most appropriate models perform poorly.Although there is already an abundance of traditional algorithms in the literature, such as Gauss-Newton methods, the Nelder-Mead method, and simulated annealing (see [1] for a thorough review), as models get more complex, researchers need more versatile and capable optimization methods.Evolutionary computing methods are becoming more frequently used tools, reducing the computational cost of model fitting, and starting to attract interest among mathematical biologists.
Compartmental models that are frequently used in infectious disease modeling have not escaped the computational cost versus complex model dilemma either.Since parameter estimation for compartmental models can be tackled by such evolutionary computing methods, it is reasonable to examine the goodness of fit versus the price of fit for these commonly employed models.Therefore, we will focus on the lesser-used evolutionary algorithms in comparison to Particle Swarm Optimization (PSO).In particular, Genetic Algorithms (GA) have been frequently used to optimize the parameters of ordinary differential equations (ODE) models [2][3][4][5][6][7].PSO has thus far been underutilized in this area.It is an optimization algorithm inspired by swarms of insects, birds, and fish in nature.Although PSO has been applied in a number of different scenarios [8], its performance on compartmental models has not yet been studied.
In optimization, the computational cost of an algorithm is just as important as the quality of its output.Unfortunately for us, cost increases with quality.Evolutionary computing algorithms are especially susceptible to this effect, wherein small reductions in error must be paid for by disproportionate amounts of additional computation.To reduce cost or improve accuracy without compromising the other requires the application of an innovative technique to the problem.For example, Hallam et al. [9] implemented progressive population culling in a genetic algorithm to hit a given error target in fewer CPU cycles.
Here, we intend to show that PSO is not only a viable technique for fitting ODE models to data, but also that it has the potential to outperform GA, which was proposed in [2] as a superior optimization method to the traditional ones for fitting ODE models.Hence, with this study, we aim to establish an even more viable tool for ODE model fitting.Specifically, we apply PSO to kinetic parameter optimization/fitting in 2 Journal of Applied Mathematics the context of ODEs, and we contrast the PSO and GA algorithms using the cholera SIRB example as common ground.The organization of this article is as follows: In Section 2, we introduce PSO.Section 3 contains a description of how PSO can be applied to ODE models.We illustrate an implementation of PSO to optimize an ODE model of cholera infections during the recent Haitian epidemic in Section 4. Finally, we provide concluding remarks in Section 5.

Particle Swarm Optimization
We are given an objective function  : R  → R with the goal of minimization.Based on reasonable parameter bounds for the problem at hand, we constrain our search space to the Cartesian product for lower bounds {  } and upper bounds {  }.We release a "swarm" of point particles within the space and iteratively update their positions.The simulation rules allow the particles to explore and exploit the space all the while communicating with one another about their discoveries.Since PSO does not rely on gradients or smoothness in general, it is well-suited to optimizing chaotic and discontinuous systems.This flexibility is an essential property in the domain of ODE models, whose prediction errors may change drastically with minute parameter variations.The swarm nature of PSO makes it amenable to parallelization, giving it a sizable advantage over sequential methods.The algorithm used for our computations is largely based on SPSO 2011 as found in [10].We provide implementation details in Section 2.Then, we discuss ODE models, as they are ubiquitous in the realm of modeling natural phenomena.We provide a justification of our procedure for hyperspherical sampling in the Appendix.

Particle Swarm Optimization Implementation
where  = ( 1 , ⋅ ⋅ ⋅ ,   )  and  = diag( 1 −  1 , ⋅ ⋅ ⋅ ,   −   ).Hence, the particles search within the unit hypercube, which provides better hyperspherical sampling performance than does a search space whose dimensions could be orders of magnitude apart.The particles' positions are then affinely mapped to  before evaluation by the objective function.
A swarm consists of an ordered list of  particles.This value is manually adjusted to fit the problem at hand, although more advanced methods may attempt to select it with a metaoptimizer or adaptively adjust it during execution.If  is too low, then the particles will not be able to explore the space effectively; if it is too high, then computational costs will rise, without yielding a significant improvement in the final fitness.Each particle has attributes which are updated at every iteration.The attributes are shown in Table 1, and their initial values ( = 0) are given.We define (, ) to mean a value sampled uniformly from [, ].The particles are initially placed in the search space via Latin Hypercube Sampling (LHS).This helps avoid the excessive clustering that can occur with truly random placement, as we want all areas of the search space to be considered.The reader is directed to [11] for more details on LHS.
The PSO algorithm derives its power from the communication of personal best positions among particles.During the initialization phase, each particle randomly and uniformly selects  natural numbers in [1, ] (with replacement) and adds its own index  to the list.Once again,  is a tunable parameter of the algorithm; one might use  = 3 as a good starting point.These numbers, with repeats discarded, form a set   .They tell the particle which of its peers to communicate with when it discovers lower values of the objective function during the execution of the algorithm.This communication is also performed once to initialize the   values before any motion has taken place.In addition,   is continually updated based on the performance of the current configuration; this aspect of the algorithm is known as its adaptive random topology.

Exploration and Exploitation.
The degree of exploration in a given PSO configuration refers to the propensity of the particles to travel large distances within the search space and visit different regions.This must be properly balanced with exploitation, which is the tendency of particles to stay within a given region and examine local features to seek out even lower values of the objective function.The PSO specification provides a parameter to this end, and it may have to be manually or automatically tuned for best results on a particular function landscape.

Algorithm.
We present a description of the algorithm after initialization below.Each step is followed by a comment that may describe its motivation, practical implementation details, or pitfalls.Table 1 contains descriptions of the particle attributes.The parameters  and  (not to be confused with the  in the SIRB model) used in the algorithm must be selected beforehand.They control the explorationexploitation balance and the "inertia" of the particles, respectively.
(1) Apply a random permutation to the indices of the particles.
Permutations can be selected uniformly at random with the simple Fisher-Yates shuffle.
(2) For  = 0 to  − 1, We define   to be the "center of gravity" of these points:   ,   + (  −   ), and   + (  −   ).This provides a mean position around which the particle can search for new locations.The larger  is, the farther the particle will venture on average.Previous versions of SPSO had detectable biases along the axes of the coordinate system.This coordinate-free definition elegantly avoids such a problem.(b) Sample    uniformly from the volume of a hypersphere with center   and radius  = ‖  −   ‖.One might naively attempt this by generating a vector representing displacement from   whose coordinates are (0, 1) each and scaling it to a length of (0, ).However, this does not produce a constant probability density.The proper method is to generate a vector whose coordinates are standard normal variates and scale it to a length of (0, 1) 1/ .Normal variates can be generated from a uniform source with methods such as the Box-Muller transform, and Ziggurat algorithm.

The position 𝑥 󸀠
represents where the particle intends to travel next.The second term of V  (+1) contains the corresponding displacement from its current position.However, we would also like to retain some of the particle's old velocity to lessen the chance of getting trapped in local minimum.To this end, we add an inertial term with a "cooldown" factor  that functions analogously to friction.(d) Set   ( + 1) =   () + V  ( + 1).
The next position of the particle is its current position plus its updated velocity.(e) Clamp position if necessary.A multitude of conditions can be used to decide when to stop.Some common ones are as follows:

If any component of a particle's position falls
(i) The iteration count exceeds a threshold .For a fixed threshold, this method gives more computation time to larger swarms.
(ii) The product of the iteration count and the swarm size (the number of objective function evaluations) exceeds .In this case, the threshold is a good representation of how much total computation time the swarm utilizes, as the computational resource limit is nearly independent of swarm size.
(iii) The globally optimal objective function value (min    ) falls below some threshold.(iv) The globally optimal objective function value has not improved by more than  within the last  iterations.
(v) Any combination of the above.
The output of the algorithm is then the   value corresponding to the minimum   value.
The flowchart (Figure 1) summarizes the process of Particle Swarm Optimization.

Discrete Parameters.
If some (not necessarily all) of the parameters in the problem are discrete, PSO can be adapted by snapping the appropriate continuous coordinates in the search space to the nearest acceptable values during each iteration.This feature gives PSO great versatility, in that it can solve problems with entirely continuous, mixed, or entirely discrete inputs.

Parallel Computing.
Most real-world objective functions are sufficiently complex so that PSO will spend the vast majority of its time evaluating them, as opposed to executing the control logic.In our case, profiling showed that numerically solving an ODE at many points consumed more than 95% of the total CPU time.Therefore, in order to parallelize the algorithm, one needs only to evenly distribute these evaluations among different cores or possibly computers.The remaining logic of the algorithm can be executed in a single control thread that collates the fitness values and updates particle attributes before offloading evaluations for the next iteration.

Randomness.
Far too often in our experience, probabilistic numerical simulations are run with little thought to the underlying random number generator (RNG).As Clerc states in [10], the performance of PSO on some problems is highly dependent on the quality of the RNG.The default RNG in one's favorite programming language is likely not up to the task.By default, our code uses the fast, simple, and statistically robust xorshift128+ generator from [12].It outputs random 64-bit blocks, which are transformed using standard algorithms into the various types of random samples required.The use of a seed allows for exactly reproducible simulations (this is more difficult to ensure if a single generator is being accessed by multiple threads).

ODE Models
3.1.Introduction.Suppose we have data and a corresponding system of ordinary differential equations which we believe models the underlying process.The system takes parameters and initial values (some known, some unknown) as input and outputs functions, which can then be compared to the data using some goodness of fit (GOF) metric (e.g., mean squared error).Competing models can be evaluated using frameworks introduced in [3].If we take the vector of unknown inputs to be a position within search space, then we can apply PSO in order to minimize the model's error.The choice of error function is an important consideration.Those GOF metrics that are superlinear in the individual deviations will preferentially reduce larger deviations over smaller ones.Similarly, sublinear functions will produce good correspondence on most data points at the expense of some larger errors.

Compartmental Models.
In epidemiology, a common tool for studying the spread of an infectious disease is the compartmental model.The prototypical example is the SIR model, in which the population is divided into three compartments: Susceptible, Infected, and Recovering.Differential equations describe how individuals flow between the compartments over time.The behavior of a specific disease is captured by parameters, and it is here that we use model fitting techniques to derive these parameters from data.The simplest form of SIR does not include birth or death (the dynamics of the disease are assumed to be sufficiently rapid), so it holds that the population size  =  +  +  is constant.
More advanced models include more compartments and incorporate factors such as demography.In the field, it is often easiest to measure the number of infected individuals; thus a viable GOF metric may compute the distance between time series infection data and the corresponding points on the () curve for a given set of parameters.
Chubarova [13] implemented GA for fitting ODE models to cholera epidemic data.Although better errors were achieved than similar studies that used traditional methods, the computational cost required was substantial.This led to our consideration of a technique which could achieve similar accuracy with fewer cycles, namely PSO.One of the models studied in [13] was SIRB, which is a refinement of SIR: it includes birth and death rates for the population and a compartment  representing the concentration of bacteria in the water supply.We present equations for SIRB and describe the roles of its parameters in Section 4.

SIRB.
The SIRB model is a system of four nonlinear firstorder ODEs that capture how quickly individuals progress through the susceptibility-infection-recovery cycle and how bacteria are exchanged between the population and the water supply: The parameters and their ranges are listed in Table 2 (derived from [2] with some later corrections from the authors).An upper bound of NA means that the value is constant and not optimized by PSO.The bacteria being studied in this case are of the genus Vibrio.They decay from a state of hyperinfectivity (HI) to a non-HI state (only the former is considered infectious in this model) at a rate of .The parameter  has units of cells per milliliter, while the rest have units of inverse days.

Model Fitting.
We apply PSO to fit the SIRB model to epidemic data from [14] using our own C code and the rkf45 ODE solver in the GNU Scientific Library.Since the data were obtained from hospitals, we must correct for the fact that not every infected person received care.We introduce the parameter ℎ as the proportion of recorded infections.Instead of presuming a certain value, we allow ℎ to be optimized by PSO.The initial values for the system for Department Nord are (0) = 549,486, (0) = 1/ℎ, (0) = 0, and (0) =  0 , where  0 is another parameter to be optimized by PSO.The particles explore an 8-dimensional search space with generic position vector  = ( 0 , ℎ,   ,   , , , , ) .
We measure GOF using the mean absolute deviation: where {(  ,   )} is the time series infection data.Figure 2 shows a plot of ℎ() and   versus  for an example run of the PSO algorithm.The tail behavior of our model is typical in that it fails to account for cyclical recurrence of the disease.Akman et al. address this phenomenon in [2].We performed 12 runs of the algorithm with the parameters  = 1.193,  = 0.721,  = 40, and  = 3 (suggested values from [10]).We chose a stopping condition of two million objective function evaluations.We found this condition experimentally by observing that fitness values rarely, if ever, improved beyond this point.Our results are given in Table 3.An individual run only takes around 8 minutes on a quad-core Intel Celeron J1900 @ 1.99 GHz (compiled with icc -O2).The same optimization problem, approached using GA in [13], took between approximately 1 and 6.4 days to run on two hexa-core Intel Xeon X5670s @ 2.93 GHz (see Table 4).Considering the similarity of the fitness values to the evolutionary computing results, PSO appears to be an attractive method for model fitting, especially for the case of expensive objective functions.It uses smaller populations and fewer timesteps due to the fast convergence of the fitness values, which distinguishes it from GA.
Code for PSO and the SIRB model with sample data is available at [11].

Concluding Remarks
The use of PSO is a novel approach in ODE modeling, particularly in the field of epidemiology.Where previous algorithms such as GA have taken excessively long to run in the face of complex models, PSO demonstrates its effectiveness by providing comparable results in an impressively small fraction of the time.It has a structural advantage over GA that cannot be attributed to the complexity of the control logic; PSO requires many fewer objective function calls to reach the same level of accuracy.We implemented PSO to optimize an ODE model given cholera data.Although there are other global optimization methods that are also used in fitting parameters of ODE models, such as simulated annealing or scatter search, we restricted our comparison only to the performance of GA, since it was extensively used for the same infectious disease model.We achieved similar model errors in a substantially shorter period of time, as explained above, due to the fundamental differences of operation between PSO and GA.Additionally, we compared the performance of PSO with that of DIRECT [15][16][17][18][19] per the suggestion of a referee.DIRECT evaluated the objective function approximately six thousand times in the same amount of time it took PSO to perform two million evaluations, and it appeared to converge to a worse fitness value.
As with most optimization algorithms, the remaining challenge of PSO is algorithm parameter selection.In the context of our study, this is similar to selecting the right GA parameters, such as population size, number of generations, cross-over rate, and mutation rates; that all may have an impact on the accuracy and runtime.One possible improvement is to add a metaoptimizer to choose the algorithm parameters without human intervention.A fruitful potential topic of investigation is to extend PSO to approximating unknown functions (such as probability distributions) rather than merely selecting parameters.The same distribution can also be produced in a simpler fashion by repeatedly generating a vector  with components of (−, ) and rejecting it if ‖‖ > .However, the expected number of tries per sample is equal to the ratio of the volume of a -cube to that of an inscribed -ball, which grows too rapidly with .

Table 4 :
GA run statistics.Time to make elites (s) Time to make parameter sets (s) Total time (s) Sample size Stopping threshold Number of elites Max generations Parameter sets created Fitness 8680 find that () must be uniformly distributed on the surface of the unit -sphere.Since a -ball with radius  ≤ 1 has volume proportional to   , we would like to find a random variable  supported on [0, 1] such that  (      ()     ≤ ) ∝   ⇒  ( ≤ ) ∝   .(A.1)It is easily shown that  = (0, 1) 1/ has a cumulative distribution function of () =   .Setting  = () produces the desired result.

Table 1 :
Particle attributes indexed by position in list.
outside [0, 1], we clamp it to the appropriate endpoint and multiply the corresponding component of its velocity by −0.5.This inelastic "bouncing" contains particles, while still allowing exploitation of the periphery of the search space.(f) Increment .(g) If (  ()) <   ( − 1), set   () =   () and  (i) Regenerate topology if global fitness min    did not improve from previous step.If the global fitness of the swarm has not improved at the end of the iteration, then the sets   are updated by regenerating them as during the initialization phase.(j) Check stopping condition.See below for example conditions.
() = (  ()).Else, carry forward values from previous iteration.If the value of the objective function at the current position is better than the personal best value from the previous iteration, set the personal best position to the current position and the personal best value to the current value.(h)Communicate with neighbors.When a particle finds a value of the objective function lower than its   (which corresponds to the position   ), it informs all of its neighbors by updating their values of these attributes to its current objective function value and position.

Table 2 :
Parameter ranges for cholera model.

Table 3 :
Parameter values from PSO and corresponding fitness.