A Comparison of Computational Efficiencies of Stochastic Algorithms in Terms of Two Infection Models

In this paper, we investigate three particular algorithms: a stochastic simulation algorithm (SSA), and explicit and implicit tau-leaping algorithms. To compare these methods, we used them to analyze two infection models: a Vancomycin-resistant enterococcus (VRE) infection model at the population level, and a Human Immunodeficiency Virus (HIV) within host infection model. While the first has a low species count and few transitions, the second is more complex with a comparable number of species involved. The relative efficiency of each algorithm is determined based on computational time and degree of precision required. The numerical results suggest that all three algorithms have the similar computational efficiency for the simpler VRE model, and the SSA is the best choice due to its simplicity and accuracy. In addition, we have found that with the larger and more complex HIV model, implementation and modification of tau-Leaping methods are preferred.


Introduction
Deterministic approaches involving ordinary differential equations to approximate large populations or sample sizes with a continuum, though widely used, have proven less descriptive when applied to small populations/sample sizes.To address this issue, continuous time Markov chain (CTMC) models are often used when dealing with low species count.There are a variety of stochastic algorithms that can be employed to simulate CTMC models.However, it seems that none of these algorithms can be readily applied to all problems.
The goal of this paper is to illustrate how widely algorithm performances vary between two stochastic infection models and demonstrate how one might perform computational studies to aid in selection of appropriate algorithms.Specifically, we examine three commonly used algorithms: a stochastic simulation algorithm SSA, commonly referred to as the Gillespie algorithm, and explicit and implicit tau-leaping methods.One of our test models is adopted from existing literature, and this model is used to describe Vancomycin-resistant enterococcus (VRE) infection in a hospital unit.The other model is derived in this paper based on an existing deterministic Human Immunodeficiency Virus (HIV) infection model.This new stochastic model is used to describe the dynamics of HIV during the early stage of infection, where the target cells are still at very high numbers while the infected cells are at a very low level.
The methodology illustrated in this paper is highly relevant to current researchers in the biological sciences.As investigations and models of disease progression become more complex and as interest in initial phases (i.e., HIV acute infection, initial disease outbreaks) of infections or epidemics increases, the recognition increases that many of these efforts require small population number models (for which ordinary differentail equations (ODEs) are unsuitable).These efforts will involve multiscale discrete valued stochastic models that have as limits (as population numbers increase) ordinary differential equations for the expected population values or means often used (in part because of the ease of their use in simulation studies) in mathematical treatments.As the interest in initial stages of infection grows, so also will the need grow for efficient simulation with these small population numbers models.A major contribution of this paper is a careful presentation of computational issues arising in simulations with such models.A second contribution is the presentation and discussion of a new multiscale discrete stochastic model for HIV infection which in the limit as population numbers increase is an existing clinical-data validated ODE model which has been instrumental in prediction of disease progression and experimental design.
The outline of the remainder of this paper is as follows.In Section 2 we give a short introduction of the stochastic simulation algorithm, and explicit and implicit tau-leaping algorithms.In Section 3 we give some background on the VRE and HIV models, and then apply these three stochastic algorithms to these two infection models and compare their computational efficiency.We conclude the paper in Section 4 with some summary remarks and suggestions for future efforts.

Simulation Algorithms
In this section, three computational algorithms for solving stochastic systems will be examined, the stochastic simulation algorithm or Gillespie algorithm, the explicit tauleaping method and the implicit tau-leaping method.Outlines for implementing each algorithm will be given along with motivations for the algorithm and discussions about when one might want to use one algorithm over another.
Unless otherwise indicated, a capital letter is used to denote a random variable, a bold capital letter is for a random vector, and their corresponding small letters are for their realizations.

Stochastic Simulation Algorithm
The stochastic simulation algorithm (SSA), also known as the Gillespie algorithm [11], is the standard method employed to simulate continuous time Markov Chain models.The SSA was first introduced by Gillespie in 1976 to simulate the time evolution of the stochastic formulation of chemical kinetics, a process which takes into account that molecules come in whole numbers as well as the inherent degree of randomness in their dynamical behavior.However, in addition to simulating chemically reacting systems, the Gillespie algorithm has become the method of choice to numerically simulate stochastic models arising in a variety of other biological applications [3,4,18,19,25,26].
Two mathematically equivalent procedures were originally proposed by Gillespie, the "Direct method" and the "First Reaction method".Both procedures are exact procedures rigorously based on the chemical master equation [11]; however, the direct method is the method typically implemented due to its efficiency.Likewise, this is the method employed in this paper.The direct method can be described for a general system by assuming X = (X 1 , X 2 , ..., X n ) T represents the state variables of the system where X i (t) denotes the number in state X i at time t (X i may be the number of patients, cells, species, etc).Furthermore, it is assumed l transitions (often referred to as reaction channels in biochemistry literature) are possible with associated transition rates (often referred to as propensity functions in the biochemistry literature) represented by λ i , i = 1, ..., l.Given this terminology, the direct method for the Gillespie algorithm can be described by the following procedure: Step 1. Initialize the state of the system x 0 ; Step 2. For the given state x of the system, calculate the transition rates λ i (x), i = 1, ..., l; Step 3. Calculate the sum of all transition rates, λ = l i=1 λ i (x); Step 4. Simulate the time, τ , until the next transition by drawing from an exponential distribution with mean 1/λ; Step 5. Simulate the transition type by drawing from the discrete distribution with probability Prob(transition = i) = λ i (x)/λ.Generate a random number r 2 from a uniform distribution and choose the transition as follows: If 0 < r 2 < λ 1 (x)/λ, choose transition 1; if λ 1 (x)/λ < r 2 < (λ 1 (x) + λ 2 (x))/λ choose transition 2, and so on; Step 6. Update the new time t = t + τ and the new system state; Step 7. Iterate steps 2-6 until t ≥ t stop .

Tau-Leaping Methods
Since the SSA method keeps track of each transition, it can be impractical to implement for certain applications due to the computational time required.As a result, Gillepsie proposed an approximate procedure, the tau-leaping method, which accelerates the computational time while only sustaining a small loss in accuracy [12].Instead of taking incremental steps in time, keeping track of X(t) at each time step as in the SSA method, the tau-leaping method leaps from one subinterval to the next, approximating how many transitions take place during a given subinterval.It is assumed that the value of the leap, τ , is small enough that there is no significant change in the value of the transition rates along the subinterval [t, t + τ ].This condition is known as the leap condition.The tauleaping method thus has the advantage of simulating many transitions in one leap while not loosing significant accuracy, resulting in a speed up in computational time.In this paper, we consider two tau-leaping methods, an explicit and an implicit version.

An Explicit Tau-Leaping Method
The explicit tau-leaping method is based on an explicit formulation for the update in number of species X at time t + τ , given X(t) = x.The basic explicit tau-leaping method approximates K j , the number of times a transition j is expected to occur within the time interval [t, t + τ ], by a Poisson random variable P j (λ j (x), τ ) with mean (and variance) λ j (x)τ .Once the number of transitions are estimated, the approximate number of species, known as the tau-leaping approximation, of X at time t + τ is given by the formula with v j = (v 1j , ..., v nj ) T where v ij represents the change in state variable X i caused by transition λ j [8].However, as mentioned previously, the process for selecting τ is critical in the tau-leaping method.If τ is chosen too small, tau-leaping will essentially stop, leading to the standard SSA algorithm; on the other hand, if the value of τ is too large, the leap condition may not be satisfied, possibly causing significant inaccuracies in the simulation.In this paper, we use a τ -selection procedure based on the algorithm in [8].
For alternative procedures for selecting τ , we refer the reader to references [8,12,13].
Let ∆X i = X i (t + τ ) − x i with x i being the ith component of x, i = 1, 2, . . .n, and be an error control parameter with 0 < 1.In the given τ -selection procedure, τ is chosen such that which evidently requires the relative change in X i to be bounded by g i except that X i will never be required to change by an amount less than 1.The value of g i in (2.2) is chosen such that the relative changes in all the transition rates will be bounded by .For example, if the transition rate λ j has the form λ j (x) = c j x i with c j being a constant, then the reaction j is said to be first order and the absolute change in λ j (x) is given by Hence, the relative change in λ j (x) is related to the relative change in X i by ∆λ j (x) , which implies that if we set the relative change in X i by (i.e., g i = 1), then the relative change in λ j is bounded by .If the transition rate λ j has the form λ j (x) = c j x i x r with c j being a constant, then the reaction j is said to be second order and the absolution change in λ j (x) is given by Hence, the relative change in λ j (x) is related to the relative change in X i by which implies that if we set the relative change in X i by 2 and the relative change in X r by 2 (i.e., g i = 2, g r = 2), then the relative change in λ j is bounded by to the first order approximation.
The tau-leaping method employed in this paper also includes modifications developed by Cao, et al., [7] to avoid the possibility of negative populations.When utilizing a tau-leaping method instead of the exact SSA method, as discussed previously, estimates are made about how many times a transition has occurred during the leap-interval.From the estimate of the number of transitions and how each transition effects the state variables, an estimate is obtained for the number of species in each state, X i , at the end of the leap-interval.In some instances, if a population or number of species is small at the beginning of the leap-interval, the estimate of the state variable after numerous transitions may result in a negative population.To avoid this situation, Cao, et al., [7,8] introduced another control parameter, n c , a positive integer (normally set between 2 and 20) which is used to separate transitions into two classes, critical transitions or noncritical transitions.
A transition j is deemed critical if after n c of these transitions, there is a danger in one of the state variables involved in the transition reaching zero.An estimate for the maximum number of times L j , j = 1, ..., l that transition j can occur before reducing one of the state variables involved in the transition to 0 (or less) is calculated by with the brackets indicating the floor function.If L j is less than the control parameter n c , then the reaction is deemed critical.All critical transitions are then restricted to a single transition during the leap period reducing the probability of a negative population to nearly zero.All the remaining noncritical transitions use the traditional tau-leaping method.The algorithm for the modified explicit tau-leaping method is given below.
Step 1.Given X(t) = x, identify all critical transitions by first estimating the maximum number of times, L j , that a transition can occur before causing a negative population where with the brackets indicating the floor function.A transition is considered critical if L j < n c .(In our calculations, we set n c =10)
Step 2. Choose a value for the error control parameter .(In our calculations, we set = 0.03).Then, compute τ 1 so each transition rate λ j , j = 1, ..., l is bounded by , according to the following definitions: where g i as described in the text.
Step 3. Determine whether tau-leaping is appropriate by comparing τ 1 to 1/λ.If τ 1 is less than some multiple of 1/λ (chosen to be 10 in our calculations), then abandon tauleaping and execute a set number of single transition SSA steps (chosen to be 100 in our calculations) and return to step 2. Otherwise proceed; Step 4. Compute (the sum of all critical transition rates).Generate a second candidate time leap, τ 2 as a sample of the exponential random variable with mean 1/λ c .
Step 5. Let τ = min{τ 1 , τ 2 }.Approximate the number of transitions within the time interval, K j , as a sample of the Poisson random variable with mean λ j (x)τ for all j ∈ J ncr .For all critical reactions, define K j as follows: • If τ = τ 1 , set K j = 0 for all j ∈ J cr (no critical transitions occur).
• If τ = τ 2 , let j c be a sample of the integer random variable with point probabilities λ j (x)/λ c for j ∈ J cr .Set K jc = 1 (j c indicates the only critical transition which occurs) and K j = 0 for j ∈ J cr , j = j c (only one critical transition occurs).
Step 6.If there is a negative component in x + j K j v j , reduce τ 1 by half and return to step 3. Otherwise leap by replacing the time, t = t + τ and the update the new system state, Step 7. Iterate steps 1-6 until t ≥ t stop .

An Implicit Tau-Leaping Method
In many applications, such as the HIV model explained in Section 3.2, problems of "stiffness" may arise.Rathinam et al., [24], explored the nature of stiffness in discrete stochastic systems and demonstrated that an implicit tau-leaping method (similar to implicit Euler methods for ordinary differential equations) is capable of taking large time steps for stiff, discrete systems, producing accurate results for such systems while significantly reducing the computational time when compared to explicit tau-leaping methods [14].The implicit tau-leaping method replaces the explicit update formula given in equation (2.1) by an implicit tau-leaping formula given by Note that the above formula typically gives a non-integer vector for X(t+τ ).To overcome this difficulty, Rathinam et al., [24] proposed a two-stage process given by and where z denotes the nearest non-negative integer corresponding to a real number z.
The implicit tau-leaping method does not have a stability limitation as the explicit tauleaping does (i.e., the relative changes in all the transition rates are bounded by ) due to the implicitness of the scheme.In [9], the stepsize for the stiff system is chosen to bound the relative changes of those transition rates resulting from the non-equilibrium reactions by , thus, a larger stepsize is allowed.However, as remarked by the authors in [9] that it is generally difficult to determine whether or not a reaction is in partial equilibrium, and the partial equilibrium condition is only formulated in [9] for those reversible reaction pairs for some biochemical systems.To overcome this difficulty, in this paper we use (2.3) to choose τ 1 for the implicit tau-leaping method but with a larger to allow a possible large time stepsize.
To avoid the possibility of negative populations (i.e., to ensure (2.4) has non-negative solution), the algorithm for implicit tau-leaping is implemented in the same way as that for the explicit tau-leaping method except the update for states (i.e., replace (2.1) by (2.4) and (2.5)).

Biological Applications
In this section, we report on use of the SSA, and the explicit and implicit tau-leaping algorithms with two stochastic infection models.One is an existing stochastic model in the literature that is used to describe VRE infection at the population level.The other is derived in this paper based on an existing validated deterministic model for describing the HIV infection within a host.All three algorithms were coded in Matlab, and all the simulation results were run on a Linux machine with a 2GHz Intel Xeon Processor with 8GB of RAM total.We do note that computational times depend on how Matlab codes are written.Hence, all three algorithms are implemented in the same style as well as using similar coding strategies (such as array preallocation and vectorization) to speed up computations.Thus the comparison computational times given below should be interpreted as relative to each other for the algorithms discussed rather than in any absolute sense.
The following notation is used throughout the remainder of this paper: Z n is the set of n-dimensional column vectors with integer components, and e i ∈ Z n is the ith unit column vector, that is, the ith entry of e i is 1 and all the other entries are zeros, i = 1, 2, . . ., n.
Before we introduce the VRE and HIV models, we give a brief comment on how to numerically solve the system of nonlinear equations required in the implicit tau leaping method.By (2.4) we know that the system of nonlinear equations to be solved are of the form where p j is the realization of P j (λ j (x), τ ).To solve (3.1) for x, we first convert the nonlinear equation (3.1) into the form g(y) = 0, where y = (y 1 , y 2 , . . .y n ) T with x = (y 2 1 , y 2 2 , . . ., y 2 n ) T , and Then we use the command fsolve in Matlab to obtain the numerical solution of (3.2), and then the numerical solution for (3.1) is obtained by taking x = (y 2 1 , y 2 2 , . . ., y 2 n ) T .The reason for converting (3.1) into (3.2) is to avoid possible negative values of state variables.

A VRE Model
VRE is the group of bacterial species of the genus enterococcus that is resistant to the antibiotic vancomycin, and it can be found in sites such as digestive/gastrointestinal, urinary tracts, surgical incision, and bloodstream.The bacteria responsible for VRE can be a member of the normal, usually commensal bacterial flora that becomes pathogenic when they multiply in normally sterile sites.VRE infection is one of the most common infections occurring in hospitals, and this type infection is often referred to as a nosocomial or hospital-acquired infection (these are evidence that hospitals provide not only medical care but also harbor pathogens that pose serious risks of infections).
A stochastic model was developed in [19] to describe the dynamics of VRE infection in a hospital, and it will be used to demonstrate the computational efficiency of the SSA, as well as the explicit and implicit tau-leaping methods.In this model, the patients are classified as uncolonized U (those individuals with no VRE present in the body), colonized C (those individuals with VRE present in the body) or colonized in isolation J, as depicted in the compartmental schematic of Figure 1.From this figure, we see that patients are admitted into the hospital at a rate of Λ per day, with some fraction m of patients already VRE colonized (0 ≤ m ≤ 1).Uncolonized patients are discharged from the hospital at a rate of µ 1 U per day, and colonized patients and colonized patients in isolation are discharged from the hospital at rates of µ 2 C and µ 2 J per day, respectively.An uncolonized patient becomes colonized at a rate β U [C + (1 − γ)J ] per day, where the hand-hygiene policy applied to health care workers on isolated VRE colonized patients reduces infectivity by a factor of γ (0 < γ < 1), and the rate of contact is assumed to be proportional to the population size (i.e., mass action incidence).In addition, a colonized patient is moved into isolation at rate αC per day.

Stochastic VRE Model
In [19], the dynamics of VRE infection are modeled as a continuous time Markov chain.In addition, a constant population is assumed for this model in which the hospital remains full all the time, that is, the overall admission rate equals the overall discharge rate.Let N denote the total number beds available in the hospital, and {X N (t), t ≥ 0} be a continuous time Markov chain with , where the meaning of the random variable X N i , i = 1, 2, 3 are given in Table 1.In any small time interval of length ∆t, we assume {X N (t), t ≥ 0} jumps from state , and λ j is the transition rate for reaction j.Based on the the assumption of constant population, the transition rates are summarized in the second column of Table 2. From this table, we see that there are five transition rates (i.e, l = 5), and the corresponding states changes v j are listed in the third column of this table.
In addition, the corresponding deterministic model for the stochastic model (3.3) with transition rates given in Table 2 was derived in [19] by using the Kurtz Limit Theorem [16,17] (that is, ordinary differential equations can be used to approximate a (density Reactions Table 2: Transition rates λ j (x N ) as well as the corresponding state changes v j for the stochastic VRE model (3.3), j = 1, 2, . . ., 5.
dependent) CTMC when the sample size is sufficiently large), and this model is given by where U N (t), C N (t) and J N (t) denote the number of uncolonized patients, VRE colonized patients and VRE colonized patients in isolation at time t in a hospital with N beds.

Numerical Results
In this section we report on numerical results obtained by applying the SSA, explicit tauleaping and implicit tau-leaping methods to the stochastic VRE model (3.3) with transition rates given in Table 2.We compare the computational times of the SSA, explicit and implicit tau-leaping methods with different values of N .All the simulations were run for the time period [0, 200] days with parameter values given in the third column of Table 3 and initial conditions (adapted from Table 3 in [19]) that is, all the simulations start with the same initial density 29 37 , 4 37 , 4 37 To implement the tau-leaping methods, we need to find g i , i = 1, 2, 3. From Table 2 we see that all the reactions are first order except the first one.Let ∆λ j (x N ) = λ j (x N + ∆x N ) − λ j (x N ) with ∆x N being the absolute changes in the state variables, j = 1, 2, . . ., 5. Note that which implies that .
Hence, by the above equation and the positiveness of parameters and non-negativeness of the state variables we have Thus, from the above inequality we see that if we choose then the relative change in λ 1 is bounded by (to the first order approximation).For all the other first-order reactions, to ensure the relative changes in the transition rates are bounded by , we need to set the relative changes in the state variables to be bounded by .Thus, by (3.6) we know that to have |∆λ j (x N )| ≤ λ j (x N ) for all j = 1, 2, . . ., 5, we need to set For the implicit tau-leaping, the value of is taken as 0.3 (to allow a possible larger time stepsize).This value is chosen based on the simulation results so that the computational time is comparatively short without compromising the accuracy of the solution.
Figure 2 depicts the computational times of each algorithm for an average of five typical simulation runs with N varying from 37, 185, 370, 1850, 3700, 18500, 37000.From this figure, we see that the computational times for all the algorithms increase as the value N increases.This is expected for the SSA as the mean time stepsize for the SSA is the inverse of the sum of all transition rates, which increases as N increases (roughly proportional to N 2 as can be seen from the transition rates illustrated in Table 2).For the  explicit tau-leaping method we found, for all the N that we tried, the value of τ 1 is often less than 10/λ, which implies that the SSA is implemented most of the time as opposed to the tau-leaping method (based on the algorithm in Section 2.2.1).This also explains why the SSA and the explicit tau-leaping perform similarly.The same thing is also observed for the implicit tau-leaping method when N = 37 and 185.However, when N increases to 370, we found that the implicit tau-leaping method requires significant time for implementation, and we also found that its time stepsize in this case is still not significantly larger than those of the SSA.Note that systems of nonlinear equations must be solved for the implicit tau-leaping method.Hence, the computational time in this case is expected to be larger than for the other two methods (this can be observed from this figure).As N continues increasing, we see that the computational times for the implicit tau-leaping is similar to those of the SSA and the explicit tau-leaping method; this is because the time stepsize becomes significantly larger than those of these two methods in which solving systems of nonlinear equations comprises the main time consuming cost.As one can see from (2.3) and the transition rates illustrated in Table 2 that if x i /g i > 1 then the first term inside of the minimum sign of (2.3) is roughly proportional to 1/N while the second term is roughly proportional to 1.The simulation results show that the first term inside of the minimum sign of (2.3) is smaller than the second term when N increases to 1850 and above.Hence, the time stepsize for the implicit tau-leaping method decreases as N increases when N ≥ 1850, which implies that the computational time for the implicit tauleaping method increases as N increases.Based on the above discussions, we see that the SSA has similar performance to that of the tau-leaping methods (and may be slightly better in some cases).Due to its simplicity and accuracy, the SSA is the best choice for this particular problem.
To have some idea on the dynamics of each state variable, we have plotted five typical sample paths of each state of the stochastic VRE model (3.3) obtained by each stochastic algorithm in comparison to the solution for the deterministic VRE model (3.4). Figure 3 depicts the results obtained with N = 37.From this figure, we observe that all the sample paths for each state variable oscillate around their corresponding deterministic solution, and they exhibit very large differences.
In addition, Figure 3 reveals that the sample paths obtained by the SSA and the tauleaping methods exhibit similar variation.The results obtained with N = 370, 3700 and 37000 are given in Appendix A. From these figures we also observed that the sample paths obtained by all three algorithms exhibit similar variation.Note that results obtained by the SSA are exact.Hence, the difference between histograms obtained by the SSA and those obtained by the tau-leaping methods provide a measure of simulation errors in tau-leaping methods when the number of simulation runs are sufficiently large.Thus, to provide further information on the accuracy of the tau-leaping methods, we plotted the histograms of state solutions to the stochastic VRE model (3.3) obtained by explicit and implicit tau-leaping algorithms in comparison to those obtained by the SSA.For the purpose of demonstration, we present here the results only for the case with N = 370.These histograms are shown in Figure 4 (where plots in the left column are for the histograms of state solutions at t = 80 days, while those in the right column are for the histograms of state solutions at t = 160 days).They were obtained by simulating 1000 sample paths of solutions to the stochastic VRE model.We observe from this figure that the histograms obtained with the explicit tau-leaping algorithm approximately lie on top of those obtained with the SSA, and the histograms obtained with the implicit tau-leaping algorithm are reasonably close to those obtained with the SSA (a similar behavior is also observed for the histograms of state solutions at other time points).This is evidence that the accuracy of results obtained by the tau-leaping methods are reasonably high.
We also observe from Figure 3 as well as those figures in Appendix A that the variation between the sample paths decreases as N increases, and become quite close to the deterministic solution when N = 37000.This agrees with the expectations in light of the Kurtz Limit theorem [16,17].

An HIV Model
HIV is a retrovirus that targets the CD4+ T-cells in the immune system.Once the virus has taken control of a sufficiently large proportion of CD4+ T-cells, an individual is said to have AIDS.There are a wide variety of mathematical models that have been proposed to describe the various aspects of in-host HIV infection dynamics (e.g., [1,2,5,6,10,21,23]).The most basic of these models typically include two or three of the key dynamic compartments: virus, uninfected target cells, and infected cells.These compartmental depictions lead to systems of linear or nonlinear ordinary differential equations in terms of state variables representing the concentrations in each compartment and parameters describing viral production and clearance, cell infection and death rate, treatment efficacy, etc.
The stochastic model we use to demonstrate the computational efficiency of the SSA, and explicit and implicit tau-leaping methods is based on the deterministic HIV model proposed in [5].Data fitting results validate this model and verify that it provides reasonable fits to all the 14 patients studied.Moreover it has impressive predictive capability when comparing model simulations (with parameters based on estimation procedures using only half of the longitudinal observations) to the corresponding full longitudinal data sets.

Deterministic HIV Model
The model in [5] includes eight compartments: uninfected activated CD4+ T-cells (T 1 ), uninfected resting CD4+ T cells (T 2 ), along with their corresponding infected states (T * 1 and T * 2 ), infectious free virus (V I ), non-infectious free virus (V N I ), HIV-specific effector CD8+ T-cells (E 1 ), and HIV-specific memory CD8+ T-cells (E 2 ), as depicted in the compartmental schematic of Figure 5.In this model, the unit for all the cell compartments is cells/µl-blood and the unit for the virus compartments is RNA copies/ml-plasma 1 .In addition, protease inhibitor (PI, used to prevent viral replication) and reverse transcriptase inhibitor (RTI, used to block new infection) are the two types of drug used to treat HIV patients.
For simplicity, we only consider this model in the case without treatment, and will exclude the non-infectious free virus compartment from the model.In addition, we convert the unit for the free-virus from RNA copies/ml-plasma to RNA copies/µl-blood to be consistent with the units of cell compartments, and modify the equation based on this change.This conversion will make the derivation of the corresponding stochastic HIV model more direct.With these changes in the model of [5], the state variables and their corresponding units for the deterministic model that we use in this paper is reported in Table 4.The corresponding compartmental ordinary differential equation model is given by with an initial condition vector A summary of the description of all the model parameters in (3.7) is given in Table 5.Next we present a brief description of model (3.7), and focus our discussion on the interactions particularly relevant to the derivation of the corresponding stochastic HIV model.The interested readers may refer to [5] for more detailed discussion of the rational behind this model.The terms 1 +κ d E 1 and d E2 E 2 denote the death (or clearance) of uninfected activated CD4+ T cells, uninfected resting CD4+ T cells, infected resting CD4+ T cells, infectious free virus, HIV-specific activated CD8+ T cells and HIVspecific memory CD8+ T cells, respectively.The term δ E E 1 T * 1 is used to account for the elimination of infected activated CD4+ T cells T * 1 by the HIV-specific effector CD8+ T cells (that is, T * 1 is eliminated from the system at rate δ E E 1 , depending on the density of HIV-specific effector CD8+ T cells).
The terms involving β T 1 V I T 1 represent the infection process wherein infected activated CD4+ T cells T * 1 result from encounters between uninfected activated CD4+ T cells T 1 and free virus V I (that is, the activated CD4+ T cells T 1 become infected at a rate β T 1 V I , depending on the density of infectious virus), and β T 2 V I T 2 is the resulting term for the infection process wherein infected resting CD4+ T cells T * 2 result from encounters between uninfected resting CD4+ T cells T 2 and free virus V I (that is, the resting CD4+ T cells T 2 become infected at rate β T 2 V I , depending on the density of infectious virus).In addition, for simplicity it is assumed that one copy of virion is responsible for one new infection.Hence, the term The terms a T V I V I +κ V + a A T 2 and a T V I V I +κ V + a A T * 2 represent the activation of uninfected and infected resting CD4+ T cells, respectively, due to both HIV and some non-HIV antigen.We assume that each proliferating resting CD4+ T cells produce n T daughter cells.In addition, the term a E V I V I +κ V E 2 denotes the activation of HIV-specific memory CD8+ T cells, and each proliferating HIV-specific CD8+ T cells produce n E daughter cells.
The infected activated CD4+ T cell dies at a rate δ V , and produce n V copies of virions during its lifespan (either continuously producing virions during its life or release all its virions in a single burst simultaneous with its death2 ).
The terms ζ T κs 1 +κ b1 E 1 denote the source rates for the uninfected resting CD4+ T cells and HIV-specific effector CD8+ T cells, respectively.The term b E2 κ b2 E 2 +κ b2 E 2 is used to account for the self proliferation of E 2 (due to the homeostatic regulation).

The Stochastic HIV Model
In this section, we derive a corresponding stochastic HIV model based on the deterministic model (3.7).Let ν denote the volume of blood (in units µl-blood), and the parameter vector κ = (κ V , κ s , κ b1 , κ d , κ γ , κ b2 ) T where κ V , κ s , κ b1 , κ d , κ γ , κ b2 are the saturation constants in model (3.7).Then we define k = νκ with k = (k V , k s , k b1 , k d , k γ , k b2 ) T , which will be used in the transition rates for the stochastic model.
Let {X ν (t), t ≥ 0} be a pure jump Markov process with X ν = (X ν 1 , X ν 2 , . . ., X ν 7 ) T , where the meanings of random variables time interval of length ∆t, we assume that there is only one event (or reaction) that occurs (e.g., a cell dies, a cell becomes infected, an activated cell is differentiated into a resting cell, a resting cell becomes activated), and the process {X ν (t), t ≥ 0} jumps from state x ν to x ν + v j with probability λ j (x ν )∆t + o(∆t), that is, where , and λ j is the transition rate for the jth reaction.In this stochastic model, we assume a burst production for the viral load (that is, an infected activated CD4+ T cell releases all its virions in a single burst simultaneous with its death).Based on this assumption as well as the discussions in Section 3.2.1,we can obtain the transition rates, which are summarized in the second column of Table 7. From this table, we see that there are 19 transition rates (i.e., l = 19), and the corresponding states changes v j are listed the third column of this table.Birth of HIV-specific effector CD8+ T cells Proliferation of HIV-specific memory CD8+ T cells b Table 7: Transition rates λ j (x ν ) as well as the corresponding state changes v j for the stochastic HIV model (3.8).

Numerical Results
In this section, numerical results were obtained by applying the SSA, explicit tau-leaping and implicit tau-leaping methods to the stochastic HIV model (3.8) with transition rates given in Table 7.We compare the computational times of the SSA, explicit and implicit tau-leaping methods with different values of ν (the scale on the transition rates which effectively scales the population counts).Each of the simulations were run for the time period [0, 100] days with parameter values given in the third column of Table 5 (adapted from Table 2 in [5]) and initial conditions X ν (0) = ν(5, 1, 1400, 1, 10, 5, 1) T . (3.9) That is, each of the simulations start with the same initial concentrations (5, 1, 1400, 1, 10, 5, 1) T .
To implement the tau-leaping methods, we need to find g i , i = 1, 2, . . ., 7. From Table 7 we see that all the reactions are either zero order, or first order or second order.However, a lot of them have transition rates with non-constant coefficients (λ 6 , λ 12 , λ 13 , λ 14 , λ 15 , λ 17 , λ 18 , λ 19 ).For these reactions, g i is not just one or two.Here we will take reaction six as an example to illustrate this.Let ∆λ i (x ν ) = λ i (x ν + ∆x ν ) − λ i (x ν ) with ∆x ν being the absolute changes in the state variables, i = 1, 2, . . ., 19.Note that , which implies that Thus, by the above equation as well as the non-negativeness of the state variables and the positiveness of the parameters we have Hence, by (3.2.3) and the above inequality we obtain Thus, from the above inequality we see that if we choose then the relative change in λ 6 is bounded by (to the first order approximation).Similarly to have the relative changes in the other transition rates with non-constant coefficient to be bounded by (either exactly or to the first order approximation), we need to set Reaction 12: Reaction 13: For those transition rates with constant coefficient, we can easily see that to have them be bounded by (to the first order approximation) we need to set the relative changes in the state variables to be bounded by either 1 2 (to those second order reaction) or (to those first order reaction).Thus, by (3.11) and (3.12) we know that to have |∆λ i | ≤ λ i for all i = 1, 2, . . ., 19, we need to set For the implicit tau-leaping method, was taken as 0.12 (to allow a larger time stepsize).This value is chosen based on the simulation results so that the computational time is comparatively short without compromising the accuracy of the solution.
Figure 6 depicts the computational time of each algorithm for an average of five typical simulation runs with ν varying from 10, 50, 10 2 , 2 × 10 2 , 5 × 10 2 , 10 3 for the SSA and  10, 50, 10 2 , 2 × 10 2 , 5 × 10 2 , 10 3 , 10 4 , 10 5 , 10 6 , 5 × 10 6 for the explicit and implicit tau-leaping schemes.From this figure, we see that the computational times for the SSA increase as the value ν increases.This is again expected as the mean time stepsize for the SSA is the inverse of the sum of all transition rates, which increases as ν increases (roughly proportional to ν, as can be seen from the transition rates illustrated in Table 7).In addition, even with ν = 10 3 , it took the SSA more than 8000 seconds for one sample path (which is why we did not run any simulations for the SSA when ν is greater than 10 3 ).Hence, it is impractical to implement the SSA if we want to run this HIV model for a normal person (generally having approximately 5 × 10 6 µl-blood).This is expected due to the large value of uninfected resting CD4+ T cells (as can be seen from the initial condition (3.9)).
From Figure 6 we also see that the computational times for the explicit tau-leaping method increase as the value ν increases from 10 to 50, and decrease as ν increases to 100.Then its computational times decrease dramatically as the value of ν increases from 100 to 10 4 , and stabilizes somewhat for ν ≥ 10 4 .This is due in some way to the formula for τ 1 .As we can see from (2.3) and transition rates in Table 7 that if x i /g i > 1 then the first term inside the minimum sign of (2.3) is roughly in the same order for all the values of ν while the second term is roughly proportional to the value of ν.In addition, we found from the simulation results that the first term inside the minimum sign of (2.3) is much larger than the corresponding second term until ν = 10 4 , and becomes smaller than the second term when ν ≥ 10 4 .Hence, τ 1 increases as ν increases when ν ≤ 10 4 and has roughly similar values for all the cases when ν ≥ 10 4 , which agrees with the observation that the computational times decrease dramatically as the value of ν increases from 100 to 10 4 , and stabilizes there for ν ≥ 10 4 .The increase of computational times as ν increases from 10 to 50 is because τ 1 is so small that a large number of SSA steps are implemented instead of tau-leaping.
We also observe from Figure 6 that the computational times for the implicit tau-leaping method decrease as ν increases when ν ≤ 10 4 and then stabilizes there for ν ≥ 10 4 .This is for the same reason as that for the explicit tau-leaping method.In addition, we see that the computational times for the implicit tau-leaping is significantly higher than those of the SSA and the explicit tau-leaping at ν = 10, which is because under this case the implicit tau-leaping is implemented many times (solving systems of nonlinear equations in each implicit tau-leaping step is costly) and the time stepsize is not significantly larger than those of the other two methods.
Based on the above discussions, we know that for smaller values of ν (less than 100) the SSA is the choice due to its simplicity, accuracy and efficiency.However, for larger values the tau-leaping methods are definitely the choice with implicit tau-leaping performing better than explicit tau-leaping; this is expected due to the stiffness of the system (large variations in both parameter values and state variables).
To have some idea on the dynamics of each state variable, we plotted five typical sample paths of solution to stochastic HIV model (3.8) (in terms of concentrations, i.e., X ν (t)/ν) obtained by each stochastic algorithm in comparison to the solution for the deterministic HIV model (3.7).Figures 7-8 depict the results obtained with ν = 10 µl-blood, where the results in Figure 7 are for CD4+ T cells, and the ones in Figure 8 are for the infectious virus and CD8+ T cells.The right column of Figure 7 reveals that all the sample paths for the uninfected resting CD4+ T cells (T 2 ) are similar to their corresponding deterministic solutions, which is expected as the value of the uninfected resting CD4+ T cells is so large that its dynamics can be well approximated by the ordinary differential equation.While all the other state variables oscillate around their corresponding deterministic solution (as can be seen from these two figures), and the variations of the sample paths for the infectious virus (V I ) and CD8+ T cells (E 1 and E 2 ) are especially large.This is expected as T 2 has less effect on the dynamics of these three compartments (especially on E 1 and E 2 , as observed from (3.7) and the parameter values in Table 5).
In addition, Figures 7 and 8 reveal that the sample paths obtained by the SSA and the tau-leaping methods exhibit similar variation, which suggests that tau-leaping methods have reasonably high accuracy.The results obtained by all three algorithms with ν = 100, 1000 µl-blood are given in Appendices B.1 and B.2, and the results obtained by the tau-leaping methods with ν = 10 4 , 10 5 and 5 × 10 6 are given in Appendices B.3-B.5.From the figures in Appendices B.1 and B.2 we observe that the sample paths obtained by all three algorithms exhibit similar variation, and the variation between the sample paths decreases as ν increases.The same conclusion can be obtained from the figures in Appendices B.3-B.5.To gain further information on the accuracy of the tau-leaping methods, we plotted the histograms of state solutions to the stochastic HIV model (3.8) (in terms of concentration) obtained by explicit and implicit tau-leaping methods in comparison to those obtained by the SSA.Due to long computational times to obtain these histograms, we present here the results only for the case with ν = 200 µl-blood (where computational times required by both tau-leaping methods are significantly lower than those required by the SSA, as can be seen from Figure 6).These histograms are shown in Figures 9 and 10  model.The plots in Figure 9 are for histograms of state solutions at t = 50 days, and those in Figure 10 are for histograms of state solutions at t = 100 days.We observe from these two figures that the histograms obtained by tau-leaping methods are reasonably close to those obtained by the SSA (similar behavior is also observed for the histograms of state solutions at other time points).This suggests that the accuracy of results obtained by tau-leaping methods are acceptable (recall that results obtained by the SSA are exact, so the differences between histograms obtained by the SSA and the ones obtained by tauleaping methods provide a measure of simulation errors in tau-leaping methods when the number of simulation runs is sufficiently large).
We also observe from the figures in Appendix B.5 that the stochastic solutions agree with the deterministic solutions when ν = 5 × 10 6 µl-blood.This suggests that ordinary differential equations can be used to approximate the dynamics of HIV for this particular problem (with the given parameter values and initial conditions).

Concluding Remarks
In this paper we present a detailed discussion on how to apply the tau-leaping methods to two stochastic infection models with different levels of complexity, and compared their computational efficiency along with that of the SSA.One of these models is an existing stochastic model used for describing the dynamics of VRE infection in a hospital unit.The other model is derived in this paper based on an existing deterministic model used for describing the dynamics of HIV infection within a host.
Even though tau-leaping methods have now been widely used in biochemistry literature, to our knowledge, tau-leaping methods have not been applied to complex nonlinear dynamical infectious disease models such as the HIV model that we presented in this paper (with transition rates being complicated nonlinear functions of state variables rather than some simple polynomial functions).We do note that the computational performance of these three methods vary from model to model (as we demonstrated in this paper), which suggests that for a given model of interest one might need to perform some initial simulation studies to select an appropriate algorithm (for example among those we propose in this paper).However, the step-by-step implementation recipe demonstrated in this paper for these algorithms can be applied to a wide range of other complex biological models, specifically those in infectious disease progression and epidemics.
Simulation results reveal that for both models the computational times of the SSA increase as the sample size (number of beds N in the VRE model and volume of blood ν in the HIV model) increases.This is because the mean time stepsize for the SSA is the inverse of the sum of transition rates, which increases as the sample size increases.In addition, the results suggest all three algorithms have comparable computational times for the VRE model because of the low number of species and small number of transitions, and the SSA is the best choice for this problem due to its simplicity and accuracy.For the HIV model both tau-leaping methods have significantly lower computational costs than those of the SSA except when the sample size ν is very small (e.g., less than 100 µl-blood).In addition, the implicit tau-leaping method has lower computational costs than the explicit tau-leaping method when the sample size is sufficiently large (due to the stiffness of the system).
Note that the stochastic HIV model in this paper is of interest in early or acute infection where the number of uninfected resting T cells is large (on the order of 1000 cells per µl-blood).This explains why the SSA requires more than 8000 seconds to run even one sample path with ν = 1000 µl-blood.If one considers an average person having 5 × 10 6 µlblood, to run the SSA for even one sample path is impractical at this scale.The numerical results demonstrate the dynamics of uninfected resting CD4+ T cells can be well approximated by ordinary differential equations even with ν = 10 µl-blood.In addition, Table 5 demonstrates there is a large variation between the values of the parameters.Thus, the HIV model in this paper is multi-scaled in both states and time.There are some hybrid simulation methods (also referred to as multi-scale approaches) specifically designed for the multi-scaled system (the interested readers may refer to [20] for an overview of these methods).The basic idea of the hybrid methods is to partition the system into two sub-systems, one containing fast reactions and the other containing slow reactions.Then the two subsystems are simulated iteratively by using numerical integration of ordinary differential equations (or stochastic differential equations) and stochastic algorithms (such as the SSA), respectively.Although these algorithms are very attractive, they are most challenging to implement and require a great deal of user intervention.Future efforts to study the transition from initial HIV infection (with small numbers of viral particles and large numbers of target cells) to chronic infection (with large numbers in all subpopulations so that continuum models with ordinary differential equations are appropriate) will require such a multi-scale state algorithmic approach.
of beds in a hospital) Computational Time (sec) SSA Explicit Tau−Leaping Implicit Tau−Leaping

Figure 2 :
Figure 2: Comparison of computational times for different algorithms (SSA, Explicit Tau-Leaping and Implicit Tau-Leaping) for an average of five typical simulation runs.

Figure 3 :
Figure3: Results in the left column are for uncolonized patients (U ) and colonized patients in isolation (J), and the ones in right column are for the colonized patients (C).The (D) and (S) in the legend denote the solution obtained with the deterministic VRE model and stochastic VRE model, respectively, where the stochastic results are obtained with the SSA (top two panels), the explicit tau-leaping (middle two panels) and the implicit tau-leaping (bottom two panels).

Figure 4 :
Figure 4: Histograms of state solutions to the stochastic VRE model (3.3) at t = 80 days (left column) and t = 160 days (right column), where histograms are obtained by simulating 1000 sample paths of the solutions to the stochastic VRE model.

Figure 5 :
Figure 5: Solid gray arrows indicate birth/input.PI and RTI denote protease inhibitors and reverse transcriptase inhibitors, respectively.States Unit Description T 1 cells/µl-blood concentration of uninfected activated CD4+ T-cells T * 1 cells/µl-blood concentration of infected activated CD4+ T-cells T 2 cells/µl-blood concentration of uninfected resting CD4+ T-cells T * 2 cells/µl-blood concentration of infected resting CD4+ T-cells V I RNA copies/µl-blood concentration of infectious free virus E 1 cells/µl-blood concentration of HIV-specific effector CD8+ T-cells E 2 cells/µl-blood concentration of HIV-specific memory CD8+ T-cellsTable 4: Model states for the deterministic HIV model.

1 T 1
is used to denote the loss of virions due to infection.The terms involving γ T T 1 (resp.γ T T * 1 ) are included in the model to account for the phenomenon of differentiation of uninfected (resp.infected) activated CD4+ T-cells into uninfected (resp.infected) resting CD4+ T-cells at rate γ T .Similarly, the term γ E T 1 +T * +T * 1 +κγ E 1 is used to describe the phenomenon of differentiation of HIV-specific activated CD8+ T cells into HIV-specific resting CD8+ T cells.

4 n T e 2 − e 4 E x ν 5 x ν 5 + k V x ν 7 n E e 6 − e 7 Production of new virions simultaneous with δ V x ν 2 −e 2 +k s x ν 5 + k s ν e 3
Activation of HIV-specific memory CD8+ T cellsa n V e 5 the death of infected activated CD4+ T cells Birth of uninfected resting CD4+ T cells ζ T

Figure 6 :
Figure 6: Comparison of computational times of different algorithms (SSA, Explicit Tau-Leaping and Implicit Tau-Leaping) for an average of five typical simulation runs.

Figure 7 :
Figure 7: Results for uninfected and infected activated CD4+ T cells (left column), and uninfected and infected resting CD4+ T cells (right column).The (D) and (S) in the legend denote the solution obtained by the deterministic HIV model and the stochastic HIV model, respectively, where the stochastic results are obtained by the SSA (top two panels), the explicit tau-leaping (middle two panels) and the implicit tau-leaping (bottom two panels).
; they are obtained by simulating 1000 sample paths of solutions to the stochastic HIV

Figure 8 :
Figure 8: Results for infectious virus (left column), and HIV-specific CD8+ T cells (right column).The (D) and (S) in the legend denote the solutions obtained by the deterministic HIV model and the stochastic HIV model, respectively, where the stochastic results are obtained by the SSA (top two panels), the explicit tau-leaping (middle two panels) and the implicit tau-leaping (bottom two panels).

Figure 9 :
Figure 9: Histograms of state solutions to the stochastic HIV model (3.8) (in terms of concentrations) with ν = 200 µl-blood at t = 50 days, where histograms are obtained by simulating 1000 sample paths of solutions to the stochastic HIV model.

Figure 10 :
Figure 10: Histograms of state solutions to the stochastic HIV model (3.8) (in terms of concentrations) with ν = 200 µl-blood at t = 100 days, where histograms are obtained by simulating 1000 sample paths of solutions to the stochastic HIV model.

Figure 15 :
Figure 15: Results for infectious virus (left column), and HIV-specific CD8+ T cells (right column).The (D) and (S) in the legend denote the solution obtained by deterministic HIV model and stochastic HIV model, respectively, where stochastic results are obtained by SSA (top two panels), explicit tau-leaping (middle two panels) and implicit tau-leaping (bottom two panels).

Figure 17 :
Figure 17: Results for infectious virus (left column), and HIV-specific CD8+ T cells (right column).The (D) and (S) in the legend denote the solution obtained by deterministic HIV model and stochastic HIV model, respectively, where stochastic results are obtained by SSA (top two panels), explicit tau-leaping (middle two panels) and implicit tau-leaping (bottom two panels).

Table 3 :
Description of model parameters in the stochastic VRE model as well as the values of parameters used in the simulation.

Table 5 :
E Rate at which E 1 eliminates T * 1 0.01 µl-blood/(cells • day) ζ T Maximum source rate of T 2 7 cells/(µl-blood • day) κ s Saturation constant (for source rate of T 2 ) 10 2 copies/(µl-blood) Description of model parameters in (3.7) as well as the values of parameters used in the simulations.

Table 6 :
2, . . ., 7 are stated as follows.In any small State variables for the stochastic HIV model.