Mathematically Modeling PCR: An Asymptotic Approximation with Potential for Optimization

A mathemati cal model for PCR (Polymerase Chain Reaction ) is developed using the law of mass action. Differential equations are written from the chemical equations, preserving th e detail of the complementary DNA single strand being extended one bas e pair at a time. Th e equations for th e annealing stage are solved analytically. The method of multiple scales is used to approximate solutions for the extension stage. A map is then developed from the solutions to simulate PCR. The advantage of this model is the ability to use the map to optimize the process. Our result s suggest that dynamically optimizing the extension and annealing stages may significantly reduc e the total time for a PCR run,


Introduction
The Polymerase Chain Reaction (PCR ) is a technique for enzymatic amplification of specific segments of DNA.Since its inception (Saiki et al. , 1985), it has revolutionized research involvin g genomic material.Pathogen detection, disease diagnosis, human genetics and developmental biology are just a few of the research areas impacted by PCR (Stone et al. , 2006).
PCR is performed by repeating three temperature-induced stages : dissociation , annealing , and extension.In dissociation a samp le containing the target DNA is first heated to approximately 95°C to separate the DNA into single strands.The mixture is then cooled to allow primers to anneal to the temp late DNA.Primers are short single strands of DNA specifically designed t o target and bracket the sequence of DNA in the sample to be duplicated (t he amplicon).Th e temperature of this stage is primer-specific , ranging from 37°C to 72°C.The solution is then heated to 74°C for exte nsion.During this phase the thermostable enzyme Taq Polymerase synthesizes a new DNA strand, completing the complimentary sequence started by the primer.These three stages are repeated 30 to 40 times yielding millions of copies of the target DNA.
Real-time PCR uses fluorescent probes to monitor the amplification of DNA throughout the reaction.The speed at which the fluorescent signal reaches a threshold level correlates with the amo unt of target DNA in the initial sample.Real-time PCR is used to precisely distingu ish and measure specific DNA sequences even if there is only a very small quantity present in the original samp le (Valasek and Repa, 2005).This technology has many applications, including those that benefit from rapidity.Identification of microbes or parasites in commercial food and municipaJ water supplies, pathogen detection, and forensic applications are just a few.Portable, rapid, realtime PCR machines can determine the presence of a pathogen , such as anthrax, in as little as 30 minutes.However , 30 minutes can be a long time on a battlefield or in the event of a biological terror attack.Using mathematics to optimize the process to potentially reduce this time is a valuable exercise.
PCR has been mathematically modeled in several different ways.Early models assumed growth per cycle proportional to the amplicon density (Stone et al., 2006).However , assuming exponential growth of tem plate copies greatly oversimplifies the process and models t he growth only for the first few cycles.In reality, limiting factors cause the process to slow and eventually stop .These factors may includ e exhaustion of pr imer molecules and raw base pairs or a decrease i11 the effectiveness of Taq (Liu and Saint, 2002).
In consideration of this limiting behavior, several mathematical models in the literature predict the efficiency of the PCR process.Liu and Saint (2002) used a sigmoidal mathematical model to fit real time PCR data , and demonstrated that amplification efficiency can change from cycle to cycle.
A linear regression approach to calculating PCR efficiency was given by Ramakers et al. (2002), and Rutledge and Cote (2003) proposed a simplified method for absolute quantification.Gevertz et al. (2005) considered the efficiency of the reaction as a function of the cycle number.They considered an equilibrium model as well as a kinetic model by deriving differential equations directly from the chemical equations for the annealing and extension phases.Aach and Church (2003) also derived mathematical equations from the chemical reactions , but for diffusion-constrained PCR reactions.
Stochastic and probabilistic models are also used to model PCR.Velikanow and Kapral (1999) treated the eJ,,,'tension step as a microscopic Markov process in which the nucleotides bind onto the primed single strand of DNA one at a time.Sun et al. (1996) used the theory of branching processes to develop a model for distributions of mutations and estimation of mutation rates during PCR.Weiss and Von Hessler (1995) treated the accumulation of new molecules during PCR as a random bifurcation tree to estimate overall error rates for the reaction.More recently, Jagers, et al. (2003) used Galton-Watson branching processes to arrive at a linear growth phase following the initial expo nential phase.Lalam (2006) based another model on a Galton-Watson branching process des cription of PCR to estimate the reaction efficiency.A drawback of man y of these models , particularly from the standpoint of optimization, is their complexity, which requires numerical integration and obscures dynamical understanding of the process.
The model we present uses th e chemical reactions of PCR to derive a system of differential equations, mimicking the physical behavior of the single-stranded DNA (ssDNA) copy being extended one nucleotide base pair at a time.Addition of individual nucleotides occurs very rapidl y, and the amount of Taq is small relativ e to both primer and base pair concentration.Thi s provide s leverage to apply an asymptot ic soluti on strategy .Th ere are two main objecti ves for the solution strategy for this mathematical model.First, the approximate solutions found by a multiple tim e scale approach can be put into a map to simulat e the PCR process .Second, the solutions and the map can be used to optimize the time spent in each stage in order to obtain the m ost amplificat ion in the shortest possible time.Our results indicate that dynamic optimizing of the extension and annealing phases may significantly decrease the time require d for the entire PCR process.

Dissociation
Dissociation occurs wh en the sample containing the doubl e-strand ed DNA (dsDNA) is heated to separate th e dsDNA into single strands.The chemical equation can be written as n-2s , where D is dsDNA and Sis single-stranded DNA (ssDNA) .Experimentation has shown that dsDNA held at temperatures above 94°C for more that five seconds is comp letely denatured (Gevertz et al., 2005) .T his justifie s th e assumption that dissociat ion is comp lete.Using lower case letters to indicate concentrations (s = [S],d = [DJ), we represent this stage mathema t icall y bys= 2d.

Annealing
After the dsDNA is denatur ed, two complementary ssDNA templates are formed.A primer is designed to anneal at the end of the target DNA template for each of the two complementar y stra nds.\Ve simplify the reaction by includin g only one chemical equation for t hi s, assuming that t h e priming for each of the compl ementar y strands oc cur s at the same rate.The chemical equation for annealing describes the primer, P, attaching to the ssDNA, S, to form a molecule of prim ed ssDNA, S'.It can be written as

S+Pk~i S'.
The constant k 1 is the rate the reaction moves forward, creating primed ssDNA.A constant k-1 is included to model the reverse reaction of primers falling off of previously primed ssDNA.Reaction temperatures are chosen so that k 1 > > k_ 1, allowing the reaction to proceed rapidly.
Other reactions can occur during annealing.The two complementary template strands can reanneal and primers can anneal to each other instead of to the ssDNA.Since the length of the strand is generally greater than 100 base pairs, reannealing of the complementary templates is unlikely and we assume it does not happen.We also neglect the scenario of primer to primer annealing, since tremendous effort is exerted in primer design to prevent this (Eyre, 2005).

Extension
In the extension stage, Taq polymerase, Q, binds with primed ssDNA, S', to form a complex, C, at the rate >-1, Taq facilitates the addition of base pairs in order from the primer to the end of the strand.We write a separate equation for each base pair added.Cj denotes the complex with j base pairs (j = 1, ... , n).The number, n , denotes the number of nucleotide base pairs needed to comp lete the complementary strand and R represents the resources containing all 4 types of individual base pairs for extension.We assume that all base pairs add on to the template at the same rate , .,\ 2 , and that all are presen t and needed in equal proportions.
The equations for this process are as follow8: The Taq separates from the dsDNA as the template copy is completed at the rate ,\3,
At the reaction temperatures used for PCR, Taq is quite efficient at synthesizing the complementary strands.Therefore, we consider any back reactions in this stage to be negligible.Descriptions of the reactants and constants are shown in Table 1.
Taq polymerase , q = [Q] complex of primed ssDNA with Taq, c = [C] complex with j base pairs added, j = l, 2, ... , n, Cj = [Cj] resources containing nucleotides for extension, r = [ R] forward and backward reaction rates for annealing forward reaction rates for extension stage initial amount of ssDNA for a single cycle initial amount of primer for a single cycle initial amount of primed ssDNA for a single cycle initial amount of Taq for a single cycle initial amount of resources containing the individual base pairs rescaled time for extension stage (>-2ft) resca led reactants for extension stage

.1 Annealing
The law of mass action is invoked to •write a system of differential equations for the annealing stage of a single cycle of PCR.This approach is particularly justified in the case of quantitative PCR where reactions occur in small, well-mixed containers.

ds
, and ds' dt = k1sp -k_1s'. (3) Equation ( 1) represents the change in the concentration of ssDNA as a function of the concentrations of ssDNA s, primer, p, and primed ssDNA, s', scaJed by the forward and backward reaction rates, k1 and L 1 .Likewise, equations ( 2) and (3) describe the change in concentrat ion of primer and primed ssDNA.The initial conditions are s'(O) = 0 (since no primed ssDNA survjves denaturing), s(O) = s (the amount of ssDNA after dissociation), and p(O) = p (the amount of primer at the beginning of this stage).It can be observed from equations ( 1)-(3 ) that ~~ + ct;; = 0 and <iJf.+ ct;; = 0.This gjves rise to two conserved quantities: Using these quantitie s the system can be reduc ed to a single eq uation , ds' and solved analytically, using separation of variables.The solution is (5) where ( 6) The sigmoidal solution (5) increases to a limiting quantity determined by the initial amounts of ssDNA and primer.for primed ssDNA levels off when all of the ssDNA strands have been primed.
The solutions for primer and single-stranded DNA come from the conserved quantities and are: and p(t) = ps' (t).

2 Extension
The law of mass action is again applied to write differential equations ( 9)-( 16) that describe the change in concentration of each of the reactants as Taq extends the template copy.The constants, >-1 ,>-2, and A3 are forward reaction rates for this stage.The change in the concentration of primed ssDNA, is proportional to the product of the concentrations of primed ssDN A, s 1 , and unattached Taq, q, as shown in the equation, The change in the concentration of unattached Taq is also a function of s' and q as well as the concentration of the complex with all the base pairs added, en.This models Taq binding with the primed ssDNA at the beginning of extension and detaching after the template strand is completed giving, The change in the concentration of the complex, c, is a function of i, q, c, and the concentration of resources, r, ( The next n -1 equations, represented by ( 12) and ( 13), exhibit a distinct pattern as they model addition of base pairs to the template strand.The pattern models the creation of the comp lex, Cj as the jth base pair is added to the previous comp lex , Cj-1, and its subsequent disappearance when the next base pair is added.Th e change in the concentration for a particular complex with j base pairs added is a function of the concentration of that complex, Cj, r, and the concentration of the previously formed complex, Cj-l The equation for the change in the concentration of the comp lex with n base pairs differs from ( 12)-( 13) and is written, It is a function of the concentrations of the previous complex, Cn-1, r, and itself, Cn, but it includes A3, the rate at which Taq detaches from the completed complex, Cn, forming dsDNA.The change in the concentration of resources is affected by t h e concentrat ions of all the complexes up to Cn-1 as shown in the equation, The change in the concentration of dsDNA is proportional to the concentration of the complex with all of the base pairs added, en: The equation for double-stranded DNA, ( 16), is coupled only to ( 14) and can be solved by direct integration after ( 9)-( 15 An inherent small quantity in this stage of the PCR process is the proportion of t he initial amount of Taq to the initial amount of resources , i , due to the fact that nucleotide base pairs are relatively eas ier to obtain and used in much larger quantities than Taq.Therefore, we choose a time sca le and concentration scales to form a din1ensionless system in a way that let s us take advantage of this small quantity.This amounts to assuming that Taq is the rate-limiting quantity .Th e time c = ~' Cj -% , and , f = r Using dot notation for lT (~~ = s', etc.), the equations ( 9)-( 15) become: and >, ' .:..

>. 3 sec -1 100
Table 2: Parameters in PCR model and the ir estimated values in simulation s.Concentrations are in micromoles per microliter (µmol µz-1 ) and time is in second s.

Multiple Time Scale Analysis
The presence of a sma11 pa ram eter, E, allows the application of the method of multiple time sca les.
A more detailed description of this method is found in Holmes (1995).Assigning t1 = T to be the fast time sca le and t2 = ET to be the slow time sca le, l 7 becomes ¾ + r:-Jh = Ot 1 + EOt 2 .We substitute this new time deriva tiv e into equ at ions ( 24)-( 30), along with a power series expansion of the form , for each concentrat ion variabl e.For exa mpl e, (24) beco mes (32) Collecting the order t: 0 terms gives th e leading order equa t ions : (39) and ( 40) The initial conditions become: s 0 (t1 = t2 = 0) = ,', iio(t1 = t2 = 0) = ro(t1 = t2 = 0) = 1, and Equations ( 33) and ( 40) imply that ro and s 0 are constant on the fast time scale.This means that the system of equations,(34)-( 39), is linear and can be written in vector form as where A is the coefficient matrix.The sum of the rows of A equals zero, and it is easy to show that A has one zero eigenvalue and n + l eigenvalues less than zero .Therefore, the solution for thi s linear system takes the form , where v 0 is an eigenvector associated with the zero eigenvalue , and the >./s are the negativ e eigenvalues , with their associated eigenvectors , Vj.Consequently, x (t1) -t v o exponentially fast on the fast time scale.Ignoring these transients, the leading order solution becomes x(t 1) = vo.
With the above in mind, we determine v 0 using the conserved quantity obtained by adding together the right sides of ( 34)-( 39), where n, the number of base pairs added, acts as a shape parameter.
The order t: 1 equation for s' is Solving this equation yields: since s 0 and iJo are functions only at t2.To eliminate the secular term in (48), we require using the expression for q 0 in ( 46).We solve ( 49) using separation of variables.Separating and integrating gives µlnso + (nµ + l)vso = -µt2 + K, where K is an integration constant.The initial condition s 0 (t2 = 0) = 1 gives K = µlw y + (nµ + l )wy.

Multi-cycle Map
A map from one cycle of PCR to the next can be constructed from the solutions for the annealing and extens ion stages and their initi al conditions.This is done by cascading the resu lts of a previous cycl e into the initial conditions for the next cycle .Let the concentration of primed ssDNA be represented by s'A for the annealing stage and by s'E for the extension phase.Also , let the final times for the annealing and extension stages be fixed at tA and tE respectively.
The first cycle begins with initial amounts of resources , primer and Taq , and a samp le containing the dsDNA to be duplicat ed.Th e amount of resources, f , is considered as essentially constant in The longer the target strand, the longer it takes the solution to reach a limiting value.Thu s, n acts as a shape param eter.Du e to a relationship between E and n in the solutions, the error increases as n increases , causing the solutions to rise above the eA-pected limiting value (the initial amount of primer).
the map ; the amount of resource used is too small to hav e an appreciable effect.Assuming that dissociation is complete, the init ial amo unt of ssDNA for the annealin g stage of the first cycle is s 1 (0) = 2d 1 .The other init ial conditions for the annealing stage of the first cycle are P1 (0) = p ( the amount of primer at th e beginning of the PCR run) and st(0) = 0.For all other cycles dissociation not only denatures th e dsDNA created in the extension phase , it also denature s any primed ssDNA and any complexes that remain after extension.Thus , for the annealing stage of the ith cycle , the initial amount of ssDNA, si(0), include s not only twice th e amount of dsDNA from the previous cycle, but also the amount of ssDNA, the amounts of primed ssDNA, and the amount of all of the complexes from the previous cycle.This can be written as Likewise , the initial amount of prim er for the ith cycle is the sum of the amount of primer from the previous cycle plus the primers gained from denaturing the primed ssDNA and the complex of Taq and primed ssDNA from the previous cycle , written as, The initial condition for primed ssDNA is s~A(O) = 0, because dissociation is assumed to be complet e.
Then using solutions ( 5)-( 8), the map for the annealing stage of the ith cycle is: wher e and The amount of primed ssDNA at the beginning of extension stage for the ith cycle is the amount at the end of the annealing stage of that same cycle, st(O) = st(tA)-Th e initial amount of Taq is the beginning amount for the first cycle, q 1 = q and the amount from the previous cycle for the rest of the cycles, qi(O) = qi-l(tE )-Th e initial conditions for dsDNA and all the comp lexes for the extens ion stage of the ith cycle are: c; because dissociation is assumed to be complete.
A map for the extension stage can be derived from the solutions (51)-( 55).The value for s~E (tE) is extracted numerically from th e implicit relationship The concentrations for the rest of the reactants are given by: and The equation for ~(tE ) is solved numerically , using trapezoidal quadrature .Th e behavior of the solutions can be explored by using this map to simulate the PCR process.
However , the parameters must be estima t ed or fit from t he data.This include s iA, iE, the rate constants and the initial concentrations of primer, dsDNA, Taq, and resources .One example of implementing this map with estimated parameters is shown in Figure 3.The lag phase befor e the 1:,olution ri1:,es is shorter than it is in most actual PCR runs.One explanation for this is the sensitivity this model has for target DNA strand length.In the early cycles of PCR the amplicon is part of a much longer DNA strand.Until the primers bracket the target sequence, longer strands are being duplicated, which will vastly lower the initial efficiency of the reaction.This map assumes a uniform strand length being amplified from the very first cycle.

Optimization
This model and it s map may be used to optimize the time spent in each stage of a PCR cycle in order to produce the most DNA copies in the shortest amount of time possible.For an examp le of how this might be done, we consider optimizing the time spent in the extension stage of a cycle.
Let tv and iA be the fixed times for the dissociation and annealing stages, respectively.Let t be the variable representing the time spent in the extension stage.Then the total time to complete one cycle of PCR is tv + tA + t.Let d(t) be the total amount of dsDNA produced in a PCR cycle.
Then the t such that is the optimal time for t he extension stage for one cycle.Differentiating , Figure 5 illustrates the idea of optimizing th e time spent in the extension stage.Setting the right hand side of ( 5 7) equal to zero gives us (58) Then using the right sides of equations ( 16) and ( 56 (60) The t that makes (60) true is found numerically by using the solutions for the extension stage .
In order to produce the most dsDNA over the shortest amount of time for an entire PCR run, we include the optimization for the extens ion stage in the map.The optimal time is calculated and used for each iteration of the extension stage.A set time is used for all the iterations of the annealing stage.with three optimized runs .First just the extension stage is optimized , second , just the annealing stage, and then both.An 8 second dissociation time is assumed for all runs.Th e optimized simu latio ns take more cycles to produce the maximum amount of product, but less time as shown in Figure 6.The total time for 35 cycles is 22 minute s for the non optimized run , 20 minutes for the optimized extension stage run , 18 minutes for th e optimized annealing stage run, and 14 minutes when both stages ar e optimized.These results show that optimizing the annealing stage may redu ce the time for a PCR run more than just optimizing the extension stage.The time spen t in the annealing and extension stages for each cycle for a run with both stages optimized is shown in figure 7. The time spent in the annealing and extension stages for each cycle for a run with both stages optimized.The optimal time changes dynamically with the amount of product being produced .This differs from the set constant times for the annealing and extension stages of the not optimized run.

Discussion and Conclusion
In this paper, we have described PCR, discussed existing models, and developed a model from the chemical equations, using the law of mass action.This model of PCR is sensitive to the length of the target DNA st.randand models the effect of strand length on the solution shape .We found a simple solution for the dissociation stage, analytical solutions for the annealing stage , and asymptotic approximations for the extension stage , using the method of multiple scales.Thes e solutions were put into a multi-cycle map to simulate PCR.The solutions and the map were then used to optimize the time spent in the extension and annealing stages of each cycle.This research suggests that such an optimization can significantly reduce the overall time for a PCR run.
Further research includes parameterizing the model with PCR data and running optimizations to fit the needs of specific applications .Also, the model could b e modified to mor e realisticall y Figure l: The graphs of the exact solution s for the annea ling stage of a single cycle of PCR with p = .5ands= .002.The rate constants used are k 1 = 0.205 and L 1 = 0.01025.The solution curve (8) A graph of this solution is shown in Figure (1).
) are solved.The initial conditions are: s' (0) = s' ( the amount of primed ssDNA present at the end of the previous annealing stage), q(O) = (j (the initial amount of Taq) , c(O) = C j (O) = d(O) = 0 (since dissociation peels off any partially competed arnplicon ) , and r(O) = f ( the amount of resources remaining after the previous extens ion stage).

Figure 2
Figure2shows a comparison between the asymptotic and numerical solutions for the concentra-

Figure 2 :
Figure2: Compar ison of asymptotic (solid lines) and numerica l (dashed lines) solutions for dsD NA in the extension stage for various values oft: with a target strand length of n=lOO.A Runge Kutta method was used to generate the numerical solutions .As t: tends towards zero, the asymptotic and numerical solutions converg e.

Figure 3 :
Figure3: The PCR map for various lengths of target DNA .The longer the target strand, the longer it takes the solution to reach a limiting value.Thu s, n acts as a shape param eter.Du e to a relationship between E and n in the solutions, the error increases as n increases , causing the solutions to rise above the eA-pected limiting value (the initial amount of primer).

Figure 4 :
Figure 4: An illustration of the idea of optimization.The blue graph represents the solution for the extension stage for n= 200.The negative part of the time axis represents the time spent in the preceding dissociation and annealing stages.The tangent line touches the blue curve at the poin t of the optimal time to run the extension phase .

Figure 5 :
Figure5: A comparison of optimized runs to a run with a fixed annealing and extension stage times of 10 and 20 seconds, respectively in terms of cycle number.All runs are for target strands of n=l00 .Even though the optimized runs need more cycles to reach equi librium , the total time required for the cycles is less .

Figure 6 :
Figure6: A comparison of optimized runs to a run with a fixed annealing and extension stage times of 10 and 20 seconds, respectively in minutes needed for 35 cycles.Each run has a target strand length of n=lO0.The total time for 35 cycles is 22 minutes for the non optimized run, 20 minutes for the optimized extension stage run, 18 minutes for the optimized annealing stage run, and 14 minutes when both stages are optimized .
Figure7: The time spent in the annealing and extension stages for each cycle for a run with both stages optimized.The optimal time changes dynamically with the amount of product being produced .This differs from the set constant times for the annealing and extension stages of the not optimized run.

Table l :
List of variables and constants used in PCR model.