Predicting population extinction or disease outbreaks with stochastic models

Models of exponential growth, logistic growth and epidemics are common applications in undergraduate differential equation courses. The corresponding stochastic models are not part of these courses, although when population sizes are small their behaviour is often more realistic and distinctly different from deterministic models. For example, the randomness associated with births and deaths may lead to population extinction even in an exponentially growing population. Some background in continuous-time Markov chains and applications to populations, epidemics and cancer are presented with a goal to introduce this topic into the undergraduate mathematics curriculum that will encourage further investigation into problems on conservation, infectious diseases and cancer therapy. MATLAB programs for graphing sample paths of stochastic models are provided in the Appendix.


Introduction
Deterministic and stochastic models of populations and epidemics have a long history. Malthusian or exponential growth, named for Thomas Robert Malthus, was first discussed in An Essay on the Principle of Population (under the pseudonym Joseph Johnson) published in 1798 (Johnson, 1798). Malthus edited and published five more versions of this original essay. These essays were very influential at the time, impacting, for example, the work of Charles Darwin on the theory of natural selection and the work of Pierre-Francois Verhulst on logistic growth. In 1838, Verhulst mentioned Malthus' principle of geometric growth but argued for a constraint on the rate of growth which he modelled as proportional to the square of the population size (Verhulst, 1838). In epidemic models, William Ogilvy Kermack and Anderson Gray McKendrick were the first to introduce the three-compartment SIR epidemic model (Susceptible-Infectious-Recovered). Their first paper in 1927 entitled 'A contribution to the mathematical theory of epidemics' defined the concept of an epidemic threshold for disease outbreaks, now known as the basic reproduction number (Kermack & McKendrick, 1927). Some of the earliest stochastic models for population growth and extinction were published in the mid-1800s. Seemingly simple questions about the survival of family names led to some new mathematical approaches in probability theory. These new approaches in turn led to the birth of a new field of mathematics, now known as branching processes. It was Bienaymé (1845) and Watson and Galton (1875) who first posed these questions. Other early contributors to stochastic population and epidemic theory on simple birth and death processes and probability of disease outbreaks include Feller (1939), Kendall (1949Kendall ( , 1956, Bartholomay (1958) and Whittle (1955). Kendall (1949, p. 230) stated: The classical theory of population growth treats the size of the population as a continuous variable, …. It is clear that a more refined analysis must take into account the role of chance effects in the development of the population, and starting with W. Feller in 1939 a good deal of attention has recently been paid to this question.
Despite the long history and the many advances in stochastic population theory, this theory has not been included in many undergraduate courses. The accessibility of computers makes it an easy task to demonstrate the differences between deterministic and stochastic modelling approaches in the classroom. The drawback, however, may be the required background in continuous-time Markov chain theory. Although discrete-time Markov chains are introduced in courses on Discrete Mathematics, the closely related continuous-time theory requires calculus and differential equations. With some background in continuoustime Markov chains, stochastic applications to populations, epidemics and cancer can be easily introduced into the undergraduate mathematics curriculum. Stochastic models are especially relevant when population sizes are small and subject to extinction, whereas deterministic models, differential equations or difference equations, are a good approximation to the overall dynamics when population sizes are large. These latter topics are covered in the traditional mathematics undergraduate courses.
Our goals are to provide a few well-known applications to populations and epidemics, and to a more recent application to cancer therapy based on continuous-time Markov chain theory. These applications can be incorporated into undergraduate mathematics courses such as Calculus, Differential Equations or Mathematical Modelling. Specifically, our goals are (1) to give a brief introduction to continuous-time Markov chain theory and the simple birth and death process, (2) to provide MATLAB algorithms to numerically simulate continuous-time Markov chain models and (3) to demonstrate how probability of extinction from the simple birth and death process can be used to predict extinction in problems associated with population growth or infectious diseases, when population sizes or number of infectious individuals are small or to predict time of extinction when treatment is designed to eliminate disease. In advanced courses, the theory and application of Markov chains can be studied in some depth whereas in introductory courses, a short presentation on Markov chains accompanied by MATLAB simulations can be used for illustration purposes.

Simple birth and death process
Formally, a stochastic process is a collection of random variables, X(ω, t). Capital X denotes a random variable that depends on ω and t. The value of ω is an element in the sample space and t is time, t ∈ [0, ∞). Therefore, a stochastic process is a collection The sample space is the same for all random variables in the collection. Since we are interested in the size of a population, an element ω in the sample space is a subset of the nonnegative integers. For a fixed ω, X(ω, t) is a discrete random variable that changes with time, and is referred to as a sample path or a stochastic realization of the process. Each ω ∈ , yields a new sample path.
For simplicity, the ω is often omitted and we write X(t) instead of X(ω, t). We emphasize that each of the random variables, X(t), is discrete-valued, taking values in the set of nonnegative integers, We use the term state of a random variable to mean the value of the random variable. For example, if X(t) = 2, then 2 is the state of the random variable X at time t.
A stochastic process is generally defined by relating the random variable X(t + t) to the random variable X(t), when t is small (Karlin & Taylor, 1975). These relations are defined by the infinitesimal transition probabilities. Here, we refer to them as transition probabilities. A transition probability for a birth is the probability associated with the change X(t + t) = X(t) + 1. A transition probability for a death is the probability associated with the change X(t + t) = X(t) − 1.
Denote the transition probability for a change in state from i to state j in a short time t as the following conditional probability: (1) When the transition probabilities depend only on the length of time t between the transitions and not on the particular time t at which these transitions occur, as in (1), then the stochastic process is referred to as a time-homogeneous process. Let b be the per capita birth rate and d be the per capita death rate of the process. Then the transition probabilities for a small interval of time t, Definition (2) states that when t is small and X(t) = i, the probability of a birth during the time interval t is approximately bi t, the probability of a death is approximately di t, the probability of no change is approximately 1 − (b + d)i t and all other changes are negligible. The sum of all the probabilities must equal one. Definition (2) also states that jumps occur in the population size when there is a birth or a death. The time that elapses between jumps (a birth or a death) is referred to as the interevent time.
This simple birth and death process is an example of a continuous-time Markov chain. The term chain implies the random variable X(t) is discrete-valued and the term Markov (named after Andrey Andreyevich Markov) implies the future state at t + t, given the present state at time t, is independent of the past. That is, for time t > 0, P(X(t + t)|X(t)) = P(X(t + t)|X(t), X(s)) for any s < t. This is referred to as the memoryless property. The only continuous probability distribution with the memoryless property is the exponential distribution. The interevent time in a Markov chain is a continuous random variable with an exponential distribution. This property is demonstrated in the birth and death process in Equation (4). It is important to note that if the initial state X(0) = i is known, then the transition probabilities are actually the probabilities p j (t) associated with the random variables X(t):

Sample paths
Because the values of the random variables are discrete, any sample path X(t) of a continuous-time Markov chain is not a continuous function of time. For example, if the process is in state i and there is a birth, then the process jumps to state i + 1 or if there is a death, the process jumps to state i − 1. From Definition (2), if X(0) = i ≥ 1 and if the interevent time is τ > 0, then a change of state occurs at τ . It follows that the left-hand limit of the process at τ is i and the right-hand limit is either i − 1 or i + 1. The stochastic process is continuous from the right but not from the left (see Figure 1). That is, The probability of a birth, X(τ To numerically simulate a birth or a death the uniform distribution U on [0, 1] is applied. In general, for k different events, each representing a jump in the process and each with a given positive probability q j , j = 1, . . . , k, the unit interval [0, 1] is divided into k subintervals of length q j , k j=1 q j = 1. Then, if a uniform random number u ∈ U lies in the subinterval q j , event j occurs. In the birth and death process there are only two events, there is a birth but if not, there is a death. For example, given X(0) = 2, b = 2, d = 1.5 and u ∈ U, the MATLAB code is given below. Notice in MATLAB that the initial state is written as X(1) = 2 and initial time as t(1) = 0, since array indices in MATLAB start at one.

Figure 1.
A sample path is continuous from the right but not from the left. In this example, the jump times are 1, 1.75, 2.25 and 3.25 and the interevent times are τ 1 = 1, τ 2 = .75, τ 3 = .5 and τ 4 = 1.
In addition to the birth and death probabilities, numerical simulation of a sample path requires computation of the values for the interevent times. The interevent time T i is a continuous random variable with values τ ∈ (0, ∞). As noted earlier, for Markov chains, the interevent time is exponentially distributed. Given that the process is in state i at time t, X(t) = i, the interevent time is exponentially distributed with parameter (b + d)i (the sum of the rates corresponding to all possible events). In particular, the probability density . The mean and standard deviation of the exponential distribution are equal to [(b + d)i] −1 . Therefore, as the population size i increases, the mean interevent time decreases. The Markov property of the interevent time can be easily demonstrated. The interevent time T i does not depend on the length of time t to reach state i but only on the current state i of the process (independent of past history or memoryless property): The uniform distribution U and the cumulative distribution F i are used to compute the interevent time of a sample path. We derive an identity for T i in terms of U that depends on the following properties of U: Applying the first property to The definition F i (τ ) = P(T i ≤ τ ) and the preceding identity show that T i and F −1 i (U) have the same cumulative distribution, namely, F i (τ ). Computation of the Note: Although the sample paths are discontinuous, when graphed with MATLAB the discrete jumps are connected.
inverse F −1 i (U) yields a formula for the interevent time in terms of the uniform random variable U: where we have used the second property of U. The preceding calculation is part of an algorithm for the interevent time which is often referred to as the Gillespie algorithm (Gillespie, 1977). A uniform random number u ∈ U generates a value τ = ln (u)/((b+d)i) for the interevent time. For example, in MATLAB code, the time of the first jump, given X(0) = 2, b = 2 and d = 1.5 is computed as follows: The deterministic analogue of the simple birth and death process is the well-known Malthusian exponential growth model. Expressed as a differential equation, the model is there is exponential growth, solutions approach infinity, but if b < d, there is exponential decay; solutions approach zero. In fact, the exponential growth model (5) is the mean of the simple birth and death process. We use MATLAB to generate five sample paths of the simple birth and death process and plot them against the corresponding deterministic solution in Figure 2 (the MATLAB code is in Appendix 2). To compare the dynamics of the exponential growth model to the simple birth and death process, we assume both models have the same initial size and the same birth and death rates. In the example in Figure 2, X(0) = 2 = x(0), and b = 2 and d = 1.5.
In the simple birth and death process, the zero state is an absorbing state, corresponding to population extinction. That is, if X(t) = 0, then X(t + τ ) = 0 for τ ∈ [0, ∞). Absorption into the zero state is illustrated in two of the five sample paths in Figure 2. Although extinction cannot occur in the deterministic model when b > d, the probability of extinction in the stochastic model is always positive. In the next section, we give an analytical expression for the probability of extinction p 0 (t).

Probability of extinction
The probability of extinction can be derived from differential equations that follow from the transition probabilities, Definition (2) and techniques from probability generating functions (e.g. Allen, 2010;Bailey, 1990). The differential equations are known as the Kolmogorov differential equations to honour the contributions of the mathematician Andrey Nikolaevich Kolmogorov. The derivation is relegated to Appendix 1. The derivation for the probability of extinction can also be found in the classic textbook by Feller (1968).
The derivation in Appendix 1 yields an expression for the probability generating function, The probability generating function is useful for generating the probabilities p j (t) and the moments E([X(t) m ]. Evaluating P at s = 0 gives p 0 (t) and differentiating P with respect to s and evaluating at s = 0 gives p 1 (t). The expectation E(X(t)) = ∞ j=1 jp j (t) is found by differentiating P with respect to s and evaluating at s = 1.
The closed form expression for P(s, t), derived in Appendix 1, is well-known (e.g. Allen, 2010;Bailey, 1990). An underlying assumption in this derivation is that each individual in the simple birth and death process acts independently, so that the probability generating function for X(0) = i is just the product of i probability generating functions for X(0) = 1. In particular, Differentiation of P with respect to s leads to Evaluating P at s = 0 leads to a formula for the probability of extinction by time t: In the limit, the probability of extinction is If the difference b − d is large, then the limit in (8) is reached more rapidly than if the difference b − d is small (see Figure 3). In the case that the birth rate is less than the death rate, the probability of extinction approaches one. That is, the limiting stationary distribution of X(t) as t → ∞ is concentrated at zero. For the example in Figure 2, the value of p 0 (10) is very close to the asymptotic limit. Applying the limiting formula in (8) to the simple birth and death process illustrated in Figure 2, the probability of population extinction is Recall that two out of five sample paths reached extinction in the computations. The extinction results of the simple birth and death process are applicable to a wide range of population and epidemic processes. The limiting formula in (8), where b > d, approximates the initial dynamics of population and epidemic processes when there is exponential growth away from an absorbing state, demonstrated in Examples 1 and 2. In Example 3, we illustrate an application of the simple birth and death process to cancer therapy, when the birth rate is less than the death rate. In this example, the probability of extinction p 0 (t) approximates the time until cancer cells are eliminated.

Logistic growth
The well-known Verhulst logistic model (Verhulst, 1838) is given by the following nonlinear differential equation where r > 0 is the intrinsic growth rate and K > 0 is the carrying capacity of the population.
The growth declines at a rate proportional to the square of the population size. Notice that the birth and death rates in (9) are not explicitly defined. The solution of (9) can be written in terms of the parameters and the initial population size x 0 as It follows that lim t→∞ x(t) = K so there is no possibility of extinction. However, if the population size is small, close to zero, the logistic Equation (9) is approximately an exponential growth model, dx(t) dt ≈ rx(t), r > 0.
To account for randomness in births and in death, we formulate a logistic Markov chain, {X(t), t ∈ [0, ∞)}, a continuous-time general birth and death process. The state space is a subset of the nonnegative integers and may be finite X(t) ∈ {0, 1, . . . , N} or infinite X(t) ∈ {0, 1, 2, . . .}. We assume that in a small time interval t, only a birth, a death or no change may occur.
Let λ i and μ i be the birth and death rates given X(t) = i. The infinitesimal transition probabilities, P(X(t + t) = j|X(t) = i) = p i,j (t) are similar to (2) but are not linear, In addition, the birth rate equals the death rate at x = K since K is a steady-state solution, The preceding assumptions (11) and (12) are satisfied for many choices of the birth and death rates (Allen, 2010). A reasonable assumption is that the birth and death rates are quadratic functions, Parameters b 1 > 0, d 1 ≥ 0 and d 2 > 0 but b 2 may be positive, negative or zero to ensure There are four parameters b 1 , b 2 , d 1 and d 2 in the stochastic logistic model but only two parameters r and K in the deterministic model. Therefore, there are infinitely many stochastic logistic models with birth and death rates explicitly defined that correspond to the same deterministic logistic model. For example, b 1 = r + c, d 1 = c, b 2 = 0 and d 2 = r/K, then λ x is linear in x but μ x is quadratic and both are nonnegative for x ∈ {0, 1, 2, . . .}. Another example, is b 1 = r, d 1 = 0, b 2 = −r/(2K) and d 2 = r/(2K), where both λ x and μ x are quadratic and nonnegative for x ∈ {0, . . . , 2K}. In this latter example, births and deaths are density-dependent. Of course, growth in the stochastic logistic process differs significantly from the simple birth and death process when population sizes are large, but their behaviour is similar near the origin. In fact, when population sizes are small, the asymptotic limit in (8) is a good approximation to the probability of extinction.
Near the origin, the linear approximation of the logistic model has birth and death rates b = b 1 and d = d 1 . The formula in (8), given X(0) = x 0 , yields an estimate for the probability of population extinction for the stochastic logistic model. The estimate depends on the initial population size x 0 . The limit is denoted as P 0 (x 0 ): The case b 1 ≤ d 1 never occurs in the logistic model since r > 0. The probability of extinction is always less than one but greater than zero for the stochastic logistic model.

Example 1: population extinction
Suppose the birth rate is linear and the death rate is a quadratic function, where the two conditions (11) and (12) are satisfied, b 1 , d 1 > 0 and  Notes: The figure on the left shows two of the sample paths that reach the carrying capacity K and the figure on the right shows a close-up of the same sample paths on the interval [0, 10] with one hitting zero before t = 5. The probability of population extinction is P 0 (2) = .25.
One of the three sample paths hits zero. The analytical approximation of the probability of extinction for this example is Extinction occurs rapidly in this example.
To check the analytical estimate given in (14), we numerically simulated ten thousand sample paths of the stochastic logistic model using the MATLAB code in Appendix 2, until either the population size hits zero or the population reaches a size of 50. Values other than 50 can be chosen but 50 is reasonable since it is a sufficiently large value with exponential growth prior to 50 (< K/2). It is highly unlikely for the population to hit zero after reaching a size of 50. The simulations continue until the population size enters one of two subsets, either 0 = {X|X(t) = 0 for the first time t ∈ (0, ∞)} or 50 = {X|X(t) = 50 for the first time t ∈ (0, ∞)}. Then we count the number of sample paths in each of these sets. The union of these sets contains all ten thousand sample paths, | 0 ∪ 50 | = 10 4 . For example, in one particular example, the numerical estimate | 0 |/10 4 = .2547 which compares well with the analytical estimate given in (14).
This example illustrates the importance of a sufficiently large population size to prevent extinction. However, with only two parameters, the obvious conclusion is that extinction can be prevented by increasing births or by decreasing deaths. In practice, models for threatened or endangered species require much more complex models that depend on the particular species, the spatial environment, and the available options. Developing more comprehensive models to accurately portray the species dynamics in their natural environment pose significant mathematical, statistical and computational challenges (Green et al., 2005).

SIR epidemic model
The Kermack-McKendrick SIR epidemic model is a system of differential equations for the spread of an infectious disease in a population (Kermack & McKendrick, 1927). In the SIR model, an infected individual recovers and develops immunity which protects against reinfection. The model applies to diseases such as measles, mumps, rubella and chickenpox, where immunity occurs after infection. The total population N of individuals is divided into three groups: (S) susceptible, (I) infected and infectious, and (R) recovered. In the simplest case the population size is constant N = S(t) + I(t) + R(t). There are no births or deaths, only infection and recovery. The system of differential equations for the SIR epidemic model is as follows: where the initial conditions satisfy S(0) > 0, I(0) > 0, R(0) ≥ 0 and S(0)+I(0)+R(0) = N. The parameter β is the transmission rate, the number of contacts per unit time that result in an infection of a susceptible individual, γ is the recovery rate, and 1/γ is the average length of the infectious period.
The dynamics of this model are well known and we summarize them here. Eventually, the disease dies out: lim t→∞ I(t) = 0, lim t→∞ S(t) = S(∞), and lim t→∞ R(t) = R(∞). The effective reproduction number R and the basic reproduction number R 0 are defined as R = S(0) N · β γ and R 0 = β γ .
If R ≤ 1, then there is no epidemic; solution I(t) decreases monotonically to zero. If R > 1, then I(t) increases first before decreasing to zero; an epidemic occurs. The basic reproduction number R 0 (introduced in 1927 by Kermack and McKendrick) is the number of secondary infections caused by introduction of one infectious individual into an entirely susceptible population.
Since (15) can be reduced to a system of two differential equations: Near the disease-free equilibrium (S, I) = (N, 0), the infectious differential equation, dI/dt, in (16) is approximately Therefore, near the disease-free equilibrium, the stochastic model for the infectious population will behave similarly to a birth and death process with birth rate b = β and death rate d = γ. Now, we construct a stochastic SIR model. The same notation S(t) and I(t) are used for the random variables. The SIR Markov chain model is a multivariate process, {(S(t), I(t))|t ∈ [0, ∞)}, where S(t) and I(t) are discrete random variables for the number of susceptible and infectious individuals at time t. The values of the random variables satisfy S(t), I(t) ∈ {0, 1, 2, . . . , N} and S(t) + I(t) ≤ N.
Let P((S(t + t), I(t + t)) = (k, j)|(S(t), I(t)) = (s, i)) = p (s,i),(k,j) ( t). The infinitesimal transition probabilities, given (S(t), I(t)) = (s, i), are The process is time-homogeneous. Near the disease-free equilibrium, the Markov chain model can be approximated by a simple birth and death process (similar to the exponential growth model). Death of an infected individual corresponds to a recovery, d = γ , and birth of an infected individual corresponds to a new infection, b = β. Then The probability of disease extinction follows from (8) and the simple birth and death process. Given an initial infective population size of I(0) = i 0 , the probability of no epidemic outbreak is approximately The preceding formula is equivalent to the expression originally derived by Whittle (1955). An estimate for the probability of an outbreak is approximately 1 − P 0 (i 0 ).

Example 2: disease outbreaks
Suppose the infectious period on the average is 5 days (γ = 1/5), the total population size is N = 800 and the basic reproduction number is R 0 = 2 (β = 2/5). Three sample paths of the SIR Markov chain model along with the deterministic solution are plotted in Figure 5 when I(0) = i 0 = 2 (The MATLAB program is given in Appendix 2). Applying (18), an estimate for the probability of no outbreak is An estimate for the probability of a disease outbreak is 1 − P 0 (2) = .75. In the deterministic model, the total number of cumulative cases is 639 with a maximum total number of cases at any time reaching about 125. To check the analytical estimate in (19), ten thousand sample paths are numerically simulated until either the infectious population hits zero or reaches a size of 25 (MATLAB program in Appendix 2). Values other than 25 can be chosen but 25 is reasonable since there is exponential growth prior to 25, before the peak value is reached in the deterministic model. In realistic outbreaks, when the number of cases reaches 25, this would be considered a major outbreak. We count the number of sample paths that lie in one of two subsets: 0 = {I|I(t) = 0 for the first time t ∈ (0, ∞)} or 25 = {I|I(t) = 25 for the first time t ∈ (0, ∞)}. The numerical estimate for probability of disease extinction is | 0 |/10 4 = .2531 for one set of ten thousand sample paths which is in good agreement with the analytical estimate in Equation (19).
Prevention of an outbreak is a major public health concern. Methods for prevention include prophylactic interventions (antibiotic and antiviral drugs), vaccination, quarantine and isolation. Studies on the emergence of new diseases such as SARS and Ebola and the re-emergence of old diseases such as drug resistant tuberculosis require more sophisticated models and the expertise of public health and medical professionals, statisticians and mathematicians. Mathematical models play an important role in testing the feasibility and the outcomes of various disease prevention and intervention strategies (e.g. Anderson & May, 1992;Brauer & Castillo-Chavez, 2010).

Cell populations
A cell divides and reproduces two daughter cells. During the cell cycle, mutations may occur, so that the daughter cells are not identical to the parent cell. The mutation can be neutral meaning the genetic sequences passed on to a daughter cell are neither beneficial nor detrimental to survival or reproduction. However, if the mutation impacts survival or reproduction then 'fitness' has been altered. In an advantageous mutation, fitness is increased and either the reproduction cycle is shortened or the survival time is lengthened. On the other hand, in a deleterious mutation, fitness is decreased and the cell line does not survive. Most acquired mutations that occur in somatic stem cells (not egg or sperm) are neutral or deleterious. However, mutations that lead to cancer or tumour cells are generally advantageous, reproducing more rapidly than healthy cells. Cancer cells have the ability to renew and regenerate, a property typical of normal stem cells (Hardavella, George, & Sethi, 2016). These features of cancer cells have led to what is known as the cancer stem cell hypothesis: the existence of cancer stem cells that may be 'responsible for cancer initiation, progression, metastasis, recurrence and drug resistance' (Hardavella et al., 2016).
The simple birth and death process is directly applicable to cellular reproduction, where birth is cell division; two cells are reproduced from one via cell division, p i,i+1 ( t) = bi t. Cell death is the transition p i,i−1 ( t) = di t (Equation (2)). Additional reasons that the birth and death process is a good approximation for cellular reproduction are that cell lines reproduce independently and the fitness of individual cell lines is constant, not frequencydependent or density-dependent (Iwasa, Michor, & Nowak, 2004). Here, we consider the simple birth and death process as a model for two distinct cell lines, that is, normal healthy cells and cancer cells under chemotherapy treatment. The birth and death rates differ between the two cell lines when chemotherapy treatment targets the cancer cells (Sehl, Zhou, Sinsheimer, & Lange, 2011).

Example 3: cancer therapy
Suppose there are two cell lines: a normal healthy cell population H and a cancer cell population C. Healthy cells divide into two daughter cells at rate b h or die at a rate d h , whereas cancer cells divide at rate b c or die at rate d c . With chemotherapy treatment, both healthy cells and cancer cells are affected, but the treatment is designed to target the cancer cells. Therefore, we consider the case that with targeted treatment, the death rate for cancer cells is greater than for normal healthy cells.
In a recent model for cancer stem cell therapy, Sehl et al. (2011), apply a simple birth and death process to study the effects of treatment on cancer stem cells. The following birth and death rates were assumed in two examples for treatment of chronic leukaemia, b h = .02/week, b c = b h , d h = .08/week, with two different death rates for cancer stem cells: d c = .31/week and d c = .59/week (Sehl et al., 2011). In the absence of treatment, the death rate of healthy stem cells is about .002/day (Sehl et al., 2011). At the beginning of therapy, the number of cancer cells in the population is large, X h (0) = 4, 400 and X c (0) = 17, 600 (Sehl et al., 2011). Because b h < d h and b c < d c , the probability of extinction approaches one (Equation (8)), lim t→∞ p 0 (t) = 1, for both healthy and cancer stem cells. The rate of convergence depends on e (b−d)t . During treatment, an important question is the time to stop treatment. If chemotherapy treatment is prolonged, both healthy and cancer stem cells will be eliminated. Chemotherapy must be continued until most of the cancer stem cells are eliminated, but should cease before the healthy cell population has decreased to dangerously low levels. With cessation of treatment, the hope is that the healthy cells will resume normal growth with b h > d h . In a simple birth and death process, when the death rate exceeds the birth rate, d > b, the expression for the probability of extinction p 0 (t) in (7) is the cumulative distribution for the time to extinction. For the particular parameter values in the preceding examples, Figure 6 is a plot of p 0 (t) for healthy cells and for two cancer cell lines with different death rates. There is a distinct difference in time to extinction. Since d c d h , most of the cancer stem cells are eliminated before the healthy cells are damaged (while p 0 (t) for the healthy cells is still small). The timing of treatment is a crucial part of chemotherapy. A higher death rate d h of cancer cells shortens the time of treatment and reduces the damage to healthy cells. Increased understanding of these processes and the effects of treatment have been obtained by applying more complex stochastic models along with methods from multitype branching processes (see, e.g. Durrett, 2015;Kimmel & Axelrod, 2002).

Additional resources
Population extinction, disease outbreaks and cancer therapy represent three applications where the theory of birth and death processes provides greater insight into the extinction process. More complex models with several random variables representing multiple age classes, multiple disease stages or multiple types of cancer cells have been studied via methods from branching processes (e.g. Allen & van den Driessche, 2013;Athreya & Ney, 1972;Dorman, Sinsheimer, & Lang, 2004;Durrett, 2015;Haccou, Jager, & Vatutin, 2005;Jagers, 1975;Kimmel & Axelrod, 2002;Michor, Nowak, & Iwasa, 2006;Novozhilov, Karev, & Koonin, 2006). Interesting historical accounts about model development and applications to population dynamics and branching processes can be found in the book by Bacaër (2011) and in the Lecture Notes by Jagers (2009).