A random walk with catastrophes

Random population dynamics with catastrophes (events in which a large portion of the population may be eliminated) has a long history in the mathematical literature. In this paper we study an ergodic model for random population dynamics with linear growth and binomial catastrophes: in a catastrophe, each individual survives with some fixed probability, independently of the rest. Through a coupling construction, we obtain sharp two-sided bounds for the rate of convergence to stationarity, which we apply to show that the model exhibits a cutoff phenomenon.


Model
Consider a population with the following birth and death rules. Given two parameters p ∈ (0, 1) and c ∈ (0, 1], the population size is a discrete-time Markov chain $(X_t : t \in \mathbb{Z}_+)$ on the state space $\mathbb{Z}_+$ of non-negative integers ($\mathbb{Z}_+ := \mathbb{N} \cup \{0\}$) with transition function
$$p^{p,c}(x, x+1) = p, \qquad p^{p,c}(x, y) = (1-p)\binom{x}{y}(1-c)^y c^{x-y}, \quad 0 \le y \le x.$$
When there is no risk of ambiguity, we omit the superscripts p, c and simply write p. In words, conditioned on the history of the process up to time t, the population size at time t + 1 is determined by tossing an independent coin with probability p of success. In the case of success, the population increases by 1; in the case of a failure, also known as a catastrophe, the population at time t + 1 is an independent binomial with parameters $X_t$ and 1 − c. That is, in a catastrophe, each individual survives with probability 1 − c independently of the others, and is otherwise killed. Note that p is aperiodic and irreducible. As will be shown below, p is also geometrically ergodic in total variation. The model is a version of a subcritical branching process (the catastrophes) with linear migration (the population increase), and belongs to a larger class of stochastic models with catastrophes extensively studied in the literature. The term "catastrophe" loosely refers to events in which a large proportion, or even the whole, of the population may be wiped out. There are many ways to model catastrophes, and several have been studied in the literature; we discuss the literature in Section 1.3 below. The particular model we study corresponds to the binomial catastrophes of [25, Section 2].
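For readers who want to experiment with the model, one step of the transition kernel can be sampled as follows (a minimal sketch in Python; the function name `step` is ours):

```python
import random

def step(x, p, c, rng=random):
    """One transition from state x: with probability p a birth occurs
    (x -> x + 1); otherwise a catastrophe occurs and each of the x
    individuals survives independently with probability 1 - c, so the
    next state is Binomial(x, 1 - c)."""
    if rng.random() < p:
        return x + 1
    return sum(rng.random() < 1 - c for _ in range(x))
```

For example, `step(5, 1.0, 0.5)` always returns 6 (a birth is certain), while `step(5, 0.0, 1.0)` always returns 0 (a total catastrophe is certain).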

Motivation
Our original interest in the model came from a curiously strong persistence feature we observed in simulations: repulsion from zero and long fluctuations in a narrow band before first hitting zero. Figure 1 shows a simulation of the model for p = 0.4 and c = 0.01, between times 0 and $10^5$. The initial population size is $X_0 = 10$. The population climbs quickly and then fluctuates in a narrow band around an empirical mean close to 66.667 for a very long time. In Section 2 we show that the process is mean-reverting around the mean of its stationary distribution. Corollary 4.5 shows that already after 1500 steps the total variation distance between the process and its stationary distribution is bounded above by 0.001. These results, along with the fact, shown in Section 2, that the expectation of the first extinction time is of order $10^{24}$, give at least a partial explanation of the simulations.
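This behavior is easy to reproduce. The following sketch (our own code, with a shorter horizon than in the figure) simulates the chain and reports the empirical mean after a burn-in period; for these parameters it lands close to the value 66.667 quoted above, and the path never touches zero:

```python
import random

def simulate(p, c, x0, steps, seed=0):
    """Simulate the chain: birth w.p. p, binomial catastrophe w.p. 1 - p."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(steps):
        if rng.random() < p:
            x += 1
        else:
            x = sum(rng.random() < 1 - c for _ in range(x))
        path.append(x)
    return path

path = simulate(p=0.4, c=0.01, x0=10, steps=20000)
tail = path[2000:]               # discard a burn-in period
mean = sum(tail) / len(tail)     # close to 66.667
hits_zero = 0 in path            # extinction is (astronomically) unlikely
```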
Additional motivation for our work on the model lies in its amenability to coupling methods yielding sharp bounds on the rate of convergence to stationarity. These allow us to prove that the process exhibits the cutoff phenomenon. These results form the bulk of our work.

Literature
Stochastic models with catastrophes have been studied in the mathematical literature since the mid-1970s; for a first systematic account and a review of the early literature, see [3]. For motivation and background in the biological sciences see, for instance, [9,10,19,23]. Most of the work in the literature concerns either continuous-time (generalized) birth and death chains with catastrophes or ODE-based models with random disturbances. For a recent review and an extensive bibliography, see [16]. The persistence feature is discussed for models with catastrophes in, for instance, [6,23]. We remark that, despite the variety of mathematical approaches to modeling population catastrophes, some results seem to be of a universal nature and are exhibited by models of different types. As an example, we mention the logarithmic dependence of the first extinction time on the initial population size, which we discuss in Section 5.2.3.
As mentioned above, the model we study is the binomial-catastrophe case of the model introduced by Neuts in [25, Section 2]. A continuous-time analogue of our model was introduced in [3, Section 4]. For recent progress, see [1,7,16]. In our model, deaths occur in a branching fashion, and in Section 5.1 we reformulate and discuss the model as a special branching process with immigration in a random environment. The study of branching processes as models of population growth with catastrophes (or disasters) goes back to at least [15], where a branching process without immigration is considered. Due to their tractability, models with deterministic growth between catastrophes, so-called semi-stochastic models, have received much attention in the literature [5,10,11,20].

[Figure 1: Long fluctuations of the random walk around its average before hitting zero]
Many results in the literature focus on the phase transition between survival and non-survival; see [2] and [14]. Results concerning first extinction times and stationary distributions are typically given in terms of Laplace transforms or generating functions; see [2] and [16].

Organization
In Section 2 we give a probabilistic representation of the stationary distribution of the process. The bulk of our contribution is reported in Sections 3 and 4. In Section 3 we introduce a coupling and use it to compute sharp bounds on the total variation distance between the distributions of the process started from two different initial states. In Section 4 we consider a sequence of models whose stationary distributions converge to a Poisson limit. We show that this sequence exhibits a cutoff phenomenon, namely, on a certain time scale the total variation distance to the stationary distribution drops from one to zero within a narrow time window. Our study of both topics appears to be original in the context of stochastic models with catastrophes, and we are not aware of similar results in the literature for any model of this type. Finally, in Section 5 we estimate the first extinction time and use a branching representation for several purposes.
Throughout the paper, the notation $a_n \sim b_n$ stands for $\lim_{n\to\infty} a_n/b_n = 1$, and $X \overset{d}{\sim} Y$ indicates that the random variables X and Y have the same distribution.

Stationary distribution

Representation formula
Given a $\mathbb{Z}_+$-valued random variable R and ε ∈ [0, 1], write Bin(R, ε) for a random variable which, conditioned on R, is binomial with parameters R and ε. We begin with the following lemma, whose proof is omitted.
Lemma 2.1. Suppose that $R_0, R_1, \dots$ are independent $\mathbb{Z}_+$-valued random variables, let $\varepsilon_0, \varepsilon_1, \varepsilon_2, \dots$ be a sequence taking values in $[0,1]$, and let $\varepsilon \in (0,1]$. Set $R := \sum_{j=0}^{\infty} \mathrm{Bin}_j(R_j, \varepsilon_j)$, where the summands are independent, and assume that R is almost surely finite. Then $\mathrm{Bin}(R, \varepsilon)$ has the same distribution as $\sum_{j=0}^{\infty} \mathrm{Bin}_j(R_j, \varepsilon\varepsilon_j)$.

For $\alpha \in (0, 1]$, write $\mathrm{Geom}^-(\alpha)$ for the shifted geometric distribution with probability mass function $\alpha(1-\alpha)^k$, $k \in \mathbb{Z}_+$. The following proposition gives the stationary distribution for X. Note that [25, formula (12)] gives the generating function of the stationary distribution for a class of Markov chains containing our model; the next proposition gives a probabilistic representation of the stationary distribution for our model. An interpretation through the branching-process representation is discussed in Section 5.1.
Proposition 2.2. Let $R_0, R_1, \dots$ be i.i.d. $\mathrm{Geom}^-(1-p)$ random variables, let $R := \sum_{j=0}^{\infty} \mathrm{Bin}_j(R_j, (1-c)^j)$ as in Lemma 2.1, and let π be the distribution of R. Then π is stationary for p.
In the degenerate case c = 1, π is Geom − (1 − p)-distributed. In Section 5.1 we discuss the case when p and c are both close to one.
Proof of Proposition 2.2. Suppose that $X_0 \sim R$. We verify that $X_1 \sim R$ through the generating function of $X_1$, computed for $s \in [0, 1]$. By Lemma 2.1, the sum of $\mathrm{Bin}(X_0, 1-c)$ and an independent copy of $R_0$ has the same distribution as $X_0$; in other words, $X_1 \sim R$.
Before continuing to our next topic, we briefly discuss several related observations. Using Proposition 2.2 and identity (1), the mean of the stationary distribution is
$$\mu := E_\pi[X] = \sum_{j=0}^{\infty} \frac{p}{1-p}\,(1-c)^j = \frac{p}{(1-p)c}.$$
(For p = 0.4 and c = 0.01 this gives µ ≈ 66.67, matching the empirical mean in Figure 1.)
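The representation in Proposition 2.2 suggests a direct sampler for π: draw i.i.d. shifted-geometric variables $R_j$ and thin the j-th one by $(1-c)^j$. The sketch below (our code; the truncation tolerance is an implementation choice) checks the mean formula µ = p/((1 − p)c) by Monte Carlo:

```python
import random

def sample_pi(p, c, rng, tol=1e-12):
    """Sample pi as sum_j Bin_j(R_j, (1-c)^j) with R_j iid Geom^-(1-p).
    Terms whose thinning probability falls below tol are dropped; their
    total contribution to the mean is negligible."""
    total, eps = 0, 1.0
    while eps > tol:
        r = 0                      # R_j ~ Geom^-(1-p): number of
        while rng.random() < p:    # successes before the first failure
            r += 1
        total += sum(rng.random() < eps for _ in range(r))
        eps *= 1 - c
    return total

rng = random.Random(1)
p, c = 0.4, 0.1
draws = [sample_pi(p, c, rng) for _ in range(5000)]
mc_mean = sum(draws) / len(draws)
exact_mean = p / ((1 - p) * c)   # = 20/3 for these parameters
```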
Let τ be the hitting time of 0, also called the first extinction time: $\tau := \inf\{t \ge 0 : X_t = 0\}$. In the biological literature, the expected value of τ is often referred to as the persistence time of the model [5,22,23]. For p < 1/2, these identities yield an explicit estimate of the persistence time; for p = 0.4 and c = 0.01 it is of order $10^{24}$, as mentioned in the introduction.

Mean Reversal
It follows from Proposition 2.2 that the local drift of X satisfies
$$E[X_{t+1} - X_t \mid X_t = x] = p - (1-p)c\,x = (1-p)c\,(\mu - x).$$
Note that the local drift of X has sign opposite to the deviation from µ; thus the random walk always drifts toward its expected value. We also comment that the probability of hitting 0 in the next step decays geometrically in the state of the system, that is,
$$P(X_{t+1} = 0 \mid X_t = x) = (1-p)\,c^x.$$
These observations suggest that the process tends to fluctuate about its mean before the first extinction, as can be seen in the simulation of Figure 1.

Coupling and convergence to stationarity

Construction of the coupling
The key result of this section is a coupling of the probability laws $P_x(X_t \in \cdot)$ and $P_y(X_t \in \cdot)$, obtained from a simple representation of the process. Let $x, y \in \mathbb{Z}_+$ with x < y. Set $X_0 = x$, $X'_0 = y$ and $H_0 = y - x$. We continue inductively, assuming $((X_s, X'_s, H_s), s \le t)$ has been defined and $X'_s = X_s + H_s$ for all $s \le t$. Conditioned on $((X_s, X'_s, H_s), s \le t)$, • With probability p, independently of the past, $H_{t+1} = H_t$, $X_{t+1} = X_t + 1$ and $X'_{t+1} = X'_t + 1$.
• Otherwise, that is with probability 1 − p, set $X_{t+1} = \mathrm{Bin}(X_t, 1-c)$ and $H_{t+1} = \mathrm{Bin}(H_t, 1-c)$, independent of each other and of the past. Moreover, set $X'_{t+1} = X_{t+1} + H_{t+1}$.
It immediately follows that X and X' are both copies of our Markov chain and that $X'_t = X_t + H_t$ for all t. In addition, the process $(H_t : t \in \mathbb{Z}_+)$ is non-increasing. Write $P_{x,y}$ and $E_{x,y}$ for the joint distribution and expectation of X and X'. Let ξ be the coupling time of the two marginal processes, that is, $\xi := \inf\{t \in \mathbb{Z}_+ : H_t = 0\}$. If $H_t > 0$, then $H_{t+1} = H_t$ with probability equal to $p + (1-p)(1-c)^{H_t}$. Therefore, it immediately follows that under $P_{x,y}$, ξ is stochastically dominated by a sum of y − x independent copies of geometric random variables with parameter (1 − p)c. Hence ξ < ∞, $P_{x,y}$-a.s., and ξ has a geometric tail. Furthermore, $X_t = X'_t$ for all t ≥ ξ. Let $N_t$ denote the number of catastrophes up to time t. Then $N_t \overset{d}{\sim} \mathrm{Bin}(t, 1-p)$. It follows from the construction of the coupling that
$$P_{x,y}(\xi > t \mid N_t) = 1 - \big(1 - (1-c)^{N_t}\big)^{y-x} \le (y-x)(1-c)^{N_t},$$
where the last inequality is due to Bernoulli's inequality. Letting $\alpha := 1 - c(1-p)$, we have $E\big[(1-c)^{N_t}\big] = \big(p + (1-p)(1-c)\big)^t = \alpha^t$. Therefore
$$P_{x,y}(\xi > t) \le (y-x)\,\alpha^t.$$
We comment that this bound is asymptotically sharp as t → ∞, that is, $P_{x,y}(\xi > t) \sim (y-x)\alpha^t$, as can be seen by expanding the expression $(1 - (1-c)^{N_t})^{y-x}$ through the binomial theorem and taking expectations.
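The tail bound $P_{x,y}(\xi > t) \le (y-x)\alpha^t$ is easy to check by simulation. In the sketch below (our code), only the surplus process H needs to be tracked, since the coupling time depends on the coupled pair only through H:

```python
import random

def coupling_time(x, y, p, c, rng):
    """Coupling time xi: run the surplus H_t = X'_t - X_t, which stays
    put on a birth step and is thinned to Bin(H_t, 1 - c) in a
    catastrophe; xi is the first time H hits 0."""
    h, t = y - x, 0
    while h > 0:
        t += 1
        if rng.random() >= p:  # catastrophe, probability 1 - p
            h = sum(rng.random() < 1 - c for _ in range(h))
    return t

rng = random.Random(2)
p, c, x, y, t = 0.4, 0.2, 0, 5, 30
alpha = 1 - c * (1 - p)
runs = 4000
tail = sum(coupling_time(x, y, p, c, rng) > t for _ in range(runs)) / runs
bound = (y - x) * alpha ** t   # about 0.108 for these parameters
```

The empirical tail probability stays below the bound (up to Monte Carlo noise), and is close to it, reflecting the asymptotic sharpness noted above.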

Upper bounds on total variation
Recall that the total variation distance between two probability measures $Q_1$ and $Q_2$ on $\mathbb{Z}_+$ is defined as $\|Q_1 - Q_2\|_{TV} := \sup_{A \subset \mathbb{Z}_+} |Q_1(A) - Q_2(A)|$, and write $d_t(x, y)$ for the total variation distance between $P_x(X_t \in \cdot)$ and $P_y(X_t \in \cdot)$. By Aldous' coupling inequality [28], $d_t(x, y) \le P_{x,y}(\xi > t)$. By combining this inequality and (12) we have proved

Proposition 3.1. For all $x, y \in \mathbb{Z}_+$ and $t \in \mathbb{Z}_+$, $d_t(x, y) \le |y - x|\,\alpha^t$.

Recall µ from (5). In particular, writing $d_t(x, \pi)$ for the total variation distance to the stationary distribution, we have $d_t(x, \pi) \le \alpha^t \sum_y |y - x|\,\pi(y)$. Proof. For any $A \subset \mathbb{Z}_+$, decomposing $\pi$ over initial states y and applying Proposition 3.1 termwise gives the inequality. The result follows from the identity $\sum_y |y - x|\,\pi(y) = x - \mu + 2\sum_{y > x}(y - x)\,\pi(y)$.
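Proposition 3.1 can be illustrated numerically: estimate the two laws at time t by simulation and compare the empirical total variation distance with $|y-x|\alpha^t$. A rough Monte Carlo sketch (our code; the empirical distance carries sampling noise, so the comparison is only indicative):

```python
import random
from collections import Counter

def law_at(x0, p, c, t, n, rng):
    """Empirical distribution of X_t over n independent runs from x0."""
    counts = Counter()
    for _ in range(n):
        x = x0
        for _ in range(t):
            if rng.random() < p:
                x += 1
            else:
                x = sum(rng.random() < 1 - c for _ in range(x))
        counts[x] += 1
    return {k: v / n for k, v in counts.items()}

rng = random.Random(3)
p, c, t, n = 0.4, 0.2, 25, 3000
mu_x = law_at(0, p, c, t, n, rng)
mu_y = law_at(6, p, c, t, n, rng)
tv = 0.5 * sum(abs(mu_x.get(k, 0.0) - mu_y.get(k, 0.0))
               for k in set(mu_x) | set(mu_y))
bound = 6 * (1 - c * (1 - p)) ** t   # Proposition 3.1 with |y - x| = 6
```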

Lower bounds on total variation
The goal of this section is to obtain a lower bound for $d_t(x, y)$ of the same order as the upper bound in Proposition 3.1. We comment that the difficulty in proving such a result stems from the state space being infinite: on a finite state space, couplings which preserve the linear ordering always satisfy this property; see [4] for a proof in the continuous-time setting. We need to introduce some notation. For $\tilde p = (p, c)$, $P^{(\tilde p)}_x$ is the law of the Markov chain X with initial state $X_0 = x$ and transition function $p^{p,c}$. We also refer to the corresponding stationary distribution as $\pi^{(\tilde p)}$.
The main result of this section is the following theorem.
Before turning to the proof, we note the following consequence. In particular, $\lim_{t\to\infty} \frac{1}{t} \ln d_t(x, y) = \ln \alpha$; thus the $L^\infty$ spectral gap of the Markov chain X is $1 - \alpha = c(1 - p)$, see [18]. The upper bound is Proposition 3.1. As for the lower bound, the ergodicity of the chain $p^{p,c}$ shows that each summand in Theorem 3.3 is strictly positive for every $j \in \mathbb{Z}_+$. We prove Theorem 3.3 through two lemmas.
Proof of Lemma 3.5. The first claim follows immediately from (8) with y = x + 1. We turn to the second claim. Conditioned on $N_t$, ξ and $X_t$ are independent. Therefore,

This gives the first identity. The distribution of $X_t$ conditioned on $N_t$ does not depend on the parameter p, and from this the result follows.

Proof of Lemma 3.6. Since for t < ξ we have $X'_t = X_t + 1$, it follows that the expectations under the summation sign are all zero, and that the last summand is also zero. Therefore the claimed identity holds, where the equality follows from Lemma 3.5. The proof of the lemma is complete.
Proof of Theorem 3.3. Let $A_j$ be as in the proof of Lemma 3.6. Then the theorem follows by virtue of Lemma 3.6.
We conclude this section with the following generalization of Lemma 3.5.
Next, repeating the argument in the proof of Lemma 3.5, we obtain the analogous identity, where the last line follows from the binomial theorem and the bounded convergence theorem. Putting it all together, the claim follows, and the proof of the theorem is complete.

Poisson limit and a cutoff phenomenon
In this section we let p and c tend to 0. We work under the following assumption.

Assumption 4.1. For $n \in \mathbb{N}$, $p_n, c_n \in (0, 1)$ with $p_n \to 0$ and $\lim_{n\to\infty} p_n/c_n = \beta \in (0, \infty)$.
We use the superscript (n) to denote the dependence of the total variation distance, probability, expectation, and stationary distribution on the parameters; e.g., the stationary distribution for the process with parameters $p_n$ and $c_n$ is denoted by $\pi^{(n)}$. Theorem 4.2 below identifies the limit of $\pi^{(n)}$ as the Poisson distribution with mean β; its proof is a routine calculation of moment generating functions and appears at the end of the section. We note that the actual form of the limit distribution is irrelevant for our next and main result of this section, the cutoff phenomenon, although we do rely on the tightness of $(\pi^{(n)} : n \in \mathbb{N})$ to prove the second claim below, namely that $\lim_{n\to\infty} d^{(n)}_t(y_n, \pi^{(n)}) = 0$ on the appropriate time scale.
Therefore, with a choice of parameters as in Theorem 4.3, the model exhibits cutoff at $t_n$ with window size $O\big(\max\big(\ln y_n, \frac{\ln\ln y_n}{c_n}\big)\big)$, see [21, p. 248]. To prove the theorem we will use the following lemma. For $t = \lambda_n(\theta) := \frac{\ln y_n + \theta}{c_n(1-p_n)}$ we obtain
$$\ln d^{(n)}_t(0, y_n) \le \ln y_n + t \ln \alpha_n \le \ln y_n + \frac{\ln y_n + \theta}{c_n(1-p_n)} \ln\big(1 - c_n(1-p_n)\big) \le \ln y_n - (\ln y_n + \theta) = -\theta,$$
from which the first assertion of the lemma follows.
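The chain of inequalities above is mechanical to verify numerically. In the sketch below (our code; the parameter values are arbitrary), evaluating the upper bound $\ln y + t \ln \alpha$ at $t = (\ln y + \theta)/(c(1-p))$ always yields a value at most $-\theta$, since $\ln(1-u) \le -u$:

```python
import math

def log_tv_upper(y, p, c, theta):
    """ln y + lambda(theta) * ln(alpha), with alpha = 1 - c(1 - p) and
    lambda(theta) = (ln y + theta) / (c(1 - p)).  Because
    ln(1 - u) <= -u, this quantity is at most -theta."""
    alpha = 1 - c * (1 - p)
    lam = (math.log(y) + theta) / (c * (1 - p))
    return math.log(y) + lam * math.log(alpha)

val = log_tv_upper(y=10**6, p=0.05, c=0.05, theta=3.0)   # <= -3
```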
In what follows, in order to simplify the notation, we will simply write ν n and γ n instead of, respectively, ν n (θ) and γ n (θ).
By the Chernoff–Hoeffding inequality, we obtain an exponentially small bound valid for any $t \le \nu_n$. On the other hand, under $P^{(n)}_{y_n}$, $X_t$ stochastically dominates $\mathrm{Bin}(y_n, (1-c_n)^{N_t})$, which in turn dominates $\mathrm{Bin}(y_n, (1-c_n)^t)$. Notice that
$$y_n (1-c_n)^{\nu_n} = \frac{p_n}{c_n} \cdot \ln y_n \cdot e^{\theta/(\ln y_n)^{1/4}}.$$
Thus, for n large enough, we may compare $y_n(1-c_n)^{\nu_n}$ with $\gamma_n$, where in the penultimate step we use the inequality $e^{-x} \le 1 - \frac{x}{2}$, valid for any sufficiently small x > 0, with $x = \theta/(\ln y_n)^{1/4}$. Therefore, by the Chernoff–Hoeffding inequality, for any $t \le \nu_n$,
$$P^{(n)}_{y_n}\big(X_t \le \gamma_n\big) \le P^{(n)}_{y_n}\big(\mathrm{Bin}(y_n, (1-c_n)^t) \le \gamma_n\big) \le P^{(n)}_{y_n}\big(\mathrm{Bin}(y_n, (1-c_n)^{\nu_n}) \le \gamma_n\big) \le \exp\left(-\frac{y_n(1-c_n)^{\nu_n}\,\theta^4}{48 \ln y_n}\right).$$
Taking into account (15), this implies the required bound, from which the second claim of the lemma follows.
In order to obtain simpler expressions to work with, we observe that for θ large enough, independently of n, the estimates above simplify. We are now ready to prove Theorem 4.3.
The result now follows from this, combined with the first claim in Corollary 4.5.
We conclude this section with the proof of Theorem 4.2.

Proof of Theorem 4.2. Let $Z_n$ be a random variable distributed according to $\pi^{(n)}$, and set $q_n := 1 - c_n$ and $\Lambda_n(t) := \ln E[e^{-t Z_n}]$. By Proposition 2.2 we can write $Z_n$ as an independent sum of thinned geometric random variables and compute $\Lambda_n$ termwise. Since $\sum_{j=0}^{\infty} q_n^{2j} \le \sum_{j=0}^{\infty} q_n^j = \frac{1}{c_n}$, the error terms vanish in the limit. We have thus proved that $\lim_{n\to\infty} \Lambda_n(t) = -\beta(1 - e^{-t})$, the log-Laplace transform of the Poisson distribution with mean β.
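The Poisson limit can be observed directly: taking $p_n = \beta c_n$ with $c_n$ small, samples from $\pi^{(n)}$ (drawn via the representation of Proposition 2.2) are close in total variation to Poiss(β). A Monte Carlo sketch, our code; the truncation tolerance and sample sizes are arbitrary choices:

```python
import math
import random

def sample_pi(p, c, rng, tol=1e-6):
    """Sample pi as sum_j Bin_j(R_j, (1-c)^j), R_j iid Geom^-(1-p)."""
    total, eps = 0, 1.0
    while eps > tol:
        r = 0
        while rng.random() < p:
            r += 1
        total += sum(rng.random() < eps for _ in range(r))
        eps *= 1 - c
    return total

beta, c = 2.0, 0.02
p = beta * c                       # p/c = beta, with both parameters small
rng = random.Random(4)
n = 3000
counts = {}
for _ in range(n):
    k = sample_pi(p, c, rng)
    counts[k] = counts.get(k, 0) + 1
emp_mean = sum(k * v for k, v in counts.items()) / n
ks = range(max(counts) + 1)
poisson = [math.exp(-beta) * beta**k / math.factorial(k) for k in ks]
tv = 0.5 * (sum(abs(counts.get(k, 0) / n - poisson[k]) for k in ks)
            + (1.0 - sum(poisson)))   # Poisson mass beyond the support
```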

Branching process representation
We adopt a scheme of Key [17] for general branching processes with immigration in a random environment to give a probabilistic interpretation of this particular instance of Neuts' formula [25]. Using the approach of [1], we compute the generating function of the extinction time in Section 5.2.3.
The process X can be thought of as a branching process with immigration in a random environment. Branching processes have been used to model the growth of a population subject to random catastrophes by many authors (see, for instance, the comprehensive literature review in [16]); the idea goes back to at least [15], where a branching process in a random environment (without immigration) was considered. In this section we use a branching representation of our process and Key's [17] representation of its stationary distribution for several purposes.
First, it yields Lemma 5.1 below, stating that the extinction time τ has exponential tails. Next, it provides an illuminating probabilistic representation of the invariant distribution π of our process, including in the extreme case of rare but nearly total catastrophes (see the discussion after Proposition 5.2, and Theorem 5.3 below).
Let
$$\omega_t = \begin{cases} 1 & \text{if a birth event occurs at time } t, \\ 0 & \text{if a catastrophe occurs at time } t. \end{cases}$$
We refer to the sequence $\omega := (\omega_t)_{t\in\mathbb{Z}_+}$ as a random environment. We denote the distribution of the environment by P, the law of the process conditional on the environment by $P^\omega$, and the corresponding expectation by $E^\omega$. The Markov process X can be described using the following branching equation:
$$X_t = \sum_{i=1}^{X_{t-1}} U_{t,i} + I_t,$$
where $I_t = \omega_t$ is interpreted as the number of immigrants joining the system at generation t and $U_{t,i}$ as the number of progeny of the i-th particle. Under the probability law conditional on the environment, the $U_{t,i}$ are independent Bernoulli variables with parameter $1 - c_t$, where $c_t := c(1 - \omega_t)$, that is, $c_t = c$ in a catastrophe and $c_t = 0$ in a birth step. In statistical applications, this special type of branching process with Bernoulli reproduction mechanism is often referred to as an RCINAR(1), a random coefficient integer-valued autoregressive process of order one [31]. In this context, (20) is written as $X_t = (1-c_t) * X_{t-1} + I_t$, where $(1-c_t)\,*$ describes the action of a binomial thinning operator [24,30]. The stationary distribution of branching processes with immigration in a random environment was studied, in a general (and, in fact, multi-type) setting, in [17]. In particular, it follows from the results in [17] that the random variable τ has exponential distribution tails (to deduce this, one may replace $I_t$ by 1 in (20) so as to formally use Theorem 4.2 in [17], and then apply a stochastic dominance argument). We state this formally as

Lemma 5.1. There exist a, b > 0 such that $P_0(\tau > t) \le a e^{-bt}$ for any t ≥ 0.
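In the RCINAR(1) notation, the dynamics reduce to one line of code once the binomial thinning operator is implemented. A sketch (our code, with the survival probability written explicitly):

```python
import random

def thin(theta, x, rng):
    """Binomial thinning theta * x: a sum of x iid Bernoulli(theta)."""
    return sum(rng.random() < theta for _ in range(x))

def rcinar_step(x, p, c, rng):
    """X_t = (1 - c_t) * X_{t-1} + I_t: in a birth step (omega_t = 1,
    probability p) the thinning parameter is 1 and one immigrant
    arrives; in a catastrophe (omega_t = 0) the thinning parameter is
    1 - c and there is no immigration."""
    if rng.random() < p:            # omega_t = 1
        return thin(1.0, x, rng) + 1
    return thin(1 - c, x, rng)      # omega_t = 0
```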
We next consider a branching process obtained from X by sampling at the times when catastrophes occur. This auxiliary process has a slightly simpler structure than the underlying process X. We use it below to obtain an alternative probabilistic representation of the stationary distribution of X.
Let $T_0 = 0$ and, for n ≥ 1, let $T_n$ be the time of the n-th catastrophe. Observe that $(T_n - T_{n-1} : n \ge 1)$ is an i.i.d. sequence of Geom(1 − p) random variables. Let $Z_n = X_{T_n}$ and $Z := (Z_n)_{n\in\mathbb{Z}_+}$.
Proposition 5.2. The Markov chain Z has a unique stationary distribution $Z_\infty$; in the language of Proposition 2.2, $Z_\infty \overset{d}{\sim} \sum_{j=1}^{\infty} \mathrm{Bin}_j(R_j, (1-c)^j)$. Considering R as an immigration process, Z can be constructed as a branching process with immigration, governed by a branching identity in which the offspring variables $V_{t,k}$ are i.i.d. Bernoulli random variables, independent of the immigration process and of $Z_0$. The result thus follows from Theorem 4.2 in [17].
We remark that an auxiliary process similar to our (Z n ) n∈Z + has been used, for instance, in [7,15] to derive the stationary distribution for different models with catastrophes.
Following the representation of the stationary distribution in [17], one can write $Z_\infty$ as a sum over past immigration, where $Z_{k,0} \overset{d}{\sim} \mathrm{Bin}_{|k|}(R_{|k|}, (1-c)^{|k|})$ is the number of descendants alive at time zero of an immigrant arriving at time k < 0. Heuristically, in this representation $Z_\infty$ is the population at time zero of a branching process that starts at minus infinity [17]. Between two regeneration times $T_n$, the process goes up a $\mathrm{Geom}^-(1-p)$ number of times. When one observes the original chain in the stationary regime, the chain sits at a random time between two successive times $T_n$. This suggests (using the key renewal theorem) that the stationary distribution of the original Markov chain should be the convolution of $Z_\infty$ and an independent $\mathrm{Geom}^-(1-p)$ variable. The result is formally confirmed in Proposition 5.2. We conclude this section with a brief discussion of the case of "severe but rare" catastrophes; for biological motivation of this regime see, for instance, [13,19,26,27,29]. Specifically, we consider a sequence of parameters $(p_n, c_n)$ such that $p_n \to 1$, $c_n \to 1$ as n → ∞, and $\lim_{n\to\infty} \frac{1-c_n}{1-p_n} = \beta$ for some β. We denote by $R^{(n)}$ the stationary distribution of the n-th model, given by Proposition 2.2. It is then not hard to verify the following result (Theorem 5.3), in which $A_n$ is independent of $R_0$ and converges in distribution, as n → ∞, to Poiss(β).
Proof. Recall (24), and set $x_n(k) := \frac{p_n(1-c_n)^k}{1-p_n}$, $k \in \mathbb{Z}_+$, $n \in \mathbb{N}$. To estimate the right-hand side, one can apply to $x_n(k)$ the inequality $x - \frac{x^2}{2} \le \ln(1+x) \le x$, which is true for all x > 0 sufficiently small (and hence, uniformly in k, for all $x_n(k)$ with n large enough), taking into account that $x_n(k)$ is monotone decreasing in k. Thus $\ln E\big[s^{R^{(n)}}\big]$ converges, as n → ∞, to $-\beta(1-s)$, and the proof of the theorem is complete.
Note that, in view of Proposition 5.2, Poiss(β) is the limit in distribution of $Z_\infty$. Furthermore, using (23) and a similar representation for the underlying branching process X, one can, by virtue of the renewal theorem, interpret $-R_0$ as the time of the last catastrophe before time zero and $A_n$ as the distribution of the population right after the last catastrophe in the stationary branching process $(X_t)_{t\in\mathbb{Z}}$.

Overview
In this section we discuss the following two aspects of the first extinction time τ:
• the asymptotic behavior of τ for a large initial population;
• the generating function of τ.

Asymptotic for large population
In this section we discuss the asymptotic behavior of the first extinction time τ when the process starts from a large population. To do so we use the coupling construction of Section 3.1. Consider the processes $X^{(0)}_t$ and $X^{(n)}_t$ with initial populations 0 and n, respectively. From our coupling we know that $X^{(0)}_t \le X^{(n)}_t$ for every t ≥ 0. Let $\tau^{(n)}$ and $\xi^{(n)}$ be the hitting times of 0 by $X^{(n)}$ and by the surplus process $H^{(n)}$, respectively. Then $\tau^{(n)}$ and $\xi^{(n)}$ are both nondecreasing in n.
Let $T_0 = 0$ and let $T_1, T_2, \dots$ be the increasing sequence of times at which $X^{(0)}$ visits 0. Then clearly $\tau^{(n)}$ is the first $T_i$ after $\xi^{(n)}$; this is because $X^{(n)}_t = X^{(0)}_t$ for all $t \ge \xi^{(n)}$, while $X^{(n)}_t > X^{(0)}_t \ge 0$ for $t < \xi^{(n)}$. Let $\rho^{(n)} := \tau^{(n)} - \xi^{(n)}$. Then $\rho^{(n)}$ depends on the past of the coupled system only through the size of the population $X^{(n)}_{\xi^{(n)}}$; thus its distribution coincides with the distribution of the extinction time started from $X^{(0)}_{\xi^{(n)}}$. By the ergodicity of $X^{(0)}$, and the fact that $\xi^{(n)} \to \infty$ a.s. as n → ∞, it follows that $\rho^{(n)}$ converges weakly to the distribution of τ under π. We have proved the following:

Proposition 5.4. $\tau^{(n)} - \xi^{(n)}$ converges in distribution to $P_\pi(\tau \in \cdot)$ as n → ∞.
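The logarithmic dependence of the first extinction time on the initial population size, mentioned in the literature overview, shows up clearly in simulation: multiplying the initial size by 10 adds a roughly constant amount to the mean extinction time. A Monte Carlo sketch, our code, with parameters chosen so that extinction is quick:

```python
import random

def extinction_time(n, p, c, rng):
    """First hitting time of 0 for the chain started from n."""
    x, t = n, 0
    while x > 0:
        t += 1
        if rng.random() < p:
            x += 1
        else:
            x = sum(rng.random() < 1 - c for _ in range(x))
    return t

rng = random.Random(6)
p, c, runs = 0.2, 0.5, 1000
means = {n: sum(extinction_time(n, p, c, rng) for _ in range(runs)) / runs
         for n in (10, 100, 1000)}
# The increments means[100]-means[10] and means[1000]-means[100] are
# roughly equal, consistent with E_n[tau] growing like log n.
```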

Generating function
For $s \in [0, 1]$, let $a_n(s) = E_n[s^\tau]$ and $\psi(s, z) = \sum_{n=1}^{\infty} a_n(s) z^n$. Note that $a_0 = 1$. The process has the following first-step decomposition:
$$a_n(s) = s\left(p\,a_{n+1}(s) + (1-p)\sum_{k=0}^{n} \binom{n}{k}(1-c)^k c^{n-k}\, a_k(s)\right), \qquad n \ge 1,$$
where, throughout, $I_A$ stands for the indicator of the event A, namely $I_A(\omega) = 1$ if $\omega \in A$ and $I_A(\omega) = 0$ if $\omega \notin A$, and $\omega_t$ is defined in (19). The generating function ψ(s, z) can be evaluated using (26) and an analytical method of [1]. The proof of the theorem is similar to the proof of Theorem 3.1, part (ii), in [1]: an application of (26) leads to a recursive equation for the generating function ψ(s, z) of a type that has been analyzed in [1]. We comment that through the recurrence relation (28), we can obtain an explicit formula for $E_n[s^\tau]$ for each $n \in \mathbb{N}$. The proof below is provided for the sake of completeness; set $\varphi(z) := \psi(s, z)(ps - z)$.
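Since the recursion couples $a_n$ to all $a_k$ with $k \le n+1$, a closed-form solution requires the generating-function machinery of [1]. A Monte Carlo estimate of $a_n(s) = E_n[s^\tau]$, however, is immediate, and can be used to sanity-check the first-step decomposition; a sketch (our code; the parameters are arbitrary, chosen so that extinction is quick):

```python
import random

def a_hat(n, s, p, c, runs, seed):
    """Monte Carlo estimate of a_n(s) = E_n[s^tau]."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(runs):
        x, t = n, 0
        while x > 0:
            t += 1
            if rng.random() < p:
                x += 1
            else:
                x = sum(rng.random() < 1 - c for _ in range(x))
        acc += s ** t
    return acc / runs

p, c, s = 0.2, 0.5, 0.9
a1 = a_hat(1, s, p, c, 20000, seed=7)
a2 = a_hat(2, s, p, c, 20000, seed=8)
# First-step decomposition at n = 1, using a_0 = 1 and
# Bin(1, 1-c) = 0 w.p. c, = 1 w.p. 1-c:
lhs = a1
rhs = s * (p * a2 + (1 - p) * (c * 1.0 + (1 - c) * a1))
```

Up to Monte Carlo error, `lhs` and `rhs` agree, confirming the recursion for this state.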