Non-homogeneous random walks on a semi-infinite strip

We study the asymptotic behaviour of Markov chains $(X_n,\eta_n)$ on $\mathbb{Z}_+ \times S$, where $\mathbb{Z}_+$ is the non-negative integers and $S$ is a finite set. Neither coordinate is assumed to be Markov. We assume a moments bound on the jumps of $X_n$, and that, roughly speaking, $\eta_n$ is close to being Markov when $X_n$ is large. This departure from much of the literature, which assumes that $\eta_n$ is itself a Markov chain, enables us to probe precisely the recurrence phase transitions by assuming asymptotically zero drift for $X_n$ given $\eta_n$. We give a recurrence classification in terms of increment moment parameters for $X_n$ and the stationary distribution for the large-$X$ limit of $\eta_n$. In the null case we also provide a weak convergence result, which demonstrates a form of asymptotic independence between $X_n$ (rescaled) and $\eta_n$. Our results can be seen as generalizations of Lamperti's results for non-homogeneous random walks on $\mathbb{Z}_+$ (the case where $S$ is a singleton). Motivation arises from modulated queues or processes with hidden variables where $\eta_n$ tracks an internal state of the system.


Introduction
There are many applications that naturally give rise to Markov processes on a product state-space X × S, where S describes some operating regime or internal state of the system, which influences the motion of the process in the primary space X. Important classes of examples include, among others:
• modulated queues, in which S may contain operating states of the servers or other auxiliary information, such as the size of a retrial buffer, as arise in various applications such as those described by Neuts in [25];
• regime-switching processes in mathematical finance or ecology, where S may contain market or other environmental information;
• physical processes with internal degrees of freedom, where S may describe internal energy or momentum states of a particle, such as adopted by Sinai as a tool for studying the Lorentz gas (see e.g. [17]), or exemplified by the so-called correlated or persistent random walk.
In several of the key examples, the S-component of the process is 'hidden', and the main interest is in the asymptotic behaviour of the X-component of the process.
In the most classical setting, the projection of the process onto S is itself Markovian. In this case, the queueing models become Markov-modulated [25], while other examples fit into the class of Markov random walks [13]. This case also includes processes that can be represented as additive functionals of Markov chains [26]. Such models pose a variety of mathematical questions, which have been studied rather deeply over several decades using various techniques that take advantage of the additional Markov structure, and much is now known.
Much less is known when the process projected onto S is not Markovian: the main focus of the present work is to replace the Markovian assumption by a weaker (asymptotic) condition that provides sufficient structure. This relaxation is necessary to probe more intimately the recurrence-transience phase transition for these models, since the natural setting (paralleling the classical work of Lamperti) is to suppose that the law of the process is non-homogeneous in X, in particular, the mean drift of the X-component of the process will be asymptotically zero. This non-homogeneity precludes, in general, the S-component of the process from being Markovian, but admits our weaker conditions.
As an example, consider the following queueing model. A queue is served by a single server and experiences arrivals at rate λ; the service rate is modulated by an internal state of the server, $\eta_n$, as well as by the length of the queue, $X_n$ (in discrete time, i.e., in terms of the jump process). Allowing the service rate to depend on the queue length distinguishes this model from the class of semi-Markov queues [25]. When $(X_n, \eta_n) = (x, i)$, $x \ge 1$, the service rate is $\rho_i(x) = \rho \bigl( 1 - \frac{2c_i}{x} \bigr)$, where the $c_i$, $i \in S$, are parameters of the model with $|c_i| < 1/2$. In the case where $c_i \equiv 0$ for all $i$, the internal states of the server are indistinguishable and the model is simply (the jump process of) an M/M/1 queue with arrival rate λ and service rate ρ; the critical case from the point of view of recurrence and transience is ρ = λ, and so that is the most interesting setting to perturb with non-zero $c_i$. So we take ρ = λ from now on. The specification of the model is completed by stipulating that whenever an arrival (departure) occurs, the internal state of the server transitions according to the stochastic matrix $(a_{ij})$ (respectively, $(b_{ij})$). In other words, given $(X_n, \eta_n) = (x, i)$, $x \ge 1$,
$$(X_{n+1}, \eta_{n+1}) = \begin{cases} (x+1, j) & \text{with probability } \frac{\lambda}{\lambda + \rho_i(x)}\, a_{ij}, \\ (x-1, j) & \text{with probability } \frac{\rho_i(x)}{\lambda + \rho_i(x)}\, b_{ij}. \end{cases}$$
Given $(X_n, \eta_n) = (0, i)$, $(X_{n+1}, \eta_{n+1}) = (1, j)$ with probability $a_{ij}$.
In general, $(\eta_n)$ is not itself a Markov chain, so this model falls outside the usual Markov-modulated queue framework. However, for large queue lengths the probabilities of arrival and departure are approximately equal, and so the $\eta_n$ process should be well approximated by the Markov chain on S with transition matrix $M_{ij} = \frac{1}{2}(a_{ij} + b_{ij})$. Under the condition that the matrix M be irreducible, our results determine conditions for transience and recurrence in terms of the stationary distribution of the chain with transition matrix M and the constants $c_i$.
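The dynamics of this queue are easy to explore in simulation. The following is a minimal sketch, under our reading of the jump-chain dynamics (arrival with probability $\lambda/(\lambda + \rho_i(x))$, departure otherwise, with the internal state updated by the matrix a or b accordingly, and $\rho_i(x) = \lambda(1 - 2c_i/x)$); the function name and parameter choices are ours, for illustration only.

```python
import random

def simulate_queue(a, b, c, lam=1.0, steps=10000, seed=0):
    """Simulate the jump chain of the modulated queue (illustrative sketch).

    Assumed dynamics: from (x, i) with x >= 1, an arrival occurs with
    probability lam / (lam + rho_i(x)), where rho_i(x) = lam*(1 - 2*c[i]/x),
    and the internal state moves according to row i of `a`; otherwise a
    departure occurs and the state moves according to row i of `b`.
    From (0, i) the next state is (1, j) with probability a[i][j].
    Returns the final state (x, i)."""
    rng = random.Random(seed)
    states = list(range(len(c)))
    x, i = 0, 0
    for _ in range(steps):
        if x == 0:
            x, i = 1, rng.choices(states, weights=a[i])[0]
            continue
        rho = lam * (1.0 - 2.0 * c[i] / x)    # state-dependent service rate
        if rng.random() < lam / (lam + rho):  # arrival
            x, i = x + 1, rng.choices(states, weights=a[i])[0]
        else:                                 # departure
            x, i = x - 1, rng.choices(states, weights=b[i])[0]
    return x, i
```

With $c_i \equiv 0$ this reduces to the (critical) M/M/1 jump chain; making the $\bar{c}$ of Section 3 strongly negative visibly stabilizes the queue length.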

Model and main results
We now describe our model precisely. Our state-space is the half-strip $\mathbb{Z}_+ \times S$, where S is finite and nonempty; for $k \in S$, we call the subset $\mathbb{Z}_+ \times \{k\}$ a line. We consider an irreducible Markov chain $(X_n, \eta_n) \in \mathbb{Z}_+ \times S$ with transition probabilities
$$p(x, i, y, j) := \mathbb{P}[(X_{n+1}, \eta_{n+1}) = (y, j) \mid (X_n, \eta_n) = (x, i)],$$
and provide conditions for recurrence/transience of $(X_n)$, in a sense that we explain below. Throughout we use the notation $\mathcal{F}_n := \sigma(X_0, \eta_0, \ldots, X_n, \eta_n)$ and $\mathbb{R}_+ := [0, \infty)$. The process $(X_n)$ is typically not itself a Markov chain; under our standing assumptions, however, it does inherit the recurrence/transience dichotomy from $(X_n, \eta_n)$, as the following result shows.
Lemma 2.1. Exactly one of the following holds: either $\liminf_{n \to \infty} X_n = 0$ a.s., or $\lim_{n \to \infty} X_n = +\infty$ a.s. In the former case, we call $(X_n)$ recurrent, and in the latter case, we call $(X_n)$ transient.
If $(X_n)$ is recurrent, then we say that it is null-recurrent or positive-recurrent according to which of the alternatives in Lemma 2.2 holds.
The proofs of Lemmas 2.1 and 2.2 are standard and are omitted.
In the cases that we consider, we will assume that the displacement of the X-coordinate has bounded p-th moments for some p < ∞:
(B_p) There exists a constant $C_p < \infty$ such that, a.s., $\mathbb{E}[\,|X_{n+1} - X_n|^p \mid \mathcal{F}_n\,] \le C_p$.
In particular, (B_p) for some p > 4 will suffice for all of our results, while for some of our results p > 1 is sufficient.
Define $q_x(i, j) := \sum_{y \in \mathbb{Z}_+} p(x, i, y, j)$. We also assume:
(Q_∞) The limit $q(i, j) := \lim_{x \to \infty} q_x(i, j)$ exists for all $i, j \in S$, and $(q(i, j))$ is an irreducible stochastic matrix.
Note that since $\sum_{j \in S} q_x(i, j) = 1$, the limit in (Q_∞) is necessarily stochastic; however, the irreducibility of $(q(i, j))$ does not follow from the irreducibility of $(q_x(i, j))$ for all $x \in \mathbb{Z}_+$. For some of our results, it is necessary to assume a stronger condition than (Q_∞) that controls the rate of convergence of $q_x(i, j)$, namely:
(Q_∞^+) Condition (Q_∞) holds and, for some $\delta_0 > 0$, $\max_{i, j \in S} |q_x(i, j) - q(i, j)| = O(x^{-\delta_0})$ as $x \to \infty$.
Given (Q_∞), we define $(\eta^\star_n)$ to be a Markov chain on S with transition probabilities given by $q(i, j)$. Since $(\eta^\star_n)$ is irreducible and S is finite, there exists a unique stationary distribution π on S with $\pi(j) > 0$ for all $j \in S$ and satisfying $\pi(j) = \sum_{i \in S} \pi(i)\, q(i, j)$.
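For any concrete q, the stationary distribution π can be computed numerically. The following sketch (our own, for illustration) uses plain power iteration, which converges when q is aperiodic, as assumed in Theorem 2.6 below; for a periodic q a direct linear solve should be used instead.

```python
def stationary_distribution(q, tol=1e-12, max_iter=100000):
    """Stationary distribution pi of an irreducible aperiodic stochastic
    matrix q on a finite set, so that pi(j) = sum_i pi(i) q(i, j).
    Power iteration from the uniform distribution (illustrative method)."""
    n = len(q)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * q[i][j] for i in range(n)) for j in range(n)]
        if max(abs(new[j] - pi[j]) for j in range(n)) < tol:
            return new
        pi = new
    return pi
```

For example, for the two-state matrix with rows (0.9, 0.1) and (0.2, 0.8), the returned π is (2/3, 1/3), which indeed satisfies $\pi(j) = \sum_i \pi(i) q(i,j)$.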
Remark 2.3. An assumption common in the literature is the following homogeneity condition:
(H) There exist $x_0 \in \mathbb{Z}_+$ and a function $r : \mathbb{Z} \times S \times S \to [0, 1]$ such that $p(x, i, y, j) = r(y - x, i, j)$ for all $x \ge x_0$,
i.e., for all x large enough, the transition probabilities depend on x and y only through y − x. Then $q_x(i, j) = q(i, j) = \sum_{z \ge -x_0} r(z, i, j)$ for all $x \ge x_0$. The homogeneity condition (H) plays an important role in much of the existing literature, but is too restrictive for our purposes. We discuss (H) and some of its consequences, including the connection to the theory of additive functionals of Markov chains, in Section 3.1 below. For now, we remark that if (H) holds for all $x \ge x_0$, then necessarily $X_{n+1} - X_n$ is uniformly bounded below (by $-x_0$).
We denote the moments of the displacements in the X-coordinate by
$$\mu_1(x, i) := \mathbb{E}[X_{n+1} - X_n \mid X_n = x, \eta_n = i], \qquad \mu_2(x, i) := \mathbb{E}[(X_{n+1} - X_n)^2 \mid X_n = x, \eta_n = i];$$
then $\mu_1$ is well defined provided (B_p) holds for some $p \ge 1$, while $\mu_2$ is finite if (B_p) holds for some $p \ge 2$. Our results will apply to the following two cases:
(M_C) There exist $d_i \in \mathbb{R}$ such that, for all $i \in S$, as $x \to \infty$, $\mu_1(x, i) = d_i + o(1)$.
(M_L) There exist $c_i \in \mathbb{R}$ and $s_i^2 \in \mathbb{R}_+$, with at least one $s_i^2$ nonzero, such that for all $i \in S$, as $x \to \infty$,
$$\mu_1(x, i) = \frac{c_i}{x} + o(x^{-1}), \qquad \mu_2(x, i) = s_i^2 + o(1).$$
Since S is finite, the implicit constants in the $x \to \infty$ error terms in these expressions (and similar ones later on) may be chosen uniformly over i. Just as above, some of our results will require a stronger assumption than (M_L) that controls the error terms as a function of x, namely:
(M_L^+) There exists $\delta_1 > 0$ such that, as $x \to \infty$,
$$\mu_1(x, i) = \frac{c_i}{x} + O(x^{-1-\delta_1}), \qquad \mu_2(x, i) = s_i^2 + O(x^{-\delta_1}).$$
Next we state our main results. The first two are concerned with the classification of the process as transient, null-recurrent, or positive-recurrent. Of these, first we consider the case where each line is associated with a drift that is asymptotically constant, and where at least one of these constants is nonzero.
Theorem 2.4. Suppose that (B_p) holds for some p > 1, and conditions (M_C) and (Q_∞) hold. Then the following classification applies.
(i) If $\sum_{i \in S} d_i \pi(i) > 0$, then $(X_n)$ is transient.
(ii) If $\sum_{i \in S} d_i \pi(i) < 0$, then $(X_n)$ is positive-recurrent.
In the special case of (Q_∞) in which $q_x \equiv q$ does not depend on x, Theorem 2.4 is contained in Theorem 3.1.2 of Fayolle et al. [9], who imposed, in part, an assumption of a uniform lower bound on $X_{n+1} - X_n$. In the generality of (Q_∞), part (ii) is contained in a paper of Falin [7], who also stated a version of part (i) assuming that (H) holds for x large enough.
The next result deals with the case of drift conditions of Lamperti-type.
Theorem 2.5. Suppose that (B_p) holds for some p > 2, and conditions (Q_∞) and (M_L) hold. The following sufficient conditions apply:
(i) if $\sum_{i \in S} (2 c_i - s_i^2)\, \pi(i) > 0$, then $(X_n)$ is transient;
(ii) if $\bigl| \sum_{i \in S} 2 c_i \pi(i) \bigr| < \sum_{i \in S} s_i^2 \pi(i)$, then $(X_n)$ is null-recurrent;
(iii) if $\sum_{i \in S} (2 c_i + s_i^2)\, \pi(i) < 0$, then $(X_n)$ is positive-recurrent.
If, in addition, (Q_∞^+) and (M_L^+) hold, then the following condition also applies (yielding an exhaustive classification):
(iv) if $\bigl| \sum_{i \in S} 2 c_i \pi(i) \bigr| = \sum_{i \in S} s_i^2 \pi(i)$, then $(X_n)$ is null-recurrent.
In the case where S is a singleton, Theorem 2.5 reduces essentially to results of Lamperti [18, 20], and so our result can be seen as a generalization of Lamperti's.
Our final main result concerns the weak convergence of $(X_n, \eta_n)$. The limit statement will involve the distribution function $F_{\alpha,\theta}$ defined for parameters $\alpha > 0$ and $\theta > 0$ by
$$F_{\alpha,\theta}(x) := \frac{1}{\Gamma(\alpha)\, \theta^\alpha} \int_0^{x^2} u^{\alpha - 1} e^{-u/\theta}\, \mathrm{d}u, \qquad x \in \mathbb{R}_+.$$
Note that, if $Z \sim \Gamma(\alpha, \theta)$ is a gamma random variable with shape parameter $\alpha > 0$ and scale parameter $\theta > 0$, then $\mathbb{P}[\sqrt{Z} \le x] = F_{\alpha,\theta}(x)$. (In the special case with $\alpha = 1/2$ and $\theta = 2$, $F_{\alpha,\theta}$ is the distribution of the square-root of a $\chi^2$ random variable with one degree of freedom, i.e., the absolute value of a standard normal random variable.)
Theorem 2.6. Suppose that (B_p) holds for some p > 4, and conditions (Q_∞) and (M_L) hold. Suppose that the matrix q appearing in (Q_∞) is aperiodic. Suppose also that $\sum_{i \in S} (2 c_i + s_i^2)\, \pi(i) > 0$. Then, for any $k \in S$ and $x \in \mathbb{R}_+$,
$$\lim_{n \to \infty} \mathbb{P}\bigl[ X_n \le x \sqrt{n},\, \eta_n = k \bigr] = \pi(k)\, F_{\alpha,\theta}(x), \qquad (2.4)$$
where
$$\alpha = \frac{1}{2} + \frac{\sum_{i \in S} c_i \pi(i)}{\sum_{i \in S} s_i^2 \pi(i)}, \qquad \theta = 2 \sum_{i \in S} s_i^2 \pi(i).$$
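The distribution function $F_{\alpha,\theta}$ is straightforward to evaluate numerically: substituting $u = v^2$ in the gamma integral removes the integrable singularity at 0 when $\alpha < 1$. The following sketch (our own, purely for illustration) checks the identification of the $\alpha = 1/2$, $\theta = 2$ case with the absolute value of a standard normal variable.

```python
import math

def F(alpha, theta, x, n=20000):
    """F_{alpha,theta}(x) = P[sqrt(Z) <= x] for Z ~ Gamma(alpha, theta).
    Substituting u = v^2 in the defining integral gives
      F(x) = (2 / (Gamma(alpha) * theta^alpha))
             * integral_0^x v^(2*alpha - 1) * exp(-v^2 / theta) dv,
    which is regular at 0 for alpha >= 1/2; evaluated by the trapezoidal rule."""
    if x <= 0:
        return 0.0
    h = x / n
    total = 0.0
    for k in range(n + 1):
        v = k * h
        w = 0.5 if k in (0, n) else 1.0  # trapezoidal end-point weights
        total += w * (v ** (2 * alpha - 1)) * math.exp(-v * v / theta)
    return 2.0 * h * total / (math.gamma(alpha) * theta ** alpha)
```

For $\alpha = 1/2$, $\theta = 2$ this evaluates $\operatorname{erf}(x/\sqrt{2}) = \mathbb{P}[\,|N(0,1)| \le x\,]$, matching the parenthetical remark above; for $\alpha = 1$ it gives $1 - e^{-x^2/\theta}$.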
Remarks 2.7. (i) Under the hypotheses of Theorem 2.6, Theorem 2.5 shows that the process is null-recurrent or transient; Theorem 2.6 demonstrates a form of asymptotic independence between $X_n$ (rescaled) and $\eta_n$ (which converges to π). By contrast, in the positive-recurrent aperiodic case, $\mathbb{P}[X_n \le x, \eta_n = k]$ (with no scaling) possesses a limit, but that limit cannot be identified without additional assumptions (and the limit distribution of $\eta_n$ need not even be π).
(ii) The case of Theorem 2.6 in which S is a singleton is essentially Lamperti's weak convergence result from [19].
(iii) If in addition (Q_∞^+) and (M_L^+) hold, then the boundary case $\sum_{i \in S} (2 c_i + s_i^2)\, \pi(i) = 0$ is null-recurrent, by Theorem 2.5. In this case the proof given in Section 5.2 below can be modified to show that $n^{-1/2} X_n \to 0$ in probability; this is consistent with the fact that the $\alpha \to 0$ limit of $F_{\alpha,\theta}$ corresponds to a point mass at 0.
(iv) With some additional work, the arguments in Section 5.2 should yield the process version of Theorem 2.6: in the sense of finite-dimensional distributions, as $n \to \infty$,
$$\bigl( n^{-1/2} X_{\lfloor nt \rfloor},\, \eta_{\lfloor nt \rfloor} \bigr)_{t \ge 0} \to (x_t, \omega_t)_{t \ge 0},$$
where $(2/\theta)^{1/2} x_t$ is a Bessel process with dimension 2α and $\omega_t$ is an S-valued white noise process whose finite-dimensional marginals are sequences of i.i.d. π-distributed variables.
The remainder of the paper is organized as follows. In Section 3 we give some additional context to the present work by describing how our setting generalizes the literature on additive functionals of Markov chains, and by presenting some additional examples, including a variant of the correlated random walk. Section 4 contains the bulk of our analysis, which proceeds via considering an embedded Markov chain. The proofs of the main theorems are then completed in Section 5.
To simplify the presentation in the rest of the paper, we often write P x,i [ · ] for P[ · | X 0 = x, η 0 = i], corresponding to the law of the Markov chain with initial state (x, i) ∈ Z + × S; similarly for (expectation) E x,i .
We finish this section with some general remarks. Our method of proof is different from other approaches in the literature. Falin [7,8], while also making use of Foster-Lyapunov results, bases his computations on a delicate algebraic calculation. Rogers [26] uses an embedded Markov chain, as we do, but his analysis relies on the additive functional representation (see Section 3.1). Our approach to the excursion estimates for the embedded process, via the Doob decomposition, makes the emergence of the 'pseudodrift' quantities particularly intuitive from a probabilistic perspective: see the discussion around (3.3) below.
The case where S is infinite can give rise to completely different phenomena from the finite setting, and we do not consider this here. Under suitable assumptions, however, such as uniform versions of our asymptotic conditions (Q ∞ ), (M C ) or (M L ), and sufficient moments for τ and the increments of X n , the results of the present paper should extend to the infinite setting.
Examples and remarks on the literature

Homogeneity and additive functionals
As mentioned in Remark 2.3, condition (H) is assumed in much of the literature. A special structure emerges when (H) is imposed for all x. Indeed, one then has that $(\eta_n)$ itself is a Markov chain, since
$$\mathbb{P}[\eta_{n+1} = j \mid \eta_n = i] = \sum_z r(z, i, j) = q(i, j). \qquad (3.1)$$
Then if $\psi : \mathbb{Z} \times S \to \mathbb{Z}$ is given by $\psi(z, i) = z$, we may write
$$X_n = X_0 + \sum_{k=1}^{n} \psi(X_k - X_{k-1}, \eta_k),$$
which represents $X_n$ as an additive functional of a Markov chain. However, for $x \in \mathbb{Z}_+$, assuming that (H) holds for all $x \ge 0$ is very restrictive, and implies that $X_{n+1} - X_n \ge 0$ a.s. (see Remark 2.3). So in the homogeneous setting, it makes sense to instead take the state space to be $\mathbb{Z} \times S$, so that (2.1) now holds with x and y in $\mathbb{Z}$. Assuming that (H) holds for all $x \in \mathbb{Z}$ now yields the additive functional structure above, without imposing additional restrictions on the magnitude of $X_{n+1} - X_n$.
In either case, we may note that
$$\mu_1(x, i) = \sum_{j \in S} \sum_z z\, r(z, i, j) =: \mu_1(i),$$
say, assuming that the mean increments are well defined; so there is a constant mean drift $\mu_1(i)$ for each $i \in S$. Moreover, if π is the stationary distribution on S associated with the Markov chain $(\eta_n)$ given by (3.1), then a calculation shows that the Markov chain $(X_n - X_{n-1}, \eta_n)$ has stationary distribution $\varpi(z, i)$ on $\mathbb{Z} \times S$ given by
$$\varpi(z, i) = \sum_{j \in S} \pi(j)\, r(z, j, i).$$
In this context, a result of Rogers [26] on additive functionals of Markov chains shows that the recurrence classification of $(X_n)$ depends on the sign of
$$\sum_{i \in S} \sum_{z \in \mathbb{Z}} z\, \varpi(z, i). \qquad (3.3)$$
There are many similar results in the literature for additive functionals of Markov chains in more general spaces, and related results in ergodic theory concerning 'cocycles' (see, e.g., [2]). However, the methods adapted to this additive functional structure seem to depend crucially on the homogeneity assumption (H). The interpretation of the quantity in (3.3) is as a 'pseudo-drift' accumulated over i.i.d. excursions of the Markov chain: see Rogers [26]. We take this idea further, as the analogues of these excursions in our setting are not i.i.d., due to the additional non-homogeneity. However, our methods exploit the essential structure that remains.
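Assuming the factorized form $\varpi(z, i) = \sum_{j} \pi(j)\, r(z, j, i)$ indicated above, the quantity in (3.3) reduces, by Fubini, to the π-average of the line drifts:

```latex
\sum_{i \in S} \sum_{z \in \mathbb{Z}} z\, \varpi(z, i)
  = \sum_{i \in S} \sum_{z \in \mathbb{Z}} z \sum_{j \in S} \pi(j)\, r(z, j, i)
  = \sum_{j \in S} \pi(j) \sum_{z \in \mathbb{Z}} z \sum_{i \in S} r(z, j, i)
  = \sum_{j \in S} \pi(j)\, \mu_1(j).
```

This makes explicit the sense in which (3.3) is a stationary pseudo-drift: it weights each line's mean increment by the stationary mass π assigns to that line.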

Correlated random walk
In the one-dimensional correlated random walk, a particle performs a random walk on Z with a short-term memory: the distribution of X n+1 depends not only on the current position X n , but also on the 'direction of travel' X n − X n−1 . Formally, (X n , X n − X n−1 ) is a Markov chain on Z × {−1, +1}. Supposing also that (H) holds for all x ∈ Z, this is a special case of the framework discussed in Section 3.1, with η n = X n − X n−1 .
One standard version of the model supposes that the nonzero transition probabilities are given by $p(x, i, x + j, j) = q(i, j)$, where
$$q = \begin{pmatrix} \frac12 + \rho_{+1} & \frac12 - \rho_{+1} \\ \frac12 - \rho_{-1} & \frac12 + \rho_{-1} \end{pmatrix}$$
(rows and columns indexed by +1, −1) is the transition matrix of the Markov chain $(\eta_n)$, and $\rho_i \in (-\frac12, \frac12)$ are fixed parameters. For this random walk, the additive structure described in Section 3.1 is particularly simple: the quantity in (3.3) is zero if and only if $\rho_i = \rho$ is the same for each i; the random walk is recurrent in exactly this case.
A positive ρ i corresponds to persistence of the walker in direction i (the walker has an 'inertia'); a negative ρ i corresponds to a walker who vacillates in direction i, and has an increased propensity to turn around.
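Under our reading of the transition probabilities (the walker keeps its current direction i with probability $\frac12 + \rho_i$), the stationary distribution and the quantity (3.3) can be computed explicitly, as a short verification:

```latex
\pi(+1) = \frac{\tfrac12 - \rho_{-1}}{1 - \rho_{+1} - \rho_{-1}}, \qquad
\pi(-1) = \frac{\tfrac12 - \rho_{+1}}{1 - \rho_{+1} - \rho_{-1}}, \qquad
\mu_1(i) = i\Bigl(\tfrac12 + \rho_i\Bigr) - i\Bigl(\tfrac12 - \rho_i\Bigr) = 2 i \rho_i,

\sum_{i} \pi(i)\, \mu_1(i)
  = \frac{2\rho_{+1}\bigl(\tfrac12 - \rho_{-1}\bigr) - 2\rho_{-1}\bigl(\tfrac12 - \rho_{+1}\bigr)}
         {1 - \rho_{+1} - \rho_{-1}}
  = \frac{\rho_{+1} - \rho_{-1}}{1 - \rho_{+1} - \rho_{-1}},
```

which vanishes precisely when $\rho_{+1} = \rho_{-1}$, consistent with the recurrence criterion just stated.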
Such models have a long history, and have been studied under different names by many different researchers: as 'persistent random walks' by Fürth [10], 'correlated random walks' by Gillis [11], 'random walks with restricted reversals' by Domb and Fisher [5], and, recently, 'Newtonian random walks' by Lenci [21]. Under appropriate rescaling, the model leads to the telegrapher's equation in the scaling limit, as discussed by Goldstein [12] and Kac [16]. There has been a large amount of recent work on correlated random walks and related models; a small selection is [1, 3, 14, 27]. Motivation for studying these models arises from several sources, including physical Brownian motion [10] and models for molecular configurations [4]. We refer to [15] for some additional background and references.
As an application of our main results, consider the following variation on the one-dimensional correlated random walk, intended to probe more precisely the recurrence-transience phase transition. This time we take the state-space to be $\mathbb{Z}_+ \times \{-1, +1\}$ to fit into the setting of Section 2. We suppose that the nonzero transition probabilities take the form above, but with the constant parameters $\rho_i$ replaced by x-dependent ones, vanishing as $x \to \infty$ and governed by a single constant $c \in \mathbb{R}$. For c > 0, the walk is persistent in the positive direction but vacillating in the negative direction; conversely for c < 0. So for nonzero c, the symmetry between the two directions present in the (recurrent) c = 0 case is broken: how does this affect the recurrence?
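The phase-transition question can be explored in simulation. The paper's exact x-dependent transition probabilities are not reproduced here; purely for illustration we adopt a hypothetical Lamperti-type choice, in which the walker keeps its current direction η with probability $\frac12 + \eta c / (2x)$, so that for c > 0 it is persistent in the positive direction and vacillating in the negative one.

```python
import random

def correlated_walk(c, steps=10000, seed=0):
    """Illustrative half-line correlated random walk (NOT the paper's exact
    specification): from (x, eta) with x >= 1, the walker keeps direction
    eta with probability 1/2 + eta * c / (2 * x) (a hypothetical
    Lamperti-type perturbation) and reverses direction otherwise;
    from x = 0 it steps to 1 moving in the positive direction.
    Returns the trajectory of X_n."""
    rng = random.Random(seed)
    x, eta = 1, +1
    path = [x]
    for _ in range(steps):
        if x == 0:
            x, eta = 1, +1
        else:
            p_keep = 0.5 + eta * c / (2.0 * x)
            p_keep = min(max(p_keep, 0.0), 1.0)  # guard for small x
            if rng.random() >= p_keep:
                eta = -eta  # reverse direction
            x += eta
        path.append(x)
    return path
```

With c = 0 this is the symmetric (recurrent) case; varying c through positive and negative values illustrates the symmetry-breaking discussed above.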

Modulated queue
To finish this section we return to the queueing model as presented in the introduction.
Recall that the critical case from the point of view of recurrence and transience is when ρ = λ, and we are interested in the behaviour of the model under perturbations of the constants $c_i$ for $i \in S$. For this model the jumps are of size 1, and
$$\mu_1(x, i) = \frac{\lambda - \rho_i(x)}{\lambda + \rho_i(x)} = \frac{c_i}{x} + O(x^{-2}), \qquad \mu_2(x, i) = 1, \qquad q_x(i, j) = M_{ij} + O(x^{-1}),$$
so that (M_L^+) and (Q_∞^+) hold, with $s_i^2 = 1$ for all $i \in S$. Let π be the stationary distribution associated with the transition matrix M, and set $\bar{c} = \sum_{i \in S} c_i \pi(i)$. Applying Theorems 2.5 and 2.6 yields the following result (cf. Corollary 3.1): the queue is transient if $\bar{c} > 1/2$, null-recurrent if $|\bar{c}| \le 1/2$, and positive-recurrent if $\bar{c} < -1/2$.
We refer to $(X_m, \eta_m)_{\tau_n \le m \le \tau_{n+1}}$ as the nth excursion from the line 0. The basis for our analysis of the embedded Markov chain $(Y_n)$ will be an analysis of a single excursion, depending on the starting position. A key component of this analysis is a coupling result, which we present in the next subsection.

Coupling construction
Lemma 4.1. Suppose that condition (B_p) holds for some p > 1 and condition (Q_∞) holds. Then there exists a Markov chain $(X_n, \eta_n, \eta^\star_n)$ on $\mathbb{Z}_+ \times S \times S$ such that:
• $(X_n, \eta_n)$ is a Markov chain on $\mathbb{Z}_+ \times S$ with transition probabilities $p(x, i, y, j)$;
• $(\eta^\star_n)$ is a Markov chain on S with transition probabilities $q(i, j)$; and
• for all $n \in \mathbb{Z}_+$ and all $i \in S$,
$$\lim_{x \to \infty} \mathbb{P}\bigl[ \eta_m = \eta^\star_m \text{ for all } 0 \le m \le n \mid X_0 = x,\, \eta_0 = \eta^\star_0 = i \bigr] = 1. \qquad (4.1)$$
Finally, suppose in addition that (Q_∞^+) holds. Then there exists δ > 0 such that, for any A < ∞, for all $i \in S$, as $x \to \infty$,
$$\mathbb{P}\bigl[ \eta_m = \eta^\star_m \text{ for all } 0 \le m \le \lfloor A \log x \rfloor \mid X_0 = x,\, \eta_0 = \eta^\star_0 = i \bigr] = 1 - O(x^{-\delta}). \qquad (4.2)$$
The statements of Lemma 4.1 follow from a coupling argument. Essentially, equation (4.1) is proved using a maximal coupling of $\eta_n$ and $\eta^\star_n$; the condition (Q_∞) that $q_x(i, j)$ has a limit as $x \to \infty$ means that we can control the probability of decoupling, provided that $X_n$ stays sufficiently large, and it is this dependence on $X_n$ that introduces a (minor) complication to an otherwise standard argument. Equation (4.2) is proved in a similar manner using the stronger condition (Q_∞^+) on $q_x(i, j)$; the full details of the proof can be found in Appendix A.
In the remainder of this subsection we explore some consequences of the coupling described in Lemma 4.1. First we introduce additional notation in the context of the joint probability space on which the coupled process $(X_n, \eta_n, \eta^\star_n)$ is constructed. We denote by $\tau^\star$ the first return time to 0 of the Markov chain $(\eta^\star_n)$, namely $\tau^\star := \min\{n \ge 1 : \eta^\star_n = 0\}$.
Moreover, we write $\mathbb{P}_{x,i,j}$ for the probability measure conditional on $X_0 = x$, $\eta_0 = i$, $\eta^\star_0 = j$, and $\mathbb{E}_{x,i,j}$ for the corresponding expectation.
Irreducibility of the time-homogeneous Markov chain $(X_n, \eta_n)$ and finiteness of S imply that, for any x, there exist $m(x) < \infty$ and $\varphi(x) > 0$ such that
$$\mathbb{P}_{x,i}[\tau \le m(x)] \ge \varphi(x), \quad \text{for all } i \in S. \qquad (4.3)$$
In the specific case where $q_x(i, j)$ is constant in x, the process $(\eta_n)$ is distributed exactly as the finite irreducible Markov chain $(\eta^\star_n)$, so the functions m(x) and φ(x) in (4.3) can be chosen to be uniform over x. Our first consequence of the above coupling is that (4.3) can be strengthened to such a uniform version under our weaker conditions: roughly speaking, assumption (Q_∞) implies that $(\eta_n)$ is sufficiently close to $(\eta^\star_n)$ when the X-coordinate of $(X_n, \eta_n)$ is sufficiently large, and irreducibility does the rest.
Lemma 4.2. Suppose that condition (B_p) holds for some p > 1 and condition (Q_∞) holds. Then there exist $m < \infty$ and $\varphi > 0$ such that
$$\mathbb{P}_{x,i}[\tau \le m] \ge \varphi, \quad \text{for all } i \in S \text{ and all } x \in \mathbb{Z}_+. \qquad (4.4)$$
In the proof of this result, and at several points later on, we consider the event
$$E_n := \{ \eta_m = \eta^\star_m \text{ for all } 0 \le m \le n \}. \qquad (4.5)$$
Proof of Lemma 4.2. We work with the Markov chain $(X_n, \eta_n, \eta^\star_n)$ given in Lemma 4.1. Since $\eta^\star$ is a finite irreducible Markov chain, there exist $m < \infty$ and $\varphi > 0$ such that $\mathbb{P}_{x,i,i}[\tau^\star \le m] \ge 2\varphi$ for all i and all x. Conditional on $\eta_n$ and $\eta^\star_n$ remaining coupled up to time m, we have $\tau \le m$ if and only if $\tau^\star \le m$; hence
$$\mathbb{P}_{x,i,i}[\tau \le m] \ge \mathbb{P}_{x,i,i}\bigl[ \{\tau^\star \le m\} \cap E_m \bigr] \ge \mathbb{P}_{x,i,i}[\tau^\star \le m] - \mathbb{P}_{x,i,i}[E_m^c] \ge 2\varphi - \mathbb{P}_{x,i,i}[E_m^c].$$
But by Lemma 4.1, there exists $x_0$ such that $\mathbb{P}_{x,i,i}[E_m^c] \le \varphi$ for all $x \ge x_0$, and hence (4.4) holds for all i and all $x \ge x_0$; by (4.3), adjusting m and φ if necessary, (4.4) extends to the finitely many $x < x_0$.

Excursion durations and occupation estimates
Next we give an exponential tail bound for the duration of excursions, uniform in the initial location.
Lemma 4.3. Suppose that condition (B_p) holds for some p > 1 and condition (Q_∞) holds. Then there exist constants c > 0 and C < ∞ such that, for all x, n, and r,
$$\mathbb{P}[\tau_{n+1} - \tau_n > r \mid Y_n = x] \le C e^{-cr}.$$
Proof. Since $\tau_{n+1} - \tau_n$ conditional on $Y_n = x$ has the same distribution as τ conditional on $X_0 = x$, $\eta_0 = 0$, it suffices to show that, for some constants $C < \infty$ and c > 0,
$$\mathbb{P}_{x,i}[\tau > r] \le C e^{-cr}, \quad \text{for all } x \text{ and } i. \qquad (4.6)$$
(We then get the claimed result for $\tau_{n+1} - \tau_n$ by setting i = 0.) Recall that, by Lemma 4.2, $\mathbb{P}_{x,i}[\tau \le m] \ge \varphi$. Moreover, using the time-homogeneity of $(X_n, \eta_n)$, for all x and i,
$$\mathbb{P}_{x,i}[\tau > (k+1)m] \le (1 - \varphi)\, \sup_{y \in \mathbb{Z}_+,\, j \in S} \mathbb{P}_{y,j}[\tau > km],$$
for all positive integers k. But this implies that, for all positive integers k,
$$\mathbb{P}_{x,i}[\tau > km] \le (1 - \varphi)^k.$$
Finally, for general $r \in \mathbb{Z}_+$, there exists an integer k such that $km \le r < (k+1)m$, so
$$\mathbb{P}_{x,i}[\tau > r] \le \mathbb{P}_{x,i}[\tau > km] \le (1 - \varphi)^k \le C e^{-cr},$$
for constants $C < \infty$ and c > 0 depending only on φ and m, giving (4.6).
The next result shows that the mean occupation time of $(X_n, \eta_n)$ on line i per excursion can be approximated by the mean occupation time of $(\eta^\star_n)$ in state i per excursion.
Lemma 4.4. Suppose that condition (B_p) holds for some p > 1 and condition (Q_∞) holds. Then, for any $i \in S$,
$$\lim_{x \to \infty} \mathbb{E}_{x,0}\Bigl[ \sum_{n=0}^{\tau - 1} \mathbf{1}\{\eta_n = i\} \Bigr] = \frac{\pi(i)}{\pi(0)}.$$
If, in addition, (Q_∞^+) holds, then there exists δ > 0 such that, for any $i \in S$, as $x \to \infty$,
$$\mathbb{E}_{x,0}\Bigl[ \sum_{n=0}^{\tau - 1} \mathbf{1}\{\eta_n = i\} \Bigr] = \frac{\pi(i)}{\pi(0)} + O(x^{-\delta}).$$
Proof. Again we work with the Markov chain $(X_n, \eta_n, \eta^\star_n)$ whose existence is given in the statement of Lemma 4.1. Fix $i \in S$. For the duration of this proof, we write
$$W := \sum_{n=0}^{\tau - 1} \mathbf{1}\{\eta_n = i\}, \qquad W^\star := \sum_{n=0}^{\tau^\star - 1} \mathbf{1}\{\eta^\star_n = i\}.$$
Since $(\eta^\star_n)$ is a Markov chain on S with transition probabilities $q(i, j)$, standard Markov chain theory yields $\mathbb{E}_{x,0,0}[W^\star] = \pi(i)/\pi(0)$, for any $x \in \mathbb{Z}_+$. The statements of the lemma will follow from suitable estimates for $\mathbb{E}_{x,0,0}[|W - W^\star|]$. Again define $E_n$ by (4.5). Then, for any positive integer n,
$$\mathbb{E}_{x,0,0}[|W - W^\star|] \le n\, \mathbb{P}_{x,0,0}[E_n^c] + \mathbb{E}_{x,0,0}\bigl[ \tau \mathbf{1}\{\tau > n\} \bigr] + \mathbb{E}_{x,0,0}\bigl[ \tau^\star \mathbf{1}\{\tau^\star > n\} \bigr]. \qquad (4.7)$$
Here, by Cauchy-Schwarz and the tail estimates in Lemma 4.3,
$$\mathbb{E}_{x,0,0}\bigl[ \tau \mathbf{1}\{\tau > n\} \bigr] \le \bigl( \mathbb{E}_{x,0,0}[\tau^2] \bigr)^{1/2} \bigl( \mathbb{P}_{x,0,0}[\tau > n] \bigr)^{1/2} \le C e^{-cn}, \qquad (4.8)$$
for some constants $C < \infty$ and c > 0, not depending on x, and similarly for the term involving $\tau^\star$. For the first statement in the lemma, it suffices to show that
$$\lim_{x \to \infty} \mathbb{E}_{x,0,0}[|W - W^\star|] = 0. \qquad (4.9)$$
Under assumption (Q_∞), it follows from (4.8) and its analogue for $\tau^\star$ that, for any ε > 0, we may choose $n \ge n_0$ sufficiently large so that the last two terms on the right-hand side of (4.7) total less than ε, and then $\mathbb{E}_{x,0,0}[|W - W^\star|] \le n\, \mathbb{P}_{x,0,0}[E_n^c] + \varepsilon$. For fixed n, $\mathbb{P}_{x,0,0}[E_n^c] \to 0$ as $x \to \infty$ by (4.1), so that $\limsup_{x \to \infty} \mathbb{E}_{x,0,0}[|W - W^\star|] \le \varepsilon$. Since ε > 0 was arbitrary, (4.9) follows.
For the second statement in the lemma, under assumption (Q_∞^+), we use a similar argument but with $n = n(x) = \lfloor A \log x \rfloor$. As before,
$$\mathbb{E}_{x,0,0}[|W - W^\star|] \le n(x)\, \mathbb{P}_{x,0,0}\bigl[ E_{n(x)}^c \bigr] + \mathbb{E}_{x,0,0}\bigl[ \tau \mathbf{1}\{\tau > n(x)\} \bigr] + \mathbb{E}_{x,0,0}\bigl[ \tau^\star \mathbf{1}\{\tau^\star > n(x)\} \bigr].$$
For a sufficiently large choice of the constant A, the exponential bound (4.8) shows that the last two terms on the right-hand side decay as a power of x. Finally, the term $n(x)\, \mathbb{P}_{x,0,0}[E_{n(x)}^c]$ also decays as a power of x, by (4.2), and so we see that $\mathbb{E}_{x,0,0}[|W - W^\star|]$ decays as a power of x, as required.

Recurrence and transience relationships
In this subsection we demonstrate the equivalence of recurrence properties of the embedded process (Y n ) to those of the process (X n ).
From this point of the paper onwards, we will be increasingly concerned with multiple excursions, and it is useful to introduce the notation $\sigma_0 := 0$ and, for $n \in \mathbb{Z}_+$, $\sigma_{n+1} := \tau_{n+1} - \tau_n$ for the durations of the excursions. Recall the definition of $Y_n$ from Section 4.1. Under our conditions (cf. Lemma 4.3), $\sigma_n < \infty$ a.s. for each n. Hence $Y_n \ne \partial$ a.s., and we can identify $Y_n$ with $X_{\tau_n}$ for all n. For the remainder of the paper we employ this slight abuse of notation, and assume that the state space of $(Y_n)$ is $\mathbb{Z}_+$. The next result relates recurrence of $(X_n)$ to recurrence of $(Y_n)$.
Lemma 4.5. Suppose that condition (B_p) holds for some p > 1 and condition (Q_∞) holds. Then $(Y_n)$ is an irreducible Markov chain on $\mathbb{Z}_+$; moreover, (i) $(Y_n)$ is recurrent (respectively, transient) if and only if $(X_n)$ is recurrent (respectively, transient); and (ii) $(Y_n)$ is positive-recurrent if and only if $(X_n)$ is positive-recurrent.
Proof. As explained in Section 4.1, the fact that $(Y_n)$ is a Markov chain follows from the strong Markov property for $(X_n, \eta_n)$.
Irreducibility of $(Y_n)$ follows from the irreducibility of $(X_n, \eta_n)$, as follows. For any $x, y \in \mathbb{Z}_+$, there exists a finite path in the state space $\mathbb{Z}_+ \times S$ from (x, 0) to (y, 0) that the chain $(X_n, \eta_n)$ has a positive probability of following. But then the (finite) subpath consisting of the points that are on line 0 corresponds to a path in the state space $\mathbb{Z}_+$ that $(Y_n)$ has a positive probability of following. Finally, we verify (ii). Let $\xi := \min\{n \ge 1 : Y_n = 0\}$ and $\zeta := \min\{n \ge 1 : (X_n, \eta_n) = (0, 0)\}$.
Then $(Y_n)$ is positive-recurrent if and only if $\mathbb{E}_{x,0}\,\xi < \infty$ for some (hence all) x, while $(X_n, \eta_n)$ is positive-recurrent if and only if $\mathbb{E}_{x,0}\,\zeta < \infty$. However, ξ and ζ are related since, given $\eta_0 = 0$, it is the case that $\tau_0 = 0$ and $\zeta = \tau_\xi$, i.e.,
$$\zeta = \sum_{n=1}^{\xi} \sigma_n. \qquad (4.10)$$
In particular, (4.10) shows that $\zeta \ge \xi$ a.s., so $\mathbb{E}_{x,0}\,\zeta < \infty$ implies that $\mathbb{E}_{x,0}\,\xi < \infty$. For the implication in the other direction, take expectations in the final expression in (4.10) and use linearity of expectation and Fubini's Theorem to get
$$\mathbb{E}_{x,0}\,\zeta = \sum_{n \ge 1} \mathbb{E}_{x,0}\bigl[ \sigma_n \mathbf{1}\{\xi \ge n\} \bigr].$$
By Lemma 4.3, $\mathbb{E}[\sigma_n \mid \mathcal{F}_{\tau_{n-1}}]$ is uniformly bounded by a constant, C, say, and $\{\xi \ge n\} \in \mathcal{F}_{\tau_{n-1}}$, so that
$$\mathbb{E}_{x,0}\,\zeta \le C \sum_{n \ge 1} \mathbb{P}_{x,0}[\xi \ge n] = C\, \mathbb{E}_{x,0}\,\xi.$$
Hence $\mathbb{E}_{x,0}\,\zeta < \infty$ if and only if $\mathbb{E}_{x,0}\,\xi < \infty$. Finally, (ii) follows from Lemma 2.2, which gives the equivalence of positive-recurrence for $(X_n, \eta_n)$ and $(X_n)$.

Increment moment estimates
So far, we have studied the excursions of $(X_n, \eta_n)$ away from the line $\eta_n = 0$ in terms of the η-coordinate. The next stage is to study the behaviour, over an excursion, of the X-coordinate. In particular, we estimate the moments of $Y_{n+1} - Y_n$, with a view to later applying a Lamperti condition to determine the recurrence/transience of $(Y_n)$. First, we need estimates on the maximum deviation of $X_n$ during a single excursion,
$$D_n := \max_{\tau_n \le m \le \tau_{n+1}} |X_m - X_{\tau_n}|; \qquad (4.11)$$
note that the distribution of $D_n$ given $X_{\tau_n} = x$ depends only on x and not on n.
Lemma 4.6. Suppose that condition (B_p) holds for some p > 1 and condition (Q_∞) holds. Then, for any $q \in (0, p)$,
$$\sup_{x \in \mathbb{Z}_+} \mathbb{P}[D_n \ge d \mid X_{\tau_n} = x] = O(d^{-q});$$
consequently, $\sup_x \mathbb{E}[(D_n)^\alpha \mid X_{\tau_n} = x] < \infty$ for any $\alpha \in (0, p)$.
Proof. Conditional on $X_{\tau_n} = x$, we have, for all $d \ge 0$ and $y > 0$,
$$\mathbb{P}[D_n \ge d] \le \mathbb{P}[\tau_{n+1} - \tau_n > y] + \mathbb{P}[D_n \ge d,\, \tau_{n+1} - \tau_n \le y] \le C e^{-cy} + \mathbb{P}[D_n \ge d,\, \tau_{n+1} - \tau_n \le y],$$
by Lemma 4.3. Here,
$$\mathbb{P}[D_n \ge d,\, \tau_{n+1} - \tau_n \le y] \le y\, \sup_m \mathbb{P}\bigl[\, |X_{m+1} - X_m| \ge d/y \mid \mathcal{F}_m \,\bigr] \le C_p\, y^{1+p} d^{-p},$$
which follows from the inequalities of Boole and Markov and the fact that $\mathbb{E}[\,|X_{m+1} - X_m|^p \mid \mathcal{F}_m\,] \le C_p$ by assumption (B_p). Then, taking $y = d^{(p-q)/(1+p)}$, where $q \in (0, p)$, we obtain $\mathbb{P}[D_n \ge d \mid X_{\tau_n} = x] = O(d^{-q})$, as claimed. The final claim follows from the fact that
$$\mathbb{E}[(D_n)^\alpha \mid X_{\tau_n} = x] = \alpha \int_0^\infty d^{\alpha - 1}\, \mathbb{P}[D_n \ge d \mid X_{\tau_n} = x]\, \mathrm{d}d,$$
which is finite when $\alpha \in (0, q)$, where q can be taken arbitrarily close to p.
We are now in a position to calculate the moments of $Y_{n+1} - Y_n$. The first case to consider is when, for each i, $\mu_1(x, i)$ is asymptotically $d_i$.
Lemma 4.7. Suppose that condition (B_p) holds for some p > 1, and conditions (Q_∞) and (M_C) hold. Then there exists ε > 0 such that
$$\sup_{x, n \in \mathbb{Z}_+} \mathbb{E}\bigl[\, |Y_{n+1} - Y_n|^{1+\varepsilon} \mid Y_n = x \,\bigr] < \infty. \qquad (4.12)$$
Also, as $x \to \infty$,
$$\mathbb{E}[Y_{n+1} - Y_n \mid Y_n = x] = \frac{1}{\pi(0)} \sum_{i \in S} d_i \pi(i) + o(1). \qquad (4.13)$$
Proof. First, note that $|Y_{n+1} - Y_n| = |X_{\tau_{n+1}} - X_{\tau_n}| \le D_n$, a.s., where $D_n$ is given by (4.11). Then the statement (4.12) follows from Lemma 4.6 with (B_p) for p > 1. It remains to prove (4.13); by the time-homogeneity of $(X_n, \eta_n)$ and since $Y_n = X_{\tau_n}$, it suffices to consider $\mathbb{E}_{x,0}[X_\tau - X_0]$. The Doob decomposition for $X_n$ is
$$X_n = X_0 + \sum_{k=0}^{n-1} \mu_1(X_k, \eta_k) + M_n,$$
where $M_n$ is a martingale with $M_0 = 0$. Since $\mathbb{E}\tau < \infty$, and $\mathbb{E}[\,|M_{n+1} - M_n| \mid \mathcal{F}_n\,] \le 2\, \mathbb{E}[\,|X_{n+1} - X_n| \mid \mathcal{F}_n\,] \le 2 C_1$, a.s. (by the p = 1 case of (B_p)), the Optional Stopping Theorem gives $\mathbb{E}M_\tau = M_0 = 0$. Therefore,
$$\mathbb{E}_{x,0}[X_\tau - X_0] = \mathbb{E}_{x,0}\Bigl[ \sum_{k=0}^{\tau - 1} \mu_1(X_k, \eta_k) \Bigr]. \qquad (4.14)$$
Now, let $D := \max_{0 \le k \le \tau} |X_k - X_0|$, and set $A_x := \{D < x^\gamma\}$, for some $\gamma \in (0, 1)$. Note that, conditional on $X_0 = x$ and $\eta_0 = 0$, the random variable D has the same distribution as the random variable $D_n$ defined at (4.11) given $X_{\tau_n} = x$, so by Lemma 4.6 we have
$$\mathbb{P}_{x,0}[A_x^c] = O(x^{-\gamma q}), \quad \text{for any } q \in (0, p). \qquad (4.15)$$
Now, given $X_0 = x$ and $A_x$, we have for all $0 \le k \le \tau$ that $X_k \ge x - x^\gamma \ge x/2$, say, for all x sufficiently large. Thus, by (M_C), for any θ > 0 there exists $x_0 < \infty$ such that, given $X_0 = x \ge x_0$ and on the event $A_x$,
$$\bigl| \mu_1(X_k, \eta_k) - d_{\eta_k} \bigr| \le \theta, \quad \text{for all } 0 \le k < \tau.$$
Since $\max_{x,i} |\mu_1(x, i)| < \infty$ and $\max_i |d_i| < \infty$, it follows that there exists a constant $C < \infty$ such that, given $X_0 = x$,
$$\Bigl| \sum_{k=0}^{\tau - 1} \bigl( \mu_1(X_k, \eta_k) - d_{\eta_k} \bigr) \Bigr| \le \theta \tau \mathbf{1}_{A_x} + C \tau \mathbf{1}_{A_x^c}.$$
Hence, given $X_0 = x \ge x_0$ and $\eta_0 = 0$,
$$\Bigl| \mathbb{E}_{x,0}\Bigl[ \sum_{k=0}^{\tau - 1} \mu_1(X_k, \eta_k) \Bigr] - \sum_{i \in S} d_i\, \mathbb{E}_{x,0}\Bigl[ \sum_{k=0}^{\tau - 1} \mathbf{1}\{\eta_k = i\} \Bigr] \Bigr| \le \theta\, \mathbb{E}_{x,0}\tau + C\, \mathbb{E}_{x,0}\bigl[ \tau \mathbf{1}_{A_x^c} \bigr].$$
Here, by the Cauchy-Schwarz inequality, $\mathbb{E}_{x,0}[\tau \mathbf{1}_{A_x^c}] \le (\mathbb{E}_{x,0}[\tau^2])^{1/2} (\mathbb{P}_{x,0}[A_x^c])^{1/2} \to 0$ as $x \to \infty$, using (4.15) and the fact that τ has all moments, by Lemma 4.3. So, for any δ > 0, we can choose $x_1 < \infty$ sufficiently large so that, given $X_0 = x \ge x_1$ and $\eta_0 = 0$, the right-hand side of the last display is at most δ. Together with Lemma 4.4 and (4.14), this yields (4.13).
Lemma 4.8. Suppose that condition (B_p) holds for some p > 2, and conditions (Q_∞) and (M_L) hold. Then there exists ε > 0 such that
$$\sup_{x, n \in \mathbb{Z}_+} \mathbb{E}\bigl[\, |Y_{n+1} - Y_n|^{2+\varepsilon} \mid Y_n = x \,\bigr] < \infty. \qquad (4.16)$$
Also, as $x \to \infty$,
$$\mathbb{E}[Y_{n+1} - Y_n \mid Y_n = x] = \frac{1}{x \pi(0)} \sum_{i \in S} c_i \pi(i) + o(x^{-1}), \qquad (4.17)$$
$$\mathbb{E}\bigl[ (Y_{n+1} - Y_n)^2 \mid Y_n = x \bigr] = \frac{1}{\pi(0)} \sum_{i \in S} s_i^2 \pi(i) + o(1). \qquad (4.18)$$
If, in addition, (Q_∞^+) and (M_L^+) hold, then there exists δ > 0 such that, as $x \to \infty$,
$$\mathbb{E}[Y_{n+1} - Y_n \mid Y_n = x] = \frac{1}{x \pi(0)} \sum_{i \in S} c_i \pi(i) + O(x^{-1-\delta}), \qquad (4.19)$$
$$\mathbb{E}\bigl[ (Y_{n+1} - Y_n)^2 \mid Y_n = x \bigr] = \frac{1}{\pi(0)} \sum_{i \in S} s_i^2 \pi(i) + O(x^{-\delta}). \qquad (4.20)$$
Proof. First, since $|Y_{n+1} - Y_n| \le D_n$, with $D_n$ as defined at (4.11), and because Lemma 4.6 implies that $\sup_x \mathbb{E}[(D_n)^{2+\varepsilon} \mid X_{\tau_n} = x] < \infty$ for small enough ε > 0, (4.16) follows. The proof of (4.17) and (4.18) using (Q_∞) and (M_L), and the proof of (4.19) and (4.20) using (Q_∞^+) and (M_L^+), are essentially the same, the only difference being in the error terms associated to each expression. We present the proof of (4.19) and (4.20); it should be clear how to adapt the argument to prove (4.17) and (4.18).
We proceed as in the proof of Lemma 4.7. Indeed, we follow the reasoning from the second paragraph of that proof through to equation (4.14), giving
$$\mathbb{E}_{x,0}[X_\tau - X_0] = \mathbb{E}_{x,0}\Bigl[ \sum_{k=0}^{\tau - 1} \mu_1(X_k, \eta_k) \Bigr],$$
and we let $D := \max_{0 \le k \le \tau} |X_k - X_0|$ and set $A_x := \{D < x^\gamma\}$ as before, but now we require $\gamma \in (1/2, 1)$. Note that, conditional on $X_0 = x$ and $\eta_0 = 0$, the random variable D has the same distribution as the random variable $D_n$ defined at (4.11) given $X_{\tau_n} = x$, so by Lemma 4.6 we have that $\mathbb{P}_{x,0}[D \ge d] = O(d^{-p'})$ for some $p' > 2$, since τ has all moments and (B_p) holds for some p > 2. Now, given $X_0 = x$ and $A_x$, we have $|X_k - x| \le D < x^\gamma$ for $k \le \tau$, so that, by (M_L^+),
$$\mu_1(X_k, \eta_k) = \frac{c_{\eta_k}}{x} + O(x^{\gamma - 2}) + O(x^{-1-\delta_1}), \quad \text{for all } 0 \le k < \tau,$$
where the implicit constants are uniform in x and in i. By (Q_∞^+) and the second statement in Lemma 4.4, we have that, for some δ' > 0,
$$\mathbb{E}_{x,0}\Bigl[ \sum_{k=0}^{\tau - 1} \mathbf{1}\{\eta_k = i\} \Bigr] = \frac{\pi(i)}{\pi(0)} + O(x^{-\delta'}),$$
so
$$\mathbb{E}_{x,0}[X_\tau - X_0] = \frac{1}{x \pi(0)} \sum_{i \in S} c_i \pi(i) + O(x^{\gamma - 2}) + O(x^{-1-\delta_1}) + O(x^{-1-\delta'}) + O\bigl( \mathbb{E}_{x,0}[\tau \mathbf{1}_{A_x^c}] \bigr).$$
Here, by Hölder's inequality, for all r, s > 1 with $r^{-1} + s^{-1} = 1$,
$$\mathbb{E}_{x,0}\bigl[ \tau \mathbf{1}_{A_x^c} \bigr] \le \bigl( \mathbb{E}_{x,0}[\tau^r] \bigr)^{1/r} \bigl( \mathbb{P}_{x,0}[A_x^c] \bigr)^{1/s},$$
and $\mathbb{P}_{x,0}[A_x^c] = O(x^{-\gamma p'})$. Then, since $\gamma \in (1/2, 1)$, $\delta_1 > 0$, and δ' > 0, we have, for some δ'' > 0,
$$\mathbb{E}_{x,0}[X_\tau - X_0] = \frac{1}{x \pi(0)} \sum_{i \in S} c_i \pi(i) + O(x^{-1-\delta''}).$$
To calculate the second moment of $X_\tau - X_0$, we will make repeated use of the algebraic identity $a^2 - b^2 = (a - b)^2 + 2b(a - b)$, which will help to simplify the calculations that follow. Taking the Doob decomposition for $X_n^2$, we write
$$X_n^2 = X_0^2 + \sum_{k=0}^{n-1} \bigl( \mu_2(X_k, \eta_k) + 2 X_k\, \mu_1(X_k, \eta_k) \bigr) + M_n,$$
where $M_n$ is a martingale satisfying $M_0 = 0$. Moreover, given $X_0 = x$,
$$\bigl| M_{(n+1) \wedge \tau} - M_{n \wedge \tau} \bigr| \le C (x + D)(1 + D), \quad \text{a.s.},$$
where $D = \max_{0 \le k \le \tau} |X_k - X_0|$ is as defined earlier, and $C < \infty$ is a constant. Thus $M_{n \wedge \tau}$ is uniformly integrable (in n), and so by the Optional Stopping Theorem $\mathbb{E}M_\tau = M_0 = 0$. Therefore,
$$\mathbb{E}_{x,0}\bigl[ X_\tau^2 - X_0^2 \bigr] = \mathbb{E}_{x,0}\Bigl[ \sum_{k=0}^{\tau - 1} \bigl( \mu_2(X_k, \eta_k) + 2 X_k\, \mu_1(X_k, \eta_k) \bigr) \Bigr]. \qquad (4.21)$$
As in the calculation of the first moment, we can bound the error terms by bootstrapping on the event $A_x$ as above; therefore, by (4.21), the identity $(X_\tau - X_0)^2 = (X_\tau^2 - X_0^2) - 2 X_0 (X_\tau - X_0)$, and the first-moment estimate, we obtain, for some δ''' > 0,
$$\mathbb{E}_{x,0}\bigl[ (X_\tau - X_0)^2 \bigr] = \frac{1}{\pi(0)} \sum_{i \in S} s_i^2 \pi(i) + O(x^{-\delta'''}).$$
Finally, taking $\delta = \min\{\delta'', \delta'''\}$ yields (4.19) and (4.20), as required.

Recurrence classification
To prove Theorems 2.4 and 2.5, we use the increment moment estimates from Section 4.5 together with some Foster-Lamperti conditions to classify the process (Y n ), and then deduce the classification for (X n ) from the equivalence results in Section 4.4. For Theorem 2.5, under Lamperti-type drift assumptions, we apply the following classification result.
, for some $\delta > 0$, then $(Z_n)$ is null-recurrent.
Lemma 5.1 is essentially due to Lamperti [18, 20], although the form given here is taken from Menshikov et al. [24, Theorem 3]. The conditions for recurrence and transience are contained in Theorem 3.2 of [18], and the condition for positive-recurrence is contained in Theorem 2.1 of [20]. The condition for null-recurrence here is slightly sharper than Lamperti's original results [20].
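Purely as an illustration of the phase transition that Lemma 5.1 captures (this sketch is not part of the paper; the walk, parameter values and thresholds below are our own illustrative choices), one can simulate a nearest-neighbour Lamperti walk whose drift satisfies $2x\mu_1(x) = c$ and whose increment variance is $\mu_2(x) = 1$:

```python
import random

def lamperti_walk(c, steps, seed=0):
    """Nearest-neighbour walk on Z_+ with one-step mean drift c/(2x):
    from x >= 1 jump +1 with probability 1/2 + c/(4x) (clipped to [0.05, 0.95]),
    otherwise jump -1; from 0 jump to 1.  Returns (visits to 0, final state)."""
    rng = random.Random(seed)
    x, visits = 0, 0
    for _ in range(steps):
        if x == 0:
            visits += 1
            x = 1
        else:
            p_up = min(0.95, max(0.05, 0.5 + c / (4 * x)))
            x += 1 if rng.random() < p_up else -1
    return visits, x

# Lamperti's criterion compares 2x*mu_1(x) = c with mu_2(x) = 1:
# c = -3 lies in the positive-recurrent regime, c = 3 in the transient one.
visits_rec, _ = lamperti_walk(-3.0, 100_000)
visits_tr, _ = lamperti_walk(3.0, 100_000)
```

With strong inward drift the walk returns to $0$ a positive fraction of the time; with $c$ above the critical value it escapes, visiting $0$ only finitely often with high probability.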
Proof of Theorem 2.5. We apply Lemma 5.1 to classify $Z_n = Y_n$, and thus, by Lemma 4.5, to classify $X_n$. First, assuming $(B_p)$ for some $p > 2$, $(Q_\infty)$ and $(M_L)$, by Lemma 4.8 it is clear that (5.1) and (5.2) hold for $Z_n = Y_n$. Furthermore, the middle condition of Lemma 5.1 holds for any $\delta > 0$, and therefore $(Y_n)$ is null-recurrent. Now suppose that $(Q_\infty^+)$ and $(M_L^+)$ also hold. Then, by Lemma 4.8, the sharper estimates hold for some $\delta > 0$, which means that $\bigl|\sum_{i \in S} 2 c_i \pi(i)\bigr| = \sum_{i \in S} s_i^2 \pi(i)$ implies that $(Y_n)$ is null-recurrent, completing the classification of $(Y_n)$ and therefore of $(X_n)$.
For Theorem 2.4 we will apply the following transience criterion.
Lemma 5.2. Let $(Z_n)$ be an irreducible time-homogeneous Markov chain on $\mathbb{Z}_+$. For $(Z_n)$ to be transient, it is sufficient that there exists $\varepsilon > 0$ for which the corresponding drift condition holds.
We omit the proof of Lemma 5.2, which is similar to the proof of Lemma 5.1 and relies on demonstrating the existence of a suitable Lyapunov function with negative drift outside a bounded set, using Taylor's formula and some careful truncation.
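For orientation, the standard Lyapunov-function computation behind such transience criteria (a sketch in the Lamperti setting, not the omitted proof itself) takes $f(x) = x^{-\nu}$ for a small $\nu > 0$:

```latex
% Sketch: Taylor-expanding f(x) = x^{-\nu} about x = Z_n, with
% \mu_k(x) = E[(Z_{n+1} - Z_n)^k | Z_n = x], gives
\mathbb{E}\bigl[ f(Z_{n+1}) - f(Z_n) \mid Z_n = x \bigr]
  \approx -\nu x^{-\nu-1} \mu_1(x)
          + \tfrac{\nu(\nu+1)}{2}\, x^{-\nu-2} \mu_2(x)
  = -\tfrac{\nu}{2}\, x^{-\nu-2}
      \bigl( 2x\,\mu_1(x) - (\nu+1)\,\mu_2(x) \bigr).
% If 2x\mu_1(x) \ge (1+\varepsilon)\mu_2(x) for all large x, then choosing
% \nu < \varepsilon makes f(Z_n) a non-negative supermartingale outside a
% bounded set; since f(x) \to 0 as x \to \infty, this forces transience.
```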
Proof of Theorem 2.4. Consider the Markov chain $(Y_n)$. Under the conditions of part (i) of the theorem, Lemma 4.7 implies that the hypotheses of Lemma 5.2 hold for $Z_n = Y_n$, so that $(Y_n)$ is transient. Hence, by Lemma 4.5, $(X_n)$ is also transient.
As mentioned after the statement, part (ii) was obtained by Falin [7]. Our results furnish a different proof: Lemma 4.7 gives positive-recurrence for $(Y_n)$ by Foster's criterion (e.g. Theorem 2.2.3 of [9]), so, by Lemma 4.5, $(X_n)$ is also positive-recurrent.
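For reference, Foster's criterion in its standard form (as in, e.g., [9]) reads:

```latex
% Foster's criterion: an irreducible chain (Z_n) on a countable space is
% positive-recurrent if there exist f \ge 0, a finite set A and
% \varepsilon > 0 such that
\mathbb{E}\bigl[ f(Z_{n+1}) - f(Z_n) \mid Z_n = x \bigr] \le -\varepsilon
  \quad \text{for all } x \notin A,
\qquad
\sup_{x \in A} \mathbb{E}\bigl[ f(Z_{n+1}) \mid Z_n = x \bigr] < \infty.
```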

Convergence in distribution
The first step in the proof of Theorem 2.6 is to apply a result of Lamperti [19] to obtain a weak limit for the embedded Markov chain $(Y_n)$. Recall the distribution function $F_{\alpha,\theta}$ defined at (2.3).
Lemma 5.3. Suppose $(X_n, \eta_n)$ is a Markov chain satisfying $(B_p)$ for some $p > 4$, $(Q_\infty)$ and $(M_L)$. Suppose that the matrix $q$ appearing in $(Q_\infty)$ is aperiodic. Suppose also that $\sum_{i \in S} (2 c_i + s_i^2) \pi(i) > 0$. Define $\alpha$ and $\theta$ as at (2.4). Then, for any $x \in \mathbb{R}_+$, $\lim_{n \to \infty} \mathbb{P}[n^{-1/2} Y_n \le x] = F_{\alpha,\theta}(x)$.
Proof. If $(B_p)$ holds for some $p > 4$, then Lemma 4.6 yields the corresponding moment bound for the increments of $(Y_n)$. Now we apply Theorem 2.1 of [19] to the Markov chain $(Y_n)$, using the increment moment estimates of Lemma 4.8 and noting the remark preceding the theorem in [19], to obtain a weak limit of this form, up to a deterministic rescaling. Replacing $x$ by $\beta x$ in (2.3) and using the change of variable $v = u/\beta$, one observes the scaling relation, valid for any $\beta > 0$, $F_{\alpha,\theta}(\beta x) = F_{\alpha,\theta/\beta^2}(x)$, which implies the result.
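The scaling relation can be checked directly. For illustration, suppose (as in Lamperti's setting; the precise form of (2.3) is given in Section 2) that $F_{\alpha,\theta}$ has a density proportional to $u^\alpha e^{-u^2/(2\theta)}$ on $\mathbb{R}_+$; in fact the computation only uses that the density is $u^\alpha$ times a function of $u^2/\theta$:

```latex
% Change of variable u = \beta v, du = \beta\, dv:
F_{\alpha,\theta}(\beta x)
  = \frac{\int_0^{\beta x} u^{\alpha} e^{-u^{2}/(2\theta)} \, du}
         {\int_0^{\infty} u^{\alpha} e^{-u^{2}/(2\theta)} \, du}
  = \frac{\beta^{\alpha+1} \int_0^{x} v^{\alpha} e^{-v^{2}/(2\theta/\beta^{2})} \, dv}
         {\beta^{\alpha+1} \int_0^{\infty} v^{\alpha} e^{-v^{2}/(2\theta/\beta^{2})} \, dv}
  = F_{\alpha,\theta/\beta^{2}}(x).
```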
The next goal is to deduce from the weak limit for $Y_n$ a weak limit for $X_n$. To do so, we need (i) to control the value of the process $(X_n)$ between successive observations of the embedded process, and (ii) to account for the change of time. First we address point (i). For each $n \in \mathbb{Z}_+$, let $N(n) := \max\{k : \tau_k \le n\}$, so that $\tau_{N(n)} \le n < \tau_{N(n)+1}$.
Next we turn to point (ii) mentioned above. For our purposes, the following renewal-type result will suffice.
Lemma 5.6. Suppose $(X_n, \eta_n)$ is a Markov chain satisfying $(B_p)$ for some $p > 2$, $(Q_\infty)$ and $(M_L)$. Suppose also that $\sum_{i \in S} (2 c_i + s_i^2) \pi(i) > 0$. Then, as $n \to \infty$, $n^{-1} N(n) \to \pi(0)$ in probability.
Proof. Under the conditions of the lemma, Theorem 2.5 shows that $X_n$ (and hence $Y_n$) is null, i.e., null-recurrent or transient; in particular, for any $x \ge 0$, $\mathbb{P}[X_n = x] \to 0$ as $n \to \infty$ (this is (5.5)).
We use an extension of the coupling given in Lemma 4.1 to multiple excursions. We construct on the same probability space $(X_n, \eta_n)$ together with a sequence $(\eta^\star_{k,n})$ of copies (for $k \in \mathbb{Z}_+$) of the Markov chain $(\eta^\star_n)$, as follows. At each $\tau_k$, $k \in \mathbb{Z}_+$, start $(\eta^\star_{k,n})_{n \ge 0}$, an independent copy of $(\eta^\star_n)_{n \ge 0}$, from $\eta^\star_{k,0} = \eta_{\tau_k} = 0 \in S$, coupled to $(\eta_n)_{n \ge \tau_k}$ as described in Lemma 4.1; denote by $\sigma^\star_{k+1}$ the number of steps until $\eta^\star_{k,n}$ returns to $0$. Extending the notation $E_n$ defined at (4.5), we write $E_{k,n} = \bigcap_{0 \le \ell \le n} \{\eta_{\tau_k + \ell} = \eta^\star_{k,\ell}\}$, the event that the coupling started at $\tau_k$ succeeds for $n$ steps.
Now we use this coupling construction and the null property (5.5) to show that $n^{-1} \tau_n \to \pi(0)^{-1}$ in probability. For $s > 0$, denote $\chi_s(x) := x \mathbf{1}\{x \le s\}$, and compare the sums of the $\sigma_{k+1}$ with the corresponding truncated sums; a similar argument holds for the $\sigma^\star_{k+1}$. Hence, for any $\varepsilon > 0$, there exists $s_0 < \infty$ such that the truncation error is at most $\varepsilon$ for all $s \ge s_0$ and all $n$. On the event $E_{k,s}$ (the coupling started at $\tau_k$ succeeds for $s$ steps) we have $\chi_s(\sigma_{k+1}) = \chi_s(\sigma^\star_{k+1})$. Then, for fixed $s \ge s_0$, Lemma 4.1 shows that we may choose $x \ge x_0$ large enough that the coupling fails before time $s$ with small probability, uniformly in $n$. Combining this with the null property (5.5), we obtain that, for fixed $s \ge s_0$, the truncated sums of the $\sigma_{k+1}$ and the $\sigma^\star_{k+1}$ agree with high probability; thus, with (5.6), the two sums are close. Since $\varepsilon > 0$ was arbitrary, and the $\sigma^\star_{k+1}$ are i.i.d. random variables with mean $\pi(0)^{-1}$, it follows that $n^{-1} \tau_n \to \pi(0)^{-1}$ in probability.
The claimed result now follows by inverting the law of large numbers: for example, $\mathbb{P}[N(n) < (\pi(0) - \varepsilon) n] = \mathbb{P}[\tau_{\lceil (\pi(0) - \varepsilon) n \rceil} > n]$, which tends to $0$ as $n \to \infty$ for any $\varepsilon > 0$; similarly in the other direction.
In the proof of Theorem 2.6 we will use two facts about convergence in distribution that we now recall (see e.g. [6, p. 73]). First, if sequences of random variables ξ n and ζ n are such that ζ n → ζ in distribution for some random variable ζ and |ξ n − ζ n | → 0 in probability, then ξ n → ζ in distribution (this is Slutsky's theorem). Second, if ζ n → ζ in distribution and α n → α in probability, then α n ζ n → αζ in distribution.

A Proof of the coupling lemma
In this appendix we give the deferred technical proof of our coupling result, Lemma 4.1.
Proof of Lemma 4.1. As commented earlier, the proof follows an almost standard coupling argument. Indeed, since the first two statements of the lemma will be satisfied for any coupling of $(X_n, \eta_n)$ and $(\eta^\star_n)$ on a common probability space, in order also to prove (4.1) and (4.2) it makes sense to use a maximal coupling of $\eta_n$ and $\eta^\star_n$, which we construct in a step-wise fashion. For us, the condition that $q_x(i, j)$ has a limit as $x \to \infty$ means that the probability of decoupling at any step will be small, provided that $X_n$ stays sufficiently large. This introduces some complications to the standard coupling argument, as we will need to keep control of the variation of $X_n$.
We construct the Markov chain $(X_n, \eta_n, \eta^\star_n)$ by describing a single step: • If $\eta_n = \eta^\star_n$ then produce $(X_{n+1}, \eta_{n+1})$ from $(X_n, \eta_n)$ according to the transition probabilities $p(x, i, y, j)$, and produce $\eta^\star_{n+1}$ from $\eta^\star_n$ independently according to the transition probabilities $q(i, j)$.
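The single-step maximal coupling underlying this construction can be made concrete. The sketch below is illustrative code, not from the paper; the distributions `p` and `q` are arbitrary placeholders. It samples a pair $(i, j)$ with marginals $p$ and $q$ while maximising $\mathbb{P}(i = j)$, so that the decoupling probability equals the total variation distance:

```python
import random

def maximal_coupling_step(p, q, rng):
    """Sample (i, j) with marginals p and q such that
    P(i != j) equals the total variation distance between p and q."""
    overlap = [min(pi, qi) for pi, qi in zip(p, q)]
    m = sum(overlap)  # m = 1 - d_TV(p, q)
    if rng.random() < m:
        # With probability m, draw once from the normalized overlap and
        # output the same state twice: the two chains stay coupled.
        i = rng.choices(range(len(p)), weights=overlap)[0]
        return i, i
    # Otherwise draw i and j independently from the normalized residuals,
    # which are supported on disjoint "excess" mass.
    res_p = [pi - oi for pi, oi in zip(p, overlap)]
    res_q = [qi - oi for qi, oi in zip(q, overlap)]
    i = rng.choices(range(len(p)), weights=res_p)[0]
    j = rng.choices(range(len(q)), weights=res_q)[0]
    return i, j

# Illustrative distributions on a three-point state space {0, 1, 2},
# with d_TV(p, q) = 0.1.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
rng = random.Random(1)
samples = [maximal_coupling_step(p, q, rng) for _ in range(50_000)]
decouple_rate = sum(i != j for i, j in samples) / len(samples)
```

In the setting of the lemma, the decoupling probability at each step is governed by the distance between $q_x(i, \cdot)$ and $q(i, \cdot)$, which is small when $X_n$ is large.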
Then, given $\eta_{n+1} = j$, we produce $X_{n+1}$ via the appropriate conditional distribution, which in turn yields the correct transition probabilities for $(X_n, \eta_n)$. To complete the proof we need to show that the decoupling probability is small for $x$ sufficiently large, so (A.2) will follow from $\lim_{r \to \infty} \max_{x,i}$