On recurrence and transience of multivariate near-critical stochastic processes

We obtain complementary recurrence and transience criteria for processes $X=(X_n)_{n \ge 0}$ with values in $\mathbb R^d_+$ satisfying a non-linear equation $X_{n+1}=MX_n+g(X_n)+ \xi_{n+1}$. Here $M$ denotes a primitive matrix with Perron-Frobenius eigenvalue 1, and $g$ denotes a function which is small compared to $\|x\|$ at infinity. The conditional expectation and variance of the noise $(\xi_{n+1})_{n \ge 0}$ are such that $X$ obeys a weak form of the Markov property. The results generalize the criteria for the 1-dimensional case obtained in [5].


Introduction and main results
For Markov chains with a higher-dimensional state space it is in general difficult to obtain criteria for recurrence or transience which cover a broader class of models; typically this requires specific assumptions on the type of model. In this paper we consider discrete-time stochastic processes $X = (X_n)_{n \ge 0}$ taking values in the positive orthant $\mathbb R^d_+$ (consisting of column vectors) with $d \ge 1$, which obey non-linear equations of the form
$$X_{n+1} = M X_n + g(X_n) + \xi_{n+1}, \quad n \in \mathbb N_0. \tag{1.1}$$
Here $M$ denotes a $d \times d$ matrix with non-negative entries and $g : \mathbb R^d_+ \to \mathbb R^d_+$ a measurable function. Let us successively discuss our assumptions on $M$, $g$ and the random fluctuations $(\xi_{n+1})_{n \ge 0}$. We require that $M$ is a primitive matrix, meaning that for a certain power of $M$ all entries are (strictly) positive. Then it is known from Perron-Frobenius theory that $M$ has a left eigenvector $\ell = (\ell_1, \dots, \ell_d)$ and a right eigenvector $r = (r_1, \dots, r_d)^T$ belonging to some positive eigenvalue and possessing only positive entries. We assume that this eigenvalue is 1:
$$\ell M = \ell, \quad M r = r.$$
Further, $\ell$ and $r$ are unique up to scaling factors. As is customary we choose them such that
$$\ell r = 1. \tag{1.2}$$
For the function $g$ we assume that
$$g(x) = o(\|x\|) \quad \text{as } \|x\| \to \infty \tag{1.3}$$
with some norm $\|\cdot\|$ on the Euclidean space $\mathbb R^d$. As to the random fluctuations we demand that $X$ is adapted to a filtration $\mathcal F = (\mathcal F_n)_{n \ge 0}$ such that
$$E[\ell \xi_{n+1} \mid \mathcal F_n] = 0, \quad E[(\ell \xi_{n+1})^2 \mid \mathcal F_n] = \sigma^2(X_n) \quad \text{a.s.} \tag{1.4}$$
for some measurable function $\sigma : \mathbb R^d_+ \to \mathbb R_+$ fulfilling
$$\sigma(x) = o(\|x\|) \quad \text{for } \|x\| \to \infty. \tag{1.5}$$
In view of applications such as branching processes we may summarize these requirements as the assumption of near-criticality. Quite a few models fit into this framework; here we do not dwell on them but refer to the paper [6] and to the literature cited therein. The assumption (1.4) establishes a weak form of the Markov property: we do not assume that $X$ is a Markov chain but just formulate those assumptions which are required for the martingale considerations in our proofs. Certainly applications of our results will typically concern Markov chains.
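To make the setting concrete, here is a minimal simulation sketch of a process of type (1.1). All concrete choices below are illustrative assumptions, not taken from the paper: a specific $2 \times 2$ primitive matrix $M$ with Perron-Frobenius eigenvalue 1, a square-root drift $g$, a square-root noise scale $\sigma$, and symmetric two-valued noise pushed along $r$.

```python
import math
import random

# Illustrative primitive matrix with Perron-Frobenius eigenvalue 1
# (rows sum to 1, so M r = r for r = (1, 1)^T).
M = [[0.2, 0.8],
     [0.6, 0.4]]
r = [1.0, 1.0]            # right eigenvector: M r = r
ell = [3 / 7, 4 / 7]      # left eigenvector: ell M = ell, normalized so ell . r = 1

def mat_vec(A, x):
    """Multiply a 2x2 matrix by a column vector."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def g(x):
    """Illustrative drift with g(x) = o(|x|), here of square-root order."""
    return [math.sqrt(1.0 + x[0]), math.sqrt(1.0 + x[1])]

def sigma(x):
    """Illustrative conditional noise scale with sigma(x) = o(|x|)."""
    return math.sqrt(1.0 + x[0] + x[1])

def step(x, rng):
    """One step X_{n+1} = M X_n + g(X_n) + xi_{n+1}, clipped to stay in R^d_+."""
    s = sigma(x) * rng.choice([-1.0, 1.0])   # centred two-valued noise
    y = mat_vec(M, x)
    gx = g(x)
    return [max(y[i] + gx[i] + s * r[i], 0.0) for i in range(2)]

rng = random.Random(1)
x = [5.0, 5.0]
for _ in range(1000):
    x = step(x, rng)
```

The clipping at 0 slightly distorts the noise near the boundary; it is only there to keep the sketch inside the state space $\mathbb R^d_+$.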
The aim of this paper is to establish criteria which allow one to decide whether $\|X_n\| \to \infty$ is an event of zero probability or not. Loosely speaking, these are criteria for recurrence or transience of our models. In the univariate case $d = 1$ this question has been discussed in [5]. Ignoring some side conditions, the result there was as follows: If
$$x\,g(x) \le \Big(\frac12 - \varepsilon\Big)\,\sigma^2(x)$$
for some $\varepsilon > 0$ and for $x$ sufficiently large, then we have recurrence. If on the other hand
$$x\,g(x) \ge \Big(\frac12 + \varepsilon\Big)\,\sigma^2(x)$$
for some $\varepsilon > 0$ and for $x$ sufficiently large, then there is transience. Heuristically this can be understood as follows: in the first regime it is the noise $\xi_{n+1}$ which dominates the drift $g(X_n)$, while in the second regime it is the other way round. We would like to generalize this dichotomy to the multivariate setting. A possible way of generalization is to suitably convert each of the two conditions to all $x \in \mathbb R^d_+$ with sufficiently large norm $\|x\|$, see Klebaner [7] and González et al. [3]. A relaxation of this approach for special choices of $g$ and $\sigma^2$, covering new examples, has been obtained by Adam [1]. Yet one can do with weaker assumptions. The intuition behind this assertion is that our processes behave in a sense one-dimensionally. More precisely, if the event $\|X_n\| \to \infty$ occurs, then in view of (1.3) and (1.5) it is the term $M X_n$ which dominates on the right-hand side of (1.1). Thus one would expect that $X_n$ escapes to $\infty$ approximately along the ray $\mathsf r = \{\nu r : \nu \ge 0\}$ spanned by the eigenvector $r$ of $M$. This suggests that the two conditions above are required only in certain vicinities of this ray. (The last assertion of Theorem 2 below confirms this heuristic.) To formalize these considerations let us introduce some notation. For any $x \in \mathbb R^d$ let
$$\hat x := r\ell\,x, \quad \check x := (I - r\ell)\,x, \quad \text{thus } x = \hat x + \check x,$$
with the identity matrix $I$. Note that $\hat x$ is the multiple $(\ell x)\,r$ of the vector $r$ and thus belongs to the ray $\mathsf r$. From (1.2), $r\ell\,r\ell = r\ell$, respectively $\hat{\hat x} = \hat x$, meaning that $r\ell$ is a projection matrix. Moreover $\ell \hat x = \ell x$ and $\ell \check x = 0$.
The two conditions $\hat x \in \mathsf r$ and $\ell \hat x = \ell x$ determine $\hat x \in \mathbb R^d$ uniquely.
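The decomposition $x = \hat x + \check x$ can be checked numerically. In the sketch below the eigenvector pair $\ell$, $r$ (normalized so that $\ell r = 1$) and the test vector are illustrative assumptions; any pair with positive entries would do.

```python
# Illustrative eigenvector pair, normalized so that ell . r = 1.
ell = [3 / 7, 4 / 7]
r = [1.0, 1.0]

def decompose(x):
    """Split x into x_hat = (ell x) r on the ray and x_check = (I - r ell) x."""
    lx = ell[0] * x[0] + ell[1] * x[1]          # the scalar ell x
    x_hat = [lx * r[0], lx * r[1]]              # projection onto the ray r
    x_check = [x[0] - x_hat[0], x[1] - x_hat[1]]
    return x_hat, x_check

x = [2.0, 9.0]
x_hat, x_check = decompose(x)
```

One verifies $x = \hat x + \check x$, $\ell \check x = 0$, and idempotence of the projection ($\hat{\hat x} = \hat x$) directly on this example.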
For convenience we require the additional moment condition (which could be relaxed):

(A1) There are a $\delta > 0$ and a $c < \infty$ such that a.s. $E[\|\xi_{n+1}\|^p \mid \mathcal F_n] \le c\,\sigma^p(X_n)$ with $p = 2 + \delta$.
Theorem 1.
Let (A1) be fulfilled and let $\varepsilon > 0$. Assume that for every $b > 0$ there exists some $a > 0$ such that
$$\ell x \cdot \ell g(x) \le \Big(\frac12 - \varepsilon\Big)\,\sigma^2(x)$$
for all $x \in \mathbb R^d_+$ with $\|x\| \ge a$ fulfilling $\|\check x\|^2 \le b\,\ell x \cdot \ell g(x)$. Then $P(\|X_n\| \to \infty) = 0$.
In the case $d = 1$ we have $\check x = 0$ and $\ell x \cdot \ell g(x) = x g(x)$, so that we are back at the result from [5]. Note that due to (1.3) the above condition $\|\check x\|^2 \le b\,\ell x \cdot \ell g(x)$ applies only to vectors $x \in \mathbb R^d_+$ with $\|\check x\| = o(\|x\|)$ for $\|x\| \to \infty$. Since also $\check x = 0$ for $x \in \mathsf r$, the condition defines a certain vicinity of the ray $\mathsf r$ (depending on $g$). Outside this region the relation between $g$ and $\sigma^2$ stays arbitrary.
For our second result, on divergence of $(X_n)_{n \ge 0}$, we first rule out an evident case by means of an assumption (A2). Moreover we strengthen (1.5) to an assumption (A3), where $\delta$ is as in assumption (A1).

Theorem 2.
Let (A1) to (A3) be fulfilled and let $\varepsilon > 0$. Assume that for every $b > 0$ there exists some $a > 0$ such that
$$\ell x \cdot \ell g(x) \ge \Big(\frac12 + \varepsilon\Big)\,\sigma^2(x) \tag{1.6}$$
for all $x \in \mathbb R^d_+$ with $\|x\| \ge a$ fulfilling
$$\|\check x\| \le b\,\sigma(x). \tag{1.7}$$
Then there is a real number $v \ge 0$ such that
$$P\big(\limsup_n \|X_n\| \le v \ \text{ or } \ \|X_n\| \to \infty\big) = 1.$$
If also $P(\sup_{n \ge 0} \|X_n\| > c) > 0$ for every $c > 0$, then $P(\|X_n\| \to \infty) > 0$ and
$$P\Big(\frac{X_n}{\|X_n\|} \to \frac{r}{\|r\|} \ \Big|\ \|X_n\| \to \infty\Big) = 1.$$
Again we recover for $d = 1$ the corresponding result from [5]. Due to (A3) it is now the condition $\|\check x\| \le b\,\sigma(x)$ which determines the vicinity of the ray $\mathsf r$ where $g(x)$ and $\sigma^2(x)$ are interrelated.
Remark. Let us comment on the assumptions of Theorem 2.
1. Obviously (A2) is also a necessary requirement in Theorem 2. Typically it is easily checked in concrete examples. For Markov chains with a countable discrete state space $S \subset \mathbb R^d_+$ it says that away from zero there are no absorbing states. In the general case there is the following criterion: (A2) holds if $\ell g(x)$ is uniformly bounded away from zero on sets of the form $\{x \in \mathbb R^d_+ : u \le \ell x \le u + 1\}$ with $u > 0$ sufficiently large. For the proof of this claim adapt the arguments at the end of Section 2 in [5] to the process $(\ell X_n)_{n \ge 0}$.
2. Assumption (A3) cannot be weakened substantially in our general context. This follows from Example C in Section 3 of [5]. We note that (A3) is weaker than the corresponding assumption in [5] for the 1-dimensional case.
3. Remarkably, condition (1.7) cannot be relaxed in our general context. It is not enough to require (1.7) just for some $b > 0$, as we shall see at the end of this paper by means of a counterexample. It is tempting to conjecture that condition (1.6) cannot be weakened either.
So far we have not specified any choice of the norm on $\mathbb R^d$. This was not necessary, since, as is well known, all norms on a finite-dimensional Euclidean space are equivalent, and one easily convinces oneself that all our conditions and statements involving norms are preserved if one passes to an equivalent norm. Thus in examples one may work with the most convenient one, e.g. the $l_1$- or $l_2$-norm. For our proofs, however, these norms are not appropriate; we shall utilize a norm specifically suited to our purposes, which is introduced in Section 2. The proofs of the theorems are then presented in Sections 3 and 4. They use ideas from [5] and [8] and are based on the construction of suitable Lyapunov functions. Section 5 contains the counterexample.
For notational convenience we use the symbol c for a positive constant which may change its value from line to line.

A useful norm
Let us briefly put together the facts on matrices which we are going to use. Recall that $M$ is a primitive matrix with Perron-Frobenius eigenvalue 1 and corresponding left and right eigenvectors $\ell$ and $r$. Then, as is well known from Perron-Frobenius theory, all other eigenvalues of $M$ have modulus strictly smaller than 1, and hence so does every eigenvalue of $M - r\ell$. This maximal modulus is called the spectral radius of the matrix $M - r\ell$. It follows from matrix theory (see [4], Lemma 5.6.10) that one can construct a matrix norm $|||\cdot|||$ on the space of all $d \times d$ matrices such that
$$\rho := |||M - r\ell||| < 1.$$
From this matrix norm we obtain (see [4], Theorem 5.7.13) a functional $\|\cdot\|$ on $\mathbb R^d$ via
$$\|x\| := |||C_x|||,$$
where $C_x$ denotes the $d \times d$ matrix having all columns equal to $x$.
$\|\cdot\|$ is a norm, since the properties of norms transfer from $|||\cdot|||$ directly to $\|\cdot\|$. This is the norm we are going to work with in the sequel. It has the property
$$\|Ax\| \le |||A||| \cdot \|x\| \tag{2.1}$$
for $x \in \mathbb R^d$ and any $d \times d$ matrix $A$. Indeed, $C_{Ax} = AC_x$, and the property $|||AC_x||| \le |||A||| \cdot |||C_x|||$ of matrix norms gives the claim. In particular
$$\|(M - r\ell)x\| \le \rho\,\|x\|. \tag{2.2}$$
By equivalence of norms we may change from $\|\cdot\|$ to any other norm. In particular there is a constant $\lambda < \infty$ such that
$$\|\check x\| \le \lambda\,\ell x \quad \text{for all } x \in \mathbb R^d_+. \tag{2.3}$$
To see this, observe that from the inequality (2.1) we have $\|\check x\| \le \gamma\,\|x\|$ with $\gamma = |||I - r\ell|||$. Also $\|x\|_\ell := \ell_1|x_1| + \cdots + \ell_d|x_d|$ defines a norm on $\mathbb R^d$, since $\ell_i > 0$ for all $i = 1, \dots, d$.
Thus by equivalence of norms we arrive at (2.3).
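The contraction expressed by $\rho < 1$ can be seen numerically. The sketch below does not construct the matrix norm of [4]; for one concrete $2 \times 2$ primitive matrix (an illustrative assumption, not from the paper) it merely checks that $M - r\ell$ annihilates the ray direction $r$ and shrinks every other vector geometrically.

```python
# Illustrative primitive M with PF eigenvalue 1; its second eigenvalue is -0.4.
M = [[0.2, 0.8],
     [0.6, 0.4]]
ell = [3 / 7, 4 / 7]       # ell M = ell, normalized so that ell . r = 1
r = [1.0, 1.0]             # M r = r

# Q = M - r ell has eigenvalues 0 and -0.4, hence spectral radius 0.4 < 1.
Q = [[M[i][j] - r[i] * ell[j] for j in range(2)] for i in range(2)]

def mat_vec(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def norm1(x):
    return abs(x[0]) + abs(x[1])

# Q annihilates the ray direction r ...
qr = mat_vec(Q, r)

# ... and iterating Q shrinks any other vector geometrically (rate 0.4 here).
x = [1.0, -1.0]
decay = []
for _ in range(8):
    x = mat_vec(Q, x)
    decay.append(norm1(x))
```

The geometric decay of `decay` is the finite-dimensional content of (2.2): in the specially built norm one even has contraction in a single step.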
In order to apply these results to our process $(X_n)_{n \ge 0}$, note that we have $(I - r\ell)M = M - r\ell = M(I - r\ell)$ and $\ell \check X_n = 0$, thus
$$\check X_{n+1} = (M - r\ell)\check X_n + \check g(X_n) + \check \xi_{n+1},$$
and consequently
$$\|\check X_{n+1}\| \le \rho\,\|\check X_n\| + c\,\ell g(X_n) + \|\check \xi_{n+1}\| \tag{2.4}$$
for some $c < \infty$. (Here we need that $g(x)$ has only non-negative components.) Further observe that for any $\mu > 0$ and $a, b \ge 0$ we have
$$(a + b)^2 \le (1 + \mu)\,a^2 + \big(1 + \mu^{-1}\big)\,b^2.$$
Applying this estimate twice to the right-hand side of (2.4) we obtain for any $\mu > 0$
$$\|\check X_{n+1}\|^2 \le (1 + \mu)\rho^2\,\|\check X_n\|^2 + c\,(\ell g(X_n))^2 + c\,\|\check \xi_{n+1}\|^2 \tag{2.5}$$
with a suitable $c < \infty$.
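The elementary inequality $(a+b)^2 \le (1+\mu)a^2 + (1+\mu^{-1})b^2$ invoked here follows from $2ab \le \mu a^2 + b^2/\mu$; it can be sanity-checked on a grid:

```python
# Check (a + b)^2 <= (1 + mu) a^2 + (1 + 1/mu) b^2 for a, b >= 0 and mu > 0.
# Proof idea: 2ab <= mu * a^2 + b^2 / mu (AM-GM), then expand the square.
def bound_holds(a, b, mu):
    # Small epsilon guards against floating-point rounding.
    return (a + b) ** 2 <= (1 + mu) * a ** 2 + (1 + 1 / mu) * b ** 2 + 1e-12

checks = [bound_holds(a / 10, b / 10, mu)
          for a in range(50)
          for b in range(50)
          for mu in (0.1, 0.5, 1.0, 2.0, 10.0)]
```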

Proof of Theorem 1
First observe that if we replace $X_n$ by $\bar X_n := X_n + r$ for all $n \ge 0$, then equations (1.1) and (1.4) as well as assumption (A1) still hold, if $g(x)$ and $\sigma^2(x)$ are replaced by $\bar g(x) := g(x - r)$ and $\bar\sigma^2(x) := \sigma^2(x - r)$. Note that the assumptions (1.3) and (1.5) are not affected if $g$ and $\sigma^2$ are substituted by $\bar g$ and $\bar\sigma^2$, and the same holds true for the conditions formulated in Theorem 1 if one replaces $\varepsilon$ by $\varepsilon/2$. Thus without loss of generality we may assume $\ell X_n \ge 1$ for all $n \ge 0$ throughout the proof. Then for any $\alpha > 0$
$$L_n := \frac{\|\check X_n\|^2}{(\ell X_n)^2} + \alpha \log \ell X_n, \quad n \in \mathbb N_0,$$
is a sequence of non-negative random variables. We show that for large $\alpha$ it possesses a supermartingale property. The proof uses the following estimate, where $I(A)$ denotes the indicator variable of an event $A$.

Lemma 2.
If $\alpha$ is chosen large enough, then there is a number $s > 0$ such that $E[L_{n+1} \mid \mathcal F_n] \le L_n$ a.s. on the event $\{\ell X_n \ge s\}$.

Proof. Since $\ell M = \ell$ we have the equation
$$\ell X_{n+1} = \ell X_n + \ell g(X_n) + \ell \xi_{n+1}. \tag{3.1}$$
Moreover, by (2.5),
$$\|\check X_{n+1}\|^2 \le (1 + \mu)\rho^2\,\|\check X_n\|^2 + c\,(\ell g(X_n))^2 + c\,\|\check \xi_{n+1}\|^2 \tag{3.2}$$
for some sufficiently large $c < \infty$. Now $\rho < 1$; thus, if $\mu$ is sufficiently close to 0, this yields a contractive bound for $E[\|\check X_{n+1}\|^2 \mid \mathcal F_n]$, where in view of (A1), if we further enlarge $c$, the noise contribution is bounded by $c\,\sigma^2(X_n)$. Next, from (3.1) and Lemma 1 (with $t = \ell X_n + \ell g(X_n)$ and $h = \ell \xi_{n+1}$) we obtain for $\eta > 0$ an upper estimate of $\log \ell X_{n+1}$ in terms of $\log(\ell X_n + \ell g(X_n))$. By means of (1.3), (1.4), (A1) and the Markov inequality, and choosing $\eta$ sufficiently small, it follows for $\ell X_n$ sufficiently large that $E[\log \ell X_{n+1} \mid \mathcal F_n]$ obeys a corresponding bound with some $c < \infty$. Because of (1.5) there is a number $s > 0$ such that these estimates hold a.s. for $\ell X_n \ge s$.
These bounds combine for $\ell X_n \ge s$ and $s$ sufficiently large. If we let $\alpha \ge 6c/\varepsilon - c$, we arrive at a supermartingale estimate for $\ell X_n \ge s$. We are now ready for the conclusion: If $(\alpha + c)\,\ell g(X_n) \cdot \ell X_n \le \mu\,\|\check X_n\|^2$, then obviously $E[L_{n+1} \mid \mathcal F_n] \le L_n$ a.s. for $\ell X_n \ge s$. If on the other hand $\mu\,\|\check X_n\|^2 \le (\alpha + c)\,\ell g(X_n) \cdot \ell X_n$, then by equivalence of norms there is a $b < \infty$ such that $\|\check X_n\|^2 \le b\,\ell g(X_n) \cdot \ell X_n$. Now the assumption of Theorem 1 comes into play, and again $E[L_{n+1} \mid \mathcal F_n] \le L_n$ a.s., if only $\ell X_n$ is large enough. Thus the claim of the lemma follows.
We complete the proof of Theorem 1 now as in [5]. Suppose that the event $\{\ell X_n \to \infty\}$ has positive probability. Then the same holds for the event $\{L_n \to \infty\}$, and there is a natural number $N$ such that $P(E) > 0$ for the event
$$E := \{L_n \ge s \text{ for all } n \ge N\} \cap \{L_n \to \infty\}.$$
Define the stopping time
$$T_N := \min\{n \ge N : L_n < s\}.$$
In view of Lemma 2 the process (L n∧T N ) n≥N is a supermartingale. It is non-negative and thus a.s. convergent. However, on the event E we have T N = ∞ and L n → ∞ and consequently L n∧T N → ∞. This contradicts the assumption P(E) > 0, and the proof is finished.

Proof of Theorem 2
Here we may replace $X_n$ by $X_n + 3r$. Therefore without loss of generality we assume $\ell X_n \ge 3$ for all $n \in \mathbb N_0$. Now we consider processes $L = L^{\alpha,\beta,\gamma,j}$, depending on the $j$th component $X_{n,j}$ of $X_n$, $1 \le j \le d$, and on parameters $\alpha, \beta > 0$ and $\gamma \ge 0$. In view of the Jensen inequality we may without loss of generality restrict ourselves to the case $2 < p \le 3$, in which the following estimate is valid.
Lemma 3. There is a constant $c < \infty$ such that for all $t \ge 3$ and $h > 3 - t$ the logarithmic estimate of formula (6) in [5] holds.

Proof. See formula (6) in [5].

Lemma 4. Let $0 < \beta < \kappa\delta - 1$ and $\gamma \ge 0$ be such that $(1 + \gamma/\ell_j)\rho^2 < 1$. Then, if $\alpha$ is sufficiently large, there is a real number $s > 0$ such that $E[L_{n+1} \mid \mathcal F_n] \le L_n$ a.s. on the event $\{\ell X_n \ge s\}$.

Proof. We proceed similarly as in the proof of Lemma 2. Here, instead of (3.2), we estimate $\|\check X_{n+1}\|^2$ by means of (2.5); by the assumption on $\gamma$ and for $\mu > 0$ sufficiently small this implies a contractive bound with some $c < \infty$. Next we apply Lemma 3 with $t = \ell X_n$ and $h = \ell g(X_n) + \ell \xi_{n+1}$, together with (2.5) and (3.1) and with $\ell g(X_n) \ge 0$, for $\ell X_n$ sufficiently large. Combining this estimate with (4.1) and rearranging terms, the claim follows for $L = L^{\alpha,\beta,\gamma,j}$.

Observe the following for some $s > 0$ and for $m > 0$ and $t > s$: If we choose $\alpha, \beta, \gamma$ and $s$ as demanded in Lemma 4, then $(m \wedge L_n)_{n \ge 0}$ becomes a non-negative supermartingale, which thus is a.s. convergent. Then, up to a null event, there arise three possibilities. Either $L_n \to 0$; then $\ell X_n \to \infty$. Or $\liminf_n L_n \ge m$; then $\limsup_n \ell X_n \le t$. Or else $L_n$ has a limit $0 < L_\infty < m$; then $s \le \liminf_n \ell X_n < \infty$. In order to transfer these alternatives to the process $(X_n)_{n \ge 0}$ we choose different $\beta_1, \beta_2 > 0$ and a $\gamma > 0$ fitting the assumptions of Lemma 4. We consider the processes $L^0 := L^{\alpha,\beta_1,0,1}$, $L^1 := L^{\alpha,\beta_1,\gamma,1}, \dots, L^d := L^{\alpha,\beta_1,\gamma,d}$, $L^{d+1} := L^{\alpha,\beta_2,0,1}$, and for some $s, t, m > 0$ the events $E$, $E'$, $E''$ corresponding to the three alternatives above. We let $\alpha, s, t$ be large and $m$ small enough such that the above conclusion applies simultaneously to all processes $L^0, \dots, L^{d+1}$. Then $P(E \cup E' \cup E'') = 1$. Let us show that $P(E'') = 0$ for $s$ sufficiently large. On $E''$ the sequence $\ell X_n$ is convergent with $s \le \lim_n \ell X_n < \infty$. This means that the random variables $\hat X_n = r \ell X_n$ converge on $E''$. Next, from the definition of $L^0$ it follows that the sequence $\|\check X_n\|$ converges on the event $E''$ with some limit $Z$. If $Z = 0$ then $\check X_n \to 0$, and we obtain that $X_n = \hat X_n + \check X_n$ is convergent on $E''$. If on the other hand $Z > 0$, then we see from the convergence of $L^1_n, \dots, L^d_n$ that the components $X_{n,1}, \dots, X_{n,d}$ all converge on $E''$. Again we conclude that $X_n$ is a convergent sequence on the event $E''$. Let $X_\infty$ be the limit. Now, given $u > 0$, if we choose $s$ sufficiently large, then from $s \le \lim_n \ell X_n < \infty$ on $E''$ we obtain $u \le \|X_\infty\| < \infty$ by equivalence of norms. Therefore assumption (A2) may be applied, and we obtain $P(E'') = 0$ and consequently $P(E \cup E') = 1$. By equivalence of norms this translates into the first assertion of Theorem 2.
For the second assertion we switch back to the supermartingale $m \wedge L$ with $\gamma = 0$. Let $c > t$ be such that $P(\sup_{n \ge 0} \ell X_n > c) > 0$; by the supermartingale property this yields $E[\lim_n m \wedge L_n] < \alpha(\log t)^{-\beta}$. If now $P(E') = 1$, then $\lim_n m \wedge L_n \ge \alpha(\log t)^{-\beta}$ a.s., which contradicts the last inequality. Therefore it follows that $P(E) > 0$. This gives the second assertion.
For the last assertion we first show that $\|\xi_{n+1}\| = o(\ell X_n)$ a.s. on the event $\{\ell X_n \to \infty\}$. If again $\alpha, \beta, \gamma$ and $s$ are chosen in accordance with Lemma 4, then $(L_{n \wedge T_N})_{n \ge 0}$ is a non-negative supermartingale and thus a.s. convergent. It follows that
$$\sum_{k=0}^\infty \frac{\sigma^p(X_k)}{(\ell X_k)^p} < \infty \quad \text{a.s. on the event } \{T_N = \infty\}.$$
Now in view of the first assertion of this theorem $\{T_N = \infty\} \uparrow \{\ell X_n \to \infty\}$ for $N \to \infty$, if only $s$ is sufficiently large. Therefore
$$\sum_{k=0}^\infty \frac{\sigma^p(X_k)}{(\ell X_k)^p} < \infty \quad \text{a.s. on the event } \{\ell X_n \to \infty\}.$$
Because of (A1) and the Markov inequality this entails for every $\eta > 0$
$$\sum_{k=0}^\infty P\big(\|\xi_{k+1}\| > \eta\,\ell X_k \mid \mathcal F_k\big) < \infty \quad \text{a.s. on the event } \{\ell X_n \to \infty\},$$
and the martingale version of the Borel-Cantelli Lemma (see [2]) yields $\|\xi_{n+1}\| = o(\ell X_n)$ a.s. on this event. By induction, using (2.4), it follows that $\check X_n = o(\ell X_n)$ a.s. on the event $\{\ell X_n \to \infty\}$. On the other hand $\hat X_n/\|\hat X_n\| = r/\|r\|$. This yields the last claim of Theorem 2.
However, due to the definition of $g(x)$, the condition (1.7) will never be satisfied for $b > 1$, no matter how $g$ and $\sigma$ are chosen. We shall see that indeed the conclusion of Theorem 2 fails, even though (1.7) can be achieved for $b \le 1$ (but not for all $b$). The reason is that the process $X$ again and again leaves the region defined by the inequality $\|\check x\| \le \sigma(x)$.