Scaling limits of recurrent excited random walks on integers

We describe scaling limits of recurrent excited random walks (ERWs) on the integers in i.i.d. cookie environments with a bounded number of cookies per site. We allow both positive and negative excitations. It is known that an ERW is recurrent if and only if the expected total drift per site, delta, belongs to the interval [-1,1]. We show that if |delta|<1 then the diffusively scaled ERW under the averaged measure converges to a (delta,-delta)-perturbed Brownian motion. In the boundary case, |delta|=1, the space scaling has to be adjusted by an extra logarithmic factor, and the weak limit of the ERW turns out to be a constant multiple of the running maximum of standard Brownian motion, a transient process.

An element of Ω_M is called a cookie environment. For each z ∈ Z, the sequence {ω_z(i)}_{i∈N} can be thought of as a stack of cookies at site z. The number ω_z(i) represents the transition probability from z to z + 1 of a nearest-neighbor random walk upon the i-th visit to z. If ω_z(i) ≥ 1/2 (resp. ω_z(i) < 1/2) the corresponding cookie is called non-negative (resp. negative).
Let P be a probability measure on Ω_M which satisfies the following two conditions: (A1) Independence: the sequence (ω_z(·))_{z∈Z} is i.i.d. under P; (A2) Non-degeneracy: E[∏_{i=1}^{M} ω_0(i)] > 0 and E[∏_{i=1}^{M} (1 − ω_0(i))] > 0. For x ∈ Z and ω ∈ Ω_M consider an integer-valued process X := (X_j), j ≥ 0, on some probability space (X, F, P_{x,ω}), which P_{x,ω}-a.s. satisfies P_{x,ω}(X_0 = x) = 1 and

P_{x,ω}(X_{n+1} = X_n + 1 | F_n) = 1 − P_{x,ω}(X_{n+1} = X_n − 1 | F_n) = ω_{X_n}(L_{X_n}(n)),

where F_n ⊂ F, n ≥ 0, is the natural filtration of X and L_m(n) := Σ_{j=0}^{n} 1_{{X_j = m}} is the number of visits to site m by X up to time n. Informally speaking, upon each visit to a site the walker eats the topmost cookie from the stack at that site and makes one step to the right or to the left with probabilities prescribed by this cookie. The consumption of a cookie ω_z(i) induces a drift of size 2ω_z(i) − 1. Since ω_z(i) = 1/2 for all i > M, the walker makes unbiased steps from z starting from the (M + 1)-th visit to z. Let δ be the expected total drift per site, i.e.

(1) δ := E[Σ_{i=1}^{M} (2ω_0(i) − 1)].

The parameter δ plays a key role in the classification of the asymptotic behavior of the walk. For a fixed ω ∈ Ω_M the measure P_{x,ω} is called quenched. The averaged measure P_x is obtained by averaging over environments, i.e. P_x(·) := E(P_{x,ω}(·)).
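For illustration only (not part of the formal argument), the dynamics just described can be simulated directly. The helper names below are ours; the environment shown is the simple example with M = 2 identical cookies ω_z(1) = ω_z(2) = 3/4 at every site, for which δ = 2(2·(3/4) − 1) = 1.

```python
import random

def simulate_erw(omega, n_steps, rng):
    """Simulate an excited random walk on Z started at 0.

    omega(z, i) is the probability of stepping from z to z + 1 on the
    i-th visit to z; per the model, it equals 1/2 once the (at most M)
    cookies at z are consumed.
    """
    x, visits, path = 0, {}, [0]
    for _ in range(n_steps):
        visits[x] = visits.get(x, 0) + 1   # L_x(n): visits to x so far
        x += 1 if rng.random() < omega(x, visits[x]) else -1
        path.append(x)
    return path

def omega(z, i):
    # M = 2 cookies, each of strength 3/4, at every site; delta = 1.
    return 0.75 if i <= 2 else 0.5

path = simulate_erw(omega, 1000, random.Random(0))
```

Each call consumes the topmost remaining cookie at the current site, exactly as in the verbal description above.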
There is an obvious symmetry between positive and negative cookies: if the environment (ω_z)_{z∈Z} is replaced by (ω′_z)_{z∈Z}, where ω′_z(i) = 1 − ω_z(i) for all i ∈ N, z ∈ Z, then X′, the ERW corresponding to the new environment, satisfies X′ =_d −X, where =_d denotes equality in distribution. Thus, it is sufficient to consider only non-negative δ (this, of course, still allows both negative and positive cookies), and we shall always assume this to be the case.
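This distributional symmetry can be checked pathwise by a coupling: flipping the environment and reflecting the driving uniform variables reproduces the negated trajectory step by step. A minimal sketch (names ours; the discrete drivers on {0, ..., 99} are an assumption made to keep the comparison exact, and the cookie values 3/4, 1/4 are an arbitrary example with one positive and one negative cookie):

```python
import random

def walk_with_drivers(pct, drivers):
    # Discrete version of the walk: on the i-th visit to z the walker
    # steps right iff the driver u (uniform on {0, ..., 99}) satisfies
    # u < pct(z, i), i.e. with probability pct(z, i)/100.
    x, visits, path = 0, {}, [0]
    for u in drivers:
        visits[x] = visits.get(x, 0) + 1
        x += 1 if u < pct(x, visits[x]) else -1
        path.append(x)
    return path

def pct(z, i):
    # One positive cookie (3/4) and one negative cookie (1/4) per site.
    return 75 if i == 1 else (25 if i == 2 else 50)

def pct_flipped(z, i):
    return 100 - pct(z, i)   # the environment omega'_z(i) = 1 - omega_z(i)

rng = random.Random(1)
us = [rng.randrange(100) for _ in range(500)]
path = walk_with_drivers(pct, us)
# Reflecting the drivers (u -> 99 - u) in the flipped environment
# yields exactly the negated trajectory, illustrating X' =_d -X.
neg_path = walk_with_drivers(pct_flipped, [99 - u for u in us])
```

The step-by-step identity neg_path = −path holds deterministically under this coupling, which is stronger than (and implies) the equality in distribution.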
ERW on Z in a non-negative cookie environment and its natural extension to Z^d (when there is a direction in R^d such that the projection of the drift induced by every cookie on that direction is non-negative) were considered previously by many authors (see, for example, [4], [22], [23], [2], [3], [17], [5], [9], [16], and references therein).
Our model allows both positive and negative cookies but restricts their number per site to M. This model was studied in [14], [15], [20], [19]. It is known that the process is recurrent (i.e. for P-a.e. ω it returns to its starting point infinitely often) if and only if δ ≤ 1 ([14]). For transient (i.e. not recurrent) ERW, there is a rich variety of limit laws under P_0 ([15]).
In this paper we study scaling limits of recurrent ERW under P_0. The functional limit theorem for recurrent ERW in stationary ergodic non-negative cookie environments on strips Z × (Z/LZ), L ∈ N, under the quenched measure was proven in [9]. Our results deal only with i.i.d. environments on Z with a bounded number of cookies per site but remove the non-negativity assumption on the cookies. We are also able to treat the boundary case δ = 1. Extensions of these results and of the results of [15] to strips, to Z^d for d > 1, or to the "boundary" case of the model treated in [9] remain open problems.
Theorem 2 (Boundary case). Let δ = 1 and B*(t) := max_{s≤t} B(s), t ≥ 0. Then there exists a constant D > 0 such that the rescaled walk X^{(n)}, with space scaling √n log n, converges weakly under P_0 to DB*. Observe that for δ = 1 the limiting process is transient while the original process is recurrent. To prove Theorem 2 we consider the process S_j := max_{0≤i≤j} X_i, j ≥ 0, and show that after rescaling it converges to the running maximum of Brownian motion. The stated result then follows from the fact that with overwhelming probability the maximum amount of "backtracking" of X_j from S_j for j ≤ [Tn] is of order √n, which is negligible on the scale √n log n (see Lemma 10).

Notation and preliminaries
Assume that δ ≥ 0 and X_0 = 0. Let T_x = inf{j ≥ 0 : X_j = x} be the first hitting time of x ∈ Z. Set S_n = max_{k≤n} X_k, I_n = min_{k≤n} X_k, R_n = S_n − I_n + 1, n ≥ 0.
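In code (an illustrative convention; the names are ours), these quantities are simple path functionals:

```python
def hitting_time(path, x):
    """T_x: the first j >= 0 with X_j = x (None if x is never hit)."""
    return next((j for j, p in enumerate(path) if p == x), None)

def range_stats(path, n):
    """Return (S_n, I_n, R_n): running max, running min, and range at time n."""
    prefix = path[:n + 1]
    s, i = max(prefix), min(prefix)
    return s, i, s - i + 1

# A hand-made sample path X_0, ..., X_7:
path = [0, 1, 0, -1, 0, 1, 2, 1]
```

For instance, for this path T_2 = 6, T_{-1} = 3, and (S_7, I_7, R_7) = (2, -1, 4).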
Here we used the observation that the number of jumps from k to k + 1 before time T_n is equal to D_{n,k+1} + 1 for all 0 ≤ k ≤ n − 1. It follows from the definition that (D_{n,n}, D_{n,n−1}, . . . , D_{n,0}) is a Markov process. Moreover, it can be recast as a branching process with migration (see [14], Section 3, as well as [15], Section 2). Let V := (V_k), k ≥ 0, be the corresponding branching process with migration, started from V_0 = 0. Denote by σ ∈ [1, ∞] and Σ ∈ [0, ∞] respectively the lifetime and the total progeny over the lifetime of V, i.e. σ = inf{k > 0 : V_k = 0} and Σ = Σ_{0≤k<σ} V_k. The probability measure that corresponds to V will be denoted by P^V_0. The following result will be used several times throughout the paper.
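The crossing counts behind this representation can be checked pathwise. Writing D_{n,k} for the number of jumps from k to k − 1 before T_n (our reading of the notation), every path from 0 that first hits n satisfies #(jumps from k to k + 1 before T_n) = D_{n,k+1} + 1 for 0 ≤ k ≤ n − 1, and consequently T_n = n + 2 Σ_k D_{n,k}. A sketch on a hand-made path:

```python
def crossing_counts(path, n):
    """Count, for each site k, the up-jumps k -> k+1 and the
    down-jumps k -> k-1 made strictly before T_n = path.index(n)."""
    t = path.index(n)                      # assumes the path hits n
    ups, downs = {}, {}
    for j in range(t):
        if path[j + 1] == path[j] + 1:
            ups[path[j]] = ups.get(path[j], 0) + 1
        else:
            downs[path[j]] = downs.get(path[j], 0) + 1
    return ups, downs, t

# Hand-made path from 0 that first hits n = 5 at time 13.
path = [0, 1, 0, -1, 0, 1, 2, 1, 2, 3, 4, 3, 4, 5]
ups, D, t = crossing_counts(path, 5)
```

Both identities are deterministic facts about any such path: each edge (k, k + 1) with 0 ≤ k ≤ n − 1 is crossed upward exactly once more than downward.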

Non-boundary case: two useful lemmas
Let δ ∈ [0, 1). First of all, we show that by time n the walker consumes almost all the drift between I_n and S_n.
Proof. We shall start with (7) and use the connection with branching processes. Since the event we are interested in depends only on the environment and the behavior of the walk on {n − ℓ, n − ℓ + 1, . . . }, we may assume without loss of generality that the process starts at n − ℓ and thus, by translation invariance, consider only the case ℓ = n. At first, consider the case δ ∈ (0, 1). Let k = 0. Then (see (4) and (6)) for all sufficiently large n we get the required bound; since γ_1 > δ, this implies the desired estimate for k = 0. Let k ∈ {1, 2, . . . , M − 1}. Then for any ε > 0 we have the corresponding decomposition, and we only need to estimate the last term. Notice that by (A2) there is ε > 0 such that the probability of success in each relevant trial is at least ε for all k ∈ {1, . . . , M − 1} and j ∈ N. Therefore, the last term is bounded above by the probability that in at least [n^{γ_1}/M] independent Bernoulli trials with probability of success in each trial at least ε there are at most [εn^{γ_1}/(2M)] successes. This probability is bounded above by exp(−cn^{γ_1}/M) for some positive c = c(ε). This completes the proof of (7) for δ > 0.
If δ = 0 we modify the environment by increasing slightly the drift (to the right) in the first cookie at each site. Let Ṽ be the branching process corresponding to the modified environment. There is a natural coupling between V and Ṽ satisfying (9), and (7) for δ = 0 follows from the result for δ > 0 and the second line of (9).
Next, after replacing X by −X, proving (8) reduces to proving (7) for δ ≤ 0 and γ_1 > 0. As above, the result for δ ≤ 0 can be deduced from the result for δ ∈ (0, γ_1) by a coupling of the corresponding branching processes.
Next we show that √n is the correct scaling in Theorem 1.
Notice that by the Markov property and the stochastic monotonicity of V in the initial number of particles, (10) holds. Suppose that we can show that there exist K, n_0 ∈ N such that (11) holds for all n ≥ n_0. Then using (10) and (11) we get the desired bound for all L > 4K² and n ≥ √L n_0, and we are done.
To prove (11), we observe that due to (4) the sequence σ_m/m^{1/δ}, m ∈ N, has a limiting distribution ([10], Theorem 3.7.2) and, thus, if K is large then P_0(σ_{[(√K n)^δ]} > Kn) ≤ 1/4 for all large enough n. We conclude that there is an n_0 ∈ N such that the required bound holds for all n ≥ n_0. This immediately gives (11) if K is chosen sufficiently large.

Non-boundary case: Proof of Theorem 1
Let Δ_n = X_{n+1} − X_n and

(12) C_n := Σ_{j=0}^{n−1} E_{0,ω}(Δ_j | F_j) = Σ_{j=0}^{n−1} (2ω_{X_j}(L_{X_j}(j)) − 1), B_n := X_n − C_n.

Then X_n = B_n + C_n, where (B_n), n ≥ 0, is a martingale. Define B^{(n)}(t) := B_{[nt]}/√n, C^{(n)}(t) := C_{[nt]}/√n, and X^{(n)}(t) := X_{[nt]}/√n, t ≥ 0. Theorem 1 is an easy consequence of the following three lemmas, the first of which holds for the quenched and the last two for the averaged measures.
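The decomposition X_n = B_n + C_n is the usual Doob decomposition: the compensator C_n accumulates the conditional drifts 2ω_{X_j}(L_{X_j}(j)) − 1 of the consumed cookies, and B_n = X_n − C_n is a quenched martingale. A simulation sketch (names ours; the environment is an arbitrary example with M = 2 cookies of strength 3/4):

```python
import random

def erw_doob(omega, n_steps, rng):
    """Run the ERW, tracking the compensator C_n (running sum of the
    conditional drifts of the consumed cookies) and B_n = X_n - C_n."""
    x, visits = 0, {}
    X, C = [0], [0.0]
    for _ in range(n_steps):
        visits[x] = visits.get(x, 0) + 1
        p = omega(x, visits[x])
        C.append(C[-1] + (2 * p - 1))   # drift of the consumed cookie
        x += 1 if rng.random() < p else -1
        X.append(x)
    B = [xj - cj for xj, cj in zip(X, C)]
    return X, B, C

def omega(z, i):
    return 0.75 if i <= 2 else 0.5   # M = 2 cookies of strength 3/4

X, B, C = erw_doob(omega, 2000, random.Random(7))
```

In this example each site whose two cookies have both been consumed contributes exactly 2(2·(3/4) − 1) = δ to C_n, which is the heuristic behind the appearance of δ r_X in the proof of Theorem 1.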
Lemma 6. Let B be a standard Brownian motion. Then B^{(n)} ⇒ B in the J_1 topology as n → ∞ for P-a.e. ω.
Proof of Theorem 1 assuming Lemmas 6-8. Since X^{(n)}, n ≥ 1, is tight and B^{(n)} ⇒ B in J_1 as n → ∞, the sequence C^{(n)}, n ≥ 1, as the difference of two tight sequences, is also tight. We can assume by choosing a subsequence that X^{(n)} ⇒ X in J_1, where X is continuous by Lemma 8. The mapping x ↦ r_x, where r_x(t) := max_{s≤t} x(s) − min_{s≤t} x(s), is continuous at every continuous x. Therefore, by the continuous mapping theorem,

(13) r_{X^{(n)}} ⇒ r_X as n → ∞.

The tightness of C^{(n)}, n ≥ 1, (13), Lemma 7, and the "convergence together" result ([6], Theorem 3.1) imply that C^{(n)} ⇒ δ r_X in J_1 as n → ∞. Now we have a vector-valued sequence of processes (X^{(n)}, B^{(n)}, C^{(n)}), n ≥ 1, that is tight. Therefore, along a subsequence, this 3-dimensional process converges to (X, B, δ r_X). Since X^{(n)} = B^{(n)} + C^{(n)}, we get that X = B + δ r_X.
We shall conclude this section with proofs of Lemmas 6-8.
By Lemma 5, given ν > 0, we can choose K sufficiently large so that P_0(R_{[nt]} > K√n) < ν/2 for all n ∈ N. By the strong law of large numbers, the first term on the right-hand side of (15) does not exceed ν/2 for all large n. Thus, we only need to estimate the second term on the right-hand side of (15). Divide the interval [I_{[nt]}, S_{[nt]}] into subintervals of length n^{1/4}. By Lemma 4, given γ_1 ∈ (δ, 1), with probability at least 1 − Kn^{1/4}θ^{n^{γ_2/4}} all subintervals except the two extreme ones have at most n^{γ_1/4} points which are visited less than M times. Hence, for n sufficiently large the second term is small as well.

To prove both statements of Lemma 8 it is enough to show that there exist C_3, α > 0 such that (16) holds for all ℓ ∈ N and sufficiently large n, n > 2^ℓ, where Ω_{n,k,ℓ} is the event that the increment of X^{(n)} between the dyadic times k2^{−ℓ} and (k + 1)2^{−ℓ} is large (see e.g. the last paragraph in the proof of Lemma 1 in [12], Chapter III, Section 5). Split Ω_{n,k,ℓ} into the events Ω^B_{n,k,ℓ} and Ω^C_{n,k,ℓ} corresponding to the martingale part and the compensator, where B_n and C_n are defined in (12).
Since (B_{j+m_1} − B_{m_1}), j ≥ 0, is a martingale whose quadratic variation grows at most linearly, the maximal inequality and the Burkholder-Davis-Gundy inequality ([13], Theorem 2.11 with p = 4) imply the required moment bound. Hence, P_0(∪_{k<2^ℓ} Ω^B_{n,k,ℓ}) ≤ C′2^{−ℓ/2}. To control P_0(Ω^C_{n,k,ℓ}) consider the following intervals. To estimate P_0(Ω^C_{n,k,ℓ,3}) note that to accumulate a drift larger than J the walk should visit at least [J/M] distinct sites. Using Lemma 5, we can find K > 1 such that P_0(S_n > K√n) < 2^{−ℓ} for all sufficiently large n. Therefore, it remains to control the events Ω†_{n,m,ℓ} = {T_{(m+1)J} − T_{mJ} ≤ m_2 − m_1}. Since m_2 − m_1 ≤ CJ²/2^{6ℓ} for some constant C > 0, Lemma 5 implies that there is θ̃ < 1 such that the corresponding bound holds for all sufficiently large n. P_0(∪_{k<2^ℓ} Ω^C_{n,k,ℓ,1}) is estimated in the same way. We consider now A_2, which is a random subinterval of [−m_1, m_1] and, on Ω^C_{n,k,ℓ,2}, has length between J/M and 8J. To estimate P_0(Ω^C_{n,k,ℓ,2}) we notice that by Lemma 4, outside of an event of exponentially small (in J^{γ_2}) probability, the number of cookies that are left in A_2 at time m_1 does not exceed CJ^{γ_1}, where γ_1 < 1. Even if the walker consumes all cookies in that interval, it cannot build up a drift of size J ≫ CJ^{γ_1} (for J large). With this idea in mind, we turn now to a formal proof.
As we noted above, on Ω^C_{n,k,ℓ,2} we have A_2 ∈ I, where I denotes the set of all intervals of the indicated form. The cardinality of I does not exceed 16m_1 J ≤ Cn^{3/2}. Therefore,

(18) P_0(Ω^C_{n,k,ℓ,2}) ≤ Cn^{3/2} max_{A∈I} P_0(·).

By the definition of A_2, the walk necessarily crosses the interval A_2 by the time m_1. The leftover drift in A_2 is at most M times the number of sites in A_2 which still have at least one cookie. Writing A as [a, b], a, b ∈ Z, a < b, we can estimate the last probability as follows. If a ≥ 0 we can apply Lemma 4 and get the required bound for all sufficiently large n (such that (8J)^{γ_1} ≤ J/M). The case b ≤ 0 is similar. Finally, consider the case a < 0 < b; then the claimed inequality holds for all sufficiently large J, and Lemma 4 yields the bound on the second term in the right-hand side of (20). The first term in the right-hand side of (20) is estimated in the same way. We conclude that for some constant C and all sufficiently large n, P_0(∪_{k<2^ℓ} Ω^C_{n,k,ℓ,2}) ≤ Cn^{3/2} 2^ℓ θ^{(J/(2M))^{γ_2}} < 2^{−ℓ}. This completes the proof of (16), establishing Lemma 8.
Lemma 9. The finite-dimensional distributions of T^{(n)} converge to those of cH, where c > 0 is a constant and H is given by (22).
Theorem 2 is an easy consequence of these lemmas.
Proof of Theorem 2. Lemma 9 implies that the finite-dimensional distributions of the process S^{(n)} converge to those of DB*, where D > 0 is a constant. Since the trajectories of S^{(n)} are monotone and the limiting process B* is continuous, we conclude that S^{(n)} converges weakly to DB* in the (locally) uniform topology (see [1], Corollary 1.3 and Remark (e) on p. 588). Finally, by Lemma 10, for each T > 0, sup_{t≤T} |S^{(n)}(t) − X^{(n)}(t)| → 0 in P_0-probability. By the "converging together" theorem ([6], Theorem 3.1) we conclude that X^{(n)} converges weakly to DB* in the (locally) uniform topology, and, thus, in J_1.
Proof of Lemma 9. Let k ∈ N and 0 = x_0 < x_1 < · · · < x_k. We have to show that the convergence holds for any 0 = t_0 < t_1 < t_2 < · · · < t_k, where T(·) = cH(·) for some c > 0. At time T_{[nx_k]} consider the structure of the corresponding branching process as we look back from [nx_k]. Notice that D_{[nx_i],j} ≤ D_{[nx_k],j} for i ≤ k and all j. This simple observation will allow us to get bounds on T_{[nx_i]}, i = 1, 2, . . . , k − 1, in terms of the structure of downcrossings at time T_{[nx_k]}. This means that we can use the same copy of the branching process V to draw conclusions about all hitting times T_{[nx_i]}, i = 1, 2, . . . , k.
Recalling our definition of N^{(k−i)}, we get that for every ε, ν > 0 there is n_0 such that the corresponding bound holds for all n ≥ n_0. In particular, for C = (1 + ν)b x_k we have the required estimate. Define λ_n = (log n)^{−1/2} (any sequence λ_n, n ∈ N, such that λ_n → 0 and λ_n log n → ∞ will work) and notice that by Theorem 3 there is n_1 such that the maximal-lifetime bound holds for all n ≥ n_1. Thus, on a set Ω_ε of measure at least 1 − 2ε, for all n ≥ n_0 ∨ n_1 the number of lifetimes of the branching process V covering [nx_k] − [nx_i] generations, i = 0, 1, 2, . . . , k − 1, is well controlled and the maximal lifetime over [nx_k] generations does not exceed nλ_n. In particular, on Ω_ε, the number of lifetimes in any interval ([nx_i], [nx_{i+1}]), i = 0, 1, . . . , k − 1, goes to infinity as n → ∞.
By Lemma 9 we can find K > 0 such that for all large n, P_0(S_n ≥ K√n ln n) ≤ P_0(T_{[K√n ln n]} ≤ n) < ν.
To estimate the last term in (26) we shall use properties of the branching process V. Let N = min{m ∈ N : σ_m > K√n ln n}. Then the last term in (26) admits the corresponding bound for some large C and all sufficiently large n. Finally, from (4) we conclude that for all large enough n the last probability is negligible. This completes the proof.