Recurrence of Reinforced Random Walk On a Ladder

Consider reinforced random walk on a graph that looks like a doubly infinite ladder. All edges have initial weight 1, and the reinforcement convention is to add δ > 0 to the weight of an edge upon first crossing, with no reinforcement thereafter. This paper proves recurrence for all δ > 0. In so doing, we introduce a more general class of processes, termed multiple-level reinforced random walks. A draft of this paper was written in 1994. The paper is one of the first to make any progress on this type of reinforcement problem. It has motivated a substantial number of new and sometimes quite difficult studies of reinforcement models in pure and applied probability. The persistence of interest in models related to this has caused the original unpublished manuscript to be frequently cited, despite its lack of availability and the presence of errors. The opportunity to rectify this situation has led us to the somewhat unusual step of publishing a result that may have already entered the mathematical folklore.

1 Introduction and summary

Coppersmith and Diaconis (1987) have initiated the study of a class of processes called reinforced random walks; see also Diaconis (1988). Take a graph with initial weights assigned to the edges. Then define a discrete-time, nearest-neighbor random walk on the vertices of this graph as follows. At each stage, the (conditional, given the past) probability of transition from the current vertex to an adjacent vertex is proportional to the weight currently assigned to the connecting edge. The random walk always jumps, so these conditional transition probabilities sum to 1. The weight of an edge can increase when the edge is crossed, with the amounts of increase depending on the reinforcement convention. The convention most studied by Coppersmith and Diaconis (1987) is to always add +1 to the weight of an edge each time it is crossed. In this setting, they show that a reinforced random walk on a finite graph is a mixture of stationary Markov random walks. The mixing measure is given explicitly in terms of the "loops" of the graph. Pemantle (1988) has studied reinforced random walks with the Coppersmith-Diaconis reinforcing convention on infinite acyclic graphs. Davis (1989) obtained results for nearest-neighbor reinforced random walks on the integers Z with very general reinforcement schemes.
Consider nearest-neighbor reinforced random walk on the lattice Z^2 of points in R^2 with integer coordinates. All edges between neighboring points are assigned initial weight 1. It seems plausible (perhaps even "obvious") that any spatially homogeneous reinforcement scheme for which the process cannot get stuck forever on a finite set of points will be recurrent, that is, will visit each point of the lattice infinitely often. However, determining whether any such reinforcement scheme leads to recurrence of reinforced random walk on Z^2 has remained open for fifteen years.
Michael Keane (2002) has proposed the following simpler variant: consider nearest-neighbor reinforced random walk on the points of Z^2 with y coordinate 1 or 2 (and starting at (0, 1), say). If one draws in the edges between nearest neighbors, one of course gets an infinite horizontal ladder. Again, all initial weights are taken to be 1. For the reinforcement scheme, Keane suggested that edges be reinforced by δ = 1 the first time they are crossed and then never again. This kind of reinforcement is now known as once-reinforced random walk; see, e.g., Durrett, Kesten and Limic (2002). The main result of this paper is that Keane's once-reinforced random walk on a ladder is recurrent for any positive reinforcement parameter δ.

2 Notation and results
Let (Ω, {F_n}_{n≥0}, P) be a filtered probability space. Let d ≥ 2 be an integer and let {(X_n, Y_n)}_{n≥0} be random variables in Z × {1, . . . , d} that are adapted (that is, (X_n, Y_n) ∈ F_n) and satisfy the following two conditions. First, {X_n}_{n≥0} is a nearest neighbor random motion on Z, i.e., P{|X_{n+1} − X_n| = 1} = 1 for all n. Secondly, each horizontal step is made according to the current edge weights:

P(X_{n+1} = X_n + 1 | F_n) = W_n(X_n, Y_n) / (W_n(X_n, Y_n) + W_n(X_n − 1, Y_n)).
Here W_n(x, y) ∈ F_n is the weight at time n of the horizontal edge to the right of (x, y), and is defined in our model by

W_n(x, y) = 1 + δ · R(x, y, n),

where R(x, y, n) is the indicator function of the event that the edge to the right of (x, y) has been crossed by time n:

R(x, y, n) = I{Y_m = y and {X_m, X_{m+1}} = {x, x + 1} for some m < n}.    (2.1)

For lack of a better name, the process just described will be called MLRRW, standing for multiple-level reinforcing random walk. The way to think about it is that we first move horizontally from (X_n, Y_n) according to the rules of reinforced random walk, and then we can move vertically in an arbitrary way before the next horizontal move. To make Keane's reinforced random walk on a ladder into an MLRRW, take F_n to be the σ-field generated by the process up to just before the (n + 1)st horizontal step, together with the knowledge that the next step will be horizontal. It is easy to show that the conditions for MLRRW are satisfied by this choice of {F_n}_{n≥0}.
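The dynamics just described are easy to simulate. The following sketch (illustrative code, not part of the paper) implements Keane's once-reinforced walk on the two-level ladder directly: every edge, horizontal edges and vertical rungs alike, starts at weight 1 and is bumped to 1 + δ on its first crossing, and each step is chosen with probability proportional to the current weights.

```python
import random

def once_reinforced_ladder_walk(delta, steps, seed=0):
    """Simulate once-reinforced random walk on the ladder Z x {1, 2}.

    Every edge (horizontal edges and vertical rungs alike) starts with
    weight 1 and is bumped to 1 + delta the first time it is crossed.
    Returns the list of visited vertices, starting from (0, 1).
    """
    rng = random.Random(seed)
    crossed = set()  # edges (stored as frozensets of endpoints) crossed so far

    def weight(u, v):
        return 1 + delta if frozenset((u, v)) in crossed else 1.0

    pos = (0, 1)
    path = [pos]
    for _ in range(steps):
        x, y = pos
        nbrs = [(x - 1, y), (x + 1, y), (x, 3 - y)]  # left, right, rung
        wts = [weight(pos, v) for v in nbrs]
        nxt = rng.choices(nbrs, weights=wts, k=1)[0]
        crossed.add(frozenset((pos, nxt)))
        pos = nxt
        path.append(pos)
    return path
```

Theorem 2.1 asserts that such a trajectory returns to the column x = 0 infinitely often, almost surely, for every δ > 0; simulations of this kind can only illustrate, not verify, that claim.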
Let p = (1 + δ)/(2 + δ) and q = 1 − p = 1/(2 + δ). Note that p is the probability of crossing the reinforced edge when the choice is between one reinforced edge and one unreinforced edge. The notation I(A) is used for the indicator function of the event A as in (2.1) above. A sum Σ_{i=j}^{m} a_i will be taken to equal 0 when m < j. The main result of the paper is:

Theorem 2.1 For an MLRRW with d = 2, one has X_n = 0 infinitely often, almost surely. Consequently, Keane's once-reinforced random walk on a ladder is recurrent.
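For concreteness, these constants are easy to check numerically (an illustrative snippet, not part of the paper; the value δ = 0.5 is arbitrary):

```python
delta = 0.5
w_reinforced, w_fresh = 1 + delta, 1.0

# choice between one reinforced and one unreinforced edge
p = w_reinforced / (w_reinforced + w_fresh)
q = w_fresh / (w_reinforced + w_fresh)

assert abs(p - (1 + delta) / (2 + delta)) < 1e-12
assert abs(q - 1 / (2 + delta)) < 1e-12
assert abs(p + q - 1) < 1e-12
assert abs((p - q) - delta / (2 + delta)) < 1e-12  # net drift toward the heavier edge
```

The gap p − q = δ/(2 + δ) is the per-visit drift that the compensation estimates of the next section control.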

3 Outline of the argument and an easy weaker result
The idea of the argument is this. If {X_n} were a martingale, it would be forced to return to zero infinitely often. While it is not a martingale, we may get an upper bound on the total amount of compensation per horizontal distance that can be required to make it into a martingale. The following result, while not strong enough to imply Theorem 2.1, is a useful preliminary result and gives a flavor of the argument. Define

C_n = E(X_{n+1} − X_n | F_n)

to be the compensator of the increment X_{n+1} − X_n, so that X_n − Σ_{i=0}^{n−1} C_i is a martingale.

Proposition 3.1 For any x and y, and any stopping times σ_1 ≤ σ_2,

E( Σ_{n=σ_1}^{σ_2} C_n I{(X_n, Y_n) = (x, y)} | F_{σ_1} ) ≤ δ.

PROOF: Let τ_0 ≤ ∞ denote the least time n ≥ σ_1 that (X_n, Y_n) = (x, y) and R(x, y, n) = 1, let τ_j be the least n > τ_{j−1} for which (X_n, Y_n) = (x, y), and let T be the least n for which R(x − 1, y, n) = 1. When n ≥ σ_1, we may bound C_n I{(X_n, Y_n) = (x, y)} above by zero unless n = τ_j < T for some j, in which case C_n ≤ δ/(2 + δ). Evidently, at each visit τ_j < T the walk steps to the left, thereby crossing the edge to the left of (x, y) and forcing T, with conditional probability at least q, so P(τ_{j+1} < T | F_{τ_j}) ≤ p, and we see inductively that

E( Σ_{n=σ_1}^{σ_2} C_n I{(X_n, Y_n) = (x, y)} | F_{σ_1} ) ≤ (δ/(2 + δ)) Σ_{j≥0} p^j = δ,

proving the proposition.
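The geometric series in the last display evaluates as follows (a worked restatement using only the constants p and q defined above, with 1 − p = q = 1/(2 + δ)):

```latex
\[
\frac{\delta}{2+\delta}\sum_{j\ge 0} p^{j}
  \;=\; \frac{\delta}{2+\delta}\cdot\frac{1}{1-p}
  \;=\; \frac{\delta}{2+\delta}\cdot\frac{1}{q}
  \;=\; \frac{\delta}{2+\delta}\,(2+\delta)
  \;=\; \delta .
\]
```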
Corollary 3.2 Define

C(x, σ_1, σ_2) = Σ_{y=1}^{d} Σ_{n=σ_1}^{σ_2} C_n I{(X_n, Y_n) = (x, y)}    (3.3)

to be the total compensation occurring at sites (x, y), summed over y. Then for all x > 0,

E(C(x, σ_1, σ_2) | F_{σ_1}) ≤ (d − 1)δ.

PROOF: Condition on the first time τ ≥ σ_1 that X_n = x. If τ = ∞ then of course C(x, σ_1, σ_2) = 0. If not, then the edge immediately to the left of (x, Y_τ) has just been crossed, so that

E( Σ_{n=τ}^{σ_2} C_n I{(X_n, Y_n) = (x, Y_τ)} | F_τ ) ≤ 0,

while for every y ≠ Y_τ, the previous lemma with σ_1 = τ gives

E( Σ_{n=τ}^{σ_2} C_n I{(X_n, Y_n) = (x, y)} | F_τ ) ≤ δ,

and taking expectations with respect to F_{σ_1} then proves the corollary.
This already allows us to prove the following weaker recurrence result.
Theorem 3.3 If (d − 1)δ < 1, then X_n = 0 infinitely often, almost surely.

PROOF: Given a stopping time τ and a positive integer M, let T be the least n > τ for which X_n = 0 or |X_n| = M, on the event that τ is finite and |X_τ| < M. Assume without loss of generality that X_τ > 0 (the argument for X_τ < 0 is similar and the case X_τ = 0 is automatic). Since X_n minus its accumulated compensation is a martingale, X_T ∈ {0, M}, and by Corollary 3.2 the total expected compensation at each column x with 0 < x < M is at most (d − 1)δ,

M P(X_T = M | F_τ) = E(X_T | F_τ) ≤ X_τ + M(d − 1)δ,

so that

P(X_T = M | F_τ) ≤ X_τ/M + (d − 1)δ.    (3.4)

The theorem now follows directly. Take τ = τ(k) to be the least time n that |X_n| = 2^k and take M = A·2^k with A sufficiently large so that (d − 1)δ + A^{−1} = 1 − ε < 1. By (3.4), the conditional probability given F_{τ(k)} of returning to zero before |X_n| reaches A·2^k is at least ε, so the theorem follows from the conditional Borel-Cantelli lemma.
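To see the constants in action (an illustrative computation; these particular numbers are not from the paper): with d = 2 and δ = 0.4, the hypothesis (d − 1)δ < 1 holds, and choosing A = 4 makes the escape bound (d − 1)δ + A^{−1} equal to 0.65, leaving a guaranteed return probability ε = 0.35 at every dyadic scale.

```python
d, delta = 2, 0.4
A = 4.0

escape_bound = (d - 1) * delta + 1 / A  # bound (3.4) with X_tau / M <= 1 / A
eps = 1 - escape_bound                  # return probability secured per scale

assert (d - 1) * delta < 1              # hypothesis of the weak theorem
assert escape_bound < 1                 # so a suitable A exists
assert abs(eps - 0.35) < 1e-12
```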

4 Recurrence when d = 2 and improvements for d > 2
The key lemma will be the following strengthening of Corollary 3.2.
Lemma 4.1 Let 2 ≤ x < M and 0 ≤ k < d be fixed. Let τ = τ(x) be the least n for which X_n = x, and define T to be the least n ≥ τ for which X_n = 0 or X_n = x + 1. Then, on the event that Σ_{y=1}^{d} R(x − 1, y, τ) = d − k,

E(C(x, τ, T) | F_τ) ≤ δ( k − 1 + P(X_T = 0 | F_τ) ),

where C(x, τ, T) is defined as in (3.3). In other words, the accumulation of compensation at horizontal position x, from the time x is hit until the time x + 1 or zero is hit, can be no more than δ times the following quantity: −1, plus the number of unreinforced edges immediately to the left of position x when x was hit, plus the probability that the walk will return to zero before ever reaching x + 1.
PROOF: The proof is by induction on k. First assume k = 0. Let T be the least n > τ for which X_n = x + 1 or X_n = 0. Define τ_0 = τ and τ_j = inf{n > τ_{j−1} : X_n = x}. At every visit τ_j < T, each edge immediately to the left of column x is reinforced (since k = 0) while no edge to the right of column x has yet been crossed, so the compensator equals −δ/(2 + δ) exactly, and the walk crosses to x + 1 with conditional probability exactly 1/(2 + δ). We may compute

P(X_T = x + 1 | F_τ) = (1/(2 + δ)) Σ_{j≥0} P(τ_j < T | F_τ).

On the other hand,

E(C(x, τ, T) | F_τ) = −(δ/(2 + δ)) Σ_{j≥0} P(τ_j < T | F_τ),

and these two together yield

E(C(x, τ, T) | F_τ) = −δ P(X_T = x + 1 | F_τ) = δ(−1 + P(X_T = 0 | F_τ)),

which proves the lemma in the case k = 0.
Now assume for induction that the lemma is true when k is replaced by k − 1. There are two cases in the induction step. The first case is when at time τ the random walker sees an unreinforced edge to the left, that is, R(X_τ − 1, Y_τ, τ) = 0. The next paragraph refers to inequalities holding on this event.
We may split C = C(x, τ, T) into three components, depending on whether the next move is to the right, or the next move is to the left and the walk does or does not return to horizontal position x before zero. Formally, write C = C(I_1 + I_2 + I_3) where

I_1 = I{X_{τ+1} = x + 1} ;  I_2 = I{X_{τ+1} = x − 1, τ_1 < T} ;  I_3 = I{X_{τ+1} = x − 1, τ_1 > T},

with τ_1 denoting the least n > τ for which X_n = x. Clearly C I_3 = 0, since then the walk is at x only once before time T, at which time the compensator is zero. For the first piece, let I_4 = I{X_{τ+1} = x + 1, τ_1 < T}, and observe that I_4 ∈ F_{τ_1} while, by Corollary 3.2, E(C(x, τ_1, T) | F_{τ_1}) ≤ δk. Thus

E(C I_1 | F_τ) ≤ δk E(I_4 | F_τ) ≤ δk P(I_1 | F_τ).

Finally, on the event I_2 there are at most k − 1 unreinforced edges just to the left of x at time τ_1, so that the induction hypothesis implies that

E(C I_2 | F_τ) ≤ δ E( I_2 ( k − 2 + P(X_T = 0 | F_{τ_1}) ) | F_τ ).

Putting together the three estimates, and using the facts that P(I_1 | F_τ) = 1/2 (at time τ both horizontal edges at (X_τ, Y_τ) still have weight 1), that I_3 ⊆ {X_T = 0} (the walk cannot reach x + 1 without returning to x), and that k ≥ 1, gives

E(C | F_τ) ≤ δ( k − 1 + P(X_T = 0 | F_τ) ),

finishing the case where the walker sees an unreinforced edge to the left.
Finally, we remove the assumption of an unreinforced edge to the left. Let τ′ ≥ τ be the least n for which X_n = x + 1, or for which X_n = x and R(x − 1, Y_n, n) = 0. Then C_n I{X_n = x} ≤ 0 for n < τ′, whence

E(C(x, τ, T) | F_τ) ≤ E( E(C(x, τ′, T) | F_{τ′}) | F_τ ).

But E(C(x, τ′, T) | F_{τ′}) is bounded above by δ(k − 1 + P(X_T = 0 | F_{τ′})) on the event {X_{τ′} = x} (this is the case in the previous paragraph), and by (k − 1)δ if X_{τ′} = x + 1 (Corollary 3.2). Removing the conditioning on F_{τ′}, we see that E(C(x, τ, T) | F_τ) ≤ δ(k − 1 + P(X_T = 0 | F_τ)) in either case, which completes the induction and the proof of the lemma.
As a corollary we get the following strengthening of (3.4):

Corollary 4.2 For 2 ≤ m < M, let τ be the least n for which X_n = m and let T be the least n > τ for which X_n = 0 or X_n = M. Then

P(X_T = 0 | F_τ) ≥ (M − (1 + δ)m) / (M + δ).

PROOF: As in the proof of (3.4), the quantity X_n minus its accumulated compensation is a martingale, and X_T is either 0 or M, so, writing r = P(X_T = 0 | F_τ),

M(1 − r) = E(X_T | F_τ) ≤ m + Σ_{x=1}^{m} E(C(x, τ, T) | F_τ) + Σ_{x=m+1}^{M−1} E(C(x, τ(x), T) | F_τ) ≤ m + mδ + δr,

where Corollary 3.2 and Lemma 4.1 were used to bound the two summations. (Here d = 2, so at the first visit to each column x > m at most one edge to the left is unreinforced, and on {X_T = 0} the walk can hit zero before x + 1, measured from the first visit to x, for at most one such x.) Thus

M(1 − r) ≤ (1 + δ)m + δr,    (4.5)

and solving for r gives

r ≥ (M − (1 + δ)m) / (M + δ),

which is equivalent to the conclusion of the corollary.
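The final algebraic step can be displayed in full (a reconstruction of the arithmetic, with r = P(X_T = 0 | F_τ) and assuming the bound from the proof takes the form M(1 − r) ≤ (1 + δ)m + δr):

```latex
\[
M(1-r) \le (1+\delta)m + \delta r
\;\Longrightarrow\;
M - (1+\delta)m \le (M+\delta)\,r
\;\Longrightarrow\;
r \ge \frac{M-(1+\delta)m}{M+\delta}.
\]
```

In particular, taking M ≥ 2(1 + δ)m keeps r bounded away from zero uniformly in m, which is the uniform return probability a recurrence argument needs.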

5 Further remarks
It is expected that Keane's once edge-reinforced walk on a ladder is recurrent for any d and δ. However, Theorem 4.3 is sharp in the sense that an MLRRW with d levels can be transient for any δ > (d − 2)^{−1}; thus the freedom to take arbitrary vertical jumps does seem to alter the critical value of δ. To see that Theorem 4.3 is sharp, define an MLRRW by choosing Y_i at each stage to make C_i positive whenever possible. If C_i cannot be made positive, then let it be zero if possible, giving preference to sites where the edges to either side are already reinforced. The proof of transience is similar to arguments to be found in Sellke (1993). The gist is as follows. First of all, it is easy to show that X_n is transient if and only if X_n^+, the positive part of X_n, is transient. (Note that C_i is never negative when X_i is negative.) So consider the process X_n^+. A zero-one law argument shows that X_n^+ is either almost surely transient or almost surely recurrent. If X_n^+ were almost surely recurrent, we could find an M large enough so that, for the overwhelming majority of x values between 0 and M, the probability is near 1 that all horizontal edges at x are reinforced before X_n^+ hits M. One then shows that, for the randomized enlightened greedy algorithm, the expected cumulative bias at a positive x is (d − 2)δ if X_n^+ visits x often enough. Consequently, the expected cumulative bias accumulated by X_n^+ by the time T_M that M is finally hit can be shown to be greater than M. But this would imply E(X_{T_M}) > M, which contradicts X_{T_M} ≡ M.
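The greedy vertical strategy just described is easy to prototype (a hypothetical illustration, not the paper's construction verbatim). Before each horizontal step the controller inspects all d levels at the current column and jumps to one where the right edge is reinforced but the left is not, so that the compensator δ/(2 + δ) is positive; failing that, it settles for a zero-drift level, preferring one with both neighboring edges already reinforced.

```python
import random

def greedy_mlrrw(delta, d, steps, seed=0):
    """Sketch of the 'enlightened greedy' MLRRW on d levels.

    reinforced holds pairs (x, y) whose right edge has been crossed.
    The compensator at (x, y) is positive exactly when the edge to the
    right is reinforced and the edge to the left is not, so the vertical
    move greedily seeks such a level.  Returns the final horizontal
    position after the given number of horizontal steps.
    """
    rng = random.Random(seed)
    reinforced = set()
    x = 0

    def drift_class(level):
        right = (x, level) in reinforced
        left = (x - 1, level) in reinforced
        if right and not left:
            return 0  # positive compensator: best choice
        if right and left:
            return 1  # zero compensator, both sides reinforced
        if not right and not left:
            return 2  # zero compensator, both sides fresh
        return 3      # negative compensator: avoid if possible

    for _ in range(steps):
        y = min(range(1, d + 1), key=drift_class)  # greedy vertical jump
        w_right = 1 + delta if (x, y) in reinforced else 1.0
        w_left = 1 + delta if (x - 1, y) in reinforced else 1.0
        if rng.random() < w_right / (w_right + w_left):
            reinforced.add((x, y))
            x += 1
        else:
            reinforced.add((x - 1, y))
            x -= 1
    return x
```

Long runs of such a sketch can be used to probe the claimed transience when δ > (d − 2)^{−1}, though of course they prove nothing.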
In the critical case δ = (d − 2)^{−1}, this algorithm can be shown to produce recurrence, again by arguments similar to those in Sellke (1993).