Large deviations and slowdown asymptotics for one-dimensional excited random walks

We study the large deviations of one-dimensional excited random walks. We prove a large deviation principle for both the hitting times and the position of the random walk and give a qualitative description of the respective rate functions. When the excited random walk is transient with positive speed $v_0$, the large deviation rate function for the position of the random walk is zero on the interval $[0,v_0]$, and so probabilities such as $P(X_n<nv)$ for $v \in (0,v_0)$ decay subexponentially. We show that the rate of decay for such slowdown probabilities is polynomial of order $n^{1-\delta/2}$, where $\delta>2$ is the expected total drift per site of the cookie environment.


Introduction
In this paper we study the large deviations for one-dimensional excited random walks. Excited random walks are a model of self-interacting random walk in which the transition probabilities depend on the number of prior visits of the random walk to the current site. The most general model for excited random walks on $\mathbb{Z}$ is the following. Let $\Omega = [0,1]^{\mathbb{Z}\times\mathbb{N}}$, and for any element $\omega = \{\omega_i(j)\}_{i\in\mathbb{Z},\, j\geq 1} \in \Omega$ we can define an excited random walk $X_n$ by letting $\omega_i(j)$ be the probability that the random walk moves to the right upon its $j$-th visit to the site $i \in \mathbb{Z}$. More formally, we let $P_\omega(X_0 = 0) = 1$ and
$$P_\omega(X_{n+1} = x+1 \mid X_n = x) = 1 - P_\omega(X_{n+1} = x-1 \mid X_n = x) = \omega_x\left(\#\{k \leq n : X_k = x\}\right).$$
Note that the excited random walk X n is not a Markov chain since the transition probabilities depend on the entire past of the random walk and not just the current location.
Excited random walks are also sometimes called cookie random walks: one imagines a stack of "cookies" at every site, each of which induces a specific bias on the walker. When the walker visits the site $x$ for the $i$-th time, he eats the $i$-th cookie, which causes his next step to be that of a simple random walk that moves right with probability $\omega_x(i)$. For this reason we will also refer to $\omega = \{\omega_i(j)\}_{i\in\mathbb{Z},\, j\geq 1}$ as a cookie environment.
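To make the cookie mechanism concrete, here is a minimal simulation sketch with finitely many cookies per site and the same deterministic cookie vector at every site. The function `excited_walk` and its interface are our own illustration, not from the paper:

```python
import random

def excited_walk(n_steps, cookies, seed=0):
    """Simulate an excited random walk on Z started at 0.

    cookies[j-1] is the probability of stepping right on the j-th
    visit to a site; visits beyond len(cookies) use probability 1/2
    (finitely many cookies, M = len(cookies)).
    Returns the trajectory (X_0, ..., X_{n_steps}).
    """
    rng = random.Random(seed)
    visits = {}                 # visits[x] = number of visits to x so far
    x = 0
    path = [0]
    for _ in range(n_steps):
        j = visits.get(x, 0) + 1        # this step uses the j-th cookie at x
        visits[x] = j
        p = cookies[j - 1] if j <= len(cookies) else 0.5
        x += 1 if rng.random() < p else -1
        path.append(x)
    return path

# Three cookies of strength 3/4 per site: an environment that is
# transient to the right but (as discussed below) has zero speed.
path = excited_walk(1000, cookies=[0.75, 0.75, 0.75])
```

As a sanity check, with a single cookie of strength 1 every first visit steps right, so the walk never revisits a site and moves deterministically to the right.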
We can also allow the cookie environment $\omega$ to be chosen randomly. That is, let $\mathbb{P}$ be a probability distribution on the space of cookie environments $\Omega$, and define a new measure on the space of random walk paths $\mathbb{Z}^{\mathbb{Z}_+}$ by averaging over all cookie environments. That is, let $P(\cdot) = \int_\Omega P_\omega(\cdot)\, \mathbb{P}(d\omega)$.
For a fixed cookie environment ω, the law P ω is referred to as the quenched law of the excited random walk, and P is called the averaged law of the excited random walk.
Most of the results for excited random walks make the assumption that there are only finitely many cookies per site. That is, there exists an $M$ such that $\omega_i(j) = 1/2$ for all $i \in \mathbb{Z}$ and $j > M$, so that after $M$ visits to any site the transitions are those of a simple symmetric random walk.
Assumption 1. There exists an integer $M < \infty$ such that there are almost surely only $M$ cookies per site. That is, $\mathbb{P}(\Omega_M) = 1$, where $\Omega_M = \Omega \cap \{\omega : \omega_i(j) = 1/2,\ \forall i \in \mathbb{Z},\ \forall j > M\}$.
We will also make the common assumption that the cookie environment is i.i.d. in the following sense.
Assumption 2. The distribution $\mathbb{P}$ is such that the sequence of cookie environments at each site $\{\omega_i(\cdot)\}_{i\in\mathbb{Z}}$ is i.i.d.
Excited random walks were first studied by Benjamini and Wilson in [BW03], where they considered the case of deterministic cookie environments with one cookie per site (that is, $M = 1$). The focus of Benjamini and Wilson was mainly on the $\mathbb{Z}^d$ case, but they showed that excited random walks on $\mathbb{Z}$ with one cookie per site are always recurrent. The model was further generalized by Zerner in [Zer05] to allow for multiple cookies per site and for randomness in the cookie environment, but with the restriction that all cookies induce a non-negative drift (that is, $\omega_i(j) \geq 1/2$). Recently, the model of excited random walks was further generalized by Kosygina and Zerner to allow for cookies with both positive and negative drifts [KZ08].
The recurrence/transience and limiting speed for one-dimensional excited random walks have been studied in depth under the above assumptions. A critical parameter for describing the behavior of the excited random walk is the expected total drift per site
$$\delta = E\left[\sum_{j\geq 1}\left(2\omega_0(j) - 1\right)\right]. \qquad (1)$$
Zerner showed in [Zer05] that excited random walks with all cookies $\omega_i(j) \geq 1/2$ are transient to $+\infty$ if and only if $\delta > 1$. Additionally, Zerner showed that the limiting speed $v_0 = \lim_{n\to\infty} X_n/n$ exists, $P$-a.s., but was not able to determine when the speed is non-zero. Basdevant and Singh solved this problem in [BS07], where they showed that $v_0 > 0$ if and only if $\delta > 2$. These results for recurrence/transience and the limiting speed were given only for cookies with non-negative drift but were recently generalized by Kosygina and Zerner [KZ08] to the general model described above that allows for cookies with both positive and negative drifts. In summary, under Assumptions 1-3, the following results are known: the walk is recurrent if $\delta \in [-1,1]$, transient to $+\infty$ if $\delta > 1$, and transient to $-\infty$ if $\delta < -1$; moreover, the limiting speed satisfies $v_0 > 0$ if and only if $\delta > 2$ (and, symmetrically, $v_0 < 0$ if and only if $\delta < -2$).
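For a concrete illustration (our own example, using the standard formula $\delta = E[\sum_{j\geq1}(2\omega_0(j)-1)]$ for the expected total drift per site): a deterministic environment with $M = 3$ cookies of strength $3/4$ at every site gives

```latex
\delta \;=\; \sum_{j=1}^{3}\Bigl(2\cdot\tfrac{3}{4} - 1\Bigr) \;=\; \tfrac{3}{2},
```

so such a walk is transient to $+\infty$ ($\delta > 1$) but has zero limiting speed ($\delta \leq 2$); strengthening the cookies to $7/8$ gives $\delta = 3\cdot\tfrac{3}{4} = \tfrac{9}{4} > 2$ and hence positive speed.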
Limiting distributions for excited random walks are also known, with the type of rescaling and the limiting distribution depending only on the parameter $\delta$ given in (1). The interested reader is referred to the papers [BS08, KZ08, KM11, Dol11, DK12] for more information on limiting distributions.
1.1. Main Results. In this paper, we are primarily concerned with the large deviations of excited random walks. In a similar manner to the approach used for large deviations of random walks in random environments, we deduce a large deviation principle for $X_n/n$ from a large deviation principle for $T_n/n$, where $T_n = \inf\{k \geq 0 : X_k = n\}$, $n \in \mathbb{Z}$, are the hitting times of the excited random walk. However, we don't prove a large deviation principle for the hitting times directly. Instead, we use an associated branching process with migration $V_i$ that has been used previously in some of the above mentioned papers on the speed and limiting distributions for excited random walks [BS07, KZ08, KM11]. We prove a large deviation principle for $n^{-1}\sum_{i=1}^n V_i$ and use this to deduce a large deviation principle for $T_n/n$, which in turn implies the following large deviation principle for $X_n/n$.
Theorem 1.1. The empirical speed of the excited random walk $X_n/n$ satisfies a large deviation principle with rate function $I_X$ defined in (28). That is, for any open set $G \subset [-1,1]$,
$$\liminf_{n\to\infty} \frac{1}{n} \log P(X_n/n \in G) \geq -\inf_{x\in G} I_X(x),$$
and for any closed set $F \subset [-1,1]$,
$$\limsup_{n\to\infty} \frac{1}{n} \log P(X_n/n \in F) \leq -\inf_{x\in F} I_X(x).$$
Remark 1.2. After the initial draft of this paper was completed, it was noted that a general large deviation principle for certain non-Markovian random walks due to Rassoul-Agha [RA04] can be used to prove Theorem 1.1 in certain cases. Thus, it is necessary to point out some of the differences with the current paper.
• In [RA04] the random walks are assumed to be uniformly elliptic, which in the context of this paper would require $\omega_i(j) \in [c, 1-c]$ for all $i \in \mathbb{Z}$, $j \geq 1$ and some $c > 0$. In contrast, we only assume the weaker condition in Assumption 3.
• The results of [RA04] only apply directly to excited random walks with deterministic cookie environments. If the cookie environments are allowed to be random and to satisfy Assumption 2, then a technical difficulty arises in satisfying one of the conditions for the large deviation principle in [RA04]. Specifically, the transition probabilities $q(w,z)$ for the shifted paths as defined in [RA04] do not appear to be continuous in $w$ for the required topology. We suspect, however, that the techniques of [RA04] could be adapted to apply to this case as well.
• The formulation of the large deviation rate function in [RA04] is difficult to work with, and the only stated properties of the rate function are convexity and a description of the zero set. In contrast, our method gives a more detailed description of the rate function (see Lemma 5.1 and Figure 3).
• The method in [RA04] does not give a large deviation principle for the hitting times of the random walk.
As mentioned in the above remark, the formulation of the rate function $I_X$ given in the proof of Theorem 1.1 allows us to give a good qualitative description of the rate function (see Lemma 5.1). One particularly interesting property is that when $\delta > 2$ (so that the limiting speed $v_0 > 0$), then $I_X(x) = 0$ for $x \in [0, v_0]$. Thus, probabilities of the form $P(X_n < nx)$ decay subexponentially if $x \in (0, v_0)$. In fact, as the following example shows, one can see quite easily that such slowdown probabilities must have a subexponential rate of decay.
Example 1.1. We exhibit a naive strategy for obtaining a slowdown of the excited random walk. Consider the event where the excited random walk first follows a deterministic path that visits every site in $[0, n^{1/3})$ $M$ times (so that no cookies remain in the interval) and then the random walk stays in the interval $[0, n^{1/3})$ for $n$ steps. The probabilistic cost of forcing the random walk to follow the deterministic path at the beginning is $e^{-c_M n^{1/3}}$ for some $c_M > 0$. Then, since there are no cookies left in the interval, the probability of staying in $[0, n^{1/3})$ for $n$ steps before exiting to the right is a small deviation computation for a simple symmetric random walk. The probability of this event can be bounded below by $Ce^{-c'n^{1/3}}$ for some $C, c' > 0$ (see Theorem 3 in [Mog74]). Thus, the total probability of the above event for the excited random walk is at least $Ce^{-cn^{1/3}}$.
The example above shows that $P(X_n < xn)$ decays no faster than a stretched exponential. However, this strategy turns out to be far from the optimal way of obtaining such a slowdown. The second main result of this paper is that the true rate of decay for slowdowns is instead polynomial of order $n^{1-\delta/2}$.

1.2. Comparison with RWRE. Many of the prior results for one-dimensional excited random walks are very similar to the corresponding statements for random walks in random environments (RWRE). For instance, both models can exhibit transience with sublinear speed, and they have the same types of limiting distributions for the hitting times and the location of the random walk [KZ08, Sol75, KKS75]. Thus, it is interesting to compare the results of this paper with what is known for one-dimensional RWRE.
Large deviations for one-dimensional RWRE (including a qualitative description of the rate functions) were studied in [CGZ00] and subexponential slowdown asymptotics for ballistic RWRE similar to Theorem 1.3 were studied in [DPZ96]. The similarities to the current paper are greatest when the excited random walk has δ > 2 and the RWRE is transient with positive speed and "nestling" (i.e., the environment has positive and negative drifts). In this case, the large deviation rate function for either model is zero on the interval [0, v 0 ], where v 0 = lim n→∞ X n /n is the limiting speed. Moreover, the polynomial rates of decay of the slowdown probabilities are related to the limiting distributions of the random walks in the same way. For instance, in either model if the slowdown probabilities decay like n 1−α with α ∈ (1, 2) then n −1/α (X n − nv 0 ) converges in distribution to an α-stable random variable [KZ08,KKS75].
An interesting difference in the rate functions for excited random walks and RWRE is that I X (0) = 0 in the present paper, while for transient RWRE the left and right derivatives of the rate function are not equal at the origin [CGZ00]. Since (in both models) I X is defined in terms of the large deviation rate function I T (t) for the hitting times T n /n, this is related to the fact that inf t I T (t) = 0 for excited random walks (see Lemma 4.1) while the corresponding rate function for the hitting times of RWRE is uniformly bounded away from 0 if the walk is transient to the left.
1.3. Outline. The structure of the paper is as follows. In Section 2 we define the associated branching process with migration V i , mention its relationship to the hitting times of the excited random walk, and prove a few basic properties about the process V i . Then in Section 3 we prove a large deviation principle for the empirical mean of the process V i and prove some properties of the corresponding rate function. The large deviation principle for the empirical mean of the process V i is then used to deduce large deviation principles for T n /n and X n /n in Sections 4 and 5, respectively. Finally, in Section 6 we prove the subexponential rate of decay for slowdown probabilities.

A related branching process with random migration
In this section we recall how the hitting times $T_n$ of the excited random walk can be related to a branching process with migration. We will construct the related branching process with migration using the "coin tossing" construction that was given in [KZ08]. Let a cookie environment $\omega = \{\omega_i(j)\}_{i\in\mathbb{Z}, j\geq1}$ be fixed, and let $\{\xi_{i,j}\}_{i\in\mathbb{Z}, j\geq1}$ be an independent family of Bernoulli random variables with $P(\xi_{i,j} = 1) = \omega_i(j)$. For $i$ fixed, we say that the $j$-th Bernoulli trial is a "success" if $\xi_{i,j} = 1$ and a "failure" otherwise. Then, let $F^{(i)}_m$ be the number of failures in the sequence $\{\xi_{i,j}\}_{j\geq1}$ before the $m$-th success. That is,
$$F^{(i)}_m = \#\left\{j < J^{(i)}_m : \xi_{i,j} = 0\right\}, \quad \text{where } J^{(i)}_m = \inf\Big\{j \geq 1 : \sum_{l=1}^{j} \xi_{i,l} = m\Big\}.$$
Finally, we define the branching process with migration $\{V_i\}_{i\geq0}$ by
$$V_0 = 0, \qquad V_{i+1} = F^{(i+1)}_{V_i + 1}, \quad i \geq 0.$$
If the $\omega_i(j)$ were all equal to $1/2$ then the process $\{V_i\}$ would be a critical Galton-Watson branching process with one additional immigrant per generation. Allowing the first $M$ cookie strengths at each site to be different from $1/2$ has the effect of making the migration more complicated (in particular, the migration in each generation is random and can depend on the current population size). We refer the interested reader to [BS07] for a more detailed description of the interpretation of $V_i$ as a branching process with migration.
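The coin-tossing construction can be sketched in code. The recursion $V_{i+1} = F^{(i+1)}_{V_i+1}$ used below is our reading of the construction in [KZ08] (the extra $+1$ is the additional immigrant each generation); the function name and interface are our own:

```python
import random

def branching_process_V(n, cookies, seed=1):
    """Coin-tossing construction of the branching process with
    migration: V_0 = 0 and V_{i+1} = F^{(i+1)}_{V_i + 1}, where
    F^{(i)}_m is the number of failures before the m-th success in
    the Bernoulli sequence at level i, with success probability
    cookies[j-1] on trial j <= M and 1/2 afterwards."""
    rng = random.Random(seed)
    V = [0]
    for _ in range(n):
        successes = failures = j = 0
        while successes < V[-1] + 1:    # need V_i + 1 successes
            j += 1
            p = cookies[j - 1] if j <= len(cookies) else 0.5
            if rng.random() < p:
                successes += 1
            else:
                failures += 1
        V.append(failures)
    return V

# If every cookie has strength 1 the first trial always succeeds,
# so no failures ever occur and the process stays at zero.
```

This coupling of the cookies to Bernoulli "coin tosses" is exactly what makes the comparison arguments later in the section possible.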
In addition to the above branching process with migration, we will also need another branching process with a random initial population and one less migrant each generation. For any $n \geq 1$, let $V^{(n)}_0 = V_n$, where $V_n$ is constructed as above, and for $i \geq 0$ let $V^{(n)}_{i+1}$ be the number of failures before success number $V^{(n)}_i$ in the Bernoulli sequence at site $-i-1$ (with $V^{(n)}_{i+1} = 0$ if $V^{(n)}_i = 0$), so that $V^{(n)}_i$ evolves with the same branching mechanism as $V_i$ but without the extra immigrant each generation. The relation of the processes $\{V_i\}_{i\geq0}$ and $\{V^{(n)}_i\}_{i\geq0}$ to the hitting times $T_n$ of the excited random walk is the following:
$$T_n \overset{law}{=} n + 2\left(\sum_{i=1}^{n} V_i + \sum_{i\geq 1} V^{(n)}_i\right), \qquad (5)$$
where both sides may be infinite.
To explain this relation, let $U^n_i = \#\{k \leq T_n : X_k = i,\, X_{k+1} = i-1\}$ be the number of times the random walk jumps from $i$ to $i-1$ before time $T_n$. Then, it is easy to see that $T_n = n + 2\sum_{i\leq n} U^n_i$, and (5) follows from the fact that $(U^n_n, U^n_{n-1}, \ldots, U^n_0, U^n_{-1}, \ldots)$ has the same distribution as $(V_0, V_1, \ldots, V_n, V^{(n)}_1, \ldots)$. The details of the above joint equality in distribution can be found in [BS07] or [KM11].
Remark 2.1. Technically, the relation (5) is proved in [BS07] and [KM11] only in the cases where $T_m < \infty$ with probability one. However, an examination of the proof shows that the equality in distribution still holds when both sides are restricted to any finite initial segment of length $k$, and so both sides of (5) are infinite with the same probability as well.
2.1. Regeneration structure. We now define a sequence of regeneration times for the branching process $V_i$. Let $\sigma_0 = 0$ and, for $k \geq 1$, let
$$\sigma_k = \inf\{i > \sigma_{k-1} : V_i = 0\} \quad \text{and} \quad W_k = \sum_{i=1}^{\sigma_k} V_i$$
be the successive return times of $V_i$ to zero and the total offspring of the branching process by the $k$-th regeneration time, respectively. The tails of $\sigma_1$ and $W_1$ were analyzed in [KM11] in the case when $\delta > 0$.
Lemma 2.2 (Theorems 2.1 and 2.2 in [KM11]). If $\delta > 0$ then there exist constants $C_1, C_2 > 0$ such that
$$P(\sigma_1 > x) \sim C_1 x^{-\delta} \quad \text{and} \quad P(W_1 > x) \sim C_2 x^{-\delta/2}, \quad \text{as } x \to \infty. \qquad (7)$$
Note that if the Markov chain $V_i$ is transient, then eventually $\sigma_k = W_k = \infty$ for all $k$ large enough. The following lemma specifies the recurrence/transience properties of the Markov chain $V_i$.
Proof. The tail decay of $\sigma_1$ shows that $E[\sigma_1] < \infty$ if $\delta > 1$ and $E[\sigma_1] = \infty$ if $\delta \in (0, 1]$. Therefore, it is enough to show that $V_i$ is recurrent if and only if $\delta \geq 0$. This can be proven by an appeal to some previous results on branching processes with migration, as was done in [KZ08]. A small difficulty arises in that the distribution of the migration that occurs before the formation of the $(i+1)$-st generation depends on the population of the $i$-th generation. However, this can be dealt with in the same manner as was done in [KZ08]. To see this, let $\tilde V_i$ be a modified process whose migration does not depend on the current population. Note that $\tilde V_i$ and $V_i$ have the same transition probabilities when starting from a site $k \geq M-1$, and thus $\tilde V_i$ and $V_i$ are either both recurrent or both transient.
To see the other implication, consider the process $Z_i$ obtained by counting the number of failures in $\{\xi_{i,j}\}_{j\geq1}$ between the $M$-th success and a suitable later success. It can be shown that $Z_i$ is a branching process with migration where the migration component has mean $1-\delta$ and the branching component has offspring distribution that is Geometric(1/2) (see Lemmas 16 and 17 in [KZ08]). Then, previous results in the branching process with migration literature show that $Z_i$ is recurrent if and only if $\delta \geq 0$ (see Theorem A and Corollary 4 in [KZ08] for a summary of these results).
We close this section by noting that the above regeneration structure for the process $V_i$ can be used to give a representation for the limiting speed of the excited random walk. First note that, as was shown in [BS07], the representation (5) can be used to show that when $\delta > 1$,
$$\frac{1}{v_0} = \lim_{n\to\infty} \frac{T_n}{n} = 1 + 2\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n V_i, \quad P\text{-a.s.}$$
To compute the last limit above, first note that the regeneration structure and the law of large numbers give
$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n V_i = \frac{E[W_1]}{E[\sigma_1]}, \quad P\text{-a.s.},$$
and that the tail decay of $\sigma_1$ given in Theorem 2.1 of [KM11] implies that $E[\sigma_1] < \infty$ when $\delta > 1$. Therefore, we obtain the following formula for the limiting speed of transient excited random walks:
$$v_0 = \frac{E[\sigma_1]}{E[\sigma_1] + 2E[W_1]}. \qquad (8)$$

Large Deviations for the Branching Process
In this section we discuss the large deviations of $n^{-1}\sum_{i=1}^n V_i$. Let
$$\Lambda_{W,\sigma}(\lambda,\eta) = \log E\left[e^{\lambda W_1 + \eta\sigma_1} \mathbf{1}_{\{\sigma_1 < \infty\}}\right] \qquad (9)$$
be the logarithmic moment generating function of $(W_1, \sigma_1)$, and let
$$\Lambda_V(\lambda) = -\sup\{\eta : \Lambda_{W,\sigma}(\lambda,\eta) \leq 0\} \quad \text{and} \quad I_V(x) = \sup_{\lambda}\left(\lambda x - \Lambda_V(\lambda)\right). \qquad (10)$$
The relevance of these functions is seen by the following theorem, which is a direct application of a more general result of Ney and Nummelin (see remark (ii) at the bottom of page 594 in [NN87]).
Theorem 3.1. Let $I_V(x)$ be defined as in (10). Then, for all open $G \subset [0,\infty)$ and any $j \geq 0$,
$$\liminf_{n\to\infty} \frac{1}{n} \log P\left(\frac{1}{n}\sum_{i=1}^n V_i \in G,\ V_n = j\right) \geq -\inf_{x\in G} I_V(x),$$
and for all closed $F \subset [0,\infty)$ and any $j \geq 0$,
$$\limsup_{n\to\infty} \frac{1}{n} \log P\left(\frac{1}{n}\sum_{i=1}^n V_i \in F,\ V_n = j\right) \leq -\inf_{x\in F} I_V(x).$$
In order to obtain large deviation results for the related excited random walk, it will also be necessary to obtain large deviation asymptotics of $n^{-1}\sum_{i=1}^n V_i$ without the added condition on the value of $V_n$.
for all open G, and for all closed F . Remark 3.3. There are many results in the large deviations literature that imply a large deviation principle for the empirical mean of a Markov chain. However, we were not able to find a suitable theorem that implied Theorem 3.2. Some of the existing results required some sort of fast mixing of the Markov chain [BD96,DZ10], but the Markov chain {V i } i≥0 mixes very slowly since if V 0 is large it typically takes a long time to return to 0 (on the order of O(V 0 ) steps). Moreover, it is very important that the rate functions are the same in Theorems 3.1 and 3.2, and many of the results for the large deviations for the empirical mean of a Markov chain formulate the rate function in terms of the spectral radius of an operator [dA85] instead of in terms of logarithmic moment generating functions as in (9) and (10).
Proof. Obviously the lower bound in Theorem 3.2 follows from the corresponding lower bound in Theorem 3.1, and so it is enough to prove the upper bound only. Our proof will use the following facts about the functions $\Lambda_V$ and $I_V$.
(i) $\Lambda_V(\lambda)$ is convex and continuous on $(-\infty, 0]$ and $\Lambda_V(\lambda) = \infty$ for all $\lambda > 0$. Therefore, $I_V(x) = \sup_{\lambda < 0}(\lambda x - \Lambda_V(\lambda))$ for all $x$.
(ii) $I_V$ is non-increasing and continuous on $[0,\infty)$.
These properties and more will be shown in Section 3.1 below, where we give a qualitative description of the rate function $I_V$. By property (ii) above, it will be enough to prove the large deviation upper bound for closed sets of the form $F = (-\infty, x]$. That is, we need only show that
$$\limsup_{n\to\infty} \frac{1}{n} \log P\left(\frac{1}{n}\sum_{i=1}^n V_i \leq x\right) \leq -I_V(x). \qquad (12)$$
This will follow from
$$\limsup_{n\to\infty} \frac{1}{n} \log E\left[e^{\lambda \sum_{i=1}^n V_i}\right] \leq \Lambda_V(\lambda), \quad \text{for all } \lambda < 0. \qquad (13)$$
Indeed, combining (13) with the usual Chebyshev upper bound for large deviations gives that for any $x < \infty$ and $\lambda < 0$,
$$\limsup_{n\to\infty} \frac{1}{n} \log P\left(\frac{1}{n}\sum_{i=1}^n V_i \leq x\right) \leq -\lambda x + \Lambda_V(\lambda).$$
Optimizing over $\lambda < 0$ and using property (i) above proves (12).
It remains still to prove (13). By decomposing according to the time of the last regeneration before $n$ we obtain (15), where we used the Markov property in the second equality. The definition of $\Lambda_V$ and the monotone convergence theorem imply that $\Lambda_{W,\sigma}(\lambda, -\Lambda_V(\lambda)) \leq 0$. Therefore, the first sum in (15) grows at most linearly in $n$. To bound the second sum in (15) we need the following lemma, whose proof we postpone for now.
Lemma 3.4 implies that the expectation in (14) is uniformly bounded in $n$ and that the second sum in (15) grows at most linearly in $n$. Since the first sum in (15) also grows linearly in $n$, this implies that
$$\limsup_{n\to\infty} \frac{1}{n} \log E\left[e^{\lambda \sum_{i=1}^n V_i - n\Lambda_V(\lambda)}\right] \leq 0,$$
which is obviously equivalent to (13). It remains only to give the proof of Lemma 3.4.
Proof of Lemma 3.4. First, note that (16) holds, where in the last equality we use the notation $E_m$ for the expectation with respect to the law of the Markov chain $V_i$ conditioned on $V_0 = m$. Since $V_i$ is an irreducible Markov chain and $E[e^{\lambda W_1 - \Lambda_V(\lambda)\sigma_1} \mathbf{1}_{\{\sigma_1 < \infty\}}] \leq 1$, the inner expectation in (16) is finite for any value of $V_t$ and can be bounded uniformly when $V_t$ is restricted to a finite set. Thus, for any $K < \infty$, the bound (17) holds. Let $C_{K,\lambda} < \infty$ be defined to be the right side of that inequality.
Note that the upper bound (17) doesn't depend on $t$. The key to finishing the proof of Lemma 3.4 is to use the upper bound (17) in an iterative way for every $t \geq 1$. By choosing $K > \Lambda_V(\lambda)/\lambda$, so that $e^{\lambda K - \Lambda_V(\lambda)} < 1$, we thus obtain the statement of the lemma.
3.1. Properties of the rate function $I_V$. We now turn our attention to a qualitative description of the rate function $I_V$. Since $I_V$ is defined as the Legendre dual of $\Lambda_V$, these properties will in turn follow from an understanding of $\Lambda_V$ (and also $\Lambda_{W,\sigma}$). We begin with some very basic properties of $\Lambda_V$ and the corresponding properties of $I_V$.
To show that $\Lambda_V(\lambda) = \infty$ for $\lambda > 0$ it is actually easiest to refer back to the excited random walk. Recall the naive strategy for slowdowns of the excited random walk in Example 1.1. We can modify the strategy slightly by not only consuming all cookies in $[0, n^{1/3})$ and then staying in the interval for $n$ steps, but also requiring that the random walk then exits the interval on the right. This event still has probability bounded below by $Ce^{-cn^{1/3}}$. Examining the branching process corresponding to the excited random walk, we see that the event for the random walk described above implies that $U^N_i \geq 1$ for all $i \in [1, N-1]$, $U^N_0 = 0$, and $\sum_{i=1}^N U^N_i > n/2$, where $N = n^{1/3}$. Then, using (6) we obtain that $P(W_1 > n/2,\ \sigma_1 = n^{1/3}) \geq Ce^{-cn^{1/3}}$ for all $n \geq 1$, which implies that
$$E\left[e^{\lambda W_1 + \eta\sigma_1} \mathbf{1}_{\{\sigma_1 < \infty\}}\right] \geq e^{\lambda n/2 + \eta n^{1/3}} P(W_1 > n/2,\ \sigma_1 = n^{1/3}) \geq Ce^{\lambda n/2 + \eta n^{1/3} - cn^{1/3}},$$
for any $\lambda > 0$ and $\eta < 0$. Since this lower bound can be made arbitrarily large by taking $n \to \infty$, this shows that $\Lambda_{W,\sigma}(\lambda, \eta) = \infty$ for any $\lambda > 0$ and $\eta < 0$, and thus $\Lambda_V(\lambda) = \infty$ for all $\lambda > 0$.
We would like to say that Λ W,σ (λ, −Λ V (λ)) = 0. However, in order to be able to conclude this is true, we need to show that Λ W,σ (λ, η) ∈ [0, ∞) for some η. The next series of lemmas gives some conditions where we can conclude this is true.

Proof. We need to show that the exponential moment in question is finite for $\lambda$ negative and sufficiently close to zero. Since $\lambda < 0$ we need to bound the sum in the exponent from below. Note that all the terms in the sum except the last one are larger than $-(m-1)$ and that the terms are non-negative if $V_i \geq m$. Therefore, letting $N_m = \#\{1 \leq i \leq \sigma_1 : V_i < m\}$, we obtain a bound in terms of $N_m$. To show that this last expectation is finite for $\lambda$ close to zero, we need to show that $N_m$ has exponential tails. To this end, note that the event $\{N_m > n\}$ implies that on the first $n$ times that the process satisfies $V_i < m$, the following step is not to $0$. Thus, $N_m$ has exponential tails, and the statement of the lemma holds.
Since $\Lambda_{W,\sigma}$ is analytic and strictly convex in $D^\circ_{W,\sigma}$, the function $g(\lambda) = \Lambda_{W,\sigma}(\lambda, -m_0\lambda)$ is strictly convex and analytic on the interval $(\lambda_1, 0)$. In particular, $g$ is differentiable with $g'(0^-) = E[W_1] - m_0 E[\sigma_1] = 0$, and since $g$ is strictly convex, $g'(\lambda) < 0$ for $\lambda \in (\lambda_1, 0)$. Therefore, $g(\lambda)$ is strictly decreasing on $(\lambda_1, 0)$. Since $\lim_{\lambda\to 0^-} g(\lambda) = g(0) = 0$, we obtain that $g(\lambda) = \Lambda_{W,\sigma}(\lambda, -m_0\lambda) > 0$ for $\lambda \in (\lambda_1, 0)$. Thus, for every $\lambda \in (\lambda_1, 0)$ there exists an $\eta \in (0, -m_0\lambda)$ such that $\Lambda_{W,\sigma}(\lambda, \eta) = 0$, and the definition of $\Lambda_V$ implies that $\eta = -\Lambda_V(\lambda)$. We have shown that $\Lambda_{W,\sigma}(\lambda, -\Lambda_V(\lambda)) = 0$ and $(\lambda, -\Lambda_V(\lambda)) \in D^\circ_{W,\sigma}$ for all $\lambda \in (\lambda_1, 0)$. As was the case in the proof of Lemma 3.6, these facts imply that $\Lambda_V(\lambda)$ is analytic and strictly convex on $(\lambda_1, 0)$.
We are now ready to deduce some properties of the rate function I V .
When $\delta < 0$, then $\Lambda_{W,\sigma}(0,0) = \log P(\sigma_1 < \infty) < 0$, and so it is more difficult to show $\Lambda_V(0) = 0$. Instead we will prove $\inf_x I_V(x) = 0$ in a completely different manner. By the large deviation lower bound in Theorem 3.1 (applied with $j = 0$), it is enough to show that $P(V_n = 0)$ doesn't decay exponentially fast in $n$. The explanation of the representation (5) implies that $P(T_n < T_{-1}) = P(U^n_0 = 0) \leq P(V_n = 0)$, and thus we are reduced to showing that $P(T_n < T_{-1})$ doesn't decay exponentially fast in $n$. In fact, we claim that there exists a constant $C > 0$ such that
$$P(T_n < T_{-1}) \geq Cn^{-(M+1)}, \quad \text{for all } n \geq 2. \qquad (20)$$
To see this, suppose that the first $2M+1$ steps of the random walk alternate between $0$ and $1$. The probability of this happening is a constant $c_0 > 0$ depending only on the first $M$ cookie strengths at the sites $0$ and $1$. At this point the random walker has consumed all the "cookies" at the sites $0$ and $1$. Therefore, by a simple symmetric random walk computation, the probability that the random walk from this point hits $x = 2$ before $x = -1$ is $2/3$. Since $\delta < 0$ the random walk will eventually return from $x = 2$ to $x = 1$ again, and then the probability that the random walk again jumps $M$ more times from $x = 1$ to $x = 2$ without hitting $x = -1$ is $(2/3)^M$. After jumping from $x = 1$ to $x = 2$ a total of $M+1$ times there are no longer any cookies at $x = 2$ either, and thus the probability that the random walk now jumps $M+1$ times from $x = 2$ to $x = 3$ without visiting $x = -1$ is $(3/4)^{M+1}$. We continue this process at successive sites to the right until the random walk makes $M+1$ jumps from $x = n-2$ to $x = n-1$ without hitting $x = -1$ (which happens with probability $((n-1)/n)^{M+1}$). Upon this last jump to $x = n-1$ the random walk has consumed all cookies at $x = n-1$, and so the probability that the next step is to the right is $1/2$. Putting together the above information we obtain the lower bound
$$P(T_n < T_{-1}) \geq c_0 \left(\frac{2}{3}\right)^{M+1} \prod_{k=3}^{n-1} \left(\frac{k}{k+1}\right)^{M+1} \cdot \frac{1}{2} \geq Cn^{-(M+1)}.$$
This completes the proof of (20), and thus $\inf_x I_V(x) = 0$ when $\delta < 0$.
Proof. The convexity of $I_V$ follows from the definition of $I_V$ as the Legendre transform of $\Lambda_V$. The fact that $I_V(x)$ is non-increasing follows from the fact that $\Lambda_V(\lambda) = \infty$ for any $\lambda > 0$: the supremum defining $I_V$ may therefore be restricted to $\lambda \leq 0$, and for each $\lambda \leq 0$ the function $x \mapsto \lambda x - \Lambda_V(\lambda)$ is non-increasing. $I_V$ is also lower-semicontinuous since it is defined as a Legendre transform, and since it is also non-increasing it follows that $I_V$ is continuous on the domain where it is finite.
Then, the standard Chebyshev large deviation upper bound implies that lim sup On the other hand, Theorem 3.1 and the fact that I V is non-increasing implies that Thus, we see that I V (x) ≥ sup λ<0 (λx − Λ 1 (λ)) for any x < ∞. Then, similar to the case δ ∈ (1, 2] above, it will follow that I V (x) > 0 for all x < ∞ if we can show that lim λ→0 − Λ 1 (λ)/λ = ∞.

Fix an integer $K \geq 1$. Recall the construction of the process $V_i$ in Section 2 and define $\tilde V_i$ by $\tilde V_0 = 0$ and the same jump mechanism as the Markov chain $V_i$, with the exception that any attempted jump to a site in $[1, K-1]$ is replaced by a jump to $0$. Note that the above construction of $\tilde V_i$ gives a natural coupling with $V_i$ so that $\tilde V_i \leq V_i$ for all $i$. Let $\tilde\sigma_k$, $k = 1, 2, \ldots$, be the successive return times to $0$ of the Markov chain $\tilde V_i$. Then, since $\tilde V_i < K$ implies $\tilde V_i = 0$, the relevant probability can be bounded using the return times $\tilde\sigma_k$. Since $\tilde\sigma_k$ is the sum of $k$ i.i.d. random variables, Cramér's Theorem implies that this last probability decays on an exponential scale like $e^{-n\theta I_{\tilde\sigma}(1/\theta)}$, where $I_{\tilde\sigma}$ is the large deviation rate function for $\tilde\sigma_k/k$.
Since the first term in the minimum of (24) is decreasing in $\theta$ and the second term in the minimum is increasing in $\theta$, the supremum is attained at the value of $\theta$ that makes the two terms in the minimum equal. Thus, the supremum is greater than $\lambda K(h(-\lambda K) - 1)$ (see Figure 1), which in turn implies that $\Lambda_1(\lambda) \leq \lambda K(1 - h(-\lambda K))$. Since the above argument works for any finite $K$, this implies that $\lim_{\lambda\to 0^-} \Lambda_1(\lambda)/\lambda = \infty$.

Large Deviations for Hitting Times
The large deviation principles for $n^{-1}\sum_{i=1}^n V_i$ in Theorems 3.1 and 3.2 imply a large deviation principle for the hitting times.
Theorem 4.1. Let $I_T(t) = I_V((t-1)/2)$. Then, $T_n/n$ satisfies a large deviation principle with convex rate function $I_T(t)$. That is,
$$\liminf_{n\to\infty} \frac{1}{n} \log P(T_n/n \in G) \geq -\inf_{t\in G} I_T(t) \qquad (25)$$
for all open $G$, and
$$\limsup_{n\to\infty} \frac{1}{n} \log P(T_n/n \in F) \leq -\inf_{t\in F} I_T(t) \qquad (26)$$
for all closed $F$. Moreover, the following qualitative properties are true of the rate function $I_T$.
(i) I T (t) is convex, non-increasing, and continuous on [1, ∞), and there exists a t 2 > 1 such that I T (t) is strictly convex and analytic on (1, t 2 ).
Proof. The properties of the rate function $I_T$ follow directly from the corresponding properties of $I_V$ proved above in Lemmas 3.9 and 3.10. Note that when $\delta > 2$ we use that the formula for the limiting speed of the excited random walk in (8) implies that $1/v_0 = E[\sigma_1 + 2W_1]/E[\sigma_1] = 1 + 2m_0$.
Recall the relationship between the hitting times T n and the processes V i and V (n) i given in (5). Then, The large deviation lower bound (25) then follows from Theorem 3.1.
Since I T is non-increasing and inf t I T (t) = 0, as in the proof of Theorem 3.2 the large deviation upper bound will follow from Again, the relationship between the hitting times T n and the processes V i and V (n) i given in (5) implies that and Theorem 3.2 implies that (27) holds.
To obtain a large deviation principle for the position of the excited random walk we will also need a large deviation principle for the hitting times to the left. However, this is obtained directly as a corollary of Theorem 4.1 by switching the direction of the cookie drifts. To be more precise, for any cookie environment $\omega = \{\omega_i(j)\}$, let $\bar\omega = \{\bar\omega_i(j)\}$ be the associated cookie environment given by $\bar\omega_i(j) = 1 - \omega_{-i}(j)$. Let $\bar T_n$ be the hitting times of the excited random walk in the cookie environment $\bar\omega$. An obvious symmetry coupling gives $T_{-n} = \bar T_n$.
Corollary 4.2. The random variables $T_{-n}/n$ satisfy a large deviation principle with convex rate function $\bar I_T$, where $\bar I_T$ is the rate function given by Theorem 4.1 for the hitting times $\bar T_n/n$.

Large deviations for the random walk
In this section we will show a large deviation principle for $X_n/n$. We begin by defining the rate function $I_X(x)$.
Before stating the large deviation principle for X n /n we will prove some simple facts about the rate function I X .
Lemma 5.1. The function $I_X$ is non-negative and continuous on $[-1,1]$, and there exists an $x_1 < 0$ such that $I_X$ is strictly convex and analytic on $(x_1, v_0)$ and continuously differentiable on $(x_1, 0]$.

Proof.
Most of the properties in the statement of the Lemma follow directly from the corresponding properties of I T (or I T ) given by Theorem 4.1, and thus we will content ourselves with only discussing property (ii) from the statement of the Lemma.
It is a general fact of convex analysis that if $f(x)$ is a convex function on $[1, \infty)$ then $g(x) = xf(1/x)$ is also a convex function on $(0, 1]$. Therefore, the convexity of $I_T$ and $\bar I_T$ implies that $I_X$ is convex on $(0, 1]$ and $[-1, 0)$, respectively. Next, note that $\lim_{x\to 0^+} I_X(x) = \lim_{x\to 0^+} xI_T(1/x) = 0$ since $I_T$ is finite and non-increasing, and similarly $\lim_{x\to 0^-} I_X(x) = 0$. Therefore, $I_X$ is continuous at $x = 0$, which in turn implies that $I_X$ is convex on $[0, 1]$ and $[-1, 0]$, respectively. Finally, the convexity of $I_X$ on all of $[-1, 1]$ follows from the convexity on $[-1, 0]$ and $[0, 1]$ and the monotonicity properties in part (i) of the lemma.
We are now ready to prove the large deviation principle for the position of the excited random walk. To prove the large deviation lower bound it is enough to show that for every $x \in [-1,1]$ and $\varepsilon > 0$,
$$\liminf_{n\to\infty} \frac{1}{n} \log P(|X_n - nx| < \varepsilon n) \geq -I_X(x).$$
First consider the case where $x \in (0, 1]$. Then, since the random walk is a nearest neighbor walk, $P(|X_n - nx| < \varepsilon n) \geq P(|T_{\lceil nx\rceil} - n| < \varepsilon n - 1)$.

Slowdowns
If $\delta > 2$, then Lemma 5.1 shows that the rate function $I_X$ is zero on the interval $[0, v_0]$. Thus, probabilities such as $P(X_n < nx)$ decay to zero subexponentially for $x \in (0, v_0)$. Similarly, since $I_T$ is zero on $[1/v_0, \infty)$, probabilities of the form $P(T_n > nt)$ decay subexponentially if $t > 1/v_0$. The main goal of this section is to prove Theorem 1.3, which gives the correct polynomial rate of decay for these probabilities.
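For orientation, the slowdown asymptotics of Theorem 1.3, in the form in which they are used below (our paraphrase of the statements referred to as (3) and (4)), read:

```latex
P(T_n > nt) = n^{1-\delta/2 + o(1)} \quad \text{for } t > 1/v_0,
\qquad
P(X_n < nx) = n^{1-\delta/2 + o(1)} \quad \text{for } x \in (0, v_0).
```

The exponent $1-\delta/2$ should be compared with the heavy tail $P(W_1 > x) \sim Cx^{-\delta/2}$ of the total offspring in a regeneration block.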
In order to prove Theorem 1.3 we will need the following bound on backtracking probabilities for transient excited random walks.
Lemma 6.1. Let δ > 1. Then there exists a constant C > 0 such that for any n, r ≥ 1, Remark 6.2. In [BS08], Basdevant and Singh showed that such backtracking probabilities could be bounded uniformly in n by a term that vanishes as r → ∞. However, their argument uses an assumption of non-negativity of the cookie strengths, and their bounds do not give any information on the rate of decay of the probabilities in r. Our argument is more general (allowing positive and negative cookie drifts) and gives a quantitative rate of decay in r.
Proof. First, note that it is enough to bound $P(\inf_{k \geq T_{n+r}} X_k \leq n)$. The event $\{\inf_{T_{n+r} \leq k < T_m} X_k \leq n\}$ implies that for every site $i \in [n+1, n+r]$ the excited random walk jumps from $i$ to $i-1$ at least one time before time $T_m$, and therefore this event can be bounded in terms of the branching process from Section 2. Now, the asymptotic age distribution for a discrete renewal process (see Section 6.2 of [Law06]) applies, and combining this with (30) and (31) we obtain a bound on $P(\inf_{k \geq T_{n+r}} X_k \leq n)$ in terms of $E[(\sigma_1 - r)_+]$. The tail decay of $\sigma_1$ in (7) implies that when $\delta > 1$ there exists a constant $C > 0$ such that $E[(\sigma_1 - r)_+] \leq Cr^{1-\delta}$ for any $r \geq 1$.
We will also need the following large deviation asymptotics for heavy tailed random variables.
Remark 6.4. Lemma 6.3 is not new, but we provide a quick proof here for the convenience of the reader since we could not find a statement of this lemma in the literature.
When $\kappa \in (1, 2]$ it is no longer necessarily true that $P(\sum_{k=1}^n Z_k > xn) \sim nP(Z_1 > xn)$, and so a different approach is needed. To this end, first note that since the $Z_k$ are non-negative, a simple lower bound is
$$P\left(\sum_{k=1}^n Z_k > xn\right) \geq P(\exists k \leq n : Z_k > xn) = 1 - (1 - P(Z_1 > xn))^n.$$
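If, for instance, $P(Z_1 > z) \sim Cz^{-\kappa}$ (an assumption consistent with the tails in (7)), then for fixed $x > 0$ we have $nP(Z_1 > xn) \to 0$, and the lower bound above already exhibits the polynomial order:

```latex
1 - \bigl(1 - P(Z_1 > xn)\bigr)^n \;\sim\; n\,P(Z_1 > xn) \;\sim\; C x^{-\kappa}\, n^{1-\kappa},
\qquad n \to \infty.
```

This is the source of the exponent $1-\delta/2$ in Theorem 1.3, since $W_1$ has tail exponent $\kappa = \delta/2$.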
We are now ready to give the proof of the main result of this section.
Proof of Theorem 1.3. We first prove the polynomial rate of decay for the hitting time probabilities in (4). Since $\sigma_k$ and $W_k$ are sums of $k$ i.i.d. non-negative random variables with tail decay given by (7), Lemma 6.3 gives the right-tail large deviation asymptotics (32) for $\sigma_k/k$ and $W_k/k$. Recall the relationship between the hitting times $T_n$ and the branching processes $V_i$ and $V^{(n)}_i$ given in (5). Also, note that the branching process $V^{(n)}_i$ starts with $V^{(n)}_0 = V_n$ and has the same offspring distribution as the branching process $V_i$ but without the extra immigrant each generation. Thus, $V^{(n)}_i = 0$ implies that $V^{(n)}_j = 0$ for all $j \geq i$, and the processes are naturally coupled so that $V^{(n)}_i \leq V_{n+i}$ for all $i \geq 1$. Therefore, $T_n$ is stochastically dominated by $n + 2W_{k(n)}$, where $k(n)$ is defined by $\sigma_{k(n)-1} < n \leq \sigma_{k(n)}$. Thus, for any $c > 0$,
$$P(T_n > nt) \leq P(k(n) > cn) + P\left(W_{cn} > \frac{n(t-1)}{2}\right) \leq P(\sigma_{cn} < n) + P\left(W_{cn} > \frac{n(t-1)}{2}\right). \qquad (34)$$
While (32) implies that the right-tail large deviations of $\sigma_k/k$ decay polynomially, the left-tail large deviations decay exponentially since $\sigma_k$ is the sum of non-negative i.i.d. random variables (use Cramér's theorem). That is, if $1/c < E[\sigma_1]$ then there exists a constant $c' > 0$ such that $P(\sigma_{cn} < n) \leq e^{-c'n}$ for all $n$ large enough. Therefore, if we can choose $c$ such that $1/c < E[\sigma_1]$ and $(t-1)/(2c) > E[W_1]$, the first term in (34) will decay exponentially in $n$ while the second term will decay polynomially of order $n^{1-\delta/2}$.
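In particular (a one-line check, using $m_0 = E[W_1]/E[\sigma_1]$ and the speed identity $1/v_0 = 1 + 2m_0$, and assuming $E[W_1] < \infty$, which holds for $\delta > 2$), such a constant $c$ exists precisely in the slowdown regime:

```latex
\frac{1}{E[\sigma_1]} < c < \frac{t-1}{2E[W_1]}
\quad\Longleftrightarrow\quad
t > 1 + \frac{2E[W_1]}{E[\sigma_1]} = 1 + 2m_0 = \frac{1}{v_0}.
```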
For a matching lower bound on the polynomial rate of decay of $P(T_n > nt)$, we again use the relationship between the hitting times and the branching process in (5) to obtain
$$P(T_n > nt) \geq P\left(\sum_{i=1}^n V_i > \frac{n(t-1)}{2}\right) \geq P\left(\exists k \leq n : W_k > \frac{n(t-1)}{2},\ \sigma_k \leq n\right) \geq P\left(W_{cn} > \frac{n(t-1)}{2}\right) - P(\sigma_{cn} > n).$$
We now turn to the subexponential rate of decay of $P(X_n < xn)$. A lower bound follows immediately from (4) since $P(X_n < xn) \geq P(T_{xn} > n)$. To obtain a corresponding upper bound, note that
$$P(X_n < xn) \leq P(T_{n(x+\varepsilon)} > n) + P\left(\inf_{k > T_{n(x+\varepsilon)}} X_k < xn\right) \leq P(T_{n(x+\varepsilon)} > n) + C(n\varepsilon)^{1-\delta}, \qquad (35)$$
where the last inequality follows from Lemma 6.1. Now, if $\varepsilon > 0$ is sufficiently small (so that $x + \varepsilon < v_0$), then (4) implies that the first probability on the right side of (35) is $n^{1-\delta/2+o(1)}$. Since $n^{1-\delta/2}$ is much larger than $n^{1-\delta}$, this completes the proof of the upper bound needed for (3).