LARGE FAVOURITE SITES OF SIMPLE RANDOM WALK AND THE WIENER PROCESS

Summary. Let $U(n)$ denote the most visited point by a simple symmetric random walk $\{S_k\}_{k\ge 0}$ in the first $n$ steps. It is known that $U(n)$ and $\max_{0\le k\le n} S_k$ satisfy the same law of the iterated logarithm, but have different upper functions (in the sense of P. Lévy). The distance between them, however, turns out to be transient. In this paper, we establish the exact rate of escape of this distance. The corresponding problem for the Wiener process is also studied.


Introduction
Let $\{S_k\}_{k\ge 0}$ denote a simple symmetric (Bernoulli) random walk on the line, starting from 0, i.e. at each step, the random walk moves to either of its two neighbours with equal probability 1/2. Define, for $n \ge 0$ and $x \in \mathbb{Z}$,
$$\xi_n^x = \sum_{k=0}^{n} \mathbf{1}_{\{S_k = x\}},$$
which counts the number of visits to the site $x$ by the random walk in the first $n$ steps. Let
$$\mathcal{U}(n) = \Bigl\{ x \in \mathbb{Z} : \ \xi_n^x = \sup_{y\in\mathbb{Z}} \xi_n^y \Bigr\},$$
which stands for the set of the most visited sites or favourite sites of the random walk.
We (measurably) choose a point in $\mathcal{U}(n)$, say,
$$U(n) = \max_{x \in \mathcal{U}(n)} x, \qquad (1.2)$$
which is referred to by Erdős and Révész [12] as the (largest) favourite site of $\{S_k\}_{0\le k\le n}$.
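The definitions of the local time and of the (largest) favourite site can be made concrete with a small sketch. It assumes that the local time at $x$ counts the indices $k \in \{0, \dots, n\}$ with $S_k = x$ (the index range is an assumption here), and the function names `simulate_walk`, `local_times`, `favourite_sites` and `largest_favourite` are hypothetical helpers introduced only for illustration.

```python
import random
from collections import Counter


def simulate_walk(n, seed=None):
    """Return the path [S_0, ..., S_n] of a simple symmetric random walk."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path


def local_times(path):
    """Map each visited site x to the number of indices k with S_k = x."""
    return Counter(path)


def favourite_sites(path):
    """Sorted list of all sites whose local time equals the maximal local time."""
    xi = local_times(path)
    top = max(xi.values())
    return sorted(x for x, count in xi.items() if count == top)


def largest_favourite(path):
    """The largest favourite site, i.e. the maximum of the set of favourite sites."""
    return favourite_sites(path)[-1]


if __name__ == "__main__":
    walk = simulate_walk(10_000, seed=1)
    print("favourite sites:", favourite_sites(walk))
    print("largest favourite site:", largest_favourite(walk))
    print("maximum of the walk:", max(walk))
```

Replacing the last index `[-1]` by `[0]` in `largest_favourite` gives the smallest favourite site, which corresponds to taking "min" instead of "max".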
We mention that all the results for $U(n)$ stated in this paper remain true if "max" is replaced, for example, by "min" in (1.2). The process $U(n)$ has some surprising properties. For example, it is proved by Bass and Griffin [2] that it is transient, in the sense that $\lim_{n\to\infty} |U(n)| = \infty$ almost surely. More precisely, they obtain the following:
Theorem A ([2]). With probability one,
$$\liminf_{n\to\infty} \frac{(\log n)^a}{n^{1/2}}\, |U(n)| = \begin{cases} 0 & \text{if } a < 1, \\ \infty & \text{if } a > 11. \end{cases}$$
Remark. The exact rate of escape of |U(n)| is unknown.
Theorem B confirms that both $U(n)$ and $\overline{S}_n \stackrel{\mathrm{def}}{=} \max_{0\le k\le n} S_k$ satisfy the same law of the iterated logarithm (LIL). A natural question is: do they have the same upper functions? Of course, for the random walk, the upper functions are characterized by the classical Kolmogorov test (also referred to as the Erdős-Feller-Kolmogorov-Petrowsky or EFKP test, cf. Révész [21, p. 35]).
Theorem C ([12]). There exists a deterministic sequence $(a_n)_{n\ge 0}$ of non-decreasing positive numbers such that with probability one, $U(n) < a_n$ for all sufficiently large $n$, and $\overline{S}_n > a_n$ for infinitely many $n$.
As a consequence, $U(n)$ and $\overline{S}_n$ have different upper functions.
Remark. An example of the sequence $(a_n)$ satisfying Theorem C is explicitly given in [12], cf. also Révész [21, Theorem 11.25]. Whether it is possible to obtain an integral test to characterize the upper functions of $U(n)$ remains an unanswered question. See Révész [21, pp. 130-131] for a list of 10 (ten) other open problems for $U(n)$ and $\mathcal{U}(n)$.
We propose to study the upper limits of $U(n)$ in this paper. Intuitively, when $U(n)$ reaches some extraordinarily large values, it should be very close to $\overline{S}_n = \max_{0\le k\le n} S_k$. The question is: how close can $U(n)$ be to $\overline{S}_n$? The fact that the process $n \mapsto \overline{S}_n - U(n)$ is transient follows from Révész [21, Theorem 13.25]. Our aim here is to determine the exact escape rate of the process. This problem was communicated to us by Omer Adelman.
Theorem 1.1. There exists a universal constant $c_0 \in (0, \infty)$ such that
$$\liminf_{n\to\infty} \frac{(\log\log n)^{3/2}}{n^{1/2}} \bigl( \overline{S}_n - U(n) \bigr) = c_0, \qquad \text{a.s.}$$
Remark 1.1.1. The rate $n^{1/2}/(\log\log n)^{3/2}$ might seem somewhat surprising. One might have expected to see, for example, $n^{1/2}/(\log\log n)^{1/2}$ (the rate in Chung's LIL for the random walk), or even something like $n^{1/2}/(\log n)^a$ (for some $a > 0$; the rate in Hirsch's LIL). (For these LILs, cf. Chung [5], Hirsch [14], or Csáki [7] for a unified approach.) The correct rate of escape of $\overline{S}_n - U(n)$ is therefore a kind of "compromise" between the rates in the Chung and Hirsch LILs.
Remark 1.1.2. An immediate consequence of Theorem 1.1 is that almost surely for all large $n$, if $\overline{S}_n < c\, n^{1/2}/(\log\log n)^{3/2}$ (where $c < c_0$), then all the favourite points are in the negative part of the line.
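To get a feeling for the normalization in Theorem 1.1, one can track the quantity $(\log\log n)^{3/2} n^{-1/2}$ times the distance between the running maximum and the largest favourite site along a simulated path. The following is only an illustrative sketch: the function name `normalized_gap_path` and its parameters are hypothetical, and a finite simulation cannot confirm a liminf statement involving iterated logarithms, nor estimate the constant $c_0$.

```python
import math
import random
from collections import defaultdict


def normalized_gap_path(n, seed=None, start=16):
    """Return [(k, (log log k)^{3/2} k^{-1/2} (max_{j<=k} S_j - U(k)))] for start <= k <= n.

    U(k) is the largest favourite site after k steps, tracked incrementally.
    """
    if start < 3:
        raise ValueError("start must be >= 3 so that log log k is positive")
    rng = random.Random(seed)
    visits = defaultdict(int)
    position = 0
    visits[0] = 1          # the walk starts at 0, counted as a visit (assumption)
    best_count = 1         # current maximal local time
    favourite = 0          # largest site attaining best_count
    running_max = 0
    out = []
    for k in range(1, n + 1):
        position += rng.choice((-1, 1))
        visits[position] += 1
        count = visits[position]
        # Only the local time of the current site changed, so the set of
        # favourite sites either becomes {position} or gains position.
        if count > best_count:
            best_count, favourite = count, position
        elif count == best_count and position > favourite:
            favourite = position
        running_max = max(running_max, position)
        if k >= start:
            scale = math.log(math.log(k)) ** 1.5 / math.sqrt(k)
            out.append((k, scale * (running_max - favourite)))
    return out


if __name__ == "__main__":
    values = normalized_gap_path(200_000, seed=7)
    k_min, v_min = min(values, key=lambda kv: kv[1])
    print("smallest normalized gap on this path:", v_min, "at step", k_min)
```

The gap is always non-negative because the favourite site is a visited site and hence cannot exceed the running maximum.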
Theorem 1.1 provides information about the absolute distance between U(n) and S n .
However, one may wonder how U(n) can be close to S n in the scale of the latter. Our answer to this is a self-normalized LIL stated as follows.
where $j_0 \approx 2.405$ is the smallest positive root of the Bessel function $J_0$.
Remark 1.2.1. It follows from Theorems 1.1 and 1.2 that if $(\overline{S}_n - U(n))/\overline{S}_n$ is as small as possible, then $\overline{S}_n$ should be very large. More precisely, with probability one, the events $\{\overline{S}_n - U(n) < c_1 \overline{S}_n (\log\log n)^{-2}\}$ and $\{\overline{S}_n < c_2 (n \log\log n)^{1/2}\}$, where $c_1 c_2 < c_0$, cannot occur simultaneously for infinitely many $n$.
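The numerical value of $j_0$ quoted above can be reproduced with a short computation. The sketch below evaluates $J_0$ through its power series $J_0(x) = \sum_{k\ge 0} (-1)^k (x^2/4)^k/(k!)^2$ and locates the root by bisection on $[2, 3]$, where $J_0$ changes sign; the function names and the bracketing interval are choices made here for illustration, not taken from the paper.

```python
def bessel_j0(x, terms=40):
    """Evaluate J_0(x) from its power series (adequate for moderate |x|)."""
    total = 0.0
    term = 1.0  # k = 0 term of the series
    for k in range(terms):
        total += term
        # ratio of consecutive terms: -(x^2 / 4) / (k + 1)^2
        term *= -(x * x / 4.0) / ((k + 1) ** 2)
    return total


def smallest_positive_root_j0(lo=2.0, hi=3.0, tol=1e-12):
    """Bisection for the root of J_0 in [lo, hi]; J_0 is positive on [0, 2]."""
    f_lo = bessel_j0(lo)
    if f_lo * bessel_j0(hi) > 0:
        raise ValueError("J_0 does not change sign on the given interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j0(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(smallest_positive_root_j0())
```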
We conclude the introduction by mentioning that the problem of favourite sites for random walk is also studied by Tóth and Werner [24]. See also Khoshnevisan and Lewis [17] for the Poisson process, Borodin [4], Eisenbaum [10] and Leuridan [19] for the Wiener process, Eisenbaum [11] for the stable Lévy process, Bertoin and Marsalle [3] for the drifted Wiener process, and Hu and Shi [15] for the Wiener process in space.
The rest of the paper is organized as follows. Section 2 is devoted to some preliminaries for Brownian local times and Bessel processes. Theorem 1.2 is proved in Section 3, and Theorem 1.1 in Section 4.
In the sequel, $c_i$ ($3 \le i \le 22$) denote some (finite positive) universal constants, except that when their values depend on $\varepsilon$, they will be written as $c_i(\varepsilon)$. We adopt the usual
Since we only deal with (possibly random) indices $n$ and $t$ which ultimately tend to infinity, our statements, sometimes without further mention, are to be understood for the situation when the appropriate index is sufficiently large. We also mention that our use of "almost surely" is not systematic.

Preliminaries
In the rest of the paper, $\{W(t);\ t \ge 0\}$ denotes a real-valued Wiener process with $W(0) = 0$. There exists a jointly continuous version of the local time process of $W$, denoted by $\{L_t^x;\ t \ge 0,\ x \in \mathbb{R}\}$. We shall be working on this jointly continuous version.
Consider the process of the first hitting times for $W$, denoted by $T(r)$ for $r > 0$ (cf. (2.1)). Let us recall the following well-known Ray-Knight theorem, cf. Ray [20], Knight [18], or Rogers and Williams [23, Theorem VI.52.1 (i)]:
Fact 2.1. The process $x \mapsto L_{T(1)}^{1-x}$, $x \ge 0$, is a squared Bessel process of dimension 2 starting from 0 for $0 \le x \le 1$, and becomes a squared Bessel process of dimension 0 for $x \ge 1$.
Remark 2.1.1. We recall that when $d \ge 1$ is an integer, a $d$-dimensional Bessel process can be realized as the Euclidean norm of an $\mathbb{R}^d$-valued Wiener process. On the other hand, a squared Bessel process of dimension 0 is a diffusion process with generator $2x\, d^2/dx^2$, absorbed once it hits 0.
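The realization recalled in Remark 2.1.1 gives a direct way to sample a squared Bessel process of integer dimension on a time grid: simulate the coordinates of the Wiener process with independent Gaussian increments and take the squared Euclidean norm. The sketch below is an illustration under that construction only; the function name `squared_bessel_path` and its parameters are hypothetical.

```python
import math
import random


def squared_bessel_path(dim, t_max, n_steps, seed=None):
    """Sample a squared Bessel process of integer dimension `dim`, started at 0,
    at the grid times j * t_max / n_steps, as the squared norm of a
    dim-dimensional Wiener process."""
    if dim < 1:
        raise ValueError("this construction needs an integer dimension >= 1")
    rng = random.Random(seed)
    sd = math.sqrt(t_max / n_steps)  # standard deviation of one increment
    coords = [0.0] * dim
    path = [0.0]
    for _ in range(n_steps):
        coords = [c + rng.gauss(0.0, sd) for c in coords]
        path.append(sum(c * c for c in coords))
    return path


if __name__ == "__main__":
    # For dim = 2 the value at time 1 has mean 2 (the sum of two unit variances).
    samples = [squared_bessel_path(2, 1.0, 4, seed=s)[-1] for s in range(4000)]
    print("empirical mean at time 1:", sum(samples) / len(samples))
```

This construction does not cover dimension 0, for which the process is described through its generator instead.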
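Notation. Throughout the paper, $Q$, $Z$ and $H$ denote squared Bessel processes of dimensions 2, 0 and 4 respectively, with $Q(0) = 0$, $Z(0) = 1$ and $H(0) = 0$, and
$$\zeta_Z = \inf\{ t > 0 : Z(t) = 0 \}, \qquad L_H = \sup\{ t > 0 : H(t) = 1 \}.$$
In words, $\zeta_Z$ denotes the life-time of $Z$, and $L_H$ is the last exit time of $H$ from 1. Since the 4-dimensional squared Bessel process $H$ is transient, the random variable $L_H$ is well-defined.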
The next is a collection of known results on Bessel processes, which we shall need later. Fact 2.2 is a duality theorem for Bessel processes of dimensions 0 and 4. A more general result can be found in Revuz and Yor [22].
Fact 2.2. We have
$$\{ Z(\zeta_Z - t);\ 0 \le t \le \zeta_Z \} \ \overset{\mathrm{law}}{=} \ \{ H(t);\ 0 \le t \le L_H \},$$
where "$\overset{\mathrm{law}}{=}$" denotes identity in distribution. In words, a Bessel process of dimension 0, starting from 1, is the time reversal of a Bessel process of dimension 4, starting from 0, killed when exiting from 1 for the last time.
Fact 2.4. As $x$ goes to 0, where $j_0$ is as before the smallest positive root of $J_0$, and $c_3$ is an absolute constant whose value is explicitly known. As a consequence, there exists an absolute constant $c_4$ such that for all $t > 0$ and $x > 0$, Similarly, there exist $c_5$ and $c_6$ such that for all positive $t$ and $x$,
Fact 2.5. The probability transition density of the (strong) Markov process $Q$ is given by, for $t > 0$, where $I_0$ is the modified Bessel function of index 0.
where "$\| \cdot \|$" is the Euclidean norm in $\mathbb{R}^2$.
Finally, let us recall three results for local times. The first (Fact 2.7) is Kesten's LIL for the maximum local time, cf. [16]. For an improvement in the form of an integral criterion, cf. Csáki [8]. The second (Fact 2.8), which concerns the increments of the Wiener local time with respect to the space variable, is due to Bass and Griffin [2]. The third (Fact 2.9) is a joint strong approximation theorem, cf. Révész [21, pp. 105-107].
a.s., (2.12) where $\xi_n^x$ and $L_n^x$ denote the local times of $(S_k)$ and $W$ respectively.
Remark 2.9.1. The approximation rate in (2.11) is not optimal, but is sufficient for our needs. For the best possible rates, cf. Csörgő and Horváth [9].

Proof of Theorem 1.2
Without loss of generality, we shall be working in an enlarged probability space where the coupling for $\{S_k\}_{k\ge 0}$ and $W$ in Fact 2.9 is satisfied. Recall that $L_t^x$ is the local time of $W$. For brevity, write The main result in this section is the following theorem.
Theorem 3.1. There exists $\varepsilon_0 \in (0, 1)$ such that for all $0 < \varepsilon < \varepsilon_0$, we have, (i) almost surely for all sufficiently large $t$, (ii) almost surely, there exists a sequence $(t_n) \uparrow \infty$, satisfying
Admitting Theorem 3.1 for the moment, we can now easily prove Theorem 1.2.
Proof of Theorem 1.2. Fix a small ε > 0. Let {S k } k≥0 and W be the coupling in Fact 2.9. According to (2.12), for all large n, In the last inequality, we have used the following well-known LIL's (cf. for example Révész [21, pp. 35 and 39]): for a > 0 and almost surely all large n, For other applications later, we mention that (3.5) has a continuous-time analogue (Révész [21, p. 53]): for a > 0 and almost surely all large t, or, equivalently, for a > 0 and almost surely all large r, Applying (3.2), (2.11) and (2.10), and in view of (3.4), we obtain (writing b ε (n) By the definition of U(n) (cf. (1.2)), this yields that (almost surely) for all large n, This implies the lower bound in Theorem 1.2, as ε can be as close to 0 as possible. The upper bound in the theorem can be proved exactly in the same way, using (3.3) instead of (3.2).
To prove Theorem 3.1, we need the following two lemmas.
Consequently, for all $0 < y \le 1$,
Proof. Let as before $Q$ and $Z$ be squared Bessel processes of dimensions 2 and 0 respectively, with $Q(0) = 0$ and $Z(0) = 1$. Assume they are independent. By the Ray-Knight theorem (cf. Fact 2.1 in Section 2), $\sup_{x \le 0} L_{T(1)}^x$ has the same law as $Q(1) \sup_{t \ge 0} Z(t)$. Since $Z$ is a linear diffusion process in natural scale (Revuz and Yor [22, Chap. XI]), we have $P(\sup_{t\ge 0} Z(t) < z) = 1 - z^{-1}$ for all $z > 1$. Accordingly, by conditioning on $Q(1)$, Since $Q(1)$ has the exponential distribution with mean 2, this immediately yields the lemma.
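The conditioning step in this proof can be checked numerically. Using only the two facts stated in the proof (that $Q(1)$ is exponential with mean 2, and that $P(\sup_{t\ge 0} Z(t) < z) = 1 - z^{-1}$ for $z > 1$, with $Q$ and $Z$ independent), one obtains $P(Q(1) \sup_t Z(t) > y) = E[\min(1, Q(1)/y)] = (2/y)(1 - e^{-y/2})$. This closed form is a derivation made here and is not quoted from the lemma, whose displayed statement is not reproduced in this text; the function names below are hypothetical.

```python
import math
import random


def tail_closed_form(y):
    """P(Q(1) * sup Z > y) = (2 / y) * (1 - exp(-y / 2)), derived by conditioning on Q(1)."""
    return (2.0 / y) * (1.0 - math.exp(-y / 2.0))


def monte_carlo_tail(y, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(Q(1) * sup Z > y) from independent samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        q1 = rng.expovariate(0.5)            # exponential with mean 2
        sup_z = 1.0 / (1.0 - rng.random())   # P(sup_z > z) = 1 / z for z > 1
        if q1 * sup_z > y:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    for y in (0.25, 1.0, 4.0):
        print(y, tail_closed_form(y), monte_carlo_tail(y))
```

For small $y$ the complementary probability $1 - (2/y)(1 - e^{-y/2})$ behaves like $y/4$, which is the kind of linear lower bound one would expect to use for $0 < y \le 1$.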
Proof. Write Λ 1 for the probability term on the left hand side of (3.9). Since Q can be considered as the squared modulus of a planar Wiener process, by conditioning on {Q(t); 0 ≤ t ≤ a} and using Anderson's inequality (Fact 2.6), where Q 2 is an independent copy of Q. Now, applying (2.7) yields the last identity following from integration by parts. By the usual Gaussian tail estimate, We have used the fact that Proof of (3.2). Fix a small ε > 0, and define Clearly, for each n, Θ n is a stopping time with respect to the natural filtration of W .
Consider the events, on {Θ n < ∞}, This means Consider now the process { W (t) By the strong Markov property, W is again a Wiener process, independent of F Θ n , where {F t } t≥0 denotes the natural filtration of W . We can define the local time L and first hitting time T for W exactly as L and T for W . Clearly, for all t ≥ 0 and x ∈ R, Assume T (r n−1 ) < Θ n ≤ T (r n ). Then W (Θ n ) ≥ (1 − δ n )r n , which implies T (r n ) − Θ n ≤ T (δ n r n ). In view of (3.11), we have, on {T (r n−1 ) < Θ n ≤ T (r n )}, Since {T (r n−1 ) < Θ n ≤ T (r n )} is an F Θ n -measurable event, combining this with (3.10) gives By scaling, the second probability term on the right hand side is by means of (3.8). It follows that (3.13) P T (r n−1 ) ≤ Θ n ≤ T (r n ) ≤ n ε c 12 (ε) P(F n ) + P Θ n = T (r n−1 ) .
By the scaling property of $W$, According to the Ray-Knight theorem (cf. Fact 2.1), where $Q$ is a 2-dimensional squared Bessel process (with $Q(0) = 0$) as in (2.4). Since $2(r_n - r_{n-1})/r_n < \delta_n$ (for large $n$), we can apply Lemma 3.3 to arrive at Moreover, In view of (3.13), we have $\sum_n P(T(r_{n-1}) \le \Theta_n \le T(r_n)) < \infty$. By the Borel-Cantelli lemma, almost surely for all large $n$ and $t \in [T(r_{n-1}), T(r_n)]$, Since for $t \in [T(r_{n-1}), T(r_n)]$, (the last inequality following from (3.6)), and we also have This yields (3.2) (replacing $\varepsilon$ by a small constant multiple of $\varepsilon$), hence the first part in Theorem 3.1.
By the strong Markov property, According to the Ray-Knight theorem (Fact 2.1), the last probability term equals where Q is as before a 2-dimensional squared Bessel process starting from 0, and Z is a squared Bessel process of dimension 0, starting from 1, independent of Q. Therefore, (1 − 6ε)j 0 δ 1/2 n < Q(δ n ) < (1 − 4ε)j 0 δ 1/2 n , after time δ n , the process Q hits ε 2 j 0 δ Recall that Q is a (strong) Markov process. Write P x (for x ≥ 0) the probability under with Q starts from x (thus P 0 = P). Define for r > 0, σ(r) = inf t > 0 : Q(t) = r .
Finally, by the triangle inequality and Fact 2.4, Assembling (3.14)-(3.18): which implies $\sum_n P(G_n) = \infty$. By the strong Markov property, the events $G_n$ are independent.

Proof of Theorem 1.1
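That the liminf expression in Theorem 1.1 should be a constant (possibly zero or infinite) can be seen by means of a 0-1 argument. Indeed, write $X_i = S_i - S_{i-1}$ ($i \ge 1$) for the steps of the walk and
$$c_0 = \liminf_{n\to\infty} \frac{(\log\log n)^{3/2}}{n^{1/2}} \bigl( \overline{S}_n - U(n) \bigr),$$
where $\overline{S}_n = \max_{0\le k\le n} S_k$; we now show that $c_0$ is almost surely a constant.
By the Hewitt-Savage 0-1 law, it suffices to check that $c_0$ remains unchanged under any finite permutation of the variables $\{X_i\}_{i\ge 1}$. By induction, we only have to treat the case of a permutation between two elements, say $X_i$ and $X_j$. Without loss of generality, we can assume that $|j - i| = 1$.
For typesetting simplification, we write the proof only for the case $i = 1$ and $j = 2$. Let $\widetilde{X}_1 = X_2$, $\widetilde{X}_2 = X_1$ and $\widetilde{X}_k = X_k$ for $k \ge 3$, and define the corresponding simple random walk $\widetilde{S}_0 = 0$ and $\widetilde{S}_n = \sum_{k=1}^n \widetilde{X}_k$. There is also a local time process $\widetilde{\xi}_n^x$ associated with $\{\widetilde{S}_n\}_{n\ge 0}$, and the (largest) favourite point is denoted by $\widetilde{U}(n)$. For all $x \in \mathbb{Z} \setminus \{-1, 1\}$, $\widetilde{\xi}_n^x = \xi_n^x$, and $|\widetilde{\xi}_n^y - \xi_n^y| \le 1$ if $y = \pm 1$. It is proved by Bass and Griffin [2] that $\xi_n^y \le \sup_{x\in\mathbb{Z}} \xi_n^x - 2$ (for $y = \pm 1$), almost surely for all large $n$. Therefore $\widetilde{U}(n) = U(n)$ eventually.
Since $\max_{0\le k\le n} \widetilde{S}_k = \overline{S}_n$ for all large $n$, this proves that $c_0$ remains unchanged under the permutation between $X_1$ and $X_2$.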
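The combinatorial claim used in the permutation argument, namely that exchanging the first two steps changes the local times only at the sites $\pm 1$ and there by at most 1, can be verified exhaustively for short walks. The sketch below does this; the helper names `walk`, `swap_first_two` and `local_time_differences` are hypothetical and introduced only for this check.

```python
from collections import Counter
from itertools import product


def walk(steps):
    """Partial sums S_0 = 0, S_1, ..., S_n of the given +/-1 steps."""
    path = [0]
    for s in steps:
        path.append(path[-1] + s)
    return path


def swap_first_two(steps):
    """Exchange the first two steps, leaving all later steps unchanged."""
    steps = list(steps)
    steps[0], steps[1] = steps[1], steps[0]
    return steps


def local_time_differences(steps):
    """Sites where the local times of the swapped and original walks differ,
    mapped to (swapped local time) - (original local time)."""
    original = Counter(walk(steps))
    swapped = Counter(walk(swap_first_two(steps)))
    sites = set(original) | set(swapped)
    return {x: swapped[x] - original[x] for x in sites if swapped[x] != original[x]}


if __name__ == "__main__":
    for steps in product((-1, 1), repeat=8):
        diffs = local_time_differences(steps)
        assert set(diffs) <= {-1, 1}
        assert all(abs(d) <= 1 for d in diffs.values())
    print("claim verified for all walks of length 8")
```

The reason is visible in the code: the two paths agree at every time except time 1, where one sits at $+1$ and the other at $-1$ whenever the first two steps differ.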
Consequently, $c_0$ is almost surely a constant.
$= t^{1/2}/(\log\log t)^{3/2}$. There exist universal constants $c > 0$ and $\varepsilon_0 \in (0, 1)$ such that for all $0 < \varepsilon < \varepsilon_0$ and almost surely all sufficiently large $t$,
The rest of the section aims at the proof of Theorem 4.1, which is based on several preliminary estimates. We start with the following estimates for Gaussian tails, which will be frequently used later. Recall that $Q$, $H$ are squared Bessel processes of dimensions 2 and 4 respectively, both starting from 0 (cf. (2.4) and (2.5)), and that $T$ is the process of first hitting times for $W$, cf. (2.1). Then for all positive $x$, $t$ and $r$,
Recall that $Z$ is a squared Bessel process of dimension 0, starting from 1, and $\zeta_Z$ is
Proof. Let $L_H$ be the last exit time from 1 of $H$, cf. (2.6). By Fact 2.2, $\sup_{t\ge 0} Z(t)/\zeta_Z$ has the same law as $\sup_{0\le t\le L_H} H(t)/L_H$. Applying Fact 2.3 to the bounded functional , by means of the Hölder inequality. Since by a Gaussian calculation, $H^{-3/2}(1)$ has finite expectation, this yields (4.6) by using (2.8) (which, as was recalled in Section 2, goes back to Ciesielski and Taylor [6]). The proof of (4.7) follows exactly the same lines, using (4.3) instead of (2.8).

Lemma 4.3. For any $x > 0$ and $t > 0$,
Proof. Recall that $Z$ is a diffusion process, starting from 1, absorbed by 0, with generator $2x\, d^2/dx^2$. Therefore, it can be realized as, for $t < \zeta_Z$, where $W$ is the Wiener process. Hence Z(t) ≤ 1 + W (t) for all $t < \zeta_Z$. Accordingly, by virtue of the usual Gaussian tail estimate. Since $(x^{1/2} - 1)^2 \ge (x - 2)/2$ (indeed, the difference of the two sides equals $(x^{1/2} - 2)^2/2 \ge 0$), this gives On the other hand, by means of (4.7). Combining (4.9) and (4.10) yields the lemma.
where $c_{19}$ is the absolute constant in (4.6).
Observe that by scaling, Since $T(1) = \int_0^\infty L_{T(1)}^{1-t}\, dt$, by distinguishing two possible situations $\nu_n T(1) > 1$ and $\nu_n T(1) \le 1$, the Ray-Knight theorem (cf. Fact 2.1) confirms that the probability term on the right hand side with the notation where, as before (cf. (2.2) and (2.4)), $Q$ is a 2-dimensional squared Bessel process starting from 0, and $Z$ is a squared Bessel process of dimension 0 starting from 1 (the processes $Q$ and $Z$ being independent). For our needs later, we insist that
$$X_n \ \overset{\mathrm{law}}{=} \ \nu_n T(1). \qquad (4.17)$$
The proof of (4.16) (hence of Theorem 4.1) will be complete once we prove the following lemma. The proof of Lemma 4.4 is divided into two parts, namely, the two estimates (4.18) and (4.19) are established separately.

Large favourite sites of the Wiener process
The problem of favourite sites can be posed for the Wiener process $W$ as well. Let $L_t^x$ be the jointly continuous local time process of $W$, and we can define the set of the favourite sites of $W$:
$$\mathcal{V}(t) = \Bigl\{ x \in \mathbb{R} : \ L_t^x = \sup_{y\in\mathbb{R}} L_t^y \Bigr\}.$$
It is known (cf. Leuridan [19], Eisenbaum [11]) that almost surely for all $t > 0$, $\mathcal{V}(t)$ is either a singleton or composed of two points. Let us choose $V(t) = \max_{x\in\mathcal{V}(t)} x$.
One can easily see that a 0-1 law applies also for $V(t)$. Indeed, Bass and Griffin [2] proved that $\lim_{t\to\infty} |V(t)| (\log t)^{12} t^{-1/2} = \infty$ a.s., so $V(t)$ depends on large values of the Wiener process $W$ and hence the initial portion $\{W(s),\ 0 \le s \le \log t\}$ has no influence on $V(t)$. It follows that the lim inf in Theorem 5.1 should be a constant. We believe that the constant in Theorem 5.1 must be identical with $c_0$ of Theorem 1.1, but due to the lack of a strong invariance principle between $U$ and $V$ we cannot prove it.
Theorem 5.2. Almost surely, where $j_0$ is the smallest positive root of the Bessel function $J_0(\cdot)$.