Extreme value theory for random walks on homogeneous spaces

In this paper we study extreme events for random walks on homogeneous spaces. We consider the following three cases. On the torus we study closest returns of a random walk to a fixed point in the space. For a random walk on the space of unimodular lattices we study extreme values for lengths of the shortest vector in a lattice. For a random walk on a homogeneous space we study the maximal distance a random walk gets away from an arbitrary fixed point in the space. We prove an exact limiting distribution on the torus and upper and lower bounds for sparse subsequences of random walks in the two other cases. In all three settings we obtain a logarithm law.


Introduction
Let $X$ be a probability space and $G$ a group acting on $X$. Let $m$ be a $G$-invariant probability measure on $X$ and fix also a probability measure $\mu$ on $G$. We define a random walk on $X$ as a sequence of random variables $X_i = g_i \cdots g_1 x$, where the $g_j$ have distribution $\mu$ and $x$ has distribution $m$. Fix a function $\Delta : X \to \mathbb{R}$. The focus of our interest is the random variable $M_n = \max_{0 \le i < n} \Delta(X_i)$.
There exists a natural measure on the space of all random walks on $X$, which we denote by $P$ and define formally in Section 3.1. We are particularly interested in the existence of sequences $a_n$ and $b_n$ such that the distribution $P(M_n \le a_n r + b_n)$ has a non-degenerate limit and, if this is the case, in determining the limit. We refer to such a limit as the extreme value distribution of the random walk. One reason why extreme value distributions are interesting is that they imply asymptotics for the growth of extreme values of $\Delta(X_n)$. In many cases this turns out to be a logarithm law, namely we get that almost surely
$$\limsup_{n\to\infty} \frac{\Delta(X_n)}{\log n} = C$$
for some $C > 0$. One result of this kind is Sullivan's logarithm law for geodesics on hyperbolic $d$-space [23]. Kleinbock and Margulis later generalised this to certain classes of homogeneous spaces, see [11], and Athreya, Ghosh and Prasad proved ultrametric analogues of this result, see [1], [2].
The general framework for determining extreme value distributions is known as extreme value theory (EVT). EVT was first applied in dynamics by Collet [10], who studied $C^2$ transformations $T$ of an interval. He was interested in the entrance times of $T^j x$ into a shrinking neighborhood around a fixed point $x_0$ and, to understand this, he determined the limiting distribution of the maximum of $-\log d(T^j x, x_0)$. Similar results to Collet's have since been proven for other choices of $T$ and other types of maps, see for example [13], [17], [18], [19]. In the context of this paper, recent results by Aytac, Freitas and Vaienti [3] are particularly interesting as they apply EVT to a setting involving randomness, more precisely, iterations of a randomly perturbed map. Freitas, Freitas and Todd have developed a general framework for applying EVT to dynamical systems $T : X \to X$, see [14], [15], [16].
This research was supported by ERC grant 239606.
Classically, random walks were studied as objects living on $\mathbb{R}^d$. However, the concept of random walks generalizes easily to many other spaces, for example to homogeneous spaces with a group action, which we are particularly interested in. In [12], Eskin and Margulis studied recurrence properties for random walks on finite volume homogeneous spaces $G/\Gamma$ where $G$ is a semisimple Lie group and $\Gamma$ a nonuniform irreducible lattice. In a series of papers Benoist and Quint [5], [6], [7], [8] developed this theory further by studying stationary measures on $G/\Gamma$ while also generalizing their results to $p$-adic Lie groups.
The main idea of this paper is to apply EVT to random walks on homogeneous spaces. The level of dependency among the X i 's is the deciding factor in whether EVT can successfully be applied to obtain limiting distributions for the maximum of ∆(X i ). The closer X i is to being an independent sequence the easier it is to apply EVT. EVT provides independence-like conditions that, if satisfied by X i , imply a limiting distribution for M n . The idea of this paper is to verify these conditions by rewriting the joint distribution of the random walk using the averaging operator. The spectral gap property of the averaging operator is the crucial ingredient in showing that the independence-like conditions are satisfied by the random walk.
Our main results are divided into three different settings. In the following, let S µ and G µ denote the semigroup and group generated by the support of µ respectively.
1.1. Closest returns on the torus. Let $X = \mathbb{T}^d$ be the $d$-dimensional torus with Lebesgue measure $m$ and Euclidean metric $d$. Let $G = \mathrm{Aut}(\mathbb{T}^d)$ denote the group of linear automorphisms of $\mathbb{T}^d$ and fix a probability measure $\mu$ on the group. We assume that there is no $G_\mu$-invariant factor torus $T$ of $\mathbb{T}^d$ such that the projection of $G_\mu$ on $\mathrm{Aut}(T)$ is amenable.
We are interested in the closest returns of a random walk to a fixed point on the torus and, in particular, in how these shortest distances are distributed. Let $x_0 \in X$ be fixed and define
$$\Delta(x) = -\log d(x, x_0).$$
We see that for small values of $d(x, x_0)$, $\Delta(x)$ becomes large; hence we can study the closest returns of $X_i$ by looking at successive maxima of $\Delta(X_i)$.
Theorem 1.1. Assume that the support of $\mu$ is bounded and that $\det(g - I) \neq 0$ for all $g \in S_\mu$. Then for $u_n = r + \frac{1}{d}\log n$ we have that for a.e. $x_0 \in X$
$$\lim_{n\to\infty} P(M_n \le u_n) = e^{-V_d e^{-dr}},$$
where $V_d$ is the volume of the unit ball in $\mathbb{R}^d$.
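The quantity studied in this subsection can be explored numerically. The following is a minimal simulation sketch on the 2-torus with $\Delta(x) = -\log d(x, x_0)$. The two generator matrices, the uniform choice between them and the name `closest_return_maximum` are illustrative assumptions that do not come from the paper, and it is not checked here whether this choice of $\mu$ satisfies the hypotheses of Theorem 1.1. Floating-point iteration of hyperbolic maps only follows a pseudo-orbit, so the output is indicative rather than exact.

```python
import numpy as np

# Hypothetical generators in SL(2, Z), chosen only for illustration.
GENERATORS = [np.array([[2, 1], [1, 1]]), np.array([[1, 1], [1, 2]])]


def closest_return_maximum(n, x0, generators, rng):
    """Sample one walk X_0, ..., X_{n-1} and return max_i -log d(X_i, x0)."""
    x = rng.random(2)  # starting point distributed according to Lebesgue measure
    best = -np.inf
    for _ in range(n):
        diff = np.abs(x - x0)
        diff = np.minimum(diff, 1.0 - diff)  # shortest representative on the torus
        best = max(best, -np.log(np.linalg.norm(diff)))
        g = generators[rng.integers(len(generators))]  # g_j drawn uniformly
        x = (g @ x) % 1.0  # one step of the walk: x -> g x mod 1
    return best


if __name__ == "__main__":
    x0 = np.array([0.3, 0.7])
    for n in (10**2, 10**3, 10**4):
        m_n = closest_return_maximum(n, x0, GENERATORS, np.random.default_rng(0))
        print(n, m_n, m_n / np.log(n))
```

With a fixed seed the returned value is non-decreasing in `n`, as expected of a running maximum.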
The stationary measures of random walks on the torus were recently studied by Bourgain, Furman, Lindenstrauss and Mozes [9].
The limiting distribution implies a logarithm law. That is, for $P$-a.e. random walk and every $x_0 \in X$ we have
$$\limsup_{n\to\infty} \frac{\Delta(X_n)}{\log n} = \frac{1}{d}.$$
Actually we will see later that we only need a sufficiently good upper bound on the limiting distribution of $M_n$ to derive the logarithm law.
1.2. Shortest vectors on the space of unimodular lattices. Let $X = \mathcal{L}_d$ denote the space of $d$-dimensional unimodular lattices and let $m$ denote the normalized Haar measure on $\mathcal{L}_d$. Recall that $\mathcal{L}_d$ can be identified with $SL(d, \mathbb{R})/SL(d, \mathbb{Z})$ and thus can be thought of as a homogeneous space. Let $G = SL(d, \mathbb{R})$ and fix a probability measure $\mu$ on the group.
Assume that $G_\mu$ is non-amenable. Set
$$\Delta(\Lambda) = \max_{v \in \Lambda \setminus \{0\}} \log \frac{1}{\|v\|}. \tag{1.1}$$
We see that this maximum will always be attained for the shortest vector in the lattice $\Lambda$. The function $\Delta$ plays a crucial role in connections between flows on the space of lattices in $\mathbb{R}^d$ and Diophantine approximation.
where $V_d$ is the volume of the unit ball in $\mathbb{R}^d$. There exist constants $w(a) \in \mathbb{R}$ such that $w(a) \to w$ as $a \to \infty$ and
As the reader will notice, we are not able to prove an exact limiting distribution. Instead we get an upper and a lower bound differing only by a constant multiple which goes to zero as the random walk becomes infinitely sparse. The difference between this case and the random walk on the torus is that one of the independence-like conditions from EVT is not fully satisfied in this setup. It is natural to ask what additional assumptions would suffice to prove an exact limit. This question is answered by the following theorem.
where w is the constant from Theorem 1.3.
Again we obtain a logarithm law.
Corollary 1.5. For $P$-almost every random walk and every $x_0 \in X$ we have
$$\limsup_{n\to\infty} \frac{\Delta(X_n)}{\log n} = \frac{1}{d}.$$

1.3. Maximal excursions on homogeneous spaces. Let $X = G/\Gamma$ where $G$ is a simple, non-compact Lie group with finite center and $\Gamma$ a non-uniform lattice in $G$. Let $m$ denote the normalized Haar measure on $X$ and fix also a probability measure $\mu$ on $G$. Assume that $G_\mu$ is non-amenable. We are interested in the maximal distance a random walk gets away from some arbitrary fixed point $x_0 \in X$. Therefore, define
$$\Delta(x) = d(x, x_0),$$
where $d$ is a Riemannian metric on $X$ chosen by fixing a right invariant Riemannian metric on $G$ which is bi-invariant with respect to a maximal compact subgroup of $G$. Let $M_{n,a}$ be defined as in (1.2).
Theorem 1.6. There exist constants $k > 0$, $w > 0$ and $w(a) \in \mathbb{R}$ such that for sufficiently large $a$ we have $w(a) > 0$ and
where $u_n = r + \frac{1}{k}\log n$.
Remark 1.7. The constant $k$ is explicit and has been computed in [11] (Lemma 5.6).
Again we do not obtain an exact limit and again this relates to the inability to verify one of the independence-like conditions from EVT. In this setting we also do not prove an analogue of Theorem 1.4. The reason is that we need to know exact asymptotics for the tail distribution function of ∆, a property which we call k-SDL (see definition 3.2). While this was proven in [11] for the shortest vectors on L d , only a weaker property called k-DL is known for the Riemannian distance on homogeneous spaces.
As in the previous cases a logarithm law follows from Theorem 1.6.
Corollary 1.8. For $P$-almost every random walk and for all $x_0 \in X$ we have
$$\limsup_{n\to\infty} \frac{\Delta(X_n)}{\log n} = \frac{1}{k}.$$
Here k > 0 is again the constant from Remark 1.7.
This is a random walk analogue of the logarithm law Kleinbock and Margulis proved for geodesics. A natural question to ask is whether one could determine the extreme value distribution for the geodesic flow, since it would be a generalization of the logarithm law mentioned. One result in this direction is by Pollicott [21]. He determines the exact limiting distribution for the geodesic flow on $SL(2, \mathbb{R})/SL(2, \mathbb{Z})$. However, the proof uses connections between geodesics on the upper half plane and continued fractions, a connection that only exists for $d = 2$.
1.4. Structure of the paper. We begin by giving a short introduction to extreme value theory in Section 2. This introduction is short and in no way a complete overview of the field. However, for the reader unfamiliar with extreme value theory, the section should be sufficient to understand this paper without having to look elsewhere. In Section 3 we formally define the random walk and introduce the main tools used in the paper. In this section we also show how the averaging operator and its spectral gap property is used to prove quasi-independence for the random walk. We prove various results for the limiting distribution of the random walk under general assumptions. In Section 4 we finalize the proofs of our main theorems using the results from the previous section and known results. For the case of the torus an additional argument is required which we give in this section as well.

General extreme value theory
EVT deals with determining the distributional properties of the maximum or minimum of a sequence of random variables f n as n becomes large. This task is fairly simple if one assumes that the random variables are mutually independent. However, in many interesting cases we have some degree of dependence among the random variables. How much we can prove in the dependent case is related to how strong the dependency among the random variables is.
In the following we elaborate on the basics of EVT for stationary sequences of identically distributed random variables. For a reference on general extreme value theory, see [20].
Notice that $F_{f_0,\dots,f_{n-1}}(r) = P(M_n \le r)$. Also notice that since the $f_i$ are identically distributed we have $F_{f_i}(r) = F_{f_j}(r)$ for all $i, j \in \mathbb{N}$. We denote this common distribution simply by $F$. We are concerned with the limiting distribution of $M_n$ under linear scalings $a_n^{-1}(M_n - b_n)$, where $a_n > 0$ and $b_n$ are sequences of real numbers. By this we mean the limit
$$\lim_{n\to\infty} P\left(\frac{M_n - b_n}{a_n} \le r\right),$$
where $r \in \mathbb{R}$. The sequences $a_n$ and $b_n$, known as scaling sequences, are introduced in order to avoid cases of degenerate limiting distributions, a notion we explain in the following. To understand why degenerate cases occur, look for example at any i.i.d. stochastic process.
In this case we easily see that
$$P(M_n \le r) = F(r)^n \to \begin{cases} 0 & \text{if } F(r) < 1, \\ 1 & \text{if } F(r) = 1. \end{cases}$$
We call this a degenerate limiting distribution and we see that such a limit provides us with little information about $M_n$. Later in this section we discuss how to determine $a_n$ and $b_n$, but for now assume these exist such that
$$P\left(\frac{M_n - b_n}{a_n} \le r\right) = P(M_n \le a_n r + b_n) \to G(r),$$
where $G : \mathbb{R} \to [0, 1]$ is a non-degenerate distribution function. To simplify notation set $u_n := a_n r + b_n$. As mentioned, the i.i.d. case is the simplest, and in this case the limiting distribution of $M_n$ is known. When dealing with the dependent case, we are interested in stationary sequences that only exhibit little dependency. In other words, these are sequences that in some sense are close to being independent. This notion is formalized through two independence type conditions denoted $D(u_n)$ and $D'(u_n)$.
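As a concrete instance of the role of the scaling sequences, consider i.i.d. standard exponential variables, for which $F(u) = 1 - e^{-u}$. This example is not taken from the paper; it is a standard textbook case, and the function names below are hypothetical. Without scaling, $F(r)^n$ collapses to 0, while with $a_n = 1$ and $b_n = \log n$ it approaches the Gumbel distribution function $e^{-e^{-r}}$.

```python
import math


def unscaled_max_cdf(n, r):
    """P(M_n <= r) = F(r)^n for n i.i.d. Exp(1) variables."""
    if r <= 0:
        return 0.0
    return (1.0 - math.exp(-r)) ** n


def scaled_max_cdf(n, r):
    """P(M_n <= r + log n) = F(r + log n)^n for n i.i.d. Exp(1) variables."""
    u_n = r + math.log(n)
    if u_n <= 0:
        return 0.0
    return (1.0 - math.exp(-u_n)) ** n


def gumbel_cdf(r):
    """Limit distribution function exp(-exp(-r))."""
    return math.exp(-math.exp(-r))


if __name__ == "__main__":
    r = 0.5
    for n in (10, 10**3, 10**6):
        print(n, unscaled_max_cdf(n, r), scaled_max_cdf(n, r), gumbel_cdf(r))
```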
Condition $D(u_n)$. Condition $D(u_n)$ will be said to hold for $f_i$ and $u_n$ if for any integers $0 \le i_1 < \cdots < i_p < j_1 < \cdots < j_{p'} < n$ for which $j_1 - i_p \ge l$, we have
$$\left| F_{f_{i_1},\dots,f_{i_p},f_{j_1},\dots,f_{j_{p'}}}(u_n) - F_{f_{i_1},\dots,f_{i_p}}(u_n)\, F_{f_{j_1},\dots,f_{j_{p'}}}(u_n) \right| \le \alpha(n, l),$$
where there exists a sequence $l_n$ such that $\alpha(n, l_n) \to 0$ and $l_n/n \to 0$ as $n \to \infty$.
Condition $D'(u_n)$. Condition $D'(u_n)$ will be said to hold for $f_i$ and $u_n$ if
$$\limsup_{n\to\infty}\, n \sum_{j=1}^{[n/q]} P(f_0 > u_n, f_j > u_n) \to 0 \quad \text{as } q \to \infty.$$
It is a standard result from EVT that if a stationary sequence $f_i$ satisfies these two conditions, then the limiting distribution of $M_n$ is the same as if $f_i$ were an i.i.d. process. This is the content of the following theorem.
Theorem 2.1. Let $u_n = a_n r + b_n$ be a scaling sequence such that $D(u_n)$ and $D'(u_n)$ are satisfied for the stationary sequence $f_n$. If $\tau = \tau(r)$ is a real function such that $nP(f_0 > u_n) \to \tau$, then
$$\lim_{n\to\infty} P(M_n \le u_n) = e^{-\tau}. \tag{2.1}$$
For some cases of dependent stationary sequences, either or both of Conditions $D(u_n)$ and $D'(u_n)$ are not satisfied. However, it is possible to weaken these conditions and still salvage some information about the limiting distribution. For the purpose of this paper we introduce the following weakened version of Condition $D'(u_n)$.
For the stationary sequence $f_i$, let
Condition $D'_{g(r)}(u_n)$ will be said to hold for $f_i$ and $u_n$ if
$\limsup$
where $g : \mathbb{R} \to \mathbb{R}$ only depends on $r \in \mathbb{R}$.
Under these weakened assumptions we can prove the following theorem.
Theorem 2.2. Let u n = a n r + b n be a scaling sequence s.t. D(u n ) and D ′ g(r) (u n ) are satisfied for the stationary sequence f i . If τ 1 = τ 1 (r) and τ 2 = τ 2 (r) denote real functions such that The proof of Theorem 2.2 is essentially similar to the proof of Theorem 2.1. Notice that we also made a weakening of the assumption that nP(f 0 > u n ) → τ . This is to accommodate cases where the limit cannot be determined or does not exist.
Until now we have assumed the existence of scaling sequences a n and b n such that the limit of P(M n ≤ a n r + b n ) is non-degenerate. However, such scaling sequences do not necessarily exist. In the case of Theorem 2.1, the assumption that nP(f 0 > u n ) → τ provides the most straightforward way to determine if suitable a n and b n exist and, if this is the case, what they are. Namely, we see that if the limit function τ is either zero or infinity, then the limit in (2.1) becomes a degenerate distribution. Thus in order to obtain a non-degenerate limit, we must choose a n and b n such that the limit nP(f 0 > u n ) → τ is non-trivial. In specific cases writing out the expression for nP(f 0 > u n ) often provides an easy way to see how a n and b n must be chosen in order for the limit to exist and be non-trivial.
Similarly for Theorem 2.2, if τ 2 = ∞ then we get a trivial lower bound on the lim inf of P(M n ≤ u n ). So again, by looking at the expression for nP(f 0 > u n ) we can often see how a n and b n must be chosen for the upper bound on the lim sup to be less than infinity.

EVT for random walks in a general setting
In this section we define random walks on a general probability space with a group action. In this general setting we show how the averaging operator can be used to prove extreme value distributions and logarithm laws for random walks. First we introduce the setup and define notation.
3.1. Notation and setup. Let $(X, m)$ denote a probability space and $G$ a group acting measurably on $X$ preserving $m$. Fix also some probability measure $\mu$ on $G$. The product space $G^{\times\mathbb{N}}$ naturally inherits the product measure $\mu^{\otimes\mathbb{N}}$, which is also a probability measure.
We define the probability space $(Y, P)$ by
$$Y = G^{\times\mathbb{N}} \times X, \qquad P = \mu^{\otimes\mathbb{N}} \otimes m.$$
We denote elements in $G^{\times\mathbb{N}}$ by $\bar g$ and write these as $\bar g = (g_1, \dots, g_i, \dots)$. By a random walk on $X$ generated by $G$ we mean a sequence of the form $X_i = g_i \cdots g_1 x$ where $x \in X$ has distribution $m$ and the $g_j \in G$ have distribution $\mu$. Define the map $L_i : G^{\times\mathbb{N}} \to G$ by $L_i(\bar g) = g_i \cdots g_1$. Then for each $i$, $L_i(\bar g)x$ represents the $i$'th position of the random walk along the path $\bar g$ starting at $x$. We use the convention that $L_0(\bar g) = e$, i.e. the neutral element in $G$. We define the sequence of random variables $X_i : Y \to X$ by
$$X_i(\bar g, x) = L_i(\bar g)x.$$
We see that $Y$ can be thought of as the space of all possible random walks on $X$. Let $\Delta : X \to \mathbb{R}$. We define the sequence of real random variables $\xi_i : Y \to \mathbb{R}$ by
$$\xi_i(\bar g, x) = \Delta(X_i(\bar g, x)) = \Delta(L_i(\bar g)x)$$
and define a new sequence of random variables $M_n : Y \to \mathbb{R}$ by
$$M_n = \max_{0 \le i < n} \xi_i.$$
It follows from $G$-invariance of $m$ that $\xi_i$ is a stationary sequence with respect to $P$. Stationarity in particular implies that the random variables are identically distributed and we let $F(r)$ denote the common distribution function of the $\xi_i$.
We denote by
The natural probability measure on G i is the convolution measure defined as the push-
for any function f : G i → ℝ.
3.1.1. Averaging operator. As previously mentioned, the so-called averaging operator plays a very important role in this work. Denote by $A : L^2(X, m) \to L^2(X, m)$ the averaging operator with respect to $G$ given by
$$Af(x) = \int_G f(gx)\, d\mu(g),$$
where $f \in L^2(X, m)$. We get the $n$'th iterate of $A$ by straightforward calculation, this is
$$A^n f(x) = \int_G \cdots \int_G f(g_n \cdots g_1 x)\, d\mu(g_1) \cdots d\mu(g_n).$$
Since $m$ is $G$-invariant we also get
$$\int_X A^n f\, dm = \int_X f\, dm. \tag{3.1}$$
Notice that $A$ is linear.
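To make the averaging operator concrete, the sketch below replaces $(X, m)$ by a finite stand-in: the grid $(\mathbb{Z}/N)^2$ with uniform measure, acted on by two integer matrices of determinant 1. This finite model, the matrices, the weights and the function names are assumptions made purely for illustration and do not appear in the paper. On this model $A$ is a weighted average of permutation operators, so it preserves the mean of $f$ and does not increase the $L^2$ norm of $f$ minus its mean.

```python
import numpy as np

N = 31  # grid size of the finite model (hypothetical choice)
GENS = [np.array([[2, 1], [1, 1]]), np.array([[1, 1], [1, 2]])]
WEIGHTS = [0.5, 0.5]  # the measure mu on the two generators

# Column i*N + j of PTS is the grid point (i, j).
PTS = np.stack(np.meshgrid(np.arange(N), np.arange(N), indexing="ij")).reshape(2, -1)


def flat_index(points):
    """Map grid points (as a 2 x M integer array) to their flat indices."""
    return points[0] * N + points[1]


# PERMS[k][x] is the flat index of g_k applied to the point with flat index x.
PERMS = [flat_index((g @ PTS) % N) for g in GENS]


def averaging_operator(f):
    """(A f)(x) = sum_k mu(g_k) f(g_k x) on the finite model."""
    return sum(w * f[p] for w, p in zip(WEIGHTS, PERMS))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f = rng.random(N * N)
    h = f.copy()
    for n in range(1, 6):
        h = averaging_operator(h)
        print(n, h.mean(), np.linalg.norm(h - f.mean()))
```

The printed norms give a rough feel for how quickly $A^n f$ approaches the mean of $f$ in this toy model; they say nothing about the spectral gap in the settings of the paper.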
Definition 3.1. We say that the averaging operator has spectral gap in $L^2(X, m)$ if there exist constants $\lambda \in (0, 1)$ and $c_0 > 0$ such that for all $f \in L^2(X, m)$ and all $n \in \mathbb{N}$
$$\left\| A^n f - \int_X f\, dm \right\|_2 \le c_0 \lambda^n \|f\|_2.$$
We are going to introduce two types of functions $\Delta$ that we are interested in. For this we need the tail distribution function of $\Delta$. We define this as
$$\Phi_\Delta(r) = m(\{x \in X : \Delta(x) > r\}).$$
Notice that $\Phi_\Delta(r) = P(\xi_0 > r) = 1 - F(r)$.
Definition 3.2. For $k > 0$, we say that $\Delta$ is $k$-DL ("Distance-Like") if it is continuous and satisfies
$$v_1 e^{-kr} \le \Phi_\Delta(r) \le v_2 e^{-kr}$$
for constants $v_1, v_2 > 0$ and all sufficiently large $r$. For $k > 0$, we say that $\Delta$ is $k$-SDL ("Strong-Distance-Like") if it is continuous and satisfies
$$\lim_{r\to\infty} e^{kr}\, \Phi_\Delta(r) = v_1$$
for a constant $v_1 > 0$. The notion of distance-like functions was introduced in [11].
Throughout the paper we will make use of big O notation as well as Vinogradov symbols when appropriate. So for a set $S$ and functions $f, g$ on $S$ we write $f(s) = O(g(s))$ if there exists a constant $c$ such that $|f(s)| \le c\,|g(s)|$ for all $s \in S$. We sometimes write $f(s) \ll g(s)$ meaning the same as $f(s) = O(g(s))$ and we write $f(s) \asymp g(s)$ if $f(s) \ll g(s)$ and $g(s) \ll f(s)$.

3.2. Bounds on the limiting distribution of $M_n$.
Theorem 3.3. Assume that $\Delta$ is $k$-DL for some $k > 0$ and that $A$ has spectral gap on $L^2(X, m)$. Set $u_n = r + \frac{1}{k}\log n$.
where $v_1$, $v_2$ and $c_0$, $\lambda$ are the constants from Definitions 3.2 and 3.1 respectively.
Remark 3.4. We see that for λ close to 1, we get θ λ > 0 rendering the upper bound on the limiting distribution trivial. However, for small values of λ we get θ λ < 0, hence a non-trivial upper bound.
Naturally, the strategy of the proof will be to verify the assumptions of Theorem 2.2. We begin by determining the correct scaling sequences $a_n$ and $b_n$. Assume that $\Delta$ is a $k$-DL function. Since $\Phi_\Delta(u_n) = P(\xi_0 > u_n)$, the upper bound in (2.2) will be non-trivial if we can find sequences $a_n$ and $b_n$ such that the limit of $n e^{-k(a_n r + b_n)}$ exists and is non-trivial. By writing $n e^{-k(a_n r + b_n)} = n e^{-k a_n r} e^{-k b_n}$ it is easy to see that for $a_n = 1$ and $b_n = \frac{1}{k}\log n$ we get
$$n e^{-k(r + \frac{1}{k}\log n)} = e^{-kr}.$$
Obviously for this choice of scaling sequences we also get a non-trivial lower bound. We formulate this conclusion as a lemma
where $v_1, v_2 > 0$ are the constants from Definition 3.2.
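The choice of scaling can be spelled out in one line. The second display below assumes that the $k$-DL property takes the form of two-sided exponential bounds on $\Phi_\Delta$ with the constants $v_1, v_2$ named in the text; that form is an assumption about Definition 3.2, not a quotation of it.

```latex
\begin{align*}
n\,e^{-k u_n}
  &= n\,\exp\!\Big(-k\big(r + \tfrac{1}{k}\log n\big)\Big)
   = n\,e^{-kr}\,e^{-\log n}
   = e^{-kr},\\[4pt]
v_1 e^{-kr} &\le n\,\Phi_\Delta(u_n) \le v_2 e^{-kr}
  \qquad\text{whenever}\qquad
  v_1 e^{-k u_n} \le \Phi_\Delta(u_n) \le v_2 e^{-k u_n}.
\end{align*}
```

Any other choice of $b_n$ of the form $c \log n$ with $c \neq 1/k$ would make $n e^{-k u_n}$ tend to $0$ or to $\infty$, which is the degenerate situation the scaling is meant to avoid.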
Remark 3.6. It follows immediately that if ∆ is assumed to be k-SDL, then the lemma holds with the same choice of u n .
The next lemma verifies Condition D ′ g(r) (u n ) under the assumptions of Theorem 3.3.
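Lemma 3.7. Assume that $\Delta$ is $k$-DL for some $k > 0$ and suppose $A$ has spectral gap in $L^2(X, m)$. Then Condition $D'_{g(r)}(u_n)$ holds for $\xi_i$ and $u_n = r + \frac{1}{k}\log n$ with
$$g(r) = \frac{\lambda}{1-\lambda}\, c_0 v_2 e^{-kr},$$
where $v_2$ and $c_0$, $\lambda$ are the constants from Definitions 3.2 and 3.1 respectively.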
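Proof. We can rewrite the joint probability of $\xi_0$ and $\xi_j$ in terms of integrals of characteristic functions. Set $W := (u_n, \infty)$, $V_0 = \{x \in X : \Delta(x) \in W\}$ and $\psi := \mathbb{1}_{V_0}$. Then we get
$$P(\xi_0 > u_n, \xi_j > u_n) = \int_X \psi \cdot A^j \psi \, dm.$$
Recall the Cauchy-Schwarz inequality stating that $\|fg\|_1 \le \|f\|_2 \|g\|_2$ for $f, g \in L^2(X, m)$. We proceed by estimating the difference $|P(\xi_0 > u_n, \xi_j > u_n) - P(\xi_0 > u_n)P(\xi_j > u_n)|$. Written in terms of integrals we have
$$\begin{aligned}
\left| P(\xi_0 > u_n, \xi_j > u_n) - P(\xi_0 > u_n)P(\xi_j > u_n) \right|
&= \left| \int_X \psi \cdot A^j \psi \, dm - \Big( \int_X \psi \, dm \Big)^2 \right| \\
&= \left| \int_X \psi \Big( A^j \psi - \int_X \psi \, dm \Big) dm \right| \\
&\le \|\psi\|_2 \, \Big\| A^j \psi - \int_X \psi \, dm \Big\|_2 \\
&\le c_0 \lambda^j \|\psi\|_2^2.
\end{aligned} \tag{3.5}$$
The Cauchy-Schwarz inequality was used to get the second last inequality while the spectral gap property of $A$ was applied to get the final estimate. It follows that
$$P(\xi_0 > u_n, \xi_j > u_n) \le P(\xi_0 > u_n)P(\xi_j > u_n) + c_0 \lambda^j \|\psi\|_2^2.$$
Since $\psi$ is a characteristic function we know that
$$\|\psi\|_2^2 = \int_X \psi \, dm = P(\xi_0 > u_n).$$
We also notice that $P(\xi_j > u_n) = P(\xi_0 > u_n)$ by stationarity. Using that $\Delta$ is $k$-DL and using that $u_n = r + \frac{1}{k}\log n$ we see that for sufficiently large $n$
$$P(\xi_0 > u_n) \le v_2 e^{-k u_n} = \frac{v_2 e^{-kr}}{n}.$$
We do the summation from Condition $D'_{g(r)}(u_n)$ to get
$$n \sum_{j=1}^{[n/q]} P(\xi_0 > u_n, \xi_j > u_n) \le \frac{v_2^2 e^{-2kr}}{q} + c_0 v_2 e^{-kr} \sum_{j=1}^{[n/q]} \lambda^j.$$
Recall that since $\lambda \in (0, 1)$ we have $\sum_{j=1}^{\infty} \lambda^j = \frac{\lambda}{1-\lambda}$, so when we take the $\limsup_{n\to\infty}$ we get
$$\limsup_{n\to\infty}\, n \sum_{j=1}^{[n/q]} P(\xi_0 > u_n, \xi_j > u_n) \le \frac{v_2^2 e^{-2kr}}{q} + \frac{\lambda}{1-\lambda}\, c_0 v_2 e^{-kr}.$$
Finally taking the $\limsup_{q\to\infty}$ gives
$$\limsup_{q\to\infty} \limsup_{n\to\infty}\, n \sum_{j=1}^{[n/q]} P(\xi_0 > u_n, \xi_j > u_n) \le \frac{\lambda}{1-\lambda}\, c_0 v_2 e^{-kr}.$$
So Condition $D'_{g(r)}(u_n)$ holds with $g(r) = \frac{\lambda}{1-\lambda}\, c_0 v_2 e^{-kr}$.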
Remark 3.8. Notice that g(r) vanishes as the spectral gap λ goes to zero.
3.2.1. Verifying Condition $D(u_n)$. To verify Condition $D(u_n)$ we need to rewrite the joint distribution function of the $\xi_i$ using the averaging operator. The idea is the same as the one we used to rewrite the joint distribution in the proof of Lemma 3.7. Now we essentially do the same calculation in higher generality. Throughout the following computation let $\bar n = (n_1, \dots, n_t)$ denote a fixed $t$-tuple of integers where $n_1 < \cdots < n_t$. Let $W = (-\infty, u_n]$ and again use the notation $V_0 = \{x \in X : \Delta(x) \in W\}$ and $V_i^{\bar g} = \{x \in X : \xi_i(\bar g, x) \in W\}$ introduced in the proof of Lemma 3.7. Furthermore, set
$$\Lambda_{\bar n} := \{y \in Y : \xi_{n_1}(y) \in W, \dots, \xi_{n_t}(y) \in W\}.$$
Using this notation we rewrite the joint distribution function of ξ n 1 , . . . , ξ nt in terms of integrals of characteristic functions.
where again $\psi := \mathbb{1}_{V_0}$. It is practical to introduce the notation $g_{[i,j]} = g_i \cdots g_j$ for $i > j$.
We now look at the integral with respect to µ ⊗AE in (3.6). We can rewrite this integral using the averaging operator in the following way. First we write Now, on the right hand side of (3.7), look only at the integrals with respect to g nt , . . . , g n t−1 +1 . We get Inserting this in (3.7) we get We repeat this step by looking at the integrals in (3.7) with respect to g n t−1 , . . . , g n t−2 +1 . These integrals, rewritten in terms of the averaging operator as done above, become Again we can insert this in (3.7) and repeat the procedure. Doing this t times eventually gives that the integral with respect to µ ⊗AE in (3.6) is By integrating again with respect to m and applying (3.1) we finally get We can simplify notation by defining the following sequence of operators. For the sequencē n = (n 1 , . . . , n t ) and the fixed function ψ : Notice that E i is linear since A is linear. Using this notation and setting ϕ = ψ we get Having rewritten the joint distribution, we proceed by demonstrating how to apply the spectral gap property of the averaging operator. More explicitly, we look at how we can split (3.8) into a product of two integrals at the cost of an error term when A has spectral gap. Letn = (n 1 , . . . , n p , n p+1 , . . . , n t ) and also setq = (n 1 , . . . , n p ) ands = (n p+1 , . . . , n t ). Again assume that n 1 < · · · < n p < n p+1 < · · · < n t . We want to estimate the difference |P(Λn) − P(Λq)P(Λs)| .

Written as integrals this is
Notice that $E^t_{\bar n}(\varphi) = E^p_{\bar q}\big(\psi A^{n_{p+1}-n_p}(E^{t-p}_{\bar s}(\varphi))\big)$.
Here we alternated between using the Hölder inequality to split into products of norms and equation (3.1) to get rid of the averaging operator. The last inequality holds since ψ is a characteristic function on a probability space. We can now continue the calculation in (3.9) by applying the spectral gap property of A: since it is easily seen that σ 2 ≤ 1. All together we have shown that Proof. Set W = (−∞, u n ] such that forn = (n 1 , . . . , n t ) we have P(Λn) = P(ξ n 1 ≤ u n , . . . , ξ nt ≤ u n ).
We rewrite the distribution function using the averaging operator as demonstrated earlier.
We can now conclude the proof of Theorem 3.3. In Lemma 3.5 we determined that the inequalities in (2.2) are non-trivial for the scaling sequence $u_n = r + \frac{1}{k}\log n$ and in Lemma 3.7 we proved that Condition $D'_{g(r)}(u_n)$ is satisfied for $\xi_i$ with $g(r) = \frac{\lambda}{1-\lambda}\, c_0 v_2 e^{-kr}$. In Lemma 3.9 we proved that Condition $D(u_n)$ is satisfied for $\xi_i$ for any choice of $u_n$ and $\Delta$. This means that all assumptions of Theorem 2.2 are satisfied and so Theorem 3.3 follows from Theorem 2.2.
Proof. Fix $a \in \mathbb{N}$ and set $\eta_i = \xi_{ai}$. First notice that $\xi_i$ being stationary implies that $\eta_i$ is stationary. This also means that the common distribution of $\xi_i$ and $\eta_i$ is the same and so nothing is changed in the proof of Lemma 3.5. The appropriate scaling sequence for $\eta_i$ is therefore also $u_n = r + \frac{1}{k}\log n$. In Lemma 3.7 replace $j$ by $aj$ throughout the proof to obtain
$$g(r) = \frac{\lambda^a}{1-\lambda^a}\, c_0 v_2 e^{-kr}.$$
In Lemma 3.9, equation (3.11) is bounded above by $\lambda^{n_{p+1}-n_p}$. The equivalent equation for $\eta_i$ is bounded by $\lambda^{a(n_{p+1}-n_p)}$ and so Condition $D(u_n)$ holds as well. Again all assumptions of Theorem 2.2 are satisfied and so the corollary follows from Theorem 2.2.
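The effect of passing to the sparse sequence $\xi_{ai}$ can be quantified through the constant $\theta_\lambda = \frac{\lambda^a}{1-\lambda^a} c_0 v_2 - v_1$, which appears later in the proof of the logarithm law and must be negative for the upper bound to be non-trivial. The sketch below finds the smallest such $a$. The numerical values of $\lambda$, $c_0$, $v_1$, $v_2$ are hypothetical, as are the function names.

```python
def theta(lam, a, c0, v1, v2):
    """theta_lambda for the sparse walk xi_{a i}: lam^a / (1 - lam^a) * c0 * v2 - v1."""
    decay = lam ** a
    return decay / (1.0 - decay) * c0 * v2 - v1


def smallest_sparsity(lam, c0, v1, v2, a_max=10_000):
    """Smallest a >= 1 with theta < 0, searched up to a_max."""
    for a in range(1, a_max + 1):
        if theta(lam, a, c0, v1, v2) < 0:
            return a
    raise ValueError("no a <= a_max makes theta negative")


if __name__ == "__main__":
    # Hypothetical constants, for illustration only.
    print(smallest_sparsity(lam=0.9, c0=1.0, v1=1.0, v2=2.0))
```

Since $\lambda^a \to 0$ as $a \to \infty$, such an $a$ always exists when $v_1 > 0$; the search bound `a_max` is only a safeguard.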
3.3. Proving Theorem 1.4 in the general setting.
Theorem 3.11. Assume that $\Delta$ is $k$-SDL for some $k > 0$ and that $A$ has spectral gap on $L^2(X, m)$. Let $\{m_j\}$ be a subsequence in $\mathbb{N}$ such that $\{m_{j+1} - m_j\}$ is strictly increasing. Also, let $\alpha_n < \beta_n$ denote sequences in $\mathbb{N}$ such that $\alpha_n \to \infty$ and $N_n := \beta_n - \alpha_n \to \infty$. Then for $u_n = r + \frac{1}{k}\log N_n$ we have
where $v_1 > 0$ is the constant from Definition 3.2.
We first prove a lemma.
Lemma 3.12. Suppose $\Delta$ is $k$-SDL for some $k > 0$ and let $u_n = r + \frac{1}{k}\log n$. Then
$$\lim_{n\to\infty} P(\xi_0 \le u_n)^n = e^{-v_1 e^{-kr}}.$$

Logarithm law for random walks.
Corollary 3.13. Assume that $\Delta$ is $k$-DL for some $k > 0$ and that $A$ has spectral gap on $L^2(X, m)$. Then for $P$-a.e. $y \in Y$ we have
$$\limsup_{n\to\infty} \frac{\xi_n(y)}{\log n} = \frac{1}{k}.$$
Proof. We prove $\limsup_{n\to\infty} \frac{\xi_n(y)}{\log n} \le \frac{1}{k}$ and $\limsup_{n\to\infty} \frac{\xi_n(y)}{\log n} \ge \frac{1}{k}$ for $P$-a.e. $y \in Y$. The proof of the upper bound is an application of the classical Borel-Cantelli Lemma. For completeness we give the proof. Recall the Borel-Cantelli Lemma stating that for any sequence $A_n \subset Y$ we have that
$$\sum_{n=1}^{\infty} P(A_n) < \infty \;\Rightarrow\; P(\{y \in Y : y \in A_n \text{ for infinitely many } n\}) = 0.$$
Let $\varepsilon > 0$ be given. We look at the sequence of sets
$$A_n = \left\{ y \in Y : \xi_n(y) \ge \left(\tfrac{1}{k} + \varepsilon\right)\log n \right\}.$$
Since $\xi_n$ is stationary we have that
$$P(A_n) = P\left(\xi_0 \ge \left(\tfrac{1}{k} + \varepsilon\right)\log n\right).$$
Since $\Delta$ is $k$-DL we get
$$\sum_{n=1}^{\infty} P(A_n) \ll \sum_{n=1}^{\infty} \frac{1}{n^{1+k\varepsilon}} < \infty,$$
implying that for $P$-a.e. $y \in Y$, the inequality $\xi_n(y) \ge \left(\frac{1}{k} + \varepsilon\right)\log n$ only holds true for finitely many $n$. So by dividing by $\log n$ and taking the $\limsup_{n\to\infty}$ we get
$$\limsup_{n\to\infty} \frac{\xi_n(y)}{\log n} \le \frac{1}{k} + \varepsilon.$$
Since this holds true for every $\varepsilon > 0$ we have proved the desired inequality for $P$-a.e. $y \in Y$.
We now prove the lower bound. Assume for contradiction that the lower bound does not hold, i.e. assume that there exists $\varepsilon > 0$ such that For each $y \in B$ we can find sufficiently large $n_0 \in \mathbb{N}$ such that This implies that Since $P(B) > \varepsilon$ there must be some $n_1 \in \mathbb{N}$ for which P sup for some $\delta > 0$. For any $n_2 \ge n_1$ and any $a \in \mathbb{N}$ we have δ < P max where $\theta_\lambda = \frac{\lambda^a}{1-\lambda^a} c_0 v_2 - v_1$. For simplicity we make a change of variables. Set $r = \frac{1}{k}\log s$ where $s \in (0, \infty)$. Then lim sup Pick $a \in \mathbb{N}$ sufficiently large to ensure that $\theta_\lambda < 0$. Let $\delta > 0$ be as in (3.16). Then for $s > 0$ sufficiently small we get that $e^{\theta_\lambda s^{-1}} < \frac{\delta}{2}$. Also by picking $n \in \mathbb{N}$ sufficiently large we Since $\xi_{aj}$ is stationary, we see that P max Since (3.17) holds for any $n_2 \ge n_1$ we can set $n_2 := a(n_1 + n)$. Inserting this in (3.17) gives P max Set $n_3 := n_1 + n$. It is a simple calculation to show that if we choose $n$ large enough we get This inequality implies the following sequence of inequalities, which is a contradiction.
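The summability used in the Borel-Cantelli step of the upper bound can be checked numerically. The sketch assumes that the tail bound takes the form $P(A_n) \ll n^{-(1+k\varepsilon)}$, as the surviving fragment $\frac{1}{n^{1+k\varepsilon}}$ suggests; the function names and the sample values of $k$ and $\varepsilon$ are hypothetical. By the integral test the partial sums of $n^{-(1+k\varepsilon)}$ stay below $1 + \frac{1}{k\varepsilon}$.

```python
def partial_sum(k, eps, n_terms):
    """Partial sum of n^{-(1 + k*eps)} for n = 1, ..., n_terms."""
    exponent = 1.0 + k * eps
    return sum(n ** (-exponent) for n in range(1, n_terms + 1))


def integral_test_bound(k, eps):
    """Upper bound 1 + 1/(k*eps) on the full series, from comparison with an integral."""
    return 1.0 + 1.0 / (k * eps)


if __name__ == "__main__":
    k, eps = 2.0, 0.05  # hypothetical values
    for n_terms in (10**2, 10**4, 10**5):
        print(n_terms, partial_sum(k, eps, n_terms), integral_test_bound(k, eps))
```

For $\varepsilon = 0$ the same sum is the harmonic series and diverges, which is why the argument needs the extra factor $n^{-k\varepsilon}$.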

Proofs of main results
At this stage we are almost done with the proofs of the main results concerning maximal excursions and shortest vectors. The only part that remains is to combine the results of the previous section with known results from other papers.
For the closest returns on the torus we still need some additional arguments specific to this setup.
4.0.1. Proofs of main results for shortest vectors on the space of unimodular lattices. In the setup of Subsection 1.2 it was proven by Kleinbock and Margulis [11] (Proposition 7.1) that $\Delta(x)$ as defined in (1.1) is $d$-SDL. In the proof of the same proposition the explicit value of the constant $w$ is derived as well. Furthermore, we know from Shalom [22] (Theorem C) that in the same setup the averaging operator has spectral gap in $L^2$. Notice that the theorem applies to $\mathcal{L}_d$ since we can identify the space with $SL(d, \mathbb{R})/SL(d, \mathbb{Z})$. So Theorem 1.3, Theorem 1.4 and Corollary 1.5 follow from Corollary 3.10, Theorem 3.11 and Corollary 3.13 respectively. Using the $d$-SDL property of $\Delta$ and (3.12) we see that
4.0.2. Proofs of main results for maximal excursions on homogeneous spaces. In the setup of Subsection 1.3 it was also proven by Kleinbock and Margulis [11] (Proposition 5.1) that $\Delta(x) = d(x, x_0)$ is a $k$-DL function for some $k > 0$. The spectral gap property of the averaging operator in $L^2$ in this setup also follows from Shalom [22] (Theorem C). So Theorem 1.6 and Corollary 1.8 follow from Corollary 3.10 and Corollary 3.13 respectively.
4.1. Proofs of main results for closest returns on the torus. We recall the setup of Theorem 1.1. Let $X = \mathbb{T}^d$ equipped with Lebesgue measure $m$ and Euclidean metric $d$. Also, let $G = \mathrm{Aut}(\mathbb{T}^d)$ equipped with a probability measure $\mu$. Assume that there is no $G_\mu$-invariant factor torus $T$ such that the projection of $G_\mu$ on $\mathrm{Aut}(T)$ is amenable. We know from Bekka and Guivarc'h [4] (Theorem 5) that the averaging operator has spectral gap in $L^2(X, m)$.
Let $x_0 \in X$ be a fixed point and set $\Delta(x) = -\log d(x, x_0)$. The random variables $\xi_i$ are then given by
$$\xi_i(\bar g, x) = -\log d(L_i(\bar g)x, x_0).$$
The strategy for proving Theorem 1.1 is to verify the assumptions of Theorem 2.1. Notice that Lemma 3.9 verifies Condition $D(u_n)$ for $\xi_i$ with any choice of $u_n$ and $\Delta$. This means that we are left with the task of determining the scaling sequence $u_n$ such that the limit of $nP(\xi_0 > u_n)$ is non-trivial and, for this $u_n$, verifying Condition $D'(u_n)$.
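The first of these two tasks can be previewed numerically. The sketch assumes that the event $\{\Delta > u\}$ is a Euclidean ball of radius $e^{-u}$ around $x_0$, which on the torus is valid once $e^{-u} < \frac{1}{2}$, so that its Lebesgue measure is $V_d e^{-du}$. The function names are hypothetical. With $u_n = r + \frac{1}{d}\log n$ the product $n \cdot m(\text{ball})$ no longer depends on $n$.

```python
import math


def unit_ball_volume(d):
    """Volume V_d of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def scaled_tail(n, r, d):
    """n * m(ball of radius exp(-u_n)) for u_n = r + (1/d) * log n."""
    u_n = r + math.log(n) / d
    radius = math.exp(-u_n)
    if radius >= 0.5:
        raise ValueError("ball does not fit inside a fundamental domain of the torus")
    return n * unit_ball_volume(d) * radius ** d


if __name__ == "__main__":
    d, r = 2, 0.3
    for n in (10**2, 10**4, 10**6):
        print(n, scaled_tail(n, r, d), unit_ball_volume(d) * math.exp(-d * r))
```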
First we determine $u_n$. Let $B_r(x_0) \subset X$ denote the ball of radius $r$ at $x_0$ and $V_d$ the volume of the unit ball in $\mathbb{R}^d$. Then
$$P(\xi_0 > u_n) = m(\{x \in X : -\log d(x, x_0) > u_n\}) = m(B_{e^{-u_n}}(x_0)).$$
As in the case of Lemma 3.5 we set $u_n = r + \frac{1}{d}\log n$. Since $X$ is locally Euclidean we get that for sufficiently large $n$,
$$nP(\xi_0 > u_n) = n V_d e^{-d u_n} = V_d e^{-dr},$$
and taking limits we get
$$\lim_{n\to\infty} nP(\xi_0 > u_n) = V_d e^{-dr}.$$
Again, we collect this conclusion in a lemma.
Having determined $u_n$, we proceed to verify Condition $D'(u_n)$. This is the step which requires the most work. Fix $\delta \in (0, 1)$. Recall the Hardy-Littlewood maximal operator $M$, which for a function $f$ on $X$ is given by
The Hardy-Littlewood maximal inequality then states that for any $f \in L^1(X)$ we have
The next lemma gives sufficient assumptions for Condition $D'(u_n)$ to hold.
Lemma 4.2. Suppose that for constants α ∈ (0, d) and κ > 0 we have that for all s > 0, Then Condition D ′ (u n ) holds for ξ i and u n (r) = r + 1 d log n for a.e. x 0 ∈ X . Proof. Using that P = µ ⊗AE ⊗ m we can rewrite the estimate as Define the function Ψ s : X → Ê by and apply the Hardy-Littlewood maximal operator to Ψ s to get Set MΨ s (x) := M s (x). Using (4.2) and the Hardy-Littlewood maximal inequality we get that for every β > 0 Let ε > 0. Set γ = 1+2ε κ and notice that γκ − ε = 1 + ε > 1. Let n be an integer and substitute s with n γ and set β = n −ε . Then The classical Borel-Cantelli Lemma then tells us that for a.e. x 0 ∈ X So there exists a number N(x 0 ) such that for all n ≥ N(x 0 ) we have M n γ (x 0 ) ≤ n −ε . That is Choose n so large that 1 n γ ∈ (0, δ) and set R = 1 n γ . Then We want to switch back to the real variable s instead of the integer variable n while preserving the inequality above. Let s ∈ (n, n+ 1). On the right hand side of the inequality we can clearly substitute n with s − 1 and the inequality will still hold. The left hand side written out is We see that by changing n to s inside the integral, the measure of the intersection becomes smaller. However, to ensure that we are not summing over more terms we need to change n to s − 1 in the upper limit of the sum. All together we get We aim to connect the left hand side of (4.3) with the sum in Condition D ′ (u n ). To do this we derive as follows using the triangle inequality for the inclusion: γ . Notice that the last line is exactly the set inside the integral in (4.3) above with s substituted by l. Using this gives In the last line above we replaced l − 1 with l for notational simplicity. We can do this since we are only interested in the behavior as n → ∞. Inserting the expression for l gives Since (l − 1) γα = O(n α d ) and α d < 1, we see that for sufficiently large n, [(l − 1) γα ] ≤ n q for any q ∈ AE. 
This means that the left hand side of (4.4) does not necessarily account for the entire quantity that we need to estimate to verify Condition D′(u_n). To obtain this we need to add the remaining terms of the sum, with indices between [(l − 1)^{γα}] and [n/q], to the left hand side of (4.4). To find an upper bound on this remaining sum we apply the averaging operator exactly as in Lemma 3.7. This gives a bound involving a constant λ ∈ (0, 1), which comes from the spectral gap property of the averaging operator. From the proof of Lemma 4.1 we see that P(ξ_0 > u_n) = (1/n) V_d e^{−dr}. Inserting this and adding the resulting bound to (4.4), we take the lim sup as n → ∞ and finally let q → ∞. We conclude that Condition D′(u_n) has been established.
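The choice γ = (1 + 2ε)/κ in the proof of Lemma 4.2 is what makes the Borel-Cantelli step work. The following sketch spells out the exponent bookkeeping. It assumes that (4.2) combined with the maximal inequality yields a bound of the form C β^{−1} s^{−κ}, where the constant C is hypothetical. The exact form of that bound is not recoverable from the text here, so this is an illustration rather than the authors' displayed estimate.

```latex
% Assumption: m\{x : M_s(x) > \beta\} \le C\,\beta^{-1} s^{-\kappa}
% for some constant C (hypothetical name), all s > 0 and all \beta > 0.
% Substitute s = n^\gamma and \beta = n^{-\varepsilon}, with \gamma\kappa = 1 + 2\varepsilon.
\begin{align*}
m\{x : M_{n^\gamma}(x) > n^{-\varepsilon}\}
  &\le C\, n^{\varepsilon}\, n^{-\gamma\kappa}
   = C\, n^{-(\gamma\kappa - \varepsilon)}
   = C\, n^{-(1+\varepsilon)}, \\
\sum_{n \ge 1} C\, n^{-(1+\varepsilon)} &< \infty .
\end{align*}
```

Summability of the right hand side is exactly the hypothesis of the Borel-Cantelli Lemma, which is why the text insists on γκ − ε > 1.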
In the following set Ω := supp(µ). To complete the proof of Condition D′(u_n) we need to show that the estimate in (4.1) holds for the setup of Theorem 1.1. This is the content of the next lemma.
Lemma 4.3. Assume that there exists T > 1 such that ‖ω‖ ≤ T for all ω ∈ Ω. Assume also that det(ω − I) ≠ 0 for all ω ∈ S_µ. Let α < d. Then there exists κ > 0 such that the estimate (4.5) holds.
Proof. The strategy of the proof is to derive two different upper bounds on the i-th term of the sum using two different methods. One method generates a bound that is good for small values of i, while the other method gives a good bound for large values of i. Using the two in combination gives the upper bound in (4.5).
4.1.1. Method 1. Let ω̄ ∈ Ω^{×ℕ} and for notational simplicity set ω = L_i(ω̄). For s > 0 we look at the set E^ω_s = {x ∈ X : ωx ∈ B_{1/s}(x)}. A point x ∈ X can be written as x = y + ℤ^d for some y ∈ [0, 1]^d, and we multiply by ω. Assume that x ∈ E^ω_s. This gives a relation (4.6), which we rearrange using that det(ω − I) ≠ 0. We see that (ω − I)^{−1}ℤ^d can have at most finitely many points in [0, 1]^d, so the measure of E^ω_s must be bounded from above by a scalar multiple of m((ω − I)^{−1}B_{1/s}). To estimate this measure we first see that m((ω − I)^{−1}B_{1/s}) = m(B_{1/s})/|det(ω − I)|. Since det(ω − I) ≠ 0 and ω has integer entries we see that |det(ω − I)| ≥ 1. Then m((ω − I)^{−1}B_{1/s}) ≤ m(B_{1/s}). To find an upper bound on the number of copies of (ω − I)^{−1}B_{1/s} in [0, 1]^d, first notice that (ω − I)^{−1} = (1/det(ω − I)) A, where A is some integer matrix. So the integer lattice ℤ^d will at most be contracted by the factor det(ω − I) in all d directions, which bounds the number of copies. By assumption ‖ω‖ ≤ T for all ω ∈ Ω. So for ω ∈ Ω^i it follows simply by multiplying matrices that ‖ω‖ ≤ (dT)^i. Set T̄ = dT. By definition of the determinant this bounds |det(ω − I)| in terms of T̄. Multiplying the number of sets by the measure of each set gives an upper bound on m(E^ω_s). Finally, as the upper bound is independent of ω = L_i(ω̄), integrating over Ω^{×ℕ} is trivial.
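The counting argument of Method 1 can be sketched as follows. This is one possible way to fill in the steps and may differ in detail from the authors' displayed formulas. It assumes that ω is an integer matrix with det(ω − I) ≠ 0, that B_{1/s} is a ball of radius 1/s with Euclidean volume V_d s^{−d}, and that [0,1)^d is used as a fundamental domain for the torus. The adjugate matrix adj(ω − I) plays the role of the integer matrix A in the text.

```latex
% Assumptions: \omega \in M_d(\mathbb{Z}), \det(\omega - I) \neq 0,
% m(B_{1/s}) = V_d s^{-d}, fundamental domain [0,1)^d.
\begin{align*}
m\bigl((\omega - I)^{-1} B_{1/s}\bigr)
  &= \frac{m(B_{1/s})}{|\det(\omega - I)|}
   \le m(B_{1/s}) = V_d\, s^{-d}, \\
(\omega - I)^{-1}
  &= \frac{1}{\det(\omega - I)}\,\operatorname{adj}(\omega - I),
  \qquad \operatorname{adj}(\omega - I) \in M_d(\mathbb{Z}), \\
(\omega - I)^{-1}\mathbb{Z}^d
  &\subset \frac{1}{\det(\omega - I)}\,\mathbb{Z}^d .
\end{align*}
% The lattice (1/D)\mathbb{Z}^d has exactly |D|^d points in [0,1)^d, hence:
\[
  m(E^{\omega}_s) \;\le\; |\det(\omega - I)|^{d}\; V_d\, s^{-d}.
\]
```

The bound is useful only while |det(ω − I)| stays small compared with s, which is why this method is reserved for small values of i.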

4.1.2. Method 2.
Let again ω̄ ∈ Ω^{×ℕ} and L_i(ω̄) = ω. Again, for s > 0 we look at the set E^ω_s = {x ∈ X : ωx ∈ B_{1/s}(x)}. The idea is to find a set which contains E^ω_s and whose measure is easier to compute. Think of X as the d-cube [0, 1]^d and partition it into sub-cubes with sides of the form [j_k/s, (j_k + 1)/s), k = 1, …, d, indexed by vectors j̄ = (j_1, …, j_d) in an index set J. Let C_j̄ denote the cube corresponding to the vector j̄. Clearly {x ∈ X : ωx ∈ B_{1/s}(x)} = ⋃_{j̄ ∈ J} {x : x ∈ C_j̄, ωx ∈ B_{1/s}(x)}.
Let x ∈ X with ωx ∈ B_{1/s}(x) and write y = ωx. In particular this means that |x_k − y_k| < 1/s for all k ∈ {1, …, d}. Assume further that x ∈ C_j̄. Then for every k, x_k ∈ [j_k/s, (j_k + 1)/s), so we must have y_k ∈ (j_k/s − 1/s, (j_k + 1)/s + 1/s), implying that ωx ∈ C^+_j̄, the cube C_j̄ enlarged by 1/s in each coordinate direction. So we have {x ∈ X : ωx ∈ B_{1/s}(x)} ⊂ ⋃_{j̄ ∈ J} {x : x ∈ C_j̄, ωx ∈ C^+_j̄}.
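The inclusion above rests on a one-line coordinate estimate, written out here as a sketch. It assumes that the k-th side of the sub-cube is the half-open interval [j_k/s, (j_k + 1)/s), that y = ωx, and that s is large enough for the enlarged cube not to overlap itself on the torus.

```latex
% Assumptions: k-th side of C_{\bar{j}} is [j_k/s, (j_k+1)/s), y = \omega x,
% and s \ge 3 so that the enlarged cube does not wrap around the torus.
\begin{align*}
\frac{j_k}{s} \le x_k < \frac{j_k+1}{s}
\ \text{ and }\ |x_k - y_k| < \frac{1}{s}
&\;\Longrightarrow\;
\frac{j_k - 1}{s} < y_k < \frac{j_k + 2}{s}, \\
m\bigl(C^{+}_{\bar{j}}\bigr) &= \Bigl(\frac{3}{s}\Bigr)^{d}.
\end{align*}
```

So passing from the ball B_{1/s}(x) to the enlarged cube costs only a factor depending on d in the measure, while making the target set independent of x within each sub-cube.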
Taking measures and integrating over Ω^{×ℕ} expresses the bound in terms of the averaging operator A. Performing the calculation analogous to (3.5) gives a bound involving a constant λ ∈ (0, 1). Now, recall that there were s^d sub-cubes in the partition of [0, 1]^d, so instead of summing over all j̄ ∈ J we may multiply by s^d to obtain the final bound of Method 2.
We now combine the two methods by splitting the sum over i at some K ∈ ℕ, using Method 1 for the terms with i ≤ K and Method 2 for the remaining terms. Since T̄ > 1 we can estimate the first sum, and for the second sum we use the bound from Method 2. Choose K = δ log s where δ > 0 is some constant to be determined. The estimate as a whole must be polynomially decreasing in s, so we need all exponents of s to be negative. This is true for α − d by assumption and for δ log λ since λ ∈ (0, 1). Also, by choosing δ > 0 sufficiently small we get that d²(δ log T̄) − d < 0. Pick δ such that this inequality is satisfied and set κ := min(|α − d|, |δ log λ|, |d²(δ log T̄) − d|). We then conclude that (4.5) holds.
We can now conclude the proof of Theorem 1.1. In Lemma 4.1 we proved that the correct scaling sequence is u_n = r + (1/d) log n. Lemmas 4.2 and 4.3 together prove that Condition D′(u_n) is satisfied under the assumptions of Theorem 1.1. Condition D(u_n) was already proven in Lemma 3.9. This means that all assumptions of Theorem 2.1 are satisfied, and so Theorem 1.1 follows.
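The role of the choice K = δ log s in the proof of Lemma 4.3 can be made explicit with the identity a^{δ log s} = s^{δ log a}. The sketch below assumes that the bounds for the two sums contain factors of the form T̄^{d²K} s^{−d} and λ^K, as suggested by the exponents listed in the proof. Here T̄ stands for the constant dT from Method 1. The precise constants in the authors' estimates are not reproduced.

```latex
% Identity used: a^{\delta \log s} = e^{\delta \log s \,\log a} = s^{\delta \log a}
% for a > 0 and s > 0.
% Assumed shape of the two bounds: \bar{T}^{d^2 K} s^{-d} and \lambda^{K}.
\begin{align*}
\lambda^{K} &= \lambda^{\delta \log s} = s^{\delta \log \lambda},
  & \delta \log \lambda &< 0 \ \text{ since } \lambda \in (0,1), \\
\bar{T}^{\,d^2 K}\, s^{-d} &= s^{\,d^2 \delta \log \bar{T} - d},
  & d^2 \delta \log \bar{T} - d &< 0
  \iff \delta < \frac{1}{d \log \bar{T}} .
\end{align*}
```

Any δ in (0, 1/(d log T̄)) therefore makes both exponents negative, which is the smallness condition on δ required in the proof.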
Proof of Corollary 1.2. We want to prove the logarithm law without the assumptions on Ω and S_µ made in Lemma 4.3, hence we cannot apply Theorem 1.1 directly. However, the proof of Lemma 3.7 works for the random walk on the torus as well. In this case the role of the k-DL assumption is played by the fact that P(ξ_0 > u_n) = (1/n) V_d e^{−dr}, which we derived in the proof of Lemma 4.1.
The analogue of Lemma 3.7 for closest returns on the torus implies that the conclusions of Theorem 3.3, Corollary 3.10 and Corollary 3.13 hold for closest returns on the torus. In particular, Corollary 3.13 then implies Corollary 1.2.