Dirichlet’s theorem on diophantine approximation and homogeneous flows

We show that for any $\epsilon<1$ and any $\mathcal{T}$ 'drifting away from walls', Dirichlet's Theorem cannot be $\epsilon$-improved along $\mathcal{T}$ for Lebesgue almost every system of linear forms $Y$ (see the paper for definitions). In the case $m = 1$ we also show that for a large class of measures $\mu$ there is $\epsilon_0>0$ such that for any $\mathcal{T}$ drifting away from walls, any $\epsilon<\epsilon_0$, and $\mu$-almost every $Y$, Dirichlet's Theorem cannot be $\epsilon$-improved along $\mathcal{T}$. These measures include natural measures on sufficiently regular smooth manifolds and fractals. Our results extend those of several authors, beginning with the work of Davenport and Schmidt in the late 1960s. The proofs rely on a translation of the problem into a dynamical one regarding the action of a diagonal semigroup on the space $\text{SL}_{m+n}(\mathbb{R})/\text{SL}_{m+n}(\mathbb{Z})$.


INTRODUCTION
Let $m, n$ be positive integers, and denote by $M_{m,n}$ the space of $m \times n$ matrices with real entries. Dirichlet's Theorem (hereafter abbreviated by 'DT') on simultaneous diophantine approximation states that for any $Y \in M_{m,n}$ (viewed as a system of $m$ linear forms in $n$ variables) and for any $t > 0$ there exist $q = (q_1, \dots, q_n) \in \mathbb{Z}^n \smallsetminus \{0\}$ and $p = (p_1, \dots, p_m) \in \mathbb{Z}^m$ satisfying the following system of inequalities: (1.1) $\|Yq - p\| < e^{-t/m}$ and $\|q\| \le e^{t/n}$.
Here and hereafter, unless otherwise specified, $\|\cdot\|$ stands for the norm on $\mathbb{R}^k$ given by $\|x\| = \max_{1\le i\le k}|x_i|$. See [29] for a discussion of two ways of proving this theorem, due to Dirichlet and Minkowski respectively. Given $Y$ as above and positive $\varepsilon < 1$, we will say that DT can be $\varepsilon$-improved for $Y$, and write $Y \in \mathrm{DI}_\varepsilon(m, n)$, or $Y \in \mathrm{DI}_\varepsilon$ when the dimensionality is clear from the context, if for every sufficiently large $t$ one can find $q \in \mathbb{Z}^n \smallsetminus \{0\}$ and $p \in \mathbb{Z}^m$ with (1.2) $\|Yq - p\| < \varepsilon e^{-t/m}$ and $\|q\| < \varepsilon e^{t/n}$, that is, one can satisfy (1.1) with the right-hand side terms multiplied by $\varepsilon$ (for convenience we will also replace $\le$ in the second inequality by $<$). Also note that $Y$ is called singular if $Y \in \mathrm{DI}_\varepsilon$ for any $\varepsilon > 0$ (in other words, DT can be 'infinitely improved' for $Y$).
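For small parameters, the inequalities (1.1) and (1.2) can be explored by brute force. The following Python sketch is an illustration only and plays no role in the arguments of this paper. The function name `find_solution` and the search strategy (enumerating $q$ in a box and rounding $Yq$ to the nearest integer vector) are assumptions made for the example, and floating-point arithmetic restricts it to moderate values of $t$.

```python
import itertools
import math


def sup_norm(v):
    """Sup-norm max_i |v_i|, the norm used throughout the text."""
    return max(abs(x) for x in v)


def find_solution(Y, t, eps=1.0):
    """Search for (p, q) solving (1.1) when eps == 1, or (1.2) when eps < 1.

    Y is a list of m rows of length n.  Returns (p, q), or None if the
    search finds no solution for this particular t.
    """
    m, n = len(Y), len(Y[0])
    err_bound = eps * math.exp(-t / m)
    q_bound = eps * math.exp(t / n)
    Q = math.floor(q_bound)
    for q in itertools.product(range(-Q, Q + 1), repeat=n):
        if not any(q):
            continue  # q must be nonzero
        # (1.1) allows ||q|| <= e^{t/n}; (1.2) asks for a strict inequality.
        if eps < 1 and not sup_norm(q) < q_bound:
            continue
        Yq = [sum(row[j] * q[j] for j in range(n)) for row in Y]
        p = [round(x) for x in Yq]  # nearest integers minimise ||Yq - p||
        if sup_norm([x - pi for x, pi in zip(Yq, p)]) < err_bound:
            return tuple(p), q
    return None
```

With `eps=1.0` the search must succeed for every $t > 0$ by DT. With `eps < 1`, a return value of `None` shows only that (1.2) has no solution for that single value of $t$; membership in $\mathrm{DI}_\varepsilon$ concerns all sufficiently large $t$ and cannot be decided by finitely many such checks.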
The two papers [9,10] by H. Davenport and W. Schmidt give a few basic results concerning the properties defined above. For example, the following is proved there: THEOREM 1.1 ([10]). For any$^1$ $m, n \in \mathbb{N}$ and any $\varepsilon < 1$, the sets $\mathrm{DI}_\varepsilon(m, n)$ have Lebesgue measure zero.
In other words, λ-generic systems of linear forms do not allow any improvement to DT (λ will denote Lebesgue measure throughout the paper).
Another question raised by Davenport and Schmidt concerns the possibility of improving DT for matrices with some functional relationship between entries. Specifically, they considered row matrices (1.3) $\mathbf{f}(x) = (x \;\; x^2) \in M_{1,2}$, and proved THEOREM 1.2 ([9]). For any $\varepsilon < 4^{-1/3}$, the set of $x \in \mathbb{R}$ for which $\mathbf{f}(x) \in \mathrm{DI}_\varepsilon(1, 2)$ has zero Lebesgue measure.
In other words, generic matrices of the form (1.3) do not allow a sufficiently drastic improvement to DT. This result was subsequently extended by R. Baker, Y. Bugeaud, and others. Namely, for some other smooth submanifolds of R n they exhibited constants ε 0 such that almost no points on these submanifolds (viewed as row or column matrices) are in DI ε for ε < ε 0 . We will discuss the history in more detail in §4.
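In the present paper we significantly generalize Theorems 1.1 and 1.2 by using a homogeneous dynamics approach and following a theme developed in the paper [21] which dealt with infinite improvement to DT, that is, with singular systems. Namely, we study the subject of improvement of the multiplicative version of DT. In what follows we fix $m, n \in \mathbb{N}$ and let $k = m + n$. Let us denote by $\mathfrak{a}^+$ the set of $k$-tuples $\mathbf{t} = (t_1, \dots, t_k) \in \mathbb{R}^k$ such that (1.4) $t_1, \dots, t_k > 0$ and $\sum_{i=1}^{m} t_i = \sum_{j=1}^{n} t_{m+j}$. It is not hard to see that both Dirichlet's and Minkowski's proofs of DT easily yield the following statement: THEOREM 1.3. For any $Y \in M_{m,n}$ with rows $Y_1, \dots, Y_m$ and any $\mathbf{t} \in \mathfrak{a}^+$ there exist $q \in \mathbb{Z}^n \smallsetminus \{0\}$ and $p \in \mathbb{Z}^m$ such that (1.5) $|Y_i q - p_i| < e^{-t_i}$, $i = 1, \dots, m$, and $|q_j| \le e^{t_{m+j}}$, $j = 1, \dots, n$. Now, given an unbounded subset $\mathcal{T}$ of $\mathfrak{a}^+$ and positive $\varepsilon < 1$, say that DT can be $\varepsilon$-improved for $Y$ along $\mathcal{T}$, and write $Y \in \mathrm{DI}_\varepsilon(\mathcal{T})$, if for all $\mathbf{t} \in \mathcal{T}$ of sufficiently large norm the inequalities (1.6) $|Y_i q - p_i| < \varepsilon e^{-t_i}$, $i = 1, \dots, m$, and $|q_j| < \varepsilon e^{t_{m+j}}$, $j = 1, \dots, n$, i.e., (1.5) with the right-hand side terms multiplied by $\varepsilon$, have nontrivial integer solutions. Clearly $\mathrm{DI}_\varepsilon = \mathrm{DI}_\varepsilon(\mathcal{R})$, where (1.7) $\mathcal{R} := \{(t/m, \dots, t/m, t/n, \dots, t/n) : t > 0\}$ is the 'central ray' in $\mathfrak{a}^+$. Also say that $Y$ is singular along $\mathcal{T}$ if it belongs to $\mathrm{DI}_\varepsilon(\mathcal{T})$ for every positive $\varepsilon$. The latter definition was introduced in [21] for $\mathcal{T}$ contained in an arbitrary ray in $\mathfrak{a}^+$ emanating from the origin (the context of diophantine approximation with weights) in the special case $n = 1$.
$^1$ Even though the results of [10] are stated for matrices with one row or one column, i.e., for a vector or a single linear form, the proofs can be generalized to the setting of systems of linear forms.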
To state our first result we need one more definition. We write $\lfloor \mathbf{t} \rfloor := \min_{i=1,\dots,k} t_i$ and say that $\mathcal{T} \subset \mathfrak{a}^+$ drifts away from walls if (1.8) $\sup_{\mathbf{t} \in \mathcal{T}} \lfloor \mathbf{t} \rfloor = \infty$. In other words, the distance from $\mathbf{t} \in \mathcal{T}$ to the boundary of $\mathfrak{a}^+$ is unbounded (hence, in particular, $\mathcal{T}$ itself is unbounded). Using Lebesgue's Density Theorem and an elementary argument which can be found e.g. in [5, Chapter V, §7] and dates back to Khintchine, one can show that for any $m, n$ and $\mathcal{T} \subset \mathfrak{a}^+$ drifting away from walls, $\mathrm{DI}_\varepsilon(\mathcal{T})$ has Lebesgue measure zero as long as $\varepsilon < 1/2$. However, proving the same for any $\varepsilon < 1$ is more difficult, even in the case $\mathcal{T} = \mathcal{R}$ considered by Davenport and Schmidt. In [10] they derived this fact from the density of generic trajectories of certain one-parameter subgroups on the space $G/\Gamma$, where (1.9) $G = \mathrm{SL}_k(\mathbb{R})$ and $\Gamma = \mathrm{SL}_k(\mathbb{Z})$.
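The drift condition can be made concrete on two sample families of parameters. The sketch below is illustrative only. It reads $\lfloor \mathbf{t} \rfloor$ as the smallest coordinate of $\mathbf{t}$ and takes the cone $\mathfrak{a}^+$ to consist of tuples with positive coordinates whose first $m$ and last $n$ entries have equal sums; both readings are assumptions of this example, and the helper names (`in_a_plus`, `floor_t`, `central_ray`, `wall_hugging`) are hypothetical.

```python
import math


def in_a_plus(t, m, n, tol=1e-9):
    """Assumed membership test for the cone: positive coordinates, with the
    first m coordinates and the last n coordinates having the same sum."""
    return (len(t) == m + n
            and all(x > 0 for x in t)
            and math.isclose(sum(t[:m]), sum(t[m:]), abs_tol=tol))


def floor_t(t):
    """Smallest coordinate of t (assumed meaning of the floor symbol)."""
    return min(t)


def central_ray(s, m, n):
    """Point (s/m, ..., s/m, s/n, ..., s/n) on the central ray."""
    return (s / m,) * m + (s / n,) * n


def wall_hugging(s):
    """Case m = 2, n = 1: the first coordinate stays equal to 1 while the
    norm grows, so the family is unbounded but stays near a wall."""
    return (1.0, s, 1.0 + s)
```

Along `central_ray` the value of `floor_t` grows without bound, so this family drifts away from walls; along `wall_hugging` it stays equal to 1 even though the points leave every bounded set.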
In §2.2 we show how the fact that DI ε (T ) has Lebesgue measure zero for any ε < 1 and any unbounded T ⊂ R follows from mixing of the G-action on G/Γ.
We also explain why mixing is not enough to obtain such a result for an arbitrary $\mathcal{T} \subset \mathfrak{a}^+$ drifting away from walls, even contained in a single 'noncentral' ray, and prove the following multiplicative analogue of Theorem 1.1: THEOREM 1.4. For any $\mathcal{T} \subset \mathfrak{a}^+$ drifting away from walls and any $\varepsilon < 1$, the set $\mathrm{DI}_\varepsilon(\mathcal{T})$ has Lebesgue measure zero.
In other words, given any T as above, DT in its generalized form (Theorem 1.3) cannot be improved for generic systems of linear forms. Theorem 1.4 is derived from the equidistribution of translates of certain measures on G/Γ (Theorem 2.2). Our proof of this theorem, described in §2, is a modification of arguments used in analyzing epimorphic groups [37,33]. It relies on S. G. Dani's classification of measures invariant under horospherical subgroups, and the 'linearization method' developed by Dani, G. A. Margulis, M. Ratner, J. Smillie, N. Shah and others. See also [17] for an alternative approach suggested to the authors by Margulis. Next, building on the approach of [14,21], we generalize Theorem 1.2, namely, consider measures other than Lebesgue. Here we will restrict ourselves to measures µ on R n ∼ = M 1,n and study diophantine properties of µ-almost all y ∈ R n interpreted as row vectors (linear forms); that is, we put m = 1 and k = n +1. The dual case of simultaneous approximation (n = 1, which was the context of [14] and [21]) can be treated along the same lines.
All measures on Euclidean spaces will be assumed to be Radon (locally finite regular Borel) measures. Generalizing the context of Theorem 1.2, we will consider measures on $\mathbb{R}^n$ of the form $\mathbf{f}_*\nu$, where $\nu$ is a measure on $\mathbb{R}^d$ and $\mathbf{f}$ a map from $\mathbb{R}^d$ to $\mathbb{R}^n$. Our assumptions on $\mathbf{f}$ and $\nu$ rely on definitions of:
• measures $\nu$ which are $D$-Federer on open subsets $U$ of $\mathbb{R}^d$, and
• functions $f : U \to \mathbb{R}$ which are $(C, \alpha)$-good on $U$ w.r.t. $\nu$
(here $D, C, \alpha$ are positive constants). We postpone the precise definitions, introduced in [16,14], until §3.2. Now given a measure $\nu$ on $\mathbb{R}^d$, an open $U \subset \mathbb{R}^d$ with $\nu(U) > 0$ and a map $\mathbf{f} : \mathbb{R}^d \to \mathbb{R}^n$, say that a pair $(\mathbf{f}, \nu)$ is
• nonplanar on $U$ if for any ball $B \subset U$ centered in $\operatorname{supp} \nu$, the restrictions of $1, f_1, \dots, f_n$ to $B \cap \operatorname{supp} \nu$ are linearly independent over $\mathbb{R}$; in other words, if $\mathbf{f}(B \cap \operatorname{supp} \nu)$ is not contained in any proper affine subspace of $\mathbb{R}^n$;
• $(C, \alpha)$-good (resp., nonplanar) if for $\nu$-a.e. $x$ there exists a neighborhood $U$ of $x$ such that $(\mathbf{f}, \nu)$ is $(C, \alpha)$-good (resp., nonplanar) on $U$.
Similarly, we will say that ν is D-Federer if for ν-a.e. x ∈ R d there exists a neighborhood U of x such that ν is D-Federer on U .
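The linear-independence formulation of nonplanarity lends itself to a finite check: if the matrix with rows $(1, f_1(x), \dots, f_n(x))$, taken over finitely many sample points $x$, has rank $n + 1$, then $1, f_1, \dots, f_n$ are linearly independent on any set containing those points. The Python sketch below illustrates this with exact rational arithmetic; the function names are hypothetical, and a rank deficiency on a finite sample is of course no proof of planarity.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix with Fraction entries, by Gaussian elimination."""
    rows = [list(r) for r in rows]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def independent_on_samples(f, samples):
    """True if 1, f_1, ..., f_n are linearly independent on the samples,
    i.e. the sampled image of f lies in no proper affine subspace."""
    matrix = [[Fraction(1)] + [Fraction(v) for v in f(x)] for x in samples]
    return rank(matrix) == len(matrix[0])
```

For the curve $x \mapsto (x, x^2)$ of (1.3), three distinct sample points already give full rank, whereas for a map such as $x \mapsto (x, 2x + 1)$, whose image is a line, the rank stays at 2 for every choice of samples.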
In §3 we prove Theorem 1.5. Note that $\varepsilon_0$ in that theorem can be explicitly computed in terms of $d, n, C, \alpha, D$, although the value that can be obtained seems to be far from optimal. In particular, it is not as good as the values of $\varepsilon_0$ obtained in [9,11,4]. On the other hand, the context of Theorem 1.5 is much more general than that of the aforementioned papers, as illustrated by examples considered in §4.
We remark that the above theorem does not require an additional condition where I ℓ stands for the ℓ×ℓ identity matrix. Since Γ is the stabilizer of Z k under the action of G on the set of lattices in R k , G/Γ can be identified with GZ k , that is, with the set of all unimodular lattices in R k . To highlight the relevance of the objects defined above to the diophantine problems considered in the introduction, note that i.e., K ε is the collection of all unimodular lattices in R k which contain no nonzero vector of norm smaller than ε. By Mahler's compactness criterion (see e.g., [27, Chapter 10]), each K ε is compact, and for each compact K ⊂ G/Γ there is an ε > 0 such that K ⊂ K ε . Note also that K ε is empty if ε > 1 by Minkowski's Lemma, and has nonempty interior if ε < 1.
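Then, using (2.1), it is straightforward to see that the system (1.6) has a nonzero integer solution if and only if $g_{\mathbf{t}}\bar\tau(Y) \notin K_\varepsilon$. We therefore arrive at PROPOSITION 2.1. For $Y \in M_{m,n}$, $0 < \varepsilon < 1$ and unbounded $\mathcal{T} \subset \mathfrak{a}^+$, one has $Y \in \mathrm{DI}_\varepsilon(\mathcal{T})$ if and only if $g_{\mathbf{t}}\bar\tau(Y)$ is outside of $K_\varepsilon$ for all $\mathbf{t} \in \mathcal{T}$ with large enough norm. Equivalently, (2.3) $\mathrm{DI}_\varepsilon(\mathcal{T}) = \bigcup_{t > 0} \bigcap_{\mathbf{t} \in \mathcal{T},\, \|\mathbf{t}\| \ge t} \{Y : g_{\mathbf{t}}\bar\tau(Y) \notin K_\varepsilon\}$. In particular, $Y$ is singular along $\mathcal{T}$ if and only if the trajectory $\{g_{\mathbf{t}}\bar\tau(Y) : \mathbf{t} \in \mathcal{T}\}$ is divergent (i.e., eventually leaves $K_\varepsilon$ for any $\varepsilon > 0$). The latter observation was first made in [6, Proposition 2.12] for $\mathcal{T} = \mathcal{R}$, and then in [12, Theorem 7.4] for an arbitrary ray in $\mathfrak{a}^+$.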
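The correspondence between short lattice vectors and improvements of DT can be tested numerically for small parameters. The Python sketch below is an illustration under stated assumptions: it takes the lattice to consist of the vectors $\big(e^{t_1}(Y_1 q - p_1), \dots, e^{t_m}(Y_m q - p_m), e^{-t_{m+1}} q_1, \dots, e^{-t_k} q_n\big)$ with $p \in \mathbb{Z}^m$, $q \in \mathbb{Z}^n$ (a common normalization, assumed here rather than quoted), and it reads the $\varepsilon$-improved system as $|Y_i q - p_i| < \varepsilon e^{-t_i}$, $|q_j| < \varepsilon e^{t_{m+j}}$. The function names are hypothetical.

```python
import itertools
import math


def shortest_norm(Y, t):
    """Sup-norm of a shortest nonzero vector of the assumed lattice."""
    m, n = len(Y), len(Y[0])
    # Vectors with q = 0 are (e^{t_i} p_i)_i; the shortest has norm e^{min t_i}.
    best = min(math.exp(ti) for ti in t[:m])
    # By Minkowski's Lemma some nonzero vector has norm <= 1, so it is
    # enough to search |q_j| <= e^{t_{m+j}}.
    boxes = []
    for tj in t[m:]:
        bound = math.floor(math.exp(tj))
        boxes.append(range(-bound, bound + 1))
    for q in itertools.product(*boxes):
        if not any(q):
            continue
        coords = [math.exp(-t[m + j]) * abs(q[j]) for j in range(n)]
        for i in range(m):
            x = sum(Y[i][j] * q[j] for j in range(n))
            coords.append(math.exp(t[i]) * abs(x - round(x)))  # best p_i
        best = min(best, max(coords))
    return best


def improvable_at(Y, t, eps):
    """Direct check of the eps-improved system at the single parameter t."""
    m, n = len(Y), len(Y[0])
    boxes = []
    for tj in t[m:]:
        bound = math.floor(eps * math.exp(tj))
        boxes.append(range(-bound, bound + 1))
    for q in itertools.product(*boxes):
        if not any(q):
            continue
        if any(abs(q[j]) >= eps * math.exp(t[m + j]) for j in range(n)):
            continue  # the inequalities on q are strict
        forms = [sum(Y[i][j] * q[j] for j in range(n)) for i in range(m)]
        if all(abs(x - round(x)) < eps * math.exp(-t[i])
               for i, x in enumerate(forms)):
            return True
    return False
```

For $\varepsilon < 1$ the two functions should agree in the sense that `improvable_at(Y, t, eps)` holds exactly when `shortest_norm(Y, t) < eps`, since lattice vectors with $q = 0$ have norm greater than 1; this is the finite, single-parameter shadow of Proposition 2.1.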

Lebesgue measure and uniform distribution.
Recall that one of the goals of this paper is to show that, whenever ε < 1 and T ⊂ a + drifts away from walls, λ-a.e. Y ∈ M m,n does not belong to DI ε (T ) (Theorem 1.4). The latter, in view of the above proposition, amounts to showing that the set {t ∈ T : g tτ (Y ) ∈ K ε } is unbounded for λ-a.e. Y ∈ M m,n . In this section we will prove a stronger statement. Here and hereafter 'vol' stands for the G-invariant probability measure on G/Γ.
for all z ∈ L and t ∈ a + , ⌊t⌋ ≥ T .
Let us denote by λ z,B the pushforward of 1 The above theorem asserts that g t -translates of λ z,B weak* converge to vol as ⌊t⌋ → ∞, and the convergence is uniform in z when the latter is restricted to a compact subset of G/Γ. By taking z to be the standard lattice Z k ∈ G/Γ and approximating K ε (which has nonempty interior and boundary of measure zero for any ε < 1) by continuous functions on G/Γ, from Theorem 2.2 one obtains that This immediately rules out the existence of T ⊂ a + drifting away from walls and B ⊂ M m,n of positive Lebesgue measure such that the set {t ∈ T : g tτ (Y ) ∈ K ε } is bounded for any Y ∈ B . Thus Theorem 1.4 follows from Theorem 2.2.
Note that the conclusion of Theorem 2.2 is not new for T being a subset of the 'central ray' R defined in (1.7). Indeed, one can immediately see that the group is expanding horospherical (see e.g., [18,Chapter 1] for the definition) with respect to g t for any t ∈ R. In this case Theorem 2.2 coincides with [15, Proposition 2.2.1], where it was deduced from mixing of the G-action on G/Γ, using an argument dating back to the Ph.D. Thesis of Margulis [23]. On the other hand, H is strictly contained in the expanding horospherical subgroup relative to g t for any t ∈ a + R, thus the aforementioned argument does not prove Theorem 2.2.
In the remainder of this section we show how to bypass this difficulty. See also [17] for an alternative proof.

Expanding vectors along cones.
In this section we will discuss an important representation-theoretic property of the pair (a + , H ).

representation (of algebraic groups) on a finite-dimensional normed vector space V without nonzero fixed vectors, and let
Then there are positive c, c 0 such that for any v ∈ V H and t ∈ a + one has Proof. Let us denote by A the group of positive diagonal matrices in G, let a be its Lie algebra, and let h be the Lie algebra of H . Since A normalizes H , V H is a ρ(A)-invariant subspace, and we may write where Ψ is a finite set of weights (linear functionals on a), such that is nonzero for any χ ∈ Ψ. There is no loss of generality in assuming that · is the sup-norm with respect to a basis of ρ(A)-eigenvectors; thus it suffices to show that inf t∈a + χ(t)/⌊t⌋ is positive for any χ ∈ Ψ.
and for each (i , j ) ∈ I let G 0 = G 0 (i , j ) be the Lie subgroup of G whose Lie algebra g 0 is generated by E i j , E j i , F i j (here E r s stands for the k × k matrix with 1 in position (r, s) and 0 elsewhere, and F r s stands for E r r − E ss ). Then G 0 is a copy of , by the representation theory of sl 2 (see e.g., Since V contains no nonzero vectors fixed by ρ(G), and since the group generated by {G 0 (i , j ) : (i , j ) ∈ I } is equal to G, there is at least one (i 0 , j 0 ) for which Any Y ∈ a + can be written as a linear combination of F i j : (i , j ) ∈ I with nonnegative coefficients, and since t 0 = t − ⌊t⌋F i 0 j 0 ∈ a + , we have: finishing the proof.
To put this result in context, recall that a subgroup L 1 of an algebraic group L 2 is said to be epimorphic in L 2 if for any representation ρ : L 2 → GL(V ), any vector fixed by ρ(L 1 ) is also fixed by ρ(L 2 ). For example, in our present notation, AH is epimorphic in G. The 'cone lemma' ([37, Lemma 1]) shows that if TU is epimorphic in L, where U is unipotent, T is diagonalizable and normalizes U , and L is generated by unipotents, then for any ρ : [37] is nonconstructive. The true meaning of Lemma 2.3 is a precise determination of a cone which works for all representations ρ in the case T = A, U = H .
The next proposition is a consequence of Lemma 2.3.

PROPOSITION 2.4. Let V , ρ, c be as in Lemma 2.3, and let B be a neighborhood of
0 in M m,n . Then there exists b > 0 such that for any v ∈ V and t ∈ a + one has v ∈ V and t ∈ a + one can write 2.4. Recurrence to compact sets. In order to establish the equidistribution of g t -translates of λ z,B as ⌊t⌋ → ∞, one needs to at least show the existence of one limit point (which is not guaranteed a priori since G/Γ is not compact). In other words, there must exist a compact subset K of G/Γ such that λ z,B g −1 t (K ) is big enough whenever ⌊t⌋ is large. We show in this section how to construct such a compact set using Proposition 2.4 and a theorem of Dani and Margulis.
Denote by $\mathfrak{g}$ the Lie algebra of $G$, let $V := \bigoplus_{j=1}^{\dim(G)-1} \bigwedge^j \mathfrak{g}$, and let $\rho : G \to \mathrm{GL}(V)$ be the representation obtained by acting on $V$ via the adjoint representation and its exterior powers. Note that $V$ has no nonzero $G$-fixed vectors since $G$ is simple. For any proper connected Lie subgroup $W$ of $G$ we will denote by $p_W$ an associated vector in $V$.
We have the following result of Dani and Margulis, see [31, Theorem 2.2] for a more general statement: PROPOSITION 2.5. Let G, Γ and π : G → G/Γ be as above. Then there exist finitely many closed subgroups W 1 , . . . ,W ℓ of G such that π(W i ) is compact and ρ(Γ)p W i is discrete for each i ∈ {1, . . . , ℓ}, and the following holds: for any positive α, ε there is a compact K ⊂ G/Γ such that for any g ∈ G, t ∈ a and bounded convex open B ⊂ M m,n , one of the following is satisfied: Proof. Let W 1 , . . . ,W ℓ be as in Proposition 2.5, and for any compact L ⊂ G/Γ consider It is positive since L is compact and ρ(Γ)p W i is discrete. Proposition 2.4 then implies that for any neighborhood B of 0 in M m,n there exist constants b, c such that for any g ∈ π −1 (L), γ ∈ Γ, i = 1, . . . , ℓ and t ∈ a + , one has Now take an arbitrary ε > 0 and α = 1, and choose K according to Proposition 2.5. Then it follows from (2.8) that for any B there exists T such that whenever ⌊t⌋ ≥ T and π(g ) ∈ L, the second alternative of Proposition 2.5 must hold.
2.5. The linearization method. As remarked in the introduction, our proof of Theorem 2.2 relies on the work of many mathematicians. Although we do not require Ratner's results on the classification of measures invariant under unipotent flows (the earlier results of Dani on horospherical subgroups are sufficient for us), we do use the linearization method developed by many authors following Ratner's work. These results are described in detail in [18]. Since our argument will be close to arguments in [31,33] we will rely on the notation and results as stated in [33], where additional references to the literature may be found. Let H be the set of all closed connected subgroups W of G such that W ∩ Γ is a lattice in W , and the subgroup of W generated by its one-parameter unipotent subgroups acts ergodically on W /(W ∩Γ). This is a countable collection. For any W ∈ H , we define Recall that the subgroup H of G is horospherical. Dani [7] classified all the measures on G/Γ invariant under the H -action. The following is a consequence of Dani's classification and ergodic decomposition: PROPOSITION 2.7. Let µ be a finite H -invariant measure on G/Γ which is not equal to vol. Then µ π (N (W, H )) > 0 for some W ∈ H which is a proper subgroup of G.
For W ∈ H let V W be the span of ρ (N (W, H ))p W in V and let N 1 The next proposition uses the representation ρ defined in §2.4 to detect orbits which stay close to π N (W, H ) for some W ∈ H . The idea has a long history and is used in a similar context by Dani and Margulis in [8].
2.6. Proof of Theorem 2.2. Take a sequence of points $z_n \in L$ and a sequence $\mathbf{t}_n \in \mathfrak{a}^+$ drifting away from walls. It follows from Corollary 2.6 that the sequence of translated measures $(g_{\mathbf{t}_n})_* \lambda_{z_n,B}$ is weak* precompact, that is, along a subsequence we have $(g_{\mathbf{t}_n})_* \lambda_{z_n,B} \to \mu$, where $\mu$ is a Borel measure on $G/\Gamma$. Our goal is thus to show that $\mu = \mathrm{vol}$.
The hypothesis about drifting away from walls implies that for any $h \in H$ we have $g_{\mathbf{t}_n}^{-1} h g_{\mathbf{t}_n} \to e$, where $e$ is the identity element in $H$. A simple computation (see [33, Claim 3.2]) shows that $\mu$ is $H$-invariant. By Proposition 2.7, if $\mu \ne \mathrm{vol}$, there is a proper subgroup $W \in \mathcal{H}$ such that $\mu(\pi(N(W, H))) > 0$. Making $W$ smaller if necessary we can assume that $\mu(\pi(N^*(W, H))) > 0$. Let $C \subset \pi(N^*(W, H))$ be compact with $\mu(C) > 0$, and put $\varepsilon := \mu(C)/2$. Let $\tilde{L} \subset G$ be a compact subset such that $\pi(\tilde{L}) = L$, and let $g_n \in \pi^{-1}(z_n) \cap \tilde{L}$. Applying Proposition 2.8, we find that there is a compact $D \subset V_W$ with the following property. For each $n$, let $D_{n+1} \subset D_n$ be a compact neighborhood of $D$ in $V_W$ such that $\bigcap_n D_n = D$. Then there is an open neighborhood $C_n$ of $C$ in $G/\Gamma$ such that one of the following holds:
1. there is $v_n \in \rho(g_n \Gamma) p_W$ such that $\rho(g_{\mathbf{t}_n} \tau(B)) v_n \subset D_n$;
2. $(g_{\mathbf{t}_n})_* \lambda_{z_n,B}(C_n) < \varepsilon$.
Since $(g_{\mathbf{t}_n})_* \lambda_{z_n,B} \to \mu$ and $C \subset C_n$, we find that $\lambda_{z_n,B}(g_{\mathbf{t}_n}^{-1} C_n) > \mu(C)/2 = \varepsilon$ for all sufficiently large $n$, so condition (2) above does not hold. Therefore condition (1) holds, and $\rho(g_{\mathbf{t}_n} \tau(B)) v_n \subset D_n \subset D_1$, a bounded subset of $V_W$. On the other hand, since $\tilde{L}$ is compact and $\rho(\Gamma) p_W$ is discrete, we have $\inf_n \|v_n\| > 0$, hence $\sup_{Y \in B} \|\rho(g_{\mathbf{t}_n} \tau(Y)) v_n\| \to \infty$ by (2.6), a contradiction.
3. THEOREM 1.5 AND QUANTITATIVE NONDIVERGENCE
3.1. A sufficient condition. The second goal of this paper is to show that sets $\mathrm{DI}_\varepsilon(\mathcal{T})$ are null with respect to certain measures $\mu$ on $M_{m,n}$ other than Lebesgue. We will use Proposition 2.1 to formulate a condition sufficient for $\mu(\mathrm{DI}_\varepsilon(\mathcal{T})) = 0$ for fixed $\varepsilon > 0$ and all unbounded $\mathcal{T} \subset \mathfrak{a}^+$. Similarly to the context of Theorem for any $\mathbf{t} \in \mathfrak{a}^+$ with $\|\mathbf{t}\| \ge s$. Then $F_* \nu(\mathrm{DI}_\varepsilon(\mathcal{T})) = 0$ for any unbounded $\mathcal{T} \subset \mathfrak{a}^+$.
Proof. Since T is unbounded, it follows from the assumption of the proposition that for any ball B ⊂ U and any positive t one has ν t∈T , t ≥t Therefore, ν {x ∈ B : F (x) ∈ DI ε (T )} ≤ cν(B ) by (2.3). In view of a density theorem for Radon measures on Euclidean spaces [24, Corollary 2.14], this forces F −1 DI ε (T ) to have ν-measure zero.

3.2.
A quantitative nondivergence estimate. The proof of Theorem 2.2 given in §2 relies on Corollary 2.6, which is a quantitative nondivergence estimate for translates of unipotent trajectories. This kind of estimate has its origins in the proof by Margulis [22] that orbits of unipotent flows do not diverge, see [18] for a historical account. During the last decade, starting from the paper [16], these techniques were transformed into a powerful method yielding measure estimates as in We need to introduce some more notation in order to state a theorem from [14]. Let W := the set of proper nonzero rational subspaces of R k .
From here until the end of this section, we let · stand for the Euclidean norm on R k , induced by the standard inner product 〈·, ·〉, which we extend from R k to its exterior algebra. For V ∈ W and g ∈ G, let where {v 1 , . . . , v j } is a generating set for Z k ∩ V ; note that ℓ V (g ) does not depend on the choice of {v i }.

Checking (i) and (ii).
At this point we restrict ourselves to the context of Theorem 1.5, that is we consider measures on R n ∼ = M 1,n of the form f * ν, where ν is a measure on R d and f = ( f 1 , . . . , f n ) is a map from an open U ⊂ R d with ν(U ) > 0 to R n . In order to combine Proposition 3.1 with Theorem 3.2, one needs to work with functions ℓ V •h t for each V ∈ W , where (3.3) h t := g t • τ • f , and find conditions sufficient for the validity of (i) and (ii) of Theorem 3.2 for large enough t ∈ T . The explicit computation that is reproduced below first appeared in [16]. Let e 0 , e 1 , . . . , e n be the standard basis of R n+1 , and for let e I := e i 1 ∧ · · · ∧ e i j ; then {e I | #I = j } is an orthonormal basis of j (R n+1 ). Similarly, it will be convenient to put t = (t 0 , t 1 , . . . , t n ) ∈ a + where (3.5) Then one immediately sees that for any I as in (3.4), (3.6) e I is an eigenvector for g t with eigenvalue e t I , where We remark that in view of (3.5), e t I is not less than 1 for any t ∈ a + and 0 ∈ I .
Moreover, let ℓ be such that t ℓ = max i =1,...,n t i . Then Now take V ∈ W , choose a generating set {v 1 , . . . , v j } for Z k ∩ V , and expand w := v 1 ∧ · · · ∧ v j with respect to the above basis by writing w = I ⊂{0,...,n}, #I = j w I e I ∈ j (Z k ) {0}. Then Here is an immediate implication of the above formula: It is also clear from (3.9) that τ(·)w ≡ w if the subspace V represented by w contains e 0 (in other words, if w I = 0 whenever 0 ∉ I ). In this case for any I ∋ 0 all the coefficients c i in (3.10) with i ≥ 1 are equal to 0, and |c 0 | = e t I |w I | ≥ 1 as long as w I = 0.
Then there exists s > 0 such that for any t ∈ a + with t ≥ s and any ε < 1, one has Proof. We will apply Theorem 3.2 with h = h t as in (3.3). Take V ∈ W and, as before, represent it by w = v 1 ∧ · · · ∧ v j , where {v 1 , . . . , v j } is a generating set for Z k ∩V . From Lemma 3.3 and assumption (1) above it follows that for any t, each coordinate of h t (·)w is (C , α)-good onB with respect to ν. Hence the same, with  (2) implies the existence of δ > 0 (depending on B ) such that c 0 + n i =1 c i f i ν,B ≥ δ for any c 0 , c 1 , . . . , c n with max|c i | ≥ 1. Using Lemma 3.4 and the remark preceding it, we conclude that either ℓ V • h t ν,B ≥ 1 (in the case e 0 ∈ V ) or ℓ V • h t ν,B ≥ δe t /n (in the complementary case). So condition (ii) of Theorem 3.2 holds with ρ = 1 whenever t is at least s := −n log δ.
We can now proceed with the Proof of Theorem 1.5. It suffices to show that for ν-a.e. x there exists a ball B centered at x such that (3.11) ν {x ∈ B : f(x) ∈ DI ε (T )} = 0 .
We remark that the constant C 1 from Theorem 3.2, and hence C 2 from Theorem 3.5 and ε 0 from Theorem 1.5, can be explicitly estimated in terms of the input data of those theorems, see [16,3,19,13]. However we chose not to bother the reader with explicit computations, the reason being that in the special cases previously considered in the literature our method produces much weaker estimates. More on that in the next section.

EXAMPLES AND APPLICATIONS
4.1. Polynomial maps. A model example of functions that are $(C, \alpha)$-good with respect to Lebesgue measure is given by polynomials: it is shown in [16, Lemma 3.2] that any polynomial of degree $\ell$ is $(C, \alpha)$-good on $\mathbb{R}$ with respect to $\lambda$, where $\alpha = 1/\ell$ and $C$ depends only on $\ell$. The same can be said about polynomials in $d \ge 1$ variables, with $\alpha = 1/(d\ell)$ and $C$ depending on $\ell$ and $d$. Obviously $\lambda$ is $3^d$-Federer on $\mathbb{R}^d$. Thus, as a corollary of Theorem 1.5, we obtain the existence of $\varepsilon_1 = \varepsilon_1(n, d, \ell)$ such that whenever $\varepsilon < \varepsilon_1$, (1.10) holds for $\nu = \lambda$ and any polynomial map $\mathbf{f} = (f_1, \dots, f_n)$ of degree $\ell$ in $d$ variables such that $1, f_1, \dots, f_n$ are linearly independent over $\mathbb{R}$. This in particular applies to $\mathbf{f}(x) = (x, \dots, x^n)$, a generalization of the context of Theorem 1.2 considered by Baker in [1] for $n = 3$ and then by Bugeaud for an arbitrary $n$. Note that it is proved in [4] that $(x, \dots, x^n)$ is almost surely not in $\mathrm{DI}_\varepsilon$ for $\varepsilon < 1/8$. Our method, in comparison, shows that $(x, \dots, x^n)$ is almost surely not in $\mathrm{DI}_\varepsilon(\mathcal{T})$ for any unbounded $\mathcal{T} \subset \mathfrak{a}^+$ and ε < 1/n n (n + 1) 2 2 n 2 +n .
Improving these results to any ε < 1 is a natural and challenging problem. Recently [32] the following was obtained: The proof follows a similar strategy as our proof of Theorem 2.2, but is considerably more difficult. Repeating the argument of §2.2, one obtains: Then We remark that [16,Lemma 3.3] instead of (4.1) assumed ∀ multiindex β with |β| ≤ ℓ , and produced the same conclusion as the above lemma, with A i = A and a i = a for all i .
Sketch of Proof. The case d = 1 can be proved by a verbatim repetition of the argument from [16] -it is easy to verify that a bound on just the top derivative is enough for the proof. The general case then follows using [19,Corollary 2.3].
Recall that a map $\mathbf{f}$ from $U \subset \mathbb{R}^d$ to $\mathbb{R}^n$ is called $\ell$-nondegenerate at $x \in U$ if partial derivatives of $\mathbf{f}$ at $x$ up to order $\ell$ span $\mathbb{R}^n$, and $\ell$-nondegenerate if it is $\ell$-nondegenerate at $\lambda$-a.e. $x \in U$. Arguing as in the proof of [16, Proposition 3.4], from the above lemma one deduces This implies that if $\mathbf{f} : U \to \mathbb{R}^n$ is $\ell$-nondegenerate, then $(\mathbf{f}, \lambda)$ is $(C, 1/(d\ell))$-good for any $C > C_{d,\ell}$. Also, nonplanarity is clearly an immediate consequence of nondegeneracy. Thus, by Theorem 1.5, there exists $\varepsilon_2 = \varepsilon_2(n, d, \ell)$ such that whenever $\varepsilon < \varepsilon_2$, (1.10) holds for $\nu = \lambda$ and any $\ell$-nondegenerate $\mathbf{f} : U \to \mathbb{R}^n$, $U \subset \mathbb{R}^d$. This was previously established in the case $d = 1$, $\ell = n = 2$ in [2], with the additional assumption that $\mathbf{f}$ be $C^3$ rather than $C^2$. Also, M. Dodson, B. Rynne, and J. Vickers considered $C^3$ submanifolds of $\mathbb{R}^n$ with 'two-dimensional definite curvature almost everywhere', a condition which implies 2-nondegeneracy (and requires the dimension of the manifold to be at least 2). It is proved in [11] that almost every point on such a manifold is not in $\mathrm{DI}_\varepsilon$ for $\varepsilon < 2^{-\frac{n}{n+1}}$. We also remark that the result of [32] extends to nondegenerate analytic curves in $\mathbb{R}^2$.

Friendly measures. The class of friendly measures was introduced in [14], the word 'friendly' being an approximate abbreviation of 'Federer, nonplanar and decaying'. Using the terminology of the present paper, we can define this class as follows: a measure $\mu$ on $\mathbb{R}^n$ is friendly if for $\mu$-a.e. $x \in \mathbb{R}^n$ there exist a neighborhood $U$ of $x$ and $D, C, \alpha > 0$ such that $\mu$ is $D$-Federer on $U$, and $(\mathrm{Id}, \mu)$ is both $(C, \alpha)$-good and nonplanar on $U$. In order to apply Theorem 1.5 we would like to use a somewhat more uniform version: given $C, \alpha, D > 0$, define $\mu$ to be $(D, C, \alpha)$-friendly if for $\mu$-a.e. $x \in \mathbb{R}^n$ there exists a neighborhood $U$ of $x$ such that $\mu$ is $D$-Federer on $U$ and $(\mathrm{Id}, \mu)$ is both $(C, \alpha)$-good and nonplanar on $U$. In view of Theorem 1.5, almost all points with respect to those measures are not in $\mathrm{DI}_\varepsilon(\mathcal{T})$ for any $\mathcal{T}$ drifting away from walls and small enough $\varepsilon$, where $\varepsilon$ depends only on $C, D, \alpha$.
As discussed in the previous subsections, smooth measures on nondegenerate submanifolds of $\mathbb{R}^n$ satisfy the above properties. Furthermore, the class of friendly measures is rather large; many examples are described in [14,20,35,36,34]. A notable class of examples is given by limit measures of finite irreducible systems of contracting similarities [14, §8] (or, more generally, self-conformal contractions [35]) of $\mathbb{R}^n$ satisfying the open set condition. These measures were shown to be $(D, C, \alpha)$-friendly for some $D, C, \alpha$, and thus satisfy the conclusions of Theorem 1.5.
4.4. Improving DT along nondrifting $\mathcal{T}$. Comparing Theorem 1.4 with Theorem 1.5, one sees that the former has more restrictive assumptions, namely $\mathcal{T}$ has to drift away from walls as opposed to just being unbounded. This is not an accident: the drift condition is in fact necessary for the main technical tools of the proof, that is, the equidistribution results of §2.
To see this, for simplicity let us restrict ourselves to the case m = 2, n = 1; the argument for the general case is similar.
ℓ ∈ N} ⊂ a + is unbounded but does not satisfy (1.8); that is, either {t (ℓ) 1 } or {t (ℓ) 2 } is bounded. Without loss of generality, and passing to a subsequence, we can assume that t (ℓ) 1 is convergent as ℓ → ∞; that is, for any ℓ we can write t (ℓ) = s (ℓ) + u (ℓ) , where (4.2) s (ℓ) = (0, s (ℓ) , s (ℓ) ), s (ℓ) → ∞ as ℓ → ∞ , and g u (ℓ) → g ∈ SL 3 (R) as ℓ → ∞. Note that for any ℓ and any y ∈ R 2 , g s (ℓ) τ(y) belongs to the group a semidirect product of SL 2 (R) and R 2 . Thus a trajectory of the form {g t τ(y)Z 3 : t ∈ T } must approach g H Z 3 , which is a proper submanifold inside the space of lattices in R 3 (in fact, H Z 3 is the set of lattices in R 3 containing e 1 = (1, 0, 0) as a primitive vector), and therefore is never dense. We conclude that it is not possible to prove the analogue of Theorem 2.2 with ⌊t⌋ replaced by t . Note that this a priori does not rule out proving Theorem 1.4 with a relaxed assumption on T : recall that our goal was to make almost every orbit return to a specific set K ε , not just any nonempty open set. And indeed, it is easy to show, using induction and Fubini's Theorem, that the set DI ε (T ) is λ-null whenever ε < 1 and for every i , {t i : t ∈ T } is either unbounded or converges to 0.
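However, in general Theorem 1.4 is false if one just assumes that $\mathcal{T}$ is unbounded. Here is a simple counterexample, also in the case $m = 2$, $n = 1$. Write $\mathbf{t}^{(\ell)} = \mathbf{s}^{(\ell)} + \mathbf{u}$, where $\mathbf{s}^{(\ell)}$ is as in (4.2) and $\mathbf{u} = (u, u, 0)$. Then for any $\ell$ and any $\mathbf{y}$, $g_{\mathbf{s}^{(\ell)}} \tau(\mathbf{y}) \mathbb{Z}^3$ belongs to $H\mathbb{Z}^3$, that is, contains $\mathbf{e}_1$ as a primitive vector. Therefore for any $\mathbf{t} \in \mathcal{T}$ and any $\mathbf{y}$, $g_{\mathbf{t}} \tau(\mathbf{y}) \mathbb{Z}^3$ contains $e^u \mathbf{e}_1$ as a primitive vector. Now suppose that $1/\varepsilon^2 < e^u < 2\varepsilon$. Let $B_\varepsilon$ be given by $\{|x_1| < e^u, |x_2| < \varepsilon, |x_3| < \varepsilon\}$. It is a convex centrally symmetric domain of volume greater than 8; therefore, by Minkowski's Lemma, it must contain a nonzero vector $\mathbf{v} \in g_{\mathbf{t}} \tau(\mathbf{y}) \mathbb{Z}^3$. However, since $e^u < 2\varepsilon$, the sup-norm distance of $\mathbf{v}$ to either $e^u \mathbf{e}_1$ or $-e^u \mathbf{e}_1$ is less than $\varepsilon$, and it is positive since $\pm e^u \mathbf{e}_1 \notin B_\varepsilon$. This proves that $g_{\mathbf{t}} \tau(\mathbf{y}) \mathbb{Z}^3$ never lies in $K_\varepsilon$; thus, under those assumptions on $\mathcal{T}$ and $\varepsilon$, the set $\mathrm{DI}_\varepsilon(\mathcal{T})$ is equal to $\mathbb{R}^2$.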
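The arithmetic behind this counterexample is easy to verify. The box $B_\varepsilon$ has volume $2e^u \cdot 2\varepsilon \cdot 2\varepsilon = 8 e^u \varepsilon^2$, which exceeds 8 exactly when $e^u > 1/\varepsilon^2$, and the two-sided condition $1/\varepsilon^2 < e^u < 2\varepsilon$ can be met only if $\varepsilon^3 > 1/2$. The following Python sketch, with hypothetical helper names, simply tabulates these quantities.

```python
import math


def admissible_u(eps):
    """Open interval of u with 1/eps^2 < e^u < 2*eps, or None if it is empty."""
    lo, hi = math.log(1 / eps ** 2), math.log(2 * eps)
    return (lo, hi) if lo < hi else None


def box_volume(u, eps):
    """Volume of the box {|x1| < e^u, |x2| < eps, |x3| < eps}."""
    return (2 * math.exp(u)) * (2 * eps) * (2 * eps)
```

For instance, `admissible_u(0.7)` returns `None`, while for $\varepsilon = 0.9$ the interval is nonempty and any $u$ inside it gives a box of volume greater than 8. The counterexample therefore concerns only $\varepsilon$ above $2^{-1/3}$.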
Similar counterexamples exist in any dimension. Still, it seems plausible that Theorem 1.4 remains valid if (1.8) is replaced by an assumption that $\liminf_{\ell \to \infty} \lfloor \mathbf{t} \rfloor$ is large enough. A proof of this would require equidistribution results for more general homogeneous spaces (specifically, spaces similar to $H\mathbb{Z}^3$ in the above example).
4.5. Weighted badly approximable systems. We conclude the paper with another application of Theorem 2.2. Let $g_t$ be a one-parameter subgroup of $G$. Suppose a subgroup $H$ of $G$ normalized by $g_t$ is such that (a) the conjugation by $g_t$, $t > 0$, restricted to $H$ is an expanding automorphism of $H$, and (b) $g_t$-translates of the leaves $Hx$, $x \in G/\Gamma$, become equidistributed as $t \to \infty$, with the convergence being uniform as $x$ ranges over compact subsets of $G/\Gamma$. These conditions were shown in [15] to imply that for any $x \in G/\Gamma$, the set $\{h \in H : \text{the trajectory } \{g_t h x : t > 0\} \text{ is bounded}\}$ is thick (that is, has full Hausdorff dimension at every point). When $H$ is as in (2.5) and $\{g_t : t > 0\}$ is any one-parameter subsemigroup of $G$ contained in $\exp(\mathfrak{a}^+)$, both (a) and (b) are satisfied, the latter being a consequence of Theorem 2.2.
Let us now take g t of the form Then it is known [12] that the trajectory {g tτ (Y ) : t > 0} is bounded in G/Γ if and only if Y is (r, s)-badly approximable, which by definition means The components of vectors r, s should be thought of as weights assigned to linear forms Y i and integers q j . Thus one can obtain a weighted generalization of Schmidt's theorem [28] on the thickness of the set of badly approximable systems of linear forms: This was previously established by A. Pollington and S. Velani [26] in the case n = 1.