The Landau Equation for Maxwellian Molecules and the Brownian Motion on SO_N(R)

In this paper we prove that the spatially homogeneous Landau equation for Maxwellian molecules can be represented through the product of two elementary processes. The first one is the Brownian motion on the group of rotations. The second one is, conditionally on the first one, a Gaussian process. Using this representation, we establish sharp multi-scale upper and lower bounds for the transition density of the Landau equation, the multi-scale structure depending on the shape of the support of the initial condition.


Statement of the problem and existing results
The spatially homogeneous Landau equation for Maxwellian molecules is a common model in plasma physics. It can be obtained as a certain limit of the spatially homogeneous Boltzmann equation for N-dimensional particles subject to pairwise interaction, when the collisions become grazing and when the interaction force between particles at distance r is of order 1/r^{2N-1} (see Villani [24] and Guérin [15]).

∂_t f(t, v) = Lf(t, v),   t ≥ 0, v ∈ R^N,   (1.1)

where

Lf(t, v) := (1/2) Σ_{i,j=1}^N ∂_{v_i} [ ∫_{R^N} a_{i,j}(v − v_*) ( f(t, v_*) ∂_{v_j} f(t, v) − f(t, v) ∂_{v_{*,j}} f(t, v_*) ) dv_* ].   (1.2)

Here, a is an N × N nonnegative and symmetric matrix that depends on the collisions between binary particles. It is given (up to a multiplicative constant) by

a(v) = |v|² Id_N − v ⊗ v,

where Id_N denotes the identity matrix of size N, and v ⊗ v = v v^⊤, v^⊤ denoting the transpose of v, v being seen as a column vector in R^N. The unknown function f(t, v) represents the density of particles of velocity v ∈ R^N at time t ≥ 0 in a gas. It is assumed to be independent of the position of the particles (spatially homogeneous case). The density f(t, v) being given, the nonlocal operator L can be seen as a standard linear Fokker-Planck operator, with diffusion matrix ā(t, v) = ∫_{R^N} a(v − v_*) f(t, v_*) dv_* and with drift b̄(t, v) = −(N − 1) ∫_{R^N} (v − v_*) f(t, v_*) dv_*. Such a reformulation makes it possible to approach the Landau equation by means of the numerous tools that have been developed for linear diffusion operators. As a key fact in that direction, the diffusion matrix ā can be shown to be uniformly elliptic for a wide class of initial conditions. This suggests that the solution f(t, v) must share some of the generic properties of non-degenerate diffusion operators.
Such a remark is the starting point of the analysis initiated by Villani in [25, Proposition 4]. Therein, it is proved that, whenever the initial condition f(0, v) is nonnegative and has finite mass and energy, the Landau PDE (1.1) admits a unique solution, which is bounded and C^∞(R^N) in positive time. Moreover, [25, Proposition 9] ensures that the solution satisfies the lower Gaussian bound

f(t, v) ≥ C_t exp(−δ_t |v|²),   (1.3)

for some C_t > 0 and δ_t > 0. The values of the constants C_t and δ_t are specified in Desvillettes and Villani [5, Theorem 9(ii)] when N = 3, under the additional condition that f(0, v) has finite entropy and is bounded from below by a strictly positive constant on a given ball. The lower bound (1.3) is then established with C_t = 1 and δ_t = b_0 t + c_0/t. This proves that, in finite time, the rate of propagation of the mass to infinity is at least the same as for the heat equation. The key argument in [5] is to prove that the spectrum of ā(t, v) is uniformly far away from zero, so that the mass can indeed be diffused to the whole space. Anyhow, even if the lower bound (1.3) fits the off-diagonal decay of the heat kernel, it is worth mentioning that ā(t, v) does not enter the required framework for applying two-sided Aronson estimates for diffusion operators, see [1]. Indeed, the largest eigenvalue of ā(t, v) can be shown to behave as |v|² when |v| is large. The matrix ā(t, v) thus exhibits several scales when |v| tends to infinity, which is the basic observation motivating our analysis. Actually, a simple inspection will show that, for the same type of initial conditions as above, the quadratic form associated with ā(t, v) has two regimes when |v| is large. Along unit vectors parallel to v, the quadratic form takes values of order 1. Along unit vectors orthogonal to v, it takes values of order |v|². This suggests that the mass is spread out at a standard diffusive rate along radial directions, but at a much quicker rate along tangential directions.
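The two regimes of the quadratic form are easy to check numerically. The sketch below is our illustration, not taken from the paper: it takes for f(t, ·) an isotropic, centered, unit-variance Gaussian sample, forms a Monte Carlo approximation of ā, and evaluates the associated quadratic form along a radial and a tangential direction; the helper names `a` and `abar` are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def a(w):
    # a(w) = |w|^2 Id_N - w ⊗ w (Maxwellian molecules, up to a constant)
    return np.dot(w, w) * np.eye(len(w)) - np.outer(w, w)

def abar(v, samples):
    # Monte Carlo approximation of abar(v) = E[a(v - V)], with V distributed as f(t, .)
    return np.mean([a(v - s) for s in samples], axis=0)

N = 3
samples = rng.standard_normal((20_000, N))     # stand-in for the density f(t, .)
v = np.array([50.0, 0.0, 0.0])                 # |v| large
A = abar(v, samples)

e_rad = v / np.linalg.norm(v)
e_tan = np.array([0.0, 1.0, 0.0])
radial = e_rad @ A @ e_rad           # stays of order 1 (here ≈ N - 1 = 2)
tangential = e_tan @ A @ e_tan       # of order |v|^2 (here ≈ |v|^2 + N - 1)
```

For this isotropic sample the exact values are N − 1 along the radial direction and |v|² + N − 1 along tangential ones, so the anisotropy ratio grows like |v|².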
One of the main objectives of the paper is to quantify this phenomenon precisely and to specify how it affects the lower bound (1.3), especially for highly anisotropic initial conditions. We also intend to discuss the sharpness of the bound by investigating the corresponding upper bound.
The strategy we have in mind is probabilistic. The starting point consists in deriving a probabilistic interpretation of the nonlinear operator L by means of a stochastic diffusion process (X_t)_{t≥0} interacting with its own distribution, in the spirit of McKean's approach to Vlasov type equations (see Sznitman [22]). Actually, McKean-Vlasov representations of the Landau equation were already investigated in earlier works by Funaki [9,10,11,12] and more recently by Guérin [13,14]. Part of the analysis developed in this series of papers is based on a very useful trick for representing the square root of the diffusion matrix ā, the square root playing a key role in the dynamics of the stochastic process involved in the representation. In short, the key point therein is to enlarge the underlying probability space in order to identify the diffusive term with the stochastic integral of the square root of a (and not the square root of ā) with respect to a two-parameter white noise process. Basing the representation on the square root of a makes it more tractable, since a(v) has a very simple geometric interpretation in terms of the orthogonal projection onto the hyperplane v^⊥ orthogonal to v. In this paper, we go one step further into the explicitness of the representation. A crucial consequence is to prove that large deviations of the process (Z_t)_{t≥0} play an essential role in the shape of the off-diagonal decay of the transition density. Precisely, because of these large deviations, we can show that, when the initial condition of the transition is restricted to compact sets, the off-diagonal decay of the transition density is not Gaussian but is a mixture of exponential and Gaussian regimes, see Theorem 2.12.
Besides the density estimates, we feel that our representation of the solution raises several questions and could serve as a basis for further investigations. Obviously, the first one concerns possible extensions to more general cases, when the coefficients include a hard or soft potential (so that the molecules are no longer Maxwellian) or when the solution of the Landau equation also depends on the position of the particles (and not only on their velocities). In the same spirit, we could also wonder about a possible adaptation of this approach to the Boltzmann equation itself. Finally, the representation might also be useful to compute the solution numerically, providing a new angle on the particle approach developed by Fontbona et al. [6] and Carrapatoso [3] or Fournier [7]. We leave all these questions for future research.
The paper is organized as follows. Main results are detailed in Section 2. In Section 3, we give some preliminary estimates concerning the Brownian motion on SO N (R). Section 4 is devoted to the analysis of the non-degenerate case and Section 5 to the degenerate case.
A couple of processes (X, Y) on a probability space (Ω, F, P) is called a solution of the Landau SDE if Y has the same law as X and, for all t ≥ 0, the following equation holds:

X_t = X_0 + ∫_0^t ∫_{R^N} σ(X_s − v) W(dv, ds) − (N − 1) ∫_0^t ∫_{R^N} (X_s − v) µ_s(dv) ds,   (2.1)

where µ_s stands for the law of Y_s, W is an R^N-valued space-time white noise with covariance measure µ_s(dv) ds, and σ is an N × N matrix such that σσ^⊤ = a, the symbol ⊤ standing from now on for transposition. Roughly speaking, the connection with (1.1) can be derived by computing the generator of (2.1), thus identifying the local covariance in (2.1) with the diffusion matrix ā. Existence and uniqueness of a solution to (2.1) have been discussed in [13].
The starting point of our analysis is the geometric interpretation of the covariance matrix

a(v) = |v|² Π(v),   (2.2)

where, for v ≠ 0, Π(v) := Id_N − (v ⊗ v)/|v|² is the orthogonal projection onto v^⊥. Indeed, the key observation is that a(v) also reads as the covariance matrix of the image of v by an antisymmetric standard Gaussian matrix of dimension N × N:

a(v) = E[(Gv)(Gv)^⊤],  where G = (g_{i,j})_{1≤i,j≤N},  g_{j,i} = −g_{i,j},  with (g_{i,j})_{1≤i<j≤N} i.i.d. standard Gaussian variables.   (2.3)

The proof is just a consequence of the fact

E[g_{i,k} g_{j,l}] = (δ_{i,j} δ_{k,l} − δ_{i,l} δ_{j,k}) 1_{i≠k},
so that  E[(Gv)_i (Gv)_j] = |v|² δ_{i,j} − v_i v_j,

where we have used the Kronecker symbol in the second line. We derive the following result, which is at the core of the proof:

Lemma 2.1. Given the process (Y_t)_{t≥0}, solution to Equation (2.1), consider the solution (X_t)_{t≥0} to the SDE (2.4). Then, (X_t)_{t≥0} has the same law as (Y_t)_{t≥0} and thus as the solution of the Landau SDE.
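The covariance identity (2.3) is easy to verify by simulation. The following sketch (ours, not from the paper) draws antisymmetric Gaussian matrices and compares the empirical covariance of Gv with |v|² Id_N − v ⊗ v:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 100_000
v = np.array([1.0, -2.0, 0.5, 3.0])

# G antisymmetric, with i.i.d. standard Gaussian entries above the diagonal
U = np.triu(rng.standard_normal((M, N, N)), k=1)
G = U - U.transpose(0, 2, 1)

Gv = G @ v                                            # shape (M, N)
emp = (Gv[:, :, None] * Gv[:, None, :]).mean(axis=0)  # empirical covariance of Gv
target = np.dot(v, v) * np.eye(N) - np.outer(v, v)    # a(v) = |v|^2 Id - v ⊗ v
err = np.max(np.abs(emp - target))
```

The maximal entrywise error decays at the usual Monte Carlo rate M^{-1/2}.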
Proof. The proof follows from a straightforward identification of the bracket (in time) of the martingale part with ā(t, X_t) dt.
The representation (2.4) is linear and therefore factorizes through the resolvent. Namely, the solution (X_t)_{t≥0} to (2.4) admits the representation (2.5), where the resolvent process (Z_t)_{t≥0} solves the SDE (2.6). The proof follows from a straightforward application of Itô's formula, noticing that the bracket of B inherits the covariance structure of G in (2.3), namely ⟨B^{i,j}, B^{k,l}⟩_t = (δ_{i,k} δ_{j,l} − δ_{i,l} δ_{j,k}) t for i ≠ j. In particular, ((B^{i,j}_t)_{1≤i<j≤N})_{t≥0} is a standard Brownian motion with values in R^{N(N−1)/2}. The matrix-valued process B thus corresponds to the Brownian motion on the set A_N(R) of antisymmetric matrices. Recalling that A_N(R) is the Lie algebra of the special orthogonal group, this allows us to identify (Z_t)_{t≥0} with the right Brownian motion on SO_N(R) (see e.g. Chapter V in Rogers and Williams [20] and Chapter VII in Franchi and Le Jan [8]).
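For intuition, the Brownian motion on SO_N(R) can be simulated by pushing antisymmetric Gaussian increments through the exponential map. This geometric Euler scheme is one standard discretization, our choice rather than anything used in the paper; its virtue is that every iterate stays exactly on SO_N(R):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3

def antisym_increment(rng, N, dt):
    # Brownian increment on the Lie algebra A_N(R): i.i.d. N(0, dt) above the diagonal
    U = np.triu(rng.standard_normal((N, N)) * np.sqrt(dt), k=1)
    return U - U.T

def expm(H, terms=30):
    # matrix exponential by truncated power series (accurate for small ||H||)
    out, term = np.eye(len(H)), np.eye(len(H))
    for k in range(1, terms):
        term = term @ H / k
        out = out + term
    return out

# geometric Euler scheme: the exponential of an antisymmetric matrix is a rotation,
# so each iterate lies in SO_N(R) up to floating-point error
Z, dt = np.eye(N), 1e-3
for _ in range(1000):
    Z = expm(antisym_increment(rng, N, dt)) @ Z

orth_err = np.max(np.abs(Z.T @ Z - np.eye(N)))   # should be ≈ 0
det_err = abs(np.linalg.det(Z) - 1.0)            # should be ≈ 0
```

Orthogonality and unit determinant are preserved to floating-point accuracy along the whole trajectory, in contrast with a naive Euler scheme on the matrix entries.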

2.2.
Conditional representation of the transition density. Throughout the paper, we shall assume that the centering condition E[X_0] = 0 is in force. Actually, there is no loss of generality since, whenever E[X_0] ≠ 0, one can always reduce to the centered case by translating the initial condition. The main representation of the conditional density is then the following:

Proposition 2.3. Assume that X_0 is not a Dirac mass and is centered. Then, for all t > 0, the conditional law of X_t given X_0 = x_0 has a density f_{x_0}(t, ·), which can be expressed as

f_{x_0}(t, v) = E[ (2π)^{−N/2} (det C_t)^{−1/2} exp( −(1/2) ⟨C_t^{−1}(v − Z_t x_0), v − Z_t x_0⟩ ) ],   (2.10)

where C_t is the (stochastic) covariance matrix defined in (2.11). The proof of Proposition 2.3 is postponed to Section 3. From the above expression of the covariance matrix C_t, we introduce the (deterministic) matrix

Λ_s := E[a(X_s)] = E[ |X_s|² Id_N − X_s ⊗ X_s ].   (2.12)

The matrix Λ_s then plays a key role for the control of the non-degeneracy of the diffusion matrix ā(s, v), which, by (2.2), reads ā(s, v) = E[a(v − X_s)]. Since, for all s ≥ 0, E[X_s] = E[X_0] = 0, we get that, for all v ∈ R^N, ā(s, v) = a(v) + Λ_s, so that, for all ξ ∈ R^N,

⟨ā(s, v) ξ, ξ⟩ ≥ ⟨Λ_s ξ, ξ⟩,

where we used that a is positive semidefinite for the last inequality. The behavior of Λ_s can be summarized with the following result.
Proposition 2.4. Assume that X_0 is not a Dirac mass and is centered. Then, for any t > 0 and all ξ ∈ R^N, the lower bound (2.13) holds, the exponents therein being defined for all (t, β) ∈ R_+ × [0, 1].

Proposition 2.4 will be proved in the next section. For any t > 0, it provides a lower bound for the spectrum of Λ_t. There are two cases. If λ < 1, letting η := (1 − λ) ∧ (1 − 1/N) > 0 (with the standard notations a ∧ b := min(a, b) and a ∨ b := max(a, b)), it holds that, for any t ≥ 0 and ξ ∈ R^N, the lower bound (2.14) holds, so that Λ_t is non-degenerate, uniformly in time and space. If λ = 1, i.e. the law of X_0 is embedded in a line, then Λ_t is non-degenerate in positive time, uniformly on any [ε, +∞) × R^N, ε > 0. For t small, the lower bound for the spectrum behaves as 2(N − 1)t, so that Λ_t degenerates in small time.

2.3.
Estimates in the non-degenerate case. When λ < 1, the spectrum of C_t in (2.11) can be easily controlled, since Z_t Z_s^{−1} = Z_t Z_s^⊤ ∈ SO_N(R). In such a case, we then obtain from (2.10) the following first result for the conditional density of the Landau SDE:

Theorem 2.6. Assume that X_0 is not a Dirac mass, is centered with variance 1, and that its law is not supported on a line. Then, there exists C := C(N) ≥ 1 such that, for all t > 0 and x_0, v ∈ R^N,

C^{−1} t^{−N/2} E[ exp( −C |v − Z_t x_0|²/t ) ] ≤ f_{x_0}(t, v) ≤ C t^{−N/2} E[ exp( −|v − Z_t x_0|²/(C t) ) ].

Remark 2.7. Observe that, since (Z_s)_{s≥0} defines an isometry, the off-diagonal cost |v − Z_t x_0|² may be rewritten as |Z_t^⊤ v − x_0|². This formulation may be better suited than the previous one when integrating the conditional density with respect to the initial law of X_0.

Now, exploiting the Aronson-like heat kernel bounds for the marginal density of the rotation process (Z_t)_{t≥0}, see e.g. Varopoulos et al. [23] or Stroock [21], we actually derive in Section 4 the following control:

Theorem 2.8 (Explicit bounds for the conditional density). Under the assumptions of Theorem 2.6, there exists C := C(N) ≥ 1 such that, for all t > 0 and x_0, v ∈ R^N, the two-sided bound (2.15) holds. If |x_0| ∧ |v| ≤ 1, then δ_t is equal to 1 and I can be chosen as I = |x_0 − v|², which corresponds to a usual Gaussian estimate.
We stress the fact that the above bounds are sharp. The contribution of ||v| − |x_0||² in I corresponds to a 'radial cost' and the contribution of |v/|v| − x_0/|x_0||² to a 'tangential cost'. The term (1 ∧ |v| ∧ |x_0|)² reads as the inverse of the variance along tangential directions. It must be compared with the variance along tangential directions in a standard Gaussian kernel, the inverse of which is of order (|v| ∧ |x_0|)², as shown in Remark 2.11 below. This says that, when |x_0| and |v| are greater than 1, f_{x_0}(t, v) is superdiffusive in the tangential directions. This is in agreement with the observations made in the Introduction: the non-Gaussian regime of the density for x_0 large occurs because of the superdiffusivity along iso-radial curves.
Anyhow, it is worth mentioning that the two-sided bounds become Gaussian when t tends to ∞. Indeed, noting that δ_t → 1 as t → ∞ and that the tangential cost (1 ∧ |v| ∧ |x_0|)² |v/|v| − x_0/|x_0||² is bounded by 4, (2.15) yields, for t large enough (with respect to |x_0|, uniformly in |v|) and for a new constant C (independent of |x_0| and |v|), the Gaussian-type bound (2.16). This coincides with the asymptotic behavior of the N-dimensional Gaussian kernel: in the Gaussian regime, the variance along the tangential directions is (|v| ∧ |x_0|)², which is less than |x_0|² and which shows, in the same way as in (2.16), that the Gaussian tangential cost is also small in comparison with t, uniformly in v. However, some differences persist asymptotically when |x_0| is large. Due to the superdiffusivity along the tangential directions in the Landau equation, the Landau tangential cost decays faster than the Gaussian one. Intuitively, the reason is that the 'angle' of the Landau process (X_t)_{t≥0} reaches the uniform distribution on the sphere at a quicker rate than in the Gaussian regime. Clearly, the fact that the system forgets the initial angle of x_0 in long time could be recovered from Theorem 2.6 by replacing (at least formally) Z_t by a uniformly distributed random matrix on SO_N(R). Of course, when the initial mass is already uniformly distributed along the spheres centered at 0, the marginal density of (X_t)_{t≥0} already behaves in finite time as if the transition density were Gaussian. We illustrate this property in the following corollary (the proof of which is deferred to the next section):

Corollary 2.9. Assume that X_0 admits an initial density of the radial form f(0, v) = ϕ(|v|), v ∈ R^N, for some Borel function ϕ : R_+ → R_+.
Then, we can find a constant C := C(N) ≥ 1 such that, for all t > 0, the marginal density f_t is bounded above and below by C^{±1} times a Gaussian kernel, where g_N denotes the standard Gaussian kernel of dimension N (with the appropriate time scaling) and where f_t is the solution of the Landau equation started from the above radial initial condition. To conclude this subsection, notice that the Gaussian regime (which corresponds to |x_0| ∧ |v| ≤ 1 in the statement of Theorem 2.8) can be derived from (2.15) using the following lemma and remark.
Lemma 2.10. There exists c := c(N) ≥ 1 such that, for all v, x_0 ∈ R^N \ {0},

c^{−1} ( ||v| − |x_0||² + (|v| ∧ |x_0|)² |v/|v| − x_0/|x_0||² ) ≤ |v − x_0|² ≤ c ( ||v| − |x_0||² + (|v| ∧ |x_0|)² |v/|v| − x_0/|x_0||² ).

Proof. The radial contribution ||v| − |x_0|| is obtained by orthogonal projection on a closed convex subset, and the lower bound follows. By convexity, we obtain the upper bound.
Remark 2.11. Let us consider two given points x_0, v ∈ R^N \ {0}. By Lemma 2.10, the Gaussian cost |v − x_0|²/t splits into a radial part ||v| − |x_0||²/t and a tangential part whose inverse variance is of order (|v| ∧ |x_0|)². In particular, when |x_0| ≤ 1, we derive from (2.15) in Theorem 2.8 the usual two-sided Gaussian estimates. Now, if |v| ≤ |x_0| and |v| ≤ 1, this still holds by symmetry.

2.4.
Estimates in the degenerate case. We now discuss the case when the initial condition is supported on a straight line, which by rotation invariance can be assumed to be directed by the first vector e_1 of the canonical basis. By Proposition 2.4, we already know that the matrix Λ_t (see (2.12)) driving the ellipticity of the covariance matrix C_t (see (2.11)) becomes non-degenerate in positive time. This says that, after a positive time t_0, the system enters the same regime as the one discussed in Theorem 2.8, so that the transition density of the process satisfies, after t_0, the bounds (2.15). Anyhow, this leaves open the small time behavior of the transition kernel of the process.
Here, we thus go thoroughly into the analysis and specify both the on-diagonal rate of explosion and the off-diagonal decay of the conditional density in small time. Surprisingly, we show that the tail of the density looks much more like an exponential distribution than a Gaussian one. Precisely, we show that the off-diagonal decay of the density is of Gaussian type for 'untypical' values only, which is to say that, for values where the mass is effectively located, the decay is of exponential type. Put differently, the two-sided bounds we provide for the conditional density read as a mixture of exponential and Gaussian distributions, see Theorem 2.12.

The reason why the conditional density follows a mixture of exponential and Gaussian rates may be explained as follows in the simplest case when x_0 = 0. The starting point is formula (2.10) in Proposition 2.3. When the initial condition is degenerate, the conditional covariance matrix C_t in (2.11) has two scales. As shown right below, the eigenvalues of C_t along the directions e_2, ..., e_N are of order t, whereas the eigenvalue λ¹_t of C_t along the direction e_1 is of order t² with large probability. Anyhow, with exponentially small probability, λ¹_t is of order t: precisely, the probability that it is of order ξt has logarithm of order −ξ/t when ξ ∈ (0, 1). Such large deviations of λ¹_t follow from large deviations of (Z_s)_{0≤s≤t} far away from the identity. This rough description makes it possible to compare the contributions of typical and rare events in the formula (2.10) for the density f_{x_0}(t, v), when computed at a vector v parallel to the direction e_1. On typical scenarios, the off-diagonal cost ⟨C_t^{−1} v, v⟩ in the exponential appearing in (2.10) is of order |v|²/t².
By comparison, choosing ξ of order |v|, the events associated with large deviations of C_t generate an off-diagonal cost ⟨C_t^{−1} v, v⟩ of order |v|/t with an exponentially small probability of logarithmic order −|v|/t: the resulting contribution to the off-diagonal decay is of order |v|/t, which is clearly smaller than |v|²/t². This explains the exponential regime of f_{x_0}(t, v). The Gaussian one follows from a threshold phenomenon: as (Z_s)_{0≤s≤t} takes values in SO_N(R), there is no chance for its entries to exceed 1 in norm. Basically, it means that, when |v| is large, the best choice for ξ is not |v| but 1: the corresponding off-diagonal cost is |v|²/t, which occurs with probability of logarithmic order −1/t. This explains the Gaussian part of f_{x_0}(t, v).
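The tradeoff just described can be mimicked with a toy optimization. The cost functional below is a deliberate caricature of the discussion above (our notation, not the paper's rate function): given t and |v|, we minimize the Gaussian cost |v|²/(ξt) attached to an eigenvalue of order ξt, plus the large-deviation price ξ/t of producing it, over the admissible range ξ ∈ (0, 1]:

```python
import numpy as np

def log_cost(v_abs, t, xi):
    # Gaussian cost for an eigenvalue of order xi * t, plus the
    # large-deviation price of order xi / t of producing it
    return v_abs**2 / (xi * t) + xi / t

t = 0.01
xis = np.linspace(1e-4, 1.0, 20_000)   # xi cannot exceed 1: Z lives in SO_N(R)

best_small = xis[np.argmin(log_cost(0.3, t, xis))]   # 'typical' value, |v| < 1
best_large = xis[np.argmin(log_cost(4.0, t, xis))]   # 'untypical' value, |v| > 1
# best_small sits at xi ≈ |v| (exponential regime, total cost of order |v|/t);
# best_large saturates the threshold xi = 1 (Gaussian regime, cost of order |v|^2/t)
```

The unconstrained minimizer is ξ = |v|, reproducing the exponential rate |v|/t; once |v| exceeds 1, the constraint ξ ≤ 1 becomes active and the cost switches to the Gaussian rate |v|²/t.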
In the case when the conditioned initial position x_0 is not zero, specifically when it is far away from 0, things become much more intricate, as the transport of the initial position x_0 by Z_t affects the density. This is the reason why we consider a compactly supported initial condition. By way of comparison, notice that, in the non-degenerate case, (2.15) gives Gaussian estimates when x_0 is restricted to a compact set. This is exactly what the statement of Theorem 2.8 says when |x_0| ≤ 1, the argument working in the same way when |x_0| ≤ C_0, for some C_0 > 1.

Proof of Proposition 2.3. We claim:

Lemma 3.1. The processes (B_t)_{t≥0} and (B̃_t)_{t≥0} are independent. Also, the processes (Z_t)_{t≥0} and (B̃_t)_{t≥0} are independent.
Proof. We know that, setting Z̃_t := exp((N − 1)t) Z_t, the process (Z̃_t)_{t≥0} is measurable with respect to (B_t)_{t≥0}. Hence it suffices to show that B and B̃ are independent. As both are Gaussian processes, this can be easily proved by computing their covariance, which turns out to be zero when X_0 is centered, see (2.8).
Recalling (2.5), we can rewrite X_t as in (3.2), X_0 being independent of (B_t, B̃_t)_{t≥0}. Using (2.3) to compute the covariance matrix of the Gaussian process (B̃_t)_{t≥0} in (3.3), the existence of the transition density and the representation (2.10) are direct consequences of (3.2) and Lemma 3.1. This proves Lemma 2.2.

3.2.
Additional properties of the resolvent process. We give in this paragraph some additional properties of the process Z that are needed for the derivation of the density estimates. We will make use of the following lemma, whose proof can be found in Franchi and Le Jan [8], see Theorem VII.2.1 and Remark VII.2.6.

Lemma 3.2. Given t > 0, the process (Z_t Z^⊤_{t−s})_{0≤s≤t} has the same law as the process (Z_s)_{0≤s≤t}.

3.3. Proof of Proposition 2.4. Since we also assumed that E[X_0] = 0, the process (X_t)_{t≥0} is centered. The point is then to compute the time derivative of E[|X_t|²]; it vanishes, so that the energy is preserved:

E[|X_t|²] = E[|X_0|²],   t ≥ 0.

Moreover, using the expression (2.1) of the Landau SDE (which implies that, just in the computation right below, W becomes again an N-dimensional space-time white noise), we can compute the time derivative of E[X_t ⊗ X_t]. Plugging the resulting values into (3.6), we get the announced result.
3.4. Proof of Corollary 2.9. By Theorem 2.8, the result is straightforward when |v| ≤ 1 (as the transition density has a Gaussian shape). When |v| ≥ 1, the problem can be reformulated as follows. Given a constant C > 0, the point is to estimate the quantity q_t(v) defined in (3.7), where we denote by δ_t(|x_0|) the quantity δ_t of Theorem 2.8 associated with the initial condition x_0. By a polar change of variables, we get (3.8), where ν_{S^{N−1}} denotes the Lebesgue measure on the sphere S^{N−1} of dimension N − 1.
As we shall make use of its renormalized version below, we normalize ν_{S^{N−1}}, so that it reads as a probability measure. Up to a multiplicative constant, the above expression remains unchanged. In particular, as we are just interested in lower and upper bounds for q_t(v), we can keep the above as a definition of q_t(v), with ν_{S^{N−1}} being normalized.
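Before recalling the heat kernel bounds on S^{N−1}, note that in the simplest case N = 2 the sphere is a circle and the heat kernel is an explicit periodization of the Gaussian kernel on R, so the two-sided Gaussian-type estimate can be checked by hand (this illustration is ours, not the paper's):

```python
import numpy as np

def p_circle(t, theta, K=50):
    # heat kernel on S^1 for d/dt = Laplacian: periodization of the Gaussian kernel
    ks = np.arange(-K, K + 1)
    return np.sum(np.exp(-(theta + 2 * np.pi * ks) ** 2 / (4 * t))) / np.sqrt(4 * np.pi * t)

t = 0.05
ratios = []
for theta in (0.1, 1.0, 3.0):
    d = min(abs(theta), 2 * np.pi - abs(theta))       # geodesic distance to 0
    gauss = np.exp(-d ** 2 / (4 * t)) / np.sqrt(4 * np.pi * t)
    ratios.append(p_circle(t, theta) / gauss)
# each ratio stays close to 1 in small time: the kernel is Gaussian up to constants
```

In small time only the nearest periodized image contributes, which is exactly the Gaussian estimate with the geodesic distance in the exponent.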
Let us now recall the following two-sided heat kernel estimate on S^{N−1}, see e.g. [21]: there exists C′ := C′(N) ≥ 1 such that, for all t > 0, the heat kernel on the sphere is bounded above and below in a Gaussian way. Therefore, what really counts in the expression of q_t(v) is the product appearing in (3.9). Up to a redefinition of the function q_t, it is thus sufficient to consider the reduced expression (3.10). Compare now with what happens when the convolution in (3.7) is performed with respect to the Gaussian kernel. Basically, δ_t(|x_0|) is replaced by 1 and 1 ∧ |x_0| by |v| ∧ |x_0| (see Remark 2.11). Equivalently, δ_t(ρ) is replaced by 1 and 1 ∧ ρ by |v| ∧ ρ in (3.8). This says that, in (3.9), 1 ∧ ρ is replaced by |v| ∧ ρ. Then, in (3.10), δ_t(ρ) is replaced by 1 and 1 ∧ ρ by |v| ∧ ρ, which leads exactly to the same three equalities. This shows that, in the Gaussian regime, the right quantity to consider is also (3.11).

The non-degenerate case

4.1. Heat kernel bounds on SO_N(R). From [23], we derive that, for t > 0, the law of Z_t has a density, denoted by p_{SO_N}(t, Id_N, ·), with respect to the Haar probability measure µ_{SO_N} of SO_N(R). Moreover, there exists a constant β > 1 such that, for any g ∈ SO_N(R) and for all t > 0, the two-sided bound (4.1) holds, where d_{SO_N}(Id_N, g) denotes the Carnot distance between Id_N and g, ‖·‖ standing for the usual matrix norm on M_N(R). The proof of the diagonal rate in (4.1) relies on the following volume estimate from Theorem V.4.1 in [23]: by compactness of SO_N(R), there exists C_N ≥ 1 such that, for all t > 0, the estimate (4.2) holds, where B_{SO_N}(Id_N, ρ), for ρ > 0, denotes the ball of radius ρ and center Id_N for the Carnot distance. By local inversion of the exponential, it is readily checked that the Carnot distance is continuous with respect to the standard matrix norm on M_N(R). In particular, by compactness of SO_N(R), it is bounded on the whole group. Actually, we claim:

Lemma 4.1 (Equivalence between Carnot distance and matrix norm on the group). There exists a constant C := C(N) > 1 such that, for any g ∈ SO_N(R),

C^{−1} ‖Id_N − g‖ ≤ d_{SO_N}(Id_N, g) ≤ C ‖Id_N − g‖.

Proof. We first prove the upper bound.
Considering a given g ∈ SO_N(R), we can assume without any loss of generality that ‖Id_N − g‖ ≤ ε, for some arbitrarily prescribed ε > 0. Indeed, if ‖Id_N − g‖ > ε, the upper bound directly follows from the boundedness of the Carnot distance on the group.
Choosing ε small enough, we can assume that the logarithm mapping on M_N(R) realizes a diffeomorphism from the ball of center Id_N and radius ε > 0 onto some open subset around the null matrix. Then, letting H := ln(g), we deduce from the variational definition of the distance that d_{SO_N}(Id_N, g) ≤ ‖H‖. Writing H = ln(Id_N + (g − Id_N)), we obtain that ‖H‖ ≤ C ‖g − Id_N‖ for some C := C(N), which proves the upper bound. The converse is proved in a similar way. Without any loss of generality, we can assume that d_{SO_N}(Id_N, g) ≤ ε, for some given ε > 0. By the variational definition of the distance, this says that there exists a matrix H ∈ A_N(R) such that exp(H) = g and d_{SO_N}(Id_N, g) ≥ ‖H‖/2, with ‖H‖ ≤ 2ε. By the Lipschitz property of the exponential around 0, ‖g − Id_N‖ ≤ C ‖H‖ (for a possibly new value of the constant C), which yields ‖g − Id_N‖ ≤ 2C d_{SO_N}(Id_N, g).
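The content of Lemma 4.1 near the identity can be probed numerically: for g = exp(H) with H small and antisymmetric, ‖H‖ plays the role of the Carnot distance, and the ratio ‖H‖/‖g − Id_N‖ stays bounded above and below. The sketch is ours, and we use the Frobenius norm as a stand-in for the matrix norm:

```python
import numpy as np

rng = np.random.default_rng(3)

def expm(H, terms=30):
    # truncated power series for the matrix exponential (accurate for small ||H||)
    out, term = np.eye(len(H)), np.eye(len(H))
    for k in range(1, terms):
        term = term @ H / k
        out = out + term
    return out

N = 4
ratios = []
for _ in range(200):
    U = np.triu(rng.standard_normal((N, N)), k=1)
    H = 0.1 * (U - U.T)                 # small antisymmetric matrix, g = exp(H) near Id
    g = expm(H)
    ratios.append(np.linalg.norm(H) / np.linalg.norm(g - np.eye(N)))
# the ratios are uniformly bounded away from 0 and from infinity
```

Spectrally, each eigenvalue iθ of H contributes |e^{iθ} − 1| = 2|sin(θ/2)| to ‖g − Id_N‖, so for ‖H‖ small the ratio is pinned close to 1, in line with the Lipschitz arguments in the proof.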
Part of our analysis relies on a specific parametrization of SO_N(R) by elements of S^{N−1} × SO_{N−1}(R), where S^{N−1} is the sphere of dimension N − 1. Namely, for an element h ∈ SO_{N−1}(R), we denote by L_h the element of SO_N(R) acting as the identity on e_1 and as h on the span of (e_2, ..., e_N). Moreover, for an element s ∈ S^{N−1}, we denote by V_s an element of SO_N(R) such that V_s e_1 = s. It is constructed in the following way. When ⟨s, e_1⟩ ≠ 0, the family (s, e_2, ..., e_N) is linearly independent. We can orthonormalize it by means of the Gram-Schmidt procedure: by induction, we let u_1 := s and, for i ∈ {2, ..., N}, u_i := e_i − Σ_{j=1}^{i−1} ⟨e_i, s_j⟩ s_j, and then s_i := u_i/|u_i|, for all i ∈ {1, ..., N}, so that s_1 = s. Then, the family (s_1, s_2, ..., s_N) is an orthonormal basis and V_s is given by the change-of-basis matrix expressing the (s_i)_{1≤i≤N} in the basis (e_i)_{1≤i≤N}. When ⟨s, e_1⟩ = 0, we consider ⟨s, e_2⟩. If ⟨s, e_2⟩ ≠ 0, then the family (s, e_3, ..., e_N, e_1) is linearly independent and we can apply the Gram-Schmidt procedure. If ⟨s, e_2⟩ = 0, we then go on until we find some index k ∈ {3, ..., N} such that ⟨s, e_k⟩ ≠ 0. Such a construction ensures that the mapping s ↦ V_s is measurable. With s ↦ V_s and h ↦ L_h at hand, we claim that the mapping φ : S^{N−1} × SO_{N−1}(R) ∋ (s, h) ↦ V_s L_h ∈ SO_N(R) is a bijection. In other words, V^⊤_{g e_1} g always fits some L_h, the value of h being uniquely determined by the lower block ((V^⊤_{g e_1} g)_{i,j})_{2≤i,j≤N}, which proves the bijective property of φ. Denote then by Π_{N−1} the projection mapping which associates with g ∈ SO_N(R) this lower block. The mapping φ allows us to disintegrate the Haar measure on SO_N(R) into the product of the Lebesgue probability measure ν_{S^{N−1}} on the sphere S^{N−1} and the Haar probability measure on SO_{N−1}(R). We have the following result, see e.g. Proposition III.3.2 in [8] for a proof:

∫_{SO_N(R)} F(g) µ_{SO_N}(dg) = ∫_{S^{N−1}} ∫_{SO_{N−1}(R)} F(V_s L_h) µ_{SO_{N−1}}(dh) ν_{S^{N−1}}(ds),   (4.4)

for every bounded measurable function F : SO_N(R) → R.
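The construction of V_s can be coded directly. The sketch below (ours) handles the generic case ⟨s, e_1⟩ ≠ 0 via Gram-Schmidt; as a purely numerical convenience not needed in the text, it flips the last column when the determinant comes out to be −1, so that the output lies in SO_N(R) and still maps e_1 to s:

```python
import numpy as np

def V(s):
    # Build V_s in SO_N(R) with V_s e_1 = s by Gram-Schmidt on (s, e_2, ..., e_N),
    # following the construction in the text (generic case <s, e_1> != 0)
    N = len(s)
    basis = [s / np.linalg.norm(s)]
    for i in range(1, N):
        e = np.zeros(N)
        e[i] = 1.0
        u = e - sum(np.dot(e, b) * b for b in basis)
        basis.append(u / np.linalg.norm(u))
    Vs = np.column_stack(basis)          # columns (s_1, ..., s_N), so Vs e_1 = s
    if np.linalg.det(Vs) < 0:            # our fix: flip last column to land in SO_N
        Vs[:, -1] *= -1.0
    return Vs

rng = np.random.default_rng(4)
s = rng.standard_normal(5)
s /= np.linalg.norm(s)
Vs = V(s)
```

One checks that Vs is orthogonal, has unit determinant, and that its first column is s, as required of V_s.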

4.2.
Proof of Theorem 2.8. From (4.1) and Theorem 2.6, we derive the following two-sided bound for the conditional density: there exists C̃ := C̃(N) ≥ 1 such that, for all t > 0, the bound (4.5) holds. It will be the starting point to derive the bounds of Theorem 2.8.

Gaussian Regime.
Let us first concentrate on the bounds when |x_0| ∧ |v| ≤ 1. Without loss of generality, we can assume by symmetry that |x_0| ≤ 1. Indeed, the roles of x_0 and v can be exchanged in (4.5). Moreover, the Haar measure is invariant by transposition. This can be checked as follows. If Z is distributed according to the Haar measure, then, for any rotation ρ, ρZ^⊤ = (Zρ^⊤)^⊤. Since Zρ^⊤ has the same law as Z (as the group is compact, it is known that the Haar measure is invariant by both left and right multiplications), we deduce that the law of Z^⊤ is invariant by rotation. Now, from (4.5) and the assumption |x_0| ≤ 1, we get the Gaussian upper bound, the constant C̃ being allowed to increase from line to line. On the other hand, using once again Lemma 4.1 and (4.1) and choosing C̃ large enough, we get the matching lower bound, where C̃ is chosen such that, for all g ∈ SO_N(R), ‖Id_N − g‖ ≤ C̃.

Non-Gaussian Regime.
We now look at the case |x_0| ∧ |v| > 1. Starting from (4.5) and Lemma 4.1, we aim at giving, for a given c > 0, upper and lower bounds, homogeneous to those of (2.15), for the quantity t^{−N/2} p_{x_0}(t, v), where p_{x_0}(t, v) is defined in (4.6). Since the Haar measure is invariant by transposition, the roles of v and x_0 can be exchanged in formula (4.6) and we can assume that |v| ≥ |x_0|.
By Lemma 2.10 (with x_0 replaced by g x_0, and noting that |g x_0| = |x_0|), the cost |v − g x_0|² splits into a radial and a tangential part.

Radial cost. The term ||v| − |x_0||² is referred to as the radial cost. Since it is independent of g, we can focus on the other one, called the tangential cost. Then, changing v into (|x_0|/|v|)v, we can assume that |v| = |x_0|.

Tangential cost. We now assume that |v| = |x_0|. By rotation, we can assume that x_0 = |x_0| e_1. Then, we can write v = |x_0| h e_1 for some h ∈ SO_N(R). We then expand the cost in (4.6) accordingly. The strategy is then quite standard and consists in reducing the quadratic form |e_1 − g e_1|² + |x_0|² |h e_1 − g e_1|². Completing the square, we finally get (4.7), the second term of which is independent of g. Now, we notice that

| (e_1 + |x_0|² h e_1) / (1 + |x_0|²) | ≤ 1.
Since |x_0|² > 1, we have |e_1 + |x_0|² h e_1| > 0. Therefore, we can proceed as in the previous paragraph: in the first term inside the second exponential in (4.7), we use Remark 2.11 to split the radial and tangential costs. Up to multiplicative constants, the radial part can be bounded from above by (1 + |x_0|²)^{−2} (1 − ⟨e_1, h e_1⟩). In particular, up to a modification of the constant c in p_{x_0}(t, v), we can see the radial cost as a part of the exponential pre-factor in (4.7). Therefore, without any ambiguity, we can slightly modify the definition of p_{x_0}(t, v) and assume that it writes as in (4.9).

Lower bound. Observe first that, for all i ∈ {2, ..., N}, the relevant projections can be controlled, using that V_s defines an isometry for the last control. From Lemma 4.1, we then derive a two-sided comparison with constants (c_1, c_2) := (c_1, c_2)(N). By (4.1), applied with N − 1 instead of N, we get that there exists C := C(N) ≥ 1 (the value of which is allowed to increase below) such that (4.12) holds. Let us restrict the integral to a neighborhood V_s̄ of s̄ in S^{N−1} of the form (4.13). We can write, for i ∈ {2, ..., N}, the expansion (4.14). Now, by (4.13), we get (4.15). Therefore, by (4.14) and (4.15) and by a standard induction, for all i ∈ {2, ..., N}, the bound (4.16) holds. In the above, we can always choose the sign in ∓ so that 1 ∓ ⟨e_1, s⟩ ≥ 1. Therefore, for all i ∈ {2, ..., N}, we get (4.17). Since, for s ∈ V_s̄, |⟨s, e_k⟩| ≤ |⟨s̄, e_k⟩| + t^{1/2}/|x_0|, we deduce from (4.17) a lower bound for the restricted integral. We then derive from (4.12) the announced lower bound, denoting by δ_t the quantity introduced in Theorem 2.8 and using (4.11) for the last inequality. Assume first that ⟨e_1, h e_1⟩ ≤ 0. The above yields the required estimate. Recalling that, for the tangential cost analysis, we have assumed |x_0| = |v|, we derive the claim.
Assume now that ⟨e_1, h e_1⟩ ≥ 0. It can be checked from the definition of s̄ in (4.10) that ⟨e_1, s̄⟩ ≥ ⟨e_1, h e_1⟩, so that we eventually get the same kind of bound. We conclude by the same argument as above.
Upper bound. Going back to (4.9) and using the fact that V_s ∈ SO_N(R) for any s ∈ S^{N−1}, we get (4.18). We then focus on the integral with respect to k, namely, for a given s ∈ S^{N−1}, the quantity q_t(s), the normalization (1 ∧ t)^{(N−1)(N−2)/4} standing for the order of the volume of the ball of radius t^{1/2} in SO_{N−1}(R). Denote by ŝ_{2,N} the square matrix of size N − 1 made of the corresponding column vectors. Now, we distinguish two cases. For a given ε > 0 to be specified next, we first consider the case when ‖ŝ_{2,N} − k‖ ≥ ε for any k ∈ SO_{N−1}(R). Then, there exists a constant c′ := c′(ε) > 0 such that ‖ŝ_{2,N} − k‖ ≥ c′ d_{SO_{N−1}}(Id_{N−1}, k), so that (up to a modification of c) q_t(s) ≤ C̃, for a constant C̃ := C̃(N).
Let us now assume that there exists k_0 ∈ SO_{N−1}(R) such that ‖ŝ_{2,N} − k_0‖ ≤ ε. By invariance by rotation of the Haar measure, we notice that q_t(s) can be bounded by the same integral with k_0^⊤ ŝ_{2,N} in place of ŝ_{2,N}. Letting s̃_{2,N} := k_0^⊤ ŝ_{2,N}, we notice that ‖s̃_{2,N} − Id_{N−1}‖ ≤ ε. This permits to define S̃_{2,N} := ln(s̃_{2,N}) (provided ε is chosen small enough).
Again, we distinguish two cases, according to the value of the variable k in the integral. When ‖s̃_{2,N} − k‖ ≥ ε, we can use the same trick as before and say that ‖s̃_{2,N} − k‖ ≥ c d_{SO_{N−1}}(Id_{N−1}, k). Repeating the computations, we get the same bound as above. When ‖s̃_{2,N} − k‖ ≤ ε, we have ‖Id_{N−1} − k‖ ≤ 2ε, so that k may be inverted by the logarithm and written as k = exp(K) for some antisymmetric matrix K of size N − 1.
By the local Lipschitz property of the logarithm, we deduce that, for such a k (and for a new value of c′), ‖S̃_{2,N} − K‖ ≤ c′ ‖s̃_{2,N} − k‖. We then denote by H̃_{2,N} the orthogonal projection of S̃_{2,N} onto A_{N−1}(R). Clearly, H̃_{2,N} is in the neighborhood of 0. By the local Lipschitz property of the exponential, we finally obtain (again, for a new value of c′) a lower bound for ‖s̃_{2,N} − k‖ in terms of ‖H̃_{2,N} − K‖. Letting h̃_{2,N} := exp(H̃_{2,N}), we end up with a bound on q_t(s), where we have used Lemma 4.1 on SO_{N−1}(R) to get the second line. By a new rotation argument, this shows that q_t(s) ≤ C̃. Equation (4.18) thus yields an upper bound for p_{x_0}(t, v). Observing now that there exists c̄ > 1 such that c̄^{−1} |s − s̄| ≤ d(s, s̄) ≤ c̄ |s − s̄|, where d stands for the Riemannian metric on the sphere S^{N−1}, we then deduce from the heat kernel estimates in Stroock [21] the corresponding bound in terms of the heat kernel p_{S^{N−1}} on S^{N−1}. Since we have assumed |x_0| ≥ 1, we finally derive, up to a modification of C̃, an upper bound homogeneous to the lower bound, which completes the proof.

5. The degenerate case
The strategy to complete the proof of Theorem 2.12 relies on an expansion of Z_t in terms of iterated integrals of the Brownian motion on the Lie algebra A_N(R) of SO_N(R). In that framework, it is worth mentioning that we no longer exploit the underlying group structure. Instead, we explicitly make use of the Euclidean structure of A_N(R). Indeed, the analysis relies on precise controls of events described by the whole trajectory of Z. We manage to handle the probability of those events by controlling the corresponding trajectories of the A_N(R)-valued Brownian motion B. In that perspective, the heat kernel estimates (4.1) for the marginals of Z in SO_N(R) are not sufficient as, once again, the distribution of the whole path is needed to carry out the analysis.

5.1. Set-up. In the whole section, we will assume that degeneracy occurs along the first direction of the space, that is, X_0 has the form X_0 = X^1_0 e_1, where e_1 is the first vector of the canonical basis and X^1_0 is a square-integrable real-valued random variable. Because of the isotropy of the original equation, this choice is not restrictive. To make things simpler, in addition to the centering assumption E[X^1_0] = 0, we will also suppose (without any loss of generality) that X^1_0 is reduced, that is, E[(X^1_0)^2] = 1.
Given a real x^1_0, we will work under the conditional measure given {X^1_0 = x^1_0}, which we will still denote by P. Therefore, recalling (2.5) and (3.2), we will write (X_t)_{t≥0} in the whole section in a form that is understood as the conditional version of the original process (X_t)_{t≥0} given the initial condition X_0 = x^1_0 e_1. In this framework, the typical scales of X_t in small time t show that the fluctuations of the density are of order t in the first component and of order t^{1/2} in the other ones. Eq. (5.2) will be proved below.

5.2. Small time expansions.
The key point in the whole analysis lies in small time expansions of the process (Z_t)_{t≥0} and of the 'conditional covariance' matrix C_t in (2.11). The precise strategy is to expand both of them in small time, taking care of the tails of the remainders in the expansion (recalling that the covariance matrix is random). We thus remind the reader of the so-called Bernstein inequality, which will play a major role in the whole proof, see e.g. Revuz and Yor [19]:

Proposition 5.1 (Bernstein's inequality). Let (M_t)_{t≥0} be a continuous scalar martingale satisfying M_0 = 0. Then, for any A > 0 and σ > 0, P(M*_t ≥ A, ⟨M⟩_t ≤ σ²t) ≤ 2 exp(−A²/(2σ²t)), where we have used the standard notation M*_t := sup_{0≤s≤t} |M_s|.

Remark 5.2 (Notation for supremums). With a slight abuse of notation, for a process (Y_t)_{t≥0} with values in R^ℓ, ℓ ≥ 1, we will denote Y*_t := max_{i∈{1,...,ℓ}} (Y^i_t)*. Identifying R^ℓ ⊗ R^k with R^{ℓ×k}, we will also freely use these notations for matrix-valued processes.
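As a quick numerical sanity check of Bernstein's inequality (our own toy experiment, not part of the proof), one can take M to be a standard Brownian motion, for which ⟨M⟩_t = t, i.e. σ = 1, and compare a Monte Carlo estimate of P(M*_t ≥ A) with the bound 2 exp(−A²/(2t)); all parameter values below are illustrative choices.

```python
import numpy as np

# Monte Carlo check of Bernstein's inequality for M = Brownian motion,
# where <M>_t = t (so sigma = 1): P(M*_t >= A) <= 2 exp(-A^2 / (2t)).
rng = np.random.default_rng(0)
n_paths, n_steps, t, A = 20_000, 400, 1.0, 2.0

dB = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_paths, n_steps))
paths = np.cumsum(dB, axis=1)
M_star = np.abs(paths).max(axis=1)        # discretized running supremum M*_t

empirical = (M_star >= A).mean()
bound = 2.0 * np.exp(-A**2 / (2.0 * t))   # Bernstein bound, about 0.27 here

assert empirical <= bound
```

The empirical probability (roughly 4(1 − Φ(2)) ≈ 0.09 by the reflection principle) sits well below the Bernstein bound, as expected since the bound is not sharp for Brownian motion.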

5.2.1. Landau notations revisited.
In order to express the remainders in the expansion of the covariance matrix in a simple way, we will make intensive use of Landau notation, in various forms: Definition 5.3 (Landau notations). Given some T > 0, we let: (i) given a deterministic function (ψ_t)_{0≤t≤T} (scalar, vector or matrix valued), we write ψ_t = O(t^α), for some α ≥ 0 and for any t ∈ [0, T], if there exists a constant C := C(N, T) such that |ψ_t| ≤ C t^α.

5.2.2. Small time expansion of the Brownian motion on SO_N(R).
Following the proof of Lemma 3.1, we then expand Z_t accordingly. Given some time horizon T > 0, the remainders (S_t)_{0≤t≤T} and (R_t)_{0≤t≤T} can be controlled on [0, T]. What really counts in the sequel is the first column (Z^{·,1}_t) of the matrix Z_t; by antisymmetry of the matrix-valued process (B_t)_{t≥0}, the entries of this column can be expanded as well. By (3.3) and (3.6), we then notice that C_t can be rewritten in terms of Z̄_s := Z_t Z^⊤_{t−s}, s ∈ [0, t]. By the invariance in law of Lemma 3.2, we know that (Z_s)_{0≤s≤t} and (Z̄_s)_{0≤s≤t} have the same law. In particular, noting that Z_t = Z̄_t, a corresponding identity in law holds. Now, by (5.8) and (5.10), we can expand C̄_t in small time.
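To make the central object concrete, here is a minimal simulation sketch (our own, not from the paper) of a Brownian motion on SO_N(R): antisymmetric Gaussian increments in the Lie algebra A_N(R) are mapped to the group step by step. We use the Cayley transform as a stand-in for the exponential map, since it is exactly orthogonal for antisymmetric input; all function names are ours.

```python
import numpy as np

def skew_increment(n, dt, rng):
    # antisymmetric increment in A_n(R): independent N(0, dt) entries
    # above the diagonal, mirrored with opposite sign below
    A = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    A[iu] = rng.normal(0.0, np.sqrt(dt), size=len(iu[0]))
    return A - A.T

def brownian_on_SO(n, t, n_steps, rng):
    # geometric Euler scheme Z_{k+1} = Z_k cay(dB_k), where the Cayley map
    # cay(K) = (I - K/2)^{-1}(I + K/2) is exactly orthogonal for antisymmetric K
    dt = t / n_steps
    I = np.eye(n)
    Z = I.copy()
    for _ in range(n_steps):
        K = skew_increment(n, dt, rng)
        Z = Z @ np.linalg.solve(I - K / 2, I + K / 2)
    return Z

rng = np.random.default_rng(0)
Z = brownian_on_SO(4, 0.5, 200, rng)
orth_err = np.linalg.norm(Z.T @ Z - np.eye(4))   # stays at machine precision
```

Each factor is orthogonal with determinant 1, so the simulated path stays on SO_N(R) up to floating-point accumulation, which is the structural fact exploited throughout the section.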

5.3. Proof of the Lower Bound in Theorem 2.12.
We start from the representation formula (5.11), derived from the identity in law (5.9). We insist that we choose some 'untypical' events for the Brownian path on A_N(R) to derive the bounds of Theorem 2.12.

First Step. The point is to find some relevant scenarios explaining the typical behavior of f_{x_0}(t, v) in (5.11). Given ξ ∈ (0, 1] such that t/ξ² ≤ 1 and γ ∈ (0, 1], we thus introduce suitable events. Proof. On the event B^1, a first bound holds for all j ∈ {2,...,N}, since γt/ξ² ≤ 1. By independence of B^{j,i} and B^{k,1} for i, j, k ∈ {2,...,N}, we also know that, conditionally on B^1, the process (∫_0^s B^{j,1}_r dB^{i,j}_r)_{0≤s≤t} behaves as a Wiener integral, with a variance process less than (4ξ²s)_{0≤s≤t}. Therefore, using a Brownian change of time, we obtain a bound in terms of a one-dimensional Brownian motion (β_s)_{s≥0}. We deduce that there exists a constant c > 0 (whose value is allowed to increase from line to line) such that the required estimate holds. In fact, we must bound from below the conditional probability P(∩_{i,j=2}^N B^{i,j} | B^1). By antisymmetry of the matrix B and conditional independence of the processes (B^{i,j})_{2≤i<j≤N}, this probability factorizes. It thus remains to bound P(B^1) from below. For some j ∈ {2,...,N}, we deduce from Girsanov's theorem a lower bound in terms of a one-dimensional Brownian motion (β_s)_{s≥0}. By independence of the processes (B^{1,j})_{2≤j≤N}, we finally deduce the desired lower bound on P(B^1).
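The lower bound on P(B^1) is of small-ball type: forcing a Brownian path to stay in a tube of width ξ up to time t costs an exponential factor of order exp(−c t/ξ²). A hedged Monte Carlo illustration of this scaling for a single one-dimensional Brownian motion (our own toy check, with illustrative parameters; the constant π²/8 is the classical small-ball exponent, not one from the paper):

```python
import numpy as np

# Small-ball probability P(sup_{s<=t} |beta_s| <= xi) for a 1D Brownian motion;
# classical asymptotics give a decay of order exp(-pi^2 t / (8 xi^2)).
rng = np.random.default_rng(1)
n_paths, n_steps, t, xi = 20_000, 400, 1.0, 1.0

dB = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_paths, n_steps))
sup_abs = np.abs(np.cumsum(dB, axis=1)).max(axis=1)
empirical = (sup_abs <= xi).mean()

# leading-order value (4/pi) exp(-pi^2 t / (8 xi^2)), about 0.37 here
leading = (4.0 / np.pi) * np.exp(-np.pi**2 * t / (8.0 * xi**2))
```

The Monte Carlo estimate agrees with the leading-order value up to discretization bias (the discrete maximum slightly undershoots the continuous supremum), which is enough to see the exp(−c t/ξ²) cost at work.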

Third Step. We go thoroughly into the analysis of C̄^0_t. When ξ = γ = 0, the determinant of C̄^{00} can be computed explicitly by adding 1/2 times the column i to the first column, for any i = 2,...,N. We obtain as a result det(C̄^{00})|_{ξ=γ=0} = (N − 1)/12.
We deduce the corresponding control, and in a similar way we obtain an expansion in which, for γ and ξ² small enough, the remainder stays small. Therefore, referring to (5.21), we write C̄^0_t with M_t = O(t² + tξ^{9/4}) on B ∩ R, and we let the relevant matrix be given in terms of (C̄^{00})^{1/2}, the exponent 1/2 indicating the symmetric square root. Indeed, when γ = ξ = 0, C̄^{00} is the covariance matrix of a Gaussian vector built from (ζ_1,...,ζ_N) ~ N^{⊗N}(0, 1), so that it is a nonnegative symmetric matrix; since its determinant is positive, it is a positive symmetric matrix. By continuity, this remains true for ξ and γ small enough. For the same values of ξ and γ, (5.21) says that C̄^0_t is also symmetric and positive. Then, since the corresponding perturbation is small when t/ξ² and ξ are small, we can write the resulting factorization, provided t/ξ² and ξ are small enough.
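The symmetric square root invoked above can be computed from the spectral decomposition; a minimal self-contained sketch (our own helper, not from the paper):

```python
import numpy as np

def sym_sqrt(M):
    # symmetric square root of a symmetric positive-definite matrix M:
    # if M = Q diag(w) Q^T with w > 0, then M^{1/2} = Q diag(sqrt(w)) Q^T
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(np.sqrt(w)) @ Q.T

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # symmetric, positive determinant
R = sym_sqrt(C)              # R is symmetric and R @ R recovers C
```

This is the matrix denoted by the exponent 1/2 in the text: among all square roots of a positive symmetric matrix, it is the unique symmetric positive one.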
Therefore, for t/ξ² and ξ small enough, the matrix Id_N + M′_t, which is symmetric by construction, has all its eigenvalues between 1/2 and 2, so that the associated quadratic form is controlled for any given vector.

Fourth Step. We can summarize what we have proved in the following way: there exists a constant K := K(N) ≥ 1 such that, for max(t/ξ², ξ², γ) ≤ 1/K, Eq. (5.27) holds. Put differently, we are to bound an infimum over Kt ≤ ξ² ≤ 1/K, where we have used t ≤ ξ² in both expansions. Pay attention that this step is crucial as, together with the previous paragraph, it gives the joint behavior of (Z^{·,1}_t, C̄_t) on B ∩ R. Therefore, we can find a constant C := C(N) > 0 such that the announced bound holds. The value of C being allowed to increase from line to line, we now handle the minimization problem in (5.28) according to the value of ς. If ς ≥ 1/K, we choose ξ² = 1/K in the infimum. If ς ∈ [Kt, 1/K], we choose ξ² = ς in the infimum. This gives a lower bound for the exponential factor in (5.27) on the event B ∩ R. When x_0 ∈ [−C_0, C_0], we can modify C (allowing it to depend on C_0) in such a way that, in any of the three cases, the infimum is bounded from below. Therefore, modifying C′ if necessary, the conclusion follows. In the same way, (5.19) implies a similar bound for a polynomial function R_N on R^{N²}. Plugging (5.38) into (5.36), we obtain an integral bound. The covariance matrix M_t being given, the integral in the right-hand side can be interpreted as the probability that an N-dimensional centered Gaussian random vector with c^{−1}M_t as covariance matrix lies in the set {z ∈ R^N : sign(y_i) z_i > |y_i|, i = 1,...,N}. Conditionally on F^Z_t, we know that c^{−1/2} T^{−1}_t Z_t Γ_t is precisely a centered Gaussian vector with c^{−1} M_t as covariance matrix. Since the matrix T_t is diagonal, the Cauchy-Schwarz inequality then gives the required estimate. The proofs of Lemmas 5.6 and 5.7 are given in Subsections 5.4.2 and 5.4.3 respectively.

5.4.2. Derivation of the diagonal controls.
This subsection is dedicated to the proof of Lemma 5.6. Usually, in the Malliavin calculus approach to density estimates, this step is the most involved and requires a precise control of the determinant of the Malliavin covariance matrix, see e.g. Kusuoka and Stroock [17] or Bally [2]. In the current framework, the determinant of the 'covariance' matrix M_t still plays a key role, but the specific structure of that matrix, especially the fact that (Z_s)_{s≥0} defines an isometry, yields the required estimate almost for free.
Precisely, we have the following Proposition.
We here concentrate on det(C_t). The claim of the proposition indeed follows from the bound det(C_t) ≥ C t^{N+1}. (5.41) To derive (5.41), we recall the 'variational' formulation of the determinant for symmetric matrices (see for instance [4]) and the expression of C_t from (5.7). Since Z_t is an isometry, we have det(C_t) = det(Ĉ_t) for the conjugated matrix Ĉ_t. To achieve the proof of Lemma 5.6, it therefore remains to check that the entries of the matrix M_t are bounded in any L^p(P), p ≥ 1 (uniformly on [0, T]). With the notation of Definition 5.3, Lemma 5.6 will follow from the control: for all (i, j) ∈ {1,...,N}², (M_t)_{i,j} = O_P(1). (5.43) Combined with Proposition 5.8, this will indeed yield that M^{−1/2}_t also satisfies (5.43) (by controlling from above and below the eigenvalues of M_t in terms of its determinant and its norm). Equation (5.43) is easily derived from (5.9), (5.15) and the definition of the scale matrix t^{1/2} T_t in (5.33).
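The reduction det(C_t) = det(Ĉ_t) only uses the elementary fact that conjugating by an isometry preserves the determinant; a toy numerical check (our own, with random data):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4

# a symmetric positive-definite 'covariance' matrix and a random isometry
G = rng.normal(size=(N, N))
C = G @ G.T + np.eye(N)                       # symmetric, positive definite
Z, _ = np.linalg.qr(rng.normal(size=(N, N)))  # Z is orthogonal: Z^T Z = I

d_C = np.linalg.det(C)
d_conj = np.linalg.det(Z @ C @ Z.T)           # equals det(C), since |det Z| = 1
```

Since det(Z C Z^⊤) = det(Z) det(C) det(Z^⊤) and det(Z)² = 1, the two determinants coincide, which is exactly why the lower bound (5.41) can be derived on the conjugated matrix.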

5.4.3. Derivation of the tail estimates.
This subsection is dedicated to the proof of Lemma 5.7. We condition with respect to F^B_t := σ((B_s)_{0≤s≤t}), which is independent of (B̄_s)_{s≥0}. Since (Z_s)_{0≤s≤t} is an isometry, it is bounded, and so is (Z_t Z^⊤_s)_{0≤s≤t}. Moreover, by (5.8), the derivative of the corresponding bracket is less than Id_N (in the sense of symmetric matrices). Therefore, by Proposition 5.1 (Bernstein's inequality) applied to the conditionally Gaussian variables ((∫_0^t Z_t Z^⊤_s dB̄_s)^i)_{i∈{1,...,N}}, there exists a constant c̄ := c̄(N) ≥ 1 such that (5.44) holds. Equation (5.44) provides us with the Gaussian part of the estimate. To derive the exponential one, we apply Chebyshev's inequality for any γ > 0, recalling (Z_t Γ_t)^1 = (Z_t ∫_0^t Z^⊤_s dB̄_s)^1 and using the Cauchy-Schwarz and Bernstein inequalities (similarly to (5.44)) to pass from the first to the second line. We then recall (5.7) and use the Gaussian character of the relevant conditional distribution. When taking the expectation, we know from the identity in law (5.9) that we can replace C_t by C̄_t. By (5.12) and (5.13), we then write Z^{1,1}_s = (1 + O(s))(1 + S^{1,1}_s), which leads to a simplified version of (5.15). The point is then to plug the above expansion into the expectation of (5.46), and we thus compute the moments of the right-hand side. We make use of Lemma 5.4, which says that S*_t/t has an exponential tail. Therefore, choosing γ small enough, we can bound the last factor in the right-hand side of (5.46) by C̄ := C̄(N, T). This completes the proof of Lemma 5.7.

5.4.4. Conclusion. Combining Lemmas 5.6 and 5.7, we derive that, for t ∈ [0, T], the announced bound holds with C := C(N, T). Using the Cauchy-Schwarz inequality, it suffices to bound two factors; we start with F_2. By the inequality −2|v_i − (Z_t x_0)_i|² ≤ −|v_i|² + 2|(Z_t x_0)_i|², we obtain the corresponding control. Now, ∑_{i=2}^N (Z^{i,1}_t)² = 1 − (Z^{1,1}_t)² = O(1 − Z^{1,1}_t) = O(S^{1,1}_t + t).
Therefore, for |x 1 0 | ≤ C 0 , we deduce from Lemma 5.4 that we can choose C := C(N, T, C 0 ) large enough in (5.47) such that the second factor in the last line is bounded by C.
(As in the lower bound, the dependence of C upon C 0 can be made explicit.)