On the curvature and heat flow on Hamiltonian systems

We develop the differential geometric and geometric analytic studies of Hamiltonian systems. Key ingredients are the curvature operator, the weighted Laplacian, and the associated Riccati equation. We prove appropriate generalizations of the Bochner-Weitzenböck formula and the Laplacian comparison theorem, and study the heat flow.


Introduction
The aim of this article is to apply techniques recently developed in Finsler geometry to the study of Hamiltonian systems. A Finsler manifold carries a (Minkowski) norm on each tangent space. Although Finsler manifolds form a much wider class than Riemannian manifolds, the notion of curvature makes sense and we can consider various comparison theorems similarly to the Riemannian case (see, e.g., [Sh1]). In particular, the weighted Ricci curvature introduced by the author [Oh4] has fruitful applications including the curvature-dimension condition ([Oh4]), the Laplacian comparison theorem for the natural nonlinear Laplacian ([OS1]), the Bochner-Weitzenböck formula and gradient estimates ([OS3]), and generalizations of the Cheeger-Gromoll splitting theorem ([Oh6]). To be precise, the weighted Ricci curvature is defined for a pair consisting of a Finsler manifold and a measure on it, and our Laplacian depends on the choice of the measure (see Subsections 2.3, 4.1 for details).
It is then natural to expect that the theory of curvature can be applied beyond Finsler manifolds, and the class of manifolds $M$ endowed with Lagrangians $L$ or, equivalently, Hamiltonians $H$ is a natural choice. In fact, on the one hand, Agrachev and Gamkrelidze [AG] (see also [Agr2]) have developed the theory of the curvature operator for Hamiltonian systems in connection with optimal control theory (see [Agr1] for a dynamical application). On the other hand, optimal transport theory (which is related to the curvature-dimension condition) for Lagrangian cost functions has been well investigated ([BB], [FF], [Vi]). Furthermore, Lee [Le2] recently showed a Riccati equation (see also [AL1], [AL2], [LLZ] for the sub-Riemannian case) as well as convexity estimates for entropy functionals along smooth optimal transports for general (time-dependent) Hamiltonian systems, by means of the curvature operator. His unified approach recovers both the curvature-dimension condition ($CD(K, \infty)$ and $CD(0, N)$ to be precise) for Riemannian or Finsler manifolds and various monotonicity formulas along flows in Riemannian metrics related to the Ricci flow.
Our Hamiltonian will always be time-independent and non-negative. Compared with the Finsler situation, the lack of homogeneity causes many difficulties. For example, we need to take care of the difference between the Lagrangian and the Hamiltonian (they coincide as functions via the Legendre transform in the Finsler case). Nevertheless, by combining the Riccati equation and the technique in the Finsler case, we prove the Bochner-Weitzenböck formula (Theorem 4.4) and the Laplacian comparison theorem (Theorem 6.2). We also obtain functional inequalities from the convexity of the relative entropy (Theorem 7.6). In these results, we use the weighted Ricci curvature derived from the curvature operator as well as the Laplacian $\Delta^H_m$ induced from the Hamiltonian $H$ and the reference measure $m$ on $M$. In general, this Laplacian is not only nonlinear ($\Delta^H_m(f + g) \neq \Delta^H_m f + \Delta^H_m g$) but also non-homogeneous ($\Delta^H_m(cf) \neq c\Delta^H_m f$ for $c \in \mathbb{R}$). We also study the evolution equation $\partial_t u = \Delta^H_m u$, which can be thought of as the 'heat equation' in our context. In various settings including Finsler manifolds ([OS1]), the heat flow is regarded as a gradient flow in the following two ways: (I) the gradient flow of the Dirichlet energy in the $L^2$-space; (II) the gradient flow of the relative entropy in the $L^2$-Wasserstein space.
The identification of these two strategies was recently established for general metric measure spaces satisfying the curvature-dimension condition ([AGS3]). Although a Hamiltonian does not induce a distance function, we verify the analogue of the former approach (I) in Theorem 8.4. Indeed, since our energy form $E(u) = \int_M H(du)\,dm$ is a convex functional on the $L^2$-space, the classical theory of Brézis et al. applies. Interestingly, however, the latter strategy (II) leads to the different equation $\partial_t u = -\mathrm{div}_m(u\nabla[-\log u])$ (Theorem 8.7). This shows that the identification of (I) and (II) is essentially due to the homogeneity of $H$.
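To see concretely how homogeneity enters, one may look at a one-dimensional model that is not taken from the text: $M = \mathbb{R}$ (ignoring the standing assumption $n \ge 2$) with the Lebesgue measure and $H(\alpha) = |\alpha|^p/p$. The sketch below assumes the conventions $\nabla u = \tau^*(du)$ and $\Delta^H_m u = \mathrm{div}_m(\nabla u)$ and compares the two evolution equations for a positive function $u$.

```latex
% Illustrative model (an assumption, not from the paper):
%   M = \mathbb{R}, \ m = dx, \ H(\alpha) = |\alpha|^p / p, \ p \in (1,\infty), \ u > 0.
% Legendre transform of this H: \tau^*(\alpha) = H'(\alpha) = |\alpha|^{p-2}\alpha.
\begin{align*}
\nabla u &= \tau^*(du) = |u'|^{p-2} u', \\
\text{(I)}\qquad \partial_t u &= \Delta^H_m u = \bigl( |u'|^{p-2} u' \bigr)', \\
\nabla[-\log u] &= \tau^*\!\Bigl( -\frac{u'}{u} \Bigr) = -\,u^{-(p-1)} |u'|^{p-2} u', \\
\text{(II)}\qquad \partial_t u &= -\operatorname{div}_m\bigl( u \nabla[-\log u] \bigr)
   = \bigl( u^{2-p} |u'|^{p-2} u' \bigr)'.
\end{align*}
```

In this model the right-hand sides of (I) and (II) agree for all positive $u$ precisely when $p = 2$, that is, in the quadratic case.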
We explicitly calculate the curvature and Laplacian in the special class of Hamiltonians given by convex deformations of Finsler Hamiltonians (that is, $H = h \circ F^*$, where $h$ is a convex function and $F^*$ is the dual of a Finsler structure $F : TM \to \mathbb{R}$; see Subsection 5.3). Of particular interest are the $p$-homogeneous deformations ($h(t) = t^p/p$), which yield the Finsler analogue of $p$-Laplacians. However, we do not focus on this specific case because the aim of this work is to present the general framework. In the same spirit, we sometimes work under seemingly superfluous assumptions for the sake of technical simplicity.
Acknowledgments. I am grateful to Professor Paul W. Y. Lee for his helpful comments on a preliminary version of the paper.

Preliminaries
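Throughout the article, let $M$ be a connected $C^\infty$-manifold of dimension $n \ge 2$ without boundary. For a local coordinate system $(x^i)_{i=1}^n$ on an open set $U \subset M$, we will always consider the (fiber-wise linear) coordinates $(x^i; v^j)_{i,j=1}^n$ of the tangent bundle $TU$ and $(x^i; \alpha_j)_{i,j=1}^n$ of the cotangent bundle $T^*U$ given by
$$v = \sum_{j=1}^n v^j \frac{\partial}{\partial x^j}\Big|_x \in T_xM, \qquad \alpha = \sum_{j=1}^n \alpha_j \, dx^j\big|_x \in T^*_xM,$$
respectively. We will use the usual abbreviations such as $L_{x^i} := \partial L/\partial x^i$, $L_{v^i} := \partial L/\partial v^i$, $H_{x^i} := \partial H/\partial x^i$ and $H_{\alpha_i} := \partial H/\partial \alpha_i$ for brevity, but only for the Lagrangian $L$ on $TM$ and the Hamiltonian $H$ on $T^*M$.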

Lagrangians
We consider only time-independent (autonomous) Lagrangians for simplicity. General time-dependent Lagrangians can be treated similarly to a great extent (see [AG], [Agr2], [Le2]).
(2) (Super-linearity) There exist a complete Riemannian metric $g$ of $M$ and a constant $C > 0$ such that $L(v) \ge |v|_g - C$ for all $v \in TM$;
(3) (Strong convexity) For any $x \in M$, $L$ has a positive-definite Hessian at every $v \in T_xM \setminus \{0\}$ (with respect to an arbitrary linear coordinate of $T_xM$).
Note that the strong convexity implies $L > 0$ on $TM \setminus 0_{TM}$ and that the super-linearity (2) follows from (1) and (3) if $M$ is compact. The reason why only the $C^1$-regularity is assumed on $0_{TM}$ is to include non-Riemannian Finsler metrics (see Remark 2.2(a)). Then the non-negativity $L \ge 0$ and $L|_{0_{TM}} \equiv 0$ are imposed to ensure that an action-minimizing curve $\eta : [0, T] \to M$ with $\dot\eta(0) \neq 0$ always enjoys $\dot\eta \neq 0$ on the whole of $[0, T]$ (due to the conservation of the Hamiltonian), thus $\dot\eta$ stays away from the zero section $0_{TM}$.
A curve $\eta$ which is a critical point of the action $\int_0^T L(\dot\eta)\,dt$ (among variations fixing the endpoints) is called an action-minimizing curve and satisfies the Euler-Lagrange equation
$$L_{x^i}(\dot\eta) - \frac{d}{dt}\bigl[ L_{v^i}(\dot\eta) \bigr] = 0, \qquad i = 1, \ldots, n. \tag{2.1}$$
To be precise, $\eta$ is either a constant curve or a $C^\infty$-curve with $\dot\eta \neq 0$. Thus $\eta$ is always $C^\infty$ and (2.1) makes sense (note that $L_{x^i}(0) = 0$ since $L|_{0_{TM}} \equiv 0$). Given any $v \in TM$, there is a unique action-minimizing curve $\eta_v$ with $\dot\eta_v(0) = v$.
Let us summarize some fundamental remarks on the difference from the Riemannian or Finsler case, caused by the non-homogeneity of $L$ (see also Remark 2.3 below).
gives a Finsler metric (see Subsections 2.3, 5.1 below). Moreover, in this case, F comes from a Riemannian metric if and only if F 2 ∈ C 2 (T M) (see [Sh1,Proposition 2.2]).
(b) An action-minimizing curve η does not necessarily have a constant speed (i.e., L(η) may not be constant). This is one of the reasons why the Hamiltonian (which is constant along the Hamiltonian flow) fits better to our consideration.
(c) The strong convexity does not imply the uniform (strict) convexity of $L$ even in a single tangent space $T_xM$. For instance, for $f(t) = |t|^p$ on $\mathbb{R}$ with $p \in (1, \infty)$, we have $\lim_{t \to \infty} f''(t) = 0$ if $p < 2$ and $\lim_{t \downarrow 0} f''(t) = 0$ if $p > 2$. This is one of the major differences from the Finsler setting, in which the uniform convexity and smoothness are used in various analytic and geometric estimates (see [OS1, Sections 2, 3], [Oh3]).
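The two limits in (c) follow from a one-line computation, spelled out here for convenience (the formula holds for $t \neq 0$):

```latex
% Second derivative of f(t) = |t|^p away from the origin.
\begin{align*}
f''(t) &= p(p-1)\,|t|^{p-2}, \qquad t \neq 0, \\
\lim_{t \to \infty} f''(t) &= 0 \quad \text{if } 1 < p < 2, \\
\lim_{t \downarrow 0} f''(t) &= 0 \quad \text{if } p > 2, \\
f''(t) &\equiv 2 \quad \text{if } p = 2 .
\end{align*}
```

Hence $f'' > 0$ away from the origin for every $p \in (1, \infty)$, while a uniform lower bound $f'' \ge \delta > 0$ is available only for $p = 2$.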

Hamiltonians
Let $L$ be a Lagrangian as in Definition 2.1. The associated Hamiltonian $H : T^*M \to \mathbb{R}$ is given by
$$H(\alpha) := \sup_{v \in T_xM} \{ \alpha(v) - L(v) \}, \qquad \alpha \in T^*_xM.$$
Choosing $v = 0$ ensures that $H \ge 0$ as well as $H|_{0_{T^*M}} \equiv 0$ (recall $L \ge 0$). Given $\alpha \in T^*_xM$, by virtue of the strong convexity of $L$, we can find a unique vector $v \in T_xM$ satisfying $H(\alpha) = \alpha(v) - L(v)$. Such a vector is denoted by $\tau^*(\alpha)$ and called the Legendre transform of $\alpha$, and the map $\tau^* : T^*M \to TM$ is explicitly written as
$$\tau^*(\alpha) = \sum_{i=1}^n H_{\alpha_i}(\alpha) \frac{\partial}{\partial x^i}\Big|_x. \tag{2.2}$$
The inverse map $\tau : TM \to T^*M$ of $\tau^*$ is similarly given by
$$\tau(v) = \sum_{i=1}^n L_{v^i}(v) \, dx^i\big|_x.$$
Remark 2.3 (a) The transform $\tau$ (or $\tau^*$) is a linear operator if and only if $L$ comes from a Riemannian metric. Precisely, $\tau(cv) = c\tau(v)$ holds for all $c > 0$ and $v \in TM$ if and only if $L$ comes from a Finsler structure, and $\tau(v + w) = \tau(v) + \tau(w)$ holds only in the Riemannian case. Moreover, in the Finsler case, $\tau|_{T_xM}$ is differentiable at the origin if and only if $F|_{T_xM}$ comes from an inner product.
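As a minimal illustration of the Legendre transform and of Remark 2.3(a), consider the one-dimensional model $L(v) = |v|^q/q$ with $q \in (1, \infty)$. This example and the exponent $p := q/(q-1)$ are chosen here only for illustration.

```latex
% One-dimensional model: L(v) = |v|^q / q, with dual exponent p = q/(q-1),
% so that 1/p + 1/q = 1 and (p-1)(q-1) = 1.
\begin{align*}
\tau(v) &= L'(v) = |v|^{q-2} v, \\
\tau^*(\alpha) &= |\alpha|^{p-2} \alpha
   \qquad \text{(the inverse of $\tau$, since $(p-1)(q-1) = 1$)}, \\
H(\alpha) &= \alpha\,\tau^*(\alpha) - L(\tau^*(\alpha))
   = |\alpha|^p - \frac{|\alpha|^{(p-1)q}}{q}
   = \Bigl( 1 - \frac1q \Bigr) |\alpha|^p = \frac{|\alpha|^p}{p}, \\
\tau^*(c\alpha) &= c^{\,p-1}\, \tau^*(\alpha) \qquad (c > 0).
\end{align*}
```

The last line shows that $\tau^*$ is positively 1-homogeneous only for $p = q = 2$, in accordance with Remark 2.3(a).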
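The Hamiltonian vector field on $T^*M$ is defined by
$$\vec{H} := \sum_{i=1}^n \bigl( H_{\alpha_i} \partial_{x^i} - H_{x^i} \partial_{\alpha_i} \bigr),$$
and we denote by $\Phi_t$ the associated Hamiltonian flow. A $C^\infty$-curve $\eta$ solves the Euler-Lagrange equation (2.1) if and only if $\tau(\dot\eta)$ verifies $\Phi_t(\tau(\dot\eta(0))) = \tau(\dot\eta(t))$. That is to say, the Hamiltonian flow coincides with the flow on $TM$ generated from the Euler-Lagrange equation via the Legendre transform. Thus the forward completeness (resp. completeness) of $L$ is equivalent to that of $\Phi_t$, namely the existence of $(\Phi_t)_{t \ge 0}$ (resp. $(\Phi_t)_{t \in \mathbb{R}}$) on the whole of $T^*M$.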
Let us denote by

Finsler manifolds
This subsection is devoted to a concise review on the special class of Finsler manifolds, where we have a clearer understanding of curvature and heat flow. We refer to [BCS] and [Sh2] for basics of Finsler geometry.
Definition 2.4 (Finsler structures) We say that a function $F : TM \to [0, \infty)$ is a $C^\infty$-Finsler structure of $M$ if it satisfies the following:
(1) $F \in C^\infty(TM \setminus 0_{TM})$;
(2) (Positive 1-homogeneity) $F(cv) = cF(v)$ for all $v \in TM$ and $c > 0$;
(3) (Strong convexity) For any $v \in TM \setminus 0_{TM}$, the $(n \times n)$-symmetric matrix
$$\bigl( g_{ij}(v) \bigr)_{i,j=1}^n := \Bigl( \frac12 \frac{\partial^2 (F^2)}{\partial v^i \partial v^j}(v) \Bigr)_{i,j=1}^n \tag{2.4}$$
is positive-definite.
We do not assume the absolute homogeneity $F(-v) = F(v)$ in general. Thanks to the positive homogeneity, the Lagrangian $L_F := F^2/2$ and the corresponding Hamiltonian $H_F$ are positively 2-homogeneous. Euler's theorem on homogeneous functions is a key tool (see [BCS, Theorem 1.2.1]). In the context of general Lagrangians, the lack of this basic tool causes many differences from the Finsler setting.
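To illustrate how Euler's theorem is used, the following short derivation shows why the Lagrangian and the Hamiltonian coincide as functions via the Legendre transform in the Finsler case. It uses only the positive 1-homogeneity of $F$ and the characterization $H(\tau(v)) = \tau(v)(v) - L(v)$ of the Legendre transform.

```latex
% Euler's theorem for the positively 1-homogeneous function F:
%   \sum_i v^i F_{v^i}(v) = F(v)  for v \neq 0.
\begin{align*}
\sum_{i=1}^n v^i (L_F)_{v^i}(v)
  &= F(v) \sum_{i=1}^n v^i F_{v^i}(v) = F(v)^2 = 2 L_F(v), \\
\tau(v)(v) &= \sum_{i=1}^n (L_F)_{v^i}(v)\, v^i = 2 L_F(v), \\
H_F(\tau(v)) &= \tau(v)(v) - L_F(v) = L_F(v) .
\end{align*}
```

For a Lagrangian that is not 2-homogeneous the first line fails, and $H \circ \tau$ differs from $L$ in general.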
Similarly to the Riemannian case, one can regard the Euler-Lagrange equation as the geodesic equation with respect to the distance structure induced from $F$. Precisely, the distance function $d_F : M \times M \to [0, \infty)$ is defined by
$$d_F(x, y) := \inf_\eta \int_0^1 F(\dot\eta)\,dt,$$
where the infimum is taken over all piece-wise $C^1$-curves $\eta : [0, 1] \to M$ with $\eta(0) = x$ and $\eta(1) = y$. Note that $d_F$ can be nonsymmetric (i.e., $d_F(y, x) \neq d_F(x, y)$) since $F$ is only positively homogeneous. Then a curve $\eta$ solves the Euler-Lagrange equation if and only if it is a geodesic in the sense that it is locally $d_F$-minimizing and of constant speed (i.e., $F(\dot\eta)$ is constant).
In the Finsler world, corresponding to the sectional curvature of Riemannian manifolds is the flag curvature K(v, w) for linearly independent vectors v, w ∈ T x M. We remark that, different from the Riemannian case, K(v, w) depends not only on the plane v ∧ w spanned by v and w (the flag), but also on the choice of v in it (the flagpole).
We know a useful interpretation of the flag curvature, due to Shen ([Sh2, §6.2]), as follows. Fix $v \in T_xM \setminus \{0\}$ and extend it to a $C^\infty$-vector field $V$ on a neighborhood $U$ of $x$ (i.e., $V(x) = v$) in such a way that all integral curves of $V$ are geodesic (this is always possible, whereas the choice of $V$ is not unique). By the strong convexity, $V$ induces the Riemannian structure $g_V$ on $U$ via (2.4) as
$$g_V\Bigl( \sum_{i=1}^n a^i \frac{\partial}{\partial x^i}, \sum_{j=1}^n b^j \frac{\partial}{\partial x^j} \Bigr) := \sum_{i,j=1}^n g_{ij}(V)\, a^i b^j. \tag{2.5}$$
Then, for $w \in T_xM$ which is not co-linear with $v$, the sectional curvature of $v \wedge w$ with respect to $g_V$ coincides with $K(v, w)$ (independent of the choice of $V$). This remarkable fact shows the usefulness of $V$ and $g_V$ as above in the study of Finsler manifolds from the Riemannian geometric viewpoint.
For a unit vector $v \in T_xM \cap F^{-1}(1)$, the Ricci curvature $\mathrm{Ric}^F(v)$ is defined as the trace of $K(v, \cdot)$ with respect to $g_v$ (defined similarly to (2.5)). Thus $\mathrm{Ric}^F(v)$ coincides with the Ricci curvature of $v$ with respect to the Riemannian structure $g_V$ as in the previous paragraph. We also set $\mathrm{Ric}^F(cv) := c^2 \mathrm{Ric}^F(v)$ for $c \ge 0$.
Now we fix a positive $C^\infty$-measure $m$ on $M$ and modify $\mathrm{Ric}^F$ into the weighted Ricci curvature $\mathrm{Ric}^F_N$, which was introduced in [Oh4] inspired by the theory of weighted Riemannian manifolds.
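Definition 2.5 (Weighted Ricci curvature) Given $v \in T_xM \setminus \{0\}$, extend it to a $C^\infty$-vector field $V$ on a neighborhood $U$ of $x$ such that all integral curves of $V$ are geodesic. Using the volume measure $\mathrm{vol}_V$ of $g_V$, we decompose $m$ as $m = e^{-\psi} \mathrm{vol}_V$ on $U$. Let $\eta$ be the geodesic with $\dot\eta(0) = v$. Then we define, for $N \in (n, \infty)$,
$$\mathrm{Ric}^F_N(v) := \mathrm{Ric}^F(v) + (\psi \circ \eta)''(0) - \frac{(\psi \circ \eta)'(0)^2}{N - n}.$$
As the limits, define
$$\mathrm{Ric}^F_\infty(v) := \mathrm{Ric}^F(v) + (\psi \circ \eta)''(0), \qquad \mathrm{Ric}^F_n(v) := \begin{cases} \mathrm{Ric}^F(v) + (\psi \circ \eta)''(0) & \text{if } (\psi \circ \eta)'(0) = 0, \\ -\infty & \text{otherwise}. \end{cases}$$
We also set $\mathrm{Ric}^F_N := 0$ on $0_{TM}$ for all $N \in [n, \infty]$.
It is easily seen that $\mathrm{Ric}^F_N(v)$ is well-defined (independent of the choice of $V$) and $\mathrm{Ric}^F_N(cv) = c^2 \mathrm{Ric}^F_N(v)$ for all $c > 0$. In the Riemannian case, $\mathrm{Ric}^g_\infty$ is the famous Bakry-Émery tensor and $\mathrm{Ric}^g_N$ with $N < \infty$ was introduced by Qian (see [BE], [Qi]).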
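The scaling property $\mathrm{Ric}^F_N(cv) = c^2 \mathrm{Ric}^F_N(v)$ can be checked as follows. The sketch assumes the usual form of the weighted Ricci curvature for $N \in (n, \infty)$, namely $\mathrm{Ric}^F(v) + (\psi \circ \eta)''(0) - (\psi \circ \eta)'(0)^2/(N - n)$, and uses that $g_{cV} = g_V$ (the matrix $(g_{ij})$ is positively 0-homogeneous), so that the weight function $\psi$ is unchanged when $V$ is replaced by $cV$.

```latex
% \eta_c(t) := \eta(ct) is the geodesic with \dot\eta_c(0) = cv.
\begin{align*}
(\psi \circ \eta_c)'(0) &= c\,(\psi \circ \eta)'(0), \qquad
(\psi \circ \eta_c)''(0) = c^2 (\psi \circ \eta)''(0), \\
\operatorname{Ric}^F_N(cv)
  &= c^2 \operatorname{Ric}^F(v) + c^2 (\psi \circ \eta)''(0)
     - \frac{c^2 (\psi \circ \eta)'(0)^2}{N - n}
   = c^2 \operatorname{Ric}^F_N(v) .
\end{align*}
```

The same computation applies verbatim to $N = \infty$, where the last term is absent.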
Remark 2.6 On a Riemannian manifold (M, g), if we choose the volume measure as the reference measure m, then Ric g N = Ric g for all N ∈ [n, ∞]. On a general Finsler manifold, however, there does not necessarily exist a measure m with Ric F n > −∞ (see [Oh5]). This means that there is no nice reference measure in general, so that it is natural to start from an arbitrary measure.
It was demonstrated in [Oh4] that (M, d F , m) satisfies Lott, Sturm and Villani's curvature-dimension condition CD(K, N) if and only if Ric F N (v) ≥ KF (v) 2 for all v ∈ T M. Roughly speaking, in CD(K, N), K acts as the lower bound of the Ricci curvature and N is regarded as the upper bound of the dimension. This characterization extends the Riemannian one (by [CMS], [vRS], [St1], [St2], [LV1], [LV2]), and has many analytic and geometric applications via the general theory of curvature-dimension condition (developed in [St1], [St2], [LV1], [LV2], [Vi]). Moreover, the Laplacian and heat flow on (M, F, m) (both are nonlinear except for the Riemannian case) were studied in [OS1], [OS2] and [OS3], where we have shown the Bochner-Weitzenböck formula and the Bakry-Émery and Li-Yau gradient estimates among others. See also [Oh6] for a further application to generalizations of the Cheeger-Gromoll splitting theorem.

Curvature for Hamiltonians
In [AG], Agrachev and Gamkrelidze introduced the curvature operator for Hamiltonian systems. We recall their definition along the line of [Le2] (see also [Le1]). Although our Lagrangian is assumed to be time-independent and non-negative, the original definition is concerned with general time-dependent Lagrangians.
Remark 3.1 According to [AD], the construction of curvature in [AG] is equivalent to those in the independent works [Gr] and [Fo] (see also Acknowledgment in the preprint version arXiv:1205.1442v6 of [Le2]). We refer to [AD] and the references therein for details and some other related works.
Fix $\xi = \sum_{i=1}^n \xi_i \partial_{\alpha_i} \in V_\alpha$. Choose an arbitrary smooth curve $\xi(t) = \sum_{i=1}^n \xi_i(t) \partial_{\alpha_i} \in V_{\alpha(t)}$ with $\xi(0) = \xi$, and put $e_t := (d\Phi_t)^{-1}(\xi(t)) \in J^t_\alpha$. Since $e_t$ lives in the same linear space $T_\alpha(T^*M)$ for all $t$, we can define $\dot{e}_t \in T_\alpha(T^*M)$ by differentiating the coefficients of $e_t$ (with respect to any linear coordinate of $T_\alpha(T^*M)$). Recall that $\omega$ denotes the canonical symplectic form on $T^*M$.
Proof. Since $e_0 = \xi \in V_\alpha$ is in the vertical part of $T_\alpha(T^*M)$, it is sufficient to calculate only the horizontal part of $\dot{e}_0$. By differentiating $d\Phi_t(e_t) = \xi(t)$ at $t = 0$ and noticing $\dot\Phi_t = \vec{H} \circ \Phi_t$ and $e_0 = \xi$, we have $d\vec{H}_\alpha(\xi) + \dot{e}_0 = \dot\xi(0)$ in coordinates, where $\dot\xi(0)$ is vertical. Hence the horizontal part of $\dot{e}_0$ is $-\sum_{i,j=1}^n \xi_j H_{\alpha_i \alpha_j}(\alpha) \partial_{x^i}$ and we obtain the claim. □
Remark 3.3 The above inner product should be compared with (2.4) and (2.5) in the Finsler case. To be precise, $g_{\tau^*(\alpha)}$ of $T_xM$ coincides with $\langle \cdot, \cdot \rangle_\alpha$ of $V_\alpha$ in Lemma 3.2 via the Legendre transform and the canonical identification between $T^*_xM$ and $V_\alpha$.
Definition 3.4 (Canonical frames) If a family of smooth curves $e^t_i \in J^t_\alpha$ ($i = 1, \ldots, n$) satisfies then we call the family $(e^t_i; \dot{e}^t_j)_{i,j=1}^n$ (referred to simply as $(e^t_i)_{i=1}^n$ henceforth) a canonical frame along $\alpha$.
by (2.3) and Lemma 3.2. In Appendix A.1, we will review how to construct a canonical frame from an orthonormal basis with respect to $\langle \cdot, \cdot \rangle_{\alpha(t)}$, following the recipe in [Le2]. Fix a canonical frame $E_t = (e^t_i)_{i=1}^n$ along $\alpha$.
Lemma 3.5 We have ω(e t i , e t j ) = 0 for all i, j and t. In particular, ω(e t i , J t α ) = 0 for all i and t.
Proof. This follows from ω(e 0 i , e 0 j ) = 0 (since e 0 i ∈ V α ) and with respect to ω. The next lemma yields the uniqueness of a canonical frame up to an orthogonal transformation.
Lemma 3.6 (see [Le2,Proposition 3 where we used Lemma 3.5 and O T is the transpose of O. It also follows from Lemma 3. α is a linear operator and is independent of the choice of the canonical frame (e t i ) n i=1 thanks to Lemma 3.6. In Appendix A.2, we explicitly calculate the curvature operator in coordinates along [Le2]. The following property of R t α shows that the definition of R t α can be reduced to the case of t = 0.
Lemma 3.8 (see [Le2,(25)]) For any t, we have R 0 Proof. Fix t and observe that gives a canonical frame along α(t + s). Thus we obtain The curvature operator R t α is symmetric in the sense that for all i, j and t.
Proof. By Lemma 3.8, it suffices to see the claim at t = 0. We deduce from Lemma 3.5 that ω(e t i ,ë t j ) ≡ 0, and then ω( Definition 3.10 (Ricci curvature) Define the Ricci curvature Ric H (α) ∈ R as the trace of R 0 α : V α −→ V α with respect to the inner product ·, · α given in Lemma 3.2. We also set Ric H := 0 on 0 T * M .
We remark that setting Ric H (0) = 0 is reasonable since then α is constant. The weighted version can be introduced similarly to the Finsler case as follows (recall Definition 2.5).
Then the weighted Ricci curvature is defined by We also set Ric H N := 0 on 0 T * M for all N ∈ [n, ∞].

Laplacian
In this section, we introduce the natural nonlinear Laplacian ∆ H m associated with the Hamiltonian H and the reference measure m on M in a similar way to the Finsler case (see [OS1]). It will turn out that ∆ H m coincides with the Laplacian ∆ H studied in [Le2] (see also [AL2] for the sub-Riemannian case) up to a difference depending on m.

Gradient vectors, Laplacian and Hessian
For a differentiable function $u$ on $M$, we call $\nabla u := \tau^*(du)$ the gradient vector field of $u$. For an open set $U \subset M$, we will use two kinds of Sobolev spaces, $H^1(U; L)$ and $H^1(U; H)$. Clearly they coincide in the Riemannian or Finsler case. We also introduce $H^1_{loc}(U; L)$ and $H^1_{loc}(U; H)$. Note that $H^1(U; L)$ is not necessarily a linear space because of the nonhomogeneity of $L$ (same for $H^1(U; H)$, $H^1_{loc}(U; L)$ and $H^1_{loc}(U; H)$). We know only that Note that the right-hand side is well-defined since If $V$ is differentiable, then we can write down in coordinates as Note that our Laplacian is a negative operator in the sense that and equality holds if and only if $u$ is constant. We remark that, even if $u \in C^\infty(M)$, $\nabla u$ may not be differentiable at points where $\nabla u$ vanishes (Remark 2.3(a)). On the set where $\nabla u \neq 0$, we can calculate (4.1)
The Laplacian studied in [Le2] can be regarded as an unweighted version of $\Delta^H_m$. Let us recall the definition in [Le2, §4]. Given $u \in C^\infty(M)$, the image of the derivative (which is a Lagrangian subspace with respect to $\omega$). We fix $x \in M$ with $du_x \neq 0$ and shall identify $P_x$ with the graph of a linear map via a canonical frame and is independent of the choice of a canonical frame by Lemma 3.6. Choose a coordinate around $x$ such that $H_{\alpha_i \alpha_j}(du_x) = \delta_{ij}$ for all $i, j$, and take the canonical frame $(e^t_i)_{i=1}^n$ with $e^0_i = \partial_{\alpha_i}$. (We prefer this coordinate to a simpler one with $H_{\alpha_i \alpha_j} \circ \alpha \equiv \delta_{ij}$ for the sake of visibility of the structure of the calculation.) Then we have, by Proposition A.
The negative of this map is called the Hessian with respect to the canonical frame is called the Laplacian in [Le2].
To compare ∆ H m u with ∆ H u, similarly to Definition 3.11, let us decompose m along the action-minimizing curve η(t) := π M (α(t)) as so that ∆ H m u and ∆ H u coincide up to the term depending on the weight function ψ. In other words, ∆ H can be interpreted as the weighted Laplacian ∆ H m with respect to a measure m such that ψ is constant along any action-minimizing curve η, whereas such a measure does not necessarily exist even on a Finsler manifold (Remark 2.6).
Remark 4.1 It can be checked by hand that the above Hessian coincides with ∇ 2 u ∈ T * M ⊗ T M used in [OS3] to study the Bochner-Weitzenböck formula in the Finsler setting. To see this, recall from [OS3, Lemma 2.3] that where g ij (∇u(x)) = δ ij is still assumed. We calculate by using the notations in [OS3] as, at x, where (g ij (du)) is the inverse matrix of (g ij (∇u)). It follows from the homogeneity that Consequently, the geodesic equationη k + G k (η) ≡ 0 shows that Compare this with (4.2).
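The negativity of the Laplacian mentioned in this subsection can be traced back to the Legendre duality. The following sketch assumes that $\nabla u = \tau^*(du)$, that the Laplacian is the $m$-divergence of the gradient vector field, $\Delta^H_m u = \mathrm{div}_m(\nabla u)$, and that $u$ is regular enough (say smooth with compact support) for the integration by parts. The density $\varsigma$ of $m$ in coordinates is written as in Section 6.

```latex
% Assumptions: \nabla u = \tau^*(du), \ \Delta^H_m u = \operatorname{div}_m(\nabla u),
% m = e^{\varsigma} dx^1 \cdots dx^n locally, u smooth with compact support.
\begin{align*}
\operatorname{div}_m V
  &= \sum_{i=1}^n \Bigl( \frac{\partial V^i}{\partial x^i}
       + V^i \frac{\partial \varsigma}{\partial x^i} \Bigr), \\
\int_M u \, \Delta^H_m u \, dm
  &= -\int_M du(\nabla u) \, dm, \\
du(\nabla u) &= H(du) + L(\nabla u) \;\ge\; 0 .
\end{align*}
% The last line is H(\alpha) = \alpha(\tau^*(\alpha)) - L(\tau^*(\alpha)) with \alpha = du.
% Since L > 0 on TM \setminus 0_{TM}, du(\nabla u) = 0 forces \nabla u = 0.
```

In particular the integrand $du(\nabla u)$ vanishes only where $du = 0$, which is the reason why equality characterizes constant functions on the connected manifold $M$.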

Energy and harmonic functions
Lemma 4.2 The energy functional E U is lower semi-continuous on L 2 (U; m). Namely, for any sequence including the case where both energies are infinite). We cover W with finitely many, mutually disjoint open sets {U k } (up to an m-negligible set) such that each U k is diffeomorphic to the unit ball in R n . Then Serrin's classical theorem ( [Se], see also [FGM]) is applicable on each U k to obtain We complete the proof by letting ε → 0. ✷

We say that
is weakly harmonic on $U$ if and only if, for any relatively compact open set $U' \subset U$,
Proof. The convexity of $H$ yields that, for any $t \in (0, 1)$, Hence the dominated convergence theorem implies that which completes the proof of both assertions. □
Note that $u \in H^1(U; H)$ was necessary to ensure $E_U(u) < \infty$, while $u \in H^1_{loc}(U; L)$ was used to make $\Delta^H_m u$ well-defined. The hypothesis of the lemma is fulfilled by, for example, the $p$-homogeneous deformations of Finsler (or Riemannian) Hamiltonians (see Subsection 5.4). In that case our Laplacian coincides with the (weighted) $p$-Laplacian.

Riccati equation
In [Le2,§4], Lee showed a Riccati type equation with respect to the Laplacian ∆ H in (4.3) (see also [AL1], [AL2] and [LLZ] for the sub-Riemannian case). We repeat his argument for completeness, and derive a generalization of the Bochner-Weitzenböck formula along the line of [OS3] in the next subsection.
Let $u \in C^\infty(M)$ and fix $x \in M$ with $du_x \neq 0$. On a small neighborhood $U$ of $x$ on which $du \neq 0$, let us consider the solution $(u_t)_{t \in (-\varepsilon, \varepsilon)} \subset C^\infty(U)$ to the Hamilton-Jacobi equation
$$\partial_t u_t + H(du_t) = 0, \qquad u_0 = u. \tag{4.5}$$
A geometric (or dynamical) meaning of (4.5) is that, for each $y \in U$,
$$\Phi_t(du_y) = (du_t)_{T_t(y)} \tag{4.6}$$
holds (as far as $T_t(y) \in U$), where $t \mapsto T_t(y)$ is the action-minimizing curve with $\partial_t[T_t(y)]|_{t=0} = \nabla u(y)$. Put $\eta(t) := T_t(x)$ and let $(e^t_i)_{i=1}^n$ be a canonical frame along $\alpha(t) := (du_t)_{\eta(t)}$. Then, for each fixed $\tau$, $\tilde{e}^{t,\tau}$ gives a canonical frame along $\alpha(\tau + t)$ (similarly to the proof of Lemma 3.8). We write down the map $d(du)_x$ : where $A(t) = (A_{ij}(t))$ and $B(t) = (B_{ij}(t))$ are $(n \times n)$-matrices. Applying $d\Phi_t$ yields On the one hand, since $\tilde{e}^{t,t}_j \in V_{\alpha(t)}$, (4.8) implies On the other hand, differentiating (4.6) at $y =$ By combining these with (4.8), we have This shows that $\mathrm{Hess}^H u_t(\eta(t)) = -B(t)^{-1} A(t)$ in the frame $(\tilde{e}^{t,t}_i; \dot{\tilde{e}}^{t,t}_j)$. By differentiating (4.7) in $t$, we obtain where we set $R^t_\alpha(e^t_j) = \sum_{k=1}^n R_{jk}(t) e^t_k$. This implies We consequently obtain the matrix Riccati equation ([Le2, (31)])
$$\partial_t\bigl[ \mathrm{Hess}^H u_t(\eta(t)) \bigr] + \mathrm{Hess}^H u_t(\eta(t))^2 + R(t) = 0. \tag{4.9}$$
Taking the trace yields, by the symmetry of $\mathrm{Hess}^H u_t$ (see (4.2)) and Lemma 3.8,
$$\partial_t\bigl[ \Delta^H u_t(\eta(t)) \bigr] + \bigl\| \mathrm{Hess}^H u_t(\eta(t)) \bigr\|^2_{HS(du_t)} + \mathrm{Ric}^H(\alpha(t)) = 0, \tag{4.10}$$
where $\| \cdot \|_{HS(du_t)}$ denotes the Hilbert-Schmidt norm with respect to a canonical frame along $\alpha(t) = (du_t)_{\eta(t)}$.
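The passage from the matrix Riccati equation (4.9) to a scalar differential inequality uses only the Cauchy-Schwarz inequality for symmetric matrices. The following sketch writes $S(t)$ for the symmetric matrix representing $\mathrm{Hess}^H u_t(\eta(t))$ and $\sigma(t) := \operatorname{tr} S(t)$; these names, as well as the constant-curvature model at the end, are introduced here only for illustration.

```latex
% S(t): symmetric (n x n)-matrix with S' + S^2 + R(t) = 0, \sigma := tr S.
\begin{align*}
\operatorname{tr}(S^2) = \|S\|_{HS}^2 &\ge \frac{(\operatorname{tr} S)^2}{n}
  && \text{(Cauchy-Schwarz on the eigenvalues)}, \\
\sigma'(t) + \frac{\sigma(t)^2}{n} + \operatorname{tr} R(t) &\le 0
  && \text{(trace of (4.9))}.
\end{align*}
% Model case R(t) = \kappa I with \kappa > 0 and S(t) = s(t) I:
\begin{align*}
s' + s^2 + \kappa = 0, \qquad
s(t) = \sqrt{\kappa}\,\cot\bigl( \sqrt{\kappa}\, t \bigr), \qquad
s'(t) = -\kappa \csc^2\bigl( \sqrt{\kappa}\, t \bigr).
\end{align*}
```

In the model case the solution blows down to $-\infty$ at $t = \pi/\sqrt{\kappa}$, which is the familiar mechanism behind comparison theorems of Bonnet-Myers type.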

Bochner-Weitzenböck formula
Taking the reference measure m into account, we readily observe from (4.10) and (4.4) that From this, similarly to the Finsler case, one can derive the Bochner-Weitzenböck formula.
Theorem 4.4 (Bochner-Weitzenböck formula) Let $u \in C^\infty(M)$. At any $x \in M$ with $du_x \neq 0$, we have
Proof. Calculate the first term in (4.11) at $t = 0$ as Then (4.12) follows from $\dot\eta(0) = \nabla u(x)$ and (4.5) since One can derive (4.13) from (4.12) in a standard way with the help of (4.4) (see the proof of [OS3, Theorem 3.3] for instance). □
(b) In the Finsler case, one can treat functions with lower regularity by passing to the weak (integrated) formulations of (4.12) and (4.13) ([OS3, Theorem 3.6]). This is a more delicate issue for general Hamiltonians. Furthermore, we could obtain Bakry-Émery and Li-Yau gradient estimates in [OS3, §4] as applications of the (weak form of the) Bochner-Weitzenböck formulas (see also [Oh6], [WX], [Xi] for further applications). In these proofs, however, we relied essentially on the homogeneity of the Finsler metric (Euler's theorem [OS3, Theorem 2.2] to be precise), and it is unclear whether these gradient estimates can be generalized to general Hamiltonians.

Examples
This section is devoted to discussing several examples of Hamiltonians and calculating the curvature and Laplacian of them.

Finsler Hamiltonians
Let (M, F ) be a Finsler manifold as in Subsection 2.3.
One can verify this coincidence by the Riccati equation (4.9) (whose Finsler version can be found in [OS3, Lemmas 3.1, 3.2]) for instance (see also [Le2,§11]). Recall that there is a useful interpretation of the Finsler curvature by using Riemannian structures induced from vector fields satisfying a certain condition (see Subsection 2.3). For general Lagrangians, however, it seems difficult to obtain an analogous interpretation because even 2L

Natural mechanical Hamiltonians
Let (M, g) be a Riemannian manifold and Z ∈ C ∞ (M).
Example 5.2 (Agrachev) The natural mechanical Hamiltonian by identifying V α on the LHS and T x M on the RHS as in Example 5.1, where τ * is the Legendre transform common to H g and H. In particular, we have Ric H (α) = Ric g (τ * (α)) + ∆ g Z(x).
We remark that $H$ is allowed to be negative since it is $C^\infty$ on the whole of $T^*M$. Since $Z$ is constant on each $T_xM$, we immediately find that $L(v) = L_g(v) - Z(x)$ for $v \in T_xM$, and that the Legendre transform of $H$ coincides with that of $H_g$. One can also see that the Hessian and Laplacian are common to $H_g$ and $H$. Then, in the Riccati equation (4.9), the term $\mathrm{Hess}_g Z$ in (5.1) comes from the difference between the Hamilton-Jacobi equations (4.5) of $H_g$ and $H$. To see this, given $u \in C^\infty(M)$ and $x \in M$ with $du_x \neq 0$, let $(u_t)$ and $(\tilde{u}_t)$ be the solutions to (4.5) around $x$, with $u_0 = \tilde{u}_0 = u$, with respect to $H_g$ and $H$, respectively. By comparing the matrix representations of $\mathrm{Hess}^H \tilde{u}_t(x)$ and $\mathrm{Hess}^g u_t(x)$ as in (4.2), we have We used $H_{\alpha_i \alpha_j \alpha_k} \equiv 0$ in the first equality. This yields (5.1) (with $\alpha = du_x$). One can alternatively show (5.1) by the direct computation in (A.4).
The weighted curvature can be treated similarly as follows. Let $m$ be a positive $C^\infty$-measure on $M$. In the current situation, we can write $m = e^{-\psi} \mathrm{vol}_g$ globally. Since the Legendre transform is common to $H_g$ and $H$, the weighted Laplacian $\Delta^H_m$ coincides with $\Delta^g_m$. Thus we find from (4.1) that, for $(u_t)$ and $(\tilde{u}_t)$ as above, We used Euler's theorem in the last line. Hence we have, by comparing (4.11) for $H_g$ and $H$, involves only the first order derivatives of the weight function $\psi$, we immediately obtain for $N \in [n, \infty)$ as well.
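As a purely formal illustration of the formula $\mathrm{Ric}^H(\alpha) = \mathrm{Ric}^g(\tau^*(\alpha)) + \Delta_g Z(x)$ in Example 5.2, consider the harmonic oscillator on flat $\mathbb{R}^n$. The choice $Z(x) = \omega^2 |x|^2/2$ with $\omega > 0$ is an assumption made here, and it does not satisfy the standing hypotheses of Section 2 (the Lagrangian $L = L_g - Z$ is not non-negative), which is consistent with the remark that $H$ is allowed to be negative in this example.

```latex
% Formal example: (M, g) = flat \mathbb{R}^n, Z(x) = \omega^2 |x|^2 / 2.
% From L = L_g - Z one gets H = H_g + Z.
\begin{align*}
H(x, \alpha) &= \frac{|\alpha|^2}{2} + \frac{\omega^2 |x|^2}{2}, \\
\operatorname{Hess}_g Z &= \omega^2 I_n, \qquad \Delta_g Z = n \omega^2, \\
\operatorname{Ric}^H(\alpha) &= \operatorname{Ric}^g(\tau^*(\alpha)) + \Delta_g Z(x)
   = 0 + n \omega^2 .
\end{align*}
% Hamilton's equations (standard sign convention):
%   \dot{x} = \alpha, \quad \dot{\alpha} = -\omega^2 x .
```

The positive value $n\omega^2$ matches the dynamics: all trajectories starting from a common point refocus after time $\pi/\omega$, as one expects under a positive curvature bound.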

Convex deformations of Finsler Hamiltonians
Let $(M, F)$ be a Finsler manifold and $h$ : One can see this also by calculating (4.1). The calculation of the curvature is a little more involved, for which we go back to the definition of the curvature operator. The following proposition shows that the ratio of $\mathrm{Ric}^H(\alpha)$ to $\mathrm{Ric}^F(\tau^*_F(\alpha))$ depends only on $F^*(\alpha)$ and $h$.
Proposition 5.3 Let $H(\alpha) = h(F^*(\alpha))$ be as above. Then we have $R^0_\alpha = c(\alpha)^2 \tilde{R}^0_\alpha$ for any $\alpha \in T^*M \setminus 0_{T^*M}$, where $\tilde{R}^0_\alpha$ denotes the curvature operator (in the sense of Definition 3.7) with respect to $H_F$ and we set $c(\alpha) := h'(F^*(\alpha))/F^*(\alpha)$. It also holds that $\mathrm{Ric}^H_N(\alpha) = c(\alpha)^2 \mathrm{Ric}^F_N(\tau^*_F(\alpha))$ for any $N \in [n, \infty]$.
give a canonical frame along $\alpha(t)$ with respect to $H$. Now we have and similarly $R^t_\alpha(e^t_i) = c^2 \tilde{R}^{ct}_\alpha(e^t_i)$ for $i \ge 2$. These show $R^0_\alpha = c^2 \tilde{R}^0_\alpha$ as well as $\mathrm{Ric}^H(\alpha) = c^2 \mathrm{Ric}^F(\tau^*_F(\alpha))$. The weighted version immediately follows from the above calculations. To be precise, denoting by $\tilde\psi$ and $\psi$ the weight functions as in Definition 3.11 with respect to $H_F$ and $H$, we find $\psi \circ \pi_M \circ \alpha(t) = \tilde\psi \circ \pi_M \circ \tilde\alpha(ct) + C$ with some constant $C$ (depending on $F^*(\alpha)$ and $h$). This yields $\mathrm{Ric}^H_N(\alpha) = c^2 \mathrm{Ric}^F_N(\tau^*_F(\alpha))$. □
For instance, if $h(t) = (at)^2/2$ with some $a > 0$, then we have $c \equiv a^2$ and hence $\mathrm{Ric}^H(\alpha) = a^4 \mathrm{Ric}^F(\tau^*_F(\alpha))$. When we deform the Lagrangian as $L(v) := h(F(v))$ instead of the Hamiltonian, we observe from

Homogeneous deformations of Finsler Hamiltonians
Let $(M, F)$ be a Finsler manifold again. One of the most important examples of convex deformations of $F^*$ is the $p$-homogeneous deformation
$$H_p(\alpha) := \frac{F^*(\alpha)^p}{p}, \qquad p \in (1, \infty).$$
The corresponding Lagrangian is $L_q(v) := F(v)^q/q$, where $q = p/(p - 1)$ is the dual exponent of $p$. By (5.2), $H_p$ leads to the (Finsler analogue of the) famous $p$-Laplacian $\Delta^{H_p}_m$.

One can perform better analysis for $\Delta^{H_p}_m$ thanks to the homogeneity, although we do not pursue that direction in this paper (except for Example 8.8; we refer to the recent works [Ke1], [Ke2] instead). Note also that $\mathrm{Ric}^{H_p}_N(\alpha) = F^*(\alpha)^{2(p-2)} \mathrm{Ric}^F_N(\tau^*_F(\alpha))$ by Proposition 5.3. This suggests that the behavior of $\Delta^{H_p}_m u$ depends on how large $F^*(du)$ is.
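The factor $F^*(\alpha)^{2(p-2)}$ can be read off from Proposition 5.3 by computing $c(\alpha) = h'(F^*(\alpha))/F^*(\alpha)$ for $h(t) = t^p/p$. The sketch below also records the resulting form of the gradient vector and the Laplacian, assuming the conventions $\nabla u = \tau^*(du)$ and $\Delta^H_m u = \mathrm{div}_m(\nabla u)$; the notation $\tau^*_{H_p}$ for the Legendre transform of $H_p$ is introduced here only for this computation.

```latex
% p-homogeneous deformation: h(t) = t^p / p, H_p = h \circ F^*, H_F = (F^*)^2 / 2.
\begin{align*}
h'(t) &= t^{p-1}, \qquad
c(\alpha) = \frac{h'(F^*(\alpha))}{F^*(\alpha)} = F^*(\alpha)^{p-2}, \\
\operatorname{Ric}^{H_p}_N(\alpha)
  &= c(\alpha)^2 \operatorname{Ric}^F_N(\tau^*_F(\alpha))
   = F^*(\alpha)^{2(p-2)} \operatorname{Ric}^F_N(\tau^*_F(\alpha)), \\
(H_p)_{\alpha_i} &= F^*(\alpha)^{p-1} F^*_{\alpha_i}
   = F^*(\alpha)^{p-2} (H_F)_{\alpha_i}
   \;\Longrightarrow\;
   \tau^*_{H_p}(\alpha) = F^*(\alpha)^{p-2}\, \tau^*_F(\alpha), \\
\Delta^{H_p}_m u &= \operatorname{div}_m\bigl( F^*(du)^{p-2}\, \tau^*_F(du) \bigr).
\end{align*}
```

For $p > 2$ the curvature factor tends to $0$ as $F^*(\alpha) \to 0$, whereas for $p < 2$ it blows up there; for $p = 2$ one recovers the Finsler case with $c \equiv 1$.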

Laplacian comparison theorem
We return to a general Hamiltonian $H$ on $M$ and show a generalization of the Laplacian comparison theorem associated with the weighted Ricci curvature $\mathrm{Ric}^H_N$ (suggested in [Le2, Remark 4.4]; see also [AL2] for a related work in the sub-Riemannian setting). We also refer to [OS1, Theorem 5.2] for the Finsler case and to [Gi] for the more general case of metric measure spaces enjoying the curvature-dimension condition. Our Laplacian comparison theorem should be compared with the Bonnet-Myers type theorem of Agrachev and Gamkrelidze ([AG, Theorem in §4], [Agr2, Theorem 2.1]). Throughout the section, we assume the forward completeness of the Hamiltonian flow $\Phi_t$.
Remark 6.1 In the Riemannian and Finsler cases, the map exp c z coincides with the ordinary exponential map exp z for all c > 0 (via the Legendre transform as usual). For the p-homogeneous deformation H p of a Finsler metric as in Subsection 5.4, it holds that L(η) ds.
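Theorem 6.2 (Laplacian comparison) Assume that $\mathrm{Ric}^H_N(\alpha) \ge K$ holds for some $K \in \mathbb{R}$, $N \in [n, \infty)$, and all $\alpha \in T^*M$ with $H(\alpha) = c$. Then we have, for any $z \in M$, $\alpha \in T^*_zM$ with $H(\alpha) = c$, and the action-minimizing curve $\eta : [0, T_\alpha] \to M$ with $\tau(\dot\eta(0)) = \alpha$,
$$\Delta^H_m u^c_z(\eta(t)) \le \frac{N s'_{K,N}(t)}{s_{K,N}(t)} \qquad \text{for } t \in (0, T_\alpha). \tag{6.1}$$
Proof. It follows from (4.11) with $u_t = u^c_z - ct$ and Theorem 4.4 that
$$\partial_t\bigl[ \Delta^H_m u^c_z(\eta(t)) \bigr] \le -\mathrm{Ric}^H_N\bigl( (du^c_z)_{\eta(t)} \bigr) - \frac{\bigl( \Delta^H_m u^c_z(\eta(t)) \bigr)^2}{N}$$
for $t \in (0, T_\alpha)$. We have $\mathrm{Ric}^H_N((du^c_z)_{\eta(t)}) = \mathrm{Ric}^H_N(\alpha(\eta(t))) \ge K$ by assumption, therefore
$$\partial_t\bigl[ \Delta^H_m u^c_z(\eta(t)) \bigr] \le -K - \frac{\bigl( \Delta^H_m u^c_z(\eta(t)) \bigr)^2}{N}.$$
Put $u(t) := \Delta^H_m u^c_z(\eta(t))$ for brevity and compare it with $\tilde{u}(t) := N s'_{K,N}(t)/s_{K,N}(t)$, where $s_{K,N}$ is the solution to $s'' + (K/N)s = 0$ with $s(0) = 0$ and $s'(0) = 1$. Observe that $\tilde{u}' = -K - \tilde{u}^2/N$ and $\lim_{t \downarrow 0} s_{K,N}(t)^2 \tilde{u}(t) = 0$. Then the desired inequality $u \le \tilde{u}$ follows from $\lim_{t \downarrow 0} s_{K,N}(t)^2 u(t) = 0$ and the calculation (see [WW,(3.10 Finally, $\lim_{t \downarrow 0} s_{K,N}(t)^2 u(t) = 0$ can be seen as follows. Choose a local coordinate $(x^i)_{i=1}^n$ of a small neighborhood $U$ of $z$ such that every action-minimizing curve $\sigma$ with $\sigma(0) = z$ and $H(\tau(\dot\sigma)) \equiv c$ is represented as
Remark 6.3 In the Riemannian and Finsler cases, $u^c_z/\sqrt{2c}$ coincides with the distance function $d_z$ from $z$, and (6.1) is improved to the sharper inequality (6.2) by decomposing $\Delta_m d_z$ into the radial and spherical directions from $z$. It is unclear if a similar improvement can be done for general Hamiltonians (see also the Bonnet-Myers type theorem of Agrachev and Gamkrelidze). The point is that $u^c_z(\eta(t))$ is not proportional to $t$ since $(u^c_z \circ \eta)'(t) = c + L(\dot\eta(t))$ and $L(\dot\eta)$ is not necessarily constant. We also remark that even the stronger inequality (6.2) does not imply the curvature bound $\mathrm{Ric}^H_N \ge K$ (see [St2, Remark 5.6] for a simple example, and [Ju] for a related work on the gap between the measure contraction property and the curvature-dimension condition in Heisenberg groups).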
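The comparison step in the proof can be made explicit. The following sketch assumes that $s = s_{K,N}$ solves $s'' + (K/N)s = 0$ with $s(0) = 0$ and $s'(0) = 1$ (a normalization consistent with $\tilde{u}' = -K - \tilde{u}^2/N$), and that $u$ satisfies $u' \le -K - u^2/N$ on an interval where $s > 0$. It is a standard Riccati comparison argument and is not claimed to reproduce the cited computation of [WW] line by line.

```latex
% Explicit comparison functions under the stated normalization:
%   K > 0:  s(t) = \sqrt{N/K}\, \sin( t \sqrt{K/N} ),
%   K = 0:  s(t) = t,
%   K < 0:  s(t) = \sqrt{-N/K}\, \sinh( t \sqrt{-K/N} ).
\begin{align*}
\tilde{u} &= \frac{N s'}{s}, \qquad
\tilde{u}' = \frac{N s''}{s} - \frac{N (s')^2}{s^2}
           = -K - \frac{\tilde{u}^2}{N}, \\
\bigl( s^2 (u - \tilde{u}) \bigr)'
  &= 2 s s' (u - \tilde{u}) + s^2 (u' - \tilde{u}') \\
  &\le s^2 (u - \tilde{u}) \Bigl( \frac{2 \tilde{u}}{N} - \frac{u + \tilde{u}}{N} \Bigr)
   = -\frac{s^2 (u - \tilde{u})^2}{N} \;\le\; 0 .
\end{align*}
% Together with \lim_{t \downarrow 0} s^2 (u - \tilde{u}) = 0 this gives
% s^2 (u - \tilde{u}) \le 0, that is, u \le \tilde{u}.
```

The second line uses $2ss' = 2s^2 \tilde{u}/N$ and $u' - \tilde{u}' \le -(u^2 - \tilde{u}^2)/N$.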

Measure contraction property
A geometric counterpart to the Laplacian comparison is the measure contraction property (see [Oh1], [St2, §5] for the precise definition on metric measure spaces, and [AL1], [LLZ] for related works on sub-Riemannian manifolds). To state it, we fix $c > 0$, $z \in M$, $\alpha \in T^*_z M$ and $\eta : [0, T_\alpha] \longrightarrow M$ as in Theorem 6.2, and take a local coordinate $(x^i)_{i=1}^n$ of a neighborhood $U$ of $\eta(T)$ with $T \in (0, T_\alpha)$ such that $\nabla u^c_z \equiv \partial_{x^1}$. We also introduce $\varsigma : U \longrightarrow \mathbb{R}$ by $m = e^{\varsigma}\, dx^1 \cdots dx^n$. Then the measure contraction property we consider is that the ratio $e^{\varsigma(\eta(t))}/s_{K,N}(t)^N$ is non-increasing in $t$. This is clearly equivalent to Therefore the above measure contraction property is equivalent to the Laplacian comparison (6.1):

Remark 6.4 In [Le2, Theorems 2.2, 2.10], Lee showed convexity estimates of the relative and Rényi entropies along smooth optimal transports. These can be interpreted as generalizations of the curvature-dimension condition (recall the last paragraph in Section 2), and have applications to various monotonicity formulas along flows in Riemannian metrics related to the Ricci flow ([Le2, §2]). Precisely, this approach recovers the curvature-dimension conditions $\mathrm{CD}(K, \infty)$ and $\mathrm{CD}(0, N)$ for Riemannian or Finsler manifolds. The condition $\mathrm{CD}(K, N)$ with $K \ne 0$ and $N < \infty$ is more delicate and seems difficult to treat for general Hamiltonians, for the same reason as in Remark 6.3.
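The equivalence between the measure contraction property and the Laplacian comparison can be sketched by taking a logarithmic derivative. The sketch assumes that $s_{K,N} > 0$ on the interval considered, that $\dot\eta = \nabla u^c_z = \partial_{x^1}$ in the chosen coordinate, and that $\Delta^H_m u = \mathrm{div}_m(\nabla u)$ with the usual coordinate expression of the divergence with respect to $m = e^{\varsigma}\,dx^1 \cdots dx^n$; the last two points are assumptions, since the corresponding displays are not available here.

```latex
% Monotonicity of the ratio is a sign condition on its logarithmic derivative:
\frac{d}{dt}\log\!\left[\frac{e^{\varsigma(\eta(t))}}{s_{K,N}(t)^{N}}\right]
  = (\varsigma\circ\eta)'(t) - N\,\frac{s_{K,N}'(t)}{s_{K,N}(t)} \le 0
\quad\Longleftrightarrow\quad
(\varsigma\circ\eta)'(t) \le N\,\frac{s_{K,N}'(t)}{s_{K,N}(t)}.

% Assumed divergence formula: div_m(V) = \sum_i ( \partial_{x^i} V^i + V^i \partial_{x^i}\varsigma ).
% For V = \nabla u^c_z = \partial_{x^1} (constant coefficients) and \dot\eta = \partial_{x^1}:
\Delta^H_m u^c_z(\eta(t)) = \partial_{x^1}\varsigma(\eta(t)) = (\varsigma\circ\eta)'(t).
```

Under these assumptions the right-hand inequality is exactly the bound $\Delta^H_m u^c_z(\eta(t)) \le N s'_{K,N}(t)/s_{K,N}(t)$ established in the proof of Theorem 6.2.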

Optimal transports and functional inequalities
In this section, we briefly recall some properties of optimal transports measured by Lagrangian cost functions. Then, assuming that the relative entropy is convex along all optimal transports, we obtain functional inequalities along the lines of [OV], [LV2]. See the comprehensive reference [Vi] for optimal transport theory. Let $M$ be compact throughout the section for simplicity, and $L$ be our Lagrangian. For later use, we fix an auxiliary Riemannian metric $g$ of $M$.

Optimal transport theory
For $\mu, \nu \in \mathcal{P}(M)$, we denote by $\Pi(\mu, \nu)$ the set of all couplings of $\mu$ and $\nu$, which is nonempty since the product measure $\mu \times \nu$ is clearly a coupling of $\mu$ and $\nu$. Then the transport cost from $\mu$ to $\nu$ (measured by $L$) is defined as which is finite since $M$ is compact. A coupling $\pi \in \Pi(\mu, \nu)$ achieving the infimum in (7.1) is called an optimal coupling of $\mu$ and $\nu$. Under mild assumptions, there exists a unique optimal coupling whose support is contained in the graph of some map $T : M \longrightarrow M$ (an optimal transport from $\mu$ to $\nu$). This fundamental fact was first shown for the canonical Lagrangian on Euclidean spaces by Brenier [Bre], extended to compact Riemannian manifolds by McCann [Mc], and then to general Lagrangians on noncompact spaces by Bernard and Buffoni [BB], Fathi and Figalli [FF] and Villani [Vi, Chapter 10].
Remark 7.2 Lagrangians are assumed to be $C^2$ on the whole of $TM$ at some places in [Vi]. However, this causes no problem because our Lagrangian fails to be $C^2$ only on the zero section $0_{TM}$, which is isolated in the Euler-Lagrange flow (recall the paragraph following Definition 2.1).

Therefore, if $\mathrm{Ric}^H_\infty(\alpha) \ge K H(\alpha)$ for some $K \in \mathbb{R}$ and all $\alpha \in T^*M$, then we have for later use. Note that $C^H_T = C^L_T$ in the Riemannian and Finsler cases. We say that $\mathrm{Ent}_m$ is $K$-convex for $K \in \mathbb{R}$ and $T > 0$ if (7.3) holds for all (not necessarily smooth) optimal transports $(\mu_t)_{t \in [0,T]}$ as in Theorem 7.1.
Example 7.4 (a) For Finsler manifolds, thanks to the homogeneity, the $K$-convexity for some $T > 0$ implies the $K$-convexity for all $T > 0$. In this case, the $2K$-convexity of $\mathrm{Ent}_m$ with respect to $H_F(\alpha) = F^*(\alpha)^2/2$ is the very definition of the curvature-dimension condition $\mathrm{CD}(K, \infty)$ and is equivalent to is the diameter of $(M, F)$. Therefore $\mathrm{Ent}_m$ is $(pK(\mathrm{diam}_F M/T)^{p-2})$-convex with respect to $H_p$ for $T > 0$ when (1) $K = 0$ and $p \in (1, \infty)$; or (2) $K > 0$ and $p \in (1, 2)$; or (3) $K < 0$ and $p \in (2, \infty)$.
The following proposition will be useful.
Proposition 7.5 (Directional derivatives of $\mathrm{Ent}_m$) Suppose that $\mathrm{Ent}_m$ is $K$-convex for some $K \in \mathbb{R}$ and $T > 0$, and take $\mu = \rho m \in \mathcal{P}(M)$ such that $\rho \in H^1(M; H)$. Then we have, for any optimal transport $\mu_t = (T_t)_\sharp \mu$, $t \in [0, T]$, as in Theorem 7.1,

Proof. It follows from Theorem 7.3 with $f(t) = t \log t$ that (To be precise, we applied Theorem 7.3 to $\max\{f, 0\}$ and $\max\{-f, 0\}$ and took their difference.) Hence, by localizing the assumption (7.3), we find that $-\log(D_m[T_t](x))$ is $K$-convex in $t$ for $\mu$-almost every $x$. Thus the monotone convergence theorem shows that We see by calculation Indeed, observe in the calculation in Subsection 4.3 that $dT_t = B(t)$ and $B(0) = I_n$. Then we have Plugging the effect of the measure $m$ into this yields (7.5) (recall (4.4)).
Notice that the singular part of $\Delta^H_m \varphi$ is non-negative by looking at the second order term of (4.1). Finally, we obtain (7.4) by integration by parts, see Steps 1-3 in [Vi, Theorem 23.14] for details. ✷

For $\mu = \rho m \in \mathcal{P}(M)$, define This quantity can be thought of as the Fisher information adapted to our context.
Theorem 7.6 (Functional inequalities) Assume that $m[M] = 1$ and that $\mathrm{Ent}_m$ is $K$-convex in the sense of (7.3) for some $K, T > 0$.
(ii) We again use (7.6) and find, for $t \in (0, 1]$, Proposition 7.5 yields This completes the proof. ✷

In the Riemannian or Finsler case, the HWI inequality (with $T = 1$) has the sharper form [OT1, §5] and [OT2, §6] for related functional inequalities on different kinds of entropies inspired by information theory.

Heat flow
In this section, we study the evolution equation $\partial_t u = \Delta^H_m u$ regarded as the heat equation associated with our nonlinear weighted Laplacian $\Delta^H_m$. In the Euclidean setting, there are two ways to interpret heat flow as gradient flow. The classical strategy is to consider heat flow as the gradient flow of the Dirichlet energy in the $L^2$-space. The other, initiated by the seminal work of Jordan et al. [JKO], is to consider heat flow as the gradient flow of the relative entropy in the $L^2$-Wasserstein space. These interpretations and their equivalence were generalized to various settings ([Sa], [Oh2], [GO], [Er], [OS1], [Maa], [GKO], [AGS3] etc.), including singular spaces without differentiable structures.
We shall investigate these two strategies for our heat equation and see that they produce different gradient flows, because of the non-homogeneity of the Hamiltonian H.
Let M be compact throughout the section, and fix an auxiliary Riemannian metric g of M similarly to the previous section.

Heat flow as gradient flow of energy
for every test function $\phi \in C^\infty(M)$ (provided that $E(u_t + \varepsilon\phi) < \infty$ for some $\varepsilon > 0$). Therefore, with respect to the $L^2$-inner product structure, $\Delta^H_m u_t$ is the 'gradient vector' of $-E$ at $u_t$ and the heat equation $\partial_t u = \Delta^H_m u$ coincides with the (descending) gradient flow equation of the potential function $E$.
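The heuristic can be made explicit by a formal first variation. The sketch below rests on assumptions not confirmed by the text at hand: that the energy is $E(u) = \int_M H(du)\,dm$, that $\nabla u$ denotes the Legendre transform of $du$ (so that the derivative of $H$ at $du$ in the direction $\beta$ is $\beta(\nabla u)$), and that $\Delta^H_m$ is defined weakly through integration by parts on the compact manifold $M$.

```latex
% Formal first variation of the (assumed) energy E(u) = \int_M H(du)\,dm in the direction \phi:
\frac{d}{d\varepsilon}\Big|_{\varepsilon = 0} E(u_t + \varepsilon\phi)
  = \int_M d\phi(\nabla u_t)\,dm
  = -\int_M \phi\,\Delta^H_m u_t\,dm .
```

Read against the $L^2$-inner product, this says that the derivative of $-E$ at $u_t$ in the direction $\phi$ is the pairing of $\phi$ with $\Delta^H_m u_t$, which is the sense in which $\Delta^H_m u_t$ plays the role of the gradient vector of $-E$.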
Motivated by the above heuristic argument, we shall build weak solutions to $\partial_t u = \Delta^H_m u$ as gradient curves of $E$. In fact, since $E$ is a convex and lower semi-continuous functional on $L^2(M; m)$ (Lemma 4.2), the classical theory due to Brézis, Crandall, Liggett et al. applies (see [Bré], [CL]). For completeness, we repeat the construction in [OS1, §3] concerning the Finsler case with the help of [Oh2]. Because of the author's familiarity, our construction follows the line of [May], which deals with the more general situation of convex functions on nonpositively curved metric spaces.
For $u \in H^1(M; H)$, define It follows from the convexity of $E$ that Thus we have, together with the definition of $|\nabla(-E)|(u)$, The convexity of $E$ also implies that, for any $i, j \ge 1$, we used $\|w_i\|_{L^2} = \|w_j\|_{L^2} = |\nabla(-E)|(u)$. Since the left-hand side is not greater than $|\nabla(-E)|(u)$, we find from (8.2) that $\liminf_{i,j \to \infty} \|w_i + w_j\|_{L^2} \ge 2|\nabla(-E)|(u)$. Combining this with $\|w_i\|_{L^2} = |\nabla(-E)|(u)$ for all $i$ implies that $\{w_i\}_{i \in \mathbb{N}}$ is a Cauchy sequence in $L^2(M; m)$ and converges to some element $w \in L^2(M; m)$. The desired equation (8.1) holds due to (8.2).
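The Cauchy property claimed above is an instance of the parallelogram identity in $L^2(M; m)$. A short sketch, writing $c := |\nabla(-E)|(u)$ (a shorthand introduced here only for brevity):

```latex
% Parallelogram identity, using \|w_i\|_{L^2} = \|w_j\|_{L^2} = c:
\|w_i - w_j\|_{L^2}^{2}
  = 2\|w_i\|_{L^2}^{2} + 2\|w_j\|_{L^2}^{2} - \|w_i + w_j\|_{L^2}^{2}
  = 4c^{2} - \|w_i + w_j\|_{L^2}^{2}.

% Since \liminf_{i,j\to\infty} \|w_i + w_j\|_{L^2} \ge 2c:
\limsup_{i,j\to\infty} \|w_i - w_j\|_{L^2}^{2} \le 4c^{2} - (2c)^{2} = 0 .
```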
The uniqueness follows from the strict convexity of $\|\cdot\|_{L^2}$ and the convexity of $E$. ✷

We excluded the case of $|\nabla(-E)|(u) = 0$ since (8.1) does not necessarily hold with $w = 0$. A gradient curve of $E$ should be understood as a solution to $\partial_t u = \nabla(-E)(u)$. We explain how to formulate and construct it. Fix $u_0 \in H^1(M; H)$ and $\delta > 0$. Denote by $U_\delta(u_0) \in H^1(M; H)$ the unique minimizer of the strictly convex, lower semi-continuous functional Then, as $k \to \infty$, the discrete approximation scheme $(U_{t/k})^k(u_0)$ converges to a curve $(u_t)_{t>0} \subset L^2(M; m)$ such that $\lim_{t \downarrow 0} u_t = u_0$ in $L^2(M; m)$ and $E(u_t) \le E(u_0)$ for all $t > 0$ (we refer to [May, Theorem 1.13] for details). Moreover, the convergence is uniform on $(0, T]$ for each $T > 0$, and $(u_t)_{t \ge 0}$ satisfies the following:
• The curve $(u_t)_{t>0}$ is locally Lipschitz in $L^2(M; m)$ and satisfies at all $t \ge 0$ ([May, Theorems 2.9, 2.17]). In particular, $|\nabla(-E)|(u_t) < \infty$ holds and the gradient vector $\nabla(-E)(u_t) \in L^2(M; m)$ is well-defined for all $t > 0$.
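The discrete scheme described above has the shape of an implicit (backward) Euler method. The following is a sketch under an explicit assumption: the functional minimized by $U_\delta(u_0)$ is not available in the text at hand, and the quadratic penalization below is the form commonly used in minimizing-movement schemes, taken here as a hypothesis.

```latex
% Assumed form of the one-step map (hypothetical; the original display is missing):
U_\delta(u_0) := \operatorname*{arg\,min}_{v \in H^1(M;H)}
  \left\{ E(v) + \frac{\|v - u_0\|_{L^2}^{2}}{2\delta} \right\}.

% Formal first-order condition at the minimizer: a backward Euler step of size \delta
% for the equation \partial_t u = \nabla(-E)(u).
\frac{U_\delta(u_0) - u_0}{\delta} = \nabla(-E)\bigl(U_\delta(u_0)\bigr).

% The gradient curve is the limit of k steps of size t/k:
u_t = \lim_{k\to\infty} (U_{t/k})^{k}(u_0).
```

Under this assumption, strict convexity of the penalized functional comes from the squared $L^2$-norm, while convexity and lower semi-continuity of $E$ give existence of the minimizer.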
With the help of (8.4) and (8.5), the same discussion as Lemma 8.1 (replacing $\hat w_i$ and $w_i$ with $u_{t+\delta} - u_t$ and $(u_{t+\delta} - u_t)/\delta$, respectively) shows that for all $t > 0$, i.e., $\partial_t u_t = \nabla(-E)(u_t)$ in the weak sense. In addition, we obtain the following similarly to [Oh2, Lemma 6.4].
(ii) Moreover, $d\sigma_\delta$ converges to $du_t$ as $\delta \downarrow 0$ in the $L^2$-norm with respect to $g$.
Proof.
(ii) The strict convexity of $H$ and (8.7) show that $\{d\sigma_\delta\}_{\delta>0}$ is a Cauchy sequence and converges to $du_t$, i.e., $\lim_{\delta \downarrow 0} \int_M |d\sigma_\delta - du_t|_g^2\, dm = 0$. (We remark that, however, $\int_M |d\sigma_\delta|_g^2\, dm = \infty$ may happen.) ✷

We are ready to show the main result of the subsection. Due to technical difficulties, we impose the following assumption:

Fix $t > 0$. For small $\delta, \varepsilon > 0$, let us compare $\sigma_\delta := U_\delta(u_t)$ and $\sigma^\varepsilon_\delta := \sigma_\delta + \varepsilon\phi_t$. By the definition of $U_\delta(u_t)$, we have On the one hand, by expanding $\|\sigma_\delta - u_t\|_{L^2}^2$ and $\|\sigma^\varepsilon_\delta - u_t\|_{L^2}^2$, we find On the other hand, we deduce from the convexity of $H$ that Therefore we have, with the help of Lemma 8.3(ii) and (A), To be precise, for each $\varepsilon > 0$, we have by (A) which tends to $0$ as $\delta \downarrow 0$ and then $\varepsilon \downarrow 0$. Together with Lemma 8.3(i), we obtain lim inf By exchanging $\phi$ with $-\phi$, we obtain the desired equation (8.9). For a time-independent test function $\phi \in C^\infty(M)$, (8.6) and (8.9) show that for all $t > 0$. Therefore the distributional Laplacian $\Delta^H_m u_t$ is absolutely continuous with respect to $m$ and the density function is $\nabla(-E)(u_t)$. ✷

Given two solutions $(u_t)_{t \ge 0}$, $(\bar u_t)_{t \ge 0}$ to the heat equation constructed in Theorem 8.4, applying (8.9) with $\phi = u - \bar u$ (this is possible by [OS1, Remark 3.2(i)]), we find at almost every $t > 0$. The convexity of $H$ implies which yields the contraction $(\|u_t - \bar u_t\|_{L^2}^2)' \le 0$ of the heat flow. In particular, if $u_0 = \bar u_0$ $m$-almost everywhere, then $u_t = \bar u_t$ $m$-almost everywhere for all $t > 0$.
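The contraction of the heat flow can be sketched in two lines. The sketch assumes that (8.9) is read in the weak form $\int_M \phi\,\partial_t u_t\,dm = -\int_M d\phi(\nabla u_t)\,dm$ and that $\nabla u$ is the Legendre transform of $du$, that is, the fiberwise derivative of $H$ at $du$; both are assumptions about displays that are not available here.

```latex
% Differentiate the squared L^2-distance and test (8.9) with \phi = u_t - \bar u_t:
\frac{d}{dt}\,\|u_t - \bar u_t\|_{L^2}^{2}
  = 2\int_M (u_t - \bar u_t)\,(\partial_t u_t - \partial_t \bar u_t)\,dm
  = -2\int_M (du_t - d\bar u_t)\bigl(\nabla u_t - \nabla \bar u_t\bigr)\,dm .

% Monotonicity of the derivative of a convex function on each fiber T^*_x M:
(\alpha - \beta)\bigl(\nabla_\alpha - \nabla_\beta\bigr) \ge 0
\quad\text{for } \alpha, \beta \in T^*_x M,
% where \nabla_\alpha \in T_x M denotes the Legendre transform of \alpha (notation local to this sketch).
```

Applying the second line pointwise with $\alpha = du_t$ and $\beta = d\bar u_t$ shows that the integrand in the first line is non-negative, hence the derivative is non-positive.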

Gradient flow of the relative entropy
We next consider the gradient flow of the relative entropy $\mathrm{Ent}_m$ with respect to the Lagrangian cost function $C^L_T$ introduced in Section 7. We will verify that such a flow is produced from weak solutions to the evolution equation which is different from the heat equation $\partial_t u = \Delta^H_m u$ due to the non-homogeneity of the gradient operator $\nabla$.
Let us begin with a discussion on how to define gradient flows. Our strategy follows the metric approach studied intensively in recent years by Ambrosio, Gigli, Savaré and others (see [AGS1]). Given a $C^1$-function $f : M \longrightarrow \mathbb{R}$, the gradient flow of $f$ with respect to $L$ (or $H$) is introduced as the family of maps in $T_{\eta(t)}M$ and $-df_{\eta(t)}(v) \le H(-df_{\eta(t)}) + L(v)$ holds in general, the gradient flow equation (8.11) can be rewritten as $(f \circ \eta)'(t) \le -H(-df_{\eta(t)}) - L(\dot\eta(t))$.
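The rewriting of the gradient flow equation as an inequality is an application of the Fenchel-Young inequality. A minimal sketch, assuming that $L$ is the Legendre dual of $H$, that is, $L(v) = \sup_{\beta}[\beta(v) - H(\beta)]$ on each fiber, with $H$ strictly convex and differentiable at the covector in question:

```latex
% Fenchel-Young inequality for \beta \in T^*_x M and v \in T_x M:
\beta(v) \le H(\beta) + L(v),
\qquad\text{with equality if and only if } v \text{ is the Legendre transform of } \beta .

% Apply it with \beta = -df_{\eta(t)} and v = \dot\eta(t), using (f\circ\eta)'(t) = df_{\eta(t)}(\dot\eta(t)):
(f\circ\eta)'(t) = -\bigl(-df_{\eta(t)}\bigr)(\dot\eta(t))
  \ge -H\bigl(-df_{\eta(t)}\bigr) - L(\dot\eta(t)) .
```

Since the inequality above always holds, requiring the reverse inequality $(f \circ \eta)'(t) \le -H(-df_{\eta(t)}) - L(\dot\eta(t))$ forces equality, which by the equality case means that $\dot\eta(t)$ is the Legendre transform of $-df_{\eta(t)}$.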
Then, it is natural to introduce the following metric definition.
where the supremum is taken over all $g$-Lipschitz functions $\varphi : [t, t+s] \times M \longrightarrow \mathbb{R}$ which are $C^1$ on $(t, t+s) \times M$ and satisfy the inequality (of Hamilton-Jacobi type) $\partial_r \varphi + H(d\varphi) \le 0$ on $(t, t+s) \times M$.
Together with (8.10), we obtain by the bounded convergence theorem (since $\rho_t$ is locally $g$-Lipschitz and $\inf_M \rho_t > 0$) and $\lim_{r \downarrow 0} d\rho_{t+r} = d\rho_t$. ✷

Combining the above three estimates, we immediately obtain the following.
Theorem 8.7 (Gradient flow of $\mathrm{Ent}_m$) Assume that $\mathrm{Ent}_m$ is $K$-convex in the sense of (7.3) for some $K \in \mathbb{R}$ and all $T > 0$. Let $(\rho_t)_{t \ge 0}$ be a weak solution to the equation (8.10) satisfying (C) and $\mu_t := \rho_t m \in \mathcal{P}(M)$ for all $t \ge 0$. Then $(\mu_t)_{t \ge 0}$ is a gradient curve of $\mathrm{Ent}_m$ in the sense of (8.12).
Proof. Since $d\rho_t/\rho_t = d[\log \rho_t]$, we deduce from (8.14), (8.13) and Lemma 8.6 that This completes the proof. ✷

Example 8.8 For the $p$-homogeneous deformation $H = |\cdot|^p_g/p$ of a Riemannian metric $g$, the equation (8.10) turns out This recovers the case of $L(v) = |v|^q/q$ on $\mathbb{R}^n$ studied in [AGS1, §11.3] and [Agu], in which more general entropy functionals are investigated. For instance, employing the Rényi-Tsallis entropy

Therefore we have, by (2.3), $H_{\alpha_k \alpha_l}(\alpha(t))\, \xi^l_i(t)\, \xi^k_j(t) = \delta_{ij}$.

✷
One can construct a canonical frame from an orthonormal frame as follows.
Proposition A.2 (see [Le2, Proposition 3.2]) Given a frame $\bar E_t = (\bar e^t_i)_{i=1}^n$ orthonormal in the sense of Lemma A.1, put $\Omega^t_{ij} := \omega(\dot{\bar e}^t_i, \dot{\bar e}^t_j)$ and let $O_t$ be the solution to Then $E_t = (e^t_i)_{i=1}^n$ with $e^t_i := \sum_{j=1}^n O^t_{ij}\, \bar e^t_j$ gives a canonical frame.
Proof. We will sometimes omit the evaluations at time $t$ for brevity. Then it follows from (the proof of) Lemma 3.5 and $\omega(\dot{\bar E}, \bar E) \equiv I_n$ that $\omega(\dot E_t, E_t) \equiv I_n$. It remains to show $\ddot e^t_i \in J^t_\alpha$. We have This implies $\ddot e^t_i \in J^t_\alpha$ because the intersection of the kernels of $\omega(\cdot, \bar e^t_j)$, $j = 1, \ldots, n$, coincides with $J^t_\alpha$. ✷

A.2 Coordinate representation
We compute the curvature operator in coordinates by using the canonical frame in the previous subsection. First of all, Proposition A.2 yields the following (see [Le2, Proposition 3.3]): We choose a coordinate satisfying $H_{\alpha_i \alpha_j}(\alpha) = \delta_{ij}$ for all $i, j$, so that we can take $\xi_i(0) = \partial_{\alpha_i}$. Then it holds that, by the calculation in Lemma A.1, Now, for further simplification, let us assume $H_{\alpha_i \alpha_j}(\alpha(t)) = \delta_{ij}$ and put $\xi_i(t) = \partial_{\alpha_i}$ for all $t$. It is always possible to choose such a coordinate by noticing that gives an inner product of $T_{\eta(t)}M$ for each $t$ and by taking $(x^i)_{i=1}^n$ such that $(\partial_{x^i})_{i=1}^n$ is orthonormal with respect to this inner product at every $\eta(t)$. Then we have, by omitting evaluations at $\alpha$ and $t = 0$ for brevity, Combining this with (A.1), (A.2) and (A.3), we see that the horizontal part of $R^0_\alpha(e^0_i)$ indeed vanishes (so that $R^0_\alpha(e^0_i) \in V_\alpha$). We finally calculate Consequently, we obtain that the coefficient of $\partial_{\alpha_j} = e^0_j$ in $R^0_\alpha(e^0_i)$ is Note that this is indeed symmetric in $i$ and $j$.