Some trace inequalities for exponential and logarithmic functions

Consider a function $F(X,Y)$ of pairs of positive matrices with values in the positive matrices such that whenever $X$ and $Y$ commute $F(X,Y)= X^pY^q.$ Our first main result gives conditions on $F$ such that ${\rm Tr}[ X \log (F(Z,Y))] \leq {\rm Tr}[X(p\log X + q \log Y)]$ for all $X,Y,Z$ such that ${\rm Tr} Z = {\rm Tr} X$. (Note that $Z$ is absent from the right side of the inequality.) We give several examples of functions $F$ to which the theorem applies. Our theorem allows us to give simple proofs of the well-known logarithmic inequalities of Hiai and Petz and several new generalizations of them which involve three variables $X,Y,Z$ instead of just $X,Y$ alone. The investigation of these logarithmic inequalities is closely connected with three quantum relative entropy functionals: the standard Umegaki quantum relative entropy $D(X||Y) = {\rm Tr} [X(\log X-\log Y)]$, and two others, the Donald relative entropy $D_D(X||Y)$, and the Belavkin-Staszewski relative entropy $D_{BS}(X||Y)$. They are known to satisfy $D_D(X||Y) \leq D(X||Y)\leq D_{BS}(X||Y)$. We prove that the Donald relative entropy provides the sharp upper bound, independent of $Z$, on ${\rm Tr}[ X \log (F(Z,Y))]$ in a number of cases in which $F(Z,Y)$ is homogeneous of degree $1$ in $Z$ and $-1$ in $Y$. We also investigate the Legendre transforms in $X$ of $D_D(X||Y)$ and $D_{BS}(X||Y)$, and show how our results for these lead to new refinements of the Golden-Thompson inequality.


Introduction
Let $M_n$ denote the set of complex $n \times n$ matrices. Let $P_n$ and $H_n$ denote the subsets of $M_n$ consisting of strictly positive and self-adjoint matrices respectively. For $X, Y \in H_n$, $X \geq Y$ indicates that $X - Y$ is positive semi-definite; i.e., in the closure of $P_n$, and $X > Y$ indicates that $X - Y \in P_n$.
Footnotes: 1. Work partially supported by U.S. National Science Foundation grant DMS 1501007. 2. Work partially supported by U.S. National Science Foundation grant PHY 1265118. © 2017 by the authors. This paper may be reproduced, in its entirety, for non-commercial purposes.
Let $p$ and $q$ be non-zero real numbers. There are many functions $F : P_n \times P_n \to P_n$ such that $F(X,Y) = X^pY^q$ whenever $X$ and $Y$ commute. For example, $$F(X,Y) = X^{p/2}Y^qX^{p/2} \quad {\rm or} \quad F(X,Y) = Y^{q/2}X^pY^{q/2}. \qquad (1.1)$$ Further examples can be constructed using geometric means: For positive $n\times n$ matrices $X$ and $Y$, and $t\in[0,1]$, the $t$-geometric mean of $X$ and $Y$, denoted by $X\#_tY$, is defined by Kubo and Ando [26] to be $$X\#_tY := X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2}. \qquad (1.2)$$ The geometric mean for $t = 1/2$ was initially defined and studied by Pusz and Woronowicz [36]. The formula (1.2) makes sense for all $t\in\mathbb{R}$ and it has a natural geometric meaning [40]; see the discussion around Definition 2.4 and in Appendix C. Then for all $r>0$ and all $t\in(0,1)$, $$F(X,Y) = X^r\#_tY^r \qquad (1.3)$$ is such a function with $p = r(1-t)$ and $q = rt$. Other examples will be considered below.
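The definition (1.2) is easy to experiment with numerically. The following is a minimal sketch, not part of the paper: the helper names `herm_fun` and `geo_mean` are hypothetical, and matrix functions are computed from an eigendecomposition. It checks that $X^r\#_tY^r$ reduces to $X^{r(1-t)}Y^{rt}$ for a commuting pair, and that $X\#_tY = Y\#_{1-t}X$ for a non-commuting pair.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def geo_mean(X, Y, t):
    """X #_t Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, as in (1.2)."""
    Xh = herm_fun(X, np.sqrt)
    Xmh = herm_fun(X, lambda w: w ** -0.5)
    M = Xmh @ Y @ Xmh
    M = (M + M.conj().T) / 2  # remove rounding asymmetry before eigh
    return Xh @ herm_fun(M, lambda w: w ** t) @ Xh

rng = np.random.default_rng(0)
n, r, t = 3, 1.7, 0.3

# Commuting pair: common eigenvectors U, positive eigenvalues a and b.
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
a, b = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
X = (U * a) @ U.conj().T
Y = (U * b) @ U.conj().T

F = geo_mean(herm_fun(X, lambda w: w ** r), herm_fun(Y, lambda w: w ** r), t)
expected = (U * (a ** (r * (1 - t)) * b ** (r * t))) @ U.conj().T
commuting_gap = np.linalg.norm(F - expected)

# Non-commuting pair: symmetry X #_t Z = Z #_{1-t} X.
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Z = G @ G.conj().T + 0.1 * np.eye(n)
symmetry_gap = np.linalg.norm(geo_mean(X, Z, t) - geo_mean(Z, X, 1 - t))
```

Both gaps should be at the level of rounding error; the random seed and the matrix size are arbitrary choices.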
If $F$ is such a function, then ${\rm Tr}[X\log F(X,Y)] = {\rm Tr}[X(p\log X + q\log Y)]$ whenever $X$ and $Y$ commute. We are interested in conditions on $F$ that guarantee either $${\rm Tr}[X\log F(X,Y)] \geq {\rm Tr}[X(p\log X + q\log Y)] \qquad (1.4)$$ or $${\rm Tr}[X\log F(X,Y)] \leq {\rm Tr}[X(p\log X + q\log Y)] \qquad (1.5)$$ for all $X,Y\in P_n$. Some examples of such inequalities are known: Hiai and Petz [23] proved that $$\frac{1}{p}{\rm Tr}[X\log(Y^{p/2}X^pY^{p/2})] \leq {\rm Tr}[X(\log X + \log Y)] \leq \frac{1}{p}{\rm Tr}[X\log(X^{p/2}Y^pX^{p/2})] \qquad (1.6)$$ for all $X,Y>0$ and all $p>0$. Replacing $Y$ by $Y^{q/p}$ shows that for $F(X,Y) = X^{p/2}Y^qX^{p/2}$, (1.4) is valid, while for $F(X,Y) = Y^{q/2}X^pY^{q/2}$, (1.5) is valid. Remarkably, the effects of noncommutativity go in different directions in these two examples. Other examples involving functions $F$ of the form (1.3) have been proved by Ando and Hiai [2].
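The two-sided bound (1.6) can be spot-checked on random positive matrices. The sketch below is illustrative only: `herm_fun`, `rand_pos` and `sandwich_log_trace` are hypothetical helper names, and a finite random sample does not prove anything; it only shows the ordering of the three quantities for a few values of $p$.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_pos(n, rng):
    """Random strictly positive n x n matrix (assumed test ensemble)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

def sandwich_log_trace(X, A, B, p):
    """(1/p) Tr[ X log( A^{p/2} B^p A^{p/2} ) ]."""
    Ah = herm_fun(A, lambda w: w ** (p / 2))
    M = Ah @ herm_fun(B, lambda w: w ** p) @ Ah
    M = (M + M.conj().T) / 2
    return np.trace(X @ herm_fun(M, np.log)).real / p

rng = np.random.default_rng(1)
results = []
for p in (0.5, 1.0, 2.0):
    X, Y = rand_pos(3, rng), rand_pos(3, rng)
    lower = sandwich_log_trace(X, Y, X, p)   # uses Y^{p/2} X^p Y^{p/2}
    middle = np.trace(X @ (herm_fun(X, np.log) + herm_fun(Y, np.log))).real
    upper = sandwich_log_trace(X, X, Y, p)   # uses X^{p/2} Y^p X^{p/2}
    results.append((lower, middle, upper))
```

Each triple in `results` should be non-decreasing from left to right, in line with (1.6).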
Here we prove several new inequalities of this type, and we also strengthen the results cited above by bringing in a third operator $Z$: For example, Theorem 1.4 replaces $X^p$ by $Z^p$ on the left side of (1.6), for any $Z \in P_n$ with ${\rm Tr}[Z] = {\rm Tr}[X]$. When $Y$ and $Z$ commute, the resulting inequality reduces to ${\rm Tr}[X\log Z] \leq {\rm Tr}[X\log X]$; this persists in the non-commutative case, and we obtain similar results for other choices of $F$, in particular for those defined in terms of geometric means. One of the reasons that inequalities of this sort are of interest is their connection with quantum relative entropy. By taking $Y = W^{-1}$, with $X$ and $W$ both having unit trace, so that both $X$ and $W$ are density matrices, the middle quantity in (1.6), ${\rm Tr}[X(\log X - \log W)]$, is the Umegaki relative entropy of $X$ with respect to $W$. Thus (1.6) provides upper and lower bounds on the relative entropy.
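There is another source of interest in the inequalities (1.6), which Hiai and Petz refer to as logarithmic inequalities. As they point out, logarithmic inequalities are dual, via the Legendre transform, to certain exponential inequalities related to the Golden-Thompson inequality. Indeed, the quantum Gibbs variational principle states that $$\sup\{ {\rm Tr}[XH] - {\rm Tr}[X(\log X - \log W)] \ :\ X \in P_n,\ {\rm Tr}[X] = 1 \} = \log({\rm Tr}[e^{H+\log W}]) \qquad (1.8)$$ for all $H\in H_n$ and all $W\in P_n$. Combining this with the inequality on the right in (1.6) gives $$\sup\{ {\rm Tr}[XH] - {\rm Tr}[X\log(X^{1/2}W^{-1}X^{1/2})] \ :\ X \in P_n,\ {\rm Tr}[X] = 1 \} \leq \log({\rm Tr}[e^{H+\log W}]). \qquad (1.9)$$ The left side of (1.9) provides a lower bound for $\log({\rm Tr}[e^{H+\log W}])$ in terms of a Legendre transform, which, unfortunately, cannot be evaluated explicitly.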
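As a numerical illustration of the Gibbs variational principle in its standard form (the supremum of ${\rm Tr}[XH] - {\rm Tr}[X(\log X - \log W)]$ over density matrices $X$ equals $\log {\rm Tr}[e^{H+\log W}]$, attained at $X = e^{H+\log W}/{\rm Tr}[e^{H+\log W}]$), the sketch below compares the functional at the optimizer and at random density matrices. The helper names are hypothetical and the sampling scheme is an arbitrary choice.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_pos(n, rng):
    """Random strictly positive n x n matrix (assumed test ensemble)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / 2              # random self-adjoint H
W = rand_pos(n, rng)
logW = herm_fun(W, np.log)

K = H + logW
K = (K + K.conj().T) / 2
expK = herm_fun(K, np.exp)
target = np.log(np.trace(expK).real)  # log Tr[exp(H + log W)]

def gibbs_functional(X):
    """Tr[XH] - Tr[X(log X - log W)] for a density matrix X."""
    return np.trace(X @ H - X @ (herm_fun(X, np.log) - logW)).real

X_opt = expK / np.trace(expK).real
gap_at_optimum = abs(gibbs_functional(X_opt) - target)

trial_values = []
for _ in range(20):
    X = rand_pos(n, rng)
    trial_values.append(gibbs_functional(X / np.trace(X).real))
```

The value at `X_opt` should match `target` up to rounding, and no entry of `trial_values` should exceed it.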
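An alternate use of the inequality on the right in (1.6) does yield an explicit lower bound on $\log({\rm Tr}[e^{H+\log W}])$ in terms of a geometric mean of $e^H$ and $W$. This was done in [23]; the bound is $${\rm Tr}[(e^{rH}\#_te^{rK})^{1/r}] \leq {\rm Tr}[e^{(1-t)H+tK}], \qquad (1.10)$$ which is valid for all $H,K\in H_n$, all $r>0$ and all $t\in[0,1]$. This inequality is viewed in [23] as a complement to the Golden-Thompson inequality.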
Hiai and Petz show [23, Theorem 2.1] that the inequality (1.10) is equivalent to the inequality on the right in (1.6). One direction in proving the equivalence, starting from (1.10), is a simple differentiation argument; differentiating (1.10) at t = 0 yields the result. While the inequality on the left in (1.6) is relatively simple to prove, the one on the right appears to be deeper and more difficult to prove, from the perspective of [23].
In our paper we prove a number of new inequalities, some of which strengthen and extend (1.6) and (1.10). Our results show, in particular, that the geometric mean provides a natural bridge between the pair of inequalities (1.6). This perspective yields a fairly simple proof of the deeper inequality on the right of (1.6), and thereby places the appearance of the geometric mean in (1.10) in a natural context.
Before stating our results precisely, we recall the notions of operator concavity and operator convexity. A function $F : P_n \to H_n$ is concave in case for all $X,Y\in P_n$ and all $t\in[0,1]$, $$F((1-t)X + tY) - (1-t)F(X) - tF(Y) \geq 0,$$ and $F$ is convex in case $-F$ is concave. For example, $F(X) := X^p$ is concave for $p\in[0,1]$ as is $F(X) := \log X$.
A function $F : P_n\times P_n \to H_n$ is jointly concave in case for all $X,Y,W,Z\in P_n$ and all $t\in[0,1]$ $$F((1-t)X + tY, (1-t)Z + tW) - (1-t)F(X,Z) - tF(Y,W) \geq 0,$$ and $F$ is jointly convex in case $-F$ is jointly concave. Strict concavity or convexity means that the left side is never zero for any $t\in(0,1)$ unless $X = Y$ and $Z = W$. A particularly well-known and important example is provided by the generalized geometric means. By a theorem of Kubo and Ando [26], for each $t\in[0,1]$, $(X,Y)\mapsto X\#_tY$ is jointly concave. Other examples of jointly concave functions are discussed below. Our first main result is the following:
1.1 THEOREM. Let $F : P_n\times P_n\to P_n$ be such that:
(1) For each fixed $Y\in P_n$, $X\mapsto F(X,Y)$ is concave, and for all $\lambda>0$, $F(\lambda X,Y) = \lambda F(X,Y)$.
(2) For each $n\times n$ unitary matrix $U$, and each $X,Y\in P_n$, $$F(UXU^*, UYU^*) = UF(X,Y)U^*.$$
(3) For some $q\in\mathbb{R}$, $F(X,Y) = XY^q$ whenever $X$ and $Y$ commute.
Then, for all $X,Y,Z\in P_n$ such that ${\rm Tr}[Z] = {\rm Tr}[X]$, $${\rm Tr}[X\log(F(Z,Y))] \leq {\rm Tr}[X(\log X + q\log Y)]. \qquad (1.12)$$ If, moreover, $X\mapsto F(X,Y)$ is strictly concave, then the inequality in (1.12) is strict when $Z$ and $Y$ do not commute.
$d\lambda$, which evidently satisfies the conditions of Theorem 1.1 with $q = -1$. We obtain, thereby, the following inequality: Another simple application can be made to the function $F(X,Y) = Y^{1/2}XY^{1/2}$; however, in this case, an adaptation of the method of proof of Theorem 1.1 yields a more general result for the one-parameter family of functions $F(X,Y) = Y^{p/2}X^pY^{p/2}$, $p>0$.
1.4 THEOREM. For all $X,Y,Z\in P_n$ such that ${\rm Tr}[Z] = {\rm Tr}[X]$, and all $p>0$, $$\frac{1}{p}{\rm Tr}[X\log(Y^{p/2}Z^pY^{p/2})] \leq {\rm Tr}[X(\log X + \log Y)]. \qquad (1.14)$$ The inequality in (1.14) is strict unless $Z$ and $Y$ commute.
Specializing to the case $Z = X$, (1.14) reduces to the inequality on the left in (1.6). Theorem 1.4 thus extends the inequality of [23] by inclusion of the third variable $Z$, and specifies the cases of equality there. We also obtain results for the two-parameter family of functions $$F(X,Y) = Y^r\#_sX^r \qquad (1.15)$$ with $s\in[0,1]$ and $r>0$. In this case, when $X$ and $Y$ commute, $F(X,Y) = X^pY^q$ with $p = rs$ and $q = r(1-s)$. It would be possible to deduce at least some of these results directly from Theorem 1.1 if we knew that, for example,
1.6 THEOREM. For all $X,Y,Z\in P_n$ such that ${\rm Tr}[Z] = {\rm Tr}[X]$, $${\rm Tr}[X\log(Y^r\#_sZ^r)] \leq {\rm Tr}[X(s\log X^r + (1-s)\log Y^r)] \qquad (1.16)$$ is valid for all $s\in[0,1]$ and all $r>0$. For $s\in(0,1)$, when $Z$ does not commute with $Y$, the inequality is strict.
The case in which $Z = X$ is proved in [2] using log-majorization methods. The inequality (1.16) is an identity at $s = 1$. As we shall show, differentiating it at $s = 1$ in the case $Z = X$ yields the inequality on the right in (1.6). Since the geometric mean inequality (1.16) is a consequence of our generalization of the inequality on the left in (1.6), this derivation shows how the geometric means construction 'bridges' the pair of inequalities (1.6).
Theorems 1.3, 1.4 and 1.6 provide infinitely many new lower bounds on the Umegaki relative entropy, one for each choice of $Z$. The trace functional on the right side of (1.6) bounds the Umegaki relative entropy from above, and is in many ways better behaved than the trace functional on the left, or any of the individual new lower bounds. By a theorem of Fujii and Kamei [17], $X,W\mapsto X^{1/2}\log(X^{1/2}W^{-1}X^{1/2})X^{1/2}$ is jointly convex as a function from $P_n\times P_n$ to $P_n$, and then as a trivial consequence, $X,W\mapsto {\rm Tr}[X\log(X^{1/2}W^{-1}X^{1/2})]$ is jointly convex. When $X$ and $W$ are density matrices, ${\rm Tr}[X\log(X^{1/2}W^{-1}X^{1/2})] =: D_{BS}(X||W)$ is the Belavkin-Staszewski relative entropy [6]. The joint convexity of the Umegaki relative entropy is a theorem of Lindblad [32], who deduced it as a direct consequence of the main concavity theorem in [30].
A seemingly small change in the arrangement of the operators, with $X^{1/2}W^{-1}X^{1/2}$ replaced by $W^{-1/2}XW^{-1/2}$, obliterates convexity: $$X,W\mapsto {\rm Tr}[X\log(W^{-1/2}XW^{-1/2})] \qquad (1.17)$$ is not jointly convex, and even worse, the function $W\mapsto {\rm Tr}[X\log(W^{-1/2}XW^{-1/2})]$ is not convex for all fixed $X\in P_n$. Therefore, although the function in (1.17) agrees with the Umegaki relative entropy when $X$ and $W$ commute, its lack of convexity makes it unsuitable for consideration as a relative entropy functional. We discuss the failure of convexity at the end of Section 3. However, Theorem 1.4 provides a remedy by introducing a third variable $Z$ with respect to which we can maximize. The resulting functional is still bounded above by the Umegaki relative entropy: that is, for all density matrices $X$ and $W$, $$\sup\{ {\rm Tr}[X\log(W^{-1/2}ZW^{-1/2})] \ :\ Z\in P_n,\ {\rm Tr}[Z] = {\rm Tr}[X] \} \leq D(X||W). \qquad (1.18)$$ One might hope that the left side is a jointly convex function of $X$ and $W$, which does turn out to be the case. In fact, the left hand side is a quantum relative entropy originally introduced by Donald [14], through a quite different formula. Given any orthonormal basis $\{u_1,\dots,u_n\}$ of $\mathbb{C}^n$, define a "pinching" map $\Phi : M_n\to M_n$ by defining $\Phi(X)$ to be the diagonal matrix whose $j$th diagonal entry is $\langle u_j, Xu_j\rangle$. Let $\mathcal{P}$ denote the set of all such pinching operations. For density matrices $X$ and $Y$, the Donald relative entropy $D_D(X||Y)$ is defined by $$D_D(X||Y) = \sup\{ D(\Phi(X)||\Phi(Y)) \ :\ \Phi\in\mathcal{P} \}. \qquad (1.19)$$ Hiai and Petz [23] showed that for all density matrices $X$ and all $Y\in P_n$, arguing as follows. Fix any orthonormal basis $\{u_1,\dots,u_n\}$ of $\mathbb{C}^n$. Let $X$ be any density matrix and let $Y$ be any positive matrix. Define $x_j = \langle u_j, Xu_j\rangle$ and $y_j = \langle u_j, Yu_j\rangle$ for $j = 1,\dots,n$. For $(h_1,\dots,h_n)\in\mathbb{R}^n$, define $H$ to be the self-adjoint operator given by $Hu_j = h_ju_j$, $j = 1,\dots,n$. Then by the classical Gibbs variational principle.
By the joint convexity of the Umegaki relative entropy, for each $\Phi\in\mathcal{P}$, $D(\Phi(X)||\Phi(Y))$ is jointly convex in $X$ and $Y$, and then since the supremum of a family of convex functions is convex, the Donald relative entropy $D_D(X||Y)$ is jointly convex. Making the change of variables $Z = W^{1/2}e^HW^{1/2}$ in (1.18), one sees that the supremum in (1.20) is exactly the same as the supremum in (1.21), and thus for all density matrices $X$ and $W$, $D_D(X||W) \leq D(X||W)$, which can also be seen as a consequence of the joint convexity of the Umegaki relative entropy. ${\rm Tr}[X\log(Y^{-1}\#_{1/2}Z)^2]$ (1.23) Proposition 3.1 shows that both of the suprema are equal to $D_D(X||Y)$.
Our next results concern the partial Legendre transforms of the three relative entropies $D_D(X||Y)$, $D(X||Y)$ and $D_{BS}(X||Y)$. For this, it is natural to consider them as functions on $P_n\times P_n$, and not only on density matrices. The natural extension of the Umegaki relative entropy functional to $P_n\times P_n$ is $$D(X||W) := {\rm Tr}[X(\log X - \log W)] + {\rm Tr}[W] - {\rm Tr}[X].$$ It is homogeneous of degree one in $X$ and $W$ and, with this definition, $D(X||W) \geq 0$ with equality only in case $X = W$, which is a consequence of Klein's inequality, as discussed in Appendix A. The natural extension of the Belavkin-Staszewski relative entropy functional to $P_n\times P_n$ is $$D_{BS}(X||W) := {\rm Tr}[X\log(X^{1/2}W^{-1}X^{1/2})] + {\rm Tr}[W] - {\rm Tr}[X].$$ To avoid repetition, it is useful to note that all three of these functionals are examples of quantum relative entropy functionals in the sense of satisfying the following axioms. This axiomatization differs from many others, such as the ones in [14] and [18], which are designed to single out the Umegaki relative entropy.

DEFINITION.
A quantum relative entropy is a function $R(X||W)$ on $P_n\times P_n$ with values in $[0,\infty]$ such that (2) For all $X,W\in P_n$ and all $\lambda>0$, $R(\lambda X||\lambda W) = \lambda R(X||W)$ and (1.27) where $\|\cdot\|_1$ denotes the trace norm.
The proof is given towards the end of Section 3. It is known for the Umegaki relative entropy [21], but the proof uses only the properties (1), (2) and (3).
The following pair of inequalities summarizes the relation among the three relative entropies. For all $X,W\in P_n$, $$D_D(X||W) \leq D(X||W) \leq D_{BS}(X||W).$$ These inequalities will imply a corresponding pair of inequalities for the partial Legendre transforms in $X$.
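The ordering of the three entropies can be probed numerically for density matrices. The sketch below is an illustration under stated assumptions: it does not compute the supremum defining $D_D$, but only evaluates $D(\Phi(X)||\Phi(W))$ for a handful of random pinchings $\Phi$, each of which is a lower bound for $D_D(X||W)$; for $D_{BS}$ it uses the density-matrix formula ${\rm Tr}[X\log(X^{1/2}W^{-1}X^{1/2})]$. All function names are hypothetical.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_density(n, rng):
    """Random strictly positive matrix with unit trace (assumed test ensemble)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    P = G @ G.conj().T + 0.1 * np.eye(n)
    return P / np.trace(P).real

def umegaki(X, W):
    return np.trace(X @ (herm_fun(X, np.log) - herm_fun(W, np.log))).real

def belavkin_staszewski(X, W):
    Xh = herm_fun(X, np.sqrt)
    M = Xh @ np.linalg.inv(W) @ Xh
    M = (M + M.conj().T) / 2
    return np.trace(X @ herm_fun(M, np.log)).real

def pinched_entropy(X, W, U):
    """Classical relative entropy of the diagonals of X and W in the basis given by the columns of U."""
    x = np.real(np.diag(U.conj().T @ X @ U))
    w = np.real(np.diag(U.conj().T @ W @ U))
    return float(np.sum(x * (np.log(x) - np.log(w))))

rng = np.random.default_rng(4)
n = 3
X, W = rand_density(n, rng), rand_density(n, rng)
D = umegaki(X, W)
D_BS = belavkin_staszewski(X, W)

pinched = []
for _ in range(50):
    U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    pinched.append(pinched_entropy(X, W, U))
best_pinching = max(pinched)  # a lower bound for D_D(X||W), not its exact value
```

One expects `best_pinching <= D <= D_BS`, with strict inequalities for generic non-commuting inputs.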
1.10 Remark. The partial Legendre transform of the relative entropy, which figures in the Gibbs variational principle, is in many ways better behaved than the full Legendre transform. Indeed the Legendre transform $F^*$ of a function $F$ on $\mathbb{R}^n$ that is convex and homogeneous of degree one always has the form $$F^*(y) = \begin{cases} 0 & y\in C \\ \infty & y\notin C \end{cases}$$ for some convex set $C$ [38]. The set $C$ figuring in the full Legendre transform of the Umegaki relative entropy was first computed by Pusz and Woronowicz [37], and somewhat more explicitly by Donald in [14].
Consider any function R(X||Y ) on P n × P n that is convex and lower semicontinuous in X. There are two natural partial Legendre transforms that are related to each other, namely Φ R (H, Y ) and Ψ R (H, Y ) defined by where H ∈ H n is the conjugate variable to X. For example, let R(X||Y ) = D(X||Y ) , the Umegaki relative entropy. Then, by the Gibbs variational principle, Let R(X||Y ) be any function on P n ×P n that is convex and lower semicontinuous in X, and which satisfies the scaling relation (1.27). Then for all H ∈ H n and all Y ∈ P n .
This simple relation between the two Legendre transforms is a consequence of scaling, and hence the corresponding relation holds for any quantum relative entropy.
Consider the Donald relative entropy and define In Lemma 3.7, we prove the following analog of (1.32): For H ∈ H n and Y ∈ P n , where for any self-adjoint operator K, λ max (K) is the largest eigenvalue of K, and we prove that The inequality on the right in (1.39) arises through the simple choice Q = e H /Tr[Y e H ] in the variational formula for Ψ D (H, Y ). The Q chosen here is optimal only when H and Y commute. Otherwise, there is a better choice for Q, which we shall identify in Section 4, and which will lead to a tighter upper bound. In Section 4 we shall also discuss the Legendre transform of the Belavkin-Staszewski relative entropy, and from this we derive further refinements of the Golden-Thompson inequality. Finally, in Theorem 4.3 we prove a sharpened form of (1.10), the complementary Golden-Thompson inequality of Hiai and Petz, incorporating a relative entropy remainder term. Three appendices collect background material for the convenience of the reader.

Proof of Theorem 1.1 and Related Inequalities
Proof of Theorem 1.1. Our goal is to prove that for all X, Y, Z ∈ P n such that Tr whenever F has the properties (1), (2) and (3) By the Peierls-Bogoliubov inequality (A.3), it suffices to prove that Let J denote an arbitrary finite index set with cardinality |J |. Let U = {U 1 , . . . , U |J | } be any set of unitary matrices each of which commutes with Y . Then for each j ∈ J , by (2) Recall that W → Tr[e H+log W ] is concave [30]. Using this, the concavity of Z → F (Z, Y ) specified in (1), and the monotonicity of the logarithm, averaging both sides of (2.4) over j yields Now making an appropriate choice of U [13], Z becomes the "pinching" of Z with respect to Y ; i.e., the orthogonal projection in M n onto the * -subalgebra generated by Y and 1. In this case, Z and Y commute so that by (3), Altogether, and this proves (2.3).
For the case F (X, Y ) = Y p/2 X p Y p/2 , we can make a similar use of the Peierls-Bogoliubov inequality but can avoid the appeal to convexity.
Proof of Theorem 1.4. The inequality we seek to prove is equivalent to and again by the Peierls-Bogoliubov inequality it suffices to prove that A refined version of the Golden-Thompson inequality due to Friedland and So [16] says that for all positive $A$, $B$, and all $r>0$, $${\rm Tr}[e^{\log A + \log B}] \leq {\rm Tr}[(A^{r/2}B^rA^{r/2})^{1/r}], \qquad (2.7)$$ and moreover the right hand side is a strictly increasing function of $r$, unless $A$ and $B$ commute, in which case it is constant in $r$. The fact that the right side of (2.7) is increasing in $r$ is a consequence of the Araki-Lieb-Thirring inequality [4], but here we shall need to know that the increase is strict when $A$ and $B$ do not commute; this is the contribution of [16]. Applying (2.7) with $r = p$, By the condition for equality in (2.7), there is equality in (2.8) if and only if $(Y^{p/2}Z^pY^{p/2})^{1/p}$ and $Y$ commute, and evidently this is the case if and only if $Z$ and $Y$ commute.
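The three-variable inequality of Theorem 1.4 can be illustrated numerically. The sketch below reads (1.14) as $\frac{1}{p}{\rm Tr}[X\log(Y^{p/2}Z^pY^{p/2})] \leq {\rm Tr}[X(\log X + \log Y)]$ under the constraint ${\rm Tr}[Z] = {\rm Tr}[X]$, which is the form consistent with the abstract and with the reduction to (1.6) at $Z = X$. The helper names and the random test ensemble are assumptions made only for this example.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_pos(n, rng):
    """Random strictly positive n x n matrix (assumed test ensemble)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

rng = np.random.default_rng(3)
checks = []
for p in (0.5, 1.0, 2.0, 3.0):
    X, Y, Z = (rand_pos(3, rng) for _ in range(3))
    Z = Z * (np.trace(X).real / np.trace(Z).real)  # enforce Tr[Z] = Tr[X]
    Yh = herm_fun(Y, lambda w: w ** (p / 2))
    M = Yh @ herm_fun(Z, lambda w: w ** p) @ Yh
    M = (M + M.conj().T) / 2
    lhs = np.trace(X @ herm_fun(M, np.log)).real / p
    rhs = np.trace(X @ (herm_fun(X, np.log) + herm_fun(Y, np.log))).real
    checks.append((p, lhs, rhs))
```

In every entry of `checks` the left value should not exceed the right value; note that the right value does not depend on $Z$.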
In the one parameter family of inequalities provided by Theorem 1.4, some are stronger than others. It is worth noting that the lower the value of p > 0 in (1.14) the stronger this inequality is, in the following sense: 2.1 PROPOSITION. The validity of (1.14) for p = p 1 and for p = p 2 implies its validity for p = p 1 + p 2 .
Proof. Since there is no constraint on Y other than that Y is positive, we may replace Y by any power of Y . Therefore, it is equivalent to prove that for all X, Y, Z ∈ P n such that Tr[Z] = Tr[X] and all p > 0, If (2.9) is valid for p = p 1 and for p = p 2 , then it is also valid for p = p 1 + p 2 : One more application of (2.9), this time with p = p 2 , yields By the last line of Theorem 1.4, the inequality (2.10) is strict if Z and Y do not commute and at least one of p 1 or p 2 belongs to (0, 1).
Our next goal is to prove Theorem 1.6. As indicated in the Introduction, we will show that Theorem 1.6 is a consequence of Theorem 1.4. The determination of cases of equality in Theorem 1.4 is essential for the proof of the key lemma, which we give now. Proof. We may suppose, without loss of generality, that Y and Z do not commute since, if they do commute, the inequality is trivially true, just as in Remark 1.5. We compute

Now note that
Moreover, by definition W = Φ(X) where Φ is a completely positive, trace and identity preserving linear map. By Lemma B.2 this implies that Consequently, Therefore, unless Y and Z commute, the derivative on the left is strictly negative, and hence, for some ε > 0, (1.16) is valid as a strict inequality for all s ∈ (0, ε). If Y and Z commute, (1.16) is trivially true for all p > 0 and all s ∈ [0, 1].
Proof of Theorem 1.6. Suppose that (1.16) is valid for $s = s_1$ and $s = s_2$, Since (by eqs. (C.7) and (C. Therefore, whenever (1.16) is valid for $s = s_1$ and $s = s_2$, it is valid for $s = s_1 + s_2 - s_1s_2$. By Lemma 2.2, there is some $\epsilon > 0$ so that (1.16) is valid as a strict inequality for all $s\in(0,\epsilon)$. Define an increasing sequence $\{t_n\}_{n\in\mathbb{N}}$ recursively by $t_1 = \epsilon$ and $t_n = 2t_{n-1} - t_{n-1}^2$ for $n>1$. Then by what we have just proved, (1.16) is valid as a strict inequality for all $s\in(0,t_n)$. Since $1 - t_n = (1-t_{n-1})^2$, we have $\lim_{n\to\infty}t_n = 1$, and the proof is complete.
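The bootstrap step in this proof rests on the recursion obtained by taking $s_1 = s_2 = t_{n-1}$ in $s_1 + s_2 - s_1s_2$. A short sketch (the function name and the starting value $\epsilon = 0.01$ are arbitrary choices for illustration) shows the sequence increasing to $1$ and agreeing with the closed form $t_n = 1 - (1-\epsilon)^{2^{n-1}}$:

```python
def bootstrap_sequence(eps, steps):
    """Iterate t -> 2t - t^2 starting from eps, returning the first `steps` terms."""
    ts = [eps]
    for _ in range(steps - 1):
        t = ts[-1]
        ts.append(2 * t - t * t)
    return ts

eps = 0.01
ts = bootstrap_sequence(eps, 12)
# Closed form: 1 - t_n = (1 - eps)^(2^(n-1)), since 1 - (2t - t^2) = (1 - t)^2.
closed_form = [1 - (1 - eps) ** (2 ** k) for k in range(12)]
```

Because $1 - t_n$ is squared at every step, the convergence to $1$ is doubly exponential once $t_n$ is not too small.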
The next goal is to show that the inequality on the right in (1.6) is a consequence of Theorem 1.6 by a simple differentiation argument. This simple proof is the new feature. The statement concerning cases of equality was proved in [20].

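2.3 THEOREM. For all $X,Y\in P_n$ and all $p>0$, $${\rm Tr}[X(\log X + \log Y)] \leq \frac{1}{p}{\rm Tr}[X\log(X^{p/2}Y^pX^{p/2})], \qquad (2.11)$$ and this inequality is strict unless $X$ and $Y$ commute.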
Proof. Specializing to the case Z = X in Theorem 1.6, At s = 1 both sides of (2.12) equal Tr[X log X r ]. Therefore, we may differentiate at s = 1 to obtain a new inequality. Rearranging terms in (2.12) yields (2.13) Taking the limit s ↑ 1 on the left side of (2.15) yields d ds . From the integral representation for the logarithm, namely log A = Altogether, by the cyclicity of the trace, d dp Replacing Y by Y −1 yields (2.11).
This completes the proof of the inequality itself, and it remains to deal with the cases of equality. Fix r > 0 and X and Y that do not commute. By Theorem 1.6 applied with Z = X and s = 1/2, there is some δ > 0 such that Now use the fact that Y # 3/4 X = (Y # 1/2 X)# 1/2 X, and apply (2.11) and then (2.14): We may only apply strict in the last step since δ depends on X and Y , and strict need not hold if Y is replaced by Y # 1/2 X. However, in this case, we may apply (2.11).
Further iteration of this argument evidently yields the inequalities for each k ∈ N. We may now improve (2.15) to By the calculations above, taking s → 1 along this sequence yields the desired strict inequality.
Further inequalities, which we discuss now, involve an extension of the notion of geometric means. This extension is introduced here and explained in more detail in Appendix C.
Recall that for t ∈ [0, 1] and X, Y ∈ P n , X# t Y := X 1/2 (X −1/2 Y X −1/2 ) t X 1/2 . As noted earlier, this formula makes sense for all t ∈ R, and it has a natural geometric meaning. The map t → X# t Y , defined for t ∈ R, is a constant speed geodesic running between X and Y for a particular Riemannian metric on the space of positive matrices.
2.4 DEFINITION. For X, Y ∈ P n and for t ∈ R, The geometric picture leads to an easy proof of the following identity: Let X, Y ∈ P n , and t 0 , t 1 ∈ R. Then for all t ∈ R See Theorem C.4 for the proof. As a special case, take t 1 = 0 and t 0 = 1. Then, for all t, is valid for all t ∈ [1, ∞) and r > 0. If Y and Z do not commute, the inequality is strict for all t > 1.
The inequalities in Theorem 2.5 and in Theorem 1.6 are equivalent. The following simple identity is the key to this observation: 2.6 LEMMA. For $B, C\in P_n$ and $s\neq 1$, let $A = B\#_sC$. Then $$B = C\#_{1/(1-s)}A. \qquad (2.20)$$ Proof. Note that by (2.16) and (2.18) Proof. Define $W\in P_n$ by $W^r := Y^r\#_sZ^r$. The identity (2.20) then says that $Y^r = Z^r\#_{1/(1-s)}W^r$. Therefore, Since $s\in(0,1)$, the right side of (2.21) is non-positive if and only if With this lemma we can now prove Theorem 2.5.
Proof of Theorem 2.5. Lemma 2.7 says that Theorem 2.5 is equivalent to Theorem 1.6.
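The identity behind Lemma 2.6, read here as $B = C\#_{1/(1-s)}A$ whenever $A = B\#_sC$ and $s\neq 1$ (the form used in the text with $B = Y^r$, $C = Z^r$, $A = W^r$), can be checked numerically with the extended geometric mean, i.e. formula (1.2) evaluated at real $t$ outside $[0,1]$. The helper names and sample values of $s$ below are hypothetical choices for this sketch.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def geo_mean(X, Y, t):
    """X #_t Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, for any real t."""
    Xh = herm_fun(X, np.sqrt)
    Xmh = herm_fun(X, lambda w: w ** -0.5)
    M = Xmh @ Y @ Xmh
    M = (M + M.conj().T) / 2
    return Xh @ herm_fun(M, lambda w: w ** t) @ Xh

def rand_pos(n, rng):
    """Random strictly positive n x n matrix (assumed test ensemble)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.5 * np.eye(n)

rng = np.random.default_rng(5)
B, C = rand_pos(3, rng), rand_pos(3, rng)

relative_errors = []
for s in (0.3, 0.7, -0.5):
    A = geo_mean(B, C, s)                      # A = B #_s C
    B_back = geo_mean(C, A, 1.0 / (1.0 - s))   # should recover B
    relative_errors.append(np.linalg.norm(B_back - B) / np.linalg.norm(B))
```

Geometrically, $A$ lies on the geodesic through $B$ and $C$, and running that geodesic from $C$ through $A$ for parameter $1/(1-s)$ returns to $B$.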
There is a complement to Theorem 2.5 in the case Z = X that is equivalent to a result of Hiai and Petz, who formulate it differently and do not discuss extended geometric means. The statement concerning cases of equality is new.

THEOREM.
For all X, Y ∈ P n , is valid for all t ∈ (−∞, 0] and r > 0. If Y and X do not commute, the inequality is strict for all t < 0.
Proof. By definition 2.4 Therefore, by (2.11), By the definition of W and (2.18) once more, By combining the inequalities we obtain (2.22).
The proof given by Hiai and Petz is quite different. It uses a tensorization argument.   For the function in (3.2), we make a similar change of variables. Define Φ : P n → P n by Φ(Z) = Z# 1/2 Y := Q 1/2 from P n to P n . This map is invertible: It follows by direct computation from the definition (1.2) that for Q 1/2 : Finally, for the function in (3.3), we make a similar change of variables. Define Φ : P n → P n by

Quantum Relative Entropy Inequalities
from P n to P n . This map is invertible: Φ −1 (Q) = With the Donald relative entropy having taken center stage, we now bend our efforts to establishing some of its properties.
3.2 LEMMA. Fix X, Y ∈ P n , and define K X,Y := {Q ∈ P n : Tr[QY ] ≤ Tr[X]}. There exists a unique Q X,Y ∈ K X,Y such that Tr[Q X,Y Y ] ≤ Tr[X] and such that for all other Q ∈ K X,Y . The equation has a unique solution in P n , and this unique solution is the unique maximizer Q X,Y .
Proof. Note that K X,Y is a compact, convex set. Since Q → log Q is strictly concave, Q → Tr[X log Q] is strictly concave on K X,Y , and it has the value −∞ on ∂P n ∩ K X,Y , there is a unique maximizer Q X,Y that lies in P n ∩ K X,Y .
Let H ∈ H n be such that Tr[HY ] = 0. For all t in a neighborhood of 0, Q X,Y +tH ∈ P n ∩K X,Y . Differentiating in t at t = 0 yields for some λ ∈ R. Multiplying through on both sides by Q 1/2 X,Y and taking the trace yields λ = 1, which shows that Q X,Y solves (3.5). Conversely, any solution of (3.5) yields a critical point of our strictly concave functional, and hence must be the unique maximizer.

Remark.
There is one special case for which we can give a formula for the solution Q X,Y to (3.5): When X and Y commute, Q X,Y = XY −1 .

LEMMA.
For all X, Y ∈ P n and all λ > 0, and

7)
Proof. By (3.5) the maximizer Q X,Y in Lemma 3.2 satisfies the scaling relations For an appropriate choice of the set {U 1 , . . . , U N }, Q is the orthogonal projection of Q, with respect to the Hilbert-Schmidt inner product, onto the abelian subalgebra of M n generated by X, Y and 1 [13]. By the concavity of the logarithm, Therefore, in taking the supremum, we need only consider operators Q that commute with both X and Y . The claim now follows by Remark 3.3.

Remark.
Another simple proof of this can be given using Donald's original formula (1.19).
We have now proved that D D has properties (2) and (3) in the Definition 1.8 of relative entropy, and have already observed that it inherits joint convexity from the Umegaki relative entropy through its original definition by Donald. We now compute the partial Legendre transform of D D (X||Y ). In doing so we arrive at a direct proof of the joint convexity of D D (X||Y ), independent of the joint convexity of the Umegaki relative entropy. We first prove Lemma 1.11.
Proof of Lemma 1.11. For X ∈ P n , define a = Tr[X] and W := a −1 X, so that W is a density matrix. Then  We wish to evaluate the supremum as explicitly as possible.

LEMMA.
For H ∈ H n and Y ∈ P n , where for any self-adjoint operator K, λ max (K) is the largest eigenvalue of K.
Our proof of (3.10) makes use of a Minimax Theorem; such theorems give conditions under which a function f (x, y) on A × B satisfies (3.11) The original Minimax Theorem was proved by von Neumann [44]. While most of his paper deals with the case in which f is a bilinear function on R m × R n for some m and n, and A and B are simplexes, he also proves [44, p. 309] a more general result for functions on R × R that are quasi-concave in x and quasi-convex in y. According to Kuhn Proof. Fix Y > 0 and let A ∈ H n be such that Y ± := Y ± A are both positive. Let Q be optimal in the variational formula (3.10) for Φ(H, Y ). We claim that there exists c ∈ R so that Suppose for the moment that this is true. Then By (3.14), which proves midpoint concavity. The general concavity statement follows by continuity.
To complete this part of the proof, it remains to show that we can choose c ∈ R so that (3.14) is satisfied. Define a := Tr[QA]. Since Y ± A > 0, and Tr[Q(Y ± A)] > 0, which is the same as 1 ± a > 0. That is, |a| < 1. We then compute We may now improve on Lemma 3.9: Not only is Φ D (H, Y ) concave in Y ; its exponential is also concave in Y .
and thus we conclude The proof that the function in (3.15) is concave has two components. One is the identification (3.18) of this function with Ψ D (H, Y ). The second makes use of the direct analog of an argument of Tropp [41] proving the concavity in Y of Tr[e H+log Y ] = Ψ(H, Y ) + Tr[Y ] as a consequence of the joint convexity of the Umegaki relative entropy. Once one has the formula (3.18), the convexity of the function in (3.15) follows from the same argument, applied instead to the Donald relative entropy, which is also jointly convex.
However, it is of interest to note here that this argument can be run in reverse to deduce the joint convexity of the Donald relative entropy without invoking the joint convexity of the Umegaki relative entropy. To see this, note that Lemma 3.9 provides a simple direct proof of the concavity in Y of Φ D (H, Y ). By the Fenchel-Moreau Theorem, for all density matrices X is evidently jointly convex. Since the supremum of any family of convex functions is convex, we conclude that with the X variable restricted to be a density matrix, X, Y → D D (X||Y ) is jointly convex. The restriction on X is then easily removed; see Lemma 3.11 below. This gives an elementary proof of the joint convexity of D D (X||Y ). It is somewhat surprising that the joint convexity of the Umegaki relative entropy is deeper than the joint convexity of either D D (X||Y ) or D BS (X||Y ). In fact, the simple proof by Fujii and Kamei that the latter is jointly convex stems from a joint operator convexity result; see the discussion in Appendix C. The joint convexity of the Umegaki relative entropy, in contrast, stems from the basic concavity theorem in [30].
3.11 LEMMA. Let $f(x,y)$ be a $(-\infty,\infty]$ valued function on $\mathbb{R}^m\times\mathbb{R}^n$ that is homogeneous of degree one. Let $a\in\mathbb{R}^m$, and let $K_a = \{x\in\mathbb{R}^m : \langle a,x\rangle = 1\}$, and suppose that whenever $f(x,y)<\infty$, $\langle a,x\rangle > 0$. If $f$ is convex on $K_a\times\mathbb{R}^n$, then it is convex on $\mathbb{R}^m\times\mathbb{R}^n$.
We next provide the proof of Proposition 1.9, which we recall says that any quantum relative entropy functional satisfies the inequality for all X, W ∈ P n , where · 1 denotes the trace norm.
Proof of Proposition 1.9. By scaling, it suffices to show that when X and W are density matrices, Let X and W be density matrices and define H = X − W . Let P be the spectral projection onto the subspace of C n spanned by the eigenvectors of H with non-negative eigenvalues. Let A be the * -subalgebra of M n generated by H and 1, and let E A be the orthogonal projection in M n equipped with the Hilbert-Schmidt inner product onto A. Then A → E A A is a convex operation [13], and then by the joint convexity of R, Since both E A X and E A W belong to the commutative algebra A, (3.22) together with property (3) in the definition of quantum relative entropies then gives us , the inequality now follows from the classical Csiszár-Kullback-Leibler-Pinsker inequality [12,28,29,35] on a two-point probability space.
3.12 Remark. The proof of the lower bound (3.21) given here is essentially the same as the proof for the case of the Umegaki relative entropy given in [21]. The proof gives one reason for attaching importance to the joint convexity property, and since it is short, we spelled it out to emphasize this.
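For the Umegaki relative entropy on density matrices, the lower bound discussed here takes the familiar Pinsker form $D(X||W) \geq \frac{1}{2}\|X-W\|_1^2$ (natural logarithm). The sketch below, with hypothetical helper names and an arbitrary random ensemble, evaluates the difference of the two sides for a sample of density matrices; it assumes this standard normalization rather than quoting the statement of Proposition 1.9.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def rand_density(n, rng):
    """Random strictly positive matrix with unit trace (assumed test ensemble)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    P = G @ G.conj().T + 0.1 * np.eye(n)
    return P / np.trace(P).real

def trace_norm(A):
    """Trace norm of a Hermitian matrix: sum of absolute values of its eigenvalues."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(A))))

rng = np.random.default_rng(6)
gaps = []
for _ in range(20):
    X, W = rand_density(3, rng), rand_density(3, rng)
    D = np.trace(X @ (herm_fun(X, np.log) - herm_fun(W, np.log))).real
    gaps.append(D - 0.5 * trace_norm(X - W) ** 2)
```

All entries of `gaps` should be non-negative, up to rounding.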
We conclude this section with a brief discussion of the failure of convexity of the function φ(X, Y ) = TrX 1/2 log(Y −1/2 XY −1/2 )X 1/2 . We recall that if we write this in the other order, i.e., define the function ψ(X, Y ) = TrX 1/2 log(X 1/2 Y −1 X 1/2 )X 1/2 , the function ψ is jointly convex. In fact ψ is operator convex if the trace is omitted. We might have hoped, therefore, that φ would at least be convex in Y alone, and even have hoped that log(Y −1/2 XY −1/2 ) is operator convex in Y . Neither of these things is true. The following lemma precludes the operator convexity.
3.13 LEMMA. Let F be a function mapping the set of positive semidefinite matrices into itself.
is not operator convex, then there is a unit vector $v$ and there are density By Jensen's inequality, for all density matrices $X$, $\langle v, f(F(X))v\rangle \leq f(\langle v, F(X)v\rangle)$. Therefore, By the lemma, if $Y\mapsto\log(Y^{-1/2}ZY^{-1/2})$ were convex, $Y\mapsto Y^{-1/2}ZY^{-1/2}$ would be convex. But this may be shown to be false in the $2\times 2$ case by simple computations in a neighborhood of the identity with $Z$ a rank-one projector. A more intricate computation of the same type shows that, even with the trace, convexity fails. Writing out the identity $X\#_tY = Y\#_{1-t}X$ gives

Exponential Inequalities Related to the Golden-Thompson Inequality
Differentiating at $t = 0$ yields This provides an alternate expression for $D_{BS}(X||Y)$ that involves $X$ in a somewhat simpler way that is advantageous for the partial Legendre transform in $X$: $$D_{BS}(X||Y) = {\rm Tr}[Y f(Y^{-1/2}XY^{-1/2})],$$ where $f(x) = x\log x$. A different derivation of this formula may be found in [23]. Introducing the variable $R = Y^{-1/2}XY^{-1/2}$ we have, for all $H \in \mathbf{H}_n$, Therefore, When $Y$ and $H$ commute, the supremum on the right is achieved at $R = e^H$ since for this choice of $R$, The inequality is proved by writing $Y = e^L$.
We now turn to the specification of the actual maximizer.
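4.2 LEMMA. For $K \in \mathbf{H}_n$ and $Y \in \mathbf{P}_n$, the function $R \mapsto {\rm Tr}[RK] - {\rm Tr}[Yf(R)]$ on $\overline{\mathbf{P}_n}$ has a unique maximizer $R_{K,Y}$ in $\overline{\mathbf{P}_n}$, which is contained in $\mathbf{P}_n$, and $R_{K,Y}$ is the unique critical point of this function in $\mathbf{P}_n$.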
Proof. Since $f$ is strictly operator convex, $R \mapsto {\rm Tr}[RK] - {\rm Tr}[Yf(R)]$ is strictly concave. There are no local maximizers on the boundary of $\mathbf{P}_n$ since $\lim_{x\downarrow 0}(-f'(x)) = \infty$, so that if $R$ has a zero eigenvalue, a small perturbation of $R$ will yield a higher value. Finally, and then one readily concludes that the unique maximizer $R_{H,Y}$ to the variational problem in (4.5) is the unique solution in $\mathbf{P}_n$ of When $H$ and $Y$ commute, one readily checks that $R = e^H$ is the unique solution in $\mathbf{P}_n$.

valid for $A, B \in \mathbf{P}_n$. is the special case of Theorem C.4 in which $t_1 = 1$, $t = -t_0/(1-t_0)$ and $t_0 = s$. Taking $A = X^r = e^{rH}$ and $B = Y^r = e^{rK}$, we have $X^r = W_r\#_\beta Y^r$, with $\beta = -s/(1-s)$. Therefore, by (2.22),

[23]. Their proof is also based on (2.22), together with an identity equivalent to (4.11), but they employ these differently, thereby omitting the remainder term $D(W||V)$.
We remark that one may obtain at least one of the cases of (1.10) directly from (4.2) and (4.3) by making an appropriate choice of $X$ in terms of $H$ and $Y$: Define $X^{1/2} := Y\#e^H$. Then and, therefore, making this choice of $X$,

and ${\rm Tr}[P_\lambda Q_\mu]$ are non-negative, and hence the right side of (A.2) is non-negative. This yields Klein's inequality: Now suppose that the function $f$ is strictly convex on an interval containing $\sigma(A)\cup\sigma(B)$. Then and $P_\lambda \leq Q_\lambda$. The same reasoning shows that for each $\mu \in \sigma(B)$, $\mu \in \sigma(A)$ and $Q_\mu \leq P_\mu$. Thus

This is the Gibbs variational principle for the entropy $S(X) = -{\rm Tr}[X\log X]$. Now let $Y \in \mathbf{P}_n$ and replace $K$ with $K + \log Y$ in (A.5) to conclude that for all density matrices $X$, all $Y \in \mathbf{P}_n$ and all $K \in \mathbf{H}_n$,

B Majorization inequalities
Let $x = (x_1,\dots,x_n)$ and $y = (y_1,\dots,y_n)$ be two vectors in $\mathbb{R}^n$ such that $x_{j+1} \leq x_j$ and $y_{j+1} \leq y_j$ for each $j = 1,\dots,n-1$. Then $y$ is said to majorize $x$ in case $$\sum_{j=1}^k x_j \leq \sum_{j=1}^k y_j \quad\text{for } k = 1,\dots,n-1 \qquad\text{and}\qquad \sum_{j=1}^n x_j = \sum_{j=1}^n y_j,$$ and in this case we write $x \prec y$. A matrix $P \in \mathbf{M}_n$ is doubly stochastic in case $P$ has non-negative entries and the entries in each row and column sum to one. By a theorem of Hardy, Littlewood and Pólya, $x \prec y$ if and only if there is a doubly stochastic matrix $P$ such that $x = Py$. Therefore, if $\phi$ is convex on $\mathbb{R}$ and $x \prec y$, let $P$ be a doubly stochastic matrix such that $x = Py$. By Jensen's inequality $$\sum_{j=1}^n \phi(x_j) = \sum_{j=1}^n \phi\Big(\sum_{k=1}^n P_{j,k}y_k\Big) \leq \sum_{j,k=1}^n P_{j,k}\phi(y_k) = \sum_{k=1}^n \phi(y_k).$$
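The argument above can be illustrated numerically. The following minimal sketch builds $x = Py$ from a doubly stochastic matrix $P$, checks the partial-sum conditions defining $x \prec y$, and compares $\sum_j \phi(x_j)$ with $\sum_j \phi(y_j)$ for two convex functions. The matrix `P`, the vector `y`, and all function names are hypothetical choices made only for this example.

```python
import math

def majorizes(y, x, tol=1e-12):
    """Return True if y majorizes x (written x ≺ y): partial sums of the
    decreasing rearrangements satisfy sum_{j<=k} x_j <= sum_{j<=k} y_j for
    k < n, and the total sums agree."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if len(xs) != len(ys):
        return False
    sx = sy = 0.0
    for k in range(len(xs) - 1):
        sx += xs[k]
        sy += ys[k]
        if sx > sy + tol:
            return False
    return abs(sum(xs) - sum(ys)) <= tol

def apply_matrix(P, y):
    """Compute the matrix-vector product P y for a matrix given as nested lists."""
    return [sum(P[j][k] * y[k] for k in range(len(y))) for j in range(len(P))]

# Hypothetical example: a convex combination of cyclic permutation matrices,
# so every row and every column sums to one.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
y = [3.0, 1.0, -2.0]
x = apply_matrix(P, y)

def square(s):
    return s * s

print("x ≺ y:", majorizes(y, x))
for phi in (square, math.exp):
    # Jensen's inequality with the rows of P as probability weights.
    print(phi.__name__, sum(map(phi, x)) <= sum(map(phi, y)))
```

Note that the converse direction of the Hardy, Littlewood and Pólya theorem (constructing $P$ from a given pair $x \prec y$) is not implemented here.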
That is, for every convex function $\phi$, $$x \prec y \quad\Rightarrow\quad \sum_{j=1}^n \phi(x_j) \leq \sum_{j=1}^n \phi(y_j). \qquad \text{(B.2)}$$ Let $X, Y \in \mathbf{H}_n$, and let $\lambda_X$ and $\lambda_Y$ be the eigenvalue sequences of $X$ and $Y$ respectively, with the eigenvalues repeated according to their geometric multiplicity and arranged in decreasing order, considered as vectors in $\mathbb{R}^n$. Then $Y$ is said to majorize $X$ in case $\lambda_X \prec \lambda_Y$, and in this case we write $X \prec Y$. It follows immediately from (B.2) that if $\phi$ is an increasing convex function,

Proof. Note that $\Phi(X) \in \mathbf{H}_n$. Let $\Phi(X) = \sum_{j=1}^n \lambda_j|v_j\rangle\langle v_j|$ be the spectral resolution of $\Phi(X)$ with $\lambda_j \geq \lambda_{j+1}$ for $j = 1,\dots,n-1$. Fix $k \in \{1,\dots,n-1\}$ and let $P_k = \sum_{j=1}^k |v_j\rangle\langle v_j|$. Then with $\Phi^*$ denoting the adjoint of $\Phi$ with respect to the Hilbert-Schmidt inner product,

Choi [10,11] has shown that, for all $n \geq 2$, the transformation cannot be written in the form (B.5), yet it satisfies the conditions of Theorem B.1.
B.2 LEMMA. Let $A \in \mathbf{P}_n$ and let $\Phi$ be defined by . Then for all $X \in \mathbf{H}_n$, (B.4) is satisfied, and for all $p \geq 1$,

Proof. $\Phi$ evidently satisfies the conditions of Theorem B.1, and then (B.4) implies (B.7) as discussed above.

C Geodesics and Geometric Means
There is a natural Riemannian metric on $\mathbf{P}_n$ such that the corresponding distance $\delta(X,Y)$ is invariant under conjugation: $$\delta(A^*XA, A^*YA) = \delta(X,Y)$$ for all $X, Y \in \mathbf{P}_n$ and all invertible $n\times n$ matrices $A$. It turns out that for $A, B \in \mathbf{P}_n$, $t \mapsto A\#_tB$, $t \in [0,1]$, is a constant speed geodesic for this metric that connects $A$ and $B$. This geometric point of view originated in the work of statisticians, and was developed in the form presented here by Bhatia and Holbrook [7].
Let $t \mapsto X(t)$, $t \in [a,b]$, be a smooth path in $\mathbf{P}_n$. The arc-length along this path in the conjugation invariant metric is $$\int_a^b \|X(t)^{-1/2}X'(t)X(t)^{-1/2}\|_2\,{\rm d}t,$$ where $\|\cdot\|_2$ denotes the Hilbert-Schmidt norm and the prime denotes the derivative. The corresponding distance between $X, Y \in \mathbf{P}_n$ is defined by To see the conjugation invariance, let the smooth path $X(t)$ be given, let an invertible matrix $A$ be given, and define $Z(t) := A^*X(t)A$. Then by cyclicity of the trace, Given any smooth path $t \mapsto X(t)$, define $H(t) := \log(X(t))$ so that $X(t) = e^{H(t)}$, and then or equivalently, Now let $X(t)$ be a smooth path in $\mathbf{P}_n$ with $X(0) = X$ and $X(1) = Y$. Then, with If $X$ and $Y$ commute, this lower bound is exact: Given $X, Y \in \mathbf{P}_n$ that commute, define

C.2 LEMMA. When $X, Y \in \mathbf{P}_n$ commute, there is exactly one constant speed geodesic running from $X$ to $Y$ in unit time, namely, $X(t) = e^{(1-t)\log X + t\log Y}$, and $\delta(X,Y) = \|\log Y - \log X\|_2$.
Since conjugation is an isometry in this metric, it is now a simple matter to find the explicit formula for the geodesic connecting X and Y in P n . Apart from the statement on uniqueness, the following theorem is due to Bhatia and Holbrook [7].
By the conjugation invariance of the metric, $\delta(X,Y) = \delta(1, X^{-1/2}YX^{-1/2})$ and $X(t)$ as defined in (C.5) has the constant speed $\delta(X,Y)$ and runs from $X$ to $Y$ in unit time. Thus it is a constant speed geodesic running from $X$ to $Y$ in unit time.
If there were another such geodesic, say $\widetilde{X}(t)$, then $X^{-1/2}\widetilde{X}(t)X^{-1/2}$ would be a constant speed geodesic running from $1$ to $X^{-1/2}YX^{-1/2}$ in unit time, and different from $W(t)$, but this would contradict the uniqueness in Lemma C.2.
Since the speed along the curve $t \mapsto X\#_tY$ has the constant value $\|\log(X^{-1/2}YX^{-1/2})\|_2$, this, together with the uniqueness in Theorem C.3, shows that for all $t_0 < t_1$ in $\mathbb{R}$, the restriction of $t \mapsto X\#_tY$ to $[t_0,t_1]$ is the unique constant speed geodesic running from $X\#_{t_0}Y$ to $X\#_{t_1}Y$ in time $t_1 - t_0$.
This has a number of consequences.
C.4 THEOREM. Let $X, Y \in \mathbf{P}_n$, and $t_0, t_1 \in \mathbb{R}$. Then for all $t \in \mathbb{R}$, $$X\#_{(1-t)t_0+tt_1}Y = (X\#_{t_0}Y)\#_t(X\#_{t_1}Y). \qquad \text{(C.6)}$$

Proof. By what we have noted above, $t \mapsto X\#_{(1-t)t_0+tt_1}Y$ is a constant speed geodesic running from $X\#_{t_0}Y$ to $X\#_{t_1}Y$ in unit time, as is $t \mapsto (X\#_{t_0}Y)\#_t(X\#_{t_1}Y)$. The identity (C.6) now follows from the uniqueness in Theorem C.3.
Taking $t_0 = 0$ and $t_1 = s$, we have the special case $$X\#_{ts}Y = X\#_t(X\#_sY). \qquad \text{(C.7)}$$ Taking $t_0 = 1$ and $t_1 = 0$, we have the special case $$X\#_{1-t}Y = Y\#_tX. \qquad \text{(C.8)}$$ The identity (C.8) is well-known, and may be derived directly from the formula in (C.5).
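The identities (C.6), (C.7) and (C.8) are easy to test numerically. The sketch below is an illustration only: it works with real symmetric $2\times 2$ matrices, uses the formula $X\#_tY = X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2}$, and evaluates matrix functions through the closed-form spectral decomposition of a $2\times 2$ symmetric matrix. The helper names (`matmul`, `sym_fun`, `geo_mean`, `max_diff`) and the sample matrices and parameters are assumptions introduced for this example.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sym_fun(M, f):
    """Apply the scalar function f to a real symmetric 2x2 matrix M
    via its spectral decomposition (off-diagonal entries are symmetrized)."""
    a, b, d = M[0][0], 0.5 * (M[0][1] + M[1][0]), M[1][1]
    mean, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    if rad < 1e-14:  # M is (numerically) a multiple of the identity
        return [[f(mean), 0.0], [0.0, f(mean)]]
    lo, hi = mean - rad, mean + rad
    # Spectral projection for the eigenvalue hi: (M - lo*I) / (hi - lo).
    P = [[(a - lo) / (2 * rad), b / (2 * rad)],
         [b / (2 * rad), (d - lo) / (2 * rad)]]
    fl, fh = f(lo), f(hi)
    # f(M) = f(lo) * (I - P) + f(hi) * P
    return [[fl * (i == j) + (fh - fl) * P[i][j] for j in range(2)]
            for i in range(2)]

def geo_mean(X, Y, t):
    """Weighted geometric mean X #_t Y for positive definite X, Y and real t."""
    Xh = sym_fun(X, math.sqrt)
    Xmh = sym_fun(X, lambda s: 1.0 / math.sqrt(s))
    inner = matmul(matmul(Xmh, Y), Xmh)
    return matmul(matmul(Xh, sym_fun(inner, lambda s: s ** t)), Xh)

def max_diff(A, B):
    """Largest entrywise absolute difference between two 2x2 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

# Hypothetical positive definite test matrices and parameters.
X = [[2.0, 0.5], [0.5, 1.0]]
Y = [[1.0, -0.3], [-0.3, 3.0]]
t, s, t0, t1 = 0.3, 0.7, -0.5, 1.5

err_c6 = max_diff(geo_mean(X, Y, (1 - t) * t0 + t * t1),
                  geo_mean(geo_mean(X, Y, t0), geo_mean(X, Y, t1), t))
err_c7 = max_diff(geo_mean(X, Y, t * s), geo_mean(X, geo_mean(X, Y, s), t))
err_c8 = max_diff(geo_mean(X, Y, 1 - t), geo_mean(Y, X, t))
print(err_c6 < 1e-9, err_c7 < 1e-9, err_c8 < 1e-9)
```

The parameters $t_0 = -0.5$ and $t_1 = 1.5$ are deliberately taken outside $[0,1]$, since Theorem C.4 is stated for all real $t_0$, $t_1$ and $t$.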
We are particularly concerned with $t \mapsto X\#_tY$ for $t \in [-1,2]$. Indeed, from the formula in (C.5), Let $t \in (0,1)$. By combining the formula $X\#_tY = X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2} = X^{1/2}(X^{1/2}Y^{-1}X^{1/2})^{-t}X^{1/2}$ with the integral representation we obtain, for $t \in (0,1)$, (C.10) The merit of this formula lies in the following lemma: which expresses weighted geometric means as averages over harmonic means. By the operator monotonicity of the map $A \mapsto A^{-1}$, the map $(X,Y) \mapsto X:Y$ is monotone in each variable, and then by (C.12) this is also true of $(X,Y) \mapsto X\#_tY$. This proves the following result of Ando and Kubo [26]:

C.6 THEOREM (Ando and Kubo). For all $t \in [0,1]$, $(X,Y) \mapsto X\#_tY$ is jointly concave, and monotone increasing in $X$ and $Y$.
The method of Ando and Kubo can be used to prove joint operator concavity theorems for functions on P n × P n that are not connections. The next theorem, due to Fujii and Kamei [17], provides an important example.
Proof. The representation from which the claim follows.