On bounding the Thompson metric by Schatten norms

The Thompson metric provides key geometric insights in the study of non-linear matrix equations and in many optimization problems. However, knowing that an approximate solution is within $d_T$ units, in the Thompson metric, of the actual solution provides little insight into how good the approximation is as a matrix or vector approximation. That is, bounding the Thompson metric between an approximate and an accurate solution to a problem does not provide obvious bounds for either the spectral or the Frobenius norm, both Schatten norms, of the difference between the approximate and accurate solutions. This paper reports such an upper bound, namely that $\|X - Y\|_p \le 2^{1/p}\,\frac{e^d - 1}{e^d}\,\max[\|X\|_p, \|Y\|_p]$, where $\|\cdot\|_p$ denotes the Schatten p-norm and $d$ denotes the Thompson metric between $X$ and $Y$. Furthermore, a more geometric proof leads to a slightly better bound in the case of the Frobenius norm, $\|X - Y\|_2 \le \frac{e^d - 1}{\sqrt{e^{2d} + 1}}\sqrt{\|X\|_2^2 + \|Y\|_2^2} \le 2^{1/2}\,\frac{e^d - 1}{\sqrt{e^{2d} + 1}}\,\max[\|X\|_2, \|Y\|_2]$.
Subjects: Science; Mathematics & Statistics; Advanced Mathematics; Algebra; Linear & Multilinear Algebra; Analysis Mathematics; Functional Analysis; Geometry


PUBLIC INTEREST STATEMENT
Metrics are functions, mapping pairs of points to non-negative real numbers, which generalize the concept of distance to apply to abstract spaces. The Thompson metric provides critical geometric insights into dynamical systems, optimization problems and solving systems of equations. However, the Thompson metric is not an intuitive generalization of the concept of distance. The Thompson metric does provide an upper bound for more intuitive measurements of distance, such as those based on Schatten norms, but the currently known relation between Thompson metrics and more intuitive generalizations of distance is not always a tight bound. This paper presents a tighter bound relating metrics based on Schatten norms to Thompson metrics. The results in this paper can refine our geometrical understanding of problems arising in fluid mechanics, in geophysics as well as in robotics, and may improve assessments of the quality of data and image processing techniques.

Introduction
The Thompson metric is a variant of the Hilbert metric (Nussbaum & Walsh, 2004). The Hilbert metric generalizes the metric structure of hyperbolic geometry to the generalized concept of cones used in the study of Banach (complete normed vector) spaces, such as the space of Hermitian matrices. When applied to the unit disk, the Hilbert metric yields the Klein model of hyperbolic geometry, but when applied to a cone, such as the cone of positive definite or positive semidefinite matrices, the Hilbert metric is actually a pseudometric. A slight tweak of the Hilbert metric yields the Thompson (part) metric: the Thompson metric $d_T(X, Y)$ is the minimal $d_T = \log(\alpha)$ such that $\alpha X - Y$ and $\alpha Y - X$ are both positive semidefinite. The Thompson metric is well defined over the cone of positive definite matrices but may be infinite when applied to other matrices, such as positive semidefinite matrices.
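The definition above can be turned into a short computation. The following is a minimal sketch, assuming NumPy and Hermitian positive definite inputs; the function name `thompson_metric` is chosen here for illustration and is not the MATLAB routine used for the paper's numerical results. It relies on the fact that $Y \le \alpha X$ and $X \le \alpha Y$ hold exactly when $\alpha$ dominates both the largest eigenvalue of $X^{-1}Y$ and the reciprocal of its smallest eigenvalue.

```python
import numpy as np

def thompson_metric(X, Y):
    """Thompson metric between Hermitian positive definite matrices X and Y."""
    # Factor X = L L^H; X^{-1} Y then shares its (real, positive) eigenvalues
    # with the Hermitian matrix L^{-1} Y L^{-H}.
    L = np.linalg.cholesky(X)
    W = np.linalg.solve(L, Y)                      # L^{-1} Y
    M = np.linalg.solve(L, W.conj().T).conj().T    # L^{-1} Y L^{-H}
    lam = np.linalg.eigvalsh((M + M.conj().T) / 2) # symmetrize against rounding
    # alpha >= lam_max gives Y <= alpha X; alpha >= 1/lam_min gives X <= alpha Y.
    alpha = max(lam.max(), 1.0 / lam.min())
    return float(np.log(alpha))

# Example: X = diag(1, 2), Y = diag(2, 2) gives alpha = 2, so d_T = log(2) up to rounding.
X = np.diag([1.0, 2.0])
Y = np.diag([2.0, 2.0])
print(thompson_metric(X, Y))
```

Working through the Cholesky factor rather than forming $X^{-1}Y$ directly keeps the eigenvalue problem Hermitian, so the computed eigenvalues stay real.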
The Thompson metric (Lemmens & Roelands, 2015; Nussbaum & Walsh, 2004) provides key geometric insights into the study of non-linear matrix equations. In particular, many flows, which in other metrics may not even be contractions, have well-characterized contraction rates in the Thompson metric (Lee & Lim, 2008). That flows arising in many non-linear optimization, filtering and control problems are contractions in the Thompson metric (Carli & Sepulchre, 2015; Del Moral, Kurtzmann, & Tugaut, 2017; Gaubert & Qu, 2014; Lawson & Lim, 2007; Qu, 2014) endows this metric with great utility. Applications of the Thompson metric range from proofs of the existence and uniqueness of positive definite solutions for many types of non-linear equations (Liao et al., 2010) to non-linear optimization theory (Gaubert & Qu, 2014; Montrucchio, 1998) and non-linear Perron-Frobenius theory (Lemmens & Nussbaum, 2012; Nussbaum, 1988). Relatedly, matrix bounds in the Löwner order characterize the error in approximate solutions to the continuous algebraic Riccati equation (Zhang & Liu, 2010).
While the Thompson metric is convenient for solving many optimization problems involving matrices, it is often more intuitive to view matrices solving such problems within more typical geometric contexts. Knowing that the solution of a problem $X$ and its $n$th approximation $X_n$ are $d_T$ units apart in the Thompson metric provides little indication of how close $X_n$ is to $X$, i.e. knowing that $X_n \le \alpha X$ and $X \le \alpha X_n$ in the Löwner ordering (Baksalary & Pukelsheim, 1991), where $\alpha = e^{d_T}$, does not intuitively bound $\|X - X_n\|$ for any of the usual matrix norms $\|\cdot\|$. But it is $\|X - X_n\|$ in a suitable matrix norm, not $d_T$ or similar expressions relating $X$ and $X_n$ in the Löwner ordering, that provides insight as to the quality of an approximation $X_n$.
In particular, considering the matrices $X_n$ and $X$ as linear operators on Euclidean vector spaces, the spectral norm, i.e. a Schatten p-norm with $p = \infty$, of $X - X_n$ is the relevant measure of how well $X_n$ approximates $X$. Considering these matrices as themselves vectors in a Euclidean space, the relevant assessment of how well $X_n$ approximates $X$ is the Frobenius norm, i.e. a Schatten norm with $p = 2$, of $X - X_n$. Therefore, it is useful to know an upper bound for the Schatten p-norm $\|X - X_n\|_p$ given some minimal information about $X$ (e.g. its norm) as well as the Thompson metric $d = d_T(X, X_n)$. For a cone with normality constant $\delta$ in a Banach space, a general inequality bounding the norm of the difference between two elements in terms of their Thompson metric is known (Lemmens & Nussbaum, 2012; Nussbaum, 1988). However, this inequality does not preclude the existence of tighter bounds relating specific norms and Thompson metrics, such as Schatten p-norms and the Thompson metric induced by the Löwner order on the cone of positive semidefinite matrices.
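For concreteness, the Schatten norms mentioned above can be computed from eigenvalue magnitudes when the matrix is Hermitian. The sketch below assumes NumPy; the helper name `schatten_norm` is illustrative only. It covers the trace norm (p = 1), the Frobenius norm (p = 2) and the spectral norm (p = ∞).

```python
import numpy as np

def schatten_norm(M, p):
    """Schatten p-norm of a Hermitian matrix M, for 1 <= p <= infinity."""
    # For Hermitian M the singular values are the absolute eigenvalues.
    mu = np.abs(np.linalg.eigvalsh(M))
    if np.isinf(p):
        return float(mu.max())                  # spectral norm
    return float((mu ** p).sum() ** (1.0 / p))  # p = 1 trace norm, p = 2 Frobenius

# Example with eigenvalues 3 and -4.
M = np.diag([3.0, -4.0])
print(schatten_norm(M, 1), schatten_norm(M, 2), schatten_norm(M, np.inf))
```

For non-Hermitian matrices the same formula applies to the singular values instead, which `np.linalg.svd(M, compute_uv=False)` would supply.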
This paper thus seeks to fill this important gap in our understanding of the relationship between Thompson metrics and Schatten norms by providing an upper bound for the Schatten p-norm $\|X - Y\|_p$ given the Thompson metric $d = d_T(X, Y)$ as well as the Schatten p-norms of $X$ and $Y$. In particular, the application of Weyl's inequalities establishes that $\|X - Y\|_p \le 2^{1/p}\,\frac{e^d - 1}{e^d}\,\max[\|X\|_p, \|Y\|_p]$. Hopefully, this paper will serve as the beginning of a conversation leading to ever tighter bounds on $\|X - Y\|_p$ given $d = d_T(X, Y)$ as well as minimal information about $X$ and $Y$, such as their norms and perhaps some knowledge of their spectra of eigenvalues.
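The right-hand side of this bound depends only on three numbers, so it is easy to evaluate. The following sketch uses only the Python standard library; the function name `schatten_bound` is hypothetical, and $2^{1/p}$ is taken to be 1 when $p = \infty$.

```python
import math

def schatten_bound(d, norm_x, norm_y, p):
    """Evaluate 2^(1/p) * (e^d - 1) / e^d * max(||X||_p, ||Y||_p)."""
    factor = 1.0 if math.isinf(p) else 2.0 ** (1.0 / p)
    # (e^d - 1) / e^d is rewritten as 1 - e^(-d) to avoid overflow for large d.
    return factor * (1.0 - math.exp(-d)) * max(norm_x, norm_y)

# Scalar example: X = 1 and Y = e are d = 1 apart, and |X - Y| = e - 1.
# For p = infinity the bound evaluates to (1 - 1/e) * e = e - 1 up to rounding,
# so the bound is attained in this case.
print(schatten_bound(1.0, 1.0, math.e, math.inf))
```

The scalar example suggests that, at least for the spectral norm, the factor $(e^d - 1)/e^d$ cannot be improved without further information about $X$ and $Y$.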

Preliminaries
This paper will generally use a consistent set of letters and symbols to denote certain matrices and their norms and eigenvalues. Let $X$ and $Y$ each denote positive definite Hermitian matrices with eigenvalues $\chi_1 \ge \dots \ge \chi_n$ and $\upsilon_1 \ge \dots \ge \upsilon_n$, respectively. The proofs presented in this paper do not explicitly require that the matrices be positive definite; however, when a matrix is not positive definite the Thompson metric may be infinite, in which case the results presented here are trivial, as any finite value is $\le \infty$. Thus, this paper will focus on positive definite matrices $X$ and $Y$. Denote the eigenvalues of the matrix $\Delta = X - Y$ by $\delta_1 \ge \dots \ge \delta_n$ and those of $E = -\Delta = Y - X$ by $\varepsilon_1 \ge \dots \ge \varepsilon_n$. Note that $\delta_i = -\varepsilon_{n-i+1}$. $\|M\|$ denotes a Schatten norm of the matrix $M$ and $\|M\|_p$ specifically denotes the Schatten p-norm (which is a norm for $p$ such that $1 \le p \le \infty$). Note that $\|M\|_p$ is a function of the eigenvalues $\mu_1 \ge \dots \ge \mu_n$ of $M$: $\|M\|_p = f_p(\mu_1, \dots, \mu_n)$. Similarly, this paper will use the notation $f(\mu_1, \dots, \mu_n)$ for the functional form of $\|M\|$. Depending on the context, $\le$ and $\ge$ denote either the usual ordering on real numbers or the Löwner ordering on matrices: i.e. $X \le Y$ indicates that $Y - X$ is positive semidefinite. In terms of the Löwner ordering, the Thompson metric $d_T(X, Y)$ is the minimal $d_T = \log(\alpha)$ such that $Y \le \alpha X$ and $X \le \alpha Y$ (Nussbaum & Walsh, 2004). As is standard, $\mathrm{tr}(M)$ denotes the trace of the matrix $M$.
Key to the proofs in this paper are the well-established Weyl's inequalities (Bhatia, 2007; Weyl, 1949) for the eigenvalues of Hermitian matrices, which relate the eigenvalues of a sum of Hermitian matrices to the eigenvalues of its summands.
Use of Mathematica (Wolfram Research I, 2016) proved invaluable in simplifying the equations and inequalities presented in this paper. Numerical results were calculated using MATLAB (MathWorks I, 2017).
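For reference, a standard textbook form of Weyl's inequalities is reproduced below. The notation is an assumption made for this sketch: $M = Y + P$ with all three matrices Hermitian and $n \times n$, and eigenvalues $\mu_i$, $\upsilon_i$ and $\rho_i$ of $M$, $Y$ and $P$ listed in non-increasing order. The exact form used in the proofs may differ.

```latex
% Weyl's inequalities for M = Y + P (assumed notation, eigenvalues non-increasing)
\begin{align*}
  \mu_{i+j-1} &\le \upsilon_i + \rho_j, & i + j - 1 &\le n, \\
  \upsilon_i + \rho_n &\le \mu_i \le \upsilon_i + \rho_1, & i &= 1, \dots, n.
\end{align*}
% Consequence (monotonicity): if P is positive semidefinite, then \rho_n \ge 0,
% so \mu_i \ge \upsilon_i for every i.
```

The monotonicity consequence is the property most directly relevant to bounds stated in the Löwner ordering: $A \le B$ implies that each eigenvalue of $A$ is no greater than the corresponding eigenvalue of $B$.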

Proof of general case
The proof begins with a lemma applying Weyl's inequalities to bound the eigenvalues of $\Delta = X - Y$ by the eigenvalues of $Y$ given upper and lower bounds for $X$ in the Löwner ordering. The second lemma, a consequence of the first lemma, bounds the eigenvalues of $\Delta = X - Y$ by the eigenvalues of $X$.
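Theorem 3.5: Consider (positive definite) Hermitian matrices $X$ and $Y$, i.e. such that the Thompson metric $d = d_T(X, Y)$ is finite. Then $\|X - Y\|_p \le 2^{1/p}\,\frac{e^d - 1}{e^d}\,\max[\|X\|_p, \|Y\|_p]$.
Proof: Since raising positive numbers to powers $\ge 1$ is monotonically increasing, the eigenvalue bounds established in the preceding lemmas extend to the $p$th powers of the eigenvalue magnitudes. Hence, by the definition and monotonicity of $f_p$, the stated bound follows.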
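One way to see how a bound of this form can arise is sketched below. This is an illustrative argument under the notation of the Preliminaries, with $\alpha = e^d$ and $c = 1 - 1/\alpha$ introduced here for brevity; it is not necessarily the route taken by the lemmas of this section.

```latex
% Sketch: from Löwner bounds to the Schatten p-norm bound (finite p)
\begin{align*}
  X \le \alpha Y &\;\Rightarrow\; Y - X \le (1 - 1/\alpha)\,Y = cY
     \;\Rightarrow\; \varepsilon_i \le c\,\upsilon_i, \\
  Y \le \alpha X &\;\Rightarrow\; X - Y \le (1 - 1/\alpha)\,X = cX
     \;\Rightarrow\; \delta_i \le c\,\chi_i, \\
  \|X - Y\|_p^p = \sum_i |\delta_i|^p
     &= \sum_{\delta_i > 0} \delta_i^p + \sum_{\varepsilon_j > 0} \varepsilon_j^p
     \le c^p \left( \|X\|_p^p + \|Y\|_p^p \right)
     \le 2\,c^p \max\left[\|X\|_p, \|Y\|_p\right]^p .
\end{align*}
% Taking p-th roots gives \|X - Y\|_p \le 2^{1/p} \frac{e^d - 1}{e^d} \max[\|X\|_p, \|Y\|_p].
```

The eigenvalue implications in the first two lines use the monotonicity of eigenvalues under the Löwner ordering, and the split of the sum uses $\delta_i = -\varepsilon_{n-i+1}$, so each negative eigenvalue of $\Delta$ appears as a positive eigenvalue of $E$.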

The Frobenius (p = 2) case
We begin by noting that $\mathrm{tr}(A^T B)$ defines an inner product yielding the Frobenius norm, i.e. $\|A\|_2^2 = \mathrm{tr}(A^T A)$. This, together with the commutative property of the trace, leads to the following version of the law of cosines for matrices: $\|A - B\|_2^2 = \|A\|_2^2 + \|B\|_2^2 - 2\,\mathrm{tr}(A^T B)$. Since for two (symmetric) positive semidefinite matrices $X$ and $Y$, $\mathrm{tr}(X^T Y) = \mathrm{tr}(XY) \ge 0$ (Yang, 2000; Yang, Yang, & Teo, 2001), $\theta = \cos^{-1}\left[\frac{\mathrm{tr}(XY)}{\|X\|_2 \|Y\|_2}\right] \le \frac{\pi}{2}$ (rad) and hence $\|X - Y\|_2^2 \le \|X\|_2^2 + \|Y\|_2^2$. Note that the Frobenius norm of a matrix is the same as the Euclidean norm of that matrix reshaped as a vector, so matrices under the Frobenius norm can be treated just as vectors in a Euclidean space.
Solving for $\|\Delta\|_2$, we have our result: $\|X - Y\|_2 \le \frac{e^d - 1}{\sqrt{e^{2d} + 1}}\sqrt{\|X\|_2^2 + \|Y\|_2^2}$ (7)
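The law of cosines and the resulting Frobenius bound can be checked numerically on a small example. The sketch below assumes NumPy; the function name `frobenius_bound` is illustrative, and the example matrices are chosen so that the Thompson metric is known in closed form ($d = \log 2$).

```python
import numpy as np

def frobenius_bound(d, norm_x, norm_y):
    """Evaluate (e^d - 1) / sqrt(e^(2d) + 1) * sqrt(||X||_2^2 + ||Y||_2^2)."""
    return float((np.exp(d) - 1.0) / np.sqrt(np.exp(2.0 * d) + 1.0)
                 * np.hypot(norm_x, norm_y))

# X <= 2Y and Y <= 2X hold with alpha = 2 minimal, so d = log(2).
X = np.diag([1.0, 2.0])
Y = np.diag([2.0, 2.0])
d = np.log(2.0)

nx = np.linalg.norm(X, "fro")
ny = np.linalg.norm(Y, "fro")

# Law of cosines with the trace inner product tr(X^T Y).
lhs = np.linalg.norm(X - Y, "fro") ** 2
rhs = nx ** 2 + ny ** 2 - 2.0 * np.trace(X.T @ Y)

actual = np.sqrt(lhs)
bound = frobenius_bound(d, nx, ny)
print(lhs, rhs, actual, bound)
```

In this example the Frobenius-specific bound is noticeably smaller than the general bound with $p = 2$, which evaluates to $\sqrt{2}\cdot\frac{1}{2}\cdot\sqrt{8} = 2$.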

A generalization of the Frobenius (p = 2) case
Consider the related and more general problem of bounding the Frobenius norm $\|\Delta\|_2$, with $\Delta = X - Y$, given matrix bounds $X \le g \cdot Y$ and $Y \le f \cdot X$, for scalars $f$ and $g$. This generalization, illustrated in Figure 2, yields two quadratic equations: Equation (8), $\|\Delta\|_2^2 = \|X\|_2^2 + \|Y\|_2^2 - 2\,\|X\|_2 \|Y\|_2 \cos\phi$, and Equation (9), which involves $\|\Delta_{fg}\|_2$. Similar to the argument above (in Section 4), $X \le g \cdot Y$ and $Y \le f \cdot X$ imply that $\theta \le \frac{\pi}{2}$ (rad), which implies inequality (10). The above system of two quadratic equations and one quadratic inequality has (assuming $\|X\|_2$ is known, even if $X$ is an unknown, approximated by $Y$) three unknowns: $\|\Delta\|_2$, $\|\Delta_{fg}\|_2$ and $\cos\phi$.
Solving this system and simplifying the resulting solutions with Mathematica (Wolfram Research I, 2016) yields the inequalities (11).
Figure 2. Difference and Frobenius norm between two vectors given matrix bounds. This figure represents matrices X and Y as vectors that span a plane and illustrates the geometric intuition behind Equations (8) and (9) as well as inequality (10).
Note that this analysis not only establishes a bound for $\|X - Y\|_2$, given matrix bounds $X \le g \cdot Y$ and $Y \le f \cdot X$, but also yields a bound for $\cos\phi$. Thus, this analysis provides information about the inner product between $X$ and $Y$, even in cases where $X$ is an unknown, approximated by $Y$. Of course, when $f = g = \alpha = e^d$, (11) simplifies to the result established in Section 4.
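When both matrices are available, the quantities entering this generalization can be computed directly. The sketch below assumes NumPy and takes $f$ and $g$ to be the smallest scalars satisfying $Y \le f \cdot X$ and $X \le g \cdot Y$, which is an assumption made here since any valid pair of scalars could be used. The function names are hypothetical, and $\cos\phi$ is computed from the trace inner product, consistent with Equation (8).

```python
import numpy as np

def loewner_scalars(X, Y):
    """Smallest f and g with Y <= f X and X <= g Y (X, Y Hermitian positive definite)."""
    L = np.linalg.cholesky(X)
    W = np.linalg.solve(L, Y)
    M = np.linalg.solve(L, W.conj().T).conj().T     # L^{-1} Y L^{-H}
    lam = np.linalg.eigvalsh((M + M.conj().T) / 2)  # eigenvalues of X^{-1} Y
    f = float(lam.max())         # Y <= f X  iff  f >= largest eigenvalue
    g = float(1.0 / lam.min())   # X <= g Y  iff  g >= 1 / smallest eigenvalue
    return f, g

def cos_phi(X, Y):
    """Cosine of the angle between X and Y under the trace inner product."""
    inner = np.trace(X.conj().T @ Y).real
    return float(inner / (np.linalg.norm(X, "fro") * np.linalg.norm(Y, "fro")))

X = np.diag([1.0, 2.0])
Y = np.diag([2.0, 2.0])
f, g = loewner_scalars(X, Y)
print(f, g, np.log(max(f, g)), cos_phi(X, Y))
```

With these choices the Thompson metric is recovered as $\log\max(f, g)$, so the case $f = g = e^d$ corresponds to replacing the smaller scalar by the larger one.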

Numerical results
Fifty calculations, performed in MATLAB, with pairs of random positive definite 5 × 5 matrices tested the tightness of the bounds presented in this paper. The $i$th pair $(X_i, Y_i)$ of matrices was generated from matrices $A$, $B$ and $C$, whose elements were randomly drawn from the uniform distribution on [0,1], and a diagonal matrix $D$, whose diagonal elements were randomly drawn from that same distribution. The occurrence of the index $i$ in the generating formulas ensured a range of distances among the 50 matrix pairs tested. Figure 3 compares values of $\|X_i - Y_i\|_p$ for (A) p = 1 (trace norm), (B) p = 2 (Frobenius norm) and (C) p = ∞ (spectral norm) with bounds for those values calculated using Theorem 3.5 and (for panel B) Equation (7) and Equation (11). The function, thompson_metric.m, used to calculate the Thompson metric as well as f and g in Equation (11) is available via MATLAB Central File Exchange, and the script used to generate Figure 3, together with the data calculated using that script, is available from the author upon request. While the bounds described in this paper are clearly not very tight (for matrices more distant from each other), hopefully these results will spark further research leading to tighter bounds on Schatten norms based on the Thompson metric.
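A comparable experiment can be sketched in Python with NumPy. This is not the MATLAB script described above: the way the matrix pairs are generated below ($X_i = AA^T + I$ and $Y_i = X_i + (i/50)\,BB^T$) is a hypothetical stand-in, since only the distribution of the random ingredients is specified, and all function names are illustrative.

```python
import numpy as np

def thompson_metric(X, Y):
    """Thompson metric between Hermitian positive definite X and Y."""
    L = np.linalg.cholesky(X)
    W = np.linalg.solve(L, Y)
    M = np.linalg.solve(L, W.conj().T).conj().T     # L^{-1} Y L^{-H}
    lam = np.linalg.eigvalsh((M + M.conj().T) / 2)
    return float(np.log(max(lam.max(), 1.0 / lam.min())))

def schatten_norm(M, p):
    """Schatten p-norm of a Hermitian matrix from its eigenvalue magnitudes."""
    mu = np.abs(np.linalg.eigvalsh(M))
    return float(mu.max()) if np.isinf(p) else float((mu ** p).sum() ** (1.0 / p))

def run_experiment(n_pairs=50, n=5, seed=0):
    """Return rows (i, p, actual norm, general bound, Frobenius bound or None)."""
    rng = np.random.default_rng(seed)
    rows = []
    for i in range(1, n_pairs + 1):
        A = rng.uniform(size=(n, n))
        B = rng.uniform(size=(n, n))
        X = A @ A.T + np.eye(n)              # hypothetical construction, positive definite
        Y = X + (i / n_pairs) * (B @ B.T)    # distance from X grows with the index i
        d = thompson_metric(X, Y)
        for p in (1, 2, np.inf):
            nx, ny = schatten_norm(X, p), schatten_norm(Y, p)
            factor = 1.0 if np.isinf(p) else 2.0 ** (1.0 / p)
            general = factor * (1.0 - np.exp(-d)) * max(nx, ny)
            frob = None
            if p == 2:
                frob = float((np.exp(d) - 1.0) / np.sqrt(np.exp(2.0 * d) + 1.0)
                             * np.hypot(nx, ny))
            rows.append((i, p, schatten_norm(X - Y, p), general, frob))
    return rows

rows = run_experiment()
worst = max(actual / general for _, _, actual, general, _ in rows)
print(f"largest ratio of actual norm to general bound: {worst:.3f}")
```

The printed ratio gives a rough sense of how much slack the general bound leaves for this particular, assumed family of matrix pairs; a different generating scheme would give different ratios.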

Discussion
Weyl's inequalities, and hence some knowledge of the spectra of $X$ and $Y$, form the backbone of the proofs presented above. In the motivating case where $Y$ is an approximation of an unknown $X$, the spectrum of $X$ may also be unknown. While the principal result of this paper ultimately only requires knowledge of $\|X\|_p$ (as well as $\|Y\|_p$, which is generally known), purely geometric/trigonometric proofs of the results presented in this paper, such as the one given for the Frobenius case, would be more elegant given the nature of the motivating problem.
Furthermore, proofs not based on the matrix structure of $X$ and $Y$ but based purely on the ordering (Löwner ordering in this case) and norm (Schatten p-norm) being compared might allow for tighter bounds on $\|X - Y\|_p$ even in the absence of any knowledge of the spectrum of $X$ (or even of $Y$, for that matter), other than perhaps a restriction that $X$ and $Y$ be positive semidefinite. In comparison, Theorem 3.4 provides a tighter bound on $\|X - Y\|_p$ than the main result (Theorem 3.5), but it requires some knowledge of the spectrum of $X$ (at least that its eigenvalues are lower in magnitude than the corresponding eigenvalues of $Y$).
Additionally, proofs not based on the matrix structure of $X$ and $Y$ may lead to the generalization of these results to other orderings, which can also induce Thompson metrics (Cobzaş & Rus, 2014), and other norms. For instance, since the Frobenius norm arises from an inner product, a geometrically flavored argument leads to a slightly tighter bound on $\|X - Y\|_2$ than that obtained from the general bound for $\|X - Y\|_p$ by setting $p = 2$. On the other hand, the already established general result for a Thompson metric induced by a normal cone in a Banach space (Lemmens & Nussbaum, 2012; Nussbaum, 1988) is not as tight as the main result (Theorem 3.5) presented here: as $\epsilon \to 0$, the value of $\delta$ such that $0 \le X \le (X + \epsilon I) \Rightarrow \|X\| \le \delta\,\|X + \epsilon I\|$ approaches unity; thus, the normality constant for the cone of positive semidefinite matrices is unity, and the general result for Banach spaces reduces to $\|X - Y\| \le 3\left(e^{d_T(X,Y)} - 1\right)\max[\|X\|, \|Y\|]$, the right-hand side of which is clearly greater than $2^{1/p}\,\frac{e^d - 1}{e^d}\,\max[\|X\|_p, \|Y\|_p]$, as $2^{1/p} \le 2 < 3$ and $e^d \ge 1$ since the (Thompson) metric $d$ is non-negative.
As illustrated in Section 5 of this paper, a more general analysis of the Frobenius case yields not only a bound for $\|X - Y\|_2$ but also bounds on the inner product between $X$ and $Y$. In the case where $X$ is an unknown, approximated by $Y$, bounds on the inner product between $X$ and $Y$ further quantify how well $Y$ approximates $X$, and may provide further insight into improved approximations of an unknown $X$.
Hopefully, future research can further generalize the analysis presented in Section 5 to cases where $X \le g(Y)$ and $Y \le f(X)$, for more general classes of functions on $X$ and $Y$ than mere scalar multiplication. Such inequalities, in the Löwner order, arise, for example, in characterizing approximate solutions to the continuous algebraic Riccati equation (Zhang & Liu, 2010). Further generalization of the results presented here will facilitate expressing the quality of approximations, found in many areas of matrix algebra and optimization theory, in terms of geometrically intuitive metrics based on Schatten norms rather than less geometrically intuitive bounds in the Löwner order.

Citation information
Cite this article as: On bounding the Thompson metric by Schatten norms, David A. Snyder, Cogent Mathematics & Statistics (2019), 6: 1614318.