Abstract

A Riemannian gradient algorithm based on the geometric structure of the manifold of symmetric positive definite matrices is proposed to compute the numerical solution of a linear matrix equation. In this algorithm, the geodesic distance on the curved Riemannian manifold is taken as the objective function and the geodesic curve is treated as the convergence path. Optimal variable step sizes corresponding to the minimum value of the objective function are also provided in order to improve the convergence speed. Furthermore, the convergence speed of the Riemannian gradient algorithm is compared with that of the traditional conjugate gradient method in two simulation examples. It is found that the proposed algorithm converges faster than the conjugate gradient method.

1. Introduction

The linear matrix equation (1), in which the coefficient matrices are arbitrary real matrices, the number of terms is a nonnegative integer, and the superscript denotes the transpose of a matrix, arises in many fields, such as control theory, dynamic programming, and stochastic filtering [1–4]. In the past decades, there has been increasing interest in the solution of this equation. In a special case of (1), some numerical methods, such as the Bartels-Stewart method, the Hessenberg-Schur method, and methods based on the Schur and QR decompositions, were proposed in [5, 6]. Based on the Kronecker product and the fixed point theorem in partially ordered sets, some sufficient conditions for the existence of a unique symmetric positive definite solution are given in [7, 8]. Ran and Reurings ([7, Theorem 3.3] and [9, Theorem 3.1]) pointed out that if the given constant matrix is positive definite, then there exists a unique solution, and it is symmetric positive definite. Recently, under the condition that (1) is consistent, Su and Chen presented an efficient numerical iterative method based on the conjugate gradient method (CGM) [10].

In addition, based on geometric structures on a Riemannian manifold, Duan et al. proposed a natural gradient descent algorithm to solve algebraic Lyapunov equations [11, 12]. Following them, we investigate the solution of (1) from the viewpoint of Riemannian manifolds. Note that the solution of (1) is a symmetric positive definite matrix and that the set of all symmetric positive definite matrices can be regarded as a manifold. Thus, it is convenient to investigate the solution problem with the help of the geometric structures of this manifold. To address this need, a new framework, based on the geometric structures of the Riemannian manifold of symmetric positive definite matrices, is presented in this paper to calculate the numerical solution. We first briefly describe some fundamental facts about manifolds. Then a Riemannian gradient algorithm is proposed, in which the geodesic distance on the curved Riemannian manifold is taken as the objective function and the geodesic curve is treated as the convergence path. Optimal variable step sizes corresponding to the minimum value of the objective function are also provided in order to improve the convergence speed. Finally, the behaviors of the proposed algorithm and of the traditional CGM are compared in two simulation examples.

2. A Survey of Some Geometrical Concepts

In the present section, we briefly survey the geometry of smooth manifolds by recalling concepts such as tangent spaces, the Riemannian gradient, geodesic curves, and the geodesic distance. More details can be found in [13, 14]. In particular, these concepts for the manifold of symmetric positive definite matrices have also been introduced in [15–20].

2.1. Riemannian Manifolds

Let us denote a Riemannian manifold by $M$ and its tangent space at a point $p \in M$ by $T_pM$. For every point $p$ and every pair of tangent vectors $u, v \in T_pM$, an inner product $\langle u, v\rangle_p$ is defined. The specification of the inner product for a Riemannian manifold turns it into a metric space. In fact, the length of a curve $\gamma : [0,1] \to M$ such that $\gamma(0) = p$ and $\gamma(1) = q$ is given by $L(\gamma) = \int_0^1 \sqrt{\langle \dot\gamma(t), \dot\gamma(t)\rangle_{\gamma(t)}}\, dt$. Given arbitrary $p$ and $q$, the curve of shortest length is called the geodesic curve, and this minimal length is called the geodesic distance between the two points, denoted by $d(p, q)$. It is obvious that if any pair of points on the manifold can be connected by a geodesic curve, then it is possible to measure the distance between any given pair of points on it. Given a regular function $f : M \to \mathbb{R}$, its Riemannian gradient $\nabla_p f$ at a point $p$ measures the rate of change of the function along tangent directions. Namely, given any smooth curve $\gamma$ such that $\gamma(0) = p$ and $\dot\gamma(0) = v$, the Riemannian gradient is the unique vector in $T_pM$ such that $\frac{d}{dt} f(\gamma(t))\big|_{t=0} = \langle \nabla_p f, v\rangle_p$ for every such curve.

By using the Riemannian gradient in place of its Euclidean counterpart, gradient-based optimization methods may be readily extended to smooth manifolds [21, Chapter 7]. For this purpose, let us consider the gradient flow differential equation on the manifold, $\dot{X}(t) = -\nabla_{X(t)} f$.

If the function $f$ is bounded and the smooth manifold is compact, then the solution of this differential equation tends to a local minimum of the function $f$ on the manifold, depending on the initial point. In fact, this follows from the definition of the Riemannian gradient, since along a solution $\frac{d}{dt} f(X(t)) = \langle \nabla_{X(t)} f, \dot X(t)\rangle_{X(t)} = -\langle \nabla_{X(t)} f, \nabla_{X(t)} f\rangle_{X(t)} \le 0$, so the objective function is nonincreasing along the flow.

2.2. Manifold of Symmetric Positive Definite Matrices

Let us denote the general linear group by $GL(n)$, which consists of all $n \times n$ nonsingular real matrices, and let $Sym(n)$ denote the set of all $n \times n$ real symmetric matrices. We write $X > 0$ to mean that $X$ is a symmetric positive definite matrix, and the manifold consisting of all symmetric positive definite matrices is denoted by $SPD(n) = \{X \in Sym(n) : X > 0\}$. The tangent space at a point $X \in SPD(n)$ is given by $T_X SPD(n) = Sym(n)$. On the manifold $SPD(n)$, the Riemannian metric at the point $X$ is defined by $\langle U, V\rangle_X = \operatorname{tr}(X^{-1} U X^{-1} V)$, where $U, V \in T_X SPD(n)$ and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The manifold $SPD(n)$ with the Riemannian metric (7) then becomes a Riemannian manifold. Moreover, it is a curved manifold, since this metric is invariant under the group action of $GL(n)$. The geodesic curve emanating from the point $X$ in the direction $V \in T_X SPD(n)$ can be expressed as $\Gamma(t) = X^{1/2} \exp(t X^{-1/2} V X^{-1/2}) X^{1/2}$. Obviously, this geodesic curve is entirely contained in the manifold. The Hopf-Rinow theorem implies that the manifold $SPD(n)$ with the Riemannian metric (7) is geodesically complete ([22, Theorem 2.8]). This means that, for any given pair $X, Y \in SPD(n)$, we can find a geodesic curve $\Gamma(t)$ such that $\Gamma(0) = X$ and $\Gamma(1) = Y$ by taking the initial velocity $V = X^{1/2} \ln(X^{-1/2} Y X^{-1/2}) X^{1/2}$. According to (2), the geodesic distance can be computed explicitly as $d(X, Y) = \|\ln(X^{-1/2} Y X^{-1/2})\|_F = \bigl(\sum_{i=1}^n \ln^2 \lambda_i\bigr)^{1/2}$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the matrix $X^{-1} Y$ and $\|\cdot\|_F$ denotes the Frobenius norm.
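To make these formulas concrete, the following Python sketch implements the affine-invariant inner product, the geodesic curve, and the geodesic distance on $SPD(n)$ with standard NumPy routines. The function names and the eigendecomposition-based implementation are ours; the formulas themselves are the ones recalled above.

import numpy as np

def spd_apply(X, f):
    """Apply a scalar function f to the symmetric positive definite matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.T

def inner(X, U, W):
    """Affine-invariant inner product <U, W>_X = tr(X^{-1} U X^{-1} W)."""
    Xi = np.linalg.inv(X)
    return np.trace(Xi @ U @ Xi @ W)

def geodesic(X, V, t):
    """Geodesic t -> X^{1/2} exp(t X^{-1/2} V X^{-1/2}) X^{1/2} through X with velocity V."""
    Xh = spd_apply(X, np.sqrt)                      # X^{1/2}
    Xmh = spd_apply(X, lambda w: 1.0 / np.sqrt(w))  # X^{-1/2}
    S = Xmh @ V @ Xmh
    w, U = np.linalg.eigh((S + S.T) / 2)
    return Xh @ (U * np.exp(t * w)) @ U.T @ Xh

def distance(A, B):
    """Geodesic distance d(A, B) = || ln(A^{-1/2} B A^{-1/2}) ||_F."""
    Amh = spd_apply(A, lambda w: 1.0 / np.sqrt(w))
    M = Amh @ B @ Amh
    w = np.linalg.eigvalsh((M + M.T) / 2)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

Equivalently, distance(A, B) is the square root of the sum of the squared logarithms of the eigenvalues of A^{-1} B.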

3. Riemannian Gradient Algorithm

Previous studies show that the solution of equation (1) is a unique positive definite matrix if the coefficient matrices satisfy condition (10) ([7, Theorem 3.3] and [9, Theorem 3.1]). For convenience, we regard the left-hand side of (1) as a matrix-valued map of the unknown matrix; it is obvious that this map takes positive definite values at every symmetric positive definite argument. Our purpose is to seek a matrix such that the value of this map is as close as possible to the given constant matrix. We identify two key points in designing such an algorithm (see the sketch below). (1) For a candidate matrix, the difference between the given positive definite matrix and the value of the map should first be determined and calculated. (2) Then the Riemannian gradient descent algorithm can be used to adjust the candidate so that this difference becomes as small as possible.
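For the sketches in this section, we assume that the left-hand side of (1) has the form $\phi(X) = X + \sum_{i=1}^{m} A_i^{T} X A_i$ and that the right-hand side is a given symmetric positive definite matrix $Q$; this notation is ours and is introduced only for illustration. Under that assumption, the map and a positive-definiteness check can be written as follows.

def phi(X, A_list):
    """Assumed left-hand side of (1): X + sum_i A_i^T X A_i."""
    return X + sum(A.T @ X @ A for A in A_list)

def is_spd(M, tol=1e-12):
    """Check that M is (numerically) symmetric positive definite."""
    S = (M + M.T) / 2
    return np.allclose(M, M.T, atol=1e-10) and np.linalg.eigvalsh(S).min() > tol

Each term A_i^T X A_i is positive semidefinite whenever X is positive definite, so phi(X) is positive definite for every X in SPD(n) under this assumption.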

Note that the Riemannian manifold $SPD(n)$ is geodesically complete; hence we take as the objective function the geodesic distance between the value of this map at the current candidate and the given matrix.

Then the exact solution of (1) may be characterized as the minimizer of this objective function. In order to find such a minimizer, we may make use of the differential equation (5); the minimum is apparently achieved at a value for which the objective function vanishes. It is necessary to solve the optimization problem (12) by a numerical optimization algorithm that takes the geometry of the Riemannian manifold into account. In particular, starting from an initial guess, it is possible to provide an iterative learning algorithm that generates an iterate at each learning step. Following [21, 23], such a sequence may be generated by moving from each point to the next along a short geodesic curve in the direction opposite to the Riemannian gradient of the objective function. With this choice of descent direction (14), the Riemannian gradient algorithm may be expressed as the geodesic update (15), where the step size denotes any suitable step-size schedule that drives the iterative algorithm (15) to convergence; one step of this update is sketched below. Moreover, in order to employ the learning algorithm (14) and (15), we first need to compute the Riemannian gradient of the objective function to be optimized. In the definition of the Riemannian gradient (4), the generic smooth curve may be replaced with a geodesic curve. We obtain the following theorem.
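Assuming that the update (15) moves from the current iterate along the geodesic of Section 2.2 in the direction opposite to the Riemannian gradient, a single step of the scheme can be sketched as follows; grad stands for whatever gradient expression (16) provides, eta for the step size, and geodesic refers to the helper sketched in Section 2.2.

def geodesic_step(X, grad, eta):
    """One update of the form X_{k+1} = X^{1/2} exp(-eta X^{-1/2} grad X^{-1/2}) X^{1/2}.

    The result stays inside SPD(n) for any step size, since the geodesic
    is entirely contained in the manifold (Section 2.2).
    """
    return geodesic(X, -grad, eta)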

Theorem 1. The Riemannian gradient of the objective function at a point of the manifold is given by formula (16).

Proof. First, we compute the derivative of the real-valued function obtained by composing the objective function with a geodesic curve emanating from the current point in a given tangent direction. Using Proposition 2.1 in [16] and some properties of the trace, this derivative can be written as an inner product of the form required in (4), from which it follows that formula (16) is valid.
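As an informal illustration of the kind of expression such a computation produces, suppose the objective function is $J(X) = \tfrac{1}{2} d^{2}(\phi(X), Q)$ with $\phi(X) = X + \sum_{i=1}^{m} A_i^{T} X A_i$; these identifications are our assumptions, and the result below is a sketch under them rather than a restatement of (16). Writing $P = \phi(X)$, a chain-rule computation with respect to the metric of Section 2.2 gives

\frac{d}{dt}\,\tfrac{1}{2} d^{2}\bigl(P + tH,\, Q\bigr)\Big|_{t=0}
  = \operatorname{tr}\bigl(P^{-1}\ln(PQ^{-1})\, H\bigr),
\qquad
\nabla_X J = X \Bigl[\, G + \sum_{i=1}^{m} A_i\, G\, A_i^{T} \Bigr] X,
\quad G = P^{-1}\ln(PQ^{-1}),

which is a symmetric matrix, in agreement with Remark 2.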

Remark 2. In fact, using identities that hold for all the matrices involved, the gradient in (16) can be rewritten in an explicitly symmetric form. Therefore it can be shown that the Riemannian gradient belongs to the tangent space $Sym(n)$.

Corollary 3. The Riemannian gradient has a unique zero point, namely, the solution of (1). Furthermore, for a suitable step size, the sequence generated by (15) converges to the solution of (1).

Proof. Obviously, the solution of (1) is a zero point of the Riemannian gradient. Suppose that there exists another point at which the iterative process (15) stops; this means that the Riemannian gradient vanishes there, which is equivalent to (20). Multiplying (20) by suitable matrices and taking traces, we obtain (21). Also, if we perform the analogous replacement and follow a procedure similar to the above, then the equalities (22) are valid. Combining (21) and (22), relation (23) is obtained. Note that the matrix involved is positive definite; hence its orthogonal similarity diagonalization can be expressed as the product of an orthogonal matrix, a diagonal matrix with positive entries, and the transpose of the orthogonal matrix. Thus, we obtain (24). Note that the second equality of (24) is equivalent to (25). Thus the point at which the iteration stops is also a solution of (1), which contradicts the uniqueness of the solution. Therefore, the Riemannian gradient has the unique zero point. This completes the proof of Corollary 3.

Now, the Riemannian gradient algorithm with a constant step size (RGACS) is obtained by fixing the step size in the update (15), with a symmetric positive definite matrix as the initial value. The step size corresponding to the best convergence speed can be determined experimentally, and the details will be described in the simulations below; a complete sketch of this iteration is also given below.
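A minimal end-to-end sketch of the RGACS, under the same assumptions as above (the form of $\phi$, the objective $J(X) = \tfrac{1}{2} d^{2}(\phi(X), Q)$, the candidate gradient from the sketch after Theorem 1, and the identity matrix as initial value), might look as follows; it reuses phi, geodesic_step, and the geometry helpers defined earlier.

import numpy as np
from scipy.linalg import logm

def riemannian_grad(X, A_list, Q):
    """Candidate Riemannian gradient of J(X) = 0.5 * d^2(phi(X), Q) (sketch, see above)."""
    P = phi(X, A_list)
    G = np.linalg.solve(P, np.real(logm(P @ np.linalg.inv(Q))))  # G = P^{-1} ln(P Q^{-1})
    G = (G + G.T) / 2                                            # symmetrize against round-off
    E = G + sum(A @ G @ A.T for A in A_list)
    return X @ E @ X

def rgacs(A_list, Q, eta=0.5, tol=1e-10, max_iter=1000):
    """Riemannian gradient algorithm with a constant step size (sketch)."""
    X = np.eye(Q.shape[0])                                # assumed initial value
    for k in range(max_iter):
        R = phi(X, A_list) - Q
        if np.max(np.abs(np.linalg.eigvalsh(R))) < tol:   # spectral radius of the residual
            break
        X = geodesic_step(X, riemannian_grad(X, A_list, Q), eta)
    return X, k

The constant eta plays the role of the experimentally selected step size discussed above.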

3.1. Riemannian Gradient Algorithm with Variable Step Sizes

In order to improve the convergence speed of the RGACS, an optimal variable step-size schedule will be considered and defined as follows. The step size should be evaluated in such a way that the objective function at the next step is reduced as much as possible with respect to its value at the current step. We may regard the value of the objective function at the next iterate as a function of the step size; thus, we can optimize the step size so that the decrease of the objective function over one step is as large as possible. Since it is difficult to solve this nonlinear problem exactly, we use a suboptimal approximation in practice. Under the hypothesis that the step-size value is small enough, we may expand this function in a second-order Taylor polynomial of the step size. Then the optimal step size, corresponding to the maximum decrease, can be expressed in terms of the coefficients of this expansion, which can be calculated from the first-order and second-order derivatives of the function with respect to the step size at the current iterate. On the basis of the above discussion, we obtain the optimal step size; a numerical stand-in for this choice is sketched below.
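As a simple numerical stand-in with the same intent, the step size can also be chosen at every iteration by a one-dimensional search of the objective function along the geodesic; the following sketch uses this substitute rather than the closed-form expression, and the search interval bound eta_max is an arbitrary choice of ours.

from scipy.optimize import minimize_scalar

def variable_step(X, grad, A_list, Q, eta_max=2.0):
    """Pick the step size by minimizing J along the geodesic (numerical stand-in)."""
    def J_along(eta):
        return 0.5 * distance(phi(geodesic_step(X, grad, eta), A_list), Q) ** 2
    return minimize_scalar(J_along, bounds=(0.0, eta_max), method="bounded").x

Replacing the constant eta in rgacs with variable_step(X, grad, A_list, Q) at each iteration yields a sketch of the RGAVS described next.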

Then the Riemannian gradient algorithm with variable step sizes (RGAVS) is obtained by using this optimal step size in the update (15) at every iteration, starting from a symmetric positive definite initial value.

3.2. Volume Feature of Riemannian Gradient Algorithm

The RGACS exhibits an interesting volume feature: the ratio of the determinants of two successive iterates always satisfies a fixed relationship. In order to prove this volume feature, let us recall two properties of the determinant operator $\det(\cdot)$; namely, for all square matrices $A$ and $B$ of the same size, $\det(AB) = \det(A)\det(B)$ and $\det(\exp(A)) = \exp(\operatorname{tr}(A))$. By applying the determinant operator to both sides of (31), we obtain a relation between the determinants of successive iterates. Therefore, the determinants of the iterates are governed recursively by this relation. This shows that the determinant of the solution of (1) is closely linked with the determinants of the generated sequence. Moreover, for a fixed step size, the ratio of successive determinants is directly governed by the step size and the gradient at the current iterate, as made explicit below. In the case of the RGAVS, a similar conclusion can also be obtained.
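For instance, if the iteration has the geodesic form used in the sketches above, $X_{k+1} = X_k^{1/2} \exp\bigl(-\eta\, X_k^{-1/2} \nabla J(X_k)\, X_k^{-1/2}\bigr) X_k^{1/2}$ (an assumption on the exact shape of (31)), then the two determinant identities give

\det X_{k+1}
  = \det X_k \cdot \det \exp\bigl(-\eta\, X_k^{-1/2} \nabla J(X_k)\, X_k^{-1/2}\bigr)
  = \det X_k \cdot \exp\bigl(-\eta \operatorname{tr}\bigl(X_k^{-1} \nabla J(X_k)\bigr)\bigr),

so the ratio of successive determinants is determined by the step size and by the trace of $X_k^{-1}\nabla J(X_k)$.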

4. Simulations

Two simulation examples are given to compare the convergence speed of the Riemannian gradient algorithms (the RGACS and the RGAVS) with that of the traditional CGM. The error criterion used in these simulations is expressed in terms of the spectral radius of the residual matrix, where the spectral radius of a matrix denotes the maximum modulus of its eigenvalues.
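Assuming that the residual of an iterate is measured by the spectral radius of the difference between the two sides of (1) (our reading of the criterion), it can be evaluated as follows.

def residual(X, A_list, Q):
    """Spectral radius of phi(X) - Q (assumed form of the error criterion)."""
    R = phi(X, A_list) - Q
    return float(np.max(np.abs(np.linalg.eigvalsh((R + R.T) / 2))))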

Example 4. First we consider a 5-order instance (35) of the matrix equation (1), whose coefficient matrices satisfy condition (10). Both the RGACS and the RGAVS are applied and compared with the traditional CGM, starting from the chosen initial value, in the following simulation. In the RGACS case, the number of iterations required versus the constant step size is shown in Figure 1. It can be seen that the number of iterations decreases gradually as the step size grows from zero to a critical value and then increases gradually above this critical value. Therefore, the critical value of about 0.64 can be selected experimentally as an optimal step size and used in the RGACS.

Figure 2 shows the convergence comparison among the RGACS, the RGAVS, and the traditional CGM. It can be seen that the RGAVS possesses the steadiest convergence and the fastest convergence speed, needing 23 iterations to obtain the numerical solution of (35). The CGM attains the given error after 33 iterations, and the slower of the two Riemannian algorithms is the RGACS, with 31 steps. By comparison, it is shown that the convergence speed of the RGAVS is faster than those of both the RGACS and the traditional CGM in solving (35).

Example 5. A 20-order linear matrix equation is also considered here, following the same procedure as in Example 4, with coefficient matrices that satisfy condition (10). The simulation indicates that the numerical solution is obtained after 26, 45, and 37 iterations by the RGAVS, the RGACS (with an experimentally selected constant step size), and the traditional CGM, respectively. Therefore, the convergence speed of the RGAVS is again better than those of the RGACS and the CGM.

5. Conclusion

Using the geometric structure of the manifold consisting of all symmetric positive definite matrices, a Riemannian gradient algorithm is developed to calculate the numerical solution of the linear matrix equation (1). In this algorithm, the geodesic distance on the curved Riemannian manifold is taken as the objective function and the geodesic curve is treated as the convergence path. In addition, a variable step-size schedule corresponding to the minimum value of the objective function is provided in order to improve the convergence speed over the constant step-size case. Finally, the convergence speed of the Riemannian gradient algorithm has been compared with that of the traditional conjugate gradient method in two simulation examples involving 5-order and 20-order linear matrix equations, respectively. It is shown that the convergence speed of the Riemannian gradient algorithm with variable step sizes is faster than that of the conjugate gradient method. Although the Riemannian gradient algorithm is applied here to linear matrix equations over the real number field, it may be generalized to the complex case as well.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to express their sincere thanks to the referees for their valuable suggestions, which greatly improved the presentation of this paper. This work is supported by the National Natural Science Foundation of China (nos. 61179031, 10932002, and 51105033).