Fast methods for resumming matrix polynomials and Chebyshev matrix polynomials

https://doi.org/10.1016/j.jcp.2003.08.027

Abstract

Fast and effective algorithms are discussed for resumming matrix polynomials and Chebyshev matrix polynomials. These algorithms lead to a significant speed-up in computer time by reducing the number of matrix multiplications required to roughly twice the square root of the degree of the polynomial. A few numerical tests are presented, showing that evaluation of matrix functions via polynomial expansions can be preferable when the matrix is sparse and these fast resummation algorithms are employed.

Introduction

Functions of a matrix can be defined by their series expansions, such as the matrix exponential:

$$e^X = 1 + X + \tfrac{1}{2}X^2 + \tfrac{1}{3!}X^3 + \cdots.$$

It is relatively unusual for a matrix function to be evaluated this way, because it is typically more efficient and precise to employ the eigenvectors and eigenvalues of X, if it is diagonalizable. For example, if X is Hermitian, i.e., $X^\dagger = X$, we can evaluate the matrix function $e^X$ as

$$e^X = U\,e^{x}\,U^\dagger,$$

where $U$ is the matrix of eigenvectors and $x$ the diagonal matrix of eigenvalues of $X$.
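As a concrete illustration (ours, not code from the paper), the two routes for $e^X$ can be sketched in a few lines of NumPy: a truncated Taylor series on one hand, and the eigendecomposition of a Hermitian X on the other.

```python
import numpy as np

def exp_by_series(X, n_terms=30):
    """Approximate exp(X) by truncating the Taylor series 1 + X + X^2/2! + ..."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, n_terms + 1):
        term = term @ X / k            # X^k / k!
        result = result + term
    return result

def exp_by_diagonalization(X):
    """Evaluate exp(X) for Hermitian X as U exp(x) U^dagger."""
    x, U = np.linalg.eigh(X)           # eigenvalues x, eigenvectors U
    return U @ np.diag(np.exp(x)) @ U.conj().T

# Small symmetric test matrix (hypothetical data); both routes agree.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = (A + A.T) / 2.0
print(np.allclose(exp_by_series(X), exp_by_diagonalization(X)))  # True
```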

However, some situations arise where evaluation of the series is actually preferable. The particular case we have in mind is where the dimension of X is very large and, in addition (or as a consequence), X is very sparse. Then sparse matrix multiplication methods can be used to evaluate the products entering the matrix polynomial. The resulting computational effort can in principle increase only linearly with the dimension of the matrix, if its bandwidth is constant. By contrast, explicitly obtaining the eigenvalues and eigenvectors cannot scale linearly; indeed, it scales cubically if standard linear algebra routines are employed. For example, this situation arises in formulating linear scaling algorithms to perform tight-binding or Kohn–Sham density functional theory calculations on very large molecules [1], [2], [3], [4], [5], [6], [7], in the evaluation of the time-evolution operator $e^{-iHt}$ [8], and in the calculation of Boltzmann factors and partition functions of very large molecules in computational physics [9], [10], [11].

In this paper, we report fast algorithms for summing matrix and Chebyshev matrix polynomials. The problem is to minimize the computer time required to perform the sum to a given power, Npol, the degree of the polynomial. To an excellent approximation this is equivalent to minimizing the number of matrix multiplications, since the cost of matrix multiplications completely dominates the cost of matrix additions. The other degree of freedom that is available is to store multiple intermediate quantities to permit reduction of the matrix multiplication effort.

Simple term-by-term evaluation of a matrix polynomial of degree Npol requires Npol−1 matrix multiplications. However, it is possible to do substantially better, as has been shown in a number of papers where fast algorithms have been reported for resumming matrix polynomials with fewer matrix multiplications [12], [13], [14], [15], [16]. Perhaps the most effective algorithm was suggested by Paterson and Stockmeyer [14], which requires only $O(\sqrt{N_{pol}})$ nonscalar multiplications to evaluate polynomials of degree Npol. These authors also presented proofs that at least $\sqrt{N_{pol}}$ nonscalar multiplications are required to resum a polynomial of degree Npol. The algorithm has practical application in the evaluation of matrix polynomials with scalar coefficients and offers a huge improvement over simple term-by-term evaluation, at the expense of requiring more stored matrices. Storage of $O(\sqrt{N_{pol}})$ N×N matrices is required in this algorithm, where N is the dimension of the matrix X. Van Loan [15] later showed how this procedure could be implemented with greatly reduced storage, although significantly more matrix multiplies were required.
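A minimal sketch of the Paterson-Stockmeyer idea, assuming dense NumPy matrices and scalar coefficients (function names and test data are ours; a production implementation would use sparse matrices): the powers X, X², ..., X^m are stored once, and Horner's rule is applied in X^m over blocks of m coefficients.

```python
import numpy as np
from math import factorial

def paterson_stockmeyer(coeffs, X):
    """Evaluate p(X) = sum_k coeffs[k] X^k with roughly 2*sqrt(deg) matrix
    multiplies: store X, X^2, ..., X^m, then run Horner's rule in X^m over
    blocks of m coefficients (a sketch of the scheme of Ref. [14])."""
    n = X.shape[0]
    npol = len(coeffs) - 1
    m = max(1, int(np.ceil(np.sqrt(npol + 1))))
    powers = [np.eye(n), X]                      # stored intermediates I, X, ..., X^m
    for _ in range(m - 1):
        powers.append(powers[-1] @ X)            # m - 1 matrix multiplies
    Xm = powers[m]
    n_blocks = (npol + m) // m                   # ceil((npol + 1) / m)

    def block(j):                                # scalar combination, no multiplies
        return sum(c * powers[k] for k, c in enumerate(coeffs[j * m:(j + 1) * m]))

    result = block(n_blocks - 1)
    for j in range(n_blocks - 2, -1, -1):
        result = result @ Xm + block(j)          # n_blocks - 1 matrix multiplies
    return result

# Usage: the degree-16 Taylor polynomial of exp(X) costs 7 multiplies here
# instead of 15 for term-by-term evaluation (illustrative test matrix).
X = np.diag([0.1, 0.2, 0.3])
coeffs = [1.0 / factorial(k) for k in range(17)]
print(np.allclose(paterson_stockmeyer(coeffs, X), np.diag(np.exp(np.diag(X)))))  # True
```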

In the following section (Section 2), we review the algorithm of Paterson and Stockmeyer [14] and identify two other algorithms for matrix polynomial evaluation that also scale with $\sqrt{N_{pol}}$, the square root of the maximum degree of the polynomial, with a coefficient of approximately 2. These two relatively distinct algorithms turn out to yield very similar matrix multiplication counts but require fewer stored matrices than Paterson and Stockmeyer's algorithm.

Another purpose of the paper is to generalize these algorithms for summing simple matrix polynomials so that they accelerate the summation of matrix series based on Chebyshev matrix polynomials of the form

$$f(X) = \sum_{i=0}^{N_{pol}} a_i\,T_i(X).$$

$T_i(X)$ is the Chebyshev matrix polynomial of degree $i$, which is defined recursively in terms of $T_{i-1}$ and $T_{i-2}$, and $a_0, a_1, \ldots, a_{N_{pol}}$ are the scalar coefficients. If the eigenvalues of X lie in the range [−1,1], then Chebyshev approximations are numerically very stable and less susceptible to round-off errors due to limited machine precision than the equivalent power series [17]. In the context of linear scaling electronic structure methods, Chebyshev matrix polynomials have therefore been proposed as a more stable and efficient alternative to simple matrix polynomials [3], [18], [19]. These generalizations are described in Section 3. They yield algorithms that are broadly similar to the ones described for matrix polynomials, with differences arising from the recursion relations associated with Chebyshev matrix polynomials. Detailed application of this approach to linear scaling electronic structure calculations based on Chebyshev polynomials is described elsewhere [20].
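As a point of reference for the fast algorithms, such a series can be summed term by term with the Chebyshev recurrence at a cost of one matrix multiply per degree. A minimal NumPy sketch of that baseline (function name and test data are ours, not from the paper):

```python
import numpy as np

def chebyshev_series_recurrence(a, X):
    """Term-by-term evaluation of f(X) = sum_i a[i] T_i(X) using
    T_i(X) = 2 X T_{i-1}(X) - T_{i-2}(X); one matrix multiply per term."""
    n = X.shape[0]
    T_prev = np.eye(n)                 # T_0(X) = I
    f = a[0] * T_prev
    if len(a) == 1:
        return f
    T_curr = X.copy()                  # T_1(X) = X
    f = f + a[1] * T_curr
    for i in range(2, len(a)):
        T_prev, T_curr = T_curr, 2.0 * (X @ T_curr) - T_prev
        f = f + a[i] * T_curr
    return f

# Check on a diagonal matrix against NumPy's scalar Chebyshev evaluation.
x = np.array([-0.9, 0.0, 0.5])
a = np.array([0.2, -0.1, 0.4, 0.3, 0.05])   # hypothetical coefficients
F = chebyshev_series_recurrence(a, np.diag(x))
print(np.allclose(np.diag(F), np.polynomial.chebyshev.chebval(x, a)))  # True
```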

In Section 4, we present three types of data to characterize the performance of the three fast summation algorithms. First, the number of matrix multiplies, as well as the number of stored intermediates, is reported for polynomials of various degrees. Second, the computer time for the fast summations is compared with the time for conventional term-by-term summation on some test matrices. Third, the computer time for evaluating matrix polynomials by fast summation is compared with evaluation by explicit diagonalization for the case where the matrix is large and sparse. Finally, we conclude in Section 5.
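As a rough illustration of the first kind of comparison, the operation count of the Paterson-Stockmeyer-style sketch given above can be tabulated directly; this is our count for that sketch, not the data of Table 1.

```python
import numpy as np

# Multiplies for the sketch above: (m - 1) to build the stored powers plus
# (n_blocks - 1) Horner steps, versus Npol - 1 for term-by-term summation.
for npol in (9, 25, 49, 100):
    m = int(np.ceil(np.sqrt(npol + 1)))
    n_blocks = (npol + m) // m                  # ceil((npol + 1) / m)
    fast = (m - 1) + (n_blocks - 1)
    print(f"Npol = {npol:3d}: term-by-term = {npol - 1:3d}, fast = {fast:2d}")
```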

Section snippets

Fast methods for resumming matrix polynomials

Three kinds of algorithms for efficiently resumming matrix polynomials are presented in this section. The central strategy of these algorithms is to decompose the matrix polynomial hierarchically into blocks and to reuse stored intermediate quantities so as to reduce the number of matrix multiplications.

Fast algorithms for resumming Chebyshev matrix polynomials

In many numerical applications, it is preferable to use Chebyshev matrix polynomials rather than simple matrix polynomials because of their superior numerical stability. It is therefore of practical interest to generalize the fast methods for resumming a simple matrix polynomial discussed above, so that they can be employed to resum Chebyshev matrix polynomials. The key difference stems from the fact that higher Chebyshev polynomials are defined in terms of lower ones by the two-term recurrence relation $T_i(X) = 2X\,T_{i-1}(X) - T_{i-2}(X)$.
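To illustrate how this recurrence, together with the exact product identity $2\,T_a(X)\,T_b(X) = T_{a+b}(X) + T_{a-b}(X)$ for $a \ge b$, permits a blockwise resummation, the sketch below groups the series into blocks of m terms and spends one matrix multiply per block. It is only a demonstration of the idea under these assumptions, not the specific algorithm of this paper (whose multiplication count is lower), and the function names are ours.

```python
import numpy as np

def chebyshev_blockwise(a, X, m=None):
    """Illustrative blockwise evaluation of f(X) = sum_i a[i] T_i(X).

    Each index is written i = j*m + k; the identity
    T_{jm+k} = 2 T_{jm} T_k - T_{jm-k} turns every block of m terms into a
    single matrix multiply against T_{jm}, with the -T_{jm-k} corrections
    folded into the coefficients of the lower blocks."""
    n = X.shape[0]
    npol = len(a) - 1
    if m is None:
        m = max(1, int(round(np.sqrt(2 * max(npol, 1)))))
    c = np.array(a, dtype=float)                   # working copy of coefficients

    # Low-order polynomials T_0 .. T_m via the recurrence (m - 1 multiplies).
    T = [np.eye(n), X.copy()]
    for _ in range(2, m + 1):
        T.append(2.0 * (X @ T[-1]) - T[-2])
    Tm = T[m]

    # Block "milestones" T_m, T_{2m}, ..., each costing one multiply.
    J = npol // m                                  # index of the highest block
    Tjm = [np.eye(n), Tm]
    for _ in range(2, J + 1):
        Tjm.append(2.0 * (Tm @ Tjm[-1]) - Tjm[-2])

    f = np.zeros((n, n))
    for j in range(J, 0, -1):                      # highest block first
        kmax = min(m - 1, npol - j * m)
        B = c[j * m] * T[0]
        for k in range(1, kmax + 1):
            B = B + 2.0 * c[j * m + k] * T[k]
            c[j * m - k] -= c[j * m + k]           # fold correction downward
        f = f + Tjm[j] @ B                         # one multiply per block
    for k in range(min(m - 1, npol) + 1):          # remaining low-order block
        f = f + c[k] * T[k]
    return f

# Cross-check against an eigendecomposition reference on a small symmetric matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 10.0   # eigenvalues well inside [-1, 1]
a = rng.standard_normal(13)                              # hypothetical degree-12 coefficients
w, U = np.linalg.eigh(A)
reference = U @ np.diag(np.polynomial.chebyshev.chebval(w, a)) @ U.T
print(np.allclose(chebyshev_blockwise(a, A), reference))  # True
```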

Results and discussion

As the first step in demonstrating the performance of the fast summation methods, we focus on the number of matrix multiplications, 〈M〉, and the number of intermediate matrices that must be saved, SM, for each algorithm, for a variety of polynomial degrees, Npol. These data are presented in Table 1. We observe that the values of 〈M〉 and SM are vastly less than $N_{pol}+1$. Indeed it is apparent that the value of 〈M〉 in all three algorithms is approximately $2\sqrt{N_{pol}+1}$ for polynomials of

Conclusions

Fast summation algorithms for evaluating matrix polynomials and matrix functions have been revisited in this paper. Three algorithms have been presented to reduce the number of matrix multiplications in the evaluation of matrix polynomials and Chebyshev matrix polynomials. They significantly reduce the total number of matrix multiplications and thus lead to speed-ups in CPU time relative to simple term-by-term summation.

These methods can be very useful when the matrices in question are large

Acknowledgements

WZL would like to express her deep gratitude to Prof. Arup K. Chakraborty and Mr. Baron Peters for many stimulating discussions relevant to this work. Financial support from BP (WZL) and the Israel-US Binational Science Foundation (Baer & MHG) as well as support (MHG) from the National Science Foundation (CHE-9981997) are gratefully acknowledged. This work was also supported by funding from Q-Chem Inc via an SBIR subcontract from the National Institutes of Health (R43GM069255-01). MHG is a part

References (23)

  • R.N. Silver et al., J. Comp. Phys. (1996)
  • T. Helgaker et al., Chem. Phys. Lett. (2000)
  • X.P. Li et al., Phys. Rev. B (1993)
  • S. Goedecker et al., Phys. Rev. Lett. (1994)
  • T. Helgaker et al., Molecular Electronic-Structure Theory (2000)
  • A.M. Niklasson, Phys. Rev. B (2002)
  • A.H.R. Palser et al., Phys. Rev. B (1998)
  • H. Tal-ezer et al., J. Chem. Phys. (1984)
  • R. Kosloff et al., Chem. Phys. Lett. (1986)
  • R.N. Silver et al., Int. J. Mod. Phys. C (1994)
  • Y. Motome et al., J. Phys. Soc. Jpn. (1999)