Fast methods for resumming matrix polynomials and Chebyshev matrix polynomials

https://doi.org/10.1016/j.jcp.2003.08.027

Abstract

Fast and effective algorithms are discussed for resumming matrix polynomials and Chebyshev matrix polynomials. These algorithms lead to a significant speed-up in computer time by reducing the number of matrix multiplications required to roughly twice the square root of the degree of the polynomial. A few numerical tests are presented, showing that evaluation of matrix functions via polynomial expansions can be preferable when the matrix is sparse and these fast resummation algorithms are employed.

Introduction

Functions of a matrix can be defined by their series expansions, such as the matrix exponential:

$$e^X = 1 + X + \tfrac{1}{2}X^2 + \tfrac{1}{3!}X^3 + \cdots.$$

It is relatively unusual for a matrix function to be evaluated this way, because it is typically more efficient and precise to employ the eigenvectors and eigenvalues of X, if it is diagonalizable. For example, if X is Hermitian, i.e., $X^\dagger = X$, we can evaluate the matrix function $e^X$ as

$$e^X = U\,e^{x}\,U^\dagger,$$

where $U$ is the matrix of eigenvectors and $x$ the diagonal matrix of eigenvalues of $X$.
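As a concrete illustration (ours, not code from the paper), the two routes for $e^X$ can be sketched in a few lines of NumPy: a truncated Taylor series on one hand, and the eigendecomposition of a Hermitian X on the other.

```python
import numpy as np

def exp_by_series(X, n_terms=30):
    """Approximate exp(X) by truncating the Taylor series 1 + X + X^2/2! + ..."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, n_terms + 1):
        term = term @ X / k            # X^k / k!
        result = result + term
    return result

def exp_by_diagonalization(X):
    """Evaluate exp(X) for Hermitian X as U exp(x) U^dagger."""
    x, U = np.linalg.eigh(X)           # eigenvalues x, eigenvectors U
    return U @ np.diag(np.exp(x)) @ U.conj().T

# Small symmetric test matrix (hypothetical data); both routes agree.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = (A + A.T) / 2.0
print(np.allclose(exp_by_series(X), exp_by_diagonalization(X)))  # True
```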

However, some situations arise where evaluation of the series is actually preferable. The particular case we have in mind is where the dimension of X is very large and, in addition (or as a consequence), X is very sparse. Then sparse matrix multiplication methods can be used to evaluate the products entering the matrix polynomial. The resulting computational effort can in principle increase only linearly with the dimension of the matrix, if its bandwidth is constant. By contrast, explicitly obtaining the eigenvalues and eigenvectors cannot scale linearly; indeed, it scales cubically if standard linear algebra routines are employed. For example, this situation arises in formulating linear scaling algorithms to perform tight-binding or Kohn–Sham density functional theory calculations on very large molecules [1], [2], [3], [4], [5], [6], [7], in the evaluation of the time-evolution operator $e^{-iHt}$ [8], and in the calculation of Boltzmann factors and partition functions of very large molecules in computational physics [9], [10], [11].

In this paper, we report fast algorithms for summing matrix and Chebyshev matrix polynomials. The problem is to minimize the computer time required to perform the sum to a given power, Npol, the degree of the polynomial. To an excellent approximation this is equivalent to minimizing the number of matrix multiplications, since the cost of matrix multiplications completely dominates the cost of matrix additions. The other degree of freedom that is available is to store multiple intermediate quantities to permit reduction of the matrix multiplication effort.

Simple term-by-term evaluation of a matrix polynomial of degree Npol requires Npol−1 matrix multiplications. However, it is possible to do substantially better, as has been shown in a number of papers where fast algorithms have been reported for resumming matrix polynomials with fewer matrix multiplications [12], [13], [14], [15], [16]. Perhaps the most effective algorithm was suggested by Paterson and Stockmeyer [14], which requires only $O(\sqrt{N_{pol}})$ nonscalar multiplications to evaluate polynomials of degree Npol. These authors also presented proofs that at least $\sqrt{N_{pol}}$ nonscalar multiplications are required to resum a polynomial of degree Npol. The algorithm has practical application in the evaluation of matrix polynomials with scalar coefficients and offers a huge improvement over simple term-by-term evaluation, at the expense of requiring more stored matrices. Storage of $O(\sqrt{N_{pol}})$ N×N matrices is required in this algorithm, where N is the dimension of the matrix X. Van Loan [15] later showed how this procedure could be implemented with greatly reduced storage, although significantly more matrix multiplies were required.
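A minimal sketch of the Paterson-Stockmeyer idea, assuming dense NumPy matrices and scalar coefficients (function names and test data are ours; a production implementation would use sparse matrices): the powers X, X², ..., X^m are stored once, and Horner's rule is applied in X^m over blocks of m coefficients.

```python
import numpy as np
from math import factorial

def paterson_stockmeyer(coeffs, X):
    """Evaluate p(X) = sum_k coeffs[k] X^k with roughly 2*sqrt(deg) matrix
    multiplies: store X, X^2, ..., X^m, then run Horner's rule in X^m over
    blocks of m coefficients (a sketch of the scheme of Ref. [14])."""
    n = X.shape[0]
    npol = len(coeffs) - 1
    m = max(1, int(np.ceil(np.sqrt(npol + 1))))
    powers = [np.eye(n), X]                      # stored intermediates I, X, ..., X^m
    for _ in range(m - 1):
        powers.append(powers[-1] @ X)            # m - 1 matrix multiplies
    Xm = powers[m]
    n_blocks = (npol + m) // m                   # ceil((npol + 1) / m)

    def block(j):                                # scalar combination, no multiplies
        return sum(c * powers[k] for k, c in enumerate(coeffs[j * m:(j + 1) * m]))

    result = block(n_blocks - 1)
    for j in range(n_blocks - 2, -1, -1):
        result = result @ Xm + block(j)          # n_blocks - 1 matrix multiplies
    return result

# Usage: the degree-16 Taylor polynomial of exp(X) costs 7 multiplies here
# instead of 15 for term-by-term evaluation (illustrative test matrix).
X = np.diag([0.1, 0.2, 0.3])
coeffs = [1.0 / factorial(k) for k in range(17)]
print(np.allclose(paterson_stockmeyer(coeffs, X), np.diag(np.exp(np.diag(X)))))  # True
```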

In the following section (Section 2), we review the algorithm of Paterson and Stockmeyer [14] and identify two other algorithms for matrix polynomial evaluation that also scale with $\sqrt{N_{pol}}$, the square root of the maximum degree of the polynomial, with a coefficient of approximately 2. These two relatively distinct algorithms turn out to yield very similar matrix multiplication counts but require fewer stored matrices than Paterson and Stockmeyer's algorithm.

Another purpose of the paper is to generalize these algorithms for summing simple matrix polynomials so that they accelerate the summation of matrix series based on Chebyshev matrix polynomials of the form

$$f(X) = \sum_{i=0}^{N_{pol}} a_i\,T_i(X).$$

$T_i(X)$ is the Chebyshev matrix polynomial of degree $i$, which is defined recursively in terms of $T_{i-1}$ and $T_{i-2}$, and $a_0, a_1, \ldots, a_{N_{pol}}$ are the scalar coefficients. If the eigenvalues of X lie in the range [−1,1], then Chebyshev approximations are numerically very stable and less susceptible to round-off errors due to limited machine precision than the equivalent power series [17]. In the context of linear scaling electronic structure methods, Chebyshev matrix polynomials have therefore been proposed as a more stable and efficient alternative to simple matrix polynomials [3], [18], [19]. These generalizations are described in Section 3. They yield algorithms that are broadly similar to the ones described for matrix polynomials, with differences arising from the recursion relations associated with Chebyshev matrix polynomials. Detailed application of this approach to linear scaling electronic structure calculations based on Chebyshev polynomials is described elsewhere [20].
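As a point of reference for the fast algorithms, such a series can be summed term by term with the Chebyshev recurrence at a cost of one matrix multiply per degree. A minimal NumPy sketch of that baseline (function name and test data are ours, not from the paper):

```python
import numpy as np

def chebyshev_series_recurrence(a, X):
    """Term-by-term evaluation of f(X) = sum_i a[i] T_i(X) using
    T_i(X) = 2 X T_{i-1}(X) - T_{i-2}(X); one matrix multiply per term."""
    n = X.shape[0]
    T_prev = np.eye(n)                 # T_0(X) = I
    f = a[0] * T_prev
    if len(a) == 1:
        return f
    T_curr = X.copy()                  # T_1(X) = X
    f = f + a[1] * T_curr
    for i in range(2, len(a)):
        T_prev, T_curr = T_curr, 2.0 * (X @ T_curr) - T_prev
        f = f + a[i] * T_curr
    return f

# Check on a diagonal matrix against NumPy's scalar Chebyshev evaluation.
x = np.array([-0.9, 0.0, 0.5])
a = np.array([0.2, -0.1, 0.4, 0.3, 0.05])   # hypothetical coefficients
F = chebyshev_series_recurrence(a, np.diag(x))
print(np.allclose(np.diag(F), np.polynomial.chebyshev.chebval(x, a)))  # True
```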

In Section 4, we present three types of data to characterize the performance of the three fast summation algorithms. First, the number of matrix multiplies, as well as the number of stored intermediates, is reported for polynomials of various degrees. Second, the computer time for the fast summations is compared with the time for conventional term-by-term summation on some test matrices. Third, the computer time for evaluating matrix polynomials by fast summation is compared with evaluation by explicit diagonalization for the case where the matrix is large and sparse. Finally, we conclude in Section 5.
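As a rough illustration of the first kind of comparison, the operation count of the Paterson-Stockmeyer-style sketch given above can be tabulated directly; this is our count for that sketch, not the data of Table 1.

```python
import numpy as np

# Multiplies for the sketch above: (m - 1) to build the stored powers plus
# (n_blocks - 1) Horner steps, versus Npol - 1 for term-by-term summation.
for npol in (9, 25, 49, 100):
    m = int(np.ceil(np.sqrt(npol + 1)))
    n_blocks = (npol + m) // m                  # ceil((npol + 1) / m)
    fast = (m - 1) + (n_blocks - 1)
    print(f"Npol = {npol:3d}: term-by-term = {npol - 1:3d}, fast = {fast:2d}")
```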

Section snippets

Fast methods for resumming matrix polynomials

Three kinds of algorithms for efficiently resumming matrix polynomials are presented in this section. The central strategy of these algorithms is to decompose the matrix polynomial hierarchically into blocks and to reuse stored intermediate quantities so as to reduce the number of matrix multiplications.

Fast algorithms for resumming Chebyshev matrix polynomials

In many numerical applications, it is preferable to use Chebyshev matrix polynomials rather than simple matrix polynomials because of their superior numerical stability. It is therefore of practical interest to generalize the fast methods for resumming a simple matrix polynomial discussed above, so that they can be employed to resum Chebyshev matrix polynomials. The key difference stems from the fact that higher Chebyshev polynomials are defined in terms of lower ones by the two-term recurrence relation $T_i(X) = 2X\,T_{i-1}(X) - T_{i-2}(X)$.
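To illustrate how this recurrence, together with the exact product identity $2\,T_a(X)\,T_b(X) = T_{a+b}(X) + T_{a-b}(X)$ for $a \ge b$, permits a blockwise resummation, the sketch below groups the series into blocks of m terms and spends one matrix multiply per block. It is only a demonstration of the idea under these assumptions, not the specific algorithm of this paper (whose multiplication count is lower), and the function names are ours.

```python
import numpy as np

def chebyshev_blockwise(a, X, m=None):
    """Illustrative blockwise evaluation of f(X) = sum_i a[i] T_i(X).

    Each index is written i = j*m + k; the identity
    T_{jm+k} = 2 T_{jm} T_k - T_{jm-k} turns every block of m terms into a
    single matrix multiply against T_{jm}, with the -T_{jm-k} corrections
    folded into the coefficients of the lower blocks."""
    n = X.shape[0]
    npol = len(a) - 1
    if m is None:
        m = max(1, int(round(np.sqrt(2 * max(npol, 1)))))
    c = np.array(a, dtype=float)                   # working copy of coefficients

    # Low-order polynomials T_0 .. T_m via the recurrence (m - 1 multiplies).
    T = [np.eye(n), X.copy()]
    for _ in range(2, m + 1):
        T.append(2.0 * (X @ T[-1]) - T[-2])
    Tm = T[m]

    # Block "milestones" T_m, T_{2m}, ..., each costing one multiply.
    J = npol // m                                  # index of the highest block
    Tjm = [np.eye(n), Tm]
    for _ in range(2, J + 1):
        Tjm.append(2.0 * (Tm @ Tjm[-1]) - Tjm[-2])

    f = np.zeros((n, n))
    for j in range(J, 0, -1):                      # highest block first
        kmax = min(m - 1, npol - j * m)
        B = c[j * m] * T[0]
        for k in range(1, kmax + 1):
            B = B + 2.0 * c[j * m + k] * T[k]
            c[j * m - k] -= c[j * m + k]           # fold correction downward
        f = f + Tjm[j] @ B                         # one multiply per block
    for k in range(min(m - 1, npol) + 1):          # remaining low-order block
        f = f + c[k] * T[k]
    return f

# Cross-check against an eigendecomposition reference on a small symmetric matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 10.0   # eigenvalues well inside [-1, 1]
a = rng.standard_normal(13)                              # hypothetical degree-12 coefficients
w, U = np.linalg.eigh(A)
reference = U @ np.diag(np.polynomial.chebyshev.chebval(w, a)) @ U.T
print(np.allclose(chebyshev_blockwise(a, A), reference))  # True
```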

Results and discussion

As the first step in demonstrating the performance of the fast summation methods, we focus on the number of matrix multiplications, 〈M〉, and the number of intermediate matrices that must be saved, SM, for each algorithm, for a variety of polynomial degrees, Npol. These data are presented in Table 1. We observe that the values of 〈M〉 and SM are vastly less than $N_{pol}+1$. Indeed it is apparent that the value of 〈M〉 in all three algorithms is approximately $2\sqrt{N_{pol}+1}$ for polynomials of

Conclusions

Fast summation algorithms for evaluating matrix polynomials and matrix functions have been revisited in this paper. Three algorithms have been presented to reduce the number of matrix multiplications in the evaluation of matrix polynomials and Chebyshev matrix polynomials. They significantly reduce the total number of matrix multiplications and thus lead to speed-ups in CPU time relative to simple term-by-term summation.

These methods can be very useful when the matrices in question are large

Acknowledgements

WZL would like to express her deep gratitude to Prof. Arup K. Chakraborty and Mr. Baron Peters for many stimulating discussions relevant to this work. Financial support from BP (WZL) and the Israel-US Binational Science Foundation (Baer & MHG) as well as support (MHG) from the National Science Foundation (CHE-9981997) are gratefully acknowledged. This work was also supported by funding from Q-Chem Inc via an SBIR subcontract from the National Institutes of Health (R43GM069255-01). MHG is a part

References (23)

  • R.N. Silver et al., J. Comp. Phys. (1996)
  • T. Helgaker et al., Chem. Phys. Lett. (2000)
  • X.P. Li et al., Phys. Rev. B (1993)
  • S. Goedecker et al., Phys. Rev. Lett. (1994)
  • T. Helgaker et al., Molecular Electronic-Structure Theory (2000)
  • A.M. Niklasson, Phys. Rev. B (2002)
  • A.H.R. Palser et al., Phys. Rev. B (1998)
  • H. Tal-ezer et al., J. Chem. Phys. (1984)
  • R. Kosloff et al., Chem. Phys. Lett. (1986)
  • R.N. Silver et al., Int. J. Mod. Phys. C (1994)
  • Y. Motome et al., J. Phys. Soc. Jpn. (1999)