Abstract
The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matrix multiplications and the solution of triangular systems with multiple right-hand sides. They are intended to provide efficient and portable building blocks for linear algebra algorithms on high-performance computers. We describe algorithms for the BLAS3 operations that are asymptotically faster than the conventional ones. These algorithms are based on Strassen's method for fast matrix multiplication, which is now recognized to be a practically useful technique once matrix dimensions exceed about 100. We pay particular attention to the numerical stability of these “fast BLAS3.” Error bounds are given and their significance is explained and illustrated with the aid of numerical experiments. Our conclusion is that the fast BLAS3, although not as strongly stable as conventional implementations, are stable enough to merit careful consideration in many applications.
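To illustrate the kind of fast multiplication the abstract refers to, the following is a minimal sketch of one recursive step of Strassen's method in Python/NumPy. It is not the paper's FORTRAN 77 BLAS3 code; the crossover constant `N_MIN`, the even-dimension restriction, and the fallback to conventional multiplication are assumptions made for brevity, with the ~100 crossover figure taken from the abstract.

```python
# Hedged sketch of Strassen's method for C = A*B (not the paper's Fortran code).
import numpy as np

N_MIN = 128  # assumed crossover; below this, use conventional multiplication


def strassen(A, B):
    """Multiply square matrices A and B; dimension a power of two for simplicity."""
    n = A.shape[0]
    if n <= N_MIN or n % 2 != 0:
        return A @ B  # conventional O(n^3) multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven half-size products replace the usual eight
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

A quick check of the accuracy question the abstract raises, under the same assumptions: compare the recursive product against the conventional one and inspect the relative difference, e.g. `np.linalg.norm(strassen(A, B) - A @ B) / np.linalg.norm(A @ B)` for a random `A`, `B` of dimension 512. The error bound is weaker (normwise rather than componentwise) than for conventional multiplication, which is the sense in which the fast BLAS3 are "not as strongly stable."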