
Exploiting fast matrix multiplication within the level 3 BLAS

Published: 01 December 1990

Abstract

The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matrix multiplications and the solution of triangular systems with multiple right-hand sides. They are intended to provide efficient and portable building blocks for linear algebra algorithms on high-performance computers. We describe algorithms for the BLAS3 operations that are asymptotically faster than the conventional ones. These algorithms are based on Strassen's method for fast matrix multiplication, which is now recognized to be a practically useful technique once matrix dimensions exceed about 100. We pay particular attention to the numerical stability of these “fast BLAS3.” Error bounds are given and their significance is explained and illustrated with the aid of numerical experiments. Our conclusion is that the fast BLAS3, although not as strongly stable as conventional implementations, are stable enough to merit careful consideration in many applications.
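
For readers who want to see the shape of the recursion the abstract alludes to, the sketch below shows one common formulation of Strassen's method in Python/NumPy. It is an illustration only: the BLAS3 themselves are FORTRAN 77 routines, and the crossover point n0 below is a hypothetical tuning parameter, not a value prescribed by the paper.

# A minimal sketch of Strassen's method, the algorithm the fast BLAS3 build on.
# Illustrative Python/NumPy only; the cutoff n0 is an assumed tuning parameter.
import numpy as np

def strassen(A, B, n0=64):
    """Multiply square matrices A and B, recursing while the dimension
    is even and larger than the crossover point n0."""
    n = A.shape[0]
    if n <= n0 or n % 2 != 0:
        return A @ B  # fall back to conventional multiplication

    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]

    # Seven half-size products instead of the usual eight.
    M1 = strassen(A11 + A22, B11 + B22, n0)
    M2 = strassen(A21 + A22, B11, n0)
    M3 = strassen(A11, B12 - B22, n0)
    M4 = strassen(A22, B21 - B11, n0)
    M5 = strassen(A11 + A12, B22, n0)
    M6 = strassen(A21 - A11, B11 + B12, n0)
    M7 = strassen(A12 - A22, B21 + B22, n0)

    C = np.empty_like(A)
    C[:k, :k] = M1 + M4 - M5 + M7
    C[:k, k:] = M3 + M5
    C[k:, :k] = M2 + M4
    C[k:, k:] = M1 - M2 + M3 + M6
    return C

As a quick check, strassen(np.random.rand(256, 256), np.random.rand(256, 256)) agrees with the conventional product up to roundoff of the kind the paper's error bounds describe.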

References

  1. AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
  2. BAILEY, D. H. Extra high speed matrix multiplication on the Cray-2. SIAM J. Sci. Stat. Comput. 9 (1988), 603-607.
  3. BINI, D., AND LOTTI, G. Stability of fast algorithms for matrix multiplication. Numer. Math. 36 (1980), 63-72.
  4. BRASSARD, G., AND BRATLEY, P. Algorithmics: Theory and Practice. Prentice-Hall, Englewood Cliffs, N.J., 1988.
  5. BRENT, R. P. Algorithms for matrix multiplication. Tech. Rep. CS 157, Computer Science Dept., Stanford Univ., Palo Alto, Calif., 1970.
  6. BRENT, R. P. Error analysis of algorithms for matrix multiplication and triangular decomposition using Winograd's identity. Numer. Math. 16 (1970), 145-156.
  7. COPPERSMITH, D., AND WINOGRAD, S. Matrix multiplication via arithmetic progressions. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, 1987, 1-6.
  8. DAYDÉ, M. J., AND DUFF, I. S. Use of Level 3 BLAS in LU factorization on the Cray-2, the ETA-10P, and the IBM 3090-200/VF. Tech. Rep. CSS 229, Computer Science and Systems Div., Harwell Lab., 1988.
  9. DEMMEL, J. W., DONGARRA, J. J., DU CROZ, J. J., GREENBAUM, A., HAMMARLING, S. J., AND SORENSEN, D. C. Prospectus for the development of a linear algebra library for high-performance computers. Tech. Memor. 97, Mathematics and Computer Science Div., Argonne National Lab., Argonne, Ill., 1987.
  10. DONGARRA, J. J., DU CROZ, J. J., DUFF, I. S., AND HAMMARLING, S. J. A set of Level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16 (1990), 1-17.
  11. DONGARRA, J. J., DU CROZ, J. J., DUFF, I. S., AND HAMMARLING, S. J. Algorithm 679: A set of Level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16 (1990), 18-28.
  12. GALLIVAN, K., JALBY, W., AND MEIER, U. The use of BLAS3 in linear algebra on a parallel processor with a hierarchical memory. SIAM J. Sci. Stat. Comput. 8 (1987), 1079-1084.
  13. GALLIVAN, K., JALBY, W., MEIER, U., AND SAMEH, A. H. Impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomput. Appl. 2 (1988), 12-48.
  14. GOLUB, G. H., AND VAN LOAN, C. F. Matrix Computations, 2nd ed. Johns Hopkins University Press, Baltimore, Md., 1989.
  15. HEYMAN, D. P. Further comparisons of direct methods for computing stationary distributions of Markov chains. SIAM J. Alg. Discrete Meth. 8 (1987), 226-232.
  16. HIGHAM, N. J. The accuracy of solutions to triangular systems. SIAM J. Numer. Anal. 26 (1989), 1252-1265.
  17. HIGHAM, N. J., AND SCHREIBER, R. S. Fast polar decomposition of an arbitrary matrix. SIAM J. Sci. Stat. Comput. 11 (1990), 648-655.
  18. IBM. Engineering and Scientific Subroutine Library, Guide and Reference, Release 3, 4th ed., Program 5668-863, 1988.
  19. MILLER, W. Computational complexity and numerical stability. SIAM J. Comput. 4 (1975), 97-107.
  20. MOLER, C. B., LITTLE, J. N., AND BANGERT, S. Pro-Matlab User's Guide. The MathWorks, Inc., South Natick, Mass., 1987.
  21. PRESS, W. H., FLANNERY, B. P., TEUKOLSKY, S. A., AND VETTERLING, W. T. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge, England, 1986.
  22. SCHREIBER, R. S. Block algorithms for parallel machines. In Numerical Algorithms for Modern Parallel Computer Architectures, M. H. Schultz, Ed., IMA Volumes in Mathematics and Its Applications 13, Springer-Verlag, Berlin, 1988, 197-207.
  23. SEDGEWICK, R. Algorithms, 2nd ed. Addison-Wesley, Reading, Mass., 1988.
  24. STRASSEN, V. Gaussian elimination is not optimal. Numer. Math. 13 (1969), 354-356.

Reviews

Peter Bruce Worland

The BLAS3 (Level 3 Basic Linear Algebra Subprograms) are a set of specifications, based on matrix-matrix operations, for efficient and portable routines to be used on high-performance computers. This interesting and clearly written paper explores the possibility of enhancing the performance of the BLAS3 routines using variants of Strassen's algorithm for matrix multiplication.

Much of the paper surveys previous work on Strassen's algorithm. The algorithm, which was the first published scheme that broke the O(n^3) operations barrier for multiplying n × n matrices, is still considered by many to be of theoretical interest only. The author shows, however, as others have shown, that it is of practical importance for n greater than about 100. Higham gives a lucid description of the basic algorithms in terms of a product of rectangular matrices. He then presents Strassen-based recursive algorithms for the other BLAS3 basic operations: rank-r and rank-2r updates of a real symmetric matrix; multiplication of a matrix by a triangular matrix; and solution of a triangular system of equations with multiple right-hand sides.

Because of their recursive nature, there is concern about the stability of these algorithms. The author presents a detailed error analysis, obtaining essentially the same result as that of Brent [1]. Numerical experiments were conducted to verify the error bounds. Unlike the conventional BLAS3 routines, the so-called “fast” BLAS3 routines do not satisfy strong component-wise bounds for the residuals. They do satisfy similar norm-wise bounds, but with somewhat larger constant terms. No timing runs were made to confirm the analytical results. The tradeoff between accuracy and efficiency remains unclear, but the author's results are sufficiently encouraging to warrant further study.

The paper should interest anyone working on algorithms that involve, or could involve, matrix products.
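
To make the stability distinction in the review concrete, the display below sketches the general shape of the two kinds of bounds for computing C = AB in floating point with unit roundoff u. The constants c_n and f(n) are schematic placeholders for the dimension-dependent factors, not the paper's exact values.

\[
|\widehat{C} - AB| \le c_n\, u\, |A|\,|B| + O(u^2)
\qquad \text{(conventional multiplication: component-wise bound)}
\]
\[
\|\widehat{C} - AB\| \le f(n)\, u\, \|A\|\,\|B\| + O(u^2)
\qquad \text{(Strassen-based multiplication: norm-wise bound only)}
\]

The second bound is of the same norm-wise form as the first but with a larger growth factor f(n), which is the "somewhat larger constant terms" the reviewer refers to.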

        • Published in

          ACM Transactions on Mathematical Software, Volume 16, Issue 4
          Dec. 1990
          104 pages
          ISSN: 0098-3500
          EISSN: 1557-7295
          DOI: 10.1145/98267

          Copyright © 1990 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States
