
Exploiting fast matrix multiplication within the level 3 BLAS

Published: 01 December 1990

Abstract

The Level 3 BLAS (BLAS3) are a set of specifications of FORTRAN 77 subprograms for carrying out matrix multiplications and the solution of triangular systems with multiple right-hand sides. They are intended to provide efficient and portable building blocks for linear algebra algorithms on high-performance computers. We describe algorithms for the BLAS3 operations that are asymptotically faster than the conventional ones. These algorithms are based on Strassen's method for fast matrix multiplication, which is now recognized to be a practically useful technique once matrix dimensions exceed about 100. We pay particular attention to the numerical stability of these “fast BLAS3.” Error bounds are given and their significance is explained and illustrated with the aid of numerical experiments. Our conclusion is that the fast BLAS3, although not as strongly stable as conventional implementations, are stable enough to merit careful consideration in many applications.
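
For readers who want to see the shape of the recursion the abstract alludes to, the sketch below shows one common formulation of Strassen's method in Python/NumPy. It is an illustration only: the BLAS3 themselves are FORTRAN 77 routines, and the crossover point n0 below is a hypothetical tuning parameter, not a value prescribed by the paper.

# A minimal sketch of Strassen's method, the algorithm the fast BLAS3 build on.
# Illustrative Python/NumPy only; the cutoff n0 is an assumed tuning parameter.
import numpy as np

def strassen(A, B, n0=64):
    """Multiply square matrices A and B, recursing while the dimension
    is even and larger than the crossover point n0."""
    n = A.shape[0]
    if n <= n0 or n % 2 != 0:
        return A @ B  # fall back to conventional multiplication

    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]

    # Seven half-size products instead of the usual eight.
    M1 = strassen(A11 + A22, B11 + B22, n0)
    M2 = strassen(A21 + A22, B11, n0)
    M3 = strassen(A11, B12 - B22, n0)
    M4 = strassen(A22, B21 - B11, n0)
    M5 = strassen(A11 + A12, B22, n0)
    M6 = strassen(A21 - A11, B11 + B12, n0)
    M7 = strassen(A12 - A22, B21 + B22, n0)

    C = np.empty_like(A)
    C[:k, :k] = M1 + M4 - M5 + M7
    C[:k, k:] = M3 + M5
    C[k:, :k] = M2 + M4
    C[k:, k:] = M1 - M2 + M3 + M6
    return C

As a quick check, strassen(np.random.rand(256, 256), np.random.rand(256, 256)) agrees with the conventional product up to roundoff of the kind the paper's error bounds describe.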

References

  1. AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
  2. BAILEY, D. H. Extra high speed matrix multiplication on the Cray-2. SIAM J. Sci. Stat. Comput. 9 (1988), 603-607.
  3. BINI, D., AND LOTTI, G. Stability of fast algorithms for matrix multiplication. Numer. Math. 36 (1980), 63-72.
  4. BRASSARD, G., AND BRATLEY, P. Algorithmics: Theory and Practice. Prentice-Hall, Englewood Cliffs, N.J., 1988.
  5. BRENT, R. P. Algorithms for matrix multiplication. Tech. Rep. CS 157, Computer Science Dept., Stanford Univ., Palo Alto, Calif., 1970.
  6. BRENT, R. P. Error analysis of algorithms for matrix multiplication and triangular decomposition using Winograd's identity. Numer. Math. 16 (1970), 145-156.
  7. COPPERSMITH, D., AND WINOGRAD, S. Matrix multiplication via arithmetic progressions. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, 1987, 1-6.
  8. DAYDÉ, M. J., AND DUFF, I. S. Use of Level 3 BLAS in LU factorization on the Cray-2, the ETA-10P, and the IBM 3090-200/VF. Tech. Rep. CSS 229, Computer Science and Systems Div., Harwell Lab., 1988.
  9. DEMMEL, J. W., DONGARRA, J. J., DU CROZ, J. J., GREENBAUM, A., HAMMARLING, S. J., AND SORENSEN, D. C. Prospectus for the development of a linear algebra library for high-performance computers. Tech. Memor. 97, Mathematics and Computer Science Div., Argonne National Lab., Argonne, Ill., 1987.
  10. DONGARRA, J. J., DU CROZ, J. J., DUFF, I. S., AND HAMMARLING, S. J. A set of Level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16 (1990), 1-17.
  11. DONGARRA, J. J., DU CROZ, J. J., DUFF, I. S., AND HAMMARLING, S. J. Algorithm 679: A set of Level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16 (1990), 18-28.
  12. GALLIVAN, K., JALBY, W., AND MEIER, U. The use of BLAS3 in linear algebra on a parallel processor with a hierarchical memory. SIAM J. Sci. Stat. Comput. 8 (1987), 1079-1084.
  13. GALLIVAN, K., JALBY, W., MEIER, U., AND SAMEH, A. H. Impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomput. Appl. 2 (1988), 12-48.
  14. GOLUB, G. H., AND VAN LOAN, C. F. Matrix Computations, 2nd ed. Johns Hopkins University Press, Baltimore, Md., 1989.
  15. HEYMAN, D. P. Further comparisons of direct methods for computing stationary distributions of Markov chains. SIAM J. Alg. Discrete Meth. 8 (1987), 226-232.
  16. HIGHAM, N. J. The accuracy of solutions to triangular systems. SIAM J. Numer. Anal. 26 (1989), 1252-1265.
  17. HIGHAM, N. J., AND SCHREIBER, R. S. Fast polar decomposition of an arbitrary matrix. SIAM J. Sci. Stat. Comput. 11 (1990), 648-655.
  18. IBM. Engineering and Scientific Subroutine Library, Guide and Reference, Release 3, 4th ed., Program 5668-863, 1988.
  19. MILLER, W. Computational complexity and numerical stability. SIAM J. Comput. 4 (1975), 97-107.
  20. MOLER, C. B., LITTLE, J. N., AND BANGERT, S. Pro-Matlab User's Guide. The MathWorks, Inc., South Natick, Mass., 1987.
  21. PRESS, W. H., FLANNERY, B. P., TEUKOLSKY, S. A., AND VETTERLING, W. T. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge, England, 1986.
  22. SCHREIBER, R. S. Block algorithms for parallel machines. In Numerical Algorithms for Modern Parallel Computer Architectures, M. H. Schultz, Ed., IMA Volumes in Mathematics and Its Applications 13, Springer-Verlag, Berlin, 1988, 197-207.
  23. SEDGEWICK, R. Algorithms, 2nd ed. Addison-Wesley, Reading, Mass., 1988.
  24. STRASSEN, V. Gaussian elimination is not optimal. Numer. Math. 13 (1969), 354-356.

Reviews

Peter Bruce Worland

The BLAS3 (Level 3 Basic Linear Algebra Subprograms) are a set of specifications, based on matrix-matrix operations, for efficient and portable routines to be used on high-performance computers. This interesting and clearly written paper explores the possibility of enhancing the performance of the BLAS3 routines using variants of Strassen's algorithm for matrix multiplication.

Much of the paper surveys previous work on Strassen's algorithm. The algorithm, which was the first published scheme that broke the O(n^3) operations barrier for multiplying n × n matrices, is still considered by many to be of theoretical interest only. The author shows, however, as others have shown, that it is of practical importance for n greater than about 100. Higham gives a lucid description of the basic algorithms in terms of a product of rectangular matrices. He then presents Strassen-based recursive algorithms for the other BLAS3 basic operations: rank-r and rank-2r updates of a real symmetric matrix; multiplication of a matrix by a triangular matrix; and solution of a triangular system of equations with multiple right-hand sides.

Because of their recursive nature, there is concern about the stability of these algorithms. The author presents a detailed error analysis, obtaining essentially the same result as that of Brent [1]. Numerical experiments were conducted to verify the error bounds. Unlike the conventional BLAS3 routines, the so-called “fast” BLAS3 routines do not satisfy strong component-wise bounds for the residuals. They do satisfy similar norm-wise bounds, but with somewhat larger constant terms. No timing runs were made to confirm the analytical results. The tradeoff between accuracy and efficiency remains unclear, but the author's results are sufficiently encouraging to warrant further study.

The paper should interest anyone working on algorithms that involve, or could involve, matrix products.
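
To make the stability distinction in the review concrete, the display below sketches the general shape of the two kinds of bounds for computing C = AB in floating point with unit roundoff u. The constants c_n and f(n) are schematic placeholders for the dimension-dependent factors, not the paper's exact values.

\[
|\widehat{C} - AB| \le c_n\, u\, |A|\,|B| + O(u^2)
\qquad \text{(conventional multiplication: component-wise bound)}
\]
\[
\|\widehat{C} - AB\| \le f(n)\, u\, \|A\|\,\|B\| + O(u^2)
\qquad \text{(Strassen-based multiplication: norm-wise bound only)}
\]

The second bound is of the same norm-wise form as the first but with a larger growth factor f(n), which is the "somewhat larger constant terms" the reviewer refers to.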

        • Published in

          ACM Transactions on Mathematical Software, Volume 16, Issue 4
          Dec. 1990
          104 pages
          ISSN: 0098-3500
          EISSN: 1557-7295
          DOI: 10.1145/98267

          Copyright © 1990 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States
