Performance of parallel Cholesky factorization algorithms using BLAS

Luecke, Glenn R.; Yun, Jae Heon; Smith, Philip W.

doi:10.1007/BF00155804

Published: December 1992

Volume 6, pages 315–329 (1992)

The Journal of Supercomputing

Abstract

This paper considers four parallel Cholesky factorization algorithms, including SPOTRF from the February 1992 release of LAPACK, each of which calls parallel Level 2 or Level 3 BLAS, or both. A fifth parallel Cholesky algorithm that calls serial Level 3 BLAS is also described. The efficiency of these five algorithms on the CRAY-2, CRAY Y-MP/832, Hitachi Data Systems EX 80, and IBM 3090-600J is evaluated and compared with a vendor-optimized parallel Cholesky factorization algorithm. The fifth algorithm, which calls serial Level 3 BLAS, provided the best performance of all algorithms that called BLAS routines. In fact, this algorithm outperformed the Cray-optimized libsci routine (SPOTRF) by 13–44%, depending on the problem size and the number of processors used.
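
For readers unfamiliar with how a Cholesky factorization is organized around block operations, the following is a minimal serial sketch in pure Python of a textbook right-looking blocked factorization. It is not one of the five algorithms evaluated in the paper, whose details the abstract does not give. The function names, the block size parameter nb, and the test matrix are illustrative assumptions. The panel solve and the trailing update are the kinds of operations that Level 3 BLAS routines such as STRSM and SSYRK provide in a Fortran implementation.

```python
import math


def cholesky_unblocked(a, lo, hi):
    """Factor the diagonal block a[lo:hi][lo:hi] in place (lower triangle only)."""
    for j in range(lo, hi):
        s = a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, hi):
            t = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = t / a[j][j]


def blocked_cholesky(a, nb=2):
    """Overwrite a with its lower Cholesky factor L, where A = L * L^T.

    Illustrative sketch only: a is a list of row lists holding a symmetric
    positive definite matrix, and nb is a hypothetical block size.
    """
    n = len(a)
    for lo in range(0, n, nb):
        hi = min(lo + nb, n)
        # Step 1: factor the current diagonal block with the unblocked code.
        cholesky_unblocked(a, lo, hi)
        # Step 2: triangular solve for the panel below the diagonal block,
        # L21 = A21 * L11^(-T) (a TRSM-like operation).
        for i in range(hi, n):
            for j in range(lo, hi):
                t = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
                a[i][j] = t / a[j][j]
        # Step 3: symmetric rank-nb update of the trailing submatrix,
        # A22 = A22 - L21 * L21^T (a SYRK-like operation, lower triangle only).
        for i in range(hi, n):
            for j in range(hi, i + 1):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, hi))
    # Clear the strictly upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a


if __name__ == "__main__":
    spd = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    for row in blocked_cholesky([r[:] for r in spd], nb=2):
        print(row)
```

In a blocked formulation of this kind, most of the arithmetic falls in Step 3, which is why the speed of the underlying matrix-matrix kernels matters so much for overall performance.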

Author information

Authors and Affiliations

Department of Mathematics and Computation Center, Iowa State University, 50011, Ames, Iowa, USA
Glenn R. Luecke
Department of Mathematics, College of Natural Science, Chungbuk National University, 360-763, Cheongju City, Chungbuk, South Korea
Jae Heon Yun
IMSL, Inc., 14141 Southwest Freeway, Suite 3000, 77478-3498, Sugar Land, TX, USA
Philip W. Smith

Additional information

This work was supported by grants from IMSL, Inc., and Hitachi Data Systems. The first version of this paper was presented as a poster session at Supercomputing '90, New York City, November 1990.

About this article

Cite this article

Luecke, G.R., Yun, J.H. & Smith, P.W. Performance of parallel Cholesky factorization algorithms using BLAS. J Supercomput 6, 315–329 (1992). https://doi.org/10.1007/BF00155804

Issue Date: December 1992
DOI: https://doi.org/10.1007/BF00155804
