Performance of parallel Cholesky factorization algorithms using BLAS

The Journal of Supercomputing

Abstract

This paper considers four parallel Cholesky factorization algorithms, including SPOTRF from the February 1992 release of LAPACK, each of which calls parallel Level 2 or Level 3 BLAS, or both. A fifth parallel Cholesky algorithm that calls serial Level 3 BLAS is also described. The efficiency of these five algorithms on the CRAY-2, CRAY Y-MP/832, Hitachi Data Systems EX 80, and IBM 3090-600J is evaluated and compared with a vendor-optimized parallel Cholesky factorization algorithm. The fifth algorithm, which calls serial Level 3 BLAS, provided the best performance of all algorithms that called BLAS routines. In fact, this algorithm outperformed the Cray-optimized libsci routine (SPOTRF) by 13–44%, depending on the problem size and the number of processors used.
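For readers unfamiliar with how a Cholesky factorization is organized around BLAS-style kernels, the following is a minimal serial sketch of a right-looking blocked factorization A = L Lᵀ. It is not one of the five algorithms evaluated in the paper, which are not reproduced here. The function names `cholesky_unblocked` and `blocked_cholesky`, the default block size, and the choice of the lower-triangular form are assumptions made for illustration. The comments mark where a real implementation would typically call a Level 3 BLAS routine such as STRSM or SSYRK instead of the explicit loops.

```python
import math


def cholesky_unblocked(a, lo, hi):
    """Factor the diagonal block a[lo:hi][lo:hi] in place (lower triangle).

    Assumes the block has already received all updates from earlier panels,
    so only columns lo..hi-1 are referenced.
    """
    for j in range(lo, hi):
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, hi):
            s = sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def blocked_cholesky(a, block=2):
    """Return the lower-triangular factor L with A = L L^T (illustrative sketch).

    Only the lower triangle of the symmetric input is read.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a copy
    for lo in range(0, n, block):
        hi = min(lo + block, n)

        # 1. Factor the diagonal block (unblocked kernel).
        cholesky_unblocked(a, lo, hi)

        # 2. Panel solve: L21 = A21 * inv(L11)^T.
        #    A Level 3 BLAS implementation would use a triangular solve (STRSM).
        for i in range(hi, n):
            for j in range(lo, hi):
                s = sum(a[i][k] * a[j][k] for k in range(lo, j))
                a[i][j] = (a[i][j] - s) / a[j][j]

        # 3. Trailing update: A22 = A22 - L21 * L21^T (lower triangle only).
        #    A Level 3 BLAS implementation would use a symmetric rank-k update (SSYRK).
        for i in range(hi, n):
            for j in range(hi, i + 1):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, hi))

    # Zero the strictly upper triangle so the result is exactly L.
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

In a sketch like this, most of the arithmetic for large matrices falls in step 3, which is why the performance of the matrix-matrix kernels tends to dominate. Whether parallelism is placed inside those kernels or around calls to serial kernels is the kind of design choice the abstract contrasts.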




Additional information

This work was supported by grants from IMSL, Inc., and Hitachi Data Systems. The first version of this paper was presented as a poster session at Supercomputing '90, New York City, November 1990.


Cite this article

Luecke, G.R., Yun, J.H. & Smith, P.W. Performance of parallel Cholesky factorization algorithms using BLAS. J Supercomput 6, 315–329 (1992). https://doi.org/10.1007/BF00155804


  • DOI: https://doi.org/10.1007/BF00155804
