Abstract
This paper considers four parallel Cholesky factorization algorithms, including SPOTRF from the February 1992 release of LAPACK, each of which calls parallel Level 2 or Level 3 BLAS, or both. A fifth parallel Cholesky algorithm that calls serial Level 3 BLAS is also described. The efficiency of these five algorithms on the CRAY-2, CRAY Y-MP/832, Hitachi Data Systems EX 80, and IBM 3090-600J is evaluated and compared with a vendor-optimized parallel Cholesky factorization algorithm. Of all the algorithms that call BLAS routines, the fifth algorithm, which calls serial Level 3 BLAS, provided the best performance. In fact, this algorithm outperformed the Cray-optimized libsci routine (SPOTRF) by 13–44%, depending on the problem size and the number of processors used.
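For readers unfamiliar with how a Cholesky factorization decomposes into block operations of the kind that Level 3 BLAS provide, the following is a minimal serial sketch of a generic right-looking blocked variant. It is not one of the five algorithms evaluated in the paper, and it calls no BLAS; the function name `blocked_cholesky` and the block-size parameter `nb` are hypothetical, and plain Python loops stand in for the roles that an unblocked diagonal factorization, a triangular solve, and a symmetric rank-k update would play.

```python
import math


def blocked_cholesky(a, nb=2):
    """Return lower-triangular L with L * L^T == a, for a symmetric
    positive definite matrix a given as a list of lists.

    Illustrative right-looking blocked variant (an assumption of this
    sketch, not the paper's method). nb is the block size.
    """
    n = len(a)
    L = [list(map(float, row)) for row in a]  # work on a copy
    for k in range(0, n, nb):
        e = min(k + nb, n)

        # Step 1: factor the diagonal block in place
        # (the role an unblocked routine would play).
        for j in range(k, e):
            d = L[j][j] - sum(L[j][p] ** 2 for p in range(k, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            L[j][j] = math.sqrt(d)
            for i in range(j + 1, e):
                s = sum(L[i][p] * L[j][p] for p in range(k, j))
                L[i][j] = (L[i][j] - s) / L[j][j]

        # Step 2: panel solve, L21 = A21 * inv(L11)^T
        # (the role of a triangular solve with multiple right-hand sides).
        for i in range(e, n):
            for j in range(k, e):
                s = sum(L[i][p] * L[j][p] for p in range(k, j))
                L[i][j] = (L[i][j] - s) / L[j][j]

        # Step 3: trailing update, A22 = A22 - L21 * L21^T, lower part only
        # (the role of a symmetric rank-k update).
        for i in range(e, n):
            for j in range(e, i + 1):
                L[i][j] -= sum(L[i][p] * L[j][p] for p in range(k, e))

    # Clear the strict upper triangle, which was never referenced.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L
```

For example, `blocked_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]])` returns `[[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]`. In a blocked formulation like this, most of the arithmetic falls in steps 2 and 3, which is why the efficiency of the underlying matrix-matrix kernels matters so much for overall performance.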
Additional information
This work was supported by grants from IMSL, Inc., and Hitachi Data Systems. The first version of this paper was presented as a poster session at Supercomputing '90, New York City, November 1990.
Cite this article
Luecke, G.R., Yun, J.H. & Smith, P.W. Performance of parallel Cholesky factorization algorithms using BLAS. J Supercomput 6, 315–329 (1992). https://doi.org/10.1007/BF00155804
DOI: https://doi.org/10.1007/BF00155804