Abstract
This work is an overview of our preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology to perform sparse linear algebra computations. Techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed. Our experiments with an NVIDIA TESLA M2070 show that for unstructured matrices SpMV kernels can be up to 8 times faster on the GPU than the Intel MKL on the host Intel Xeon X5675 processor. The overall performance of the GPU-accelerated incomplete Cholesky (IC) factorization preconditioned CG method can outperform its CPU counterpart by a smaller factor, up to 3, and the GPU-accelerated incomplete LU (ILU) factorization preconditioned GMRES method can achieve a speed-up nearing 4. However, with preconditioning techniques better suited to GPUs, this performance can be further improved.
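As background for the SpMV kernels discussed above, the sparse matrix-vector product in the compressed sparse row (CSR) format can be sketched as follows. This is an illustrative plain-Python sketch only, not the paper's CUDA implementation; on a GPU the outer row loop is what gets parallelized across threads or warps.

```python
# Minimal CSR (compressed sparse row) matrix-vector product, y = A * x.
# row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i in vals/col_idx.
def csr_spmv(row_ptr, col_idx, vals, x):
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):                      # on a GPU: one row per thread/warp
        s = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[j] * x[col_idx[j]]
        y[i] = s
    return y

# Example: A = [[4, 1, 0],
#               [1, 3, 0],
#               [0, 0, 2]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 0, 1, 2]
vals    = [4.0, 1.0, 1.0, 3.0, 2.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, vals, x)     # [6.0, 7.0, 6.0]
```

For unstructured matrices, the irregular row lengths in this loop are precisely what makes efficient GPU parallelization nontrivial, motivating the kernel-optimization techniques the paper studies.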
Acknowledgements
This work is supported by DOE under grant DE-FG 08ER 25841 and by the Minnesota Supercomputer Institute.
Cite this article
Li, R., Saad, Y. GPU-accelerated preconditioned iterative linear solvers. J Supercomput 63, 443–466 (2013). https://doi.org/10.1007/s11227-012-0825-3