A GPU method for the analysis stage of the SPTRSV kernel

Freire, Manuel; Ferrand, Juan; Seveso, Franco; Dufrechou, Ernesto; Ezzatti, Pablo

doi:10.1007/s11227-023-05238-8

Published: 13 April 2023

The Journal of Supercomputing, volume 79, pages 15051–15078 (2023)

Abstract

The solution of sparse triangular linear systems of equations (SPTRSV) is often the main computational bottleneck of many numerical methods in science and engineering. On GPUs, this operation is mainly solved using two approaches. Level-set strategies perform a costly pre-processing step (called the analysis stage) that examines the dependencies between rows of the matrix and derives a static schedule for the subsequent solution stage. Synchronization-free methods, on the other hand, discover this schedule dynamically and avoid the analysis stage, although some hybrid synchronization-free methods can leverage the level-set analysis to improve performance. In this work, we present an efficient GPU routine to compute the analysis stage and then apply some of these ideas to accelerate a synchronization-free solver that does not require analysis. The experimental comparison with the well-known cuSPARSE library shows speedups of up to 40× in the solution of triangular linear systems, and of up to 262× in the level-set analysis phase.
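
To make the level-set idea concrete, the following is a minimal sequential sketch in Python. It is not the authors' GPU routine and it is not the cuSPARSE API. It assumes a lower-triangular matrix stored in CSR format with a nonzero diagonal entry in every row; the names `level_sets` and `solve_by_levels` are hypothetical and chosen only for illustration. The analysis assigns each row a level equal to one plus the highest level among the rows it depends on, so rows that share a level have no dependencies among themselves and could be processed in parallel during the solution stage.

```python
def level_sets(row_ptr, col_idx):
    """Level-set analysis of a lower-triangular CSR matrix (sequential sketch).

    Returns (level, order, level_ptr): the level of each row, the rows sorted
    by level, and the offsets of each level inside `order`.
    """
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        # Row i depends on every row j < i with a nonzero in column j.
        deps = [col_idx[k] for k in range(row_ptr[i], row_ptr[i + 1])
                if col_idx[k] < i]
        level[i] = 1 + max((level[j] for j in deps), default=-1)

    # Counting sort of the rows by level gives the static schedule.
    n_levels = max(level, default=-1) + 1
    level_ptr = [0] * (n_levels + 1)
    for lv in level:
        level_ptr[lv + 1] += 1
    for lv in range(n_levels):
        level_ptr[lv + 1] += level_ptr[lv]
    order = [0] * n
    next_slot = level_ptr[:-1]  # slicing already yields a copy
    for i in range(n):
        order[next_slot[level[i]]] = i
        next_slot[level[i]] += 1
    return level, order, level_ptr


def solve_by_levels(row_ptr, col_idx, vals, b, order, level_ptr):
    """Solve L x = b level by level, following the schedule from level_sets."""
    x = [0.0] * len(b)
    for lv in range(len(level_ptr) - 1):
        # Rows inside one level are independent; a GPU solver could assign
        # them to different threads or warps. Here they run one after another.
        for i in order[level_ptr[lv]:level_ptr[lv + 1]]:
            acc, diag = b[i], None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag  # assumes the diagonal entry is stored
    return x


# Example: L = [[2,0,0,0],[0,1,0,0],[1,0,4,0],[0,3,1,5]]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 5.0]
b = [2.0, 2.0, 13.0, 29.0]

level, order, level_ptr = level_sets(row_ptr, col_idx)
print(level)  # [0, 0, 1, 2]
print(solve_by_levels(row_ptr, col_idx, vals, b, order, level_ptr))
# [1.0, 2.0, 3.0, 4.0]
```

In this example rows 0 and 1 form the first level, row 2 the second and row 3 the third, so only the first level offers any parallelism. The sketch shows why the analysis is costly: every nonzero of the matrix must be visited before the solution stage can start.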

Availability of data and materials

The source code is available on GitHub (https://github.com/HCL-Fing/SPTRSV). The sparse matrices used for the experimental evaluation are available in the SuiteSparse Matrix Collection.

Acknowledgements

This work is partially funded by the UDELAR CSIC-INI project CompactDisp: Formatos dispersos eficientes para arquitecturas de hardware modernas. The authors also thank PEDECIBA Informática and the University of the Republic, Uruguay.

Funding

Manuel Freire received funding from the UDELAR CSIC-INI project CompactDisp: Formatos dispersos eficientes para arquitecturas de hardware modernas. The authors also thank PEDECIBA Informática and the University of the Republic, Uruguay.

Author information

Authors and Affiliations

Facultad de Ingeniería, Instituto de Computación, INCO, Universidad de la República, Montevideo, Uruguay
Manuel Freire, Juan Ferrand, Franco Seveso, Ernesto Dufrechou & Pablo Ezzatti

Contributions

MF, JF and FS equally contributed to the design, implementation and evaluation of the analysis and solver routine, which was part of their final graduation project in Computer Engineering. PE and ED were the supervisors of the project.

Corresponding author

Correspondence to Manuel Freire.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Freire, M., Ferrand, J., Seveso, F. et al. A GPU method for the analysis stage of the SPTRSV kernel. J Supercomput 79, 15051–15078 (2023). https://doi.org/10.1007/s11227-023-05238-8

Accepted: 27 March 2023
Published: 13 April 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11227-023-05238-8
