A GPU method for the analysis stage of the SPTRSV kernel

The Journal of Supercomputing

Abstract

The solution of sparse triangular linear systems of equations (SPTRSV) is often the main computational bottleneck of numerical methods in science and engineering. On GPUs, this operation is mainly solved using two approaches. Level-set strategies perform a costly pre-processing step (called the analysis stage) to examine the dependencies between rows of the matrix and derive a static schedule for the subsequent solution stage. Synchronization-free methods, on the other hand, discover this schedule dynamically and avoid the analysis stage, although some hybrid synchronization-free methods can leverage the level-set analysis to improve performance. In this work, we present an efficient GPU routine to compute the analysis stage and then apply some of these ideas to accelerate a synchronization-free solver that does not require analysis. The experimental comparison with the well-known cuSPARSE library shows speedups of up to 40× in the solution of triangular linear systems, and of up to 262× in the level-set analysis phase.
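To make the level-set idea concrete, the following is a minimal sequential sketch in Python, not the GPU routine presented in the article. It assumes a lower-triangular matrix stored in CSR format with every diagonal entry present; the function names `level_sets` and `solve_by_levels` and the small example matrix are hypothetical and chosen only for illustration. The analysis assigns each row a level one greater than the deepest row it depends on, and the solve then processes the levels in order. Rows inside one level do not depend on each other, which is what a parallel implementation would exploit.

```python
def level_sets(row_ptr, col_idx):
    """Analysis stage (sketch): group rows of a lower-triangular CSR matrix by level."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # off-diagonal entry: row i needs x[j] first
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def solve_by_levels(row_ptr, col_idx, vals, b, sets):
    """Solution stage (sketch): solve L x = b following the static level schedule."""
    x = [0.0] * len(b)
    for rows in sets:      # levels must run one after another
        for i in rows:     # rows within a level are mutually independent
            acc, diag = b[i], None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag  # assumes the diagonal entry is stored and nonzero
    return x


# Hypothetical 4x4 example:
# L = [[2, 0, 0, 0],
#      [0, 1, 0, 0],
#      [1, 0, 4, 0],
#      [0, 3, 1, 5]]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 5.0]
b = [2.0, 2.0, 13.0, 29.0]

sets = level_sets(row_ptr, col_idx)
x = solve_by_levels(row_ptr, col_idx, vals, b, sets)
```

In this example rows 0 and 1 have no dependencies and form the first level, row 2 depends on row 0, and row 3 depends on rows 1 and 2, so the schedule has three levels. The analysis loop itself walks the rows in order, which hints at why computing it efficiently in parallel is nontrivial.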

Availability of data and materials

The source code is available on GitHub (https://github.com/HCL-Fing/SPTRSV). The sparse matrices used for the experimental evaluation are available in the SuiteSparse Matrix Collection.

Acknowledgements

This work is partially funded by the UDELAR CSIC-INI project CompactDisp: Formatos dispersos eficientes para arquitecturas de hardware modernas. The authors also thank PEDECIBA Informática and the University of the Republic, Uruguay.

Funding

Manuel Freire received funding from the UDELAR CSIC-INI project CompactDisp: Formatos dispersos eficientes para arquitecturas de hardware modernas.

Author information

Contributions

MF, JF and FS contributed equally to the design, implementation and evaluation of the analysis and solver routines, which were part of their final graduation project in Computer Engineering. PE and ED supervised the project.

Corresponding author

Correspondence to Manuel Freire.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Freire, M., Ferrand, J., Seveso, F. et al. A GPU method for the analysis stage of the SPTRSV kernel. J Supercomput 79, 15051–15078 (2023). https://doi.org/10.1007/s11227-023-05238-8

  • DOI: https://doi.org/10.1007/s11227-023-05238-8
