ABSTRACT
Direct solvers for dense systems of linear equations commonly use partial pivoting to ensure numerical stability. However, pivoting can introduce significant performance overheads, such as synchronization and data movement, particularly on distributed systems. To improve the performance of these solvers, we present an alternative to pivoting in which numerical stability is obtained through additive updates. We implemented this approach using SLATE, a GPU-accelerated numerical linear algebra library, and evaluated it on the Summit supercomputer. Our approach provides better performance (up to 5-fold speedup) than Gaussian elimination with partial pivoting for comparable accuracy on most of the tested matrices. It also provides better accuracy (up to 15 more digits) than Gaussian elimination with no pivoting for comparable performance.
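The abstract does not spell out how the additive updates are applied, so the following is only an illustrative sketch of one way the idea could work, not the SLATE implementation. It assumes that a pivot whose magnitude falls below a threshold `tau` is pushed to `±tau`, that each such change is recorded as a diagonal perturbation of the original matrix, and that the perturbations are undone during the solve with the Sherman-Morrison-Woodbury formula. The names `lu_additive`, `lu_solve`, `dense_solve`, `solve`, and the default `tol` are hypothetical.

```python
def lu_additive(A, tol=1e-8):
    """Unpivoted LU of A where tiny pivots are replaced by +/- tau.

    Returns the packed factors (unit-lower L below the diagonal, U on and
    above it) and a dict {k: delta_k} such that the factors are those of
    A + sum_k delta_k * e_k e_k^T. Assumes A is square and nonzero.
    """
    n = len(A)
    LU = [[float(v) for v in row] for row in A]
    tau = tol * max(abs(v) for row in LU for v in row)  # assumed threshold
    mods = {}
    for k in range(n):
        if abs(LU[k][k]) < tau:
            pivot = tau if LU[k][k] >= 0 else -tau
            mods[k] = pivot - LU[k][k]  # additive change to entry (k, k)
            LU[k][k] = pivot
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, mods


def lu_solve(LU, b):
    """Solve with the packed factors: forward then backward substitution."""
    n = len(LU)
    x = [float(v) for v in b]
    for i in range(n):
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


def dense_solve(M, r):
    """Small dense solve with partial pivoting (for the correction system)."""
    m = len(r)
    aug = [row[:] + [r[i]] for i, row in enumerate(M)]
    for k in range(m):
        p = max(range(k, m), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, m):
            f = aug[i][k] / aug[k][k]
            for j in range(k, m + 1):
                aug[i][j] -= f * aug[k][j]
    z = [0.0] * m
    for i in reversed(range(m)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, m))
        z[i] = (aug[i][m] - s) / aug[i][i]
    return z


def solve(A, b, tol=1e-8):
    """Solve A x = b using the modified factors plus a Woodbury correction."""
    LU, mods = lu_additive(A, tol)
    y = lu_solve(LU, b)  # solution of the modified system
    idx = sorted(mods)
    if not idx:
        return y  # no pivot was modified, so y already solves A x = b
    n, m = len(b), len(idx)
    # W[c] is the solution of the modified system with right-hand side e_{idx[c]}.
    W = [lu_solve(LU, [1.0 if i == k else 0.0 for i in range(n)]) for k in idx]
    # Correction matrix: D^{-1} - E^T W, one row per modified pivot.
    C = [[(1.0 / mods[idx[r]] if r == c else 0.0) - W[c][idx[r]]
          for c in range(m)] for r in range(m)]
    z = dense_solve(C, [y[k] for k in idx])
    return [y[i] + sum(W[c][i] * z[c] for c in range(m)) for i in range(n)]
```

For example, `solve([[0.0, 1.0], [1.0, 1.0]], [1.0, 2.0])` has a zero leading pivot, which unpivoted elimination cannot handle without modification; the sketch perturbs that pivot and then corrects for it. In this sketch the correction system has one row per modified pivot, so it stays small when few pivots need adjusting. The attainable accuracy depends on the choice of `tol`.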