research-article

Using Additive Modifications in LU Factorization Instead of Pivoting

Authors:
Neil Lindquist
Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, United States of America
https://orcid.org/0000-0001-9404-3121

Piotr Luszczek
Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, United States of America
https://orcid.org/0000-0002-0089-6965

Jack Dongarra
Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA
https://orcid.org/0000-0003-3247-1782

ICS '23: Proceedings of the 37th International Conference on Supercomputing, June 2023, pages 14–24. https://doi.org/10.1145/3577193.3593731

Published: 21 June 2023

ABSTRACT

Direct solvers for dense systems of linear equations commonly use partial pivoting to ensure numerical stability. However, pivoting can introduce significant performance overheads, such as synchronization and data movement, particularly on distributed systems. To improve the performance of these solvers, we present an alternative to pivoting in which numerical stability is obtained through additive updates. We implemented this approach using SLATE, a GPU-accelerated numerical linear algebra library, and evaluated it on the Summit supercomputer. Our approach provides better performance (up to 5-fold speedup) than Gaussian elimination with partial pivoting for comparable accuracy on most of the tested matrices. It also provides better accuracy (up to 15 more digits) than Gaussian elimination with no pivoting for comparable performance.
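The abstract describes the technique only at a high level. The following is a minimal NumPy sketch of one way to realize it, under assumptions that are not taken from the paper: during LU factorization without row exchanges, any pivot whose magnitude falls below `tol * max|A|` is replaced by a value of that magnitude (keeping its sign), and the recorded diagonal modifications are undone at solve time with the Sherman-Morrison-Woodbury formula. Because each modification changes only one diagonal entry, the factors satisfy L U = A + W C Wᵀ, where W holds the unit vectors of the modified pivots and C = diag(δ). The function names `lu_additive`, `lu_solve`, and `solve_modified`, the threshold rule, and the default `tol=1e-4` are all hypothetical; the authors' SLATE implementation, threshold, and correction step may differ.

```python
import numpy as np


def lu_additive(A, tol):
    """LU factorization without row exchanges (GENP-style).

    Hypothetical rule: a pivot with |pivot| < tol * max|A| is replaced by
    +/- tol * max|A| (keeping its sign), and the added amount is recorded.
    Returns unit-lower L, upper U, and a list of (index, delta) pairs such
    that L @ U equals A + sum(delta * e_k e_k^T) up to rounding.
    """
    A = np.array(A, dtype=float)          # work on a copy
    n = A.shape[0]
    thresh = tol * np.max(np.abs(A))
    mods = []
    for k in range(n):
        p = A[k, k]
        if abs(p) < thresh:
            target = thresh if p >= 0 else -thresh
            mods.append((k, target - p))  # additive modification of entry (k, k)
            A[k, k] = target
        A[k + 1:, k] /= A[k, k]           # multipliers (column k of L)
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])  # Schur complement update
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    return L, U, mods


def lu_solve(L, U, B):
    """Solve (L U) X = B by forward then backward substitution."""
    Y = np.array(B, dtype=float)
    n = L.shape[0]
    for i in range(n):                    # L has a unit diagonal
        Y[i] -= L[i, :i] @ Y[:i]
    for i in range(n - 1, -1, -1):
        Y[i] = (Y[i] - U[i, i + 1:] @ Y[i + 1:]) / U[i, i]
    return Y


def solve_modified(A, b, tol=1e-4):
    """Solve A x = b from the modified factors plus a Woodbury correction.

    With M = L U = A + W C W^T, C = diag(deltas), W = [e_k ...]:
    A^{-1} b = M^{-1} b + M^{-1} W (C^{-1} - W^T M^{-1} W)^{-1} W^T M^{-1} b.
    """
    L, U, mods = lu_additive(A, tol)
    y = lu_solve(L, U, b)                 # y = M^{-1} b
    if not mods:
        return y                          # no modifications: plain GENP solve
    idx = [k for k, _ in mods]
    deltas = np.array([d for _, d in mods])
    W = np.zeros((L.shape[0], len(idx)))
    W[idx, np.arange(len(idx))] = 1.0     # unit vectors of the modified pivots
    Z = lu_solve(L, U, W)                 # Z = M^{-1} W
    cap = np.diag(1.0 / deltas) - Z[idx, :]  # capacitance matrix C^{-1} - W^T Z
    return y + Z @ np.linalg.solve(cap, y[idx])


# Usage: the second pivot of this matrix is exactly zero, so GENP breaks down.
A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 3.0]])
b = A @ np.array([1.0, 2.0, 3.0])
x = solve_modified(A, b, tol=0.1)
print(x)  # close to [1, 2, 3] up to rounding
```

In the usage example the sketch perturbs the zero pivot once, and the Woodbury correction recovers the solution of the original, unmodified system. When no pivot falls below the threshold, the sketch reduces to plain Gaussian elimination with no pivoting. Each modified pivot costs one extra pair of triangular solves plus a small dense solve on the capacitance matrix, so in this sketch the correction stays cheap only while few pivots need modification.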

Index Terms

1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed algorithms
2. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices
  2. Mathematical software
    1. Mathematical software performance

Published in
ICS '23: Proceedings of the 37th International Conference on Supercomputing
June 2023
505 pages
ISBN: 9798400700569
DOI: 10.1145/3577193
Chair: Kyle Gallivan
Co-chair: Efstratios Gallopoulos
Program Co-chairs: Dimitrios S. Nikolopoulos, Ramon Beivide
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
LU factorization
linear algebra
communication avoidance
Acceptance Rates
Overall acceptance rate: 584 of 2,055 submissions, 28%
Article Metrics
- Total citations: 0
- Total downloads: 100
- Downloads (last 12 months): 100
- Downloads (last 6 weeks): 14
