ABSTRACT
This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads across the major available MPI implementations: “develop once, run everywhere”. The new platform enables application developers to compile their application against any available standards-compliant MPI implementation, and to evaluate each implementation for performance or other desired features.
Since its original academic prototype, MANA has been under development for three of the past four years, and is planned to enter full production at NERSC in early Fall of 2023. To the best of the authors’ knowledge, MANA is currently the only production-capable, system-level checkpointing package running on a large supercomputer (Perlmutter at NERSC) using a major MPI implementation (HPE Cray MPI). Experiments are presented on large production workloads, demonstrating low runtime overhead with one codebase supporting four MPI implementations: HPE Cray MPI, MPICH, Open MPI, and ExaMPI.
Index Terms
- Implementation-Oblivious Transparent Checkpoint-Restart for MPI