DOI: 10.1145/3624062.3624255
Research Article · Open Access

Implementation-Oblivious Transparent Checkpoint-Restart for MPI

Published: 12 November 2023

ABSTRACT

This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads across the major available MPI implementations: “develop once, run everywhere”. The new platform enables application developers to compile their application against any available standards-compliant MPI implementation, and to test each MPI implementation for performance or other features.

Since its original academic prototype, MANA has been under development for three of the past four years, and is planned to enter full production at NERSC in early Fall of 2023. To the best of the authors’ knowledge, MANA is currently the only production-capable, system-level checkpointing package running on a large supercomputer (Perlmutter at NERSC) using a major MPI implementation (HPE Cray MPI). Experiments are presented on large production workloads, demonstrating low runtime overhead with one codebase supporting four MPI implementations: HPE Cray MPI, MPICH, Open MPI, and ExaMPI.
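
The “develop once, run everywhere” model described above implies that the application itself carries no checkpoint-specific code: any standards-compliant MPI program can be built against any of the supported implementations and checkpointed externally by MANA. The sketch below is illustrative only and is not taken from the paper; it is a minimal MPI program in C whose iteration loop and use of MPI_Allreduce are arbitrary stand-ins for a production workload.

/*
 * Illustrative only (not from the paper): a standards-compliant MPI
 * program with no checkpoint-specific code.  Under MANA's model it is
 * compiled unchanged against HPE Cray MPI, MPICH, Open MPI, or ExaMPI
 * and checkpointed externally at the system level.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank = 0, size = 1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* A long-running loop of collectives stands in for a production
     * workload; a transparent checkpoint can be taken at any point
     * without the application's cooperation. */
    double local = (double)(rank + 1);
    double global = 0.0;
    for (int iter = 0; iter < 100000; iter++) {
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        local = global / size;            /* arbitrary per-rank update */
    }

    if (rank == 0)
        printf("ranks: %d, final reduced value: %f\n", size, global);

    MPI_Finalize();
    return 0;
}

Under this model, such a program would be built with the chosen implementation's compiler wrapper (for example, mpicc, or the cc wrapper on HPE Cray systems) and then launched, checkpointed, and restarted under MANA's coordinator and launch utilities; the specific wrapper and utility names here are conventions of the respective tools rather than details given in the abstract.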


Published in
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023 · 2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062
Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
