Towards Collaborative Continuous Benchmarking for HPC

Published: 12 November 2023
DOI: 10.1145/3624062.3624135

ABSTRACT

Benchmarking is integral to the procurement of HPC systems, to communicating HPC center workloads to HPC vendors, and to verifying the performance of delivered HPC systems. Currently, HPC benchmarking is manual and challenging at every step, posing a high barrier to entry and hampering reproducibility of benchmarks across different HPC systems. In this paper, we propose collaborative continuous benchmarking to enable functional reproducibility, automation, and community collaboration in HPC benchmarking. Recent progress in HPC automation allows us to consider previously unimaginable large-scale improvements to the HPC ecosystem. We define the minimal requirements for collaborative continuous benchmarking and develop a common language to streamline interactions between HPC centers, vendors, and researchers. We demonstrate an initial implementation of collaborative continuous benchmarking and introduce an open source continuous benchmarking repository, Benchpark, for community collaboration. We believe collaborative continuous benchmarking will help overcome the human bottleneck in HPC benchmarking, enabling better evaluation of our systems and more productive collaboration within the HPC community.


Published in

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
Association for Computing Machinery, New York, NY, United States
November 2023, 2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062

Copyright © 2023 Owner/Author. This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
