ABSTRACT
Benchmarking is integral to procuring HPC systems, communicating HPC center workloads to vendors, and verifying the performance of delivered systems. Today, HPC benchmarking is manual and challenging at every step, posing a high barrier to entry and hampering the reproducibility of benchmarks across different HPC systems. In this paper, we propose collaborative continuous benchmarking to enable functional reproducibility, automation, and community collaboration in HPC benchmarking. Recent progress in HPC automation allows us to consider previously unimaginable large-scale improvements to the HPC ecosystem. We define the minimal requirements for collaborative continuous benchmarking and develop a common language to streamline interactions between HPC centers, vendors, and researchers. We demonstrate an initial implementation of collaborative continuous benchmarking and introduce Benchpark, an open-source continuous benchmarking repository for community collaboration. We believe collaborative continuous benchmarking will help overcome the human bottleneck in HPC benchmarking, enabling better evaluation of our systems and more productive collaboration within the HPC community.
Index Terms: Towards Collaborative Continuous Benchmarking for HPC