ABSTRACT
Benchmarking is integral to procuring HPC systems, communicating HPC center workloads to vendors, and verifying the performance of delivered systems. Today, HPC benchmarking is manual and challenging at every step, posing a high barrier to entry and hampering the reproducibility of benchmarks across different HPC systems. In this paper, we propose collaborative continuous benchmarking to enable functional reproducibility, automation, and community collaboration in HPC benchmarking. Recent progress in HPC automation allows us to consider previously unimaginable large-scale improvements to the HPC ecosystem. We define the minimal requirements for collaborative continuous benchmarking and develop a common language to streamline interactions between HPC centers, vendors, and researchers. We demonstrate an initial implementation of collaborative continuous benchmarking and introduce Benchpark, an open-source continuous benchmarking repository for community collaboration. We believe collaborative continuous benchmarking will help overcome the human bottleneck in HPC benchmarking, enabling better evaluation of our systems and more productive collaboration within the HPC community.
Index Terms: Towards Collaborative Continuous Benchmarking for HPC