The HOPSA Workflow and Tools

Mohr, Bernd; Voevodin, Vladimir; Giménez, Judit; Hagersten, Erik; Knüpfer, Andreas; Nikitenko, Dmitry A.; Nilsson, Mats; Servat, Harald; Shah, Aamer; Winkler, Frank; Wolf, Felix; Zhukov, Ilya

doi:10.1007/978-3-642-37349-7_9



Abstract

To maximise the scientific output of a high-performance computing system, different stakeholders pursue different strategies. While individual application developers try to shorten the time to solution by optimising their codes, system administrators tune the configuration of the overall system to increase its throughput. Yet the complexity of today’s machines, with their strong interrelationship between application and system performance, presents serious challenges to achieving these goals. The HOPSA project (HOlistic Performance System Analysis) therefore sets out to create an integrated diagnostic infrastructure for combined application- and system-level tuning, with the former provided by the EU partners and the latter by the Russian project partners. Starting from a system-wide basic performance screening of individual jobs, an automated workflow routes findings on potential bottlenecks either to application developers or to system administrators, together with recommendations on how to identify their root cause using more powerful diagnostic tools. Developers can choose from a variety of mature performance-analysis tools developed by our consortium. Within this project, the tools will be further integrated and enhanced with respect to scalability, depth of analysis, and support for asynchronous tasking, a node-level paradigm playing an increasingly important role in hybrid programs on emerging hierarchical and heterogeneous systems.
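The routing step of the workflow (basic screening of a job, then forwarding each finding to either the developer or the administrator with a recommendation) can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the HOPSA implementation: the metric names (`comm_time_fraction`, `node_health_ok`), the threshold `COMM_THRESHOLD`, the function `route`, and the recommendation texts are all hypothetical and chosen only to show the idea of splitting findings into application-level and system-level ones.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Recipient(Enum):
    """Stakeholder who receives a finding."""
    APPLICATION_DEVELOPER = "application developer"
    SYSTEM_ADMINISTRATOR = "system administrator"


@dataclass
class JobScreening:
    """Hypothetical per-job summary produced by system-wide basic screening."""
    job_id: str
    comm_time_fraction: float  # hypothetical: share of wall time spent in communication
    node_health_ok: bool       # hypothetical: False if a node showed system-side faults


@dataclass
class Finding:
    job_id: str
    issue: str
    recipient: Recipient
    recommendation: str


# Hypothetical threshold, chosen only for illustration.
COMM_THRESHOLD = 0.3


def route(screen: JobScreening) -> List[Finding]:
    """Classify screening results and forward each finding to one stakeholder."""
    findings: List[Finding] = []
    if not screen.node_health_ok:
        # System-level symptom: forwarded to the system administrators.
        findings.append(Finding(
            screen.job_id,
            "node-level fault observed during the job",
            Recipient.SYSTEM_ADMINISTRATOR,
            "inspect system monitoring data for the affected nodes",
        ))
    if screen.comm_time_fraction > COMM_THRESHOLD:
        # Application-level symptom: forwarded to the application developer.
        findings.append(Finding(
            screen.job_id,
            f"{screen.comm_time_fraction:.0%} of wall time spent in communication",
            Recipient.APPLICATION_DEVELOPER,
            "record an event trace of the job with a parallel performance-analysis tool",
        ))
    return findings


if __name__ == "__main__":
    job = JobScreening("job-42", comm_time_fraction=0.45, node_health_ok=False)
    for f in route(job):
        print(f"{f.recipient.value}: {f.issue} -> {f.recommendation}")
```

A job showing both symptoms yields one finding per stakeholder. Keeping each rule explicitly tagged with its recipient mirrors the application/system split described above; a production screening stage would presumably rely on richer metrics than these two placeholders.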



Acknowledgements

HOPSA is a coordinated twin project funded under FP7-ICT-2011-EU-Russia grant number FP7-277463 and Russian Ministry of Education and Science contract number 07.514.12.4001. The authors would also like to thank their colleagues working with them on this project: Andrew Adinetz, Daniel Becker, Peter Bryzgalov, Jens Domke, Markus Geimer, Juan Gonzalez, André Grötzsch, Thomas Ilsche, Germán Llort, Christopher Schleiden, Konstantin Stefanov, Zoltán Szebenyi, Igor Zacharov, Pavel Saviankou, Igor Ustinov, Vadim Voevodin, and Sergey Zhumatiy, as well as the Paraver, Scalasca, and Vampir teams in general.

Author information

Authors and Affiliations

Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Juelich, Germany
Bernd Mohr & Ilya Zhukov
Moscow State University, RCC, Moscow, Russia
Vladimir Voevodin & Dmitry A. Nikitenko
Barcelona Supercomputing Centre, Barcelona, Spain
Judit Giménez & Harald Servat
German Research School for Simulation Sciences GmbH / RWTH Aachen University, Aachen, Germany
Aamer Shah & Felix Wolf
Rogue Wave Software AB, Uppsala, Sweden
Erik Hagersten & Mats Nilsson
Technical University Dresden, Dresden, Germany
Andreas Knüpfer & Frank Winkler


Corresponding author

Correspondence to Bernd Mohr.

Editor information

Editors and Affiliations

Höchstleistungsrechenzentrum, Stuttgart (HLRS), Universität Stuttgart, Nobelstraße 19, Stuttgart, 70550, Germany
Alexey Cheptsov
Höchstleistungsrechenzentrum, Stuttgart (HLRS), Universität Stuttgart, Nobelstraße 19, Stuttgart, 70550, Germany
Steffen Brinkmann
Höchstleistungsrechenzentrum, Stuttgart (HLRS), Universität Stuttgart, Nobelstraße 19, Stuttgart, 70550, Germany
José Gracia
Höchstleistungsrechenzentrum, Stuttgart (HLRS), Universität Stuttgart, Nobelstraße 19, Stuttgart, 70550, Germany
Michael M. Resch
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Helmholtzstr. 10, Dresden, 01062, Germany
Wolfgang E. Nagel


About this paper

Cite this paper

Mohr, B. et al. (2013). The HOPSA Workflow and Tools. In: Cheptsov, A., Brinkmann, S., Gracia, J., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37349-7_9


DOI: https://doi.org/10.1007/978-3-642-37349-7_9
Published: 13 May 2013
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37348-0
Online ISBN: 978-3-642-37349-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
