Model-Based Performance Analysis of the HyTeG Finite Element Framework

Authors:
Dominik Thönnes
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
https://orcid.org/0000-0002-8340-4849

Ulrich Rüde
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
CERFACS, Toulouse, France
https://orcid.org/0000-0001-8796-8599

PASC '23: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2023, Article No. 23, Pages 1–12, https://doi.org/10.1145/3592979.3593422

Published: 26 June 2023

ABSTRACT

In this work, we present how code generation techniques significantly improve the performance of the computational kernels in the HyTeG software framework. This HPC framework combines the performance and memory advantages of matrix-free multigrid solvers with the flexibility of unstructured meshes. The pystencils code generation toolbox is used to replace the original abstract C++ kernels with highly optimized loop nests. The performance of one of those kernels (the matrix-vector multiplication) is thoroughly analyzed using the Execution-Cache-Memory (ECM) performance model. We validate these predictions by measurements on the SuperMUC-NG supercomputer. The experiments show that the performance mostly matches the predictions. In cases where the prediction does not match, we discuss the discrepancies. Additionally, we conduct a node-level scaling study which shows the expected behavior for a memory-bound compute kernel.
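
To make the role of the ECM model in this analysis concrete, the following is a minimal sketch of the commonly stated form of the model for Intel CPUs, in which data transfers between memory levels do not overlap with each other. It is not the model instance used in the paper: the class and function names are hypothetical, the input cycle counts are invented for illustration, and the overlap assumptions for the processors in SuperMUC-NG may differ. The sketch also shows why a memory-bound kernel stops scaling once the memory transfer time dominates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EcmInputs:
    """Cycles per unit of work (e.g. one cache line of results).

    All field names are hypothetical; values must be obtained from a
    static code analysis and machine data, which this sketch does not do.
    """
    t_ol: float     # in-core cycles that overlap with data transfers
    t_nol: float    # in-core cycles that do not overlap (loads/stores)
    t_l1l2: float   # transfer cycles between L1 and L2
    t_l2l3: float   # transfer cycles between L2 and L3
    t_l3mem: float  # transfer cycles between L3 and main memory


def single_core_cycles(k: EcmInputs) -> float:
    """Single-core prediction, assuming non-overlapping data transfers."""
    return max(k.t_ol, k.t_nol + k.t_l1l2 + k.t_l2l3 + k.t_l3mem)


def cycles_on_cores(k: EcmInputs, n_cores: int) -> float:
    """Linear scaling until the memory interface becomes the bottleneck."""
    return max(single_core_cycles(k) / n_cores, k.t_l3mem)


def saturation_cores(k: EcmInputs) -> int:
    """Smallest core count at which memory bandwidth is saturated."""
    return math.ceil(single_core_cycles(k) / k.t_l3mem)


# Invented example values, not measurements from the paper.
example = EcmInputs(t_ol=8.0, t_nol=4.0, t_l1l2=6.0, t_l2l3=6.0, t_l3mem=10.0)
```

With these invented inputs, the runtime per unit of work halves from one to two cores, and adding cores beyond the saturation point returned by `saturation_cores(example)` no longer reduces it. This flattening is the qualitative behavior usually expected from a memory-bound kernel in a node-level scaling study.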

Index Terms

1. Computing methodologies
  1. Modeling and simulation

Published in
PASC '23: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2023
274 pages
ISBN: 9798400701900
DOI: 10.1145/3592979
Chairs:
Axel Huebl (Lawrence Berkeley National Laboratory),
Cristina Silvano (Politecnico di Milano)
Proceedings Chair:
Timothy Robinson (ETH Zurich / CSCS)
Copyright © 2023 Owner/Author(s)
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
analytical performance modeling
code generation
stencil codes
matrix-free
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 83 of 185 submissions, 45%