DOI: 10.1145/3592979.3593422
research-article

Model-Based Performance Analysis of the HyTeG Finite Element Framework

Published: 26 June 2023

ABSTRACT

In this work, we present how code generation techniques significantly improve the performance of the computational kernels in the HyTeG software framework. This HPC framework combines the performance and memory advantages of matrix-free multigrid solvers with the flexibility of unstructured meshes. The pystencils code generation toolbox is used to replace the original abstract C++ kernels with highly optimized loop nests. The performance of one of those kernels (the matrix-vector multiplication) is thoroughly analyzed using the Execution-Cache-Memory (ECM) performance model. We validate these predictions by measurements on the SuperMUC-NG supercomputer. The experiments show that the performance mostly matches the predictions. In cases where the prediction does not match, we discuss the discrepancies. Additionally, we conduct a node-level scaling study which shows the expected behavior for a memory-bound compute kernel.

Published in

PASC '23: Proceedings of the Platform for Advanced Scientific Computing Conference
June 2023
274 pages
ISBN: 9798400701900
DOI: 10.1145/3592979

Copyright © 2023 Owner/Author(s)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 83 of 185 submissions, 45%
