DOI: 10.1145/3295500.3356186

Code generation for massively parallel phase-field simulations

Published: 17 November 2019 in SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (ACM)

ABSTRACT

This article describes the development of automatic program generation technology to create scalable phase-field methods for materials science applications. To simulate the formation of microstructures in metal alloys, we employ an advanced, thermodynamically consistent phase-field method. State-of-the-art large-scale implementations of this model require extensive, time-consuming, manual code optimization to reach unprecedentedly fine mesh resolutions. Our new approach starts from an abstract description based on free-energy functionals, which is formally transformed into a continuous PDE and discretized automatically to obtain a stencil-based time-stepping scheme. Subsequently, an automated performance engineering process generates highly optimized, performance-portable code for CPUs and GPUs. We demonstrate the efficiency of this approach for real-world simulations on large-scale GPU-based (Piz Daint) and CPU-based (SuperMUC-NG) supercomputers. Our technique simplifies program development and optimization for a wide class of models.

We furthermore outperform existing, manually optimized implementations, since our code can be generated specifically for each phase-field model and hardware configuration.
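
To make the pipeline concrete, the sketch below walks through its first stages for a minimal one-dimensional Allen-Cahn model. This is a minimal illustration built on the SymPy library, not the code generator developed in the paper; the double-well model and all symbol names (phi, epsilon, M, phi_C) are chosen here for exposition. A free-energy functional is reduced to a PDE via the Euler-Lagrange operator, discretized into a three-point stencil with an explicit Euler step, and emitted as portable C.

# Illustrative sketch (not the paper's generator): free-energy functional
# -> PDE -> discrete stencil update -> C code, all done symbolically.
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.Symbol("x")
phi = sp.Function("phi")(x)                    # phase-field variable
eps = sp.Symbol("epsilon", positive=True)      # interface width parameter
M = sp.Symbol("M", positive=True)              # mobility

# Free-energy density: double-well bulk term plus gradient (interface) energy.
f = phi**2 * (1 - phi)**2 + sp.Rational(1, 2) * eps**2 * sp.Derivative(phi, x)**2

# Functional derivative dF/dphi via the Euler-Lagrange operator:
#   dF/dphi = df/dphi - d/dx( df/d(phi_x) )
(eq,) = euler_equations(f, phi, x)
dF_dphi = eq.lhs     # = 2*phi*(1-phi)*(1-2*phi) - eps**2 * phi_xx

# Allen-Cahn dynamics: phi_t = -M * dF/dphi. Discretize phi_xx with a
# second-order central difference and take an explicit Euler step in time,
# which yields a three-point stencil update.
dx, dt = sp.symbols("dx dt", positive=True)
phi_C, phi_E, phi_W = sp.symbols("phi_C phi_E phi_W")   # center/east/west values

rhs = -M * dF_dphi
rhs = rhs.subs(sp.Derivative(phi, x, 2), (phi_E - 2*phi_C + phi_W) / dx**2)
rhs = rhs.subs(phi, phi_C)
phi_new = sp.simplify(phi_C + dt * rhs)

# Emit plain C for the stencil body.
print(sp.ccode(phi_new, assign_to="phi_C_new"))

In the full approach described above, the symbolic stencil obtained this way would additionally pass through an automated performance-engineering stage that vectorizes, blocks, and parallelizes the kernel for the target CPU or GPU; the sketch stops at plain C to stay self-contained.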

