ABSTRACT
This article describes the development of automatic program generation technology to create scalable phase-field methods for material science applications. To simulate the formation of microstructures in metal alloys, we employ an advanced, thermodynamically consistent phase-field method. A state-of-the-art large-scale implementation of this model requires extensive, time-consuming, manual code optimization to achieve unprecedentedly fine mesh resolution. Our new approach starts with an abstract description based on free-energy functionals, which is formally transformed into a continuous PDE and discretized automatically to obtain a stencil-based time-stepping scheme. Subsequently, an automated performance engineering process generates highly optimized, performance-portable code for CPUs and GPUs. We demonstrate the efficiency for real-world simulations on large-scale GPU-based (Piz Daint) and CPU-based (SuperMUC-NG) supercomputers. Our technique simplifies program development and optimization for a wide class of models. We further outperform existing, manually optimized implementations, as our code can be generated specifically for each phase-field model and hardware configuration.
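The pipeline from free-energy functional to stencil update can be illustrated on a much smaller problem than the one treated in the article. The following is a minimal sketch, not the authors' framework: it assumes a single-phase, one-dimensional Allen-Cahn model with a double-well potential, uses SymPy for the symbolic steps and NumPy for execution, and all names (`f`, `dF`, `stencil`, `step`, `simulate`) and parameter values are hypothetical choices made for illustration.

```python
import numpy as np
import sympy as sp

# Symbols for the continuous model and the discretization (all assumed names).
x, eps, M, dx, dt = sp.symbols("x epsilon M dx dt", positive=True)
phi = sp.Function("phi")(x)

# Step 1: abstract description as a free-energy density
# (gradient energy plus an assumed double-well potential).
f = eps**2 / 2 * phi.diff(x)**2 + phi**2 * (1 - phi)**2

# Step 2: variational derivative dF/dphi = df/dphi - d/dx (df/dphi_x),
# which gives the continuous Allen-Cahn PDE dphi/dt = -M * dF/dphi.
dF = f.diff(phi) - f.diff(phi.diff(x)).diff(x)
rhs = -M * dF

# Step 3: discretize. The second derivative becomes a central difference
# over the centre (c), west (w) and east (e) cell values.
c, w, e = sp.symbols("phi_c phi_w phi_e")
stencil = rhs.subs(phi.diff(x, 2), (e - 2 * c + w) / dx**2).subs(phi, c)

# Step 4: explicit Euler time step, turned into a vectorized NumPy function.
update = c + dt * stencil
step = sp.lambdify((c, w, e, eps, M, dx, dt), update, "numpy")


def simulate(n_cells=64, n_steps=200):
    """Relax a sharp step profile on a periodic 1D grid."""
    u = np.where(np.arange(n_cells) < n_cells // 2, 1.0, 0.0)
    for _ in range(n_steps):
        # np.roll supplies the periodic west and east neighbours.
        u = step(u, np.roll(u, 1), np.roll(u, -1), 1.0, 1.0, 1.0, 0.1)
    return u


if __name__ == "__main__":
    print(sp.factor(stencil))
    print(simulate()[28:36])
```

A production code generator would emit optimized C or CUDA kernels from the symbolic `update` expression rather than call `lambdify`, but the sequence of symbolic transformations follows the same idea.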
Index Terms
- Code generation for massively parallel phase-field simulations