ABSTRACT
To profit from emerging high-performance computing systems, weather and climate models need to be adapted to run efficiently on different hardware architectures, such as accelerators. This is a major challenge for existing community models, which are very large code bases written in Fortran. We introduce the CLAW domain-specific language (CLAW DSL) and the CLAW Compiler, which together allow a single Fortran source code to be retained while achieving a high degree of performance portability. Specifically, we present the Single Column Abstraction (SCA) of the CLAW DSL, which targets the column-based algorithmic motifs typically encountered in the physical parameterizations of weather and climate models. Starting from serial, non-optimized source code, the CLAW Compiler applies transformations and optimizations for a specific target hardware architecture and generates parallel, optimized Fortran code annotated with OpenMP or OpenACC directives. Results from a state-of-the-art radiative transfer code indicate that, with CLAW, the amount of source code can be significantly reduced while still obtaining efficient code for x86 multi-core CPUs and GPU accelerators. The CLAW DSL is a significant step towards performance-portable climate and weather models and could be adopted incrementally in existing code with limited effort.
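The idea behind the Single Column Abstraction can be illustrated with a minimal sketch. It is written in Python purely for illustration; it is not CLAW DSL syntax and not the Fortran that the CLAW Compiler generates. The names `column_kernel` and `apply_over_columns` are hypothetical. The scientific routine is written for one vertical column only, and a separate step wraps it in a loop over independent horizontal columns, which is the loop a tool could then parallelize for a given target.

```python
from typing import Callable, List

Column = List[float]


def column_kernel(tau: Column) -> Column:
    """Hypothetical single-column routine.

    Accumulates a quantity from the top level downwards. Only the
    vertical index appears here; there is no horizontal loop.
    """
    acc = 0.0
    out: Column = []
    for value in tau:  # vertical dependency: each level needs the one above
        acc += value
        out.append(acc)
    return out


def apply_over_columns(kernel: Callable[[Column], Column],
                       field: List[Column]) -> List[Column]:
    """Stand-in for the transformation step (an assumption for illustration).

    Wraps the column routine in a loop over horizontal columns. The
    columns are independent, so this outer loop is the parallel one.
    """
    return [kernel(column) for column in field]


# Two columns with three vertical levels each (made-up values).
field = [[0.5, 0.25, 0.25], [1.0, 2.0, 3.0]]
result = apply_over_columns(column_kernel, field)
print(result)
```

Keeping the horizontal loop out of the scientific routine is what lets the same source be mapped to different loop orders and directive sets per architecture.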
Index Terms
- The CLAW DSL: Abstractions for Performance Portable Weather and Climate Models