An efficient exact adjoint of the parallel MIT General Circulation Model, generated via automatic differentiation

https://doi.org/10.1016/j.future.2004.11.010Get rights and content

Abstract

We describe computational aspects of automatic differentiation applied to global ocean circulation modeling and state estimation. The task of minimizing a cost function measuring the ocean simulation versus observation misfit is achieved through efficient calculation of the cost gradient w.r.t. a set of controls via the adjoint technique. The adjoint code of the parallel MIT general circulation model is generated using TAMC or its successor TAF. To achieve a tractable problem in both CPU and memory requirements, in the light of control flow reversal, the adjoint code relies heavily on the balancing of storing versus recomputation via the checkpointing method. Further savings are achieved by exploiting self-adjointness of part of the computation. To retain scalability of domain decomposition-based parallelism, hand-written adjoint routines are provided. These complement routines of the parallel support package to perform corresponding operations in reverse mode. The unique feature of the TAF tool which enables the dumping of the adjoint state and restart the adjoint integration is exploited to overcome batch execution limitations on HPC machines for large-scale ocean and climate simulations. Strategies to test the correctness of the adjoint-generated gradient are presented. The size of a typical adjoint application is illustrated for the case of the global ocean state estimation problem undertaken by the SIO-JPL-MIT Consortium “Estimating the Circulation and Climate of the Ocean” (ECCO). Results are given by way of example.

Introduction

In one of the most complex Earth science inverse modeling initiatives, the Estimation of the Circulation and Climate of the Ocean (ECCO) project is developing greatly improved estimates of the three-dimensional, time-evolving state of the global oceans [1], [2]. To this end, the project is applying advanced, mathematically rigorous, inverse techniques [3] to constrain a state-of-the-art parallel general circulation model, MITgcm [4], [5], [6], with a diverse mix of observations. What emerges from the ECCO project goals is an optimization problem that must be solved to estimate and monitor the state or “climate” of the ocean. Ocean climate is characterized by patterns of planetary scale ocean circulation, the Gulf Stream current for example, and by large scale distributions of temperature and salinity. These quantities can be observed, but only partially, using satellites and oceanographic instruments. Combining, through a formal optimization procedure, the fragmentary observations with a numerical model, which is an a priori expression of the laws of physics and fluid mechanics that govern the ocean behavior, produces a more complete picture of the ocean climate. The ECCO optimization problem proceeds by expressing the difference between a numerical model trajectory M(u) and a set of observations d from the actual ocean in terms of a scalar cost, J, thus,J(M(u))=(H(v)d)TWobs(H(v)d)+(CoptC(0))TWC(CoptC(0))=i=1nobs(Hi(v)di)Wiiobs(Hi(v)di)+j=1nc(CjoptCj(0))WjjC(CjoptCj(0)).Here, u represents the full set of independent variables (initial and time-dependent boundary conditions), v=M(u) the model state at each time step, Hi the operator projecting the model state onto the i th observation di, Wobs and WC the data error and a priori control error variance matrices, respectively. Denoting a subset of the independent variables as adjustable “controls” C, the simulated state v=M(C), can be optimized to minimize J over the nobs observation points. The optimized controls, Copt, render a numerically simulated ocean state, M(Copt), that is spatially and temporally complete, and consistent with observations within their estimated errors.

The size of the ECCO optimization problem is formidable. In recent years, with increasing observational and computational capabilities, prominent patterns of naturally occurring intrinsic oceanic and coupled atmosphere–ocean–cryosphere variability have become widely appreciated, operating on many time-scales, from days (barotropic motion, e.g. [7]), months (mesoscale eddies, e.g. [8]), years (e.g. the El Niño-Southern Oscillation, ENSO [9]), decades (e.g. the North Atlantic Oscillation, NAO [10]), to centuries and millennia (thermohaline circulation, THC, e.g. [11]). Ideally, the optimization must, therefore, encompass processes spanning decades to centuries, on global scales, simulating them at spatial and temporal resolutions sufficient to yield state estimates with skill.

Our “smallest” current configuration is characterized by a cost function J that spans 9 years of planetary scale ocean simulations and observations, consisting of nobs=108 observational elements. Major ingredients include global, continuous sea surface height (SSH) data obtained from radar altimeters on board the TOPEX/Poseidon [12] and the European Remote Sensing Satellites ERS-1/2 [13] which achieve an accuracy at the 2–3-cm level on many spatial scales [14]. In addition, a variety of hydrographic data from the World Ocean Circulation Experiment (WOCE) [15] are used to constrain the model. Currently, the newly available data from the Jason-1 [16] altimetric mission, and depth profiles of temperature and salinity from a series of presently 300 (and potentially up to 3000 by the year 2005) autonomous seagoing floats [17] are being added.

In addition to the observational elements, J contains penalty terms for each control variable which account for the a priori error estimate of the controls (the a posteriori estimate of which would be obtained through the Hessian, i.e. the second derivative of J with respect to the controls). The a priori error was derived from climatological variances of the respective quantities, and taking into account the scales which the model resolves versus those which have to be considered as noise.

The set of independent or control variables which are varied to optimize J consists of three-dimensional fields for the initial temperature and salinity, as well as two-dimensional daily-varying surface forcing fields of heat, freshwater, and momentum (zonal and meridional wind stress) fluxes. Compressed to a one-dimensional vector C which contains wet points only, the size of the control space is nc=1.5×108.

The model state, i.e. the set of prognostic variables which are stepped forward in time by the underlying system of differential equations consists of 17 three-dimensional and 2 two-dimensional fields. A global configuration at a 1°×1° horizontal resolution with 23 vertical layers yields a model state of dimension 2×107.

Thus, with a gradient, JC of dimension 1.5×108, and a model Jacobian (i.e. the derivative of the mapping of the control space to the model state space, MC) of dimension 2×1071.5×108=3×1015, even allowing for some sparsity, the computation of the full model Jacobian is fundamentally impractical. Therefore, the reverse mode of automatic differentiation (AD), which allows the computation of the product of the Jacobian and a vector without explicitly representing the Jacobian, plays a central role.

Minimizing J, under the side condition of fulfilling the model equations, leads to a constrained optimization problem for which the gradient CJ is used to reduce J iteratively,minCJ(C)CJ(C,v(C))C=Copt=0.The constrained problem may be transformed into an unconstrained one by incorporating the model equations into the cost function (1) via the method of Lagrange multipliers. Alternatively and equivalently, the gradient may be obtained through rigorous application of the chain rule to Eq. (1). For the state estimation cost function (1) the gradient assumes the general formCJT=2MTHTW(H(v)d).Here, M denotes the model Jacobian with MT its adjoint, and H the differential of the projection operator with HT its adjoint. Thus, the gradient is proportional to the adjoint of the model Jacobian times the adjoint of the projection operator differential times the model versus data misfit.

Automatic differentiation (AD) [18] exploits this fact in a rigorous manner to produce, from a given model code, its corresponding tangent linear (forward mode) or adjoint (reverse mode) model (see e.g. [19]). The adjoint model enables the gradient (2) to be computed in a single integration. The reverse mode approach is extremely efficient for scalar-valued cost functions for which it is matrix free, and so the computational cost becomes tractable. In practical terms, we are able to develop a system that can numerically evaluate (2), for any scalar J, in roughly five to six times the compute cost of evaluating J. At this cost, reverse mode AD provides a powerful tool that is being increasingly used for oceanographic and other geophysical fluids applications. Note, that the model code is formulated in such a way that the projection operator (i.e. its specific realization for each data type, both altimetric and hydrographic) is part of the differentiated code, and thus intrinsically tied to the automatic adjoint code generation.

The results that are emerging from the application of reverse mode AD to the ocean circulation problem are of immense scientific value. However, here our focus is on the techniques that we employ to render a computationally viable system, and on providing examples of the calculations that are made possible with a competitive, automatic system for adjoint model development and integration.

The MITgcm algorithm, applications and the model’s software implementation in a parallel computing environment are described in Section 2. Section 3 discusses the implications for an AD tool and the requirements of an efficient, scalable reverse mode on a variety of parallel architectures for rendering the calculation computationally tractable. Applications are presented in Section 4 with an emphasis on computational aspects, rather than implications for oceanography or climate. An outlook is given in Section 5. Further discussion of the scientific aspects of this work is available, along with extensive data sets, at the ECCO website [20].

Section snippets

The MIT General Circulation Model

The MIT General Circulation Model (MITgcm) is rooted in a general purpose grid-point algorithm that solves the Boussinesq form of the Navier–Stokes equations for an incompressible fluid, hydrostatic or fully non-hydrostatic, in a curvilinear framework in the present context on a three-dimensional longitude (λ), latitude (ϕ), depth (r) grid. The algorithm is described in [5], [6] (for online documentation and access to the model code, see [21]).

The adjoint of MITgcm

The MITgcm has been adapted for use with the Tangent linear and Adjoint Model Compiler (TAMC), and recently its successor TAF (Transformation of Algorithms in Fortran) developed by Giering [19], [23], [24]. TAMC is a source-to-source transformation tool. It exploits the chain rule for computing the derivative of a function with respect to a set of input variables. Treating a given model code as a composition of operations—each line representing a compositional element, the chain rule is

Global ocean state estimation

The model state of the underlying global estimation problem consists of 17 three- and two-dimensional fields at a 1°×1° horizontal resolution and 23 vertical layers, yielding a model state of 5,659,200 elements which are updated at each time step (note, that a truly eddy-resolving setup would require a much higher horizontal resolution of about 1/10°×1/10° as well as higher vertical resolution, and, to remain numerically stable, a shorter model time step). The model is run over a 9-year period

Summary and outlook

Reverse mode AD applications emerge as a powerful tool to address a suite of ocean science issues. Crucial features are efficient recomputation algorithms and checkpointing. Scalability of the adjoint code, maintained by hand-written adjoint functions, complements parallel support functions of the model code. These features render computationally tractable adjoint code, despite the flow reversal in adjoint mode. Applied to the global time-dependent ocean circulation estimation problem, the code

Acknowledgements

This paper is a contribution to the ECCO project, supported by NOPP, and with funding from NASA, NSF and ONR. Many members of the ECCO Consortium under the lead of D. Stammer have contributed. Valuable comments by two anonymous reviewers are gratefully acknowledged.

Patrick Heimbach joined the physical oceanography team at the Massachusetts Institute of Technology after finishing his Ph.D. at the Max-Planck Institute for Meteorology, Hamburg, Germany, in 1998. His research focuses on applying mathematical rigorous techniques to the modeling of the complex atmosphere–ocean climate system and the quantification of its uncertainties. He has been involved in the first dynamical consistent decadal inverse modeling study by the ECCO project.

References (31)

  • D. Stammer et al.

    The global ocean circulation and transports during 1992–1997, estimated from ocean observations and a general circulation model

    J. Geophys. Res.

    (2002)
  • D. Stammer et al.

    Volume, heat and freshwater transports of the global ocean circulation 1993–2000, estimated from a general circulation model constrained by WOCE data

    J. Geophys. Res.

    (2003)
  • C. Wunsch

    The ocean Circulation Inverse Problem

    (1996)
  • C. Hill et al.

    Application of a parallel Navier–Stokes model to ocean circulation in parallel computational fluid dynamics

  • J. Marshall et al.

    Hydrostatic, quasi-hydrostatic and non-hydrostatic ocean modeling

    J. Geophys. Res.

    (1997)
  • J. Marshall et al.

    Hydrostatic, quasi-hydrostatic and non-hydrostatic ocean modeling

    J. Geophys. Res.

    (1997)
  • D. Stammer et al.

    De-aliasing of global high-frequency barotropic motions in altimetric observations

    Geophys. Res. Lett.

    (2000)
  • C. Wunsch

    Where do ocean eddy heat fluxes matter?

    J. Geophys. Res.

    (1999)
  • J. Neelin et al.

    ENSO theory

    J. Geophys. Res.

    (1998)
  • J. Marshall et al.

    Atlantic climate variability

    Int. J. Climatol.

    (2002)
  • P. Clark et al.

    The role of the thermohaline circulation in abrupt climate change

    Nature

    (2002)
  • TOPEX/Poseidon....
  • ERS-1/2....
  • C. Wunsch et al.

    Satellite altimetry, the marine geoid, and the oceanic general circulation

    Annu. Rev. Earth Planet. Sci.

    (1998)
  • WOCE....
  • Cited by (107)

    View all citing articles on Scopus

    Patrick Heimbach joined the physical oceanography team at the Massachusetts Institute of Technology after finishing his Ph.D. at the Max-Planck Institute for Meteorology, Hamburg, Germany, in 1998. His research focuses on applying mathematical rigorous techniques to the modeling of the complex atmosphere–ocean climate system and the quantification of its uncertainties. He has been involved in the first dynamical consistent decadal inverse modeling study by the ECCO project.

    Chris Hill is a researcher at the Massachusetts Institute of Technology, Department of Earth Atmospheric and Planetary Sciences. His current research interests center on the application of advanced computational technologies to challenging Earth science problems. Contact him at [email protected].

    Ralf Giering is co-owner and manager of FastOpt. He received both diploma in physics (1990) and Ph.D. (1995) in oceanography from University of Hamburg. His current research interests include automatic differentiation and optimisation/data assimilation.

    View full text