Fast Spin-Up of Geochemical Tracers in Ocean Circulation and Climate Models

Ocean geochemical tracers such as radiocarbon, protactinium and thorium isotopes, and noble gases are widely used to constrain a range of physical and biogeochemical processes in the ocean. However, their routine simulation in global ocean circulation and climate models is hindered by the computational expense of integrating them to a steady state. Here, a new approach to this long-standing “spin-up” problem is introduced to efficiently compute equilibrium distributions of such tracers in seasonally-forced models. Based on “Anderson Acceleration,” a sequence acceleration technique developed in the 1960s to solve nonlinear integral equations, the new method is entirely “black box” and offers significant speed-up over conventional direct time integration. Moreover, it requires no preconditioning, ensures tracer conservation and is fully consistent with the numerical time-stepping scheme of the underlying model. It thus circumvents some of the drawbacks of other schemes such as matrix-free Newton Krylov that have been proposed to address this problem. An implementation specifically tailored for the batch HPC systems on which ocean and climate models are typically run is described, and the method illustrated by applying it to a variety of geochemical tracer problems. The new method, which provides speed-ups by over an order of magnitude, should make simulations of such tracers more feasible and enable their inclusion in climate change assessments such as IPCC.

Here, a new approach is presented that, like MFNK, attempts to directly compute a steady state solution but overcomes some of the latter's drawbacks and limitations. The new scheme is based on a numerical technique known as Anderson Acceleration (AA) or Mixing. Developed in the 1960s by D. G. Anderson (Anderson, 1965(Anderson, , 2019 in the context of nonlinear integral equations, it primarily found application to electronic structure problems in quantum chemistry and material science (Walker & Ni, 2011;Zhang et al., 2020). More recently, there has been a resurgence of interest in AA as a solver for partial differential equations (Brune et al., 2015;Walker et al., 2010) and optimization problems (Fu et al., 2020;Tang & Daoutidis, 2022;Zhang et al., 2020).
The basic idea behind AA, and other so-called sequence acceleration methods (Brezinski, 2000;Brezinski et al., 2018), is to exploit the previous history of the model to extrapolate to a solution that is closer to equilibrium. One major advantage of AA in the context of spin-up is that it only requires the ability to integrate the model with a given initial condition and return the solution. Unlike MFNK, there is no need for a preconditioner (see below). For the spin-up problem this generally requires a transport matrix representation of the circulation (Khatiwala, 2008;Li & Primeau, 2008), which is rarely available, and a custom implementation tailored to each GCM and tracer problem, a nontrivial undertaking (Lindsay, 2017). Furthermore, and again in contrast with MFNK, the new proposed approach is demonstrated to work with multi-step time-stepping algorithms, such as leapfrog, that are common in ocean models. In that sense, this method is completely "black box," making it significantly easier to apply to a variety of GCMs and well suited to the batch HPC systems on which they are typically run.
In the next section, the theoretical underpinnings of AA are outlined, followed by details on the practical implementation. Next, the performance of AA is demonstrated by applying it to several tracer problems, including radiocarbon, ventilation tracers and 231 Pa/ 230 Th. The paper concludes with a summary and future directions for research.

Mathematical Formulation
We start with a mathematical statement of the problem. A numerical model can be expressed as a function g that takes in an initial tracer field x(0) at time t = 0, time-steps forward, and returns the tracer field x(T) at time t = T, where T is the forcing period: (0)). (1) To solve this system, which is not explicitly constructed but is implicitly defined via the model time-stepper, Merlis and Khatiwala (2008) proposed to apply MFNK. Recall that Newton's method is based on constructing a local linear model of the function f and then iterating (Dennis & Schnabel, 1996;Kelley, 1995): Given an initial iterate x 0 , for k = 0, 1, … until convergence Newton's method is attractive because, with a "good" initial guess, it converges quadratically and can be paired with a "globalization" method such as line search to find a global minimum. Much of the computational effort of this method is in solving the linear Newton equation: JΔx = −f(x). This is especially difficult for the current problem because the explicit computation and storage of the Jacobian matrix J is impractical since here this matrix is large and dense. For example,: J is O(30 TB) for a 1° resolution model with two tracers. For systems such as this it is natural to apply a Krylov subspace method like GMRES (Saad, 2003). The advantage of Krylov methods is that it is not necessary to explicitly form the coefficient matrix; only its action on a given vector is needed (Saad, 2003). Such a method is known as "matrix-free" and the combination of Newton with Krylov is called matrix-free or Jacobian-free Newton Krylov, an approach originally motivated by the solution of nonlinear ordinary and partial differential equations (Brown & Hindmarsh, 1986;Brown & Saad, 1990;Chan & Jackson, 1984;Gear & Saad, 1983;Knoll & Keyes, 2004). In the present case, the Jacobian-vector product can be accurately computed via finite differences, which only requires the ability to evaluate f(x), that is, integrate the model for one period: Krylov methods are iterative and work by expressing the solution in a small subspace that is built up through repeated evaluation of the product between the coefficient matrix and a given vector (Saad, 2003). They are thus only practical when the number of such evaluations, which for the present problem involves integrating the model for one period, is small. That is rarely the case and it is almost always necessary to precondition the matrix (Saad, 2003). In essence, the preconditioned coefficient matrix is closer to the identity matrix leading to faster convergence. Historically, MFNK has been applied to the solution of nonlinear systems derived from the discretization of partial differential equations (Knoll & Keyes, 2004). This results in sparse Jacobians for which preconditioners can usually be readily constructed. The fact that the Jacobian in the spin-up problem is dense, severely limits the choice to methods in which the preconditioner is applied implicitly, that is, by solving another linear system. Khatiwala (2008) and Li and Primeau (2008) proposed such an implicit preconditioner that could be applied to any generic biogeochemical model. With this preconditioner, Khatiwala (2008) showed that MFNK can accelerate spin-up by up to two orders of magnitude for a variety of biogeochemical models.
While MFNK can be very effective it has not been widely adopted by the modeling community (CESM is perhaps the only model in which it has been implemented for spinning-up radiocarbon (Lindsay, 2017)). There are several reasons for this. First, preconditioning requires a transport matrix for the ocean model (Khatiwala, 2008;Li & Primeau, 2008) and this is only available for a small handful of GCMs. Preconditioning also requires the ability to compute a Jacobian for the underlying (bio)geochemical model. Second, tracer conservation is not guaranteed and is tricky to ensure. Third, MFNK has not been shown to work with multi-step time stepping schemes, which are commonly used in ocean GCMs. For example, the Adams-Bashforth and leapfrog schemes require information from t − Δt and t to compute the solution at t + Δt, where Δt is the model time step. MFNK can be used with such models by providing a single initial condition, in which case the first time step will typically be an Euler step, with subsequent time steps using information from two or more previous steps. However, the equilibrium solution found by MFNK will be different from that obtained by direct time-stepping to steady state, that is, if the solution were inserted into the ocean model as an initial condition it will quite likely start drifting. While it is unclear how large or significant this drift would be in practice, the inconsistency may matter if the MFNK solution were, for example, used as an initial condition for a subsequent transient simulation. Lastly, the overall algorithm is cumbersome and difficult to apply, particularly when the model is run on batch HPC systems. This has required writing custom implementations tailored to the tracer problem and GCM (Lindsay, 2017).

Anderson Acceleration
An alternative approach to MFNK is to try to accelerate the original FP iteration, which generates a (slowly converging) sequence x k . Sequence acceleration or extrapolation methods seek to transform such sequences into ones that converge faster, an idea that has a rich history in numerical methods (e.g., Richardson extrapolation) (Brezinski, 2000;Brezinski et al., 2018;Smith et al., 1987). A simple modification of the FP algorithm illustrates the idea: Here, 0 < β ≤ 1 is known as the damping parameter. This scheme is known variously as "Krasnosel'skil Mann iteration," "averaged iteration," and "simple mixing" . Note that the iteration can also be written as x k+1 = x k + β(g(x k ) − x k ) = x k + βf(x k ), which shows that the new iterate is the current one plus a fraction β of the current residual. This simple modification can sometimes improve convergence at no extra cost and suggests that it may be possible to exploit the information contained in previous iterates to speed-up FP iteration.
Such a scheme was devised in the 1960s by D. G. Anderson (Anderson, 1965(Anderson, , 2019, who came up with an elegant approach based on taking a weighted average of several previous iterates such that, were g linear, the residual is minimized (Fang & Saad, 2009;Zhang et al., 2020). Known alternatively as AA, Mixing or Extrapolation, this method is also called "direct inversion in the iterative subspace" in computational chemistry and "Pulay mixing" in material science, the fields in which it first found wide application. Mathematically, this gives the iteration where the number of previous iterates is m k + 1 and the α j 's minimize the norm of the weighted residual f of those iterates: subject to the normalization ∑ =0 ( ) = 1 . Usually, this iteration is combined with "damping" to give: Also in practice, the constrained least-squares problem for the m k + 1 α j 's is replaced by an unconstrained one for m k γ i 's (Fang & Saad, 2009;Walker & Ni, 2011) so that the next iterate can be written as: The coefficients γ i are found by minimizing where ̃ is a matrix whose m k columns are Δf k−m , …, Δf k−1 and Δf k−m , …, Δf k−1 . The AA solution (Equation 6) can be seen as an extrapolation of the original iterate g(x k ) plus m k previous iterates . Note that substituting β k = 1 in Equation 5 gives the undamped step (Equation 4), and β k = 1, m k = 0 recovers the original FP iteration x k+1 = g(x k ).
Two aspects of AA are worth emphasizing. First, it has negligible overhead, with the computationally expensive part of the calculation being the cost of running the ocean model. Indeed, it can be run on the frontend of a HPC system or a single core of a compute node. This is in contrast to MFNK which has significant overhead and resource requirements for computing, factoring and applying the preconditioner. Second, tracer conservation is always satisfied as AA takes a linear combination of previous iterates (which are simply the outputs of the model).

Implementation
While the basic AA algorithm is quite straightforward, it is convenient to start with an existing implementation, of which there are several. For example, PETSc, a widely used numerical software library for solution of linear and nonlinear equations (Balay et al., 2022), has a full-featured AA implementation and other variations that fall under the category of nonlinear GMRES (Brune et al., 2015). However, its disadvantage is that petsc's architecture does not allow for checkpointing and restarts. This is essential for any practical implementation of AA that will be applied to the ocean spin-up problem. Instead, the "reference" implementation of Walker (2010) written in matlab is used as a starting point as it is well documented, compact, transparent, easy to modify and port to other languages such as python.
In most AA implementations, the algorithm has two main parameters, the damping parameter, β k , and the maximum memory parameter, m max , the maximum number of previous iterates stored from which the new iterate is computed. Typically, these are constant, but in so-called non-stationary AA they can vary over time (Chen & Vuik, 2022a, 2022b. Note that AA starts to be applied immediately after the first iteration, and at any iteration k there will be m k = min(k, m max ) iterates in memory from which the new iterate is constructed. When m k reaches m max , the oldest iterate in memory (on the LHS of ̃ , the coefficient matrix for the least-squares problem (Equation 7)) is discarded and the latest one added (to the RHS of ̃ ). In Walker's implementation, at each iteration, ̃ is monitored and if found to be poorly conditioned, the oldest vector is discarded. In practice, the QR factorization of ̃ is used to solve the least squares problem, and each time the matrix is updated the factorization is also updated without recomputing it using MATLAB's qrdelete function (Walker, 2010). Termination of the algorithm is based on ‖f(x k ) = g(x k ) − x k ‖ 2 going below a specified tolerance or the maximum number of specified iterations being reached.
For the spin-up problem, Walker's original implementation has been modified to incorporate a number of additional features. These include: 1. Encapsulation of the entire algorithm state in a struct data structure to facilitate checkpointing and flexibility in building workflows tailored for different problems and HPC systems; 2. Checkpointing to enable running on HPC machines with batch submission systems; 3. Hooks to signal convergence based on external criteria (e.g., the air-sea flux of CO 2 meeting the OMIP criterion); 4. The ability to restart AA by "zeroing" the memory periodically, if the algorithm stagnates, or because of an external signal based on a user-specified condition; 5. The ability to run multiple instances of AA to spin-up multiple independent tracers (or independent sets of tracers) simulated simultaneously in the model (the tracers/tracer sets can have different termination tolerances which can take a different number of iterations to reach); and 6. The facility to scale different tracers in a multi-tracer problem so that they have the same order of magnitude.
It is important to note that with this implementation, the algorithm does not have to be resident in memory, that is, run continuously (as is the case with other implementations, e.g., in petsc). This is crucial when running on batch submission systems. A python version (which can also be called from matlab) is under development and is currently being tested.
To use the code, a user must supply: • A driver routine that reads in the initial iterate and calls AA. • A "wrapper" (the g function seen by AA) around the model to exchange data between AA and the model (the initial condition on input, the state after running the model for one period on output). • An optional function that checks for convergence and passes back that information to AA via the wrapper.
The details of the driver and wrapper function will depend on the model and computing system on which the calculations are performed. Figure 1 illustrates a typical workflow on a batch HPC system but the implementation of AA presented here is general enough to allow a variety of different workflows.

Examples
In the following, the application of AA to spinning-up a variety of geochemical and ventilation tracers is illustrated (see Table 1 for a summary). For each case the number of simulated years required to reach equilibrium using AA is compared with that for direct integration (DI). Also compared are the final steady state solutions. Unless stated otherwise, AA was used with β = 1 (no damping), m max = 50 and no restarts. And with one exception, all examples were run on HPC systems with either the SLURM or PBS schedulers. The exception is the first example which was run on a local server as it involves a model that does not require parallel computing resources.

Example 1: Online Spin-Up of Abiotic Carbon Cycle Model
The first example concerns the spin-up of an OCMIP-type abiotic carbon cycle model (Orr et al., 1999) running online within the University of Victoria Earth System Climate Model (UVic ESCM) (version 2.9) (Weaver et al., 2001). This particular configuration of UVic ESCM was tuned to preindustrial conditions (Khatiwala et  Schematic of workflow for using the implementation of Anderson Acceleration presented here on a HPC machine with a batch queuing system. The sequence starts by submitting the driver to the queue. This calls the AA algorithm which in turn, after saving the full current state of the algorithm to a checkpoint file, calls the wrapper function in "run mode." The wrapper function maps the input vector to model fields, calls a script that submits both the model and driver to the queue, and then returns control to AA, which then terminates (as mentioned in the text, it is not necessary for the driver or AA algorithm to be resident in memory). As seen in the example script in the figure, execution of the driver is conditional on successful completion of the model run. Once that happens, the driver relaunches itself and calls AA, which restores the state of the algorithm by reading the checkpoint file and then calls the wrapper in "read mode." The wrapper reads the model output, remaps it to a vector and returns the solution back to AA along with (optional) convergence information. AA computes the next iterate and calls the wrapper again in "run mode." And so on. This sequence is repeated until convergence. The example script shown is for the SLURM scheduling software, but PBS and other systems can be similarly used. ochemical model, MOBI (Model of Ocean Biogeochemistry and Isotopes; Schmittner and Somes (2016)), it can be trivially reduced to a simple abiotic model for DIC and 14 C in DIC (DI 14 C) by switching off the biological source/sink terms with C preprocessor directives. DI 14 C is coupled to DIC via the air-sea gas exchange term, which is parameterized in a manner similar to the OCMIP-2 and OMIP protocols (Orr et al., 1999(Orr et al., , 2017, although with different gas transfer coefficients. The ocean GCM component of UVic ESCM has a resolution of 1.8° × 3.6° and 19 vertical layers. Tracers are time-stepped with a leapfrog scheme requiring two initial conditions. As is typical of ocean GCMs and climate models, at the start of a run UVic ESCM reads initial conditions from a (NetCDF format) restart file, if one is available. The file contains all the physical (dynamical) and tracer fields required to continue a previous integration. Here, a restart file from a previous dynamical model spin-up (Muglia & Schmittner, 2015) is used. At each iteration of AA, the DIC and DI 14 C fields (for both leapfrog time steps) in this file are overwritten by those provided by the AA algorithm (via the wrapper function). At the end of the (year long) integration, the final DIC and DI 14 C fields for both leapfrog time steps (written to another restart file) are read by the wrapper, remapped to a vector, and passed back to the AA algorithm. Figure 2 (top) shows the air-sea flux of CO 2 for DIC and the fraction of ocean volume with a radiocarbon drift of <0.001‰ per year versus the number of years of integration. The horizontal lines are the respective OMIP criteria for equilibrium. Evidently, with DI it takes ∼5,200 years to reach the OMIP criteria for air-sea flux, and ∼6,500 years for radiocarbon drift. In contrast, AA requires ∼450 and ∼470 years, respectively, to reach those criteria, implying a speed-up of over a factor of 11 for the coupled system. The bottom plots compare the equilibrium DI and AA solutions, showing that AA reproduces the DI solution. Note that both leapfrog time steps are plotted. An additional experiment (not shown) was performed in which AA was applied to the same carbon cycle model run offline via the TMM (Khatiwala, 2007(Khatiwala, , 2018Khatiwala et al., 2005) with transport matrices (TMs) from the same UVic ESCM configuration used for the above online simulations. The TMM requires only a single initial condition to be specified, leading to a problem that is half the size of that with AA applied online with leapfrog. No noticeable difference in performance was found between the online and offline cases. To the extent that these results can be generalized, it suggests that AA can be successfully applied without loss of performance to find equilibrium solutions of GCMs with multi-step time-stepping schemes. Moreover, the solutions would be fully consistent with the numerics of the model.

Example 2: Abiotic Carbon Cycle With ECCO State Estimates
To demonstrate that AA performs well at even significantly higher spatial resolution, a similar abiotic model as above was spun-up, but simulated offline using the TMM. For this, TMs extracted from the "Estimating the Circulation and Climate of the Ocean" (ECCO) version 4 (release 4) ocean state estimate (Forget et al., 2015) were used. ECCO is based on fitting the MITgcm model (Marshall et al., 1997) to a variety of observations using an adjoint approach to derive a dynamically-consistent ocean state estimate (Stammer et al., 2004;Wunsch & Heimbach, 2007). Version 4 uses observations between 1992 and 2017, and a MITgcm configuration with a "latitude-longitude-cap" grid (LLC90) with horizontal resolution ranging from 22 to 110 km and 50 vertical levels. To further assess the impact of resolution on the performance of AA, a second spin-up experiment was performed using TMs extracted from the MITgcm ECCO-GODAE ocean state estimate (Stammer et al., 2004;Wunsch & Heimbach, 2007). This version, which was constrained to observations between 1992 and 2004, has a lower resolution of 1° × 1° and 23 vertical levels. In both cases, TMs representing a monthly mean climatology over the estimation period were used. The OCMIP-2/OMIP abiotic carbon cycle model was forced with 6-hourly winds from the CORE-II atmospheric reanalysis (Large & Yeager, 2004), and temperature, salinity and sea ice concentration from the corresponding state estimate. TMs and forcing fields were interpolated to the current time step before being applied.
With ECCO-v4 (top row of Figure 3), direct integration takes ∼4,300 years and ∼8,000 years to reach the OMIP CO 2 flux and radiocarbon drift criteria, respectively, while AA requires ∼350 years and ∼470 years, respectively, to meet them. AA is thus faster by an overall factor of ∼12. With the lower resolution ECCO-GODAE configuration, the corresponding times are ∼5,400 years and ∼7,200 years for DI, and ∼350 years for meeting both criteria with AA, a speed-up of ∼15. Resolution thus does not seem to significantly impact the performance of the method. For completeness, Figure 4 shows that for ECCO-v4 the equilibrium AA solution agrees very well with the DI solution.

Example 3: Ideal Age
Next, AA is applied to the ideal age tracer (England, 1995;Holzer & Hall, 2000;Thiele & Sarmiento, 1990), a passive tracer with a zero surface boundary condition and an interior source of "1" representing aging. The   steady state solution is the "mean age," the average time since a water parcel was last in contact with the surface (Holzer & Hall, 2000), a widely used metric of ocean ventilation time scales. The ideal age was simulated with the TMM using TMs extracted from the MITgcm ECCO-GODAE ocean state estimate (see above). Figure 5 (left) comparing direct integration and AA shows that the former takes ∼4,500 years to reach the equivalent OMIP criterion for radiocarbon age drift, while the latter takes ∼200 years. This is a speed-up by a factor of ∼22. The right panel compares the two equilibrium solutions. AA large reproduces the DI solution with a RMS difference of ∼1.85 years. The solutions differ at a few isolated grid points where the model evolves slowly due to weak exchange with surrounding waters. At those points the AA mean age is systematically older than the corresponding DI values, suggesting that the DI solution hasn't fully equilibrated.

Example 4: Preformed Tracers
A "preformed" tracer is a conservative tracer whose concentration is set at the sea surface and is passively transported into the interior by ocean circulation. Such tracers are often used to study and quantify the strength of ocean carbon pumps by propagating surface distributions of nutrients, dissolved oxygen (O 2 ) and DIC (Ito & Follows, 2005;Ito et al., 2004;Khatiwala et al., 2019;Lauderdale et al., 2013;Williams & Follows, 2011). For this example, preformed PO 4 and O 2 were spun-up using monthly mean fields from World Ocean Atlas 2018 (WOA18; Garcia et al. (2018)) as surface boundary conditions. The boundary conditions were propagated into the ocean interior using the TMM with TMs extracted from the same configuration of UVic ESCM as above. AA was applied separately to each tracer (until a specified tolerance for the norm of the residual was reached), but both tracers were simulated simultaneously. That is, two instances of AA were run within the same overall driver. Figure 6 shows the results. Unlike the previous examples, here and in the following set of examples, there is no physical criterion for "convergence." Such tracers are typically integrated for several thousand years Lauderdale et al., 2013), at which point the solution is considered to be in equilibrium. Therefore, the number of iterations required to achieve the same residual norm as that reached by direct integration after a specified number of years is used to assess AA's performance. (Recall that the residual norm is the norm of f(x), the difference between the initial condition x and the solution after 1 year of integration.) If this is 4,000 years, AA requires 260 years for O 2 and 160 years for PO 4 , a speed-up of ∼15 and ∼25, respectively.
This example illustrates how the AA implementation described here can be used to spin-up multiple, independent tracers simultaneously. A problem where this facility can be particularly advantageous is that of computing the distribution of source water fractions, that is, the fraction of water at any point in the ocean that was last in contact with a given surface patch (e.g., Haine & Hall, 2002;Khatiwala et al., 2001;Primeau, 2005). The problem is in fact quite similar to that of preformed tracers, with the difference being that the boundary condition is fixed at "1" on the patch in question and "0" on the rest of the ocean surface. The steady state solution is the source water fraction. Like the mean age, water mass fractions are an important and widely used metric of ocean ventilation Horizontal line is the OMIP radiocarbon criterion for equilibrium (Orr et al., 2017). Right: Comparison of the AA equilibrium solution (vertical axis) with that computed by direct time integration (horizontal axis).
but aren't routinely simulated because of the need for extended integrations to capture the ocean's long diffusive time scales (Holzer & Primeau, 2006;Khatiwala et al., 2012;Primeau, 2005). It should be noted that there is value in the directly integrated solution to this problem. The time derivative of the transient solution (Haine & Hall, 2002) is the so-called "boundary propagator" (BP), a type of Green's function for the advection-diffusion equation that can be interpreted as a probability density function of the time and location of last surface contact for any water parcel (Holzer & Hall, 2000). In addition to its intrinsic value in rigorously characterizing ocean circulation, the BP has been used to estimate uptake of anthropogenic carbon (Khatiwala et al., 2009) and heat (Zanna et al., 2019) by the ocean. The 0 th moment of the BP is the water mass fraction and the first moment is the mean age (e.g., Waugh et al., 2003). There is work underway to investigate whether AA can be used to compute the moments of the BP efficiently, and if those moments can be used to approximate the full boundary propagator.

Example 5: Protactinium and Thorium Isotopes
In this and the next example two geochemical problems are considered. The first involves the tracers 231 Pa and 230 Th, whose ratio is widely used as a paleoproxy for the strength of the Atlantic Meridional Overturning Circulation (McManus et al., 2004;Yu et al., 1996). These particle reactive tracers are produced by uranium decay at (different) constant, spatially-uniform rates in the ocean, and in turn undergo radioactive decay. They are absorbed onto and desorbed from sinking particles in a process termed reversible scavenging (Bacon & Anderson, 1982). Weaker scavenging of 231 Pa relative to 230 Th causes it to be advected further and have a longer residence time than the latter (Henderson & Anderson, 2003;Yu et al., 1996), which is the basis for the use of the ratio of these tracers as a circulation proxy. These tracers are now implemented in many ocean and climate models used for paleoclimate studies (e.g., Gu & Liu, 2017;Missiaen et al., 2020;Rempfer et al., 2017;Sasaki et al., 2022;van Hulten et al., 2018). The two tracers have also been incorporated into MOBI, the biogeochemical model embedded within UVic ESCM (see above), as a separate module. A full description, which is in the process of being written for publication elsewhere, is beyond the scope of this paper but, briefly, their implementation closely follows Siddall et al. (2005). Scavenging from four different particle types (particulate organic carbon, opal, calcium carbonate and lithogenic particles) are treated, with particle concentration fields taken from Siddall et al. (2005). (The implementation also allows particle concentration fields to be taken directly from those simulated simultaneously by MOBI, but that feature is not used here.) Scavenging coefficients are from Hayes et al. (2015). MOBI and the Pa/Th module can be run either online within UVic ESCM or offline via the TMM (in which case TMs from any ocean model can be used). Here, for computational efficiency it was run via the TMM using TMs from UVic ESCM as described above. Both tracers were spun-up simultaneously.
As in the above example, in the absence of physical criteria for convergence the residual norm during AA iterations is monitored. As seen in Figure 7 (top), both tracers reach equilibrium relatively quickly, with direct integration taking ∼3,000 years for 231 Pa and even less for 231 Pa. Since both tracers are almost always simulated simultaneously the residual norm at 3,000 years is taken as convergence criteria. Using AA this takes ∼325 years for 231 Pa and ∼130 years for 230 Th, that is, AA offers a speed-up by a factor of ∼10. The bottom plots show there is very good agreement between the DI and AA solutions.

Example 6: Zinc Cycling Model
Lastly, a model of the oceanic cycle of zinc (Zn), a micronutrient important for phytoplankton growth (Morel et al., 2014;Vance et al., 2017), is spun-up. The model is that of de Souza et al. (2018) (see also Vance et al. (2017)) in which the biological uptake of Zn is linked to that of phosphate (PO 4 ) via a stoichiometric param eter that is a nonlinear function of the Zn concentration. The uptake of PO 4 is in turn diagnosed by restoring the surface concentration of PO 4 (Najjar et al., 2007) toward a seasonally-evolving climatology from WOA18 (Garcia et al., 2018), with a fraction of the uptake instantaneously converted to dissolved organic phosphorus (DOP). The remaining uptake (along with that of Zn) is exported out of the euphotic layer as a particulate flux which is remineralized with depth according to a power law (Martin et al., 1987). The model thus consists of three coupled tracers. The model is coupled to the TMM and, as in de Souza et al. (2018), run with TMs from a MITgcm configuration with a resolution of 2.8° × 2.8°× 15 levels.
In this example, to investigate the effect of the maximum memory parameter on AA's performance, three different values of m max were tried: m max = 50 with no restart as in the previous examples; and m max = 30 and m max = 40, both with restarts when the maximum memory was reached. As is evident in Figure 8 (top), the latter two performed better than m max = 50 without restart. While one might naively think that the more information retained for AA to exploit the better its performance, this example shows that is not necessarily the case. Retaining more iterates may lead to out-of-date information being used, degrading performance (Walker, 2010). Or it may lead to poor conditioning of the least-squares problem as redundant information is added to ̃ .
Using the year 3,000 residual norm with direct integration as a convergence criteria, AA with m max = 30 takes ∼440 years for both PO 4 and Zn, a factor of ∼7 speed-up. Regardless of the value of m max , AA converges to the same equilibrium solution, which agrees well with DI (bottom plots).
In this problem, the inventories of Zn and total phosphorus (the sum of PO 4 and DOP) should be conserved (within numerical accuracy). Reassuringly, the global mean concentration of Zn (phosphorus) for the AA solution is found to deviate from its initial value by ∼−0.058% (∼−0.04%), while that for the DI solution by ∼−0.047% (∼−0.033%). This demonstrates another important aspect of AA, namely the ability to conserve tracer.

Summary and Future Directions
In this study a new method for spin-up of passive tracers in periodically-forced ocean models is described and applied to several widely simulated geochemical and ventilation tracers. The new approach, based on a sequence acceleration technique called AA or Mixing, offers speed-ups of between 10 and 25 times over conventional direct time integration. Also described is an implementation that is tailored for the spin-up problem and designed to work on the batch HPC systems on which ocean GCMs and climate models are typically run. The algorithm has two main tunable parameters, the damping parameter β and the maximum memory parameter m max . For the problems treated here, β = 1 and m max = 50 were found to work well. But their optimal values are likely to depend on the specific tracer problem and spatial resolution of the underlying ocean GCM and some experimention on the problem at hand will likely be needed to find them.
While for some problems MFNK, another proposed approach to the spin-up problem (Khatiwala, 2008;Li & Primeau, 2008;Merlis & Khatiwala, 2008), may perform better, AA offers a number of advantages. Unlike MFNK, AA is completely black box, requiring no preconditioners and thus no TMs for the underlying GCM; it has been demonstrated to work on models with multi-step time stepping schemes; by construction, tracer conservation is ensured; and it is well suited to batch HPC systems, a particularly complicated aspect of using MFNK (Lindsay, 2017).
The promising results shown by AA in this study suggest a number of avenues for future research. An obvious one is whether AA can be applied to more complex biogeochemical models. Preliminary experiments with a typical NPZD-type model with several interacting tracers have yielded promising results, with a factor of 10 speed-up over direct integration. However, more refinement is needed before the method can be applied routinely to such models. For example, AA does not guarantee that the new iterate generated from extrapolating previous ones is physically sensible. In the present context, tracer concentrations should be non-negative or not exceed their plausible range. While this was always the case in the examples presented here and the initial NPZD model experiments, for other biogeochemical problems it may require adding safeguards or constraints on the least-squares solver. Biogeochemical problems are also often stiff, that is, involve multiple time scales. This could potentially lead to loss of performance or even stagnation of the method. The non-stationary variants of AA that have been recently proposed (Chen & Vuik, 2022a, 2022b) may help ameliorate this behavior. In such schemes, β k and m k are adjusted over time, possibly dynamically (by solving an optimization subproblem) (Chen & Vuik, 2022a). Such variants come at the cost of additional function evaluations but may help stabilize AA for more complex problems.
A second avenue is to explore whether AA can be combined with MFNK to overcome some of the latter's drawbacks. In particular, Eyert (1996) and Fang and Saad (2009) have shown that there is a deep connection between quasi-Newton (QN) methods and AA (and other acceleration methods). In quasi-Newton, the Jacobian is not recalculated at each iteration but is "updated" by using information from previous iterations. A well known method for this is due to Broyden who came up with a remarkable scheme to modify the Jacobian (or its inverse) via a low rank update (which requires storing just a few vectors) (Brown & Brune, 2013;Fang & Saad, 2009;Nocedal & Wright, 2006). Such "limited memory" QN methods are widespread in scientific computing. The relation between AA and QN can be seen by writing Equation 6 as: where ̃ is a matrix whose m k columns are Δg k−m , …, Δg k−1 , with Δg i = g(x i+1 ) − g(x i ) (Walker, 2010). Substituting the normal equations solution to the least squares problem ‖ −̃( ) ‖ 2 2 , and rearranging, AA can be written as: Comparing this with Newton we see that AA implicitly constructs an approximate inverse Jacobian that is a rank m update to -I. An interesting question is whether this can be exploited in some way. One possibility is to insert a few iterations of AA between each iteration of MFNK, using the former to precondition the inner GMRES iterations within the latter. Updating Jacobians via low-rank updates is a common strategy in quasi-Newton (Brown & Brune, 2013) and, as envisioned here, the preconditioner can be applied efficiently and recursively by storing just a few vectors (Nocedal, 1980;Nocedal & Wright, 2006).
A third direction is to investigate whether AA can be used to accelerate the dynamical equilibration of seasonally-forced ocean models. Indeed, this problem was one of the primary motivations for the development of the TMM (Khatiwala et al., 2005). And it was also the context in which MFNK was first applied to the ocean spin-up problem (Merlis & Khatiwala, 2008). One could either apply AA to the full model state or, simpler and as proposed by Khatiwala et al. (2005), interleave equilibration by AA of active tracers (temperature and salinity) with conventional direct integration to adjust the dynamical (velocity and pressure) fields.
In the examples discussed here the forcing and underlying ocean circulation were perfectly periodic (ensured by repeatedly using a single year's forcing and circulation). While this is quite reasonable for many situations, as pointed out by an anonymous reviewer, care will be needed when using AA (or MFNK) with climate models that exhibit intrinsic interannual variability, even when they are nominally in equilibrium. In such cases, it may be important to account for this variability in computing the equilibrium solution. One possibility is to use a longer repeat cycle for the forcing/circulation, that is, each AA iteration would include several years, but this will naturally reduce the speed-up. Another is to use a 1 year integration in each AA iteration while forcing the tracer model with the (time-dependent) forcing and circulation from the climate model. Each iteration would thus experience a slightly different forcing/circulation, which may be regarded as "noise" in the tracer model. There is evidence that AA can be successfully applied to such problems (e.g., Toth et al., 2017). The motivation for this approach is that the equilibrium solution found by AA will at least in part incorporate the impact of variability. It would be interesting to see to what extent this is borne out in practice.
Lastly, within ocean models there are a number of components that require efficient, scalable and robust solvers. One such is the sea ice component where the complex, nonlinear physics remains a challenge for the iterative schemes (e.g., MFNK) currently being used (Kimmritz et al., 2017;Lemieux et al., 2012;Losch et al., 2014). Anderson Acceleration may be just the tip of the iceberg in terms of sequence acceleration methods (Brezinski et al., 2018) that may be worth pursuing.

Data Availability Statement
The Anderson Acceleration code described in this study is available from https://github.com/samarkhatiwala/ AndersonAcceleration. The TMM software and associated data required to perform the simulations presented here are available from Khatiwala (2018)