Ocean‐Only FAFMIP: Understanding Regional Patterns of Ocean Heat Content and Dynamic Sea Level Change

There is large uncertainty in the future regional sea level change under anthropogenic climate change. Our study presents and uses a novel design of ocean general circulation model (OGCM) experiments to investigate the ocean's response to surface buoyancy and momentum flux perturbations without atmosphere‐ocean feedbacks (e.g., without surface restoring or bulk formulae), as part of the Flux‐Anomaly‐Forced Model Intercomparison Project (FAFMIP). In an ensemble of OGCMs forced with identical surface flux perturbations, simulated dynamic sea level (DSL) and ocean heat content (OHC) change demonstrate considerable disagreement. In the North Atlantic, the disagreement in DSL and OHC change between models is mainly due to differences in the residual (resolved and eddy) circulation change, with a large spread in the Atlantic meridional overturning circulation (AMOC) weakening (20–50%). In the western North Pacific, OHC change is similar among the OGCM ensemble, but the contributing physical processes differ. For the Southern Ocean, isopycnal and diapycnal mixing change dominate the spread in OHC change. In addition, a component of the atmosphere‐ocean feedbacks are quantified by comparing coupled, atmosphere‐ocean GCM (AOGCM) and OGCM FAFMIP experiments with consistent ocean models. We find that there is 10% more AMOC weakening in AOGCMs relative to OGCMs, since the extratropical North Atlantic SST cooling due to heat redistribution amplifies the surface heat flux perturbation. This component of the atmosphere‐ocean feedbacks enhances the pattern of North Atlantic OHC and DSL change, with relatively stronger increases and decreases in the tropics and extratropics, respectively.


Introduction
A rise in global mean sea level is a robust feature of projected anthropogenic climate change from state-ofthe-art atmosphere-ocean general circulation models (AOGCMs) (Church et al., 2013;Slangen et al., 2014). Projected global mean sea level rise is largely due to a net ocean heat uptake, leading to thermal expansion, and total ocean mass increase due to reduced terrestrial water and ice storage (Church et al., 2013). However, there has been considerable disagreement among AOGCMs contributing to the Coupled Model Intercomparison Project, Phase 5 (CMIP5; Taylor et al., 2012) on the more policy-relevant regional patterns of sea level change (Bouttes & Gregory, 2014;Bouttes et al., 2012;Church et al., 2013;Yin, 2012). Part of the large disagreement in regional sea level projections stems from model structural differences, such as the variety of spatial grids and subgrid-scale parameterizations. These structural differences contribute to a spread in model performance at simulating regional ocean processes. Multimodel ensembles such as CMIP5 allow for a better quantification of the uncertainty associated with regional sea level change. However, the optimal method for selecting or rejecting models to reduce uncertainty remains unclear (Knutti et al., 2010). Air-sea buoyancy and momentum flux changes are coupled to ocean dynamic and thermodynamic changes and play an important role in modulating regional sea level change (Bouttes & Gregory, 2014;Lowe & Gregory, 2006).
Dynamic sea level (DSL) change is a useful metric for examining the ocean processes that modulate regional sea level. DSL is defined as ζ ¼ η − η, where η is the sea surface height relative to a particular geopotential surface and · represents a global mean . Hence, DSL change, Δζ ¼Δη − Δη, has a global mean equal to zero by construction. Changes in depth-integrated ocean circulation directly contribute to Δζ via a barotropic component. Circulation change is also strongly coupled to temperature and salinity changes, which affects density, and contributes to Δζ through a baroclinic component. Typically, at middle and high latitudes, the baroclinic component of Δζ has a much larger magnitude than the barotropic component (Lowe & Gregory, 2006).
Coupled AOGCMs generally simulate qualitatively consistent Δζ responses to greenhouse gas forcing in three regions. Reduced heat loss and increased precipitation over the high-latitude North Atlantic inputs buoyancy, weakens the Atlantic meridional overturning circulation (AMOC) and leads to a meridional Δζ dipole. This Δζ dipole is characterized by relative sea level increases and decreases over the subpolar and subtropical gyres, respectively (Bouttes et al., 2013). Over the North Pacific, an opposite Δζ dipole is simulated, due to relatively enhanced heat uptake over the subtropical gyre, and increased zonal wind stress, which accelerates the gyre circulation (Yin et al., 2010). In the Southern Ocean, a similar Δζ dipole is evident, with relative sea level increases and decreases north and south of the Antarctic Circumpolar Current (ACC), respectively. Increased buoyancy input at high Southern Ocean latitudes is advected northward via Ekman transport . This Ekman transport of relatively low-density water is further enhanced due to increased westerly wind stress and a poleward shift of the winds over the ACC, leading to the meridional DSL change dipole (Bouttes et al., 2012;Lowe & Gregory, 2006;Marshall et al., 2015;Saenko et al., 2015;Spence et al., 2014). The North Pacific and Southern Ocean features are common in a majority of models, with weaker agreement for the North Atlantic . More recent, CMIP Phase 6 (CMIP6) simulations generally indicate similar features in projected regional DSL  and AMOC (Weijer et al., 2020) change relative to previous generations of AOGCMs. There is little consensus on the rate at which regional anthropogenic Δζ will emerge from natural variability due to disagreement in the unperturbed and forced interannual variability, and uncertainty in the sensitivity of ocean dynamics to surface forcing (Lyu et al., 2014).
In order to investigate the spread in DSL projections under greenhouse gas forcing, Gregory et al. (2016) devised the Flux-Anomaly-Forced Model Intercomparison Project (FAFMIP), a novel set of AOGCM experiments and diagnostics to contribute toward CMIP6. Part of the spread in DSL projections from AOGCMs arises from the global and local ocean dynamical and thermodynamical response to greenhouse gas forcing, leading to different patterns of surface flux changes (Bouttes et al., 2012). The FAFMIP experiments involve prescribing time-independent (except for a seasonal cycle) surface buoyancy and momentum flux perturbations (presented in Figure 1 and discussed further in section 2) to an ensemble of several different AOGCMs. The surface perturbations imposed are the same in all models, so this framework estimates the spread in DSL change uncertainty due to the model response to a given surface forcing (heat, freshwater, or momentum). A further experiment involves applying the buoyancy and momentum flux perturbations simultaneously. Comparing the response to this simultaneous perturbation with the sum of the responses to the individual perturbations, the nonlinear response to heat, freshwater, and momentum flux changes can also be diagnosed.
A limitation of the coupled FAFMIP design is that an additional and unknown atmosphere-ocean feedback is introduced: Air-sea perturbations are imposed, which lead to ocean circulation changes; these changes in ocean circulation can contribute to sea surface temperature (SST) changes, which in turn affects the atmosphere (Gregory et al., 2016). This unknown coupled feedback is problematic, because the main aim of FAFMIP is to isolate the role of ocean model uncertainty (in order to understand the spread in regional patterns of ocean heat uptake and thermosteric sea level). It is a particular issue in regions where the ocean circulation is tightly coupled to the surface buoyancy flux (Delworth & Greatbatch, 2000).
Ocean modeling advances have occurred since the original FAFMIP protocol was designed (Marshall et al., 2015;Zika et al., 2018), which enable a control ocean state to be simulated with minimal drift and without surface restoring. This motivates a revisit of the FAFMIP experiments using an ocean-only framework, in order to further investigate the ocean processes that affect DSL change. This study presents an ocean-only FAFMIP investigation, building upon and complementing the AOGCM analysis of Gregory et al. (2016). Here, we use an ensemble of five ocean general circulation models (OGCMs) and two AOGCMs with ocean components from the OGCM ensemble. Two aims motivate our study of the ocean's role in future DSL change.
The first aim is the one that motivates FAFMIP, namely, to examine how much of the spread in regional patterns of heat content change and Δζ in coupled AOGCMs is due to the use of different ocean models. Individual OGCMs simulate a range of background states, use a variety of spatial grids, and incorporate different subgrid-scale parametrizations, with varying biases relative to the observed ocean state (Flato et al., 2013). The ocean-only FAFMIP extends the comparison by including models that are not used in CMIP5. The ocean-only framework provides a clearer assessment of the direct ocean response to surface buoyancy and momentum flux perturbations relative to the coupled framework. show the surface heat (faf-heat and faf-heat-o) and surface freshwater (faf-water) flux perturbations, respectively. Note that the surface heat flux perturbation over the Barents and Kara seas is reset to zero. In (c), colors indicate the magnitude, and arrows the direction, of the surface momentum flux perturbation (faf-stress). All flux perturbations are defined as positive downward, from the atmosphere to the ocean. In (d), a schematic demonstrates the surface flux perturbations applied in each experiment.
The second aim of this study is to quantify the effect of atmosphere feedbacks on ocean climate change. In ocean-only FAFMIP experiments, no surface restoring or bulk formulae for ocean-atmopshere coupling is applied in the forced scenarios. In coupled FAFMIP experiments, the atmosphere responds to changes in sea surface conditions simulated by the ocean, producing a coupled feedback. In particular, the applied heat flux perturbation induces a weakening of the AMOC, leading to a surface cooling in the North Atlantic, and hence an increase in the heat flux into the ocean, as a positive feedback ( Figure 2). By comparing AOGCM and OGCM simulations performed with an identical ocean model, coupled feedbacks, similar to the one previously described, can be quantified, because they do not occur in the ocean-only FAFMIP experiments.
The OGCM and AOGCM FAFMIP methods are described in section 2. Section 3 presents an analysis of the ocean circulation, heat content, and DSL change in the surface heat flux perturbation experiments. Section 4 extends the analysis to the surface freshwater (faf-water) and momentum (faf-stress) flux perturbation experiments, in addition to the simultaneous surface flux perturbation experiments. Finally, we conclude in section 5.

Ocean and AOGCMs
Five OGCMs (MITgcm, GFDL-MOM5, ACCESS-OM2, HadOM3, and NEMO3.4) and two coupled AOGCMs (HadCM3 and CanESM5) are used in this study. Model acronyms, forcing data, and technical details are presented in Table 1. HadOM3 and NEMO3.4 are the ocean components to HadCM3 and CanESM5, respectively. GFDL-MOM5 and ACCESS-OM2 are two different configurations of the NOAA-GFDL Modular Ocean Model, Version 5 (MOM5) (Griffies, 2012). ACCESS-OM2 uses the CICE5 sea ice model (Hunke et al., 2015), instead of the standard MOM5 sea ice model in GFDL-MOM5. Differing parametrization choices are also applied in GFDL-MOM5 and ACCESS-OM2 Kiss et al., 2020). Several further AOGCM FAFMIP simulations have been performed using CMIP5 (Gregory et al., 2016) and CMIP6 generation models. However, assessing these coupled simulations is beyond the scope of this study since matching OGCM simulations are currently unavailable.
Among the OGCM ensemble, the horizontal grid resolution is nominally between 2.8°and 1°latitude × longitude, with vertical grids using between 15 and 50 irregularly spaced levels, with level thickness increasing with depth. All OGCMs use the Gent and McWilliams (1990) (GM) eddy parametrization scheme to represent subgrid, mesoscale eddies. MITgcm, GFDL-MOM5, ACCESS-OM2, and HadOM3 implement a skew-flux closure (Griffies, 1998) of the GM scheme. In addition, MITgcm, HadOM3, and NEMO3.4 use the Visbeck et al. (1997) scheme to estimate the eddy coefficient from the diagnosed Eady growth rate. Both AOGCMs include a thermodynamic-dynamic sea ice model, while the treatment of sea ice in the OGCMs, alongside other parametrization choices, is discussed further in Appendix A.
To produce statistically equilibrated initial conditions, the OGCMs except for HadOM3 are integrated for several thousand years with either a prescribed monthly climatology (MITgcm) or daily varying (ACCESS-OM2 and NEMO3.4) air-sea heat, Q in W m −2 ; freshwater, W in kg m −2 s −1 ; and momentum fluxes, τ in Pa, or a prescribed atmospheric climatological state from which these fluxes are estimated via bulk formulae (GFDL-MOM5). The MITgcm and NEMO3.4 surface conditions are derived from AOGCM preindustrial control simulations. The ACCESS-OM2 and GFDL-MOM5 surface conditions, JRA55-do (Tsujino et al., 2018) and COREv2 (Large & Yeager, 2009) forcing data, respectively, are representative of late twentieth century observations. Following Huber and Zanna (2017), in MITgcm the surface layer is relaxed to climatologies of SST (θ * ) and sea surface salinity (S * ) on time scales of 60 and 90 days, respectively. A similar restoration of SST and SSS in NEMO3.4 and ACCESS-OM2 is also applied ( Table 1). For the coupled AOGCMs (HadCM3 and CanESM5), initial conditions are obtained from a long-running spinup simulation with prescribed preindustrial control greenhouse gas concentrations and aerosol forcing. By definition, the HadOM3 initial conditions are identical to HadCM3, since both branch from an identical coupled spinup simulation.

FAFMIP Control Experiment: faf-passiveheat
The control experiment, with no prescribed external forcing, is termed faf-passiveheat, following the CMIP6 nomenclature. The faf-passiveheat simulations for HadCM3 and CanESM5 are performed by continuing the respective spinup simulations for a further 70 years. For HadOM3, daily atmosphere-ocean and sea ice-ocean buoyancy and momentum fluxes from the HadCM3 faf-passiveheat simulation are prescribed directly to the HadCM3 ocean component with no atmospheric or sea ice coupling. Hence, the HadOM3 and HadCM3 faf-passiveheat simulations are identical. The faf-passiveheat simulations in the other  (Chylek et al., 2011) preindustrial control (piControl, Taylor et al., 2012 following (Huber & Zanna, 2017 Preindustrial control Gordon et al. (2000) OGCMs are performed by first diagnosing and saving 70 years of high temporal frequency (6-hourly in ACCESS-OM2 and NEMO3.4 and daily in MITgcm and GFDL-MOM5) surface fluxes of momentum and buoyancy, including any effective fluxes from restoration (ACCESS-OM2, MITgcm, and NEMO3.4) and/or bulk formulae (GFDL-MOM5). Then, these simulations are rerun with these saved fluxes prescribed and no surface restoration or bulk formulae applied.
An illustrative example is provided for faf-passiveheat using MITgcm. Consider temperature, θ, and salinity, S, in the surface layer. Forced advection and mixing/diffusion of temperature and salinity are governed by and where ∇ is the three-dimensional (3-D) gradient operator; u is the 3-D resolved velocity vector; κ represents the 3-D diffusivity tensor; λ θ and λ S are the reciprocals of the temperature and salinity restoration timescales, in s −1 , respectively; ρ 0 = 1,035 kg m −3 , S 0 = 35 psu, and c p = 4,000 J K −1 kg −1 are the reference density, salinity, and specific heat capacity values, respectively; and Δz s = 50 m is the thickness of the surface layer. In this example, −λ θ ðθ − θ * Þ− Q ρ 0 c p Δz s and −λ S ðS − S * Þ− S 0 F Δz s are diagnosed as the effective air-sea heat and freshwater fluxes, including those due to restoration, respectively. In the surface layer, the momentum balance is given by where p is the pressure, f is the Coriolis vector, and ∇ · (κ u ∇ u) represents the effects of viscosity.
In all cases, using high temporal frequency forcing fluxes is essential to the OGCM faf-passiveheat design in order to minimize any drift away from the spinup control climate (such as occurs in Method C of Gregory et al., 2016). We find that with daily or higher-frequency forcing, the drift in circulation metrics (section 3.1) over the 70 year faf-passiveheat simulations is negligible compared to the forced response in the perturbation experiments. For instance, the trend in AMOC strength in faf-passiveheat varies between −5% and +3% across the ensemble. Note that faf-passiveheat uses flux forcing alone, without any adaptive surface restoration, since the aim is to eliminate any processes that would cause surface fluxes to react to the surface state (as occurs in Method B of Gregory et al., 2016).

Journal of Advances in Modeling Earth Systems
experiments, but the faf-water and faf-stress experiments are done exactly according to the AOGCM protocol. Figure 1 shows the annual mean of the FAFMIP surface perturbations. In order to restrict excessive ocean cooling, negative Q′ over the Barents and Kara sea regions (ocean grid points between 15-135°E and north of 60°N) is reset to zero. This has the effect of adding approximately 3% to the globally integrated atmosphere to ocean heat flux perturbation relative to the original Q′ presented by Gregory et al. (2016). Since this difference is small relative to the global mean surface heat flux perturbation (1.8 W m −2 ), we continue to refer to the present heat flux perturbation experiment as faf-heat or faf-heat-o. Any of the perturbation experiments could lead to changes in ocean circulation and heat convergence, which might result in SST below freezing. However, this effect is typically found to be small and localized, with further detail provided in Appendix A.
The AOGCM faf-heat simulations for HadCM3 and CanESM5 are performed following Method B of the FAFMIP protocol (Bouttes & Gregory, 2014;Gregory et al., 2016). Under Method B, a redistributed passive temperature tracer, θ R , is introduced in the ocean component and set to the initial faf-passiveheat temperature field at the beginning of the simulation. An added passive temperature tracer, θ A , which is initialized at 0 everywhere, is also introduced and experiences only the surface heat flux perturbation, Q′ (Banks & Gregory, 2006). Meanwhile, the active temperature tracer, θ, experiences Q + Q′, where Q is the surface heat flux computed by the atmosphere-ocean or sea ice-ocean coupler from using the surface layer of θ R instead of the prognostic SST. Subsequently, Q is diagnosed online and prescribed to the θ R tracer, and hence θ = θ R + θ A by design. Importantly, the Q prescribed to θ R can be different to the surface heat flux in the faf-passiveheat experiment, since θ R can be different from the faf-passiveheat θ due to changes in advection or mixing. Hence, Method B introduces an atmosphere-ocean feedback caused by the redistributed temperature change, as shown in Figure 2.
The OGCM faf-heat-o simulations are performed by directly applying Q′ to the respective faf-passiveheat surface heat fluxes. The HadOM3 simulation follows an approach that is slightly different from the Method C protocol described by Gregory et al. (2016). Under Method C, the sea ice model interacts with the redistributed SST, which may lead to a change in the sea ice-ocean buoyancy fluxes relative to fafpassiveheat, whereas in HadOM3 sea ice interacts with the actual SST, as usual. In addition, the Method C protocol suggested using a monthly climatology, which produced a substantial drift in the ocean state, whereas HadOM3 uses higher-frequency surface forcing like the OGCMs in this study and suffers negligible drift. Similar to the AOGCM experiments, an added passive temperature tracer, θ A , is also introduced in the OGCMs. The redistributed temperature field, θ R , can then be computed off-line as the difference θ − θ A . Consequently, the only difference between the faf-heat and faf-heat-o simulations is that the latter does not include an atmosphere-ocean feedback caused by redistributed temperature change ( Figure 2). By comparing the faf-heat and faf-heat-o responses in matching AOGCM-OGCM pairs, the effect of this feedback is estimated.
In both AOGCMs and OGCMs, the faf-water and faf-stress simulations are performed by directly applying W ′ and τ′ to the faf-passiveheat freshwater and momentum fluxes, respectively. This choice implicitly assumes that the atmosphere-ocean feedback to redistributed temperature change is fairly small in response to surface freshwater and momentum perturbations.
An OGCM experimental design similar to that used in this study has previously been implemented by Marshall et al. (2015) and Zika et al. (2018). Marshall et al. (2015) included a feedback to damp SST change. This is not done in faf-heat-o and faf-all-o experiments because we specifically wish to avoid surface flux feedbacks on ocean climate change, as it would interfere strongly with the imposed heat flux perturbation. Annual mean temperature, salinity, velocity, and DSL diagnostics for each simulation are saved on each model's native grid. In addition, temperature and salinity tendency diagnostics, as presented in Table 4 of Gregory et al. (2016), are saved as annual means. Kuhlbrodt et al. (2015) review the use of temperature tendency diagnostics in previous studies, demonstrating that global mean ocean heat content change is largely a balance of downward advection changes in the extratropics, compensated by changes in upward isopycnal mixing, mainly in the Southern Ocean. In the following analysis, regional, basin, and global means are computed on each model's native grid. For spatial intercomparisons, all model data are bilinearly interpolated onto the MITgcm regular 2.8°latitude × longitude grid.

Surface Heat Flux Forcing: Coupled (faf-heat) and Ocean-Only (faf-heat-o) Intercomparison
This section examines the OGCM and AOGCM responses in the faf-heat-o and faf-heat simulations, respectively. Discussion of ocean heat content and DSL change focuses on the mean difference between faf-heat/ faf-heat-o and faf-passiveheat during the last decade, Years 61-70, of each experiment.

Ocean Circulation
Among the OGCM and AOGCM ensemble, the 70 year mean faf-passiveheat AMOC strength, Ψ AMOC , varies between 11 and 21 Sv (1 Sv ≡ 10 6 m 3 s −1 , unless otherwise stated the notation "varies between x-y" indicates the ensemble minimum, x, and maximum, y, values). Here, Ψ AMOC is defined as the maximum of the overturning stream function between 20°N and 60°N, and beneath 500 m depth, calculated from the residual overturning stream function (Huber & Zanna, 2017). This spread in control AMOC strength is consistent with the CMIP5 multimodel ensemble preindustrial control simulations (Wang et al., 2014). Huber and Zanna (2017) found that spread in AMOC strength among CMIP5 models is dominated by differences in high-latitude surface heat fluxes. The relative faf-heat/faf-heat-o AMOC strength change, ΔΨ AMOC , over time is presented in Figure 3a. The rate of AMOC weakening in faf-heat and faf-heat-o typically slows over time, consistent with coupled AOGCM simulations under greenhouse gas forcing . After 60 years in faf-heat and fafheat-o, ΔΨ AMOC ranges between −20% and −50%. This spread in relative AMOC response suggests differences in model sensitivity to identical surface perturbations is relatively high. Assessing the absolute ΔΨ AMOC changes, MITgcm is an outlier (−11 Sv), compared to the spread from the rest of the ensemble (−6 to −3.5 Sv). This is consistent with the findings of Gregory and Tailleux (2010), who demonstrated an association between control AMOC strength and absolute AMOC strength change. For the two pairs of coupled atmosphere-ocean and ocean only simulations, AMOC weakening is 10% larger in the coupled relative to the ocean-only configuration.
The MITgcm, GFDL-MOM5, and ACCESS-OM2 cases demonstrate substantially less ΔΨ AMOC high frequency variability in comparison to other ensemble members. This is likely due to the relatively lowfrequency, monthly faf-passiveheat background surface fluxes (Q, W, and τ), which are linearly interpolated to daily or subdaily frequency and applied in these three OGCMs. Effective surface heat and freshwater fluxes from weak SST and SSS restoring, respectively, are prescribed in the MITgcm and ACCESS-OM2 faf-passiveheat simulations and act to mitigate high frequency AMOC variability. In contrast, daily and subdaily faf-passiveheat surface fluxes from an interactive atmosphere are applied in HadOM3/HadCM3 and NEMO3.4/CanESM5, respectively.
In each model, the barotropic stream function decreases along the western boundary of the North Atlantic in faf-heat/faf-heat-o and increases near the subpolar gyre (not shown). This suggests a general weakening of the North Atlantic subtropical and subpolar gyre circulation, which is consistent with the simulated AMOC weakening (Figure 3a). There is no consensus among the ensemble of a change in the ACC strength, measured by the Drake Passage transport, Ψ ACC , following the definition used by Huber and Zanna (2017). Some models indicate a very strong Ψ ACC weakening, such as ACCESS-OM2 (−7.7%) and GFDL-MOM5 (−12.4%), Global-and basin-scale vertical profiles of faf-heat and faf-heat-o temperature changes (Figure 4) are typically consistent with corresponding vertical profiles of added temperature changes ( Figure 5). In particular, warming from surface added temperature changes in the Atlantic is intensified in coupled models. This is linked to the enhanced AMOC weakening in AOGCMs, which leads to less downward advection of added temperature changes relative to OGCMs. Similarities between temperature and added temperature change suggest that heat content change at basin scales is largely due to the passive advection and mixing of a net heat input at the surface, consistent with previous coupled FAFMIP simulations (Gregory et al., 2016). In contrast to basin scale total and added temperature changes, corresponding redistributed temperature changes are relatively small (not shown), with global mean absolute redistributed SST changes less than 0.5°C.

Ocean Heat Content
Depth-integrated ocean heat content change at each grid point in J m −2 is defined as ΔOHC ¼ ∫ H 0 c p ρ 0 Δθ dz, assuming c p = 4,000 J K −1 kg −1 is a fixed specific heat capacity, ρ 0 = 1,035 kg m −3 is a constant reference density, Δθ is the temperature change, and H is the spatially varying ocean depth. Regional patterns of the depth integrated total, added and redistributed ocean heat content change in faf-heat and faf-heat-o, are shown in Figures 6-8, respectively. All models indicate increased total heat content in the midlatitude relative to the high-latitude Southern Ocean where there is a net surface heat input. The pattern and magnitude of Southern Ocean total heat content change are similar to the added heat content change (Figure 7), while the redistributed heat content change has a much smaller magnitude and typical cooling effect in this region. This suggests that Southern Ocean heat content change is largely due to northward Ekman transport and subduction of heat input at high latitudes (Armour et al., 2016;Gregory et al., 2016;Zanna et al., 2019). The similarity between OGCM and AOGCM ΔOHC implies that the Method B atmospheric feedback in coupled simulations has a minimal role in affecting heat content in the Southern Ocean.
The North Atlantic demonstrates the largest spread across the ensemble in the pattern of ocean heat content change. This is linked with the large spread in simulated AMOC changes (Figure 3a), which modulates the northward heat flux into the North Atlantic. A region of substantial heat loss in MITgcm, GFDL-MOM5,

Journal of Advances in Modeling Earth Systems
ACCESS-OM2, and to a lesser extent in HadOM3, is present in the midlatitude North Atlantic. This heat loss is largest in MITgcm, which is likely due to it having the strongest absolute AMOC weakening across the ensemble (section 3.1). In contrast, NEMO3.4 and the two coupled simulations indicate increased heat content in this region. Examining the added and redistributed heat content patterns in the North Atlantic, we see that the total heat content change is a small residual of the sum of these two terms. The spread in added heat content, which warms the North Atlantic, is much smaller than the spread in redistributed heat content, which generally cools the midlatitude North Atlantic, due to the reduced northward heat transport by the weakened AMOC. In the two MOM5 simulations, GFDL-MOM5 and ACCESS-OM2, the main difference is the background state. The redistributed heat content change is of a larger magnitude across the Atlantic in ACCESS-OM2 relative to GFDL-MOM5, consistent with stronger AMOC weakening in the former (28%) relative to the latter (18%). However, added heat content change in the extratropical North Atlantic is similar (>10 GJ m −2 ) in both ACCESS-OM2 and GFDL-MOM5. These results suggest that differences in circulation change, and the background temperature, are primary in setting the heat content change in the North Atlantic.
Comparing the ocean-only and coupled cases, we find that there is a greater depth integrated total heat content increase in the midlatitude North Atlantic, and less decreased heat content in the tropical Atlantic, in the latter. In the midlatitude North Atlantic, added heat content increases (Figure 7) are slightly weaker in the coupled simulations, while redistributed heat loss (Figure 8) is much weaker. However, it is important to note that in AOGCMs, the redistributed heat change also includes the effect of additional air-sea heat flux

Journal of Advances in Modeling Earth Systems
changes from the atmospheric feedback. Examining the vertical profile of redistributed temperature change, Δθ R , reveals that North Atlantic cooling is more concentrated and stronger near the surface in AOGCMs in comparison to OGCMs. Consequently, in AOGCMs, the surface θ R minus air temperature gradient is steeper relative to the implicit SST minus air temperature gradient contributing to the atmosphere-ocean heat flux in OGCMs. This leads to an additional surface heat input at high latitudes from the atmosphere in coupled simulations, relative to ocean-only simulations (Figure 2). In the tropical Atlantic, θ R warming is more concentrated near the surface in AOGCMs relative to OGCMs. Hence, in AOGCMs, there is additional heat loss to the atmosphere over the tropical Atlantic, which is balanced by the extra heat input at higher latitudes.
In the Pacific, the ocean heat content change is typically more homogeneous than in the Atlantic and Southern oceans across the multimodel ensemble, at approximately 1-2 GJ m −2 . Similar to the Atlantic, Pacific warming from added heat is typically larger at midlatitudes (3-4 GJ m −2 ) and weaker at low latitudes (0-1 GJ m −2 ). This added heat content change pattern is offset by a slight cooling due to redistribution at high latitudes, and a warming from redistribution in the tropics.
The surface heat flux change in AOGCMs is tightly coupled to the AMOC change, as summarized in Figure  2. Total heat gain and loss at high and low latitudes in the Atlantic, respectively, weaken the meridional density gradient, causing the AMOC to weaken. This AMOC weakening reduces northward heat transport, which leads to further surface θ R cooling at high latitudes and warming at low latitudes, contributing to

Journal of Advances in Modeling Earth Systems
the AMOC weakening in coupled models. This mechanism is consistent with the simulated 10% additional AMOC weakening in AOGCMs relative to the OGCMs (Figure 3a). Consequently, the atmospheric feedback due to heat redistribution acts to enhance AMOC weakening in AOGCMs by amplifying the prescribed surface heat flux perturbation.

Temperature Tendency Diagnostics
The depth and weighted time integral of faf-heat and faf-heat-o minus faf-passiveheat temperature tendency terms are now examined to assess which processes contribute to the ΔOHC patterns. Consider the FAFMIP temperature tendency diagnostics: where ∂ t θ total , ∂ t θ resolved , ∂ t θ eddy , ∂ t θ isopycnal , and ∂ t θ diapycnal are the temperature tendencies due to all processes, resolved advection, parametrized eddy advection, isopycnal mixing, and diapycnal mixing, respectively. The resolved advection component is estimated as the residual advection minus the parametrized eddy advection tendency diagnostic.
We define a dimensionless and time varying weighting, w(t), as follows: for year t. Hence, the subsurface depth and weighted time integral of the faf-heat or faf-heat-o temperature tendency change, relative to faf-passiveheat temperature tendency, at each grid point, ΔH j , is ΔH j ðx; yÞ ¼ Δ∫ 70 0 dt wðtÞ∫ H d c p ρ 0 ∂ t θ j ðx; y; z; tÞ dz (6) for component j = total, resolved, eddy, isopycnal, and diapycnal. Here, Δ represents faf-heat or faf-heat-o minus faf-passiveheat difference, and H (in m) denotes the spatially varying ocean depth. Neglecting changes above d, here equal to 100 m, leads to nonzero ΔH diapycnal . The weighting in the time integral ensures that ΔH j represents the mean ocean heat content change between Years 1-10 and 61-70 periods for process j, reducing the effect of interannual variability relative to the faf-heat or faf-heat-o forced response. In practice, the time integral in Equation 6 is discretized using yearly intervals, as only annual mean ∂ t θ j data are available. Hence, the computed ΔH j components do not perfectly represent the Years 61-70 minus Years 1-10 changes. In addition, we note that ΔH total only approximates ΔOHC (defined in section 3.3), since the former represents subsurface changes instead of full depth changes, and both metrics indicate changes over slightly different time intervals.
The multimodel ensemble (MME) mean and standard deviation of ΔH total are presented in Figure 9. Three case study regions are selected for further analysis, as depicted by gray boxes in Figure 9. First, the midlatitude North Atlantic, which shows the largest spread in heat content change (standard deviation >0.5 GJ m −2 ), relative to the local multimodel mean heat loss (−0.1 GJ m −2 ). Second, the western North Pacific over the subtropical gyre, which demonstrates the smallest spread (0.05 GJ m −2 ) relative to the mean heat increase (+0.1 GJ m −2 ). Third, the midlatitude Southern Ocean between 35°S and 55°S, with a moderate spread (0.1 GJ m −2 ) relative to a mean heat increase (+0.15 GJ m −2 ).
For the midlatitude North Atlantic region, there is no agreement on the sign of the total OHC change, with two models simulating a negative ΔH total (MITgcm and GFDL-MOM5) and four models simulating a positive ΔH total (HadOM3, HadCM3, NEMO3.4, and CanESM5), and a large ensemble range (−0.9 to 0.3 GJ m −2 ) as shown in Figure 10a. In every model, the cooling due to resolved advection, ΔH resolved , has the largest magnitude of any ΔH total component. This negative ΔH resolved is consistent with the simulated AMOC weakening under faf-heat/faf-heat-o (Figure 3a), which reduces northward heat transport in the North Atlantic. The resolved advection cooling is typically weakly opposed by a positive ΔH eddy component, which represents warming due to a change in parametrised eddy advection from the GM scheme. Since the residual (resolved plus eddy) advection component is always negative, it is the level of compensation by the

Journal of Advances in Modeling Earth Systems
typically positive ΔH isopycnal and ΔH diapycnal terms, which determines the sign of ΔH total . The ΔH diapycnal change is likely associated with decreased convective mixing at higher latitudes. As the diapycnal and isopycnal mixing contributions are relatively small, the spread in midlatitude North Atlantic ΔH total is mainly dominated by the spread in resolved advection change. This spread in the OHC change budget due to circulation change differences is consistent with the findings of Exarchou et al. (2015), who analyzed three different AOGCMs.
In the western North Pacific subtropical gyre region, all models simulate a positive ΔH total , with relatively small ensemble spread compared to the midlatitude North Atlantic. Despite the consistency in ΔH total , no single process dominates the spread. For example, the heat increase in HadCM3, ΔH total = 0.1 GJ m −2 , is mainly driven by resolved advection change, ΔH resolved = 0.12 GJ m −2 . In contrast, heat increase in HadOM3, ΔH total = 0.08 GJ m −2 , is mainly a balance of diapycnal mixing, 0.12 GJ m −2 , and resolved advection, −0.07 GJ m −2 , changes. This highlights a substantial contrast in the heat content change processes between a matching AOGCM/OGCM pair of models. Typically, ΔH resolved and ΔH diapycnal are the components of ΔH total with the largest magnitudes. Hence, the balance of mainly warming from resolved advection and cooling from diapycnal mixing determines the subsurface OHC change.
For the midlatitude Southern Ocean, ΔH total is consistently positive (Figure 10c), but the ensemble range (0.08 to 0.26 GJ m −2 ) is greater relative to the western North Pacific (0.03 to 0.13 GJ m −2 ). In the majority

Journal of Advances in Modeling Earth Systems
of models, heating from ΔH isopycnal , and to a lesser extent ΔH diapycnal , largely dominate the spread and ensemble mean of ΔH total . MITgcm is an outlier, where isopycnal mixing change contributes a cooling, and instead, a positive contribution from diapycnal mixing is the largest component of ΔH total . Notably, both the resolved and parametrized eddy advection terms are much smaller in magnitude than ΔH total , and typically negative. This suggests that residual advection changes in faf-heat and faf-heat-o play a minimal role in setting the midlatitude Southern Ocean warming. This partially contrasts with the findings of Kuhlbrodt et al. (2015), who showed that subtropical Southern Ocean warming is mainly due to residual advection changes, while higher-latitude Southern Ocean warming is largely due to reduced vertical isopycnal mixing. However, our study examines a broader latitude band than Kuhlbrodt et al. (2015), containing much of the overturning region, and focuses on volume integrated isopycnal mixing changes instead of the change in the vertical heat flux component used by Kuhlbrodt et al. (2015), perhaps explaining the disparity. Furthermore, the dominance of isopycnal and diapycnal mixing change in the Southern Ocean heat budget in faf-heat and faf-heat-o is broadly consistent with the findings of Gregory (2000) and Exarchou et al. (2015).

Dynamic Sea Level
Across the ensemble, the faf-heat and faf-heat-o simulated DSL change, Δζ, is generally a 20 cm fall across the high latitude Southern Ocean and a weaker, 10 cm rise across much of the tropical and subtropical Atlantic, as shown by the contour lines in Figure 6. Over the North Pacific, a relative sea level rise of

Journal of Advances in Modeling Earth Systems
10 cm is consistently simulated over the subtropical gyre, with a relative sea level fall of approximately 8 cm simulated over the subpolar gyre. Similar to ocean heat content change, the largest Δζ spread is over the North Atlantic, consistent with the findings of Gregory et al. (2016). In MITgcm, GFDL-MOM5, and ACCESS-OM2, a relative sea level fall is simulated over the North Atlantic subpolar gyre, with a relative sea level rise over the subtropical gyre. In contrast, HadOM3 and HadCM3 simulate a sea level rise over the subpolar gyre, and a weaker relative sea level fall at midlatitudes. NEMO3.4 and CanESM5 both simulate a relative sea level rise across much of the North Atlantic. Bouttes et al. (2013) suggest the simulated DSL change pattern of relative sea level rise over the subpolar gyre and fall over the subtropical gyre under CO 2 forcing is largely due to the surface heat flux change. Simulated temperature and salinity changes in faf-heat and faf-heat-o are used to decompose DSL changes into steric, Δθ steric , thermosteric, Δζ θ , and halosteric, Δζ S , contributions (Pardaens et al., 2011) using a nonlinear equation of state (TEOS-10, McDougall & Barker, 2011). Simulated DSL changes are almost entirely steric, with a negligible contribution from barotropic changes (not shown). In the Alantic, the thermosteric DSL change (Figure 11) largely balances the halosteric ( Figure 12) DSL change, leaving the steric DSL change as a small residual (Lowe & Gregory, 2006). The thermosteric DSL change closely resembles the total heat content change ( Figure 6). In the Pacific, the compensation of thermosteric and halosetric DSL change is typically weaker than in the Atlantic (Durack et al., 2014), but the magnitude of Δζ θ and Δζ S are relatively smaller in the former. Both the thermosteric and halosteric components show substantial spread in the North Atlantic, each contributing to the large spread in simulated DSL change.

Freshwater, Momentum, and All Surface Flux Forcing: faf-water, faf-stress, faf-all, and faf-all-o Intercomparison
This section explores the ocean's response in the faf-water, faf-stress, and faf-all/faf-all-o experiments. Similar to section 3, analysis focuses on the mean simulated change during Years 61-70.

Ocean Circulation
Simulated AMOC weakening across the ensemble is of a similar order of magnitude in faf-all, 20-50%, as in faf-heat and faf-heat-o, as shown by Figure 3b. This suggests that the addition of surface freshwater and momentum fluxes in OGCMs has only a secondary effect to surface heat fluxes in modulating AMOC changes (as found for AOGCMs Gregory et al., 2016). After 60 years in faf-water and faf-stress, ΔΨ AMOC typically has a magnitude smaller than 10% of Ψ AMOC (Figures 3c and 3d). The relatively weak AMOC response under the freshwater flux change experimentis perhaps unsurprising since the faf-water perturbation over the high-latitude North Atlantic is approximately one eighth the magnitude of the corresponding faf-heat perturbation in buoyancy flux units (not shown).
Examining ACC changes, in faf-stress there is a consistent strengthening (between 3% and 7%), with faf-water demonstrating a weakening (−2% to −13%). The ACC change in faf-all demonstrates no consistency among the ensemble, with only relatively weak magnitudes (−3% to 2%). However, the individual surface flux perturbations combine relatively linearly, with wind-driven strengthening of the ACC largely canceled by the freshwater flux-driven weakening (not shown). All models simulate a weakening of AABW overturning in faf-water (−0.8% to −18%), and a strengthening of AABW overturning in faf-stress (3% to 20%). Examining the faf-all AABW overturning response, there is no consistency among the ensemble (−18% to 21%); however, the faf-heat/faf-heat-o, faf-water, and faf-stress responses combine relatively linearly (not shown). These results suggest that ACC and AABW overturning changes are linked, which is consistent with geostrophic balance (Shakespeare & Hogg, 2012).

Ocean Heat Content and DSL
The ocean heat content and DSL change for faf-water and faf-stress are presented in Figures 13 and 14, respectively. An area of consensus for the faf-water simulation is the Southern Ocean, where all models simulate heat loss at midlatitudes and heat gain around the Antarctic coastline. This is accompanied by a rise and fall in DSL at high and middle Southern Ocean latitudes, respectively, suggesting the DSL change in this region is largely thermosteric. Gregory et al. (2016) suggest the input of freshwater at high Southern Ocean latitudes in faf-water acts to stratify the water column, reducing upward convection and surface heat loss.
A second area of consensus in faf-water is the western subtropical North Atlantic, where all models simulate moderate heat increases, of approximately one quarter the corresponding heat increases simulated in faf-heat and faf-heat-o. In all models except for NEMO3.4, this region of warming is collocated with a fall in DSL. This implies that the negative halosteric component, from increased salinity, typically has a larger magnitude than the positive thermosteric component, from increased temperature. In CanESM5, HadCM3, and MITgcm, the pattern of heat content increases in the North Atlantic extends northward into the midlatitude and subpolar regions. In contrast, the other four ensemble members simulate a weak heat loss in the midlatitude and subpolar North Atlantic. Since no surface heat perturbation is included in fafwater, the heat content change is entirely due to circulation change leading to heat redistribution.
Similar to faf-water, patterns of ocean heat content change are relatively weak in faf-stress, in comparison to faf-heat and faf-heat-o. As in faf-water, the Southern Ocean is a major area of consensus in faf-stress, but with the opposite pattern: heat loss and DSL fall, and heat increases and DSL rise, at high and midlatitudes, respectively. This pattern in the Southern Ocean can be explained by the enhanced northward Ekman transport in faf-stress, due to the increased surface westerly wind stress, causing an advection of heat from the high to midlatitudes. Consequently, in the ocean-only ensemble, the FAFMIP momentum flux perturbation acts to increase the DSL gradient in the Southern Ocean, while the freshwater perturbation weakly weakens the DSL gradient, consistent with previous studies (Bouttes & Gregory, 2014;Saenko et al., 2015). An area of major disagreement in faf-stress is the North Atlantic, where there is no consensus on the pattern of ocean heat content or Δζ change. However, the magnitude of the spread in faf-stress North Atlantic responses is much smaller than in both faf-heat/faf-heat-o and faf-water. This suggests that the uncertain response to surface momentum flux perturbations is of second order to the uncertainty in the overall heat content change and Δζ response.
There is strong similarity between the faf-heat and faf-heat-o ( Figure 6) and faf-all ( Figure 15) patterns of ocean heat content and Δζ. This is consistent with previous studies, suggesting uncertainty in patterns of ocean heat content change and corresponding Δζ change is largely driven by uncertainty in the response to surface heat flux perturbations (Gregory et al., 2016;Lowe & Gregory, 2006), as discussed in section 3. Furthermore, the AMOC responses for each model are similar in faf-heat/faf-heat-o and faf-all/faf-all-o.

Conclusions
This study has examined the ocean's response to surface momentum and buoyancy flux perturbations in an ensemble of OGCMs and AOGCMs. A novel, ocean-only FAFMIP experimental design is presented, where high temporal frequency surface fluxes are prescribed to OGCMs in order to simulate a control state, fafpassiveheat, with minimal drift and without surface restoration or bulk formulae. Therefore, by design, the spread in the response of individual models to identical surface forcing taken from a 1%CO 2 year −1 experiment can only be attributed to the ocean model structure. By prescribing this model independent surface flux perturbation to different OGCMs, we find a large model structure driven spread in the simulated regional ocean heat content (OHC) and DSL change. This spread from OGCMs is of a similar order of magnitude as the spread among AOGCMs with prescribed greenhouse gas and aerosol forcing. Furthermore, with the ocean-only FAFMIP design, the ocean response is simulated without atmosphere-ocean feedbacks. This permits the atmosphere-ocean feedback introduced by the coupled FAFMIP method (Bouttes & Gregory, 2014;Gregory et al., 2016) to be quantified by comparing ocean-only and coupled FAFMIP simulations with consistent ocean model components.
The North Atlantic is the region that demonstrates the largest spread in the ocean-only response to surface buoyancy and momentum forcing after 70 years. Under surface heat flux forcing in OGCMs, the faf-heat-o 10.1029/2019MS002027 Journal of Advances in Modeling Earth Systems experiment, the total OHC change is a small residual of the added heat increase and redistributed heat decrease. Decomposing the heat budget for the midlatitude North Atlantic, residual advection weakening typically leads to a cooling tendency, which is weakly offset by a warming due to changes in diapycnal mixing. The added heat increase is largely passive (e.g., the contribution from changes in circulation is small) and is focussed in the middle and high latitudes of the North Atlantic where the surface heat flux perturbation is positive. However, it is the spread in the redistributed heat content change, and hence circulation change, which dominates the spread in total heat content change in the North Atlantic and associated patterns. For instance, there is a broad spread in simulated AMOC weakening (20-50%) under faf-heat-o among OGCMs. Models with more AMOC weakening (MITgcm and GFDL-MOM5) typically have a net cooling of the North Atlantic, whereas models with less AMOC weakening (NEMO3.4 and CanESM5) simulate a net warming.
For regions outside of the North Atlantic, such as the western North Pacific and Southern Ocean, there is typically less spread in the OHC change under faf-heat-o across the OGCM responses. In the western North Pacific, this low spread in OHC change occurs despite large differences in the contributing processes. For the Southern Ocean, there is generally a relative warming and cooling middle and high latitudes, respectively. Similar to the North Atlantic, warming in the midlatitude Southern Ocean is partially due to an added heat increase from a largely passive uptake of the local positive surface heat flux perturbation. We find that warming in the midlatitude Southern Ocean across the ensemble is largely due to enhanced isopycnal and diapycnal mixing, instead of residual advection change as suggested by previous studies (Bouttes & Gregory, 2014;Lowe & Gregory, 2006;Saenko et al., 2015). The dominance of isopycnal mixing over residual circulation change for midlatitude Southern Ocean warming could be due to the relatively wide-latitude band and depth integral beneath 100 m used in our analysis. This latitude band encompasses much of the residual overturning circulation. Hence, changes in overturning may only lead to small regional and depth integrated heat changes due to compensation of the upper and lower branch of the circulation. In addition, the relatively coarse OGCMs used in our study may simulate the heat content change processes via isopycnal mixing parametrizations instead of accurately resolving the overturning circulation.
Consistent with the findings of Gregory et al. (2016), who assessed coupled FAFMIP simulations with CMIP5 models, much of the spread in the ocean response to surface buoyancy and momentum flux perturbations in ocean-only simulations is driven by the response to the surface heat flux perturbation. For instance, our study shows, similarly to Huber and Zanna (2017), that surface heat flux perturbations dominate the spread in AMOC change, compared to changes in surface freshwater or momentum fluxes. However, unlike Huber and Zanna (2017), our study also shows a large sensitivity in AMOC due to ocean model structure. However, Huber and Zanna (2017) used a mix of surface restoring and imposed fluxes diagnosed from CMIP5 models; therefore, each model structure is somewhat imprinted in the surface boundary conditions imposed. As such, our study and Huber and Zanna (2017) are not inconsistent in their results.
Similar to the OHC and circulation responses, DSL changes, Δζ, are found to be mainly driven by the surface heat flux perturbation. Over the North Atlantic, there is a wide spread in the Δζ response, which is matched by large spread in both of the contributing thermosteric and halosteric components. These salinity-and temperature-driven Δζ changes largely cancel (Lowe & Gregory, 2006;Pardaens et al., 2011), but both terms contribute to the overall spread. Agreement among the OGCM ensemble over the Pacific sector of the Southern Ocean is relatively higher than the Atlantic sector, with between −3 and −4 GJ m −2 OHC change locally, contributing to Δζ ≈ −0.1 m in each model via a negative thermosteric component.
An important finding is that faf-heat, using the coupled FAFMIP method (Bouttes & Gregory, 2014;Gregory et al., 2016), causes 10% additional AMOC weakening relative to faf-heat-o, the ocean-only method presented in this study. In the coupled FAFMIP method, the surface heat flux is calculated using a redistributed ocean temperature tracer instead of the active temperature tracer. Hence, in faf-heat, a weakened AMOC leads to reduced northward heat transport, increasing the atmospheric temperature-redistributed SST difference over the high-latitude North Atlantic and amplifying the atmosphere-ocean surface heat flux change relative to that prescribed in faf-heat-o.
The coupled FAFMIP atmosphere-ocean feedback largely occurs over regions where the circulation is particularly sensitive to surface heat flux changes (Delworth & Greatbatch, 2000). Comparing the faf-heat (AOGCM) and faf-heat-o (OGCM) responses, the main inconsistencies in DSL change are over the North Atlantic. These Δζ differences are mainly due to differences in the thermosteric contribution, with increased warming in the subpolar North Atlantic in faf-heat relative to faf-heat-o due to the uptake of an amplified surface heat flux change in the coupled simulations. In general, the effect of the coupled FAFMIP feedback under the surface heat flux perturbation is small outside of the North Atlantic, and small globally under surface freshwater and momentum perturbations.
There are two caveats to the ocean-only FAFMIP protocol presented in this study. First, in the MOM5 cases, unphysical multidecadal oscillations in AMOC strength are simulated in response to the surface buoyancy flux forcing (Figures 3a-3c). These AMOC oscillations limit our ability to detect a forced response relative to internal variability. These AMOC oscillations are reminiscent of those simulated by ocean-only models of varying complexities under mixed boundary conditions (MBCs), where only surface temperatures are restored (Weaver & Sarachik, 1991, present a review). No surface restoration is used in our ocean-only FAFMIP protocol, but similar processes may excite the AMOC oscillations similar to those induced by MBCs. An option to suppress the AMOC oscillations in ocean-only FAFMIP simulations would be to introduce weak SSS restoring (Griffies et al., 2009). However, the pattern to which the surface salinity should be restored to in future climate change experiments is unknown. Therefore, the simulated AMOC change will be sensitive to both the prescribed SSS pattern choice and restoring time scale. An alternative option would be to perform single ocean model ensemble simulations from different initializations and use the ensemble mean to reduce internal variability (Zanna et al., 2018). A second caveat is the issue with ocean temperatures falling below freezing point in the ocean-only experiments, since no surface temperature restoring is used and the sea ice model only interacts with the redistributed temperature (Gregory et al., 2016). In our protocol, we diminish unphysical behaviors by constraining the density of water parcels using the equation of state (Appendix A). However, the treatment of temperatures below freezing and their impact on interior transport remains an open question.
The novel ocean-only FAFMIP protocol presented in this study provides a clearer assessment of the direct ocean response to surface forcing than the coupled protocol used in the AOGCM FAFMIP (Gregory et al., 2016). By excluding all atmosphere feedbacks, it guarantees that the spread of ocean responses can only be due to the differences among the ocean models, in respect of their formulation and control state, which is the focus of FAFMIP. This method was not originally proposed for FAFMIP because flux-forced OGCM experiments with no restoration had not been successful until recently; moreover, the method requires new technical work and large ancillary files. Therefore, so far, it has been tried with relatively few OGCMs. To address the uncertainty in projections of ocean climate change, it would be very useful if these new experiments could be carried out with more OGCMs, and especially useful with OGCMs, which are the ocean components of CMIP6 and other AOGCMs used for climate projection. In these cases, it is important to set up the OGCM to have the same control (faf-passiveheat) state as it does in the AOGCM, by using surface fluxes of high temporal frequency obtained from the AOGCM control experiment. This is necessary in order to make pairs of coupled and uncoupled FAFMIP experiments strictly comparable, thus enabling the role of coupled feedbacks to be quantified, and the OGCM results to be applied directly in interpreting the spread of AOGCM projections. from a cooling of the layer immediately beneath. If a flux perturbation causes the whole water column to freeze, the remaining negative heat flux is lost from the system.

Data Availability Statement
All data used in this study, except for CanESM5 model output, is stored on the Centre for Environmental Data Analysis (CEDA) archive and is publicly available online (at https://gws-access.ceda.ac.uk/public/ ukfafmip/). CanESM5 model output produced for FAFMIP is available separately and publicly as part of the CMIP6 archive on the World Climate Research Programme's Earth System Grid Federation.