The representation of alkalinity and the carbonate pump from CMIP5 to CMIP6 Earth system models and implications for the carbon cycle

Ocean alkalinity is critical to the uptake of atmospheric carbon in surface waters and provides buffering capacity towards the associated acidification. However, unlike dissolved inorganic carbon (DIC), alkalinity is not directly impacted by anthropogenic carbon emissions. Within the context of projections of future ocean carbon uptake and potential ecosystem impacts, especially through Coupled Model Intercomparison Projects (CMIPs), the representation of alkalinity and the main driver of its distribution in the ocean interior, the calcium carbonate cycle, have often been overlooked. Here we track the changes from CMIP5 to CMIP6 with respect to the Earth system model (ESM) representation of alkalinity and the carbonate pump which depletes the surface ocean in alkalinity through biological production of calcium carbonate and releases it at depth through export and dissolution. We report an improvement in the representation of alkalinity in CMIP6 ESMs relative to those in CMIP5, with CMIP6 ESMs simulating lower surface alkalinity concentrations, an increased meridional surface gradient and an enhanced global vertical gradient. This improvement can be explained in part by an increase in calcium carbonate (CaCO3) production for some ESMs, which redistributes alkalinity at the surface and strengthens its vertical gradient in the water column. We were able to constrain a particulate inorganic carbon (PIC) export estimate of 44–55 Tmol yr−1 at 100 m for the ESMs to match the observed vertical gradient of alkalinity. Reviewing the representation of the CaCO3 cycle across CMIP5/6, we find a substantial range of paramPublished by Copernicus Publications on behalf of the European Geosciences Union. 1196 A. Planchat et al.: The representation of alkalinity and the carbonate pump eterizations. While all biogeochemical models currently represent pelagic calcification, they do so implicitly, and they do not represent benthic calcification. In addition, most models simulate marine calcite but not aragonite. In CMIP6, certain model groups have increased the complexity of simulated CaCO3 production, sinking, dissolution and sedimentation. However, this is insufficient to explain the overall improvement in the alkalinity representation, which is therefore likely a result of marine biogeochemistry model tuning or ad hoc parameterizations. Although modellers aim to balance the global alkalinity budget in ESMs in order to limit drift in ocean carbon uptake under pre-industrial conditions, varying assumptions related to the closure of the budget and/or the alkalinity initialization procedure have the potential to influence projections of future carbon uptake. For instance, in many models, carbonate production, dissolution and burial are independent of the seawater saturation state, and when considered, the range of sensitivities is substantial. As such, the future impact of ocean acidification on the carbonate pump, and in turn ocean carbon uptake, is potentially underestimated in current ESMs and is insufficiently constrained.


Introduction
The ocean is a major carbon sink, absorbing a quarter of anthropogenic carbon emissions each year (Friedlingstein et al., 2022), limiting atmospheric CO 2 growth rate and hence anthropogenic warming. The cumulative ocean carbon sink is estimated at 170 ± 35 GtC over 1850-2020(Friedlingstein et al., 2022, rising to 290 ± 30 GtC under the emission scenario SSP1-2.6 (shared socio-economic pathway) and to 520 ± 40 GtC under SSP5-8.5 by 2100 (Liddicoat et al., 2021;Canadell et al., 2021). Carbon uptake by the ocean is not without consequences for marine ecosystems, as it leads to seawater acidification (Doney et al., 2009;Gattuso and Hansson, 2011), which poses a threat to many marine organisms (e.g. Dutkiewicz et al., 2015;Mostofa et al., 2016), particularly calcifying species (Ilyina et al., 2009;Ridgwell et al., 2009;Lohbeck et al., 2012;Meyer and Riebesell, 2015). In surface waters, the global average pH has already decreased by about 0.1 units since the beginning of the industrial era (Bindoff et al., 2019). Depending on future emission scenarios, projected acidification would result in globalmean surface-ocean pH decreasing by 0.16 to 0.44 in 2080-2099 compared to pre-industrial values .
The absorption of anthropogenic carbon emissions by the ocean is primarily driven by the increasing atmospheric CO 2 concentration and the resulting gradient of CO 2 partial pressure (pCO 2 ) across the air-sea interface. However, in the surface ocean, pCO 2 is controlled by the total amount of dissolved inorganic carbon (DIC) in seawater, but also by sea surface temperature, salinity, and alkalinity, which con-trol CO 2 solubility and the partitioning of DIC between dissolved CO 2 , bicarbonate, and carbonate ions. Total alkalinity (Alk), defined as the excess of proton acceptors over proton donors (Dickson, 1981; essentially the sum of the carbonate, borate, water, phosphoric, silicic, and fluoride alkalinity components), is a central concept in the ocean sciences. Despite multiple definitions that have undoubtedly led to some confusion (Dickson, 1992;Zeebe and Wolf-Gladrow, 2001;Middelburg et al., 2020), Alk has remained a key quantity for studying the ocean carbon cycle, primarily because it (i) is measurable, (ii) is conservative, and (iii) is used to solve the ocean CO 2 system. (i) Alk has been extensively measured by titration methods (Thompson and Anderson, 1940) since the pioneering work of Tornøe (1880) and Dittmar (1884). Today, the Global Ocean Data Analysis Project (GLODAP) compiles Alk measurements from more than 1.3 million water samples collected on almost 1000 cruises covering the global ocean . (ii) Alk is conservative, i.e. unchanged with respect to modifications of temperature and pressure and conserved during mixing of water masses of different properties. It is thus used in oceanic models of the carbon cycle as a prognostic variable (Zeebe and Wolf-Gladrow, 2001;Wolf-Gladrow et al., 2007). (iii) Knowing Alk in combination with any of the variables DIC, pCO 2 , or [H + ]  allows one to compute the entire ocean CO 2 system -i.e. the respective concentrations of CO 2 , HCO − 3 , CO 2− 3 , DIC as well as pH. Alk is dependent on multiple physical and biogeochemical processes, the interpretation of which is not always straightforward. At the ocean surface, it is mainly affected by freshwater fluxes (precipitation, evaporation, sea-ice formation or melting, and riverine discharge) through dilution or concentration. As a result, the surface distribution of Alk shows a strong salinity dependence (Friis et al., 2003) with higher surface Alk values in regions of net evaporation (e.g. subtropical gyres) and lower surface Alk in regions of net precipitation (e.g. near the Equator). In the ocean interior, the Alk distribution is mainly driven by the biological pump, associated with the consumption of DIC and Alk at the ocean surface through biological production, and the remineralization or dissolution of the biogenic material at depth after sinking (Hain et al., 2014). It is predominately the carbonate pump, also called the hard-tissue pump, that drives the Alk distribution in the water column through (1) biotic calcification in the upper ocean, (2) sinking of biogenic calcium carbonate particles, (3) dissolution, and (4) burial of part of this particulate inorganic carbon (PIC) at the seafloor (Fig. 1). Calcification acts as a biological Alk sink in the upper ocean, while dissolution at depth acts as a source. In contrast, the soft-tissue pump -associated with the production, export, and remineralization of organic matter -has much less influence (per mol of organic / inorganic C) on the vertical distribution of Alk through the consumption and release of nutrients, essentially nitrates Wolf-Gladrow et al., 2007). However, the much higher production and export of particulate organic carbon (POC) than PIC mean that it is a major driver of the relative concentrations of Alk and DIC in surface and sub-surface waters and consequently the carbonate ion concentration. By affecting the balance of proton acceptors over proton donors, nitrogen reactions (nitrification and denitrification) can also affect Alk in the water column (Wolf-Gladrow et al., 2007). Finally, ocean circulation also plays a major role in the distribution of Alk, with typically higher surface values in regions of upwelling. On centennial timescales, the global ocean inventory of Alk is thought to be roughly in steady state (Revelle and Suess, 1957) -estimated at about 3150 Pmol )but potential variations are difficult to estimate due to the influence of processes at the ocean boundaries (Middelburg et al., 2020;see Fig. 1). In particular, in addition to freshwater fluxes at the ocean surface, rivers act as an Alk source due to the natural weathering of silicate and carbonate minerals on land, whereas sediment burial and dissolution at the seafloor can act as both a sink and a source (Middelburg et al., 2020).
Preliminary work by Revelle and Suess (1957) to determine how carbon dioxide is partitioned between the atmosphere and the ocean initiated a sustained series of modelling efforts to represent the ocean carbon cycle and its coupling with increasing atmospheric CO 2 . Early modelling studies, using either box models (Siegenthaler and Sarmiento, 1993) or ocean general circulation models (Maier-Reimer and Hasselmann, 1987;Sarmiento et al., 1992), assumed a spatially homogeneous surface Alk to calculate ocean carbon uptake. Sarmiento et al. (1992), for example, used a constant surface Alk of 2300 µeq kg −1 (where "eq" refers to molar equivalent since Alk is a charge balance), recognizing that inclusion of variable Alk would require a model with biology. Some later studies updated the uniform Alk approach by imposing a local surface Alk that varies proportionally with salinity (e.g. the Princeton solubility model, involved in the first phase of the Ocean Carbon Cycle Model Intercomparison Project, OCMIP-1, Sarmiento et al., 2000). In 1990, the pioneering work of Bacastow and Maier-Reimer (1990) introduced an explicit representation of Alk and calcium carbonate cycling in a three-dimensional ocean general circulation model. In this approach, Alk is included as a three-dimensional state variable, and calcium carbonate (CaCO 3 ) formation in the surface ocean is related to the rate of POC production with a spatially and temporally constant rain ratio -defined as the ratio between the export of PIC and POC. The downward flux of CaCO 3 is assumed to decrease exponentially with depth, and all CaCO 3 reaching the seafloor dissolves instantaneously. In a later publication, this approach was updated by Maier-Reimer (1993), with the description of the HAMOCC3 biogeochemical model, in which Alk is also represented as a three-dimensional state variable but with a non-constant rain ratio, a fixed CaCO 3 penetration depth of 2 km, and explicit interactions with the sediment. In the second phase of OCMIP (OCMIP-2, Doney et al., 2004;Orr et al., 2005), the 13 modelling groups adopted a common bio-geochemical framework and followed the approach of Yamanaka and Tajika (1996) based on Bacastow and Maier-Reimer (1990), with explicit Alk and a spatially homogeneous rain ratio. Later developments in the modelling of the global ocean carbon cycle included the implicit incorporation of aragonite in addition to calcite (Gangstø et al., 2008;Dunne et al., 2013) and the recent representation of calcifying plankton functional groups (Buitenhuis et al., 2019;Krumhardt et al., 2019).
The development of marine biogeochemical models that resolve the carbonate pump and, consequently better represent the distribution of Alk in the ocean, has furthered our understanding of the evolution of the carbonate pump and possible feedbacks on ocean carbon uptake and acidification (e.g. Gehlen et al., 2007;Gangstø et al., 2011;Yool et al., 2013a). Representing Alk and CaCO 3 cycling in Earth system models (ESMs) requires marine biogeochemistry modellers to balance the complexity required to evaluate specific processes alongside computational efficiency within the wider context of representing the Earth system, and particularly the carbon cycle response to anthropogenic emissions.
The objectives of this study are (1) to document how the processes affecting Alk are represented in the latest generation of ESMs, and (2) to evaluate the Alk distribution simulated by each of these models against observations. To do this, we use the latest generation of ESMs (from the sixth phase of the Coupled Model Intercomparison Project, CMIP6) and compare these models to those from the previous phase (fifth phase, CMIP5).  Taylor et al., 2012;CMIP6, Eyring et al., 2016). We only consider ESMs for which Alk is not prescribed but determined by physical and biogeochemical processes represented in the models (e.g. calcification and dissolution, or also primary production and remineralization). This leads us to include 17 ESMs for CMIP5 and 19 for CMIP6 (Table 1). In this study, we use " [ESM_name] (CMIP [number])" to refer to a given ESM, indicating whether it was used for CMIP5 or CMIP6, but we also refer to modelling centres or to the specificities of the marine biogeochemical components of the given ESMs.  (1, calcification, 2, sinking, 3, dissolution and 4, burial). For the CMIP5 and CMIP6 ensembles, the global-mean Alk profile and a bar chart of the total particulate inorganic carbon (PIC) export at 100 m are also presented with their associated standard deviation. Also shown are observations from GLODAPv2 for the Alk profile and the observationally derived estimate of PIC export from Sulpis et al. (2021; 1 assessment at 300 m).
In total, this ESM intercomparison encompasses 15 marine biogeochemical models (CMOC, CanOE, BFM, PISCES, WOMBAT, OECO, diat-HadOCC, MEDUSA, HAMOCC, NPZD-MRI, BEC, MARBL, TOPAZ, BLING, COBALT) with different versions and/or configurations depending on the CMIP and the modelling group. All the ESMs considered in this study represent the carbonate pump, with the exception of CMCC-CESM (CMIP5), which we include in our analysis to highlight the effect of implementing such a pump. NASA-GISS ESMs were not included as they prescribe Alk.

Review of the marine biogeochemical models
We review the key properties of marine biogeochemical models simulating Alk, seeking to share an in-depth overview of the representation of the Alk tracer in these models as well as its main interior driver, the carbonate pump. Specifically, we have collected a wide range of information from the different groups, both for CMIP5 and CMIP6, regarding the protocols followed (e.g. spin-up, initialization), the model boundary conditions (e.g. river discharge, Alk restoration), and the biological complexity and representation of explicit or implicit mechanisms (e.g. CaCO 3 production, dissolution, sedimentation). In addition, we also report the nitrogen reactions taken into account by the different models, but we do not explore their effects on the Alk distribution, which can be complex under low-oxygen conditions (e.g. . Finally, it should be noted that we were unable to collect comparable information for all mod-els, notably the HAMMOZ-Consortium group, which is assumed to be identical to MPI-M for CMIP6 in terms of marine biogeochemistry modelling (Table 1) (Table 1), such that we do not address the role of internal variability in the emergence of climaterelated changes in the key marine biogeochemistry variables we consider in this analysis.

Background processing
To facilitate the ESM intercomparison, we used Climate Data Operator (CDO) functions to regrid ESM outputs and observations. Specifically, we used distance-weighted average remapping "remapdis" to regrid the data on a regular 1 • × 1 • grid and linear-level interpolation with extrapolation 1200 A. Planchat et al.: The representation of alkalinity and the carbonate pump "intlevelx" to regrid to the World Ocean Atlas (WOA) vertical grid with 33 depth levels up to 5500 m (even though 5500 m is not accessible for some of the CMIP5 ESMs). The mid-point of the uppermost level, which we refer to as the surface hereafter, was set to 5 m, given this is the deepest upper-ocean level among the ESMs considered. The analysis was performed in Python with the use of the Gibbs Sea-Water (gsw) oceanographic toolbox for ocean property conversions. We also used mocsy 2.0 (Orr and Epitalon, 2015) to compute the ocean carbonate system over our averaging period  with the use of (i) Alk, DIC, phosphate (or nitrate divided by a Redfield ratio, r N : P = 16, if not available), salinity and temperature from ESM outputs, (ii) silicate from GLODAPv2 observations (Olsen et al., 2020) as it is not included in many ESMs, and (iii) the seawater equilibrium constants recommended for best practices Orr and Epitalon, 2015). Quality control of ESM outputs led us to (i) exclude "po4" for CMCC-CESM (CMIP5) due to anomalously high values and long-term drift, and to (ii) exclude certain values at high latitudes given at 5500 m for MIROC ESMs in CMIP5 because they were not plausible and were likely affected by an error in model output processing before outputs were shared. Finally, each model is weighted in the calculation of CMIP5 and CMIP6 statistical values (mean, standard deviation, quartiles, and linear regressions) such that each modelling group has the same total contribution. Further 3D assessment of simulated variables by basin was performed through profiles, sections, and maps. Although beyond the scope of this study, these figures are provided online: https://doi.org/10.5281/zenodo.7637538.

Drift assessment
We did not correct for potential drift in the ESM outputs in order to maintain consistency between the internal mechanisms of the ESMs. However, the piControl simulations were assessed for drift and are discussed in Sect. 4.2.1. Using the piControl data coincident with the Historical simulation and the RCP (Representative Concentration Pathway) for CMIP5 simulations or SSP for CMIP6 simulations (250-year-long piControl simulations), we were able to assess whether the ESMs had reached a quasi steady state prior to the Historical simulation. We assessed the drift of the vertical gradients of Alk, DIC, nitrate, and phosphate between the surface and deep ocean, considering the difference between the first and last 20 years of the piControl simulations. Similarly, we also estimated the drift in surface salinity and temperature as well as the spatially integrated exports of PIC and POC at 100 m. Drift assessment was not possible for MRI-ESM1 (CMIP5) due to a lack of piControl outputs and was only carried out for DIC and Alk for CanESM2 (CMIP5) as nitrate data were unavailable.

Open-ocean mask
Our analysis focuses on the representation of Alk and the carbonate pump in the open ocean. Thus, most of the analysis and the associated figures consider the open ocean (defined in Appendix A) rather than the entire ocean. Indeed, our aim was to exclude coastal regions due to the coarse ESM resolution, but particularly to avoid inclusion of river discharge effects on Alk (see Sect. 2.4). The entire ocean was considered when values were integrated to compare with observationally based estimates (e.g. PIC and POC exports at 100 m). Unless otherwise specified, differences between consideration of the open ocean and the entire ocean were negligible.

Data products
We use the gridded data from the Global Ocean Data Analysis Project (GLODAP) to evaluate model performance. This database is built with bias-corrected water column bottle data -merged with CTD data for salinity -from the ocean surface to the bottom (Olsen et al., 2020). In particular, we made use of the second update of the second version of the gridded product (GLODAPv2.2020, Olsen et al. 2020), with an improved and extended coverage compared to the original second version (Lauvset et al., 2016) and the first version (Key et al., 2004), especially in the Arctic. GLODAPv2.2020referred to as GLODAPv2 hereafter -it contains data from 946 cruises, covering the global ocean from 1972 to 2019 with two quality controls, and adjustments to minimize severe biases. Note that we use the GLODAPv2 product that was normalized to the year 2002 for DIC to avoid biases due to the accumulation of anthropogenic carbon over the observational period. For consistency, we use nutrient fields from GLODAPv2 rather than those given by WOA, maintaining the same method for mapping the nutrients as for DIC and Alk (Lauvset et al., 2016). As modelling groups have internal protocols for sharing volumetric concentrations of their outputs (e.g. for Alk and DIC, mol m −3 ), and as most ESMs apply the incompressibility assumption for the ocean, we convert GLODAPv2 observations from gravimetric to volumetric units using a constant density of 1026 kg m −3 (approximately mean surface density).
To evaluate the simulated export of CaCO 3 from the upper ocean, we use the latest estimate of Sulpis et al. (2021) -although referenced to 300 m -, which is consistent with another recent estimate from Battaglia et al. (2016). While Battaglia et al. (2016) is an observationally constrained probabilistic evaluation, Sulpis et al. (2021) is an assessment from seawater chemistry and water-age data. These estimates seem to mark a point of agreement (76.0 ± 12.0 Tmol yr −1 for the former and 75.0 [60.0; 87.5] Tmol yr −1 for the latter) in the evaluations carried out since the late 1980s, which range from about 45 to 150 Tmol yr −1 (Sulpis et al., 2021). This reflects both the sparsity and collection biases of in situ data from sediment-trap measurements and the difficulty in eval-uating the contribution of CaCO 3 to the Alk budget with interpolated observations and numerical tools. To evaluate the simulated export of POC at 100 m, we compare the models to the observationally derived estimate of 558 Tmol yr −1 from DeVries and Weber (2017). For simplicity, we chose to use the latest estimates in our analysis as data reference values, although large uncertainties remain for PIC and POC export from the surface ocean (∼ 50 to ∼ 150 Tmol yr −1 and ∼ 300 to ∼ 1200 Tmol yr −1 respectively), as discussed by Sulpis et al. (2021) and DeVries and Weber (2017). The observationbased rain ratio is 0.14 and was computed from integrated PIC and POC export values from Sulpis et al. (2021) and De-Vries and Weber (2017) respectively.

Salinity normalization
As Alk is highly correlated with salinity in the upper ocean due to freshwater fluxes (e.g. precipitation, evaporation, and river discharge; Friis et al., 2003), salinity normalization is required to assess the influence of biogeochemical processes. We use the canonical normalization approach of dividing Alk and DIC values by the coincident salinity and multiplying this by a reference salinity value of 35 g kg −1 : which gives the Alk and DIC that the considered fluid parcel would have at a salinity of 35 g kg −1 (e.g. Sarmiento and Gruber, 2006;Fry et al., 2015). This approach was deemed appropriate given that our analysis is focused on the global open ocean, and therefore near-zero salinity values simulated by certain ESMs in the coastal ocean or closed seas are not taken into account. Hereafter, salinity-normalized Alk and DIC are referred to as sAlk and sDIC respectively. The influence of alternative salinity normalization techniques (Robbins, 2001;Friis et al., 2003;Carter et al., 2014;Koeve et al., 2014;Sulpis et al., 2021) on our results was assessed and found to be limited (see Appendix B).

Estimating the biological pump and related quantities
The expression and quantification of the biological carbon pump are essential to understanding the influence of biological processes on the distribution of Alk. The biological pump can be split into a soft-tissue pump associated with the production and remineralization of organic matter and the carbonate pump associated with the production and dissolution of CaCO 3 . Here we define the pumps relative to the surface following Sarmiento and Gruber (2006). This is broadly equivalent to the TA * method developed by Feely et al. (2002), which uses apparent oxygen utilization (AOU) instead of nitrate and/or phosphate concentrations. Our choice of method here was influenced by the direct availability of the nitrate and phosphate fields simulated by the ESMs (un-like AOU). Thus, we express the soft-tissue pump (δC soft ) and carbonate pump (δC carb ) as where NO − 3 and PO 3− 4 respectively refer to the nitrate and phosphate concentrations, and for each tracer τ , δτ = τ τ surf is the difference between a tracer concentration and its surface value. r C:P , r C:N , and r N:P are C : P, C : N, and N : P ratios, and r Nut:P is a nutrient-to-phosphorus ratio with regards to the effect of the soft-tissue pump on Alk. The C : P ratio is model-dependent in our analysis (r C:P = 106 for GLO-DAPv2 and the CMIP5/6 ensemble mean), whereas the N : P ratio is fixed (r N:P = 16) and r P:P = 1 by definition. We can thus infer r Nut:P = r N:P + r P:P + 2 · r S:P = 21.8, where we assume the S : P ratio at r S:P = 2.4 for the observations (Wolf-Gladrow et al., 2007). Since the effect of sulfur is not taken into account in models, we use r Nut:P = r N:P + r P:P = 17 for the ESMs (e.g. Brewer et al., 1975;Sarmiento and Gruber, 2006). This definition of the biological pump does not take into account the influence of ocean circulation. As a consequence, we limit the consideration of these pumps to horizontally averaged open-ocean regions, and the calculation with phosphate is preferred in the analysis when possible. Indeed, very low nitrate concentrations can be observed at the ocean surface in locations where significant phosphate remains. Our restriction of this calculation to open-ocean regions also reflects concerns that nutrient inputs from the ocean boundaries may also bias estimates of the pumps in coastal regions . We use the same decomposition approach for all ESMs and GLODAPv2, neglecting that (i) the soft-tissue pump has no impact on Alk in BFM4 and MEDUSA-2.0, (ii) CMCC ESMs, NOAA-GFDL ESMs (excluding GFDL-CESM4), and NCAR ESMs involved in CMIP6 have variable or various r N:P and/or r C:P , and (iii) CMCC for CMIP5 had no representation of the carbonate pump (see Supplement Table S1).
From this definition, it is important to highlight the dependency of the carbonate pump on the soft-tissue pump, combining Eqs. (2) and (3): A positive soft-tissue pump value refers to net remineralization at a given depth compared to the surface and a negative value to net organic matter production. Similarly, a positive carbonate pump corresponds to net dissolution relative to the surface, while a negative value corresponds to net calcification. Net remineralization compared to the surface results in positive values for both the soft tissue and carbonate pumps, whereas net dissolution compared to the reference level results in positive values only for the carbonate pump. As highlighted by Sarmiento and Gruber (2006), δC soft and δC carb are "potential" pumps as they reveal the biological processes that drive the distribution of sAlk within the ocean but fail to account for the indirect effect of biology on air-sea CO 2 fluxes. Alk and the carbonate cycle are closely linked due to the effect of calcification and dissolution, but the soft-tissue pump also impacts Alk. To estimate the drivers of ESM sAlk vertical profile biases in the open ocean, we decompose sAlk. We start by differentiating sAlk: Rewriting Eqs. (3) and (4), δAlk can be expressed in terms of the carbonate and soft-tissue pumps: This means that, at a given depth z, Alk can be expressed as follows: where Alk surf refers to the surface Alk. Combining Eqs. (7) and (5), this results in distinguishing four terms related to the role of surface Alk, salinity, and both the carbonate and soft-tissue pumps. Using the operator defined on a tracer τ by τ = τ Simu − τ Obs , we can express, as a first approximation, from Eq. (8), the difference in sAlk at a given depth z between the ESMs and the observations from GLODAPv2 using a reference value at the surface: where the overbar corresponds to the mean bias between the simulations and the observations. In this way, we compute a relative decomposition -since components are not fully independent (e.g. the carbonate pump is computed from the soft-tissue pump) -to compare the different terms with the observations. Although another approach was proposed by Oka (2020), our intention here is to isolate the role of the surface Alk bias, which is key in driving carbon fluxes.

Survey of relevant model parameterizations
Our review of the representation of Alk and the carbonate pump in the ESMs leads us to share a synthesis, which allows us to compare the different models and their evolution from CMIP5 to CMIP6 (Fig. 2). Here we document the key features we have identified regarding the carbonate pump and the global Alk budget. Additional model information is provided in Fig. C1 and in Appendix C, including specific model equations and parameters. A detailed overview of the modelling schemes used by the different groups is provided in Supplement Table S1. While CMIP models generally represent the carbonate pump in a limited number of formulations, the specific details of these often vary between models.

Calcification
All biogeochemical models that consider the carbonate pump represent pelagic calcification implicitly in both CMIP5 and CMIP6. No model explicitly incorporates a representation of a calcifying planktonic functional type (PFT). For most of the models, biogenic CaCO 3 is in the form of calcite, although aragonite is also considered by NOAA-GFDL in TOPAZ2 and COBALTv2. Certain groups represent a generic biogenic CaCO 3 (CSIRO with WOMBAT for CMIP6, MIROC with OECO1/2 for CMIP5/6, and MOHC with diat-HadOCC for CMIP5) but attribute it either to calcite or aragonite based on their outputs and consistency with these two forms of CaCO 3 (e.g. export distribution) to conform to CMIP output requirements. Finally, we note that no model represents benthic production of CaCO 3 . This reinforces the decision to focus our analysis on the open ocean and to exclude the coastal ocean when possible. The parameterizations used to represent implicit pelagic calcification are various and show dependence on a variable number of drivers. Most of the models determine implicit calcification rates as a function of the fate of phytoplankton (through mortality and excretion by zooplankton after grazing), with certain models additionally considering the fate of zooplankton (through mortality and excretion by other zooplankton after consumption). In contrast, in CMIP5, diat-HadOCC for MOHC, OECO1 for MIROC, and TOPAZ2 for NOAA-GFDL are the only models for which calcification is directly related to phytoplankton growth. In addition, model calcification exhibits various dependencies on nutrient concentrations (phosphate, nitrate, iron, and silica), temperature, light, depth, and the calcium carbonate saturation state ( ): where CO 2− 3 is the carbonate ion concentration, Ca 2+ is the calcium ion concentration -which is considered propor- Figure 2. ESM representation of processes related to Alk and the carbonate pump. The representation and/or parameterization of each process has a subscript code associated with the potential values (1, 2, 3 or is not applicable -N/A). A more complete representation of this figure is available in Fig. C1, with further model details in Supplement Table S1. tional to salinity -and K sp is the apparent solubility product of CaCO 3 . is approximated in some models to the ratio between the carbonate ion concentration and that at saturation, CO 2− 3 sat . NOAA-GFDL with TOPAZ2, BLINGv2, and COBALTv2 and MOHC with MEDUSA-2.0 all consider CaCO 3 production to be dependent on the saturation state, with no calcification in undersaturated waters ( < 1). The implicit calcification parameterizations adopted by ESMs directly relate net CaCO 3 production to the export of PIC, as opposed to gross CaCO 3 production, of which only a fraction is exported. By not explicitly resolving the grazing of calcifying plankton and partial egestion of CaCO 3 by zooplankton (e.g. due to gut dissolution), it is expected that simulated CaCO 3 production will generally be less than observational estimates of the total production of biogenic calcium carbonate.

Sinking and dissolution
The sinking of PIC is model-dependent, with both explicit and implicit representations in the CMIP5 and CMIP6 ensembles. When represented explicitly, a sinking speed is considered for PIC. This speed is constant in the models with the exception of PISCESv1/2(-gas), where it is depth-dependent. PIC dissolution is computed using a dissolution rate and/or a dependence on the calcium carbonate saturation state. This dependence on the saturation state is variously represented in the models. Generally, PIC dissolution in the water column occurs in undersaturated waters ( < 1), with a linear dependence on the saturation state (BFM5.2, PISCESv2(gas), HAMOCC5.2/6, HAMOCC5.1/iHAMOCC, TOPAZ2, and COBALTv2), although its representation is more complex in PISCESv1 and BLINGv2.
When PIC sinking and dissolution are represented implicitly, a dissolution length scale and an exponential decay of the downward flux divergence of CaCO 3 represent the combination of instantaneous sinking and dissolution in the water column. diat-HadOCC is unique in representing PIC dissolution homogeneously through the water column below a globally uniform lysocline depth -the upper limit of the transition zone, where sinking CaCO 3 starts to substantially dissolve.

Ballast and protection effects
In the water column, PIC can be considered both as a ballast for organic matter, increasing the sinking speed of POC, but also as a protector of organic matter reducing the rate at which it is remineralized. It is relevant to distinguishing both processes, as their feedback on the soft-tissue pump are often exclusively treated. However, what modelling groups typically consider a ballast effect is generally better described as a protection effect, as it reduces organic matter remineralization during the sinking of POC. The formulation of this process is typically based on the model proposed by Armstrong et al. (2001) in which a component of the sinking POC flux is associated with sinking CaCO 3 and experiences reduced remineralization. It is generally parameterized using the data collated by MEDUSA-2.0, BEC, MARBL, TOPAZ2, BLINGv2, and COBALTv2. PISCESv1/2(-gas) and BEC are the only models which parameterize a ballast effect. In PISCESv1/2(-gas) half of the POC produced by nanophytoplankton is associated with calcifiers and is routed to quickly sinking particles, while in BEC and MARBL a fraction of the POC is associated with the higher-dissolution length scale for "hard" particles.

Sedimentation and Alk sources/sinks
The fate of PIC reaching the seafloor is one of the determinants of the ocean Alk inventory and closure of the CaCO 3 budget. There is a high diversity among models in their representation of sedimentation processes associated with calcium carbonate. For some models, all of the PIC reaching the seafloor is considered permanently buried and lost from the ocean (e.g. CMOC and OECO2). Other models dissolve all of the PIC reaching the seafloor, closing the calcium carbonate cycle and avoiding its processing in the seabed (e.g. WOMBAT and NPZD-MRI). A final sub-set of models represents sediment processes. Some of these distinguish a dissolved and buried PIC fraction (e.g. CanOE, BFM5.2, and PISCES), while others represent diagenesis with a sediment module (e.g. HAMOCC, BLINGv2, and COBALTv2).
Sedimentation is one way to balance broader inputs, and especially riverine discharge that many models either ignore or represent in only simplified ways (freshwater, Alk, DIC, and nutrient discharge). At the global scale, the sedimentation of PIC at the seafloor corresponds to a net biological sink of Alk, while sediment mobilization -essentially through the dissolution of CaCO 3 present in sediments -and river discharge are a net source. Although Alk sinks and sources are ideally balanced in steady state to avoid drift in the global Alk inventory, this is difficult to achieve in certain models and is forced in others through the use of a fixed Alk inventory and a restoring term. As a result, sedimentation processes appear to be key to closing the CaCO 3 budget and are further discussed in Sect. 4.4.2.

Alkalinity
The representation of surface Alk has evolved from CMIP5 to CMIP6 with a convergence of the global average value within the model ensembles, while regional disparities remain but to a lesser extent (Fig. 3). In CMIP5, the openocean mean surface Alk is higher than the observations (+0.022 mol m −3 ; +0.9 %), and in CMIP6 it is lower (−0.030 mol m −3 ; −1.3 %). This reflects a global decrease in surface Alk between CMIP5 and CMIP6, with an inversion of the bias relative to GLODAPv2 observations ( Fig. 3 and Fig. 4a). In addition, from CMIP5 to CMIP6, the variability among the ESMs with regards to surface Alk was reduced, with a decrease in the ensemble standard deviation of surface Alk globally (from 0.057 to 0.047 mol m −3 ), and particularly in the Arctic Ocean (Fig. 3). However, in both CMIP5 and CMIP6, global-mean surface biases relative to the observations cannot be attributed to a specific and consistent regional bias among the ESMs (see Fig. D1).
Normalizing Alk by salinity (sAlk) to remove the impact of freshwater fluxes has little impact on CMIP ensemble biases (Fig. 4). Indeed, the open-ocean mean sur-face sAlk bias compared to observations is reversed and reduced in CMIP6 (−0.019 mol m −3 ; −0.8 %) compared to CMIP5 (+0.034 mol m −3 ; +1.4 %), and the ensemble standard deviation of surface Alk has slightly decreased (from 0.044 to 0.039 mol m −3 ; Fig. 4a). We can therefore infer that these changes are mainly driven by biogeochemical processes rather than changes in surface salinity driven by freshwater fluxes. In particular, the zonally averaged sAlk for CMIP6 is closer to observations, with an enhancement in meridional variability (Fig. 4a). This improvement between CMIP5 and CMIP6 seems to be mainly due to the elimination of some poorly performing models in CMIP6. The CNRM-CERFACS, MOHC, MIROC, and NCC ESMs in particular have a more consistent representation of the standard deviation of the sAlk surface distribution compared to observations, alongside improved correlation (Fig. 4b). However, the standard deviation of sAlk in MRI ESMs has slightly decreased from CMIP5 to CMIP6, moving away from observations, although the correlation is similar. Furthermore, there is only improvement in the correlation of sAlk in CMCC, while for IPSL the sAlk correlation is improved, but this is accompanied by an excessive increase in the surface standard deviation.
Associated with the global improvement in the representation of the surface sAlk is a significant increase in the sAlk vertical gradient (Fig. 5). The groups for which the ESMs show an improvement in the correlation and a considerable change from CMIP5 to CMIP6 in the surface sAlk standard deviation -corresponding either to an improvement (CNRM-CERFACS, MOHC, MIROC, and NCC) or a large bias (IPSL) -are the groups that reveal major improvement in the vertical profile of sAlk (Fig. 5). Indeed, from a relatively uniform sAlk profile, they now exhibit a profile with increased Alk at depth, more consistent with observations, albeit with concentrations too high. For instance, the magnitude of the sAlk vertical gradient (the concentration anomaly at 5000 m with respect to the surface) has increased from 0.02 mol m −3 in CMIP5 to 0.17 mol m −3 in CMIP6 for the IPSL ESMs, and from 0 mol m −3 in CMIP5 to 0.17 mol m −3 in CMIP6 for MOHC ESMs. The ESMs of these groups are predominantly responsible for the strengthened sAlk vertical gradient in CMIP6, which has increased from 0.05 ± 0.05 to 0.12 ± 0.05 mol m −3 (2.6-fold). The magnitude of the CMIP6 sAlk vertical gradient is now closer to that of the observations (0.16 mol m −3 ).

PIC export at 100 m
The global improvement in the representation of surface sAlk and the increase in the sAlk vertical gradient in CMIP6 is accompanied by an enhancement of the carbonate pump. This is illustrated by a global increase of 11 % in the PIC export at 100 m between CMIP5 (49 Tmol yr −1 ) and CMIP6 (55 Tmol yr −1 ; Fig. 6). Additionally, we report a decoupling in the trends from CMIP5 to CMIP6 for the PIC and POC ex- ports, with an increase for the former and a decrease for the latter by 7 % between CMIP5 (712 Tmol yr −1 ) and CMIP6 (659 Tmol yr −1 ; Fig. 6). The combination of the two results in a 20 % increase in the rain ratio (RR) -defined as the ratio between PIC and POC export at 100 m (from 0.070 in CMIP5 to 0.083 in CMIP6; Fig. 6). Overall, CMIP6 ESMs tend to better match observational estimates of the exports and the RR compared to CMIP5 ESMs. While the CMIP6 average is 18 ± 27 % higher (+100 ± 151 Tmol yr −1 ) for the POC export compared to the observationally informed estimates from DeVries and Weber (2017), it is 28 ± 26 % lower (−21 ± 20 Tmol yr −1 ) than the estimate from Sulpis et al. (2021) for the PIC export, highlighting inter-ESM variability (Fig. 6). Although there is a global increase in the RR from CMIP5 to CMIP6, this shift is strongly associated with certain ESMs, specifically, CNRM-CERFACS and IPSL, where the increase is principally due to enhanced PIC export, and NCC, where it is due to reduced POC export (Fig. 6). While our focus is on the open ocean, when the global ocean is considered, the integrated CMIP6 mean POC and PIC exports both increase by 31 % (+155 and +13 Tmol yr −1 respectively), reflecting the importance of coastal ocean export.
The spatial distribution of the RR has also evolved from CMIP5 to CMIP6, reflecting the variable trends of PIC and POC export. There is typically a greater increase in the RR at high latitudes, due to a higher increase in PIC export and a smaller decrease in POC export (see Fig. D2). There is also diversity among the ESMs with respect to the spatial distribution of PIC export at 100 m, with notable differences in the Pacific equatorial upwelling region, the great calcite belt in the Southern Ocean, and the coastal ocean (see Fig. D3 and Fig. D4). These points of disagreement between the ESMs are interestingly also found within the estimates proposed by Lee (2001), Jin et al. (2006), Sarmiento and Gruber (2006), and Battaglia et al. (2016).

The carbonate pump
Using the decomposition of the sAlk bias at depth expressed in Eq. (9), we can distinguish the roles of surface Alk, salinity, the carbonate pump, and the soft-tissue pump in driving biases between the ESMs and observations at depth. The carbonate pump and surface Alk are found to explain the large majority of sAlk biases at depth (Fig. 7). Surface Alk contributes 34 % of the ensemble-mean sAlk bias at 5000 m in CMIP5 and 41 % of the bias in CMIP6, while the carbonate pump contributes 51 % of the ensemble-mean bias in CMIP5 and 46 % of the bias in CMIP6. Their respective influence has nevertheless changed from CMIP5 to CMIP6, with a greater relative contribution of surface Alk to the sAlk bias and a reduced contribution of the carbonate pump, which is further analysed in Sect. 3.3. In contrast, we find that salinity and the soft-tissue pump have a minimal influence on the sAlk bias at 5000 m in both CMIP5 and CMIP6 ensembles, contributing less than 10 %.
The CMIP6 increase in PIC export at 100 m globally acts to decrease sAlk at the ocean surface and increase it at depth. This increase could be explained by enhanced upper-ocean production and/or enhanced sinking of PIC and/or dissolution at depth. Using all the ESMs, we find a significant relationship between the sAlk vertical gradient in the open ocean and the global PIC export at 100 m (R 2 = 0.54, p<0.01; Fig. 8a). This reflects inter-ESM consistency between higher export of PIC at 100 m and an associated increase in the vertical gradient of sAlk -expressed as the difference between the mean sAlk between 4000 and 5000 m and the mean sAlk between 5 and 100 m. In particular, it highlights the shift towards higher values of PIC export and a strengthened sAlk vertical gradient in CMIP6. The relationship encompasses the wide variety of CMIP modelling schemes used to represent Alk and the CaCO 3 cycle, despite differences in the representation of sinking, dissolution, and seabed processes. It should be noted, however, that some models clearly stand out, as is the case for UKESM1-0-LL (CMIP6) and CNRM-ESM2-1 (CMIP6). This discrepancy may be explained by the fact that for the biogeochemical scheme in UKESM1-0-LL (MEDUSA-2.1), the soft-tissue pump does not affect Alk and thus does not attenuate the sAlk vertical gradient. As for CNRM-ESM2-1 (CMIP6), very high values of the PIC export at 100 m in the Japan Sea may explain its excessive PIC export relative to the other ESMs.
The relationship established between the global sAlk vertical gradient and PIC export across the CMIP5/6 ESMs can be combined with the sAlk vertical gradient from the GLO-DAPv2 observations to infer PIC export at 100 m. This ap- proach, similar to so-called "emergent constraint" methodologies (e.g. Eyring et al., 2019;Hall et al., 2019), provides a present-day PIC export estimate of 44-55 Tmol yr −1 at 100 m (see the black vertical line in Fig. 8a). This estimate is lower and out of the confidence interval of PIC export values independently assessed by Sulpis et al. (2021; 76 ± 12 Tmol y −1 at 300 m) and Battaglia et al. (2016;75.0 [60.0;87.5] Tmol yr −1 ). This reflects an apparent underesti-mation of the simulated PIC export at 100 m for a given sAlk vertical gradient in comparison with observational estimates.
The sAlk vertical gradient across the combined CMIP5/6 ESM ensemble is also consistently related to the surface meridional distribution of sAlk through the Meridional Overturning Circulation (MOC) with the upwelling of Alkenriched deep waters in the Southern Ocean. In particular, models with a higher sAlk vertical gradient have higher meridional gradients of sAlk at the surface -expressed as the difference between surface sAlk in the Southern Ocean, [−90, −45] • , and the low latitudes, [−45, 45] • -(R 2 = 0.46, p<0.01; Fig. 8b). Here again, the shift towards higher values for the meridional sAlk gradient and the vertical sAlk gradient from CMIP5 to CMIP6 is noticeable. Despite known differences in the representation of Southern Ocean upwelling (Beadling et al., 2020) and the spatial distribution of PIC export within the CMIP5/6 ensemble, the relationship found for the ESMs agrees relatively well with the GLODAPv2 observations.
Differences in simulated ocean circulation do not appear to be a major driver of differences in Alk gradients across the CMIP ensemble. Other factors being equal, an overly sluggish ocean circulation for instance would lead to water masses at depth that are too old and thus amplify the sAlk vertical gradient. However, using Atlantic and Southern Meridional Overturning Circulation indices (AMOC and SMOC) as proxies of the ocean overturning circulation (Heuzé et al., 2015;Heuzé, 2021), we find no robust relationship between the intensity of the global-scale ocean circulation and the sAlk vertical gradient across the CMIP ensemble.

Pump decomposition and implications
The improvement in the representation of the carbonate pump from CMIP5 to CMIP6 strengthens the vertical gradient of sAlk but now leads to excessively low sAlk concentrations in the upper water column (including at the surface; Fig. 9a). The full decomposition of the sAlk vertical gradient into its main drivers (see Sect. 2.5) gives insights into the respective roles of the carbonate pump (analysed in Sect. 3.2.3), the soft-tissue pump, and both the surface Alk and salinity biases. The positive bias of the soft-tissue pump, present in CMIP5, has increased in CMIP6, but its influence on the sAlk bias remains limited. The use of the same r Nut:P for the observations and ESMs (instead of 21.8 and 17 respectively; see Sect. 2.5) would have driven a minor offset in the bias associated with the carbonate pump (e.g. an sAlk bias reduction of 0.006 mol m −3 at 5000 m) without impacting the shape of the bias throughout the water column.
Despite the improvement in the representation of sAlk in CMIP6, the representation of the calcite and aragonite saturation horizon depth has worsened. From CMIP5 to CMIP6, a deepening of simulated saturation horizons has moved the ESMs away from observations (Fig. 9c). Moreover, although  the interquartile range of simulated calcite and aragonite saturation horizons in CMIP6 is reduced compared to CMIP5, the ensemble range remains considerable. Focusing on the CMIP6 ensemble, the bias in the aragonite saturation horizon depth seems to be strongly driven by Atlantic Intermediate Water (see Fig. D6b). In contrast, the global bias for the calcite saturation depth in both CMIP5 and CMIP6 results from partial compensation between a saturation depth in the equatorial Pacific Ocean that is too shallow and a saturation depth in the North Pacific that is too deep. These regional biases showed greater global compensation in CMIP5, although the individual regional biases have been reduced in CMIP6 (see Fig. D6b). Although there is a slight deterio-ration in the representation of the saturation horizons from CMIP5 to CMIP6, they remain globally in broad agreement with observations. This is due to compensation between the negative biases of the respective sAlk and sDIC vertical profiles, notably in the sub-surface. The resulting vertical profile bias of sAlk-sDIC, an approximation of the carbonate ion concentration, is therefore lower than that of sAlk and sDIC (Fig. 9).  Table S1). Such changes are insufficient to explain the increase in the intensity of the carbonate pump and in the vertical Alk profile as seen from CMIP5 to CMIP6. Although a CaCO 3 cycle has been added to the BFM biogeochemical model, in other models, parameterizations generally changed little between the two CMIP exercises. Improvements in the range of processes that can be represented with respect to the CaCO 3 cycle are limited and model-dependent. Certain ESM groups have changed their embedded ocean biogeochemical model between CMIP5 and CMIP6, with consequent changes in the CaCO 3 cycle scheme. For example, the transition from TOPAZ2 to COBALTv2, which includes enhanced resolution of the plankton food web, in NOAA-GFDL has changed the parameterization of aragonite and calcite production. One trend is towards a more complete representation of the fate of PIC at the seafloor in CMIP6 with the expansion of the use of sediment modules to at least partly balance the global ocean Alk content. This indicates that most of the model performance changes from CMIP5 to CMIP6 are likely associated with parameter tuning, or ad hoc settings, and potentially with a general increase in the horizontal and vertical model resolution and improved representation of ocean circulation .

Inconsistencies in protocols and future recommendations
This analysis and review of the modelling schemes has provided insight into the protocols followed by the modelling groups and the implications this has for ESM outputs, lead- ing us to make several recommendations for the ocean biogeochemical modelling community.

Drift and spin-up
The drift that we assess in the ESMs is low enough to have minimal influence on our non-drift-corrected results centred on 2002 (see Sect. 2.2.1, Fig. 10, and also Figs. D7 and D8). For instance, the model drift per century of the sAlk and sDIC vertical gradients is less than 8 % of the observed vertical gradients. Similarly, model drift in surface-ocean salinity, temperature, and exports at 100 m is also limited and should have minimal influence on the results, in part due to salinity normalization. The largest drift was observed for CMCC-CESM (CMIP5) and CNRM-ESM1 (CMIP5); specifically, the global DIC inventory of CMCC-CESM (CMIP5) had not reached equilibrium prior to the Historical simulation. In CMIP6, these two groups have increased the spin-up du-ration, although the relative part of their online spin-up has decreased.
There is great diversity with regards to the spin-up strategy employed by different groups (Séférian et al., 2016). Séférian et al. (2020) discussed this and pointed out that the spin-up duration has increased for all groups except IPSL and NOAA-GFDL. For these two groups, the considerable increase in resolution from CMIP5 to CMIP6 was balanced against a reduced spin-up duration as well as the completion of a fully online spin-up in the case of IPSL. Finally, we highlight two contrasting spin-up strategies with consequences for the mean ocean state of Alk and the CaCO 3 cycle. In MPI-M ESMs for CMIP5, the model was initialized with the same Alk and DIC values in all ocean grid cells and, to reduce the spin-up duration, Alk was indirectly tuned to achieve a consistent representation of the ocean CO 2 sink. This was achieved through increasing weathering fluxes and the CaCO 3 content in sediments, leading to an increase in Alk and DIC to maintain the desired pCO 2 field (Ilyina et al., 2013b). This explains the strong offset in both Alk and DIC content for MPI-M in CMIP5 (see Fig. C1). An alternative strategy was developed by NCAR for CMIP6 regarding the balancing of the global ocean budget of Alk. During the spin-up, the saturation-state threshold for the burial of CaCO 3 was tuned to balance the loss of Alk from the burial of CaCO 3 and the riverine input of Alk before starting the experimental simulations (Long et al., 2021). This resulted in the choice of an unusual threshold for the burial of CaCO 3 (see MARBL in Appendix C). Similarly, NOAA-GFDL for CMIP6 in COBALTv2 set sediment calcite concentrations such that Alk lost through calcite burial balanced river Alk inputs at a certain year during the spin-up (Dunne et al., 2012). While it is probably advisable that model groups continue to work to balance the Alk budget at quasi steady state, observations suggest there may have been a slight net sink of Alk during the Holocene and therefore potentially an ocean carbon source to the atmosphere (Ciais et al., 2013;Cartapanis et al., 2018). The strategy of maintaining a degree of freedom at the seafloor during spin-up with the tuning of parameters associated with the CaCO 3 sedimentation processes at the bottom of the ocean seems to be relevant to balancing the overall Alk budget at equilibrium. However, given the difficulty in running ESM simulations to equilibrium during the model development process, it is likely that drift correction of ocean CO 2 system variables will continue to be a requisite of robust ESM intercomparisons.

Alk and DIC initialization
As previously discussed, drift can depend not only on the spin-up strategy, but also on the initialization strategy. Indeed, it is interesting to examine the initialization of Alk and DIC in ESMs and to assess how this may influence model performance. Alk and DIC fields are recommended to be initialized using the second version of the GLODAP product (GLODAPv2, Lauvset et al., 2016) following the OMIP-BGC protocol for CMIP6 (Orr et al., 2017). For CMIP5, many groups were initialized with the first gridded version (GLODAPv1, Key et al., 2004), even though this was not specific to a protocol. GLODAPv2 surface fields are more heterogeneous than GLODAPv1, but it is mainly in the coastal ocean, and in particular in the Arctic, that the two datasets diverge due to the addition of new observations. At global scale, the difference between the GLODAP products does not appear to have a noteworthy impact on the representation of both present-day Alk and DIC (Fig. 11a, b and see also Appendix E). Thus, neither the change in the GLODAP mapping method nor the increase in the number of observations is responsible for the improvement in the Alk representation from CMIP5 to CMIP6. In fact, a number of groups did not follow the OMIP-BGC protocol and instead continued using GLODAPv1 to initialize their CMIP6 models (see Supplement Table S1).
Surprisingly, it is assumptions in the conversion of the GLODAP Alk field from gravimetric to volumetric units that contribute to differences between ocean Alk distributions across ESMs. The Alk field is provided in micromoles per kilogram in the GLODAP-mapped product and must, with the exception of the NOAA-GFDL models, be converted to volumetric units (e.g. mol m −3 ) to be used by the marine biogeochemical models. Multiple approaches are employed by the groups when performing this conversion. For most of the ESMs, the conversion was made using a constant seawater density, which itself varies between ESMs (from 1024 kg m −3 for ACCESS-ESM1-5 (CMIP6) to 1028 kg m −3 for CNRM-CERFACS, IPSL, and MIROC) with in situ density used by the other ESMs (see Fig. C1). Further investigation reveals that the method of volumetric conversion for the initial Alk field drives a surface Alk bias that influences both surface DIC concentrations and the global DIC content of the ocean. Although this bias in the initialization is confined to depth depending on the density used for the conversion (see in situ, potential, and 1026 kg m −3 in Fig. 11c), it results in a perturbed surface-ocean CO 2 system once the ESM reaches a quasi steady state. Indeed, in an ocean with a conservative Alk inventory, the biological pump and in particular the carbonate pump only influence the distribution of Alk. Whatever the initialization of ocean DIC, the Alk content of the ocean is fixed at its initial value, and its gradient is determined through biogeochemical and physical processes. This means that surface Alk after spin-up can be inferred from the initialization field and the biogeochemical processes and ocean dynamics represented within a model. As a result, without taking into account salinity and temperature, the equilibrium between surface Alk, DIC, and atmospheric pCO 2 tends to set the surface DIC concentration at the end of spin-up through air-sea carbon fluxes. The global content of DIC is therefore directly impacted by surface Alk after spin-up with an atmosphere considered an infinite carbon reservoir. Finally, although the NOAA-GFDL models avoid initialization issues by keeping tracers in gravimetric units, the use of a constant density of 1035 kg m −3 during post-processing to produce "talk" and "dissic" fields distorts the ocean CO 2 system from the real ESM outputs, in particular with excessive surface values (see Fig. D1).
In addition to correct initialization of the global Alk inventory, initialization with a spatial Alk distribution consistent with observations is also required to allow accurate computation of the ocean CO 2 system and likely air-sea carbon fluxes (see mocsy 2.0; Orr and Epitalon, 2015). Indeed, the ocean biogeochemical models are generally only able to directly affect the alkalinity components associated with carbonates, phosphates, and sometimes silicates, while the other components are computed from pH (water alkalinity), salinity (fluoride alkalinity), or both (borate; Uppström, 1974;Lee et al., 2010). Thus, an inaccurate initialization of the Alk distribution (e.g. for groups initializing with a constant density) could indirectly repartition Alk between its differ- Figure 10. Resolution, spin-up and evaluation of drift in CMIP5 and CMIP6. ESM resolution and spin-up are normalized between 0 and 1, so that the highest model resolution and longest respective spin-up correspond to 1 (the maximum value attributed for the spin-up duration was set to 8000 years to keep the differences between MPI models and other ESMs visible). Model drift is normalized between −1 and 1 so that the model with the maximum absolute drift corresponds to either 1 or −1, and better-performing models have lighter cell colours. For each row, the highest model drift per century is expressed as a percentage of the observational estimate (e.g. CESM2-WACCM drift in PIC export at 100 m represents 1 % per century of observational estimates). Vertical gradients are assessed with the difference between the mean between 4000 and 5000 m and the mean between the surface and 100 m, while "upper ocean" refers to the mean between the surface and 100 m. ESMs with an issue are marked with an asterisk, and when information is missing, the cell is left blank. POC and PIC export drifts at 100 m are for the global ocean and not the open ocean, as is the case for other variables. CNRM-ESM2-1 (CMIP6) export data were not included due to extreme values in the Japan Sea, which affect the colour-scale normalization and might partly explain the model's high PIC export at 100 m (Fig. 6). Additional information on model resolution, spin-up and drift is available in Fig. C1 and Supplement Table S1. ent components, including those not directly affected by biogeochemical processes in the models. The influence of even small biases in borate alkalinity has been shown to have nonnegligible effects on the ocean CO 2 system (Orr and Epitalon, 2015).
In summary, standardizing the Alk initialization protocol in CMIP exercises would reduce biases in the representation of ocean carbonate chemistry, especially at the surface. As ocean models typically apply an incompressibility assumption, it is meaningful to initialize the Alk field with the constant reference density of a given model. While a density of 1026 kg m −3 enables models to reproduce surface Alk values consistent with observations, the use of 1035 kg m −3 conserves the total alkalinity budget. However, the use of a constant density partly flattens the Alk spatial distribution, entailing potential biases. This could be partly counterbalanced by using potential density, which has a typical profile close to a constant value while maintaining spatial variability. As a consequence, we recommend initializing the Alk field with a weighted potential density in order to keep the density of reference considered within the ocean model as the mean density while being in agreement with the physical assumptions made in the models. We advise doing the same for DIC initialization.

Improving model assessment and traceability
The different strategies for initializing Alk and DIC highlight the importance of clear data sharing and precise protocols to enable robust ESM assessments and intercomparisons. In the following, we recommend increasing the priority of certain variables in CMIP exercises (Orr et al., 2017). First, we suggest that, in future intercomparisons, model groups share the three-dimensional export of POC and PIC ("expc", "expcalc", and "exparag") and not only the exports at 100 m, as some groups have done for CMIP6. This would enable a more consistent estimate of POC and PIC export as well as the resulting rain ratio. Similarly, the potential inaccuracy of soft-tissue pump estimates based on fixed-depth POC export is well-known (e.g. Buesseler et al., 2020;Koeve et al., 2020). The simulated increase in sDIC in the upper 100 m of the water column, despite a relatively consistent sAlk, indicates that net remineralization of POC is occurring in much shallower waters than net dissolution of PIC in the ESMs. As such, POC export values should be used with caution when assessing rain ratios (see Figs. 6 and D2). Sharing of complete export fields would ideally be accompanied by three-dimensional fields of remineralization and dissolution ("remoc", "dcalc", and "darag") to facilitate analysis of processes such as the biological pump throughout the water column. Finally, sharing of vertically integrated calcite and aragonite production ("intpcalcite" and "intparag") and POC and PIC burial ("froc" and "fric") would also improve assessments of the influence of the biological pump on vertical DIC and Alk profiles (see Fig. 9a, b).
Alongside the absence of certain model outputs, one issue that has presented itself throughout our analysis is a lack of model traceability between CMIP5 and CMIP6. In the absence of publications documenting model changes, it is typically not possible to trace ocean biogeochemical model developments without contacting individual developers and in certain instances asking for the model code. To address this, we propose that developers utilize a common online platform to share their code and provide an associated model guide. Such a platform would critically improve model traceability, enhancing ocean biogeochemical model transparency and accessibility (e.g. sharing river discharge values for both Alk and DIC). Within this context, the Earth System Documentation (ES-DOC, https://es-doc.org, last access: February 2022) project is highly relevant. However, in its current form, the tool is broadly insufficient for specific studies due to the paucity of ESMs participating and the level of model documentation provided.

Model intercomparison and model-data comparison
Alk is sensitive to the density used to convert from gravimetric to volumetric concentrations. Using in situ density rather than potential density increases global-mean Alk and DIC volumetric concentrations by only 1 % (+0.023 mol m −3 for Alk and +0.022 mol m −3 for DIC), but the vertical gradient of sAlk between the surface and 5500 m increases by 32 % compared to 15 % for sDIC (Fig. 11c), while this conversion has negligible effects on nitrate and phosphate. Concentrations output by models assuming incompressibility can be considered "potential" concentrations and not "true" concentrations. The format of data sharing does not currently allow this distinction to be made. Indeed, ocean models used in CMIP exercises, which assume incompressibility, represent concentrations defined from the reference density used in the model. Thus, in order to convert modelled concentrations from volumetric to gravimetric units, the respective model reference densities should be used.
Conversion between gravimetric and volumetric concentrations can lead to biases in model intercomparisons. The use of different model reference values across an ESM ensemble results in biases unrelated to model processes when comparing "potential" concentrations. Similarly, the choice of density used to convert GLODAPv2 concentrations from gravimetric to volumetric units can shift values across the water column affecting model-data comparisons. It may therefore be worthwhile to document when model outputs are "potential" concentrations and to share the associated reference density. Ideally, all models would share "potential" concentrations with the same reference density, either by standardizing the reference densities used in the models or by using a multiplicative factor for the output concentrations.

Implications of the improved Alk representation in CMIP6
Although no major trend emerges in terms of the evolution from CMIP5 to CMIP6 with regards to the modelling schemes (see Sect. 4.1), there is nevertheless an improvement in the representation of Alk associated with a strengthened carbonate pump (see Fig. 1). Here we discuss the potential implications that this improvement may have for the ocean response to anthropogenic carbon emissions considering potential CO 2 feedbacks and the impact on ocean acidification projections.

Ocean carbon uptake and ocean acidification
In a CO 2 -concentration-driven simulation, the surface Alk and DIC are directly connected to each other through equilibration via air-sea CO 2 fluxes. As a result, a modification of global-scale surface Alk has a direct effect on surface DIC (Fig. 12a). Indeed, neglecting the effect of temperature and salinity on the partial pressure of CO 2 at the ocean surface, we can differentiate pCO 2 as follows: where pCO 2 , Alk, and DIC all refer to surface values and the partial differentials are both at fixed temperature and salinity. Rewriting the differentials "d" as the difference " " between two ESMs gives where the overbars correspond to the mean surface-ocean values for the partial differentials. At the global scale, we can assume that, for a surface ocean in balance with atmospheric pCO 2 , pCO 2 = 0. Consequently, surface differences in Alk and DIC between two ESMs are linearly related. Approximating the carbonate ion concentration to Alk-DIC and using the expression for pCO 2 given in Sarmiento and Gruber (2006), this results at the ocean surface in Substituting the mean surface-ocean Alk and DIC of the combined CMIP5 and CMIP6 ensembles, DIC 0.81 · Alk, and indeed a very similar relationship between surface-ocean Alk and DIC anomalies for individual ESMs relative to CMIP5 ensemble-mean values is found ( DIC 0.842(±0.009) · Alk; R 2 = 0.98, p < 0.01; Fig. 12a).
This relationship between anomalies in surface Alk and DIC has implications for the wider surface-ocean CO 2 system. As the slope associated with the linear regression is less than 1, an ESM with higher surface Alk will tend to have a higher Alk-DIC and therefore a higher surface concentration of CO 2− 3 and pH (Fig. 12a). The decrease in surface Alk from CMIP5 to CMIP6 therefore results in a slight decrease in pH and a generally lower calcite and aragonite saturation state, with the global surface-ocean carbonate ion concentration decreasing by 2.9 ± 2.7 % and up to 5 % in certain regions (Fig. 12b). As the timescale of air-sea CO 2 exchange can be approximated as being proportional to the carbonate ion concentration , other factors being equal, CMIP6 mean surface-ocean pCO 2 is likely to equilibrate faster with the atmosphere than that of CMIP5. An exception to this is the Southern Ocean, where upwelled deep waters are far from equilibrium with the atmosphere. In CMIP6, enhanced carbonate dissolution at depth results in upwelled Southern Ocean waters with a higher carbonate ion concentration, implying that Southern Ocean pCO 2 has a longer equilibration timescale. While the change in the carbonate pump from CMIP5 to CMIP6 seems to have a slight effect on the representation of the present-day ocean CO 2 system at the surface, it is likely to have negligible feedback on the projected ocean carbon sink, with an overall decrease in the Revelle factor (γ DIC ) of only 0.2 ± 1.3 % (Fig. 12b). The maximum potential influence of the carbonate pump and in turn surface Alk on the uptake of anthropogenic carbon over the historical era has previously been estimated as 5 % (Murnane et al., 1999). However, in the equatorial Pacific, where upwelling variability induced by the El Niño-Southern Oscillation (ENSO) strongly modulates surface concentrations of DIC and Alk, accurately reproducing the observed Alk vertical gradient in ESMs is important for correctly simulating the observed interannual variability of CO 2 fluxes (i.e. anomalously outgassing during El Niño and ingassing during La Niña events; Feely et al., 2006). In a recent study, Vaittinada Ayar et al. (2022) showed that the mean state of the Alk vertical profile in the tropical Pacific influences both projections of ENSO-driven CO 2 fluxes and long-term carbon uptake in the region.
Finally, the trend in surface Alk and DIC from CMIP5 to CMIP6 also influences spatial heterogeneity, especially between the low latitudes ([−40, 40] • ) and the Southern Ocean ([−90, −40] • ), with enhanced meridional surface gradients of Alk and DIC in CMIP6 compared to CMIP5 (+0.024 and +0.013 mol m −3 ; Fig. 13). Neglecting model differences in ocean dynamics, we can estimate that differences in the amplitude of the soft tissue and carbonate pumps between CMIP6 and CMIP5 impact the meridional surface gradients of DIC and Alk. To estimate this effect, we define an attenuation coefficient (α) as the ratio between the meridional Alk gradient at the surface (δ Southern-midlat sAlk surf ) and the verti-cal open-ocean Alk gradient (δ 5000 m-surf sAlk): We associate this coefficient with the upwelling that determines the vertical Alk gradient in the Southern Ocean. Hence, by multiplying the deviations of both the soft tissue and carbonate pumps at depth for CMIP6 relative to CMIP5 by this attenuation coefficient, we are able to trace the origin of the changes in the meridional gradients of Alk and DIC from CMIP5 to CMIP6 with a residual component (including the gas-exchange pump and anthropogenic carbon uptake; Fig. 13). This highlights that the increase in the carbonate pump from CMIP5 to CMIP6 is the main driver of the enhanced meridional surface gradients of Alk and DIC.

Transient changes in the ocean Alk budget and its distribution
A quasi equilibrium of Alk on centennial timescales is commonly accepted, but observational data have only recently allowed the total global Alk budget to be closed (Middelburg et al., 2020). The riverine input of Alk is mainly balanced by the burial of PIC, but also to a lesser extent by the remobilization of sediments (essentially through submarine weathering and anaerobic remineralization of organic matter) and the burial of organic matter. Interplay between these fluxes makes Alk central to the understanding of the processes driving atmospheric CO 2 over glacial and interglacial cycles (Archer and Maier-Reimer, 1994;Kerr et al., 2017;Boudreau et al., 2018). Although in modelling studies drift in the Alk inventory would impact the surface air-sea carbon flux, anthropogenic perturbation of the carbon cycle and the effect of climate change could cause transient changes in the ocean Alk inventory and its distribution. The timescales over which such perturbations may occur, and the potential consequences for projections of anthropogenic carbon uptake, are unclear. Ocean Alk might increase through enhanced terrestrial rock weathering and an associated increase in riverine input in response to climatic drivers (e.g. enhanced precipitation, permafrost thaw; Raymond and Cole, 2003;Drake et al., 2018). In addition, shoaling of the saturation horizon due to ocean acidification is thought to explain recent observations of enhanced CaCO 3 dissolution at the seafloor (Sulpis et al., 2018) initiating chemical carbonate compensation. This highlights the potential importance of representing sediment processes in models given that CaCO 3 dissolution enriches waters in Alk and can therefore enhance ocean carbon uptake when these waters are recirculated to the surface ocean (Archer et al., 1998;Gehlen et al., 2008). Potential dissolution of coral reef CaCO 3 in response to climate change (Cooley et al., 2022) may also increase Alk on centennial timescales -echoing the coral reef hypothesis and its potential effect on atmospheric CO 2 during glacial-interglacial transitions (Berger, 1982;Opdyke and Walker, 1992).
Finally, the possibility of implementing large-scale ocean Alk enhancement (OAE, Renforth and Henderson, 2017; Bach et al., 2019) could increase the global Alk inventory on relatively short timescales. Representation of CaCO 3 burial and sub-marine weathering is necessary in simulations of OAE, even on decadal to centennial timescales, since this could entail abrupt changes. To date, OAE model studies have typically directly enhanced surface-ocean Alk (Köhler et al., 2013;Ilyina et al., 2013a;Hauck et al., 2016;González and Ilyina, 2016;Lenton et al., 2018;González et al., 2018); however, in reality Alk is likely to be provided via the addition of a mineral such as olivine, of which only a fraction will dissolve in the surface ocean. The explicit simulation of alkaline mineral addition could enhance Alk at depth, deepening the carbonate saturation horizon, suppressing the dissolution of sediment carbonate that might otherwise occur, and increasing the burial of sinking CaCO 3 .
Strategies used to close the Alk budget in ESMs are questionable in simulations where the global Alk inventory may not be in quasi equilibrium . Indeed, in CMOC and CanOE, CaCO 3 burial at the seafloor is redissolved at the ocean surface locally to close the Alk budget. This parameterization is intended to represent fluvial Alk sources  but does not impact the spatial distribution of surface Alk in the same manner. It also couples riverine discharge of Alk with the carbonate pump in projections. As previously discussed, in CMIP6, NCAR and NOAA-GFDL with COBALTv2 control the Alk balance at depth by tuning the calcite saturation-state threshold for burial and the sediment calcite concentration. Similarly, IPSL and CNRM in CMIP5 with PISCESv1 force this balance at depth, not explicitly taking into account the processes at the sediment interface. Other ESMs dissolve all the PIC that reaches the seafloor in the last ocean level, effectively avoiding the consideration of an Alk burial sink (WOMBAT, BEC, diat-HadOCC, and NPZD-MRI). For the models that include a sediment module, control of the global Alk budget under pre-industrial conditions might be complicated. However, the approach used by some groups (CNRM-CERFACS and IPSL in CMIP5 and CMIP6) to restore the global Alk inventory to ensure its conservation can mask Alk budget imbalances and potentially bias Alk vertical profiles. In particular, this could lead to drifts if Alk is no longer restored after spin-up (CNRM-CERFACS in CMIP6).

Potential changes in the carbonate pump
An ESM representation of the carbonate pump that includes ocean acidification and climate change sensitivities has the potential to produce climate feedbacks in centennial projections. Such a model would require a relatively high level of biological realism, taking into account the complexity of the response of the CaCO 3 cycle to environmental stressors (Gattuso and Hansson, 2011;Schlunegger et al., 2019). Most studies suggest a negligible CaCO 3 cycle climate feedback this century, although on longer timescales this feedback can be more important (e.g. Gehlen et al., 2007;Ridgwell et al., 2007;Schmittner et al., 2008;Hofmann and Schellnhuber, 2009;Gangstø et al., 2011;Pinsonneault et al., 2012;Krumhardt et al., 2019). However, uncertainties and diverse responses of calcifying organisms to environmental change (e.g. Kroeker et al., 2013) make it difficult to constrain model parameterizations with confidence. Furthermore, most studies only consider a sub-set of potential impacts on the CaCO 3 cycle. Here we discuss current ESMs, highlighting developments that may affect the CaCO 3 cycle climate feedback.
Although projected ocean acidification and climate change impacts (e.g. warming and reduced upper-ocean nutrient concentrations) have diverse effects on calcification, meta-analyses generally support an expected decrease in calcification due to ocean acidification (Kroeker et al., 2013;Meyer and Riebesell, 2015;Seifert et al., 2020). This would lead to an increase in surface Alk and a decrease in Alk at depth, with possible effects on anthropogenic carbon uptake. The increased energetic cost of calcification could also impact the diversity of pelagic calcifiers and their niches (e.g. for coccolithophores; Monteiro et al., 2016). While the representation of saturation-state-dependent calcification was previously prioritized (Ridgwell et al., 2009;Gehlen et al., 2007;Gangstø et al., 2008Gangstø et al., , 2011Hofmann and Schellnhuber, 2009;Pinsonneault et al., 2012), this is no longer the case. NOAA-GFDL and MOHC in CMIP6 are the only groups in this intercomparison to consider such a dependence. We recommend that the other groups follow their lead in improving the realism of the carbonate pump response to anthropogenic emissions. The implicit representation of pelagic calcification in current ESMs depends on the fate of organic matter and therefore indirectly on net primary production (NPP). However, projected NPP changes this century are highly uncertain across the CMIP5/6 ensembles . For all ESMs any environmental impact on planktonic calcifiers does not affect organic matter production. Moreover, changes in the growth rate of calcifiers do not impact CaCO 3 production, with the exception of BEC and MARBL, in which an implicit calcite pool is considered in the phytoplankton group. The independence and potential influence of calcifiers on organic matter production (Gattuso and Hansson, 2011) would require their explicit representation through one or more PFTs. Explicit pelagic calcifiers have been implemented in MARBL (coccolithophores; Krumhardt et al., 2019) and PlankTOM (coccolithophores, foraminifers, and pteropods;Buitenhuis et al., 2019). However, the challenge in representing both organic carbon and carbonate biomasses simultaneously is complicated by scattered observational data products (see MARine Ecosystem Model Intercomparison Project (MAREMIP) papers; O'Brien, 2012;Bednaršek et al., 2012;Schiebel and Movellan, 2012). Finally, a comprehensive representation of the carbonate pump, and thus Alk, in coastal areas, even in ESMs, will probably require the representation of benthic calcifiers as they typically dominate coastal calcification (O'Mara and Middelburg et al., 2020). This would also be valuable within the context of climate change and ocean acidification ecosystem impact projections. CSIRO recently included benthic carbonate production for regional applications addressing the Great Barrier Reef (Steven et al., 2019), and NOAA-GFDL developed neritic environments, such as coral reefs and carbonaterich/carbonate-poor shelves (O'Mara and Dunne, 2019). However, no benthic calcification is currently represented in ESMs. Although CaCO 3 dissolution is essentially abiotic, certain models explicitly represent the sinking of CaCO 3 with dissolution dependent on the saturation state and therefore sensitive to ocean acidification. Specifically, models represent dissolution as linearly dependent on the saturation state in undersaturated waters (see Sect. 3.1.2 and Appendix C). This is despite laboratory studies indicating an exponent >1 in the water column (most likely around 3 to 4 for calcite) while maintaining a linear dependence at the seafloor (e.g. Subhas et al., 2015;Sulpis et al., 2017;Boudreau et al., 2020). Reconciling this mismatch between laboratory results and model parameterizations would increase simulated dissolution in less undersaturated waters and thus shallower waters. This could also partially compensate for the lack of pelagic aragonite production in most models -a carbonate mineral with a lower thermodynamic stability than calcite -while the absence of simulated aragonite could influence projections of how the distribution of Alk responds to acidification. The shoaling of the saturation horizon should increase Alk at depth with a reduction in CaCO 3 burial (see Sect. 4.4.2) but also change the Alk vertical distribution with an increase in dissolution, and thus Alk, in sub-surface waters.
There is no consensus on the interaction between POC and PIC in the ocean interior with respect to a protection and/or ballast effect (see Sect. 3.1.3 and Fig. 2). Observational and laboratory data diverge (Klaas and Archer, 2002;Passow and De La Rocha, 2006;De La Rocha and Passow, 2007;Lee et al., 2009;Engel et al., 2009;Moriceau et al., 2009), highlighted by recent observations of dissolution in shallow supersaturated waters, supposedly due to POC remineralization influencing the PIC microenvironment (Subhas et al., 2022). Potential coupling between POC and PIC export may have side-effects in a transient climate change scenario with a decrease in CaCO 3 production and export, modifying organic matter remineralization (Hofmann and Schellnhuber, 2009).
Additionally, implicit PIC production avoids the representation of gross PIC production and zooplankton gut dissolu-tion (e.g. Jansen and Wolf-Gladrow, 2001), which can potentially occur deeper than 100 m. Simulating gross PIC production and zooplankton gut dissolution may permit the representation of a sub-surface dissolution peak in addition to the deep dissolution peak seen in models with explicit saturationstate-dependent dissolution. Such a double peak in dissolution would be consistent with observations Sulpis et al., 2021;Subhas et al., 2022). The representation of these two dissolution peaks may be important in the context of transient simulations as they may have different sensitivities to climate perturbations.
In order to improve the representation of the carbonate pump and to account for the impacts and feedbacks associated with acidification and climate change, certain model developments are desirable. The main priority is the parameterization of CaCO 3 production, particularly a consensus on its dependence on saturation state, temperature, and organic matter production, as this should influence the surface carbon cycle. Further desirable developments include (i) the representation of aragonite, which could partly redistribute Alk in the sub-surface and is more responsive to climate perturbations, (ii) the representation of benthic calcifiers, such as corals, which could respond on relatively short timescales, (iii) the explicit representation of saturation-state-dependent PIC dissolution, and (iv) the representation of dissolved and buried PIC fractions at the seafloor with or without the use of a diagenesis module.

Conclusions
Our assessment of marine biogeochemical models involved in the fifth and sixth phases of CMIP highlights the diverse representation of processes associated with the carbonate pump. CMIP6 models simulate implicit pelagic calcification, no benthic calcification and generally calcite but not aragonite. In contrast, sinking and dissolution of CaCO 3 particles are represented both implicitly and explicitly and are variably sensitive to the local seawater saturation state. The fate of PIC reaching the seafloor also differs between models due to differences in external Alk sources such as riverine fluxes and the need to conserve the global Alk inventory.
Our inter-ESM analysis reveals an improvement in the representation of the Alk distribution as compared to observations. In particular, the surface distribution and mean vertical profile of sAlk show improvements in CMIP6. This is consistent with a strengthened carbonate pump in CMIP6 resulting from a global increase in PIC export at 100 m. Despite such improvements, PIC export remains globally underestimated in the CMIP6 ensemble compared to observational estimates, while the sAlk vertical gradient is now too pronounced. The increase in the carbonate pump from CMIP5 to CMIP6 is the main driver of the reversal of the biases in the representation of the sAlk vertical profile. In addition, the shift towards lower surface values of sAlk results in lower surface values of sDIC, producing slight differences in surface-ocean carbonate chemistry between the CMIP ensembles. Specifically, the CMIP6 ESMs tend to have slightly lower surface-ocean pH and carbonate concentrations and exhibit enhanced meridional surface gradients in sAlk and sDIC. The changes in the Alk distribution in CMIP6, however, have a negligible impact on the simulated Revelle factor and therefore are likely to have little effect on the magnitude of the projected ocean carbon sink for a given emissions scenario.
An incomplete mechanistic understanding and the subsequent representation of the CaCO 3 cycle in ESMs currently limit confidence in projections of how the carbonate pump will respond to climate change and the potential climate feedback. Concerted work with biologists and observationalists is required to develop parameterizations that permit robust inter-ESM analysis of the response and feedback of the carbonate pump to projected anthropogenic emissions. This includes the development of saturation-state dependencies, benthic CaCO 3 production, and work on the coupling of PIC and POC. Future marine biogeochemical model intercomparison projects would benefit from the provision of additional model outputs, notably three-dimensional fields of organic and inorganic export fluxes, remineralization, dissolution, integrated organic and inorganic carbon production, and PIC and POC burial. We have highlighted the need for transparency and a change in approach in studies considering Alk because of its great sensitivity to the density that is considered in converting between gravimetric and volumetric units. In particular, greater harmonization of initialization protocols with respect to Alk would avoid model biases regarding the Alk inventory and surface Alk concentrations. Finally, model traceability could be substantially improved if a shared platform were utilized to document changes to ocean biogeochemical models and the assumptions or observations underlying these developments.

Appendix A: Open-ocean mask
Throughout our analysis the open ocean was defined as >250 km from the coast, neglecting small islands (Fig. A1). This acts to (i) mask closed seas -which can have environments very different from the open ocean and are typically poorly simulated by ESMs -and (ii) maximize coincident spatial coverage in the model ensemble and observations. Figure A1. Map of the different ocean basins considered in the analysis: the open ocean is the coloured area at least 250 km from the coast (small islands excluded). Note that an asymmetry was considered in the tropics for the regional basins between the Southern Hemisphere and Northern Hemisphere due to the difference in the locations of the subtropical gyres.

Appendix B: Salinity normalization
The historical way to normalize Alk and DIC in the whole ocean is to divide the values by the salinity and multiply them using a salinity value of reference, generally 35 g kg −1 (e.g. Postma, 1964;Millero et al., 1998). This strategy, which we employed, gives the Alk and DIC that would have the considered fluid parcel at a salinity of 35 g kg −1 (e.g. Sarmiento and Gruber, 2006;Fry et al., 2015). The errors associated with this strategy are essentially confined in the ocean mixed layer where evaporation and precipitation occur (Friis et al., 2003). However, other salinity normalization techniques have been developed ever since (Millero et al., 1998;Robbins, 2001;Friis et al., 2003;Lee et al., 2006;Carter et al., 2014) on the basis of observation-based relationships between surface salinity and Alk.
In order to assess the potential biases of the historical method that Sarmiento and Gruber (2006), Krumhardt et al. (2020), and we considered (sAlk), we compare it to the strategy developed by Robbins (2001), Friis et al. (2003), and Carter et al. (2014), also used by Sulpis et al. (2021;sAlk * where the overbar corresponds to the mean surface value of the observations from GLODAPv2 or the CMIP5/6 ensemble mean. Hence, the method of salinity normalization was found to have a negligible impact on vertical profiles of sAlk (Fig. B1b). A slight difference between the two methods is apparent in the surface distribution of normalized Alk, particularly in the Arctic Ocean (Fig. B1c). However, the conventional salinity normalization approach was preferred throughout our analysis, as it maintains the order of magnitude of Alk, avoids empirical relationships, and is therefore easier to interpret. Figure B1. Evaluation of two different salinity normalization strategies. (a) Open-ocean surface sAlk * and sAlk anomalies relative to the surface mean, and the difference. (b) Profiles of sAlk * and sAlk for GLODAPv2 as well as CMIP5 and CMIP6 ensemble means. sAlk is defined in Sect. 2.4 and sAlk * in Appendix B.

Appendix C: Model equations and parameters
In the following, we present the equations and parameters governing the ocean calcium carbonate cycle for each of the biogeochemical models involved in our analysis. The marine biogeochemical models are classified following Table 1, and the parameters are given in Tables C2-C19. Model nonspecific variables and parameters are defined in Table C1. If processes are not included in the equations, "N/A" is given, but related information can be found in Supplement Table S1. Finally, for the PIC balance equation, a Lagrangian derivative is considered when the PIC is advected (we improperly include its diffusion in this derivative if considered) rather than a partial derivative. Figure C1. Model description and performance with respect to the representation of Alk and the carbonate pump (expanded from Figs. 2 and 10). Details include biogeochemical modelling schemes, resolution and spin-up, and model performance. For PIC production, underlined letters correspond to the explicitly modelled variables on which PIC production relies. Asterisks denote that further details can be found in Supplement Table S1. For the spin-up duration, the maximum value attributed was set to 8000 years to keep the differences between the other ESMs and MPI ones visible. Performance metrics are normalized between −1 and 1 so that the model with maximum absolute bias corresponds to either 1 or −1, and better-performing models have lighter cell colours. For each row, the value given for the ESM with the greatest bias corresponds to the percentage of the GLODAPv2 observational estimate that this bias represents. When information is missing, the cell is left blank. ∝ molC Particulate organic carbon X N ∝ molN X shared in N (nitrogen) units rather than C (carbon) units X P ∝ molP X shared in P (phosphorus) units rather than C (carbon) units PIC export at 100 m: PIC production:  References used: Vichi et al. (2011Vichi et al. ( , 2013Vichi et al. ( , 2020 and Lovato et al. (2022) Marine biogeochemical model of CMCC-ESM2 (CMIP6) PIC balance: PIC export at 100 m: PIC production:

C8 OECO1
References used: Yoshikawa et al. (2008) and Watanabe et al. (2011) Marine biogeochemical model of MIROC-ESM (CMIP5) and MIROC-ESM-CHEM (CMIP5) PIC balance: for z ≥ 200 m PIC production:  References used: Schmittner et al. (2008) and Hajima et al. (2020) Marine biogeochemical model of MIROC-ES2L (CMIP6) PIC balance: PIC export at 100 m: PIC production: PIC export at 100 m: PIC production: PIC export at 100 m: no specific calculation PIC production: for a layer n below the lysocline 0 for a layer n above the lysocline PIC export at 100 m: PIC production:   Schmittner et al. (2008) and Tsujino et al. (2010Tsujino et al. ( , 2017 Marine biogeochemical model of MRI-ESM1 (CMIP5) and MRI-ESM2-0 (CMIP6) PIC balance: PIC export at 100 m: PIC production: Calcite pool in the phytoplankton group: 2−T PIC sinking speed: N/A Calcite pool in the phytoplankton group:  Tjiputra et al. (2010Tjiputra et al. ( , 2013, Schwinger et al. (2016), and Tjiputra et al. (2020) Marine biogeochemical models of NorESM1-ME (CMIP5) and NorESM2-LM (CMIP6) PIC balance: PIC export at 100 m: PIC production: Half-saturation constant for grazing K Si(OH) 4 1.5 × 10 −6 (CMIP5), 5.0 × 10 −6 (CMIP6) kmolSi m −3 Half-saturation constant for Si(OH) 4 uptake m P 0.008 d −1 Mortality rate of phytoplankton P v,P,min 1 × 10 −11 kmolP m −3 Minimum concentration of phytoplankton  (100) PIC production: Pr PIC m,N calc = r PIC calc · r C:N · exp (−0.0539 · T ) Pr PIC m,N arag = r PIC arag · r C:N · min 1 t , f _graz P large ·P m,N large · min max , max 0, arag − 1 Ex(PIC calc ) (n) = min(1, n − 1) · Ex (PIC calc ) (n − 1) + Pr PIC m calc (n) − Di PIC m calc (n) · ρ · h(n) PIC export at 100 m: no specific calculation PIC production: Pr PIC m calc (n) = r PIC calc · r C:P · 1 − f _frac P large ·Pr P m,P (n) · exp(−0.0539 · T ) ·min max calc , max (0, calc − 1) PIC sinking speed: N/A PIC dissolution:  Ex PIC calc/arag (100) = ρ · w · PIC m calc/arag (100) PIC production: Pr PIC m calc = r PIC calc ·r C:N ·min max , max (0, calc − 1) · η Z medium ·f _cons Z small ·Z m,N small +η Z small ·f _graz P small ·P m,N small + η Z large · f _graz P large · P m,N large +f _agg P small · P m,N small + f _agg P large · P m,N large Pr PIC m arag = r PIC arag · r C:N ·min max , max 0, arag − 1 · η Z large · f _cons Z medium · Z m medium + η hp ·f _hpcons Z medium · Z m,N medium +η hp · f _hpcons Z large · Z m,N large Bu(PIC arag ) = 0 Figure D2. Spatial distribution of the PIC and POC exports at 100 m. The PIC export, POC export and rain ratio at 100 m for the Atlantic (blue), Indian (orange) and Pacific (green) basins and meridional sub-regions of each basin are shown. Bar heights give the relative difference between the CMIP6 and CMIP5 ensemble means. Bar colours give the CMIP6 ensemble mean (same shades as the grey-coloured bars). Errors are the CMIP6 standard deviation with values normalized between 0 and 0.5 (the minimum and maximum standard deviations are respectively marked by black and white points on the colour bars). Ocean basins are delineated as in Fig. A1.  . Bar colour denotes model drift per century. Model names with an asterisk are not shown due to extreme drift relative to other models. piControl data for MRI-ESM1 (CMIP5) and piControl nitrate data for CanESM2 (CMIP5) were not available. Figure D8. Additional CMIP5 and CMIP6 model drift evaluation (continuation of Fig. D7). Drift in upper (i.e. the first 100 m) open-ocean salinity and temperature (averaged between the surface and 100 m) as well as globally integrated POC and PIC (aragonite and calcite) export at 100 m. For each panel, the first bar denotes the observational estimates with the associated standard deviation for PIC export ( 1 assessment at 300 m). Bar height denotes the mean value in the first 20 years of the piControl (corresponding to 1850-1870), and the black bar denotes the mean value in the last 20 years of the piControl (corresponding to ∼ 2080-2100). Bar colour denotes model drift per century. Model names with an asterisk are not shown due to extreme drift relative to other models. piControl data for MRI-ESM1 (CMIP5) were not available.

Appendix E: GLODAPv2 observation
We discuss here the observations from GLODAPv2, since they seem to have a slight offset in either surface DIC or surface Alk compared to both CMIP5 and CMIP6 ensembles (see Fig. 12a). We have investigated whether the way we averaged the data around 2002 -as the observations from GLODAPv2 are normalized in 2002 -could drive a DIC offset, but this does not actually explain it. Indeed, we report that, to centre the data in 2002 regarding the ocean uptake of anthropogenic carbon, we should have in fact averaged from 1992 to between 2010 and 2011. Averaging between 1992 and 2012 induces a slight excess of DIC due to the non-linear increase in carbon uptake over this period. Although this excess in DIC is globally confined in the ocean surface layer, it does not impact our analysis, with an increase of less than 0.001 mol m −3 for all the ESMs at the surface ocean, which is negligible. A bias in the assessment of the Alk components that are diagnosed to compute the CO 2 system, especially the borate one, might drive a pCO 2 offset, resulting in a change in the y intercept in Fig. 12a (see Eq. 15) of a few millimole per cubic metre (Orr and Epitalon, 2015). Similarly, we assessed the possible bias of using GLO-DAPv2 nutrients rather than the 2009 update of the WOA product (Boyer et al., 2018) which was available at the time of CMIP5 simulations. We report only a small difference between the observational products, essentially confined to deep waters, with a decrease of 3.8 % in the magnitude of the nitrate vertical profile at 5000 m (see Fig. 11b). Estimating the observed soft-tissue pump from the WOA product would therefore slightly reduce the bias between observations and the CMIP ESMs, with a globally negligible effect on our analysis of the vertical biases of sAlk and sDIC compared to the observations (see Fig. 9). The 10 % larger volume considered in WOA compared to GLODAPv2 (Lauvset et al., 2016) might partly explain the difference between the two data products at depth for nitrate and phosphate.
All that remains is to note the following biases in the observations and keep in mind their possible consequences: (i) many more observations are shared in the surface layer of the ocean than at depth, and in particular, the density of observations is divided by about 8 between the surface and the deep ocean. (ii) There are many more observations in both hemispheres during summer than winter. (iii) The spatial resolution of the gridded product is unequal, with excellent coverage of the North Atlantic Ocean and relatively bad coverage of the Southern Ocean (Olsen et al., 2020). In particular, this results in some important local common differences between the ESMs and GLODAPv2 (e.g. at high latitudes, where the ocean is seasonally covered by sea ice, and in the Weddell Sea especially).
Author contributions. This work is in the framework of the OMIP-BGC group, which contributed collectively to this study through the organization and execution of the CMIP exercises and the sharing of both simulation outputs and model parameterizations. AP: conceptualization, investigation, methodology, formal analysis, visualization, writing -original draft preparation -and project administration. LB and LK: supervision, funding acquisition, methodology, resources, conceptualization, and writing -original draft preparation. OT: software. All the co-authors: resources and writing -review and editing.
grants. We are also grateful to the administrative and technical staff at Ecole Normale Supérieure/PSL and to the work completed by the Biogeosciences journal team for typesetting and technical corrections. Hiroyuki Tsujino is grateful to Hideyuki Nakano and Shogo Urakawa for their kind support in carrying out the experiments at MRI. John P. Dunne and Charles Stock were helped by Jessica Luo in leading the internal NOAA review of the manuscript.
Financial support. Alban Planchat, Laurent Bopp, Lester Kwiatkowski, and Olivier Torres are grateful to the ENS-Chanel research chair. Laurent Bopp was funded by the European Union's Horizon 2020 research and innovation programme (project COMFORT, Our common future ocean in the Earth-system-quantifying coupled cycles of carbon, oxygen, and nutrients for determining and achieving safe operating spaces with respect to tipping points -grant no. Review statement. This paper was edited by Jack Middelburg and reviewed by Fortunat Joos and one anonymous referee.