Combining local model calibration with the emergent constraint approach to reduce uncertainty in the tropical land carbon cycle feedback

Abstract. The role of the land carbon cycle in climate change remains highly uncertain. A key source of projection spread is related to the assumed response of photosynthesis to warming, especially in the tropics. The optimum temperature for photosynthesis determines whether warming positively or negatively impacts photosynthesis, thereby amplifying or suppressing CO2 fertilisation of photosynthesis under CO2-induced global warming. Land carbon cycle models have been extensively calibrated against local eddy flux measurements, but this has not previously been clearly translated into a reduced uncertainty in how the tropical land carbon sink will respond to warming. Using a previous parameter perturbation ensemble carried out with version 3 of the Hadley Centre coupled climate-carbon cycle model (HadCM3C), we identify an emergent relationship between the optimal temperature for photosynthesis, which is especially relevant in tropical forests, and the projected amount of atmospheric CO2 at the end of the century. We combine this with a constraint on the optimum temperature for photosynthesis, derived from eddy-covariance measurements using the adjoint of the JULES land-surface model. Taken together, the emergent relationship from the coupled model and the constraint on the optimum temperature for photosynthesis define an emergent constraint on future atmospheric CO2 in the HadCM3C coupled climate-carbon cycle model under a common emissions scenario (A1B). The emergent constraint sharpens the probability density of simulated CO2 change (2100-1900) and moves its peak to a lower value: 497 ± 91 compared to 607 ± 128 ppmv when using the equal-weight prior. Although this result is likely to be model and scenario dependent, it demonstrates the potential of combining the large-scale emergent constraint approach with parameter estimation using detailed local


Introduction
One of the key sources of uncertainty in future climate projections is the evolution of the land carbon sink (Friedlingstein et al., 2006; Cox et al., 2000; Arora et al., 2020; Canadell et al., 2021). As climate change elevates global temperatures and CO2 concentrations, the rate and efficiency of vegetation photosynthesis and respiration change, influencing the capacity of the land to act as a repository for anthropogenic CO2 (Medlyn et al., 1999; Cox et al., 2000; Friedlingstein et al., 2006). The structure and distribution of vegetation may also change in response to associated climate change, such as changes in precipitation patterns (Trenberth, 2011). These responses provide a feedback on the initial climate change signal, potentially leading to key transitions and tipping points in the land biosphere. Notable examples include a global carbon sink to source transition, Amazon rainforest dieback (Cox et al., 2004), shifting of the boreal forests (Chapin et al., 2004), and greening of the Sahel (Claussen et al., 2002).
https://doi.org/10.5194/egusphere-2023-274 Preprint. Discussion started: 1 March 2023. © Author(s) 2023. CC BY 4.0 License.
Despite the increasing complexity of the climate-carbon cycle models developed for the latest IPCC (Intergovernmental Panel on Climate Change) Assessment Report (AR6), there is still a significant spread in projections of vegetation and soil carbon under common trajectories of atmospheric greenhouse gases and aerosols (Canadell et al., 2021). This spread arises partly from different climate projections within the host climate model and partly from uncertainties in the land surface models themselves.
Indeed, for the Joint U.K. Land Environment Simulator (JULES) land-surface model (Clark et al., 2011; Best et al., 2011) under one of the scenarios from the IPCC Special Report on Emissions Scenarios (SRES A1B; Nakicenovic et al., 2000), the atmospheric CO2 change by the end of the century (∆CO2) was found to range from 373.8 ppmv to 845.7 ppmv (Booth et al., 2012). This range was achieved simply by perturbing some of the model parameters related to the sensitivities of plant photosynthesis and soil respiration to temperature; stomatal conductance; soil water availability and surface evaporation; and plant competition. The key source of projection spread was found to be related to the assumed response of photosynthesis to warming, especially in the tropics (Kattge and Knorr, 2007; Booth et al., 2012; Cox et al., 2013; Mercado et al., 2018). Indeed, the optimum temperature for photosynthesis (Topt) is a common parameter in land-surface models that determines whether warming has a positive or negative impact on photosynthesis, thereby either amplifying or suppressing CO2 fertilisation of photosynthesis under CO2-induced global warming (Friedlingstein et al., 2006; Arora et al., 2020).
There is an urgent need to reduce such parametric uncertainties to make reliable and believable climate projections. Usually, to reduce uncertainty in model simulations, models are confronted with observations. However, although there is now an unprecedented amount of in situ and Earth Observation (EO) data with which to confront the models, the relatively short timescales covered by these data mean that they cannot be used directly to constrain changes in the Earth System over the next century. Furthermore, it is extremely computationally expensive to run complex land carbon cycle models (also known as land-surface models, LSMs) within Earth System Models (ESMs) to produce multiple climate-carbon cycle projections. Instead, computationally efficient ways to translate short-term constraints into reductions in long-term projection uncertainty need to be developed.
Emergent constraints are used to bridge the gap between short-term contemporary observations and long-term future predictions (Cox et al., 2013; Wenzel et al., 2014, 2016; Hall et al., 2019; Williamson et al., 2021). Using the constraints provided by observations and physical understanding available today, emergent constraints can be used to assess the relative likelihood of different long-term trends (Allen et al., 2002). Emergent constraints identified in the carbon cycle include the sensitivity of the annual growth rate of atmospheric CO2 to tropical temperature anomalies (Cox et al., 2013), and the relationship between the changing amplitude of the CO2 seasonal cycle and the projected land photosynthesis (Wenzel et al., 2016; Hall et al., 2019; Williamson et al., 2021).
Data assimilation (DA) has been shown to be a useful and versatile tool to constrain the response of the carbon cycle in LSMs in the short term. DA techniques use contemporary observations to improve the performance of a model by optimising one of two components: either the values of unknown parameters (parameter estimation) or the predictions of the model according to a given data set (state estimation). In both cases, this is achieved by trying to find an optimal match between the model and the observations by varying the properties of the model. In numerical weather prediction, DA has predominantly been used to optimise the state whilst keeping the parameters fixed. This is because the physics are mostly known and well understood.
However, in terrestrial carbon cycle models, where most of the equations are unknown, finding the correct set of parameters is more pertinent. These models can have over a hundred internal parameters representing the environmental sensitivities of the various land-surface and plant functional types. These parameters are generally chosen to represent measurable real-world quantities (e.g. surface albedo, plant root depth). This allows observationally based estimates of these parameters to be made in the early stages of the model development process. However, the detailed performance of an LSM can be very sensitive to such internal parameters, and so it is common for land-surface modellers to calibrate their models against available observations. Since optimisations give the best possible values of the parameters given the model parameterisation and structural errors, the results are more reliable than field measurements of the same parameters, which are often taken at different spatial scales than the model resolution.
In this study, we show how we can combine parameter optimisation with emergent constraint techniques to reduce uncertainty in future projections. Specifically, we derive an emergent constraint by combining two elements: a linear regression, across the possible JULES Topt values, between Topt and the change in CO2 by the end of the century (∆CO2); and the posterior distribution of the parameter Topt optimised against in situ measurements of gross primary productivity (GPP) and latent heat (LE).

A relationship between T opt and ∆CO 2
In the study of Booth et al. (2012), a large range of climate-carbon cycle feedbacks was found by perturbing the model parameters in the land surface component of the Hadley Centre global circulation model (version 3, HadCM3C). This experiment was conducted under the common climate scenario, A1B, which describes a future world of very rapid economic growth, a global population that peaks in the mid-century and declines after that, and the rapid introduction of new and more efficient technologies, with a balance of fossil-intensive and non-fossil energy sources (Nakicenovic et al., 2000). One of the parameters perturbed in Booth et al. (2012) was Topt, which corresponds to the optimal temperature for non-light-limited photosynthesis for broadleaf forests. This parameter was identified as the most important in controlling the carbon response of the model. Indeed, a statistically highly significant (p = 0.000153) relationship between Topt and net CO2 change by 2100 (∆CO2) was found, whereas the rest of the parameters perturbed in the experiment showed little to no correlation with this change (Booth et al., 2012). Topt and ∆CO2 were shown to be anti-correlated, with higher values of Topt resulting in lower values of ∆CO2. This implies that when the optimal temperature for photosynthesis for broadleaf trees is high, more CO2 is predicted to be removed from the atmosphere through increased CO2 fertilisation. This is particularly relevant in the tropics, where, in a warming world, ambient temperatures have the potential to exceed the optimal photosynthetic temperature persistently, and where broadleaf trees represent large carbon stocks (Booth et al., 2012).
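Using linear regression, we can exploit this relationship to calculate a probability distribution function (PDF) for the distribution of ∆CO2 given Topt, i.e., P{∆CO2|Topt}. The contours of equal probability density around the best-fit linear regression follow a Gaussian probability density:

$$P\{\Delta\mathrm{CO}_2 \mid T_{opt}\} = \frac{1}{\sqrt{2\pi\sigma_f^2}} \exp\left\{-\frac{\left(\Delta\mathrm{CO}_2 - f(T_{opt})\right)^2}{2\sigma_f^2}\right\} \qquad (1)$$

where f is the function describing the linear regression between ∆CO2 and Topt, and σ_f is the "prediction error" of the regression.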
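As an illustration of how a Gaussian conditional PDF can be built around a linear regression, the following minimal Python sketch fits a line to a synthetic ensemble and evaluates P{∆CO2|Topt}. The ensemble values, the variable names, and the use of the standard least-squares prediction error for σ_f are assumptions made for this example; they are not the Booth et al. (2012) data, and the exact form of σ_f used in the study is not stated in this section.

```python
import numpy as np

# Hypothetical ensemble: illustrative numbers only, NOT the Booth et al. (2012) values.
rng = np.random.default_rng(0)
topt = np.linspace(26.0, 38.0, 20)                              # Topt (degC)
dco2 = 1200.0 - 20.0 * topt + rng.normal(0.0, 40.0, topt.size)  # dCO2 (ppmv)

n = topt.size
slope, intercept = np.polyfit(topt, dco2, 1)  # coefficients, highest power first


def f(x):
    """Best-fit linear regression of dCO2 on Topt."""
    return intercept + slope * x


# Residual standard deviation about the regression line (n - 2 degrees of freedom).
s = np.sqrt(np.sum((dco2 - f(topt)) ** 2) / (n - 2))
topt_mean = topt.mean()
sxx = np.sum((topt - topt_mean) ** 2)


def sigma_f(x):
    """Standard least-squares prediction error at x (assumed form of sigma_f)."""
    return s * np.sqrt(1.0 + 1.0 / n + (x - topt_mean) ** 2 / sxx)


def conditional_pdf(y, x):
    """Gaussian density of dCO2 = y given Topt = x."""
    sd = sigma_f(x)
    return np.exp(-0.5 * ((y - f(x)) / sd) ** 2) / np.sqrt(2.0 * np.pi * sd**2)


# Evaluate the conditional PDF on a dCO2 grid for one Topt value.
y_grid = np.linspace(0.0, 1200.0, 2401)
pdf_at_35 = conditional_pdf(y_grid, 35.0)
```

With this form, the prediction error grows away from the ensemble mean of Topt, so the contours of equal probability widen towards the edges of the sampled parameter range.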

A constraint on T opt using local eddy-flux measurements
The land-surface component of HadCM3C was the Met Office Surface Exchange Scheme (MOSES; Cox et al., 1999), which became the Joint U.K. Land Environment Simulator (JULES). The adJULES system (Raoult et al., 2016) was developed specifically to optimise the internal parameters of the JULES land surface model using data assimilation. Data assimilation allows the integration of multiple types of data (y) in order to optimise model parameters (x) while making allowance for associated uncertainties. It is a powerful tool which allows for objective and repeatable calibrations. A Bayesian framework is used to include prior knowledge about the parameters (x_b). All errors are assumed to be Gaussian distributed (with R and B the prior error covariance matrices for the observations and parameters, respectively). The optimisation corresponds to minimising the mismatch (J) between the model outputs and the observed data with respect to x:

$$J(x) = \frac{1}{2}\left[\left(M(x) - y\right)^{T} R^{-1} \left(M(x) - y\right) + \left(x - x_b\right)^{T} B^{-1} \left(x - x_b\right)\right] \qquad (2)$$

where M(x) is the model output vector given x. Methods for minimising the cost function range from stochastic random search algorithms to deterministic gradient-based methods.
This second class of methods was integrated into the adJULES system (Raoult et al., 2016). The adJULES system uses the adjoint of the JULES model, a computationally efficient way to calculate the gradient of Eq. 2. The adjoint allows for efficient and repeatable optimisations utilising the gradient information. The quasi-Newton algorithm L-BFGS-B (limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints; see Byrd et al., 1995) is used to minimise the cost function iteratively. At each iteration of the algorithm, the cost function and its gradient with respect to each parameter are evaluated. The adjoint also allows for the accurate calculation of the Hessian (second derivative of the cost function) at the optimum. The inverse of the Hessian gives the posterior error covariance matrix, which is used to calculate the posterior uncertainties associated with the best-fit parameters (in the form of PDFs).
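The optimisation loop described here can be sketched in Python using SciPy's L-BFGS-B implementation. This is only a toy illustration under stated assumptions: the "model" is a hypothetical two-parameter linear operator G standing in for JULES, the gradient is written analytically rather than obtained from an adjoint, and all numerical values are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical linear "model" M(x) = G @ x, standing in for the land-surface model.
G = rng.normal(size=(50, 2))
x_true = np.array([1.5, -0.5])             # parameters used to create pseudo-observations
sigma_obs = 0.1                            # assumed observation error (standard deviation)
y = G @ x_true + rng.normal(scale=sigma_obs, size=50)

x_b = np.zeros(2)                          # prior (background) parameter values
prior_sd = 2.0                             # assumed prior parameter uncertainty
R_inv = np.eye(50) / sigma_obs**2          # inverse observation error covariance
B_inv = np.eye(2) / prior_sd**2            # inverse prior parameter error covariance


def cost_and_grad(x):
    """Bayesian cost function J(x) and its gradient with respect to x."""
    d_obs = G @ x - y
    d_par = x - x_b
    J = 0.5 * (d_obs @ R_inv @ d_obs + d_par @ B_inv @ d_par)
    grad = G.T @ R_inv @ d_obs + B_inv @ d_par
    return J, grad


# jac=True tells SciPy that the function returns (cost, gradient) together.
res = minimize(cost_and_grad, x_b, jac=True, method="L-BFGS-B",
               bounds=[(-5.0, 5.0), (-5.0, 5.0)])

# For this linear toy model the Hessian of J is exact and constant.
hessian = G.T @ R_inv @ G + B_inv
post_cov = np.linalg.inv(hessian)          # posterior error covariance
post_sd = np.sqrt(np.diag(post_cov))       # posterior parameter uncertainties
```

In this toy case the posterior standard deviations in `post_sd` are much smaller than `prior_sd`, mirroring the narrowing of the parameter PDFs that the optimisation is intended to deliver.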
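Deriving the adjoint of a model as complex as JULES is extremely costly. Fortunately, this has been done for JULES v2.2, which uses the same photosynthesis model as MOSES, allowing us to optimise the same parameters and photosynthesis model as used in HadCM3C and, therefore, in the perturbation experiment of Booth et al. (2012). In Raoult et al. (2016), adJULES was used to improve the model performance at a wide range of broadleaf sites by optimising the key land surface parameters perturbed in Booth et al. (2012). Each parameter was assigned a wide prior distribution, allowing the parameters to take values from a large range of credible values elicited from expert opinion. The optimisation was performed using monthly in situ gross primary productivity (GPP) and latent heat (LE) data from CO2 eddy-fluxes measured at FluxNet sites (Baldocchi et al., 2001; Pastorello et al., 2020). The optimisation returned best-fit parameters with posterior distributions much narrower than the prior, reducing the range of viable parameter values. From these posterior distributions, we obtain an observationally constrained PDF for Topt, i.e., P(Topt).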

Calculation of the PDF for ∆CO 2
We follow the method used by Cox et al. (2018) to bring these two elements together and calculate the PDF for ∆CO2. The PDF for ∆CO2 is calculated by numerically integrating over the product of two PDFs, P{∆CO2|Topt} and P(Topt):

$$P(\Delta\mathrm{CO}_2) = \int_{-\infty}^{\infty} P\{\Delta\mathrm{CO}_2 \mid T_{opt}\}\, P(T_{opt})\, \mathrm{d}T_{opt} \qquad (3)$$
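The numerical integration over the product of the two PDFs can be sketched as follows. All numbers here are hypothetical placeholders chosen for illustration (regression coefficients, a constant prediction error, and a Gaussian P(Topt)); they are not the values obtained in this study, and a constant σ_f is a simplifying assumption.

```python
import numpy as np


def gaussian(z, mean, sd):
    """Gaussian probability density."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


# Hypothetical regression dCO2 = a + b * Topt with constant prediction error sigma_f.
a, b, sigma_f = 1200.0, -20.0, 60.0     # ppmv, ppmv per degC, ppmv
# Hypothetical Gaussian constraint on Topt (mean and standard deviation, degC).
mu_t, sd_t = 35.0, 1.5

topt = np.linspace(mu_t - 8.0 * sd_t, mu_t + 8.0 * sd_t, 801)
dco2 = np.linspace(0.0, 1200.0, 1201)
d_topt = topt[1] - topt[0]
d_dco2 = dco2[1] - dco2[0]

# Conditional PDF on a (dCO2, Topt) grid, then integrate the product over Topt.
cond = gaussian(dco2[:, None], a + b * topt[None, :], sigma_f)
pdf = np.sum(cond * gaussian(topt, mu_t, sd_t)[None, :], axis=1) * d_topt

# Cumulative distribution and central 95% limits of dCO2.
cdf = np.cumsum(pdf) * d_dco2
lower, upper = np.interp([0.025, 0.975], cdf, dco2)

# Mean and standard deviation of the constrained PDF.
mean = np.sum(dco2 * pdf) * d_dco2
sd = np.sqrt(np.sum((dco2 - mean) ** 2 * pdf) * d_dco2)
```

Because both ingredients are Gaussian and σ_f is constant in this sketch, the result can be checked analytically: the mean is a + b·mu_t and the variance is σ_f² + b²·sd_t². With a Topt-dependent prediction error the same grid-based integration applies, but no such closed form is available.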

Results and discussion
Figure 1 shows how the distribution of likely Topt values, i.e., P(Topt), changes when the JULES LSM is optimised against local measurements of photosynthesis (GPP) and latent heat (LE) using the adJULES system (Raoult et al., 2016). We can see that the posterior distribution is much more sharply peaked than the prior and suggests a higher parameter value than previously used. Values of Topt taken from this distribution, when used in the JULES model, will result in the best fit of the model to local measurements of GPP and LE, and therefore improve the model's credibility.
As well as displaying the results found by optimising simultaneously over all of the broadleaf sites found in Raoult et al. (2016), Fig. 1 also considers distributions of Topt found by optimising at each individual broadleaf site. Though none of these gives such a narrow distribution, the majority do suggest that the optimal value for the parameter (shown by the peak of the distributions) is higher than previously used in the JULES model. This gives confidence in the posterior distribution found by calibrating over all sites. Furthermore, one of the known limitations of gradient-based methods is their tendency to get stuck in local minima (i.e., not finding the 'true' global minimum). Optimisations over multiple sites have been shown to be more robust, with the additional constraints from each site acting to smooth the cost function, thus making local minima less common. As such, multi-site optimisations are more reliable in finding the true best-fit parameters and associated PDFs. For the remainder of this study, we will solely use the posterior distribution found by calibrating over all sites.
We can now translate the reduction in the uncertainty in Topt into a reduction of uncertainty in carbon-climate feedbacks. Instead of running computationally expensive climate models with a new parameter ensemble generated from the posterior distribution, this posterior PDF of Topt can be directly translated into a PDF for atmospheric carbon change using the carbon cycle sensitivity identified in Booth et al. (2012) as an emergent constraint. The linear relationship between Topt and CO2 change is shown in Fig. 2. The vertical blue lines included in this figure show the Topt constraint from adJULES. These lines sit at the upper end of the Topt axis and select a narrow range of Topt values. Using this constraint, we can derive tighter bounds on the CO2 response of the model. The linear regression and Topt constraint can then be used to generate contours from the product of the two PDFs, and hence the Topt-constrained PDF of CO2 change between 1900 and 2100.
Figure 3a shows this Topt-constrained PDF. It is compared to the histogram arising from assuming that all of the Topt values in the ensemble are equally likely to be true. The emergent constraint from the Topt optimisation sharpens the PDF of CO2 change (2100-1900) and moves its peak to a lower value: 496.5 ± 91 compared to 606.6 ± 128 ppmv when using the equal-weight prior. Figure 3b shows the resulting cumulative distribution function (CDF), which gives the probability of CO2 change (2100-1900) taking a value lower than the value shown on the x-axis. The 95% confidence limits (shown by the black horizontal lines) range from 300 ppmv to 650 ppmv. We see that values higher than 650 ppmv become extremely unlikely. The Topt constraint therefore lowers the most probable values of CO2 change, predicting a slightly stronger carbon sink over broadleaf trees than previously suggested by the JULES climate predictions. It also reduces the range of possible responses by 30% and discounts the higher values of CO2 change.

Figure 1. Different PDFs of P(Topt) found when using the adJULES system to optimise the JULES land-surface model against FluxNet data. The prior distribution (red) of the parameter is compared to the posterior distribution (purple) found by calibrating simultaneously over the 27 broadleaf FluxNet sites considered in Raoult et al. (2016) (i.e., a multi-site optimisation), as well as the individual posterior distributions found by calibrating at each site separately (i.e., single-site optimisations). All distributions are modelled by a Gaussian curve. Note that the range used in the optimisation (entire x-axis) is greater than the range used in Booth et al. (2012) (vertical black lines). The initial value of Topt in JULES is highlighted by the dashed red line.

Figure 2. Contours of probability density for the linear regression adapted from Booth et al. (2012). The thin black dashed line shows the best-fit linear regression, and the thick black lines show plus and minus the prediction error (see Methods). The vertical blue lines show the observational constraint on the Topt value, with the best fit shown by the thin dashed blue line, and the thick vertical dashed lines showing plus and minus one standard error about this value. The continuous contours are the product of these two underlying PDFs. The integral of these contours across the x-axis variable leads to the Topt-constrained PDF shown in Figure 3a.

Figure 3. Emergent constraint on the sensitivity of Topt to the magnitude of future carbon cycle response. The horizontal dot-dashed lines show the 95% confidence limits on the CDF plot. The orange histograms (both panels) show the prior distributions that arise from equal weighting of the parameter perturbation experiment in 500 ppmv bins.