Carbon and Water Fluxes of the Boreal Evergreen Needleleaf Forest Biome Constrained by Assimilating Ecosystem Carbonyl Sulfide Flux Observations

Gross primary production (GPP) by boreal forests is highly sensitive to environmental changes. However, GPP simulated by land surface models (LSMs) remains highly uncertain due to the lack of direct photosynthesis observations at large scales. Carbonyl sulfide (COS) has emerged as a promising proxy to improve the representation of GPP in LSMs. Because COS is absorbed by vegetation following the same diffusion pathway as CO2 during photosynthesis and not emitted back to the atmosphere, incorporating a mechanistic representation of vegetation COS uptake in LSMs allows using COS observations to refine GPP representation. Here, we perform ecosystem COS flux and GPP data assimilations to constrain the COS‐ and GPP‐related parameters in the ORCHIDEE LSM for boreal evergreen needleleaf forests (BorENF). Assimilating ecosystem COS fluxes at Hyytiälä forest increases the simulated net ecosystem COS uptake by 14%. This increase largely results from changes in the internal conductance to COS, highlighting the need to improve the representation of COS internal diffusion and consumption. Moreover, joint assimilation of ecosystem COS flux and GPP at Hyytiälä improves the simulated latent heat flux, contrary to the GPP‐only data assimilation, which fails to do so. Finally, we scaled this assimilation framework up to the boreal region and find that the joint assimilation of COS at Hyytiälä and GPP fluxes at 10 BorENF sites increases the modeled vegetation COS uptake up to 18%, but not GPP. Therefore, this study encourages the use of COS flux observations to inform GPP and latent heat flux representations in LSMs.

2 of 26 covariance observations of net ecosystem exchange into respiration components and GPP. However, flux partitioning methods rely on important assumptions about the relationship between fluxes and their environmental drivers and are impacted by various sources of uncertainty (Tramontana et al., 2020). At regional to global scales, land surface models (LSMs) can simulate GPP from process representations, but the lack of direct GPP measurements makes it challenging to evaluate and improve the representation of GPP in LSMs (Anav et al., 2015). In addition, because boreal forests are highly sensitive to environmental changes, rapid ongoing changes in this biome impact gas exchange and lead to large uncertainties in GPP estimates simulated by LSMs (Bonan, 2008;Fisher et al., 2018) or obtained from data-driven methods (Goetz et al., 2005).
To address the uncertainty in GPP, several optical proxies have been used to infer GPP estimates or to improve GPP representation in LSMs. Vegetation indices (VIs) inform GPP seasonality by tracking seasonal changes in vegetation greenness (Shen et al., 2014). However, the small variations in vegetation greenness over the season for evergreen boreal forests limit the effectiveness of such VIs in tracking photosynthetic activity. Changes in snow cover also affect the monitoring of seasonal variations in greenness and the estimation of GPP using VIs (Beck et al., 2006;Böttcher et al., 2014;Delbart et al., 2005). Solar-induced fluorescence (SIF), radiation in the red or far-red bands emitted from illuminated chlorophylls, can be retrieved from satellite observations and strongly correlates with GPP at large spatial and temporal scales (Frankenberg et al., 2011). However, SIF is also affected by snow cover and previous studies have reported differences in phenology between GPP derived from SIF and that from VIs (Chang et al., 2019;Li et al., 2018), especially under water stress .
Vegetation carbon uptake is also coupled with water loss, both controlled by stomatal diffusion. Plant transpiration is a key process for ecosystem functioning, sharing common environmental drivers with GPP. Transpiration also faces similar challenges as GPP as it is not well constrained by observations (Schlesinger & Jasechko, 2014), limiting the ability of LSMs to accurately represent its spatial and temporal dynamics (Mencuccini et al., 2019). Transpiration measurements are performed at the leaf or branch scale, it is difficult to upscale them to the ecosystem level (Jarvis, 1995). At the ecosystem scale, evapotranspiration can be measured using flux tower remote sensing methods (Wang & Dickinson, 2012), but it includes not only plant transpiration but also evaporation from the soil and other surfaces due to water interception, or snow sublimation.
Carbonyl sulfide (COS) has emerged as a promising tracer to track GPP (Campbell et al., 2008;Montzka et al., 2007;Sandoval-Soto et al., 2005;Seibt et al., 2010;Stimler et al., 2010), providing complementary information to existing optical proxies of GPP. COS, an atmospheric trace gas, is absorbed by vegetation following the same diffusion pathway as CO 2 during photosynthesis. Inside the leaves, COS is presumed to be totally hydrolyzed by the carbonic anhydrase (CA), an enzyme also involved in CO 2 fixation during photosynthesis (DiMario et al., 2016), and is normally not emitted back to the atmosphere. Therefore, the main advantage of COS lies in the fact that it allows GPP to be estimated independently of CO 2 measurements. In addition, COS helps to constrain stomatal diffusion, which determines the coupling between CO 2 uptake and water loss (Berkelhammer et al., 2020;Kooijmans et al., 2017;Wehr et al., 2017).
Previous studies have used COS to infer stomatal conductance, to investigate stomatal control on GPP, and to explore the physiological links between plant COS uptake, GPP, and transpiration (Berkelhammer et al., 2014;Wehr et al., 2017;Wohlfahrt et al., 2012). Alternatively, some studies have linked vegetation COS uptake to GPP using the leaf relative uptake (LRU) approach, based either on empirical ratios between COS and CO 2 deposition rates in plants (Asaf et al., 2013;Kooijmans et al., 2019), or more recently based on stomatal conductance theories as a function of humidity, temperature, light, or CO 2 concentration (Kohonen, Dewar, et al., 2022;Sun et al., 2022). Process-based representations of COS uptake by plants have also been implemented in LSMs (Berry et al., 2013;Kooijmans et al., 2021;Maignan et al., 2021). These mechanistic models simulate the dynamics of plant COS uptake and of the conductances involved in stomatal diffusion, and provide new global estimates of the vegetation COS sink.
However, the representation of COS exchanges between the atmosphere and land ecosystems rely on parameterizations that are still highly empirical and supported by limited measurements. Indeed, multiyear observations of ecosystem COS fluxes are only available at two sites, Harvard Forest in the United States with 2 years (2012-2013) of measurements and Hyytiälä Forest in Finland with 5 years (2013-2017) of measurements. So far, two studies have used COS flux measurements to directly constrain parameters related to COS and GPP in LSMs (Chen et al., 2023;Cho et al., 2023). Chen et al. (2023) focused on optimizing two parameters in the Boreal Ecosystem Productivity Simulator (BEPS) LSM at these two sites, one parameter related to the carboxylation rate 3 of 26 of the Rubisco enzyme, which is also involved in vegetation COS uptake representation, and another parameter specific to the vegetation COS model. The impacts of other physiological parameters on the simulated ecosystem COS flux and GPP remain largely unexplored.
While vegetation uptake is the largest COS sink, several sinks and sources also contribute to the global COS budget . Soils can absorb COS because soil microorganisms contain CA and other COS hydrolases (Masaki et al., 2021), or emit COS due to microbial or abiotic COS production . The ocean is a significant source of COS through direct emissions or indirect emissions via dimethyl sulfide (DMS) and carbon disulfide (CS 2 ) (Lennartz et al., 2017(Lennartz et al., , 2020. Anthropogenic activities are also a major source of COS (Zumkehr et al., 2018), and biomass burning contributes to COS surface emissions (Stinecipher et al., 2019). In the atmosphere, COS can be destroyed through oxidation by OH radicals in the lower troposphere or photolysis .
Currently, a major challenge of using COS as a global-scale GPP proxy is the imbalance of the global COS budget, as recently highlighted in the intercomparison of atmospheric COS transport models conducted by Remaud et al. (2023). A previous inversion study by Ma et al. (2021) has suggested the likely presence of a missing COS source in the tropics, and an underestimated COS sink in northern high latitudes. Another atmospheric inversion of COS and CO 2 concentrations carried out by Remaud et al. (2022) supports these findings for COS while also showing an underestimation of GPP simulated by the ORCHIDEE LSM in the high latitudes. Moreover, Vesala et al. (2022) used the longest ecosystem COS flux measurements (5 years) made at the Hyytiälä forest, Finland, to develop an empirical parametrization of vegetation COS fluxes based on environmental drivers. This parameterization was scaled up to boreal evergreen needleleaf forest biome using the SiB4 LSM (Haynes et al., 2020), leading to an increase in COS uptake in the high latitudes, compared to the one computed using the mechanistic COS model implemented in SiB3 by Berry et al. (2013).
In this context, the goal of this study is to evaluate the potential of COS to directly constrain the representation of GPP in the ORCHIDEE LSM using a data assimilation framework. This work addresses the following questions: 1. To what extent can biospheric COS flux measurements help to constrain stomatal diffusion of CO 2 and COS, and as a result, GPP? 2. What is the impact of assimilating biospheric COS fluxes on the simulated latent heat flux (LE)? 3. How does assimilating ecosystem COS flux observations at one site along with GPP data from multiple sites impact the simulated COS, GPP, and LE fluxes over the entire biome of boreal evergreen needleleaf forests?
Here, we optimize the GPP-and COS-related parameters by assimilating GPP and ecosystem COS fluxes at the Hyytiälä forest to evaluate the impact on the simulated vegetation COS and CO 2 uptakes, and LE. In particular, we focus on the constraint provided by these flux data assimilations during a severe drought event in 2006. Then, we perform multi-site assimilations to assess the changes in GPP, LE, and vegetation COS fluxes for the boreal evergreen needleleaf forest biome. Finally, we discuss necessary improvements in COS flux representation in LSMs, as highlighted in this study, and the implications of assimilating ecosystem COS fluxes to constrain GPP compared to using the empirical LRU approach.

Methods
The general workflow of this study is represented in Figure 1 illustrating the main steps carried out. First, sensitivity analyses over the variables of interest were conducted following prior simulations in ORCHIDEE. This led to selecting the model parameters to include in the optimizations, which were performed either at the site scale or considering multiple sites for the assimilation. Finally, the optimized simulations following these two optimization procedures are evaluated against eddy covariance data and global evaluation products. These different steps are detailed in the following sections.

Description of the Studied Sites
We selected 16 boreal evergreen needleleaf forest sites from the FLUXNET network. All sites are located in North America and Europe between 40° and 68°N, with each continent featuring eight sites, as illustrated in ABADIE ET AL.
10.1029/2023JG007407 4 of 26 Figure 2. Mean annual air temperatures range from −1.4° to 6.9°C, and mean annual precipitation ranges between 149 and 1,440 mm across these sites. Two sites are located at a high elevation: Davos (Switzerland) at 1,639 m and Niwot Ridge (United States) at 3,050 m. These 16 sites were selected because the normalized root mean square deviation (nRMSD) between the FLUXNET GPP estimates and the GPP simulated in ORCHIDEE is lower than 25%. Sites for which the nRMSD exceeds this threshold may have been impacted by processes not represented in the ORCHIDEE version used in this study, such as fires or clear-cuts, and are therefore not selected for data assimilation. A description of the sites is provided in Table S1 in Supporting Information S1.
Among these 16 sites, COS measurements were only carried out at Hyytiälä forest, Finland (FI-Hyy; 61.845°N, 24.288°E). FI-Hyy is a coniferous forest planted in 1962 (Suni et al., 2003) dominated by Scots pine (Pinus sylvestris) and Norway spruce (Picea abies). The climate is boreal with an annual mean temperature of 4.5°C and an annual mean precipitation of 632 mm. The soil type is described as Haplic Podzol (Sun et al., 2018).

Site Measurements of COS, GPP, and LE Fluxes
We used GPP and LE measurements from the FLUXNET global network (La Thuile: Baldocchi et al., 2001or FLUXNET2015: Pastorello et al., 2020, which are available at a half-hourly time step for the 16 selected sites. LE and net ecosystem exchange (NEE) are measured using the eddy covariance (EC) method. Then, GPP is retrieved from NEE measurements based on a nighttime partitioning method (Reichstein et al., 2005) using a variable friction velocity ("U-star") threshold for each year (VUT). This means that daytime respiration is first estimated with a respiration model parameterized with nighttime NEE data, and GPP is then obtained as   Pastorello et al., 2020 for details). No significant difference was found between FLUXNET GPP retrieved from the nighttime or daytime (Lasslop et al., 2010) partitioning method at FI-Hyy (not shown).
For COS observations, EC measurements were carried out at FI-Hyy from 2013 to 2017 (Vesala et al., 2022), along with soil chamber measurements in 2015 (Sun et al., 2018) and branch chamber measurements in 2017 (Kooijmans et al., 2019). EC fluxes were measured at 23 m height using an Aerodyne quantum cascade laser spectrometer (QCLS, Aerodyne Research, Billerica, MA, USA). EC data were processed following the recommendations by Kohonen et al. (2020) regarding quality-check and gap-filling, resulting in a half-hourly EC flux data set. EC observations are available from April to November in 2013, March to September in 2014, July to October in 2015, April to November in 2016, and January to August in 2017, with a lack of data during wintertime. Soil COS measurements were conducted from July-November 2015 using two automated soil chambers connected to another Aerodyne QCLS, the same model used for EC observations (Sun et al., 2018). Understory herbs and bryophytes were removed prior to performing chamber measurements to eliminate the influences of plant COS uptake on observed soil COS fluxes.

Characterization of a Drought Event at Hyytiälä
We focused on a drought event that occurred at FI-Hyy to investigate the potential of COS to constrain GPP and LE under these specific stress conditions. FI-Hyy did not undergo any drought event between 2013 and 2017 when the EC COS measurements were carried out (Vesala et al., 2022). However, a severe drought was reported in summer 2006, causing large damage in southern Finland (Gao et al., 2017). The intensity of this drought was characterized by the soil moisture index (SMI), which is defined as the difference between observed volumetric soil moisture and volumetric soil moisture at wilting point, divided by the difference between volumetric soil moisture at field capacity and at wilting point. Gao et al. (2017) found very dry conditions (SMI <0.20) for 37 consecutive days (from 23 July to 28 August) with the most severe part of the drought between 1 and 17 August (SMI <0.15).

GPP and LE Global Observation Products
We evaluated the simulated fluxes over the whole boreal evergreen needleleaf forest region with several global GPP and LE products. First, we used global GPP data products from the FLUXCOM version RS (Jung et al., 2019 and FLUXSAT version 2.0 (Joiner et al., 2018) databases, which are produced by applying different machine-learning upscaling methods to FLUXNET EC measurements and remote sensing data. FLUXCOM GPP is available between 2001 and 2015, whereas FLUXSAT GPP is available from 2000 to 2020.
The FLUXCOM database also provides global estimates of LE derived from the same approach as for GPP. In addition, we considered a second global product for LE from the Global Land Evaporation Amsterdam Model (GLEAM), which separately computes different components of evapotranspiration (ET) from satellite data (Martens et al., 2017) from 1980 to 2021. For all global products, we used monthly average fluxes at a 0.5° spatial resolution to match the temporal and spatial resolution of our global simulated fluxes.

The ORCHIDEE Land Surface Model
The ORCHIDEE land surface model (LSM) is developed at the Institut Pierre Simon Laplace (IPSL). Here, we used the version involved in the Coupled Model Intercomparison Project Phase 6 (CMIP6) (Boucher et al., 2020;Cheruy et al., 2020). ORCHIDEE solves the water, carbon and energy budget between land surfaces and the atmosphere (initially described in Krinner et al., 2005). Fast processes such as photosynthesis, hydrology and energy balance are computed every 30 min, while slow processes such as carbon allocation and phenology are run at a daily timestep. The different vegetation types are grouped into plant functional types (PFTs) with similar characteristics in terms of structure, bioclimatic range, leaf phenology, and the photosynthetic pathway. We ran ORCHIDEE distinguishing among 14 classes of PFTs and a class representing bare soil. Each model grid cell is associated with fractions of PFTs prescribed using yearly varying PFT maps derived from the ESA Climate Change Initiative (CCI) land cover products .
In ORCHIDEE, photosynthesis is computed following the approach of Yin and Struik (2009)  and photosynthesis are controlled by a stress factor, which varies linearly from 0 at the wilting point (maximum stress) to 1 when soil moisture is close to the field capacity (no stress). This stress factor regulates stomatal conductance by controlling the minimum stomatal conductance (when irradiance approaches zero) and the factor that describes the effect of the leaf-to-air vapor pressure deficit (VPD) (Yin & Struik, 2009). This stress factor also regulates the mesophyll conductance, Rubisco carboxylation, and leaf day respiration.
The stomatal conductance (g s ) is defined following Yin and Struik (2009) with g 0 the residual stomatal conductance (m s −1 ), A the CO 2 assimilation (minimum between the Rubisco-limited rate and electron transport-limited rate) (μmol m −2 s −1 ), R d the day respiration (μmol m −2 s −1 ), C i the intercellular CO 2 partial pressure (μmol m −2 s −1 ), C i* the base CO 2 compensation point in the absence of R d (μmol m −2 s −1 ), and f VPD the function describing the effect of VPD, defined as, where A1 and B1 are empirical factors (Table S2 in Supporting Information S1).
A global soil map based on the Food and Agriculture Organization of the United Nations/United States Department of Agriculture (FAO/USDA) texture classification describes the distribution of soil textures in 12 classes (Reynolds et al., 2000). The soil texture in each grid cell determines soil properties such as soil porosity, wilting point, and field capacity.
The ORCHIDEE LSM can be run both at the site level or at the global scale. The site simulations were forced by local micro-meteorological measurements from the FLUXNET network (Pastorello et al., 2020). For global simulations, we used the 0.5° 6-hourly CRU JRA reanalysis (University of East Anglia Climatic Research Unit-Japanese Reanalysis; Friedlingstein et al., 2020). Because simulated vegetation COS uptake depends on COS concentrations in the atmospheric boundary layer Kooijmans et al., 2021), we prescribed near-surface COS and CO 2 concentrations using 3-hourly simulated concentrations extracted from the first vertical level (33 m above sea or ground level) of the Laboratoire de Météorologie Dynamique (LMDz) atmospheric transport model output. The atmospheric COS and CO 2 concentrations were obtained using optimized COS and CO 2 surface fluxes inferred from atmospheric inverse modeling as described in Remaud et al. (2022).

ORCHIDEE Simulations
We first ran site simulations at the 16 selected boreal evergreen needleleaf forest sites. This vegetation type is represented by a dedicated PFT (BorENF) in ORCHIDEE. For each site, a "spin-up" phase was performed to reach an equilibrium state at which all carbon pools are stable and the net biome production oscillates around zero in the absence of any disturbances. In ORCHIDEE, around 340 years are usually needed to reach this equilibrium state as the convergence is accelerated using a pseudo-analytical iterative estimation for the soil carbon pools (Lardy et al., 2011). We carried out the spin-up phase at each site by cycling over the available years (Table  S1 in Supporting Information S1) in the FLUXNET meteorological forcing file for about 340 years and using a constant atmospheric CO 2 concentration of 312 ppm (corresponding to the year 1950). Then, we performed transient simulations for about 60 years, by cycling over the years in the FLUXNET forcing files, to introduce disturbances related to climate change, land use change, and increasing CO 2 atmospheric concentrations. Following the transient phase, we ran the site simulations over the period available in the FLUXNET forcing file (Table  S1 in Supporting Information S1).
For the simulations over the boreal region, we selected a latitudinal range between 40° and 80°N. We followed the same simulation protocol as for site simulations with preceding spin-up and transient phases. The 340-year spin-up phase was performed by cycling over 10 years (1931)(1932)(1933)(1934)(1935)(1936)(1937)(1938)(1939)(1940) of CRU JRA reanalysis forcing files. Then, we carried out the transient phase cycling over the same 10 years in the forcing files where disturbances were introduced from 1940 to 2000, followed by a global simulation from 2000 to 2019.
For site and regional simulations, we evaluated the simulated GPP and biospheric COS fluxes against the corresponding observations using the root mean square deviation (RMSD) and the normalized root mean square 7 of 26 deviation (nRMSD), for which the normalization denominator is defined as the maximum value minus the minimum value of the observations.

Biosphere COS Exchange Models
Mechanistic representations of vegetation and soil COS fluxes have been previously implemented in ORCHIDEE Maignan et al., 2021). The vegetation COS model is based on Berry et al. (2013) and describes COS uptake by plant as a one-way diffusion from the atmosphere to the leaf interior where COS is irreversibly hydrolyzed by CA (Equation 3).
with F COS,veg the vegetation COS uptake flux (pmol m −2 s −1 ), [COS] the atmospheric COS concentration (ppt), g b,COS , g s,COS , and g i,COS the laminar boundary layer, stomatal, and internal COS conductances (mol m −2 s −1 ), respectively. The internal conductance g i,COS includes both the COS diffusion through the mesophyll and the COS consumption by CA as a first-order reaction. The latter two were assumed to scale with the photosynthetic capacity (V max ) of the Rubisco enzyme (μmol m −2 s −1 ) (Badger & Dean Price, 1994;Berry et al., 2013). Therefore, g i,COS is expressed as proportional to Vmax, where α is a parameter, the value of which depends on the photosynthetic pathway. Typical values of α are 0.0012 mol μmol −1 for C3 and 0.013 mol μmol −1 for C4 plants (Berry et al., 2013). Because COS is assumed to be totally hydrolyzed by CA, COS internal concentration is zero at the terminus.
The soil COS model in ORCHIDEE is based on Ogée et al. (2016), with a distinction between oxic soils and anoxic soils. Anoxic soils such as wetlands, rice paddies, or salt marshes, are represented as a COS source only, while oxic soils can both emit and absorb COS. The oxic soil COS model resolves COS diffusion into the soil matrix, abiotic and biotic COS hydrolysis, and COS production, resulting in the following formulation following Ogee et al. (2016), where F COS,oxicsoil is the COS flux from oxic soils (mol m −2 s −1 ), k is the first-order COS consumption rate constant within the soil (s −1 ), B is the solubility of COS in water (m 3 water m −3 air), θ is the soil volumetric water content (m 3 water m −3 soil), D is the total effective COS diffusivity (m 2 s −1 ), P is the COS production term expressed as a function of soil temperature (mol m −2 s −1 ), z 1 = D/kB (m), and z max is the maximum soil depth (m).
In ORCHIDEE, the grid cells corresponding to anoxic soil are represented using a map of wetlands from Tootchi et al. (2019). Then the anoxic soil COS flux is expressed as a function of soil temperature following Ogée et al. (2016), where F COS,anoxicsoil is the COS flux from anoxic soils (mol m −2 s −1 ), P ref is the reference production term (mol m −2 s −1 ), T ref is a reference soil temperature (K) and Q 10 is the multiplicative factor of the production rate for a 10°C increase in soil temperature (unitless).
A more detailed description of the vegetation and soil COS models implemented in ORCHIDEE can be found in Maignan et al. (2021) and Abadie et al. (2022), respectively.

The ORCHIDEE Data Assimilation System
The ORCHIDEE Data Assimilation System (ORCHIDAS) was designed to optimize ORCHIDEE parameters related to carbon, water, and energy processes. ORCHIDAS has been frequently used and described in detail in previous studies focusing on the assimilation of EC flux data (Bastrikov et al., 2018;  values that give the best fit between the observations and the corresponding ORCHIDEE outputs. Assuming that the observations, parameters, and model outputs follow a Gaussian distribution, the optimized parameters are obtained through the minimization of the following cost function J(x) (Tarantola, 2005), with x b the a priori vector of parameters, x the optimized vectors of parameters, y the observations, H(x) the corresponding model outputs, and R and B the prior error covariance matrices for the observations (including measurement and model errors) and the parameters, respectively. R and B are diagonal in this study since the error covariances are difficult to access and hence neglected. The errors (i.e., variances) occupy the diagonal elements in each matrix. The observation errors in R are defined as the mean squared differences between the prior model and the observations, following a classical approach used in the studies listed above where the model error dominates the overall observation error. The parameter uncertainty in B is defined as 15% of the parameter physical range of variation.
We conducted the optimizations with the genetic algorithm (GA) method (Goldberg, 1989;Haupt & Haupt, 2004). This global search method allows us to obtain a combination of optimized parameters without risking getting stuck in a local minimum of the cost function J. Using a population of 32, the algorithm was run for 25 iterations, which was sufficient for the optimization to converge.
Then, at the minimum of the cost function J, the posterior uncertainties can be approximated assuming linearity of the model around the solution and Gaussian prior errors. The reduction in posterior uncertainty for each parameter after optimization can be computed as the difference between the prior and posterior parameter uncertainties, divided by the prior parameter uncertainty. This can inform which parameters are the most well constrained by observations during the optimization.

Sensitivity Analysis for Parameter Selection
We conducted sensitivity analyses (SAs) prior to performing optimizations to identify the parameters to which simulated GPP and ecosystem COS fluxes are the most sensitive ( Figure 1). This enables us to focus only on the key parameters during the optimization, reducing computational cost and the risk of overfitting. We selected the Morris method for SA, which is a time efficient qualitative method that ranks the parameters by importance (Campolongo et al., 2007;Morris, 1991). We considered a large number of parameters for SA, including parameters in the COS flux models and those related to photosynthesis, phenology, conductances, albedo, snow and soil thermal properties, and soil hydrology. A complete list of all parameters can be found in Table S2 in Supporting Information S1.
For soil COS fluxes, we performed an SA at FI-Hyy in 2015 following the method described in Abadie et al. (2022). Then, we conducted SA for simulated GPP and vegetation COS fluxes over one year for each of the 16 BorENF forest sites. Although ecosystem COS flux measurements are only available at FI-Hyy, here, we aimed to identify the key parameters for vegetation COS uptake in the BorENF PFT considering all the studied sites. Indeed, all selected sites correspond to the same vegetation type in ORCHIDEE (BorENF), although soil types vary across sites. The parameters to which simulated soil COS fluxes, vegetation COS uptake, and GPP fluxes are the most sensitive are shown in Figure S1 in Supporting Information S1.

Single Site COS and GPP Optimization Scenarios
In this study, we investigated the potential of COS flux measurements for improving the simulated GPP and LE fluxes in an LSM using the ORCHIDAS optimization framework. Because COS and CO 2 follow a common diffusion pathway during plant uptake, we expect that assimilating vegetation COS flux measurements will affect parameters that also control GPP and LE. However, whereas partitioned GPP data are available at all study sites, measurements of vegetation COS fluxes, including EC fluxes and soil fluxes, are only available at FI-Hyy. Therefore, at this site, we conducted the optimization of ecosystem COS fluxes in two steps.
First, we only assimilated soil COS fluxes using the soil chamber measurements collected in 2015 as previously done in Abadie et al. (2022). Following the results of the first SA experiment, we selected 5 parameters for soil 9 of 26 COS flux optimization: F CA , ⍺ soil , β soil , the van Genuchten water retention curve coefficient n (Van_Genuchten n ), and the saturated volumetric water content (θ sat ) ( Table S2 in Supporting Information S1). F CA , ⍺ soil , and β soil are parameters of the soil COS model, with F CA representing the soil microbial community that contains the CA enzyme and can consume COS, while ⍺ soil and β soil are involved in the production rate expressed as a function of soil temperature. The van Genuchten water retention curve coefficient n and the saturated volumetric water content both describe soil hydrology, which determines soil water content and COS diffusivity in the soil column.
Then, to optimize parameters related to vegetation COS fluxes, we re-ran ORCHIDEE simulations at FI-Hyy using the optimized set of parameters for soil COS fluxes. The use of optimized soil parameters in the second optimization step enabled us to focus on the parameters to which vegetation COS fluxes are sensitive. Starting from this optimized soil COS flux simulation, we assimilated EC-measured ecosystem COS fluxes over the 5 years of available data (2013)(2014)(2015)(2016)(2017). Based on vegetation COS flux SA, we considered the 9 most important parameters for the optimization of ecosystem COS fluxes. The default values and ranges of variation for the parameters included in each optimization are given in Table 1. We defined the physical range of variations based on expert and physical knowledge for each parameter. Note that for the α parameter, we considered a large range of variation corresponding to ±40% of the prior value as its value is not very well constrained by the linear regression applied in Berry et al. (2013) and as was shown based on site observations by Kooijmans et al. (2021).
We compared the data assimilation of ecosystem COS flux to a standard optimization in which we assimilated only GPP data at FI-Hyy over the last 5 years of available FLUXNET GPP estimates (2010-2014) ( Figure 1).
Finally, to investigate the additional constraint provided by COS observations on top of that provided by GPP, we performed an optimization with both GPP and ecosystem COS fluxes (Figure 1). This optimization was carried out by assimilating GPP for 5 years (2010)(2011)(2012)(2013)(2014) with COS data also for 5 years (2013)(2014)(2015)(2016)(2017). This joint assimilation aims at finding a combination of optimized parameters that minimizes the misfit for both GPP and ecosystem COS fluxes even if the assimilation periods differ. Because there are data gaps in the ecosystem COS  (Equation 7) in order to give the same weight to COS and GPP in the optimization. This weight was computed as the ratio of the number of ecosystem COS flux observations and the number of GPP estimates over the 5-year period considered for each. In this work we chose not to assimilate LE observations as evapotranspiration includes not only plant transpiration, on which COS offers a constraint through the gas diffusion pathway, but also snow sublimation, water interception by and evaporation from the canopy or ground vegetation, and soil evaporation, none of which is physiologically linked to COS uptake through stomata. Note that at this site, bare soil evaporation is marginal as the soil is mainly covered by mosses.
For the data assimilation experiments, data were averaged at a daily time step and smoothed over a 7-day running mean. A few net ecosystem COS emissions are found in the EC observations, especially in April, August, and September 2014. These net emissions may be due to data noise, or possibly reflect some soil emission episodes at high temperatures at the end of the summer. However, in ORCHIDEE, the annual mean simulated soil COS flux is about −3 pmol m −2 s −1 with no strong seasonal variations, which is in line with the average soil COS fluxes measured in the two soil chambers in 2015 (−2.8 ± 1.0 and −2.5 ± 1.2 pmol m −2 s −1 , Sun et al., 2018). Therefore, we filtered the EC COS fluxes to remove the data for which the mean daily COS fluxes was higher than −3 pmol m −2 s −1 , as such values cannot be reached at FI-Hyy by the COS models implemented in ORCHIDEE.

Multi-Site COS and GPP Optimization Scenarios
To evaluate the additional constraint on GPP and LE from COS across the entire BorENF biome, we also performed multi-site optimizations using only GPP data or combining GPP and COS flux data (Figure 1). In a multi-site optimization, one common set of parameters is obtained by simultaneously optimizing over all sites. Ten of the 16 BorENF sites presented in Section 2.1.1 were selected because at least 4 years of GPP and LE measurements are available (Table S1 in Supporting Information S1). Among these 4 years, we used 3 years for data assimilation and the last year was used for evaluating the optimization with a full independent seasonal cycle.
We performed a first multi-site optimization assimilating only GPP data at these 10 sites ("Post GPP"). Then, in a second multi-site optimization, we assimilated ecosystem COS measurements at FI-Hyy in addition to GPP at the 10 BorENF sites. In this scenario, we aimed at including as much information from COS observations as from GPP estimates in the assimilation. However, as we only have COS observations at one site for this multisite optimization, we applied a weighting factor to the COS term in the cost function (Equation 7) so that COS observations would have the same weight as GPP observations at the 10 sites combined (scenario "Post GPP & F COS ½"). Then, we tested another scenario in which we arbitrarily downweighted the importance of COS data by adjusting the multiplier before the COS cost term to ⅓ (scenario "Post GPP & F COS ⅓"; Table 2) because COS measurements are fewer than GPP data and are available at only one site. In both cases, because we intended to use the FI-Hyy site as an additional site to constrain the COS fluxes and COS-related parameters only, we did not use it for GPP data assimilation or the evaluation of optimizations results.
For each of the multi-site optimizations, we considered the same set of parameters as for the single site optimization at FI-Hyy for GPP only or joint GPP and COS assimilation (Table 1). In addition to the 10 sites used in the multi-site optimizations, we used 5 other BorENF sites for which 1 year of GPP and LE measurements is available to further evaluate the impact of optimizations on model performance at independent sites.

Constraint on COS and GPP Seasonal Cycles
First, we assessed the impacts of COS flux-only, GPP-only, or a joint COS flux-GPP assimilation on ecosystem COS fluxes and GPP at FI-Hyy. Figure 3 shows the change in the simulated mean seasonal cycle for the ecosystem COS flux over the 5 years of assimilation ( Figure 3a) and for GPP (Figure 3b) Table S1 in Supporting Information S1. The two optimizations that assimilate COS flux observations ("Post F COS " and "Post F COS & GPP") lead to a similar mean seasonal cycle with a higher net ecosystem COS uptake compared to the prior simulation (mean ecosystem COS flux of −10 pmol m 2 s −1 over the period against −8.8 pmol m 2 s −1 for the prior). This increase in COS uptake reduces the RMSD by 20% and 14% for the COS-only assimilation and the joint assimilation of F COS and GPP, respectively. The mean seasonal amplitude (difference between the annual maximum and minimum of weekly mean net uptake) of ecosystem COS flux is still underestimated compared to the observations (observed amplitude of 17.2 pmol m 2 s −1 ), but is slightly improved by the optimizations that assimilate COS flux observations (amplitudes near 12.5 pmol m 2 s −1 against 11.7 pmol m 2 s −1 after the GPP-only assimilation and 11.9 pmol m 2 s −1 for the prior).
As expected, assimilating GPP observations only has little impact on ecosystem COS fluxes seasonal cycle. Concerning the GPP seasonal cycle, the prior simulation is already in good agreement with the estimates from flux partitioning (RMSD = 0.92 gC m 2 d −1 ) at FI-Hyy. However, assimilating GPP observations only or GPP and COS observations together further improves the RMSD, with a reduction of about 10% over 8 independent years. Note that for these two optimization scenarios, a similar reduction in RMSD (8%) is found over the period considered for GPP assimilation (

Constraint on Leaf Gas Exchange Parameters
To understand how the optimizations impact the modeled COS and GPP fluxes at FI-Hyy, we studied the changes in the simulated stomatal and internal conductance to COS (Figure 4). As the boundary layer conductance is an 12 of 26 order of magnitude higher than the stomatal and the internal conductances, we do not focus on this conductance as it is not the main limiting factor for gas diffusion.
The assimilation of COS only or both COS and GPP observations leads to a decrease in the stomatal conductance to COS compared to the prior simulation ( Figure 4a). This decrease is driven by a lower value of the A1 parameter after optimization ( Figure S3 in Supporting Information S1), which leads to lower stomatal conductance under the same VPD conditions (Equation 2). Because the ratio between stomatal conductance to CO 2 and that to COS is a constant of 1.21 (Seibt et al., 2010), stomatal conductance to CO 2 is proportionally affected by the change in A1. Similarly, assimilating only GPP observations also reduces stomatal conductance, though to a lesser extent. Therefore, assimilating COS instead of or in combination with GPP data provides a stronger constraint on stomatal diffusion.
Finally, the two optimizations that assimilate COS data both increase the internal conductance to COS compared to the prior (Figure 4b). The increase in Vcmax 25 and α parameters ( Figure S3 in Supporting Information S1) explains this higher internal conductance as it relates to Vcmax by the multiplicative factor α (Equation 4).
In addition to analyzing the changes in parameter values after optimization, evaluating the reduction in uncertainty informs on which parameters are the most constrained after optimization (see Section 2.2.4). Across all three optimizations, the parameters showing a hight percentage of reduction in uncertainties are Vcmax 25 (66%-85% reduction) and acclim Vcmax,a (the intercept of the linear regression representing the acclimation to temperature of the entropy term for Vcmax) (95%-98% reduction). The uncertainty on α is reduced by 27% in the COS-only data assimilation and 82% in the joint assimilation of both COS and GPP data.
Then, the uncertainty in Leaf_age_crit (critical leaf age) also shows a strong reduction across all optimizations (49%-85% reduction), as well as gb ref (the leaf bulk boundary layer conductance) for the assimilations including COS data (75%-88% reduction). Finally, the lowest reduction in parameter uncertainties are found for Tphoto min (the minimum photosynthesis temperature) and LAI max (the maximum LAI), both of which experience less than 1% reduction in uncertainty.

Impact on LE and WUE
Considering the crucial role of stomatal in controlling in the coupled plant carbon and water fluxes, we evaluated the impact of optimizing the GPP-and COS-related parameters on LE (Figure 5a). While assimilating only GPP data has little impact on the simulated LE mean seasonal cycle, the two optimizations assimilating COS observations reduce the LE seasonal amplitude. This reduction is caused by a decrease in stomatal conductance after the optimizations, which results from the COS constraint on stomatal diffusion, as noted previously (Figure 4a). The optimizations also reduce the RMSD in LE by about 20% (from 10.22 W m −2 for the prior to 8.22 W m −2 and 8.48 W m −2 for "Post F COS " and "Post F COS & GPP," respectively). Note that when focusing on the summer months (from May to August), the RMSD is reduced by about 35% (9.11 W m −2 for "Post F COS & GPP") to 45% (8.37

10.1029/2023JG007407
13 of 26 W m −2 for "Post F COS ") compared to the prior simulation RMSD (12.39 W m −2 ). However, the simulated LE is still overestimated from January to April, with a mean bias of 6.24 W m −2 . As the simulated plant transpiration is zero in winter and increases in April with the start of the growing season, the overestimation in LE at the beginning of the year cannot be attributed to the transpiration flux but to other components of LE (see Section 4.1). Overall, because no LE data were assimilated, the post-optimization results highlight the potential of using COS flux observations to constrain LE.
Finally, we examined the coupling between GPP and ET as indicated in the ecosystem water use efficiency (WUE), the rate of carbon uptake per unit of water loss. We computed the WUE at FI-Hyy as the coefficient of the linear regression between GPP and ET ( Figure 5b). Model estimates of WUE in the prior simulation and the simulation that assimilates only GPP data are 3.25 (±0.025) gC kgH 2 O −1 and 3.17 (±0.024) gC kgH 2 O −1 , respectively, which are below the observationally derived WUE of 3.52 (±0.025) gC kgH 2 O −1 . This underestimation can be related to the higher LE in summer, which is not well captured by these simulations (Figure 5a). Then, while assimilating only COS observations further degrades the WUE (3.05 (±0.025) gC kgH 2 O −1 ), the assimilation of COS and GPP together leads to a WUE close to the observationally inferred value of 3.46 (±0.026) gC kgH 2 O −1 .

Focus on a Severe Drought Event at Hyytiälä
We focused on the period characterized by very dry conditions at FI-Hyy (SMI <0.20 from 23 July to 28 August 2006, see Section 2.1.3) to study the impact of our three optimization scenarios on the simulated GPP, LE and vegetation COS fluxes during this severe drought event. Figure 6a shows the response of the simulated and FLUXNET GPP during the drought event at FI-Hyy in 2006. Just before the beginning of the most severe phase of the drought, the FLUXNET GPP depicts a sharp decrease of about 5 gC m 2 d −1 , but stabilizes after DOY 218 under high soil moisture stress (SMI <0.15). After assimilating COS observations, the optimizations lead to a decrease in the simulated GPP compared to the prior simulation, reducing the mismatch with the observations during the most severe part of the drought. However, the decrease in the simulated GPP following the progression of the drought is less abrupt than that in the observations. The simulated GPP reaches its minimum just before the end of the most severe phase of the drought (DOY 228), 10 days after when the FLUXNET GPP reaches its minimum. After the most severe phase of the drought, observed GPP partially recovers (DOY 231). However, all simulations strongly underestimate GPP in this recovery phase (see Section 4.1).
Contrary to the variations of GPP during this drought event, the observed variations in LE are better captured by the modeled fluxes with a strong decrease in LE from the second day of the very dry conditions (DOY 206) to the end of the most severe drought period (Figure 6b). Note that the decrease in observed LE is stronger than that simulated in ORCHIDEE. However, in the two scenarios with COS data assimilation, the decrease in LE is higher than in the prior simulation or the simulation that only assimilates GPP data, especially in the two weeks of LE decrease (DOY 206-220).
Simulated vegetation COS uptake closely follows the decreasing trend in the simulated GPP until the end of the most intense phase of the drought across all assimilation scenarios (Figure 6c). However, for vegetation COS uptake, the impact of the different assimilation scenarios is mainly concentrated at the beginning of the most severe part of the drought. Indeed, when assimilating COS, the decrease in vegetation uptake during this period is stronger than the decrease in the GPP-only assimilation or that in the prior simulation. This faster reduction in vegetation COS uptake better tracks the response of the observed GPP to water stress than does the simulated GPP.
Finally, we studied the variations in the simulated stomatal and internal conductance to COS during this drought event for the three optimization scenarios (Figure 6d). As previously seen for the mean seasonal cycles (Figure 4), the assimilations of COS data reduce the simulated stomatal conductance and increase the simulated internal conductance to COS. While the internal conductance to COS does not show a sharp decreasing trend during the drought event, the stomatal conductance is halved from the beginning of the very dry conditions to the end of the most intense phase of the drought. Therefore, during the period with particularly high soil moisture stress (SMI <0.15), the stomatal conductance shows lower values than the internal conductance and becomes the most limiting factor for vegetation COS uptake. Note that these conductance responses are found in ORCHIDEE simulations, though there are no observations to validate these responses during this drought event. This change in the limiting conductance is more pronounced when assimilating COS observations, because the difference between the internal conductance and the stomatal conductance is higher in these optimization scenarios than in the prior simulation or the optimization that assimilates only GPP data.

Evaluation of COS and GPP Multi-Site Assimilations at Several Boreal Evergreen Needleleaf Forest Sites
After evaluating the potential of COS to constrain GPP and LE at FI-Hyy, we investigated the additional constraint provided by COS measurements at FI-Hyy on multiple BorENF sites, as compared to the constraint provided solely by GPP data at the same sites. Figure 7 presents the improvement or degradation of RMSD in the 3 multisite optimization scenarios compared to the prior RMSD at each of the 15 validation sites, of which 10 were used for the assimilations (see Section 2.3.3).
First, we find that the multi-site optimization that assimilates only GPP data at the 10 assimilaton sites ("Post GPP") deteriorates the simulated GPP at 7 of the 15 validation sites (CA-NS1, CA-NS2, CA-NS5, CH-Dav, RU-Fyo, SE-Kno, and US-NR1). At these sites, prior simulations underestimate GPP compared to the observations ( Figure S4 in Supporting Information S1).
Assimilating COS observations at FI-Hyy in addition to GPP at 10 BorENF sites ("Post GPP & F COS ½" and "Post GPP & F COS ⅓") also increases the RMSD of GPP at 5 sites (CA-NS1, CA-NS2, CA-NS5, SE-Kno, and US-NR1), but the degradation of GPP performance is not as severe as after GPP-only data assimilation ("Post GPP"). This is mainly because assimilating both COS and GPP data leads to a stronger seasonal amplitude at these 5 sites compared to assimilating only GPP data.
For sites at which the prior GPP is underestimated, different changes in parameter values may explain the milder degradation of GPP performance for the optimizations that assimilated COS data than for the optimization that assimilates only GPP data. Contrary to assimilating both COS and GPP data, the optimization that assimilates only GPP data reduces three GPP-related parameters: acclim Jmax,a , specific leaf area (SLA), and LAI max ( Figure  S5 in Supporting Information S1; see Table 1 for parameter definitions). In particular, because the ratio between LAI max and SLA determines the maximum leaf biomass, a decrease in this ratio after assimilating only GPP data In all multi-site optimization scenarios, GPP data from 10 BorENF sites (not marked with a star), each with an observational record of at least 4 years, were assimilated. For each site in this group, three out of the 4 years of data were assimilated, and the rest was used for evaluation. In addition, 5 other BorENF sites (marked with a star), each of which provides 1 year of GPP data, were used for evaluation but not data assimilation.
ABADIE ET AL.

10.1029/2023JG007407
16 of 26 leads to a lower maximum leaf biomass. In contrast, this ratio increases after the two optimizations that assimilate COS data, leading to a higher maximum leaf biomass.
At the 8 sites where GPP performance was improved after the GPP-only multi-site optimization, the prior simulations overestimate GPP. Note that similar to the single-site optimization at FI-Hyy ( Figure S3 in Supporting Information S1), the multi-site optimizations that assimilate COS data also lead to a decrease in A1 ( Figure S5 in Supporting Information S1), which reduces stomatal conductance at the same VPD (Equation 2). Then, all sites considered, the mean improvement of GPP after optimization, as measured by the relative difference in RMSD, is about 2% when assimilating only GPP data, and 4% and 7% when assimilating both COS and GPP data with a weight of ½ or ⅓, respectively.
Similar patterns are found for LE which experiences performance degradation at 6 sites (CA-NS1, CA-NS5, CH-Dav, CZ-BK1, SE-Kno, and US-NR1) after assimilating only GPP data ("Post GPP"). In contrast, assimilating COS and GPP data ("Post GPP & F COS ½" and "Post GPP & F COS ⅓") mitigates the performance degradation or even reduces the RMSD compared to the prior (except SE-Kno). The prior simulations overestimate LE compared to the observations at most sites, and assimilating COS data in addition to GPP data helps to reduce this overestimation ( Figure S6 in Supporting Information S1). Overall, assimilating only GPP data has little impact on the mean relative change in RMSD (1%), while a joint assimilation of GPP and COS data reduces the mean RMSD by 6% considering a weight of ½ for COS observations and by 7% with a weight of ⅓ for COS observations.
For both GPP and LE, assimilating GPP and COS observations using a weight of ⅓ for the latter yields post-optimization GPP and LE that are closer to the observations than using a weight of ½ for COS observations in the assimilation. This can be due to the larger decrease in A1 and increase in Vcmax 25 for the joint assimilation with a weight of ⅓ for COS data compared to the assimilation using a weight of ½ for COS data. At US-NR1, the model struggles to represent both GPP and LE and the optimizations fail to improve the simulated fluxes. This site is located at a high altitude (3,050 m, Table S1 in Supporting Information S1) and this specific condition may have contributed to the inaccurate representation of GPP and LE.

COS and GPP Multi-Site Assimilation Upscaling to Boreal High Latitudes
The three optimized sets of parameters for BorENF from the multi-site assimilation scenarios were used to run simulations over the whole boreal region (40°-80°N) ( Figure S7 in Supporting Information S1). Figure 8 presents In ORCHIDEE, the prior simulation overestimates GPP compared to FLUXCOM and FLUXSAT products over the studied areas. The three optimizations lead to similar reductions of 9% in GPP compared to this prior simulation, and help to reduce the mismatch with both evaluation products. This decrease in GPP mainly occurs between the beginning of the growing season and the end of July ( Figure S8 in Supporting Information S1). GPP estimates from all three optimizations are close to the FLUXSAT GPP estimate (4.76 PgC yr −1 ), but exceed that of FLUXCOM (3.94 PgC yr −1 ).
The prior estimate of ET is between those of FLUXCOM (2.01 10 3 km 3 yr −1 ) and GLEAM (2.86 10 3 km 3 yr −1 ) and is only slightly impacted by the optimizations. Note that over land grid cells that have a fractional coverage of BorENF higher than 40%, the GLEAM ET is 30% higher than FLUXCOM ET, but the spatial distribution of ET also differs between these products ( Figure S9 in Supporting Information S1).
The two optimizations that assimilate COS data lead to a stronger vegetation COS sink with an increase of 5% ("Post F COS & GPP ½") and 9% ("Post F COS & GPP ⅓") compared to the prior. Surprisingly, the optimization that assimilates only GPP data leads to a substantial increase in vegetation COS uptake to 43.50 GgS yr −1 over the whole study area, which is about 3 times the prior estimate.
Finally, spatial distribution of the vegetation COS uptake, GPP, and LE does not differ significantly between the three optimization scenarios and the prior (not shown).
We also investigated how the post-optimization changes in GPP and ET affect WUE over the BorENF biome ( Figure S10 in Supporting Information S1). ORCHIDEE overestimates WUE in the prior simulation (2.32 ± 0.008 gC kgH 2 O −1 ) compared with FLUXCOM WUE (2.0 ± 0.007 gC kgH 2 O −1 ). All optimizations reduce WUE as well as the difference in WUE between ORCHIDEE simulations and FLUXCOM. The largest reduction in WUE occurs for the optimization that assimilates only GPP data (2.14 ± 0.009 gC kgH 2 O −1 and 8% reduction compared with the prior estimate).

COS Data Assimilation Informs Biospheric Processes Represented in ORCHIDEE
Because vegetation COS uptake is more sensitive to stomatal conductance than is GPP, assimilating ecosystem COS flux data in models provides a more robust constraint on parameters that govern stomatal diffusion than does assimilating GPP data. Indeed, we find that the simulated GPP is weakly sensitive to the conductance-related parameters according to a multi-site sensitivity analysis ( Figure S1 in Supporting Information S1). This could be due to infrequent high VPD conditions over the BorENF biome, the influence of which is represented by the parameter A1 (Equation 1). Therefore, for this biome, GPP is more sensitive to other photosynthesis-related parameters, particularly those that determine the light reactions of photosynthesis (e.g., acclim Jmax,a ).
In addition to constraining stomatal conductance, COS data assimilation also highlights the need to improve the model-represented response of GPP to drought, especially during the most severe phase of the drought and the recovery phase that follows. In ORCHIDEE, the response of the simulated GPP to soil moisture stress depends on a soil water stress factor, which linearly varies between 0 at the wilting point and 1 for soil moisture close to the field capacity (Section 2.2.1). The inability of the model to capture the observed decrease in GPP at the beginning of the most severe phase of the drought at FI-Hyy, even after optimization (Figure 6), indicates that uncertainty in the response of GPP to drought is dominated by structural uncertainty associated with the functional form rather than parametric uncertainty. That is to say, a linear response to soil moisture stress cannot accurately represent drought impacts on GPP and COS fluxes.
Moreover, a comparison between the surface soil moisture measured at FI-Hyy and that simulated in ORCHIDEE highlights an underestimation of the simulated soil moisture at the onset of drought recovery ( Figure S11 in Supporting Information S1). The lower simulated soil moisture translates to a stronger water stress on GPP, which explains the slower recovery of the simulated GPP than in the observations. In addition, the highest volumetric soil moisture values simulated in ORCHIDEE during the recovery phase (18%-20%) are associated with a GPP ABADIE ET AL.

10.1029/2023JG007407
18 of 26 of about 4.5 gC m 2 d −1 , which is lower than the FLUXNET GPP of 6.6 gC m 2 d −1 at similar soil moisture values ( Figure S12 in Supporting Information S1).
Despite the potential of COS to constrain plant carbon and water exchanges, a joint assimilation of COS and GPP data seems necessary to improve the seasonal cycle of COS fluxes without risking degrading GPP at the same time (Figure 3). Indeed, the degradation in GPP seasonal cycle after assimilating only COS data can be explained by error compensation between Vcmax 25 and α. When both Vcmax 25 and α need to be increased, the optimization that assimilates only COS observations increases α more than Vcmax 25 (Vcmax 25 = 46.6 μmol m 2 s −1 and α = 0.0016, Figure S3 in Supporting Information S1), which reduces errors in COS uptake at the expense of GPP performance. In contrast, the optimization that assimilates both COS and GPP observations increases Vcmax 25 more than α (Vcmax 25 = 54 μmol m 2 s −1 and α = 0.0014), which reduces errors in both COS uptake and GPP. Therefore, the stronger increase in α needed to reproduce observed vegetation COS uptake will not affect GPP, whereas the weaker increase in Vcmax 25 needed to reproduce GPP cannot compensate for the decrease in the stomatal conductance resulting from the decrease in A1 ( Figure S3 in Supporting Information S1). The joint data assimilation is a delicate balancing act between the two competing needs.
Multi-site GPP-only assimilation leads to an unconstrained and unrealistically large increase in vegetation COS sink over the boreal forest biome (Figure 8). This large increase in vegetation COS uptake seems to be driven by increases in the leaf boundary layer conductance (gb ref ) and in the parameter that represents thermal acclimation of Vcmax (acclim Vcmax,a ), because such increases are absent in the optimization that assimilates both GPP and COS data ( Figure S5 in Supporting Information S1). This is because the simulated vegetation COS uptake is particularly sensitive to acclim Vcmax,a , which affects the internal conductance to COS through Vcmax according to the Berry et al. (2013) parameterization ( Figure S1 in Supporting Information S1). Thus, this overly sensitive behavior of vegetation COS uptake highlights a need for a mechanistic representation of the internal conductance to COS beyond what an empirical scaling factor α can offer.
Finally, this study highlights other deficiencies in the representation of some processes in ORCHIDEE, as illustrated by the overestimation of the simulated LE from January to April at FI-Hyy ( Figure 5), which cannot be corrected by data assimilation. The culprit seems to be snow sublimation, which shows strong peaks early in the year for several years ( Figure S13 in Supporting Information S1). These peaks coincide with those of the simulated LE, such as the peaks in 2005 or 2008 between January and March. Therefore, it is likely that the overestimation of the simulated LE at this time of the year originates from the snow sublimation component, which cannot be mitigated by assimilating GPP and COS flux observations. Note that the earlier onset of the simulated GPP compared to the FLUXNET GPP could also result from a misrepresentation in the snow processes ( Figure 3).

BorENF COS and GPP Fluxes and Related LRU
All studies focusing on the BorENF PFT find an increase in vegetation COS uptake after optimization (Figure 9), which is in line with a suspected high-latitude missing sink of COS proposed in recent inversion studies (Hu et al., 2021;Kuai et al., 2022;Ma et al., 2021;Remaud et al., 2022). Remaud et al. (2022) optimized the COS and CO 2 gross fluxes simulated in ORCHIDEE against atmospheric concentrations from the NOAA/ESRL/GML and provided prior and posterior values for GPP and vegetation COS uptake for each PFT. While their optimization led to an increase of 20% in the vegetation COS uptake for BorENF, this study found a lower increase up to 10% ("Post F COS & GPP ⅓"). It is to be noted that their prior vegetation COS fluxes were computed using the LRU approach with the LRU values from Whelan et al. (2018), which gives a prior vegetation COS uptake for BorENF 80% larger than the prior in this study (32.3 GgS yr −1 against 17.78 GgS yr −1 ). In their inversion framework, the errors from all boreal PFT fluxes (BorENF, Boreal Broadleaf Summergreen, and Boreal Needleleaf Deciduous) were correlated, which does not allow to strictly optimize the fluxes per PFT separately. On the contrary, the prior GPP simulated in their ORCHIDEE version (5.8 PgC yr −1 ) is 10% lower than the one computed in the version used in this study (6.36 PgC yr −1 ). In another inversion study, Hu et al. (2021) derived the GPP over the North American Arctic and boreal region using atmospheric COS concentrations from the NOAA/ESRL and the LRU approach. Over this region, the atmospheric COS inversion led to a vegetation COS uptake 40% higher than the prior simulated in the SiB4 LSM. Finally, to investigate BorENF contribution to the missing COS sink from a bottom-up approach, Vesala et al. (2022) developed a parametric model based on environmental drivers (photosynthetically active radiation, vapor pressure deficit, air temperature, and leaf area index) to simulate vegetation 19 of 26 COS fluxes that was used in SiB4 to scale up the COS fluxes to the region covered with BorENF. This parameterization also led to an increase in COS uptake for BorENF from 10.6 GgS yr −1 for the prior SiB4 simulation to 14.6 GgS yr −1 . In this study, we found a vegetation COS sink for BorENF from the two optimizations including COS assimilation (average of 18.8 GgS yr −1 ) that is higher than the one computed in SiB4 with the parametric model.
Unlike inversion studies where a linear relationship is used to link the COS fluxes and GPP, directly optimizing the model parameters does not imply that vegetation COS fluxes and GPP evolve in the same direction. While Remaud et al. (2022) and this study agree on an increase of vegetation COS uptake for BorENF, the changes in GPP differ. The optimization performed in Remaud et al. (2022) increased the GPP considerably for BorENF (from 5.8 PgC yr −1 to 7.2 PgC yr −1 ), which would disagree with the estimates from the FLUXCOM and FLUX-SAT products (3.94 PgC yr −1 and 4.76 PgC yr −1 respectively over land grid cells that have a fractional coverage of BorENF higher than 40%). Then, Kuai et al. (2022) also used atmospheric COS observations to optimize the vegetation COS uptake but using aircraft profiles over Alaska (CARVE). The vegetation COS uptake was increased by 25% over the Northern high latitudes (40°-90°N), as well as GPP given that vegetation COS flux and GPP are linked with a linear relationship. It should be noted that all these inversion studies use atmospheric COS concentrations over northern North America to constrain the whole boreal region, while the optimization of ecosystem COS fluxes is driven by FI-Hyy in this study.
Estimating LRU is critical for inversion studies that rely on this empirical approach to link vegetation COS uptake and GPP. However, temporal variations in LRU in response to changes in environmental drivers such as light and VPD (Kohonen, Dewar, et al., 2022;Kooijmans et al., 2019;Sun et al., 2022) are not accounted for in some of these studies (Hu et al., 2021;Remaud et al., 2022). Following our optimizations, we computed the seasonal variations of optimal LRU values of BorENF for each scenario ( Figure S14 in Supporting Information S1), as described in Maignan et al. (2021). The LRU seasonal cycles show that monthly LRU can vary by 0.3-0.7 within a year with lower values in spring and autumn. Then, the resulting mean annual LRU is increased by 20% with the two assimilations including COS data (LRU of 1.20 for "Post F COS & GPP ½" and 1.23 for "Post F COS & Figure 9. Synthesis of LRU values and vegetation COS and GPP fluxes from different studies focusing on boreal biomes. Note that the y-axis has been inverted for GPP (with largest positive values on the bottom, and decreasing moving up) so that both GPP and vegetation COS fluxes representing uptakes are in the same direction. Also note that each study follows a different approach to estimate the changes in COS or GPP fluxes relative to the reference values (see the referenced papers for more details). Each background color represents a specific vegetation type or studied area: Scots pine (Pinus sylvestris) tree (light yellow); BorENF PFT only (light blue); all land grid cells that have a fractional coverage of BorENF higher than 40% (light purple). For the comparison over land grid cells that have a fractional coverage of BorENF higher than 40% (in purple), we added the GPP estimates for the FLUXCOM and FLUXSAT evaluation products, and the mean GPP from the TRENDY-V10 model ensemble with the minimum and maximum GPP. Over this region, contrary to Figure 8, the COS and GPP fluxes for this study were computed using the post optimization values for α and A1 parameters from the multi-site optimizations for all C 3 PFTs. 20 of 26 GPP ⅓") compared to the prior (0.98). The assimilation including only GPP data led to the highest LRU of 3.24, a value not reported in Figure 9 because based on an inconsistent vegetation COS uptake after optimization.
The LRU values from the optimized fluxes including GPP and COS assimilations are in the low range of the reported LRU values with a median of 1.68 for C3 plants , but close to the value obtained in Kooijmans et al. (2019) at FI-Hyy for Scots pine (Pinus sylvestris) under light-saturated conditions (1.1). Indeed, the lowest LRU values have been estimated for boreal ecosystems (Maignan et al., 2021;Seibt et al., 2010). This was recently supported by Wohlfahrt et al. (2023) who provided new LRU estimates for each biome based on plant optimality principles and found LRU values around 0.5 in high latitudes. However, this LRU data set is computed for light-saturated conditions (photosynthetically active radiations higher than 1,000 μmol m −2 s −1 ), which can explain its low values as LRU increases with low photosynthetically active radiations (Kooijmans et al., 2019). Therefore, considering a sunlit leaf at the top of the canopy also induces a scale bias between the two LRU estimates as ORCHIDEE computes LRU integrated over the canopy, and not at the leaf scale. It is to be noted that their estimate for BorENF (around 0.6) is about half of the LRU values found after our optimizations including COS data. Therefore, while branch chamber measurements enable to study the LRU dynamics for a given species and determine a LRU value under light-saturated conditions (Kooijmans et al., 2019), a strong uncertainty remains when considering a constant LRU in inversion frameworks for BorENF after several studies focusing on this biome, with values ranging from 0.6 to 1.89 ( Figure 9).
Finally, as BorENF is only one of the biomes in the high latitudes where the missing COS sink was identified, we performed a scaling up of the two optimizations including COS assimilations to the whole boreal region (40°-80°N). We ran ORCHIDEE using the post optimization values for α and A1 parameters from the multi-site optimizations ("Post F COS & GPP ½" and "Post F COS & GPP ⅓") for all C 3 PFTs as the values of these two parameters were initially defined according to the photosynthetic pathways (C 3 or C 4 plants). While the α parameter is specific to the vegetation COS uptake model, the A1 parameter is involved in the computation of the stomatal conductance, which impacts COS exchanges but also GPP and LE. Considering the higher post optimization value for α and the decrease in A1 ( Figure S5 in Supporting Information S1) leads to an increase of 18% in vegetation COS uptake (93.9 GgS yr −1 for both assimilation scenarios) over the whole boreal region compared to the prior simulation (79.75 GgS yr −1 ) ( Table S3 in Supporting Information S1). On the other hand, changing only the value of A1 has a negligible impact on GPP and LE fluxes over the boreal region.

Remaining Challenges for Ecosystem COS Flux Modeling
The inability of the simulated ecosystem COS flux to reproduce the observed seasonal amplitude after optimization can highlight model deficiencies specific to the biospheric COS flux representation in ORCHIDEE. In particular, the implementation of the internal conductance to COS as a function of the Rubisco maximum carboxylation rate (Vcmax) instead of directly representing the activity of the CA enzyme could lead to an inadequate response of vegetation COS uptake to environmental factors. Indeed, while photosynthetic activity depends on both light and temperature, the activity of CA is not light dependent (Protoschill-Krebs et al., 1996). Moreover, Cho et al. (2023) recently proposed a new function to describe the temperature response of CA that accounts for a temperature optimum specific to this enzyme. Following this approach, the internal conductance to COS is still expressed as a function of the α parameter and the Vcmax 25 of the Rubisco, but includes the temperature response function of CA instead of the one of Rubisco. They found that the temperature response of CA and Rubisco differs, with a lower optimum temperature for CA at FI-Hyy (22°C against 45°C for Rubisco). We implemented this new CA temperature response in ORCHIDEE and optimized the ecosystem COS flux at FI-Hyy with this updated formulation of g i,COS (see Text S1 in Supporting Information S1). However, the updated prior simulated ecosystem COS uptake was strongly overestimated compared to the observations ( Figure S15a in Supporting Information S1). Then, the COS assimilations led to a higher RMSD than the one obtained after the optimization with the formulation from Berry et al. (2013) for g i,COS (RMSD of 3.6 pmol m 2 s −1 against 3.2 pmol m 2 s −1 ), further degrading the ecosystem COS flux seasonal amplitude. Therefore, in ORCHIDEE, the implementation of a temperature response specific to CA is not sufficient to improve the representation of vegetation COS fluxes. Furthermore, assimilating ecosystem COS flux only, with this CA temperature response, also leads to a stronger degradation of the GPP seasonal amplitude ( Figure S15b in Supporting Information S1). A next step in the vegetation COS uptake model development could be to define a formulation for COS consumption by CA that is independent from the Rubisco activity for CO 2 assimilation. This would imply to distinguish between the mesophyll conductance and CA activity in g i,COS . Following a different approach, Davidson  the potential of sulfur isotopes to better constrain the internal conductance to COS by providing information on COS internal concentration inside the leaves.
In the current formulations of g i,COS (Berry et al., 2013;Cho et al., 2023), the α parameter is critical for vegetation COS uptake and optimizing its value is necessary as this parameter, but also its temporal and spatial variability and drivers, are not well constrained in the literature (Berry et al., 2013). In this study with the formulation of Berry et al. (2013) for g i,COS , the two optimizations including COS data increase the value of α, which participates in the increase of the internal conductance to COS ( Figure S3 in Supporting Information S1). Post optimization values are between 0.0014 and 0.0016 mol μmol −1 while the default value used for all C 3 plants in Berry et al. (2013) is 0.0012 mol μmol −1 . However, in SiB3 where the Berry et al. (2013) model was initially implemented, the value of α was recently revised to 0.0014 mol μmol −1 for C 3 plants . Cho et al. (2023) obtained lower values for α (0.001316 mol μmol −1 and 0.001331 mol μmol −1 for growth and maturity phenological stages, respectively) at FI-Hyy following their optimization with the CA temperature response in SiB4. Then, performing similar optimizations on sites dominated by different vegetation types could help to refine the definition of this parameter with estimated values specific to a biome type instead of a photosynthetic pathway only (C 3 and C 4 plants). For example, Cho et al. (2023) extended the optimization of α to the Harvard Forest site (United States) in addition to FI-Hyy and found values between 0.001740 mol μmol −1 and 0.002224 mol μmol −1 depending on the phenological stage.
Despite the insufficiency of the α parameter to adequately represent the internal conductance to COS, its value still needed to be calibrated compared to the default value of 0.0012 mol μmol −1 as a major change was made to the vegetation COS model when considering 3-hourly variable atmospheric COS concentrations instead of a constant value of 500 ppt in the previous version of the model (Maignan et al., 2021). Initially α was estimated by fitting a regression between vegetation COS uptake observations (Stimler et al., 2010) and SiB3 simulated fluxes (Berry et al., 2013). When estimating α following this approach, the estimated value of α depends on the atmospheric COS concentration considered to compute the vegetation COS fluxes.
Neglecting the COS drawdown during the growing season would lead to an overestimation of the simulated ecosystem COS uptake, impacting the optimized parameter values when performing data assimilation. It is to be noted that an additional drawdown of COS concentration further down inside the canopy due to leaf COS uptake, or possibly related to understory, ground cover vegetation, or mosses, is not taken into account in ORCHIDEE.
To highlight the importance of considering a variable COS concentration in the biospheric COS models, Belviso et al. (2022) estimated the impact of the recent decline in atmospheric COS concentration on the simulated vegetation COS fluxes in the ORCHIDEE LSM. However, at FI-Hyy, Vesala et al. (2022) found a lower univariate correlation between COS fluxes and atmospheric COS mixing ratio than with air or soil temperature or net radiation, but still larger than with VPD. It should be noted that the optimized atmospheric COS concentrations used to compute the COS fluxes are also associated with posterior uncertainties .
In addition to the uncertainty on the parameter estimates, the vegetation and soil COS models implemented in ORCHIDEE are missing some processes, which can contribute to the model observation mismatch. For example, non stomatal components such as mosses were found to absorb COS during nighttime (Rastogi et al., 2018;Sun et al., 2018). COS uptake by understory vegetation is also neglected. These missing processes could explain part of the underestimation of the net ecosystem COS uptake simulated during summertime at FI-Hyy (Figure 3). The representation of soil COS exchanges also neglects processes such as the effect of snow cover on COS diffusion for subniveal COS uptake or the impact of solar radiations on COS production Kitz et al., 2020;Spielmann et al., 2019).
Then, a strong limitation for assimilating biospheric COS measurements to constrain GPP and LE is the lack of COS flux measurement sites and long time series, with FI-Hyy being the longest available time series of 5 consecutive years. Moreover, the absence of data during wintertime does not enable to fully capture the seasonal cycle amplitude of ecosystem COS fluxes for evergreen biomes. Going further in the assimilation of ecosystem COS fluxes in LSMs would greatly benefit from having a larger network of COS flux measurements, on the model of the FLUXNET network for carbon and energy flux measurements. However, developing such a network is currently restrained by the high instrumental cost of performing COS flux measurements. Despite these challenges in ecosystem COS flux modeling, assimilating biospheric COS fluxes has different advantages than assimilating COS concentrations. Indeed, tower based COS flux measurements capture only the COS sinks and sources included in the tower footprint, enabling to mainly focus on the vegetation and soil contributions. Therefore, working at the ecosystem scale enables us to exclude contributions from other components of the COS budget associated with large uncertainties, such as the ocean with a contribution estimated between 200 GgS yr −1 and 1,000 GgS yr −1 (Berry et al., 2013;Kuai et al., 2015;Launois et al., 2015;Lennartz et al., 2017Lennartz et al., , 2020Remaud et al., 2022;Suntharalingam et al., 2008). In addition, this can help to identify missing processes that have an impact at the ecosystem scale.

Conclusions and Outlooks
We performed a joint assimilation of ecosystem COS fluxes and GPP to directly optimize the parameters involved in the representation of these fluxes for BorENF. The key messages from this study and the arising recommendations for future work are the following: • Jointly assimilating ecosystem COS flux and GPP data enables us to improve both GPP and LE due to the strong link among COS, CO 2, and H 2 O fluxes through stomatal diffusion. This finding supports using COS flux data to obtain new insights into plant transpiration and encourages for future exploration into inferring transpiration from COS measurements. An alternative optimization worth considering could be to assimilate both ecosystem COS flux and LE, which could then be compared to the assimilation of ecosystem COS flux and GPP performed in this study. Exploring different data assimilation strategies may inform the complementarity of different flux measurements in constraining leaf-level parameters that control biosphere-atmosphere exchange of H 2 O, CO 2 , and COS. • Despite the fact that COS data assimilation yields an increase in simulated net ecosystem COS uptake to better match the observations at FI-Hyy, the representation of COS internal conductance following Berry et al. (2013) is structurally insufficient to reproduce the ecosystem COS flux seasonal amplitude. This evinces the need to implement a mechanistic representation of a mesophyll diffusion and enzymatic consumption specific to COS in LSMs. • Ecosystem COS flux assimilation can highlight deficiencies in simulated GPP and LE sensitivity to drought events. Along this line, COS fluxes could be used to evaluate GPP and LE sensitivity to water stress in LSMs, and this potential application should be further investigated, for example, COS flux measurements could help to better capture the response of the simulated WUE that is currently underestimated in the ORCHIDEE LSM (De Pue et al., 2022). Thus, it would be interesting to perform COS data assimilation for ecosystems more exposed to drought than BorENF, using the data collected in Wohlfahrt et al. (2018) and Cochavi et al. (2021) during heat waves, in a Mediterranean pine forest and citrus orchard, respectively. • The joint assimilation of ecosystem COS flux and GPP leads to an increase in vegetation COS uptake, but not in GPP for BorENF, thereby alleviating the mismatch between the model and the evaluation GPP products. It is to be noted that the assimilation of ecosystem COS flux and GPP data together enables parameter optimization in a way consistent with the shared stomatal control of COS and CO 2 vegetation uptakes, contrary to a COS flux-only or GPP-only data assimilation. These post-optimization changes in COS and GPP fluxes contrast with previous top-down studies assimilating atmospheric COS concentrations, which find increased vegetation COS uptake along with GPP due to an empirical linear relationship (i.e., through LRU) being imposed between the two fluxes. Therefore, the empirical approach to derive GPP from COS may introduce an important source of uncertainty. This also highlights the need for continued work to reconcile top-down and bottom-up COS-derived GPP estimates. • Considering that COS serves as a mechanistic constraint on stomatal diffusion and helps to inform the carbon assimilation stage of photosynthesis, a complementary approach could be to jointly assimilate COS and solar-induced fluorescence, the latter of which has a global coverage from satellite observations and can inform the light reactions of photosynthesis. This complementary approach is a promising path toward a more comprehensive mechanistic understanding of global photosynthesis.