Intercomparison of Atmospheric Carbonyl Sulfide (TransCom‐COS): 2. Evaluation of Optimized Fluxes Using Ground‐Based and Aircraft Observations

We present a comparison of atmospheric transport models that simulate carbonyl sulfide (COS). This is part II of the ongoing Atmospheric Transport Model Inter‐comparison Project (TransCom–COS). Differently from part I, we focus on seven model intercomparison by transporting two recent COS inversions of NOAA surface data within TM5‐4DVAR and LMDz models. The main goals of TransCom‐COS part II are (a) to compare the COS simulations using the two sets of optimized fluxes with simulations that use a control scenario (part I) and (b) to evaluate the simulated tropospheric COS abundance with aircraft‐based observations from various sources. The output of the seven transport models are grouped in terms of their vertical mixing strength: strong and weak mixing. The results indicate that all transport models capture the meridional distribution of COS at the surface well. Model simulations generally match the aircraft campaigns HIAPER Pole‐To‐Pole Observations (HIPPO) and Atmospheric Tomography Mission (ATom). Comparisons to HIPPO and ATom demonstrate a gap between observed and modeled COS over the Pacific Ocean at 0–40°N, indicating a potential missing source in the free troposphere. The effects of seasonal continental COS uptake by the biosphere, observed on HIPPO and ATom over oceans, is well reproduced by the simulations. We found that the strength of the vertical mixing within the column as represented in the various atmospheric transport models explains much of the model to model differences. We also found that weak‐mixing models transporting the optimized flux derived from the strong‐mixing TM5 model show a too strong seasonal cycle at high latitudes.

• Simulations in seven models propagating optimized carbonyl sulfide (COS) fluxes derived from two inversions agree with independent observations • Simulated and observed COS drawdowns are captured in boundary layer over the Pacific and Atlantic Oceans due to plant uptake over lands • Weak vertical mixing models using fluxes optimized from the fast-mixing TM5 model overestimate the COS seasonal amplitude at high latitudes

Supporting Information:
Supporting Information may be found in the online version of this article.
COS is emitted directly to the atmosphere through multiple sources globally, for example, anthropogenic emissions (Campbell et al., 2015;Zumkehr et al., 2018), oceanic emissions (Kettle et al., 2002;Lennartz et al., 2017Lennartz et al., , 2019) ) and biomass burning (Notholt et al., 2003;Stinecipher et al., 2019).COS is absorbed by plants through stomata, like CO 2 , but without a respiration flux (Montzka et al., 2007;Protoschill-Krebs et al., 1996;Stimler et al., 2012;Sun et al., 2022;Wohlfahrt et al., 2012).Although the biosphere is generally a sink of COS, soils can also become a source and emit COS to the atmosphere over wetlands and over agricultural areas in summer (Whelan et al., 2013(Whelan et al., , 2016)).Some plants also emit COS under specific conditions (Belviso et al., 2022a).In the stratosphere, COS undergoes photolysis under high levels of ultra-violet radiation above the ozone layer.The magnitude, spatial and temporal variability of COS sources and sinks remain to some extent uncertain (Whelan et al., 2018).The chemical sink of COS by OH removal and photolysis is about 140 GgS a −1 (Ma et al., 2021).
Recently, hydroperoxymethyl thioformate (HPMTF) was identified as a potential COS precursor from DMS oxidation (Novak et al., 2021;Veres et al., 2020), but there remains large uncertainty in the contribution to COS production due to the sensitivity to multiphase cloud chemistry (Jernigan et al., 2022).COS in the atmosphere shows relatively small inter-annual variability, implying that the sources and sinks are almost balanced in terms of the global budget (Montzka et al., 2007).Recent studies, however, indicate that the COS mole fractions show a declining trend from 2015 to 2020 (Belviso et al., 2022b;Hannigan et al., 2022;Serio et al., 2023).
Recent atmospheric inversion studies on COS using in situ measurements demonstrate that the global budget of COS can be closed by optimizing the sources and sinks of COS.Several inverse studies have been conducted.A regional inversion was used to study the biosphere uptake over North America (Hu et al., 2021).On the global scale, two inverse modeling studies have been conducted, one based on the TM5-4DVAR system (Ma et al., 2021) and the other on the LMDz model (Remaud et al., 2022).These two inversion studies agree on underestimated sources (or overestimated sinks) in tropical regions, consistent with earlier modeling studies (Berry et al., 2013;Suntharalingam et al., 2008).Also, both inversions reproduced independent data from the HIAPER Pole-To-Pole Observations (HIPPO) campaigns to some extent, but pointed out the importance of atmospheric transport to infer the surface fluxes and the need for further analysis of the impact of transport uncertainties on the COS budget.In a first paper (Part I, Remaud et al., 2023a), a COS intercomparison was carried out based on a set of reference surface fluxes for all processes (i.e., non-optimized fluxes); the results pointed out some shortcomings in the COS global budget that need to be resolved.In this Part II, we extend the analysis by evaluating model simulations that use two versions of the optimized COS fluxes with available independent data, mostly obtained from aircraft platforms.
We used a similar approach as in Part I, based on a protocol defined to compare different transport models with the same set of fluxes, and usually referred as a "TransCom" inter-comparison exercise.Several "TransCom" protocols were used in the past; they have been very useful to investigate the diversion of atmospheric transport models through rigorous inter-comparisons.For example, an earlier TransCom-CH 4 study investigated the roles of surface emissions, transport and chemical loss in simulating the global methane distribution (Patra et al., 2011).A previous TransCom Age of Air (TransCom-AoA) study using six global models highlighted that the inter-model differences are still significant and require further investigation (Krol et al., 2018).Differences may be caused by resolved transport (advection, use of reanalysis data, nudging) or parameterized transport (convection, boundary layer mixing, and resolution) (Bisht et al., 2021).In the accompanying Part I paper, Remaud et al. (2023a) showed that the differences in the vertical mixing implemented in the various participating atmospheric transport models (ATMs) were largely responsible for the inter-model differences.Part I also highlighted two groups of models: strong versus weak vertical mixing.The motivation of part II is to quantify the model spread in term of vertical mixing using more realistic fluxes (optimized).Linked to that, we expect that the optimized fluxes of TM5-4DVAR show a better performance when propagated in strong-mixing models.Likewise, the LMDz-optimized fluxes are expected to compare better to observations when propagated in weak mixing models.
1. Comparison of the surface fluxes from the TM5-4DVAR and LMDz inverse modeling systems.2. Evaluation transport of the model-derived fluxes against various COS measurement data: ground-based and aircraft COS observation, some of which were used to derive the optimized fluxes.3. Quantification of the impact of the transport uncertainties on the simulation of COS mixing ratios using the optimized COS fluxes.
The paper is organized as follows: first we introduce the participating models, measurements and inter-model comparison protocol in Section 2. The results are presented in Section 3, validations against aircraft observations in Sections 3.2 and 3.3.Finally, the improvements and limitations are discussed in Section 4 and conclusions with recommendations are presented in Section 5.
The main features of each transport model, that is, the horizontal and vertical resolution, meteorological drivers, and sub-grid scale physical parameterizations are given in Appendix Table A1.All models used meteorological fields from weather forecast analysis (e.g., ERA5) either by interpolating or by nudging toward fields of horizontal winds and temperature (e.g., LMDz).The participating models are not entirely independent.TM5 and TM3 are in the same family since they share similar physics and numerical schemes, but TM3 operates on a coarser resolution compared to TM5.TOMCAT is an offline 3D chemistry transport model, parameterized with the boundary layer scheme of Louis (1979) and Prather advection scheme (Prather, 1986).MIROC4, NICAM5, NICAM6 use the same JRA-5 meteorological driver fields.MIROC4 has been further modified since Arakawa and Schubert (1974).Specifically, there is a new threshold on the closure based on relative humidity (Patra et al., 2018).NICAM5 and NICAM6 applied updated physical schemes for convection (Chikira & Sugiyama, 2010), boundary layer mixing (the Mellor-Yamada scheme (Nakanishi & Niino, 2004)), and advection (Niwa et al., 2011).LMDz uses a mass flux scheme for vertical mixing representing the thermals for shallow convection and the Emanuel (1991) scheme for deep convection.The similarities and differences amongst the seven ATMs are expected to influence the model-to-model spread and their performance in simulating the spatial and temporal distributions of the COS mole fractions.To effectively evaluate the model-to-model differences, the models are organized in two groups based on vertical mixing strength from the multi-model average.As shown in Figures S1 and S2 in Supporting Information S1, TM5 and TOMCAT produce similar COS distributions in February and August, respectively, with stronger mixing of COS in the tropics and on the NH.The other ATMs (MIROC4, NICAM5, NICAM6 and LMDz) show similar COS distributions, however, vertical concentration gradients are generally stronger because these models are characterized by weaker vertical mixing.TM3 did not provide 3D model output and is assumed to be in the strong mixing group, because the vertical transport parameterization is similar to TM5.The impact of the vertical mixing on the seasonal cycle is known as the seasonal rectifier effect (Denning et al., 1995) and is shown in the case of COS on Figure 4 in Remaud et al. (2023a).
We focus our analysis on the comparison between the strong mixing (SM) and weak mixing (WM) model groups.
Note that the fluxes were optimized with one model from the SM group (TM5), and one model from the WM group (LMDz).

Prescribed COS Flux Components
Details about the TM5 and LMDz inversions are given in Table 1.TM5-4DVAR optimized a so-called "unknown" source to close the global budget of COS, and LMDz used an analytical inversion technique to optimize anthropogenic, oceanic, and biomass burning sources and ecosystem uptake using NOAA surface measurements.Ma et al. (2021) optimized the unknown emissions at the grid scale using an error correlation length approach to limit the degrees of freedom.In contrast, Remaud et al. (2022) divided the globe into big regions.The optimized fluxes generated by TM5-4DVAR and LMDz were first interpolated to a common resolution of 1° × 1°, assuring mass conservation.These COS fluxes are presented in Table 2 and are provided as inputs to each ATM on a monthly temporal resolution.The ATMs then simulated the atmospheric COS concentration (3D) following the transport of COS surface fluxes.For a comparison to the results with the optimized fluxes, we also present results from the control scenario described in part I (Remaud et al., 2023a).Relying on the linearity of the atmospheric transport, each flux of the control scenario was transported separately by all participating models, after which the contributions were added.
Figure 1 shows the multi-annual mean of the two optimized fluxes, indicating mainly the anthropogenic sources as hot spots and the main sinks over regions dominated by vegetation, for example, large parts of Northern Hemisphere, the Amazon, and parts of Indonesia and Africa.We also notice that the fluxes obtained with TM5 show larger spatial gradients compared to those obtained with LMDz consistent with the fact that TM5 and LMDz are fast and slow vertical mixing models, respectively (see Section 2.1). Figure 2 shows that the OPT-TM5 fluxes have a larger Note.Mean magnitudes of different types of fluxes are given for the period 2010-2018.Note that the fluxes are mapped to a fine resolution on 1° × 1° as transport input for all models.The flux unit is GgS a −1 , and the deviation in parenthesis is the ratio of net flux over source in %.Prior sources Anthropogenic (Zumkehr et al., 2018) Anthropogenic (Zumkehr et al., 2018) Ocean (Kettle et al., 2002;Suntharalingam et al., 2008) Ocean (Lennartz et al., 2017(Lennartz et al., , 2021) ) Biomass Burning (Ma et al., 2021) Biomass Burning (Stinecipher et  seasonal cycle compared to the OPT-LMDz fluxes over higher latitudes (a and b) and globally (c and d).Note also that the horizontal resolution of OPT-TM5 fluxes is coarser, since optimization was performed on 6° × 4° resolution, whereas optimization in LMDz was performed on 3.75° × 1.875° resolution.The corresponding difference of the two optimized fluxes is shown in Figure S3 in Supporting Information S1.In general, the optimized fluxes agree on (prescribed) anthropogenic hot spot emissions and (optimized) uptake patterns.As will be shown, both fluxes lead to a much better agreement to the available observations compared to the control scenario (Remaud et al., 2023a), as a result of the optimization process during which NOAA surface observations were assimilated.
Since atmospheric chemistry was not taken into account, the optimized fluxes were adapted to include the chemical loss as an extra sink to the global budgets of COS.The stratospheric and tropospheric sinks (−144 GgS a −1 in total) of TM5 were projected on the surface and added to the fluxes from Ma et al. (2021) to obtain a balanced atmospheric COS budget.The LMDz optimization did not account for (small) stratospheric loss and only the tropospheric loss by OH oxidation (−100 GgS a −1 ) was projected on the surface.The average annual budget of the OPT-TM5 fluxes is 21.3 GgS a −1 , which represents a deviation from the net total source of about 2.5%.The corresponding LMDz fluxes (Remaud et al., 2022) have an annual budget of 4.5 GgS a −1 , which represent a deviation from the net total source of 0.6%.On the top of these mean budgets both inversions show year to year budget variations.

Surface Measurements
We compare results to the NOAA Global Monitoring Laboratory COS measurements, which were used in the two inversions (Table 1) during 2010-2018 & 2019 at 14 sites.Further information is given in Appendix Table B1.The COS observations have been collected at multiple surface sites around the world as paired flasks one to five times a month since 2000 and have then been analyzed with gas chromatography and mass spectrometer detection.The COS measurements have been kept for this study only if the difference between the pair flasks was less than 6.3 pmol mol −1 .These data are an extension of the measurements first published in Montzka et al. (2007) and are regularly updated at https://gml.noaa.gov/hats/gases/OCS.html.In addition, we used measurements from the French sampling site Gif-sur-Yvette (GIF) (48.71°N, 2.15°E), located about 20 km to the south west of Paris, where hourly COS measurements have been collected about 7 m above ground level since August 2014 (Belviso et al., 2020(Belviso et al., , 2023)).The NOAA stations are shown in Figure 3 as red crosses.We also compare model results to observations from the NOAA Global Greenhouse Gas Reference Network's (GGGRN) Aircraft Program (Sweeney et al., 2015), which primarily provides vertical profiles (Figure 3, top-left corner).Note that the LMDz inversion also used additional surface measurement from Weizmann Institute of Science (WIS at the Arava Institute, Ketura, Israel, 29.96°N, 35.06°E, 151 m a.s.l.).

HIPPO, ATom and NOAA Aircraft Observations
The HIPPO (S. C. Wofsy, 2011) and Atmospheric Tomography Mission (ATom, Thompson et al., 2022;S. Wofsy et al., 2021) provide the first vertically-resolved global scale observations of various trace gases during short-term deployments covering multiple seasons and are valuable for model evaluation.Thus, to evaluate the simulated latitudinal distribution of COS within the free troposphere, we used aircraft measurements from these two observation programs.In the following analysis in Sections 3.2 and 3.3, the HIPPO and ATom data were averaged vertically below 2 km to represent the boundary layer, and between 2 and 8 km to represent the free troposphere.
To further evaluate the impact of transport on the vertical distribution of COS, we compared model results to 2010-2011 NOAA aircraft platform observations located at 13 sites over North America, listed in Appendix Table B1.The upper altitude that was typically reached was 8 km.This NOAA aircraft platform data set was already evaluated in other studies (Hu et al., 2021;Ma et al., 2021;Remaud et al., 2022Remaud et al., , 2023a)).Note that TOMCAT did not provide the requested model output for the HIPPO, ATom and the NOAA aircraft platforms.As a result, the strong mixing models are represented only by TM5 and TM3 in the comparison to HIPPO, ATom and NOAA aircraft observations.

Post-Processing of the Simulations and Measurements
In this analysis we focus on the annual mean and the mean seasonal cycle.To this end, the surface data were processed using the CCGVU curve fitting procedure developed by the (Carbon Cycle Group of the Earth System Research Laboratory [CCG/ESRL]) at NOAA, USA (Thoning et al., 1989).The CCGVU procedure is fully described and freely available at http://www.esrl.noaa.gov/gmd/ccgg/mbl/crvfit/crvfit.html.The procedure estimates a smooth function by fitting the time series to a first-order polynomial equation for the growth rate combined with a two-harmonic function for the annual cycle.The seasonal cycle and annual gradient were extracted from the smooth function.In addition, outliers were discarded if their values exceed 3 times the standard deviation of the residual time series.
To directly compare model simulations and observations, the models were processed by removing the impact of the yearly budget deviations from Table 2 and the addition of 485 pmol mol −1 , which is representative for the global mean COS mole fraction in 2000-2020 (Montzka et al., 2007;Serio et al., 2023).The multi-year positive trend of the two optimized fluxes that was removed amounted to 4.6 and 1.0 pmol mol −1 a −1 , respectively, by assuming that the budget deviation is homogeneously distributed over the whole atmosphere.In this way, the simulated COS abundances were set to the reference of the NOAA surface network.A detailed example of this adjustment procedure at each station is provided in Figures S4-S6 in Text S1 in Supporting Information S1.

Evaluation Metrics
In this paper, root mean square error (RMSE), error weighted squared error (EWSE) and Pearson correlation coefficient are used to quantify the performance of the model (groups).RMSE and EWSE are defined as: where m i is the modeled sample, o i is the measured sample, N is the number of samples, and σ i is the measurement error.σ i represents the variation in the measurements over time or space, and is from either inter-annual or intra-period variability.If the monthly mean is analyzed, then σ i refers to intra-period variability within a given month.The unit of RMSE is pmol mol −1 , and EWSE is unitless.Note that RMSE and EWSE are defined for a single model transporting one flux.To calculate the RMSE of model groups, the quadratic mean was taken.
To calculate the EWSE of model groups, the arithmetic mean was taken.To calculate the statistics of the mean seasonal cycle, the time series were first processed by the CCGVU software to remove the trend and outliers, after which the RMSE and EWSE were calculated for each model.
The Pearson correlation coefficient is defined as: where ρ is Pearson correlation coefficient, Cov is the covariance of modeled and observed samples, and σ m and σ o are the standard deviation of modeled and observed samples over certain average (e.g., averaged in latitudinal bins), respectively.The calculation of Pearson correlation coefficient is performed using the Python module Scipy version 1.7.3.Information S1.The general feature is that all the models with both fluxes capture the meridional gradient relatively well, but overestimate the mole fraction at the GIF station.In the Northern Hemisphere (NH), the prominent drawdown over North America observed at HFM and LEF in August is well reproduced by the models.This suggests that the optimized fluxes are representative of the net surface flux over North America.An exception is GIF, a French observational site which was, unlike the NOAA surface data, not assimilated to derive both optimized fluxes.The (coarse) grid cell in which GIF is sampled in the models has a high positive flux value, monthly mean about 3.6 ± 6.9 pmol m 2 s −1 and 14. cycle seems generally closer to the simulations using the OPT-LMDz fluxes, which is likely caused by the large seasonality in the OPT-TM5 flux.Inspecting the performance of the WM and SM model groups, large differences are observed at PSA, SUM and ALT, that is, stations in the two polar regions.Besides, SUM is a high-altitude sites whereas BRW and ALT are not.In general, WM models using the OPT-TM5 fluxes overestimate the seasonal cycle.This is explained by the fact that the fluxes were optimized by the strong mixing TM5, resulting in large seasonal cycle in the optimized flux.

Impact of Different
Propagation of these fluxes in WM models hence leads to overestimated seasonal cycles, specifically at higher latitudes where mixing and fluxes change strongly with the season.
To assess the performance of the model groups in simulating the seasonal cycle, the statistics of the model groups transporting the optimized fluxes are presented in Table 3.As reference, we also show the results of the control simulation that were presented in Remaud et al. (2023a).Five stations (PSA, THD, HFM, BRW, and ALT) are presented showing high RMSE values in Figure 5.In general, the seasonal cycles are well reproduced by the optimized fluxes (correlation in between 0.85 and 1.0), and using the optimized fluxes leads to large improvements compared to the control scenario.One exception is THD, where the performance of the control scenario was already good.
At PSA, the errors between observations and model group simulations are largest in local summer (see Figure 5), and WM models show larger RMSE values compared to SM models.At PSA, BRW and ALT, we again notice that using the OPT-TM5 fluxes in WM models leads to large RMSE and EWSE values.At HFM, BRW and ALT, WM transporting OPT-LMDz flux perform better.At NOAA stations, model results support the hypothesis that OPT-TM5 fluxes show a better performance when propagated in SM models.Likewise, OPT-LMDz fluxes perform better when propagated in WM models.

Mid-Troposphere Seasonal Variations
The ongoing surface observations discussed in the previous section were used to optimize the fluxes.In this section, independent data from the ongoing aircraft measurements (mostly over North America [NA]) are used to evaluate the fluxes and models.These data were not assimilated in the inversions, so they can provide insights in the quality of the optimized fluxes using a model ensemble (Ma et al., 2021;Remaud et al., 2022).The vertical gradient in the NOAA aircraft observations (averaged over 2010-2011), grouped by season over NA and Alaska, is presented in Figure 6, similar to the gradients shown in Figure 6 of Montzka et al. (2007) for a different set of years and sites.
Results from the individual models are presented in Figure S9 in Text S3 in Supporting Information S1.
In general, the vertical gradients are well reproduced by the models propagating the optimized fluxes.This is in stark contrast with the control flux scenario.This mismatch was attributed to an overestimated oceanic source at high latitudes and an underestimated biosphere sink at high latitudes (Remaud et al., 2023a).The good agreement between the observations and the models reflects that, at high latitudes, the optimized fluxes have more biosphere uptake and less ocean emissions compared to the control fluxes.Over NA during DJF, the observed vertical gradient is about −15 pmol mol −1 , similar to the mean of the WM and SM models.Note, however, that the model spread can reach 100 pmol mol −1 in JJA, pointing to differences in vertical mixing, also within the SM and WM groups.Over the course of the year, the vertical gradient in the observations grows, which is to some extent reproduced by the models with a slight exception for the autumn (SON) over NA where the models still have a too low gradient.As expected, the vertical gradient is more prominent for the WM models.

Evaluation With HIPPO Aircraft Data
We use a subset of the HIPPO results from the multi-seasonal aircraft campaigns to evaluate the optimized fluxes as they have not been used in the data assimilation process.HIPPO campaigns 1-2 are not used because model simulations start from 2010, while the HIPPO 1-2 data were collected in 2009.Two aspects are considered in the evaluation: (a) the meridional gradient and (b) the vertical distribution of the COS abundance.Figure 7 shows the meridional distributions over the Pacific Ocean.We averaged the observations over 20° latitude bins and in the vertical in two bins: the boundary layer (below 2 km), and the free troposphere (2-8 km).
The prominent feature of simulations compared with HIPPO is that the two fluxes underestimate the observations, specifically over tropical regions in the free troposphere.This bias may be due to our simple model correction procedure (Text S1 in Supporting Information S1) or unresolved sources.In the lowest 2 km, the simulations capture the meridional variations well, while there is a larger gap between HIPPO and simulations in the free troposphere, most prominent in the latitude range 0-40°N.This is more significant during the HIPPO4 campaign across the east Pacific Ocean, which will be further discussed in Section 4. The results for the individual models are discussed in Text S4 in Supporting Information S1.
The model performance is quantified in Table 4.We calculate how well the models reproduce the latitudinal gradients by correlating modeled and observed mole fraction against latitude.Models using the optimized fluxes show significantly improved correlation with HIPPO measurements for all three campaigns, with a significantly reduced RMSE.Correlations are in the range 0.78-0.93,0.82-0.95,and 0.92-0.99 for HIPPO3, HIPPO4, and HIPPO5, respectively.For the RMSE, HIPPO3, HIPPO4, and HIPPO5 show deviations of 15-27, 18-30, and 18-41 pmol mol −1 , respectively.One outlier is the WM model group using the OPT-TM5 fluxes, showing a RMSE of 41 pmol mol −1 , again due to incompatibility of the flux and vertical mixing at high northern latitudes as shown in Figure 7. Results for the individual models are presented in Figure S10 in Supporting Information S1, and results for the control scenario are shown in Figure S17 in Text S5 in Supporting Information S1.
To further compare to HIPPO observations, the data are separated over continents and the Pacific Ocean and mean vertical profiles were calculated.Figure 8 shows the vertical profiles of HIPPO and the simulations.Results of the individual models are shown in Figure S11 in Text S4 in Supporting Information S1.Consistent with Figure 7, the simulations are generally lower than the HIPPO observations, and simulations using the OPT-TM5 fluxes are closer to HIPPO compared to simulations using the OPT-LMDz fluxes.Although the simulations are lower than HIPPO measurements, they generally stay within 1-σ of the HIPPO measurements.

Evaluation With ATom Aircraft Data
In this section, we use ATom aircraft data to evaluate the model simulations.The ATom data were collected in four different campaigns, across mainly the Atlantic and Pacific oceans.We evaluate the fluxes and model group performance separately over the Atlantic and Pacific oceans and also assess the impact of nearby continents.
Figure 9 shows the COS meridional gradient over the Atlantic Ocean (results of the individual models are shown in Figure S12 in Supporting Information S1).In the lowest 2 km over the Atlantic Ocean, the meridional gradients observed by ATom are reasonably well reproduced by the models.ATom2 observed a drawdown of COS over the Atlantic in the SH, mostly above 2 km.These observations are probably impacted by uptake of the Amazon forest, which is consistent with a recent model study of Stinecipher et al. (2022) showing a depression over the Atlantic that peaks in January in both the GEOS-chem model and in MIPAS satellite data (their Figure 2 and Figure S4 in Supporting Information S1).The models used in this study reproduce this feature well.ATom3 (September-October 2017) also observed a drawdown of COS over the Atlantic in the NH during late boreal summer, likely caused by the uptake of the NH biosphere.ATom4 shows a COS enhancement in the low latitude NH, mostly below 2 km, which is not well reproduced by the models.In general, however, both fluxes simulate the observed meridional gradients well.Due to the less strong seasonal variations, the gradients calculated with the OPT-LMDz fluxes are somewhat flatter compared to the simulations with OPT-TM5 fluxes.Again, for ATom3, there is a strong difference between the WM model group and the SM model group at high latitudes below 2 km.WM models transporting the OPT-TM5 flux show the largest COS drawdown.
Figure 10 compares the COS meridional gradients over the Pacific Ocean (results for individual models are presented in Figure S13 in Supporting Information S1).Generally, all ATom flights are again well simulated by the models.In contrast to the Atlantic, ATom2 does not show a strong drawdown over the SH Pacific, a feature that is well reproduced by the models.ATom3 shows a strong drawdown below 2 km toward high latitudes over the Pacific, and this drawdown is not observed in the free troposphere.Here, models underestimate the COS mole fractions observed by ATom3 in the free troposphere.The performance of the model groups is evaluated against ATom in Table 5, separated into the Atlantic and Pacific regions.The performance of the models using the control scenario fluxes is also included.We calculate how well the models reproduce the latitudinal gradients by correlating modeled and observed mole fraction against latitude.The results using the optimized fluxes generally reach a high correlation and much lower RMSE, showing improvements over the control scenario.In addition, the correlation of the optimized cases are usually close to 1.0, but there are exceptions, especially in free troposphere.For instance, for ATom1 over the Atlantic, WM models with both fluxes show correlations of 0.32 and 0.57, drastically lower than 0.99 and 0.95 in the boundary layer.For ATom3 over the Pacific, WM models show correlations of −0.4 and 0.28, while correlations of 0.99 are seen in the boundary layer.These low and sometimes negative correlations are mostly caused by the small variations of the COS mole fractions averaged as latitudinal bins (20°).The lower simulated COS in the NH free troposphere during ATom3 will be further discussed in Section 4. Overall, there is no model group or flux showing clearly better statistics than the others.The ATom results using the control scenario is shown in Figures S18 and S19 in Supporting Information S1 and further described in Text S5 in Supporting Information S1.
The vertical distribution of the model simulations is compared to ATom data in Figures 11 and 12 (results for individual models are presented in Figures S14 and S15 from Text S4 in Supporting Information S1).In the lower troposphere, below 8 km, ATom data and model simulations are in good agreement and model-to-model spread is rather small.In the upper troposphere above 8 km, the model-data comparison shows good performance for ATom flights 1-3.ATom4, however, shows a drastic decline of the COS abundance over the Atlantic above 8 km, and all models fail to capture this decrease.An analysis of the COS vertical distribution over 30° latitudinal bins indicates that this COS decline mainly occurs over high latitudes in both hemispheres, as shown in Figure S16 in Text S4 in Supporting Information S1.This decline is likely associated with the influence of the stratosphere, which has lower COS abundance (Brühl et al., 2012;Glatthor et al., 2017).Since the models do not simulate stratospheric COS removal, this feature is not present in the simulations.For ATom4, around 2 km, the models do not capture the enhanced COS over the Pacific that was observed during April-May 2018.

Discussion
In this section, we discuss the main findings and potential improvements of this model intercomparison.
First of all, we find that models with optimized COS fluxes capture the available observations in the atmosphere generally quite well, both in the boundary layer and in the free troposphere.This agreement with observations is in sharp contrast with the control scenario, discussed in the accompanying paper (Remaud et al., 2023a), and shows that the flux optimization process generally leads to better comparison to observations, including with measurements that were not used in the optimization process.The optimized

Table 4
Statistics Between Model Groups Transporting Fluxes (Optimized and Control Fluxes) and HIPPO Data Along Latitude as Shown in Figure 7 fluxes of TM5-4DVAR and LMDz are generally in good agreement, with a slightly stronger seasonal cycle in the OPT-TM5 fluxes.This can be explained by the fact that TM5 is in the group of the "strongly-mixing" models, which implies that larger flux adjustments are required to obtain agreement with the assimilated NOAA surface measurements.The net source of the OPT-TM5 fluxes (838.5 GgS a −1 ) is 13.3% higher than that of OPT-LMDz (740.1 GgS a −1 ), see Table 1.Also, it is worth noting that the TM5-4DVAR inversion assimilated COS measurements from 14 NOAA surface stations, while the LMDz inversion assimilated COS and CO 2 from 15 NOAA surface stations adding WIS.Interestingly, a similar budget difference of CO 2 inversion was also found based on the comparison of GEOS-Chem and TM5-4DVAR (Schuh et al., 2019(Schuh et al., , 2022)).Later, Schuh and Jacobson (2023) analyzed the systematic large-scale patterns in column integrated CO 2 concentration (XCO 2 ) differences associated with transport of the two models, and found that the XCO 2 differences were primarily caused by differences in the parameterization of convective mixing.
Near the surface, the strength of the modeled vertical mixing simulated at the NOAA aircraft sites is controlled by the sub-grid scale parameterization.Specifically, TM3, TOMCAT, and TM5 as strong-mixing models use a similar boundary layer scheme (Louis, 1979) and ECMWF-based convective fluxes (Krol et al., 2018).The weak-mixing models share a similar Mellor-Yamada boundary layer scheme (Mellor & Yamada, 1974, 1982;Nakanishi & Niino, 2004).The station-based vertical gradients are compared to the NOAA aircraft platform in Figure S9 in Text S3 in Supporting Information S1.Over Alaska, the SM models agree better with the observed vertical gradients.WM models generally simulated too large vertical COS vertical gradients during JJA and SON, and this effect is reinforced by using the OPT-TM5 fluxes.Note, however, that the model spread is large.The smaller vertical gradients for strong-mixing models can be explained by the faster vertical mixing in the boundary layer, as pointed out in a SF 6 validation study by Peters et al. (2004).Classifying models into certain groups, that is, strong or weak mixing, is an appropriate way to evaluate the model performance as the main driver of the model spread is the strength of the vertical mixing, and to give an insight into the impacts of transport errors on the optimized fluxes.Also, the classification into weak and strong mixing models is somewhat arbitrary, because seasonal mixing patterns are also influenced by seasonal patterns in the chemistry and surface fluxes.However, it should be noted that quantifying the models' mixing is not straightforward and needs careful consideration in further studies.
In the free troposphere, the models using optimized fluxes show a significantly improved match to HIPPO and ATom data compared to simulations using the control flux scenario, see also Text S5 in Supporting Information S1.One point of discussion is the underestimation of the modeled COS, mostly in the free troposphere over the NH and tropics (HIPPO4, HIPPO5, see Figure 7, ATom1 and ATom4 Atlantic, see Figure 9, ATom3 and  ATom4 Pacific, see Figure 10).These underestimates mostly occur in the NH summer.We speculate that the mismatches are caused by missing sources in the free troposphere.Recent findings on oxidation pathways of DMS revealed a new stable intermediate, hydroperoxy-methyl-thioformate (HPMTF).HPMTF can potentially be oxidized to produce COS in the troposphere (Fung et al., 2022;Veres et al., 2020;Wu et al., 2015).However, taking into account the large solubility of HPMTF strongly reduces the conversion of DMS to COS, even below the yield of 0.7% (Barnes et al., 1994) that is currently used in COS emission inventories (Jernigan et al., 2022;Ma et al., 2021).Possibilities of in-cloud production of COS from dissolved HPMTF are still rather speculative, but cannot be excluded.Another possible candidate for the COS underestimates in the free troposphere could be unaccounted COS or CS 2 emissions from Asia.Ma et al. (2021) discussed the ambiguity of the CS 2 lifetime according to different rate constants of its oxidation in the atmosphere and the potential impact on COS production.The delayed formation of COS from CS 2 that is emitted by oceans and anthropogenic activities could potentially explain some of the low COS bias in the free troposphere.Further investigation on these possibilities is required.
There are several shortcomings in this work.First of all, the COS chemistry in the troposphere and stratosphere was not explicitly included in the models, but projected to the surface to keep the modeling protocol relatively simple and the COS budget closed.However, in reality COS is depleted in the stratosphere, and entrainment of stratospheric air may result in lower COS mole fractions, as observed in ATom4 over the Atlantic (Figure 10).A more realistic approach would be to treat the COS chemistry as a 3-dimensional loss  field.Second, another limitation of this work is that the ATMs started from a zero COS initial state, which made direct comparisons against COS measurements challenging.We solved this by correcting the models for budget imbalances (Table 2) and adding 485 pmol mol −1 .This adjusts the model simulations to the NOAA surface network as reference, yet the procedure is based on the assumption that the COS abundance does not change over time in the troposphere.The issue is alleviated by applying the standard CCGVU software to filter out inter-annual and synoptic signals.Thus comparing the modeled seasonal cycles to observed cycles is likely reliable.However, this correction procedure may partially explain the offsets between the models with HIPPO and ATom observations.Third, COS uptake by the biosphere is treated by using a prescribed flux.In reality, COS uptake is a first-order uptake process (Berry et al., 2013;Ma et al., 2021).Treating the biosphere uptake as a first-order process would complicate the intercomparison, but could be pursued in future studies.

Conclusions and Recommendations
In this paper, we presented results of the inter-model comparison TransCom-COS.In this Part II we focused on the optimized COS fluxes that are propagated in seven ATMs starting from the same initial state.We grouped the model results based on two sets of optimized fluxes (OPT-TM5 and OPT-LMDz), and on the strength of the vertical mixing in the models.Specifically, we identified weak mixing models (WM, including LMDz) and strong mixing models (SM, including TM5).Main findings are: This paper clearly shows that the current optimized fluxes are well able to reproduce the main features of the observed global distribution of COS and its seasonal cycle.To further improve and refine our knowledge on the COS budget we present the following recommendations for future research.
1.More elaborate data assimilation and model evaluation methods would be helpful.These methods could make use of FTIR and satellite data to further constrain the sources and sinks of COS.Recent studies have used MIPAS satellite data to constrain GPP over the Amazon region (Ma et al., 2023;Stinecipher et al., 2022).Hannigan et al. (2022) recently presented the extensive COS FTIR Network linked to the Detection of Atmospheric Composition Change (NDACC) network, which could enable a more comprehensive model evaluation and offers possibilities for data assimilation in the future.2. In general, COS inversion studies are still limited by a lack of COS observations, and more measurement data could advance understanding.Areas of particular interest are some unique and large ecosystems that currently are poorly characterized, such as the Amazon rain forest and data-void regions like Asia. 3. The underestimated COS mole fractions in the free troposphere require an explanation.Enhanced COS production from DMS oxidation can be a candidate, that is, through the HPMTF intermediate, but this requires further study.4. The TransCom-COS protocol can be further improved by providing ATMs with a standard initial state and 3D fields of COS related chemistry, for example, tropospheric oxidation and stratospheric photolysis. 5. To better quantify the differences between strong or weak mixing models, as outlined by Krol et al. (2018), using a tracer of Age of Air (AoA) could be considered.Another tracer is 222 Radon to be used to determine the strength of the vertical mixing because of its short lifetime (Remaud et al., 2018).

Table A1
Summary Information (Transport Model, Meteorology, Vertical Resolution, Horizontal Resolution, and Physical Schemes)

Figure 1 .
Figure 1.Averaged (2010-2018) optimized surface fluxes that are used as model input: TM5 (left) and LMDz (right).The surface fluxes are augmented with vertically integrated troposphere chemistry (for both models) and stratospheric removal (only for TM5).

Figure 2 .
Figure 2. Surface fluxes OPT-TM5 and OPT-LMDz (a, b) and seasonal variability in the total global net flux (c, d).The fluxes are multi-year and longitudinal averaged over the years 2010-2018.The x-axis represents month and y-axis latitude in (a) and (b).The difference between the fluxes is presented in Figure S3 in Supporting Information S1.

Figure 3 .
Figure 3. Geographical locations of the NOAA ground-based observations (red crosses), the HIAPER Pole-To-Pole Observations (HIPPO) and Atmospheric Tomography Mission (ATom) campaign tracks for different flights, and the ongoing NOAA aircraft measurement program (primarily vertical profiling; purple circles).The NOAA aircraft measurement locations over North America are shown in the top left inset panel.
Transport Models: Using Optimized Flux Scenarios 3.1.1.Comparison With the NOAA Surface NetworkWe first compare the model simulations to the NOAA surface network in February and August in Figure4.This figure can be compared to Figure3inRemaud et al. (2023a).To highlight the model differences, the models are grouped into SM and WM models, and the single model results are presented in FigureS7in Supporting

Figure 4 .
Figure 4. Comparison of the meridional variations of the carbonyl sulfide (COS) abundance simulated by the weak mixing (WM) and strong mixing (SM) model groups using the optimized surface fluxes with the surface observations only (black) in February and August.The error bars represent the variation among the WM and SM models, and the gray bar represents the RMSE of the measurements at each station.For visualization, the locations of KUM, NWR, and SUM are shifted by −2°N, −2°N, 2°N, respectively.The WM and SM groups are slightly shifted horizontally to avoid overlap.

Figure 5 .
Figure 5. Mean seasonal cycle of the carbonyl sulfide (COS) abundance simulated by the weak mixing (WM) and strong mixing (SM) model groups using the optimized fluxes.The COS mole fractions from measurements and models are decomposed with the standard software CCGVU to remove the inter-annual and synoptic variability.The seasonal cycle is averaged over the years 2010-2018.The black line represents the observed COS seasonal cycle with the standard deviation derived from the decomposed measurements within the particular month.The stations are ordered from SH to NH.The errors of the SM group are shown as shading, and those of the WM group are shown as error-bars.These errors represent the model spread.

Figure 6 .
Figure 6.Seasonal mean observed and simulated carbonyl sulfide (COS) vertical gradient between 1 and 4 km averaged for NOAA aircraft observations.The data are grouped into the North American continent (left panel) and Alaska (right panel).The monthly COS gradients are calculated by averaging the differences in COS abundances between 1 and 4 km over all the vertical profiles.The gray shading represents the spread in the observations averaged in 3 months.The number of observations in each season on 1 and 4 km are marked.

Figure 7 .
Figure 7. Carbonyl sulfide (COS) meridional gradient of HIAPER Pole-To-Pole Observations (HIPPO) flights 3-5 and model simulations.The model groups and observations are separated in observations below 2 km and into the free troposphere (2-8 km), and averaged in 20° latitude bins.The gray shading represents 1-σ in the binned HIPPO data.

Figure 8 .
Figure 8. Carbonyl sulfide (COS) Vertical gradient of HIAPER Pole-To-Pole Observations (HIPPO) measurements against model groups over the oceans.The data are averaged over layers of 1.25 km.

Figure 9 .
Figure 9. Meridional gradient of ATom flights 1-4 and over the Atlantic Ocean.The model groups and observations are separated into observations below 2 km and in the free troposphere between 2 and 8 km, and averaged in each 20° latitudinal bin.The gray shading represents the standard deviation of ATom data for each flight and vertical region.

Figure 10 .
Figure 10.Same as Figure 9 but over the Pacific Ocean.

Figure 12 .
Figure 12.Same as Figure 11, but over the Atlantic Ocean.
ble at: https://sourceforge.net/projects/tm5/.The source codes of TM5 in this study are available inMa and  Krol (2023).TOMCAT is a UK community model.It is available to UK (or NERC-funded) researchers who normally access the model on common facilities (e.g., Archer or JASMIN) or who are helped to install it on their local machines.As it is a complex research tool, new users will need help to use the model optimally.We do not have the resources to release and support the model in an open way.Any potential user interested in the model should contact Martyn Chipperfield.The model updates described in this paper are included in the standard model library.The source codes of SiB4 model used to simulate the biosphere fluxes are described inKooijmans et al. (2021).The source codes of ORCHIDEE used to simulate the COS biosphere fluxes is described inAbadie et al. (2022).All the model outputs of optimized fluxes and simulated mixing ratios of COS are available inRemaud et al. (2023b).

Table 1
Description of the Two Atmospheric Inverse Systems That Produced the Optimized Carbonyl Sulfide (COS) Surface Fluxes, TM5-OPT and LMDz-OPT

Table 3
Statistics of the Simulation of the Seasonal Cycle at Selected NOAA Stations Note.Root mean square error (pmol mol −1 ) and Pearson correlation are reported for the different ATom campaigns.The ATom observations are aggregated in the layer below 2 km (BL) and the free troposphere (FT, 2-8 km).

Table 5 Continued
Figure 11.The vertical Atmospheric Tomography Mission (ATom) profiles averaged over 1.25 km thick layers and the different model groups over the Pacific Ocean.
of the TransCom Models in This Study Note.Only the stations and time period used in this work are listed.Information of Observational Platforms: NOAA Surface Network, NOAA Aircraft, HIPPO and ATom