Differences in MOPITT surface level CO retrievals and trends from Level 2 and Level 3 products in coastal grid boxes

Abstract. Users of MOPITT (Measurement of Pollution in the Troposphere) data are advised to discard retrievals performed over water from analyses. This is because MOPITT retrievals are more sensitive to near-surface CO when performed over land than water, meaning that they have a greater measurement component and are less tied to the a priori CO concentrations (which are taken from a model climatology) that are necessarily used in their retrieval. MOPITT Level 3 (L3) products are a 1° × 1° gridded average of finer-resolution (∼ 22 × 22 km) Level 2 (L2) retrievals. In the case of coastal L3 grid boxes, L2 retrievals performed over both land and water may be averaged together to create the L3 product, with L2 retrievals over land not contributing to the average at all in certain situations. This conflicts with data usage recommendations. The aim of this paper is to highlight the consequences that this has on surface level retrievals and their temporal trends in "as-downloaded" L3 data (L3O), by comparing them to those obtained if only the L2 retrievals performed over land are averaged to create the L3 product (L3L), for all identified coastal L3 MOPITT grid boxes. First, the difference between surface level retrievals in L3L and

tion, has no trend year to year. VMRs in the resulting L3O M are significantly different to L3L for 45 % of all coastal grid boxes, corresponding to 75 % of grid boxes where the L3L − L3W difference is also significant. Just under half of the grid boxes that featured a significant L3L − L3W trend difference also see trends differing significantly between L3L and L3O M. Factors that determine whether L3O M and L3L differ significantly include the proportion of the surface covered by land/water and the magnitude of land-water contrast in retrieval sensitivity. Comparing the full L3O dataset to L3L, it is shown that if L3O is filtered so that only retrievals over land (L3O L) are analysed, as recommended, there is a huge loss of days with data for coastal grid boxes. This is because L2 retrievals over land are routinely discarded during the L3O creation process for these grid boxes. There is less data loss if L3O M retrievals are also retained, but the resulting L3O "land or mixed" (L3O LM) subset still has fewer data days than L3L for 61 % of coastal grid boxes. As shown, these additional days with data feature some influence from retrievals made over water, demonstrably affecting mean VMRs and their trends. Coastal L3 grid boxes contain 33 of the 100 largest coastal cities in the world, by population. Focusing on the L3 grid boxes containing these cities, it is shown that mean VMRs in L3O L and L3L differ significantly for 11 of the 27 grid boxes that can be compared (there are no L3O L data for 6 of the grid boxes studied), with 9 of the 18 grid boxes where temporal trend analysis can be performed in L3O L featuring a trend that is significantly different to that in L3L. These differences are a direct result of the data loss in L3O L: data that are available in L2 data (and are incorporated into the L3L product created for this study). The L3L − L3O LM mean VMR difference exceeds 10

1 Introduction
Carbon monoxide (CO) is directly emitted into the atmosphere from anthropogenic (e.g. fossil fuel burning) and natural (e.g. wildfire) sources, and it is also produced via the oxidation of hydrocarbons in the atmosphere. With an atmospheric lifetime of weeks to months (e.g. Duncan et al., 2007), it is an important tracer of pollutant transport and indicator of emission sources. While a health concern at high enough concentrations, CO also plays an important role in atmospheric chemistry, for example as a precursor to ozone formation and a primary sink for the hydroxyl radical. Atmospheric CO concentrations have decreased since the start of the 21st century, with a slowdown in the rate of decline observed in recent years (Buchholz et al., 2021). Trends also show substantial spatial variability (Hedelius et al., 2021). Satellite instruments have been central to our understanding of global change in CO concentrations, with the Measurement of Pollution in the Troposphere (MOPITT; Drummond et al., 2010, 2016; frequently used abbreviations are defined in Appendix A) instrument well suited to this task, providing a nearly unbroken and consistent data record since the year 2000.
MOPITT observes upwelling radiances at thermal infrared (TIR) and near-infrared (NIR) wavelengths and uses these in an optimal estimation retrieval algorithm to retrieve coarse-vertical-resolution CO profiles, which are integrated to give total column amounts. Among multiple additional inputs required by the retrieval algorithm, a priori CO profiles, which describe the most probable state of the CO profile at a given location, are necessary to constrain the retrieval to physically reasonable limits (Pan et al., 1998; Rodgers, 2000; the retrieval algorithm is outlined in more detail in Sect. 2.1). For the most recent iterations of MOPITT products, these a priori CO profiles are based on a monthly climatology from a chemical transport model. The degree to which a given MOPITT retrieval reflects information obtained from the observed radiances, known as "information content", is highly spatially and temporally variable, depending on scene-specific factors such as surface temperature, thermal contrast in the lower troposphere, and the actual ("true") CO loading itself, as well as on instrumental noise (e.g. Deeter et al., 2015). The lower the retrieval information content, the closer the retrieved CO loading will be to the a priori, a model value.
Retrievals that take place over water are known to have a lower information content than retrievals that take place over land. Primarily, this is due to weak thermal contrast near to the surface hampering the instrument's ability to sense CO absorption in the lowermost layers of the troposphere (Deeter et al., 2007; Worden et al., 2010), and this is compounded by a lack of NIR reflectance over water, which limits these retrievals to TIR wavelengths only. It is therefore recommended that MOPITT data users exclude these retrievals from any analyses they perform, to ensure that results are not biased by retrievals that have a heavy reliance on the a priori (MOPITT Algorithm Development Team, 2018; Deeter et al., 2015). Such filtering is specifically emphasised where the focus of analysis is the identification of long-term CO trends because any real trends in the data will be weakened by the inclusion of retrievals that are tied heavily to the a priori (Deeter et al., 2015). This is because the a priori CO profiles are taken from monthly modelled CO climatologies: for a given location and day of the year, they will be the same every year and therefore feature no temporal trend (Deeter et al., 2014).
MOPITT data are available as either Level 2 (L2) or Level 3 (L3) products. L2 products contain each individual retrieval, at ∼ 22 × 22 km spatial resolution. L3 products are a 1° × 1° gridded area average of the individual L2 retrievals that fall within each grid box (see Fig. 1), with some filtering criteria applied. One criterion is the surface type over which the L2 retrievals were performed: land, water, or "mixed". If more than 75 % of the bounded L2 retrievals were performed over the same surface type, then only those retrievals are averaged to create the L3 product and the rest are discarded; otherwise, all bounded L2 retrievals are averaged, and the L3 product is given the surface type classification of mixed (L3 surface type classification is explained in more detail in Sect. 2.2). This creates a problem for L3 grid boxes that overlay coastlines: to a greater or lesser extent, these L3 products will have some contribution from L2 retrievals performed over water, as shown in Fig. 1. L3 product users have limited capability to discard them, at least without sacrificing temporal resolution, because each L3 grid box only has a single "retrieval" per day. By contrast, with L2 products it is possible, for the same coastal grid boxes, to choose to retain only the retrievals performed over land. In practical terms, this means that, for coastal L3 grid boxes, valuable retrieval information over land, available in L2 products, can be lost to users of L3 products.
With a focus on the coastal L3 grid box containing the city of Halifax, Canada, Ashpole and Wiacek (2020) demonstrate the consequences of this loss of retrieval information in L3 products. They compare the results of analyses performed using L3 data with those performed using L2 data in which only bounded retrievals over land were retained, and they find significant differences in both seasonal mean statistics and the magnitudes of trends identified in surface level CO. These differences are a direct result of the L3 products being dominated by L2 retrievals over water, which feature a weaker trend than the L2 retrievals over land, demonstrably due to a greater a priori influence owing to their reduced true-profile sensitivity, especially close to the surface. In their conclusions, Ashpole and Wiacek (2020) suggest that L2 retrievals over water should not contribute to L3 products for coastal grid boxes, which would be consistent with previous data-filtering recommendations (MOPITT Algorithm Development Team, 2018; Deeter et al., 2015). The study presented here expands that work to the global scale.

Figure 1. Example coastal L3 grid box and bounded L2 retrievals from which the L3 products for that grid box are created. Purple (green) boxes correspond to L2 retrievals with a surface index of "water" ("land"). Note that only L2 retrievals with a midpoint that falls within the boundaries of the L3 grid box will be used in L3 creation for that grid box. These are indicated by solid purple/green outlines; those not included in L3 creation for this grid box are shown with dotted purple/green outlines. More information on surface indexing and L3 product creation is given in Sect. 2.2. "Coastal" L3 grid box classification is outlined in Sect. 2.3. The coastal L3 grid box visualised here contains the city of Dubai (∼ centre at 25.277° N, 55.296° E), which features in the case study analysis of Sect. 3.4. Faint background shading is from NASA Blue Marble imagery.
The aim of this paper is to compare surface level retrievals and their temporal trends in "as-downloaded" L3 data (L3O; a list of dataset short names is given in Table 1) with those that could be obtained if only the L2 retrievals performed over land are averaged to create the L3 product (L3L; Ashpole and Wiacek, 2022; outlined in Sect. 2.4), for all identified coastal L3 MOPITT grid boxes around the globe. It is necessary to identify whether there are differences for two reasons. Firstly, L3 data are more convenient for long time-series analysis than L2 data owing to their smaller file size (∼ 25 MB vs. ∼ 450 MB, respectively, for a single daily, global file). It cannot be overlooked that working with L3 data thus requires fewer computing resources and less technical proficiency, with a range of simple-to-use tools available for working with gridded products. L3 products thus make the MOPITT data more easily accessible, especially to less-expert users, who may lack the expertise required to scrutinise the data for potential a priori bias. Secondly, many of the world's largest agglomerations are situated within a coastal L3 grid box (5 of the top 10 and 33 of the top 100 largest agglomerations by population; derivation outlined in Sect. 2.5), making these likely targets for analyses of air quality indicators, especially their changes over time. The paper focuses on the surface level of the retrieved profile specifically because this can yield information that is of use in identifying potential air quality impacts for humans (e.g. Buchholz et al., 2022) and also because this is the profile level where the greatest land-water differences in retrieved volume mixing ratio (VMR) statistics and trends were found in Ashpole and Wiacek (2020).
This paper is structured as follows: Sect. 2 describes the datasets and methods used, including outlining the creation of the new "land-only" L3 product (L3L) and its "water-only" counterpart (L3W) created for comparison purposes, which are analysed in this paper. A method for determining which L3 grid boxes are "coastal" is also outlined (Sect. 2.3); these grid boxes are selected as the focus of analysis. Section 3.1 demonstrates the magnitude of the sensitivity difference for retrievals over land and water, zooming in to focus on coastal grid boxes. Although this paper focuses on the surface level of the retrieved vertical profile, higher levels in the profile are also briefly considered here to contextualise the land-water sensitivity contrast at the surface. Section 3.2 links the surface sensitivity contrast to differences in mean CO VMRs and their temporal trends for L2 retrievals performed over land and water within coastal L3 grid boxes, and it evaluates the effect that the averaging together of these retrievals has on the statistics and trends in resulting L3 mixed values. Section 3.3 quantifies the proportion of L2 retrievals performed over land within coastal L3 grid boxes that are lost to L3 products, before finally comparing statistics and trends in L3 and L2 products for all coastal L3 grid boxes, outlining the magnitude and significance of differences for the coastal grid boxes that contain 33 of the largest 100 cities in the world (Sect. 3.4). Results are summarised and conclusions drawn in Sect. 4.

MOPITT instrument and retrieval overview
Carried on board the polar-orbiting NASA Terra satellite that was launched in December 1999, MOPITT began measuring CO in March 2000 and has provided near-continuous measurements since then. The instrument is a gas correlation radiometer that measures radiances in two CO-sensitive spectral bands: the TIR at 4.7 µm, which is sensitive to both absorption and emission by CO and can provide information on its vertical distribution in the troposphere, and the NIR at 2.3 µm, which constrains the CO total column amount and yields information on CO concentrations in the lower troposphere (LT), to which TIR radiances are typically less sensitive (Drummond et al., 2010; Pan et al., 1995, 1998). For the work presented here, the TIR-NIR combined MOPITT product is used, owing to its demonstrably greater sensitivity to CO loadings near to the surface than the TIR- and NIR-only products which are also available (Deeter et al., 2013). Note, however, that retrievals over water and at night are limited to the TIR band only due to the lack of an NIR signal. This analysis is based on daytime-only retrievals (more information on data selection and preparation is given in Sect. 2.4). Multiple other sources describe the retrieval algorithm in detail (e.g. Deeter et al., 2003; Francis et al., 2017). In short, it uses optimal estimation (Pan et al., 1998; Rodgers, 2000) and a fast radiative transfer model (Edwards et al., 1999) to invert measured radiances and retrieve the CO volume mixing ratio (VMR) profile on 10 vertical layers. The vertical grid consists of nine equally spaced pressure levels from 900 to 100 hPa (the uppermost level covers the atmospheric layer from 100 to 50 hPa), with a floating surface pressure level (if the surface pressure is below 900 hPa, fewer than 10 profile levels are retrieved). Retrieved values represent the mean CO VMR in the layer immediately above that level. These profile measurements are then integrated to provide total column CO amounts. Retrievals are only performed for scenes free of cloud (cloud clearing is based on coincident MODIS observations and MOPITT's own radiances).
https://doi.org/10.5194/amt-16-1923-2023 Atmos. Meas. Tech., 16, 1923-1949, 2023
In addition to the measured radiances, the retrieval requires multiple inputs including meteorological data; surface temperature and emissivity; and, of direct relevance to this study, a priori CO profiles, which are necessary to constrain the retrieval to physically reasonable limits. These a priori CO profiles come from a monthly CO climatology (years 2000-2009), simulated with the Community Atmosphere Model with Chemistry (CAM-chem) chemical transport model (Lamarque et al., 2012) at a spatial resolution of 1.9° × 2.5°, which is then spatially and temporally interpolated to the time and location of each individual MOPITT observation. A priori profiles for a given location and day of the year are therefore the same every year and feature no temporal trend. To understand the physical significance of the MOPITT CO retrievals, it is necessary to examine the retrieval averaging kernels (AKs), available with all MOPITT data products, which quantify the sensitivity of the retrieved vertical profile to the true vertical profile. The lower the retrieval sensitivity, the greater the a priori weighting. Two different components of AKs are analysed in this paper: AK row sums, which represent the overall sensitivity of the retrieved profile at the corresponding pressure level to the whole true profile, and AK diagonal values, which represent the sensitivity of the retrieved profile at the corresponding pressure level to the same level of the true profile (e.g. the AK diagonal value for the surface level of the retrieved profile represents its sensitivity to the surface level of the true profile).
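As a minimal illustration of the two AK components described above, the following Python sketch extracts row sums and diagonal values from an averaging kernel matrix. The function name and the 3 × 3 example matrix are hypothetical and chosen only for readability (MOPITT profiles have up to 10 levels); the sketch assumes that each row corresponds to one level of the retrieved profile, with the surface level first.

```python
import numpy as np

def ak_sensitivity_metrics(ak):
    """Return (row_sums, diagonal) for a square averaging kernel matrix.

    Row i is assumed to describe the sensitivity of retrieved level i
    to every level of the true profile.
    """
    ak = np.asarray(ak, dtype=float)
    if ak.ndim != 2 or ak.shape[0] != ak.shape[1]:
        raise ValueError("averaging kernel must be a square matrix")
    row_sums = ak.sum(axis=1)      # sensitivity to the whole true profile
    diagonal = np.diag(ak).copy()  # sensitivity to the same level only
    return row_sums, diagonal

# Hypothetical 3-level kernel; row 0 is the surface level.
example_ak = np.array([
    [0.10, 0.15, 0.05],
    [0.08, 0.20, 0.12],
    [0.02, 0.10, 0.30],
])
row_sums, diagonal = ak_sensitivity_metrics(example_ak)
```

In this made-up example the surface-level row sum (0.30) is three times the surface-level diagonal value (0.10), which shows why the two metrics are reported separately: much of the surface-level sensitivity can come from other levels of the true profile.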
From time to time, new MOPITT products become available as improvements are made to the retrieval algorithm and radiative transfer model, yielding superior validation statistics compared to earlier product versions (Worden et al., 2014). This analysis uses MOPITT Version 8 (V8) products (Deeter et al., 2019). Version 9 (V9) products became available shortly after this study was completed. V9 features cloud screening improvements that yield additional retrievals over land in comparison to V8 (the exact percent change varies significantly with geography). Validation results are comparable to V8. An overview of MOPITT V9 is given by Deeter et al. (2022). A subset of the analysis presented in this paper has been duplicated using V9 data, and this confirms that the main conclusions drawn based on V8 data also hold for V9 (this analysis is outlined in Sect. S1 in the Supplement). This is to be expected, given that the land-water sensitivity contrast remains in V9 and the L3 processing method is unchanged.

MOPITT surface type classification
To aid in filtering and interpreting retrievals, all MOPITT data products are distributed with a range of diagnostic fields. As retrieval information content is known to be variable depending on the type of surface over which it is performed (Deeter et al., 2007), L2 retrievals are given a surface index according to whether they were performed over land, water, or a combination of the two (mixed). For a given 1° × 1° L3 grid box, how the L2 retrievals that fall within its boundaries are processed to produce the L3 product depends on how their surface indexes vary: if more than 75 % of the bounded L2 retrievals have the same surface index, only those retrievals are averaged to produce the L3 gridded value, and the L3 surface index is set to that surface type (the other L2 retrievals are discarded). Otherwise, all L2 retrievals available in the L3 grid box are averaged together and the L3 surface index is set to mixed, as is the case in the example shown in Fig. 1 (this information is taken from the MOPITT Version 6 L3 data quality summary, which, at the time of writing, is the most recent data quality summary to detail exactly how L3 data are created, despite more recent data quality summaries being available, https://www2.acom.ucar.edu/mopitt/mopitt-level3-ver6, last access: 6 April 2023). Note that the L2 VMR profiles that are averaged to produce the L3 retrieval are first converted to log(VMR) profiles and then averaged, and the mean log(VMR) profile is then converted back to a VMR profile.
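The surface-index rule and log(VMR) averaging described above can be sketched for a single grid box, day, and profile level as follows. This is an illustrative reading of the text rather than the operational L3 code: the function name, argument names, and string labels are hypothetical, and base-10 logarithms are an assumption (the result of averaging in log space and converting back does not depend on the base).

```python
import numpy as np

LAND, WATER, MIXED = "land", "water", "mixed"

def l3_surface_average(vmr, surface, threshold=0.75):
    """Average bounded L2 VMRs into one L3 value for one grid box and day.

    vmr     : L2 VMRs (e.g. ppbv) whose midpoints fall inside the grid box
    surface : L2 surface index for each retrieval ("land", "water", "mixed")
    Returns (l3_vmr, l3_surface_index).
    """
    vmr = np.asarray(vmr, dtype=float)
    surface = np.asarray(surface)
    if vmr.size == 0:
        return np.nan, None
    for kind in (LAND, WATER, MIXED):
        selected = surface == kind
        # More than 75 % share one surface index: keep only those retrievals.
        if selected.mean() > threshold:
            return 10 ** np.log10(vmr[selected]).mean(), kind
    # Otherwise average every retrieval and label the L3 value "mixed".
    return 10 ** np.log10(vmr).mean(), MIXED

# Four water retrievals and one land retrieval: the land value is discarded.
water_case = l3_surface_average([100, 100, 100, 100, 200], [WATER] * 4 + [LAND])
# An even land/water split: everything is averaged and flagged "mixed".
mixed_case = l3_surface_average([100, 100, 400, 400], [WATER, WATER, LAND, LAND])
```

The first call shows how a land retrieval can be lost entirely from a coastal grid box; the second shows how land and water retrievals are blended when neither surface type exceeds the threshold.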
Each L3 grid box only has one retrieval per day. This dictates that where the grid box overlies both land and water, its surface index could vary through time, depending on the population of L2 retrievals from which it is created. The make-up of this population can vary from day to day due to factors such as cloud cover and issues around screening for data quality: on day n the population could be predominantly L2 retrievals over land (resulting in a surface index of land for the L3 retrieval), on day n + 1 it could be predominantly L2 retrievals over water (L3 surface index is water), and on day n + 2 it could be an even mix of the two (L3 surface index is mixed). Averaging together retrievals with significantly different sensitivity profiles, as could be the case when averaging retrievals over land and water, dilutes the information coming from the MOPITT observed radiances with information coming from the a priori and is therefore discouraged (MOPITT Algorithm Development Team, 2018; Deeter et al., 2007, 2015). Moreover, MOPITT data users are advised to exclude retrievals over water from analyses owing to their known reduced sensitivity. This introduces two potential problems for L3 data taken from coastal grid boxes: firstly, discarding all L3 retrievals with the surface index of water will result in a loss of temporal coverage; secondly, L3 retrievals with a surface index of mixed feature some contribution from L2 retrievals over water. The consequences of both these problems are explored in this paper.

Coastal grid box classification for this study
Since the focus of this paper is on coastal L3 grid boxes, it is first necessary to isolate these from the remaining land-only or water-only L3 grid boxes in the MOPITT dataset. The initial step is to identify all grid boxes that have a surface index of mixed at least once during the study period. This indicates that the ground area within those grid boxes was both land and water, a characteristic that can safely be assumed true for coastal grid boxes. However, analysis of the global distribution of L3 grid boxes featuring a surface index of mixed revealed that, in addition to actual coastlines, a large proportion of inland grid boxes that are clearly not coastal are given the surface index of mixed at least once during the study period ("inland_mixed"; Fig. 2a). The reason for this is unclear, but it could be for real physical reasons, such as land grid boxes sporadically flooding, or due to issues in the retrieval schemes caused by, for example, cloud screening problems or the presence of surface ice cover. One characteristic of these inland_mixed grid boxes is that, compared to the total number of days with L3, the relative frequency with which they are flagged as land is very high (expressed as the ratio "n_days(L3O L / L3O)", plotted in Fig. 2b; a list of short names and abbreviations referred to in the text can be found in Appendix A for reference). This relative frequency is much lower for true coastal grid boxes, to be expected given prior knowledge of (1) the fact that these grid boxes span both land and water surface types and (2) how the surface index is determined for L3 data (as outlined in Sect. 2.2). Following iterative threshold testing, L3 coastal grid boxes are classified as grid boxes that
1. have at least one classification of mixed during the study period,
2. have an n_days(L3O L / L3O) ratio < 0.5.
The distribution of coastal grid boxes identified using these criteria is shown in Fig. 2c  n_days(L3O L / L3O) ratio to remove these areas has diminishing returns, since it results in the rejection of more true coastal grid boxes.These criteria therefore strike a balance between minimising false and maximising true coastal classifications.
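The two classification criteria can be expressed as a boolean mask over per-grid-box day counts. The sketch below is illustrative only: the function and argument names are hypothetical, and it assumes that the number of days flagged mixed, the number flagged land, and the total number of days with L3O data have already been counted for each grid box.

```python
import numpy as np

def coastal_mask(n_days_mixed, n_days_land, n_days_total, max_land_ratio=0.5):
    """Boolean mask that is True where a grid box meets both coastal criteria."""
    n_days_mixed = np.asarray(n_days_mixed)
    n_days_land = np.asarray(n_days_land, dtype=float)
    n_days_total = np.asarray(n_days_total, dtype=float)
    # Ratio n_days(L3O L / L3O); boxes with no data keep NaN and are rejected.
    ratio = np.full(n_days_total.shape, np.nan)
    np.divide(n_days_land, n_days_total, out=ratio, where=n_days_total > 0)
    # Criterion 1: flagged mixed at least once. Criterion 2: ratio below 0.5.
    return (n_days_mixed >= 1) & (ratio < max_land_ratio)

# Four hypothetical grid boxes: coastal, inland_mixed, never mixed, no data.
mask = coastal_mask(
    n_days_mixed=[5, 3, 0, 0],
    n_days_land=[10, 90, 0, 0],
    n_days_total=[100, 100, 100, 0],
)
```

Only the first box passes both tests. The second resembles the inland_mixed case described in the text: it has been flagged mixed, but it is flagged land on 90 % of its days with data.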
Applying these criteria to the MOPITT L3 data yields 4299 coastal grid boxes, from a total of 64 800 L3 grid boxes (6.6 %). This mask is applied to all data, and only those L3 grid boxes that remain are classified as coastal. Only data for these coastal grid boxes are analysed in this study (with the exception of global L3 maps analysed in Sect. 3.1.1).

(Drummond et al., 2010). Data posted after 28 February 2019 are flagged as "beta" at the time of writing, their use in scientific analysis (especially for examining long-term records of CO) being discouraged until final processing and calibration occurs (MOPITT Algorithm Development Team, 2018). For clarity, the original, as-downloaded L3 time series is referred to as "L3O" for the remainder of this paper. Only retrievals that were performed during daytime hours are retained (daytime and nighttime retrievals are stored as separate fields in MOP03J files). For this analysis, separate subsets of L3O are created according to the surface index: L3O land-only (L3O L), L3O water-only (L3O W), L3O mixed (L3O M), and L3O land-or-mixed (L3O LM). When the L3O dataset is analysed with no filtering by surface index applied, it is referred to as "L3O NF". A list of dataset short names used in this article, as well as their full descriptive name, is given in Table 1.

MOPITT
The land- and water-only L3 products are created from daily L2 data. The first step of L2 data processing required is to filter the retrievals as is done for the processing of L3O. This involves discarding all observations for Pixel 3 (this corresponds to one of MOPITT's four detectors) and discarding all observations where both (1) the channel 5A signal-to-noise ratio (SNR) < 1000 and (2) the channel 6A SNR < 400 (5A and 6A correspond to the average radiances for MOPITT's length-modulated cell TIR and NIR channels, respectively).
This filtering takes place because observations from specific elements on MOPITT's detector array were found to exhibit greater retrieval noise than the other elements, and their inclusion therefore lowered overall L3 information content (MOPITT Algorithm Development Team, 2018). Only daytime L2 retrievals are retained, using a solar zenith angle filter of < 80°.
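The L2 filtering steps described above amount to three boolean tests per retrieval. A minimal sketch follows, assuming the pixel number, the channel 5A and 6A SNRs, and the solar zenith angle have already been read into arrays; the function and argument names are hypothetical and do not correspond to field names in the MOPITT files.

```python
import numpy as np

def l2_keep_mask(pixel, snr_5a, snr_6a, solar_zenith_deg):
    """Return True for L2 retrievals that survive the L3-style filtering."""
    pixel = np.asarray(pixel)
    snr_5a = np.asarray(snr_5a, dtype=float)
    snr_6a = np.asarray(snr_6a, dtype=float)
    sza = np.asarray(solar_zenith_deg, dtype=float)
    is_pixel_3 = pixel == 3
    # Discard only when BOTH SNR conditions hold at once.
    low_snr = (snr_5a < 1000) & (snr_6a < 400)
    is_daytime = sza < 80
    return ~is_pixel_3 & ~low_snr & is_daytime

keep = l2_keep_mask(
    pixel=[1, 3, 2, 4, 1],
    snr_5a=[1500, 1500, 900, 900, 1500],
    snr_6a=[500, 500, 300, 500, 500],
    solar_zenith_deg=[30, 30, 30, 30, 85],
)
```

The fourth retrieval is kept even though its 5A SNR is below 1000, because its 6A SNR is above 400; under the wording of the text, a retrieval is discarded on SNR grounds only when both thresholds are failed.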
From the remaining set of filtered L2 retrievals, separate area averages are taken for those with a surface index of land and water, for every 1° × 1° L3 grid box. This effectively creates two new L3 land-only and water-only products, which are referred to herein as "L3L" and "L3W". For clarity of analysis, remaining L2 retrievals with a surface index of mixed are discarded. These make up a very small proportion of the overall L2 retrievals (e.g. < 5 % for the grid box containing Halifax, analysed in Ashpole and Wiacek, 2020). Both L3L and L3W are publicly available for download (Ashpole and Wiacek, 2022). Note that, as with the creation of L3O, L2 VMR profiles for each L3 grid box are first converted to log(VMR) profiles before averaging, and the mean log(VMR) profile is then converted back to a VMR profile to give the final L3L and L3W retrievals. Additionally, the number of L2 retrievals that are used for calculating the area averages when creating L3L and L3W ("n_ret L" and "n_ret W", respectively) is recorded. The ratio n_ret L / n_ret W (herein referred to as "ratio(land / water)" for simplicity) is used to indicate the proportion of the L3 grid box that is covered by land vs. water: a ratio of 1 indicates an even split of these surface types in the grid box, a ratio < 1 indicates that a greater proportion of its surface is water covered, and a ratio > 1 indicates that the grid box is land-dominated.
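To make the construction of L3L and L3W concrete, the sketch below grids one day of already-filtered L2 surface VMRs into land-only and water-only averages and records the retrieval counts. It is a simplified illustration, not the code used to produce the published datasets: the function and key names are hypothetical, grid box edges are assumed to lie on whole degrees, and base-10 logarithms are assumed.

```python
import numpy as np
from collections import defaultdict

def make_l3l_l3w(lat, lon, vmr, surface):
    """Grid filtered L2 retrievals into land-only and water-only 1-degree means.

    Returns a dict keyed by the (lat, lon) of each box's south-west corner.
    """
    groups = defaultdict(lambda: {"land": [], "water": []})
    for la, lo, v, s in zip(lat, lon, vmr, surface):
        if s not in ("land", "water"):
            continue  # L2 retrievals with a surface index of mixed are discarded
        box = (int(np.floor(la)), int(np.floor(lo)))  # box containing the midpoint
        groups[box][s].append(np.log10(v))            # average in log(VMR) space
    result = {}
    for box, logs in groups.items():
        n_land, n_water = len(logs["land"]), len(logs["water"])
        result[box] = {
            "L3L": 10 ** np.mean(logs["land"]) if n_land else np.nan,
            "L3W": 10 ** np.mean(logs["water"]) if n_water else np.nan,
            "n_ret_L": n_land,
            "n_ret_W": n_water,
            "ratio_land_water": n_land / n_water if n_water else np.nan,
        }
    return result

# Hypothetical retrievals inside the grid box with south-west corner (25, 55).
gridded = make_l3l_l3w(
    lat=[25.2, 25.3, 25.7, 25.9, 25.5],
    lon=[55.1, 55.2, 55.3, 55.8, 55.5],
    vmr=[100.0, 400.0, 50.0, 200.0, 300.0],
    surface=["land", "land", "water", "water", "mixed"],
)
```

Here the land and water averages differ by a factor of 2 even though the grid box has an even land/water split (ratio of 1), which is the kind of contrast that is hidden when the two populations are averaged together.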
From the L3O, L3L, and L3W datasets, only grid boxes that are classified as coastal using the coastal grid box mask outlined in Sect. 2.3 are analysed (see Table 1 for a list of dataset short names used in this article, as well as their full descriptive name).
Note that the analysis presented in this paper is restricted to daily products. Monthly L3 files are available; however, the absence of a monthly L2 product precludes the analysis from being conducted on those data. Based on the results of the analysis of daily data, however, there is reason to also advise caution if working with coastal grid boxes in the monthly L3 product. This is because the data for those grid boxes will still be created from daily L2 retrievals over land and water, with the same implications as those that are discussed in this paper.

Time series preparation, statistical methods, and additional data sources
For every coastal L3 grid box, two separate time series from each of the L3O, L3L, and L3W datasets are analysed:
1. The time series analysed in Sect. 3.1 and 3.2 only contain days when L3L and L3W are both present and the L3O surface index is mixed (L3O M). This is to ensure that the true CO profiles are as similar as possible when directly comparing L3L and L3W for a given coastal grid box. Furthermore, it allows for the analysis of the resulting L3O M data on these days with knowledge of the parent L2 retrievals over land and water and their differences.
2. In Sect. 3.3 and 3.4 the full time series from each dataset is analysed with no temporal filtering applied.
Descriptive statistics are calculated from both time series across the whole study time period and also for individual years (full years only: 2002 to 2018 inclusive) in order to perform the regression analysis outlined below.
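The day selection used for the first time series, followed by the yearly statistics, can be sketched as below for a single grid box. The function and argument names are hypothetical; missing days are assumed to be stored as NaN, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator is used.

```python
import numpy as np

def colocated_yearly_stats(years, l3l, l3w, l3o_surface):
    """Yearly (mean, std, n) of L3L on days with L3L, L3W and a mixed L3O value."""
    years = np.asarray(years)
    l3l = np.asarray(l3l, dtype=float)
    l3w = np.asarray(l3w, dtype=float)
    l3o_surface = np.asarray(l3o_surface)
    # Keep days where both land-only and water-only values exist and L3O is mixed.
    keep = np.isfinite(l3l) & np.isfinite(l3w) & (l3o_surface == "mixed")
    stats = {}
    for year in np.unique(years[keep]):
        values = l3l[keep & (years == year)]
        std = values.std(ddof=1) if values.size > 1 else np.nan
        stats[int(year)] = (values.mean(), std, values.size)
    return stats

stats = colocated_yearly_stats(
    years=[2002, 2002, 2002, 2003, 2003],
    l3l=[100.0, 120.0, np.nan, 90.0, 95.0],
    l3w=[80.0, 85.0, 70.0, np.nan, 75.0],
    l3o_surface=["mixed", "mixed", "mixed", "mixed", "land"],
)
```

In this example only the first two days of 2002 satisfy all three conditions, so 2003 contributes no yearly value. The same function could be applied to L3W or L3O M values by passing them in place of the L3L series being averaged.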
To identify and compare temporal trends for each coastal grid box in the datasets outlined above, weighted least squares (WLS) regression analyses are performed on yearly mean values, weighted by the inverse of the standard deviation of the measurements used in the yearly mean (i.e. 1/σ). For years that contain just a single retrieval, the weighting is set to 1/100 000 to de-weight them in the fit. If there are more than 2 years in a time series that have no data for a given grid box, the regression analysis is not performed. WLS is preferred over ordinary least squares (OLS) because it is less sensitive to outliers. For simplicity, no other trend detection methods (e.g. the Theil-Sen slope estimator) are applied to corroborate the trends that are detected with WLS, nor do we analyse additional datasets to verify them. Such extra steps would be necessary if the actual trend values were the focus of this study; however, the aim of this trend analysis is instead to identify whether the same method can yield different results depending on which of L3O, L3L, or L3W is analysed. Trend verification is beyond the scope of this study.
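A compact sketch of such a weighted fit is given below. It is one possible reading of the description rather than the code used in this study: the function name is hypothetical, the weights 1/σ are assumed to multiply the squared residuals, and the slope standard error is taken from the conventional WLS covariance estimate, which the text does not specify.

```python
import numpy as np

def wls_trend(years, yearly_mean, yearly_std, n_retrievals, max_missing_years=2):
    """Weighted least-squares linear trend of yearly means.

    Returns (slope, slope_standard_error), or None if the fit is not performed.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(yearly_mean, dtype=float)
    sigma = np.asarray(yearly_std, dtype=float)
    n = np.asarray(n_retrievals)

    has_data = np.isfinite(y) & (n > 0)
    if np.count_nonzero(~has_data) > max_missing_years:
        return None  # more than 2 years without data: no regression

    # Token weight of 1/100000 for single-retrieval years (and for any year
    # without a usable standard deviation); otherwise the weight is 1/sigma.
    weights = np.full(x.shape, 1.0 / 100000.0)
    multi = has_data & (n > 1) & (sigma > 0)
    weights[multi] = 1.0 / sigma[multi]

    x, y, w = x[has_data], y[has_data], weights[has_data]
    if x.size < 3:
        return None  # too few points to estimate a standard error
    design = np.column_stack([np.ones_like(x), x - x.mean()])
    xtwx = design.T @ (w[:, None] * design)
    beta = np.linalg.solve(xtwx, design.T @ (w * y))
    residuals = y - design @ beta
    scale = np.sum(w * residuals ** 2) / (x.size - 2)
    cov = scale * np.linalg.inv(xtwx)
    return beta[1], np.sqrt(cov[1, 1])

# Synthetic check: a noise-free decline of 2 units per year, 2002-2018.
demo_years = np.arange(2002, 2019)
demo_means = 100.0 - 2.0 * (demo_years - 2002)
demo_fit = wls_trend(demo_years, demo_means, np.full(17, 5.0), np.full(17, 10))
```

Centring the years before fitting improves numerical conditioning and leaves the slope and its standard error unchanged. If the weights were instead meant to multiply the residuals before squaring, `w` would need to be replaced by `w ** 2` in the three places it enters the fit.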
To determine whether two trends identified are significantly different, their difference is evaluated using the Z test as follows:

$$Z = \frac{\mathrm{Trend}_1 - \mathrm{Trend}_2}{\sqrt{\mathrm{SE}_1^2 + \mathrm{SE}_2^2}},$$

where SE 1 and SE 2 correspond to the standard errors of Trend 1 and Trend 2, respectively, and Z is the test statistic.
Where Z is greater (less) than 1.645 (−1.645), the trend difference is statistically significant to at least 90 % (i.e.p < 0.1).In addition, two trends are classified as being significantly different if Trend 1 is significantly different to zero (p < 0.1) but Trend 2 is not (p > 0.1), and vice versa (i.e. the conclusion would be that Trend 1 is not zero but Trend 2 may be).
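The two significance rules can be combined in a few lines of Python. The sketch assumes the conventional form of the test statistic, Z = (Trend 1 − Trend 2) / sqrt(SE 1² + SE 2²); the function name and the optional p-value arguments are hypothetical.

```python
import math

def trends_differ(trend_1, se_1, trend_2, se_2, p_1=None, p_2=None,
                  z_crit=1.645, alpha=0.1):
    """Return (Z, significant) for the difference between two trends."""
    z = (trend_1 - trend_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    significant = abs(z) > z_crit
    if p_1 is not None and p_2 is not None:
        # Also flag the pair when exactly one trend differs from zero.
        significant = significant or ((p_1 < alpha) != (p_2 < alpha))
    return z, significant

z_large, sig_large = trends_differ(-2.0, 0.3, -1.0, 0.4)
z_small, sig_small = trends_differ(-1.0, 0.5, -0.8, 0.5)
z_flag, sig_flag = trends_differ(-1.0, 0.5, -0.8, 0.5, p_1=0.05, p_2=0.30)
```

The first pair differs significantly on the Z criterion alone. The second and third pairs have the same small Z value, but the third is flagged because only one of its two trends is significantly different from zero.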
A list of the top 100 largest agglomerations by population in the world is obtained from http://www.citypopulation.de/ (last access: 6 April 2023, valid at time of writing). Of these, 33 are situated in a coastal L3 grid box, according to the classification in Sect. 2.3. Time series of L3L, L3W, and L3O are extracted from each of these grid boxes for the analysis in Sect. 3.4.

Land-water contrast in MOPITT sensitivity
This section demonstrates the land-water sensitivity contrast in MOPITT retrievals on a global scale and examines the magnitude of the difference within coastal L3 grid boxes. The analysis is presented for levels throughout the vertical profile in addition to the surface level to give context as to how MOPITT retrieval sensitivity, as well as its land-water contrast, varies with height.

Global context
Figure 3 shows long-term mean maps for the retrieval sensitivity metrics AK diagonal value, AK row sum, and retrieved minus a priori VMR (VMR ret − apr) at selected profile levels, created from L3O data averaged across the entire study period (September 2001-February 2019, inclusive). All indicators show that retrieval sensitivity is greater over land than water at the surface, with sharp differences evident at almost all land-water boundaries. The same is true at the 900 and 800 hPa profile levels, although the land-water contrast clearly decreases in strength with height on average, and by 600 hPa retrieval sensitivity tends to be a little greater over water than land. Some strong land-water gradients remain present in VMR ret − apr fields at this level, most notably over North Africa, the Arabian Peninsula, and south-east China, but on average these values are much more similar in magnitude across land and water than they are closer to the surface. No clear land-water contrast is evident at 300 hPa (which represents the upper troposphere), with retrieval sensitivity instead varying more with latitude, decreasing towards both poles (a companion to Fig. 3 with an altered colour bar to better show spatial patterns in AK diagonal values and row sums at the higher profile levels considered here is provided in Sect. S2 in the Supplement).
AK diagonal values and row sums clearly show that retrieval sensitivity increases across both land and water with height. It is generally lowest at the surface level, with little information content in the retrieval over water (mean AK diagonal values and row sums over water are less than half what they are over land). However, there is high spatial variability over land, with clear sensitivity hotspots (e.g. parts of central Europe, eastern Asia, the eastern USA, and tropical western Africa), but also some areas where AK values are more comparable to those over water. The rate of sensitivity increase with height is greater over water than land, with AK values more than doubling over water between the surface and 800 hPa.
Spatial patterns in retrieved minus a priori VMRs are slightly more complex to interpret because they are influenced by both retrieval sensitivity and the accuracy of the a priori. For example, while VMR ret − apr values close to zero can indicate a retrieval that is heavily weighted by the a priori and therefore low retrieval sensitivity, they can also indicate that the true VMR is close to the a priori value. Despite this, retrieved minus a priori VMR values clearly reach more strongly positive or negative values over land than water at the surface, with the contrast becoming less pronounced with height. Furthermore, there are clear land-water change points, further demonstrating the impact of the land-water contrast in retrieval sensitivity.
An analysis of latitudinal and seasonal variability in the land-water surface level retrieval sensitivity contrast is provided in Sect. S3. Briefly, this shows a tendency for greater land-water retrieval sensitivity differences in the Northern Hemisphere than in the Southern Hemisphere when averaged across the year. The land-water AK row sum differences tend to vary least by season in the tropical regions (between 30° S and 30° N) and show the greatest contrast in the midlatitudes (30-60°) in the respective hemisphere's spring and summer months, with the smallest differences in the winter months. Overall, a land-water sensitivity contrast is evident irrespective of latitude or season.

Analysis of coastal L3 grid boxes
Scatterplots of the sensitivity metrics discussed above, for coastal L3 grid boxes only, are shown in Fig. 4. Specifically, these plots show the sensitivity of the L2 retrievals over land and water that are bounded by the 1° × 1° L3 grid boxes and used to create the L3O data, represented here by L3L and L3W. As noted in Sect. 2.5, the time series analysed in this section only contain days when L3L and L3W are both present and the L3O surface index is mixed (L3O M), for a given coastal grid box. This is to ensure that the true CO profiles are as similar as possible when directly comparing L3L and L3W for that grid box. The values that are plotted correspond to the long-term mean from these L3L and L3W time series.
The AK diagonal value and row sum plots clearly demonstrate the greater sensitivity over land (L3L) than over water (L3W) at the surface level (a point below the diagonal line on these panels indicates greater values in L3L) for most grid boxes, with the difference decreasing with height, as expected from the preceding analysis. Retrieved VMRs also deviate more from their a priori values in L3L than in L3W closer to the surface, with smaller land-water differences higher up in the retrieved profile. All mean values of these sensitivity metrics are significantly different (p < 0.005) apart from AK diagonal values at 300 hPa and retrieved minus a priori VMR at 300 hPa (p = 0.13 and 0.07, respectively). Sensitivity metrics are generally better correlated between land and water higher in the retrieved profile than at the surface.
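The Fig. 4 caption reports a two-tailed unequal-variance t test and Spearman's rank correlation for these land-water comparisons. The sketch below shows both statistics using only the Python standard library. It is illustrative only: the function names are hypothetical, and the p value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for the large samples (thousands of grid boxes) involved here.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_tailed_p(t):
    """Normal approximation to the two-tailed p value (assumes large samples)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
```

Here `a` and `b` would hold the long-term mean values of one metric (for example, the surface AK row sum) for L3L and L3W across coastal grid boxes.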
This analysis clearly shows how L2 retrievals that are averaged together to create the L3O data over coastal grid boxes have differing degrees of sensitivity, depending on the surface type that they were retrieved over, especially at the surface and lower profile levels. This is explicitly cautioned against in the MOPITT data user's guide (MOPITT Algorithm Development Team, 2018). The remainder of this paper focuses on the surface level of the retrieved profile, since this is where land-water discrepancies are greatest, and the cause of this sensitivity disparity is well established: differing thermal contrast conditions near to the surface over land and water and a lack of NIR radiances being used in the retrieval over water. Furthermore, surface level retrievals are of most interest for identifying potential air quality impacts for humans (e.g. Buchholz et al., 2022).

Differences in retrieved surface level VMRs and temporal trends and their relation to the land-water sensitivity contrast
In this section, retrieved surface level VMRs and their temporal trends in L3L and L3W are compared, and their differences are related to the established land-water sensitivity contrast. The effect that averaging together these retrievals has on the statistics and trends in the resulting L3O mixed (L3O M) data is then evaluated. As with Sect. 3.1.2, the time series analysed in this section only contain days when L3L and L3W are both present and the L3O surface index is mixed.

L3L vs. L3W
Retrieved VMR comparison between L3L and L3W
In addition to the clear land-water sensitivity contrast in coastal grid boxes at the surface, there are clear differences in the retrieved VMRs here (Figs. …).

Figure 4. … row sums (second column), absolute VMR retrieved minus a priori values (third column), and retrieved (fourth column) and a priori (fifth column) VMRs, for the following levels of the retrieved profile: surface (top row), 900 hPa (second row), 800 hPa (third row), 600 hPa (fourth row), and 300 hPa (bottom row). Note that, for ease of interpretation, the absolute retrieved minus a priori VMR values are plotted, i.e. ignoring whether the result is positive or negative; however, the results hold if using signed values, and a duplicate of Fig. 4 with signed retrieved minus a priori VMR values is included in Sect. S4 for reference. Values in boxes in the top-left corner of each panel correspond to mean values across all L3L and L3W grid boxes. These means are significantly different using a two-tailed t test (unequal variance) with p < 0.005 in all cases except ak_diagonal at 300 hPa where p = 0.13, vmr_ret_minus_apr at 300 hPa where p = 0.07, vmr_ret at 600 hPa where p = 0.30, and vmr_ret at 300 hPa where p = 0.11. No vmr_apr mean differences are significant. Values in the bottom-right corner of each panel correspond to Spearman's rank correlation coefficient (p < 0.005 in all cases).
Greater land-water sensitivity differences also tend to be associated with greater retrieved VMR differences. Figure 5b shows the distribution of retrieved surface level VMR differences (L3L − L3W) stratified by the corresponding surface level AK row sum difference. Larger retrieved VMR differences are clearly associated with greater AK row sum differences (some degree of spread in the results is expected, since the relationship also depends on the accuracy of the a priori, as outlined previously).
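The stratification behind Fig. 5b can be sketched as a simple binning step. This is a hedged illustration: the function name and the bin edges are hypothetical and are not the ones used for the figure.

```python
from collections import defaultdict
from statistics import mean


def stratify(ak_diff, vmr_diff, edges):
    """Mean absolute VMR difference per AK row sum difference bin.

    Bins are half-open intervals [edges[i], edges[i + 1]); values outside
    every bin are ignored.
    """
    bins = defaultdict(list)
    for ak, vmr in zip(ak_diff, vmr_diff):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= ak < hi:
                bins[(lo, hi)].append(abs(vmr))
                break
    return {b: mean(v) for b, v in sorted(bins.items())}
```

Each input list holds one value per coastal grid box: the L3L − L3W difference in surface AK row sum and the L3L − L3W difference in long-term mean retrieved VMR.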
Figure 5. (a) … Shown are the differences for all coastal grid boxes and for only those grid boxes where the difference is significant (p < 0.1), determined using a two-tailed t test. (b) Absolute mean VMR differences between L3L and L3W, stratified according to the corresponding AK row sum difference (L3L − L3W in both cases). Absolute retrieved VMR difference values are shown in (b) for clarity, since L3L − L3W can be either positive or negative depending on whether a priori VMRs used in the retrieval are greater or less than the true VMR being retrieved, which complicates the analysis; the corresponding plot with raw values (i.e. not discarding the ± sign) is included in the Supplement, and the same conclusions can be drawn based on that figure (Sect. S5). (c) Absolute differences in gradients detected using WLS regression analysis for L3W (black, mean values represented by filled squares) and L3O M (red, thicker lines, mean values represented by filled triangles), compared to L3L (L3L − L3* in both cases). Shown are differences for all coastal grid boxes where WLS analysis could be performed, for grid boxes where both trends compared are significantly different to zero (p < 0.1), and for grid boxes where the trend difference is significant (p < 0.1). (d) Absolute differences in gradients detected using WLS regression analysis between L3L and L3W, stratified according to the corresponding AK row sum difference (L3L − L3W in both cases). Shown are the differences for all coastal grid boxes where WLS could be performed (black, mean values represented by filled squares) and for only those grid boxes where the detected trend is significant (p < 0.1) in both L3L and L3W (red, thicker lines, mean values represented by filled triangles). For clarity in (c) and (d), differences between the absolute trend values (i.e. ignoring the ± sign of the trend) are presented, since this shows the degree of difference in the trend magnitude, irrespective of trend direction; a positive trend difference in this case signifies a stronger (faster) trend in L3L than in L3* (c) or L3W (d).

https://doi.org/10.5194/amt-16-1923-2023 Atmos. Meas. Tech., 16, 2023

Of the coastal grid boxes compared, 60 % show a significant difference (p < 0.1, determined using a two-tailed Student's t test) in mean VMRs in L3L and L3W (Fig. 5a). These grid boxes differ in several notable ways from those where the mean VMR difference is not significant (detailed in Table 2). As expected from the previous analysis, the land-water sensitivity contrast is greater when mean VMRs are significantly different (SIGDIFF L3L − L3W) than when not (NOT_SIGDIFF L3L − L3W). This is evident in AK row sum and VMR retrieved minus a priori differences (the magnitude of difference between subsets is around 50 % and 100 %, respectively). Interestingly, the AK difference is due to sensitivity being lower over water in SIGDIFF L3L − L3W than in NOT_SIGDIFF L3L − L3W; sensitivity over land is similar in both subsets. This may be explained as follows: when sensitivity over water is especially low, as is the case in SIGDIFF L3L − L3W, the retrieved VMR will be heavily weighted by the a priori and unable to match the variation present in the more sensitive retrieval over land. As sensitivity over water increases, this a priori weighting weakens and the retrieved VMR will more closely track the retrieval over land, resulting in a less significant difference. Also of note, a priori VMRs are much lower in SIGDIFF L3L − L3W than in NOT_SIGDIFF L3L − L3W, on average. Considered alongside the greater retrieved minus a priori differences, this suggests that the a priori VMR could be a less accurate estimate of the true VMR for the SIGDIFF L3L − L3W subset, whereas it is closer to reality for the NOT_SIGDIFF L3L − L3W subset. Intuitively, this makes sense: for a hypothetical situation where the a priori VMR is a perfect match for the true VMR and both are uniform across a coastal L3 grid box, retrievals over the land and water portions of the grid box would be expected to be identical irrespective of any differences in retrieval sensitivity over those surfaces. To summarise, assuming true VMRs are similar over land and water within coastal L3 grid boxes, differences in retrieved VMRs depend not only on the sensitivity of the retrieval, but also on the accuracy of a priori VMRs used in the retrievals.
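The hypothetical case described above can be written out with a deliberately simplified, single-level form of the retrieval relation. The scalar sensitivity $a_s$ is an assumption standing in for the full averaging kernel; the sketch ignores coupling between profile levels and any transformation of the state variable, and it assumes that the true VMR $x_t$ and the a priori VMR $x_a$ are uniform across the grid box.

```latex
\begin{align}
\hat{x}_{s} &= x_a + a_{s}\,(x_t - x_a), \qquad s \in \{\mathrm{land}, \mathrm{water}\},\\
\hat{x}_{\mathrm{land}} - \hat{x}_{\mathrm{water}} &= (a_{\mathrm{land}} - a_{\mathrm{water}})\,(x_t - x_a).
\end{align}
```

Under these assumptions the land-water difference in retrieved VMR vanishes either when the sensitivities are equal or when the a priori matches the truth, and it grows with both the sensitivity contrast and the a priori error.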
It should be noted that there are additional physical factors that could plausibly play a role in generating the observed L3L − L3W retrieved VMR difference, in addition to retrieval sensitivity. Given that most CO sources are land-based, a decrease in VMRs from land to water might be expected, especially near to the surface. However, this assumption only seems reasonable where large CO sources are proximal to the coastline, as it is unrealistic to expect gradients as large as are observed in background CO (which coastal grid boxes far from large CO sources are more likely to represent) across the relatively small distance covered by an L3 grid box. Given the relatively long-lived, well-mixed nature of atmospheric CO, VMRs retrieved at a given location are a function of both local emissions and transport, and the portion of coastal L3 grid boxes situated over water therefore does not represent pristine conditions in comparison to the adjacent land-based portion of the grid boxes. This is verified by comparing a priori VMRs (also shown in Fig. 4), which suggest the land-water difference in CO concentrations should be negligible (mean L3L − L3W a priori VMR difference of 0.69 ppbv, compared to a mean retrieved VMR difference of 10.29 ppbv). Indeed, in some specific cases (e.g. uninhabited coastal areas downwind of large trans-oceanic pollution sources), VMRs may be higher over the water portion of coastal grid boxes than the adjacent land portion (note that Fig. 4 does show that this is the case in some grid boxes). The above reasoning can also be applied to the question of whether the wind direction is responsible for creating the observed L3L − L3W difference in retrieved VMRs: it could be hypothesised that a prevailing onshore wind may lead to CO concentrations being higher over land than water, yet the negligible L3L − L3W a priori VMR difference, the fact that atmospheric CO is well-mixed, and the clear land-water sensitivity gradient that has been demonstrated suggest that wind direction does not play a big role in creating the land-water difference observed in retrieved VMRs. To further rule out the role of wind direction, the L3L − L3W retrieved VMR comparison has been analysed alongside wind direction for several case study grid boxes, and there appears to be no notable shift in wind direction whether L3L or L3W is greater for a given grid box. Results for this analysis are given in Sect. S6. The weight of evidence therefore points towards L3L − L3W retrieved VMR differences being a function of reduced retrieval sensitivity over water compared to land.

Trend comparison between L3L and L3W
We now compare temporal trends detected in surface level retrievals in L3L and L3W for coastal grid boxes, and we relate differences to the land-water sensitivity contrast outlined previously.
On average, across all grid boxes where WLS can be performed in both datasets following the criteria outlined in Sect. 2.5 (n = 2670), trends are stronger in L3L than L3W (Fig. 5c, black boxplots), with the range of differences around 2.5 ppbv yr⁻¹ (∼ −1 to 1.5 ppbv yr⁻¹). When the comparison is restricted to grid boxes where both trends are significantly different to zero (p < 0.1; 641 of the 2670 grid boxes, 24 %), a greater proportion of those grid boxes have a stronger trend in L3L than L3W (> 75 %), but the overall range of differences does not shift by much. The L3L − L3W trend difference is significant in 956 of the 2670 coastal grid boxes for which the analysis can be performed (36 %), with the range in differences spanning around 4 ppbv yr⁻¹. The trends are negative at 75 % of coastal grid boxes in both datasets, with this value increasing to 95 % when the trend in both L3L and L3W is significant. Descriptive statistics corresponding to the trend values compared are detailed in Table 3.
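For reference, a weighted least-squares (WLS) trend fit on yearly means can be sketched as follows. This is a generic textbook formulation, not the authors' implementation: the function name is hypothetical, and the choice of weights (for example, the inverse variance of each yearly mean) is an assumption, since the weighting actually used is defined in Sect. 2.5.

```python
from math import sqrt


def wls_trend(years, values, weights):
    """Fit values = intercept + slope * year by weighted least squares.

    Returns (slope, intercept, slope_standard_error). The standard error
    treats the weights as inverse variances known up to a scale factor,
    which is estimated from the weighted residuals.
    """
    sw = sum(weights)
    xbar = sum(w * x for w, x in zip(weights, years)) / sw
    ybar = sum(w * y for w, y in zip(weights, values)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, years))
    sxy = sum(w * (x - xbar) * (y - ybar)
              for w, x, y in zip(weights, years, values))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(weights, years, values))
    se = sqrt(rss / (len(years) - 2) / sxx)
    return slope, intercept, se
```

The returned slope is in VMR units per year (ppbv yr⁻¹ for the data discussed here), and the slope and its standard error are the quantities needed to compare two trends.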
To determine whether differences in trend can be linked to differences in retrieval sensitivity, L3L − L3W trends are stratified by L3L − L3W surface level AK row sum differences (Fig. 5d). As with mean VMR differences, the size of the trend difference tends to increase as the difference in AK row sums increases. In addition, as the magnitude of AK row sum difference increases in the positive direction (i.e. increasingly greater sensitivity over land), a greater proportion of trend differences are positive (i.e. a stronger trend over land). This pattern is even more pronounced when restricted to grid boxes where both trends are significant (also shown in Fig. 5d).
In summary, these results show a general tendency for trend underestimation in surface level retrievals over water compared to surface level retrievals over land in the same coastal grid boxes obtained at the same times, which appears to be linked to differences in retrieval sensitivity. The relationships found in these analyses are not perfect because trend differences are sensitive to several other factors, in addition to differences in retrieval sensitivity. For example, for a given sensitivity difference, the trend difference will be greater where true CO concentrations are changing rapidly than where the change is slow or negligible. Similarly, there should be zero trend difference if true CO concentration levels are stable over time, irrespective of the magnitude of difference in retrieval sensitivity. The accuracy of the a priori is a further complicating factor. An underlying assumption is also that the temporal trend in true VMRs should not vary much across a 1° × 1° L3 grid box. Hedelius et al. (2021) lend credence to this assumption with the finding that CO trends are similar within regions spanning a few thousand kilometres (L3 grid boxes are ∼ 100 km × 100 km) and that trends within urban areas are generally indistinguishable from the trend of the broader region encompassing the urban area.
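The dependence of the trend difference on both sensitivity and the true rate of change can be illustrated with a toy model. The sketch assumes a single-level retrieval of the form retrieved = a priori + sensitivity × (true − a priori), with an a priori that has no year-to-year trend; the sensitivity values and VMRs below are hypothetical numbers chosen only for illustration.

```python
def retrieved(true_vmr, apriori, sensitivity):
    """Toy single-level retrieval: a priori plus a damped departure from it."""
    return [apriori + sensitivity * (x - apriori) for x in true_vmr]


def slope(y):
    """Ordinary least-squares slope of y against time steps 0, 1, 2, ..."""
    n = len(y)
    tbar = (n - 1) / 2.0
    ybar = sum(y) / n
    num = sum((t - tbar) * (v - ybar) for t, v in enumerate(y))
    den = sum((t - tbar) ** 2 for t in range(n))
    return num / den


APRIORI = 100.0              # ppbv, constant from year to year (assumed)
A_LAND, A_WATER = 0.4, 0.1   # hypothetical surface sensitivities

declining = [120.0 - 2.0 * k for k in range(10)]  # true VMR falls 2 ppbv per year
flat = [120.0] * 10                               # true VMR stable

land_trend = slope(retrieved(declining, APRIORI, A_LAND))
water_trend = slope(retrieved(declining, APRIORI, A_WATER))
flat_land = slope(retrieved(flat, APRIORI, A_LAND))
flat_water = slope(retrieved(flat, APRIORI, A_WATER))
```

In this toy model each retrieved trend is the true trend scaled by the sensitivity, so the less sensitive retrieval underestimates the trend more strongly, and the land-water trend difference is zero when the true VMR is stable.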

Consequences for L3O data with a surface index of mixed (L3O M )
To recap, L3O data are given the surface index mixed (L3O M) when neither land nor water is the dominant surface type of the bounded L2 retrievals, for a given retrieval time. When this is the case, the retrievals over land and water are averaged together. Users of L3O data do not have the option of choosing to only analyse the subset of retrievals made over land (L3L) or water (L3W), as was done in the preceding analysis. To do so requires the original L2 retrievals. In this section, the L3O M retrievals are compared to the L3L retrievals that were analysed in the previous section. The aim here is to demonstrate how, for some L3 grid boxes, information on true VMRs and temporal trends that is available in the L2 retrievals over land (L3L) is effectively lost to users of L3O data by their averaging together with the less sensitive L2 retrievals over water (L3W).
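The averaging logic recapped above can be sketched as follows. This is a hedged illustration of the general idea only: the function name is hypothetical, and the `dominance` threshold of 0.75 is an assumption made for the example. The rule actually applied when the L3 product is generated is the one described in the data section of this paper and in the MOPITT documentation.

```python
from statistics import mean


def make_l3o(l2, dominance=0.75):
    """Combine bounded L2 retrievals into one L3O value for a grid box and day.

    l2 is a list of (surface, vmr) pairs with surface in
    {"land", "water", "mixed"}. If one of land or water exceeds the assumed
    dominance fraction, only those retrievals are averaged and the others are
    discarded; otherwise everything is averaged and labelled "mixed".
    """
    if not l2:
        return None
    for surface in ("land", "water"):
        subset = [v for s, v in l2 if s == surface]
        if len(subset) / len(l2) > dominance:
            return surface, mean(subset)
    return "mixed", mean(v for _, v in l2)
```

The sketch also makes the data-loss mechanism visible: when water dominates, any retrievals over land in the same grid box do not contribute to the L3O value at all.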

Retrieved VMRs in L3O M
For long-term mean VMRs, L3O M unsurprisingly represents a midpoint between L3L and L3W, with lower VMRs than L3L but a smaller difference range overall than L3W (Fig. 5a, red boxplots). The L3L − L3O M differences in long-term mean VMR are significant at 45 % (1791) of coastal grid boxes. All but three of these grid boxes also see a significant difference between long-term mean VMRs in L3L and L3W. This makes sense: retrievals in L3L would not be expected to differ significantly from those in L3O M if they did not also differ significantly from L3W. In total, 75 % of grid boxes that feature a significant difference between L3L and L3W also see a corresponding significant difference between L3L and L3O M. There are several notable differences between this subset of coastal grid boxes (BOTH VMRs) and those that see a significant difference between L3L and L3W but not between L3L and L3O M (L3L_L3W_ONLY VMRs), detailed in Table 4a.
- The grid boxes of BOTH VMRs see greater retrieved VMR differences between L3L and L3W than the grid box subset of L3L_L3W_ONLY VMRs (mean L3L − L3W difference of 13.84 vs. 8.67 ppbv). This is logical: L3O M only differs significantly from L3L if the underlying L3L − L3W difference is sufficiently large to persist through averaging.
- The grid boxes of BOTH VMRs also feature a greater land-water sensitivity contrast than those of L3L_L3W_ONLY VMRs. This is indicated both by L3L − L3W AK row sum differences, driven predominantly by decreased sensitivity over water in BOTH VMRs, and by L3L − L3W retrieved minus a priori VMR differences.

Trends in L3O M
Temporal trends detected in L3O M are now compared to those in L3L (Fig. 5c, red boxplots). Overall, a greater number of grid boxes feature a significant trend in both L3L and L3O M than in L3L and L3W (873 vs. 641; 33 % vs. 24 %), and fewer see a significant difference between trends (555 vs. 956; 21 % vs. 36 %). This is to be expected, given that the L2 retrievals contributing to L3L also contribute to L3O M. The trends in L3L and L3O M are significantly different in just under half (47 %) of the grid boxes where the trend is also significantly different between L3L and L3W (BOTH TRENDS; Table 4b). These grid boxes are clearly more water-dominated than the remaining 53 % of grid boxes where the trend difference between L3L and L3W is significant but the L3L − L3O M difference is not (L3L_L3W_ONLY TRENDS). This is indicated by a mean ratio(land / water) of 0.77 for BOTH TRENDS vs. 0.99 for L3L_L3W_ONLY TRENDS. Additionally, detected trends in the grid boxes of BOTH TRENDS are slightly stronger, with a greater difference between L3L and L3W, than for the L3L_L3W_ONLY TRENDS subset. Those L3 grid boxes featuring the strongest land-water trend difference are therefore most likely to also see a significant trend difference between L3L and L3O M. Again, this is logical. Unlike with the retrieved VMR comparison above, however, there are no clear differences in mean retrieved or a priori VMRs, or in sensitivity metrics, between these two grid box subsets (also detailed in Table 4b). However, it is not necessarily expected that there would be clear differences in these parameters for this analysis, since trend magnitudes themselves are also a variable (i.e. the trend in true CO varies across space, independently of retrieval sensitivity or CO concentration, complicating the relationships outlined above). Most of the grid boxes where the L3L and L3O M trends are significantly different also feature a significant difference between L3L and L3W (453 of 555; 82 %). There are no clear differences between these and the remaining 18 % of grid boxes that, counter-intuitively, feature a significant difference between trends in L3L and L3O M but not between trends in L3L and L3W. However, small discrepancies are to be expected for results based on statistical thresholds, especially where the variables being compared are subject to multiple different factors (e.g. land-water surface cover ratio in L3O M, land-water sensitivity contrast, retrieved VMR differences, differences in the true CO concentration being retrieved and its change over time).

Table 4. (a) Descriptive statistics corresponding to matched retrievals over land and water (L3L and L3W) where the long-term mean retrieved surface level VMR in L3L and L3W is significantly different (p < 0.1, n = 2379). Grid boxes are divided into two subsets depending on whether long-term mean VMRs in L3L and L3O M are significantly different (p < 0.1; BOTH VMRs) or not (p > 0.1; L3L_L3W_ONLY VMRs). The metric ratio(land / water) indicates the relative land vs. water surface coverage of an L3 grid box. A ratio(land / water) value > 1 (< 1) implies that more of the grid box surface is covered by land (water). (b) Descriptive statistics corresponding to matched retrievals over land and water (L3L and L3W) where the temporal trend detected using WLS regression analysis on yearly mean retrieved surface level VMR in L3L and L3W is significantly different (p < 0.1, n = 956). Grid boxes are divided into two subsets depending on whether the trend in L3L is significantly different to the corresponding trend detected in L3O M (p < 0.1; BOTH TRENDS) or not (p > 0.1; L3L_L3W_ONLY TRENDS). The metric ratio(land / water) indicates the relative land vs. water surface coverage of an L3 grid box. A ratio(land / water) value > 1 (< 1) implies that more of the grid box surface is covered by land (water).

Implications for users of L3O data
So far, this paper has shown a clear difference in retrieval sensitivity over land and water for coastal grid boxes, demonstrated how long-term VMR statistics and temporal trends calculated using these retrievals (L3L and L3W) differ, and outlined consequences of averaging these retrievals together to create L3O M. The full time series of available data in L3O is now compared with L3L and L3W, without the constraint that a retrieval needs to be present in both L3L and L3W for it to be included in the analysis. This replicates what a user of the L3O data would do, i.e. work with all available data. Users of MOPITT data are advised to restrict their analysis to retrievals performed over land. This poses a quandary for users of L3O: what should be done about days with a surface index of mixed? Therefore, the implications of choosing to include or discard these days are also considered. In the subsequent sections, the following subsets of the full L3O time series for each coastal grid box are analysed: the full L3O time series with no filtering by surface index (L3O NF), only days with a surface index of land (L3O L), and days when the surface index is land or mixed (L3O LM, i.e. only days with an L3O surface index of water are discarded).
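The three subsets amount to simple filters on the daily surface index. A minimal sketch is given below, assuming a hypothetical in-memory layout in which each day of an L3O grid-box time series is stored as a (surface index, VMR) pair; the function name and the dictionary keys are illustrative only.

```python
def l3o_subsets(days):
    """Split one grid box's L3O time series by surface index.

    days maps a date string to (surface_index, vmr), where surface_index is
    "land", "water", or "mixed".
    """
    return {
        "L3O_NF": dict(days),  # no filtering
        "L3O_L": {d: v for d, v in days.items() if v[0] == "land"},
        "L3O_LM": {d: v for d, v in days.items() if v[0] in ("land", "mixed")},
    }
```

By construction L3O_L is contained in L3O_LM, which is contained in L3O_NF, so the number of days with data can only stay the same or fall as the filter becomes stricter.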

Loss of available data
The guideline to only analyse retrievals performed over land results in a huge loss of data for coastal grid boxes when using the L3O dataset. This is quantified by comparing the total number of days with data for analysis at each coastal grid box in L3O L (n_days(L3O L)) and L3O NF (n_days(L3O NF)) (Fig. 6a). Strikingly, 35 % of coastal grid boxes (total coastal grid boxes: 4299) have zero days in L3O L, and 67 % have a surface classification of land less than 5 % of the time in L3O (yielding an n_days(L3O L / L3O NF) ratio of 0.05 or less in Fig. 6a). Importantly, retrievals over land are made on a large proportion of these filtered days, but they are either discarded altogether or averaged together with retrievals made over water to create L3O M. This point is demonstrated by comparison to the total number of days with data for analysis at coastal grid boxes in L3L (n_days(L3L)). In contrast to a mean (median) n_days(L3O L / L3O NF) ratio of 0.08 (0.01), a mean (median) n_days(L3L / L3O NF) ratio of 0.44 (0.40) demonstrates the stark loss of available data. This is further highlighted by the fact that over half (56 %) of coastal grid boxes have at least 25 times more days with retrievals made over land than are available for analysis in the L3O dataset if filtering guidelines are followed (as shown by the ratio n_days(L3L / L3O L) in Fig. 6b, black line). The situation can be improved for L3O users by keeping days when the L3O surface index is classified as mixed, in addition to land (L3O LM). Even in this best-case scenario, however, L3O LM sees fewer days with data than L3L for over 60 % of coastal grid boxes (ratio n_days(L3L / L3O LM) in Fig. 6b, orange line). Moreover, a large proportion of these L3O LM days have the surface index of mixed and therefore suffer from the averaging together of retrievals over land with retrievals over water, which, as has been shown, can significantly impact the results of analyses using these data. This point is returned to in following sections.
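The day-count ratios and the cumulative fractions quoted from Fig. 6 can be computed along the following lines. This is a small sketch with hypothetical function names; it assumes the number of days with data has already been counted per grid box for each dataset.

```python
def day_ratio(n_subset, n_reference):
    """Ratio of days with data; None when the reference dataset has no days."""
    return None if n_reference == 0 else n_subset / n_reference


def cumulative_fraction(ratios, threshold):
    """Fraction of grid boxes whose ratio is at or below the threshold.

    Grid boxes with an undefined ratio (None) are left out of the count.
    """
    valid = [r for r in ratios if r is not None]
    return sum(r <= threshold for r in valid) / len(valid)
```

Returning None for an empty reference dataset matters for ratios such as n_days(L3L / L3O L), because a substantial share of coastal grid boxes have zero days in L3O L.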
Intuitively, it is to be expected that the ratio n_days(L3L / L3O LM) should never be < 1. L2 retrievals over land obviously contribute to days when L3O is classified as land and should, by definition, also contribute to days when L3O is classified as mixed. In these cases, L3L will therefore also be present. However, there are two instances where L2 retrievals over land in fact do not contribute to an L3O retrieval classified as mixed. Firstly, L2 retrievals themselves also have a surface classification of mixed when the L2 retrieval does not predominantly overlie water or land. L3O can thus have a surface classification of mixed when created from bounded L2 retrievals that are retrieved over either only a mixed surface or a combination of mixed and water: in both cases, there are no L2 retrievals over land and therefore no L3L retrievals. Secondly, analyses performed for this paper identified numerous instances where L3O is classified as mixed, but the only contributing L2 retrievals are retrievals over water. In these instances, L3O therefore seems to be misclassified. On days when this is the case, there will be no corresponding L3L retrieval. This is documented further in Sect. S7. Attempting to quantify the extent of this misclassification influence is beyond the scope of this paper. In the vast majority of cases where a given grid box has an n_days(L3L / L3O LM) ratio < 1, the difference is negligible (i.e. 75 % of these grid boxes have a ratio between 0.9 and 1). Regardless, in terms of the number of days with retrievals available for analysis, L3L is an improvement over L3O LM for more grid boxes than it is not.

Scientific implications
Long-term mean (ltm) retrieved VMR values from the different L3O subsets are compared to L3L for all coastal grid boxes. As expected from the analyses in Sect. 3.2, all L3O subsets that have some influence from L2 retrievals over water have an ltm retrieved VMR that is below that in L3L, on average (Fig. 7a). Unsurprisingly, the closest match to L3L is L3O L (mean difference −3.1 ppbv), with the mean difference increasing for each L3O subset as the influence of retrievals over water increases (e.g. L3O LM differs less on average from L3L (mean difference of 5.2 ppbv) than L3O NF (mean difference of 9.1 ppbv), which additionally features days when L3O is solely created from L2 retrievals performed over water).
Note that ltm retrieved VMRs in L3O L and L3L are not a perfect match because L3O L is only a subset of L3L for each grid box considered in the analysis: L3L may be present on a day when L3O L is not owing to the way that the L3O data are created (i.e. classified based on the ratio of L2 retrievals over land and water, with retrievals over land potentially being discarded if these are not the majority). Apart from L3O L, fewer than 25 % of the coastal grid boxes have a retrieved ltm VMR that is greater in an L3O subset than in L3L. The range of ltm differences for each of these L3O subset comparisons to L3L exceeds 35 ppbv (excluding outliers), with over 25 % of coastal grid boxes compared having ltm differences exceeding 9 ppbv (as indicated by boxplot upper-quartile values).
The percentage of coastal grid boxes that feature a significant difference between ltm retrieved VMRs in L3L and each L3O subset (indicated in blue above each boxplot) is high: strikingly, it is found that, for the two subsets that L3O users could realistically choose to analyse if following data-filtering guidelines (L3O L or L3O LM ), almost a quarter (L3O L ) or almost half (L3O LM ) of coastal grid boxes see a significant difference to L3L.
The results of WLS regression analysis on yearly mean values from each dataset are now compared. As expected from the earlier analysis, trends are strongest, on average, in L3L and L3O L; this is especially so when the comparison is restricted only to trends that are significantly different from zero (p < 0.1) (Table 5). These datasets also have the largest measures of spread, indicating their tendency to yield stronger trends than the other L3O subsets (and L3W), and these measures lessen for each L3O subset as the influence of retrievals over water increases. As trends weaken with the increasing influence of retrievals over water in each L3O subset, overall retrieval sensitivity also decreases, as indicated by the mean averaging-kernel metrics shown in Table 5. Comparing the magnitude of trends at each coastal grid box, significant trends are stronger in L3L for at least 75 % of grid boxes for all comparison datasets apart from L3O L (Fig. 7b). L3O L sees stronger trends than L3L on average, but the comparison of these two datasets needs to be interpreted with caution because L3O L is a subset of L3L that features far fewer days with data, as discussed previously. As with the ltm retrieved VMRs discussed above, the percentage of coastal grid boxes that feature a significant dif-

Figure 6. Cumulative-frequency histograms comparing the number of days with data for different L3O subsets and L3L at coastal L3 grid boxes. A ratio < 1 (> 1) indicates the plotted dataset has fewer (more) days with data than the comparison dataset that is indicated on the x axis. (a) L3O L (dash-dotted green line), L3L (solid black line), and L3O LM (dashed orange line) are compared to the as-downloaded L3O dataset, without any filtering by surface index (L3O NF). Values in the legend correspond to the mean and median ratio for the indicated dataset, respectively. Note that, as a result of how coastal grid boxes are classified (outlined in Sect. 2.3), all n_days(L3O L / L3O NF) ratios are below 0.5 (i.e. at best, L3O has a surface classification of land on 50 % of days). (b) L3L is compared with L3O L (solid black line, bottom x axis) and L3O LM (dashed orange line, top x axis). Values in the legend correspond to the mean and median ratios, respectively.
Figure 7. Boxplots showing how mean VMRs and trends compare from selected L3O subsets and L3W to L3L. Values compared are calculated from all available data across the study period. Mean values are represented by filled squares, and values above the boxplots correspond to the number of grid boxes with data for that boxplot (black, top row), the mean value (black, second row), and the percentage of grid boxes represented in that boxplot that feature a significant difference with L3L (blue, third row). The comparison is calculated as L3L − L3* in both cases; therefore a point above (below) the black y = 0 line indicates that the value being compared is greater (lower) in L3L. (a) Mean VMR differences between L3L and the indicated L3O subset or L3W. Note that the n value is different for each boxplot because not all L3 subsets are present at every coastal grid box, as shown in Sect. 3.3.1. (b) Differences in gradients (absolute values) detected using WLS regression analysis between L3L and the indicated L3O subset or L3W. Shown are the differences for all coastal grid boxes where WLS could be performed for both datasets compared (black, mean values represented by filled black squares) and for only the sample of those grid boxes where the detected trend is significant (p < 0.1) in both (red, thicker lines, mean values represented by filled triangles).

Illustrative examples comparing L3O and L3L: analysis of the most populous coastal cities
In this section, time series from the 33 L3 coastal grid boxes that contain cities classified amongst the 100 most populous in the world (derivation outlined in Sect. 2.5) are analysed to illustrate the differences between mean values and trends obtained from the L3O and L3L datasets. The comparison is focused on L3O L and L3O LM, as these are the L3O subsets that data users would realistically choose to analyse if following the data-filtering guidelines. For clarity, from here these grid boxes are referred to by the name of the city that they contain. A detailed case study for the L3 grid box containing the city of Dubai is first presented, before considering results for all cities analysed.

Detailed case study: L3 grid box containing Dubai
Summary stats derived from the L3O subsets, L3L, and L3W (included for comparison), for the L3 grid box containing the city of Dubai, are given in Table 6. Figure 8 visualises the daily retrieved VMR time series from L3L, with L3O L overlaid for comparison purposes. Of a possible 1620 d with data in the unfiltered L3O dataset for this grid box, a mere 70 d (4 %) remain for analysis when following data-filtering guidelines to restrict analysis to retrievals performed over land only (the L3O L subset). By contrast, there are 1523 d available for analysis using the L3L dataset for this grid box (94 % of total days with retrievals in the L3O dataset). However, in L3O, on most days these retrievals over land are averaged together with retrievals over water to create L3O M, as evidenced by the L3O LM subset containing 1486 d with data for this grid box (92 % of total days in the L3O dataset). That L3L has a greater number of days with data than the L3O LM subset indicates that there are days in L3O with a surface index of water where L2 retrievals were present over land but were discarded because of the L3 creation process.
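The coverage figures quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using only the day counts given in the text; the dictionary keys and the helper name `pct_of_unfiltered` are labels chosen for this illustration.

```python
# Day counts quoted in the text for the L3 grid box containing Dubai.
n_days = {"L3O_NF": 1620, "L3O_L": 70, "L3L": 1523, "L3O_LM": 1486}


def pct_of_unfiltered(name):
    """Share of the unfiltered L3O days retained by a dataset, in percent."""
    return 100.0 * n_days[name] / n_days["L3O_NF"]


for name in ("L3O_L", "L3L", "L3O_LM"):
    print(f"{name}: {n_days[name]} d = {pct_of_unfiltered(name):.0f} % of L3O days")

# Ratio used in the coverage comparisons: n_days(L3O_L / L3L).
ratio_l = n_days["L3O_L"] / n_days["L3L"]
print(f"n_days(L3O_L / L3L) = {ratio_l:.3f}, so L3L has {1 / ratio_l:.1f} times more days")
```

The inverse of the last ratio is about 21.8, which is the "almost 22 times more days with data" figure used when summarising this case study.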
Long-term mean retrieved VMR is greatest in the land-only datasets L3O L and L3L. The value in L3O L is 10 ppbv greater than in L3L. Given that L3O L is a very small subset of L3L, this appears to be a large overestimate, when compared to L3L. Long-term mean retrieved VMR in L3O LM is 11 ppbv lower than in L3L. This is clearly a result of the inclusion of retrievals over water in this dataset, via L3O M, with long-term mean retrieved VMR in L3W being 17 ppbv lower than L3L. Both the L3L vs. L3O LM and L3L vs. L3W mean differences are significant (p < 0.1). Consistent with the results shown in Sect. 3.2.2 when identifying factors that determine whether the averaging of L2 retrievals over land and water to create L3O M can yield statistically significantly different retrievals to L3L, this L3 grid box is water-dominated, with a mean ratio(land / water) of 0.60. It is also notable that the standard deviation of long-term mean retrieved VMR in L3L (and L3O L) is roughly twice as large as that in L3O LM and L3W, which is to be expected given that retrievals over water are more greatly tied to their a priori than retrievals over land due to their comparatively low sensitivity (as discussed in Sect. 3.2.1).
The trends detected using WLS analysis following the method outlined in Sect. 2.5 are visualised in Fig. 9 (note that trend values are also given in Table 6 in both ppbv yr −1 and % yr −1), along with the yearly mean VMR values that were used in the regression. Detected trends are clearly strongest in the land-only datasets L3O L and L3L, with the L3O L trend being significantly stronger (p < 0.1) than the L3L trend, a difference equating to almost 1 % yr −1 (2.01 ppbv yr −1). Again, given the far superior temporal coverage of L3L, the L3L trend is the more reliable result. The trend in L3L is 0.65 % yr −1 (1.28 ppbv yr −1) stronger than in L3O LM, which corresponds to a difference of almost 12 % over the 18-year period of analysis. The trend in L3O LM is clearly weakened by inclusion of retrievals over water, with the trend in L3W being over 1 % yr −1 weaker than in L3L. Note that this trend analysis has been repeated using an alternative regression method which is less sensitive to outlying values (Theil-Sen slope estimator), and the results are unchanged. This is detailed further in Sect. S8.

Table 6. Summary stats from the L3O subsets compared, L3L, and L3W (for comparison), for the L3 grid box containing the city of Dubai. Note that across the whole study period (1 September 2001 to 31 December 2018), there are 5988 MOPITT files available. There are 1620 d with data in the L3O dataset (unfiltered by surface index), 27 % of the whole study period. The WLS trend in units of % yr −1 is calculated by dividing the trend in units of ppbv yr −1 by the respective long-term mean VMR value.
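To make the trend calculation concrete, the sketch below fits a weighted least-squares (WLS) line to yearly mean values and converts the slope from ppbv yr −1 to % yr −1 by dividing by the long-term mean VMR, as described for Table 6. It is an illustration under stated assumptions, not the procedure of Sect. 2.5: the function names, the synthetic yearly means, and the choice of weights (here an invented number of days with data per year) are all hypothetical.

```python
def wls_trend(years, ymeans, weights):
    """Weighted least-squares slope and intercept of ymeans against years."""
    sw = sum(weights)
    xbar = sum(w * x for w, x in zip(weights, years)) / sw
    ybar = sum(w * y for w, y in zip(weights, ymeans)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar)
              for w, x, y in zip(weights, years, ymeans))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, years))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def percent_per_year(slope_ppbv, ltm_vmr_ppbv):
    """Convert a trend in ppbv/yr to %/yr using the long-term mean VMR."""
    return 100.0 * slope_ppbv / ltm_vmr_ppbv


# Synthetic example (not MOPITT data): a perfectly linear decline of 2 ppbv/yr.
years = list(range(2002, 2019))
ymeans = [200.0 - 2.0 * (y - 2002) for y in years]
weights = [50 + 5 * (i % 4) for i in range(len(years))]  # assumed weighting

slope, intercept = wls_trend(years, ymeans, weights)
print(f"trend = {slope:.2f} ppbv/yr = {percent_per_year(slope, 200.0):.2f} %/yr")
```

Assuming simple linear accumulation, a trend difference of 0.65 % yr −1 sustained over 18 years amounts to 11.7 %, which is consistent with the "almost 12 %" quoted in the text.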
To summarise, if L3O users follow data-filtering guidelines and restrict analysis to retrievals only performed over land, there is a huge loss of data coverage in the L3O dataset for the coastal L3 grid box containing the city of Dubai. Choosing to work with L3O L despite this would lead to results that are clearly erroneous, when compared to L3L, which has far greater temporal coverage (almost 22 times more days with data than L3O L). L3O users could make the decision to include days with an L3 surface classification of mixed in their analysis to increase temporal coverage (the L3O LM dataset analysed here). However, doing so would yield both lower retrieved VMRs, on average, and significantly weaker decreasing trends than L3L. This is demonstrably due to the incorporation of retrievals over water into L3O LM (via L3O M), as shown by the comparison with L3W.
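The data loss summarised above follows from how a daily L3 value is built from the bounded L2 retrievals. The sketch below illustrates one way such a rule can behave; it is not the operational MOPITT code. The function name `classify_l3_cell`, the string flags, and in particular the 0.75 majority threshold are assumptions made for this example and are not stated in this section.

```python
def classify_l3_cell(l2_surface_flags, majority=0.75):
    """Return (L3 surface index, L2 flags that enter the L3 average).

    l2_surface_flags: list of "land" / "water" flags for the L2 retrievals
    bounded by one L3 grid box on one day.
    majority: assumed fraction above which only the dominant surface type
    is averaged (hypothetical value for illustration).
    """
    n = len(l2_surface_flags)
    if n == 0:
        return None, []
    for surface in ("land", "water"):
        if l2_surface_flags.count(surface) / n > majority:
            # Minority-surface retrievals are discarded from the average.
            return surface, [f for f in l2_surface_flags if f == surface]
    # No clear majority: land and water retrievals are averaged together.
    return "mixed", list(l2_surface_flags)


# A water-dominated day: the two land retrievals are dropped from L3O,
# although a land-only product (L3L) would still have data on this day.
print(classify_l3_cell(["land"] * 2 + ["water"] * 8)[0])
# A day without a clear majority: land and water are averaged (L3O M).
print(classify_l3_cell(["land"] * 4 + ["water"] * 6)[0])
```

Under a rule of this kind, filtering L3O for a surface index of land keeps only the days on which land retrievals dominate, which is why L3O L can contain far fewer days than L3L in a water-dominated grid box.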

Discussion of results for all cities analysed
The above analysis is repeated for all 33 cities. The number of days with data, long-term mean retrieved VMRs, and temporal trends are given in Table 7 for the L3 grid boxes containing these cities for each of the L3O subsets considered, L3L, and L3W (for comparison). These metrics are evaluated in turn below.

Temporal coverage
The loss of data in L3O if filtering for retrievals over land only (L3O L) is clear: 6 of the cities cannot be studied at all using L3O L (number of days with data is 0), and of the remaining 27 cities with data in this L3O subset, only a single city (Osaka) has more than 50 % of the days with data in L3L. The mean n_days(L3O L / L3L) ratio for these 27 cities is 0.18, i.e. on average, there are over 5 times more days with data in L3L than are available in L3O when filtering for retrievals over land only.
https://doi.org/10.5194/amt-16-1923-2023 Atmos. Meas. Tech., 16, 1923-1949, 2023

Table 7. Summary stats for the L3 grid boxes containing the 33 cities of interest from each of the L3O subsets considered, L3L, and L3W (for comparison). For each grid box and dataset, the following stats are shown: (1) ratio(land / water), which is an indicator of the relative land vs. water surface coverage of an L3 grid box; (2) the number of days with data across the whole study period; (3) the mean retrieved VMR (± the standard deviation), in parts per billion by volume (ppbv); and (4) the trend from WLS regression analysis (± the standard error), in ppbv yr −1. Dash symbols ("-") indicate that the stat cannot be calculated for a given grid box and dataset owing to lack of data. Bold text indicates that a dataset mean or trend value is significantly different to the value in L3L for that city (p < 0.1). Italicised text indicates that the trend value is not significantly different to zero (p < 0.1). Bold italics indicate that the trend value is not significantly different to zero and that it is significantly different to the trend in L3L for that city.

* The modified mean, shown in the bottom row of the table, corresponds to the mean value that is calculated only for cities where there is a corresponding stat in the L3O L dataset. For 1-3, this corresponds only to cities where the number of days with data in L3O L > 0 (n = 27). For 4, this corresponds only to cities where there are enough days with data for the regression analysis to be performed in L3O L (n = 18). By contrast, the mean value, shown in the penultimate table row, simply represents the mean of all values in that column.
Figure 9. Yearly mean ("ymean" in legend) retrieved VMR in the different datasets being investigated and the trend lines obtained from WLS regression analyses on each of these datasets ("wls_predictions" in legend). Black crosses and solid black lines correspond to L3L; filled green circles and dotted green lines correspond to L3O L; filled orange squares and dashed orange lines correspond to L3O LM; filled blue triangles and dash-dotted blue lines correspond to L3W. Trend values for each dataset are also given in Table 6.
L3O LM compares more favourably to L3L in terms of the number of days with data, due to the inclusion of days when the L3O surface index is mixed, with a mean n_days(L3O LM / L3L) ratio of 0.85. For 11 of the 33 cities, n_days(L3O LM) > n_days(L3L), although the difference is generally small. L3O M is the dominant component of L3O LM in all cases here, being the classification on 84 % of days, on average, across all 33 cities (max = 100 %, min = 45 %).

VMR comparison
The consequence of the loss of data in L3O L is clear: compared to L3L, mean VMR in L3O L is higher, and the magnitude of this difference generally depends upon how many data are lost in L3O L. Mean VMR across all cities (excluding the 6 cities where n_days(L3O L) = 0) is 17 ppbv higher in L3O L than in L3L. This falls to 10 ppbv if restricted to cities where the n_days(L3O L / L3L) ratio is greater than 0.05 (n = 17) and to 7 ppbv if restricted to cities where the n_days(L3O L / L3L) ratio is above 0.2 (n = 11). The mean VMR difference (L3L − L3O L) is significant (p < 0.1) for 11 of the 27 cities that can be compared; in these cases, L3O L is a smaller subset of L3L than for the cities where the mean VMR difference is not significant (n_days(L3O L / L3L) = 0.15 vs. 0.22, respectively), and the mean VMR difference is unsurprisingly much greater (−36 vs. −4 ppbv).
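The significance of a mean difference such as L3L − L3O L is assessed in this paper with a two-tailed t test assuming unequal variance (Welch's test). The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom from two samples using only the standard library. The function name `welch_t` and the two short input lists are hypothetical values chosen for illustration; they are not MOPITT retrievals.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Synthetic samples standing in for daily VMRs from two datasets.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(sample_a, sample_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The two-tailed p value then follows from the t distribution with `dof` degrees of freedom; where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p value directly.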
The L3L − L3O LM mean VMR difference is relatively small by comparison (4 ppbv, all 33 cities). However, this does hide some much larger discrepancies between L3L and L3O LM for certain cities, with the difference exceeding 10 ppbv in 11 cases and 20 ppbv in 3 of them. The difference is significant (p < 0.1; SIGDIFF L3L − L3OLM) for 13 of 33 cities (39 %). Compared to the subset where the L3L − L3O LM mean difference is not significant (n = 20, 61 %; NOT_SIGDIFF L3L − L3OLM), the following characteristic differences are found (also detailed in Table 8).
- The grid boxes in SIGDIFF L3L − L3OLM have a greater proportion of their surface covered by water than NOT_SIGDIFF L3L − L3OLM: this is evidenced by a mean ratio(land / water) of 0.51 in SIGDIFF L3L − L3OLM vs. 1.02 in NOT_SIGDIFF L3L − L3OLM, indicating there are more retrievals over water than land in the former, and also by the fact that on average, L3O L only contributes to L3O LM in SIGDIFF L3L − L3OLM on 9 % of days vs. 20 % of days for NOT_SIGDIFF L3L − L3OLM (which means that retrievals over water contribute via L3O M more frequently to L3O LM in SIGDIFF L3L − L3OLM than NOT_SIGDIFF L3L − L3OLM).
- The L3L − L3W VMR ret differences are larger in SIGDIFF L3L − L3OLM than NOT_SIGDIFF L3L − L3OLM (mean of 31.15 vs. 18.44 ppbv), meaning they are less likely to be hidden by averaging to create L3O M.
- Land-water mean averaging-kernel differences suggest there is not a large land-water sensitivity contrast between the SIGDIFF

Trend comparison
On average, the strongest trends are seen in L3O L. However, as with the Dubai case study, this often appears as an outlier compared to the other datasets, a consequence of its comparatively very sparse temporal coverage. As expected from previous sections, the weakest trends are detected in L3W, with L3O LM representing a midpoint between this and L3L.
Of the 18 cities where WLS analysis can be performed in L3O L, there are 9 where the resulting trend (and thus the conclusion drawn from the analysis) is significantly different to that in L3L. In 3 of these cases (Dubai, Wenzhou, Bangkok), the trend in L3O L can be judged to be a strong overestimate given the large difference to the corresponding trends in L3L (trend standard errors do not overlap) and the very small number of days with data that these trends are based on when compared to L3L (n_days(L3O L / L3L) ratio < 0.08 in each case). There are 4 additional cities where a significant trend in L3O L appears to be an overestimate when compared to L3L: Abidjan, Surat, Saigon, and Buenos Aires. This is because the trend for these cities in L3L is not significantly different to 0, which, given the higher number of days with data in L3L (n_days(L3O L / L3L) ratio = 0.44, 0.31, 0.49, and 0.28, respectively), appears to be the more reliable result. The L3O L trend for Miami is insignificant and derived from very low n. L3O L is also the only dataset to yield an insignificant trend for Qingdao.
As with mean VMRs, trends in L3O LM compare better than L3O L to L3L. However, there are still five cases where L3O LM and L3L yield significantly different results. For three of these (Hong Kong SAR, Istanbul, and Dubai, as covered in detail in Sect. 3.4.1), interpretation of the difference is simple: L3O LM is a significant underestimate of the CO change over time. This is very likely due to the inclusion of retrievals over water in this dataset, as evidenced by L3W yielding a significantly weaker trend than L3L in all three cases. In the remaining two cases, New York and Saigon, interpretation is more complicated. For both these cities, the trend detected in L3L is not significantly different from zero, whereas the trend in L3O LM is. Does this mean that the trend in L3O LM is an overestimate? Possibly. However, in both cases, the trends are within 1 standard error of each other and therefore within the range of sampling uncertainty. There are an additional two cities where WLS could be performed in L3L but not L3O LM (Dar Es Salaam and Taipei), but n_days(L3L) is so low (44 and 36, respectively) that these results are not deemed to be trustworthy.
As outlined in Sect. 2.5, it is important to note that the trends presented in this section are for illustrative purposes only, with the intention of demonstrating that different results can be obtained depending on whether L3O or L3L (and, by extension, L2) data are analysed. More focused analysis is needed to verify these trends, which is beyond the scope of this paper. The trend analysis has been repeated using an alternative regression method which is less sensitive to outlying values (Theil-Sen slope estimator), and the main results reported above stand. This is detailed further in Sect. S8.
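For readers unfamiliar with the Theil-Sen slope estimator mentioned above, the following minimal sketch shows why it is less sensitive to outlying values: the slope is the median of the slopes between all pairs of points, so a single anomalous year has little influence. The function name `theil_sen_slope` and the synthetic series are assumptions for illustration only and do not reproduce the analysis of Sect. S8.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Median of the pairwise slopes between all points with distinct x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return median(slopes)


# Synthetic yearly means (not MOPITT data): a decline of 2 ppbv/yr
# with one outlying year inserted.
years = list(range(2002, 2011))
vmr = [200.0 - 2.0 * k for k in range(len(years))]
vmr[4] = 260.0  # outlier

print(f"Theil-Sen slope = {theil_sen_slope(years, vmr):.2f} ppbv/yr")
```

In this synthetic case the estimator recovers the underlying slope of −2 ppbv yr −1 despite the outlier, because the pairs that involve the anomalous year are a minority of all pairs.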

Summary and conclusions
The aim of this paper was to compare surface level retrievals and their temporal trends in as-downloaded L3 data (L3O) with those that could be obtained if only the L2 retrievals performed over land are averaged to create the L3 product (L3L), for all coastal L3 MOPITT grid boxes around the globe (n = 4299). This work is motivated by a conflict between the recommendation that MOPITT data users restrict analyses to retrievals performed over land owing to known sensitivity issues over water (MOPITT Algorithm Development Team, 2018; Deeter et al., 2015) and the reality that L3O data are created from L2 retrievals performed over both land and water for coastal L3 grid boxes, limiting the ability of L3 data users to follow the recommendation in these cases. In short, this study has sought to answer the question, "does it matter?" Analysis has focused on comparing the original, as-downloaded L3 dataset (L3O) with new land-only and water-only L3 products (L3L and L3W, respectively) that have been created from the L2 retrievals. The main results are summarised below.
First, a direct comparison of the L2 retrievals performed over land (L3L) and water (L3W) that are averaged together to create L3 products on days when the L3 surface index is mixed (L3O M) identified the following:
- Retrieval information content is clearly greater in L3L than L3W. The corresponding mean L3L − L3W VMR difference is over 10 ppbv, significant (p < 0.1) at 60 % of the coastal grid boxes compared.
- Temporal trends are also stronger, on average, in L3L (mean difference is 0.28 ppbv yr −1 or 0.43 ppbv yr −1 if only considering trends significantly different to zero), with the L3L − L3W trend difference significant (p < 0.1) at 36 % of grid boxes where a trend comparison was possible.
- Larger L3L − L3W differences in mean VMRs and trends are clearly associated with greater differences in retrieval sensitivity.
- The resulting VMRs in L3O M are significantly different to L3L for 75 % of grid boxes where the L3L − L3W difference is also significant; this corresponds to 45 % of all coastal grid boxes compared. Whether or not L3O M and L3L differ significantly depends on multiple factors including the ratio of land / water surface cover in the grid box; the strength of the land-water sensitivity contrast and VMR difference; and, potentially, the accuracy of the a priori.
- Just under half of the grid boxes that featured a significant L3L − L3W trend difference also see trends differing significantly between L3L and L3O M. As with the mean VMR comparison, these grid boxes are more water-dominated than the subset in which the L3L − L3W trend difference is significant but the L3L − L3O M trend difference is not. They also feature stronger L3L − L3W trend differences overall, but no other variables (such as ltm VMRs and sensitivity metrics) show clear differences.
Having established the degree of difference in L3O M and L3L retrievals that is caused directly by averaging L3L with the less sensitive L3W, the full L3O dataset with differing surface filtering options was compared to L3L:
- If L3O is filtered so that only retrievals over land (L3O L) are analysed, as has been recommended (MOPITT Algorithm Development Team, 2018; Deeter et al., 2015), there is a huge loss of data, in terms of days with data to analyse. This is a direct result of L2 retrievals over land routinely being discarded during the L3O creation process or being averaged with L2 retrievals over water, creating L3O M (at least for coastal grid boxes). The problem can be alleviated by also retaining L3O M retrievals, but these additional days with data feature some influence from retrievals made over water that can affect results, as outlined. The resulting L3O LM subset still has fewer days with data than in L3L for 61 % of coastal grid boxes.
- Almost a quarter (half) of coastal grid boxes see a significant difference in ltm VMR between L3L and L3O L (L3O LM). Over a third (almost a quarter) of the trends in L3O L (L3O LM) are significantly different to L3L.
- Focusing on the L3 grid boxes containing the 33 largest coastal cities in the world, mean VMRs in L3O L and L3L differ significantly for 11 of the 27 grid boxes that can be compared (40 %; there are no L3O L data for the remaining 6 cities). The L3L − L3O LM mean VMR difference across all 33 grid boxes is relatively small (3.7 ppbv), but this does hide some much larger discrepancies, with the difference exceeding 10 ppbv for 11 of the 33 grid boxes and 20 ppbv for 3 of them. The difference is significant for 13 of 33 grid boxes (39 %). Of the 18 grid boxes where WLS analysis can be performed in L3O L, there are 9 cases where the trend is significantly different to that in L3L. The trends in L3O LM and L3L differ significantly for 5 of the 33 grid boxes.
From these results, it can be concluded that, yes, for at least a quarter of all MOPITT coastal L3 grid boxes, it does matter that there is limited capacity to filter out the influence of retrievals over water in L3 data, at least without a huge loss of temporal coverage. Demonstrably, there are significant differences, sometimes very large, in the mean VMRs and temporal trends that can be obtained using L3O and L3L. These differences could have tangible consequences, depending on the purpose for which the MOPITT data are being used. While acknowledging that this analysis has also shown that there is a sizeable proportion of coastal grid boxes where, statistically, mean VMRs and trends do not differ significantly between L3L and L3O, there is enough evidence to suggest that an additional L3 land-only product, created only from averaging bounded L2 retrievals performed over land (the L3L dataset that has been analysed in this paper), could be beneficial to the research community. This L3L dataset enables L3 users to maximise retrieval information

Figure 1. Example of a coastal L3 grid box (dashed black box) and bounded L2 retrievals from which the L3 products for that grid box are created. Purple (green) boxes correspond to L2 retrievals with a surface index of "water" ("land"). Note that only L2 retrievals with a midpoint that falls within the boundaries of the L3 grid box will be used in L3 creation for that grid box. These are indicated by solid purple/green outlines; those not included in L3 creation for this grid box are shown with dotted purple/green outlines. More information on surface indexing and L3 product creation is given in Sect. 2.2. "Coastal" L3 grid box classification is outlined in Sect. 2.3. The coastal L3 grid box visualised here contains the city of Dubai (∼ centre at 25.277° N, 55.296° E), which features in the case study analysis of Sect. 3.4. Faint background shading is from NASA Blue Marble imagery.

Figure 2. Maps showing the stages of derivation of the coastal L3 grid box mask applied in this paper to MOPITT data. (a) Frequency with which L3 grid boxes are given the surface index of mixed, calculated from daily data between 25 August 2001 and 28 February 2019. (b) Frequency with which L3 grid boxes that have a surface index of mixed at least once in panel (a) have the surface index of land, compared to the total number of days on which L3 data are available for that grid box (expressed as n_days(L3O L / L3O)). (c) As (b) but with a threshold of n_days(L3O L / L3O) < 0.5 applied. This is the coastal L3 grid box mask used in this paper.

Figure 3. Mean sensitivity metrics from MOPITT L3 data, averaged across the entire study period (September 2001-February 2019, inclusive).Shown are AK diagonal values (left column), AK row sums (centre column), and VMR retrieved minus a priori values (right column) for the following levels of the retrieved profile: surface (top row), 900 hPa (second row), 800 hPa (third row), 600 hPa (fourth row), and 300 hPa (bottom row).Values in white boxes correspond to mean values across all land (L) and water (W) L3 grid boxes.

Figure 4. Mean sensitivity metrics and VMRs (retrieved and a priori) from coastal L3 grid boxes. Values compared in the scatterplots are mean values from matched L3L and L3W retrievals within these grid boxes. Matched means that only days when both L3L and L3W are present and the L3O surface index is mixed are used to create the mean values analysed. Shown are AK diagonal values (left column), AK row sums (second column), absolute VMR retrieved minus a priori values (third column; note that for ease of interpretation, the absolute retrieved minus a priori VMR values are plotted, i.e. ignoring whether the result is positive or negative; however, the results hold if using signed values, and a duplicate of Fig. 4 with signed retrieved minus a priori VMR values is included in Sect. S4 for reference), and retrieved (fourth column) and a priori (fifth column) VMRs, for the following levels of the retrieved profile: surface (top row), 900 hPa (second row), 800 hPa (third row), 600 hPa (fourth row), and 300 hPa (bottom row). Values in boxes in the top-left corner of each panel correspond to mean values across all L3L and L3W grid boxes. These means are significantly different using a two-tailed t test (unequal variance) with p < 0.005 in all cases except ak_diagonal at 300 hPa where p = 0.13, vmr_ret_minus_apr at 300 hPa where p = 0.07, vmr_ret at 600 hPa where p = 0.30, and vmr_ret at 300 hPa where p = 0.11. No vmr_apr mean differences are significant. Values in the bottom-right corner of each panel correspond to Spearman's rank correlation coefficient (p < 0.005 in all cases).

Figure 5. Boxplots showing how mean VMRs and trends from WLS analysis compare for coastal L3 grid boxes, calculated from matched retrievals within these grid boxes. Matched means that only days when both L3L and L3W are present and the L3O surface index is mixed are used to create the mean values analysed. Mean values are represented by filled squares/triangles, and values above the boxplots correspond to the number of grid boxes with data for that boxplot and the mean value, respectively. (a) Mean VMR differences for L3W (black, mean values represented by filled squares) and L3O M (red, thicker lines, mean values represented by filled triangles) compared to L3L (L3L − L3* in both cases). Shown are the differences for all coastal grid boxes and for only those grid boxes where the difference is significant (p < 0.1), determined using a two-tailed t test. (b) Absolute mean VMR differences between L3L and L3W, stratified according to the corresponding AK row sum difference (L3L − L3W in both cases). Absolute retrieved VMR difference values are shown in (b) for clarity, since L3L − L3W can be either positive or negative depending on whether a priori VMRs used in the retrieval are greater or less than the true VMR being retrieved, which complicates the analysis; the corresponding plot with raw values (i.e. not discarding the ± sign) is included in the Supplement (Sect. S5), and the same conclusions can be drawn from it. (c) Absolute differences in gradients detected using WLS regression analysis for L3W (black, mean values represented by filled squares) and L3O M (red, thicker lines, mean values represented by filled triangles), compared to L3L (L3L − L3* in both cases). Shown are differences for all coastal grid boxes where WLS analysis could be performed, for grid boxes where both trends compared are significantly different to zero (p < 0.1), and for grid boxes where the trend difference is significant (p < 0.1). (d) Absolute differences in gradients detected using WLS regression analysis between L3L and L3W, stratified according to the corresponding AK row sum difference (L3L − L3W in both cases). Shown are the differences for all coastal grid boxes where WLS could be performed (black, mean values represented by filled squares) and for only those grid boxes where the detected trend is significant (p < 0.1) in both L3L and L3W (red, thicker lines, mean values represented by filled triangles). In (c) and (d), for clarity, differences between the absolute trend values (i.e. ignoring the ± sign of the trend) are presented, since this shows the degree of difference in the trend magnitude, irrespective of trend direction; a positive trend difference in this case signifies a stronger (faster) trend in L3L than in L3* (panel c) or L3W (panel d).

Figure 8. L3L (black crosses) and L3O L (green circles) time series for the entire study period.Note that the size of the plotted symbol required to visualise the whole time series artificially exaggerates the sense of temporal coverage; in reality, L3L is only present on 25 % of the days across the study period and L3O L on just 1 %.

Table 1. List of dataset short names used in the main article text and their corresponding full descriptive name.
measurements to date.With a native pixel resolution of ∼ 22 × 22 km at nadir and a swath width of ∼ 640 km, it offers near-global coverage roughly every 3 d, crossing the Equator at ∼ 10:30 and ∼ 22:30 local time.

Table 2. Mean values for selected variables from L3L and L3W for coastal L3 grid boxes, matched retrievals only. Matched means that only days when both L3L and L3W are present and the L3O surface index is mixed are used to create the mean values analysed. Mean values are calculated and presented separately according to the results of a two-tailed Student's t test (unequal variance) performed on mean retrieved VMR values in L3L and L3W (n = 3971). Mean L3L − L3W differences are also shown for each subset (L-W).

Table 3. Descriptive stats corresponding to the WLS trends detected in L3L, L3W, and L3O M that are compared in the boxplots of Fig. 5c. SD denotes standard deviation; IQR denotes interquartile range.

Table 5. Descriptive stats corresponding to the WLS trends detected in L3L, L3W, and selected L3O subsets. Also shown are mean averaging-kernel row sums and diagonal values corresponding to the retrievals from which trends are calculated. SD denotes standard deviation; IQR denotes interquartile range.