Total water vapour columns derived from Sentinel 5p using the AMC-DOAS method

Water vapour is the most abundant natural greenhouse gas in the Earth's atmosphere, and global data sets are required for meteorological applications and climate research. The TROPOspheric Monitoring Instrument (TROPOMI) onboard Sentinel-5 Precursor (S5P), launched on 13 October 2017, has a high spatial resolution of around 5 km and daily global coverage. Currently, there is no operational total water vapour product for S5P measurements. Here, we present first results of a new scientific total column water vapour (TCWV) product for S5P using the so-called Air Mass Corrected Differential Optical Absorption Spectroscopy (AMC-DOAS) scheme. This method analyses spectral data between 688 and 700 nm and has already been successfully applied to measurements from the Global Ozone Monitoring Experiment (GOME) on ERS-2, the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) on Envisat and GOME-2 on MetOp. The adaptation of the AMC-DOAS method to S5P data requires an additional post-processing procedure to correct for the influences of surface albedo, cloud height and cloud fraction. The quality of the new AMC-DOAS S5P water vapour product is assessed by comparisons with data from GOME-2 on MetOp-B, also retrieved with the AMC-DOAS algorithm, and with four independent data sets, namely reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF ERA5), data obtained by the Special Sensor Microwave Imager and Sounder (SSMIS) flown on the Defense Meteorological Satellite Program (DMSP) platform 16 and two scientific S5P TCWV products from MPIC and SRON [...]. The AMC-DOAS S5P TCWV and the S5P TCWV from MPIC agree on average within 1 kg m⁻² over both land and ocean. TCWV from SRON shows daily global averaged differences to AMC-DOAS S5P TCWV of around 1.2 kg m⁻². All of these differences are in line with the accuracy of these products and with the typical range of differences of 5 kg m⁻² obtained when comparing different TCWV data sets.
The AMC-DOAS TCWV product for S5P therefore provides a valuable new and independent data set for atmospheric applications, which also has a higher spatial coverage than the other S5P TCWV products.

[...] and INST 144/493-1. This study is of relevance for the Transregional Collaborative Research Centre TR 172 (AC)³, which has a focus on water vapour in the Arctic.

resolution. They yield TCWV for all weather conditions. However, their spatial coverage is quite poor due to the limited number of ground-based receivers.
Another important part of the global observing system for water vapour is formed by measurements from passive remote sounding sensors on polar-orbiting and geostationary platforms. These potentially provide global information about the atmosphere, with full global coverage every day or better, depending on the number of platforms flying simultaneously. This information can be used to fill the spatial and temporal gaps of the different ground-based measurements. A variety of methods to derive the total water vapour amount from space have been developed for various spectral regions.
One of the earliest TCWV data sets provided by satellites was derived from measurements in the microwave region by Nimbus 5 on NOAA (e.g. Staelin et al., 1976). In the same spectral region, the SSM/I instrument and its successor SSMIS on different platforms provide the longest TCWV time series, from 1987 up to now. With microwave sounders it is possible to retrieve water vapour under cloud-free and cloudy conditions, but the retrievals are usually restricted to water surfaces because the contributions of land surface emissions to the received signal are not well known (Schlüssel and Emery, 1990; Wentz, 1997). However, Melsheimer and Heygster (2008) extended the microwave retrieval to polar regions where ice and snow are present throughout the year.
TCWV retrievals are also possible in the thermal infrared spectral region, e.g. by the mathematical inversion of measurements from the Infrared Atmospheric Sounding Interferometer (IASI) (Schlüssel and Goldberg, 2002) or Landsat 8 (Ren et al., 2015).
In the near infrared, retrievals are performed at wavelengths around 900 nm, e.g. by the Medium Resolution Imaging Spectrometer (MERIS) (Bennartz and Fischer, 2001; Lindstrot et al., 2012) and its successor, the Ocean and Land Colour Instrument (OLCI) flown on Sentinel-3 (Preusker et al., 2021), or the Moderate Resolution Imaging Spectroradiometer (MODIS) (Sobrino et al., 2003; Diedrich et al., 2015). These methods are usually limited to highly reflective surfaces such as land, which excludes ocean areas with the exception of sun glint cases.
Another alternative is to employ measurements made in the visible spectral range to compute TCWV from satellites. Noël et al. (1999) introduced a modified DOAS (Differential Optical Absorption Spectroscopy) approach applied to GOME measurements. This approach was also used to retrieve TCWV from the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) (Noël et al., 2005a, b) as well as from GOME-2 (Noël et al., 2008) on the MetOp series. Wagner et al. (2003) described another approach to retrieve TCWV using the DOAS technique for GOME in the visible red spectrum.
Later, Wagner et al. (2013) described an approach to derive TCWV from GOME-2 and the Ozone Monitoring Instrument (OMI) using the spectra from 430 to 450 nm. An advantage of this method is the more homogeneous and higher surface albedo, especially over water. As a consequence, the backscattered signal is stronger, but the absorption strength of H₂O is generally weaker. Wang et al. (2014) used a similar approach to determine TCWV from OMI. They used a wider spectral range from 430 nm to 480 nm to include water vapour absorption at 470 nm.

In autumn 2017 the Sentinel-5 Precursor (S5P) satellite was launched. It carries the TROPOspheric Monitoring Instrument (TROPOMI), which provides an unprecedentedly high spatial resolution and temporal sampling.
Currently, no operational S5P total column water vapour product exists. Schneider et al. (2020) presented a method to derive the water vapour isotopes HDO and H₂O from S5P data in the short-wave infrared (SWIR). Most recently, Borger et al. (2020) retrieved TCWV from Sentinel-5P in the blue spectral range. This is similar to the approach described by Wagner et al. (2013). Fortunately, the spectral range around 700 nm, which is used in the AMC-DOAS retrieval, is also present in S5P spectra. Therefore, it is also possible to apply the AMC-DOAS method to TROPOMI data and thus extend the existing time series of AMC-DOAS TCWV.
In the current paper we present first results from the adaptation of the AMC-DOAS algorithm to this new instrument. The paper is structured as follows: Section 2 gives an overview of the instruments and data used. Section 3 explains the adaptation of AMC-DOAS to S5P measurements in detail. In particular, its dependence on albedo and cloud properties is evaluated and corrected. In section 4 the results of the retrieval and comparisons to other data sets are presented. Section 5 gives the summary and the conclusions.

Data
This section describes all external data sets used in this study, either for the generation of the new AMC-DOAS S5P data product (see section 3) or for the comparisons with other TCWV data (see section 4).

Sentinel 5P Level 1 data
Sentinel-5P (S5P) is part of the European Commission's Copernicus programme and was launched on 13 October 2017. It is a polar-orbiting satellite in low Earth orbit, observing the Earth's surface and atmosphere from an altitude of roughly 824 km. The satellite crosses the equator at 13:30 local time in an ascending node. TROPOMI onboard S5P is a nadir-viewing spectrometer with a wide spectral range covering the ultraviolet (UV) and visible spectral range (270 nm to 500 nm), the visible/near infrared (NIR) from 675 nm to 775 nm and the shortwave infrared (SWIR) region from 2305 nm to 2385 nm (Veefkind et al., 2012). For most of the spectral channels the spectral resolution is about 0.5 nm with a sampling of around 0.1 nm. The first UV band and the SWIR band have spectral resolutions of 1.0 nm and 0.25 nm, respectively.

The visible/near infrared bands are suitable for the retrieval of TCWV from S5P with the AMC-DOAS algorithm. In particular, radiances from Band 5 in the range from 661 to 725 nm are used in the present study. They are processed with the L0-1b data processor version 01.00.00. Irradiance data are taken from the corresponding S5P L1B data set made closest in time prior to the radiance measurement.
S5P's swath width of 2600 km yields almost full daily coverage even in tropical regions. Currently, the spatial resolution of the sensor is 5.5 × 3.5 km² (5.5 × 7.0 km² for the SWIR bands), so that, in contrast to the other satellite instruments mentioned in section 2.4 below, finer features in TCWV are resolved.

Between the launch of S5P on 13 October 2017 and the end of April 2018, all its sensors were tested and calibrated. During this commissioning phase, data sets were not provided regularly. After the switch to operational mode, however, the delivery of the radiances has been almost continuous.
For the comparison studies more than two years of daily data are used, covering the time span from May 2018 to December 2020.

GMTED2010
The U.S. Geological Survey provides the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) (Danielson and Gesch, 2011), which is used to obtain information on surface height and surface type at a very fine resolution of up to 7.5 arc-seconds. The data set used in this study is provided at a 0.025° × 0.025° spatial resolution and comprises surface type, surface elevation and surface roughness. For the AMC-DOAS product the closest match between the location of the S5P measurement and the GMTED2010 data product is chosen. Surface type is used to distinguish between land and sea. The surface height is needed to derive the surface height dependent TCWV product.

The S5P FRESCO product
The Fast Retrieval Scheme for Clouds from the Oxygen A band (FRESCO; Koelemeijer et al., 2001; Wang et al., 2008) is a method to derive cloud pressure or cloud height and cloud fraction. The method uses three different 1 nm wide spectral windows close to the oxygen A band near 760 nm with different absorption strengths.
In the 758-759 nm window no oxygen absorption occurs. The measured signal thus depends mainly on the cloud albedo, surface albedo and cloud fraction. Within the O₂ A band, at 760-761 nm with very strong oxygen absorption and at 765-766 nm with weaker oxygen absorption, the reflected sunlight additionally depends on cloud top pressure. The depth of the O₂ A band provides information on the height of the clouds. Together, the three wavelength windows provide the information necessary to retrieve cloud height and cloud fraction.
In this study we use the cloud information from the operational FRESCO product for S5P (Apituley et al., 2017) for filtering and post-processing (see section 3). It is provided in versions 1.002 to 1.04.

Water vapour data sets
The independent TCWV products used for comparison are briefly described in this section. An overview of the different correlative satellite TCWV data sets used in this study is shown in Tab. 1.

The first GOME-2 instrument on the MetOp series was launched on MetOp-A in October 2006 (Munro et al., 2016). It is an improved version of GOME on the second European Remote Sensing Satellite (ERS-2) (Munro et al., 2006). GOME-2 observes the atmosphere in a spectral range from 240 nm to 790 nm with a spectral resolution of 0.26 nm to 0.51 nm. By default its spatial resolution is 80 km across track times 40 km along track with a swath width of 1920 km. Since the launch of MetOp-B in September 2012 both satellites have flown in a tandem operation mode. The swath of GOME-2 on MetOp-A was then reduced to 960 km, resulting in an increase of spatial resolution by a factor of two across track at the cost of spatial coverage. MetOp-B has a sun-synchronous descending orbit with an equator crossing at 9:30 local time. Since November 2018 MetOp-C completes the MetOp series.
AMC-DOAS water vapour products are available for all three MetOp sensors (see e.g. Noël et al., 2008), but for the comparisons with S5P data described in the current study the GOME-2 instrument on MetOp-B (version 0.5.5a) has been selected because it provides the best overlap in spatial and temporal coverage. The estimated accuracy of the GOME-2 TCWV depends on cloudiness and TCWV amount and is typically better than 5 kg m⁻².
2008). For the comparison studies with S5P presented in the current paper, the dayside data from the SSMIS instrument on the DMSP F16 satellite are chosen, because it has an ascending orbit with an equator crossing time of 15:54, which fits best to the S5P observation time.
Its swath width is around 1700 km. SSMIS total water vapour data used here are provided as daily gridded data (0.25° resolution) by Remote Sensing Systems (Wentz et al., 2012). SSMIS data are only available over water surfaces for rain-free situations. The total water vapour product is processed with version v7 of the algorithm of Wentz (1997). The accuracy of the SSMIS TCWV is around 1 kg m⁻².

MPIC S5P TCWV
The Satellite Remote Sensing Group at the Max Planck Institute for Chemistry (MPIC) also provides a TCWV product from TROPOMI measurements, making use of the water vapour absorption in the blue spectral range (Borger et al., 2020). The retrieval consists of the common two-step DOAS approach: in the first step the spectral analysis is performed for a fit window from 430-450 nm within a linearised scheme. Then, the retrieved slant column densities are converted to vertical columns using an iterative scheme for the water vapour a priori profile shape, which is based on an empirical parameterisation of the water vapour scale height. In an extensive theoretical error estimation, the retrieval's TCWV uncertainty has been approximated to about 10-20 % for favourable and 20-50 % for unfavourable observation conditions. Furthermore, in the framework of a validation study based on daily and hourly measurements it was demonstrated that the MPIC S5P TCWV product is in very good agreement with reference data sets (e.g. SSMIS) for clear-sky scenarios over ocean as well as over land surfaces. For this study only measurements have been included for which the effective cloud fraction is between 0 and 0.2, the air mass factor is > 0.1, and the snow-ice flag indicates snow- and ice-free conditions. The accuracy of the TCWV product is around 25 % (2.8 kg m⁻²) for TCWV smaller than 20 kg m⁻² and around 15 % for TCWV larger than 20 kg m⁻².

SRON S5P TCWV
The Netherlands Institute for Space Research (SRON) provides a TCWV product that is restricted to clear-sky scenes over land and separates water vapour isotopes (H₂O/HDO). It is retrieved from the SWIR measurements of TROPOMI from 2354 to 2380.5 nm (Schneider et al., 2020). More details about the retrieval approach and settings can be found in e.g. Scheepmaker et al. (2016). The forward model used here ignores scattering, which makes strict filtering of clouds necessary. As cloud filter, data from the Visible Infrared Imaging Radiometer Suite (VIIRS) onboard Suomi National Polar-orbiting Partnership (Siddans, 2016) are used. The upper threshold for cloud cover is a cloud fraction of 1 %. An additional filter for aerosols is also applied.
Values at solar zenith angles larger than 75° are discarded. The albedo of water surfaces is too low to retrieve TCWV over oceans, so that TCWV is only used over land surfaces. In this study we use version 9_1 of this data set, which shows a bias to TCCON stations of (0.06 ± 0.9) kg m⁻² ((1.1 ± 7.2) %).

ECMWF ERA5 TCWV
The ERA5 reanalysis data set (Hersbach et al., 2020) from the European Centre for Medium-Range Weather Forecasts (ECMWF) provides atmospheric parameters such as temperature and humidity computed on 137 levels from surface height to 80 km. It is a model data set in which a large variety of observational data including satellite measurements (e.g. SSMIS), radiosondes and ground stations are assimilated. This product is available every hour. The data used here are on a 0.25° spatial grid. TCWV is derived by vertical integration of the profile data.

AMC-DOAS Approach
The approach known as Differential Optical Absorption Spectroscopy was first used to describe active remote sensing measurements having long tropospheric optical paths (Perner and Platt, 1979). Variants of the DOAS technique were proposed and have been successfully applied from space (see e.g. Burrows et al., 1999, and references therein) to derive the amount of trace gases in the atmosphere. The method uses the Beer-Lambert law, which describes the attenuation of light due to gas absorption along a light path. The amount of a trace gas along this light path is the slant column density. The slant column density is converted into a total vertical column via a so-called air mass factor. This air mass factor is usually derived from radiative transfer calculations taking the solar geometry and scattering processes in the atmosphere into account.

Here, a is the so-called air mass correction factor, which accounts for differences between the real atmospheric conditions and light path and those assumed in the radiative transfer calculations. C_v is the total vertical column of water vapour, which is derived together with a and P by a non-linear fit.
This method is applied to the measured I_λ and I_0,λ in the spectral range of 688-700 nm. This range has been selected because absorption lines of oxygen and water vapour are both present there and of similar strength. This is important because the underlying assumption of AMC-DOAS is that the same correction factor a can be applied to both oxygen and water vapour. This will be explained in more detail in the following.
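The role of the fit parameters a, C_v and P can be illustrated with a small forward-model sketch. It assumes that the AMC-DOAS equation has the form used in the earlier AMC-DOAS literature, ln(I_λ / I_0,λ) = P_λ − a (τ_O2,λ + b_λ C_v^(c_λ)), where P is a low-order polynomial and τ_O2, b and c are the precomputed spectral quantities mentioned in the next section. The function name, the reference wavelength used to centre the polynomial and all numerical values are hypothetical.

```python
def amc_doas_log_ratio(wavelengths, poly_coeffs, a, tau_o2, b, c, c_v,
                       ref_wavelength=694.0):
    """Model ln(I/I0) per wavelength, ASSUMING the equation form
    ln(I/I0) = P - a * (tau_O2 + b * C_v**c).

    poly_coeffs : coefficients of the broadband polynomial P (constant first)
    a           : air mass correction factor (fit parameter)
    tau_o2,b,c  : precomputed spectral quantities, one value per wavelength
    c_v         : water vapour vertical column (fit parameter)
    """
    modelled = []
    for lam, tau, b_lam, c_lam in zip(wavelengths, tau_o2, b, c):
        x = lam - ref_wavelength  # centre the polynomial (illustrative choice)
        p = sum(coef * x**j for j, coef in enumerate(poly_coeffs))
        modelled.append(p - a * (tau + b_lam * c_v**c_lam))
    return modelled


# Purely illustrative numbers, not taken from any radiative transfer database.
spectrum = amc_doas_log_ratio(
    wavelengths=[688.0, 694.0, 700.0],
    poly_coeffs=[0.1],
    a=0.9,
    tau_o2=[0.0, 0.2, 0.0],
    b=[0.01, 0.02, 0.0],
    c=[1.0, 0.5, 1.0],
    c_v=4.0,
)
```

In a retrieval, a non-linear least squares routine would adjust a, C_v and the polynomial coefficients until the modelled values match the measured ln(I_λ / I_0,λ). Because a multiplies both the oxygen and the water vapour term, the known oxygen absorption constrains the correction that is applied to water vapour.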

Adaptation and optimization of AMC-DOAS to Sentinel-5P observations
For the application to S5P reflectance (radiance divided by irradiance) observations, the AMC-DOAS method was adapted in the following way. The radiative transfer model SCIATRAN v3.8 (Rozanov et al., 2014) in combination with the HITRAN 2012 (Rothman et al., 2013) spectral absorption database is used to compute the quantities c, b and τ_O2. As the reference H₂O vertical profile a tropical atmosphere with a TCWV of 41.8 kg m⁻² is used (from the LOWTRAN data base). The spectra are then convolved with the across-track ground pixel dependent instrument spectral response functions (ISRFs) (van Hees et al., 2018) of S5P. Their full width at half maximum varies in the range of 0.34 nm ± 0.002 nm. The spectral quantities are calculated for a reference surface albedo of 0.02, which we assume to be the surface albedo of a water surface in the selected spectral range. The surface height is also accounted for. As the surface height reference, the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010; Danielson and Gesch, 2011) are used. The radiative transfer database is then calculated for every ground pixel and various surface heights from 0 to 9 km. Note that this added dependence on the surface height also changes the definition of the AMC-DOAS water vapour product. The S5P TCWV is defined as the total column above the surface, whereas in previous AMC-DOAS products it was defined as the total column above sea level. This has the advantage that TCWV values over mountain ranges are valid data points.
In previous applications to the GOME-like instruments, the scaling factor a was also used as an inherent quality check (Noël et al., 1999). If the correction is too large (which is mainly due to clouds), the retrieval results are discarded. The corresponding minimum air mass correction factor of 0.8 is also used as a filter criterion for the S5P data. However, it turns out that for S5P this filter is not effective enough. Too many (especially cloudy) data remain. In general, we derive higher air mass correction factors for S5P than for the other instruments. We attribute this mainly to the different equator crossing times (morning vs. noon) in combination with the higher spatial resolution and wider swath of S5P. Thus additional filtering is needed to remove unphysical results.
The largest source of error in the AMC-DOAS TCWV product is associated with partially cloud-filled ground scenes.
The larger the fraction of cloud within a ground scene and the higher the cloud, the lower the effective sampling of the troposphere. We therefore apply an additional cloud filter, which is based on cloud fraction and cloud height provided by the operational S5P FRESCO cloud product. A pixel is considered as cloudy if the cloud fraction is larger than 0.2. In addition, measurements with cloud heights above the surface of more than 2.0 km are also discarded.
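The filter criteria described above can be summarised in a short sketch. The thresholds are those quoted in the text; the `Pixel` container, the field names and the treatment of values exactly at a threshold are assumptions made for illustration.

```python
from dataclasses import dataclass

MIN_AMC_FACTOR = 0.8        # minimum air mass correction factor a
MAX_CLOUD_FRACTION = 0.2    # FRESCO cloud fraction above this counts as cloudy
MAX_CLOUD_HEIGHT_KM = 2.0   # cloud height above the surface


@dataclass
class Pixel:
    """Hypothetical container for the quantities needed by the filter."""
    amc_factor: float
    cloud_fraction: float
    cloud_height_km: float


def passes_filters(pixel: Pixel) -> bool:
    """Return True if the retrieval for this ground pixel is kept."""
    if pixel.amc_factor < MIN_AMC_FACTOR:
        return False  # correction too large, mostly caused by clouds
    if pixel.cloud_fraction > MAX_CLOUD_FRACTION:
        return False  # pixel considered cloudy
    if pixel.cloud_height_km > MAX_CLOUD_HEIGHT_KM:
        return False  # clouds too high above the surface
    return True


pixels = [
    Pixel(amc_factor=0.95, cloud_fraction=0.05, cloud_height_km=0.8),
    Pixel(amc_factor=0.70, cloud_fraction=0.05, cloud_height_km=0.8),
    Pixel(amc_factor=0.95, cloud_fraction=0.35, cloud_height_km=0.8),
    Pixel(amc_factor=0.95, cloud_fraction=0.10, cloud_height_km=3.5),
]
kept = [p for p in pixels if passes_filters(p)]
```

Only the first of the four example pixels survives: the others fail the air mass correction factor, cloud fraction and cloud height criteria, respectively.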
An example of a spectrum measured by S5P and the corresponding fitted spectrum from the retrieval can be seen in Fig. 1a for a scene over the Pacific with a very small cloud fraction. In this example the retrieved TCWV is 16.0 kg m⁻² with a retrieval error of 0.39 kg m⁻². The residual, which is given as a relative quantity (measurement minus fit divided by measurement, see Fig. 1b), is not larger than roughly 0.3 % in this example. The root mean square of the absolute residual (measurement minus fit) is 0.07, i.e. small. This shows that the measured and fitted spectra match very well.

Postprocessing
With the AMC-DOAS method one day of S5P measurements (23 February 2020) has been processed and filtered according to the procedure described above. The resulting S5P TCWV product shown in Fig. 2a represents all expected spatial features.
Within the Intertropical Convergence Zone (ITCZ) the values are largest. Towards polar regions the TCWV decreases.

Details on the quality of the AMC-DOAS S5P TCWV are revealed by the deviation from the collocated ERA5 TCWV (Fig. 2b), which shows several issues. For the global average, there is only a small difference of 0.05 kg m⁻² between both data sets. Over ocean, repeating patterns of the differences are visible. These patterns are more pronounced over regions with higher TCWV. Over land, systematic positive deviations can be observed over regions with higher surface albedo, such as the Sahara and Australia. These regions typically have a higher surface albedo than the reference used for the AMC-DOAS radiative transfer data base. This implies that surface albedo influences on the retrieved AMC-DOAS TCWV need to be considered. Remnant clouds will also affect the retrieval. Thus an additional correction scheme has been introduced to reduce systematic effects due to variations of surface albedo and clouds. This is described in the following subsections.
To investigate the dependence of AMC-DOAS TCWV on surface albedo and cloud properties, radiances I_clear and I_cloud are simulated with SCIATRAN for the clear-sky case and the fully cloudy case, respectively. Please note that in this manuscript the term albedo is used to describe the spectral reflectance of surface and clouds. This assumes a Lambertian surface where the total reflected radiation is homogeneously distributed over a hemisphere, i.e. 2π steradians. For the small spectral window used in the retrieval, this is considered a reasonable approach and the spectral dependence of the surface albedo is ignored. For the clear-sky case, surface albedo, surface height and solar zenith angle are varied. For the cloudy case we also consider dependencies on cloud height and cloud fraction.
A cloud is considered in the simulations as a reflecting layer with an albedo of 0.8 which is located at a given height. This follows the definition of the S5P FRESCO product. The simulated cloud-free and cloudy radiances are then mixed by the cloud fraction CF according to the independent pixel approximation:

$$I_{mixed} = CF \cdot I_{cloud} + (1 - CF) \cdot I_{clear}$$

The spectrum I_mixed is then used to retrieve the AMC-DOAS TCWV. The ratio of the reference ('true') TCWV C_v,ref to the retrieved AMC-DOAS TCWV C_v,retr may then be used as a multiplicative correction factor c_ac:

$$c_{ac} = \frac{C_{v,ref}}{C_{v,retr}}$$
To prevent the correction factor from dominating the retrieval results, an additional filter is applied to exclude situations where the correction is too large. Thus the overall correction factor is restricted to values between 0.6 and 1.2.
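The construction of the albedo and cloud correction factor can be sketched as follows, as an illustration only: radiances are mixed by cloud fraction in the independent pixel approximation, the correction factor is the ratio of reference to retrieved TCWV, and factors outside the range 0.6 to 1.2 are rejected. The function names and the numerical values are hypothetical, and in practice the retrieved TCWV would come from running the AMC-DOAS fit on the mixed spectrum.

```python
def mix_radiance(i_clear, i_cloud, cloud_fraction):
    """Independent pixel approximation: weight cloudy and clear-sky
    radiances (one value per wavelength) by the cloud fraction."""
    return [cloud_fraction * cloudy + (1.0 - cloud_fraction) * clear
            for clear, cloudy in zip(i_clear, i_cloud)]


def correction_factor(tcwv_reference, tcwv_retrieved):
    """Multiplicative correction c_ac = reference ('true') / retrieved TCWV."""
    return tcwv_reference / tcwv_retrieved


def correction_is_accepted(c_ac, lower=0.6, upper=1.2):
    """Reject cases in which the correction would dominate the result.
    Treating the bounds as inclusive is an assumption."""
    return lower <= c_ac <= upper


# Illustrative radiances for two wavelengths and a cloud fraction of 0.25.
i_mixed = mix_radiance(i_clear=[1.0, 2.0], i_cloud=[3.0, 4.0],
                       cloud_fraction=0.25)

# Illustrative retrieval result for a reference column of 41.8 kg m^-2.
c_ac = correction_factor(tcwv_reference=41.8, tcwv_retrieved=38.0)
accepted = correction_is_accepted(c_ac)
```

A retrieved column of half the reference value would give a factor of 2.0 and would therefore be excluded by the range check.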
The final albedo and cloud correction factor depends on geometrical information (solar zenith angle and across-track ground pixel, taken from the S5P measurements), surface elevation (from GMTED2010), cloud fraction and cloud height (from the S5P FRESCO product) and surface albedo.
As the surface albedo is highly variable, we do not use a climatology but determine it directly from the S5P reflectance measurements from 684 nm to 686 nm. This spectral region is close to the retrieval window of 688-700 nm, but contains no major atmospheric trace gas absorption. To relate the reflectance to the surface albedo, radiances and irradiances are simulated from 684 nm to 686 nm with varying surface albedo, solar zenith angle, surface height, cloud fraction and cloud height. To smooth out fluctuations, the average reflectance over this 2 nm window is calculated. This results in a database from which a surface albedo can be derived via interpolation for each (measured) average reflectance, geometry and set of cloud properties.
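A minimal sketch of this albedo lookup is given below. It is reduced to one dimension: for a fixed geometry, surface height and set of cloud properties, a table of simulated mean reflectances versus surface albedo is inverted by linear interpolation. The table values, the function names and the choice of linear interpolation are assumptions, and the real database has additional dimensions.

```python
def mean_reflectance(radiances, irradiances):
    """Average reflectance (radiance divided by irradiance) over the
    spectral points of the 684-686 nm window."""
    ratios = [rad / irr for rad, irr in zip(radiances, irradiances)]
    return sum(ratios) / len(ratios)


def albedo_from_reflectance(reflectance, table):
    """Invert a simulated table by linear interpolation.

    table : list of (surface albedo, simulated mean reflectance) pairs,
            sorted so that the reflectance is strictly increasing.
    Returns None if the reflectance lies outside the tabulated range.
    """
    for (alb0, refl0), (alb1, refl1) in zip(table, table[1:]):
        if refl0 <= reflectance <= refl1:
            weight = (reflectance - refl0) / (refl1 - refl0)
            return alb0 + weight * (alb1 - alb0)
    return None


# Hypothetical table for one geometry / cloud configuration.
simulated = [(0.0, 0.01), (0.5, 0.11), (1.0, 0.21)]

measured = mean_reflectance(radiances=[0.5, 0.7], irradiances=[10.0, 10.0])
albedo = albedo_from_reflectance(measured, simulated)
```

With these illustrative numbers the measured mean reflectance is 0.06, halfway between the first two table entries, so the interpolated albedo is 0.25.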
The resulting clear-sky albedo and cloud correction is then applied as a multiplicative factor (c_ac) to obtain the corrected TCWV:

$$C_{v,ac} = C_{v,uc} \, c_{ac} \qquad (4)$$

where C_v,uc is the uncorrected TCWV. Note that due to this correction the TCWV product is independent of the surface albedo chosen as reference for the basic AMC-DOAS retrieval.
The results of this correction when applied to the uncorrected data from 23 February 2020 are shown in Fig. 2c and their differences to the values in ERA5 in Fig. 2d. The global mean deviation increases slightly, but the variability denoted by the standard deviation SD is lower compared to the uncorrected product. Over land the application of the correction factors reduces the differences over the deserts. However, over ocean there are still some patterns visible.
To quantify these patterns, for each swath over water surfaces the relative difference ΔC_v,ac of the retrieved TCWV at each across-track ground pixel i to the nadir value (across-track ground pixel i_nadir = 223) is computed:

$$\Delta C_{v,ac}(i) = \frac{C_{v,ac}(i) - C_{v,ac}(i_{nadir})}{C_{v,ac}(i_{nadir})}$$

All S5P orbits in February 2020 are used for this to obtain good statistics. For every across-track ground pixel with a valid TCWV measurement, ΔC_v,ac(i) is calculated and counted in bins of 0.05. This results in a histogram of ΔC_v,ac as a function of across-track ground pixel number, which is shown in Fig. 4. As can be seen, there is a systematic across-track ground pixel dependence of ΔC_v,ac. In the western part of the swath (across-track ground pixel numbers smaller than i_nadir) the relative differences are positive, whereas the eastern part (across-track ground pixel numbers larger than i_nadir) shows more pronounced negative differences.
For the correction, the histogram maximum (i.e. the most frequent value) of ΔC_v,ac at each across-track ground pixel is then used to fit a polynomial P_emp of third degree (orange line in Fig. 4):

$$P_{emp}(k) = \sum_{j=0}^{3} a_j \, k^j$$

where k = i − i_nadir is the shifted across-track ground pixel number and a_j are the derived polynomial coefficients, namely a_0 = 0.0, a_1 = −1.099 · 10⁻³, a_2 = −1.13 · 10⁻⁶ and a_3 = 1.075 · 10⁻⁸.
From this polynomial, a multiplicative correction factor c_emp is defined for every across-track ground pixel. This correction is then applied in addition to the cloud and albedo correction. The results are shown in Fig. 2e and f. The spatial patterns over ocean are removed, and both the mean deviation to ERA5 and the scatter of the data are reduced.
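The empirical across-track correction can be illustrated with the polynomial coefficients quoted above. The form c_emp = 1 / (1 + P_emp(k)) used here is an assumption made for this sketch: it is the factor that removes a relative offset P_emp with respect to the nadir pixel. The function names are likewise hypothetical.

```python
I_NADIR = 223  # nadir across-track ground pixel
COEFFS = (0.0, -1.099e-3, -1.13e-6, 1.075e-8)  # a_0 ... a_3 from the text


def p_emp(pixel):
    """Third-degree polynomial describing the relative deviation from the
    nadir value as a function of the shifted pixel number k."""
    k = pixel - I_NADIR
    return sum(a_j * k**j for j, a_j in enumerate(COEFFS))


def c_emp(pixel):
    """ASSUMED form of the multiplicative correction: divide out the
    relative across-track offset."""
    return 1.0 / (1.0 + p_emp(pixel))


def corrected_tcwv(tcwv_uncorrected, c_ac, pixel):
    """Apply the albedo/cloud correction and the across-track correction."""
    return tcwv_uncorrected * c_ac * c_emp(pixel)


west = p_emp(I_NADIR - 100)   # positive offset in the western swath part
east = p_emp(I_NADIR + 100)   # negative offset in the eastern swath part
```

For pixels 100 positions west and east of nadir the polynomial gives about +0.088 and −0.110, respectively, which matches the described behaviour of positive differences in the west and more pronounced negative differences in the east.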
Note that none of the applied corrections significantly changes the main water vapour patterns (Fig. 2a, c, e), but they generally result in a smoother spatial distribution. From the corrected data, a daily gridded data product with a spatial resolution of 0.25° × 0.25° is produced, resulting in a data set called TCWV_AMC,S5P in the following. An overview of this TCWV product is given in Fig. 5, which shows the spatial distribution of TCWV_AMC,S5P for four months.
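The gridding step can be sketched as a simple binning of the individual ground pixels into 0.25° × 0.25° cells. Averaging all valid measurements that fall into a cell is an assumption made for this sketch; the text does not specify the gridding method. The function name and the example values are hypothetical.

```python
import math
from collections import defaultdict


def grid_daily_mean(observations, cell_deg=0.25):
    """Average TCWV values per grid cell.

    observations : iterable of (latitude, longitude, tcwv) with latitude in
                   [-90, 90] and longitude in [-180, 180] degrees.
    Returns a dict mapping (row, column) cell indices to the mean TCWV.
    """
    n_rows = round(180.0 / cell_deg)
    n_cols = round(360.0 / cell_deg)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, tcwv in observations:
        # Clamp so that lat = 90 and lon = 180 fall into the last cell.
        row = min(int(math.floor((lat + 90.0) / cell_deg)), n_rows - 1)
        col = min(int(math.floor((lon + 180.0) / cell_deg)), n_cols - 1)
        sums[(row, col)] += tcwv
        counts[(row, col)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


grid = grid_daily_mean([
    (0.1, 0.1, 10.0),    # same cell as the next observation
    (0.2, 0.2, 20.0),
    (10.0, -5.0, 7.0),   # different cell
])
```

The first two observations share one cell and are averaged to 15.0, while the third observation occupies its own cell.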

Results and Discussion
The general features shown in the maps meet the expectations from climatology. In the tropics TCWV is higher due to the high temperatures. Within the ITCZ the values are highest. Towards the polar regions the air becomes colder, thus the TCWV decreases. In January, the ITCZ is located close to the equator. Over the course of the year it shifts northwards until July. Large changes are observed when comparing January and July over southeast Asia (e.g. India, China) and nearby water surfaces. Here the ITCZ reaches its northernmost position, causing an increase of TCWV_AMC,S5P of around 30 kg m⁻² from January to July.

During northern summer, as the entire northern hemisphere warms, the global average TCWV is largest (23.1 kg m⁻²). This is in large part due to the larger land masses in the northern hemisphere, which are significantly warmer during July than the large oceans in the southern hemisphere in January. Smaller contributions come from Arctic regions, which also show enhanced TCWV. Such a large TCWV increase cannot be observed from July to January over the southern hemisphere due to the lack of land masses.

TCWV_AMC,S5P also shows some differences between April and October. In general, all features follow the position of the sun but with a time lag of several weeks. This means that the northern hemisphere is colder in April than in October, which results in higher TCWV in October.
Over sea the averaged TCWV_AMC,S5P is higher than over land due to the different surface elevation, the larger temperature variability and the evaporation over water surfaces. No data are available in the polar night region of the winter hemisphere due to the lack of insolation.
To assess the quality of this new data set, it is compared to various other data sets (see section 2.4 for more information), which are either also provided on a daily 0.25° × 0.25° grid or have been gridded accordingly. We use the following notation for these correlative data sets:
- TCWV_AMC,GOME-2B: GOME-2B data product, which is based on the original AMC-DOAS approach; daily gridded to 0.25° × 0.25°.
- TCWV_WENTZ,SSMIS: SSMIS data product using microwave emissions as input; provided on a daily 0.25° grid.
- S5P TCWV product data from MPIC in Mainz using the 'blue' spectral range; provided on a daily 0.25° grid.
In the first step, daily TCWV_AMC,S5P data are compared to other daily TCWV products. Global deviation maps are then presented and discussed. The comparisons are done over land and ocean separately to detect possible systematic features arising from surface type and/or elevation. It has to be noted that all satellites have different times of overpass, thus diurnal changes in TCWV may affect the comparison results.

Daily comparisons
The comparison procedure for the daily data is as follows. For every day TCWV AMC,S5P and the other TCWV products are collocated. From the collocated data sets pairwise differences are calculated:
$$\Delta\mathrm{TCWV}_{\mathrm{AMC,S5P}-Z} = \mathrm{TCWV}_{\mathrm{AMC,S5P}} - \mathrm{TCWV}_{Z}$$
Here, the index Z denotes the specific data set to compare with. Additionally, the difference is averaged by weighting according to the cosine of the latitude, and its variability is given by the standard deviation SD.
Table 2. Pearson correlation coefficient R, slope m, intercept n, average difference ∆TCWV (in kg m −2 ) to the AMC-DOAS S5P product and collocation counts for the scatter plots in Fig. 6. The errors represent one standard deviation.

A linear regression model using the least squares technique is applied to TCWV AMC,S5P and TCWV Z . This gives the correlation coefficient R and the regression parameters n and m denoting the intercept and the slope, respectively.
A scatter plot for 23 February 2020 is shown in Fig. 6 for the various TCWV products. All statistical parameters are given in Tab. 2.
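The statistical parameters described above could be computed along the following lines. This is a sketch under assumptions that the text does not confirm: the S5P values are taken as the independent variable of the fit, the regression itself is unweighted, and the standard deviation uses the same cosine-of-latitude weights as the mean. The function name `comparison_stats` is hypothetical.

```python
import math

def comparison_stats(s5p, z, lat_deg):
    """Return (mean difference, SD, slope m, intercept n, Pearson R).

    s5p, z: collocated TCWV values (kg m-2); lat_deg: latitude of each pair.
    """
    # Cosine-of-latitude weights for the average difference and its SD
    w = [math.cos(math.radians(lat)) for lat in lat_deg]
    wsum = sum(w)
    diff = [a - b for a, b in zip(s5p, z)]
    mean_diff = sum(wi * d for wi, d in zip(w, diff)) / wsum
    sd = math.sqrt(sum(wi * (d - mean_diff) ** 2 for wi, d in zip(w, diff)) / wsum)

    # Ordinary least-squares fit z = m * s5p + n (assumed orientation)
    count = len(s5p)
    mx = sum(s5p) / count
    my = sum(z) / count
    sxx = sum((x - mx) ** 2 for x in s5p)
    syy = sum((y - my) ** 2 for y in z)
    sxy = sum((x - mx) * (y - my) for x, y in zip(s5p, z))
    m = sxy / sxx
    n = my - m * mx
    r = sxy / math.sqrt(sxx * syy)
    return mean_diff, sd, m, n, r
```

A constant offset between two data sets then appears as a non-zero mean difference and intercept while the slope stays close to one.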

ERA5
ERA5 is a reanalysis model comprising satellite data (e.g. SSMIS), radiosondes and weather observations. That makes TCWV MOD,ERA5 a very robust TCWV product. Because ERA5 is an hourly data set, the temporal mismatch is restricted to less than half an hour.
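The half-hour limit follows from selecting the hourly model field closest in time to each measurement. A minimal sketch of such a selection, with the hypothetical helper `nearest_hour` and the assumption that model fields are valid at full hours, is:

```python
from datetime import datetime, timedelta

def nearest_hour(t):
    """Round a timestamp to the nearest full hour (ties go to the later hour)."""
    base = t.replace(minute=0, second=0, microsecond=0)
    if t - base >= timedelta(minutes=30):
        base += timedelta(hours=1)
    return base
```

For any measurement time the selected field is then no more than 30 minutes away.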
The comparison between TCWV AMC,S5P and TCWV MOD,ERA5 (Fig. 6a,b) shows a very small difference of -0.7 kg m −2 over land (Fig. 6a). The values are oriented along the 1:1 line, which is also indicated by the small standard deviation of 3.2 kg m −2 .

Over sea (Fig. 6b) the difference is larger (-2.0 kg m −2 ) than over land but the standard deviation (3.5 kg m −2 ) and also the correlation coefficient are very similar. The correlation coefficients are above 0.98, indicating a very good agreement between both data sets.

GOME-2B
The comparison between TCWV AMC,S5P and TCWV AMC,GOME-2B (Fig. 6c,d) shows very good agreement between both data sets irrespective of whether the retrieved TCWV is over land or water surfaces. This is demonstrated by the regression line (solid line in Fig. 6c,d) being very close to the 1:1 line (dotted) and a correlation coefficient above 0.9.
The average TCWV difference ∆TCWV AMC,S5P−AMC,GOME-2B is -1.3±4.2 kg m −2 over land (Fig. 6c) [...] model is more weighted to these values. Over ocean (Fig. 6d) there is more variability in the difference, which is also indicated by the standard deviation of 6.7 kg m −2 . The average deviation is 1.7 kg m −2 . There is a land-sea bias of 3 kg m −2 on this day.
Both products are processed with AMC-DOAS. In contrast to TCWV AMC,GOME-2B the surface height is considered during the retrieval of TCWV AMC,S5P , which explains higher values of TCWV AMC,GOME-2B over land. The additional post-processing only done for TCWV AMC,S5P also affects the results. Another factor that influences the comparison results is the filtering applied to TCWV AMC,S5P . As mentioned above, TCWV AMC,S5P data are filtered with an additional cloud filter, which is not applied to GOME-2B data. Large-scale cloud decks, such as in lows or the well-known stratocumulus cloud regions, do not move far within the time difference of four hours (at the equator) between the MetOp-B and S5P overpasses. These cloud decks are therefore located at similar positions for both overpass times. Thus the cloud masking applied to TCWV AMC,S5P will also filter clouds to some degree from TCWV AMC,GOME-2B (as we only consider grid points where both instruments have data).

MPIC S5P
The TCWV AMC,S5P and TCWV MPIC,S5P both use S5P Level 1 measurements but different spectral regions are used in the retrieval. Over land (Fig. 6e) the mean difference between both data sets is 0.8 kg m −2 with a standard deviation of 4.7 kg m −2 .
The TCWV MPIC,S5P data contain a few values up to 90 kg m −2 which are not observed in the TCWV AMC,S5P . Over land there are fewer valid data in TCWV MPIC,S5P compared to other data sets, which can be seen from the number of valid counts (Tab. 2) after collocation. This is due to the filtering of snow and ice contaminated scenes.
Over sea (Fig. 6f) there is almost no mean deviation (-0.3 kg m −2 ) and a lower variability (4.1 kg m −2 ) than over land.
The differences are smallest over both land and sea compared to the other data sets. This meets the expectations because measurements are performed by the same instrument. Nevertheless, uncertainties may arise from sampling differences.

SRON S5P
The TCWV AMC,S5P and TCWV SRON,S5P also use the data from the same instrument. In contrast to TCWV MPIC,S5P the SRON product has a poorer spatial coverage due to strict cloud filtering and limitation to land. The main aim of the SRON data product is to provide columns with low error caused by cloud contamination. This results in 50000 collocated grid points over land (see Tab. 2) being available for comparison. Fig. 6g also illustrates this. There is far less scatter visible than for the other data sets. Most of the TCWV pairs are well oriented along the regression line. This is also shown by an almost perfect correlation coefficient of 0.99. On average, the difference between TCWV AMC,S5P and TCWV SRON,S5P is 0.8 kg m −2 . The standard deviation is 1.4 kg m −2 , which is comparably low with respect to the other TCWV products; the TCWV SRON,S5P rarely exceeds 40 kg m −2 . Both low scatter and low total columns are probably related to the filtering, which removes almost all even partly cloudy scenes, i.e. especially those scenes which require dedicated corrections in the other S5P TCWV algorithms.

SSMIS
Microwave instruments are known to provide good information on total water vapour because microwave emission penetrates through clouds. However, the comparison to TCWV AMC,S5P is limited to ocean areas because SSMIS does not provide data over land surfaces.
The mean deviation between TCWV AMC,S5P and TCWV WENTZ,SSMIS is the largest compared to other sensors (-3.7 kg m −2 ). The regression line in Fig. 6i shows a constant offset between both data sets. There are several possible reasons to explain this large offset:
- The DMSP F16 has an orbit later in the afternoon (around 16:00 LT). This can affect the TCWV due to slight warming of the sea surface and the air above. This causes enhanced evaporation and thus a slightly higher water vapour content.
- In the microwave region the radiation penetrates through the clouds. As a consequence SSMIS senses the entire profile also if clouds are present. The total water vapour usually is higher in cloudy scenes than in clear sky scenes, which is also referred to as the clear sky bias (Gaffen and Elliott, 1993; Sohn and Bennartz, 2008). In TCWV AMC,S5P data with cloud fractions above 0.2 are excluded, which introduces a negative offset.
- The AMC-DOAS retrieval and also the cloud and albedo correction for S5P use a tropical profile as reference. Usually the reference profile shape and the true profile shape differ. That also can cause systematic differences, especially in the presence of residual clouds.
- The cloud and albedo treatment is dependent on the quality of the used cloud products. Uncertainties in the cloud product will have an impact on the surface albedo estimation and also on the calculation of the correction factors.
There also are some values where TCWV WENTZ,SSMIS exceeds TCWV AMC,S5P by more than 20 kg m −2 . These arise from two DMSP F16 orbits located between the International Date Line and North and South America. Those orbits belong to the very first orbits of the daily SSMIS TCWV product. For the TCWV AMC,S5P the transition between the first and last orbit of the specific day is located much closer to the International Date Line. This results in a time mismatch of roughly 24 h between S5P data and SSMIS data in these areas, causing these observed TCWV differences.

Time series of TCWV differences
Since TCWV AMC,S5P is available for more than two years it is worthwhile to investigate the behaviour of the TCWV differences over time. Fig. 7 shows the daily averaged TCWV differences between the TCWV AMC,S5P product and the different correlative data sets from May 2018 to December 2020. Again this is done for land surface (Fig. 7a) and water surface (Fig. 7c) separately. The respective standard deviations are also shown (Fig. 7b,d).
The temporal behaviour of the TCWV difference between the AMC-DOAS products for S5P and GOME-2B over land shows a systematic deviation between -1 kg m −2 and -2.5 kg m −2 ; also, a seasonal cycle is visible. The largest deviations can be seen during the northern summer months whereas in northern winter the deviation is smallest. The standard deviation also shows a seasonal cycle with the largest variability during northern summer. Compared to the other data sets the average difference is largest.
The behaviour of the difference to the AMC-DOAS S5P product is quite different for the S5P data set from MPIC.
The differences also show a seasonal cycle, which is of opposite sign compared to the differences to the other TCWV products. The TCWV SRON,S5P data are filtered with a very strict cloud filter, which leaves only small TCWV values.
Therefore TCWV values from tropical regions, where TCWV is high, are discarded.
Between TCWV AMC,S5P and TCWV MOD,ERA5 there is a general negative deviation around 1.6 kg m −2 . In contrast to the other data sets only very small seasonal variability can be seen.
At the end of November 2020 there was a version change in the FRESCO product, which is used to correct for cloud effects and is also used to calculate the surface albedo to derive the AMC-DOAS S5P TCWV product. This caused a general increase of 2.3 kg m −2 in TCWV AMC,S5P over both land and water surfaces. Due to this increase the differences show this jump of around 2 kg m −2 except for ∆TCWV AMC,S5P-MPIC,S5P .
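The size of such a step in a daily averaged time series could be estimated by comparing mean values in windows before and after the change. The following sketch illustrates the idea only; the function name `step_at`, the window length and the change date passed to it are assumptions, as the text gives only "end of November 2020".

```python
from datetime import date, timedelta

def step_at(days, values, change_day, window_days=30):
    """Mean of `values` after `change_day` minus mean before it.

    days: list of datetime.date; values: daily averages in kg m-2.
    Only days within `window_days` on either side of the change are used.
    """
    win = timedelta(days=window_days)
    before = [v for d, v in zip(days, values) if change_day - win <= d < change_day]
    after = [v for d, v in zip(days, values) if change_day <= d < change_day + win]
    if not before or not after:
        raise ValueError("no data on one side of the change date")
    return sum(after) / len(after) - sum(before) / len(before)
```

Short windows limit the influence of the seasonal cycle on the estimate, at the cost of a larger sensitivity to day-to-day variability.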

Assessment of the spatial dependence of the difference
To investigate possible reasons for e.g. the seasonal cycle or the different temporal behaviour among the data sets we present monthly mean global maps of all data and their difference to our new product. The monthly comparison is restricted to January and July 2019 because all typical spatial and temporal features are already seen from these months. Values for the global average of the TCWV products and their standard deviation and also their difference to TCWV AMC,S5P can be found in Tab. 3 for January and Tab. 4 for July.
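A monthly mean map can be formed from daily gridded data by averaging each grid cell over the days on which it contains valid data. The sketch below assumes this simple per-cell average, with NaN marking missing data; the function name `monthly_mean_map` and the nested-list layout are illustrative and not taken from the actual processing.

```python
import math

def monthly_mean_map(daily_grids):
    """Average a list of daily grids [lat][lon], ignoring NaN gaps.

    Cells without any valid day in the month stay NaN.
    """
    n_lat = len(daily_grids[0])
    n_lon = len(daily_grids[0][0])
    out = []
    for i in range(n_lat):
        row = []
        for j in range(n_lon):
            vals = [g[i][j] for g in daily_grids if not math.isnan(g[i][j])]
            row.append(sum(vals) / len(vals) if vals else float("nan"))
        out.append(row)
    return out
```

The same function applied to daily difference grids yields a monthly mean difference map.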

ERA5
Fig. 8 shows a comparison of the AMC-DOAS S5P TCWV products with ERA5 model data. The temporal and spatial sampling of the ERA5 data is the same as for TCWV AMC,S5P , because we selected the closest model data point for each S5P measurement.
All spatial features of the differences can be seen in Fig. 8b,d.

GOME-2B
There are no TCWV AMC,GOME-2B data over the Himalayan mountain ranges because GOME-2 values are excluded when the air mass correction is too large.
More details are revealed in the difference maps (Fig. 9b,d). As can be seen there are systematic spatial structures in the differences. In general, the highest differences both over land and water occur close to the tropics where absolute TCWV is high. In the mid-latitudes and polar regions the deviations are close to zero. Over land surfaces negative differences occur often, e.g. over the entire Africa or India. Especially over Africa several effects can be seen.
In the northern part, where the surface is bright due to the deserts, the albedo correction reduces the TCWV AMC,S5P , resulting in a difference of around -5 kg m −2 especially in July. The southeastern part of Africa also shows enhanced differences.
This region is typically more elevated than the northeastern part. This difference in the surface elevation is only considered in the TCWV AMC,S5P product, which results in lower TCWV over mountain regions and thus explains the larger retrieved TCWV AMC,GOME-2B (which represents the column from sea level) there. These differences are therefore mainly due to the definition differences between both data sets. In July the deviations over land are largest and widely spread throughout the northern hemisphere due to the overall increase of TCWV.
Over sea the difference is overall positive. Since over ocean the surface height is zero, the differences between the two data products cannot be attributed to the different TCWV definitions; they have to be related to the post-processing, which is only performed for the S5P product. The mean sea surface albedo does not differ much from the assumptions made in the retrievals; therefore differences caused by the post-processing corrections are more likely to be related to clouds. The largest differences are observed in the tropical area where the average TCWV is largest.
The ITCZ, where the highest TCWV occurs, is also often covered by clouds due to enhanced convection. Here, the correction of cloud effects affects the already high TCWV in the tropics more than in other regions.
The difference of the crossing time of GOME-2 on MetOp-B and Sentinel-5P at the equator is about four hours. This also can have an effect due to diurnal cycles in TCWV and in cloud cover. In some areas, for example the stratocumulus cloud shields over ocean, there is a diurnal cycle with enhanced cloud cover in the morning hours and decreased cloud cover during the afternoon hours (Noel et al., 2018). This may reduce the retrieved TCWV AMC,GOME-2B due to more clouds appearing during the morning overpass of GOME-2 on MetOp-B. Over land the situation is reversed due to more pronounced convective clouds in the afternoon hours.

MPIC S5P
The S5P TCWV data set provided by MPIC provides the opportunity to compare different methods applied to the same instrument. This reduces the effect of possible temporal changes of TCWV as a source of uncertainty in the comparison results.
The averaged TCWV MPIC,S5P (Fig. 10a,c) shows similar structures as TCWV AMC,S5P (see Fig. 5 for comparison). Over the western Pacific close to Indonesia there are values up to 70 kg m −2 which are not found in the averaged TCWV AMC,S5P .
Over sea there is an overall negative difference which is slightly larger in July (-2.2 kg m −2 ) than in January with -1.8 kg m −2 .

SSMIS
The spatial distribution of TCWV WENTZ,SSMIS is shown in Fig. 12a,c. There are no data over land surfaces and also not over sea ice. The averages over sea are around 27 kg m −2 . The differences (Fig. 12b,d) to TCWV AMC,S5P show overall negative values of around -5 kg m −2 , which is more than for the other data sets. There are also structures visible, e.g. a tongue of slightly more enhanced differences located over southern parts of the Pacific.

Conclusions
The AMC-DOAS approach was successfully applied to S5P measurements to retrieve TCWV. For this purpose, several improvements of the retrieval method have been developed. This includes an update of the underlying radiative transfer data base, which now also considers variable surface elevation. Due to the latter, the AMC-DOAS product is now defined as the TCWV relative to the surface whereas it was defined relative to sea surface before. This especially results in on average [...] In addition to the previously applied filtering based on the derived air mass correction factor and solar zenith angle, new filters are applied. These use the S5P FRESCO cloud fraction and cloud height relative to the surface.
Additional post-processing procedures have been established to account for variable surface albedo and residual clouds. Furthermore, an empirical correction has been developed and applied, which reduces systematic across-track features in the retrieved TCWV over ocean. The origin of these structures is currently unclear. They are assumed to be instrumental features, but this issue needs further investigation.
Except for the empirical stripes correction, the newly developed algorithm modifications are instrument independent and may thus also be applied to GOME, SCIAMACHY and GOME-2 to further improve these AMC-DOAS TCWV data products as well.
The updated AMC-DOAS retrieval has been applied to all S5P measurements from May 2018 to December 2020, which results in a new global TCWV data set. This product was validated by comparison with various independent data sets, namely with the GOME-2B AMC-DOAS product, ECMWF ERA5 model data and the MPIC S5P TCWV product over land and ocean and with SSMIS data over ocean.
The observed standard deviations are in agreement with comparison studies for the existing AMC-DOAS products (see e.g. Noël et al., 2005; Kalakoski et al., 2016) which show a typical scatter of around 5 kg m −2 . This variability arises from different overpass times resulting in systematic changes in the atmospheric conditions due to e.g. transport processes and altering cloud cover. The value of 5 kg m −2 can therefore be considered as a rough estimate for the natural variability of TCWV in combination with effects from different spatial and temporal sampling. All TCWV data sets used in this study show global averaged differences between each other, which are typically smaller than this.
We conclude from the comparisons of the different S5P results that there is no "best" algorithm or product for TCWV; all retrieval methods / spectral regions have their advantages and disadvantages. The large variability of atmospheric water vapour requires a variety of approaches, which can complement each other.
These studies also reveal that typical differences around 5 kg m −2 occur when comparing different TCWV data sets. The current AMC-DOAS S5P TCWV product relies on FRESCO input data for the albedo / cloud correction and for filtering.
Changes in the input cloud product can have an effect on the derived TCWV. This problem can be seen e.g. in the jump observed in the S5P product from MPIC, which originates in an algorithm change of the used cloud product.
AMC-DOAS S5P TCWV also shows a general increase of on average more than 2 kg m −2 at the end of November. This is caused by a version change of the FRESCO cloud product. It is planned to investigate possibilities to retrieve the required cloud properties independently from external data, e.g. by a FRESCO-like cloud detection scheme from the oxygen B band. This would make the AMC-DOAS retrieval method even more independent from external data sets.