Assessment of atmospheric correction algorithms for the Sentinel-2A MultiSpectral Imager over coastal and inland waters

Abstract The relatively high spatial resolution, short revisit time and red-edge spectral band (705 nm) of the ESA Sentinel-2 Multi Spectral Imager makes this sensor attractive for monitoring water quality of coastal and inland waters. Reliable atmospheric correction is essential to support routine retrieval of optically active substance concentration from water-leaving reflectance. In this study, six publicly available atmospheric correction algorithms (Acolite, C2RCC, iCOR, l2gen, Polymer and Sen2Cor) are evaluated against above-water optical in situ measurements, within a robust methodology, in two optically diverse coastal regions (Baltic Sea, Western Channel) and from 13 inland waterbodies from 5 European countries with a range of optical properties. The total number of match-ups identified for each algorithm ranged from 1059 to 1668 with 521 match-ups common to all algorithms. These in situ and MSI match-ups were used to generate statistics describing the performance of each algorithm for each respective region and a combined dataset. All ACs tested showed high uncertainties, in many cases >100% in the red and >1000% in the near-infra red bands. Polymer and C2RCC achieved the lowest root mean square differences (~0.0016 sr−1) and mean absolute differences (~40–60% in blue/green bands) across the different datasets. Retrieval of blue-green and NIR-red band ratios indicate that further work on AC algorithms is required to reproduce the spectral shape in the red and NIR bands needed to accurately retrieve the chlorophyll-a concentration in turbid waters.


Introduction
The first of a series of Multi-Spectral Imager (MSI) instruments was launched in June 2015 by the European Space Agency (ESA) on board Sentinel-2A. MSI offers high resolution images (10-60 m) with repeat coverage of 10 days at the equator -reducing to 5 days with the inclusion of Sentinel-2B. Primarily designed as a land monitoring component of the Copernicus programme, MSI also records over coastal marine regions and can be considered an asset in water quality monitoring of inshore regions and inland water bodies that are not observable by current ocean colour sensors such as OLCI and MODIS.
The MSI optical waveband configuration resembles previous landmonitoring satellites such as NASA/USGS Landsat 7 and 8. These sensors have demonstrated use cases for water remote sensing applications (e.g. Doña et al. (2015), Khattab and Merkel (2014), Vanhellemont and contribution of atmospheric path radiance in the visible spectrum exceeds water-leaving radiance by at least 80-90% (Gordon (1978), Gordon et al. (1985)) and is predominantly due to molecular and aerosol scattering in the atmosphere, which decreases with increasing wavelength. These components of the signal must be removed so that the remaining signal can be attributed to the surface reflectance of the water. Additional effects at the water surface, such as whitecaps (Gordon and Wang (1994), Gordon (1997)), sun-glint (Steinmetz et al. (2011), Wang and Bailey (2001), Harmel et al. (2018)) and adjacency from neighbouring land (De Keukelaere et al. (2018), Santer and Schmechtig (2000), Reinersman and Carder (1995), Richter (1990)) may also be taken into account. These effects result in reflectance in addition to that of the water column and present challenges for the atmospheric correction (Moses et al., 2017). To aid improving atmospheric correction routines it is important to validate the current techniques in a wide range of water bodies and atmospheric conditions.
In situ data collected from hand-held devices and shipborne spectrometers may be considered free from atmospheric effects since the path from sensor to observation surface is negligible. These measurements and subsequent quality control can be automated and therefore used as a cost-effective validation source for the atmospheric correction of satellite data (Simis and Olsson (2013), Qin et al. (2017), Groetsch et al. (2017)). Given the short-term variability and large optical diversity found in inland and coastal waterbodies (Spyrakos et al., 2017) and disparate spatio-temporal scales of in situ and satellite sensors, validation datasets for remote sensing systems need to be sufficiently large to be representative.
In summary, a key limiting factor of space-borne water monitoring is the atmospheric correction (IOCCG, 2010). If it is of poor quality then any results derived from water-leaving reflectance are subject to large uncertainty. Sensor characteristics must also be considered, such as the suitability of the spectral sensitivity in each band to achieve a sufficient signal-to-noise ratio, and calibration between detectors observing different parts of the sensor swath. Prior validation efforts of atmospheric correction of MSI have used either a small number of match-ups, few water bodies or individual atmospheric correction routines, e.g. (De Keukelaere et al. (2018), Pahlevan et al. (2017a), Dörnhöfer et al. (2016)). In this study, we combine automated in situ measurements from two optically contrasting coastal marine systems with hand-held measurements from a range of inland water bodies with varied water quality characteristics, where high-frequency time series are scarce. The aim of this study is to assess the current performance of available atmospheric correction routines for optical water quality monitoring applications with MSI over varied water bodies on a continental scale. Match-up datasets generated over coastal and inland water are used to statistically evaluate the atmospheric correction algorithms before discussing their current ability to retrieve water-leaving reflectance in coastal and inland water environments.

Study areas
The regions analysed here can be separated into coastal and inland waters (Fig. 1). For the coastal waters there are two regions under study: the south west coast of the United Kingdom around Plymouth, and the Baltic Sea.
The Baltic Sea is a semi-enclosed brackish water body in Northern Europe spanning the latitudes 55-65°N. Due to low mineral particle concentrations, the open sea is dark (Berthon and Zibordi (2010), Babin et al. (2003)), particularly in the blue (efficient light absorption by CDOM) and red-NIR (absorption by water) parts of the spectrum, leading to a consistent reflectance peak at green wavelengths during both phytoplankton bloom and non-bloom periods . The CDOM absorption at 440 nm can range from 0.01 to 32.0 g m −3 depending on the area in the Baltic Sea (Kratzer and Moore, 2018).
The Western Channel around Plymouth, UK, (−4.5°E, 50°N to −4°E , 50.35°N) is a dynamic marine water body influenced by estuaries and tides. The suspended particulate matter (SPM) and turbidity from coastal waters is strongly contrasted to the clear offshore regions fed by oceanic currents resulting in peak reflectance in the blue (for the clearest waters) and shifting to blue-green wavelengths (with increasing coastal turbidity). Seasonally the backscatter properties of the water can vary greatly due to particulate organic matter due to the increase of phytoplankton in the spring and summer (Martinez-Vicente et al., 2010).
The inland water bodies consist of lakes within Europe including from Italy, the Netherlands, Estonia, Spain and Scotland and include a range of lakes in clear and productive conditions (Fig. 1). Information on the lakes included is given in Table 3.
The reflectance spectra used in this study thus cover a wide range of variability in the MSI wavebands. Median spectra generated from the in situ datasets, together with the Sentinel-2A MSI bands, are shown in Fig. 2.

In situ observation data
Baltic Sea above-water reflectance data were obtained from the Alg@line network through the BONUS FerryScope project. Sets of hyperspectral radiance and irradiance sensors (Two TriOS RAMSES ARC and one ACC per set) were mounted on two ferries that traverse the Baltic Sea, azimuth controlled for glint avoidance and recording data every 15 s. The setup and automated quality control of the system is described in detail in Simis and Olsson (2013) and briefly covered further below. Subsequent quality control of the reflectance spectra followed the procedures detailed in Qin et al. (2017) and outlined further below. The instruments were operational for the 2015 and 2016 spring and summer periods.
For the Western Channel (W Channel) region, shipborne abovewater reflectance observations from RV Quest carrying a set of three Satlantic HyperSaS instruments on a rotating platform were used, operating in a similar fashion to the Baltic instruments (Martinez-Vicente et al., 2013). Data were collected autonomously when the ship was at sea (usually at least once a week). These data used in this analysis have been made available as supplementary data.
For the inland test sites, the above-water reflectance data have been acquired by a variety of instruments (Analytical Spectral Devices Fieldspec, Water Insight WISP-3, TriOS RAMSES) owned, operated and periodically calibrated by the organisations contributing data. The measurements include observations from ship-mounted, hand-held and fixed-position spectro-radiometer systems. In situ data treatment procedures are described in Section 3.3.
The approximate distance to land for each in situ measurement has been taken from the Globolakes distance-to-land product (Carrea et al., 2015), a global map containing the great-circle distance from the nearest land for each pixel at 300 m spacing. The Baltic Sea dataset has the largest range, up to nearly 60 km, with 50% of observations between 18.2 and 37.5 km from land, and a median of 26.2 km. The W Channel dataset is closer to land with a range up to nearly 35 km, the middle 50% of observations between 1.9 and 14 km from land, and a median distance of 7.1 km. The inland dataset has a much smaller range up to 14 km, with the middle 50% of data between 0.2 and 1.2 km and a median of 0.5 km. Note that the static distance-to-land product does not necessarily reflect the actual distance to land at a given time (e.g. tidal effects not taken into account).
It should be noted that these in situ data contain an unknown level of uncertainty (that is, they are not 'truth') associated with the instrumentation and calibration, measurement methodology and environment the instruments were deployed in.

Earth observation data
Sentinel-2A data were acquired over Europe, with a 10 day repeat period measured at the equator. Sentinel-2A MSI data are provided by ESA processed to level-1C or level-2C. The level-1C data are radiometrically calibrated, corrected for viewing geometry and geocorrected in the Universal Transverse Mercator (UTM) projection (Drusch et al., 2012). The level-2C standard product is atmospherically corrected using the Sen2Cor (Main-Knorn et al., 2017) processor, included in the present comparison. MSI band characteristics are given in Table 1. It is important to note that the MSI instrument is constructed from 12 different detectors on two focal planes (Drusch et al., 2012). Therefore, unless calibrated correctly, the different detectors can give a different response to the same input. This is especially of interest in the region where neighbouring detectors overlap.

Fig. 1.
Image showing the in situ data observation locations (blue circles) with spectra that passed the quality control procedures: (a) the Baltic region match-ups within ± 3 h of a Sentinel-2 overpass; (b) the W Channel within ± 3 h of a Sentinel-2 overpass; (c) the inland water match-ups within ± 24 h of a Sentinel-2 overpass. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) (Steinmetz et al., 2011), Sen2Cor v2.4.0 (Main-Knorn et al., 2017, Acolite v20170718 (Vanhellemont and Ruddick, 2014), iCOR v1.0 (De Keukelaere et al., 2018) and l2gen version 7.5.1 (Pahlevan et al., 2017a). It is important to note that these processors are all under active development, but considered mature and useful to compare as they rely on different principles: Case 2 Regional Coast Colour processor (C2RCC) is based on a multi-sensor per-pixel artificial neural network method, built upon previous AC algorithms Case2Regional and Coast-Colour. Sen2Cor is built upon scene classification and look-up tables from a radiance transfer model (LibRadtran). Polymer uses a per-pixel spectral optimisation method relying on two models; one for the atmospheric reflectance and one for the water reflectance. It has the ability to work in sun-glint affected areas. Acolite bundles different aerosol correction methods for turbid water applications of Landsat and Sentinel-2 data. After Rayleigh correction, the aerosol reflectance is derived in two bands from the imagery according to the method selected in the settings, and can be fixed or variable over the subscene. The aerosol reflectance is then extrapolated using an exponential function to the other bands. The iCOR algorithm is another image-based correction which attempts to determine the aerosol optical thickness using the spectral variability of land pixels in the scene. If this fails it falls back to a Rayleigh correction only (i.e. a default value of aerosol optical thickness of 0). It requires suitable variation and distribution of land pixels for best results. The l2gen algorithm uses a bio-optical model and resolves the correction in an iterative process. Detailed descriptions are provided in the references given above. Four of the processors are specifically designed for water AC (C2RCC, Polymer, Acolite, l2gen), whereas iCOR is designed for the case of both land and inland waters but not open water. Sen2Cor is designed for land with no water application as part of its specification, but is included here because it is the default L2A processor. Its methodology requires a good distribution of land pixels. Results for the respective retrieval performance of each processor are therefore presented separately for inland and coastal datasets. All AC methods apply a correction using the viewing and solar illumination angles, in some cases together with a water model such as Park and Ruddick (2005) or radiative transfer model such as MODTRAN.

Specific processing steps
Each processor was operated using the default settings, as these should be the best options for general use without a priori knowledge of the waterbody or atmospheric conditions, with recommended options for water correction. The specific options chosen and input data used are described in Table 2. The l2gen algorithm only processes MSI data in the 'new style' Sentinel-2 file format, that which contains a single granule only. ESA are currently reprocessing data to convert to this single granule format. At time of writing 32 scenes used in the study could not be processed by l2gen (8 in the Baltic dataset, 24 in the inland dataset) due to only being available in old-style file format that contains multiple granules.
Prior to running the AC processors an open water mask was produced for each granule using the Idepix (version 2.2) operator in SNAP v5.0 Sentinel processing toolbox. Pixels with mask values belonging to any of the following flags were excluded from further analysis: invalid, cloud, cloud_ambiguous, cloud_sure, cloud_buffer, cloud_shadow, cir-rus_sure, cirrus_ambiguous, land and vegrisk. If none of these mask flags were set, the pixel was considered water and included in further analysis.
If the AC produced remote sensing reflectance (Rrs, units sr −1 ) then this option was selected. If fully normalized water-leaving reflectance was produced (ρ w , dimensionless) the output was transformed to Rrs by dividing by π. Outputs at 60 m resolution were used, or resampled to 60 m if outputs were higher resolution, for consistency as some algorithms only output at 60 m. All pixel flags were propagated through the resampling procedure to ensure flagged resampled pixels were not used.

In situ data processing and filtering
As stated in Section 2.2, the in situ data were from multiple sources and instruments. All data were collected under controlled and narrow forward looking angles and have been collected off-nadir using a nominal viewing zenith angle of 40°and are not corrected for the nonnadir view. To ensure that in situ Rrs were calculated in the same way for each contributing data source, they were processed from the measured irradiance and radiance spectra. Rrs was calculated according to the model: With L t , L s and E s the spectral water-leaving radiance, sky radiance and downwelling irradiance, respectively. The term ρ s is the fraction of L s reflected specularly on the water surface, resolved here using the procedure given in Simis and Olsson (2013) which is a spectral optimisation procedure to minimise the presence of features in Rrs that are associated with atmospheric absorption. This procedure also removes samples that do not converge on a solution, making it particularly suitable to process large volumes of shipborne observations.

Fig. 2.
Median spectra derived from the in situ data sets illustrating typical reflectance for each region. Vertical bars show the corresponding eight Sentinel-2A MSI bands. Band 8 is not shown as it is not provided as output, by default, by all the atmospheric correction algorithms. Note bands 1 and 2 partly overlap and the colour change indicates the overlap area.

Table 1
Sentinel-2A band wavelengths and resolutions. λ denotes the central wavelength, B the bandwidth, Res the pixel resolution, SNR the signal-to-noise ratio at the reference radiance L ref, and SNR 60 the estimated SNR at 60 m pixels (SNR increased with the square root of the observed area). Spectra resulting from this procedure may have a spectrally neutral offset (Qin et al., 2017) which can be problematic in relatively clear waters where the amplitude of the Rrs spectrum is comparatively low. Here, the offset was calculated, for all spectra, using the near infra-red reflectance where the absorption by pure water may be assumed to dominate the shape of Rrs. The extent to which the shape of the spectrum reveals the pure water absorption depends on turbidity (which amplifies the absorption signal due to particle scattering). Thus, an elevated signal in the near infra-red should increasingly take the shape governed by the absorption of pure water. If the signal was elevated but this shape was absent, the correction became proportionally larger. However, when the water is turbid, the shape of the NIR spectrum should be increasingly featured and the correction would be proportionally smaller. Thus the signal due to turbidity is left intact when it is present. The procedure to calculate ε from the ratio of bands at 779 and 865 nm was: where a w (λ) is the absorption by pure water at the specified waveband, obtained from Roettgers et al. (2011). The processed in situ Rrs spectra were subsequently filtered to remove any remaining erroneous or suspicious measurements. The filtering regime was slightly different depending on the region and was performed using characteristics of the spectral shape (Qin et al., 2017) where the following criteria had to be met for coastal observations: • Mean Rrs intensity in range (350-400 nm) ≥ −0.0005 sr −1 such that spectra are not significantly negative in the ultraviolet range.
• Mean Rrs intensity in range (800-900 nm) ≥ −0.0005 sr −1 such that spectra are not significantly negative in this part of the near infra-red range.
• Maximum Rrs intensity < 0.015 sr −1 to remove spectra affected by white caps or sun glint.
• Any peak in range (760-770 nm) < 10% of the maximum in range (560-600 nm) such that spectra do not show significant effect of the oxygen absorption peak (possible instrument calibration error).
• Peak signal occurs < 600 nm, corresponding to reflectance expected in relatively clear waters either with (Baltic Sea, green peak) or without (W Channel, blue-green peak) strong influence of coloured dissolved organic matter.
In the first two above tests, a small negative reflectance is allowed because the in situ data contain uncertainties, especially in these low signal wavebands. Therefore otherwise good spectra can show small negative reflectance in these regions.
In addition to these spectral shape tests the in situ data were rejected if the sun zenith angle < 30°or, for the Baltic region, the downwelling photon flux of photosynthetically active radiation (PAR) was < 350 μmol photons m −2 s −1 .
For the lake in situ data the above spectral shape filters were not used due to the larger variability in optically active substances influencing spectral shape. However, these data originated predominantly from manually operated systems which are more likely to avoid  (Sterckx et al., 2015) Applies vicarious calibration gains (Pahlevan et al., 2017a) Chl-a initialised at 10 mg m −3 TSM at 1 mg m −3 Automatic aerosol type, mid-latitude and ozone Table 3 Number of match-ups by processor, after averaging and removing duplicates and invalid match-ups (e.g. cloud affected) as described in Section 3.4. Typical water type and approximate size of each bounded body is also given.

Extraction and match-up routine
For the MSI image pixel extraction, data were taken from a 3 × 3 window (equivalent to 180 × 180 m) centred at the observation location. The central pixel was used as the match-up value and the standard deviation of the 3 × 3 window was used in the Deming regression (see below). The central pixel was used rather than the mean of the window since the data have already been resampled from 10 m and 20 m to 60 m, and considering a larger window reduces the attractiveness of the high resolution data.
When multiple in situ observations occurred within the spatial extent of the central MSI pixel, the median in situ spectrum was calculated together with its standard deviation, which again was used in the Deming regression. Many observations were made from shipborne instruments, but since the MSI overpass time and the in situ observations are not generally coincident (i.e. up to a few hours apart) the ships should not be expected to impact on the EO observation. In the occasional case where the ship is in the image, that pixel is expected to be  masked out as non-water. The ship structure should have minimal impact on the in situ observations since measurements were taken from the bow, avoiding shade and reflections from the ship using viewing azimuth angle control as described in Simis and Olsson (2013).
Each match-up was thus constructed from a unique set of in situ measurements and MSI pixel value. MSI bands from 440 nm to 845 nm were used except band 8 (835 nm) as some AC algorithms use this band to determine aerosol type for the correction.
Match-ups were discarded if the Idepix mask suggested the pixel was obstructed water (i.e. pixels masked as per Section 3.2) or if the estimated MSI spectra had an Rrs value > 1 or less than −0.01 sr −1 for any waveband. In addition, as identical MSI data can be in multiple tiles (due to edge overlaps or different UTM zone projections) match-ups were filtered to only include one copy if multiple tiles contained the same match-up.
Finally, match-ups were filtered by time between MSI and in situ observation; 3 h for coastal and 24 h for inland (see Section 4.1). Number of match-ups per water body is available in Table 3. These range from 530 to 986 match-ups for the Baltic Sea dataset, 395 to 462 for the W Channel dataset and 77 to 230 for the inland dataset, dependent on AC algorithm.
For the match-up analysis a set of statistics to describe the difference between the MSI and in situ data was calculated. This included the square of the Pearson product correlation (R 2 , the coefficient of determination), the mean absolute percentage difference (ψ), the root mean square difference (RMSD) and the mean relative difference (δ).
In addition to these statistics, a Deming regression was used to determine a gradient and intercept to indicate the general trend of the data, where the standard deviations of the 3 × 3 extractions and in situ data are incorporated as estimates of the observation errors. An intercept of zero and gradient of one suggest a good fit between the data (in general). Further, the in situ and atmospherically corrected MSI spectra were compared in regard to their shape, that is, how closely each MSI derived spectrum agrees with the matching in situ spectrum. The spectral angle (Kruse et al., 1993), Eq. (7), was used as a measure of similarity. The angle can range from 0°to 180°with a lower angle implying more similar spectra, and was calculated over all wavebands. Individual spectra (not median spectra) were used for spectral angle calculations. x y x y cos 1 (7)

Band ratios
Certain water quality constituents can be retrieved from atmospherically corrected satellite data. In particular, chlorophyll-a (chl-a) concentration can be retrieved using band-ratio algorithms (Blondeau-Patissier et al., 2014). As a ratio of bands is used, the spectral shape tends to be more important than the absolute value of intensities, so that useful data can be retrieved despite uncertainties and wavelength independent biases. Two common ratios used are a blue-green ratio (O'Reilly et al., 1998), typically used for open ocean, and a NIR-red ratio (Moses et al., 2012) typically used in turbid and CDOM enriched waters. Ratios at wavelengths: 444/560, 490/560 and 704/665 are reported here.

Summary of pre-processing analyses
The time window used for the match-up analysis has an effect on the statistics produced. Longer windows produce more matchups from more MSI scenes, but risk the comparison of different states of the system due to its dynamic nature. A window of ± 3 h was used for the coastal waters (W Channel and Baltic Sea) whereas ± 24 h was used for the inland waters, to produce additional matchups in the latter. For comparison, Qin et al. (2017) used a ± 3 h window for the Baltic Sea, Kutser (2012) used ± 3 days for inland water, Toming et al. (2016) used up to several days and Cardille et al. (2013) used even longer time scales. We verified against shorter and longer windows that they presented the best compromise between size of the matchup data set and scatter introduced with increasing window length.
The MSI footprint consists of data from multiple detectors that fall into two groups of azimuth viewing angles in an alternating pattern. Banding between these detectors is visible in the data due to the differing atmospheric path lengths, although after an ideal AC banding should be removed. An analysis of the RMSD between observations on different detectors and different AC algorithms (not shown here) suggested no evidence that one detector performed worse than others, but a varying presence of banding was visible in all bands after AC by all algorithms.
Further additional analysis suggested that spatial averaging of TOA reflectance prior to AC improves the signal-to-noise ratio across all the MSI bands, at an obvious loss of spatial resolution (results not shown). This is important to note for optically dark waters where resampling could improve SNR and therefore the retrieved signal.

Number of granules
The number of MSI granules with match-ups for each algorithm is shown in Table 4. This is an important indicator towards the variability in atmospheric conditions corrected for by the algorithms. All produced match-ups from 5 granules in the W Channel. For the Baltic Sea all AC models except for l2gen produced match-ups from a similar number of granules (between 30 and 32). For the inland dataset there is a larger variability across the AC processors ranging from 10 to 35.

Number of land pixels
Sen2Cor and iCor use land pixels to derive parameters for the correction. Therefore examining the percentage of land pixels in each granule is a first-order approach to investigate the extent that land pixels influence the result. The percentage of unobscured land pixels versus the RMSD of match-ups in the 560 nm band is shown in Fig. 3. There are no obvious trends that suggest a higher percentage of land pixels improves the RMSD of the match-ups for either processor, although it should be noted that for many granules (i.e. points on the plots) there are few observations. It is expected that those with a higher number of observations have more representative RMSD values. Those with more observations tend to be in the coastal regions and therefore have lower percentage of land pixels.

Coastal waters match-up analysis
Scatter plots of Rrs for each of the visible and near infra-red bands from each AC algorithm are shown in Figs. 4-9. Here only the matchups within ± 3 h are considered. Statistics are reported for each AC algorithm calculated for the Baltic Sea region (Table A1) and the W Channel region (Table A2), and are summarised in Figs. 10 and 11 respectively.
Considering the Baltic Sea region first and examining Fig. 10, which shows the R 2 , RMSD, ψ and δ, it can be seen that Polymer had the highest R 2 , albeit under 0.4, for all bands except the 443 and 740 wavebands. It also returned, generally, the lowest RMSD and ψ. C2RCC performed similarly to Polymer but with lower R 2 . Sen2Cor returned the highest R 2 in the shortest waveband but with high RMSD, ψ and δ throughout, whereas l2gen had the highest RMSD performing poorly across all bands. Acolite was placed between the best performing cluster of Polymer and C2RCC and the least performing cluster of Sen2Cor, l2gen and iCOR. The number of match-ups returned was > 780 for each processor except for l2gen which returned 530, having resulted in 8 fewer scenes. These statistics are mirrored by the scatter plots. The match-up comparison of Baltic Sea points shows closer correlation and lower bias for Polymer (Fig. 8), C2RCC (Fig. 5) and l2gen (Fig. 7), whereas scatter was more prominent for the other algorithms. Acolite (Fig. 4) tended to perform better in the blue to red bands than in the NIR bands. The iCOR (Fig. 6) and Sen2Cor (Fig. 9) processors generally overestimated Rrs. In these cases, the extrapolation from land pixels could have underestimated atmospheric path radiance over coastal waters, but more likely the default parameterization of the landbased algorithms, or their fall-back routines, were not sufficient.
Over the W Channel both iCOR and Sen2Cor again showed undercorrection of the MSI reflectance, as expected with limited land pixels. Acolite performed relatively better in this region than in the Baltic Sea with a higher R 2 and lower RMSD, but still has a problem representing the NIR bands. C2RCC appeared to slightly over-correct the data in this region. The l2gen processor also over-corrected whereas Polymer performed better with points clustered around unity. Sen2Cor and iCOR showed the highest RMSD, ψ and δ and low R 2 (Fig. 11). There is very little separating the other four algorithms in this region apart from the number of match-ups. Acolite yielded approximately 50 fewer matchups. The other algorithms yielded between 448 and 462 match-ups. C2RCC performs best, achieving the highest R 2 and lowest RMSD, followed by Polymer, Acolite and l2gen. These four AC also have a regression slope closer to 1.

Inland match-up analysis
The statistics for each AC algorithm calculated for the inland water match-ups within a ± 24 h time window are shown in Table A3. The scatter plots in Figs. 4-9 show the inland match-ups as red circles and Fig. 12 the plots of the statistics from Table A3. The number of matchups range from 77 for l2gen to 230 for Polymer. It can be seen (Fig. 12) that the algorithms performed more similarly over inland waters compared to coastal regions while, generally, Polymer and C2RCC performed the best and Polymer attained the most match-ups. Acolite attained higher R 2 in the red and NIR bands, C2RCC achieved generally the lowest RMSD and ψ but with Polymer performing best in the 560 nm band.
The RMSD statistic provides a per-band average of residual difference between the in situ and estimated spectra. The shape of the RMSD plots in Figs. 10, 11 and 12 can be used to identify possible sources for the residual difference. The curves of the Sen2Cor and iCOR RMSD in the coastal regions (Figs. 10 and 11) appear very similar in shape to wavelength dependency of aerosol optical thickness and could be associated with the under-correction seen in the scatter plots for these ACs in Figs. 6 and 9. This under-correction is understood to be due to the methodology used by these algorithms to estimate the aerosol model and parameters. They rely on the presence of land pixels, and if they are absent default to a low estimate for aerosol optical thickness (0.2 for Sen2Cor (Main-Knorn et al., 2017) and 0.1 for iCOR). This shape is not evident in the inland water dataset (Fig. 12) where there are many more land pixels available to calculate a better estimate of aerosol optical thickness. The shapes of the C2RCC, Polymer and l2gen RMSD plots in Figs. 11 and 12 appear to scale with the amplitude of water-leaving reflectance, which suggest that the aerosol scattering is not the dominant part of these errors.

Combined and common match-up datasets
Analysis has also been undertaken using the combined coastal and inland datasets (albeit with significantly more coastal than inland water observations) and a dataset of match-ups common between all AC algorithms (consisting of 521 match-ups). The ratio of coastal:inland match-ups for each AC in the combined dataset is: Acolite 1179:163, C2RCC 1416:163, iCOR 1420:217, l2gen 982:77, Polymer 1438:230, Sen2cor 1428:160. For the common dataset there were 514 coastal and 7 inland match-ups.  Tables A4 and A5 for the combined and common datasets respectively. Polymer and C2RCC again performed best on these datasets, producing high R 2 and low RMSD, ψ and δ. In both datasets, C2RCC tended to give lower RMSD, ψ, δ and intercept, whereas Polymer achieved higher R 2 and produced more match-ups in the combined dataset.
Inspecting the common match-ups (Table A5 and Fig. 13) very little separated Polymer and C2RCC in R 2 and RMSD, with C2RCC showing lower ψ for all wavelengths except 705 nm. Acolite, l2gen, Sen2cor and iCOR all had ψ > 100% for all wavelengths except 490 and 560 nm for l2gen and 560 nm for Acolite. RMSD was close to a factor of 2 (or greater) more for these ACs, and the R 2 was lower for bands < 705 nm.

Spectral similarity
The spectral angle was calculated for all the match-ups for each of the regions and the set of common match-ups for all AC algorithms (Fig. 14), where a smaller angle implies a more similar shape. In the Baltic Sea dataset, 75% of the spectra for all AC algorithms had a spectral angle < 50°and median < 32°. For the W Channel region all the AC algorithms except l2gen had median spectral angle < 36°, with Polymer and C2RCC having the tightest distribution and lowest medians, both at 8°. For the inland waters C2RCC has the lowest median spectral angle (20.4°), followed by Sen2Cor, iCOR, Polymer, Acolite and l2gen. When considering only the common set of match-ups, Polymer had the tightest distribution and together with C2RCC the lowest median angle, whereas l2gen had the widest distribution and median angle of 70°.

Implications with band ratio algorithms
Ratios at wavelengths: 444/560, 490/560 and 704/665 were produced and are shown in Fig. 15, with the statistics based on the combined match-up dataset in Table A6. The scatter plots show a much better agreement for the blue-green ratios than the NIR-red ratio, with the 490/560 ratio showing the least scatter around the line of equality for the three ratios tested. Polymer together with the 444/560 ratio gives the highest R 2 with low RMSD, ψ and δ. However, iCOR in the 490/560 ratio also has low RMSD, ψ and δ but a lower R 2 compared to Polymer. The statistics for l2gen were particularly poor in the 704/665 ratio, skewed by the Baltic dataset (off the axes in Fig. 15). The C2RCC plots show a small set of points with identical values suggesting that additional flagging of invalid results may be needed.

Summary discussions
C2RCC and Polymer performed best overall in both coastal and inland regions and achieved the lowest median and inter-quartile ranges of spectral angles in the coastal regions. This suggests that they best reproduced the spectral shape of the in situ data from all AC models under comparison. However, none of the AC models performed well over the entire bandset (444 to 865 nm). Sen2Cor and iCOR performed relatively poorly in the coastal waters -as expected due to the land-based methodology which generates scene parameters requiring a distribution of pixels containing land. In the absence of land, these corrections fall back to default values of aerosol optical thickness (set in configuration files). Therefore it is suggested that the tested versions of these processors are not used outside their recommended scope. However, for the inland waters, they showed improved performance, suggesting that these algorithms are better suited to correct inland water bodies rather than coastal zones. C2RCC achieved the lowest (best) median spectral angle for the inland water region, followed by Sen2Cor and iCOR.
The number of match-ups identified for each AC (excluding l2gen for reasons previously mentioned) was similar, with the exception of Acolite in the coastal datasets, which returned fewer match-ups from a comparable number of granules.
All AC methods compared use different methodology (see Section 3.1). It is therefore interesting that two of these methods (polynomial fitting in Polymer and neural networks in C2RCC) performed similarly well across the different water types. It is also interesting to note that for the inland dataset (Table A3) Polymer performed best in the 560 nm band whereas C2RCC had better statistics in general for the other bands. The better performance of C2RCC with spectral angle in the inland dataset suggests it is resulting in spectra more similar in shape to the in situ spectra, which will be beneficial in the retrieval of diagnostic absorption properties. These results are mirrored in the common and full match-up datasets where Polymer and C2RCC again perform best, giving very similar RMSD, higher R 2 and the lowest ψ.
Of concern are the high uncertainties present across all the AC methods. The green (560 nm) and blue (490 nm) bands, corresponding to higher reflectance amplitudes in a large part of the data set, were handled best by all ACs, with the red, red-edge and NIR bands having often very high uncertainties (ψ > 1000%). These high uncertainties in the NIR bands are likely, in part, due to the low signal strength (high absorption) and lower SNR of the MSI sensor at these wavelengths (Table 1). It would be of interest to investigate whether further spatial resampling could improve results for these bands. C2RCC tended to get the lowest ψ but still > 250% in the red band (Table A5). The highest R 2 , 0.72, is returned by Polymer in the 560 nm band in the inland dataset, suggesting it resolves the overall amplitude of the spectrum relatively well, which could prove beneficial in mapping optically active components best characterized by light scattering, such as suspended sediment concentrations. Overall, however, R 2 values were rarely above 0.5. In the common match-up dataset (Table A5) only Polymer and C2RCC attained R 2 > 0.5.
One source of uncertainty is the adjacency effect related to the distance to shore. The spread of distance-to-shore measurements for each dataset (Section 2.2) implies that one would expect the inland waters to be affected more than the Baltic Sea and W Channel datasets. The RMSD at 560 nm for Polymer and C2RCC increases by approximately 25-55% in the inland dataset compared to the coastal ones. A   Fig. 10. Plot of the statistics for the Baltic Sea region (from Table A1). The plot of ψ is shown on a log scale and δ on a semi-log scale due to the large variation between processors and wavelengths.
part of this increase is likely due to the adjacency to land. In contrast, the RMSD at 560 nm for iCOR and Sen2Cor decreased in the inland dataset compared to the coastal by 10% and 25% respectively, likely due to better approximation of the aerosol from the presence of more land pixels.
If the end goal is to retrieve chlorophyll-a concentration through a band-ratio algorithm, then for clear waters the results warrant the use of the 490/560 blue-green ratio. This showed lower scatter around the line of equality than the 704/665 ratio and the 444/560 ratio. However, blue-green algorithms are not suitable in turbid regions. Therefore, AC algorithms need to improve to replicate the spectral shape in the NIR bands so that the NIR-red band ratio algorithms can be used in such regions. This is of high importance for coastal and inland waters which are naturally more turbid. The performance of AC algorithms across the three datasets and three band ratios varied, but C2RCC, iCOR, Polymer and Acolite out performed l2gen and Sen2cor when taking into account all datasets, the latter two performing particularly worse in the Baltic Sea.
Further improvement with all processors that are still under development may be expected as additional insights are provided from match-up analyses such as presented here. A new algorithm in Acolite has recently been released and is currently undergoing validation. In future work, provided additional in situ observations, the performance of individual processors over optically different water classes should be explored, so that the best-performing techniques may be implemented in applications for specific optical water types.
This value of the availability of hyperspectral in situ data from a combination of hand-held and automated sensors cannot be understated. The hyperspectral data cover the 400-900 nm spectrum with a contiguous set of narrow bands, allowing the data to be averaged using the spectral response function of the MSI instrument. These sensors can be used complementarily with others such as the AERONET-OC dataset. Automation of these systems will ensure that new sensors and algorithms can continue to be validated within much shorter timescales and against a much wider range of atmospheric and water conditions, than previously achieved.

Conclusions
Six atmospheric correction algorithms were tested with Sentinel-2A MSI data. A dataset of in situ measurements from different sources covering a variety of water bodies, classified as coastal (Baltic and Western Channel) and inland, has been used to generate a set of matchup points within a ± 3h window with the Sentinel-2A overpasses for coastal data and ± 24 h for inland. These match-ups have then been used to generate statistics to rate the performance of each algorithm with the aim of guiding algorithm developers to continue to improve their products for Sentinel 2 MSI data. Fig. 11. Plot of the statistics for the W Channel region (from Table A2). The plot of ψ is shown on a log scale and δ on a semi-log scale due to the large variation between processors and wavelengths.
The match-up procedure broadly follows the recommendations from (Bailey and Werdell, 2006) but with some differences for inland waters. That is: using a consistently processed in situ data set and removing suspect data, using suitable time windows of ± 3 h for coastal and ± 24 h for inland water, using the mean of a window of data (here a 60 m resampled pixel is essentially the mean of 10 m or 20 m resolution bands) and applying the appropriate masks from Idepix and the AC algorithms. A homogeneity test was not used on the pixel window but standard deviations were used in the Deming regression for the slope and intersect statistics.
The statistics derived from each dataset showed high levels of uncertainties with low R 2 for most algorithm/dataset/band combinations (Tables A1-A5). These will have been due to a combination of error in the MSI and in situ datasets. But even with these uncertainties, it is possible to draw conclusions on performance of algorithms in different water bodies and see which of the MSI bands are corrected adequately. Results from this analysis should be considered with the proviso that all algorithms are still under active development with improvements expected at each new release. However, the algorithms are considered mature and are in use, so it is appropriate to test their current performance. The red to NIR bands do not show good results, with much higher uncertainties, which is a concern for coastal and inland water monitoring as these bands are required to determine key parameters such as chlorophyll-a concentration and turbidity.
It is anticipated that the majority of match-ups will be affected to some extent by the adjacency effect of nearby land. A study by Bulgarelli and Zibordi (2018) suggests that MSI data can be affected at least up to 20 km from the shoreline, dependent on the land cover types and AC methodology. Only the iCOR AC algorithm had an adjacency correction built-in. With 50% of the inland match-ups < 0.5 km from the shore, adjacency will be a significant error source and should be considered by all inland AC developers.
A varied selection of lakes is included in the match-up dataset representing a high diversity of optical water types (see Table 3). Nevertheless, the size of the dataset is still limited and prevents further analysis of the performance of AC over clear versus turbid and productive lakes. Given the continued development of the AC processors, including new algorithms being implemented, we identify a need for a repeat analysis, particularly as both Sentinel-2A and 2B could then be included. An increase in the number of match-up points leads to more robust statistics, and including more water bodies in the dataset gives further information on the performance under varying atmospheric conditions. With a much larger dataset and associated observations on the optical properties, it will be possible to separate the match-ups by water type, giving further information and possible targeting of water bodies with specific AC processors.
With a database of continuously acquired data, hopefully including an increasing number of automated stations to capture every overpass, Fig. 12. Plot of the statistics for the inland water match-ups (from Table A3). The plot of ψ is shown on a log scale and δ on a semi-log scale due to the large variation between processors and wavelengths. Fig. 13. Plot of the statistics for the common match-ups (from Table A5). The plot of ψ is shown on a log scale and δ on a semi-log scale due to the large variation between processors and wavelengths.

Fig. 14.
Box plots of spectral angle for each AC algorithm. This is a measure of similarity of the corrected MSI and in situ spectra. A smaller angle suggests a more similar shape. The whiskers denote the full range (min and max), the red line shows the median and the boxes cover the 25-75% range. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) and with global contributors, newly launched satellites could be validated and calibrated in shorter time frames than currently possible.

Acknowledgements
We thank Sandeep Chittimalli and Nima Pahlevan of NASA for providing the initial l2gen processed Sentinel-2A scenes, Liesbeth De Keukelaere for information on the iCOR algorithm and members of the EOMORES project consortium for supplying in situ measurements. Data storage and processing were performed using hardware purchased by NERC Earth Observation Data Acquisition Analysis Service (NEODAAS). Data from the Sentinel missions are freely available from the ESA data hub (https://scihub.copernicus.eu/dhus). This work has received funding from the European Union's Horizon 2020 research and innovation programme (EOMORES,grant agreement no. 730066,TAPAS,grant agreement no. 678396) and Copernicus Global Land Service.