Sensitivity analysis of the dark spectrum fitting atmospheric correction for metre-and decametre-scale satellite imagery using autonomous hyperspectral radiometry

: The performance of the dark spectrum ﬁtting (DSF) atmospheric correction algorithm is evaluated using matchups between metre-and decametre-scale satellite imagery as processed with ACOLITE and measurements from autonomous PANTHYR hyperspectral radiometer systems deployed in the Adriatic and North Sea. Imagery from the operational land imager (OLI) on Landsat 8, the multispectral instrument (MSI) on Sentinel-2 A and B, and the PlanetScope CubeSat constellation was processed for both sites using a ﬁxed atmospheric path reﬂectance in a small region of interest around the system’s deployment location, using a number of processing settings, including a new sky reﬂectance correction. The mean absolute relative diﬀerences (MARD) between in situ and satellite measured reﬂectances reach < 20% in the Blue and 11% in the Green bands around 490 and 560 nm for the best performing conﬁguration for MSI and OLI. Higher relative errors are found for the shortest Blue bands around 440 nm (30–100% MARD), and in the Red-Edge and near-infrared bands (35–100% MARD), largely inﬂuenced by the lower absolute data range in the observations. Root mean squared diﬀerences (RMSD) increase from 0.005 in the NIR to about 0.015–0.020 in the Blue band, consistent with increasing atmospheric path reﬂectance. Validation of the Red-Edge and NIR bands on Sentinel-2 is presented, as well as for the ﬁrst time, the Panchromatic band (17–26% MARD) on Landsat 8, and the derived Orange contra-band (8–33% MARD for waters in the algorithm domain, and around 40–80% MARD overall). For Sentinel-2, excluding the SWIR bands from the DSF gave better performances, likely due to calibration issues of MSI at longer wavelengths. Excluding the SWIR on Landsat 8 gave good performance as well, indicating robustness of the DSF to the available band set. The DSF performance was found to be rather insensitive to (1) the wavelength spacing in the lookup tables used for the atmospheric correction, (2) the use of default or ancillary information on gas concentration and atmospheric pressure, and (3) the size of the ROI over which the path reﬂectance is estimated. The performance of the PlanetScope constellation is found to be similar to previously published results, with the standard DSF giving the best results in the visible bands in terms of MARD (24–40% overall, and 18–29% for the turbid site). The new sky reﬂectance correction gave mixed results, although it reduced the mean biases for certain conﬁgurations and improved results for the processing excluding the SWIR bands, giving lower RMSD and MARD especially at longer wavelengths ( > 600 nm). The results presented in this article should serve as guidelines for general use of ACOLITE and the DSF


Introduction
A multitude of metre-and decametre-scale optical satellite sensors have been collecting regular imagery of the earth in the last few years, including Landsat 8 (2013-present), Sentinel-2 (2 units, 2015-present and 2017-present), RapidEye (5 units, 2012-present) and PlanetScope (100's of units since 2015).Due to the high spatial resolution at scales more relevant to human activities [1], data from these sensors have been successfully applied in remote sensing of coastal and inland water bodies, e.g. for the mapping of water depth [2], turbidity [3][4][5][6], water clarity [7,8], chlorophyll a concentration [9] and assessing impacts of dredging operations [10,11] and cyanobacterial blooms [12,13].In addition to the selection of cloud-free data, the performance of these applications mainly depend on the accuracy of the estimated atmospheric and air-water interface effects, the removal of which is frequently referred to as atmospheric correction (a/c).
Typical algorithms for a/c over open ocean and coastal waters include the use of assumed zero reflectance in the NIR [14] or SWIR [15], with non-negligible NIR reflectance estimated using the retrieved chlorophyll a concentration [16] or a simple water reflectance model and spatially consistent aerosols [17][18][19].Recently, the use of a multi-band NIR-SWIR algorithm has been proposed, offering improved performance over complex waters [20].Other approaches involve modeling of the water reflectance spectrum to best fit a given satellite pixel [21][22][23], or the unmixing of nearby land/water pixels to provide an estimate of the atmospheric contribution [24][25][26].The Dark Spectrum Fitting (DSF) a/c algorithm was designed for metre-and decametrescale sensors [4,27] in order to retrieve water reflectances and water turbidity over turbid coastal and inland waters.In essence, the DSF is an extension of the algorithms assuming a negligible NIR or SWIR signal over water, and the algorithms assuming a certain spatial extent of aerosol type and concentration.Rather than a priori determining the bands with negligible signal, the DSF determines the optimal bands for a given image or image subset.Recent studies found good performance of the DSF algorithm for clear and moderately turbid lakes [7,28,29], complex coastal and estuarine waters [4,[30][31][32], including extremely turbid estuaries [6,33].The DSF seems well suited for processing imagery over turbid and extremely turbid waters, especially with the aim of retrieving water turbidity.
The performance of a/c algorithms is most commonly evaluated through the comparison of satellite data with in situ measurements as close as possible in time, ideally bounding the satellite overpass.For this purpose, the AERONET-OC network [34] has provided invaluable long-term data for dozens of sites worldwide.The AERONET-OC stations are equipped with a multispectral narrowband CIMEL radiometer, typically 8 bands between 412 and 1020 nm, with band widths of 8-10 nm, that measures water-leaving radiance multiple times per day.A number of more recent deployments also include a number of Red and NIR bands, e.g. at 620, 667, 709, and 781 nm.The original bands on the CIMEL were designed specifically for the validation of ocean colour satellites, and are centered on typical narrow spectral bands of SeaWiFS, MODIS, MERIS, and OLCI, and may not be ideally suited for the validation of broadband high resolution sensors.Due to the large spectral variability of water targets within broad bands, the AERONET-OC measurements require band shifting [35] (e.g. through neural networks [36] or linear interpolation [4]) to be useful for this purpose.Furthermore, the AERONET-OC spectral coverage is sparse in the Red and Near-InfraRed part of the spectrum, with typically only 2 bands between 650 and 900 nm.Hyperspectral measurements are better suited for validation of broadband sensors, and eliminate errors originating in the band shifting process.Manual measurements of hyperspectral reflectance are time-consuming and efforts to match the overpass times of narrow swath satellites are rarely fruitful.To increase matchup data availability, autonomous systems have been deployed on ships of opportunity [37,38] and fixed measurement poles [39,40].For calibration and validation of ocean colour satellites, hyperspectral in situ measurements are made using buoys BOUSSOLE [41] and MOBY [42] that are deployed in typically clear oceanic waters.
In the present study, the DSF as applied to a number of high resolution satellites, in particular Landsat 8, Sentinel-2 A/B, and PlanetScope, is evaluated using matchups with two autonomously measuring PANTHYR systems [40] in two sites featuring distinct water types, one in the Belgian coastal zone, and one in the Adriatic Sea.This paper is intended as an update of validation results as previously presented [4,5,27], and includes a sensitivity analysis to various processing settings in order to make general recommendations to the users.A new method to estimate the sky light reflected at the air-water interface is implemented.This study provides the first validation of Panchromatic and Orange contra-band [13] water reflectances as derived from the Operational Land Imager (OLI) on board Landsat 8, and extends the validation to the Red-Edge and NIR bands on Sentinel-2 previously not possible with multispectral in situ measurements.

In situ data
Two autonomous PANTHYR systems [40] were deployed at the Acqua Alta Oceanographic Tower (AAOT, 45.3139°N, 12.5083°E) and at Research Tower 1 near Oostende (RT1, 51.2464°N, 2.9193°E), and are measuring independently every 20 minutes during daytime, respectively from 2019-09-26 and 2019-12-11.The water at AAOT is fairly clear, but is characterised by a large variability in optical properties due to its location in transitional waters [34,43].The water at RT1 is turbid with tidal variability and with an occasional outflow from the port of Oostende reaching the site.Deployment locations are shown in Fig. 1, and examples of water spectra illustrating the differences between the sites are provided in Fig. 2. The measurement protocol and processing steps follow the general approach of [44,45], but with sequential rather than simultaneous measurement of water and sky radiance -for details see [40].Viewing nadir angle is set to 40 degrees, and the relative azimuth to the sun was kept at 90 or 135 degrees for minimising sun and sky glint on the air-water interface and the shadow of the platform, depending on the deployment superstructure and given sun position.Water leaving radiance was computed from the average of 11 replicate measurements of total upwelling radiance (L u ), and 6 replicate measurements of sky radiance (L d ).This full measurement cycle for a single relative azimuth angle takes around 1 minute for the RT1 deployment, and 4.5-7 minutes for the AAOT deployment.These differences are caused largely by the hardware and software used on the two prototype systems, and to a smaller degree by the average water signal at both sites.These measurements are considered fast enough to avoid changes in the water mass, and in addition, temporal stability checks between adjacent scans are performed during processing [40].Wind speed was linearly interpolated from 6-hourly NCEP/MET model results to the time and location of the measurement in order to retrieve the fraction of L d reflected at the air-water interface (ρ f ) according to the lookup table provided by [44]: (1) Fig. 2. Median (solid) and 5th and 95th percentiles (dotted lines) ρ w spectra from the PANTHYR at AAOT (blue) and RT1 (orange).Number of spectra given in brackets.
The lookup table was linearly interpolated to the sun zenith angle and retrieved wind speed at time of observation.Water reflectances were then computed using the average of 6 replicate measurements of downwelling irradiance (Ed):  3. Imagery from all sensors was processed to water reflectances (ρ w ) using ACOLITE/DSF for a region of interest (ROI) around the site, as defined by a bounding box in latitude and longitude.In the present study, a 3x3 km ROI with fixed path reflectance estimation was used across all sensors and configurations.PS imagery was processed using the standard DSF [5] and DSF using the new atmospheric lookup table and the new sky reflectance correction (see section 2.3).For L8/S2, 1x1 and 12x12 km ROI were also used to evaluate the sensitivity of the aerosol and path reflectance determination.Output resolution was 30 m for L8 and 10 m for S2, except for one configuration processing S2 at 60m.L8/S2 imagery was processed using the following twelve processing options: 1. DSF Standard Dark Spectrum Fitting, using ancillary datasets for pressure, ozone and water vapour, with fixed path reflectance for an approximately 3x3 km ROI.

DSF+1x1km
As DSF, but for a 1x1 km ROI.

DSF+12x12km
As DSF, but for a 12x12 km ROI.
7. DSF+GC+NewLUT As DSF+GC, but using the new lookup table for atmospheric parameters (details in section 2.3).

DSF+NoSWIR
As DSF, but limiting the bands used in the path reflectance retrieval to those between 400 and 900 nm.
10. DSF+NoSWIR+SkyTOA As DSF+NoSWIR, but using the new sky reflectance lookup table integrated in the a/c (details in section 2.3).

Atmospheric correction
The top-of-atmosphere reflectance (ρ TOA ) is defined as a function of target reflectance (ρ s ) and atmospheric parameters [46], the path reflectance (ρ path ), down and upward total transmittance (T d and T u ), respectively for the sun-target and target-sensor paths, and the spherical albedo of the atmosphere (S), ignoring adjacency effects: For water pixels, the surface reflectance (ρ s ) is defined as the sum of the water-leaving radiance reflectance (ρ w ) and a component related to the air-water interface reflection of sky-and sunlight (ρ Sky ).The Dark Spectrum Fitting (DSF) atmospheric correction assumes invariability of the atmospheric properties for a given image or image subset, and uses a dynamic dark target and band selection to retrieve the atmospheric path reflectance, ρ path .A spectrum of lowest observed reflectances, i.e. the dark spectrum, ρ dark , is created and filtered for outliers, e.g. using the offset of a linear regression through the darkest pixels in each band.The targets providing the lowest reflectance typically include low turbidity waters, cast shadows, or dark vegetation, and can vary spectrally.For each band in this dark spectrum, the ρ path is fitted to give an estimate of aerosol optical thickness (τ a ) at 550 nm for two aerosol models (Continental and Maritime), assuming zero surface reflectance (ρ s ) for land pixels, or zero ρ s for water pixels after the removal of an estimated ρ Sky component.The minimum τ a retrieved across this spectrum is then used to derive the ρ path for all bands.The model giving the lowest RMSD between ρ path and ρ dark for the two best fitting bands is then used for further processing.
In this paper, two updates to the DSF are presented and evaluated: (1) a finer wavelength spacing in the atmospheric lookup tables, and (2) an alternative sky reflectance correction that takes the aerosol load into account.The atmospheric lookup table is simulated with 6SV [47] for the same configuration as the one in [4,27] but using a finer wavelength step: 10 nm from 380-900 nm, 100 nm from 900-1500 nm, and 50 nm for 1500-2400 nm.The lookup tables are generated using three atmospheric pressure values: 500, 1013, and 1100 hPa, for an equivalent elevation range of about 4000 m above to 500 m below sea level.During processing, the lookup tables are linearly interpolated to the ancillary or fixed pressure.In the previous version of the DSF, the sky reflectance was estimated just from the Rayleigh contributions, and is here updated to take both Rayleigh and aerosols into account.The ρ Sky was modelled using OSOAA [48] for the same geometry and wavelength configuration as the new atmospheric lookup tables, using the Continental and Maritime aerosol models.OSOAA was modified to make the water fully absorbing, in order to retrieve just the sky reflection at the water surface for a given view and sun geometry, i.e. the radiance (including Rayleigh and aerosol contributions) reflected at the air-water interface (L Sky ) normalized to the irradiance at the surface (E d ): The outputs were generated at sea-level with normal pressure and no wind, and the resulting lookup table can be used to generate hyperspectral ρ Sky for a given geometry, aerosol type and aerosol optical depth.The ρ Sky is then resampled to the sensor's relative spectral response and is transferred to the top-of-atmosphere (using the part after the plus sign of Eq. ( 3)) for integration in the atmospheric correction.Figure 4 shows an example of the modelled ρ Sky at the surface and at top-of-atmosphere for two aerosol models and a range of aerosol optical depths.

Matchups
Matchups were extracted from reference locations near the deployment towers, in order to avoid platform effects such as direct pixel contamination and shadows [49], as well as in-water wakes Fig. 4. ρ Sky for the Continental and Maritime aerosols at three optical depths (τ a at 550 nm), at surface level (solid lines), and at top-of-atmosphere (dashed lines).[3,27].For AAOT the location provided by [4] was used (45.3139°N,12.5083°E) and for RT1 the location was shifted about 90 metres east (51.2464°N, 2.9206°E).Matchups between satellite imagery passing the quality control and in situ measurements were divided in two classes depending on in situ data availability around the satellite overpass time: (1) matchups with two bounding in situ measurements, each within 20 minutes from the satellite overpass, (2) all other matchups within ±1 hour of the overpass, with or without bounding in situ data.In situ measurements at relative azimuth of 270 °were used for both sites.The matching L w and E d were convoluted to the relative spectral response of the satellite sensor (Fig. 3) and converted to water reflectance (ρ w ) using Eq. ( 2).For (1) the in situ ρ w were interpolated linearly to the satellite overpass time, and for (2) the closest measurements were used.Satellite measurements are the mean of a 3x3 pixel box centred on the pixel containing the reference location.Matchups were filtered automatically to exclude cloudy and partially cloudy scenes.The use of automated filtering is preferred to hand-picking of validation scenes.116 potential matchups were identified for L8/S2, 67 with and 49 without bounding in situ data.Some image filtering is inherent to the requirement of having available in situ data, and these remaining scenes were further filtered, excluding scenes according to the following criteria: • 95th percentile ρ t 1.3 µm > 0.005 in a 11x11 pixel box centred on the site location, excluding cirrus clouds (excluding 21/116 scenes, 18%), • 95th percentile ρ t 1.6 µm > 0.05 in a 11x11 pixel box centred on the site location, excluding clouds and severe glint (excluding 18/116 scenes, 16%), • 95th percentile ρ t 440 nm > 0.3 in the ROI, excluding (scattered) clouds in the area surrounding the site, (excluding 13/116 scenes, 11%), 111 potential matchups were identified for PS, 51 with and 60 without bounding in situ data.Filtering was done using the same criteria as L8/S2 for the Blue band (excluding 10/111 scenes, 9%), but, due to the more limited band set, replacing steps (1) and ( 2) by the 95th percentile ρ t in the Red band > 0.14 in a 11x11 pixel box on the matchup location (excluding 23/111 scenes, 21%).The PS filtering was less robust than that for L8/S2, and several scenes with cirrus clouds and cloud shadows are included in the analysis.
Reduced Major Axis (RMA) regression lines and squared correlation coefficients are provided.Error statistics for ρ w were computed as the Root Mean Squared Differences (RMSD) between the in situ (x) and satellite (y) measurements, the Mean Average Differences (MAD), and Mean

Results and discussion
In the period up to 2020-07-15, 58 and 24 matchups were found between PANTHYR and S2 (30 and 28 from S2A and S2B) and L8, of which 34 and 18 are from AAOT and 24 and 6 from RT1.Both sites are covered by two relative orbits by both satellites, and the lower number of matchups retrieved for RT1 compared to AAOT are due to the shorter deployment of the former (7 compared to 9 months) and the higher cloudiness in the Belgian coastal zone.Scatterplots of the matchups between S2 and PANTHYR for the DSF+NoSWIR+SkyTOA processing are shown in Fig. 5, and between L8 and PANTHYR in Fig. 6, with points coloured according to site (orange: RT1, blue: AAOT).Matchups with bounding in situ data are represented by a circle, the ones without by a triangle.The error bars on the scatter plots represent half the absolute difference between the measurements bounding the satellite overpass for the in situ data, and the standard deviation in a 3x3 pixel box for the satellite data.The matchups between satellite and in situ are generally along the 1:1 line, except in the Blue bands, where a lower (RT1) or higher (AAOT) estimate of the water reflectance is retrieved by the satellite compared to the in situ data.This can be caused by the over-or underestimation of Blue path reflectance, e.g. because of a differences between the aerosol model and optical thickness and the true aerosol properties.Some uncertainty arises from the sky reflectance estimate for both satellite and in situ measurements.The above water in situ measurement is quite sensitive to contamination by sky reflectance in the Blue, and some differences between relative azimuths were observed for PANTHYR at AAOT (see further).The correction of in situ above water measurements could be further improved by taking into account polarisation [50], spectral variability of the surface reflectance coefficient [51], and separating diffuse sky and direct sun components [52].Figure 7 shows the Landsat 8 Panchromatic band (resampled to 30 m) and the derived Orange contra-band.The Orange contra-band represents the reflectance in a spectral region centred on 613 nm (width 47 nm) that can be extracted from the Panchromatic band on OLI.A multilinear model based on in situ reflectance measurements is used to subtract the Green and Red bands, that overlap the Panchromatic region, and a Blue portion not covered by either overlapping band from the Panchromatic reflectance [13].Overall, the Panchromatic band reflectances are in between the Green and Red band performances, with good correspondence even in clearer waters at AAOT.For the Orange band, the clearer AAOT ρ w are underestimated, likely due to the importance of Blue reflectance not accounted for in the contra-band approach [13].Excluding matchups outside the algorithm domain (ρ w 443 nm / ρ w 665 nm < 2, ρ w 665 nm > 0.006) reduces the relative errors.Naturally, for the other bands limiting the matchups to these more turbid waters also reduces the relative errors.A summary of the results is presented as spectral plots of the RMSD, MAD, and MARD between the S2/L8 and PANTHYR matchups in Fig. 8.These spectral plots show that the RMSD increases from the NIR (0.005) to the Blue (0.015-0.02), consistent with increasing atmospheric path reflectance and associated a/c errors.The MAD between in situ and S2/L8 are rather flat spectrally with a few configurations showing a clear bias in the Blue: the DSF+GC+60m and DSF+NoSWIR without the new sky correction overestimates the atmospheric contribution (i.e.negative bias in the Blue), while the DSF+NewLUT+SkyTOA underestimates the atmospheric contribution in the Blue (i.e.positive bias in the Blue).The MARD shows that relative differences are lowest (<20-30%) for the bands with highest reflectances, i.e. the Blue-Green bands at around 490 and 560 nm, and highest (40-70%) for the NIR bands (>700 nm).Since the RMSD is rather flat spectrally for wavelengths >560 nm, the MARD increase towards the longer wavelengths is largely caused by the lower water signal.The relative differences are in general lower for L8 than for S2, with all but two methods having <20% MARD in the Blue-Green bands and the best performing method reaching 11%.The MARD increases to about 40-100% for the Orange contra-band retrieval (L8, 613 nm, triangles), which was originally developed for brighter waters, and the clear water matchups (14/24) in the present study reduce the accuracy.For the points within the algorithm domain [13] the Orange band retrieval improved to 8-33% MARD (Fig. 7), giving especially good performance for the methods including the SWIR bands.The L8 Panchromatic band reflectance (592 nm) is found to be in between the accuracy of the Green and Red bands, with MARD of 17-26% and RMSD of 0.006-0.008,indicating it is well calibrated and could be used for water quality applications.For S2, the MARD reaches 20% in the Green band for several settings, and 11% for the method excluding the SWIR bands.For most methods, the RMSD are lower for L8 than for S2, and with a flatter RMSD spectrum between 490 and 865 nm.The L8 error spectra are lower and also more tightly bundled, indicating a larger stability between the processing settings.The low L8 RMSD values are approached only by S2 for the processing excluding SWIR bands and using the new sky correction.The processing without the SWIR bands gives the lowest MARD and RMSD for S2, where differences are especially pronounced in the NIR.Excluding the SWIR bands from the estimation of path reflectance shows the biggest improvement for S2, indicating perhaps a calibration issue [53] with those bands on MSI.For L8, excluding the SWIR bands gives comparable results to the standard DSF, with slightly higher errors, and an underestimation of the water reflectance in the visible bands not present in the standard DSF (as seen by the larger negative bias).Overall these results indicate the robustness of the DSF to band availability, but that for L8, keeping the SWIR bands gives best results.Although it gives the lowest and flattest MAD, the application of the new sky reflectance correction gives mixed performance results in terms of RMSD and MARD, with the largest positive impact at longer wavelengths for the methods excluding the SWIR bands.These statistics give a performance overview for both sites combined, but due to different magnitude and range in the average water reflectance, the relative performance at each site can be quite different.For example, the S2/L8 Green to Red reflectance MARD from DSF+NoSWIR+SkyTOA is 10-18% for RT1, and 14-45% for AAOT, while the NIR (at 865 nm) ranges are 43-70% for RT1 and 54-90% for AAOT.There are some impacts of ROI size, with the 12x12 km ROI giving a RMSD approximately 0.003 higher than the 3x3 km for both L8 and S2, and a positive MAD across the spectrum.This indicates that a larger ROI leads to a too low path reflectance, as there are darker pixels in the larger subset.Interestingly, the 1x1 km ROI gives a RMSD about 0.003 lower in the longer wavelengths compared to the 3x3 km ROI, while a higher RMSD (by about the same amount) is found for the Blue and Green bands.The MAD in the Blue and Green bands shows a larger negative bias for the 1x1 km ROI compared to the 3x3 km ROI, while for the NIR bands the 3x3 km ROI bias is closer to zero.A smaller ROI seems appropriate if, combined with the sensor resolution, there are enough pixels to retrieve a good path reflectance estimate.For processing subscenes with a fixed path reflectance these differences in box size have minimal impact on computing time.For tiled processing of full scenes however, smaller tiles increase computation time significantly.These results indicate that a 12x12 km tiling can be sufficient for processing larger scenes, giving a fourfold performance increase compared to the 6x6 km tiling used in [4].
The use of fixed defaults instead of ancillary inputs for atmospheric gases and pressure slightly increases the differences, especially in the NIR where there are more gas absorption features in the bands, but there is no large impact on the overall performance.Both validation sites are at sea level, so no large impacts of atmospheric pressure fluctuations are expected.
In the period up to 2020-07-15, 88 matchups with the PlanetScope constellation were found, 10 from the 0e series and 78 from the 0f series, of which respectively 4 and 48 were from different satellites (Fig. 9).Of these 88 matchups, 52 are from RT1 and 36 from AAOT.Here, more matchups are found for the former site as a result of its proximity to land, and the PlanetScope acquisition plan focusing on land masses.The differences between PlanetScope and PANTHYR are larger than for L8/S2, with the RMSD generally > 0.01 across the spectrum, and the MARD for the best performing bands (Green and Red) is <40% for the standard DSF, and 33-65% for the DSF with the new sky reflectance method (Fig. 10).The RMSD is lower for the 0e family, due to a low number (10) of matchups that happen to have a better correspondence to in situ measurements, and hence the combined PlanetScope performance shows a closer alignment to the performance found for the 0f family.The MAD results show a systematic overestimation of NIR reflectance, with underestimation of the visible reflectances, which is likely the result of the relatively poor NIR band performance [5].Some cloud and cloud shadow contamination is present for these matchups, that could not be filtered out based on the automated matchup criteria.Only minor differences are found also here for the updated lookup tables, and the previous tables are deemed adequate for PS processing.The new sky reflectance correction provides a small additional improvement in the NIR, but shows worse results in the visible bands.Overall performance statistics are impacted by the different characteristics of both sites, with the PlanetScope data clearly being less suitable for the lower turbidity site.Assessing the more turbid RT1 site separately, relative errors for the standard DSF in the Green and Red bands drop from 24-40% to 18-29%, while errors increase to 32-57% when assessing AAOT separately.For the new sky reflectance correction similar patterns are observed, with Green and Red band MARD 27-40% for RT1 and 42-100% for AAOT.
At AAOT, PANTHYR measurements were made at two relative azimuth angles to the sun (225°a nd 270°), but data from a single relative azimuth angle (270°) were used.Some differences were observed between measurements from the two azimuth angles, especially for the up-and downwelling radiances (L u and L d ), as different parts of the sky were measured and reflected on the air-water interface.There may also be some impacts of tilt in the installation of the prototype PANTHYR at AAOT.This was experimentally investigated on 2020-02-28: E d was measured every 20°for a full azimuthal rotation throughout the day, so for a range of sun zenith angles, θ s .This experiment was supplemented by comparison of E d measurements made at different azimuth throughout the deployment.On the basis of these tests, the E d uncertainty due to tilt is estimated at 2.5-6.5% for the Landsat and Sentinel-2 matchups with AAOT used here (θ s 40-70°with a mean of 61°).For the VNIR OLI and MSI bands, these differences in E d , L u , and L d resulted in average differences in L w of <2.5% for all but the shortest blue bands around 443 nm, which was just over 3%.For PlanetScope the average differences in L w were <2%.For all sensors, this resulted in ρ w differences between the two azimuths of <6% at 443 nm, <2.5% for the 490-670 nm bands, and 5-7% in the NIR.Lowest differences (<0.25%) were found for the bands around 560 nm.These differences are acceptable for the present application of validating high resolution broad band sensors, but their causes need to be further analysed.In situ measurements at multiple azimuths may lead to improvements in deployment evaluation in terms of sensor verticality or platform impacts [54].The estimation of the effective sky reflectance factor (ρ f ) may need to be improved.With the present dataset, an iterative fitting of the ρ f (in Eqs. ( 1) and ( 2)) to match the NIR ρ w to the similarity spectrum [45] made the results from both azimuth angles agree to within 2% across the VNIR, and retrievals were closest to the original 270°data used here.The 270°data matched more closely with AERONET-OC measurements made from the same platform, likely due to the same relative azimuth to the sun of the measurements.

Conclusions
In a short period of time, automated hyperspectral radiometers deployed in the North (7 months) and Adriatic Sea (9 months) have collected a valuable set of validation data for satellite sensors spanning the visible to near-infrared (400-900 nm).By measuring every 20 minutes throughout the day, these radiometers collect high frequency data that can closely match any satellite overpass, even those with irregular orbits or overpass times (e.g.PlanetScope).The two PANTHYR systems used here represent the start of the global WATERHYPERNET hyperspectral validation network, which will be invaluable for validation of any VNIR imaging satellite.
The present study updates the evaluation of the Dark Spectrum Fitting (DSF) a/c algorithm available in ACOLITE for the processing of decametre-(Landsat and Sentinel-2) and metre-scale (various, including the PlanetScope CubeSats) optical satellite sensors.Not surprisingly, the relative performance is linked to the absolute signal in each band, and the best performances are found for the bands with the highest signal i.e.MARD with in situ for the Blue and Green bands are <20%, reaching 11% for the best performing configuration.For PlanetScope the MARD are 24-40% for the Green and Red bands across both sites, and 18-29% for the turbid water site, confirming the applicability of these 4 band CubeSats for the mapping of water turbidity.RMSD between satellite and in situ are < 0.015 for Landsat 8 and Sentinel-2, with RMSD < 0.01 at wavelengths >= 490 nm (L8) and >= 560 nm (S2), reaching <0.005 in the best case for the NIR bands.MAD for L8/S2 are spectrally rather flat, with some settings giving either a negative or positive bias towards the Blue, indicating over-or underestimation of the path reflectance, or the selection of a wrong aerosol model.RMSD for PS are in general close to 0.015, with a slight increase to the Blue and NIR.
These results show a better performance than the previously published results [4,5] in terms of relative errors, and a slightly worse performance in terms of absolute errors, which can be explained by the lower data ranges in the previous studies.In addition, the present study provides a realistic performance estimate for the Red, Red-Edge and NIR bands due to the more appropriate hyperspectral validation data from the PANTHYR.Configuration options of the DSF were evaluated using matchups from both sites.The use of ancillary data did not have a large impact over using default values, and increasing the wavelength resolution in the atmospheric lookup tables only had minor effects.The size of the region of interest (1x1, 3x3, or 12x12 km) in which the path reflectance was estimated did not show large differences, although the 3x3 km subset provided the lowest differences with in situ data.The new top-of-atmosphere sky reflectance correction improved results in terms of MAD across the spectrum, and reduced errors at longer wavelengths for the method excluding the SWIR.Severe glint is largely avoided here by the deployment period and the SWIR based image filtering, so no large differences were found for methods including sun glint correction.
Based on the results from this study, the DSF+NoSWIR can be recommended for both processing of L8 and S2, especially with the new top-of-atmosphere sky reflectance correction.The standard DSF can however still provide better performance for L8 in certain conditions for the Blue and Green spectral bands.If the S2 SWIR calibration is improved, there may be further improvements for the standard DSF.The new atmosphere lookup tables did not improve the results significantly, but will be made available to the users, and could perhaps be of further use for processing narrowband or hyperspectral imagery.The new lookup tables and sky reflectance correction did not really show a performance improvement for PS, although for the NIR some improvements over the standard DSF were found.

Fig. 3 .
Fig. 3. Relative Spectral Responses of (top left) Sentinel-2 A/B MSI, (top right) Landsat 8, and (bottom) PlanetScope.Lines are coloured to approximate the band centre wavelength.NIR bands are shown in Black, the 10 m MSI NIR and the OLI Panchromatic bands are shown in Grey.The OLI contra band is shown as a dashed line.

Fig. 5 .
Fig. 5. Scatterplots for the matchups between PANTHYR and the nine visible and near-infrared bands Sentinel-2 A/B for DSF+NoSWIR+SkyTOA.Orange and blue dots are measurements from RT1 and AAOT respectively.Circles represent matchups with bounding and interpolated in situ data, triangles show matchups with only the closest in situ measurement.

Fig. 6 .
Fig. 6.Scatterplots for the matchups between PANTHYR and the 5 VNIR bands on Landsat 8/OLI (DSF+NoSWIR+SkyTOA).Orange and blue dots are measurements from RT1 and AAOT respectively.Circles represent matchups with bounding and interpolated in situ data, triangles show matchups with only the closest in situ measurement.

Fig. 7 .
Fig. 7. Scatterplots for the matchups between PANTHYR and Landsat 8/OLI (DSF+NoSWIR+SkyTOA) for the Panchromatic band (592 nm) resampled to 30 m, and the Orange contra-band [13] (613 nm), excluding from the statistics points outside the algorithm domain (shown in Grey).Orange and blue dots are measurements from RT1 and AAOT respectively.Circles represent matchups with bounding and interpolated in situ data, triangles show matchups with only the closest in situ measurement.

Fig. 8 .
Fig. 8. Spectral plots of RMSD (top), MAD (middle) and MARD (bottom) for the matchups between PANTHYR and Sentinel-2 (left) and Landsat 8 (right) for different processing settings.Triangles in the Landsat 8 plots represent the orange contra band.

Fig. 9 .
Fig. 9. Scatterplots for the matchups between PANTHYR and the four bands on PlanetScope satellites from both 0e and 0f families (19 different satellites).Orange and blue dots represent RT1 and AAOT respectively.

Fig. 10 .
Fig. 10.Spectral plots of RMSD (top), MAD (middle) and MARD (bottom) for the matchups between PANTHYR and PlanetScope.Different lines denote different processing settings.The grey line shows the combined performance, at the average of 0e and 0f wavelengths.