Estuarine, Coastal and Shelf Science Validation of Landsat 8 high resolution Sea Surface Temperature using surfers

Nearshore coastal waters are highly dynamic in both space and time. They can be difficult to sample using conventional methods due to their shallow depth, tidal variability, and the presence of strong currents and breaking waves. High resolution satellite sensors can be used to provide synoptic views of Surface Temperature (ST), but the performance of such ST products in the nearshore zone is poorly understood. Close to the shoreline, the ST pixels can be influenced by mixed composition of water and land, as a result of the sensor’s spatial resolution. This can cause thermal adjacency effects due to the highly different diurnal temperature cycles of water bodies and land. Previously, temperature data collected during surfing sessions has been proposed for validation of moderate resolution (1 km pixel size) satellite ST products. In this paper we use surfing temperature data to validate three high resolution (100 m resampled to 30 m pixel size) ST products derived from the Thermal InfraRed Sensor (TIRS) on board Landsat 8 (L8). ST was derived from Collection 1 and 2 Level 1 data (C1L1 and C2L1) using the Thermal Atmospheric Correction Tool (TACT), and was obtained from the standard Collection 2 Level 2 product (USGS C2L2). This study represents one of the first evaluations of the new C2 products, both L1 and L2, released by USGS at the end of 2020. Using automated matchup and image quality control, 88 matchups between L8/TIRS and surfers were identified, distributed across the North-Western semihemisphere. The unbiased Root Mean Squared Difference (uRMSD) between satellite and in situ measurements was generally < 2 K, with warm biases (Mean Average Difference, MAD) of 1.7 K (USGS C2L2), 1.3 K (TACT C1L1) and 0.8 K (TACT C2L1). Large interquartile ranges of ST in 5 × 5 satellite pixels around the matchup location were found for several images, especially for the summer matchups around the Californian coast. By filtering on target stability the number of matchups reduced to 31, which halved the uRMSD across the three methods (to around 1.1K), MAD were much lower, i.e. 1.1 K (USGS C2L2), 0.6 K (TACT C1L1), and 0.2 K (TACT C2L1). The larger biases of the C2L2 product compared to TACT C2L1 are caused as a result of: (1) a lower emissivity value for water targets used in USGS C2L2, and (2) differences in atmospheric parameter retrieval, mainly from differences in upwelling atmospheric radiance and lower atmospheric transmittance retrieved by USGS C2L2. Additionally, tiling artefacts are present in the C2L2 product, which originate from a coarser atmospheric correction process. Overall, the L8/TIRS derived ST product compares well with in situ measurements made while surfing, and we found the best performing ST product for nearshore coastal waters to be the Collection 2 Level 1 data processed with TACT.


Introduction
The temperature of Earth's surface drives heat exchange between the surface and atmosphere and has important implications for climate in general. Surface temperature is strongly linked to water availability and water use, evapotranspiration, severe weather, and the growth and metabolic rates of organisms, among other processes. Ocean temperature change is described in the 5th Assessment Report (AR5) of accessed, 2021-04-10) and an Essential Climate Variable (https://gcos. wmo.int/en/essential-climate-variables/, accessed 2021-04-10). In the aquatic environment, thermal stratification of the water column impacts nutrient and oxygen cycling and is strongly linked to the rise and collapse of phytoplankton blooms of different species (Jones and Gowen, 1990;Wilhelm and Adrian, 2008;Murphy et al., 2011;Martinez et al., 2011) and in shallow well-mixed waters temperature is a determining factor of phytoplankton productivity (Trombetta et al., 2019). The recent proliferation of (harmful) algal blooms can be linked to eutrophication and increasing temperatures (Davis et al., 2009;O'Neil et al., 2012), while falling sea ice coverage and increased Atlantic water inputs are shifting the phytoplankton bloom dynamics in the warming Arctic (Oziel et al., 2017;Neukermans et al., 2018).
Orbital sensors are the only reasonable way to consistently assess surface temperatures (ST) at local to global scales, and as a result satellite derived ST has found many aquatic and terrestrial applications. Moderate resolution imagery was used to construct global climate records of open ocean ST, e.g. Reynolds SST (Reynolds et al., 2002) and the European Space Agency Climate Change Initiative (SST CCI) (Merchant et al., 2014(Merchant et al., , 2019) that serve as inputs to climate models and various other applications. Higher resolution imagery from Landsat has been used for mapping of river plumes (Brando et al., 2015) and the thermal effluent from power plants (Ingleton and McMinn, 2012) or treated wastewater outfall pipes (Trinh et al., 2017). In the terrestrial realm, urban heat islands have been studied using thermal satellite data (Yuan and Bauer, 2007;Liu and Zhang, 2011), and the occurrence of drought (Wan et al., 2004) and fire (Kaufman et al., 1998;Giglio et al., 2003) are regularly mapped. Evapotranspiration can be estimated with the aid of remotely-sensed ST, and Landsat in particular has the capability to resolve water use at the field level (Anderson et al., 2012(Anderson et al., , 2018Senay et al., 2016). The Copernicus Land Service produces hourly land ST at relatively coarse spatial resolution (5/112 • grid cell size) derived from multiple geostationary satellite observations (e.g. Freitas et al., 2013), that can represent the diurnal temperature cycle, and provides opportunities for improving state of the art land surface models (Orth et al., 2017;Nogueira et al., 2020).
Various methods exist for retrieving ST from satellite sensors based on observations either in single or multiple channels in the Thermal InfraRed (TIR). Single channel, single window, and split-window methods tend to show similar performance, but have different requirements for ancillary inputs (Sekertekin and Bonafoni, 2020). Single channel methods need a representative atmospheric profile and a set of radiative transfer simulations to derive the downwelling and upwelling atmospheric radiance and the atmospheric transmittance (Barsi et al., 2003;Cook, 2014). Knowing the target emissivity and the surface emitted radiance allows for the computation of ST. Alternatively, single channel algorithms -in this configuration also called single window or mono-window algorithms -can be based on pre-generated calibration coefficients for ranges of Total Columnar Water Vapour (TCWV). These algorithms may be simpler to deploy and may be more computationally efficient as no radiative transfer code has to be run, and can be readily integrated in cloud computing platforms . Splitwindow algorithms similarly use a set of pre-generated coefficients and two thermal bands to derive ST based on an estimate of TCWV and boundary layer temperature (Wan and Dozier, 1996). For Landsat 8, the use of single band methods using band 10 is preferred, due to stray light issues, especially in band 11 (Barsi et al., 2014;Montanaro et al., 2014a,b;Gerace and Montanaro, 2017), and for compatibility with heritage Landsat sensors (Cook, 2014;Malakar et al., 2018;Sekertekin and Bonafoni, 2020).
As the accuracy of the downstream products depends on the accuracy of the input ST, validation of the satellite derived ST using in situ measurements is of crucial importance, and is done either using matchups with contact thermometers Malakar et al., 2018;Vanhellemont, 2020a,b) or infrared radiometers (García-Santos et al., 2018;Malakar et al., 2018;Sekertekin, 2019;Vanhellemont, 2020b;Sekertekin and Bonafoni, 2020), e.g. from the SURFRAD network (Augustine et al., 2000). Due to the rather consistent temperature within a satellite pixel footprint and the well known emissivity of water, water targets are often preferred to the much more variable land targets. As was shown in Vanhellemont (2020a) the water ST derived from Landsat 8 compared well with in situ measurements made at measurement towers and buoys in the Belgian coastal zone (bias of < 0.1 K and scatter of < 0.7 K), but these are located several km from the coast and hence results may not be representative for nearshore waters. Recently, the use of benthic loggers  and loggers integrated into sports equipment (Brewin et al., 2015(Brewin et al., , 2017b(Brewin et al., , 2020b has been proposed for validation of nearshore satellite ST. The bulk temperature as measured by these accurate, affordable and easily deployed contact thermometers was found to be ≤ 0.13 K warmer compared to the radiative skin temperature measured by an Infrared Sea surface temperature Autonomous Radiometer (ISAR) (Brewin et al., 2021), rather consistent with the −0.17 K bulk-to-skin difference found in other studies (Donlon et al., 1999).
In the present paper, temperature collected during surfing sessions is used for the validation of high resolution imagery from Landsat 8. ST were obtained from the new Landsat Collection 2 Level 2 products, as well as processed with the Thermal Atmospheric Correction Tool (TACT, Vanhellemont, 2020a,b) using Level 1 data from both Collection 1 and Collection 2. The Collection 2 products provided by USGS contain ST and Surface Reflectance (SR) data, and are produced with their internal Landsat Product Generation System (LPGS) versions 15.3 and later. TACT is an open source processor for deriving ST from Landsat sensors (Landsat 5, 7, and 8), based on the libRadtran (Emde et al., 2016) radiative transfer code, that can be run using various atmospheric profile inputs. It is integrated into ACOLITE and freely available from https://github.com/acolite/acolite. The main advantage of TACT over Collection 2 data is the public availability of the code, and the use of a free radiative transfer code. In this study, the performance of these different methods is quantified using in situ matchups. To the best of our knowledge, this work presents a first independent intercomparison of Level 1 thermal data from both collections, as well as an evaluation of the standard Level 2 ST product for retrieving (Water) Surface Temperature.

In situ data
In situ measurements were obtained from: (1) surfing sessions in the southern UK, Ireland, and San Diego, California using Tidbit v2 loggers attached to the surfboard leash ( Fig. 1; Brewin et al., 2020a,b), and (2) surfing sessions in various locations using a Smartfin ( Fig. 1; Bresnahan et al., 2017), with temperature, motion and GPS data downloaded from the Smartfin website (https://smartfin.axds.co/). The leash measurements were processed as described in Brewin et al. (2020b), providing geolocated, median sea surface temperature data, with uncertainties, for each surfing session. The data are freely available through Brewin et al. (2020a) Smartfin data were processed following a method adapted from Brewin et al. (2020b). Firstly, Smartfin data were downloaded through the Smartfin Application Programming Interface (API, at https://stage. platforms.axds.co), with example scripts for accessing these data provided at https://github.com/SUPScientist/Smartfin_data_via_Axiom_API. The API provided a combined data file of motion (accelerometer) and ocean (temperature) data at 1/6 Hz. Motion data in the combined data file were subsampled by the data provider, from their original measurement frequency at 6 Hz, to the temperature measurement frequency. The following steps were taken to remove data that were not collected in the water. The rate of change in temperature was computed using the external temperature sensor (located on the tip of the fin), as was the difference between the internal and external temperature sensor Fig. 1. Photo of equipment used by surfers to measure water temperature. Tidbit sensors (shaded (white) and unshaded sensor) on leash of surfboard, and Smartfin, with integrated environmental sensor package (surfboard fin closest to the leash). Detailed descriptions of the system are provided by Brewin et al. (2015Brewin et al. ( , 2017bBrewin et al. ( , 2020b). on the Smartfin. Data were flagged when the rate of change exceeded 0.05 K, and where the temperature difference between internal and external temperature sensors was greater than 0.5 K, or less than −0.8 K. For cases where motion data were available, data were flagged when the pitch angle of the accelerometer (smoothed with a 5-minute filter) fell outside predetermined bounds (i.e., making use of the fact that a surfboard is relatively flat when being surfed in the water). Finally, the first and last 5 min of the temperature data were flagged (i.e., typically when the surfer is entering and exiting the water). All data were visually inspected, and for four surfing sessions (where there were no motion data available), additional measurements were flagged manually, that were clearly not characteristic of measurements collected in the ocean at the time. Having identified measurements when the Smartfin was in the water, the median of the temperature data from the external temperature sensor was computed and taken to be the water temperature for the surfing session. Uncertainties were also approximated for each surf, by taking the square root of the sum of the squared calibration error of the Smartfin (set to 0.05 K, based on laboratory tests in Brewin et al. (2020a)) and the squared median absolute deviation in temperature during each surfing session. Following this processing, and consistent with the leash measurements, geolocated, median sea surface temperature data, with uncertainties, were available for each Smartfin surfing session.
Comparisons between leash-based Tidbit and Smartfin measurements have shown excellent agreement, with no systematic differences (after applying a correction for solar heating in the leash processing) and with a mean absolute difference of around 0.07 K (for 141 surfing sessions, see Brewin et al. (2020b)). Both Smartfin and leash data have also been found to be in very good agreement with independent data collected from benthic and pier measurements (Brewin et al., 2015(Brewin et al., , 2017b(Brewin et al., , 2020b. The Smartfin has also been evaluated against a standard oceanographic temperature instrument (Seabird SBE38) on an oceanographic voyage through the Atlantic (Brewin et al., 2021), spanning a gradient in sea surface temperature (SST) of 19 K, and found to be in excellent agreement (mean differences and mean absolute differences of −0.01 and 0.06 K, respectively). For the combined leash and Smartfin data, bulk temperatures were adjusted to skin temperatures using a fixed −0.17 K offset (Donlon et al., 1999).

Satellite data
Landsat 8 (L8) has two sensors on board (Irons et al., 2012), the Operational Land Imager (OLI), a 9-band multispectral instrument covering the visible (VIS), near infrared (NIR), and short-wave infrared (SWIR), and the Thermal Infrared Sensor (TIRS) with two bands in the thermal infrared at around 10 and 11 μm. The OLI multispectral data is recorded at 30 m spatial resolution in 8 channels, and at 15 m resolution in the panchromatic channel. TIRS data is recorded at 100 metre resolution, and is resampled to the OLI 30 m grid during Level 1 processing and geo-orthorectification. Level 1 (L1) top-of-atmosphere data from L8 is currently available in two collections, Collection 1 (C1) and Collection 2 (C2). Released by the United States Geological Society (USGS) near the end of 2020, C2 for the first time includes standard Level 2 (L2) surface reflectance (SR) and surface temperature (ST) products (USGS, 2020). For L8/TIRS, the C2L1 processing contains several enhancements compared to C1L1, notably a change in absolute calibration, to reduce radiometric calibration errors after the 2017 stray light correction (Gerace and Montanaro, 2017), and new relative (detector-to-detector) gains to reduce along track striping that progressively got worse during the mission (https://www.usgs.gov/core-science-systems/nli/landsat/landsatcollection-2-level-1-data, accessed 2021-04-10). C1L1 data were obtained from Google Cloud Services (https://cloud.google.com/storage/ docs/public-datasets/landsat), and C2L1 and C2L2 data were obtained through the USGS EarthExplorer (https://earthexplorer.usgs.gov). L1 TIRS data from both C1 and C2 were processed using TACT (Vanhellemont, 2020a,b) using atmospheric profiles from the 0.25 degree ERA5 climate model, with radiative transfer results at the 0.25 degree spacing linearly interpolated to the individual 30 m pixels. All three ST (TACT C1L1, TACT C2L1, and USGS C2L2) are single band products derived from band 10 (B10), at around 10 μm. The top-of-atmosphere (TOA) radiance ( ) in B10 is a measurement of the surface radiance ( ) and atmospheric up-( ) and down-welling ( ) radiances ( ⋅ −2 ⋅ −1 ⋅ −1 ) transmitted through the atmosphere: where is the target emissivity, and the atmospheric transmittance. TACT (both C1L1 and C2L1) and USGS C2L2 use radiative transfer models to derive from , respectively libRadtran (Emde et al., 2016) and MODTRAN (Berk et al., 1999), by deriving the atmospheric parameters ( , , ) based on atmospheric profiles, with details provided in Vanhellemont (2020a,b), Cook et al. (2014), and Malakar et al. (2018). C2L2 uses emissivity values based on the ASTER Global Emissivity Dataset (Hulley et al., 2015) adjusted by the OLI derived Normalised Difference Vegetation and Snow Indices (NDVI and NDSI) for land pixels, and uses a fixed emissivity of 0.9880 for water pixels. TACT is here configured to use a fixed water emissivity value of 0.9926 for B10 (Vanhellemont, 2020a). With the atmospheric parameters and target emissivity known, can be inverted from Eq. (1), and is then converted to temperature (in K): where the L8/TIRS B10 specific coefficients are 1 = 774.8853 W m −2 sr −1 μm and 2 = 1321.0789 K. For most plots, ST were converted from K to • C (using an offset of −273.15) for familiarity with the encountered temperature ranges. L1 OLI TOA reflectance ( ) data were processed using ACOL-ITE (Vanhellemont, 2019(Vanhellemont, , 2020c to retrieve surface reflectances ( ) using the Dark Spectrum Fitting (DSF) method with a fixed atmospheric path reflectance over a 24 ×24 km region of interest centred on the in situ data locations. OLI and were used for quality control of matchup data (see next section).

Matchups
Same-day measurements made in situ and by satellite were identified as matchups. A 5 × 5 pixel box (at the 30 m OLI grid) centred on the pixel containing the in situ coordinate was extracted from the satellite imagery. These boxes were then quality controlled using simple thresholds on the OLI reflectance. Pixels were masked if the top-ofatmosphere reflectance ( ) in any band was greater than 0.3, rejecting land, clouds, high glint and floating objects or vegetation. Further tests were done on the 1.3 μm cirrus band, masking pixels where (1.3 μm) > 0.01 to reject cirrus clouds, and the 1.6 μm band, masking pixels where (1.6 μm) > 0.1 refining the test on floating objects and high glint. Pixels with surface reflectance ( ) greater than 0.15 in any band were also masked, rejecting atypical water spectra.
The matchups have varied fractions of masked pixels (MP, 0 to 24; boxes with 25 MP were rejected) and interquartile range (IQR) of temperature (up to 4K) in the 5 × 5 pixel extracted box, and the effects of filtering based on these values was further explored. The median value (P50) and interquartile range (IQR, P75-P25) were computed for each 5 × 5 pixel box to compare with the in situ measurements. Robust statistics were computed, as commonly used in SST validation studies (Merchant and Harris, 1999;Minnett et al., 2019). The Reduced Major Axis (RMA) regression line was computed for the matchups, as were the Mean Average Differences (MAD) and unbiased Root Mean Squared Differences (uRMSD) between in situ and satellite: with RMSD:

TIRS ST
TIRS ST from the three datasets agree to within a few degrees for the water scenes evaluated here. Among the three, TACT C2L1 data gives the lowest temperatures, and USGS C2L2 the highest. The C2 data (both TACT and USGS) show less noise and image artefacts compared to the C1 data, due to the improved processing of the TIRS data during L1 generation. Example products are shown in Fig. 2 for Smartfin and Tidbit matchups, where the along track striping is rather obvious in the C1L1 data and much reduced in both C2 datasets. After TACT processing, i.e. with the same atmospheric correction parameters, the C2L1 data are generally a bit colder than the C1L1 data, as a result of the updated calibration in C2. A step change at around −117.5 • E can be observed in the C2L2 product in the second example of the Smartfin matchups (second row, last column of Fig. 2, 2017-08-05), here with a magnitude of about 0.5-1 K. A similar step change can be seen in the right hand side panel of the graphical abstract to this paper. These step changes are in fact present in all C2L2 scenes, and originate from the atmospheric parameters used to generate the ST, presumably as a result of a tiling grid used in the USGS atmospheric correction. TACT products use a per-pixel interpolated atmospheric correction which means these step changes are not present. C2L2 uses a lower B10 emissivity factor over water than TACT, which would partially explain higher ST for C2L2 compared with TACT.
Scatter plots of the full 88 matchup dataset are presented in Fig. 4. These show that the scatter in the matchups is similar for the three processing methods, with uRMSD between 1.7 and 2.0 K. This is largely a result of the inherent sensor and image characteristics and the spatiotemporal mismatch between the in situ and satellite measurements. The MAD on the other hand shows that the bias increases from TACT C2L1 (0.8 K), to TACT C1L1 (1.3 K) and USGS C2L2 (1.7 K), with all methods on average slightly warmer compared to the in situ data.
In these matchups, relatively large vertical error bars are seen, that represent the IQR in the 5 × 5 pixel box. These are caused by spatial variability in the 5 × 5 satellite data box, largely as a result of close proximity to land (see e.g. Martí-Cardona et al., 2019). Points with large IQR show a larger warm bias of the satellite measurement. Especially the summer matchups from the San Diego area show large IQR due to the large difference in surface temperatures between land (>50 • C) and water (most points between 15-25 • C). Fig. 5 shows two horizontal transects through the matchup location for two of the San Diego area scenes (24 km ROI). These plots show the sharp transition between colder water and warmer land surface temperatures, and the position of the matchup location just at this transition, with the associated high IQR. For validation purposes it is justifiable to select stable targets, and hence matchups with high IQR (> 0.25 K in any of the processors) were removed. Additionally, 5 × 5 boxes where less than half of the pixels do not pass the optical data quality check were also removed, i.e. only matchups with < 13/25 MP using the OLI criteria listed above are retained. The MAD is relatively stable as function of MP, but the uRMSD increases rapidly when more than 12-15 pixels are masked (not shown). The uRMSD increases rapidly for matchups where over half the pixels in the 5 × 5 pixel box are masked, i.e. matchups where the surfer is positioned at the landwater edge, and hence where half of the 5 × 5 pixel box contains land. This effect illustrates the difficulty of validating the nearshore ST, but also shows the advantage of using data logged by surfers with variable positions. The surfers are usually close to or at the land-water edge, but in about 1/3 of our matchups their position is far enough to allow for a good validation matchup, while still being in the nearshore zone.
With this more stringent filtering on the satellite data, the matchup dataset reduces from 88 to 31 (5 Tidbit and 26 Smartfin), with about the same temperature range. The statistics for this reduced dataset improve considerably. Table 1 summarises the full and reduced matchup Q. Vanhellemont et al. Fig. 2. Selected scenes for the matchups with Smartfin (top two rows, Costa Rica and California) and Tidbit (bottom two rows, UK summer and winter). Each row shows the OLI RGB composite (655, 561, 483 nm reflectance between 0 and 0.15 linearly scaled to 8 bits) and the products derived from C1L1 and C2L1 data using TACT and the USGS C2L2 standard product. Non-water pixels are masked in white using the matchup filtering thresholds.
datasets. Fig. 6 shows the scatter plots of the high quality matchups, showing a reduction of the scatter to about 1.1 K uRMSD, and MAD ranging from 0.2 K (TACT C2L1), to 0.6 K (TACT C1L1) and 1.1 K (USGS C2L2). The TACT processed data has slightly higher biases (MAD <0.2 K versus <0.1 K) and higher noise (uRMSD 1.1 K versus 0.7 K) compared to previous results using Collection 1 data for offshore in situ measurements at towers and buoys (Vanhellemont, 2020a). This is likely the influence of the very nearshore location of the present matchups. The bias may be impacted by the bulk-to-skin correction applied to the in situ measurements, where we subtract 0.17 K from the in situ data to obtain equivalent skin temperature, which curiously is about the magnitude of the C2L1 TACT bias (0.19 K). Previous work has suggested that the skin and Smartfin differences may be smaller than 0.17 K due to the shallow depth of the Smarfin measurements (< 0.1 m), i.e. 0.13 K overall, and 0.06 K during the day, when the surfers would be collecting data (Brewin et al., 2021). Brewin et al. (2021) also suggested that in some cases, in a turbulent nearshore environment, the skin cooling effect may disappear completely, which needs to be further investigated experimentally. RMA slopes for the matchup subset are 0.99 and 1.01 for TACT C1L1 and C2L1, while the slope of USGS C2L2 is > 1.07, indicating that for this product the bias increases with target temperature. On average the satellite data is biased warm, but the bias varies slightly with the target temperature. In Fig. 7 the difference between satellite and in situ measurement is shown as function of the target temperature. TACT shows a cold bias for most of the colder (<15 • C) and some of the warmer temperatures (>20 • C) encountered in the matchup dataset. USGS C2L2 only shows 4 points with negative biases, all for the colder (<15 • C) temperatures. For the higher temperatures (>20 • C) the bias of C2L2 is close to the dataset average bias. The rather triangular shape of the TACT results ( Fig. 7 A and B) is caused by a remaining warm bias for some of the San Diego matchups in the 15-25 • C range, presumably as a result of large water-land temperature differences (Fig. 5). This effect is less visible in the USGS (Fig. 7 C) dataset, since their average bias also increases with Q. Vanhellemont et al.    5. Two transects through the 5 × 5 median filtered ST product for the San Diego coast, illustrating the large difference between water (pixels 0-400) and land (pixels 400-800) temperatures. The dotted lines represent the 5 × 5 filtered IQR. Note that TACT is used here with a fixed emissivity of water, and could hence underestimate the land temperatures. Note also in the left plot the step-change at around pixel 80 in the USGS C2L2 product as a result of tiling in the atmospheric correction. target temperature. This bias increase is a result of the atmospheric parameters used in the USGS processing -see next paragraphs.
The differences between TACT C2L1 and USGS C2L2 data can be attributed to both the lower emissivity used by the latter, and by differences in the atmospheric parameters ( , , and ). When modifying the TACT processing to use the USGS C2L2 emissivity value, the MAD increases from 0.2 to 0.4 K (A panel of Fig. 8, and Table 1). Replacing the atmospheric parameters by those of USGS C2L2, leads to an increase of MAD from 0.2 to 1.0 K (B panel of Fig. 8). The substitution of both emissivity and atmospheric parameters (essentially a local processing of the provided C2L1 to USGS C2L2) leads to an increase of > 1 K in MAD compared to TACT C2L1 processing (C panel of Fig. 8). The MAD of this product (1.2 K) is a bit higher than that of the USGS C2L2 (1.1 K) presumably due to the discretisation of the atmospheric parameters Q. Vanhellemont et al. Fig. 6. Same as Fig. 4 but for the 31 matchups left after filtering on 5 × 5 box temperature stability (IQR < 0.25 K) and masking level (< 13/25 MP).  in the C2L2 GeoTiFF files. Atmospheric radiances are discretised to 1/1000, and the atmospheric transmittance and emissivity to 1/10000. The origin of the bias coming from the USGS atmospheric parameters is further analysed by comparing the retrieved , , and to those retrieved by TACT. Fig. 9 shows the atmospheric parameters from USGS C2L2, and also those retrieved from the Atmospheric Correction Parameter Calculator (ACPC, Barsi et al. (2003Barsi et al. ( , 2005, accessed 2021-03-20), compared to the ones derived in TACT for the full 88 matchups. The Ordinary Least Squared (OLS) regression line is used to estimate the relative differences between TACT and the other sources of atmospheric parameters. A significant difference between TACT and USGS C2L2 atmospheric correction parameters is found for the , where USGS values are about 30% of the TACT values. For , ACPC retrieves an about 10% higher value compared to TACT. However, the impact of changes in on the final ST product are minor due to the high emissivity of water (see Eq. (1)). The and are very comparable for ACPC and TACT, with OLS slopes close to unity and low offsets (see also Vanhellemont, 2020a). Compared to TACT, for USGS C2L2 a lower (−6%) and for most of the range a higher (offset +0.3 W m −2 sr −1 μ m −2 and slope 0.93) are found, both changes that lead to higher surface temperature retrievals. The underestimation of causes an additive increase of the Q. Vanhellemont et al. Fig. 9. A comparison of the atmospheric parameters retrieved for the 88 matchups. The Ordinary Least Squares regression line is shown as is the square of Pearson's r coefficient.

Table 1
Matchup performance of the different processors. n is the number of samples, and m the RMA regression slope. The matchup subsets are indicated by the number of matchups (n = 88 for all, and n = 31 for the highest quality matchups). Below the line results of changing the emissivity and atmospheric parameters in TACT to those from C2L2 are given. bias in ST, while the lower slope and non-zero offset for causes a multiplicative increase of the bias in ST. In the scatter plots compared to in situ measurements, these effects are reflected in a vertical shift upwards and an increased slope of the USGS C2L2 points. The result of the slope in is an increase of the bias as function of the target temperature. To summarise, the USGS C2L2 product has a higher warm bias compared to TACT, as a result of the lower emissivity value (causing around + 0.2 K in MAD), and the difference in atmospheric parameters (causing around + 0.8 K in MAD) used in C2L2 processing.

Processor Data Level Emissivity Atmosphere n MAD (K) uRMSD (K) m (1)
For a final comparison, matchups were also obtained using the single window calibration and code of Ermida et al. (2020). This code uses TCWV from the National Centers for Environmental Prediction (NCEP), and ASTER emissivity data, and can be run easily on the Collection 1 data hosted on Google Earth Engine (Gorelick et al., 2017). Valid results were obtained for 29 out of the 31 low-variability matchups, likely due to additional cloud filtering in the Ermida processing. For these matchups the statistics of the TACT and USGS processors are very similar to the ones obtained before, with the same scatter and slightly lower biases (Fig. 10). The Ermida code gives higher scatter (2 K) and higher biases (2 K) compared to the other processors, and hence is about 1 K warmer compared to USGS C2L2, and 1.5-2 K warmer compared to TACT. While its implementation in Google Earth Engine is convenient, this method cannot at present be recommended for nearshore water surface temperature retrieval.
The nearshore supports the highest levels of marine biodiversity and productivity in our ocean and is the region humans interact the most with. It is among the most dynamic regions on our planet, and requires observing systems capable of capturing this high spatial and temporal variability. High resolution satellite observations are a critical component of such observation systems, but to maximise their potential, they need to be carefully compared with in situ measurements, to quantify accuracy and precision, essential for use in marine management applications. Owing to difficulties in deploying conventional oceanographic equipment in the nearshore, new solutions to in situ data collection in the nearshore are urgently needed. Here, we have demonstrated how one such citizen-science based solution (Brewin et al., 2017a), is capable of identifying the most accurate high-resolution satellite processor. Other in situ solutions exist to, including benthic temperature loggers , coastal gliders (Rudnick et al., 2004), autonomous beach buoy systems (Shively et al., 2016) and the tagging of marine vertebrates with sensors (Fedak et al., 2004). Integrating all these observations, with satellite and models, into a wider nearshore observing system will ultimately help monitor and manage this critical region of our oceans, in the face of uncertain environmental change.

Conclusions
• Three surface temperature products derived from L8/TIRS were compared for retrieval of water surface temperatures: Collection 1 and Collection 2 top-of-atmosphere data as processed by TACT (TACT C1L1 and C2L1), and the USGS standard Collection 2 Level 2 product (USGS C2L2). In the most stable subset of matchup data, all three products showed a comparable uRMSD of 1.1 K, but for both C1 and C2 data, TACT gave lower biases (MAD respectively 0.6 and 0.2 K) compared to USGS C2L2 (1.1 K). An additional processor implemented on a cloud computing platform was tested, but gave significantly larger bias and scatter. • The higher bias of the USGS C2L2 product can be explained by differences in the emissivity value used over water (0.9880 for USGS C2L2 and 0.9926 for TACT), and differences in the estimated atmospheric parameters. The lower emissivity in the USGS C2L2 product adds around 0.2 K in MAD, and the difference in and lower combined add about 0.8 K in MAD. The multiplicative error of the change in results in a bias of C2L2 that increases with target temperature.
• The Collection 2 data shows less along-track striping in the output images compared to Collection 1 data. When Level 1 data is processed with TACT, Collection 2 shows lower biases compared to Collection 1 data. The standard Collection 2 Level 2 product shows a tiling grid in the atmospheric parameters and hence the output surface temperature, leading to step changes of about 0.5-1 K at the borders of this tiling grid. Overall, with these results Collection 2 data as processed by TACT show the most promising water surface temperature products from Landsat 8 with lowest biases and fewer processing artefacts. • Data collected using low-cost contact thermometers integrated in sports equipment show promise for validation of high resolution satellite temperature products, but the transition zone between land and water has to be handled with care, especially when one of the two is significantly warmer than the other. Filtering based on the concurrent optical data from OLI, and the interquartile range of surface temperature in a 5 × 5 pixel box reduces the impact of land proximity and mixed pixels on the validation statistics. Surfing data could be similarly useful for validation of nearshore data from new missions such as L9/TIRS-2, launched in September 2021, and future missions, such as the Copernicus Land Surface Temperature Monitoring mission (LSTM), a High Priority Candidate Mission.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
TACT development was funded by the Federal Belgian Science Policy Office (BELSPO) under the BRAIN-be programme MICROBIAN project (BR/165/A1/MICROBIAN). Robert Brewin was supported by a UKRI Future Leader Fellowship (MR/V022792/1). The Lost Bird Foundation supported the development of the Smartfin project, including all Smartfin data acquisitions. The National Aeronautics and Space Administration (NASA) and the United States Geological Society (USGS) are thanked for acquiring and freely distributing Landsat 8 data. Google is thanked for hosting a copy of the Collection 1 Level 1 data on their Cloud Services. Smartfin is thanked for providing surfing data through their Application Platform Interface. The people recording and sharing data during their surfing sessions are thanked for their efforts, without which, this work would have not been possible. The authors of libRadtran are thanked for the release of their code under a GNU General Public License. The National Center for Atmospheric Research (NCAR) is thanked for hosting ERA5 data on a THREDDS/OPeNDAP server in their Research Data Archive (RDA). Two anonymous reviewers are thanked for their feedback on the paper.