Evaluating satellite estimates of particulate backscatter in the global open ocean using autonomous profiling floats

: Satellite retrievals of particulate backscattering (b bp ) are widely used in studies of ocean ecology and biogeochemistry, but have been historically diﬃcult to validate due to the paucity of available ship-based comparative ﬁeld measurements. Here we present a comparison of satellite and in situ b bp using observations from autonomous ﬂoats (n = 2,486 total matchups across three satellites), which provide b bp at 700 nm. With these data, we quantify how well the three inversion products currently distributed by NASA ocean color retrieve b bp . We ﬁnd that the median ratio of satellite derived b bp to ﬂoat b bp ranges from 0.77 to 1.60 and Spearman’s rank correlations vary from r = 0.06 to r = 0.79, depending on which algorithm and sensor is used. Model skill degrades with increased spatial variability in remote sensing reﬂectance, which suggests that more rigorous matchup criteria and factors contributing to sensor noisiness may be useful to address in future work, and/or that we have built in biases in the current widely distributed inversion algorithms.


Introduction
Satellite observations of particulate backscattering (b bp_sat ) have advanced our understanding of ocean biology over the last few decades by using b bp_sat as a proxy for particulate organic carbon [1], phytoplankton biomass [2], and particle size [3].There are three algorithms for which NASA's Ocean Biology Processing Group distributes b bp_sat products.While they are widely used, it remains a challenge to compare satellite and in situ observations of b bp at global scales.The extreme paucity of in situ data stands as a significant barrier to evaluating methods that retrieve b bp_sat , and ultimately to understand how these biases may influence the application of b bp for global questions.The ongoing deployment of worldwide autonomous floats equipped with backscattering sensors at 700 nm makes it possible to compare in situ b bp from these platforms with algorithms for b bp_sat .
Autonomous profiling floats (of the Bio-geochemical Argo program, [4]) have recently become the backbone of a global array for biogeochemical monitoring.These float measurements have since been used for a variety of applications [5][6][7][8], including regional assessments of ocean color products [9].Floats supply a valuable source of data to assess satellite performance because they continuously monitor the global ocean in seasonally under sampled places infrequently accessed by ships.Floats spend most of their time in cold dark waters ∼1000m deep, so biofouling and lateral drift are minimized [10], although they do occur, particularly in warm waters [11].There is low individual probability for coincident measurements with satellites (due to cloud cover and seasonal darkness), but the sustained sampling and spatial coverage of float observations yields many more satellite matchups compared to ship-based observations of b bp , especially in open ocean regions that are severely under sampled with respect to b bp .
This study builds off of the success of previous work that has advanced our understanding of remote sensing inversion algorithms to predict in situ observations of b bp [12,13].Most of these studies have relied on a limited number of shipboard observations and/or simulated datasets for the remote sensing reflectance (R rs ) values used to derive b bp .There exist more recent studies comparing R rs -derived estimates of b bp to in situ observations, but they do so at limited regional scales [9,14,15].A broader understanding of how spaceborne estimates of particulate backscatter compare with globally distributed in situ observations is still needed.The aim of this study is to compare b bp observations from floats (n = 2,486 surface matchups) and satellites in order to identify paths forward for improving b bp_sat in the open ocean.

Bio-argo float data: acquisition and processing
Vertical profiles of b bp_float (700 nm, m −1 ) were acquired from the Argo Global Data Assembly Centre (GDAC) ftp site (ftp://ftp.ifremer.fr/ifremer/argo/etc/argo-synthetic-profile,on 30 January 2019, [16]), the Southern Ocean Carbon and Climate Modeling database (SOCCOM, on 14 January 2019), the NASA North Atlantic Aerosols and Marine Ecosystems project (NAAMES, http://misclab.umeoce.maine.edu/floats/,on 18 January 2019), and through the Monterey Bay Aquarium Research Institute (MBARI, https://www3.mbari.org/chemsensor/MBARI_float_table/mbari_float_table.html, on 8 Feb 2019).Float measurements of b bp have a known uncertainty on the order of 10-15% [17].Floats are not perfect platforms and they can be subject to instrumental drift and calibration issues arising from using only the manufacturer calibration file rather than performing sensor-specific calibrations with dark counts.In all cases, the quality-controlled (QC) data flagged as 'bad data' by the ARGO QC Data management team were removed [18], as well as additional outliers.Outliers were defined as b bp_float values outside the bounds given by 1.5 times the interquartile range (+/− the 75 th and 25 th percentile, respectively) of log-transformed b bp_float data.In total, 850 observations were removed, or approximately 2% of all observations.After outliers and bad data are removed, this data compilation consists of 37,165 independent lat/lon/time b bp observed vertical profiles (700 nm, m −1 ) at resolutions ranging from 0.5m to 5m depending on the location and dataset (Fig. 1).
Profiles of b bp_float were processed to make them comparable with remote sensing reflectance products by computing the average b bp in the surface mixed layer.Mixed layer depth (MLD) was calculated in each profile as the depth where density exceeds 0.03 kg m −3 relative to the density at 10 m [19].Next, each vertical profile was de-spiked with a 3-point moving median within the MLD to remove the backscattering contribution of bubbles [20].De-spiking the profiles reduced the median of all b bp_float observations from 9.7 × 10 −4 m −1 to 9.0 × 10 −4 m −1 and it reduced (on average) the interquartile range within all profiles from ∼3 × 10 −4 m −1 to ∼1 × 10 −4 m −1 .Finally, we averaged the de-spiked b bp values within the MLD for each float profile, assuming that MLD is equivalent to the active mixing depth.We note using a processing depth of the first light attenuation layer does not change the values of b bp_float for the places and times used in this study.

Satellite data: acquisition and processing
Remote sensing reflectance (R rs , sr −1 ) data were acquired for MODIS (Moderate Resolution Imaging Spectroradiometer, nadir viewing resolution = 1 km), VIIRS (Visible Infrared Imaging Radiometer Suite, nadir viewing resolution = 750 m), and OLCI (the SENTINEL-3 Ocean and Land Colour Instrument, nadir viewing resolution = 1.2 km) sensors.MODIS and VIIRS Level-2 scenes were downloaded from the NASA ocean color website (https://oceandata.sci.gsfc.nasa.gov/)and OLCI Level-2 scenes were downloaded from EUMETSAT (https://codarep.eumetsat.int/).Specific ocean color scenes were identified for download if they coincided with a float observation within a +/-3 hour window.R rs data were considered according to the matchup criteria in [21], hereafter referred to as the 'Bailey and Werdell criteria.'In particular, the solar zenith angle must be less than 75 degrees, at least half of the pixels in a 5 × 5 pixel box centered on the float location must be unflagged (unflagged data are free from failed atmospheric corrections, are not land pixels, etc.), and the median coefficient of variation for bands between 412 and 555 nm (and for the aerosol optical thickness at 865 nm) within the 5 × 5 box must be less than 15%.Remote sensing reflectances that meet these criteria are considered to be in spatially homogeneous waters that are roughly coincident with the timing of a float observation, and are minimally influenced by clouds, sun glint, and/or other atmospheric correction problems.Figure 1 shows where and when there are float-satellite match-ups that meet the Bailey and Werdell criteria (n = 539 matchups for MODIS, n = 840 matchups for VIIRS, n = 1107 matchups for OLCI).The spatial bias in the distribution of float observations with satellite matchups is substantial due, in part, to the geographical bias in float distribution.
All R rs data were corrected to remove the contribution of Raman scattering Eqs.(1,2).Raman scattering is a spectrally continuous process whereby water molecules absorb photons and re-emit them at different wavelengths.If Raman scattering is unaccounted for, this can introduce bias into estimates of derived inherent optical properties, including b bp , especially for wavelengths greater than 550 nm [22,23].At the time of writing, the standard R rs products distributed by NASA (at all processing levels, from 1997 -present) are not corrected for Raman scattering.We note that the documented average difference between b bp_sat derived from Raman corrected R rs versus b bp_sat that does not use Raman corrections has been shown to be as large as 20% for wavelengths between 412 and 670 nm [24].
There are both empirical and analytical approaches [15,[23][24][25][26][27] to account for the Raman contribution to R rs (λ).Rather than investigate the differences in derived b bp_sat from the choice of Raman scattering algorithm, in this study we apply the correction of [25] and use interpolated parameters to specific wavelengths when appropriate.We choose the algorithm in [25] because it has documented low uncertainty, the parameters it requires were derived over a realistic range of open ocean optical values, and it involves no additional data inputs (which would introduce an additional source of uncertainty, documented in [26]).
The Raman corrected R rs is given by: Where R rs T is the R rs from satellite observations, and RF, the 'Raman Factor,' is given by: Values of β 1 , and β 2 were derived from Hydrolight simulations for chlorophyll values ranging from 0.02 to 3 µg L −1 and the reported model uncertainty is small (3%, [25]).We corrected R rs from each sensor for all wavelengths between 410 and 700 nm so that they can be used in inversion algorithms to obtain b bp_sat .The effect of Raman scattering is greatest at higher wavelengths.In this study, the median fraction of R rs to R rs T at 667 nm is 0.91 with an interquartile range of 0.04.The retrieval of b bp_sat is an inverse problem requiring a forward model that describes the relationship between R rs and b bp_sat .There are several published inversion algorithms available to determine b bp_sat , and we consider the three model b bp_sat products currently distributed by the NASA ocean color processing group: the Generalized Inherent Optical Property algorithm (GIOP, [28]), the Garver-Siegel-Maritorena algorithm (GSM, [29]), and the Quasi Analytical Algorithm (QAA, [30]).The goal of the GSM and GIOP inversion algorithms is to minimize the difference between modeled and observed R rs using routine optimization techniques (see Appendix A for more details).The goal of the QAA is to analytically invert R rs by first calculating the total absorption coefficient at a reference wavelength so that b bp_sat can be algebraically determined using the inverted ratio of total backscattering to total absorption.In these algorithms, b bp_sat is determined in combination with empirical coefficients for particulate absorption and absorption by colored dissolved and detrital material, which will both impact estimates of b bp_sat .When in their default configuration, the general architecture of these three algorithms is similar but they differ in their assumptions of inherent optical property spectral shape and in their optimization techniques.Complete details are given in Appendix A.
In this study, we calculate b bp_sat from the GIOP, QAA, and GSM algorithms for every location and time with a float matchup observation.Observations of b bp_sat are extrapolated to 700 nm for direct comparison to b bp_float using the assumed spectral slope of b bp that is specific to each algorithm (Fig. 2).The implications of using an assumed spectral slope within a remote sensing inversion algorithm are reviewed in the discussion.Because the distributions of b bp_sat and b bp_float data are skewed, we calculate basic summary and non-parametric statistics, which include the median and Spearman's rank correlation.

Results: b bp matchups and modeled b bp comparisons
There are slightly different matchup statistics between the satellite sensors (Fig. 1) depending on the satellite crossover time and nadir viewing spatial resolution.The probability of a satellite matchup is a strong function of location (Fig. 1).In order for floats to be used for satellite comparisons, they must surface in the daytime during clear sky conditions.The globally integrated effect of this availability is a spatially biased dataset to the Mediterranean and Black Seas.Sixty-three percent of the match-up data are from the Mediterranean and Black Seas, 15% are from the Southern Ocean, 9% are from equatorial regions, 3% are from the North Atlantic, and the rest are from varied locations worldwide.There is no systematic bias depending on the time difference (in hours) between a float and satellite observation (Fig. 3).
Comparisons of b bp_float and b bp_sat are shown for the different sensors across the GIOP, QAA, and GSM algorithms (Fig. 4), with summary statistics in Table 1.Considerable variations in these summary statistics are observed between sensors, algorithms, and regions, with median b bp values differing by more than a factor of two and ranging from 6.2 × 10 −4 m −1 to 0.0015 m −1 .The calculated bias (the median ratio of b bp_sat to b bp_float ) between b bp_sat and b bp_float ranges from 0.77 to 1.6, the Median Percent Error (MPE) spans from 27% to 65%, and Spearman's rank correlation coefficients (r) vary from r = 0.06 to r = 0.79 (Table 1).Despite these seemingly divergent results, several qualitative consistencies are also apparent in the  the ranking of the median b bp_sat observations from lowest to highest is nearly always GIOP < QAA < GSM, no matter which satellite or region is considered.Second, the Median Percent Error (MPE) is greatest for GSM in all cases and, no matter which sensor or region is considered, GSM (in its standard configuration) always has the lowest 'r' value relative to GIOP or QAA.Finally, b bp_sat derived from MODIS or VIIRS has lower MPE than those derived from OLCI.OLCI has the highest overall number of matchups and MODIS has the fewest, although in the Mediterranean and Black Sea, VIIRS has more matchups than OLCI.
Direct comparisons between b bp_sat and b bp_float in the Mediterranean and Black Seas are confounded by a high degree of scatter (Fig. 4), despite smaller documented variability of either b bp_sat or b bp_float in this region compared to the open ocean (Table 1).This noise limits the degree to which any obvious biases in the b bp_sat comparisons with b bp_float may be identified.When the Mediterranean and Black Sea data are excluded, a different relationship between b bp_sat derived from GIOP, QAA, and GSM and b bp_float is observed.In particular, at low b bp_float values (< 0.001 m −1 ) GSM greatly overestimates b bp_float and GIOP and QAA do so to a lesser extent.At higher b bp_float values (> 0.005 m −1 ), GIOP and QAA slightly underestimate b bp_float and there is negligible bias using GSM.
The performance of b bp_sat compared to global observations of b bp_float is a strong function of which algorithm or sensor is used.We found that GIOP and QAA performed better than GSM in oligotrophic regions, especially in waters with b bp (700 nm) < 0.001 m −1 , but in waters with higher b bp all three algorithms performed similarly.This finding is consistent with other studies that have shown good agreement when comparing GSM to either in situ or lidar based observations of b bp in waters of moderate to high b bp [31].In general, the MPEs reported in this study for QAA are larger (by a factor of 3) than those that have been achieved using in situ observations of b bp and shipboard observations of R rs within the QAA framework [12].A recently published inversion algorithm ( [15], hereafter 'LS2') yielded an MPE of 36% when calculated using LS2-derived backscattering from satellite R rs at 670 nm and in situ b bp (n = 25 matchup observations).The magnitude of this MPE is consistent with our reported MPEs from  MODIS and VIIRS using GIOP or QAA, and it reflects the degraded algorithm performance when using satellite rather than in situ R rs as inputs.
There are only 9 float observations common to all three sensors in space and time.This limits the degree to which differences in b bp_sat and b bp_float may be attributed to sensor specifications (e.g., differing signal to noise ratios) or to variability arising from additional float observations.However, if we assume that the underlying distributions of b bp_sat for any given satellite are unbiased with respect to sampling, we can test for statistically significant differences between sensors.This assumption requires that all sensors contain observations from common regions although not precisely at the same time, they have similar proportions of the Southern, Mediterranean and Black seas compared to all data, and they all sample the Pacific and Atlantic basins.
We first tested the statistical differences in the different b bp_float matchup distributions for each sensor to ensure the above assumption is valid.The b bp_float distribution of values at OLCI matchups is statistically different from either MODIS or VIIRS.There are statistically significant differences (at the 5% significance level using a two-sample Kolmogorov-Smirnov test, p < 1.0 × 10 −5 in all cases) in the distributions of GIOP, GSM, and QAA b bp values derived from either OLCI, MODIS, or VIIRS.For example, the b bp_sat values from MODIS, OLCI, and VIIRS using the GIOP algorithm are statistically different from each other.In fact, no two distributions of b bp_sat within any given algorithm are similar, even though they are calculated using the same formulae.On the other hand, when b bp_sat distributions are tested for the GIOP and QAA algorithms, they are indeed statistically similar at the 5% significance level (p = 0.92), no matter which sensor is used, for all places except the Mediterranean and Black Seas.This suggests that there is greater similarity in the b bp observations within a given satellite and/or region than among algorithm choice across different sensors.

Discussion
Satellite derived estimates of the particulate backscattering coefficient at 700 nm vary among sensors.No matter which algorithm is used, we found that b bp_sat correlations are highest in open ocean waters with MODIS (r = 0.60 to 0.79) compared to VIIRS (r = 0.21 to 0.38) and OLCI (r = 0.32 to 0.47) because R rs observations are noisiest using either VIIRS or OLCI, on the basis of their coefficient of variations within the 5 × 5-pixel box, and on their relatively higher MPEs.The original Bailey and Werdell criteria stipulate that the median coefficient of variation in R rs around a 5 × 5-pixel box be less than 15%.Although we found that all three sensor data sets have median coefficients of variation less than 15% for any location and time, VIIRS and OLCI have higher average coefficients of variation (8% and 30%, respectively) than MODIS (5%) for all bands between 412 and 555nm, as well as the aerosol optical thickness at 865.The coefficient of variation has a strong dependency on wavelength and surpasses 15% at wavelengths longer than 555nm.This finding suggests a revisit of the Bailey and Werdell criteria, which were originally formulated to leverage the quality of satellite observations with the quantity of in situ matchups.Using a median coefficient of variation in R rs (λ) does not reflect the average spatial variability within a particular place and time.Choosing a different metric to quantify spatial heterogeneity, such as the standard deviation in R rs for each wavelength, may be useful so that it can be directly contrasted with the uncertainty expected from the spectrally varying signal-to-noise ratios that are specific for each sensor.The results herein support the use of autonomous floats for performing ocean color validations.Now that the quantity of in situ matchups is less of a limiting factor, more restrictive matchup criteria may be appropriate in future work.
Satellite derived estimates of the particulate backscattering coefficient at 700 nm also vary among algorithm choice.The GIOP, GSM, and QAA algorithms can all be optimized with local datasets, and their spectral shape assumptions of optical properties can be changed with different formulae if desired.We chose to compare them here in their default operational configuration because the products from these three algorithms are widely distributed, and thus it is critical to examine how they may be biased when confronted with a large global dataset.Importantly, their wide distribution does not imply that these are the best inversion algorithms available.If we view the results herein as a sample representation of how well we currently retrieve b bp from satellites, we can outline avenues for improvement in future work.
In the default configuration, the major difference between the GSM and either the GIOP or QAA is the assumed b bp spectral slope.It is critical to evaluate our results in the context of b bp slope because floats provide b bp at 700 nm and ocean color b bp values are typically compared between 440 and 555nm.The GSM assumes a constant b bp spectral slope of −1, whereas the GIOP and QAA allow the b bp spectral slope to vary as a function of R rs ratios, effectively making the b bp slope a function of chlorophyll.In its standard configuration, the GSM overestimates b bp by factors up to 50%, which is a probable consequence of both the constant b bp spectral slope and the Raman scattering correction, which has been shown to introduce large (> 20%) uncertainties in b bp when GSM is used [24].These findings suggest that caution should be applied when using the GSM algorithm in oligotrophic environments with low b bp because it could result in large overestimates of particulate organic carbon (POC) or phytoplankton biomass when b bp to carbon relationships are used.
One way to avoid uncertainty introduced by the assumption of b bp spectral slope is to employ inversion algorithms that do not require an assumed shape of b bp.This can be done by retrieving the spectral beam attenuation coefficient from ocean color [32] so that b bp may be expressed as the linear difference between particulate beam attenuation (which is a well described function of wavelength) and particulate absorption.The 'LS2' model is a more recent and promising method that updates the Loisel and Stramski (2000) inversion algorithm [27] and independently retrieves b bp at any wavelength knowing R rs and the solar zenith angle.The radiative transfer simulations used to develop this model were extended so that it can be applied in both clear and turbid waters.
where G 1 = 0.0949 and G 2 = 0.0794 for use in the GIOP and GSM model.The method in [30] uses G 1 = 0.0985 and G 2 = 0.1247 within the QAA to leverage the coefficients that work best for highly scattering coastal waters and lower scattering oligotrophic seas.
Determining b bp from Eq. ( 3) is carried out in different ways depending on the inversion algorithm used.The GIOP determines b bp by solving for the combination of IOPS that produce the lowest misfit between reconstructed r rs and observed r rs .To accomplish this, a(λ) and b b (λ) are expressed as the sum of their parts: Where a w and b bw are absorption and scattering by pure water (respectively) and M denotes the eigenvalues that modify its eigenvector (a* or b*) for CDOM and non-algal particles (collectively dg), phytoplankton (ph), and all particles (p).In the GIOP default configuration, a * dg (λ) = e -S dg (8) where S = 0.018 nm −1 and is spectrally independent.a * ph (λ) is determined from satellite estimates of chlorophyll a, normalized to 0.055 m 2 mg −1 .The spectral shape of b bp is given by b bp (λ) = b bp (λ 0 )( λ / λ 0 ) −γ (9) where γ is an empirically derived function using blue to green band ratios of r rs , which effectively makes γ a function of chlorophyll.The three unknown eigenvalues are solved, and b bp is thus determined, using the Levenberg-Marquardt least squares method to invert observed r rs .
In contrast to the GIOP, the QAA determines a(λ) using empirically derived R rs or r rs band ratios (depending on the R rs (670nm) value) in combination with the known spectral shape of absorption by pure water [35].Then, because µ(λ) and a(λ) are known, b b is solved analytically using Eq. ( 5), and b bp is retrieved after subtracting scattering by pure water [36].The assumed slope of the backscattering spectrum is calculated as in the GIOP.The updated QAA (v6) is used in this study, with more details available at <<http://www.ioccg.org/groups/Software_OCA/QAA_v6_2014209.pdf>>.
While the GIOP and QAA allow the shape of b bp to vary, the GSM inversion model is a semi-analytical model that, in its default configuration, holds the spectral shape of IOPs constant and solves for chl a concentration, the cdm absorption coefficient (a dg ), and the backscatter coefficient (b bp ).
The governing equations are below: where a * ph is the chl a specific absorption coefficient, S is held constant to 0.02, and the backscatter coefficient (γ) is held to 1).The procedure used to derive default configuration parameters is simulated annealing, which allows parameter searches in the direction of lower model performance, so it is little influenced by nearly performance maximums and instead gives solutions that represent that best model performance over a large search range.

Fig. 1 .
Fig. 1. a) A map of float observations used in this study, colored by matchups with a particular sensor.Grey represents float observations without coincident satellite matchups.b) Distribution of sampling month for all float observations and for those that coincide with a particular satellite sensor.c) Median (dotted line) and the interquartile range (blue shading) of satellite remote sensing reflectance for the matchup times and locations used in this study.

Fig. 2 .
Fig. 2. Comparisons of b bp_float distributions (depending on sensor, top panel), and b bp_sat from the GIOP (2 nd panel), GSM (3 rd panel) and QAA (4 th panel) algorithms across three different sensors (MODIS in yellow, VIIRS in red, OLCI in blue).Grey represents all floats that do not coincide with a satellite observation.

Fig. 3 .
Fig. 3.The absolute value of the difference between b bp_sat and b bp_float is shown against the time (hours) between a satellite overpass and a float observation.The b bp_sat values are shown as an example using the GIOP, but these results are consistent among all algorithms.Satellite data source is indicated by symbol type: filled circle = MODIS, plus sign = VIIRS, and open triangle = OLCI.

Fig. 4 .
Fig. 4. b bp_sat vs b bp_float matchups for the GIOP, QAA, and GSM inversion algorithms using remote sensing reflectance from MODIS, VIIRS, or OLCI.For each algorithm, the left panel shows all data locations and the right panel shows all locations except the Mediterranean and Black Seas.The Mediterranean and Black Seas data are highlighted in blue in the left panel for each satellite.b bp is at 700 nm m −1 .The red dashed line is the 1:1 line.

Table 1 . Matchup summary statistics and non-parametric indices for the 3 satellite sensors and remote sensing inversion algorithms used in this study. Bias is calculated as the median ratio of the satellite values to the float observations, and median percent error (MPE) is calculated as the median of 100% x
|(b bp_sat /b bp_float −1)|.