Biases from incorrect reflectance convolution

Reflectance, a crucial earth observation variable, is converted from hyperspectral to multispectral through convolution. This is done to combine time series, validate instruments, and apply retrieval algorithms. However, convolution is often done incorrectly, with reflectance itself convolved rather than the underlying (ir)radiances. Here, the resulting error is quantified for simulated and real multispectral instruments, using 18 radiometric data sets (N = 1799 spectra). Biases up to 5% are found, the exact value depending on the spectrum and band response. This significantly affects extended time series and instrument validation, and is similar in magnitude to errors seen in previous validation studies. Post-hoc correction is impossible, but correctly convolving (ir)radiances prevents this error entirely. This requires publication of original data alongside reflectance. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
Reflectance, the spectral fraction of light reflected by a surface, is an essential earth observation (EO) variable. It forms the basis for data products such as chlorophyll and suspended matter in water [1][2][3], and canopy cover and biomass on land [4,5]. As such, it is a routine data product for EO satellites, including NASA's Landsat and ESA's Sentinel programs, and in situ radiometers.
Spectral data are divided into two categories, namely multispectral and hyperspectral. Multispectral instruments observe in several broad, discrete wavelength bands. Examples include the Moderate Resolution Imaging Spectrometer (MODIS) and the Visible Infrared Imaging Radiometer Suite (VIIRS), but also in situ instruments including unmanned aerial vehicles (UAVs) and even smartphones [6]. Conversely, hyperspectral instruments provide continuous wavelength coverage with a fine spectral resolution. Examples include the TriOS RAMSES, Seabird HyperOCR, and ASD FieldSpec field-going spectroradiometers, as well as the Ocean Color Instrument (OCI) due to fly on the Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission. Hyperspectral data have a finer spectral sampling and, typically, resolution and thus contain more information than multispectral ones, but depending on the instrument design, often collect less light in each band, giving a worse signal-to-noise ratio.
Since the current EO landscape is a mixture of both types, it is often desirable to convert data between the two, typically from hyper-to multispectral. Three common use cases for this process exist, namely combining time series, instrument validation, and retrieval algorithms.
The first use case is merging and extending time series using different sensors. Long-term, high temporal resolution time series are necessary to study fundamental biogeochemical processes and long-term effects [7,8] such as climate change [9]. Current efforts focus on merging multispectral time series, on the radiance or reflectance level [10][11][12], achieving relative errors on reflectance <5% [10,12], or on the end product level [7,13]. Future efforts will focus on extending multispectral time series with new hyperspectral sensors, for example extending MODIS/VIIRS aerosol optical depth (AOD) series with OCI (PACE) data [14]. This is done by converting hyperspectral data to the multispectral sensor's bands, to simulate what the latter would have measured. However, calibration differences and sensor characterization imperfections can introduce significant biases, for example up to 0.10 AOD for OCI-MODIS/VIIRS [14].
The second use case is the validation of multispectral (often satellite) data using in situ hyperspectral sensors. This is done by comparing simultaneous match-up measurements from both instruments [15]. Validation is done on all products, including normalized radiance [16], reflectance [4,11,17,18], and derived products such as chlorophyll [7,19] and inherent optical properties (IOPs) [18]. Similar validation is done for in situ multispectral sensors, such as UAVs [20] and smartphones [21,22]. Vicarious calibration similarly involves comparing match-up data, but aimed at determining satellite gain factors [23]. Since vicarious calibration is performed on (normalized) radiance rather than reflectance, it is outside the scope of this work, though a brief discussion is given in Sect. 2.4.
The third use case is the application of multispectral retrieval algorithms to hyperspectral data. Such algorithms are commonly based on the ratio between spectral bands and are thus called band-ratio algorithms. For example, band-ratio algorithms relating chlorophyll to Sentinel-2A (S2A) Multi-Spectral Instrument (MSI) bands have been developed for Vietnamese [24] and Estonian [1] lakes, the latter with a mean standard error in chlorophyll-a of 5%. While derived on multispectral data, such algorithms are also applied to hyperspectral data, both to derive products and for validation, requiring a spectral conversion. Differences between these converted data and in situ data have been found [25], which may be due in part to incorrect treatment of radiometry. It should be noted that instead of a spectral conversion, often only the central band wavelength reflectance is used [3,19].
Converting hyperspectral data to multispectral bands is commonly, though not exclusively (see Sect. 2.2.1), termed spectral convolution. An in-depth description is provided in Sect. 2., but in short, the hyperspectral data are multiplied by the spectral response function (SRF) of the multispectral band and the product is integrated. This is done for quantities including radiance [26,27], optical thickness [27,28], IOPs [18,29], vegetation indices [5], and reflectance [4,29].
This work quantifies the error induced by incorrect spectral convolution of reflectance in each of the three use cases, for a variety of synthetic and real instruments using 18 archival data sets totaling N = 1799 spectra. To narrow the scope, this work focuses on remote sensing of ocean color. However, the principles and methods apply broadly to any fractional quantity, including other reflectances (soil, vegetation), attenuation coefficients, and degree of polarization, as well as spatial convolution [33]. While the existence of this error has been pointed out previously [29,32,34] and quantified at 1% for a single data set and sensor [29], a large-scale quantitative assessment has not yet been published.
This work fits into a wider field of EO error analysis. Recent efforts include investigations into the out-of-band response of EO sensors [35], the impact of differing spectral [32] and spatial [33] resolutions on satellite match-up analyses, and the impact of hyperspectral SRFs having a non-zero bandwidth [36]. On the experimental side, significant efforts have gone into glint removal in above-water radiometry [37][38][39] and rigorous characterization of instrumental [6,20] and methodological [15,34] uncertainties. A broad, in-depth overview of uncertainties in ocean color data is provided in the recently published International Ocean Color Coordination Group (IOCCG) report number 18 [40].
Sect. 2 describes the theoretical background of reflectance and spectral convolution. Sect. 3 describes the data used in this work and the method for quantifying the convolution error. Results are presented in Sect. 4. Finally, Sect. 5 contains a discussion of the results and conclusions.

Reflectance
Reflectance R is the ratio of upwelling over downwelling (ir)radiance. Radiance L(λ, θ, φ) is the radiant energy per wavelength λ propagating in a direction (θ, φ), in W m −2 nm −1 sr −1 , while irradiance E(λ) is L integrated over a solid angle, in W m −2 nm −1 . The units of R depend on which ratio is taken. Since this work deals only with wavelength dependence, (θ, φ) terms are dropped henceforth for clarity.
Different reflectances can be defined by dividing different (ir)radiances. Examples include the bi-directional radiance reflectance [4], the non-directional irradiance reflectance [31], and the uni-directional remote sensing reflectance R rs used in ocean color [37,39]. As defined in Eq. (1), R rs is the ratio of water-leaving radiance L w [37] over downwelling irradiance E d [41], in units of sr −1 . This work focuses on R rs , but the same mathematics apply to any reflectance.

Spectral convolution
Multispectral data are simulated from hyperspectral data through spectral convolution. As shown in Eq. (2), this involves multiplying the hyperspectral data L(λ) by the multispectral band SRF S B (λ), integrating the result over all wavelength in the band ( ∫ λ∈B dλ), and normalizing by the effective bandwidth. In this work, convolved quantities are denoted by a bar, such asL(B) in Eq. (2). In practice, spectral convolution is often a sum over discrete L and S B data. The convolution process is shown graphically in Fig. 1. Convolving hyperspectral data is really an approximation, due to the finite spectral resolution of hyperspectral sensors. As derived in Appendix A, this method is valid if the full width at half maximum (FWHM) of the multispectral band is at least double that of the hyperspectral sensor.

Reflectance convolution
Just as the hyperspectral remote sensing reflectance . Both are calculated as in Eq. (2) and then divided, as shown in Eq. (3). Convolving (ir)radiances to calculate a band-average reflectance will be referred to in this work as working in radiance space or L-space, and the result asR L rs (B). Mathematically, this is the correct method for convolving R rs to simulate multispectral data.
Instead, one might simply convolve R rs itself. This will be referred to as working in reflectance space or R-space and the result asR R rs (B). The expression forR R rs (B) is given in Eq. (4).
Working in R-space is incorrect, as shown in Fig. 1 and the following example. First, let the SRF S B (λ) be a boxcar response of 1 for 0 ≤ λ ≤ 1 and 0 elsewhere. Then all integrals need only be evaluated for those wavelengths and the SRF bandwidth is 1. Second, let L w (λ) = e eλ and E d (λ) = e −eλ . Such spectra are not physical but demonstrate the mathematical principles well. As shown in Eqs. (5) and (6),R L rs (B) ≈ 15 andR R rs (B) ≈ 42 differ significantly.

General rule
Convolution is a useful tool, but the order of operations is not always intuitive. A general rule of thumb can be used, which applies to any kind of convolution (spectral or spatial) when converting high-to low-resolution (spectral or spatial) data. For other purposes, such as smoothing, reflectance itself can be transformed. As a rule of thumb, only quantities the lower-resolution sensor would observe can be convolved. This includes the at-sensor (ir)radiance (in physical units [34]) but not reflectance and derived products. Propagation of in situ radiances, through surfaces when measured underwater [37] or through the atmosphere for vicarious calibration [23], must occur prior to convolution to accurately simulate the radiance at a multispectral sensor. Simplifications may be necessary [27,35,43] but should be mathematically justified. Finally, hyperspectral upwelling radiance L u , measured in-or above-water, should be converted to L w [37] before convolution when comparing it to multispectral L w .

Methods
Archival data sets containing (ir)radiance and reflectance data were used to test the principles described in Sect. 2 and quantify the errors resulting from working in R-space rather than L-space. All analysis was done using custom Python scripts, available at https://github.com/burggraaff/ reflectance_convolution.

Radiometric data
18 archival radiometric data sets were used [44][45][46][47][48][49][50][51][52][53][54][55][56][57][58], totaling N = 1799 spectra. Data were sourced from the SeaWiFS Bio-optical Archive and Storage System (SeaBASS) [59] and PANGAEA (https://pangaea.de/). Only data sets including either the original radiometric data or R rs and L w or E d were used. In most cases, the given R rs and E d were used and L w = R rs E d was reconstructed. This reduced the amount of post-processing, such as glint removal, that was necessary. All spectra used in the further analysis are shown in Fig. 2. An overview of the data and post-processing is provided in Appendix B. Small imperfections in the resulting data, such as residual atmospheric bands in R rs (Fig. 2), are no problem. For this work, it is only necessary to obtain a set of realistic spectra, not to determine IOPs. Negative R rs were removed since they are not physical but instead the result of measurement error or over-correction of glint; this is no problem for the same reason.

Spectral convolution
Spectral convolution was implemented in the custom Python library described above. The radiometric data were interpolated to the SRF wavelengths. If the radiometric data and SRF wavelengths did not overlap fully, the convolution was only done if the integral of the SRF over the non-overlapping wavelengths was ≤ 5% of its total integral. The integration was done using the SciPy implementation of Simpson's rule in the integrate.simps function.
In each experiment, data were convolved in both L-and R-space, and the resulting reflectances were compared in absolute and relative terms. The absolute difference is ∆R rs =R R rs (B) −R L rs (B), meaning a positive ∆R rs corresponds to an overestimation in R-space. The relative difference was normalized toR L rs (B), and set to 0% ifR L rs (B) = 0 sr −1 . All spectra were treated separately, enabling a statistical analysis of the difference on varying input spectra. Due to the finite spectral resolution of the hyperspectral data, some data sets could not be convolved with some multispectral SRFs (see Appendix A).

Synthetic
The dependence of ∆R rs on band location and width was investigated by generating various synthetic boxcar and Gaussian filters. Both are common approximations of real SRFs [5,29]. Boxcars were evaluated on wavelengths with an non-zero response, Gaussians on wavelengths from 320-800 nm. For both, a 0.1 nm step size was used to properly sample narrow bands. Central wavelengths between 330-809 nm (1 nm steps) and FWHMs of 6-65 nm (1 nm steps) were used, representative of real multispectral instruments (Sect. 3.3.2).

Real instruments
The behavior of ∆R rs for real multispectral instruments, namely eleven satellite instruments and three low-cost sensors, was also investigated. A selection of these is shown in Fig. 3. Panchromatic bands were not used as they are intended for spatial sharpening, not reflectance measurements. Only bands fitting the radiometric data (within 320-1300 nm) were used.  [64]. These are all commonly used to measure R rs . The low-cost sensors were one UAV, the DJI Phantom Pro 4, and two smartphones, the iPhone SE and Samsung Galaxy S8 [6]. Such sensors have become popular in their own right as they can provide radiance data, if radiometrically calibrated [6,20,21], but also serve as proxies for new cubesat sensors such as the Planet Labs RapidEye and Dove series. The radiometric response and SRF may be affected by mechanical and electronic effects, including satellite launch and sensor drift, as well as by viewing angle and electronic cross-talk. Using up-to-date calibration data from the instrument developer negates these problems. Here, the SRFs recommended by instrument developers or in literature were used, representative of what is done in the wider literature.

Retrieval algorithm propagation
Finally, the error induced by R-space convolution was propagated through several retrieval algorithms. These were the polynomial OCx chlorophyll-a (Chl-a) algorithms [7] for MODIS (OC6, OC3), SeaWiFS (OC4), MERIS (OC4), VIIRS (OC3), and CZCS (OC3), the exponential Ha+17 S2A/MSI Chl-a algorithm [24], and the polynomial Lymburner+16 (LL+16) OLI total suspended matter (TSM) algorithm [13]. These are representative of most multispectral retrieval algorithms in the literature, which differ only in bands used or coefficient values. Equation (7) describes OCx, with [Chl-a] in mg m −3 , a i instrument-specific empirical coefficients, and λ B , λ G the instrument's blue and green bands. The Ha+17 algorithm is given in Eq. (8), with B3, B4 the R rs in the respective S2A/MSI bands. The LL+16 algorithm is given in Eq. (9), with G, R the R rs in the OLI Green and Red bands, and TSM in mg L −1 . [ For each input spectrum, bothR L rs andR R rs were propagated through each algorithm and the results were compared, analogous to theR rs comparison described in Sect. 3.2.

Simulated instruments
The reflectance convolution error ∆R rs was calculated for the synthetic SRFs described in Sect. 3.3.1. As an example, Fig. 4 shows ∆R rs as a function of central wavelength λ c and FWHM for the seaswir-a (see Table 1) data. The sign and magnitude of the error depend on the input spectrum. For example, the local minima around 400 and 520 nm correspond to local maxima in the derivative E d spectrum dE d /dλ. Similarly, the local maxima at 480 nm correspond to a local minimum in dE d /dλ. Furthermore, the magnitude of ∆R rs increases with wider FWHMs. This is expected since E d , L w , and R rs are less spectrally flat over a wider spectral range [29]. Rather than a random error around a median of 0, the difference is a systematic bias in either direction. This is especially clear in Fig. 4 at λ c ≤ 460 nm. Being a bias, it needs to be corrected rather than simply incorporated into an error budget. This will be discussed in Sect. 5.
Similar trends were found in the other data sets and with the Gaussian SRFs. For the latter, the λ c -∆R rs relation was similar to boxcars with the same FWHM, but larger in magnitude and smoother. This is due to the Gaussian wings covering more of the spectrum than the boxcar's sharp edges. For example, for λ c = 420 nm, ∆R rs = (−1.5 ± 0.2)% for a 30 nm boxcar and (−3.8 ± 0.4)% for a 30 nm Gaussian, error bars indicating the 5%-95% range, for the seaswir-a data. Finally, the same boxcar filter applied to the tarao data gave ∆R rs (420 nm) = (+0.01 ± 0.02)%. This value is much smaller since the tarao spectra are smoother than the seaswir-a ones; a similar trend was seen across all data sets. These differences highlight the importance of determining this error for each filter and data set, as an ensemble correction is impossible.

Real instruments
∆R rs was also calculated using the real SRFs described in Sect. 3.3.2. For example, Fig. 5 shows the distribution of ∆R rs across all data for the five OLI bands. As with the synthetic sensors, ∆R rs is typically a bias in one direction rather than a random error and its magnitude and sign depend on the input spectrum. For example, in the OLI Blue band ∆R rs >0 for 77% (1380/1799) of spectra while in Green ∆R rs <0 for 80% (1444/1799). Furthermore, a similar trend for larger errors with wider bands was seen, for example in the OLI Green band (λ c = 562 nm, FWHM = 57 nm) ∆R rs = (−0.2 +0.4 −0.9 )% while in the similar S3A/OLCI Oa6 band (λ c = 560 nm, FWHM = 10 nm) ∆R rs = (−0.00 +0.03 −0.05 )%. No significant differences were found between paired instruments such as S3A/OLCI and S3B/OLCI. Some multispectral band-data set combinations are technically invalid (Appendix A); however, these need not be excluded from these overall statistics, as they do not affect the observed trends. Comparing the convolution error between data sets, as in Fig. 6 for the OLI Green band, again revealed significant differences. Depending on the data, ∆R rs was a systematic underestimation (tarao ∆R rs = (−0.7 ± 0.2)%), overestimation (seaswir-r ∆R rs = (+0.2 ± 0.1)%), or a random error around 0 (orinoco ∆R rs = (+0.1 +0.3 −0.5 )%). This is similar to what was observed in Sect. 4.1 and again shows that the error must be quantified separately for each filter and data set.

Low-cost sensors
Finally, the SPECTACLE low-cost sensors [6] are particularly interesting due to their broad bands. The convolution error in their RGB bands, using all data, is shown in Fig. 7. Interestingly, ∆R rs was largest in the relatively narrow R bands, possibly due to the shapes of the input spectra or the multi-peaked SRFs [6]. Overall, the large magnitude of ∆R rs (down to −5% in the R bands) highlights the importance of correct spectral convolution for these sensors.

Retrieval algorithms
Finally, the reflectance convolution error was propagated through the retrieval algorithms described in Sect. 3.4. The results for the smf-a data set are shown in Fig. 8. As in the previous sections, the propagated error in Chl-a and TSM was a bias of a few percent. Its sign varied by data set and by algorithm; for example, for the seaswir-a data, VIIRS OC3 underestimated Chl-a (∆Chl-a = −1.4 +0.7 −0.3 %) while CZCS OC3 overestimated it (∆Chl-a = +0.8 +0.3 −0.1 %). The magnitude of the error was consistently on the percent level for all data sets and algorithms. These results are representative for most band-ratio algorithms, as discussed in Sect. 3.4.

Discussion & conclusions
In this work, the effects of incorrectly convolving reflectance when simulating multispectral data (Sect. 2.) were investigated. While this error has been pointed out previously [29,32,34], it still commonly occurs in the literature (see Sect. 1.). Only one quantitative analysis was found, in which for one data set and one sensor the difference was found to be 1% and neglected [29]. However, this result cannot be generalized to all data sets and sensors, as shown in this work.
Significant errors, up to several percent, in the remote sensing reflectance (∆R rs ) were found for all data sets and sensor bands (Sects. 4.1 and 4.2). The error was largest near features in the input spectra, particularly peaks in the derivative of E d (dE d /dλ), and for sensors with wide FWHMs, especially low-cost sensors (Sect. 4.2.1). For example, in the narrow (FWHM ≈ 10 nm) OLCI bands, |∆R rs | 0.1%, while in the wide (FWHM >50 nm) R bands of low-cost sensors, |∆R rs |>5% for >5% of the spectra. Furthermore, the magnitude and sign of ∆R rs differed significantly between data sets due to varying spectral shapes.
If not prevented, the convolution error will create dubious patterns in combined time series. Depending on the data set and sensor, the convolution error is similar to or larger than errors found in existing band-shifting algorithms for combining multispectral time series [10,12]. With the launch of PACE, for which time series extension is a primary goal [14], this effect must be accounted for to achieve desired uncertainty requirements [32,40].
Similarly, incorrect reflectance convolution in instrument validation leads to systematic overor undercorrections. For example, reflectances from the HydroColor smartphone app have been validated using WISP [21] and HyperSAS [22] data, convolved in R-space, finding significant errors and biases. Biases of −9.5 × 10 −4 to +1.3 × 10 −4 sr −1 were found in the WISP comparison; in Sect. 4.2.1, the convolution error caused biases on the order of 10 −4 sr −1 for 5%-14% of spectra, varying per band. Errors in the HyperSAS comparison were on the percent level, similar to the errors up to 5% found in Sect. 4.2.1. Interestingly, in both studies the convolved data underestimated the multispectral data, as would be expected from the negative biases found in this work. This suggests that the convolution error may have contributed a significant part of the error in both studies. However, a direct comparison is difficult due to differing input spectra, as shown in Fig. 6, and band responses. Thus, the error in these cases cannot definitively be attributed to incorrect convolution. Additionally, many other factors causing significant errors in low-cost sensor data are known [6].
This importance for validation also applies to satellites. For example, in [18], systematic underestimations up to 1% were found in band-averageR rs (compared to hyperspectral R rs ) convolved in R-space with the OLI, MSI, and ETM+ SRFs. This is similar to, and may be explained by, the reflectance convolution error found for these sensors in Sect. 4.2 and shown in Figs. 5 and 6. The same study found no significant errors in convolved VIIRS and OLCI reflectance, agreeing with the correlation between FWHM and error demonstrated in Sect. 4.1.
Conversely, the effects on retrieval algorithms are minor. The convolution error in Chl-a and TSM algorithms (Sect. 4.3) was on the percent level. Since errors in satellite-retrieved Chl-a can be up to 500% [2], a bias of a few percent can safely be neglected. Typical TSM errors are less extreme but still significantly larger than the ≤ 1% found here [3,13]. While only a few algorithms were tested, as discussed in Sect. 3.4, these results are representative for most band-ratio algorithms. While many studies opt to use only the central band wavelength, not the full SRF [3,19], in which case the convolution error does not occur, comparing narrow-and wide-band data that way introduces similar problems, described in [29].
Prevention of the convolution error is straight-forward while post-hoc correction is not. As explained in Sect. 2., simply convolving (ir)radiances instead of reflectance prevents the error from occurring, and is the only mathematically correct procedure. Of course this requires the original data to be available, which is not always true. Post-hoc correction is impossible since the error is highly variable across different sensors and data sets. When lacking original data, the reported uncertainty may simply be increased by a few percentage points [29] but this fails to account for systematic biases. An estimate may be made, for example by reconstructing L w from a reported R rs and simulated E d , but this introduces further assumptions.
To this end, it is recommendable that published data sets, intended for satellite validation, contain not only products such as reflectance but also the raw data,at-sensor (ir)radiance data, and calibration data. This way, the convolution error can be avoided. Furthermore, it would greatly increase the amount of data available for other studies requiring radiometric data, such as those into glint removal [38]. Finally, publication of original data, as well as sensor characteristics, allows for traceability, which is crucial for quality control [40].
While this work focused on the remote sensing reflectance R rs using ocean color data, the principles and conclusions are broadly applicable. A general rule of thumb on convolution practice is given in Sect. 2.4. In short, the principles outlined in this work are relevant to the simulation of low-resolution data from high-resolution data. This includes all types of reflectance, as well as other divisional quantities such as attenuation coefficients and degree of polarization. Furthermore, it includes all types of convolution, including spectral and spatial. In all cases, a correct order of operations is crucial to prevent systematic errors. Any simplifications should be justified mathematically, not made at whim.

A. Validity of spectral convolution
Consider a source radiance spectrum L src (λ), observed by a hyperspectral sensor with N bands. The spectral radiance arriving at the sensor L(λ) is the product of the source spectrum and any atmospheric effects. However, the data used in this work were recorded in or only a few meters above the source, so atmospheric effects can safely be ignored: The hyperspectral sensor records the radiance in bands h = 1, 2, . . . , N, with central wavelengths λ h . Each band h has its own SRF S h (λ), and the radiance recorded in band h, L h , is the spectral convolution of the at-sensor spectrum L(λ) with S h (λ). The integral is evaluated over all wavelengths; for clarity, this is not written explicitly in this section. The denominator in Eq. (11) corrects for the spectral (or 'quantum') efficiency of the sensor in band h.
The resulting spectrum measured by the hyperspectral sensor, L H (λ), consists of the individual band spectra: S h affects L(λ) in two ways. The first is to lower it due to the spectral efficiency of the sensor. This is described by an overall SRF S H (λ); dividing the data by S H (λ) corrects for this. The second effect is to smoothen the data: since in practice S h is never a delta function, band h records not only the radiance at its central wavelength λ h but also at other wavelengths where S h (λ)>0. The smoothening can be described as a cross-correlation ( ) between the observed radiance S H (λ)L(λ) and a bandwidth function G. In reality, each band will have a slightly different G h , for example due to stray light; however, for simplicity, here G is assumed to be the same for all bands. Then L H (λ) can be described as in Eq. (13).
Now consider a multispectral band M with SRF S M (λ). Following the same logic, Eq. (11) gives the radiance recorded in band M, L M , as in Eq. (14).
However, when simulating multispectral data from hyperspectral data, the original radiance L(λ) is not available. Instead, the recorded hyperspectral radiance L H (λ) is used. This means that in practice, one does not calculate L M as in Eq. (14) but an approximation L H M , as in Eq. (15).
The approximation L H M ≈ L M holds in two cases. The first is if L H (λ) ≈ L(λ), that is if Eq. (16) holds. From information theory it follows that this is true if G is significantly narrower than typical features in L(λ).
The second case where L H M ≈ L M holds is when the multispectral band M is significantly wider than G and typical features in L(λ). Then, any radiance redistributed from λ h to surrounding wavelengths in L H (λ) is still captured in the integral ∫ L H (λ)S M (λ)dλ, and the value of the integral is the same. E d has narrow line features, as does L w by extension. Hyperspectral (ir)radiance sensors typically undersample these features [37,41], so Eq. (16) does not hold in practice.
However, the second case does hold, if M is significantly wider than the hyperspectral band (or G). The Nyquist-Shannon theorem provides, to first order, a requirement: L H M ≈ L M if the FWHM of M is at least twice that of the hyperspectral data. Then, hyperspectral data L H (λ) adequately approximate the original radiance L(λ) for spectral convolution purposes. Table 1 lists the radiometric data sets used in this work. Some of these contain unphysical data due to measurement errors, environmental effects, and instrumental problems [30,38,43]. This appendix describes how the data were filtered and homogenized before processing.

B. Radiometric data
First, all spectra were converted to SI units. Second, negative R rs values were clipped to 0 if −10 −4 <R rs (λ)<0 as this is within typical measurement errors; spectra with any R rs (λ) ≤ −10 −4 were removed wholly. For as11, 5 spectra with negative R rs were removed. For cariaco, 230 spectra missing R rs and 11 missing E d values were removed, as were 64 spectra with negative and 1 spectrum with unphysically high (>0.8) R rs . For clt-a, 6 spectra with negative R rs were removed. For clt-s, the spectra within 3-minute windows suggested in the accompanying documentation were averaged and L w was calculated from L u and L t following the Mobley protocol [39]; 19 spectra with missing and 36 with negative R rs were removed. For gasex, wavelengths λ>710 nm were removed due to incomplete data. For he302, 3 spectra with R rs (800 nm) ≥ 0.003 were removed as outliers; the original authors noted the difficulty in normalizing these data [38]. For msm213-h, L w was used to reconstruct E d ; 179 spectra with missing data were removed, as were spectra with unphysically large jumps in E d , namely 21 with |E d (λ 1 ) − E d (λ 2 )| ≥ 0.2 and |E d (λ 2 ) − E d (λ 3 )| ≥ 0.2 and 1 with |E d (λ 1 ) − E d (λ 2 )| ≥ 0.35, with λ 1 , λ 2 , λ 3 subsequent wavelengths. For orinoco, 1 spectrum with negative R rs was removed. For sabor-s, which contains polarized and unpolarized spectra, only the latter were used and the wavelength range was clipped to 358-749 nm because of incomplete and noisy data elsewhere. For seaswir-a, the provided plaque radiance L d was used to calculate E d = πL d [39], assuming a plaque reflectance of R g ≈ 1 [30]; no units were given for these data, so the resulting E d spectra were divided by 10 5 to be in line with the others. 'Water reflectance' R w was provided instead of R rs ; comparing [30] and [39] showed that R rs = R w /π. Finally, 35 spectra with missing E d and 39 with negative R rs were removed. For seaswir-r, R w was similarly converted to R rs and 2 spectra with negative R rs were removed. For sop4, the provided L u and L s were used to calculate R rs following the Mobley protocol [39] for simplicity [38]. The R rs spectra were then normalized by subtracting R rs (750 nm) [39] and the results used to re-calculate L w . Next, 885 spectra with unphysical max(E d (λ))<0.01 were removed and the wavelength range cropped to 360-750 nm to remove noisy data. Spectra with unphysical features were then removed, namely 23 with |R rs (λ 1 ) − R rs (λ 2 )| ≥ 0.005 and |R rs (λ 2 ) − R rs (λ 3 )| ≥ 0.005 and 3 with E d (400 nm) − E d (405 nm)>0.01; finally, 289 spectra with negative R rs were removed. The remaining data sets (sabor-h, sfp, rsp, taram, and tarao) required no post-processing. For sfp, only the mean R rs spectra were used.