Towards consistent assessments of in situ radiometric measurements for the validation of fluorescence satellite missions
Remote Sensing of Environment

The upcoming Fluorescence Explorer (FLEX) satellite mission aims to provide high-quality radiometric measurements for the subsequent retrieval of sun-induced chlorophyll fluorescence (SIF). Combining SIF with other observations from the FLEX/Sentinel-3 tandem mission has the potential to assess complex ecosystem processes. The calibration and validation (cal/val) of these radiometric measurements and derived products are central but challenging components of the mission. This contribution outlines strategies for the assessment of in situ radiometric measurements and retrieved SIF. We demonstrate how in situ spectrometer measurements can be analysed in terms of radiometric, spectral and spatial uncertainties. The analysis of more than 200,000 spectra yields an average bias of 8% between the radiometric measurements of two individual spectrometers, with a larger variability in measurements of downwelling radiance (25%) than of upwelling radiance (6%). Spectral shifts of the spectrometer used for SIF retrievals are consistently below one spectral pixel (at most 0.75 pixel). The detected spectral shifts appear to depend mostly on temperature (as measured by a temperature probe in the instrument). Retrieved SIF shows a low variability of 1.8% compared with a noise-reduced SIF estimate based on APAR. A combination of airborne imaging and in situ non-imaging fluorescence spectroscopy highlights the importance of a homogeneous sampling surface and has the potential to further uncover SIF retrieval issues, as shown here for early evening acquisitions. Our experiments clearly indicate the need for careful site selection, measurement protocols and harmonized processing. This work thus contributes to guiding cal/val activities for the upcoming FLEX mission.


Introduction
Sun-induced chlorophyll fluorescence (SIF) is a red and far-red (near-infrared) light emission (650 nm to 800 nm) by plants associated with photosynthesis (Mohammed et al., 2019). Stimulated by the development of the 8th Earth Explorer satellite mission Fluorescence Explorer (FLEX) by the European Space Agency (ESA) and the advanced exploitation of atmospheric missions like Sentinel-5P, GOSAT and OCO-2, researchers across the globe have advanced the field of fluorescence spectroscopy over the last decade, resulting in versatile instrumentation and approaches to retrieve SIF across scales (Drusch et al., 2017; Mohammed et al., 2019). SIF complements established vegetation information accessible with common remote sensing (RS) approaches (i.e. biochemical or structural plant properties). In particular, the fact that SIF provides the most direct RS observation of plant photosynthesis at ecosystem scale explains its growing importance for studies of ecosystem functioning (Mohammed et al., 2019; Ryu et al., 2019). SIF has been used to quantify photosynthetic activity at ecosystem scale (Rossini et al., 2015) and to constrain associated gas exchange processes, including gross primary productivity (Damm et al., 2015a) and transpiration (Qiu et al., 2018; Shan et al., 2019; Shan et al., 2021). Empirical relationships with SIF have also made it possible to assess drought effects in crops (…) and on plant ecosystems or agricultural productivity (Pagán et al., 2019; Sun et al., 2015). Robust estimates of complex processes determined by various environmental and plant-specific factors, however, require the use of multi-sensor data (Damm et al., 2018; Jonard et al., 2020; Shan et al., 2019) and likely the ingestion of such observations into modelling schemes (Lee et al., 2015; Parazoo et al., 2014).
Furthermore, SIF has been suggested as a proxy for assessing plant functional diversity (Tagliabue et al., 2020) and for estimating absorbed photosynthetically active radiation (APAR) (Yang et al., 2018).
Although highly promising, the retrieval and interpretation of the weak SIF signal is delicate and prone to errors. SIF contributes only 1-5% to the surface-leaving radiance in the far-red (Meroni et al., 2009a). The main challenge in SIF retrieval is thus a reliable decoupling of the two radiance components, i.e. the reflected radiance and the emitted SIF. SIF retrievals rely on measurements in spectral regions with strong contrast, e.g. caused by solar or atmospheric absorption lines. The measurement and interpretation of radiance in these narrow spectral windows makes SIF retrievals particularly error-prone, since atmospheric disturbances (Damm et al., 2014; Frankenberg et al., 2012) and instrumental effects (Damm et al., 2011) can substantially alter the retrieved SIF signal.
Characterizing the uncertainties along the SIF processing chain is thus essential to ensure that the retrieval approaches employed provide robust SIF estimates for the assessment of complex ecosystem processes. Systematic uncertainty assessments of satellite-derived information need to be complemented by independent measurements from calibration/validation (cal/val) networks, in which in situ and airborne observations are systematically acquired and processed for comparison with the satellite measurements (Hueni et al., 2017).
We argue that a harmonized processing and quality assessment of large in situ measurement series across test sites is important to eventually disseminate robust, quality-checked data, which is one key requirement for the calibration and validation efforts of the upcoming FLEX mission. In particular, we aim to outline strategies for individual uncertainty estimates across processing levels, from in situ radiometric measurements to SIF retrievals. We first highlight important theoretical sources of uncertainty related to these processing levels. We then demonstrate methods for assessing the individual uncertainties introduced. From the experience gained, we derive guidelines and suggestions for the implementation of cal/val activities in the context of satellite-based SIF measurements.

Background
Estimating uncertainties requires information about the contributing sources and their relationship (i.e. independent or correlated uncertainty sources). An uncertainty budget listing the influence of individual effects on the measurand (the quantity to be measured) and an error propagation are required to establish a level of confidence in the reported quantitative value (Damasceno and Couto, 2018; Woolliams et al., 2015). In general, measurement uncertainties can be related to systematic and random errors, commonly expressed as accuracy and precision. It is good practice to provide an uncertainty estimate when presenting scientific results. However, various factors complicate the derivation of the uncertainty budget for in situ spectroradiometers, such as the lack of clear and unified measurement protocols (Hueni et al., 2017) and numerous uncertainty sources (Bialek et al., 2020; Gamon, 2015; Pacheco-Labrador et al., 2019).
This article highlights uncertainties caused by instrumental effects (Chapter 2.1), assumptions applied during data calibration (Chapter 2.2), the subsequent retrieval of data products (Chapter 2.3), and the spatial representativeness of in situ observations for satellite measurements (Chapter 2.4). We note that the aspects addressed here do not cover the full uncertainty budget. Obtaining it would require an uncertainty propagation, which must then include specific experiments to characterize the sources of uncertainty that lead to the observed data fluctuations.
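Whether uncertainty sources are independent or correlated changes how an uncertainty budget is combined. As a minimal sketch, the first-order law of propagation of uncertainty (as given in the GUM) can be written in Python; the sensitivity coefficients, standard uncertainties and correlation matrix passed in are illustrative inputs, not values derived in this study.

```python
import numpy as np


def combined_standard_uncertainty(sensitivities, uncertainties, correlation=None):
    """First-order law of propagation of uncertainty (GUM).

    u_c^2 = sum_i sum_j c_i * c_j * u_i * u_j * r_ij

    sensitivities: c_i = df/dx_i, partial derivatives of the measurand
    uncertainties: u_i, standard uncertainties of the input quantities
    correlation:   r_ij matrix; None means independent sources (identity)
    """
    c = np.asarray(sensitivities, dtype=float)
    u = np.asarray(uncertainties, dtype=float)
    r = np.eye(c.size) if correlation is None else np.asarray(correlation, dtype=float)
    cu = c * u  # contribution of each source, in the unit of the measurand
    return float(np.sqrt(cu @ r @ cu))
```

For two sources contributing 0.03 and 0.04 (in the unit of the measurand), treating them as independent gives 0.05, whereas full correlation (all entries of `correlation` equal to one) gives 0.07. This is why the relationship between sources has to be known before a budget can be summed.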

Uncertainties due to instrumental effects
This category relates to the measurement process itself, in which many factors influence the signal detected by the sensor.
The non-uniform sensitivity of the detector to incident energy relates to radiometric non-uniformities, and here we distinguish between systematic and random radiometric non-uniformities. If the actual response of the detector elements (i.e. pixels) deviates from their nominal response (e.g. over time due to sensor degradation, or in relation to temperature or pressure), systematic radiometric errors may occur. The environment, the instrumental setup and the sensor characteristics also influence random radiometric errors, which are typically expressed as noise and characterized by the signal-to-noise ratio (SNR) or the noise equivalent delta radiance (see Schläpfer and Schaepman (2002) for an approach to model sensor performance). Several radiometric non-uniformities additionally cause non-systematic radiometric errors (e.g. malfunctioning detector elements, called bad pixels, or stray light effects). Noise is typically well quantifiable, whereas bad pixels and stray light are more difficult to detect.
Spectral non-uniformities define another category of instrumental uncertainty. Spectral deviations can result from operation under non-laboratory conditions (i.e. in the field). Of particular interest is the spectral sensitivity of individual detector elements and their corresponding spectral response functions (SRF). Two parameters are typically used to describe the SRF: i) the spectral position of the peak sensitivity within the detector array (i.e. the centre wavelength position, CWL) and ii) the width of the SRF in the spectral dimension (i.e. the full width at half maximum, FWHM). In most cases, the SRF is assumed to be Gaussian. However, recent work shows that this assumption is not always the best option (Trim et al., 2021). Deviations of CWL and FWHM from their nominal values can cause systematic uncertainties in the observed radiometric values.
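Under the Gaussian assumption mentioned above, an SRF is fully described by its CWL and FWHM, with σ = FWHM / (2√(2 ln 2)). The following Python sketch builds such an SRF on an arbitrary wavelength grid and normalises its discrete weights. The function name is hypothetical, and the Gaussian shape is an assumption that, as noted, does not always hold.

```python
import numpy as np

# Conversion factor from FWHM to the standard deviation of a Gaussian (~0.4247)
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def gaussian_srf(wavelengths, cwl, fwhm):
    """Gaussian spectral response function of one detector element.

    The weights are normalised to sum to one, so that applying them to a
    high-resolution spectrum yields the band-averaged signal.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    sigma = fwhm * FWHM_TO_SIGMA
    srf = np.exp(-0.5 * ((wavelengths - cwl) / sigma) ** 2)
    return srf / srf.sum()
```

Re-weighting a high-resolution spectrum with this SRF after shifting `cwl` by a fraction of the sampling interval shows directly how a spectral shift turns into a radiometric error, which is largest inside narrow absorption features.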

Uncertainties caused by assumptions during data calibration
During sensor calibration, the digital signals registered per pixel are converted to spectral radiances (e.g. upwelling and downwelling spectral radiance $L_\lambda^{\uparrow}$, $L_\lambda^{\downarrow}$) using calibration coefficients provided by the manufacturer or determined in dedicated laboratories. The sources of uncertainty associated with the calibration process can be summarized in two main categories. First, the underlying sensor model defines how calibration coefficients are obtained and which effects are accounted for in the respective coefficients. Unknown sensor behaviour, e.g. non-linearities, is not represented in the resulting calibration coefficients. Second, the calibration coefficients characterize the instrument in the laboratory, but there is a break in the traceability chain between the laboratory and the field deployment (Woolliams et al., 2015). Instruments can deviate from their calibrated behaviour due to changes during transport and installation, environmental stress during operation, or aging and related sensor degradation (see Pacheco-Labrador et al. (2019) for a list of device-internal and external causes). Note that the deviation from the calibrated behaviour is caused by the uncertainties discussed in the previous chapter. It is listed here because it violates the integrity assumption underlying the calibration.
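To show where a sensor model enters the calibration, the sketch below assumes a generic linear model. Dark-corrected counts are normalised by integration time and scaled by per-pixel gains. This model is an illustrative assumption, not the FloX calibration procedure. Non-linearities and temperature effects are deliberately left out, which is exactly the kind of unrepresented behaviour discussed above.

```python
import numpy as np


def dn_to_radiance(dn, dark_dn, integration_time_ms, gain):
    """Convert raw digital numbers (DN) to spectral radiance.

    Assumed, illustrative sensor model:
        L = gain * (DN - DN_dark) / t_int
    gain: per-pixel calibration coefficient (radiance per count/ms).
    Non-linearity and temperature dependence are NOT represented.
    """
    dn = np.asarray(dn, dtype=float)
    dark_dn = np.asarray(dark_dn, dtype=float)
    gain = np.asarray(gain, dtype=float)
    return gain * (dn - dark_dn) / integration_time_ms
```

With dark-corrected counts of 1000 and 3000, an integration time of 100 ms and gains of 0.5 and 0.25, the function returns radiances of 5.0 and 7.5 in whatever unit the gains carry.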

Uncertainties caused by data product retrieval
Once the measurements are converted to physical units, they are typically further processed to derive quantitative or qualitative atmospheric or surface information ("data product generation"). This step introduces uncertainties through retrieval assumptions and required auxiliary data. For SIF retrievals, two aspects can cause uncertainties: i) the inherent assumptions of the retrieval approach used to disentangle reflected radiance from SIF emission (Chang et al., 2020; Damm et al., 2011), and ii) the reliability of the atmospheric compensation, particularly for retrievals based on atmospheric absorption bands (e.g. oxygen) and in the presence of anisotropic surfaces (Damm et al., 2014; Frankenberg et al., 2012).
SIF retrieval approaches differ in how they approximate surface reflectance and fluorescence to estimate the reflected radiance and thereby extract SIF from the measured radiance signal (Chang et al., 2020; Meroni et al., 2009b). Models range from simple estimates based on neighbouring wavelengths to complex physical modelling (cf. Cendrero-Mateo et al. (2019), Mohammed et al. (2019) and Chang et al. (2020) for comprehensive reviews of this topic). All methods are prone to uncertainties, either because the underlying model is simple or because the input parameters required for complex modelling may themselves be uncertain.
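The simplest family of retrievals mentioned above, based on neighbouring wavelengths, can be illustrated with the standard Fraunhofer Line Discrimination (FLD) principle. FLD assumes that reflectance ρ and SIF are the same inside (in) and just outside (out) an absorption band. At both positions, L = ρE/π + SIF, where E is the downwelling irradiance. Solving the two equations gives SIF = (E_out L_in − E_in L_out)/(E_out − E_in). This Python sketch is for illustration only and is not the SpecFit retrieval used in this study.

```python
import math


def fld_sif(e_in, e_out, l_in, l_out):
    """Standard FLD: solve L = rho * E / pi + SIF at the in- and out-band
    positions, assuming rho and SIF are identical at both positions."""
    return (e_out * l_in - e_in * l_out) / (e_out - e_in)


# Synthetic check with known reflectance and SIF (arbitrary units)
rho, sif_true = 0.4, 2.0
e_out, e_in = 1000.0, 200.0           # irradiance outside / inside the band
l_out = rho * e_out / math.pi + sif_true
l_in = rho * e_in / math.pi + sif_true
sif_est = fld_sif(e_in, e_out, l_in, l_out)
```

On such synthetic inputs, the function recovers the SIF that was put in. On real data, the constant-ρ and constant-SIF assumptions across the band are a main source of bias, which motivates the more elaborate spectral fitting approaches.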
Atmospheric correction is pivotal for SIF retrievals but requires exact knowledge of the atmospheric state during observation. For in situ measurements, downwelling irradiance is often measured directly to enable the retrieval of SIF. Differing atmospheric path lengths of measured irradiances and reflected radiances are particularly problematic if the sensor system is installed at a certain distance above the canopy (Sabater et al., 2018). Another source of uncertainty is that measured irradiance becomes increasingly disconnected from effective canopy irradiance as canopy structure becomes more complex (Damm et al., 2015b; Kückenbrink et al., 2019). In the absence of irradiance measurements (e.g. for alternative in situ approaches or airborne cal/val measurements), atmospheric states need to be modelled. Atmospheric correction relies on an exact parameterization of atmospheric radiative transfer (RT) models, which are themselves uncertain. Furthermore, only a few model parameters (e.g. aerosol optical thickness, water vapour) can be provided directly by in situ observations, which have their own limitations or are themselves derived from data. Other parameters, including the vertical layering of the atmosphere, need to be set by best guess, which is inherently inaccurate.

Representation uncertainties
Field observations at representative sites are frequently used in calibration and validation activities for satellite missions (Bouvet et al., 2019), e.g. for the Sentinel-2 mission. They are also required for smaller satellites without on-board calibration facilities (Gascon et al., 2017; Revel et al., 2019). General recommendations for suitable land monitoring sites (e.g. homogeneity, size, number of cloud-free days) mostly lead to the choice of desert locations, defined as pseudo-invariant calibration sites (Lamquin et al., 2019; Marcq et al., 2018). The validation of SIF satellite missions, however, requires non-desert sites that provide significant gradients of SIF emission, and most natural vegetation sites do not have pseudo-invariant surface features. Of particular concern are the spatial homogeneity and extent of SIF emission and of other vegetation traits that can affect measured radiance signals (e.g. canopy structure, biochemistry). These factors affect the representativeness of in situ measurements for the larger-footprint measurements that FLEX will provide in the future.

Overview of uncertainty experiments
We demonstrate an approach that leverages time series to provide uncertainty estimates related to instruments, processing, and the representativeness of in situ measurements for satellite observations (Table 1). These uncertainty measures are evidently not exhaustive in the respective domains (as mentioned in Chapter 2), but they shed light on the overall reliability of the measurement series.

Test sites
We used data from two test sites, namely Haute-Provence Observatory, Saint-Michel-l'Observatoire (France), and Campus Klein-Altendorf (CKA, Germany). The deciduous forest test site at Saint-Michel is dominated by oak (Quercus pubescens) and is located close to the Observatoire de Haute-Provence (43°56′N, 5°42′E). The agricultural research station CKA (50°37′N, 6°59′E), which covers various plant and crop types, is affiliated with the Agricultural Faculty of the University of Bonn and is located 40 km south of Cologne between the towns of Meckenheim and Rheinbach.

In situ spectroscopy measurements
Data from the FloX system (JB-Hyperspectral Devices, Düsseldorf, Germany) are used in this study. The FloX houses two spectrometers: a higher-resolution spectrometer for fluorescence retrieval (here referred to as FLUO) and a lower-resolution instrument for the retrieval of standard vegetation traits (referred to as FULL; cf. Table 2 for technical specifications). The FloX is a commercially available instrument for top-of-canopy SIF estimates and is widely accepted in the scientific community. The FloX network is growing and is extensively used by several research agencies, including ESA in the frame of the […]. We used time series data from a FloX system installed 90 m above the canopy at the ICOS tower in Saint-Michel, spanning December 2017 to September 2019. We also used data sets from two FloX systems installed in a wheat field and an oat field at the German test site CKA, recorded on 23 June 2020. These two systems were installed on small towers, with the fibre optics placed 0.9 m above the wheat canopy and 1.4 m above the oat canopy. Please see Table 1 for the specific experiments facilitated by the three FloX systems.

In situ spectroscopy data storage
Systematic uncertainty assessments benefit greatly from harmonized data collection and analysis. We made use of the existing SPECCHIO Spectral Information System (see Hueni et al. (2020) and references therein) and added functionality to handle the high spectral and temporal resolution FloX data. The resulting updated spectral information system (called FluoSpecchio) provides the functionality needed for our study, including (i) data transfer from individual instruments to a central data storage and backup, (ii) data ingestion into a spectral information system, (iii) data processing from digital numbers to radiance, reflectance and SIF, (iv) handling of large data volumes, and (v) on-demand programmatic data extraction for subsequent analysis via an application programming interface. A detailed description of the implementation of all these steps (i)–(v) and of the FluoSpecchio architecture is presented in Appendix A. In short, an arbitrary number of in situ spectrometers collect data during daylight hours and use the night hours to transmit the data to a central file server. This server then starts the ingestion into the database. After successful ingestion, the processing is started. During all processing steps, the database is augmented with automatically extracted metadata elements, such as sensor diagnostics (e.g. relative humidity, different temperature probes) and data quality indicators (e.g. illumination stability during the measurement). In the context of this article, the FloX data were accessed by downloading them together with selected metadata as NetCDF files to facilitate the analyses described in the following sections.

Airborne imaging spectroscopy measurements
The HyPlant airborne imaging spectrometer consists of three pushbroom line scanners. Two scanners form the DUAL module and cover the visible/near-infrared and shortwave infrared wavelength range between 380 nm and 2500 nm at moderate spectral resolution (Table 3). The third sensor forms the FLUO module and records image data in the near-infrared wavelength range between 670 nm and 780 nm at very high spectral resolution (Table 3). Please refer to Siegmann et al. (2019) for additional details on the HyPlant sensor system and an overview of the data processing scheme.
We used three data sets acquired over CKA at around 11:00, 14:00 and 17:00 on 23 June 2020, with sun zenith angles of 39.8°, 27.2° and 45.4°, respectively. Data were recorded from a flight altitude of 600 m above ground level, resulting in a ground sampling distance of 1 m × 1 m. No clouds were present during any of the acquisitions, and flight lines were recorded in either a westward or an eastward direction to cover the location of the FloX systems on the ground.

Information retrieval
Fluorescence retrieval
In situ data. SIF is retrieved from FloX data using the SpecFit implementation. This advanced method was specifically developed to retrieve SIF and "true" surface reflectance (ρ) over the full emission spectrum (670-780 nm) from the total upwelling radiance $L^{\uparrow}$ observed at the top of the canopy (Cogliati et al., 2015). The novel version of SpecFit uses a simplified model of the SIF spectrum with only two parameters, related to the magnitudes of the red and far-red peaks. It leverages the spectral information included in ρ as a proxy for SIF reabsorption within the leaves and canopy. In practice, a simple parametric function (a combination of two Lorentzian peaks) is employed to fit the red and far-red fluorescence peaks in the 670-780 nm spectral window, while a piecewise cubic spline represents the reflectance. The SIF spectrum is then estimated with a nonlinear least squares technique that minimizes the difference between the measured top-of-canopy radiance and the spectrum modelled by SpecFit (for further details, please refer to Cogliati et al. (2019) and Cogliati et al. (2015)). For each FloX measurement, SIF and ρ spectra are retrieved. In addition, several metrics are computed from the SIF spectrum, including total SIF over the emission range from 600 to 800 nm, the spectral position and magnitude of the peaks in the red and far-red regions (local maxima), and the SIF values at commonly exploited wavelengths (sampled at the wavelengths corresponding to the bottom of the O2 bands).
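The core idea of the spectral fitting described above can be sketched with SciPy. Reflectance is modelled as a cubic spline, and SIF as the sum of two Lorentzian peaks with scaled amplitudes. Both are fitted jointly to top-of-canopy radiance. This is a simplified, hypothetical sketch rather than the SpecFit code:

- peak positions and widths are fixed assumptions;
- the reabsorption term linked to ρ is omitted;
- the irradiance is a toy spectrum with two absorption features standing in for the O2-B and O2-A bands.

It also shows why absorption bands are essential: only there does the additive SIF term differ from a smooth change in reflectance.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import least_squares

wl = np.linspace(670.0, 780.0, 1101)   # 0.1 nm grid over the fitting window
knots = np.arange(670.0, 781.0, 5.0)   # reflectance spline knots (assumed spacing)


def lorentz(x, centre, half_width):
    return 1.0 / (1.0 + ((x - centre) / half_width) ** 2)


# Fixed SIF peak shapes (hypothetical positions and half-widths)
RED_PEAK = lorentz(wl, 685.0, 10.0)
FAR_RED_PEAK = lorentz(wl, 740.0, 25.0)


def forward(params, irradiance):
    """Top-of-canopy radiance = rho * E / pi + SIF.

    params[:-2] are reflectance values at the spline knots,
    params[-2:] are the red and far-red SIF peak amplitudes.
    """
    rho = CubicSpline(knots, params[:-2])(wl)
    sif = params[-2] * RED_PEAK + params[-1] * FAR_RED_PEAK
    return rho * irradiance / np.pi + sif


def fit_sif_amplitudes(radiance, irradiance):
    """Jointly fit spline reflectance and the two SIF amplitudes by least squares."""
    x0 = np.concatenate([np.full(knots.size, 0.2), [0.0, 0.0]])
    result = least_squares(lambda p: forward(p, irradiance) - radiance, x0,
                           x_scale="jac")
    return result.x[-2], result.x[-1]


# Synthetic example: toy irradiance with O2-B- and O2-A-like absorption dips
irradiance = 1000.0 * (1.0
                       - 0.6 * np.exp(-((wl - 687.5) / 0.6) ** 2)
                       - 0.8 * np.exp(-((wl - 760.5) / 0.8) ** 2))
rho_knots = 0.05 + 0.4 / (1.0 + np.exp(-(knots - 715.0) / 8.0))  # red-edge-like shape
radiance = forward(np.concatenate([rho_knots, [1.0, 2.0]]), irradiance)
a_red, a_far_red = fit_sif_amplitudes(radiance, irradiance)
```

Because the synthetic reflectance is generated with the same spline basis, the fit recovers the injected amplitudes of 1.0 and 2.0. With real data, the spline spacing and the fixed peak shapes become sources of retrieval uncertainty in their own right.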
Airborne data. The airborne SIF retrieval uses an algorithm (cf. Siegmann et al. (2019) for details) that was specifically developed to couple the MODTRAN5 atmospheric radiative transfer code (Berk et al., 2005) with SIF retrieval at the O2 bands based on the spectral fitting approach (Cogliati et al., 2015). Accurate MODTRAN5 simulations require several input parameters to characterize atmospheric scattering and absorption, as well as accurate knowledge of the instrument's spectral response function. To simplify the retrieval and provide an operational airborne algorithm, the atmospheric transfer functions are estimated with an image-based technique. MODTRAN5 simulations are constrained using non-fluorescent areas (e.g. bare soil pixels), and the atmospheric parameters are optimized under the assumption that fluorescence must be zero for these pixels. This yields effective values of the atmospheric functions, which are then used to retrieve canopy SIF for the entire image. This approach is simpler, but the retrieval is limited to narrow spectral windows at the O2 absorption bands. More sophisticated atmospheric modelling (atmospheric correction) would be required to retrieve the entire SIF emission spectrum, as is possible for the ground-based measurements.

Estimating absorbed photosynthetically active radiation
Table 2 Main spectral characteristics of the FloX spectrometer system ($L^{\uparrow}$: upwelling spectral radiance; $L^{\downarrow}$: downwelling spectral radiance).

Table 3 Main spectral characteristics of the HyPlant spectrometer system according to Rascher et al. (2015).

Absorbed photosynthetically active radiation (APAR) was retrieved from the difference between downwelling and upwelling radiance ($L^{\downarrow}$, $L^{\uparrow}$) in the region between 400 and 700 nm. The integral of the FULL spectrometer data over this wavelength range was approximated with the trapezoidal rule (Eq. (1)):

$$\mathrm{APAR} \approx \sum_{b=1}^{n-1} \frac{\left(L^{\downarrow}_{b} - L^{\uparrow}_{b}\right) + \left(L^{\downarrow}_{b+1} - L^{\uparrow}_{b+1}\right)}{2}\left(\lambda_{b+1} - \lambda_{b}\right) \tag{1}$$
with b indicating the spectral band, $\lambda_b$ the wavelength of band b, and n the number of spectral bands in the 400-700 nm interval. The instrument is located at a certain height above the canopy (i.e. 0.9 m), so we assumed the atmospheric impact outside the O2 bands to be almost negligible. For a first-order approximation of APAR, no atmospheric correction is therefore needed, and FloX spectra can be used as they are. This approximation does not consider canopy BRDF or sun-target-sensor geometry. It provides a first-order approximation of total APAR, not of green APAR as described in Gitelson and Gamon (2015). A possible influence of radiation absorption by the soil is assumed to be small in the presented case, because the data used in our analysis were acquired over a well-developed agricultural field. Please note that this approximation is not directly transferable to other sites.
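The trapezoidal approximation of Eq. (1) is straightforward to implement. The Python sketch below assumes that downwelling and upwelling spectra share one wavelength grid, which holds for a single spectrometer such as FULL. It returns the integral in radiance units times nm, and any conversion to photon flux units is intentionally not included. The function name is hypothetical.

```python
import numpy as np


def apar_trapezoid(wavelengths, l_down, l_up, lo=400.0, hi=700.0):
    """First-order APAR proxy: trapezoidal integral of (L_down - L_up)
    over all bands with lo <= wavelength <= hi."""
    wl = np.asarray(wavelengths, dtype=float)
    absorbed = np.asarray(l_down, dtype=float) - np.asarray(l_up, dtype=float)
    sel = (wl >= lo) & (wl <= hi)
    wl, absorbed = wl[sel], absorbed[sel]
    # Sum over b = 1..n-1: mean of neighbouring band values times band spacing
    return float(np.sum(0.5 * (absorbed[1:] + absorbed[:-1]) * np.diff(wl)))
```

For constant spectra with L↓ = 2.0 and L↑ = 0.5 sampled every 1 nm, the result is 1.5 × 300 nm = 450. The explicit sum is used instead of a library integrator to keep the correspondence with the band-wise formulation visible.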

Time series preprocessing
For uncertainty assessments related to radiometric non-uniformities, we used two years of measurements acquired approximately every minute during daylight hours (about half a million measurements in total). We applied a few preprocessing steps to work efficiently with this amount of data. First, the datasets were filtered by thresholds on the quality indicators for sensor saturation, illumination stability during the acquisition, and integration time. The thresholds were defined by inspecting the distribution of each quality indicator. Since filtering can retain different measurements for the two spectrometers (FLUO and FULL), they were subsequently re-aligned using the timestamps of the smaller filtered data set (i.e. only measurements with a corresponding timestamp in both data sets were kept). The relevant variables ($L^{\downarrow}$ and $L^{\uparrow}$) were then resampled to a 30-min frequency by taking the 30-min mean.
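The filtering, alignment and resampling steps can be sketched with pandas. The column names (`saturation`, `illum_instability`, `integration_time_ms`, `L_down`, `L_up`) and the default thresholds are hypothetical placeholders. Exact timestamp matching between FLUO and FULL records is also an assumption about how the two spectrometers are logged.

```python
import pandas as pd


def preprocess(fluo, full, max_saturation=0, max_instability=0.05, max_int_time_ms=1000):
    """Filter both spectrometer tables by quality indicators, keep common
    timestamps, and resample L_down/L_up to 30-min means.

    fluo, full: DataFrames indexed by a DatetimeIndex (hypothetical columns);
    thresholds are placeholders to be derived from the indicator distributions.
    """
    def passes_qc(df):
        ok = ((df["saturation"] <= max_saturation)
              & (df["illum_instability"] <= max_instability)
              & (df["integration_time_ms"] <= max_int_time_ms))
        return df[ok]

    fluo_ok, full_ok = passes_qc(fluo), passes_qc(full)
    # Re-align: keep only acquisitions that survived filtering in both sets
    common = fluo_ok.index.intersection(full_ok.index).sort_values()
    cols = ["L_down", "L_up"]
    fluo_30 = fluo_ok.loc[common, cols].resample("30min").mean().dropna(how="all")
    full_30 = full_ok.loc[common, cols].resample("30min").mean().dropna(how="all")
    return fluo_30, full_30
```

Aligning before resampling ensures that both 30-min means are computed from the same acquisitions. Otherwise, the inter-sensor comparison would partly reflect different sampling rather than different radiometry.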
From this dataset, we then selected a day with ideal conditions (no clouds) and created corresponding synthetic spectra for it using radiative transfer simulations. We used the open-source libRadtran radiative transfer model (Emde et al., 2016; Mayer and Kylling, 2005), parameterized with a standard mid-latitude summer atmosphere, a default rural aerosol model with a visibility of 23 km, and a mixed forest surface type. The DISORT solver was used with eight streams, and the absorption parameterization was set to REPTRAN with fine resolution. REPTRAN uses the extraterrestrial solar spectrum of Kurucz (1994). We note that more recent measurements of the solar spectrum are available (e.g. Meftah et al. (2018)), with non-negligible differences, but this study focuses on other sources of uncertainty.
As a final preprocessing step, we applied a spectral convolution for sensor comparisons (i.e. to align the spectral characteristics). FLUO data were convolved to the spectral resolution of FULL using the respective CWL and FWHM, and the signals simulated with libRadtran were convolved to the spectral resolution of both spectrometers. The convolved signals were used to estimate noise and to support the estimation of the spectral shift.
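Convolving one spectrum to another sensor's bands amounts to weighting it with the SRF of each target band. The Python sketch below assumes Gaussian SRFs parameterised by the target CWL and FWHM. The function name is hypothetical.

```python
import numpy as np


def convolve_to_bands(src_wl, src_signal, target_cwl, target_fwhm):
    """Resample a higher-resolution spectrum to target bands, assuming
    Gaussian SRFs defined by each target band's CWL and FWHM."""
    src_wl = np.asarray(src_wl, dtype=float)
    src_signal = np.asarray(src_signal, dtype=float)
    cwl = np.asarray(target_cwl, dtype=float)[:, None]
    sigma = np.asarray(target_fwhm, dtype=float)[:, None] / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    srf = np.exp(-0.5 * ((src_wl[None, :] - cwl) / sigma) ** 2)  # shape (bands, source)
    srf /= srf.sum(axis=1, keepdims=True)                        # normalise each band
    return srf @ src_signal
```

Strictly speaking, a source spectrum such as FLUO is already band-limited. For Gaussian SRFs the variances add, so an effective width of √(FWHM_target² − FWHM_source²) could be used instead of the target FWHM. This correction matters less when the source resolution is much finer than the target.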

Systematic and random radiometric errors
Systematic radiometric errors can be evaluated by exploiting the FloX configuration, which combines two sensors (FLUO, FULL) with spectral overlap and similar characteristics (cf. Table 2). Since both sensors observe under the same environmental conditions, this configuration allows a radiometric evaluation. Such a comparison does not create a traceability chain, because neither device is traceable to an SI standard. It does, however, give a better estimate of the radiometric consistency between the sensors and, to a lesser degree, of the absolute value of the measurand. Differences between the two sensors are nonetheless expected, because environmental variables such as temperature and humidity might not affect both spectrometers in the same way. Furthermore, the instrumental setup differs between the two systems (i.e. the FLUO sensor is kept at a more stable temperature than the FULL sensor).
For comparison, the signals of the FULL and the convolved FLUO spectrometers were averaged over their spectral overlap (650 nm to 813 nm). Together with the temporal resampling to 30 min described above, these two steps reduce small-scale variability from sensor noise and environmental stressors. This allows us to focus on the agreement between the two spectrometers, quantified as the maximum relative difference $\Delta_{\mathrm{rel}}^{\max}$ between the two signals x and y (Eq. (2)):

$$\Delta_{\mathrm{rel}}^{\max} = \max\left(\frac{|x-y|}{|x|}, \frac{|x-y|}{|y|}\right) = \frac{|x-y|}{\min\left(|x|,|y|\right)} \tag{2}$$

We chose $\Delta_{\mathrm{rel}}^{\max}$ because neither x nor y can be assumed a priori to be the reference. In a final step, the difference was averaged over all individual acquisitions of each day to derive a bulk daily estimate.

Random radiometric errors, often represented or quantified by the SNR, and other non-systematic errors (e.g. temporary specular reflections or shading by nearby objects) would ideally be analysed with repeated measurements under known and stable conditions. In field measurements, however, these conditions are never met. Assuming that $L^{\downarrow}$ under stable (e.g. cloud-free) conditions follows a well-predictable diurnal pattern, we compared the variation of observed and modelled $L^{\downarrow}$ diurnal cycles to estimate random and other non-systematic uncertainties.
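Assuming that Eq. (2) takes the symmetric form |x − y| / min(|x|, |y|), i.e. the larger of the two possible relative differences, the comparison and the daily aggregation can be sketched in Python. Band averaging over the 650-813 nm overlap is assumed to have been done beforehand, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def max_relative_difference(x, y):
    """Symmetric relative difference: |x - y| relative to the smaller of the
    two magnitudes, so that neither signal is treated as the reference."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.abs(x - y) / np.minimum(np.abs(x), np.abs(y))


def daily_bias(full_band_mean, fluo_band_mean):
    """Bulk daily estimate: mean relative difference over all acquisitions
    of each day. Inputs are pandas Series indexed by timestamp."""
    d = pd.Series(max_relative_difference(full_band_mean, fluo_band_mean),
                  index=full_band_mean.index)
    return d.groupby(d.index.date).mean()
```

For band averages of 100 and 108, the result is 0.08 regardless of which spectrometer is passed first. This symmetry is the reason for using such a measure when neither instrument can serve as the reference.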
For the assessment of the random radiometric error, measured $L^{\downarrow}$ at 755 nm was approximated by a least squares polynomial fit with ten degrees of freedom. The polynomial can be expected to closely follow the measurements and thus to highlight small deviations (i.e. bulk noise) from the overall trend. Note that even on a clear, stable day, atmospheric disturbances and canopy movement likely influence the measured signals and contribute to the apparent instrument noise. The variability determined in this way is therefore not comparable to a true derivation of the system SNR. Rather, it represents a bulk variability under field conditions that comprises SNR, atmospheric disturbances and canopy movement, among other factors.
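A polynomial-residual estimate of bulk noise can be sketched with NumPy's `Polynomial.fit`. This function rescales the time axis internally and is therefore better conditioned than fitting high powers of raw hours. Whether "ten degrees of freedom" means degree 10 or ten coefficients (degree 9) is not specified, so degree 10 is assumed here as a default. The synthetic diurnal cycle is purely illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial


def bulk_noise(hours, l_down, degree=10):
    """Relative deviation (symmetric Eq. (2) form) of L_down from a
    least-squares polynomial trend fitted to the same diurnal series."""
    t = np.asarray(hours, dtype=float)
    y = np.asarray(l_down, dtype=float)
    trend = Polynomial.fit(t, y, deg=degree)(t)   # domain mapped to [-1, 1] internally
    return np.abs(y - trend) / np.minimum(np.abs(y), np.abs(trend))


# Synthetic clear-sky diurnal cycle with 0.5 % multiplicative noise (illustrative)
rng = np.random.default_rng(0)
hours = np.linspace(6.0, 20.0, 841)
clean = 100.0 * np.sin(np.pi * (hours - 5.0) / 16.0)
observed = clean * (1.0 + 0.005 * rng.standard_normal(hours.size))
residuals = bulk_noise(hours, observed)
```

On this synthetic series, the median residual is of the same order as the injected noise. This illustrates that the metric mainly captures short-term variability around a smooth trend, which in the field mixes sensor noise with atmospheric and canopy effects.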
The comparison of measured and simulated (libRadtran) diurnal cycles of $L^{\downarrow}$ at 755 nm allows other non-systematic radiometric errors to be assessed, including temporary shading or other illumination effects in the observational setup. Note that such models require an accurate parameterization of the environmental conditions, which is seldom available. The tested wavelength of 755 nm lies in an atmospheric window (high transmittance), and Rayleigh and Mie scattering are relatively low under dry and cloud-free conditions. We therefore assumed the impact of atmospheric disturbances to be small. Sun and observation geometry can also be well constrained, so we expect the simulated diurnal $L^{\downarrow}$ signals to be a robust proxy of the expected $L^{\downarrow}$ diurnal cycles.
For both cases, the diurnal variation of the maximum relative difference $\Delta_{\mathrm{rel}}^{\max}$ (Eq. (2)) between measured and modelled (polynomial fit, libRadtran) $L^{\downarrow}$ at 755 nm was systematically analysed.

Spectral uncertainties
Spectral uncertainties due to deviations of the actual CWL (spectral shift) and FWHM (band broadening) from their nominal values are assessed by evaluating how accurately the FLUO system samples the region around the O2-A oxygen absorption feature. We specifically evaluate the temporal variation of the band positions to estimate deviations from the nominal CWL, whereas band broadening effects are not treated here (we refer to Meroni et al. (2010) for more details). To this end, we selected a region covering the oxygen absorption feature and shifted the nominal CWL positions towards higher and lower wavelengths in steps of 0.01 nm, up to four times the spectral sampling interval (see Table 2). The shifted band positions were then used in the convolution to create synthetic spectra with known spectral shift. For each measurement, the shift is then quantified as the one yielding the best fit (highest R²) between the spectrally shifted synthetic spectrum and the observation. The simulations were scaled to the measurement to cancel out differences in illumination, which improved both computational efficiency and the accuracy of the method.
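The shift search can be sketched as a brute-force grid search in Python. For each candidate shift:

1. the shift is applied to the nominal CWLs;
2. a high-resolution reference spectrum is convolved with Gaussian SRFs at the shifted positions;
3. the simulation is scaled to the observation by a least-squares gain.

The shift with the highest R² is kept. The toy absorption feature, the 0.17 nm band spacing and the 0.3 nm FWHM in the example are assumptions chosen for illustration, not FLUO specifications. The search range of ±0.68 nm corresponds to four times the assumed sampling interval.

```python
import numpy as np

SIGMA_PER_FWHM = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def convolve_gaussian(hr_wl, hr_signal, cwl, fwhm):
    """Band signals for Gaussian SRFs centred at cwl (normalised weights)."""
    sigma = fwhm * SIGMA_PER_FWHM
    srf = np.exp(-0.5 * ((hr_wl[None, :] - cwl[:, None]) / sigma) ** 2)
    return (srf / srf.sum(axis=1, keepdims=True)) @ hr_signal


def r_squared(obs, sim):
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def estimate_shift(obs, nominal_cwl, fwhm, hr_wl, hr_signal, max_shift, step=0.01):
    """Grid search over candidate CWL shifts; return (shift, R^2) of best fit."""
    shifts = np.arange(-max_shift, max_shift + step / 2, step)
    best_shift, best_r2 = 0.0, -np.inf
    for s in shifts:
        sim = convolve_gaussian(hr_wl, hr_signal, nominal_cwl + s, fwhm)
        sim = sim * (sim @ obs) / (sim @ sim)   # least-squares gain removes illumination level
        r2 = r_squared(obs, sim)
        if r2 > best_r2:
            best_shift, best_r2 = s, r2
    return best_shift, best_r2


# Illustrative example with a toy O2-A-like absorption dip
hr_wl = np.arange(750.0, 775.0, 0.005)
hr_signal = 1.0 - 0.7 * np.exp(-0.5 * ((hr_wl - 760.6) / 0.3) ** 2)
nominal_cwl = np.arange(755.0, 770.0, 0.17)
observed = 1.3 * convolve_gaussian(hr_wl, hr_signal, nominal_cwl + 0.05, 0.3)
shift, r2 = estimate_shift(observed, nominal_cwl, 0.3, hr_wl, hr_signal, max_shift=0.68)
```

Because the gain removes the absolute level, the criterion responds to the position of the absorption feature rather than to illumination. Restricting the window to the absorption feature keeps the 137 convolutions per spectrum inexpensive.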

Assessment of SIF retrieval uncertainties
Some strategies exist to evaluate the accuracy of retrieved SIF signals in vegetation canopies, e.g. measurements of field-deployable reference tarps or LED panels. In the absence of such absolute SIF references, only the relative performance and plausibility of retrieved SIF could be assessed in this study. The stability and accuracy of derived SIF signals are evaluated by making use of the mechanistic relationship between SIF and APAR (Damm et al., 2015a; Yang et al., 2018). In detail, we calculated the correlation between APAR and SIF for a day with good and stable weather (i.e. blue sky) over an active and unstressed crop canopy. We then used this relationship to estimate the diurnal cycle of SIF from APAR. Because APAR represents a spectral average from 400 nm to 700 nm, it is almost noise-free, which allowed us to derive a quasi noise-free SIF diurnal cycle. Any deviation of the retrieved SIF from the noise-reduced SIF estimate indicates the SIF retrieval uncertainty of a single data point. The difference between the model and the observation was calculated based on Eq. (2).
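A minimal sketch of this APAR-based check follows. It assumes NumPy and a second-order polynomial for the APAR-SIF relationship. That form is chosen here only to allow for a slight saturation, because the functional form used in the study is not specified in this section. `noise_reduced_sif` is a hypothetical helper, and the relative difference uses an assumed stand-in for Eq. (2). The synthetic data, in arbitrary units, stand in for a clear-sky day.

```python
import numpy as np


def noise_reduced_sif(apar, sif, degree=2):
    """Fit SIF as a polynomial of APAR (assumed form) and return the prediction,
    the per-sample relative difference in % and the R^2 of the fit."""
    apar = np.asarray(apar, dtype=float)
    sif = np.asarray(sif, dtype=float)
    coeffs = np.polyfit(apar, sif, deg=degree)
    sif_hat = np.polyval(coeffs, apar)  # quasi noise-free SIF estimate
    r2 = 1.0 - np.sum((sif - sif_hat) ** 2) / np.sum((sif - sif.mean()) ** 2)
    # Assumed stand-in for Eq. (2): difference relative to the larger value.
    rel_diff = 100.0 * np.abs(sif - sif_hat) / np.maximum(np.abs(sif), np.abs(sif_hat))
    return sif_hat, rel_diff, r2


# Synthetic clear-sky day: slightly saturating SIF-APAR relation plus 1% noise.
rng = np.random.default_rng(0)
apar = np.linspace(300.0, 1500.0, 200)
sif_true = 0.002 * apar - 3e-7 * apar ** 2
sif_obs = sif_true * (1.0 + 0.01 * rng.standard_normal(apar.size))
sif_hat, rel_diff, r2 = noise_reduced_sif(apar, sif_obs)
print(f"R^2 = {r2:.3f}, mean relative difference = {rel_diff.mean():.1f} ± {rel_diff.std():.1f}%")
```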

Evaluation of representation error
Surface heterogeneity complicates the direct comparison of an in situ measurement with a small footprint (e.g. a diameter of 50-100 cm) and an observation with a much larger footprint (e.g. 300 m pixel size). We exploit airborne SIF data (HyPlant) together with FloX in situ measurements to evaluate how representative an in situ SIF signal is for a spaceborne signal. Because the airborne SIF map can be spatially resampled, different spatial (and temporal) scales can be evaluated and the effect of scene heterogeneity on satellite validation can be investigated. In a first step, a number of HyPlant acquisitions were merged using a reverse painting approach, creating a spatiotemporal mosaic covering a 300 × 300 m area around the FloX system. The spatially aggregated area around the FloX was then gradually increased up to a pixel size of 300 × 300 m. For each aggregation size, relevant image statistics (i.e. mean, min, max, standard deviation) were derived.
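The aggregation step can be sketched as a set of square windows of increasing size centred on the FloX pixel. This assumes NumPy, a SIF map on a regular grid with a known pixel size, and windows clipped at the map border. `aggregation_statistics` is a hypothetical helper and is not part of the HyPlant or FloX processing. The example map is synthetic.

```python
import numpy as np


def aggregation_statistics(sif_map, row, col, pixel_size_m, window_sizes_m):
    """Statistics of square windows of increasing size centred on (row, col)."""
    results = []
    for size_m in window_sizes_m:
        n = max(1, int(round(size_m / pixel_size_m)))  # window width in pixels
        r0, c0 = max(0, row - n // 2), max(0, col - n // 2)
        window = sif_map[r0:r0 + n, c0:c0 + n]  # slicing clips at the map border
        results.append({
            "size_m": size_m,
            "mean": float(np.nanmean(window)),
            "min": float(np.nanmin(window)),
            "max": float(np.nanmax(window)),
            "std": float(np.nanstd(window)),
        })
    return results


# Synthetic 300 m x 300 m map at 2 m pixels with a radial SIF gradient around the FloX.
yy, xx = np.mgrid[0:150, 0:150]
sif_map = 1.2 + 0.002 * np.hypot(yy - 75, xx - 75)
stats = aggregation_statistics(sif_map, 75, 75, 2.0, [4, 60, 120, 300])
for s in stats:
    print(s["size_m"], round(s["mean"], 3), round(s["std"], 3))
```

The smallest window (4 m at 2 m pixels) corresponds to the 2 × 2 pixel neighbourhood of the FloX position used in the results.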

Assessment of systematic and unsystematic radiometric uncertainty
The FloX dataset acquired at the French test site covers a period of nearly two years and was used to examine the systematic correspondence of radiances measured with both spectrometers (Fig. 1). A trend in Δ rel max would indicate possible sensor degradation, assuming that both sensors would not degrade in exactly the same way. Fig. 1 does not show such a trend, suggesting no relevant degradation of either sensor during this period. The variability, however, is considerably larger in L ↓ (s = 25%) than in L ↑ (s = 6%). The daily mean of Δ rel max is 8% for both L ↓ and L ↑ , with insignificant seasonal and interannual trends (Fig. 1). On some days, the difference exceeds 20% and almost reaches 100% (e.g. in August and December 2018). Interestingly, the L ↑ measurements do not exhibit the same behaviour in August and December 2018 (Fig. 1b).
The spectrometers use a dual field of view setup (e.g. Fig. 1 in Porcar-Castell et al. (2015)), in which each spectrometer measures both L ↓ and L ↑ . This finding therefore indicates that whatever causes the problems in L ↓ is unlikely to originate from the spectrometers themselves. There is no significant correlation between Δ rel max of L ↓ and L ↑ (data not shown).
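The two checks described above can be sketched as follows. The first is a trend in daily Δ rel max as a degradation indicator. The second is the coupling between Δ rel max of L ↓ and L ↑ . The sketch assumes NumPy and daily Δ rel max values that have already been computed. A linear least-squares slope and a Pearson correlation are illustrative choices, not necessarily the statistics behind Fig. 1. The helper name and the synthetic series are hypothetical.

```python
import numpy as np


def degradation_and_coupling(day_index, d_rel_down, d_rel_up):
    """Summaries of daily Δrel_max series for L_down and L_up (illustrative statistics)."""
    day_index = np.asarray(day_index, dtype=float)
    d_rel_down = np.asarray(d_rel_down, dtype=float)
    d_rel_up = np.asarray(d_rel_up, dtype=float)
    # Linear trend in percentage points per year: a persistent slope would hint at degradation.
    slope_down = np.polyfit(day_index, d_rel_down, 1)[0] * 365.0
    slope_up = np.polyfit(day_index, d_rel_up, 1)[0] * 365.0
    # Pearson correlation: a problem shared by both viewing directions should couple the series.
    r = np.corrcoef(d_rel_down, d_rel_up)[0, 1]
    return {
        "mean_down": d_rel_down.mean(), "std_down": d_rel_down.std(ddof=1),
        "mean_up": d_rel_up.mean(), "std_up": d_rel_up.std(ddof=1),
        "trend_down_per_year": slope_down, "trend_up_per_year": slope_up,
        "r_down_up": r,
    }


# Synthetic two-year record: no trend in L_down, a small artificial drift in L_up.
rng = np.random.default_rng(1)
days = np.arange(600)
d_down = 8.0 + 2.0 * rng.standard_normal(days.size)
d_up = 8.0 + 0.5 * rng.standard_normal(days.size) + 0.002 * days
summary = degradation_and_coupling(days, d_down, d_up)
print({k: round(float(v), 3) for k, v in summary.items()})
```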
Concerning random radiometric uncertainties, we evaluate the agreement of a diurnal cycle of L ↓ measurements with a fitted polynomial model and with a signal simulated with the libRadtran RT model (Fig. 2a, d). Measured L ↓ of both spectrometers follows a typical diurnal cycle but shows specific differences from both modelled signals (Fig. 2b, c and e, f). Δ rel max for the polynomial fit indicates low variability, with an average of 0.3% for L ↓ FLUO (Fig. 2b) and 1.4% for L ↓ FULL (Fig. 2e). Δ rel max for the libRadtran simulation decreases from the start of the measurement period at 06:00 until about 10:00, levels out until about 14:00, and increases again until the end of the measurement period (Fig. 2c, f). Superimposed on this pattern, we find a relatively constant offset of about 23 mW m − 2 sr − 1 nm − 1 between the measurement of the FULL spectrometer and the simulation (Fig. 2d).

Assessment of spectral uncertainty
We observe a clear temporal pattern in the spectral shift of the FLUO sensor. It starts at about −0.5 pixel in December, approaches 0 pixel in April and then increases towards July and August, peaking at about 0.75 pixel (Fig. 3a). The mean daily spectral shift (orange line in Fig. 3a) is closely related to the mean daily temperature measured in the instrument chamber (red line in Fig. 3a, also highlighted in Fig. 3d). The seasonal trend is superimposed on a daily pattern (see Fig. 3b). In winter, the shift is generally negative and decreases towards midday. In contrast, the spectral shift is positive in summer and peaks around midday. Of particular importance is the correspondence between the shifts in the individual L ↑ and L ↓ measurements (Fig. 3c). Ideally, the L ↑ and L ↓ measurements show the same spectral shift, since SIF retrieval methods use both signals to disentangle ρ and SIF (cf. Pacheco-Labrador et al. (2019) for a comparison of SIF retrieval uncertainties in relation to spectral shifts occurring in either one or both L measurements). While there is a general tendency towards a consistent shift, we also observe situations with a mismatch. These mismatches occur more often for negative shifts (bottom left of Fig. 3c) and less often for positive shifts (top right of Fig. 3c).
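A minimal sketch of how the temperature dependence and the L ↑ /L ↓ mismatch could be quantified follows. It assumes NumPy, shifts in nm from the fitting step, and a linear shift-temperature model. The 0.1 pixel mismatch threshold, the sampling interval and the synthetic data are hypothetical and are not values used in the study.

```python
import numpy as np


def shift_temperature_analysis(shift_up_nm, shift_down_nm, temperature_c, ssi_nm, mismatch_px=0.1):
    """Relate spectral shifts (converted to pixels) to instrument temperature
    and flag L_up/L_down mismatches (hypothetical helper)."""
    up_px = np.asarray(shift_up_nm, dtype=float) / ssi_nm
    down_px = np.asarray(shift_down_nm, dtype=float) / ssi_nm
    mean_px = 0.5 * (up_px + down_px)
    slope, intercept = np.polyfit(temperature_c, mean_px, 1)  # pixel per °C and offset
    r = np.corrcoef(temperature_c, mean_px)[0, 1]
    mismatch = np.abs(up_px - down_px) > mismatch_px  # hypothetical threshold
    return slope, intercept, r, mismatch


# Synthetic example: 0.03 pixel per °C, plus an extra L_down shift below 10 °C.
temperature = np.linspace(0.0, 35.0, 50)
ssi = 0.17  # hypothetical spectral sampling interval in nm
shift_up = 0.03 * (temperature - 15.0) * ssi
shift_down = shift_up + np.where(temperature < 10.0, 0.05, 0.0)
slope, intercept, r, mismatch = shift_temperature_analysis(shift_up, shift_down, temperature, ssi)
print(f"slope = {slope:.3f} px/°C, r = {r:.2f}, mismatched samples: {mismatch.sum()}")
```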

Assessment of SIF retrieval uncertainties
The assessment of SIF retrieval accuracy is based on a qualitative comparison between corresponding APAR and SIF. We find a slightly saturating relationship between APAR and SIF, which agrees with model simulations (cf. Damm et al. (2015a)) and indicates that the retrieved SIF is plausible (Fig. 4a). Assuming that SIF in the O 2 -A band closely follows APAR under non-stressful environmental conditions, we exploit this relationship to evaluate the retrieval robustness in terms of SIF noise (black line in Fig. 4a). It must be noted that this approach might not work for red SIF due to re-absorption effects in the canopy. Fig. 4b shows Δ rel max calculated as the difference between retrieved SIF and the noise-reduced SIF derived from APAR. The variability in SIF increases with increasing APAR but remains small overall (mean of 1.8 ± 1.4%).

Evaluation of spatial representation error
The representativeness of an in situ SIF measurement for a satellite observation is determined by the temporal stability of SIF and the spatial variability within the satellite footprint (Fig. 5). The in situ measurement covers a small spatial footprint (e.g. a circle with a diameter of 0.5 to 1 m) at high temporal resolution (e.g. every two minutes). The satellite observation is a snapshot at 300 m pixel size. This is illustrated here for three points in time, where the spatial dynamics were observed by HyPlant (Fig. 5, visually in the left column and statistics in the right column) and the temporal dynamics by the FloX system (Fig. 5, right column, σ FloX ). Concerning spatial heterogeneity, we observe that increasing aggregation causes increasing SIF variability (error bars in Fig. 5). We also find a good agreement between FloX (orange bar in Fig. 5) and HyPlant measurements (black dots in Fig. 5) for the noon and early afternoon flights at the lowest spatial aggregation level (4 m aggregation distance). With increasing spatial aggregation, the spatially averaged HyPlant SIF and the FloX SIF increasingly diverge. The difference saturates at around 0.3 mW m − 2 sr − 1 nm − 1 (noon) and around 0.4 mW m − 2 sr − 1 nm − 1 (early afternoon). For the late afternoon flight, we observe the largest mismatch between FloX and HyPlant at the smallest aggregation size (around 0.35 mW m − 2 sr − 1 nm − 1 ) and the best agreement at the largest aggregation size (300 m).

Instrumental, methodological and representation uncertainties
We exploited a time series of radiometric measurements to assess systematic and random radiometric uncertainties of in situ radiometric measurements. We found a comparable systematic bias in L ↑ and L ↓ measured with two independent sensors. The temporal variability in the agreement of both sensors (Δ rel max ), however, differed. In theory, these differences can be related either to the spectrometers or to the footprint observed with the fibre optics. A sensor problem would result in a similar variability of Δ rel max for L ↑ and L ↓ over time, so we suspect that target heterogeneity might be the source of the observed uncertainties. In fact, although the fibre optics are mounted as close to each other as possible, there is likely a small difference in the observed canopy area that contributes to the observed differences. The high variability in the downwelling radiance is a matter of concern. Further investigation is needed to identify the origin of this uncertainty, e.g. shutter issues or optimization problems. Shutter issues or degradation of other optical components, for example, are known to occur and are likely the cause of the two acquisition periods with very large Δ rel max of up to 100%. Confronting the derived quality measures with automatically collected information about the system status (e.g. optimization parameters, sensor performance measures) will help to further deepen the insight into possible causes. When comparing measured and simulated L ↓ , we observed low noise in the measurements of the high-resolution FLUO spectrometer (Fig. 2b). This is reasonable because the sensor is in a temperature-controlled compartment, which contributes to reducing noise. The FULL spectrometer shows larger noise (Fig. 2e). This is surprising at first glance, since this spectrometer integrates over a larger wavelength region per spectral band than the FLUO instrument.
However, the SNR of the FULL spectrometer is only roughly one third of that of the FLUO spectrometer, and it is not temperature controlled. Besides, we also found a diurnal pattern in the offset between measured and simulated L ↓ for both spectrometers (Fig. 2c, f). This is likely caused by illumination and optical path geometry, as the cosine receptor does not perfectly capture the sky irradiance at certain sun angles. We also observed a larger offset between simulated and measured L ↓ for the FULL spectrometer. Since the agreement between simulated and measured L ↓ is rather consistent for the FLUO, we would rule out issues with the simulation. Instead, we consider calibration issues and cosine receptor imperfections as possible partial explanations for the observed offset and variability.
For the spectral shift, we found a clear seasonal pattern that closely follows the temperature in the FloX housing. This is reasonable because temperature changes of an instrument are known to influence the measurement of radiances (Pacheco-Labrador and Pilar Martin, 2015). The evaluation of spectral shifts should ideally be combined with a simultaneous quantification of band broadening effects (we refer to Meroni et al. (2010) for more details). SIF retrieval approaches are sensitive to spectral shifts of the magnitude observed in this study (i.e. ± 0.75 pixel; see the assessments by Pacheco-Labrador et al. (2019) for the SFM approach and Damm et al. (2011) for FLD-based derivatives). Also, the observed increasing mismatch between the shifts in the up- and downwelling radiance for temperatures below about 10 °C must be accounted for to accurately retrieve SIF. The spectrally high-resolution FloX data, however, facilitate the assessment of spectral shifts (as demonstrated in this study), since several spectral bands sample the O 2 absorption features. Ideally, the uncovered variability in the measured radiances together with the spectral shifts would be used in a more sophisticated assessment strategy, e.g. as demonstrated in , to compensate instrumental effects and obtain robust and unbiased SIF.
In the absence of reference SIF measurements, we suggested evaluating SIF retrieval uncertainties based on a cross-comparison with its main driver, APAR. Our analysis revealed low levels of noise in retrieved SIF and a plausible diurnal SIF pattern. It must be noted that the proposed approach is only valid for unstressed vegetation, where SIF is closely related to APAR. Under environmental stress, this assumption is likely violated and the approach is not applicable. In the future, more sophisticated approaches should be implemented. One example is an automated cross-comparison of SIF retrieval methods (Chang et al., 2020; Damm et al., 2011), which would continuously evaluate the impact of method-inherent assumptions and of the auxiliary data used. Such a comparison would also provide confidence in retrieved SIF dynamics and value ranges.
The analysis of the spatial and temporal representativeness revealed a very good agreement between the in situ and airborne SIF retrievals close to noon and in the early afternoon. This indicates a robust atmospheric correction in the HyPlant processing scheme. The disagreement observed in the late afternoon (around 16:45) for the best spatial match (i.e. the smallest aggregation size of 2 × 2 pixels) is implausible. An in-depth analysis with the methods described in this study (cf. Appendix B) did not reveal a notable bias in the FloX SIF retrieval or the FloX radiance measurements during the HyPlant overflight times. Furthermore, an extended analysis of the spatial heterogeneity in HyPlant NDVI maps did not reveal a vegetation pattern that could explain the larger HyPlant SIF values in the late afternoon. The NDVI is very stable both in time and space throughout the individual acquisitions. We thus conclude that the in situ SIF estimate from the FloX is reliable. Consequently, there is likely an issue in the HyPlant SIF retrieval for late afternoon observations. This finding underlines the importance of concurrent in situ and airborne measurements for identifying possible retrieval problems.

Fig. 4. Sun-induced chlorophyll fluorescence (SIF) retrieval accuracy as depicted by (a) the correlation with absorbed photosynthetically active radiation (APAR). (b) The relative difference Δ rel max between retrieved SIF and SIF predicted via APAR as an indication of SIF noise and retrieval uncertainty. SIF was sampled at 760 nm.
To move from the presented estimates of individual measurement uncertainties to a full accounting of uncertainties related to measurement, product retrieval and representativeness, we suggest further experiments. These should examine each of these aspects individually and then combine the findings in a Monte Carlo approach to propagate errors. This has been demonstrated by Bialek et al. (2020) for in situ water-leaving radiance measurements. A similar effort is required for in situ fluorescence spectroscopy, and a good starting point is presented in Pacheco-Labrador et al. (2019). With the help of the presented uncertainty estimates, it is possible to filter out individual measurements or longer measurement periods with abnormally high uncertainties. This filtering can be combined with filtering based on direct quality indicators (such as illumination stability, integration time or saturation) and with temporal resampling. Together, these steps can yield more robust in situ datasets for the calibration and validation of satellite-retrieved products.
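The Monte Carlo idea can be illustrated by propagating assumed relative radiometric uncertainties through the standard Fraunhofer Line Depth (sFLD) formula, F = (L↓out·L↑in − L↓in·L↑out)/(L↓out − L↓in), with L↓ expressed as radiance (irradiance/π). The sFLD method is used here only because it is compact. It is not the retrieval used in this study. The independent Gaussian error model, the 1% uncertainty levels and the band radiances are hypothetical.

```python
import numpy as np


def sfld(l_down_out, l_down_in, l_up_out, l_up_in):
    """Standard FLD: SIF from radiances outside (out) and inside (in) an absorption band."""
    return (l_down_out * l_up_in - l_down_in * l_up_out) / (l_down_out - l_down_in)


def monte_carlo_sif(radiances, rel_sigmas, n_draws=100_000, seed=0):
    """Propagate independent Gaussian relative errors through sFLD (hypothetical error model)."""
    rng = np.random.default_rng(seed)
    perturbed = [
        value * (1.0 + sigma * rng.standard_normal(n_draws))
        for value, sigma in zip(radiances, rel_sigmas)
    ]
    sif = sfld(*perturbed)
    return sif.mean(), sif.std(ddof=1)


# Hypothetical radiances (L_down_out, L_down_in, L_up_out, L_up_in) in mW m-2 sr-1 nm-1.
radiances = (100.0, 20.0, 40.0, 9.0)
print(sfld(*radiances))  # 1.25
mean, std = monte_carlo_sif(radiances, rel_sigmas=(0.01, 0.01, 0.01, 0.01))
print(f"SIF = {mean:.3f} ± {std:.3f}")
```

With these hypothetical values, 1% radiometric noise yields a SIF spread far larger than 1% of the SIF value. This illustrates why radiometric, spectral and representation errors need to be propagated jointly rather than reported in isolation.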

Towards Cal/Val activities of fluorescence satellite missions
Validation networks are pivotal to ensure sufficient data quality. They are, however, particularly challenging to implement for satellite missions aiming to retrieve SIF (e.g. FLEX), because of the spatio-temporal dynamics of SIF in combination with the complexity of retrieval schemes. Due to the relation between SIF and photosynthesis, SIF is often used to estimate complex ecosystem processes (Mohammed et al., 2019; Porcar-Castell et al., 2021). Cal/val networks should consequently facilitate a thorough and consistent uncertainty assessment of all involved processing stages, so that SIF can subsequently be used together with its uncertainty estimate in process models. Several components make up a successful and reliable cal/val network, including the distribution and properties of the considered sites, the site measurement infrastructure and the analytical capability (i.e. data exploitation platforms) to manage the data.
For SIF satellite mission cal/val, a diversity of environmental factors across the involved sites is fundamental. The network needs to cover large gradients of vegetation properties (e.g. SIF emission, biochemical and structural traits) and environmental conditions (e.g. atmospheric composition, irradiance, temperature, water availability). This will enable a thorough analysis of SIF retrieval performance, including the correction of atmospheric effects, and an evaluation of the relations between SIF and ecosystem processes (e.g. productivity, transpiration).
At the individual site level, other requirements become important to ensure the reliability and robustness of the entire network. These include homogeneity of the covered footprint, which reduces the uncertainty caused by spatial variability. Particularly for SIF measurements, site homogeneity might become an issue, since vegetated cal/val sites are by definition heterogeneous and their heterogeneity can even change over time. One possibility to characterize site homogeneity is to frequently evaluate the spatial variation of the remote sensing indices NIRv and NIRvR. These indices were found to correlate with SIF under unstressed conditions (Badgley et al., 2017; Zeng et al., 2019; Zeng et al., 2022) and can be retrieved from high-resolution operational satellite missions (e.g. Sentinel-2). We suggest first exploring the relationship between SIF and NIRv-based indices with airborne data across site conditions before applying it to satellite products. Besides the homogeneity criteria, extensive auxiliary data per site are recommended, including atmospheric measurements or frequent vegetation sampling. Such data facilitate the evaluation of site suitability, and thus representativeness, as well as the analysis and interpretation of validation results.
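A homogeneity screen of this kind can be sketched as follows. NIRv is computed as the product of NDVI and NIR reflectance (Badgley et al., 2017) from red and NIR reflectance bands, and its spatial coefficient of variation is summarised within a window. The synthetic band inputs, the window and the use of the coefficient of variation as a homogeneity metric are assumptions for illustration. NIRvR is not covered here, and no threshold from the study is implied.

```python
import numpy as np


def nirv(red, nir):
    """NIRv = NDVI * NIR reflectance (Badgley et al., 2017)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    ndvi = (nir - red) / (nir + red)
    return ndvi * nir


def homogeneity_cv(values):
    """Coefficient of variation (%) within a window; lower means more homogeneous (illustrative metric)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * np.nanstd(values) / np.nanmean(values)


# Synthetic 50 x 50 pixel window of red and NIR reflectance around a hypothetical site.
rng = np.random.default_rng(2)
red = 0.05 + 0.005 * rng.standard_normal((50, 50))
nir = 0.45 + 0.02 * rng.standard_normal((50, 50))
window = nirv(red, nir)
print(f"mean NIRv = {window.mean():.3f}, CV = {homogeneity_cv(window):.1f}%")
```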
Reliable instrumentation is essential, but outdoor operation of sensitive spectrometer systems is challenging. Field spectrometers are always prone to errors, and environmental stress in particular (e.g. temperature changes, rain, wind) can cause measurement problems (Pacheco-Labrador and Pilar Martin, 2015). It is important to (automatically) detect and trace performance changes, even at very remote field stations. A method for in situ fluorescence uncertainty estimation was presented by Burkart et al. (2015) and could be adapted to current automatic field installations and uncertainty measures. In sensor networks, the consistency of measurements across sites becomes important, particularly if different instruments are used. Therefore, frequent inter-calibration efforts are key and should complement individual site monitoring and calibration efforts.
Besides site and instrumental requirements, analytical infrastructure is an important prerequisite for successful cal/val of SIF satellite measurements. In our study, we used a set of tools, called FluoSpecchio, comprising methods for data transfer and ingestion, homogenized processing, and access to data and metadata. While applying our uncertainty assessment, we drew several conclusions concerning the complications of processing long radiometric time series for SIF retrieval. In the following, we highlight a few important aspects and suggest possible approaches for efficient data processing.
One important feature of FluoSpecchio is the ability to gather, combine, and store data from many research sites. When the radiometric data are stored together with relevant metadata (e.g. quality indicators such as sensor saturation), data can be filtered and requested efficiently. Our processing chain is modular and supports state-of-the-art processing and product retrieval schemes. This modularity makes it possible to quickly account for evolving methodological developments (e.g. new or improved algorithms to derive SIF). The high spectral and temporal resolution of such in situ radiometric observations, distributed across many sites, produces large amounts of data. We encountered a trade-off between optimal data storage (incl. metadata) and performance: the current processing time of FluoSpecchio is not competitive with the native processing provided by the manufacturer. It is thus important to consider the data volume in the design of such systems. We expect that improvements can be achieved by parallelizing workflows and by adopting big-data and internet-of-things technologies (e.g. distributed databases, time series databases, dashboards, logs, monitoring).

Conclusions
Considering our assessment of possible error sources associated with the validation of satellite-based SIF measurements using in situ infrastructure, we conclude that the implementation of cal/val networks for SIF satellite missions is essential but highly challenging. In particular, the dynamics and complexity of the SIF signal and the various factors influencing SIF retrieval accuracy (i.e. instrumental, calibration and methodological effects) place high demands on the completeness and suitability of evaluation strategies for successful SIF validation.
The list of requirements for SIF cal/val networks is long, and the resulting big-data challenges can best be tackled with dedicated analytical tools for harmonized data collection, storage, processing and analysis. We demonstrated the added value of harmonized data processing, even if the applied checks are not always traceable to SI units or are of a relative nature. We strongly recommend investing in the further development of data exploitation platforms to eventually enable a comprehensive uncertainty assessment of in situ SIF measurements when applied to satellite-based SIF validation. We suggest consistently applying these tools to define network requirements and to evaluate and frequently monitor the suitability of identified sites. This would ensure the quality and comparability of in situ SIF measurements for satellite validation.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments
Bastian Buman was financially supported by the European Space Agency (ESA) within the AtmoFlex project (ESA/Contract No. 4000122454/17/NL/FF/mg). Marco Celesti was supported by ESA within a Living Planet Fellowship (ESA/Contract No. 4000125442/18/I-NS). This work was partly supported by the project 19ENV07 MetEOC-4, which has received funding from the EMPIR programme co-financed by the Participating States and from the European Union's Horizon 2020 research and innovation programme. Airborne data acquisition was supported by the 'Strukturwandel-Projekt BioökonomieREVIER', which is funded by the German Federal Ministry of Education and Research (project identification number 031B0918A). We are grateful to the anonymous reviewers for providing excellent and highly constructive comments to improve this manuscript.

Appendix A. Design and functionality of FluoSpecchio
The analysis of time series of FloX data collected across different research sites ideally requires a fully automated processing including i) data transfer from spatially distributed FloX instruments to a central data storage, ii) data ingestion into a spectral information system, iii) data processing from digital numbers to radiance, to reflectance, and to SIF, iv) ability to handle the large data volumes stemming from several deployed FloX instruments, and v) interfaces to flexibly and programmatically interrogate the information system and extract the data for subsequent analysis.
FluoSpecchio is a first attempt to combine a number of existing software components to meet these requirements. At the heart of FluoSpecchio is the SPECCHIO system (Hueni et al., 2020; Hueni et al., 2009). SPECCHIO is an advanced spectral information system with a client-server architecture and a relational spectral database for data storage. The advantages of SPECCHIO are its rich API, which allows fully programmatic control of the system, and a graphical user interface (GUI) for access, modification, and input/output (I/O) operations.
The various components and their interaction are best understood when considering a) the time axis of the data flow and the related processing levels (see Fig. A1), and b) the different processing routines utilised during the spectroscopy data life cycle (Hueni et al., 2020) of FloX data.

Fig. A1. Overview of the FluoSpecchio system. Acquisition of radiometric measurements during the day is followed by a transfer of data from many (n) sites to a single (1) central storage unit (e.g. at the institute) during the night. On the file server, an observer monitors the incoming data. As soon as a site has finished its data upload, the observer triggers the ingestion process by starting a SPECCHIO client instance. The SPECCHIO client transfers the data to the SPECCHIO database and subsequently starts the processing of the raw data. Each individual processing level (L0 → L2) remains in the database and inherits metadata (MD). Finally, users can request data via the FluoSpecchio downloader (FS) to retrieve NetCDF files on demand for further processing. Processing is done in Matlab (MATLAB, 2019), while the data download is handled in a Python environment (van Rossum and Drake, 2009).
FloX instruments spend the daylight hours acquiring spectral measurements every few minutes and storing data as digital numbers. The night-time presents a perfect opportunity to utilize the onboard computer to transfer the daily data files from the local storage card to a server. Spectral files are stored on an SD-card connected to an on-board computer (.CSV, Fig. A1). A scheduler on this computer is used to start the nightly file transmission using an SFTP client (Fig. A1). A connection to our network is established via a VPN tunnel. All files not yet transmitted to our file server are then uploaded to a site-specific raw data directory on the file server (Fig. A1). Alternatively, the data may also be put onto the server manually after visiting the FloX in the field and retrieving the storage SD card.
The server side consists of a file server for raw data upload and backup as well as for the hosting of the SPECCHIO web application server plus the SPECCHIO database (Fig. A1). A Java based loader process (Observer, Fig. A1) is used to monitor the raw file directories on the file server and initiate automatic data ingestion and processing for newly uploaded files by using the SPECCHIO Java client. The loader has a configurable polling time, typically 60 s, defining the time interval between checking for new data. A single loader instance can be easily configured to handle several towers.
Synchronization of upload and data ingestion is implemented with one state text file per spectral tower, whose file name encodes the state. Deadlocks due to concurrent access to the state file by the tower upload and data loader processes are avoided by utilizing the atomic rename operation of the UNIX operating system. The states are defined as shown in Table A1. Consequently, the loader process only starts when the tower has finalized the upload and renamed the state file according to its neutral tower state, e.g. Site_neutral_state(tower).txt.
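The rename-based handshake can be sketched as follows. This minimal illustration relies on Python's `os.rename`, which is atomic on POSIX systems when source and target are on the same file system. Whichever process renames the file first wins, and any process that still expects the old name fails. The state names, the file naming scheme and the helper functions are hypothetical, since the actual states are listed in Table A1 and the loader itself is written in Java.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical state names; the actual states are defined in Table A1.
STATES = ("neutral", "uploading", "ingesting")


def state_path(directory, site, state):
    """Hypothetical naming scheme: one state file per tower, the state encoded in the file name."""
    return Path(directory) / f"{site}_{state}_state.txt"


def current_state(directory, site):
    for state in STATES:
        if state_path(directory, site, state).exists():
            return state
    return None


def try_transition(directory, site, expected, new):
    """Atomically rename the state file from `expected` to `new`.

    If another process has already renamed the file, the source no longer
    exists, os.rename raises FileNotFoundError and the caller must back off.
    """
    try:
        os.rename(state_path(directory, site, expected), state_path(directory, site, new))
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    state_path(tmp, "SiteA", "neutral").touch()  # tower idle
    started_upload = try_transition(tmp, "SiteA", "neutral", "uploading")  # tower begins upload
    loader_blocked = not try_transition(tmp, "SiteA", "neutral", "ingesting")  # loader must wait
    finished_upload = try_transition(tmp, "SiteA", "uploading", "neutral")  # upload finalized
    loader_started = try_transition(tmp, "SiteA", "neutral", "ingesting")  # loader may ingest
    final_state = current_state(tmp, "SiteA")
print(final_state)
```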

Table A1
States of data upload and ingestion in the database.

The loader process initiates a cascade of subsequent processing stages once new data are available. These processing steps comprise a) loading of raw data (L0, Fig. A1), b) radiometric processing (L1, Fig. A1), c) reflectance calculation (L2, Fig. A1), and d) SIF retrieval (L2, Fig. A1). The data ingestion process is a regular file loading routine of the SPECCHIO Java client. It parses the data files and sends the spectral data and their metadata to the SPECCHIO server, where the spectra are inserted as digital numbers and augmented with their metadata. The subsequent processing (L0 ➔ L1 ➔ L2, Fig. A1) is based on object-oriented Matlab code.
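The L0 ➔ L1 ➔ L2 steps for radiance and reflectance can be sketched as below. The radiometric model is a common choice assumed here for illustration: dark-current subtraction, normalisation by integration time and multiplication by a per-band calibration coefficient. The reflectance definition as L ↑ /L ↓ is likewise assumed. The study's processing is implemented in object-oriented Matlab, and the function names are hypothetical. SIF retrieval, the second L2 product, is not covered.

```python
import numpy as np


def dn_to_radiance(dn, dn_dark, integration_time_ms, cal_coeff):
    """L0 -> L1 (assumed model): dark-corrected DN per unit integration time times a calibration coefficient."""
    dn = np.asarray(dn, dtype=float)
    dn_dark = np.asarray(dn_dark, dtype=float)
    cal_coeff = np.asarray(cal_coeff, dtype=float)
    return (dn - dn_dark) / integration_time_ms * cal_coeff


def reflectance(l_up, l_down):
    """L1 -> L2 (assumed definition): L_up / L_down, with L_down as radiance (irradiance / pi)."""
    return np.asarray(l_up, dtype=float) / np.asarray(l_down, dtype=float)


# Hypothetical two-band example with identical dark current and calibration for both channels.
l_up = dn_to_radiance([1100, 2100], [100, 100], 100.0, [0.5, 0.5])
l_down = dn_to_radiance([4100, 8100], [100, 100], 100.0, [0.5, 0.5])
print(l_up, reflectance(l_up, l_down))
```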
Metadata (MD, Fig. A1) defined at L0 level are automatically inherited by higher levels through functions of the SPECCHIO system. Higher levels can add further metadata at their own level or at previous levels, and these metadata are then automatically inherited again. An example of this process is a quality indicator (QI, Fig. A1) called saturation count, which is computed during L1 processing. This QI is inserted at the L0 level, so that spectra exceeding the sensor's saturation limit can easily be identified and excluded.
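The inheritance rule can be illustrated with a small tree of processing levels. Each level resolves its metadata by merging those of its ancestors with its own, and its own entries take precedence. This is a hypothetical data model written in Python for illustration only, not the SPECCHIO API. Apart from the saturation count QI, the metadata keys are invented examples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpectrumNode:
    """One processing level of a spectrum (hypothetical model, not the SPECCHIO API)."""
    level: str
    parent: Optional["SpectrumNode"] = None
    metadata: dict = field(default_factory=dict)

    def effective_metadata(self) -> dict:
        """Metadata of all ancestor levels, overridden by this level's own entries."""
        inherited = self.parent.effective_metadata() if self.parent is not None else {}
        return {**inherited, **self.metadata}


l0 = SpectrumNode("L0", metadata={"instrument": "FloX"})
l1 = SpectrumNode("L1", parent=l0)
l2 = SpectrumNode("L2", parent=l1, metadata={"product": "reflectance"})
# A QI computed during L1 processing is written back to the L0 level ...
l0.metadata["saturation_count"] = 0
# ... and is therefore visible at every higher level.
print(l2.effective_metadata())
```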
The rich API provided by the SPECCHIO web application is then employed in a software called the FluoSpecchio downloader (FS, Fig. A1). The FluoSpecchio downloader, written in Python 3 (van Rossum and Drake, 2009), makes use of the SPECCHIO Java client by means of the JPype software (JpypeOrg, 2020). It allows the user to navigate the database and to download data as NetCDF version 4 files (Rew et al., 1989; Fig. A1), including a customizable selection of metadata attributes (MD). Internally, it makes use of the NumPy (van der Walt et al., 2011), Xarray (Hoyer et al., 2021) and Dask (Dask_Development_Team, 2016) Python libraries. The NetCDF format provides a reliable and convenient experience for the data analyst owing to the augmentation with relevant metadata, fast access times and easy file sharing.

Appendix B. Quality of FloX data used for the evaluation of spatial representation errors
Section 4.4 presents an analysis of the spatial representation error of SIF retrieved from in situ FloX measurements with respect to SIF retrieved from the HyPlant airborne imaging spectrometer. This analysis revealed some differences between FloX and HyPlant based SIF, particularly in the late afternoon (cf. Section 4.4 and Section 5.1). This appendix presents a quality assessment of the FloX data used, with the methods described in this study. Upwelling radiance (L ↑ , grey) and downwelling radiance (L ↓ , black) shown in Fig. B1 follow a bell-shaped pattern, indicating no illumination effects such as shading. SIF (orange) shows a common diurnal pattern, closely following L ↓ and L ↑ particularly in the afternoon, and levels off at around 1.2 mW m − 2 sr − 1 nm − 1 . For the assessment of representation errors in Section 4.4, we used FloX measurements acquired at 11:05, 13:45 and 16:45, corresponding to the HyPlant data acquisitions. The following assessments focus on the FloX data quality within this time span, as indicated by the grey box in Fig. B1.

Fig. B2. … are spectrally averaged signals in the overlapping region of both spectrometers. FLUO data were not convolved to match the spectral resolution of the FULL data.
The assessment in Fig. B2 shows a good correspondence between the L ↓ and L ↑ measurements of both sensors. For L ↓ we observe low Δ rel max values (i.e. less than 2.5%, black line in Fig. B2), while L ↑ shows a slightly larger relative difference (around 5%, grey line in Fig. B2). This is nevertheless smaller than the difference found at the French site (about 8%).

Fig. B3. Reliability of retrieved sun-induced chlorophyll fluorescence (SIF) as depicted by the correlation with absorbed photosynthetically active radiation (APAR) (left). Right: relative difference Δ rel max between retrieved SIF and SIF predicted via APAR.
The relation between SIF and APAR (Fig. B3) shows high agreement (R² of 0.98). The variability increases for higher APAR values as a result of a slight saturation of SIF at higher APAR (cf. Fig. B1). The assessment of Δ rel max reveals differences of less than 12% between retrieved SIF and SIF predicted via APAR. This variability can be associated with noise and other effects causing random variation in SIF (e.g. wind-induced movement of the canopy).

Fig. B4. Mosaic of the normalized difference vegetation index (NDVI) calculated from HyPlant data and its correspondence with in situ retrieved NDVI from a FloX spectrometer system. Left column: HyPlant-based NDVI maps. The orange dot indicates the position of the FloX system, and the white rectangles indicate the different spatial aggregation levels ranging from 4 to 300 m. Right column: mean and standard deviation of HyPlant-based NDVI per aggregation level. The orange dot relates to the smallest aggregation level (2 × 2 pixels surrounding the FloX system), and the black dots to aggregation distances between 60 and 300 m. The orange bar with dotted lines (σ FloX ) indicates the temporal NDVI variability (mean ± standard deviation) as derived from the FloX during the HyPlant acquisitions.
The analysis in Fig. B4 indicates that HyPlant-based NDVI is very stable over the course of the day. As expected, the variability in NDVI increases with increasing spatial aggregation distance. For the best and second-best spatial matches (i.e. the two smallest aggregation distances), the agreement with the FloX measurement is highest.