How well do recently reconstructed solar-induced fluorescence datasets model gross primary productivity?

The collection of various long-term reconstructed solar-induced fluorescence (SIF) datasets derived at a range of spatio-temporal scales provides new opportunities for modelling vegetation dynamics, in particular, gross primary productivity (GPP). Information about the proximity of the reconstructed SIF (SIFr) datasets to GPP across land cover types and climatic conditions provides important support for a better application of these products for modelling applications. We conducted a multiscale analysis of four different long-term (12 years, 2007 – 2018) high-resolution global SIFr datasets (0.05 ◦ × 0.05 ◦ ), namely – CSIF (Contiguous SIF), GOSIF (Global OCO-2 SIF), LUE-SIF (Light Use Efficiency SIF), and HSIF (Harmonized SIF) - at 4-day, 8-day, and monthly time scales and found that for the majority of sites, the SIFr is linearly related to ground-based GPP measurements with the eddy covariance method. While the relationship between SIFr and GPP (i.e., the slope - GPP/SIFr) varied significantly across the SIFr datasets, sites, and land cover types, all four SIFr datasets were unequivocally a better predictor of GPP compared to remotely sensed vegetation indices – NDVI (normalized difference vegetation index) and EVI (enhanced vegetation index), sensed by the MODIS satellite. Furthermore, we also analyzed SIF-GPP relationships during drought vs non-drought conditions and found that for about 30% of the sites, comprising mostly non-forests site, the SIF-GPP relationship became weaker (decreased R 2 ) with a lower slope during drought conditions compared to non-drought conditions. Among the four different products, the CSIF (at 4-day timescale) and GOSIF (at 8-day timescale) predicted GPP better compared to LUE-SIF and HSIF across all land cover types. Owing to their long-term availability (since 2000 for CSIF and GOSIF), these SIFr datasets combined with proxies of ecosystem properties can be used to appropriately capture vegetation dynamics and the interannual vari-abilities across a wide range of climatic conditions.

The satellite-based SIF (represented for broader spatial scales) and canopy level GPP are known to show linear relationships with each other over longer time scales (daily or longer) (Frankenberg et al., 2011;Li et al., 2018;MacBean et al., 2018;Parazoo et al., 2014). However, at leaf-level and short time scales (hourly), the SIF-GPP relationship becomes non-linear (Damm et al., 2015;Yang et al., 2017;Liu et al., 2022;Li and Xiao, 2022). Nevertheless, few studies highlighted the use of nonlinear relationships, in the form of mechanistic and statistical relationships to better model satellite-based SIF and GPP (Guan et al., 2016;Liu et al., 2022). Guan et al. (2016) for example, proposed a simplified mechanistic approach to model GPP via SIF and electron transport rate (ETR) relationship which considers different photosynthetic pathways, temperature, and CO 2 concentration. Recently, Liu et al. (2022) theoretically derived an exponential GPP-SIF relationship and demonstrated its advantage compared to the linear SIF-GPP model at a daily timescale using SIF data derived from the latest TROPOMI satellite. Furthermore, the relationship between SIF and GPP can vary strongly across land cover types (Peng et al., 2020;Zuromski et al., 2018) and environmental conditions, namely plant water stress level (Chen et al., 2021a;Martini et al., 2022). Plant water stress level plays an important role because the dynamic between three major pathways of light absorbance by leaves (i. e., fluorescence (SIF), photochemical quenching (PQ; leads to GPP), and non-photochemical quenching (NPQ; heat dissipation) is largely driven by environmental conditions including plant water stress. Drought conditions increase plant water stress levels and result in the increase of NPQ (heat dissipation) which results in a non-proportional change in SIF and PQ, thereby making the SIF-GPP relationship non-linear (Helm et al., 2020). The change in the SIF-GPP relationship due to water stress has mostly been reported based on tower-based/air-borne SIF retrievals over a shorter timescale (half-hourly) Maguire et al., 2020;Marrs et al., 2020;Martini et al., 2022), but fewer studies focused on satellite-based SIF (Song et al., 2021). It is therefore important to analyze if and how the SIF-GPP relationship changes during drought vs. non-drought conditions based on the SIFr datasets. The improved understanding of the SIF-GPP relationship for different vegetation types and water stress levels (classified as drought and non-drought) would enable us to model global/regional GPP based on SIFr more reliably and across a wider range of environmental conditions.
In this study, we first explore different SIFr datasets and compare different relationships with GPP (linear, non-linear) across different land cover types. Secondly, we compare the performance of different SIFr datasets with MODIS-VIs for modelling GPP. Finally, we explore the variation of the SIF-GPP relationship under drought vs. non-drought conditions. It is important to emphasize that it is not the scope of this study to determine the best SIFr dataset (in terms of their absolute values) which needs to be done by validation of the SIFr datasets with global network(s) of tower-based SIF measurements and it is a work in progress (Wen et al., 2020). Instead, this study explores the ability of available SIFr datasets to perform the very task they were reconstructed for, i.e., how well they model GPP, and provide a sound conclusion on the usability of these SIFr datasets as a direct and reliable proxy for GPP.

Data and methods
For comparison purposes, we selected four out of seven global SIFr datasets (CSIF, GOSIF, LUE-SIF, and HSIF) that had a similar spatial resolution (0.05 • × 0.05 • ) and overlapping long-term temporal coverage (12 years, 2007-2018) ( Fig. 1 and Table 1). We used vegetation indices (VIs) derived from MODIS namely -NDVI, EVI, and NIRv (near-infrared reflectance of vegetation), to compare with the SIFr datasets. NIRv is calculated as a product of NDVI and reflectance from the NIR band. The ground-based GPP data were obtained from two sources, namely the FLUXNET2015 Tier 1 dataset and the ICOS drought-2018 EC dataset. Detailed description of these datasets is given in the following sections.

Contiguous SIF (CSIF) dataset
The global spatially contiguous solar-induced fluorescence (CSIF) dataset has been generated by (Zhang et al., 2018a) by training an artificial neural network (NN) with MODIS surface reflectance and SIF derived from the OCO-2 satellite to create a gap-filled OCO-2 SIF dataset ( Table 1). The trained NN model was also used to generate SIF data during the MODIS-era (since 2000) at a 4-day temporal resolution and spatial resolution of 0.05 • × 0.05 • . In this study, the most recent clearsky instantaneous CSIF (version 2) data with 4-day resolution from 2007 to 2018 were used. The CSIF data were obtained from the OSF data repository (Zhang, 2021). In comparison to the original OCO-2 SIF, the clear-sky CSIF product has showed about an 11% overestimation and an RMSE of 0.077 Wm − 2 μm − 1 sr − 1 . Upon preliminary assessment of CSIF with the GPP data from 40 EC flux towers, Zhang et al. (2018a) reported a median R 2 of 0.69 (and RMSE of 1.67 gCm − 2 day − 1 ) of the CSIF-GPP linear relationship, which was observed to vary across sites and further investigation was recommended (Zhang et al., 2018a).

Global OCO-2 SIF (GOSIF) dataset
The global OCO-2 SIF (GOSIF) dataset has been generated by training a Cubist Regression Tree model to gap-fill OCO-2 SIF data using MODIS EVI (enhanced vegetation index) and meteorological reanalysis MERRA-2 (Modern-Era Retrospective analysis for Research and Applications) data of PAR (photosynthetically active radiation), VPD (vapor pressure deficit) and air temperature (Table 1)   Global maps of mean July month SIF (2015-2018) from two original satellite-based SIF retrieval -(a) instantaneous GOME-2; (b) instantaneous OCO-2 SIF; and four SIFr datasets -(c) instantaneous CSIF; (d) daily-corrected GOSIF; (e) daily-corrected LUE-SIF; (f) daily-corrected HSIF used in this study. Original GOME-2 SIF and SIFr datasets are available from 2007 to 2018, whereas Original OCO-2 SIF is available from September 2014 onwards. Original OCO-2 and SIFr datasets come with a land filter, whereas original GOME-2 SIF has inherited noise over water bodies. Please see section 2.1 for details of the SIFr datasets. Units of all SIFr datasets are in Wm − 2 μm.sr − 1 .

Table 1
Details of four global gridded reconstructed-SIF (SIFr) datasets compared in this study. LUE-SIF and HSIF methodology included physiological and regional constraints, whereas CSIF and GOSIF were entirely data-driven. The spatial resolution of all the four SIFr datasets is 0.05 • × 0.05 • (degrees). Machine Learning with regional constraint using MODIS reflectance data (Wen et al., 2020) * Data used in this study are from 2007 to 2018 for all the SIF products. Abbreviations are explained in section 1.
CSIF, the GOSIF dataset has also been generated for the MODIS-era but has a temporal resolution of 8-days and represents daily SIF rather than instantaneous SIF (of CSIF). The conversion from instantaneous to daily SIF is based on a conversion factor depending on the solar zenith angle and short-wave radiation at the time of satellite overpass (Hu et al., 2021;Zhang et al., 2018b). Li and Xiao (2019a) demonstrated a strong linear relationship (R 2 = 0.73) between GOSIF and EC tower-GPP data from 91 flux sites at 8-day resolution. In this study, the GOSIF (version 2) dataset were used, which were obtained from the Global Ecology Group Data Repository (https://globalecology.unh.edu/data/GOSIF.html).

Light use efficiency SIF (LUE-SIF) dataset
The LUE-SIF dataset has been generated by Duveiller et al., (2020) based on GOME-2 SIF data using an improved and updated downscaling methodology based on a light-use efficiency (LUE) model for GPP (Monteith, 1977) and semi-empirical relationships between GOME-2 SIF and different explanatory variables (NDVI, EVI, and NIRv), evapotranspiration, normalized difference water index (NDWI), and land surface temperature (LST) all derived from MODIS (Table 1) (Duveiller and Cescatti, 2016). The SIF downscaling function for the construction of the LUE-SIF product has involved three separate functionsnamely, (i) a quadratic function of vegetation index (NDVI, EVI, NIRv) which is downregulated by (ii) a sigmoid function representing water (NDWI, ET), and (iii) a Gaussian function representing thermal stress (LST) (Eq. (3) of Duveiller et al., 2020). The parameters of these functions are meant to provide regional and physiological constraints to the LUE-SIF. Similar to GOSIF, the LUE-SIF dataset has been generated at an 8-day temporal resolution representing daily SIF, but has a temporal coverage of 2007-2018, as the GOME-2 sensor started operating in 2007. In this study, LUE-SIF (version 2) dataset based on the JJ retrieval (Joiner et al., 2013) method was used as it has shown lower bias upon validation with OCO-2 SIF, compared to the PK retrieval method Köhler et al., 2015), and has been corrected for the decreasing trend in GOME-2 SIF due to sensor degradation (Parazoo et al., 2019;Zhang et al., 2018c) (see Fig. S1).

Harmonized SIF (HSIF) dataset
The HSIF dataset was constructed by Wen et al., (2020) by combining the GOME-2 SIF data and the SCIAMACHY SIF data through downscaling and using a cumulative distribution function matching. The downscaling has been done first using a Random Forest (RF) model to establish predictive relationships between SIF and MODIS reflectance data at coarse resolution (0.5 o and 1 o ), and then using the same predictive relationships to estimate daily SIF (HSIF) at a comparatively finer resolution (0.05 o ) using MODIS reflectance data (at 0.05 o ) at monthly time scales from 2002 to 2018. HSIF has incorporated regionalization constraints following the methodology of Duveiller and Cescatti, (2016) which involves performing separate RF regressions at a regional (or local) scale of 5 o × 5 o spatial windows. The HSIF dataset has been validated with airborne SIF derived from the Chlorophyll Fluorescence Imaging Spectrometer (R 2 = 0.73) and with ground towerbased SIF measurement (R 2 = 0.91). The HSIF (version 2) dataset used in this study was obtained from Cornell University's BOX link (HSIF_v2, 2020). Since the HSIF dataset suffers from the decreasing trend in GOME-2 SIF due to sensor degradation (Parazoo et al., 2019;Zhang et al., 2018c), we used an Ensemble Empirical Mode Decomposition (EEMD) method (Wen et al., 2020) to detrend the dataset and used the detrended HSIF for our analysis (see Figs. S2). More detail about the EEMD method can be found in Wu and Huang (2009).

MODIS datasets
In addition to SIFr datasets, this study also used MODIS vegetation indices (VIs), namely EVI, NDVI, and NIRv to compare it to the SIFr relationships with ground-based GPP data. The NDVI and EVI data were obtained from the standard Level-3 MODIS products -MOD13A1 (TERRA) and MYD13A1 (AQUA) V6 16-day composite product (combined 8-day from both AQUA and TERRA satellite), with a spatial resolution of 500 m (Didan, 2015a(Didan, , 2015b. We used only good quality NDVI and EVI data (variable 16_days_pixel_reliability = 0) for our analysis. The NIRv (product of NDVI and NIR reflectance band) was calculated from the daily MODIS Level-2 NBAR (Nadir BRDF-adjusted reflectance) product at 500 m resolution (MCD43A4, Schaaf and Wang, 2015), and aggregated at a 4-day resolution to match the CSIF temporal scale for further analysis. Furthermore, combined MODIS Level-4 product -MCD15A3H 4-day LAI (leaf area index) and fPAR (fraction of photosynthetically active radiation) data were also used (Myeni and Knyazikhin, 2015).

Ground-based GPP measured with Eddy covariance (EC)
The spatial resolution mismatch between the footprint size of EC data (several meters to several 100 m) and the SIFr pixel size (0.05 o × 0.05 o , approximately 5 km near the equator) makes the use of most of the available EC data challenging. In this study, only EC flux sites that could be representative for the corresponding SIFr pixel (0.05 • x 0.05 • ) were selected. The MODIS land cover data at 0.05 • x 0.05 • comes with information about the 'majority land cover' and 'percentage' of each land cover which is given with the IGBP (International Geosphere-Biosphere Program) code in each pixel. The first criteria were to determine if the majority of the land cover of a MODIS pixel had the same type as the land cover of the EC site. Since land cover change is a slow process, we chose to randomly sample MODIS land cover data for four years (2007, 2010, 2015, and 2018) instead of each of the 12 years (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018). Then for the selected sites from the first criteria, we calculated the relative area (excluding the water bodies where SIF is zero) of the EC site land cover within the MODIS pixel. Site was selected for our analysis if the relative area was >50%. The median relative area of the selected 67 flux sites was 86% as shown in the boxplot of Fig. 2. Out of the available 212 FLUXNET2015 Tier 1 flux sites only 67 were selected as their land cover was comparatively homogeneous (median relative area of 86%; Fig. 2) within the 0.05 • x 0.05 • SIFr pixel. For these 67 sites, the SIFr signal was mostly (if not all) coming from a similar land cover over which GPP was measured. The details of the sites and their location are shown in Fig. 2 and Table S1. Out of the 67 selected sites, data from 24 sites was included in the ICOS Drought-2018 ecosystem eddy covariance flux product (Drought 2018 Team and ICOS Ecosystem Thematic Centre, 2020; https://www.icos-cp.eu/data-products/YVR0-4898) which had EC data available till 2018. Both the ICOS and FLUXNET2015 EC flux datasets were processed using the same standardized processing pipeline (Pastorello et al., 2020). This resulted in a total of 427 site-years of EC data from 2007 to 2018. We used the GPP partitioned based on the standard daytime (DT) method (GPP_DT_VU-T_REF data) (Lasslop et al., 2010) since GPP_DT_VUT_REF is constrained by a minimum of zero (no negative values), to be able to test an exponential SIF-GPP relationship (explained below). Nevertheless, our result should be robust to the method of GPP used (with daytime vs. nighttime partitioning method) as at 4-day, 8-day and monthly aggregation, the DT partitioning methods correlates strongly (R 2 > 0.91) with the GPP obtained from the nighttime partitioning method (Reichstein et al., 2005;Trifunov et al., 2021) as shown in Fig. S3 (a-c). The poor quality GPP data (NEE_VUT_REF_QC = 2, about 10%), were not used in our analysis. Site included in the analysis covered nine different land cover types in total -CRO (cropland -18 sites, 117 site-years), DBF (deciduous broadleaf forest -8 sites, 60 site-years), EBF (evergreen broadleaf forest -7 sites, 26 site-years), ENF (evergreen needleleaf forest -11 sites, 83 site-years), GRA (grassland -15 sites, 84 site-years), MF (mixed forest -

SIF-GPP relationship
Combining the light-use-efficiency model for GPP and SIF, the SIF-GPP relationship can be approximated as shown in Eq. (1) : where LUE GPP is the light-use efficiency for photosynthesis, and SIF yield is the emitted SIF per photon absorbed, which varies with fluorescence yield (Φ F ) and SIF canopy escape probability term (Ω C ) (Damm et al., 2010). Since the second term of the right-hand-side of Eq. 1 ( LUEGPP ΦF ×ΩC ) can be linear or non-linear, we used different types of relationships (one linear and two non-linear) to test the relationship between SIFr and GPP derived from eddy covariance measurements. The linear relationship can be expressed as a simple univariate linear regression model with SIFr as an explanatory variable and ground-based GPP as a response variable as shown in Eq. (2): The coefficient s in Eq.
(2) represents the slope. The intercept is assumed to be zero based on the physical constraint that with no photosynthetic activity (zero GPP), SIFr would approach zero (Li and Xiao, 2022;Sun et al., 2017). The non-linear relationship proposed by Guan et al. (2016) and Liu et al. (2022) were used in this study. A simplified mechanistic approach proposed by Guan et al. (2016) first involves modelling photosynthetic electron transfer rate (ETR) linearly with SIFr (Eq. (3)) and then estimating GPP using ETR and electron use efficiency (EUE; Eq. (4)). EUE depends non-linearly on the photosynthetic pathway of the plant (C 3 or C 4 ), temperature and internal CO 2 concentration (Equations SE1 and SE2; Edwards and Baker, 1993). See Supplementary section S1 for more information.
The second non-linear relationship models SIFr-GPP using an exponential equation with two positive empirical coefficients (a and b) (Liu et al. 2022) as shown in Eq. 5.
Before fitting the SIF-GPP relationship we harmonized all the SIFr datasets for time of day of the measurement and retrieval wavelength. Only CSIF is measured instantaneously, whereas the other three SIFr datasets represents daily corrected SIF. Therefore, we first converted instantaneous CSIF (CSIF inst ) to daily CSIF (CSIF daily ) using the day of time correction (γ) factor based on the solar zenith angle (SZA) as a determinant of incoming solar radiation (Frankenberg et al., 2011;Zhang et al., 2018b) as shown in Eqs. (6).
Where, t0 is the OCO-2 time of measurement and the integration of cos(SZA) from t0 to t0 + 1 denotes a daily integration (negative cos (SZA) is zeroed out). The daily integration is numerically integrated at 10 min intervals. The SZA is a function of local daytime and latitude, therefore γ varies across latitude and day of year. To harmonize SIF retrieved at different wavelengths (CSIF and GOSIF at 757 nm and LUE-SIF and HSIF at 740 nm) we used a simple and robust wavelength scaling factor, given the invariant SIF spectra at far-red wavelengths (see Magney et al., 2019; Parazoo et al., 2019). We multiplied GOSIF and CSIF measured at 757 nm to a wavelength scaling factor of 1.69 (Parazoo et al., 2019) to convert them to GOSIF and CSIF at 740 nm. We aggregated ground-based GPP data based on the temporal resolution of the SIFr data for fitting the three SIF-GPP models at 4-day, 8-day, and monthly resolutions to regress with the SIFr dataset, i.e., daily GPP (daily average) aggregated at 4-day for CSIF, 8-day for GOSIF and LUE-SIF, and monthly resolution for HSIF. The strength of the SIF-GPP relationship was compared based on the goodness of fit of the model  Table 1 and caption of Fig. 1 for more detail about the SIFr data. The correlation values are calculated based on the lowest common temporal resolution, i.e., NIRv with CSIF at 4-day, GOSIF/LUE-SIF/NDVI/EVI with CSIF at 8-day and so on. See Fig. 2 caption for abbreviation of land cover types.
(R 2 ) and root mean square error (RMSE). Finally, we used the best SIF-GPP relationship (among the three mentioned above) to investigate the SIF-GPP relationships across site and year based on all four SIFr datasets and MODIS VIs. To be able to compare the model coefficients across SIFr products we standardized both SIFr and GPP for each site by dividing them by their maximum values.
Owing to the increasing use of SIFr to study drought impact on vegetation as a proxy of GPP, it's also essential to check if the SIF-GPP linear relationship holds during drought and non-drought conditions. For this, the SIF-GPP relationship was investigated during meteorological drought and non-drought conditions. A meteorological drought was identified with the SPEI (standardized precipitation evapotranspiration index) (Vicente-Serrano et al., 2010). The month which showed a 3month SPEI (SPEI-3; represents seasonal drought) or monthly SPEI (SPEI-1; represent agricultural drought) less than − 1 was identified as a drought month (Wable et al., 2018). SPEI was constructed based on the monthly precipitation and air temperature measurements at each flux site (FLUXNET/ICOS data products, see section 2.3). Drought conditions were confirmed by the climatic water balance which was significantly lower than zero (Fig. S4). We excluded croplands from the analysis because different crops during different times could also result in different SIF-GPP relationships. Finally, a total of 36 sites ( Fig. S4; seven DBF, four EBF, ten ENF, nine GRA, three MF, two SAV, and one WET) had at least nine months of meteorological drought during the study period (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018) and were investigated for the drought vs nondrought SIF-GPP relationships (Table S1). All data analyses and visualization were carried out in the statistical software package R (R Core Team, 2020), and extended to packages: ggplot2 (Wickham, 2016), dplyr (Wickham et al., 2019), ggsignif (Ahlmann-Eltze and Patil, 2021) and raster (Hijmans et al., 2020).

Comparison of SIFr and MODIS datasets
Overall, there was a significantly positive correlation among all the SIFr (CSIF, GOSIF, LUE-SIF, HSIF) and MODIS level-3 (NDVI, NIRv, and EVI), and level-4 (fPAR, LAI) datasets (Fig. 3a) indicated by Pearson correlation coefficient (r > 0.5). The correlation was highest among the four SIFr datasets (r > 0.9; Fig. 3a) but among the remote sensing datasets (SIFr and MODIS datasets) correlation varied significantly across different land cover types. The EBF, ENF, OSH, and SAV showed the lowest (and non-significant) positive correlations among some of the datasets (Fig. 3d, e, h, i), whereas DBF showed the highest positive correlation (Fig. 3c). For EBF, ENF, and SAV, all the SIFr datasets showed low correlations with MODIS products, except for NIRv (r as high as 0.91, 0.83, and 0.88 with CSIF, respectively) (Fig. 3). Also, across all land cover types, all the SIFr datasets showed higher correlations with NIRv than with any other MODIS dataset, and among the SIFr datasets, CSIF and GOSIF showed a higher correlation (r = 0.79 to 0.95, mean r = 0.88) with NIRv than LUE-SIF and HSIF (r = 0.38 to 0.96, mean r = 0.8) (Fig. 3). The strong correlation between CSIF and GOSIF was clear across all land cover types (mean r = 0.98; Fig. 3a), with r varying between 0.89 for EBF (Fig. 3d) to 0.99 for DBF and WET (Fig. 3c, j). Furthermore, both CSIF and GOSIF always showed higher correlation with each other (r = 0.89 to 0.99) than with LUE-SIF (r = 0.63 to 0.96 with CSIF, and r = 0.62 to 0.96, with GOSIF) and HSIF (r = 0.34 to 0.98 with CSIF, and r = 0.42 to 0.98, with GOSIF) across all land cover types. Across EBF, OSH, and SAV, particularly HSIF showed a significantly lower than average correlation with other SIFr and MODIS datasets. Similarly, LUE-SIF showed correlations significantly lower than average with other SIFr and MODIS datasets for EBF, ENF, MF, and WET sites (Fig. 3).

Comparison of linear and non-linear SIF-GPP relationships
Except for the EBF sites where none of the models performed well, the linear model explained the relationship between SIFr and GPP better than the exponential model or the model of Guan et al. (2016) for most of the sites and land cover types (Figs. 4, S5, S6 and S7). Between the two non-linear models, the model of Guan et al. 2016 was marginally better for the DBF, ENF, and SAV land covers (Fig. S7). This result shows that a simple linear SIF-GPP relationship might be robust enough for simulating GPP based on SIFr datasets. Furthermore, due to its simple nature and the straightforward interpretation and comparability across sites and land cover types (based on the slope of the model) we used the linear model to explore the SIF-GPP relationships across sites and land cover types.
The linear relationship of GPP with seven datasets (four SIFr and three MODIS-VIs) varied significantly both in terms of strength and slope across sites ( Fig. S8 and S9), land cover types (Fig. S10), datasets (SIFr and MODIS-VIs) (Figs. 5 and 6) and for few sites also across years ( Table 2). The slope was largely the same across different temporal resolutions (4-day, 8-day and monthly) for each of the SIFr datasets with moderate increase in R 2 (ΔR 2 > 0.05) as temporal aggregation increased (Fig. S11). Therefore, the comparison of slope across SIFr datasets was done at their original temporal resolution, whereas the R 2 was compared at the least commonly aggregated time scale for all SIFr and MODIS datasets, i.e., at the monthly timescale. The results of R 2 at original temporal resolution are shown in the supplementary Fig. S12. The dailycorrected CSIF showed improvement in R 2 (increased by 0.075) of the linear CSIF-GPP model compared to instantaneous CSIF (Fig. S5).
Overall, all four SIFr datasets performed better (in terms of R 2 ) in linearly modelling GPP than the traditional MODIS VIs, NDVI, and EVI products (Fig. 5) with ΔR 2 (increase in R 2 ) as high as 0.3 compared to NDVI, and 0.25 compared to EVI. The newly developed NIRv showed similar or lower R 2 in comparison to the SIFr datasets except for CRO and OSH (Fig. 5). Overall, among the SIFr datasets, CSIF (median R 2 = 0.86) and GOSIF (median R 2 = 0.88) performed better than LUE-SIF (median R 2 = 0.78) and HSIF (median R 2 = 0.81) (Fig. 5). These overall differences among the SIFr datasets were also evident across most of the land cover types. The similarity in performance of CSIF and GOSIF was evident across all land cover types except at OSH where CSIF (median R 2 = 0.58) was better than GOSIF (median R 2 = 0.24) (Fig. 5). Furthermore, LUE-SIF showed significantly lower R 2 than one or many of the other SIFr datasets for all land cover types except EBF. Among different land cover types, the highest R 2 were obtained for DBF (median R 2 of >0.90 for CSIF, GOSIF, HSIF, and NIRv, and median R 2 > 0.75 for LUE-SIF, NDVI, and EVI) whereas the lowest R 2 for EBF with median R 2 < 0.36 with all the SIFr and MODIS-VIs. Even at 4-day or 8-day resolution both CSIF and GOSIF showed high performance (median R 2 > 0.80) for forests (except EBF), grasslands, savannas, and wetlands and were comparable to HSIF's performance at a monthly timescale (Fig. 5).
The slope largely varied across sites for each land cover type, with high variance across sites for croplands and grasslands based on the four SIFr datasets (Figs. 6, S8, and S9). These site differences in slope were evident in the difference of slope between most of the land cover types as indicated by all four SIFr datasets (Fig. S10). The slope values were higher for the forests (DBF, ENF, ENF, and MF) compared to non-forests ( Fig. S8-S10). The difference in slope across different SIFr datasets was clear for most of the land cover types except for grasslands and mixed forests (Fig. 6) with the differences mostly between CSIF/GOSIF and LUE-SIF/HSIF. The standardized slope from HSIF and LUE-SIF were mostly not statistically different from each other (except for CRO), indicating similarity in both of these datasets. Some differences between the standardized slope of CSIF and GOSIF were present for CRO, DBF and ENF types (Fig. 6). The variance in the slope across sites of GRA and MF was similar for all four SIFr datasets, resulting in no statistical difference between slope across SIFr datasets for these land cover types (Figs. S8, S9 and 6). Additionally, for most of the sites (>57 out of 67 sites), the SIFr-GPP relationship did not vary throughout the year, indicating a robust site-specific SIFr-GPP relationship. Furthermore, the SIF-GPP linear relationships did not change significantly whether dayor nighttime-partitioned GPP was used (see Fig. S13).

SIF-GPP linear relationship under drought vs. non-drought conditions
Out of the 67 selected flux sites, 36 sites had either agricultural (based on SPEI-1) or seasonal drought (based on SPEI-3) conditions of a minimum of nine months, and thus, allowed for a robust analysis of the GPP linear relationships SIFr datasets under drought vs. non-drought conditions. We first compared two non-linear SIF-GPP relationships with the linear SIF-GPP relationship to test how SIFr and GPP are related during water stress conditions. Our results indicated that even during drought conditions the SIF-GPP relationships remained linear and GPP could be modelled as ΔR 2 from the linear model was >0.08 compared to the non-linear models for all SIFr datasets (Fig. 7). Therefore, we also analyzed how the slope and strength (R 2 ) of the linear SIF-GPP relationships changed during drought.
Since the relationship between SIFr datasets did not change significantly when aggregated across time (Fig. S11), we show the SIF-GPP relationship during drought and non-drought conditions at the original temporal resolution of the SIFr datasets, i.e., 4-day resolution for CSIF, 8-day resolution for GOSIF and LUE-SIF, and monthly resolution for HSIF. Out of the 36 sites, most (26 sites) showed no significant change in the slope during drought conditions in all four SIFr datasets (Fig. 8). Out of those ten sites which showed a significantly different (mostly lower) slope during drought, six were GRA, two ENF, one OSH and one was an EBF site ( Fig. 8 and S14). The DBF, MF, SAV, and WET sites showed similar slopes also during drought conditions (Fig. 8).
In addition to slope, the strength of the linear relationship decreased during drought for ten sites (Fig. 9) with an overall median reduction of R 2 of 0.05, 0.04, 0.04, and 0.03 for CSIF, GOSIF, LUE-SIF and HSIF, respectively. The decrease in R 2 was as high as 0.3 for five grassland sites (Fig. 9) for all the SIFr datasets. Interestingly, LUE-SIF showed an increase in R 2 of as high as 0.2 for five out of ten ENF sites (the slopes did not differ significantly) during drought conditions. In comparison, MODIS VIs showed no significant change in the slope of their linear relationship to GPP during drought conditions for most of the sites (32/ 36) (Fig. S15). Similarly, the strength of the linear relationship of MODIS-VIs with GPP mostly remained similar (median change across all sites ranged from 0.01 to 0.02) during drought conditions (Fig. S16).

Discussion
Using four recently reconstructed SIF (SIFr) datasets (CSIF, GOSIF, LUE-SIF and HSIF) and GPP measured at 67 flux sites (427 site-years) covering nine different land cover classes, we first showed that the linear SIFr-GPP relationships was more robust than a non-linear SIFr-GPP relationship across sites and land cover types (Figs. 4,S6,and S7) and that in non-grassland sites the relationship held also during drought conditions (Figs. 7, 8, and 9). We also showed that the SIF-GPP linear relationship varied significantly across the SIFr datasets, across sites, across land cover types -and for few sites also across years-in terms of slope and strength of the relationship (Figs. 5, 6, S8-S10 and Table 2). Our results also show that the SIFr datasets are a better predictor of GPP across all land cover types than the MODIS-VIs products (Fig. 5).

SIFr vs MODIS-VIs datasets
The differences between SIFr and MODIS-VIs datasets were clearly evident with strong similarities for DBF, GRA, MF, OSH, and WET (Getachew Mengistu et al., 2021;Qiu et al., 2020), and clear differences (weaker or no correlation) for evergreen land cover (EBF, ENF, and SAV). MODIS-VIs, specifically NDVI and EVI, are less sensitive to evergreen land covers compared to SIF, as they capture the structural characteristics of the vegetation (greenness and biomass), while SIF is known to capture both structural and physiological characteristics (i.e., photosynthetic rate) of the vegetation . The lower correlation between SIFr and MODIS-VI for CRO compared to other land cover types can be related to management (high heterogeneity) and different spatial resolution of the compared SIFr and MODIS-VI datasets (0.05 • versus 500 m). The very high similarity of NIRv with SIFr (r > 0.8), can be particularly useful to further downscale these SIFr datasets from 0.05 • × 0.05 • degrees to 500 m (for MODIS) or even 10 m spatial resolution (based on Sentinel-2) for a finer scale monitoring of GPP dynamics (see for example Turner et al., 2020).

SIF-GPP relationship variation
Although mathematically the SIF-GPP relationship (as shown in Eq. (1)) is non-linear, it depends on the spatial and temporal scale of the dataset. When measured at the leaf level for short timescales of hours, the relationship is found to be highly non-linear, but when scaled to broader spatial and longer temporal scales the SIF-GPP relationship tends to become increasing linear (Damn et al., 2015;Guanter et al., 2012). Such a shift from non-linearity to the linearity of the SIF-GPP relationship can be attributed to the mean-field effect of larger scales (both in terms of spatial and temporal) that results in canceling out the non-linear terms (Chen et al., 2021a;Thum et al., 2017). Since these SIFr datasets are available at a resolution of 4-day, 8-day, and even monthly, the longer aggregation periods make the relationship increasingly linear as also reported by Liu et al., (2022) in the case of an increase in temporal aggregation of TROPOMI SIF. This linear simplification of the SIFr-GPP relationship also simplifies the upscaling of SIFr to GPP as well as the evaluation of the spatial heterogeneity of the SIFr-GPP relationship as it is based on one parameter (the slope of the relationship).
The SIF-GPP relationship differs significantly across SIFr datasets with CSIF and GOSIF demonstrating a stronger linear relationship to GPP in comparison to . This difference can be attributed to two reasons. First, differences in the base SIF retrieval used in the reconstruction of the SIFr datasets, and second, the downscaling methodology used for their reconstruction. Both CSIF and GOSIF have OCO-2 SIF retrievals as their base, whereas LUE-SIF and HSIF have SIF from GOME-2 and/or SCIAMACHY as their base. The major Fig. 6. Standardized slope (GPP/SIFr) of the SIFr-GPP linear model for different land cover across different SIFr datasets. Asterisk sign shows significant difference between standardized slope for different SIFr datasets. All the SIFr datasets are in their original temporal resolution (see Table 1). CSIF is daily-corrected. Both CSIF and GOSIF are scaled to wavelength of LUE-SIF and HSIF. For abbreviations, refer to caption of Fig. 2.

Table 2
Significant differences of GPP linear relationships to SIFr datasets across years (p < 0.05, denoted by *) of each site of different land cover types. Number of sites with land cover type is denoted by 'n'. For each SIFr dataset, the number in the bracket represent number of sites that showed significantly different linear relationships across years. differences between these original SIF retrievals are their different spatial resolution (OCO-2: 1.3 × 2.25 km 2 ; GOME-2: at least 40 × 40 km 2 ; SCIAMACHY: 30 × 240 km 2 ), retrieval wavelength (OCO-2: 757 nm; GOME-2 and SCIAMACHY: 740 nm) and the corresponding spectral resolution (OCO-2 ≤ 0.05 nm; GOME-2 and SCIAMACHY = 0.5 nm), and sampling strategy (about 2,000,000 clear sky SIF retrievals for OCO-2 and 80,000 for GOME-2 per month) (Köhler et al., 2015;Sun et al., 2018). OCO-2's higher spatial and spectral resolution and a larger number of SIF retrievals compared to GOME-2 and SCIAMACHY ultimately result in a large quantity of good quality SIF retrievals (more samples with less cloud contamination) of GOME-2 (Sun et al., , 2017 to train the downscaling algorithm of CSIF and GOSIF. Furthermore, owing to the higher spatial resolution of OCO-2 SIF, it segregates different SIF responses from different vegetation types, and thus the reconstructed product carries distinct SIF signals from different vegetation (Sun et al., 2017;Zhang et al., 2018a), whereas SIF from GOME-2 and SCIAMACHY are not resolved enough to capture SIF signals from different land cover types. Any vegetation-specific signals in LUE-SIF and HSIF are due to distinct MODIS reflectance signals for different land cover types. Moreover, the gap-filling of OCO-2 SIF to reconstruct GOSIF and CSIF involves multiple OCO-2 SIF retrievals in the 0.05 × 0.05 • grid size to train the neural network (for CSIF) and cubist (for GOSIF) model to better capture the SIF and MODIS reflectance relationship for reconstructed CSIF and GOSIF products (Li and Xiao, 2019a;Zhang et al., 2018a). The calibration of GOME-2 SIF (only LUE-SIF) and SCIAMACHY (for HSIF) with MODIS data is done at a much coarser resolution of 0.5 × 0.5, but the reconstruction of LUE-SIF and HSIF is done at 0.05 × 0.05 degrees Duveiller and Cescatti, 2016;Wen et al., 2020). This large difference in calibration and reconstruction spatial scale might further dilute the SIF signals from different land cover types in the LUE-SIF and HSIF. This methodological difference in combination with the difference in the quality of the original SIF retrieval (OCO-2 vs GOME-2 and SCIAMACHY) could explain the better performance of CSIF and GOSIF compared to LUE-SIF and HSIF for different land cover types. However, it is difficult to conclude which of the two distinct methods of training algorithm, namely -the solely data-driven gap-filling approach (for CSIF and GOSIF) and physiologically (LUE model) and regionally constraint downscaling approach (for LUE-SIF and HSIF) is a better methodology for the production of a reconstructed product due to the confounding effect from different original SIF retrievals used for these methodologies. The former approach could be more flexible than the LUE model in nonlinear modelling of predictors (MODIS and meteorological data) with SIF retrievals, whereas the latter approach is physically and physiologically more sound (Duveiller and Cescatti, 2016;Wen et al., 2020). Nevertheless, in the case of the SIFr datasets used here, the advantages of OCO-2 SIF could outplay the possible disadvantage of the data-driven method, or the disadvantages of GOME-2 SIF can underplay the advantages of the regionally constraint LUE model. The recent availability of TROPOMI SIF characterized by a nearly daily revisit time, high signalto-noise ratio, and spatial resolution of 3.5 × 5.5 km 2 can serve as an excellent reference for a reconstructed finer scale SIF product as demonstrated by Turner et al., (2020) andGensheimer et al., (2022). We found that there was no significant advantage of including MODIS VIs (NIRv, NDVI, and EVI) or other SIFr datasets in the bivariate linear model (including the linear combinations of SIFr and MODIS VIs) for GPP estimation (data not shown). Since SIF-GPP relationships are affected by vegetation structure, the inclusion of MODIS-VIs (especially NIRv) could act as a canopy-level correction factor  to improve the linear relationship between SIF and GPP. But, this could Fig. 7. Comparison of linear and non-linear SIFr-GPP models during drought conditions for 36 flux sites with seven DBF, four EBF, ten ENF, nine GRA, three MF, two SAV and one WET site. The linear model is shown as per Eq. 2, the Guan et al. model is as per Eq. 3 & 4, whereas exponential model is as per Eq. 5. All SIFr datasets are in their original temporal resolution, i.e., (daily-corrected) CSIF at 4 day, GOSIF and LUE-SIF at 8-day and HSIF at monthly resolution. The 1:1 line is shown as dashed line. For abbreviations of land cover types, refer to caption of Fig. 2. also introduce some level of obliqueness since these SIFr datasets (CSIF and GOSIF) are already trained based on MODIS reflectance products with some level of information about the canopy properties. This might explain no significant improvement in bivariate models in comparison to univariate CSIF and GOSIF models.

SIF-GPP relationship across sites and land cover types
Our result of different SIF-GPP relationships across sites and land cover types (Figs. 6 and S8-S10) are in line with previous research (Chen et al., 2021a;Lu et al., 2018;Peng et al., 2020;Zuromski et al., 2018 among others). Most studies have shown that the SIF-GPP linear relationship varies with vegetation characteristics and environmental conditions (Chen et al., 2021a;Getachew Mengistu et al., 2021;Jiao et al., 2019;Paul-Limoges et al., 2018;Song et al., 2021;Stocker et al., 2019;Wang et al., 2021). The weak relationship between SIF-GPP for evergreen broadleaf forest could be attributed to the complex vegetation structure leading to larger uncertainty in satellite observations and ground-based GPP estimates from EC (Hayek et al., 2018;Li et al., 2018;Madani et al., 2017;Tang and Dubayah, 2017;Zhang et al., 2020).
Furthermore, EBF are mostly present in tropical regions where cloud presence contaminates the SIF and VIs signal from satellite measurements. Additionally, retrieval of VIs suffers from the saturation issue over evergreen forests (ENF and EBF). As SIF is more linked to the photosynthetic properties than VIs, it was expected that the SIFr datasets demonstrated a stronger linear relationship with GPP for ENF and EBF compared to VIs. A large variance in linear SIF-GPP relationship (in terms of slope) across croplands and grasslands could be attributed to year-to-year variability in crop type, species and management practices. A recent study by Gao et al., (2021) that compared the slope of the SIF-GPP relationship based on CSIF and GOSIF reported that the slope in ENF was significantly higher than other land cover types and thereby reported a two-slope scheme (one for ENF and another for non-ENF) for modelling GPP. However, we did not find such two-slope scheme, perhaps because in the first place we had a stricter site selection compared to Gao et al., (2021) who did not account for the landcover heterogeneity (Gao et al., 2021).

Fig. 8.
Comparison of slope of the SIF-GPP linear model for four SIFr datasets (a-d) during drought (x-axis) and non-drought (y-axis) conditions for 36 flux sites with seven DBF, four EBF, ten ENF, nine GRA, three MF, two SAV and one WET site. Horizontal and vertical error bars denote the 99% confidence interval of the slope for drought and non-drought conditions, respectively. Significant difference (p < 0.05) between the slope is assumed when the error bars are away from the 1:1 line (showed in the black dash). All SIFr datasets are in their original temporal resolution, i.e., (daily-corrected) CSIF at 4 day, GOSIF and LUE-SIF at 8-day and HSIF at monthly resolution. For abbreviations of land cover types, refer to caption of Fig. 2 in the main text. Scatter plots with regression lines is shown in Fig. S14.

SIF-GPP relationship during drought and non-drought conditions
In total, we identified at least nine months for each site that were classified as meteorological drought conditions and allowed us to test the SIFr-GPP relationship outside normal conditions. Plant water stress results in the increase of NPQ (heat dissipation), which ultimately results in a non-proportional change in both LUE GPP and fluorescence yield (Φ F in Eq. 1) thereby making the SIF-GPP relationship non-linear (Helm et al., 2020). Here, we found that the SIF-GPP relationship during drought conditions was dominatingly linear (except few of the grassland sites) based on the SIFr datasets. This consistent linearity in the SIF-GPP relationship can be attributed to the mean-field effects at larger scales, where the short-term non-proportional changes in LUE GPP and Φ F would eventually synchronize when aggregated over broader spatial and longer temporal scales (Chen et al., 2021a;Thum et al., 2017), as in the case of SIFr datasets. Increased NPQ during drought conditions results in a higher reduction of GPP compared to SIF, thereby reducing the slope (GPP/SIFr) of the SIF-GPP linear relationship compared to non-drought conditions (Gu et al., 2019) which we observed only in the non-forest sites, including six GRA sites and one OSH site. Forests are characterized by a deeper rooting depth and buffered micro-climatic conditions and therefore have better mechanisms to resist drought conditions compared to grasslands which might lead to lower NPQ under water stress in forests compared to grasslands (Ellison et al., 2017;Teuling et al., 2010). Nevertheless, our results are consistent with a recent study by Song et al. (2021) reporting a consistent linear relation between SIF (SIF_OCO2_005 product) and GPP during drought compared to normal conditions across different biomes in the continental United States.  demonstrated a decreasing trend in the GPP/SIF slope along a cold-wet to a hot-dry moisture gradient which could be due to moisture-regulated stomatal responses but more research is needed to establish this underlying mechanism . Also, in comparison to non-drought conditions, the weaker and significantly different linear relationship of widely used NDVI and EVI with GPP for ENF and EBF could be attributed to lower sensitivity of NDVI and EVI to physiological properties of the canopy (e.g., photosynthesis) compared to structural properties (biomass and canopy cover). Interestingly, the largely consistent SIF-GPP relationship also during drought conditions (Figs. 8 and 9) demonstrates the suitability of these SIFr datasets to be used in drought impact studies Shekhar et al., 2020;Zhang et al., 2019). Future research investigating the SIFr-GPP relationships during different levels of drought severity (moderate vs extreme drought) could further give insight into if and how sensitive the slope of the relationship is to drought severity for different land cover Fig. 9. Comparison of strength of the SIF-GPP linear model (R 2 ) for four SIFr datasets (a-d) during drought (x-axis) and non-drought (y-axis) conditions for 36 flux sites with seven DBF, four EBF, ten ENF, nine GRA, three MF, two SAV and one WET site. The 1:1 line is showed in the black dash. All SIFr datasets are in their original temporal resolution, i.e., (daily-corrected) CSIF at 4 day, GOSIF and LUE-SIF at 8-day and HSIF at monthly resolution. See caption of Fig. 2 for the abbreviation of land cover types. Scatter plots with regression lines are shown in Fig. S14.
types. Future validation of these SIFr datasets with independent airborne SIF and a growing network of tower-based SIF measurement would give insight into the quality of these SIFr datasets in terms of the accuracy of the absolute SIF values.

Conclusions
Availability of long-term time series (up to 20 years) of global reconstructed SIF datasets (SIFr) at comparatively higher spatial and temporal resolution (up to every 4-day) are highly useful for studying and monitoring the productivity (GPP) of terrestrial ecosystems across time and space. Our study shows that the assumption of a linear relationship between SIFr datasets and ground-based GPP measured at flux sites generally holds. The CSIF and GOSIF datasets (available since 2000) have a stronger linear relationship to GPP than other SIFr and MODIS-VI datasets across time (drought and non-drought) and space (land cover types), and thus could be the most direct and reliable proxy for GPP among SIFr datasets. These SIFr datasets showed a higher correlation with NIRv datasets across most of the land cover types and thus can be further downscaled to 10 m -500 m spatial resolution for regional and global GPP modelling, depending on the NIRv resolution (Sentinel-2: 10 m, and MODIS: 500 m). Long-term (about 20 years) time series of SIFr datasets can be used to map the vulnerability of different vegetation types to extreme climatic events, such as droughts and heatwaves that are expected to increase globally in intensity and frequency. With newly (but shorter period) available SIF data from TRO-POMI, TanSat, and OCO-3, long-term reconstruction of these SIF using both data-driven and physiologically-constrained methods can give insights into the methodological differences and relationship of different reconstructed SIF datasets to global GPP.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
The data used in this study is freely available and can be accessed from the links given in Section 2 of the manuscript