Evaluating photosynthetic activity across Arctic-Boreal land cover types using solar-induced fluorescence

Photosynthesis of terrestrial ecosystems in the Arctic-Boreal region is a critical part of the global carbon cycle. Solar-induced chlorophyll Fluorescence (SIF), a promising proxy for photosynthesis with physiological insight, has been used to track gross primary production (GPP) at regional scales. Recent studies have constructed empirical relationships between SIF and eddy covariance-derived GPP as a first step to predicting global GPP. However, high latitudes pose two specific challenges: (a) Unique plant species and land cover types in the Arctic–Boreal region are not included in the generalized SIF-GPP relationship from lower latitudes, and (b) the complex terrain and sub-pixel land cover further complicate the interpretation of the SIF-GPP relationship. In this study, we focused on the Arctic-Boreal vulnerability experiment (ABoVE) domain and evaluated the empirical relationships between SIF for high latitudes from the TROPOspheric Monitoring Instrument (TROPOMI) and a state-of-the-art machine learning GPP product (FluxCom). For the first time, we report the regression slope, linear correlation coefficient, and the goodness of the fit of SIF-GPP relationships for Arctic-Boreal land cover types with extensive spatial coverage. We found several potential issues specific to the Arctic-Boreal region that should be considered: (a) unrealistically high FluxCom GPP due to the presence of snow and water at the subpixel scale; (b) changing biomass distribution and SIF-GPP relationship along elevational gradients, and (c) limited perspective and misrepresentation of heterogeneous land cover across spatial resolutions. Taken together, our results will help improve the estimation of GPP using SIF in terrestrial biosphere models and cope with model-data uncertainties in the Arctic-Boreal region.


Introduction
As a critical part of the global carbon cycle and land carbon sink for atmospheric CO 2 , terrestrial photosynthesis in the Arctic-Boreal region can play a key role in mitigating global climate change (Beer et al 2010, Mishra andRiley 2012). Due to exceedingly high warming trends at high latitudes (Post et al 2019, Walsh and Brettschneider 2019), Arctic-Boreal ecosystems are undergoing more rapid changes than the rest of the world (Box et al 2019, Canadell et al 2021, such as in photosynthetic productivity, growing season phenology, and vegetation composition (Myers-Smith et al 2020). As a result, the future direction and magnitude of terrestrial ecosystem change in these systems has become highly uncertain (McGuire et al 2009, Loisel et al 2021, Zona et al 2022. To better evaluate climate impacts on the Arctic-Boreal region and understand vegetation-climate feedbacks, monitoring the status of Arctic-Boreal terrestrial photosynthesis is essential (Fisher et al 2014).
Plant carbon uptake via photosynthesis at the ecosystem scale, gross primary production (GPP), can only be estimated indirectly from the ground or space. On the ground, tower-based eddy covariance (EC) techniques directly measure net ecosystem CO 2 exchange (Baldocchi 2003), which is then partitioned into GPP and ecosystem respiration. EC towers in the Arctic-Boreal region are unevenly and sparsely distributed in space (figure 1, table 1), which make it difficult to represent the spatial variability of GPP across heterogeneous land cover in the Arctic-Boreal region (Curasi et al 2022, Pallandt et al 2022. EC techniques are also prone to error in complex terrain, which plays an important role in above-ground biomass distributions in the Arctic-Boreal region (Bruun et al 2006, Dobrowski 2011, Riihimäki et al 2017.
Similar to EC towers, satellite remote sensing techniques indirectly infer GPP. An advantage of satellite remote sensing techniques is a more extensive spatial coverage, enabling the comparison of GPP across heterogeneous land cover (Funk et al 2004, Roland et al 2021 and complex Arctic-Boreal terrain (Roland et al 2019). However, satellite remote sensing techniques also have higher uncertainties due to more assumptions made in the derivation of GPP (Tramontana et al 2015. Remote sensing techniques often rely on canopy optical properties that can approximate absorbed photosynthetic active radiation (APAR) by vegetation. The fraction of APAR used for photosynthesis is referred to as light use efficiency (LUE). So, GPP can be derived as Remote sensing GPP products, such as from the moderate resolution imaging spectroradiometer (MODIS) (Running et al 2004, Zhao et al 2005, are primarily derived from the normalized difference in the surface reflectance between red and near-infrared regions, which is a proxy for the fraction of incoming light absorbed by the canopy, or APAR. However, APAR changes alone are not representative of the seasonal cycle in boreal evergreen ecosystems well, as vegetation photosynthetic activity ceases while maintaining light absorbing chlorophyll throughout the season (Bowling et al 2018, Magney et al 2019, Cheng et al 2020. Thus, quantifying variations in LUE is crucial for accurately estimating Arctic-Boreal GPP.
Remote sensing of solar-induced chlorophyll fluorescence (SIF) from space opens up a new possibility to infer GPP remotely (Frankenberg et al 2011, Guanter et al 2012, Sun et al 2017, Li et al 2018, Zhang et al 2020, Turner et al 2021, Li and Xiao 2022. SIF is a small amount of energy emitted from leaf chlorophyll, which is driven by APAR. SIF appears to be a good indicator of the partitioning of APAR between photochemical quenching for photosynthesis and non-photochemical quenching, i.e. LUE (Magney et al 2019, Pierrat et al 2022), especially in challenging environments that are snowy or have low solar angles (Walther et al 2016(Walther et al , 2018. Thus, satellite-based SIF is a promising tool for inferring GPP at the regional scale in the Arctic-Boreal region. Similar to equation (1), SIF can be conceptualized as: where Φ F is the quantum yield of fluorescence, and f esc is the escape ratio of SIF from the canopy ( Thus, the regression slope k can be generalized in different plant functional types to account for varying photosynthetic yields, SIF yields, and canopy structures since it is a function of LUE, Φ F , and f esc : Solving and categorizing k by plant functional types has improved the ability of biosphere models to simulate GPP in temperate regions (Delaria et al 2021, Wu et al 2021. However, the resulting k values from previous studies (Turner et al 2021, Li and Xiao 2022, Liu et al 2022 lack representativeness in the Arctic-Boreal region because they are categorized by general definitions of plant functional types at the global scale, rather than being tuned to the unique vegetation composition and land cover in the Arctic-Boreal region.
Hence, the goal of this study is to quantitatively evaluate the empirical SIF-GPP relationship (equation (3)) and its uncertainty in the context of the Arctic-Boreal region at the regional scale using remote sensing techniques. We chose to focus on the core region Arctic-Boreal vulnerability experiment (ABoVE) domain (www.above.nasa.com;Goetz et al 2011, Griffith et al 2012, Loboda et al 2017, where land cover types have been defined and validated in the context of Arctic-Boreal species and canopy structures (figure 1(a); Wang et al 2019). To obtain extensive spatial coverage we fit the empirical SIF-GPP relationship and solved for k using TROPOMI SIF dc and a state-of-the-art machine learning gridded GPP product (FluxCom RS; Jung et al 2020). To help biosphere modelers cope with the model-data uncertainties (Keenan et al 2011, Xiao et al 2014, we evaluated the goodness of empirically fitted SIF-GPP relationships with Pearson's r 2 values and reduced χ 2 given the uncertainties in both FluxCom GPP and TROPOMI SIF dc . Even though the gridded products are advantageous at regional scales in the Arctic-Boreal regions, the potential systematic biases of gridded products can complicate the understanding of the SIF-GPP relationship (Sun et al 2017). Thus, we addressed four other sources of uncertainties in the SIF-GPP relationship: (a) selection of gridded products, (b) snow contamination in remote sensing products, (c) changing biomass distribution along elevational gradients, and (d) limited perspective and misrepresentation of heterogeneous land cover across spatial resolutions. Here, we present the opportunities and limitations of remote sensing and machine learning tools for studying GPP in the Arctic-Boreal region (section 4.1). FluxCom RS ensembles include 18 members from nine machine learning models and two GPP flux partitioning methods. Using GPP from EC towers as training data (Tramontana et al 2016), all ensemble members of different methods predict GPP with the same set of predictors, including land surface temperature, land cover, the fraction of absorbed photosynthetically active radiation, and normalized difference vegetation index (NDVI) from MODIS land products. We took the standard deviation of the predicted GPP of all ensembles as the uncertainty of FluxCom GPP.

Gridded datasets and their uncertainties
Because the FluxCom RS GPP is predicted by remote sensing products, snow contamination in MODIS products (Cihlar 1996) can propagate into FluxCom GPP. To evaluate the impact of snow contamination on the SIF-GPP relationship, we compared the seasonal trajectory of FluxCom GPP with and without snow filtering. We used the 2018-2019 eight-day MODIS L3 0.05 • global snow cover product MOD10C2 (Hall and Riggs 2021) as a snow filter, which reports the area fraction of snow cover (dimensionless) in each grid cell. The snow cover data in the study area were regridded to the same spatial and temporal resolution as the FluxCom GPP product. Here, we define FluxCom GPP as snow-free when the snow cover is less than 0.1 (figure B.6).
Additionally, the uncertainty of FluxCom GPP can be also due to the extrapolation of trained parameters due to limited EC towers sampling. Jung et al (2020) has developed an extrapolation index (EI) to address this issue by illustrating the total distance of an extrapolated point to the nearest training data in the space of all predictors. Here, we reproduced the multi-year average (2001-2018) of annual mean EI and its seasonal range in the study domain to qualitatively examine the representativeness of FluxCom GPP.

TROPOMI SIF
We gridded individual SIF soundings from TRO-POMI at 740 nm between 2018 and 2019 in the study area to the same spatial and temporal resolutions as FluxCom GPP (appendix A). Because satellite-based SIF is an instantaneous value indicative of the light condition at the time of measurement, the daily mean SIF, SIF dc , was scaled from the instantaneous measurement using a length-of-day correction factor based on the diurnal cycle of solar radiation (Köhler et al 2018). To account for varying numbers of soundings across grids, we took the standard error of SIF dc from individual soundings falling in each grid cell as the uncertainty of TROPOMI SIF, which is derived as the standard deviation divided by the square root of the number of soundings.

Orthogonal distance regression
With snow-free FluxCom GPP and TROPOMI SIF dc as well as their uncertainties, we fit the linear model in equation (3) without an intercept using the orthogonal distance regression (Boggs et al 1989) for each grid cell, where the regression slope k, Pearson's r 2 , and reduced χ 2 were computed.
Previous studies (Liu et al 2022, Wu et al 2022) have often used Pearson's r 2 as the only metric for explanatory power even though measurement noise can reduce Pearson's r 2 , although the measurements themselves might be accurate but just less precise. Thus, we use both Pearson's r 2 and reduced χ 2 together to evaluate the linear empirical model between GPP and SIF dc from the perspective of correlation (Pearson's r 2 ) as well as the goodness of the fit (reduced χ 2 ). High reduced χ 2 suggests the linear model is underfitting the data. When reduced χ 2 is lower than 1, it suggests that the linear model is overfitting the given uncertainties on FluxCom GPP and grid TROPOMI SIF dc . A reduced χ 2 around 1 represents a good fit, regardless of Pearson's r 2 value.

Arctic-Boreal land cover map
In the context of Arctic-Boreal species and canopy structures, we categorized the fitted k, Pearson's r 2 , and reduced χ 2 by 15 Arctic-Boreal land cover types based on 2014 ABoVE Land Cover dataset from (Wang et al 2019). The original spatial resolution of the land cover dataset is 30 m × 30 m (LC30M), which we aggregated into 0.083 33 • × 0.083 33 • (LC008333D) grids to align with FluxCom GPP. The land cover pixels of LC30M were counted within each LC008333D grid. The land cover type with the maximal area fraction in the LC008333D grid is defined as the dominant land cover type (figure 1(a)), while the maximal area fraction is defined as the dominant land cover fraction ( figure 1(b)). Heterogeneous land cover is associated with a lower dominant land cover fraction.
Surface water is common in Arctic-Boreal ecosystems (Stow et al 2004, Muster et al 2013. However, NDVI obtained from mixed pixels including both vegetation and surface water is often close to that of vegetation only. Because water surfaces are very dark (Jiang et al 2005), few of the reflected photons measured from space emanate from water surfaces. To estimate the influence of the underestimated surface water on FluxCom GPP which uses NDVI (Tramontana et al 2016), we calculated the area fraction per LC008333D grid occupied by wetland land cover types including Fen, Bog, and Water. Here, we neglected Shallows/littoral land cover type as it is nonvegetation dominated and dominates less than 0.1% of all LC008333D grids.

Topography
We decomposed the resulting k, Pearson's r 2 , and reduced χ 2 as a function of elevation. The elevation data in the study area were obtained from the USGS Global 30 Arc-Second elevation dataset (GTOPO30; Earth Resources Observation And Science (EROS) Center 2017). We regridded the elevation data to the same spatial resolution as FluxCom GPP using Google Earth Engine (Gorelick et al 2017).

Ground-level GPP and SIF
Due to highly heterogeneous land cover (Myers-Smith et al 2020, Wang et al 2020) in the Arctic-Boreal region, the SIF-GPP relationships at different observational scales can vary. Satellite footprints often cover a larger area than the footprints of EC towers so the dominant land cover of the two scales may not match despite the satellite footprints centering on the location of towers. To address the difference and correspondence across scales, we compared the observations from towers against satellite pixels of the same land cover types.
We used half-hourly gap-filled GPP data of EC towers from principal investigators (PIs) and the Fluxnet2015 dataset (Papale et al 2015; table 1) in the study area and calculated the daily mean EC GPP. Because of various temporal ranges for different towers, we calculated the multi-year average of daily mean EC GPP at the eight-day interval aligned with the temporal interval of FluxCom GPP. We defined the land cover types for EC towers based on the description of tower footprints from site PIs.
We evaluated the TROPOMI SIF dc data against a tower-based SIF product in CA-Obs (Pierrat et al 2022, Pierrat and Stutz 2022), which is close to our study area but outside the LC map. A 2D scanning telescope measures SIF at 745-758 nm across a canopy representative loop that repeats every half hour, from which we calculated daily mean SIF at eightday intervals. The International Geosphere-Biosphere Programme (IGBP) classification of CA-Obs is evergreen needleleaf forests (ENF). Thus, we used it to benchmark FluxCom GPP and gridded TROPOMI SIF dc in Evergreen forest.

Annualized relationship of SIF and GPP
The 95th percentiles of TROPOMI SIF dc and snowfree FluxCom GPP are not consistent across space (figures 1(c) and (d)), suggesting that the regression slope k is not homogeneous in the Arctic-Boreal region. Tussock Tundra on the North Slope of the Brooks Range has a higher 95th percentile of SIF dc than the surrounding area, while the 95th percentile of GPP is similar to the surrounding area. The 95th percentile of SIF dc is high in the southern portion of our study area, which may be attributed to agricultural land located in southern Alberta and Saskatchewan (Guanter et al 2014).
The dynamic ranges of GPP and SIF dc vary with land covers (figure 2). The growing season maximal GPP is lowest in land covers with lower statures, such as Low Shrub and Tussock Tundra. The growing season maximal SIF dc is often lower than 0.5 mW m −2 sr −1 nm −1 except in Deciduous Forest, Woodland, Tall Shrub, and Herbaceous.
In Woodland, the linear SIF dc -GPP relationship splits (figure 2(d)) because Woodland is a heterogeneous land cover type coexisting with other land covers (Wang et al 2019). Thus, the SIF dc -GPP relationship of Woodland contains the features of both high-and low-statured land cover types.
The linear correlation of GPP and SIF dc from gridded products is comparable to tower-based measurements (figure 2). Except for Evergreen Forest and Fen, where the maximum EC GPP is lower than FluxCom GPP, FluxCom GPP may be overestimated. EC GPP can be negative during winter, which is an artifact of the flux partitioning (Hagen et al 2006, Wutzler et al 2018. The daily mean SIF from the tower-based instrument in CA-Obs nicely falls in the dynamic range of TROPOMI SIF dc (figures 2(a) and C.7).

Spatial patterns of the SIF-GPP relationship
The spatial distribution of the resulting regression slope k ( figure 3(a)) is primarily a function of land cover types ( figure 1(a)). Similar to figure 2, k is higher in Evergreen forest, which is in the southwest part of the study area, and lower in Tussock Tundra on the North Slope of the Brooks Range.
The correlation between SIF and GPP (Pearson's r 2 ; figure 3(b)) depends on the synchrony of the seasonal trajectories of SIF dc and GPP. Most of our study  eight-day TROPOMI SIF dc (mW m −2 sr −1 nm −1 ) and FluxCom GPP (gC m −2 day −1 ) in the study area. The color of hexbins represents the pixel recurrence at a given pair of SIF dc and GPP bins. The white solid contours are the 95 percentile of the recurrence when the snow cover is less than 0.1 (snow-free). The white dashed contours are the 95 percentile of the recurrence when the snow cover is greater than 0.1. The dotted white contour in (d) is the 95 percentile of grids where the dominant land cover (a.k.a Woodland) fraction is greater than 80%. The gray line is the mean SIF-GPP relationship with the math expression noted in each land cover type. The blue triangles are the multi-year average of eight-day tower-based daily mean SIF and daily mean EC GPP from CA-Obs, whose regression slope k is written in blue. The green crosses are the multi-year average of eight-day TROPOMI SIF dc and daily mean EC GPP from all other EC towers according to land cover types. The range of regression slopes of green crosses from different towers is written in green. Panel (o) shows the area fraction occupied by Fen, Bog, and Water.
area has moderate to high Pearson's r 2 ( figure 3(b)), where SIF dc and GPP peak simultaneously across our study area (figures 1(e) and (f)).
In the Sparsely vegetated northeastern part of the study area, Pearson's r 2 is low, and the annual mean EI (figure 3(f)) is high, indicating that the FluxCom models predict GPP in this region with few training samples and thus yield higher uncertainties. The high seasonal range in EI ( figure 3(d)) suggests the extrapolation is more severe in winter than in summer.
The reduced χ 2 is much higher than 1 near glacial lakes in Northern Canada (figure 3(a)) and Deciduous forest, indicating the empirically fitted SIF-GPP relationship is underestimated and does not fully capture the seasonal trajectories in SIF dc and GPP. One possible reason is that most training samples used by FluxCom models in the Arctic-Boreal are not Deciduous forest (figures 1(a) and 3(f)). Thus, the FluxCom models have to extrapolate from training samples that are less similar to the environment of the region so that the FluxCom GPP has a higher error in Deciduous forest.

Overestimated FluxCom GPP in wetlands
Similar to other reflectance-based GPP products (Joiner et al 2018), we found FluxCom GPP may be overestimated in wetlands. In Fen, FluxCom GPP is substantially higher than EC GPP (figure 2(k)) and other non-wetland herbaceous land cover types (figure 2). In Bog and Water, FluxCom GPP is also unrealistically high while SIF dc is around 0. These results suggest a potential overestimation of FluxCom GPP in wetlands.
This bias caused by water is more significant in the area with a high fraction of wetlands ( figure 2(o)), where the annual mean and seasonal range of EI are also high (figures 3(d) and (f)).

Topographic impact on the SIF-GPP relationship
There is a topographic dependence of k and Pearson's r 2 . k (Pearson's r 2 ) is higher (lower) along the Brooks Range, the Mackenzie River, the Alaska Range, and the north end of the Rocky Mountains (figures 3(a)-(c) and (e)). Meanwhile, the reduced χ 2 is mostly around 1 across topography, suggesting the fitted SIF-GPP relationship is reliable.  Roland et al 2021). For example, when Evergreen forest becomes more abundant, k is higher between 1000 and 1500 m in elevation. Above the tree line (∼1500 m), k drops as the fraction of grid composed of Evergreen forest reduces. The highest k in Evergreen forest is obtained at a 2000 m elevation which can be noisy because the reduced χ 2 is much less than 1 suggesting the linear model overfits the data.

Snow contamination and snow impact on the SIF-GPP relationship
FluxCom GPP is occasionally unrealistically high during winters, when SIF dc is around zero (figure 2). We found that this is a sign of snow contamination, especially in the land cover types with lower canopy heights, such as Low shrub, Herbaceous, and Tussock Tundra (figures 2(e), (h) and (i)). After snowy pixels were filtered, the distribution of TROPOMI SIF dc and FluxCom GPP is more towards linear.
Although the change in resulting k due to snow filtering is small, snow filtering has substantially improved the goodness of fit by increasing Pearson's r 2 and/or pushing the reduced χ 2 towards 1 (figure B.6) across all land cover types and all elevations, especially in low-statured land covers, such as Low shrub, Herbaceous, and Tussock Tundra (figures 4(e), (h) and (i)) where the split distribution pattern due to snow contamination is observed in figure 4. In forests (figures 4(a)-(c)), although Pearson's r 2 decreases, the reduced χ 2 has been improved by approaching 1.

Opportunities for remotely evaluating GPP seasonality in the Arctic-Boreal region
We reported and evaluated the SIF-GPP relationship in the context of Arctic-Boreal land cover types at the regional scale. The extensive spatial coverage of our study and validation from EC GPP and towerbased SIF data underscores the potential of using remote sensing and machine learning techniques in the Arctic-Boreal region if remote sensing data are carefully filtered for snow contamination.
Benefiting from the extensive spatial coverage, FluxCom GPP and TROPOMI SIF fill the gaps in land cover types that are too remote to be extensively sampled by ground-based measurements (Virkkala et al 2022) or in complex terrain where eddycovariance techniques are challenging to apply (Paw U et al 2000, Baldocchi 2003).

Uncertainties in the SIF-GPP relationship in the Arctic-Boreal region
Contrasting to a universal k for all land cover types solved in Sun et al (2017), Li et al (2018, and Li and Xiao (2022), we found it is challenging to find a one-model-fits-all approach to estimate GPP using SIF dc in the Arctic-Boreal region, especially across multiple land cover types or even within the same dominant land cover types. The heterogeneous land cover and complex terrain in the Arctic-Boreal region further complicate interpreting the fitted SIF-GPP relationship and resulting k values. The elevational and spatial gradients of sub-pixel land cover contribute to the uncertainties of k among the pixels of the same dominant land cover types. For future studies, comprehensive sampling of the physiological traits (such as LUE, Φ F , and f esc ; equation (4)) across land covers can help mechanistically explain the variations in k.
Another source of uncertainties in the SIF-GPP relationship is the temporal variability due to seasonally biased sampling of remotely sensed SIF and GPP. Because both TROPOMI SIF and MODIS data used in FluxCom GPP are derived from optical measurements, the large seasonal fluctuations of the solar radiation in the Arctic-Boreal regions lead to seasonal variabilities of valid soundings (Cheng et al 2022) and uncertainties. Our study provided both Pearson's r 2 and reduced χ 2 to help biosphere modelers use the resulting k judiciously considering the uncertainty of both SIF and GPP as well as the linearity between SIF and GPP.
The asynchrony of SIF and GPP can also deteriorate the linearity of SIF-GPP relationship. Because SIF contains the information of both APAR and LUE, the seasonal trajectory of SIF may deviate from the reflectance/APAR-based GPP products (such as Flux-Com GPP) (Walther et al 2016, Maguire et al 2021. Long-term and continuous EC GPP can help better constrain the temporal uncertainty in remote sensing-based GPP products. It is worth noting that complex terrain may cause high uncertainties in TROPOMI measurements (Turner et al 2020) and inaccurate length-of-day correction factors in SIF dc (Cheng et al 2022, Köhler et al 2018, leading to larger uncertainties in the SIF-GPP relationship. Fortunately, these impacts are negligible in this study since there are no missing samples due to topography (figure A.5), and the footprint of TRO-POMI soundings (5 km × 3.5 km at nadir, or up to 14 km at the edges of the swath) are large enough to average out the topographic impact on the length-ofday correction factor (Cheng et al 2022).

Variability of k values across latitudes and data products
Due to the non-uniform spectral shape of SIF, our k values are only suitable for estimating GPP with SIF measurements at 740 nm and not comparable to the k values evaluated by SIF at different wavelengths (Guanter et al 2012, Sun et al 2017, Li et al 2018, Zhang et al 2020. Compared to the studies using the same TRO-POMI SIF (Turner et al 2021, Li and Xiao 2022, Liu et al 2022, our study yields much higher k values, especially in high-statured land cover types. Since those previous studies (Turner et al 2021, Li and Xiao 2022, Liu et al 2022 mostly focus on lower latitudes, the disagreement in k of the same land cover types may indicate different vegetation composition, photosynthetic productivity, fluorescence yield, sub-pixel variability, and/or canopy openness across latitudes (Prock and Körner 1996, Kreyling 2020, Crous et al 2022 as suggested in equation (4).
The different k across spatial scales (figures 2 and C.7) and between our results and previous studies (Turner et al 2021, Li and Xiao 2022, Liu et al 2022 can also be attributed to the inconsistency between FluxCom and EC GPP (Sun et al 2017). Next, we will discuss the potential biases in Fluxcom GPP.

Snow contamination
Although the original FluxCom GPP product has already removed some snowy pixels by using MODIS quality flags (Jung et al 2020), we found some snow contamination still exists (figure 2). In this study, we used a more conservative snow filter (<0.1) to showcase the snow contamination in FluxCom GPP propagated from remote sensing products (Jin et al 2017, Myers-Smith et al 2020. More importantly, our results suggest that quantitative and standalone information on snow coverage in addition to quality flags is helpful for improving future machine learning products (Chen et al 2018).
Snow contamination does not impact all land cover types equally. Low-statured land cover types are more likely to have unrealistically high FluxCom GPP before the growing season starts (figure 2). Thus, the universal snow filter we used in this study may be too conservative. For future studies, rigorous validation of snow measurements at regional scales will greatly improve canopy radiative transfer simulations and optical remote sensing retrievals at the Arctic-Boreal region (Kobayashi et al 2007, Kobayashi and Iwabuchi 2008, Chen et al 2018.

Underrepresented water
Contrary to attributing the high k values in wetlands to underestimated SIF (Chen et al 2021), our results suggest the unrealistically high FluxCom GPP is the reason for high k values in wetland land cover types. FluxCom GPP has been overestimated because NDVI of surface water in mixed pixels with both vegetation and surface water is understated (Jiang et al 2005, 2006, Huemmrich et al 2021. Using near-infrared reflectance of vegetation for FluxCom models may better account for the dark surface water reflectance than NDVI and improve the SIF-GPP relationship . This bias further compounds the uncertainty due to a lack of sampling as high EI and high wetland area fractions collocate. Taken together, these two issues can limit the application of FluxCom GPP in the Arctic-Boreal region (figure 2(o); Stow et al 2004, Muster et al 2013.

Extrapolation of training data
Because the spread in FluxCom GPP ensembles may not fully represent the disagreement between Flux-Com and EC GPP when there are few EC towers as training samples for FluxCom (Pallandt et al 2022), the resulting k values may be more reliable where FluxCom and EC GPP are similar (such as Tussock Tundra and Low Shrub; figures 2(e) and (i)) than the ones where the FluxCom GPP is substantially overestimated (such as Evergreen forest and Fen; figures 2(a) and (k)).
Nevertheless, there is a time mismatch between FluxCom GPP and EC GPP (table 1) in this study, where the inter-annual variability of GPP seasonality is ignored. In future studies, more active EC towers with long-term record of GPP are needed to improve FluxCom GPP.

Limitations from heterogeneous sub-pixel land cover
We showed that land cover in the Arctic-Boreal region is highly heterogeneous at sub-pixel. The dominant vegetated land cover types on average occupy less than 50% of the area in each 0.083 33 • × 0.083 33 • grid ( figure 1(b)). Because heterogeneous land cover can blur the distinct SIF-GPP relationship of each individual land cover type (Zhang et al 2020), it is challenging to unmix the contribution of subpixel land cover types at the current spatial scale. This results in a few notable limitations in our study: (a) The land cover definitions of EC towers are different according to 30 m vicinity (LC30M), 0.083 33 • vicinity (LC008333D), and the actual footprint of towers based on PI's descriptions (tower footprint land cover in table 1). The observed vegetation composition and determining factor (physiology vs light absorption) for SIF variability may also shift across spatial scales (Maguire et al 2021), even though the dynamic range of SIF dc amplitude in our study is consistent from ground level to satellite level (figure C.7). As a result, there may be a mismatch of land cover types when we benchmark across spatial scales.

Conclusions
In this study, we evaluated the empirical linear relationship of SIF dc and GPP across the Arctic-Boreal region from the perspectives of Pearson's r 2 and the goodness of fit. Our results show the promise of monitoring Arctic-Boreal vegetation using novel remote sensing tools after careful quality control. For the first time, our study reports the fitted regression slope k as well as the uncertainties of fitted SIF dc -GPP relationship for the land cover types that are unique to the Arctic-Boreal region. The resulting k, Pearson's r 2 , and reduced χ 2 together can help biosphere modelers improve the estimation of GPP in the Arctic-Boreal regions and cope with model-data uncertainties.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https:// data.caltech.edu/records/20216.

Appendix B. Snow filters
We also tested different snow covers as thresholds. We found the snow filter works well for removing snow-contaminated FluxCom GPP and improving the goodness of the fit of the SIF-GPP relationship. For example, US-ICt, a Tussock Tundra site, represents the lower-stature canopies that benefit from the snow filter. Snow filters remove snow-contaminated FluxCom GPP during the growth onset, which has a higher error bar. In this example, a strict snow filter (a smaller value of snow cover) includes fewer data for the regression but improves the goodness of the fit by increasing Pearson's r 2 and pushing the reduced χ 2 towards 1.

Appendix C. Spatial upscaling
For CA-Obs, where we have observations (climatology) of SIF dc and GPP at both tower and gridded scales, we compared the measurements across spatial scales (figure C.7). The seasonality and magnitude of SIF dc across spatial scales are mostly consistent, while FluxCom GPP and EC GPP are not consistent and entirely synchronized. The difference in both the amplitude and timing of seasons between the two SIF products may be attributed to the deciduous trees and understory, which are more visible from space (TRO-POMI) than the tower instrument (PhotoSpec) due to the shallower view angles of PhotoSpec.