Narrow but robust advantages in two-big-leaf light use efficiency models over big-leaf light use efficiency models at ecosystem level

This study aims to (1) investigate whether two-big-leaf light use efficiency (LUE) models (TL) outperform big-leaf LUE models (BL) by incorporating different gross primary productivity (GPP) responses in sunlit and shaded leaves; (2) explore the robustness of using the leaf area index (LAI), clumping index ( Ω ) and spherical leaf angle distribution to partition canopies into sunlit and shaded leaves across canopy architectures; (3) identify optimal light response forms in LUE models. To exclude influences of drivers of GPP other than radiation, we collected various formulations of GPP response functions to temperature, vapor pressure deficit, CO 2 , soil water supply, light intensity and cloudiness index to construct 5600 BLs and 1120 TLs. These models were evaluated at 196 globally-distributed eddy covariance sites from the FLUXNET observational network using the Nash-Sutcliffe model efficiency (NSE), root mean squared error and Bayesian information criterion. Across all sites, the best big-leaf model (BL*; NSE = 0.82) was statistically equal to the best TL (TL*; NSE = 0.84). However, daily dynamics in GPP under hot and dry conditions were best described using TL* in 17% of sites, highlighting the local importance in separating sunlit and shaded leaves. Across approaches to represent effective LAI, the best approach relies on using normalized difference vegetation index with a spherical or flexible leaf angle distri- bution across sites rather than satellite LAI and Ω . We also observed similar performance between non-rectangular hyperbola and reciprocal light response functions across TLs. Models degrade when the maximum LUE is not differentiated between sunlit and shaded leaves, but not when light saturation levels are the same. Despite functional differences, the best five TLs agree in a larger contribution of shaded leaf area to total GPP, resulting from higher LAI and LUE. Overall, these results suggest marginal but robust selection of TL compared to BL.


Introduction
Canopy photosynthesis is the total photosynthesis of all the components, leaves mainly, that can assimilate CO 2 within a canopy (Johnson and Thornley, 1984). The photosynthetic rate of each component is affected by environmental conditions such as absorbed light, leaf temperature, and vapor pressure deficit (Cornic and Massacci, 1996;Farquhar et al., 1980;Wright et al., 2017). Due to the variability of photosynthetic rates across leaves (Baldocchi and Amthor, 2001;Kimura et al., 2020;Terashima and Hikosaka, 1995;Wu et al., 2017), scaling photosynthesis from leaf (or sub-leaf) to canopy is complex (e.g., 3D models, Bailey and Kent, 2021). Hence, an approximation method for calculating canopy photosynthesis, which separates leaves into several groups according to their orientations and vertical positions, is widely used (Bai et al., 2018;Baldocchi and Harley, 1995;Chen et al., 1999;Chen and Zhuang, 2014;Krinner et al., 2005;Leuning et al., 1995;Wang and Leuning, 1998). However, the number of leaf groups determines the complexity of the approximation approach, making it crucial to find a tradeoff between model complexity and efficiency (Hogue et al., 2006). Current canopy photosynthesis models, also known as ecosystem gross primary productivity (GPP) models, include multilayer models, two-leaf models (and two-big-leaf models), and big-leaf models. A multilayer model separates leaves into several vertical layers and leaves at each layer into sunlit and shaded parts (Baldocchi and Harley, 1995;Bonan et al., 2014;Leuning et al., 1995;Mercado et al., 2009;Zhang et al., 2020) or more classes depending on leaf angles and azimuthal orientations (Tol et al., 2009). Since gradients in illumination and physiological properties with canopy depth and leaf angle distribution are considered, multilayer models usually show a robust performance and the highest complexity compared with the other two kinds of models (Bonan et al., 2021). However, it is found that the performance of the multilayer model and big-leaf model for estimating GPP is not significantly different if plant hydraulics is excluded (Wozniak et al., 2020). The significant variability of leaf photosynthesis is between sunlit and shaded leaves (Chen et al., 1999;Pignon et al., 2017;Wang et al., 2018) due to the dissimilar illumination conditions. Sunlit leaves here refer to the leaves under the solar beam, which can absorb direct, sky diffuse, and radiation scattered due to leaf interactions. Shaded leaves can only absorb diffuse and scattered radiation. The photosynthesis of sunlit leaves is often limited by carboxylation capacity and thus gets saturated under intense global radiation (Björkman, 1968;Mathur et al., 2018;Nobel, 1976). By contrast, the photosynthesis of the shaded leaf is usually constrained by the electron transport rate due to limited diffuse light .
To simplify multilayer models and maintain the major cause of variability of leaf photosynthesis within the canopy, two-leaf models and two-big-leaf models, which assume the canopy can be represented by a big sunlit leaf and a big shaded leaf, have been developed (Bai et al., 2018;Chen et al., 1999;Dai et al., 2004;Wang and Leuning, 1998). Two-leaf models use leaf-scale parameters and integrate the leaf photosynthetic rate to the canopy scale Wang and Leuning, 1998). In contrast, two-big-leaf models directly derive parameters for sunlit and shaded canopies separately (Chen et al., 1999). Two-leaf and two-big-leaf models have less complexity than multilayer models (Bonan et al., 2021), and account for diverse illumination conditions and photosynthetic capacities between sunlit and shaded leaves (Chen et al., 2020;Luo et al., 2018).
As the simplest canopy photosynthesis model, big-leaf models do not distinguish between direct and diffuse components of incoming radiation and represent the radiation within the canopy as a single average value Wang and Leuning, 1998). Despite lower robustness than the other two kinds of models (Friend, 2001;Luo et al., 2018), a big-leaf model, which requires the least model parameters, can be the most efficient at the continental or global scale (Ciais et al., 2005;Sellers et al., 1997).
Light use efficiency (LUE) models (Monteith, 1972) are frequently used to estimate GPP, which define GPP as the product of light use efficiency (ε) and absorbed photosynthetic active radiation (APAR). As initially developed at the canopy scale, most LUE models adopt the big-leaf approximation approach (Mäkelä et al., 2008;Peltoniemi et al., 2015;Running et al., 2000;Wang et al., 2017) and neglect the bimodal distribution of radiation between sunlit and shaded leaves (Mercado et al., 2009). To address the effect of diffuse radiation on the GPP, several big-leaf LUE (BL) models have been improved by incorporating a function of cloudiness index (CloudI) to represent the effect of the diffuse fraction on the total canopy GPP (Donohue et al., 2014;Turner et al., 2006;Wang et al., 2015Wang et al., , 2018. Alternatively, the BL model bias due to ignorance of the dissimilar illumination can be reduced using parameter calibration (Carvalhais et al., 2008;Luo and Schuur, 2020;Mäkelä et al., 2008;Peltoniemi et al., 2015;Tian et al., 2020;Turner et al., 2006). Meanwhile, some two-big-leaf LUE (TL) models have been developed by separating ε and APAR into sunlit and shaded leaves (He et al., 2013;Yan et al., 2017). The responses of sunlit and shaded leaves to environmental changes can be accordingly differentiated based on the respective ε and APAR in TL models.
In previous comparisons between BL and TL models (Chen et al., 2020;He et al., 2013;Wu et al., 2015;Yan et al., 2017;Zhou et al., 2016b), the effect of diffuse radiation on GPP of shaded leaves, the leading cause of the difference between these two kinds of models (Wu et al., 2015;Yan et al., 2020), was not considered in BL models. Further, some model parameters determining the model performance and variability of GPP (e.g., parameters controlling the sensitivity of GPP to temperature) were not calibrated in the comparison of BL and TL models (He et al., 2013;Yan et al., 2017). Hence, it is still an open question whether the tradeoff between TL and BL models incorporating CloudI is justifiable, and under which conditions the discrimination between sunlit and shaded components of the canopy brings insight when analyzing the response of GPP to changes in environmental conditions.
Assuming that a TL model outperforms a BL model incorporating CloudI, a possible reason is that the separation of sunlit and shaded leaves addresses the effect of canopy variability which can affect the illumination condition within the canopy. The effective leaf area index (LAI e ), equal to the combination of leaf area index (LAI; Black et al., 1991) and clumping index (Ω; Nilson, 1971), and the leaf angle distribution (Chen et al., 1999;Guan et al., 2021;He et al., 2013) have been widely used to estimate fractions of direct, diffuse and scattered radiation in sunlit and shaded leaves for various canopy architectures. However, current satellite-remote-sensing-based LAI and Ω (Jiao et al., 2018;Myneni et al., 1997;Wei et al., 2016;Xiao et al., 2014;Zhu et al., 2013) products have been estimated using a biome-specific parameterization approach. In many cases, the leaf angle distribution was assumed to be spherical across canopies (Chen et al., 1999;Guan et al., 2021;He et al., 2013;Norman, 1982;Yan et al., 2017). Some of these assumptions can be relevant for the separation of sunlit and shaded leaves which can be affected by biome misclassification or neglecting the variability of canopy parameters within a biome.
Another reason for the better performance of a TL model over a BL model with CloudI can be the differentiated response of instantaneous ε between sunlit and shaded leaves to light and meteorological conditions. The effect of light intensity on GPP (i.e., light saturation) is significant for both sunlit and shaded leaves (Björkman, 1968;Guan et al., 2021;Nobel, 1976), but was neglected in many LUE models (He et al., 2013;Running et al., 2000;Yan et al., 2017). The light responses of GPP can be differentiated between sunlit and shaded leaves (Guan et al., 2021), or considered as a response of the whole canopy as in the BL model (Ibrom et al., 2008;Mäkelä et al., 2008). The light response form can either be reciprocal (Mäkelä et al., 2008) or non-rectangular hyperbola (Ibrom et al., 2008). But, the effect of different ways and forms to represent the light response of sunlit and shaded leaves has not yet been thoroughly assessed. In addition, the contribution to GPP of shaded leaves relative to sunlit leaves, which has an essential role under the fertilization effect of global dimming (Shao et al., 2020), can be biased since it depends on the choice of the response of GPP to meteorological factors (Bao et al., 2022). Overall, the current unknowns and knowledge gaps in modeling formulations motivate the use of globally available ecosystem-level observations to understand the differences in modeling GPP responses to environmental effects between BL and TL models. In particular, this study focuses on the following three objectives: (1) First, this study aims to identify the advantage of TL relative to BL models incorporating CloudI. The first hypothesis is that TL models are significantly better than BL models incorporating CloudI and light intensity at the site level since two-big-leaf models attempt to include the mechanisms of light responses more realistically. We therefore compare TL and BL models (introduced in Section 2.1) for their ability to estimate GPP at 196 eddy covariance sites across the globe from FLUXNET.
(2) The second goal is to test the hypothesis that splitting sunlit and shaded leaves can be estimated using LAI, Ω and flexible leaf angle distributions at various canopy structures. To test this hypothesis, we contrast three approaches using various vegetation indexes to estimate LAI e (Section 2.2). (3) Finally, we aim to identify the light response form of canopy GPP through the comparison of TL and three BL models (Section 2.3).
All models were calibrated and compared based on a method considering different sites and statistic metrics (Sections 2.4 and 2.5) and the best TL model was either selected from the published models or newly formulated (explained in Sections 2.1 to 2.3).
In general, the purpose of this paper is to evaluate different LUE models for estimating GPP from remote sensing products at ecosystem scales, particularly whether a slightly more complex and more mechanistic type of model (TL) has an advantage over a simpler one (BL). We see that the main benefit of the TL model can be compensated by an appropriate sensitivity function that includes cloudiness in BL models. The value and novelty of the approach are that we examine all possible models at the current state of the art and not some subjectively selected. From this comprehensive approach, the "best model" is found out of a very large number of models, and we suggest that the model structure of this model is representing the nature of GPP responses to the examined environmental factors and the vegetation structures in the sampled vegetation zones best. With this reason we justify to use this model as the best available representation to derive sensitivities of GPP to combinations of environmental drivers. We think that this is a very valuable learning on global GPP features which relates very well to the purpose of the paper.

Big-leaf and two-big-leaf light use efficiency models
In LUE models, GPP is calculated as the product of APAR and ε, defined as the maximum light use efficiency (ε max ) reduced by environmental sensitivity functions that control GPP variability between 0 and 1. In our previous study (Bao et al., 2022), we incorporated sensitivity functions of temperature (fT), vapor pressure deficit (fVPD), including CO 2 , soil water supply (fW), light intensity (fL) and CloudI (fCloudI) in LUE models.
The structures of TL and BL models differ in the calculation of APAR and ε max . In a BL model, APAR is the product of the fraction of absorbed photosynthetically active radiation (FAPAR) and photosynthetically active radiation (PAR), and sunlit and shaded leaves share the same ε max (Mäkelä et al., 2008;Running et al., 2000). By contrast, both APAR and ε max are separated to sunlit (APAR sun and ε sun ) and shaded parts (APAR shade and ε shade ) in TL models (He et al., 2013;Yan et al., 2017).
Accordingly, we defined the BL model structure as Eq.

GPP = ε max ⋅APAR⋅f T⋅f VPD⋅f W⋅f L⋅f CloudI
(1) Here the form of fL can be different in BL models and for sunlit and shaded leaves in TL models (compared in Section 2.3). The effect of the diffuse fraction on GPP, fCloudI, was removed in TL models since the effect is assumed to be addressed by the separation into APAR sun (Eq. (3)) and APAR shade (Eq. (4)).

APAR sun =
( PAR dir , PAR dif, and PAR sc refer to the direct, diffuse, and scatter components of photosynthetically active radiation intercepted per leaf area (Chen et al., 1999). PAR dir was derived from global radiation (R g ), the solar zenith angle (θ), and leaf-sun angle, i.e., leaf angle to the direct radiation (β). Additionally, PAR dif and PAR sc were estimated using R g , CloudI, Ω, and LAI. LAI sun and LAI shade are the leaf area index of sunlit and shaded leaves, and both vary with θ and CloudI (see the calculation in Section 2.2). Thus, TL models consider the changes in canopy architecture (e.g., leaf distribution pattern), light condition (e.g., differentiating APAR sun and APAR shade ), and sunlit and shaded leaf areas.
Since LUE models can be affected by the combinations of environmental factors and environmental sensitivity functions, various combinations of fT, fVPD, fW, fL and fCloudI were used in LUE models. We collected six fTs, seven fVPDs, two of which include the ambient CO 2 effect, nine fWs, one fL (more fLs were collected and compared in Section 2.3), and four fCloudIs from literature (see function names and equations in Table S1). In addition, the null hypothesis (no GPP response to an environmental factor) is tested via the introduction of alternative functions that represent no sensitivity of GPP to each of the environmental factors including temperature (T), vapor pressure deficit (VPD), soil water supply (W), light intensity (L), and CloudI were considered. To summarize, we made an ensemble of 5600 (=7 × 8 × 10 × 2 × 5) BL models and an ensemble of 1120 (=7 × 8 × 10 × 2) TL models. Since we adopted three methods to separate sunlit and shaded leaves in TL models (see Section 2.2), 3360 TL models were considered in the model comparison experiment.

Effective leaf area index estimation approaches
The separation of sunlit and shaded leaves relies on the accurate estimation of LAI e , the product of LAI and Ω (Ryu et al., 2010). However, current LAI and Ω products are typically based on the biome-specific parameterization approaches, which neglect the variability in canopy architecture and clumping within a biome and could be affected by leaf separation. Thus, we estimated daily LAI and LAI e based on their empirical relationship with spectral vegetation indices per site to compare with the existing LAI product. Here we selected the ratio vegetation index (RVI) and normalized difference vegetation index (NDVI) from earth observational data collected at the site to substitute LAI based on a linear function (Myneni et al., 1997) and an exponential function (Towers et al., 2019;Xu et al., 2020), respectively.
LAI NDVI = LAI max ⋅NDVI a2 (6) a 1 and a 2 are dimensionless empirical coefficients, and LAI max is the maximum LAI (=10 m 2 ⋅m − 2 ) estimated from MODIS LAI product (see Table 1) and the in-situ LAI (Migliavacca et al., 2021). LAI e was then estimated as follows: , where b is the average Ω per site, and LAI is estimated using RVI or NDVI. These three parameters (a 1 , a 2 and b) were calibrated together with the other LUE model parameters at each site (explained in Section 2.3). The RVI-estimated LAI (LAI RVI ), NDVI-estimated LAI (LAI NDVI ) and LAI-estimated LAI e were applied to calculate LAI sun and LAI shade according to Chen et al.'s study , (1999), which were used to represent the FAPAR sun and FAPAR shade , respectively, in TL models (see the equations and units in Table 1). The APAR sun and APAR shade were the product of PAR sun and FAPAR sun and the product of PAR shade and FAPAR shade (see Eqs. (3) and (4) and definitions in Table 1), respectively. In the equations of LAI sun and LAI shade , k represents the extinction coefficient that depends on leaf angle distribution, e.g., for a spherical leaf angle distribution, k=0.5 (Norman, 1982). While k could be calibrated, k and b would be highly correlated if they are both calibrated due to the factorial association between them. Hence, a spherical leaf angle distribution was  Solar zenith angle rad ∫ tss tsr (sinφsinδ + cosφcosδcosH(t))dt tss − tsr , δ is the solar declination angle, H(t) is the solar hour angle at time t, t ss and t sr are the local sunset time and sunrise time. Brinton et al. (1969) cosβ Average cosine leaf-sun angle -0.5 or calibrated Chen et al. (1999) PAR Photosynthetically active radiation Weiss et al. (1985) PAR dir Direct radiation of PAR  Myneni et al. (1997) FAPAR sun Fraction of absorbed PAR by sunlit leaves =LAI sun estimated using LAI NDVI , in TL NDVI models; =LAI sun estimated using LAI RVI , in TL RVI models; =LAI sun estimated using MODIS LAI and Ω, in TL LAI models Chen et al. (1999), He et al. (2013) FAPAR shade Fraction of absorbed PAR by shaded leaves =LAI shade estimated using NDVI, in TL NDVI models; =LAI shade estimated using RVI, in TL RVI models; =LAI shade estimated using MODIS LAI and Ω, in TL LAI models Chen et al. (1999), He et al. (2013) The gaps in the R g , R p , R n , T, VPD, and Precip were filled with CRUNCEP product (Viovy, 2018) downscaled using a random forest method (Besnard et al., 2019).
FAPAR was represented by NDVI according to their linear relationship in BL models (Myneni et al., 1997) and by LAI in TL models. Thus the units of FAPAR and ε were assumed for all sites. To consider different leaf angle distributions in the canopy, k and cosβ, average cosine leaf-sun angle, controlling the PAR dir , were optimized within the range of 0.1-0.9 in the best LUE model using the LAI product, which does not include b, and in the best LUE model using NDVI and RVI with a fixed b (=0.6). In summary, we applied three methods to represent LAI e : using LAI and Ω, NDVI, and RVI. TL models using these three LAI e datasets, defined as TL LAI , TL NDVI , and TL RVI , were compared separately and with BL models. The selected best TL LAI , TL RVI and TL NDVI were used to identify the effect of the leaf angle distribution on modeled GPP by comparing model performance with calibrated and fixed k.

Light response of gross primary productivity in big-leaf and two-bigleaf models
In BL models, we introduced fL to represent the light response of total canopy GPP. GPP can be saturated under high radiation at the daily scale ( Fig. S1). fL can be an independent form (e.g., TAL model; Mäkelä et al., 2008) or a non-rectangular hyperbola (NRH) form depending on ε max and APAR (Ibrom et al., 2008;Thornley, 1976). In both TAL and NRH light response forms (hereafter named as fL TAL and fL NRH ; see equations in Table 2), the empirical parameter γ controls the curvature of the light response of GPP. In fL NRH , another parameter, the maximum GPP under high radiation (GPP max ), was introduced to represent the saturation level of GPP. Due to more parameters, fL NRH is more flexible than fL TAL (Fig. 1).
To identify the tradeoff between fL TAL and fL NRH , we compared the performance of fL TAL and fL NRH based on the best BL model structure (BL-LUE TAL and BL-LUE NRH1 in Table 2). Further, we analyzed if T, VPD, W and CloudI affect both the saturation level of GPP and ε before saturation (represented by fL NRH , see equation in Table 2) or affect only the saturation level of GPP (represented by fL NRHscl , see equation in Table 2) by the comparison between relative models (BL-LUE NRH1 and BL-LUE NRH2 , TL-LUE NRH1-1 and TL-LUE NRH2-1 in Table 2).
We also assessed fL TAL and fL NRH based on the best TL model structure. To test if GPP of the sunlit leaf and the shaded leaf are saturated at a different light intensity and test if ε sun is equal to ε shade , we introduced five TL model structures based on the assumptions: a) the whole canopy is treated as a big-leaf of which the GPP gets saturated simultaneously (Eq. (2) or TL-LUE TAL1 , TL-LUE NRH1-1 , and TL-LUE NRH2-1 in Table 2); b) only GPP of the sunlit leaf (GPP sun ) gets saturated at high APAR sun and ε sun and ε shade are different (TL-LUE TAL2 , TL-LUE NRH1-2 , and TL-LUE NRH2-2 in Table 2); c) only GPP sun gets saturated at high APAR sun but ε sun and ε shade are the same (=ε max-TL ; TL-LUE TAL3 , TL-LUE NRH1-3 , and TL-LUE NRH2-3 in Table 2); d) GPP sun and GPP of the shaded leaf (GPP shade ) get saturated at different APAR sun and APAR shade , but ε sun and ε shade are the same (TL-LUE TAL4 , TL-LUE NRH1-4 , and TL-LUE NRH2-4 in Table 2); e) GPP sun and GPP shade get saturated at different APAR sun and APAR shade with different ε sun and ε shade (TL-LUE TAL5 , TL-LUE NRH1-5 , and TL-LUE NRH2-5 in Table 2).
To summarize, we contrasted 18 BL and TL models ( Table 2) to identify the best light response form of GPP using the method introduced in Section 2.5.

Forcing data and model parameters calibration
Aiming at comparing models without the interference of input data and parameters, all the models were forced using the same data and calibrated using the same method at each site.
The names, units, processing, and source of forcing data for LUE models at 196 eddy covariance (EC) sites (listed in table S2) are summarized in Table 1. The vegetation at these EC sites (Fig. S2) includes eleven plant functional types (PFT), which are dominated by evergreen needleleaf forests (28%), grasslands (21%), deciduous broadleaf forests (11%) and croplands (9%). Big-leaf models were only forced by PAR, FAPAR, T, VPD, W, CloudI, CO 2 , and elevation. TL models were additionally forced by PAR dir , PAR dif , PAR sc , LAI sun, and LAI shade . Due to over 70% of gaps in LAI and Ω products at 16 sites (in bold and italic in Table S2), TL LAI models were compared with other models at 180 sites.
Assuming that the response of GPP to environmental changes and the canopy architecture differ across sites, we calibrated all LUE models at each site, including LUE model parameters (Table S3) and the empirical coefficients (a 1 , a 2 , b, and k). The parameter inversion process was based on a Bayesian approach minimizing the sum of squares between model estimates and observations of GPP and evapotranspiration fluxes, and maximizing the range of sensitivity functions (section S1; as in Bao et al., 2022). In the calibration, we used only good-quality data identified using quality flags in EC data (indicate the values for reproducibility, see the definition in Pastorello et al. (2020) and QC in Table 1, e. g., NEE_VUT_REF_QC =0.85) and remote sensing data (see QA in Table 1).

Model comparison method
Given that model performance depends on the region or sites where the models are applied (Hogue et al., 2006), and that a model comparison result can be affected accordingly, we adopted a site-sampling method to compare models at various groups of sites to avoid selecting the best model for a specific group of sites or a specific region. First, a group of 100 sites was bootstrapped from all EC sites, and all the LUE models were forced to simulate GPP at a daily scale. Daily GPP was aggregated to weekly, monthly, and yearly GPP using the mean. The performance of each model was assessed at these sites using three metrics: Nash-Sutcliffe model efficiency (NSE), root mean squared error (RMSE), and Bayesian information criterion (BIC). The models were sorted according to their model performance at daily, weekly, monthly, and annual means, and only the top 1% models were regarded as the best models for the relative group of sites. Then, the model ranking process was repeated 200 times. In other words, models were sorted at 200 groups of bootstrapped sites (100 sites per group). To rank these models, a likelihood of each model (P), equal to the frequency of a model selected as one of the best models, was calculated. P represents the robustness of a model at various sites. Finally, we selected the model with the highest ranks according to NSE, RMSE, and BIC (i.e. the smallest sum of ranks sorted using NSE, RMSE and BIC) as the best model for all sites. The pseudo-code of the above process can be found in section S2 (similar to Bao et al. 2022).
The best TL LAI , TL RVI , and TL NDVI were selected according to P. They different in BL and TL models, i.e. the unit of ε max in BL models was gC⋅MJ − 1 , but the units of ε sun and ε shade in TL models were gC⋅m 2 (ground area)⋅MJ − 1 ⋅m − 2 (leaf area). QA was used to identify the good quality and snow-free observation conditions. NDVI and LSWI were calculated at the good-QA pixels around EC sites and averaged across all pixels (Walther et al., 2021). LAI and Ω were extracted using a 1km × 1km average-based window spatially and filled with nearest neighboring values temporally. In the equations for WAI, t represents the time (day). The initial condition of WAI was determined by a five-year spin-up run. The time series LAI, Ω, NDVI and LSWI were all filtered using Savitzky-Golay filter (Savitzky et al., 1964). Elev and CO 2 were the particular forcing data for fVPD P , fVPD P0 and fVPD PRELES . The other models only required PAR (including PAR sun and PAR shade ), T, VPD, W, CloudI, and FAPAR.
GPP obs and σ NEE were used to calibrate LUE model parameters, and ET obs and σ LE were used to calibrate WAI parameters.
The good-quality data used in model calibration referred to the R g , R p , R n , T, VPD, Precip, GPP obs and ET obs with QC≥0.8, and r red , r nir , and r swir with QA≥0.5.
were compared with the best BL model using the two-sample Kolmogorov-Smirnov (K-S) test and pair-wise t-test with a significance level (=0.05). The results of statistical tests were used to indicate the significance of the difference between models across sites. The 18 LUE models with different light response forms (Table 2) were assessed using a similar method as the one described above. However, we selected 30% of models (i.e., the top five models) as the best model from each iteration. P was equal to the frequency of a model selected as the best model across all iterations and used to sort light response forms represented by relative models.
To find the factor for the outperformance of the best TL model relative to the best BL model, we analyzed the spatial and temporal pattern of the NSE difference between them. First, the site-level NSE at different time scales was contrasted against the mean annual CloudI, ET, Table 2 LUE models with various forms of light response functions (parameters are in bold). GPP = εsun⋅APARsun⋅fL NRHscl− sun + εshade⋅APAR shade ⋅fL NRHscl− shade APAR: total canopy absorbed photosynthetically active radiation. APAR sun : absorbed photosynthetically active radiation for sunlit leaves. APAR shade : absorbed photosynthetically active radiation for shaded leaves. ε max : maximum light use efficiency in BL models. ε sun : maximum light use efficiency for sunlit leaves. ε shade : maximum light use efficiency for shaded leaves. ε max-TL : maximum light use efficiency for both sunlit and shaded leaves in TL models.

Model/function name Equation
GPP max : maximum GPP in BL models. GPP msu : maximum GPP of sunlit leaves. GPP msh : maximum GPP of shaded leaves. GPP scl : GPP max scaled by environmental factors. GPP ssu : GPP msu scaled by environmental factors. GPP ssh : GPP msh scaled by environmental factors. γ: the curvature parameter of GPP response to APAR for the whole canopy. γ sun : the curvature parameter of GPP response to APAR for sunlit leaves. γ shade : the curvature parameter of GPP response to APAR for shaded leaves. The forms of fT, fVPD, fW and fCloudI are the same as the ones in BL* and the TL* model. LAI, PAR, T, VPD, and Ω in the growing season (T≥10 • C and GPP≥ 0.5 × maximum GPP obs ) and all period (growing and non-growing season) per site to identify the spatial pattern of the model difference. The temporal pattern was analyzed according to the NSE of GPP at each bin of CloudI, ET, LAI, PAR, T, VPD, and Ω per site.

Performances of two-big-leaf and big-leaf light use efficiency models
TL models, especially TL NDVI models, were generally more robust than BL models across various sites (Fig. 2). P of TL NDVI models, indicating the robustness of a model at various groups of sites, was the largest using NSE (P=0.67), RMSE (P=0.63), and BIC (P = 0.59) as the assessing metrics (which dominates the top 1% best models shown in Fig. 2a-c). Big-leaf models considering the diffuse fraction (BL CloudI ) also had a good performance compared with BL models without CloudI. There were 19 BL CloudI models in the top 250 models and had higher ranks than most of TL RVI models, all TL LAI models and other BL models according to NSE, RMSE and BIC (orange bars in Fig. 2a-c).
The best TL model (hereafter named as TL*) was better than the best BL model (hereafter named as BL*) at a small proportion of sites. The site-level NSE of the best TL LAI , TL NDVI , TL RVI , and BL* displayed a similar distribution pattern (Fig. 3). The two-sample K-S test showed that the site-level NSE, RMSE and BIC of BL*, which is also of type BL Cloudi , was not significantly (p > 0.05) different from the best TL LAI , TL RVI , and TL NDVI models at daily, weekly, monthly, and annual scales (i. e. the population of site-level NSE, RMSE, and BIC were similar). However, the pair-wise t-test between NSE of TL* and BL* and RMSE of these two models displayed the significant difference (p < 0.05) at the daily and weekly scales, indicating more sites had larger NSE using TL* than using BL*. At the same time scales, the best TL NDVI , which is also the best TL model (i.e., TL*), showed an NSE difference larger than 0.1 from BL* in 10% of the sites ( Fig. 3a and b). TL* also outperformed at cropland sites at the daily, weekly and monthly scales and at sites of evergreen broadleaf forests at the annual scale (Fig. S3). Thus, incorporating the diffuse fraction in a BL model allows achieving similar performance to a TL model globally, while BL models were still worse at some specific sites or environmental conditions in simulating day-to-day variability of GPP.
The TL* selected according to the sum of model ranks using NSE, RMSE, and BIC had the same environmental sensitivity functions (fT Horn , fVPD PRELES , and fW Horn , see equations in Table S1) as BL* and showed light saturation of GPP for some sites. The correlation coefficients Fig. 1. The response of GPP to absorbed photosynthetically active radiation (APAR) in LUE models. Solid lines represent the response of GPP of the whole canopy in BL. Dash-dotted lines and dashed lines represent the response of GPP of sunlit leaves and shaded leaves in TL models, respectively. The response forms of GPP to APAR can be divided into the reciprocal response form (as in the TAL model; black and thin lines) and the non-rectangular hyperbola (NRH) response form (blue and thick lines). In the TAL form, the response curve is only controlled by a curvature parameter (γ), and in the NRH form, the response curve is controlled by a curvature parameter (γ) and the saturation level of GPP (GPP max ). These curves, drawn based on the synthetic data, exemplify the light response of GPP without the influence of other environmental factors. γ and GPP max are independent across sites (=initial values shown in Table S3) (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

Fig. 2.
Model rank according to the model likelihood. The model likelihood was estimated according to the Nash-Sutcliffe model efficiency (NSE; a), root mean squared error (RMSE; b), and Bayesian information criterion (BIC; c). BL CloudI refers to the BL models incorporating a function of cloudiness index (CloudI). TL LAI , TL NDVI , and TL RVI represent TL models using MODIS leaf area index (LAI), normalized difference vegetation index (NDVI), and ratio vegetation index (RVI) to represent LAI e , respectively. The smaller figures at the bottom right corner in a, b, c are the zoom-in windows of the best 250 models. The best 250 models are mainly TL NDVI models. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.) between the sensitivity functions in BL* and TL* were also high at most sites (e.g., fT, fVPD and fW, see Fig. S4), although absolute sensitivities of BL* and TL* could differ locally. This confirms that these functions can represent the responses of GPP across different canopy architectures and light saturation should be considered even though the absorbed radiation between sunlit and shaded leaves is differentiated.
Even though model parameters were calibrated, both the TL model and the BL model underestimated GPP in the growing season (Fig. S5). At some sites, this was due to the extremely high GPP at the peak of the growing season (e.g., CA-TP4 and FR-LBr in Fig. S6a and c), which could be neither captured by BL nor TL models. At some other sites, the fraction of underestimated GPP was related to a lower fraction of data in the growing season (r = -0.69; Fig. S7), resulting in under-representation of the growing season. The underestimation could be partly corrected if the role of growing-season is inflated by increasing the cost contribution under middle and high GPP (0.5 × maximum GPP obs ≤GPP obs <99 th percentile of GPP obs ) until the fraction of GPP in this range reached 0.7 (Fig. S5b, 7% of the underestimated GPP were corrected). Hence, the fraction of various ranges of GPP in the cost function should be considered in the model calibration and training of data-driven models.

The effect of the approach to represent effective leaf area index
Our results showed that NDVI was the best vegetation index to represent LAI e compared to RVI or the combination of LAI and Ω according to the performance of relative TL models (Fig. 2). The best TL NDVI , TL*, had larger NSE (NSE difference>0.1) in 9% and 18% of the sites than the best TL RVI and TL LAI at the daily scale (Fig. 3a). RVI was the second-best vegetation index, as some TL RVI had larger P compared with TL LAI and BL models. TL LAI underperformed compared to BL CloudI in 14% of the sites at the daily scale and in 29% of the sites at the annual scale (Fig. 3d), and the combination of LAI and Ω to split sunlit and shaded leaves accordingly had the lowest performance. On the other hand, flexible relationships between the LAI shade ratio and NDVI were derived using NDVI with site-calibrated parameters (Fig. 4), which suggests that the variability of canopy architectures matters for the partitioning of the canopy into shaded and sunlit leaves for a given NDVI. While using the combination of MODIS LAI and Ω, the LAI shade ratio changed with NDVI with similar slopes across sites, and the parameterized site-specific LAI e estimation approach (i.e., TL NDVI ) captures the largely different effects of canopy variability on the shaded leaf to total leaf ratio. We collected the ground-measured LAI at 20 sites (Table S4). Although the satellitebased LAI e (i.e., the product of MODIS LAI and Ω, the LAI e estimated using NDVI and RVI) did not always match with trends of groundmeasured LAI (e.g., sites in Fig. S8), the satellite-based LAI e had higher correlation with the GPP obs than the ground-measured LAI (Table S4). The ground-measured LAI did not correlate well with GPP obs and satellite-derived LAI e time series due to the different LAI measurement types and the mismatch between the footprint sizes.
Models with more parameters are usually more flexible and have superior performance if not punished for the number of parameters. The best TL LAI with the calibrated leaf-angle-distribution-related parameters (k and cosβ) has the same number of calibrated parameters (degrees of freedom) as TL*, while it was worse than TL* at 19% sites at the daily scale (NSE difference >0.1) and similar to TL* at the other 81% sites. The difference between TL* and the best TL LAI was not related to PFT (Fig. S9) or LAI (Fig. S10). This demonstrates that the variability of the leaf angle distribution was not as significant as that of NDVI for GPP Fig. 3. NSE distribution histogram of the BL*(yellow, the narrowest bar), TL LAI (shallow blue, the second widest bar), TL NDVI (dark blue, the second narrowest bar), and TL RVI (purple, the widest bar), respectively) at daily, weekly, monthly and yearly scales. The higher bar at the right side (i.e., more NSE close to unity) represents higher model performance (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.). estimation. Furthermore, the performance of TL* with calibrated k and cosβ was not significantly different from TL* with fixed k and cosβ (p > 0.05). Therefore, it is reasonable to assume a fixed leaf angle distribution across sites for the application of LUE models at the daily scale.

Contrasting light response forms in light use efficiency models
NRH light response form (BL-LUE NRH1 ) which considered T, VPD, W and CloudI effects on both the saturation level of GPP and ε, had a higher rank in BL models than the reciprocal light response form (BL-LUE TAL ) according to the model likelihood (Fig. 5). Therefore, NRH is more flexible than TAL to capture various light responses of GPP across sites. However, the rank of BL-LUE NRH2 , which only considers the environmental effect on the saturation level of GPP, was lower than BL-LUE TAL , showing that the environmental effect on ε for the whole canopy cannot be neglected. The reason was that the flexibility of NRH was constrained Fig. 4. Relationship between NDVI and LAI shade ratio derived from the best TL LAI (a) and the relationship between NDVI and LAI shade ratio using the best TL NDVI (b). The color represents the scatter density (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

Fig. 5.
Average model likelihood (P) of LUE models with various light response functions across daily, weekly, monthly and yearly scales. The likelihood calculated using BIC is not considered at the monthly and yearly scales due to the limited observational data relative to the number of model parameters. The left three bars represent BL models with fL TAL (blue bar, the 1 st bar), fL NRH (gray bar, the 2 nd bar) and fL NRHscl (purple bar, the 3 rd bar) light response functions (see equations in Table 2). The right fifteen bars are the TL models with fL TAL (blue bars, the 4 th -8 th bar), fL NRH (gray bars, the 9 th -13 th ) and fL NRHscl (purple bars, the last five bars) based on five different two-big-leaf assumptions (represented by the bar textures): (1) GPP sun and GPP shade had different maximum light use efficiencies and the same light response features (ε sun ∕ = ε shade and fL sun = fL shade ; bars with horizontal lines); (2) GPP sun and GPP shade had different maximum light use efficiencies and only GPP sun can be the saturated to the absorbed radiation (ε sun ∕ = ε shade and no fL shade , bars with '>'); (3) GPP sun and GPP shade share the same maximum light use efficiencies and only GPP sun can be saturated (ε sun = ε shade and no fL shade , bars with '/'); (4) GPP sun and GPP shade share the same maximum light use efficiencies but are saturated at different radiation with different curvatures (ε sun = ε shade and fL sun ∕ = fL shade , bars with '\'); (5) GPP sun and GPP shade share the same maximum light use efficiencies but are saturated at different radiation with different curvatures (ε sun ∕ = ε shade and fL sun ∕ = fL shade , bars with 'x') (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.). when the GPP max was limited by other scalars (i.e., fT, fVPD, fW and fCloudI), leading to the inability to simulate light-limited GPP (i.e., nonsaturating light response) within the observed radiation range. Another reason was that fitting the LUE model to daily GPP and driver values blurs the relationships compared to response functions from the instantaneous data. A simple theoretical experiment (see section S4) showed that models may represent data at the daily scale well even though the differences are large at half-hourly scale. Fig. 6. Comparison of the maximum light use efficiency (ε sun and ε shade ), PAR (PAR sun and PAR shade ), LAI (LAI sun and LAI shade ) and GPP (GPP sun and GPP shade ) in sunlit and shaded leaves based on five best TL models and the model mean. The black line and red line in these figures represent the median and mean, respectively (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).
In TL models, the performance, i.e. P, of TAL and NRH were similar (Fig. 5). The P of TL-LUE TAL1 (=0.53) and TL-LUE NRH2-5 (=0.52) were close across different time scales. The third to the fifth best models, TL-LUE TAL2 , TL-LUE NRH1-2 and TL-LUE TAL5 (P =0.49, 0.47 and 0.46, respectively), were also similar. At short time scales (daily and weekly), TL-LUE NRH2-5 , with the most parameters, was superior to the other models (Figs. S11-12). However, at the annual scale it did not perform as well as TL-LUE TAL1 , which does not differentiate the light response curvature parameters (represented by γ and GPP max ) between sunlit and shaded leaves (Fig. S14). The pair-wise statistical tests between the sitelevel NSE and RMSE of models both showed better performance of TL-LUE NRH2-5 (Figs. S15-16), but the BIC of BL-LUE TAL , with the least parameters, was statistically better than other models (Fig. S17). Furthermore, the light response parameters in TL-LUE NRH2-5 were highly correlated at most sites (e.g., R between ε sun and γ sun was higher than 0.5 at 61% sites; see the distribution of R between highly correlated parameters of TL* in Fig. S18), indicating the possibility of overparameterization. From daily to yearly time scales, models that do not distinguish ε sun and ε shade and the light response curvature parameters, TL-LUE TAL3 , TL-LUE NRH1-3 and TL-LUE NRH2-3 , were all poor compared with other models. Models that differentiated ε sun and ε shade (e.g, TL-LUE TAL1 , TL-LUE NRH1-1 ) were generally better than models that only distinguish light responses (e.g., TL-LUE TAL4 , TL-LUE NRH1-4 ), suggesting that distinguishing ε sun and ε shade rather than light response curvature parameters were the key to separate light responses of GPP sun and GPP shade .
Due to the similar performance across sites and per site, we selected five models (TL-LUE TAL1 , TL-LUE NRH2-5 , TL-LUE TAL2 , TL-LUE NRH1-2 and TL-LUE TAL5 ) with good performance at different time scales as the best TL models. In these five models, the contribution of GPP shade was larger than GPP sun for higher ε shade and LAI shade relative to ε sun and LAI sun (Fig. 6).

Spatio-temporal differences between two-big-leaf model and big-leaf model
To find out whether there are certain locations or seasons, where the BL and TL models differ, the spatio-temporal pattern of the difference between the TL* (=TL-LUE TAL1 ) and BL* was investigated. Since the BIC of TL* and BL* showed no difference at the daily and weekly scales, the RMSE and NSE differences were similar across time scales and the purpose was to find a less-biased model, we here used only NSE, which considers model bias and the systematic error, to analyze the spatiotemporal difference between these two models. The results showed that TL* was statistically different from BL* at daily and weekly scales according to NSE. TL* had higher NSE in 10%, 10%, and 12% of all sites at the daily, weekly, and monthly scales (NSE difference>0.1; see Fig. 7). The difference between TL* and BL* was spatially related to the foliage distribution pattern (Fig. 8a). Here the foliage distribution pattern was indicated by the average clumping index in the growing season (Ω gs ), which is close to unity when the foliage distribution is random (leaves are on top of each other), or close to zero when the foliage is highly clumped (e.g., conifers; leaves are side by side). We also tested the other definitions of the growing season (e.g., T≥5 • C and GPP≥ 0.3 × maximum GPP obs ), which showed a similar pattern to Fig. 8a (see Fig. S19a-c). Although neither TL* and BL* captured the GPP variability at some sites (NSE<0.4 at 8% and 10% sites, respectively), TL* tended to have a larger NSE of GPP at high Ω gs (≥0.55). Further, TL* captured the peak of GPP at several high Ω gs sites while BL* did not (e.g, DE-Geb and FR-Lq1 in Fig. S6b and d; Ω gs =0.70 and 0.74, respectively). However, the NSE difference between TL* and BL* reduced at the monthly and annual scales (Fig. 7). BL* had significantly lower BIC (representing equal performance but fewer parameters) than TL* at the monthly and annual scales (p < 0.05; see Fig. S20c, d), and the difference between BIC of TL* and BL* did not show changing trend with Ω gs (Fig. 8b). The site-level RMSE of TL* and BL* did not show a Fig. 7. Comparison between NSE of BL* and TL* at daily (a), weekly (b), monthly (c) and yearly (d) scales. Each point in the figure represents a site. These site-level NSE of BL* and TL* are also compared using the pair-wise t-test and the value of p is displayed at the upper-left corner. The solid line is the 1:1 line. significant difference at the annual scale (Fig. S21d). These results demonstrated that BL* with fewer parameters was more efficient compared with TL* at longer time scales (e.g., annual), and TL* was superior at high Ω gs at short time scales (e.g., daily).
The temporal difference between TL* and BL* was related to the variability in environmental conditions (Fig. 9). The difference between TL* and BL* did not show any pattern as CloudI changed (Fig. 9a) explaining that incorporating CloudI in BL models corrected the model bias compared to the models that neglected the temporal variation of the diffuse fraction of PAR is neglected. However, the NSE of GPP estimated by TL* was generally higher than BL* when T>18 • C or VPD>1.2kPa (Fig. 9). Under warm and/or dry conditions (T>18 • C or VPD>1.2 kPa), BL* had higher NSE than TL* (NSE TL -NSE BL ≤-0.1) at 8% sites while TL* had higher NSE (NSE TL -NSE BL ≥0.1) in 17% of sites (Fig. 10). This advantage of TL models over BL models was also conserved in comparison between BL-LUE NRH1 and TL-LUE TAL1 (Fig. S22a, b) and between BL-LUE NRH1 and TL-LUE NRH2-5 (Fig. S22c, d). Thus, TL* was more robust in warm and/or dry weather which usually occurs in arid climates (represented by shallow blue circles in Fig. 9f).

Discussion
Our results demonstrate that the null hypothesis, i.e., the TL models are better suited than BL models to simulate GPP with measurable drivers at the site-level across time scales, needs to be rejected. More specifically, the differences between TL and BL models incorporating CloudI and light intensity at each site are not significant at the monthly and annual scales. There are only a few exceptions, where TL* had an advantage at daily and weekly time scales, especially under hot and dry conditions, which will be discussed in Section 4.1. Although TL* includes more process parameterizations, it, apparently, still has some limitations, see Section 4.2, and these lie in the retrieval of model parameters. The quality of the satellite-derived LAI and Ω are not as good to estimate fractions of sunlit and shaded leaves as NDVI with calibrated parameters per site (discussed in Section 4.3). Despite the limited improvement of the quality of simulated GPP, TL* confirms the theory of the total canopy light response, and explains in a simplified way, how the differences between sunlit leaves and shaded leaves, both in physiology and local conditions, causes the particular response to the different light regimes, between the beam and diffuse PAR (discussed in Section 4.4). Finally, we explain how the incorporation of theory in models includes a tradeoff between the accuracy of simulation and realistic description of the underlying processes.

Advantage of two-big-leaf light use efficiency model
Previous studies reported that the TL model performed much better than the BL model without considering diffuse fraction or light saturation, especially under a high diffuse fraction (He et al., 2013;Yan et al., 2020) or large LAI (Zhou et al., 2016b). This study shows that incorporating CloudI and light saturation effects in BL models eliminates the error of BL models relative to TL models at high diffuse fraction and LAI. BL* is as good as TL* across sites as CloudI can address the effects of diffuse radiation, θ, and dissimilar responses of sunlit and shaded leaves due to their correlation. However, TL* had a marginal advantage at high Ω gs and robust advantage in 17% of sites under hot and dry conditions. Distinguishing radiation components (direct, diffuse and scattered), leaf area fractions and light responses (including slopes represented by ε max and curvature degrees represented by γ and GPP max ) between sunlit and shaded leaves addressed the differentiated responses of GPP under different light regimes. In hot and dry weather, the dissimilar responses between sunlit and shaded leaves get prominent (Wozniak et al., 2020), TL models, which differentiated the ε and the leaf fraction between sunlit and shaded leaves, are therefore more robust than BL models. A reason might be that in 'normal' conditions, i.e., GPP is not heavily constrained by one factor, the interaction of GPP with VPD, which is positively correlated with both PAR intensity and the fraction of beam radiation can help BL models to parametrize different processes with few coupled simple empirical relationships (fVPD, fL and fCloudI). When one factor gets dominant, however, the limitation of overly-simplified model structure to accurately describe this one process becomes evident. Given that more vegetated areas will be facing temperature and water limitations (Jiao et al., 2021), our study emphasizes the increasing relevance of TL models for assessing the global carbon dynamics.

Limitations of two-big-leaf light use efficiency model
TL models are more complex than BL models since they additionally Fig. 8. Site-level NSE (a) and BIC (b) of the BL* and TL* against average clumping index in the growing season (Ω gs ). The growing season is defined as the period when GPP is larger than half of the maximum GPP and temperature is larger than 10 • C. Ω gs represents the distribution pattern of foliage, which is unity when the foliage distribution is random, or close to zero when the foliage is clumped. The site-level NSE is binned according to the range of Ω gs . The yellow and blue solid lines are the medians and dotted lines are the 25th and 75th percentiles. The shaded area represents the range of NSE or BIC at each bin (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).
include parameters to separate ε max (or fL), LAI, and APAR of sunlit and shaded leaves. This complexity can lead to overfitting at some sites and restrict the application at the global scale. Hogue et al. (2006) demonstrated that "additional complexity cannot make model perfect". Considering this, BL* can be a good alternative under non-hot-and-dry conditions for its similar performance to TL* despite fewer parameters.
Although ε max , LAI, and APAR are differentiated between sunlit and shaded leaves, the vertical profile of canopy architecture and micrometeorological conditions may also not be sufficiently addressed in relevant detail by TL models (Bonan et al., 2021). Further, potential differences between green and senescent plant materials in vegetation canopies (Pacheco-Labrador et al., 2021) are ignored. ε sun and ε shade , related to J max and Vc max of sunlit and shaded leaves, are found to respond to T similarly (Alton et al., 2007;Hernández et al., 2020). And the sensitivity of stomatal conductance to T, VPD, and W, which differs between sunlit and shaded leaves (Hernández et al., 2020), is not considered in TL models. Such differences can result in asynchronous responses of ε sun and ε shade , i.e., different fTs, fVPDs, and fWs between sunlit and shaded leaves. In addition, the random and systematic error in FLUXNET carbon dioxide fluxes (Richardson et al., 2006;Wozniak et al., 2020) will propagate into model parameter uncertainty. Thus, the accuracy of TL models, depending on a data-driven approach, is still unstable and varies with environmental conditions. Both BL and TL models slightly underestimated GPP. Our study demonstrates that one of the reasons is the lower fraction of goodquality data during the growing season. The error can be partly corrected by increasing the fraction of growing-season data in the cost function. However, at some sites lack of good-quality observations at the peak of the growing season, the highest GPP cannot be represented, i.e. lack of information in the calibration data cannot be compensated by a statistical modeling approach alone. Another reason for the inability to capture high GPP is the uncertainty in the observational data (Fig. S23), which is used in the Bayesian approaches accounting for observational uncertainty. Given the long-tailed distribution of ecosystem fluxes and its influence on the interannual variability (Fatichi et al., 2014;Zscheischler et al., 2016), it can justify that failure in capturing high GPP at the half-hourly scale propagates to modeling the interannual variability in GPP (Wozniak et al., 2020). Similarly, the slight underestimation of high daily GPP can be the reason for a poor LUE model performance at the annual scale.

Separating sunlit and shaded leaves according to site-specific canopy architecture
Compared to BL models, the use of TL models requires parameterization of additional canopy traits, here the partitioning of the canopy into instantly sunlit and shaded leaves, which increases the demand for information from the empirical data from the sites. Theoretically, this partitioning may not be confused with the shade modification, i.e. leaves that develop specific physiological traits according to the local growing conditions in the lower parts of the crown (shade-adapted leaves). Such shade-adapted leaves may well be hit by beam radiation (e.g., sun flecks), but with a considerably lower likelihood compared to the sun crown. These succinct theoretical differences are being ignored in TL models that in fact treat instantly sunlit or shaded leaves as sun-adapted or shade-adapted leaves, respectively. The only justification for this simplification lies in the mentioned likelihood that considerably fewer shaded leaves will be hit by beam radiation compared to leaves in the sun crown.
Previous studies estimated fractions of sunlit and shaded leaves and absorbed radiation at various canopy architectures based on LAI e assuming a spherical leaf angle distribution He et al., 2013;Yan et al., 2017). Here LAI e , represented by the combination of LAI and Ω, determines the light attenuation through the canopy according to Beer-Lambert's law and is related to the canopy architecture. Since the canopy architecture affects the relationship between NDVI and fractions of sunlit and shaded leaves (Leblanc et al., 1997), the slope between NDVI and LAI shade ratio cannot be the same across sites. However, slopes between NDVI and LAI shade ratio derived from the TL LAI are similar across sites, which might be due to biome-specific parametrization of the LAI and Ω products (Myneni et al., 2002;Wei et al., 2019). Furthermore, optical remote sensing products lack information of the vertical canopy structure, which can increase the uncertainty of GPP estimation in the lower canopy (Damm et al., 2020). The site-calibrated parameter (b in Section 2.2) can correct the estimated LAI e , thus the site-calibrated NDVI is better than using LAI and Ω to represent LAI e at various canopy architectures. Our study highlights the importance of variability in canopy architectures and clumping across sites for separating sunlit and shaded leaves.
In addition, our results show that the flexible leaf angle distribution across sites cannot significantly improve TL models. In TL*, the parameter related to the leaf angle distribution, k, can be compensated by the parameter to derive LAI, b, due to their factorial relationship. The site-calibrated b can therefore address the variability of the leaf angle distribution across sites. TL LAI with flexible leaf angle distribution has the same number of parameters as TL* but still underperforms. This demonstrates that LAI e is more important than leaf angle distribution for capturing the canopy variability, which was not represented appropriately in TL LAI , but was in TL*. The fixed leaf angle distribution across sites is therefore reasonable to be used in the estimation of fractions and absorbed radiation of sunlit and shaded leaves. The fixed leaf angle distribution has also been adopted in most of other TL models (Chen et al., 1999;He et al., 2013;Luo et al., 2018;Yan et al., 2017).

Light response of total canopy GPP and GPP of sunlit and shaded leaves
As NRH light response curve is more flexible, it usually captures GPP dynamics well (Xu et al., 2019). However, GPP at some sites is not saturated in the range of observed radiation (Bao et al., 2022). The NRH form in which the GPP max was constrained by fT, fVPD and fW (i.e., fL NRHscl ) cannot represent the non-saturated GPP as well as the reciprocal form. Thus, in BL models, which assume canopy GPP responds to the illumination simultaneously, either the reciprocal light response form or the NRH form considering a high upper range of GPP max should be used to capture light-limited and light-saturated GPP across various sites.
In previous TL models, either the ε max or the light response curvature parameter was differentiated between sunlit and shaded leaves (Guan et al., 2021;He et al., 2013;Yan et al., 2017;Zhou et al., 2016a). Our study nevertheless shows that differentiating ε max is more important than differentiating light response curvature parameters. Since light saturation points of shaded leaves cannot be detected in the short observation range at some sites, the model ignoring GPP saturation of shaded leaves (e.g., TL-LUE TAL2 and TL-LUE NRH1-2 ) is similar to the best model considering the saturation of both sunlit and shaded leaves (e.g., TL-LUE TAL1 and TL-LUE NRH2-5 ). Due to the higher contribution of GPP shade relative to GPP sun , our study recommends considering the light saturation of shaded leaves but neglecting the difference between light response curvature parameters to reduce the model complexity.
Our study shows that the model performance differs at different temporal scales, i.e. whether simulations are compared at the daily, weekly, monthly or yearly scales. Therefore, the temporal aggregation level used to fit the models is also equally important. We chose the daily time scale, which is at the higher end of remote-sensing-based retrievals. Nevertheless, only at the sub-daily scale (e.g., half-hourly scale) some aspects of the canopy radiative transfer can be discussed and modeled in Fig. 10. Distribution of NSE difference between TL* (NSE TL ) and BL* (NSE BL ) under all conditions (a), 'union' condition (b) and 'intersect' condition. The 'union' condition represents T>18 • C or VPD>1.2 kPa. The 'intersect' condition represents T>18 • C and VPD>1.2 kPa. The sites with less than 15 effective days are not shown in the bars. a reasonable way. For example, angular aspects of the distribution of beam fraction depend on solar elevation angle that changes over the course of the day. Additionally, the leaf-scale photosynthesis is a nonlinear process with the highest sensitivities at low light conditions and saturations (i.e. no sensitivity to PAR) at high PAR intensities. The power of TL concept is thus hampered by the loss of information through the aggregation of half-hourly data to daily data. A simple theoretical experiment (see section S4) showed that if considering only APAR, fitting the GPP data to some models from the ones used to generate the GPP data leads to lower performance at the half-hourly scale (see ratio and NSE between GPP simulated by BL-LUE TAL and BL-LUE NRH2 in Table S5-6). But fitting daily GPP generated by different models can result in similar model performance. This experiment indicates two important relationships: (1) the information loss through temporal aggregation blurs relations between canopy responses and the processes of photosynthesis and radiative transfer, and thus the ability to calibrate more detailed mechanistic models, and (2) that the empirical relationships after temporal aggregation are nevertheless robust enough and the different LUE models are flexible to parametrize the aggregated canopy PAR response with sufficient quality. As a consequence, aggregated data can be simulated more reliably with simpler models and fewer parameters than the original half-hourly data that require the use of more sophisticated models.

Conclusion
Through the comparison of big-and two-big-leaf LUE models considering the effects of drivers and differing forms of response functions, we find that a big-leaf LUE model incorporating diffuse fraction and saturation at high light intensities is able to explain the observed GPP variability equally well as optimal two-big-leaf LUE models globally across FLUXNET sites. It should be noted that a two-big-leaf LUE model has an advantage at the daily and weekly scales under hot and dry conditions. Furthermore, this advantage relies on considering sufficient drivers (i.e., T, VPD, W, light saturation, diffuse fraction and CO 2 in this study) and using LAI e estimated from NDVI with site-calibrated parameters to separate sunlit and shaded leaves rather than using satellitederived LAI and clumping index. In fact, a two-big-leaf model using LAI e estimated from satellite-derived LAI and clumping index with a flexible or fixed leaf angle distribution performs worse than big-leaf LUE model approaches incorporating diffuse fraction and light intensity. In addition, to address the GPP difference between sunlit and shaded leaves, the maximum light use efficiency of sunlit and shaded leaves should be differentiated to reach the highest performance of two-big-leaf models.
Our study demonstrates that the diffuse fraction is the main reason for the difference between big-and two-big-leaf LUE models, and therefore the dissimilar photosynthetic rates between sunlit and shaded leaves. By incorporating cloudiness information the big-leaf LUE models can achieve similar performance at most sites. Another reason is the within-canopy GPP variability increasing with LAI.
For future studies, we recommend a two-big-leaf LUE model at the daily and weekly scales especially under hot and dry conditions. A bigleaf LUE model incorporating diffuse fraction and light intensity might be a good alternative at the monthly and annual scales in the majority of the forest sites. The estimation of LAI or LAI e should consider various canopy architectures which cannot be fully represented by the current optical-remote-sensing-data-based LAI and clumping index products. The microwave remote sensing data, which is more sensitive to the canopy structures, has the potential to improve the canopy stratification in the data and is thus expected to improve the parameter estimations of realistic process models. Ultimately, this study shows empirical evidence at the ecosystem level that supports understanding within canopy distribution of GPP dynamics to address impacts coming from warming and drying trends in weather conditions. While it is possible to fit simpler models to daily data, their performance is worse when training them with half-hourly data. Simple model approaches might therefore be superior to estimate GPP while improvement and test of micrometeorological theory require more sophisticated mechanistic models and suitable calibration data contains more detailed and relevant information. Our results confirm the otherwise stated rule that the complexity and detail of the model structures must match with the information content in the observational constraints, as any shortcomings will lead to reduced model performance.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Data will be made available on request.