Methodology to calculate forest stand level maximum potential productivity, potential achievable productivity and ecosystem fit

A modified Loomis-William model was originally developed to estimate the theoretical maximum yields of crops. That model was adapted in this paper to measure how much of the theoretical maximum potential productivity (tNpptmax) is reached in any forest due to edaphic and climatic limits to growth, i.e., its “Ecosystem fit” (eFit). The procedure to calculate eFit has not been published except as a concept. Our goal is to describe the methodology in sufficient detail to facilitate its use by the scientific community and forest managers. To calculate eFit you need: 1) to convert all photosynthetically active radiation to a photosynthetic product for each forest plot or stand to calculate its tNpptmax, and 2) use field-collected data of total observed net primary productivity (tNppobs). Theoretical maximum potential tNpp is calculated with a simple light-use efficiency model as the product of the efficiency at which forest canopies absorb solar radiation, the photosynthetic conversion efficiency into biomass, and remotely sensed solar radiation with temperature data extracted to the geographic coordinates for the site. Ecosystem fit represents a forest's realized percent productive capacity and is the ratio of field-collected tNpp (i.e., tNppobs) to the theoretical maximum potential tNpp (i.e., tNpptmax).• Available indices to assess forest productivity and adaptive capacity to land-use disturbance and climate change are sensitive at the small-to-meso spatio-ecophysiological scales.• A more holistic index (such as eFit) will provide an informative picture of forest conditions where management practices are undertaken and the ecosystem's capacity to adapt to environmental change.• A comparison of eFit across similar forests within a climatic zone is an indication of the stressors or constraints that are being imposed locally and that limit tNppobs.


Ecosystem Fit rationale to measure adaptive capacity of forests to grow biomass
The "Ecosystem Fit" (eFit) of a forest [8] is the ratio of tNpp obs /tNpp tmax (where tNpp obs = total net primary productivity measured at each site and tNpp tmax = theoretical maximum potential total net primary productivity possible when there are no limits to growth), each in units of dry biomass (Mg ha −1 yr −1 ) in this paper. It expresses how well a forest is adapted to its site and its potential for growth, i.e., is it possible to grow more biomass than what is measured in the field. Ecosystem Fit [7] directly compares the measured tNpp (or tNpp obs as used in this paper) to growth that is theoretically possible for that site without site-level constraints (i.e., tNpp tmax as used in this paper).
Current measures of tNpp do not allow an assessment of how vulnerable a forest is to climate change since this metric does not identify ranked thresholds of low, medium, or high tNpp by site ( sensu [9,21] ). In addition, tNpp does not inform how well forests perform in relation to site limiting factors. Thus, comparison of sites requires a productivity metric that defines the maximum potential productivity for each site. For example, forest stand-level comparisons for management strategies could be misleading as an internal reference of gross carbon flux since it could differ unpredictably between sites due to the variability of stand-specific factors affecting productive capacity. Whereas calculating eFit of an ecosystem relative to its current environment can be obtained with the calculation of tNpp tmax as a reference condition to assess the potential of any site to grow biomass [7 , 8] .
Since photosynthesis is the ecophysiological framework where carbon assimilation occurs, it is an indicator to quantify the environment's functional limitations on production [8 , 14] . Trees convert solar energy into biomass within their genetic and environmental constraints. When all the structural and functional attributes of trees operate at their optimal levels, and the environmental conditions are within the normal range of function, productivity will be maximum for that ecosystem under the conditions present [7] .

Observed tNpp
The dataset used in this paper consisted of 267 natural forest field sites and 40 forest plantation sites (for a total of n = 307) to develop the methodology to calculate plot level maximum potential productivity (tNpp tmax ), observed or actual productivity (tNpp obs ), and Ecosystem Fit [9] . The data and variables included in the dataset were comprised of site-level field measures that included above-and below-ground productivity collected from mature boreal, temperate and tropical

Calculating tNpp tmax and eFit
A basic light-use efficiency model was first used to calculate the site-specific theoretical maximum potential net primary productivity without site-level constraints limiting growth. This provided a sitespecific maximum tNpp that was then used to calculate Ecosystem Fit. Theoretical potential maximum tNpp (or tNpp tmax ) is the product of photosynthetically active radiation (PAR) and plant physiological parameters to calculate biomass ( sensu [4] ). This simple efficiency model calculates growth as the product of mean solar radiation available during the growing season [5] , the interception efficiency ( E i ) at which forest canopies absorb solar radiation, and the conversion efficiency ( E c ) or rate at which absorbed solar radiation is converted into tree biomass. Mathematical equations and model parameter specifications were from Bernacchi et al. [1 , 2] and the Global Change & Energy Project [6] .
The calculation of E c or the conversion efficiency of forest canopies used the average annual atmospheric CO 2 concentration the year of data collection (1981-2011) for each site as measured by the Mauna Loa Observatory ( https://www.sealevel.info/co2.html ). Model assumptions were that photorespiration was a constant fraction of photosynthesis, PAR was 45% of total solar radiation, and forest canopies absorbed 90% of incoming PAR during active growth [1] . In addition, we assumed no nutrient or water limitations to growth. Finally, the growing season was calculated as total monthdays when mean monthly temperature exceeded zero.

Required data acquisition and processing
The methods involved in the development of tNpp tmax , eFit, and metrics of forest ecosystem resilience require the following: (1) direct field measures of primary productivity and site-level environmental conditions, (2) global proximal-sensed solar radiation and temperature data, (3) identification of productivity groups using unsupervised clustering, (4) determination of response thresholds, correlations of productivity for each explanatory variable, and the most important environmental variables using binary regression tree analysis, and (5) visualization of response dependence to environmental gradients and low-order interactions that are often critical to understanding ecological processes.
The Random Forest algorithm is not affected by violations of the typical model assumptions of linearity and normality with little/no multicollinearity, and homoscedastic variance. This method has several desirable characteristics: (i) the ability to handle multiple types of data; (ii) the ability to assess interactions and nonlinearity; (iii) the ability to calculate variable importance, and iv) the ability to handle unbalanced data sets [12] .

Required data
1) Field measures of climatic, environmental, edaphic, and biotic metrics.
2) Total net primary productivity (tNpp) calculated as the sum of field measures of aboveground net primary productivity (aNpp) and belowground net primary productivity (bNpp), and therefore is equivalent to tNpp obs as used in this paper. 3) High spatial resolution average monthly/daily temperature and solar radiation data extracted for the geographic coordinates of each site (buffer zones can be used to calculate a mean for a given area in lieu of point data).
Required derived data 1) Determine the length of the growing season, i.e., the number of grow days when the temperature exceeds zero ( G d ). 2) Calculate mean temperature for the growing season ( G t ).
3) Calculate mean solar radiation for the growing season ( G srad ).

Model assumptions and parameterization
where c (kJ/mol) represents a scaling constant and H a (kJ/mol) represents activation energy, R is the molar gas constant, and the temperature T ( °K) measured at each site. A = net rate of CO 2 uptake after accounting for carboxylation and oxygenation where Wi is the rubisco-limited rate of carboxylation ( μmol m −2 s −1 )

Derive theoretical maximum primary productivity (tNpp tmax ) and Ecosystem Fit (eFit)
Calculate tNpp tmax as the product of solar radiation during the growing season and plant physiological parameters.
where G srad is mean solar radiation available during the growing season and K is the inverse specific energy of plant biomass (18.4 MJ kg −1 ).
Photosynthesis is the eco-physiological framework where carbon assimilation occurs. Thus, it is an indicator to quantify the environment's functional limitations on primary productivity [8] . Trees convert solar energy into biomass within their genetic and environmental constraints. If we suppose all structural and functional attributes of trees are operating at their optimal levels, and the environmental conditions are within the tree's normal range of operation, productivity will be maximized for that ecosystem under the conditions present [7] . Based on these constructs, eFit is described as a ratio of measured tNpp obs to the estimated tNpp tmax (tNpp obs /tNpp tmax ) or percent eFit (see Eq. 1 ). Fig. 4. Comparison of the relationship between eFit and tNpp obs in each climatic zone (left to right) and leaf phenology (red = deciduous, green = evergreen). Colored plot rug along the x-axis indicates observations by phenology. The large point is the centroid which represents the mean of all observations and the grey background represents the 95% confidence interval. The statistic is the squared Pearson product-moment correlation coefficient for the corresponding data pairs for deciduous and evergreen natural forests ( n = 267).

Model assumptions and validation of field collected data
This study performed statistical functions in R version 4.0.3 [17] , and all analyses were subject to the Shapiro-Wilk tests of normality [22] . Correlations were considered significantly different at p < 0.05 and linear regressions were evaluated by behavior of residuals and variance explained. To demonstrate the independence of tNpp and eFit to phenological structure at the global scale and their reliance on stand-level site conditions, comparisons were made with Wilcoxon signed-rank tests for non-parametric and unbalanced data for climatic classifications. Random forest model performance was evaluated by variable importance and which variables (or combinations of variables) and their interactions had the most predictive power. We validated the random forest regression models with visualization of the discrepancies between field observations and predictions of tNpp obs and eFit and calculation of Pearson product-moment correlation coefficient for each ( Fig. 1 ).
To validate the random forest classification model of the climatic zones we first evaluated the underlying structure of the data by first removing all phenological and climatic membership identifiers. Dissimilarity matrices (distances between all pairs of data points) were calculated for the three most important variables identified by the random forest regression, mean annual temperature, precipitation, and annual grow season. A multivariate random forest model was used to predict the clusters. The data were then aggregated using the partitioning around medoids (PAM) method from the R package "cluster " [20] , which repeatedly iterates until the medoid position is stabilized. The medoid of the cluster represents the median of all the attributes included. The optimal number of clusters was selected by evaluating the average silhouette width and the variance explained by the first two principal components. The clusters were visualized with nonlinear dimensional reduction with the R package "Rtsne " [10] utilizing the t-distributed stochastic neighbor embedding technique (t-SNE) [15] . The algorithm first creates probability distributions of the distance relationships between points in their high-dimensional space and then projects them into a reduced 2-dimensional semantic space that retains the proximities of the relationships. This allows identification of misclassifications and density distributions for classes to each environmental variable ( Fig. 2 ).
All class designations were validated by comparing the climatic assignment to model-generated clusters ( Fig 3 . A) and to the environmental gradients of annual precipitation and mean annual temperature ( Fig. 3 B). The Z-score for eFit was added as proportional-sized shapes.
The relationships between tNpp and eFit vary by biome and phenological behavior ( Fig. 4 ). The amount of variance of eFit explained by tNpp was lowest in the temperate forest biome, and especially for deciduous forests where 76% of the variance of eFit was explained by tNpp compared to the evergreen forests (87%). This may be due to the high seasonal variability of the temperate zone compared to the stability of the tropics and strong seasonal forcing and limits of the boreal climate. Thus, if a researcher wants to explain tNpp and eFit in deciduous forests in the temperate climatic zone, other variables need to be explored to improve the strength of the relationship between eFit and tNpp.