Least squares-based biomass conversion and expansion factors best estimate biomass than ratio-based ones: Statistical evidences based on tropical timber species

where Ŵ is the predicted tree component biomass and V is the stem volume. Eq. (1) is, in effect, a regression through the origin (RTO) of biomass on stem volume, in which the BCEF is the slope. However, the regression slope (BCEF) is conventionally obtained not by least squares (LS) but as the ratio of observed tree component biomass to stem volume [2,6]. Hence, the sum of squares of the residuals is not minimized, which may lead to strongly biased biomass estimates. Furthermore, in this case, the biomass is not modelled [2].
The assumption behind Eq. (1) is that tree component biomass is directly proportional to stem volume and that if stem volume is zero, then tree component biomass is also zero, which is true. Therefore, ratio estimators are deemed appropriate [7][8][9][10], and the BCEF is computed accordingly (i.e. using ratio estimators). Nonetheless, as mentioned previously, this approach falls short because it neither uses least squares nor models the biomass. Therefore, fitting Eq. (1) using RTO (i.e. obtaining the BCEF in Eq. (1) by least squares) might provide more accurate biomass estimates than using ratio estimators (ratio-based BCEFs).
The objective of this study was to compare LS-based and ratio-based tree component BCEFs with regard to predictive accuracy and ability. The study addressed the following research question: do LS-based and ratio-based BCEFs differ in terms of predictive accuracy and ability? It was hypothesized that LS-based tree component BCEFs provide more accurate and reliable estimates than ratio-based ones.
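The contrast between the two ways of obtaining the slope can be sketched as follows. This is a minimal illustration, not the authors' code: it reads Eq. (1) as Ŵ = BCEF × V (an assumption, since the equation itself is not reproduced in this section), the source does not state which ratio estimator was used, so two common variants are shown, and the tree data and all function names are hypothetical.

```python
def ratio_bcef_mean_of_ratios(volume, biomass):
    """Ratio-based BCEF as the mean of the per-tree biomass/volume ratios."""
    return sum(w / v for v, w in zip(volume, biomass)) / len(volume)


def ratio_bcef_ratio_of_totals(volume, biomass):
    """Ratio-based BCEF as total biomass divided by total volume."""
    return sum(biomass) / sum(volume)


def ls_bcef(volume, biomass):
    """LS-based BCEF: regression-through-the-origin slope, sum(V*W) / sum(V^2)."""
    sum_vw = sum(v * w for v, w in zip(volume, biomass))
    sum_vv = sum(v * v for v in volume)
    return sum_vw / sum_vv


def rss(slope, volume, biomass):
    """Residual sum of squares of the model W = slope * V."""
    return sum((w - slope * v) ** 2 for v, w in zip(volume, biomass))


# Hypothetical trees: stem volume (m^3) and component dry biomass (Mg).
volume = [0.2, 0.5, 0.9, 1.4, 2.0]
biomass = [0.30, 0.55, 0.80, 1.15, 1.50]

estimators = [
    ("mean of ratios", ratio_bcef_mean_of_ratios),
    ("ratio of totals", ratio_bcef_ratio_of_totals),
    ("least squares (RTO)", ls_bcef),
]
for name, estimator in estimators:
    slope = estimator(volume, biomass)
    print(f"{name}: BCEF = {slope:.4f}, RSS = {rss(slope, volume, biomass):.5f}")
```

By construction, the least-squares slope has the smallest residual sum of squares of any slope through the origin, so neither ratio estimator can have a smaller one on the same data.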
After measuring the diameter at breast height (DBH), the trees were felled at a predefined stump height of 20 cm. The aboveground portion of each tree was divided into the following biomass components: stem, branches, foliage, and crown (branches + foliage).
The stem was divided into 5 segments of equal length, and the diameter of each segment was measured at its midpoint. The volume of the stem was determined using Hohenadl's formula [6]. Each segment was weighed fresh in the field, and a disc sample was removed from its top for oven-drying and subsequent dry weighing. Discs were oven-dried at 105 °C to constant mass. The dry mass of each segment was obtained by multiplying the oven-dry to fresh mass ratio of the disc by the fresh mass of the corresponding segment. The dry mass of the stem was obtained as the sum of the dry masses of its segments.
After removing the leaves, each primary branch (along with its secondary and higher-order branches, and twigs) was weighed fresh in the field. From each primary branch, a sample was taken consisting of a disc from the primary branch itself and subsamples of its secondary and higher-order branches and twigs. The dry mass of each primary branch was obtained in the same way as that of each stem segment.
All the foliage from the crown was weighed fresh in the field, and a sample of approximately 5% of the fresh mass was collected for oven-drying. The dry mass of the foliage was obtained in the same way as that of each stem segment.
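The stem calculations described above can be sketched as follows. The sketch assumes the usual five-section form of Hohenadl's formula, in which stem volume is the sum of five cylinders of equal length whose cross-sections are taken at the segment midpoints. The function names, units and sample measurements are hypothetical.

```python
import math


def hohenadl_volume(stem_length_m, mid_diameters_m):
    """Stem volume (m^3) as five equal-length cylinders with midpoint diameters."""
    if len(mid_diameters_m) != 5:
        raise ValueError("expected the midpoint diameters of exactly 5 segments")
    segment_length = stem_length_m / 5
    return segment_length * sum(math.pi / 4 * d ** 2 for d in mid_diameters_m)


def segment_dry_mass(segment_fresh_kg, disc_fresh_g, disc_dry_g):
    """Segment dry mass = disc dry-to-fresh mass ratio x segment fresh mass."""
    return segment_fresh_kg * disc_dry_g / disc_fresh_g


def stem_dry_mass(segments):
    """Stem dry mass as the sum of the dry masses of its segments."""
    return sum(segment_dry_mass(*segment) for segment in segments)


# Hypothetical stem: 10 m long, tapering from 0.30 m to 0.14 m in diameter.
diameters = [0.30, 0.26, 0.22, 0.18, 0.14]
# (segment fresh mass in kg, disc fresh mass in g, disc oven-dry mass in g)
segments = [
    (120.0, 800.0, 480.0),
    (95.0, 700.0, 413.0),
    (70.0, 600.0, 348.0),
    (48.0, 500.0, 285.0),
    (30.0, 400.0, 224.0),
]

print(f"stem volume: {hohenadl_volume(10.0, diameters):.4f} m^3")
print(f"stem dry mass: {stem_dry_mass(segments):.1f} kg")
```

The same dry-to-fresh ratio step applies to the primary branches and the foliage, with the branch or foliage sample taking the place of the disc.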

Analyses
Before computing the BCEFs, the Shapiro-Wilk normality test and normal quantile-quantile (Q-Q) plots were used to detect departures of each tree component biomass (the response variable) from normality (see Appendices 1 and 2 of the Supplementary materials). The Shapiro-Wilk test and residual Q-Q plots were also used to check whether the residuals of the ordinary linear regression were normally distributed (see Appendices 3 and 4 of the Supplementary materials).
Thus, LS-based BCEFs were obtained using a generalized linear model if the residuals of the response variable (biomass) were not normally distributed, and using ordinary linear regression if they were.
The residuals of all tree components of all species were found to be normally distributed, except those of the foliage biomass of A. quanzensis and P. angolensis and of the AGB of P. angolensis.
Ratio-based and LS-based BCEFs were compared with regard to predictive accuracy and ability. The predictive accuracy was determined by the following sources of error in model prediction: (1) error due to model misspecification, (2) error due to uncertainty in the model parameter estimates, and (3) error due to residual variability around model prediction.
Error due to model misspecification is here expressed by the Akaike Information Criterion (AIC) [11], as it is a measure of the relative quality of statistical models for a given set of data. The error due to uncertainty in the model parameter estimates is expressed by the standard errors of the regression parameters [12], i.e. the standard errors of the BCEFs in this case. In turn, the error due to residual variability around model prediction is here expressed by the coefficient of variation of the residuals (CVr) and Furnival's index of fit (FI) [2,12].
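The following sketch shows how such accuracy measures could be computed for a one-parameter model W = BCEF × V. The formulas are conventional textbook forms and are assumptions here, because the source does not reproduce the authors' definitions: AIC is taken as n·ln(RSS/n) + 2k for Gaussian errors (k counting the slope and the error variance), CVr as the residual standard error relative to mean observed biomass, and the slope standard error as the usual RTO expression. Furnival's index is omitted. All names and data are hypothetical.

```python
import math


def residual_stats(slope, volume, biomass, n_params=1):
    """RSS, Gaussian AIC and residual CV (%) for the model W = slope * V."""
    n = len(volume)
    rss = sum((w - slope * v) ** 2 for v, w in zip(volume, biomass))
    residual_se = math.sqrt(rss / (n - n_params))
    # Gaussian AIC up to an additive constant; +1 counts the error variance.
    aic = n * math.log(rss / n) + 2 * (n_params + 1)
    cv_r = 100 * residual_se / (sum(biomass) / n)
    return {"rss": rss, "aic": aic, "cv_r": cv_r, "residual_se": residual_se}


def ls_slope_with_se(volume, biomass):
    """RTO least-squares slope and its standard error, s / sqrt(sum(V^2)).

    The standard-error expression is valid for the least-squares slope only.
    """
    sum_vv = sum(v * v for v in volume)
    slope = sum(v * w for v, w in zip(volume, biomass)) / sum_vv
    residual_se = residual_stats(slope, volume, biomass)["residual_se"]
    return slope, residual_se / math.sqrt(sum_vv)


# Hypothetical trees: stem volume (m^3) and component dry biomass (Mg).
volume = [0.2, 0.5, 0.9, 1.4, 2.0]
biomass = [0.30, 0.55, 0.80, 1.15, 1.50]

ls_slope, ls_se = ls_slope_with_se(volume, biomass)
ratio_slope = sum(biomass) / sum(volume)  # ratio of totals, one possible ratio estimator

print(f"LS slope = {ls_slope:.4f} (SE = {ls_se:.4f})")
for name, slope in [("LS-based", ls_slope), ("ratio-based", ratio_slope)]:
    stats = residual_stats(slope, volume, biomass)
    print(f"{name}: AIC = {stats['aic']:.2f}, CVr = {stats['cv_r']:.1f}%")
```

Because both models have the same number of parameters, the AIC difference under this definition depends only on the two residual sums of squares.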
where $e_i^2$ is the squared model residual and $H_{ii}$ is the $i$-th diagonal element of the projection matrix $H$.
The lower the MEP and MPE, the better the models in terms of predictive ability.
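A minimal sketch of a leave-one-out prediction error built from residuals and projection-matrix diagonals is given below. It assumes the common form MEP = (1/n) Σ e_i² / (1 − H_ii)², which is an assumption because the defining equation is not reproduced in this section. For a regression through the origin, H_ii = V_i² / Σ V_j². The shortcut applies to least-squares fits only, and the explicit refitting function is included just to check it. Names and data are hypothetical.

```python
def ls_slope(volume, biomass):
    """Regression-through-the-origin least-squares slope."""
    return sum(v * w for v, w in zip(volume, biomass)) / sum(v * v for v in volume)


def mep_from_hat_values(volume, biomass):
    """Mean squared leave-one-out error using e_i / (1 - H_ii)."""
    slope = ls_slope(volume, biomass)
    sum_vv = sum(v * v for v in volume)
    total = 0.0
    for v, w in zip(volume, biomass):
        residual = w - slope * v
        h_ii = v * v / sum_vv  # diagonal of H = V (V'V)^-1 V' for one predictor
        total += (residual / (1 - h_ii)) ** 2
    return total / len(volume)


def mep_by_refitting(volume, biomass):
    """Same quantity obtained by refitting the slope without tree i each time."""
    total = 0.0
    for i in range(len(volume)):
        slope_i = ls_slope(volume[:i] + volume[i + 1:], biomass[:i] + biomass[i + 1:])
        total += (biomass[i] - slope_i * volume[i]) ** 2
    return total / len(volume)


# Hypothetical trees: stem volume (m^3) and component dry biomass (Mg).
volume = [0.2, 0.5, 0.9, 1.4, 2.0]
biomass = [0.30, 0.55, 0.80, 1.15, 1.50]

print(f"MEP via hat values: {mep_from_hat_values(volume, biomass):.6f}")
print(f"MEP via refitting:  {mep_by_refitting(volume, biomass):.6f}")
```

The two functions agree up to rounding, which is why the projection-matrix form is convenient: no model needs to be refitted.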

Predictive accuracy
The errors due to model misspecification of LS-based BCEFs, as judged by AIC, were up to 115% smaller than those of ratio-based ones (Tables 2 and 3). The standard errors of the parameters (the BCEFs, i.e. the slopes) varied from 8 to 333% for ratio-based BCEFs and from 4 to 15% for LS-based BCEFs. The errors due to uncertainty in the model parameter estimates of LS-based BCEFs were up to 97% smaller than those of ratio-based ones. Thus, ratio-based BCEFs were associated with wider confidence intervals (Fig. 1), and for all tree components and species, except for P. angolensis, ratio-based BCEFs were found not to be statistically significant (Fig. 1).
FI and CVr were also considerably smaller for LS-based biomass models (LS-based BCEFs), indicating a smaller error due to residual variability around model prediction than for ratio-based biomass models (ratio-based BCEFs).
The three sources of error in model prediction indicate that biomass estimates obtained from LS-based BCEFs are more accurate than those obtained from ratio-based BCEFs; i.e. LS-based BCEFs were associated with higher predictive accuracy than ratio-based BCEFs.
Note that, on average, ratio-based BCEFs were larger than LS-based BCEFs; i.e. LS-based BCEFs indicate a lower dry weight per unit of stem volume than ratio-based BCEFs. For example, the ratio-based BCEFs for stem and AGB of M. stuhlmannii (Table 3) indicate that stem biomass and AGB (in Mg) are 1.36- and 1.73-fold larger than stem volume (in m³), respectively; whilst LS-based BCEFs indicate

Predictive ability
The mean quadratic errors of prediction (MEP) of LS-based BCEFs were up to 100% (range: 15-100%) smaller than those of ratio-based ones (Tables 2 and 3). Likewise, the model prediction errors (MPE) of LS-based BCEFs were up to 84% (range: 21-84%) smaller than those of ratio-based ones. Thus, the predictive ability was higher for LS-based BCEFs than for ratio-based BCEFs.

Is RTO appropriate for estimating BCEFs?
In this paper, the a priori reason why the regression was forced to pass through the origin is that if stem volume is zero, then tree component biomass is also zero. However, this fact alone is not enough to justify the use of RTO since, as argued by Wooldridge [17], "one serious drawback of RTO is that, if the intercept is different from zero, then the LS estimators of the slope will be severely biased". Therefore, it was tested whether the hypothesis of the intercept being equal to zero (a = 0) is supported by the data (Table 4).
The intercepts of all models were found not to be significantly different from zero at the 5% significance level (Table 4), except for the foliage of C. mopane and M. stuhlmannii.
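The intercept test can be sketched as an ordinary least-squares fit of W = a + bV + e followed by a t statistic for a. This is a generic textbook sketch under stated assumptions, not the authors' procedure: the data and function name are hypothetical, and the critical value must be supplied by the user (3.182 is the two-sided 5% value for 3 degrees of freedom from standard t tables).

```python
import math


def intercept_t_test(volume, biomass):
    """Fit W = a + b*V by OLS and return (a, b, SE(a), t statistic for a = 0)."""
    n = len(volume)
    v_mean = sum(volume) / n
    w_mean = sum(biomass) / n
    s_vv = sum((v - v_mean) ** 2 for v in volume)
    s_vw = sum((v - v_mean) * (w - w_mean) for v, w in zip(volume, biomass))
    b = s_vw / s_vv
    a = w_mean - b * v_mean
    rss = sum((w - a - b * v) ** 2 for v, w in zip(volume, biomass))
    residual_variance = rss / (n - 2)  # two estimated parameters: a and b
    se_a = math.sqrt(residual_variance * (1 / n + v_mean ** 2 / s_vv))
    return a, b, se_a, a / se_a


# Hypothetical trees: stem volume (m^3) and component dry biomass (Mg).
volume = [0.2, 0.5, 0.9, 1.4, 2.0]
biomass = [0.30, 0.55, 0.80, 1.15, 1.50]

a, b, se_a, t_stat = intercept_t_test(volume, biomass)
t_critical = 3.182  # two-sided 5% critical value for n - 2 = 3 degrees of freedom

print(f"a = {a:.4f}, b = {b:.4f}, SE(a) = {se_a:.4f}, t = {t_stat:.2f}")
if abs(t_stat) > t_critical:
    print("Intercept differs significantly from zero; RTO may bias the slope.")
else:
    print("No evidence against a = 0 at the 5% level; RTO is admissible.")
```

A non-significant intercept does not prove that a = 0; it only means the data do not contradict the through-origin form.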

Brief discussion
Biomass regression equations, using easily measurable tree dimensions as independent variables (DBH and tree height), yield the most accurate estimates [18][19][20][21], provided that they are obtained from a large number of trees [1,22]. However, because they readily convert available stem volumes into any component biomass and are closely linked to standard forest inventory results [4], ratio-based BCEFs are the most widely used for obtaining national and regional AGB estimates and for GHG reporting [2,3]. Nevertheless, as shown here, they have poor predictive accuracy and ability, mainly because they are not obtained using least squares and therefore do not minimize the sum of squares of the residuals. LS-based BCEFs, to a certain extent, combine the advantages of biomass regression equations and ratio-based BCEFs. However, it should be noted that LS-based BCEFs might not provide biomass estimates as accurate as those of biomass regression equations. This is because BCEF-based biomass depends on stem volume which, in turn, depends on DBH, stem height and, sometimes, form factor, if the volume is computed from a form factor instead of a volume equation. All these variables have their own sources of error, which are propagated when estimating biomass. When using a biomass regression equation, however, the biomass depends, most of the time, on DBH alone or on DBH and tree height, which minimizes the sources of error.

Fig. 1. Significance of tree component BCEFs. The error bars indicate the 95% confidence interval (CI), computed as CI = t × SE, where t is the critical value of the t distribution at 95% probability and n − 2 degrees of freedom, n is the sample size, and SE is the standard error.

Table 4. Test of the hypothesis that the intercept of the regression W = a + bV + e is equal to zero.
The choice of an appropriate biomass equation (e.g. BCEF) is decisive for reducing uncertainties in forest biomass stock estimates [23], especially in the context of Reducing Emissions from Deforestation and Forest Degradation (REDD+). Besides being less accurate and precise, ratio-based BCEFs, as can be seen from Fig. 1, estimate up to 75 Mg ha⁻¹ (76%) more biomass than LS-based BCEFs for each 100 m³ ha⁻¹ of stem volume. In this context, ratio-based BCEFs will lead, on average, to overestimation of emission factors (EFs) and forest reference emission levels (FRELs), and will compromise the reliability of the estimates of carbon stock changes. Consequently, with unreliable FRELs, the contribution of a country or REDD+ project to mitigating climate change through forest-related actions cannot be properly assessed and, moreover, the reported contribution will itself be unreliable.
One of the most important drivers of forest-cover change and forest degradation in Mozambique is selective logging [24], which is mostly concentrated on A. quanzensis, M. stuhlmannii and P. angolensis [24], 3 of the 4 species studied here. Therefore, these species account for a large part of the forest-cover change and forest degradation due to logging, and thus for a large part of the carbon emissions from forest degradation caused by logging. This highlights the need for accurate biomass estimates for these species.
Accurately estimating biomass is a critical step in quantifying carbon emissions from deforestation and forest degradation and in reducing the uncertainties of those emissions. At both local and global levels, emissions related to forest degradation are poorly quantified [25]. At the local level (Mozambique), accurate estimates of the biomass of the species responsible for forest-cover change and forest degradation due to selective harvesting may promote better quantification of emissions from forest degradation. The total emission from selective harvesting is the sum of (1) extracted log emissions (ELE), (2) the logging damage factor (LDF), and (3) the logging infrastructure factor (LIF) [25]. LS-based BCEFs may significantly improve the estimates of the first two factors, as described below. Accurate estimates of the stem BCEFs of those species will lead to better estimates of ELE, which, according to Pearson et al. [25], are "emissions resulting from conversion of the log to wood products and the subsequent emissions from retired wood products". Accurate estimates of the branch, foliage and crown BCEFs of the species concerned will provide better estimates of LDF, defined as the emissions resulting from decomposition of all the dead wood produced as a result of felling the tree(s) [25], which includes the foliage and the branches.
Overall, when compared to ratio-based BCEFs, LS-based BCEFs are a potential tool for better estimating biomass and carbon stocks, emission factors and FRELs, while reducing their uncertainties. Specifically, in the local context, since the LS-based BCEFs were developed for the tree species that are most selectively harvested, and thus most responsible for forest degradation caused by logging, these BCEFs may contribute to better estimates of the country-specific emissions from forest degradation.

Conclusions
In this study, ratio-based and LS-based BCEFs were compared in terms of predictive accuracy and ability. Compared to ratio-based BCEFs, LS-based BCEFs were associated with substantially lower (1) error due to model misspecification, (2) error due to uncertainty in the model parameter estimates, and (3) error due to residual variability around model prediction, leading to higher predictive accuracy. LS-based BCEFs also had lower values of (1) mean quadratic error of prediction and (2) model prediction error, leading to higher predictive ability than ratio-based BCEFs.