Predicting Tree Height ‒ Diameter Relationship from Relative Competition Levels Using Quantile Regression Models for Chinese Fir ( Cunninghamia lanceolata ) in Fujian Province, China

: The importance of the height ‒ diameter (H ‒ D) relationship in forest productivity is well known. The general nonlinear regression model, based on the mean regression technical, is not able to give a complete description of the H ‒ D relationship. This study aims to evaluate the H ‒ D relationship among relative competition levels and develop a quantile regression (QR) model to fully describe the H ‒ D relationship. The dominance index was applied to determine the relative competition levels of trees for the Chinese fir. Based on the basic Weibull growth model, the mean regression for five relative competition levels and 11 QR models was constructed with 10 ‐ fold cross ‐ validation. We have demonstrated that the H ‒ D relationship for the Chinese fir strongly correlated with relative competition states, but the five curves from mean regression models did not show a notable difference between the trends of H ‒ D relationship under different competition levels. Similar regression results were found in QR models of the specific quantiles; the average tree height of five competition levels varied between 5.78% and 17.65% (i.e., about 0.06 and 0.18 quantiles). In addition, some special curves of the H ‒ D relationship such as the QR models of the 0.01 and 0.99 quantile showed the H ‒ D relationship under certain conditions. These findings indicate that the QR models not only evaluated the rates of change of the H ‒ D relationships in various competition levels, but also described their characteristics with more information, like the upper and lower boundary of the conditional distribution of responses. Although the flexible QR curves followed the distribution of the data and showed more information about the H ‒ D relationships, the H ‒ D curves may not intersect with each other, even when the trees reached their maximum height. Hence, the QR model requires further practice in assessing the growth trajectory of the tree’s diameter or tree height to gain better results.


Introduction
The relationship between diameter at breast height (DBH) and tree height (H) is an important factor in forest research and is often used to estimate forest resources and wood production [1][2][3]. In the study of forest ecology and forest management, the height-diameter (H-D) relationship is one of the most important elements in estimating the stand volume and biomass, and is used as a reliable indicator of forest growth and sustainable forest management [4][5][6]. The precise estimation of the H-D relationship is essential for giving a clear description of the stand conditions and its changes over time, and can further be employed in the estimation of stand growth and yield projection [7,8]. For example, Parresol [9] described the relationship between the height and diameter of a stem as important components in yield estimation, stand description and damage appraisal. Hence, the H-D model has been widely used as an important method for obtaining tree heights and evaluating forest productivity in even or uneven-aged forests [10,11].
Various H-D models have been developed and applied for the management of different forest types [12]. The H-D model varies between stands and species; depending on the variables used, the models were classified into two types: the locally applied model and the generalized use model [13,14]. The locally applied models commonly are only dependent on tree diameter or tree age and only applicable to the stand where the data were collected [15]. On the other hand, the generalized use models, with other stand-level variables (such as site condition, competition status, environment, and climatic factors), can be applied to a large area [2,16]. According to Meyer [17], the H-D relationship is probably the most sensitive and reliable measure indicator of site productivity under natural competition. Huang and Titus [18] described the stand quality for four major species forests based on the H-D relationship of dominant and codominant trees. Kearsley et al. [19] built several H-D models and thought those useful tools improved the accuracy of forest aboveground biomass estimation, which further proved that the H-D relationship is an important feature for evaluating the competition effect on tree growth and stand yield. However, those H-D models are often modeled by the ordinary least squares (OLS) regression technique, and this method is a mean regression that may not be reliable when random and independent observations are included into the calculation [5].
Other estimation methods such as maximum likelihood (ML) and restricted estimation maximum likelihood (REML) for mixed-effect models addressed this problem and have been applied in the study of height-diameter and crown diameter prediction models [14,20]. Although these modeling approaches are better at estimating the H-D relationship, the description for the distribution of the H-D relationship may still be weak. It is difficult to predict some extremely distributed observations without prior information supported by the mixed-effect model.
Quantile regression (QR) is a method to estimate the full conditional distribution of dependent variables or quantile functions of interest, presented by Koenker et al. in 1978 [21,22]. This method is flexible in terms of describing the different distribution of the relationship between independent and dependent variables [23]. In recent years, QR has increasingly been used to study forest inventory [24], stand density [25], diameter distribution prediction [26], diameter growth [27], tree taper [28], and the aboveground biomass of forests [29]. Up to now, QR has been used in the study of tree height prediction and stand tree height simulation, but rarely reported in the relationship between age and height or diameter and height [5,30]. Unlike the mean regression method, the QR method can describe the trend of the dependent variable at any quantile. Thus, it may able to extract more information from the data, and obtain a more comprehensive analysis of the relationship between height and diameter [31,32].
The Chinese fir (Cunninghamia lanceolata(Lamb.)Hook.) has the largest afforestation area in South China. Due to its rapid growth and excellent wood quality, this species has remarkable economic value and important ecological benefits [33,34]. To the authors' knowledge, although there have been a large number of studies conducted based on the H-D models to estimate the relationship between height and diameter in Chinese fir plantations [35,36], few studies have considered the QR approach. In addition, only scant attention has been paid to a comprehensive analysis of the relationship between height and diameter, especially for the application to non-dominant trees [1,11,37]. Therefore, the ability of the QR models to describe and evaluate the H-D relationships of the Chinese fir has not been fully validated.
To address the aforementioned problems, the goals of this study were to: (i) quantify the five relative competition levels of the Chinese fir, (ii) develop and evaluate H-D models applying mean regression and quantile regression approaches for the five competition levels, and (iii) compare the H-D curves in different competition levels to analyze the growth relationship between DBH and H at each level.

Data Collection
The data used in this study were collected from the National Forest Farm at Jiangle County, northwest Sanming City, southeastern China (117°05′-117°40′ E, 26°26′-27°04′ N) in July 2018. The region experiences a subtropical monsoon climate, with an annual average temperature of 18.6 °C and annual average precipitation of 1721.6 mm. Precipitation is concentrated in the summer. Plantation species in the forest farm include Chinese fir, Pinus massoniana Lamb., Schima superba Gardn. et Champ, Michelia macclurei Dandy, and Phyllostachys heterocycla (Carr.) Mitford cv. Pubescens.
Data for this study were obtained from the even-aged and pure stands plantation (more than 85%) throughout the forest farm. A total of 3133 trees in 28 sample plots (five temporary and 23 permanent) were investigated from the study area and set up in five age groups, including young forests (1-10 years), middle-age forests (11-20 years), nearly mature forests (21-25 years), mature forests (26-35 years) and over-mature forests (≥36 years). The projected area size of plots ranged from 400 to 900 m 2 , depending on the stand density. The plot size of dense stands is 400 and 600 m 2 , while the plot size of sparse stands is 600 and 900 m 2 . In each plot, we recorded its coordinates, elevation, slope aspect, degree, and position. The tree characteristics such as species, DBH (>4 cm, measured at 1.3 m above ground), H, crown width (CW), and the dominance tree height (DH) of the stand were measured ( Table 1). The tree positions (relative x, y coordinates) were recorded by grids of subplots (10 × 10 m) and a distance measurer. The southwest corner of each plot was assigned as the origin point for tree position measurement, and the x, y coordinates were measured to the nearest 0.05 m. In order to avoid the edge effect in the dominance index calculation, a 2-m buffer was set up along the sample plot boundaries [38]. The trees in the buffer zone were only considered in the neighborhood structure [39].

Relative Competition Quantification
Based on the neighboring trees, the Hegyi competition index and the dominance index were both used to quantifying the competition at the individual tree level [40,41]. The former has a higher accuracy of quantification, which is applied to the study of competition intensity with growth. The latter estimates the relative competition pattern, which simplifies the process of quantification and weakens the effects of abnormal trees [42]. Due to these characteristics, the dominance index is more applicable to our objective. Therefore, the trees were divided into five relative competition levels by using the dominance index (Di). The index expresses the degree of difference in the tree's characteristics (such as DBH, H, volume, and biomass) [43]. The dominance index is defined as the ratio of a certain number of neighboring trees that are larger than the reference tree. The equation is as follows: where 0， ℎ ℎ , 1， ℎ . In this study, difference in DBH was used to obtain the value of kij, and the number of neighboring trees n was set as 4. Therefore, the value of dominance included 0, 0.25, 0.5, 0.75 and 1, representing five competition levels, i.e., predominant (D0), dominant (D0.25), co-dominant (D0.5), suppressed (D0.75), and inferior (D1) (Figure 1). A larger Di value indicates that the reference tree is under competitive pressure [44].
The dominance indices of trees were calculated using the "forestSAS" package (version 1.0.1) [45] in R (version 3.6.0). Figure 1. The case of a reference tree i (black) and its neighboring trees (white) in each competition level. The circle size is proportional to diameter at breast height (DBH).

Base Model
Several base H-D relationship functions were selected based on previous studies [12,13,15,46]. Based on a preliminary analysis (not shown), the Weibull function fits the data best, which is similar to the findings of studies of Özçelik [5] and Yang [47]. The base function is as follows: where Hij and DBHi are, respectively, the total height (m) and diameter at breast height (cm) of the jth tree in the ith plot; DHi is the dominant height of the ith plot; and a, b, c, and d are the regression parameters.

Quantile Regression Model
For a nonlinear QR model, Equation (2) was used to predict the τth quantile. The quantile regression model is as follows: where is the predicted value of the τth quantile of tree height at diameter DBHij. The function computes an estimate on the τth conditional quantile function of the response, given the covariates, as specified by the formula argument [21,48].
Parameters from the quantile regression were obtained by minimizing the following problem [49]: The "quantreg" package (version 5.51) was used to develop a set of series quantile regression models; the observations and parameters belong to the moderate-sized (n ≪ 5000, p ≪ 20) category, so it is recommended that the default "br" method be used [48].

Evaluation and Testing
To compare the fit of basic H-D models, 10-fold cross-validation [50] and the following criteria for evaluation were chosen: Akaike information criterion (AIC), Bayesian Information Criterions (BIC), log-likelihood value (logLik), root mean squared error (RMSE), coefficient of determination (R 2 ), and mean absolute error (MAE).
where is the likelihood function for the model; yi and are the observed and predicted values of tree height, respectively; and is the average value of yi. All calculations were performed using R (R 3.6.0 for Windows). The core code can be found in Appendix A.

The Quantification of Five Relative Competition Levels
The statistical results (Table 2) showed the characteristics of DBH and H in each relative competition level, and their changes with competition levels. The difference in the DBH between the competition levels ranged from 11.33% to 23.31% with the increase in competitive pressure (D0 to D1 level), while the difference in tree height ranged from 5.78% to 17.65%. The proportions of trees, starting from the competition level D0 to D1, are 22.73%, 20.43%, 18.82%, 19.30%, and 18.72%, respectively, which shows that the number of trees at the five levels is not very different. According to the plotting of H-D relationships for five levels (Figure 2), the distribution and trend changes of H-D for the D0 level (Figure 2a) are relatively lower than (or to the right of) the other levels, whereas for the D0.75 and D1 levels (Figure 2d,e), they are relatively higher than (or to the left of) the other levels. To an extent, the clear distribution of the H-D relationships among the five competition levels proved the applicability of the dominance index in the relative competition quantification.

Base H-D Model Validation
The parameter estimates and fit statistics for base H-D models of the Chinese fir are listed in Table 3. All of the models that produced the H-D relationship performed well, with R 2 ranging from 0.8982 to 0.9027 in the training group, and with low RMSE and MAE ranging from 1.6479 to 2.0886 and 1.2747 to 1.6167, respectively, in the testing group. The fold-6th model had the lowest values of AIC, BIC, and the largest values of logLik; R 2 predicted the best results for training data. The coefficients of this model are a (5.6653), b (0.7755), c (0.0832), and d (1.3265). Hence, this model was further tested, and the residual plot ( Figure 3) shows homoscedasticity, which confirms that other weights were unnecessary. Therefore, the base model is suitable for H-D relationship fitting.

Mean Regression Models for Five Competition Levels
The parameters and summary statistics of H-D models for five competition levels are listed in Table 4, and the different model coefficients with 95% confidence interval (CI) between the five competition levels are shown in Figure 4. Depending on the coefficient change trend of the models between each level, the difference among the competition levels could be observed by the change ratio of the coefficients. In the Weibull model we used, the coefficients before the natural logarithm "e" (including a and b) are directly proportional to the ratio of H-D and the maximum tree height. The coefficients c and d related to the rate at which the tree height reaches the maximum tree height with increasing DBH. For example, the model coefficients c and d of the D1 are 36.46% and 9.69% smaller than D0.75. Therefore, the difference of H-D relationship trend between each competition level can be judged by the rate of change of the model's coefficients. However, the rates of change for the H-D relationships near the upper boundary of the conditional distribution of H-D relationships are still unclear [31], which shows that the conventional mean regression may be inappropriate or limited for estimating the H-D relationships with ungraded data for competition level.   (Figure a-d). The gray area is 95% confidence interval (CI). The black thin dotted line is the coefficient value of the base model.

Quantile Regression Models
According to the results of the quantile process for all trees (Figure 5), the variation trends of the QR models in some specific quantiles were similar to the other competition model's parameters (a, b, c and d). This means that the QR model may be used to predict the H-D relationship of trees at different competition levels. The median quantile (τ = 0.5) model and multiple quantiles like τ = 0.4 and τ = 0.6 were similar to the basic model, while other quantiles were also informative. The     (Figure a-d). The gray area is 95% confidence interval (CI). The dotted line is the coefficient value of the base model.

Comparison of Modeling Approaches
In order to illustrate the behavior of the H-D model and to compare the application capabilities of the two methods in practice, the H-D curves (as a stand height curve) fitted by the two methods were plotted ( Figure 6). The comparison results showed that the H-D relationships of the five different competition levels fitted by the mean regression (Figure 6a) might not be clearly distinguished. On the other hand, the quantile regression fitted the H-D curves of different shapes based on the different distribution characteristics of tree height increasing with DBH, which is flexible and more informative about the H-D relationships (e.g., combined with the H-D relationship distributions of the five competition levels, the dominant trees (D0) were mostly distributed in the 0.01 to 0.25 quantile range; the medium trees (D0.5) were mainly distributed in the 0.25 to 0.75 quantile range; the suppressed woods (D1) were mostly distributed in the 0.75 to 0.99 quantile range). However, the limitations of quantile regression are also obvious: the fitted curve of the model follows the distribution of the observed data and the basic shapes dictated by the quantile regressions [5], thus preventing the curves from crossing one another (when the tree reached its maximum height; the maximum height of the Chinese firs in this study area is about 24 m).

Discussion
To clarify the relationship between the height and diameter in different competition conditions, the mean regression and QR models for the H-D relationship were developed based on the Weibull growth model, and compared in this study. With the same coefficients of the H-D models, the estimation of the QR covered most of the predicted observations from the five competition level mean regression models. To compare the variation of the model parameters, the parameters in the five competitive mean regression models were consistent with the parameters in the QR model on the quantile between 0.3 and 0.7, which shows that the average tree height of five competition levels varied between 5.78% and 17.65% (i.e., about 0.06 and 0.18 quantiles). This shows that it is feasible to predict the H-D relationship curves of different competition levels or describe the various types of growth patterns with specific quantile regression models [51,52]. In addition, we can deduce the current growth and competition of trees by referring to the conditional distribution of the H-D relationship in the QR model. If the tree does not reach the maximum tree height, the current H-D relationship can be used to estimate its respective competition level. The H-D relationship of the dominant tree is closer to the lower boundary of the H-D distribution [31], i.e., the low quantile (0.01-0.3), while the suppressed tree is closer to the upper boundary of the H-D distribution, i.e., the high quantile (0.7-0.99). This rule is consistent with the general situation of tree growth [27,37], while the mean regression model cannot effectively quantify this phenomenon under this distribution range [30].
For most H-D growth relationship research in the forestry and ecology fields, the conditional mean of dependent variables is most important for statistical modeling, but sometimes researchers are more interested in the outliers or extreme values of the dependent variable. The quantile regression results in the median quantile range are similar to the mean regression results [31,53], and this is also confirmed by the fitting results of model parameters in this study. However, based on the characteristics of the QR model, the conditional distribution of the dependent variable is fully described by quantile regression and is therefore of more interest to researchers. The curves of the QR model are often used to explore the abnormal growth in the study of growth charts [54,55]. On the one hand, the QR model provides a more detailed view of the growth trend (e.g., the H-D relationship in this study) of trees in describing the structural distribution and changes to the H-D relationship. On the other hand, the QR models predict the H-D relationship assuming that the predictor variables have a specific distribution in response to different conditions, in this case, the competition levels [31].
Although the QR model is reliable in some studies, there are two problems that have to be solved for its practical application. First, when the distribution of the dependent variable (i.e., tree height) is concentrated in the data, there was no significant difference between the QR models. The QR models in this study showed concentrated distribution of dependent variable from 0.4 to 0.6 quantiles, similar to the previous study of the tree height prediction [27]; as a result, these QR models did not differ significantly. However, when the standard deviation of the tree height was large, or the dispersion degree of the H-D relationship was obvious, as in this study, the QR models of different quantiles were significantly different. Therefore, for the QR model, the distribution of the data may be as important as the quantile selection. Second, the growth trends of trees in fact crossed each other, while the growth trend curves fitted by QR are not allowed to do so as the QR curves usually follow the distribution of the data and the guidance of the base shape [22,48]. Therefore, the QR model may be more suitable for research about classification by models (such as the site index model and growth dynamics model). In conclusion, the distribution of data and the base guiding curve are the main issues that should be considered for QR model fitting.