Evaluating the accuracy of ALS-based removal estimates against actual logging data

We examined the accuracy of the stand attribute data based on airborne laser scanning (ALS) provided by the Finnish Forest Centre. The precision of forest inventory data was compared for the first time with operative logging data measured by the harvester. Airborne laser scanning (ALS) is increasingly used together with models to predict the stand attributes of boreal forests. The information is updated by growth models. Information produced by remote sensing, model prediction, and growth simulation needs field verification. The data collected by harvesters on logging sites provide a means to evaluate and verify the accuracy of the ALS-based data. This study investigated the accuracy of ALS-based forest inventory data provided by the Finnish Forest Centre at the stand level, using harvester data as the reference. Special interest was on timber assortment volumes where the quality reductions of sawlog are model predictions in ALS-based data and true realized reductions in the logging data. We examined the accuracy of total volume and timber assortment volumes by comparing ALS-based data and operative logging data measured by a harvester. This was done both for clear cuttings and thinning sites. Accuracy of the identification of the dominant tree species of the stand was examined using the Kappa coefficient. In clear-felling sites, the total harvest removals based on ALS and model prediction had a RMSE% of 26.0%. In thinning, the corresponding difference in the total harvested removal was 42.4%. Compared to logged volume, ALS-based prediction overestimated sawlog removals in clear cuttings and underestimated pulpwood removals. The study provided valuable information on the accuracy of ALS-based stand attribute data. Our results showed that ALS-based data need better methods to predict the technical quality of harvested trees, to avoid systematic overestimates of sawlog volume. 
We also found that the ALS-based estimates do not accurately predict the volume of trees removed in actual thinnings.


Introduction
Combined use of airborne laser scanning (ALS), field plots, and predictive models is today the most important source of information in management-oriented inventories of Finnish forests (Maltamo and Packalén 2014). ALS is also a promising method in national forest inventories (Grafström and Hedström Ringvall 2013) and pre-harvest stand measurement (Peuhkurinen et al. 2007). In practical forest planning, information is required by tree species (Packalén 2009). Aerial imagery is often used to interpret tree species and other attributes that are difficult to predict from laser scanning data (e.g., Packalén and Maltamo 2007; Ørka et al. 2013).
Verifications of inventories have shown that inventories based on ALS data (Wallenius et al. 2012) can be more accurate than those obtained by using traditional field-based methods (Suvanto et al. 2005). In addition, both in the evaluation of tree species-specific attributes (e.g., Packalén and Maltamo 2007; Breidenbach et al. 2010) and in the measurement of individual tree attributes (e.g., Korpela et al. 2010; Vauhkonen 2010; Yao et al. 2012; Silva et al. 2016), the accuracy has been at least the same as in traditional field assessments. However, further research is needed to improve the accuracy of tree quality estimation in ALS-based forest inventory (Wallenius et al. 2012).
The Finnish Forest Centre collects, maintains, and distributes stand attribute information on Finnish forests (Finnish Forest Centre 2019a). The data are based on the combined use of field inventories and remote sensing. Models are used to predict timber volumes and update the data. Field plots are used as training data and ALS is used to generalize the results over large inventory areas. Due to the amendment of the Forest Information Act, which came into force in early March 2018, much information was made publicly available through the Metsään.fi service (https://www.metsaan.fi/). Information on sawlog and pulpwood harvest removal by tree species is essential in timber sales and in the planning of harvesting operations. Information on the quality characteristics of trees is also important (Holopainen et al. 2013). When predicting timber assortments, the training data should have precise stand level information on sawlog and pulpwood removals, which can be measured in practice only by a harvester (Malinen et al. 2003).
Previous studies on the accuracy of ALS inventory have usually compared ALS-based stand attribute estimates with field measurements (e.g., Naesset 2007; Wallenius et al. 2012; White et al. 2013). The problem with these comparisons is that a part of the field "measurements" are model predictions. This is the case, for example, with timber assortment volumes, which are based on taper models and predicted quality deductions. There have also been some attempts to use harvester data for similar purposes (Siipilehto et al. 2016; Pesonen 2017). Harvester data have also been used as training data in the modeling of stand attributes such as volume, basal area, and diameter distribution, using laser scanning metrics as predictors (Bollandsås et al. 2011; Peuhkurinen et al. 2011; Holmgren et al. 2012; Barth and Holmgren 2013; Hauglin et al. 2018; Saukkola et al. 2019). For example, Bollandsås et al. (2011) studied the prediction of tree volume and quality characteristics in northeastern Norway.
Research on the use of harvester data for forest inventory purposes has so far been limited because collecting harvester data and combining them with remote sensing information is tedious (Holopainen et al. 2013). Harvester data are collected for timber sales transactions, and research needs are seldom taken into account. Besides, GPS positioning of harvester data has generally been inaccurate (Lindroos et al. 2015). However, Hauglin et al. (2018) recently presented an approach for improved GPS positioning with an average error in the tree location of about 1 m.
In this study, we applied operational large-scale logging data and compared them with ALS-based stand attribute data in the Metsään.fi service. The most important characteristics compared were the total volume and tree species-specific volumes by timber assortment. We considered both clear cuttings and thinnings. We also investigated the accuracy of determining the dominant tree species from ALS-based volume estimates since this information is of primary interest in harvesting and silviculture operations.

Study stands and volume calculation
Most of the stands were in southeastern Finland (Fig. 1). Stands from which Stora Enso Wood Supply Finland (WSF) purchased timber in 2013 or later were selected. About 50,000 stands were found, of which slightly more than half were thinning sites and less than half were clear-cutting sites. Forest resource information was available for slightly fewer than 4000 of these stands, but the information was not based on remote sensing in all cases.
The stand attribute information in Metsään.fi is based on the area-based interpretation approach of low pulse density (approximately 0.5 pulses m⁻²) ALS data and optical aerial image data (Finnish Forest Centre 2019a). Aerial images are used to interpret the main tree species of the stand. The metrics of remote sensing data are combined with accurately measured field training data. The estimation method varies between inventory areas, but the k-nearest-neighbor approach is usually used (Maltamo and Packalén 2014). This results in tree species-specific stand attributes, such as mean diameter, basal area, and Lorey's height. By using this information, together with theoretical diameter distribution models and tree height models (Siipilehto 1999), the diameter distributions and tree heights in different diameter classes are predicted. The calculation procedure produces tree-level data, making it possible to use tree-level models in the calculation of timber assortment volumes, quality deductions, and thinning removals, and to update the information with the help of individual tree growth models.
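The k-nearest-neighbor step described above can be illustrated with a minimal sketch. It is not the Finnish Forest Centre's implementation: the function name `knn_impute`, the choice of standardized Euclidean distance, the unweighted averaging of neighbors, and all numbers are assumptions made purely for illustration.

```python
import math


def knn_impute(train_metrics, train_attributes, target_metrics, k=3):
    """Predict stand attributes for target cells as the mean of the k most
    similar field plots (hypothetical, unweighted k-NN sketch)."""
    n_plots = len(train_metrics)
    n_features = len(train_metrics[0])
    n_attrs = len(train_attributes[0])

    # Standardize each ALS metric with the training mean and standard
    # deviation so that no single metric dominates the distance.
    means = [sum(row[j] for row in train_metrics) / n_plots
             for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((row[j] - means[j]) ** 2 for row in train_metrics) / n_plots
        stds.append(math.sqrt(var) or 1.0)  # guard against constant metrics

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    scaled_train = [scale(row) for row in train_metrics]

    predictions = []
    for cell in target_metrics:
        z = scale(cell)
        # Rank field plots by Euclidean distance in the scaled metric space.
        order = sorted(range(n_plots),
                       key=lambda i: math.dist(z, scaled_train[i]))
        neighbours = order[:k]
        predictions.append([
            sum(train_attributes[i][a] for i in neighbours) / len(neighbours)
            for a in range(n_attrs)
        ])
    return predictions


# Hypothetical field plots: ALS metrics (mean echo height in m, canopy cover)
plot_metrics = [[10.0, 0.50], [12.0, 0.60], [20.0, 0.90], [22.0, 0.95]]
# Hypothetical plot attributes: (volume in m3/ha, mean diameter in cm)
plot_attributes = [[100.0, 15.0], [120.0, 17.0], [300.0, 28.0], [320.0, 30.0]]
# Hypothetical grid cells to be predicted
cell_metrics = [[11.0, 0.55], [21.0, 0.92]]

cell_predictions = knn_impute(plot_metrics, plot_attributes, cell_metrics, k=2)
```

Operational systems typically use more ALS metrics, distance weighting, and species-specific responses; the sketch only shows the principle of generalizing field plot attributes to grid cells.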
The forest inventory information is updated annually using species-specific tree-level growth models (Hynynen et al. 2002). The need for thinning is evaluated based on stand basal area and dominant height, using the thinning instructions of Äijälä et al. (2019). If there is a need for thinning, i.e., the basal area is higher than the thinning limit of the instruction, a thinning is simulated using the predicted diameter distribution. Trees are removed from different diameter classes until the basal area is reduced to the recommended post-thinning level. By using predicted diameters and tree heights, species-specific tree volumes and timber assortments are calculated by the taper models of Laasasenaho (1982). The minimum over-bark top diameter of a sawlog or veneer log is 15 cm for pine, 16 cm for spruce, and 18 cm for deciduous trees, with a minimum piece length of 4 m for both conifers and deciduous trees. For pulpwood logs, the minimum top diameter is 7 cm and the minimum piece length is 2.7 m. Quality deductions of sawlog volumes are calculated using Mehtätalo's (2002) models with empirical municipal correction factors. The deduction depends on the fertility, latitude, altitude, and soil type (peat vs. mineral soil) of the site; the species, age, and dbh of the tree; and whether the tree is planted or of natural origin. Finally, the resulting stand attributes are calculated wall-to-wall over a continuous grid of cells, and stand results are calculated from the cell results.
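The dimension requirements listed above can be written as a simple decision rule. The sketch below is only an illustration of those thresholds: the function name `classify_piece` and the "waste" label are hypothetical, and the quality deductions of Mehtätalo (2002) are deliberately left out, so a piece labeled sawlog here meets the size limits only.

```python
# Minimum over-bark top diameters (cm) for sawlog or veneer log, as given in the text
SAWLOG_MIN_TOP_CM = {"pine": 15.0, "spruce": 16.0, "deciduous": 18.0}
SAWLOG_MIN_LENGTH_M = 4.0   # minimum sawlog piece length, all species groups
PULP_MIN_TOP_CM = 7.0       # minimum pulpwood top diameter
PULP_MIN_LENGTH_M = 2.7     # minimum pulpwood piece length


def classify_piece(species, top_diameter_cm, length_m):
    """Return the assortment a stem piece qualifies for by dimensions alone.

    Quality (e.g., decay, sweep, branchiness) is NOT considered here, which is
    exactly the part that the inventory has to predict with models.
    """
    if (top_diameter_cm >= SAWLOG_MIN_TOP_CM[species]
            and length_m >= SAWLOG_MIN_LENGTH_M):
        return "sawlog"
    if top_diameter_cm >= PULP_MIN_TOP_CM and length_m >= PULP_MIN_LENGTH_M:
        return "pulpwood"
    return "waste"  # hypothetical label for pieces below pulpwood dimensions


print(classify_piece("spruce", 16.0, 4.0))     # meets spruce sawlog limits
print(classify_piece("deciduous", 17.0, 4.3))  # too thin for deciduous sawlog
print(classify_piece("pine", 6.5, 3.0))        # below pulpwood top diameter
```

A piece that passes the dimension test may still end up as pulpwood in actual logging if its technical quality is insufficient, which is one source of the differences examined in this study.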
Metsään.fi information was retrieved at the latest date before logging when it was available. It was verified that the stands were similar in location and area to the harvested stands. Every stand was checked individually by comparing the stand borders in Stora Enso's map system and Metsään.fi database. The harvested stands selected for the study also had to be consistent with the stand level forest inventory data produced by the Finnish Forest Centre. Harvester measurements were analyzed case-by-case using the StanForD-based harvester files (Skogforsk 2007) to verify that no abnormal harvesting had occurred.
The harvester data included information on all merchantable stem pieces cut in the stand except waste pieces of decayed wood and treetops. The volumes of harvested trees were measured by the measurement system of the harvesters. In the cutting operations of the study stands, the sawlog lengths used were mainly 3.7-5.5 m with the increments of 0.3 m. Also, some shorter (3.1 or 3.4 m) and longer (5.8 and 6.1 m) log lengths were cut. The minimum top diameter of spruce logs was 16 cm, 15 cm with pine logs, and 17 cm with deciduous logs. The lengths of the pulpwood logs were 2.7-5.0 m. The minimum top diameters of spruce, pine, and deciduous pulpwood pieces were 7, 6, and 5 cm, respectively.
Separate comparisons were made for total, pulpwood, and sawlog removals. Comparison of timber assortments was made by the tree species: Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst.), and deciduous trees (mainly birch (Betula spp.)). All deciduous trees were combined into one class of sawlog and pulpwood removals. Conifer species were separated since they are the dominant tree species in Finland whereas deciduous trees typically form admixtures in conifer-dominated stands. For pine and spruce, sawlog and pulpwood removals were available directly from the harvester file. It was also verified that the logging had been carried out according to the plan. If this was not the case, the stand was excluded from the study.
After removing all inconsistencies, 82 clear cuttings were selected for the study. These clear cuttings comprised 121 stands with a total area of 148.3 ha. The total growing stock volume of the stands as measured by the harvester was 40,125.2 m³. Correspondingly, 79 thinning blocks were selected for the study. They consisted of 149 stands with a total area of 223.6 ha. The total volume removed in the thinning treatments, as measured by the harvester, was 16,609.8 m³. Of the 149 thinnings, 33 represented the first commercial thinning. The mean volumes of the study data are shown in Table 1.

Statistical analysis
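The root mean square error (RMSE) and bias (BIAS) between harvester data and Metsään.fi estimates were calculated for total volume, tree species-specific volumes, and timber assortment volumes (Eqs. 1 and 3). The corresponding relative RMSEs and biases were calculated by Eqs. 2 and 4. Finally, the correlation between the harvester data and Metsään.fi estimates was calculated using the Pearson product-moment correlation coefficient.

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}\left(y_{\mathrm{pred},i} - y_{\mathrm{obs},i}\right)^{2}}{N}} \tag{1}$$

$$\mathrm{RMSE\%} = 100 \times \frac{\mathrm{RMSE}}{\bar{y}_{\mathrm{obs}}} \tag{2}$$

$$\mathrm{Bias} = \frac{\sum_{i=1}^{N}\left(y_{\mathrm{pred},i} - y_{\mathrm{obs},i}\right)}{N} \tag{3}$$

$$\mathrm{Bias\%} = 100 \times \frac{\mathrm{Bias}}{\bar{y}_{\mathrm{obs}}} \tag{4}$$

where $y_{\mathrm{obs},i}$ is the observed value at logging site $i$, $y_{\mathrm{pred},i}$ is the predicted value at logging site $i$, $N$ is the number of stands, and $\bar{y}_{\mathrm{obs}}$ is the average of the observed volume. With this sign convention, a positive bias indicates that Metsään.fi overestimates the removal.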
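The accuracy statistics described above can be computed as in the following minimal sketch. It assumes that errors are taken as predicted minus observed, which is the convention implied by the reported results (a positive bias is interpreted as overestimation by Metsään.fi), and that the sums are divided by N. The function name `accuracy_stats` and the example values are hypothetical.

```python
import math


def accuracy_stats(observed, predicted):
    """RMSE, RMSE%, bias, bias% and Pearson r for paired stand-level values.

    observed:  removals measured by the harvester (e.g., m3/ha)
    predicted: corresponding inventory estimates
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 stands")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Assumed sign convention: predicted minus observed
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n

    # Pearson product-moment correlation coefficient
    sxy = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    syy = sum((p - mean_pred) ** 2 for p in predicted)
    pearson_r = sxy / math.sqrt(sxx * syy)

    return {
        "rmse": rmse,
        "rmse_pct": 100.0 * rmse / mean_obs,
        "bias": bias,
        "bias_pct": 100.0 * bias / mean_obs,
        "pearson_r": pearson_r,
    }


# Hypothetical removals (m3/ha) for four stands, not data from this study
harvester = [200.0, 300.0, 250.0, 250.0]
inventory = [220.0, 290.0, 270.0, 260.0]
stats = accuracy_stats(harvester, inventory)
```

For these made-up values the bias is 10 m³ ha⁻¹ (4% of the observed mean), i.e., a slight overestimate by the inventory.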
Determining the dominant tree species is important for the planning of forest management. Usually, it is a question of separating pine- and spruce-dominated stands since deciduous-dominated stands are rare. The dominant tree species of the stand was determined by summing up the tree species-specific removals, both with harvester data and Metsään.fi data. The tree species with the highest harvested volume was regarded as the dominant tree species in a clear-cutting stand. The overall accuracy of the identification of dominant tree species was analyzed using the Kappa coefficient:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the proportion of correctly classified observations and $p_e$ is the probability of a randomly correct classification. Landis and Koch (1977) interpreted Kappa coefficient values as follows: < 0 poor, 0-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect agreement.
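The dominant species rule and the Kappa coefficient can be sketched as follows. The stand data are invented for illustration and do not reproduce Table 5; the function names are hypothetical, and the agreement labels follow the original wording of Landis and Koch (1977).

```python
from collections import Counter


def dominant_species(volumes):
    """Species with the largest removal; volumes maps species -> m3."""
    return max(volumes, key=volumes.get)


def cohen_kappa(reference, predicted):
    """Kappa = (p_o - p_e) / (1 - p_e) for two equally long label sequences."""
    n = len(reference)
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement from the marginal class frequencies
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(kappa):
    """Verbal agreement category of Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical stand: removals by species (m3) -> dominant species
print(dominant_species({"pine": 40.0, "spruce": 180.5, "deciduous": 12.3}))

# Hypothetical dominant species for ten stands (harvester vs. inventory)
harvester_dom = ["spruce"] * 6 + ["pine"] * 4
inventory_dom = ["spruce"] * 5 + ["pine"] + ["pine"] * 3 + ["spruce"]
kappa = cohen_kappa(harvester_dom, inventory_dom)
print(round(kappa, 3), landis_koch(kappa))
```

In this toy example eight of ten stands agree, yet Kappa is clearly lower than the raw 80% agreement because part of that agreement would be expected by chance alone.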

Accuracy of ALS estimates in clear cutting
In clear cuttings, the scatterplot between the total measured removal by the harvester and the estimated removal of Metsään.fi showed a quite linear relationship (Fig. 2). The relative RMSE of the estimated removal of Metsään.fi was 26.0% (Table 2). The bias was about 4%, and the removal estimates of Metsään.fi were more often overestimates than underestimates. However, for very large (> 450 m³ ha⁻¹) removals, Metsään.fi produced considerable underestimates (Fig. 2).
Fig. 2 Scatterplot between the removal measured by the harvester (m³ ha⁻¹) and the estimated removal of Metsään.fi in clear cuttings
The clear-cutting stands were mostly spruce-dominated with more than two-thirds of the total removal being spruce (Table 3). In the case of spruce sawlog, there was a rather strong relationship between measured removal and the estimated removal in Metsään.fi (Fig. 3). However, the RMSE% was almost 50% and the bias was more than 20% (Table 4). The removals in Metsään.fi were overestimated, especially for spruce sawlog. The relationship was less pronounced for spruce pulpwood, for which the RMSE% was more than 50%. The bias was about −30%, Metsään.fi estimates being lower than the realized removals of spruce pulpwood. Less than a quarter of the total removal was pine sawlog or pulpwood (Table 3). The scatterplot between measured removal and Metsään.fi estimate showed a rather linear relationship for pine sawlog (Fig. 3). The relative RMSE was slightly less than 70% and the bias about 4% (Table 4). The correlation between harvester-measured removal and Metsään.fi estimate was the weakest for pine pulpwood after deciduous sawlog (Table 4). The RMSE% was over 100% for pine pulpwood and the bias was nearly −6%. Metsään.fi produced a slight systematic underestimate for pine pulpwood removal.
In the data, the amount of deciduous removal was the smallest, less than a tenth of the total harvest. The correlation between measured and estimated removals was the lowest for deciduous sawlog, for which the RMSE% was almost 170% and the bias about 14% (Table 4). With deciduous pulpwood, there was a significant correlation between the measured removal by the harvester and the estimated removal of Metsään.fi. The RMSE% was slightly below 100% and the bias for deciduous pulpwood removal was about −16%.
For all timber assortments, the correlation between removal measured by harvester and estimated removal of Metsään.fi was statistically significant at the 1% level. In general, sawlog removal was systematically overestimated and pulpwood volume was underestimated, most clearly in spruce. The harvest removals of the other tree species were small, and therefore, the absolute over- and underestimations were also small. The relative errors of the tree species-specific removals were considerably higher than those obtained for the total logging removal.
In clear cuttings, the dominant tree species was interpreted correctly in 87.7% of the stands. The Kappa value was 0.7, implying that there was considerable consistency between the materials (Table 5). According to the logging data, 54 of the 73 stands were spruce-dominated and 50 of them were spruce-dominated also in the Metsään.fi data. A closer inspection of the results revealed that the main tree species was classified incorrectly mainly in mixed stands. Determination of the dominant tree species was incorrect for nine stands (Table 5).

Accuracy of ALS estimates in thinning
In thinnings, the correlation between measured removal by the harvester and estimated removal based on ALS inventory was low (Fig. 4). The thinning removals of Metsään.fi were mostly between 50 and 100 m³ ha⁻¹ and included both over- and underestimates. The bias was −0.58% (Table 6). The correlation between removal measured by the harvester and estimated removal of Metsään.fi was considerably lower than in the case of clear cuttings but still statistically significant at the 5% level. The RMSE% was slightly above 40%. Many of the thinning stands were pine-dominated as about half of the actual removal was pine (Table 7).

Discussion
In clear cuttings, the relative RMSE of the removal estimate of Metsään.fi was 26%, which corresponds to an error of about 70 m³ ha⁻¹. The bias was about 4%, indicating that Metsään.fi slightly overestimated the removed volume. The results were less accurate than in studies where the estimate based on laser scanning was compared with the field measured value (e.g., Packalén and Maltamo 2007; Wallenius et al. 2012), most probably due to additional sources of errors such as mismatches between stand borders in Metsään.fi and the boundaries of harvest blocks. The acquisition date of the ALS data varied, and updating the data with growth models may have increased the differences between ALS-based and harvester data. Some other factors may also have caused errors. The waste pieces of decayed wood are not included in the harvester data (cf. Kärhä et al. 2019) although they are included in ALS-based estimates. The same trend has been observed also in earlier studies. For example, Hauglin et al. (2018) observed greater error rates when harvester data were used as training data for ALS-based volume prediction compared to field plot inventory-based training data.
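The link between the relative and absolute errors quoted in this discussion can be checked with simple arithmetic. The sketch below uses the totals reported for the study data (40,125.2 m³ on 148.3 ha in clear cuttings; 16,609.8 m³ on 223.6 ha in thinnings) and the relative RMSEs of 26.0% and 42.4%. It assumes that the area-weighted mean removal is a reasonable stand-in for the mean of the stand-level observations given in Table 1, which is an approximation; the function names are hypothetical.

```python
def mean_removal_per_ha(total_volume_m3, total_area_ha):
    """Area-weighted mean removal (m3/ha) from reported totals."""
    return total_volume_m3 / total_area_ha


def absolute_rmse(relative_rmse_percent, mean_m3_per_ha):
    """Convert a relative RMSE (%) into m3/ha using the observed mean."""
    return relative_rmse_percent / 100.0 * mean_m3_per_ha


# Totals and relative RMSEs as reported in the text
clear_mean = mean_removal_per_ha(40125.2, 148.3)
thinning_mean = mean_removal_per_ha(16609.8, 223.6)

clear_rmse = absolute_rmse(26.0, clear_mean)
thinning_rmse = absolute_rmse(42.4, thinning_mean)

print(f"clear cutting: mean {clear_mean:.1f} m3/ha, RMSE {clear_rmse:.1f} m3/ha")
print(f"thinning:      mean {thinning_mean:.1f} m3/ha, RMSE {thinning_rmse:.1f} m3/ha")
```

Under this approximation the absolute errors come out at roughly 70 m³ ha⁻¹ for clear cuttings and roughly 31 m³ ha⁻¹ for thinnings, in line with the rounded figures of about 70 and about 30 m³ ha⁻¹ given in the text.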
In our study, the RMSE% was acceptable when looking at the total harvested volume, and it seems that the Metsään.fi estimate is fairly good. In the case of thinnings, the RMSE% was slightly over 40%, corresponding to a deviation of about 30 m³ ha⁻¹ between actual and estimated removal. However, the correlation between the Metsään.fi estimate and true removal was low, suggesting that the thinning model applied in Metsään.fi does not produce results that reliably predict the removals of actual thinnings. The difference between the two estimates is only partly due to errors in Metsään.fi. Another reason for the differences is that the operator of the harvester may not follow the same thinning instructions as were used in Metsään.fi. Besides, the forest landowner (timber seller) and the procurement company may also have given thinning instructions that differ from those used in Metsään.fi. In the dataset used in this study, the removals of highly stocked stands were underestimated in Metsään.fi. This is an interesting result since, in contrast to optical imagery, ALS-based volume estimation should not underestimate large volumes (Maltamo et al. 2006). The number of observations in our study is nonetheless so small that these results cannot be generalized.
Fig. 3 Tree species-specific scatterplots between timber assortment removals harvested (measured by harvester) and Metsään.fi estimates in clear cuttings
The main shortcomings of ALS-based forest inventories are the low accuracy of tree species identification and the difficulty of predicting the technical quality of the stems of trees (e.g., Wallenius et al. 2012; Naesset 2014). Also in our study, the RMSEs were considerably higher for tree species-specific volumes than for the total volume. ALS-based sawlog volume was overestimated, most probably mainly due to insufficient technical quality of many of the harvested stems.
Another reason for the differences in timber assortment volumes is crosscutting, which might differ in ALS-based calculation and in actual cutting. The operator of the harvester often uses manual bucking with pine and automated algorithms for spruce (Kärhä et al. 2017). Crosscutting depends on the desired distribution of different log lengths, which is ignored in the ALS-based calculation of assortment volumes.
On the other hand, the dominant tree species was interpreted correctly in approximately 90% of the stands in clear cutting, which is a high percentage. Determining the dominant tree species is usually successful with aerial photographs but determining the secondary tree species can be very challenging (Maltamo and Packalén 2014).
The vast majority of the clear-cutting stands of this study were located in southern Finland where spruce bark beetle (Ips typographus) is causing damage in large areas (Heino and Pouttu 2013). As a consequence of bark beetle damage, a part of log-sized spruce trees may have turned into energy wood or pulpwood. Spruce root and butt rot (Heterobasidion parviporum) is also common in the study area (Piri et al. 2017). In the case of spruce, the transition from sawlog to pulpwood was large, but it varied by region. However, the sawlog volume of the other tree species was also commonly overestimated in Metsään.fi data. Therefore, field measurements are required for all species to assess the quality properties of trees (Barth and Holmgren 2013). Wider use of harvesting data makes it possible to avoid high-cost field inventories.
In the case of thinning, larger deviations from Metsään.fi data are most likely partly due to the thinning models. Thus, based on the Metsään.fi estimate, it is not possible to realistically predict the amount of trees that are removed in actual thinnings. In Metsään.fi, determination of the dominant tree species for thinnings is not unambiguous, because some species not suitable for the site may have been removed completely in the felling. On average, removals were underestimated more in thinning than clear cutting, but the bias was very low in both cases. A slight underestimation may indicate that the actual thinning is implemented slightly more intensively than suggested by the thinning models (cf. Finnish Forest Centre 2019b).
Fig. 4 Scatterplot between the removal measured by the harvester (m³ ha⁻¹) and the estimated removal of Metsään.fi in thinnings
The study provided new information on the accuracy of Metsään.fi data in operational use. In conventional timber trade, remote sensing-based information still requires field verification but is useful for the purposes of timber purchasing. For example, when searching for stands with potential harvest removal, the Metsään.fi service provides useful assistance and acts as a starting point for timber trade negotiations. Future studies should develop and identify better methods to predict tree quality based on laser scanning information. Harvester data could be used in the development of these methods (Barth and Holmgren 2013).
An important topic for further research and development is how to obtain more accurate information on the quality characteristics of stands by using remote sensing data. Possibilities for wider use of harvester data should also be explored. Experiences on the use of harvester measurements as a reference for laser scanning data have been promising (Maltamo et al. 2019). However, the positioning accuracy of the data should be improved (cf. Hauglin et al. 2018). Multisource single-tree interpretation is also an interesting possibility for the future. It may combine harvester data with low-resolution remote sensing data, for example (Kankare et al. 2015).

Conclusions
The study compared, for the first time, Metsään.fi forest inventory data with harvester data from actual logging operations. It was found that Metsään.fi overestimated sawlog removals, especially the removal of spruce sawlogs in clear cuttings. Determination of the dominant tree species was accurate. The Metsään.fi estimate does not realistically describe the amount of trees that are removed in actual thinnings. Further development of remote sensing-based forest inventory could rely more on data collected by harvesters, to decrease the costs of field inventories.