INFLUENCE OF FLIGHT CONFIGURATION USED FOR LiDAR DATA COLLECTION ON INDIVIDUAL TREES DATA EXTRACTION IN FOREST PLANTATIONS

In the last decades, several studies have been conducted aiming to the extraction of forest variables from LiDAR data. Although such studies have indicated great potential, the high cost associated with LiDAR data acquisition process inhibits the technology to become an operational technique for forestry applications. The cost of a LiDAR survey, as for any other data collection techniques, is composed of fixed and variable costs. The variable portion, which can be optimized, is dependent, among other factors, on the number of flight hours. The flight time is mainly dependent on the flight configuration used for the survey. The objective of this paper is to investigate the impact of using different operational parameters on different species of forest plantations, to provide inputs for an adequate cost-benefit analysis. The different configurations are evaluated in terms of the number of individual trees automatically detected, individual height and volume, using the forest inventory as the reference data. The experiments have shown that compatible results are obtained using different configurations with flight time varying by a factor of 3.5 to 10 times. Also, for a given point density, preference should be given to the configuration based on a lower flying height.


INTRODUCTION
In the last decades, a growing interest in the use of data obtained through the Light Detection And Ranging (LiDAR) technology has been observed in forestry applications, due to the possibility to obtain three-dimensional information in a direct and accurate way, of both canopies and ground.Various studies have demonstrated the potentiality of LiDAR data collection to obtain important forest parameters (NAESSET, 1997;MEANS et al., 2000;Persson et al., 2002;Popescu et al., 2002;ZONETE, 2009).
LiDAR instruments are active remote sensory systems based on Light Amplification by Stimulated Emission of Radiation (LASER) technology, built to obtain in a direct way the three-dimensional coordinates of a large quantity of mapped points on the ground.The basic principle of operation in such systems is the measurement of distance from the sensor to the mapped point on the ground.Georeferencing of the mapped point is allowed by the positioning and orientation system, composed by an Inertial Navigation System (INS) integrated with a Global Navigation Satellite System (GNSS).
Among the studies developed using the LiDAR technology to obtain forest variables, two main approaches can be observed.The first approach aims to obtain the forest variables at a sample plot level, using metrics obtained from the LiDAR point cloud (NELSON et al., 1988;NAESSET, 1997;MEANS et al., 2000;MALTAMO et al., 2006).Metrics obtained from LiDAR point cloud are usually data of the canopy height distribution such as different height percentiles, average height, standard deviation, maximum height, among others.To create the models relating the variables of interest and the metrics obtained with LiDAR, plots are established in the field to obtain the forest variables.A second approach described in literature aims to obtain the forest variables using data of individual trees obtained from LiDAR data (Hyyppä;Inkinen, 1999;Persson et al., 2002;Popescu et al., 2002;Leckie et al., 2003).The last approach is therefore more dependent on a denser point cloud, since the identification of individual trees should be possible.One should note that the minimum density and configurations required for different species is still topic of research.
The density and accuracy of the point cloud obtained from a LiDAR system, as well as the survey cost, depend on a set of parameters that can be set up during the mission planning and/or by the system operator.These parameters include: (I) flight height, (ii) scanning angle, (iii) aircraft speed, (iv) LASER pulse frequency, (v) scanning frequency and (vi) laser beam divergence.Flight height and scanning angle have significant influence not only on the density of collected points but also on the survey costs.The smaller the resulting scanned strip width (through reduction of the flight height and/or of the scanning angle), the larger is the collected point density.In this case, the number of strips required to cover the studied area is larger, thus increasing the cost of the survey.Scanning frequency and platform speed are generally defined with the purpose of ensuring a regular distribution of points on the ground.In other words, they are defined so that the space between points collected along flight direction is as similar as possible to the space between points collected perpendicularly to the flight direction.The laser beam divergence angle of commercial systems is typically 0.2 mrad.This parameter together with the flight height are the main factors affecting the size of the LASER footprint on the ground.The pulse repetition frequency expresses the number of pulses emitted per second.This is the most important parameter for defining the efficiency of a given equipment.Systems are always set up to operate at its maximum pulse repetition frequency.
The investigation of the influence of the utilized flight configuration parameters on the quality of the extracted forest variables is of extreme importance.According to the opinion of Brazilian forest companies, one of the biggest obstacles to the operational use of LiDAR technology in forestry applications is the high cost associated.One way of reducing the cost would be the use of a configuration allowing for a smaller number of strips to cover the study area.Although investigations on that matter have not been conducted in Brazil yet, authors of the international scientific community have been carrying out investigations concerning the influence of the flight configuration parameters on the quality of the extracted forest variables.However, the number of studies is small and they often use simulated data and/or do not consider the real operational conditions (YU et al., 2004;LOVELL et al., 2005;GOODWIN et al., 2006).
The objective of this paper is to analyze the influence of different flight configurations used for the extraction of forest parameters, based on the approach at the individual tree level for different species of forest plantations in the intrinsic conditions of the Brazilian reality.The different configurations are evaluated in terms of number of trees automatically detected, individual height and volume, while having the forest inventory data as reference.

Characteristics of the study area
The study area belongs to KLABIN S/A company.It is located in the municipality of Telêmaco Borba (Figure 1), geographically located in the eastern-central region of the state of Parana, characterized by a Cfa climate type (humid subtropical).It is considered a regional industrial center, with the wood industry and paper production being the most important job creators and tax revenue providers for the municipality.

LiDAR data Collection
LiDAR data acquisition was performed by the Institute of Technology for Development (LACTEC), using an Airborne Laser Scanner equipment type ALTM 2050, from the Canadian manufacturer Optech inc.This equipment has a pulse repetition frequency of 50 kHz, variable scanning angle from 0 to ±20º and operational altitude from 200 to 2,000 m (nominal).This system has the ability of recording the first and last return of each emitted LASER pulse.The vertical precision of the point cloud, as specified by the manufacturer, is better than 15 cm for a 1,200 m flight height and better than 25 cm for a 2,000 m flight height (considering 1σ).The horizontal precision is defined by 1/2000 x flight height (1σ).
The utilized flight configurations are described in table 2. The influence of different flight heights and scanning angles was investigated.Table 2 also presents the number of strips and the flight duration for each of the studied configurations, which is proportional to the cost of survey.The flight survey for the raw data collection was performed in the period from August 14 th to 28 th , 2009.After the data collection, the basic data processing was carried out to obtain the threedimensional coordinates (E, N, Z) for the first and last return of each LASER pulse emitted.Basic processing involves estimation of the trajectory to determine the position and orientation of the sensor and the combination of these with the instantaneous scanner angle, the range, and the system calibration parameters, through the LiDAR point positioning equation to obtain the three-dimensional coordinates of the points.The PosPac software (Applanix) was used for trajectory processing and Realm Survey Suite software (Optech) was used for point cloud processing.

Field Data Collection
The forest inventory of the studied stands was performed by the company KLABIN S/A, who provided the data for this research work.Field data were collected in the period from November 24 th to December 10 th , 2009.Rectangular plots were installed in the field, with a size of 25 x 12.5 m (area of 312.5 m 2 ), containing approximately 50 trees each.In the A stand (Pinus taeda -2004) 14 plots were installed, in the B stand (hybrid Eucalyptus -2006) 27 plots, while in the C stand (Pinus taeda -1997) 19 plots were installed.Circumference at breast height (CBH) and height of the first 10 trees of each plot were measured.A canvas tape was used to measure the CBH.Tree height was measured with the help of a laser distance meter and a Suunto hypsometer.To obtain the height of the remaining trees in the plot, a hypsometric equation (in the form shown in equation 1), which was adjusted using height and diameter at breast height (DBH) (CBH measured and converted into DBH) of the measured trees, was utilized.These equations were provided by KLABIN S/A together with the biometric equations (in the form shown in equation 2).The coefficients of the hypsometric and biometric equations for the three stands are shown in table 3 Where: h : height of the tree;  , and 2  : coefficients of the model.
A topographic field survey was carried out in order to georeference the inventory data to allow for its correspondence to the LiDAR data.Three trees positioned on the corners of each plot had their coordinates determined.The topographic survey was performed using closed traverses, oriented by the points surveyed with a dual-frequency Global Positioning System (GPS) and a total station.

LiDAR point cloud classification
For the classification of the points on the ground, the progressive densification algorithm (AXELSSON, 2000), which is available in the TerraScan software supplied by Terrasolid, was utilized.In this method, the points on the ground are classified by iteratively building a triangulated surface model.The initial model is obtained from the selection of some local low points that for sure hit on the ground.Iteration parameters are used to add further points to the initial model.This algorithm has been widely used by the scientific community (KAARTINEN et al. 2012;YU et al., 2004) as well as in commercial applications with satisfactory results.
It is worth to mention that the quality of the determined tree heights and, as a consequence, their volume are affected by the accuracy of the terrain model.The reason is that the tree height information is obtained by subtracting the elevation of its position on the ground (i.e., projection on the ground of the highest point on the tree) from the elevation of the highest point on the tree.

Automated tree identification method
First return data were used for the automated identification of trees, since these returns are usually originated from the canopies.Prior to the detection of trees, the height (above the ground level) of the point cloud originated by the first return was determined.For this purpose, a Digital Terrain Model (DTM) was first generated from the points classified as ground, using the Delaunay triangulation known as Triangulated Irregular Network (TIN).Next, points from the first return (vector data) were projected on the DTM (TIN model) to obtain the corresponding terrain elevation.Then, the height of each first return point was obtained by subtracting the corresponding terrain elevation from the elevation of the investigated point.
For the automated identification of trees, a local maximum algorithm, for a defined neighborhood, was utilized.Such algorithm has been widely used in previous literature (KAARTINEN;HYYPPÄ, 2008;KAARTINEN et al., 2012).In this algorithm, each point is analyzed with respect to its neighbors.If the analyzed point is higher than all the points within the defined neighborhood, this point is identified as the tree top.The tree height is established by the height of the point defined as its top.The neighborhood of the point in analysis can be defined by a fixed or by a variable search radius.The variable search radius is used when the canopy diameter presents correlation with the height of the tree.In this case, an equation relating to height of the tree and diameter of its canopy is previously adjusted using manually selected samples.To reduce commission errors (detection of nonexistent trees) caused by noise in the data, an interpolated and smoothed raster height model, named Digital Canopy Model (DCM), was used in place of the original point cloud.To do so, first a TIN was created from the points of the first return using the height information to generate the model.Then, this model was converted to a regular grid (raster), with cell size of 20 cm (resolution compatible with the highest data density available).To smooth the model, a low-pass filter (3x3), shown in figure 2, was used.
Although the use of a smoothed model is more appropriate for the identification of trees, the height value might be underestimated.To overcome this problem, at the location of each of the identified trees, the maximum height value was searched in the original point cloud, within the search radius, to redefine the height value of each of the detected trees.

Evaluation of flight configuration influence
The influence of the flight configuration was evaluated in terms of the number of trees automatically detected as well as in terms of the obtained values for the individual tree height and volume/ha, using inventory data as reference.
To evaluate the quality of the results from the automated tree detection procedure, the percentage of omission errors (equation 3), the percentage of commission errors (equation 4) as well as the percentage of correctness (equation 5) were calculated.(5) To evaluate the quality of the determined height for individual trees, the height values obtained from LiDAR data were compared against the inventory tree height data, which have been obtained as previously described.For that purpose, the mean error, the standard deviation and the root mean squared error (RMSE) were calculated through equations 6, 7, and 8, respectively.N : number of trees.
To evaluate the combined influence of the quality of results from the automated tree identification procedure and the individual heights derived from LiDAR data, the individual volume was calculated using the volume equation provided by KLABIN S/A for each forest species investigated in this research.Volumes determined using the variables extracted from LiDAR data were compared to the volumes from the forest inventory.The volumetric error shown in equation 9 was calculated for this purpose.

RESULTS
Using the previously described methodology, the trees were automatically detected and individual tree heights and volumes were obtained for the three studied stands.During the experiments' implementation it noticed the presence of wind during the data collection for configurations II and III, in the B stand (hybrid Eucalyptus).
Besides causing reduction in the height of some trees and modifying the relative position between the tops of the trees in the stand, the presence of wind during the data acquisition also causes discrepancies between data obtained from different flight strips.This is due to the fact that the intensity and/or direction of the wind is different during the collection of each one of the strips, resulting in different tree position from one strip to another.Figure 3a shows an example of plots in the overlapping region of the B stand.One can note that the height's model generated in the overlapping region is quite noisy (Figure 3b), a condition that harms an adequate identification of the trees.To mitigate this problem, the overlaps were removed, leaving only data from one of the strips in the overlapping region (Figure 3c).This procedure was carried out for all the configurations and stands, in order to ensure adequate comparison between results from the various configurations using different approaches in the different stands.In the next sections, the results obtained for the stands A, B and C are presented.

Results of A Stand (Pinus taeda -2004)
Table 4 shows the results obtained from the automated tree identification procedure and the results from the comparison between the tree heights and volumes derived from the LiDAR data and the ones from the inventory data.In this table, the various configurations were listed in descending order of the mean density of the first return points.
Considering the eight studied configurations, it is possible to verify that, for the A stand, configurations I, II and III presented correct tree identification results above 88%, RMSE (%) of the individual height lower than 6.8% and absolute volume error lower than 13%.Configuration I had a flight duration of approximately 200 min and configuration III had a flight duration of 52 minutes, almost 4 times lower.
Configurations III and IV resulted in very similar density values, obtained using different flight heights and scanning angles.The same occurred for the configurations IV and VI, which produced similar point cloud densities obtained from different flight heights and scanning angles.In these cases, although the point cloud densities obtained from a flight height of 2,000 m (configurations V and VI) were a little higher than the ones obtained from the 700 m flight height (configurations III and IV), the performance of the last ones was better.Overall, it is possible to notice that better results (in terms of automated tree identification, individual height and volume) are obtained when increasing the point cloud density.Table 4. Results of the automated tree identification process and the estimates for the individual tree height and volume for the A stand using the inventory data as reference.Tabela 4. Resultados da identificação automática de árvores e das estimativas de altura individual e volume para o talhão A tendo como referência os dados do inventário.

Results of B Stand (hybrid Eucalyptus -2006)
Table 5 shows the results of the automated tree identification procedure and the estimates of individual height and volume for the different configurations in the B stand.Here again, in table 5 the various configurations were listed in descending order of the mean density of the first return points.
One can observe in table 5 that configurations I and V presented results for overall correct tree identification above 90%, individual height estimates with RMSE (%) lower than 3% and absolute volume error lower than 6.5%.One should note that configuration I has a flight duration of approximately 200 min, configuration IV a flight duration of approximately 21 min, and configuration V a flight duration of approximately 59 min.The ratio between these different configurations, considering the flight duration factor, varies from 3.5 to 10.
It is also possible to notice in table 5 that configurations III and V, which were obtained using different flight heights and scanning angles, resulted in similar density values.In the same way, configurations IV and VI also produced point clouds with similar densities.It can be noted that, differently from what happened in the A stand, in this case the configuration V resulted in a better performance than configuration III, which was obtained using a lower flight height.This can be explained by the presence of wind during the data collection using configurations II and III in the B stand, which degraded the performance of configuration III.When comparing configurations IV and VI, similar to what happened in the A stand, configuration IV (obtained from a lower flight height) was the one that presented better performance.Overall, one can note that better results (in terms of automated tree identification, individual height and volume) are obtained when increasing the point cloud density.

Results of B Stand (Pinus taeda -1997)
Table 6 shows the results obtained using the different configurations in the C stand.Results are again listed depending on the mean density of points from first return.Configuration I, which presented the best results, showed 81.2% of correct tree identification, RMSE (%) of 9.86% for individual height estimates and 12.33% of absolute error in the volume estimates.For comparison, configuration III provided 77.3% of correct tree identification, RMSE(%) of 10.26% for individual height estimates and 17.10% of absolute error in the volume estimates.It is clear that, with exception of volume, the other analyzed variables presented similar results.It should be noted that the ratio between the flight durations of these two configurations is 4. Table 6.Results of the automatic tree identification process and the estimates for the individual tree height and volume for the C stand using the inventory data as reference.Tabela 6. Resultados da identificação automática de árvores e das estimativas de altura individual e volume para o talhão C tendo como referência os dados do inventário.

DISCUSSION
Figures 4, 5 and 6 show the performance in terms of tree identification, individual height, and volume, respectively, using the different configurations in the three investigated stands.In the automated tree identification, data processing of the B stand (hybrid Eucalyptus) was the one that gave the best performance followed by the A stand (Pinus taeda -2004) and the C stand (Pinus taeda -1997).The superior performance for hybrid Eucalyptus, when compared to Pinus taeda, can be explained by the uniform height of the trees and to the canopy shape observed in this stand.The improved performance of the younger Pinus taeda is mainly due to the larger space between the canopies (canopies are still far from each other), which makes the identification through the utilized algorithm much easier.In the case of individual heights, the hybrid Eucalyptus presented again better performance than the Pinus due to the smaller variation of canopy heights from the center top to extremities.When comparing the different ages of Pinus, the younger presented in most cases worse performances due to the lower probability of the laser pulse hitting their top point.For volume (combined influence of number of trees and individual height), the best results were obtained for hybrid Eucalyptus, since it presented better performance both in terms of automatic tree identification and individual height.When comparing the results from stands with Pinus at different ages, the younger presented superior performance (except for configuration VIII), because the automated tree identification provided better results, which plays a stronger impact than height accuracy on the volume estimation.

CONCLUSIONS
In this paper, the influence of different flight configurations on the extraction of forest parameters was evaluated, at the individual tree level, for different species of forest plantations.More specifically, results were analyzed in terms of the quality of the results from the automated tree identification procedure as well as individual tree height and volume estimation, using the forest inventory as reference data.The general objective of this study was to provide inputs for a cost-benefit analysis in order to minimize costs related to LiDAR survey, through the identification of the operational parameters, which are able to produce acceptable results for a given LiDAR system and tree species.
A general analysis of the configurations studied for the species Pinus taeda shows that it is possible to obtain satisfactory results using configuration III (flight height 700 m and scanning angle 11°), which requires 4 times less flight time than configuration I.For the species hybrid Eucalyptus, configuration V (flight height 2.000 m and scanning angle 3,5°) presented comparable results to the more conservative configurations which generate larger point cloud densities.One should note that configuration V requires a flight time 10 times smaller than configuration I.
The analysis in terms of point density has verified that, in the hybrid Eucalyptus stand, densities of 6.7 ppm 2 in the configuration I and of 1.45 ppm 2 in the configuration IV produced very similar results.Similarly, for Pinus taeda (2004), densities of 6.7 ppm 2 in configuration I, and of 2.1 ppm 2 for configuration III produced compatible results.For Pinus taeda (1997), the best results were obtained with densities between 7.1 ppm 2 in configuration I and 2.2 ppm 2 in configuration III.In the comparative analysis of the results obtained from the different studied species it could be noted that hybrid Eucalyptus had the best results, due to a more uniform height of the trees and the canopy shape when compared to Pinus.These characteristics led to a better performance in terms of automated tree identification and individual height estimation.Within the different ages of Pinus investigated, results indicated a better performance for the younger Pinus plantation.This improved performed can be mainly attributed to the larger space between the canopies, since this characteristic enhances the performance of the automated tree identification algorithm.

Figure 1 .
Figure 1.Study area location.Figura 1. Localização da área de estudo.Data collected from three stands, planted with Pinus taeda L. and hybrid Eucalyptus (Eucalyptus grandis W. Hillex Maiden x Eucalyptus urophylla S. T. BLAKE), were used in this work.The main characteristics of the stands are shown in table 1.

:
tree height obtained from LiDAR data; inventory Height : tree height obtained from forest inventory;

Figure 4 .
Figure 4. Correctness (%) in the automatic tree identification for the stands A, B e C.Figura 4. Acerto (%) na identificação automática de árvores para os talhões A, B e C.
Figure 4. Correctness (%) in the automatic tree identification for the stands A, B e C.Figura 4. Acerto (%) na identificação automática de árvores para os talhões A, B e C.

Figure 5 .
Figure 5. RMSE (%) of the individual tree height for the stands A, B e C. Figura 5. REMQ (%) da altura individual para os talhões A, B e C.

Figure 6 .
Figure 6.Absolute error (%) in the volume for stands A, B, and C.Figure 6. Erro absoluto (%) no volume para os talhões A, B e C.

Figure 6 .
Figure 6.Absolute error (%) in the volume for stands A, B, and C.Figure 6. Erro absoluto (%) no volume para os talhões A, B e C.

Table 3 .
Coefficients of the hypsometric and biometric equations provided by KLABIN S/A for the investigated stands.Tabela 3. Coeficientes das equações hipsométricas e biométricas disponibilizados pela KLABIN S/A para os talhões estudados.

Table 5 .
Results of the automated tree identification process and the estimates for the individual tree height and volume for the B stand using the inventory data as reference.Tabela 5. Resultados da identificação automática de árvores e das estimativas de altura individual e volume para o talhão B tendo como referência os dados do inventário.