Tree species classification in a typical natural secondary forest using UAV-borne LiDAR and hyperspectral data

ABSTRACT Recent growth in unmanned aerial vehicle (UAV) technology has promoted the detailed mapping of individual tree species. However, in-depth mining and comprehension of the significance of features derived from high-resolution UAV data for tree species discrimination remains a difficult task. In this study, a state-of-the-art approach combining UAV-borne light detection and ranging (LiDAR) and hyperspectral data was used to classify 11 common tree species in a typical natural secondary forest in Northeast China. First, a comprehensive set of relevant structural and spectral features was extracted. Then, the most valuable feature sets were selected using a hybrid approach combining correlation-based feature selection with an optimized recursive feature elimination algorithm. The random forest algorithm was used to assess feature importance and perform the classification. Finally, the robustness of features derived from point clouds with different structures and from hyperspectral images with different spatial resolutions was tested. Our results showed that the best classification accuracy was obtained by combining LiDAR and hyperspectral data (75.7%), compared to LiDAR (60.0%) and hyperspectral (64.8%) data alone. The mean intensity of single returns and the visible atmospherically resistant index for the red-edge band were the most influential LiDAR- and hyperspectral-derived features, respectively. The selected features were robust for point clouds with a density not lower than 5% (~5 pts/m²) of the original and for hyperspectral data with a resolution not coarser than 0.3 m. Although canopy surface features differed slightly from the original LiDAR features, canopy surface information was also important for tree species classification. This study proved the capabilities of UAV-borne LiDAR and hyperspectral data for tree species discrimination in natural secondary forests and the potential for this approach to be transferable to other study areas.


Introduction
Tree species information is fundamental to our understanding of terrestrial ecosystems (Cazzolla Gatti et al. 2022). Detailed tree species composition and spatial distribution are essential for many applications, including forest resource dynamic monitoring (van Aardt and Wynne 2007), sustainable forest management (Dalponte, Bruzzone, and Gianelle 2012), ecosystem service quantification (Boerema et al. 2017), and biodiversity assessment (Piiroinen et al. 2018). Tree species can also be used as an input for species-specific tree allometry models (Ørka et al. 2013), which is important for calculating vegetation growing stock volume, biomass or carbon storage (Kirby and Potvin 2007; Korpela and Erkki Tokola 2006). Remote sensing-assisted tree species discrimination has been in development for nearly four decades (Fassnacht et al. 2016). It is cost-efficient and can rapidly map the distribution of tree species over different spatial scales compared to traditional field surveys (Masek et al. 2015). However, most remote sensors are carried on satellites or manned aircraft, and it is difficult to balance spectral and spatial resolution (Goodbody et al. 2017).
In recent years, flexible, convenient and affordable unmanned aerial vehicle (UAV) platforms have gradually emerged and have shown great capability for small-scale forest inventories (Colomina and Molina 2014; Hao et al. 2022). By collecting data of very high spatial or spectral resolution, UAVs increase the opportunities for tree species classification (Zhang, Zhao, and Zhang 2020). Many small, lightweight sensors carried on UAVs, such as RGB cameras, multispectral/hyperspectral imagers and laser scanners, have been used for tree species discrimination (Schiefer et al. 2020; Zhong et al. 2020). Each of these sensors has its strengths; for example, hyperspectral data usually comprise hundreds of narrow contiguous bands and can acquire more refined spectral responses (Schaepman-Strub et al. 2006). Some advanced UAV-borne hyperspectral systems that account for both spectral and spatial resolution have emerged for tree species discrimination (Adão et al. 2017). Light detection and ranging (LiDAR) data can describe the three-dimensional architecture of the forest canopy, with important implications for tree species mapping (Coops et al. 2007). Research in the past decade has shown that the advantage of combining hyperspectral and LiDAR datasets is that both the spectral reflectance and spatial structure characteristics of tree species can be well represented (Dian et al. 2016; Zhongya et al. 2016). The state-of-the-art approach is to integrate UAV-borne hyperspectral imaging (UHSI) and laser scanning (ULS) data to achieve a double boost in spatial and spectral resolution, so that more detailed background signals and canopy characterizations can be obtained in individual tree-based classification (Schiefer et al. 2020).
Revolutionary breakthroughs in data resolution for near-ground UAV systems are a double-edged sword. On the one hand, these very high-resolution measurements can capture rich information about tree canopies. For example, different wavelengths of hyperspectral data can reflect vegetation: 1) photosynthetic pigment content (e.g. chlorophyll, carotenoids and anthocyanins) (Clark and Roberts 2012), 2) water content (Asner 1998), and 3) leaf structure/morphology (e.g. leaf area index, thickness of cell walls) (Fricker et al. 2015). High spatial resolution imagery can reflect crown texture information, such as the roughness of the crown surface, the size and arrangement of branches and leaves, and the shadows inside the crown (Fassnacht et al. 2016). High-density ULS data can almost completely reconstruct the 3D structure of trees and further reflect the morphology of branches and the distribution of foliage within a crown (Yifang et al. 2018; Michałowska and Rapiński 2021). The intensity information is also related to leaf type, area, orientation, clustering and gaps (Korpela et al. 2010). On the other hand, feature mining is by no means easy. Although numerous hyperspectral- and LiDAR-derived variables have been proposed to describe characteristics relevant to tree species classification (Ruiliang 2021), previous studies often considered only a few of these variables and lacked a comprehensive understanding of the characteristics of the data.
Moreover, the massive feature sets brought by high-resolution UAV data usually cause redundancy and multicollinearity, which can complicate classification (Yifang et al. 2018). For feature dimensionality reduction, many studies directly used the variable importance rankings of a decision tree model (Wai Tim et al. 2017; Zhong et al. 2020). However, this ranking does not account for collinearity between variables, resulting in pseudo variable importance (Wang et al. 2022). Some studies used correlation-based methods (Liu et al. 2017), stepwise variable selection (Wang et al. 2022), or a combination of the two (Zhou et al. 2022) to address this issue. But a more efficient variable selection procedure deserves further consideration to increase our understanding of what specifically drives tree species discrimination. In addition, UAV-derived features are affected by many factors, such as sensor performance, system errors and flight parameters (Rana et al. 2022), making them difficult to transfer directly to other study areas, which greatly limits the application of UAVs.
Natural secondary forests occupy approximately 46.2% of China's forests (Zhang, Dong, and Liu 2020). They are characterized by high tree species diversity and high canopy closure, and are dominated by broadleaved trees. Therefore, tree species classification in natural secondary forests remains a great challenge. Recent studies have explored the combination of ULS and UHSI for tree species discrimination under different forest conditions, such as urban forest parks (Hartling, Sagan, and Maimaitijiang 2021), subtropical broadleaf forests (Qin et al. 2022) and mangrove forests (Cao et al. 2021). However, tree species classification in natural secondary forests has rarely been explored.
Therefore, given the above-mentioned problems, this study explored the utilization of a hybrid feature selection procedure for tree species classification in typical natural secondary forests by combining ULS and UHSI data.The specific goals of the study were as follows: (1) to generate detailed geometric and radiometric variables from ULS data and spectral and texture variables from UHSI, (2) to select the most valuable ULS and UHSI feature sets and combine these features for tree species discrimination, and (3) to assess and analyze the robustness of these selected features under a simulation application.

Study region and field measurements
The typical natural secondary forest situated in Maoershan Forest Farm, Shangzhi, Heilongjiang Province, Northeast China (127°29′-127°44′ E, 45°14′-45°29′ N) was selected as the study region (Figure 1). It is dominated by broad-leaved forest accompanied by a small number of coniferous plantations, such as Korean pine and larch. In this study, a total of 7 sites covering an area of 9 hectares were established, and each site contained 3 evenly distributed square sample plots with a width of 30 m.
Field data were collected during August 2021 (leaf-on season). They mainly included tree species and location information for trees with a diameter at breast height (DBH) greater than 5 cm. This study used a Qianxun SR3 Pro network real-time kinematic (RTK) receiver to measure the tree trunk positions and the four corner positions of each plot with centimeter-level accuracy.
The coordinates of trees relative to the plot boundaries were also recorded with a hand-held laser rangefinder and used as supplementary information where GNSS single-tree positioning was poor. The root mean square errors between the RTK-positioned sample trees and their georeferenced relative coordinates were about 0.3-0.5 m. In total, 11 main tree species were recorded, which can be divided into two types: 1) broadleaf species, including Mono maple (Acer mono), White birch (Betula platyphylla), Walnut (Juglans mandshurica), Manchurian ash (Fraxinus mandshurica), Cork tree (Phellodendron amurense), Dahurian poplar (Populus davidiana), Mongolian oak (Quercus mongolica), Basswood (Tilia) and Elm (Ulmus); and 2) conifer species, including Larch (Larix olgensis) and Korean pine (Pinus koraiensis).

ULS data
ULS data for all sites were collected from September 4-6, 2021, using a Riegl VUX-1UAV laser scanner mounted on a DJI Matrice 600 Pro UAV. The flight altitude of the ULS system was between 120 and 370 m above ground level due to the undulating terrain in the survey area, and the flight speed was 10 m/s. All flights were designed as crossing routes with 60 m spacing to guarantee point cloud quality. The scanning angle was restricted to within ±45°. The average point density was approximately 320-360 pulses/m². For each site, the LiDAR data were preprocessed; the main steps included point cloud denoising, filtering, rasterization, and height normalization. First, ground points were filtered and digital terrain models (DTMs) were generated (Guo et al. 2010). Subsequently, the corresponding DTM elevations were subtracted from the Z-values of the original point clouds to obtain heights relative to the ground, and canopy surface points (CSPs) were extracted using a graph-based progressive morphological filtering (GPMF) algorithm (Hao et al. 2019). Finally, pit-free canopy height models (CHMs) with 0.1 m spatial resolution were interpolated from the CSPs (Quan et al. 2021). Data preprocessing was performed with LiDAR360 V4.0 software and MATLAB R2021a.
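The height-normalization step above (subtracting the ground elevation under each return from its Z value) can be sketched as follows; this is a minimal illustration with a simple grid lookup into the DTM raster, not the LiDAR360 implementation, and the function name and grid conventions are our own assumptions:

```python
import numpy as np

def normalize_heights(points, dtm, origin, cell_size):
    """Subtract the DTM elevation beneath each return from its Z value.

    points    : (N, 3) array of x, y, z coordinates
    dtm       : 2D array of ground elevations (row 0 at the minimum y here)
    origin    : (x_min, y_min) of the DTM grid
    cell_size : DTM resolution in metres
    """
    cols = ((points[:, 0] - origin[0]) / cell_size).astype(int)
    rows = ((points[:, 1] - origin[1]) / cell_size).astype(int)
    rows = np.clip(rows, 0, dtm.shape[0] - 1)   # keep indices inside the raster
    cols = np.clip(cols, 0, dtm.shape[1] - 1)
    normalized = points.copy()
    normalized[:, 2] = points[:, 2] - dtm[rows, cols]
    return normalized

# Flat ground at 100 m elevation: a return at 105 m becomes a 5 m canopy height
pts = np.array([[0.5, 0.5, 105.0], [1.5, 0.5, 103.0]])
dtm = np.full((2, 2), 100.0)
print(normalize_heights(pts, dtm, (0.0, 0.0), 1.0)[:, 2])  # → [5. 3.]
```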

UHSI data
UHSI data were acquired from September 8-10, 2021, on sunny clear-sky days around noon, using a portable MicroCASI 1920 pushbroom VNIR hyperspectral imager (https://www.itres.com/wp-content/uploads/2019/09/MicroCASI_1920_specsheet.pdf) mounted on the same drone as the laser scanner. The flying speed was 5 m/s, and the altitude was between 250 and 340 m above ground level. The scan angle of the imager was 36.6°, and the lateral overlap of the flight lines was set to 40%. Images with 288 spectral channels (400-1000 nm) and 0.1 m spatial resolution were generated. Considering excessive band redundancy, the original bands were resampled to 96 spectral bands (spectral width of 6.3 nm) by the data provider using the bi-section method in RCX software (ITRES Research Ltd., Canada). Geometric calibration and radiometric correction of the hyperspectral images were also accomplished with RCX software. Atmospheric correction was performed with the FLAASH module of ENVI 5.3 software. Finally, all hyperspectral images and CHMs were co-registered using control points collected manually from the CHMs (the error was less than one pixel), and all preprocessed data were clipped to the plot boundaries with a 5 m buffer for subsequent applications. The whole research workflow is shown in Figure 2.

Individual tree segmentation and sample organization
To ensure visual detectability in both the LiDAR and hyperspectral data, we chose a CHM-based individual tree segmentation method, named region-based hierarchical cross-section analysis (RHCSA), to automatically detect and delineate trees. This approach is a top-down detection method that examines the relationships among tree crowns in the horizontal plane to determine whether to split them (Zhao et al. 2017). RHCSA was developed for natural secondary forests, and previous studies have demonstrated its ability to segment individual tree crowns under different forest stand types (Yinghui et al. 2022; Zhen et al. 2022).
After segmenting individual trees, we matched each detected tree with the field samples with the assistance of the collected tree locations and RGB images synthesized from the hyperspectral imagery. Undetected trees and field trees matched to more than one detected tree were removed. The laser points of each correct segment were labeled with the field-measured tree species for the derivation of LiDAR features. To eliminate the influence of shrubs and herbs, tree points with heights below 2 m were removed, and we manually pruned the segmented point clouds to ensure the purity of the sample data. As a result, 904 sample trees of the 11 species were retained for further classification (Table 1). Examples are displayed in Figure 3.

ULS feature extraction
ULS features were extracted from all labeled individual tree laser points and classified into five categories: 1) height-related variables, 2) intensity-related variables, 3) density-related variables, 4) echo-related variables, and 5) tree crown metrics. In addition to basic metrics reflecting the point distributions and radiometric information, a series of crown metrics describing crown size and shape was calculated from the high-density ULS data. The crown size-related LiDAR metrics included crown length, diameter, volume, surface area, projected area, and perimeter. The crown shape was expressed as the roundness of the crown projection in the horizontal direction and as the contour shape of the crown outer edges in the vertical direction (Gao, Huiquan, and Fengri 2017). Based on our previous study of crown profile modeling using ULS data (Quan et al. 2020), a basic parabola equation was used to fit the crown profile, and the two equation coefficients were estimated to quantify crown shape. Moreover, ratios of some crown metrics were also calculated to describe the crown shape. A full description of all LiDAR features is given in Supplementary Material Table S1.
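A few of the per-tree ULS metrics that reappear in the results (e.g. H99, Imean_single, Last_all_ratio) can be sketched as below. This is an illustrative reconstruction only; the exact formulas in Supplementary Table S1 may differ, and the dictionary keys simply mimic the paper's naming style:

```python
import numpy as np

def uls_metrics(z, intensity, return_num, num_returns):
    """Illustrative per-tree ULS metrics from normalized heights, intensities
    and echo counts of one segmented tree (all arrays have equal length)."""
    single = num_returns == 1            # single-return echoes
    last = return_num == num_returns     # last-of-many (or only) returns
    return {
        "H99": np.percentile(z, 99),                  # upper height percentile
        "Hmad": np.median(np.abs(z - np.median(z))),  # height MAD
        "Imean_single": intensity[single].mean(),     # mean intensity of single returns
        "Last_all_ratio": last.sum() / len(z),        # ratio of last to all returns
    }

# Toy tree: 100 single returns with heights from 2 m to 20 m
z = np.linspace(2.0, 20.0, 100)
inten = np.ones(100)
rn = np.ones(100, dtype=int)
nr = np.ones(100, dtype=int)
metrics = uls_metrics(z, inten, rn, nr)
```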

Hyperspectral feature extraction
For hyperspectral features, we first overlaid the vector crown boundaries delineated from the CHMs on the hyperspectral image and then calculated the hyperspectral features of each crown. The mean spectral reflectance of each band (b1-b96) within each tree crown was calculated, and components 1-3 of the minimum noise fraction (MNF) rotation were also used to reduce band dimensionality (Green et al. 1988). The third category of variables is spectral vegetation indices (VIs), which combine bands through different mathematical formulas (Ali et al. 2017). Herein, a total of 53 VIs were calculated, divided into three categories: 1) structural, 2) pigment (such as chlorophyll, anthocyanin/carotenoid), and 3) physiological. Then, 24 texture features were computed from the RGB bands; a full description of all hyperspectral features is given in Supplementary Material Table S2. To reduce the effect of shadowing and non-vegetation pixels, we masked all pixels with normalized difference vegetation index (NDVI) < 0.5, near-infrared reflectance (NIR) < 0.2 and CHM < 4 m (Piiroinen et al. 2018). Both LiDAR and hyperspectral features were generated using MATLAB R2021a. The feature dataset was standardized to zero mean and unit variance before classification (Rana et al. 2022).
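The crown-level masking and band averaging described above can be sketched as follows. This is a minimal interpretation in which a pixel is discarded if it fails any of the three reported thresholds; the function name, band indexing and mask logic are our own assumptions, not the MATLAB implementation:

```python
import numpy as np

def crown_mean_reflectance(cube, crown_mask, nir_band, red_band, chm):
    """Mean per-band reflectance of one crown after removing shadow and
    non-vegetation pixels (NDVI < 0.5, NIR < 0.2 or CHM < 4 m).

    cube       : (rows, cols, bands) reflectance image
    crown_mask : boolean array marking this crown's pixels
    chm        : (rows, cols) canopy height model
    """
    nir = cube[:, :, nir_band]
    red = cube[:, :, red_band]
    ndvi = (nir - red) / (nir + red + 1e-9)   # epsilon avoids division by zero
    valid = crown_mask & (ndvi >= 0.5) & (nir >= 0.2) & (chm >= 4.0)
    if not valid.any():
        return None                            # fully shadowed/masked crown
    return cube[valid].mean(axis=0)            # one mean value per band

# Toy 4x4 crown with 5 bands; band 4 acts as NIR, band 1 as red
cube = np.zeros((4, 4, 5))
cube[:, :, :] = np.array([0.1, 0.1, 0.2, 0.3, 0.6])
chm = np.full((4, 4), 10.0)
mean_spectrum = crown_mean_reflectance(cube, np.ones((4, 4), bool), 4, 1, chm)
```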

Feature selection
This study utilized a hybrid approach for selecting features for the classification model. By combining correlation-based feature selection (CFS) with an optimized recursive feature elimination (RFE) algorithm, important and meaningful variables were selected as inputs for the subsequent classification model. RFE is a backwards selection method that iteratively searches for the optimal subset of features (Gregorutti, Michel, and Saint-Pierre 2017). In this study, the RFE algorithm was optimized by retraining a random forest (RF) model and recomputing feature importance in each iteration (Figure 4). According to the flowchart, the feature selection procedure was conducted with the following steps: (1) train an RF model on the initial set of features and compute the permutation importance (also known as the mean decrease in accuracy (MDA)); (2) cluster highly correlated features using Spearman's correlation coefficient (|r| ≥ 0.9) and retain the variable with the highest importance ranking in each cluster (Rana et al. 2022); (3) retrain an RF using the retained features and compute the permutation importance; (4) eliminate the least important variable from the current set of features; (5) repeat steps (3)-(4) until only one feature remains, recording the remaining variables and the corresponding overall accuracy (OA) in each recursion. Ultimately, the minimum number of variables that met the accuracy requirements was chosen for the final classification.
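The steps above can be sketched with scikit-learn and SciPy as follows. This is a simplified reconstruction under our own assumptions: it omits the per-recursion OA bookkeeping, uses a small fixed RF, and greedily clusters correlated features by importance order rather than by an explicit clustering algorithm:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def cfs_rfe(X, y, names, r_thresh=0.9, random_state=0):
    """Hybrid CFS + optimized RFE sketch: drop correlated features
    (|Spearman r| >= r_thresh), then recursively eliminate the least
    important feature, retraining the RF and recomputing permutation
    importance at every iteration. Returns the candidate subsets."""
    def importance(cols):
        rf = RandomForestClassifier(n_estimators=100, random_state=random_state)
        rf.fit(X[:, cols], y)
        return permutation_importance(rf, X[:, cols], y, n_repeats=5,
                                      random_state=random_state).importances_mean

    cols = list(range(X.shape[1]))
    imp = importance(cols)
    rho, _ = spearmanr(X)                     # feature-by-feature correlation (>2 cols)
    corr = np.abs(rho)
    # CFS step: keep the most important representative of each correlated group
    kept = []
    for i in np.argsort(imp)[::-1]:           # best-ranked first
        if all(corr[i, j] < r_thresh for j in kept):
            kept.append(int(i))
    # Optimized RFE: retrain and re-rank at every elimination
    history = []
    while len(kept) > 1:
        imp = importance(kept)
        history.append([names[i] for i in kept])
        kept.pop(int(np.argmin(imp)))         # drop the least important feature
    return history
```

In practice one would also record cross-validated OA at each recursion (as in Figure 6) and pick the smallest subset whose OA meets the requirement.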

RF classification and accuracy assessment
RF is a mature and popular machine learning algorithm that is frequently used in tree species classification (Belgiu and Drăguț 2016). RF is a nonparametric model that ensembles a multitude of independent decision trees (Breiman 2001). The advantages of the RF method are that it is computationally fast, provides internal validation for calculating the error matrix, is less sensitive to overfitting, and can derive variable importance (Ruiliang 2021). It can also handle unbalanced data with missing values better than other classifiers (Pal 2005). In this study, the RF model was implemented with scikit-learn v0.24.2 (Pedregosa et al. 2011), and the RandomizedSearchCV function (Bergstra, James, and Yoshua 2012) was used to determine the optimal hyperparameter combination for the classifier with stratified fivefold random cross-validation.
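A minimal version of this tuning setup might look as follows; the parameter grid and synthetic data are our own illustrative choices, not the search space used in the study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the standardized feature table (samples x features)
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=42)

# Hypothetical search space for the RF hyperparameters
param_dist = {
    "n_estimators": [100, 200],
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=10,                                   # random draws from the space
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)                       # tuned hyperparameter combination
```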
The precision (user's accuracy, UA), recall (producer's accuracy, PA) and F1-score were used to evaluate the classification results for each tree species. The macro and weighted averages of these three metrics, as well as the OA, were used to evaluate the overall results (Mäyrä et al. 2021) with 10-fold cross-validation. McNemar's test was also employed to determine whether the classification results differed significantly from one another (McNemar 1947). All feature selection and classification were implemented in the Python programming language.
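McNemar's test compares two classifiers on the same samples using only their disagreements. A small sketch of the chi-squared form with continuity correction (the helper name and the use of SciPy are our choices; the paper does not specify its implementation):

```python
import numpy as np
from scipy.stats import chi2

def mcnemar(y_true, pred_a, pred_b):
    """McNemar's chi-squared test (continuity-corrected) on the samples
    where exactly one of the two classifiers is correct."""
    a_ok = pred_a == y_true
    b_ok = pred_b == y_true
    b01 = int(np.sum(a_ok & ~b_ok))    # A correct, B wrong
    b10 = int(np.sum(~a_ok & b_ok))    # A wrong, B correct
    if b01 + b10 == 0:                 # no disagreements: no evidence of difference
        return 0.0, 1.0
    stat = (abs(b01 - b10) - 1) ** 2 / (b01 + b10)
    return stat, chi2.sf(stat, df=1)   # p-value from chi-squared with 1 d.f.
```

For example, if classifier A alone is right on 4 samples and B alone on 2, the statistic is (|4 − 2| − 1)² / 6 = 1/6.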

Feature robustness analysis
We conducted simulation experiments to analyze the robustness of the features under different UAV imaging conditions. Given the flexibility and autonomy of UAVs, the pre-defined flight height and speed directly influence the spatial resolution of the collected data (Ruiliang 2021). Therefore, we randomly sampled ULS returns at different percentages (i.e. 75%, 50%, 25%, 10%, 5%, and 1%) (Hao et al. 2022) and resampled the UHSI data to spatial resolutions of 0.2-0.8 m using nearest neighbor assignment to simulate the robustness of the selected features when reused for classification with data of different spatial resolutions. Features were recalculated from the thinned point clouds and resampled images.
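The two degradation operations can be sketched as below; the stride-based downsampling is a simplified stand-in for nearest neighbor assignment, and both function names are our own:

```python
import numpy as np

def thin_points(points, fraction, seed=0):
    """Randomly retain `fraction` of the returns to emulate a lower pulse density."""
    rng = np.random.default_rng(seed)
    n_keep = int(round(len(points) * fraction))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

def resample_nearest(image, factor):
    """Coarsen a raster by an integer factor, keeping one original pixel per
    output cell (e.g. 0.1 m pixels -> 0.1 * factor m pixels)."""
    return image[::factor, ::factor]

# 1000 returns thinned to 5% density, and an 8x8 raster coarsened 4x
pts = np.random.default_rng(1).uniform(size=(1000, 3))
thinned = thin_points(pts, 0.05)
img = np.arange(64, dtype=float).reshape(8, 8)
coarse = resample_nearest(img, 4)
```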
In addition to spatial resolution, the spatial distribution of the point cloud is also affected by the ULS sensor. Since not all laser scanners have multiple-echo capability and imaging sensors mainly capture the surface structure of the canopy (as with photogrammetric point clouds), we simulated canopy surface information using the GPMF algorithm. The same LiDAR features were extracted from the CSPs generated by GPMF and used for classification to test the robustness of the features in point clouds distributed on the canopy surface. A detailed introduction to GPMF can be found in Hao et al. (2019). For the evaluation of features, in addition to the RF importance, we used a paired t-test to determine whether the features derived from the simulated data differed significantly from those of the original data.
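The paired t-test compares a feature computed twice per tree, once from the original clouds and once from the simulated (thinned or CSP) clouds. A minimal sketch with hypothetical values:

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical paired samples: one feature (e.g. a height percentile) per tree,
# derived from the original and from a simulated point cloud.
rng = np.random.default_rng(0)
original = rng.normal(10.0, 1.0, 50)
simulated = original + rng.normal(0.0, 0.1, 50)   # small perturbation per tree
t_stat, p_value = ttest_rel(original, simulated)  # paired t-test
# p_value < 0.05 would flag the simulated feature as significantly different
```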

Feature selection
The spectral curves of the 11 species spanning 400-1000 nm are shown in Figure 5. All tree species had similar reflectance in the visible spectral range. F.M. (see Table 1 for acronym meanings) had the highest NIR reflectance, while P.A. had significantly lower reflectance than the other species. Q.M. and T.I., U.L. and P.D., and L.O. and A.M. were three pairs of classes that exhibited high visual similarity in Figure 5. The results showed that many tree species had similar spectral reflectance values, and even conifers and deciduous trees were difficult to distinguish based on spectral values alone.
A high correlation emerged among many of the LiDAR and hyperspectral features. After CFS clustering, 30 of 90 ULS features and 35 of 176 UHSI features were retained. The specific recursive elimination processes of the retained features and the corresponding accuracies are shown in Figure 6. As the number of features was decreased by the optimized RFE, the OA changed gradually at first and then decreased rapidly. The final features were selected by weighing the number of features against the OA. Eventually, 9 LiDAR features and 11 hyperspectral features were selected for subsequent classification. These features included 3 height-related variables, 1 intensity-related variable, 1 density-related variable, 2 echo-related variables, 2 tree crown metrics, 2 spectral transformation variables and 9 vegetation indices (Table 2).

Comparison of classification accuracies
The classification results (Table 3) indicated that combining LiDAR and hyperspectral features significantly increased the OA from 60.0% (LiDAR) and 64.8% (hyperspectral) to 75.7% (McNemar's test, p < 0.05). Using hyperspectral data provided significantly higher accuracy than using LiDAR data (McNemar's test, p < 0.05). P.K., L.O. and F.M. obtained the highest F1-scores (86.7%, 81.8% and 88.2%) when using LiDAR, hyperspectral and combined data, respectively. P.A. had the lowest classification accuracy regardless of the data used.
Confusion matrices were used to gain more insight into the details of misclassification, as shown in Figure 7. When using only LiDAR features, some of the P.D. were wrongly classified as B.P. and F.M.; […] data were used, and the optimal classification results were achieved for P.K. when only LiDAR data were used. The distribution of tree species prediction results using the best feature combination is shown in Figure 8.

Importance of selected features
The permutation importance and ranking for classification based on ULS and UHSI data of the 20 selected variables are presented in Figure 9.
Imean_single was the most important feature, followed by Last_all_ratio, VARI_red_edge and H99. Seven of the 10 top-ranked features were VIs. The tree crown metrics (CD_std and RCD) and the small percentiles of the height and intensity variables (H1 and I5) ranked low overall.
The differences in the 4 top-ranked features among the 11 tree species are plotted in Figure 10. The figure shows that all 4 features varied among the 11 tree species. The values of Imean_single (Figure 10a) and H99 (Figure 10d) for P.K. were clearly lower than those for the other species. F.M. and J.M. were most easily separated by VARI_red_edge (Figure 10c). Similarly, P.B. and P.A. were most easily separated by Last_all_ratio (Figure 10b). At times, it was difficult to distinguish species based on a single feature; for example, B.P. and F.M. had similar Imean_single values. Multiple features complement each other, facilitating the separation of multiple tree species.

Robustness analysis of features
Figure 11 shows that the OA of the CSPs was consistent with that of the original point cloud (100% density), as well as with the 75%-5% density point clouds (McNemar's test, p > 0.05). When the point cloud density was reduced to 1%, the OA dropped rapidly by approximately 10%. For the hyperspectral images of different spatial resolutions, the OA decreased significantly when the pixel size was larger than 0.3 m (McNemar's test, p < 0.05). The OA dropped by approximately 30% when the spatial resolution was reduced to 0.8 m.
Comparing the features selected from the CSPs and the original point clouds, we found that H1, Hmad, RCD and CD_std were retained in the CSP classification (Supplementary Material Table S3). As the density of the point cloud decreased, the selected features hardly changed, except at a point density of 1%. When the density of points was less than 10%, the crown metrics and density variables differed significantly from those of the original point clouds. The hyperspectral features were less affected by the spatial resolution, especially the vegetation indices (Supplementary Material Table S4).

Contributions of ULS and UHSI features
Recently, the combination of advanced UAV and sensor techniques has been explored as an accurate means of characterizing forest structure at fine spatial/spectral scales. However, ULS and UHSI data with large amounts of spatial/spectral information also pose challenges for natural secondary forest species classification studies. In this study, we integrated ULS and UHSI data and performed comprehensive feature mining to discriminate 11 tree species in natural secondary forests. A total of 266 variables were extracted from the ULS and UHSI data, comprehensively covering the geometric, radiometric, spectral and textural characteristics of trees. Using the same data sources, previous studies ignored the tree crown metrics (Cao et al. 2021; Qin et al. 2022). Herein, we implemented a rigorous and complete variable selection pipeline by combining CFS and an optimized RFE algorithm. Compared to the study by Zhou et al. (2022), we refitted the RF model and obtained a new importance ranking at each variable elimination step. The results of this study indicated that after feature selection, using approximately 10 variables achieved a classification accuracy similar to that obtained using approximately 30 variables (see Figure 6).
Based on the importance ranking of the selected features (Figure 9), we found that, except for the texture features and original band reflectance (b1-b96), all feature categories contributed significantly to classification. This might be because the study area is dominated by broad-leaved trees, whose texture characteristics are similar; they therefore contribute less than spectral features, consistent with the findings of Dan et al. (2015). The MNF components converted from the original band reflectance effectively represented the characteristics of the original bands (Cao et al. 2021). Comparable to Yifang et al. (2018), we found that Imean_single ranked among the top variables, illustrating that radiometric metrics are crucial for species classification. Last_all_ratio was the second most important variable, and Ørka, Naesset, and Martin Bollandsås (2010) also reported the importance of last returns for tree species discrimination. This finding implies that these two LiDAR variables represent the geometric structure characteristics of trees well. Moreover, the most important hyperspectral feature was VARI_red_edge, a structure-dependent vegetation index found to be virtually unaffected by atmospheric effects (Anatoly et al. 2002). We also found that the RVSI, which identifies inter- and intraspecies trends based on spectral changes in the red-edge range (Merton 1999), was important for classification. This result is consistent with the results of Shi et al. (2018) and Qin et al. (2022). Additionally, 6 vegetation indices related to chlorophyll and carotenoid concentrations (DD, BGI, TCI, NPQI, NPCI and PRI) also made important contributions to tree species classification. Qin et al. (2022) and Run et al. (2021) also reported the important role of PRI. Moreover, two crown metrics, CD_std (related to crown size) and RCD (related to crown shape), also showed considerable importance. This demonstrates that crown morphological characteristics vary by tree species and that this difference can be detected using LiDAR data. Run et al. (2021) also confirmed the important role of crown metrics in tree species detection.
Based on our results, we demonstrated that combining LiDAR and hyperspectral data can effectively increase classification accuracy, in line with many earlier studies (Dian et al. 2016; Shi et al. 2018). This is mainly because the integration of LiDAR point clouds and hyperspectral images provides both structural and spectral information. Similar to the study by Shi et al. (2018), we also found that hyperspectral features outperformed LiDAR features based on overall classification accuracy. One reason might be that the morphological differences among broad-leaved tree species are not as obvious as their spectral differences. In this study, the OA of separating the 11 tree species reached 75.7%, which was slightly inferior to that of some other studies; for instance, Shi et al. (2018) obtained 83.7% OA for discriminating five mixed temperate forest tree species, and Cao et al. (2021) obtained 97.22% OA for classifying seven mangrove tree species. However, as Qin et al. (2022) noted, an increase in the number of tree species negatively affects classification accuracy; in this light, we achieved high accuracy for multispecies classification. For example, Liu et al. (2017) obtained 70.0% OA when mapping 15 common urban tree species, and Zhong et al. (2020) obtained 66.34% OA for eight tree species in a subtropical natural forest in southwest China. The accuracies for the broadleaf species F.M. and J.M. and the conifer species L.O. and P.K. were all over 85%. Overall, we achieved reasonable accuracy using advanced ULS and UHSI for individual tree species classification in a typical natural secondary forest.

Robustness and transferability of features
The greatest value of UAVs in forest resource surveys is their repeatable operation (Hao et al. 2022). Therefore, the re-use of UAV-derived features deserves attention. Yifang et al. (2018) evaluated the robustness and transferability of the RF model and the selected LiDAR features across two study sites. However, limited by the cost of data acquisition, most studies use only a single study area, and owing to the specificity of UAV systems and the subjectivity of flight parameters, it is difficult to directly transfer UAV-derived features to other study areas (Rana et al. 2022). To overcome this, it is common practice to use point clouds with different densities and images with different spatial resolutions to test the transferability of models or features. This study additionally employed canopy surface points (CSPs) to evaluate the performance of features, which provides a detailed reference on how to balance cost and efficiency when UAVs are re-used. The results showed that even though the features used were not exactly the same as the original point cloud features, tree species classification with CSPs produced comparable accuracies. Although Ghanbari Parmehr and Amati (2021) found that photogrammetric point clouds and LiDAR point clouds were highly consistent, that may be because their study area was located in an extremely sparse forest. The comparable classification accuracy produced by CSPs proved that structural differences in the canopy surface alone can discriminate tree species. Yifang et al. (2018) also pointed out that the crown top layer is the main source of structural differences between tree species. Features of CSPs are not affected by the penetration properties of the sensor and are therefore more robust across different types of scan data. In future UAV applications, lower-cost photogrammetric point clouds, which are similar to CSPs, can be considered for tree species identification; however, because photogrammetric point clouds lack intensity and echo-related information, a leaf-off condition in a low-density forest may be more suitable for them (Liu et al. 2021).
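The canopy-surface-point idea discussed above, keeping only the topmost return in each small grid cell, can be sketched as a grid-maximum filter. This is an illustrative reconstruction, not the paper's exact procedure; the (N, 3) array layout and the 0.5 m cell size are assumptions.

```python
import numpy as np

def canopy_surface_points(points, cell=0.5):
    """Keep the highest return per XY grid cell, approximating the
    canopy surface that a photogrammetric point cloud would capture.
    `points` is an (N, 3) array of x, y, z coordinates."""
    # Map each point to an integer grid cell index
    ij = np.floor(points[:, :2] / cell).astype(np.int64)
    # Sort by cell, then by height, so the last point per cell is the highest
    order = np.lexsort((points[:, 2], ij[:, 1], ij[:, 0]))
    ij, pts = ij[order], points[order]
    # First occurrence in the reversed array = last (highest) per cell
    _, last = np.unique(ij[::-1], axis=0, return_index=True)
    keep = len(pts) - 1 - last
    return pts[keep]
```

Features computed on the result depend only on the outer canopy geometry, which is why they transfer more readily between LiDAR and photogrammetric data.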
For the resampled LiDAR data, the selected features were robust from 100% (104.43 pts/m²) to 25% (26.11 pts/m²) of the original point cloud density. Almost none of the crown metrics could be extracted when the point cloud density was less than 10% (10.44 pts/m²). However, even a 5% density (5.22 pts/m²) point cloud resulted in classification accuracy comparable to that of the original data. Since the point cloud density is affected by flight altitude and speed, an appropriate data density is a key practical consideration. Wang et al. (2022) also demonstrated that classification accuracy and feature selection were minimally affected by point density. Since the features at different resolutions were all computed from the mean of the crown object, the hyperspectral features were hardly affected. However, as the resolution decreased, the details of the tree crown were lost and the contribution of some features was weakened, which influenced the classification accuracy. This result also demonstrates the value of high-resolution data in tree species classification. Although we did not resample the spectral bands by design, five VIs retained in this study were associated with the multispectral red, blue, green, and red-edge bands (GI, VARI, BGI, NPCI, PRI). Zhong et al. (2020) similarly demonstrated the importance of the GI feature when using multispectral data for tree species discrimination. In summary, the feature selection and classification results of the resampled data and the CSPs demonstrated the robustness of the selected ULS and UHSI features and their potential applicability in other study areas.
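The density experiment above can be reproduced by random thinning of the original cloud. A minimal sketch, assuming the cloud is an (N, 3) array over a plot of known area; the function name and signature are our own, not from the paper.

```python
import numpy as np

def thin_to_density(points, area_m2, target_density, rng=None):
    """Randomly thin a point cloud to an approximate target density
    (points per square metre) over a plot of known area."""
    rng = np.random.default_rng(rng)
    n_target = int(target_density * area_m2)
    if n_target >= len(points):
        return points  # already at or below the requested density
    # Sample without replacement so each return is kept at most once
    idx = rng.choice(len(points), size=n_target, replace=False)
    return points[idx]
```

Thinning a ~104 pts/m² acquisition to 5 pts/m² with such a routine mirrors the 5% scenario tested in this study.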

Limitations and future work
In addition to the data sources explored above, the accuracy of UAV-assisted tree species discrimination is affected by many factors, such as sample size, individual tree segmentation, canopy shading and soil background effects (Ruiliang 2021). In this study, the RF classifier was used to overcome the problem of unbalanced sample sizes (Hartling, Sagan, and Maimaitijiang 2021), and the results showed that some species with small samples, such as L.O. and P.K., achieved satisfactory classification accuracy.
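One common way to make RF less sensitive to unbalanced sample sizes is inverse-frequency class weighting. The paper does not report its exact classifier settings, so the hyperparameters and the synthetic stand-in data below are purely illustrative (scikit-learn assumed).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the per-crown feature table: three "species" with
# deliberately unbalanced sample sizes (70% / 20% / 10%).
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.7, 0.2, 0.1],
                           random_state=0)

# class_weight="balanced" reweights samples by inverse class frequency,
# so rare species still influence the trees' split criterion.
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             oob_score=True, random_state=0)
clf.fit(X, y)
```

The out-of-bag score (`clf.oob_score_`) gives a cheap internal accuracy estimate without a separate validation split.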
However, misclassification still occurred for some tree species, such as P.A., A.M. and T.I. In addition to the effect of the small sample sizes, this is probably due to similarities in the spectra and morphology of different species and to variability within the same species. These variations may arise from differences in stand conditions (such as temperature, soil condition or topography) as well as tree competition, which introduce additional uncertainties into the classification (Yifang et al. 2018). In this study, trees were recorded at trunk locations but identified from the UAV view of the canopy, which may have led to errors when matching the measured trees with the detected trees. Additionally, owing to overlapping crowns of adjacent trees, inaccurate tree segmentation leads to loss of crown architecture. Although we manually corrected the point clouds of the samples, improving the efficiency and accuracy of tree segmentation in natural secondary forests for greater classification accuracy remains a topic for future research. In this study, the soil background mask on the CHM was extracted with a 4 m threshold during tree segmentation, and NIR and NDVI thresholds were set to filter the shadows inside the crown during object-oriented hyperspectral feature extraction, similar to the approach used by Piiroinen et al. (2018).
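The NIR/NDVI shadow screening mentioned above can be expressed as a per-pixel mask; the threshold values below are illustrative placeholders, not the values used in this study.

```python
import numpy as np

def sunlit_canopy_mask(nir, red, ndvi_min=0.6, nir_min=0.15):
    """Flag sunlit canopy pixels. Shadowed and soil pixels typically show
    low NIR reflectance and/or low NDVI, so both thresholds must pass.
    `nir` and `red` are reflectance arrays of the same shape."""
    ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)  # avoid /0
    return (ndvi >= ndvi_min) & (nir >= nir_min)
```

Object-level spectral features would then be computed only over the pixels where the mask is true, reducing shadow and background contamination of the crown means.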
This study site adequately represents the ecological conditions and environmental settings of the Maoershan Forest Farm and includes the vast majority of tree species in northeast China, so the approach can be generalized to similar natural secondary forests in the region. Further experiments can be conducted in other forest types in the future. In addition, leaf-on/leaf-off data from different seasons can be integrated to explore changes in tree features across the phenological period and improve species classification accuracy. Collecting more samples and exploring deep learning models that do not rely on feature selection to improve classification accuracy are also future research goals.

Conclusions
In this study, we extracted hundreds of structural and reflective/radiative variables from ULS and UHSI data and screened the optimal subset of features for classifying 11 common tree species in a typical natural secondary forest. The results demonstrated that the combination of ULS and UHSI effectively improved tree species discrimination accuracy compared to using either of them alone (OA increased by 15.7% and 10.9%, respectively). The mean intensity of single returns and the visible atmospherically resistant index for the red-edge band were the most influential LiDAR- and hyperspectral-derived features, respectively. A simulation application showed that the selected features are robust in point clouds with different densities and images with different spatial resolutions. When using LiDAR for tree species classification, a point cloud density of no less than 5 pts/m² is recommended, and canopy surface points are also a good choice; when using hyperspectral data, a spatial resolution no coarser than 0.3 m is recommended. This study provides a comprehensive feature mining and tree species discrimination pipeline using ULS and UHSI data and demonstrates the potential transferability of UAV data-assisted tree species classification to other study areas.

Figure 1. (a) The location of Maoershan forest farm; (b) the Sentinel-2 image of Maoershan acquired on 21 May 2016 and the locations of the UAV sites; (c) LiDAR and hyperspectral data from a UAV site and photos of the UAV and sensors. The projection coordinate system used for all data in this study is WGS 1984 UTM Zone 52 N.

Figure 2. Research workflow of this study.

Figure 3. Examples of data visualization for the 11 tree species; each sample includes its 0.1 m resolution CHM (top left), a 0.1 m resolution RGB image synthesized from the hyperspectral image (bottom left) and the point cloud (right).

Figure 4. Flowchart of the feature selection.

Figure 5. The mean and ±1 standard deviation of the spectral reflectance (×10000) for all 11 tree species.

Figure 6. The number of features used in RFE versus the overall accuracy based on 10-fold cross-validation.
A.M. and T.I., and F.M. and U.L., were also mutually misclassified. Almost half of the L.O. samples were classified as F.M., but when hyperspectral features were used, the situation was completely reversed. When using only hyperspectral features, P.D. and T.I. were wrongly classified as U.L., and U.L. was misclassified as B.P. Regardless of the features used, P.A. was always wrongly classified as B.P. Combining LiDAR and hyperspectral features effectively reduced both omission and commission errors for the tree species, except for A.M., L.O. and P.K. The optimal classification results for A.M. and L.O. were achieved when only hyperspectral features were used.

Figure 8. Tree species distribution map using combined ULS and UHSI data for the same UAV site example as in Figure 1.

Figure 9. The permutation importance of the final variables for classification.

Table 1.
Number of samples for 11 species and corresponding abbreviations.
Each band has 8 features. A full description of these hyperspectral features is shown in Supplementary Material Table

Table 2.
List of selected features and descriptions.

Table 3.
Classification results using different feature combinations.