Individual tree detection from unmanned aerial vehicle (UAV) derived point cloud data in a mixed broadleaf forest using hierarchical graph approach

ABSTRACT Studying individual trees is a common way that scientists employ to study forests and estimate forest parameters. In this study, a graph-based approach was developed for detecting individual trees in a broadleaf, complex forest region based on UAV-derived point cloud data. Horizontal cross-sections at different heights were applied to the Canopy Height Model (CHM) to extract initial candidates for graph nodes. The graph was processed in multiple steps, and individual treetop locations were detected based on graph nodes’ properties. The impact of various parameters, such as minimum area of connected components and minimum tree heights, on the performance of the developed method was investigated. The evaluation step demonstrated the potential of the proposed graph-based method for individual tree detection in a complex forest region in Mazandaran, Iran. In particular, the graph-based method obtained Precision, Recall, and F1-score values of 0.64, 0.73, and 0.68, respectively. Furthermore, the intercomparison with the well-known and most used Local Maximum (LM) suggested the applicability of the proposed method. After point cloud generation, the proposed method was implemented entirely in Python using open-source packages, which increases its applicability for other scholars and managers. The source code of the proposed algorithm can be found at https://github.com/Seyed-Ali-Ahmadi/Graph-based_ITCD.


Introduction
Forests are valuable natural resources with considerable benefits to humans and the surrounding environments and provide suitable habitats for many flora and fauna. Consequently, protecting and conserving these natural resources have been determined as a definitive priority to achieve sustainable development goals (Nations United, 2015). Sustainable forest administration needs a profound perception of forest patterns with consistent spatial and temporal dimensions (Gatziolis et al., 2015). Therefore, acquiring accurate and reliable information about forest status is essential to generating explicit forest inventory and supporting and promoting informative decision-making (Guerra-Hernández et al., 2018).
Generally, forest inventory attributes are collected via plot-level approaches (Tesfamichael et al., 2010) and tree-level (Schmitt et al., 2016;Wulder et al., 2008). In the first approach, the estimated attributes are commonly an average from a set of trees in a designated area composing a spatially continuous stand . In contrast, the latter focuses on measuring the desired attributes (i.e. volume and biomass) for each individual tree (Leite et al., 2020). Characterizing forests at the tree level improves forest inventory capabilities and also extends the applicability of such detailed information in other disciplines, such as ecology, biodiversity, and wildlife (Strîmbu & Strîmbu, 2015;. In particular, tree-level information can directly enhance accurate forestry, forest regulations, detailed biomass estimation, and forest growth modeling (Koch et al., 2006). For many years, tree variables were collected through field campaigns and in-situ observations. Ground-based methods such as Bitterlich's angle count technique (Scrinzi et al., 2015) are used to determine the basal area or forest density. But their main drawback is that they require fieldwork. Although these types of in-situ observations provide more reliable and accurate individual tree information, conducting regular field surveys is resourceintensive, time-consuming, and impractical for largescale studies (Tang & Shao, 2015). Therefore, it is appealing to incorporate remote sensing data and technology to fulfill these tasks.
Remote sensing has been contributing to forest monitoring and management by providing various data sources, including optical, thermal, Light Detection and Ranging (LiDAR), and Synthetic Aperture Radar (SAR), with different characteristics (Berni et al., 2009;Ghasemi et al., 2012;Ghorbanian et al., 2021;Grabska et al., 2020;Kwong & Fung, 2020;. Among all remote sensing data, over the last two decades, airborne LiDAR data were the dominant type employed for individual tree attributes estimation since they provide rapid and accurate information with adequate spatial detail about forest variables (Lim et al., 2003). Unmanned Aerial Vehicles (UAV) that acquire optical data with very high spatial resolutions have emerged as a feasible and costeffective alternative for forestry as a result of current technological advancements in sensors and digital image processing systems (Getzin et al., 2014;Granholm et al., 2015;Zhang et al., 2016). Notably, the development of the Structure from Motion (SfM) technique allows the extraction of fast and reliable photogrammetric point cloud data from UAV-based optical data through image matching approaches (Díaz-Varela et al., 2015;Zarco-Tejada et al., 2014). Subsequently, the extracted point cloud or its derived raster features (i.e. Digital Elevation Model (DTM), Digital Surface Model (DSM), and Canopy Height Model (CHM)) can be efficiently utilized to detect individual trees and predict their attributes of interest within forest communities (Baltsavias et al., 2008;Chen et al., 2021;Puliti et al., 2017).
Individual Tree Detection (ITD) serves as the fundamental step for tree position determination, species classification, crown delineation, biomass estimation, and other individual tree attributes prediction for a detailed forest inventory (Chen et al., 2021;Huang et al., 2018). In this regard, several algorithms were introduced to detect individual trees from remote sensing data. For instance, the Local Maximum (LM; (Popescu & Wynne, 2004)), region growing (Erikson, 2003), watershed-based (Huang et al., 2018), point cloud-based segmentation (W. W. Li et al., 2012), object-based classification (Selim et al., 2019), graphbased (Strîmbu & Strîmbu, 2015), and deep learningbased (Weinstein et al., 2019) methods have been developed and incorporated for ITD with diverse remote sensing data. For instance, Weinstein et al. (2019) developed a semi-supervised deep learning workflow to detect individual trees and their crowns using airborne RGB and LiDAR data and achieved recall and precision values of 69% and 60%, respectively. Likewise, L. Wang et al. (2004) introduced a two-stage approach comprising edge detection and marker-controlled watershed segmentation for crown delineation followed by treetop detection using high-resolution aerial imagery, obtaining an average agreement value of about 75%. More recently, researchers were attracted to applying these methods to remote sensing data acquired by UAVs (Berra, 2020;Chen et al., 2021;Huang et al., 2018;Mohan et al., 2017;Olli Nevalainen et al., 2017). In particular, Mohan et al. (2017) applied the LM method to UAV-derived CHM using the SfM technique for automatic ITD and obtained an accuracy of approximately 85%. In another study, the LM method was employed for ITD with UAV-derived rasterized photogrammetric point cloud data (Olli Nevalainen et al., 2017), and the ITD accuracies varied between 40% and 95% over 11 test plots, depending on plot characteristics.
ITD algorithms operate differently in various forest communities, and their accuracy depends on the forest structure. In particular, ITD algorithms perform better in simple (i.e. homogenous with regular patterns) and open forests than in primeval forests with dominant broadleaf species (Duncanson et al., 2014;D. D. Li et al., 2016). Consequently, it is essential to develop methods for ITD in complex forest communities (Berra, 2020). Additionally, to the best of our knowledge, no study was conducted for ITD using graph-based approaches and UAV-derived photogrammetric point cloud data. Therefore, the main objective of this study is to develop a new hierarchical graphbased approach for ITD using cost-effective UAVderived photogrammetric point cloud data that could provide accurate ITD results in a complex forest area. The developed method was tested over a complex forest community with old-growth broadleaf species and uneven-aged trees. Visual assessment and statistical evaluations were performed to explore the capability of the proposed method, and the results were compared with the most used ITD algorithm, LM.

Study area
The northern part of Iran is covered with Hyrcanian forest spreading between the Caspian Sea and the Mountainous areas of the Alborz range (Ghorbanian et al., 2020). The study area covers a portion of the northern primeval temperate forests with high species diversity, located at 52° 34′ N and 52° 02′ E in Mazandaran Province, Iran, with an elevation of about 27 meters below the mean sea level covering 16,843 m 2 . Different broadleaf tree species, including Ulmus minor, Quercus castaneifolia, and Populus caspica, exist in this area. The study area includes a mixture of uneven-aged trees, from separate young trees to old-growth trees with biomass occlusion, making it a relatively complex forest community. With a mean annual temperature and accumulated annual precipitation of 16.4°C and 997 mm, this area is classified as a humid-climate region based on De Martonne's climate classification (Miraki et al., 2021). Figure 1 shows the RGB orthophoto and geographical location of the study area, along with collected field data (i.e. observed individual trees' locations and species) through a field survey.

UAV images and field data
A consumer-grade Phantom 4-Pro UAV, carrying an RGB camera with 20-Megapixel image acquisition capability, was employed to collect raw images from the study area. The UAV flew at the altitude of 100 m to capture images with 2.5 cm spatial resolution, and both forward and side overlaps were adjusted to 90%. All images were captured with a dimension of 3000 by 3000 pixels, and the focal length was set to 8.8 mm. The UAV was equipped with a GPS antenna, enabling a rapid connection to GPS satellites for an automatic flight over a pre-defined acquisition route. The camera was installed on an accurate three-axis stabilizer with tolerance values of ±0.03° to avoid vibration during the acquisition step. Additionally, the image acquisition was conducted in a suitable condition of clear sky with no wind to avoid probable disturbances (Cardil et al., 2019).
Furthermore, the field data, i.e. individual treetop positions, were collected through a field survey between October and November 2018. For this purpose, first, several stations were accurately positioned with Gintec G10 Global Positioning System (GPS) using a Real-Time Kinematic (RTK) approach. Afterward, the azimuth-distance approach was applied to measure the position of each individual tree by employing a Suunto hand-bearing compass and a laser distance meter (Panagiotidis et al., 2017).

Methodology
This study proposes a graph-based technique to detect and locate individual trees in a complex forest using point clouds obtained from an imaging UAV. For this purpose, after creating the CHM from the UAVderived point cloud data, horizontal cross sections in different heights are generated, through which a graph is constructed where the nodes of the graph, after a set of post-processing, will present the tree locations. The proposed method attempted to identify individual tree locations from 3D point cloud data derived from UAV-RGB images through the SfM technique. In this regard, the graph theory concepts, i.e. nodes, edges, degree of nodes, and parent-child relationship, were considered to detect individual trees. Accordingly, the proposed method was divided into three steps: 1) preprocessing, 2) graph processing, and 3) accuracy assessment, the detail of which are provided in subsequent sections. Figure 2 presents the flowchart of the proposed method for ITD through a hierarchical graph approach.

Preprocessing
After visual inspection to remove low-quality images (Brovkina et al., 2018), the remaining images were ingested into the Agisoft Metashape software for further processing. This software integrates photogrammetric concepts along with the Structure from Motion (SfM) technique to generate point cloud data from a set of optical images (Iglhaut et al., 2019). SfM is a photogrammetric and computer vision technique that tries to reconstruct the 3D structure of the environment or objects from multiple 2D images. In this technique, the images are matched together, and then, through mathematical equations of bundle adjustment, the 3D location of each matched keypoint is extracted, resulting in a 3D reconstructed scene. A dense point cloud is the main product of the SfM technique (Iglhaut et al., 2019). Herein, the accuracy and the quality of the alignment and dense cloud generation steps were set to "high". Although highquality dense cloud generation demands high computational resources, it reasonably guarantees accurate and reliable point cloud data generation. The final point cloud includes 13,631,574 points, with 444 points per square meter density and 0.1 m point spacing.
Thereafter, the point cloud was employed to produce the Canopy Height Model (CHM) of the study area, which contains information about the tree height above the ground surface (Panagiotidis et al., 2017). To this end, two other layers of the Digital Elevation Model (DTM) and Digital Surface Model (DSM) were first computed from the point cloud data, allowing the generation of the CHM by subtracting DTM from DSM (Fankhauser et al., 2018;Hao et al., 2019). For DTM generation, the ground points were identified based on the Simple Morphological Filter (SMRF) algorithm (Pingel et al., 2013), which has been implemented in the Point cloud Data abstraction Library (PDAL). PDAL is an open-source library for manipulating and processing point cloud data in Python (PDAL Contributors, 2018) with a backbone of Point Cloud Library (PCL). The SMRF is a four-step algorithm that employs the painting technique to fill the empty cells of the initial minimum surface grid (Klápště et al., 2020). This algorithm includes four tunable parameters for ground point extraction, and their brief explanations and set values are summarized in Table 1. The value of each parameter was determined based on several trial-and-error attempts and visual interpretation to ensure a reliable ground point extraction.
Later, the DTM was produced at 15 cm spatial resolution by employing a simple interpolation step, which selects a point, among all points, within a given radius with a minimum value (i.e. elevation), followed by a gap-filling interpolation with a window size of 40. Similarly, the DSM was also produced by applying the mentioned interpolation approach to the original point cloud data and selecting a point with maximum value (i.e. elevation). Finally, the CHM was computed by subtracting the DTM from DSM for the individual tree detection (Daryaei et al., 2020;Tanhuanpää et al., 2016). The schematic procedure of image acquisition, DSM creation, and CHM computation by subtracting DTM from DSM is depicted in Figure 3. In order to reduce the sharp fluctuations of the forest canopy in the final CHM product, a 3-by-3 smoothing Gaussian filter was applied. The smoothed version of CHM reduces noise and better-estimates tree top locations (Tanhuanpää et al., 2016).

Graph processing
Graph construction. After generating the CHM in the previous step (Section 3.1), multiple horizontal crosssections were defined between the minimum and maximum heights of the CHM with equal height intervals. The minimum height (H min ) could be set to zero or a pre-defined height value to not consider trees with heights lower than this threshold value (i.e. smaller or younger trees that are shorter than this threshold will be ignored). Likewise, the maximum height (H max ) was equal to the height of the tallest tree in the study area, which was automatically extracted from the CHM. Initiating from the lowest horizontal crosssection, other cross-sections were gene-rated by successively adding the height step to the previous crosssection. The height of each layer was calculated by Equations 1 and 2.
where H i; j ð Þ l represents the height of layer l, H max and H min are the maximum and minimum height values extracted from the CHM, respectively, L is the maximum number of layers, and step is a user-defined parameter. Then, the cross-sections were calculated by thresholding the CHM based on the height of each layer using Equation 3.
where CHM g is the CHM, which was smoothed by a Gaussian filter and CS i; j ð Þ l represents the crosssection of the layer l, which was created by cutting CHM with its corresponding layer height, H i; j ð Þ l . Each cross-section resulted in a binary image showing the mask of trees taller than H i; j ð Þ l . Figure 4 presents a schematic view of different cross-sections after applying height layers (Equations 1-3) to a point cloud data of a single tree. Each cross-section resulted in single or multiple regions (i.e. intersections of height layers and CHM) used in later processing steps. These regions are referred to as connected components, denoted by C, and one or more connected components could appear in each cross-section. In order to better address these connected components, we used a double index notation which shows both the number of height layers (l) and the number of connected components (m). C l m points to the m-th connected component in the l-th height layer. In Figure 4,   Afterward, all the connected components were numbered and removed in each cross-section if their area was less than a certain threshold, defined by the user as the second input parameter (Equation 4).
where T area is the threshold of the area which defines the minimum possible area for a connected component (C l m ). Figure 5 shows four cross-sections that were obtained by cutting a hypothetical CHM at four different heights. These cross-sections were denoted by CS 1 ; CS 2 ; CS 3 ; and CS 4 (Figure 5e). It can be seen that the first cross-section ðCS 1 Þ includes larger connected components corresponding to lower CHM heights ( Figure 5a). As the height increase (CS 2 ; CS 3 ; and CS 4 ), connected components become smaller, and even some of them split (Figure 5b, 5c, and 5d). Additionally, those small regions that did not satisfy the area condition (Equation 4; marked by yellow crosses) were removed from each cross-section.
As illustrated in Figures 4 and 5, each connected component C l m in a certain height layer (l), is encircled inside a larger connected component C lÀ 1 m 0 from the previous layer with a lower height value and contains some smaller connected component(s) C lþ1 m 00 from the next layer. In each layer, connected components were numbered randomly; thus, m' and m" denote different connected component numbers. C lÀ 1 m 0 from one layer lower than the current layer was defined as the "parent" node for C l m ; and the smaller C lþ1 m 00 of the next layer was considered as "child(ren)" node(s) for C l m . This parental relationship was summarized in graph G, which has nodes (N) and edges (E). The graph will be denoted by In the proposed method, connected components define nodes of the graph N l m À � ; also, there is an edge E connecting each parent to their child E C l m ; C lÀ 1 m 0 À � . Every node has only one parent but can have single or multiple children. Figure 6a shows a graph constructed on a CHM based on some initial values for step and area. It also demonstrates connected components (nodes) and edges between them. Figure 6b simplifies the illustration of Figure 6a by showing different parts of the graph, such as nodes and edges, in a graph-based format.
Graph pruning. When a parent node splits into multiple child nodes, it means that some trees that were  merged in lower heights of the canopy are splitting and developing into individual tree top locations. As mentioned earlier, nodes and edges of the graph in the proposed method were constructed based on connected components and their parental relationships. This relationship created a large and complex graph that requires some simplifications in order to be more interpretable. Graph pruning is the process of making the graph more concise. This step used the concept of "degree of node (DoN)" in graph processing. DoN refers to the number of edges connected to a node, entering or exiting that node. Figure 6 illustrates a simplified visualization of how a graph is constructed and pruned. All nodes in the constructed graph do not carry useful information. Only conjunctions nodes, i.e. nodes that have DoN greater than two, are the important nodes with the necessary information for further processing steps. In other words, it is important for us to find those parent nodes that "separate" into children (DoN>2). Thus, the middle nodes with a degree of two (only one input edge and one output edge) were deleted, and the main parent was connected directly to the final child. This procedure was defined as the "pruning" part of the proposed method, which simplified the complex graph into a more abstract one. Equation 5 describes how the pruning operation was performed.
where N l m is that node of graph which is currently being inspected, E C l m ; C lþ1 m 00 À � is the edge connecting node N l m ;C l m to its child C lþ1 m 00 ; and E C lÀ 1 m 0 ; C l m À � is the edge connecting node N l m 's parent, C lÀ 1 m 0 , to itself, C l m . These two edges and the inspected node (N l m ) were removed, and a new edge was replaced within them. considers removing those nodes which have DoN = 2. In the pruned graph, only the nodes with degree = 1 are candidates for treetop locations. The 2D centroid of degree-one nodes on the CHM space (x and y) is calculated to be the location of the tree. The pseudocode of our methodology is provided in Table 2. It represents the step-by-step procedure of the proposed method in a summarized form.

Accuracy assessment
The assessment of the performance of the developed method included two aspects: (1) the positional accuracy assessment of the ITD concerning the true location of trees; (2) the hyperparameter analysis to examine the impact of various parameters on the detection results. According to the comparison of true tree locations and model detection results, they were classified into four cases: true positive (TP), true negative (TN), false negative (FN), and false positive (FP) (see, Figure 7). The number of correctly detected trees is denoted by TP, the number of incorrectly detected trees (those locations where no actual tree exists but the model tells there is one) is denoted by FP, and the number of incorrectly undetected trees (those locations where a tree exists but the model cannot detect any) is denoted by FN. TN was not applicable in this study because it is defined as those places where no tree exists, and also, the model finds no trees. Based on these definitions, four accuracy evaluation metrics were calculated, namely Precision, Recall, F1-score, and Root Mean Square Error (RMSE), to investigate the detection performance of the proposed method with different settings over the study area. Precision (see, Equation 6) is the ratio of the number of correctly predicted tree locations to the number of all detected trees detected by the method. Recall (see, Equation 7) is the ratio of the number of correctly predicted tree locations to the number of all actual existing trees. Practically, Precision (considers FP) and Recall (considers FN) are occasionally in contradictions, which is why necessitates the utility of other metrics such as the F1-score (see, Equation 8). The F1-score is the harmonic average of Precision and Recall (Powers, 2011), and its optimum value satisfies both high Precision and Recall simultaneously and thus has been considered in previous studies (Pleşoianu et al., 2020;Xi & Hopkinson, 2021). 9) is a measure of showing how much a predicted value deviates from its real one. Here, horizontal RMSE was used to estimate how close the predicted tree locations were to the actual tree locations. The specific formula for each accuracy evaluation index is provided in Equations 6 to 9: (2) Calculate vertical cross-sections with level-cutting on CHM.

Precision
RMSE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where N denotes the number of trees available in the region. Figure 7 illustrates the above-defined concepts.
After detecting the individual trees, a k-dimensional tree (KDTree) was constructed based on tree locations (Maneewongvatana & Mount, 2002). Afterward, a Sparse Distance Matrix (SDM) was calculated, which contains all the possible distances between each detected tree coordinate (Tree i ) and all field survey coordinates of trees (GT j ). The nearest Tree i to each GT j was the one that had the minimum distance to GT j , and was also no farther than 5 m. Equations 10-13 show how the SDM was calculated and how to find whether a tree is correctly detected or not. SDM i;j ¼ jjTree i À GT j jj 2 ; i ¼ 1; 2; 3; . . . ; N trees j ¼ 1; 2; 3; . . . ; N GTs � (10) Where SDM i;j refers to the distance between i th detected tree and j th true tree, N trees is the number of detected trees and N GTs is the number of field surveyed trees. Equations 10-13 describe whether each Tree i or GT j is a TP, FP or FN. Tree i will be a correctly detected tree if it had the minimum distance to field data in comparison to all nearby detection points while its distance was also less than 5 m. It will be assigned to no GT if its distance to all surrounding GTs is more than 5 m. And finally, a GT will remain alone if there is was tree detected within its 5 m radius.

Results
Processing UAV images and obtaining point clouds was performed inside Agisoft Metashape. All other steps of our proposed method were implemented by open-source packages in the Python programming language. Figure 8 shows the results of the preprocessing steps (a) point cloud of the study area, (b) the result of ground filtering for DTM generation without interpolation for gap filling, and (c) the generated CHM from both DTM and DSM data. The CHM was smoothed by a Gaussian filter to remove unwanted noisy peaks. Then, it was inserted into our graph-based individual tree detection algorithm.

Hyperparameter examination
To be able to detect individual trees, the proposed method requires knowing the "height step" and "minimum area" values. The correct selection of these two parameters will affect the final accuracy of the results. One reliable way of understanding the effect of multiple parameters on the outputs of a method is to perform a grid search operation on the unknown parameters. By defining a reasonable range for both user-defined parameters, a grid search step was implemented to find the best set of (minimum area, height step) with respect to the final accuracy. Minimum Area values were allowed to change between 5 and 1200 pixels with a variable step of 5 pixels for values below 100 pixels, 10 pixels for values between 100 and 300 pixels, and 50 pixels for values larger than 300, resulting in 58 different steps values. Height Step was also allowed to change between 0.1 m and 1.0 m with a step size of 0.1 m resulting in 10 different values. For each couple of (minimum area, height step) out of 580 different values, the results of the algorithm were obtained, and their accuracy assessment parameters such as TP, FP, and FN were calculated. Based on these parameters, F1-Score was obtained for each couple and put into a table (see, Figure 9). The couple with the highest value for F1-Score is chosen as the best set of parameters. Figure 9 shows the results of the grid search visualized into an array with different colors. Each array element refers to a set of parameters (columns: minimum area, rows: height step) with a specific color that illustrates the F1-Score values. The best set of parameters from the grid search step is highlighted by a black box in Figure 9. (20, 0.8) and (25, 0.8) were the best configurations for the minimum area and height step according to their F1-Score values, which were both 0.68.
Moreover, looking at Figure 9 revealed that the accuracy of the outputs was more stable to "height step" than "minimum area". This can be understood Figure 9. The grid search results for two parameters, "minimum area" and "height step". Best configuration is highlighted by the black box. by the fact that colors (F1-Score values) are more stable along the vertical axis (height step) but have a variation between 0.25 and 0.6 along the horizontal axis. To be more specific, the average standard deviation of F1-Score values along the vertical (height step) and horizontal (area step) axes is 0.01 and 0.1, respectively. Further investigations revealed that selecting smaller values for the "minimum area" parameter will result in more accurate outputs. On the other hand, generally speaking, selecting the "height step" value had minor effects on the final accuracy.

Graph-based individual tree detection
Using the best set of parameters from the grid search step, the graph-based individual tree detection technique was applied over the study area. Detection results are illustrated in Figure 10. Since the number of detected trees may be more or less than the actual number of existing trees, it was necessary to find which detected tree belongs to which GT. Furthermore, some detected trees may be left alone because no GT existed near them (and vice versa). In order to find the correct matches for both detected trees and GTs, an SDM was constructed on their locations. SDM stores the pairwise distances between all detected trees and all GTs. All the distances larger than 5 m were ignored, and through an iterative procedure, each detected tree was assigned to its nearest GT. Some GTs or detected trees may find no matches due to their larger-than-5 m distances. By this procedure, TP, FP, and FN values were determined. Figure 10 shows truly detected trees by a black line connecting detections to their corresponding GT, FPs with a red cross on a red circle, and FNs with a blue plus on a black circle.

Impact of tree species
Besides the actual tree locations, our field survey information included the species of each tree. All the 205 trees in our study area were categorized into 10 different species. Figure 11 shows the histogram of tree species. Ulmus minor, followed by Quercus castaneifolia and Populus capsica were the major species existing in this region.
In order to find out whether accuracies depend on the species or not, we investigated the species of the truly detected trees (TPs) by our algorithm. Figure 11 shows the number of detected and undetected trees, categorized by their species. Major species are also visible in this plot. By further analyzing the results, we understood that Acer Velutinum and Morus, with three samples each, were completely detected. Moreover, we found out that Crataegus, a minor group with five samples, followed by Quercus Castaneifolia, the second major group in our study area with 49 samples, have the highest detection ratios (the number of detected trees divided by the number of not-detected trees) of 4 and 3.08, respectively. For better analyzing the results, the horizontal Root Mean Square Error (RMSE) of each detected tree was also obtained per species. Figure 12 helps estimate how accurately our proposed method can find the true tree locations corresponding to their species. The RSMEs were categorized for each tree species and ranged from less than 0.5 m to about 2.5 m. These results show that the method can satisfactorily predict tree locations in a complex forest region. There were a total of 205 trees in our study area, which were mixed together in a closed canopy forest with complex structures. This issue made the detection problem harder and decreased the RMSE values.

Inspection of generated graphs
The core of our methodology lays inside the idea of graph pruning. The pruned graph is easier to interpret and can directly result in treetop locations. Figure 13a shows the generated graph for the best scenario (i.e. minimum area of 25 pixels and height step of 0.8 m). It consisted of 643 nodes and 642 edges. Figure 13b shows the pruned version of the same graph in Figure 13a. The nodes and edges were decreased to 348 and 347, respectively. Approximately half of the nodes were removed, and the graph size was decreased. By looking at the pruned graph, one can understand that all the remaining nodes have either DoNs greater than one, i.e. they are parent nodes, or DoNs equal to one, i.e. they are tree tops. Figure 13c also verifies the previous sentence by showing the histogram of the degrees of nodes in both original and pruned graphs. It is visible that in the pruned graph, all nodes with DoN = 2 were removed.

Comparison with the local maximum (LM) ITD method
To be able to evaluate how our method was working, the LM method was chosen as a basic and widely used method, and the results were compared with it. LM is the most used method for individual tree detection in the literature (Apostol et al., 2020;Gu et al., 2020;Miraki et al., 2021;Plakman et al., 2020;Yang et al., 2020), and thus, it is a suitable choice for comparison. Since the LM method has a couple of user-defined variables ("kernel size" and "minimum height"), a similar grid search procedure was implemented for LM, and the best parameters are selected based on the highest F1-Score values which are shown in Figure 14.
Grid search over the LM parameters showed that a kernel size of 4 pixels and a minimum height of 4 m could result in higher accuracy. Although the results were robust with respect to minimum height, changing kernel size had apparently changed the accuracies. Additionally, in a region with diverse tree height values, the minimum height's effect would be highlighted and would affect the results. Figure 15 demonstrates the results of applying the LM method to our study area with the best set of parameters. With 150 detected trees (TP = 150), 55 undetected trees (FN = 55), and 99 trees predicted incorrectly (FP = 99), the F1-Score value of the LM method was 0.02 less than the proposed method (F1-Score = 0.66). Moreover, the LM approach achieved the RMSE = 2.19 m, which was 0.18 m more than the RMSE of the proposed method, and more trees were incorrectly detected. Likewise, it seems that the topcenter part of the region included a group of FPs, while the lower-right part of the region contained the group of FNs. It might prove that the region is so mixed, making it complex for tree detection algorithms. Table 3 summarizes the detection results of our graph-based technique in comparison to the widely used LM model. The graph-based technique could better overcome the spiky crowns and resulted in a higher F1-Score, and the tree center locations were closer to the GT locations, which is very appealing for a complex forest region.

Discussion
This study proposes a graph-based methodology to locate individual trees in a mixed broadleaf forest. The results of applying the proposed method to a complex forest were validated by comparing with a set of gathered field survey information, including the tree locations and species, and the detections were found to be accurate. In order to enhance the validation procedure, the Local Maximum (LM) method was selected as the comparison technique. LM has been recognized as a reliable ITD method extensively employed in the literature as the base method for tree detection (Gyawali et al., 2022;Shimizu et al., 2022;Zhen et al., 2022). The obtained results illustrated that the proposed method slightly improved the ITD in the study area. Other novel techniques, such as methods based on deep neural network architectures, have also shown very acceptable results. For instance, Schmohl et al. (2022) had a dataset of approximately 10,000 trees in an urban area and developed 3D convolutional networks. Likewise, Pleșoianu et al. (2020) employed over 10,000 tree samples in different locations to develop a deep learning ensemble model. The main limitation in utilizing such approaches for comparison is that these approaches as supervised methods require a large amount of reference data for appropriate training procedures (Schmohl et al., 2022), which was not available in this study. Moreover, their training procedure is timeconsuming, requires sophisticated hardware, and thus, prevents the applicability of such approaches on common computers.
One advantage of the proposed method is that it was entirely implemented on open-source packages, and everyone can implement such in the Python programming language. It did not require sophisticated hardware and performed comparably with the implemented LM method. Moreover, the ground filtering part of the proposed method, which was completely implemented by the PDAL package, performed faster and more accurate than the commercial LAStools software that has been primarily used in the literature (A. P. D. Corte et al., 2020;Dalla Dalla Corte et al., 2020;Olli Nevalainen et al., 2017). In particular, the ground filtering process using PDAL was completed in 16.82 seconds, while it took 30.86 seconds in the LAStools software.   The proposed method was also investigated from the operative point of view by calculating the computational time. In this regard, the grid search (see, Section 3.1) was conducted once more, and the computation time for each pair of parameters was calculated. The lowest computational time was observed to be approximately 4 seconds, and the best pair (i.e. obtained the highest accuracy) took about 8 seconds to complete the ITD process. This shows that the proposed method can detect individual trees in a few seconds, making it an appealing choice for such tasks.
The final goal of every forest modeling approach is to predict or estimate statistical forest parameters such as biomass, growth rate, etc. (Corte et al., 2022). The next step after detecting trees in a forest is to delineate their canopy to be able to extract tree attributes or apply 3D modeling tasks. The detected treetops are usually regarded as seeds to perform tree crown delineation (Gu & Congalton, 2021). Since the proposed method was working on connected components in each height layer, it has the capability to use these connected components for accurate tree crown delineation. In particular, the extracted treetops (latest children nodes) can be regarded as seeds to grow the region. The connected components in each height layer, which showed the boundary of each node in each layer, can be regarded as approximate boundaries to enhance the accuracy of delineation.
The investigated study area was a complex forest composed of multiple broadleaf trees, completely merged, with different heights and ages, making them difficult to be distinguished from above, even by human experts through visual inspection (Duncanson et al., 2014;D. D. Li et al., 2016). The results showed that while performing satisfactorily in detecting trees in a complex forest structure, the proposed method was powerful in detecting individual separated trees. Deciduous trees have more complicated treetop geometries, often with overlapping crowns, making them more challenging to detect and delineate than conifers, which might be the reason that closed hardwood forests were considered in a lower proportion of ITD investigations than open forests (Ghanbari Parmehr & Amati, 2021;De Oliveira et al., 2021;Picos et al., 2020;Seidel et al., 2019). Having multiple peaks on a canopy increases the probability of finding more trees instead of only one, which decreases the accuracy of ITD algorithms (Eysn et al., 2015;Zhen et al., 2016). Focusing on Figure 10 revealed that the proposed method was accurate in detecting separate trees and those with a clear visible boundary. On the top-center part of the region, there was a group of FPs; it seems that spiky crowns lead to more FPs due to the structure of the graph-based method. Mixed trees and same-height trees can be a source of uncertainty. On the other hand, a cluster of FNs in the lower-right corner of the region, which was missed by the proposed approach, was even difficult to spot visually by a human expert. The method easily and correctly detected those trees that were finely separated from their neighbors or had a clear crown, such as those in the central part of the region. Moreover, comparing the detection results with our field survey data showed that most of the correctly detected trees are taller than the missed ones. One reason can be that taller trees may occlude shorter trees, and their detection was not possible using UAVderieved point cloud data.
From another point of view, a complete set of field data with measured tree heights and canopy shapes (2D or 3D) will definitely help to evaluate the proposed method for more challenging tasks. In this regard, to the best of our knowledge, there is no suitable individual tree detection benchmark dataset, i.e. point cloud data from UAV-captured RGB images, freely available for research purposes. Creating such a dataset will help researchers to better evaluate and compare their results with each other. Based on these assumptions, various forest structures, tree species, canopy types, tree ages, heights, and complexities (Yao et al., 2014), can be parameters to consider when trying to create a benchmark dataset. Generating a field survey dataset with canopy boundaries is a difficult task, especially in complex forest structures. However, a rich field survey benchmark dataset can actually enhance the relevant studies worldwide.
Synergistic use of point cloud data along with other sources of remote sensing data such as multispectral or hyperspectral imagery will also help better predict the desired parameters. Multispectral or hyperspectral imagery can help to recognize tree species or separate them from each other based on their spectral reflectivity features (O. Nevalainen et al., 2017).
In order to be able to better compare the proposed method with similar research, some of the limitations of this study are presented. The available study area was relatively small; a larger study area with more trees and various species could lead to a better investigation of the method's generality. Moreover, the field survey information did not contain any information about the canopy shape or region of each tree. Another source of uncertainty identified in the present study was the underlying trees that existed below the upper canopy. Since the point cloud data was generated from UAV RGB images, it only covered the top surface of canopies and did not contain information from the underlying layers (Krisanski et al., 2020). On the other hand, multi-return LiDAR data can penetrate into canopies and record data from multiple layers of forest (T. . Covering only the top surface of the forest canopy causes the graph to miss the smaller underlying trees. These limitations highlight the possible better contribution of using laser-generated point clouds such as airborne LiDAR scanners, drone-based scanners, or even handheld laser scanners (Hyyppä et al., 2020). In particular, multi-return LiDAR data enables a more accurate generation of DTM (i.e. and CHM), especially in forests with high canopies due to possible return from the ground. This is not true for UAV-derived point cloud data since they only provide point cloud data from the upper canopy, and in regions with a high canopy, no point cloud data from the ground is generated (Ghanbari Parmehr & Amati, 2021). Furthermore, only multi-return LiDAR data could provide such information to study multilevel canopy forests. This is due to the fact the UAVderived point cloud data can provide data from smaller trees that are occluded by other canopies. Finally, it has been reported that both LiDAR and UAV-derived point cloud data are highly correlated with tree heights, with a better performance of LiDAR point cloud data (Ghanbari Parmehr & Amati, 2021), which could result in a better reconstruction of trees. However, it should be considered that consumergrade UAVs, such as the one used in this study, are cost-effect, and require lower resources than LiDAR data. Testing the proposed method on laser-generated data would justify the claim and show how accurately it performs on multi-return point cloud data. It is believed that the proposed method can obtain very accurate results in cases such as open-canopy forests, urban parks, and large artificial planted forests; and it should be tested in other deciduous and broadleaved forests in other parts of the world so it can manifest the robustness of the proposed method.
The points mentioned above are summarized to highlight some future research directions. Investigating the capabilities of the proposed method in delineating tree canopies in the presence of an appropriate field survey dataset would be an important step toward analyzing forests. Moreover, other types of forests, such as conifers, can be included to verify the generalizability of the method. Utilizing LiDAR point cloud data would enhance the performance of the proposed method by providing point cloud data from different depths of the canopy. And finally, the proposed method can be used for urban forest mapping, which is an important field of study in urban management.

Conclusions
A new individual tree detection approach based on graph analysis was introduced in this study. It utilized the generated point cloud data from UAV RGBimagery on a complex broadleaf forest to detect individual treetops. The main approach relied on partitioning the CHM on horizontal layers in different height steps to create a graph on trees. Processing of the graph will result in individual tree locations. Obtained results show that the proposed Graph-based ITD method performed better than the current widely used LM method. Out of 205 trees in the study area, 149 trees were correctly extracted with an RMSE of about 2 m that outperformed the LM method by about 0.2 m RMSE. The precision, recall, and F1-score of the proposed method were 0.64, 0.74, and 0.68, respectively, which all showed improvements compared to the LM method. As a result, the findings revealed that the proposed method could be effectively employed to detect individual trees in broadleaf forests. Furthermore, the proposed method has the advantage of depending solely on open-source Python libraries such as Numpy, NetworkX, Scipy, and Matplotlib, allowing users and academics to simply reimplement the approach for their own applications. On the other hand, the proposed method was developed based on consumergrade UAV RGB imagery, making the data collecting step inexpensive and accessible, and potentially improving the ITD technique in various parts of the world. viding the UAV-derived point cloud data along with the field survey data. The authors also thank the PDAL Contributors for providing valuable resources in the opensource policy. During this study, Arsalan Ghorbanian was funded by ERASMUS+ ICM STUDENT MOBILITY PROGRAMME FOR DOCTORAL STUDIES for a 5-month stay at Lund University, Sweden.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The UAV-derived point cloud and the field survey data that support the findings of this study were provided by Dr Hormoz Sohrabi and Sima Sadeghi (Tarbiat Modares University, Faculty of Natural Resources, Department of Forest Science) and are available at https://geospatial conf2019.ut.ac.ir/page_1021.html. The developed code that supports the findings of this study is available at https:// github.com/Seyed-Ali-Ahmadi/Graph-based_ITCD.