Methods for separating orchards from forest using airborne LiDAR

The aim of the study was to distinguish orchards from other lands with forest vegetation based on the data from airborne laser scanning. The methods based on granulometry provided better results than the pattern analysis. The analysis based on the Forest Data Bank/Cadastre polygons provided better results than the analysis based on the segmentation polygons. Classification of orchards and other areas with forest vegetation is important in the context of reporting forest area to international organizations, forest management, and mitigating effects of climate change. Agricultural lands with forest vegetation, e.g., orchards, do not constitute forests according to the forest definition formulated by the national and international definitions, but contrary to the one formulated in the Kyoto Protocol. It is a reason for the inconsistency in the forest area reported by individual countries. The aim of the study was to distinguish orchards from other lands with forest vegetation based on the data from airborne laser scanning. The study analyzed the usefulness of various laser scanning products and the various features of pattern and granulometric analysis in the Milicz forest district in Poland. The methods based on granulometry provided better results than the pattern analysis. The analysis based on the Forest Data Bank/Cadastre polygons provided better results than the analysis based on the segmentation polygons. Granulometric analysis has proved to be a useful tool in the classification of orchards and other areas with forest vegetation. It is important in the context of reporting forest area to international organizations, forest management, and mitigating effects of climate change.


Introduction
There are a number of forest definitions around the world. Some of them are formulated in national legislation and apply only to the forests of the respective state, and some are international. Poland, like many other countries, is obliged to report forest area for the Climate Convention (Kyoto Protocol) and the Food and Agriculture Organization of the United Nations (FAO/UN). Aerial and satellite imagery has been successfully used since the early 2000s to estimate forest area with an accuracy of 80-99% for forest areas, 95% for forests developed during the secondary succession, and 75% for trees on agricultural land. The accuracies were calculated for the whole study sites and represent the percentage of area classified correctly according to the reference data (Kunz et al. 2000;Haapanen et al 2004;Wężyk and de Kok 2005;Próchnicki 2006;Wang et al 2007a;Pekkarinen et al 2009;McRoberts et al 2012;Pujar et al 2014;Kolecka et al 2015;Hościłło et al 2015;Thompson et al 2016;Szostak et al 2017). Unlike data from Airborne Laser Scanning (ALS, active remote sensing technology), passive remote sensing data do not contain information on elevation, which is important for the forest definition formulated for reporting purposes. Currently, ALS data are available in many countries of the world (in Poland within the ISOK project-IT system of Country Protection) and are increasingly used in remote sensing analyzes. The use of ALS data makes it possible to achieve or improve the results of forest area estimates (Castillo-Núñez et al. 2011;McRoberts et al. 2012;Kolecka et al. 2015; Thompson et al. 2016;Naesset et al. 2016;Szostak et al. 2017). Land use is an important factor responsible for differences in official forest area statistics (Seebach et al., 2011). However, using remote sensing methods, land area can only be classified in terms of its coverage. From aerial or satellite coverage, it is not possible to determine the land use category according to BDOT (Terrain Object Database) or other national land use/land cover databases. There are areas covered with forest vegetation that are not forests according to the forest definition of the FAO/UN and the Polish Law on Forests 1991 (e.g., post-agricultural areas with secondary succession, swamps, and orchards).
However, it is difficult to distinguish urban green (forests and forest vegetation in communal areas) from regular forest complexes and secondary succession with remote sensing; it is possible to distinguish orchards. In this paper, we have chosen to compare two approaches, vector-and grid-based. The vector-based approach assumes that the regular spatial distribution of trees is reflected by the regular distribution of points representing their vertices, expressed in numerical form by corresponding statistical coefficients. Pattern analysis-based methods have been used by Tanada and Blanco (2016) to detect orchards with an accuracy (the percentage of area classified correctly according to the reference data) of 73-99%. The grid-based approach assumes that the different structure of various forms of land use and cover (LULC) is reflected in satellite and aerial photographs in the form of raster images with a characteristic distribution of pixel values. Methods based on texture analysis and granulometry have been used by Aksoy et al. (2010Aksoy et al. ( , 2012, Mougel et al. (2008), Ranchin et al. (2001), Trias-Sanz (2006, Helmholz and Rottensteiner (2009), Ursani et al. (2012), Erfanifard and Rezayan (2014), Komba et al. (2015), Kupidura (2019), and Kupidura et al. (2019), with accuracy up to 96%. The use of both approaches is based on the assumption that the specific regular structure of trees in orchards is different from the irregular structure of forests (and other types of trees). According to this hypothesis, this difference is so important that both the analysis of the distribution of the regularity of trees in the orchard and the analysis of the texture, depending on this distribution, should allow us to distinguish the distinctive features of these two different types of tree stands.

Brief presentation of tested methods
The study was conducted using two methods: pattern analysis of the mutual distribution of points representing tree crowns and texture analysis using the granulometric method. The main assumptions of the two methods used are presented below.

Pattern analysis
The basic concept that describes the distribution of points based on density is homogeneity. The distribution of points is homogeneous if the density of points is the same in all identical subareas. It can be tested using the χ2-test. Conversion of point data into continuous data allows their analysis with areas (after reclassification) or geostatistical data. Point distribution analysis as a method of analyzing the distribution of objects in geographic space dates back to the turn of the 1950s and early 1960s (Gatrell et al. 1995). This distribution can be described as clumped, scattered, and random (Pielou 1959;Ek 1969;Robeson et al. 2014). The distribution factor is a statistical measure expressed in numbers. It requires establishing the null hypothesis that objects are randomly distributed and then proving or disproving this hypothesis with a certain margin of error expressed by the significance level "p." Li and Zhang (2007) compared various methods of tree distribution analysis in spruce-fir stands, including nearest neighbor, K-Ripley function, and autocorrelation. Pattern analysis algorithms are inferential statistics; they start with the null hypothesis that your traits or the values associated with your traits have a spatially random pattern. They then calculate a p-value, which is the probability that the null hypothesis is correct (that the observed pattern is simply one of many possible versions of complete spatial randomness). Calculating a probability can be important when you need to have a high degree of confidence. The following list has some of the most popular algorithms for pattern analysis and gives a brief description of each (https:// pro. arcgis. com/ en/ pro-app/ latest/ tool-refer ence/ spati al-stati stics/ an-overv iew-of-the-analy zing-patte rns-tools et. htm): Average nearest neighbor-Calculates a nearest neighbor index based on the average distance of each feature to its nearest neighbor feature High/low clustering-Measures the degree of clustering for either high or low values using the Getis-Ord General G statistic Incremental spatial autocorrelation-Measures spatial autocorrelation for a range of distances and optionally produces a line plot of these distances and their corresponding z-values. Z-values reflect the intensity of spatial clustering, and statistically significant peak z-values indicate distances where spatial processes that promote clustering are most pronounced. These peak distances are often appropriate values for tools with a distance band or distance radius parameter. Multi-distance spatial cluster analysis (Ripley's k-function)-Determines whether features or the values associated with features exhibit statistically significant clustering or dispersion over a range of distances. Spatial autocorrelation-Measures spatial autocorrelation based on feature locations and attribute values using Global Moran's I-statistic.

Texture analysis using granulometry
Granulometry was introduced by Haas et al. (1967) for binary images. This was later extended to grayscale images; also local granulometric analysis was introduced (Dougherty et al. 1992;Vincent 1996), allowing to analyze the neighborhood of each pixel. Granulometry is based on a series of morphological operations: opening and closing with a successively increasing size of the structuring element that determines the scope of the operation. This allows for the quantification of the occurrence of texture elements of various types and sizes. In this way, each pixel is assigned a set of values that allow analysis of the texture in this vicinity. The effectiveness of local granulometry in relation to other methods of texture analysis as a tool to support land cover/land use classification in satellite imagery has been demonstrated by Kupidura and Uwarowa (2017) and Kupidura (2019), among others.
In the research, also opening by reconstruction and closing by reconstruction have been tested. Morphological operations by reconstruction are based on geodesic reconstruction, which consist in the morphological operations of erosion and geodesic dilation (Nieniewski 2005). The application of these operations does not change the shape of the image elements that are not completely removed, unlike simple opening and closing operations. The reason for applying granulometric analysis based on operations of this type is the possibility of occurrence of younger forest areas with a regular planting distance, which may resemble orchards in texture. At the same time, these areas occur within or are directly adjacent to larger forest areas. Granulometric analysis based on operations by reconstruction should therefore, it is assumed, differentiate these two types of areas-artificial regeneration in the immediate vicinity of forest areas and orchard meadows.

Study area and data
The Milicz forest district is located in the Katowice Regional Directorate of state forests in Lower Silesia (Fig. 1). The information on habitats, species, age, and volume of the stands is summarized in Table 1  Airborne laser scanning data (ALS) were acquired in August 2015 using a Riegl LMSQ680i laser scanning system with a pulse frequency of 360 kHz, resulting in point clouds with an average of 10 pulses/m 2 . The mean flight altitude was 550 m, and the field of view of the scanning system was 60 degrees. Together with the point clouds, the data provider generated a digital surface model (DSM) and a digital terrain model (DTM) with a very high spatial resolution of 0.5 m using TerraSolid software. This DTM was used to normalize all returns from the raw point clouds (Stereńczak et al., 2018). Based on the digital terrain model and the digital surface model, the canopy height model and the normalized differential surface model, respectively, were generated. Based on the point cloud from airborne laser scanning, the "Intensity Image" was generated, i.e., the image representing the average intensity of laser beam reflection for individual pixels.
Vector data representing forest divisions (in the state forests) available on the website Forest Data Bank as of 2016 (www. bdl. lasy. gov. pl) and agricultural plots (in the areas surrounding the state forests) available on LPIS Agricultural Plots Identification System as of 2016 (www. geopo rtal. gov. pl) were combined into a set of polygons for which the statistics for pattern and granulometry analysis will be calculated. The second set of polygons was generated based on the digital land cover model in ENVI 5.0. (https:// www. l3har risge ospat ial. com/ Softw are-Techn ology/ ENVI) using the feature extraction extension. The value of the variable defining the number and size of polygons-0.95 was chosen by trial and error.
Based on the vegetation indices and canopy height model, a mask of forest vegetation (2 m) was also used, generated as part of the REMBIOFOR project "Remote sensing determination of wood biomass and carbon stocks in forests" (BIOSTRATEG1/267755/4/NCBR/2015), conducted at the Forest Research Institute between 2014 and 2018 (www. rembi ofor. pl). A digital orthophotomap in false color representation (CIR) and the spatial resolution of 0.2 m was also used to create reference layers and as a basis for illustrations (Fig. 2).

Pre-processing
The general scheme of the analysis is presented in Figure 3.
The values of average nearest neighbor and Ripley's K-function ratios were calculated for the Forest Data Bank/ Cadastre and object segmentation polygons. The analysis was performed using a script written in the Python programming language in ArcGIS Pro (www. arcgis. com, ESRI corporation-www. esri. com). Points representing tree canopies higher than 2 m were generated based on the canopy height model in the software Impact Lidar (Deutscher et al., 2016).
The following datasets were used for the granulometric analysis: An image representing the average intensity of reflectivity ("int") at a resolution of 0.5 m An image representing the average intensity of reflectance, clipped to the boundaries of the areas occupied by forest vegetation > 2 m ("clip") at a resolution of 0.5 m An image representing mask of forest vegetation > 2 m ("mask") at a resolution of 0.5 m.

Layer preparation
The average nearest neighbor and K-Ripley coefficients were mapped to the Forest Data Bank/Cadastre and object segmentation polygons. The tools are available in the ArcGIS toolbox and are designed to perform the pattern analysis based on the location of points representing single trees. They are explained in the "pattern analysis" section. Raster layers were then created for further classification.
Twelve layers were created based on three datasets each for granulometric analysis in the software Blue Note 1.1.5 (https:// sourc eforge. net/ proje cts/ bluen ote/): 3 granulometric maps by opening ("1-3") 3 granulometric maps by closing ("4-6") 3 granulometric maps by opening by reconstruction ("rec1-3") 3 granulometric maps by closing by reconstruction ("rec4-6") The terms "opening", "closing," and "reconstruction" are explained in the "Texture analysis using granulometry" section. In total, 36 layers were generated. The selection was determined by the diversity of the products of the granulometric analysis. For the polygons Forest Data Bank/Cadastre and object segmentation, the average pixel values were calculated from the layers representing the results of the granulometric analysis. The mean values of average nearest neighbor, the K-Ripley coefficient, and 36 layers resulting from the granulometric processing described in this section were assigned to the polygons representing orchards and other lands with forest vegetation. Raster layers were created for further classification.  3 granulometric maps by "opening" ("1-3") 3 granulometric maps by "closing" ("4-6") 3 granulometric maps by "opening" and "reconstrucƟon" ("rec1-3") 3 granulometric maps by "closing" and "reconstrucƟon" ("rec4-6")

Layer selection
To perform a first discrimination analysis, groups of test polygons were selected from Forest Data Bank/Cadastre and object segmentation representing orchards and other areas with forest vegetation (Fig. 4).
The basic statistics, i.e., minimum, maximum, amplitude, mean, standard deviation, and quartiles, were calculated for these polygons and 36 layers resulting from the granulometric processing. The differences (∆) between the mean values for orchards and other lands with forest vegetation are shown in the "Results" section. Further analyses were performed on the layers for which the difference between the means was more than 30 (where 255 is the maximum value) for the polygons representing orchards and other lands with forest vegetation. The threshold was selected based on visual analysis of the results of the granulometric analysis. One layer was discarded based on the visual analysis. The results of the pattern analysis are not included here, due to different extreme values, possible to obtain (0-undefined, in case of the pattern analysis, and 0-255, in case of the Fig. 4 Test polygons from the Forest Data Bank/Cadastre and the object segmentation for the discriminant analysis granulometric analysis). However, the results of the pattern analysis were included in the subsequent processing.

Classification
Selected layers representing different combinations of pattern analysis and granulometric processing results (separately for Forest Data Bank/Cadastre and object segmentation polygons) were thresholded. In each case, we selected the best possible thresholds using the "trial and error" method. Two classes were distinguished on the resulting maps: "Orchards" and "Other areas with forest vegetation". Selected layers representing different combinations of pattern analysis and granulometric processing results (separately for Forest Data Bank/Cadastre and object segmentation polygons) were combined into 4 separate band compositions (2 methods vs. 2 sets of polygons) and subjected to the supervised parametric classification of maximum likelihood using the 40 training polygons (twenty for each class) randomly dispersed across the study site but within the borders of Forest Data Bank/Cadastre and the object segmentation polygons representing "Orchards" and "Other areas with forest vegetation."

Accuracy assessment
We chose a number of randomly dispersed non-orchard polygons of similar area to the orchard polygons and assessed how well we discriminated between orchards and nonorchards using a confusion matrix. We repeated the selection of non-orchards 3 times to obtain a measure of uncertainty in our results. The accuracy of the analysis carried out in this way was determined using the overall accuracy, Kappa coefficient, and commission (overestimation) and omission (underestimation) values. Overall accuracy is the amount of correctly classified pixels of total pixels of the image. Cohen's Kappa coefficient is a statistic that is thought to be a more robust measure as it takes into account the possibility of the agreement occurring by chance. The omission and commission values represent the underestimation and overestimation of each class.

Results
The values of average nearest neighbor, the K-Ripley coefficient, and the mean values of 36 layers (resulting from the granulometric processing described in the previous section) were assigned to the polygons representing orchards and other lands with forest vegetation. Then, basic statistics, i.e., minimum, maximum, amplitude, mean, standard deviation, and quartiles, were calculated for these polygons. The differences (∆) between the mean values for orchards and other lands with forest vegetation are shown in Tables 2 and 3. Further analysis was performed on the layers for which the difference between the mean values (for orchards and other areas with forest vegetation) was more than 30, both for the Forest Data Bank/Cadastre and object segmentation polygons. The threshold was chosen based on visual analysis of the results of the granulometric analysis. One of the layers ("clip _rec_close1") was rejected, based on the visual analysis as well.
The following layers were selected for further analysis: 3rd band of the granulometric analysis based on closing on the image representing the average reflectance intensity-"Int_close3". 3rd band of the granulometric analysis based on closing on the image representing the average reflectance intensity clipped to the borders of areas with forest vegetation-"Clip_close3". 1st band of the granulometric analysis based on opening by reconstruction on the image representing the average reflectance intensity clipped to the borders of areas with forest vegetation-"Clip_rec_open1". 2nd band of the granulometric analysis based on opening by reconstruction on the image representing the average reflectance intensity clipped to the borders of areas with forest vegetation-"Clip_rec_open2". 1st band of granulometric analysis based on opening on an image representing a mask of tall vegetation-"Mask_ open1". 3rd band of granulometric analysis based on closing on an image representing a mask of tall vegetation-"Mask_ close3". 3rd band of granulometric analysis based on opening by reconstruction on an image representing a mask of tall vegetation-"Mask_rec_open1".
Two classes were distinguished in the resulting images: "Orchards" and "Other tree vegetation." In this way, two sets of layers (created on the basis of polygons from Forest Data Bank/Cadastre and object segmentation) were obtained, 7 layers each. In both cases, a composition was created from 7 selected bands and put under a supervised classification using the maximum likelihood algorithm, based on the "training points" layer. Two compositions were made out of four pattern analysis bands for the Forest Data Bank/Cadastre and the object segmentation polygons. As in the case of the resulting images, two classes were distinguished on the two obtained reference maps: "Orchards" and "Other areas with forest vegetation." The Kappa values of the classification based on the granulometric bands and the Forest Data Bank/Cadastre polygons varied between the 3 sets of test polygons and achieved higher values (89.2-95.3%) than the classification based on the pattern analysis bands (15.5-47.2%). The Kappa values of the classification based on the granulometric bands and the object segmentation polygons were also higher values (79.4-92.8%) than the classification based on the pattern analysis bands (43-58.7%) ( Table 4). The Kappa values of the classification based on the granulometric bands achieved higher values (89.2-95.3%) for the Forest Data Bank/ Cadastre polygons than the object segmentation polygons (79.4-92.8%). This may be associated with greater regularity and consistency of the areas separated in this way. However, in the case of pattern analysis composition, the trend is opposite. The values of the classification based on the pattern analysis bands achieved higher values (15.5-47.2%) for the Forest Data Bank/Cadastre polygons than the object segmentation polygons (43-58.7%). The classification performed on the composition of granulometric analysis seems to be more reliable here (Table 4).
In the accuracy analysis performed using polygons from Forest Data Bank/Cadastre and the reference map, the higher accuracy was obtained for the 1st band of the granulometric analysis based on opening by reconstruction derived from the image representing the intensity of reflectance -7.6 (5. 6-29.4) clipped to the borders of areas with high forest vegetation ("Clip_rec_open1"; Kappa = up to 73.5%) and the 3rd band of the granulometric analysis based on closing performed on the image representing a high vegetation mask (> 2 m) ("Mask_close3"; Kappa = up to 80.6%). The indices of the granulometric maps most useful in distinguishing the two analyzed classes indicate the characteristic features of the grains of the analyzed texture (Table 5).
In the accuracy analysis performed using object segmentation polygons and the reference map, the ranking of granulometric analysis products, based on the accuracy of classification, is generally the same, as in the case of the analysis presented in Table 5. However, the results are significantly (and systematically) worse than in the previous case. Again, the highest accuracy was obtained for the 1st granulometric map based on opening by reconstruction, representing the reflectance intensity clipped to the borders of areas with high vegetation ("Clip_rec_open1"; Kappa = up to 57.4%) and the 3rd granulometric band based on closing derived from the image representing a mask of high vegetation (> 2 m) ("Mask_close3"; Kappa = up to 60.3%) ( Table 6).
The same as for the classification performed based on the composition of bands, the Kappa values of the classification based on the granulometric bands achieved higher values for the Forest Data Bank/Cadastre polygons than the object segmentation polygons. Figure 5 shows the results of the classifications based on the granulometric analysis using Forest Data Bank/Cadastre and object segmentation polygons. For representative purposes, the results were compared with a fragment of a digital orthophotomap with the spatial resolution of 0.5 m representing the largest orchard clusters in the research area of Milicz. The areas representing orchards are marked with dark gray and other areas with light gray. The figures also show the boundaries between the polygons representing orchards and other areas with forest vegetation (or without it) from the reference maps created on the basis of the Forest Data Bank/Cadastre and object segmentation polygons.

Discussion
Due to the nature of the structure of forests and orchards, this study uses an analysis based on morphological operations (opening and closing)-simple and by reconstruction. The rationale for the use of the reconstruction-based granulometric analysis is the possibility of sparsely planted wooded areas with an orchard-like texture in the vicinity of much denser tree stands, and these areas should be differentiated by this analysis. The extraction of orchards, vineyards, and other regularly distributed high-growing crops, performed by numerous authors, resulted in an accuracy over 90% (Tanada and Blanco, 2016;Gordon and Philipson, xxxx;Geneletti and Gorte 2003;Viau et al. 2005;Warner and Steinmaus 2005;Trias-Sanz 2006;Kumar et al. 2008;Mathews and Jensen, 2012;Fieber et al. 2013;Sicre et al. 2014;Nolan et al. 2015;Karakizi et al. 2016;Mirakhorlou et al. 2017;Roy et al. 2018), using various types of indicators, classification types, models, and other methods.
The pattern analysis resulted in relatively low and unstable accuracy which does not reflect the regularity of the granulometric analysis results. The single band thresholding results were not included, due to the inability to visually distinguish orchards and other areas with forest   vegetation, even closely to the reference image. However, it turned out that satisfactory accuracy (Kappa value up to 95.3%) can also be obtained by using granulometry-based methods. The best results were obtained using the classification on the composition of selected granulometry bands. The supervised classification of the composition of bands is easier and requires less time than the selection of proper thresholds. It does not require many training polygons, just well selected. For the single bands, the best results were obtained using the thresholding on the image representing the values of the 3rd granulometry band by closing on the mask of high vegetation ("Mask_close3") and the 1st granulometry band by opening and reconstruction on the image representing the reflectance intensity clipped to the borders of the areas with high vegetation ("Clip_ rec_open1"). This is due to the relatively small size of objects-trees (opening, index 1; granulometry based on opening analyzes the amount of brighter objects in the image) and a slightly larger size of the space between them (closing, index 3; granulometry based on closing analyzes the amount of darker objects in the image), especially in the case of orchards. The texture in this respect turns out to be a distinctive feature for these two classes, allowing a relatively high (Kappa = 57.4-80.6%) classification accuracy. The results obtained for other granulometric maps are characterized by lower accuracy (with some Fig. 5 The results of orchard classification based on 2 best results of granulometric analysis and the Forest Data Bank/Cadastre and the segmentation polygons exceptions), which proves that the two classes are more similar in this aspect of texture. International organizations such as the FAO/UN and the UNFCCC require signatory countries to maintain and increase the area of forests and other areas with forest vegetation. The methodology proposed in this article can help distinguish forests from orchards (or other regular crops of over 2 m height) with a consistent and objective methodology. Forests play an important role in the accumulation of carbon stocks, the release of which into the atmosphere would increase the negative effects of climate change. In addition, forests are important for biodiversity, provide environmental services, and influence quality of life and health. Regular crops imitating forests (such as orchards) are able to accumulate carbon and play some role in the ecosystem as well, but nowhere near compared to natural forests. So, by using good monitoring methods regarding forest vegetation, we are not only improving this tool, but also solving many other problems that contribute to the management and protection of forest vegetation.

Conclusions
Granulometry has proved to be a useful tool in the classification of orchards and other areas with forest vegetation. The methods based on granulometry provided better results than the pattern analysis. The analysis based on the Forest Data Bank/Cadastre polygons provided better results than the analysis based on the segmentation polygons. Classification of orchards and other areas with forest vegetation is important in the context of reporting forest area to international organizations, forest management, and mitigating effects of climate change.