Automatic Identiﬁcation of Forest Disturbance Drivers Based on Their Geometric Pattern in Atlantic Forests

: Monitoring forest disturbances has become essential towards the design and tracking of sustainable forest management. Multiple methodologies have been developed to detect these disturbances. However, few studies have focused on the automatic detection of disturbance drivers, an essential task as each disturbance has different implications for the functioning of the ecosystem and associated management actions. Wildﬁres and harvesting are two of the major drivers of forest disturbances across different ecosystems. In this study, an automated methodology is presented to automatically distinguish between the two once the disturbance is detected, using the properties of its geometry and shape. A cluster analysis was performed to automatically individualize each disturbance and afterwards calculate its geometric properties. Using these properties, a decision tree was built that allowed for the distinction between wildﬁres and harvesting with an overall accuracy of 91%. This methodology and further research relating to it could pose an essential aid to national and international agencies for incorporating forest-disturbance-driver-related information into forest-focused reports.


Introduction
Forests play an essential role in modern societies since they provide valuable resources and ecosystem services, such as carbon and biodiversity reservoirs [1][2][3]. However, nowadays forests are threatened by anthropogenic pressures and climate change [1,4]. In this context, monitoring forests and their disturbances has become essential for developing management strategies oriented towards increasing forests' resilience and guaranteeing their sustainability [5,6]. Since the signing of the Paris Agreement [7], which mandated the reporting of forest changes, carbon emissions and uptakes, a special need has arisen among national and international agencies to conduct forest disturbances detection, identification and quantification.
Wildfires have been identified as one of the major drivers of forest disturbances across many different ecosystems. According to the Global Forest Goals Report 2021, fire has been a prevalent forest disturbance in the tropics [8], as well as in boreal and temperate forest ecosystems [9,10]. However, nowadays the fire regimes in the forested ecosystems of Europe are currently gaining great relevance due to climate change [11]. In fact, climate driven wildfires have been highlighted by many countries as a significant threat compromising forest-based economic, social and environmental benefits [8]. Therefore, monitoring wildfires is a key point in forest change reporting.
Another relevant forest disturbance is forest harvesting. Worldwide national and international forest agencies put great effort into minimizing illegal logging in order to reduce the degradation of forest ecosystems [12,13]. However, it is also important for management agents to monitor legal forest harvesting in order to design sustainable management policies and evaluate progress towards a sustainable bio-economy. The monitoring of forest harvestings is also needed to ensure the traceability of forest products and to fulfill the European regulations related to legal wood supply [14]. Such forest monitoring becomes essential in countries or regions whose forests are mostly privately owned and with a very active forest sector. In these kind of areas, traditional forest harvesting monitoring based on field visits may not be feasible or fully efficient.
In this context, the open access to satellite data has marked a turning point in forest monitoring and disturbance detection [15]. Satellites, such as the family of Landsats and Sentinel-2, have been widely used to report forest extent and forest changes [15]. In the field of change detection, a wide range of methodologies and algorithms have been developed that are currently well established and widely used. The most efficient ones rely on time series analysis [16]. Some examples are the family of Breaks For Additive Season and Trend (BFAST) algorithms [17][18][19], LandTrender [20] and Continuous Change Detection and Classification (CCDC) [21]. As a result of the application of these kinds of algorithms, the location and extent of the disturbances can be automatically obtained, as well as the timing and degree of the disturbance.
However, proper forest monitoring requires more than the location and extent of disturbances. It would also be essential to automatically identify the drivers of the detected disturbances since each type of driver (i.e., harvesting, wildfires, windstorms or insect plagues) has different implications for forest management agents [22][23][24]. Forest agencies have traditionally derived this information from pre-existing databases [25][26][27][28], visual interpretation of very high-resolution images (VHR-imagery) [29] or even field work. The main drawbacks of these kind of approaches are that they can be outdated, time-consuming and/or expensive. Again, remote sensing constitutes an alternative for automating the identification of drivers in forest disturbances through efficient affordable procedures. In this field, spectral indexes have showed to be efficient in the detection of wildfires (as for example the NBR [30]), but they do not allow to differentiate wildfires from harvestings (the latter are usually detected as wildfires) [31,32]. Consequently, new approaches are needed.
Some interesting advances have been recently achieved in the identification of drivers of forest disturbances. In the context of disturbances in the USA, an algorithm has been developed that correlates each type of disturbance with the "shape" of the spectral response of the disturbance area over time [33]. This technique has been enhanced by including additional predictor variables that again rely on information, such as pre-existing fire-burn perimeters from pre-existing databases [34]. This enables the prediction of the disturbance causality on a large scale in US forests. A different study performed in boreal forests found that studying the spectral response of the short-wave infrared (SWIR) portion of the electromagnetic spectrum of the soil left behind after a disturbance provides an effective way of discriminating between harvesting and wildfires [35]. Supervised classification of different spectral metrics together with geometric metrics of changes has been used as well by Hermosilla et al. [36]. However, it is worth noting that Neigh et al. found difficulties when transferring between different ecosystems' methods that rely on spectral response to detect disturbance drivers [37]. This can severely affect the application of such methods on very large scales [37]. Therefore, it may be valuable to analyze a different approach that is less dependent on the radiometric responses of the disturbances.
This study presents a novel methodology based on Sentinel-2 data to automatically identify the driver of forest disturbances, once they have been detected. Since previous studies make the point that forest disturbances such as wildfires and harvestings manifest different shapes [35,38,39], the proposed methodology uses the geometric and shape properties of the disturbances to identify the potential driver. It is applied to Galicia (Northwest of Spain), a region characterized by numerous and intense forest harvestings, frequent wildfires and a very high rate of private forests owners (99% of Galician forests are privately owned) [40,41]. Section two describes the study area, materials and methodology. Section three presents the obtained results. Section four accomplishes the discussion. Section five comes to the conclusions of the described methodology and its implementation to the selected study area.

Study Area
The present study was carried out in Galicia, a region in the Northwest of Spain (see Figure 1). This region encompasses a total of 29,574 km 2 , nearly 70% of which is covered by forest land [40]. Forest area is mainly covered by Pinus sp., Eucalyptus sp. and mixed broadleaves (riparian species, Quercus robur, Quercus pyrenaica and Castanea sativa, among others) [40,42]. Shrubs are abundant as well, comprising approximately 30% of the forest land cover [40,42].

Study Area
The present study was carried out in Galicia, a region in the North Figure 1). This region encompasses a total of 29,574 km 2 , nearly 70% of by forest land [40]. Forest area is mainly covered by Pinus sp., Eucalyp broadleaves (riparian species, Quercus robur, Quercus pyrenaica and Cast others) [40,42]. Shrubs are abundant as well, comprising approximately land cover [40,42]. The main forest disturbances in Galicia are due to wildfires and ha tion to wildfires, it might be highlighted that although Galicia represe Spanish land surface, between 2006 and 2015 it reported the largest n events in Spain (it reported 29.22% of all Spanish wildfire events occurr two years) and accounted for the largest percentage of burned surface ( Spanish burned surface in these years) [43].These trends have continue sequent years [44].
In relation to harvestings, it should be noted that Galicia has an act [40,41,45,46]. More than half of the annual fellings in Spain are harvest making this area one of the European regions with the highest percen The main forest disturbances in Galicia are due to wildfires and harvestings. In relation to wildfires, it might be highlighted that although Galicia represents only 6% of the Spanish land surface, between 2006 and 2015 it reported the largest number of wildfire events in Spain (it reported 29.22% of all Spanish wildfire events occurring between these two years) and accounted for the largest percentage of burned surface (22.51% of the total Spanish burned surface in these years) [43]. These trends have continued throughout subsequent years [44].
In relation to harvestings, it should be noted that Galicia has an active forestry sector [40,41,45,46]. More than half of the annual fellings in Spain are harvested in Galicia [45], making this area one of the European regions with the highest percentage of harvesting intensity [46]. Most forest harvesting actions are produced on private forests since 99% of Galician forests are privately owned [41]. Management actions are rarely uniform across the region as land property is highly fragmented; approximately 40% of the land covered by the main productive tree species in Galicia are in cadastral parcels smaller than 0.5 ha [47]. The mean of forest harvesting processed records per year in the last six years (2015-2020) amounts to 88,140 [41].

Satellite Image Acquisition
Sentinel-2 images were used in this study to perform the disturbance-detection-related steps. The data product used was the Level-2A, which includes geometric, radiometric and atmospheric corrections [48]. Images covering the whole study area were downloaded from the Copernicus Open Access Hub [49]. One image per month was downloaded for all the study years (2018, 2019 and 2020). The criteria to select the images were: for each month select the image with smallest percentage of cloud cover, discard images with a cloud cover smaller than 50% and finally if no image satisfies the maximum cloud-cover criterion, select an image at the end of the previous month or at the beginning of the following month.
Information from the European Forest Fire Information System (EFFIS) [50] was used as complementary data to obtain the location of wildfires that took place in Galicia during the period analyzed.
Information related to harvesting locations was obtained from one of the regional government's databases, which is shared by the Xunta de Galicia in the context of the "Galician continuous forest inventory" project [51]. It consisted of a set of cadastral reference numbers where harvesting licenses were granted during the period from 2017 to 2019. This information was cross-referenced with the Spanish cadastral geodatabase [47] to obtain the exact location of the harvesting activities.
The information obtained from these two sources was contrasted using VHR aerial orthorectified images (PNOA images) dating from 2017 and from 2020 [52]. The images are available online through the Spanish National Cartographical Institute (IGN) [52]. The images from 2017 have a spatial resolution of 0.25 m and their georeferencing mean square error is ≤0.5 m. The images from 2020 have a spatial resolution of 0.15 m and a georeferencing mean square error of ≤0.20 m.

Main Method
This section describes the methodology to automatically obtain the driver of a forest disturbance, considering the forest disturbance map as the starting point. It might be highlighted that the methodology can be applied whatever procedure is followed to obtain the disturbances map. The first subsection (Section 2.3.1.) describes the particular procedural chain that was performed in this case to obtain the disturbance map of the study area, which is the region of Galicia. The following subsections (Sections 2.3.2 and 2.3.3) are focused on describing in detail the proposed novel methodology to distinguish among forest disturbance drivers. A diagram of the main method is presented in Figure 2. downloaded from the Copernicus Open Access Hub [49]. One image per month was downloaded for all the study years (2018, 2019 and 2020). The criteria to select the images were: for each month select the image with smallest percentage of cloud cover, discard images with a cloud cover smaller than 50% and finally if no image satisfies the maximum cloud-cover criterion, select an image at the end of the previous month or at the beginning of the following month. Information from the European Forest Fire Information System (EFFIS) [50] was used as complementary data to obtain the location of wildfires that took place in Galicia during the period analyzed.
Information related to harvesting locations was obtained from one of the regional government's databases, which is shared by the Xunta de Galicia in the context of the "Galician continuous forest inventory" project [51]. It consisted of a set of cadastral reference numbers where harvesting licenses were granted during the period from 2017 to 2019. This information was cross-referenced with the Spanish cadastral geodatabase [47] to obtain the exact location of the harvesting activities.
The information obtained from these two sources was contrasted using VHR aerial orthorectified images (PNOA images) dating from 2017 and from 2020 [52]. The images are available online through the Spanish National Cartographical Institute (IGN) [52]. The images from 2017 have a spatial resolution of 0.25 m and their georeferencing mean square error is ≤0.5 m. The images from 2020 have a spatial resolution of 0.15 m and a georeferencing mean square error of ≤0.20 m.

Main Method
This section describes the methodology to automatically obtain the driver of a forest disturbance, considering the forest disturbance map as the starting point. It might be highlighted that the methodology can be applied whatever procedure is followed to obtain the disturbances map. The first subsection (Section 2.3.1.) describes the particular procedural chain that was performed in this case to obtain the disturbance map of the study area, which is the region of Galicia. The following subsections (Sections 2.3.2 and 2.3.3) are focused on describing in detail the proposed novel methodology to distinguish among forest disturbance drivers. A diagram of the main method is presented in Figure 2.  The forest disturbance map of Galicia was obtained through the comparison of three consecutive annual land cover maps, specifically those corresponding to 2018, 2019 and 2020. The annual land cover maps were obtained according to the multitemporal approach described in Alonso et al. [53]. This approach was specifically designed to optimize the results considering the singularities of the Galician forest sector. It consists of performing supervised classifications of multi-temporal series of Sentinel-2 images. At first, one image is selected per single month for the study area; in that way, a group of twelve images is considered for every analyzed year. All the images are pre-processed to remove clouds using the cloud mask provided by the Sentinel Level 2A product. Secondly, a dataset of training pixels is defined for every year. In our case, it was performed through photointerpretation of the available PNOA images, dating from 2017 and from 2020 [52], as well as Sentinel-2 images. Contours of pixels with homogeneous land use were delineated to define the training polygons of the map classes. Once the 2019 training dataset was defined, it was used as a starting point for building the dataset for 2018 and 2020: considering the photointerpretation, the polygons were redefined if any land use change was appreciated. As a result, the size of training pixels datasets was similar for every classification. Thirdly, a supervised classification is performed on every single Sentinel image. For this end, bands with 10 m resolution were resampled to 20 m resolution using the nearest neighborhood interpolation. The classifications were performed applying the random forest algorithm [54]. As a result of this process, land cover maps are obtained for every single month. The last step is the aggregation of the 12 maps of a given year in a final map. This step was performed using the plurality voting decision criteria [53]. In that way, one aggregated map is obtained for every analyzed year. The three obtained maps were cross-verified using a set of points randomly distributed around the study area. The ground truth of those points for each year was obtained by photointerpretation of PNOA images from 2017 and from 2020 [52]. The cross-verification was performed trough the creation of a confusion matrix for each year.
Once land cover maps were obtained for 2018, 2019 and 2020, inter-annual comparisons were performed to detect potential land cover changes. For this end, pairs of maps were combined into a single raster file through a simple transformation and arithmetic raster calculation. A multiplication factor was applied to the raster file MAPyear 1 , and the result was added to the MAPyear 2 . A graphic illustration of this combination is presented in Figure 3. The digital values of the resulting raster file preserve the land cover classes of the two combined maps for every single pixel: the first digit of every pixel corresponds to the class of the earlier year, while the last digit corresponds to the class of the later year. In this way, changes involving vegetation reduction can be easily detected. For instance, pixels where a tree forest class has moved to any other non-tree class are considered disturbances, and pixels where shrubs have moved to another non-vegetated class are considered disturbances as well. Therefore, all the pixels that in 2018 had a forest-related cover (Eucalyptus spp., Conifer, Broadleaf or Shrub) and in 2019 had a land cover, which implied a vegetation reduction (shrubs, crop and pastures, bare soil or anthropogenic area), were considered as change pixels. The same criterion was followed to detect vegetation changes in the biennium in 2019-20. All the digital values reflecting these kind of changes were reclassified to obtain a binary result (0-1), and therefore obtain a disturbance/non-disturbance raster file. The analyzed bienniums were combined through raster layer addition. The result can be considered the disturbance map of the whole study area in the considered period.

Disturbance Pixels Clustering
Once the forest disturbance map is available for the study area, the identification of the drivers can be accomplished over it. As previously pointed out, the subsequent procedure does not depend on the methodology that was followed to obtain the disturbances map. The only requirement is to arrange the disturbance raster layer in a binary format, where pixels with digital value 1 correspond to a disturbance occurring in the considered

Disturbance Pixels Clustering
Once the forest disturbance map is available for the study area, the id the drivers can be accomplished over it. As previously pointed out, the su cedure does not depend on the methodology that was followed to obtain th map. The only requirement is to arrange the disturbance raster layer in a b where pixels with digital value 1 correspond to a disturbance occurring in t period and pixels with digital value 0 correspond to areas where forest v remained unaltered.
The first step to be accomplished is the clustering of those pixels of th map that correspond to the same disturbance event. This step is needed to tomatically individualize each disturbance event to further perform the geo acterization of each of them. The basic criterion to stablish the grouping is of pixels. A cluster analysis of the disturbance pixels was performed. The a was the Density-Based Cluster Algorithm (DBSCAN) [55]. This algorithm vised learning method, which is efficient at finding clusters of arbitrary sh dealing with large databases. It allows to identify regions of high point dens be assimilated to disturbance events, and regions of low point density, wh not have any other point close enough to be grouped in a cluster. Two para configured: the minimum number of points needed in a region to be con (minPts) and a distance threshold above which it is not possible to consider be neighbors (eps). Several values of both parameters might be tested to fin tion that optimize the result for the study area. The result of this step is a disturbance events. The first step to be accomplished is the clustering of those pixels of the disturbance map that correspond to the same disturbance event. This step is needed to be able to automatically individualize each disturbance event to further perform the geometrical characterization of each of them. The basic criterion to stablish the grouping is the proximity of pixels. A cluster analysis of the disturbance pixels was performed. The algorithm used was the Density-Based Cluster Algorithm (DBSCAN) [55]. This algorithm is an unsupervised learning method, which is efficient at finding clusters of arbitrary shape and when dealing with large databases. It allows to identify regions of high point density, which can be assimilated to disturbance events, and regions of low point density, where points do not have any other point close enough to be grouped in a cluster. Two parameters can be configured: the minimum number of points needed in a region to be considered dense (minPts) and a distance threshold above which it is not possible to consider two points to be neighbors (eps). Several values of both parameters might be tested to find a configuration that optimize the result for the study area. The result of this step is a map of forest disturbance events.

Disturbance Clusters Characterization
Once change disturbance events are mapped, they can be characterized in terms of shape-related and pattern-related metrics. According to previous experience, the shape of a wildfire usually tends to be close to an irregular ellipse with a large number of snags, resulting in a disturbance area with irregular boundaries [35,38,39]. On the contrary, harvesting tends to have a more regular shape and to form a continuous patch [35,38,39]. Several metrics were obtained to characterize every single disturbance cluster.
A first set of metrics was obtained to study how the pixels in each cluster in the map of forest disturbance events are distributed in relation to one another. All metrics were based on a previously calculated distance matrix containing the Euclidean distance between each pair of pixels in a cluster. The diagonal of the obtained distance matrix was set to NoData to avoid including distances of 0 in the subsequent calculations. The metrics obtained are presented in Table 1. Table 1. Set of metrics calculated for each detected disturbance cluster to study how the pixels of the cluster are distributed in relation to one another.

mdmin
The average of the minimum distance from each pixel to the nearest pixel in the cluster. To obtain this, a matrix was computed containing the distances from each pixel to its nearest pixel. Then the mean of the values contained in the minimum-distance matrix was calculated. mdmax The average of the maximum distance from each pixel to the farthest pixel in the cluster. To obtain this, a matrix was computed containing the distance from each pixel to its farthest pixel. Then the mean of the values contained in the maximum-distance matrix was calculated. mdmean The average of the average distances from each pixel to the rest of the cluster pixels.
To obtain this, a matrix was computed containing the average of the distances from each pixel to the rest of the pixels. Then the mean of the values contained in the mean-distance matrix was calculated.

mdsd
The average of the standard deviation of the distances from each pixel to the rest of the cluster pixels. To obtain this, a matrix was computed containing the standard deviation of the distances from each pixel to the rest of the pixels. Then the mean of the values contained in the standard-deviation matrix was calculated. mdkur The average of the kurtosis of the distances from each pixel to the rest of the cluster pixels. To obtain this, a matrix was computed containing the kurtosis of the distances from each pixel to the rest of the pixels. Then the mean of the values contained in the kurtosis matrix was calculated. mdskew The average of the skewness of the distances from each pixel to the rest of the cluster pixels. To obtain this, a matrix was computed containing the skewness of the distances from each pixel to the rest of the pixels. Then the mean of the values contained in the skewness matrix was calculated.
A second set of metrics for all the clusters present in the map of forest disturbance events was calculated. This one was focused on metrics related to the distance from each pixel in the cluster to the center of gravity of the cluster. Therefore, the center of gravity of each cluster was calculated and afterwards metrics presented in Table 2 were calculated: Table 2. Set of metrics calculated for each detected disturbance cluster to study the position of pixels in the cluster in relation to the center of gravity of the cluster.

Metric Description
dmeanCOG The average distance from each of the cluster of pixels to the cluster's center of gravity. dminCOG The distance from the closest cluster of pixels to the cluster's center of gravity. dmaxCOG The distance from the farthest cluster of pixels to the cluster's center of gravity. dsdCOG The standard deviation of the distances of each of the clusters of pixels to the cluster's center of gravity. dkurtCOG The kurtosis of the distances of each of the clusters of pixels to the cluster's center of gravity. dskewCOG The skewness of the distances of each of the clusters of pixels to the cluster's center of gravity.
An additional set of metrics was calculated for all the clusters present in the map of forest disturbance events, based on the polygon formed by each cluster. To obtain this polygon for each cluster, the concave hull [56] of the polygon of the cluster was calculated using the k-nearest neighbor approach to compute the region occupied by a set of points. The value of k is used to control the concaveness of the output polygon, resulting in the Remote Sens. 2022, 14, 697 8 of 18 "smoothest" polygon when the lowest k value is used. In this study, k was set to 3. Using the resulting polygons, the shape metrics presented in Table 3 were calculated for each cluster.
2 * (A * π) /P (1) Table 3. Set of metrics calculated on the polygon formed by each disturbance cluster. The polygon was obtained from the concave hull of the disturbance pixel centers.

P.A
The perimeter of the polygon divided by its area. P.sqrt.A The perimeter of the polygon divided by the square root of its area.

Max.Distan
The maximum diameter calculated as the maximum distance between the vertices of two of the polygon's parts. Equation (1). Equation of the Shape Index. A stands for the area of the polygon. P stands for the perimeter. Equation from [57].

Disturbance Drivers Modeling (Wildfires or Harvesting)
This step aims at classifying forest disturbance events into wildfires or harvestings considering the obtained shape and pattern metrics. For this end, a ground truth dataset is required. It was built by selecting clusters in the map of forest disturbance events and assigning them to either wildfires or harvestings. Wildfires were identified through the EFIS geodatabase [50]. Harvesting areas were identified through the information provided by the Xunta de Galicia [51]. The information provided by these two external sources was verified and filtered through photointerpretation by comparing the 2017 PNOA [52] and the 2020 PNOA [52]. Finally, a total of 304 clusters were selected, 170 corresponding to wildfires and 134 to harvesting areas. Although the sample might seem scarce, it was not augmented to avoid having a big class imbalance. A bigger amount of harvesting reference data was available but more records of wildfires during the study period were not found.
The modeling technique used was decision-tree learning through the building of a classification tree. This technique predicts the value of a target variable by learning simple decision rules inferred from the features of the data [58]. This technique was selected due to its simplicity of implementation and interpretation. In fact, there exists previous scientific literature related with land forest disturbances that relies on this technique [59,60]. In order to implement it, the ground truth set of data was randomly divided into training and verification datasets. Seventy percent of the clusters were used to train the decision tree and the other 30% were used to verify it. The decision tree was built on the training dataset using the default parameters of the "rpart" library of the R software [61]. This library also allows to calculate the variable importance of all the variables that formed the tree, understood as "the sum of the goodness of split measures for each split for which it was the primary variable, plus the adjusted agreement for all splits in which it was a surrogate". The library scales the results to sum 100 and omits the variables with an importance smaller than 1%.
The resulting decision tree allowed to classify the disturbances and obtain a map of wildfires and harvestings. This map was cross-verified with the remaining ground truth subset. A confusion matrix was built, and the following accuracy metrics were obtained: Overall Accuracy (OA), Users Accuracy (UA), Producers Accuracy (PA) and Kappa Index. As the decision tree created may change as the training dataset changes, the presented modeling steps were repeated several times in order to obtain multiple decision trees and multiple sets of accuracy metrics. Finally, the decision tree that produced the best accuracy metrics was selected.

Results
This section is divided in four subsections. In Section 3.1. results of the disturbances mapping are presented. The next section, Section 3.2., focuses on presenting the results of the disturbance clustering procedure. Section 3.3. analyzes the behavior of the shaperelated metrics calculated for the clusters presented in the previous section depending on their disturbance nature (harvestings or wildfire). The final section presents the results of the disturbance drivers modelling step.

Disturbances Mapping
The annual land cover maps were obtained for the whole study area following the methodology described in the previous section. The considered map classes were: conifers, Eucalyptus spp., other broadleaves, shrubs, crops and pastures, bare soils, anthropogenic areas and water bodies. A total of 1671 random points were obtained to perform the cross-verification. The estimated overall accuracy obtained for each map was: 78% for the 2018 map and 79% for both 2019 and 2020 maps.
The comparison of consecutive annual maps was performed to obtain the disturbances map. The result is a binary raster file showing all the pixels where the vegetation was significantly reduced. The raster layer addition of the disturbances map corresponding to 2018-19 and the disturbances map of 2019-20 results in the final disturbance map for the study area in the analyzed period. An example is shown in Figure 4.

Results
This section is divided in four subsections. In Section 3.1. results of the disturbances mapping are presented. The next section, Section 3.2., focuses on presenting the results of the disturbance clustering procedure. Section 3.3. analyzes the behavior of the shape-related metrics calculated for the clusters presented in the previous section depending on their disturbance nature (harvestings or wildfire). The final section presents the results of the disturbance drivers modelling step.

Disturbances Mapping
The annual land cover maps were obtained for the whole study area following the methodology described in the previous section. The considered map classes were: conifers, Eucalyptus spp., other broadleaves, shrubs, crops and pastures, bare soils, anthropogenic areas and water bodies. A total of 1671 random points were obtained to perform the cross-verification. The estimated overall accuracy obtained for each map was: 78% for the 2018 map and 79% for both 2019 and 2020 maps.
The comparison of consecutive annual maps was performed to obtain the disturbances map. The result is a binary raster file showing all the pixels where the vegetation was significantly reduced. The raster layer addition of the disturbances map corresponding to 2018-19 and the disturbances map of 2019-20 results in the final disturbance map for the study area in the analyzed period. An example is shown in Figure 4.

Disturbances Clustering
As a result of the cluster analysis, a total of 25,307 disturbance clusters were obtained for the period of 2018-2020. An example of several of the clusters obtained is shown in Figure 5. The configuration parameters that optimize the results considering the singulari-ties of the disturbances in the study area were: 10 minPts and 100 eps. High minPts values involve wildfire ledges that are not considered individual disturbances (i.e., the ledge of the cyan blue wildfire cluster in Figure 5). Furthermore, close individual harvests are efficiently clustered together (i.e., marked harvests in Figure 5). Low eps values prevent large wildfires from being identified as single disturbances (i.e., wildfires marked in Figure 4 that correspond to one single wildfire event). It was observed that those clusters identified as wildfires maintained the typical shape and point distribution of a wildfire.
As a result of the cluster analysis, a total of 25,307 disturbance clusters were obtained for the period of 2018-2020. An example of several of the clusters obtained is shown in Figure 5. The configuration parameters that optimize the results considering the singularities of the disturbances in the study area were: 10 minPts and 100 eps. High minPts values involve wildfire ledges that are not considered individual disturbances (i.e., the ledge of the cyan blue wildfire cluster in Figure 5). Furthermore, close individual harvests are efficiently clustered together (i.e., marked harvests in Figure 5). Low eps values prevent large wildfires from being identified as single disturbances (i.e., wildfires marked in Figure 4 that correspond to one single wildfire event). It was observed that those clusters identified as wildfires maintained the typical shape and point distribution of a wildfire. Figure 5. Examples of the cluster analysis results. On the left: disturbances binary raster in red disturbance pixels. On the right: the result of the cluster analysis; all the pixels that belong to a cluster are shown in the same color. Examples of wildfires and harvesting disturbances and clusters are highlighted.

Disturbances Clusters Characterization
All the obtained clusters were characterized in terms of shape and pattern according to the metrics described in the previous section. In order to accomplish the subsequent step, consisting of modelling disturbances in the study area, the metrics of a ground truth dataset were analyzed in detail. For this end, 304 clusters were selected, 170 corresponding to wildfires and 134 to harvesting areas. They were identified using the complementary databases mentioned in Section 2.3.4 and filtered by photointerpretation. The metrics of these datasets are presented in Figures 6-8. The paired boxplots clearly show which metrics present different values depending on the driver of the disturbance. Several Figures reveal that wildfires have more outstanding outliers than harvestings and sometimes they take values in a wider range (mdmax, mdsd, MaxDistan, etc.). This might be due to some metrics being highly related to the size of the event (for instance mdmax or dmax-COG) and some others to the shape (mdmean, mdsd, etc.). The wildfires in the study area present an important variability of sizes and shapes; most of them affect small areas but some of them are large wildfires.

Disturbances Clusters Characterization
All the obtained clusters were characterized in terms of shape and pattern according to the metrics described in the previous section. In order to accomplish the subsequent step, consisting of modelling disturbances in the study area, the metrics of a ground truth dataset were analyzed in detail. For this end, 304 clusters were selected, 170 corresponding to wildfires and 134 to harvesting areas. They were identified using the complementary databases mentioned in Section 2.3.4 and filtered by photointerpretation. The metrics of these datasets are presented in Figures 6-8. The paired boxplots clearly show which metrics present different values depending on the driver of the disturbance. Several Figures reveal that wildfires have more outstanding outliers than harvestings and sometimes they take values in a wider range (mdmax, mdsd, MaxDistan, etc.). This might be due to some metrics being highly related to the size of the event (for instance mdmax or dmaxCOG) and some others to the shape (mdmean, mdsd, etc.). The wildfires in the study area present an important variability of sizes and shapes; most of them affect small areas but some of them are large wildfires. Remote Sens. 2022, 14, x FOR PEER REVIEW 11 of 18      Figure 6 shows the paired boxplots obtained for the metrics related to how the po in each cluster are distributed in relation to one another. The metrics that seem to mo differ between harvesting and wildfires are: mdmax, mdmean and mdsd. The larger ues of mdmean in wildfires might be due to their irregular shapes, which involve lar medium distances among points. The larger values of mdmax in wildfires might be to the occurrence of events with larger sizes than the size of harvestings. The combina of larger sizes and irregular shapes result in higher values of mdsd in wildfires. Figure 7 shows the paired boxplots obtained for the metrics related to the dista from each point in the cluster to the center of gravity of the cluster. The metrics wh present the least overlap between boxplots are dmaxCOG, dmeanCOG and dsdCO Here again it might be highlighted that the fact that dmaxCOG and dmeanCOG are lar in wildfires has to do with the larger size of wildfires and their more irregular shape. Figure 8 shows the paired boxplots obtained for the metrics calculated focused the shape of the polygon formed by the cluster. The greatest differences between the h vesting and wildfire boxplots can be seen in P.A and D.A followed by Max.Distan. In f above all the metrics presented, P.A and D.A are the ones that present the greatest dif ences. In small events, the quotient between perimeter and total area, or between diam and total area is higher. Harvestings usually present smaller areas than wildfires, and is reflected in their respective P.A and D.A boxplots.

Disturbance Driver Modeling (Wildfire or Harvesting)
Shape and pattern-based metrics of wildfires significantly differ from the charac istic metrics of harvestings. The decision tree that optimized the classification of distu ances according to its driver is shown in Figure 9. The metrics present in this tree are D mdmin and P.A. This indicates that generally, when the relation between the diagona the polygon formed by the cluster of a disturbance and the area of the polygon is gre than 0.01, that the disturbance is harvesting; however, when this relation is smaller, m conditions must be analyzed. First, if the average of the mdmin is relatively large (>30  Figure 6 shows the paired boxplots obtained for the metrics related to how the points in each cluster are distributed in relation to one another. The metrics that seem to mostly differ between harvesting and wildfires are: mdmax, mdmean and mdsd. The larger values of mdmean in wildfires might be due to their irregular shapes, which involve larger medium distances among points. The larger values of mdmax in wildfires might be due to the occurrence of events with larger sizes than the size of harvestings. The combination of larger sizes and irregular shapes result in higher values of mdsd in wildfires. Figure 7 shows the paired boxplots obtained for the metrics related to the distance from each point in the cluster to the center of gravity of the cluster. The metrics which present the least overlap between boxplots are dmaxCOG, dmeanCOG and dsdCOG. Here again it might be highlighted that the fact that dmaxCOG and dmeanCOG are larger in wildfires has to do with the larger size of wildfires and their more irregular shape. Figure 8 shows the paired boxplots obtained for the metrics calculated focused on the shape of the polygon formed by the cluster. The greatest differences between the harvesting and wildfire boxplots can be seen in P.A and D.A followed by Max.Distan. In fact, above all the metrics presented, P.A and D.A are the ones that present the greatest differences. In small events, the quotient between perimeter and total area, or between diameter and total area is higher. Harvestings usually present smaller areas than wildfires, and this is reflected in their respective P.A and D.A boxplots.

Disturbance Driver Modeling (Wildfire or Harvesting)
Shape and pattern-based metrics of wildfires significantly differ from the characteristic metrics of harvestings. The decision tree that optimized the classification of disturbances according to its driver is shown in Figure 9. The metrics present in this tree are D.A, mdmin and P.A. This indicates that generally, when the relation between the diagonal of the polygon formed by the cluster of a disturbance and the area of the polygon is greater than 0.01, that the disturbance is harvesting; however, when this relation is smaller, more conditions must be analyzed. First, if the average of the mdmin is relatively large (>30), it is a wildfire. If it is not, an additional condition needs to be analyzed. This last condition is the relation between the perimeter and the area of the polygon that is formed by the cluster (P.A). If this relation is smaller than 0.014, the disturbance is a wildfire; on the contrary, it is a harvesting event.
Remote Sens. 2022, 14, x FOR PEER REVIEW 13 of 18 is a wildfire. If it is not, an additional condition needs to be analyzed. This last condition is the relation between the perimeter and the area of the polygon that is formed by the cluster (P.A). If this relation is smaller than 0.014, the disturbance is a wildfire; on the contrary, it is a harvesting event. Figure 9. Decision tree that allows for the differentiation between harvesting and wildfires using the shape-related metrics calculated. D.A: the maximum diameter of the polygon formed by the cluster, calculated as the maximum distance between the vertices of two parts of the polygon, divided by the area of the polygon. mdmin: the average of the minimum distance from each disturbance pixel to its nearest disturbance pixel in the disturbance. P.A: the perimeter of the polygon formed by the disturbance divided by the area of the polygon.
The variable importance results are presented in Figure 10. Variables with a greater importance are D.A. and P.A., followed by Max.Distan and mdmax. It might be highlighted that although mdmin forms parts of the final decision tree, its importance by itself is around 10%. Figure 10. Variables' importance according to the decision-tree algorithm implemented through the library rpart [61]. Variables with importances' smaller than 1% were omitted.
The confusion matrix obtained by applying this decision tree to the test points is presented in Table 4. In general, high accuracy metrics were obtained. The OA was 91% and the UA and the PA range between 87% and 94%. The Kappa Index obtained was 0.82. Figure 9. Decision tree that allows for the differentiation between harvesting and wildfires using the shape-related metrics calculated. D.A: the maximum diameter of the polygon formed by the cluster, calculated as the maximum distance between the vertices of two parts of the polygon, divided by the area of the polygon. mdmin: the average of the minimum distance from each disturbance pixel to its nearest disturbance pixel in the disturbance. P.A: the perimeter of the polygon formed by the disturbance divided by the area of the polygon.
The variable importance results are presented in Figure 10. Variables with a greater importance are D.A. and P.A., followed by Max.Distan and mdmax. It might be highlighted that although mdmin forms parts of the final decision tree, its importance by itself is around 10%. Figure 9. Decision tree that allows for the differentiation between harvesting and the shape-related metrics calculated. D.A: the maximum diameter of the polygo cluster, calculated as the maximum distance between the vertices of two parts of vided by the area of the polygon. mdmin: the average of the minimum distance fr ance pixel to its nearest disturbance pixel in the disturbance. P.A: the perimete formed by the disturbance divided by the area of the polygon.
The variable importance results are presented in Figure 10. Variables importance are D.A. and P.A., followed by Max.Distan and mdmax. It lighted that although mdmin forms parts of the final decision tree, its impo is around 10%. Figure 10. Variables' importance according to the decision-tree algorithm impleme library rpart [61]. Variables with importances' smaller than 1% were omitted. The confusion matrix obtained by applying this decision tree to the test poin in Table 4. In general, high accuracy metrics were obtained. The OA was 9 and the PA range between 87% and 94%. The Kappa Index obtained was 0 The confusion matrix obtained by applying this decision tree to the test points is presented in Table 4. In general, high accuracy metrics were obtained. The OA was 91% and the UA and the PA range between 87% and 94%. The Kappa Index obtained was 0.82.

Discussion
The shape and pattern metrics of disturbances revealed that wildfires in the study area tend to affect larger surfaces and/or affected areas and present larger extensions than harvestings, which tend to present more regular shapes than wildfires. This analysis is aligned with previous experiences [35,38,39].
High accuracy metrics were obtained (OA 91%) for distinguishing between wildfires and harvesting through the study of the geometry and shape of these disturbances. This high OA accuracy and the accompanying high UA and PA high-accuracy metrics (between 87% and 94%) indicate that the described methodology could help to distinguish between wildfires and harvestings once a disturbance map was obtained. Besides, this indicates that, as previously pointed out, wildfires and harvesting tend to have different shapes [35,38,39], a fact that aids in associating a driver to each disturbance. In fact, Hermosilla et al. [36] already identified that some geometric metrics (area and compactness) could aid in attributing a driver to a disturbance. In this study, different metrics (D.A, mdmin and P.A) have been found useful. The fact that other geometric metrics were useful in a quite different study area (the boreal forest in Canada) indicates that the application of this method in different biomes and study areas should be further studied in more detail, as wildfire regimes and harvesting practices can greatly differ among biomes resulting in disturbances that respond to different spatial patterns [6,62]. Such research would assess the adaptation of the method to different realities, contemplate the introduction of different metrics and identify which decision tree would best suit each different situation.
Although decision-tree algorithms were found useful in this study, there exists the possibility of testing different machine learning algorithms. For example, Random forest [54], a widely used machine learning algorithm in disturbance-related scientific works [63,64]. Besides, by using such algorithms, the repetition of the modelling steps to select the tree that retrieved better results could be avoided as this algorithm is based on the combination of multiple decision trees.
One of the key points in the described methodology is the aggregation of change pixels to individualize disturbance events. This step is needed in any disturbance driver attribution methodology that relies on the disturbance's geometry and shape. A novel way of performing this step has been described here (cluster analysis). The main advantage of this method is that it can be used with any change-detection method relying on remotesensing imagery, whereas, in previous studies, the individualization of a disturbance was dependent upon knowledge of the date and duration of the event [35,36]. This meant that, while many other methods have been successfully developed, the change detection in these studies had to be performed mainly by dense time-series analyses [16,65]. The fully automatic methodology described here could be used in conjunction with any other type of change-detection methodology so long as its result contains the localization of the change pixels. The main difficulty that the cluster analysis may represent is pinpointing the optimum configuration of the cluster algorithm parameters to correctly individualize disturbances. It was found that with the parameters selected in this study, some areas of the disturbance events were detached into different clusters (i.e., in the case of large wildfires) and that disturbances smaller than 4 ha were not detected due to the need for setting the minpts to 10. Nonetheless, for detecting wildfires, the selected parameters confer an improvement in terms of scale when compared with studies using other databases, for example those that rely on MODIS data (250 m spatial resolution) [50]. Therefore, further research should be conducted to fine-tune this step and improve the cluster results as well as to include small disturbances in the analysis and test the methodology in such cases.

Conclusions
A novel method based on the analysis of the shape and geometrical pattern of forest disturbances was described. It allows to identify whether forest disturbances have been driven by wildfire or by harvesting. High accuracy metrics were obtained for the study case, displaying the effectiveness of the methodology. The main novelty of this approach relies on the single use of geometrical and shape metrics of forest disturbances. The main advantages of the method are its ability to fully automate the process of distinguishing between drivers, and the possibility of applying it to any set of forest changes regardless of the remote-sensing-based method used to detect them. This may be a major advantage, as different study areas might require the use of different methodologies to detect forest disturbances due to different climatic conditions (e.g., the continuous presence of clouds) or landscape configurations (e.g., high fragmentation), among others. For our record, this is the first method to derive forest disturbance drivers that is not linked with the methodology followed to obtain the disturbances. Besides, the methodology is highly affordable, as it does not require additional sets of data, and it can be performed in near real-time as soon as the disturbances are detected. Advances such as this one in forest disturbance driver detection could be essential in aiding national and international agencies incorporate this information into their forest-related reports. Therefore, further studies on drivers and the utility of the geometric properties of disturbances in different biomes and study areas would be relevant. This will aid in the design of management actions aimed at improving forest sustainability as well as in the decision-making process of designing responses to the disturbances, thus contributing to the preservation of forests and supporting their key roles in the current context of climate change.