Edinburgh Research Explorer Mapping tropical disturbed forests using multi-decadal 30m optical satellite imagery

Tropical disturbed forests play an important role in global carbon sequestration due to their rapid post-dis- turbance biomass accumulation rates. However, the accurate estimation of the carbon sequestration capacity of disturbed forests is still challenging due to large uncertainties in their spatial distribution. Using Google Earth Engine (GEE), we developed a novel approach to map cumulative disturbed forest areas based on the 27-year time-series of Landsat surface re ﬂ ectance imagery. This approach integrates single date features with temporal characteristics from six time-series trajectories (two Landsat shortwave infrared bands and four vegetation indices) using a random forest machine learning classi ﬁ cation algorithm. We demonstrated the feasibility of this method to map disturbed forests in three di ﬀ erent forest ecoregions (seasonal, moist and dry forest) in Mato Grosso, Brazil, and found that the overall mapping accuracy was high, ranging from 81.3% for moist forest to 86.1% for seasonal forest. According to our classi ﬁ cation, dry forest ecoregion experienced the most severe disturbances with 41% of forests being disturbed by 2010, followed by seasonal forest and moist forest ecor- egions. We further separated disturbed forests into degraded old-growth forests and post-deforestation regrowth forests based on an existing post-deforestation land use map (TerraClass) and found that the area of degraded old-growth forests was up to 62% larger than the extent of post-deforestation regrowth forests, with 18% of old-growth forests actually being degraded. Application of this new classi ﬁ cation approach to other tropical areas will provide a better constraint on the spatial extent of disturbed forest areas in Tropics and ultimately towards a better understanding of their importance in the global carbon cycle.


Introduction
As hotspots of global biodiversity and carbon storage, tropical forests play an important role in biodiversity conservation, climate change mitigation and the provision of multiple other ecosystem services (Foley et al., 2005). However, millions of hectares of tropical forests have been lost due to deforestation and degradation disturbances, resulting in estimated net carbon emissions of 1.4 ± 0.5 Pg yr 1 from 1990 to 2010 (Houghton, 2012). These emissions represent the second largest anthropogenic source of carbon dioxide to the atmosphere after burning of fossil fuels (van der Werf et al., 2009). In contrast, a significant proportion of previously disturbed tropical forests are regrowing, trapping some of the carbon we are adding to the atmosphere, and with the potential to sequester more in the future. The carbon sink due to tropical forest recovering from deforestation and logging has been reported to be up to 70% greater than that of intact tropical forests (Pan et al., 2011). However, our ability to accurately assess tropical carbon sources or sinks is hampered by the lack of precise information on the extent of disturbed forests in the tropics (Baccini et al., 2017).
Remote sensing has played a key role in identifying forest disturbances and recovery, especially with the recent proliferation of highresolution satellite data (Hansen et al., 2013). Several approaches have T previously been used to map disturbed forests in tropical regions, including optical approaches based on moderate resolution MODIS imagery (Langner et al., 2007), high-resolution Landsat imagery (Lu, 2005;Vieira et al., 2003) and very high-resolution SPOT data (Carreiras et al., 2014;Kimes et al., 1999;Souza et al., 2003), as well as Synthetic Aperture Radar (SAR) (Kuplich, 2006;Trisasongko, 2010) and Lidarbased approaches (Andersen et al., 2014). However, the majority of these studies have focused on local scales and have been based on single date images. For example, Vieira et al. (2003) classified forests into young, intermediate, advanced and mature forests for one municipality in the state of Pará, using Landsat spectral information and vegetation indices, and found that combining Landsat shortwave infrared band (1.55-1.75 μm) with NDVI generated a better classification than using any individual band/index. Carreiras et al. (2017) further demonstrated the use of combined Landsat spectral bands with ALOS PALSAR backscatter intensity to distinguish secondary regrowth forest and mature forest in three landscapes in Brazilian Amazon. Such multiple multisensor fusion approaches have yet to be applied over regional scales.
Several regional satellite-based land cover classifications that include secondary regrowth and forest degradation have become available for Neotropical regions. Two prominent examples are the TerraClass post-deforestation land use/land cover classification (Almeida et al., 2016) and the DEGRAD forest degradation product (INPE, 2007(INPE, -2013, both of which were developed by Brazilian National Institute for Space Research (INPE) specifically for the Brazilian Amazon. In TerraClass, available since 2004, secondary regrowth forest is mapped on previously deforested areas larger than 6.25 ha using a semi-manual approach (Almeida et al., 2016). The DEGRAD product is produced mainly by visual interpretation of Landsat and CBERS satellite images from a single year and is annually available between 2007 and 2013 (INPE, 2007(INPE, -2013. Recently, another product, MapBiomas, has become available that provides annual national-level land cover and land use maps for Brazil (MapBiomas, 2015). MapBiomas, available from 2000 to 2016, classifies forest land cover as dense forest, open forest, secondary forest, degraded forest, flooded forest or mangrove, using an empirical decision tree classification algorithm based on single date spectral mixture analysis. All of those single date imagery based approaches are limited in the discriminatory power they can provide as they make no use of temporal degradation/recovery signals which characterise disturbed forests. Thus, none of the existing products fully exploits the potential of existing Landsat time-series data spanning multiple decades to provide reliable maps of both forest regrowth and degradation. Furthermore, none of these products captures historical (pre-2000) disturbances. There is therefore a clear need for a product that provides a more comprehensive picture of historical disturbances over tropical regions.
Methods that exploit temporal information in satellite data (e.g. threshold approaches, trajectory fitting or segmentation) have been found to be very useful for mapping forest disturbances (Hermosilla et al., 2015;Hirschmugl et al., 2017;Huang et al., 2010;Kayastha et al., 2012;Kennedy et al., 2007;Kennedy et al., 2010;White et al., 2017). However, majority of these time-series based approaches are based on a single time-series trajectory and have mainly been implemented at local scales in extratropical regions (e.g. Canada, U.S.). For example, the recently developed LandTrendr (Kennedy et al., 2010), Vegetation Change Tracker (Huang et al., 2010) and patch-based VeRDET (Vegetation Regeneration and Disturbance Estimates through Time)  algorithms have all only been extensively tested in the United States. A recent inter-comparison of disturbance detection algorithms for US forests found that different time-series analysis algorithms are sensitive to different disturbance patterns, with little agreement among these disturbance detection results (Cohen et al., 2017). Thus, when applying these algorithms elsewhere, local calibration and further secondary classification are needed to improve the algorithm's classification performance (Cohen et al., 2018). Machine learning approaches (i.e. random forest) offer the potential to harness the differential sensitivities of different time-series once provided with an appropriate training dataset, but have rarely been coupled with multiple time-series trajectories in Tropics.
In this study, we develop a novel Landsat multiple time-series based classification methodology to map cumulative disturbed forest areas in Tropics, which exploits the power of 1) time-series images relative to single date images, 2) an ensemble of reflectance bands/indices trajectories relative to single trajectories, and 3) machine learning algorithm which enhances classification power by harnessing the differential sensitivities of different time-series. The 'disturbed forests' in this study include both degraded old growth forests and post-deforestation regrowth forests. The former are characterised by a reduction of forest canopy cover (e.g. selective logging, windfall, fire) but have not been clearfelled and thus have not been included in deforestation estimates. The latter refer to areas that have been previously deforested (clearfelled) and converted to other land uses (e.g. pasture, agriculture and mining) but which have subsequently undergone a recovery process following abandonment. Our approach integrates information from six different time-series trajectories (Landsat 5/7 short-wave infrared band 5, band 7, NDVI, SAVI, NDWI 2130 , NDWI 1640 ), extracting both statistical and temporal characteristics from each trajectory which then serve as inputs for random forest classification. It not only captures disturbances occurring within study period , but also areas disturbed prior to 1984 which thereafter have exhibited clear recovery patterns. Here, we apply this method to three forest ecoregions (seasonal, moist and dry forests) in the Brazilian state of Mato Grosso.

Study area
Our study area (Fig. 1), the state of Mato Grosso, is located in the southern edge of Brazilian Legal Amazon. Mato Grosso is the third largest state in Brazil, covering a total area of 903,357 km 2 . According to the Terrestrial Ecoregions of the World (TEOW) from World Wildlife Fund (WWF), 43% of Mato Grosso area is covered by Cerrado (tropical savanna), 27% by seasonal forest, 18% by moist forest, 6% by dry forest and 6% by Pantanal (tropical wetlands) (Olson et al., 2001). In Mato Grosso, 139,917 km 2 have been deforested since 1988 (INPE, 2017) accounting for 26.5% of the state's intact forest in that year (Skole and Tucker, 1993), most of which has been converted into pasture and agricultural land use due to demand for beef and soy beans (Barona et al., 2010). According to TerraClass (Almeida et al., 2016), herbaceous pasture and shrubby pasture cover 61.4% of the total deforested areas in Mato Grosso while 19.2% of deforested areas are under secondary regrowth (including secondary vegetation and regeneration with pasture). The combination of extensive disturbances and significant amount of remaining intact forest makes Mato Grosso an ideal testbed for the application of our newly developed disturbed forests mapping approach (see Section 3).
As indicated, TerraClass is a project that maps land use/land cover on previous deforested areas provided by PRODES (Program for Deforestation Monitoring, INPE, 2017) at approximately bi-annual intervals across the Brazilian Legal Amazon (Almeida et al., 2016). Ter-raClass classifies previously deforested areas into 12 land use categories including pasture, annual crops, secondary vegetation and urban areas. It is extensively validated via field campaigns to determine the accuracy of classification. These have been conducted across different Amazonian regions, including the state of Mato Grosso. This is the best available information on the distribution of secondary forests in any region of the Tropics. However, TerraClass involves a huge effort based largely on visual interpretation and does not map degradation.
The aim of this study is to propose a Landsat multiple time-series based approach in Tropics to 1) improve the efficiency/cost-effectiveness of mapping disturbed forests vs. intact forests, facilitating future TerraClass efforts, 2) map degraded old-growth forests (outside of TerraClass), and 3) eventually enable mapping of disturbed forests over domains for which no reliable data on forest disturbance exist. Only forest areas are considered in this study. To make sure all non-forest areas are excluded, we created a forest cover mask by merging TerraClass-2010 old-growth forest, secondary vegetation and pasture with regeneration categories (Fig. 1). The latter category effectively captures the beginning of the regenerative process containing shrubs and early successional vegetation (Almeida et al., 2016).

Methodology and dataset
The whole approach was developed in Google Earth Engine (GEE) . GEE is a cloud-based geospatial processing platform which consists of over 40 years of historical and current Earth observation imagery, making pixel-based land use and land cover classification feasible across large regions through its inbuilt machine learning algorithms. The overall methodology (Fig. 2) consisted of building Landsat multiple (six) annual time-series trajectories, calculating trajectory metrics (eleven metrics divided into four groups, Table 2), generating a training and validation database, applying a machine learning random forest classification algorithm and validating the disturbed forests vs. intact forests classification map, all of which were coded and processed in GEE. We subsequently used the post-deforestation regrowth forest mask generated from TerraClass-2010 to separate the disturbed forests identified through our classification map into post-deforestation regrowth forests and degraded forests (Table 1). Finally, we performed a relative important analysis of trajectories and trajectory metrics used in the random forest classification to evaluate the extent to which the full suite of all trajectories/metrics enhanced discriminatory power relative to a single trajectory or individual group of trajectory metrics. To do this, ten separate classifications were performed whereby our classification procedure was repeated for each individual trajectory separately (but using all four groups of trajectory metrics), or separately for individual groups of trajectory metrics (but using all six trajectories).

Landsat surface reflectance dataset
We used Landsat atmospherically corrected surface reflectance (SR) products (30 m resolution) (Masek et al., 2006;USGS, 2018) to generate annual time-series trajectories. All Landsat-5 Thematic Mapper (TM) surface reflectance images acquired during the period of 1984-2010 were used except for 2001 and 2002. In 2001, most images had striping artifacts limiting their use, while in 2002, images from Landsat 5 only covered 61% of our study area. For these reasons, we used Landsat-7 Enhanced Thematic Mapper Plus (ETM+) images, which are compatible in their spectral characteristics (Claverie et al., 2015;Masek et al., 2013), for these two years. In terms of spectral bands, we chose spectral bands 3 (red, 0.52-0.60 μm) which is sensitive to the amount of chlorophyll, 4 (near-infrared, 0.76-0.90 μm) which is related to leaf cellular structure, 5 (shortwave-infrared, 1.55-1.75 μm) and 7 (shortwave-infrared, 2.08-2.35 μm) which relate to leaf water content (Nelson et al., 2000). To minimize the influence of the extent of rivers on the classification, we excluded water bodies in our analysis using the Joint Research Center (JRC) Yearly Water Classification History v1.0 product. This dataset contains maps of the location and temporal distribution of surface water from 1984 to 2015 at annual resolution, generated using more than three million scenes from Landsats 5, 7 and 8 (Pekel et al., 2016).

Generating time-series trajectories
We processed 11,483 images in total for our entire study period , ranging from 257 to 715 annual images depending on data availability, with annual spatial coverage of 99% of our study area (see Table S1 in supplementary information). Five steps were involved to process the Landsat SR data and produce time-series trajectories for 1984-2010. First, areas covered by clouds and cloud shadows were removed based on the pixel quality and radiometric saturation Old-growth forest Secondary vegetation Pasture with regeneration . Later, we merged old-growth forest, secondary vegetation and pasture with regeneration into the forest cover mask as the forest boundary. The study area encompasses three WWF forest ecoregions (moist, seasonal and dry forest).
attributes of the Landsat surface reflectance product. Second, original surface reflectance (16-bit signed integer) values were converted to 0-1 range values by multiplying by the scale factor of 0.0001. Third, four vegetation indices (VIs) were calculated including the Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI 2130 , NDWI 1640 ) (Chen et al., 2005) and Soil-Adjusted Vegetation Index (SAVI) (Huete, 1988). Fourth, to minimize the influence of cloud contamination and improve the quality of input data, we selected the maximum value of individual VIs for each year (Maxwell and Sylvester, 2012). For time-series of reflectance from spectral bands 5 and 7, median values were calculated for each year. In the final step, we used the JRC Yearly Water Classification History v1.0 product to mask water areas (Pekel et al., 2016). After processing, annual timeseries trajectories  of Landsat SR spectral band 5 (1.55-1.75 μm), band 7 (2.08-2.35 μm), NDVI, NDWI 2130 , NDWI 1640 and SAVI were used for the classification of disturbed forests and intact forests.

Trajectory metrics
We calculated eleven metrics divided into four groups ( Table 2) for each of the six spectral trajectories to act as inputs for random forest Trajectory metrics (x11) (  (x6): algorithm (see Section 3.4), based on a priori expectations of divergence between intact and disturbed forests. Each of these 11 metrics may capture information that is linked to a particular disturbance type. For example, the coefficient of variation (C.V.) shows the extent of variability in relation to the mean. Forests which have experienced large disturbances would be expected to have higher C.V. than undisturbed intact forests. We further hypothesized that time-series trajectories of intact forest would follow a normal distribution, while those of disturbed forests would tend not to and be much more likely to exhibit greater skewness and kurtosis. Finally, trends (based on linear regressions) were also estimated from the time-series trajectories. We hypothesized that disturbance events would likely result in either decreasing (deforestation/degradation) or increasing (regrowth) trends over time, and thus expected that the regression slopes of disturbed pixels would be much smaller/greater than undisturbed pixels where we expected that the slope value is close to zero. It has been found that regrowth secondary forests in Amazonia are cut and burned on average every 5 years (Aguiar et al., 2016). Thus, we also considered the maximum absolute regression slopes derived from individual 5-year windows within the 1984-2010 study period. Fig. 3 demonstrates differences in trajectories and trajectory metrics between intact and disturbed forest pixels. For intact forests (undisturbed during 1984-2010), we expected trajectories to fluctuate, but to follow a normal distribution pattern, while trajectories of disturbed forests were expected to exhibit more pronounced decrease and increase patterns. Trajectories of disturbed forest pixels' can follow various patterns, depending on whether they have been disturbed once ( Fig. 3 Disturbed B) or multiple times ( Fig. 3 Disturbed A) within the study period  or disturbed before 1984 but following a clear recovery pattern within study period ( Fig. 3 Disturbed C).

Sampling design
We used GEE random sampling to generate a set of spatially representative points of disturbed and intact forests for classification training and validation based on TerraClass-2010 map of old-growth forest, secondary vegetation and pasture with regeneration, USGS (United States Geological Survey) 30 m Global Tree Cover 2010 (Hansen et al., 2013), the Hansen Global Forest Change (GFC) product (Hansen et al., 2013), and 30 m Global Land Cover 2010 (GlobeLand30-2010) produced by National Geomatics Centre of China (Chen et al., 2015). Since TerraClass uses deforestation vector data from PRODES (INPE, 2017) as input data to map subsequent land use/covers (Almeida et al., 2016), it inherited PRODES historical misalignment issues. To better align TerraClass with GFC products, we registered the TerraClass-2010 classification map using the GEE image displacement algorithm by calculating the displacement between TerraClass-2010 forest mask and GFC forest mask (Hansen et al., 2013).
For intact forests, points were randomly sampled from areas that met the following conditions: i) classified as old-growth forest in TerraClass-2010; ii) tree canopy cover > 75% in GFC in 2000 and no forest loss during 2000-2010; iii) tree cover > 75% in USGS 30 m Global Tree Cover 2010; and, iv) classified as forest in GlobeLand30-2010. Similarly, disturbed forest pixels were sampled from areas that satisfied the following conditions: i) classified as secondary vegetation or regeneration with pasture in TerraClass-2010; ii) tree cover > 75% in USGS 30 m Global Tree Cover 2010; and iii) classified as forest in GlobeLand30-2010. To reduce the influence of unwanted positional errors among these land cover products and avoid edge effects, we required that both intact forest and disturbed forest sampled points were located at least 100 m away from the patch boundary. For each forest ecoregion (moist/seasonal/dry forest), 10,000 points (5000 intact and 5000 disturbed) were randomly sampled, respectively. In total, we Areas that have been previously deforested (clearfelled) and converted to other land uses (e.g. pasture, agriculture and mining) but which have subsequently undergone a recovery process following abandonment. Secondary vegetation or regeneration with pasture in TerraClass-2010.

Degraded forest
Degraded old-growth forests. Characterised by a reduction of forest canopy cover (e.g. selective logging, windfall, fire) but have not been clearfelled and thus have not been included in deforestation estimates. sampled 30,000 intact and 30,000 disturbed points across the study area as the training and validation database.

Random forest classifier
Mapping of disturbed forests was performed by using the GEE Random Forest classifier algorithm, which has been recently successfully applied to cropland mapping (Shelestov et al., 2017;Xiong et al., 2017), oil palm plantation detection (Lee et al., 2016), mapping urban settlement and population (Patel et al., 2015) and soil mapping (Padarian et al., 2015). Random Forest (RF) classification is a relatively well-known supervised machine leaning algorithm that iteratively produces an ensemble of decision tree classifications by using corresponding randomly selected subsets of the training dataset (Breiman, 2001). It grows classification trees by splitting each node using a random selection subset of input variables, which reduces overfitting and yields a more robust classification compared to other classifiers (Breiman, 2001). RF uses a voting system to classify data and the final classification category for each pixel is determined by the plurality vote of all trees generated to build the forest.
We used 66 variables comprising 11 metrics ( Table 2) for each of the six time-series trajectories as input predictors for the RF classification. RF classifications were applied in moist, seasonal and dry forest ecoregions, respectively. All classifications were based on the outputs of 500 decision trees (see Fig. S1 in supplementary information). Each tree split was based on eight variables randomly selected from all 66 input variables, which was the default configuration for the GEE random forest classifier. After constructing our disturbed forest classification, we performed a post-classification filtering to reduce noise and remove spurious classification artifacts by applying a 90 m × 90 m majority filter.

Classification validation
To evaluate how well our classification performed, we used ten-fold cross-validation (Kohavi, 1995;Schaffer, 1993) based on above randomly sampled database (see Section 3.3, i.e. 10,000 points for each forest ecoregion), which randomly partitions our sampled database into ten equal sized subsets. Of the ten subsets, a single subset (1000 points) was retained as the validation data for testing the classification algorithm, and the remaining nine subsets (9000 points) were used as training data for RF classifier. The cross-validation process was repeated ten times. The final accuracy estimation was determined by the average of ten-fold results. The accuracy matrix included overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA) and Kappa statistic (Kohavi, 1995).
For an additional independent confirmation for our Landsat optical sensor based classification of disturbed forests vs. intact forests, we used another microwave radar based satellite product, ALOS/PALSAR 25 m spatial resolution mosaic imagery, and very high resolution (5 m) RapidEye imagery as the visual interpretation. ALOS PALSAR imagery consists of dual polarization HH (transmission of horizontal wave and reception of horizontal component) and HV (horizontal transmission and vertical reception), but it has been shown that the polarization mode HV is more effective in deforestation detection than HH polarization (Motohka et al., 2014), which corresponds with findings of close relations between HV backscatter and vegetation structural properties (e.g. forest height, forest cover) (Joshi et al., 2015). Thus, we visually compared the 2007-2010 ALOS/PALSAR HV backscatter change with our final classification results.
SAR data are stored as digital number (DN) in unsigned 16 bit and typified by a high degree of speckles in the image (random 'salt and pepper' noise). To reduce noise and improve image interpretability, a multi-temporal speckle filter (7 × 7) (Lee, 1980;Lopes et al., 1990) was (1) σ 0 is generally negative and can vary from −35 dB in very low backscatter areas (degraded/deforested area), up to 0 dB for extremely high backscatter (dense forest area). For visual interpretation, we expected a decrease or an increase in σ 0 in forest areas that have been  recently disturbed or are recovering from past disturbances (Joshi et al., 2015). However, we also expected that many disturbed areas in our classification would not be captured by PALSAR due to its short time period (2007)(2008)(2009)(2010).

Classification results
As represented in Fig. 2, the new developed disturbed forests vs. intact forests classification approach was applied to three different ecoregions in Mato Grosso. The final classification map (Fig. 4) was generated by training the random forest classifier individually for each ecoregion on the entire sampled database. Our classification results representative of the year 2010 show that disturbed forests (both postdeforestation regrowth forests and degraded forests) were widely spread across Mato Grosso, but were most prevalent along rivers and next to non-forest areas (Fig. 4). Forests in Mato Grosso covered a total area of 295,383 km 2 in 2010 (Table 3), accounting for about 63% of the total study area. Our results show that, until 2010, 25% of the total forested area was disturbed (Table 3). Forest cover percentage varied considerably across ecoregions, ranging from 37% in dry forest to 74% in moist forest (Table 3). Dry forest experienced the most severe disturbances with 41% of forest cover classified as disturbed, followed by seasonal forest and moist forest where disturbed forests accounted for 28% and 20% of forest cover, respectively (Table 3).
We further separated disturbed forests identified through our classification map into post-deforestation regrowth forests and degraded forests. It shows that the area of degraded forests was up to 62% larger than the area of post-deforestation regrowth forests across ecoregions, with degraded forests and post-deforestation regrowth forests covering a total area of 47,039 km 2 and 28,246 km 2 , respectively (Table 4). By comparing degraded forests and old-growth forests classified in Terra-Class for the year of 2010, we found that 18% of areas identified as oldgrowth forests in TerraClass were actually degraded forests, ranging from 15% to 27% across various ecoregions (Table 4).

Ten-fold cross validation
Ten-fold cross validation was used as the main validation of our disturbed forests and intact forests classification map, with accuracy matrices provided in Table 5. Overall, all the classification accuracies were above 81% with Kappa agreements above 62%. Across ecoregions, the overall accuracy was the highest in seasonal forest at 86.1%, with a producer's accuracy of 88.9% for intact forests and 83.3% for disturbed forests. In moist forest and dry forest regions, the overall accuracies were lower at 81.3% and 82.6%, respectively.

High-resolution image interpretation
To further validate our classification, we consider in detail one landscape within each biome, comparing our results to radar and other very high-resolution data. Examples in Fig. 5-7 allow for visual comparison of our classification in selected focal areas within each forest ecoregion with corresponding ALOS PALSAR HV backscatter (σ 0 ) temporal (2007-2010) change composite images and very high-resolution (5 m) RapidEye true-colour composite images (Team Planet, 2017). Overall, this comparison at local scales shows a very good visual agreement between our classification and the PALSAR temporal change as well as with RapidEye images across ecoregions (Fig. 5-7), especially those logging roads shown in Fig. 6. As expected, there were some mismatches between our classification and the temporal change in PALSAR HV σ 0 , such as several disturbed areas from our classification not appearing in PALSAR temporal change image. This is likely due to PALSAR images only being available from 2007 and thus not capturing much forests disturbed before 2007.

Importance of individual trajectories and metrics
The relative importance of individual trajectories in our classification was measured by the percentage of overall accuracy change (% OAC) when running our classification for a single trajectory (but using all four groups of trajectory metrics) relative to our full suite multitrajectory classification ( Table 5). The larger the overall accuracy change, the less important an individual trajectory is in distinguishing the differences between disturbed forests and intact forests. All of the single time-series trajectory based classifications had much lower (3-15% across ecoregions) overall classification accuracy than our full suite classification (Fig. 8). In moist forest and dry forest ecoregions, Landsat shortwave spectral bands 5 and 7 were the most important trajectories for distinguishing disturbed forests and intact forests, decreasing %OAC the least relative to our full suite classification. However, in the seasonal forest ecoregion, NDWI trajectories were the most important, decreasing the overall accuracy the least, followed by spectral band 7.
The important of specific groups of trajectory metrics (Table 2) was determined in an analogous manner to the importance of specific trajectories. Importance patterns for groups of metrics were similar across ecoregions (Fig. 8B), with location metrics being the most important in distinguishing disturbed and intact forests, followed by temporal metrics, scale metrics and single year (2010) values. However, single year (2010) values alone were found to have much less discriminatory power than other metrics, resulting in much lower (up to 20%) classification accuracy relative to our full suite classification with all groups of metrics included (Fig. 8B).

Comparing with other products
We compared our classification of disturbed forests in Mato Grosso with other relevant products which have recently become available (Fig. 9). These include the MapBiomas land use/cover products (2000−2010) and the Latin American secondary forest map recently produced by Chazdon et al. (2016). The latter was derived from the map of Neotropical forest aboveground biomass of Baccini et al. (2012) for 2008. To ensure comparability in time, we only compared disturbed forests from our classification against the area of secondary forests < 24 years old from Chazdon et al. (2016). To compare against Map-Biomas products (2000-2010, we reclassified open forest, degraded forest, secondary forest, and flooded forest categories from MapBiomas-2010 map into one disturbed forest class. Areas classified as non-dense forest in 2000-2009 MapBiomass products but classified as dense forest in 2010 were also considered as disturbed forests. Our estimate of disturbed forest area in Mato Grosso was three times larger than disturbed forests from MapBiomas with corresponding spatial distribution shown in Fig. 9(A & B). The biggest classification differences were located in moist forest ecoregion, followed by seasonal forest and dry forest. The difference relative to MapBiomas may be due to the use of different classification methods (single date based classification) and the limited time period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010) for MapBiomas. However, secondary forest area estimates from Chazdon et al. (2016) were approximately three times greater than the disturbed area from our classification (Fig. 9C), increasing to four times greater in the dry forest biome. This may be due to the coarse resolution (500 m) of forest age map, the misclassification of some anthropogenic land use areas as forest or to errors arising from interpreting the age from the forest biomass map .
The large discrepancies of estimated disturbed forests among those products highlight the importance of using high-resolution time-series images and the consideration of historical disturbances when mapping secondary forest regrowth and forest degradation. By excluding pre-2000 historical disturbances and ignoring time-series spectral characteristics, MapBiomas significantly underestimate the area of disturbed forests (Fig. 9B), and correspondingly may underestimate the impacts of disturbance on tropical biodiversity and carbon cycles.

Discussion
In this study, we developed a new time-series approach in GEE to map disturbed forests (both forest degradation and post-deforestation regrowth) and intact forests. This approach incorporates random forest machine learning algorithm with multiple Landsat time-series trajectories, which enhances classification power by harnessing differential sensitivities of different time-series. It is flexible with respect to the disturbance patterns it captures. It detects three different disturbances trends (Fig. 3): 1) single disturbancetime-series have a decrease then increase pattern; 2) multiple disturbancestime-series have multiple increase and decrease signatures pattern; and 3) recovery on previous disturbed areastime-series only have an increase pattern. For example, in this study, it not only maps areas that disturbed and recovering during time-series period , but also captures areas that disturbed before 1984 but following a recovery process after 1984, making our approach more valuable and suitable for distinguishing disturbed forests and intact forests.
Application of our approach in moist/seasonal/dry ecoregions in Mato Gross resulted in high overall classification accuracy, ranging from 81.3% to 86.1% across ecoregions. On one hand, the misclassification of disturbed forests as intact forests may relate to the fast recovery process of secondary regrowth forests whose structural and spectral characteristics could be similar to intact forests after 20-40 years recovery (Aide et al., 2000;Poorter et al., 2016). The degraded old-growth forests recover at even faster rates. For example, it has been shown that about 50% of the canopy opening caused by selective logging becomes closed within one year of regrowth (Asner et al., 2004), making it harder to capture such quick recovery process from remote sensing perspectives. On the other hand, the misclassification of intact forests as disturbed might be because of our sampling of intact forests points which may still include few disturbed old-growth forests, as TerraClass does not map degraded forests. Furthermore, the variation of classification accuracy across ecoregions might be due to the differences of land-use history, land use intensity, severity of disturbance events, soil fertility and texture (Chazdon, 2003) and water availability (Poorter et al., 2016), which are highly associated with post-disturbance recovery processes and the structure of regrowth forests.
By separating disturbed forests into post-deforestation regrowth forests and degraded forests, we found that approximately two-thirds of disturbed forests were degraded forests, highlighting the importance of effective systems for detecting these. Forest monitoring system should not only focus on clear-cut forest deforestation and recovery, but also degraded forests which may release more than double the amount of carbon than released by deforestation (Baccini et al., 2017). Interestingly, our classification clearly captured straight-line patterns of disturbed forests, which also present a consistent agreement with both PALSAR HV backscatter intensity change and RapidEye very high resolution images (Fig. 6). Further development of our methodology may provide new opportunities to map selective logging activities at a large regional scale. The methodology developed in this study dramatically exploits the power of multiple long-term Landsat time-series in the discrimination of disturbed vs. intact forests with support of GEE's massive storage and calculation capability. Unlike previously published single time-series trajectory based approaches (e.g. LandTrendr, VCT, VeRDET) (Cohen et al., 2017), this approach incorporates six different time-series trajectories which generates a much higher classification accuracy than single-trajectory based classification (Fig. 8A). Also, this approach integrates single year features with scale, location and temporal characteristics derived from time-series trajectories, which significantly enhanced the discriminatory power. Single year features were found to be the least powerful (up to 20% less) for discriminating disturbed pixels compared to the combined use of single year features and other time-series features (Fig. 8B). Thus, combination of single year and time-series features represents a significant advance on widespread single-year approaches to map previously disturbed forests.

Conclusion
Our study explored the feasibility of using multiple long time-series Landsat surface reflectance imagery to map tropical historically disturbed forests as far back as 1984. Using case studies of the moist, seasonal and dry forests in Mato Grosso, we found that this methodology has high potential in mapping various forested land cover types related to disturbances with an overall accuracy of up to 86.1%. The classification approach developed in this study is capable of capturing not only forest regrowth from forest deforestation (clear-cut), but also forest degradation (partially cut) due to selective logging or other small scale disturbances. Based on TerraClass-2010 forest mask, until 2010, 41% dry forest in Mato Grosso were disturbed, with 28% and 20% of seasonal forest and moist forest disturbed, respectively. By comparing classification from this study with TerraClass-2010 land cover map, we found that up to 18% of area classified as old-growth forest in TerraClass was actually degraded forests, highlighting the importance of including degradation monitoring alongside clear felling monitoring.
Our study clearly demonstrates the potential of extensive time-series of satellite imagery to map historical forest disturbances and recovery processes. More specifically, the discrimination of disturbed forests (both degraded forest and post-deforestation regrowth forest) vs. intact forests was enhanced by simultaneously combining a suite of single date features and time-series characteristics derived from multiple time series of spectral bands and vegetation indices. Our approach is readily applicable to other larger tropical areas, making pan-tropical mapping of forest disturbances and regrowth a highly tangible prospect. Fig. 8. The percentage of overall accuracy change (% OAC) when running our classification procedure for individual trajectories separately (but using all four groups of trajectory metrics) or separately for individual groups of trajectory metrics (but using all six trajectories) relative to our full suite classification with all trajectories/ metrics included (Table 5). The larger the absolute % OAC, the less important the particular trajectory (or the group of trajectory metrics).