Towards accurate mapping of forest in tropical landscapes: A comparison of datasets on how forest transition matters

Tropical forests represent half of the Earth's remaining forest area, but they are shrinking at high rates, which threatens their multiple ecosystem services. In response, international environmental agreements and related programs require information about tropical forested landscapes. Despite the increasing quantity and quality of remote sensing-based data, the effective monitoring of forests in the tropics still faces operational challenges: (a) applicability at local levels, given the lack of reference or cloud-free information; (b) overcoming geographical, ecological, or biophysical variability; and (c) stratification, i.e. distinguishing forest categories related to functionality and disturbance history.
We conducted an extensive ground verification campaign across 36 landscapes in 9 regions of Zambia, Ecuador and the Philippines, which represent a gradient of pantropical deforestation contexts or forest transition stages. We collected over 16,000 ground control points and digitized over 18,000 ha with details on land use and forest disturbance history. We trained a random forest algorithm and generated high-resolution (30 m) binary forest maps covering ~15 Mha, building on 39 optical (Landsat-8), radar (Sentinel-1) and elevation bands, indices and textures. We validated the quality of the outputs across the studied deforestation gradient and compared them to (a) 3 national land cover maps used for international reporting and (b) 4 global forest datasets (Global Forest Change, Copernicus Land Cover, JAXA and TanDEM-X Forest/Non-Forest).
Our method generated highly accurate (92%) forest maps for the studied regions, outperforming the global datasets, which generally overestimated forest cover. Following a standardized method for all countries, we achieved accuracies similar to those of the national maps. Delineating forest became more difficult in more advanced stages of deforestation, with recurring difficulties in distinguishing non-forest tree-based systems (e.g. perennials, palms, or agroforestry), shrublands and grasslands. In contrast to reference or degraded forests, regrowth forests were repeatedly misclassified across contexts, countries and datasets. Our results highlight the importance of in situ verification as an accompanying method to establish efficient forest monitoring systems, especially in areas with higher rates of forest cover change and in tropical regions at advanced deforestation or early reforestation stages. These are precisely the areas where current REDD+ or Forest Landscape Restoration initiatives take place.

Introduction
Tropical forests represent almost half of the Earth's remaining forest area, but continue to shrink at relatively rapid rates (FAO and UNEP, 2020), while undergoing degradation and landscape fragmentation (Taubert et al., 2018; Vancutsem et al., 2021). The drivers of these dynamics, which are mostly anthropogenic and related to land use (LU) (Curtis et al., 2018; Seymour and Harris, 2019), pose a threat to the multiple ecosystem services and functions provided by tropical forests (Wilson et al., 2017). To tackle these pressures, several international environmental agreements (e.g. the 2030 Agenda for Sustainable Development, the Paris Agreement) currently promote numerous programs for the conservation, rehabilitation and sustainable use of forests in tropical landscapes. Globally relevant examples of established initiatives are the Forest Landscape Restoration (FLR) projects within the Bonn Challenge, and arrangements supported by the Reducing Emissions from Deforestation and forest Degradation program (REDD+).
To appraise the achievement of international environmental objectives fairly and effectively, forest cover (FC) and its change have to be analyzed coherently across territories, with certifiable methodologies and common metrics (GFOI, 2020; Harris et al., 2018). This is a precondition for drawing sound conclusions about the contributions of these programs to sustainable development. Remote sensing offers individual countries a low-cost, readily available and reliable source of information to meet their reporting needs. During the last decades, the availability, quantity and quality of satellite sensors and of FC, land cover (LC) and land use/land cover (LCLU) maps with enhanced spatial and temporal resolution have improved drastically (Galiatsatos et al., 2020; Grekousis et al., 2015). Yet, establishing operational systems of Measurement, Reporting and Verification (MRV) or National Forest Monitoring (NFM) is particularly challenging in tropical countries. Known reasons include the lack of national forest inventories or frequently updated national LCLU maps, limited technical expertise and resources, and the absence of good governance and administrative capacity (Ochieng et al., 2016).
Global forest datasets provide methodological comparability between regions and contexts by covering a larger spatial scope. Thus, they are often presented as an invaluable basis to establish REDD+ reference levels, or to quantify FC and its change at national or regional scales. For instance, the Global Forest Change (GFC) dataset (Hansen et al., 2013), Globeland30 (Chen et al., 2015), or the Copernicus Global Land Service LC Layers (CGLS-LC100) (Buchhorn et al., 2020) are commonly mentioned in MRV or NFM guidelines (Finegold et al., 2016; GFOI, 2020). However, global and regional FC maps must be used cautiously and only under certain circumstances (Tropek et al., 2014): namely, as a cross-check of national mapping capacities (where these exist), or as a temporary step towards developing such capacities (GFOI, 2020). We summarize the technical limitations of global forest datasets in the following three interrelated operational challenges.
First, global FC datasets are not always accurate at local spatial levels. The low accuracies in specific landscapes are partly related to a lack of reference or auxiliary data, such as reliable and detailed in situ information (Fritz et al., 2011). Additionally, the temporal or spatial coverage of regional or global maps may not match the scope of local analyses. The pixel size of global maps (sometimes of medium to low resolution) may also be incongruent with the size of the targeted LCLU patches on the ground. Moreover, especially in the tropics, areas with persistent cloud cover yield low-quality or no optical observations (Hilker et al., 2012). In this respect, Synthetic Aperture Radar (SAR) is a promising technology, as its observations do not depend on sunlight and are largely unaffected by clouds. Current research is exploring its potential for regional forest monitoring, alone or in combination with optical sources (Joshi et al., 2016), and the first SAR-based global forest maps have already been published (Martone et al., 2018; Shimada et al., 2014).
Second, the accuracy of global forest datasets varies regionally due to ecological, biophysical and biochemical dissimilarities of the vegetation between biomes and geographical areas (e.g. different seasonality, tree height/canopy, water content) (Crowther et al., 2015; Yang et al., 2017). Different forest definitions are adequate and accepted in each country or territory, depending on the reporting purposes. These definitions are based on the minimum size of forest extent, on canopy cover and tree height thresholds, or on the level of detail of LU. Matching remote-sensing-derived classes, which are based on physical thresholds, with national surveys built on the definitions of countries or organizations can be burdensome. For instance, the tree cover (TC) thresholds of the GFC that best match the forest characteristics of different territories vary widely (Galiatsatos et al., 2020; Hansen et al., 2013). Moreover, the change dynamics and the drivers of deforestation often differ strongly between regions (e.g. industrial crops/plantations vs. smallholdings) (Curtis et al., 2018; Ferrer Velasco et al., 2020). All these contextual differences make it challenging to establish consistent methods of forest classification and definition that are equally accurate and reliable across the globe.
Third, the accurate differentiation of forest types over large geographic extents still faces technical difficulties. Certain physical variables (e.g. biomass, tree height/cover) have been estimated and mapped globally, but their validity in the tropics remains an issue (Hansen et al., 2013; Potapov et al., 2021; Spawn et al., 2020). It is even more challenging to match classification methods to forest definitions that are based on LU and distinguish between disturbance levels or forest functions (Putz and Redford, 2010; Vancutsem et al., 2021). Similarly, a better capacity to identify forest stands or certain tree species (e.g. invasive, commercially valuable or selectively logged) could support the effective monitoring of forest degradation or disturbance levels (Fassnacht et al., 2016). These limitations worsen when mapping multifunctional tropical landscapes, which are characterized by mixed, fast-growing forest types and non-forest tree-based systems (Caughlin et al., 2020). A promising approach is time series analysis, which can provide valuable insights into LCLU history (Winkler et al., 2021; Woodcock et al., 2020) or into the ecological characteristics of the forest (Jha et al., 2020).
In this study, we use data collected in situ across thirty-six tropical landscapes in Africa, South America and Southeast Asia to generate forest cover maps that combine information from active and passive remote sensing systems. We test the accuracies of these maps and of other secondary sources commonly used for NFM or MRV in the studied regions. With this, we aim to explore the ability to accurately delineate forest in the tropics with up-to-date methods. At the same time, we study the influence of different deforestation contexts and LCLU types on the quality of forest mapping outputs.

Hypothesis
We hypothesize that deforestation contexts and the associated forest disturbance regimes affect the classification accuracies of forest maps, because they exemplify the problems of geographical variability and of separating vegetation types. We expect this influence to be related mostly to the degree of deforestation/degradation and to the number and proportion of land cover classes, independently of the classification method/dataset or the analyzed region. The following subsection presents how we conceptualize deforestation contexts and forest disturbance regimes, and the research questions of this study follow it.

Forest transition: deforestation contexts and forest disturbance regimes
The forest transition theory describes a process of net forest area decline and re-expansion as a result of socio-economic development (Mather, 1992), which has been reported for several nations and regions worldwide (Köthke et al., 2013; Meyfroidt and Lambin, 2011). One of the most common uses of this theory has been the classification of territories into different transition stages based on their FC and deforestation rates. This classification serves to analyze the related drivers and to design effective policies accordingly, such as the specific regulations related to REDD+ (Angelsen and Rudel, 2013; Hosonuma et al., 2012).
Based on the aforesaid literature, regions passing through these phases form a gradient of what we call deforestation contexts, characterized by specific forest disturbance regimes and pertinent policies: (a) In an initial deforestation context, also known as 'pre-transition' or 'before the frontier', FC is high (close to the potential natural vegetation) and deforestation is still low or non-existent. In this phase, mature forests are abundant, while conservation measures and sustainable concession policies are encouraged. Measures based on timber certification and control of imports/exports, such as the EU's FLEGT Action Plan (Forest Law Enforcement, Governance and Trade), aim to operate at this level. (b) At some point, deforestation and degradation increase and accelerate, in what are known as the 'early transition' or 'frontier area' phases, eventually entering a middle deforestation context. These stages are characterized by an increased proportion of disturbed and degraded forests. Direct regulation measures (e.g. protected areas, LU zoning) and efforts to reduce the rent from extensive agriculture are suitable here. Gradually, FC decreases in favor of deforested vegetation (e.g. crops or grasslands), reaching what is typically known as 'late transition' or 'forest-agricultural mosaics'. (c) Eventually, in an advanced deforestation context, deforestation rates decrease and are ultimately reversed into net positive reforestation rates. This results in an increased proportion of natural (forest succession) or artificial (forest plantations) forest regrowth, occurring in areas that had previously been clearfelled and converted to other LCLUs. Different drivers can catalyze this shift into the so-called 'post-transition' phase, such as (i) the abandonment of forest lands due to forest scarcity or diminished agricultural rent, or (ii) structural and policy changes due to economic development.
Regions in these advanced stages are also appropriate for direct regulation (e.g. LU zoning and active reforestation: FLR measures) and for environmental policies to increase forest rent and its capture, together with the intensification of the agricultural sector.

Research questions
Building on the forest transition theory as a conceptual framework, and considering the challenges of using earth observation approaches in tropical forest areas described above, we focus on the following research questions, which also structure the discussion section: (1) Can we develop a methodology for the accurate delineation of FC in different tropical regions? (2) How good are the classification accuracies of our forest maps and of other global sources in the selected countries/regions, compared to the existing national maps used for NFM and international reporting? (3) How do the different deforestation contexts and their associated forest disturbance regimes influence the results of regional forest mapping in tropical landscapes?
The first two questions are methodological steps to address the main research problem: exploring the influence of de-/reforestation stages on the produced forest maps. Our results can help to establish pathways towards coherent LU planning and sustainable forest management, while improving knowledge about the monitoring of forest disturbance regimes. We aim to better understand how to produce consistent forest maps and achieve satisfactory accuracies for effective monitoring with both conservation and restoration purposes. Such improvements can facilitate the establishment of forest strata to meet the activity data requirements of REDD+ and to efficiently monitor FC in FLR projects. We test our hypothesis in multifunctional landscapes with LCLU dynamics representative of very diverse tropical regions, aiming to draw conclusions and generalizations at pantropical level.

Study design: selection of landscapes, regions and countries
Our research is based on data collected across thirty-six landscapes of approximately 10,000 ha each (Fig. 1), distributed equally among nine regions of three tropical countries in Africa (Zambia), South America (Ecuador) and Southeast Asia (Philippines). These landscapes are all study sites of the larger research project Landscape Forestry in the Tropics (LaForeT: www.la-foret.org), coordinated by the Thünen Institute of International Forestry and Forest Economics, part of Germany's federal research organization Thünen Institute. Each landscape was positioned within the boundaries of an independent jurisdictional unit (chiefdom, parish or municipality in Zambia, Ecuador and the Philippines, respectively) to ensure homogeneous formal administration. All were selected as multifunctional landscapes that representatively capture the diversity of forest and LCLU types of the corresponding region, together with characteristic LCLU change dynamics. The nine selected regions comprise a diversity of biophysical, geographical, socioeconomic and demographic settings, in order to facilitate generalizations from a broader pantropical perspective.
Our study design aimed to obtain a selection of landscapes that depict different forest transition stages, and thus a gradient of pantropical deforestation contexts and a variety of associated forest disturbance regimes (Table 1). Within each country, the three regions represent three different deforestation contexts (initial, middle, and advanced) from the national perspective. Beforehand, the three countries had been selected and classified into the same three categories, considering their position on the forest transition curve at national level. To classify both countries and regions, we estimated FC and average annual change rates from the most up-to-date national LCLU maps used for NFM and international reporting. Thus, we relied on information from the second phase of the Integrated LU Assessment (

Data collection
We collected ground verification information across the thirty-six research landscapes between September 2016 and October 2019 (Fig. S1). Field teams were composed of two to five researchers familiar with the locally prevailing forest and LCLU types, together with local guides familiar with a particular landscape. They spent approximately six weeks in each landscape, during which they obtained georeferenced ground control points (GCPs) and photographs (GCPhotos) with LCLU information. They followed a standardized field protocol (Annex S1) based on existing good practice guidelines (GFOI, 2020; Olofsson et al., 2014).
We followed a stratified sampling approach to capture the main forest and LCLU types in each landscape. The expert teams identified these strata on the ground through related activities within the larger LaForeT project (e.g. scoping visits, key informant interviews, community workshops, participatory mapping exercises, household interviews, forest inventories). The delineation of relevant strata and the design of the field sampling campaign built on visual interpretation of existing satellite images (Google Earth imagery) or auxiliary maps, such as those produced in participatory mapping workshops. A four-tier, cross-country harmonized classification scheme was used to categorize LCLUs (Table S1), based on FAO's FRA forest definitions and on IPCC categories (Di Gregorio, 2005; FAO, 2018). This scheme was modified to include typical LCLUs of the regions, such as particular agroforestry systems (Huxley, 1999). Additionally, the classification system included details on forest disturbance and regeneration history, namely the type (human/natural) and the age (up to 20 years) of the last disturbance and the type of regeneration (human/natural). Researchers and inhabitants familiar with the locally prevailing forest and LCLU types determined this information.
The teams covered every pertinent class with a representative number of GCPs, spatially distributed across each landscape (Fig. 2). A minimum distance of 100 m between points was required, together with homogeneous LCLU within a radius of 10 m around each GCP. Additionally, photo sequences (GCPhotos), each consisting of four pictures taken clockwise around the compass, were collected for a number of GCPs belonging to the main LCLU classes. In total (Table S2), 16,676 GCPs were collected, with an average of 463 GCPs per landscape: 245, 597 and 548 in Zambia, Ecuador and the Philippines, respectively. In addition, more than 14,000 GCPhotos (over 2800 sequences) were collected, with an average of 79 sequences per landscape: 40, 120 and 80 in Zambia, Ecuador and the Philippines, respectively.
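The 100 m spacing rule can also be checked automatically after collection. The following minimal sketch is not the procedure used in the field. It assumes GCP coordinates already projected to metres (e.g. UTM) and walks the points in the order given, keeping a point only if it lies at least the minimum distance from every point kept so far. The function name `thin_points` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (easting, northing) in metres, e.g. UTM (assumption)


def thin_points(points: List[Point], min_dist: float = 100.0) -> List[Point]:
    """Greedily keep points that are at least min_dist from every kept point."""
    kept: List[Point] = []
    for p in points:
        # math.dist gives the Euclidean distance between two coordinate pairs
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept


print(thin_points([(0, 0), (50, 0), (100, 0), (250, 0)]))
```

The greedy pass depends on input order, so a different ordering can retain a different subset that satisfies the same spacing rule.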

Digitization of the training & validation dataset
After cleaning the collected GCPs and GCPhotos (removing duplicates and inconsistent data), we harmonized the dataset to fit a cross-country LCLU classification scheme based on forest disturbance regimes (Tables 2, S1 and S2).
First, reference forest represents forests with no or only slight disturbance before the ground verification took place. This class includes mostly mature old-growth forests or intact primary forests. In more deforested landscapes, it also includes secondary forests whose last disturbance occurred at least 10 years earlier, provided they had not been completely clearfelled. Second, degraded forest comprises areas of forest with a more recent disturbance (less than 10 years earlier; mostly human impact in the form of logging). This disturbance led to a current state of degradation: reduced forest canopy cover, but not complete clearfelling. Next, forest regrowth includes forests that had been completely clearfelled and converted to other LCLUs, but have subsequently recovered, either spontaneously (succession) or actively through human intervention (plantations). The remaining forests with no information on disturbance history (mostly areas of forest identified visually in the satellite images) were categorized as undefined forest.
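The four forest classes above amount to a small decision rule over the recorded disturbance attributes. The following minimal sketch assumes three hypothetical inputs per observation: whether any disturbance history was recorded, the years since the last disturbance, and whether the stand had been completely clearfelled. It adds a hypothetical `degrading` flag to separate slight disturbances (kept as reference forest) from disturbances that reduced canopy cover. It illustrates the definitions and does not reproduce the harmonization actually performed.

```python
from typing import Optional


def forest_class(has_history: bool,
                 years_since_disturbance: Optional[float] = None,
                 clearfelled: bool = False,
                 degrading: bool = True) -> str:
    """Assign one of the four forest classes from disturbance-history attributes."""
    if not has_history:
        # No information on disturbance history (e.g. forest identified only visually)
        return "undefined forest"
    if clearfelled:
        # Completely clearfelled and converted, then recovered (succession or plantation)
        return "forest regrowth"
    recent = years_since_disturbance is not None and years_since_disturbance < 10
    if recent and degrading:
        # Disturbance within the last 10 years that reduced canopy cover
        return "degraded forest"
    # Undisturbed, only slightly disturbed, or last disturbed at least 10 years ago
    return "reference forest"


print(forest_class(True, years_since_disturbance=4))
```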
We consider four classes of deforested vegetation. First, tree-based system covers the most relevant non-forest tree vegetation types: agroforestry systems (e.g. traditional 'chackras' in Ecuador, trees on crops in the Philippines), palms (e.g. coconut, oil palm) or other perennial crops (e.g. cacao plantations, orchards). This category had no observations in Zambian landscapes. Second, annual cropland comprises deforested areas with irrigated or rainfed cropping fields (mostly cereals such as rice or maize) and land prepared for agriculture. Third, shrubland (woody) was only relevant in Zambia. Fourth, grassland includes mainly pastures, but also other grassland types such as abandoned croplands or grass-covered river banks in Zambia (locally referred to as 'dambos').
Last, representing non-vegetation classes, built-up covers mostly settlements and roads, while waterbody comprises rivers, marshlands and aquaculture, but also oceans in coastal regions.
We digitized polygons containing homogeneous information on the above-mentioned LCLU categories with Quantum GIS v3.10 (Fig. 2), based on the collected GCPs and GCPhotos and using up-to-date satellite images (Google Earth imagery) as a reference. Altogether (Table 2), we digitized 23,880 ha (2136 polygons) of forest, of which 6193 ha (1636 patches) included information about forest disturbance and regeneration history. For the non-forest categories, 4987 polygons covering 20,528 ha were digitized. To avoid an overoptimistic accuracy assessment due to overfitting, these polygons were split randomly into two independent training and validation datasets. These contained 70% and 30% of the total number of polygons, respectively, while preserving the share of the LCLU classes per region.
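Splitting whole polygons rather than individual pixels prevents adjacent, highly correlated pixels of the same polygon from ending up in both sets. The following sketch is an illustration, not the authors' code. It assumes each polygon is described by a hypothetical `(polygon_id, region, lclu_class)` tuple, shuffles polygons within every region-class stratum, and assigns about 70% of each stratum to training, which preserves the class shares per region.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(polygons: List[Tuple[str, str, str]],
                     train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split polygon ids into training/validation sets per (region, class) stratum."""
    strata: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for polygon_id, region, lclu_class in polygons:
        strata[(region, lclu_class)].append(polygon_id)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[str] = []
    val: List[str] = []
    for key in sorted(strata):
        ids = strata[key][:]
        rng.shuffle(ids)
        n_train = round(train_frac * len(ids))  # ~70% of this stratum
        train.extend(ids[:n_train])
        val.extend(ids[n_train:])
    return train, val
```

A stratum with a single polygon goes entirely to training under this rounding, so very rare classes may need separate handling.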

Creation of LaForeT forest maps
The processing steps to create the LaForeT maps and the subsequent analysis were performed with Quantum GIS v3.10, SNAP v8.0, ENVI v5.6 and PyCharm v2019.3. Further details on the selection and the processing of scenes, bands, indices and textures can be found in Tables S3 to S6.

Remote sensing data
The fusion of optical and radar remote sensing data is commonly used in LCLU applications (Joshi et al., 2016), including the mapping and monitoring of FC in tropical regions (Hirschmugl et al., 2020; Reiche et al., 2016). A known advantage over the use of single sensors is the additional information obtained, which increases the chances of targeting specific LCLU types. We created seven multi-sensor composites (stacked raster layers), co-registered to 30 m resolution, each with thirty-nine variables per pixel:
• Seven mosaicked Landsat-8 bands and seven related vegetation indices.
• Twenty-four Sentinel-1-derived bands, consisting of one sigma nought and three texture values for two points in time and three different polarizations.
• One elevation band (height above sea level), obtained from the Shuttle Radar Topography Mission (SRTM)-1Sec digital elevation model.
These seven composites cover the nine studied regions, as two regions in Ecuador (Amazon) and two in Philippines (Cagayan Valley) are geographically close to each other. Regional spectrograms of the chosen variables for the analyzed LCLU classes can be found in Figs. S2 to S4.
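Enumerating the stack confirms the layer count: 7 Landsat-8 bands + 7 indices + 2 dates × 3 polarizations × (1 sigma nought + 3 textures) + 1 elevation band = 39. The layer names below are hypothetical. The Landsat-8 band labels `B1` to `B7` are an assumption; Table S5 lists the bands actually used.

```python
from typing import List

# Hypothetical layer names; Landsat-8 labels B1-B7 are an assumption (see Table S5)
LANDSAT_BANDS = [f"L8_B{i}" for i in range(1, 8)]
INDICES = ["NDMI", "TCw", "EVI", "GEMI", "NDVI", "SAVI", "TCg"]
DATES = ["first", "last"]
POLARIZATIONS = ["VV", "VH", "VV-VH"]
S1_FEATURES = ["sigma0", "GLCM_mean", "GLCM_variance", "GLCM_contrast"]


def composite_layers() -> List[str]:
    """List the layers of one multi-sensor composite in a fixed order."""
    sentinel1 = [f"S1_{d}_{p}_{f}"
                 for d in DATES for p in POLARIZATIONS for f in S1_FEATURES]
    return LANDSAT_BANDS + INDICES + sentinel1 + ["SRTM_elevation"]


print(len(composite_layers()))
```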
• Landsat-8
Landsat-8 offered a higher number of available scenes and the best spatial and temporal coverage for our regions, compared to other high-resolution optical sensors (e.g. Sentinel-2). However, as obtaining cloud-free information was still challenging in Ecuador and the Philippines, we created multi-temporal seasonal mosaics, similar to previous approaches (Hansen et al., 2013; Potapov et al., 2012).
A best period of three to four months with cloud-free coverage was selected in each region, usually coinciding with the respective dry season (Table S3). In total, we used 269 scenes from nineteen different Landsat tiles, downloaded using the on-demand service of the United States Geological Survey (USGS) and its ESPA Bulk Downloader. This included all the available Landsat-8 Level-2 Surface Reflectance images (Collection 1 OLI/TIRS Combined) for the selected months within the year of the ground verification and the two previous years. This three-year period permitted almost cloud-free mosaics and was acceptable considering the defined thresholds between forest classes (ten years from the last disturbance) and under the observed LCLU change dynamics. We created and applied a cloud mask to each of the downloaded scenes, based on the Quality Assessment bands and, in the case of Ecuador (where the preliminary results were unsatisfactory), on the 'Fmask' method (Zhu and Woodcock, 2012). Finally, we created 30 m resolution mosaics by co-registering the masked scenes and clipping them to the bounding coordinates of each region (Table S4), with every pixel containing the average cloud-free value for each of the seven Landsat-8 bands (Table S5 and Fig. S5).
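The masking and averaging steps can be sketched with NumPy under several assumptions. The QA bit positions (0 = fill, 3 = cloud shadow, 5 = cloud) are assumed to follow the Landsat Collection 1 surface reflectance `pixel_qa` layout and should be checked against the product guide. Scenes are assumed to be already co-registered on a common grid, and Fmask is not reproduced. The function names are hypothetical.

```python
import numpy as np


def cloud_mask(qa: np.ndarray, bad_bits=(0, 3, 5)) -> np.ndarray:
    """True where a pixel is usable, i.e. none of the flagged QA bits is set.

    The default bit positions are an assumption (Collection 1 pixel_qa layout).
    """
    bad = np.zeros(qa.shape, dtype=bool)
    for b in bad_bits:
        bad |= ((qa >> b) & 1).astype(bool)
    return ~bad


def mean_composite(stack: np.ndarray, qa_stack: np.ndarray) -> np.ndarray:
    """Per-pixel mean of cloud-free observations.

    stack: (n_scenes, n_bands, rows, cols) reflectance; qa_stack: (n_scenes, rows, cols).
    Returns (n_bands, rows, cols); NaN where no scene was cloud-free.
    """
    valid = cloud_mask(qa_stack)[:, None, :, :]            # broadcast over bands
    count = valid.sum(axis=0)                              # usable scenes per pixel
    total = np.where(valid, stack, 0.0).sum(axis=0)        # sum of usable values
    with np.errstate(invalid="ignore", divide="ignore"):   # 0/0 where count == 0
        return np.where(count > 0, total / count, np.nan)
```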
We then calculated a group of seven vegetation indices for each of the mosaics (Table S6). This selection was derived from Schultz et al. (2016) and it includes indices based on wetness (NDMI, TCw) and greenness (EVI, GEMI, NDVI, SAVI, TCg), which are commonly used in deforestation monitoring.
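Five of the seven indices have compact closed forms, shown in the sketch below. It assumes surface reflectance already scaled to the 0 to 1 range and uses the Landsat-8 OLI blue (B2), red (B4), NIR (B5) and SWIR-1 (B6) bands. The SAVI soil factor of 0.5 is the common default rather than a value stated in the text. The tasseled cap components (TCw, TCg) are omitted because they require sensor-specific coefficient sets.

```python
import numpy as np


def vegetation_indices(blue, red, nir, swir1, soil_l=0.5):
    """Compute NDVI, NDMI, EVI, SAVI and GEMI from reflectance in [0, 1]."""
    blue, red, nir, swir1 = (np.asarray(a, dtype=float) for a in (blue, red, nir, swir1))
    ndvi = (nir - red) / (nir + red)
    ndmi = (nir - swir1) / (nir + swir1)                     # wetness-related
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    savi = (1.0 + soil_l) * (nir - red) / (nir + red + soil_l)
    # GEMI (Pinty and Verstraete): eta term, then non-linear combination
    eta = (2.0 * (nir**2 - red**2) + 1.5 * nir + 0.5 * red) / (nir + red + 0.5)
    gemi = eta * (1.0 - 0.25 * eta) - (red - 0.125) / (1.0 - red)
    return {"NDVI": ndvi, "NDMI": ndmi, "EVI": evi, "SAVI": savi, "GEMI": gemi}


r = vegetation_indices(blue=0.05, red=0.1, nir=0.4, swir1=0.2)
print(round(float(r["NDVI"]), 3))
```

The functions accept scalars or whole band arrays, so the same call works on the 30 m mosaics.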
• Sentinel-1
We included information derived from Sentinel-1 C-band SAR imagery, which can contribute to mapping FC or LCLU independently of clouds or illumination (Abdikan et al., 2016; Hirschmugl et al., 2020). To capture short-term LCLU changes, we included scenes from two points in time within the selected season of each region: one close to the date of in situ verification (last) and another two years earlier (first). In total, we selected thirty-two scenes of Level-1 high-resolution Ground Range Detected (GRD) Interferometric Wide (IW) swath data with dual VV/VH polarization, and downloaded them from the Copernicus Open Access Hub.
We used a standardized pre-processing workflow, following good practice recommendations (Palazzo et al., 2018). First, we applied updated orbit files to the downloaded scenes. Second, thermal noise (background energy generated by the receiver) was removed using the noise lookup tables. Next, we applied radiometric calibration, converting pixel values to the normalized radar cross-section or backscatter coefficient (sigma nought). Fourth, we reduced speckle by applying the improved Lee sigma filter (Lee et al., 2009). Then, we applied terrain correction, geocoding the data from radar geometry to map geometry using bilinear interpolation of the SRTM-1Sec digital elevation model, with Universal Transverse Mercator (UTM) as the map projection. The pre-processed bands were then clipped to the bounding coordinates of each region (Table S4), creating two mosaics (first and last) per region. This process was repeated for VV, VH and the absolute difference between the VV and VH sigma noughts (VV-VH), which had yielded improved accuracies in previous studies (Abdikan et al., 2016).
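The improved Lee sigma filter used in the study (Lee et al., 2009) is available in SNAP and is too elaborate to show briefly. As a simpler stand-in that illustrates the same local-statistics idea, the sketch below implements the classic Lee filter on linear sigma nought: each pixel is pulled towards its local mean wherever speckle alone explains the local variance. The window size and the equivalent number of looks (`enl`) are illustrative parameters, not values from the study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lee_filter(img, size=7, enl=4.0):
    """Classic Lee speckle filter (a stand-in, not the improved Lee sigma filter).

    img: 2-D array of linear backscatter; enl: assumed equivalent number of looks.
    """
    img = np.asarray(img, dtype=float)
    pad = size // 2
    windows = sliding_window_view(np.pad(img, pad, mode="reflect"), (size, size))
    mean = windows.mean(axis=(-2, -1))
    var = windows.var(axis=(-2, -1))
    cu2 = 1.0 / enl  # squared coefficient of variation of the speckle
    with np.errstate(invalid="ignore", divide="ignore"):
        # Share of local variance attributable to the signal rather than speckle
        k = (var - mean**2 * cu2) / (var * (1.0 + cu2))
    k = np.clip(np.nan_to_num(k, nan=0.0), 0.0, 1.0)
    return mean + k * (img - mean)
```

With k close to 0 in homogeneous areas, the output approaches the local mean. Near edges and strong scatterers, k approaches 1 and the original value is kept.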
Finally, we converted the original sigma nought values to integers and then calculated three Grey Level Co-occurrence Matrix (GLCM)-derived texture features (Haralick et al., 1973): GLCM-mean, GLCM-variance and contrast (Table S7). Textures account for the values of neighboring pixels and are commonly used in forest monitoring applications (Herold et al., 2004; Numbisi et al., 2019). We used a 9 × 9-pixel window and repeated the process for each polarization and point in time.
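For a single window, the three texture features can be computed from a normalized, symmetric co-occurrence matrix. The sketch below makes two illustrative choices: a single horizontal pixel offset and linear quantization into 32 grey levels. SNAP's GLCM operator has its own direction, displacement and quantizer settings. Sliding the function over every 9 × 9 window would yield one texture band per feature. The function names are hypothetical.

```python
import numpy as np


def quantize(img, levels=32):
    """Linearly rescale values to integer grey levels 0..levels-1."""
    lo, hi = np.nanmin(img), np.nanmax(img)
    q = np.floor((img - lo) / (hi - lo + 1e-12) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def glcm_features(window, levels=32, offset=(0, 1)):
    """GLCM mean, variance and contrast for one window of integer grey levels.

    Assumes non-negative offsets and values in 0..levels-1.
    """
    dr, dc = offset
    rows, cols = window.shape
    a = window[: rows - dr, : cols - dc]   # reference pixels
    b = window[dr:, dc:]                   # neighbors at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)  # count grey-level pairs
    P = P + P.T                               # make the matrix symmetric
    P /= P.sum()                              # normalize to probabilities
    i, j = np.indices(P.shape)
    mean = (i * P).sum()
    variance = ((i - mean) ** 2 * P).sum()
    contrast = ((i - j) ** 2 * P).sum()
    return {"mean": mean, "variance": variance, "contrast": contrast}
```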

Supervised classification and post-processing
We performed a supervised classification for each of our seven composites, using the corresponding regional training datasets (70% of the digitized polygons) and a random forest (RF) classifier (Breiman, 2001). RF is a machine learning method that has been widely used to classify LCLU (Gislason et al., 2006; Pal, 2005). As a non-parametric method, RF makes no assumptions about the data distribution and can therefore handle multisource information such as our composites. Moreover, RF makes it possible to rate the relative importance or contribution of the different variables to the classification output. Considering the computational time and the accuracy of our regional models, we used a maximum of 1000 trees and 50,000 pixels as training samples. Only pixels with valid data (e.g. cloud-free) for all variables were included in the model and later classified.
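The valid-pixel rule and the 50,000-sample cap can be sketched as a preprocessing step before model fitting. This minimal version is an assumption, not the authors' code. It takes a pixel-by-variable feature matrix in which no-data is encoded as NaN, with unlabelled pixels marked by a negative label. It keeps only fully valid, labelled pixels and draws a random subsample without replacement if more than the cap remain.

```python
import numpy as np


def sample_training_pixels(features, labels, max_samples=50_000, seed=0):
    """Return (X, y) restricted to valid, labelled pixels, capped at max_samples.

    features: (n_pixels, n_vars) with NaN for no-data (assumed encoding);
    labels: (n_pixels,) with negative values for unlabelled pixels (assumed).
    """
    valid = np.isfinite(features).all(axis=1) & (labels >= 0)
    idx = np.flatnonzero(valid)
    if idx.size > max_samples:
        rng = np.random.default_rng(seed)
        idx = rng.choice(idx, size=max_samples, replace=False)
    return features[idx], labels[idx]
```

The resulting arrays could then be passed to an RF implementation such as scikit-learn's `RandomForestClassifier(n_estimators=1000)`. The accuracy-decrease ranking described below corresponds more closely to permutation importance (`sklearn.inspection.permutation_importance`) than to the impurity-based `feature_importances_`.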
In total, we built eight independent RF models to generate eight LaForeT forest maps, covering approximately 15 million hectares. There are more models than composites because the Cagayan Valley composite (Philippines) was classified separately for the two regions of analysis, North and South. Confidence maps were generated for each output and further analyzed (Table S8 and Fig. S6). Moreover, the bands were ranked by how much the classification accuracy decreased when each variable was excluded (Fig. S7). Isolated groups of fewer than five pixels, considering 8-connectivity, were reclassified as no forest, as they did not reach a minimum size of 0.5 ha. Lastly, an ocean mask was applied to the maps before clipping them to the boundaries of the respective region of analysis (Table S4).
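The post-classification sieve can be illustrated with a breadth-first search over 8-connected forest pixels. This minimal pure-Python sketch assumes a binary array in which 1 is forest and 0 is no forest; `min_pixels=5` mirrors the rule described above. GIS software offers faster equivalents, such as GDAL's sieve utility.

```python
from collections import deque

import numpy as np


def sieve_forest(fnf, min_pixels=5):
    """Reclassify 8-connected forest (1) patches smaller than min_pixels to no forest (0)."""
    out = fnf.copy()
    rows, cols = fnf.shape
    seen = np.zeros(fnf.shape, dtype=bool)
    neighbours = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    for r in range(rows):
        for c in range(cols):
            if fnf[r, c] != 1 or seen[r, c]:
                continue
            # Collect the whole patch containing (r, c) by breadth-first search.
            patch = []
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                pr, pc = queue.popleft()
                patch.append((pr, pc))
                for dr, dc in neighbours:
                    nr, nc = pr + dr, pc + dc
                    if 0 <= nr < rows and 0 <= nc < cols and fnf[nr, nc] == 1 and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            if len(patch) < min_pixels:      # too small: set to no forest
                for pr, pc in patch:
                    out[pr, pc] = 0
    return out
```

Because diagonal neighbours count as connected, a thin diagonal line of five forest pixels survives the sieve, whereas it would be split into single pixels under 4-connectivity.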

Secondary sources: national and global forest datasets
Next, we selected up-to-date national maps and relevant global forest datasets of high to medium resolution, ranging from 25 to 100 m (Table 3). All the secondary sources were converted to binary Forest/Non-Forest (FNF) maps, clipped to our areas of interest and coregistered to spatially match our own maps. The national sources were the LCLU maps used for NFMs and international reporting of reference levels in the respective countries that were closest to the date of our data collection (ILUA-II, 2016; MAE, 2017; NAMRIA, 2017). Regarding the global forest datasets, we first selected two sources based on optical data: the GFC dataset (Hansen et al., 2013) and the CGLS-LC100 (Buchhorn et al., 2020). Additionally, we selected two recent SAR-derived global FNF maps: one produced by the Japan Aerospace Exploration Agency (JAXA) based on ALOS-2 PALSAR-2 data (Shimada et al., 2014) and one created by the German Aerospace Center (DLR) based on data from the TanDEM-X satellite (Martone et al., 2018). The GFC dataset is not a forest map itself (it depicts TC), and its estimates (2000 and 2010) predate the period covered by our maps (2016-2019). However, we selected it for its relevance, as it is widely used as a reference for global forest monitoring. To generate FNF maps, we defined TC thresholds from the 2010 GFC dataset that matched FC in our regions (Fig. S8), following Galiatsatos et al. (2020). For CGLS-LC100, a map from a year between 2017 and 2019 was selected, depending on the year in which most GCPs were collected in each region. For JAXA, information from 2017 was used everywhere, as it was the most up-to-date dataset available.
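One simple way to derive a tree-cover (TC) threshold that reproduces a given forest-cover fraction is a grid search over candidate thresholds, sketched below. The 5% candidate step, the `>=` convention and the use of a single target fraction per region are assumptions; this is not necessarily the procedure of Galiatsatos et al. (2020).

```python
import numpy as np


def match_tc_threshold(tree_cover, target_fc, candidates=range(0, 101, 5)):
    """Pick the tree-cover threshold (%) whose FNF map best matches a target
    forest-cover fraction (0-1). Grid search; the candidates are an assumption."""
    tc = tree_cover[np.isfinite(tree_cover)]
    best_t, best_err = None, np.inf
    for t in candidates:
        fc = np.mean(tc >= t)          # forest fraction if TC >= t counts as forest
        err = abs(fc - target_fc)
        if err < best_err:             # keep the first threshold with the smallest error
            best_t, best_err = t, err
    return best_t


def tc_to_fnf(tree_cover, threshold):
    """Binary FNF map: 1 = forest (TC >= threshold), 0 = no forest (NaN -> 0)."""
    return (tree_cover >= threshold).astype(np.uint8)
```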

Quality analysis
Finally, we analyzed the quality of our map outputs and the selected secondary sources, grouping the results by region, country and deforestation context. We generated error matrices (Olofsson et al., 2014) for all datasets in each of the study regions by counting the correctly and incorrectly classified pixels within the validation dataset (30% of the digitized polygons). We used the zonal histogram tool of QGIS, which appends fields with the counts of each unique value of a raster layer (i.e. LCLU classes) contained within zones (i.e. validation polygons). We then obtained thematic accuracy measures (user's, producer's and overall accuracies) for all the compared FNF sources, together with the producer's accuracies of LCLU subclasses, i.e. the probability that pixels of each subclass are correctly classified as forest or no-forest (Tables S9 to S14). The main steps of data collection and processing, as input for the accuracy assessment, are summarized in Fig. 3. Moreover, we analyzed the differences in FC estimates among sources at regional and landscape level (Table S15 and Figs. S9 to S44). In addition, we performed a per-pixel spatial comparison based on Yang et al. (2017), in which the overall and class-specific spatial agreements for every unique pair of datasets were determined in each region, after resampling both datasets of each pair to the coarser of their two resolutions by nearest neighbor interpolation (Tables S16 and S17).
Fig. 3. Flowchart diagram of the main steps of data collection and processing, as input for the accuracy assessment.
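The thematic accuracy measures reduce to simple ratios of the error matrix. The sketch below assumes that rows hold map classes and columns hold reference classes (0 = no forest, 1 = forest) and uses raw pixel counts, as produced by a zonal histogram; it omits the area-weighted estimators that Olofsson et al. (2014) also recommend.

```python
import numpy as np


def error_matrix(mapped, reference, n_classes=2):
    """Pixel-count error matrix: rows = map classes, columns = reference classes."""
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(m, (mapped.ravel(), reference.ravel()), 1)
    return m


def accuracies(m):
    """Overall, user's (per map class) and producer's (per reference class) accuracies."""
    m = np.asarray(m, dtype=float)
    overall = np.trace(m) / m.sum()
    users = np.diag(m) / m.sum(axis=1)       # correct / all pixels mapped as the class
    producers = np.diag(m) / m.sum(axis=0)   # correct / all reference pixels of the class
    return overall, users, producers
```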

Cloud-cover and confidence maps
In total, only 2% of the pixels in the analyzed regions (1.25% within the studied landscapes) had no Landsat-8 data in any scene after mosaicking (Table S5 and Fig. S5). This was mostly due to the presence of clouds, cirrus or shadows, but also, to a lesser extent, to waterbodies, settlements or pixels with no data. Altogether, the treated pixels had an average of ten observations (9.97), with regional averages between 15 and 18 scenes in Zambia, 2 and 7 scenes in Ecuador, and 7 and 12 scenes in the Philippines. While the Zambian regions were almost completely cloud-free, the availability of optical data in the selected areas of Ecuador and the Philippines was more problematic, which justified the use of mosaics. After mosaicking, the region of Esmeraldas in Ecuador still had a relatively high percentage of pixels without information (21.78% of the total area, 4.25% within the landscapes). The remaining regions had fewer pixels with no data after mosaicking, with values between 0% and 3.36% (0% and 1.55% within the landscapes).
The overall standardized average confidence values did not vary strongly across regions, ranging from 32% (Esmeraldas) to 46% (North Cagayan) (Table S8 and Fig. S6). FC-specific confidences ranged from 30% (Esmeraldas) to 47% (North Western and North Cagayan). Non-FC-specific confidences were especially low (35-36%) in all the Ecuadorian regions and highest in the North and South Cagayan Valley regions in the Philippines (44% and 46%, respectively). The maps in Zambia and the Philippines had the highest average total confidence values (42% and 44%), compared with Ecuador (38%). Regions in earlier deforestation contexts had higher overall confidences (45%) than regions in middle (40%) and advanced (39%) contexts, in line with decreasing confidences for the forest class (46%, 39% and 37%).

Relative importance of variables
Elevation was the most decisive variable across the study regions (Fig. S7). This band ranked among the five most important variables for classification accuracy in every region.
Among the Landsat-derived variables, moisture-related indices (NDMI and TCw) generally ranked higher than greenness-related ones. However, some greenness variables, such as NDVI and TCg, were still very relevant in the classification of certain regions (e.g. South Cagayan Valley, Esmeraldas, Copperbelt and Leyte). The individual Landsat bands were also relatively important to the classification outputs, with each of them contributing in specific regions. The ultra-blue (coastal/aerosol) band ranked the highest across regions, while the green, red and SWIR bands were also relevant in specific areas.
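For reference, the two normalized-difference indices named above can be computed from Landsat-8 OLI surface reflectance as sketched below, assuming the standard band assignment (band 4 red, band 5 NIR, band 6 SWIR1). The tasseled-cap components (TCw, TCg) require sensor-specific coefficients and are not reproduced here.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), returning NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    return np.divide(a - b, denom, out=np.full(denom.shape, np.nan), where=denom != 0)


def ndvi(red, nir):
    """Greenness: NDVI = (NIR - Red) / (NIR + Red); Landsat-8 bands 5 and 4."""
    return normalized_difference(nir, red)


def ndmi(nir, swir1):
    """Moisture: NDMI = (NIR - SWIR1) / (NIR + SWIR1); Landsat-8 bands 5 and 6."""
    return normalized_difference(nir, swir1)
```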
The Sentinel-1-derived variables also contributed substantially to the accuracy of the different classifications. Overall, the textures ranked higher than the backscatter signal (sigma0) across polarizations and points in time. For instance, the GLCM mean of the last image ranked second among all the studied variables. In general, the VH polarization yielded the best results, in both the first and the last scenes. The VV bands of the old (first) scenes contributed more to the accuracy of the classifications than those of the new (last) images. The difference band (VV-VH) performed worst when compared to VV and VH.

Thematic accuracy assessment
The detailed error matrices of all the analyzed maps, with the results for LCLUs grouped by country and deforestation context, can be found in the supplementary material (Tables S9 to S14).

• Overall accuracies
Our produced forest maps (Table 4) had an overall accuracy of 92%. User accuracies (precisions) of 92% and 93% were observed for forest and no-forest, respectively. Our maps presented better producer accuracies (sensitivities) for the forest class (96%) than for the no-forest category (85%).
Of all the analyzed sources (Fig. 4), our maps and the national datasets had the highest overall accuracies for the total sample (92%). Among the global sources, the GFC dataset had the highest overall accuracy (91%). The other three global maps reported overall accuracies of 88% (JAXA-FNF), 86% (TanDEM-X-FNF) and 85% (CGLS-LC100).
Our forest maps showed higher overall accuracies in Zambia and the Philippines (96% for both) than in Ecuador (79%). The same pattern was observed in all the analyzed global sources. In Zambia, the national LCLU maps presented the lowest overall accuracy (89%), compared with the global datasets (92% to 96%). Within Zambia, overall accuracies were lower in the Eastern Province. In Ecuador, the national LCLU maps provided the best overall accuracy (93%), whereas the remaining five datasets (our maps and the four global datasets) presented relatively unsatisfactory overall accuracies across the three Ecuadorian regions (48% to 87%). In the Philippines, the national datasets and our maps achieved the best results (95% and 96% overall accuracy, respectively), in contrast to accuracies between 79% and 91% for the secondary global datasets. Within the Philippines, overall accuracies were consistently lower for Leyte. The Philippines was also the only subsample in which a global dataset other than GFC, namely JAXA-FNF, provided the highest accuracy among the global sources.
The overall accuracies of our forest maps were higher in regions with initial deforestation contexts (96%) than in regions with middle or advanced ones (89% and 90%, respectively). We observed a similar trend in all the secondary sources, with the exception of the national and TanDEM-X-FNF maps.

• Sensitivity of LCLU classes and forest disturbance regimes
Reference forests showed the highest sensitivities (producer's accuracies) among the analyzed forest disturbance regimes in three datasets: LaForeT (93%), national (92%) and JAXA-FNF (93%) (Fig. 5). The other three maps reported higher sensitivities for degraded forests, which averaged 90% across all the studied datasets. Regrowth forests were the forest class with the lowest sensitivities (75% on average across all maps). Even the national LCLU maps, which showed relatively high overall accuracies, reported the lowest sensitivity among the sources (49%) for regrowth forests. The best sensitivities for a forest subclass were observed in forests without a recorded disturbance history (between 92% and 98%), i.e. in forest areas that had been identified visually in satellite images.
Table 4. Error matrix with the overall results of the produced LaForeT FC maps (total sample).
For deforested vegetation, the best results were obtained by the national, LaForeT and GFC datasets (94%, 85% and 85%, respectively), while the other sources presented lower sensitivities (between 55% and 74%). The CGLS-LC100 dataset and the two SAR-derived global maps (JAXA-FNF and TanDEM-X-FNF) reported very low sensitivities, even in non-vegetated areas (i.e. built-up areas and waterbodies). All the sources showed higher sensitivities for annual croplands, with values between 84% and 94%. Results were worse for the other deforested vegetation subclasses, namely non-forest tree-based systems (e.g. agroforestry, palms and perennials) and grasslands. The worst results were observed for shrublands (mainly in Zambian landscapes with degraded forests), with accuracies consistently below 65%.
In general, the sensitivities of all the forest subclasses decreased in regions and countries with more advanced deforestation contexts, while the opposite trend was observed for deforested vegetation (Fig. 6). Overall, the maps showed higher sensitivities for all forest subclasses in Zambia and Ecuador, whereas deforested vegetation was mapped with higher sensitivity in the Philippines. The secondary global forest maps were particularly inaccurate in mapping deforested vegetation, while the national maps delivered the best sensitivities for this category in all the analyzed deforestation contexts and countries. On average, regrowth forests had the lowest sensitivities among the forest subclasses, regardless of the analyzed country or deforestation context.

FC estimations
Details on the FC estimates for all the landscapes (individually and grouped by region, country or deforestation context) can be found in Table S15 and Figs. S9 to S44. The national LCLU maps reported the lowest FC estimate (57%) for our landscapes (Fig. 7). The highest estimates were those of CGLS-LC100 (75%) and TanDEM-X-FNF (74%), followed by our maps (66%), JAXA-FNF (64%) and GFC (62%). In line with our study design, FC estimates decreased gradually in regions with middle and advanced deforestation contexts for all the compared datasets. At the same time, discrepancies between maps increased along this gradient (Fig. 8).
Fig. 4. Overall accuracies (range 50-100%, with the 100% value corresponding to the outer ring of the presented hexagons) of the different compared regional maps for the total sample and the different subsamples (countries, regions and deforestation contexts).
In Zambia, all the sources provided similar FC estimates for the landscapes in North Western (from 78% to 89%) and Copperbelt (from 59% to 71%). In contrast, the FC estimates for the landscapes in the Eastern region varied substantially, between 9% (CGLS-LC100) and 59% (GFC). In Ecuador, the FC estimates of all the non-national sources were much higher, from 76% (LaForeT) to 95% (CGLS-LC100), than that of MAE's maps (61%). These discrepancies were stronger in Esmeraldas and in the Amazon frontier. Similarly, CGLS-LC100 (71%) and TanDEM-X-FNF (76%) provided higher FC estimates in the Philippines than the other sources (36% to 48%). These discrepancies were particularly strong in Leyte and South Cagayan.
Fig. 9 shows the spatial agreements between our maps and the secondary sources in all the studied regions. The extended results for all dataset combinations and subsamples are given in Table S17. The overall spatial agreements between the different sources varied little, ranging from 76% to 83%, with similar results in the three countries. In general, the class-specific spatial agreements for forest (82% to 88%) were higher than those for the no-forest class (62% to 74%), which were particularly low in Ecuador (32% to 65%). Only in the Philippines were the specific agreements for the no-forest class similar to or even higher (68% to 92%) than those for the forest class (58% to 85%). The overall and forest-specific agreements gradually decreased in regions with more advanced deforestation contexts: overall agreements ranged from 83% to 90% in initial, from 74% to 83% in middle and from 59% to 78% in advanced deforestation contexts. In contrast, no-forest-specific agreements remained similar across deforestation contexts or even increased in later forest transition stages, ranging from 60% to 72%, 60% to 74% and 61% to 81% for initial, middle and advanced deforestation contexts, respectively.
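The overall and class-specific agreements between two co-registered binary maps can be sketched as follows. The overall agreement is the share of pixels assigned to the same class; for the class-specific agreement we assume the common definition 2·n_ab / (n_a + n_b), which may differ in detail from Yang et al. (2017). The nodata value of 255 is also an assumption, and both maps are assumed to be already resampled to a common grid.

```python
import numpy as np


def spatial_agreement(map_a, map_b, nodata=255):
    """Overall and class-specific agreement of two co-registered binary FNF maps.

    Class-specific agreement is assumed to be 2 * n_ab / (n_a + n_b).
    """
    valid = (map_a != nodata) & (map_b != nodata)
    a, b = map_a[valid], map_b[valid]
    overall = float(np.mean(a == b))
    specific = {}
    for cls in (0, 1):  # 0 = no forest, 1 = forest
        in_a, in_b = a == cls, b == cls
        denom = int(in_a.sum() + in_b.sum())
        specific[cls] = 2 * int(np.sum(in_a & in_b)) / denom if denom else float("nan")
    return overall, specific
```

Unlike the overall agreement, the class-specific value is not inflated by a dominant class, which is why it can reveal the low no-forest agreements reported for Ecuador.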

Fig. 5. Sensitivity or producer accuracies (range 50-100%, with the 100% value corresponding to the outer ring of the presented hexagons) of the specific LCLU types and forest disturbance regimes (based on the forest transition theory) in the analyzed datasets for the total sample. Note: The first row depicts forest disturbance regimes, represented by the different stages related to the forest transition. The second row shows the results for the specific LCLU types within the deforested vegetation category. The third row includes LCLU classes not included in the analysis of the forest disturbance regimes.

Mapping of tropical forest
We succeeded in developing a standardized and consistent methodology to generate accurate high-resolution (30 m) forest maps for various tropical regions across three continents. Overall, our findings reaffirm the potential of combining innovative machine learning techniques with the fusion of freely accessible multi-sensor and multitemporal satellite information to improve the outputs of tropical forest mapping (Reiche et al., 2018; Wang et al., 2019). Our reference dataset, along with the produced maps and methods, can be used in future studies to analyze additional forest disturbance or LCLU aspects in the tropics.
The application of a non-parametric classifier such as the RF algorithm presented the advantage of dealing with several bands, indices and textures per pixel, capturing the physical and spectral differences of forest between the analyzed regions (Figs. S2 to S4 and S8).
Elevation was the only variable that strongly enhanced the map outputs in all the studied regions. This highlights the potential of DEMs as valuable auxiliary information to improve LCLU classification accuracies by, for example, reducing the relief effect in satellite images or predicting disturbance susceptibility (Fahsi et al., 2000). We also interpret elevation as an indicator of accessibility, a key determinant of deforestation in the tropics that we observed across the studied landscapes. Moreover, our findings reaffirm the relevance of wetness-related indices, compared with greenness-related ones, for the effective monitoring of FC in the tropics (Schultz et al., 2016). Similarly, the importance of the ultra-blue band could be related to mist/haze and other fine aerosol particles, which are characteristic of areas with continuous rain and cloud cover (Pöschl et al., 2010). In further studies, it might be worthwhile to incorporate more complex indices related to canopy density (e.g. Normalized Difference Fraction Index) or leaf surface properties (e.g. Leaf Area Index), which have yielded satisfactory results in the past (Souza et al., 2013). Finally, our findings expand on recent developments in the field of SAR by confirming the advantages of using textural information derived from Sentinel-1 backscatter (i.e. the recurring importance of the GLCM mean of the VH polarization across regions) to map FC (Numbisi et al., 2019). Additionally, the stronger contributions of certain variables in older scenes (i.e. VV polarization) confirm the importance of including multi-temporal information to capture historical LCLU and FC changes (Pulella et al., 2020).
However, the relative importance of variables in RF models must be interpreted with caution, especially when a large number of predictors is used. Many predictors may lead to serious overfitting problems and biased estimates, due to unaccounted spatial correlation between variables. This may also explain the region-specific results and the unexpected contribution of certain variables (e.g. the ultra-blue band), which may be correlated with other predictors such as elevation (Fig. S7). Further studies should consider preselecting variables in every region, based on expert knowledge or spectral separability.
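As one example of a spectral-separability criterion for variable preselection, the Jeffries-Matusita (JM) distance between forest and no-forest samples can be computed per band under a univariate Gaussian assumption. This is an illustrative choice, not a method used in this study; JM ranges from 0 (identical class distributions) to 2 (fully separable), and the sketch assumes non-zero class variances.

```python
import numpy as np


def jeffries_matusita(x1, x2):
    """Univariate JM distance (0..2) between two class samples of one band.

    Assumes Gaussian class distributions with non-zero variances.
    """
    m1, m2 = np.mean(x1), np.mean(x2)
    v1, v2 = np.var(x1), np.var(x2)
    # Bhattacharyya distance for two univariate normal distributions.
    b = 0.25 * (m1 - m2) ** 2 / (v1 + v2) + 0.5 * np.log((v1 + v2) / (2 * np.sqrt(v1 * v2)))
    return float(2 * (1 - np.exp(-b)))


def rank_bands(forest, nonforest, names):
    """Rank bands (columns of the sample arrays) by forest/no-forest JM distance."""
    scores = {n: jeffries_matusita(forest[:, k], nonforest[:, k]) for k, n in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)
```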
Furthermore, comparisons of the results for the studied sensors (Landsat-8, Sentinel-1) need to be made critically, due to the substantial differences in the type and temporal availability of the data used. For instance, creating Landsat-8 seasonal mosaics over a relatively long 3-year period led to very different timestamps per map, depending on regional cloud cover. Additionally, the quality and density of these mosaics decreased drastically in areas with poor data availability (i.e. Ecuador). In contrast, the Sentinel-1 data consist of single observations for only two points in time. Further studies could try to increase data density and ideally perform a time-series analysis by extending the analysis period or the number of sensors. This could improve the poor results obtained for certain LCLUs that underwent recent changes. In addition, some processing steps may be optimized, such as using the median (instead of the average) to reduce the blur of the optical mosaics, or using multi-temporal speckle filters for the SAR scenes (Wang et al., 2019; Woodcock et al., 2020).
Fig. 6. Sensitivity or producer accuracies (range 50-100%, with the 100% value corresponding to the outer ring of the presented hexagons) of the forest disturbance regimes in the analyzed datasets, grouped by deforestation contexts (left) and countries (right). Note: Deforestation contexts, countries and forest disturbance regimes are represented by the different stages related to the forest transition.
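The difference between mean and median compositing can be seen in a small sketch: with clouds, shadows and missing data masked as NaN, a single residual haze-contaminated observation pulls the mean but not the median. The (time, rows, cols) array layout is an assumption; the function also returns the number of valid observations per pixel, the kind of count reported in the cloud-cover results.

```python
import warnings

import numpy as np


def composite(scenes, method="median"):
    """Per-pixel composite of a (time, rows, cols) stack with NaN for masked data.

    Returns the composite and the number of valid observations per pixel.
    Pixels without any valid observation stay NaN.
    """
    count = np.isfinite(scenes).sum(axis=0)
    reducer = {"median": np.nanmedian, "mean": np.nanmean}[method]
    with warnings.catch_warnings():
        # All-NaN pixels trigger a RuntimeWarning; they are expected here.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        comp = reducer(scenes, axis=0)
    return comp, count
```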

Comparing tropical forest maps
Our extensive field campaign to collect training and validation data in situ allowed us to achieve satisfactory classification outputs, which generally outperformed the global secondary maps (Fig. 4). This emphasizes the importance of using up-to-date reference data from the ground, which should ideally include detailed and standardized information about the different forest strata. In line with this, the relatively high accuracies of JAXA-FNF in the Philippines are probably related to the fact that the country was used to train the map's classifier (Shimada et al., 2014). Undefined forests (identified visually in the satellite images) reported the highest producer's accuracies in all the compared datasets and contexts (Fig. 5). We argue that relying only on this type of information for training and validation might omit relevant forest types and lead to wrong FC estimates (Figs. 7 and 8). Certainly, there is a trade-off between reducing the economic and logistic costs of such an extensive field campaign and improving the quality of the generated maps. In this regard, the synergetic development of collaborative and harmonized global reference databases and the integration of NFM and inventory systems in tropical countries are still highly desirable (Fritz et al., 2011).
The generally high accuracies of the maps produced by the national mapping agencies (Fig. 4) are promising, as we analyzed three countries with very different capacities regarding their MRV/NFM systems and their commitments to international reporting (e.g. participation in the REDD+ program) (Nesha et al., 2021). In Zambia (Phiri et al., 2019), where NFM agencies are still undergoing phases of development and capacity building, the recently produced ILUA-II maps performed well, though slightly worse than the global datasets. In Ecuador, MAE's relatively long-established inventory and mapping capabilities delivered satisfactory overall accuracies, in contrast to the disconcerting results of all other datasets, which noticeably overestimated FC (Fig. 7). To produce its regularly updated national LCLU and deforestation maps, MAE uses a combination of Landsat time series and very high resolution imagery for training and validation (i.e. RapidEye and aerial photographs) (MAE-MAGAP, 2015). In the Philippines, where global secondary sources again generally overestimated FC (Fig. 7), NAMRIA's 2015 maps reported the best accuracies in the three studied regions. This suggests that the Philippine national mapping agency has improved on the quality of its previous LCLU datasets (Estoque et al., 2018; Santos, 2018).
Nevertheless, any comparison of results between regions or between map sources should be made critically. For instance, the quality of the different maps depends on their scale and purpose, but also on the sensors used (active vs. passive) and the related resolutions and processing steps. Relatedly, the size of the uniform LCLU patches observed on the ground, which should match the minimum mapping unit permitted by the resolution of the satellite sensors used, is region-dependent (Table 2). This could explain the generally better results in Zambia, where larger patches were observed, and the difficulties in detecting smaller deforested vegetation patches in Ecuador (Smith et al., 2003), which are usually surrounded by taller forests with denser canopy cover (Fig. S8). Furthermore, cloud cover clearly affected the confidences of our maps and the overall accuracies of the global maps in the Philippines and especially in Ecuador, but barely in Zambia. Additionally, the temporal gap between data collection and scene acquisition (Fig. S1 and Table S3) or map production (Table 3) might explain the better accuracies of datasets in specific regions (e.g. JAXA-FNF in the Philippines).
Further studies could try to mitigate this limitation by using auxiliary information to update outdated maps, such as the GLAD alerts in the case of GFC (Hansen et al., 2016). Regarding this dataset, our findings confirm that a preliminary definition of a TC threshold can match the diverse forest definitions and deliver improved classification accuracies (Galiatsatos et al., 2020), even if there is a temporal gap with the validation data. The GFC analysis (Fig. S8) also underpins the strong regional dependency of ecological features (i.e. TC) and the high sensitivity of map outputs to these biological aspects. For instance, the presence of other tree-based systems commonly misclassified as forest (Fig. 5) has probably affected the classifications of certain regions negatively. The clearest examples are Esmeraldas in Ecuador, with large oil palm plantations, and Leyte in the Philippines, characterized by very steep mountains and the historical expansion of coconut palms into parts of the degraded forest over the last decades (Estomata, 2014). Furthermore, the worse results in the Eastern Province of Zambia can be related to the known challenges of mapping sparse forests of dry ecosystems associated with woodlands or savannas (Feng et al., 2016; Hill, 2021). These ecosystems are characterized by lower canopy densities, slower growth rates, lower greenness or water content and problematic LCLUs, such as shrublands (Fig. 5). The better accuracies of our method and of the SAR-based global sources in this region suggest potential advantages of using SAR-derived observations (alone or combined with optical data) to accurately map forests and deforestation in dry tropical areas, as previously demonstrated by other studies (Reiche et al., 2018).
Fig. 9. Spatial agreements between LaForeT maps and the selected secondary forest datasets. See Fig. 1 for reference.

Monitoring tropical forest across forest transitions
Our initial hypothesis, that the different deforestation contexts and their associated forest disturbance regimes strongly influence the classification outputs of regional forest maps in the tropics, is supported by empirical evidence from our analysis. We observed increasing difficulties in distinguishing FC at more advanced stages of our deforestation context gradient. This was manifested as progressively worse classification outputs in regions with middle and advanced deforestation contexts, regarding not only the confidences of our maps (Table S8) and their overall accuracies (Fig. 4), but also the accuracies of the secondary global datasets and the overall and forest-specific spatial agreements among map sources (Table S17). Generally, all the studied forest types reported worse producer's accuracies in middle and advanced deforestation contexts, independently of the analyzed dataset (Fig. 6). Consequently, the FC estimates in these regions presented wider ranges or variances, associated with larger uncertainties and errors (Figs. 7, 8 and 9).
Apart from the specific methodological limitations of each region or dataset discussed in the previous subsections, these findings can also be explained by our general hypothesis. Namely, accelerated LU dynamics in advanced deforestation contexts result in more diverse and complex LC patches of smaller size, such as tree-based systems (e.g. perennial crops, palms and other agroforestry arrangements), shrublands and grasslands (Fig. 5), which make it harder to map forest correctly (Smith et al., 2003). Accelerated LU dynamics also result in more degraded and sparse forests, which further increase the uncertainties of FC measurements and disturbance detection (Feng et al., 2016; Vancutsem et al., 2021). This would also explain why regrowth forests presented worse producer's accuracies than reference and degraded forests across datasets, countries and deforestation contexts (Fig. 6), confirming the challenges of identifying relatively young (less than 20 years) tropical tree plantations and succession forests grown in areas that had been completely clear-felled (Caughlin et al., 2020; Li et al., 2017).
The number of rehabilitation and reforestation initiatives in tropical landscapes is growing, as forests are a specific target within Goal 15 of the Sustainable Development Goals for 2030 (SDGs) (Holl, 2017). For instance, FLR projects within the Bonn Challenge have 350 million hectares pledged worldwide, together with country-led partnerships, such as Initiative 20 × 20 or AFR100. Other examples are afforestation and reforestation projects within the Clean Development Mechanism (CDM) or the Great Green Wall project in Africa, which aims to restore 100 million hectares of currently degraded land by 2030. The goals of these initiatives (increasing vegetation cover, biodiversity recovery and recovery of ecological processes) often synergize with those of other relevant programs in place, like REDD+ (Verchot et al., 2018). Yet, as forest protection and rehabilitation measures continue to bloom in the tropics, so does the need for rigorous monitoring and improved implementation and reporting mechanisms (Murcia et al., 2016;Stanturf et al., 2019).
Our findings suggest that the recommendation to use forest datasets carefully, and rather as a reference, is especially relevant in regions with more advanced stages of degradation/deforestation or in reforested areas. We argue that these regions with higher rates of FC change also have a greater need for stratified in situ information for training/validation and for improved classification approaches that can be linked to forest condition and landscape multifunctionality. These are precisely the regions where most of the abovementioned environmental programs (e.g. REDD+ or FLR) are likely to take place. Omitting this may lead to wrong FC estimates and therefore to biased conclusions about the success or failure of such international policies.

Conclusion
Our study represents an innovative attempt to analyze forest classification accuracies at pantropical level on the basis of the forest transition theory. In the context of the international 2030 Agenda for Sustainable Development and the Paris Agreement, numerous measures and programs for the conservation, rehabilitation and sustainable use of forests are being implemented worldwide (e.g. FLR, REDD+). Although the goals of these initiatives may be well-intended and desirable, there is a need to improve the technical capacity to measure their success or effectiveness, in order to draw sound conclusions on their contributions to sustainable development. This includes the ability to monitor tropical FC accurately and derive precise estimates of the quality and quantity of the associated ecosystem services. Our pantropical study clearly demonstrated how all the compared national and global forest maps struggled to differentiate forests with a disturbance history from other vegetation types, often resulting in wrong FC estimates. We empirically showed that these difficulties are accentuated in regions with higher rates of FC change (in advanced stages of deforestation or reforestation) and particularly for forests grown in previously deforested areas. We therefore interpret our findings as evidence that deliberations regarding the applicability of secondary forest maps and the establishment of forest monitoring systems should be especially critical in these contexts. Our results also indicate the importance of in situ verification as an accompanying method for MRV in regions at advanced stages of deforestation and early stages of reforestation. This should be relevant for upcoming policy making and research, as these are also the areas where forest protection and rehabilitation measures are most needed.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.