Forecasting deforestation in the Brazilian Amazon to prioritize conservation efforts

As Amazon deforestation rates reach the highest levels observed in the past decade, it is extremely important to direct conservation efforts to regions containing preserved forests with a high risk of deforestation. This requires forecasting deforestation, a complex endeavor due to the interplay of multiple socioeconomic, political, and environmental factors across different spatial and temporal scales. Here we couple high-resolution land-cover maps with Bayesian hierarchical spatial models to identify the main drivers of recent deforestation in the Brazilian Amazon and predict which areas are likely to lose a larger proportion of forest in the next 3 years. Recent deforestation was positively associated with forest edge density, the length of roads and waterways, elevation and terrain slope; and negatively associated with distance to urban areas, roads, and indigenous lands, area designated as protected or indigenous territory, and municipality GDP per capita. From these variables, forest edge density and distance to roads showed the largest effect sizes and highest predictive power. Predictive accuracy was highest for shorter time windows and larger grid sizes. Predicted deforestation was largely concentrated in the North-Eastern portions of the Brazilian Amazon, and amounted to roughly 3, 5, and 6 million hectares for 2020, 2021, and 2022, respectively. About 50% of this predicted deforestation is expected to occur inside protected areas or indigenous lands. Our short-term forecasts can help plan preventive measures to limit deforestation while meeting the specific needs of local areas.


Introduction
The Amazon rainforest contains half of the planet's remaining tropical forests and holds 10% of the global carbon reserves, stabilizing local and global climate (Brienen et al 2015, IPBES 2018. It constitutes the most extensive collection of living plants and animals, holding 10% of the world's known species, and is home to more than 30 million people, including more than 350 ethnic groups (WWF 2016). Given the paramount importance of the Amazon rainforest for global climate and biodiversity, substantial national and international policies and resources have been directed to protect its standing forests (Ometto et al 2011, Nepstad et al 2014, Tollefson 2015, Levis et al 2020, resulting in a drastic decrease in deforestation rates since 2004 (PRODES 2020). However, deforestation rates have been increasing since 2015, and in 2020 rates reached the highest levels of the decade (PRODES 2020, Silva Junior et al 2021). As immense forested areas are being destroyed and fragmented, it is imperative to direct conservation efforts to regions containing preserved forests with a high risk of deforestation (Stropp et al 2020).
Forecasting deforestation across the Amazon is nevertheless a complex endeavor due to the influence of multiple socioeconomic, political, and environmental factors operating across different spatial and temporal scales (Laurance et al 2001, Soares-Filho et al 2006, Rosa et al 2013, Swann et al 2015, Bax et al 2016, Sales et al 2017, Vijay et al 2018, Brito et al 2019. Although past efforts forecasting Amazon deforestation have employed binary forest cover layers representing primary forest (Rosa et al 2013, Swann et al 2015, Sales et al 2017, Brito et al 2019, high-resolution, accurate land-cover maps recently became available for the entire Amazon biome, from 1985 to 2019 . These new maps allow modeling the deforestation of both primary and secondary forests (Nunes et al 2020) and assessing fine-scale changes in forest composition and configuration across the time series. Additionally, most past Amazon deforestation forecasting efforts generated categorical maps, indicating areas with a high probability of being deforested in the future, and only two studies created continuous predicted deforestation maps (Rosa et al 2013, Sales et al 2017. Importantly, continuous maps allow attributing a specific deforestation risk to a delimited region, thereby enabling policy makers to prioritize preventive actions in the field and allocate their available resources accordingly. Lastly, deforestation is a highly non-random process that tends to be aggregated in space, more likely to occur in the proximity of previously deforested areas (Brito et al 2019). Despite this well-known property of deforestation, this spatial autocorrelation in deforestation has rarely been considered in forecasts (Sales et al 2017), even though doing so may allow for improved prediction and less biased inference in understanding factors driving deforestation.
Recent advances in spatial modeling increase the ability to reliably capture spatially autocorrelated processes in space and time across large regions. Bayesian hierarchical spatial models implemented with integrated nested Laplace approximations (INLA) allow relatively fast Bayesian inference by using numerical approximations (Rue et al 2009(Rue et al , 2017. When combined with a stochastic partial differential equation (SPDE) for modeling the spatial correlation, INLA-SPDE has shown many powerful applications, including predicting spatially autocorrelated responses (Lindgren and Rue 2015, Zuur et al 2017. INLA-SPDE allows the implementation of complex models using simple code, grants fast inference even for spatial problems with hundreds of thousands of observations , and usually outperforms other spatial modeling and machine learning approaches (Bhatt et al 2017). INLA-SPDE thus represents a more straightforward, flexible, and reproducible approach to forecast Amazon deforestation than those based on multiple complex modeling steps (Rosa et al 2013, Bax et al 2016, Sales et al 2017, Vijay et al 2018, Brito et al 2019 or simulations with pre-defined static parameters (Soares-Filho et al 2006, Swann et al 2015. Additionally, INLA-SPDE allows performing variable selection using different approaches and across spatial and temporal scales, thereby providing an ideal modeling framework to predict deforestation in a specific time window, over the entire biome, or in regions of special interest. Here we couple high-resolution, accurate landcover maps across the biome with INLA-SPDE models to identify the main drivers of recent Amazon deforestation and predict which areas are likely to lose a larger proportion of forest. Our main aim was to implement a flexible spatial modeling approach that can be easily replicated to generate shortterm (1-3 years) deforestation forecasts across the Brazilian Amazon as well as in specific regions. Since our models can be easily adapted to reflect different temporal and spatial scales, and their predictions can be readily updated as new data become available, we hope they facilitate the planning of preventive measures targeted to the specific needs of certain areas, and thereby help reduce future Amazon deforestation. We provide all code and data needed to replicate our analyses, and an interactive map enabling visualizing our predictions.

Dataset
The deforestation process can be separated into an occurrence and an intensity component (Sales et al 2017). Since the bulk of short-term deforestation occurs in the proximity of deforested areas (Brito et al 2019), we chose to focus on modeling deforestation intensity. Deforestation was thus measured as the proportion of forest replaced by anthropic land-cover classes in a specific area during a given time interval, quantified as the proportional change in the number of forest pixels on land-cover maps from MapBiomas. Forest pixels replaced by water or non-forested natural areas (like wetland, grassland, or rocky outcrop, see full class legends in https: //mapbiomas.org) were not considered as deforested, since these do not represent anthropic landcover classes. As our goal was to provide short-term deforestation forecasts (2020-2022), we only analyzed deforestation occurring in the recent past (see details in section 2.3). We restricted our analyses to the Brazilian Amazon (containing approximately 60% of all Amazon forest), because it has unified databases covering protected areas, indigenous lands, socioeconomic indicators, etc. Additionally, the entire region is subject to the same national legislation regime.
We used Google Earth Engine to analyze landcover maps from MapBiomas (collection 5.0), and quantified deforestation in grids of different sizes (see details in next section) covering the entire Brazilian Amazon. MapBiomas comprises a multidisciplinary network that reconstructed annual land use and land-cover information between 1985 and 2019 for Brazil, based on a random forest model applied to the Landsat archive using Google Earth Engine (Souza et al 2020) (see updated methods here: https://mapbiomas.org/en/methodologyoverview). They mapped five major classes, which were subsequently broken into two sub-classification levels leading to the most comprehensive and detailed mapping for the country at a 30 m pixel resolution. The average overall accuracy of the land use and land-cover time-series, based on a stratified random sample of 75 000 pixel locations, was 95% in the Amazon biome .
To model deforestation and generate forecasts, we first selected a suite of putative deforestation predictors. To do so we performed a non-exhaustive literature review and listed all socioeconomic, environmental and landscape variables found to be associated with Amazon forest deforestation in previous studies (table S1 (available online at stacks.iop.org/ ERL/16/084034/mmedia)). We then selected all variables that could be retrieved or calculated from open access repositories or spatial layers, assessed missing values and cross-correlations, and chose a set of 20 relatively uncorrelated (r < 0.6) predictor variables (table S2). Percent pasture cover was included as a landscape composition metric (highly negatively correlated with percent forest cover), since pastureland is widespread across the Brazilian Amazon and cattle ranching is known to be one of the main drivers of Amazon deforestation (Barona et al 2010, Faria and Almeida 2016, Jusys 2016. Besides pasture, other land-cover classes were rare, spatially clumped, or unrelated to deforestation, so they were omitted. We chose forest edge density to assess landscape configuration, given that it is strongly associated with Amazon deforestation (Broadbent et al 2008, Cabral et al 2018, and was highly correlated with other metrics of configuration (e.g. number of forest patches and mean patch area). Since most spatial layers did not contain temporal information (e.g. when indigenous lands were created, when roads were built, etc), or represented point estimates for a specific year (e.g. socioeconomic indicators for municipalities), we considered these as 'static' variables (Rosa et al 2013). These do not change across time but are useful to differentiate regions by their abiotic and socioeconomic characteristics. In contrast, we retrieved four 'dynamic' variables directly from land-cover maps of each year (forest edge density, proportion of pasture, minimum distance to mining areas, and minimum distance to urban areas), representing predictors that change over time and reflect the condition of each site at a specific year (table S2). All predictor variables were retrieved using the same grids employed to assess deforestation, through custom R functions. Due to the heavy computing burden imposed by these analyses we used a computer cluster comprised of a SGI UV 3000 machine (one Blade with 384 cores, Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10 GHz, and 2 TB RAM) based at Instituto Tecnológico Vale.

Modeling Amazon deforestation
We modeled deforestation intensity using INLA models with a spatially correlated random effect (INLA-SPDE) and a binomial distribution of errors, implemented through the INLA R package (Lindgren and Rue 2015, Zuur et al 2017. We used the proportion of deforested pixels in each grid as the response variable, coded as the ratio of the number of deforested pixels to the total number of pixels on each grid cell. An observationlevel random effect was included in all models to account for over-dispersion. We used constrained refined Delaunay triangulation to define a mesh for the spatial correlated random effect (Zuur et al 2017). The mesh contained an inner area, covering the entire Brazilian Amazon (where triangle edges were smaller than one decimal degree), and an outer area (where edges could reach four decimal degrees) to avoid boundary effects. A cutoff value was used to ensure that sampling locations within less than one decimal degree were replaced by a single vertex (total number of vertices ranged between 692 and 715, figure  S1). This mesh was then used to estimate Gaussian Markov random fields with a Matérn covariance function through SPDE (Lindgren and Rue 2015, Zuur et al 2017, Gómez-Rubio 2020, Moraga 2020. Since the primary goal of modeling was prediction, we approached variable selection explicitly based on cross-validation, rather than model selection criteria. We used spatial blocking crossvalidation (Wenger andOlden 2012, Roberts et al 2017) to identify the combination of predictor variables resulting in highest prediction accuracy across different temporal and spatial scales. Using k-means clustering of geographic coordinates we grouped our sampling locations into five spatial blocks, covering the entire Brazilian Amazon (figure S2). The observed deforestation values from each spatial block were then compared with the predicted deforestation values generated with observations from the remaining four blocks, using root mean square error (RMSE) as a measure of prediction accuracy. This spatial blocking cross-validation approach was implemented borrowing code from the spatial leave-one-out crossvalidation from the INLAutils R package (Lucas et al 2020).
To assess the influence of the temporal scale choice on prediction accuracy, we quantified deforestation during different time windows (2018-2019, 2017-2019, 2016-2019 and 2015-2019), always using 2019 as reference since it is the year with the latest available land-cover map. We employed data from grids of 25 × 25 km for all base years, given that this grid size represented smaller areas than that of most municipalities in the Brazilian Amazon ( figure S3). We then fitted a full model for each base year, containing all 20 predictor variables (forest edge density was the only predictor showing a nonlinear effect on deforestation, so it was modeled as a random walk trend). All predictor variables were scaled and centered, and only additive effects were modeled, ignoring possible interactions. We ran spatial blocking cross-validation to compare each of these full models with reduced models where each predictor was removed one by one. The process was repeated starting with a new full model where the predictor variable resulting in the highest drop in RMSE was excluded. We ran several rounds of this variable selection protocol until reaching a model where the removal of any predictor variable did not reduce RMSE and chose this as our final model. Variable selection was performed in the computer cluster described above and took, on average, 62 h per run (base year).
Having identified the combination of predictor variables resulting in highest prediction accuracy for each time window, we then assessed the influence of grid size on prediction accuracy. To determine the influence of the spatial grain choice on prediction accuracy, we quantified deforestation in squared grids of different sizes (5 × 5, 10 × 10, 15 × 15, 20 × 20, 25 × 25, 30 × 30, 35 × 35, and 40 × 40 km) covering the entire Brazilian Amazon. We then fitted models using data from grids of all sizes for a single time window (2018-2019), employing the same set of predictor variables resulting in highest prediction accuracy in the models above (see results). RMSE was then computed for each grid size using spatial blocking five fold cross-validation.
To illustrate how our predictive approach can be down-scaled to smaller areas, we re-ran analyses for a smaller region at a finer spatial scale. We selected a 100 × 100 km area surrounding the Altamira region in the State of Pará, which showed high predicted deforestation across all periods (see results). We then retrieved deforestation and predictor variables for this region at the smallest spatial scale (5 × 5 km), across all years, and ran a full variable selection protocol employing spatial blocking crossvalidation. We were thus able to assess how variable selection results and deforestation predictions generated for a smaller region compare to those of the entire Brazilian Amazon.

Predicting future deforestation
We employed the best fitting INLA-SPDE models for each base year (containing the combination of predictor variables resulting in highest prediction accuracy) to predict future deforestation using the same set of scaled and centered predictors retrieved for the reference year of 2019. Since each model was fitted for a different time window in the past, predictions represent the same time window in the future (2018-2019 models were used to predict 2019-2020 deforestation, while 2016-2019 models predicted deforestation in 2019-2022). We restricted our predictions to three years in the future, as prediction accuracy decreased sharply afterward (see results). We did not assess temporal forecast error because this would have implied moving further into the past (assessing a 3 year prediction would require fitting models on 2013-2016 deforestation to predict 2019), and thus assessing the effect of different deforestation drivers under lower deforestation rates (Silva Junior et al 2021). Models were ran joining estimation and prediction stacks, whereby the latter contained missing values (NA) as response variable. Predicted deforestation proportions for each grid were then transformed to hectares, considering a 30 m pixel resolution (900 m 2 = 0.09 ha), saved as vector (shapefile) layers, and used to estimate total predicted deforestation per year as well as deforestation inside protected areas and indigenous lands. We calculated confidence intervals for yearly deforestation estimates by multiplying RMSE from each model by the total number of analyzed pixels across all grids, given that our response variable and error measure were proportions. Finally, we generated interactive maps with the leaflet package (Cheng et al 2019), and static maps using QGIS (QGIS Development Team 2020). A flowchart summarizing the methods to predict recent Amazon deforestation is presented in figure S4.

Results
Recent deforestation was positively associated with forest edge density, the length of roads and waterways, elevation and terrain slope; and negatively associated with distance to urban areas, roads, and indigenous lands, area designated as protected or indigenous land, and municipality GDP per capita (table S3). From these variables, forest edge density and distance to roads showed the largest effect sizes and highest predictive power, as revealed by the change in RMSE (figure 1). Variable selection through spatial blocking cross-validation showed reasonably consistent results across time windows, as eight predictors were selected in all three years, five in two years, and only two were chosen in a single year (figure 1 and table S3). Five variables where excluded in all years (distance to mining and urban areas, length of State roads, proportion of pasture, and aridity; table S3), so they were excluded when assessing prediction accuracy across different grid sizes. Predictive accuracy was highest for shorter time windows and larger grid sizes (RMSE for a 3 year prediction using 25 × 25 km grids was 0.018 representing approximately 1136 ha, figure 2). Moreover, we found a lower temporal consistency in variable selection results when analyzing a smaller region (figure S5), and predictive accuracy was lower when compared with that of the entire Brazilian Amazon (RMSE for a 3 year prediction of the Altamira region was 0.08, representing approximately 189 ha in the 5 × 5 km grid).
Predicted deforestation was largely concentrated in the North-Eastern portions of the Brazilian Amazon, and amounted to 3.36 ± 4.70, 4.84 ± 6.18 and 6.10 ± 7.27 million hectares for 2020, 2021, and 2022, respectively (figures 3 and S6). Deforestation predictions are plausible since predicted deforestation in each grid was always smaller than the area occupied by existing forests in 2019. A substantial fraction of this predicted deforestation is expected to occur inside protected areas and indigenous lands (approximately 20% and 30% of total predicted deforestation, respectively; figure 4). For instance, predicted deforestation in the Altamira region concentrates inside the Ituna/Itatá Indigenous land (figure 3).

Recent past deforestation
Our results confirm previous studies reporting a significant association between deforestation and length and distance to roads and rivers, protected areas and indigenous lands, and forest edge density (figure 1 and table S3) , Laurance et al 2009, Soares-Filho et al 2010, Armenteras et al 2013, Barber et al 2014, Cabral et al 2018. Forest edge density showed the strongest effect size on deforestation (Broadbent et al 2008, Cabral et al 2018, suggesting that forest fragmentation facilitates human access and results in further deforestation. These results indicate that initiatives to reduce fragmentation (through active restoration or natural regeneration) and preserve contiguous forests may reduce deforestation risk (Orsi andGeneletti 2010, Tambosi andMetzger 2013).
Contradicting earlier findings, our models did not reveal significant associations between deforestation and the proportion of pasture (Barona et al 2010, Rosa et al 2013, Faria and Almeida 2016, Jusys 2016 nor the distance to mining areas (Sonter et al 2017). The lack of an effect of pasture suggests that recent deforestation is no longer driven by cattle ranching, which is supported by MapBiomas transition   analyses (only 1.5% of all forest in the Brazilian Amazon was transformed into pasturelands between 2015 and 2019). Secondary forests (which were analyzed jointly with primary forests) exhibit different deforestation dynamics than primary forest (Nunes et al 2020), so they could have contributed to mitigate the effect of cattle ranching on deforestation. Alternatively, an intensification of production on current agricultural lands could be reducing demand for new areas (Lapola et al 2014, Strassburg et al 2014. The absence of an effect of distance to mining areas, on the other hand, suggests that the result reported by Sonter et al (2017) is restricted to older mining leases (they analyzed operations approved before 2005) and deforestation accumulated for a longer time window (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015). Since our posterior means were always positive (indicating increasing deforestation further away from mining areas) and credible intervals wide, our results suggest that recent deforestation in the Brazilian Amazon is not driven by mining, although distance to mining sites might help predict future deforestation in specific regions (figure S5). We also found higher deforestation associated with steeper terrain and higher elevation, whereas Bax et al (2016) and Rosa et al (2013) found lower deforestation at higher elevations. This discrepancy suggests that the effect of elevation on deforestation may change in time (indeed we only detected a significant elevation effect in 2016). Considering that elevation across the Brazilian Amazon is relatively low (median = 173 m above sea level, 95% quantile = 483.07 masl), our findings suggest that areas which are more prone to flooding (lowlands and flatlands) are less attractive for human development and therefore less likely to be deforested.
In line with previous studies, we found higher deforestation in municipalities with lower per capita GDP (Faria andAlmeida 2016, Jusys 2016) (but total municipality GDP was found to be positively associated with deforestation (Rosa et al 2013)), suggesting that municipalities with a higher income per capita have lower incentives to cut down the forest. Although the indigenous population in Brazil shows lower socioeconomic and health indicators than the non-indigenous population (Coimbra et al 2013), their traditional way of life and culture depends on forest resources, so they have incentives to avoid deforestation. Indeed, our results show an inhibitory effect of indigenous lands and protected areas on deforestation, as expected (Nepstad et al 2006, Baragwanath andBayi 2020). However, we also found higher deforestation in the proximity of indigenous lands and protected areas, which suggests increasing pressure on protected forests (Guerra et al 2019).

Predicted future deforestation
Variable selection through spatial blocking crossvalidation showed more consistent results across temporal than across spatial scales (figures 1 and S5), revealing that the effect of different deforestation drivers is strongly dependent on the chosen spatial scale (Fletcher and Fortin 2018). These results suggest that a flexible variable selection approach yields higher predictive accuracy than those based on a fixed set of predictors with pre-defined effects on deforestation (Soares-Filho et al 2006, Brito et al 2019. Our deforestation predictions were nevertheless constrained by the small number of dynamic variables available (those which changed over time, table S2), which limited the temporal resolution of our predictions. For example, the available layer of unpaved roads has not been updated since 2012, so our models likely underestimated the effect of road proximity (although a dynamic forest edge density predictor might have partially compensated for this limitation). Moreover, deforestation intensity may vary with time since human interventions (Sales et al 2017). For instance, the effect of distance to roads was stronger when considering longer time windows (figure 1). Incorporating temporal information into spatial layers could thus transform some of our static into dynamic variables, thereby improving predictive accuracy. Additionally, integrating climate forecasts could also improve our models by helping them capture feedbacks between climate change and deforestation (Bonan 2008, Staal et al 2020, Xu et al 2020. As the temporal resolution of land cover maps increases (MapBiomas maps for any given year are usually released in August of the following year), the temporal forecast error could be assessed by comparing shortterm predictions with recently quantified deforestation. Finally, comparing our approach with other spatial modeling techniques (Bhatt et al 2017) could help identify the method resulting in highest predictive accuracy.
Our forecasts indicate that a forested area larger than Denmark will be lost by 2021 inside the Brazilian Amazon. Although our predictions are plausible, we note that confidence intervals for Amazon-wide deforestation estimates are broad, and caution that we are likely underestimating future deforestation since rates of deforestation increased in 2020 compared with 2019 (Silva Junior et al 2021). However, our spatial blocking cross-validation approach maximized predictive accuracy across the biome (Roberts et al 2017), so we believe that the spatial pattern of our predicted deforestation should remain largely unaffected by the total amount of forest loss. The bulk of predicted deforestation was concentrated in the North-Eastern portions of the so-called arc of deforestation (Aldrich et al 2012), in the States do Pará and Maranhão. However, several other regions outside this arc of deforestation are also expected to lose large forested areas, including lands west of Boa Vista (Roraima State) and regions surrounding or overlapping with protected areas and indigenous lands (figures 2 and S6). The large predicted deforestation inside protected areas and indigenous lands ( figure 4) is alarming, since it indicates that ongoing deforestation occurring inside these areas is expected to increase, thus undermining the role of these critical instruments to safeguard the Amazon forest , Soares-Filho et al 2010, Rosa et al 2013, Maretti et al 2014, Faria and Almeida 2016, Cabral et al 2018, Brito et al 2019. Indeed, the current Brazilian government has taken actions that undermine the protection of standing forests and promote deforestation (Ferrante and Fearnside 2019, Pereira et al 2020).

Conclusion
Here we implement a straightforward, flexible, and reproducible approach to model Amazon deforestation and generate short-term deforestation forecasts for the entire biome or regions of special interest. Our results reveal that millions of hectares of forests are expected to be lost by 2022, with a substantial amount of deforestation taking place inside protected areas and indigenous lands. Shortterm deforestation forecasts provide key information to develop preventive measures suited to the specific needs of certain areas, such as restoration and conservation initiatives, sustainable agro-industry schemes, recreational programs, or increased surveillance (Nepstad et al 2014, Tollefson 2015, Viana et al 2016, Levis et al 2020. We are nevertheless aware of the need for appropriate vehicles to disseminate these forecasts to the general public, promote social and environmental accountability, and thereby help regional stakeholders plan actions to prevent deforestation. In spite of the gloomy forecast, some remarkable initiatives are already making progress (https://plataforma.alerta.mapbiomas.org/; https://amazonialegalemdados.info/).