Using deep convolutional neural networks to forecast spatial patterns of Amazonian deforestation

Tropical forests are subject to diverse deforestation pressures while their conservation is essential to achieve global climate goals. Predicting the location of deforestation is challenging due to the complexity of the natural and human systems involved but accurate and timely forecasts could enable effective planning and on‐the‐ground enforcement practices to curb deforestation rates. New computer vision technologies based on deep learning can be applied to the increasing volume of Earth observation data to generate novel insights and make predictions with unprecedented accuracy. Here, we demonstrate the ability of deep convolutional neural networks (CNNs) to learn spatiotemporal patterns of deforestation from a limited set of freely available global data layers, including multispectral satellite imagery, the Hansen maps of annual forest change (2001–2020) and the ALOS PALSAR digital surface model, to forecast deforestation (2021). We designed four model architectures, based on 2D CNNs, 3D CNNs, and Convolutional Long Short‐Term Memory (ConvLSTM) Recurrent Neural Networks (RNNs), to produce spatial maps that indicate the risk to each forested pixel (~30 m) in the landscape of becoming deforested within the next year. They were trained and tested on data from two ~80,000 km2 tropical forest regions in the Southern Peruvian Amazon. The networks could predict the location of future forest loss to a high degree of accuracy (F1 = 0.58–0.71). Our best performing model (3D CNN) had the highest pixel‐wise accuracy (F1 = 0.71) when validated on 2020 forest loss (2014–2019 training). Visual interpretation of the mapped forecasts indicated that the network could automatically discern the drivers of forest loss from the input data. For example, pixels around new access routes (e.g. roads) were assigned high risk, whereas this was not the case for recent, concentrated natural loss events (e.g. remote landslides). 
Convolutional neural networks can harness limited time‐series data to predict near‐future deforestation patterns, an important step in harnessing the growing volume of satellite remote sensing data to curb global deforestation. The modelling framework can be readily applied to any tropical forest location and used by governments and conservation organisations to prevent deforestation and plan protected areas.


| INTRODUCTION
To achieve the pledge made by world leaders at COP26 to end deforestation by 2030, innovative approaches to forest protection are urgently required. Previous efforts to curb tropical primary forest loss have failed to have a net positive impact despite countries and companies making substantial commitments (e.g. the New York Declaration on Forests in 2014). In regions of the Amazon, deforestation is surging (Beuchle et al., 2021), which risks pushing the biome towards a critical transition with profound consequences for climate and biodiversity (Lovejoy & Nobre, 2019). Although the ultimate causes of deforestation need to be addressed, notably global demand for agricultural and wood products, interventions to tackle proximate causes, such as illegal logging and mining, are also necessary (Asner & Tupayachi, 2016; Finer et al., 2014). One measure identified as effective in reducing deforestation is 'targeting protected areas to regions where forests face higher threat' (Busch & Ferretti-Gallon, 2017). However, knowing how to optimally allocate limited resources to tackle diverse threats across vast, difficult-to-access tropical forest landscapes is challenging. To enable effective, on-the-ground prevention measures, up-to-date information on the location, relative severity and likely evolution of threats is required.
Governmental and NGO commitments to make timely interventions have led to products that give near-real-time alerts on the location of deforestation events (Hansen et al., 2016; Reiche et al., 2021), but access to these alerts has had little material benefit in terms of curbing deforestation (Moffette et al., 2021). One issue has been that responses based on near-real-time mapping can only ever be reactive. As a result, there have been calls to develop early warning systems that inform decision-makers of the location of near-term deforestation risk (e.g. WWF, 2020). By developing innovative solutions that harness the growing volume of remote sensing (RS) data, interventions stand a greater chance of preventing deforestation. In the longer term, cost-effective conservation plans should not only consider the spatial distribution of conservation features (e.g. species, carbon stocks) and the financial costs, but also account for how threats are likely to evolve (Wilson et al., 2007).
Deforestation is difficult to predict as it results from complex interactions within human-ecological systems, but characteristic drivers, such as agricultural expansion and ease of access, have been identified (Geist & Lambin, 2002). Drivers can be mapped to spatially resolved geographical, economic, social and biological variables (e.g. agricultural land value, distance to roads). Within a statistical or machine learning (ML) framework, these can be correlated to the likelihood of deforestation at a given location, typically by linking the spatial predictor layers to changes in forest extent over a single time step of several years (see e.g. Cushman et al., 2017; Mayfield et al., 2017; Saha et al., 2020 for intercomparisons of different ML frameworks). However, Rosa et al. (2014)'s review of such approaches concluded that they had a limited or poorly defined ability to forecast locations of deforestation, a key reason being that spatially explicit datasets on the drivers of deforestation are often incomplete, out-of-date or unreliable in tropical forest regions. For example, proximity to roads is widely accepted to be a strong predictor of future deforestation (Barber et al., 2014), but roads in tropical forest landscapes are often unofficial, illegal or not comprehensively mapped (Perz et al., 2007), and can be highly dynamic, appearing, expanding and changing course to access new resources. Most studies that have included roads as predictors used static road maps, which is inappropriate in fast-changing contexts, while others have used highly speculative methods to predict the development of roads in tropical forest landscapes (Mena et al., 2017).
With many of the decisions that lead to road development in tropical forest regions likely to remain hidden from public view, a fundamentally new approach to making predictions of the spread of deforestation is required. Analogous issues exist for other landscape features that help predict deforestation including the location/spread of human settlements and agricultural land.
The ML frameworks used to date have been limited in how they represent local context. Without the ability to retain the spatial structures of the data, pixels are represented in isolation or local context is represented as averaged neighbourhood metrics. Contrasting approaches that identify threatened regions based on dynamic data (i.e. resolved over numerous time steps) have tended to focus simply on spatial and temporal trends in the intensity of deforestation (e.g. the emerging hot spot approach of Harris et al., 2017). Although these approaches are easy to implement and update, they are naïve to the drivers of the observed forest loss and therefore unable to reliably predict whether a deforestation front is likely to spread. This risks inefficient targeting of conservation resources. A forecasting approach that could automatically learn and detect the contextual features of the landscape that signal deforestation risk from satellite imagery, and update its predictions accordingly, would bypass the limitations of both existing ML and simple trend-based approaches.
Through implementing modern computer vision approaches, it may be possible for an artificial neural network to learn automatically which features of a landscape are indicative of future deforestation from historic events, thereby avoiding the need to rely on problematic data layers.

KEYWORDS
Amazon, artificial intelligence, convolutional neural networks, deep learning, deforestation forecasting, machine learning, spatial forecasting, tropical forests

Deforestation often exhibits striking spatiotemporal patterns when viewed from space (e.g. the fishbone pattern; de Filho & Metzger, 2006). The emergence of these patterns suggests a degree of predictability, while their characteristic spatial and spectral structures can give clues as to the drivers. Over recent years, the quantity of freely available, high-resolution imagery has been increasing at a rate that has exceeded our ability to harness it to support decision-making. However, deep convolutional neural networks (CNNs), which can retain spatial structure and be trained to automatically extract features of images, have revolutionised the field of computer vision (Voulodimos et al., 2018). They have led to radical advances in our ability to make predictions in countless fields and have recently been taken up by RS researchers (Zhu et al., 2017) and ecologists working with Earth observation data (Brodrick et al., 2019) to generate novel insights. This includes work on detecting logging trails (Abdi et al., 2022), human settlements (Corbane et al., 2021), deforestation (Torres et al., 2021), forest disturbance (Kislov et al., 2021) and quantifying the properties of terrestrial vegetation (Kattenborn et al., 2021). Importantly, however, these studies have focused on assessing the current state of systems rather than predicting future states.
Forecasting deforestation is a temporal extension of the landcover classification task. 2D CNNs (designed for working with image data; Kussul et al., 2017) and networks that incorporate an additional dimension (e.g. time) implicitly in their characterisation of data, including 3D CNNs (Li et al., 2017) and recurrent convolutional neural networks (ConvRNNs; Interdonato et al., 2019), have been shown to have state-of-the-art accuracy in detecting and classifying landcover change. Similarly, Convolutional Long Short-Term Memory (ConvLSTM)-based RNNs have been shown to improve on the performance of ConvRNNs for RS classification tasks (Rußwurm & Körner, 2018). These promising developments suggest that the 2D CNN, 3D CNN and ConvRNN architecture types, if trained on large volumes of spatially resolved historic deforestation data, could characterise and learn the features of the imagery associated with changing forest cover and be used to predict the future location of deforestation (something that has not previously been attempted).
Critically, they preserve and work with the spatial (and temporal) structures of the input datasets and therefore make predictions based on the local context of the scenes they are presented with.
This article describes a range of deep CNN model architectures (MAs) that we designed to predict the risk of forest loss at a given pixel (~30 m) when presented with a multilayered scene (of data from freely available, global datasets) centred on that pixel, with the aim of automatically identifying areas threatened by deforestation. To test these networks, we trained and evaluated their performance on two regions in the Peruvian Amazon, one of the most biodiverse regions on the planet and exceptionally rich in endemic amphibians, birds, fishes, bats and trees (Bass et al., 2010; Jenkins et al., 2013) but facing a range of threats including deforestation driven by copper and gold mining, logging, agriculture, cattle ranching and crude oil extraction (Finer et al., 2014, 2015; Piotrowski, 2019). We trained and tested the models with cloud-free satellite data for 2014-2020 and spatially resolved annual deforestation data for 2001-2020. The best performing model was used to forecast deforestation across both regions in 2021, predicting the risk of deforestation for every forested pixel in the landscape.

| Study regions
This study focused on two regions in the south of Peru (Figure 1). The size of the study regions was suitable: they provided many millions of training points while allowing processing to remain within the computing resources available to us for reasonable experimentation. A preliminary intercomparison of a broad set of models focused on the Madre de Dios (MdD) department of Peru. MdD has approximately 97.4% primary forest cover, and primary forest loss has steadily increased since 2000 (Turubanova et al., 2018). It is almost entirely intact Southwest Amazon moist forest (Olson et al., 2001; Potapov et al., 2017). The Interoceanic Highway, a 2575-km road from the coast of Peru to Brazil, was completed in 2011, enabling access to portions of MdD previously protected from human activity. This increased access supported immigration that has increased pressure on the forest. Legal and illegal gold mining is prevalent, seeding remote deforestation frontiers in otherwise intact forest (Nicolau et al., 2019). The region exports a range of agricultural products including cotton, coffee, sugarcane, cocoa beans, Brazil nuts and palm oil. Large logging concessions and illegal selective logging (particularly for mahogany) place additional pressure on the forest there (Chirinos & Ruiz, 2003). The dynamic and diverse pressures made it a compelling trial region.
The most promising models were further developed and additionally applied to a second region. It centred on the Junín department, included surrounding departments and was cropped by the limit of the Amazon Ecoregion. It has approximately 88% forest cover, primarily split between Southwest Amazon moist forest and Peruvian Yungas (subtropical cloud forest) with some Ucayali moist forest (lowland to premontane moist forests; Olson et al., 2001). While most of its forest is considered primary, there are some highly fragmented regions (Turubanova et al., 2018). Drivers of deforestation are similar to those in MdD but with less mining and more hydrocarbon exploration (Finer et al., 2014, 2015). It gave the opportunity to trial the networks on a similarly sized area but one with greater environmental heterogeneity.

| Datasets
We limited ourselves to global, freely available datasets so that the approach could be replicated and scaled to any forest location. The forest state labels used in this study were taken from the Global Forest Change (GFC) dataset. It tracks the location of forest loss globally and is collated on an annual basis (latest version: 2001-2020) at ~30 m resolution. Loss events are based on a pixel's forest cover (here >30% to ~0%) and are neutral to the cause of transition. Management activities, unauthorised resource extraction and loss due to other biotic or abiotic factors are not differentiated; thus observed patterns of loss relate to all possible causes.
The dataset includes a data mask describing areas of land or permanent water, which we used to filter valid pixels. It also includes the percentage tree cover observed in 2000, which we included as a predictor layer (as information on forest density) and used to filter out pixels with low cover (<30%). From the same source, cloud-free composite Landsat imagery of four spectral bands was available for 2014 to 2020. Near infrared (NIR) and short-wave infrared (SWIR) bands are informative for the RS of vegetation. These were included to allow the computer vision networks to learn features from the spectral signals. We included the Japan Aerospace Exploration Agency's (JAXA) 30 m resolution ALOS Global DSM as a layer to give the networks another dimension to learn from and to exploit features of the topography. It is an L-band SAR derived product, which has better canopy penetration than C-band products and so gives a clearer signal of the terrain. Historic forest change labels were processed into four one-hot encoded layers, which assigned pixels to classes based on how recently (from current time t) they lost forest cover: (i) 0-1 years from t, (ii) 2-4 years from t, (iii) 5-8 years from t or (iv) more than 8 years from t. A pixel that does not register in these layers was free from loss since 2000. These layers encode the proximity, in time and space, of recent forest loss and thereby assist the models in representing contagion effects. All layers (see Table 1) were normalised to take values between 0 and 1 before being fed into the models.
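As an illustration, the recency encoding described above can be derived from a 'year of loss' raster in a few lines. This is a minimal sketch rather than the project's code: it assumes a GFC-style lossyear band in which 0 means no recorded loss and k means loss in year 2000 + k, and the function name is ours.

```python
import numpy as np

def recency_layers(lossyear, t):
    """One-hot encode how recently each pixel lost forest, at time t
    (years since 2000). lossyear: 2D array, 0 = no recorded loss,
    k = loss in year 2000 + k. Returns a (4, H, W) float array with
    bands for loss 0-1, 2-4, 5-8 and >8 years before t."""
    since = np.where(lossyear > 0, t - lossyear, -1)  # -1 = never lost
    bands = np.zeros((4,) + lossyear.shape, dtype=np.float32)
    bands[0] = (since >= 0) & (since <= 1)
    bands[1] = (since >= 2) & (since <= 4)
    bands[2] = (since >= 5) & (since <= 8)
    bands[3] = since > 8
    return bands
```

A pixel that is zero in all four bands has been free from loss since 2000, matching the description above.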
Although additional layers (e.g. distance to roads, settlements) could have been included and may have improved overall accuracy, we wanted to test the ability of the deep CNNs to automatically extract useful features from a set of regularly updated, scalable RS data layers. This tests their utility in poorly mapped or highly dynamic regions. Additional details of the datasets are given in S2.

TABLE 1 Predictor layers. The first three layers are static and the rest change year-on-year (t). All are ~30 m. Abbreviations: top-of-atmosphere (TOA), near infrared (NIR), short-wave infrared (SWIR). b Latest available cloud-free observation at pixel.

| Modelling process
The aim was to train a classifier that, when presented with a scene centred on a single forested pixel, could predict whether the target central pixel would remain forested (0) or transition to a non-forest state (1) in the year ahead. The model training and testing used 2014 to 2020 data, and forecasts were produced for 2021 (a year for which labels were not yet available). This section overviews the model training and testing process. Additional details are given in S4 and corresponding scripts are available at https://github.com/PatBall1/DeepForestcast.
At a time t (years since 2000), the set of valid pixels (J t) were those that had forest cover. When a pixel transitioned from a forest state to a non-forest state, this change was taken to be permanent. Specifically, a datapoint with a specific spatial location (central pixel), j, and point in time, t, consisted of data from a square (2r + 1) × (2r + 1) pixel scene, where r is the number of pixels the scene extends in the vertical and horizontal directions from the target central pixel, of the layers given in Table 1 (stacked bands). This allowed the network to observe what was happening in the neighbourhood of the pixel being considered (i.e. make decisions based on the field of view), which contrasts with typical ML approaches that use information on the pixel in isolation or with averaged neighbourhood metrics that lose the spatial structure of the data. Each datapoint was associated with a forest (0) or non-forest (1) label from the year ahead (t + 1; see Figure 2).
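The scene extraction this implies can be sketched minimally as follows. The function name and edge handling are ours; in practice, pixels too close to the raster edge to supply a full scene would be excluded.

```python
import numpy as np

def extract_scene(layers, i, j, r):
    """Cut a (bands, 2r+1, 2r+1) scene centred on pixel (i, j).

    layers: (bands, H, W) array of stacked predictor layers.
    Raises IndexError near the image edge, where a full scene
    cannot be formed (such pixels would be excluded)."""
    H, W = layers.shape[1:]
    if i - r < 0 or j - r < 0 or i + r + 1 > H or j + r + 1 > W:
        raise IndexError("scene extends beyond the raster")
    return layers[:, i - r:i + r + 1, j - r:j + r + 1]
```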
FIGURE 2 For a given forested pixel at the centre of a scene, the convolutional neural networks learned to predict the state of the pixel (forest/non-forest) in the year ahead based on multispectral satellite imagery, historic loss patterns and the terrain.
For each datapoint, two types of 3D tensors were input into the models. The first, S_j, contained the stacked static layers (see Table 1) and was of shape S_j ∈ R^(s×(2r+1)×(2r+1)), where s is the number of static layers (here 2 or 3, depending on whether the DSM layer was included). The second type, X_j,t, contained the stacked dynamic layers (see Table 1) and was of shape X_j,t ∈ R^(d×(2r+1)×(2r+1)), where d is the number of dynamic layers (here 8). The label associated with the datapoint was defined as Y_j,t+1 ∈ {0, 1}, and took a value of 1 only if the central target pixel was labelled as non-forest in year t + 1. By allowing the models to learn the features (and interactions) of the data layers in tensors that are associated with the year-ahead forest/non-forest labels, the model could then make predictions of the year-ahead label (Ŷ_j,t+1) when presented with new input tensors. The output Ŷ_j,t+1 could vary continuously between 0 (no chance of forest loss) and 1 (certain transition), with a threshold applied to determine the binary classification (what we refer to here as deforestation 'risk').
The input data for each year, with associated year-ahead (t + 1) labels, were partitioned, with stratified random sampling, into training (80%) and test (20%) sets. The training set was further split with stratified random sampling into five even splits so that fivefold cross-validation could be performed while tracking model accuracy during the training process. Stratification was based on the proportion of class labels. Less than 0.5% of each study area transitioned from forest to non-forest in any given year, which meant the number of pixels that remained as forest the next year (0) greatly outnumbered those that were lost (1); the majority class was therefore undersampled (giving a 4:1 ratio of 0 to 1 labels in the test sets). The models were tested on the withheld 2014-2018 inputs (2019 labels) test set before being tested on the test set for the year beyond the training period (2015-2019 inputs; 2020 labels). It was important to test the models' ability to predict the labels for a year outside the training period to understand whether the models are transferable through time and therefore able to produce reliable forecasts.
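The partitioning step can be sketched with standard scikit-learn utilities. This is an illustrative sketch, not the project's code; the function name and random seeds are ours.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

def partition(labels, seed=0):
    """80/20 stratified split of datapoint indices, then five
    stratified folds of the training portion for cross-validation
    while tracking accuracy during training. Returned folds are
    positional index pairs into the training set."""
    idx = np.arange(len(labels))
    train_idx, test_idx = train_test_split(
        idx, test_size=0.2, stratify=labels, random_state=seed)
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    folds = list(skf.split(train_idx, labels[train_idx]))
    return train_idx, test_idx, folds
```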
The optimised models (with hyperparameters and weights retained) were then updated by continued training on 2015-2019 input data and 2020 labels. The final models were then used to produce deforestation forecasts across the entirety of each study area for 2021 (available for inspection as a GeoTiff at https://doi.org/10.5061/dryad.hdr7sqvjz). Figure S13 summarises the arrangement of modelling processes. The metrics we used to evaluate the models were area under the receiver operating characteristic curve (AUC), precision, recall and F1 score. AUC integrates model performance over all classification thresholds. As the threshold could be adjusted to suit the user's relative preference towards a good true-positive rate (recall or sensitivity) or false-positive rate, AUC was insightful for assessing overall model performance. As the F1 score (the harmonic mean of precision and recall) can communicate accuracy while accounting for variable class imbalance, it was selected as a metric for model intercomparison over simple prediction accuracy (which can be misleading in cases of class imbalance).
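The thresholding step and the F1 metric can be made concrete with a short sketch (our own implementation for illustration; standard libraries provide equivalents). It assumes at least one positive prediction and one positive label, so the divisions are defined.

```python
import numpy as np

def classify(risk, threshold=0.5):
    """Apply a classification threshold to continuous risk scores."""
    return (np.asarray(risk) >= threshold).astype(int)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the loss class (1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```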
We used the Adam optimizer with weighted cross entropy loss to train the models through stochastic gradient descent (Kingma & Ba, 2014). The penalty for missing a loss event was set to be double that of incorrectly identifying a loss event at a pixel that remained forested as we felt it preferable for the network to be tuned to avoid missing the relatively rare loss events. To avoid overfitting, regularisation was implemented through a dropout layer (Srivastava et al., 2014) and early stopping criteria. The models were trained with repeated exposure to the full sampled training set (i.e. multiple epochs).
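The 2:1 class weighting can be expressed in PyTorch as follows. This is an illustrative sketch: the placeholder network, learning rate and batch are ours, not the tuned models described in this article.

```python
import torch
import torch.nn as nn

# Weight the rare loss class (1) twice as heavily as the forest class (0),
# so that missed loss events are penalised double.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0]))

model = nn.Sequential(nn.Flatten(), nn.Linear(10, 2))  # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

logits = model(torch.randn(4, 10))            # one illustrative batch
loss = criterion(logits, torch.tensor([0, 1, 0, 0]))
loss.backward()                               # stochastic gradient step
optimizer.step()
```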
We used GPUs (4 × Tesla P100-PCIE-16GB) and CPUs (12 × 5980 MB) with a 12-hour time limit on a high-performance computing cluster to train and test the models and make forecasts.
Details on the computational resources used, including hardware and software, are given in S5. The trials conducted as part of the training process, including hyperparameters used, accuracies attained and computing resources used, are available at https://wandb.ai/patball/forecasting/. A spatial pyramid pooling layer in each network allowed the spatial input window size to be varied and optimised as a hyperparameter (He et al., 2015).
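A spatial pyramid pooling layer of the kind referenced (He et al., 2015) can be sketched as below: it produces a fixed-length feature vector whatever the spatial size of the input window, which is what allows the window size to be tuned. The pooling levels chosen here are illustrative.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Pool a (N, C, H, W) feature map into a fixed-length vector,
    regardless of H and W, by adaptive max-pooling at several grid
    resolutions and concatenating the flattened results."""
    feats = [F.adaptive_max_pool2d(x, lvl).flatten(1) for lvl in levels]
    return torch.cat(feats, dim=1)  # length = C * (1 + 4 + 16) here
```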

| Model architecture 1: 2D convolutional neural network
The first and simplest MA we employed (MA1) was a type of 2D convolutional neural network (2D CNN). In a 2D CNN, filters slide ('convolve') in two spatial dimensions across a 3D input tensor (with a spatial extent and a depth that corresponds to the number of bands) to extract spatial features that can be used by the network to make predictions. Inputs were simply the dynamic and static tensors stacked along the band axis at a point in space. Potential temporal features of the data were not implicitly characterised by this type of network. Temporal change was simply captured by pairing the input data for one year (at time t) with the forest/non-forest labels from a year ahead (at t + 1).
See Figure S10 for a visual representation of the MA.
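The MA1 arrangement can be sketched in PyTorch as below. Layer sizes and depths are illustrative, not the tuned architecture; adaptive average pooling stands in for the spatial pyramid pooling layer described above.

```python
import torch
import torch.nn as nn

class Scene2DCNN(nn.Module):
    """Minimal sketch of MA1: static and dynamic layers stacked along
    the channel (band) axis, 2D convolutions, binary output."""
    def __init__(self, in_channels=10):  # e.g. 2 static + 8 dynamic
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # stand-in for spatial pyramid pooling
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.2), nn.Linear(32, 2))

    def forward(self, static, dynamic):
        x = torch.cat([static, dynamic], dim=1)  # stack bands
        return self.classifier(self.features(x))
```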

| Model architecture 2: 3D convolutional neural network
The second MA (MA2) was a type of 3D convolutional neural network (3D CNN). Li et al. (2017) demonstrated that 3D CNNs applied to hyperspectral data could achieve state-of-the-art land use classification accuracy. We reasoned that, by substituting the spectral dimension with the temporal dimension, it would be possible to draw out the spatiotemporal features of forest change. This type of network slides ('convolves') filters in three dimensions (two spatial and one temporal). This meant that, for a given central target pixel, the series of dynamic tensors (see Section 2.4.3) could be stacked temporally along the channel axis to form a 4D input tensor, and 3D convolutions used to characterise the spatio-temporal features of the input data. The static tensor was passed to a separate 2D convolutional branch. See Figure S11 for a visual representation of this MA.

| Model architectures 3 and 4: ConvLSTM recurrent neural networks

The third and fourth MAs (MA3 and MA4) were based on Convolutional Long Short-Term Memory (ConvLSTM) RNNs. These process the series of dynamic tensors recurrently through time and, like the 3D CNN, pass the static tensor to a 2D convolutional branch. These MAs, while able to achieve comparable accuracies to the first two, were dropped after a broad intercomparison as their training could not be parallelised, so it took far longer to achieve these accuracies. However, their model classes are available for experimentation at https://github.com/PatBall1/DeepForestcast, as their implementation may become more effective as the amount of historic data and available computing resources increases.
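The MA2 arrangement (3D convolutions over the temporally stacked dynamic tensors, with a separate 2D branch for the static tensor) can be sketched as below. Layer sizes and kernel shapes are illustrative, not the tuned architecture.

```python
import torch
import torch.nn as nn

class Scene3DCNN(nn.Module):
    """Minimal sketch of MA2: dynamic layers stacked over time into a
    (N, d, T, H, W) tensor for 3D convolution; static tensor handled
    by a separate 2D convolutional branch; outputs concatenated."""
    def __init__(self, d=8, s=2):
        super().__init__()
        self.dyn = nn.Sequential(
            nn.Conv3d(d, 16, kernel_size=(2, 3, 3), padding=(0, 1, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.stat = nn.Sequential(
            nn.Conv2d(s, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(16 + 8, 2)

    def forward(self, static, dynamic):
        a = self.dyn(dynamic).flatten(1)  # spatiotemporal features
        b = self.stat(static).flatten(1)  # static spatial features
        return self.head(torch.cat([a, b], dim=1))
```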

| Baseline: Random forest
To assess the relative benefit of using deep CNN-type methods, we trained, tuned and tested random forest classifiers to perform the same task. The data structure was equivalent to that of the 2D CNN, that is, a spatial window around a focal pixel was still used as an input and forest state labels were offset by one year. Spatial structure was not preserved (the 3D tensors were necessarily flattened into a 1D feature vector), but the model was free to learn which pixels (closer to or farther from the central pixel) were of greatest importance.

In the broad intercomparison, the architectures that characterised the temporal structure of the inputs achieved the highest accuracies (up to AUC = 0.935). This suggested that models that could implicitly handle and characterise the temporal alongside the spatial structure of the input data may be better suited to predicting future deforestation. In other words, those models that could retain and work with the spatiotemporal patterns of the data (how the spatial patterns evolved), rather than just use spatial patterns, tended to perform better. MA3 and MA4, while achieving comparable accuracies, were uncompetitive with respect to training time due to their constraints on parallelisation (a well-known problem with RNNs). Therefore, MA1 and MA2 were selected to be taken forward for further development and testing.
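The baseline set-up (scenes flattened to 1D feature vectors) can be sketched with scikit-learn. The function name and forest settings are ours, for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_rf_baseline(scenes, labels, n_estimators=100, **kwargs):
    """Random forest baseline: each (bands, 2r+1, 2r+1) scene is
    flattened to a 1D feature vector, discarding spatial structure
    but letting the forest weight individual pixel positions."""
    X = scenes.reshape(len(scenes), -1)
    rf = RandomForestClassifier(n_estimators=n_estimators, **kwargs)
    rf.fit(X, labels)
    return rf
```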
The spatial dimensions of the input tensors determined the size of the scene that models were able to view and had an influence on model performance. Model AUC plateaued around an input frame of ~35 × 35 pixels (~1 km × 1 km) and began to decrease for input frames larger than ~50 × 50 pixels (~1.5 km × 1.5 km). Between these sizes seemed to be the sweet spot where the networks could learn from local context while also remaining focused on the pixel of interest.
Full details of the model performances, including hyperparameters and receiver operating characteristic (ROC) curves, are given in S7.

| Model development
Further tuning and extended training improved the within-year accuracy of the 2D CNN and 3D CNN model classes, which performed better than the baseline models. Across the board, the models predicted less accurately on the Junín region than on MdD, suggesting that the Junín region is inherently more difficult to predict on (it has more cloud cover, a greater diversity of forest types and more fragmentation). Models that included the topographic layer as an input predictor were no more accurate than those without, suggesting it was uninformative to the model. To reduce redundancy, this layer was dropped.
There was a drop in model prediction accuracy from the within-training-period test set to the year-ahead test set (see Table 2). This was to be expected, but it could also indicate some change in the patterns of deforestation or in the nature of the input data (it is difficult to produce satellite data that are exactly consistent across dates due to, e.g., atmospheric conditions). The greater the complexity of the model, the greater the drop-off in accuracy from nowcast to forecast predictions, indicating a degree of overfitting. The model trained on both regions sequentially was only marginally better at predicting year-ahead deforestation than the model trained on just the region to be predicted (for the 3D CNN). The highest year-ahead AUC from the 3D CNN for the Junín region was returned by the model trained exclusively on MdD data. This suggests that data quality is at least as important as the volume and spatial range of training data. It also suggests that the spatio-temporal patterns of forest loss in the two regions are somewhat similar and that the models are spatially transferable.
TABLE 2 Within-year and year-ahead pixel-wise classification accuracy (AUC, precision, recall) of the 3D CNN, 2D CNN and random forest baseline model classes, by training period (training labels) and test set (2019 and 2020). The reported accuracies are all for unseen test sets. The 2019 labels are within the training period and the 2020 labels are beyond it (to test forecast accuracy). The ratio of 0 to 1 labels in the test sets was 4:1. Bold shows the best performing models for forecasting at each site.
While the 3D CNN forecasted most accurately on the MdD region, the 2D CNN gave the best forecast accuracy for the Junín region. The 3D CNN has the advantage of being able to convolve over spatial and temporal dimensions together (and so learn spatiotemporal features), but the 2D CNN has more datapoints available to learn from, as its inputs are single-year data (with an offset label) rather than stacked across time as in the other model types. More details of the model performances over different hyperparameters are available at https://wandb.ai/patball/forecasting2d/ and https://wandb.ai/patball/forecasting/ for the 2D CNN and 3D CNN, respectively.

| Forecasts
The near-future forecasts exhibited some interesting spatial features. To illustrate the 3D CNN's assessment of deforestation risk, we focus on four recent forest loss events in the two study regions. The 2021 risk forecast was overlaid on Planet/NICFI biannual composite satellite imagery for 2020 (see Figure 3). Figure 3a shows a rapidly growing commercial agricultural area well serviced by a river and roads in Junín. The network signals a very high risk that agricultural clearance will continue to expand into the remaining forest areas. Figure 3b shows a new, remote unauthorised gold mine in MdD. The network anticipates an immediate expansion of the mining operations into surrounding areas. Figure 3c shows a newly laid access route connecting newly established agricultural settlements to a large river. The network highlights a risk of clustered conversion along parts of its route. Figure 3d shows a concentrated natural loss event (a remote landslide); here, the network does not anticipate further spread.

The apparent ability of the networks to automatically extract landscape features relating to different drivers of forest loss, and to infer whether loss was likely to continue from a source, was encouraging. It highlights the potential for deep learning computer vision methods to process large volumes of RS data to manage risks and make predictions in complex human-natural systems. Previous attempts at forecasting deforestation have often had limited predictive capacity due to a lack of reliable input data on dynamic landscape features. From satellite data and other complementary spatial datasets, our model appeared able to discriminate one-off loss events (e.g. a landslide) from frontiers that are likely to progress over time (e.g. from a newly built road or an unauthorised gold mine).
The dimensionality of convolutions in the networks (spatial in the 2D CNN case, spatiotemporal in the 3D CNN case) involved a trade-off. The ability of the 3D CNN to exploit spatiotemporal features allowed it to perform most accurately in MdD, the region with less topographic variation and ecosystem variety. Indeed, the ability of 3D CNNs to view a pixel's spatial and temporal context together has been shown to improve classification accuracy from RS data (Ji et al., 2018; Xu et al., 2018). This implicit handling of the temporal dimension alongside the spatial dimensions would seem especially well suited to forecasting tasks of this kind.

Deep learning models have many hyperparameters that need tuning. We demonstrated a degree of spatial transferability of our models, but users should be aware that additional tuning and retraining may be necessary to achieve optimal performance in a new region. The optimal window width to view the scene was 1-1.5 km. For larger windows, the additional information provided less marginal benefit than the cost of handling it; in other words, there was a point at which adding more distant pixels to a scene began to distract the network from the area of focus.
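The relationship between window width in kilometres and input size in pixels can be made concrete with a small sketch (the function names and cropping logic here are hypothetical, not the study's pipeline; a 1 km window at ~30 m resolution is roughly a 33 x 33 pixel scene):

```python
import numpy as np

def window_pixels(width_km: float, res_m: float = 30.0) -> int:
    """Convert a window width in km to an odd pixel count at a given resolution."""
    n = int(round(width_km * 1000 / res_m))
    return n if n % 2 == 1 else n + 1  # odd so the focal pixel is centred

def extract_window(raster: np.ndarray, row: int, col: int,
                   width_km: float) -> np.ndarray:
    """Crop a square scene centred on a focal pixel (assumes it fits in bounds)."""
    half = window_pixels(width_km) // 2
    return raster[row - half: row + half + 1, col - half: col + half + 1]

raster = np.arange(200 * 200, dtype=np.float32).reshape(200, 200)
scene = extract_window(raster, 100, 100, 1.0)
print(scene.shape)  # (33, 33)
```

Doubling the window width quadruples the pixel count per scene, which illustrates why the memory and compute cost of larger windows grows faster than the information they add.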
We stress that, while the reported accuracy is encouraging, there are reasons to treat these metrics cautiously. The GFC dataset provided all data labels, so the reported accuracies specifically relate to how well the networks predict future outputs of this classifier and not necessarily the ground truth. The GFC dataset was generated by a bagged decision tree that classifies filtered Landsat data. Our models are likely to have approximated and incorporated behaviours of this classifier, and accuracy may be inflated (relative to on-the-ground change) by learned correlations.
Any ML approach is necessarily limited by the fact that it is based on historic data and the assumption that observed trends will continue; it therefore cannot account for regime changes in the system. This is relevant here, as political changes have substantial impacts on Amazonian deforestation (Pereira et al., 2020). A changing climate, volatile international economic conditions and advances in technology further challenge the assumption of stationarity.
However, because the forecasting system can be dynamically updated, it is able to learn emerging patterns, which is an improvement on previous approaches. Additionally, the models output a continuous measure of risk across all pixels and, by varying the threshold at which the binary classification is made, it is possible to adjust the overall allocation of forest loss in predictions. It would be possible to couple the spatial evaluation of risk from the CNN system (accounting for landscape-scale changes) to a model of overall forest loss based on external drivers, generating forecasts that reflect global trends. The pixel-wise risks can also be integrated to assess relative risk across a region.
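As a minimal sketch of this threshold adjustment, one plausible rule (an assumption for illustration, not the study's implementation) is to pick the threshold so that the fraction of pixels classified as loss matches an externally supplied regional estimate:

```python
import numpy as np

def calibrate_threshold(risk: np.ndarray, expected_loss_fraction: float) -> float:
    """Pick the risk threshold so that the fraction of pixels classified as
    'loss' matches an external estimate of overall regional loss."""
    return float(np.quantile(risk, 1.0 - expected_loss_fraction))

rng = np.random.default_rng(0)
risk = rng.random(100_000)               # stand-in for a pixel-wise risk map
thr = calibrate_threshold(risk, 0.02)    # e.g. 2% of forest expected to be lost
predicted = risk >= thr
print(round(float(predicted.mean()), 3))  # ~0.02
```

Raising or lowering `expected_loss_fraction` rescales the total predicted loss while leaving the spatial ranking of pixels untouched.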

| Data sources
We limited our set of data layers to test the networks' ability to automatically extract dynamic landscape features. However, including additional RS data layers as predictors could help to inform the networks and improve accuracy. Of particular interest are night lights, which provide a dynamic signal of human presence and economic activity. At present, the resolution of available night light products is too coarse to integrate into our system, but this is likely to change in the near future (Levin et al., 2020). The ~30 m resolution of Landsat-derived products is likely to miss selective logging. Higher resolution imagery (e.g. Planet NICFI, Sentinel-2) may make detecting it feasible, but there is not currently a matching dataset of forest cover labels. The GFC dataset only provides annual information on forest loss and not forest gain, which means the forecasts presented here cannot account for reforestation. The Copernicus Global Land Cover collection (Buchhorn et al., 2020) could provide labels for more types of land cover transition but is only available for 2015-2019 and at 100 m resolution. As this data source develops, more intricate predictions that support forest regeneration analyses may become possible. Furthermore, rather than simply predicting transitions between forest and non-forest states, it may also be valuable to make predictions on gradual declines in relation to disturbance events and forest resilience. In this context, it would be possible to train networks similar to those presented here to predict the evolution of greenness metrics such as NDVI (Boulton et al., 2022). The ALOS JAXA elevation layer was found not to improve model performance, which was somewhat surprising in Junín, a region with inaccessible mountainous areas. It may be that processing the elevation into a relative slope layer would allow the networks to better comprehend the features of the local topography.
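A slope layer of the kind suggested can be derived from the elevation raster with finite differences. This is a generic sketch of the standard approach, not preprocessing used in the study:

```python
import numpy as np

def slope_degrees(dem: np.ndarray, cell_size_m: float = 30.0) -> np.ndarray:
    """Derive a slope layer (in degrees) from an elevation raster using
    finite differences over the grid spacing."""
    dz_dy, dz_dx = np.gradient(dem, cell_size_m)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Synthetic tilted plane rising 30 m per 30 m cell -> a 45 degree slope.
x = np.arange(50, dtype=np.float64) * 30.0
dem = np.tile(x, (50, 1))
print(round(float(slope_degrees(dem)[25, 25]), 1))  # 45.0
```

Feeding a slope layer like this alongside (or instead of) raw elevation gives the network a locally normalised terrain signal rather than absolute heights.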

| Probing the black box
Deep neural networks are considered 'black box' models, so it is difficult to say how the networks were making decisions and to what degree they were recognising different potential drivers of loss. There has been work on interpreting deep CNNs to explore their internal reasoning (Li et al., 2021).

| Technological advances
The model architectures were inspired by models that have performed well on land cover classification and adapted to fit the specific forecasting task.
However, recent architectural innovations could further help to improve performance. Transformer networks (originally from natural language processing) have proven to be the state-of-the-art in handling sequential data (Vaswani et al., 2017) and, recently, in some computer vision tasks (Dai et al., 2021). Using self-attention, they avoid the difficulties in parallelising RNNs and so can be trained more efficiently. Integrating transformer components into the networks may prove an effective way of handling the spatiotemporal features of the data (especially as the available time series grows) and could help to extend the networks' forecasting horizon. Further improvements could also be gained by refining the training process to focus it on contentious datapoints. The undersampling of the non-change class was random, and many training points would be uninformative as they are in very remote locations.
A method that directs the training to cases that sit close to the decision boundary between the classification states would address this. We have provided open-source code and benchmark results, and hope further improvements can be integrated through collaborations.
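One simple way to direct training towards such contentious points is to resample the examples whose predicted probabilities from a previous pass sit nearest the decision boundary. This is a hypothetical hard-example selection sketch, not the study's implementation:

```python
import numpy as np

def hard_example_indices(probs: np.ndarray, n: int) -> np.ndarray:
    """Select the n training points whose predicted loss probability sits
    closest to the 0.5 decision boundary (the most uncertain cases)."""
    return np.argsort(np.abs(probs - 0.5))[:n]

rng = np.random.default_rng(1)
probs = rng.random(10_000)               # predictions from a prior training pass
idx = hard_example_indices(probs, 256)   # resample these in the next epoch
print(idx.shape)  # (256,)
```

In practice this would replace the purely random undersampling of the non-change class, so that remote, trivially predictable pixels contribute less to each epoch.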

| Applications
The forecasting approach presented here was designed to be flexible, useful to a range of potential users and scalable to forests globally. All pixels in a landscape can be ranked according to their likelihood of loss, allowing for spatial prioritisation. An organisation may have the resources to police a certain proportion of a region, and the forecasts would allow them to monitor the most threatened areas. The forecasts could also direct interventions to emerging frontiers of loss in otherwise isolated and intact landscapes. On the other hand, when planning a permanent protected area, it may be preferable to avoid the areas that are most imminently threatened and instead address areas that may undergo transition if left unprotected.
In either case, it is beneficial to understand how threats are likely to evolve.
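Ranking pixels for this kind of prioritisation could look like the following sketch (the function name and workflow are illustrative assumptions, not part of the published tooling):

```python
import numpy as np

def top_risk_coords(risk: np.ndarray, k: int) -> np.ndarray:
    """Rank all pixels by forecast risk and return the (row, col) coordinates
    of the k most threatened, e.g. to target patrols or field checks."""
    flat = np.argsort(risk, axis=None)[::-1][:k]
    return np.column_stack(np.unravel_index(flat, risk.shape))

rng = np.random.default_rng(3)
risk_map = rng.random((100, 100))        # stand-in for a forecast risk map
coords = top_risk_coords(risk_map, 5)
print(coords.shape)  # (5, 2)
```

The same ranking serves both use cases: patrol the top of the list, or exclude it when siting a protected area meant to pre-empt future frontiers.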
Governmental and non-governmental organisations can use the tools presented here to understand how the landscapes in which they operate will change in the future and to refine their protocols for managing and responding to deforestation risk. The effectiveness of deep CNNs in the current context suggests that they should also be applied to other ecological forecasting tasks based on RS data, such as forest fire forecasting.

CONFLICT OF INTEREST
None declared.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13953.

DATA AVAILABILITY STATEMENT
All code is freely available at the project GitHub repository: https://doi.org/10.5281/zenodo.6858022 (Ball et al., 2022a).