Introduction

Although their origins lie in the 1940s (Goodfellow and others 2016), the past decade has seen rapidly growing application of tools associated with artificial intelligence (AI), machine learning (ML) and deep learning (DL) across the sciences. In the Google Scholar Metrics report for 2020, the most cited papers across all subject areas are dominated by those in the field of AI, including three of the top five in Nature (the other two being in genetics). This trend reflects the rapid progress and growing importance of AI methods and tools in many fields, including computer vision, sound classification, natural language processing, gaming, and robotics. The increasing use of AI, ML and DL methods has been driven by a combination of technical developments in the algorithms themselves, the availability of large data sets, rising computational power (including cloud-based services, GPU-optimised code, specialist processor units, edge computing), and accessible open-source frameworks for their implementation. It has also been driven by massive funding from the private and public sectors, partly due to (sometimes hyperbolic) assessments of the opportunities offered by AI. Alongside these advances, there has been growing concern over AI's ethical and privacy challenges. While developments in hardware, software and knowledge offer potentially transformative opportunities for ecologists (for example, powerful tools to work with new data sources such as images, audio and language), as with many new technologies, they have been over-hyped, which has led to some cynicism as to what they offer. As we will discuss, DL methods—while certainly not the be-all and end-all of analysis methods—have great potential to advance ecosystem ecology.

Even against the background of the exponential growth in the scientific literature (Wang and Barabási 2021), there has been an explosion of publications discussing or using AI and deep learning in the environmental sciences since the mid-2000s (Figure 1a). Analysis of the keywords in a corpus of papers considering artificial intelligence (see Supplementary Material) identified three broad topic areas (Figure 1b): (i) environmental modelling and forecasting (for example, time-series analysis of water or air quality), (ii) automated image detection and classification (for example, identification of species in wildlife camera traps) and (iii) remote sensing and landscape classification (for example, image classification for forest disturbance detection from satellite data). This brief analysis underscores that ecologists increasingly apply DL approaches in a range of problem domains.

Figure 1

The rise of AI and DL in the scientific literature: a normalised publication rate in articles in environmental science overall, artificial intelligence, and deep learning from 1971 to 2021; and b focal areas of publications in environmental science using deep learning tools, based on keyword co-occurrence analysis. In b the groups are identified by maximising the density of connections within vs. across groups (that is, modularity; R command igraph::group_fast_greedy), with the size of the nodes (black points) proportional to the frequency with which a keyword appeared and the weight of the edges (lines between nodes) proportional to the strength of the link (frequency of co-occurrence). The normalisation in a is the publication rate in a given year relative to 2000 for artificial intelligence and environmental science, while deep learning is normalised to 2006, as this was the earliest paper published with deep learning in the environmental science discipline. See Supplementary Material for details.

Current perspectives on DL among ecologists range from ‘DL is a universal panacea’ to ‘DL is an inscrutable black box’ to ‘DL methods are an over-hyped fad’. In this review, we seek to provide a realistic perspective of how to best capitalise on the investments by the public and private sectors in these technologies and leverage those developments to foster new avenues for ecosystem research. Our review is not intended as a ‘how to’ primer, nor is it aimed at experts in DL methods. Likewise, it is not a comprehensive evaluation of every potential or realised application of DL in the context of ecosystem ecology. Instead, it is intended to introduce what DL and associated methods might offer ecosystem ecology and some of the challenges these applications pose. We briefly describe the neural networks that underpin DL, and consider their application in three contrasting problem domains, before concluding with how ecosystem ecologists might best exploit these new approaches.

Deep Learning Algorithms

Deep learning relies on artificial neural networks (ANNs), which are loosely modelled on the brain, with artificial neurons (nodes) connected so they can communicate (analogous to synaptic connections). Deep neural networks (DNNs) have become widely used during the past decade but descend from simpler artificial neural networks devised in the 1950s and 1960s (Figure 2; Goodfellow and others 2016; Razavi 2021). While these early networks mainly used one hidden layer of nodes (Figure 2), DNNs have many; hence the moniker ‘deep’ (see Glossary for expanded definitions). Depending on the application, the inputs to a DNN (Figure 2) can be diverse, including pixels in an image, words in a sentence, or data points in a time series, and can be mixed in type (qualitative, categorical, quantitative). Similarly, the type of output can vary, including classification (the network assigns the given inputs to one of a pre-defined set of classes) and regression (the network estimates a single numeric value from the input data).

Figure 2

A schematic of neural network architecture for a a shallow neural network with just one hidden layer, and b a deep feed-forward neural network with multiple hidden layers (here depicted as uniform and fully connected layers with not all links shown).

DNNs vary in their architecture, that is, the details of the ‘wiring’ of the nodes. The most straightforward architecture is the feed-forward neural network (Figure 2b), in which raw data are successively transformed into more abstract representations until some output is produced (for example, identification of a species from an audio recording). In this architecture, nodes (the neurons) are fully connected between but not within layers. In a simple (feed-forward) ANN, input data are transformed by a sequence of nodes in a ‘hidden’ layer to generate output (Figure 2). In a shallow ANN, the single hidden layer consists of a series of nodes that transform data using a sigmoidal function, exploiting the fact that any data transformation can be achieved using a stack of sigmoidal functions and a linear transform (Goodfellow and others 2016; Borowiec and others 2022). Although, in theory, given sufficient nodes a shallow neural network can apply any transformation, it is more efficient to use multiple layers than a single enormous one (Razavi 2021). Razavi (2021) provides an intuitive and thorough geometric explanation of these ideas, which he calls the concept of ‘depth’ (p. 4). In DNNs, the multiple layers have different purposes; for example, in a convolutional neural network (CNN) some layers apply convolution kernels to extract key features from an image while pooling layers generalise (down-sample) these features. The architecture of deep neural networks is such that layers go from general to specific, with the last layer fully connected and producing the output. Thus, as Yosinski and others (2014) discuss, when DNNs are trained to classify images, the first layer tends to identify similar high-level features (“Gabor filters or colour blobs”, p. 1) irrespective of the image type. Each node is characterised by an activation function that defines how the values from incoming connections are combined and forwarded to the next layer of nodes. In a fully connected network, each node is connected to all nodes in the following layer via variable weights that are learned and hence represent the relationships between variables. The DNN ‘learns’ by optimising the connection weights in the network to minimise the prediction error (Olden and others 2008; LeCun and others 2015).
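To make this concrete, the sketch below defines a small fully connected feed-forward network of the kind shown in Figure 2b, written in Python with the open-source PyTorch library (one of several frameworks discussed later). The layer widths, input size, and data are arbitrary placeholders for illustration, not a recommended design.

```python
# A minimal sketch of the fully connected feed-forward architecture in
# Figure 2b (Python/PyTorch). All sizes are arbitrary placeholders.
import torch
import torch.nn as nn

class FeedForwardDNN(nn.Module):
    def __init__(self, n_inputs: int, n_outputs: int, hidden: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, hidden),   # input -> first hidden layer
            nn.ReLU(),                     # activation function at each node
            nn.Linear(hidden, hidden),     # second hidden layer (the 'deep' part)
            nn.ReLU(),
            nn.Linear(hidden, n_outputs),  # fully connected output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# For regression, n_outputs = 1; for classification, n_outputs = the number
# of classes (with, for example, a softmax applied to the output).
model = FeedForwardDNN(n_inputs=10, n_outputs=1)
prediction = model(torch.randn(32, 10))  # a batch of 32 dummy observations
```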

Learning is most commonly performed using a backpropagation algorithm, in which an error function is minimised iteratively across observations by updating the weights to decrease their contribution to the overall error (Olden and others 2008). In supervised training (that is, the output variable [response] is labelled, and the algorithm trained to predict the label), the goal is to minimise the difference between the predicted and observed values. After a DNN is trained, it can be used to predict outcomes based on new data. Usually, the output is directly used for classification (for example, what animal is in the picture?) or regression (for example, what is the predicted value of a time series?), but DNNs can also form the (core) part of more complex toolchains. Image segmentation (also known as semantic segmentation: assigning each pixel in an image to a class) and object detection (detecting the type and position of multiple objects in an image) are examples derived from computer vision that have become regularly used in ecology.
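Continuing the sketch above, a minimal supervised training loop using backpropagation might look as follows; the data here are random placeholders standing in for a labelled training set.

```python
# Supervised training of the network sketched above by backpropagation.
# x and y are random placeholders standing in for labelled training data.
import torch

x = torch.randn(200, 10)           # 200 observations, 10 input variables
y = torch.randn(200, 1)            # labelled (observed) response values

loss_fn = torch.nn.MSELoss()       # error function for a regression task
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(100):
    optimiser.zero_grad()
    prediction = model(x)
    loss = loss_fn(prediction, y)  # difference between predicted and observed
    loss.backward()                # backpropagate the error to the weights
    optimiser.step()               # update weights to reduce the error
```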

As noted previously, there are many ways to wire a DNN’s nodes (that is, many possible architectures). Ecologists have most frequently used convolutional neural networks (Brodrick and others 2019; Christin and others 2019; Borowiec and others 2022), as they are particularly well-suited to image and audio processing. In a CNN, the hidden layers comprise convolution layers (hence the name), pooling layers, and fully connected layers designed for different components of image recognition (feature extraction, down-sampling, and integration); Rawat and Wang (2017) review the design and application of CNNs for image classification. Another architecture of potential importance for ecosystem ecologists is the recurrent neural network (RNN; Figure 3). RNNs process sequences and keep a ‘memory’ of past data by feeding the output of a layer back into that same layer (hence ‘recurrent’, Figure 3a). RNNs can be imagined as a sequence of neural networks feeding each other by sharing parameter information (the unfolded network in Figure 3b). The length of time over which a previous network state remains influential depends on changes in weights during training; thus, in principle, RNNs can deal with short- and long-term memory effects or dependencies (Goodfellow and others 2016). This architecture is well-suited to time-series applications, such as forecasting hydrological and meteorological conditions (Rahmani and others 2021; Zhi and others 2021). Because of their ability to deal with sequential data, RNNs offer a route to the near-time ecological forecasting advocated by Dietze and others (2018).

Figure 3

Schematic architecture of a recurrent neural network. The architecture is shown in a folded and unfolded view (a and b, respectively).

Machine learning engineers are continuously refining existing DNN architectures and devising new ones. For example, the “transformer” architecture (Vaswani and others 2017), initially developed for language tasks, is now increasingly used for image processing (Chen and others 2021a). While this diversity of architectures may seem overwhelming for ecologists, deep learning software packages that hide most of the technical complexity are increasingly available and can be used from well-known computing platforms such as R, Python, and Julia.
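As an illustration of how much of this complexity the frameworks absorb, the sketch below assembles a toy CNN of the kind described above in a few lines of PyTorch; the layer sizes, image dimensions, and ten-class output are arbitrary assumptions for illustration.

```python
# A toy CNN (sketch; all sizes arbitrary). Convolution layers extract
# features, pooling layers down-sample them, and a final fully connected
# layer produces the classification.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution: feature extraction
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling: down-sample features
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 10),                 # fully connected output (10 classes)
)

logits = cnn(torch.randn(1, 3, 64, 64))  # one dummy 64 x 64 RGB image
```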

Changes in the Data Landscape

Machine learning and deep learning methods have emerged in an era of large datasets (potentially comprising > 1 × 10⁹ items; Goodfellow and others 2016). This emergence is crucial for DL because these methods thrive disproportionately on big data compared to classical statistical approaches. DL has the potential to leverage the information hidden in such large datasets to answer ecological questions in new ways. Thus, any discussion of DL necessitates considering two inter-related trends: big data and born-digital data. Big data are characterised by the three Vs: volume, velocity, and variety (LaDeau and others 2017). Volume relates to the fact that we have unprecedented amounts of data available (although ‘unprecedented’ is context-dependent and of itself unremarkable), velocity is a function of the rapidity of data generation, sometimes happening in real-time, and variety means data are heterogeneous in form and curation. Where do these data come from? Increasingly, data collection is automated using devices that remotely measure the environment, including camera traps, satellite platforms, unmanned aerial vehicles (UAVs; drones), automated audio recording devices, continuously monitoring sondes in freshwater and marine ecosystems, and simulation models (Kays and others 2020; Keitt and Abelson 2021). In some countries, these data are openly available and collected via large coordinated research programmes such as NEON (USA; Keller and others 2008) or TERN (Australia; Cleverly and others 2019). These data streams often consist of images, audio, video or unstructured text, which are well suited to DNNs but challenging to use with traditional statistical methods. An additional source of big data is citizen science, whether in collecting information or labelling massive datasets. This diversity of sources gives rise to a fourth ‘V’, veracity (the variable uncertainties of the data), which is crucial to understand if these data are to be used effectively (Farley and others 2018). Reconciling the various types and scales of data available to ecologists is a fundamental challenge in effectively leveraging data-led methods, and DL methods are excellent tools for addressing it. A particular challenge for DL methods is their demand for large amounts of accurately labelled data for supervised learning; we return to this problem later.

The Use of DL in Ecosystem Ecology

Ecosystem ecology is the study of the dynamics of energy and matter in ecosystems, resulting from the interactions of abiotic and biotic components of such systems and occurring across multiple spatial and temporal scales. As the publications in Ecosystems would attest, the field has a broad remit and interfaces with nearly every other sub-discipline of ecology. To illustrate the range of applications of DL in ecosystem ecology, we will consider three broad areas: analysis of data describing energy and matter fluxes, image processing and analysis, and integration with earth system and ecosystem models. We have drawn on case studies that align with fundamental questions of ecosystem ecology, yet in many cases, these are allied with other components of ecology. Likewise, many of the opportunities and challenges associated with using DL are not domain-specific and encompass the use of these tools across subfields of ecology (for example, the potential of large-scale text analysis and automated translation to help alleviate biases in literature syntheses).

Problem Domain 1: Synthesis and Prediction of Massive Data Describing Ecosystem Fluxes

Global networks, such as FluxNet, and automated hydrological and meteorological stations yield vast amounts of high-resolution information describing ecosystem fluxes (Baldocchi 2020). Deep learning methods have been applied to these data to predict temporal dynamics and to assess how they might be affected by global change. Recurrent neural networks and their relatives, such as the long short-term memory (LSTM) model (a variant of the RNN), are well-suited to modelling temporal data and have begun to be used to model earth system dynamics. Kraft and others (2019) developed RNNs to predict the normalised difference vegetation index (NDVI) based on climate data, land cover and soil information using an LSTM architecture. Their models demonstrated that including memory (past data) improved model performance in both global and biome-specific models. While the gains in performance varied between biomes, they were somewhat predictable from a biome’s position in climate space. For example, memory effects seem stronger in sub-tropical regions where seasonal effects are less important than sporadic climate events (for example, interspersed wet and dry periods; see also Hansen and others 2022). The strength of memory effects also varies through time in different biomes (for example, it is strong in spring in contexts where meltwater is important). Similarly, Zhi and others (2021) used an LSTM model to predict dissolved oxygen content in catchments across the conterminous USA. Because dissolved oxygen is a vital indicator of the health of freshwater ecosystems, there is a need to develop models that are transferable to sites where data are lacking. Zhi and others (2021) trained their model on measurements of dissolved oxygen concentrations at more than 200 sites spanning 1980–2014 (minimum of n = 10 points) alongside high-quality daily meteorological data and a suite of variables characterising watershed conditions. The models captured the seasonal dynamics of dissolved oxygen, although the predictions were damped at some sites. Zhi and others (2021) comment that model performance was affected by a lack of data at dissolved oxygen extremes and by heterogeneous data availability. Similar methods have been used to predict critical components of the earth system at a global scale. For example, Besnard and others (2019) implemented an RNN to predict net ecosystem CO2 exchange at forest sites across the globe, using a range of data for training (remotely sensed data, down-scaled climate information and eddy covariance flux data). Their model captured broad seasonal and inter-site trends but did not adequately predict extreme conditions. Besnard and others (2019) considered that this failure to capture anomalies could be explained by several issues, including missing data in the remote-sensing time series and the temporal resolution and spatial content of the information. Issues of data scarcity (including labelled data) and patchiness are a recurring challenge for data-hungry models such as DNNs.
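As an illustration of the general approach (not the specific implementations of the studies above), a minimal LSTM for sequence-to-one regression, such as predicting a water-quality variable from a window of daily drivers, might be sketched as follows; the driver count, sequence length, and hidden size are arbitrary placeholders.

```python
# Sketch of an LSTM used for time-series regression, in the spirit of the
# studies above (not their actual implementations; all sizes arbitrary).
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    def __init__(self, n_drivers: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_drivers, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)         # hidden state carries 'memory' of past steps
        return self.head(out[:, -1])  # predict from the final time step

# A batch of 8 sequences, each 365 daily steps of 5 driver variables
# (for example, meteorological forcings), predicting one response value.
lstm = LSTMRegressor(n_drivers=5)
y_hat = lstm(torch.randn(8, 365, 5))
```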

Problem Domain 2: Interrogating Image Data

Object Identification and Labelling

Rapid developments in computer vision have made image analysis and processing a frequent application for DL in the environmental sciences (Figure 1b). DL-informed image processing has been used in many ecological contexts, including (i) identifying wildlife species in camera trap data, (ii) extracting multidimensional whole-organism phenotypic information (‘phenomics’), (iii) mapping disturbance events (for example, fire and floods), and (iv) tracking organism movement. The almost archetypal application of DL in ecology has been to extract taxonomic information from imagery. In a pioneering study, Norouzzadeh and others (2018) demonstrated the ability of DL methods to identify wildlife species in motion-activated wildlife camera imagery. They trained nine DL architectures (for comparison) to detect and identify species in the Serengeti Snapshot database, which contains 3.2 million images (Swanson and others 2015). The model (Figure 4) approached or exceeded the accuracy of human volunteers, with potentially enormous (up to 99%) time savings. For example, their model accurately identified the 75% of images not containing an organism, which considerably reduces the number of images requiring manual assessment.

Figure 4

Schematic overview of the two-stage workflow used by Norouzzadeh and others (2018). In both steps, nine DL architectures were trained, with the best-performing retained for prediction. For step 1, this was a single model (a CNN), and in step 2 it was a model ensemble. The graphs summarise model (M) vs expert (H) accuracy. There is no bar for ‘H’ in the ‘Present?’ graph as human expertise on this task was not assessed. Wildebeest icon by Lukasiniho under a CC BY-NC-SA 3.0 licence (http://creativecommons.org/licenses/by-nc-sa/3.0/).

DL models have been used to characterise vegetation structure and to identify and predict disturbances in forest landscapes, primarily via the analysis of remotely sensed imagery; again, convolutional neural networks are the leading DL architecture used in this context. Using aerial imagery, Rammer and Seidl (2019b) trained a CNN to predict bark beetle outbreaks in a German national park within individual years and along a 23-year time series. The network was trained to predict whether a single focal cell (30 × 30 m) would be disturbed in the next year based on the average climate conditions, the spatial pattern of hosts, and current disturbance in a 600 × 600 m window around the focal cell. Their CNN outperformed a number of other machine learning methods, and did so without meteorological data, which were excluded on the grounds that such data are often scarce or unavailable. These applications are not limited to landscape-level dynamics. Kattenborn and others (2020) trained CNNs using UAV imagery to identify individual tree species cover in forests, estimate plant cover in a glacial vegetation succession, and identify invasion dynamics (the first two cases in New Zealand, the third in Chile). Their models performed well, but they suggest important trade-offs between the accuracy and the spatial resolution of the predictions. DL models have been implemented at still finer scales to identify insects (Valan and others 2019) and pollen grains (Daood and others 2016; Olsson and others 2021). In short, DL methods are versatile, accurate, and efficient for image processing tasks; their application to ecological questions will likely continue to grow.

Beyond Labels: Measuring Functional Traits and Behaviour

Phenotypic variation is linked to a range of ecosystem properties and functions. Studies of phenotypic variation over large spatial extents can address macroecological questions, while studies of change over time can assess how morphology tracks environmental change (for example, body-size shifts under climate change). Manually extracting high volumes of multidimensional phenotypic data is time-consuming; hence, there is considerable interest in leveraging advances in computer vision and DL methods to facilitate this process (Lürig and others 2021).

As described previously, citizen science efforts have led to the collection of large bodies of data, especially labelled images. Schiller and others (2021) used a CNN trained on trait information from the open TRY database (Kattge and others 2020) to estimate six plant functional traits from plant images stored in the iNaturalist database. They explored: (i) how the inclusion of intraspecific variation in traits and bioclimatic information influenced model performance, and (ii) the potential for a CNN to predict traits indirectly using covariance structures (for example, leaf shape, which is apparent in the image, may predict elemental concentration in tissues). If a model can make accurate indirect trait predictions, more easily measured (or cheaper) parameters could act as surrogates for more difficult ones. Schiller and others’ (2021) best-performing models had normalised mean absolute errors in the range of 8–15% (r² = 0.16–0.58), with predictions better for leaf form than for tissue-related traits (that is, directly vs. indirectly measured). Similarly, Weeks and others (2022) developed a DL-based workflow to identify bones in images of bird skeletons in museum collections and measure 11 skeletal traits (the Skelevision project: https://skelevision.net/). This workflow detects the bones of interest in an image (image segmentation) and then measures them, with DL models used at multiple stages to identify the bones and extract the characteristics of interest. Weeks and others (2022) commented that an advantage of the method is that it does not damage specimens. The accuracy of bone detection depended on the morphological element; however, classification and skeletal measurement were accurate and repeatable, with only one trait showing any phylogenetic signal (that is, bias varying across taxa). Weeks and others (2022) emphasise that a critical advantage of their workflow is the ease of generating data describing new traits, given the low annotation requirements. In short, there seems little doubt that trait-based ecology can benefit in many ways from the integration of computer vision and DL.

Data about movement can provide information about the behavioural component of phenomics (Lürig and others 2021). DL can be used to detect objects (that is, animals) in video data and track them, as well as classify such data into states potentially associated with different behaviours. These workflows involve object detection and identifying key points on the body (for pose) or tracking the objects’ movement. Software toolkits have been developed that integrate computer vision and DL models to detect individuals and estimate their pose (Graving and others 2019) and movement (Walter and Couzin 2021). For example, Lopez-Marcano and others (2021) describe a workflow for detecting and tracking individual fish (bream) in video imagery. They used a CNN to identify the fish (based on a training set of 8700 annotated images) and tested three object tracking algorithms. The workflow efficiently identified and tracked individual fish and, as with other applications leveraging DL, allowed data to be collected and analysed at a scope not otherwise possible. Such applications have significance for tracking animal movement, which underpins ecosystem functions such as biogeochemical cycling and seed dispersal, and may also inform conservation activities such as identifying individuals of threatened species (Tuia and others 2022).

Problem Domain 3: Modelling Ecosystem Dynamics

Hybrid Earth System and Ecosystem Models

The incorporation of DL into process-based earth system models to form ‘hybrid’ model platforms is a very active research frontier (Reichstein and others 2019; Irrgang and others 2021). In such models, some system components and processes are simulated using data-driven representations and others using more mechanistic/process-based approaches. The advantage of this hybrid architecture is that it can combine the physical consistency of process models with the data-driven performance of deep learning models (Reichstein and others 2019). There are several rationales for incorporating a DL component into ecosystem models (Reichstein and others 2019; Irrgang and others 2021): (i) to improve the estimation or upscaling of uncertain parameters, (ii) to act as plug-in components that replace physical models or model components, (iii) to test models by helping identify errors, and (iv) to emulate computationally expensive physical models (that is, to act as meta-models).

Hybrid mechanistic-DL models have begun to be implemented to predict ecosystem properties, including evaporation (Koppa and others 2022), evapotranspiration (Chen and others 2021b), lake temperature (Read and others 2019), and snow-pack distribution (Xu and others 2022). For example, Chen and others (2021b) implemented a hybrid physical-DL framework to predict daily ecosystem respiration and evapotranspiration in the western USA. Their approach combined high-resolution eddy covariance and meteorological data with land surface information (NDVI via remote sensing) to support physical and DL models (an LSTM) of evapotranspiration and ecosystem respiration. They tested the model at local (individual FLUXNET sites) and ecoregion scales (model transferability within ecoregions). At the site scale, their model successfully captured long-term trends in evapotranspiration and ecosystem respiration; however, performance was poorer when predicting short-term fluctuations, especially during summer extremes. Tests at the eco-region scale were also successful, although there were again some issues in predicting summer extremes, demonstrating the ability of these hybrid models to predict at unmeasured locations or those where data are missing. In general, Chen and others (2021b) note that the hybrid model performed well and that the architecture should be extendable to other biogeochemical cycles. However, they highlight some uncertainties arising from feature selection, capturing extremes (the poorer short-term performance in summer was attributed to a lack of extremes in the training data; the earlier example of Zhi and others (2021) suffered from similar problems), the resolution of meteorological information especially in mountainous terrain, issues inherent in remote sensing (for example, cloud cover), and error propagation within and between the components of the hybrid model architecture. Although some of these issues are problem-specific, they again speak to general issues in data-driven modelling concerning the data available for training, especially in infrequently observed conditions, sparse sampling, and the selection of variables to include in the model.

A concern surrounding DL models is that they may identify patterns in a way that is not constrained by known physical laws (what Reichstein and others 2019 call ‘physical inconsistency’). Karniadakis and others (2021) describe three ways that information can be introduced to machine learning (including DL) models to make them ‘physics-informed’: (i) observational biases, where the data used to train the model carry information about the underlying mechanisms, (ii) inductive biases, where known physical laws are embedded in the model architecture, and (iii) learning biases, where the model is penalised for violating physical constraints. Arguably, using inductive bias is the approach that will most strictly honour physical reality, but it requires a rather complete mechanistic understanding of the system (difficult for complex and open systems such as ecosystems) and does not scale well (Karniadakis and others 2021). These physics-informed methods are beginning to be adopted by ecosystem ecologists, although the terminology used differs between disciplines and applications. For example, building on the ‘theory-guided data science’ of Karpatne and others (2017), Jia and others (2019) implemented an RNN to predict lake water temperature in a way that honours the conservation of energy and the relationships between depth and density. Their constrained RNN outperformed a physical lake ecosystem model, and Jia and others (2019) argue that the inclusion of physical constraints makes it more easily generalisable. Read and others (2019) tested this approach across more lakes, comparing it to an unsupervised DL model and a physical model of lake temperature. Their hybrid DL model outperformed the others, both in lakes where there was detailed site-specific information and in a wider pool of nearly 70 test lakes where there was less information. Likewise, physical laws might also be used to evaluate model performance; for example, Razavi (2021) shows how a DL model of precipitation could be tested using a temperature threshold for snow formation (that is, is snow vs. rain predicted at appropriate temperatures?). Ultimately, linking DL and mechanistic models may improve predictive performance and help develop causal understanding of the systems of interest.
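As a minimal sketch of a ‘learning bias’, the loss function below combines ordinary prediction error with a penalty applied when a predicted lake density profile decreases with depth, in the spirit of (but much simpler than) the constraints used by Jia and others (2019); all names and array shapes are illustrative.

```python
# Sketch of a 'learning bias' (Karniadakis and others 2021): the training
# loss combines prediction error with a penalty for violating a known
# physical constraint. Here we penalise predicted lake density profiles
# that decrease with depth (denser water should lie below lighter water).
import torch

def physics_informed_loss(pred_density, obs_density, weight=1.0):
    # Standard supervised error term
    mse = torch.mean((pred_density - obs_density) ** 2)
    # Constraint term: pred_density has shape (batch, n_depths), ordered
    # surface -> bottom; density should be non-decreasing with depth.
    violation = torch.relu(pred_density[:, :-1] - pred_density[:, 1:])
    penalty = torch.mean(violation)
    return mse + weight * penalty
```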

Meta-models and Model Emulation

Another potential application of DL in models of ecosystem dynamics is as model emulators or meta-models. Even with access to large-scale computing infrastructure, there are limits to the extent to which complex ecological models can be run by brute force over large areas and/or long periods. Many techniques have been proposed for scaling models before, during, or after model application (Fritsch and others 2020), including meta-modelling (Urban and others 1999; Cipriotti and others 2015) and model emulation (Reichstein and others 2019). The basis of meta-modelling is that a simpler (in computational or representational terms) form of a complex model is developed and applied over larger, longer, or more heterogeneous conditions, or used in what would otherwise be unfeasible computational experiments. For example, Cipriotti and others (2015) used matrix models to synthesise a complex individual-based model of grassland dynamics by tracking transitions between states in grid cells. DL models provide a way to deal with cases with many states and a more complex environment, in which full coverage of all possible combinations is impossible by conventional approaches. Rammer and Seidl (2019a) used a DNN that learns the probability of transitions between 10³ and 10⁶ ecosystem states from process-based simulations, conditional on state history, spatial context, and environmental conditions. The approach was subsequently used to project post-fire regeneration under future climate and fire regimes for the Greater Yellowstone Ecosystem (USA), projecting substantial regeneration failure in the twenty-first century due to limited seed supply and post-fire drought (Rammer and others 2021). Similarly, Dagon and others (2020) trained a feed-forward neural network to emulate a detailed model of ecosystem fluxes at extended spatial scales.
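A minimal sketch of the emulation idea: input-output pairs are generated by an expensive simulator (here a stand-in toy function), and a small feed-forward network is fitted to reproduce the mapping; everything here is illustrative rather than any of the cited implementations.

```python
# Meta-modelling sketch: a small feed-forward network is trained to
# emulate an expensive process-based simulator. 'expensive_simulator'
# is a placeholder for any slow model run.
import torch
import torch.nn as nn

def expensive_simulator(params: torch.Tensor) -> torch.Tensor:
    # Stand-in for a slow process-based model (here, a toy function).
    return torch.sin(params).sum(dim=1, keepdim=True)

params = torch.rand(1000, 4)           # 1000 sampled parameter combinations
outputs = expensive_simulator(params)  # one-off cost: run the full model

emulator = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(emulator.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((emulator(params) - outputs) ** 2)
    loss.backward()
    opt.step()

# The trained emulator can now stand in for the simulator at a fraction
# of the computational cost, for example in large scenario ensembles.
```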

Challenges for DL in Ecosystem Ecology

Deep learning has considerable potential for ecosystem ecology, as illustrated by the three application domains described above. However, considerable challenges remain. Here we consider three challenges for the use of DL in ecosystem ecology and discuss potential ways to mitigate them: (i) data availability, especially of the large labelled databases needed for supervised learning, (ii) interpretability in data-led modelling (for example, understanding why a model makes a given prediction), and (iii) the environmental costs of data-led methods.

Dealing with a Paucity of High-Quality (Labelled) Data

Most applications of DL by ecosystem ecologists have involved supervised classification; in other words, a model learns its task using a labelled or annotated training (reference) dataset. However, supervised learning depends on the availability and veracity of large labelled datasets (Karpatne and others 2019), especially given the concern that DL models may overfit when trained on small datasets (Goodfellow and others 2016). In a number of the examples reviewed earlier, model performance was negatively affected by scarce and patchy data, especially for extreme conditions. Developing expert-curated training sets, ecological or not, involves massive effort. Citizen science may provide one solution; the Serengeti Snapshot database (Swanson and others 2015), for example, contains 3.2 million images of animals across 1.2 million snapshot captures, which have been labelled (presence, identification, count) by volunteers at an estimated cost of 14.6 years’ worth of 40-h weeks. However, while citizen science initiatives may increase the scope of such efforts, they will also potentially carry biases in space, time and expertise, although this will vary with the project and may not differ from ‘professional’ data (Kosmala and others 2016). The effort to measure plant functional traits by integrating data from the open TRY database and the citizen science application iNaturalist, described by Schiller and others (2021), is an interesting example of how different data streams can be used to develop global syntheses. Irrespective of such efforts, there are many ecological contexts where there will be a persistent shortage of high-quality labelled data.

Various solutions have been proposed to address the issue of limited training data, given the concern that models trained on small datasets are vulnerable to overfitting. First, although the large majority of ecological applications use supervised learning, the development of unsupervised and self-supervised algorithms that circumvent the need for extensive labelled training data is an active area of research (for example, Yan and Wang 2022). Where supervised models are used, two solutions to data paucity are generating synthetic data to augment existing databases and minimising the amount of labelled data required. Data augmentation is the generation of new training data from existing training examples. For example, images can be geometrically altered (shifted, mirrored, rotated, zoomed, sheared) or audio data distorted to increase the size of the data set without adding more raw information or labelling effort; a minimal sketch is given below. This approach has received some attention from ecosystem ecologists. For example, Grünig and others (2021) used data augmentation to expand the data available to train a model for detecting and classifying damage to plants by pests and pathogens. Another alternative, especially for temporal data, is to use the output of physical simulations to train DL models (a form of meta-modelling); of course, using a process-based model to train a DL model relies on the robustness and/or transferability of the physical model.
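For image data, augmentation is typically only a few lines in modern frameworks; the sketch below uses the open-source torchvision library, with transform choices and parameters that are illustrative rather than a recommendation.

```python
# Sketch of image data augmentation with torchvision: each training image
# is randomly mirrored, rotated, zoomed, shifted, and sheared on the fly,
# multiplying the effective training set without new labelling effort.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                   # mirroring
    transforms.RandomRotation(degrees=15),               # rotation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)), # zooming/cropping
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), shear=10),
    transforms.ToTensor(),
])
# 'augment' would be applied to each image as it is loaded for training.
```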

Another way to deal with the data demands of training effective DL models is to limit the amount of labelling required. Two approaches that seek to achieve this are transfer learning and active learning. Transfer learning takes advantage of models previously developed for one setting by reusing them in another (Goodfellow and others 2016; Weiss and others 2016). Transfer learning has three potential benefits compared to training a new model ‘from scratch’ (Torrey and Shavlik 2010): better initial performance, more rapid improvement in performance as the model is trained, and better final performance. Transfer learning leverages the property that, in broad problem domains (for example, image classification), the early layers are often similar across DL models irrespective of the specific problem (Yosinski and others 2014). By using pre-trained models as the starting point for model training, knowledge (for example, general image understanding in the context of a DL model) can be transferred to a new task where there is limited labelled data (Goodfellow and others 2016). Another approach aiming to reduce the labelling burden is active learning, which selects the most informative examples (that is, those from which the DNN can learn the most at a given point in time) from the pool of unlabelled data. In an iterative process, an expert user is occasionally asked to label such informative unclassified samples during model training (Norouzzadeh and others 2021), thus selectively extending the data set. The hope is that by being selective about which data are labelled by the expert (the so-called oracle), the costs involved will drop, as a reduced set of the most informative data is selected for annotation.
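A minimal transfer-learning sketch, using a CNN pre-trained on ImageNet via recent versions of the torchvision library; the species count is an arbitrary placeholder.

```python
# Transfer learning sketch: start from a CNN pre-trained on ImageNet,
# freeze its early (general) layers, and train only a new output layer
# on a small labelled ecological dataset.
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in backbone.parameters():
    param.requires_grad = False       # keep the pre-trained features fixed

n_species = 12                        # e.g. species in a camera-trap dataset
backbone.fc = nn.Linear(backbone.fc.in_features, n_species)  # new trainable head

# Only the new head is updated during training, so far less labelled data
# (and computation) is needed than when training from scratch.
```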

Ecologists have already begun to use active and transfer learning. For example, Valan and others (2019) used transfer learning in the taxonomic identification of invertebrates (via a CNN) because of a lack of training data and concern over the computational cost of fine-tuning. They used a CNN pre-trained on the ImageNet data set (currently 1.4 × 10⁷ images in 100,000 classes), extracted features (that is, an intermediate representation in the CNN) and used these to train a support vector machine with a smaller labelled dataset (hundreds to thousands of images). Transfer learning can involve models trained on quite different data. Norouzzadeh and others (2018) tested the ability of their wildlife-imagery models when pre-trained on smaller datasets simulating wildlife cameras and on the generic ImageNet database, which is not wildlife-specific; in both cases, the models performed well. Russo and others (2020) tested the effectiveness of active learning in reducing the labelling effort involved in detecting anomalies in data (in their case, specific conductivity in mesocosm experiments). Their workflow involved labelling data (complete labelling, random labelling of a subset, or active learning) and then training DNN models using these labels. Their analysis demonstrates that models with high predictive accuracy can be developed with a fraction of the labelling effort using an active learning method. Likewise, Norouzzadeh and others (2021) demonstrate how a workflow integrating active learning can massively reduce labelling requirements; the most accurate of the algorithms they used had an accuracy comparable to that of Norouzzadeh and others (2018) while labelling just 14,000 versus 3.2 million images (a more than 99% reduction).
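For completeness, a minimal sketch of the uncertainty-sampling step at the heart of many active learning workflows; this is a generic illustration, not the algorithms of the studies above.

```python
# Minimal active-learning sketch (uncertainty sampling): from a pool of
# unlabelled examples, select those the current model is least sure about
# and send only these to the expert ('oracle') for labelling.
import torch

@torch.no_grad()
def select_for_labelling(model, unlabelled_pool, k=100):
    probs = torch.softmax(model(unlabelled_pool), dim=1)
    confidence = probs.max(dim=1).values       # top predicted class probability
    uncertainty = 1.0 - confidence             # 'least confidence' score
    return torch.topk(uncertainty, k).indices  # indices to pass to the oracle
```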

Prediction, Explanation, Interpretability, and Learning

Ecological modellers have long debated the relative merits of simple and complex models, in various guises such as the realism versus tractability trade-off (Levins 1966; Evans and others 2013; Razavi 2021). This debate is particularly acute for deep learning methods, given their seemingly “unreasonable effectiveness” (Sejnowski 2020) and the large amounts of data they typically require. While in some problem domains explainability may not matter, in others it does. Thus, there is growing interest in ‘interpretable machine learning’ (Murdoch and others 2019). Roscher and others (2020) distinguish between: (i) transparency (being able to communicate the decisions made in the model implementation process and how they influence the outcomes), (ii) interpretability (for example, using post hoc assessment to understand how a decision based on a model prediction was reached) and (iii) explainability (explaining the outcome of a modelling exercise in a process sense, acknowledging the context-dependent nature of explanation). Methods designed to help a modelling exercise develop these qualities have begun to be used by ecologists. These methods can examine the global model structure (how the model learned to identify patterns from the data it was trained with) or the local structure (why the model made a particular prediction for a given site or sample), and are reviewed in detail in an ecological context by Lucas (2020). Ryo and others (2021) illustrate the use of these interpretation methods (‘explainable AI’) in the context of species distribution models and highlight how explaining the global model and individual predictions can yield improved causal understanding of the system being predicted. Other examples include the visual interrogation of DL models using saliency maps, which depict how each data point influences the nodes in a DNN, or methods that highlight surprising predictions (McGovern and others 2019). Likewise, sensitivity analysis and layer-wise relevance propagation can facilitate understanding of a model’s outcomes by mapping the relationship between inputs and outputs (Montavon and others 2018; Toms and others 2020). These methods, and others more routinely applied to machine learning approaches (for example, variable importance metrics or partial dependence plots), help us understand the model (interpretability) but do not necessarily generate knowledge in themselves. Thus, as Roscher and others (2020) and Razavi (2021) emphasise, domain-specific expertise remains crucial for interpreting and assessing DL models’ credibility and predictions.
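As a minimal illustration of one such ‘local’ method, a gradient-based saliency map can be computed in a few lines; the sketch below assumes a differentiable classifier and is generic rather than any of the cited implementations.

```python
# Sketch of a gradient-based saliency map: the gradient of the predicted
# class score with respect to each input value indicates which inputs
# most influenced a given prediction.
import torch

def saliency(model, x):
    x = x.clone().requires_grad_(True)
    scores = model(x)                           # shape (batch, n_classes)
    scores.max(dim=1).values.sum().backward()   # score of the predicted class
    return x.grad.abs()                         # large values = influential inputs

# For an image classifier, saliency() returns a per-pixel influence map
# that can be displayed alongside the original image.
```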

The ability of DL to uncover patterns in large, messy and heterogeneous data may inspire new hypotheses that can be tested with experiments or models. Identifying surprising predictions (or ones that have not been observed) is important because one route to model-based learning is for such predictions to be empirically confirmed (Mankin and others 1975). As Reichstein and others (2019) outline, this does not necessarily challenge the ‘classical’ hypothetico-deductive model; instead, the patterns identified by DL approaches constitute new ways to observe complex systems. Patterns that cannot be explained by existing theoretical frameworks can guide and inform new experiments; in this way, resource-intensive experiments could be targeted more efficiently. For example, the RNN developed by Kraft and others (2019) to explore how memory effects vary across biomes generates hypotheses about the causes of that biome-level variation (in their case, they speculate that the temporal grain of climate variability influences the importance of memory effects). Where observations do not depart from existing theory or understanding, they may improve model predictions and parameterisation. In this context, there is likely an important role for unsupervised methods, that is, those in which a model is applied to unlabelled data. For example, Sonnewald and others (2020) used machine learning techniques to identify marine eco-provinces from high-dimensional nutrient and plankton data; their approach identified approximately 100 unique eco-provinces. The questions for ecosystem ecology then become: which biogeochemical and ecological processes and variables control those eco-provinces? Why do they vary through time and space? And how do they relate to existing classifications derived in other ways (for example, via expert assessment)?

Reconciling Energy and Environmental Costs of Data-Led Approaches

The modelling most frequently conducted by ecologists is not as energy-expensive as the approaches used in other fields, such as large-scale natural language models. Yet, in a review of the ecological applications of DL, it would be remiss not to touch on recent concerns about the environmental (mainly carbon and energy) costs of computation-intensive methods (Dhar 2020; Schwartz and others 2020). These concerns apply to any computationally expensive approach, although the size and training effort of some DL models make them acute (Thompson and others 2020). The emphasis in DL models has been on extracting maximal predictive performance, which results in high energy usage. However, as Canziani and others (2017) demonstrate, energy limits probably set upper bounds on practical accuracy, given that the relationship between time and performance is hyperbolic, and so predictive performance and energy cost must be traded off. Less computationally intensive models may also be more practical for deployment in edge computing (Tuia and others 2022). Recently, guidelines for environmentally sustainable computing have been published (Lannelongue and others 2021), alongside calls for the energy costs of computationally intensive projects to be reported (Lottick and others 2019; Strubell and others 2020). To support such reporting, open-source code bases and online apps (for example, Green Algorithms [https://green-algorithms.org/] and Machine Learning CO2 Impact [https://mlco2.github.io/impact/]) have been developed. We anticipate rapid developments in this area and a move towards an energy-conscious ‘green AI’ (Schwartz and others 2020).
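As a minimal sketch of such reporting, the open-source codecarbon Python package (one tool among several; our choice here is illustrative rather than an endorsement, and is not one of the tools named above) can wrap a training run and return an emissions estimate.

```python
# Sketch of reporting the energy/carbon cost of a model training run,
# using the open-source 'codecarbon' package (illustrative choice).
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="dl-training-run")
tracker.start()
# ... model training would run here ...
emissions = tracker.stop()  # estimated emissions for the run, kg CO2-equivalent
print(f"Estimated emissions: {emissions:.4f} kg CO2e")
```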

Conclusion

Whether some of the more hyperbolic claims regarding DL will prove well-founded remains to be seen, but there is no doubt that deep learning algorithms offer ecologists opportunities for prediction and understanding in the dawning age of big data. These opportunities range from incremental advances on existing questions (that is, the application of DL methods to existing problems), through an expansion of the scope and scale at which we ask questions, to entirely new (and unpredictable) questions and processing capacities. We anticipate that hybrid physical-DL models are an area of particular opportunity for ecosystem ecology as the binary view of mechanistic vs. empirical models becomes blurred. However, DL methods also amplify debates about the place of data, theory and models in science. To understand data, do we need a hypothetical generating model? Or can we identify empirical truisms to make predictions? These questions are long-standing and likely unresolvable; focusing on them might be unhelpful for advancing the science of ecology, particularly if they are posed as binaries. Thus, the challenge for ecosystem ecologists in leveraging data-led approaches is not solely technical but also one of reconciling competing narratives in ways that equip us to deal with a rapidly changing environment.