A review of machine learning applications in wildfire science and management

Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then, the field has progressed rapidly, in step with the wide adoption of machine learning (ML) in the environmental sciences. Here, we present a scoping review of ML in wildfire science and management. Our objective is to improve awareness of ML among wildfire scientists and managers, and to illustrate the challenging range of problems in wildfire science available to data scientists. We first present an overview of popular ML approaches used in wildfire science to date, and then review their use in wildfire science within six problem domains: 1) fuels characterization, fire detection, and mapping; 2) fire weather and climate change; 3) fire occurrence, susceptibility, and risk; 4) fire behavior prediction; 5) fire effects; and 6) fire management. We also discuss the advantages and limitations of various ML approaches and identify opportunities for future advances in wildfire science and management within a data science context. We identified 298 relevant publications, in which the most frequently used ML methods included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. There exist opportunities to apply more current ML methods (e.g., deep learning and agent-based learning) in wildfire science. However, despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods requires sophisticated knowledge for their application. Finally, we stress that the wildfire research and management community plays an active role in providing relevant, high-quality data for use by practitioners of ML methods.


Introduction
Wildland fire is a widespread and critical element of the earth system (Bond & Keeley, 2005), and is a continuous global feature that occurs in every month of the year. Presently, global annual area burned is estimated to be approximately 420 Mha (Giglio, Boschetti, Roy, Humber, & Justice, 2018), which is greater in area than the country of India. Globally, most of the area burned by wildfires occurs in grasslands and savannas. Humans are responsible for starting over 90% of wildland fires, and lightning is responsible for almost all of the remaining ignitions. Wildland fires can result in significant impacts to humans, either directly through loss of life and destruction of communities, or indirectly through smoke exposure. Moreover, as the climate warms, we are seeing increasing impacts from wildland fire (Coogan, Robinne, Jain, & Flannigan, 2019). Consequently, billions of dollars are spent every year on fire management activities aimed at mitigating or preventing the negative effects of wildfires. Understanding and better predicting wildfires is therefore crucial in several important areas of wildfire management, including emergency response, ecosystem management, land-use planning, and climate adaptation, to name a few.
Wildland fire itself is a complex process; its occurrence and behaviour are the product of several interrelated factors, including ignition source, fuel composition, weather, and topography. Furthermore, fire activity can be examined across a vast range of scales, from ignition and combustion processes that occur at a scale of centimeters over a period of seconds, to fire spread and growth over minutes to days and from meters to kilometers. At larger extents, fire frequency may be measured over years to millennia at regional, continental, and planetary scales (see Simard (1991) for a classification of fire severity scales, and S. W. Taylor, Woolford, Dean, and Martell (2013) for a review of numerical and statistical models that have been used to characterize and predict fire activity at a range of scales). At the finest scales, combustion and fire behavior are fundamentally physicochemical processes that can be usefully represented in mechanistic (i.e., physics-based) models (Coen, 2018). However, such models are often limited both by the ability to resolve relevant physical processes and by the quality and availability of input data (Hoffman et al., 2016). Moreover, with currently available computing power it is not feasible to apply physical models in near real time across the larger and longer scales needed to inform fire management and research. Thus, wildfire science and management rely heavily on empirical and statistical models for meso, synoptic, strategic, and global scale processes (Simard, 1991), whose utility depends on their ability to represent the often complex and non-linear relationships between the variables of interest, as well as on the quality and availability of data.
While the complexities of wildland fire often present challenges for modelling, significant advances have been made in wildfire monitoring and observation, primarily due to the increasing availability and capability of remote-sensing technologies. Several satellites (e.g., NASA TERRA, AQUA and GOES), for instance, have onboard fire detection sensors (e.g., Advanced Very High Resolution Radiometer (AVHRR), Moderate Resolution Imaging Spectroradiometer (MODIS), Visible Infrared Imaging Radiometer Suite (VIIRS)), and these sensors, along with those on other satellites (e.g., the LANDSAT series), routinely monitor vegetation distributions and changes. Additionally, numerical weather prediction and climate models are simultaneously offering finer spatial resolutions and longer forecast lead times (Bauer, Thorpe, & Brunet, 2015), which potentially improve the predictability of extreme fire weather events. Given sufficient data, these developments make a data-centric approach a natural evolution in wildfire research domains that already rely on empirical and statistical models. Consequently, there has been growing interest in recent years in the use of Machine Learning (ML) methods, a subset of Artificial Intelligence (AI), in wildfire science and management.
Although no formal definition exists, we adopt the conventional interpretation of ML as the branch of science concerned with enabling computers to learn to perform tasks without being explicitly programmed to do so (T. Mitchell, 1997). This approach is necessarily data-centric, with the performance of ML algorithms dependent on the quality and quantity of available data relevant to the task at hand. The motivations for using AI in forest ecosystem research, including disturbances due to wildfire, insects, and disease, were discussed in an early paper (Schmoldt, 2001), while Olden, Lawler, and Poff (2008) further argued for the use of ML methods to model complex problems in ecology. The use of ML models in the environmental sciences has risen rapidly over the last decade, as evidenced by recent reviews in the geosciences (Karpatne, Ebert-Uphoff, Ravela, Babaie, & Kumar, 2017), forest ecology (Z. …), extreme weather prediction (McGovern et al., 2017), flood forecasting (Mosavi et al., 2018), statistical downscaling (Vandal, Kodra, & Ganguly, 2018), remote sensing (Lary, Alavi, Gandomi, & Walker, 2016), and water resources (Shen, 2018; Sun & Scanlon, 2019). Two recent perspectives have also made compelling arguments for the application of deep learning in the earth system sciences (Reichstein et al., 2019) and for tackling climate change (Rolnick et al., 2019). To date, however, no paper has synthesized the diversity of ML approaches used to address the various challenges facing wildland fire science.
In this paper, we review the current state of the literature on ML applications in wildfire science and management. Our overall objective is to improve awareness of ML methods among fire researchers and managers, and to illustrate the diverse and challenging problems in wildfire science open to data scientists. This paper is organized as follows. In Section 2, we describe the ML methods most commonly encountered in wildfire science. In Section 3, we give an overview of the scoping review and literature search methodology employed in this paper, highlight the results of our literature search, and examine the uptake of ML methods in wildfire science since the 1990s. In Section 4, we review the relevant literature within six broadly categorized wildfire modeling domains: (i) fuels characterization, fire detection, and mapping; (ii) fire weather and climate change; (iii) fire occurrence, susceptibility, and risk; (iv) fire behavior prediction; (v) fire effects; and (vi) fire management. In Section 5, we discuss our findings and identify further opportunities for the application of ML methods in wildfire science and management. Finally, in Section 6 we offer conclusions. This review is intended to guide and inform both researchers and practitioners in the wildfire community looking to use ML methods, and to help ML researchers identify possible applications in wildfire science and management.
2 Artificial Intelligence and Machine Learning
Definition of Machine Learning: "(Methods which) detect patterns in data, use the uncovered patterns to predict future data or other outcomes of interest" (Murphy, 2012, Machine Learning: A Probabilistic Perspective).
ML itself can be seen as a branch of either AI or statistics, depending on whom you ask, that focuses on building predictive, descriptive, or actionable models for a given problem from data collected for that problem. ML methods learn directly from data and dispense with the need for a large number of expert rules or the need to model individual environmental variables accurately. ML algorithms develop their own internal model of the dynamics when learning from data, and thus need not be explicitly provided with the physical properties of the relevant parameters, for example, those affecting wildland fire. By contrast, the current state of the art in wildfire prediction includes physics-based simulators that firefighters and strategic planners rely on to make many critical decisions regarding the allocation of scarce firefighting resources in the event of a wildfire (A. Sullivan, 2007). These physics-based simulators, however, have certain critical limitations: they often have low accuracy, exhibit prediction bias in the regions where they are designed to be used, and are often hard to design and implement because they require a large number of expert rules. Furthermore, modelling many complex environmental variables is often difficult due to large resource requirements and complex or heterogeneous data formats. ML algorithms, however, learn their own parametric rules directly from data and do not require expert rules, which is particularly advantageous when the number of parameters is large and their physical properties are complex, as in the case of wildland fire. Therefore, an ML approach to wildfire response may help to avoid many of the limitations of physics-based simulators.
A major goal of this review is to provide an overview of the various ML methods used in wildfire science and management. Importantly, we also provide a generalized framework to guide wildfire scientists interested in applying ML methods to specific problem domains in wildland fire research. This conceptual framework, derived from the approach of Murphy (2012) and modified to show examples relevant to wildland fire science and management, is shown in Fig. 1. In general, ML methods can be grouped into three types: supervised learning, unsupervised learning, and agent-based learning. We describe each of these below.
Supervised Learning: In supervised ML, every problem can be framed as learning a parameterized function, often called a "model", that maps inputs (i.e., predictor variables) to outputs (i.e., target variables), both of which are known for the training data. The goal of supervised learning is to use an algorithm to learn the parameters of that function from the available data. In fact, both linear and logistic regression can be seen as very simple forms of supervised learning. Most of the successful and popular ML methods fall into this category.
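To make the "learn the parameters of a function from labelled data" idea concrete, the following is a minimal sketch of logistic regression treated as supervised learning, fitted by batch gradient descent on the log-loss. The data (one standardized fire-weather predictor and a 0/1 "fire observed" target) and all function names are hypothetical and chosen only for illustration; in practice one would use an established statistics or ML library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Learn w, b in p(y=1 | x) = sigmoid(w.x + b) by batch gradient descent on the log-loss."""
    d, n = len(X[0]), len(y)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            # derivative of the log-loss with respect to the linear score z = w.x + b
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        # step against the mean gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical data: one standardized fire-weather predictor, 0/1 "fire observed" target
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

The predictor-target pairs are both known during training, which is what makes this supervised; the learned `w` and `b` are the "parameters of the function".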
Unsupervised Learning: If the target variables are not available, ML problems are typically much harder to solve. In unsupervised learning, the canonical tasks are dimensionality reduction and clustering, where relationships or patterns are extracted from the data without any guidance as to the "right" answer. Extracting a small number of embedded dimensions that retain most of the variance in the data, or assigning data points to groups (clusters) that maximize some notion of natural proximity or other measure of similarity, are examples of unsupervised ML tasks.
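As an illustration of dimensionality reduction without any target variable, the sketch below extracts the single direction that retains the most variance (the first principal component) from two correlated predictors using power iteration on the covariance matrix. Principal component analysis is used here only as one familiar example of the task; the data (two hypothetical, strongly correlated drought indices) and helper names are assumptions for illustration.

```python
import math


def covariance_matrix(X):
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += c[i] * c[j] / (n - 1)
    return C, means


def first_principal_component(X, iters=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    C, means = covariance_matrix(X)
    d = len(C)
    v = [1.0] * d  # starting vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    eigval = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    explained = eigval / sum(C[i][i] for i in range(d))  # share of total variance retained
    scores = [sum(v[j] * (row[j] - means[j]) for j in range(d)) for row in X]  # 1-D embedding
    return v, explained, scores


# Hypothetical pair of strongly correlated predictors (e.g., two drought indices)
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
v, explained, scores = first_principal_component(X)
```

Here `scores` replaces the two original columns with one embedded dimension, and `explained` reports how much of the original variance that dimension keeps.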
Agent-Based Learning: Between supervised and unsupervised learning lies a group of ML methods in which learning happens by simulating the behaviors and interactions of a single autonomous agent or a group of them. These methods work with incomplete information about the target variables (i.e., information is available for some instances but not others), requiring generalizable models to be learned. A specific case in this space is Reinforcement Learning (Sutton & Barto, 1998), which is used to model decision-making problems over time in which critical parts of the environment can only be observed interactively through trial and error. This class of problems arises often in the real world and requires efficient learning and careful definition of values (or preferences) and exploration strategies.
In the following subsections, we present a brief introduction to commonly used ML methods from the aforementioned learning paradigms. This list is not meant to be exhaustive, and some methods can accommodate both supervised and unsupervised learning tasks. The classification of a method as belonging to either ML or traditional statistics is often a question of taste; for the purpose of this review, and in the interests of economy, we have designated a number of methods as belonging to traditional statistics rather than ML. For a complete listing, see Tables 1 and 2.

Decision Trees
Decision Trees (DT) (Breiman, 2017) belong to the class of supervised learning algorithms and are an example of a universal function approximator, although in their basic form such universality is difficult to achieve. DTs can be used for both classification and regression problems. A decision tree is a set of if-then-else rules with multiple branches joined by decision nodes and terminated by leaf nodes. A decision node is where the tree splits into different branches, each corresponding to a particular decision taken by the algorithm, whereas leaf nodes represent the model output: a label for a classification problem or a continuous value for a regression problem. Many decision nodes are combined in this way to build the DT. The objective of a DT is to accurately capture the relationships between inputs and outputs using the smallest possible tree that avoids overfitting. C4.5 (Quinlan, 1993) and Classification and Regression Trees (CART; Breiman, Friedman, Olshen, & Stone, 1984) are examples of common single DT algorithms. Note that while the term CART is also used as an umbrella term for single tree methods, we use DT here to refer to all such methods. The majority of decision tree applications are ensemble decision tree (EDT) models that use multiple trees in parallel (i.e., bootstrap aggregation, or bagging) or sequentially (i.e., boosting) to arrive at a final model. In this way, EDTs combine many weak learners into a strong learner while being more robust to overfitting than single trees. EDTs are well described in many ML/AI textbooks and are widely available as implemented libraries.
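The if-then-else structure described above can be sketched as a small CART-style classifier that greedily picks, at each decision node, the feature and threshold giving the lowest weighted Gini impurity, and stops at a maximum depth. This is a simplified illustration under stated assumptions (Gini impurity as the split criterion, exhaustive threshold search, depth limit instead of pruning), not the exact C4.5 or CART procedure; the fuel moisture and wind speed values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Search all (feature, threshold) pairs for the lowest weighted Gini impurity."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Return a leaf label, or a decision node (feature, threshold, left_subtree, right_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_samples or gini(y) == 0.0:
        return majority  # leaf node
    split = best_split(X, y)
    if split is None:
        return majority
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_samples),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_samples))


def predict(tree, x):
    while isinstance(tree, tuple):  # follow the if-then-else rules down to a leaf
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


# Hypothetical data: [fine fuel moisture (%), wind speed (km/h)] -> 1 = fire spread observed
X = [[8, 30], [10, 25], [12, 28], [25, 5], [30, 10], [22, 8], [9, 6], [28, 35]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
tree = build_tree(X, y, max_depth=2)
```

Limiting `max_depth` is the crude stand-in here for "the smallest possible tree that avoids overfitting"; real implementations typically add pruning or cross-validated stopping rules.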

Random Forests
A Random Forest (RF) (Breiman, 2001) is an ensemble model composed of many individually trained DTs, and is the most popular implementation of bagged decision trees. For classification, each component DT in an RF model casts a vote, and the class with the maximum number of votes is taken as the final classification for the input data. RFs can also be used for regression, where the final output is determined by averaging the individual tree outputs. The underlying principle of the RF algorithm is that a random subset of features is considered at each node of each tree, while the samples for training each component tree are selected by bagging, which resamples (with replacement) the original set of data points. The algorithm achieves high performance by reducing the correlation between trees, so that combining a large number of different trees reduces model variance and provides greater accuracy than individual trees. However, this improved performance comes at the cost of an increase in bias and a loss of interpretability (although variable importance can still be inferred through permutation tests).
Figure 1: A diagram showing the main machine learning types, types of data, and modeling tasks in relation to popular algorithms and potential applications in wildfire science and management. Note that the algorithms shown bolded are core ML methods whereas non-bolded algorithms are often not considered ML.
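The two sources of randomness in an RF, bootstrap resampling of the training rows and a random feature subset at each node, can be sketched as follows. This is an illustrative sketch rather than Breiman's reference implementation: trees are grown with a Gini criterion to a fixed maximum depth, and `max_features` (the size of the per-node feature subset, often set to roughly the square root of the number of features) is a parameter chosen here for the toy example. The data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, rng, max_features, depth=0, max_depth=4):
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or gini(y) == 0.0:
        return majority
    best = None
    # RF ingredient 1: only a random subset of the features is considered at this node
    for j in rng.sample(range(len(X[0])), max_features):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, j, t)
    if best is None:
        return majority
    _, j, t = best

    def child(idx):
        return grow_tree([X[i] for i in idx], [y[i] for i in idx], rng,
                         max_features, depth + 1, max_depth)

    return (j, t,
            child([i for i, row in enumerate(X) if row[j] <= t]),
            child([i for i, row in enumerate(X) if row[j] > t]))


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        # RF ingredient 2: bagging, i.e. a bootstrap sample drawn with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_features))
    return forest


def predict_forest(forest, x):
    votes = Counter(predict_tree(tree, x) for tree in forest)  # majority vote over trees
    return votes.most_common(1)[0][0]


# Hypothetical data: [fine fuel moisture (%), wind speed (km/h)] -> 1 = large fire
X = [[6, 30], [8, 35], [10, 28], [12, 32], [25, 5], [28, 8], [30, 10], [35, 6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y)
```

For regression, the same structure applies with leaf means instead of majority labels and an average instead of a vote.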

Boosted Ensembles
Boosting describes a strategy in which a set of weak learners, usually decision trees, is combined into a strong learner using a sequential additive model. Each successive model improves on the previous one by taking the errors of the current ensemble into account, which can be done in more than one way. For example, the adaptive boosting algorithm, known as AdaBoost (Freund & Schapire, 1995), works by increasing the weights of training observations misclassified by earlier learners, so that later learners concentrate on the hardest cases. Gradient boosting machines (GBMs) instead rely on the gradient, which describes the rate of change of a function with respect to its parameters, i.e., how much a change in the parameters will change the output of the function. GBMs sequentially build an ensemble of weak learners, fitting each new learner to the negative gradient of the loss of the current combined model, which points in the direction in which the loss decreases fastest (Friedman, 2001). The details of the GBM algorithm are as follows. Denote the target output as $Y$ and the tree-based ensemble after adding $i$ weak learners as a function $T_i(X) \rightarrow Y$. The "perfect" function for the $(i+1)$th weak learner would then be the residual

$$h(X) = Y - T_i(X),$$

so that $T_{i+1}(X) = T_i(X) + h(X) = Y$. In practice, we can only approach this perfect update by performing functional gradient descent, in which each step uses an approximation of the true residual derived from a loss function. For the squared-error loss

$$L\big(Y, T_i(X)\big) = \tfrac{1}{2}\big(Y - T_i(X)\big)^2,$$

the negative gradient with respect to the current prediction is exactly the residual,

$$-\nabla_{T_i} L\big(Y, T_i(X)\big) = Y - T_i(X).$$

GBM therefore fits each new tree to the negative gradient of the loss evaluated at the current ensemble, and adds it (usually scaled by a learning rate) to the ensemble.
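The GBM update above can be sketched for squared-error loss on a single predictor: each round computes the residuals $Y - T_i(X)$ (the negative gradient), fits a one-split regression tree (a "stump") to them, and adds the stump scaled by a learning rate. The stump learner, the learning rate of 0.1, and the step-shaped toy data (a hypothetical drought index versus a hypothetical response) are illustrative assumptions, not a reproduction of any particular BRT software.

```python
def fit_stump(x, r):
    """Least-squares regression stump on a single predictor."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def fit_gbm(x, y, n_rounds=50, learning_rate=0.1):
    f0 = sum(y) / len(y)  # T_0: a constant model
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_rounds):
        # negative gradient of the squared-error loss = residual Y - T_i(X)
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, residuals)
        trees.append(h)
        # T_{i+1}(X) = T_i(X) + learning_rate * h(X)
        pred = [pi + learning_rate * h(xi) for xi, pi in zip(x, pred)]
    return lambda xi: f0 + learning_rate * sum(tree(xi) for tree in trees)


# Hypothetical data: drought index (0-9) versus a step-shaped response
x = list(range(10))
y = [0.0] * 5 + [10.0] * 5
model = fit_gbm(x, y)
mse = sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)
```

The learning rate shrinks each correction, so many rounds are needed; this trades training time for robustness to overfitting.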
In a number of domains, and particularly in ecological modeling, GBM is often referred to as Boosted Regression Trees (BRTs) (Elith, Leathwick, & Hastie, 2008). For consistency with the majority of the literature reviewed in this paper, we henceforth use the latter term. While deep neural networks (DNNs) and EDT methods are both universal function approximators, EDTs are often easier to interpret, faster to train, and can require less data than DNNs. However, there are fewer and fewer cases where tree-based methods can be shown to provide superior performance on any particular metric when DNNs are trained properly with enough data (see, for example, Korotcov, Tkachenko, Russo, and Ekins (2017)).

Support Vector Machines
Another category of supervised learning includes Support Vector Machines (SVM) (Hearst, Dumais, Osuna, Platt, & Scholkopf, 1998) and related kernel-based methods. An SVM is a classifier that determines the hyperplane (decision boundary) separating the classes for data in an $n$-dimensional space. The SVM finds the optimal hyperplane such that the distance between the decision boundary and the nearest point of each class is maximized. If the data are linearly separable, the hyperplane has the form $w^T x + b = 0$, where $w$ is the weight vector, $x$ is the input vector, and $b$ is the bias. The data points closest to the hyperplane are called support vectors, and their distance $d$ to the hyperplane is the margin of separation. The objective is to find the optimal hyperplane that maximizes this margin. If the classes are not linearly separable, kernel SVMs use a kernel function, such as the Radial Basis Function (RBF), to implicitly map the data into a higher-dimensional space where finding a separating hyperplane is easier. SVMs have been widely used for both classification and regression problems, although recently developed deep learning algorithms have often proved more effective than SVMs given large amounts of training data. However, for problems with limited training samples, SVMs may perform better than deep learning based classifiers.
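A linear soft-margin SVM can be sketched by minimizing the regularized hinge loss $\frac{\lambda}{2}\lVert w\rVert^2 + \frac{1}{N}\sum_k \max(0, 1 - y_k(w^T x_k + b))$ with stochastic subgradient descent. The solver choice, the constant step size, and the hypothetical standardized predictors below are assumptions made for a compact illustration; production SVM libraries use dedicated solvers and support kernels such as the RBF.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Soft-margin linear SVM (labels in {-1, +1}) trained by stochastic subgradient descent."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(y)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:
                # point inside the margin or misclassified: hinge-loss subgradient is active
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                # only the regularizer acts, shrinking w (which widens the margin 2/||w||)
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical standardized predictors; +1 = fire, -1 = no fire
X = [[-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0], [-1.5, -1.0],
     [2.0, 1.0], [1.0, 2.0], [2.0, 2.0], [1.5, 1.0]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
w, b = train_linear_svm(X, y)
```

Points with margin at least 1 contribute nothing to the hinge term, which is why the final hyperplane is determined only by the points nearest to it, the support vectors.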

Artificial Neural Networks and Deep Learning
The basic unit of an Artificial Neural Network (ANN) is a neuron (also called a perceptron or logistic unit). A neuron is inspired by the functioning of neurons in mammalian brains in that it can learn simple associations, but in reality it is much simpler than its biological counterpart. A neuron has a set of inputs, which are combined linearly by multiplying each input by an associated weight. The weighted sum forms the output signal, which is then passed through a (generally) non-linear activation function. Examples of activation functions include sigmoid, tanh, and the Rectified Linear Unit (ReLU). This non-linearity is important for general learning: without it, any network of neurons would reduce to a single linear function, and functions such as ReLU or the sigmoid act as a hard or soft threshold between positive and negative signals. The weights on each connection are the function parameters, which are fit using supervised learning so that the neuron's output best matches the training targets.
In practice, even simple ANNs, often called Multi-Layer Perceptrons (MLPs), combine many neuron units in parallel, each processing the same input with independent weights. In addition, a second layer of hidden neuron units can be added to allow more degrees of freedom to fit general functions (see Figure 2(a)). MLPs are capable of solving simple classification and regression problems: for a classification task the output is the predicted class of the input data, whereas for a regression task the output is the regressed value. Deep learning (LeCun, Bengio, & Hinton, 2015) refers to the use of Deep Neural Networks (DNNs), which are ANNs with multiple hidden layers (nominally more than 3); these include Convolutional Neural Networks (CNNs), popularized in image analysis, and Recurrent Neural Networks (RNNs), which can be used to model dynamic temporal phenomena. The architecture of DNNs can vary in the connectivity between nodes, the number of layers employed, the types of activation functions used, and many other hyperparameters. Layers can be fully connected, or connected through convolutional layers (e.g., CNNs), recurrent units (e.g., RNNs), or other sparse connectivity. The only requirement of all these connectivity structures and activation functions is that they are differentiable (almost everywhere, in the case of ReLU), so that gradients can be computed for training.
Regardless of the architecture, training an ANN most commonly involves feeding input data through the network layers and activation functions to produce an output. In the supervised setting, this output is then compared with the known true output (i.e., labelled training data), resulting in an error measurement (loss or cost function) used to evaluate model performance. For classification, the error is commonly calculated as the cross-entropy loss between the predicted and true output labels. Since every part of the network is differentiable, we can compute the gradient of the loss with respect to every weight in the network. The backpropagation algorithm (Hecht-Nielsen, 1992) does this efficiently by applying the chain rule layer by layer, propagating the error from the output back toward the input; each weight is then moved a small step in the direction opposite to its gradient (gradient descent), which reduces the loss on the training data.
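The training loop just described (forward pass, cross-entropy loss, backpropagation via the chain rule, gradient-descent update) can be sketched for a network with one hidden layer of tanh units and a sigmoid output. The XOR toy problem is used because it is not linearly separable, so a hidden layer is needed; the layer size, learning rate, epoch count, and random seed are arbitrary illustrative choices, and in practice a deep learning framework with automatic differentiation would compute these gradients.

```python
import math
import random


def train_mlp(X, y, n_hidden=4, lr=0.5, epochs=2000, seed=1):
    """One-hidden-layer network (tanh hidden units, sigmoid output) trained by backpropagation."""
    rng = random.Random(seed)
    n_in, n = len(X[0]), len(X)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1, gW2, gb2, loss = [0.0] * n_hidden, [0.0] * n_hidden, 0.0, 0.0
        for x, t in zip(X, y):
            # forward pass
            h = [math.tanh(sum(w * xi for w, xi in zip(W1[k], x)) + b1[k]) for k in range(n_hidden)]
            p = 1.0 / (1.0 + math.exp(-(sum(w * hk for w, hk in zip(W2, h)) + b2)))
            p = min(max(p, 1e-12), 1 - 1e-12)  # guard the logarithms
            loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))  # cross-entropy
            # backward pass: chain rule from the output back to the first layer
            d_out = p - t  # dLoss/d(output pre-activation) for sigmoid + cross-entropy
            gb2 += d_out
            for k in range(n_hidden):
                gW2[k] += d_out * h[k]
                d_hid = d_out * W2[k] * (1.0 - h[k] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                gb1[k] += d_hid
                for j in range(n_in):
                    gW1[k][j] += d_hid * x[j]
        losses.append(loss / n)
        # gradient-descent step on the mean loss
        b2 -= lr * gb2 / n
        for k in range(n_hidden):
            W2[k] -= lr * gW2[k] / n
            b1[k] -= lr * gb1[k] / n
            for j in range(n_in):
                W1[k][j] -= lr * gW1[k][j] / n
    return (W1, b1, W2, b2), losses


# XOR: not linearly separable, so a hidden layer is required
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 0]
params, losses = train_mlp(X, y)
```

Note that the hidden-layer gradient `d_hid` reuses the output error `d_out`; this reuse of downstream error terms is what makes backpropagation efficient.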
ANNs can also be configured for unsupervised learning tasks. For example, self-organizing maps (SOMs) are a form of ANN adapted to spatial data and have therefore found widespread use in the atmospheric sciences (Skific & Francis, 2012). A SOM is an unsupervised method that maps each input, say a gridded atmospheric variable at a single time, onto a two-dimensional array of nodes. The algorithm clusters similar atmospheric patterns onto the same or nearby nodes, resulting in a dimensionality reduction of the input data. More recently, unsupervised deep learning methods, such as autoencoder networks, have started to replace SOMs in the environmental sciences (Shen, 2018).
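A minimal SOM sketch follows, assuming a small 2 x 2 map, a Gaussian neighbourhood on the map grid, and linearly decaying learning rate and neighbourhood radius (all illustrative choices). Each input is a hypothetical gridded field flattened to a vector; after training, similar patterns map to the same best-matching node and dissimilar patterns to different nodes.

```python
import math
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_som(data, rows=2, cols=2, epochs=100, lr0=0.5, sigma0=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # each map node (r, c) holds a prototype vector with the same length as an input pattern
    nodes = {(r, c): [rng.random() for _ in range(dim)] for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3   # neighbourhood radius shrinks over time
        for x in rng.sample(data, len(data)):  # present inputs in random order
            bmu = min(nodes, key=lambda k: sq_dist(nodes[k], x))  # best-matching unit
            for k, w in nodes.items():
                grid_d2 = (k[0] - bmu[0]) ** 2 + (k[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the map
                nodes[k] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return nodes


def best_matching_unit(nodes, x):
    return min(nodes, key=lambda k: sq_dist(nodes[k], x))


# Hypothetical 2x2 gridded anomaly fields, flattened to 4-vectors: two recurring patterns
A = [[1.0, 1.0, 0.0, 0.0], [0.9, 1.0, 0.1, 0.0], [1.0, 0.9, 0.0, 0.1]]
B = [[0.0, 0.0, 1.0, 1.0], [0.1, 0.0, 1.0, 0.9], [0.0, 0.1, 0.9, 1.0]]
nodes = train_som(A + B)
```

Each node's prototype vector can be reshaped back to the grid and plotted, which is how SOM "pattern catalogues" of atmospheric states are usually inspected.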

Bayesian Networks
Bayesian networks (Bayes nets, belief networks; BNs) are a popular tool in many applied domains because they provide an intuitive graphical language for specifying the probabilistic relationships between variables, as well as tools for calculating the resulting probabilities (Pearl, 1988). The basis of BNs is Bayes' theorem, which relates the conditional and marginal probabilities of random variables. BNs can be treated as an ML task if one is trying to automatically fit the parameters of the model from data or, more challenging still, to learn the best graphical structure to represent a dataset. BNs have close ties to causal reasoning, but it is important to remember that the relationships encoded in a BN are inherently correlational rather than causal. BNs are directed acyclic graphs, consisting of nodes and arrows (or arcs), that define a probability distribution over a set of variables $U$. The set of parents of a node (variable) $X$, denoted $\pi_X$, consists of all nodes with directed arcs going into $X$. BNs provide a compact representation of conditional distributions because, when the variables are ordered so that every node comes after its parents,

$$p(X_i \mid X_1, \ldots, X_{i-1}) = p(X_i \mid \pi_{X_i}),$$

i.e., each $X_i$ is conditionally independent of its other predecessors given its direct parents. Each node $X$ is associated with a conditional probability table over $X$ and its parents defining $p(X \mid \pi_X)$. If a node has no parents, a prior distribution $p(X)$ is specified instead. The joint probability distribution of the network is then given by the chain rule

$$P(U) = \prod_{X \in U} p(X \mid \pi_X).$$

Figure 2: Logistic regression can be seen as a basic building block for neural networks, with no hidden layer and a sigmoid activation function. Classic shallow neural networks (also known as Multi-Layer Perceptrons) have at least one hidden layer and can have a variety of activation functions. Deep neural networks essentially have a much larger number of hidden layers and use additional regularization and optimization methods to enhance training.
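The chain-rule factorization $P(U) = \prod_{X \in U} p(X \mid \pi_X)$ can be illustrated with a three-node network in which two binary parents, drought (D) and lightning (L), both point to ignition (I). All variable names and probability values are hypothetical, and inference is done by brute-force enumeration of the joint distribution, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical network: D (drought) -> I (ignition) <- L (lightning); all binary
parents = {"D": (), "L": (), "I": ("D", "L")}
cpt = {
    "D": {(): 0.3},                                               # P(D=1)
    "L": {(): 0.2},                                               # P(L=1)
    "I": {(0, 0): 0.01, (0, 1): 0.1, (1, 0): 0.05, (1, 1): 0.4},  # P(I=1 | D, L)
}


def p_node(var, value, assignment):
    """Look up p(var = value | parents of var) in the conditional probability table."""
    p1 = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p1 if value == 1 else 1.0 - p1


def joint(assignment):
    """Chain rule: P(U) = product over nodes of p(X | parents(X))."""
    prob = 1.0
    for var in parents:
        prob *= p_node(var, assignment[var], assignment)
    return prob


def query(target, evidence):
    """P(target=1 | evidence) by enumerating every joint assignment."""
    num = den = 0.0
    for values in product((0, 1), repeat=len(parents)):
        a = dict(zip(parents, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        pj = joint(a)
        den += pj
        if a[target] == 1:
            num += pj
    return num / den
```

For these made-up tables, `query("I", {})` gives the marginal ignition probability, and `query("L", {"I": 1})` gives the probability that lightning was involved given an observed ignition, an example of reasoning "backwards" along the arcs.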

Naïve Bayes
A special case of a BN is the Naïve Bayes (NB) classifier, which assumes that the input features are conditionally independent given the output class. This allows the likelihood function to be constructed by simply multiplying the conditional probabilities of each input variable given the output. Therefore, while NB is fast and straightforward to implement, its prediction accuracy can be low for problems where the assumption of conditional independence does not hold.
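A categorical NB classifier can be sketched in a few lines: the class score is the log prior plus the sum of per-feature log conditional probabilities, which is exactly the product of conditionals under the independence assumption. Laplace (add-one) smoothing is an added assumption to avoid zero probabilities; the fuel type, wind class, and fire size labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_nb(X, y, alpha=1.0):
    classes = Counter(y)
    counts = defaultdict(Counter)  # (feature index, class) -> counts of feature values
    values = defaultdict(set)      # feature index -> set of observed values
    for x, c in zip(X, y):
        for j, v in enumerate(x):
            counts[(j, c)][v] += 1
            values[j].add(v)
    return classes, counts, values, alpha


def predict_nb(model, x):
    classes, counts, values, alpha = model
    n = sum(classes.values())
    scores = {}
    for c, nc in classes.items():
        logp = math.log(nc / n)  # log prior P(c)
        for j, v in enumerate(x):
            # conditional independence: add log P(x_j | c), Laplace-smoothed
            logp += math.log((counts[(j, c)][v] + alpha) / (nc + alpha * len(values[j])))
        scores[c] = logp
    return max(scores, key=scores.get)


# Hypothetical categorical data: (fuel type, wind class) -> fire size class
X = [("grass", "high"), ("grass", "high"), ("forest", "high"),
     ("grass", "low"), ("forest", "low"), ("forest", "low")]
y = ["large", "large", "large", "small", "small", "small"]
model = fit_nb(X, y)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.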

Maximum Entropy
Maximum Entropy (MaxEnt), as described by Phillips, Aneja, Kang, and Arya (2006), is a presence-only framework that fits a spatial probability distribution by maximizing entropy subject to constraints derived from existing knowledge. MaxEnt can be considered a Bayesian method, since it is compatible with an application of Bayes' theorem in which existing knowledge plays the role of a prior distribution. MaxEnt has found widespread use in species distribution modeling in landscape ecology (Elith et al., 2011), where the existing knowledge consists of occurrence observations for the species of interest.
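The core MaxEnt idea can be sketched with a single environmental feature. Among all distributions over background cells whose expected feature value equals the mean at presence sites, the maximum-entropy one has the Gibbs form $p(\text{cell}) \propto \exp(\lambda f(\text{cell}))$. Fitting $\lambda$ by gradient ascent on the mean log-likelihood of the presence cells drives the model expectation toward that empirical mean. This sketch omits the regularization and multiple feature transformations used in practical MaxEnt software, and the background cells and presence records (e.g., of past ignitions) are hypothetical.

```python
import math


def gibbs(lam, feats):
    """p(cell) proportional to exp(lam * f(cell)), normalized over background cells."""
    m = max(lam * f for f in feats)
    w = [math.exp(lam * f - m) for f in feats]  # subtract the max for numerical stability
    z = sum(w)
    return [wi / z for wi in w]


def fit_maxent_1d(background, presence, lr=5.0, iters=2000):
    target = sum(presence) / len(presence)  # empirical feature mean at presence sites
    lam = 0.0
    for _ in range(iters):
        p = gibbs(lam, background)
        expected = sum(pi * f for pi, f in zip(p, background))
        # gradient of the mean log-likelihood of the presences: target - E_p[f]
        lam += lr * (target - expected)
    return lam, gibbs(lam, background)


# Hypothetical landscape: 11 background cells with a dryness feature in [0, 1],
# and presence records at relatively dry cells
background = [i / 10 for i in range(11)]
presence = [0.7, 0.8, 0.9, 0.9]
lam, p = fit_maxent_1d(background, presence)
expected = sum(pi * f for pi, f in zip(p, background))
```

At convergence `expected` matches the presence mean, which is the moment constraint that defines the maximum-entropy solution; no absence data are used at any point.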

Genetic Algorithms
Genetic algorithms (GAs) are heuristic algorithms inspired by Darwin's theory of evolution (natural selection) and belong to a more general class of evolutionary algorithms (M. Mitchell, 1996). GAs are often used to generate solutions to search and optimization problems using biologically motivated operators such as mutation, crossover, and selection. In general, GAs involve several steps. The first step involves creating an initial population of potential solutions, with each solution encoded as a chromosome. Second, a fitness function appropriate to the problem is defined, which returns a fitness score determining how likely an individual is to be chosen for reproduction. The third step is the selection of pairs of individuals, denoted as parents. In the fourth step, a new population is created by generating two offspring from each pair of parents using crossover, whereby a new chromosome is formed by randomly combining parts of each parent's chromosome. In the final step, called mutation, a small sample of the new population is chosen and small perturbations are made to their parameters to maintain diversity. The entire process is repeated until the results are satisfactory (based on the fitness function) or some measure of convergence is reached.
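The GA steps above can be sketched on a hypothetical fuel-treatment selection problem: each chromosome is a bit string saying which of eight landscape cells to treat, and fitness is the total treatment benefit, set to zero if the cost exceeds a budget. The specific operators are illustrative choices: tournament selection, one-point crossover, per-bit mutation with small probability, and elitism (the best chromosome is carried over unchanged), which guarantees the best fitness never decreases. All benefit, cost, and budget values are made up; the toy problem is small enough to check against exhaustive search.

```python
import random
from itertools import product

values = [6, 5, 8, 9, 6, 7, 3, 4]  # hypothetical benefit of treating each cell
costs = [2, 3, 6, 7, 5, 9, 4, 3]   # hypothetical treatment cost of each cell
BUDGET = 15


def fitness(chrom):
    cost = sum(c for c, bit in zip(costs, chrom) if bit)
    return sum(v for v, bit in zip(values, chrom) if bit) if cost <= BUDGET else 0


def run_ga(pop_size=20, generations=60, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(values)
    # Step 1: initial population of random chromosomes (bit strings)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    initial_best = max(fitness(c) for c in pop)
    for _ in range(generations):
        # Steps 2-3: fitness-based parent selection (binary tournament)
        def select():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        new_pop = [max(pop, key=fitness)]  # elitism: keep the best chromosome unchanged
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n)  # Step 4: one-point crossover
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                # Step 5: mutation flips each bit with a small probability
                child = [1 - g if rng.random() < p_mut else g for g in child]
                if len(new_pop) < pop_size:
                    new_pop.append(child)
        pop = new_pop
    best = max(pop, key=fitness)
    return best, fitness(best), initial_best


def brute_force_optimum():
    """Exhaustive check, feasible only because this toy problem has 2**8 candidates."""
    return max(fitness(bits) for bits in product((0, 1), repeat=len(values)))


best, best_fit, initial_best = run_ga()
opt = brute_force_optimum()
```

For realistic problem sizes exhaustive search is impossible, which is where a heuristic such as a GA becomes useful; there is, however, no guarantee that it reaches the global optimum.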

Reinforcement Learning
Reinforcement learning (RL) represents a very different learning paradigm from supervised or unsupervised learning. In RL, an agent (or actor) interacts with its environment and learns a desired behavior (set of actions) in order to maximize some cumulative reward. RL can be viewed as solving a Markov Decision Process (MDP) whose transition probabilities are not explicitly known but must be learned through interaction. This type of learning is well suited to problems of automated decision making, such as those arising in automated control (e.g., robotics) or system optimization (e.g., management policies). RL algorithms include Monte Carlo Tree Search (MCTS), Q-Learning, and Actor-Critic algorithms. For an introduction to RL, see Sutton and Barto (2018).
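Tabular Q-learning, one of the RL algorithms listed above, can be sketched on a deliberately tiny, hypothetical MDP: a corridor of five states in which the agent moves left or right and receives a reward of 1 only on reaching the rightmost (terminal) state. The agent is never told the transition rules; it learns action values from trial and error using an epsilon-greedy exploration strategy. The environment and all hyperparameters (learning rate `alpha`, discount `gamma`, exploration rate `epsilon`) are illustrative assumptions.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the terminal goal
ACTIONS = (-1, +1)  # move left or move right


def step(s, a):
    """Environment dynamics; unknown to the agent, which only sees the outcomes."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(s):
        best = max(Q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])  # random tie-break

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            # exploration strategy: random action with probability epsilon
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])  # temporal-difference update
            s = s2
            if done:
                break
    return Q


Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
```

The discount factor `gamma` encodes the "values (or preferences)" mentioned earlier: rewards reached sooner are worth more, so the learned greedy policy heads straight for the goal.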

Clustering methods
Clustering is the process of splitting a set of points into groups such that each point is more similar to the points in its own group than to those in any other group. Clustering can be done in different ways; for example, the K-means (KM) clustering algorithm (MacQueen et al., 1967), based on a centroid model, is perhaps the best-known clustering algorithm. In K-means, similarity is based on closeness to the centroid of each cluster. K-means is an iterative process in which the cluster centroids and the assignment of points to clusters are updated at each step. The K-means algorithm consists of five steps: (i) specify the number of clusters; (ii) randomly assign each data point to a cluster; (iii) calculate the centroid of each cluster; (iv) reassign each point to the nearest centroid; and (v) recompute the cluster centroids. Steps (iv) and (v) are repeated until the cluster assignments no longer change. Although KM is the most widely used clustering algorithm, several others exist, including agglomerative Hierarchical Clustering (HC), Gaussian Mixture Models (GMMs), and the Iterative Self-Organizing Data Analysis Technique (ISODATA).
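The five K-means steps map directly onto a short loop. In this sketch, squared Euclidean distance is used, and an empty cluster (possible after random initial assignment) is re-seeded with a randomly chosen data point; that re-seeding rule is an added assumption not covered by the steps above. The two-dimensional points are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(X, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # (i) k is given by the caller; (ii) randomly assign each point to a cluster
    labels = [rng.randrange(k) for _ in X]
    centroids = []
    for _ in range(max_iter):
        # (iii)/(v) compute the centroid of each cluster
        centroids = []
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            centroids.append(mean(members) if members else rng.choice(X))  # re-seed empty cluster
        # (iv) reassign every point to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in X]
        if new_labels == labels:  # stop when no assignment changes
            break
        labels = new_labels
    return labels, centroids


# Hypothetical 2-D points forming two groups
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (1.2, 1.8),
     (8.0, 8.0), (9.0, 8.5), (8.5, 9.5), (9.2, 9.0)]
labels, centroids = kmeans(X, 2)
```

At convergence every point is assigned to its nearest centroid and every centroid is the mean of its points. Because the result can depend on the random initial assignment, K-means is commonly run several times and the solution with the smallest within-cluster sum of squares is kept.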

K-Nearest Neighbor
The K-Nearest Neighbors (KNN) algorithm is a simple but very effective supervised classification algorithm based on the intuitive premise that similar data points lie close together according to some metric (Altman, 1992). Specifically, KNN classifies a new data point by finding the K training points nearest to it, typically measured by Euclidean distance, and assigning the majority class among those neighbors. The optimal value of K can be found experimentally by evaluating the classification error over a range of values.
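A minimal KNN sketch follows, together with one way to choose K experimentally: leave-one-out classification error over a few candidate values. The leave-one-out criterion, the candidate values of K, and the two hypothetical fire-danger predictors are illustrative assumptions; `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    # sort training points by Euclidean distance to x and vote among the k nearest
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_error(X, y, k):
    """Leave-one-out classification error for a given k."""
    errors = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        errors += pred != y[i]
    return errors / len(X)


# Hypothetical data: two fire-danger predictors and an observed outcome
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (1.2, 1.8),
     (8.0, 8.0), (9.0, 8.5), (8.5, 9.5), (9.2, 9.0)]
y = ["no_fire"] * 4 + ["fire"] * 4
best_k = min((1, 3, 5), key=lambda k: loo_error(X, y, k))
```

Because distances mix all predictors, features on very different scales are usually standardized before applying KNN.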

Neuro-Fuzzy models
Fuzzy logic is an approach for encoding expert human knowledge into a system by defining logical rules about how different classes overlap and interact, without being constrained to "all-or-nothing" notions of set inclusion or probability of occurrence. Although early implementations of fuzzy logic systems depended on setting rules manually, and therefore are not considered machine learning, methods that use fuzzy rules as inputs to ML models, or extract fuzzy rules from them, are often described as "neuro-fuzzy" methods. For example, the Adaptive Neuro-Fuzzy Inference System (ANFIS) (Jang, 1993) fuses fuzzy logical rules with an ANN approach, while trying to maintain the benefits of both. Like ANNs, ANFIS is a universal function approximator. However, since this algorithm originated in the 1990s, it predates the recent deep learning revolution and so is not necessarily appropriate for very large data problems with complex patterns arising in high-dimensional spaces. Alternatively, human-acquired fuzzy rules can be integrated into ANN learning; however, there is no guarantee that the resulting trained neural network will still be interpretable. It should be noted that fuzzy rules and fuzzy logic are not a major direction of research within the core ML community.
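To show how fuzzy rules let classes overlap, the sketch below evaluates three hand-written rules with zero-order Sugeno-style inference, the kind of rule base whose membership functions and rule outputs ANFIS learns from data. Here nothing is learned: the membership breakpoints, the rules and their output values are hypothetical, chosen only to illustrate partial membership.

```python
def falling(x, lo, hi):
    """Membership of a 'low' fuzzy set: 1 below lo, 0 above hi, linear in between."""
    return max(0.0, min(1.0, (hi - x) / (hi - lo)))


def rising(x, lo, hi):
    """Membership of the complementary 'high' fuzzy set."""
    return 1.0 - falling(x, lo, hi)


def fire_danger(rh, temp):
    """Zero-order Sugeno inference over hypothetical humidity/temperature rules."""
    rh_low, rh_high = falling(rh, 20, 60), rising(rh, 20, 60)
    t_low, t_high = falling(temp, 10, 30), rising(temp, 10, 30)
    # (firing strength, constant rule output); fuzzy AND is taken as min
    rules = [
        (min(rh_low, t_high), 0.9),  # dry and hot  -> high danger
        (min(rh_low, t_low), 0.5),   # dry but cool -> moderate danger
        (rh_high, 0.1),              # humid        -> low danger
    ]
    total = sum(w for w, _ in rules)
    # Output is the firing-strength-weighted average of the rule outputs
    return sum(w * z for w, z in rules) / total if total > 0 else 0.0
```

At 40% relative humidity and 30 °C, the "low" and "high" humidity sets each have membership 0.5, so the output (0.5) blends the hot-and-dry and humid rules rather than switching abruptly between them.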

Literature search and scoping review
The combination of ML and wildfire science and management comprises a diverse range of topics in a relatively nascent field of multidisciplinary research. Thus, we employed a scoping review methodology (Arksey & O'Malley, 2005; Levac, Colquhoun, & O'Brien, 2010) for this paper. The goal of a scoping review is to characterize the existing literature in a particular field of study, particularly when a topic has yet to be extensively reviewed and the related concepts are complex and heterogeneous (Pham et al., 2014). Scoping reviews can also be particularly useful for summarizing and disseminating research findings, and for identifying research gaps in the published literature. A critical review of methodological advances and limitations, and comparison with other methods, is left for future work. We performed a literature search using the Google Scholar and Scopus databases and the key words "wildfire" or "wildland fire" or "forest fire" or "bushfire" in combination with "machine learning" or "random forest" or "decision trees" or "regression trees" or "support vector machine" or "maximum entropy" or "neural network" or "deep learning" or "reinforcement learning". We also used the Fire Research Institute online database (http://fireresearchinstitute.org) with the following search terms: "Artificial Intelligence"; "Machine Learning"; "Random Forests"; "Expert Systems"; and "Support Vector Machines". Finally, we obtained additional papers from the reference lists of papers identified through the database searches.
After performing our literature search, we identified a total of 298 publications relevant to the topic of ML applications in wildfire science and management (see supplementary material for a full bibliography). A search of the Scopus database also revealed a dramatic increase in the number of wildfire and ML articles published in recent years (see Fig. 3). After identifying publications for review, we applied the following criteria to exclude non-relevant or unsuitable publications: (i) conference submissions where a journal publication describing the same work was available; (ii) conference posters; (iii) articles in which the methodology and results were not adequately described to conduct an assessment.

Wildfire applications
In summary, we found a total of 298 journal papers or conference proceedings on the topic of ML applications in wildfire science and management. The problem domain with the most applications of ML methods was Fire Occurrence, Susceptibility and Risk (127 papers), followed by Fuels Characterization, Fire Detection and Mapping (65 papers), Fire Behaviour Prediction (43 papers), Fire Effects (34 papers), Fire Weather and Climate Change (20 papers), and Fire Management (16 papers). Within Fire Occurrence, Susceptibility and Risk, the subdomains with the most papers were Fire Susceptibility Mapping (71 papers) and Landscape Controls on Fire (101 papers). Refer to Table 3 for a summary of the methods used in each subdomain, and to the supplementary material for a full listing of all papers by subdomain, with the ML methods used and study areas considered.

Fuels characterization
Fires ignite in a few fuel particles; the subsequent heat transfer between particles through conduction, radiation and convection, and the resulting fire behavior (fuel consumption, spread rate, intensity), are influenced by the properties of the live and dead vegetative fuels, including moisture content, biomass, and vertical and horizontal distribution. Fuel properties are a required input in all fire behavior models, whether as a simple categorical vegetation type, as in the Canadian FBP System, or as physical quantities in three-dimensional space (e.g., the FIRETEC model). Research to predict fuel properties has been carried out at two different scales: 1) regression applications to predict quantities such as the crown biomass of single trees from more easily measured variables such as height and diameter, and 2) classification applications to map fuel type descriptors or fuel quantities over a landscape, either from visual interpretation of air photographs or by interpretation of the spectral properties of remote sensing imagery. However, relatively few studies have applied ML to wildfire fuel prediction, leaving the potential for substantially more research in this area.
In an early study, Riaño, Ustin, Usero, and Patricio (2005) used an ANN to predict and map the equivalent water thickness and dry matter content of wet and dry leaf samples from 49 species of broadleaf plants using reflectance and transmittance values in the Ispra region of Italy. Pierce, Farris, and Taylor (2012) used RF to classify important canopy fuel variables (e.g., canopy cover, canopy height, canopy base height, and canopy bulk density) related to wildland fire in Lassen Volcanic National Park, California, using field measurements, topographic data, and NDVI to produce forest canopy fuel maps. Likewise, Riley, Grenfell, Finney, and Crookston (2014) used RF with Landfire and biophysical variables to perform fuel classification and mapping in Eastern Oregon, achieving relatively high overall modelling accuracy: for example, 97% for forest height, 86% for forest cover, and 84% for existing vegetation group (i.e., fuel type). López-Serrano, López-Sánchez, Álvarez-González, and García-Gutiérrez (2016) compared the performance of three common ML methods (SVM, KNN, and RF) and multiple linear regression in estimating aboveground biomass in the Sierra Madre Occidental, Mexico. The authors reported the advantages and limitations of each method, concluding that the non-parametric ML methods had an advantage over multiple linear regression for biomass estimation. García, Riaño, Chuvieco, Salas, and Danson (2011) used SVM to classify LiDAR and multispectral data to map fuel types in Spain. Chirici et al. (2013) compared CART, RF, and Stochastic Gradient Boosting (SGB), an ensemble tree method that uses both boosting and bagging, for mapping forest fuel types in Italy, and found that SGB had the highest overall accuracy.

Fire detection
Detecting wildfires as soon as possible after they have ignited, and therefore while they are still relatively small, is critical to facilitating a quick and effective response. Traditionally, fires have mainly been detected by human observers, who distinguish smoke in the field of view either directly from a fire tower or from a video feed from a tower, aircraft, or the ground. All of these methods can be limited by spatial or temporal coverage, human error, the presence of smoke from other fires, and hours of daylight. Automated detection of heat signatures or smoke in infrared or optical images can extend the spatial and temporal coverage of detection, improve detection efficiency in smoky conditions, and reduce bias associated with human observation. The analytical task is a classification problem that is quite well suited to ML methods.
Table 3: Summary of the application of ML methods to different problem domains in wildfire science and management. A table of acronyms for the ML methods is given in Table 1. Note that in some cases a paper may use more than one ML method and/or appear in multiple problem domains.
CNNs (i.e., deep learning), which are able to extract features and patterns from spatial images and are finding widespread use in object detection tasks, have recently been applied to the problem of fire detection. Several of these applications trained the models on terrestrial-based images of fire and/or smoke (…). […] similarly used a 3D CNN to incorporate both spatial and temporal information, and so were able to treat smoke detection as a segmentation problem for video images. In another approach, Y. Cao, Yang, Tang, and Lu (2019) used convolutional layers as part of a Long Short-Term Memory (LSTM) neural network for smoke detection from a sequence of images (i.e., a video feed). They found the LSTM method achieved 97.8% accuracy, a 4.4% improvement over a single image-based deep learning method.
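The feature extraction performed by a convolutional layer can be illustrated with the sketch below, which applies a single 2-D kernel to a tiny image followed by a ReLU activation. Like most deep learning libraries, it computes cross-correlation (no kernel flip) over "valid" positions only. The image and the hand-set edge-detecting kernel are hypothetical; in a trained CNN the kernel weights are learned, and many kernels are stacked across layers.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN convolutional layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU activation applied to a feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 4 x 4 image whose right half is bright (e.g., hot pixels)
image = [[0, 0, 1, 1] for _ in range(4)]
vertical_edge = [[-1, 1], [-1, 1]]  # hand-set kernel; a CNN would learn these weights
features = relu(conv2d(image, vertical_edge))
```

On this image the kernel responds only along the boundary column, producing `[[0, 2, 0], [0, 2, 0], [0, 2, 0]]`.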
Perhaps of greater utility for fire management were fire/smoke detection models trained on either unmanned aerial vehicle (UAV) images (Alexandrov, Pertseva, Berman, Pantiukhin, & Kapitonov, 2019; Y. Zhao, Ma, Li, & Zhang, 2018) or satellite imagery, including GOES-16 (Phan & Nguyen, 2019) and MODIS (Ba, Chen, Yuan, Song, & Lo, 2019). Y. Zhao et al. (2018) compared SVM, ANN and three CNN models and found that their 15-layer CNN performed best, with an accuracy of 98%. By comparison, the SVM-based method, which was unable to extract spatial features, had an accuracy of only 43%. In contrast to the findings of Barmpoutis et al. (2019), Alexandrov et al. (2019) found that YOLO (You Only Look Once) was both faster and more accurate than a region-based CNN method.

Fire perimeter and severity mapping
Fire maps have two management applications: 1) accurate maps of the location of the active fire perimeter are important for daily planning of suppression activities and/or evacuations, including modeling fire growth; and 2) maps of the final burn perimeter and fire severity are important for assessing and predicting the economic and ecological impacts of wildland fire and for recovery planning. Historically, fire perimeters were sketch-mapped from the air, from a ground or aerial GPS or other traverse, or by air-photo interpretation.
Developing methods for mapping fire perimeters and burn severity from remote sensing imagery has been an area of active research since the advent of remote sensing in the 1970s. This work is mainly concerned with classifying active fire areas from inactive or unburned areas, distinguishing burned from unburned areas (for extinguished fires), or estimating fire severity measures such as the Normalized Burn Ratio (Lutes et al., 2006).
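The Normalized Burn Ratio is a normalized difference of near-infrared (NIR) and shortwave-infrared (SWIR) reflectance, and the differenced NBR (pre-fire minus post-fire, dNBR) is commonly used as a burn severity measure. The sketch below computes both for single pixels; the reflectance values in the example are hypothetical.

```python
def nbr(nir, swir):
    """Normalized Burn Ratio from near-infrared and shortwave-infrared reflectance."""
    return (nir - swir) / (nir + swir)


def dnbr(pre_nir, pre_swir, post_nir, post_swir):
    """Differenced NBR: pre-fire NBR minus post-fire NBR (larger = greater change)."""
    return nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)


# Hypothetical single-pixel reflectances before and after a fire
change = dnbr(pre_nir=0.4, pre_swir=0.1, post_nir=0.2, post_swir=0.2)
```

Here the pixel's NBR drops from 0.6 before the fire to 0.0 afterwards, giving a dNBR of about 0.6.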
In early studies using ML methods for fire mapping, Al-Rawi et al. (2001) and Al-Rawi, Casanova, Romo, and Louakfaoui (2002) used ANNs (specifically, the supervised ART-II neural network) for burned scar mapping and fire detection. Pu and Gong (2004) compared Logistic Regression (LR) with ANN for burned scar mapping using Landsat images; both methods achieved high accuracy (> 97%). Interestingly, however, the authors found that LR was more efficient for their relatively limited data set. Zammit, Descombes, and Zerubia (2006) performed burned area mapping for two large fires in France using satellite images and three ML algorithms (SVM, K-nearest neighbour, and K-means); overall, SVM had the best performance. Likewise, Dragozi, Gitas, and Stavrakoudis (2011) compared SVM against a nearest neighbour method for burned area mapping in Greece and found better performance with SVM. In fact, a number of studies (Alonso-Benito, Hernandez-Leal, Gonzalez-Calvo, Arbelo, & Barreto, 2008; X. Cao, Chen, Matsushita, Imura, & Wang, 2009; Pereira et al., 2017; Petropoulos, Knorr, Scholze, Boschetti, & Karantounias, 2010; Petropoulos, Kontoes, & Keramitsoglou, 2011; F. Zhao, Huang, & Zhu, 2015) have successfully used SVM for burned scar mapping using satellite data. Mitrakis, Mallinis, Koutsias, and Theocharis (2012) performed burned area mapping in the Mediterranean region using a variety of ML algorithms, including a fuzzy neuron classifier (FNC), ANN, SVM, and AdaBoost, and found that, while all methods displayed similar accuracy, the FNC performed slightly better. Dragozi et al. (2014) applied SVM and a feature selection method (based on fuzzy logic) to IKONOS imagery for burned area mapping in Greece. Another approach to burned area mapping in the Mediterranean used an ANN and MODIS hotspot data (Gómez & Pilar Martín, 2011). Pereira et al. (2017) used a one-class SVM, which requires only positive training data (i.e., burned pixels), for burned scar mapping; this may offer a more sample-efficient approach than general SVMs and may be useful in cases where good wildfire training datasets are difficult to obtain. In Mithal et al. (2018), the authors developed a three-stage framework for burned area mapping using MODIS data and ANNs. Crowley, Cardille, White, and Wulder (2019) used Bayesian Updating of Landcover (BULC) to merge burned-area classifications from three remote sensing sources (Landsat-8, Sentinel-2 and MODIS). Celik (2010) used GA for change detection in satellite images, while Sunar and Özkan (2001) used the interactive Iterative Self-Organizing DATA algorithm (ISODATA) and ANN to map burned areas.
In addition to burned area mapping, ML methods have been used for burn severity mapping, including GA (Brumby et al., 2001), MaxEnt (Quintano, Fernández-Manso, Calvo, & Roberts, 2019), bagged decision trees (Sá et al., 2003), and others. For instance, Hultquist, Chen, and Zhao (2014) used three popular ML approaches (Gaussian Process Regression (GPR) (Rasmussen & Williams, 2006), RF, and SVM) for burn severity assessment in the Big Sur ecoregion, California. RF gave the best overall performance and had lower sensitivity to different combinations of variables; all ML methods, however, performed better than conventional multiple regression techniques. Collins, Griffioen, Newell, and Mellor (2018) investigated the applicability of RF for fire severity mapping, and discussed the advantages and limitations of RF for different fire and land conditions. Langford, Kumar, and Hoffman (2019) used a 5-layer deep neural network (DNN) for mapping fires in Interior Alaska with a number of MODIS-derived variables (e.g., NDVI and surface reflectance). They found that a validation-loss (VL) weight selection strategy for the unbalanced data set (i.e., the no-fire class appeared much more frequently than the fire class) allowed them to achieve better accuracy than an XGBoost method. However, without the VL approach, XGBoost outperformed the DNN, highlighting the need for methods to deal with unbalanced datasets in fire mapping.
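One common way to counter class imbalance of this kind is to weight each class inversely to its frequency when computing the training loss. The sketch below shows inverse-frequency weighting with a weighted binary cross-entropy; it is a generic technique, not the validation-loss weight selection strategy of Langford et al. (2019), and the 95:5 class split in the example is hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_log_loss(y_true, p_fire, weights):
    """Binary cross-entropy with each sample scaled by the weight of its true class."""
    total = 0.0
    for y, p in zip(y_true, p_fire):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += weights[y] * loss
    return total / len(y_true)


# Hypothetical imbalanced labels: 95 no-fire (0) and 5 fire (1) samples
y = [0] * 95 + [1] * 5
weights = balanced_class_weights(y)   # fire class gets weight 10.0, no-fire about 0.53
always_no_fire = [0.01] * len(y)      # a model that almost never predicts fire
unweighted = weighted_log_loss(y, always_no_fire, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(y, always_no_fire, weights)
```

Under the weighted loss, a model that predicts "no fire" almost everywhere is penalized far more heavily than under the unweighted loss, pushing training towards detecting the rare fire class.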

Fire weather prediction
Fire weather is a critical factor in determining whether a fire will start, how fast it will spread, and where it will spread. Fire weather observations are commonly obtained from surface weather station networks operated by meteorological services or fire management agencies. Weather observations may be interpolated from these point locations to a grid over the domain of interest, which may include diverse topographical conditions; this interpolation task is a regression problem. Weather observations may subsequently be used to calculate meteorologically based fire danger indexes, such as the Canadian Fire Weather Index (FWI) System (Van Wagner, 1987). Future fire weather conditions and danger indexes are commonly forecast using the output from numerical weather prediction (NWP) models (e.g., the European Forest Fire Information System (San-Miguel-Ayanz et al., 2012)). However, errors in the calculation of fire danger indexes that have a memory (such as the moisture indexes of the FWI System) can accumulate in such projections. It is also noteworthy that surface fire danger measures may be correlated with large-scale weather and climatic patterns.
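As a point of reference for the interpolation task, the sketch below implements inverse distance weighting (IDW), a simple deterministic interpolator. IDW is not an ML method, but an ML regressor (e.g., RF with elevation as an additional covariate) could take its place in the same station-to-grid workflow. The station coordinates and values are hypothetical, and the power parameter of 2 is an assumed, commonly used default.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) stations."""
    num = den = 0.0
    for x, y, value in stations:
        d = math.dist((x, y), target)
        if d == 0.0:
            return value  # target coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def interpolate_grid(stations, xs, ys, power=2.0):
    """Evaluate IDW at every (x, y) grid node; rows follow ys, columns follow xs."""
    return [[idw(stations, (x, y), power) for x in xs] for y in ys]


# Hypothetical stations: (easting km, northing km, noon temperature degC)
stations = [(0.0, 0.0, 20.0), (10.0, 0.0, 30.0)]
grid = interpolate_grid(stations, xs=[0.0, 5.0, 10.0], ys=[0.0])  # about [[20, 25, 30]]
```

A grid location halfway between two stations reporting 20 °C and 30 °C receives 25 °C, and locations closer to one station are pulled towards its value.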
To date there have been relatively few papers that address fire weather and danger prediction using machine learning. The first effort (Crimmins, 2006) used self-organizing maps (SOMs) to explore the synoptic climatology of extreme fire weather in the southwest USA, identifying three key patterns, representing southwesterly flow and large geopotential height gradients, that were associated with over 80% of the extreme fire weather days as determined by a fire weather index. Nauslar, Hatchett, Brown, Kaplan, and Mejia (2019) used SOMs to determine the timing of the North American Monsoon, which plays a major role in the length of the active fire season in the southwest USA. Lagerquist, Flannigan, Wang, and Marshall (2017) also used SOMs to predict extreme fire weather in northern Alberta, Canada. Extreme fire weather was defined using extreme values of the Fine Fuel Moisture Code (FFMC), Initial Spread Index (ISI) and the Fire Weather Index (FWI), all components of the Canadian Fire Weather Index (FWI) System (Van Wagner, 1987). Good performance was achieved for the FFMC and the ISI, and this approach has the potential to be used in near real time, providing input into fire management decision systems. Other efforts have combined conventional and machine learning approaches to interpolate meteorological fire danger in Australia (Sanabria, Qin, Li, Cechet, & Lucas, 2013).
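The sketch below trains a minimal self-organizing map in plain Python. In a synoptic application, each sample would be, for example, a flattened daily field of an atmospheric variable such as geopotential height, and each node's prototype vector would represent one recurring pattern. The one-dimensional node layout, the linear decay schedules, the example data and all parameter values are simplifying assumptions; SOMs used in synoptic climatology more often arrange nodes on a two-dimensional grid.

```python
import math
import random


def train_som(data, n_nodes=3, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D self-organizing map; each node holds a prototype vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise node prototypes from randomly chosen samples.
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 1e-3)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):        # one shuffled pass over the data
            bmu = map_to_node(nodes, x)              # best-matching unit
            for i in range(n_nodes):
                # Gaussian neighbourhood measured along the 1-D node grid
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    nodes[i][d] += lr * h * (x[d] - nodes[i][d])
    return nodes


def map_to_node(nodes, x):
    """Index of the node whose prototype is closest (Euclidean) to sample x."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))


# Hypothetical 2-D samples standing in for flattened daily weather fields
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]
nodes = train_som(data)
patterns = [map_to_node(nodes, x) for x in data]
```

After training, each day can be assigned to its best-matching node with `map_to_node`, and the frequency of extreme fire weather days can then be tabulated per pattern.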

Lightning prediction
Lightning is a common cause of wildfires, so predicting the location and timing of future storms and strikes is important for predicting fire occurrence. Electronic lightning detection systems have been deployed in many parts of the world for several decades and have accrued rich datasets of strike locations and times. Lightning prediction models have used these data to derive regression relationships with atmospheric conditions and stability indices that can be forecast with NWP. Ensemble forecasts of lightning using RF are a viable modelling approach for Alberta, Canada (Blouin, Flannigan, Wang, & Kochtubajda, 2016). Bates et al. (2017) used two machine learning methods (CART and RF) and three statistical methods to classify wet and dry thunderstorms in Australia (lightning associated with dry thunderstorms is more likely to start fires).

Climate Change
Transfer modeling, whereby a model produced for one study region and/or distribution of environmental conditions is applied to other cases (Phillips et al., 2006), is a common approach in climate change science. Model transferability should be considered when using ML methods to estimate quantities projected under climate change or other environmental changes. With regard to climate change, transfer modeling is essentially an extrapolation task. Previous studies in the context of species distribution modeling have indicated that ML approaches may be suitable for transfer modeling under future climate scenarios. For example, Heikkinen, Marmion, and Luoto (2012) indicated that MaxEnt and generalized boosting methods (GBM) have better transferability than either ANN or RF, and that the relatively poor transferability of RF may be due to overfitting.
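A first, crude check on whether a transfer application is an extrapolation is to flag projected conditions that fall outside the range of each predictor in the training data. The sketch below performs only this per-variable range check; it ignores novel combinations of variables that individually lie within range, and the example values are hypothetical.

```python
def outside_training_range(train_rows, new_rows):
    """Flag each new row that has any variable outside the training min-max range."""
    columns = list(zip(*train_rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        any(v < lo or v > hi for v, lo, hi in zip(row, lows, highs))
        for row in new_rows
    ]


# Hypothetical (mean summer temperature degC, summer precipitation mm) values
train = [(14.0, 250.0), (16.5, 180.0), (18.0, 120.0)]
future = [(17.0, 150.0), (20.5, 110.0)]
print(outside_training_range(train, future))  # [False, True]
```

The second projected row lies beyond the training temperature range, so any model prediction there is an extrapolation and its reliability depends on the model's transferability.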
There are several publications on wildfires and climate change that use ML approaches. Amatulli, Camia, and San-Miguel-Ayanz (2013) found that Multivariate Adaptive Regression Splines (MARS) were better predictors of future monthly area burned for five European countries than Multiple Linear Regression and RF. Parks et al. (2016) projected fire severity for future time periods in the western USA using BRT. Young, Higuera, Duffy, and Hu (2017) similarly used BRT to project future fire intervals in Alaska and found up to a fourfold increase in 30-year fire occurrence probability by 2100. Several authors used MaxEnt to project future fire probability globally, for Mediterranean ecosystems (Batllori, Parisien, Krawchuk, & Moritz, 2013), in Southwest China (S. Li et al., 2017), in the Pacific Northwest USA (R. Davis, Yang, Yost, Belongie, & Cohen, 2017), and for the south central USA (Stroh, Struckhoff, Stambaugh, & Guyette, 2018). Stralberg et al. (2018) employed an alternative approach for projecting future potential burn probability, using RF to determine future vegetation distributions as inputs to ensemble Burn-P3 simulations. Boulanger, Parisien, and Wang (2018) built a consensus model with two different predictor datasets and five different regression methods (generalised linear models, RF, BRT, CART and MARS) to make projections of future area burned in Canada. The consensus model can be used to quantify uncertainty in future area burned estimates. The authors noted that model uncertainty for future periods (> 200%) can be higher than the uncertainty arising from different climate models under different carbon forcing scenarios. This highlights the need for further work in the application of ML methods for projecting future fire danger under climate change.
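The consensus idea can be sketched as pooling projections from every combination of predictor dataset and regression method, then reporting their mean and spread. The sketch below uses an unweighted mean and the range relative to the mean as an uncertainty measure; these are assumptions for illustration and not necessarily the statistics used by Boulanger, Parisien, and Wang (2018), and the dataset names and projected values are hypothetical.

```python
from statistics import mean, pstdev


def consensus(projections):
    """Summarize projections keyed by (predictor dataset, regression method)."""
    values = list(projections.values())
    centre = mean(values)
    return {
        "mean": centre,
        "sd": pstdev(values),
        "min": min(values),
        "max": max(values),
        # spread of the ensemble relative to its mean, in percent
        "relative_range_pct": 100.0 * (max(values) - min(values)) / centre,
    }


# Hypothetical projected ratios of future to current area burned
projections = {
    ("predictors_A", "GLM"): 1.0,
    ("predictors_A", "RF"): 2.0,
    ("predictors_B", "BRT"): 3.0,
}
summary = consensus(projections)  # mean 2.0, relative_range_pct 100.0
```

A relative range above 100% would indicate that the choice of predictors and method alone changes the projection by more than the consensus value itself.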

Fire Occurrence, Susceptibility and Risk
Papers in this domain include prediction of fire occurrence and area burned (at landscape or seasonal scales), mapping of fire susceptibility (or similar definitions of risk), and analysis of landscape or environmental controls on fire.

Fire occurrence prediction
Predictions of the number and location of fire starts in the upcoming day(s) are important to preparedness planning, that is, the acquisition of resources, including the relocation of mobile resources and readiness for expected fire activity. The origins of fire occurrence prediction (FOP) models go back almost 100 years (Nadeem, Taylor, Woolford, & Dean, 2020). FOP models typically use regression methods to relate the response variable (fire reports or hotspots) to weather, lightning, and other covariates for a geographic unit, or as a spatial probability. The seminal work of Brillinger and others in developing the spatiotemporal FOP framework is reviewed in S. W. Taylor et al. (2013). The most commonly used ML method in studies predicting fire occurrence was the ANN. As early as 1996, Vega-Garcia, Lee, Woodard, and Titus (1996) used an ANN for human-caused wildfire prediction in Alberta, Canada, correctly predicting 85% of no-fire observations and 78% of fire observations. Not long after, Alonso-Betanzos et al. (2002) and Alonso-Betanzos et al. (2003) used an ANN to predict a daily fire occurrence risk index from temperature, humidity, rainfall, and fire history, as part of a larger real-time wildfire management system in the Galicia region of Spain. Vasilakos, Kalabokidis, Hatzopoulos, Kallos, and Matsinos (2007) used separate ANNs for three different indices representing fire weather (Fire Weather Index; FWI), hazard (Fire Hazard Index; FHI), and risk (Fire Risk Index) to create a composite fire ignition index (FII) for estimating the probability of wildfire occurrence on the Greek island of Lesvos. Sakr, Elhajj, Mitri, and Wejinya (2010) used meteorological variables in an SVM to create a daily fire risk index corresponding to the number of fires that could potentially occur on a particular day.
Sakr, Elhajj, and Mitri (2011) then compared SVM and ANN for fire occurrence prediction based only on relative humidity and cumulative precipitation up to the specific day. Both models produced low errors in the predicted number of fires, with the ANN models outperforming the SVM, although the SVM performed better on binary fire/no-fire classification. It is important to note, however, that ANNs encompass a wide range of possible network architectures. In an Australian study, Dutta, Aryal, Das, and Kirkpatrick (2013) compared ten different types of ANN models for estimating monthly fire occurrence from climate data, and found that an Elman RNN performed best.
After 2012, RF became the most popular method for predicting fire occurrence among the papers reviewed here. Stojanova, Kobler, Ogrinc, Ženko, and Džeroski (2012) evaluated several machine learning methods for predicting fire outbreaks using geographical, remotely sensed, and meteorological data in Slovenia, including single classifier methods (i.e., KNN, Naive Bayes, DT (using the J48 and jRIP algorithms), LR, SVM, and BN) and ensemble methods (AdaBoost, DT with bagging, and RF). The ensemble methods DT with bagging and RF displayed the best predictive performance, with bagging having higher precision and RF having better recall. Vecín-Arias, Castedo-Dorado, Ordóñez, and Rodríguez-Pérez (2016) found that RF performed slightly better than LR for predicting lightning fire occurrence in the Iberian Peninsula, based on topography, vegetation, meteorology, and lightning characteristics. Similarly, Y. Cao, Wang, and Liu (2017) found that a cost-sensitive RF analysis outperformed GLM and ANN models for predicting wildfire ignition susceptibility. In recent non-comparative studies, B. Yu, Chen, Li, Wang, and Wu (2017) used RF to predict fire risk ratings in Cambodia using publicly available remotely sensed products, while Van Beusekom et al. (2018) used RF to predict fire occurrence in Puerto Rico and found that precipitation was the most important predictor. The maximum entropy (MaxEnt) method has also been used for fire occurrence prediction (Chen, Du, Niu, & Zhao, 2015; De Angelis, Ricotta, Conedera, & Pezzatti, 2015). For example, De Angelis et al. (2015) used MaxEnt to evaluate different meteorological variables and fire indices (e.g., the Canadian Fire Weather Index, FWI) for daily fire risk forecasting in the mountainous Canton Ticino region of Switzerland. They found that combinations of such variables increased predictive power for identifying daily meteorological conditions conducive to wildfires.
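Precision and recall capture the trade-off reported by Stojanova et al. (2012): precision is the fraction of predicted fires that were real, and recall is the fraction of real fires that were predicted. The sketch below computes both from binary labels; the two example predictors are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall of binary predictions for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged fires
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed fires
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical observed fire days (1) and two predictors
observed = [1, 1, 1, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0]
liberal = [1, 1, 1, 1, 1, 0, 0, 0]
```

The cautious predictor flags one of three fires with no false alarms (precision 1.0, recall 0.33), while the liberal one flags all three fires plus two false alarms (precision 0.6, recall 1.0).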
Dutta, Das, and Aryal (2016) used a two-stage machine learning approach (an ensemble of unsupervised deep belief neural networks combined with conventional supervised ensemble machine learning) to predict bushfire hotspot incidence on a weekly time-scale. In the first, unsupervised deep learning stage, Dutta et al. (2016) used Deep Belief Networks (DBNet; an ensemble deep learning method) to generate simple features from environmental and climatic surfaces. In the second, supervised ensemble classification stage, features extracted in the first stage were fed as training inputs to ten ML classifiers (i.e., conventional supervised Binary Tree, Linear Discriminant Analyser, Naive Bayes, KNN, Bagging Tree, AdaBoost, Gentle Boosting Tree, Random Under-Sampling Boosting Tree, Subspace Discriminant, and Subspace KNN) to establish the best classifier for bushfire hotspot estimation. The authors found that bagging and the conventional KNN classifier were the two best classifiers, with 94.5% and 91.8% accuracy, respectively.

Landscape scale burned area prediction
ML methods have been applied to burned area prediction only relatively recently compared to other wildfire domains, yet such studies have incorporated a variety of ML methods. For example, Cheng and Wang (2008) used an RNN to forecast annual average area burned in Canada, while Archibald, Roy, van Wilgen, and Scholes (2009) used RF to evaluate the relative importance of human and climatic drivers of burned area in Southern Africa. Arnold, Brewer, and Dennison (2014) used Hard Competitive Learning (HCL) to identify clusters of unique pre-fire antecedent climate conditions in the interior western US, which they then used to construct fire danger models based on MaxEnt.
Mayr, Vanselow, and Samimi (2018) evaluated five common statistical and ML methods for predicting burned area and fire occurrence in Namibia: GLM, Multivariate Adaptive Regression Splines (MARS), Regression Trees from Recursive Partitioning (RPART), RF, and SVMs for Regression (SVR). The RF model performed best for predicting burned area and fire occurrence; however, adjusted R² values were slightly higher for RPART and SVR in both cases. Likewise, de Bem, de Carvalho Júnior, Matricardi, Guimarães, and Gomes (2018) compared LR and ANN for modelling burned area in Brazil. Both showed similar overall performance; however, the ANN had higher accuracy when identifying non-burned areas but lower accuracy when classifying burned areas.

Fire Susceptibility Mapping
A considerable number of references (71) used various ML algorithms to map wildfire susceptibility, corresponding to either the spatial probability or density of fire occurrence (or other measures of fire risk such as burn severity); other terms, such as fire vulnerability and risk, have also been used. The general approach was to build a spatial fire susceptibility model using either remotely sensed or agency-reported fire data, with some combination of landscape, climate, structural and anthropogenic variables as explanatory variables. In general, the various modeling approaches used either a presence-only framework (e.g., MaxEnt) or a presence/absence framework (e.g., BRT or RF).
Early attempts at fire susceptibility mapping used CART (Amatulli & Camia, 2007; Amatulli, Rodrigues, Trombetti, & Lovreglio, 2006; Lozano, Suárez-Seoane, Kelly, & Luis, 2008). Amatulli and Camia (2007) compared fire density maps in central Italy produced using CART and multivariate adaptive regression splines (MARS), and found that while CART was more accurate, MARS produced a smoother density model. More recent work has used ensemble-based classifiers, such as RF and BRT, or ANNs (see the supplementary material for a full list). Several of these papers also compared ML and non-ML methods for fire susceptibility mapping and in general found superior performance from the ML methods. Specifically, Adab (2017) mapped fire hazard in northeast Iran, and found ANN performed better than binary logistic regression (BLR), with an AUC of 87% compared with 81% for BLR. Bisquert, Caselles, Sánchez, and Caselles (2012) found ANN outperformed logistic regression for mapping fire risk in northwest Spain. Goldarag, Mohammadzadeh, and Ardakani (2016) also compared ANN and linear regression for fire susceptibility mapping in northern Iran and found ANN had much better accuracy (93.49%) than linear regression (65.76%). Guo, Zhang, et al. (2016) and […] compared RF and logistic regression for fire susceptibility mapping in China and found RF led to better performance. Oliveira, Oehler, San-Miguel-Ayanz, Camia, and Pereira (2012) compared RF and linear regression for fire density mapping in Mediterranean Europe and found that RF performed better. Perestrello De Vasconcelos et al. (2001) found ANN had better classification accuracy than logistic regression for ignition probability maps in parts of Portugal.
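The AUC values reported above (e.g., 87% versus 81%) are areas under the ROC curve, usually quoted as fractions (0.87 and 0.81). The AUC equals the probability that a randomly chosen fire location receives a higher susceptibility score than a randomly chosen non-fire location, which the sketch below computes directly by comparing all pairs. The labels and scores are hypothetical, and the O(n²) pair loop is only suitable for small examples.

```python
def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical fire (1) / non-fire (0) locations and model susceptibility scores
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))  # 0.75
```

Three of the four positive-negative pairs are correctly ordered, giving an AUC of 0.75; a model that ranks every fire location above every non-fire location would score 1.0, and random scoring averages 0.5.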
As Table 3 shows, a frequently used ML method for fire susceptibility mapping was Maximum Entropy (MaxEnt), which is extensively used in landscape ecology for species distribution modeling (Elith et al., 2011). In particular, Vilar et al. (2016) found MaxEnt performed better than GLM for fire susceptibility mapping in central Spain with respect to sensitivity (i.e., true positive rate) and commission error (i.e., false positive rate), even though its AUC was lower. Of further note, Duane, Piqué, Castellnou, and Brotons (2015) partitioned their fire data into topography-driven, wind-driven and convection-driven fires in Catalonia and mapped fire susceptibility for each fire type.
There were two applications of ML for mapping global fire susceptibility: Moritz et al. (2012), who used MaxEnt, and R. Luo et al. (2013), who used RF. Both papers found that, at a global scale, precipitation was one of the most important predictors of fire risk.
The majority of papers considered thus far used the entire study period (typically four or more years) to map fire susceptibility, thereby neglecting the temporal aspect of fire risk. However, a few authors have considered various temporal factors to map fire susceptibility. Martín, Zúñiga-Antón, and Rodrigues Mimbrero (2019) included seasonality and holidays as explanatory variables for fire probability in northeastern Spain. Vacchiano, Foderi, Berretti, Marchi, and Motta (2018) predicted fire susceptibility separately for the winter and summer seasons. Several papers produced maps of fire susceptibility in the eastern US by month of year (Peters & Iverson, 2017; Peters, Iverson, Matthews, & Prasad, 2013). Parisien et al. (2014) examined differences between annual fire susceptibility maps and a 31-year climatology for the USA, highlighting the role of climate variability as a driver of fire occurrence. In particular, they found that FWI90 (the 90th percentile of the Canadian Fire Weather Index) was the dominant factor for annual fire risk but not for climatological fire risk. Y. Cao et al. (2017) considered a 10-day resolution (corresponding to the available fire data) for fire risk mapping, which makes their approach similar to fire occurrence prediction.
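To illustrate why an annual FWI90 can behave differently from a climatological one, the sketch below computes the 90th percentile of daily FWI values per year and for all years pooled. It assumes daily FWI values have already been calculated; the values are hypothetical, and the linear-interpolation percentile convention is an assumption (Parisien et al. (2014) may have used a different one).

```python
import math


def percentile(values, q):
    """Percentile q (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical daily FWI values for two fire seasons (11 days each, for brevity)
daily_fwi = {
    "year_1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "year_2": [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55],
}

# Annual FWI90: one value per year, so it varies from year to year
annual_fwi90 = {year: percentile(v, 90) for year, v in daily_fwi.items()}
# Climatological FWI90: a single value computed from all days pooled across years
pooled = [x for v in daily_fwi.values() for x in v]
climatological_fwi90 = percentile(pooled, 90)

print(annual_fwi90)          # {'year_1': 10.0, 'year_2': 50.0}
print(climatological_fwi90)  # about 44.5
```

The annual values capture the contrast between a mild and a severe year, whereas the pooled value is a single number for the whole period and so cannot explain year-to-year differences in fire activity.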
In addition to fire susceptibility mapping, a few papers focused on other aspects of fire risk, including mapping the probability of burn severity classes (Holden, Morgan, & Evans, 2009; Parks et al., 2018; Tracy et al., 2018). Parks et al. (2018) additionally considered the role of fuel treatments on fire probability, which has obvious implications for fire management. Ghorbanzadeh, Blaschke, Gholamnia, and Aryal (2019) combined fire susceptibility maps with vulnerability and infrastructure indicators to produce a fire hazard map.
A number of papers directly compared three or more ML (and sometimes non-ML) methods for fire susceptibility mapping. Here we highlight some of these papers, which elucidate the performance and advantages/disadvantages of various ML methods. Y. Cao et al. (2017) compared a convolutional neural network (CNN) with RF, SVM, MLP, and KLR for fire susceptibility mapping; this was the only application of deep learning we could find for fire susceptibility mapping. The authors found that the CNN outperformed the other algorithms, with an overall accuracy of 87.92% compared with RF (84.36%), KLR (81.23%), SVM (80.04%), and MLP (78.47%). They noted that the benefit of a CNN is that it incorporates spatial correlations, so that it can learn spatial features. However, the downside is that deep learning models are not as easily interpreted as other ML methods (such as RF and BRT).
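The spatial advantage noted above comes from convolution: each output value depends on a neighbourhood of input cells rather than on a single pixel. The sketch below shows one convolution (as implemented in CNN layers, i.e., without flipping the kernel) with a fixed summing kernel on a hypothetical raster; it is not the architecture of Y. Cao et al. (2017), and in a trained CNN the kernel weights would be learned rather than fixed.

```python
def conv2d_valid(grid, kernel):
    """Slide a kernel over a 2D raster (no padding), as a CNN layer does,
    returning the weighted sum of each neighbourhood."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(grid) - kh + 1, len(grid[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += grid[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4x4 raster of past fire occurrence (1 = burned cell)
burned = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Fixed 3x3 summing kernel; in a trained CNN these weights are learned
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(conv2d_valid(burned, kernel))  # [[4.0, 2.0], [2.0, 1.0]]
```

Each output cell encodes how much burning occurred in its 3 x 3 neighbourhood, information a per-pixel model such as RF only sees if neighbourhood predictors are engineered by hand.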

Landscape controls on fire
Many of the ML methods used in fire susceptibility mapping have also been used to examine landscape controls on fire activity (i.e., the relative importance of weather, vegetation, topography, structural, and anthropogenic variables), which may facilitate hypothesis formation and testing or model building. From Table 3, the most commonly used methods in this section were MaxEnt, RF, BRT, and ANN. These methods all allow for the determination of variable importance (i.e., the relative influence of predictor variables in a given model of a response variable). A commonly used method to ascertain the effect of a variable is the partial dependence plot (Hastie et al., 2009). This method fixes the predictor variable of interest at a series of values and, for each value, averages the model's predictions over the observed values of all other predictors; the resulting curve represents the marginal effect of the variable on the response. Partial dependence plots have the advantage of being applicable to a wide range of ML methods. A related method for determining variable importance, often used for RFs, is a permutation test, which randomly permutes each predictor variable in turn and measures the resulting reduction in AUC (or another performance metric) (Strobl, Boulesteix, Zeileis, & Hothorn, 2007). Another, model-dependent, approach used for ANNs relies on partial derivatives (of the activation functions of hidden and output nodes), as outlined by Vasilakos, Kalabokidis, Hatzopoulos, and Matsinos (2009). It should be noted that while many other methods for model interpretation and variable dependence exist, a discussion of these methods is outside the scope of this paper.
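Permutation importance and partial dependence can both be sketched in a few lines because they only require the ability to call a fitted model. In the sketch below, `fitted_model` is a hypothetical additive function standing in for a trained RF or BRT, the data are simulated, and mean squared error is used as the performance metric (AUC or another metric could be substituted for classifiers).

```python
import random


def fitted_model(temp, dist_road):
    """Hypothetical stand-in for a fitted RF/BRT: fire likelihood rises with
    scaled temperature and falls with scaled distance to roads (both 0-1)."""
    return 0.7 * temp + 0.3 * (1.0 - dist_road)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(X, y, col, n_repeats=20, seed=0):
    """Mean increase in error after randomly permuting predictor column `col`."""
    rng = random.Random(seed)
    baseline = mse(y, [fitted_model(*row) for row in X])
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [list(row) for row in X]
        for row, value in zip(X_perm, shuffled):
            row[col] = value
        increases.append(mse(y, [fitted_model(*row) for row in X_perm]) - baseline)
    return sum(increases) / n_repeats


def partial_dependence(X, col, grid_values):
    """Average prediction with predictor `col` fixed at each grid value,
    while the other predictors keep their observed values."""
    curve = []
    for value in grid_values:
        preds = []
        for row in X:
            modified = list(row)
            modified[col] = value
            preds.append(fitted_model(*modified))
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical data: 200 cells with (temperature, distance to road) and a noisy response
rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [fitted_model(t, d) + rng.gauss(0.0, 0.05) for t, d in X]

imp_temp = permutation_importance(X, y, col=0)
imp_road = permutation_importance(X, y, col=1)
pd_temp = partial_dependence(X, col=0, grid_values=[0.0, 0.5, 1.0])
print(f"importance: temperature={imp_temp:.4f}, distance to road={imp_road:.4f}")
print("partial dependence of temperature:", [round(v, 3) for v in pd_temp])
```

Temperature, which carries the larger coefficient, receives the larger importance, and because the stand-in model is additive its partial dependence curve rises by exactly 0.7 across the temperature range. With interacting predictors, as in real RF or BRT models, the curve is an average that can mask such interactions.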
In general, the drivers of fire occurrence or area burned varied greatly by the study area considered (including the size of the area) and the methods used. Consistent with other work on "top-down" and "bottom-up" drivers of fire activity, at large scales climate variables were often determined to be the main drivers of fire activity, whereas at smaller scales anthropogenic or structural factors exerted a larger influence. Here we discuss some of the papers that highlight the diversity of results for different study areas and spatial scales (global, country, ecoregion, urban), but refer the reader to the supplementary material for a full listing of papers in this section. Note that many of the papers listed under this subsection in the supplementary material also belong to the fire susceptibility mapping section and have already been discussed there.
Aldersley, Murray, and Cornell (2011) considered drivers of monthly area burned at global and regional scales using both regression trees and RF. They found that climate factors (high temperature, moderate precipitation, and dry spells) were the most important drivers at the global scale, although at the regional scale the models exhibited higher variability due to the influence of anthropogenic factors. At a continental scale, Mansuy et al. (2019) used MaxEnt to show that climate variables were the dominant controls (over landscape and human factors) on area burned for most ecoregions, both within and outside protected areas, although anthropogenic factors exerted a stronger influence in some regions such as the Tropical Wet Forests ecoregion. Masrur, Petrov, and DeGroote (2018) used RF to investigate controls on circumpolar Arctic fire and found that June surface temperature anomalies were the most important variable for determining the likelihood of wildfire occurrence on an annual scale. Chingono and Mbohwa (2015) used MaxEnt to model fire occurrences in Southern Africa, where most fires are human-caused, and found that vegetation (i.e., dry mass productivity and NDVI) was the main driver of biomass burning. Curt, Borgniet, Ibanez, Moron, and Hély (2015) used BRT to examine drivers of fire in New Caledonia. Interestingly, they found that human factors (such as distance to villages, cities, or roads) were dominant influences for predicting fire ignitions, whereas vegetation and weather factors were most important for area burned. Curt, Fréjaville, and Lahaye (2016) modeled fire probabilities by different fire ignition causes (lightning, intentional, accidental, professional negligence, and personal negligence) in southeastern France. They found that socioeconomic factors (e.g., housing and road density) were the dominant factors for ignitions and area burned for human-caused fires.
P. M. Fernandes, Monteiro-Henriques, Guiomar, Loureiro, and Barros (2016) used BRT to examine large fires in Portugal and found that high pyrodiversity (i.e., spatial structure due to fire recurrence) and low landscape fuel connectivity were important drivers of area burned. Leys, Commerford, and McLauchlan (2017) used RF to find the drivers that determine sedimentary charcoal counts in order to reconstruct grassfire history in the Great Plains, USA. Not surprisingly, they found that fire regime characteristics (e.g., area burned and fire frequency) were the most important variables, and concluded that charcoal records can therefore be used to reconstruct fire histories. L.-M. Li, Song, Ma, and Satoh (2009) used ANNs to show that wildfire probability in Japan was strongly influenced by population density, with a peak determined by the interplay of positive and negative effects of human presence. This relationship, however, becomes more complex when weather parameters and forest cover percentage are added to the model. Z. Liu, Yang, and He (2013) used BRT to study factors influencing fire size in the Great Xingan Mountains in northeastern China. Their method included a "moving window" resampling technique that allowed them to examine the relative influence of variables at different spatial scales. They showed that the dominant factors influencing fire size were fuel and topography for small fires, whereas fire weather became the dominant factor for larger fires.
For regions of high population density, anthropogenic or structural factors are often dominant for fire susceptibility. For example, Molina, Lora, Prades, and Silva (2019) used MaxEnt to show that distance to roads, settlements, or powerlines were the dominant factors for fire occurrence probability in the Andalusia region in southern Spain. MaxEnt has also been used for estimating spatial fire probability under different scenarios, such as future projections of housing development and private land conservation (Syphard et al., 2016). One study in China using RF found that mean spring temperature was the most important variable for fire occurrence, whereas forest stock was most important for area burned (Ying, Han, Du, & Shen, 2018).
Some authors examined controls on fire severity using high-resolution data for a single large fire. For example, several authors used RF to examine controls on burn severity for the 2013 Rim fire in the Sierra Nevada (Kane et al., 2015; Lydersen et al., 2017; Lydersen, North, & Collins, 2014). At smaller spatial scales fire weather was the most important variable for fire severity, whereas fuel treatments were most important at larger spatial scales (Lydersen et al., 2017). A similar study by Harris and Taylor (2017) showed that previous fire severity was an important factor influencing fire severity for the Rim fire. For the 2005 Riba de Saelices fire, Viedma, Quesada, Torres, De Santis, and Moreno (2015) examined factors contributing to burn severity using a BRT model and found that burning conditions (including fire weather variables) were more important than stand structure and topography. For burn severity, these papers all used the Relativized differenced Normalized Burn Ratio (RdNBR) metric, derived from Landsat satellite images, which allowed spatial modeling at high resolution (e.g., 30 m × 30 m pixels). In addition to the more commonly used ML methods, Wu, He, Yang, and Liang (2015) used KNN to identify spatially homogeneous fire environment zones by clustering climate, vegetation, topography, and human activity related variables. They then used CART to examine variable importance for each of three fire environment zones in southeastern China.
For landscape controls on fire, there were few studies comparing multiple ML methods. One such study by Nelson, Nijland, Bourbonnais, and Wulder (2017) compared CART, BRT, and RF for classifying different fire size classes in British Columbia, Canada. For both central and periphery regions, they found the best performing model was BRT, followed by CART and RF. For example, in the central region BRT achieved a classification accuracy of 88%, compared with 82.9% and 49.6% for the CART and RF models, respectively. It is not clear from the study why RF performed poorly, although it was noted that variable importance differed appreciably between the three models.
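As background for the RdNBR metric used in these severity studies, the sketch below computes it for a single pixel. It assumes the formulation commonly used in the burn-severity literature: NBR from near-infrared (NIR) and shortwave-infrared (SWIR) reflectance, dNBR as pre-fire minus post-fire NBR scaled by 1000, and RdNBR as dNBR divided by the square root of the absolute pre-fire NBR. Band choices, scaling, and offset corrections vary between studies, and the reflectance values are hypothetical.

```python
import math


def nbr(nir, swir):
    """Normalized Burn Ratio from near-infrared and shortwave-infrared reflectance."""
    return (nir - swir) / (nir + swir)


def dnbr(pre_nir, pre_swir, post_nir, post_swir):
    """Differenced NBR (pre-fire minus post-fire), scaled by 1000 (assumed convention)."""
    return 1000.0 * (nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir))


def rdnbr(pre_nir, pre_swir, post_nir, post_swir):
    """Relativized dNBR: dNBR divided by the square root of |pre-fire NBR|,
    intended to reduce the dependence of severity on pre-fire vegetation cover."""
    pre = nbr(pre_nir, pre_swir)
    if pre == 0:
        raise ValueError("pre-fire NBR of zero makes RdNBR undefined")
    return dnbr(pre_nir, pre_swir, post_nir, post_swir) / math.sqrt(abs(pre))


# Hypothetical reflectances for one 30 m pixel before and after a fire
print(round(dnbr(0.4, 0.1, 0.2, 0.3), 1))   # 800.0
print(round(rdnbr(0.4, 0.1, 0.2, 0.3), 1))  # about 1032.8
```

Dividing by the pre-fire NBR means that the same absolute drop in NBR registers as more severe in sparsely vegetated pixels than in dense ones, which is the motivation for the relativized form.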

Fire Behavior Prediction
In general, fire behavior includes physical processes and characteristics at a variety of scales, including combustion rate, flaming and smouldering residence time, fuel consumption, flame height, and flame depth. However, the papers in this section deal mainly with larger scale processes and characteristics such as the prediction of fire spread rates, fire growth, burned area, and fire severity, conditional on the occurrence (ignition) of one or more wildfires. Here, our emphasis is on prognostic applications, in contrast to the Fuels Characterization, Fire Detection and Mapping problem domain, in which we focused on diagnostic applications.

Fire spread and growth
Predicting the spread of a wildland fire is an important task for fire management agencies, particularly to aid in the deployment of suppression resources or to anticipate evacuations one or more days in advance.
Thus, a large number of models have been developed using different approaches. In a series of reviews, A. L. Sullivan (2009a, 2009b, 2009c) described fire spread models that he classified as being of physical or quasi-physical nature, or of empirical or quasi-empirical nature, as well as mathematical analogues and simulation models. Many fire growth simulation models convert one-dimensional empirical or quasi-empirical spread rate models to two dimensions and then propagate a fire perimeter across a modelled landscape. A wide range of ML methods have been applied to predict fire growth. For example, Markuzon and Kolitz (2009) tested several classifiers (RF, BNs, and KNN) to estimate whether a fire would become large either one or two days following its observation; they found that each of the tested methods performed similarly, with RF correctly classifying large fires at a rate over 75%, albeit with a number of false positives. Vakalis, Sarimveis, Kiranoudis, Alexandridis, and Bafas (2004) used an ANN in combination with a fuzzy logic model to estimate the rate of spread in the mountainous region of Attica in Greece. A number of papers used genetic algorithms (GAs) to optimize input parameters to a physics-based or empirically based fire simulator in order to improve fire spread predictions (Abdalhaq, Cortés, Margalef, & Luque, 2005; Artés, Cencerrado, Cortés, & Margalef, 2014, 2016; Carrillo, Artés, Cortés, & Margalef, 2016; Cencerrado, Cortés, & Margalef, 2012, 2013; Denham & Laneri, 2018; Denham, Wendt, Bianchini, Cortés, & Margalef, 2012; Rodríguez, Cortés, & Margalef, 2009; Rodriguez, Cortés, Margalef, & Luque, 2008). For example, Cencerrado et al. (2014) developed a framework based on GAs to shorten the time needed to run deterministic fire spread simulations. They tested the framework using the FARSITE (Finney, 2004) fire spread simulator with different input scenarios sampled from distributions of vegetation models, wind speed and direction, and dead and live fuel moisture content. The algorithm used a fitness function that discarded the most time-intensive simulations without appreciably decreasing the accuracy of the simulations. Such an approach is potentially useful for fire management, where it is desirable to predict fire behavior as far in advance as possible so that the information can be acted upon. This approach may greatly reduce overall simulation time by reducing the input parameter space, as also noted by Artés et al. (2016) and Denham et al. (2012), or through parallelization of simulation runs for stochastic approaches (Artés et al., 2017; Denham & Laneri, 2018). A different goal was considered by Ascoli, Vacchiano, Motta, and Bovio (2015), who used a GA to optimize fuel models in Southern Europe by calibrating the model with respect to rate of spread observations. Kozik, Nezhevenko, and Feoktistov (2013) presented a fire spread model based on a novel ANN implementation that incorporated a Kalman filter for data assimilation and could potentially be run in real time; the resulting model more closely resembles a complex cellular automaton than a traditional ANN. The same authors later implemented this model and simulated fire growth under scenarios with different wind speeds, wind directions, or both, although a direct comparison with real fire data was not possible (Kozik, Nezhevenko, & Feoktistov, 2014). Zheng, Huang, Li, and Zeng (2017) simulated fire spread by integrating a cellular automata (CA) model with an Extreme Learning Machine (ELM; a type of feedforward ANN). Transition rules for the CA were determined by the ELM, trained with data from historical fires as well as vegetation, topographic, and meteorological data.
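To make the GA-based calibration idea concrete, the following is a minimal sketch that tunes two parameters of a spread model so that simulated rates of spread match observations. It assumes a hypothetical two-parameter function `toy_spread_rate` in place of a real simulator such as FARSITE, synthetic observations, and common but arbitrary GA choices (tournament selection, blend crossover, Gaussian mutation, elitism); it does not reproduce any of the frameworks cited above.

```python
import math
import random


def toy_spread_rate(wind, moisture, c, k):
    """Hypothetical rate-of-spread model (NOT FARSITE): spread grows linearly
    with wind speed and decays exponentially with dead fuel moisture."""
    return c * wind * math.exp(-k * moisture)


# Hypothetical "observed" spread rates generated from known parameter values
TRUE_C, TRUE_K = 0.5, 0.08
OBS = [(w, m, toy_spread_rate(w, m, TRUE_C, TRUE_K))
       for w, m in [(5, 6), (10, 8), (15, 12), (20, 10), (25, 15)]]
BOUNDS = [(0.1, 2.0), (0.0, 0.3)]  # search ranges for c and k


def sse(params):
    """Fitness to minimise: squared error between simulated and observed spread."""
    c, k = params
    return sum((toy_spread_rate(w, m, c, k) - ros) ** 2 for w, m, ros in OBS)


def calibrate(pop_size=30, generations=60, mutation_scale=0.05, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    initial_best = min(sse(p) for p in pop)
    for _ in range(generations):
        pop.sort(key=sse)
        children = [pop[0], pop[1]]  # elitism: carry the two best forward unchanged
        while len(children) < pop_size:
            # Tournament selection: each parent is the best of three random individuals
            a = min(rng.sample(pop, 3), key=sse)
            b = min(rng.sample(pop, 3), key=sse)
            child = []
            for (lo, hi), x, y in zip(BOUNDS, a, b):
                w = rng.random()
                # Blend crossover of the two parents plus Gaussian mutation
                gene = w * x + (1 - w) * y + rng.gauss(0.0, mutation_scale * (hi - lo))
                child.append(min(hi, max(lo, gene)))  # clip to the search range
            children.append(child)
        pop = children
    best = min(pop, key=sse)
    return best, sse(best), initial_best


best, best_err, start_err = calibrate()
print(f"c={best[0]:.3f}, k={best[1]:.3f}, SSE {start_err:.4f} -> {best_err:.4f}")
```

Elitism guarantees that the best error never increases from one generation to the next. In the cited applications each fitness evaluation is a full simulator run rather than a one-line formula, which is why reducing the number or cost of evaluations matters so much there.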
Likewise, Chetehouna, Tabach, Bouazaoui, and Gascoin (2015) used ANNs to predict fire behavior, including rate of spread, flame height, and flame angle. In contrast, Subramanian and Crowley (2017) formulated the problem of fire spread prediction as a Markov Decision Process, for which they proposed solutions based on both a classic reinforcement learning algorithm and a deep reinforcement learning algorithm; the authors found that the deep learning approach improved on the traditional approach when tested on two large fires in Alberta, Canada. The authors further developed this work to compare five widely used reinforcement learning algorithms (Ganapathi Subramanian & Crowley, 2018), and found that the Asynchronous Advantage Actor-Critic (A3C) and Monte Carlo Tree Search (MCTS) algorithms achieved the best accuracy. Meanwhile, Khakzad (2019) developed a model to predict the risk of fire spread in wildland-industrial interfaces using Dynamic Bayesian Networks (DBN) in combination with a deterministic fire spread model. The Canadian Fire Behavior Prediction (FBP) system, which uses meteorological and fuel conditions data as inputs, was used to determine the fire spread probabilities from one node to another in the DBN.
More recently, Hodges and Lattimer (2019) trained a deep learning CNN to predict fire spread using environmental variables (topography, weather, and fuel-related variables). Outputs of the CNN were spatial grids corresponding to the probability that the burn map reached a pixel and the probability that it did not. Their method achieved a mean precision of 89% and a mean sensitivity of 80% with respect to reference 6-hourly burn maps computed using the physics-based FARSITE simulator. Radke, Hessler, and Ellsworth (2019) used a similar approach to predict daily fire spread for the 2016 Beaver Creek fire in Colorado.
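Precision and sensitivity, as reported by Hodges and Lattimer (2019), are pixel-wise comparisons between a predicted and a reference burn map. The sketch below shows how the two are computed from a thresholded burn-probability grid; the grids, the 0.5 threshold, and the function name are hypothetical and not taken from that study.

```python
def precision_and_sensitivity(prob_grid, reference_grid, threshold=0.5):
    """Pixel-wise precision and sensitivity (recall) of a predicted burn map
    against a reference burn map (1 = burned)."""
    tp = fp = fn = 0
    for prob_row, ref_row in zip(prob_grid, reference_grid):
        for prob, ref in zip(prob_row, ref_row):
            burned = prob >= threshold
            if burned and ref:
                tp += 1  # predicted burned, actually burned
            elif burned:
                fp += 1  # predicted burned, did not burn
            elif ref:
                fn += 1  # burned but missed by the prediction
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical CNN output: probability that the fire reaches each pixel
predicted = [
    [0.9, 0.8, 0.2],
    [0.7, 0.6, 0.4],
    [0.1, 0.3, 0.55],
]
# Hypothetical reference burn map (e.g., from a simulator) for the same period
reference = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
]

precision, sensitivity = precision_and_sensitivity(predicted, reference)
print(precision, sensitivity)  # 0.6 0.75
```

Precision penalizes over-prediction of the fire perimeter and sensitivity penalizes missed burned pixels, so reporting both gives a more complete picture of spread-prediction skill than overall accuracy, which is dominated by the many unburned pixels.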

Burned area and fire severity prediction
There are a number of papers that focus on using ML approaches to directly predict the final area burned by a wildfire. Cortez and Morais (2007) compared multiple regression and four different ML methods (DT, RF, ANN, and SVM) to predict area burned using fire and weather (i.e., temperature, precipitation, relative humidity, and wind speed) data from the Montesinho natural park in northeastern Portugal, and found that SVM displayed the best performance. A number of publications subsequently used the data from Cortez and Morais (2007) to predict area burned using various ML methods, including ANN (Safi & Bouroumi, 2013; Storer & Green, 2016), genetic algorithms (Castelli, Vanneschi, & Popovič, 2015), both ANN and SVM (Al Janabi, Al Shourbaji, & Salman, 2018), and decision trees (Alberg, 2015; H. Li, Fei, & He, 2018). Notably, Castelli et al. (2015) found that a GA variant outperformed other ML methods, including SVM. D. W. Xie and Shi (2014) used a similar set of input variables with SVM to predict burned area in Guangzhou City, China. In addition to these studies, Toujani, Achour, and Faïz (2018) used hidden Markov models (HMM) to predict burned area in northwestern Tunisia, where the spatiotemporal factors used as inputs to the model were first clustered using self-organizing maps (SOMs). Liang, Zhang, and Wang (2019) compared back-propagation neural networks, recurrent neural networks (RNN), and Long Short-Term Memory (LSTM) neural networks to predict wildfire scale, a quantity related to area burned and fire duration, in Alberta, Canada. They found that the highest accuracy (90.9%) was achieved with LSTM.
Most recently, Y. Xie and Peng (2019) compared a number of machine learning methods for estimating area burned (regression) and for binary classification of fire size (> 5 ha) in Montesinho natural park, Portugal. For the regression task, they found that a tuned RF algorithm performed better than standard RF, tuned and standard gradient boosted machines, tuned and standard generalized linear models (GLMs), and deep learning. For the classification problem, they found that extreme gradient boosting and deep learning had higher accuracy than CART, RF, SVM, ANN, and logistic regression.
By attempting to predict membership of burned area size classes, a number of papers recast the problem of burned area prediction as a classification problem. For example, Y. P. Yu, Omar, Harrison, Sammathuria, and Nik (2011) used a combination of SOMs and back-propagation ANNs to classify forest fires into size categories based on meteorological variables. This approach gave better accuracy (approximately 90%) than a rules-based method (approximately 82%). Özbayoğlu and Bozer (2012) estimated burned area size classes from geographical and meteorological data using three different machine learning methods: i) Multilayer Perceptron (MLP); ii) Radial Basis Function Networks (RBFN); and iii) SVM. Overall, the best performing method was MLP, which achieved a 65% success rate using humidity and wind speed as predictors. Zwirglmaier, Papakosta, and Straub (2013) used a BN to predict area burned classes using historical fire data, fire weather data, fire behaviour indices, land cover, and topographic data. Shidik and Mustofa (2014) used a hybrid model (Fuzzy C-Means and back-propagation ANN) to estimate fire size classes using data from Cortez and Morais (2007); the hybrid model performed best, with an accuracy of 97.50%, compared with Naive Bayes (55.5%), DT (86.5%), RF (73.1%), KNN (85.5%), and SVM (90.3%). Mitsopoulos and Mallinis (2017) compared BRT, RF, and logistic regression to predict three burned area classes for fires in Greece. They found that RF led to the best performance of the three tested methods and that fire suppression and weather were the two most important explanatory variables. Coffield et al. (2019) compared CART, RF, ANN, KNN, and gradient boosting to predict three burned area classes at time of ignition in Alaska. They found that a parsimonious model using CART with Vapor Pressure Deficit (VPD) provided the best performance of the models and variables considered.
We found only one study that used ML to predict fire behavior related to fire severity, which is important in the context of fire ecology, suggesting that there are opportunities to apply ML in this domain of wildfire science. In that paper, Zald and Dunn (2018) used RF to determine that the most important predictor of fire severity was daily fire weather, followed by stand age and ownership, with topographic features contributing less predictive power.

Fire Effects
Fire effects prediction studies have largely used regression-based approaches to relate costs, losses, or other impacts (e.g., on soils, post-fire ecology, wildlife, and socioeconomic factors) to physical measures of fire severity and exposure. Importantly, this category also includes wildfire smoke and particulate modelling (but not smoke detection, which was discussed previously in the fire detection section).

Soil Erosion and Deposits
Mallinis, Maris, Kalinderis, and Koutsias (2009) modelled potential post-fire soil erosion risk following a large, intense wildfire in the Mediterranean area using CART and k-means algorithms. Before the wildfire, 55% of the study area was classified as having severe or heavy erosion potential, compared with 90% after the fire, with an overall classification accuracy of 86%. Meanwhile, Buckland, Bailey, and Thomas (2019) used ANNs to examine the relationships between sand deposition in semi-arid grasslands and wildfire occurrence, land use, and climatic conditions. The authors then predicted future soil erosion levels under climate change assumptions.

Smoke and Particulate Levels
Smoke emitted from wildfires can seriously lower air quality with adverse effects on the health of both human and non-human animals, as well as other impacts. Thus, it is not surprising that ML methods have been used to understand the dynamics of smoke from wildland fire. For example,  used RF to predict the minimum height of forest fire smoke using data from the CALIPSO satellite. More commonly, ML methods have also been used to estimate population exposure to fine particulate matter (e.g., PM2.5: atmospheric particulate matter with diameter less than 2.5µm), which can be useful for epidemiological studies and for informing public health actions. One such study by  also used RF to estimate hourly concentrations of PM2.5 in British Columbia, Canada. Zou et al. (2019) compared RF, BRT and MLR to estimate regional PM2.5 concentrations in the Pacific Northwest and found RF performed much better than the other algorithms. In another very broad study covering several datasets and ML methods, Reid et al. (2015) estimated spatial distributions of PM2.5 concentrations during the 2008 northern California wildfires. The authors of the aforementioned study used 29 predictor variables and compared 11 different statistical models, including RF, BRT, SVM, and KNN. Overall, the BRT and RF models displayed the best performance. Emissions other than particulate matter have also been modelled using ML, as Lozhkin, Tarkhov, Timofeev, Lozhkina, and Vasilyev (2016) used an ANN to predict carbon monoxide concentrations emitted from a peat fire in Siberia, Russia. In a different application related to smoke, Fuentes et al. (2019) used ANNs to detect smoke in several different grape varietals used for wine making.

Post-fire regeneration, succession, and ecology
The study of post-fire regeneration is an important aspect of understanding forest and ecosystem responses and resilience to wildfire disturbances, with important ecological and economic consequences. RF, for example, has been a popular ML method for identifying the important variables driving post-fire regeneration (João, João, Bruno, & João, 2018; Vijayakumar et al., 2016). Burn severity (a measure of above- and below-ground biomass loss due to fire) is an important metric for understanding the impacts of wildfire on vegetation and post-fire regeneration, soils, and potential successional shifts in forest composition, and as such has been included in many ML studies in this section (Barrett, McGuire, Hoy, & Kasischke, 2011; Cai, Yang, Liu, Hu, & Weisberg, 2013; Cardil, Mola-Yudego, Blázquez-Casado, & González-Olabarria, 2019; Chapin, Hollingsworth, & Hewitt, 2014; Divya & Vijayalakshmi, 2016; Fairman, Bennett, Tupper, & Nitschke, 2017; Han, Shen, Ying, Li, & Chen, 2015; Johnstone, Hollingsworth, Chapin, & Mack, 2010; Martín-Alcón & Coll, 2016; Sherrill & Romme, 2012; J. R. Thompson & Spies, 2010). For instance, Cardil et al. (2019) used BRT to demonstrate that remotely sensed data (i.e., the Relative Differenced Normalized Burn Ratio index; RdNBR) can provide an acceptable assessment of fire-induced impacts (i.e., burn severity) on forest vegetation, while Fairman et al. (2017) used RF to identify the variables most important in explaining plot-level mortality and regeneration of Eucalyptus pauciflora in Victoria, Australia, affected by high-severity wildfires and subsequent re-burns. Debouk, Riera-Tatché, and Vega-García (2013) assessed post-fire vegetation regeneration status using field measurements, a canopy height model, and lidar (i.e., 3D laser scanning) data with a simple ANN.
Post-fire regeneration also has important implications for the successional trajectories of forested areas, and a few studies have examined this using ML approaches (Barrett et al., 2011; Cai et al., 2013; Johnstone et al., 2010). For example, Barrett et al. (2011) used RF to model fire severity, from which they assessed the area susceptible to a shift from coniferous to deciduous forest cover in the Alaskan boreal forest, while Cai et al. (2013) used BRT to assess the influence of environmental variables and burn severity on the composition and density of post-fire tree recruitment, and thus the trajectory of succession, in northeastern China. In other studies not directly related to post-fire regeneration, Hermosilla, Wulder, White, Coops, and Hobart (2015) used RF to attribute annual forest change to one of four categories, including wildfire, in Saskatchewan, Canada, while Jung, Tautenhahn, Wirth, and Kattge (2013) used GA and RF to estimate the basal area of post-fire residual spruce (Picea obovata) and fir (Abies sibirica) stands in central Siberia using remotely sensed data. Magadzire, Klerk, Esler, and Slingsby (2019) used MaxEnt to demonstrate that fire return interval and species life history traits affected the distribution of plant species in South Africa. ML has also been used to examine fire effects on the hydrological cycle: Poon, Kinoshita, Poon, and Kinoshita (2018) used SVM to estimate both pre- and post-wildfire evapotranspiration using remotely sensed variables.
Considering the potential impacts of wildfires on wildlife, it is perhaps surprising that relatively few such studies have adopted ML approaches. However, ML methods have been used to predict the impacts of wildfire and other drivers on species distributions and arthropod communities. Hradsky et al. (2017), for example, used non-parametric BNs to describe and quantify the drivers of faunal distributions in wildfire-affected landscapes in southeastern Australia. Similarly, Reside, VanDerWal, Kutt, Watson, and Williams (2012) used MaxEnt to model bird species distributions in response to fire regime shifts in northern Australia, which is an important aspect of conservation planning in the region. ML has also been used to examine the effects of wildfire on fauna at the community level: G. Luo, Zhang, Yang, and Song (2017) used DTs, Association Rule Mining, and AdaBoost to examine the effects of fire disturbance on spider communities in Cangshan Mountain, China.

Socioeconomic effects
ML methods have been little used to model socio-economic impacts of fire to date. We found one study in which BNs were used to predict the economic impacts of wildfires in Greece from 2006 to 2010 due to housing losses (Papakosta, Xanthopoulos, & Straub, 2017). The authors did this by first defining the causal relationships between the participating variables, and then using BNs to estimate housing damages. It is worth noting that discovering such causal relationships from data is a difficult problem and remains an active area of research in artificial intelligence.
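Once the causal structure of a BN has been specified, querying it reduces to multiplying conditional probability tables along the network's arcs and summing out unobserved variables. The sketch below illustrates this by exact enumeration on a hypothetical three-node network; the variables (`FireIntensity`, `DefensibleSpace`, `HouseDamaged`) and all probabilities are invented for illustration and are not those of Papakosta et al. (2017).

```python
from itertools import product

# Hypothetical network: FireIntensity -> HouseDamaged <- DefensibleSpace
P_INTENSITY = {"low": 0.6, "high": 0.4}
P_DEFENSIBLE = {"yes": 0.5, "no": 0.5}
# Conditional probability table: P(HouseDamaged = True | intensity, defensible space)
P_DAMAGED = {
    ("low", "yes"): 0.05,
    ("low", "no"): 0.15,
    ("high", "yes"): 0.30,
    ("high", "no"): 0.70,
}


def joint(intensity, defensible, damaged):
    """Joint probability, factorised along the network's arcs."""
    p_d = P_DAMAGED[(intensity, defensible)]
    return (P_INTENSITY[intensity] * P_DEFENSIBLE[defensible]
            * (p_d if damaged else 1.0 - p_d))


def prob_damaged(evidence=None):
    """P(HouseDamaged = True | evidence) by enumerating the parent states."""
    evidence = evidence or {}
    num = den = 0.0
    for i, d in product(P_INTENSITY, P_DEFENSIBLE):
        # Skip parent states inconsistent with the observed evidence
        if evidence.get("intensity", i) != i or evidence.get("defensible", d) != d:
            continue
        num += joint(i, d, True)
        den += joint(i, d, True) + joint(i, d, False)
    return num / den


def prob_high_given_damage():
    """Diagnostic query P(intensity = high | HouseDamaged = True) via Bayes' rule."""
    damaged = {i: sum(joint(i, d, True) for d in P_DEFENSIBLE) for i in P_INTENSITY}
    return damaged["high"] / sum(damaged.values())


print(round(prob_damaged(), 3))                       # 0.26
print(round(prob_damaged({"defensible": "yes"}), 3))  # 0.15
print(round(prob_high_given_damage(), 3))             # 0.769
```

The same network answers both predictive queries (expected damage given conditions) and diagnostic ones (likely causes given observed damage). Enumeration grows exponentially with the number of variables, so practical BN software uses more efficient inference algorithms.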

Fire management
The goal of contemporary fire management is to have the appropriate amount of fire on the landscape, which may be accomplished through the management of vegetation (including prescribed burning), the management of human activities (prevention), and fire suppression. Fire management is a form of risk management that seeks to maximize fire benefits and minimize costs and losses (Finney, 2005). Fire management decisions span a wide range of scales: from long-range strategic decisions about the acquisition and location of resources or the application of vegetation management in large regions, to tactical decisions about the acquisition of additional resources and the relocation or release of resources during the fire season, to real-time operational decisions about the deployment and utilization of resources on individual incidents. Fire preparedness and response is a supply chain with hierarchical dependencies. S. Taylor (2020) describes 20 common decision types in fire management and maps the spatial-temporal dimensions of their decision spaces.
Fire management models can be predictive (e.g., estimating the probability of initial attack success) or prescriptive (e.g., maximizing or minimizing an objective function, such as optimal helicopter routing to minimize travel time in crew deployment). While advances have been made in the domain of wildfire management using ML techniques, there have been relatively few studies in this area compared with other wildfire problem domains. Thus, there appears to be great potential for ML to be applied to wildfire management problems, which may lead to novel and innovative approaches in the future.

Planning and policy
An important area of fire management is planning and policy, where various ML methods have been applied to address pertinent challenges. For example, Bao, Xiao, Lai, Zhang, and Kim (2015) used GAs, which are useful for solving multi-objective optimization problems, to optimize watchtower locations for forest fire monitoring. Bradley, Hanson, and DellaSala (2016) used RF to investigate the relationship between the protected status of forests in the western US and burn severity. Ruffault and Mouillot (2015) used BRTs to assess the impact of a fire policy introduced in the 1980s on fire activity in southern France, as well as the relationships between fire and weather, and Penman, Price, and Bradstock (2011) used BNs to build a framework for simultaneously assessing the relative merits of multiple management strategies in Wollemi National Park, NSW, Australia. McGregor et al. (2016) used Markov decision processes (MDPs) and a model-free Monte Carlo method to create fast-running simulations (based on the FARSITE simulator), which they used to build interactive visualizations of forest futures over 100 years under alternative high-level suppression policies. McGregor, Houtman, Montgomery, Metoyer, and Dietterich (2017) demonstrated how a variety of ML and optimization methods can be used to create an interactive approximate simulation tool for fire managers. They used a modified version of the FARSITE fire-spread simulator, augmented to run thousands of simulation trajectories and to include new models of lightning strike occurrence and fire duration, as well as a forest vegetation simulator. McGregor et al. (2017) also show clearly how decision trees can be used to analyze a hierarchy of decision thresholds for deciding whether or not to suppress a fire; their hierarchy splits first on fuel levels, then on intensity estimates, and finally on weather predictors to arrive at a generalizable policy.
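As a minimal sketch of how a single decision tree can summarize a suppression policy as nested thresholds, the example below fits scikit-learn's `DecisionTreeClassifier` to synthetic data and prints the learned rules. The three features, their 0 to 1 scales, and the rule that generates the "suppress" labels are hypothetical assumptions, not the data or policy of McGregor et al. (2017).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
# Hypothetical features on arbitrary 0-1 scales
fuel = rng.uniform(0, 1, n)
intensity = rng.uniform(0, 1, n)
weather = rng.uniform(0, 1, n)   # e.g., a normalized fire-weather severity index
X = np.column_stack([fuel, intensity, weather])

# Hypothetical "suppress" label: suppress when fuel is high,
# or when both intensity and weather severity are high
y = ((fuel > 0.6) | ((intensity > 0.5) & (weather > 0.7))).astype(int)

# A shallow tree keeps the policy readable as a short list of if-then-else rules
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["fuel", "intensity", "weather"]))
```

Because the synthetic labels come from a simple threshold rule, a depth-3 tree should recover it almost exactly; with real simulation output, the depth limit trades fidelity for readability.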

Fuel treatment
ML methods have also been used to model the effects of fuel treatments intended to mitigate wildfire risk. For example, Penman, Bradstock, and Price (2014) used a BN to examine the relative risk reduction of prescribed burning across the landscape versus within the 500 m interface zone adjacent to houses in the Sydney basin, Australia. Lauer, Montgomery, and Dietterich (2017) used approximate dynamic programming (also known as reinforcement learning) to determine the optimal timing and location of fuel treatments and timber harvest for a fire-threatened landscape in Oregon, USA, with the objective of maximizing wealth through timber management. Similarly, Arca, Ghisu, and Trunfio (2015) used GAs for multi-objective optimization of fuel treatments.
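The Bellman recursion on which approximate dynamic programming builds can be illustrated with exact value iteration on a deliberately tiny, hypothetical fuel-treatment problem. The state is a discrete fuel level, and each period the manager either waits (fuel accumulates and an expected fire loss is incurred) or treats (a fixed cost is paid and fuel resets). All costs, probabilities, and the discount factor are invented; realistic landscapes have state spaces far too large for this exact approach, which is why approximate methods such as those of Lauer et al. (2017) are needed.

```python
import numpy as np

N_FUEL = 5          # hypothetical discrete fuel levels 0..4
GAMMA = 0.9         # discount factor
TREAT_COST = 4.0    # hypothetical cost of one fuel treatment
WAIT, TREAT = 0, 1

def expected_fire_loss(fuel):
    # Hypothetical: both fire probability and loss given fire increase with fuel
    p_fire = 0.1 + 0.1 * fuel
    return p_fire * 5.0 * fuel

def step(fuel, action):
    """Deterministic next fuel level and one-period reward (negative cost)."""
    if action == TREAT:
        return 0, -TREAT_COST                               # treatment resets fuel
    return min(fuel + 1, N_FUEL - 1), -expected_fire_loss(fuel)

def value_iteration(tol=1e-8):
    V = np.zeros(N_FUEL)
    while True:
        # Q[s, a] = immediate reward + discounted value of the next state (Bellman backup)
        Q = np.array([[r + GAMMA * V[s2] for s2, r in (step(s, a) for a in (WAIT, TREAT))]
                      for s in range(N_FUEL)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # values and greedy policy (0=wait, 1=treat)
        V = V_new

V, policy = value_iteration()
print(policy)
```

With these numbers, `value_iteration()` should return the threshold policy `[0 0 1 1 1]`: wait at low fuel levels and treat once fuel reaches level 2.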

Wildfire preparedness and response
Wildfire preparedness and response have also been examined using ML techniques. Costafreda-Aumedes et al. (2015) used ANNs to model the relationships between daily fire load, fire duration, fire type, fire size, and response time, as well as the personnel and terrestrial/aerial units deployed for individual wildfires in Spain. Most of their models highlighted a positive correlation of burned area and fire duration with the number of resources assigned to each fire, and some highlighted a negative influence of daily fire load. In another study, Penman et al. (2015) used BNs to assess the relative influence of preventative and suppression management strategies on the probability of house loss in the Sydney basin, Australia. O'Connor et al. (2017) used BRTs to develop a predictive model of fire control locations in the Northern Rocky Mountains, USA, based on the likelihood of final fire perimeters, while Homchaudhuri, Zhao, Cohen, and Kumar (2010) used GAs to optimize fireline generation. Rodrigues, Alcasena, and Vega-García (2019) modelled the probability that a wildfire escapes initial attack using an RF model trained with data on fire location, detection time, arrival time, weather, fuel types, and available resources; important variables in their model included fire weather and the simultaneity of events. Julian and Kochenderfer (2018a) used two different RL algorithms to develop a system for autonomous control of one or more aircraft to monitor active wildfires.
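The following sketch shows the general shape of an escape-probability model of the kind described for Rodrigues et al. (2019): a random forest classifier trained on fire-level predictors and evaluated on held-out fires. The predictors, their distributions, and the logistic rule used to generate synthetic escape outcomes are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
# Hypothetical predictors loosely inspired by those listed in the text
fire_weather = rng.gamma(2.0, 10.0, n)      # e.g., a fire-weather index value
arrival_time_h = rng.exponential(1.0, n)    # hours from detection to first arrival
simultaneous_fires = rng.poisson(3, n)      # other active fires on the same day
X = np.column_stack([fire_weather, arrival_time_h, simultaneous_fires])

# Synthetic escape outcome drawn from an assumed logistic relationship
logit = -6 + 0.12 * fire_weather + 0.8 * arrival_time_h + 0.3 * simultaneous_fires
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
rf = RandomForestClassifier(n_estimators=300, min_samples_leaf=5, random_state=0)
rf.fit(X_tr, y_tr)
p_escape = rf.predict_proba(X_te)[:, 1]     # predicted probability of escape (class True)
print("test AUC:", round(roc_auc_score(y_te, p_escape), 3))
```

With real records, a plain random split can be optimistic if fires from the same day or region appear in both sets; grouping the split by season or area is one common safeguard.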

Social factors
Recently, the use of ML in fire management has expanded to more novel aspects, including the investigation of criminal motives for arson: Delgado, González, Sotoca, and Tibau (2018) used BNs to characterize wildfire arsonists in Spain, identifying five motivational archetypes (i.e., slight negligence, gross negligence, impulsive, profit, and revenge).

Discussion
ML methods have seen a spectacular evolution in development, accuracy, computational efficiency, and application in many fields since the 1990s. It is therefore not surprising that ML has provided new insights into several critical sustainability and social challenges of the 21st century (Butler, 2017; Gomes, 2009; B. L. Sullivan et al., 2014). The recent uptake and success of ML methods has been driven in large part by ongoing advances in computational power and technology. For example, bandwidth-optimized Graphics Processing Units (GPUs) exploit parallel processing to execute computationally expensive tasks simultaneously, which has facilitated wider use of computationally demanding but more accurate methods such as DNNs. Powerful yet efficient ML methods are therefore widely expected to be useful in wildfire science and management.
However, despite some early papers suggesting that data-driven techniques would be useful in forest fire management (Kourtz, 1990, 1993; Latham, 1987), our review shows that adoption of ML-based research in wildfire science was relatively slow up to the 2000s compared with other fields, followed by a sharp increase in publication rate in the last decade. In the early 2000s, data mining techniques were quite popular, and classic ML methods such as DTs, RF, and bagging and boosting techniques began to appear in the wildfire science literature (e.g., Stojanova, Kobler, Džeroski, and Taškova (2006)). Some researchers had already used simple feed-forward ANNs for small-scale applications from the mid-1990s to the early 2000s (e.g., Al-Rawi et al. (2002); Mccormick, Brandner, and Allen (1999)). Over the last three decades, almost all major ML methods have been used in some way in wildfire applications, although some more computationally demanding methods, such as SOMs and cellular automata, have only been actively experimented with in the last decade (Toujani et al., 2018; Zheng et al., 2017). Furthermore, the recent development of DL algorithms, with a particular focus on extracting spatial features from images, has led to a sharp rise in the application of DL to wildfire problems in the last decade. It is evident from our review, however, that while an increasing number of ML methodologies have been used across a variety of fire research domains over the past 30 years, this research is unevenly distributed among ML algorithms, research domains, and tasks, and has had limited application in fire management.
Many fire science and management questions can be framed within a fire risk context. Xi, Taylor, Woolford, and Dean (2019) discussed the advantages of adopting a risk framework for the statistical modeling of wildfires. In that framework, the risk components of "hazard", "vulnerability", and "exposure" are replaced by fire probability, fire behavior, and fire effects, respectively. Most fire management activities can be framed as risk controls that mitigate these components of risk. Traditionally, methods used in wildfire science to address these questions have included physical modeling (e.g., A. L. Sullivan (2009a, 2009b, 2009c)), statistical methods (e.g., S. W. Taylor et al. (2013); Xi et al. (2019)), simulation modeling (e.g., Keane et al. (2004)), and operations research methods (e.g., Martell (2015); Minas, Hearne, and Handmer (2012)).
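A minimal sketch of how probability, behaviour, and effects combine in such a risk framing, assuming the expected net value change formulation associated with Finney (2005): E[NVC] = Σ_j Σ_i p(F_i) RF_j(F_i), where p(F_i) is the probability of a fire burning at intensity class i and RF_j(F_i) is a response function giving the relative gain (positive) or loss (negative) to resource j at that intensity. The burn probabilities, intensity classes, resources, and response values below are hypothetical.

```python
import numpy as np

# Hypothetical annual probabilities of burning in each of four intensity classes at one location
p_intensity = np.array([0.010, 0.006, 0.003, 0.001])

# Hypothetical response functions: relative net value change for each resource (rows)
# under each intensity class (columns); negative values are losses
response = np.array([
    [0.20, 0.00, -0.40, -0.80],    # e.g., a fire-adapted vegetation community
    [-0.10, -0.30, -0.60, -1.00],  # e.g., residential structures
])

def expected_net_value_change(p, rf):
    """Sum over resources and intensity classes of p(F_i) * RF_j(F_i)."""
    return float((rf * p).sum())

per_resource = (response * p_intensity).sum(axis=1)   # contribution of each resource
print(per_resource, expected_net_value_change(p_intensity, response))
```

With these numbers the fire-adapted community nets to zero (benefits at low intensity offset losses at high intensity), while the structures contribute an expected change of -0.0056, which is also the total.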
In simple terms, any analytical study begins with one or more of four questions: "What happened?", "Why did it happen?", "What will happen?", or "What to do?" The corresponding data-driven approaches to these questions are called descriptive, diagnostic, predictive, and prescriptive analytics, respectively. The type of analytical approach adopted then circumscribes the types of methodological approaches (e.g., regression, classification, clustering, dimensionality reduction, decision making) and the sets of possible algorithms appropriate to the analysis.
In our review, we found that studies incorporating ML methods in wildland fire science were predominantly associated with descriptive or diagnostic analytics, reflecting the large body of work on fire detection and mapping using classification methods, and on fire susceptibility mapping and landscape controls on fire using regression approaches. In many cases, the ML methods identified in our review serve as alternatives to statistical methods for clustering and regression. While these tasks are undoubtedly important for understanding wildland fire, we found much less work associated with predictive or prescriptive analytics, such as fire occurrence prediction (predictive), fire behaviour prediction (predictive), and fire management (prescriptive). This may be because: a) particular domain knowledge is required to frame fire management problems; b) fire management data are often not publicly available, need substantial work to transform into an easily analyzable form, or do not exist at the scale of the problem; and c) some fire management problems are not suited to, or cannot be fully addressed by, ML approaches. We note that much of the work on fire risk in the fire susceptibility and mapping domain used historical fire and environmental data to map fire susceptibility; therefore, while that work aims to inform future fire risk, it cannot be considered predictive analytics, except, for example, where it was combined with climate change projections. It appears, then, that wildfire science research is currently more closely aligned with descriptive and diagnostic analytics, whereas wildfire management goals are aligned with predictive and prescriptive analytics. This fundamental difference points to new opportunities for research in fire management, which we discuss later in this paper.
In the remainder of the paper, we examine some considerations for the use of ML methods, including: data considerations, model selection and accuracy, implementation challenges, interpretation, opportunities, and implications for fire management.

Data considerations
ML is a data-centric modeling paradigm concerned with finding patterns in data. Importantly, data scientists need to determine, often in collaboration with fire managers or domain experts, whether there are suitable and sufficient data for a given modeling task. Criteria for suitable data include whether: a) the predictands and covariates are, or can be wrangled into, the same temporal and spatial scale; b) the observations are a representative sample of the full range of conditions that may occur when the model is applied to future observations; and c) the data are at a spatiotemporal scale appropriate to the fire science or management question. The first of these criteria can be relaxed for some ML models such as ANNs and DNNs, where inputs and outputs can be at different spatial or temporal scales in appropriately designed network architectures, although data normalization may still be required.
The second criterion also addresses the important question of whether enough data exist to train a given algorithm for a given problem. In general, the answer depends on the nature of the problem, the complexity of the underlying model, data uncertainty, and many other factors (see Roh, Heo, and Whang (2018) for further discussion of data requirements for ML). In any case, many complex problems require a substantial data wrangling effort to acquire the data, perform quality assurance, and fuse the data into sampling units at the appropriate spatiotemporal scale. An example of this is daily fire occurrence prediction, where observations of a variety of features (e.g., continuous measures such as fire arrival time and location, or lightning strike times and locations) are discretized into three-dimensional (e.g., longitude, latitude, and day) cells called voxels.
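A minimal sketch of the voxel discretization, assuming a handful of hypothetical fire reports, a 0.1° grid, and daily time steps: NumPy's `histogramdd` counts reports per (longitude, latitude, day) cell, and the counts are then converted to a binary occurrence response.

```python
import numpy as np

# Hypothetical fire reports: longitude and latitude (degrees) and day of year
lon = np.array([-120.46, -120.44, -119.95, -120.43])
lat = np.array([49.13, 49.17, 49.62, 49.15])
day = np.array([180, 180, 181, 182])

# Voxel edges for an assumed study window: 0.1-degree cells and 1-day steps
lon_edges = np.linspace(-120.5, -119.8, 8)   # 7 longitude bins
lat_edges = np.linspace(49.0, 49.7, 8)       # 7 latitude bins
day_edges = np.array([180, 181, 182, 183])   # 3 daily bins (NumPy closes the last bin)

counts, _ = np.histogramdd(np.column_stack([lon, lat, day]),
                           bins=(lon_edges, lat_edges, day_edges))
occurrence = counts > 0   # binary response: at least one fire in the voxel
print(counts.shape, int(occurrence.sum()))
```

Here the first two reports fall into the same voxel, so four reports yield three occupied voxels out of 147. In a full application, each voxel would also carry covariates (e.g., fire weather indices or lightning counts) aggregated to the same cell, and most voxels would contain no fire, which leads to the class imbalance discussed below.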
For the fire detection and mapping problem domain, most applications of ML used some form of imagery (e.g., remotely sensed satellite images or terrestrial photographs). In particular, many papers used satellite data (e.g., Landsat, MODIS) to determine vegetation differences before and after a fire and were thus able to map area burned. For fire detection, many applications used either remotely sensed data for hotspot or smoke detection, or photographs of wildfires as inputs to an image classification problem. For fire weather and climate change, the three main data sources were weather station observations, climate reanalyses (modelled data that incorporate historical observations), and GCMs for future climate projections. Reanalyses and GCM outputs are typically large, high-dimensional, gridded spatiotemporal datasets that require careful feature selection and/or dimensionality reduction for ML applications. Fire occurrence prediction, susceptibility, and risk applications used a large number of different environmental variables as predictors, but almost all used fire locations and associated temporal information as predictands. Fire data are usually collated from fire management agencies in the form of georeferenced points or perimeters, along with reported dates, ignition cause, and other related variables. Care should be taken when using such data, because changes in reporting standards or accuracy may lead to data inhomogeneity. As well as fire locations and perimeters, fire severity is an attribute of much interest to fire scientists. Fire severity is often determined from remotely sensed data and represented using variables such as the Differenced Normalized Burn Ratio (dNBR) and its variants, or through field sampling. However, remotely sensed estimates of burn severity should be considered proxies, as they have low skill in some ecosystems.
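The dNBR mentioned above is a simple band-ratio calculation. The Normalized Burn Ratio is commonly defined as NBR = (NIR - SWIR2) / (NIR + SWIR2) (for Landsat 8 OLI, typically bands 5 and 7), and dNBR is the pre-fire NBR minus the post-fire NBR. The sketch below uses small hypothetical reflectance arrays; any severity class thresholds applied to dNBR would need local calibration, consistent with the caution above.

```python
import numpy as np

def nbr(nir, swir2):
    """Normalized Burn Ratio from near-infrared and shortwave-infrared reflectance."""
    nir = np.asarray(nir, dtype=float)
    swir2 = np.asarray(swir2, dtype=float)
    denom = nir + swir2
    # Return NaN instead of dividing by zero (e.g., over masked or no-data pixels)
    return np.divide(nir - swir2, denom, out=np.full_like(denom, np.nan), where=denom != 0)

# Hypothetical 2x2 surface reflectance patches before and after a fire
pre_nir = np.array([[0.30, 0.32], [0.28, 0.31]])
pre_swir2 = np.array([[0.10, 0.12], [0.11, 0.10]])
post_nir = np.array([[0.15, 0.30], [0.12, 0.29]])
post_swir2 = np.array([[0.20, 0.13], [0.22, 0.11]])

# Larger dNBR values indicate a larger post-fire drop in NBR (more severe burning)
dnbr = nbr(pre_nir, pre_swir2) - nbr(post_nir, post_swir2)
print(np.round(dnbr, 3))
```

In this toy example the upper-left pixel, where NIR reflectance fell and SWIR2 rose after the fire, has the largest dNBR.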
Other fire ecology research has historically relied on in situ field sampling, although many of the ML applications attempt to resolve features of interest from remotely sensed data. Smoke data can also be derived from remotely sensed imagery or from air quality sensors (e.g., PM2.5, atmospheric particulate matter with a diameter of less than 2.5 µm).
Continued advances in remote sensing and in weather and climate modeling, together with the improved quality and availability of remotely sensed data products, have increased the availability of large spatiotemporal datasets, which presents both an opportunity and a challenge for the application of ML methods in wildfire research and management. The era of "big data" has seen the development of cloud computing platforms that provide the computing and data storage facilities needed to handle these large datasets. For example, in our review we found two papers (Crowley et al., 2019; Quintero, Viedma, Urbieta, & Moreno, 2019) that used Google Earth Engine, which integrates geospatial datasets with a coding environment (Gorelick et al., 2017). In any case, data processing and management play an important role in the use of large geospatial datasets.

Model selection and accuracy
Given a wildfire science question or management problem and available relevant data, a critical question is: what is the most appropriate modeling tool to address the problem? Is it a standard statistical model (e.g., linear regression or LR), a physical model (e.g., FIRETEC or another fire simulator), an ML model, or a combination of approaches? Moreover, which specific algorithm will yield the most accurate classification or regression? Given the heterogeneity of research questions, study areas, and datasets considered in the papers reviewed here, it is not possible to answer these questions comprehensively with respect to ML approaches. Even where multiple studies used the same dataset (Alberg, 2015; Al Janabi et al., 2018; Castelli et al., 2015, ?; Cortez & Morais, 2007; H. Li et al., 2018; Safi & Bouroumi, 2013; Storer & Green, 2016), the different research questions considered meant that ML methods could not be directly compared across studies. However, a number of individual studies did compare multiple ML methods, or ML and statistical methods, for a given wildfire modeling problem and dataset. Here we highlight some of their findings to provide guidance on model selection. In our review (see section 4 and the supplementary material), we found 28 papers comparing ML and statistical methods; in the majority of these, ML methods were found to be more accurate than traditional statistical methods (e.g., GLMs) or to perform similarly (e.g., Bates et al., 2017; de Bem et al., 2018; Pu & Gong, 2004). In only one study, on climate change (Amatulli et al., 2013), was MARS found to be superior to RF for the analytical task. A sizable number of the comparative studies (14) involved classification problems that used LR as a benchmark against ANN or ensemble tree methods.
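When comparing an ML method against a statistical benchmark, a fair comparison uses the same data, the same cross-validation folds, and the same metric for every model. A minimal sketch with scikit-learn, using a synthetic stand-in dataset (an assumption; no wildfire data are involved) and ROC AUC as the metric:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a fire/no-fire dataset (not real wildfire data)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=1)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=1),
}
# One fixed set of stratified folds is reused for every model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="roc_auc") for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: AUC {s.mean():.3f} ± {s.std():.3f}")
```

Which model scores higher depends on the data; the point of the sketch is the shared evaluation protocol, not the ranking.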
For studies comparing multiple ML methods, there was considerable variation in the choice of most accurate method; however, in general ensemble methods tended to outperform single classifier methods (e.g., Dutta et al. (2016); Mayr et al. (2018); Nelson et al. (2017); Reid et al. (2015); Stojanova et al. (2012)), except in one case where the most accurate model (CART) was also the most parsimonious (Coffield et al., 2019). A few more recent papers also highlighted the advantages of DL over other methods. In particular, for fire detection, Q. X.  compared CNNs with SVM and found that CNNs were more accurate, while Y. Zhao et al. (2018) similarly found CNNs superior to SVMs and ANNs. For fire susceptibility mapping, G. Zhang et al. (2019) found CNNs were more accurate than RF, SVMs, and ANNs. For time series forecasting problems, Liang et al. (2019) found LSTMs outperformed ANNs. Finally, Y. Cao et al. (2019) found that using an LSTM combined with a CNN led to better fire detection performance from video compared with CNNs alone.
In any case, more rigorous inter-model comparisons are needed to reveal under which conditions, and in what sense, particular methods are more accurate, as well as to establish procedures for evaluating accuracy. ML methods are also prone to overfitting, so it is important to evaluate them on robust test datasets using appropriate cross-validation strategies. In general, one seeks to minimise the errors associated with both under-specification and over-specification of a model, a problem known as the bias-variance trade-off (Geman, Bienenstock, & Doursat, 1992). Several recent advances have been made to reduce overfitting in ML models, for instance regularization techniques for DNNs (Kukačka, Golkov, & Cremers, 2017). Moreover, when interpreting comparisons between ML and statistical methods, we should be cognizant that, just as some ML methods require expert knowledge, the accuracy of statistical methods can also vary with the skill of the practitioner. M. P. Thompson and Calkin (2011) also emphasize the need to identify sources of uncertainty in modeling so that they can be better managed.
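The bias-variance trade-off and the role of cross-validation can be seen by varying model complexity and comparing training accuracy with cross-validated accuracy. The sketch below uses scikit-learn's `validation_curve` with decision trees of increasing depth on synthetic data with label noise (all settings are illustrative assumptions).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, which deep trees will memorize
X, y = make_classification(n_samples=600, n_features=10, n_informative=3,
                           flip_y=0.1, random_state=0)
depths = [1, 2, 4, 8, 16]
train_scores, cv_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5, scoring="accuracy")

for d, tr, te in zip(depths, train_scores.mean(axis=1), cv_scores.mean(axis=1)):
    # A widening gap between training and cross-validated accuracy signals overfitting
    print(f"max_depth={d:>2}: train={tr:.3f}  cv={te:.3f}")
```

Training accuracy keeps rising with depth, whereas cross-validated accuracy typically peaks at a moderate depth and then declines; the widening gap is the signature of overfitting.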

Implementation Challenges
Beyond data and model selection, two important considerations for model specification are feature selection and spatial autocorrelation. Knowledge of the problem domain is extremely important in identifying a set of candidate features. However, while many ML methods are not limited by the number of features, more variables do not necessarily make for a more accurate, interpretable, or easily implemented model (Breiman, 2001; Schoenberg, 2016), and can lead to overfitting and increased computational time. Two ML methods used to select a reduced, more optimal set of features are GAs and PSO. Sachdeva et al. (2018) used a GA to select input features for BRT and found that this gave the best accuracy compared with ANN, RF, SVM, SVM with PSO (PSO-SVM), DTs, logistic regression, and NB. Hong et al. (2018) employed a similar approach for fire susceptibility mapping and found that it improved both SVM and RF compared with their non-optimized counterparts. Tracy et al. (2018) used a novel random subset feature selection algorithm, which they found led to higher AUC values and lower model complexity. Jaafari et al. (2019) used an NFM combined with the imperialist competitive algorithm (an evolutionary algorithm related to GAs) for feature selection, which led to very high model accuracy (0.99) in their study. Tien Bui et al. (2017) used PSO to choose inputs to an NFN and found that this improved results. G. Zhang et al. (2019) also considered the information gain ratio for feature selection. As noted by Moritz et al. (2012) and Mayr et al. (2018), spatial autocorrelation should also be taken into account when modeling fire probabilities spatially. In general, the presence of spatial autocorrelation violates the independence assumption of parametric models, which can degrade model performance.
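A minimal sketch of GA-based wrapper feature selection: each chromosome is a binary mask over candidate features, and its fitness is the cross-validated AUC of a model trained on the selected features minus a small penalty per feature. The operators (tournament selection, uniform crossover, bit-flip mutation, elitism), the logistic-regression fitness model, the penalty weight, and the synthetic data are illustrative choices, not those of Sachdeva et al. (2018) or Hong et al. (2018).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# With shuffle=False and no redundant features, the informative features are the first 4 columns
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)

def fitness(mask):
    """Cross-validated AUC on the selected features, minus a hypothetical size penalty."""
    if not mask.any():
        return 0.0
    auc = cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y,
                          cv=5, scoring="roc_auc").mean()
    return auc - 0.005 * mask.sum()

def ga_select(n_features, pop_size=16, n_gen=10, p_mut=0.1):
    pop = rng.random((pop_size, n_features)) < 0.5          # random binary chromosomes
    for _ in range(n_gen):
        fit = np.array([fitness(ind) for ind in pop])
        # Tournament selection: keep the fitter of two randomly drawn individuals
        idx = rng.integers(0, pop_size, (pop_size, 2))
        parents = pop[np.where(fit[idx[:, 0]] >= fit[idx[:, 1]], idx[:, 0], idx[:, 1])]
        # Uniform crossover between each parent and its neighbour
        swap = rng.random((pop_size, n_features)) < 0.5
        children = np.where(swap, parents, np.roll(parents, 1, axis=0))
        # Bit-flip mutation
        children ^= rng.random((pop_size, n_features)) < p_mut
        # Elitism: carry the best individual of this generation forward unchanged
        children[0] = pop[fit.argmax()]
        pop = children
    fit = np.array([fitness(ind) for ind in pop])
    return pop[fit.argmax()]

best = ga_select(X.shape[1])
print("selected feature indices:", np.flatnonzero(best))
```

The size penalty is what pushes the search toward parsimonious subsets; without it, noise features that happen to add a little AUC would tend to accumulate.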
One approach to dealing with autocorrelation is to subsample the data to remove spatial autocorrelation (Moritz et al., 2012). It is also often necessary to subsample non-fire locations because of the class imbalance between ignitions and non-ignitions (e.g., Y. Cao et al. (2017); G. Zhang et al. (2019)). Song, Kwan, Song, and Zhu (2017) considered spatial econometric models and found that a spatial autocorrelation model worked better than RF, although S. J. Kim et al. (2019) note that RF may be robust to spatial autocorrelation with large samples. In contrast to many ML methods, a strength of CNNs is their ability to exploit spatial correlation in the data to extract spatial features.
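A minimal sketch of the two subsampling steps described above, assuming hypothetical projected coordinates in km: spatial thinning keeps at most one randomly chosen point per grid cell, and random undersampling draws as many non-fire (background) points as there are thinned fire points. The cell sizes are arbitrary choices; thinning reduces, but does not guarantee the removal of, spatial autocorrelation, and spatially blocked cross-validation is a complementary safeguard.

```python
import numpy as np

rng = np.random.default_rng(7)

def thin_to_grid(xy, cell_size):
    """Keep one randomly chosen point per grid cell to reduce spatial clustering."""
    cells = np.floor(xy / cell_size).astype(int)
    order = rng.permutation(len(xy))          # random order decides which point survives
    _, first = np.unique(cells[order], axis=0, return_index=True)
    return xy[order[first]]

def undersample(xy_neg, n_keep):
    """Randomly draw n_keep non-fire points (without replacement) to balance classes."""
    idx = rng.choice(len(xy_neg), size=min(n_keep, len(xy_neg)), replace=False)
    return xy_neg[idx]

# Hypothetical projected coordinates in km over a 50 km x 50 km area
fires = rng.uniform(0, 50, size=(400, 2))
non_fires = rng.uniform(0, 50, size=(20000, 2))

fires_thinned = thin_to_grid(fires, cell_size=5.0)   # at most 10 x 10 = 100 points remain
background = undersample(thin_to_grid(non_fires, 1.0), n_keep=len(fires_thinned))
print(len(fires), len(fires_thinned), len(background))
```

Balancing the classes this way changes the base rate seen by the model, so predicted probabilities should be interpreted as relative rather than absolute unless they are recalibrated.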

Interpretation
A major obstacle to the adoption of ML methods for fire modeling tasks is the perceived lack of interpretability or explainability of such methods, which are often considered "black box" models. Users (in this case firefighters and fire managers) need to trust ML model predictions, and thus have the confidence and justification to apply these models, particularly where proposed solutions are considered novel. Model interpretability should therefore be an important aspect of model development if models are to be selected and deployed in fire management operations. Interpretability varies significantly across the different types of ML. For example, conventional thinking is that tree-based methods are more interpretable than neural network methods. This is because a single decision tree classifier can be rendered as a flow chart corresponding to if-then-else statements, whereas an ANN represents a nonlinear function approximated through a series of nonlinear activations. However, because they combine multiple trees in an optimized way, ensemble tree classifiers are less interpretable than single tree classifiers. BNs, on the other hand, are one example of an ML technique whose results can be readily explained owing to their graphical representation. Full Bayesian learning on large-scale data is very computationally expensive, which may have limited early applications; however, as computational power has increased, BNs have become more popular in wildfire science and management applications (e.g., Papakosta et al. (2017); Penman et al. (2015)).
DL-based architectures are widely considered to be among the least interpretable ML models, despite the fact that they can achieve very accurate function approximation (Chakraborty et al., 2017). This illustrates the well-known trade-off between prediction accuracy and interpretability (see Kuhn and Johnson (2013) for an in-depth discussion). The ML community, however, recognizes the problem of interpretability, and work is underway to develop methods that allow for greater interpretability of ML methods, including methods for DL (see, for example, McGovern et al. (2019)) and model-agnostic approaches (Ribeiro, Singh, & Guestrin, 2016). Runge et al. (2019) further argue that causal inference methods should be used in conjunction with predictive models to improve our understanding of physical systems. Finally, it is worth noting that assessing variable importance (see Sec. 4.3.4) for a given model can play a role in model interpretation.
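Permutation importance is one simple, model-agnostic way to assess variable importance: each predictor is shuffled in turn on held-out data, and the resulting drop in model skill is recorded. It is a different technique from the local explanations of Ribeiro, Singh, and Guestrin (2016). The sketch below uses scikit-learn's `permutation_importance` with synthetic data; the predictor names are hypothetical labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 1000
X = rng.normal(size=(n, 3))   # hypothetical predictors labelled temperature, humidity, wind
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)   # third column is pure noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Shuffle one column at a time on held-out data and record the drop in R^2
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, mean, std in zip(["temperature", "humidity", "wind"],
                           result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.3f} ± {std:.3f}")
```

Because the synthetic response depends strongly on the first predictor, weakly on the second, and not at all on the third, the importances should rank in that order. Permutation importance can be misleading when predictors are strongly correlated, which is common among fire weather variables.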

Opportunities
Our review highlights a number of potential opportunities in wildfire science and management where ML has not yet been applied or is under-utilized. Here we examine ML advances in other areas of environmental science that address problems analogous to those in wildland fire science and that may be useful for identifying further ML applications. For instance, J. Li, Heap, Potter, and Daniell (2011) compared ML algorithms for spatial interpolation and found that an RF model combined with geostatistical methods yielded good results; a similar method could be used to improve the interpolation of fire weather observations from weather stations, and so enhance fire danger monitoring. Rasp and Lerch (2018) showed that ANNs could improve weather forecasts by post-processing ensemble forecasts, an approach that could similarly be applied to improve short-term forecasts of fire weather. Belayneh, Adamowski, Khalil, and Ozga-Zielinski (2014) used ANNs and SVMs combined with wavelet transforms for long-term drought forecasting in Ethiopia; such methods could also be useful for forecasting drought in the context of fire danger potential. In the context of numerical weather prediction, Cohen et al. (2019) found better predictability with ML methods than with dynamical models for subseasonal to seasonal weather forecasting, suggesting similar applications for long-term fire weather forecasting. McGovern et al. (2017) discussed how AI techniques can be leveraged to improve decision making around high-impact weather. More recently, Reichstein et al. (2019) have further argued for the use of DL in the environmental sciences, citing its potential to extract spatiotemporal features from large geospatial datasets.
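A minimal sketch of the residual-interpolation idea behind combining RF with geostatistics: an RF model captures the trend from covariates, and its residuals are interpolated spatially and added back. To keep the example dependency-free, inverse distance weighting (IDW) is swapped in for the kriging step that a geostatistical approach would normally use; the stations, covariates, lapse rate, and noise are all synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(11)

# Hypothetical weather stations: projected coordinates (km), elevation (m), temperature (°C)
st_xy = rng.uniform(0, 100, size=(60, 2))
st_elev = rng.uniform(200, 2000, size=60)
st_temp = 25 - 0.0065 * st_elev + 0.02 * st_xy[:, 0] + rng.normal(0, 0.5, 60)

def idw(xy_known, values, xy_query, power=2.0):
    """Inverse-distance-weighted interpolation of values to query points."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Step 1: RF trend model from covariates (coordinates and elevation)
features = np.column_stack([st_xy, st_elev])
rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=3, oob_score=True, random_state=0)
rf.fit(features, st_temp)
# Step 2: residuals from out-of-bag predictions (in-sample residuals would be too small)
residuals = st_temp - rf.oob_prediction_

def predict_temperature(xy, elev):
    """RF trend at the query points plus spatially interpolated RF residuals."""
    trend = rf.predict(np.column_stack([xy, elev]))
    return trend + idw(st_xy, residuals, xy)

print(predict_temperature(np.array([[50.0, 50.0]]), np.array([1000.0])))
```

Kriging would additionally model the spatial covariance of the residuals and provide prediction variances, which IDW does not.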
Kussul, Lavreniuk, Skakun, and Shelestov (2017) used CNNs to classify land cover and crop types and found that CNNs improved on the results of standard ANN models; a similar approach could be used for fuels classification, which is an important input to fire behaviour prediction models. Shi, Xie, Zi, and Yin (2016) also used CNNs to detect clouds in remotely sensed imagery and were able to differentiate between thin and thick cloud. A similar approach could be used for smoke detection, which is important for fire detection, as well as for determining the presence of false negatives in hotspot data (due to smoke or cloud obscuration). Finally, recent proposals have called for hybrid models that combine process-based models and ML methods (Reichstein et al., 2019). For example, ML models may replace user-specified parameterizations in numerical weather prediction models (Brenowitz & Bretherton, 2018). Other recent approaches use ML methods to solve nonlinear partial differential equations (Raissi & Karniadakis, 2018; Raissi, Perdikaris, & Karniadakis, 2019). Such methods could find future applications in improving fire behaviour prediction models based on computationally expensive physics-based fire simulators, in coupled fire-atmosphere models, or in smoke dispersion modeling. In any case, the applications of ML outlined here are illustrative and are not meant to be an exhaustive list of all possible applications.

Implications for fire management
We believe ML has been under-utilized in fire management, particularly for problems belonging to predictive or prescriptive analytics. Fire management comprises a set of risk control measures, which are often cast in the framework of the emergency response phases: prevention, mitigation, preparedness, response, recovery, and review (Tymstra, Stocks, Cai, & Flannigan, 2019). In terms of financial expenditure, by far the largest share is spent in the response phase (Stocks & Martell, 2016). In practice, fire management is largely determined by the need to manage resources in response to active or expected wildfires, typically at lead times of days to weeks, or to manage vegetative fuels. This suggests opportunities for increased research in fire weather prediction, fire occurrence prediction, and fire behaviour prediction, as well as in optimizing fire operations and fuel treatments. The identification of these areas, together with the fact that wildfire is both a spatial and a temporal process, reinforces the need for ML applications for time series forecasting.
In this review, we found few papers that used time series ML methods for forecasting problems, suggesting an opportunity for further work in this area. In particular, recurrent neural networks (RNNs) were used for fire behavior prediction (Cheng & Wang, 2008; Kozik et al., 2013, 2014) and fire occurrence prediction (Dutta et al., 2013). The most common variant of RNNs is the Long Short-Term Memory (LSTM) network (Hochreiter & Schmidhuber, 1997), which has been used for burned area prediction (Liang et al., 2019) and fire detection (Y. Cao et al., 2019). Because these methods implicitly model dynamical processes, they should lead to improved forecasting models compared with standard ANNs. For example, Gensler, Henze, Sick, and Raabe (2017) used LSTMs to forecast solar power, and S. Kim, Hong, Joh, and Song (2017) used CNNs combined with LSTMs to forecast precipitation. We anticipate that these methods could also be employed for fire weather, fire occurrence, and fire behaviour prediction.
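To show what distinguishes an LSTM from a feed-forward ANN, the sketch below implements the forward pass of a single LSTM layer in NumPy using the now-standard gate equations (including the forget gate). The weights are random and untrained, and the 14-day, three-variable input is hypothetical; in practice the network would be trained with a deep learning library, and a readout layer would map the hidden states to forecasts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b, hidden_size):
    """Run a single-layer LSTM over a sequence and return all hidden states.

    x_seq: (T, n_inputs); W: (4H, n_inputs); U: (4H, H); b: (4H,).
    Gate blocks are stacked in the order input, forget, candidate, output.
    """
    H = hidden_size
    h = np.zeros(H)   # hidden state (the layer's output at each step)
    c = np.zeros(H)   # cell state (the layer's memory)
    outputs = []
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:H])            # input gate: how much new information to write
        f = sigmoid(z[H:2 * H])        # forget gate: how much old memory to keep
        g = np.tanh(z[2 * H:3 * H])    # candidate memory content
        o = sigmoid(z[3 * H:4 * H])    # output gate: how much memory to expose
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)

# Hypothetical input: 14 days of 3 daily fire-weather variables; random, untrained weights
rng = np.random.default_rng(0)
T, n_in, H = 14, 3, 8
x_seq = rng.normal(size=(T, n_in))
W = rng.normal(scale=0.3, size=(4 * H, n_in))
U = rng.normal(scale=0.3, size=(4 * H, H))
b = np.zeros(4 * H)
hs = lstm_forward(x_seq, W, U, b, H)
print(hs.shape)
```

The cell state `c` carries information across time steps, modulated by the forget and input gates; this recurrence is what lets the network represent temporal dynamics that a feed-forward ANN applied to a single day cannot.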
We note that a number of operations research and management science methods are used in fire management research, including queuing, optimization, and simulation of complex system dynamics (e.g., Martell (2015)), where ML algorithms do not seem to provide an obvious alternative. Examples include the planning models reviewed by Mavsar, González Cabán, and Varela (2013), which simulate the interactions between fire management resource configurations and fire dynamics. In our review, a few papers used agent-based learning methods for fire management. In particular, reinforcement learning was used for optimizing fuel treatments (Lauer et al., 2017) and for autonomous control of aircraft for fire monitoring (Julian & Kochenderfer, 2018a). GAs were used for generating optimal firelines for active fires (Homchaudhuri et al., 2010) and for reducing fire simulation time (Cencerrado et al., 2014). However, more work is needed to identify where ML methods could contribute to tactical, operational, or strategic fire management decision making.
An important challenge for the fire research and management communities is enabling the transition of potentially useful ML models to fire management operations. Although we identified several papers emphasizing that their ML models could be deployed in fire management operations (Alonso-Betanzos et al., 2002; Artés et al., 2016; J. Davis, Nanninga, Hoare, & Press, 1989; J. R. Davis, Hoare, & Nanninga, 1986; Iliadis, 2005; Y. Liu et al., 2015; Stojanova et al., 2012), it can be difficult to assess whether and how a study has been adopted by, or has influenced, fire management agencies. This challenge is often exacerbated by a lack of resources and/or funding, as well as by the different priorities and institutional cultures of researchers and fire managers. One possible solution would be to form working groups dedicated to enabling this transition, preferably at the research proposal phase. In general, enabling operational ML methods will require tighter integration and greater collaboration between the research and management communities, particularly with regard to project design, data compilation and variable selection, implementation, and interpretation. However, this is not a problem unique to ML; it is a long-standing and common issue in many areas of fire research and other applied science disciplines, where continuous effort is required to maintain communication and relationships between researchers and practitioners.
Finally, we stress that the wildfire research and management communities should play an active role in providing relevant, high quality, and freely available wildfire data for use by practitioners of ML methods. For example, the burned area and fire weather data made available by Cortez and Morais (2007) were subsequently used by a number of authors in their work. It is imperative that the quality of data collected by management agencies be as robust as possible, as the results of any modelling process depend on the data used for analysis. It is worth considering how new data on, for example, hourly fire growth or the daily use of fire management resources could be used in ML methods to yield better predictions or management recommendations; using new tools to answer new questions may require better or more complete data. At the same time, we must recognize that despite ML models being able to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of wildfire processes, while the complexity of some ML methods (e.g., DL) requires dedicated and sophisticated knowledge for their application (we note that many of the most popular ML methods identified in this review, such as RF, MaxEnt, and DTs, are fairly easy to implement). The observation that no single ML algorithm is superior for all classes of problem, an idea encapsulated by the "no free lunch" theorem (Wolpert, 1996), further reinforces the need for domain-specific knowledge. Thus, the proper implementation of ML in wildfire science is a challenging endeavor, often requiring multidisciplinary teams and/or interdisciplinary specialists to produce meaningful results.

A word of caution
ML holds tremendous potential for a number of wildfire science and management problem domains. As indicated in this review, much work has already been undertaken in a number of areas, although further work is clearly needed on problems specific to fire management. Despite this potential, ML should not be considered a panacea for all fire research areas. ML is best suited to problems where there is sufficient high-quality data, and this is not always the case. For example, for problems related to fire management policy, data is needed at large spatiotemporal scales (i.e., ecosystem or administrative spatial units at timescales of decades or even centuries), and such data may simply not yet exist in current inventories. At the other extreme, data is needed at very fine spatiotemporal scales for fire spread and behavior modeling, including high resolution fuel maps and surface weather variables, which are often not available at the required scale and are difficult to acquire even in an experimental context. Another limitation of ML may arise when one attempts to make predictions where no analog exists in the observed data, as may be the case when predicting fire under climate change.
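To make the extrapolation limitation concrete, the following minimal Python sketch uses k-nearest-neighbour (k-NN) regression as a simple stand-in for data-driven learners in general. The data are synthetic and purely illustrative, not drawn from any reviewed study: a hypothetical predictor `x` (think of a fire-weather index) with a target that grows linearly as `y = 2x`, observed only for `x` between 0 and 10. The names `train_x`, `train_y`, and `knn_predict` are hypothetical.

```python
# Synthetic, purely illustrative data (not from any study): a hypothetical
# predictor x with target y = 2x, observed only for x = 0, 1, ..., 10.
train_x = [float(x) for x in range(11)]
train_y = [2.0 * x for x in train_x]


def knn_predict(x, k=3):
    """k-NN regression: average the targets of the k training points closest to x."""
    # Sort training indices by distance to x; ties keep their original order.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


# Compare the true relation with the model inside and outside the observed range.
for x in [5.0, 10.0, 15.0, 30.0]:
    print(f"x={x:5.1f}  true={2.0 * x:5.1f}  predicted={knn_predict(x):5.1f}")
```

Inside the observed range the prediction matches the underlying relation (10.0 at x = 5), but at x = 15 and x = 30 the model returns 18.0 in both cases, because the nearest observed points are always the same ones at the upper edge of the data. Tree ensembles such as RF behave similarly in this respect, since their regression outputs are averages of training targets and therefore cannot exceed the observed range.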

Conclusions
Our review shows that the application of ML methods in wildfire science and management has been steadily increasing since their first use in the 1990s, across core problem domains and using a wide range of ML methods. The bulk of work undertaken thus far has used traditional methods such as RF, BRT, MaxEnt, SVM, and ANNs, partly due to their ease of application and partly due to their simple interpretability in many cases. However, problem domains associated with predictive (e.g., predicted fire behavior) or prescriptive analytics (e.g., optimizing fire management decisions) have seen much less work with ML methods. We therefore suggest that opportunities exist for both the wildfire community and ML practitioners to apply ML methods in these areas. Moreover, the increasing availability of large spatiotemporal datasets, from climate models or remote sensing for example, may be amenable to the use of deep learning methods, which can efficiently extract spatial or temporal features from data. Another major opportunity is the application of agent-based learning to fire management operations, although many other opportunities exist. However, we must recognize that despite ML models being able to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of wildfire processes across multiple scales, while the complexity of some ML methods (e.g., DL) requires dedicated and sophisticated knowledge for their application. Furthermore, a major obstacle to the adoption of ML methods for fire modeling tasks is the perceived lack of interpretability of such methods, which are often considered to be black-box models. The ML community, however, recognizes this problem, and work is underway to develop methods that allow for greater interpretability of ML methods (see, for example, McGovern et al., 2019).
Data-driven approaches are, by definition, data-dependent: if the fire management community wants to more fully exploit powerful ML methods, we need to consider data as a valuable resource and examine what further information on fire events or operations is needed to apply ML approaches to management problems. Ultimately, wildland fire science is a diverse, multi-faceted discipline that requires a multi-pronged approach, a challenge made greater by the need to mitigate and adapt to a world with more fire.