A data-driven prediction model for Fennoscandian wildfires

. Wildfires are recurrent natural hazards that affect terrestrial ecosystems, the carbon cycle, climate and society. They are typically hard to predict, as their exact location and occurrence are driven by a variety of factors. Identifying a selection of dominant controls can ultimately improve predictions and projections of wildfires in both the current and a future climate. (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) Data-driven (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) models (cid:58)(cid:58)(cid:58) are (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) suitable (cid:58)(cid:58)(cid:58) for (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) identification (cid:58)(cid:58) of (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)


Introduction
Boreal ecosystems, covering large parts of northern North America and Eurasia, comprise one of the world's most extensive biomes (Balshi et al., 2009).In the boreal region, which stores approx.30% of the world's soil carbon pool, fires are the major stand-renewing agent and play a major role in carbon storage and emissions (Flannigan et al., 2009).Boreal fires emit up to 9% of global fire carbon emissions and 15% of global fire methane emissions ::::::: annually : (Van der Werf et al., 2017;Flannigan et al., 2009).The fire season length, fire frequency and burned area have increased in many parts of the boreal region, and fires (Padilla et al., 2015;Randerson et al., 2012).On the other hand, the availability and quality of national fire datasets varies among countries, whereas satellite products allow for consistent transboundary fire data.
Several fire characteristics are of interest, such as the fire regime, emissions, burned area, duration and feedbacks with vegetation.The characteristics all have in common that they rely on the fundamental question of when and where fires are likely to occur.Fire occurrence, or likelihood thereof (i.e.fire danger probability), can be used for monitoring, forecasting and projections, and has been investigated using three main approaches: fire weather indices, global fire models, and data-driven models, as further elaborated below.
Over the last two decades, global fire models (fire-enabled dynamic global vegetation models ::::::: Dynamic ::::::: Global ::::::::: Vegetation ::::::: Models; ::::::: DGVMs) that can be coupled with climate models have been developed (Hantson et al., 2016).Most global fire models are process-based models that estimate fuel load and fuel moisture, based on fire weather indices, or other atmospheric and simulated moisture conditions.These estimates are subsequently combined with the probability of lightning and/or anthropogenic ignition, to determine whether a fire will occur in a grid cell.Lightning rates can be constant in time or based on statistics, whereas population density is typically used to estimate human ignition probabilities as well as human suppression of fires.An advantage of global fire models is that they allow for examining the feedbacks between fire, vegetation and climate (Hantson et al., 2016).However, this relies on how well the model is able to represent reality.Whereas the global fire models have shown to reproduce the seasonality in burned area well, they vary in their ability to represent the spatial pattern in burned area, and are generally unable to represent interannual variations (Hantson et al., 2020).
Despite the :::: these : challenges, bottom-up approaches are needed due to their ability to :::::: valuable ::: as :::: they : facilitate an explicit link between climate science and societal or environmental impacts, as emphasised by the compound event framework among others (Van der Wiel et al., 2020;Zscheischler et al., 2018;Leonard et al., 2014).The approach allows for a high degree of flexibility in terms of potential drivers, target variables and models.This flexibility allows for both detailed regional investigation ::::::::::: investigations, and constructing models transferable to future ::::::: different ::::::: (future) : climate scenarios (e.g.Goulart et al., 2021).As opposed to the fire weather indices and global fire models, data-driven models do not require a priori assumptions of the dominant mechanisms and physical processes controlling fire occurrences (except indirectly in the selection of potential predictors).This makes data-driven models suitable when the nature of the processes is complex and/or partly unknown, and when the emphasis is on accurate predictions or identification of dominant controlling mechanisms (Parisien and Moritz, 2009).Data-driven models can in this way both help improve process-based models, and work as independent prediction models.A data-driven approach allows us to pinpoint controlling variables :::::: identify ::::::::: controlling :::::: factors : for fire occurrences for specific regions.This is especially important for regions such as Fennoscandia, which possess special ::::: highly :::::: variable : hydroclimatological conditions, and where the current number of (satellite-detected) fires are relatively low as compared to other regions of the world.
In this study, we developed a temporally and spatially explicit data-driven machine learning model for Fennoscandia to reach two main objectives: -Identify the dominant predictors of wildfires -Construct monthly fire danger probability maps A satellite-based burned area product was used to construct the target dataset of fire occurrences over the period 2001-2019 at a monthly time step.We chose a random forest (RF) algorithm to train the model and identify dominant predictors from a predefined set of 30 hydrometeorological and land cover based metrics :::::: indices.To have trust in the model, it was tested on an independent dataset using the Area Under the Curve of the Receiver Operating Characteristic (ROC-AUC).In addition, we answered ::: aim :: to :::::: answer : the following research questions: 1. How well does the data-driven model perform as compared to the Canadian Fire Weather Index (FWI) , which ::: that is developed for similar biomes and latitudes as Fennoscandia?
5. What is the effect of not including a dynamical vegetation index?
The paper is structured as follows: Sect. 2 provides a detailed description of the data sources and methods applied.The results are presented in Sect.3. A discussion is provided in Sect.4, followed by concluding remarks given in Sect. 5. ::: The

Satellite-based fire occurrence dataset
Data used for the target variable was the gridded fire burned area European Space Agency Climate Change Initiative (ESA-CCI) product version 5.1.1cds,downloaded from the Copernicus Data Store (ESA-CCI, 2020).This dataset holds the same information as the Fire ECV Climate Change Initiative Project (Fire CCI) burned area product version 5.1, and is based on Terra ::: the ::::::::: quality :: of ::: the ::::::::::: observations.::: The ::: fire :::::: burned :::: area :::::: dataset :: is :::::::: available :::: both :: as : a ::::: 0.25 occurring :: in ::: two :: or ::::: more ::::: cells) :: is :::::: reduced ::::: when ::::: using ::: the :::::: coarser ::::: scale :: of ::: the :::: grid :::::: product :: as ::::::::: compared :: to ::: the :::: pixel ::::::: product.: From the global fire burned area dataset, Fennoscandia was selected, and a two-class target variable of fire occurrence was constructed by classifying each data point as 1 (fire) if the burned area of the data point exceeded zero, and 0 (no-fire) otherwise.Thus, each data point (i.e. each grid cell for each time step) was independently considered as either a fire occurrence or not, and no merging was performed for : of : fires that potentially extended multiple grid cells or months.The original monthly time step, period 2001-2019, and 0.25 • longitude/latitude regular grid of the dataset were kept to have as many data points as possible.However, the months October to March were omitted each year, as they had less than 20 fire occurrences over the whole period and all grid cells, which was considered too few occurrences for the analysis.Figure 2 shows the distribution of the fire occurrences over time and space.There is an extreme imbalance between the two classes (fire and no-fire) in the target variable, with only 0.3% of the data points classified as fire.
Figure 3 shows the differences in the spatial and temporal distribution of fire data points :::: (i.e.::: grid ::::: cells :::: with ::::::: recorded :::: fire ::::::::: occurrence :: at : a ::::: given ::::: time :::: step) : in the Norwegian fire occurrence dataset and the satellite-based fire occurrence dataset for the spatial and temporal domain of the Norwegian fire occurrence dataset.There are substantial differences between the two datasets.The Norwegian dataset has in total 800 fire data points, as opposed to 24 in the satellite-based dataset.Whereas the three months of the highest number of fire data points in the Norwegian dataset are May-July, no fire data points exist for June and July in the satellite-based dataset.The unusual high number of fires in 2018 is not reflected in the satellite-based dataset.
Finally, the Norwegian dataset show a higher spread in fire occurrence across Norway as compared to the satellite-based dataset.

Criteria for potential predictors
The selection of potential predictors was governed by three criteria: available in most climate models, transferable to future ::::::: different climate scenarios, and compatible with the spatiotemporal resolution and domain of the target data.In addition, the data should be of high quality and the predictor :::::::: predictors : should have a physical interpretation.
By limiting the selection of potential predictors to the two first criteria, we allow for investigations of fire hazard prediction ::::: danger :::::::::: probability : as modelled by the data-driven model for different potential future climate scenarios.Due to the lack of human influence variables ::::: being :::::::::: represented in many climate models, predictors such as human infrastructure, settlement, and The third criterion ensured compatibility of all data used in the analysis.In order to carry out the data-driven approach, all potential predictors needed ::: need : to have the same spatiotemporal domain and resolution as used for the satellite-based target variable.Thus, all potential predictors were spatially constrained to Fennoscandia using a spatial resolution of 0.25 • , and consisted of monthly values from April to September over the period 2001-2019.After the preparation of the potential predictors, the spatial domain was further limited to the grid cells for which all data (including the target variable) had values for all time steps, and to grid cells where the fraction of burnable area (fraction_burnable) was greater than zero.   .Statistical dependency of the potential predictors (abbreviations according to Table 1) using the full dataset: a) Spearman rank correlation and b) hierarchical clustering using Ward's Method (Ward, 1963).

Data and pre-processing of potential predictors
The final selection comprised 30 potential predictors, several of which were highly correlated (Fig. 4).The derivation of each potential predictor is described in the following sections, ordered by categories as given in Table 1 (first column).

Precipitation, temperature and meteorological drought indices
The precipitation and temperature predictors were calculated based on ensemble mean daily precipitation totals (rr), and daily minimum, maximum and mean air temperature (tn, tx and tg, respectively) at 0.25 • longitude/latitude from E-OBS version 23.1e (Cornes et al., 2018).E-OBS is a European dataset based on the European Climate Assessment and Dataset station information (ECA&D), with data covering the period 1950 until near-present.
From the daily values, monthly precipitation sums (rr_sum) and monthly means of tn, tx and tg (tn_mean, tx_mean and tg_mean, respectively) were calculated.The corresponding normalised anomalies (rr_sum_anomaly, tn_mean_anomaly, tx_mean_anomaly and tg_mean_anomaly) were calculated by subtracting the 1991-2020 mean and dividing by the 1991-2020 standard deviation of each month separately.The reference period 1991-2020 was chosen to follow the most recent normal period (WMO guidelines).An additional metric ::::: index, tx_max, was constructed by extracting the maximum daily tx for each month.
Two meteorological drought indices were included; the Standardized Precipitation Index (SPI; Guttman, 1999;McKee et al., 1993) and Standardized Precipitation-Evapotranspiration Index (SPEI; Beguería et al., 2014;Vicente-Serrano et al., 2010), each calculated for a 2, 3, 6 and 9 month accumulation period.The accumulation period is added to the abbreviation, e.g.SPI3 (representing a 3-month accumulation period).Both SPI and SPEI are indicators of the dryness/wetness of a site as compared to normal, and can be compared across locations with different climatology and highly non-normal precipitation distributions (Stagge et al., 2014).Whereas SPI is an estimate of the precipitation anomaly, SPEI estimates the anomaly in the climatic water balance defined as precipitation minus potential evapotranspiration (PET).We used the Hargreaves equation (Hargreaves and Samani, 1985) to estimate PET, following the recommendation by Stagge et al. (2014).The Hargreaves equation estimates daily PET based on each day's tg, a proxy for net radiation (tx minus tn), and an estimate of extraterrestrial radiation based on the grid latitude and day of the year.To compute the SPI (SPEI), the precipitation (precipitation minus PET) for a given accumulation period during the reference period is fitted to a parametric distribution, then non-exceedance probabilities from the distribution is transformed to the standard normal distribution, and finally the standard normal distribution is used to estimate anomalies in terms of standard deviations over a period of interest.As for the precipitation and temperature anomaly calculations, 1991-2020 was used as the reference period.Following the recommendations of Stagge et al. (2015), we applied the gamma distribution including a "centre of mass" adjustment for zero precipitation for SPI, and the generalised extreme value distribution for SPEI.All final SPI and SPEI values were limited to the range -3 to 3 due to the uncertainty related to the extrapolation required for extreme values when based on a limited historical record (Stagge et al., 2015).

Wind speed
Hourly eastward and northward components of 10m wind were derived from ERA5-Land hourly data, which is available from 1981 to near-present at a 0.1 • regular longitude/latitude global grid (Muñoz Sabater, 2019a) .Daily mean values were calculated, and the eastward and northward components were combined to derive the daily mean wind speed.Further, the data was remapped using a second order conservative remapping to match the 0.25 • grid used for the analysis.For each month, the daily mean wind speed was used to derive the monthly mean wind speed (wpeed_mean), and the 10th and 90th percentile of daily wind speed (wspeed_p10 and wspeed_p90, respectively).

Land cover
The fraction of burnable area (fraction_burnable) was extracted from the same dataset as used for the satellite-based target variable (ESA-CCI, 2020).This metric :::: index : represents the fraction of each grid cell that corresponds to vegetated land cover that could burn, and :: i.e. dataset is shifted 0.125 • as compared to the grid used for the analysis, and a second order conservative remapping was therefore applied to remap the data.Finally, the FWI metrics monthly mean FWI (FWI_mean) and monthly max FWI (FWI_max) were calculated from daily FWI values.

Model set-up and training for Fennoscandia
The target variable and potential predictors comprised the dataset used for the model training.Although a machine learning algorithm for constructing the model is automated, several choices have to be made in a model set-up and training procedure.
This section describes these choices and the considerations made, including how to split the dataset into a training and a test set, and the choice of machine learning algorithm, evaluation criterion, complexity terms to be tuned and training procedure specifications.

Training and test set
The dataset was split into a training and a test set to obtain an independent evaluation of the final model.Due to temporal dependency in the time series, a fully random selection of data points to the training and test set would likely give a too optimistic evaluation of the model.Instead, assuming limited dependence between years, whole years were selected for both the training and test set.Five years (26%) were selected for the test set : 2004, 2011, 2013, 2017 and 2018.The remaining 14 years (74%) constituted the training set.The years were selected at random, with two exceptions: 1) the year 2018 was manually chosen to be included in the test set to evaluate the model's prediction capability in what is considered an extreme year in terms of hydrometeorological conditions (Bakke et al., 2020), and 2) the year 2017 was randomly chosen for the test set among the years 2016, 2017 or 2019, to allow ::: for comparison with the Norwegian fire occurrence dataset (Sect.2.6.5) for at least two years (2017 and 2018).

Machine learning algorithm
The machine learning algorithm was required to 1) have a straightforward way of estimating the importance of the predictors, 2) have the ability to deal with non-linearities, 3) have the ability to deal with extreme imbalanced data, and 4) have the ability to predict fire danger using probabilities rather than binary classification into fire/no-fire.A random forest classifier (RF) was chosen as a machine learning algorithm that fulfilled the above requirements.The RF, introduced by Breiman ( 2001), is a model built up of an ensemble of decision tree classifiers (DTs).A DT is a non-parametric supervised learner that builds a tree by splitting the data multiple times based on predictor thresholds ::::: values :::::::::: (thresholds) and performs classification estimates based on the target variable values in the end nodes.The complexity of a DT is determined by the tree size and number of predictors.To reduce the variance (instability) of a single DT, an ensemble of DTs can be built based on bootstrap samples of the data and used for prediction by aggregating the DT estimates (bagging).However, the benefit of aggregating the DT estimates is limited by correlation among the DTs.Random Forest is a variant of bagging that meets this shortcoming by randomly selecting a subset of predictors for each split, hence reducing the correlation among the DTs.The complexity of a RF is determined by the tree size and number of predictors, as well as the number of trees to build.Whereas an increase in tree size and number of (irrelevant) predictors can lead to overfitting, the number of trees to build cannot.However, computational power and computational time limit the number of trees to be built, and the prediction accuracy typically stabilises after a certain number of trees.
Here, we applied the random forest classifier method in the Python package scikit learn (Pedregosa et al., 2011).To control the complexity of the model, the maximum size of each tree (max_depth) and the number of predictors included (Np) were tuned (ref.Sect. 2.5.4).The number of trees was set to 100.The number of predictors to consider for each split was set to the square root of the total number of predictors, as recommended for RF classification problems (Hastie et al., 2009).
To account for the imbalance in the target data, the target classes (fire and no-fire) were weighted inversely proportional to the class frequencies.Remaining parameters were set to default values given by the classifier method.Instead of hardcoded classifications based on the majority class of the end node, probability predictions were calculated based on the proportion of each class.Probability predictions allow for flexibility in the classification threshold, and are needed for the evaluation criterion chosen in this study (Sect.2.5.3).

Evaluation criterion
Model evaluation criteria typically consider all target variable classes with equal weight.Consequently, many traditional evaluation criteria are not applicable due to the extreme imbalance between the number of fire and no-fire data points.As an example for our data, a model predicting no fires at all would have a model accuracy (i.e.proportion of correctly classified data points) of 99.7%, equalling the proportion of no-fires in the target data.This accuracy indicates a near-perfect model for a model that clearly does not meet our objective.Thus, we needed a criterion that is not affected by the extreme imbalance in the target data.
We chose the Area Under the Curve of the Receiver Operating Characteristic (ROC-AUC), which calculates the area under the curve of true-positive rate (Sensitivity) vs false-positive rate (1-Specificity) for different classification thresholds based on the probability predictions (Fawcett, 2006).Thus, it tackles imbalanced data and takes into account probability prediction in one single measure.A ROC-AUC score less than 0.5 indicates a worse than random model, a value of 1 indicates a perfect model.
The ROC-AUC score is 0.5 for a model predicting no fires at all, a model predicting fires for all data points, and (on average for) a model predicting a target dataset in which each data point is randomly selected as fire or no-fire.

Training of the model
To avoid overfitting and to make the model as simple as possible without losing prediction capability, the RF complexity parameter controlling the maximum depth of each tree (max_depth) and the number of predictors (Np) were tuned.The max_depth was tuned from values ranging from 1 (the simplest tree structure) to 20 (a complex tree structure).In case of Np, the model was trained to find the best predictor subset using a backward-stepwise selection procedure.To find the optimal combination of max_depth and Np, the model was trained using cross-validation (CV).
The training set was split into seven CV folds by grouping two and two years keeping the number of fires as constant as possible (due to the low number of fires in some years).Full years were selected for the folds to limit the temporal dependency (Sect.2.5.1).For each CV-iteration, the model was trained on each combination of max_depth and Np.A backward-stepwise selection procedure was implemented to find the best predictor subset for each Np.For each CV iteration and max_depth value, the backward-stepwise selection procedure was as follows: 1) The model was fitted using all predictors, 2) the model evaluation criterion was calculated using the left-out fold, 2) predictor importances were estimated based on the left-out fold, 3) the predictor with the lowest predictor importance was omitted from the predictor subset, and 4) repeating from step 1, now using the new predictor subset instead of all predictors.We also tested an alternative backward-stepwise selection procedure, which explicitly accounted for the high correlation between predictors by omitting the least important of the two most correlated predictors at each step.This method performed less well.A limitation of this method is that predictors with low correlation with other predictors were :: are kept regardless of their importance for the model performance.
Predictor importance was estimated using the permutation importance from the python package scikit learn (Pedregosa et al., 2011), which estimates the decrease in the model score (here: ROC-AUC score) when one of the predictors is randomly shuffled.To increase the robustness of the estimate, the random shuffle was repeated five times, and the mean of the predictor importance of each reshuffling used as the predictor importance estimate.The predictor importance is typically close to zero (or even negative due to randomness) for non-important predictors, as well as for highly correlated predictors.In the case of highly correlated predictors, the predictor importance of one predictor can experience an abrupt increase when a highly correlated predictor is omitted.Thus, the elimination criterion in the backward-stepwise selection cannot depend on a static set of predictor importances (e.g. of the full model; Genuer et al., 2015), but needs to be based on the predictor importances of the updated model for each predictor subset.

Final model selection and evaluation for Fennoscandia
This section describes the selection (Sect.2.6.1), and evaluation (Sect.2.6.2-2.6.5) of the final data-driven model for Fennoscandia.The data-driven model performance was evaluated on the test set, and compared with the performance of FWI (Sect.2.6.2).

Selection of a data-driven model
For each combination of max_depth and Np, the average cross-validation ROC-AUC scores were calculated, and the combination of max_depth and Np yielding the highest score (max_depth_opt and Np_opt, respectively) was selected for the final model.The selection of the optimal predictor subset was not trivial, as the Np_opt predictors selected in each cross-validation iteration could potentially differ due to the varying sub-dataset used as the left-out fold.To select the optimal subset of Np_opt predictors, the seven-fold cross-validation was performed again using max_depth_opt and Np_opt, and the model fitted to each of the predictor subsets selected during the training of the model.The predictor subset yielding the highest average cross-validation ROC-AUC score was chosen for the final model.

Model evaluation
The predictability of the final model was evaluated for the test set (i.e. the years not included in the training of the model) using the ROC-AUC criterion (ref.Sect. 2.5.3).The prediction capability of the data-driven model was then compared with the ROC-AUC of the FWI metrics; FWI_mean and FWI_max (Sect.2.4).

Predictor importance
To estimate the importance of each predictor selected for the final model, we used permutation importance, as described in Sect.2.5.4,but now using ten random shuffles.For comparison, the impurity-based importances (from the Python package scikit learn; Pedregosa et al., 2011) were also estimated.The impurity-based importance estimates the normalised total reduction of the criterion introduced by each predictor.Although it is typically biased towards predictors with high cardinality, typically an issue for numerical predictors, this was not considered a problem here because all input data were continuous.The impurity-based importances sum to 1, and the higher a value, the more important the predictor.As opposed to permutation importancethat , ::::: which : can be estimated on any dataset, the impurity-based importance estimates are based on the training set only, which can be misleading in the case of overfitting.On the other hand, in the case of highly correlated predictors, the impurity-based importance may still give a better representation of the importances (as compared to permutation importances) due to the randomness in the subset of predictors selected for each split in the tree construction.Due to differences in the computation and the pros and cons of the two predictor importance estimates, we applied both the permutation importance and the impurity-based importance.Permutation importances were estimated for the training set and test set separately.

Fire danger probability maps
Fire danger probability maps were produced for each month and year in the test set using the prediction probabilities of the datadriven model, and plotted together with fire occurrences from the satellite-based fire occurrence dataset.Similar maps based on FWI_mean and FWI_max were also produced for comparison.In addition, the gridwise Spearman rank correlation between the model predictions, FWI_mean and FWI_max were calculated to reveal any consistent spatial patterns in the agreement (or lack thereof) of the :: fire :::::: danger : predictions.

Model evaluation using the Norwegian fire occurrence dataset
The Norwegian fire occurrence dataset was included as an independent dataset to evaluate the data-driven model's ability to predict a more detailed fire dataset.The two years included in both the test set for Fennoscandia and the Norwegian fire occurrence dataset (i.e.2017 and 2018) were used as basis for this evaluation, with Norway as the spatial domain.Therefore, a two-fold cross-validation was applied (instead of the seven-fold used in the main analysis) in which each year constituted a fold.The remaining training set-up follows the procedure as described in Sect.2.5.

Additional experiments
Two additional experiments were performed to test the effect of: (1) using two alternative machine learning algorithms, and ( 2) not including a dynamical vegetation index as a potential predictor.A description of the experiments are given below.
Two additional, but related, machine learning algorithms were tested to assess the applicability of using the Random Forest algorithm.First, we tested a Decision Tree model (DT; ref.Sect.2.5.2) to see if a single tree was sufficient to construct a good prediction model.The DT was trained using the same procedure as for the RF model (Sect.2.5), except that all predictors were evaluated as candidates for each split.Second, we tested a boosting algorithm called AdaBoost (Freund and Schapire, 1997).
Boosting algorithms are considered one of the most powerful learning ideas introduced in this century (Hastie et al., 2009).
The algorithm makes predictions based on aggregation of results from a selection of constructed weak classifiers (here: DT classifier with max_depth=1).For each new classifier constructed after the first, the data points are weighted based on previous misclassifications, in order to give more emphasis on the misclassified observations.AdaBoost was trained using the same procedure as for the RF model (Sect 2.5), except that instead of max_depth, the parameter defining the maximum number of weak classifiers to construct (either 50, 100, 200 or 500) was tuned.

Selection of a data-driven model for Fennoscandia
Based on the average cross-validation (CV) ROC-AUC scores for all combinations of max_depth and number of predictors (Np), the highest score was found for max_depth=9 and Np=15 (Fig. S1).The CV scores revealed that there were no clear best-choice of max_depth independent of the selected number of predictors, and vice versa.This emphasises the importance of testing the combined effect and not fit max_depth and Np separately.For many combinations, the ROC-AUC values were similar (varying only on the second decimal), opening for selecting a simpler model without too high cost of prediction capability.We chose to automatically select the combination yielding the best score, to avoid an extra element of subjectivity.
The max_depth of nine was well within the investigated max_depth values.The degrading scores on both ends (i.e. the lowest and highest max_depth values investigated) gave trust in that the range of max_depth values investigated was sufficient.Of the original 30 potential predictors, half remained for the final data-driven model.The number of trees (i.e.100) used for the data-driven model were considered sufficient, as indicated by the stabilisation of the model performance (Fig. S2).
Of the 15 predictors, five were related to volumetric soil water (swvl1, swvl2, swvl3, swvl4 and swvl2_anomaly), three to temperature (tg_mean, tn_mean and tx_mean), three to wind speed (wspeed_mean, wspeed_p10 and wspeed_p90), two to precipitation (rr_sum and rr_sum_anomaly), and the two predictors snow cover (snowc) and fraction of burnable area (fraction_burnable).All meteorological drought indices and temperature anomalies were omitted.For precipitation and soil moisture, both monthly mean (or sum) values and monthly anomalies were selected.Notably, many of the selected predictors were highly correlated (Fig. 4).

Predictor importance
Figure 6 shows the importances of the predictors, using the permutation importance estimated for the training and test set separately, as well as the impurity-based importance estimates.The dominant predictor in all cases was the volumetric soil water anomaly in soil layer 2 (swvl2_anomaly).The remaining predictors were of significantly less, and similar, importance.
In all three algorithms tested, anomaly in shallow volumetric soil water (either in soil layer 1 or 2), monthly mean of daily maximum temperature and volumetric soil water in soil layer 4 were the top three most important predictors (Fig. 6, S3b and S4b).The permutation importance estimates of the training set is generally higher than that of the test set.This is expected since the predictor subset was chosen based on the training set.Differences in the hydrometeorological conditions in the training set years and test set years may also play a role in explaining the differences in the order and magnitude of the permutation importance estimates.
The bivariate and univariate distributions of the most dominant predictor (swvl2_anomaly) and the predicted fire danger probabilities are shown in Fig. 7.The figure shows a clear distinction between prediction ::: fire :::::: danger probabilities for fire and no-fire data points for the training set (Fig. 7a), and less distinct so in the test set (Fig. 7b).For both the training and test set, most fire data points occur for swvl2_anomaly values below zero (dryer soil that normal), and there is a weak negative relationship between the swvl2_anomaly values and prediction ::: fire :::::: danger probabilities.However, a high density of prediction ::: fire :::::: danger probabilities of approx.zero are distributed along a relatively wide range of swvl2_anomaly values (between approx.-1.5 and 1), pointing to the importance of other predictors in minimising the fire danger probabilities for these data points.
In the experiment including NDVI as potential predictor, the test set ROC-AUC score was 0.799 (Fig. S5) compared to 0.791 for the baseline.The most dominant predictor for this experiment was still swvl2_anomaly, with NDVI in second place, regardless of the predictor importance estimate.The final model had a more complex tree structure (max_depth=16) and fewer number of predictors ( 11) as compared to the model without NDVI.The difference in the ::::: higher : number of predictors :: in ::: the ::::: model ::::::: without ::::: NDVI can likely be explained by the fact that other predictors are included to compensate for the lack of NDVI information about vegetation health.

Fire danger probability maps
Fire danger probability maps :::::::: composed using the data-driven model and (for comparison) the FWI metrics for 2018 are shown in Fig. 8, and for the remaining test set years in Fig. S6-S9.The maps also include the actual fire occurrences according to the satellite-based target variable (marked as dots).The year 2018 had one of the highest number of fire occurrences during the period (Fig. 2), which is reflected in high-end fire danger values covering large parts of the region, in particular in May-July as well as August in the southeast.By visual inspection, many of the fires occurred in cells of high fire danger predicted by the data-driven model as well as by the FWI metrics.On the other hand, several months have high fire danger in areas with no fire occurrences.This likely reflects either an actual high fire danger but a lack of ignition sources, or a weakness in the ::: fire :::::: danger predictions.
In Fig. 9 a high (>0.8)grid-wise rank correlation in fire predictions :::::::::: correlations :: in :::: fire :::::: danger : between FWI_max and FWI_mean across Fennoscandia reflect temporal agreement in fire predictions ::::: danger : between the two FWI metrics.However, the grid-wise rank correlation between the data-driven model and either of the two FWI metrics are spatially less coherent, with approx.90% of the grid-cell correlations ranging between 0.4 and 0.9.The highest correlations (between 0.7 and 0.9) are found in eastern Fennoscandia (Russia) and along a southwest-northeastern belt from southern Norway, through mid and   The ROC-AUC score of the data-driven model prediction of the original (satellite-based) target variable for Fennoscandia was recalculated for the evaluation period (2017 and 2018) with Norway as the spatial domain.The ROC-AUC score of the Norwegian fire occurrence dataset was 0.755 as compared to 0.726 for the satellite-based fire occurrence dataset (Fig. S10a).
The pronounced differences :::::::: difference : in number of fires (Fig. 3) are :: is reflected in the non-smooth ROC-AUC curves of the  Prediction of the Norwegian fire occurrence dataset using the two FWI metrics had notably higher ROC-AUC score (0.851 for FWI_max and 0.867 for FWI_mean) as compared to 0.755 for the data-driven model trained for Fennoscandia (Fig. S10b).
The monthly fire danger prediction maps for Norway jointly showing fire occurrences, reflect the higher prediction performance obtained by the FWI metrics, in particular in capturing the fire dense areas in May-July 2018 (Fig. S11-S12).The satellitebased target variable had a low representation of fire occurrences in Norway; accordingly, we did not expect an equally good performance of the data-driven model trained for Fennoscandia as for FWI when evaluated for the Norwegian dataset.
A ::: The : separate data-driven model was trained using the Norwegian fire occurrence dataset as the target variable .It ::::: (Sect.::::: 2.6.5)resulted in a final model for Norway with max_depth=2 and Np=8, thus representing : 8. :::: This ::::::::: represents a considerably simpler model compared to the data-driven model trained using the satellite-based fire occurrence dataset for Fennoscandia.This may be explained by the reduced geospatial complexity in the training set.The ROC-AUC score improved from 0.755 to 0.836 for the test set years 2017 and 2018 (Fig. S13a), although still slightly below the FWI performance.As opposed to the data-driven model trained for Fennoscandia, this model selected monthly maximum temperature and monthly mean of daily maximum temperature as the two dominant predictors (Fig. S13b).The remaining predictors were of notably lower importance, and related to (anomalies in) volumetric soil water, high wind speed, precipitation anomaly and fraction of burnable area.

Discussion
A data-driven model developed on a monthly and 0.25 • spatial resolution was found suitable for fire danger probability prediction ::::::: mapping in Fennoscandia, despite the region's spatiotemporal heterogeneity in hydroclimatological conditions.In the following, we discuss the selected predictors, known challenges, the added value of a data-driven model and ways forward.

Dominant predictors of Fennoscandian wildfires
The relatively large number of statistically dependent predictors selected for the data-driven model, illustrate :::::::: illustrates the complexity of the controlling mechanisms of fires.In the literature, a wildfire is often referred to as a compound hazard as it is caused by the co-occurrence of several drivers, not necessarily extreme themselves.
Monthly mean daily maximum temperature (tx_mean) and volumetric soil water in the deepest soil layer (swvl4) have the second and third highest test set permutation importance, ::::::::::: respectively.Together with anomalies in shallow volumetric soil water (soil layer 1 or 2), these predictors were the most dominant also in the additional experiments using Decision Tree and Adaboost : , : instead of Random Forest, : as the machine learning algorithm.Including NDVI as a potential predictor still gives swvl2_anomaly as the dominant predictor, with NDVI as the second, and swvl4 and tx_mean as the fourth and fifth most important according to test set permutation importance.The consistency in terms of predictors chosen between the experiments emphasise ::::::::: emphasises : the importance of these predictors in predicting fire danger probabilities.Whereas tx_mean is the average of the highest daily temperaturesand affects , :::::::: affecting ::: the : general evaporative demand and transpiration, swvl4 is related to slowly changing deep soil moisture, which is important for water uptake by plants with deep roots.Thus, both metrics have separate roles ::::::::::: Accordingly, :::: each ::::: index ::: has :: a ::::::: separate ::: role : in controlling the condition ::::::::: conditions favourable (or non-favourable) for fires, roles that also differ from the role of shallow soil moisture anomaly.
The creation of a data-driven model using the Norwegian fire occurrence dataset as the target variable for the model training, gave a somewhat different selection of dominant predictors.A fewer number of predictors were selected, and the highly correlated tx_max and tx_mean stood out as the two most dominant predictors.We recognise three potential reasons for the difference between this model and the original model developed for Fennoscandia; a change in target variable, study domain and period under investigation, whereof the two latter follow from the first.The target variable was based on ten-fold more fire occurrences for Norway than what was available for the satellite-based dataset for Fennoscandia.It is here worth noting the overall difference in size of the fires recorded in each database, with generally larger fires being represented in satellitebased burned area products than in the Norwegian dataset.Thus, the predictors found for the Norwegian target variable may be more important for small fires, which is typically not included in the satellite-based dataset.Moreover, the difference may reflect different dominant controls of fires in Norway as compared to the remaining part of Fennoscandia, and in particular as compared to the areas with the highest density of fire occurrences in the satellite-based dataset.There is also a likely possibility that the reanalysis data do not represent the volumetric soil water conditions as well in Norway compared to other parts of Fennoscandia.The difference in period used in the model development should also be considered.In the Norwegian model set-up, only two years were used for training and two for evaluation.These years may not reflect the longer period used for Fennoscandia.In general, more trust is given to models trained on longer time series, enabling a better representation of the variability in hydrometeorological conditions.

Model transferability at the cost of additional potential predictors
Ensuring transferability of the data-driven model to climate projections come at a cost of limiting the type of potential predictors.In the case of predicting fire danger in a stationary climate, several additional metrics :::::::: predictors are expected to improve the prediction accuracy.Two such predictors are latitude and month of yearthat : , ::::: which could guide the model to differentiate between important hydrometeorological predictors depending on the season.For example, it is expected that SPEI3 has a different effect on fire danger when the three months cover ::::::::::: accumulation :::::: period ::::: covers : a snow accumulation period as compared to the growing season.However, such predictors are not suitable for a non-stationary climate as the snow and growing season ::::::::::: characteristics : are expected to changein characteristics such as timing and length : , ::: e.g.::: the :::::: timing ::: and ::::::: duration, and the relation to hydrometeorological variables in the current climate, :::::: indices :::: may ::: thus : no longer be valid.

The effect of the type of fire data chosen as target variable
In this study, we selected a satellite-based fire occurrence dataset as the target variable for the main analysis, and a national fire occurrence record as an alternative target variable for comparison.The lack of small fires in the satellite-based dataset was particularly notable when compared with the Norwegian dataset for April-September 2018 (Fig. 3).During 2018, Norway experienced a record high number of grass and forest fires (DSB, 2019).However, many of these fires were small and rather quickly extinguished, and thus not captured by the satellite.FWI outperformed the data-driven model trained on the satellitebased target variable, in predicting the Norwegian fire occurrence dataset.This was unsurprising :: not ::::::::: surprising, as the datadriven model was not trained on the Norwegian dataset, small fires in general, and for most of Norway, no fires at all.A data-driven model trained on small fires in Norway considerably improved the prediction ability, despite that only two years were available for model training.
It is not given which of the data sources for fire occurrence is better to use as target variable.A benefit of national records is that they typically have registered most fire occurrences, including small fires.However, the recording procedures and information logged may have changed over time, and vary from country to country (e.g.Aalto and Venäläinen, 2021).In addition, there are large differences among countries in the period coverage of historical fire recording, and the availability of such datasets, limiting transnational studies.Satellite-based burned area products are typically readily available, consistent across country boundaries and exist for longer periods than what can be found for many national recordings.They also have a global spatial coverage, which allows for a global ::::::::: large-scale application of the proposed methodology, such as an analysis of different drivers in different regions. :: As ::: an ::::::::: alternative :: to :::::: burned :::: area, :::::::::::: satellite-based :::::: active ::: fire ::::::: products :::: can :: be :::: used :: to :::::::: construct :: a ::: fire ::::::::: occurrence ::::::: dataset.Whether or not the lack of small fires in the satellite-based products is a limitation or not, depends on the objective.For example for forecasting, monitoring or projections used for fire preparedness planning in Norway, capturing small fires is vital :: as ::::: small :::: fires :::: have ::: the ::::::: potential ::: to :::::: develop :::: into ::::: large :::: fires :::: with ::::::::: devastating ::::::: impacts.It is worth noting that small fires are not necessarily small following natural conditions, but may be so following the wildfire preparedness and suppression in the area.
Thus, predicting small and large fires may be of similar importance for a region.However, small fires can be ::: fires :::: that :::: stay ::::: small, ::: are of less importance for objectives related to projected changes in fire occurrences relevant to : in ::::: terms ::: of large-scale changes in emissions, albedo or ::: and ecosystem functioning.
First, a data-driven model can be applied as an additional and independent prediction model, as already mentioned in Sect.4.4.
It can either be fire danger probability mapping as exemplified in this study, or it can be combined with ignition and/or spread, for example in a similar way as is done in process-based global fire models.Depending on the construction of the model, it can be applied jointly with both fire weather indices and process-based global fire models.Assessments using multiple models of fundamentally different construction can improve the trust in the predictions, and the knowledge of the inevitable prediction uncertainties.
Second, a data-driven approach can automatically sort out the metrics that are important predictors, and omit the remaining.
A benefit of a data-driven model as developed here, is that the approach can easily be transferred to other regions.This is in opposition to FWI, which is developed for boreal forests, and should be used with caution when applied for other biomes (Bedia et al., 2018;Dowdy et al., 2009).In this work we used a satellite derived fire dataset and globally available hydrometeorological variables, which can all be obtained for other regions around the world.This would only require additional training on the local settings, but would not require a new workflow or model implementation.This regional transferability can be combined with the flexibility of implementing different sets of potential predictors and target variables, allowing for regional and application specific analysis.This was exemplified by the present study, in which a data-driven approach was applied for fire prediction :::::::: predicting ::: fire :::::: danger ::::::::::: probabilities on a understudied region of the boreal region with special ::::: highly ::::::: varying hydroclimatology.
A notable improvement was found when using the local fire occurrence dataset to train the model, illustrating the potential of high-performing data-driven models adapted to local conditions, when high-quality target data is available.The difference in the selection of dominant predictors between the data-driven model constructed for Fennoscandia and Norway, exemplified that certain relationships are more important in some regions than others.The flexibility in spatial domain and predictors also allows for large-scale analysis, more in line with process-based global fire models in terms of constructing regional-independent relationships between drivers and fires.constructing ::::::: monthly ::: fire :::::: danger ::::::::: probability :::::: maps, and as a tool for identifying dominant predictors.Data-driven models have a high degree of flexibility to make them applicable for specific domains :::::: making :::: them ::::::: suitable ::: for ::::::::: adaptation :: to :::: other ::::::: regions and applications.Thus, we regard data-driven models as valuable contributions in a wide range of applications related to fire monitoring, forecasting and projections.

Conclusions
The data-driven approach was found suitable to identify dominant predictors for fire occurrence and to construct spatiotemporal resolved fire danger probability maps in Fennoscandia.Anomalies in the volumetric soil water in soil layer 2 (7-28 cm) were found to be the dominant predictor, followed by monthly mean of daily maximum temperature and volumetric soil water in the deepest soil layer (layer 4; 100-289 cm).Other selected predictors were related to wind speed, precipitation, snow cover and fraction of burnable area.The selected predictors emphasise the importance of other predictors than weather alone, as has traditionally been used for fire weather indices.In addition, the variation in the type of predictors emphasises the complexity in driving mechanisms for fire occurrence and the value of a bottom-up approach to automatically identify the most important predictors.
The following concludes our research questions presented in the introduction: 1.The data-driven model for Fennoscandia was comparable : to : (and slightly outperforming) the Canadian Fire Weather Index (FWI), which is ::: was : developed for similar biomes and latitudes as Fennoscandia.The temporal rank correlations of the fire danger probability maps produced by the two approaches showed large spatial variability, pointing to the value of including more than one approach when mapping fire danger.
Table 1.Target variable (fire occurrence), and potential predictors included in the main analysis.All normalised anomalies are calculated based on each corresponding month's mean and standard deviation from the reference period 1991-2020.The same reference period was used when calculating SPI and SPEI.

Figure 1 .
Figure 1.Flow chart of the data-driven approach for Fennoscandia.Sections given in the flow chart refer to sections in the text where detailed descriptions can be found.Grey, green and dark brown boxes represent steps relevant to model training, model evaluation and both, respectively.

Figure 2 .
Figure 2. Distribution of fire occurrences in the satellite-based target variable over a) time (month and year) and b) space (0.25 • resolution).

Figure 3 .
Figure 3. Distribution of fire occurrences in Norway using a) the satellite-based dataset and b) the Norwegian dataset.The figures show the spatial distribution over Norway (map), and the temporal distribution over the period 2016-2019 (bar plot), as defined by the spatial and temporal domain of the Norwegian fire occurrence dataset.Note the different y-axes of the two bar plots.

Figure 5 .
Figure 5. Test set ROC curve and ROC-AUC score using the data-driven model as compared to using monthly mean FWI (FWI_mean) and monthly max FWI (FWI_max).

Figure 6 .
Figure 6.The importances of the predictors selected for the final data-driven model estimated using the permutation importance of the test and training set separately, as well as the impurity-based feature importance of the training set.The bar height of the permutation importance represents the mean, and the error-bar represents the standard deviation of ten random shuffles of each predictor.::: The ::: bars ::: are :::::: ordered :: by ::: the ::::::::: permutation :::::::: importance :: of ::: the ::: test ::: set.

5103. 5
Model evaluation using the Norwegian fire occurrence dataset 515satellite-based as compared to the Norwegian fire predictions.Due to the low number of fire occurrences in the satellite-based target variable, the corresponding ROC-AUC score must be interpreted with caution.

Figure 7 .
Figure 7.The bivariate and univariate distributions of the anomaly in volumetric soil water in soil layer 2 (swvl2_anomaly) and predicted fire danger probabilities.Results are shown for a) the training set, and b) the test set.The distributions are given separately for the fire data points (red) and no-fire data points (blue) from the satellite-based target variable.The histograms show the relative frequencies, and the box plot whiskers define the 5th to 95th percentile range.

Figure 8 .
Figure 8. Fire danger probability maps for April-September 2018 using a)-f) the data-driven model predictions, g)-l) FWI_mean, and m)-r) FWI_max.Blue markers show fire occurrences using the satellite-based fire occurrence dataset.Colour axes are truncated at the 5th and 95th percentile.

Figure 9 .
Figure 9. Spearman rank correlation of a) the data-driven model and FWI_mean, b) the data-driven model and FWI_max, and c) FWI_mean and FWI_max.The correlations are calculated using the test set.Hatches indicate regions of effective p-value (i.e.p-value accounting for autocorrelation) smaller than 0.01.

4. 5
Added values of a data-driven model and ways forward A data-driven (statistical/machine learning) model differs fundamentally from a process-based global fire model or fire weather index.Process-based models use established or assumed relationships between various metrics :::::: indices and fire occurrence to construct fire danger or fire occurrence models.While a data-driven model is also based on process understanding in the selection of metrics :::::: indices : tested as potential predictors, it differs by explicitly accounting for fire occurrences in relation to the state of the predictors when constructing the :: the ::::::::::: construction ::: the model.
By limiting the potential predictors to those available in most climate model when developing the data-driven model, the final model allows for analyses of future changes in fire danger probabilities given different climate scenarios.Although outside of the scope of this study, using ::::::: applying : our model in future climate scenarios can give valuable new insight into :: as : what to expect of future changes in fire danger probability in Fennoscandia ::::: under ::::: future :::::: climate :::::::: scenarios.In summary, our study has demonstrated the value of a data-driven model as an independent prediction model ::::: model ::: for ::::::::::