A Global Probability‐Of‐Fire (PoF) Forecast

Accurate wildfire forecasting can inform regional management and mitigation strategies in advance of fire occurrence. Existing systems typically use fire danger indices to predict landscape flammability, based on meteorological forecasts alone, often using little or no direct information on land surface or vegetation state. Here, we use a vegetation characteristic model, weather forecasts and a data‐driven machine learning approach to construct a global daily ∼9 km resolution Probability of Fire (PoF) model operating at multiple lead times. The PoF model outperforms existing indices, providing accurate forecasts of fire activity up to 10 days in advance, and in some cases up to 30 days. The model can also be used to investigate historical shifts in regional fire patterns. Furthermore, the underlying data driven approach allows PoF to be used for fire attribution, isolating key variables for specific fire events or for looking at the relationships between variables and fire occurrence.


Introduction
Wildfires play a crucial role in the Earth system influencing ecosystems, communities, air quality, and the carbon cycle (Moritz et al., 2014).Catastrophic wildfire effects include agricultural and infrastructure damage, destruction of wild habitats, and loss of both animal and human life.Wildfire management and prevention strategies can help mitigate some of these effects, but these depend on accurate forecasts of fire occurrence.
The stochastic nature of wildfire ignitions makes forecasting specific events extremely challenging.Beyond ignitions, the two remaining prerequisites for fire activity, fuel and moisture, are intricately linked to the land surface state.Therefore, it is feasible that a probabilistic fire forecast can be achieved with a reasonable degree of accuracy by considering the land surface state.Historically fire danger forecasts have been derived by linking weather variables with fire activity to produce an index, the most widely used is the Fire Weather Index (FWI, Van Wagner, 1974).The FWI provides only an approximate estimation of the state of fuel moisture based on the impact of meteorological conditions on a fixed fuel bed typical of the Boreal forests.It neglects to account for the moisture state of living vegetation, the type of vegetation or the abundance of fuel available to burn in the event of a fire.As a result, the FWI overestimates fire danger in fuel limited regions and occasionally regions with high live moisture content (Di Giuseppe, 2023).Furthermore, as the FWI was developed specifically for Canadian forests, the interpretation of values is complicated when considering different biomes (Viegas et al., 1999;Wotton, 2009).
Previously, process-based models have attempted to describe fire danger based on the land surface, and to a degree the atmospheric state from climate time scales (e.g., Mangeon et al., 2016;Thonicke et al., 2010) to specific events (e.g., Anderson et al., 2015).The complexity in the relationship between vegetation and fire occurrence make probabilistic wildfire occurrence difficult to describe using process-based modeling.Recently, the use of data driven methods has emerged as an option to overcome the complexity of describing the underlying processes (e.g., Kondylatos et al., 2022;Zhang et al., 2021;Zhu et al., 2022).A comparison between the two methods by Leuenberger et al. (2018) showed several advantages to using the data-driven approach.Data-driven methods are effective when required input observations are available; however, this is not an option when generating future forecasts.Here, we propose a global probabilistic fire forecast which derives key input variables using state-of-the-art numerical weather prediction (NWP) and land surface modeling and then uses a data-driven approach to estimate the likelihood of fire occurrence on a given day for multiple forecast lead times.

Materials and Methods
We propose a modeling framework to provide a forecast Probability of Fire (PoF) on a given day within a 9 by 9 km grid cell using data-driven statistical methods informed by observations and an Earth system model.As input, we used analysis output from the Integrated Forecasting System (IFS), which is the operational NWP system used by the European Centre for Medium-Range Weather Forecasts (ECMWF).The IFS is initialized every 6-hr and provides both NWP and land surface output at hourly out to seasonal forecast ranges.As time varying NWP input, we used daily mean precipitation, 10 m wind speed, 2 m dew point temperature, and 2 m temperature taken from the re-analysis product (ERA5-Land, Muñoz Sabater, 2019), which assimilates observations to improve accuracy of modeled variables.Fuel characteristics used as input were computed following McNorton and Di Giuseppe (2023) using additional land surface variables including vegetation cover, vegetation type, soil moisture at 4 different levels, surface pressure, and skin temperature.These characteristics include both dead and live fuel load and moisture, which were further divided into wood and foliage components.Furthermore, as input we used time-varying Leaf Area Index (LAI) for both low (shrubs and grasses) and high (forests) vegetation, based on satellite observations, as described by Boussetta and Balsamo (2021).The remaining input variables were taken from static land cover maps for urban fraction (McNorton et al., 2023), orography and vegetation type (Boussetta et al., 2021).This resulted in a total of 17 input features, 4 directly from NWP, 7 from fuel characteristics, 2 LAI estimates, and 4 land static surface maps.
Using the inputs described, we trained two tree-based data-driven models, Random Forest (RF, Breiman, 2001) and Extreme Gradient Boosting (XGBoost, Chen & Guestrin, 2016), and interpret the output of the classification models as estimates of the likelihood of fire occurrence trained using active fire observations.A probabilistic forecast approach was chosen given a binary classification would almost always return a no fire event as the probability values would fall below 50%.Both RF and XGBoost methods are supervised algorithms which involve training an ensemble of decision trees to resolve the classifier problem.The RF approach randomly subsamples the data to train individual trees which are then clustered to form the final decision tree model (Breiman, 2001).XGBoost improves the performance of decision trees by sequentially adding models to correct errors made by previous ones and uses regularization techniques to prevent overfitting (Chen & Guestrin, 2016).The optimization of XGBoost makes it computationally efficient when compared to RF.In the classifier approach we define a positive hit as an active fire detection within the 9 × 9 km grid cell on a given day.
For training, we used daily data from the MODIS Collection 6 and 6.1 Active Fire Data set between 2010 and 2014 (Giglio et al., 2020).The data set uses retrievals in the mid-infrared and a contextual algorithm (Giglio et al., 2016).The data set likely misses many fire events as it only provides a snapshot at the overpass time, missing fires occurring before and after the overpass window.Furthermore, spurious detections are also likely for non-wildfire events for example, gas flaring.Where possible, these detections were removed using a spurious signal mask.To align with input data, we gridded the active fire data to the same resolution (∼9 km, daily), resulting in 2.9 × 1010 data points (5,136 longitude, 2,560 latitude, and 2,191 days) which was too large for training either model given the computing resources available.To reduce the training data size whilst maintaining a representation of spatial and temporal variability in the trained model we first removed all non-land points from the data set.Of the remaining points we randomly sampled 5% of the data for training, resulting in 3 × 10 8 data points for training.The RF training had increased memory demands, for this only 2% of the data was sampled.The resulting models predict the probability that at least one MODIS active fire detection would be made within a ∼9 km grid cell for a given day, which we assume to be a proxy for the PoF occurrence.
The trained model was then used at multiple forecast lead times using operational NWP forecasts from the IFS.These were simulated from hourly output averaged daily to provide 1-30 days forecasts.The regional and global skill of each forecast lead time was evaluated against a monthly climatology of the XGBoost model output.
Remaining input features can either be taken from observations or model predictions, here we used a combination of both.For example, operationally LAI is taken from a climatology as forecast LAI is not available in real-time; however, for historical evaluation we used observation-based LAI from the CONFESS Project (Boussetta & Balsamo, 2021).With further model development more variables will be derived in the model forecast with the aim of improving forecast skill.

Results
The XGBoost and RF trained models were evaluated globally using daily MODIS active fire observations for a period independent from the training, 2015-2019.The resulting models were also compared with a monthly climatology derived using the XGBoost method for the entire 2015-2019 period to evaluate their relative forecast performance.Given the binary classifier approach resulted in few positive results, as the PoF on a given day for a given grid cell was rarely greater than 50%, typical evaluation metrics, such as the confusion matrix, were not possible.Instead, we focused on the probabilistic skill of the model.

Probability of Fire (PoF) Evaluation
A receiver operating characteristic (ROC) curve was used to show the performance of the models by comparing the probabilistic false positive rate with the true positive rate for land points at ∼9 km resolution at daily resolution between 2015 and 2019 (Figure 1).The true positive rate represents the proportion of correctly forecast fire events, whilst the false positive rate represents the proportion of forecast fires in which no fire occurs, typically as the true positive rate increases so to does the false positive rate.The large number of values which occupy extremely low probability space, for example, in desert regions, were removed by only including probabilities greater than 0.1%.The area under the ROC curve (AUC) provides a metric for assessing the skill of the model in distinguishing between a positive fire event or no fire event, with a score of 1 illustrating a perfect model.Results show, when considering all vegetation types, the XGBoost method produces the highest AUC score, 0.818, higher than the RF method, 0.746, and the monthly climatology based on XGBoost, 0.810.By removing probabilities less than 0.1% these scores are artificially decreased.For specific vegetation types XGBoost model skill is high, for example, short grass (AUC: 0.829) and broadleaf savannah (AUC: 0.820); however, it is noticeably lower for needleleaf forests (AUC: 0.719).The RF (AUC: 0.644) only slightly outperforms a random forecast (AUC: 0.5) over needleleaf forests, indicating low model skill.
A reliability curve, which is a measure of how well the predicted probability reflects the true PoF occurrence, was generated by binning predicted fire activity into one percentile bins from 0% up to 20% (Figure 1).The frequency of predictions above 20% become too small to be considered as a reliable metric, although any value above this can be considered to represent extreme fire danger.Model data was evaluated against observed fire frequency based on all points within the target bin, where an ideal model would follow a 1-to-1 line.Across all vegetation types the highest model skill is found when using XGBoost with RF providing reasonable skill at low fire probabilities but underestimating fire occurrence as PoF increases.Therefore, the RF model is unsuitable for more extreme fire events where fire danger is considered high.XGBoosts produced reliable forecast probabilities for all vegetation types when compared with both RF and the XGBoost climatology, although for broadleaf savannah, crops and short grass the model overpredicts PoF when values exceed 10%, which rarely occurs.For needleleaf forests the XGBoost model offers reliable prediction when PoF is <5%, after which the model overpredicts PoF; however, the frequency of occurrence for predictions about 5% is very low.For broadleaf forests the model accurately predicts PoF continuously up to the 20% threshold considered.The relative improved performance when compared with the XGBoost climatology shows modeling time specific PoF results in a more reliable forecast.
For qualitative purposes we present normalized histograms of modeled PoF when fires are and are not detected, these show the distribution of observed active fires are skewed toward higher probabilities (Figure 1).When considering points with PoF larger than 0.1%, we find 35% of all fires occur at probabilities between 0.1% and 1%; which is unsurprising considering these make up over 80% of all points.

PoF Forecast Skill
The predictive capability of the PoF model for operational forecasting is dependent on the accuracy of input forecasts.To evaluate model skill, we used input from day-1, 3, 10, and 30 forecast times and compared the results with observed active fire from MODIS for the target day between 2015 and 2019.The model setup was as

Geophysical Research Letters
10.1029/2023GL107929 previously described, however operational NWP forecasts provided input to the model, which differ from the ERA5-Land data used to train the model (Muñoz-Sabater et al., 2021).Whilst both input data sets were used at the same spatiotemporal resolution (daily average and ∼9 km) there are two main differences.ERA5-Land assimilates more observations, providing a more realistic representation of the land and atmosphere state relative to the operational forecast.However, ERA5 (Hersbach et al., 2020) atmospheric data is used as boundary conditions for ERA5-Land, which is at a coarser resolution (∼25 km) than operational forecast data (∼9 km).A further consideration is whilst operational high-resolution simulations provided input for 1-, 3-, and 10-day forecasts, the production of the 30-day forecast used the operational extended-range forecast simulated at ∼36 km resolution, which likely limits model skill.
Given model performance previously assessed, we only evaluated the XGBoost model, details of which are given by Supporting Information S1.The ROC curve and reliability curve for multiple forecast lead times show similar model skill between day 1 and day 10 (Figure 2).30-day forecasts failed to outperform the climatology of the 30day XGBoost forecast.The decrease in model skill was most evident when forecasting relatively rare high PoF values (>0.05), the frequency of which are reduced in the climatology because of averaging multi-year values.This compensates for the typical over-prediction of high probability events by the model at all lead times.The reduced resolution and declining NWP skill at longer lead times further explains this reduction in PoF skill for 30-day forecasts.These results show that whilst model skill may be present in the 30-day forecast it is difficult to quantify.
Improved forecast skill could be attained by training lead-time specific models which would overcome model bias which emerges at longer forecast times.If such models were implemented, it would be recommended they were retrained with each new model cycle to ensure consistency between training and operational input.Currently, we have trained the model using the most realistic input variables available, ERA5-Land (Muñoz Sabater, 2019), hence biases from the operational forecast may limit model skill.These biases are expected to reduce as the model forecast continues to improve toward closer alignment with ERA5-Land.To evaluate regional skill at different lead times we compared the observed total fire count from MODIS observations gridded to 9 km resolution between 2015 and 2019 with our model forecasts for eight fire prone regions defined by Giorgi and Francisco (2000).The summed probabilities over all ∼9 km grid cells within a domain were considered representative of the predicted total fire count (Figure 2).Regions with well defined fire seasons (Western North America, Amazon, Western and Eastern Africa, and the Mediterranean) were well represented at all forecast lead times, although the maximum amplitude of fire counts for the Western North America fire season was typically overestimated, particularly for the 30-day forecasts.A similar overestimation in modeled fire activity was observed in Australia, although the timing of peak fire activity was typically in good agreement with observations.The interannual variability in regions with seasonal fire were also well represented, for example, Western Africa.
The model capability to accurately represent spatial variability in fire activity at various lead times was limited by NWP skill.To illustrate this, we show the predictive skill of PoF during the peak of the 2019/2020 bushfire season in Australia, one of the largest in recorded history (Deb et al., 2020).As a case study we selected the 31 December when most of the active fires were in New South Wales in the southeast of the country (Figure 3).The examples shown were made using the operational version of the model, which bins PoF values into 5 fire danger categories, low (<0.5%),medium (0.5%-1.0%), high (1%-2%), very high (2%-4%), and extreme (>4%).Results highlight the spatial accuracy provided by forecasts out to 10-day and show some regional agreement between 30-day forecasts and fire activity.The lack of representation of vegetation in the FWI was evident by the high values produced in the sparsely vegetated central Australian regions.The FWI shown was not subject to forecast errors as it was derived at the analysis timestep, providing a now-forecast without any lead time.Despite this, it still underperformed relative to the PoF model at all forecast times, illustrating the improved predictive skill of the model.

Fire Attribution
Several of the 17 input features of the PoF model were partially correlated; however, given they were features which were not fully correlated, were readily available and over-fitting is unlikely given the high volume of training data, we opted to keep all variables to maximize model performance.However, given there was partial correlation, traditional feature selection methods make attribution difficult to interpret, instead we interpreted the results by looking at both Shapley values (SHAP) and the interaction of those values (Lundberg & Lee, 2017).The SHAP value denotes the importance of any given feature in a model, with a positive SHAP value indicating a positive impact on the model prediction and a negative SHAP value a negative impact on the model prediction.
The 2 m dewpoint temperature, which along with 2 m temperature can be used to infer humidity, emerged as the most important single feature globally.This was unsurprising given it is related to both temperature and moisture availability.Individual importance of variables relating to fuel load and moisture were lower as they include multiple well correlated variables, which when combined become more important.
Several interesting features emerged from the SHAP calculations, for example, increased 2 m dewpoint temperature displayed increased SHAP values.However, increased 2 m temperature resulted in decreased SHAP values in response to the uncorrelated component in the relationship between 2 m temperature and dewpoint temperature.Therefore, for a given 2 m dewpoint temperature, an increasing 2 m temperature typically acts to reduce the PoF (Figure 4).For other variables non-linear relationships were identified, for example, SHAP values for precipitation followed an exponential decay function, which is expected as beyond a certain precipitation rate the flammability of the fuel remains unchanged.As fuel load increased the SHAP values increased, however, the contribution of fuel load to PoF was dependent on the abundance of other fuel types, as expected.For high fuel loads the contribution of each variable generally became smaller as the remaining 3 fuel load variables compensated to increase PoF in these fuel rich regimes, whilst the combined SHAP value contribution from all 4  SHAP values for fuel moisture content, a commonly used variable for fire danger forecasting (Yebra et al., 2013), typically showed a step change decrease above a certain threshold.Similar thresholds are reported in the literature for both DFMC and LFMC.Argañaraz et al. (2018), supported by the subsequent study of Rao et al. (2023), found LFMC thresholds for wildfire ignition of 55%-106%.Whilst Masinda et al. (2021) and Wotton (2009) found DFMC ignition thresholds of 10%-40% and 17%, respectively, with variability for both DFMC and LFMC thresholds being controlled by vegetation type.Here, DFMC was separated between fine and coarse fuels with average ignition thresholds, based on SHAP values, of 42% and 70%, respectively, when averaged across all vegetation types.The resulting DFMC threshold was therefore dependent on the allocation of dead fuel between fine and coarse fuel.This represents the point above which the SHAP values related to DFMC noticeably decrease but as previously mentioned the result of the correlation between variables mean this does not necessarily represent a realistic ignition threshold.We found average LFMC ignition thresholds were difficult to distinguish because they are vegetation dependent but also vary as a function of phenology.For example, during the growth season, LFMC is usually high whilst vegetation is active; however, these high LFMC values occur during active fire seasons.
The attribution for fire activity performed here, in part, neglects other controlling influences on fire activity, for example, ignition and land use change.However, this evaluation tool could still be used at climate time scales to identify regime shifts in the driving variables of fire activity (e.g., Kelley et al., 2019).Furthermore, spatial patterns and plant specific attribution can provide insight into regional fire activity; however, that is beyond the scope of this study and will be investigated further in a follow-up study.

Conclusions
The stochastic nature of wildfire activity limits predictability of specific events; however probabilistic forecasting has previously been used to provide a representation of fire danger (e.g., Di Giuseppe, 2023).Here, we move beyond conventional fire danger indices to produce a probability of fire, PoF, forecast at various lead times using a data-driven model informed by a combination of NWP variables, land surface modeling and observations.The intention is to provide a global high-resolution daily forecast which can be utilized for wildfire planning and management strategies.As the model was trained using observed fire activity it encompasses, to an extent, processes which are difficult to represent in physical-based models, for example, the human-drivers for fire activity.The model was trained globally at ∼9 km resolution to provide a daily grid cell specific PoF, an easily interpretable metric.The model can be simulated at any NWP lead-time and model skill is demonstrated up to 10 days in advance.Model skill beyond this may be limited by the degraded resolution of extended-range forecasts, it is recommended that high-resolution forecasts (∼9 km) be used for all forecast lead-times when available.Alternatively, lead-time specific models could be trained to account for forecast biases.Although not explored in this study, the modeling framework could easily be adapted to provide a forecast probabilistic spread using input data from the operational NWP ensemble (Leutbecher & Palmer, 2008).
Two versions of the data-driven model were trained, using RF and XGBoost.XGBoost provided better skill, although lower memory requirements meant it was trained on a large sample of data.Furthermore, software packages for attribution are readily available with XGBoost.The skill in predicted fire activity in certain vegetation types, for example, Needleleaf Forest, was relatively low, although some skill was still found.The model provided a high forecast skill for other vegetation types, for example, Broadleaf Savannah.The model skill at predicting actual fire activity is limited by the accuracy of the observing systems available (Wooster et al., 2021).
As more variables become available for forecast input in the future, for example, prognostic LAI, model skill is expected to further improve, as currently input variables limit the skill at longer lead times.The addition of input variables that control ignition, whether human or lightning activity, are expected to further improve model skill.Furthermore, the impact of fire occurrence on fuel availability for future burns could be considered within the fuel model.Currently, variables directly correlated with fire activity have been used as input to the model; however, it is feasible that by performing regional or grid point specific training will result in improved model performance by indirectly representing processes not included in our input.The model, as described here, is used operationally at ECMWF.It provides a real-time 10-day fire probability forecast and is available through the ECCharts web platform hosted at https://eccharts.ecmwf.int.
The attribution of input variables to the PoF can be made using evaluation tools, such as SHAP (Lundberg & Lee, 2017).Here we use these to identify relationships between specific variables and fire danger.Whilst these are sometimes simple linear fits, they are often more complex involving either threshold values or non-linear relationships.These attribution tools could be used to identify causes of extreme fire events and to evaluate controls on specific fire regimes both in the real-time and at climate timescales.
-driven model informed by satellite observations and an Earth System Model provides accurate fire forecasts upto 10 days in advance • The probability of fire forecast is implemented operationally in a numerical weather prediction model to provide real-time forecasts• Fire attribution is demonstrated and can be used for specific fire events or historical analysisSupporting Information:Supporting Information may be found in the online version of this article.

Figure 1 .
Figure 1.Receiver operating characteristic curves for a daily XGBoost model (blue), the XGBoost model climatology (orange), and a daily Random Forest model (green) of active fires using ERA5-Land input evaluated against MODIS observations for all land points globally between 2015 and 2019 for multiple vegetation types (left column).The same data represented using reliability curves (middle column).Normalized histograms of the modeled probability of fire divided between observed non-fire and fire detections for all vegetation types (right column).

Figure 2 .
Figure 2. Receiver operating characteristic curves for daily Probability of Fire (PoF) model at multiple lead times and monthly mean PoF model climatologies using input from the Integrated Forecasting System evaluated against MODIS active fire observations for all ∼9 km land points globally between 2015 and 2019 (top left).The same data represented using reliability curves (top right).Timeseries of 5-day average active fire counts at ∼9 km resolution from MODIS active fire observations compared with PoF model at multiple lead times.Also displayed is the R-value and RMSE for each forecast lead time (bottom three rows).

Figure 3 .
Figure 3. Active fire detection map over Australia from MODIS for 31 December 2019 (top left).The Fire Weather Index using analysis forcing for the same date with colorscale representing index value (top right).Probability of Fire model forecast using 1-day (middle left), 3-day (middle right), 10-day (bottom left), and 30-day (bottom right) lead times from the Integrated Forecasting System for the same date.

Figure 4 .
Figure 4. Ordered Shapley values (SHAP) for 17 features used to inform the XGBoost fire model, where color indicates feature value and dots represent individual predictions (top).Dependency plots displaying the change in feature value with respect to the SHAP value, where the color indicates a second feature value given by the colourbar and dots represent individual predictions globally for all times (bottom).