Application of remote sensing and machine learning algorithms for forest fire mapping in a Mediterranean area

study, we developed five new hybrid machine learning algorithms namely, Frequency Ratio-Multilayer Perceptron (FR-MLP), Frequency Ratio-Logistic Regression (FR-LR), Frequency Ratio-Classification and Regression Tree (FR-CART), Frequency Ratio-Support Vector Machine (FR-SVM), and Frequency Ratio-Random Forest (FR-RF), for mapping forest fire susceptibility in the north of Morocco. To this end, a total of 510 points of historic forest fires as the forest fire inventory map and 10 independent causal factors including elevation, slope, aspect, distance to roads, distance to residential areas, land use, normalized difference vegetation index (NDVI), rainfall, temperature, and wind speed were used. The area under the receiver operating characteristics (ROC) curves (AUC) was computed to assess the effectiveness of the models. The results of conducting proposed models indicated that RF-FR achieved the highest performance (AUC = 0.989), followed by SVM-FR (AUC = 0.959), MLP-FR (AUC = 0.858), CART-FR (AUC = 0.847), LR-FR (AUC = 0.809) in the forecasting of the forest fire. The outcome of this research as a prediction map of forest fire risk areas can provide crucial support for the management of Mediterranean forest ecosystems. Moreover, the results demonstrate that these novel developed hybrid models can increase the accuracy and performance of forest fire susceptibility studies and the approach can be applied to other areas.


Introduction
Forest fire disaster is considered as one of the main causes of dramatic depletion of the forest ecosystems worldwide (Venkatesh et al., 2020) due to both anthropogenic or natural processes (Sachdeva et al., 2018). Fire regimes are becoming even more pronounced in many regions due to an increasing effect of recent global warming, with increasing impacts on human well-being resources and ecosystem function processes (Hantson et al., 2016;Li et al., 2019;Zema et al., 2020). The impacts include deterioration of soil ecology and water hydrology (Santana et al., 2014;Zema et al., 2020), and land degradation and soil erosion (Zema et al., 2020). Consequently, this disaster makes the ecosystem out of balance as an ultimate stage. Despite its dramatic impacts, forest fire plays essential roles in many forest processes, for example, forest fire influences the composition and successional stages (Amit Parashar and Sas Biswas, n.d.;Huebner et al., 2012), and it acts as a selective factor of the traits of plants (Escudero et al., 2000;Fernández-García et al., 2019;Keeley et al., 2011;Pausas and Vallejo, 1999). In the Mediterranean region, despite their ecological roles and the services they provide, forest ecosystems have been heavily impacted by fragmentation, degradation, and deforestation (Belinchón et al., 2009;Scarascia-Mugnozza et al., 2000). Furthermore, climate change exacerbates the deterioration of these ecosystems, due to rainfall decreasing and temperature increasing (Simioni et al., 2020). These phenomena, together with anthropogenic disturbance, place the Mediterranean forest ecosystem in an alarming situation. Not only these factors but also forest fire is one of the most issues which affect the Mediterranean forest (Oliveira et al., 2018). Olivella et al. (2006) demonstrated that more than 50,000 fires burn an average of 600,000-800,000 ha of Mediterranean forest annually. Thus, in order to protect the functionalities of the forest ecosystems as well as their precious services and benefits for people's well-being, forest fire susceptibility maps remain important to fully understand and predict possible hazards that may occur in this environment. Moreover, establishing a forest fire susceptibility map as early as possible is of the utmost importance of land use planning (Bax and Francesconi, 2018). Such a map can also serve as a valuable reference to reduce vulnerability and could help to improve decision-making planning in ecological risk prevention (Hong et al., 2018).
Creating forest fire susceptibility maps requires the establishment of forest fire inventory as the target variable and some explanatory variables. Various studies used a set of data composed of climate data (rainfall, temperature, humidity, and wind), topography data (elevation, slope, and aspect), and land use data as explanatory variables (Tien Díaz-Avalos et al., 2001;Ganteaume et al., 2013;Krawchuk et al., 2006;Vecín-Arias et al., 2016;Verdú et al., 2012). Indeed, there is no common agreement to follow for the selection of appropriate ignition variables. The processing of explanatory data is the most challenging task in the elaboration of forest fire maps in large study areas due to various data sources and the modelling complexity (Bonham-Carter, 2014;Tien Bui et al., 2017). In an effort to process such datasets and to produce forest fire susceptibility maps, geographic information system (GIS) have proven to be useful for processing and overlaying geospatial data (El Hafyani et al., 2020;Teodoro and Duarte, 2013). Furthermore, the remote sensing approach and satellite images as a popular means of data collection play a crucial role in constructing forest fire susceptibility maps.
Several statistical bivariate and multivariate methods have been used widely for forest fire modeling such as multiple linear regression (Oliveira et al., 2012), Poisson regression (Wotton et al., 2003), and Monte Carlo simulations (Conedera et al., 2011). In recent years, an obvious trend in the use of machine learning algorithms in a wide range of natural hazard assessments disciplines such as landslide (Chen et al., , 2019Hong et al., 2016;Pham et al., 2018a), flood (Tehrany et al., 2015b), rainfall modelling (Pour et al., 2020;Pham et al., 2019), streamflow modelling (Pham et al., 2020), water quality modelling (Pham et al., 2021) and forest fire modeling (Arpaci et al., 2014), has been observed. Machine learning algorithms can handle non-linear and highly dimensional data and solve serious problems (Chen et al., 2019;Knudby et al., 2010;Recknagel, 2001). Pourtaghi et al. (2016) compared boosted regression tree (BRT), generalized additive model (GAM), and random forest (RF) for forest fire mapping of the Minudasht Township, Golestan Province, Iran. The results indicated that BRT had better accuracy than GAM and RF. Jaafari et al. (2018) applied five decision tree-based classifiers including, Alternating Decision Tree (ADT), Classification and Regression Tree (CART), Functional Tree (FT), Logistic Model Tree (LMT), and Naïve Bayes tree (NBT). Their results indicated that the ADT classifier performed the best accuracy. However, despite the wide use of these models in different fields such as environmental and ecological science, the selection of an appropriate model is still challenging because each algorithm exhibits some drawbacks leading to implications in its outcomes (Jaafari et al., 2019b;Tien Bui et al., 2019).
Given that the use of ensemble / hybrid models built by a combination of Machine learning algorithms and statistical methods increases the modeling performance (Tien Bui et al., 2014Truong et al., 2018;Akay, 2021) (Costache et al., 2020a) to assess flash-flood potential. They used a bivariate method based on Statistical Index (SI) and its combination with machine learning models including LR, CART, MLP, RF, and SVM. They found that the MLP-SI hybrid model achieved the highest accuracy in comparison to other models. Tehrany et al. (2019) used Logitboost ensemble-based decision tree (LEDT) and benchmarked it with SVM, RF, and Kernel logistic regression (KLR) and found that the LEDT ensemble model can achieve the highest predictive accuracy for the susceptibility of the forest fire. Nevertheless, few publications have been considered the use of the ensemble models for forest fire susceptibility modeling (Moayedi et al., 2020). In this study, five novel hybrid models by coupling the statistic approach with machine learning algorithms are used. Frequency Ratio was integrated into five machine learning methods, namely MLP, LR, CART, SVM, and RF for forecasting forest fire susceptibility. The research was conducted for the north of Morocco as the case study which is most affected by the fire at the national scale. To the best of our knowledge, no study so far has been conducted for forecasting and assessing of forest fire susceptibility in this area based on machine learning algorithms and GIS tools. The main goals of this paper are: (i) exploring the effectiveness of the hybrid models for forest fire forecasting in this study area (ii) producing spatial susceptibility maps using the proposed approaches to identify the critical areas which require an emergency intervention. Thus, the findings of this paper are expected to be a useful tool for providing a crucial direction for management of Mediterranean forest ecosystem.

Study area
The study area in this research is located in Tanger-Tétouan-Al Hoceima region, North of Morocco ( Fig. 1), covering Fahs-Anjra province and Tanger-Asilah prefecture with a total area of 1685 km 2 , which has a typical Mediterranean climate with oceanic influence ("DRATT", 2015). In this area, the temperature varies between 10 • C and 14 • C in winter and between 20 • C and 24 • C in summer, the annual precipitation can reach 635 mm, and the dry season is from May to mid-September ("DRATT", 2015), based on the Walter-lieth climate diagram, drawn by data weather from Climate Chart application, https://climatecharts. net/ (Harris et al., 2016) for station Tangier. The study area is characterized by different types of forest composed of mixed oak forests of evergreen and semideciduous forests (Ajbilou et al., 2006) which are characterized by an extend of invasive and flammable species diversity (Chebli et al., 2018). This diversity is a catalyst for forest fire disaster. Generally, forests in northern Morocco have the highest level and the greatest fragility due to fires (Chebli et al., 2018;Zidane et al., 2019). According to (Chebli et al., 2018), about 1185 ha of forest ecosystem in the north part of Morocco is destroyed by fires annually (43% of the total burned forest in Morocco). Though restoration efforts led by the National Forestry Conservation Agency may offset these losses, forests have been also destroyed by various human activities, for example, forest resources are exploited for other purposes, like cannabis cultivation, collection of fuelwood, and goat grazing (Chebli et al., 2020;Stambouli et al., 2005). Also, the north of Morocco experiencing rapid population growth, with a total population of approximately 3.2 million in 2014 Fig. 1. Location of the study area and forest fire inventory points.
which is characterized by a highly urbanized population with an increase of 2,45% per year. Whereas the rural population increased by only 0,21% per year (HCP, 2014), Most of this population development has occurred in and around forest areas. These forests have also suffered from soil erosion coupled with a decrease of natural vegetation which has led to degradation of the forest and pastoral cover and an increase of cultivated land (Chebli et al., 2018). Therefore, researches on forest ecosystem degradation in this region are necessary to accelerate the process of the restoration project and to provide a scientific contribution to forest fire modeling.

Data
In this study, and as it is well discussed by several studies Pourtaghi et al., 2016;Tien Bui et al., 2017), the first step to model forest fire is the establishment of the forest fire inventory map which corresponds to the past-occurred events presenting the target variable. In this case, historical data provided by (HCEFLCD, 2011) and previous reports, were used for preparing the forest fire inventory map of the study area. Moreover, data on fire occurrences can be obtained from MODIS (Moderate-Resolution Imaging Spectroradiometer) (Giglio et al., 2003). Users can use an existing freely-available hotspot data fire occurrences through the exploitation of the Fire Information for Resource Management System (FIRMS), https://firms.modaps.eosdis.na sa.gov/, to improve the quality of forest fire historical data. The preprocessing of the data was performed to exclude anomalous points based on the extent of fire occurrence and the presence of severe collateral damage. Moreover, forest fires mostly occurred during March, April, May, June, July, August, September, and October during the period 2005-2015, for this, the remaining months were removed from the analysis. A total number of 510 fire event locations were selected for this study. Forest fire points were randomly split into a 70% training dataset and 30% for validation. For training and testing sets the same number of non-forest fire locations were randomly selected using the random point tool in ArcGIS 10.4, across the study area. And randomly split into 70% training dataset and 30% for validation ( Fig. 1). Table 1 summarizes the total number of forest fire occurred during each month for the period 2005-2015. Based on this, all explanatory variables were selected for the same period to match fire occurrence period.
Based on previous studies, the explanatory variables for forest fire modeling are composed of four groups of data including topography, anthropogenic activities, climate, and vegetation Nami et al., 2018;Pourghasemi, 2016;Pourghasemi et al., 2016;Pourtaghi et al., 2016). These include; slope angle, aspect, elevation, distance to roads and residential, rainfall, temperature, wind speed, Land use, and NDVI (Table 2). All these variables were reclassified following the method outlined by (Tsangaratos et al., 2017).
Topography plays an important role as it can control the distribution of vegetation and wind speed as well as it has an important role in the velocity of rainfall and soil moisture (Chuvieco and Congalton, 1989;Nami et al., 2018;Schmidt et al., 2008;Tehrany et al., 2019). In this study, the Digital Elevation Model (DEM) obtained from Shuttle Radar Topographic Mission (SRTM) with a pixel size of 30 m, was used to derive topography data including slope, aspect, and elevation. Slope ( Fig. 2a) was calculated and divided into five groups including (1) Up-slope areas are more affected by intensive fires whereas down slopes areas are less affected (Lentile et al., 2006). Aspect (Fig. 2b) was calculated and classified into nine classes such as flat zones, north, north-east, east, south-east, south, south-west, west, and north-west. The elevation ( Fig. 2c) was composed of five classes as 0-62 m, 62.1-142 m, 142.1-243 m, 243.1-370 m, and 370.1-761 m, respectively. All topographic characteristics were calculated using Surface tool in spatial analyst tools available in ArcGIS 10.4 software.
Fire is more pronounced in forest ecosystems close to roads and populated areas due to the traffic accidents and the impact of human beings on the natural ecosystems uncontrolled (Tien . The distance to the roads ( Land cover information is widely recognized as a proxy of fuel (Tien . Also, the Normalized Difference Vegetation Index (NDVI), is used to monitor photosynthetic activity and provides information on vegetation biomass and phenology (Jiang et al., 2006;Mohajane et al., 2018). In this study, two satellite Landsat8-OLI (Operational Land Imager) images, covering the study area (paths/ rows 201/035 and 201/036) were acquired from the United States Geological Survey (USGS) website used to extracted the land cover information. These two images were mosaiced. First, we applied radiometric and atmospheric corrections for both images using ENVI 5.2 software based on Dark Object Subtraction (DOS) algorithm (Chavez, 1996), then we extracted the study area. based on Ground-based survey data and high-resolution imagery from Google Earth, a total of 250 ground truth points were collected and the Maximum Likelihood supervised classification method using ArcGIS 10.4 was applied to classify the image into four class categories: water bodies, forests, croplands, and build-up/Bare lands. The produced land cover achieved an overall accuracy of 93% (Fig. 3c).
The NDVI factor was calculated using the near-infrared (NIR) and visible red bands through the following Equation (1) (Freden et al., 1973): The NDVI index ranges from − 1 to 1with the higher values represent healthy vegetation, and the lower values indicate non-vegetative cover. In our case, the NIR and RED bands are band 5 (0.85-0.88 µm) and band 4 (0.64-0.67 µm), respectively, of the Landsat 8 OLI image. We applied an NDVI thresholding to classify NDVI image (Fig. 3d) and we generated six groups:

Frequency Ratio (FR)
FR model is one of the bivariate statistical methods, it is used to describe the importance of classes of each explanatory factor on forest fire occurrence, it is used to define the ratio of the probability of forest fire occurrence to the probability of a non-occurrence for given attributes (Bonham-Carter, 2014; Lee and Talib, 2005). It has been used in different natural environmental hazard studies, for its advantages of ease of use and understanding (Costache, 2019a;Pradhan and Lee, 2009), such as landslide mapping (Nsengiyumva et al., 2019;Zhou et al., 2016), forest fire mapping  and groundwater potential mapping. The FR was calculate using the following Equation (2) (Lee and Talib, 2005): where: FR is the Frequency Ratio of class i of factor j, Np(LX i ) is the number of fire locations within class i of factor variable X , Np(X j ) is the number of pixels within factor variable X j , m is the number of classes in the factor variable X i , and n is the number of factors in the study area.

Multi-Layer perceptron neural network (MLP NN)
Artificial neural networks (ANN), inspired by biological neurons, are among artificial intelligence methods that recognize convoluted patterns in data (Skapura, 1996). ANN consists of various nonlinear computational elements which are operating in parallel and organized in patterns (Lippmann, 1987). ANN has traditionally been designed as a structure composed of an input layer and an output layer called a multilayer perceptron (Rosenblatt, 1958) which are connected by one or more hidden layers This structure can identify non-linear relationships (Basheer and Hajmeer, 2000;Rumelhart et al., 1985). MLP NN is trained in three phases including the feedforward, the backpropagation, and the adjustment of the weights (Basheer and Hajmeer, 2000;Lippmann, 1987). In the feedforward phase, the data broadcasts to each of the hidden layers with multiple weighted summations, then by reaching the output layer an activation function computes the output value. In backpropagation phase, the initial weights are selected randomly, then the predicted output for a given observation and the expected output for that observation are compared. When all the observations are given to the network, the total calculated error is distributed among the nodes in the network, then the weights are updated according to a generalized delta rule. The feed-forward and back propagating process is repeated iteratively until we achieve a minimal error.
In this study, we used Adam optimizer (Adaptive Moment Optimization) optimizer, which is developed by (Kingma and Ba, 2017), the first and second moments are obtained using the following equations: These two moments X and Y can be corrected using the following equations: The update the weights U is then expressed as follows: With δ is the initial learning rate, by default is set to 0.001. The main advantages of MLP NN are, its ability to process faster a large volume of data, pattern recognition, and it is distinguished by its ability of learning and parallel processing, data fusion, and handling noise (Ni, 2008).

Logistic regression (LR)
LR, a statistical model for solving binary problems, was developed by (Cox, 1958). In LR, the target variable is a categorical binary value and the output of LR is likelihood values, which specify the probability of the occurrence of a certain class based on the feature values. Mathematically, a LR is a special case of linear regression and is based on the central mathematical concept of logit, the natural logarithm of an odds ratio (Hosmer and Lemeshow, 2000). The plot of the simplest case of linear regression for one feature and one dichotomous outcome is two parallel lines, each corresponding to a value of the dichotomous outcome. It is a linear plot in the middle and curved at the ends, by computing the mean of these two outcomes. It is not easy to describe this S-shaped plot with a linear equation as the ends are not linear and the errors are not normally distributed or constant across the entire range of data (Peng et al., 2001). The key to solving this problem is LR as it applies the logit transformation to the target variable and predicts the logit of outcomes from features (Peng et al., 2002). The conditional mean of the dichotomous outcome in LR is based on the binomial distribution which is the only assumption of LR and denotes that there is the same probability across the range of feature values. LR describes the effect of features by determining the coefficients of them, handles both continuous and categorical variables but just binary categorical ones for the dependent variable (Menard, 2000).

Classification and regression tree (CART)
CART method, a binary univariate decision tree algorithm and nonparametric supervised classification and prediction method, was introduced by (Breiman et al., 1984). CART classifies the dataset by constructing a tree. The procedure is conducted by sorting the whole training dataset down from the root to some leaf nodes in a recursive manner. At each stage of the process, the root node is the ideal informative feature, and the division of the training dataset into subsets is conducted based on the splitting rules applied to a single feature. The splitting rules in a CART decision tree algorithm are the Gini Index and towing criteria (Karimi et al., 2019a). The splitting procedure maximizes the homogeneity of subsets by determining the proper features and their corresponding thresholds. CART is applied to the training dataset repetitively to complete the tree which happens when three parameters of the stopping rules are fulfilled, including 1-the minimum number of records in a leaf node, 2-the minimum number of records in a parent node, and 3-the maximum number of splits at each node (Quinlan, 1993). The stopping rules make the CART model robust and efficient with the least complexity. Several decision rules can be utilized for classification and prediction studies are determined by the constructed tree. CART is able to identify the most important features (Delen et al., 2013). Based on the importance of the features, CART divides a complex dataset into smaller subsets hence it creates simple solutions which can be understood and interpreted easily (Quinlan, 1993). Moreover, CART handles non-linear relations between the target variable and explanatory variables, process large datasets of variables including continuous and categorical, without any prior data preparation (Friedl and Brodley, 1997), makes no statistical assumptions, analyzes data in different measurement scales without the need to normalization (Qin et al., 2009), handles missing data and outliers (Delen et al., 2013)) and makes good visualizations of the relationships between the features (Karimi et al., 2019a)).

Random forest (RF)
RF (Breiman, 2001), a non-parametric supervised method applied for both classification and prediction (Costache et al., 2020a;Nefeslioglu et al., 2010), was developed by (Breiman, 2001). RF is composed of a combination of decision trees (Breiman, 2001). Each decision tree individually votes for assigning the most frequent class to the input data and the majority vote of the trees determines the class prediction (Breiman, 2001). In RF models trees are growing from different training data subsets to increase the diversity (Breiman, 1996) as a result, greater classifier stability is achieved (Breiman, 2001). The data subsets which are not included for the training of a tree are called out-of-bag which can be classified by that tree to assess the accuracy and the performance and calculate an internal unbiased estimate of the generalization error based on the number of trees (Breiman, 2001). Increasing the number of trees in the model, lead to the generalization error converges, and avoiding overfitting the data by RF. For building RF model, a suitable attribute selection measure is required to maximize dissimilarity measures between classes. The most frequent methods for selecting the attributes at each node are gain-ratio (Quinlan, 1993), Gini Index (Breiman et al., 1984), and Chi-square. When a tree is growing, each node is divided using the best split of a random subset of input features. This makes each tree to be less strong, but at the same time decreases the correlation between trees, as the result the generalization error is reduced and the model's accuracy is increased (Breiman, 2001). In addition, RF has the ability to run on large datasets efficiently, it can process massive amounts of input features, it gives estimates of the importance of the features used for modeling, it replaces missing data, and detects outliers through computing proximities between pairs of cases, and it is approximately robust to outliers and noise (Breiman, 2001;Rodriguez-Galiano et al., 2012).

Support vector Machine (SVM)
SVM developed by (Vapnik, 1963) at AT&T Bell Laboratories (Vapnik, 1995), is used for both classification and regression tasks and then, it was enhanced by (Boser et al., 1992) using an idea based on statistical learning theory. SVM is a binary classifier developed to find a linear hyperplane that separates two classes optimally, but it can be promoted to an n-class classifier (Belousov, 2002). SVM finds an optimal hyperplane for classification by projecting the input data into the higherdimensional Hilbert space (Yang et al., 2008). SVM minimizes an upper bound of generalization error by widening the distance of the hyperplanes separating the two classes (Huang et al., 2002;Karimi et al., 2019b). This action guarantees a low generalization error, independent of data distribution (Huang et al., 2002;Karimi et al., 2019b). In SVM models a penalty parameter beyond the margin of the hyperplanes is incorporated to consider misclassification errors (Vapnik, 1998). The penalty parameter makes a trade-off between the margin size and the number of error instances in which a larger value for the penalty parameter leads to smaller misclassification error and smaller margin size (Vapnik, 1998). SVM addresses non-linearity uses a mathematical function called kernel trick through to explore the data in a higherdimensional space (Statnikov, 2011). The performance of the SVM algorithm is dependent on suitable kernel functions which are, the polynomial kernel, sigmoid kernel, radial basis function, and linear kernel (Chapelle et al., 1999;Choubin et al., 2019). Furthermore, SVM prevents overfitting in the model and assures good generalization and classification performance (Huang et al., 2002). Continuous and categorical variables can be both processed by SVM effectively, and it can also handle non-linear data, complex and noisy data with outliers (Huang et al., 2002;Karimi et al., 2019b).

Methodology for forest fire susceptibility mapping
The computation of forest fire susceptibility mapping was performed through the following steps. Also, the graphical representation of the order of methodological steps is highlighted in Fig. 5.

Establishment of fire database
The database containing the 10 fire conditioning factors and 510 fire locations was processed in ArcGIS 10.4 software. In order to be included in the methodological procedure, all the conditioning variables were converted into raster format with the pixel size equal to 30 m. Given the fact that forest fire susceptibility represents a binary classification, the preparation of another dataset containing the non-fire locations was required (Tien . The use of a dataset representing the absence of the phenomenon in the modelling process can also increase the performance of the applied methods (Costache et al., 2020b). These non-fire points were randomly placed within the areas characterized by the negative values of NDVI which highlights the presence of water bodies and poor vegetation where the fire forest is almost impossible (Tien . Also, it should be mentioned that the number of non-fire locations is equal to those of fire locations. The fire locations were attributed value "1 ′′ and the non-fire locations were attributed value "0".

Setup of training and validating dataset
The validation is a mandatory step in the forecasting studies as it can be affected by a specific phenomenon. Thus, the fire and non-fire samples were divided into training dataset (70%) and validating dataset (30%) (Tien Pourtaghi et al., 2015). Therefore, 714 fire and non-fire locations will be used to train the models, while another 306 fire and non-fire locations will be involved in the validation of fire susceptibility maps. The Subset Features tool available in ArcGIS 10.4 was employed in order to establish both the training and validation dataset.

Computing FR coefficients
Since FR coefficients are used as input to the machine learning models, these values were calculated only for the training dataset. The use of FR coefficients as input in natural hazards modelling is considered an effective solution to assign the same type of values to the classes/ categories of all (Costache and Tien Bui, 2019;Tehrany et al., 2015a). Thus, by involving the FR values within the models the following ensembles will result MLP-FR, LR-FR, CART-FR, RF-FR and SVM-FR. The computation of FR coefficients will be done according to section 3.1. Moreover, in order to bring the value of the fire conditioning factor within the same range, the FR coefficients were normalized between 0.1 and 0.9 according to Equation (8) below (Costache, 2019a): where x is the standardized value of v; v is the current value of the variable; r is the limits of the range value, and l is the limits of the standardization range.

Configuration of fire susceptibility models
The next stage of the modelling process is represented by the configuration of the models used to calculate the fire forest susceptibility. In this regard, the FR values which characterize the factor classes/ categories were assigned to fire and non-fire locations. In this study, Python 3.8.2 ("Python Release Python 3.8.2," n.d.) was used to apply machine learning models. Therefore, the fire and non-fire locations having attributed the FR coefficient were converted in tabular format in order to be read in Python. Then, different amounts of hyperparameters were set, and by running the models several times, the best configuration was chosen to reach the highest accuracy for each model. For RF   Fig. 5. Flowchart of the methodology adopted. Fig. 6. Structure of the RF model used for forest fire susceptibility in this study. model the number of trees equal to 100 and the random split variable equal to 1 leads to the highest accuracy with the minimum time to get the results. The advantage of RF comparing to other models is the smaller number of hyperparameters to be set. Based on these criteria, the structure of the RF model used in this research is shown in Fig. 6.
For SVM model, the penalty parameter equal to 5 and the kernel function of radial basis function and its parameter equal to 1 had the highest accuracy. For LR model, the maximum iteration equal to 1000, the inverse of regularization strength equal to 3, and the tolerance for stopping criteria equal to 0.001were set. However, in this model changing the hyperparameters did not change the accuracy meaningful. For the NN model (Fig. 7) the optimizer was set to Adam, the activation function was set to Relu (Rectification Linear Unit), the number of hidden layers was set to 60, and the number of neurons in each layer was set to 2 to reach the highest accuracy. For the CART model, the maximum number of splits equal to 10, the minimum number of parent nodes equal to 10 and the minimum number of leaf nodes equal to 5 resulted in the highest accuracy. Fig. 8 illustrates the structure of the CART used for modelling forest fire in this research.

Evaluation of models performance
Once the models of the forest fire susceptibility are implemented, it is necessary to evaluate their performance. To achieve this, we used the receiver operating characteristic (ROC) curve (Chen et al., 2019;Jaafari et al., 2019b;Tien Bui et al., 2017), The ROC curve is plotted with the sensitivity as the y-axis and the 1-specificity as the x-axis (Costache, 2019b(Costache, , 2019cPourghasemi et al., 2016;Tien Bui et al., 2017). The AUC value ranges from 0.5 to 1. The highest AUC value indicates a perfect measure of separability, while the lowest AUC value indicates the worst measure of separability.
Additionally, we used statistical measures such as overall accuracy, precision, specificity, sensitivity, and Kappa index (Tien . These statistical criteria were calculated based on the following equations: In the above equations, TP (true positive) and TN (true negative) are the number of pixels which are considered correctly classified as forest fire and non-forest fire, respectively. While FP (false positive) and FN (false Negative) are the numbers of pixels that are incorrectly classified.

Fire susceptibility mapping
Based on the results obtained, the models for forest fire susceptibility mapping for the study area were adopted. To generate forest fire susceptibility maps, the layers obtained were then transformed to GIS environment, the five classes of very low, low, moderate, high, and very high susceptibility to forest fire were applied (Jaafari et al., 2019b;Tien Bui et al., 2017) using natural breaks method in ArcGIS 10.4 (Tehrany et al., 2019).

Frequency Ratio weights
The results of the first of the workflow are represented by the values of FR coefficients. FR coefficients equal to 0 were achieved by the mean annual temperature between 15.9 • C and 16.5 • C and the water bodies land use category. On the surfaces characterized by these two Fig. 7. Architecture of MLP used in this study.
parameters, no forest fire location was recorded. Low FR coefficients were also achieved by the flat zones (0.215), rainfall class between 629 and 700 mm/year (0.314), or the north-eastern slopes (0.445). The highest FR values were recorded by the areas with rainfall between 900 and 943 mm/year (3.58), distances to roads between 240.1 m and 480 m (3.302), and the surfaces where the average wind speed is between 3.75 m/s and 4 m/s (2.653) ( Table 3). According to the described methodology, these values were used as input in the 5 machine learning models.

MLP-Fr
The statistical metrics were involved in the estimation of MLP-FR performance (Table 4). Thus, the accuracy of the training dataset was equal to 0.883, the Precision achieved a value of 0.856, the Sensitivity was 0.904, the Specificity is 0.863, while the K-index was situated at 0.766. In terms of validating dataset, the following values of statistical metrics were achieved: Accuracy = 0.844, Precision = 0.801, Sensitivity = 0.877, Specificity = 0.817 and K-index = 0.689.
The fire susceptibility map was derived through the MLP-FR model (FSM-MLP-FR) after the model performance assessment (Fig. 9). The values of FSM-MLP-FR were grouped into five classes using the Natural Breaks method. The first class, highlighting the areas with very low susceptibility, accounts around 20.2% of the study area, the low fire susceptibility is presented at approximately 10.44%, while the medium class of FSM-MLP-FR occupies a surface equal to 21.92% of the research zone. The high and very high fire susceptibility span on a total of 47.62% of the study area.

CART-Fr
The same statistical measures were used to assess the performance of CART-FR ensemble for both training and validating datasets. The use of training sample revealed the achievement of the following results (Table 4): Accuracy = 0.9, Precision = 0.907, Sensitivity = 0.894, Specificity = 0.905 and K-index = 0.799; while the involvement of validating data set helps to achieve the next values: Accuracy = 0.828, Precision = 0.795, Sensitivity = 0.851, Specificity = 0.807 and K-index = 0.656. In both of the cases, the statistical metrics highlight very good performances of CART-FR ensemble.
Following the performance evaluation, the fire susceptibility map (FSM-CART-FR) was computed (Fig. 9). Similar to the previous case, the FSM-CART-FR values were grouped into 5 classes using the Natural Breaks method. The very low and low susceptibility is spread on around 22.65% of the research area, while the medium values are encountered at approximately 13.39%. The high and very high susceptibility can be found around 63.97% of the study area.

LR-Fr
The performances achieved by LR-FR ensemble were the lowest among all the applied models. Thus, in terms of training dataset the following values were achieved (Table 4) The classification of Fires Susceptibility Map (FSM-LR-FR) into five classes (Fig. 9), according to the Natural Breaks method, highlights the following situation: very low susceptibility present on 17.26% of the study area, low susceptibility spread on 18.58% of the research area, medium values located on 23.4%, and high and very high susceptibility with a surface equal to 40.77% of the study area.

SVM-Fr
The application of SVM-FR ensemble is characterized by the following performances in terms of training dataset (Table 4): Accuracy = 0.91, Precision = 0.898, Sensitivity = 0.919, Specificity = 0.901 and K-index = 0.819. The use of validating dataset shows that the Accuracy = 0.858, the Precision = 0.841, the Sensitivity = 0.870, the Specificity = 0.846 while the K-index = 0.715. It should be mentioned that overall the SVM-FR ensemble achieved the second-best performance after RF-FR model. The Forest Fire Susceptibility Map was represented by dividing the FSM-SVM-FR values into five classes using the Natural Breaks method (Fig. 9). The very low fire susceptibility appears on approximately 26.74% of the study zone, the low susceptibility is spread on 8.09% of the total analyzed territory, the medium FSM-SVM-FR values are present on 11.09%, while the high and very high susceptibility accounts around 54.07% of the Fahs -Anjra province and Tanger-Asilah prefecture.

RF-Fr
After the training of the RF-FR ensemble, its performance was measured with the help of several statistical metrics (Table 4). Thus, in terms of training sample, the accuracy of 0.997 was the highest between all the applied models. In this case, the precision, sensitivity and specificity have the same values as the Accuracy while K-index is equal to 0.994. The involvement of validating sample in the assessment of models performance revealed that RF-FR achieved also the best results highlighted by the following values: Accuracy = 0.904, Precision = 0.914, Sensitivity = 0.896, Specificity = 0.912 and K-index = 0.808.
Once the model performance was assessed, the mapping of the Fire Susceptibility Index was performed (Fig. 9). Thus, the very low susceptibility has 18.68% of the total study area, the low susceptibility is present on 12.47% of the analyzed zone, the medium values are encountered on approximately 16.79%, while the areas exposed in a high and very high degree to fire occurrence can be found on around 52.05% of the study area.

Results validation
The results validation was done by using the ROC Curve. In this regard, the Success Rate, constructed with the training data, and Prediction Rate, constructed with the validating data were employed and their plots are shown in Fig. 10a and 10b respectively. Thus, in terms of success rate, the highest AUC value (0. Given the fact that all the models, for both success and prediction rate, achieved AUC values higher than 0.8, we can assume that the applied algorithms were performant concerning the identification of areas susceptible to forest fire occurrence.

The importance of conditioning factors
Machine learning models have recently become the focus of intense interest of researchers in several environmental hazards studies. Modeling of Forest fire is a complex issue (Tien Bui et al., 2017), its propagation is commonly linked to several factors, such as climate, topography, human, and vegetation factors. However, at present, there is no agreement for the selection of explicative variables for forest fire susceptibility.
In this research, we showed that 10 influencing parameters (i.e. slope, aspect, elevation, distance to road, distance to residential, normalized difference vegetation index, land use, rainfall, temperature, and wind speed) could be used to implement FR-MLP, FR-LR, FR-CART,  FR-SVM and FR-RF for mapping forest fire susceptibility in the north of Morocco. These models classify the study area into 5 levels of risk. The outcomes from this study could be served as a reference to identify areas which require emergency intervention.
As mentioned before, one of the advantages of RF method is its ability to estimate the importance of the features used for modeling. Fig. 11 shows the result of evaluating the conditioning factors for fire forest modeling in the study area using RF method. It is obvious that NDVI is the most important conditioning factor, following by distance to roads, rainfall and wind speed. Slope has the least importance among the conditioning factors. This finding support other previous studies (e.g. Holsinger et al., 2016;Tien Bui et al., 2019, 2016. However, (Pourghasemi et al., 2020) emphasized that land use, annual mean rainfall, slope, and elevation are the best factors for the forst fire susceptibility. Similar to this finding, in a study developed by (Pourtaghi et al., 2016) the most important factors were soil type, average annual temperature, and land-cover for forest fire susceptibility.
In this paper, the RF -FR model achieved the highest accuracy among other models, these findings are consistent with prior research (Pourtaghi et al., 2016;Tien Bui et al., 2017). These authors stated that this outperformance of RF is due to its advantages to handle and to deal with nonlinearities between variables, another essential advantage for RF is that it is easy to use and processing large dataset. Indeed, Random Forest algorithm has been employed with promising results in other prior research related to natural hazards, for example for flood susceptibility mapping, (Costache et al., 2020c), in their paper proved that RF achieved high accuracy among other algorithms applied in their case study.
As discussed above, FR is Followed by SVM-FR model in term of accuracy, the SVM, in a way that it can be applied for both classification and regression task, has numerous advantages such as its capacity to fix complexity of overfitting and its applicability to handle smaller dataset with high dimensionality . Due to these advantages, SVM has been widely used in natural hazards assessment proving successful results. Also, (Tien Bui et al., 2017) reported the predictive performance for both RF and SVM for forest fire prediction. MLP showed effective results due to the ability to be involved in processing non-linear datasets (Huang et al., 2020).
CART has interesting advantages, Nevertheless, it is worthy to note that the CART algorithm is rarely present in the literature review for forest fire susceptibility and in general for moderate prediction purposes .
LR achieved also good accuracy. similarly, it has shown successful results in other fields, for example, flood mapping (Costache et al., 2020a) and landslide susceptibility (Nhu et al., 2020). Overall, the hybrid models based on ML algorithms and bivariate statistical methods provided promising and more performance results in comparison to individual models (Costache and Tien Bui, 2019;Pham et al., 2018b).
To sum up, selecting a suitable ML algorithm for forest fire models like for any other environmental hazards assessment is a challenging task because each model presents its drawbacks and advantages (Tien Jaafari et al., 2019a). Therefore, all the aforementioned developed models in this study are recommended for forest fire susceptibility studies scope.
Forest ecosystems in the north of Morocco are dramatically influenced by a number of natural and anthropogenic disturbance, for instance, in a recent work, (Salhi et al., 2020) reported the fragility of soil quality, hence, a situation which requires a need of anti-erosion activities. Also, the major part of landscape areas in the north of Morocco is exploited for cannabis plantations (El Motaki et al., 2019) due to its economic value. All these factors increase the widespread forest fire disaster. This paper highlights the urgent need for government agencies to act for maintaining forest ecosystem prevention and planning mitigation strategies. Despite a huge effort provided by a public agency to cope with this situation even though dealing with local conflicts.
Finally, looking to an ecosystem forest sustainable for our future, such efforts could be a powerful tool and they must be accomplished: i-a public alarm system could be implemented to secure the urgent situation for the forest ecosystem. ii-delivering and communicating the results of the forest fire to the public community could be a pathway to reduce the influence of population behaviour.

Conclusion
Forest fire is one of the main causes of ecosystem service losses and deterioration of human life. Thus, it is considered one of the pressing issues. In many parts of the world governments' policy on forest protection and management is challenging to meet their goals. With these issues in mind, this paper is motivated by the need for preparation predictive models of the forest fire risk and identification of areas requiring immediate management actions with a goal of "Act Now before Tomorrow". In this paper, five hybrid models have been developed namely, RF-FR, SVM-FR, MLP-FR, CART-FR, and LR-FR for forest fire modelling based on 510 forest fire locations and a total of 10 forest fire conditioning factors (elevation, slope, aspect, distance to roads, distance to residential, land use, Normalized Difference Vegetation Index (NDVI), rainfall, temperature, and wind speed). Results of models proposed indicate that RF-FR achieved the highest performance (AUC = 0.989), followed by SVM-FR (AUC = 0.959), MLP-FR (AUC = 0.858), CART-FR (AUC = 0.847), LR-FR (AUC = 0.809) in the forecasting of forest fire, respectively. In light of these results, the maps generated here could be a very effective management tool for analyzing and developing forest fire management and strategies. Also, the methodology can further be extended to other areas with a similar issue.  Fig. 11. The importance of conditioning factors.