Pattern Extraction and Rule Generation of Forest Fire Using Sliding Window Technique

Patterns can be extracted from historical data and used to make predictions of future events. Such predictions are useful to support decision makers in various areas. In this study, the sliding window technique is used to reveal forest fire patterns that relate four meteorological conditions (temperature, relative humidity, wind speed and rainfall) to burnt area size. The extracted patterns are then grouped by the size of the burnt area. Rules are then generated, resulting in eight distinct patterns of meteorological conditions that could predict the size of a forest fire. Experimental results showed that the extracted patterns produced good prediction accuracy.


Introduction
Forest fire plays an important role in shaping forest ecosystems all over the world. For example, the Mountain Ash forests in Australia depend on fire for the regeneration of their ecosystems. There is increasing evidence that the link between climate change and the El Nino phenomenon is causing an escalation in the number and size of forest fires. New evidence from the Amazon also shows that tropical forests that have burned before are more susceptible to future burning. Thus, there is an increased possibility that wildfire episodes will occur more frequently, and at a magnitude that the tropical forest ecosystem cannot endure. Scientists believe that the entire Amazon would be threatened, and that this will affect biodiversity and climate change globally (Rowell & Moore, 2000). Forest fires can cause massive destruction. A severe drought throughout the Iberian Peninsula in 2005 caused massive fires in northern and central Portugal, destroying more than 150 000 ha of land and killing 15 people (Voigt et al., 2007).
Fire, being a chemical reaction, needs heat, oxygen, and fuel for ignition and spread. A forest fire happens when an uncontrollable fire starts and spreads to natural vegetation; thus the ignition cause and the environmental conditions affect the probability of occurrence. Factors that influence forest fire include the distribution and quality of fuel, weather, topography and human factors (Orozco, 2008). In addition, fires are easily ignited during drought seasons. Umamaheshwaran et al. (2007) applied image mining techniques to satellite images to produce a forest fire prediction model. Laneve and Cadau (2007) assessed the quality of a fire hazard prediction model based on the Fire Potential Index and SEVIRI/MSG satellite images. Meanwhile, it is well known that weather conditions and fire risk are related, and that weather conditions are a crucial factor in determining whether a wildfire will occur and how far it will spread (Wells et al., 2007). A recent statistical survey indicated that weather and forest Fire Weather Index (FWI) components are highly influential in forest fires in Portugal (Carvalho et al., 2008).
The prediction of natural hazards using satellites began in 2000, and it has changed the way natural disasters are assessed (Gillespie et al., 2007). The use of satellite images in fire prediction and detection can be seen in Umamaheshwaran et al. (2007) and Laneve and Cadau (2007). While satellite images provide almost real-time data for analysts to predict natural hazards, atmospheric interference such as clouds, smoke and haze causes distortions in the images retrieved. Apart from that, spatial satellite images have low resolution. In addition, using satellites requires high equipment and maintenance costs (Cortez & Morais, 2007).
Earlier studies have shown that weather and forest Fire Weather Index (FWI) components have a significant influence on forest fire (Cortez & Morais, 2007; Carvalho et al., 2008). Cortez and Morais (2007) concluded that further research is needed to confirm that direct weather conditions (i.e. temperature, rain, relative humidity and wind speed) are preferable to accumulated values in predicting forest fire behaviour.
A previous study by Cortez and Morais (2007) showed that predicting forest fire burnt area using a Support Vector Machine (SVM) fed with meteorological data (i.e. rain, wind, temperature and humidity) gives the best performance compared with four other data mining approaches (Multiple Regression, Decision Tree, Random Forest, Neural Network). In their study, the five data mining approaches were applied to data consisting of spatial, temporal, meteorological and Fire Weather Index (FWI) data from Montesinho Natural Park in the northern region of Portugal. The proposed solution, however, achieved a lower predictive accuracy for large fires. Thus, further exploratory research is required on predicting burnt forest area using only meteorological data.
Data mining can be used to discover interesting salinity and temperature patterns, and the patterns can be used to predict future events (Huang et al., 2008). By using association rule mining, spatial-temporal patterns revealing the salinity and temperature variations can be discovered. Furthermore, to evaluate the generated rules, which have an antecedent and a consequent, an important measurement is proposed. The proposed process of mining association rules consists of several steps: transforming quantitative attributes, discovering frequent inter-transaction itemsets, generating association rules, and identifying insight association rules. During the frequent inter-transaction itemset discovery, the sliding window technique is used so that only rules within the window are considered during analysis, thus minimizing the effort spent on mining uninteresting rules. The proposed method could generate rules about salinity and temperature patterns, but the research did not evaluate the performance of the proposed important measurement.
This research has investigated patterns of weather in relation to the size of forest fire. The patterns are then classified and rules that can be used for decision making are formulated. Section 2 presents the research approach while Section 3 describes the extraction of the forest fire patterns. The rule generating activity is explained in Section 4 and concluding remarks are presented in Section 5. Figure 1 depicts the approach that has been used in conducting the research. The sliding window technique used by Huang et al. (2008) in their study to discover rules of ocean salinity and temperature variations has been adopted and adapted in this study.

Research Approach
Forest fire data, which consist of forest fire and meteorological information, were collected from the UCI Machine Learning Repository and from the study by Cortez and Morais (2007). In the data preparation stage, the attributes are described and missing values are handled in one of three ways: records with missing values are removed, missing values are replaced with estimated values, or missing values are ignored during analysis. After the cleaning stage, the data are transformed by discretization, which changes the data type from continuous to categorical; this is a significant task in the data mining process. A suitable discretization technique can significantly improve the performance of the data mining technique. Pattern extraction and rule generation using the sliding window technique are performed after the data preparation stage.
The data contain forest fire occurrences and forest Fire Weather Index (FWI) components in Montesinho Natural Park, located in the northern region of Portugal, from 2000 until 2003. The data were integrated with weather observations (wind speed, temperature, relative humidity and rainfall) obtained from Braganca Polytechnic Institute. All of the fire occurrences were within the Montesinho Natural Park. The park is divided into nine distinct X and nine distinct Y locations by placing a 9x9 grid on the map, giving a total of 81 combinations of X and Y. The forest fire data consist of 517 records, of which 270 (52%) are forest fire occurrences with a burnt area of at least 0.01 ha and the other 247 (48%) are forest fire occurrences with a burnt area of less than 0.01 ha. Five attributes have been included in this study: temperature, relative humidity (RH), wind speed, rainfall and burnt area size.

Forest Fire Pattern Extraction
The original forest fire data were recorded as continuous values. The data were therefore transformed so that the continuous values became categorical. This is a significant task in the data mining process, and a suitable discretization technique can significantly improve the performance of the data mining technique. Pattern extraction requires the data to be in categorical form, so discretization must be performed before the analysis can begin. The continuous values of the attributes temperature, relative humidity, wind speed and rainfall are transformed into categorical form using the ranges applied by practitioners (Kottlowski, 2006; Pearce, 2008).
The values for temperature are transformed into six categories as shown in Table 1, while the values for relative humidity are transformed into three categories as presented in Table 2. Temperatures are measured in degrees Celsius and classified from 'very cold' to 'extremely hot', while relative humidity is classified from 'low' to 'very high'. Codes are assigned to the categories to facilitate the analysis of patterns with the sliding window technique.
Tables 3 and 4 display the categories used for wind speed and rainfall. There are thirteen categories for wind speed, ranging from 'calm' to 'hurricane', and wind speed is measured in km/hr. Six categories are used for rainfall, which is measured in mm/hr and categorised from 'very light rain' to 'extreme rain'. Again, codes are assigned for the purpose of pattern analysis.
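The mapping from a continuous reading to a coded category can be illustrated with a minimal sketch. The bin edges and codes below are placeholders chosen only for illustration; they are not the ranges of Tables 1 to 4, and the names `TEMP_EDGES`, `TEMP_CODES` and `discretize` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical bin edges (degrees Celsius) and codes for six temperature
# categories; the ranges actually used in the study are those of Table 1.
TEMP_EDGES = [0, 7, 14, 21, 28]
TEMP_CODES = ["T1", "T2", "T3", "T4", "T5", "T6"]


def discretize(value, edges, codes):
    """Return the code of the interval that contains value.

    Code i covers edges[i-1] <= value < edges[i]; the first and last
    intervals are open-ended.
    """
    if len(codes) != len(edges) + 1:
        raise ValueError("need exactly one more code than edges")
    return codes[bisect_right(edges, value)]


print(discretize(24.5, TEMP_EDGES, TEMP_CODES))
```

The same function could be reused for relative humidity, wind speed and rainfall by passing a different pair of edge and code lists.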
The data were divided into training and validation sets after the transformation process was complete. 80% of the data (414 records) were allocated for training and the remaining 20% (103 records) for validation. The sliding window technique is used to extract patterns from the data. Each window slice captures a pattern that consists of temperature, relative humidity, wind speed and rainfall together with the associated burnt area. The sliding window captures data patterns as it moves down the data set, and the captured patterns are recorded. Figure 2 depicts the process of extracting patterns using the sliding window technique.
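A minimal sketch of this extraction step is given below. The window width is not stated in the text, so a width of one record is assumed as the default; the function name `extract_patterns` and the category codes in the sample records are hypothetical.

```python
from collections import Counter


def extract_patterns(records, window=1):
    """Slide a window of `window` consecutive records down the data set
    and count each distinct sequence of coded records it captures."""
    counts = Counter()
    for start in range(len(records) - window + 1):
        pattern = tuple(records[start:start + window])
        counts[pattern] += 1
    return counts


# Each record is (temperature, humidity, wind, rainfall, burnt-area class),
# already discretized; the codes are made up for illustration.
training = [
    ("T5", "H1", "W2", "R0", "M"),
    ("T5", "H1", "W2", "R0", "M"),
    ("T4", "H2", "W3", "R0", "M"),
]
patterns = extract_patterns(training)
for pattern, occurrences in patterns.items():
    print(pattern, occurrences)
```

With a width of one, each distinct combination of the four coded conditions and the burnt-area class becomes one pattern, and its count is its number of occurrences in the training data.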
A total of 32 forest fire patterns were obtained in this stage (see Table 5). Some patterns occurred many times, such as patterns number 22 and 31, which occur 97 times and 88 times respectively. However, there are also patterns which occurred only once (see patterns number 1, 5, 10, 14, 16, 17, 19, 24 and 26).
All 32 patterns obtained from the training data were validated using the validation dataset. If a pattern correctly predicts the fire size, the pattern is given a count for "True Positive". Otherwise, the pattern is given a count for "True Negative". The percentage accuracy of each pattern is calculated by dividing its total number of "True Positive" counts by the total number of validation records matched against that pattern. A total of 18 patterns (56%) were validated, while the remaining 14 patterns (44%) could not be validated because they were not found in the validation data. However, this does not affect the results, as the total number of occurrences of each of these 14 patterns is small. Most of them (8 patterns) have only 1 occurrence, while 4 patterns have 2 occurrences, 1 pattern has 4 occurrences and 1 has 5 occurrences (see Table 5). The 18 patterns that were validated have high occurrences, and they are summarized in Table 6. Each pattern is given an ID to simplify the process of eliminating patterns, classifying patterns and generating rules. Table 6 shows that, out of the 18 patterns, three have less than 50% accuracy (pattern ID 1, 7 and 33). These were eliminated, so only 15 patterns were used in the classification and rule generation processes.
In the pattern classification stage, the patterns were grouped according to the size of the burnt area (see Table 7). There are 14 patterns under the target class "M", representing medium sized fires (1-500 ha), and 1 pattern under the target class "L", representing large sized fires (>500 ha). There is no pattern associated with the target class "S", representing small sized fires (<1 ha).
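The per-pattern accuracy calculation and the 50% cut-off can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually implemented: each pattern is assumed to map one combination of coded conditions to a single predicted class, and the function name, codes and sample records are hypothetical.

```python
from collections import defaultdict


def pattern_accuracy(patterns, validation):
    """Return {conditions: accuracy in percent} for every pattern whose
    conditions appear at least once in the validation records.

    patterns:   dict mapping a conditions tuple to its predicted class
    validation: list of (conditions tuple, actual class) pairs
    """
    hits = defaultdict(int)   # correct predictions per pattern
    seen = defaultdict(int)   # validation records matched per pattern
    for conditions, actual in validation:
        if conditions in patterns:
            seen[conditions] += 1
            if patterns[conditions] == actual:
                hits[conditions] += 1
    return {c: 100.0 * hits[c] / seen[c] for c in seen}


p1 = ("T5", "H1", "W2", "R0")
p2 = ("T4", "H2", "W3", "R0")
p3 = ("T1", "H3", "W1", "R1")   # never appears in the validation data
patterns = {p1: "M", p2: "L", p3: "S"}
validation = [(p1, "M"), (p1, "M"), (p1, "S"), (p2, "M")]

accuracy = pattern_accuracy(patterns, validation)
kept = [c for c, pct in accuracy.items() if pct >= 50.0]
print(accuracy, kept)
```

Patterns that never occur in the validation records, such as `p3` here, receive no accuracy value at all, which corresponds to the patterns that could not be validated.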

Rule Generation
In the rule generation stage, the patterns that have been extracted and classified are translated into rules that can be used for predicting fire behaviour according to burnt area size. First, the categories for the four attributes are converted into their interval representations (see Table 8).
From the interval representation, it can be seen that some intervals from two different patterns can be merged into one. Patterns with ID 3 and 4 have the same values for all attributes except wind speed, so these two patterns can be merged. Similar merging can be performed on patterns with ID 5 and 6. Thus, patterns with ID 3, 4, 5 and 6 are merged to form one rule. Merging patterns with ID 9, 10, 11 and 12 produces another rule. As a result of merging, 8 rules are generated, as depicted in Table 9.
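One possible way to automate this merging is sketched below. It assumes that two patterns may be merged when they predict the same size class, agree on every attribute interval except one, and have touching or overlapping intervals on that one attribute. The merge criterion, the function names and the interval values are illustrative assumptions, not the contents of Table 8.

```python
def try_merge(a, b):
    """Merge two patterns that differ in exactly one attribute interval,
    provided the two intervals touch or overlap; otherwise return None."""
    if a["size"] != b["size"]:
        return None
    differing = [k for k in a if k != "size" and a[k] != b[k]]
    if len(differing) != 1:
        return None
    key = differing[0]
    (lo1, hi1), (lo2, hi2) = sorted([a[key], b[key]])
    if lo2 > hi1:              # gap between the intervals
        return None
    merged = dict(a)
    merged[key] = (lo1, max(hi1, hi2))
    return merged


def merge_patterns(patterns):
    """Repeatedly merge pairs of patterns until no pair can be merged."""
    rules = list(patterns)
    changed = True
    while changed:
        changed = False
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                merged = try_merge(rules[i], rules[j])
                if merged is not None:
                    rules[i] = merged
                    del rules[j]
                    changed = True
                    break
            if changed:
                break
    return rules


def make(temp, wind):
    # Hypothetical intervals: Celsius, percent, km/hr, mm/hr.
    return {"temp": temp, "rh": (0, 50), "wind": wind,
            "rain": (0, 0.25), "size": "M"}


four = [make((14, 21), (1, 5)), make((14, 21), (5, 11)),
        make((21, 28), (1, 5)), make((21, 28), (5, 11))]
rules = merge_patterns(four)
print(rules)
```

In this example the first two patterns merge on wind speed, as do the last two, and the two results then merge on temperature, leaving a single rule.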
The first rule, obtained from pattern ID 17, can be interpreted as: "when the temperature is between 21 and 28 degrees Celsius, relative humidity is between 0 and 50 percent, wind speed is between 1 and 5 km/hr and rainfall is less than 0.25 mm/hr, the predicted forest fire size will be large, i.e. more than 500 hectares". From the obtained rules, it is observed that whenever there is a forest fire, the rainfall is minimal or none, i.e. below 0.25 mm/hr. The pattern with ID 17, from which the first rule is generated, has a very high occurrence (see Table 5, record 31), and the validation shows that this pattern is highly related to forest fire behaviour and could cause a large fire. Another pattern highly related to forest fire behaviour (see Table 5, record 22) is the pattern with ID 11, which is later used to form the fifth rule. It can also be concluded that high temperature is significant to forest fire, because six of the eight rules show a temperature in the range of 14 to 35 degrees Celsius.
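The first rule can be written as a simple predicate over a weather observation, as sketched below. The interval bounds are taken from the interpretation above; treating "between" as inclusive at both ends is an assumption, and the names `RULE_1`, `matches` and `predict` are hypothetical.

```python
RULE_1 = {
    "temp": (21.0, 28.0),   # degrees Celsius
    "rh": (0.0, 50.0),      # percent
    "wind": (1.0, 5.0),     # km/hr
    "rain_below": 0.25,     # mm/hr, strict upper bound
    "size": "L",            # large fire, more than 500 ha
}


def matches(rule, obs):
    """True if the observation satisfies every condition of the rule."""
    in_ranges = all(rule[k][0] <= obs[k] <= rule[k][1]
                    for k in ("temp", "rh", "wind"))
    return in_ranges and obs["rain"] < rule["rain_below"]


def predict(rules, obs):
    """Return the size class of the first matching rule, or None."""
    for rule in rules:
        if matches(rule, obs):
            return rule["size"]
    return None


dry_day = {"temp": 25.0, "rh": 30.0, "wind": 3.0, "rain": 0.0}
wet_day = {"temp": 25.0, "rh": 30.0, "wind": 3.0, "rain": 0.5}
print(predict([RULE_1], dry_day), predict([RULE_1], wet_day))
```

An observation that matches no rule yields no prediction, which here is represented by `None`.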

Conclusion
Patterns that relate the size of a forest fire to the meteorological attributes temperature, relative humidity, wind speed and rainfall were obtained using the sliding window technique. These patterns showed that different combinations of the four meteorological conditions affect the size of forest fires. The rules generated from the obtained patterns can be used to identify the size of the burnt area of a forest fire. By knowing the behaviour of a potential fire, the decision maker can plan the management of the fire more effectively. Effective prediction of forest fire could increase the efficiency of fire management, thus saving lives and natural resources. Future studies may combine a fuzzy technique with the sliding window approach in extracting the forest fire patterns, as the classification of meteorological attributes is very subjective and depends on experts.