Uni-temporal Sentinel-2 imagery for wildfire detection using deep learning semantic segmentation models

Abstract Wildfires are common disasters whose frequency and severity are increasing with climate change, with long-lasting ecological, social, and economic effects. Since Earth observation (EO) satellites were launched into space, remote sensing (RS) has become an efficient technique used in agriculture, environmental protection, geological exploration, and wildfire monitoring. The growing number of EO satellites orbiting the Earth provides huge amounts of data, such as Sentinel-2 with its Multi Spectral Instrument (MSI) sensor. Using uni-temporal Sentinel-2 imagery, we propose a workflow based on deep learning (DL) semantic segmentation models to detect wildfires. In particular, we created a new large wildfire dataset suitable for semantic segmentation models. We tested our dataset using DL models such as U-Net, LinkNet, DeepLabV3+, U-Net++, and Attention ResU-Net. The results are analysed and compared in terms of the F1 score, the intersection over union (IoU) score, precision and recall, and the training time of each model. The best results were achieved by U-Net with the ResNet50 encoder, with an F1-score of 98.78% and an IoU of 97.98%, and we developed it into a pre-trained DL Package (DLPK) model that can detect and monitor wildfires from Sentinel-2 images automatically.


Introduction
Wildfire seasons have become more widespread due to climate change, resulting in new dynamic scenarios. Knowing where, how big, and how often wildfires occur is vital to managing emergency response activities, determining economic and ecological damage, and assessing recovery. Since Earth observation (EO) satellites were launched into space, remote sensing has become the most effective way to monitor wildfires in a timely manner, both locally and globally (Chuvieco et al. 2019). The European Commission's new data policy, established in partnership with the European Space Agency (ESA), offers unrestricted access to the high-resolution, multitemporal, and multispectral data collected by the Sentinel-2 satellites (Drusch et al. 2012). On a global scale, satellite imagery offers valuable data about the Earth at minimal cost and time. In addition to better data quality, temporal resolution has improved significantly with the Sentinel-2 satellite imagery series (Drusch et al. 2012). As a result, remote sensing has been used in many applications, including wildfire monitoring (Wang et al. 2022), flood mapping (Kalantar et al. 2021), and damage mapping (ElGharbawi and Zarzoura 2021). High-resolution datasets have spawned numerous methods for wildfire mapping in the past few years (Barboza Castillo et al. 2020). These methods concentrate primarily on change detection by generating curated features. Wildfire detection using deep learning (DL) methods has attracted increasing attention and become a trending topic (Zhao et al. 2022).
The main objectives of this work are:
- Creating a large dataset of Turkey's wildfires using Sentinel-2 multiband images suitable for DL semantic segmentation models.
- Conducting a series of experiments to determine the loss function best suited to the DL models tested on our dataset.
- Testing our dataset with a series of experimental semantic segmentation models.
- Developing a DL model package (DLPK) that can detect any wildfire from Sentinel-2 imagery to support decision-making.

Related work
Since EO satellites were launched into space, remote sensing (RS) has become a more efficient technique that can be used in agriculture, environmental protection, geological exploration, and wildfires. Spectral indices are the most commonly used approaches in remote sensing to characterize wildfire and burn severity (Key and Benson 2006), along with spectral unmixing and radiative transfer models (Chuvieco et al. 2007), which are used with multispectral or hyperspectral data. Wildfires can be detected by sensors such as the MSI on the Sentinel-2 satellite, which has multispectral bands including band 02 (visible), band 08 (near-infrared [NIR]), and band 12 (shortwave infrared [SWIR]). For instance, burnt areas absorb more NIR radiation than unburnt areas, whereas they reflect more radiation in the visible and SWIR bands (Quintano et al. 2011). Therefore, many spectral indices have been proposed to detect burnt areas, such as the Normalized Burn Ratio (NBR) (Key and Benson 2006), the Relative Differenced NBR (RdNBR) (Cardil et al. 2019), and the Burned Area Index for Sentinel-2 (BAIS2) (Filipponi 2018). These indices can be differenced between pre-wildfire and post-wildfire satellite images to delineate burned areas. However, these methods require cloud-free satellite images. The threshold levels are often set based on appearance, land type, and the amount of tree cover (Loboda et al. 2007). The problem with these methods is that they do not work well under the varying weather conditions in which satellite images are taken. Also, using indices to detect the damage usually requires manual or semi-manual methods, with thresholds that depend on the soil type and cannot be easily set. Nonetheless, frequent issues must be resolved when using satellite imagery for wildfire mapping, such as atmospheric opacity, which is a common problem (Nolde et al. 2020). Fire smoke and clouds make it impossible to observe the burned areas, and the shadows cast by clouds may even lead to false detections. In addition, several problems arise from the characteristics of the sensors. For instance, it has been noted that sensors with a coarse resolution tend to underestimate the size of burned areas, mostly when the fires are small and sporadic (Chuvieco et al. 2019). Typically, burned areas do not cover every pixel in an area; instead, they are mixed with other land cover types within a single pixel in terms of spatial and spectral aggregation (Laris 2005). We used Sentinel-2 satellite imagery, whose high spatial resolution (10 m) compared with other satellite imagery such as Landsat (30 m) and MODIS (250-500 m) reveals more area detail, so that wildfire borders can be detected easily. Still, Landsat and MODIS satellite images have the advantage of thermal bands that are important for measuring land surface temperature. DL algorithms are able to automatically detect object properties at various scales without requiring additional human input for certain hyper-parameters (Reichstein et al. 2019). When using convolutional neural networks (CNNs) for semantic segmentation, spatial information must be preserved to classify each image pixel. Autonomous cars, social interaction, robotic systems, health research, and precision farming (Hu et al. 2021) are all examples of image-processing applications that are becoming critical. Semantic segmentation algorithms have also been applied to 2D and 3D satellite image scenes (Ma et al. 2019). Fully convolutional networks were trained on very-high-resolution (VHR) optical satellite imagery together with Sentinel-2 and SAR data (Wurm et al. 2019). Using EO images with a low spatial resolution, a DL model was able to map the burnt area (Pinto et al. 2020). Deep convolutional autoencoders (U-Net and ResUnet) were applied to uni-temporal Landsat images, with a proposed sample window size of 256 × 256 pixels for DL model training (Langford et al. 2018). Sentinel-2 satellite images have been widely used in a variety of remote sensing applications, including cloud masking (Kristollari and Karathanassi 2020), urban change detection (Papadomanolaki et al. 2019), land use and land cover classification (Helber et al. 2019), marine debris detection (Kikaki et al. 2022), human settlement mapping (Corbane et al. 2021), dam detection (Balaniuk et al. 2020), smoke classification (Wang et al. 2022), and wildfire detection and monitoring (Alencar et al. 2022; Seydi et al. 2022). The availability of training data is a significant obstacle in developing a DL segmentation model for burned areas. DL analysis uses algorithms that improve continuously, but high-quality data is required for these models to work efficiently. The accuracy of the data needed to solve a specific problem, known as features, is critical to the learning outcome. Automatic, semi-automatic, and manual (human intervention) methods are available for creating datasets. However, automatically generated datasets suffer from limited precision: even though many models are available for automatic creation, they cannot achieve a high level of accuracy. As a result, we created a manual dataset for Turkey's wildfires in this work.
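The index-based detection described above can be sketched in a few lines. The following is a minimal NumPy illustration of NBR and its pre/post-fire difference (dNBR); the reflectance values and the smoothing constant `eps` are illustrative assumptions, not values from the paper:

```python
import numpy as np

def nbr(nir: np.ndarray, swir: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir + eps)

def dnbr(nir_pre, swir_pre, nir_post, swir_post):
    """Differenced NBR between pre- and post-fire reflectance."""
    return nbr(nir_pre, swir_pre) - nbr(nir_post, swir_post)

# Toy reflectances: burning lowers NIR and raises SWIR,
# so dNBR is strongly positive over the burned pixel.
nir_pre = np.array([0.40, 0.42])
swir_pre = np.array([0.10, 0.12])
nir_post = np.array([0.15, 0.41])   # first pixel burned
swir_post = np.array([0.30, 0.13])
print(dnbr(nir_pre, swir_pre, nir_post, swir_post))
```

A threshold on dNBR would then separate burned from unburned pixels, which is exactly the thresholding step the text notes is hard to set robustly.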
The main contributions of this work are:
- Proposing a complete workflow to detect and monitor wildfires.
- Creating a large manual dataset of Turkey's wildfires using Sentinel-2 multiband images suitable for DL semantic segmentation models.
- Conducting a series of experiments to evaluate the efficiency of DL semantic segmentation models for monitoring and detecting wildfires.
- Developing a pre-trained DLPK that can detect wildfires from Sentinel-2 imagery.

Dataset
In this work, our dataset consists of two parts: images and masks. Table 1 gives a general overview of the dataset's specifications, which are explained in more detail in each subsection.

Sentinel-2 multispectral data
Sentinel-2 is a wide-swath, high-resolution, multispectral imaging mission supporting land monitoring studies. It is based on two identical satellites (Sentinel-2A and Sentinel-2B) that move in a sun-synchronous orbit at an average altitude of 786 km. Sentinel-2A was launched in 2015 with the MSI sensor. Sentinel-2A products come at two processing levels (Level-1C and Level-2A). The Level-1C product includes top-of-atmosphere (TOA) reflectance measurements, the parameters for converting them into radiances, and multispectral registration at the sub-pixel level. Level-2A provides orthorectified bottom-of-atmosphere reflectance, also with sub-pixel multispectral registration (Gascon et al. 2017).

Sentinel-2 imagery for wildfire
In the solar domain, the spectral wavelength ranges from 0.4 to 2.5 μm, covering visible light (red, green, and blue), NIR, and SWIR. Numerous studies have shown that the NIR and SWIR spectral bands are more sensitive to fire-induced changes in vegetation and soil, while the visible bands are less sensitive to fire effects (Roy et al. 2019), as shown in Figure 1. The reduction in moisture leads to an increase in SWIR reflectance. In contrast, the reduction in leaf area index and chlorophyll leads to a decrease in NIR reflectance after burning (Chuvieco et al. 2019); these bands are thus well suited to false-colour images that emphasize wildfire areas.

Preparing dataset

Study area
In July and August 2021, more than 200 forest fires burned 1700 square kilometres in the Mediterranean Region of Turkey (Kılıçaslan 2022), in the country's worst wildfire season in history. On 28 July 2021, wildfires broke out in Manavgat, Antalya Province, at a temperature of approximately 37 °C (99 °F). As a result of the fires' impact on forests and residential areas, several neighbourhoods and villages were evacuated. According to the most recent data from the Disaster and Emergency Management Presidency (AFAD) and the Turkish Red Crescent Society (https://www.kizilay.org.tr), many animals died in the wildfires. On 31 July 2021, Sentinel-2 captured images of wildfires near the coastal towns of Alanya and Manavgat (Eke et al. 2022).

Preparing Sentinel-2 images
The Sentinel-2 images were obtained from the official website of the United States Geological Survey (USGS) (Sentinel-2 Missions: Sentinel-2 Levels of Processing; available online: https://earthexplorer.usgs.gov/). The scenes were selected one by one from Sentinel-2 Level-1C imagery so that the affected area had less than 1% cloud cover, and each image was preprocessed using SNAP software. As shown in Table 2, five images from September 2021 were selected from several provinces in Turkey with various climatic zones and ecoregion backgrounds, such as grasslands, settlements, water bodies, and shrubs. Figure 2 shows the study area locations.

Wildfire labelling
The wildfire polygons were labelled using ArcGIS Pro 3.1 software. The labelling was done manually on Sentinel-2 images using band 12 (SWIR), band 8 (NIR), and band 2 (Blue), in which wildfires appear in red. Polygons were drawn as vector data along the borders of the wildfires without labelling any background class, such as grasslands, settlements, water bodies, and shrubs. Finally, small patches of size 128 × 128 were extracted from the Sentinel-2 multiband images and the label data, and the dataset was cleaned by removing any empty or broken patches.
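The SWIR/NIR/Blue band combination used for labelling can be illustrated with a minimal NumPy sketch that stacks the three bands into an RGB composite where burn scars appear reddish. The clipping value and toy reflectances below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def false_colour(b12, b08, b02, clip=0.35):
    """Stack SWIR (B12), NIR (B08), and Blue (B02) into an RGB composite.
    Reflectances are clipped and scaled to [0, 1] for display."""
    rgb = np.stack([b12, b08, b02], axis=-1).astype(np.float32)
    return np.clip(rgb / clip, 0.0, 1.0)

# Toy 2x2 scene: a burned pixel has high SWIR and low NIR -> red dominates.
b12 = np.array([[0.30, 0.05], [0.05, 0.05]])
b08 = np.array([[0.08, 0.30], [0.30, 0.30]])
b02 = np.array([[0.04, 0.04], [0.04, 0.04]])
img = false_colour(b12, b08, b02)
print(img.shape)   # (2, 2, 3)
print(img[0, 0])   # burned pixel: red channel dominates
```

In practice the bands would be read from the Sentinel-2 GeoTIFF rather than defined inline.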

Sentinel-2 Turkey's wildfire dataset
In our wildfire dataset, each image has a spatial resolution of 10 m and consists of thirteen bands. The images are saved in the Universal Transverse Mercator (UTM) coordinate system in GeoTiff format. The dataset has 21,690 images containing burned-area pixels. Each mask is a binary image of the burned area with two categories: the burned area in the foreground and the non-burned area in the background. Each pixel is saved as an 8-bit unsigned integer with a value of 1 for the burned area and 0 for the non-burned area. Tables 3 and 4 show the distribution of images and masks by wildfire area, and Figure 3 depicts image and mask samples from the dataset.
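The patch extraction and empty-patch filtering described above can be sketched as follows. This is a simplified NumPy illustration using non-overlapping tiles and synthetic data, not the exact procedure used in the paper:

```python
import numpy as np

def extract_patches(image, mask, size=128):
    """Tile a (H, W, C) image and (H, W) mask into size x size patches,
    dropping patches whose mask contains no burned pixel."""
    patches = []
    h, w = mask.shape
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            m = mask[r:r + size, c:c + size]
            if m.any():                      # skip empty patches
                patches.append((image[r:r + size, c:c + size], m))
    return patches

# Toy 256x256 scene with 13 bands; the burned area falls in one tile only.
img = np.zeros((256, 256, 13), dtype=np.float32)
msk = np.zeros((256, 256), dtype=np.uint8)   # 8-bit mask: 1 burned, 0 not
msk[10:50, 10:50] = 1
pairs = extract_patches(img, msk)
print(len(pairs))                            # 1
print(pairs[0][0].shape, pairs[0][1].shape)  # (128, 128, 13) (128, 128)
```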

Materials and methods
A complete workflow for wildfire mapping using Sentinel-2 multi-spectral data is demonstrated from the DL perspective, as shown in Figure 4. Training data from Sentinel-2 is fed into the DL models (U-Net, LinkNet, Attention ResU-Net, U-Net++, and DeepLabV3+). The best-trained DL model is then developed into a DLPK file to be used with ArcGIS Pro software for detecting wildfires automatically.
Table 5 shows the five models compared for wildfire detection: LinkNet with interchangeable encoders, U-Net++, U-Net with interchangeable encoders, DeepLabV3+, and Attention ResU-Net.

U-Net model architecture
The U-Net (Ronneberger et al. 2015) architecture, as shown in Figure 5, is a U-shaped network consisting of a contracting (encoder) path on the left side and an expansive (decoder) path on the right side, with each path consisting of four blocks connected via a bridge. The U-Net model is a typical model based on upsampling and deconvolution. The encoder follows the typical architecture of a convolutional network, consisting of the repeated application of two 3 × 3 convolutional blocks, each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling operation with stride 2 to down-sample the input. The decoder consists of upsampling layers and convolutions; it up-samples the encoder output and regenerates it to the input image size. Each decoder step consists of an upsampling of the feature map followed by a 2 × 2 convolution that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3 × 3 convolutions, each followed by a ReLU. Each block thus concatenates feature maps from the corresponding encoder stage. The final 1 × 1 convolution layer outputs the probability that each pixel belongs to the burned area.
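The encoder/decoder shape flow above can be traced for the 128 × 128 patches used in this work with a minimal NumPy sketch (the channel counts are illustrative, and nearest-neighbour upsampling stands in for learned deconvolution):

```python
import numpy as np

def pool2(x):
    """2 x 2 max pooling with stride 2 on a (H, W, C) feature map."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def up2(x):
    """Nearest-neighbour 2x upsampling on a (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Encoder: 128 -> 64 -> 32 -> 16 -> 8 across four pooling steps.
x = np.random.rand(128, 128, 64)
sizes = []
for _ in range(4):
    x = pool2(x)
    sizes.append(x.shape[0])
print(sizes)  # [64, 32, 16, 8]

# One decoder step: upsample and concatenate the matching encoder
# feature map (the skip connection), doubling the channel count.
skip = np.random.rand(16, 16, 64)
d = np.concatenate([up2(x), skip], axis=-1)
print(d.shape)  # (16, 16, 128)
```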

LinkNet model architecture
The LinkNet architecture (Chaurasia and Culurciello 2017) is an encoder-decoder network that focuses on fast prediction. It is a U-shaped architecture with two differences from U-Net. First, it uses a residual module (res-block) instead of the ordinary convolution structure used by U-Net. Second, it fuses deep and shallow features by addition instead of the stacking used by U-Net. This allows LinkNet to guarantee high accuracy and efficient forward propagation. The encoder part of LinkNet can be swapped for ResNets of different depths and representations, so the number of encoder layers can be varied to trade off accuracy against efficiency. ResNet18, one of the lightest ResNets, is the encoder used for the LinkNet architecture. In Figure 6, the abbreviation 'conv' refers to convolution and 'full-conv' to full convolution (Long et al. 2015). In addition, the notation /2 indicates downsampling of a signal by a factor of 2, which can be done with strided convolution, and the notation ×2 indicates upsampling by a factor of 2. Batch normalization is used between each convolutional layer, followed by ReLU non-linearity (Ioffe and Szegedy 2015). In Figure 6(a), the encoder is on the left side of the network, and the decoder is on the right side. As shown in Figure 6(b), the encoder begins by convolving the input image with a stride of 2 and a kernel size of 7 × 7, followed by spatial max-pooling over a 3 × 3 area with a stride of 2.
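The second difference noted above, addition versus stacking, can be shown in two lines of NumPy; the shapes and channel counts here are illustrative:

```python
import numpy as np

enc = np.random.rand(32, 32, 64)   # encoder feature map
dec = np.random.rand(32, 32, 64)   # decoder feature map

# U-Net fuses by stacking: the channel count doubles.
concat = np.concatenate([enc, dec], axis=-1)
# LinkNet fuses by element-wise addition: the channel count is unchanged.
added = enc + dec

print(concat.shape)  # (32, 32, 128)
print(added.shape)   # (32, 32, 64)
```

Keeping the channel count fixed is what makes the additive fusion cheaper, which matches LinkNet's emphasis on fast prediction.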

DeepLabV3+ model architecture
The DeepLab family of network architectures is a series of incremental improvements upon a first architecture called DeepLab, first published in 2014 (Chen et al. 2014), followed by its second version in 2017 (Chen, Papandreou, Schroff, et al. 2017) and its third in the same year (Chen, Papandreou, Kokkinos, et al. 2017). The latest flavour of DeepLab, called DeepLabV3+, arrived in 2018 (Chen et al. 2018).

Attention ResU-Net model architecture
The Attention ResU-Net, as shown in Figure 8, utilizes three similar upsampling residual blocks (SRes) in its feature recovery module in order to achieve accurate positioning and feature recovery. To acquire more comprehensive low-level and high-level information, an attention and squeeze-excitation block (ASE) is included as a horizontal connection, which improves the way that downsampling features and upsampling information are represented. Finally, a SoftMax layer generates the segmentation result.
U-Net++ model architecture
U-Net++ (Zhou et al. 2018), also known as Nested U-Net, is an extension of the U-Net architecture designed to improve segmentation accuracy. As shown in Figure 9, U-Net++ with nested dense skip pathways is an effective way to obtain multi-scale feature maps from multi-level convolution pathways. The standard U-Net++ architecture consists of downsampling and upsampling modules, convolution units, and skip connections between convolution units. The main difference between the U-Net++ and U-Net architectures is the skip pathways in U-Net++, which use the dense connection method (Huang et al. 2017). In the U-Net architecture, the encoder's feature maps are sent directly to the decoder, while in U-Net++ the feature maps pass through a dense convolution block whose number of convolution layers depends on the pyramid level.

DLPK model
A DLPK is a pre-trained model package used for image classification and object detection. The DLPK model wraps files with different extensions depending on the framework used for the trained model, such as .h5 for Keras and .pb for TensorFlow, and can be saved locally or stored on a portal. The DLPK model is able to detect wildfires from Sentinel-2 images automatically using standard DL environments or any GIS software that supports DL models. In this work, the DLPK model is pre-trained using Keras and based on the model that achieves the highest results on our dataset.

DLPK model data
In order to evaluate the performance of the DLPK model, several wildfires were chosen under climate conditions similar to Turkey's, as shown in Table 6. The Greek wildfire season in 2021 was the worst in 13 years: 130,000 ha of land were burned, and five wildfires started in early August (Giannaros et al. 2022). In 2022, many countries were also affected by wildfires, such as Spain, Croatia, and the United States, and a single wildfire event was selected for each country as a case study.

Accuracy assessment
Assessing the detection's accuracy is a crucial part of mapping wildfire areas. It can be analysed by comparing the result to a reference mask with standard measurement indices, both visually and numerically. Different metrics are often used to judge the accuracy of the segmentation results, such as the F1 score, also known as the Sorensen dice coefficient (SDC), shown in Equation (1), and the intersection over union (IoU), also known as the Jaccard Index, shown in Equation (2):

F1 = 2TP / (2TP + FP + FN)    (1)

IoU = TP / (TP + FP + FN)    (2)

We also calculated the precision, shown in Equation (3), which describes how many of the pixels detected as burned are truly burned, and the recall, shown in Equation (4), which describes how many of the truly burned pixels have been detected:

Precision = TP / (TP + FP)    (3)

Recall = TP / (TP + FN)    (4)
where TP is true positive, FP is false positive, and FN is false negative. We did not calculate the overall accuracy (OA) in this work because it is dominated by the number of unburned pixels. When the classes of the dataset are imbalanced, the unburned class dominates while the burned class is a small portion of the image. For example, if the burned area is 5% of the total image and the unburned area is 95%, a model that detects no burned area at all is still 95% accurate.
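Computed directly from the TP/FP/FN counts, these metrics, and the overall-accuracy pitfall just described, can be illustrated with a small NumPy example:

```python
import numpy as np

def scores(y_true, y_pred):
    """Pixel-wise precision, recall, F1, and IoU from binary masks."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    iou = tp / (tp + fp + fn) if tp else 0.0
    return precision, recall, f1, iou

# 5% burned / 95% unburned image; the model predicts nothing burned.
y_true = np.zeros(100, dtype=np.uint8)
y_true[:5] = 1
y_pred = np.zeros(100, dtype=np.uint8)

oa = np.mean(y_true == y_pred)
print(oa)                      # 0.95 -- misleadingly high
print(scores(y_true, y_pred))  # (0.0, 0.0, 0.0, 0.0)
```

The overall accuracy of 95% hides the fact that no burned pixel was found, while precision, recall, F1, and IoU all correctly collapse to zero.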

Loss function
The loss is the sum of the errors made over each batch in the training or validation sets, which shows how well or badly a trained model works after each optimization step. The wildfire dataset consists of images with burned areas in the foreground pixels; therefore, we chose loss functions that prioritize foreground pixels and samples that are difficult to segment. Experiments were conducted using the binary cross-entropy (BCE) loss function (Pihur et al. 2007), dice loss (Sudre et al. 2017), focal loss (Lin et al. 2017), and a hybrid loss combining contributions from both dice loss and focal loss (Zhu et al. 2019).
The BCE loss function is implemented as Equation (5):

L_BCE = -(1/N) Σ_i [ y_i log(ŷ_i) + (1 - y_i) log(1 - ŷ_i) ]    (5)

where N is the number of pixels, y_i is the label, and ŷ_i is the predicted output.
The dice loss function is implemented as Equation (6):

L_Dice = 1 - 2|I ∩ Î| / (|I| + |Î|)    (6)

where I and Î are the corresponding ground truth mask and predicted mask.
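A plain NumPy sketch of the two loss functions follows; the smoothing constant `eps` is an implementation assumption added for numerical stability:

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-7):
    """Binary cross-entropy averaged over pixels."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def dice_loss(y, y_hat, eps=1e-7):
    """1 - Dice coefficient between ground-truth and predicted masks."""
    inter = np.sum(y * y_hat)
    return 1 - (2 * inter + eps) / (np.sum(y) + np.sum(y_hat) + eps)

y = np.array([1.0, 1.0, 0.0, 0.0])
perfect = np.array([1.0, 1.0, 0.0, 0.0])
poor = np.array([0.1, 0.2, 0.9, 0.8])
print(dice_loss(y, perfect))                       # ~0.0
print(dice_loss(y, poor) > dice_loss(y, perfect))  # True
```

Because the dice loss is computed from the overlap between masks rather than per-pixel averages, it is less dominated by the large unburned background, which is consistent with why it performed best on this imbalanced dataset.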

Models training
We used Keras with the TensorFlow backend as a framework, with the adaptive moment estimation (Adam) optimization algorithm (Kingma and Ba 2014). The dataset was split into 17,352 training tiles, 2169 validation tiles, and 2169 testing tiles. During the 300-epoch training, validation data was passed through the network, and its loss was estimated and monitored. The network was trained in batches of 32 until it reached convergence, at an initial learning rate of 1e-5. Three techniques were used during the training: reducing the learning rate, early stopping, and saving the best model. To avoid overfitting, we reduced the learning rate by a factor of 0.5 if the validation loss did not improve after three epochs. Training was stopped if the loss did not improve after five epochs, and the model with the lowest validation loss was saved.
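The three training-control techniques can be sketched framework-agnostically; in Keras they correspond to the ReduceLROnPlateau, EarlyStopping, and ModelCheckpoint callbacks. The toy loss sequence below is illustrative, not from the paper's training runs:

```python
def run_schedule(val_losses, lr=1e-5, factor=0.5, lr_patience=3, stop_patience=5):
    """Simulate reduce-LR-on-plateau, early stopping, and best-model
    tracking over a sequence of per-epoch validation losses."""
    best, best_epoch = float("inf"), -1
    since_improved = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch       # "save the best model"
            since_improved = 0
        else:
            since_improved += 1
            if since_improved == lr_patience:    # reduce the learning rate
                lr *= factor
            if since_improved >= stop_patience:  # early stopping
                break
    return lr, best_epoch

# Loss improves until epoch 2, then plateaus: LR is halved after 3 bad
# epochs and training stops after 5.
losses = [0.9, 0.5, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46]
lr, best = run_schedule(losses)
print(lr, best)  # 5e-06 2
```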

Experiments and result
This section summarizes the performance of the different DL semantic segmentation models for mapping wildfires with the Sentinel-2 imagery dataset. The experiments are based on Python version 3.9.12 and Jupyter Notebook version 6.4.11 on Windows. The hardware used is an Intel i7-8700 central processing unit (CPU) @ 3.20 GHz, an NVIDIA GTX 1660 Ti graphics card, and 32 GB of memory.

Testing model performance using different loss functions
We performed initial experiments using four selected loss functions with the DL model and selected the best loss function for further experiments based on the results. We trained U-Net with the ResNet50 encoder over 300 epochs using different loss functions: BCE, focal loss, dice loss, and hybrid loss.
As shown in Table 7, the best result in terms of the F1 score and IoU for U-Net with the ResNet50 encoder is obtained with the dice loss function, which means it is suitable for the DL segmentation models on our dataset. Thus, the dice loss function was used in further experiments.

Testing model performance using GPU and CPU
We also performed initial experiments training the Attention ResU-Net model over 300 epochs to compare performance on the CPU and the graphics processing unit (GPU).
The Attention ResU-Net model with the GPU achieved a precision of 99.31%, a recall of 97.71%, an F1-score of 98.54%, and an IoU of 97.76%. Training averaged 693 s per epoch, for a total of 1 d, 10 h, 5 min, and 47 s over 177 epochs, as shown in Table 8 and Figure 10(a).
The Attention ResU-Net model with the CPU achieved a precision of 97.25%, a recall of 96.62%, an F1-score of 97.47%, and an IoU of 97.61%. Training averaged 9941 s per epoch, for a total of 2 d, 51 min, and 13 s over 26 epochs, as shown in Table 8 and Figure 10(b). Thus, the GPU hardware was used in further experiments.
Compared to the reference mask of the burned area, the Attention ResU-Net prediction with the GPU closely matches the reference mask. The model can detect the burned area, but in a few images, such as the first image on the left in Figure 11, partially affected areas were not detected as completely burned and were ignored. The model performance with the CPU was lower than with the GPU but still excellent, confirmed by an IoU score of 97.61% and an F1 score of 97.47%, as shown in Figure 11.

Segmentation model results
As shown in Table 9, the U-Net with ResNet50 encoder model achieved the best result among all tested models, with a precision of 98.91%, a recall of 98.55%, an F1-score of 98.78%, and an IoU of 97.98%, with an average training speed of 165 s per epoch and a total time of 10 h, 35 min, and 7 s over 231 epochs. The DeepLabV3+ model results are lower than those of the other models, with a precision of 88.98%, a recall of 82.76%, an F1 score of 85.91%, and an IoU of 79.13%; its training averaged 317 s per epoch, with a total time of 5 h, 44 min, and 23 s over 62 epochs. Our experiments achieved the highest F1-score of 98.78% and the highest IoU of 97.98% using U-Net with the ResNet50 encoder. Thus, we chose this model to develop into a pre-trained DLPK model based on our dataset. The DLPK model has been able to achieve a high level of detection accuracy, as shown in Figure 12 and Table 10.

Discussion
The DL segmentation models require reference data to detect wildfires with Sentinel-2 imagery. Knopp et al. (2020) used reference data from three sources: the Portuguese Institute for Nature Conservation and Forests (ICNF), the California Department of Forestry and Fire Protection (CAL FIRE), and the German Aerospace Centre (DLR). Florath and Keller (2022) relied on OSM data to generate reference data. Since wildfire data for Turkey was lacking, we created a manual dataset using post-wildfire images from the Sentinel-2 satellite. Manual creation can take a long time, but it provides high accuracy, as confirmed in Section 5.2. An existing comparable dataset contains 227 images with a 512 × 512 size; however, our dataset contains 21,690 images with a 128 × 128 size, which is more suitable for segmentation models because the number of images in a dataset is significant for segmentation models. Sentinel-2 has the highest spatial resolution among publicly available multi-spectral optical remote sensing data, and there is a great need to utilize it to produce wildfire perimeter data at a higher resolution, especially for countries and regions that are less investigated in wildfire ecology (e.g. Turkey). DL addresses the challenging needs of satellite image processing; the interest of the RS community in DL methods is growing fast, and many architectures have been proposed recently to address RS problems. Most of these methods detect wildfires using two images, pre- and post-event (El Mendili et al. 2020), which takes more time than using uni-temporal post-event images only. In our work, we used uni-temporal post-wildfire images to train 14 deep-learning models. As shown in Table 11, U-Net with the ResNet50 encoder, Attention ResU-Net, and U-Net with the ResNet101 encoder were the best three of all tested models. In general, the trend is that false positive pixels can be reduced slightly by using encoders with deeper architectures, but in our work the ResNet50 encoder achieved a higher result than deeper encoders such as ResNet101 and ResNet152 with the U-Net model. Attention ResU-Net performed slightly better in the F1-score and IoU metrics than U-Net with a ResNet101 encoder, but its training was twice as slow. U-Net with ResNet50 achieved high performance in both the corresponding metrics and the training time.

Conclusions
One of the biggest challenges for DL models is the lack of publicly available training datasets for identifying and extracting features to make accurate decisions, especially for countries and regions that are less investigated in wildfire ecology (e.g. Turkey). In this work, a large dataset of Turkey's wildfires was created from Sentinel-2 multiband images to aid the development of remote sensing and computer vision models for image segmentation, object detection, and classification concerning wildfires. The dataset supports binary classification of burned and non-burned areas. Various experiments were conducted to evaluate the performance of wildfire monitoring and detection using our dataset, and the DL segmentation models achieved excellent results. We compared 14 deep-learning models based on combinations of five architectures (U-Net, U-Net++, Attention ResU-Net, LinkNet, and DeepLabV3+) and four encoders (ResNet101, ResNet50, ResNet152, and MobileNet) for U-Net and LinkNet. The U-Net with ResNet50 encoder, Attention ResU-Net, and U-Net with ResNet101 encoder models achieved the best results in the IoU and F1-score metrics. We developed a pre-trained DLPK model that can detect wildfires automatically from Sentinel-2 imagery, which can be used by organizations or wildfire protection associations to decrease decision-making time. Our DLPK model is based on U-Net with a ResNet50 encoder, which achieved highly accurate detection of burned areas. In general, the proposed DLPK model offers significant benefits: it is effective and straightforward compared to other state-of-the-art techniques; it supports retrospective analysis after a wildfire is contained to determine damage; and it can be a promising approach to producing operational products after X hours of Sentinel-2 data for rapid mapping.
Future work will use multitemporal series of satellite images to detect more details about post-wildfire conditions and to provide information about the types of land cover that have low or high effects on the spreading of wildfire.

Figure 3 .
Figure 3. Samples of four images and masks from the dataset: (a, b, c, d) the images in false-colour composite (Red = B12, Green = B08, Blue = B02); (e, f, g, h) the corresponding binary wildfire masks, with the burned area in red on a black background.

Figure 8 .
Figure 8. Attention ResU-Net model architecture (a) feature learning module, (b) contextual fusion module, and (c) feature recovery module.

Table 2 .
Information on the Sentinel-2 satellite images for the study areas.

Table 3 .
Distribution of dataset images.

Table 4 .
Distribution of dataset masks.

Table 5 .
Deep learning model parameters.

Table 6 .
Case studies to evaluate DLPK model.

Table 7 .
Results of U-Net with ResNet50 encoder using different loss functions.

Table 8 .
Attention ResU-Net model results using GPU and CPU.