Detection of Subtle Thermal Anomalies: Deep Learning Applied to the ASTER Global Volcano Dataset

Twenty-one years of Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) global thermal infrared (TIR) acquisitions provide a large amount of data for volcano monitoring. These data, with high spatial and spectral resolution, enable routine investigations of volcanoes in remote and inaccessible regions, including those with no ground-based monitoring. However, the dataset is too large to be manually analyzed on a global basis. Here, we systematically process the data over several volcanoes using a deep learning algorithm to automatically extract volcanic thermal anomalies. We explore the application of a convolutional neural network (CNN), specifically UNET, to detect subtle to intense anomalies exploiting the spatial relationships of the volcanic features. We employ a supervised UNET network trained with the largest (1500) labeled dataset of ASTER TIR images from five different volcanoes, namely, Etna (Italy), Popocatépetl (Mexico), Lascar (Chile), Fuego (Guatemala), and Kliuchevskoi (Russia). We show that our approach achieves high accuracy (93%) with excellent generalization capabilities. The effectiveness of our model for detecting the full range of thermal emission is shown for volcanoes with very different styles of activity and tested at Vulcano (Italy). The results demonstrate the potential applicability of the proposed approach to the development of automated thermal analysis systems at the global scale using future TIR data such as the planned NASA Surface Biology and Geology (SBG) mission.

to identify reactivation of activity, forecast eruptions, and assess hazards [1]. In particular, volcanic thermal changes, indicative of preeruptive volcanic thermal activity, have been observed [2]. High spatial resolution satellite sensors such as the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) can be used to detect such anomalies. ASTER is onboard of Terra satellite and has acquired volcanic data since 2000 and with improved cadence since the urgent request protocol (URP) program started [3], [4]. The URP sensor web enables new acquisitions that are triggered by either orbital or ground-based systems generating more than 320 000 new scenes of the world's volcanoes [5].
The abundance of image data from numerous orbital sensors (current and planned) is leading to the critical need for novel approaches to process these large datasets to avoid manual inspection. Artificial intelligence (AI) is quickly growing in different remote sensing fields because of its capability to automatically learn patterns from the data. In fact, AI enables problem-solving by creating systems able to make predictions or classifications based on input data. In particular, with the advent of machine learning (ML) and deep learning (DL) techniques, a new programming paradigm allows us to deal with this amount of data by extracting data-driven patterns (i.e., without explicit programming), otherwise missed by traditional approaches.
Over the last decades, different methods have been explored to detect and estimate the temperature above background in volcanic areas [6], [7]. The most common approaches rely on a spatial statistical analysis based on a scene-byscene choice of the background temperature region. Thresholdbased techniques are commonly adopted to automatically classify thermal features; however, subtle thermal changes that can reveal preeruptive signs are typically missed (e.g., low background thermal gradient and the presence of background nonvolcanic thermal features). In fact, although significant thermal anomalies, such as active vents or lava flows, are easy to detect, a major challenge still exists to detect subtle anomalies because of their similar values to solar-heated rocks or their partial obscuration by clouds, volcanic degassing, and plumes.
Given external perturbation and possible uncertainties with respect to seasons, local weather, topography, and surface heterogeneity, subtle thermal anomalies are commonly a weak signal in a strong background level, and thus, they are hard to This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ isolate using only statistically based pixel intensity approaches that work well for much higher temperature anomalies. Despite that, some satisfactory results are shown using spatiotemporal anomaly detection algorithms based on a statistical profiling approach where fixed statistical thresholds are used [6]. In addition, a temporal-based statistical approach has also been used to process ASTER data [8]. However, the impact of shortterm meteorological warming is difficult to eliminate because it is the same magnitude as subtle thermal changes.
ML is widely used in data analysis to automatically learn patterns from the data. Several applications for volcano monitoring have been proposed using both ground [9], [10] and satellite remote sensing data [11], [12]. ML techniques involve a feature extraction phase followed by a classification step. While the feature extraction phase is embedded in the ML technique rather than performed manually (i.e., feature engineering), it is referred to as DL. This tool is widely used for image segmentation and pattern recognition where deep convolution neural networks (CNNs) allow adaptation to most specific applications without the need for manual feature extraction by using hierarchical learning methods.
Here, we develop a DL approach in Google Colab platform to automatically extract subtle to large volcanic thermal features by using the spatial information from ASTER data acquired over several decades. In particular, rather than using a purely intensity-based detection approach, we extract spatial features from ASTER TIR images by using a convolutional neural network (CNN). Our goal is to exploit the same features of the human visual system to train a DL model to recognize volcanic anomalies based on their spatial features rather than just their intensity. This neglects other external (and complicating) sources such as solar irradiance. The UNET network (the so-called fully CNN) is trained by using nearly 1500 labeled ASTER TIR images over five volcanoes, the largest such labeled database of its kind [13]. Its effectiveness in detecting the complete range (subtle to high temperature) of thermal changes is then applied to TIR data from Vulcano, Italy, as a test of the model.

II. DATA A. ASTER
ASTER is on the Terra satellite that was launched in late 1999 as a part of NASA's Earth Observing System [5], [14]. ASTER has an equatorial revisit time of 16 days, which is improved at higher latitudes due to converging orbits as well as with the URP program started in 2009. URP-based ASTER acquisitions entail data collection on every overpass possible using off-nadir pointing. Data are acquired in eight spectral bands (three in VNIR and five in TIR). The TIR bands span from 8.125 to 11.65 µm at 90-m spatial resolution. Topographically corrected images (AST_L1T_003) are used in this study to reduce the effect of topography. These data are provided by USGS (https://lpdaac.usgs.gov/products/ast_l1tv003/).

B. Volcano Selection
We chose a set of five volcanoes to train and test the UNET model ( Fig. 1), namely, Etna (Italy), Popocatepetl (Mexico), Lascar (Chile), Fuego (Guatemala), and Kliuchevskoi (Russia). A sixth volcano-Vulcano (Italy)-was chosen for the testing phase. In particular, these volcanoes show different thermal features (e.g., fumaroles, lava flow, and lava lake) and different environmental conditions, such as topography, solar inclination, and cloud conditions. We have considered both nighttime and daytime data to exploit all the available images at each volcano. We use the band B13 (10.25-10.95 µm) data for the temperature as has been done in previous studies [8].

III. METHOD
A schematic framework of the proposed approach, including the training and prediction phases, is shown in Fig. 2. Supervised techniques require a large volume of labeled data produced using an automatic or semiautomatic approach to serve as the training set. We use the statistical approach proposed in [13] to label data followed by an expert supervision step required to properly check the correctness of the labeled data. Only the images that pass through this stage are used.
Input and target data are then processed by a UNET that learns to detect volcanic thermal features (positive class) against background, clouds, and missing data [i.e., border pixels with missing values (negative class)] during the training phase. Finally, the trained UNET model is employed in the anomaly prediction process using images that have never been seen by the model.

A. Volcano Selection and Data Preprocessing
Coordinates of the volcano of interest are inserted by the user in the Google Colab interface and the ASTER L1T data are accessed through Google Earth Engine (GEE) server from 2000 to 2021. Data are available in digital number (DN) format and converted to brightness temperature (BT) for band B13 using the steps described in [15]. Because of the large data volumes, images are clipped using an optional buffer centered on the volcanic edifice. Finally, the.geotiff images are downloaded and saved in cloud through Google Colab.

B. Model Input Preparation
The input image size needs to be chosen at this phase and the choice is driven by several considerations. Because the number of pixels containing anomalies is far less than the background in the image, the number of training samples belonging to the different classes has to be balanced; otherwise, the algorithms can become "overtuned" to specific case studies [16]. This process is case-specific and the optimal size is achieved by trial and error. This "imbalance problem" means that only a few locations contain objects (i.e., foreground) and the rest are background objects. It leads to a main issue: training is inefficient as most locations are easy-negatives, meaning that they can be easily classified by the detector as background that contributes no useful learning. Easy negatives account for a large portion of training data, thus weighting more in the model performance making small loss even when the foreground is not well detected.
Therefore, we crop the image in patches containing a similar amount of volcanic thermal features and background pixels. By reducing the input image size, we reduce the size of the data and retain the region of interest and its corresponding labels. Then, patches are extracted from each image to determine the input size of the CNN (i.e., dimension of power of 2). We have chosen a 48 × 48 pixel size. Different sizes were tried and 48 × 48 was considered optimal, being a good compromise between the number and the size of the convolutional layers reaching 3 × 3 (the smallest size able to extract the smallest spatial feature).
Then, each input temperature image is converted into grayscale images, with pixel values scaled to [0, 255] using the min-max normalization. The patch extraction sequence is shown in Fig. 3.

C. UNET Model Identification
CNN is a neural network class that exploits deep, locally connected layers to extract discriminative features (e.g., the spatial distribution of thermal anomalies) and to classify the input learning from the data. In particular, features are progressively generated with an increasing level of complexity from the input to the output. For image-based classification, the first layers convolve small spatial regions with weight blocks that are found during the training phase. These blocks represent the feature extractors that work similar to the human eye and thus similar to the Gabor filter principles [17]. Our goal is to replicate the way in which the human visual system detects thermal anomalies in a grayscale image to extract automatically only those pixels. This overcomes the difficulty of using simple intensity-based approaches to detect subtle changes that are comparable with the background. Training the CNN allows learning from the data without any a priori setting of the relevant spatial features. For this purpose, we use the UNET, a type of CNN designed for semantic image segmentation. In UNET, the initial series of convolutional layers  are interspersed with max-pooling layers, i.e., an operation that computes the maximum value for patches of a feature map to create a downsampled (pooled) feature map, thus decreasing the resolution of the input image. These layers are followed by a series of convolutional layers interspersed with upsampling operators, successively increasing the resolution of the input image. Combining these two series paths forms a U-shaped graph [18]. The U-shaped architecture is symmetric and consists of two parts. The left half is the encoding/contracting path constituted by the convolutional process decreasing the input dimension. The right half is the decoding/expansive path, which is constituted by transposed 2-D convolutional layers allowing data to be upsampled to the original image size.

D. Semiautomatic Labeling Technique
During the model identification phase, images have to be labeled to obtain the targets that are used to tune model parameters and evaluate performance indices. Labeling is a process of generating the expected model outputs for a subset of images, i.e., the expected model output will be a binary image whose pixels have a null value if they belong to the background and one if they are an anomaly. These have been prepared by applying the statistical technique introduced in [13] where a multistep approach based on both intensity and texture features is adopted. In particular, because a purely intensity-based approach is not always able to detect subtle anomalies very close to the background area, we also use a bank of Gabor filters to emulate how the human eye performs feature extraction. Abundant applications for the Gabor feature space exist not only in biologically inspired systems but also in numerous other areas such as fingerprint detection and texture classification [19]. While a Gabor filter is applied to an image, it gives the highest response at edges and at points where texture changes, thus having a distinguishing value at the spatial location of that feature. This bank of filters is tuned for different frequencies and orientations to localize roughly orthogonal subsets of frequency and orientation information in the input data. In particular, we designed a bank of 24 Gabor filters that represent combinations of multiple wavelengths, from 2.8 to 11.3 pixels/cycle, and 18 orientations, namely, from 0 • to 170 • with steps of 10 • [13]. The input for this detection technique is an image whose pixel values are the thermal intensity weighted by these spatially located features. This is done in order to use both thermal and shape information to detect anomalies by setting some a priori statistical parameters. This algorithm detects both subtle and higher temperature anomalies. However, because it relies mainly on fixed parameters for the Gabor filter design, it may still either detect nonvolcanic anomalies or miss some. Thus, the identified volcanic thermal anomalies are later checked by an expert and only the ones with a high degree of confidence are used as labeled target images.
This step is critical to overcome the limits due to only using an intensity-based approach. The visual inspection performed by the expert is actually able to identify a subtle volcanic anomaly that has an intensity only slightly above the background (thus having similar values in the TIR image) but also a clear distinguishable spatial feature that is associated with the volcanic-related activity (based on the spatial features of the TIR image). Thus, although the automatic portion of the semiautomatic labeling algorithm gives a result mainly based on intensity, the expert check addresses any misclassification related to volcanic and not volcanic areas having similar intensities.
For this study, a large group of 1500 input and target images from five volcanoes were collected and labeled for the UNET input. This represents the largest database of labeled, high spatial resolution TIR data for volcanic thermal anomalies.

E. Training and Test Datasets Creation
For the training phase, images containing primarily the volcanic area are selected for the UNET input. The input and target images are then split into the training and testing datasets with a ratio of 80:20 [20]. Images acquired over Etna, Popo, Lascar, Fuego, and Kliuchevskoi volcanoes were used, and thus, for each volcano, a subset (1200) was available for the training phase corresponding to 2.7 × 106 data samples. During the training phase, it is fundamental to include samples representative of different kinds of anomalies, namely, subtle fumaroles, hot gas vents, active lava flows, and domes as the positive class, as well as the background data, clouds, and missing data as the negative class. For the testing phase, a dataset consisting of 20% of the labeled data (i.e., 300 images corresponding to about 7 × 105 samples) is used.

F. UNET Training
The model has been trained with a learning rate of 1 × 10 −3 , a mini-batch size of 128, and four encoder depths, for 100 and 300 epochs with the Adam optimizer to minimize the focal loss [18], [21]. The focal loss function is based on the cross-entropy (CE) loss. The focal loss compensates for class imbalance by using a modulating factor that emphasizes hard negatives during training.
In fact, although we have reduced the size of the training input images, class imbalance problem may still be present because many subtle thermal anomalies occupy a very small portion of the image (i.e., few pixels, with respect to the background). The focal loss is an improved version of CE loss that tries to handle the class imbalance problem by assigning more weight to hard or easily misclassified examples (i.e., background with noisy texture, partial object, or the object of our interest) and to downweight easy examples (i.e., background objects). Thus, the focal loss reduces the loss contribution from easy examples and increases the importance of correcting misclassified examples.

G. Testing and Performance Evaluation
The learned model is then applied to test data never seen during the training phase. Different performance indices are used in order to assess the feasibility of the proposed approach. For each class, namely, anomaly and background, precision, recall, and F1-score, also known as Sørensen-Dice coefficient, are computed.
Precision indicates the fraction of predicted positive values that are actually positive, recall indicates the fraction of the actual positive values that are correctly predicted as positive by the classifier, and F1-score combines precision and recall into a single measure. Macro scores are calculated by getting the metrics for each class individually and then taking mean of the measures (1-9) MacroPrecision = Precision anomaly + Precision background 2 Recall anomaly = TP TP + FN (4) F1_score anomaly = 2 Precision anomaly * Recall anomaly Precision anomaly + Recall anomaly (7) F1_score background = 2 Precision background * Recall background Precision background + Recall background (8) MacroF1_score = F1_score anomaly + F1_score background 2 (9) where true positive (TP) is the number of real positives that are correctly predicted as positive, false negative (FN) is the number of real positives that are wrongly predicted as negative, false positive (FP) is the number of real negatives that are wrongly predicted as positive, and true negative (TN) is the number of real negatives that are correctly predicted as negative.

H. Output Preparation
The UNET output is a collection of 48 × 48 thermal anomaly binary maps and therefore has to be reaggregated combining the previously processed map patches into the full images. The trained model is now applied to the same or other volcanoes for monitoring volcanic thermal behavioral trends.

I. Comparison With Intensity-Based Approaches
A hotspot detection algorithm based on intensity alone finds the best minimum detectable TA value, i.e., a threshold above which every pixel is considered an anomaly. In such intensitybased approaches, the reference value can be computed statistically in space or/and in time.
First, we consider the state-of-the-art algorithm to process ASTER data to be RASTER proposed in [8]. It uses a temporal approach to detect TA, i.e., temporal mean and standard deviation are computed for each location over 20 years (i.e., 2000-2020) of ASTER observations and used to compute a z-score value of band B13. Furthermore, to enhance the detection of subtle changes, a normalized index based on bands B13 and B12 is used. We implemented this algorithm for comparison and only one out of five cases produced a detected anomaly.
Second, because the choice of the background/nonvolcanic varies greatly in prior studies, it greatly affects the threshold value. We also consider a different intensity-based approach (intensity-based 2). An intensity-based algorithm able to detect the same subtle anomalies would pick all the pixels in the scenes greater than that value. Thus, setting the minimum value as the threshold of the intensity-based algorithm, we then apply the trained UNET. In this way, we do not arbitrarily choose any of the existent methods to find the intensity threshold avoiding biasing the result, i.e., with an incorrect choice of the background area to consider.

IV. RESULTS
In order to evaluate the performance of the trained UNET with focal loss performance, a comparison with the standard CE UNET using the same setting is shown in Table I. The accuracy index computed as the number of corrected predictions as a percentage of the total number of predictions would be biased by the background due to unbalanced issues. In fact, the number of pixels belonging to the background class is far higher than those belonging to the anomaly class. Thus, a better performance index is the F1-score, which is a more indicative performance index for datasets with classes with unbalancing issues. F1-score accounts for both the number of prediction errors and the type of errors that are made. In particular, in order to assess the capability of correctly classifying both classes, i.e., anomaly and background, we compute the Macro F1-score that is the unweighted mean of the F1-scores calculated for each class [see (9)]. This measure returns the objective results on imbalanced datasets giving equal importance to each class, i.e., a majority class will contribute equally along with the minority. Overall, macro indexes (bold parameters in Table I) are the best metrics for unbalanced datasets.
For each of the volcanos listed in Section II-A, we show the results obtained by applying the UNET with focal loss on the complete ASTER dataset covering the area of interest. The maximum temperature above average is computed as the difference between the maximum temperature of the detected anomalies and the median temperature of the background area (pixels belonging to the background class). For the five volcanoes used for testing, the original band B13, the detected anomalies, and the maximum temperature above average are shown in Fig. 4. The time series of the maximum temperature above average of the detected volcanic anomalies are shown in Fig. 5.
Beside the volcanoes previously shown that have been used to train the UNET, we then applied the detection capability to Vulcano Island (Italy), which was not used during the training phase. In particular, we computed radiative heat flux following the steps in [15].
We computed the total heat flux produced by the detected thermal anomalies as the sum of radiative and convective heat flux computed using (10)- (12). The main difference with respect to previous studies is that the investigated area is not manually detected and constrained to a limited pixel area because a greater area allows possible detection of a larger size anomaly [e.g., Fig. 7(m)] and all the entire archive (day and night) is used Total Heat flux = Radiative flux + Convective flux Thus, for each ASTER scene, the total heat flux is given by the sum of the heat flux produced by each anomaly.
In order to validate the results retrieved by using the UNET, measurements from ground-based instruments are used. In particular, thermal surveys were used, which have been conducted inside the fumarole field since 1994, following the same path and patterns logging the number and maximum temperature using IR radiometers [22]. A thermal normalization index was developed during these surveys to account for the thermal behavior of the system, using maximum and mean temperatures, number of vents, and the standard deviation.
Ground temperature has also been measured since January 2020 using a HOBO data logger station located close to the fumarole field and within the ground thermal anomaly area. It records the 15-cm depth temperature, surface temperature, and air temperature at 5 cm above the surface every 5 or 10 min. The data presented here are the daily averaged surface temperature. Fig. 8 shows the average maximum temperature above background from January 2000 to March 2022 [ Fig. 8(a)] plotted together with the number of fumaroles detected during the field campaign [ Fig. 8(b)] and radiative heat flux of the thermal anomalies plotted together with the normalization index [ Fig. 8(c)].

A. Target Volcanoes
Different performance indices show that the trained UNET (e.g., focal loss and CE) both reached high performance levels. Overall, the F1-scores, both for class and macro, obtained for the focal loss are higher than for CE (0.93 versus 0.91). This confirms the fact that using the focal loss deals better with unbalanced classes and represents the most appropriate choice as a loss function for this style of analysis. More specifically, the training phase is less polarized by background behavior than UNET with CE, thus enhancing the UNET capability to learn anomaly behavior and reduce the number of FPs (0.93 for focal loss versus 0.88 for CE as macro-precision). On the other hand, the recall index computed using UNET with CE is higher than the UNET with focal loss because the training phase is more polarized by the background, thus learning the background behavior better and reducing the number of FNs (0.92 for focal loss versus 0.95 for CE as macro recall).
The UNET trained with the focal loss results for each of the investigated volcanoes is given in Fig. 4 Fig. 4(b) and (c)]. Finally, Fig. 4(e) shows that the UNET is also able to detect smaller anomalies on Popocatépetl during periods of clouds and where higher contrast emerges between cloudy and clear volcanic areas, thus avoiding the detection of FP pixels. Purely intensity-based approaches described in Section III have been compared resulting in either a large portion of missing detection, i.e., FN (intensity-based approach 1: RASTer), or large portions of incorrect detections, i.e., FP (intensity-based approach 2).
Once the thermal features are detected, the time series displaying the maximum temperature above average background is computed. The time series data obtained for each volcano from 2000 to 2022 acquired by the ASTER TIR sensor is shown in Fig. 5. It is worth noting the capability of this model to monitor temperature above average of thousands of images with high precision, thus allowing the detection of subtle changes prior to eruptions. In fact, analysis of the thermal time series shows that in some cases (e.g., Klyuchevskoi), measurable above background temperatures persist for years between eruptions and begin to increase months before the next eruption, becoming precursory in nature. In particular, T increases before well-known eruptions at Klyuchevskoi, namely, 2007/03/ 18-05/22, 2008/10/15-11/30, 2009/09/20-2010/08/28, 2013/08/13-10/13, and 2016/04/06-09/05 [23], which are highlighted in red [ Fig. 5(c)]. This trend was noted for one of these eruptions in a prior study [24]. Because this approach is able to detect volcanic TAs even where their values are similar to the background), the delta temperature values can be very small (even negative) even  Time series for (a) Vulcano (Italy) of the temperature above average, (b) detected area using UNET (black) and number of fumaroles using ground-based measurements (red), and (c) heat flux using UNET (black) and normalization index using ground-based measurements (red).
in the presence of high local gradient around the volcanic TA. These gradients can happen because of solar heating, stronger spatial variations in ambient surface temperature due, for example, to surface cooling with altitude, and atmospheric effects. All can produce greater variations in satellite-recorded BTs than those induced by fumarolic activity, geothermal heating, or conduction/convection above shallow intrusions. As a consequence, potential thermal anomalies associated with low-temperature active volcanic phenomena can produce lower BT values.
We  Fig. 6(b)]. It is evident that anomalies that are very similar in values with the background are detected by the UNET because of the local spatial features that have been learned by the DL model.

B. Vulcano Island
To further test the UNET model, we applied it to ASTER TIR data of Vulcano Island, which was not used during the training phase. Fig. 7 shows the detection maps for centered on Vulcano crater, which is typically characterized by fumarolic activity [15].
It is noteworthy that even for the daytime acquisitions [ Fig. 7(a)-(d)], the algorithm is able to clearly distinguish the volcanic thermally elevated pixels despite solar irradiance on the southern wall of the cone. In fact, scenes without volcanic TA containing, however, nonvolcanic TA due to solar radiation and clouds [ Fig. 7(c), (d), (g), and (h)] are successfully not detected by the algorithm.
The solar radiance is not detected as thermally anomalous because the DL model is able to recognize its spatial features as background rather than volcanic in origin. Even anomalies partially covered by thin clouds [Fig. 7(e) and (f)] are detected as well as a small, transient anomaly occurring north of the main crater near the island's port during a period of warm CO 2 emission [ Fig. 7(i)-(l)] [25]. Two months later, this thermal  Comparison of the ASTER thermal anomaly area and heat flux time series to the ground data shows a notable difference between 2000-2008 and 2009-2022. Prior to 2008, the ASTER URP program had not been enabled [5]. Data for Vulcano were acquired either randomly during the process of compiling the ASTER global map or collected intermittently as part of the larger ASTER global volcano acquisition, which collected data only once every three months. These data were not frequent enough to detect the thermal behavior monitored by the ground measurements. Since 2009, the increased amount of ASTER data available because of the URP program allows a better comparison to the thermal activity.
The heat flux time series in Fig. 8(c) shows some seasonal patterns until May 2021, when increased activity started and a crisis phase was announced. The crisis period began on late June 2021 with the highest thermal level from September 2021 to January 2022 [26]. The seasonality is mirrored by the area trend inside the crater, which becomes smaller during wetter months (Fig. 9). Generally, the seasonal pattern of anomaly areal growth is consistent with the daily average surface temperature measured by ground-based instruments (Fig. 10), i.e., their relationship is linear. However, during the crisis period, a change in the area/daily average temperature relationship is observed. From July 27, 2021 onward, for the same, measured daily average temperatures far higher anomalous area are observed for most of the samples. This is likely due to the fact that the temperature increase was localized in a few of the pixels belonging to the larger extended thermal anomaly. As a consequence, the crisis produced a change more evident in the area trend of the thermal anomaly rather than the average surface temperature.
The increase in the number of detected anomalies is localized inside the entire crater area. Even though seasonality appears to affect the measured temperature for each pixel, the increase of the detected anomalous area may not only be caused by an exogenous solar source. Other processes may be triggered by seasonal exogenous sources such as solar radiation and rainfall.
The surface heat flux is a result of the combined effect of the endogenous geothermal source and the exogenous source from the sun. This heat flux changes due to other external factors such as rainfall, strong winds, and changes in the local sun conditions. Thus, the main issue in accurate surface temperature monitoring of a volcanic area is the difficulty in detecting magmatic impulses and differentiating those from the other thermal signals. It has been shown that the Fossa area sites are heated mainly by the process of steam condensation, producing steam-heated soils (SHSs) [25].
The main thermal changes in the SHS result in changes to the conductive heat flow, but also changes in the depth of the convective front of the geothermal system, reflecting any contraction or expansion of that convective zone.
The uppermost condensation level lays at the top of the layer where the rising vapor adds additional heat to the ground through condensation. When this condensation level moves upward, the conductive layer diminishes and ultimately disappears resulting in the heat transfer becoming essentially advective. The amount of rainfall can affect the thickness of the conductive layer. However, tiltmeters also show some seasonal trends [27], consistent with the solar thermal radiation causing a thermoelastic effect measured by sensors that could be due to displacements of underground masses, e.g., aquifers [26]. Therefore, the thermal changes observed in Vulcano's crater area may be due to a combination of processes. One possibility is the fact that the expansion of the convective flow is reflected by an areal expansion of the thermal anomalies. As previously stated, the expansion/contraction of the convective flow is affected by the amount of rainfall, which is also seasonal. Increased rainfall during the wet season would increase hydrothermal activity, thus enhancing deposition, which progressively decreases the permeability [28]. In this case, thermal anomalies may not be visible in the entire area, thus being subdued during the wet season and more observable during the dry season. Finally, an increase in air temperature leads to an increase in evaporation inside of the crater area during dry seasons, which leads to an increase in the heated areas detected as thermally anomalous. Thus, exogenous sources, such as solar radiance and amount of rainfall, could trigger processes that are responsible for the increase in the areal extension of the detected anomalies inside the crater area. In Fig. 11, the time series of rainfall and area of the thermal anomaly are shown from 2005 to 2022. We use the wavelet coherence analysis to highlight that, especially after the increase in data collection starting in 2009, seasonality is present. Wavelet coherence is generally used to investigate the localized correlation coefficient between two signals in a time-frequency space, i.e., it is highly recommended for time series that are not stationary [29]. This tool allows comparison of the frequency contents of two-time series (y-axis), as well as derives conclusions about the synchronicity of the series at specific periods and across certain ranges of time (x-axis). Wavelet coherence can effectively identify regions of high co-movement in the time-frequency space. The correlation magnitude between two signals is color scaled, while the arrows indicate the phase of the investigated signals. Thus, high values indicate high correlation between them at the correspondent frequency in a specific time interval and vice versa. The wavelet coherence between the areal extension and rainfall is shown in Fig. 11(c) It is noteworthy that both signals are highly correlated at Period = 1 year for the entire time interval overall, i.e., one-year seasonality. We expect higher values after 2009, but the highest values are observed from 2014. The opposite phases confirm that the rainfall signal's peak occurs in winter, whereas the maximum areal extension occurs in hot seasons.
Finally, we focus on the last crisis. A consistent increase in heat flux is observed starting in May 2021 reaching the highest peak (86.4 MW) on October 19, 2021 (Fig. 12). This trend is also detected by ground-based instruments, as well as the SO 2 and CO 2 flux [26] and by Visible Infrared Imaging Radiometer Suite (VIIRS) satellite sensor at a 375-m spatial resolution starting from September 2021 [30]. An independent study used an index computed by visual inspection-based anomaly detection in nighttime ASTER data starting from June 2021 [22]. An ambient control area on the island far from the active target La Fossa is used as the background. A similar index than the one used for the fumarole survey, i.e., normalization, was created using the maximum and mean surface temperature of the detected anomaly, the number of anomalous pixels, and the standard deviation. This index is used as a proxy for ground-based heat flux considering the seasonality of surface temperature. It is worth noting that the trained UNET allows us to process daytime acquisitions as well, which is typically removed in past studies using only intensity-based detection techniques. The importance of this aspect is shown for the Vulcano study case, which creates a better thermal trend with more sample points available.
The highest peak, therefore, reached during the crisis is in a daytime acquisition. Both the heat flux and the thermal index are generally correlated, although a large difference is observed for the 29/09/2021 21:00 (gray shaded area in Fig. 12). In this case, a much larger area is detected by the visual inspection than detected using the UNET. This scene contained widespread, thin clouds making the anomaly difficult to discern accurately. This case is a good example of the DL model's potential to automatically discriminate anomalies based on previous experience in areas that may be much harder to recognize even by human inspection.

VI. CONCLUSION
Current satellite TIR systems have the potential to routinely monitor thermal change at volcanoes globally. Large data volumes are freely available, and however, quantitatively analyzing this huge volume of data is impractical. Techniques, such as threshold-based statistical approaches, are generally not accurate for subtle thermal anomaly detection. Thus, we have applied an ML framework based on deep CNNs and greatly reduced the potential of labeling errors by using a rigorous statistical approach supervised by experts on a large set of ASTER data for five different volcanoes. Results show that the proposed model is applicable to different volcanoes with different eruptive styles and most importantly is able to detect all volcanic thermal anomalies from subtle to intense with high precision. This is shown for the last two decades of thermal monitoring at Vulcano, Italy. This case study, compared with independent ground-and ASTER-based data, successfully monitored the thermal activity during the recent crisis period on Vulcano in September 2021. An increase in the monitored heat flux was successfully observed before the peak of thermal activity and correlated with ground-based monitoring. This system can be used to monitor volcanic thermal behavior with higher levels of confidence with respect to other existing data modeling systems. In fact, no a priori choice related to thresholds is needed and background influence on the outcome is avoided because our DL technique detects volcanic anomalies with a different paradigm with respect to the traditional approach by learning the textural features from data similar to what the human eye does. We are able to process automatically a large volume of high-resolution TIR data for any volcano, thus improving thermal anomaly detection and monitoring with high sensitivity, which is fundamental to detect subtle preeruptive and posteruptive changes. This allows investigation of low-level and/or spatially small volcanic thermal behavior otherwise missed by traditional approaches. The next logical step will involve designing a data-driven approach to automatically discover preeruptive trends from the results of this model. With future planned TIR instruments having greatly improved temporal (1-3 days) and spatial (60 m) resolution data such as NASA's Surface Biology and Geology (SBG) mission, this approach will be ideal for quickly interrogating the much larger data volume in near real time.