Predicting Eastern Mediterranean Flash Floods Using Support Vector Machines with Precipitable Water Vapor, Pressure, and Lightning Data

Abstract: Flash floods in the Eastern Mediterranean (EM) region are considered among the most destructive natural hazards, and they pose a significant modeling challenge due to their high complexity. Machine learning (ML) methods have made a significant contribution to the advancement of flash flood prediction systems by providing cost-effective solutions with improved performance, enabling the modeling of the complex mathematical expressions underlying the physical processes of flash floods. Thus, the development of ML methods for flash flood prediction holds the potential to mitigate risks, inform policy recommendations, minimize loss of human life, and reduce property damage caused by flash floods. Here, we present a novel approach for improving flash flood predictions in an arid region of the EM using Support Vector Machines (SVMs) with a combination of precipitable water vapor (PWV) data, derived from ground-based global navigation satellite system (GNSS) receivers, surface pressure measurements, and nearby lightning occurrence data. The SVM model was trained on historical data from 2004 to 2019 and was used to forecast the likelihood of flash floods in the region. The study found that integrating nearby lightning data with the other variables significantly improved the accuracy of flash flood prediction compared to using only PWV and surface pressure measurements. The results of the SVM model were validated using observed flash flood events, and the model was found to have a high predictive accuracy, with an area under the receiver operating characteristic curve of 0.93 for the test set. The study provides valuable insights into the potential of utilizing a combination of meteorological and lightning data for improving flash flood forecasting in the Eastern Mediterranean region.


Introduction
Flash floods are sudden and intense flooding events that are typically caused by heavy rain. They can occur in a short period of time, making them difficult to predict [1]. Flash floods can lead to human casualties and cause extensive damage to infrastructure, property, and the natural environment [2]. They can also lead to serious injuries due to landslides [3] and collapsed infrastructure, and can disrupt essential services such as electricity, water, and transportation, thus leading to significant economic and social disruption. In addition, they can erode roads and paths, resulting in the formation of potholes, sinkholes, and other hazards [4].
The short occurrence time of flash floods, which is typically a matter of several hours, makes them challenging to predict. Furthermore, when analyzing the output of hydrological models, the most significant factor that controls the generation of flash floods (such the radio waves, causing a measurable delay upon arrival at the receiver. To correct for this effect, GPS satellites transmit radio waves in at least two frequency bands. The other effect is caused by the troposphere, which absorbs the radio waves, causing a delay in their arrival time at the receiver [29]. This delay, known as the zenith tropospheric delay (ZTD), is composed of two types of delay: the hydrostatic delay or zenith hydrostatic delay (ZHD), which is mainly caused by atmospheric pressure, and the wet delay, which is caused by the interaction of the radio waves with water molecules. The wet delay can be calculated by subtracting the ZHD from the ZTD [30][31][32].
The approach of Ziskin and Reuveni [21] involved training three types of machine learning models (random forest (RF), multi-layer perceptron (MLP), and support vector machine (SVM)) with 24 h of PWV data in order to predict whether a flash flood will occur. The models were trained with 107 unique flash flood events and were tested using a nested cross-validation technique. The results showed good agreement across the various score metrics for the three ML models and indicated that the models can be improved by incorporating additional features such as surface pressure measurements and the day of year (DOY). In addition, a feature importance analysis revealed that PWV values from 2 to 6 h prior to a flash flood are the most important features. These results suggest that near real-time GNSS ground-based data-driven approaches can be used to augment current flash flood warning systems. However, when these models were tested with an imbalanced test set, simulating more realistic flash flood occurrence scenarios, they showed a drop in precision (i.e., a higher false alarm rate) while retaining a high hit rate (recall). The study suggested that this approach could be used to improve real-time flash flood early warning systems, possibly by framing the prediction as a multi-class classification task with peak discharge as a threshold parameter. For a comprehensive description of the processing parameters and methodology used to derive and validate PWV, as well as an analysis of diurnal, interannual, and long-term trends, interested readers may refer to [30] or [31].

The Contribution of This Study
The aim of the current study is to address the research gap left by the recent paper of Ziskin and Reuveni [21], namely the sharp drop in precision (i.e., the increase in false alarms) when considering imbalanced data that closely simulate a realistic flash flood scenario.
An additional feature that has been successfully explored for predicting flash floods is lightning activity, which has proven to be a reliable precursor to heavy rainfall [33] and is thus known to be highly correlated with flash flood occurrence [34][35][36]. As such, nearby lightning data are integrated here as a new dataset feature, and the best model is tested with a highly imbalanced dataset. This approach closely mimics a real-life flash flood scenario, where the number of false alarms can have serious consequences. The results demonstrate a significant improvement over previous studies, particularly in terms of precision. Specifically, it was found that all models tested exhibited a lower false alarm rate while maintaining a high hit rate.
The inclusion of this feature may enhance the ability of the learning algorithms to distinguish a typical flood event from a fair-weather day. The motivation for adding this new dataset feature is the previously reported finding that heavy rainfall, which can lead to flash flood events, is often accompanied by an increase in nearby lightning activity.
The paper is structured as follows: in Section 3, the lightning data used and its integration into the dataset, as well as the flood events utilized in this study, are described. The ML methodology utilized for studying these datasets is then described in Section 4. Section 5 presents the results of the ML models' performance. These results are discussed in Section 6, and concluding remarks are presented in Section 7.

Related Work
A recent study by Giannaros et al. [37] investigated the November 2019 catastrophic flash flood in Olympiada (North Greece) using the mesoscale Weather Research and Forecasting (WRF) model and the integrated multi-satellite retrievals for global precipitation measurement (GPM-IMERG) algorithm. The study showed that the WRF-based Hydrologic Engineering Center-Hydrologic Modelling System (HEC-HMS) could provide a strong indication of the forthcoming flash flood at least two days in advance, while the GPM-IMERG algorithm yielded the best performance in capturing the timing of the excessive rainfall. Another study by Varlas et al. [38] evaluated a hydrometeorological forecasting system that operates at the Institute of Marine Biological Resources and Inland Waters (IMBRIW) of the Hellenic Centre for Marine Research (HCMR). The system combines the Advanced Research WRF (WRF-ARW) model, the WRF-Hydro hydrological model, and the HEC-RAS hydraulic-hydrodynamic model to provide daily 120 h weather forecasts and hydrological forecasts for the Spercheios and Evrotas rivers in Greece. The study demonstrated that the system provided skillful precipitation and water level forecasts and timely flash flood forecasting products, which could benefit flood warning and emergency responses due to their efficiency and increased lead time.
Regarding the use of machine learning for flood prediction, Panahi et al. [39] investigated the potential of two types of deep learning neural networks, convolutional neural networks (CNN) and recurrent neural networks (RNN), for predicting and mapping flash flood probability at a spatial scale. They utilized a geospatial database containing records of historical flood events and environmental characteristics of the Golestan Province in northern Iran to develop and validate the predictive models. A step-wise weight assessment ratio analysis was employed to identify the relationships between floods and various influencing factors. The CNN and RNN models were trained using the results of this analysis, and were validated using the receiver operating characteristic (ROC) technique. The results show that CNN performed slightly better than RNN in predicting future floods, with an area under the curve (AUC) of 0.832 and a root mean squared error (RMSE) of 0.144, compared to an AUC of 0.814 and an RMSE of 0.181 for RNN.
Bui et al. [40] developed a new approach to flash flood susceptibility mapping based on a deep learning neural network (DLNN) algorithm, and tested their approach within a case study of a high-frequency tropical storm area in Vietnam. The DLNN model used a database of features such as elevation, slope, curvature, aspect, stream density, normalized difference vegetation index (NDVI), soil type, lithology, and rainfall to predict different levels of susceptibility to flash floods. Feature selection was performed using the information gain ratio. The results indicated that DLNN yields strong prediction accuracy, with a classification accuracy rate of 92.05%, a positive predictive value of 94.55%, and a negative predictive value of 89.55%. The DLNN model performed better than benchmarks based on a multilayer perceptron neural network (MLP) or on support vector machines (SVM), suggesting that it could be a useful tool for flash flood mitigation and land-use planning in the study area.
Band et al. [41] aimed to assess the susceptibility of the Kalvan watershed in Iran to flash floods, using five hybrid parallel and regularized approaches. The extremely randomized trees (ERT) model was found to be the most optimal, with an AUC value of 0.82. The ERT model indicated that 28.33% of the area was at very high to moderate risk of flash floods, with the remaining area at very low to low risk. Topographical and hydrological parameters such as altitude, slope, rainfall, and distance from the river were found to be the most important in assessing flash flood susceptibility. This study demonstrated the effectiveness of hybrid parallel and regularization approaches for estimating flash flood susceptibility in a semi-arid environment.
Regarding the correlation between lightning and floods, Koutroulis et al. [34] examined the relationship between lightning activity and high precipitation events leading to flash floods for the island of Crete. Their results showed that the maximal correlation between the lightning and rainfall data was obtained within a circular area with an average radius of 15 km and an average time lag of 15 min for flood events, and 25 min for non-flood events. In addition, lightning activity was found to be four times higher during flood-triggering storms. Further analysis is needed to understand the differences between flood-producing and non-flood-producing storms.
Soula and Chauzy [35] and Price and Federmesser [36] both conducted studies on the correlation between lightning and rain intensity during thunderstorms and winter storms, respectively. Soula and Chauzy [35] found that the overall spatial correlation between rain and lightning occurrence was very consistent for all types of lightning during four days of thunderstorm activity in France. Price and Federmesser [36] found similar results while investigating winter storms over the central and eastern Mediterranean. Barnolas et al. [42] used a combination of rain gauges, radar, geographic information system (GIS), and lightning data to study a flash flood event that occurred in Catalonia during 12-14 September 2006. They found that the high lightning activity during the event made it an ideal case for studying the relation between lightning strikes and precipitation, thus concluding that the correlation between lightning and precipitation was stronger with increased lightning activity. Hence, these studies demonstrate the importance of harnessing lightning data for predicting and mitigating the risks of flash floods [2].

Datasets
In the current study, the main aim is to improve the performance of the ML models used by Ziskin and Reuveni [21] for predicting flash flood events. To achieve this, the exact dataset and methodology utilized by them were used. The dataset for estimating PWV used in Ziskin and Reuveni [21] was obtained from the SOI-APN GNSS ground receivers. The daily RINEX files were processed using NASA's JPL GipsyX software [43], with PPP solutions, minimum cutoff elevation angle of 15°, GMF for the tropospheric model [44], and 200 ocean loading for all stations. The zenith wet delay (ZWD) was obtained and translated into PWV using the formula [23]: PWV = Π × ZWD. The dimensionless constant of proportionality, Π, was calculated by Ziv et al. [31] using IMS's automated stations and radiosonde measurements [22]. The PWV validation using the Bet-Dagan radiosonde station is extensively explained in [30,31]. The mean diurnal and annual variations were removed during the PWV dataset preparation process.
Supplementary lightning occurrence data were introduced into the ML models, in addition to the dataset used by Ziskin and Reuveni [21]. The lightning occurrence data were obtained from two sources: the World Wide Lightning Location Network (WWLLN) and the Israel Lightning Detection Network (ILDN). WWLLN determines the locations of lightning strikes by using the time of arrival from at least 5 sensors, with an average global detection efficiency of around 30% for strikes with peak current values exceeding 30 kA [45]. The ILDN system, on the other hand, consists of 11 sensors, including LPATS and IMPACT sensors. These are distributed throughout the entire state of Israel and have a strike detection efficiency greater than 90% within the Israel area [46]. The ILDN system registers cloud-to-ground strikes of each polarity with a time accuracy of around 1 ms, where flashes with peak currents between 0 and 10 kA are automatically filtered out and treated as intra-cloud flashes. The lightning events captured in the vicinity of the SOI-APN GNSS stations in the southern part of Israel are illustrated in Figure 1.
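The conversion chain described above (wet delay as the difference between the total and hydrostatic delays, then scaling by Π) can be sketched as follows. This is a minimal illustration, not the GipsyX processing itself: the function names, the delay values, and the value of Π (set here to 0.15, a typical order of magnitude) are assumptions chosen only to make the arithmetic concrete.

```python
def zenith_wet_delay(ztd_mm: float, zhd_mm: float) -> float:
    """Wet delay as the total zenith delay minus the hydrostatic part (mm)."""
    return ztd_mm - zhd_mm


def precipitable_water_vapor(zwd_mm: float, pi_factor: float) -> float:
    """PWV = Pi * ZWD, with Pi a dimensionless proportionality constant."""
    return pi_factor * zwd_mm


# Illustrative values only (assumptions, not taken from the study).
ztd_mm = 2400.0    # zenith tropospheric delay
zhd_mm = 2280.0    # zenith hydrostatic delay
PI_FACTOR = 0.15   # assumed typical magnitude of Pi

zwd_mm = zenith_wet_delay(ztd_mm, zhd_mm)
pwv_mm = precipitable_water_vapor(zwd_mm, PI_FACTOR)
print(f"ZWD = {zwd_mm:.1f} mm, PWV = {pwv_mm:.1f} mm")
```

In practice Π depends on the mean temperature of the atmospheric column, which is why the study derives it from station and radiosonde measurements rather than fixing it.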
To align the WWLLN and ILDN datasets, low-magnitude lightning events below 25 kA were excluded from the ILDN dataset. This allowed focusing on high-magnitude lightning events in both datasets. It is worth noting that the ILDN dataset does not include RMS information, precluding the use of RMS considerations during pre-processing. The research period spanned from September 2004 to December 2010, as well as from July 2017 to July 2020, based on the availability of the lightning data.
Figure 1. Lightning occurrence locations in the vicinity of the SOI-APN GNSS stations used by Ziskin and Reuveni [21], with a radius of 10 km around each GNSS station.

Methodology
The ML methodology introduced in this study is based on the methodology presented by Ziskin and Reuveni [21], and is illustrated in Figure 2. The figure depicts the complete sequence of steps, beginning with target and feature selection, through data pre-processing and model input, and finishing with the creation of the best model fit. The steps are explained in detail in the following sections.
The features are the GNSS-PWV, surface pressure, day of year, and the nearby lightning activity, with the target being the flash flood occurrence times. The pre-processing stage involves standardizing the lightning data by resampling them to a 1 h temporal resolution, and aligning the hydrometric station data, GNSS-PWV, DOY, and surface pressure measurements following the pre-processing step described by Ziskin and Reuveni [21]. The creation of 24 h sequences, with balanced classes, concludes the pre-processing phase. In the learning process, the SVM model is optimized using a cross-validation technique. The final output of each model is a prediction of whether or not a flash flood will occur in the 25th hour.

Data Pre-Processing
One of the key steps in building an ML model is the selection and generation of features that can effectively capture the underlying patterns in the data [47]. In this section, we describe the techniques and methods used to generate the features for the ML model, adding to the ones presented by Ziskin and Reuveni [21].
WWLLN: First, lightning events with a residual RMS greater than 30 ms, which exceeds the maximum allowed time for detecting the lightning event, were filtered out from the WWLLN dataset.
ILDN: For the ILDN dataset, it was necessary to remove low-magnitude lightning events, since the WWLLN dataset contains mainly high-magnitude events. To achieve this, all lightning events below a magnitude of 25 kA were filtered out, allowing us to focus on the large-magnitude events in both datasets. We note that the ILDN dataset lacks RMS information, so it was not possible to pre-process this dataset using RMS considerations.
Furthermore, since the correlation between lightning activity and rainfall data is highest within a circular area with an average radius of 15 km [34], we integrated all the lightning locations within the 10 km radius originally utilized by Ziskin and Reuveni [21], who considered all the flash flood events within a 10 km radius of at least one of the GNSS stations listed in Table 1.
Table 1. Geographical coordinates and names of the SOI-APN GNSS stations used by Ziskin and Reuveni [21], corresponding to the lightning occurrence locations in Figure 1.
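The two filtering rules above can be sketched as a single predicate. This is a minimal sketch under stated assumptions: the `Stroke` record and its field names are hypothetical (neither network's actual file format is described here), the residual is kept in the units quoted in the text, and the 25 kA threshold is applied to the absolute peak current because the ILDN registers strikes of both polarities.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_RESIDUAL = 30.0          # WWLLN residual RMS threshold quoted in the text
MIN_PEAK_CURRENT_KA = 25.0   # ILDN magnitude threshold quoted in the text


@dataclass
class Stroke:
    """Hypothetical lightning record; field names are assumptions."""
    network: str                      # "WWLLN" or "ILDN"
    residual: Optional[float]         # timing residual RMS (None for ILDN)
    peak_current_ka: Optional[float]  # signed peak current (None if unknown)


def keep_stroke(stroke: Stroke) -> bool:
    """Return True if the stroke survives the pre-processing filter."""
    if stroke.network == "WWLLN":
        # Drop events whose residual exceeds the allowed maximum.
        return stroke.residual is not None and stroke.residual <= MAX_RESIDUAL
    if stroke.network == "ILDN":
        # No residual is available, so filter on current magnitude only.
        return (stroke.peak_current_ka is not None
                and abs(stroke.peak_current_ka) >= MIN_PEAK_CURRENT_KA)
    return False


def filter_strokes(strokes: List[Stroke]) -> List[Stroke]:
    return [s for s in strokes if keep_stroke(s)]


sample = [
    Stroke("WWLLN", residual=12.0, peak_current_ka=None),   # kept
    Stroke("WWLLN", residual=45.0, peak_current_ka=None),   # residual too large
    Stroke("ILDN", residual=None, peak_current_ka=-40.0),   # kept (magnitude 40)
    Stroke("ILDN", residual=None, peak_current_ka=12.0),    # too weak
]
kept = filter_strokes(sample)
print(len(kept))
```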

Feature Extraction
In this study, a method for extracting relevant features from the dataset was developed in order to analyze the correlation between flash flood events and lightning activity. Specifically, 24 h lightning vectors were created for each flood event by counting the lightning strikes that occurred in close proximity to the nearest GNSS station at a temporal resolution of 1 h.
To construct the lightning occurrence vectors, the GNSS station closest to each flood event was first identified. Then, the number of lightning strikes occurring within a 10 km radius around that GNSS station was determined for each 1 h time window over a 24 h period. The 10 km radius follows Ziskin and Reuveni [21] and lies within the circular area, with an average radius of 15 km, for which the correlation between lightning and rainfall data was found to be highest [34].
The counts of lightning strikes were integrated for each 1 h time window and assembled into a 24 h vector representing the chosen flood event. This method of computing the lightning vector for each flood event allowed us to analyze the temporal evolution of lightning activity in relation to a specific flood event and to investigate potential correlations or patterns. A comparison between the mean number of lightning strikes within the 24 h prior to all flash flood events analyzed in this study and the mean number of lightning strikes within the 24 h prior to all quiet days (non-flood events) is presented in Figure 3. The feature extraction process enabled us to capture and analyze the relevant lightning data for each flood event in a consistent and systematic manner. This approach provided insight into the coupling between flood events and lightning activity, informing the subsequent analysis and interpretation of the data.
The dataset was first filtered to include only flood and quiet (non-flood) day events for which lightning data were available. Of these, only events that occurred during winter days were taken into consideration, as summer rain is very rare in the EM region. The DOY feature introduced by Ziskin and Reuveni [21] reflected this filtering process. Consequently, a total of 105 flash flood events and 1219 quiet days remained. To simulate a realistic flash flood scenario, we then used an 80/20 randomized train-test split, resulting in 85 flood events and 85 quiet days in the training set, and 20 flood events and 1134 quiet days in the testing set. This resulted in a ratio of roughly 56 quiet days to one flood event in the testing set (1134/20 ≈ 56.7). This split allowed us to evaluate the performance of the model on a separate, unseen dataset, providing a check on its robustness and generalizability.
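The construction of a 24 h lightning vector can be sketched as follows. This is an illustrative sketch rather than the authors' code: the great-circle (haversine) distance, the tuple layout of the strokes, and the station coordinates and times in the example are all assumptions.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def lightning_vector(strokes, station, event_time, radius_km=10.0, hours=24):
    """Hourly stroke counts near `station` in the `hours` before `event_time`.

    strokes: iterable of (time, lat, lon); station: (lat, lon).
    Index 0 is the oldest hour, index hours-1 the hour just before the event.
    """
    start = event_time - timedelta(hours=hours)
    counts = [0] * hours
    for t, lat, lon in strokes:
        if not (start <= t < event_time):
            continue  # outside the 24 h window
        if haversine_km(station[0], station[1], lat, lon) > radius_km:
            continue  # outside the 10 km radius
        counts[int((t - start).total_seconds() // 3600)] += 1
    return counts


# Hypothetical station and strokes, for illustration only.
station = (31.0, 35.0)
event_time = datetime(2010, 1, 18, 12, 0)
strokes = [
    (event_time - timedelta(minutes=30), 31.01, 35.01),          # near, last hour
    (event_time - timedelta(hours=5), 31.05, 35.00),             # near, 5 h before
    (event_time - timedelta(hours=4, minutes=50), 31.50, 35.0),  # too far away
    (event_time - timedelta(hours=30), 31.00, 35.00),            # too early
]
vector = lightning_vector(strokes, station, event_time)
print(vector)
```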
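The split described above, with a balanced training set and all remaining quiet days left in the testing set, can be sketched as follows. The function name, the integer identifiers, and the random seed are assumptions; only the event counts are taken from the text.

```python
import random


def imbalanced_split(flood_ids, quiet_ids, n_train_floods, seed=0):
    """Balanced training set; every remaining event goes to the test set.

    Returns two lists of (event_id, label) pairs, label 1 = flood, 0 = quiet.
    """
    rng = random.Random(seed)
    floods, quiet = list(flood_ids), list(quiet_ids)
    rng.shuffle(floods)
    rng.shuffle(quiet)
    n = n_train_floods
    train = [(i, 1) for i in floods[:n]] + [(i, 0) for i in quiet[:n]]
    test = [(i, 1) for i in floods[n:]] + [(i, 0) for i in quiet[n:]]
    return train, test


# Event counts from the text: 105 floods, 1219 quiet days, 85 floods for training.
flood_ids = range(105)
quiet_ids = range(105, 105 + 1219)
train, test = imbalanced_split(flood_ids, quiet_ids, n_train_floods=85)

test_floods = sum(label for _, label in test)
test_quiet = len(test) - test_floods
print(len(train), test_floods, test_quiet, round(test_quiet / test_floods, 1))
```

Keeping the training set balanced while leaving the testing set at its natural ratio is what makes the reported precision a meaningful estimate of the false alarm burden in operation.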

Support Vector Machine (SVM)
In this study, the support vector machine (SVM) technique was chosen to classify the flash flood event dataset. The SVM algorithm was applied to the dataset from Ziskin and Reuveni [21], which includes precipitable water vapor, surface pressure, and DOY, augmented with the associated lightning activity, as explained above. The decision to employ the SVM technique was based on its demonstrated effectiveness in classification tasks, as previously shown by Ziskin and Reuveni [21].
SVM works by finding the hyperplane that maximally separates the different classes [48]. It remains effective when the data are not linearly separable [49], as in this setting the kernel trick may be used to embed the data in a higher-dimensional space admitting a linear separator [50]. In this study, the SVM approach was used to classify flood events based on their associated lightning activity vectors. To choose the optimal hyperparameters for the SVM model, Bayesian optimization was used rather than a grid search approach. Bayesian optimization is a global optimization method that uses a probabilistic model to guide the search for the best hyperparameters [51][52][53]. It has been shown to be more efficient and effective than grid search in many cases, particularly for models with a high-dimensional hyperparameter space such as SVM [54].
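The idea behind the embedding can be illustrated with a toy example. The sketch below is not an SVM and does not use a kernel function; it uses an explicit feature map on hypothetical one-dimensional data, chosen only to show how points that no single threshold can separate become linearly separable after being lifted to two dimensions.

```python
def separable_by_threshold(values, labels):
    """True if a single threshold on `values` splits the two classes."""
    ordered = [y for _, y in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1


def feature_map(x):
    """Explicit lift from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


# Hypothetical data: class 1 sits on both sides of class 0.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1, 1, 0, 0, 0, 1, 1]

# A hand-picked hyperplane w . phi(x) + b = 0 in the lifted space.
W = (0.0, 1.0)
B = -2.5


def predict(x):
    f1, f2 = feature_map(x)
    return 1 if W[0] * f1 + W[1] * f2 + B > 0 else 0


print(separable_by_threshold(xs, ys))                # original 1-D space
print(all(predict(x) == y for x, y in zip(xs, ys)))  # lifted 2-D space
```

A kernel SVM reaches the same effect without computing the lifted coordinates explicitly, and it learns the separating hyperplane from the data rather than having it fixed by hand.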

K-Fold Cross Validation
As a key aspect of the evaluation of the model's performance, we incorporated a k-fold cross validation process. The k-fold cross validation involves dividing the entire training set into k equal subsets, using a randomized stratified sampling approach to ensure that each subset is representative of the overall dataset. In each iteration, one subset is used for testing and the other k − 1 subsets are used for training. This process is repeated k times, with each subset being used once for testing. The results from each iteration are then aggregated to produce a comprehensive evaluation of the model's performance. This approach helps to guard against overfitting, as the model is always tested on data it has not seen during training. The results from k-fold cross validation indicate how well the model generalizes to new data, providing a more robust evaluation of the model's performance than training and testing with a single fixed dataset. In this study, 5-fold cross validation was used.
The decision to use a standard k-fold cross validation approach here instead of nested cross validation was made due to the limited amount of data available [55]. With limited data, the standard k-fold cross validation approach is a suitable choice, as it provides a good balance between the computational cost and the ability to obtain meaningful results, while still allowing for an evaluation of the model's generalization performance [56,57]. Figure 4 shows the result of the cross validation process, where the groups refer to the nine different GNSS stations listed in Table 1.
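The stratified partitioning described above can be sketched in a few lines. This is a minimal standard-library sketch, not the implementation used in the study; the function name and seed are assumptions, and the labels mirror the balanced training set of 85 flood events and 85 quiet days.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) for k stratified folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold keeps the class ratio.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, validation


# Balanced training set from the text: 85 floods (1) and 85 quiet days (0).
labels = [1] * 85 + [0] * 85
folds = list(stratified_kfold(labels, k=5))
for train_idx, val_idx in folds:
    n_floods = sum(labels[j] for j in val_idx)
    print(len(train_idx), len(val_idx), n_floods)
```

Each of the five validation folds then holds 34 samples with 17 flood events, and every sample is used for validation exactly once.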

Score Metrics
In this study, several score metrics composed of different combinations of the true positive (TP), false negative (FN), true negative (TN), and false positive (FP) counts were employed to assess the accuracy and robustness of the flood classification model. The score metrics used in this study include accuracy, precision, recall, F1 score, Heidke skill score (HSS), true skill statistic (TSS), and the receiver operating characteristic (ROC) curve with its corresponding area under the curve (AUC), as suggested in previous studies [21,52,53,58].
Accuracy is the fraction of correct predictions made by the model, while precision is the proportion of true positive predictions among all positive predictions. Recall, also known as sensitivity, is the proportion of true positive predictions among all actual positive instances. The F1 score is the harmonic mean of precision and recall, and is often used as a single metric to balance these two measures.
The HSS and TSS scores are measures of the skill of a binary classification model in relation to a reference forecast. The HSS measures the fractional improvement in correct predictions over those expected by random chance, while the TSS is the difference between the hit rate (recall) and the false positive rate, and thereby accounts for both correctly predicted events and correctly predicted non-events.
The ROC curve is a graphical representation of the relationship between the true positive rate and the false positive rate of a binary classification model at different classification thresholds. The AUC is a measure of the overall performance of the model, with higher values indicating better performance.
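One way to make the AUC concrete is through its rank interpretation: it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way; the scores and labels are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for three floods and three quiet days.
scores = [0.90, 0.80, 0.35, 0.60, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

Here eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the sample size, which is acceptable for a test set of this scale but would be replaced by a sort-based computation for large datasets.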
When working with imbalanced data, it is important to consider the impact of class imbalance on these score metrics. In such cases, it is often preferable to use metrics that are less sensitive to class imbalance, such as the HSS score and TSS score, in order to more accurately assess the performance of the model [59].
The following are the equations for the above metrics:
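Written in terms of the confusion-matrix counts, the standard textbook forms of these metrics are given below. They are assumed, rather than confirmed, to match the forms used in this study.

```latex
\begin{align}
\mathrm{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\mathrm{Precision} &= \frac{TP}{TP + FP} \\
\mathrm{Recall}    &= \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}
            {\mathrm{Precision} + \mathrm{Recall}} \\
\mathrm{TSS} &= \frac{TP}{TP + FN} - \frac{FP}{FP + TN} \\
\mathrm{HSS} &= \frac{2\,(TP \cdot TN - FP \cdot FN)}
                     {(TP + FN)(FN + TN) + (TP + FP)(FP + TN)}
\end{align}
```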

SVM Result
In this section, we present the results of the best SVM model obtained through the use of Bayesian optimization. Specifically, we examine the minimum classification error per iteration in the optimization process, as illustrated in Figure 5. The plot traces the progress of the optimization algorithm as it explores the hyperparameter space, with the y-axis representing the minimum classification error and the x-axis representing the number of iterations. It demonstrates the effectiveness of the Bayesian optimization algorithm in reducing the classification error over the course of the optimization process and converging toward a minimum. In addition, the visualization of the optimization process provides a clear picture of the overall performance of the SVM model used in this study.
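The quantity plotted in such a convergence curve is simply the running minimum of the error observed at each iteration. A minimal sketch, assuming a hypothetical sequence of per-iteration classification errors (the values below are invented for illustration and are not the study's results):

```python
from itertools import accumulate

# Hypothetical classification error observed at each optimization iteration.
errors = [0.21, 0.18, 0.25, 0.15, 0.15, 0.19, 0.12, 0.14]

# Minimum error found up to and including each iteration.
best_so_far = list(accumulate(errors, min))
print(best_so_far)
```

The running minimum can only decrease or stay flat, so a long flat tail indicates that further iterations are no longer finding better hyperparameters.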

Skill Scores Results
In this section, the results of the skill score metrics evaluation for the best SVM model obtained through the use of Bayesian optimization are presented. The effectiveness of the model in accurately predicting the target variable is demonstrated by the results of this evaluation. Furthermore, a comparison was made between these results and those of Ziskin and Reuveni [21] to provide insight into the relative performance of the SVM model compared to other approaches.
The comparison results are presented in Figure 6, demonstrating that the current model outperforms the results of Ziskin and Reuveni [21] in terms of skill score performance, indicating a higher accuracy in flash flood prediction for the realistic scenario.
The flood classification model's experimental results are highly promising, with encouraging performance across multiple score metrics. An accuracy of 0.9913 was achieved by the model on the testing set, indicating correct predictions for the large majority of instances. This figure should be read with care, since the class imbalance in the data means that simply predicting the majority class all the time would already yield an accuracy of about 0.983 (1134 of 1154 instances); the remaining skill scores are therefore more informative. The high skill scores achieved by the model, despite the presence of imbalanced data, suggest that it is both robust and effective. Although imbalanced data are known to adversely affect model performance, the results of this study indicate that the model was able to overcome this issue and achieve high skill in predicting flash floods.
Furthermore, the skill scores achieved in this study, particularly recall and F1, show that the model has a low rate of false negatives and a moderate rate of false positives. Both matter in flood prediction, as there can be severe consequences if a flood event is not predicted or if a non-flood event is incorrectly predicted as a flood.
In terms of the F1 score, which is the harmonic mean of precision and recall and is used to balance these two measures, the model achieved a value of 0.7917. This indicates that the model has a reasonable balance between the precision and recall score metrics, with a relatively high recall value of 0.95 and a lower precision value of 0.6786. The high recall value suggests that the model is able to detect a large proportion of the examined flash flood events, while the lower precision value indicates a relatively larger number of false positive predictions.
The HSS and TSS scores both measure the skill of a binary classification model in relation to a reference forecast. The model achieved an HSS score of 0.7875 and a TSS score of 0.9421, indicating strong performance in terms of both correctly predicted events and correctly predicted non-events. This suggests that the model was able to accurately classify both flood and non-flood events, and was not simply relying on the class imbalance to achieve high performance.
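The reported scores can be cross-checked with a short calculation. The confusion-matrix counts used below (TP = 19, FN = 1, FP = 9, TN = 1125) are not quoted in the text; they are inferred here from the testing set size (20 flood events, 1134 quiet days) together with the reported recall and precision, and should be treated as an assumption. The metric definitions are the standard textbook forms.

```python
def skill_scores(tp, fn, fp, tn):
    """Standard binary skill scores from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # TSS: hit rate minus false positive rate.
        "tss": recall - fp / (fp + tn),
        # HSS: improvement over the number correct expected by chance.
        "hss": 2 * (tp * tn - fp * fn)
               / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)),
    }


# Counts inferred from the reported test-set size, recall, and precision.
scores = skill_scores(tp=19, fn=1, fp=9, tn=1125)
for name, value in scores.items():
    print(f"{name:9s} {value:.4f}")

# Accuracy of a trivial classifier that always predicts "quiet day".
majority_baseline = 1134 / (1134 + 20)
print(f"baseline  {majority_baseline:.4f}")
```

Under these inferred counts, the accuracy, precision, recall, F1, and TSS reproduce the reported values to four decimal places, and the HSS agrees with the reported 0.7875 to within about 0.0002. The majority-class baseline of about 0.983 shows why accuracy alone says little on this testing set.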
The strong performance of the model across these score metrics demonstrates its effectiveness at classifying flood events based on their associated lightning activity within a given time window.
The high accuracy, TSS, and HSS scores indicate that the model succeeded in correctly identifying both flood and non-flood events, while the relatively high recall and lower precision values suggest that the model performs effectively in detecting flood events, but produces a higher number of false positive predictions.
By using the same machine learning technique as the model presented in the work of Ziskin and Reuveni [21], the current model was able to achieve improved performance due to the addition of the local lightning activity as an augmented feature. By doing so, we were able to provide the model with additional information regarding the key characteristics of each flash flood event, allowing it to make more accurate predictions.
In addition to the quantitative score metrics analysis, we also assess the current model performance using a confusion matrix and ROC curve representation to provide a visual analysis of the model's performance. These are presented in Figures 7 and 8, respectively. The confusion matrix indicates the number of correct and incorrect predictions made by the model for each class, allowing for a more detailed understanding of its performance. The ROC curve, on the other hand, illustrates the trade-off between the true positive rate and false positive rate at different classification thresholds, allowing for a more nuanced understanding of the model's performance. Together, the confusion matrix and ROC curve representation provide a comprehensive view of the model's performance and allow for a more thorough evaluation of its accuracy and robustness.  In this study, a comparison was made to recent studies by Panahi et al. [39] and Bui et al. [40] in order to provide a more comprehensive understanding of the performance of the approach, see Figure 9. Notably, the comparison emphasizes the performance of the approach in the presence of imbalanced data, an aspect that has not been extensively investigated in either Panahi et al. [39] or Bui et al. [40]. By highlighting this research gap, the comparison underscores the novelty of the approach in addressing this critical issue and emphasizes the need for further research in this area. Despite this, the approach continues to demonstrate relatively good performance in comparison to available metrics, providing a promising foundation for future research efforts to utilize this methodology. Figure 9. Comparison of skill score metrics for flash flood event prediction between the current SVM model and both Panahi et al. [39], and Bui et al. [40] works.
Incorporating nearby lightning activity around the examined hydrometric stations as a feature allowed the model to capture the correlation between lightning activity and flash flood occurrence, enhancing the SVM model results presented in the previous study carried out by Ziv and Reuveni [21]. This augmented feature provided additional information, which clearly contributed to the improved performance of the model, as indicated across the various examined score metrics. Altogether, these results demonstrate the advantage of including diverse relevant features in ML models, along with the potential for improved performance by leveraging additional data sources.

Discussion
Flash floods are a major natural disaster that can cause significant damage and loss of human lives. As such, the development of accurate and reliable methods for predicting flash flood events is of critical importance for risk management and disaster response efforts.
In this study, we filtered the data to include only flash flood events with available nearby lightning data, taking into account the DOY feature (i.e., integrating only lightning that occurred during wintertime), resulting in a dataset of 105 flash flood events along with 1219 quiet (non-flood) days. We separated the resulting dataset into a training set (80% of the data) and a testing set (20% of the data), ensuring that the ratio of flood events to quiet days was approximately 1:1 for the training set (i.e., a balanced set), whereas the remaining testing set had a ratio of 1:56.
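A split of this kind can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function name `balanced_train_imbalanced_test`, the application of the 80/20 fraction to the flood events, and the random undersampling of quiet days for training are all assumptions, and the identifiers are placeholders rather than real data.

```python
import random


def balanced_train_imbalanced_test(flood_ids, quiet_ids, train_frac=0.8, seed=0):
    """Hypothetical helper: balanced training set, imbalanced testing set."""
    rng = random.Random(seed)
    floods = list(flood_ids)
    quiets = list(quiet_ids)
    rng.shuffle(floods)
    rng.shuffle(quiets)

    # Assumption: the train fraction is applied to the flood events.
    n_train_floods = int(round(train_frac * len(floods)))
    train_floods = floods[:n_train_floods]
    test_floods = floods[n_train_floods:]

    # Assumption: quiet days are undersampled so the training classes are 1:1.
    train_quiets = quiets[:n_train_floods]
    # Assumption: all remaining quiet days are kept for testing.
    test_quiets = quiets[n_train_floods:]

    # Label 1 marks a flood event, label 0 a quiet day.
    train = [(i, 1) for i in train_floods] + [(i, 0) for i in train_quiets]
    test = [(i, 1) for i in test_floods] + [(i, 0) for i in test_quiets]
    return train, test


# Placeholder identifiers sized like the dataset described above
# (105 flood events, 1219 quiet days).
train, test = balanced_train_imbalanced_test(range(105), range(1000, 2219))
```

With these placeholder sizes the training set holds 168 samples, 84 per class. The flood-to-quiet ratio of the testing set depends on how many quiet days are retained for it.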
The pre-processed data were used to train an SVM model to classify flash flood events based on their adjusted PWV, surface pressure, and associated nearby lightning activity. The model achieved strong performance across multiple score metrics calculated from the imbalanced testing set, including an accuracy of 0.9913, an F1 score of 0.7917, an HSS of 0.7875, a precision of 0.6786, a recall of 0.95, and a TSS of 0.9421. These results demonstrate the effectiveness of the model in predicting flash flood events, particularly in the presence of imbalanced data.
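For reference, all six score metrics can be derived from the four cells of a binary confusion matrix. The sketch below assumes the standard definitions of these scores (in particular, TSS as recall minus false positive rate, and the usual 2x2 form of HSS); the function name `skill_scores` is hypothetical, and the counts in the example are illustrative placeholders, not the confusion matrix of this study.

```python
def skill_scores(tp, fp, fn, tn):
    """Score metrics from binary confusion-matrix counts (standard definitions)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true positive rate
    fpr = fp / (fp + tn)               # false positive rate
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # True skill statistic: hit rate minus false alarm rate.
        "tss": recall - fpr,
        # Heidke skill score: accuracy relative to random chance.
        "hss": 2 * (tp * tn - fp * fn)
        / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)),
    }


# Illustrative counts only: 10 flood events and 560 quiet days (1:56).
scores = skill_scores(tp=8, fp=4, fn=2, tn=556)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

In this made-up example, accuracy stays close to 0.99 even though a third of the flood alarms are false, which is why precision, F1, HSS, and TSS are the more informative scores on an imbalanced testing set.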
In this study, the focus was on the imbalanced testing set, which simulates the rarity of flash floods that is typical for the study area in the EM region. This scenario was estimated to represent a flash flood frequency of 1 in 57 days. Results were similar to those reported by Ziskin and Reuveni for most metrics, but a notable improvement in the precision and F1 metrics was observed, demonstrating the ability of this model to accurately classify both flash flood and non-flood events in a more realistic scenario.
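The sensitivity of precision to class imbalance can be made concrete with a short calculation. The sketch below expresses precision in terms of the true positive rate, the false positive rate, and the flood prevalence. The function name `precision_at_prevalence` is hypothetical, and the false positive rate is inferred here as recall minus TSS (0.95 minus 0.9421), which assumes the standard TSS definition.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by fixed TPR and FPR at a given flood prevalence."""
    true_pos = tpr * prevalence            # expected fraction of true alarms
    false_pos = fpr * (1.0 - prevalence)   # expected fraction of false alarms
    return true_pos / (true_pos + false_pos)


recall = 0.95
fpr = recall - 0.9421  # assumption: TSS = recall - FPR

balanced = precision_at_prevalence(recall, fpr, 0.5)
imbalanced = precision_at_prevalence(recall, fpr, 1 / 57)
print(f"balanced: {balanced:.3f}, 1 in 57: {imbalanced:.3f}")
```

Under these assumptions the same classifier would reach a precision of about 0.99 on a balanced set but only about 0.68 at a prevalence of 1 in 57, which is of the same order as the precision of 0.6786 reported above. Even a false positive rate below 1% produces a noticeable share of false alarms when quiet days dominate.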
The confusion matrix and ROC curve are presented in addition to the quantitative score metrics to provide a visual understanding of the model's performance. The confusion matrix shows the number of correct and incorrect predictions made by the model for each class, while the ROC curve illustrates the trade-off between the true positive and false positive rates at different classification thresholds. Together, these provide a comprehensive evaluation of the model's accuracy and robustness.
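The construction of a ROC curve from classifier scores can be sketched as follows. This is a minimal, dependency-free illustration: the function names `roc_points` and `auc` are hypothetical, the labels and scores are made-up placeholders standing in for SVM decision values, and the area is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step admits more predicted positives.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Placeholder data: 1 = flood event, 0 = quiet day.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(f"AUC = {auc(curve):.3f}")
```

Each point on the curve corresponds to one confusion matrix, so the ROC curve can be read as a summary of all confusion matrices the classifier would produce as the threshold varies.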
The comparison with recent studies presented in Figure 9 places the performance of the current approach in the context of other recent works. Notably, this comparison emphasizes the performance of the current approach when faced with imbalanced data, which has not been extensively examined in either Panahi et al. [39] or Bui et al. [40]. By highlighting this research gap, the comparison underscores the novelty of the current approach in addressing this critical issue and the need for further research in this area. Even under imbalanced conditions, the current approach performs relatively well in comparison to the available metrics, serving as a promising foundation for future research efforts aimed at utilizing the methodology.
In summary, the potential for accurately classifying flood events using machine learning and lightning activity data was demonstrated in this study. An improvement over the previous research presented by Ziv and Reuveni [21] has been achieved, and the value of using advanced machine learning techniques and diverse data sources to build more accurate and robust models has been highlighted. The use of additional features and data sources to further improve model performance, as well as the application of the model in operational settings to aid in flood prediction and risk management efforts, may be explored in further research.

Conclusions
The objective of this study was to explore the classification of flash flood events using an SVM model that incorporates GNSS-PWV and surface pressure measurements, augmented by nearby lightning activity data. The experimental results demonstrated that the model's performance improved when nearby lightning activity was incorporated as an augmented feature, capturing the correlation between atmospheric electricity characteristics and flash flood occurrence. This improvement was observed in the precision and F1 metrics' performance on an imbalanced testing set, contributing to the development of a more accurate and reliable flash flood classification system. The findings suggest that the integration of atmospheric electricity data can enhance the performance of existing flash flood prediction models and help mitigate the devastating effects of these natural disasters.

Data Availability Statement: The data presented in this study are contained within the article in Section 3.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: