Estimation of Flores Sea Aftershock Rupture Data Based on AI

ABSTRACT


I. Introduction
Flores Island, Indonesia, is a seismically active area [1]. Pranantyo et al. [2] revealed that the seismic activity of eastern Indonesia is thought to be influenced by the isolated thrust fault segments of Flores Island and Wetar Island. Studying these thrust fault segments helps in further understanding fault behavior, earthquake patterns, and seismic risk in the Flores Sea region. The data are collected through seismic modeling, which can make predictions about the potential for future earthquakes. With a better understanding of the isolated thrust fault segments on Flores Island and Wetar Island, disaster mitigation efforts can improve, especially through a more effective early warning system. This is an important step to protect the public and reduce the negative impacts caused by earthquakes in the region. The earthquake catalog records earthquakes of > Mw 7 that have struck the Flores area, three of which occurred in the Flores Sea, in 1992, 2015, and 2021. After an earthquake of large magnitude, aftershocks can occur due to the interaction of ground movements [3]. They are caused by an increase in the deformation mechanism (an increase in stress) over a fracture area, which is quantified as a change in Coulomb failure stress (CFS) [4]. A change in earthquake stress will encourage or inhibit seismic activity in the area. This can be demonstrated by calculating the Coulomb stress, which helps predict where it will increase in the future. On this basis, aftershocks occur more often in areas with positive changes in Coulomb stress than in areas with negative changes [5].
Because the Flores Sea frequently experiences intense seismic activity, it is crucial to research the aftershocks in this region. Aftershocks are earthquakes that happen shortly after the first shock and frequently have a smaller magnitude [6]. Nevertheless, aftershocks can cause further harm and present a risk to places already impacted by the preceding mainshock. By understanding aftershock patterns in the Flores Sea, scientists and seismologists can more accurately evaluate risk [7]. They can identify areas that could be impacted, forecast the likelihood of aftershocks, and give the surrounding area early notice. The Flores Sea aftershocks must be better understood in order to create a more reliable early warning system. One way to warn people before aftershocks happen is by using an artificial intelligence system.
Machine Learning (ML) algorithms enable machines to learn in a manner analogous to humans. ML studies a system's behavior based on dataset information without requiring prior knowledge [8]. The ML algorithms that have been studied for earthquake and aftershock prediction fall into regression and classification families. Various ML architectures are presented in Figure 1. Wieland et al. [9] explained that the success of ML is shown by its ability to identify earthquake damage, as demonstrated by the use of SVM on the Japanese earthquake data of March 2011. In addition, Syifa et al. [10] compared predictive results between SVM and ANN based on accuracy.
So far, there has not been much research on the aftershock hazard in the Flores Sea using predictive models. Such work is needed as evaluation material because three large earthquakes have historically hit the Flores Sea. Furthermore, aftershock studies are essential because they can provide new information about seismogenic processes, including the location of newly active faults.
Several studies have succeeded in comparing classification and regression algorithms, evaluating them against parameters such as the estimated value. This research is therefore considered necessary as a basis for future time alarms and forecasts in aftershock prediction. Based on the description above, the purpose of this study is to determine the estimated value of the aftershock rupture data for the Flores Sea earthquake based on AI. The best estimation results will provide a correction value to be used as comparison data for further investigations of aftershock prediction. The expected benefit of this research is to support the geohazards field in predicting earthquakes with a reasonable level of accuracy.

II. Theory
Classification and regression algorithms have relevant applications in the analysis of earthquake data [12]. In this context, classification algorithms can be used to classify earthquakes into various categories, such as shallow, intermediate, or deep earthquakes, based on focal depth [13]. Meanwhile, regression algorithms can be used to model the relationship between certain features, such as earthquake magnitude, depth, or epicentre distance, and certain target parameters, such as building damage or vibration intensity [14], [15]. In theory, classification and regression algorithms use a variety of mathematical techniques to classify and analyse earthquake data [16].
Implementing a classification algorithm on earthquake data requires data preparation, the selection of pertinent features, and the configuration of a classification model, such as SVM or Decision Tree [17], [18]. Earthquake data must be collected and organised from multiple sources with care in order to meet the requirements of the analysis. Important characteristics, such as magnitude, depth, epicentre location, and duration, should be chosen as model training attributes [19]. To obtain optimal model performance, it is also necessary to consider hyperparameter settings, such as C in SVM or the maximum depth in a Decision Tree [20], [21]. Implementing a regression algorithm on earthquake data, on the other hand, entails similar data preparation and feature selection [22]. To predict continuous values, such as building damage or vibration intensity, a regression algorithm, such as Linear Regression or SVR, may be used [23], [24]. The application of a classification algorithm to earthquake data can yield valuable information regarding the characteristics of a specific earthquake [25], [26]. By classifying earthquakes based on depth, for instance, seismologists can comprehend the patterns and behaviour of earthquakes at specific depths and identify potential dangers in certain regions [27], [28]. In addition, classification can aid earthquake risk analysis by grouping areas according to their potential hazard level, allowing for more precise mitigation measures [29].
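The classification workflow described above can be sketched in a few lines with scikit-learn. The catalog below is synthetic (uniform random magnitudes, depths, and epicentre distances, with the 70 km and 300 km depth thresholds chosen only for illustration); the SVM hyperparameter C mentioned in the text appears explicitly.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic catalog: magnitude, depth (km), epicentre distance (km)
X = np.column_stack([
    rng.uniform(3.0, 7.5, 500),    # magnitude
    rng.uniform(0.0, 600.0, 500),  # depth
    rng.uniform(0.0, 300.0, 500),  # epicentre distance
])
# Depth classes: 0 = shallow (<70 km), 1 = intermediate, 2 = deep (>300 km)
y = np.digitize(X[:, 1], bins=[70.0, 300.0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)

# C controls the margin-violation penalty; it is a hyperparameter to tune
clf = SVC(C=10.0, kernel="rbf").fit(scaler.transform(X_train), y_train)
acc = clf.score(scaler.transform(X_test), y_test)
print(f"test accuracy: {acc:.2f}")
```

Because the labels here are derived directly from depth, the classifier mainly has to recover the two thresholds; real catalogs are far noisier.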
Applying the regression algorithm to earthquake data enables the prediction of continuous earthquake-related values [30].Using regression, we can, for instance, predict building damage or tremor intensity based on certain earthquake characteristics, such as magnitude and epicentre distance [31].This information can be used to assess the potential damage and impact of an earthquake in a particular location, allowing for earlier implementation of mitigation measures and disaster response plans to reduce earthquake-related risks and losses.
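As a minimal sketch of the regression use case, the snippet below fits an SVR to a toy attenuation relation (intensity growing with magnitude and decaying with log-distance); the coefficients 1.5 and 2.0 and the noise level are invented for illustration, not taken from the paper.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 400
mag = rng.uniform(4.0, 7.5, n)
dist = rng.uniform(5.0, 200.0, n)
# Toy attenuation relation (illustrative only): intensity grows with
# magnitude and decays with log-distance, plus observational noise
intensity = 1.5 * mag - 2.0 * np.log10(dist) + rng.normal(0, 0.3, n)

X = np.column_stack([mag, dist])
X_tr, X_te, y_tr, y_te = train_test_split(
    X, intensity, test_size=0.3, random_state=1)

model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
model.fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))
print(f"MAE: {mae:.2f} intensity units")
```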
The application of classification and regression algorithms to earthquake data contributes to a greater comprehension of earthquake patterns, the characteristics of certain earthquakes, and the potential for disaster [32], [33].The classification algorithm aids in classifying earthquakes based on certain attributes, allowing for the identification of risks and potential dangers in certain regions [34], [35].On the other hand, the regression algorithm enables us to model and predict earthquake impacts, such as building damage and the intensity of vibrations, so that disaster mitigation and response actions can be more precisely directed [36].However, the implementation of this algorithm encounters obstacles in the preparation of accurate and representative data, the selection of pertinent features, and the setting of the appropriate hyperparameters [37].In addition, earthquake data can be highly dynamic and intricate, necessitating a vigilant and ongoing analytic approach to produce accurate and useful results [38].

III. Method
This study uses real-time data on the aftershock rupture of the Flores Sea, East Nusa Tenggara, recorded from December 2021 onward. The analysis compares the evaluation results of a classification algorithm and a regression algorithm. The initial stage of this research is a request for IRIS DMC Web Service data. The data are then subjected to a cleaning process to obtain the expected feature extraction. Cleaning is necessary in order to remove errors; this entails eliminating unimportant signals or noise, such as interference from people or equipment. Clean seismic data can be obtained using techniques for filtering, blending, and outlier elimination. Following the identification and separation of the crucial phases, feature extraction is performed to describe the pertinent seismic properties. Parameters such as amplitude, frequency, duration, or wave speed are examples of possible features. To find and extract these features from seismic data, signal processing methods and statistical analysis can be applied. A normalization procedure is then carried out. Data normalization may be used in specific circumstances to guarantee consistency and accurate comparison of extracted features; by altering the scale or range of feature values, normalization makes them uniform and simple to compare. Dimensionality reduction is frequently required for large seismic datasets to ease complexity and accelerate analysis. Using methods like Principal Component Analysis (PCA) or factor analysis, one can reduce the dataset's dimensionality while still preserving important information. Appropriate data cleaning and feature extraction procedures provide a robust foundation for further research, such as seismic modeling, subsurface structure mapping, or earthquake prediction. Seismic researchers and scientists can examine seismic patterns, trends, and characteristics using well-extracted features.
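The filter-extract-normalize-reduce pipeline above can be sketched as follows. The trace, sampling rate, window length, and the three window features are all assumptions made for this illustration; only the 0.1-1.0 Hz pass band comes from the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

fs = 20.0  # sampling rate in Hz (assumed)
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(2)
# Synthetic trace: a 0.3 Hz "seismic" signal buried in broadband noise
trace = np.sin(2 * np.pi * 0.3 * t) + 0.5 * rng.normal(size=t.size)

# Band-pass 0.1-1.0 Hz, matching the pass band used in the paper
sos = butter(4, [0.1, 1.0], btype="bandpass", fs=fs, output="sos")
clean = sosfiltfilt(sos, trace)

# Per-window feature extraction: amplitude, dominant frequency, energy
def window_features(x, fs, win=200):
    feats = []
    for i in range(0, x.size - win, win):
        seg = x[i:i + win]
        spec = np.abs(np.fft.rfft(seg))
        freqs = np.fft.rfftfreq(seg.size, 1 / fs)
        feats.append([seg.max() - seg.min(),       # peak-to-peak amplitude
                      freqs[spec.argmax()],        # dominant frequency
                      float(np.sum(seg ** 2))])    # energy
    return np.array(feats)

features = window_features(clean, fs)
scaled = StandardScaler().fit_transform(features)    # normalization step
reduced = PCA(n_components=2).fit_transform(scaled)  # dimensionality reduction
print(reduced.shape)
```

Real waveform handling would normally go through an ObsPy client rather than raw arrays; this sketch only shows the order of the processing steps.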
The clustering process is carried out as follows. In this stage, new features are added to the data as cluster labels for the dependent variable. Latitude, longitude, depth, and magnitude are the features chosen for this section. The elbow approach, also known as WCSS analysis, is used to find the best elbow graph using the KMeans class from the sklearn.cluster library. The outcome is then merged into the data frame with the concat command to create a new label. The subsequent process divides the data for validation into training data and test data: 70% training data and 30% test data, with the model fitted on the training data and targets. In the final stage, the comparative accuracy values of the classification and regression algorithms are evaluated. Details are presented in Figure 2.
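The elbow/WCSS step and the 70/30 split above can be sketched as below. The coordinate and magnitude ranges are made up for illustration (roughly the Flores Sea region); k = 3 is taken from the paper's elbow result.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
# Hypothetical aftershock features: latitude, longitude, depth, magnitude
data = np.column_stack([
    rng.uniform(-7.8, -7.2, 300),    # latitude
    rng.uniform(121.5, 122.5, 300),  # longitude
    rng.uniform(5.0, 30.0, 300),     # depth (km)
    rng.uniform(3.0, 6.0, 300),      # magnitude
])

# WCSS (inertia) for k = 1..9; the "elbow" in this curve marks the chosen k
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
        for k in range(1, 10)]

k = 3  # chosen at the elbow, as in the paper
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)

# 70/30 train-test split with the cluster labels as the target
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.3, random_state=0)
print(len(X_train), len(X_test))
```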

IV. Results and Discussion
Historically, earthquake predictions have been made since the 19th century. Geller [39] reviewed earthquake predictions divided into different timescales that could explain precursors recognized in the IASPEI guidelines. In the same year, Uyeda [40] revealed that the earth emits seismo-electromagnetic signals and developed the VAN method for short-term earthquake prediction, strengthening the report that electrical signals alone are not enough to be considered earthquake precursors. One of the researchers who disputed this was Sevgi [41]; another point to note is the local characteristics of permittivity, permeability, and background noise. However, that discussion only addresses the earth's electrical signals and does not demonstrate an earthquake prediction model based on historical data. Huang et al. [42] added that earthquake precursors consist of several parameters, such as seismic, geo-electromagnetic, geodetic, gravity, and ground-fluid observations; other parameters serve as inputs as well, such as satellite imagery and animal behavior. Researchers have developed aftershock predictions using artificial intelligence [43]-[45]. Karimzadeh et al. [46] reviewed aftershocks in terms of slip distribution, Coulomb stress change at the source fault, and active fault orientation to predict aftershock patterns. This study provides promising results for predicting spatial aftershocks but has not answered predictions for the time and depth of aftershocks.

Main Shock Review
The Mw 7.3 earthquake that occurred in the Flores Sea on December 14, 2021, was recorded using the Australian Array (AU) network with 14 seismic stations (Figure 3). The data were obtained with a bandpass filter of 0.1 Hz to 1.0 Hz and the cross-correlation method applied to 18 seismic stations. The rupture characteristics are described in Figure 4 after filtering the waveform in the pass bands between 0.1 Hz and 0.5 Hz and between 0.3 Hz and 1.0 Hz. This results in a P-waveform (Figure 5) with a coherent, in-phase signal, showing that the pattern follows the area around the main earthquake.
Handayani [47] has reviewed the source of the Flores Island earthquakes. From the results of this study, it is known that normal faults dominate the Flores Sea. Maneno et al. [48] clarified that the earthquake hypocenters in the northern part of Flores are dominated by deep earthquakes associated with the Flores Thrust Zone, while the southern part is dominated more by shallow to moderate earthquakes. So far, research on the Flores earthquake has been limited to earthquake mapping and relocation. Kurnio et al. [49] only describe a review of underwater landslides, with the result that the trigger of the 1992 event in the Flores Sea was an underwater landslide. The same was explained by Pranantyo et al. [50] regarding seismic and non-seismic hazards in Indonesia, especially eastern Indonesia. Handayani [47] then clarified the seismic hazard, especially in the Flores Sea area, which has experienced seven significant earthquakes. Supendi et al. [51] explained that aftershock studies provide new information on seismogenic processes and seismic hazards in Indonesia, including the location of newly active faults. Therefore, several parameters need to be considered when making aftershock predictions, because they will affect prediction accuracy.

Classification Algorithm
The classification process is carried out on the aftershock rupture data after a data cleaning process has been applied to avoid missing data.
This process involves several stages, such as feature extraction to label new data as dependent data under the cluster name: the independent variables are the selected features, while the dependent variable is the cluster label. This is carried out as an initial step in determining the validation value using the cross-validation method, which divides the data into training and test data [52]. Using the WCSS tool with the Elbow method, we obtain a division into 3 clusters (Figure 7), followed by the classification algorithm.

Figure 7. Elbow method
The Elbow approach aims to select a small number of clusters k while maintaining a low within-cluster sum of squares [53]. The Elbow method of cluster analysis is used in this study to determine the ideal number of clusters, taking into account the comparative value (from the SSE calculation for each cluster count): the SSE curve forms an elbow at a point, since the SSE value decreases as the number of clusters k increases [54]. The SSE formula is shown in Equation (1):

SSE = Σ_{k=1}^{K} Σ_{x_i ∈ S_k} ||x_i − c_k||²  (1)

where S_k is the k-th cluster and c_k is its centroid.
Since there are 3 clusters based on Equation (1) above, Equation (2), the Euclidean distance, is used to calculate the distance between the two objects that are closest to one another:

d(x, y) = √(Σ_j (x_j − y_j)²)  (2)
Table 1 includes the results of computations made using the Elbow technique. Banggut et al. [4] revealed that the classification method places objects into categories; it therefore depends on each classification attribute in the new data [55], [56]. After the train-validation split, there are 982 training records and 421 test records. The training data consist of two variable sets: the independent variables, which are the selected features including magnitude, and the dependent variable, which is the cluster label. The complete comparison of the accuracy of the classification algorithms is presented in Table 2.
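A comparison of classification algorithms like the one reported in Table 2 can be sketched as below. The data are a synthetic stand-in (make_blobs with 3 clusters, matching the cluster count), sized so that a 70/30 split reproduces the 982/421 record counts given in the text; the specific algorithms shown are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the clustered aftershock features: 1403 points, 3 clusters
X, y = make_blobs(n_samples=1403, centers=3, n_features=4, random_state=0)

# A 70/30 split of 1403 records yields 982 training and 421 test records
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-NN": KNeighborsClassifier(),
}
results = {}
for name, model in models.items():
    results[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {results[name]:.3f}")
```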

Regression Algorithm
The regression process is carried out to determine the relationship between categories or labels. In this process, the algorithms whose evaluation values are compared are the Decision Tree Regressor and the Random Forest Regressor. The results show that neither meets the required accuracy, because the error value is very large. This is reinforced by the correlation matrix (Figure 8), which shows minimal relationships between attributes. Marhain et al. [47], [57] have investigated the application of AI for earthquake prediction in Terengganu using ML. In that report, several algorithms are compared, such as SVM, Random Forest (Figure 9), Decision Tree (Figure 10), and Logistic Regression. The limitation of this research is that the data information from each station differs, so the study focuses on one station to compare the accuracy and probability values and then analyzes the results. In the same year, Essam et al. [58] conducted a study at the same location comparing the Artificial Neural Network (ANN) and Random Forest algorithms. This study succeeded in predicting ground motion parameters, namely earthquake acceleration, earthquake speed, and earthquake depth, based on four performance criteria that were successfully evaluated. These results indicate that the ANN algorithm is more accurate than Random Forest. Although the proposed model achieves good accuracy, it has not answered the location of the earthquake. Other researchers who use SVM include [55], [57], [59], [60]. In addition to SVM, frequently used earthquake prediction algorithms are Random Forest [61] and Neural Network algorithms [62]-[64]. One can check, for instance, whether there is a spatial pattern suggesting that stronger aftershocks are more likely to happen close to the mainshock, or a temporal pattern suggesting that the frequency of aftershocks increases after a mainshock of a certain magnitude. It follows that this classification analysis can aid comprehension of aftershock characteristics and the associated potential dangers. It can be used to improve understanding of the spatial and temporal distribution of aftershocks and to offer guidance in the design of earthquake risk mitigation plans.
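The attribute-relationship check behind Figure 8 can be sketched with a Pearson correlation matrix. The aftershock attributes below are randomly generated (hence near-zero off-diagonal correlations); the column names are assumptions matching the features listed in the method section.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Hypothetical aftershock attributes (independent by construction)
df = pd.DataFrame({
    "latitude": rng.uniform(-7.8, -7.2, 300),
    "longitude": rng.uniform(121.5, 122.5, 300),
    "depth": rng.uniform(5.0, 30.0, 300),
    "magnitude": rng.uniform(3.0, 6.0, 300),
})

# Pearson correlation between attributes; weak off-diagonal values
# indicate little linear relationship between the features
corr = df.corr()
print(corr.round(2))
```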
Regression algorithms can be applied to examine trends and numerical patterns in aftershock data. In this instance, the fitted regression coefficients can be used to assess the patterns and trends observed in the aftershock data. A tendency toward increased aftershock frequency as time passes after the main earthquake, for instance, would be indicated by a significant and positive regression coefficient for the time variable. The implication of this regression analysis is that it can aid in simulating aftershock behavior more precisely. Regression results on numerical trends and patterns can be used to anticipate and better understand fault behavior and potential seismic danger.
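The coefficient-sign reading described above can be sketched as below. The daily counts are synthetic, generated from an Omori-style decay, so the fitted slope comes out negative; a positive slope would instead indicate increasing frequency over the window.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
days = np.arange(1, 61).reshape(-1, 1)
# Synthetic daily aftershock counts following an Omori-style decay
counts = 80 / days.ravel() + rng.normal(0, 1.5, days.size)

reg = LinearRegression().fit(days, counts)
print(f"slope: {reg.coef_[0]:+.2f} events/day")
# The sign of the time coefficient summarizes the trend: negative here,
# because the synthetic sequence decays after the mainshock
```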

V. Conclusion
New scenarios in the field of earthquake prediction need to be seriously reviewed. This is crucial for risk assessment, prevention, and early warning in the future.
However, this will be difficult because data availability is influenced by several factors, such as noise affecting earthquake detectors. It is therefore necessary to perform an initial analysis that filters out noise so that the sensor captures only earthquake data sources. The method here is an alignment process with cross-correlation, which can produce P-waves as a coherent, in-phase signal. This is very useful when the detector output is converted into a normalized value structure to determine the estimated value. The estimation results show that the classification algorithms' evaluation values are better than the regression algorithms'. The evaluation values of several algorithms indicate an accuracy rate between 80% and 100%.
Research utilizing SVM, HP-SVM, and PSO-SVM to predict earthquakes faces several limitations on prediction accuracy that must be resolved. Data constraints, which pose a significant barrier to model creation, are the primary issue; however, data augmentation and synthetic data integration techniques can overcome this limitation. In addition, careful feature analysis is required to identify the most pertinent features and eliminate features with minimal correlation. When using optimization techniques such as PSO to improve SVM performance, special attention must be paid to how non-stationary earthquake characteristics can be incorporated into the model. In light of these issues, the study's recommendations include the use of multiple models, particularly the combination of SVM with various optimization techniques. Methods for addressing rare earthquakes and feature imbalance must also be developed further to improve the accuracy of forecasts and ensure their dependability in real-world situations; thus, a comprehensive evaluation of various scenarios against earthquake spatial data sets is necessary.

Figure 1. ML architecture: (a) SVM algorithm classification process, (b) SVR algorithm work procedure, (c) K-NN algorithm decision-making process, (d) K-means algorithm working principle, (e) RF algorithm classification process, (f) dendrogram of the hierarchical clustering algorithm, (g) decision-making process of the DT algorithm [11].

Figure 6. (a) Event Mw 7.3 data, (b) aftershock data, (c) aftershock distribution between magnitude and depth.

Table 1. The Elbow method applied to the first five data points

Table 2. Comparison of classification algorithm accuracy values

Acknowledgment
The researchers would like to thank the Indonesian Ministry of Education, Culture, Research and Technology for providing research funds through the 2022 Beginner Lecturer Research Grant Scheme (PDP) under Grant No. 1098/LL15/KM/2022.