Adaptive Neural Fuzzy Inference System and Automatic Clustering for Earthquake Prediction in Indonesia

Earthquakes are a type of natural disaster. The Indonesian archipelago is located at the junction of three of the world's major plates: the Australian plate, the Eurasian plate, and the Pacific plate. Earthquake risk mitigation is therefore important, and one form of mitigation is to provide information about earthquake occurrences that can be used for spatiotemporal analysis. This paper presents a spatial analysis of magnitude distribution for earthquake prediction in Indonesia using an adaptive neural fuzzy inference system (ANFIS) based on automatic clustering. The system has three main sections: (1) data pre-processing, (2) automatic clustering, and (3) the adaptive neural fuzzy inference system. For the experimental study, earthquake data for Indonesia from 2010 to 2017 were obtained from the Indonesian Agency for Meteorology, Climatology, and Geophysics (BMKG) and the United States Geological Survey (USGS). The automatic clustering process produces an optimal number of 7 clusters. Each cluster is analysed based on its earthquake distribution, and the b value of its earthquakes is calculated to obtain the seven seismicity indicators. The ANFIS implementation uses 100 training epochs and 2 membership functions (MFs), with the Gaussian membership function (gaussmf) as the input MF type. The ANFIS results show that the system can predict the non-occurrence of aftershocks with an average performance of 70%.

Keywords: Seismic, Automatic Clustering, Adaptive Neural Fuzzy Inference System, Earthquake Prediction


I. INTRODUCTION
An earthquake is a shaking of the earth caused by the sudden release of energy in the ground. It results from the sudden breaking of a rock layer or a plate fracture in the earth's crust. The sequence of earthquake events is not random; it follows a spatial pattern with triggers that cause the occurrence of a shock [1,2,3].
The Indonesian archipelago is located at the junction of three of the world's major plates: the Australian plate, the Eurasian plate, and the Pacific plate. The interaction between these plates makes Indonesia a territory with high volcanic and seismic activity [4].
The Indo-Australian plate moves relatively northward and subducts beneath the Eurasian plate, while the Pacific plate moves relatively westward. The plates meet beneath the sea, so a large earthquake at shallow depth can potentially cause a tsunami; Indonesia is therefore also prone to tsunamis. Figure 1 shows the paths of the plates.
Seismic sequences are not distributed randomly; they follow a spatial pattern with consequent triggering of events. In other words, these events produce non-random groupings [5], and scientists are developing models to explain this grouping pattern. Identification of the seismogenic zone is the first step in analyzing earthquake risk. The zone can be divided into small subzones with reference to seismological criteria. Earthquake zoning can be done using expert knowledge. This zoning or grouping is known as clustering. With a clustering scheme, it is possible to retrieve the spatiotemporal patterns created by events [6]. More accurate modeling of earthquake clustering is therefore required to develop a model that explains the pattern or grouping behavior [7].

INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION VOL 3 (2019) NO 1 e-ISSN : 2549-9904 ISSN : 2549-9610
Each region produced by the clustering process has a different historical earthquake dataset. The information in this dataset is used to predict the probability of earthquake occurrence. One of the empirical relationships frequently used in long-term prediction is the Gutenberg-Richter law [8].
Clustering is useful for analyzing seismic parameters, e.g., the b-value. The result of clustering can be employed to predict the nature of future events, which makes it applicable to earthquake risk mitigation [2,6,9].
This study proposes an ANFIS model that predicts, for the five days following an earthquake, the occurrence of earthquakes with a magnitude equal to or greater than a given threshold in the selected cluster. The number of clusters is determined with the automatic clustering method. Predicted aftershocks can be used by authorities to put prevention policies in place.

II. RELATED WORKS
Automatic clustering in this study uses the valley tracing and hill-climbing method, which detects the number of spatial areas automatically. The valley tracing and hill-climbing method determines the global optimum based on differences in variance. The hierarchical k-means clustering algorithm is then used for earthquake data classification. Once information is available for each cluster, feature extraction is performed on each cluster. These features are used for the spatial analysis of earthquakes.
Research on earthquake prediction has been carried out before. Artificial neural networks (ANN) have been used for earthquake prediction, as in [9,10,11]. ANN has also been used to predict earthquakes based on the earthquake distribution in Indonesia [4]. There are also many studies that analyze earthquake patterns to predict earthquakes using ANFIS. One ANFIS implementation for earthquake prediction is [12], which investigates the prediction of future earthquakes with magnitude 5.5 or greater using an adaptive neuro-fuzzy inference system (ANFIS).
ANFIS can also be used to predict the seismic moment of the next earthquake [13]. Besides predicting the occurrence of future earthquakes, the ANFIS approach has been used to predict the location, occurrence time, and magnitude of earthquakes [14], as well as for classification and prediction problems [15].

III. PROPOSED SYSTEM
This section describes the system architecture for earthquake prediction using the Adaptive Neural Fuzzy Inference System (ANFIS) based on automatic clustering. The overall modeling process in this study is shown in Figure 2.

A. Earthquake Data Preprocessing
This process includes several stages. The first stage is data acquisition, which covers earthquake data consisting of longitude, latitude, magnitude, depth, and time. In the second stage, the data obtained are converted to the moment magnitude scale (Mw). The third stage is magnitude thresholding, commonly called the magnitude of completeness (MC).
The input data for the ANFIS model are earthquake data covering the entire territory of Indonesia, which lies between 6° North latitude and 11° South latitude and between 95° East longitude and 141° East longitude. Indonesia is also located between the Pacific Ocean and the Indian Ocean, between the continents of Asia and Australia, and at the meeting of two mountain belts, namely the Circum-Pacific belt and the Mediterranean belt.
The earthquake history data in this study were taken from the Agency for Meteorology, Climatology and Geophysics (BMKG) and United States Geological Survey (USGS) catalogs from 2010 to 2017. The dataset contains magnitudes of 1-10 ML and depths of 0-650 km. The number of earthquake records obtained is 16,013.
The Magnitude of Completeness (MC) is a cutoff magnitude, commonly known as a threshold. Earthquake data with a magnitude below the threshold are eliminated from the dataset, while data above the MC value are retained.
The Magnitude of Completeness affects earthquake data analysis. An improper choice of MC results in a less precise analysis: incomplete seismic data cause the seismic risk parameters to be overestimated or underestimated. This research uses the Gutenberg-Richter frequency-size distribution to determine the value of MC [16].
In this research, we use a magnitude of 5.1 Mw as the magnitude of completeness (MC) [7]. The total number of seismic events after applying the magnitude of completeness is 12,931.
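The thresholding step can be illustrated with a minimal Python sketch. The `Event` record, the function name, and the three sample events are hypothetical and only serve the illustration; the cutoff of 5.1 Mw is the value stated above. Keeping events exactly at the cutoff is an assumption, since the text only distinguishes data below and above MC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One catalog entry (hypothetical record layout for illustration)."""
    longitude: float
    latitude: float
    magnitude: float  # assumed to be already converted to Mw
    depth_km: float


def apply_completeness_threshold(events, mc=5.1):
    """Keep only events at or above the magnitude of completeness mc.

    Assumption: an event exactly at the cutoff is kept.
    """
    return [e for e in events if e.magnitude >= mc]


# Made-up sample events, not taken from the BMKG or USGS catalogs.
catalog = [
    Event(120.1, -8.3, 4.7, 35.0),
    Event(99.8, 1.2, 5.1, 20.0),
    Event(126.4, 2.9, 6.2, 60.0),
]
kept = apply_completeness_threshold(catalog)
print([e.magnitude for e in kept])
```

Only the events that pass the cutoff would be handed to the clustering and feature extraction stages.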

B. Zonation of The Earthquake Zone
Zonation of the earthquake zone aims to divide the earthquake risk area into several regions, which are then used for earthquake distribution analysis. A common problem is determining the number of regions. Choosing the number of zones arbitrarily affects the results of the earthquake distribution analysis, so the resulting earthquake risk analysis may be overestimated or underestimated. The division of the earthquake zone into smaller regions is usually known as earthquake clustering. This process therefore requires a method to determine the optimal number of zone clusters.
Seismic clustering is the initial stage of earthquake pattern recognition. Clustering aims to group earthquake data that are similar with respect to certain criteria. Determining the optimal number of clusters is a difficult challenge in the process of grouping data [17]. Cluster analysis is a known way of determining the optimal number of clusters, namely by measuring the variance within each cluster and the variance between clusters [18,19].
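As a rough illustration of the variance-based criterion, the following Python sketch computes a within-cluster and a between-cluster variance for epicenters given as (longitude, latitude) pairs. The normalization by N - k and k - 1 is a common convention and is an assumption here, not necessarily the exact formula used in [18,19]; the function names and sample points are hypothetical.

```python
def centroid(points):
    """Mean position of a list of (lon, lat) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def within_between_variance(clusters):
    """Return (v_within, v_between) for a list of clusters of 2-D points.

    Assumed normalization: within by (N - k), between by (k - 1),
    so at least two clusters and more points than clusters are required.
    """
    k = len(clusters)
    n_total = sum(len(c) for c in clusters)
    centers = [centroid(c) for c in clusters]
    grand = centroid([p for c in clusters for p in c])
    ss_within = sum(sq_dist(p, centers[i])
                    for i, c in enumerate(clusters) for p in c)
    ss_between = sum(len(c) * sq_dist(centers[i], grand)
                     for i, c in enumerate(clusters))
    return ss_within / (n_total - k), ss_between / (k - 1)


# Two tight, well-separated toy clusters (made-up coordinates).
toy = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
v_w, v_b = within_between_variance(toy)
print(v_w, v_b, v_w / v_b)
```

A small ratio of within-cluster to between-cluster variance indicates compact, well-separated clusters, which is the property a cluster-count search would look for.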
In this research, the earthquake area is divided into smaller groups using clustering techniques. The clustering method used is automatic clustering, which consists of two stages. The first stage determines the optimal number of clusters. The parameters used in this stage are longitude and latitude, and the cluster analysis uses centroid linkage. After the first stage is completed, the second stage classifies the distribution of earthquake data based on the similarity of predetermined criteria. The number of groups is the number of clusters obtained from the first stage. Earthquake data are grouped with a hybrid clustering method that consists of hierarchical clustering and k-means clustering.
The optimal number of clusters is determined with the hill-climbing and valley tracing method, also called automatic clustering. Automatic clustering has been implemented in several studies on earthquake prediction [4] and has also been used for earthquake disaster risk mapping [21]. Figure 3 shows the system architecture for finding the optimal number of clusters. The algorithm for finding the optimal number of clusters using the hill-climbing and valley tracing method [19] is as follows:
1. Manage each data item of A as an n-dimensional attribute vector.
The final step of the algorithm yields the optimal number of clusters.
After the optimal number of clusters is obtained, the next step is to group data based on similarity to certain data characteristics. This grouping uses hybrid clustering, which combines the hierarchical and k-means clustering algorithms. Figure 4 shows the system architecture for finding the optimal number of clusters. The hierarchical K-means clustering algorithm is as follows:
1. Set the n-dimensional attribute vector for each data item of A.

C. Adaptive Neural Fuzzy Inference System
The ANFIS process consists of three stages: feature extraction, normalization, and the ANFIS model. Feature extraction determines the input parameters used in the first layer of the ANFIS model. The feature data are then normalized, which scales the attribute values into a certain range. After that, the ANFIS model is trained on the data and used to predict future earthquakes.

1) Feature Extraction
Feature extraction obtains seismic parameters from historical earthquake data. These parameters serve as preprocessed earthquake data for pattern recognition. The seismicity parameter used in this research is the b value, which is obtained from spatial and temporal distribution analysis [15,7]. The b value characterizes the relative size distribution of events [15] and depends on the stress regime and tectonic character of the region [7]. The b value can be calculated as follows:
$$b = \frac{\log_{10} e}{\bar{M} - M_{\min}} \qquad (1)$$
where $\bar{M}$ indicates the mean magnitude and $M_{\min}$ is the minimum magnitude.
The input consists of seven parameters. Five of the seven input parameters are obtained from the Gutenberg-Richter law through the b value. The seven seismicity indicators are shown in Table 1 [20]. The b value is calculated as:
$$b_i = \frac{\log_{10} e}{\frac{1}{50}\sum_{j=0}^{49} M_{i-j} - M_0} \qquad (2)$$
where $M_i$ is the magnitude of the i-th earthquake in the clustered dataset and $M_0$ is the magnitude of completeness or cutoff magnitude.
Then, the increments of b are calculated:
$$x_{1i} = b_i - b_{i-4} \qquad (3)$$
$$x_{2i} = b_{i-4} - b_{i-8} \qquad (4)$$
$$x_{3i} = b_{i-8} - b_{i-12} \qquad (5)$$
$$x_{4i} = b_{i-12} - b_{i-16} \qquad (6)$$
$$x_{5i} = b_{i-16} - b_{i-20} \qquad (7)$$
From these equations, the first seismicity indicator uses 50 recorded earthquakes to calculate $b_i$. Then $b_{i-4}$ is calculated, also from 50 recorded earthquakes, starting 4 steps back from the data used to calculate $b_i$. The amount of data needed to calculate the five seismicity indicators is therefore 70 earthquakes.
The sixth input variable $x_{6i}$ is the maximum magnitude $M_s$ of the earthquakes recorded during the last week in the area analyzed [15]. It is defined as:
$$x_{6i} = \max\{M_s\}, \quad \text{when } t \in [-7, 0) \qquad (8)$$
where the time t is measured in days. The seventh input variable $x_{7i}$ identifies the probability of recording an earthquake with magnitude larger than or equal to $M_s$. In this study, 6.0 ML is used as the magnitude $M_s$, and the parameter $x_{7i}$ is calculated as: (9)
The output $y_i$ is the maximum magnitude $M_s$ observed in the cell under analysis during the next five days. It is set to 0 when no earthquake with magnitude equal to or greater than $M_s$ occurs. In this scenario, $M_s$ is set to 6.0 ML.
$$y_i = \max\{M_s\}, \quad \text{when } t \in (0, 5] \qquad (10)$$
where the time t is measured in days.
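The b value and its increments can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the window of 50 events, the step of 4, the cutoff of 5.1, and the threshold of 6.0 come from the text, while the function names and the synthetic magnitude list are hypothetical. The expression used for the seventh indicator, $10^{-b_i (M_s - M_0)}$, is an assumption derived from the Gutenberg-Richter law and may differ from the paper's own equation.

```python
import math


def b_value(magnitudes, m0):
    """b value from a window of magnitudes and the cutoff magnitude m0."""
    mean_mag = sum(magnitudes) / len(magnitudes)
    return math.log10(math.e) / (mean_mag - m0)


def seismicity_indicators(mags, i, m0=5.1, ms=6.0, window=50, step=4):
    """Return ([x1..x5], x7) for the event at index i (0-based).

    Six b values (b_i, b_{i-4}, ..., b_{i-20}) are needed, each over the
    `window` events ending at its index, hence 70 events in total.
    """
    if i < window - 1 + 5 * step:
        raise ValueError("not enough earlier events for five increments")

    def b_at(k):
        return b_value(mags[k - window + 1:k + 1], m0)

    b = [b_at(i - step * j) for j in range(6)]
    increments = [b[j] - b[j + 1] for j in range(5)]
    # Assumption: Gutenberg-Richter probability of an event >= ms.
    x7 = 10 ** (-b[0] * (ms - m0))
    return increments, x7


# Synthetic, made-up magnitudes that cycle from 5.1 to 6.0.
mags = [5.1 + 0.1 * (k % 10) for k in range(70)]
increments, x7 = seismicity_indicators(mags, i=69)
print(increments, x7)
```

Because every 50-event window of this synthetic series has the same mean, the increments are essentially zero; real catalog data would give non-zero increments that track changes in the b value over time.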

2) Normalization
In this process, the data are normalized into the range 0.1-0.9 as follows:
$$x_i = 0.1 + 0.8\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (11)$$
where $x_i$ is the normalized value, $x$ is the measured value, $x_{\min}$ is the minimum value in the dataset, and $x_{\max}$ is the maximum value in the dataset.
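A minimal Python sketch of this min-max scaling into the range 0.1-0.9 is given below. The function name is hypothetical, and mapping a constant column to the midpoint of the range is an assumption, since that case is not discussed in the text.

```python
def normalize(values, lo=0.1, hi=0.9):
    """Scale values linearly so that min -> lo and max -> hi."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant feature is mapped to the middle of the range.
        return [(lo + hi) / 2 for _ in values]
    scale = (hi - lo) / (x_max - x_min)
    return [lo + scale * (x - x_min) for x in values]


# Made-up feature column for illustration.
scaled = normalize([0.0, 5.0, 10.0])
print(scaled)
```

Each of the seven input features would be scaled independently in this way before being passed to the ANFIS model.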

3) ANFIS Model
The Adaptive Neuro Fuzzy Inference System (ANFIS) is a combination of fuzzy logic and an artificial neural network (ANN). ANFIS is an architecture that is functionally equivalent to the Sugeno fuzzy rule base model. Neuro-fuzzy systems are based on fuzzy inference systems that are trained using learning algorithms derived from artificial neural networks.
ANFIS consists of five layers, and each layer contains nodes. There are two types of nodes in ANFIS, namely adaptive nodes and fixed nodes. Adaptive nodes are drawn as boxes and fixed nodes as circles. Adaptive nodes hold parameters that change during the learning process. The ANFIS model is shown in Figure 5.
The ANFIS network consists of 5 layers as follows [14]:
a) Layer 1: Each node in layer 1 is an adaptive node with the following node function:
$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2 \qquad (12)$$
$$O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4 \qquad (13)$$
where $\mu_{A_i}$ or $\mu_{B_{i-2}}$ is the membership function of the fuzzy set $A_i$ or $B_i$ and determines the degree of membership of the input x or y. The functions $\mu_{A_i}(x)$ and $\mu_{B_{i-2}}(y)$ are parameterized membership functions for x and y.
The membership function can be described using the bell function, as follows:
$$\mu_{A_i}(x) = \frac{1}{1 + \left|\frac{x - c_i}{a_i}\right|^{2 b_i}} \qquad (14)$$
where the values of a, b, and c are the premise parameters.
b) Layer 2: Each node in this layer is a non-adaptive node whose output is the product of all its inputs:
$$w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \qquad (15)$$
Each node output is the firing strength of a rule.
c) Layer 3: Each node in this layer calculates the ratio of the firing strength of the i-th rule to the sum of all firing strengths. The output of this layer is called the normalized firing strength:
$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \qquad (16)$$

IV. EXPERIMENT AND ANALYSIS
This section describes the experiment and analysis of earthquake prediction. Figure 2 describes the overall process. The first step is data preprocessing, which collects the earthquake history data. The data were collected from January 2010 to December 2017. The earthquake catalogues in this paper are obtained from two sources, BMKG and USGS (United States Geological Survey). A sample of the earthquake history data is shown in Table 2.

A. Zonation of The Earthquake Zone
Zonation of the earthquake zone is a process for grouping data into smaller areas. This grouping extracts earthquake data according to a particular region. A major challenge is determining the optimal number of areas, so this study uses automatic clustering for that purpose. Automatic clustering determines the number of earthquake regions based on previous earthquake history data and has been used in previous research [7]. The earthquake data used for clustering are the epicenter parameters, which consist of longitude and latitude. The results of the zonation process show that the optimal number of zones in the Indonesian region is 7. The Indonesian territory belonging to each cluster is listed in Table 3.

B. Adaptive Neural Fuzzy Inference System
As an example, we attempt to predict the occurrence of earthquakes over the next five days. The earthquake data used are from the cluster 4 area, with 100 training epochs, 2 membership functions (MFs), the Gaussian membership function (gaussmf) as the input MF type, and RMSE as the error measure. In this example, major earthquakes are defined as events with a Richter magnitude of 5.1 or greater.
To train ANFIS, earthquake catalogues between January 2010 and March 2017 are used. Feature extraction is then applied to obtain the input vector parameters, which yields 37 training data sets. Sample training data sets are presented in Table 4.
To test the model performance, the earthquake catalogue between April 2017 and December 2017 is used. Feature extraction yields 10 testing data sets. Sample testing data sets are presented in Table 5.
Performance evaluation in this study is used to determine the performance of the ANFIS model. An earthquake is stated to occur during the next five days when the magnitude reaches the specified threshold (Ms). The parameters used to evaluate the performance of the system [2] are:
1. True positives (TP) is the number of times ANFIS predicts an earthquake and an earthquake occurs during the next five days.
2. True negatives (TN) is the number of times ANFIS does not predict an earthquake and no earthquake occurs during the next five days.
3. False positives (FP), or false alarms, is the number of times ANFIS predicts an earthquake but no earthquake occurs during the next five days.
4. False negatives (FN) is the number of times ANFIS does not predict an earthquake but an earthquake occurs during the next five days.
Table 6 shows the experimental results of the ANFIS model for the quality parameters. The system can predict the non-occurrence of aftershocks with P0 = 80% and can predict the occurrence of earthquakes above 5.1 ML with P1 = 60%, for an average performance of 70%.

V. CONCLUSION
In this paper we have presented a spatial analysis of magnitude distribution for earthquake prediction in Indonesia using an adaptive neural fuzzy inference system (ANFIS) based on automatic clustering. The system has 3 main sections: (1) data preprocessing, (2) automatic clustering, and (3) the adaptive neural fuzzy inference system. For the experimental study, earthquake data from 2010 to 2017 were used. The automatic clustering process produces an optimal number of 7 clusters. Each cluster is analyzed based on its earthquake distribution, and the b value of its earthquakes is calculated to obtain the seven seismicity indicators. The ANFIS implementation uses 100 training epochs and 2 membership functions (MFs), with the Gaussian membership function (gaussmf) as the input MF type. The ANFIS results show that the system can predict the non-occurrence of aftershocks with an average performance of 70%.

Fig 1. Path of earthquakes in Indonesia

Fig 2. System Architecture of Earthquake Prediction Using ANFIS Based on Automatic Clustering

Fig 3. System architecture for finding the optimal number of clusters

2. Set the number of clusters that have been determined as K.
3. Apply the clustering algorithm with this number of clusters.
4. Determine the calculation of variance as
5. Increment j = j + 1.
6. If j < n-2, repeat from step 3.
7. Determine the moving variance with valley tracing and hill climbing.
8. Set a threshold on the moving variance ∂ to assign automatic clustering.
9. Next, rank the moving variance

Fig 4. System architecture for finding the optimal number of clusters

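d) Layer 4: Each node in this layer is adaptive, with the following node function:
$$\bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i\right) \qquad (17)$$
where $\bar{w}_i$ is the normalized firing strength from layer 3 and {pi, qi, ri} are the consequent (adaptive) parameters.
e) Layer 5: This layer calculates the overall output as the sum of all incoming signals:
$$\sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \qquad (18)$$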
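The five layers can be traced with a short Python sketch of the forward pass of a two-input, two-rule first-order Sugeno ANFIS. This is only an illustration under stated assumptions: it uses the generalized bell membership function described for layer 1, the function names and all parameter values are hypothetical, and no training of the premise or consequent parameters is shown.

```python
def bell(x, a, b, c):
    """Generalized bell membership function with premise parameters a, b, c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass; rule i pairs premise_a[i] with premise_b[i]."""
    # Layer 1: membership degrees of the inputs x and y.
    mu_a = [bell(x, *p) for p in premise_a]
    mu_b = [bell(y, *p) for p in premise_b]
    # Layer 2: firing strength of each rule (product of memberships).
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: first-order consequents f_i = p*x + q*y + r.
    f = [p * x + q * y + r for (p, q, r) in consequent]
    # Layer 5: overall output as the weighted sum.
    return sum(wb * fi for wb, fi in zip(w_bar, f))


# Hypothetical parameters chosen only to make the example easy to follow.
premise_a = [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]  # (a, b, c) for A1, A2
premise_b = [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]  # (a, b, c) for B1, B2
consequent = [(1.0, 1.0, 0.0), (2.0, -1.0, 1.0)]  # (p, q, r) per rule
output = anfis_forward(0.5, 0.5, premise_a, premise_b, consequent)
print(output)
```

With the inputs placed midway between the two membership centers, both rules fire equally, so the output is the plain average of the two rule consequents. In the experiments reported in this paper the input membership type is gaussmf, which would replace `bell` in such a sketch.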

Fig 5. System Architecture of Adaptive Neuro Fuzzy Inference System (ANFIS)

2. Determine the predefined number of clusters (K).
3. Determine the number of computations as p.
4. Set the initial counter (i = 1).
5. Apply the K-means algorithm.
6. Record the centroids of clustering results as

TABLE 2 SAMPLE OF EARTHQUAKE HISTORY DATA FROM 2010 TO 2017

TABLE 4 SAMPLE TRAINING DATA SETS BETWEEN JANUARY 2010 AND MARCH 2017

5. Negative predictive value (NPV) is the proportion of negative results that are correct, denoted as P0, which is obtained as follows:
$$P_0 = \frac{TN}{TN + FN} \qquad (19)$$
6. Positive predictive value (PPV) is the proportion of positive results that are correct, denoted as P1, determined as follows:
$$P_1 = \frac{TP}{TP + FP} \qquad (20)$$
7. Sensitivity (Sn) is the proportion of positives that are correctly identified, obtained as follows:
$$S_n = \frac{TP}{TP + FN} \qquad (21)$$
8. Specificity (Sp) is the proportion of negatives that are correctly identified, obtained as follows:
$$S_p = \frac{TN}{TN + FP} \qquad (22)$$
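The four quality parameters can be computed from the confusion counts as in the following Python sketch. The function name is hypothetical and the counts in the example are made up for illustration; they are not the values reported in Table 6. Taking the average performance as the mean of P0 and P1 is an assumption.

```python
def quality_metrics(tp, tn, fp, fn):
    """Return P0 (NPV), P1 (PPV), Sn, Sp and the mean of P0 and P1.

    All denominators are assumed to be non-zero.
    """
    p0 = tn / (tn + fn)  # negative predictive value
    p1 = tp / (tp + fp)  # positive predictive value
    sn = tp / (tp + fn)  # sensitivity
    sp = tn / (tn + fp)  # specificity
    # Assumption: average performance is the mean of P0 and P1.
    return {"P0": p0, "P1": p1, "Sn": sn, "Sp": sp, "average": (p0 + p1) / 2}


# Hypothetical counts for ten test cases, not taken from the paper.
m = quality_metrics(tp=3, tn=4, fp=2, fn=1)
print(m)
```

P0 and P1 describe how reliable the negative and positive predictions are, while Sn and Sp describe how many of the actual occurrences and non-occurrences are recovered.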