Ground motion predictive modelling based on genetic algorithms

This study aims to utilise genetic algorithms for the estimation of peak ground accelerations (PGA). A case study is carried out for the earthquake data from south-west Turkey. The input parameters used for the development of attenuation relationship are magnitude, depth of earthquake, epicentral distance, average shear wave velocity and slope height of the site. Earthquake database compiled by the Earthquake Research Institute of Turkey was used for model development. An important contribution to this study is the slope/hill data included into the dataset. Developed empirical model has a good correlation ( R = 0.78 and 0.75 for the training and overall datasets) between measured and estimated PGA values. The proposed model is also compared with local empirical predictive models and its results are found to be reasonable. The slope-hill effect found to be an important parameter for the estimation of PGA.


Introduction
Many seismic design codes utilise smoothed acceleration spectrums, which are scaled by spectrum coefficients for different seismic zones, for the determination of seismic loads on buildings.Spectrum coefficient is usually a measure of expected peak ground acceleration (PGA).On the other hand, seismic hazard estimation is also an important aspect of earthquake engineering.It is needed to determine seismic safety of existing building stocks for mitigation works.PGA values significantly affect the results of the hazard assessment studies.
Many ground motion predictive models for PGA are proposed in the existing literature.Various input parameters are used in different models.Magnitude of the event and distance between site and source are common parameters in Correspondence to: S. Yilmaz (syilmaz@pau.edu.tr)almost all models.However, attenuation behaviour is also affected by source type, properties of propagation media, local geology and topography (Kramer, 1996).On the other hand, many of the existing empirical predictive relations are obtained by regression analysis.Although, the attenuation behaviour is very complex due to uncertainties caused by the complexity of inputs, these empirical models have very simple formulations.The affect of the uncertainties may be reduced by the use of artificial intelligence (AI) techniques.In recent years, some new ground motion predictive models are proposed based on AI techniques (Güllü and Erc ¸elebi, 2007;Cabalar and Cevik, 2009).
Genetic algorithm (GA) is a popular computing technique for the solution of optimization problems.It is a computer simulation of production of successive generations from a random initial population, which is a set of probable optimal solutions.As well as classical optimization problems, GA is utilised for curve fitting applications for the estimation of a depended variable (Ergun et al., 2008).In this approach, format of the aimed function is defined in the code and coefficients of the predefined parameters are determined by genetic operators.However, the success of regression for complex problems is limited to the imagination of the user, which defines the form of the curve.Instead of using predefined curve formats, GA can be programmed so that it selects the effective parameters and determines their coefficients.Such an application of GA is also available in the literature for the geotechnical engineering problems (Sen and Akyol, 2010).
Turkey is a seismically active region of the world and it has been affected by destructive earthquakes frequently.There is a need to develop local empirical predictive models for the region.There is a number of models developed for Turkey (Aydan, et al., 1996;Gülkan and Kalkan, 2002;Kalkan and Gülkan 2004;Ulusay et al., 2004).The recent models, which utilise dataset developed in previous studies, also employ AI approaches (Güllü and Erc ¸elebi, 2007;Cabalar and Cevik, 2009) see Table 1.
Published by Copernicus Publications on behalf of the European Geosciences Union.

Author
Attenuation model Aydan et al. (1996) PGA = 2.8(e 0.9Ms e −0.025R − 1) M s : surface wave magnitude, R: Epicentral distance İnan et al. (1996) PGA = 10 0.65M−0.9log(R)−0.44 M: magnitude, R: Epicentral distance Gülkan and Kalkan (2002) regression coefficients; r: Epicentral distance; V S : Shear wave velocity Kalkan and Gülkan (2004) Epicentral distance; V S : Shear wave velocity Ulusay et al. (2004) PGA = 2.18e 0.0218(33.3M w −Re+7.8427SA +18.9282S B ) M w : moment magnitude, R e : Epicentral distance; S A , S H : Soil constants )) A: Peak acceleration at base-rock, β 0 , β 1 , β 2 : regression coefficients; M w : moment magnitude, R: Epicentral distance; Cabalar and Cevik (2009) PGA = 5.7 M w : moment magnitude, R: Epicentral distance; V S : Shear wave velocity In this study, south-west Turkey is selected as a case study for the development of GA based attenuation model.A dataset is prepared for the region.Recent earthquake data is also included in the dataset.It is well known that slope/hill affect is an important source of PGA variations.Due to the slope/hill affect, hazard level increased in some past earthquakes (Kaplan et al., 2008) in the region.Therefore, slope data of the recording stations is also used in the dataset.
2 Data used and method

Seismicity of the South-West Turkey
Turkey suffers significantly from earthquakes, being on Alpine orogenic system (Erdik et al., 1985).The African, Arabian and Eurasian plates interact with the Anatolian plate.Movement of African and Arabian plates to north-northeast forces the movement of the Anatolian plate to the Eurasian plate.Due to this movement, the west of Anatolian plate is squeezed and horst and graben systems are formed in southwest Anatolia (Fig. 1).These formations are the main source of the seismic activity in the region.For more information on the seismicity and tectonics of the region, Erdik et al. (1985) and Westaway (2003) are important references.

Empirical predictive models for Turkey
The first seismograph in Turkey was installed in 1973 in scope of a project carried out by Earthquake Research Institute of Turkey (today called Disaster and Emergency Management Presidency).The first record was taken during the 1976 Denizli Earthquake.Some of the records from the network were already used in some international datasets for empirical predictive modelling.Moreover, local predictive models based on earthquake data in Turkey have been developed in the last decade.Table 1 summarizes empirical predictive relations developed for Turkey.The relations developed before 2001 uses surface wave magnitude (M s ), which is developed for strong earthquakes.The most reliable magnitude scale is the moment magnitude (M w ).However, it has complex calculation procedures and calculated for a limited number of earthquakes.The recent models utilise moment magnitude.However, the dataset used in this study includes small to medium size earthquakes with records taken from a distance of up to 100 km to the source.As it has the simplest calculation procedure, duration magnitude is available for most of the earthquakes.Some empirical relations are also available in the literature for the conversion between different magnitude scales (Ulusay et al., 2004).However, such a conversion may introduce some extra errors due to development of a model by using the data obtained from another empirical model.Therefore, duration magnitude (M d ) is used for this study; instead of converting different magnitude scales to each other.

Ground motion dataset for south-west Turkey
Ground motion dataset used for the development of the empirical predictive model proposed in this study was divided into training and test datasets.Division is carried out by random selection.Researchers proposed to select 20 % to 30 % of the dataset for testing (Swingler, 1996;Nelson and Illingworth, 1990).In this study, 28 % of the overall dataset is selected as testing data.Tables 2 and 3 list the training and testing datasets, respectively.Station information can be accessed at Disaster and Emergency Management Presidency (DEMP).All of the destructive earthquakes (M > 5.0) were included in the database.However, to limit the affect of small PGA values, records with small PGA values were eliminated.
The lower limit for the PGA is selected as 20 gals. Figure 2 shows geographic distribution of the epicentres of the earthquakes and recording stations used in this study.
Since the lower limit of 20 gals have been used for dataset creation, there exist many near source recordings for small magnitude earthquakes.Therefore, an epicentral distance limitation of 2 km is selected to filter cut-off near-source effects.When the magnitude increases, upper bound of the epicentral distance is also increases as illustrated in Fig. 3. Considering the earthquake with magnitude smaller than 4.0, there are only two records with epicentral distance greater than 30 kilometers (32 km and 47 km).On the other hand, for a greater magnitude number of records with greater epicentral distances than 65 kilometers, there is only three.Therefore, it seems to be rational that developed models can be valid for the first 30 and 65 km for magnitudes smaller and greater than 4.0, respectively.

Genetic Algorithms
Genetic Algorithm (GA) is an optimization and search method based on the principle of survival of the fittest.The solution process simulates natural selection mechanisms.The method has been effectively used in many engineering applications.GA was first introduced by Holland (1975) although it started to be widely used after Goldberg's book (1989).
A GA is initiated with a random population of many individuals.Each individual represents a probable solution to the problem of concern.Then, using these individuals, new offsprings are produced by crossovers and mutations in each generation.Among the population, specific selection rules are applied to the expanded population to reduce the size to that of the original population.These rules are based on the fitness of the individuals.The selection method is designed to increase the chance of selection of the fittest individuals.The fitness of individuals is a measure of optimality of the solution represented by that individual.The procedure of the GA generate better and better offsprings, in each generation to make the solutions close to the objective function (Tung et al., 2003).GA has been confirmed to offer many advantages with respect to the classical optimization methods.
Offspring chromosomes are generated by usually combining two parent chromosomes.This is called crossover operation.An offspring has features of both parents.Firstly, two individuals are selected for crossover and a random cut-off point is selected for a crossover.Then each chromosome is cut at that point and right parts of the strings are swapped.This simplest crossover method is illustrated in Fig. 4. It is also possible to select many cut-off points for a crossover operation.The number of crossover operations in each generation is determined by crossover probability.Crossover probabilities up to 0.80 give satisfactory results in many applications.
Mutation is an essential operator of the GA.A mutation operation is simply carried out by changing 1 to 0 or vice versa at a randomly selected bit among all chromosomes.Premature loss of genetic information from the population is highly probable in especially small populations.This is prevented by mutations.Smaller mutation probabilities are ideal to satisfy stability of the population for most of the problems.Some sample mutation operations are shown in Fig. 5.
Increasing the population size is a problem for computer memory.Population size expanded by mutations and crossovers is reduced to original size by selection operators.Selection is based upon the fitness of individuals.Fittest ones have more chance to be selected to the next generation.Normally, less fit ones have less of a chance.Therefore, each generation has a greater average fitness value than the previous one.Fitness of a chromosome is calculated by fitness function.This function is a definition of the optimization problem.In case of curve fitting applications functions utilizing sum square error, root mean square error, etc., can give satisfactory results.These operations are carried out until the end criteria is satisfied (Fig. 6).

Empirical predictive models by GA
In most of the regression, a template formula is determined by the researchers.In this study, a standard formulation is not given to the code.An initial template is introduced to the code.The final form of the formula is determined after GA procedures are run.The initial template is given in Eq. ( 1).
3 ) (1)  In this formulation, X i denotes coefficients or powers of the terms.Function alternatives and input parameters to be selected by the GA are denoted as f i and t i , respectively.Upper and lower limits of the coefficients and powers and function alternatives and input parameters are summarized in Table 4. GA selects optimal coefficients, powers, input parameters and functions.For example, if the optimal solution for (X 1 ,f 1 ,t 1 ,X 2 ) is (3, log, r, 4), then the term X 1 • f 1 (t x 2 1 ) becomes 3log(r 4 ).
As the original inputs have very different intervals, changes in input parameters of mutation by a bit or a crossover may cause significant jumps at fitness value of the individual.For example, magnitude, M, changes between 2.9 and 6.0, whereas shear wave velocity, V S , changes between 200 and 700 m s −1 in the dataset.To prevent big jumps at fitness values due to change of parameters, all input  parameters are mapped to similar intervals.Formulation of the mapping procedure for the inputs is given in Eq. ( 2).This formulation squeezes the variations in each parameter to interval [1,5].
P is the original data; P min and P max are upper and lower limits of the data corresponding to mapped limits of 1 and 5, respectively.p stands for the mapped data.Table 5 shows the upper and lower limits of the input parameters.For example, M d = 4 is mapped to m = 2.5.Fitness function used for the evaluation of the individuals is shown in Eq. ( 3), where PGA i is the observed data, PGA i,e is the estimated value corresponding to ith record.Fitness evaluation takes care of absolute error and the ratio of greater estimated and real PGA values to those smaller.So it is penalized for small PGA values with a small absolute error, but estimations have 2 or 3 times the real values or vice versa.Lastly, as larger PGA values are more significant for engineering purposes, fitness term is multiplied by the real PGA value to increase the weight of larger PGA values in the solution.Therefore, the domination of small PGA values is prevented.
After running GA for the minimization of the objective function, an optimal curve is obtained as the solution of PGA estimation problem.The proposed empirical formula is shown by Eq. ( 4).The formula has a more complex structure than the existing empirical predictive models.However, an important advantage of the proposed model is that it counts for the slope affect.The formulation takes into account mapped earthquake parameters m, r, z, h and S 3 , which gets 1 for soft soils and 0 for others.Estimated and recorded PGA values are plotted in Fig. 7.There exists a good correlation especially for high-PGA records, which are more important for the seismic design.The final model in Eq. ( 4) has PGA attenuation with the epicentral distance as shown in Fig. 8.It is quite reasonable that PGA reduces with the increase in epicentral distance.Attenuation of the PGA were plotted for different slope heights for magnitudes of M = 5 and M = 6 in Fig. 9.According to developed model, slope heights up to 100 m yields a limited increase in PGA.After this limit, PGA increases rapidly.PGA increases about 12 gals between 100 and 150 m of slope height, which is also achieved between 150 and 170 m of slope height.Due to the limitations of the dataset the formulation may not yield reliable results for higher slopes.The only parameter related to soil behaviour is S 3 , which has a limited effect on the results.Mainly because of the small number of records available for medium and hard soils.
In addition to parametric studies, the developed model is also compared with existing models.The same dataset is used in comparison with domestic relations.Results of those were compared with the proposed attenuation relationship in Table 6.According to the table, the attenuation model proposed in this study gives satisfactory results.Its performance is very similar to that of the model developed by Kalkan and Gülkan (2004).Both relations yield good correlation with low errors.Scatter diagram calculated for Kalkan and Gülkan (2004) relation is given in Fig. 10.When compared with Fig. 7, it is observed that both models overestimate low PGA values and underestimates PGA values higher than 100 gals.Deviation of the proposed formula is higher for low PGA values.On the other hand, deviation of estimations of proposed model is limited with respect to Kalkan and Gülkan (2004) formulation for higher PGA values.In Fig. 11, estimations of two models can be compared with the recorded data in descending order.Peak points for both models belong to the same recordings.Better performance of the proposed model for high-PGA values is also observed in this figure.

Conclusions
This study proposes an empirical model for the estimation of PGA based on genetic algorithms.The model is developed for the ground motion records obtained from south-west Turkey, with magnitudes ranging from 2.9 to 6.0.Besides, an evaluation of the existing attenuation models, developed with the records from Turkey, has been carried out.The dataset for the south-west Turkey was divided into training and test sets.The main results are discussed below.
Among the existing models, the empirical predictive relationship developed by Kalkan and Gülkan (2004) shows the best performance for the RMSE for the overall dataset, while the proposed model has a better correlation.However, the case is vice versa for the test dataset.In general, the proposed model in this study gives similar results with  Kalkan and Gülkan (2004).
the Kalkan and Gülkan model (2004).Among other models proposed for Turkey, these two models generate more precise estimates of PGA.In the training dataset and overall datasets, the proposed model by this study has a better correlation than the Kalkan and Gülkan (2004)  An important advantage of the proposed attenuation model is that it counts for the slope affects on PGA.It is a local attenuation model developed for the south-west Turkey, where many hilly settlement areas exist.The use of the model for these regions may produce more reliable results.
However, an important drawback of the models is that, the characterisation of the soil under the recording stations in Turkey is based on rough soil classifications.More precise soil parameters of each station are needed to improve the quality of the ground motion predictive models in terms of soil classification.Besides, very limited data is available for strong motions with magnitude greater than 6.Therefore, it should be noted that, results obtained from the proposed model may not be valid for higher magnitudes.

Figure 8 .Fig. 8 .
Figure 8. Attenuation of the PGA with respect to epicentral distance Fig. 8. Attenuation of the PGA with respect to epicentral distance.

Table 1 .
Local empirical predictive relationships developed from Earthquakes in Turkey.

Table 4 .
Input parameters and function alternatives.

Table 5 .
Upper and lower limits of original data used for mapping

Table 6 .
Performance of empirical predictive models recently developed for Turkey in case of south-west Turkey earthquakes.
Kalkan and Gülkan (2004)igure, RMSE of the proposed model is lower than the Kalkan and Gülkan model (2004) for the test dataset, but the training and overall datasets.Besides, the proposed model has better estimates for higher PGA values with respect to theKalkan and Gülkan (2004)model.This is the effect of weighting PGA values in objective function.Due to such a weighting, errors for low PGA values becomes less important than those for high PGA values.