The potential of region-specific machine-learning-based ground motion models: Application to Turkey

Conventional ground motion models have been established extensively worldwide based on classical regression analysis of records. Alternatively, advanced nonparametric machine-learning (ML) algorithms may capture the complex nonlinear behaviour of earthquake motions. This paper investigates the efficiency of artificial neural network (ANN) and extreme gradient boosting (XGBoost) in predicting peak ground acceleration (PGA), peak ground velocity (PGV) and pseudo-spectral acceleration (PSA) (period, T = 0.03–2.0 s) for the Turkish dataset. The dataset involves 1166 records of 383 events with a moment magnitude (M_w) of 4.0–7.6, Joyner and Boore distance (R_JB) of 0–200 km, focal depth (FD) less than 35 km, and site condition as the averaged shear wave velocity of the soil in the top 30 m (V_S30) of 131–1380 m/s. The performance of the models is compared against empirical models in terms of root-mean-square error (RMSE), coefficient of determination (R^2), Pearson correlation coefficient (r), and inter-event and intra-event residuals. To perform residual analysis, a likelihood function is developed. Findings reveal that the XGBoost approach gives an unbiased model with a higher correlation and lower residual than ANN. Finally, an online platform is provided for any interested users.


Introduction
Earthquakes have been a major source of human losses throughout history, with large economic losses, particularly in seismically active regions. In earthquake engineering and engineering seismology applications, ground motion models (GMMs) are essential for estimating the intensity of ground shaking. They have been widely developed to predict ground motion intensity measures, IMs (e.g., peak ground acceleration, PGA, peak ground velocity, PGV, and pseudo-spectral acceleration, PSA, at different periods, T), along with the associated uncertainty at any site of interest. GMMs link ground motion IMs to variables involving fault mechanism (FM), event magnitude (mostly in terms of moment magnitude, M_w), focal depth (FD), source-to-site distance, and characteristics of the soil profile at the station. GMMs are commonly used in civil and earthquake engineering, in applications ranging from deterministic or probabilistic seismic hazard analyses to the development of seismic hazard maps for building codes. GMMs are also employed in assessing site-specific seismic hazard levels for designing infrastructure and in seismic loss estimation studies. A literature survey reveals that earlier studies have mainly developed global or region-specific empirical GMMs based on classical regression analysis [1–17]. In recent years, the functional forms of the empirical GMMs have been largely modified to account for nonlinearity, in addition to soil amplification, source mechanism, geometric and anelastic attenuation, and the uncertainties involved in real motions. As a result, recently proposed models have become very intricate. Moreover, a key challenge in developing empirical models is the a priori definition of functional forms with an adequate level of accuracy.
Nonparametric models (e.g., Refs. [18–23]), which do not require fixed functional forms, have been proposed as alternatives to deal with the high nonlinearity (complexity) of ground motions and the difficulties involved in parametric (empirical) models [24–27]. Furthermore, Kong et al. [28] and Alimoradi and Beck [29] demonstrated the widespread applicability of machine learning (ML) algorithms (e.g., artificial neural network, ANN; random forest, RF; gradient boosting, GB; extreme gradient boosting, XGBoost; support vector machine, SVM) in seismology, highlighting their potential to enhance the understanding of seismic events and improve prediction accuracy. As such, with the advancements in artificial intelligence and soft computing techniques in recent years, a significant number of GMMs have been developed using various approaches. For example, Dhanya and Raghukanth [30] and Dhanya et al. [31] recently employed the ANN approach for developing a global GMM based on the PEER NGA-West2 ground motion database and used a hybrid technique combining a genetic algorithm and the Levenberg-Marquardt technique to train the model. The developed model was able to capture the main ground motion characteristics of the existing GMMs from the NGA-West2 project [11,12,15,32] and the variability better than the previous ANN-based model developed by Derras et al. [33]. On the other hand, Dhanya and Raghukanth [34] proposed assigning regional flags to the records while using the same approach to develop GMMs for regions with sparse recorded data, such as North-Eastern India and the Western Himalayas. Moreover, to overcome the high-frequency (>1 Hz) limitation of physics-based simulations and to eventually enhance the ground motion predictions for future seismic events, especially those with high magnitudes [35], Paolucci et al. [36] proposed enriching the simulated time histories by iteratively scaling their Fourier spectrum to match the prediction of PSA at short periods by ANN-based GMMs. Ghalehjough and Mahinroosta [37] used a fuzzy logic model to predict the PGA of Iranian ground motions and showed that the proposed GMM is more efficient than empirical GMMs. Furthermore, Khosravikia et al. [38] employed three ML techniques, ANN, RF, and SVM, to develop GMMs for Oklahoma, Kansas and Texas and concluded that, if the data are sufficient, all ML techniques tend to provide more accurate estimates than traditional GMMs and, specifically, RF outperforms the other algorithms. Likewise, Seo et al. [39] evaluated the performance of classical regression-based models and GMMs developed using ANN, RF, and GB algorithms for predicting PSA in South Korea, and the GB-based GMM was recognised as the best-performing model. In general, as demonstrated by the studies mentioned earlier and many others (see Refs. ), the main advantage of such sophisticated GMMs is that, if the ground motion database used to train the models is sufficiently large, they have lower dispersion and more accurate predictions than traditional regression-based ones since they can capture complex nonlinear relationships between the input and output variables. However, the main drawback of models based on fuzzy logic and ML algorithms is that they are "black box" models, meaning that providing a physical interpretation of them is difficult. Further, extrapolating such models beyond the original data range is usually not advisable due to the absence of a physical model. Nonetheless, this is not believed to be a drawback specific to these models, as using empirical GMMs outside the original data range is also considered controversial [64].
In particular, to handle epistemic uncertainty in probabilistic seismic hazard analyses, the study by Atkinson et al. [65] demonstrates the need for different GMMs in a logic tree format [66,67]. It is worth noting that the applicability of different GMMs relies on their accuracy, model parameters, and, more importantly, the dataset used for the analyses. The study by Douglas [68] summarised the available parametric and nonparametric GMMs derived worldwide based on either real or simulated ground motion datasets between 1964 and early 2021. Overall, in addition to 87 empirical GMMs derived based on simulated ground motion datasets, that study summarises 485 and 316 empirical GMMs for predicting PGA and elastic PSA ordinates, respectively. Regarding nonparametric models, in addition to 18 backbone models, that study includes details of 39 models. The available models are mainly global and rely on the NGA ground motion dataset or European records.
This work focuses on constructing a local GMM for Turkey, one of the most prominent seismic hazard zones, based on the most recent ground motion data. In the literature, some region-specific empirical GMMs are derived from the dataset of Turkey [1,2,16,58,69–74]. The model developed by Cabalar and Cevik [58] was proposed for predicting PGA, while the other models were developed for predicting the full spectral ordinates. Among the available models, the studies of Özbey et al. [71] and Bindi et al. [1] performed regression only using ground motions from events recorded in north-western Turkey, while the rest are developed based on all regions of Turkey. Regarding the nonparametric models, the study of Güllü and Erçelebi [41], whose dataset involves ground motions recorded up to 2004, predicted only PGA by employing the ANN approach. Later, Günaydın and Günaydın [57] proposed a nonparametric GMM to predict PGA using three different ANN methods: radial basis function, generalised regression neural networks, and feed-forward back-propagation. The proposed model was developed using the database of north-western Turkey between 1999 and 2000. The authors predicted the vertical and two horizontal components separately, employing M_w, FD, hypocentral distance, and site conditions. Yerlikaya-Özkurt et al. [63] recently derived a GMM for Turkey to predict PGA and PGV using the multivariate adaptive regression splines method. That model was developed based on three independent variables: M_w, site condition as the averaged shear wave velocity of the soil in the top 30 m (V_S30), and Joyner and Boore distance (R_JB) as the source-to-site distance metric. The authors employed 726 strong ground motions of 156 events with strike-slip fault mechanisms. The M_w range was 3.8–7.6, while the R_JB range was 0–200 km.
In this regard, this study introduces a novel nonparametric region-specific GMM capable of predicting the full PSA ordinates by investigating the effectiveness of alternative advanced ML algorithms. This study is novel in that it contributes to filling a gap in the literature by developing a GMM utilising advanced ML-based approaches, namely, ANN and XGBoost algorithms, to estimate the spectral ordinates of the Turkish dataset with minimal computational resources. This is significant because existing models rely on empirical methods and require numerous regression coefficients, resulting in complicated calculations. Moreover, this study introduces a new analytical maximum likelihood formula as an adjustment to the model developed by Abrahamson and Youngs [75], which rectifies their likelihood function. Compared to the other studies, the GMM developed in this study uses ground motions that include the most recent large-magnitude earthquakes in Turkey (e.g., the 2020 Elazig earthquake with M_w = 6.7 and the 2020 Samos earthquake with M_w = 6.6). The database of this study is compiled from AFAD [76] and includes 383 different earthquake events with a total of 1166 ground motion time histories recorded in Turkey between 1976 and 2022. The records have M_w of 4.0–7.6, R_JB of 0.1–200.0 km, V_S30 of 131–1380 m/s, and FD less than 35 km. To develop the GMMs based on alternative ML algorithms, the predictors are M_w, V_S30, R_JB, and FM. The present study investigates the efficiency of two alternative ML algorithms, ANN and XGBoost, for predicting peak ground motion parameters, including PGA, PGV, and the elastic PSA at 14 periods for 5% damping within the range of 0.03–2.0 s in Turkey. To optimise the hyperparameters of the ML models and assess the most efficient values, the Bayesian optimisation algorithm (BOA) [77] is used for the XGBoost model and the trial-and-error approach [78,79] for the ANN model. To investigate whether the model is unbiased with respect to any predictor and to reduce the aleatory uncertainty [80], the ML algorithms herein are adjusted by splitting the uncertainty into inter-event (between-event) and intra-event (within-event) terms using the approach developed by Abrahamson and Youngs [75]. A correction is made to the likelihood function proposed by that study. Next, the performances of different ML algorithms are evaluated through root-mean-square error (RMSE), coefficient of determination (R^2), and Pearson correlation coefficient (r). The developed models are also compared against the empirical attenuation models by Akkar et al. [6] and Kale et al. [16] through training with the same database. Finally, the best nonparametric GMM developed by this study is determined and implemented in web-based application software for end-users.
A. Mohammadi et al.

Adopted strong ground motion database
Turkey is in a geologically active area, where most of the country lies on seismic faults dominated by mostly shallow active structures. The seismotectonic setting of Turkey can be explained by the interaction of the movement of the Arabian and African plates toward the relatively stable Eurasian plate in the north, resulting in two major fault zones: the North Anatolian Fault Zone and the East Anatolian Fault Zone [81–83].
The instrumental dataset of the Turkish Disaster and Emergency Management Presidency (AFAD) [76] includes several earthquakes that occurred all over the country, several of which led to human losses and damage to the built environment [84]. This study takes from AFAD [76] the raw strong ground motions recorded between 1967 and 2022 with M_w of 4.0–7.6, R_JB of 0–200 km, and FD less than 35 km, all recorded at stations with V_S30 ranging from 131 to 1380 m/s. Additional raw recordings are retrieved from the dataset RESOURCE [85], while missing information related to specific events is gathered from additional sources [86–89] for the completeness of the dataset. The records are then processed using baseline correction and a fourth-order band-pass Butterworth filter within the frequency range of 0.1–25 Hz. The dataset includes a variety of magnitude scales, such as M_s, M_b, M_L, M_w and M_d. The homogeneity of the magnitude is ensured by eliminating the records with magnitude scales other than M_w from the dataset. The collected database contains 1166 recordings from 383 distinct earthquake events recorded at 269 seismic stations in Turkey since 1967. The spatial distribution of the stations and the earthquake events for this database is shown in Fig. 1. For statistical evaluation, Fig. 2 illustrates the histograms of the seismic characteristics of the ground motion records in terms of M_w, V_S30, R_JB, and FD. The statistics reveal that large-magnitude events in the dataset are rare, while M_w between 4.5 and 5.0 is the most frequent range. The plot corresponding to V_S30 at the stations, which describes the local site conditions, reveals that most of the stations in Turkey have soil types C and D according to the soil classification system of the National Earthquake Hazards Reduction Program (NEHRP) [90]. The distribution plot of R_JB, which is the shortest distance from a station to the surface projection of the rupture plane and is selected for characterising the source-to-site distance, reveals that most records have R_JB less than 75 km. Finally, the distribution of FD reveals that most events are shallow earthquakes with a mean FD of approximately 12 km.
In this study, accelerograms from all events, including main-shocks and fore-/after-shocks, are included in the analysis. This decision is based on the adequacy of their waveform characteristics for computing accurate ground-motion intensity metrics of interest, as noted by Kale et al. [16]. Previous research, including Douglas and Halldórsson [91], found no significant differences in spectral accelerations between main-shocks and after-shocks when using the same dataset as Ambraseys et al. [92]. This finding supports the decision to use all available strong-motion data to develop the GMM for this study. In addition, most current GMMs for Turkey are constructed based on the entire dataset, further justifying the choice to retain all available data for the analysis. The information regarding the FM, including normal (N), reverse (R), and strike-slip (SS), for all earthquakes is plotted in Fig. 3. The distribution of FM demonstrates that SS is the predominant fault mechanism in Turkey (almost 60%). In contrast, events with the R fault mechanism are the least frequent.
Fig. 4 illustrates the distribution of M_w versus R_JB for different soil classes and fault mechanisms. The scatter plots reveal that the number of near-field records, particularly for R_JB < 10 km and large-magnitude events, is relatively small. In contrast, the dataset is abundant for the M_w range of 4.0–6.0 and R_JB larger than 10 km. As stated, most recorded motions for large-magnitude events have NEHRP-C and NEHRP-D soil types. Earthquakes with the R fault mechanism within the M_w range of 5.0–6.0 are also rare. Finally, large-magnitude events with M_w greater than 7.0 have occurred primarily due to the rupture of faults with the SS mechanism. In contrast, no large-magnitude event (M_w > 6.5) has resulted from the rupture of faults with the N mechanism.
To develop the GMMs of this study based on two alternative ML approaches, the predictive variables are considered as M_w, V_S30, R_JB, and FM. Given the input variables M_w, R_JB, V_S30, and FM, the vector of IMs,

Fig. 1. Spatial distribution of the stations and earthquake events for the Turkish dataset between 1967 and 2022.

Finally, the characterisation of intra-event spatial correlations is recognised as a valuable tool for evaluating the performance of GMMs, particularly for near-field records where the correlation is known to be prominent. Nevertheless, it should be noted that the spatial correlation model is deemed adequate and dependable for datasets obtained from dense seismic array networks with ample recordings of each event [94] or for developing simulation-based GMMs [95]. As the seismic network for past events was limited in Turkey, and most events had a small number of near-field stations, this phenomenon was not accounted for in the present study. Furthermore, it should be noted that even if spatial correlation affects estimated GMMs, its overall impact on predictions is relatively insignificant [96].

Methodology
This section summarises the techniques used to generate the GMMs for this study. First, conventional methodologies, ML approaches, and optimisation algorithms for tuning the hyperparameters of the developed models are discussed. Following this, the mixed-effect algorithm is reviewed. Indicators of model performance and an overview of the research methodology are provided at the end.

Conventional GMMs
The familiar approach for predicting a ground motion IM, such as PSA, is to employ ground motion prediction equations, more recently known as GMMs. Empirical GMMs are typically developed using a statistical regression [97] on large sets of ground motion intensities observed in past earthquakes. Since significant scatter is present in the observed data for each IM, GMMs, in general, deliver a probability distribution instead of a single value as follows:

$$\ln y_{ij} = \mu(X_i, X_{ij}, \theta) + \eta_i + \varepsilon_{ij} \qquad (1)$$

where ln y_ij is the natural logarithm of the IM, i denotes the index of the earthquake event, and j represents the station's index. μ(X_i, X_ij, θ) indicates the median ground motion prediction function, with X_i representing event-related parameters, X_ij defining station-related parameters and θ being the vector of model parameters. η_i is the inter-event (between-event) and ε_ij is the intra-event (within-event) residual component in the natural logarithm scale. The term "between-event" refers to the average difference between the median estimates of the GMM and the observed ground motions for the i-th earthquake. The term "within-event" refers to the difference between the record of the i-th earthquake at the j-th station and the median prediction for the i-th earthquake. Both residuals, i.e., inter-event and intra-event components, are assumed to be normally distributed independent random variables with zero mean and standard deviations of τ and σ, respectively. Finally, the total standard deviation corresponding to the GMM is given by:

$$\sigma_T = \sqrt{\tau^2 + \sigma^2} \qquad (2)$$

The efforts toward seismic hazard characterisation of Turkey gained momentum after the İzmit 1999 (M_w = 7.4) and Düzce 1999 (M_w = 7.1) earthquakes. Consequently, some empirical local GMMs have been proposed for Turkey [69,70,98] to estimate either PGA or PSA values. Moreover, some regional GMMs were proposed for north-western Turkey [71,99]. Akkar and Çağnan [2] evaluated some of these pioneering GMMs and found a bias, potentially due to the data used for regression analysis. The authors proposed an empirical model (AC10) that considers the faulting mechanism in addition to the magnitude scaling, geometric decay and site effects, which were also considered in previous GMMs. The latest local GMM (KAAH15) for Turkey was proposed by Kale et al. [16], considering estimators to account for anelastic attenuation.
Moreover, it is common to use GMMs for seismic characterisation of hazards, developed based on databases containing ground motion recordings of worldwide events or events coming from a broader region, such as pan-European records.Among others [100], a set of new empirical GMMs (ASB14) is proposed for the Middle East and Europe [6].These GMMs use the ergodic assumption, which leads to large aleatory variability in source effects, attenuation and path effects, and site seismic processes.Kotha et al. [101] suggested a new non-ergodic GMM (KO16) to reduce the estimated aleatory variability.Both ASB14 and KO16 utilised the dataset RESOURCE [85] assembled for the Middle East and Europe.
In the present study, the GMMs based on alternative ML approaches have the form given in Eq. (1), in which X_i includes M_w and FM, and X_ij contains R_JB and V_S30. Moreover, to assess the predictive capabilities of the considered ML algorithms, a comparison is provided with the ASB14 GMM based on R_JB, in addition to a recent local GMM for Turkey, KAAH15.
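The decomposition of a record into a median prediction plus inter-event and intra-event terms can be illustrated with a minimal Python sketch. The median values, τ and σ below are hypothetical numbers chosen only for illustration, and the helper names `total_sigma` and `simulate_event` are not taken from the paper; the total standard deviation is assumed to combine the two independent components in quadrature.

```python
import math
import random

def total_sigma(tau, sigma):
    """Total standard deviation from inter-event (tau) and intra-event (sigma) parts."""
    return math.sqrt(tau ** 2 + sigma ** 2)

def simulate_event(median_ln_im, tau, sigma, rng):
    """Draw ln(IM) for all records of one event.

    median_ln_im holds the median prediction (natural-log units) at each station.
    A single inter-event term eta is shared by every record of the event, while
    each record receives its own intra-event term epsilon.
    """
    eta = rng.gauss(0.0, tau)
    return [mu + eta + rng.gauss(0.0, sigma) for mu in median_ln_im]

# Hypothetical values, chosen only for illustration.
tau, sigma = 0.3, 0.4
medians = [math.log(0.20), math.log(0.10), math.log(0.05)]  # ln(PGA in g) at three stations
rng = random.Random(42)
ln_pga = simulate_event(medians, tau, sigma, rng)

print("total sigma:", round(total_sigma(tau, sigma), 6))
print("simulated PGA (g):", [round(math.exp(v), 4) for v in ln_pga])
```

Because η is drawn once per event, all records of the same earthquake shift up or down together, which is the behaviour the between-event term is meant to capture.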

Machine-learning algorithms
In this study, two alternative ML algorithms are utilised and tested to derive GMMs for the ground motion dataset of Turkey. ML algorithms are sensitive to the scale of data; thus, it is recommended to transform the input features into a similar scale. To accelerate the training speed and to minimise the possible errors, the input dataset of this study is normalised using the following expression:

$$\theta_{si} = 0.2 + 0.6\,\frac{\theta_i - \theta_{\min}}{\theta_{\max} - \theta_{\min}}$$

where θ_si is the scaled value of the input parameter θ_i, and θ_max and θ_min are, respectively, the maximum and minimum values of that parameter in the training dataset. The range between 0.2 and 0.8 is selected to avoid analysis failure at the value of zero [102]. Nonetheless, the analysis has shown that selecting any range within (0,1] does not significantly affect the results.
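As an illustration of this scaling step, the sketch below maps a feature linearly onto the interval [0.2, 0.8] using the minimum and maximum of the training data. The linear min-max form and the function name `scale_feature` are assumptions made for this example; the magnitude values are hypothetical.

```python
def scale_feature(values, train_min, train_max, lo=0.2, hi=0.8):
    """Linearly map values from [train_min, train_max] onto [lo, hi]."""
    span = train_max - train_min
    return [lo + (hi - lo) * (v - train_min) / span for v in values]

# Hypothetical training magnitudes spanning the range of the dataset.
train_mw = [4.0, 5.8, 7.6]
mw_min, mw_max = min(train_mw), max(train_mw)

scaled_train = scale_feature(train_mw, mw_min, mw_max)
# New records are scaled with the training minimum and maximum, not their own.
scaled_new = scale_feature([4.9], mw_min, mw_max)

print([round(v, 3) for v in scaled_train])
print([round(v, 3) for v in scaled_new])
```

Reusing the training minimum and maximum for unseen records keeps the inputs of the fitted model on the same scale at prediction time.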
In the following sections, the two adopted techniques are described in detail.

Artificial neural network
ANN is a prevailing computational tool, particularly for solving complex regression and classification problems, which can imitate the cognitive skills of the human mind to solve real-world problems that conventional approaches cannot handle [103]. Empirical equations may yield a complex and impractical expression when a problem involves many explanatory parameters. In such cases, ANN models can predict the solution of highly nonlinear and complex problems better than statistical or empirical models. Fig. 6 gives a schematic representation of the structure of the ANN algorithm employed in this work. The model consists of an input layer, a hidden layer, and an output layer. The neurons of the input layer directly receive initial signals from the explanatory variables for further processing in the adjacent layers. A nonlinear transformation is applied through an activation function, f(.), on the summation of weighted input signals arriving in the network of Fig. 6. The linear summation of the weighted inputs with bias is the net input (u_k), which can be expressed as: where m is the number of hidden layers, x_k is the k-th input variable, w_ki is the weight of the k-th input variable given for the i-th neuron, and b_i is the bias in the i-th neuron. Subsequently, the output layer performs a linear transformation on the summation of weighted signals entering this layer. Finally, the output of the model is obtained in the output layer.
In this study, the error between the desired targets and outputs of the model is minimised using an error backpropagation algorithm. This algorithm adjusts the connection weights (w) and biases (b) of the ANN model. Among various backpropagation algorithms, the Marquardt-Levenberg algorithm [104,105], developed for solving least-squares problems, is used. Designing the architecture of a neural network consists of identifying the number of hidden layers and their neurons, and the type of activation function. These hyperparameters are decided based on the minimum calculated RMSE of the test dataset for each possible combination using a trial-and-error method. In this work, the optimal choices for the hyperparameters of the ANN model are obtained as one hidden layer, four neurons, and a log-sigmoid activation function (Table 1).
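The forward pass of a network with the reported architecture (one hidden layer, four neurons, log-sigmoid activation, linear output) can be sketched as follows. All weights and biases below are hypothetical placeholders rather than trained values from the paper, the numeric encoding of FM is an assumption, and the function names are illustrative only.

```python
import math

def logsig(u):
    """Log-sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-u))

def ann_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with log-sigmoid neurons followed by a linear output neuron."""
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        net_input = sum(w * xk for w, xk in zip(weights, x)) + bias
        hidden.append(logsig(net_input))
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical scaled inputs: Mw, RJB, VS30 and a numeric code for FM.
x = [0.50, 0.35, 0.60, 0.20]
# Hypothetical weights and biases for four hidden neurons (not trained values).
w_hidden = [
    [0.8, -1.2, 0.1, 0.05],
    [-0.4, 0.9, -0.3, 0.2],
    [1.5, -0.7, 0.0, -0.1],
    [0.2, 0.2, -0.9, 0.3],
]
b_hidden = [0.1, -0.2, 0.0, 0.3]
w_out = [1.1, -0.6, 0.9, -0.4]
b_out = -0.5

ln_im = ann_forward(x, w_hidden, b_hidden, w_out, b_out)
print("predicted ln(IM):", round(ln_im, 4))
```

In the study the weights and biases would come from the backpropagation training described above; here they only serve to show how the layers are chained.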

Extreme gradient boosting
Chen and Guestrin [106] proposed XGBoost as a practical implementation of the gradient boosting technique. To control model complexity and prevent overfitting, XGBoost includes a regularisation function. Due to its capacity to handle large-scale problems with high performance and execution speed, XGBoost has recently gained popularity as an ML method. It is, however, more challenging to understand and interpret than other boosting algorithms [107]. Fig. 7 presents the structure of the XGBoost approach. In the first step, a tree is trained with randomly selected data to predict the given data. Then, the residuals of the predictions are used to train the next prediction tree. The residuals of trained trees are consecutively used for training another tree. This iterative approach updates the model parameters to optimise an objective function that is divided into two parts: one part represents the loss function (L), while the other part penalises the model's complexity and prevents overfitting, as shown below:

$$\mathrm{Obj} = \sum_{i} L(y_i, \hat{y}_i) + \sum_{t} \left( \gamma K + \frac{1}{2} \lambda \sum_{j=1}^{K} w_j^2 \right)$$

where y_i and ŷ_i are the observed and predicted values, the second sum runs over the trees, γ is the complexity parameter of each tree leaf, K is the number of leaves, λ is the regularisation parameter, and w_j is the score of the j-th leaf.
To obtain the best XGBoost model, the hyperparameters of the algorithm, as listed in Table 2, should be optimised. The BOA is used for this purpose, as explained in the following section.
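The idea of fitting each new tree to the residuals of the previous ones, with a penalty on leaf scores, can be illustrated with a deliberately simplified sketch. It is not the XGBoost library: it uses one-dimensional data, two-leaf trees (stumps), a half squared-error loss, and a split chosen by squared error rather than by the gain criterion of XGBoost. The leaf score formula (sum of residuals divided by the leaf count plus λ) is the standard result for this loss; all names and data are hypothetical.

```python
def leaf_score(residuals, lam):
    """Regularised leaf score for half squared-error loss."""
    return sum(residuals) / (len(residuals) + lam)

def fit_stump(x, residuals, lam):
    """Two-leaf tree: choose the threshold giving the lowest squared error."""
    best = None
    for threshold in sorted(set(x))[1:]:
        left = [r for xi, r in zip(x, residuals) if xi < threshold]
        right = [r for xi, r in zip(x, residuals) if xi >= threshold]
        w_left, w_right = leaf_score(left, lam), leaf_score(right, lam)
        error = (sum((r - w_left) ** 2 for r in left)
                 + sum((r - w_right) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, w_left, w_right)
    return best[1], best[2], best[3]

def boost(x, y, n_trees=20, eta=0.3, lam=1.0):
    """Each new stump is trained on the residuals left by the previous stumps."""
    pred = [0.0] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, w_left, w_right = fit_stump(x, residuals, lam)
        trees.append((threshold, w_left, w_right))
        pred = [p + eta * (w_left if xi < threshold else w_right)
                for xi, p in zip(x, pred)]
    return trees, pred

def objective(y, pred, trees, gamma, lam):
    """Loss plus complexity penalty (gamma per leaf, lambda on squared leaf scores)."""
    loss = sum(0.5 * (yi - pi) ** 2 for yi, pi in zip(y, pred))
    penalty = sum(gamma * 2 + 0.5 * lam * (wl ** 2 + wr ** 2) for _, wl, wr in trees)
    return loss + penalty

# Hypothetical step-shaped data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
trees, pred = boost(x, y)
print("objective:", round(objective(y, pred, trees, gamma=0.1, lam=1.0), 4))
```

The learning rate `eta` and the penalty λ both shrink each update, so the residuals decay gradually over the boosting rounds instead of being fitted in a single step.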

Optimisation algorithms for tuning the hyperparameters of the ML algorithms
In general, there is no easy way to define the best parameters of a neural network, which is an optimisation problem beyond the scope of this study. An efficient yet straightforward way to define reasonable values for the parameters of an ANN model is trial and error. This approach, which is commonly used in the literature [78,79], is utilised here for tuning the parameters of the ANN model.
On the other hand, to identify the optimal hyperparameters of the XGBoost approach, BOA, which is effective compared with other known optimisation approaches (e.g., manual search, random search, grid search, particle swarm optimisation), is used in this study. The term "optimisation" in BOA refers to the global optimisation of a black-box function for which the formula and derivatives are unknown [108]. This optimisation stems from Bayes' theorem as below:

$$p(\omega \mid D) = \frac{p(D \mid \omega)\, p(\omega)}{p(D)}$$

where ω denotes an unseen value, p(ω) is the prior probability distribution, p(D) is the evidence, p(D|ω) denotes the likelihood, and, finally, p(ω|D) represents the posterior probability distribution. Bayes' rule employs prior knowledge to define the posterior probability, in which the outcomes of earlier iterations are considered for determining the values of the upcoming iteration. Two sub-models, the acquisition model and the surrogate model, are used within the BOA. The surrogate model assesses the objective function through the Gaussian process (GP), a common surrogate for objective function modelling and a generalisation of the Gaussian distribution. In general, a GP describes a prior over functions, which can be converted into a posterior over functions after observing specific values of the function. This method assumes that the function F(x) is a realisation of a GP with mean μ and covariance Κ [109]:

$$F(x) \sim \mathcal{GP}(\mu, K)$$

The acquisition function of BOA is maximised at each iteration and depends on the prior observations. Using the findings of the surrogate model, the acquisition model recommends the point to evaluate in the next iteration. Mathematically, the hyperparameter optimisation through BOA seeks the best set of hyperparameters (x*) in the search space (x ∈ X) by finding the optimised value of the objective score (i.e., F(x)).

Fig. 6. Structure of the artificial neural network (ANN) model and illustration of artificial neurons of the hidden layer.
An overview of BOA steps is summarised below.
Step 6: Iterating steps 3 to 5 for the maximum iteration number
Step 7: Training the selected machine using the obtained hyperparameters
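A generic Bayesian optimisation loop of this kind can be sketched in plain Python. The sketch is an illustration under stated assumptions rather than the procedure of the paper: it tunes a single hypothetical hyperparameter on a fixed grid, uses a zero-mean GP surrogate with a squared-exponential kernel of fixed length scale, and uses expected improvement as the acquisition function. The function `toy_validation_error` stands in for the validation score of a trained model and is entirely made up.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, solve(k, ys)))
    var = rbf(x_new, x_new) - sum(ks * v for ks, v in zip(k_star, solve(k, k_star)))
    return mean, max(var, 1e-12)

def expected_improvement(mean, var, best):
    """Expected improvement over the best (lowest) score observed so far."""
    sd = math.sqrt(var)
    z = (best - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + sd * pdf

def bayes_opt(objective, candidates, initial, n_iter=10):
    """Evaluate the candidate with the highest acquisition value at each iteration."""
    xs = list(initial)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        pool = [c for c in candidates if c not in xs]
        if not pool:
            break
        scores = [expected_improvement(*gp_posterior(xs, ys, c), min(ys)) for c in pool]
        x_next = pool[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = ys.index(min(ys))
    return xs[i_best], ys[i_best]

def toy_validation_error(x):
    """Hypothetical validation error as a function of one scaled hyperparameter."""
    return (x - 0.3) ** 2 + 0.1

candidates = [i / 20 for i in range(21)]
best_x, best_y = bayes_opt(toy_validation_error, candidates,
                           [candidates[0], candidates[-1]])
print("best hyperparameter:", best_x, "score:", round(best_y, 4))
```

In practice each evaluation of the objective would involve training and validating a model, which is why the surrogate is used to decide where the next, expensive evaluation should be made.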

Mixed effect model
Mixed effect models are beneficial for cases where data are acquired through repeated measurements. In this case, both fixed and random effects components are included in the residuals. This study employs seismic events recorded at different stations with various site characteristics and distance information. It is well known that the variability between seismic events and even inside the records of each earthquake is high, which requires splitting the total residual into different components [75].
Here, the well-known procedure for the mixed effect model suggested by Abrahamson and Youngs [75] is used to perform residual analysis. This procedure is modified herein by proposing an algebraic maximum likelihood function for computing model parameters and variances using the expectation-maximisation algorithm. The solution to Eq. (3) of Abrahamson and Youngs [75], given in Eq. (7) of the same reference, is revised here. The new likelihood function is derived in Appendix A. It should be highlighted that computationally solving this function, Eq. (A-10), is substantially more straightforward and faster than solving Eq. (A-1).
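The idea of splitting total residuals into inter-event and intra-event parts can be illustrated with a short sketch. It does not reproduce the revised likelihood of Appendix A, which is not given in this section. Instead, it assumes the standard marginal likelihood of a one-way random-effects model, in which the residuals of one event share a common term with variance τ^2 and each record has an independent term with variance σ^2, and it replaces the numerical optimiser with a coarse grid search. The data are synthetic and all function names are hypothetical.

```python
import math
import random

def summarise(groups):
    """Reduce each event's residuals to (count, sum, sum of squares)."""
    return [(len(r), sum(r), sum(v * v for v in r)) for r in groups]

def neg_log_likelihood(stats, sigma2, tau2):
    """-ln L when each event's residuals have covariance sigma2*I + tau2*ones."""
    nll = 0.0
    for n, s, ss in stats:
        denom = sigma2 + n * tau2
        nll += 0.5 * (n * math.log(2.0 * math.pi)
                      + (n - 1) * math.log(sigma2) + math.log(denom)
                      + (ss - tau2 * s * s / denom) / sigma2)
    return nll

def fit_variances(groups, grid):
    """Coarse grid search for (sigma, tau); a stand-in for a numerical optimiser."""
    stats = summarise(groups)
    best = min((neg_log_likelihood(stats, s * s, t * t), s, t)
               for s in grid for t in grid)
    return best[1], best[2]

def inter_event_terms(groups, sigma, tau):
    """Estimated inter-event term of each event given the two standard deviations."""
    return [tau ** 2 * sum(r) / (len(r) * tau ** 2 + sigma ** 2) for r in groups]

# Synthetic total residuals: 200 events with 10 records each (illustration only).
rng = random.Random(1)
true_tau, true_sigma = 0.5, 0.7
groups = []
for _ in range(200):
    eta = rng.gauss(0.0, true_tau)
    groups.append([eta + rng.gauss(0.0, true_sigma) for _ in range(10)])

grid = [0.05 * k for k in range(1, 31)]
sigma_hat, tau_hat = fit_variances(groups, grid)
eta_hat = inter_event_terms(groups, sigma_hat, tau_hat)
intra = [[v - e for v in r] for r, e in zip(groups, eta_hat)]
print("sigma:", round(sigma_hat, 2), "tau:", round(tau_hat, 2))
```

The inter-event estimate shrinks the mean residual of an event toward zero, and the shrinkage is stronger for events with few records, which is the usual behaviour of a random-effects estimator.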
Here, the artificial bee colony (ABC) [110] and genetic algorithm (GA) [111] are used in the ANN and XGBoost approaches, respectively, to maximise the log-likelihood function (i.e., to minimise − ln L) of Eq. (A-10). ABC and GA are numerical approaches for finding the optimal configuration that minimises the objective function of interest. The intelligent foraging behaviour of honeybees and the mechanics of genetics and natural selection form the foundations of the ABC and GA algorithms, respectively. These metaheuristic algorithms utilise iterative search techniques for solving functions of a complex nature, as they do not require knowledge of the derivatives. Despite being computationally simple, the algorithms are powerful tools for optimisation problems.

Fig. 7. The structure of the extreme gradient boosting (XGBoost) approach.

Table 2
Hyperparameters of the extreme gradient boosting (XGBoost) approach and their search spaces.
The ABC algorithm has three setting parameters, making it more flexible than most other well-known algorithms. The algorithm mimics the behaviour of three types of bees in a colony, namely employed bees, onlooker bees, and scout bees. Initially, the artificial bees in the hive are divided into two groups: employed and onlooker bees. Each employed bee flies to a specific food source, randomly searches the neighbourhood of that source, evaluates the quality of the nectar, and shares the information with the onlooker bees in the hive. It is noteworthy that each food source is a candidate solution of the problem. In the first step, a random population of artificial bees is generated as: where SN is the number of food sources, equal to the number of employed or onlooker bees. The choice of the first setting parameter of the ABC algorithm, SN, depends on the complexity of the problem. Φ i is a vector including the variances σ 2 and τ 2 . Thus, Φ i can be defined as follows: Considering the random generation of the initial population of the artificial bee colony in the first phase of the optimisation process, σ 2 i and τ 2 i are taken as random numbers in the range [0, 1]. In this phase, each employed bee searches around its assigned food source by the following equation: where Φ NEW i is the new value found for Φ i , k is a randomly chosen index different from i, and X is a uniformly distributed random variable in [−1, 1] (X ∼ U[−1, 1]). When the employed bees return to the hive, the quality of each food source is evaluated by the fitness value as follows: where F(Φ i ) denotes the value of the objective function for the i th food source. When the artificial employed bees share the information in the hive, the onlooker bees select a food source based on the probability of the i th food source, given by the following equation: In the second phase, the onlooker bees search the neighbourhood of the selected food source using Eq. (13). If the quality of a food source cannot be enhanced after a predetermined number of searches (limit value; the second setting parameter), the food source is abandoned. Thus, in the third phase, the employed bees that could not find a better solution become scout bees and randomly search the solution space in the range [0, 1]. This process is terminated when the number of iterations exceeds a predefined maximum cycle number (MCN), the third setting parameter of the ABC algorithm. By minimising the objective function, the unknown parameters of the optimisation problem are obtained accordingly.

The values considered for the setting parameters of the ABC algorithm were 10, 50 and 100 for SN, limit, and MCN, respectively. These values were found to be sufficient for determining the unknown parameters of the problem. The number of food sources (i.e., SN) can be increased, but at the cost of more computational effort. An in-depth study of the determination of the setting parameters is out of the scope of this study.

GA, on the other hand, is a class of optimisation algorithms inspired by the process of natural selection in biological evolution. It is particularly useful for solving complex optimisation problems that involve a large search space and numerous constraints. The GA process involves modelling the desired solution as a set of parameters, which are then represented by a chromosome. The chromosomes are combined and mutated to generate new solutions, which are subsequently evaluated for fitness. The fittest solutions are selected and used to generate a new population of chromosomes, and the process is repeated until the desired solution is found.
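The three ABC phases described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the neighbour update, the fitness mapping and the roulette-wheel selection follow the commonly used textbook form of ABC and are assumptions here, as are the function name `abc_minimise` and the toy objective. Only the setting parameters (SN = 10, limit = 50, MCN = 100) and the search range [0, 1] are taken from the text.

```python
import random

def abc_minimise(objective, dim=2, sn=10, limit=50, mcn=100, seed=0):
    """Minimise `objective` over [0, 1]^dim with a basic artificial bee colony."""
    rng = random.Random(seed)

    def random_source():
        return [rng.random() for _ in range(dim)]

    def fitness(cost):
        # Textbook ABC fitness mapping (assumed, not taken from the paper).
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    foods = [random_source() for _ in range(sn)]   # one food source per employed bee
    costs = [objective(f) for f in foods]
    trials = [0] * sn                              # unsuccessful searches per source
    best_cost = min(costs)
    best = list(foods[costs.index(best_cost)])

    def search_neighbour(i):
        # Perturb one coordinate of source i relative to a different source k.
        k = rng.choice([j for j in range(sn) if j != i])
        d = rng.randrange(dim)
        x = rng.uniform(-1.0, 1.0)
        candidate = list(foods[i])
        candidate[d] = min(1.0, max(0.0, foods[i][d] + x * (foods[i][d] - foods[k][d])))
        cost = objective(candidate)
        if cost < costs[i]:                        # greedy selection
            foods[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(mcn):
        for i in range(sn):                        # employed-bee phase
            search_neighbour(i)
        weights = [fitness(c) for c in costs]
        for _ in range(sn):                        # onlooker-bee phase
            search_neighbour(rng.choices(range(sn), weights=weights)[0])
        for i in range(sn):                        # scout-bee phase
            if trials[i] > limit:
                foods[i] = random_source()
                costs[i] = objective(foods[i])
                trials[i] = 0
        if min(costs) < best_cost:                 # memorise the best source so far
            best_cost = min(costs)
            best = list(foods[costs.index(best_cost)])
    return best, best_cost

# Toy objective with a known minimum at (0.3, 0.6); purely illustrative.
best, best_cost = abc_minimise(lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.6) ** 2)
```

In the context of this study, the objective would be the negative log-likelihood evaluated at a candidate pair of variances, so that minimising it maximises the likelihood.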
Below is a step-by-step procedure to develop the mixed effect model, considering the explanation provided regarding GA and ABC.
Step 1: First, an initial model is trained by a fixed-effect training procedure, i.e., the random effect is assumed to be equal to zero as follows:
Step 2: Residual components are computed by maximising the log-likelihood function (Eq. (A-10)) using the numerical algorithms (ABC and GA).
Step 3: Based on the estimated values of (σ, τ) and the vector of model parameters θ, the random-effect term is obtained through the following formula:
Step 4: A new model is trained using a fixed-effect training procedure for ln y ij − η i .
Step 5: Steps 2, 3, and 4 are iterated until the termination criterion is fulfilled. The adopted termination criterion is a difference of 0.5% between two successive likelihood values.
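The five steps above can be sketched in a few lines of code. The sketch below rests on several assumptions that are not taken from this paper: a least-squares line stands in for the ML learner, a coarse grid search stands in for ABC and GA, the log-likelihood is the standard random-effects form for grouped residuals (the modified function of Eq. (A-10) is not reproduced), and the random effect is computed with the usual closed-form expression η i = τ 2 Σ j r ij / (n i τ 2 + σ 2 ), where r ij are the total residuals and n i is the number of records of event i. All function names are hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Stand-in fixed-effect learner (assumption): least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return lambda xi: a + b * xi

def log_likelihood(sigma2, tau2, groups):
    """Standard random-effects log-likelihood; `groups` maps event -> total residuals."""
    ll = 0.0
    for res in groups.values():
        n, s, ss = len(res), sum(res), sum(v * v for v in res)
        ll -= 0.5 * (n * math.log(2 * math.pi) + (n - 1) * math.log(sigma2)
                     + math.log(sigma2 + n * tau2)
                     + (ss - tau2 * s * s / (sigma2 + n * tau2)) / sigma2)
    return ll

def fit_mixed(x, y, event, tol=0.005, max_iter=50):
    grid = [0.02 * k for k in range(1, 51)]        # candidate variances, 0.02 to 1.00
    eta = {e: 0.0 for e in set(event)}             # Step 1: random effect set to zero
    prev_ll = None
    for _ in range(max_iter):
        # Steps 1 and 4: fixed-effect training on ln(y) minus the random effect.
        model = fit_line(x, [yi - eta[e] for yi, e in zip(y, event)])
        groups = {}
        for xi, yi, e in zip(x, y, event):
            groups.setdefault(e, []).append(yi - model(xi))
        # Step 2: maximise the log-likelihood over (sigma^2, tau^2).
        ll, sigma2, tau2 = max((log_likelihood(s, t, groups), s, t)
                               for s in grid for t in grid)
        # Step 3: closed-form random effect per event.
        eta = {e: tau2 * sum(r) / (len(r) * tau2 + sigma2) for e, r in groups.items()}
        # Step 5: stop when successive likelihood values differ by at most 0.5%.
        if prev_ll is not None and abs(ll - prev_ll) <= tol * abs(prev_ll):
            break
        prev_ll = ll
    return model, eta, math.sqrt(sigma2), math.sqrt(tau2)
```

With synthetic grouped data (for example, 30 events of 8 records each), `fit_mixed` returns the retrained model, one random effect per event, and the estimated σ and τ. Replacing `fit_line` with an ANN or XGBoost regressor would not change the structure of the loop.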

Evaluation of the model performance indicators
In this study, the results from the n-times repeated k-fold cross-validation are averaged to evaluate the performance of the two alternative ML approaches. For this purpose, RMSE, R 2 , and r are the three statistical performance indices used. These indicators are calculated as follows: where n is the number of samples, y is the actual value, ŷ is the predicted value, and ȳ and ŷ̄ are the arithmetic means of the y and ŷ values, respectively. To evaluate the accuracy of the proposed ML models, the estimated IMs in terms of PGA, PGV, and PSA at different periods within 0.03-2 s are compared with the observed values. Among the model performance parameters, R 2 quantifies the proportion of the variance in the response variable that can be predicted from the predictor variables.
The error-related indicator (RMSE) indicates the average distance between the observed and predicted response values. Finally, r measures the degree of linearity between the predicted and observed IMs. The performance of a given GMM improves with an increase in R 2 and r and a decrease in RMSE.
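As an illustration, the three indicators can be computed as in the sketch below. It assumes the standard definitions, with R 2 taken as one minus the ratio of the residual sum of squares to the total sum of squares; the function names and the four sample values are hypothetical.

```python
import math

def rmse(y, y_hat):
    """Root-mean-square error between observed y and predicted y_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def pearson_r(y, y_hat):
    """Pearson correlation coefficient between observed and predicted values."""
    y_bar = sum(y) / len(y)
    p_bar = sum(y_hat) / len(y_hat)
    cov = sum((a - y_bar) * (b - p_bar) for a, b in zip(y, y_hat))
    var_y = sum((a - y_bar) ** 2 for a in y)
    var_p = sum((b - p_bar) ** 2 for b in y_hat)
    return cov / math.sqrt(var_y * var_p)

# Hypothetical observed and predicted ln(IM) values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = (rmse(observed, predicted), r_squared(observed, predicted), pearson_r(observed, predicted))
```

Under repeated k-fold cross-validation, these functions would be evaluated on each held-out fold and the resulting values averaged.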

An overview of the research methodology
Fig. 8 shows an overview of the procedure used for developing the ML-based GMMs in this study. In the first step, 80% of the records from different M w ranges are randomly selected for training, and the rest of the data is kept for testing. Then, the training dataset is scaled and used to train the machine for the first time (with a repeated k-fold cross-validation approach). The hyperparameters of the machines are tuned using BOA (for XGBoost) or trial and error (for ANN). Through the likelihood function proposed in Appendix A, and given the measured and predicted IMs, the intra-event and inter-event uncertainties (σ, τ) are obtained. Given the σ and τ values, the intra-event and inter-event residual terms (ε, η) are calculated. The machine is retrained on ln(IM) − η i , and a new likelihood value is obtained through the same approach. This procedure continues until the likelihood function converges to its maximum value according to the termination criterion (a difference of 0.5% between two successive likelihood values). Finally, the model performance is validated using the test dataset, which is scaled with the scaling parameters obtained from the training dataset.
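The first steps of this workflow, the magnitude-stratified 80/20 split and the reuse of training-set scaling parameters on the test set, can be sketched as follows. The magnitude bin edges, the choice of z-score standardisation, and all function names are assumptions made for illustration; the text does not specify them.

```python
import random
import statistics

def stratified_split(mw, bin_edges, train_frac=0.8, seed=0):
    """Randomly assign train_frac of the records in each magnitude bin to training."""
    rng = random.Random(seed)
    bins = {}
    for idx, m in enumerate(mw):
        b = sum(m >= edge for edge in bin_edges)   # index of the magnitude bin
        bins.setdefault(b, []).append(idx)
    train, test = [], []
    for members in bins.values():
        rng.shuffle(members)
        cut = round(train_frac * len(members))
        train += members[:cut]
        test += members[cut:]
    return sorted(train), sorted(test)

def fit_scaler(column):
    """Scaling parameters (mean, standard deviation) from the training data only."""
    return statistics.fmean(column), statistics.pstdev(column)

def scale(column, mean, std):
    return [(v - mean) / std for v in column]

# Hypothetical magnitudes spanning the range of the dataset (4.0 to about 7.6).
mw = [4.0 + 0.036 * i for i in range(100)]
train_idx, test_idx = stratified_split(mw, bin_edges=[5.0, 6.0, 7.0])
mean, std = fit_scaler([mw[i] for i in train_idx])
mw_train = scale([mw[i] for i in train_idx], mean, std)
mw_test = scale([mw[i] for i in test_idx], mean, std)   # training parameters reused
```

Scaling the test set with the training parameters keeps information from the test records out of the training stage.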

Results and discussion
This section compares the suitability of the alternative ML-based GMMs with that of conventional GMMs, including ASB14 and KAAH15. Results are first evaluated graphically and then statistically by calculating model performance indicators and inter-/intra-event residuals. Finally, the best-performing model is chosen on the basis of these findings, and the results for the proposed model are analysed further.

Evaluation of the developed ML-based GMMs (ANN versus XGBoost)
This section compares the performance of the two ML algorithms against the empirical approaches of ASB14 and KAAH15. For the sake of consistency, the empirical models are trained using the same training subset as in this study. Fig. 9 presents the distribution of the observed versus predicted values for a sample IM, herein ln(PGA), evaluated through the different algorithms for the entire dataset. In each plot, the dashed line reflects the ideal estimate (i.e., where the predicted and observed values are identical). The concentration of data along the dashed lines demonstrates the correlation between estimated and observed values. The findings show that the data points are sufficiently close to the ideal fit line for all approaches. However, the accuracy of the model changes with the approach. It is observed that XGBoost provides a better match than ANN and the two empirical methods. The results of the ANN model are more consistent with those of the two empirical models. Beyond graphical comparisons, a model is only legitimate if it provides good model performance indicators. For this purpose, the effectiveness of the two ML algorithms for constructing GMMs against the aforementioned empirical GMMs is assessed by comparing the outcomes in terms of the model performance indicators specified in Section 3.5. Fig. 10 compares the results in terms of RMSE, R 2 , and r for different IMs, including ln(PGA), ln(PGV), ln(PSA T = 0.2 s ), ln(PSA T = 0.5 s ), ln(PSA T = 1.0 s ), and ln(PSA T = 2.0 s ) from all GMMs. The RMSE from the ANN approach varies between 0.65 and 0.81, and the RMSE from the XGBoost approach ranges between 0.55 and 0.68, whereas the two empirical models give RMSE values varying approximately between 0.60 and 0.95.
According to the general hypothesis provided by Smith (1986), r values above 0.8 indicate a significant linear correlation between the estimated and observed values. As shown in Fig. 10, the r values for all models are above 0.8, meaning that the models capture the real values of the IMs in general. Nevertheless, the r values obtained from the XGBoost method are above 0.9, the highest among the models. Similarly, when the results are assessed in terms of R 2 for the investigated IMs, the mean value from the XGBoost approach is roughly 0.85, whereas the mean value from the ANN approach is around 0.80. For the empirical models ASB14 and KAAH15, this value is estimated as 0.72 and 0.70, respectively. Therefore, the XGBoost model provides the smallest errors and the highest correlation coefficients compared to the other models.
In the final phase, it is essential to examine the model's bias with respect to input variables such as M w , R JB , and V S30 . For this purpose, it is necessary to evaluate the inter-event and intra-event uncertainties, which indicate the variance of the residuals with respect to earthquakes and sites, respectively. To this end, the inter-event, intra-event and total uncertainties of the developed GMMs based on the ANN and XGBoost algorithms are calculated for PSA at all periods. For the sake of comparison, these plots are also developed for the two empirical models, ASB14 and KAAH15. Results are illustrated in Fig. 11. It is observed that, for all models and all period ranges, the inter-event uncertainty is smaller than the intra-event uncertainty. Compared to the empirical models, the ML-based GMMs have acceptable uncertainty ranges and can perform well. Among the two ML-based models, the trend in the total uncertainty for the ANN model is closer to that of the KAAH15 model at all period ranges. When the results of the two ML-based GMMs are compared, the inter-event residuals of the two models are approximately the same (except for periods greater than 0.8 s). Nevertheless, the XGBoost model results in smaller intra-event residuals than the ANN model. This observation leads to smaller values of total uncertainty at all period ranges from the XGBoost model compared to the ANN model. It is also observed that the two empirical GMMs for the database of this study provide higher uncertainty than the XGBoost algorithm.

Fig. 8. Iterative process for finding random and fixed-effect residuals.

A. Mohammadi et al.

Fig. 9. Observed (targets) versus predicted (outputs) values in terms of a selected intensity measure, ln(PGA), from different machine-learning (ANN and XGBoost) and conventional algorithms (ASB14 [6] and KAAH15 [16]).

Fig. 10. Model performance indicators in terms of root-mean-square error (RMSE), coefficient of determination (R 2 ), and Pearson correlation coefficient (r) given for selected intensity measures, including ln(PGA), ln(PGV), ln(PSA T = 0.2 s ), ln(PSA T = 0.5 s ), ln(PSA T = 1.0 s ), and ln(PSA T = 2.0 s ) from different machine-learning (ANN and XGBoost) and conventional algorithms (ASB14 [6] and KAAH15 [16]). A smaller RMSE and higher R 2 and r indicate better performance of a model.
In conclusion, the ML-based GMM developed by the XGBoost algorithm is a robust predictive model because it has a lower RMSE and uncertainty and a higher R 2 than the ANN, ASB14, and KAAH15 models. This study, therefore, proposes the XGBoost-based GMM for the Turkish dataset. The outcomes of this model are reviewed in depth from this point on.

The proposed GMM for the Turkish dataset (XGBoost)
Since XGBoost yields higher accuracy, the results are analysed further with this model. Fig. 12 presents the distribution of the inter-event residuals from the XGBoost-based GMM with respect to the source-related parameter (M w ) for different IMs. Similarly, Figs. 13 and 14 illustrate the distribution of the intra-event residuals with respect to the site-related parameters, including R JB and V S30 . The selected IMs are ln(PGA), ln(PGV), ln(PSA T = 0.2 s ), ln(PSA T = 0.5 s ), ln(PSA T = 1.0 s ), and ln(PSA T = 2.0 s ). Different IMs are considered herein to investigate the performance of the developed model across a frequency band that includes low, intermediate, and high frequencies. In these figures, the top boxplots show the frequency distribution of the records with respect to a specific explanatory variable (i.e., M w for inter-event residuals, and R JB and V S30 for intra-event residuals). The boxplots on the right-hand side of these figures present the frequency distribution of the inter-/intra-event residuals. The diamonds represent data points beyond the third quartile of the data distribution. The red lines fitted to the residuals versus the explanatory variables indicate the means of the residuals along those variables, and the shaded areas around these lines represent the 95% confidence intervals for the true mean of the residuals. The width of the confidence interval depends on the number of data points used in the analysis and narrows as more data become available. The absence of any trend in the mean of the residuals, together with tight confidence intervals, suggests a high level of confidence in the unbiasedness of the model errors across the M w , V S30 , and R JB parameters. Nevertheless, this was tested using p-values, which are evaluated at a significance level of 0.05 and presented in the subplots to facilitate the decision of accepting or rejecting the null hypothesis regarding the unbiasedness of the estimates. When the p-value of the IM is close to 1.0, it suggests that the resulting residual is less biased with respect to the input parameter. As shown in Figs. 12-14 for all IMs, the inter-event residual varies between −0.75 and 0.75 and the intra-event residual varies between −2.0 and 2.0, which is consistent with the observations of other studies [6,16]. Overall, a p-value over 0.05 for all considered IMs supports the assumption that the mean residual does not exhibit any discernible pattern, indicating that the model is free of source-related or site-related bias for all frequency bands. In Fig. 12, the confidence interval of the residuals is wider for earthquake events with M w above 6, which may be attributed to insufficient data in the large-magnitude range. Finally, a comparison of the inter-event and intra-event residuals reveals that the intra-event residuals are greater than the inter-event residuals. This observation is consistent with the outcomes of other studies [6,16].

Fig. 11. Distribution of the inter-event (τ), intra-event (σ), and total (Ø) uncertainties for pseudo-spectral acceleration with respect to the period from different machine-learning (ANN and XGBoost) and conventional algorithms (ASB14 [6] and KAAH15 [16]).
Although XGBoost is highly efficient, interpretation of its results is challenging compared to other predictive models such as ANN. There are techniques to address this problem, among which the one employed in this study is the Shapley additive explanation (SHAP) [112]. SHAP is based on game theory and interprets the outputs of any ML-based model, including XGBoost. In this technique, predictions are made with and without each of the input variables. Then, the importance of each input variable is measured by comparing these predictions. Fig. 15 presents the SHAP values for the entire database and the input features of the GMM, where the x-axis shows the SHAP value for each earthquake record. On the y-axis of these graphs, input variables are ordered from the most significant (on top) to the least effective (at the bottom). The value of the input variables (feature value) is displayed on a colour scale from lowest to highest, with blue representing the lowest values and red the highest. As seen in these plots, depending on the output of interest, M w or R JB has the highest effect on the model. For ln(PGV), ln(PSA T = 0.5s ), ln(PSA T = 1.0s ), and ln(PSA T = 2.0s ), M w has the highest effect, while for ln(PGA) and ln(PSA T = 0.2s ), R JB has the highest impact. The results of this study are consistent with a recent study by Withers et al. [113], which developed an ML-based GMM and showed that distance is the most critical factor, with decreasing importance as a function of period. Additionally, the influence of M w increases at longer periods. Finally, for all outputs, FM has a minor effect on the predictions. These findings are consistent with the known physical behaviour of ground motion records, and the results support and extend current knowledge of input parameter importance in GMMs.
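The idea of comparing predictions made with and without each input can be made concrete with an exact Shapley computation on a toy model. The sketch below is not the SHAP library and not the GMM of this study: the function `toy_ln_pga`, its coefficients, and the baseline record are hypothetical, and an "absent" input is simply replaced by its baseline value. Practical SHAP implementations use faster approximations and average over background data.

```python
from itertools import combinations
from math import factorial, log

def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at x, relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Inputs in `subset` take their actual value, the others the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                with_i = value(set(s) | {i})
                without_i = value(set(s))
                phi[i] += weight * (with_i - without_i)
    return phi

def toy_ln_pga(z):
    """Hypothetical predictor with inputs [Mw, RJB in km, fault-mechanism flag]."""
    mw, rjb, fm = z
    return 1.2 * mw - 1.5 * log(rjb + 10.0) + 0.1 * fm

record = [7.0, 20.0, 1.0]      # hypothetical record of interest
baseline = [5.0, 50.0, 0.0]    # hypothetical reference record
phi = shapley_values(toy_ln_pga, record, baseline)
```

The contributions in `phi` add up to the difference between the prediction for the record and the prediction for the baseline, which is the additivity property that gives the method its name.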
The proposed GMM is evaluated further to determine whether it can represent physics-based phenomena regarding the behaviour of real earthquakes. For this purpose, the results for various magnitude and distance combinations using V S30 = 760 m/s and an FM of SS are compared. Fig. 16 displays the estimated PGA, PGV, PSA T = 0.2 s , PSA T = 0.5 s , PSA T = 1.0 s , and PSA T = 2.0 s from the XGBoost-based GMM for M w values between 4.0 and 7.6. This effect is evaluated for four R JB values, namely 15, 50, 75, and 150 km. An increase in M w and a decrease in R JB lead to a rise in the PGA, PGV, and PSA levels at all period ranges. In addition, the trend of the GMM is examined against the change in R JB for different moment magnitudes (4.5, 5.5, 6.5, and 7.5) using values of R JB between 0 and 200 km. The outcomes are plotted in Fig. 17. Results show that an increase in R JB leads to a decrease in the PGA, PGV and PSA levels at all period ranges, indicating that the suggested GMM effectively captures the distance-dependent attenuation. Consistent with the former observation, an increase in the magnitude results in an increase in the ground motion amplitudes. It is also observed that, as the distance between the source and site decreases, the difference between the ground motion amplitudes for magnitudes of 6.5 and 7.5 narrows. This phenomenon is a well-known characteristic of earthquake ground motions in shallow crustal earthquakes in interplate tectonic regions. At a given distance from the source, the amplitude of ground motion depends on both the earthquake's magnitude and the distance from the source, leading to a narrowing effect between the ground motion amplitudes of earthquakes of different magnitudes at shorter distances. This narrowing effect is due to non-self-similar ground motion scaling and a magnitude-distance dependent saturation of earthquake ground motion amplitudes at larger magnitudes [114]. The good agreement between the ground motion model developed in this study and the observed behaviour of real earthquakes confirms the validity of the proposed approach, which incorporates this physical phenomenon, and supports its use for seismic hazard assessment in the study region.
This study also examined radiation damping in ground motion records and its implications for the developed GMM. To investigate this phenomenon, the variation of the PSA with respect to R JB is examined for the SS fault mechanism, V S30 = 760 m/s and two different moment magnitudes (M w = 4.5 and M w = 7.5). Results are plotted in Fig. 18a. The analysis revealed that the peak value of PSA decreases and shifts towards longer periods as the distance increases, which agrees with the physical properties of the distance-dependent damping of ground motions observed in previous studies [30,115]. This behaviour is attributed to the attenuation of seismic energy as it propagates away from the source due to the dissipative properties of the earth's crust. Additionally, it is verified that the developed GMM captures the radiation-damping characteristics of ground motion records, indicating the model's efficacy. Consistent with earthquake physics, the event magnitude affects how far the peak shifts. Finally, the efficiency of the developed GMM is investigated for soil classes C and D, which are the predominant soil types in the region, according to the NEHRP soil classification [90]. For this purpose, representative V S30 values of 560 m/s and 300 m/s are used for soil types C and D, respectively. All estimations are carried out for the SS fault mechanism using R JB = 30 km and for two different moment magnitudes (M w = 4.5 and M w = 7.5). Results are plotted in Fig. 18b. It is evident that as the soil type shifts from stiffer soil to softer soil (i.e., type C to type D), the ground motion amplitudes increase (particularly for longer periods), and the peak of the spectra moves towards longer periods. Results also demonstrate that the earthquake magnitude affects how far the peak shifts, which is consistent with the physics of earthquakes.
Overall, the interpretation of the results reveals that the proposed XGBoost-based GMM can capture the behaviour of empirical GMMs with minimal seismological data and without the necessity for nonlinear regression with multiple coefficients. The proposed model features a predefined closed-form function to estimate PGA, PGV and PSA for the Turkish dataset and is accessible to users without requiring many computations (Appendix B). Finally, the proposed XGBoost-based GMM could be implemented in future studies for larger and more homogeneous datasets to improve its accuracy and minimise limitations, particularly for large-magnitude events and closer distances, by either using real worldwide datasets or combining real with region-specific simulated ground motions.

Summary and conclusions
This study investigates the efficiency of two alternative ML algorithms, ANN and XGBoost, for predicting peak ground motion parameters and spectral ordinates. The comparison includes PGA, PGV, and the PSA for 5% damping at 14 periods within the range of 0.03-2.0 s. Turkey is used as a case study, and the dataset consists of 1166 ground motions with an M w range of 4.0-7.6 and R JB of 0-200 km observed during 383 seismic events since 1976. The stations feature V S30 ranging from 131 to 1380 m/s. To optimise the hyperparameters of the ML models, the Bayesian optimisation and trial-and-error procedures are used for the XGBoost and ANN approaches, respectively, and the most effective hyperparameters are determined. To determine if the model is biased toward any predictor and to reduce the aleatory uncertainty [80], the ML algorithms of this study are modified by dividing the uncertainty into inter-event (between-event) and intra-event (within-event) terms. For this purpose, the method proposed by Abrahamson and Youngs [75] is implemented using a modified version of the likelihood function originally proposed. Next, the performance of the ML algorithms is determined using a set of model performance indicators, including R 2 , RMSE, and r. The developed models are compared to alternative empirical attenuation models existing in the literature utilising the same database.
Interpretation of the results of this study reveals that developing nonparametric GMMs with modern ML techniques yields better results than conventional GMMs. Among the two ML algorithms, the XGBoost model is chosen as the best approach since it provides the minimum error and maximum correlation for peak ground motion parameters and all spectral ordinates. Consistent with conventional GMMs, residual analysis generates acceptable uncertainty for all spectral values. The residuals are further evaluated regarding the inter-event and intra-event uncertainties with respect to explanatory factors. For this purpose, the inter-event residual is examined relative to the magnitude, whilst the intra-event residual is investigated against the soil and distance information of the dataset. Overall, the inter-event uncertainty for all spectral values is less than the intra-event uncertainty. It is also demonstrated that the inter-event and intra-event residuals contain no substantial bias with respect to the input variables, indicating that the constructed GMMs, in general, adequately describe the overall behaviour of the ground motion dataset.
The proposed XGBoost-based GMM accurately captures the physical properties of ground motion records, including distance-, magnitude-, soil-, and radiation-damping effects. The results also reveal a narrowing effect between ground motion amplitudes of earthquakes of large magnitudes at shorter distances, which is consistent with earthquake physics of shallow interplate tectonic regions. The good agreement between the developed ground motion model and the observed behaviour of real earthquakes in the region confirms the validity and effectiveness of the proposed approach for seismic hazard assessment in the study region.
Finally, the results of this study demonstrate that the proposed XGBoost-based GMM can capture the behaviour of empirical GMMs using minimal seismological data and without a need for nonlinear regression with numerous coefficients. This research introduces a novel nonparametric local GMM for the Turkish dataset by designing and implementing a web-based application platform for end users (Appendix B). The proposed model might be employed for other regions. Still, it is recommended to consider the range of the seismological parameters of the original dataset by accounting for the uncertainties involved. To increase the accuracy and reduce the limitations of the proposed model, particularly for large-magnitude events and closer distances, the suggested ML-based GMM could be further studied in future research for other tectonic zones with larger and more homogeneous real datasets. Other limitations of this study are the lack of consideration of spatial correlation and of additional input parameters to capture near-field effects in the ground motion records. To improve the model, future studies could address these behaviours. This could also be achieved by combining real catalogues with region-specific simulated ground motions for regions with limited datasets of large-magnitude near-field records.

Funding
This work has received funding from multiple sources. This work was partly financed by FCT/MCTES through national funds (PIDDAC) under the R&D Unit Institute for Sustainability and Innovation in Structural Engineering (ISISE), under reference UIDB/04029/2020, and under the Associate Laboratory Advanced Production and Intelligent Systems ARISE under reference LA/P/0112/2020. Additionally, the research was partly funded by the STAND4HERITAGE project, which received financial support from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program, Grant agreement No. 833123, as an Advanced Grant. The first author also acknowledges the support of national funds through FCT, under grant agreement 2020.08876.BD.

including PGA, PGV, and 5% damped elastic PSA at various periods (T = 0.03-2 s) are estimated. The IMs of each record are computed by means of the open-source toolbox introduced by Ozsarac et al. [93]. The normalisation approach enabled a fair comparison between ground motions of varying magnitudes and facilitated the identification of differences in ground motion amplification across spectral ordinates. The PSA of all databases under consideration, normalised by their PGA values, is shown in Fig. 5. The median PSA for these records and one standard deviation above the median are also shown to illustrate the range of possible PSA values.

Fig. 2. Histograms of seismological features of the Turkish ground motion records.

Fig. 3. Distribution of earthquakes with respect to the focal mechanism.

Fig. 4. Magnitude-distance (M w -R JB ) distribution of the dataset with respect to the focal mechanism (strike-slip, SS; normal, N; and reverse, R) and soil class (Types B, C, D, and E) according to NEHRP guidelines [90].

Step 1: Defining the objective function by setting hyperparameters of the selected machine
Step 2: Constructing a surrogate probability model of the objective function
Step 3: Finding the best-performing set of hyperparameters for the surrogate probability model
Step 4: Employing the hyperparameters of Step 3 in the real objective function
Step 5: Rebuilding the surrogate probability model by incorporating the new results

Fig. 12. Distribution of the inter-event residuals (η i ) with respect to magnitude (M w ) for selected intensity measures, including ln(PGA), ln(PGV), ln(PSA T = 0.2 s ), ln(PSA T = 0.5 s ), ln(PSA T = 1.0 s ), and ln(PSA T = 2.0 s ) from the XGBoost-based ground motion model. The top boxplot shows the distribution of M w , while the boxplot on the right-hand side presents the inter-event residual. The diamonds represent data points beyond the third quartile of the data distribution.

Fig. 13. Distribution of the intra-event residuals (ε ij ) with respect to distance (R JB ) for selected intensity measures, including ln(PGA), ln(PGV), ln(PSA T = 0.2 s ), ln(PSA T = 0.5 s ), ln(PSA T = 1.0 s ), and ln(PSA T = 2.0 s ) from the XGBoost-based ground motion model. The top boxplot shows the distribution of R JB , while the boxplot on the right-hand side presents the intra-event residual. The diamonds represent data points beyond the third quartile of the data distribution.

Fig. 14. Distribution of the intra-event residuals (ε ij ) with respect to shear wave velocity (V S30 ) for selected intensity measures, including ln(PGA), ln(PGV), ln(PSA T = 0.2 s ), ln(PSA T = 0.5 s ), ln(PSA T = 1.0 s ), and ln(PSA T = 2.0 s ) from the XGBoost-based ground motion model. The top boxplot shows the distribution of V S30 , while the boxplot on the right-hand side presents the intra-event residual. The diamonds represent data points beyond the third quartile of the data distribution.

Table 1
Hyperparameters of the artificial neural network (ANN) approach and their values or types.