Assessment of ground motion amplitude scaling using interpretable Gaussian process regression: Application to steel moment frames

The process of ground motion selection and scaling is an integral part of hazard‐ and risk‐consistent seismic demand analysis of structures. Due to the lack of ground motion records that naturally possess high amplitude and intensity, the research community generally relies on scaling the records to match a target hazard intensity level. The scaling factors used are frequently as high as 10. Due to the criticism received in previous research studies, the extent of amplitude scaling and its process has become a matter of debate, and various constraints on the scaling factors have been proposed. The primary argument against unrestricted amplitude scaling is the unrealistic nature of the scaled records and the possible biases caused in the engineering demand parameters (EDPs) of structures. This study presents a framework to utilize machine‐learning and statistical techniques for the assessment of ground motion amplitude scaling for nonlinear time‐history analysis (NTHA) of structures. The framework utilizes Bayesian non‐parametric Gaussian process regressions (GPRs) as surrogate models to obtain statistical estimates of EDPs for scaled and unscaled ground motions. The GPR surrogate models are developed based on a large‐scale analysis of five steel moment frames (SMFs) using 200 unscaled as‐recorded ground motions for ten spectral acceleration levels, ( Sa(T1)${S_a}( {{T_1}} )$ ) (ranging from 0 g to maximum considered earthquake, MCE) and 2500 scaled ground motions representing 50 scale factors ( SF$SF$ ), and the 10 Sa(T1)${S_a}( {{T_1}} )$ levels for each SMF. For each building, two types of EDPs are considered: i) peak inter‐story drift ratio (PIDR) and ii) peak floor acceleration (PFA). To provide a better interpretation of the GPR surrogate models, the concept of explainable artificial intelligence (i.e., Shapley additive explanation, SHAP) is used to obtain insights into the decision‐making process of the GPR models with respect to the SF$SF$ and Sa(T1)${S_a}( {{T_1}} )$ . Then, for the 10 Sa(T1)${S_a}( {{T_1}} )$ levels, the GPR‐based EDP estimates under scaled ground motions corresponding to 50 different SFs are compared with the EDP estimates of unscaled ground motions. The comparison is conducted using Kolmogorov–Smirnov (KS) statistical hypothesis test. Results indicate that the range of allowable SF$SF$ s depends on two factors: i) intensity level (characterized by Sa(T1)${S_a}( {{T_1}} )$ ), and ii) the dynamic properties of the building. In general, it is noticed that allowable SF$SF$ s range between 0.5 and 3.0 for PIDRs, and from 0.6 to 2.0 for PFAs. Finally, the EDP between the unscaled and scaled ground motions are adhered to various discrepancies observed in different intensity measures representing amplitude‐, duration‐, energy‐, and frequency‐ content of the two sets of ground motions.


INTRODUCTION
The advances in computational capacities have driven the use of nonlinear time history analysis (NTHA) to assess the performance of buildings under the framework of performance-based earthquake engineering (PBEE). In this context, an essential step is the selection and scaling process of ground motions which are used for the NTHA of the building structures. The method of ground motion selection and scaling forms a link between the fields of seismology and earthquake engineering. Considering the limited number of strong recorded ground motions available in current ground motion databases and hence the inability to represent the typical hazard levels for performance-based analysis (e.g., design basis earthquake DE, maximum considered earthquake MCE, or collapse), it is a common practice to select moderate-to-low amplitude ground motions and scale them to reach higher spectral amplitudes that represent the required hazard levels. Thus, code provisions such as the American Society of Civil Engineers (ASCE 7-16) 1 and performance-based guidelines such as the Los Angeles Tall Buildings Structural Design Council (LATBSDC), 2 among others, have included procedures to select and scale recorded ground motions for conducting NTHA.
For decades, several researchers (specially seismologists) have criticized the validity of scaling ground motion records to achieve the desired intensity level. 3,4 The primary concerns in this regard are the differences in the frequency content and duration of ground motions recorded from different magnitudes and distances. It is well-known that for a given rupture distance ( ) and a class site, the spectral acceleration, on average, tends to increase with an increase in earthquake moment magnitude ( ). On the other hand, for a given magnitude and site class, an increase in the source-to-distance (such as ) leads to a decrease in spectral acceleration amplitude (attenuation). Thus, amplitude scaling can be interpreted as an artificial increment or decrement of or , and thereby can lead to inconsistencies in the correlations between intensity measures (IMs) and corresponding or of the ground motions. 4 Due to this, studies such as Bommer and Acevedo 5 have postulated that strongly influences the frequency content and strong motion duration and must be considered as a parameter in the ground motion selection process. They proposed limiting of the selected ground motion records to be within ±0.20 units of the disaggregated for the desired hazard level. As a remedial measure, many studies have proposed to set limits on the scaling factors that can be used in the ground motion selection and scaling process. For example, Krinitzsky and Chang 6 proposed a scaling factor limit of 4 without exhaustive scientific support for this criterion. Later on, Cornell and his group 7,8 studied the nonlinear response of a multiple-degree-of-freedom (MDOF) system subjected to ground motion records grouped into four different bins in terms of and to explore alternative criteria for the record selection and scaling. These scenarios included unscaled and scaled ground motion records. The study concluded that the bias from scaled records (in terms of the MDOF's response) is minimized when the records are scaled to the median spectral acceleration at the structural first period ( ( 1 )) of the corresponding and bin. These results were further confirmed by Iervolino and Cornell,9 where the study concluded that the nonlinear response of structures is not affected by conducting record selection using and scenarios, by scaling records for sites with no directivity effects and for records without systematic peak-valley effects.
Subsequent studies on the effect of amplitude scaling focus on estimates of the potential bias introduced by scaling ground motion records on the seismic performance of structures. For instance, Luco and Bazzurro 10 quantified the bias introduced in the nonlinear structural drift response of single-degrees-of-freedom (SDOFs) and MDOF systems by scaling records randomly selected from a − range to a target intensity level. The scaled ground motions were then used NOVELTY This article is a step towards utilization of artificial intelligence and machine learning concepts to solve earthquake engineering problems such as ground motion selection and scaling. In particular, this article utilizes Gaussian process regression (GPR) to carefully develop a data-driven surrogate model for statistical comparison of structural demands for steel moment frames obtained under scaled and unscaled ground motion records. This is one of the first studies that clearly provides the range of allowable scale factors for ground motion scaling different intensity levels and dynamic properties of the building structures. Furthermore, to provide a better interpretation of the GPR surrogate models, the concept of explainable artificial intelligence (i.e., Shapley additive explanation, SHAP) is used to acquire insights into the decision-making process of the GPR models. Hence, this study also aims to implement solutions to reduce the black-box nature of machine learning models and provide engineering interpretation to the models. Due to its novelty, its topical subject matter, and its strong potential to have a high impact on the field of earthquake engineering, we believe this work will be of broad and considerable interest to the readers of EESD and its special issue on AI and Data-driven Methods in Earthquake Engineering.
for NTHA of the SDOF and MDOF systems, and the obtained responses were compared with responses from unscaled records corresponding to the specific intensity level. Thus, bias was calculated as the ratio between the median structural response of scaled records to the median structural response of unscaled records at the target ( 1 )). Their analyses indicated that the record scaling introduces a bias in the nonlinear response of the systems and their engineering demand parameters (EDPs) (i.e., lateral drifts). Specifically, this bias was found to depend on the scaling factor, the fundamental period of the structure, the overall overstrength of the structure, higher mode effects, and on the − range of the records scaled. The computed bias ranged from 15% to 60% for SDOF systems (with different periods) for scaling factors from 2 to 10, respectively, while for MDOF systems, the bias spanned from 25% to 80% for scaling factors from 2 to 10, respectively. The study concluded that this bias is due to the differences in the response spectra of the scaled and unscaled records. Similar observations were confirmed by Backer and Cornell, 11 who proposed to select ground motions based on the parameter ε, which is considered a measure of spectral shape.
More recently, Zacharenaki et al., 12 Du et al., 13 and Davalos and Miranda 3,4 studied the influence of amplitude scaling of ground motions on the probability of collapse of SDOF and MDOF systems with different fundamental periods. Specifically, Davalos and Miranda 3 investigated the impact of using spectral shape for the ground motion selection process on the assessment of bias in the seismic response of several SDOF and MDOFs systems, including 2-and 4-story reinforced concrete moment frames. This study compared the median structural response and collapse probabilities using scaled and unscaled records. It indicated that the bias in drifts (overestimation) is observed even though the ground motions are selected and scaled considering the mean and variance of spectral shape. The study concluded that the bias occurs due to the presence of a large number of pulses with large incremental velocities in the scaled ground motions compared to the unscaled records. Thus, the differences in energy distribution, frequency, and duration of the scaled and unscaled ground motions for the same intensity level can be adhered to explain the statistical mismatch in the corresponding EDPs.
On the other hand, Du et al. 13 investigated the effect of scaling factor limits on ground motions selected by the conditional spectra criterion 14 for four different hazard scenarios. The authors studied the effect of scaling on the ground motions characteristics, conducted NTHA on SDOF and MDOF systems, and recommended a scaling limit of 3 to 5 for general use when ground motions are selected from the Next Generation Attenuation West 2 (NGA-West 2) database. 15 Modern studies on the topic have included artificial intelligence techniques in their analyses. For instance, Fayaz and Zareian 16 presented an algorithm to simulate site-based ground motions that reach a target IM between a range of periods. This algorithm used predictive relations between the model parameters and the spectral acceleration spectrum of the site-based ground motions and was used to generate ground motions for the NTHA of a bridge structure. The response of this bridge was compared to three sets of ground motions, including unscaled, scaled, and simulated ground motions. Results from this study indicate that the set of scaled ground motions led to significant bias in the EDP estimates compared to the unscaled and simulated sets. Besides these studies, Tsalouchidis and Adam (2022) 17 investigated the impact of amplitude scaling on the structural response of buildings. For this purpose, ten steel moment frames (SMFs) were subjected to NTHA. They studied the potential biases introduced in the peak inter-story drift ratio (PIDR) due to the scaling of ground motions. The study utilized cloud analysis and generalized linear regression models to deduce any changes in the PIDR trends caused by the scaling of spectrally compatible and scenario-compatible ground motions. The comparisons were made for scaling factors ranging from 2 to 20 and were primarily conducted by checking the ratios between scaled and unscaled EDPs. They concluded that the scaling process doesn't cause a major bias in the EDPs however recommended caution while using large scaling factors. Although this finding is consistent with some earlier studies on the topic (e.g., 9,18 ), many modern studies (e.g., 3,4,10 ) have indicated that amplitude scaling can lead to major biases even if ground motions are selected with proper consideration of the spectral shape. One of the major limitations of the Tsalouchidis and Adam (2022) 17 study relies in their ground motion selection criteria, where it is evident that the ground motions with smaller IMs are over-represented in the dataset of cloud analysis which can cause biases in the regression models and hence in the conclusions.
In summary, various research studies (e.g., 3,4,[10][11][12][13] ) have indicated that the scaling of the ground motions for NTHA leads to bias in the EDPs of structural systems. Most studies agree that the bias is not statistically significant for smaller scaling factors. However, there is no clear scientific rationale for what scaling factors are acceptable and do not lead to substantial biases in the structural demands. Also, ground motion scaling for hazard levels of engineering interest such as DE or MCE is barely studied. Furthermore, it is evident from the previous studies that allowable scaling factors may also vary based on the target intensity level. Moreover, most of these studies were conducted on non-degrading systems (e.g., 7 ) or did not include structures of different configurations, limiting the generalization of their results. In addition, the selected ground motions for their analyses do not provide a vast range of intensity levels or focus on a specific limit state such as collapse level (e.g., 3,4 ). No proper alternatives were explored to overcome the limitations of high intensity ground motion records. Furthermore, only traditional statistical tools were used to estimate bias (e.g., comparing the mean drift response from scaled or unscaled records) and no information was provided about their uncertainty distributions.
Against this backdrop, this study proposes a comprehensive framework based on principles of machine learning and statistical surrogacy to assess the impact of ground motion amplitude scaling on NTHA of structural systems. In particular, the paper presents a sophisticated statistical investigation using Gaussian process regression (GPR) to provide insights concerning the bias introduced by scaling ground motion records for NTHA of steel moment frames. Specifically, the objectives of this paper are threefold: a) investigate potential bias in EDPs caused by scaling of ground motions using machine-learning-based surrogate models; b) understand and interpret the decision-making process of developed surrogate-models using the concepts of explainable artificial intelligence (XAI); c) provide recommendations on the use of scale limits based on the target intensity level of ground motions. For this study, a series of NTHA in conducted on five archetype SMFs (i.e., 2-,4-,8-,12-, and 20-story) to obtain two EDPs: i) peak inter-story drift ratio (PIDR); and ii) peak floor acceleration (PFA). A total of 2500 scaled and 200 unscaled ground motion records are selected from the NGAWest2 database 15 to match ten intensity levels ranging from 0 g to MCE level for each SMF. The results of the analyses are then used to develop GPR-based surrogate models for scaled and unscaled demands, which are then interpreted and explained using Shapley additive explanation (SHAP) to provide insights. The surrogate models are then used to sample the EDP posterior distributions for different intensity levels and scaling factors ( ). The distributions are utilized and compared to understand the joint impact of and ( 1 ) on the EDPs; the comparison is conducted through hypothesis testing of the scaled and unscaled EDP distributions to provide recommendations on the record scaling process. Finally, the observed bias in the EDPs under scaled ground motions as compared to the unscaled ground motions is briefly investigated by analyzing the differences in the intensity measures representing amplitude-, duration-, energy-, and frequency-content of the two sets of ground motions.

Assessment framework
The framework is based on a selection of ground motion records from an extensive ground motion database (such as the NGAWest2 database 15 ) and relies on large-scale NTHA of the selected structural application. The general scientific background of this investigation is based on a ten-stepped procedure described below (the text provided in parenthesis for the steps explains the corresponding items used in this study).
1. Select a ground motion IM (e.g., ( 1 )) and corresponding target IM level (e.g., MCE level) based on the structural application (e.g., steel SMFs) and hazard level of interest (e.g., MCE level). 2. Obtain stripes of IMs between the minimum IM level of interest and the target IM level by dividing the range into groups (e.g., = 10 equal stripes between 0 g and MCE level).
3. For each of the IM stripes, from the ground motion database, obtain number of unscaled ground motions possessing IM closest to the IM stripe (e.g., = 20 is used to select 20 unscaled ground motions). Hence a total of × unscaled ground motions are selected (e.g., 10 × 20 = 200 unscaled ground motions are selected for each SMF). 4. For each of the IM stripes, compute the average IM of the selected unscaled ground motions. 5. Select values of s to obtain scaled ground motions for the computed average IM of each IM stripe (e.g., = 50 is used with 25 divisive s (< 1) and 25 multiplicative s (> 1)). 6. Obtain scaled ground motions (e.g., = 5 is used to select 5 scaled ground motions) for each of the values of s by selecting the ground motions that require th to attain the average IM of the unscaled ground motions. Hence a total of × × scaled ground motions are obtained (e.g., 10 × 5 × 50 = 2500 scaled ground motions are selected for each SMF). 7. Utilize the selected scaled and unscaled ground motions to conduct NTHA of the selected structure (e.g., steel SMF) and obtain the EDPs of interest (e.g., PIDRs and PFAs). Hence a total of × × scaled + × unscaled EDPs are finally obtained for the selected structure. 8. Due to the limitation of recorded ground motions, the EDP grid constructed using × scaled and unscaled ground motions (unscaled represents = 1) are likely to be sparse and would not adequately cover all ranges of IMs and s. Hence utilize appropriate machine-learning/statistical methods to develop a comprehensive surrogate model which can be used to complete the dataset. The model needs to be trained using the IMs and s as inputs and should be able to provide proper statistical estimates of corresponding EDPs (e.g., Bayesian non-parametric GPR models are trained using the scaled and unscaled datasets). 9. For a particular IM level, use the surrogate model to obtain the estimate for the unscaled EDPs for = 1 and scaled EDP for the different s; the estimates can be point estimates of a statistical moment or complete statistical distribution (e.g., complete posterior distributions of EDPs are obtained using the GPR models). 10. Use an appropriate hypothesis test to statistically compare the similarity of one estimate of unscaled EDP against scaled EDP estimates corresponding to s (e.g., KS test is used in this study). Finally obtain the s whose corresponding scaled EDP estimates are deemed statistically similar to the unscaled EDP estimate, thereby showing no bias. This process should be repeated for other IM levels of interest to obtain the corresponding s leading to no significant bias in EDPs. The results can then be used to provide recommendations for the allowable ranges for different IM levels and EDPs. While the proposed framework is general and can be used for any type of structure and ground motions, in this study, the process is applied to steel SMFs and crustal ground motions, and the results are discussed in the subsequent sections of the paper. The sections: "Structural Models" provide details of the five SMFs utilized in this study; "Ground Motion Selection" discusses the selection and scaling process of ground motions (i.e., steps 1 to 6); "Seismic Demand Surrogate Model using GPR" entails the description of NTHA and development of GPR-based surrogate modelling (i.e., steps 7 and 8); "Interpretability of the developed GPR models" provides supplementary interpretability of the developed GPRs for engineering analysis; "Hypothesis Testing and Results" discusses the results of KS hypothesis test and provides recommendations for allowable scaling limits (i.e., steps 9 and 10); and finally "Conclusions" section provides the final analysis of the framework within the setting of steel SMF structures.

Structural models
For this study, five (i.e., 2-, 4-, 8-, 12-and 20-story) different archetype SMFs are selected as the prototype structures for analysis. The five SMFs mathematical models vary only in their height since this parameter is a quite important characteristic of buildings, as pointed out in several parametric studies conducted in the past (e.g., [19][20][21][22]  are applied to all floors. In addition, a perimeter load of 1.20 kN/m 2 is applied to simulate the cladding. Figure 1 illustrates the elevation and the plan view of the selected archetype frames used in this investigation. As per Figure 1, the SMFs share common characteristics, such as the plan view. The SMFs are located at the perimeter of the building plan and consist of three-bay frames with a bay width of 6 m. The first story's height is 4.5 m, while the typical height of the rest of the stories is 3.90 m. The beam-column connections are detailed as reduced beam section (RBS) connections as per the American Institute of Steel Construction (AISC) AISC-341-16 23 and AISC-358-16. 24 Table 1  The SMFs are modeled as plane frames (2D) using the software OpenSEES for all the NTHA since this platform has been extensively verified 26 for nonlinear analysis. Beam and column elements of the SMFs are modeled as linear elastic elements with rotational springs located at their ends. The nonlinear rotational springs aim to capture the hysteretic behavior of the beam-column connections and are represented by Ibarra-Medina-Krawinkler (IMK) bilinear model. 27 This phenomenological model consists of a trilinear backbone curve, rules to predefine the hysteretic behavior, and rules to capture cyclic strength and stiffness deterioration. These springs' properties are computed per the guidelines for nonlinear structural analysis for the design of buildings provided by the National Institute of Standards and Technology (NIST GCR 17-917-46v2). 28 However, these phenomenological springs cannot capture the moment-axial load interaction effect in the columns of the SMFs. Thus, this phenomenon is tackled in an approximate manner. First, based on the recommendations given by NIST GCR 17-917-46v2 [28], the gravity loads are assumed as an average value of the axial load obtained during the NTHA of the frames. Next, the bending strength of the columns is reduced using the AISC 318-16 29 interaction equations for combined forces. Finally, this reduced strength is used in all column springs. This procedure is appropriate since SMFs are typically drift-controlled rather than force-controlled, entailing that the SMF columns are only lightly loaded. Moreover, experimental studies in the past (e.g., 30 ) on shake tables have shown good agreement with mathematical models calibrated with the preceding approach. Because of these reasons, several studies on SMFs 20,31,32 conducted in the past have adopted similar modeling decisions.
The panel zones are modeled as a hinge parallelogram assembly by rigid elements with a nonlinear spring in one of the corners to simulate shear distortion. This spring is defined by the yield strength, full plastic strength, and the corresponding stiffness parameters. 33 A leaning column simulates P-delta effects through large displacement geometric nonlinearity. This leaning column is loaded with gravity loads corresponding to half of the plan-building on each floor and is connected to the SMFs through rigid elements. Finally, column base connections are simulated with the mathematical models developed by Torres-Rodas et al. 34 and Torres-Rodas et al. 35 for exposed and embedded base connections, respectively. Consistent with current engineering practice, 36 the exposed base plate detail is used for the low-rise buildings (i.e., 2-, 4story), while for the taller frames (i.e., 8-,12-and 20-story), the embedded base connection detail is preferred.

Ground motion selection
Proper ground motion record selection is essential to understand the joint impact of and ( 1 ) on the response of steel structures. The study uses a large subset from the NGAWest2 database 15 consisting of ∼7000 mainshock records  ( 1 ) is obtained from the uniform hazard spectrum (UHS) of the site using the unified hazard tool (UHT) 38 (corresponding to step 1). The ( 1 ) levels are then uniformly divided into 10 equal stripes from 0 g to the obtained MCE ( 1 ) level (corresponding to step 2). For each stripe of ( 1 ), 20 unscaled ground motion records that closely match (without any scaling) the respective ( 1 ) are selected from the ∼7000 records database. Hence a total of 200 unscaled ground motions are selected for each building for this study (corresponding to step 3). The selected ground motion spectra are shown along with the UHS of LADT in Figure 2A for the eight-story building ( 1 = 2.21 s). The ( 1 ) of the selected unscaled records are also plotted against = 1 in Figure 2B for the eight-story building ( 1 = 2.21 s). The different colors of the diamond-shaped data points represent the ground motions selected for different ( 1 ) stripe. It can be observed from the data points that as the stripe ( 1 ) increases, the variance of ( 1 ) of the selected ground motions also increases. This is due to the lack of enough ground motions in the recorded database for large ( 1 ) levels. Though the ( 1 ) distribution and accuracy of the selected 20 ground motions are not the same for higher ( 1 ) levels as compared to the lower ( 1 ), the differences in the unscaled ( 1 ) are not expected to lead to significant differences in the final analyses, as will be described in the following sections of selecting scaled ground motions and developing/utilizing GPR-based surrogate models.
For each group of the selected 20 unscaled ground motions, a corresponding set of 250 scaled ground motions is obtained. As described above, the selected unscaled ground motions do not necessarily have the same ( 1 ) as the target ( 1 ) of the different stripes. Hence, to select a statistically compatible set of scaled ground motions, ( 1 ) of the selected 20 unscaled ground motions for each stripe is averaged (shown in dashed red lines in Figure 2B) (corresponding to step 4). Then the averaged ( 1 ) is used as a target ( 1 )  of the corresponding 20 unscaled records, respectively (corresponding to step 6). This is done to reduce any additional sources of variability between the scaled and unscaled ground motions that may occur due to differences in causal parameters. The selection is made by first obtaining the required s for all the ground motions in the subgroup to achieve the target averaged ( 1 ) and then selecting the five closest to each of the 50 levels. Hence, a total of 2500 scaled ground motions are set (5 ground motions × 50 s ×10 ( 1 ) stripes) for each SMF (corresponding to step 6). The selected scaled ground motions are shown in grey data points in Figure 2B. It should be noted here that in the complete record selection process (including scaled and unscaled), once a ground motion is selected for any level of scaled or unscaled ( 1 ) and , it is removed from the database to avoid any repetitive selection of the same ground motion. This selection process is repeated for each SMF (five in total) analyzed herein. In this manner, a total of 13,500 ground motions (2500 scaled + 200 unscaled = 2700 ground motions × 5 SMFs) are selected for NTHA of the steel structures.

Seismic demand surrogate model using GPR
The selected 2700 ground motions are used to conduct NTHA of the five SMFs to obtain two peak responses: i) peak interstory drift ratio (PIDR) and ii) peak floor acceleration (PFA) (corresponding to step 7). Figure 3 illustrates a normalized histogram of the PIDR distributions for the 8-story frame. As per Figure 3, most of the PIDRs (∼ 80%) are below 0.025 (i.e., are code compliant), implying modest nonlinear behavior. Similar trends are observed in the other SMFs. The aim is to compare the 2500 scaled responses for the selected 50 scaling factors against the 200 unscaled responses. As described in the previous section, due to the lack of unscaled records for large ( 1 ) and scaled records for the combination of large ( 1 ) and < 1, the obtained EDPs data are not viable for the direct comparison. This can be thought of as a "missing data" or "kriging" problem. Hence, in this study, Bayesian non-parametric and machine-learning-based GPR 39 is deployed to develop surrogate models, which are used to complete the data for the entire selected ( 1 ) and grid (corresponding to step 8).
The basis of GPRs relies on the Gaussian process (GP), which is defined as a collection of random variables, any finite number of which have a joint (multivariate) Gaussian distribution. 40 GPR is a powerful and effective tool for nonlinear regression problems since it has a simple structure and, unlike most regression methods, provides non-uniform uncertainty estimations based on the proximity of features. 41 GPR serves as an efficient tool for performing inference both passively (i.e., interpolation, such as completing missing data, or extrapolation, such as forecasting or prediction) as well as actively (i.e., filtering and smoothing). 39 It can accurately capture a wide variety of relations between the features and targets by utilizing a theoretically infinite number of parameters and allowing the data to determine the level of complexity through Bayesian inference. 42 Furthermore, it is observed that many Bayesian regression models based on artificial neural networks converge to the Gaussian process with an infinite number of hidden units. 43 Due to the efficacy of GPRs, it has been widely used as surrogate models for various modeling purposes and applications (e.g., [44][45][46][47][48][49]. A GPR model attempts to describe the data in terms of a signal term ( ( )) and noise term ( ), as explained in Equation 1. Similar to other regression models, the noise term reflects the inherent aleatory randomness in the data and is assumed to be normally distributed with zero mean and variance ( 2 ). The signal term ( ( )) is also assumed to be a random variable following a particular statistical distribution, thereby reflecting the uncertainty of the function. In GPRs, this distribution is assumed to be a Gaussian process with a mean ( ( )) and covariance ( ( , ′ )) function, as mentioned in Equation 2. The mean function ( ( )) describes the expected value (E[.]) of the function ( ) at input as shown in Equation 3. The covariance function ( ( , ′ )) explains the dependence between the values of the function at inputs and ′ . The covariance function ( , ′ ) involves the utilization of a kernel that models the correlation between two points based on the distance between the points. This means that closer points are expected to behave more similarly than points that are further away from each other. Many available kernels are used to develop the covariance function, such as radial-basis function (RBF), white, matérn, rational quadratic, pairwise, etc. 39 Based on the trade-off between explainability, generalization (prevent overfitting), and accuracy, this study uses the summation of matérn (Equation 5) and white (Equation 6) kernels as described in Equation 7. The class of matérn kernels is a generalization of RBF kernels. It is parameterized by two factors, including i) length scale ( ), which is greater than zero, and ii) smoothness parameter ( ). The length scale parameter ( ) can either be a scalar (an isotropic variant of the kernel) or a vector with the same number of dimensions as the inputs x (an anisotropic variant of the kernel). The smaller , the less smooth the approximated function is. As → ∞, the kernel becomes equivalent to the RBF kernel. When = 1∕2, the matérn kernel becomes identical to the absolute exponential kernel. Other important intermediate values include = 1.5 (once differentiable functions) and = 2.5 (twice differentiable functions). Furthermore, (.), (.), and Γ(.) represent the Euclidean distance, modified Bessel function, and gamma function, respectively. The white kernel is mainly used as a sum kernel and explains the noise (∈) of the signal as an independent and identical normal distribution with variance 2 . Lastly, represents the scaling factor of the used kernel function ( , ′ ) (Equation 7). Due to the lognormal nature of seismic demands and IMs, the GPR models are trained in the lognormal domain. 50,51 For each SMF, two types of GPRs are trained as surrogate models using the resulting EDPs of i) 200 unscaled ground motions with ( 1 ) as the input features, and ii) 2500 scaled ground motions with ( 1 ) and s as the input features (corresponding to step 8). The two EDPs (i.e., PFA and PIDR) are independently used as the target variables for the GPRs. Hence a total of four GPRs are developed for each SMF. Since each combination of ( 1 ) and , five scaled ground motions are selected and used for NTHA, and the GPRs are developed by obtaining the mean EDPs (in the lognormal domain) and . Hence for each five scaled ground motions obtained at the intersection of ( 1 ) and , the respective mean EDPs are obtained to represent the scaled response. This is presented in Figure 4A for eight-story SMF, where the yellow data points show the 200 unscaled records against = 1, grey points show the 2500 scaled records, and red markers show the 500 mean values (i.e., one mean value for five scaled records) for the scaled records. This is done for both PFAs and PIDRs, and the obtained 500 mean values are utilized to develop GPR surrogate models for scaled responses, while the 200 unscaled values are used to develop GPR surrogate models for unscaled responses (corresponding to step 8).
The posterior mean prediction surface and curve using the scaled and unscaled GPR models, respectively, are shown in Figure 4B (other views are given in Figures 4C and D) for the eight-story building. As can be seen from the figure, the GPR models demonstrate a good prediction power for the PIDRs with a coefficient of determination ( 2 ) of 0.75 and 0.85 for unscaled and scaled responses, respectively. Similarly, 2 of 0.74 and 0.9 are observed for unscaled and scaled PFAs, respectively. Similar results are observed for other SMFs and EDPs. Hence based on the good prediction power of the developed GPRs for the dataset, they can be used as surrogate models for the data completion. It should be noted that the

F I G U R E 4 (A) True computed, (B) GPR predictions (view 1), (C) GPR predictions (view 2), (D) GPR predictions (view 3), for unscaled and mean scaled PIDR for eight-story SMF
developed GPRs are not proposed as general surrogate models but are only used as kriging models to complete the EDPs dataset in this study. It is important to point out that only non-collapse cases of analysis have been considered in this study. Thus, a PIDR limit of 0.10 has been set as a threshold between collapse or non-collapse, following notable publications on the topic (e.g., 52 ).
Several previous research studies (such 16,53-56 ) have described the importance of using IMs that explain spectral accelerations at a range of periods rather than ( 1 ). This is mainly to include the period elongation (due to non-linearity) that can occur during the time history of high-intensity ground motions and to include higher mode effects. However, such conclusions are usually observed in the cases of structures that are expected to undergo huge non-linearity during the ground motion and have a higher number of participating modes. In this study, the modal mass contributions of the first mode for the SMFs are 95.31% for two-story, 83.80 for four-story, 80.41% for eight-story, and 78.28% for 12-story, and 74.89% for 20-story (as mentioned in Table 1). Thus, higher modes' contributions are not deemed significant to the total response of the used SMFs. Furthermore, the recorded EDPs are observed to be low (code compliance), and consequently, the level of inelasticity/non-linearity occurring during the ground motion is also small. Hence, ( 1 ) is postulated to be a sufficient IM in the study. This hypothesis is tested by comparing the relative feature importance of ( 1 ) against the average spectral acceleration ( , ) and , using random forests algorithm. , is defined as the geometric mean of spectral acceleration values between periods . 1 and . 1 where and are non-negative constants ( ≤ ). 57 In this study a = 0.2 and b = 3 are used. On the other hand, , , is a metric used to quantify spectral shape, and defined as ( 1 ) normalized by , over a period range. 55 This data-driven method checks the relative importance of the candidate features for explaining a target variable. 53 To avoid any bias or collinearity issues, in this case, the algorithm is employed in two settings; ( 1 ) is compared against , and , for the target for the 2700 (scaled and unscaled) responses for all five SMFs. It can be observed from Figures 5A and B for the dataset of the eight-story SMF, ( 1 ) is observed to be more significant in describing as compared to , and , , thereby validating the hypothesis. Also, to make the conclusions simpler and interpretable, it is essential to minimize the number of features F I G U R E 5 Relative feature importance of ( 1 ) against: (A) , and (B) , , for PIDR of eight-story SMF and use scalar quantities of IMs as the inputs to the GPRs. Hence it can be concluded that for the used dataset, ( 1 ) is sufficient to explain the demands of the used SMFs. The readers can conduct similar feature analysis for their datasets and utilize the appropriate IMs (corresponding to step 1).

Interpretability of the developed GPR models
Due to the versatility of the machine-learning models, they have been widely used in engineering applications. However, due to the "black-box" nature of these models, there is a general reluctance in the research community to propose and utilize such models. Hence it is critical to provide sufficient analytics for model interpretability and its response in terms of predictions based on variability in the input features. With the onset of XAI, various algorithms have been developed that provide different methods that allow interpretability of these "black-box" models and step towards "grey-box" and even "white-box" nature (such as linear regression, decision trees, etc.). This study uses SHAP to analyze and interpret the nature of the developed GPRs. 58 SHAP is a model-agnostic procedure that provides insights to explain individual predictions of the model based on the game's theoretically optimal Shapley values. Shapley values are a widely used approach from cooperative game theory with desirable properties. 59 The feature values of a data instance act as players in a coalition. The Shapley value is the average marginal contribution of a feature value across all possible coalitions. Shapley values are the only solution that satisfies properties of efficiency, symmetry, dummy, and additivity. 58 SHAP belongs to the class of models called "additive feature attribution methods," where the explanation is expressed as a linear function of features, as described by Equation 8.
The 50 s and 10 ( 1 ) are used as the inputs (i.e., 50×10 = 500 samples) to the GPR surrogate model for scaled ground motions and the respective SHAP values for (i.e., ) and ( 1 ) (i.e., ( 1 ) ) are computed. The SHAP values for the 500 samples for of eight-story SMF are presented in Figure 6a, where the color of the data points represents the magnitude of the feature values, while low refers to values close to 0 for both s and ( 1 ), high refers to values close to 10 and 0.6 for s and ( 1 ), respectively. As described above, the sign of SHAP indicates the direction of contribution and the absolute value of SHAP indicates the magnitude of contribution. Hence negative SHAP values indicate that the corresponding feature lowers the final target prediction while positive SHAP indicates an increase in the final output of the model. It can be observed that, in general, with an increase in the value of both features, their corresponding SHAP values tend to move from a negative sense to a positive sense. The trend is more evident for s, which means for central values of the feature (i.e., = 1), the contribution tends to be close to zero. Hence this means a higher the magnitude of scaling factor (division or multiplication) and ( 1 ) leads to a higher magnitude of contribution to the output . Furthermore, it can be observed from the Figure 6A that extreme values of lead to a higher magnitude of SHAP and hence lead to more significant changes in the output as compared to ( 1 ). Also, Figure 6B presents the relative feature importance of and ( 1 ) on the GPR output, in terms of their mean absolute SHAP values (|SHAP|). This is done by computing the mean |SHAP| values for the 500 samples for both and ( 1 ) and then dividing them by the sum of the two mean |SHAP|. Since the SHAP values represent the contribution of the features to the model output, computing their relative sum for the two features in an absolute sense signifies the importance of the respective feature in predicting . It can be observed from the figure that tends to have a higher contribution in the GPR model to predict the as compared to ( 1 ). The difference in the contribution is close to 60%:40% in favor of against ( 1 ). This means the scaling processes require more careful examination in terms of both and ( 1 ). Figure 7 presents the dependence contribution plots for SHAP and SHAP ( 1 ) against and ( 1 ) and the colors provide the dimension of other features. Dependence plots allow better understanding and insights into the features' interactions and the respective SHAP values. Figure 7A shows the 500 s plotted against the corresponding SHAP while Figure 7B shows the 500 ( 1 ) plotted against the corresponding SHAP ( 1 ) . The plots also provide box plots for the respective SHAP values for showing the statistics. The box plots show that the range of SHAP tends to be higher than SHAP ( 1 ) , which means that sensitivity of is higher towards the change in as compared to ( 1 ). Furthermore, it can be observed from Figure 7A, SHAP tends to be close to 0 for s∼1 showing no contribution to the GPR model predictions. This is in line with the expectations since, for s∼1, ( 1 ) is the only parameter that impacts the . It is further observed that as the value of decreases less than 1, the magnitude of SHAP increases linearly in a negative sense with significantly less variability due to ( 1 ). On the other hand, an increase in greater than 1 leads to a plateau effect in terms of SHAP , however, the variability in the SHAP values increases significantly due to ( 1 ). This means the contribution of < 1 (i.e., divisive scaling factors) to the doesn't vary significantly with ( 1 ), while  Figure 7B, it can be observed that with an increase in ( 1 ), SHAP ( 1 ) increases considerably, and interaction with leads to significant variability in the SHAP values. This means the contribution of ( 1 ) to predict is significantly affected by the level of scaling involved. Hence it is essential to develop regulations for amplitude scaling in conjunction with the IMs (i.e., ( 1 ) in this case).
Lastly, to interpret the decision-making process of the GPR model, Figure 8A presents force plots for six grid samples among the 500 samples (50 s × 10 ( 1 ) ). The selected six samples represent the four corners (maximums and minimums of and ( 1 ) ) and two central (factor of 2 for and median ( 1 ) ) examples of the selected and ( 1 ) grid. The dashed line in the figure shows the base value obtained from the GPR model, which represents the average prediction for and hence is the same across all predictions. The overall prediction from the GPR model is computed as described in Equation 11. The two-colored arrows in the figure show how the and ( 1 ) influence the GPR base value (i.e., SHAP value) and arrive at the final prediction of based on the input values. It can be observed in general that tends to have a decreasing impact on the model outcome while ( 1 ) tends to raise the values, thereby having an increasing impact. Specifically, it is observed that for smaller values of ( 1 ) the model outputs are influenced by ( 1 ) to a more considerable extent as compared to , which is consistent with Figure 7b. Figure 8B  Pred ln ( ) = GPR Base Value + SHAP + SHAP ( 1 ) (11)

Hypothesis testing and results
The GPR-based surrogate models are utilized to provide recommendations for careful scaling of ground motions to conduct NTHA of the steel SMFs. This is done by using the scaled GPR model to sample the posterior distributions of EDPs for 50 s (25 divisive and multiplicative) × 50 ( 1 ) (0 g to MCE in 50 equal intervals) grid; and using the unscaled GPR model to sample the posterior distribution of EDPs for the 50 ( 1 ) (unscaled EDPs correspond to = 1). Hence, after the sampling, there are 50 distributions for the scaled EDPs (corresponding to 50 s) and one distribution for unscaled EDPs (corresponding to = 1) against each of 50 ( 1 ) (corresponding to step 9). The obtained 50 scaled EDP distributions are compared against the one unscaled EDP distribution for each ( 1 ) to identify the s that lead to significant statistical bias in EDPs. Herein, it should be noted that "scaled EDPs" do not mean that the EDPs are scaled themselves, but rather refers to the EDPs obtained for the scaled ground motions (surrogated by the GPR model). The comparison between the scaled and unscaled EDP distributions is conducted using Kolmogorov-Smirnov (KS) hypothesis test for each ( 1 ) (corresponding to step 10). The null hypothesis proposes that the two distributions are statistically similar, which thereby suggests that the scaled EDPs for the corresponding are identical to the unscaled EDPs for the given ( 1 ). Hence the p-values greater than a significance level (typically 0.05) indicate no significant bias in the EDPs due to scaling of ground motions, while p-values lower than the significance level mean that the corresponding introduces a bias in the EDPs. Figures 9 and 10 summarize the findings by illustrating the results from the KS test using contour plots of the obtained p-values for the grid. In these figures, the horizontal axis represents the scaling factor while the vertical axis is the IM (defined here ( 1 )). The color bar represents the p-value from the KS test. Thus, the green-colored regions in the figures illustrate the grid cases with p-value∼ 0, thereby rejecting the null hypothesis of statistical similarity between the scaled and unscaled EDPs and indicating bias due to the scaling of ground motions. Similarly, other regions represent grid cases where the test fails to reject the null hypothesis suggesting no significant bias with different probability levels (i.e., pvalues). While the consensus in statistical research is to use p-value = 0.05 for hypothesis testing and balance type I and type II errors, in this study, the contours are presented for different p-values to allow stakeholders to choose their appropriate levels of required confidence.
The hypothesis test is conducted for both EDPs: PIDR and PFA (for all the archetype frames). Figures 9 and 10 present the results for PIDR and PFA, respectively, for all five SMFs. In both figures, it is observed that large scaling (both multiplicative and divisive) leads to p-values∼ 0, indicating bias in the scaled PIDRs and PFAs. Also, both figures show that for all SMFs, the bias is dependent on both ( 1 ) and ; and generally allows a higher range of for lower levels of ( 1 ) and leads to a more restricted range of for higher levels of ( 1 ). Furthermore, it is noticed that the results are not symmetric for < 1 and > 1 for both PIDR and PFA. The allowable tends to decrease faster with increasing ( 1 ) for > 1, and the slope is observed to be lower for < 1. In Figure 9, it is observed that for ( 1 ) values close to 0.1 g, the allowable range of scaling factors is: between 0.4 and 3.5 for p-value∼ 0.025, between 0.5 and 3 for p-value∼ 0.05, and between 0.6 and 2 for p-value∼ 0.1. In contrast, for higher levels of intensity (e.g., ( 1 ) = 0.7 g for eight-story SMF) the allowable SF range narrows down to: between 0.7 and 2 for p-value∼ 0.025, between 0.75 and 1.75 for p-value∼ 0.05 and between 0.85 and 1.15 for p-value∼ 0.1. This phenomenon can be due to the mismatch in time-histories created by the for low amplitude ground motions compared to the true high amplitude unscaled ground motions. For example, it can be postulated that the use of = 2 for ground motions with PGA = 0.1 g and PGA = 1 g may not lead to the same changes in their other respective ground motions parameters.
Hence the impact of scaling should be considered along with the level of ground motion intensity. Furthermore, a closer inspection of Figure 9 indicates that the bias in EDPs is also affected by the height of the SMFs. For instance, for the 12-story frame, at ( 1 ) = 0.5 the range of no-bias scaling factors is between 0.70 and 2.00, while for the two-story, at the same intensity level, the range spans from 0.40 to 3.00. This trend is observed across the five frames. This phenomenon may be explained due to the differences in the plastic strain distribution along with the height of the buildings. For instance, taller buildings (e.g., 12-and 20-story) generally lead to concentrated plasticity in a few elements in the lower stories. In contrast, in short buildings (e.g., two-and four-story), the plasticity is more evenly distributed along with the height. Thus, differences in the first period of the structure (associated with building height) entail an influence on the range of allowable s. This analysis is consistent with the mass modal participation factors mentioned in the earlier sections, where the first mode leads to the participation of 95.31% for the two-story SMF and reduces to 74.89% for the 20-story SMF. Furthermore, as mentioned above, the slope of allowable s with respect to ( 1 ) tends to be steeper for > 1 as compared to < 1. This means the contribution of < 1 (i.e., divisive scaling factors) to the doesn't vary significantly with ( 1 ), while for > 1 (i.e., multiplicative scaling factors), the SF contribution to the varies considerably with ( 1 ). This observation is consistent with the results obtained from SHAP analysis. PFAs play an important role in detailing nonstructural components of buildings. 61,62 Hence it is also important to consider the impact of ground motion scaling on the structural PFAs. Figure 10 presents the KS test results for the PFAs of the five SMFs. In general, it can be observed that the range of allowable scaling factors is narrower for PFAs as compared to PIDRs. Similar to the cases in Figure 9, the bias due to record scaling on PFAs depends on the intensity level. It can be observed that for ( 1 ) values close to 0.1 g, the allowable range of scaling factors is between 0.45 and 2.5 for ( 1 ) = 0.7 g for eight-story SMF) the range narrows down to: between 0.65 and 1.8 for p-value∼ 0.025, between 0.8 and 1.5 for p-value∼ 0.05 and between 0.9 and 1.1 for p-value∼ 0.1. The results indicate that the scaling bias in PFAs has lower variability for the same intensity level than in the case of PIDRs. However, the slope of allowable s concerning ( 1 ) is observed to be more prominent for > 1 in the case of PFAs as compared to PIDRs. Lastly, in contrast with Figure 9, the scaling bias is less sensitive to the building height, implying lower effects of ( 1 ) on the ground motion scaling in PFAs. These differences in the results between PFAs and PIDRs can be due to the tendency of PFAs to decrease (as reported by 63 ) nonlinearly with levels of plasticity to a higher degree in contrast to the PIDRs. The findings presented in Figures 9 and 10 are consistent with the previous studies conducted on the topic (e.g. 3,4,10,64 ), implying that record scaling leads to bias in EDPs. The reasons behind this bias, as pointed out by other authors (e.g., 3,4,10,64 ), can be due to: i) strong ground motion duration (which is related with ) is not affected by amplitude scaling and has been reported to influence the behavior of buildings 65 ; ii) ground motions with benign amplitude are typically recorded for small or large , and consequently require large scaling for utilization at engineering hazard levels.Such scaled records lead to bias in EDPs due to the fact that the scaled ( 1 ) is not aligned with the corresponding or and hence other related IMs are not consistently altered 4,57,66 ; iii) as reported by Davalos and Miranda 3 , the scaled ground motion records lead to larger input energy due to acceleration pulses with larger amplitudes compared to the input energy of unscaled records for similar intensity levels. In a nutshell, the EDP biases can adhere to the fact that the amplitude scaling in terms of ( 1 ) or any other scalar IM, do not appropriately scale the other correlated IMs of the ground motions (such as energy-content, frequency, duration, etc.). Hence the scaled ground motions do not inherit the internal correlations among the various IMs and event parameters (such as , , etc.). Additionally, as observed from the figures, the range of s with possible bias increases with an increase in the ground motion intensity (i.e., ( 1 )). This is directly related to the level of inelasticity developed in the buildings due to high intensity ground motions. Note that the analyses in this study are conducted to a ( 1 ) equal to the MCE level. Hence it can be concluded that the collapseoriented analyses are strongly affected by scaling the low-amplitude records with a factor of ∼2 and above. Accordingly, it is observed that the levels associated with serviceability, design, and MCE requirements have different allowable scaling ranges that should be considered for ground motion selection and scaling procedures.
To briefly explore the reasoning behind the observed biases and provide new insights, Figure 11 compares different ground motion IMs of the unscaled and scaled records (in the natural logarithmic domain). In this figure, the red and blue colors represent the unscaled and scaled records, respectively. The diagonal insets of the figure present the normalized kernel density estimations (KDE) of Arias Intensity ( ), significant duration ( 5−95 ), mean period ( ), mid-frequency ( ), and ( 1 ) for 2500 scaled (blue color) and 200 unscaled (red color) records selected for the eight-story SMF. These IMs are used as proxies for various engineering characteristics of the ground motions: represents the energy content, 5−95 is the significant duration of shaking, signifies the dominant frequency content, 67 represents the frequency at the middle of the shaking, 68 and ( 1 ) represents the amplitude of the ground motions. The off-diagonal subplots in Figure 11 show relational plots (scaled and unscaled records) among the five IMs presented for two measures in each subplot. Specifically, the upper off-diagonal insets present the scatter plots (along with mean regressions) between each pair of IMs. In contrast, the lower off-diagonal insets show the density contour of the scatter plots.
The KDEs for the five IMs are compared between the scaled and unscaled ground motions using the KS hypothesis test with the null hypothesis that the two KDEs are statistically similar. It can be observed that except ( 1 ) all the other comparisons lead to p-values < 0.05, indicating rejection of the null hypothesis. The KDE match for ( 1 ) is obvious here since the 2500 scaled ground motions are selected with the criteria to match the ( 1 ) of the 200 unscaled ground motions. This comparison of IMs signifies that even though the scaled ground motions are selected to make sure they have consistent amplitude-based ( 1 ) distributions with unscaled ground motions do not necessarily lead to statistical match in other types of IMs. Notably, it can be observed that the two sets of ground motions are significantly different in terms of their energy-and frequency-content ( , , and ). These observations are consistent with the conclusions made by Davalos and Miranda 3 with respect to the difference in energy between scaled and unscaled ground motions.
Furthermore, it can be observed from the upper off-diagonal insets in Figure 11 that the scaled and unscaled ground motion IMs differ considerably in terms of their cross-relations. Notably, the relationships between the other IMs and ( 1 ) are observed to be different in the two sets except possibly for the case of 5−95 . Since the ground motions used in this study only correspond to the crustal seismic sources, 5−95 is known to not vary significantly (due to the inclusion of pulse-characteristics). 69 Hence in Figure 11, it can be observed that, generally, the trends with 5−95 are not extremely sensitive to ground motion selection using ( 1 ). In addition, the trends of other IMs concerning are observed to be considerably different between the scaled and unscaled ground motions. This means that the selection and scaling of ground motions using only amplitude-based ( 1 ) is not sufficient in eliminating the bias that the mismatches can cause in the other types of IMs. This observation is consistent with the results of Fayaz et al., 68 who tested the efficiency and sufficiency of different IMs to explain the EDPs of bridge structures in the context of ground motion simulation validation. Similar observations were made by Li et al. 70 who pointed out the importance of preserving in addition to ( 1 ) in liquefaction analysis. While such detailed analysis is out of the scope of this paper, it is worth mention that Bradley 71 proposed a general procedure to built vector-values IMs to develop hazard consistent analysis when more than one IMs are required. The lower off-diagonal elements of Figure 11 compare the density contours between the selected scaled and unscaled ground motions. In general, it can be observed that the scaled ground motions have higher variability as compared to the unscaled ground motions. One of the reasons for this can be due to the sampling bias where only 200 unscaled ground motions are compared against 2500 scaled ground motions. However, it is also viable that natural ground motions corresponding to a ( 1 ) level have lower variability in terms of other IMs, while scaling of ground motions to match a ( 1 ) do not appropriately influence the other IMs, thereby leading to higher variability. Also, it is observed from the contours that, in many cases, the mean of the unscaled ground motions contour is not similar to the density contour of scaled ground motions, indicating bias due to scaling. Hence, all these mismatches in the IMs between the scaled and unscaled sets can be the basis of the mismatches observed in the EDPs of SMFs. Although the reasons presented here complement the findings of previous studies, the goal of this study is not to conduct an in-depth investigation of the underlying causes of the EDP bias but to present an overview of the possible explanations and provide recommendations in terms of ( 1 ) and . The underlying reasoning behind the EDP mismatch requires further scrutiny and is currently beyond the scope of this study. Finally, the results of this study suggest that the scaling of ground motions requires careful examination in the context of seismic demand analysis.

CONCLUSION
The lack of high amplitude ground motion records has made ground motion selection and scaling process an essential step for the seismic demand analysis in structural and earthquake engineering. Over the years, various measures have been suggested to minimize the impact of extremeground motion scaling on structural demands. This study is a step to provide statistical and machine-learning-based recommendations for scaling ground motions in terms of IM and . The study proposes and implements a 10-stepped procedure to assess the viability of scaling ground motions to match average IM levels and its impact on the structural responses in conjunction with the ground motion intensity level. The procedure involves careful selection of scaled and unscaled ground motions corresponding to different IM levels and using them for NTHA of structural models. The results are then used to develop machine-learning-based surrogate models and obtain statistical predictions for scaled and unscaled responses for various IM and levels. The scaled and unscaled response predictions are then compared through hypotheses testing to determine the ranges of no-significant bias and provide recommendations for ground motion scaling process.
As an exercise of the proposed procedure, the study investigates the impact of ground motion amplitude scaling on two seismic EDPs (i.e., PFA and PIDR) of five steel SMFs varying in their heights (2-,4-,8-,12-, and 20-story) using Bayesian non-parametric GPR-based surrogate models. For this purpose, a total of 200 unscaled ground motions and corresponding 2500 scaled ground motions are selected to represent 10 ( 1 ) levels from 0 g to MCE level (2% in 50 years hazard level). Hence a total of 2700 ground motions are applied to each SMF to conduct NTHA, and the resulting EDPs are utilized to develop GPR-based surrogate models for unscaled and scaled responses. To have better insights into the GPR models, their prediction-and decision-making-processes are explained using the concepts of XAI and SHAP. The scaled GPR surrogate models are then utilized to sample the EDPs for a 50 × 50 grid of ( 1 ) and levels. The obtained posterior distribution of scaled EDPs for each grid point is then compared against the posterior distribution of unscaled EDPs sampled from the unscaled GPR model corresponding to the same ( 1 ). The comparison is conducted using the KS test, where the two distributions are analyzed for statistical similarity and hence checked for possible bias in EDPs due to scaling.
Results of this study indicate that the record scaling (both multiplicative and divisive) introduces bias in the responses of SMFs for both EDPs (i.e., PFA and PIDR). It is concluded that the EDP bias caused by scaling depends on three factors: i) scaling level (i.e., ), ii) target amplitude (i.e., ( 1 )), and iii) fundamental period (i.e., mass and stiffness distribution). In general, it is noticed that the allowable s range between 0.5 and 3.0 for PIDRs, and from 0.6 to 2.0 for PFAs (with p-value = 0.05). It is further seen that the allowable scaling range decreases with an increase in the target ( 1 ) leading to an allowable range between 0.7 and 1.8 for PIDRs, and from 0.8 to 1.6 for PFAs (with p-value = 0.05) for MCE levels. These analyses are essential from both design and analysis points of view since drift limits typically control the design of SMFs, while PFAs define the anchor forces for detailing nonstructural components. Finally, the EDP differences between scaled and unscaled ground motions are briefly explained by analyzing the differences in five different IMs of the ground motions characterizing the energy-, duration-, and frequency content of the ground motions. The analysis shows that a sheer scaling of ground motions using amplitude-based IMs such as ( 1 ) do not appropriately alter the other IMs, and a significant inconsistency is noticed between the other IM relations for the scaled and unscaled ground motions. These IM inconsistencies are postulated to be the reasons behind the mismatch in the scaled and unscaled EDPs. Based on the results of this study, it is clear that further scrutiny and studies are required to understand the underlying phenomenon of scaling ground motions. Furthermore, it is vital to understand the impact of other IMs in the record selection and scaling process, and also, with the influx of further recorded/simulated data, the analysis can be conducted by taking the variability of ground motions and its consequences into account. This study can provide a resource to the community for the utilization of sophisticated machine-learning tools to obtain insights about ground motion selection and scaling problems.

C O N F L I C T O F I N T E R E S T
The authors confirm that there is no potential conflict of interest with anyone for this study

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are available from the corresponding author upon reasonable request.