Fast and robust identification of railway track stiffness from simple field measurement

Article history: Received 31 May 2020 Received in revised form 3 November 2020 Accepted 7 November 2020


Introduction
The stiffness of the ballasted railway track is primarily provided by resilient components such as railpads and ballast. The stiffness values of these components are important parameters as they are closely related to track conditions. On one hand, the stiffness of track components changes as their condition deteriorates, e.g., due to ballast damage [1] or worn railpads [2]. On the other hand, deviations of track component stiffness can also accelerate the track degradation process or even lead to new track defects. Hence, track stiffness should be properly monitored during the whole life cycle of railway tracks. In addition, most of the dynamic train-track interaction models require the stiffness values of railpad and ballast as inputs. Therefore, accurate identification of the stiffness of track components from field measurement is very much desired.
A typical ballasted railway track system can be modelled as a layered structure that comprises rails, fasteners/railpads, sleepers and ballast [3]. Track stiffness can be defined as the combined effect of different track components; for instance, the track support stiffness refers to the equivalent stiffness provided by all the track components below the rail [4]. In this paper, however, we treat the stiffness values of the resilient components in the track system separately, and mainly focus on the stiffness of the railpad and ballast. The definitions and values of railpad and ballast stiffness are based on their representations in a two-layer discretely supported numerical model (see Section 2.2).
Track stiffness can be measured with controlled excitations, or under dynamic train loading using specialized measurement vehicles [5] or operational trains [4]. Controlled excitation methods include impact excitations with hammers [6-9] and falling weights [10], as well as sinusoidal excitations with hydraulic cylinders [11,12]. The main advantage of these methods is that the excitation forces can be measured, so that the frequency response function (FRF) can be directly derived. Furthermore, as the excitation is generated in a controlled manner, the measurement noise is usually lower than under train loading. For excitations using hand-held hammers, the test equipment and setup are cost-effective and relatively simple to implement. Because of these advantages, and although the measured track stiffness can be quite dependent on the loading conditions [13], the hammer impact measurement has been widely used to identify track parameters [14-19].
Identification of track stiffness from measurement is in essence an inverse problem. In general, it involves fitting a physical (analytical or numerical) model to measurement. For simple and individual cases, modelled FRFs are manually 'tuned' to fit measured FRFs based on certain FRF features [14-19]. However, the manual adjustment is labour-intensive and time-consuming. In addition, the number of FRF features that can be fitted simultaneously is very limited. Usually, the full track resonance and the rail resonance are used to determine the ballast and railpad stiffness, respectively [14,15,19].
Alternatively, the fitting process can be achieved by solving an optimization problem, where objective functions defining the difference between modelled and measured FRFs are minimized iteratively. This process is also known as model updating. As the process is iterative, the track model has to be evaluated multiple times until the objective functions converge. Therefore, the speed of the optimization process highly depends on the complexity of the model. From simple to complex, the following track models have been used for the optimization process: a rational fraction polynomial (RFP) [20], a two-degrees-of-freedom (2DOF) model [21], an analytical beam on elastic foundation model [22], a finite element (FE) model consisting of only one sleeper beam and two rail masses [1,6], a two-layer beam model [23] and a three-dimensional FE model [24]. On one hand, simplified models may not fully reproduce the FRF features [25] and thus may not be robust in some scenarios. On the other hand, complex numerical models may be computationally unaffordable due to the iterative process. For simple track models, conventional gradient-based algorithms may yield satisfactory results [21]. For complex track models, more advanced optimization techniques, such as evolutionary algorithms (EAs), are needed to deal with the multimodal and non-convex nature of the problem [26]. Among EAs, the particle swarm optimization (PSO) and its variants have been widely used in model updating problems, both in railway applications [27] and in more general structural identification problems. Extensive comparison studies between different EAs can be found in the literature for general structural identification problems [28,29], whereas they have not been compared specifically for the track parameter identification problem.
Besides the model updating methods, one can also establish an inverse mapping between structural inputs and outputs. In such a way, the iterative process can be avoided, which significantly accelerates the identification. For example, based on a beam on elastic foundation model, the track support stiffness was correlated to the ratio of two harmonic velocity amplitudes measured from sleeper vibrations subjected to pass-by train loadings [4]. The selection of the feature requires considerable physical insight, so that the correlation between such a feature and track stiffness is strictly monotone; otherwise the predictions would be non-unique. A more general approach is to use data-driven methods, such as neural networks (NN) [30,31] or the Gaussian process regression (GPR) [32], to learn the inverse relationship, based on either simulated or measured data. An example in railway applications can be found in [11], where twelve features of simulated FRFs were selected to train a NN that can directly predict the properties of railway substructures. For structural identification problems, uncertainties associated with modelling and measurement errors, as well as incomplete measurements, should be properly addressed to ensure robust identification. Uncertainties can be reduced by injecting noise into the training data [30,33] and by prediction fusion [34]. Incomplete measurements can be treated with a careful feature design for the specific problem of concern [35]. In this paper, we choose the GPR as the data-driven method because it is a non-parametric Bayesian method, which shows several advantages over other methods. Firstly, compared with conventional parametric regression models, the GPR has more expressive power, in the sense that it can handle complex (e.g., high-dimensional) datasets with more flexibility [36].
Another practical motivation to use the GPR is that it provides both predictions and confidence intervals, as opposed to other kernel-based non-parametric regression methods, such as the support vector machine (SVM) and the artificial neural network (ANN), which only offer point estimates. Besides, as a Bayesian approach, the GPR can incorporate prior information and training data to infer a posterior with reduced uncertainty [37]. We make use of this reduction of uncertainty from prior to posterior, i.e., the entropy change, as an indicator to weight and fuse the predictions of multiple GPR models trained with different datasets.
In this paper, we propose a non-iterative method to directly infer the stiffness of railpad and ballast from measured FRF features based on the GPR. To make the field test setup as simple as possible, we aim to use only one FRF curve for the identification. Section 2 provides an overview of the proposed method. In Section 3, eleven features are selected from a single FRF based on a global sensitivity analysis. In Section 4, multiple GPR models are trained and a fusion strategy is proposed. The performance of the proposed method is evaluated by comparing to a baseline method. Finally, the capabilities of the proposed method are demonstrated with two numerical examples and a field application example in Section 5.

Overview
An overview of the approach adopted in this paper is shown in Fig. 1. Two types of relationships (a forward and an inverse) between track parameters and FRFs are defined, where x = (x_1, x_2, ..., x_m)^T is a vector of track parameters defined in the m-dimensional space X and y = (y_1, y_2, ..., y_n)^T is a vector of FRF features defined in the n-dimensional space Y. In this paper, eight track parameters (m = 8) are considered: x_1 - rail bending stiffness (EI); x_2 - rail inertia property (ρA); x_3 - sleeper EI; x_4 - sleeper ρA; x_5 - railpad stiffness; x_6 - railpad damping; x_7 - ballast stiffness; x_8 - ballast damping. The core task of this paper is to formulate the inverse relationship x = g(y), in particular for two track parameters: x_5 - railpad stiffness and x_7 - ballast stiffness. For this purpose, we construct two datasets based on the inverse mapping, i.e., D_j = {(y^(i), x_j^(i)) | i = 1, ..., d}, j = 5, 7, where d is the number of observations. We then need to fit a function g to each dataset D_j which, for any given FRF feature vector y*, can make a prediction of the track parameter x_j*. In this paper, we use the GPR to accomplish the fitting process. The basic theory of GPR is introduced in Section 2.3. A major challenge is to account for incomplete measurements as well as uncertainties associated with modelling and measurement errors in the GPR model. Incomplete measurements lead to dimension changes of the FRF features y^(i) across observations, while uncertainties lead to noisy observations of y^(i). We address these issues in Section 4.

FE track model
The track is represented by a two-layer discretely supported model; see Fig. 2. The rails and sleepers are meshed with Timoshenko beam elements. At each node, only the vertical and in-plane rotational degrees of freedom are considered. The mesh sizes were determined by a convergence analysis, resulting in 6 elements per sleeper span for the rail and 20 elements per sleeper. The total length of the track model is 12 m, with 20 sleeper spans. Railpads are modelled using the Kelvin-Voigt (KV) model, with an elastic spring and a viscous damper connected in parallel. The railpad stiffness is defined as the spring stiffness of the KV model. In this paper, we model railpads as spring-damper pairs, while neglecting the other fastening components, such as clamps, bolts, etc. This is a widely accepted simplification in railway track models. Although bolt tightness cannot be explicitly considered in the current model, it can be reflected in the railpad stiffness. For instance, a loose bolt will lead to a decrease of the clamp force, which further results in a decrease of the railpad stiffness [21]. Ballast is modelled as discretely distributed KV models under each sleeper node. The ballast stiffness is thus defined as the spring stiffness of a single KV model multiplied by the number of sleeper nodes.
The field hammer test is usually performed on top of the rail, with a hammer excitation either above a sleeper support or at mid-span; see Fig. 2(c). The impact test is simulated by applying an impulse force vertically on the rail node either above the sleeper support (at coordinate x = 6 m) or at the mid-span (x = 6.3 m). The model is solved in the time domain using the Newmark integration with a fixed time step of 4 × 10⁻⁵ s. The model and solution process are implemented in Matlab.
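The time-stepping scheme described above can be illustrated with a minimal Newmark solver (average acceleration, β = 1/4, γ = 1/2). This is a generic sketch, not the authors' Matlab implementation; M, C, K and the force history F stand in for the assembled system matrices of the track model.

```python
import numpy as np

def newmark(M, C, K, F, dt, u0, v0, beta=0.25, gamma=0.5):
    """Newmark time integration of M*a + C*v + K*u = F(t) for an n-DOF system.

    F: (n_steps, n) external force at each step (including t = 0).
    Returns the displacement history, shape (n_steps, n).
    """
    n_steps = F.shape[0]
    u = np.zeros_like(F); v = np.zeros_like(F); a = np.zeros_like(F)
    u[0], v[0] = u0, v0
    # consistent initial acceleration from the equation of motion at t = 0
    a[0] = np.linalg.solve(M, F[0] - C @ v0 - K @ u0)
    # effective stiffness (constant for a fixed time step)
    Keff = K + gamma / (beta * dt) * C + M / (beta * dt**2)
    for i in range(n_steps - 1):
        feff = (F[i + 1]
                + M @ (u[i] / (beta * dt**2) + v[i] / (beta * dt)
                       + (0.5 / beta - 1.0) * a[i])
                + C @ (gamma / (beta * dt) * u[i] + (gamma / beta - 1.0) * v[i]
                       + dt * (0.5 * gamma / beta - 1.0) * a[i]))
        u[i + 1] = np.linalg.solve(Keff, feff)
        a[i + 1] = ((u[i + 1] - u[i]) / (beta * dt**2)
                    - v[i] / (beta * dt) - (0.5 / beta - 1.0) * a[i])
        v[i + 1] = v[i] + dt * ((1.0 - gamma) * a[i] + gamma * a[i + 1])
    return u
```

With γ = 1/2 and β = 1/4 the scheme is unconditionally stable and introduces no numerical damping, which suits FRF estimation from simulated impact responses.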

Gaussian process regression
We assume each track parameter x_j is a collection of random variables {x_j^(i) ∈ X_j, i = 1, 2, ...}, dependent on the FRF features, i.e., x_j = g(y). A Gaussian process (GP) is a way to describe a Gaussian distribution over functions. Detailed descriptions of this method can be found in [36]. Here a brief introduction is provided.
We assume the function x_j = g(y) follows a GP,

x_j = g(y) ~ GP(m(y), k(y, y'))    (3)

where m(y) is the mean function and k(y, y') is the covariance (or kernel) function of {x_j^(i) ∈ X_j, i = 1, 2, ...}. The GP is completely defined by these two functions. The goal is to determine m(y) and k(y, y') from a training set D_j = {(y^(i), x_j^(i)) | i = 1, ..., d}. In the current case, we assume the track parameter x_j has a zero-mean distribution by normalizing {x_j^(i), i = 1, ..., d} to [-1, 1]; see also Section 4.1.2. Thus, m(y) = 0. The covariance function measures the correlation between two particular values x^(p) and x^(q). A commonly used covariance function is the squared exponential covariance function

cov(x^(p), x^(q)) = k(y^(p), y^(q)) = σ_f² exp(−‖y^(p) − y^(q)‖² / (2 l_k²)) + σ_n² δ_pq    (5)

where l_k is the length scale, σ_f² is the signal variance, σ_n² is the noise variance and δ_pq is a Kronecker delta, which is one if p = q and zero otherwise. The parameters Θ = {l_k, σ_f², σ_n²} are called hyperparameters and are determined during the training process. Note that if the length scale is the same for all predictors, the covariance function is isotropic; if the length scale differs between predictors, the covariance function implements automatic relevance determination (ARD) and is referred to as an ARD kernel.
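As an illustrative sketch, the squared exponential covariance (isotropic or ARD, depending on whether the length scale is a scalar or a per-feature vector) can be evaluated as follows; the function name and argument layout are ours, not the authors'.

```python
import numpy as np

def sq_exp_kernel(Y_p, Y_q, length_scale, signal_var, noise_var=0.0):
    """Squared exponential covariance between two sets of feature vectors.

    Y_p: (n_p, n_features), Y_q: (n_q, n_features).
    length_scale: scalar (isotropic) or (n_features,) vector (ARD kernel).
    """
    ls = np.asarray(length_scale, dtype=float)
    D = Y_p[:, None, :] - Y_q[None, :, :]      # pairwise feature differences
    sq = np.sum((D / ls) ** 2, axis=-1)        # length-scale-weighted squared distance
    K = signal_var * np.exp(-0.5 * sq)
    # Kronecker-delta noise term, added only when computing a train-train covariance
    if noise_var > 0.0 and Y_p.shape == Y_q.shape and np.allclose(Y_p, Y_q):
        K = K + noise_var * np.eye(Y_p.shape[0])
    return K
```

Passing a vector `length_scale` gives the ARD variant used later to weight the relevance of individual FRF features.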
Assuming we have trained the GPR on dataset D to obtain the hyperparameters, the training outputs X follow a joint Gaussian distribution, X ~ N(0, K(Y, Y) + σ_n² I). We can then make a prediction of x* given any new observation y*. The training outputs X and the prediction x* together should also follow a joint Gaussian distribution,

[X; x*] ~ N(0, [[K + σ_n² I, K*], [K*^T, K**]])

where K(Y, y*) denotes the covariance between all the y in Y and y*, and similarly for the other entries; they can be calculated according to Eq. (5). The prediction can now be made based on Bayesian inference, and the distribution of x* can be solved as

x* | Y, X, y* ~ N(K*^T (K + σ_n² I)^{-1} X,  K** − K*^T (K + σ_n² I)^{-1} K*)    (9)

where K* = K(Y, y*), K** = K(y*, y*) and K = K(Y, Y).
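The posterior mean and covariance of Eq. (9) can be computed with a short routine. This is a generic sketch (Cholesky-based, for numerical stability) under the zero-mean assumption; the variable names are ours.

```python
import numpy as np

def gp_posterior(K, K_star, K_starstar, X_train, noise_var):
    """Posterior mean/covariance of a zero-mean GP at test inputs.

    K: (d, d) train covariance, K_star: (d, n_test) train-test covariance,
    K_starstar: (n_test, n_test) test covariance, X_train: (d,) training outputs.
    """
    A = K + noise_var * np.eye(K.shape[0])
    L = np.linalg.cholesky(A)                              # stable in place of a direct inverse
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, X_train))
    mean = K_star.T @ alpha                                # K*^T (K + sigma_n^2 I)^-1 X
    v = np.linalg.solve(L, K_star)
    cov = K_starstar - v.T @ v                             # K** - K*^T (K + sigma_n^2 I)^-1 K*
    return mean, cov
```

The Cholesky factor can be reused across test points, which is the basis of the batch-prediction speed-up discussed in the performance analysis.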

Two individual cases
We first simulate two individual cases [38,39] of field hammer tests. The parameters are listed in Table 1. The major difference between the two cases is the railpad stiffness, which is 1300 MN/m for Case 1 and 90 MN/m for Case 2. The purpose of simulating the two cases is twofold: (1) to use them as references to validate the proposed track model and (2) to identify the features of track FRFs with different track parameters.
The FRF magnitudes calculated for Case 1 (stiff railpad) and Case 2 (soft railpad) are shown in Fig. 3 on the left and right column, respectively. For each case, we evaluate the point FRFs of the rail above a sleeper (first row of Fig. 3) and at mid-span (second row of Fig. 3). The results from the current model are compared with field hammer tests, as well as a 3D [38] and 2.5D FE model [39] for Case 1 and 2, respectively. In general, the current model yields good agreement with the measurements as well as the reference models.
The peaks in the FRF curves indicate track resonances (TRs). To identify the TRs associated with each peak, we perform an eigenanalysis of the FE track model. Based on the eigenfrequencies and corresponding eigenmodes of the rail, we construct the dispersion relations of the vertical rail bending waves; see Fig. 3(e) and (f). Note that only the bending waves are shown here; a more detailed description of other rail waves can be found in [40]. We can pinpoint the eigenfrequencies of the TRs at the cut-on frequencies of the rail waves (Waves 1-9). Some of the track resonances correspond to peaks in the FRF curves (see the red triangles in Fig. 3) while the others do not (the black triangles). This is a consequence of incomplete measurements: because only the FRFs at two locations on the rail top are measured, TRs associated with sleepers, or with the measurement points as nodes, cannot be reflected in the FRFs. Fig. 4 shows the mode shapes of the identified track resonances (TR 1-9). The main features of these TRs are summarized in Table 2 and are briefly described below. Full track resonances (FT-a/b): the rail and sleeper vibrate in-phase at the rail seats, with translational (symmetrical) and rotational (anti-symmetrical) rigid sleeper modes, respectively. Sleeper resonances (S1~S4): the sleeper vibrates in-phase with the rail at the rail seats, in the first four sleeper bending modes. Rail resonances (R1-a/b, R2-a/b): the rail vibrates in anti-phase with the sleeper at the rail seats, with a wavenumber of 0 (infinite wavelength) for R1 and π/L (wavelength of 2L) for R2, where L is the sleeper spacing. The mode shape of the rail at R2 shows patterns with nodes (at which the displacements are zero) located at the mid-spans.
Pin-pin resonance: the rail vibrates with a wavenumber of π/L (wavelength of 2L) and nodes above the sleepers, i.e., with the same wavelength but a phase shift of π/2 compared to R2.
The idea is to use (some of) the TRs as the features of the FRFs to infer track stiffness. Qualitatively, from the mode shapes of the different TRs, the railpad stiffness is mainly associated with the rail resonances (R1 and R2) because of the anti-phase vibration between rails and sleepers at the rail seats, whereas the ballast stiffness is more related to the full track resonances (FT-a/b). In fact, the manual fitting of a modelled FRF to measurement is usually performed based on these two resonances [14-19]. In addition, the sleeper resonances should also provide valuable information for the identification of both railpad and ballast stiffness. However, the sleeper resonances are not always identifiable in the rail FRFs. For example, in Case 2 the sleeper resonances (S2 and S3) are well isolated by the soft pads and therefore are not reflected in the rail FRFs. Moreover, due to the change of railpad stiffness, the sleeper resonances and the rail resonances switch in sequence within TR 4~7. These issues add to the difficulty of including the sleeper resonances in the FRF features.
Compared with the FRFs, the features of dispersion relations are more consistent, because TR 1~7 are always 'identifiable' in the dispersion relations. We will take advantage of this to generate the training datasets for the GPR model (see Section 4.1.2).

Global sensitivity analysis
To facilitate the selection of FRF features, we perform a global sensitivity analysis [41] over a predefined parameter space to evaluate the sensitivities of several FRF features to railpad and ballast stiffness.
To perform the global sensitivity analysis, the parameter space X has to be defined first. This means determining the upper bound x_i^u and lower bound x_i^l of each track parameter x_i (i = 1, 2, ..., 8). Rail parameters are relatively easy to determine from the nominal design values. Here we use the properties of UIC54 type rail as the nominal values, which are increased and decreased by 5% to obtain the upper and lower bounds, respectively. The value ranges of the sleeper, ballast and railpad parameters are determined based on the values reported in a wide range of literature [13,17,19,38,39,42-48]. The defined parameter space is listed in Table 3.
3.2.1.2. Global sensitivity analysis. We assume each track parameter is uniformly distributed in the parameter space defined in Table 3, i.e., x_i ~ U(x_i^l, x_i^u), i = 1, 2, ..., 8. Subsequently, we use the Sobol sequence sampling method to generate N samples of track parameters x^(k) ∈ X (k = 1, 2, ..., N) from this distribution. With the sampled track parameters, we perform both the time-domain analysis and the eigenanalysis using the FE track model. Furthermore, to rank the sensitivity of the different track parameters, a variance-based global sensitivity index is calculated following a numerical procedure based on Monte Carlo simulations. The details of the procedure can be found in [41].
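The Sobol sampling step can be sketched with `scipy.stats.qmc`. The bounds below are illustrative placeholders only (they are not the Table 3 values); the parameter ordering follows x_1 to x_8 as defined in Section 2.

```python
import numpy as np
from scipy.stats import qmc

# Illustrative bounds for the 8 track parameters (placeholders, NOT Table 3 values)
lower = np.array([4.0e6, 50.0, 1.0e7, 100.0, 5.0e7, 1.0e4, 2.0e7, 3.0e4])
upper = np.array([5.0e6, 60.0, 2.0e7, 200.0, 2.0e9, 1.0e5, 2.0e8, 1.0e5])

# low-discrepancy samples in the unit hypercube, scaled to the parameter space
sampler = qmc.Sobol(d=8, scramble=True, seed=0)
unit = sampler.random(n=256)                 # powers of two preserve balance properties
samples = qmc.scale(unit, lower, upper)      # shape (256, 8), uniform over [lower, upper]
```

Each row of `samples` is one track-parameter set x^(k) to feed to the FE model; the variance-based index computation of [41] would then operate on the resulting model outputs.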

Magnitude features of FRF
In this paper, to ensure a simple test setup, we propose to use the point FRF above the sleeper support for the identification of track stiffness. Fig. 5 shows the simulated point FRFs above the sleeper support with sampled track parameters as inputs. Herein, we use MAG1, MAG2 and MAG3 as the magnitude features of the FRFs. They denote the sum of the magnitudes of the point receptance in three different frequency bands, i.e., between 10-100 Hz, 150-900 Hz and 1050-2500 Hz, respectively.
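The three magnitude features can be sketched as band-wise sums over the simulated point receptance. The band edges come from the text; the function name and the discrete-sum convention are our assumptions (Table 4 later takes the logarithm of these sums).

```python
import numpy as np

def magnitude_features(freq, receptance_mag):
    """MAG1-3: sums of point-receptance magnitudes over three frequency bands (Hz)."""
    bands = [(10.0, 100.0), (150.0, 900.0), (1050.0, 2500.0)]
    feats = []
    for lo, hi in bands:
        mask = (freq >= lo) & (freq <= hi)        # samples falling in the band
        feats.append(float(receptance_mag[mask].sum()))
    return feats                                   # [MAG1, MAG2, MAG3]
```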
The scatter plots in Fig. 6 show the correlations between the FRF magnitude features and different track parameters. In the plots, a uniform cloud of points means the magnitude is not sensitive to the change of the parameter (see the grey cloud points in Fig. 6). On the other hand, if a 'pattern' can be observed, e.g., the blue and red points, the corresponding parameter is considered influential. It can be seen that the FRF magnitudes below 100 Hz (MAG1) are most sensitive to the ballast stiffness, while in the mid frequency range 150~900 Hz (MAG2) the magnitudes are most sensitive to the railpad stiffness. The sensitivity indices of the different track parameters are presented in Fig. 6(c). For MAG3, railpad damping (x_6) and stiffness (x_5) are the two most sensitive parameters. Furthermore, an exponential correlation can be observed between the ballast/

Frequency features of FRF
The frequencies of TR1, TR4, TR5, TR6 and the pin-pin resonance are selected as the frequency features for the sensitivity analysis. The influence of the different track parameters on these features is shown in Fig. 7. The sensitivity index in Fig. 7(b) indicates that the pin-pin resonance at around 1000 Hz is most sensitive to the rail properties (x_1, x_2) and therefore should be excluded from the FRF features for the identification of railpad and ballast stiffness. In contrast, TR1, TR4 and TR5 should be considered for the identification of ballast stiffness (x_7), while TR4, TR5 and TR6 should be considered for the identification of railpad stiffness (x_5). This confirms that the sleeper resonances are also correlated with the railpad and ballast stiffness.
Besides, the resonance frequencies of TR4~TR6 increase nonlinearly with increasing railpad stiffness; see x_5 in Fig. 7(a). Such nonlinearity has two consequences. First, TR4 and TR5 converge to a single resonance at low railpad stiffness; see Fig. 3(b) as an example. Second, as the railpad stiffness increases, it becomes less influential on the resonance frequencies of TR4 and TR5.

FRF features as input predictors
We choose 11 features from the FRF curve as potential predictors for the GPR model; see features 1~11 in Table 4. The 11 FRF features are chosen based on the processes of feature identification (Section 3.1) and feature selection (Section 3.2).
First, in Section 3.1, we investigated the track resonance behavior and extracted corresponding FRF peaks as the potential features. More specifically, the natural frequencies and FRF magnitudes at TR 1/2, TR4, TR5, TR6 and the pin-pin resonance are chosen as the potential features. In addition, we choose MAG1, MAG2 and MAG3 as shown in Fig. 5 to reflect the general trend of the FRF curve.
Subsequently, in Section 3.2, we performed a global sensitivity analysis to further select the most sensitive features to the change of railpad and ballast stiffness, respectively. As a result, we select 11 features in total.
Besides, as the sleeper properties (EI and ρA) may be known a priori and are correlated with certain FRF features (see TR4~6 in Fig. 7), they are also selected as potential predictors (features 12 and 13 in Table 4).

Datasets generation
We generate the datasets used for training and testing following the procedure below:
1. Follow the same procedure as in Section 3.2.1.2 to sample N sets of track parameters x^(k) ∈ X (k = 1, 2, ..., N).
2. With each sample x^(k), perform an eigenanalysis and construct the dispersion relations as shown in Fig. 3(e) and (f). The cut-on frequencies of Waves 1, 4, 5 and 6 are stored in the feature vector y_j^(k) (j = 8, 9, 10, 11). This ensures that we can always extract the frequencies of TR5 and TR6 even if they are not identifiable in the FRF curves (e.g., Fig. 3(b)).
3. With each input sample x^(k), calculate the point receptance of the rail node above the sleeper located at the middle of the track length.

Noise injection
Because the dataset is generated from numerical simulations, the uncertainties introduced by measurement noise and modelling errors should be properly considered to ensure robust performance when the method is applied to real measurement data. In the current case, the uncertainties are mainly associated with the FRF features. Hence, we add Gaussian noise to each feature. It should be noted that adding noise to the inputs of a GPR model is equivalent to treating the inputs as deterministic while adding an extra term to the output noise σ_n² I in the covariance matrix in Eq. (5) [49]. Ideally, a heteroscedastic GPR model [50] should be used to model this extra term with more accuracy. In this paper, however, we still resort to the standard GPR model and include this term in the output noise σ_n² I for simplicity. Hereafter, we refer to the GPR models trained on the noise-free datasets as the noise-free models, and those trained on the noisy datasets as the noisy models, although they all include a noise term in the outputs.
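A minimal sketch of the noise injection, assuming the noise is applied multiplicatively to each feature (Section 5 mentions a 5% level; the exact noise model is an assumption here):

```python
import numpy as np

def inject_noise(Y, noise_level=0.05, seed=0):
    """Corrupt each FRF feature with zero-mean Gaussian noise proportional to its value.

    Y: (n_samples, n_features) array of noise-free simulated features.
    """
    rng = np.random.default_rng(seed)
    return Y * (1.0 + noise_level * rng.standard_normal(Y.shape))
```

Training one model on `Y` and another on `inject_noise(Y)` yields the noise-free and noisy model pair referred to in the text.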

Incomplete measurement of FRF features
In real-life applications, extracting all 11 features from a measured FRF is not always possible, which leads to incomplete measurements of FRF features. There are two main sources of incomplete measurements. The first source is associated with railpad stiffness, as discussed in Section 3. For example, in the reference Case 2, only 7 features can be extracted due to the soft railpad, i.e., features 1, 2, 5, 6, 7, 8 and 9; see Fig. 8(b). The second source is related to the valid frequency ranges of the different types of hammers used for the excitation. In principle, a lightweight impact hammer can only effectively excite the structure vibrations in mid- to high-frequency ranges (approximately zones II(*) and III in Fig. 8), whereas a heavier hammer is more accurate in low to mid frequencies (approximately zones I and II(*) in Fig. 8) [22]. Because of the two sources of incomplete measurements, we expect different combinations of FRF features in different scenarios, as summarized in Table 5. In particular, we use five feature classes, i.e., A, B, C, D and E, to describe the variability of the valid frequency ranges. Within each feature class, we further include four feature combinations to consider the variability due to different railpad stiffness. We select 12 and 10 feature combinations for the identification of railpad and ballast stiffness, respectively, based on the results of the sensitivity analysis in Section 3.2.

Kernel functions
Four different types of covariance functions are considered in this study, i.e., the exponential (EXP), the squared exponential (SE), the rational quadratic (RQ) and the Matérn 3/2 (M32). For each type, we implement both the isotropic kernels, with the same length scale for all predictors, and the ARD kernels, with a separate length scale per predictor (see Section 2.3). The details of these kernels can be found in the literature (e.g., [36]) and thus are not repeated here.

Prediction fusion
With each training set D̃_i or D_i generated in Section 4.1, we can select a feature combination and a kernel from Section 4.2 to train a candidate model. The prediction process starts with a feature vector y* extracted from an FRF curve. If the feature index set of a feature combination in Table 5 is a subset of the feature index set of y*, the candidate models trained with this feature combination can be used to make predictions of x*. In general, a feature vector will activate more than one candidate model and therefore register multiple predictions. We use a generalized Product of Experts (PoE) method [51] to fuse the predictions by different candidate models. Each candidate model is regarded as an expert. The fused probability distribution is the product of the posterior distributions of the experts, each weighted by a factor, taking the form

P(x* | y*) = (1/Z) ∏_i p_i(x* | y*)^{a_i(y*)}    (11)

where p_i(x* | y*) is the posterior predicted by the i-th expert, a_i(y*) is a weighting factor depending on the reliability of the i-th expert, and Z normalizes the fused prediction P(x* | y*) so that it integrates to one.
The purpose of a_i(y*) is to give more weight to the more reliable experts and less weight to the less reliable ones. Therefore, a_i(y*) is defined as the change in entropy from prior to posterior. Entropy is a measure of the amount of uncertainty in a distribution; for a Gaussian,

H(N(μ, Σ)) = (1/2) ln(2πe Σ)    (12)

Thus, the entropy change from prior to posterior is

a_i(y*) = (1/2) [ln K(y*, y*) − ln Σ_i*]    (13)

where K(y*, y*) and Σ_i* are the variances of the prior and posterior distributions, respectively; see Eq. (9). Note that the prediction fusion takes almost no extra computation time, because K(y*, y*) and Σ_i* in Eq. (13) are readily available once the predictions are made. In the current case, if a candidate model is not activated, the posterior is the same as the prior, i.e., a_i = 0. This means an inactive model does not contribute to the fused prediction. Also, a larger entropy change a_i(y*) means the model is more certain about its prediction and thus receives a larger weight in the fused prediction according to Eq. (11). The fused prediction is also a Gaussian, with mean and variance

μ* = Σ* ∑_i a_i (Σ_i*)^{-1} μ_i*,   (Σ*)^{-1} = ∑_i a_i (Σ_i*)^{-1}    (14)

where μ_i* and Σ_i* are the mean and variance of the posterior of the i-th expert and can be obtained according to Eq. (9). Fig. 9 shows the process of prediction fusion for railpad stiffness as an example. For each training set, we consider twelve feature combinations (see Table 5) and eight kernels. Thus, we have in total 96 candidate models arranged in a 12 by 8 matrix. According to Eq. (11), raising a posterior to the power of its weight gives a weighted prediction for each candidate model. The fused prediction is the product of the selected weighted predictions. In theory, we can fuse any combination of elements in the matrix. In practice, we usually fuse the predictions by different kernels for each feature combination, or the predictions by different feature combinations for each kernel, as indicated by the red and blue blocks in Fig. 9.
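For scalar predictions, the generalized PoE fusion with entropy-change weights can be sketched as follows. This assumes each posterior variance does not exceed the prior variance, so the weights are non-negative; an inactive expert (posterior equal to prior) gets zero weight and drops out.

```python
import numpy as np

def fuse_predictions(means, variances, prior_var):
    """Generalized Product-of-Experts fusion of scalar Gaussian predictions.

    Each expert i is weighted by its entropy change from prior to posterior,
    a_i = 0.5 * (ln(prior_var) - ln(var_i)).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    a = 0.5 * (np.log(prior_var) - np.log(variances))   # entropy-change weights
    precision = np.sum(a / variances)                   # fused precision (product of Gaussians)
    var_fused = 1.0 / precision
    mean_fused = var_fused * np.sum(a * means / variances)
    return mean_fused, var_fused
```

A product of weighted Gaussians is again Gaussian, which is why the fused mean is a precision-weighted average of the expert means.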

Performance analysis
We evaluate the performance of the proposed method using the two test sets from D_i (noise-free) and D̃_i (with 5% injected noise) defined in Section 4.1.2. The mean absolute percentage error (MAPE) is used as the metric for the evaluations,

MAPE = (100%/N) ∑_k |x̂*^(k) − x*^(k)| / |x*^(k)|

where x̂*^(k) and x*^(k) are the predicted and target values of the k-th sample.
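The MAPE metric amounts to a one-line computation:

```python
import numpy as np

def mape(predicted, target):
    """Mean absolute percentage error; target values must be non-zero."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return 100.0 * np.mean(np.abs((predicted - target) / target))
```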

Baseline method
We use a model updating method implementing the particle swarm optimization (PSO) [52] as the baseline model. Because the PSO is computationally intensive, to ensure fast evaluations we employ meta-models of the FE track model in the optimization process. The approach is similar to that in [53,54], but with a different optimization algorithm. The meta-models are also GPR models, trained and tested on datasets generated in a similar way as in Section 4.1. The objective function to be minimized measures the misfit between the simulated and target FRF features, where ŷ_{j,x} and y_j are the simulated value given x and the target value for the j-th FRF feature, respectively.

Prediction accuracy
We first compare the performance of different kernels on the two test sets; see Fig. 10. For the railpad stiffness, the ARD kernels outperform the isotropic kernels. The ardM32 kernel shows the most balanced performance, with a slightly higher MAPE than the baseline method on both test sets. For the ballast stiffness, the ardM32, ardRQ and ardSE perform the best with nearly identical results. In both cases, fused predictions by all kernels outperform all the individual kernels, as well as the baseline method, by a large margin.
To evaluate the performance of the proposed method for incomplete FRF measurements, we further compare the performance of the candidate models with different feature combinations; see Fig. 11. In the noise-free case, fusing the predictions by different kernels does not seem to improve the performance compared to a single kernel, and is comparable to the baseline method. In the noisy case, however, fusing the predictions by different kernels outperforms both the baseline method and the models with a single kernel.
We now discuss how incomplete measurements in different scenarios influence the model performance. For the prediction of the railpad stiffness, including the sleeper-related features improves the model performance. For example, A1 and A2 outperform A3 and A4 because they include the sleeper resonances in the predictors (features 3 and 4 in Table 5). A1 and A3 outperform A2 and A4, respectively, because they include the sleeper properties (features 12 and 13) in the predictors. The same applies to feature classes B and C. In addition, including features over a wider frequency range generally improves the performance; see the decrease of the MAPE from feature class E to B and to A.
For the prediction of ballast stiffness, the candidate models with the sleeper properties in the predictors (A1, A3, C1, C3, D1) perform better than their counterparts without the sleeper properties. Furthermore, all models show consistent performance, except for models D1 and D2, whose features lie only in the low frequency range (Zone I in Fig. 8). In the worst-case scenarios with only three features as the predictors (E4 and D2), the MAPEs of the fused model are approximately 12% and 6% for the railpad and ballast stiffness, respectively. Based on the results in Figs. 10 and 11, fusion by kernels is more effective than fusion by features. For each training set, we consider 96 and 80 candidate models for the prediction of the railpad and ballast stiffness, respectively. Although these numbers seem large, each candidate model can be trained independently in parallel. Moreover, as ballasted railway tracks share almost the same structure and are periodic, once the candidate models are trained, they can be applied to virtually all ballasted railway tracks. Table 6 compares the prediction times (in seconds) for predicting the railpad stiffness between the proposed and the baseline method, assuming all 13 features are known. A clear advantage of the proposed method is that it is highly scalable: making predictions for 400 samples takes only slightly more time than the prediction of one sample. With the proposed method, the most time-consuming step for a single prediction is inverting the covariance matrix on the right-hand side of Eq. (9). Once the inverted covariance matrix is available, the prediction takes almost no time, as it is non-iterative and only involves simple matrix multiplications (see Eq. (9)). For multiple predictions in a batch, e.g., the case of 400 samples shown in Table 6, the inversion is performed only once at the beginning of the procedure and stored in memory for all the predictions in the batch.
This means the marginal time cost of an additional prediction is trivial. In contrast, with the PSO, each prediction involves an iterative process, which takes about 30-100 s depending on the test case. For multiple predictions, the total required time is the sum over all the predictions and is therefore much larger than with the proposed method.
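The scalability argument can be illustrated with a small numpy sketch: the training covariance is factorized once, after which each additional posterior-mean prediction is a single matrix-vector product. The kernel, data and sizes below are arbitrary stand-ins, not the models used in this paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between two point sets."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 2))                  # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)   # noisy targets

# One-off O(n^3) cost: factorize the training covariance matrix.
K = rbf_kernel(X, X) + 1e-4 * np.eye(500)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

def predict(X_star):
    """Posterior mean for a batch of test points; reuses L and alpha,
    so the marginal cost per extra prediction is one matrix-vector product."""
    return rbf_kernel(X_star, X) @ alpha

one = predict(rng.uniform(0, 10, size=(1, 2)))     # single prediction
many = predict(rng.uniform(0, 10, size=(400, 2)))  # 400 predictions, same factorization
```

Both calls reuse the same factorization, which is why predicting 400 samples costs little more than predicting one.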

Numerical examples
We apply the proposed method to the two reference cases presented in Section 3.1. The simulated FRF curves and dispersion relations are used for the feature extraction and the subsequent predictions. Fig. 13 gives an example of the predictions made by a single kernel (ardM32) with different feature combinations. It is noticed that the predictions by the noisy models (Fig. 13(a) and (c)) have larger confidence intervals than those by the noise-free models (Fig. 13(b) and (d)). Meanwhile, the mean predictions by the different noisy models are less biased than those by the noise-free models. In other words, the noisy models show more consistent performance with incomplete measurements. Fig. 14 shows the predictions by different kernels, fused over feature combinations for each kernel. As in Fig. 13, the predictions by the noisy models show larger confidence intervals while being more consistent in terms of the mean predictions.
We further fuse the predictions of all the kernels for the noise-free and noisy models, respectively. The results are compared with the true values in Table 7. The prediction errors for Case 2 are larger than those for Case 1. This is because in Case 2 the track model has a shorter sleeper length and a larger sleeper spacing than the FE model used for training the GPR models, which means the noise level due to modelling errors is higher than in Case 1. In addition, Case 2 has an incomplete set of FRF features, which according to Fig. 11 also leads to larger prediction errors.

Field test setup
The test site is located at the Faurei Railway Test Center in Romania. The field test was conducted on a section of well-maintained ballasted track. The track system consists of UIC60 E1 rails, the Vossloh W14 fastening system, prestressed concrete sleepers (type B70-W60) with 600 mm spacing and a ballast layer. We installed 11 accelerometers (PCB 356B21, three-dimensional) on the rail, as well as 4 accelerometers (Bruel & Kjaer 4514-004, one-dimensional) on the sleeper; see Fig. 15. We attached three accelerometers to the rail head, web and foot, respectively, at two locations: above the sleeper (R6, R7 and R8) and at mid-span (R2, R3 and R4).
We used two types of hammers for the hammer test: a small hammer that weighs 0.32 kg (PCB 086D05) and a big hammer that weighs 5.5 kg (PCB 086D50). Hammer impacts were applied at two locations, one on the rail top above the instrumented sleeper and the other at the center of the sleeper. For each measurement with each hammer, the impact was repeated until at least five impacts showed good coherence.
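The coherence-based screening of repeated impacts can be sketched with a standard H1 estimator. The snippet below is textbook FRF estimation for illustration, not necessarily the processing chain used in the field test; windowing and detrending details are omitted, and the synthetic signals are hypothetical.

```python
import numpy as np

def frf_and_coherence(forces, responses):
    """H1 FRF estimate and coherence from repeated impacts.
    forces, responses: arrays of shape (n_impacts, n_samples).
    H1 = S_fx / S_ff;  coherence = |S_fx|^2 / (S_ff * S_xx),
    with cross/auto spectra averaged over the repeated impacts."""
    F = np.fft.rfft(forces, axis=1)
    X = np.fft.rfft(responses, axis=1)
    S_ff = np.mean(np.abs(F) ** 2, axis=0)
    S_xx = np.mean(np.abs(X) ** 2, axis=0)
    S_fx = np.mean(np.conj(F) * X, axis=0)
    H1 = S_fx / S_ff
    coherence = np.abs(S_fx) ** 2 / (S_ff * S_xx)
    return H1, coherence

# Synthetic check: a pure gain plus a little noise; coherence should be
# near 1 wherever the response is dominated by the input force.
rng = np.random.default_rng(0)
f = rng.standard_normal((5, 1024))                    # five repeated impacts
x = 2.0 * f + 0.01 * rng.standard_normal((5, 1024))   # response = 2*force + noise
H1, coh = frf_and_coherence(f, x)
```

In practice, impacts whose coherence drops well below 1 over the frequency band of interest would be discarded before averaging.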

Feature extraction
As the GPMs are trained with the point receptance of the rail above the sleeper, we use the hammer force at impact location 1 and the measured accelerations from R6, R7 and R8 to construct the FRF curves for the feature extraction. For each hammer type, the test was repeated five times, so that 15 FRFs can be constructed with the three sensors. The results are shown in Fig. 16(a). The validity of the results from the two hammer types can be evaluated from two perspectives. The first is the coherence between different measurements. For the big hammer, the results are consistent up to about 800 Hz with small variances (see the shadings in Fig. 16(a)), whereas the results for the small hammer can be considered reliable above about 300 Hz. Note that the shadings include the signals from all three accelerometers while still maintaining small variances. This means any of the three accelerometers can be used for the feature extraction without affecting the final results. The second perspective is the ability to provide adequate energy to excite the necessary track resonances. Between 300 Hz and about 1400 Hz, the consistency of the results from the small hammer is better than that from the big hammer. However, some of the track resonances (e.g., S2, R1-a, S4) cannot be fully excited by the small hammer because of its low excitation energy. Therefore, the information from both hammers should be used between 300 Hz and 1400 Hz. As a result, we construct a new FRF curve for the feature extraction, as indicated by the dashed line in Fig. 16(a). The FRFs of the big hammer and the small hammer are used for the frequencies below 300 Hz and above 1400 Hz, respectively; in between, the average of the two results is adopted. The features of the new FRF curve are shown in Fig. 16(b).

(In Fig. 8(b), the fusion is by A3, A4, B3, B4, E3, E4 for the railpad stiffness and by A3, A4, C3, C4, D1, D2 for the ballast stiffness; see Table 5 for the details of the feature combinations.)
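The piecewise merging of the two hammers' FRF curves described above can be sketched as follows; the crossover frequencies follow the text, while the function name and the sample values are illustrative.

```python
import numpy as np

def merge_frfs(freq, frf_big, frf_small, f_lo=300.0, f_hi=1400.0):
    """Piecewise merge of two FRF magnitude curves on a common frequency
    axis (Hz): big hammer below f_lo, small hammer above f_hi, and the
    average of the two in between."""
    return np.where(freq < f_lo, frf_big,
           np.where(freq > f_hi, frf_small,
                    0.5 * (frf_big + frf_small)))

# One sample frequency per region: below 300 Hz, in between, above 1400 Hz.
freq = np.array([100.0, 800.0, 2000.0])
big = np.array([1.0, 2.0, 3.0])
small = np.array([10.0, 4.0, 5.0])
out = merge_frfs(freq, big, small)   # -> [1.0, 3.0, 5.0]
```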

Prediction results
Compared to the numerical examples, the measured FRF features contain a higher level of measurement noise. Additionally, modelling errors exist due to assumptions such as linearity and homogeneity, which might not hold in the current application. As a result, the predictions made by the noise-free models deviate from those by the noisy models; see Fig. 17. Despite this, the predictions by the noisy models are very close to those made by the baseline method; see Table 8. For real-life measurements, it is therefore recommended to use the noisy models for the predictions.
With the predicted track stiffness, we simulate the field hammer tests for the two impact locations shown in Fig. 15. Fig. 18(a) and (b) compare the calculated and measured FRFs with different accelerometers for the impact on the rail and on the sleeper, respectively. Note that the predictions of the track stiffness are made based only on the measured point FRF of the rail above the sleeper; see the first plot in Fig. 18(a). The calculated FRFs are in good agreement with the measurements, not only for the point FRFs but also for the other transfer FRFs.
It is also noted that in the measured FRFs, three distinct peaks can be observed above 1000 Hz, especially for the impact on the sleeper; see Fig. 18(c). These peaks are less damped than those below 1000 Hz. Accordingly, by reducing the ballast damping from 120 kN·s/m to 0.12 kN·s/m, we can reproduce these peaks, as shown in Fig. 18(c). This suggests that the ballast properties are frequency dependent and that the ballast damping reduces to a very low level above 1000 Hz in this case. The three peaks turn out to be three high-order sleeper bending modes.

Limitations of the proposed method
There are some limitations of the proposed method in estimating the track stiffness. First, the railpad and ballast stiffness are measured in the unloaded condition. In reality, the railpad and ballast stiffness are non-linear and increase with the preload on the track [2,13,21]. This means the track stiffness measured with the proposed method may deviate from that under train loading. In future studies, a train-borne measurement system based on ABA can be developed to measure the track stiffness under the loaded condition.
Second, the stiffness of the railpad and ballast is assumed homogeneous in the track model. This means the proposed method does not consider stiffness variations at different sleeper supports or under a single sleeper. As the influence of a dynamic wheel load extends over about five sleeper spans [55], the hammer impact used in the proposed method should have an even smaller range of influence. Therefore, the track stiffness estimated by the current method represents the averaged stiffness in the vicinity of the hammer impact location. To consider the stiffness variations, the stiffness values of the railpad and ballast would have to be defined and identified separately for each sleeper. With more track parameters to identify, using only one FRF may be insufficient, as the problem becomes under-determined. Instead of the single accelerometer used in the proposed method, a more complex instrumentation setup would then be required [4,6].

Comparisons between proposed method and previously used techniques
As discussed in Section 1, the techniques that have been used for track parameter identification in the literature are all optimization-based methods; see Table 9. These methods usually consist of three key components: a track model, one or more objective functions, and an optimization technique. However, there has been no comparative study of the performance of the different techniques, which makes the choice of a baseline method difficult. In this section, we compare the different techniques to justify the choice of the baseline model in this paper and to further demonstrate the advantages of the proposed method.

Track models
The most accurate track models are 3D solid FE models [24,38,39], as they make the fewest assumptions and simplifications. Their main drawback, however, is the high computational cost. The track model in this paper uses Timoshenko beam elements instead of solid elements to mesh the rail and sleeper, which greatly reduces the degrees of freedom of the system and is therefore much faster to solve. In terms of accuracy, we have shown in Fig. 3 that the track model used in this paper is accurate up to about 1000 Hz compared to the solid FE model, which is adequate for the current problem. The other track models listed in Table 9 further simplify the rail, the sleeper or both as rigid masses. Such simplifications result in the absence of certain track resonances from the FRF. For example, when the sleepers are modelled as rigid masses, the track resonances related to sleeper bending modes (see S1~S3 in Fig. 3 and Table 2) are not present in the FRF [25,39]. In summary, the current track model represents a good balance between accuracy and efficiency.

Objective functions
In structural identification problems, objective functions are usually based on modal parameters (i.e., natural frequencies and mode shapes) or FRFs. In the track parameter identification, mode shapes are usually not used in the objective function, as shown in Table 9, mainly because they are not easily measurable with simple instrumentations. In the baseline model used in this paper, the objective function (Eq. (16)) is based on a combination of natural frequencies and FRF features.
Note that we only make use of the FRF features instead of the whole FRF curve. This is mainly for practical reasons. First, our method is based on meta-models of the FE track model (see Section 4.4.1), which means we train one meta-model for each feature. By selecting only a few features, we keep the training cost as low as possible. Furthermore, some frequencies are redundant for the identification of the parameters; the best approach is to select a few frequencies that carry as much information as possible [56].
Moreover, we adopt a single objective function that combines the natural frequencies and the FRF magnitude features, instead of a multi-objective formulation with two objective functions that treats them separately, as in [24]. The two approaches are compared in the next section.

Optimization algorithms
For simple track models, conventional gradient-based algorithms may yield satisfactory results [21]. For more complex track models, more advanced optimization techniques are needed to deal with the multimodal and non-convex nature of the problem. We compare three commonly used evolutionary algorithms (EAs), i.e., the PSO, the GA and the multi-objective GA, in terms of both robustness and time complexity, as shown in Figs. 19 and 20, respectively. The PSO in general performs the best among the selected EAs. Only in the worst-case scenario, with only three features available in the measured FRF (see E4 in Fig. 19), does the multi-objective GA outperform the PSO. However, the multi-objective GA is also more computationally expensive than the other methods.
In addition, the proposed method is more robust than the selected EAs; see Fig. 19. Only with feature combinations A2 and A4 does the PSO perform slightly better than the proposed method. In terms of time complexity, the time required to run multiple predictions with the proposed method is only slightly larger than that for a single prediction, whereas the selected EAs take much longer for multiple predictions. This high scalability is a major advantage of the proposed method.

Conclusions
We propose a new method to directly infer railpad and ballast stiffness from a single FRF using the GPR. The robustness of the method is ensured by a careful selection of eleven FRF features as the predictors, as well as the prediction fusion strategies that automatically filter out unreliable predictions from multiple candidate GPR models. We show that including more features related to sleeper resonances and fusing the predictions by different kernels are two effective ways to reduce the prediction errors.

Fig. 19. Comparison of the accuracy (MAPE) of different optimization-based algorithms and the proposed method (GPR), for the railpad and ballast stiffness in the noise-free case. PSO is used as the baseline in this paper. See Table 5.

We compare the performance of the proposed method with a model updating method using the PSO on two synthetic datasets in a wide range of scenarios. In general, the proposed method outperforms the PSO method in terms of both accuracy and time efficiency. In the worst-case scenario, with only three available features and 5% injected noise, the average prediction errors of the proposed method for the railpad and ballast stiffness are approximately 12% and 6%, respectively. Moreover, the method is highly scalable in the sense that it enables fast predictions for large datasets.
Both the numerical and field application examples show that the candidate models trained with injected noise are more consistent with each other and therefore are more suitable for real-life applications. The field application example also shows that the proposed method is capable of extracting the stiffness values using only the point FRF of the rail above a sleeper support with a simple setup, i.e., only one accelerometer and one impact location.
In future studies, the proposed method can be applied to field measurement data for the monitoring of track conditions. Although the proposed method focuses on FRFs measured by field hammer tests, it can be extended to features measured under operational conditions, especially since the characteristic frequencies of railway tracks excited by ambient vibrations agree with the track resonances identified by hammer tests [57].

Declaration of Competing Interest
None.