Warped Gaussian processes for predicting the degradation of aerospace structures

Gaussian processes (GPs) can be used to predict future states of a system with credible intervals when considering multiple previous trajectories for training. For example, predicting the degradation of mechanical structures is one application in which they have shown their usefulness. In modeling the system output as a GP, the output is presumed to be normally distributed—assuming the predictions to be defined from negative to positive infinity. However, this assumption does not hold in many applications as, for example, crack lengths and damage indices can only assume positive values. Moreover, several degradation trajectories for training are rare in real-world applications, and the current state of a monitored system, which is used to update the prediction, can often be not directly measured. This paper presents an approach that utilizes warped GPs for treating data that is not normally distributed while considering multiple degradation trajectories for training. The approach is successfully applied to two different crack propagation examples: first, an analytically computed pre-cracked infinite plate, and second, two equally manufactured aluminum structures that resemble a lower section of a wing. For the investigated aerospace structures, we use finite element (FE) simulations to generate multiple degradation trajectories for training. To estimate their hidden degradation states, we infer the current crack length from strain measurements by using Bayesian inference. The results show that the approach of warped GPs provides more accurate predictions than using standard ones for non-normally distributed data, as is the case for crack growth problems. The approach enables quick training of warped GPs while considering multiple training trajectories. Additionally, the crack lengths estimated from strain measurements agree well with the visually inspected ones. Ultimately, the presented approach enables estimating the current and future degradation states with credible intervals that can be used to improve maintenance scheduling.


Introduction
Mechanical structures are usually used under changing load conditions and must withstand fatigue loads. Fixed inspection intervals ensure safe operation but are often too conservative, resulting in unnecessary system downtimes and increased operational costs. In order to overcome these drawbacks, new approaches consider variable inspection intervals. These include structural health monitoring (SHM) and prognostics and health management (PHM). While SHM monitors the current state of a system by processing sensor data, PHM extends this by predicting the state of the system and thus forecasting future damages. 1 This leads to benefits such as improved operational reliability and reduced operating costs. 2 Several machine learning methods can be used to predict the degradation of structures. Since recurrent neural networks (NNs) are explicitly suitable for dealing with time-series data, they are widely used for prognostics. 3 For example, Do et al. 4 utilizes recurrent NNs to predict the crack growth in structures and Malhi et al. 5 to estimate the degradation of bearings. Moreover, Heimes 6 and Peng et al. 7 used them to forecast the time to failure of turbines. However, one drawback of recurrent NNs is that they do not provide any information on their predictions' uncertainties by default, which is especially important for fatigue problems.
Support vector machines are another method that researchers frequently use. Many authors [8][9][10][11][12][13] utilized support vector machines to predict the degradation of bearings. Furthermore, Fumeo et al. 14 used them to estimate the remaining useful life of railway transportation systems. As recurrent NNs, support vector machines do not provide estimates about the uncertainty associated with the prediction by default. Yet, the fatigue life of structures, especially at the beginning of usage, is highly uncertain. Using a prediction method that does not account for uncertainties would suggest a false certainty about the fatigue life.
By contrast, Gaussian process regression (GPR) is a data-driven approach for PHM 15 which has the advantage of modeling the uncertainty of the predicted states. This is particularly important in fatigue mechanics since fatigue-induced damage is highly uncertain. 16 Even though there exist methods for leveraging recurrent NNs and support vector machines to predict their outputs with credible intervals (e.g. mixture density networks or relevance vector machines), Gaussian processes (GPs) describe distributions of trajectories rather than points and thus enable predicting the entire degradation process instead of merely the fatigue life. This allows us to estimate the probability of failure for each future time point. A GP is defined by a mean and covariance function 17,18 that greatly influences prediction accuracy. In order to improve the selection of these functions, Pfingstl and Zimmermann 19 introduced an approach to infer them from previously collected trajectories. Yet, a major disadvantage of GPs is that they presume the data to be normally distributed, thus assuming the predictions to be defined from negative to positive infinity.

Contribution
In many applications, presuming normally distributed data is not valid as, for example, crack lengths and damage indices can only assume positive values. Moreover, training data such as degradation trajectories of aircraft structures is often rare in real-world applications, and the state of the system, for example, the crack length, is hidden. In order to improve and utilize GPs for PHM problems, we (1) transform the data before training the GP, (2) incorporate prior knowledge by using analytical equations and results from finite element (FE) analyses, and (3) infer the hidden state of the system from sensor data by applying Bayesian inference. The proposed approach is used to predict the crack growth in an infinite plate and in two similar aerospace structures resembling a lower part of an aircraft wing. Even though it is currently common practice in the aviation industry to repair a crack as soon as it is detected, in the future, with a fully operating SHM system, this might become too expensive. Predicting the crack growth makes it possible to wait until either multiple parts require repair or structural integrity can no longer be assured. Current requirements are primarily based on NDI/visual inspections where the aircraft is already grounded, and the location can be accessed rather easily once the engineer has found the crack. This might change when reliable SHM systems are demonstrated. The proposed approach is not limited to predicting crack lengths. Predicting other damage indices is also possible.

Definition of terms
In the course of this work, we distinguish between previous and current data. In this paper, previous data consists of several trajectories describing the degradation of a mechanical system. These trajectories can be gathered by executing simulations or experiments. By contrast, current data represents the data of the currently monitored mechanical system for which degradation is to be predicted. Furthermore, we use the term degradation to describe the fatigue process in mechanical systems. In this paper, we use this term particularly for propagating fatigue cracks.

Gaussian processes
Over the last decade, many researchers have frequently been using GPR for nonlinear regression, see Reference, 20-32 as the model provides not only the predicted function value itself but also its credible interval. A GP is fully defined by its mean and covariance functions, m y, u (x) and k y, u (x, x 0 ), with where usually predefined mean and covariance functions with some free parameters u are used. These parameters are optimized by minimizing the negative log-likelihood (NLL) in order to fit the model to the training data.

Basis functions
In order to use GPs for PHM, an approach based on several previously collected trajectories was recently published by Pfingstl and Zimmermann. 19 By using p linearly independent basis functions f y (x) = ½f (1) y , :::, f (p) y > with k 2 1, :::, p to fit all m previously collected trajectories fx (j) , y (j) g with j 2 1, :::, m the GP can be expressed as wherem b andŜ b are the sample mean vector and the sample covariance matrix of the fitting coefficients b (j) , respectively. The fitting coefficients b (j) can be determined by applying linear regression. Therefore, no free parameters u of the mean and covariance function have to be optimized by minimizing the NLL. This turns the, in general, non-convex optimization problem into a convex one and thus significantly reduces the training time, see Reference 19 for more details. The approach first tries to describe all possible functions by a GP. Second, after determining the mean and covariance function, the GP can be updated by computing the conditional distribution based on currently measured data, that is, data describing the current state of the monitored system. When the mean function m y (x) and the covariance function k y (x, x 0 ) are defined, and we have measured the data y ( + ) (e.g., crack lengths) at the locations x ( + ) (e.g., numbers of cycles), we can compute the conditional distribution (e.g., over the crack length) at the location x (e.g., a future number of cycles) by with m yj + (x) = m y (x) + k y (x, x ( + ) ) ðk y (x ( + ) , x ( + ) ) + s 2 y IÞ À1 y ( + ) À m y (x ( + ) ) À Á ð7Þ where s y is the observation error. The conditional distribution described by the conditional mean m yj + (x) and variance k yj + (x, x) is then our prediction at x. We can, of course, evaluate the conditional distribution at multiple locations x, allowing us to rather predict the entire future degradation trajectory than just a single point in future. By splitting the approach into a training and an updating process, the method enables predicting the degradation trajectory rapidly since only the conditional distribution (Equations (7) and (8)) must be evaluated.

Warped GPs
By modeling the system output with a GP, the output is assumed to be normally distributed-presuming it to be defined from negative to positive infinity. This assumption, however, is often not valid as, for example, crack lengths, damage indices, sunlight time, etc., are only defined in the positive domain. For a better treatment of data that is not normally distributed, Snelson et al. 33 introduced warped GPs. The idea is to transform the observed space y onto a latent space z by a so-called warping function c u (y) = z which usually has some free parameters u. The GP is then modeled in the latent space z as where m z, u (x) and k z, u (x, x 0 ) are the mean and covariance function in the latent space z, respectively. For predicting the function values of the observed space y, the predictions of z have to be transformed by the inverse warping function c À1 u (z). Some researchers have already used warped GPs, for example, to predict the power supplies of wind turbines. In contrast to wind speeds, power supplies cannot be assumed to be normally distributed due to the nonlinear correlation between wind speed and power. 34 Therefore, the authors of [35][36][37] utilized warped GPs to predict the power supplies of wind turbines. They used a sum of tanh as their warping function and proved the approach's usefulness on real data. Moreover, Mateo-Sanchis et al. 38 applied warped GPs to oceanic content data. They predicted the oceanic chlorophyll content from multispectral data and concluded that warped GPs outperform standard GPs. Again, a sum of tanh as the warping function was used.
However, if the inverse warping function is not available in closed form, which is the case in the beforementioned papers, additional complexity arises from numerical approximations. 39 Therefore, one can use, for example, the Box-Cox transformation 40 as the warping function. For example, Rios and Tobar 39 utilized a slightly adjusted type of the Box-Cox transformation function and showed its effectiveness on real data, enforcing their predicted yearly sunspot numbers to be strictly positive. A similar application of warped GPs was presented by Gonc xalves et al. 41 The authors estimated future sunspot numbers and enforced their predictions to be positive by using an integrated softplus function in their warping function.
Even though these studies show that warped GPs are able to deal with non-normally distributed data, researchers tend to use them rarely. This might be because the warping function introduces additional parameters to the modeling task. These parameters must be optimized in addition to the mean and covariance function parameters. Therefore, one entailed problem is the arising computational complexity by determining not only the mean and covariance function parameters but also the ones of the warping function. Another problem is that after introducing a warping function, an optimizer might find, in fact, a different solution. But the result is not necessarily better, as minimizing the NLL is, in general, a non-convex optimization problem. In order to remove the mean and covariance function parameters, we can derive these functions from previously gathered trajectories. In this way, we significantly reduce the computational complexity and additionally integrate prior knowledge into warped GPs.

Approach
In order to exploit the advantages of warped GPs and the use of basis functions for inferring the GP model quickly from previously collected data, both approaches are combined in the following. For doing so, it is assumed that the warped realizations f (j) z (x) can be approximated by a linear combination of p linearly independent basis functions f z (x) = ½f (1) z , :::, f (p) z > in the latent space z. The warped trajectories fx (j) , c u (y (j) ) = z (j) g can therefore be represented by a weighted sum of basis functions as and the GP in the latent space as Note that the GP model is independent of any free parameter u. However, by introducing a warping function, free parameters, which need to be optimized, are integrated in the formulation. In order to determine the free parameters, the NLL in the observed space p y is minimized. If the warping function is strictly monotonic, the optimization problem can be stated as with the given probability density function in the latent space where n j is the number of data points of trajectory j, and The gradient with respect to the parameter u q is given by By transforming the observed data into the latent space, the observation error is warped too. A constant observation error in the observed space s y can be approximated in the latent space by This approximation is particularly accurate if the observation error s y and the second derivative of the warping function at y are rather small. Considering this nonconstant observation error in the latent space, a weighted least squares regression is applied to determine the weights of the basis functions and Now, the mean and covariance function can be determined with Equations (20)-(22) (fitting coefficients) and 12-13 (mean and covariance function) within every iteration step of the warping function parameter optimization. The result of the presented approach is an optimized warping function and a determined GP in the latent space. The observation error s y can be approximated bŷ Afterwards, the conditional distribution in the latent space can be computed by substituting m y (x), k y (x, x 0 ), y ( + ) and s 2 y I for m z (x), k z (x, x 0 ), z ( + ) and diag s 2 z (y (j) ) in Equations (7) and (8).

Application to an infinite plate
As a first example, we examine the proposed method on an academic example, a pre-cracked infinite plate, for which the underlying formulas and solutions are known.

Data generation
Considering an infinite plate with a centered crack, the range of the stress intensity factor (SIF) DK I can be expressed by where a is the crack length and Ds ' the remotely applied stress range. With Paris' law where C and a are material parameters, the crack growth rate da=dN can be quantified. With the initial condition N 0 = 0 and an initial crack length a 0 , the differential equation in Equation (25) can be analytically solved by where N is the number of cycles. For this example, 50 trajectories are simulated by setting Ds ' = 48:26 MPa, a 0 = 9310 À3 m, a = 2:9, and sampling C from a normal distribution with m C = 8:7096310 À11 and s C = 1:519310 À11 (C with ½da=dN = m=cycle and ½DK I = MPa ffiffiffi ffi m p ) for each trajectory. Often, researchers assume C to follow a log-normal distribution. A lognormal distribution approaches a normal distribution for m C s C ! '. In our case, m C s C = 5:73 which approximately leads to a normal distribution. This is why we can assume C to be normally distributed. In order to represent the measurement noise, an observation error s y ;N (0, 0:16310 À6 m 2 ) is added to the computed crack lengths. Figure 1(a) displays the crack growth trajectories already split into training and test data.

Training of GPs
First, a GP without the use of a warping function is trained on 35 trajectories of the simulation data representing previously collected data. A set of polynomial basis functions f y (x) with orders 0 to 5 is chosen, resulting in a polynomial mean and covariance function of order 5, see Equations (2)-(4). The determined GP with its mean function and symmetric 95% credible region is shown in Figure 1(b). The figure reveals that the credible region assumes negative values, which is nonphysical since crack lengths can only be zero or positive.
In the second case, the warping function in Equation (10) is considered. The free parameter u 1 is optimized with respect to Equation (15). In this case, two polynomial basis functions in the latent space f z (x) with orders 0 and 1 are chosen since the inner part of Equation (26) represents a GP with straight lines. This is because the parameters a, Ds ' and a 0 are constants, N is the variable, and C is normally distributed. The optimization of the warping function should therefore lead to the inverse of the outer exponent such that the latent space is modeled by a GP with straight lines. In this example, the optimizer leads to a value ofû 1 = À 0:4489 with a relative error of 0.25%, which is a close approximation of the optimal solution. The difference results from approximating the observation error in the latent space with Equation (19). Considering the trajectories without the added noise leads to the analytical solution.
The resulting GP in the latent space and the corresponding warped GP, which is mapped to the observed space by the inverse warping function, are shown in Figures 1(c) and (d), respectively. Figure 1(d) shows that the credible region of the warped GP assumes only positive values. Figure 1(c) additionally shows that the combination of the warping function and the choice of polynomial basis functions with orders 0 and 1 leads to the desired solution. During the optimization process, the optimizer tries, on the one hand, to warp the data such that the trajectories follow a normal distribution in the latent space. On the other hand, the optimization tries to straighten the simulated data in the latent space to fit the presumed basis functions. Figure 2 shows another effect related to the transformation, which we mentioned at the end of the Approach section. The constant observation error s y , see Figure 2(a), is warped too and varies over N in the latent space, see Figure 2(b). For higher cycle numbers, the error s z decreases. The proposed approach takes this into account by applying weighted least squares regression and approximating the warped observation error s z according to Equation (19).

Condition GPs on current crack length data
In the next step, we gradually condition the GPs on currently observed data to update the prediction for an unseen trajectory according to Equations (6)- (8). This is done for the entire test set, even though only one test trajectory is depicted here. Figure 3 shows the GP's prediction of the longest test trajectory (indicated in red) conditioned on currently observed data. Since the monitoring process is continuous, not all measurements are available from the beginning. The currently available ones are shown in black. In both cases (standard and warped), the credible regions narrow down with more available current data. Figure 3(a) reveals that the standard GP initially updates the mean function so that it becomes negative. This is because the current data has a relatively flat trend which, in combination with the assumed basis functions, results in such predictions. By comparison, the warped GP in Figure 3(b) shows strictly positive predictions. The warped GP is conditioned in the latent space, and the results are afterwards mapped to the observed space by the inverse of the warping function, leading to strictly positive predictions.
In order to compare the two models, the mean negative log-likelihood (MNLL), the mean absolute error (MAE), and the root mean squared error (RMSE) of the test set are quantified. The crack lengths predicted at the trajectory's last number of cycles for each conditioning step are compared to the latest realized crack length for all metrics. The results are listed in Table 1. For all metrics, the warped GP performs better than the standard GP as it predicts strictly positive crack lengths and resembles the optimal solution (GP with straight lines in the latent space with u Ã 1 = À 0:45) closely. The advantage of warped GPs is particularly apparent for long trajectories. One reason for this is   that long trajectories have a relatively small slope at the beginning.
As we have seen, utilizing the Box-Cox transformation for warping our data leads to approximations close to the analytical solution for the infinite plate example. However, using this exact type of transformation function results in imaginary numbers if u 1 is not an integer and y is negative. Additionally, if u 1 is an even number, the warping function becomes non-monotonic over y 2 R, violating our assumption.
As presented earlier, we first compute our predictions in the latent space and second map them by the inverse warping function to the observed space. Since the inverse of the Box-Cox transformation reads we must also ensure that z ø À 1 u 1 for u 1 .0 and z< À 1 for u 1 \0 (both for u 1 6 ¼ 0 and u 1 6 ¼ 1). In the infinite plate example, we do not encounter any of the beforementioned problems as our data is strictly positive (y.0) and our predictions in the latent space comply with the requirement z< À 1 u 1 . These constraints limit the applicability of this specific type of Box-Cox transformation. Since the Box-Cox transformation can be useful even in situations where no power transformation can produce normality exactly, 42 it is worth adjusting it. Bickel and Doksum 43 presented the modified Box-Cox transformation function which was also used by Rios and Tobar. 39 This modification leads to strictly monotonic warping functions over y 2 R also if u 1 becomes an even number. Yet, for u 1 \0, the modified Box-Cox transformation function results in a discontinuity at y = 0. Therefore, we must constrain u 1 to be positive, ensuring our warping function complies with the assumptions made for our approach. Ultimately, the infinite plate example reveals the great predictive capabilities of warped GPs. Since the GP is modeled in a latent space, the warping approach is more flexible than standard GPs. The approach enables the prediction of non-normally distributed trajectories, which is especially useful for PHM problems due to their non-negative nature. By contrast, since standard GPs rely on a normal distribution, the possible functions of their conditional distribution can become negative. Yet, one drawback of warped GPs is that the parameters of the warping function need to be optimized, leading to greater computational effort.

Application to an aerospace structure
The method proposed in the Approach section is also applied to predict the crack growth in an aluminum panel that resembles a lower section of a civil aircraft wing. The structure is made of the aluminum alloy Al 2024-T351 with Young's modulus of E = 72, 000 MPa and a Poisson's ratio of n = 0:34. It is 1,920 mm long and 570 mm wide and has an elliptical armhole in its center with a length of 135 mm and a width of 75 mm. Around the armhole are 16 holes with a diameter of 4 mm. The specimen is shown in Figure 4(a).
The armhole in the center of the structure is usually covered by a lid that is fixed on the smaller holes around it. The armhole allows reaching into the wing to inspect the inner part of the structure with an endoscope. This aerospace structure is prone to fatigue   CAD: computer-aided design cracks, and it is used to showcase the developed method on warped GPs. The present section is divided into five parts: First, we describe the experimental setup of the executed fatigue tests. Second, the simulation of multiple crack growth trajectories for generating training data is explained. Third, the generated trajectories are used for training a warped GP according to the proposed method. Fourth, we show how to estimate the current state of the structure, that is, the current crack length, based on measured strain data. And fifth, future crack lengths are predicted by using the trained warped GP and the estimated current crack lengths.

Experimental setup
During the fatigue test of the aerospace structure, the load was applied by a hydraulic cylinder at an angle in order to represent the shear stresses in a wing resulting from twisting, see Figure 4(b). The loading program is based on four different flight types, A, B, C, and D, to realistically simulate the forces acting on an aircraft wing, see Figure 5.
They consist of 230, 190, 114, and 146 load steps, respectively. The flight types are ordered randomly for the first 100 flights and repeated consecutively afterwards. The occurrences of the different flight types, A, B, C, and D, in the first 100 flights are 55, 15, 20, and 10 times, respectively.
Two equally manufactured specimens, P02T01 and P03T01, were tested with the same loading program. Several sensors were attached to the panel to monitor the structures. They were predominantly positioned according to the method described in Reference. 44,45 According to this method, the change of strain regions due to different possible cracks are evaluated with FE analyses. Based on a certain requirement for the change of strain that can be detected by the applied sensors, the positions that satisfy this requirement for every possible crack are determined. Therefore, the method ensures that the deployed sensors are able to detect the occurring cracks. In total, two single strain gauges and three strain rosettes were attached to specimen P02T01 and two single strain gauges and four strain rosettes to specimen P03T01, see Figures 6(a) and (b). For each flight, the strains were measured at every load step.
The experiments were run until final fracture of the structure, see Figure 7. During the experiment, the structure was regularly inspected by test engineers in order to detect cracks and measure their lengths.

Generation of training data
The method proposed in the Approach section enables integrating prior knowledge from previously collected degradation trajectories into GPs. As large structural fatigue tests are usually carried out only once, previously measured degradation trajectories are missing. However, due to analytical equations and FE analyses, much of the degradation behavior is understood a priori. In order to integrate this knowledge into GPs, degradation trajectories are gathered by conducting virtual simulations. First, the fatigue life and second, the crack growth are computed to simulate the   structure's degradation. The entire simulation process is schematically shown in Figure 8.
Fatigue life. First, an FE analysis is carried out to quantify the stresses in the structure. Two local hot spots are found at the small holes 5 and 6 on the side toward the armhole. In the following, we assume the crack to start at hole 6. However, the methodology can also be extended to consider cracks starting from multiple spots by modeling a mixture of GPs. By applying rainflow counting and the Haigh diagram to the computed stress at hole 6, the stresses for the entire loading program can be mapped to amplitude stress blocks with a constant stress ratio of R = À 1:0. Using Miner's linear damage accumulation rule and the 50% S-N curve corresponding to the structure's material, the median number of flights after which a crack will occur can be determined. Note that we denote the number of flights as N f . In order to quantify the uncertainty of the crack initiation, a material corresponding scatter parameter s = 0:197 of the S-N curve from Reference 46 is used. The resulting log10 normal distribution and the S-N curve are shown in Figure 9. Now, the number of flights for an initial crack N f, 0 can be sampled from this distribution.
Crack growth. Second, the crack growth in the structure is evaluated. To quantify the relationship between the SIF and the crack length a, multiple crack computations are evaluated using the extended finite element method (XFEM). [47][48][49] The crack is assumed to first propagate toward the armhole (crack length a 1 ) and then toward the edge of the structure (crack length a 2 ), see Figure 10.
In total, 382 XFEM computations (static analyses), see Figure 11(a), with different crack lengths are evaluated to quantify the relationship between a and K I . Figure 11(b) reveals a scatter of the simulated data mainly toward longer crack lengths (a.30mm). The scatter results from numerical errors due to larger elements further away from the armhole. As we do not want to consider these errors, we model only the general relationship and not the scatter/distribution of the data points. Two separate NNs, for a 1 and a 2 , with  two and four neurons (one hidden layer) are trained to map the crack length onto the SIF K I (a), see Figure  11(b).
In this paper, we define the total crack length a as a = a 1 , i fa\10mm As Pfingstl et al., 44 an initial crack length of a 0 = 0:635mm which is, according to Ryschkewitsch, 50 the smallest crack length detectable by eddy current testing, is assumed. The crack growth can then be computed by the Paris law of Equation (25) with In order to reduce computational time, the crack growth computations are simplified by applying rainflow counting 51 to the load steps of 100 flights to compute the load ranges and using 1=100 of their frequencies for each flight. Furthermore, we assume K I, min = 0 and that the crack length is constant during one flight. According to Virkler et al., 52 crack growth is subject to great uncertainties. Therefore, C is assumed to be a random variable that is normally distributed with m C = 8:7096310 À11 and s C = 6:5680310 À12 (determined from the crack growth data published by Virkler et al. 52 with C with ½da=dN = m=cycle and ½DK I = MPa ffiffiffi ffi m p ), and the material parameter a is set to a = 2:9 according to Spencer et al. 53 Moreover, the error of the load range is varied according to a normal distribution with s dF = 5:0% for each trajectory. For computing different crack growth trajectories, a set of parameters (N f, 0 , C, dF) is sampled for each trajectory. Figure 12 shows the computed degradation trajectories. Note that the step in each trajectory results from the two different cracks, a 1 and a 2 , and the added diameter of the hole after crack 1 reaches the armhole. We can also see the huge scatter of the initial starting points of the trajectories that results from the S-N curve distribution shown in Figure 9. Furthermore, the curvatures of the trajectories vary due to different values for C and DF. Figure 12 also shows that no observation error is artificially added to the simulation data.

Training of GP
A warped GP can now be trained on this simulation data set according to the method proposed in the Approach section. By using simulation data for training, however, one has to be sure that the simulations are correct. When there are doubts about whether the simulations represent the experiment well, they should be verified based on experiments before using them as the training set.
For this example, the modified Box-Cox transformation of Equation (28) is used as the warping function and a polynomial of degrees 0 and 1 (intersection and slope) as basis functions. Therefore, the optimizer tries again to achieve not only a normal distribution in the latent space but also straight lines. The optimized  solution isû 1 = 4:46310 À8 which is close to a log transformation. And for a log transformation, the distribution in the observed space is completely defined on the positive domain. Thus, the predicted crack lengths will stay positive after transferring the values from the latent space to the observed space. Moreover, by using polynomial basis functions of degrees 0 and 1, we ensure that the resulting predictions are strictly monotonic in the observed space. Since no observation error is present in the simulation data, a non-weighted least squares regression is applied to determine the basis functions' weights. The modeling error in the latent space s z, m is assumed to be constant and is approximated by taking the square root of the average of all squared residuals in the latent space. Figures 13(a) and (b) show the training data and the determined warped GP in the observed and the latent space. It can be seen that the trajectories are almost straight lines in the latent space. Moreover, the mean function and the credible region in the observed space assume only positive values which is in agreement with the physics. Figure 13(a) depicts the prediction before any data of the monitored structure is available.

Estimation of current states
In order to update the trained GP, the current state of the system has to be observed. As the present GP is based on crack growth data, the current crack length must be determined. In the present study, the applied strain gauges are used to determine the current crack length. With Bayes law, the crack length a can be inferred from the measured strain e SG by where p(a) is the prior distribution and p(e SG ja) the likelihood. For measuring multiple strains and assuming them to be measured independently, the probability density function of e SG for s applied strain gauges given a becomes Additionally assuming the prior p(a) to be uniformly distributed within a 0 and a c , the probability density function of a given all strain measurements becomes In order to be able to use Equation (34), the likelihoods p(e (l) SG ja) with l 2 1, :::, s that incorporate how the measured data e SG depends on the current crack length a have to be known. Therefore, the strains of all XFEM computations (see the subsection Generation of training data) are evaluated for each sensor position to quantify the relationship between the crack lengths and the strains. Then, the results are used to fit an NN for each strain gauge position. Since we only consider strictly monotonic strain gauges, in total, seven NNs e (l) NN (a) are trained. By assuming a normally distributed measurement error of s e = 200 mm=m, the likelihoods p(e (l) SG ja) are completely defined and can be evaluated.
In order to cancel out the bias term which might emerge from the difference between the FE analysis and the real measurement, only the relative change of strain due to a crack  The figures also show that if the crack grows, the strain becomes smaller at the position of sensor Q3A1 and larger at Q3A2. Furthermore, Q3A1 indicates a very sensitive behavior for small cracks and Q3A2 for larger ones. This can be explained by looking at Figure 6(b). The figure shows that the crack occurs close to position Q3A1 and cuts the load path so that smaller strains are measured. By contrast, position Q3A2 lies on the opposite side of the crack. If the crack starts growing, no significant change of the strain is measured. However, once the crack is long enough, the load path is shifted to the side of position Q3A2, increasing the measured strain.
The relative change of strain for every flight is computed by the relative change of the strain-force slope shown in Figure 15. While the change of the slope for position Q3A1 can be well distinguished for smaller numbers of flights (smaller cracks), position Q3A2 is better for indicating the crack lengths at larger numbers of flights (larger cracks), see Figure 15 which is used instead of p(e (l) SG ja). The current crack length is determined bŷ and its variance by  show the crack lengths inferred from the measured strains and the corresponding crack lengths visually inspected by test engineers for both specimens. It can be seen that the inferred crack lengths for P03T01 closely match the inspected ones (R 2 = 0:926), whereas the match of the P02T01 trajectory is not so close (R 2 = 0:656). A big step in the inspected data of specimen P02T01 can be seen. It is likely that the test engineers did not detect the crack on the outer part of the small hole straightaway.
As the P03T01 specimen was tested after P02T01, the test engineers were already familiar with the type of structure and were, therefore, able to measure the crack lengths more accurately. Still, both crack growth behaviors are very similar to the inspected ones, and are close to the simulations in terms of locations, numbers, and crack growth rate.

Prediction of future states
Although Figures 16(a) and (b) show the entire trajectories of the inferred crack lengths, during the test, the crack length is only partially known, that is, from N f = 0 up to the current number of flights. After each new flight, the current crack length and its uncertainty can be determined and used to compute the conditional GP leading to an updated prediction. For doing so, all inferred crack lengthsâ and their estimated observation errorsŝ a up to the current flight cycle are transformed to the latent space using Equations (10) and (19), respectively. In this case, the total squared error in the latent space s 2 z is assumed to be the sum of the squared modeling error s 2 z, m and the transformed squared observation error s 2 z, a (s 2 z = s 2 z, a + s 2 z, m ). Furthermore, crack sizes smaller than the initial crack length of 0:635 mm are ignored. Then, the conditional GP is computed in the latent space using Equations (6)- (8).
The updated prediction is transformed to the observed space using the inverse warping function. Figures 17(a) to (h) show the updated predictions for specimens P02T01 and P03T01 at different time states. Initially, the GP's prediction is entirely based on the knowledge gained from analytical equations and FE analyses. Once a crack length greater than 0:635 mm is inferred from the strain data, the mean function starts to change and the credible region narrows down, leading to a more accurate and precise prediction. Since the GP is defined by a set of polynomial basis functions with orders 0 and 1, the step due to the two different crack regimes is not apparent in the prediction. As in the infinite plate example, the predictions are strictly positive again, which complies with the physics.

Warped GPs
As demonstrated for an infinite plate and an aerospace structure, the proposed approach on warped GPs can even handle data that is not normally distributed. In both cases, introducing a warping function leads to predictions that assume strictly positive values. This is in agreement with the physics since crack lengths can only be positive. The approach reproduces the analytical solution for problems without an observation error and leads to a close approximation (\0:5%) for cases where an observation error is present. However, by using warped GPs, free parameters u are introduced. These need to be optimized by minimizing the NLL, which is, in general, a non-convex optimization problem requiring increased computational effort. In contrast to existing warped GP approaches, no GP parameters of the mean and covariance function need to be optimized by minimizing the NLL since the GP is determined by solving a linear regression problem. By modeling the GP with a weighted sum of basis functions therefore allows integrating prior knowledge quickly into GPs.

Simulations
Prior knowledge in the form of degradation trajectories is often rare, especially if the mechanical system is large and expensive. Using analytical and FE-based simulations can produce valuable information that can be incorporated into GPs by the presented approach, see section Application to an aerospace structure. By splitting the approach into two parts, (1) training a GP on previous data and (2) computing the conditional distribution for updating the prediction based on currently monitored data, allows rapid predictions that may be evaluated online. Nevertheless, incorrectly executed simulations resulting, for example, from the use of incorrect parameters, can lead to weak predictions. Therefore, the simulation and its parameters, as well as their uncertainties, should be well known. To avoid overconfidence, it is better to assume overly large variances than variances that are too small since there might be sources of uncertainties that are not known in advance.

Hidden state of the system
After the GP is defined by its mean function and covariance function, it can be conditioned on current data. The current state of the system, however, is often hidden. Therefore, the present paper shows how to infer the crack length from strain data. Based on the coefficient of determination, the resulting crack lengths using Bayesian inference match the crack lengths measured during inspections with R 2 = 0:656 and R 2 = 0:926 accuracy for the first and second aerospace specimens, respectively. The approach enables continuous monitoring of the crack length and its uncertainty. By using this information, the GP's predictions can be continuously updated.

Conclusion
The present paper proposes a PHM algorithm that is based on GPs. It is successfully applied to an infinite plate and an aluminum aerospace structure in order to predict crack growth. The established model predicts not only the crack length for every future time step but also its credible intervals. By describing the crack length as a random variable, different credible regions, for example, a 99% or 99:999% region, can be computed, which allows the user to ensure different levels of safety. The proposed algorithm is based on warped GPs. Using warped GPs reduced the MAE by 53.6% and the MNLL by 32.2% for the infinite plate example investigated in this paper. It also leads to strictly positive predictions for crack lengths, which is in accordance with the physics. Furthermore, the proposed approach can quickly integrate prior knowledge in the form of several previously generated degradation trajectories. Since prior knowledge is often only available in terms of analytical equations and FE analyses, the present paper shows how to generate them for an aerospace structure prone to fatigue cracks. In general, the simulations must be trustworthy and might be verified before using them. Therefore, the approach is currently limited to crack growth predictions for isotropic materials. In future research work, it would be interesting to apply a similar approach to composite structures.
In order to update the GP's prediction, the current crack lengths are inferred from strain data, which agree well with the visually inspected ones. Ultimately, the estimated and predicted crack lengths can be used to compute the probability of failure and therefore to better schedule maintenance tasks.