Long-term fatigue estimation on offshore wind turbines interface loads through loss function physics-guided learning of neural networks

Offshore wind turbines are exposed during their serviceable lifetime to a wide range of loads from aero-, hydro-and structural dynamics. This complex loading scenario will have an impact on the lifetime of the asset, with fatigue remaining the key structural design driver for the substructure, e


Motivation
Recent years have seen a surge in the demand for wind energy as societies attempt to transition to sustainable energy sources. To meet the promised lowered costs, more cost-effective designs are being targeted. These cost reductions are in part achieved through the increased average wind turbine size (the average power rating of ordered offshore turbines rose from 4.8 MW in 2016 to 11.2 MW by 2021 [1,2]). In parallel, recent insights have identified (unintended) conservatism in older offshore wind farms, and updated design procedures (e.g. PISA [3]) have allowed for more accurate predictions of structural behaviour.
While offshore wind turbine design has evolved considerably, fatigue has remained a key design driver [4]. During design, substructure dimensions are optimized so that the fatigue life matches the project's intended lifetime (typically 20-25 years) as closely as possible. However, in older designs, in particular those now reaching half of their intended life, a structural reserve has been found. Real-world measurements suggest that fatigue life consumption is lower than that computed during design [5]. Motivated by a quest for greater cost-effectiveness, this observation can be used to optimize operation and maintenance (O&M), which amounts to nearly a third of global costs [6]. It has been demonstrated that the initial capital expenditure required for structural health monitoring (SHM) is quickly offset by the reduction in operational expenditures [7], complemented with the possibility of lifetime extensions or optimized usage.
In more recent projects this structural reserve has seemingly diminished as the design codes were optimized. As a result, operators want to keep track of the fatigue progression of their assets to ensure that the intended project life can be reached. This holds in particular when external factors interfere with the normal operation of the assets, e.g. shutting down a turbine to meet demands from the transmission system operator, which might be detrimental to the expected fatigue life. In the absence of any reserve, monitoring fatigue life has become an operational concern.
It should, however, be noted that fatigue monitoring is not a substitute for, but rather a complement to, fault diagnosis/prognosis [8,9]: structural fatigue health monitoring allows operators to make long-term decisions regarding their assets' predicted life, whereas fault prognosis helps operators identify problematic components and plan maintenance ahead of time.

State-of-the-art in fatigue monitoring
The accurate monitoring of the remaining useful lifetime (RUL) for every turbine within a given wind farm can thus help wind farm operators of both older and newer farms to take informed decisions regarding the operation of their assets.
Historically, this assessment has been performed by monitoring the assets, either through physical inspections [10] or through strain gauges which are part of an SHM system, installed at the substructure's interface with the turbine and capable of capturing the turbine's fatigue load history [11][12][13]. However, farm-wide physical inspections are costly and prone to workplace accidents. Strain gauges, in turn, remain labour intensive to install and prone to failure within a wind turbine's operational life. Installing and maintaining strain gauges on the substructure of every wind turbine is considered cost-ineffective. As a consequence, real-world examples known to the authors have fewer than 10% of the turbines in a farm equipped with an SHM system which includes strain gauges.
Several researchers have therefore previously suggested relying instead on the data obtained from the supervisory control and data acquisition (SCADA) system installed in every turbine [14][15][16]. Their strategy typically consists of training models to predict, from SCADA data, the fatigue loads measured on turbines equipped with an SHM system. As SCADA is available on every turbine, one can use said models to estimate fatigue life at every location [17,18]. A comparative study between different sensor setups for fatigue estimation, considering varying quality of SCADA and acceleration data, is performed in [18]; the results show a strong improvement in the ability to estimate fatigue rates when tower acceleration data was considered. This result may come as no surprise, as both tower accelerations and fatigue cycles follow the same equations of motion of the offshore wind turbine (OWT) [19]. Currently, the cost of a farm-wide deployment of high-quality accelerometers has dropped far below that of a farm-wide deployment of strain gauges or inspection-based monitoring. Moreover, as accelerometers are a reliable technology, they do not require periodic inspections carried out by technicians (reducing the probability of workplace accidents).
In parallel, an increased use of machine learning (ML) algorithms and other data-driven techniques has been noted, producing promising results. These have included: proofs of the reliability of data-driven fatigue estimators serving the entirety of the turbine's operational life [20]; the quality of SCADA-based artificial neural network (ANN) models in different flow conditions for blade flap- and edgewise bending moment fatigue load estimation [21,22]; the use of ANNs for blade root flapwise fatigue loads and their connection to turbine failure [23]; Gaussian process regression time-series modelling to evaluate the influence that so-called EOPs (Environmental and Operational Parameters) have on the features of the vibration response of wind turbine blades [24]; the use of conditional variational auto-encoders to estimate the probability distribution of the accumulated fatigue on the root cross-section of a simulated wind turbine blade, enabling long-term probabilistic deterioration predictions based on historic SCADA data [25,26], or of graph neural networks [27]; the use of long-term SCADA data from onshore wind turbines to estimate the tower fore-aft bending moment [28]; the use of Gaussian processes for damage detection [29]; and the use of SCADA and acceleration data to predict the fatigue loads at the tower-transition piece interface on a ten-minute level, providing insight into feature selection, the performance of different sensor setups and a tentative farm-wide deployment [18].
A caveat of the aforementioned research is that the predictive quality of the models is typically assessed on a ten-minute basis, i.e. model quality is evaluated by the ability to predict ten-minute damage rates. However, in offshore wind, the evaluation of fatigue life is not primarily performed on a ten-minute basis, but rather by assessing the total accumulation of fatigue damage over the project life and quantifying the residual life [5,30]. The present contribution will show that the ability to predict ten-minute fatigue rates well guarantees neither an accurate nor a conservative outcome when these predictions are accumulated into a fatigue life estimation. To resolve this caveat, the present contribution will introduce physics-informed machine learning models (Φ-ML) [31] to significantly improve the accuracy of fleet-wide fatigue life predictions by incorporating damage accumulation into the ML model itself. Simultaneously, conservatism is introduced into the predictions by using the properties of the logarithmic function.
For this objective, data collected from XL monopile foundations is utilized; this represents a radical departure in terms of the structure's dynamics when compared to smaller monopiles or jackets, with an increased importance of side-to-side loading and loading under idling conditions [32,33]. As such, an added focus will be given to the performance of the proposed methodology in diverse operational conditions.

Physics-informed machine learning
The recent increase in monitoring data availability has led to a predominance of data-driven approaches in SHM. However, the inability of these approaches to properly extrapolate, or to explicitly take into consideration specific physical relationships of the underlying problem, limits their applicability. This has led to the rise of physics-guided machine learning (Φ-ML) approaches.
In Φ-ML, hybrid models are generated which are not completely data-driven and include some underlying physical knowledge of the problem at hand. As well as purely data-driven models may fit the training data, some predictions may be physically nonsensical or inconsistent. This can be due to extrapolation (for which ML is not usually adequate) or observational biases that may lead to poor generalization [31]. Therefore, a drive towards the integration of fundamental physical laws and domain knowledge has recently risen, wherein ML models are taught the governing physical laws, which provide 'informative priors' in the form of strong theoretical constraints and inductive biases on top of the observational ones [31].
More generally, one can understand Φ-ML as corresponding to a series of methods halfway in the spectrum between 'white-box models' (purely physics-based models, such as partial differential equations or finite element methods) and 'black-box models' (where the model structure is purely data-driven). Therefore, several authors have spoken of 'grey-box' or hybrid models [34].
As mentioned, there is a spectrum of approaches between physics-principled models and data-driven models, from residual models and hybrid architectures to physics-guided learning. A quick subdivision of these diverse approaches can be performed by grouping them by the type of bias they incorporate. Observational biases rely on reliable, high-quality data that embodies the underlying physics (through sensors) and on carefully crafted input augmentation procedures (such as metrics, statistics and nonlinear transformations). Inductive biases incorporate prior assumptions through tailored interventions in the ML model architecture, such that the predictions sought are guaranteed to implicitly satisfy a set of given physical laws, typically expressed in the form of certain mathematical constraints; albeit the most principled Φ-ML implementation, this often leads to complex implementations that are difficult to scale [31]. Learning biases, the focus of this contribution, consist in appropriately choosing loss functions, constraints and inference algorithms that modulate the training phase to explicitly favour convergence towards solutions that adhere to the underlying physics. This yields a flexible soft-constraint platform which approximately satisfies the underlying physics, e.g. physics-informed neural networks (PINNs [35]), as will be discussed in this contribution.
This approach is particularly relevant for our application, as we are attempting to obtain similarly acceptable results for ten-minute predictions and for the long-term accumulation of these predictions. Additionally, when we consider a fleet-leader model (i.e. when a model trained on one turbine is considered representative of the entire fleet and used for farm-wide estimation [17]), the successful transferability of a model trained on an individual turbine will depend on the similitude of the input variable space throughout the farm. The additional stability and robustness added by Φ-ML may allow machine learning to be extended into extrapolation scenarios.

Article structure and main contributions
The main contribution of the current article is the development of a data-driven methodology which is able to tackle different timescales for damage equivalent moment (DEM) estimation (10-minute and long-term accumulation) through physics-guided learning, implemented by means of a custom loss function. The article is divided into five sections, the current section being the introduction. In Section 2 the data and fatigue calculation methodology are presented, first by focusing on the data acquisition systems (Section 2.1) and then by introducing the mathematical background behind damage equivalent loads and fatigue life estimation (Sections 2.2 and 2.3). Section 3 describes the methodology for long-term fatigue estimation using physics-informed neural networks, with a greater focus on the loss function (Section 3.1). Section 4 presents the results of this contribution along with their discussion. More specifically, Section 4.1 focuses on the ten-minute level performance, Section 4.2 focuses on the long-term performance and Sections 4.2.1 and 4.2.2 focus on identifying the sources of model shortcomings. Section 4.3 presents the performance for different operational conditions, Section 4.4 for different timescale accumulations and Section 4.5 shows the long-term fatigue damage accumulation. Finally, in Section 5 the conclusions and future work are discussed.

Sensors and data
In the present contribution, data from a measurement campaign on real-world offshore wind turbines with XL monopile foundations (9.5 MW turbines and water depths of up to 36 m) located in the Belgian North Sea was used. This is highly relevant, as the vast majority of offshore wind turbine foundations are monopiles (up to 80% [36]) and the increase in size of wind turbines, enabled by improvements in manufacturing processes, has led to the existence of very large monopiles, so-called XL monopiles. A lower threshold of 8 m diameter (capable of supporting a 7-8 MW turbine) can be set for this term [37].
The growing size of turbines has led to a novel paradigm for the industry, as the natural frequencies of turbines with XL monopile foundations are getting increasingly closer to the wave frequency spectrum [38]. This, combined with a larger surface area for hydrodynamic loads and deeper locations, has led to a greater impact of the wave loading on fatigue [32]. This in turn has resulted in the increased importance of idling operational conditions and the inclusion of side-to-side (SS), or cross-wind, damage calculations. As all of this can be linked to the structural dynamics of the turbines, acceleration data has become paramount to understand the broader fatigue picture [33].
The measurement campaign was based on data collected by three types of sensors: six axial strain gauges, installed along the inner circumference at the interface between the turbine tower and the transition piece (TW-TP) and used to calculate the bending moments; three dedicated bi-axial tower accelerometers, installed at three different levels at the bottom, middle and top of the turbine tower; and SCADA data collected at nacelle level. The yaw angle (from SCADA) can be used to convert the bending moments from the strain gauges into the fore-aft (FA) and side-to-side (SS) directions, as these are the 'primary' directions of the structure, with dynamics clearly being different between the two. It should be noted that the data was collected by third parties, namely the operator for the SCADA data and a specialized monitoring company for the monitoring data. Throughout the monitoring data collection, extensive calibration and quality checks were performed. Some periods had no collected data due to power outages, which does not affect this contribution's objectives; however, it should be considered when extrapolating in time. A schematic representation of the setup can be seen in Fig. 1.
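The conversion of strain-gauge bending moments into FA and SS components can be illustrated with a plane rotation by the yaw angle. The sketch below is only a minimal illustration: the function name, the tower-fixed axes `m_x` and `m_y`, and the sign and zero-angle conventions are assumptions, since the actual sensor orientation is not specified here.

```python
import math


def rotate_to_fa_ss(m_x, m_y, yaw_deg):
    """Rotate two orthogonal, tower-fixed moment components into FA/SS.

    Assumption: yaw_deg is the angle from the tower-fixed x-axis to the
    rotor axis. Real installations need their own offset and sign checks.
    """
    theta = math.radians(yaw_deg)
    m_fa = m_x * math.cos(theta) + m_y * math.sin(theta)
    m_ss = -m_x * math.sin(theta) + m_y * math.cos(theta)
    return m_fa, m_ss


# A pure rotation never changes the resultant moment magnitude.
fa, ss = rotate_to_fa_ss(3.0, 4.0, 37.0)
resultant = math.hypot(fa, ss)
```

In practice the same ten-minute mean yaw angle would be applied to every sample of the bending moment timeseries within that window.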
The signals of the various sensors are collected at various sampling frequencies and transformed into ten-minute time instances of diverse statistics of the signals. A full description of the various parameters and their original sampling frequency can be found in Table 1. It should be noted that, although the sampling frequency for SCADA is 1 Hz, only ten-minute statistics are available (due to data storage considerations). The shortcomings that an approach based on ten-minute mean SCADA might entail were addressed in [39]; however, the inclusion in this contribution of additional statistics such as minima, maxima and standard deviation serves to rectify some of these by increasing the valuable information stored from the SCADA signals.
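As an illustration of how a sampled signal is condensed into ten-minute statistics, the following sketch splits a signal into non-overlapping windows and stores the mean, minimum, maximum and standard deviation of each. The function name and the choice of the population standard deviation are assumptions made for the example.

```python
import statistics


def ten_minute_stats(signal, fs_hz=1.0, window_s=600):
    """Return mean/min/max/std for each complete, non-overlapping window."""
    n = int(fs_hz * window_s)  # samples per window (600 at 1 Hz)
    stats = []
    for start in range(0, len(signal) - n + 1, n):
        window = signal[start:start + n]
        stats.append({
            "mean": statistics.fmean(window),
            "min": min(window),
            "max": max(window),
            "std": statistics.pstdev(window),  # population std (assumption)
        })
    return stats


# Twenty minutes of a synthetic 1 Hz ramp give two ten-minute instances.
example = ten_minute_stats(list(range(1200)))
```

Incomplete trailing windows are dropped in this sketch; a real pipeline would also need a rule for windows with missing samples.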
Three turbines were instrumented, each representing one of the design clusters in the original detailed design of the wind farm.We can see the use of data from three turbines of different design clusters as a step in the direction of the farm-wide implementation of a fleet-leader model.The data collected in these locations was further enhanced by the addition of wave and tidal data, attained from a public Flemish maritime database (Meetnet Vlaamse Banken [40]), with a measuring station in the Belgian North Sea at the Westhinder site.

Damage equivalent moments
The strain gauges installed along the inner circumference at the TW-TP interface level enable the calculation of the bending moments in the FA and SS directions, through the ten-minute average yaw angle from the SCADA. Given the timeseries of bending moments, one can then employ a rainflow counting algorithm [41,42], which counts the number of cycles within a given stress range. Then, holding the linear damage accumulation hypothesis as true (Palmgren-Miner's rule) [43] and through the employment of the Wöhler exponent (the negative inverse slope of the SN curve) [44], the damage equivalent moments (DEM) are calculated for any ten-minute window. A more detailed discussion of this procedure can be found in [45]. These DEMs will become the input for a fatigue assessment of the asset and are fairly widely adopted among load engineers to quantify fatigue rates (in preference to, e.g., damage). As DEMs are also widely used in design, results can be compared with design results (when available).
The equation for the DEM is given in Eq. (2.1), as defined by [46] (here presented for the stress ranges), wherein $m$ is the inverse slope of the SN curve, or Wöhler exponent [42], $n_i$ is the number of cycles of a given stress range, $\Delta\sigma_i$, and $r_o$ and $r_i$ are respectively the TW/TP outer and inner radii at the strain gauge locations. For this contribution, and following the farm's design documentation, the value of 5 was used for $m$ and $N_{eq} = 10^7$, a predefined number of cycles. The DEMs are calculated for both the FA and SS directions using the respective timeseries.
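To make the damage equivalent calculation concrete, the sketch below computes a damage equivalent stress range from rainflow-counted cycles and converts it into a moment. It assumes the commonly used form in which the counted ranges are raised to the Wöhler exponent, weighted by their cycle counts and normalized by the reference cycle number. The conversion through the section modulus evaluated at the inner radius (where the gauges sit) is an assumption of this example, as are all function names.

```python
import math


def damage_equivalent_stress(ranges, counts, m=5.0, n_eq=1e7):
    """Single stress range that causes the same damage over n_eq cycles."""
    total = sum(n * s ** m for s, n in zip(ranges, counts))
    return (total / n_eq) ** (1.0 / m)


def area_moment_of_inertia(r_outer, r_inner):
    """Second moment of area of a thin-walled circular (annular) section."""
    return math.pi / 4.0 * (r_outer ** 4 - r_inner ** 4)


def damage_equivalent_moment(ranges, counts, r_outer, r_inner,
                             m=5.0, n_eq=1e7):
    """Convert the equivalent stress range into an equivalent moment.

    Assumption: stresses are measured at the inner wall, so the section
    modulus is evaluated at r_inner.
    """
    sigma_eq = damage_equivalent_stress(ranges, counts, m, n_eq)
    return sigma_eq * area_moment_of_inertia(r_outer, r_inner) / r_inner


# Sanity check: n_eq cycles of a single range return that same range.
sigma_check = damage_equivalent_stress([10.0], [1e7])
```

Because of the exponent $m = 5$, a few large cycles dominate the result, which is why the counting step matters more than the sheer number of small cycles.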
As explained above, the DEM calculation hinges upon the bending moments estimated from the strain gauges' readings. As these are not available farm-wide, our strategy is based on the use of not only SCADA, but also accelerations. Accelerometers have proven to be a far more reliable sensor than strain gauges over past SHM projects. Moreover, unlike strain gauges, their installation is fairly simple and post-processing is straightforward, without the need for temperature compensation or calibration. The strong correlation between strain and accelerations (the measured horizontal accelerations are linked to the bending stresses [19], as also seen in Fig. 2) allows us to justify this approach, as the loading that drives fatigue damage in the structure appears to also be captured by the accelerometers. In Fig. 2(b) we can see how the first-order dynamics dominate both the accelerometer and the strain gauge signals.

Fatigue life estimation
DEMs are a valuable quantity when discussing a lifetime assessment procedure. Nonetheless, just calculating these on a ten-minute basis is not enough to actually provide an insight into the fatigue life of an asset. For this, short-term ten-minute DEMs have to be accumulated.
As seen in the previous section, DEMs translate the damage caused by a dynamic load history into the single-amplitude equivalent load which would cause the same damage. By Palmgren-Miner's rule, we can further combine $n$ equivalent load ranges that have been derived for the same reference cycle number and Wöhler exponent through the $m$-th root of the weighted summation of the $m$-th power of each DEM instance [47], as seen in Eq. (2.2). Here $DEM_{LT}$ represents the long-term DEM accumulation of $n$ ten-minute DEM instances. In this equation every ten-minute time-instance $DEM_i$ represents a damage load with identical occurrence probability of $1/n$ [48].

$$DEM_{LT} = \left(\sum_{i=1}^{n} \frac{1}{n}\, DEM_i^{\,m}\right)^{1/m} \qquad (2.2)$$
As we can see, whatever the desired target long-term timescale might be, the combination of DEMs will always re-scale these into the timescale of $DEM_i$ (all $n$ instances of $DEM_i$ must have the same timescale and have been calculated for the same Wöhler exponent, $m$). This allows long-term effects of different timescales to be compared and easily added by applying a time factor. To translate the accumulated DEM ($DEM_{LT}$) into accumulated fatigue damage ($D$), one must simply reinstate Palmgren-Miner's rule and bring DEMs back into damage, as shown in Eq. (2.3).
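The accumulation described above amounts to a power mean of order $m$ over the ten-minute DEMs. The following sketch, with a hypothetical function name, shows this and illustrates the re-scaling property: accumulating two equally long periods separately and then combining them gives the same result as accumulating all instances at once.

```python
def long_term_dem(dems, m=5.0):
    """m-th root of the equally weighted sum of the m-th powers."""
    n = len(dems)
    return (sum(d ** m for d in dems) / n) ** (1.0 / m)


# Two periods with the same number of ten-minute instances.
period_a = [1.0, 2.0, 1.5]
period_b = [3.0, 0.5, 2.5]

combined_in_steps = long_term_dem(
    [long_term_dem(period_a), long_term_dem(period_b)]
)
combined_at_once = long_term_dem(period_a + period_b)
```

For periods of different length, the two partial results would have to be weighted by their number of instances, which is the time factor mentioned in the text.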
where $I$ is the area moment of inertia [13,49].
All of these quantities are known and have been introduced in previous sections, apart from the intercept of the SN curve, $\bar{a}$. This value can easily be consulted in the corresponding SN-curve tables used during design [49]. In this case, we take the value from the SN curve D for a seawater environment with cathodic protection, with $\log \bar{a} = 15.606$. Finally, we can translate the accumulated fatigue damage into the remaining useful lifetime through Eq. (2.4), which takes into account the number of years the wind turbine has already operated [13].
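A minimal sketch of the last two steps is given below. It assumes a single-slope SN curve of the form $\log N = \log \bar{a} - m \log \Delta\sigma$ with stresses in MPa, and a remaining useful lifetime obtained by linearly extrapolating the damage rate observed so far; both are simplifying assumptions of this example rather than the exact expressions of Eqs. (2.3) and (2.4), and all names are hypothetical.

```python
def stress_range_mpa(dem_nm, inertia_m4, radius_m):
    """Convert an equivalent moment [N*m] into a stress range [MPa]."""
    return dem_nm * radius_m / inertia_m4 / 1e6


def accumulated_damage(sigma_lt_mpa, n_windows,
                       log_a=15.606, m=5.0, n_eq=1e7):
    """Miner damage of n_windows instances, each worth n_eq cycles."""
    cycles_to_failure = 10.0 ** log_a / sigma_lt_mpa ** m
    return n_windows * n_eq / cycles_to_failure


def remaining_useful_life(damage, years_operated):
    """Years left until damage reaches 1, assuming a constant damage rate."""
    return years_operated * (1.0 - damage) / damage


# If a quarter of the fatigue budget is used after 5 years, 15 years remain.
rul_example = remaining_useful_life(0.25, 5.0)
```

Because damage is linear in the $m$-th power of the stress range, the damage of the long-term equivalent equals the sum of the ten-minute damages, which is what makes the accumulated DEM a faithful summary.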

Long-term fatigue estimation using physics-informed neural networks
In the present contribution, nine months' worth of data from one real-world turbine (OWT1) is used to train ANNs in order to predict the DEMs affecting offshore wind turbine structures at the TW-TP interface level for the FA and SS directions. This mirrors the methodology described in [18]. The full methodology can be seen in Fig. 3 and is applied to FA and SS independently.
As can be seen in Fig. 3, a dimensionality reduction of the variable space was performed on half of the available data from the reference turbine (OWT1). A systematic comparative study of different feature selection algorithms was performed in [50], which identified BorutaShap [51] and Recursive Feature Elimination [52] as the best-performing algorithms; the latter produced a smaller number of explanatory features and was therefore selected.
The features selected for each of the two directions differed. Both FA and SS had accelerations highly represented, respectively relying on FA and SS accelerations. In the FA direction, the additional selected features were more related to the output of the turbine (such as wind speed, power, etc.), whereas SS fatigue loads are more defined by wave-related parameters (such as high-frequency wave direction, wave height and tidal level). A more detailed discussion on the use of feature selection can be found in previous works [18,53].
The general architecture of the machine learning model is that of a feed-forward artificial neural network (ANN). Using the keras-tuner package [58], three hyperparameter optimization algorithms (Random search [54], Hyperband [55] and Bayesian optimization [56]) were tested, with the best model architecture being obtained with Bayesian optimization [57] with a mean squared error objective. The topology of the network consisted of 5 hidden layers, with the number of neurons varying between 32 and 512, and rectified linear unit (ReLU) [59] and Gaussian error linear unit (GELU, a smoothing of ReLU using $\Phi(\mathbf{y})$, the cumulative distribution function of the Gaussian distribution [60]) activation functions. The dropout rate could be chosen from the set {0, 0.1, 0.2, 0.3} and the learning rate of the optimizer (Adam The mean squared error and mean absolute error were the monitored metrics.
For training, half of the available data of the reference turbine is bootstrapped and the neural network model is trained using k-fold cross-validation [61] with 5 folds and a 70-30 validation split. A fixed threshold of 100 epochs is given and an early-stopping callback is employed.
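The resampling steps can be sketched with index lists only. In the example below, "bootstrapping half of the data" is read as drawing half as many samples as available, with replacement, and the 5 folds are built by a shuffled round-robin split; both readings, the function names and the seeds are assumptions, and the additional 70-30 validation split is not modelled.

```python
import random


def bootstrap_half(indices, seed=0):
    """Draw len(indices) // 2 samples with replacement (assumed reading)."""
    rng = random.Random(seed)
    return [rng.choice(indices) for _ in range(len(indices) // 2)]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


sample = bootstrap_half(list(range(100)), seed=1)
splits = list(k_fold_indices(20, k=5))
```

Each sample is used for validation in exactly one fold, so the five validation scores together cover the whole training subset.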
After training, the FA and SS ANNs were used to estimate the DEM for the full dataset of the reference (training) turbine, OWT1, as well as for two additional fully instrumented turbines (OWT2 and OWT3). Additionally, the sum of all the 10-min DEM estimations (and measurements) was scaled into a 10-min lifetime DEM (cf. Eq. (2.2), derived from [48]).

Loss functions
In Section 2.2 we have seen how damage equivalent loads on a ten-minute level can be accumulated and re-scaled back in order to compare fatigue damage throughout any timescale by a so-called long-term DEM ($DEM_{LT}$). While previous research focused on accuracy at a ten-minute level [18], the current contribution prioritizes instead the performance on the long-term DEM, while attempting to retain accuracy on a ten-minute level. Simultaneously, an inherently conservative strategy is desired, in which an overestimation of fatigue damage (and therefore an underestimation of RUL) is highly preferred over an underestimation of fatigue damage.
This multi-timescale problem (ten-minute predictions, long-term accumulation) is highly complex, and a common ML approach will not be able to capture all the dynamic relationships between the many physical variables varying across space and time at different scales [62]. Therefore, in order to reflect the focus on long-term DEM estimation, a physics-informed machine learning approach is required, which guides the learning process of the neural network, as introduced in Section 1.3.
In this section, two loss functions are selected and compared: the mean squared logarithmic error, a commonly available loss function which will serve as the control function, and a loss function named Minkowski logarithmic error, custom-developed to contain some physical information inherent to the problem at hand and to reflect the priority given to long-term DEM estimation. Both functions employ the logarithmic function, which will inherently favour overpredictions, rendering the models conservative when used for a fatigue life assessment.

Mean squared logarithmic error
The mean squared logarithmic error (MSLE) can be interpreted as a measure of the ratio between the true and predicted values and applies the logarithmic function within the mean squared error loss function (see Eq. (3.1)).
The introduction of the logarithm makes MSLE penalize underestimates more than overestimates, introducing an asymmetry in the error curve. This property is related to the slope of the logarithmic curve [63] and provides the desired conservatism for the current application.
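The asymmetry can be checked numerically. The sketch below uses the common definition of MSLE with $\log(1 + y)$, as found in most ML libraries, in plain Python; the function name is chosen for this example.

```python
import math


def msle(y_true, y_pred):
    """Mean of squared differences between log(1 + y) and log(1 + y_hat)."""
    n = len(y_true)
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2
        for t, p in zip(y_true, y_pred)
    ) / n


# Same absolute error of 20, once below and once above the true value.
loss_under = msle([100.0], [80.0])
loss_over = msle([100.0], [120.0])
```

Since the logarithm is steeper at smaller arguments, the underprediction yields the larger loss, so a model trained with this loss is nudged towards overprediction.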

Minkowski logarithmic error
When the individual instances of ten-minute DEMs are aggregated into a long-term DEM, as in Eq. (2.2), we can see this is performed through a weighted Lebesgue, or $L^p$, space norm, a class of Banach spaces [64], where $p$ coincides with the Wöhler exponent, $m$. Therefore, the length of the vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$ in the $n$-dimensional real vector space $\mathbb{R}^n$ can be described for $L^p$ spaces as given by Eq. (3.2) [65].

$$\|\mathbf{x}\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} \qquad (3.2)$$
Thus, for the problem at hand, we can bring any timescale into the 10-min frame of reference through Eq. (2.2), a direct application of Eq. (3.2) for DEM with $p = m = 5$.
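The link between the long-term DEM and the $L^p$ norm can be verified with a few lines of code. In this sketch (function names are illustrative), the long-term DEM equals the $p$-norm of the vector of ten-minute DEMs scaled by $n^{-1/p}$, which is where the equal weights of $1/n$ enter.

```python
def lp_norm(x, p):
    """Length of the vector x in the L^p sense, for 1 <= p < infinity."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def long_term_dem(dems, m=5.0):
    """Equally weighted accumulation of ten-minute DEMs."""
    return (sum(d ** m for d in dems) / len(dems)) ** (1.0 / m)


dems = [1.2, 0.8, 2.5, 1.9]
via_norm = len(dems) ** (-1.0 / 5.0) * lp_norm(dems, 5.0)
via_accumulation = long_term_dem(dems, m=5.0)
```

For $p = 2$ the same function returns the familiar Euclidean length, e.g. a vector (3, 4) has length 5.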
This equation is the point of departure for our physics-guided loss function, as it is the underlying physical principle which drives the long-term accumulation of DEMs. One could therefore reasonably place the complete focus on the ability of the model to predict the long-term DEM, as seen in Eq. (3.3).

(𝐘
However, this approach would be detrimental, as it would mean that during the learning phase, instead of comparing the $L^p$-scaled residuals (and thus focusing on the estimation error), it would compare the residual of the $L^p$-scaled total (long-term) measurements and estimations, which has no bearing on the ten-minute level. Therefore, if our loss function is to reflect the long-term DEM in a flexible way (i.e. not timescale-dependent), it must also remain able to estimate accurately at a ten-minute level.
Thus, based on the $L^p$ norm and the logarithm function, and attempting to maintain a ten-minute level prediction accuracy, we introduce the Minkowski logarithmic error (MLE). This loss function can be seen as an extension of the logarithmic loss function to any $L^p$ metric, also known as the Minkowski distance [66]. Eq. (3.4) describes this function mathematically, extended for $1 \le p < +\infty$ in the $n$-dimensional vector space $\mathbb{R}^n$ for $\mathbf{Y}, \hat{\mathbf{Y}}$. For our case the Wöhler exponent ($m = 5$) coincides with $p$.
This loss function, as will be seen in Section 4, represents a fair compromise between ten-minute DEM and long-term DEM estimation and was implemented in the Keras framework. An extensive introductory guide into custom loss functions, including physics-guided learning, for this specific TensorFlow API is provided in [67]. A deeper discussion on the Minkowski logarithmic error and its mathematical development can be found in [68].
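One plausible reading of this definition is the Minkowski distance of order $p$ between the log-transformed measurements and predictions. The plain-Python sketch below follows that reading; the use of $\log(1 + y)$ and the absence of a $1/n$ normalization are assumptions of this example, and the exact expression is the one given in [68]. A Keras version would express the same operations with tensor functions.

```python
import math


def minkowski_log_error(y_true, y_pred, p=5.0):
    """L^p distance between log(1 + y) and log(1 + y_hat) (assumed form)."""
    return sum(
        abs(math.log1p(t) - math.log1p(q)) ** p
        for t, q in zip(y_true, y_pred)
    ) ** (1.0 / p)


y = [10.0, 20.0, 30.0]
y_hat = [12.0, 18.0, 33.0]

mle_p5 = minkowski_log_error(y, y_hat, p=5.0)
# With p = 2 the squared value divided by n recovers the MSLE.
msle_from_mle = minkowski_log_error(y, y_hat, p=2.0) ** 2 / len(y)
```

With a larger exponent the loss is dominated by the instances with the largest log-residuals, which mirrors how large DEMs dominate the long-term accumulation.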

Results and discussion
As described in Section 3, the final architecture obtained through hyperparameter tuning is used during the training of the model using 50% of OWT1's data (of the available nine months). The final PINN model is used to estimate TW-TP ten-minute DEMs, which are then accumulated into a long-term DEM for each of the three turbines (both FA and SS) using nine months' worth of data.

10-minute level performance
We can begin inspecting the model's performance on a ten-minute level by observing a two-day timeseries of the FA predictions with a 95% confidence interval for OWT1 using the Minkowski logarithmic error (MLE) loss function (MLE uses $p = m = 5$). The predictions using the mean squared logarithmic error (MSLE; control function) have also been added.
In Fig. 4 we can observe how all the measurements fall well within the 95% confidence interval. More specifically, we can see how the predictions are, on average, greater than the measurements. This is a direct result of the inclusion of the logarithmic function in the MLE, as it promotes overprediction, introducing the desired conservatism to the model. In the predictions using the MSLE control loss function, by comparison, the overprediction incentive is barely noticeable.
We can further confirm that the model using MLE is generally overpredicting by plotting the predictions against the measurements (see Fig. 5) and smoothing their bivariate distribution using a kernel density estimator (KDE) [69], wherein a multivariate Gaussian kernel estimates the probability density function (PDF) [70] non-parametrically. In Fig. 5, we can see how the majority of values are rather small when compared with the highest damage equivalent loads, being concentrated in the lower left corner, close to the origin (in light green/white). More interestingly, we can also see a slight upward skew of the PDF in relation to the identity line $y = \hat{y}$ (measurements equal predictions). This shows a consistent model overprediction ($\hat{y} > y$).

Long-term level performance
As mentioned in preceding sections, the major goal of this contribution lies with the accurate characterization of long-term DEMs through physics-guided learning by employing a custom-built loss function. In order to assess this, we can take a look at the long-term (nine months' worth of data) results for the three turbines. This can be seen as a proof-of-concept of the fleet-leader model through cross-validation, opening the road for a farm-wide implementation. For this, we can inspect Table 2, which reports the errors of the TW-TP DEM estimation by direction. At the 10-min level, these are the coefficient of determination, $R^2$, and the root mean squared error normalized as a percentage of the mean DEM value, $\mathrm{NRMSE} = 100 \cdot \left(\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\right)^{1/2} \cdot \bar{y}^{-1}$. At the long-term level, the error on the accumulated DEM ($DEM_{LT}$) is reported. The results for the Minkowski logarithmic error (MLE) are compared with those of the control loss function, the mean squared logarithmic error (MSLE).
In Table 2 we can see how the introduction of a Φ-ML approach for the loss function greatly improves the models' performance on the long-term accumulations of Eq. (2.2) (  ), whilst retaining and even slightly improving the ten-minute estimation performance ( 2 , NRMSE).All errors on long-term DEM estimation are well below 3%.These can be considered as excellent within the overall context of the substructure's design.There is usually an allowable error of similar magnitude between the substructure designer's and the wind turbine designer's code.From real-world examples known to the authors, the results present in Table 2 are well below this industry allowable error.The ten-minute level improvement is more noticeable in SS, especially if we consider the NRMSE (4%-5% improvement).Indeed, the MLE loss function has not only meant improvements at 10-min and long-term levels (with the latter improving from up to 9.4% error to less than 3%), but has also prioritized overpredictions (a feature desired in order to ensure some conservatism), represented by a negative   , for OWT2 and 3 FA.

Side-to-side long-term error
However, despite the overall improvement in long-term performance due to MLE, the expressed objective of forcing overprediction (a negative long-term error) has not been attained for all SS cases and for OWT1 FA, with the long-term error being positive despite the use of the logarithm in the loss function to incentivize overpredictions. In order to understand this, we inspect Fig. 6a, where we plot the predictions ($\hat{y}$) against the measurements ($y$) for the side-to-side direction of OWT 2. Additionally, we identify in purple the region (without balancing counterpart) where the model is severely underpredicting ($y \gg \hat{y}$).
Here, we can see that the model is overwhelmingly accurate, as the vast majority of values stick closely to the identity line (as also reflected by the $R^2$ value of 0.98 in Table 2). However, there is indeed a region (in purple) where the measured values are much greater than the predicted values. Most of the values in the purple shaded region coincide with time instances where there is a sudden variation in the yaw angle (a change of more than 10° relative to the preceding time instance, in orange).
In order to understand the rationale behind this mapping of the yaw transient, we can start by taking a look at Fig. 6b, where the lateral bending moment is plotted for an individual ten-minute time instance located in the underpredicting region. Here, we can indeed see that there are two regions (identified in red) where there is a sudden variation (in this case, a drop) in the lateral bending moment, caused by a sudden variation in yaw. Therefore, the model is unable to accurately capture these severe variations and, thus, underpredicts the fatigue loads. These sudden variations in the lateral bending moment are preceded by an equally sudden variation in the yaw angle (in light blue in Fig. 6b). Thus, there is a link between yaw and lateral bending moment variation. However, the origin of these apparently large fatigue loads is not an actual sudden, physical variation in the SS loads. Instead, it is due to the violation of one of the key assumptions of the global methodology [45]. In general, one assumes a unique value for the yaw angle for the whole ten-minute interval. This assumption overwhelmingly holds true. But Fig. 6b shows that once this assumption breaks down, the bending moment calculation is no longer faithful to physical reality (the bending moment variation regions in red in Fig. 6b do not have any physical meaning; they are merely a result of the yaw angle not remaining approximately constant). Therefore, this is a shortcoming of the decision to work with fore-aft and side-to-side bending moments and not of the machine learning model.
It could be argued that one should not work with fore-aft and side-to-side loads and should instead use the strain gauges directly, or use 1 s SCADA for the yaw transformation. In Section 2.1 it was explained that working with fore-aft and side-to-side loads made sense, as these are the 'primary' directions of the structure, with dynamics clearly differing between them. When, alternatively, working with strain gauges directly, the two primary directions would be mixed, posing additional challenges for model training. As for the use of 1 s SCADA for the yaw angle, this would be challenging, as time-synchronization issues between the different systems would arise. Moreover, 1 s SCADA is not readily available for all turbines/sites, which would hinder the widespread adoption of the current methodology.
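The yaw-transient mapping described above can be sketched as a simple flag on consecutive ten-minute yaw values. This is a minimal sketch under stated assumptions: the function names are hypothetical, the yaw signal is taken to be one representative angle per ten-minute instance in degrees, and the wrap-around at 0°/360° is handled explicitly, which the text does not specify.

```python
def yaw_change_deg(yaw_prev, yaw_curr):
    """Smallest signed angular difference in degrees, mapped to [-180, 180)."""
    return (yaw_curr - yaw_prev + 180.0) % 360.0 - 180.0

def flag_yaw_transients(yaw_series, threshold_deg=10.0):
    """Flag time instances whose yaw differs from the preceding instance
    by more than threshold_deg (10 degrees, the value quoted in the text)."""
    flags = [False]  # the first instance has no predecessor
    for prev, curr in zip(yaw_series, yaw_series[1:]):
        flags.append(abs(yaw_change_deg(prev, curr)) > threshold_deg)
    return flags

# Hypothetical yaw angles (degrees), including a crossing of north.
yaw = [350.0, 355.0, 2.0, 40.0, 41.0]
print(flag_yaw_transients(yaw))
```

In this hypothetical series, the step from 355° to 2° is a 7° change and is not flagged, whereas the step from 2° to 40° is. Without the wrap-around handling, the crossing of north would be misread as a transient of roughly 353°.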

Fore-aft long-term error
Similarly to Section 4.2.1, and as seen in Table 2, the issue of underprediction is not confined to SS, but also affects FA, with the FA long-term error being positive (representing a global underprediction) for OWT1.
In order to understand the causes behind this behaviour, we resume the strategy of Section 4.2.1, this time by plotting $\|y - \hat{y}\|_5$, the $L^5$ norm of the errors, against the measurements for OWT1 FA, as shown in Fig. 7. Above this figure, we additionally plot the probability density function (PDF; see Fig. 5) and the cumulative distribution function (CDF), the integral of the PDF, $F(y) = \int_{-\infty}^{y} p(u)\,\mathrm{d}u$ [72]. In Fig. 7 we see the linear correlation between $\|y - \hat{y}\|_5$ (hereafter simply referred to as errors) and the measurements $y$. If the errors are above the regression line (the residual is positive), then $y > \hat{y}$, which means that the model is underpredicting. As we can see, the mean residuals are above zero to the right of the blue dashed line, a region representing less than 3% of the data. This can be further verified by observing the PDF and CDF plots, in which the measurements' distribution is overwhelmingly concentrated in the lower band of the axis (to the left of the blue dashed line). This indicates that for the vast majority of fatigue loads the model is, as desired, overpredicting. However, the average of the residuals is dragged up by the model's behaviour for high load cases (which also play a large role in the total fatigue life).
To further understand what is happening with the model in the high load case region, we can plot the predictions ($\hat{y}$) against the measurements ($y$), as shown in Fig. 8a.
In Fig. 8a we can again see that the model's performance is overwhelmingly accurate (most values are well within the $1.5\sigma$ confidence interval, CI, and closely follow the identity line, in red). Therefore, the model is able to accurately capture the underlying behaviour of the turbine response, even overpredicting for small load values ($\hat{y} > y$ in the bottom left corner of the figure). However, for a small subset of data generally associated with high fatigue loads, it is not capable of doing so (purple region). In the case of ten-minute predictions this is not as noticeable, as the $R^2$ in Table 2 demonstrates. However, when we accumulate and re-scale these errors, the presence of the Wöhler exponent ($m = 5$) gives them a greater weight in the overall long-term performance.
In the purple shaded region, the model is severely underpredicting ($y \gg \hat{y}$), and it does not have an overpredicting counterpart to cancel it out. Looking at an individual outlier time instance (orange cross in Fig. 8a) helps to understand what is going on in this region. Fig. 8b shows, for this time instance, that the model's DEM prediction ($\hat{y}$) is far lower than the measurement ($y$) when plotted alongside the measured timeseries: the DEM increases drastically in a spike, and the model is not capable of accurately predicting this structural response. Fig. 8c, where the rotational speed of the rotor (in RPM) is plotted, identifies the culprit for this underprediction: a severe and sudden variation in the RPM, which causes a sudden spike in DEM.
If we define a threshold of 3 RPM for the transient of the rotational speed (the absolute variation with respect to the preceding time instance exceeding 3 RPM, a simple logical constraint), then we can return to Fig. 8a and map these time instances by overlaying them (in orange). Indeed, although the overlap is not present in some cases, the time instances with a severe and sudden variation of the rotor speed correctly identify a good number of values within the purple region.
The sudden variation in RPM tentatively represents a rotor stop/start. Thus, the PINN model appears unable to accurately replicate the turbine's structural response when the turbine's rotor suddenly stops. This is not wholly surprising as, due to the data quality and type, this phenomenon might not be captured by the available data.
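The logical constraint and the overlay can be illustrated as follows. This is only a sketch: the function names are hypothetical, and the rule used to mark "severely underpredicted" instances (measurement exceeding the prediction by a fixed factor) is an assumption introduced for illustration, since the text delimits the purple region graphically rather than by a formula.

```python
def flag_rpm_transients(rpm, threshold=3.0):
    """Flag instances whose rotor speed changed by more than `threshold` RPM
    (in absolute value) with respect to the preceding instance."""
    flags = [False]  # the first instance has no predecessor
    for prev, curr in zip(rpm, rpm[1:]):
        flags.append(abs(curr - prev) > threshold)
    return flags

def flag_severe_underprediction(y, y_hat, ratio=1.5):
    """HYPOTHETICAL outlier rule: measurement exceeds prediction by `ratio`."""
    return [meas > ratio * pred for meas, pred in zip(y, y_hat)]

def overlap_fraction(outliers, transients):
    """Share of flagged outliers that coincide with a flagged transient."""
    n_outliers = sum(outliers)
    if n_outliers == 0:
        return 0.0
    hits = sum(1 for o, t in zip(outliers, transients) if o and t)
    return hits / n_outliers

# Hypothetical ten-minute values for six consecutive instances.
rpm = [10.0, 10.2, 5.0, 9.8, 10.0, 10.1]
y = [2.0, 2.0, 6.0, 5.0, 2.0, 4.0]
y_hat = [2.1, 2.1, 3.0, 3.0, 2.1, 2.2]

transients = flag_rpm_transients(rpm)
outliers = flag_severe_underprediction(y, y_hat)
print(overlap_fraction(outliers, transients))
```

In this hypothetical example two of the three underpredicted instances coincide with an RPM transient, mirroring the partial overlap reported for Fig. 8a.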

Operational conditions
We have seen above how a Φ-ML approach greatly improves the model performance when DEMs are accumulated and re-scaled, as well as the areas where it is lacking, due to the quality of the available data and some fundamental assumptions. In the present subsection, the ten-minute DEM estimation error (expressed as a percentage of the long-term DEM) is plotted for the mean squared logarithmic error (MSLE, control) and Minkowski logarithmic error (MLE) loss functions (for FA and SS) based on the operational cases: nominal (functioning turbine) or idling. We do this because the accurate portrayal of the fatigue loads faced during the different operating conditions an offshore wind turbine experiences has arguably become as important as a good general model. In modern monopile-foundation OWTs the complex interplay of aero-, hydro- and structural dynamics has meant that there is not one primary direction that is solely responsible for the fatigue incurred by the structure. This becomes especially relevant for idling conditions [33].
The performance of the MLE and MSLE models is presented in Fig. 9 for nominal and idling operational conditions, for (a) fore-aft and (b) side-to-side.
In Fig. 9(a), we can see how the MLE loss function model has, for all turbines, an error distribution whose mean is negative for nominal operating conditions. This means that, on average, the model is overpredicting, as intended. As for the whiskers, we can detect an imbalance in OWT 1 (max above 150%, min below −150%), which is reflected in the global long-term error being positive, thus indicating underprediction. As for the MSLE model, although the logarithm is also used, the overprediction is less evident. What is noticeable, however, is that the relative errors are much lower for idling, as fore-aft loads are lower under this operating condition. Nevertheless, especially for OWT 2 and 3, the MLE model is still negatively skewed.
In Fig. 9(b) we can see how, for both nominal and idling conditions, the MLE model has its error distribution centred around zero, whereas the MSLE model has a positive bias. However, if we again take a look at the whiskers, these extend further for positive errors than for negative ones.
Thus, globally, we can affirm that the MLE PINN performs more accurately than a regular neural network approach with MSLE and is better able to incorporate the logarithm-induced conservatism for all operating conditions.

Timescale
The main focus of this contribution is the accurate prediction of the long-term accumulation of DEMs. However, the long-term errors shown in Table 2 condense the performance of the PINN model into a single value, in which overpredictions and underpredictions at the ten-minute level cancel each other out over nine months of data. The ten-minute DEMs can also be accumulated over different timescales (1 h, 1 day, 1 week, 1 month), and it is relevant to understand how the error propagates over these timescales (see Fig. 10).
Fig. 10 shows that, for all turbines, the spread of the error reduces sharply with increasing timescale, rapidly converging and being centred around zero (this is to be expected, as negative and positive errors cancel each other out over time). We can observe, e.g., how for a monthly long-term DEM accumulation the 95% confidence interval of the error spread ($1.96\sigma$) is within a rather acceptable ±5%. This has practical relevance, as the error spread provides the confidence bounds operators must consider when comparing the accumulated DEMs over a given timescale. These results suggest that the method is suitable for monthly and even weekly fatigue accumulations, but might be insufficiently reliable for daily and hourly estimates, with SS estimates converging faster than those in the FA direction. We can also observe how the incentive for overpredictions (reflected in the negative error mean) slowly converges towards zero with increasing timescale, without actually reaching it (OWT 2 and 3) or slightly overshooting it (OWT 1). Interestingly, for smaller timescales (1 hour/1 day), the long-term error seems to cap at around 15%. As for the SS error, we can see that its spread is much smaller and centred around zero (hovering slightly above it) for all timescales.
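The timescale analysis can be sketched by accumulating ten-minute DEMs in fixed-length windows and summarizing the resulting errors. This is an illustrative sketch with hypothetical names. It assumes non-overlapping windows (6 instances per hour, 144 per day, 1008 per week) and the same $m$-norm-type accumulation with $m = 5$ as a stand-in for Eq. (2.2), neither of which is confirmed by this section.

```python
import statistics

def accumulate(dems, m=5.0):
    """ASSUMED accumulation of ten-minute DEMs: (mean of DEM_i**m) ** (1/m)."""
    return (sum(d ** m for d in dems) / len(dems)) ** (1.0 / m)

def window_errors_percent(y, y_hat, window, m=5.0):
    """Relative error (%) of the accumulated DEM for each non-overlapping
    window of `window` consecutive ten-minute instances."""
    errors = []
    for start in range(0, len(y) - window + 1, window):
        meas = accumulate(y[start:start + window], m)
        pred = accumulate(y_hat[start:start + window], m)
        errors.append(100.0 * (meas - pred) / meas)
    return errors

def error_spread(errors):
    """Mean error and 1.96-sigma half-width of the windowed errors."""
    return statistics.mean(errors), 1.96 * statistics.pstdev(errors)

# Hypothetical data: the first window is overpredicted by 10%,
# the second underpredicted by 10%.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 2.2, 2.7, 3.6]
errors = window_errors_percent(y, y_hat, window=2)
print(errors, error_spread(errors))
```

Repeating the call with increasing `window` values on real data would give the kind of mean and spread per timescale that Fig. 10 reports. The toy values here only show the mechanics: errors of about −10% and +10% average to zero with a $1.96\sigma$ half-width of about 19.6%.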

Fatigue damage accumulation
Finally, in Section 2.3 we have seen how the long-term DEM can be converted into an accumulated fatigue damage through Eq. (2.3). In this section, we show the results of the fore-aft and side-to-side fatigue damage accumulation for every month of the nine months of available data for the three turbines, whilst additionally presenting the progression of the relative error. These are shown in Fig. 11.
The figure shows the fatigue damage progressively increasing in both the fore-aft and side-to-side directions. For OWT 2 and OWT 3 FA, the predicted damage slightly surpasses the measured damage (as intended through the conservatism introduced by the logarithm function in the Minkowski logarithmic error). The same cannot be said of OWT 1, for the reasons explained in Section 4.2.2. Nevertheless, the relative error on the accumulated fatigue damage in the fore-aft direction is kept within ±5% for all turbines.
More interestingly, we can see how, for all turbines, after the initial months where there is a strong variation, the relative error on the accumulated damage (in green) begins to converge to a constant value (after roughly 6+ months). This is consistent with [45], where the same 9-12 months timeframe was identified until convergence. The convergence of the error is essential if the developed model is to be used in prognostic applications, extrapolating its predictions into the future.
This variation in the error is less noticeable in the side-to-side direction, where the error is nevertheless stable. As seen in Section 4.2.1, the relative error on the accumulated fatigue damage is positive (between 10 and 15%), because the model is underpredicting (the measured damage is higher than the predicted damage). This might seem inconsistent with Table 2, which showed SS long-term DEM errors below 3%. However, accounting for the power of $m$ in Eq. (2.3) means that a 3% error in the long-term DEM results in a roughly 15% error in fatigue damage. The power of $m$ is also the reason why the ratio between FA and SS increases: for DEMs this ratio (taken every month) is between 1.20 and 1.34, whereas for damage it is between 2.48 and 4.32 ($1.20^5$ and $1.34^5$, respectively).
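The amplification of the DEM error into a damage error can be checked numerically. This is a small sketch with a hypothetical function name. It assumes that damage scales with the long-term DEM raised to the Wöhler exponent $m = 5$, consistent with the "power of $m$" argument above, although Eq. (2.3) itself is not reproduced in this section.

```python
def damage_error_percent(dem_error_percent, m=5.0):
    """Relative damage error (%) implied by a relative long-term DEM error (%),
    ASSUMING damage is proportional to DEM**m."""
    ratio = 1.0 - dem_error_percent / 100.0  # predicted / measured DEM
    return 100.0 * (1.0 - ratio ** m)

exact = damage_error_percent(3.0)   # full power-law amplification
first_order = 5.0 * 3.0             # linearized estimate, m times the DEM error
fa_ss_low = 1.20 ** 5               # damage ratio for a DEM ratio of 1.20
fa_ss_high = 1.34 ** 5              # damage ratio for a DEM ratio of 1.34
print(exact, first_order, fa_ss_low, fa_ss_high)
```

Under this assumption a 3% DEM underprediction corresponds to roughly 14% in damage, which the linearized estimate rounds to the 15% quoted above, and the DEM ratios of 1.20 and 1.34 map to damage ratios of about 2.49 and 4.32.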

Conclusions and future work
In this contribution, we propose and analyse the performance of a data-based methodology formulated through physics-guided learning for neural networks. This is used in a fleet-leader strategy for the long-term estimation of fatigue loads at the tower-monopile transition piece interface. The explicit objective of long-term estimation is stated by explaining the rationale and procedure behind the accumulation of ten-minute instances of damage equivalent moments (DEM) into a long-term DEM. This metric is further converted into a fatigue damage value. The knowledge required for the accumulation of 10-min DEMs (by employing a Lebesgue norm) is additionally used to shape a custom loss function, the Minkowski logarithmic error (MLE), which incorporates physical knowledge of the problem and the properties of the logarithmic function, forcing the model to slightly overpredict and therefore adding an element of conservatism.
This approach involved the use of nine months of monitored data in three real-world locations, consisting of ten-minute statistics of SCADA, wave and acceleration data (inputs) and strain data (used as the target for the neural network model). The available input data statistics were reduced to a smaller number of explanatory features through recursive feature elimination, and the optimal neural network model topology was found through Bayesian hyperparameter optimization. In the results section it was seen how the model trained with the Minkowski logarithmic error is able to accurately predict fore-aft and side-to-side DEM timeseries on a ten-minute basis. Furthermore, the MLE was favourably compared with a control loss function (mean squared logarithmic error), with performance greatly increasing for long-term DEM estimation. It was additionally noted that, for some cases, the model was still underpredicting (and therefore contradicting the rationale behind the use of the logarithm). The reason behind this behaviour was scrutinized by inspecting the residuals and discovering that, for the side-to-side direction, due to some fundamental assumptions in the bending moment calculation, the model was presenting errors without an underlying physical explanation. The same analysis was done for the fore-aft direction, where it was seen that rotor stops were not wholly captured by the model as it lacked data.
Additionally, the performance of the model was inspected in relation to nominal and idling operational conditions, with better performance across the board for the Minkowski logarithmic error when compared with the control loss function. Thereafter, the influence of the timescale (hourly, daily, weekly and monthly accumulation) on the long-term DEM prediction accuracy was inspected, with the errors balancing each other out (centred around zero) and the spread rapidly converging towards zero. Finally, the long-term DEM was translated into fatigue damage and its progressive accumulation and error convergence verified. It was seen how these values compare favourably with industry practice.
The next logical step to be taken in this research is to apply this fleet-leader model in a full farm-wide setting, and not just for turbines instrumented with strain gauges where cross-validation is possible. Such an implementation can be seen as a pilot in what is hoped to become a commonplace practice among wind farm operators in the future. This is naturally dependent on farm-wide data availability but, in the case of the farm used for this study, farm-wide SCADA and acceleration data will become a reality in the very near future, with sensor installation already rolling out (at the time of writing). Coupled with the farm-wide implementation, and in order to quantify the uncertainty related to the predictive model and therefore introduce a metric of the confidence one might have in said model, future research will also focus on applying a neural network predicated upon a Bayesian probabilistic framework to a farm-wide setting. The main advantage of such a Bayesian neural network approach is the probabilistic output (in the form of the mean and standard deviation of predicted DELs), which can be leveraged to indicate the model's confidence in its output through the coefficient of variation.
Finally, a study quantifying event-related fatigue is also necessary. If wind farm operators are to effectively understand the fatigue life consumption of their assets and leverage this knowledge in their dealings with the grid operator, they are required to quantify the fatigue life consumption of demand-induced events, such as stopping or curtailing rotor operation due to grid requirements. The accurate translation of these events into a fatigue consumption framework and further into an economic impact metric through a value of information analysis is of the utmost importance for a fair and informed negotiation between all wind energy stakeholders.
In summation, this contribution has shown how problem-specific knowledge employed in physics-guided learning of neural networks greatly increases model performance for multi-timescale objectives, which pitfalls depend on data quality and what their probable solutions are, and how the marriage between engineering knowledge and dedicated sensor data is essential for modern structural health assessments of offshore wind support structures.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 2. Exemplary datasets of both the bending moment (orange) and the top level accelerations (blue) for a turbine in parked conditions. In (a), the signals' timeseries; in (b), their corresponding power spectral densities (PSD). It can clearly be observed that the two signals show a very similar behaviour, illustrating the strong link between the acceleration and fatigue data.

Fig. 4. Time series of measurements ($y$, in red) and predictions ($\hat{y}$, in blue) with 95% confidence interval bounds ($1.96\sigma$, with $\sigma$ the standard deviation) of the relative error for OWT1 FA using MLE. Predictions using MSLE ($\hat{y}$) are shown in orange.

Fig. 9. Model prediction error based on loss function (Minkowski logarithmic error, MLE, and mean squared logarithmic error, MSLE) expressed as a percentage of the long-term DEM for nominal and idling operational conditions for (a) fore-aft and (b) side-to-side. N.b. scales of (a) and (b) differ.

Table 1
Description of datasets from the measurement campaign. Note the conflicting sampling frequencies. Each data type is processed into 10-min target statistics.

Table 2
Comparison of models' performance for MSLE and MLE (coefficient of determination, $R^2$; RMSE expressed as a percentage of the mean DEM value, NRMSE (%); and long-term DEM error, defined as 100 × (measured − predicted)/measured long-term DEM) for the three turbines in each direction (fore-aft, FA, and side-to-side, SS). OWT1 (T), training turbine.