Data management for structural integrity assessment of offshore wind turbine support structures: data cleansing and missing data imputation

Structural Health Monitoring (SHM) and Condition Monitoring (CM) systems are currently used to collect data from offshore wind turbines (OWTs) in order to improve the estimation of their operational performance. However, industry-accepted practices for effectively managing the information that these systems provide have not yet been widely established. This paper presents a four-step methodological framework for the effective data management of SHM systems of OWTs and illustrates its applicability to real-time continuous data collected from three operational units, with the aim of using more complete and accurate datasets for the fatigue life assessment of support structures. Firstly, a time-efficient synchronisation method that enables the continuous monitoring of these systems is presented, followed by a novel approach to noise cleansing and the subsequent missing data imputation (MDI). With these techniques, data-points containing excessive noise are removed from the dataset (Step 2), advanced numerical tools are employed to regenerate missing data (Step 3) and fatigue is estimated from the results of these two methodologies (Step 4). Results show that, after cleansing, missing data can be imputed with an average absolute error of 2.1%, while this error is kept within the [+15.2%, −11.0%] range in 95% of cases. Furthermore, only 0.15% of the imputed data fell outside the noise thresholds. Fatigue is found to be underestimated both when data cleansing does not take place and when it takes place but MDI does not. This makes the methodology an enhancement to conventional structural integrity assessment techniques that do not employ continuous datasets in their analyses.


Introduction
Structural Health Monitoring Systems (SHMS) have become relevant in the last decade for the operational management of Offshore Wind Turbines (OWTs) due to their damage detection and continuous fatigue life assessment capabilities. Operation and Maintenance (O&M) related costs are a significant contributor to the Levelized Cost of Energy (LCoE) (Shafiee et al., 2016;Shafiee and Sørensen, 2018). While in the past SHMS were installed as a way to abide by the German regulations (imposing a 10% of assets instrumented across an offshore wind farm (OWF)) and not exploited to their full potential, nowadays operators have realized how these technologies could result in an increase in electricity production and thereby a reduction in LCoE (Ioannou et al., 2018;Myhr et al., 2014). Over the past decades, many researchers from the SHM community have developed an extensive amount of methods based on a variety of physically interpretable structural features (Hansen et al., 2017). At this point in time there is no widely accepted practice with respect to the specification of monitoring systems, as industry is still exploring WTs' potential, making every wind farm different in terms of technologies implemented, number and location of the sensors, redundancies, etc. Most of these fatigue assessment methods rely on collected data from either accelerometers, strain gauges or the combination of both from selected instrumented units (Luengo and Kolios, 2015;Martinez-Luengo et al., 2016). Numerous authors have carried out different ways of analysing SHMS' data -for example, a vibration-based damage localization and quantification method, based on natural frequencies and mode shapes extracted by means of Operational Modal Analysis (OMA) combined with Finite Element Analysis (FEA) of the test structure (Hansen et al., 2017).
Another approach to fatigue assessment is the extrapolation of the dynamic behaviour of OWTs from a limited set of sensors. Existing monitoring strategies for monopiles are based on physical models or artificial intelligence (Ziegler et al., 2017). Model-based time-domain algorithms require accelerometers and sometimes strain gauges on the structure. These try to reproduce the time history of dynamic response parameters, such as acceleration or strain, of the whole structure for different operational regimes. This has been carried out by employing Kalman filters (Maes et al., 2016a; Fallais et al., 2016), joint input-state estimation (Maes et al., 2016b) and modal expansion algorithms (Maes et al., 2016b; Iliopoulos et al., 2014, 2016). Even though accelerometers might be placed in the WT's nacelle for Supervisory Control And Data Acquisition (SCADA) or Condition Monitoring (CM) purposes, they are less often placed at different levels of the Support Structure (SS), unless there is a particular interest in its vibration monitoring, because installing accelerometers at different levels of the turbine is more expensive. Besides, accelerometers alone do not cover all the frequencies needed for modal expansion algorithms, as Maes et al. (2016b) explain, making strain gauges also necessary. Furthermore, sometimes WTs are only instrumented with strain gauges, especially those commissioned more than five years ago.
https://doi.org/10.1016/j.oceaneng.2019.01.003
Received 20 May 2018; Received in revised form 12 November 2018; Accepted 2 January 2019
Most fatigue-sensitive spots (called hot spots) in OWTs are inaccessible for direct measurements, e.g. at welds or at the mudline (Iliopoulos et al., 2017). Different methods have been used to accurately predict the structure's response at these important locations where strain gauges cannot be installed. This is achieved by combining measurements from sub-optimal locations with FEA (Ziegler et al., 2017; Iliopoulos et al., 2014, 2017; Martinez-Luengo et al., 2017; Gentils et al., 2017), to extrapolate to the critical locations.
Sometimes datasets at accessible locations are not complete due to failure in acquiring or recording the data, full storage space, high noise, etc. The issue of limited information due to limited availability of operational data could be mitigated by these FEAs. This hypothesis has been supported in different articles, where accurate load estimation is believed to be best carried out with data-driven models requiring only a short period of mechanical strain measurements (Smolka and Cheng, 2013). However, this approach of using incomplete SHM datasets is questionable, not only because it derives service life estimations from reduced time intervals, but also because it introduces uncertainty in the estimations and wastes costly SHM data that could potentially be used for damage detection and quantification strategies.
As mentioned earlier, OMA introduces uncertainty in the estimations. According to Banfi and Carassale, the available mathematical OMA techniques have the common feature that the unmeasured excitation is modelled as a random process specified by some probabilistic models (Banfi et al., 2017). In practical applications, the length of the measurement is limited and the probabilistic model adopted to represent the excitation does not necessarily apply. This, together with measurement errors, leads to uncertainties of a different nature that affect the estimation of the modal parameters. Finally, it seems impractical to install multiple sensors and dedicate resources to analysing their measurements, without obtaining a long-term view of how the system behaves and degrades.
Noise is inherent to data acquisition. Signals in realistic applications are inevitably contaminated with measurement noise, as well as other sorts of variability and uncertainty, such as calibration issues, transmission issues or de-synchronisation between the real and the recorded time-stamp. As a result, the SHM features extracted from the contaminated data, such as damage equivalent loads (DELs), power spectrum and frequency response function, are also noisy (… Todd, 2012, 2013). Uncertainty can contaminate the extracted SHM features dramatically if the data quality is poor, and thereby cause ambiguity in interpreting the features (Sarrafi and Mao, 2017). Usually the uncertainty will raise false alarms in damage detection, i.e. non-damage-induced feature deviation from the undamaged baseline. Therefore, noise identification and quantification in SHMS data should not be ignored and, ideally, should take place before fatigue assessment. Besides, a systematic approach to the effective management of data from SHMS installed in OWTs, prior to fatigue assessment, has not yet been established either in the literature or by regulations.
This paper aims to develop a methodological framework for the effective data management of SHMS of OWTs by addressing the issues of missing data and noise in the acquired data, which influence the effective fatigue assessment of offshore wind (OW) energy assets. This is achieved through a four-step process (including synchronisation, cleansing, imputation and fatigue assessment) that enables the continuous analysis of the unit's structural integrity and remaining service life throughout the years, as highlighted in Fig. 1. This novel framework is implemented using real, continuously monitored 50 Hz strain data collected over more than three years from three different OWTs currently in operation. These turbines were instrumented with SHMS during their commissioning. Therefore, it is assumed that the turbines sustained no fatigue damage from the commissioning phase that was not captured, and that the noise/calibration error present in the measurements used to derive the dynamic structural behaviour of the units is minimal.
M. Martinez-Luengo et al. Ocean Engineering 173 (2019) 867-883
This article highlights the importance of appropriate data handling of SHMS for the continuous fatigue assessment of an OWT's SS. The four-stage methodology proposed in Fig. 1 is implemented in Section 2 and its results discussed in Section 3. After data synchronisation takes place, noise cleansing and missing data imputation (MDI) are applied. Their efficiency for the better assessment of the structure's integrity is analysed in Section 3, where the impact that noise cleansing has in the accuracy of MDI is shown. During the noise cleansing stage, the inherent dynamic relationships between different parts of the SS are derived and those dynamic responses significantly deviating from them are cleansed. In the third stage, missing data present in these datasets is imputed for both the non-cleansed and the cleansed scenarios. The accuracy of the imputation is shown by the comparison between the imputed and the exact data values. Finally, fatigue is estimated for four different scenarios: without cleansing/without MDI, without cleansing/ with MDI, with cleansing/without MDI and with cleansing/with MDI.
Results show that the proposed data management framework could help the OW industry to derive more accurate fatigue life estimations to help push the boundaries of current operational periods and make the technology more competitive by reducing its LCoE.

Data synchronisation
Modern WTs are equipped with sophisticated SCADA control systems that record, on a 10-min time basis, a vast amount of information, including details on the wind flow and meteorological conditions, turbine alignment to the wind, the conversion of wind kinetic energy into active power, the vibrational and mechanical status of the machine, thermal conditions at relevant parts of the turbines, and so on (Castellani et al., 2017). SHMS data are physically collected from the OWT from time to time (at the discretion of the operator), as the local storage capacity is limited. This often coincides with regular inspection activities. Once data have been collected, environmental data (from both SCADA and metmast) are synchronised so that there is one measurement for each time-step. Typically, wind measurements are recorded every 10 min, and metmast measurements every 30 min. As a result, two synchronisation approaches could be considered: every 10 min, by keeping wave measurements constant over each 30-min interval, or every 30 min, by averaging wind conditions. In this analysis the 10-min interval dataset is chosen, as wind is considered to contribute more than waves to the overall loading that the structure is subject to. Therefore, by having 10-min intervals, wind variability is more accurately captured.
Strain data would typically need to be temperature normalised. Each strain gauge that requires compensation has an associated temperature channel and a set of apparent strain coefficients. The temperature-compensated strain is therefore the measured strain minus the apparent strain, which depends on the temperature and the sensor material properties. The apparent strain ($\varepsilon_A$) is calculated as:
$$\varepsilon_A = C_0 + C_1 T + C_2 T^2 + C_3 T^3 + C_4 T^4 \qquad (1)$$
where $T$ is the temperature (°C) and $C_0$ to $C_4$ are the coefficients for the gauge batch.
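As an illustration of the temperature compensation described above, the following minimal sketch assumes that the apparent strain is a fourth-order polynomial in temperature with coefficients C0 to C4. The function names and the coefficient values are hypothetical and are not taken from the monitored turbines.

```python
def apparent_strain(temp_c, coeffs):
    """Evaluate C0 + C1*T + C2*T**2 + C3*T**3 + C4*T**4 using Horner's scheme.

    coeffs is ordered [C0, C1, C2, C3, C4] (assumed gauge-batch coefficients).
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * temp_c + c
    return result


def compensated_strain(measured_strain, temp_c, coeffs):
    """Temperature-compensated strain = measured strain - apparent strain."""
    return measured_strain - apparent_strain(temp_c, coeffs)


# Hypothetical coefficients, for illustration only.
example_coeffs = [1.0, 2.0, 3.0, 0.0, 0.0]
example_value = compensated_strain(500.0, 10.0, example_coeffs)
```

In practice the coefficients would come from the gauge batch data sheet, with one temperature channel per compensated gauge.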
The length of the dataset also needs to be reduced, as handling 50 Hz strain data is neither time- nor cost-efficient. Furthermore, its synchronisation with environmental conditions would be problematic, as there would be 30,000 strain measurements for each 10-min measurement of environmental conditions. A solution to this issue is to calculate the Damage Equivalent Loads (DELs) for 10-min intervals (Schutz, 1996; Ziegler and Muskulus, 2016; Cosack, 2010). A DEL is the single load that would cause the same damage as the cumulative effect of the loads over the established interval, which in this case is 10 or 30 min. It is calculated with the following formula (Schutz, 1996; Ziegler and Muskulus, 2016; Cosack, 2010):
$$DEL = \left( \frac{\sum_i n_i \, \Delta\sigma_i^{m}}{N_{eq}} \right)^{1/m} \qquad (2)$$
where $n_i$ is the number of cycles at the $i$-th stress range, $\Delta\sigma_i$ is that stress range, $N_{eq}$ is a fixed number of cycles and $m$ is the slope of the S-N curve. Values for $N_{eq}$ and $m$ can be obtained from standards such as the volume dedicated to fatigue design of offshore steel structures, DNVGL-RP-C203 (DNV (Det Norske Veritas), a). Once the DELs are calculated, resulting in a single dataset containing 18 months of continuous data, strain data can be synchronised with environmental data (provided that environmental data were previously synchronised at the same frequency as the strain data). The result is a single dataset containing SCADA, environmental (wind, wave and generator's active power) and strain data for every 10 min. Any redundant data are identified and removed during this process, reducing the length of the dataset and avoiding double-counting fatigue cycles.
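The DEL reduction of one 10-min window can be sketched as follows. The sketch assumes the usual form DEL = (Σ n_i Δσ_i^m / N_eq)^(1/m) and that cycle counting (for example rainflow counting) has already produced stress ranges and cycle counts. The function name and the example numbers are hypothetical.

```python
def damage_equivalent_load(stress_ranges, cycle_counts, m, n_eq):
    """Return the DEL of one interval from counted cycles.

    stress_ranges: stress range of each bin (assumed output of cycle counting)
    cycle_counts:  number of cycles in each bin
    m:             slope of the S-N curve
    n_eq:          fixed reference number of cycles
    """
    if len(stress_ranges) != len(cycle_counts):
        raise ValueError("stress_ranges and cycle_counts must have equal length")
    damage_sum = sum(n * s ** m for s, n in zip(stress_ranges, cycle_counts))
    return (damage_sum / n_eq) ** (1.0 / m)


# Hypothetical window: 50 Hz sampling over 10 min gives 50 * 600 = 30,000 samples,
# which are reduced here to a single DEL value.
del_value = damage_equivalent_load([5.0, 12.0, 20.0], [400, 80, 5], m=5, n_eq=600)
```

A useful sanity check is that a window containing exactly N_eq cycles of a single stress range returns that stress range.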

Data cleansing
Before these datasets can be used for fatigue analysis, and following the Statistical Pattern Recognition Paradigm (for more information see (Martinez-Luengo et al., 2016)), data cleansing must take place. In the offshore wind energy context, cleansing is understood as two operations:
- The removal of abnormal data, i.e. data believed to be abnormal not because of damage but because of external conditions, e.g. the malfunctioning of a sensor.
- The removal of noisy data. This occurs when sensors record noisy measurements.
Strain gauges generally record noisy measurements in the presence of electric and/or magnetic fields, which can superimpose electrical noise on the measurement signals. If not controlled, the noise can lead to inaccurate results and incorrect interpretation of the strain signals (Vishay Precision Group, 2013). Even though sensors for SHM of OWT support structures are placed well below the nacelle, and are therefore not exposed to its electric and magnetic fields, interference could occur if they are placed close to the J-tube in the TP. Other sources that could potentially introduce noise in the strain measurements are: transformers, relays, generators, rotating equipment, radio transmitters, electrical storms, poor insulation of the sensor during installation, transient vibrations, etc. In summary, any electrical device that generates, consumes, or transmits power is a potential source of noise in strain gauge circuits. In general, the higher the voltage or current level, and the closer the strain gauge circuit is to the electrical device, the greater the induced noise will be (Vishay Precision Group, 2013). It is difficult to know whether an individual sensor is recording noise and what the magnitude of this noise is, but due to the embedded redundancy in the SHMS, the relative noise can be accounted for by comparing the readings of two correlated sensors at each time-stamp. The concept of correlating sensors rests on the premise that, depending on wind direction, different pairs of sensors will exhibit behaviour of a similar trend. This is illustrated in Figs. 2 and 3. In Fig. 2, for that particular day, sensors A and C were correlated and therefore exhibit the same trend in DELs measurements (even though there is some offset between them). Fig. 3 shows that these same two sensors (A and C) did not exhibit the same trend on another day, when the wind direction did not place them in correlation. When two sensors are not in correlation at a particular moment, it does not necessarily mean that there is noise in their measurements. It only implies that the parts of the SS where these two sensors are placed are not physically experiencing the same trend of stress and, therefore, the sensors are not measuring the same trend of strains (for a particular direction). This correlation between sensors makes it possible to characterise the offset between DELs measurements when two sensors are in correlation, and it can therefore be employed to cleanse the dataset whenever the noise between a pair of sensors is higher than a previously established level.
In this paper, we propose a novel approach for noise identification and removal. For that approach, analysis of the sensors that are in correlation for particular intervals, depending on the wind direction, is carried out. The term "correlation" is understood to be two sensors following the same behaviour or trend in measurements, even though there might be an offset between the two.
Initially, in order to determine which sensors are in correlation at different wind directions, the dataset was divided into 20deg intervals (18 in total) and DELs were plotted for each wind direction angle (see Fig. 4 where DELs from different sensors are plotted). As it can be appreciated in this figure, sensors 1 and 3 seem to be following a similar uniform trend in Orientation 2 (between 20 and 40 deg of wind direction); however, for Orientation 15 (280-300 deg) it seems that their measurements are much more distorted. This procedure was repeated several times to find which sensors would correlate at each Orientation.
Nevertheless, these graphs do not show precisely the differences between sensor readings and their evolution. For this reason they are not an accurate way of determining whether a new point would be within reasonable limits of noise. To solve that, noise thresholds have to be defined in a way that, when the noise level of a particular measurement happens to fall above a predefined threshold, the data-point is automatically excluded from the final dataset.
Noise thresholds are determined by calculating the difference between two sensors' measurements, for all wind directions. The value of this difference tends to be a stable value or offset, which may be zero when the pair of sensors are in perfect correlation. This means that, even if the difference (the offset) is constant around a certain value (the mean of its statistical distribution), the standard deviation is significantly lower whenever the sensors are in correlation and higher when they are not. In order to be exhaustive, all possible sensor combinations for each of the 18 orientations were analysed. For each orientation, the total number of possible combinations $C$ is:
$$C = \binom{n}{2} = \frac{n(n-1)}{2} \qquad (3)$$
$n$ being the number of sensors, which in this case is eight, giving $C = 28$. Afterwards, a normal distribution was fitted to all computed values of the 28 sensor combinations for every orientation. The mean of the normal distribution determines the offset between the measurements. This offset constitutes the difference between the dynamic responses of the two sensors of the combination being analysed. The best indicator of the correlation between two sensors is the standard deviation of the difference between their measurements. The smaller this is, the more correlated the measurements are, as the sensors' measurements follow a more similar trend. In order to automate data cleansing throughout the life of a structure, the noise thresholds first need to be set. This is achieved by analysing the correlation between sensors for small intervals of wind direction right at the beginning of the operation of the analysed asset. Five-degree intervals were selected for this purpose for two reasons: to capture the slightest variability of these correlations with enough accuracy, and to have enough data-points to subsequently define the polynomials that constitute the noise thresholds.
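The pairing and the per-pair statistics described above can be sketched with the standard library alone. The function names and the toy DEL values are hypothetical, and the sample mean and sample standard deviation are used here in place of an explicit normal-distribution fit.

```python
from itertools import combinations
from statistics import mean, stdev


def sensor_pairs(n_sensors):
    """All unordered sensor pairs (i, j) with i < j, numbered from 1."""
    return list(combinations(range(1, n_sensors + 1), 2))


def pair_difference_stats(dels_by_sensor, pair):
    """Offset (mean) and spread (standard deviation) of DEL_i - DEL_j.

    dels_by_sensor maps a sensor number to its DEL series for one
    wind-direction interval; both series must share the same time-steps.
    """
    i, j = pair
    diffs = [a - b for a, b in zip(dels_by_sensor[i], dels_by_sensor[j])]
    return mean(diffs), stdev(diffs)


pairs = sensor_pairs(8)  # eight sensors give 8 * 7 / 2 = 28 pairs

# Toy data for one orientation: sensors 1 and 2 follow the same trend
# with a constant offset, so the spread of their difference is zero.
toy_dels = {1: [10.0, 11.0, 12.0], 2: [8.0, 9.0, 10.0]}
offset, spread = pair_difference_stats(toy_dels, (1, 2))
```

A small spread with a non-zero offset is the signature of two correlated sensors in the sense used in this section.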
In order to define the noise thresholds, the dataset is divided into five-degree intervals according to wind direction, which results in a total of 72 intervals (360/5 = 72). Also, for each data-point the 28 sensor combinations are computed. If, for each sensor combination (among the 28 possible combinations), the mean and standard deviation at all 72 orientations are plotted on a graph and a polynomial is fitted to the points, the boundaries of the admissible noise can be set. This can be observed in Fig. 5, where the mean value (blue line) and the mean value plus and minus the standard deviation (black dashed lines) of the difference between sensors' readings, every five degrees, are plotted. Fifth-order polynomials were fitted to the points. The order of these polynomials was determined by optimising the fitting error. Fig. 5 shows the particular case of the sensor 2-6 combination. Furthermore, the two red polynomials represent a 20% noise allowance. This noise allowance is set to be 20% of the standard deviation at each orientation. For each new measurement, if the difference between the sensors' measurements is higher than the upper threshold or lower than the lower threshold (red lines), the data-point is considered to have excessive noise and is therefore excluded from the dataset. In Fig. 5, the mean value of the difference in DELs measured by sensors 2 and 6 is shown in blue. This difference is:
$$\Delta DELs_{2-6} = DELs_2 - DELs_6 \qquad (4)$$
This mean value represents the offset of the measurements due both to the measurement of different physical states of the structure and to differences in calibration when the sensors were installed. The closer $\Delta DELs_{2-6}$ for a particular data-point gets to the mean, for a given wind direction, the less noise the sensors' readings will have.
A certain noise or variation in $\Delta DELs_{2-6}$ will still be expected, and its magnitude will depend upon the level of correlation these two sensors experience across the wind directions. This is measured by the standard deviation, which determines how spread the values are in a normal distribution: about 68% of the values lie within one standard deviation of the mean (95% within two and 99.7% within three). The closer the mean + standard deviation is to zero, the more correlated the sensors are and, therefore, the more similar the trends they record. When a dataset is cleansed, the 28 different sensor relationships are computed at each time-step. The wind direction is then used to extract the upper and lower noise thresholds for each combination, which are compared to the computed values of the combinations. A noise matrix is filled for each data-point of the set. Whenever the computed value is within the established thresholds, a 1 is entered in the noise matrix. If the measured difference falls outside the thresholds, the noise in the measurement is considered 'too high' and a 0 is entered instead. The noise matrix is composed of the following relationships:
$$\text{NoiseMatrix} = \begin{bmatrix}
\text{NaN} & 1\text{-}2 & 1\text{-}3 & 1\text{-}4 & 1\text{-}5 & 1\text{-}6 & 1\text{-}7 & 1\text{-}8 \\
1\text{-}2 & \text{NaN} & 2\text{-}3 & 2\text{-}4 & 2\text{-}5 & 2\text{-}6 & 2\text{-}7 & 2\text{-}8 \\
1\text{-}3 & 2\text{-}3 & \text{NaN} & 3\text{-}4 & 3\text{-}5 & 3\text{-}6 & 3\text{-}7 & 3\text{-}8 \\
1\text{-}4 & 2\text{-}4 & 3\text{-}4 & \text{NaN} & 4\text{-}5 & 4\text{-}6 & 4\text{-}7 & 4\text{-}8 \\
1\text{-}5 & 2\text{-}5 & 3\text{-}5 & 4\text{-}5 & \text{NaN} & 5\text{-}6 & 5\text{-}7 & 5\text{-}8 \\
1\text{-}6 & 2\text{-}6 & 3\text{-}6 & 4\text{-}6 & 5\text{-}6 & \text{NaN} & 6\text{-}7 & 6\text{-}8 \\
1\text{-}7 & 2\text{-}7 & 3\text{-}7 & 4\text{-}7 & 5\text{-}7 & 6\text{-}7 & \text{NaN} & 7\text{-}8 \\
1\text{-}8 & 2\text{-}8 & 3\text{-}8 & 4\text{-}8 & 5\text{-}8 & 6\text{-}8 & 7\text{-}8 & \text{NaN}
\end{bmatrix} \qquad (5)$$
where the relationships between a sensor and itself are not considered (the difference is zero by definition) and are therefore marked as NaN ("Not a Number"). Furthermore, inverse relationships are treated so that the first sensor of the difference is always the lowest number (i.e. $\Delta DELs_{2-1}$ is never computed because it has similar characteristics to $\Delta DELs_{1-2}$; therefore $\Delta DELs_{2-1}$ is replaced by $\Delta DELs_{1-2}$ in the matrix). This makes the matrix symmetric, which facilitates determining which of the sensors in a particular combination is the one presenting noise (or whether both are).
Note: For clarity, the matrix above only shows the sensors' combinations.
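The threshold check for a single sensor pair can be sketched as below. The sketch assumes that the upper and lower thresholds are the fitted mean plus and minus the fitted standard deviation enlarged by the 20% allowance, i.e. mean ± 1.2 × std. This reading of the allowance, the function names and the polynomial coefficients are all assumptions made for illustration.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients ordered from highest power to lowest."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def noise_thresholds(mean_poly, std_poly, wind_dir_deg, allowance=0.2):
    """Lower and upper admissible values of the DEL difference at a wind direction.

    Assumption: threshold = mean +/- (1 + allowance) * std, both from fitted polynomials.
    """
    mu = polyval(mean_poly, wind_dir_deg)
    sigma = polyval(std_poly, wind_dir_deg)
    half_width = (1.0 + allowance) * sigma
    return mu - half_width, mu + half_width


def noise_flag(difference, mean_poly, std_poly, wind_dir_deg, allowance=0.2):
    """Return 1 if the difference is admissible, 0 if the noise is too high."""
    lower, upper = noise_thresholds(mean_poly, std_poly, wind_dir_deg, allowance)
    return 1 if lower <= difference <= upper else 0


# Hypothetical constant polynomials: offset of 2.0 and standard deviation of 1.0.
flag_ok = noise_flag(3.0, [2.0], [1.0], wind_dir_deg=135.0)
flag_noisy = noise_flag(3.5, [2.0], [1.0], wind_dir_deg=135.0)
```

With fifth-order fits, each polynomial would simply carry six coefficients instead of one.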
When a dataset is being cleansed, the 28 different sensor relationships at each time-step are computed and compared to the noise thresholds to determine whether the noise they present is admissible (admissible = 1, inadmissible = 0). For each time-step, the noise matrix is filled in binary. Once the noise matrix is complete for a particular time-step, each sensor of a combination is checked whenever noise is detected for that combination. For example, if the combination 1-2 (shown in red in Fig. 6) has noise, either sensor 1, sensor 2, or both of them could have noise. The criterion used to decide which of the sensors has noise, or whether both do, is to check the overall performance of the sensors at the given time-step. For this case, all the relationships involving sensor 1 (first row) and sensor 2 (second column) are checked, with three potential outcomes:
- The majority of sensor 1's combinations have noise but sensor 2's do not (sum(NoiseMatrix(1,:) == 0) >= 4). Therefore, sensor 1's value is deleted due to excessive noise, but not sensor 2's value.
- The majority of sensor 2's combinations have noise but sensor 1's do not (sum(NoiseMatrix(:,2) == 0) >= 4). Therefore, sensor 2's value is deleted due to excessive noise, but not sensor 1's value.
- Both sensors' combinations have noise (sum(NoiseMatrix(1,:) == 0) >= 4 && sum(NoiseMatrix(:,2) == 0) >= 4). Therefore, both sensors' values are deleted.
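The majority rule above can be sketched as follows. The sketch assumes that a sensor is flagged when at least four of its seven pairwise entries are inadmissible (value 0), with `None` standing in for the NaN diagonal. The function names are hypothetical.

```python
def build_noise_matrix(n_sensors, noisy_pairs):
    """Symmetric matrix: 1 = admissible, 0 = too noisy, None on the diagonal.

    noisy_pairs is an iterable of (i, j) sensor numbers, counted from 1.
    """
    matrix = [[None if r == c else 1 for c in range(n_sensors)] for r in range(n_sensors)]
    for i, j in noisy_pairs:
        matrix[i - 1][j - 1] = 0
        matrix[j - 1][i - 1] = 0
    return matrix


def sensors_to_delete(noise_matrix, majority=4):
    """Sensor numbers whose noisy pairwise relationships reach the majority.

    A sensor with no noisy relationship can never reach the majority,
    so only sensors involved in at least one noisy pair are returned.
    """
    flagged = []
    for index, row in enumerate(noise_matrix):
        noisy_count = sum(1 for value in row if value == 0)
        if noisy_count >= majority:
            flagged.append(index + 1)
    return flagged


# Sensor 1 disagrees with sensors 2 to 5, which agree with everyone else.
matrix = build_noise_matrix(8, [(1, 2), (1, 3), (1, 4), (1, 5)])
to_delete = sensors_to_delete(matrix)  # only sensor 1 reaches the majority
```

Because the matrix is symmetric, counting along a row or along a column gives the same result for a given sensor.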

Missing data imputation
After noise is removed, data is checked using the criterion of completeness, making sure that information is not corrupted. Missing data is a challenge faced in almost every empirical analysis but especially in engineering applications employing sensing technologies. These technologies are by no means infallible as they can present different types of failure modes in the data collection. Some of these are: calibration, noise, transmission and data storing issues, and also those related to the reliability and failure mechanisms of the data acquisition system (composed of the sensing technologies, transmission and storage of the measurements). Current practice in the OW industry would ignore the missing data and select reduced intervals of complete time series that are believed to be representative, to carry out their analysis. This approach is practical for time-consuming studies; however, precious data are discarded in the process. Having complete datasets free of noise would, without doubt, enhance the confidence in the fatigue life analysis and allow more realistic remaining service life estimations.
An effective way of dealing with missing data from SHMS of OWTs is through employing Artificial Neural Networks (ANN). This method was chosen as the best approach due to its applicability, accuracy and consistency with the analytic software used for other data management activities during this project (Gheyas and Smith, 2010;Kolios et al., 2018;Lazakis et al., 2018). Other relevant methods for MDI are: mean imputation (Hawthorne and Elliott, 2005), K-nearest neighbour, Maximum Likelihood (Dempster et al., 1977;Enders, 2001;Eason, Bond, Lozev, n.d.) and Multiple Imputation methods (Richman et al., 2009;Reilly and Pepe, 1997). Fig. 7 shows the methodology followed for MDI using ANN.
In order to train the ANN, input and output matrices need to be specified. This process might seem trivial but often one of the most recurrent issues with SHM is the excess of non-necessary data and how to determine which data should/should not be analysed. For this application the relevant input variables include: wind speed, wind direction, generator active power, significant wave height and wave direction. Output data is constituted by the eight sensors previously utilised for data cleansing. These sensors are located at the transition piece of the turbines. Once the input matrices for each dataset are created, the statistical distributions of each input are derived. As can be appreciated from Fig. 8, normal, Rayleigh and kernel distributions were fitted to the inputs -kernel distribution being the best fit, among others, to the available empirical data. A kernel distribution is a nonparametric representation of the probability density function of a random variable (Matlab, 2016). Kernel distributions are used when a parametric distribution cannot properly describe the data, also when assumptions about the distribution of the data are better to be avoided. Kernel distributions are defined by a smoothing function and a bandwidth value, which control the smoothness of the resulting density curve.
Furthermore, from the initial dataset, a similar percentage to the one of the data removed during the cleansing would be deleted from both the original and the cleansed datasets. These removed data are imputed with the ANNs described later and their results compared to the originals in order to assess the level of confidence that can be given to these estimations. Further details are explained in Section 3.2.
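The validation idea in the paragraph above (delete known values, impute them, compare against the originals) can be sketched independently of the ANN itself. The function names, the fixed random seed and the percentage-error metric are assumptions for illustration; the imputation model is represented only by its output values.

```python
import random
from statistics import mean


def hold_out_indices(n_points, fraction, seed=0):
    """Randomly choose which data-points to delete for validation."""
    rng = random.Random(seed)
    n_removed = round(n_points * fraction)
    return sorted(rng.sample(range(n_points), n_removed))


def mean_absolute_percentage_error(true_values, imputed_values):
    """Average of |imputed - true| / |true|, expressed in percent."""
    errors = [abs(i - t) / abs(t) for t, i in zip(true_values, imputed_values)]
    return 100.0 * mean(errors)


# Toy example: remove 10% of 100 points, then compare two imputed values
# (hypothetical model output) against the values that were deleted.
removed = hold_out_indices(100, 0.10)
error_pct = mean_absolute_percentage_error([100.0, 200.0], [110.0, 190.0])
```

Signed percentage errors could be collected in the same way to estimate the range that contains 95% of the cases.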

Fatigue assessment
The ultimate aim of this framework is to develop a data management tool that supports fatigue calculations for SS of OWTs. In order to do so, data cleansing and MDI techniques were applied to real SHM data from three WTs, obtained from a continuous monitoring campaign. The fatigue that these three turbines are subject to during the monitoring campaign is therefore assessed for the four possible scenarios, as summarised in Fig. 9. An initial dataset, with no manipulation other than eliminating missing data, is used for Case A (without cleansing/without MDI scenario). Case A is used to train the ANN mentioned in Section 2.2, which imputes the missing data from the original dataset, constituting Case B (without cleansing/with MDI scenario). On the other hand, Case C (with cleansing/without MDI scenario) is obtained when data are cleansed and missing data are removed from the dataset afterwards. This has the implication that only high quality data (without noise) are used for the calculation. However, the length of the dataset is significantly reduced, which also diminishes the confidence in the remaining service life estimations. Lastly, Case D (with cleansing/with MDI scenario) is obtained by employing Case C's dataset to train an ANN, which imputes the missing data previously removed after the cleansing took place.
Fig. 7. Missing data imputation framework.
The two most commonly used fatigue assessment techniques are the stress life (S-N) approach and the fracture mechanics approach (Martinez-Luengo et al., 2017). The S-N curve approach is the one recommended by DNV and IEC standards (see (V (Det Norske Veritas).b)) due to its straightforward implementation. A review of the currently used S-N curves is provided in (Brennan and Tavares, 2014). Furthermore, the equivalent stress range ΔS is determined from the four different datasets, previously mentioned, by calculating the DEL of the whole dataset in the same way as in Section 3.1. Having obtained the equivalent stress range, the number of loading cycles to crack initiation, in Equation (6), can then be determined from the S-N curve, expressed as: Fig. 8. a) Histogram of real input data b) statistical distributions fitted to real input data.
where A is the intercept and m is the slope of the S-N curve in the log-log plot (DNV (Det Norske Veritas).b). The selection of the S-N curve has a strong influence on the results obtained. S-N curves are generally classified according to the environment (in air, in seawater with adequate cathodic protection, or under free corrosion conditions) and are taken from DNV-RP-C203 "Fatigue Strength Analyses of Offshore Steel Structures" (DNV (Det Norske Veritas), 2005). Offshore structures are prone to corrosion owing to the harsh marine environment, which leads to significant levels of damage to the structures and hence a reduction in service life (Adedipe et al., 2016). For that reason, curve D in seawater with adequate cathodic protection is used in the service life calculations, with an intercept A = 15.6 and a slope m = 5.
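As an illustration of how the intercept and slope enter the calculation, the sketch below assumes the usual log-linear S-N form, log10(N) = A − m·log10(ΔS), with the stress range in MPa. Both the functional form and the unit are assumptions made here for the example. The function name and the 10 MPa input are hypothetical.

```python
import math

def cycles_to_crack_initiation(delta_s_mpa, intercept_a=15.6, slope_m=5.0):
    """Cycles N from an assumed log-linear S-N curve.

    Assumed form: log10(N) = A - m * log10(delta_S), delta_S in MPa.
    Defaults are the curve D values quoted in the text (A = 15.6, m = 5).
    """
    if delta_s_mpa <= 0.0:
        raise ValueError("stress range must be positive")
    return 10.0 ** (intercept_a - slope_m * math.log10(delta_s_mpa))

# Hypothetical equivalent stress range of 10 MPa:
# log10(N) = 15.6 - 5 * 1 = 10.6
n_cycles = cycles_to_crack_initiation(10.0)
```

Because the slope is 5, the result is very sensitive to the stress range, which is why the accuracy of the equivalent stress range matters for the service life estimate.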

Results and discussion
In this Section, the results of the analyses described in Section 3 are presented. This analysis was performed on three WTs from the same OWF, which from now on are called 'Turbines 1, 2 and 3' for clarity purposes. Metocean, SCADA and strain data were available for the three turbines and synchronised, as explained in the previous section, before the data cleansing started. Also, all the data-points where the turbine should have been in operation (wind speeds of 4-25 m/s), but according to SCADA was shut down, were deleted from the dataset. This deletion is carried out so these non-operational intervals do not affect the data cleansing process. Fig. 10 shows how this filtered dataset follows the power curve.
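The removal of unexpected shutdowns can be sketched as a simple mask. This is an illustrative sketch only: the array names, the boolean SCADA running flag and the inclusive treatment of the 4 m/s and 25 m/s limits are assumptions, since the text does not specify how the SCADA status is encoded.

```python
import numpy as np

def keep_mask(wind_speed, is_running, cut_in=4.0, cut_out=25.0):
    """Return True for samples to keep.

    A sample is dropped when the wind speed lies in the operational range
    (cut_in to cut_out, limits assumed inclusive) but the SCADA flag says
    the turbine was not running.
    """
    wind_speed = np.asarray(wind_speed, dtype=float)
    is_running = np.asarray(is_running, dtype=bool)
    should_run = (wind_speed >= cut_in) & (wind_speed <= cut_out)
    return ~(should_run & ~is_running)

wind = np.array([2.0, 8.0, 8.0, 30.0])
running = np.array([False, True, False, False])
mask = keep_mask(wind, running)   # third sample is an unexpected shutdown
filtered_wind = wind[mask]
```

Samples below cut-in or above cut-out are retained even when the turbine is stopped, since a shutdown is the expected behaviour there.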

Data cleansing
In order to better capture the dynamic response of each turbine, the synchronised datasets are divided into five wind-speed intervals: two non-operational (0-4 m/s and > 25 m/s) and three operational (4-11 m/s, 11-18 m/s and 18-25 m/s). This division was chosen as a compromise between capturing the behaviour of the turbines and retaining enough data in each interval for the statistical analysis. The interval corresponding to wind speeds greater than 25 m/s had to be excluded from the cleansing because it contained too few samples for a statistical analysis. The only data-point in this interval remained 'uncleansed' in the final dataset, as it was impossible to determine whether it contained noise; it was therefore assumed to be noise-free.
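The division into wind-speed intervals can be sketched with `numpy.digitize`. The interval edges come from the text; the zero-based class indices, the labels and the choice to assign a boundary value (for example exactly 4 m/s) to the upper interval are assumptions of this sketch.

```python
import numpy as np

# Interval edges in m/s, taken from the text.
EDGES_M_S = [4.0, 11.0, 18.0, 25.0]
LABELS = ["0-4 (non-operational)", "4-11", "11-18", "18-25",
          ">25 (non-operational)"]

def wind_class(wind_speed):
    """Zero-based interval index; edge values fall in the upper interval."""
    return np.digitize(np.asarray(wind_speed, dtype=float), EDGES_M_S)

speeds = np.array([2.0, 4.0, 10.9, 11.0, 20.0, 26.0])
classes = wind_class(speeds)
# Sample count per interval, used to check that each one holds enough
# data for a statistical analysis.
counts = np.bincount(classes, minlength=len(LABELS))
```

Counting the samples per interval in this way makes it easy to spot an interval, such as the one above 25 m/s, that is too sparsely populated to derive thresholds from.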
During the analysis, the polynomials that constitute the noise thresholds are extracted for each interval and for each of the 28 sensor combinations. Fig. 11a and b show examples of these noise thresholds. Fig. 11a shows the high level of physical correlation between sensors 1 and 2 at low wind speeds (0-4 m/s), with very steady mean and standard deviation values. A constant mean difference between two sensors implies that they are exposed to the same type of physical excitation, as the average offset between them does not vary significantly across wind directions. A constant standard deviation implies that the pair of sensors is continuously correlated, as the deviation of their readings from the mean value is constant across wind directions. Fig. 11b shows a different situation, in which the correlation of sensors 3 and 7 is strongly influenced by the wind direction, in a pattern similar to a sinusoidal wave. The standard deviation also varies more than in Fig. 11a, reaching local maxima in the valleys of the mean value distribution and local minima at its peaks. The noise thresholds are set to 20% of the standard deviation of the difference between sensors. Although this percentage might seem high, it was chosen as a reasonable trade-off between cleansing excessive noise and capturing deviations from the expected behaviour of the asset that could potentially accelerate fatigue damage. Excessive cleansing would remove expected phenomena, such as vibrations and sudden excitations that could locally affect the turbine (wind gusts, local wave impacts, propagation effects, or even localised damage).
This percentage ensures that an excessive amount of data is not discarded from further analysis; however, it may vary depending on the level of risk that each operator is willing to take. Fig. 12 shows the percentage of deleted data for each sensor, at the three turbines and for the different wind classes, which correspond to the wind-speed intervals previously mentioned (1: 0-4 m/s, 2: 4-11 m/s, 3: 11-18 m/s and 4: 18-25 m/s).
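One possible reading of the pairwise threshold procedure is sketched below. It is a sketch under several assumptions, not the authors' algorithm: the difference between two sensors is summarised per wind-direction sector, polynomials are fitted to the sector means and standard deviations, and a sample is flagged when it leaves a band around the fitted mean. The sector count, polynomial degree, function names and the `band` multiplier are all hypothetical; in particular, how the 20% figure in the text maps onto the band width is not specified here, so `band` is left as a free parameter.

```python
from itertools import combinations
import numpy as np

# The 28 sensor combinations are the unordered pairs of the 8 sensors.
SENSOR_PAIRS = list(combinations(range(1, 9), 2))

def _scaled(direction_deg):
    """Map 0-360 degrees to [-1, 1] to keep the polynomial fit well conditioned."""
    return np.asarray(direction_deg, dtype=float) / 180.0 - 1.0

def fit_thresholds(direction_deg, diff, n_sectors=12, degree=4):
    """Fit polynomials to the sector-wise mean and std of a sensor-pair difference.

    Assumes every direction sector contains at least one sample.
    """
    direction_deg = np.asarray(direction_deg, dtype=float)
    diff = np.asarray(diff, dtype=float)
    edges = np.linspace(0.0, 360.0, n_sectors + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    sector = np.clip(np.digitize(direction_deg, edges) - 1, 0, n_sectors - 1)
    means = np.array([diff[sector == k].mean() for k in range(n_sectors)])
    stds = np.array([diff[sector == k].std() for k in range(n_sectors)])
    x = _scaled(centres)
    return np.polyfit(x, means, degree), np.polyfit(x, stds, degree)

def flag_noise(direction_deg, diff, mean_poly, std_poly, band):
    """True where the difference leaves the band around the fitted mean."""
    x = _scaled(direction_deg)
    centre = np.polyval(mean_poly, x)
    half_width = band * np.polyval(std_poly, x)
    return np.abs(np.asarray(diff, dtype=float) - centre) > half_width

# Synthetic demonstration data (not from the monitoring campaign).
rng = np.random.default_rng(0)
direction = rng.uniform(0.0, 360.0, size=5000)
clean_diff = 1.0 + 0.1 * rng.standard_normal(5000)
mean_poly, std_poly = fit_thresholds(direction, clean_diff)

test_diff = clean_diff.copy()
test_diff[0] = 10.0                      # inject one obvious outlier
flags = flag_noise(direction, test_diff, mean_poly, std_poly, band=3.0)
```

A wider band discards less data but tolerates more noise, which is the trade-off the operator's risk appetite would decide.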

Missing data imputation
After data cleansing has taken place, the missing data from the reduced but more accurate datasets are imputed with the aim of obtaining more complete datasets for the fatigue assessment. ANNs with different structures are developed to perform this imputation and to determine whether the imputation becomes more accurate as a result of the data cleansing. Following the MDI framework, the three filtered and cleansed datasets from Turbines 1, 2 and 3 were used as inputs and outputs to train the ANNs. The ANN employed was a two-layer feedforward network, with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. The number of hidden neurons was optimised for each turbine. After the training, a percentage of data similar to that previously cleansed was randomly removed from each sensor of both the original and the cleansed datasets. A record of these randomly deleted data was kept in order to compute the deviation of the prediction from the real value (verification process). Three different algorithms for ANN training were considered: Scaled Conjugate Gradient, Levenberg-Marquardt and Bayesian Regularisation. Levenberg-Marquardt is recommended by (Matlab, 2016) for most problems, whereas for some noisy and small problems Bayesian Regularisation can take longer but achieve a better solution (Foresee and Hagan, 1997; Hagan and Menhaj, 1999). For large problems, however, Scaled Conjugate Gradient is recommended, as it uses gradient calculations, which are more memory-efficient than the Jacobian calculations used by the other two algorithms (Moller, 1993). Levenberg-Marquardt was finally chosen because it outperformed the others in terms of error (mean squared error (MSE) and residuals (R)), training performance, regression, number of iterations and training time. Figs. 13 and 14 show an example of the error histogram and regression chart. Missing data were imputed through a number of stochastic input values to the ANN.
A common problem with ANNs is overfitting, which occurs when the network has memorised the training examples but has not learned to generalise to new situations. This is typically the case when the performance on the training set is good but the performance on the test set is significantly worse. The solution in this case is to reduce the number of neurons. An example of overfitting is the ANN with 1000 neurons for Turbine 2, where the error is considerably higher than that of the 400-neuron ANN (see Fig. 16). In order to avoid overfitting while optimising the results, the best performing architecture was chosen for each turbine: 200, 400 and 1000 neurons for Turbines 1, 2 and 3, respectively. Figs. 15, 16 and 17 show the performance of the different ANN architectures for both the with-cleansing and without-cleansing cases.
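The network structure and the verification step described above can be sketched in a few lines. This sketch covers only the forward pass of a two-layer feedforward network and the random hold-out with a record of the removed values; it does not reproduce the Levenberg-Marquardt training. The logistic form of the sigmoid, the array shapes and the function names are assumptions made for illustration.

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """Two-layer feedforward pass: sigmoid hidden layer, linear output layer.

    The logistic function is assumed for the sigmoid transfer function.
    x: (n_samples, n_inputs), w1: (n_inputs, n_hidden), w2: (n_hidden, n_outputs)
    """
    hidden = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))
    return hidden @ w2 + b2

def hold_out(values, fraction, rng):
    """Randomly remove a fraction of samples, keeping a record for verification.

    Returns the masked series (NaN where removed), the removed indices and
    the true values at those indices.
    """
    values = np.asarray(values, dtype=float)
    n_removed = int(round(fraction * values.size))
    idx = rng.choice(values.size, size=n_removed, replace=False)
    masked = values.copy()
    masked[idx] = np.nan
    return masked, idx, values[idx]

# With zero hidden weights every hidden unit outputs 0.5, so four hidden
# units with unit output weights give 4 * 0.5 = 2.0.
out = forward(np.zeros((2, 3)), np.zeros((3, 4)), np.zeros(4),
              np.ones((4, 1)), np.zeros(1))

rng = np.random.default_rng(0)
masked, removed_idx, truth = hold_out(np.arange(100.0), 0.10, rng)
```

Comparing the network's predictions at `removed_idx` against `truth`, separately for the data used in training and for the held-out samples, is one way to expose the train/test gap that signals overfitting.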
Another aspect noticed during the cleansing process of Turbine 3 was that all measurements from sensor 8 were compromised, as they appeared to be two orders of magnitude lower than the expected values. The large mismatch in the MDI for this sensor is therefore not surprising. Furthermore, the results for Turbine 3 show that, for no apparent reason, axial sensors 1, 3, 5 and 7 are more challenging to impute; this effect is mitigated by the cleansing but is still noticeable. Fig. 18 compares the best performing ANNs trained with and without cleansed data for the three turbines. This figure shows that MDI performs better after data cleansing has taken place, as cleansing reduces not only the mean error of the imputation but also the standard deviation of this error. Thus, even in the few cases where the mean imputation error with prior data cleansing exceeded that without it (Turbine 1: Sensors 6, 7 and 8; Turbine 2: Sensors 7 and 8), the absolute error is still smaller with data cleansing.
When the performance is compared (see Fig. 19), results show that the average absolute error is 2.1%. Furthermore, in 95% of the cases (i.e. ± 2 standard deviations) the error is within the range [−11.0%, +15.2%]. This estimate was obtained by averaging the mean and standard deviation of the errors across the eight sensors of the three turbines (excluding Sensor 8 of Turbine 3, as mentioned before). Turbine 3 is the most challenging turbine to impute data for, with standard deviations of the imputation error exceeding 20%.
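The way such a range follows from a mean and a standard deviation can be sketched as below. The function name and the toy inputs are hypothetical, and the error is assumed to be a signed percentage of the true value. The last lines only back-calculate what the reported range implies under the ± 2 standard deviation reading; the resulting standard deviation is arithmetic on the reported figures, not a value stated in the text.

```python
import numpy as np

def imputation_error_summary(imputed, actual):
    """Mean, standard deviation and mean +/- 2 std band of the percentage error."""
    imputed = np.asarray(imputed, dtype=float)
    actual = np.asarray(actual, dtype=float)
    error_pct = 100.0 * (imputed - actual) / actual
    mean = error_pct.mean()
    std = error_pct.std()
    return mean, std, (mean - 2.0 * std, mean + 2.0 * std)

# Toy values: errors of +10% and -10%.
mean, std, band = imputation_error_summary([110.0, 90.0], [100.0, 100.0])

# Back-calculation from the reported range, assuming it is mean +/- 2 std.
reported_low, reported_high = -11.0, 15.2
implied_centre = 0.5 * (reported_low + reported_high)   # about 2.1
implied_std = 0.25 * (reported_high - reported_low)     # about 6.55
```

The centre of the reported range coincides with the reported 2.1% average, which is consistent with the range being built as the mean plus and minus two standard deviations.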
The errors presented in this section were calculated as the difference between the imputed and the exact values of the DELs. However, the errors reduce considerably when the check is instead whether the imputed values fall within the noise thresholds previously defined. These errors are presented in Table 1.

Fatigue assessment
Fatigue assessment constitutes the last step of the proposed methodology and the fundamental reason for its development. This section analyses the effect that data cleansing and MDI have on the current fatigue damage estimation. Fatigue assessment is normally based on incomplete datasets; being able to impute missing data therefore enhances the confidence in residual fatigue life estimations, as the number of samples increases and the estimations can become more accurate. However, the imputation needs to be precise and must not introduce noise or amplify biases in the estimations. Data cleansing is key to keeping noise out of the datasets. Figs. 20-22 show, for each of the three turbines under consideration, the effect that the four combinations of cleansing and MDI scenarios have on the fatigue calculations. This analysis takes Case D as its baseline, owing to the positive results obtained in the previous section, where missing data were shown to be imputed with an average absolute error of 2.1% and within the range [−11.0%, +15.2%] in 95% of cases.
According to Figs. 20 and 22, fatigue is underestimated when data cleansing and MDI are not performed. This is especially noticeable in Case A (without cleansing/without MDI) and Case B (without cleansing/with MDI). The cause is believed to be an excess of noise, which contributes to the collection of lower measurements and makes the stress ranges lower for the rainflow-counting algorithm. When data cleansing is not carried out but MDI is, stresses are occasionally overestimated (see Case B, sensors 1 and 6 in Fig. 20 and sensors 3 and 5 in Fig. 21). The reason is that the noise is picked up by the algorithm and reproduced, and the cumulative effect considerably increases the overall fatigue of the structure. Conversely, Case C, where data cleansing is carried out but MDI is not, is found to underestimate fatigue for the three turbines. The explanation for this is the dramatic reduction in the number of samples considered for the fatigue calculation (see Table 2).
The underestimation of fatigue when data cleansing is not carried out is particularly concerning. The implications of underestimating fatigue loads may seem small at this stage; however, these estimations were made after two years of operation and at non-critical locations. This means that, while the difference in fatigue damage is currently not an issue, after ten years of operation it could make a difference to the remaining service life calculations, when an underestimated stress range is introduced into the S-N curve. Furthermore, sensors are not installed at the turbines' hot-spots, meaning that the measurements they collect are potentially 5-10 times smaller than they could be at hot-spots (Martinez-Luengo et al., 2017). The underestimation of fatigue could therefore have a large impact at these hot-spots and on the remaining service life of the structure.
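A short worked example may help to show why a modest underestimation of the stress range matters. It assumes the usual log-linear S-N form with the slope m = 5 quoted for curve D, and a purely hypothetical 10% underestimation of the equivalent stress range; neither the form of the expression nor the 10% figure is taken from the results above.

```latex
\log_{10} N = A - m \log_{10} \Delta S
\quad\Longrightarrow\quad
N \propto \Delta S^{-m}

\frac{N_{\text{estimated}}}{N_{\text{true}}}
  = \left( \frac{\Delta S_{\text{estimated}}}{\Delta S_{\text{true}}} \right)^{-m}
  = 0.9^{-5} \approx 1.69
```

Under these assumptions, a stress range that is 10% too low would overstate the number of cycles to crack initiation by roughly 69%, because the error is raised to the fifth power.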

Conclusion and future work
In this study, a framework for the effective data management of SHMS was developed enabling the continuous analysis of offshore wind turbines' structural integrity throughout the life cycle. The synchronisation between environmental data (SCADA and metocean) and real, continuously monitored, 50 Hz strain data collected from three different OWTs currently in operation in the Irish Sea, led to datasets over three years long; however, these three datasets were incomplete. Noise cleansing and MDI were carried out with the purpose of determining their benefits in continuous fatigue assessment of offshore wind turbines.
Two scenarios were considered for each wind turbine: with and without noise cleansing. The results showed that, where data cleansing was carried out, the average imputation error was about 2.1%, and that in 95% of the cases the error was within the range [−11.0%, +15.2%]. The results indicate that noise cleansing and MDI can successfully be employed together to produce more complete datasets containing real, low-disturbance strain data. Furthermore, fatigue was estimated for four different cases, namely (i) without cleansing/without MDI (Case A), (ii) without cleansing/with MDI (Case B), (iii) with cleansing/without MDI (Case C), and (iv) with cleansing/with MDI (Case D). Results showed that for Turbines 1 and 3, fatigue was underestimated when data cleansing had not been performed. The cause is believed to be an excess of noise, which contributes to the collection of more uniform cycles of fatigue. In Case C, where data cleansing was carried out but MDI was not, fatigue was found to be underestimated for all three turbines. There was also an overestimation of fatigue in some sensors when data cleansing was not carried out but MDI was. The reason is that the noise is picked up by the MDI algorithm and reproduced, and the cumulative effect considerably increases the overall fatigue of the structure.
Currently, fatigue analyses are often performed on incomplete datasets. The methodology presented in this research makes it possible to enhance the confidence in fatigue life estimations by increasing the length of the datasets through, firstly, data cleansing and, secondly, MDI. The results obtained validate the two novel methodologies, making them suitable tools for better evaluation of offshore wind turbines' structural integrity. We are exploring opportunities to implement the proposed approaches in the wind energy sector, with the aim of deriving more accurate fatigue life estimations that help extend current operational periods and make the technology more competitive by reducing its LCoE. Further work could focus on accounting for the degradation in the accuracy of sensor readings (increase in noise) over the years, by comparing different periods across the life of a wind farm. A comparison between the performance of the proposed ANN method and other techniques, such as random forest, support vector machine (SVM) and Gaussian process regression, is on our research agenda.