Analysing RMS and peak values of vibration signals for condition monitoring of wind turbine gearboxes.

Wind turbines (WTs) are designed to operate under extreme environmental conditions. This means that the extreme and varying loads experienced by WT components need to be accounted for, as does the difficulty of gaining access to wind farms (WFs) at different times of the year. Condition monitoring (CM) is used by WF owners to assess WT health by detecting gearbox failures and to plan operations and maintenance (O&M). However, there are several challenges and limitations with commercially available CM technologies, ranging from the cost of installing monitoring systems to the ability to detect faults accurately. This study seeks to address some of these challenges by developing novel techniques for fault detection using the RMS and extreme (peak) values of vibration signals. The proposed techniques are based on three models (signal correlation, extreme vibration, and RMS intensity) and have been validated with a time-domain, data-driven approach using CM data from operational WTs. The findings of this study show that monitoring RMS and extreme values, using extreme value theory, serves as a leading indicator for early detection of faults, giving WF owners time to schedule O&M. Furthermore, the findings indicate that the prediction accuracy of each CM technique depends on the physics of failure. This suggests that an approach which incorporates the strengths of multiple techniques is needed for holistic health assessment of WT components. © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC license.


Introduction
The availability and the consequent O&M costs of WFs are influenced by the failure and downtime of WT components such as the gearbox. In offshore WFs, where repair procedures are complex and logistics are influenced by extreme weather conditions, component failure can lead to even longer WT downtimes [1,2]. These O&M issues have spurred the need for remote condition monitoring and assessment capabilities that detect faults in WT components early enough to plan O&M activities and minimise downtime. CM is gradually becoming the state-of-the-art approach for meeting this need in large multimegawatt and offshore WT applications, having been requested by certification bodies after a series of catastrophic WT failures in the early 1990s [3]. Yet the adoption of CM technologies for commercial WF applications has not been without challenges. On the one hand, installing purpose-built CMS, which typically do not accompany WTs except in a few offshore applications, is very expensive. On the other hand, even though most large WTs have Supervisory Control and Data Acquisition (SCADA) systems, SCADA systems also have issues with prediction reliability and accuracy [3].
This study proposes a perspective on WT gearbox O&M in which CM is used for early fault detection, giving WF owners time to plan O&M well in advance and to save costs by reducing downtime. This maintenance approach is called condition based maintenance (CBM) [4,5]. Unlike preventive maintenance (PM) [6], a CBM approach takes the condition of the monitored component into account when making O&M decisions. This provides the opportunity for effective planning and scheduling of maintenance actions [7]. PM takes into account the previous failure and service history, factoring these in as risk parameters when calculating the interval between the current operating period and the next wear-out or failure time [6]. Conversely, with CBM there is no need for previous failure history. O&M planning is achieved by monitoring key parameters that would be indicative of any deterioration in a WT's health, so as to detect failures at an early stage. The success of CBM depends on the type and accuracy of the CM technique used, the analysis methods and the interpretation of results.
In the CBM approach presented in this study, three models (signal correlation, extreme vibration, and RMS intensity) are proposed and validated using a data driven time-domain approach that addresses the key limitations and issues identified in literature [3,8,9]. A three stage approach was used: data pre-processing, modelling and validation. The models are validated using data from operational and failed turbines, seeking to show how sensitive the models are in detecting different types of failure modes in a wind turbine gearbox. Operational turbines with healthy gearboxes are used to show the normal response for each model, while faulty gearboxes with some of the common gearbox failure modes are used to show the detectability of each model for different types of failures.
The main contributions of this paper are in two parts: (1) the improvement of known techniques that use RMS values of vibrations [8,10] and their application to monitoring WT gearbox health; and (2) the development and validation of a novel approach for detecting abnormal WT gearbox operation using extreme value theory.
The outline of the article is as follows: In Section 2, a brief review of literature on CM and CBM is done by identifying the main techniques and the limitations of current approaches. In Section 3, the three models proposed to meet the limitations are developed. Section 4 presents the results after data from 10 WT gearboxes were used to validate the proposed models. A comparison of healthy versus faulty gearboxes was made for each model and a case study for detecting three common failure modes of the gearbox high speed module is also done. Finally, in Section 5, the findings are summarised and a view into future research directions is presented.

Related works
Antecedent research in CM of WT gearboxes has covered a wide variety of applications, ranging from standard techniques such as vibration and oil debris analysis [1,11–16] to others such as acoustic emissions [17] and SCADA analysis [7–9,18–22]. While the first three are purpose-built CMS for monitoring specific parameters and detecting incipient failures, SCADA systems were primarily installed on WTs for measuring operational parameters such as wind speed, ambient temperature, component temperature and generator power [3,5]. However, because they are readily available, SCADA systems are now also used for CM. This has been achieved by creating models and trends from SCADA data which, when interpreted, are used to assess the condition of WT components [1,8,20,23,24]. A good example of such a technique that does not rely on traditional CMS can be found in Ref. [25], where angular velocity measurements from the gearbox input shaft and the output shaft to the generator were used to define an error function for detecting gear and bearing damage. It is also worth noting that previous works on WT gearbox CM have focused on two main strands: (1) CM algorithm development, validation and improvement, such as [16,25–28], and (2) CM technology assessment and development, for example [11,15,29,30].
Irrespective of the technique and/or technology applied for CM, the capability of CM depends on two factors [30]: (a) the number and type of sensors and (b) the associated signal processing and simplification methods, with the latter being relevant to this study. The number and types of sensors are generally determined by the type of commercial CMS or SCADA system used and are beyond the scope of this article. According to [30], examples of signal processing methods used for CM include statistical analysis, time domain analysis, cepstrum analysis and wavelet transformation. In Ref. [31], three CM methods applied to SCADA analysis were discussed: signal trending, artificial neural networks and physical modelling. All these methods tend to produce false alarms or erroneous predictions if the models used for detection are not accurate or sophisticated enough [8,23]. Moreover, more sophisticated models require more complicated algorithms, which are computationally intensive and more difficult to develop [27]. Although most purpose-built CMS come with built-in detection algorithms, they are expensive to install and have not been fully justified economically [24,32]. SCADA systems, on the other hand, are already part of most large WTs, and hence no extra costs are incurred to use SCADA for CM. On the downside, analyses of SCADA parameters are prone to a high rate of false alarms. This is due to the following fundamental issues with SCADA:
(1) SCADA has a low sampling rate (10 min), which has been considered too low for accurate fault diagnosis when conventional CM techniques are used [3,8].
(2) Models generated from SCADA data are relatively poor since SCADA training data are noisy [1].
(3) SCADA data values vary over a wide range of operating conditions [8]. Consequently, a change in SCADA data does not necessarily mean a fault has developed; it can simply be the result of a change in operating conditions.
This brings additional complexity in analysing SCADA data since developed models would have to normalise the variability and seasonality of operating conditions in order to improve accuracy [7].
Of the three issues, only the first two are unique to SCADA data. The variability of operational conditions also affects several monitored parameters obtained from commercial CMS, such as vibrations. A good example of this is how pitch control of WTs induces variability in monitored CM parameters. This is because pitch control limits the aerodynamic power of the turbine in order to control the power output [33], hence leading to nonlinearities in the behaviour of the turbine [8]. CM parameters such as gearbox vibrations and temperature often vary over wide ranges [8]; a change in their levels does not necessarily indicate the occurrence of a fault, but a fault may lead to changes in these values [3,8,34].
The issues identified above have some influence on several analysis techniques commonly used in literature, especially if insufficient effort is made in data pre-processing and in normalising operational variability. Two good examples illustrate this: the pitfalls in comparing similar and/or neighbouring turbines through signal trending (see Figs. 1–3), and the effect of seasonality on physical models based on gearbox energy balance (see Figs. 4 and 5). First, whilst comparing operating parameters of neighbouring turbines has proven useful in determining outliers [31], it does not always show the true picture and can be misleading. This is because different WTs and their components, even though identical in design, may respond differently in terms of the CM parameters used for trending (Figs. 1–3 illustrate this). Second, gearbox oil and bearing temperatures are common parameters used for monitoring the health of wind turbine components [1,8,23,35]. They have been used to model the energy balance of the gearbox, i.e. energy is either transmitted by the gearbox as output power or dissipated as heat in the form of a temperature rise. Here, a loss in gearbox efficiency would be signalled by an increase in energy loss, which consequently indicates a fault. However, seasonality of ambient temperature influences the accuracy of the approach if not normalised (see Figs. 4 and 5).
These examples (Figs. 1–5) suggest the importance of normalising the variability of operational and environmental parameters. This article proposes alternative approaches that use condition indicators which are not sensitive to these variations but are sensitive to a change in the health of the gearbox. There are many statistical features of vibrations that serve as key condition indicators for gearbox health, such as RMS, kurtosis, crest factor and peak values. These have all been discussed in detail in Refs. [10,36–38]. Of these condition indicators, the authors chose to use RMS and peak values of the time domain vibration signals because a change in their values can be a leading indicator of impending faults, as seen in Figs. 1–3. In general, RMS values of vibration signals have been used to monitor the overall vibration level of gearboxes [10]. This is because the overall vibration level typically increases as the gearbox deteriorates (as observed in Fig. 5). Hence RMS vibration monitoring is very suitable for detecting progressive failures such as bearing pitting and scuffing and shaft cracks. However, there are criticisms of using RMS vibration for gearbox CM, with two known issues identified in literature. First, the RMS value of a vibration signal does not increase with isolated peaks in the signal, hence it is not very sensitive to incipient gear tooth failure; its value only increases as the tooth failure progresses [10]. Second, RMS values are not significantly affected by short bursts of low intensity vibrations and as a result encounter problems in detecting early stages of bearing failure [11]. These two limitations served as an initial motivation for the authors to consider using peak (extreme) values of vibration signals.
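The first criticism of RMS can be illustrated with a minimal sketch. The signal below is synthetic (a unit sine with one injected spike), and the function name `condition_indicators` is a hypothetical helper introduced here for illustration only; it is not the authors' implementation.

```python
import math

def condition_indicators(signal):
    """Return RMS, peak, crest factor and kurtosis of a vibration record."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    var = sum((x - mean) ** 2 for x in signal) / n
    kurtosis = sum((x - mean) ** 4 for x in signal) / (n * var ** 2)
    return {"rms": rms, "peak": peak,
            "crest_factor": peak / rms, "kurtosis": kurtosis}

# Synthetic record (assumption): 20 periods of a unit sine, then the same
# record with a single isolated spike standing in for an incipient tooth fault.
base = [math.sin(2 * math.pi * k / 50) for k in range(1000)]
spiked = list(base)
spiked[500] = 5.0

healthy = condition_indicators(base)
faulty = condition_indicators(spiked)
print(healthy["rms"], faulty["rms"])    # RMS barely moves
print(healthy["peak"], faulty["peak"])  # peak changes markedly
```

In this toy case the single spike changes the RMS by only a few percent while the peak value increases several-fold, which is the behaviour that motivates tracking peak values alongside RMS.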

Modelling approaches
In this section, three different models for detecting faults in the high speed and intermediate speed stages of WT gearboxes are developed. The methodology adopted in modelling the vibration data has been designed to address the key limitations and issues identified in literature [3,8,9]. A three stage approach has been used to develop the models presented in this paper: data pre-processing, modelling and validation. Firstly, the raw time series of vibration data, together with relevant operational data such as power and wind speed, need to be pre-processed to filter out noise and normalise operational variability. After this, the relevant models are developed based on the data. In this study a data driven modelling approach is used to establish the relationships between vibration levels and operational parameters. Finally, the models are validated using data from operational and failed turbines, seeking to show how sensitive the models are in detecting different types of faults in a wind turbine gearbox. The first two stages are dealt with in this section while the validation is done in the results section (see Section 4).
In Fig. 1, the curves have been clearly labelled to differentiate between the healthy turbine and a second turbine that failed from HS bearing pitting a year after the data was collected. If the average vibrations of all the neighbouring turbines were used to check for outliers, or if one were simply to compare the measured power and vibration response of the two turbines in Fig. 1(a), it would not have been far-fetched to conclude that the healthy turbine was in a poor condition relative to its neighbour. However, from Fig. 1(b), it can be seen that the vibration of the failed turbine increased dramatically when failure occurred a year later, whilst that of the healthy turbine barely changed and is almost identical to the values for the modelled normal operation.
The data used for this analysis are 2 min averages of vibration-based CMS data, obtained as time series from purpose-built CMS piezoelectric accelerometers installed on operational turbines. Fig. 6, for example, is a one month time series window of CM and operational data for a turbine, showing parameters such as gearbox HS crest factor, peak and RMS vibrations, generator power output and wind speed. Looking at the raw time series on its own does not give much insight into the health of the gearbox, unless failure has developed into a severe state where vibration levels become excessively high, as seen in Fig. 5. Hence, a data model is needed in order to detect any abnormal behaviour in the gearbox as early as possible. The data driven approach models the relationship between gearbox vibration parameters and operational parameters such as wind speed and power output.

Data pre-processing
Pre-processing CM data is a very important and fundamental step when developing data models for wind turbines. This is because there are different factors which, if not accounted for or normalised, may influence CM data. Factors other than the structural health of the turbine, such as wind shear, turbulence and the effect of pitch control, have an influence on wind turbine CM data [8]. The authors have adopted the pre-processing algorithm developed in Ref. [8] and combined this with other data filtering techniques as used in Refs. [9,19]. The first step in the data pre-processing stage is to filter out noise from the data. This includes excluding parts of the CM data which have negative power output values [9]. When the power output is negative, the turbine is consuming power rather than generating it, which could occur before the turbine reaches the cut-in wind speed [19]. Once the data filtering is complete, the next step is to segment the data so as to eliminate the nonlinear effects of pitch control. This is achieved by dividing the power curve into three wind speed regions (Fig. 7(a)) [20]. According to [39], there are three distinct wind speed regions. Region 1 covers times when the turbine is not operating or is starting up. Region 2 is when the turbine is in operational mode, where it is desirable to capture as much power from the wind as possible. Region 3 occurs above the rated wind speed (the wind speed at which rated power is produced); in this region the turbine must limit the fraction of wind power captured so as not to exceed the rated design electrical loads. This is achieved via pitch control of the blades.
There is no striking difference between the power curves for both turbines during normal operation and just before failure occurs. It is largely expected that both turbines produce the same power output during normal operation since they are both in the same wind farm. However, based on common theories of loss in generating efficiency resulting from component deterioration and performance degradation [1,20], one would expect the power output of the failed turbine to degrade during the failure period. This is not the case in this context. A similar observation has been made previously in Ref. [8]. This suggests that the turbine power curve is a lagging indicator for detecting incipient faults in a WT gearbox, and hence is limited in detecting local subassembly faults [8].
Daily average ambient temperature and gearbox temperature rise for a WT with gearbox HS bearing pitting failure. In this example the seasonal effects of ambient temperature affect the accuracy of using the temperature difference as a modelling approach, and hence data have to be normalised for seasonal variation of key parameters [7]. This is because seasonal change in ambient temperature directly affects the energy balance of the gearbox, i.e. the temperature rise.
It has been shown in Ref. [8] that it is easier to obtain reliable CM results before reaching the rated wind speed, due to the absence of nonlinear control effects that could dampen fault features contained in the data. Furthermore, filtering out the idle periods of the WT leaves the CM data contained in region 2 as the most suitable for modelling.
The results of pre-processing are binned values of variables such as wind speed, generator speed and generator power, and CM parameters. The wind speed was used as a reference for binning CM data and the expected values for each variable in each bin were estimated based on the probability distribution of the samples contained in the bin. This is unlike the method introduced in the IEC standard which simply finds the average of values of each bin [8]. The IEC's method can be susceptible to the presence of one-offs or outliers in the data, which could skew the average value of each bin away from the true mean. Fig. 7(b) shows the power curve scatter plot of the modelled pre-processed data from Fig. 7(a).
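The pre-processing steps described above (dropping negative-power samples, keeping region 2, and binning by wind speed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Ref. [8]: the function name `preprocess`, the bin width, and the cut-in and rated wind speeds are hypothetical, and the per-bin median is used here only as a simple outlier-resistant stand-in for the distribution-based expected value mentioned in the text.

```python
from statistics import median

def preprocess(records, cut_in, rated, bin_width=0.5):
    """Filter and bin CM records.

    records: iterable of (wind_speed, power, vibration) tuples.
    Returns {bin_centre: (median_power, median_vibration)} for region 2.
    """
    bins = {}
    for wind, power, vib in records:
        if power < 0:                      # turbine consuming power: drop
            continue
        if not (cut_in <= wind < rated):   # keep region 2 only
            continue
        idx = int((wind - cut_in) // bin_width)
        bins.setdefault(idx, []).append((power, vib))
    binned = {}
    for idx, samples in sorted(bins.items()):
        centre = cut_in + (idx + 0.5) * bin_width
        binned[centre] = (median(p for p, _ in samples),
                          median(v for _, v in samples))
    return binned

# Made-up samples: one negative-power record, three region 2 records
# (one of them an outlier), and one record above rated wind speed.
records = [(2.0, -5, 1.0), (4.2, 100, 2.0), (4.3, 110, 2.2),
           (4.4, 900, 9.0), (13.0, 2000, 5.0)]
result = preprocess(records, cut_in=4.0, rated=12.0)
print(result)
```

A plain arithmetic mean of the three retained samples would be pulled towards the outlier, whereas the median is not, which illustrates the concern raised about simple bin averaging.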

Signal correlation and trending
Correlations between different CM parameters and several operational variables can be obtained from the pre-processed CM data. One way of doing this is by creating scatter plots consisting of bins of power output, wind speed or generator speed plotted against a CM parameter of choice. This straightforward approach can be very powerful in detecting faults from CM data. The occurrence of a fault can be observed as miscorrelations between binned variables and the respective CM parameters at different operating windows [8]. Fig. 8 shows two different correlations for CM parameters for a WT gearbox during normal operation and a week before failure. In Fig. 8(a) the power curves for the two operating conditions do not give a clear indication of the failure. This further reinforces the argument in Section 2 that the power curve is a lagging indicator of gearbox faults (see Fig. 3). However, this does not mean that the power curve cannot be used to monitor the health of other wind turbine components such as the generator, as has been shown in literature [35,40].
It is obvious from Fig. 8(b) that a clear miscorrelation can be observed when the scatter plot is reproduced for power versus RMS vibrations. This is in line with the argument in Ref. [10] that the occurrence of certain types of gearbox failures (in the case of Fig. 8, bearing pitting failure) would lead to a substantial increase in the gearbox vibration levels and hence their RMS values. This is a good way to detect failures, but the challenge is that waiting to see this degree of miscorrelation might be too late; hence there is a need to be able to assess the failure severity. An attempt to quantify failure severity has been made in recent literature [8], where a CM criterion, c, was developed to measure this severity. In its defining equation, a and b are the respective coefficients of the polynomials derived from the present and historical data, k is the degree of the polynomial, and x_max and x_min are the respective maximum and minimum values in both polynomials. Each polynomial and its coefficients can be obtained by fitting a regression model to the data. When c ≈ 0 the turbine is healthy, and c > 0 indicates a fault. Furthermore, the greater the value of c, the more serious the fault [8]. It is also possible to estimate the miscorrelations with other curve fitting and regression error and deviation measures such as mean square error, mean absolute error and the explained variance (in Section 4.3, these criteria will be compared with the c value by using data from several operational turbines with known failure history). Signal correlation and trending has the following drawbacks when applied to CM data:
(1) It can be applied successfully only when the normal operating conditions (obtained from historical data) of the monitored component are available to be modelled. This is mainly because faults are detected only from the miscorrelations of monitored parameters from their modelled normal conditions. Hence, a good duration of operational history, typically three to six months of data, is needed to develop a model of the normal operating conditions in order to apply this approach successfully.
(2) The technique is prone to estimation errors during polynomial fitting, especially when miscorrelations are only marginal. Hence expert judgement is needed to conclude whether a fault has actually occurred in such cases.
(3) Correlations of power and RMS are only sensitive to progressive failures, as shown in literature [10,11], and hence are not ideal for detecting gear tooth failures (this is illustrated with a WT example in Section 4.2).
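The alternative deviation measures named above (mean square error, mean absolute error and explained variance) can be sketched as below. The sketch assumes that the modelled normal RMS values and the present RMS values have already been evaluated at the same power bins; the function name `deviation_measures` and the sample values are hypothetical, and the c criterion of Ref. [8] is deliberately not reproduced here.

```python
def deviation_measures(reference, present):
    """Compare present binned CM values against modelled normal values
    evaluated at the same power bins."""
    n = len(reference)
    errors = [p - r for r, p in zip(reference, present)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_err = sum(errors) / n
    mean_pres = sum(present) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    var_pres = sum((p - mean_pres) ** 2 for p in present) / n
    # Explained variance: 1 means the reference model accounts for all of
    # the variation in the present data; lower values mean larger deviation.
    explained = 1.0 - var_err / var_pres
    return {"mse": mse, "mae": mae, "explained_variance": explained}

# Made-up binned RMS values at four power bins.
normal_model = [1.0, 2.0, 3.0, 4.0]
same_as_model = deviation_measures(normal_model, [1.0, 2.0, 3.0, 4.0])
week_before_failure = deviation_measures(normal_model, [1.0, 2.5, 4.5, 7.0])
print(same_as_model)
print(week_before_failure)
```

As with the c criterion, values of the error measures near zero suggest behaviour consistent with the modelled normal condition, while how large a deviation must be before it is treated as a fault remains a matter of judgement.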
Having this in mind, the next subsection presents a novel approach which addresses these limitations.

Extreme vibration model
In an attempt to overcome the shortcomings of RMS vibration signals, this section introduces a novel CM technique for WT gearboxes based on the extreme vibration levels (peak vibrations). This technique uses the inherent behaviour of the vibration, based on the number and magnitude of extreme vibrations that occur during a given time period, hence eliminating the need for correlation with historical data. It leverages extreme value theory, which has been used in literature to model extreme events in other applications such as the prediction of extreme annual rainfall, sea tide levels, breaking strengths of glass fibres, peak wind speed predictions and financial applications [41–43]. The approach is based on the following paradigm: "A sequence of random variables $X_1, X_2, X_3, \ldots, X_n$ which are set of the maximum values of a certain parameter X measured at fixed intervals, will have a common distribution function given by: $M_n = \max\{X_1, X_2, X_3, \ldots, X_n\}$ which converge for large n, i.e. as $n \to \infty$. The distributions which describes these types of data are called extreme value distributions" [42,43].
An example of such a time interval could be maximum rainfall measured hourly, weekly, monthly or annually. The first type of asymptotic model in this family of distributions is the Gumbel type [42], given by equation (2), where x is the set of independent random variables, while a and b are respectively called the location and scale parameters of the distribution. The Gumbel distribution is also known as the double exponential model. From its definition, extreme value theory should be applicable to peak vibrations. This is because the CM data used in this study, which take the 2 min summary statistics of the raw vibration signal, have peak vibrations measured for every 2 min window. Furthermore, since the number of samples of CM data is typically large, the authors expect peak vibration data to satisfy equation (2) by converging for large n.
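For reference, the standard textbook form of the Gumbel distribution function for maxima, written with location a and scale b, is given below. It is assumed here, rather than confirmed from the source, that this is the form denoted by equation (2).

```latex
G(x) = \exp\left\{ -\exp\left[ -\frac{x - a}{b} \right] \right\},
\qquad -\infty < x < \infty, \quad b > 0 .
```

Taking logarithms twice gives $-\ln[-\ln G(x)] = (x - a)/b$, so plotting the ordered data against this transformed probability yields a straight line with slope $1/b$ when the data are Gumbel distributed. This linearisation is what underlies a Gumbel probability plot.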
In order to test this hypothesis, the probability plot of peak vibrations from the pre-processed data, when modelled with equation (2), should follow a straight line on a log–log scale. The goodness of fit of the data to the Gumbel distribution can be assessed by two measures: the p-value and the Anderson–Darling (AD) coefficient. The p-value is used to determine the appropriateness of rejecting the "null hypothesis", which in this case would be that "the data closely follows a straight line when fitted to a Gumbel distribution". The null hypothesis can also be interpreted as "there is no significant difference between the plotted data and the Gumbel distribution". Therefore, for the peak vibration data to be considered Gumbel distributed, there must be insufficient evidence to reject the null hypothesis. Typically, at the 5% significance level (95% confidence), the null hypothesis is rejected if the p-value < 0.05 (i.e. there is sufficient evidence for a significant difference between the peak vibration data and the Gumbel distribution) and is not rejected when the p-value > 0.05 [44]. The AD coefficient is also used to test the goodness of fit; in this application, the smaller the AD value, the better the fit. Typically a value below 1 is preferred. Fig. 9(a) shows the probability plot of the peak vibrations for the pre-processed CM data when fitted to a Gumbel model.
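A rough numerical counterpart to the probability plot can be sketched with the standard library alone. The sketch below estimates the location and scale by the method of moments and measures how close the probability plot is to a straight line using a correlation coefficient. It is not the Anderson–Darling test and produces no p-value; the function names, the plotting positions and the synthetic data (drawn from a Gumbel law with made-up parameters a = 30, b = 2) are all assumptions for illustration.

```python
import math
import random
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(peaks):
    """Method-of-moments estimates of the Gumbel location a and scale b."""
    b = stdev(peaks) * math.sqrt(6) / math.pi
    a = mean(peaks) - EULER_GAMMA * b
    return a, b

def probability_plot_correlation(peaks):
    """Pearson correlation between the ordered peaks and the Gumbel reduced
    variate; values close to 1 mean the probability plot is nearly linear."""
    xs = sorted(peaks)
    n = len(xs)
    # Plotting positions (i + 0.5) / n are one common convention (assumption).
    ys = [-math.log(-math.log((i + 0.5) / n)) for i in range(n)]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic peak values via the inverse Gumbel distribution function.
random.seed(1)
true_a, true_b = 30.0, 2.0
peaks = [true_a - true_b * math.log(-math.log(max(random.random(), 1e-12)))
         for _ in range(2000)]

a_hat, b_hat = fit_gumbel_moments(peaks)
r = probability_plot_correlation(peaks)
print(a_hat, b_hat, r)
```

In practice a statistics package would normally be used to obtain the AD coefficient and p-value reported in the text; the correlation here only gives a quick indication of linearity.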
From the graph it might seem plausible to reject the null hypothesis and conclude that the data do not follow a Gumbel distribution. This is because the p-value is less than 0.05 (in fact, p-value < 0.01), which leads to rejection of the null hypothesis. The AD coefficient of 3.399 also seems very large. This is the case because all the power bins have been used in fitting this distribution. However, a scatter plot of power vs. peak vibrations (Fig. 9(b)) shows that the peak vibrations vary across the full power window. Upon closer inspection of Fig. 9(a), it can be seen that the probability plot is actually made up of more than one shape:
1. the shape that falls between peak vibration ranges 18–34,
2. the shape that falls between ranges 35–40, and
3. the shape that falls between ranges 40 and above.
This is firmly in line with the scatter plot, where the peak vibrations rapidly increase linearly from start-up, then gradually grow through about 50–60% of the rated power and near a steady state from 75% to rated power output. Consequently, the peak vibration data can be divided into three groups of power bins, with each fitted to the Gumbel distribution (see Fig. 10(a)–(c)).
The impact of dividing the power bins on the various peak vibration profiles is now clear. From Fig. 10(a)–(c), splitting the peak vibrations into three groups of power bins makes a significant difference to the quality of the plot. Furthermore, the respective p-values are much greater than 0.05 and the AD coefficients are small (AD < 1), thus supporting the initial hypothesis that peak vibrations follow a Gumbel distribution. In addition to the plots in Fig. 10(a)–(c), a probability plot was made using a random sample from raw un-binned peak vibration values (see Fig. 10(d)). This further reaffirms the claims made by the authors. Now that the behaviour of peak vibrations has been established, the question that remains is how this can be used to detect a fault in a gearbox. Fig. 11(a) and (b) respectively show the scatter and probability plots at 75%–100% rated power of the peak vibration for the CM data one week before failure. Comparing these respectively with Figs. 9(b) and 10(c) for the normal operating period, it can be seen that not only are the peak vibrations for each power bin much higher than those for normal operation, but the location and scale parameters are also greater (see Table 1).
In this example the location and scale parameters have only been able to point out the difference in peak vibrations between the two operational windows because they can be compared retrospectively. However, when a WT is newly installed with little operational history, or when only a snapshot of one operational window exists, it is not possible to tell if the vibrations are extreme by simply measuring the location and scale parameters. Consequently, a model has been developed empirically from CM data from operational WTs, and also by curve fitting techniques, to detect the extremeness of vibrations for each given time window. Two main observations have been made empirically from studying the RMS and peak vibration plots of healthy gearboxes. First, for a healthy gearbox the scatter plot of the RMS values of vibration from 75% to 100% rated power generally follows a straight line or gentle curve and reaches its maximum at rated power (for example see Figs. 8(b) and 14(a)). Second, the scatter plot of the peak vibrations for a healthy gearbox is near steady state from 75% to 100% rated power.
In general, a faulty gearbox will not obey at least one of these two rules, as can be seen in Fig. 8(b), where the RMS vibrations do not follow a straight line during the week before failure, and in Fig. 11(a), where the peak vibrations are not near steady state between 75% and rated power. It follows that if a multiple regression plot in three dimensions is made for power vs. RMS vs. peak vibrations, there will be a marked difference between a healthy and a faulty gearbox. Fig. 12(a) and (b) show the plots produced for the two cases using the pre-processed CM data. The x-axis is the power output, the y-axis is the RMS value of the vibration signal and the third dimension (colour scale) is the peak vibration. The colour scale, which measures the intensity of peak vibrations, shows that for a healthy gearbox the peak vibrations between 75% and 100% rated power are near steady state; hence the dark blue colour, which is consistent across the entire power bin (Fig. 12(a)). Also, the power vs. RMS plot for normal operation follows a straight line.
However, for the faulty gearbox these two relationships do not hold (Fig. 12(b)). The colour axis shows higher vibrations which deviate from steady state, hence an inconsistency of peak values (colour shades) across the power bin, and the scatter plot is more random than linear or gently sloping. This curve fitting approach relies on the inherent shapes of the polynomials generated from the scatter plots of RMS and peak vibrations. This novel approach (using extreme value theory and the peak/RMS/power output regression colour plot) gives a good indication of the presence of faults in a gearbox. Examples of how it is used to detect different failure modes are presented in the results section. The next subsection presents another model which can be used to determine the severity of a fault once detected.
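The two empirical rules for a healthy gearbox can be turned into a simple numerical check, sketched below. This is an illustrative simplification rather than the authors' colour-plot method: the function names, the use of R² of a straight-line fit as a measure of linearity, the coefficient of variation as a measure of "near steady state", and both thresholds (`r2_min`, `cv_max`) are hypothetical choices, and all data values are made up.

```python
from statistics import mean, pstdev

def r_squared_linear(x, y):
    """R^2 of a least-squares straight line through (x, y)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if syy == 0:                # perfectly flat response counts as linear
        return 1.0
    return (sxy * sxy) / (sxx * syy)

def check_gearbox(power_frac, rms, peak, r2_min=0.9, cv_max=0.05):
    """Apply the two rules to bins between 75% and 100% rated power."""
    sel = [(p, r, k) for p, r, k in zip(power_frac, rms, peak)
           if 0.75 <= p <= 1.0]
    p, r, k = zip(*sel)
    rms_linear = r_squared_linear(p, r) >= r2_min      # rule 1
    peak_steady = pstdev(k) / mean(k) <= cv_max        # rule 2
    return {"rms_linear": rms_linear, "peak_steady": peak_steady,
            "suspect": not (rms_linear and peak_steady)}

# Made-up binned values (power as a fraction of rated power).
power = [0.75, 0.80, 0.85, 0.90, 0.95, 1.00]
healthy = check_gearbox(power,
                        rms=[2.0, 2.1, 2.2, 2.3, 2.4, 2.5],
                        peak=[40.0, 40.5, 39.8, 40.2, 40.1, 40.3])
faulty = check_gearbox(power,
                       rms=[2.0, 3.5, 2.2, 4.0, 2.1, 3.8],
                       peak=[45.0, 60.0, 52.0, 70.0, 48.0, 66.0])
print(healthy)
print(faulty)
```

A check of this kind needs no historical baseline, which reflects the stated advantage of the extreme vibration model, but suitable thresholds would have to be calibrated on real CM data.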

RMS deviation intensity
This model is based on the relationship between two consecutive RMS values. A parameter called "delta RMS", which is the difference between two consecutive RMS values, is used to estimate the trend of the vibration signals. The parameter can be estimated using the equation:

$$\Delta \mathrm{RMS}_n = \mathrm{RMS}_n - \mathrm{RMS}_{n-1}$$

The assumption behind this parameter is that, if gear damage occurs, there will be a more rapid increase in vibration levels than in a case without damage [10]. One disadvantage of this method is that the parameter is very sensitive to load changes [10]. Hence it is not very suitable for setting alarm levels. However, it can be used with other techniques, such as signal correlation, to assess the severity of gearbox damage. This is shown in Fig. 13(a)–(d), which are the respective delta RMS plots for 1 year, 6 months, 1 month and 1 week before failure occurred. The patterns of the delta RMS plots in Fig. 13(a) and (b) are usually observed when the gearbox is healthy (normal operating period), while the patterns in Fig. 13(c) and (d) are usually observed in the lead-up to failure. In the case of Fig. 13(d), the failure has progressed to a more severe stage. Thus, tracking how delta RMS varies over time gives an idea of the severity of a failure if other indicators signal the presence of a fault. This is a qualitative approach and must be used with caution and with expert judgement.

Fig. 11. Peak vibrations for a faulty gearbox: (a) scatter plot; (b) Gumbel probability plot at 75%–100% rated power.
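As a minimal sketch of the delta RMS parameter, the difference between consecutive RMS values can be computed as below. The function name `delta_rms`, the sign convention (current value minus previous value) and the sample series are assumptions made for illustration.

```python
def delta_rms(rms_values):
    """Difference between each RMS value and the one before it."""
    return [curr - prev for prev, curr in zip(rms_values, rms_values[1:])]

# Made-up RMS trend: nearly flat at first, then rising more quickly.
rms_series = [1.00, 1.02, 1.01, 1.10, 1.25, 1.50]
print(delta_rms(rms_series))
```

A series of N RMS values yields N - 1 delta RMS values. Because the parameter is sensitive to load changes, the differences are best read as a pattern over time rather than compared against a fixed alarm level.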

Results and discussion
This section presents the results obtained after applying the modelling approaches presented in the previous section to real WT CM data for detecting faults in the HS module of gearboxes in operational WTs. A case study on detecting several failure modes in the HS module is also presented. The HS module has been chosen for the validation for two reasons. First, it is considered to be the least reliable part of the gearbox, with HS bearing failures dominating gearbox failures [6,45,46]. Second, many modern WT designs enable the repair and replacement of HS modules up-tower, eliminating the need for external cranes [6,46].
Together, these reasons mean that if failures in the HS stage can be detected early enough, O&M managers will have sufficient time to plan for up-tower repairs, enabling them to reduce downtime, heavy equipment and logistics costs, and to prevent consequential failures in the entire gearbox.

Healthy vs. faulty WT
Before showing how the different methods have been used to detect different failure modes, the methods are first compared for two identical WTs in the same WF. This gives a flavour of how the three methods can be used in a WF gearbox CM context. The WTs, designated WTG1 (healthy) and WTG2 (faulty), were commissioned on the same day and were operational for about three years before WTG2 experienced a gearbox HS bearing failure. This makes them good candidates for comparing the three methods using CM data for the time period which preceded the failure of WTG2. WTG2 is the same WT used to illustrate the limitations of known approaches from the literature in Section 2 (see Figs. 4 and 5).

Signal correlation
Retrospective CM data for both WTG1 and WTG2, from one week before the failure date of WTG2 back to one year before it, were modelled using the signal correlation algorithm. The results of the correlations for power output and the RMS values of the HS bearing vibrations for both WTs are shown in Fig. 14(a) and (b) respectively. For WTG1 it can be seen that there are no miscorrelations in the RMS vibration of the HS bearing. However, for WTG2 there are very clear miscorrelations in the HS bearing vibrations in the run-up to failure. Furthermore, there are marked differences between the miscorrelations six months, one month and one week before failure. This shows that the correlation of RMS with power output gives an early indication before failure occurs.
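The signal correlation algorithm itself is defined in Section 3 and is not reproduced here. As a rough stand-in for the idea of a "miscorrelation", the sketch below computes the Pearson correlation coefficient between power output and RMS vibration for one time window; a value well below 1 suggests that the RMS no longer tracks power. The function name `pearson` and all sample values are hypothetical, and Pearson correlation is a swapped-in technique rather than the authors' algorithm.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up windows: power as a fraction of rated power, RMS in arbitrary units.
power = [0.2, 0.4, 0.6, 0.8, 1.0]
rms_healthy = [0.5, 0.8, 1.1, 1.4, 1.7]   # tracks power closely
rms_faulty = [1.0, 2.5, 0.8, 2.9, 1.1]    # scattered, no clear trend

print(pearson(power, rms_healthy))
print(pearson(power, rms_faulty))
```

Computing such a coefficient separately for the windows one year, six months, one month and one week before failure would give one number per window to compare, although the scatter plots in Fig. 14 carry more information than a single coefficient.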

Extreme vibrations
Again, the pre-processed CM data for both WTG1 and WTG2 were fitted to the Gumbel distribution to test for goodness of fit and to determine the distribution parameters (Fig. 15(a) and (b) respectively). It can be seen that, just as expected, the P-values and AD coefficients are respectively greater than 0.05 and less than 1. This reinforces the findings from Section 3.3. Furthermore, the Matlab curve fitting toolbox was used to produce extreme vibration plots for both WTs during the run-up to failure of WTG2; the results are shown in Fig. 16. Again, as expected, the 3D regression colour plots for WTG1 and WTG2 agree with the empirical observations, indicating a healthy and a faulty WT respectively. In WTG1 the vibrations at 75%–100% rated power follow a straight line and the peak vibrations are near steady state.
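To illustrate what the Gumbel location and scale parameters capture, the sketch below estimates them from peak vibration samples by the method of moments. This is a simpler technique than a full distribution fit with P-value and Anderson-Darling goodness-of-fit tests, and it is not the procedure used to produce Fig. 15. The function names and sample values are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_moments(peaks):
    """Method-of-moments estimates (location, scale) of a Gumbel (maximum) distribution."""
    mean = statistics.fmean(peaks)
    sd = statistics.stdev(peaks)              # sample standard deviation
    scale = sd * math.sqrt(6) / math.pi       # from Var = (pi^2 / 6) * scale^2
    loc = mean - EULER_GAMMA * scale          # from Mean = loc + gamma * scale
    return loc, scale

def gumbel_cdf(x, loc, scale):
    """Probability that a Gumbel-distributed peak does not exceed x."""
    return math.exp(-math.exp(-(x - loc) / scale))

# Made-up peak vibrations at 75%-100% rated power for two windows.
peaks_normal = [3.0, 3.1, 2.9, 3.0, 3.1, 3.0, 3.2, 2.9]
peaks_run_up = [3.0, 9.0, 2.5, 12.0, 4.0, 10.0, 6.5, 11.0]

loc_n, scale_n = gumbel_moments(peaks_normal)
loc_r, scale_r = gumbel_moments(peaks_run_up)
print(loc_n, scale_n)
print(loc_r, scale_r)
```

In this made-up example the run-up window has a larger location and a larger scale than the normal window, which is the kind of retrospective difference between windows that the parameters can reveal.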

RMS intensity
Applying the delta RMS model to CM data for WTG1 and WTG2 in the run-up to failure gives an indication of the severity of failure. The delta RMS plots for WTG1 and WTG2 are shown in Fig. 17(a) and (b) respectively. It can be seen that a clear pattern is emerging for the delta RMS of WTG2, indicating that the RMS values are increasing at a much faster rate than in WTG1. This, together with either of the previous two models, can be used to confirm the occurrence and assess the severity of a fault.

Case study HS module failure modes
The previous section has shown how the three modelling approaches can be used for CM by comparing a healthy with a faulty WT. However, it is also important to assess how well each technique does in detecting some of the common failure and damage modes seen in the HS module. The failure/damage modes covered in this case study are: HS bearing: hairline cracks, spalling and pitting; HS gear teeth: cracks and fracture; HS shaft: cracks.

HS bearing hairline cracks, spalling and pitting
In this example, retrospective CM data from three WTs have been used to illustrate how the modelling techniques give early warning signs of common bearing failures. The wind turbines and the respective failures in their HS bearings are given in Table 2.
It should be noted that WTG4 is the same WT used as an example of a faulty turbine to illustrate trending of neighbouring turbines in Section 2 (see Figs. 1–3) and the same WT used to develop the models in Section 3. Fig. 18(a)–(c) respectively show the scatter plots of the vibration and power output correlations for WTG3, WTG4 and WTG5. It can be seen that there are miscorrelations between the vibration and power plots in the run-up to failure of the bearings. Also, the degree of miscorrelation during the period "one week before failure" is very high.
Unlike for WTG2 (Fig. 14(b)), there are no clear miscorrelations six months before failure for WTGs 3, 4 and 5, and only a slight miscorrelation one month before failure. This is not due to the type of failure mode; rather, it is due to the location of failure. In these three examples, failure occurred in the generator-end HS bearing, while in WTG2 the failure occurred in the HS shaft. The implication is that even though the bearing vibration is being monitored in both cases, there will be a marked increase in vibrations when something is wrong with the shaft. For example, shaft failures can be a symptom of misalignment in the drivetrain, and this will be easily caught by the high miscorrelations very early, before shaft fracture occurs.
Again, the CM data of WTG3, WTG4 and WTG5 were modelled for extreme vibrations and for the variation of the delta RMS parameter during the week before failure. These are shown for WTG3 and WTG5 in Fig. 19(a)–(d). As mentioned previously, the extreme vibration and delta RMS plots for WTG4 can be found in Figs. 12(b) and 13(d) respectively. From all the plots it can be observed that all three WTs show symptoms of extreme vibrations in the run-up to failure. Also, from the delta RMS plots, it can be seen that WTG3 and WTG4 exhibit a similar pattern and show more severity than WTG5. This is broadly expected, as they both had hairline cracks, and the presence of hairline cracks in bearings can lead to greater vibration levels as the cracks grow. These plots also show that the consequential damage resulting from these failures can be avoided, since the plots give an indication of a potentially severe incident a week before the failure occurred.

HS gear tooth fracture
Repeating the process applied to the HS bearing examples on another WT which had a fractured HS pinion (designated WTG6), the respective plots are shown in Fig. 20(a)–(d). At first glance, Fig. 20(a) shows little or no miscorrelation between the power output and vibration plots in the run-up to failure. This is one of the limitations of using RMS values of vibration, as identified in Section 3.
RMS values are not sensitive to short bursts in vibration that result from a single tooth coming into contact once in a revolution of the gear wheel. This is because the RMS is an average measure of the vibration signal over one revolution of the shaft. However, the peak vibration, which is the maximum vibration per revolution, would occur when the fractured tooth comes into contact once every revolution. Hence in Fig. 20(b) it can be seen that there are clear miscorrelations between the scatter plots of the peak vibration and power output. Furthermore, on closer inspection, it can be observed that miscorrelations only occur above 500 kW power output, and that there is a great degree of miscorrelation for power output above 1500 kW. This corresponds to the 75%–100% rated power range used for modelling the extreme vibration, further reinforcing the claims in Section 3. In Fig. 20(c) and (d) the extreme vibration and delta RMS plots one week before failure are shown, further indicating the presence of the fault. It is interesting to note that although there were no miscorrelations in the RMS and power output plots, the delta RMS plots clearly indicate that a fault with increasing severity is about to occur.
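The difference in sensitivity between RMS and peak values can be seen in a small synthetic example. The sketch below builds a sine wave as a stand-in for one shaft revolution of a healthy vibration signal, then adds a single impact to mimic a fractured tooth coming into contact once per revolution. The signal shape, impact size and sample count are assumptions made purely for illustration.

```python
import math

def rms(signal):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def peak(signal):
    """Largest absolute sample value."""
    return max(abs(v) for v in signal)

n = 1000                                   # samples in one revolution (assumed)
healthy = [math.sin(2 * math.pi * 20 * i / n) for i in range(n)]
faulty = list(healthy)
faulty[250] += 5.0                         # one short impact per revolution

print(rms(faulty) / rms(healthy))          # relative change in RMS
print(peak(faulty) / peak(healthy))        # relative change in peak
```

With these made-up numbers the RMS rises by only a few percent while the peak rises roughly fivefold, which is consistent with the observation that the peak value, not the RMS, reveals the tooth fracture.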

HS shaft cracks
The same process was then repeated for a WT designated WTG7, which had a shaft fracture just like WTG2. Fig. 21(a)–(c) respectively show the scatter, extreme vibration and delta RMS plots for WTG7.
Comparing Fig. 21(a) for WTG7 with Fig. 14(b) for WTG2, it can be seen that miscorrelations appear much earlier before shaft failures due to cracks occur. This means that RMS vibrations are very well suited for monitoring HS shaft conditions. Comparing Fig. 21(b) and (c) with Figs. 16(b) and 17(b), it can be seen that WTG7 shows signs of an impending shaft failure with a higher severity than WTG2. This is due to the extremeness of the vibrations shown by the colour plot, and the pattern indicating a higher severity of failure shown by the delta RMS plot.

Application to maintenance planning via CBM
As mentioned earlier in the introduction, the authors have recently applied PM to WT gearbox HS bearings [6]. This is especially suitable for WTs without any commercial CMS or SCADA system installed on them. For WTs with commercial CMS, PM can still be used, but it makes more sense to adopt CBM, since it gives a more precise estimation of the component health. With this in mind, this section briefly discusses how the CM techniques presented in this article can be used for CBM of WT gearboxes.
In a typical commercial CMS installed on WTs, CBM can be achieved by using alarms to signal the occurrence, and in some cases the severity, of a fault. Alarm levels or thresholds are set for each monitored parameter, or for a combination of monitored parameters, based on prescribed rules. When a predetermined threshold is exceeded, the WT is shut down to avoid catastrophic or consequential failures, giving the WF owners time to plan for maintenance. In the context of this article, four measures have been used as CM criteria when applied to the modelled data. They are the mean square error (MSE), the explained variance (EV), the mean absolute error (MEA), and the criterion c proposed by Ref. [8].
The rules for determining the presence of a fault for the measures are given in Table 3.
These measures all quantify the deviation of a CM data sample from a reference sample which has been modelled from CM data obtained during normal operating conditions.
The estimated values of each of these measures for WTG1eWTG7 and eight other WTs are shown in Table 4. This has been estimated during normal operation and one week before failure so as to give a flavour of how these measures can be used in practice for CBM.
Beyond the WTs in Table 4, these measures were tested for a further sample of over 20 WTs and similar results were obtained. Although in some cases all four measures were able to indicate a shaft or gear tooth failure, only the MEA measure gave an indication of a fault for every WT tested, making it the most sensitive parameter. There are two reasons for this. First, both MSE and EV are based on the square of the deviation being measured, and when very small numbers are squared they become smaller, making the deviation seem smaller. Unlike these, MEA is the absolute deviation measured and gives a true picture. Second, even though the c value measures the absolute deviation, it does so over the whole power range, which is a shortcoming in cases where the deviation occurs only in a certain power range, as seen in WTG6, where the tooth failure deviation occurred between 75% and 100% rated power. This explains why the c value also underestimates the deviation in some cases.
The summary from Tables 3 and 4 shows that using a single monitoring criterion independently poses a risk of false alarms. Hence, the authors suggest that a combination of these and other criteria be used in practice. O&M managers can also adopt a more qualitative approach by combining the CM criteria values with the RMS intensity and 3D plots, together with expert judgement, in determining the presence and severity of failures.
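Both effects can be illustrated with a short numerical sketch. The helper names `mse` and `mea`, the reference and observed values, and the split into eight power bins are all hypothetical; the whole-range average below is used only to show how a localised deviation is diluted and is not the criterion c of Ref. [8].

```python
def mse(ref, obs):
    """Mean square error between a reference sample and an observed sample."""
    return sum((o - r) ** 2 for r, o in zip(ref, obs)) / len(ref)

def mea(ref, obs):
    """Mean absolute error between a reference sample and an observed sample."""
    return sum(abs(o - r) for r, o in zip(ref, obs)) / len(ref)

# Effect 1: a small uniform deviation of 0.1 looks ten times smaller once squared.
ref = [1.0] * 8
obs_small = [1.1] * 8
print(mse(ref, obs_small), mea(ref, obs_small))

# Effect 2: a deviation confined to the top two of eight power bins
# (taken here to represent 75%-100% rated power) is diluted over the whole range.
obs_local = [1.0] * 6 + [1.4, 1.4]
whole_range = mea(ref, obs_local)
top_bins_only = mea(ref[6:], obs_local[6:])
print(whole_range, top_bins_only)
```

In this example the whole-range average is a quarter of the deviation seen in the affected bins, which suggests evaluating the criteria per power bin as well as over the whole range.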

Summary and conclusions
This paper has made the case for the use of peak and RMS values of vibration signals for the CM of WT gearboxes. Three approaches (the signal correlation, extreme vibration, and RMS intensity models) were developed and validated using CMS data from operational WTs. Furthermore, each CM approach was then used with CMS data from 10 WTs to test its ability to detect common failure modes in the gearbox HS module. The results showed that signal correlation with RMS values is good for detecting progressive failures such as HS bearing pitting or shaft cracks as early as a month before failure. However, it was not suitable for detecting gear tooth fracture, in agreement with the literature. Unlike RMS, the peak values were better at detecting gear tooth fractures using both the correlation and extreme vibration models. This presents a significant advantage of this study over other techniques presented in the literature. Furthermore, the extreme vibration model does not rely on historical data and can be used for newly installed WTs or for WTs with missing CMS history. It makes use of the inherent extremeness of the peak vibrations in certain power ranges, which have been identified using extreme value theory. To the knowledge of the authors, this is the first time this approach has been successfully applied to modelling mechanical vibrations. The authors are currently working on a parallel piece of research to explore this theory in greater detail, giving a more in-depth analysis of the application of extreme value statistics to WT CBM. Finally, the "delta RMS" plot gives an insight into the severity of the failure, as certain patterns in the "delta RMS" signature emerge in the run-up to failure. This can be used qualitatively in combination with other models and with insights from gearbox experts.
Contrary to claims in the literature, it has been shown that RMS and peak values are good indicators of gearbox health if used properly. These techniques are not without limitations, though, one of which is that changes in RMS vibrations are only sensitive to high shaft revolutions. Hence, they will only be suitable for monitoring the high speed and intermediate speed modules of the gearbox, which have higher shaft revolutions than other modules. Also, the CM data used in this study came only from monitored high speed modules of gearboxes.