A Parameter Selection Method for Wind Turbine Health Management through SCADA Data

Machine learning techniques are increasingly applied to supervisory control and data acquisition (SCADA) data for wind turbine anomaly and failure detection. While parameter selection is important for modelling a wind turbine's health condition, only a few papers have focused on this issue, and those papers rely on the interconnections among the subcomponents of a wind turbine to address it. However, relying on the interconnections alone is sometimes too general to yield a parameter list that accounts for the differences between SCADA datasets. In this paper, a method based on mutual information is proposed to provide more detailed suggestions on parameter selection. Moreover, after proving that the copula, a multivariate probability distribution for which the marginal probability distribution of each variable is uniform, is capable of simplifying the estimation of mutual information, an empirical copula-based mutual information estimation method (ECMI) is introduced for application. A real SCADA dataset is then adopted to test the method, and the results show the effectiveness of the ECMI in providing parameter selection suggestions when physical knowledge is not accurate enough.


Introduction
As wind energy is identified as one of the most promising sources of renewable energy, more and more wind turbines have been installed around the world. In Europe, annual wind power installation in 2015 was 12.8 GW, of which offshore wind power contributed nearly 25%. By the end of 2015, nearly 142 GW of wind power had been installed in Europe [1]. While this large amount of wind energy will bring many benefits, there are still many challenges to overcome considering the potential cost of operation and maintenance (O&M). As the O&M cost of a wind farm constitutes 25% to 30% of the total power generation cost [2], reducing unscheduled downtime and devising an efficient maintenance strategy are of significant importance to operators.
The first step to realizing these goals is anomaly or potential failure detection, which provides the prior knowledge required to make decisions on O&M optimization. In this field, many studies focus on utilizing statistical approaches for diagnosis [3,4]. Moreover, considering the datasets adopted in recent works, wind turbine SCADA data is preferred by researchers because it provides comprehensive information on different subcomponents of a wind turbine, such as gearbox bearing temperature, oil temperature, wind speed, wind direction, output power, pitch angle, and rotor speed [5]. Moreover, as there are many sensors even for one subcomponent, such as [...] the method will be used for regression. As the mutual information indices of each dataset are different due to dissimilar working conditions and anomaly patterns, the proposed method can be considered an auxiliary tool for parameter selection that adjusts the general parameter list according to physical knowledge and field experience.
The rest of this paper is organized as follows. In Section 2, the mathematical background of both copula and mutual information is introduced. Afterwards, a mathematical proof is provided to show that estimating mutual information through copula is feasible and efficient. In addition, empirical copula-based mutual information estimation (ECMI) is proposed from an application perspective. In Section 3, a case study based on real SCADA data of wind turbines is presented, the results are discussed, and a suggestion on parameter selection is given. In the conclusion, a summary of the findings in this paper is provided and future work is indicated.

Methodology
Copulas are functions that connect the multivariate joint distribution of a set of variables to the one-dimensional marginal distribution of each variable. The main characteristic of copulas is that they capture the dependence properties of the joint distribution of the variables and are invariant under any strictly increasing transformation of the marginal variables [26].

Mathematical Definition of Copulas
Consider a random variable X with continuous cumulative distribution function F. The probability integral transformation of X can be written as U = F(X), and U is uniformly distributed on the interval [0, 1].
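The probability integral transformation can be illustrated numerically. The following is a minimal sketch, assuming the empirical distribution function stands in for the unknown F; the function name `empirical_cdf_transform` and the exponential test data are illustrative choices, not part of the original work.

```python
import random
from bisect import bisect_right


def empirical_cdf_transform(samples):
    """Map each sample x_i to (number of samples <= x_i) / N."""
    ordered = sorted(samples)
    n = len(samples)
    # bisect_right returns how many sorted values are <= x
    return [bisect_right(ordered, x) / n for x in samples]


random.seed(0)
# Skewed, clearly non-uniform input data (illustrative only)
x = [random.expovariate(1.5) for _ in range(10_000)]
u = empirical_cdf_transform(x)

# Share of transformed values falling in each tenth of [0, 1]
counts = [0] * 10
for value in u:
    counts[min(int(value * 10), 9)] += 1
shares = [c / len(u) for c in counts]
print(shares)
```

Each entry of `shares` should be close to 0.1, which is what a uniform distribution on [0, 1] implies, even though the input data are strongly skewed.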
Consider a random vector $X = \{X_1, X_2, \ldots, X_n\}$ and suppose its marginal distributions are continuous. Hence, the marginal cumulative distribution functions $F_{X_i}(x) = \Pr(X_i \le x)$ are continuous functions. After applying the probability integral transformation to each element, the vector
$$U = (U_1, U_2, \ldots, U_n) = \left(F_{X_1}(X_1), F_{X_2}(X_2), \ldots, F_{X_n}(X_n)\right)$$
has uniformly distributed marginals. The copula of $X = \{X_1, X_2, \ldots, X_n\}$ is defined as the joint cumulative distribution function of $U$, represented as
$$C(u_1, u_2, \ldots, u_n) = \Pr(U_1 \le u_1, U_2 \le u_2, \ldots, U_n \le u_n).$$
All information on the dependence between the elements of $X$ is captured by the copula $C$, and the marginal cumulative distribution functions $F_{X_i}(x)$ contain all information on the marginal variables.
Sklar's theorem [27]: consider a random vector $X = \{X_1, X_2, \ldots, X_n\}$ with continuous marginal cumulative distribution functions $F_{X_i}(x) = \Pr(X_i \le x)$. Every joint cumulative distribution function of $X$ can be described as $F(x_1, x_2, \ldots, x_n) = \Pr(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$. Then, $F$ can be expressed as
$$F(x_1, x_2, \ldots, x_n) = C\left(F_{X_1}(x_1), F_{X_2}(x_2), \ldots, F_{X_n}(x_n)\right), \tag{3}$$
in which $F_{X_1}(x), F_{X_2}(x), \ldots, F_{X_n}(x)$ are the marginal cumulative distribution functions. If the density function of the joint distribution is available, then it can be deduced in the following way:
$$f(x_1, x_2, \ldots, x_n) = c\left(F_{X_1}(x_1), F_{X_2}(x_2), \ldots, F_{X_n}(x_n)\right) f_{X_1}(x_1)\, f_{X_2}(x_2) \cdots f_{X_n}(x_n),$$
where $c$ is the density function of the copula and $f_{X_1}(x_1), f_{X_2}(x_2), \ldots, f_{X_n}(x_n)$ are the marginal density functions of each variable. Equation (3) shows the role of the copula in the relationship between multivariate distribution functions and their margins. The theorem also provides the theoretical foundation for the application of copulas. A copula has its own well-grounded mathematical definitions and properties, the details of which can be found in [28].

Mutual Information (MI) and Entropy
In information and probability theory, mutual information is a measure of the mutual dependence between two variables. More specifically, it quantifies "the amount of information" obtained about one random variable through another random variable. The concept of mutual information is closely linked to the "entropy" of a random variable, which defines "the amount of information" held in this random variable. The relationship between entropy and mutual information is represented in Figure 1. In this figure, the area covered by the two circles together is the joint entropy H(X,Y). The left circle (yellow and green) is the individual entropy H(X), and its yellow part is the conditional entropy H(X|Y). The circle on the right (blue and green) is H(Y), with the blue part being H(Y|X). The green part is the mutual information I(X;Y).
Unlike correlation coefficients, mutual information is more general: it measures the similarity between the joint distribution of two variables and the product of their marginal distributions. Hence, the mutual information of two random variables does not depend on whether the relationship between them is linear or nonlinear.
In [29], Shannon defined the entropy $H$ of a discrete random variable $X$ with possible values $x_i$, $i = 1, 2, \ldots, n$, as
$$H(X) = -\sum_{i=1}^{n} \Pr(x_i) \log \Pr(x_i),$$
in which $\Pr(x_i)$ is the probability of each value of $X$. It can be extended to the continuous random variable scenario as
$$H(X) = -\int f(x) \log f(x)\, dx,$$
where $f(x)$ is the probability density function of $X$. To make the concept easier to understand, $H(X)$ can be considered as the average information carried by the random variable $X$ [30].
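As a small illustration of the discrete definition above, the sketch below evaluates the entropy of a few simple distributions. The function name `shannon_entropy` and the choice of base 2 (bits) are assumptions made for this example; the definition itself does not fix the base of the logarithm.

```python
import math


def shannon_entropy(probabilities, base=2.0):
    """H(X) = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped, which implements 0 log 0 = 0
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])      # maximal uncertainty for two outcomes
biased_coin = shannon_entropy([0.9, 0.1])    # less uncertainty than a fair coin
certain = shannon_entropy([1.0, 0.0])        # no uncertainty at all
print(fair_coin, biased_coin, certain)
```

The fair coin carries one bit, the biased coin carries less, and a certain outcome carries none, which matches the reading of entropy as average information.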
In [31], mutual information is defined as
$$I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y).$$
From the distribution perspective, it can also be written as
$$I(X;Y) = \sum_{x} \sum_{y} P_{X,Y}(x, y) \log \frac{P_{X,Y}(x, y)}{P_X(x)\, P_Y(y)}, \tag{9}$$
in which $I(X;Y)$ represents the mutual information of $X$ and $Y$, $P_{X,Y}(x, y)$ is the joint probability of $X$ and $Y$, and $P_X(x)$, $P_Y(y)$ are the marginal probabilities. The continuous version of Equation (9) is presented in the following subsection.
Energies 2017, 10, 253

Estimate Mutual Information through Copula
Equation (9) carries the properties of mutual information, and both the joint distribution and the marginal distributions appear in it. Therefore, it is natural to consider using the copula transformation to simplify the form of Equation (9); this is the inspiration of this work. In this part, a mathematical proof is provided for the feasibility of estimating mutual information by means of a copula.
Let $X$ and $Y$ be two random variables with continuous marginal distribution functions and a joint probability density function. Then the mutual information of $X$ and $Y$ can be written as
$$I(X;Y) = \iint f_{X,Y}(x, y) \log \frac{f_{X,Y}(x, y)}{f_X(x)\, f_Y(y)}\, dx\, dy, \tag{10}$$
in which $f_{X,Y}(x, y)$ is the joint probability density function and $f_X(x)$ and $f_Y(y)$ are the marginal density functions of $X$ and $Y$. Based on Sklar's theorem and Equations (3) and (5), the copula of $X$ and $Y$ can be represented as
$$C\left(F_X(x), F_Y(y)\right) = F_{X,Y}(x, y), \tag{11}$$
where $F_{X,Y}(x, y)$ is the joint cumulative distribution function and $F_X(x)$ and $F_Y(y)$ are the marginal cumulative distribution functions. Moreover, the density function of the copula can be described as
$$c\left(F_X(x), F_Y(y)\right) = \frac{f_{X,Y}(x, y)}{f_X(x)\, f_Y(y)}. \tag{12}$$
Hence, based on Equation (11), Equation (10) can be rewritten as
$$I(X;Y) = \iint c\left(F_X(x), F_Y(y)\right) f_X(x)\, f_Y(y) \log c\left(F_X(x), F_Y(y)\right) dx\, dy. \tag{13}$$
Let $F_X(x) = a$ and $F_Y(y) = b$, so that $da = f_X(x)\, dx$ and $db = f_Y(y)\, dy$, and note that $F_X(x)$ and $F_Y(y)$ both take values in the interval [0, 1]. Then Equation (13) can be simplified as
$$I(X;Y) = \int_0^1 \int_0^1 c(a, b) \log c(a, b)\, da\, db. \tag{14}$$
In this way, instead of estimating $f_X(x)$, $f_Y(y)$, and $f_{X,Y}(x, y)$ without any prior knowledge of the correlation between the variables, the mutual information of the random variables $X$ and $Y$ can be calculated after finding the probability density function of the copula.
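As a quick consistency check on the copula form of mutual information (an illustrative addition, not part of the original derivation), consider independent $X$ and $Y$. In that case the joint density factorizes into the product of the marginal densities, so the copula density is identically one on the unit square and the integral vanishes, as expected for independent variables:

```latex
f_{X,Y}(x, y) = f_X(x)\, f_Y(y)
\;\Longrightarrow\;
c(a, b) = 1 \quad \text{for all } (a, b) \in [0, 1]^2,
\qquad
I(X;Y) = \int_0^1 \int_0^1 1 \cdot \log 1 \; da\, db = 0 .
```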

Empirical Copula-Based Mutual Information Estimation (ECMI)
The empirical copula, introduced by Deheuvels in [32], is a non-parametric method in which no prior assumption on the relationship between the random variables is needed. Besides, for the purpose of mutual information calculation, an empirical copula is more appropriate than other members of the copula family because it is easier to understand and to compute.
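According to [33] and Equation (14), the mathematical formula of the empirical copula is represented as
$$\hat{C}(a, b) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left(F_X(x_i) \le a,\; F_Y(y_i) \le b\right),$$
where $\mathbf{1}(\cdot)$ is the indicator function and $F_X(x_i)$ and $F_Y(y_i)$ are the marginal cumulative distribution functions, which can be calculated by adopting empirical distribution functions. Taking $F_X(x_i)$ as an example, it can be expressed as
$$F_X(x_i) = \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}(x_j \le x_i), \tag{16}$$
and the corresponding probability density function is written in terms of a function $\omega$ (Equation (17)). In the above functions, $N$ is the size of the original dataset. The function $\omega$ can be approximated by kernel methods [34], which can be generally expressed as
$$\hat{\omega}(u) = \frac{1}{N h^{p}} \sum_{i=1}^{N} K\left(\frac{u - u_i}{h}\right),$$
in which $u$ is the input vector, $u_i$ are the observations, $N$ is the data size, $K(\cdot)$ is the kernel smoother, $p$ is the dimensionality of $u$, and $h$ is the bandwidth. In this paper, the mutual information between $X$ and $Y$ is approximated by Equation (19), which can be described as
$$I(X;Y) \approx \sum_{i} \sum_{j} \hat{c}(a_i, b_j) \log \hat{c}(a_i, b_j)\, \Delta a\, \Delta b, \tag{19}$$
where $\hat{c}$ is the estimated copula density evaluated on a grid of points $(a_i, b_j)$ in $[0, 1]^2$ with cell size $\Delta a\, \Delta b$. The case in which $\hat{c}(a, b)$ is near zero should be clarified. After rewriting the term $c \log c$ as a quotient, it equals zero in the limit based on L'Hôpital's rule:
$$\lim_{c \to 0^{+}} c \log c = \lim_{c \to 0^{+}} \frac{\log c}{1/c} = \lim_{c \to 0^{+}} \frac{1/c}{-1/c^{2}} = \lim_{c \to 0^{+}} (-c) = 0.$$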
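The estimation steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the copula density is estimated with a plain two-dimensional histogram on the unit square instead of the kernel smoother described above, the natural logarithm is used, and the function names, bin count, and synthetic data are all hypothetical.

```python
import math
import random
from bisect import bisect_right


def pseudo_observations(samples):
    """Empirical distribution values F(x_i) = #{x_j <= x_i} / N."""
    ordered = sorted(samples)
    n = len(samples)
    return [bisect_right(ordered, x) / n for x in samples]


def copula_mutual_information(x, y, bins=10):
    """Histogram estimate of the integral of c log c over [0, 1]^2 (in nats)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    u, v = pseudo_observations(x), pseudo_observations(y)
    counts = [[0] * bins for _ in range(bins)]
    for a, b in zip(u, v):
        i = min(int(a * bins), bins - 1)
        j = min(int(b * bins), bins - 1)
        counts[i][j] += 1
    cell_area = 1.0 / (bins * bins)
    mi = 0.0
    for row in counts:
        for count in row:
            if count == 0:
                continue  # c log c tends to 0 as c tends to 0
            density = count / (n * cell_area)  # histogram copula density
            mi += density * math.log(density) * cell_area
    return mi


random.seed(42)
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
y_dependent = [a * a + random.gauss(0.0, 0.02) for a in x]   # nonlinear link
y_independent = [random.uniform(-1.0, 1.0) for _ in x]       # no link

mi_dependent = copula_mutual_information(x, y_dependent)
mi_independent = copula_mutual_information(x, y_independent)
print(mi_dependent, mi_independent)
```

The estimate for the quadratically related pair should be clearly positive, while the estimate for the independent pair should stay near zero (a small positive bias is expected from the finite sample and the binning).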

Scenario Description
The dataset adopted in this work is a real SCADA dataset covering a two-month operating period of a 2.5 MW wind turbine, with a sampling period of 20 s. According to the warning signals, there are 370 alarms during this period. As it is practically impossible to have so many failures or anomalies in such a short period, most of the alarms should be considered only as reminders for operators of ambient turbulence or of changes in the wind turbine's control strategy during operation. In order to identify the real anomalies in a wind turbine, a performance indicator is created by calculating the deviation between the normal behavior and the real observation [35]. This deviation takes the form of a Euclidean distance, and the behavior of a wind turbine is represented by its power output. To calculate this index, several parameters should be selected from the SCADA dataset to build a regression model in which output power is the target. In the adopted SCADA dataset, output power has the label active power (AP). The original dataset contains 53 parameters related to wind turbine subcomponents and the power grid. In this work, the 35 parameters which represent wind turbine working conditions are taken into account.
The whole procedure of the application of ECMI is shown in Figure 2. In the calculation process, the mutual information between each of the other parameters and active power is estimated in turn, and a rank list based on the result is created.
For the cleaning process, a typical wind turbine power curve is adopted as a reference. After comparing the observed power curve to the reference, data points which contain negative values for power output are filtered out. Moreover, some bad data points caused by sensor errors are also removed. The wind turbine power curve from the cleaned dataset is shown in Figure 3.
Besides, as the units of each parameter are different, some of the parameters with small value ranges do not have an equal chance to impact the model when investigating wind turbine system-level behavior. Hence, the data is normalized with Equation (21):
$$N_{\mathrm{data}} = \frac{V - \min(V)}{\max(V) - \min(V)}, \tag{21}$$
where $N_{\mathrm{data}}$ represents the normalized data vector and $V$ means the original dataset.
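The normalization step can be illustrated as follows. This is a minimal sketch that assumes a min-max scaling to [0, 1] applied to each parameter separately; the function name, the handling of constant columns, and the sample values are made up for the example and are not taken from the SCADA dataset.

```python
def min_max_normalize(values):
    """Scale a list of readings linearly so that min -> 0 and max -> 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant parameter is mapped to all zeros
        return [0.0 for _ in values]
    span = highest - lowest
    return [(v - lowest) / span for v in values]


# Made-up readings with very different units and ranges
wind_speed = [3.2, 7.5, 12.1, 9.8]               # m/s
active_power = [150.0, 980.0, 2500.0, 1710.0]    # kW

wind_speed_norm = min_max_normalize(wind_speed)
active_power_norm = min_max_normalize(active_power)
print(wind_speed_norm)
print(active_power_norm)
```

After scaling, both parameters occupy the same range, so a parameter measured in small numbers is no longer dominated by one measured in thousands.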

Results Based on ECMI
Based on the cleaned dataset, the value range of each parameter is divided into 100 bins in [0, 1], the empirical copula of each pair of parameters is constructed, and the copula density is estimated by a kernel smoothing method. Figures 4 and 5 show the cumulative copula and the copula density for the variable pairs (active power (AP), yaw) and (AP, wind speed (WS)).
The copula density of (AP, WS) in Figure 5 is plotted over the observations of each parameter, with the Z axis as the probability value. The distribution of AP and WS indicates that the empirical copula process does not change the original information and maintains the physical meaning. Hence, the mutual information estimation can be used as a reference for parameter selection. Moreover, from Figure 5, it can be observed that the probability of copula (AP, Yaw) is much smaller than that of copula (AP, WS). This corresponds to the final result that wind speed ranks higher than yaw in the suggestion list. Since in this work only the components of a wind turbine are taken into consideration, the suggestion list (Table 1) only provides a rank of parameters related to the subcomponents.
According to [6], parameters which have influence on the output power are selected based on experience. In that paper, nacelle temperature, rotor speed, gearbox oil temperature, hydraulic temperature, generator bearing temperature, wind speed, and pitch angle are suggested for the next steps in research. In Table 1, it can be observed that nacelle temperature affects power output; however, it does not rank high enough to be selected. This means that, from the mutual information perspective, nacelle temperature does not hold enough information to predict active power. Based on the list, the yaw parameter is recommended for modelling system behavior instead of nacelle temperature. For wind turbines, this needs further discussion if the misalignment information is available in the SCADA data. According to [36], a wind turbine's behavior is complex and site dependent. Terrain, wakes, and the coupling among wind turbines may all have an impact on a wind turbine's operation. Besides, considering the gearbox temperature, Gearbox_BearingT1 is recommended due to its higher ranking, which implies that it is more closely related to output power. The result shows that ECMI is capable of providing suggestions for parameter selection that reflect the differences between SCADA datasets. In the following subsection, the advantages of mutual information-based parameter selection are discussed through a comparison between ECMI and other statistical methods for correlation coefficient analysis.

Comparison Study: The Advantages of Mutual Information Based Parameter Selection
To investigate the statistical relationships among the parameters, some other methods are also available. In this part, Pearson correlation coefficient analysis (PCCA) and kernel canonical correlation analysis (KCCA) are adopted for a comparison study. PCCA is used for estimating the strength of the linear relationship between two variables. KCCA is adopted to assess the strength of the nonlinear relationship between two parameters. The details of these two approaches are described in [20,21].
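To make concrete what a linear measure such as the Pearson coefficient does and does not capture, the sketch below compares a linear and a purely quadratic relationship on synthetic data. The data, noise levels, and function name are assumptions for illustration only; KCCA is not implemented here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
linear = [2.0 * a + random.gauss(0.0, 0.1) for a in x]       # linear dependence
quadratic = [a * a + random.gauss(0.0, 0.02) for a in x]     # nonlinear dependence

r_linear = pearson(x, linear)
r_quadratic = pearson(x, quadratic)
print(r_linear, r_quadratic)
```

The coefficient is close to 1 for the linear pair but close to 0 for the quadratic pair, even though the second variable is almost fully determined by the first. This is the kind of dependence that a purely linear measure can miss.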
After applying these two methods to the SCADA dataset, Table 2 shows the results, which can be considered as the strength of the linear and nonlinear relationships between active power and the other parameters. The parameters suggested in [6] are highlighted in red in Tables 1 and 2. The positions of the parameters differ among the three lists because PCCA, KCCA, and ECMI explore different relationships among parameters. For example, in the PCCA-based rank list, the first parameter is Converter_L Current, which has the strongest linear relationship with output power.
Table 2. Criticality rank of wind turbine subcomponents based on PCCA and KCCA (columns: PCCA, KCCA).
The advantages of an ECMI-based parameter selection method can be discussed from two perspectives. First, as mentioned above, the parameter list based on interconnections among the subcomponents of a wind turbine is preferred by field operators. When checking the positions of the highlighted parameters in the rank lists, they take relatively higher positions in Table 1. This implies that the ECMI-based rank list and the parameter list based on interconnections among a wind turbine's subcomponents share a similar trend. The ECMI method can therefore be used as a supplement when the interconnection-based approach is not accurate enough for decision making.
The other advantage of ECMI-based parameter selection lies in the ranks of some parameters. Comparing the results in Tables 1 and 2, the main differences are the ranks of parameters such as pitch angle, yaw, ambient temperature, and nacelle temperature. Both PCCA and KCCA can detect the statistical relation between active power and other parameters. Since pitch angle has a significant impact on power output, the KCCA-based rank is more reasonable than the PCCA-based one, as KCCA is capable of detecting nonlinear relations between variables. In Table 1, ECMI ranks pitch angle even higher than KCCA does, which shows the effectiveness of the proposed method in detecting nonlinear relationships. Moreover, both PCCA and KCCA failed to provide appropriate ranks for yaw, ambient temperature, and nacelle temperature, although all these parameters are often considered important for modelling wind turbine behavior. The reason is that the values of these three parameters are almost stationary, while both PCCA and KCCA are sensitive to parameters which change frequently. In this case, ECMI is more efficient in detecting associations among parameters, as it provides a more reasonable rank for yaw, ambient temperature, and nacelle temperature. This should be attributed to the main feature of mutual information: it is a more general measure that quantifies the common information shared by two variables rather than investigating whether they are related linearly or nonlinearly.
From the condition monitoring point of view, ambient temperature also has an impact on wind turbine power output. Even though this parameter ranks slightly higher in the ECMI list, it takes a low position in all three rank lists. However, considering that all the subcomponents are located in the nacelle, the power output is more closely related to nacelle temperature. In all three rank lists, nacelle temperature ranks higher than ambient temperature, which is consistent with field experience.
To evaluate the performance of the ECMI-based parameter selection method, a neural network (NN) is used to test the capability of the parameters selected from Tables 1 and 2. The inputs to the NN are the selected parameters and the target is the AP of the wind turbine. The SCADA data for all the selected parameters and AP are then used to train the NN. After that, the best validation performance is chosen as the indicator of the effectiveness of the method. The criteria for parameter selection are:

1. Select the 10 parameters which rank highest in each of the three rank lists.
2. Whenever there are several parameters relating to the same subcomponent, choose the one which ranks higher.
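The two criteria above can be sketched as a simple filter over a rank list. This is one possible reading, assuming that duplicates from the same subcomponent are dropped before counting towards the limit of 10; the function name, the mapping from parameter to subcomponent, and all parameter names except Gearbox_BearingT1 are hypothetical.

```python
def select_parameters(ranked, subcomponent_of, limit=10):
    """Walk a rank list (best first) and keep one parameter per subcomponent."""
    selected = []
    seen_subcomponents = set()
    for name in ranked:
        # Parameters without a known subcomponent form their own group
        group = subcomponent_of.get(name, name)
        if group in seen_subcomponents:
            continue  # a higher-ranked parameter of this subcomponent was kept
        seen_subcomponents.add(group)
        selected.append(name)
        if len(selected) == limit:
            break
    return selected


# Hypothetical rank list and grouping, for illustration only
ranked = ["WindSpeed", "Gearbox_BearingT1", "Gearbox_BearingT2", "PitchAngle", "Yaw"]
subcomponent_of = {
    "Gearbox_BearingT1": "gearbox",
    "Gearbox_BearingT2": "gearbox",
}

chosen = select_parameters(ranked, subcomponent_of, limit=3)
print(chosen)
```

With this reading, the lower-ranked gearbox sensor is skipped and the next parameter in the list takes its place.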
Based on the above criteria, the parameter lists are created and shown in Table 3. In this test, a multilayer perceptron feed-forward NN is used to build a regression model between AP and the selected parameters from the SCADA data. As it is only used for testing, an NN with three layers and 10 neurons in each layer is built. To validate the performance of the training process, the mean squared error (MSE) is used as the indicator. The training function is scaled conjugate gradient backpropagation. After training the NN, the validation performance based on each parameter list is shown in Figure 6. In this case, a method which yields a smaller MSE value is more accurate in modelling the operating behavior of a wind turbine.
From the training results, the MSE based on the ECMI list has the lowest value, $9.9349 \times 10^{-6}$, as shown in Figure 6c. The MSE in the KCCA-based test is larger than that obtained with the ECMI list but smaller than that generated by the PCCA list. Hence, the parameter selection method based on ECMI is preferred because it can produce more accurate training results, which is very important for wind turbine anomaly detection and condition monitoring.

Conclusions
[...] and copula, a mathematical proof is provided to show that estimating mutual information through copula is more efficient because only the copula density needs to be determined. Then, to make the method more applicable, an empirical copula-based mutual information estimation approach is provided. Besides, real wind turbine SCADA data is adopted for testing the method, and the results show the efficiency of the ECMI method.
Afterwards, a suggestion list for parameter selection is provided based on the rank list. A parameter that ranks higher in the list is more closely related to the target variable. The ECMI method is suitable as an auxiliary method for parameter selection because, while following the physical knowledge of a real wind turbine, ECMI is capable of finding the particularities of each dataset, which makes the next-step investigation more effective. The advantage of the ECMI-based parameter selection method lies in the fact that no assumptions on the statistical relationship among parameters are needed. Moreover, the validation performance after training the NN also shows that the ECMI-based method can produce more accurate results. Future work is to apply this method to different SCADA datasets to test the stability of ECMI.