Condition Monitoring of Wind Turbine Generators Based on SCADA Data and Feature Transfer Learning

To build an effective condition monitoring (CM) model for target wind turbines (WTs) with little operational data, an approach based on feature transfer learning and a modified generative adversarial network is proposed. First, a large amount of labelled data from WTs is analyzed to construct a CM model with the aid of an autoencoder. This forms the knowledge of CM for WTs in the source domain. Second, a generative adversarial network is trained to build a mapping relationship between the features of different WTs. Third, the health status of the target WT is determined by analyzing the data collected from it online with the proposed approach. Two case studies verify that the proposed method can transfer the CM knowledge from the source WT to the target WT and achieve good performance in the CM of the target WT.


I. INTRODUCTION
As the key equipment for converting wind energy into electricity, wind turbines have grown rapidly in number in recent years [1]. According to a Global Wind Energy Council report [2], the global capacity of installed wind power reached 837 GW at the end of 2021. However, owing to remote wind farm locations, harsh working environments, and dynamic working conditions, wind farms often face frequent wind turbine failures, low power generation efficiency, and high maintenance costs [3], [4]. Therefore, condition monitoring of wind turbines and their critical components is an important task [5], [6]. For example, the generator is one of the core subsystems of a wind turbine. Long-term dynamic operating conditions lead to a high failure rate of its components, such as bearings and windings. These failures seriously affect power generation efficiency and economic benefits.
The associate editor coordinating the review of this manuscript and approving it for publication was Mauro Tucci.
Methods for condition monitoring of wind turbines mainly include vibration analysis, oil analysis, and intelligent diagnosis driven by supervisory control and data acquisition (SCADA) data [7], [8]. The first two approaches require additional sensors, so these intrusive methods incur extra costs. In contrast, a large amount of SCADA data is already available, but its value for nonintrusive condition monitoring has not been fully explored. Therefore, anomaly detection and fault diagnosis of wind turbines through SCADA data analysis has become a hot topic. Generally, based on the SCADA data, a normal behavior model (NBM) [9] can be built and quantitative anomaly indicators [10] can be constructed for condition monitoring of wind turbines. For example, an NBM based on principal component analysis and support vector regression was established to detect generator faults from power, voltage, and current SCADA data [11]. In [12], an NBM using optimized relevance vector machine regression and an adaptive threshold was proposed for anomaly detection of the wind turbine pitch system. In [13] and [14], NBMs were established based on Mahalanobis space to detect abnormalities of wind turbines. The NBM has received much attention because of its effectiveness and accessibility. However, training an NBM usually requires a large amount of historical SCADA data [15], and there may not be enough historical data for a newly installed wind turbine. In addition, poor communication and data transmission may result in data loss [16]. Under such conditions, it is difficult to train an NBM effectively.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
On the other hand, a large amount of historical SCADA data is available from other wind turbines, such as turbines of the same type but with different power ratings. Much work has been done on using these data to form knowledge for condition monitoring of wind turbines. How to reuse this knowledge for the fault diagnosis of new wind turbines is a new problem worth studying. In recent years, transfer learning has developed rapidly owing to its wide range of application scenarios [17], [18]. Its main idea is to solve problems in a different but related field (the target domain) by learning knowledge from an auxiliary field (the source domain) that has sufficient relevant data and rich domain knowledge [19]. To some extent, transfer learning relaxes both the requirement for sufficient data in the target domain and the assumption that training and testing data are independently and identically distributed [20]. Feature-based transfer is one of the major transfer learning methods and can achieve knowledge reuse even when the data distributions of the source and target domains are quite different [21]. It has been applied to the intelligent fault diagnosis of wind turbines [16], [22]. For example, a domain-invariant feature learning method named cross-attribute adaptation networks was proposed for fault diagnosis of wind turbine gearboxes [23]. An algorithm named TrAdaBoost with a one-class classifier was proposed to establish a condition monitoring model for wind turbines [24]. A novel transfer learning method based on deep learning was proposed to improve the fault diagnosis performance of rotating machinery [25].
To monitor and analyze the health status of a new wind turbine (target domain) using the diagnostic knowledge acquired from the historical SCADA data of existing or retired wind turbines (source domain), feature transfer learning must learn the information shared across the two domains. The generator in a generative adversarial network (GAN) can adaptively learn the mapping relationship between different data distributions, and the discriminator can implicitly measure their difference [26]. Therefore, feature transfer learning can be integrated with a GAN to eliminate the difference between data in the source and target domains [27]. However, the training process of the standard GAN is unstable and prone to problems such as vanishing gradients or mode collapse [28]. Researchers have proposed a variety of solutions, including improvements to the optimization objective function, the network structure, and the training process [29]. In this paper, to make better use of the GAN to learn the information across different domains, a modified GAN called GAN-quadratic potential (GAN-QP) is proposed. The main contributions of this paper are summarized as follows:
1) A modified GAN-QP framework is proposed to build the mapping relationship between data from different domains. This transforms the data of the target wind turbine to the distribution of the source wind turbine.
2) An integrated approach of NBM and feature transfer learning is proposed to monitor the health status of the target wind turbine by utilizing the knowledge constructed from the data of the source wind turbine. The effectiveness of the proposed approach is validated by two case studies.
The rest of this paper is organized as follows. Section II introduces the basic principles of the GAN and the NBM. Section III presents the wind turbine feature transfer method based on GAN-QP. Section IV reports validation results of the proposed method for condition monitoring and anomaly detection of wind turbines. Finally, conclusions are drawn and future research is outlined in Section V.

II. BASIC PRINCIPLES OF GAN AND THE NBM
Domain adaptation (DA) is a popular feature-based transfer learning method that can effectively reduce the distributional difference between data in the source and target domains [30]. That is, when the data distributions of the target and source domains differ, the sample data of the target domain can be transformed into the source domain through feature mapping. Let $X_S$ and $X_T$ denote the sample data sets in the source and target domains, respectively, let $P(X)$ denote the marginal probability distribution of a sample data set, and let $\Phi$ denote the mapping relationship, i.e., $X_T \to \Phi(X_T)$. Before feature mapping, the marginal probability distribution of the sample data set in the target domain differs from that in the source domain, i.e., $P(X_T) \neq P(X_S)$. After mapping, the marginal probability distributions of the sample data sets in the target and source domains should be as similar as possible, i.e., $P(\Phi(X_T)) \approx P(X_S)$. Therefore, we can make full use of the data in the source domain to improve learning performance in the target domain, which makes it possible to establish an effective model in the target domain with fewer samples. Constructing the mapping relationship in DA is the main problem of transfer learning. For the wind turbine monitoring task in this paper, the generator in a GAN is used to construct the mapping relationship.

A. GENERATIVE ADVERSARIAL NETWORK
The structure of the GAN model is shown in Fig. 1. It includes a generator $G$ and a discriminator $D$, both of which are nonlinear functions with a neural network structure. Through adversarial training, $G$ gradually learns a mapping that transforms data $z$, drawn from a given distribution, to the distribution of the real data $x_r$, i.e., $P(G(z)) \approx P(x_r)$, so as to fool the discriminator. During training, $D$ must distinguish whether its input comes from the real data $x_r$ or the generated data $x_g$, and it outputs the probability that the input comes from $x_r$ rather than $x_g$. At the beginning of training, $D(x_r) = 1$ when $x_r$ is input to the discriminator and $D(x_g) = 0$ when $x_g$ is input. As training progresses, $x_g$ becomes increasingly similar to $x_r$, and the discriminator gradually fails to distinguish between them. When the discriminator outputs converge around 0.5, the training process is regarded as complete.
In the condition monitoring problem for wind turbines, samples of the source wind turbine are regarded as the real data $x_r$ and samples of the target wind turbine as the data $z$ input to the generator; the difference between their distributions is reduced through alternating adversarial training. Finally, a generator is obtained that can transform the distribution of the samples in the target domain to that of the samples in the source domain. The objective function of the GAN is
$$\min_G \max_D V(D, G) = E_{x \sim p_r(x)}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))] \quad (1)$$
where $E(\cdot)$, $G(\cdot)$, and $D(\cdot)$ are the expectation, generator, and discriminator functions, respectively; $z \sim p(z)$ denotes $z$ drawn from random data, and $x \sim p_r(x)$ denotes $x$ drawn from real data. The optimization objectives during the training process are given as follows:
$$\max_D V(D, G) = E_{x \sim p_r(x)}[\log D(x)] + E_{z \sim p(z)}[\log(1 - D(G(z)))] \quad (2)$$
$$\min_G V(D, G) = E_{z \sim p(z)}[\log(1 - D(G(z)))] \quad (3)$$
The discriminator and generator are trained alternately. When the discriminator reaches the optimum, the optimal loss function of the generator can be expressed by the Jensen-Shannon (JS) divergence
$$L_G = 2\,JS(p_r \,\|\, p_g) - 2\log 2 \quad (4)$$
where $p_r$ and $p_g$ are the distributions of real data and generated data, respectively.
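As a small numerical illustration of the value function in (1), taken in its standard form $E[\log D(x)] + E[\log(1 - D(G(z)))]$, the sketch below estimates it from batches of discriminator outputs. The helper name `gan_value` and the toy output values are hypothetical and not taken from the paper.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) from discriminator outputs in (0, 1)."""
    d_real = np.clip(d_real, eps, 1.0 - eps)   # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Early in training the discriminator separates real and generated data easily.
v_confident = gan_value(np.full(4, 0.99), np.full(4, 0.01))
# At convergence the discriminator outputs about 0.5 for every input.
v_equilibrium = gan_value(np.full(4, 0.5), np.full(4, 0.5))
print(v_confident, v_equilibrium)
```

When every output equals 0.5, the estimate equals $-\log 4$, which matches the constant term $-2\log 2$ in the JS form of the generator loss.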
When $p_r$ and $p_g$ do not overlap, the JS divergence is constant and therefore cannot measure the similarity between the two distributions. This leads to ineffective training of the GAN.
To avoid the saturation problem of the traditional GAN, the Wasserstein GAN (WGAN) model [31] uses the Wasserstein distance to measure the discrepancy between $p_r$ and $p_g$:
$$W(p_r, p_g) = \inf_{\gamma \in \Pi(p_r, p_g)} E_{(x, y) \sim \gamma}[\|x - y\|] \quad (5)$$
where $\Pi(p_r, p_g)$ is the set of all possible joint distributions of $p_r$ and $p_g$, and $(x, y) \sim \gamma$ means that a real sample $x$ and a generated sample $y$ are randomly drawn from the joint distribution $\gamma$.
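The contrast between the two measures can be checked on a toy example. The sketch below places two point masses on a one-dimensional grid and moves the generated one away from the real one; the grid, the shifts, and the function names are illustrative assumptions. The one-dimensional Wasserstein distance is computed from the gap between the cumulative distributions, which is valid only in one dimension.

```python
import numpy as np

def js_divergence(p, q):
    """JS divergence between two discrete distributions on the same grid."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                      # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

def wasserstein_1d(p, q, grid):
    """W1 between discrete distributions on a sorted 1-D grid via their CDFs."""
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(grid)))

grid = np.arange(10.0)
p_r = np.zeros(10)
p_r[0] = 1.0                              # "real" mass at position 0
results = []
for shift in (1, 5, 9):
    p_g = np.zeros(10)
    p_g[shift] = 1.0                      # "generated" mass, no overlap with p_r
    results.append((shift, js_divergence(p_r, p_g), wasserstein_1d(p_r, p_g, grid)))
print(results)
```

The JS divergence stays at $\log 2$ for every shift, so it gives the generator no indication of how far away it is, whereas the Wasserstein distance grows with the shift.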
The advantage of the Wasserstein distance over the JS divergence is that it reflects the discrepancy between two distributions even when they do not overlap. To make (5) easier to calculate, it is transformed into
$$W(p_r, p_g) = \frac{1}{K} \sup_{\|f\|_L \le K} E_{x \sim p_r}[f(x)] - E_{x \sim p_g}[f(x)] \quad (6)$$
where $K$ is the Lipschitz constant, $\|f\|_L \le K$ indicates that the function satisfies the $K$-Lipschitz constraint, and $f(\cdot)$ is a continuous function that can be fitted by a neural network.

B. GAN-QP
WGAN solves some of the problems of the traditional GAN. However, Cui and Jiang pointed out that discriminators trained with weight clipping ignore the higher-dimensional parts of the data distribution [32]. In addition, the WGAN model is difficult to optimize because of the interaction between the weight constraints and the cost function, and the Lipschitz constraint adds computational complexity.
To address these problems of WGAN, GAN-QP is utilized in this paper. It adds a quadratic potential divergence to the discriminator loss function (7), such that the optimal solution of the discriminator automatically satisfies the Lipschitz constraint. In (7), $\lambda$ is a scaling factor that is usually set to 1 [29], and $d(x_r, x_g)$ represents the Euclidean distance between $x_r$ and $x_g$. The loss function of the generator is given in (9). The quadratic potential divergence needs neither Lipschitz constraints nor a gradient penalty. Thus, GAN-QP has lower computational complexity and a more stable training process than GAN and WGAN. In contrast to the standard GAN, the random data $z$ in Fig. 1 is replaced by the target wind turbine data during the training of the GAN-QP. Note that the generator no longer aims to generate new samples; instead, it performs a nonlinear transformation of the target wind turbine data so that their distribution becomes similar to that of the source wind turbine.
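As a hedged sketch of the losses referred to as (7) and (9), the NumPy code below evaluates them in the form used by the original GAN-QP formulation: the discriminator maximizes the mean of $\Delta - \Delta^2 / (2\lambda\, d(x_r, x_g))$ with $\Delta = D(x_r) - D(x_g)$, and the generator minimizes the mean of $\Delta$. The function name `gan_qp_losses`, the pairing of real and generated samples within a batch, and the sign conventions are assumptions and may differ from the authors' implementation.

```python
import numpy as np

def gan_qp_losses(d_real, d_fake, x_real, x_fake, lam=1.0, eps=1e-12):
    """Return (discriminator_loss, generator_loss), both to be minimized.

    d_real, d_fake: discriminator scores for paired samples, shape (n,)
    x_real, x_fake: the paired samples themselves, shape (n, n_features)
    """
    diff = d_real - d_fake
    # Euclidean distance d(x_r, x_g); eps guards against division by zero.
    dist = np.linalg.norm(x_real - x_fake, axis=1) + eps
    # Negated because the discriminator maximizes diff - diff^2 / (2 * lam * d).
    d_loss = float(np.mean(-diff + diff ** 2 / (2.0 * lam * dist)))
    g_loss = float(np.mean(diff))
    return d_loss, g_loss

x_real = np.zeros((2, 6))
x_fake = np.zeros((2, 6))
x_fake[0, 0] = 2.0                        # each pair is a distance of 2 apart
x_fake[1, 1] = 2.0
d_loss, g_loss = gan_qp_losses(np.array([1.0, 1.0]), np.array([0.0, 0.0]),
                               x_real, x_fake)
print(d_loss, g_loss)
```

For a fixed pair, the discriminator term is largest when $\Delta = \lambda\, d(x_r, x_g)$, so the score gap is bounded by the distance between the samples without any explicit clipping or gradient penalty.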

C. NORMAL BEHAVIOR MODEL
An autoencoder is used to build an NBM for condition monitoring of wind turbines [33]. As an unsupervised neural network model, an autoencoder usually has three or more layers [34], which can be divided into an encoder network and a decoder network, as shown in Fig. 2. The input and output layers of the autoencoder have the same number of neurons, while the hidden layer usually has fewer, which reduces the dimension of the input data and extracts its most significant features. The encoding process can be expressed as
$$h = \sigma(W_1 x + b_1) \quad (10)$$
and the decoding process can be expressed as
$$x' = \sigma(W_2 h + b_2) \quad (11)$$
where $h$ is the hidden-layer output; $W_1$ and $W_2$ are the weight matrices between the input and hidden layers and between the hidden and output layers, respectively; $b_1$ and $b_2$ are the corresponding bias matrices; and $\sigma(\cdot)$ is the activation function, which is usually the sigmoid function. The loss function $L$ is defined on the error between the reconstruction $x'_i$ and the input $x_i$. The weight matrices $W_i$ and the bias matrices $b_i$ in the network are optimized by the adaptive moment estimation (Adam) optimizer. The NBM of the wind turbine is constructed by training the autoencoder with normal data. When new input data are similar to the normal data, the reconstruction error is small; otherwise, the reconstruction error is large. Therefore, the health status of the wind turbine can be evaluated by analyzing the size of the reconstruction error.
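The encoding, decoding, and reconstruction-error steps can be sketched with a single hidden layer in NumPy. The weights below are random and untrained, the hidden size of 3 is a hypothetical choice, and the per-sample mean squared error is an assumed form of the reconstruction error, so this illustrates only the data flow, not the trained NBM.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def reconstruct(x, W1, b1, W2, b2):
    h = sigmoid(x @ W1 + b1)              # encoder: input layer -> hidden layer
    return sigmoid(h @ W2 + b2)           # decoder: hidden layer -> output layer

def reconstruction_error(x, x_rec):
    """Per-sample mean squared error between input and reconstruction."""
    return np.mean((x_rec - x) ** 2, axis=1)

rng = np.random.default_rng(0)
n_features, n_hidden = 6, 3               # hidden size is a hypothetical choice
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_features))
b2 = np.zeros(n_features)

x = rng.uniform(size=(5, n_features))     # five normalized SCADA samples
err = reconstruction_error(x, reconstruct(x, W1, b1, W2, b2))
print(err)
```

After training on normal data, `err` would serve as the monitoring indicator: one non-negative value per sample, small for inputs that resemble normal operation.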

III. PROPOSED CONDITION MONITORING METHOD BASED ON FEATURE TRANSFER LEARNING
The feature transfer learning method is applicable to wind turbines of the same type but with different power ratings. One wind turbine, called the source wind turbine, has sufficient SCADA data and complete operation and maintenance records. The other, called the target wind turbine, is to be monitored but has limited SCADA data. The overall process of the feature transfer method between two wind turbines based on the proposed GAN-QP is shown in Fig. 3. First, the raw SCADA data of the source and target wind turbines are preprocessed to construct the training data sets in the source and target domains. Second, the discriminator and generator in the GAN are trained alternately in cycles and saved as independent models when training is complete. Third, an autoencoder is trained with the historical data of healthy wind turbines. Finally, the health status of the target wind turbine is evaluated by the proposed approach.

A. SCADA DATA PREPROCESSING
To construct high-quality training data sets, the historical SCADA data must be preprocessed through feature selection, data cleaning, and normalization. First, the parameters of the critical wind turbine components under study are selected. Second, SCADA data from different operating conditions are labelled. Since the monitoring method is based on an NBM, data collected while the wind turbine is out of service must be deleted. Specifically, the deleted data include those collected when there is wind but no power output, when the wind speed is below the cut-in wind speed (3 m/s) or above the cut-out wind speed (25 m/s), and when the power is limited. Moreover, extreme outliers are detected and deleted by the local outlier factor algorithm [35]. Finally, to ease model training and improve accuracy, the original data are normalized with the min-max method:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (13)$$
where $x$ is the original value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature, and $x^{*}$ is the normalized value.
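Part of this cleaning and normalization can be sketched as follows. The code covers only the wind-speed range and no-power rules plus min-max scaling; detection of limited-power periods and the local outlier factor step are omitted, and the function names and sample values are hypothetical.

```python
import numpy as np

CUT_IN, CUT_OUT = 3.0, 25.0               # m/s, the values quoted in the text

def operating_mask(wind_speed, power):
    """Keep samples inside the operating wind-speed range that produce power."""
    return (wind_speed >= CUT_IN) & (wind_speed <= CUT_OUT) & (power > 0.0)

def min_max(x, x_min=None, x_max=None):
    """Scale each column to [0, 1]; pass training min/max to scale online data."""
    x_min = x.min(axis=0) if x_min is None else x_min
    x_max = x.max(axis=0) if x_max is None else x_max
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard constant columns
    return (x - x_min) / span

wind = np.array([2.0, 5.0, 12.0, 26.0, 8.0])              # m/s
power = np.array([0.0, 300.0, 1500.0, 0.0, 0.0])          # kW
keep = operating_mask(wind, power)
scaled = min_max(np.column_stack([wind[keep], power[keep]]))
print(keep, scaled)
```

Reusing the minimum and maximum from the training set when scaling online data keeps both on the same scale, which matters because the same preprocessing is applied again in the online testing stage.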

B. FEATURE TRANSFER PROCESS
To simplify model structure, both generator and discriminator in GAN-QP utilize multilayer perceptron neural networks. Their networks contain three hidden layers. The numbers of neurons in the three hidden layers of generator network are all set to 6 and that of the discriminator network is set to 30, 30, and 10, respectively. The number of neurons in the output layer is set to 1. The leaky rectified linear units (LeakyReLU) activation function is used in all layers except for the last layer of the discriminator network. Training data sets in source and target domains can be obtained through data preprocessing as discussed in Section III-A. The input of generator is the training data set in target domain rather than random data, and the output is the generated data, which is expected to be similar to the distribution of the training data sets in source domain. The input of discriminator includes the generated data and normal healthy data (i.e., training data set) in source domain, and the discriminator outputs the discriminant results. Furthermore, the loss functions of the discriminator and generator are calculated according to (7) and (9), respectively, and the parameters of the discriminator and generator networks are iterated until they reach a Nash equilibrium [36]. The trained generator model has the capability to transform the data of target wind turbine to a distribution that is similar to the data distribution of wind turbine in source domain.
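The layer sizes described above can be laid out as a forward pass in NumPy. This sketch assumes six input features, a generator output of the same width as its input, a LeakyReLU slope of 0.2, and random untrained weights; none of these are stated in this section, and the helper names are hypothetical.

```python
import numpy as np

def leaky_relu(a, slope=0.2):             # slope is an assumed value
    return np.where(a > 0, a, slope * a)

def init_mlp(sizes, rng):
    """One (weights, bias) pair per layer for the given layer sizes."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers, activate_last):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1 or activate_last:
            x = leaky_relu(x)
    return x

rng = np.random.default_rng(0)
n_features = 6                            # assumed number of SCADA features
generator = init_mlp([n_features, 6, 6, 6, n_features], rng)
discriminator = init_mlp([n_features, 30, 30, 10, 1], rng)

target_batch = rng.uniform(size=(4, n_features))
mapped = forward(target_batch, generator, activate_last=True)
# The last discriminator layer is left linear, as described in the text.
scores = forward(mapped, discriminator, activate_last=False)
print(mapped.shape, scores.shape)
```

The generator input here is a batch of target-domain samples rather than random noise, which is the change to the standard GAN that the method relies on.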

C. ONLINE TESTING
In the online testing stage, an autoencoder is first trained with the historical healthy data of wind turbines. Then, the online testing set is constructed through the same data preprocessing. The trained generator model is applied to transform the testing set into the data distribution space of the source wind turbine. Next, the transformed data are input into the autoencoder to calculate the reconstruction error relative to the normal historical data of the source wind turbine. Lastly, the condition monitoring model of the target wind turbine is established with the reconstruction error as the monitoring indicator.
In order to reduce false alarms, the alarm strategy in [24] is used. The maximum value of monitoring indicator in training set is taken as the health threshold. The threshold is used to distinguish whether the online data is normal or abnormal, so as to identify the health condition of the target wind turbine.
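The threshold rule can be sketched in a few lines. The error values are made up for illustration, and the additional alarm strategy of [24] is not reproduced here; only the maximum-of-training-set threshold described above is shown.

```python
import numpy as np

def health_threshold(train_errors):
    """Health threshold: the largest monitoring indicator seen in training."""
    return float(np.max(train_errors))

def flag_anomalies(online_errors, threshold):
    """True where the online indicator exceeds the health threshold."""
    return np.asarray(online_errors) > threshold

train_errors = np.array([0.010, 0.014, 0.012, 0.020])     # hypothetical values
online_errors = np.array([0.011, 0.019, 0.035, 0.050])    # hypothetical values
threshold = health_threshold(train_errors)
flags = flag_anomalies(online_errors, threshold)
print(threshold, flags)
```

Because the threshold is the training maximum, no training sample is flagged by construction, so any flagged online sample lies outside the range of normal behavior seen during training.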

IV. CASE STUDIES AND DISCUSSIONS
Two cases on the condition monitoring of the wind turbine generator subsystem are studied to verify the effectiveness of the proposed method. The information on the wind turbines used in the two case studies is given in Table 1. The three wind turbines are of the same type and use doubly-fed induction generators (DFIGs); thus, they have the same operating features. Furthermore, they are made by the same manufacturer, use the same sensors in the SCADA system, and have the sensors installed at the same locations. Therefore, they have the same feature parameters in their SCADA data. However, the three wind turbines are located in different places and have different power ratings, which leads to different distributions of their SCADA data.
The historical healthy SCADA data of the source wind turbine collected over four months, from February to May 2016, are used to construct the training set in the source domain. Target wind turbine 1 was healthy from February 2016 to January 2017 but had a bearing fault in its generator at 11:20 on February 5, 2017. Its SCADA system also raised the problem "generator bearing temperature too high downtime". Target wind turbine 2 had been operating for only a short time and had only a small amount of healthy data, from June to August 2017. The training data sets in the target domains used in Case 1 and Case 2 are listed in Table 1.
Features that reflect the health status of the component under study are selected. As shown in Table 2, six features are selected as the input of the model [24]: the 30-second average wind speed, generator rotor speed, active power, and three temperature parameters. Since wind turbines work in a harsh environment, the ambient temperature changes considerably over time. Such changes affect the nacelle temperature and, in turn, the generator temperatures. Thus, to eliminate the impact of ambient temperature changes, the difference between each generator temperature and the nacelle temperature is used. Wind turbines are usually in normal operating conditions, so the NBM construction focuses on these normal operating data. Following the data preprocessing method described in Section III-A, the raw data are preprocessed to construct the training data sets in both the source and target domains.

A. CASE 1: ANOMALY DETECTION OF A FAULTY WIND TURBINE
In this case, the data of the source wind turbine and target wind turbine 1 in Table 1 are used to evaluate the effectiveness of the proposed method. A fault in the non-drive-end generator bearing of target wind turbine 1 was found during maintenance [13]. This bearing fault increased the bearing temperature and eventually triggered an alarm in the wind turbine's SCADA system on February 5, 2017.
As shown in Fig. 3, data preprocessing was performed first. Fig. 4(a) shows the scatter plot of wind speed against active power in the original data. Noisy data, labelled "outliers", "limited power", and "no power" in the figure, can be clearly seen. These data should be cleaned before modeling. Fig. 4(b) shows the scatter plots of the source and target wind turbines after data preprocessing. The data distributions of the two wind turbines are different, but their shapes are similar. The source wind turbine has a large amount of data, whereas the target wind turbine has only a small amount, as reported in Table 1. The data volume of the target wind turbine is insufficient to train the condition monitoring model effectively, whereas that of the source wind turbine is sufficient. Thus, it is valuable to use the source wind turbine data for condition monitoring of the target wind turbine. Based on the proposed GAN-QP, a nonlinear mapping relationship between the source and target wind turbines was learned. The data of the target wind turbine can then be transformed into a distribution that is approximately consistent with the distribution of the data in the source domain.
Following the aforementioned training method, the number of iterations is set to 400, the batch size to 1000, and the learning rate of the generator and discriminator models to 0.00005. Through iterative training, the losses of the generator and discriminator converge gradually, as shown in Fig. 5. After training, the GAN-QP model is saved as an independent model. To verify the effectiveness of the GAN-QP model, the training data set in the target domain is transformed by its generator. Fig. 6(a) shows that there is a significant difference between the data distributions of the training data sets in the source and target domains before transformation. The main reason is that the two wind turbines have different power ratings and, because they are installed in different locations, different environmental conditions. Fig. 6(b) shows the data distributions of the training data sets in the source and target domains after transformation. The difference between the data distributions has been greatly reduced. This confirms that the trained GAN-QP can effectively transform data from the target domain to the source domain. The same transformation is applied to the testing data set to obtain the transformed target data. Then, an autoencoder model trained with the training data set in the source domain is used to analyze the transformed target data. For comparison, another autoencoder is trained with the training data set obtained from the target wind turbine only. The monitoring results based on the reconstruction errors are shown in Fig. 7.
The monitoring result without feature transfer learning is shown in Fig. 7(a). A large number of false alarms are raised, as indicated by the magenta circles. This is mainly because the training data set in the target domain is too small to provide enough information for condition monitoring. Only 15 days of data are available, which cannot describe all the normal operating conditions of the wind turbine. Thus, the knowledge learned by the autoencoder model is incomplete, and many healthy conditions are misdiagnosed as abnormal, which produces many false alarms. In contrast, Fig. 7(b) shows the monitoring results with feature transfer learning by the proposed method. The autoencoder model trained with the training data set in the source domain greatly reduces false alarms. This is because the training data set in the source domain expands the healthy space and helps target wind turbine 1 complete the condition monitoring task. A partially enlarged view of the monitoring results around the time of anomaly detection is shown in Fig. 8. An anomaly was first observed at 6:25 on January 28, 2017 (shown by the black dash-dotted line) and persisted from January 31 to February 3. This indicates that the operating state of the generator deteriorated because of its bearing fault. Finally, the wind turbine was shut down for maintenance on February 5. The proposed method can thus raise alarms 8 days before the SCADA system. This early warning shows that the proposed method is of great value for the intelligent maintenance of wind turbines.
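The 8-day lead time follows directly from the two timestamps reported for Case 1. The short check below assumes that the SCADA alarm coincides with the fault time of 11:20 on February 5, 2017; the variable names are illustrative.

```python
from datetime import datetime

first_anomaly = datetime(2017, 1, 28, 6, 25)   # first anomaly flagged by the method
scada_alarm = datetime(2017, 2, 5, 11, 20)     # fault time reported for the SCADA alarm
lead_time = scada_alarm - first_anomaly
lead_days = lead_time.days
extra_hours = lead_time.seconds / 3600
print(lead_days, round(extra_hours, 2))
```

The difference is 8 days plus a little under 5 hours, consistent with the stated 8-day early warning.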

B. CASE 2: HEALTH MONITORING OF A HEALTHY WIND TURBINE
In this case, the source wind turbine and target wind turbine 2 are used to verify the capability of the proposed method. Neither wind turbine has faults, but target wind turbine 2 has a very short running time of only two months.
Following the steps of the proposed approach, the training data sets in the source and target domains are constructed, and the scatter plots of wind speed against active power for the two wind turbines are shown in Fig. 9. The target wind turbine seldom worked at its rated power, so data are lacking for wind speeds between 10 and 20 m/s. This indicates that the data of target wind turbine 2 are incomplete, so the transformation was applied to them. According to the data distributions of the training data sets in the source and target domains before and after transformation shown in Fig. 10, the feature transfer learning is effective. Two autoencoder models are trained for condition monitoring, one with feature transfer and one without. Their monitoring results are shown in Fig. 11. Owing to the lack of data, some healthy conditions of the target wind turbine are misdiagnosed as abnormal, as shown in Fig. 11(a). With the help of feature transfer learning, the performance of the proposed approach is improved.

V. CONCLUSION
It is usually challenging to construct effective condition monitoring models for newly installed wind turbines or for wind turbines affected by data loss. To address these challenges, this paper proposed an approach based on feature transfer learning and a modified GAN-QP. The generator in the GAN-QP is used to learn the nonlinear relationship between the source and target wind turbines and then to transform the data of the target wind turbine to the distribution of the source wind turbine. These transformed data, together with the data of the source wind turbine, form a large data set for condition monitoring. Then, an autoencoder is trained with the data of healthy wind turbines to construct an NBM, which is used to determine whether a wind turbine is working properly. Two case studies verified the effectiveness of the proposed method, which not only detects abnormal behavior of the wind turbine effectively but also reduces false alarms during normal conditions. In the future, other transfer methods, such as instance-based and model-based transfer learning, will be studied for condition monitoring of wind turbines.