Effects of Wind Conditions on Wind Turbine Temperature Monitoring and Solution Based on Wind Condition Clustering and IGA-ELM

To reduce the maintenance costs of wind turbines (WTs), WT health monitoring has attracted wide attention, and different methods have been proposed. However, most existing WT temperature monitoring methods ignore the fact that varying wind conditions can directly affect the internal temperatures of a WT, such as the main bearing temperature. This paper analyzes the effects of wind conditions on WT temperature monitoring. To reduce these effects, this paper also proposes a novel WT temperature monitoring solution. Compared with existing solutions, the proposed solution has two advantages: (1) wind condition clustering (WCC) is applied and a normal turbine behavior model is then built for each wind condition; (2) the extreme learning machine (ELM) is optimized by an improved genetic algorithm (IGA) to avoid local minima caused by the irregularity of wind condition change and the randomness of the initial coefficients. Cases based on real SCADA data validate the effectiveness and advantages of the proposed solution.


Introduction
Since wind energy is renewable and pollution-free, many governments cite wind energy as a primary future energy source [1]. However, the high maintenance costs of wind turbines (WTs) seriously restrict the development of the wind energy industry [2]. The most effective way to reduce maintenance costs is to monitor the working state of WTs so that alarms are raised when failures occur. Thus, model-based WT monitoring has attracted wide attention, and different methods have been proposed. Existing model-based WT monitoring methods can be roughly divided into two categories [3]: theoretical and data-driven. The advantage of theoretical methods is that fewer data are required. Theoretical methods have been used to identify the main physical mechanisms responsible for temperature changes [4] and to obtain annual energy [5]. For data-driven methods, model accuracy relies on the quality and quantity of the data. Data-driven methods have been used to identify turbines with weakened power generation performance by assessing wind power curve profiles [6], to predict short-horizon power [7], to make maintenance decisions according to complex WT component degradation processes [8], and to detect faults [9]. Since WTs are complex electromechanical systems, the relationships between their various parameters are primarily nonlinear; thus, constructing a theoretical model is difficult and the resulting model is inaccurate [10]. With improvements in supervisory control and data acquisition (SCADA) [11][12][13], data-driven methods are more suitable for WT monitoring [14].
A typical data-driven WT monitoring method analyzes real-time data using a model with one parameter as the output and other parameters as inputs [15]. When building data-driven models, several different intelligent algorithms are applied. With nonlinear state
1. The effects of wind conditions on a WT's internal temperature are investigated.
2. A WCC scheme is proposed so that normal WT behaviors are built under different clusters. This divide-and-conquer strategy can help reduce false alarms.
3. IGA is used to optimize ELM to improve the accuracy of the model.
The remainder of this paper is organized as follows. Section 2 investigates the effects of wind conditions on WT temperature monitoring. In Section 3, the framework of the proposed solution is presented in detail. Section 4 presents the case studies and the results of various analyses. Conclusions are summarized in Section 5.

Effects of Wind Conditions
WTs are exposed to harsh weather conditions all year round. Large differences between wind conditions can make the internal temperatures of different components vary markedly, even under the same working state. Figure 1 shows the active power-wind speed curve under a normal working state, and Figure 2 shows the main bearing temperature-wind speed curve. To reduce the impact of external temperature, the external temperature of the two data sets shown in the two curves is between 14 and 16 °C.
The two figures show that the active power and internal temperature are directly related to the wind speed, which means that WT temperature monitoring should consider wind speed. Accordingly, most existing WCC schemes divide wind conditions into three regions according to the absolute value of wind speed, I (0-3 m/s), II (3-12 m/s) and III (>12 m/s), as shown in Figure 1, without considering wind speed change. However, wind speed change also affects internal temperatures.
Figure 2 shows that the main bearing temperature increases as the wind speed increases; this occurs because wind speed has a direct positive correlation with rotor speed, which is directly related to the heat generated in a WT. Because heat conduction takes time, there can be some delay between a wind speed change and the resulting internal temperature change. Due to this delay, under the same wind speed, the main bearing temperature during an increase in wind speed could be lower than during a decrease in wind speed. To illustrate this phenomenon, the main bearing temperature-wind speed curves during wind speed increase and decrease are shown in Figure 3. For a fair comparison, the working state and external temperature are kept similar despite the wind speed change: an active power of 900-1000 kW, an external temperature of 14-16 °C and a wind speed of 9-11 m/s.
Figure 3 shows that, under the same wind speed, the main bearing temperature differs significantly between wind speed increase and decrease. The average difference is 4.6 °C, and the maximum difference can reach 5.4 °C. These results indicate that changes in wind speed affect internal temperatures in a WT. Thus, WCC should consider not only the absolute value of wind speed but also its change.
More importantly, wind conditions are so irregular that wind speed can stay constant for a long time or fluctuate markedly within a short time, which can drive intelligent algorithms into a local minimum. Therefore, to improve the accuracy of WT temperature monitoring, it is necessary to optimize the intelligent algorithm used in the solution with WCC.

Proposed Solution Framework
To reduce the effects of wind conditions and monitor WT temperature accurately, a solution with WCC and IGA-ELM is proposed, and its flowchart is shown in Figure 4. The proposed solution has two key parts:

• Wind data are partitioned into several condition clusters by using K-means clustering, so that each wind condition has an independent normal behavior model. This makes the monitored data better matched to their corresponding models. To the best of our knowledge, this is the first WCC based on a data-driven method.

• The ELM algorithm is based on one set of initial input weights and hidden layer bias, which can cause the ELM model to fall short of its attainable accuracy. In the proposed solution, IGA, with its random global search capability, is applied to optimize ELM against the irregularity of wind condition change and the randomness of the initial weights and bias.
With historical healthy data, the WCC and IGA-ELM models can simulate the normal behaviors of the WT. During monitoring, the residuals, which are the differences between the actual values and the values predicted by the models, are analyzed to show the deviation of the actual state from the normal behaviors. A rise in the residuals can indicate a possible failure.
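The monitoring step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Model` type, the `monitor` function and the tuple layout of each sample are assumptions introduced here. It only shows the divide-and-conquer idea of sending each monitored sample to the normal behavior model of its wind condition cluster and taking the residual as the actual value minus the predicted value.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical type: a normal behavior model maps an input vector
# to a predicted main bearing temperature.
Model = Callable[[Sequence[float]], float]


def residual(actual: float, predicted: float) -> float:
    """Residual as defined in the text: actual value minus predicted value."""
    return actual - predicted


def monitor(
    samples: Iterable[Tuple[int, Sequence[float], float]],
    models: Dict[int, Model],
) -> List[float]:
    """Compute one residual per sample.

    Each sample is (cluster_id, inputs, actual_temperature); the layout is
    an assumption made for this sketch. The model is chosen by cluster_id,
    so every wind condition is judged against its own normal behavior.
    """
    residuals = []
    for cluster_id, inputs, actual in samples:
        predicted = models[cluster_id](inputs)
        residuals.append(residual(actual, predicted))
    return residuals
```

In use, `models` would hold one trained model per wind condition cluster, and a sustained rise in the returned residuals would be treated as an indication of possible failure.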

WCC Using K-Means Clustering
In the proposed solution, K-means clustering is applied for WCC. As an unsupervised learning method, K-means clustering is widely used due to its accuracy and efficiency [34]. The aim of K-means clustering is to allocate all wind condition samples into K clusters by minimizing the sum of the squared error over all K clusters, denoted as follows [35]:
$$J(O) = \sum_{i=1}^{K} \sum_{l_j \in O_i} \left\| l_j - \mu_i \right\|^2$$
where $O = \{O_1, O_2, \ldots, O_K\}$ is the set of K clusters, $\mu_i$ is the cluster centroid of the ith cluster, $\{l_1, l_2, \ldots, l_N\}$ are the samples, and N is the number of samples.
In the K-means algorithm, the clustering number K is a key parameter. To evaluate the clustering number K, the silhouette value [36] is applied in the proposed solution. The silhouette value for the ith sample is expressed as
$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where $a_i$ is the average distance from the ith sample to other samples from the same cluster and $b_i = \min_{j} \frac{1}{|O_j|} \sum_{l \in O_j} \left\| l_i - l \right\|$, taken over the clusters $O_j$ that do not contain the ith sample, is the minimum average distance from the ith sample to samples from different clusters. The range of $S_i$ is [−1, 1]. $S_i$ indicates the rationality of the ith sample's clustering, and the average of all $S_i$ indicates the rationality of WCC with K clusters.
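The clustering and the choice of K described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the paper does not state how centroids are initialized, so a deterministic farthest-point initialization is assumed here, and the function names (`kmeans`, `silhouette_mean`, `choose_k`) are hypothetical. Samples are treated as generic feature vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centroids(points: List[Point], k: int) -> List[List[float]]:
    """Assumed initialization: start at the first sample, then repeatedly
    add the sample farthest from its nearest chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd iterations: assign each sample to its nearest centroid, then
    move each centroid to the mean of its members, until labels stop changing."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def silhouette_mean(points: List[Point], labels: List[int]) -> float:
    """Average of S_i = (b_i - a_i) / max(a_i, b_i) over all samples."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        others = []
        for c in clusters:
            if c != labels[i]:
                d = [dist(p, q) for j, q in enumerate(points) if labels[j] == c]
                others.append(sum(d) / len(d))
        if not own or not others:  # singleton cluster or single cluster: score 0
            scores.append(0.0)
            continue
        a, b = sum(own) / len(own), min(others)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: List[Point], k_values=range(2, 9)) -> Tuple[int, Dict[int, float]]:
    """Return the K with the largest mean silhouette and all scores."""
    scores = {k: silhouette_mean(points, kmeans(points, k)[0]) for k in k_values}
    return max(scores, key=scores.get), scores
```

The default `range(2, 9)` mirrors the two-to-eight range of cluster numbers evaluated later in the paper.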

ELM Algorithm
The intelligent algorithm is the key to model-based WT temperature monitoring. Compared with a conventional neural network (NN), the ELM has the advantages of fast training speed and high accuracy. ELM is a single-hidden-layer feedforward neural network. Its topology is shown in Figure 5.
Figure 5. Topology of the extreme learning machine (ELM).
Figure 5 shows the inputs and outputs of the model together with the input weights $\omega_{ij}$ and the output weights $\omega_{jk}$. For n distinct samples X, the ELM can approximate the target as
$$\hat{Y}_k = \sum_{j=1}^{n} \omega_{jk}\, g\Big(\sum_{i} \omega_{ij} x_i + b_j\Big)$$
where $g(\cdot)$ represents the activation function, n is the number of hidden nodes, and $b_j$ is the hidden layer bias. If ELM can fit n distinct samples with zero error, the matrix form of the approximation can be expressed as
$$H\omega = Y$$
where $\omega = [\omega_{1k}, \omega_{2k}, \ldots, \omega_{nk}]^T$ is the vector of output weights, $Y$ is the target vector, and the hidden layer output matrix $H$ collects the hidden node outputs $g\big(\sum_{i} \omega_{ij} x_i + b_j\big)$, with one row per sample and one column per hidden node. With given input weights $\omega_{ij}$ and hidden layer bias $b_j$, the output weights can be analytically calculated by a least squares method as
$$\hat{\omega} = H^{+} Y$$
where $H^{+}$ is the Moore-Penrose generalized inverse of $H$. Then the solution can be expressed as
$$\hat{Y} = H\hat{\omega} = H H^{+} Y$$
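The ELM training described above, random input weights and bias followed by a least squares solution for the output weights, can be sketched with NumPy. This is a minimal sketch under assumptions: a sigmoid activation and a uniform initialization range of [−1, 1] are chosen here for illustration because the paper does not state them, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Assumed activation function g(.)."""
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X: np.ndarray, Y: np.ndarray, n_hidden: int, rng: np.random.Generator):
    """Train an ELM.

    X has shape (samples, features) and Y has shape (samples,) or
    (samples, outputs). The input weights and hidden layer bias are drawn
    at random and left fixed; only the output weights are solved for.
    """
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden layer bias
    H = sigmoid(X @ W + b)                                   # hidden layer output matrix
    w_out = np.linalg.pinv(H) @ Y                            # output weights = H^+ Y
    return W, b, w_out


def elm_predict(X: np.ndarray, W: np.ndarray, b: np.ndarray, w_out: np.ndarray) -> np.ndarray:
    """Predicted output: H times the solved output weights."""
    return sigmoid(X @ W + b) @ w_out
```

Because the random draw of `W` and `b` fixes the hidden layer, two runs with different seeds can give models of different accuracy, which is the sensitivity that the GA optimization in the next section targets.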

GA Optimization
The ELM model is based on one set of initial input weights $\omega_{ij}$ and hidden layer bias $b_j$, which are set based on experience in most existing studies [28][29][30]. However, the wind conditions are irregular. As a result, an ELM model whose initial coefficients are set based on experience can fall short of its attainable accuracy. To solve this problem, GA is applied to optimize ELM in the proposed solution. GA is a global random search optimization algorithm based on the genetic mechanism and evolution, which can select the individuals with good fitness.
With the strong global search capability, the initial coefficients of ELM can be optimized by GA and the accuracy of the model can be improved [37,38].
The coefficients to be optimized, which in this study are the initial input weights $\omega_{ij}$ and hidden layer bias $b_j$ of the ELM, are coded as an individual chromosome. The fitness F, which judges whether a code is a good solution, is calculated from the ELM errors $e_k = Y_k - \hat{Y}_k$ over the m output layer nodes.
The GA optimization process proceeds as follows:

• Step 1, selection. GA selection is based on fitness: the probability of selecting each of the N individuals is calculated from its fitness.

• Step 2, crossover. GA crossover of chromosome k and chromosome l at gene j combines the genes $\alpha_{kj}$ and $\alpha_{lj}$ using the cross coefficient β, which is a random number between 0 and 1.

• Step 3, evolution. GA evolution of $\alpha_{ij}$ is bounded by $\alpha_{\max}$ and $\alpha_{\min}$, the upper and lower thresholds of $\alpha_{ij}$, and depends on g and $G_{\max}$, the current and maximum number of GA evolutions, and on the evolution coefficient γ, which is between 0 and 1.

It is necessary to repeat the GA optimization until the maximum fitness is obtained. The individual chromosome code with the maximum fitness is the optimal solution. By decoding the optimal solution, the optimal initial input weights $\omega_{ij}$ and hidden layer bias $b_j$ can be obtained for the ELM.
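The three steps above can be sketched as a small real-coded GA. The exact formulas used in the paper are not reproduced here; the sketch assumes commonly used forms: fitness-proportional (roulette wheel) selection, arithmetic crossover weighted by β, and a bounded mutation whose size shrinks as g approaches G_max. The crossover and mutation probabilities, the toy usage and all function names are assumptions for illustration. In the paper's setting a chromosome would encode the ELM input weights and bias, and the fitness would be derived from the ELM error.

```python
import random
from typing import Callable, List


def select(pop: List[List[float]], fits: List[float], rng: random.Random) -> List[List[float]]:
    """Assumed roulette wheel selection: p_i = F_i / sum_j F_j (needs F_i > 0)."""
    total = sum(fits)
    probs = [f / total for f in fits]
    return [list(c) for c in rng.choices(pop, weights=probs, k=len(pop))]


def crossover(a: List[float], b: List[float], j: int, beta: float) -> None:
    """Assumed arithmetic crossover of two chromosomes at gene j (in place)."""
    a_j, b_j = a[j], b[j]
    a[j] = (1.0 - beta) * a_j + beta * b_j
    b[j] = (1.0 - beta) * b_j + beta * a_j


def mutate(gene: float, lo: float, hi: float, g: int, g_max: int,
           gamma: float, rng: random.Random) -> float:
    """Assumed evolution step: move the gene toward the upper or lower
    threshold by a fraction that decays as g approaches g_max."""
    scale = gamma * (1.0 - g / g_max) ** 2
    if rng.random() > 0.5:
        return gene + (hi - gene) * scale
    return gene - (gene - lo) * scale


def run_ga(fitness: Callable[[List[float]], float], n_genes: int, lo: float, hi: float,
           pop_size: int = 20, g_max: int = 50, beta: float = 0.5, gamma: float = 0.5,
           p_cross: float = 0.7, p_mut: float = 0.1, seed: int = 0) -> List[float]:
    """Return the chromosome with the highest fitness seen during the run."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(n_genes)] for _ in range(pop_size)]
    best = list(max(pop, key=fitness))
    for g in range(g_max):
        pop = select(pop, [fitness(c) for c in pop], rng)
        for k in range(0, pop_size - 1, 2):          # pair up neighbours
            if rng.random() < p_cross:
                crossover(pop[k], pop[k + 1], rng.randrange(n_genes), beta)
        for c in pop:
            if rng.random() < p_mut:
                j = rng.randrange(n_genes)
                c[j] = mutate(c[j], lo, hi, g, g_max, gamma, rng)
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):
            best = list(cand)
    return best
```

A toy call such as `run_ga(lambda c: 1.0 / (1.0 + sum(x * x for x in c)), n_genes=3, lo=-1.0, hi=1.0)` searches for genes near zero; the fitness must stay positive for the selection step as written.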

IGA Using Levy Flight
In the process of model testing, it was found that a fixed cross coefficient β in GA crossover can lead to mild crossover and excessive crossover at the later stage, which affected the practical application of GA-ELM. To solve this problem, the Levy flight algorithm is introduced to improve the GA. A Levy flight is a random walk strategy with a non-Gaussian distribution. It combines frequent short walks with occasional long walks, so it effectively balances the mild crossover and excessive crossover of GA [39,40]. In its update formula, $\beta_i^t$ is the current cross coefficient, τ is the random step size, ⊕ is the dot product, the step length δ = ρ |η| , where ρ and η have Gaussian distribution, and random dynamic
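A Levy flight update of the cross coefficient can be sketched as follows. The paper's exact update formula is not reproduced here; this sketch assumes Mantegna's widely used method for drawing Levy-distributed steps from two Gaussian variables, a hypothetical step scale `tau`, a hypothetical stability index `lam`, and clipping of β to [0, 1] because β is described as lying between 0 and 1.

```python
import math
import random


def levy_step(rng: random.Random, lam: float = 1.5) -> float:
    """Draw one Levy-distributed step with Mantegna's method (assumed here).

    Two Gaussian variables u and v are combined as u / |v|^(1/lam), which
    gives many short steps and occasional long ones.
    """
    sigma_u = (
        math.gamma(1.0 + lam) * math.sin(math.pi * lam / 2.0)
        / (math.gamma((1.0 + lam) / 2.0) * lam * 2.0 ** ((lam - 1.0) / 2.0))
    ) ** (1.0 / lam)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the (very unlikely) exact-zero draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / lam)


def update_beta(beta: float, rng: random.Random, tau: float = 0.01, lam: float = 1.5) -> float:
    """Move the cross coefficient by a scaled Levy step and clip it to [0, 1]."""
    new_beta = beta + tau * levy_step(rng, lam)
    return min(1.0, max(0.0, new_beta))
```

Calling `update_beta` once per generation would replace a fixed β with one that mostly changes slightly and occasionally jumps, which is the balancing behavior the text attributes to the Levy flight.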

SCADA Data Description
The data in this study come from Damianshan Wind Farm in Wanyuan City, Sichuan Province, China. The wind farm has a total of 33 1.5-MW WTs, with an annual power generation of 90.489 million kWh. The SCADA system in this wind farm records 26 parameters, as shown in Table 1, every 1 min. The output of the model should not only directly reflect the working state of the WT, but also have a great impact on maintenance. Among the various failures of a WT, the main bearing failure costs the most [18]. Since the main bearing temperature is closely related to the health of the main bearing, it is chosen as the output of the model in this paper.
The inputs should be directly related to the main bearing and the WT. They fall into three groups: (a) production parameters, such as active/reactive power; (b) parameters closely related to the main bearing temperature, such as gearbox front/rear bearing temperature; (c) environmental conditions. In this study, the model input contains eight parameters, namely active power, rotor speed, gearbox front bearing temperature, gearbox rear bearing temperature, nacelle ambient temperature, tower vibration, external temperature and wind speed.

WCC Results
The SCADA data from 1 May to 20 May are extracted to partition the wind conditions. In this paper, in order to be more consistent with the wind characteristics, the number of clusters is set from two to eight, and the silhouette value results and the three-dimensional visualization of WCC are shown in Figure 6.

From Figure 6a, it can be seen that when the number of clusters is five, the silhouette value reaches the maximum value of 0.78.
This result indicates that it is optimal to segment the wind into five conditions: Condition I (wind speed obvious increase), Condition II (wind speed slight increase), Condition III (wind speed stable), Condition IV (wind speed slight decrease) and Condition V (wind speed obvious decrease). In Figure 6b, five separate wind condition spaces can be clearly observed, which also explicitly shows the effects of wind conditions on WT temperature. Table 2 summarizes the WCC distribution to give a quantitative view of the clustering results. As Table 2 shows, the ranges of wind speed change differ across the five conditions, whereas the ranges of wind speed differ little. The wind conditions are therefore clearly partitioned according to wind speed change, and this parameter can be used for subsequent real-time wind condition recognition. It should be noted that the wind speed change ranges are open-ended: more than 1.17 m/s/min in Condition I and less than −0.66 m/s/min in Condition V.

WCC Performance Test
To verify the effectiveness of WCC using K-means clustering, two solutions are used for comparison: one with WCC based on the actual value of wind speed (0-3, 3-12 and >12 m/s) and one without WCC. All three solutions use the same algorithm, IGA-ELM. Typical data sets of wind speed increase and decrease are chosen, as shown in Table 3. As in the algorithm performance test, the WT works normally in these two periods. The residual results of the three solutions are shown in Figure 7, and the statistical indicators are shown in Table 4.
As shown in Figure 7, the proposed solution with WCC by K-means clustering performs much better than the two comparison solutions, which is confirmed by the statistical indicators in Table 4. During the wind speed increase, the amplitude of the residuals of the proposed solution is 0.32 °C, while those of the solution with WCC of actual value and the solution without WCC are 3.49 °C and 3.78 °C, respectively. With WCC of actual value or without WCC, the monitoring (residual) results can be misleading and can easily trigger false alarms. The same holds during the wind speed decrease, with 0.29 °C vs. 1.56 °C and 1.71 °C. This demonstrates that WCC by K-means clustering can improve the accuracy of the model and avoid false alarms.
Additionally, the solution with WCC of actual value and the solution without WCC perform worse during wind speed increase than during decrease. This occurs because the rate of wind speed change directly determines the delay between the wind speed change and the internal temperature change. As mentioned in Section 2, there is a certain delay between wind speed change and internal temperature change, which means that the wind condition determines the difference between the predicted value (based only on the working state, without wind speed change) and the actual value of the internal temperature. In these two periods, during the wind speed increase, 54% of the data (from 9:42 to 10:36) are in Condition I (wind speed obvious increase) and the rest are in Condition II (wind speed slight increase); during the wind speed decrease, only 18.3% of the data (from 14:58 to 15:31) are in Condition V (wind speed obvious decrease) and the rest are in Condition IV (wind speed slight decrease). This difference in wind condition distribution makes the absolute values of the residuals of the two comparison solutions generally larger during the wind speed increase than during the decrease, which also reflects the effect of wind conditions on WT temperature monitoring.

IGA-ELM Performance Test
To build and test the IGA-ELM model, the learning set and test set are chosen as shown in Table 5. To ensure model accuracy, the learning set should cover as many working conditions and states as possible, excluding failures. Similarly, the test set should also contain a variety of working states and wind conditions without failures. In this test, GA-ELM, the original ELM and a back propagation neural network (BPNN) are used for comparison. The comparison models are built in the same way as the proposed model, with WCC. The only difference between the proposed model and the comparison models is the intelligent algorithm. The residuals of IGA-ELM, GA-ELM, ELM and BPNN are shown in Figure 8. It should be noted that the outputs shown are the residuals of the main bearing temperature, that is, the actual value minus the predicted value. In addition, to quantitatively compare the performance of the tested models, the mean square error ($\mathrm{MSE} = \frac{1}{s}\sum_{k=1}^{s} e_k^2$), mean absolute error ($\mathrm{MAE} = \frac{1}{s}\sum_{k=1}^{s} |e_k|$) and mean absolute percentage error ($\mathrm{MAPE} = \frac{1}{s}\sum_{k=1}^{s} \left|\frac{e_k}{Y_k}\right|$), where s is the number of samples, are used to analyze the residuals. Statistical indicators of the residuals are shown in Table 6.
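The three indicators can be computed directly from the actual and predicted values. The sketch below is a plain Python illustration with hypothetical function names; it assumes MAPE is reported as a fraction of the actual value (multiply by 100 for a percentage) and that no actual value is zero.

```python
from typing import List, Sequence


def errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residuals e_k: actual value minus predicted value."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error of the residuals."""
    e = errors(actual, predicted)
    return sum(x * x for x in e) / len(e)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error of the residuals."""
    e = errors(actual, predicted)
    return sum(abs(x) for x in e) / len(e)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (assumes nonzero actual values)."""
    e = errors(actual, predicted)
    return sum(abs(x / y) for x, y in zip(e, actual)) / len(e)
```

For example, with actual temperatures `[50.0, 40.0]` and predictions `[49.0, 42.0]`, the residuals are 1 and −2, so the MSE is 2.5, the MAE is 1.5 and the MAPE is 0.035.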

As shown in Figure 8, the GA-ELM and IGA-ELM models perform better than the ELM and BPNN models, demonstrating that the GA optimization is effective. Figure 8 also shows that the residuals of the IGA-ELM model are generally smaller than those of the GA-ELM model. Consistently, in Table 6, the IGA-ELM achieves a smaller MSE, MAE and MAPE. This demonstrates that the parameter optimization can improve the accuracy of the model.

Main Bearing Failure Detection
To verify the failure detection ability of the proposed solution, a serious main bearing offset that occurred in the wind farm, shown in Figure 9, is used as a failure case.
The data set covering the 5 h before the failure is shown in Table 7. In this test, IGA-ELM without WCC, GA-ELM with WCC and ELM with WCC are used for comparison. The residual results are shown in Figure 10.
Comparing the two IGA-ELM solutions in Figure 10, it can be seen that the solution without WCC generally lags behind the solution with WCC by more than 60 min, reaching about 90 min at a residual of 2 °C. In particular, from 9:32 (time point 232) to 10:16 (time point 276), during Condition I (wind speed obvious increase), the residual of the solution without WCC shows a downward trend over a short time. Additionally, consistent with the conclusions of the WCC performance test, the solution with WCC performs stably during wind condition change, and its residuals are generally less than 0.5 °C. However, for the solution without WCC, the residuals under a normal working state can exceed 1 °C because of wind condition change, sometimes even reaching 4 °C. Thus, the safe range of the solution with WCC is narrower than that of the solution without WCC.
If the safe range with WCC is set to ±1 °C and that without WCC is set to ±2 °C, the alarm with WCC can be about 120 min earlier than without WCC. If the safe range with WCC is set to ±0.5 °C and that without WCC is set to ±4 °C, the alarm with WCC can be more than 180 min earlier than without WCC. These results demonstrate that, with WCC, monitoring can achieve earlier failure alarms.
Comparing the three solutions with WCC in Figure 10, for residuals larger than 0.5 °C, the IGA-ELM solution is on average 18 and 32 min earlier than the GA-ELM and ELM solutions, respectively. Especially in the early stages of failure, at a residual of 2 °C, the IGA-ELM solution is about 37 and 64 min earlier than the GA-ELM and ELM solutions, respectively; at a residual of 1 °C, it is about 45 and 103 min earlier, respectively. These results demonstrate that the IGA optimization can reduce the effects of wind conditions on the monitoring and obtain earlier failure alarms.
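The alarm-time comparisons above can be sketched as follows. The sketch assumes that an alarm is raised at the first sample whose absolute residual exceeds the safe range, and that the sampling interval is 1 min (consistent with time points 232 and 276 corresponding to 9:32 and 10:16). The function names and the residual values are hypothetical and are not the data of Figure 10.

```python
def first_alarm_index(residuals, safe_range):
    """Index of the first sample with |residual| > safe_range, or None."""
    for i, r in enumerate(residuals):
        if abs(r) > safe_range:
            return i
    return None


def alarm_lead_minutes(res_a, range_a, res_b, range_b, minutes_per_sample=1.0):
    """Minutes by which solution A alarms earlier than solution B.

    Returns None if either solution never leaves its safe range.
    A negative value means solution A alarms later than solution B.
    """
    ia = first_alarm_index(res_a, range_a)
    ib = first_alarm_index(res_b, range_b)
    if ia is None or ib is None:
        return None
    return (ib - ia) * minutes_per_sample


# Hypothetical residual series in °C, one sample per minute (assumption).
with_wcc = [0.1, 0.2, 0.6, 1.2, 2.5]
without_wcc = [0.3, -0.8, 0.9, 1.5, 2.4]

# Narrower safe range with WCC (±1 °C) than without WCC (±2 °C).
lead = alarm_lead_minutes(with_wcc, 1.0, without_wcc, 2.0)
print(f"alarm with WCC is {lead} min earlier")
```

Because the solution with WCC tolerates a narrower safe range under normal operation, its threshold is crossed earlier for the same underlying fault, which is the effect the lead times above quantify.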

Conclusions
WTs are exposed to harsh conditions all year round, and the variability of wind conditions can affect WT monitoring directly. Data show that, in the same working state, the main bearing temperature can differ by more than 5 °C under different wind conditions. In addition, the data-driven model may fall into a local minimum due to the irregularity of wind condition change and the randomness of the initial weights and biases, which can also affect monitoring accuracy.
In this paper, a novel WT monitoring solution based on WCC and IGA-ELM is proposed to solve these problems. On the one hand, considering the effects of wind conditions on WT temperature monitoring, WCC using K-means clustering partitions wind data into several condition clusters, so that the monitored data better match their corresponding models. On the other hand, ELM is optimized by IGA to counter the irregularity of wind condition change and the randomness of the initial weights and biases, which can improve the accuracy of the model. Test cases in different wind conditions show that, compared to the solution without WCC or IGA optimization, the proposed solution can reduce false alarms when the WT is in a normal working state. A main bearing failure case shows that, with WCC, the alarm can be advanced by at least 60 min. All the cases demonstrate that the proposed solution with WCC and IGA optimization can improve the accuracy of WT temperature monitoring, thereby reducing operation and maintenance costs.
It should be noted that this study focuses on wind speed and its change to describe wind conditions. However, wind conditions also include other factors, such as air humidity and air pressure. In future research, these factors should also be considered to make the WT monitoring more accurate.