Usage of Neural-Based Predictive Modeling and IIoT in Wind Energy Applications

The adoption of wind energy has grown significantly in recent years. New, cost-effective technologies have been developed, led by customer awareness of green technologies and a legal framework proposed at the European Union level. The stochastic nature of wind speed is transferred to wind turbine output, making wind energy difficult to predict. The main scope of predicting wind energy production is to be proactive in balancing and reserving energy to meet demand. When the prediction identifies a potential gap between supply and demand, additional energy from other sources must be generated and supplied. Creating a synergy of physical devices through advanced sensing capabilities, software, storage and analytics capabilities, the Industrial Internet of Things is enabling the effective transition to wind energy through automation by removing many of the disadvantages in a way that has recently become accessible. This research focuses on the data analytics, proposing a fast univariate network-based approach for wind energy prediction, using Feed Forward Neural Networks, Recurrent Neural Networks, Long-Short Term Memory, Gated Recurrent Unit, and Convolutional Neural Networks. Moreover, by introducing the theoretical fundamentals, the implementation method and the hyperparameters of the final models, this article becomes unique in the context of wind energy. At the time of this study, no prior research studies have presented a direct comparison between feedforward, recurrent, and convolutional neural networks ‒ these being the most important in the field of supervised learning.


Introduction
The renewable energy sector has experienced considerable global growth in the last few years. This growth comes with major challenges in terms of asset management. Wind turbine site selection can lead to geographically dispersed layouts, which makes the management of these installations a challenging task. By adopting an Internet of Things (IoT) approach, geographical dispersion-related problems can be overcome by enabling remote descriptive, diagnostic, and predictive analytics to minimize operational costs by maximizing production and preventing unplanned, costly downtimes. Wind energy predictions are among the core data needed for real-time control of power systems: approaching the real-time ideal means delivering fast predictions, in sync with the dynamics of the entire system. The higher the prediction accuracy, the higher the efficiency of the system, resulting in savings for all investors. IoT can enable companies across the entire energy supply chain to achieve their targets. The continuously increasing adoption of these renewable technologies has led to a situation that was not previously possible: small installations built by businesses or homeowners. These installations were initially built to cover daily basic energy consumption, but they can also send energy into the power grid. This reinforces the dispersed character of current power systems and creates a new challenge, in terms of optimal grid management. However, data from smart energy meters has the advantage of driving improvements for consumers, for example, by identifying waste, such as power-hungry devices or automated heating, ventilation, and air conditioning systems. The growth of wind power systems is driven by European Commission and government legislation. For the next 10 years, through the European Green Deal, there are precise targets for the minimum percentage of renewable energy to be used and the reduction of greenhouse gas emissions (European Commission, 2020).
Even though most renewable energy resources were identified decades ago, they have not been able to replace fossil fuel-based sources because of their intermittent and variable availability. The solution for this has been to gradually add them to existing power grids, which has been possible due to the development of smart grids that include features such as power consumption and output power prediction. In this way, a feedback loop between customer and supplier is created. Then, the gap between demand and supply can be covered using fossil fuel-based sources, overcoming the negative impact of availability. The successful integration of wind-based technologies in the existing electricity grid depends on the accuracy of wind prediction. Short-term forecasting plays an important role, for both operational and energy trading activities. The IoT, by definition, covers every piece of technology with the capability to communicate with other devices, systems and networks (Ashton, 2009). Industry professionals and academics split the IoT into two branches: Industrial Internet of Things (IIoT) and Customer Internet of Things (CIoT) (Al-Ali, 2016). IIoT is represented by smart grids, factories, cars, and machines, while CIoT is oriented to the customer and their devices, such as smart home devices, connected cars and wearables. IIoT and CIoT are connected, enabling information transfer between them. This research aims to provide an overview of the use of predictive analytics based on neural networks and IIoT in the wind energy industry. The focus is on analytics, by identifying the theoretical and practical aspects of using network-based algorithms for rapid, short-term univariate predictions of wind energy production, using data from wind turbines based in Romania for the case study. Finally, the results of the best performing models belonging to each of the five selected typologies are compared, both in terms of generalization capacity and training time.

Literature review
Methods for wind energy prediction can be grouped depending on the timescale (short-term, medium-term, long-term), model type (physical, statistical, machine learning, hybrid) and the variety of parameters (univariate, multivariate). Physical models are built on the exogenous variables that influence energy production. However, being deterministic, they are dependent on the location and physical properties of the environment in which the wind turbines are located, making them less versatile than other models. Another important aspect, common to all prediction methods, is that the output energy is either predicted directly or determined indirectly: wind speed is predicted first, and the energy is then derived analytically from power curves. Most literature uses the indirect method, as described below. An error correction model, based on a bidirectional gated recurrent unit neural network, is proposed to correct the error of the numerical weather prediction of wind speed (Ding, et al., 2019). The results outperformed the selected benchmark models for short-term power prediction and the same approach could also be used for medium and long-term predictions. Using the same univariate machine learning approach, Long-Short Term Memory (LSTM) and One-dimensional Convolutional Neural Networks (1D-CNN) can be implemented (Fukuoka, et al., 2018). In this study, both LSTM and 1D-CNN perform better than the Feed Forward Neural Networks (FFNN). Another way to extract meaningful information from a time series is the Empirical Wavelet Transform (Wang and Hu, 2015); the GPR (Gaussian Process Regression) model then combines, in a nonlinear way, the predictions generated by other models like ARIMA (Autoregressive Integrated Moving Average), ELM (Extreme Learning Machine) and SVM (Support Vector Machine). This method is more accurate than the standalone models for predicting short-term wind speed at two sites.
An example of ARIMA implementation on real operational data illustrates the improvement in reducing energy buffers, resulting in a cost reduction by accurately predicting wind speed (Eldali, et al., 2016). A similar outcome can be achieved by directly predicting wind power (Pant and Garg, 2016). The model performance can also be improved by dividing the year into months and building separate models for each of them (Chen and Lai, 2011). The comparison of ARIMA and FFNN reveals that for each month, and for one hour, two hours, three hours, and four hours ahead, FFNN outperforms ARIMA. Under the direct wind energy prediction, a hybrid approach has been tried that consists of using a non-linear model for the non-linear component of the time series, and a statistical model for the linear component. An example of this is using ARIMA for the linear and RBFNN (Radial Basis Function Neural Network) for the non-linear component. For large data similarity and a high-density time series, a preprocessing step for extracting the change trend information can be used (Liu, Ding and Jia, 2020). A K-means clustering method is proposed for obtaining a new time series that compresses the data, facilitates storage and utilization, and eliminates noise. The resulting time series is then used as input for ARIMA, SVM, GPR, ESN (Echo State Network), GRU (Gated Recurrent Unit), A-RNN (Attention Recurrent Neural Network), Input-Attn-RNN (Input Attention Recurrent Neural Network) and DA-RNN (Dual-stage Attention based Recurrent Neural Network). Except for SVM, the results from all of the selected models were similar, proving the versatility of the network-based models.
Machine learning-based approaches enable researchers to study wind energy production without having much industry experience. These flexible and highly scalable models outperform existing models in fast univariate prediction tasks. Currently, there is a gap in direct univariate wind energy prediction. Furthermore, a comprehensive analysis of the most important network-based models is necessary, specifically to align with the challenges faced by the wind energy industry. IIoT technologies enable the implementation of these machine learning-based solutions in two ways: first, operational data is readily available for consumption; second, the results of predictive analyses can be propagated backwards for real-time optimization of energy production.

Feed Forward Neural Networks
The artificial neural network (ANN) concept was first introduced by McCulloch and Pitts (1943). Inspired by the human brain, ANN aims to replicate the way information flows between neurons. However, only after backpropagation was introduced (Rumelhart, Hinton and Williams, 1986) did ANN start to demonstrate its capabilities. By implementing backpropagation, the synaptic weights between neurons are updated according to the expected result. The value of the synaptic weight provides information about how important the inputs are to achieve the maximum possible accuracy. FFNN represents an instance of the ANN in which the information is processed while passing forward through the network, traveling from the input layer through hidden layers and, finally, the output layer (figure no. 1). Each layer consists of neurons, representing the computational units of the network.

Figure no. 1. Feed Forward Neural Network high level architecture
The neurons only communicate with the outside world if they are in the input or output layer. The neurons of the hidden layers receive inputs from the neurons upstream and send the information downstream, either to another hidden layer or to the output layer. Each neuron input ($x_i$) has a synaptic weight ($w_i$), which reflects the impact of that input on the output ($y_i$). The output of a neuron can be calculated as a weighted sum of the inputs (equation 2) on which the activation function $f$ is applied (equation 1).
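The neuron model described above can be written compactly. The following is a sketch in LaTeX of the standard form that matches the description, not a transcription of the article's own equations; the number of inputs $n$, the intermediate symbol $z$ and the bias term $b$ are assumptions introduced here for illustration.

```latex
y = f(z)
\qquad\text{with}\qquad
z = \sum_{i=1}^{n} w_i x_i + b
```

Here $z$ is the weighted sum of the inputs and $y$ is the neuron output after the activation function $f$ is applied.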

Using the activation function introduces non-linearity into the neuron's output. The neuron fires if the output value is greater than a given threshold, and is inhibited if the output is smaller than the threshold (equation 3). For the ReLU (rectified linear unit) activation function:
$$f(x) = \max(0, x)$$
Backpropagation means the information travels back from the output layer to the input layer. The model's error is used to update the network parameters, with respect to the objective function (i.e., error minimization). The way network parameters are updated is governed by the optimization algorithm (Kingma and Ba, 2015).
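The forward pass and the parameter update described above can be illustrated with a single neuron. The sketch below is a minimal example under stated assumptions: it uses plain gradient descent on a squared error rather than the Adam optimizer cited in the text, and the inputs, weights, bias, target and learning rate are hypothetical values chosen only for illustration.

```python
def relu(z):
    """Rectified linear unit: passes positive values, blocks the rest."""
    return max(0.0, z)


def neuron_forward(x, w, b):
    """Weighted sum of the inputs followed by the activation function."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z, relu(z)


def gradient_step(x, w, b, target, lr=0.1):
    """One plain gradient-descent update of w and b on the squared error."""
    z, y = neuron_forward(x, w, b)
    # Loss = 0.5 * (y - target)^2; the ReLU derivative is 1 if z > 0, else 0.
    grad_z = (y - target) * (1.0 if z > 0 else 0.0)
    new_w = [wi - lr * grad_z * xi for wi, xi in zip(w, x)]
    new_b = b - lr * grad_z
    return new_w, new_b


# Hypothetical inputs and parameters.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, 0.1]
b = 0.05

_, y_before = neuron_forward(x, w, b)
w, b = gradient_step(x, w, b, target=1.0)
_, y_after = neuron_forward(x, w, b)
```

After one update the output moves closer to the target, which is the effect backpropagation aims for across all the weights of a full network.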

Recurrent Neural Networks
A Recurrent Neural Network (RNN) is obtained when feedback connections are added to the network; such networks perform well on sequential data. The applications in the time series field are important since the prediction of a time step may depend on multiple steps backwards. The RNN can be defined using the following equations (Pascanu, et al., 2014):
$$h_t = f_h(x_t U + h_{t-1} W)$$
$$y_t = f_o(h_t V)$$
Where $x_t$ = input vector; $h_t$ = hidden state; $h_{t-1}$ = previous step hidden state; $y_t$ = output vector; $W$, $U$, $V$ = parameter matrices and $f_h$, $f_o$ = activation functions. The input $x_t$ and the previous hidden state $h_{t-1}$ are concatenated. The newly created vector contains information of both the current input and the previous state. This vector is passed through a tanh activation function, resulting in the output of the current state. The tanh layer regulates the output by fitting the values between -1 and 1 (figure no. 2).
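The recurrence described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the article's Keras implementation. It assumes the update h_t = tanh(x_t U + h_{t-1} W + b) with a row-vector convention, a linear output layer, and bias terms `b` and `c` that are not named in the text. All parameter values are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, U, W, b):
    """One recurrent step: h_t = tanh(x_t U + h_prev W + b)."""
    n_hidden = len(b)
    h_t = []
    for j in range(n_hidden):
        z = b[j]
        z += sum(x_t[i] * U[i][j] for i in range(len(x_t)))
        z += sum(h_prev[k] * W[k][j] for k in range(n_hidden))
        h_t.append(math.tanh(z))
    return h_t


def rnn_forward(series, U, W, b, V, c):
    """Run the recurrence over a univariate series; return the last output and state."""
    h = [0.0] * len(b)
    for value in series:
        h = rnn_step([value], h, U, W, b)
    # Linear output layer (an assumption): y_t = h_t V + c
    y = sum(h[k] * V[k] for k in range(len(h))) + c
    return y, h


# Hypothetical parameters: 1 input feature, 2 hidden units.
U = [[0.5, -0.3]]
W = [[0.1, 0.2], [-0.2, 0.1]]
b = [0.0, 0.1]
V = [1.0, -1.0]

y, h_last = rnn_forward([0.2, 0.4, 0.6], U, W, b, V, c=0.0)
```

Multiplying the input by `U` and the previous state by `W` and adding the results is equivalent to concatenating the two vectors and applying one stacked weight matrix.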

Figure no. 2. Recurrent Neural Network high level architecture
While FFNN uses backpropagation for training, RNN uses backpropagation through time (BPTT) (Rumelhart, Hinton and Williams, 1986). BPTT is suitable for network applications where model parameters are updated in discrete time steps. The design of the RNN is vulnerable to exploding or vanishing gradient issues (Bengio, Simard and Frasconi, 1994).
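The vanishing gradient issue mentioned above can be made concrete with a scalar recurrent unit. The sketch below is a toy illustration under stated assumptions: a single hidden unit h_t = tanh(w * h_{t-1} + x) with a hypothetical recurrent weight `w` and a constant input `x`. It multiplies the per-step derivatives that BPTT chains together.

```python
import math


def gradient_through_time(w, steps, x=0.5, h0=0.0):
    """Return |d h_T / d h_0| for a scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)


short_path = gradient_through_time(w=0.5, steps=5)
long_path = gradient_through_time(w=0.5, steps=50)
```

With a recurrent weight below one in magnitude, every factor in the product is smaller than one, so the contribution of distant time steps shrinks rapidly as the sequence gets longer.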

Long-Short Term Memory
Long-Short Term Memory design (LSTM) (Hochreiter and Schmidhuber, 1997) retains the capabilities of the RNN to work with time series data, but it reduces information morphing and the issues related to vanishing and exploding gradients. The novelty of LSTM (figure no. 3) lies in a gating system that can manage the way information flows: the internal gating system ensures that input information can be kept or forgotten, according to its significance to the problem at hand.

Figure no. 3. Long-short term memory high level architecture
According to the theoretical fundamentals, relevant information from the earlier steps is stored in the cell state, reducing the negative impact of information morphing. A network's gating system consists of three gates: the input gate, the forget gate and the output gate. These three gates are essentially neural networks. Just as the tanh activation function squashes the results between -1 and 1, the sigmoid activation function performs the same task, but between 0 and 1. The forget gate decides which information of the previous cell state is to be kept or discarded (equation 6):
$$f_t = \sigma(x_t U_f + h_{t-1} W_f + b_f)$$
Where $U_f$ and $W_f$ are the weights of the current state input and previous cell output, with respect to the forget gate, and $b_f$ is the corresponding bias. The input gate consists of two mathematical layers. The first layer decides the new information that will be stored in the cell state (equation 7):
$$i_t = \sigma(x_t U_i + h_{t-1} W_i + b_i)$$
This layer acts in a similar way to the forget gate: current state input and previous state cell output are passed through a sigmoid function. The differentiation is made by considering its own bias and weights for the current input and previous state cell output, $b_i$, $U_i$ and $W_i$, respectively.

The second layer of the input gate takes the same current input and previous cell output and passes them through a tanh activation function (equation 8). This time, the bias and the weight matrices are specific to this second layer. The new candidate for the cell state is calculated as:
$$\tilde{C}_t = \tanh(x_t U_g + h_{t-1} W_g + b_c)$$
The new cell state combines the previous cell state, filtered by the forget gate, with the candidate, filtered by the input gate ($\odot$ denotes element-wise multiplication):
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Through a tanh activation function, the newly calculated cell state $C_t$ is regulated and further multiplied with the output gate result, where the output gate has its own weights and bias:
$$o_t = \sigma(x_t U_o + h_{t-1} W_o + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$
Now the current cell state $C_t$, known as long-term memory, and the hidden state $h_t$, known as short-term memory, are computed. The logic described above is repeated for all the new time steps considered. The output of each time step is obtained using the short-term memory.
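The gating logic described above can be traced step by step in a scalar sketch. This is a minimal illustration of the standard LSTM update, not the article's Keras model. It assumes a single input feature and a single hidden unit. The dictionary keys `'f'`, `'i'`, `'g'`, `'o'` and all parameter values are hypothetical.

```python
import math


def sigmoid(z):
    """Squash a value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step for scalar input and state.

    p maps each gate name ('f', 'i', 'g', 'o') to a (U, W, b) triple.
    """
    def pre(gate):
        U, W, b = p[gate]
        return x_t * U + h_prev * W + b

    f_t = sigmoid(pre("f"))              # forget gate
    i_t = sigmoid(pre("i"))              # input gate, first layer
    c_cand = math.tanh(pre("g"))         # candidate cell state, second layer
    c_t = f_t * c_prev + i_t * c_cand    # new cell state (long-term memory)
    o_t = sigmoid(pre("o"))              # output gate
    h_t = o_t * math.tanh(c_t)           # new hidden state (short-term memory)
    return h_t, c_t


# Hypothetical parameters, chosen only for illustration.
params = {
    "f": (0.5, 0.1, 0.0),
    "i": (0.6, -0.2, 0.0),
    "g": (0.9, 0.3, 0.0),
    "o": (0.4, 0.2, 0.0),
}

h, c = 0.0, 0.0
for value in [0.2, 0.4, 0.6]:
    h, c = lstm_step(value, h, c, params)
```

The same function is applied at every time step, carrying `h` and `c` forward, which mirrors the repetition over time steps described in the text.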

Gated Recurrent Unit
GRU is a relatively new design, which has been gaining popularity since its inception. As with the LSTM, this design aims to reduce the issues related to long-term dependencies and vanishing or exploding gradients. GRU logic is similar to that implemented in the LSTM, in that the information flow is regulated by a gating system (figure no. 4).

Vol. 23 • No. 57 • May 2021 419
However, the gating system of GRU consists of only two gates: an update gate and a reset gate. Another difference is the cell state, which is not part of GRU. The functionality of both the reset gate and the update gate is governed by sigmoid activation functions. In this way, only relevant data is kept. The reset gate has a similar functionality to LSTM's forget gate, deciding what information from the previous hidden state is to be discarded.
$$r_t = \sigma(x_t U_r + h_{t-1} W_r + b_r)$$
The update gate works in the same way as LSTM's input gate, filtering the information coming from the previous state and the information of the current input, selecting the new information to be added.
$$z_t = \sigma(x_t U_z + h_{t-1} W_z + b_z)$$
Where the subscripts r (equation 12) and z (equation 13) indicate that the weights belong to the reset gate and the update gate, respectively. If the values of the update gate are close to one, then the information of the old state is kept, while the current state input is ignored. The reset gate ensures the short-term dependencies are captured, while the update gate does the same but for long-term dependencies. The new hidden state candidate is governed by the following equation, in which the subscript h marks the weights and bias of the candidate layer and $\odot$ denotes element-wise multiplication:
$$\tilde{h}_t = \tanh(x_t U_h + (r_t \odot h_{t-1}) W_h + b_h)$$
After assimilating the effect of the reset gate into the new hidden state candidate, the impact of the update gate output is incorporated into the current hidden state:
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$
In this way, GRU manages to deal with short and long-term dependencies and gradient-related issues with fewer calculations, being less computationally expensive than the LSTM.
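The two-gate update described above can be sketched for a scalar state. This is a minimal illustration rather than the article's implementation. It assumes the convention stated in the text, where an update gate value close to one keeps the old state. The candidate-layer parameters (key `'h'`) and all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    """Squash a value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x_t, h_prev, p):
    """One GRU step for scalar input and state; p maps 'r', 'z', 'h' to (U, W, b)."""
    U_r, W_r, b_r = p["r"]
    U_z, W_z, b_z = p["z"]
    U_h, W_h, b_h = p["h"]
    r_t = sigmoid(x_t * U_r + h_prev * W_r + b_r)   # reset gate
    z_t = sigmoid(x_t * U_z + h_prev * W_z + b_z)   # update gate
    # Candidate state: the reset gate scales the previous state's contribution.
    h_cand = math.tanh(x_t * U_h + (r_t * h_prev) * W_h + b_h)
    # z_t close to 1 keeps the old state; z_t close to 0 adopts the candidate.
    return z_t * h_prev + (1.0 - z_t) * h_cand


# Hypothetical parameters: a large update-gate bias forces z_t close to 1.
keep_old = {"r": (0.0, 0.0, 0.0), "z": (0.0, 0.0, 50.0), "h": (1.0, 1.0, 0.0)}
take_new = {"r": (0.0, 0.0, 0.0), "z": (0.0, 0.0, -50.0), "h": (1.0, 1.0, 0.0)}

h_kept = gru_step(1.0, 0.3, keep_old)
h_replaced = gru_step(1.0, 0.3, take_new)
```

Compared with the LSTM, there is no separate cell state and one gate fewer, which is where the lower computational cost comes from.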

Convolutional Neural Networks
Convolutional Neural Networks (CNN) represent a branch of neural-based models invented for computer vision tasks, initially for handwritten digit recognition (Le Cun, et al., 1990). Modern CNNs are also applied to one-dimensional sequential data (1D CNN) in applications such as time series, text, or audio analysis (Zhang, et al., 2020). CNNs might result in more computationally efficient architectures due to the ease of parallelizing computation across graphics processing unit (GPU) cores and the smaller number of parameters, compared to fully-connected architectures.
A typical 1D CNN configuration for a time series prediction problem consists of: input data, a convolutional layer, a pooling layer, a concatenation layer, a dense layer and an output layer (figure no. 5). The convolutional layer processes the input data and learns to extract the features appropriate for the regression made by the dense layer (Abdeljaber, et al., 2017).

Figure no. 5. One-dimensional convolutional neural network (1D CNN) high level architecture
The forward propagation in a 1D CNN layer uses the following equation (Kiranyaz, et al., 2020):
$$x_k^l = b_k^l + \sum_{i} \mathrm{conv1D}(w_{ik}^{l-1}, s_i^{l-1})$$
Where $b_k^l$ = bias of the k-th neuron at layer l; $w_{ik}^{l-1}$ = kernel from the i-th neuron at layer l-1 to the k-th neuron at layer l; $s_i^{l-1}$ = output of the i-th neuron at layer l-1; $x_k^l$ = input. By passing the input $x_k^l$ through the activation function, the intermediate output $y_k^l$ is obtained.
As a step-by-step approach, first a kernel must be selected, which is then translated along the time series, one step at a time. For each step, the dot product between the kernel and the segment of the time series it currently overlaps is calculated. The convolution is represented by the resulting sequence of dot products between the kernel and time series (End to End Machine Learning School, 2020). The pooling layer reduces the number of trainable parameters while retaining the relevant information. Flattening the pooling layer output results in a one-dimensional array, which has the right shape for use in the fully connected layer, which is essentially a feed forward neural network.
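The sliding dot product and the pooling step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: no padding, a stride of one, and max pooling over non-overlapping windows (the text does not state which pooling variant is used). The kernel and series values are hypothetical. As in most deep learning libraries, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
def conv1d(series, kernel, bias=0.0):
    """Slide the kernel along the series one step at a time (no padding)."""
    k = len(kernel)
    return [
        sum(kernel[j] * series[i + j] for j in range(k)) + bias
        for i in range(len(series) - k + 1)
    ]


def max_pool1d(values, size=2):
    """Keep the maximum of each non-overlapping window of the given size."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
kernel = [-1.0, 1.0]                       # hypothetical kernel: first difference
features = conv1d(series, kernel)          # [1.0, 2.0, 3.0, 4.0, 5.0]
pooled = max_pool1d(features, size=2)      # [2.0, 4.0]
```

In a trained network the kernel values are learned rather than chosen by hand, and the pooled output is what gets flattened and passed to the dense layer.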

Methodology
Between 2005 and 2017 the amount of wind energy produced by the European Union increased by 414%. Romania is one country which has reached the European Commission target for renewable energy for 2020 (European Court of Auditors, 2019), having a total wind power capacity of 3040 MW (Sava, 2020). In 2018, 42% of Romania's energy consumption was represented by renewable energy, while the average in the European Union is 32%. Of this, 15% of Romania's renewable energy is created by wind energy (Botea, 2020). This impressive increase has resulted in growing complexity of the tasks needed to ensure optimal and safe operation of wind energy, adding uncertainty to the power systems. The uncertainty and volatility have a significant impact, since the contribution of wind power to the total power is increasing. Wind turbine operators need to deal with these previously mentioned challenges, without impacting the grid or creating disruptions to the customers. IoT technologies can create mechanisms for automatic monitoring and controlling systems, enabling the operator to implement strategies with respect to these challenges. Ultimately, the wind energy prediction results are used to support production, in order to deliver energy without suffering gaps in supply.
A high-level view of a typical IIoT architecture consists of three major layers: the connect layer, the acquisition layer and the analytics layer (figure no. 6).

Figure no. 6. High-level overview of the proposed IIoT architecture
The connections of these layers are bidirectional (Krishna, et al., 2018) and compliant with cyber security protocols. The connect layer consists of physical devices, such as sensors for monitoring and actuators for control. Besides recording data, these devices must be able to connect to the IT infrastructure for sending and receiving data. Additionally, data standardization and transformation can be performed. The connection to the internet is made through an IoT gateway. This device has greater computational capabilities than the other devices on the connect layer and is capable of aggregating data from multiple sensors and sending it to the cloud. As part of the acquisition layer, cloud-hosted servers store the data received from IoT gateways. Cloud storage has advantages, such as scalability, usability, accessibility, security, cost-efficiency, and automation (Singh, 2020). In the analytics layer, the real-world operational data is transformed into actionable insights for managing and improving business operations. The analytics layer can contain various types of data analytics (Hanski, et al., 2018): this study focuses on descriptive and predictive analytics.
On top of the analytics layer, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is implemented (IBM, 2019). This robust methodology gives a structured approach for data mining projects. The step-by-step guidance describes the main tasks to be completed during each phase and the interactions between these phases. CRISP-DM is essential in real-world business cases and consists of the following six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The first phase, business understanding, has been covered extensively in the previous sections. The second phase of CRISP-DM is represented by data understanding. The data was made available by Open Power System Data (Neon Neue Energieökonomik, 2020).
The wind energy production time series is aggregated by hour and consists of 15336 data records between 01/01/2019 and 30/09/2020 (table no. 1). Open Power System Data is a platform for data gathered from the European power system, with freely available data for researchers.
Amfiteatru Economic
This mean does not take into account if the error value is negative or positive.

Where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value. Training time assesses the resources spent on training and whether the model training can be completed between consecutive timestamps. The sixth phase of CRISP-DM, deployment, has not been covered in this research. The entire experiment happened offline.
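The two evaluation metrics used in this study, the mean absolute error (MAE) and the coefficient of determination (R²), can be computed as in the following sketch. It uses the standard textbook definitions, which are assumed here to match the ones used in the article. The sample values are hypothetical and serve only to show the calculation.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences, ignoring the sign of each error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 150.0, 200.0, 250.0]       # hypothetical hourly production, MW
predicted = [110.0, 140.0, 210.0, 240.0]    # hypothetical model output, MW

mae = mean_absolute_error(actual, predicted)   # 10.0
r2 = r_squared(actual, predicted)
```

MAE is expressed in the unit of the series (MW here), whereas R² is dimensionless, which is why the two metrics give complementary views of model quality.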
All the proposed algorithms were implemented in Python, using Keras (Chollet, 2017) with TensorFlow on the backend. The hardware setup consisted of a Dell Precision 7350 equipped with a Nvidia Quadro P2000 GPU, 32 GB RAM, Intel Core i5-8400H @ 2.5 GHz CPU, Windows 10, and Python 3.6.10. The learning process used a supervised paradigm. Pairs of input-output training data subset records were fed into the models during the training process. The models then learned to generalize based on this training data. Testing happened using unseen data, i.e., a test data subset that was not part of the training. During the testing phase, the performance of the models was assessed (Fawcett and Provost, 2013).
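The supervised setup described above requires turning the univariate series into input-output pairs and separating a test subset. The sketch below shows one common way to do this, under stated assumptions: the window length `n_lags`, the 80/20 ratio and the chronological (unshuffled) split are hypothetical choices, since the text does not specify them.

```python
def make_windows(series, n_lags):
    """Turn a univariate series into (input window, next value) pairs."""
    inputs, targets = [], []
    for i in range(len(series) - n_lags):
        inputs.append(series[i:i + n_lags])
        targets.append(series[i + n_lags])
    return inputs, targets


def chronological_split(inputs, targets, train_ratio=0.8):
    """Split without shuffling so the test subset lies after the training subset."""
    cut = int(len(inputs) * train_ratio)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


series = [float(v) for v in range(10)]      # stand-in for hourly production values
X, y = make_windows(series, n_lags=3)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y, train_ratio=0.8)
```

Keeping the split chronological ensures that the test windows contain only values the model has not seen during training, which is what "unseen data" means for a time series.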

Results and discussion
For the selected models, the final performance and training times represented the average of three trials, with different ratios between training and testing data subsets (table no. 2 and figures no. 8-10). This ensured meaningful results, unrelated to the data split. Due to the highly configurable character of network-based models and their outstanding capability to learn and generalize, the results can be considered meaningful and reliable. There are no significant differences between RNN, LSTM and GRU, in terms of $R^2$ and MAE. However, the LSTM training time is expected to be the longest, due to its complex mechanisms for capturing short- and long-term dependencies. Among the selected models, LSTM is the most computationally expensive. FFNN can be considered the least sophisticated model, but it provides a reasonable $R^2$ in half the training time of the LSTM. In line with the theoretical considerations, 1D CNN was the least expensive, in terms of computational power. Even if the $R^2$ results of the considered models appear similar, MAE provides another perspective (figure no. 9): for a real-world business case, an increase in MAE might not be acceptable, even if the training time is just a few seconds.

Conclusions
The energy industry must keep pace with the other industries that it is setting in motion. Energy demand, and the way customers use it, is becoming more complex. The pattern of consumption is highly influenced by the devices used, while the distributed character of the power systems is influenced by household or small business renewable energy systems. The key to safe and optimal operation of power grids relies on predictability across the entire energy supply chain. IoT technologies can act as enablers, gathering together all the "things" to create synergy. Geographically dispersed assets can, with the help of IIoT technologies, be managed more efficiently, preventing and minimizing the costly downtimes, while maximizing the output and reducing the negative impact of production volatility. Through IIoT, multiple energy resources, such as solar, geothermal, hydro and biomass, can be integrated and managed. As well as the IIoT, the CIoT is crucial for providing an end-to-end solution. The predictive analysis results from this case study show that LSTM produced the best prediction. Even with the long training time, the impact of such a prediction in a business will create the premises for a proper hardware upgrade to reduce the training time. This will allow the prediction to be made within consecutive data points for a more discrete time scale. MAE gives a measure of the performance in relation to an absolute value, namely the energy produced. In the case of RNN, LSTM, and GRU, the MAE values are differentiated at the decimal level and even if the unit of measurement is MW, at the time of deployment in production other elements must be considered. For a fully informed deployment, it is important that the three models are used in parallel to determine the best solution. The complexity of the tasks for identifying the parameters and the stability of the models or the frequency with which the training must be repeated are decisive in selecting the best solution. 
The values of $R^2$ are close between the selected models and are also close to the value 1, meaning that the models can, to a large extent, explain the variability in the results.
Comparing the results with those obtained by other researchers was not possible due to the way the metrics are typically selected; specifically, the metrics allow a comparison of models that are trained using the same data set, but do not allow a comparison of models trained on different data sets. In this regard, the use of the coefficient of determination, the complete description of the models' parameters, and the software and hardware configuration will allow other researchers to use this article for comparative studies. The performance of the models might be increased by adding exogenous variables, such as wind speed, wind shear, ambient temperature and pressure, dew point temperature and humidity.