Deep Learning Enhanced Solar Energy Forecasting with AI-Driven IoT

Short-term photovoltaic (PV) energy generation forecasting models are important for stabilizing the power integration between the PV and the smart grid in artificial intelligence (AI)-driven Internet of Things (IoT) modeling of smart cities. With the recent development of AI and IoT technologies, it is possible for deep learning techniques to achieve more accurate energy generation forecasting results for PV systems. Traditional PV energy generation forecasting methods have difficulty accounting for external feature variables, such as seasonality. In this study, we propose a hybrid deep learning method that combines clustering techniques, a convolutional neural network (CNN), long short-term memory (LSTM), and an attention mechanism with the wireless sensor network to overcome these difficulties. The overall proposed method is divided into three stages, namely, clustering, training, and forecasting. In the clustering stage, correlation analysis and self-organizing mapping are employed to select the most relevant factors in the historical data. In the training stage, a convolutional neural network, a long short-term memory neural network, and an attention mechanism are combined to construct a hybrid deep learning model that performs the forecasting task. In the forecasting stage, the most appropriate trained model is selected based on the month of the testing data. The experimental results showed significantly higher prediction accuracy for all time intervals compared to existing methods, including traditional artificial neural networks, long short-term memory neural networks, and an algorithm combining a long short-term memory neural network and an attention mechanism.


Introduction
Photovoltaic (PV) power generation is known as a sustainable energy source, with the advantages of low carbon emissions, adaptability to various applications, and low installation and maintenance costs [1]. Because weather conditions vary, PV panels often cannot output electrical power stably. When PV power is integrated into the power grid, this fluctuation seriously affects the grid and can greatly reduce its overall stability. With the wireless sensor network, it is possible for deep learning techniques to forecast solar energy generation and consequently stabilize smart grid systems [2]. Therefore, artificial intelligence (AI)-driven Internet of Things (IoT) technology has become one of the key technologies for solving this problem [3,4].
Along with the fast development of IoT technology, extended deep learning methods are now capable of performing short-term time series forecasting, for tasks such as PV power generation, anomaly detection, and energy consumption forecasting, with considerably high accuracy [5-7].
In recent years, with the fast development of AI-driven IoT technology, the applications of deep learning technologies have been extended to various fields [8,9], such as digital twinning [10], computer security [11], cyberphysical systems [12], transportation systems [13], and air quality forecasting [14]. Jiménez-Pérez and Mora-López [15] proposed a system that produces hourly forecasts of global solar irradiance. The system can be separated into two phases. The first phase is clustering: the original dataset is divided into groups by the k-means clustering algorithm, and each group represents data of a different weather type. In the second phase, machine learning techniques, such as decision trees, artificial neural networks, and support vector machines, are employed to perform forecasting. Yang et al. [16] proposed a hybrid deep learning method based on weather types for PV power output forecasting with a timestep of 1 hour. The proposed method has three steps: classification, training, and prediction. The classification step involves a self-organizing map (SOM) [17] and Learning Vector Quantization (LVQ) [18]. In the training and prediction steps, the fuzzy training method is used to select the most appropriate deep learning model for prediction. Han et al. [19] proposed an alternative multimodel PV power interval prediction method that considers the seasonal property of the PV power and the absolute power deviation in the prediction process. Van-Deventer et al. [20] proposed a support vector machine model based on a genetic algorithm (GASVM). The GASVM model first classifies the historical weather data using an SVM classifier and is later optimized by the genetic algorithm using an ensemble technique. Malvoni et al. [21] proposed to use wavelet decomposition and principal component analysis to decompose meteorological data and treat it as predictive inputs.
A time series forecasting method named GLSSVM (Group Least Square Support Vector Machine), which combines Least Square Support Vector Machines (LS-SVM) and the Group Method of Data Handling (GMDH), was applied to the measured weather data.
More recently, deep learning techniques, such as the convolutional neural network (CNN), long short-term memory (LSTM), and deep belief networks (DBNs), have been widely applied to the PV system energy generation forecasting problem. Srivastava and Lessmann [22] adopted the original LSTM neural network for PV energy generation forecasting, but the proposed algorithm is hardly generalizable. Jian et al. [23] proposed an online sequential extreme learning machine with a forgetting mechanism model for PV energy generation prediction. The normalized data is input into a two-layer CNN for feature extraction and then merged into LSTM by a fusion layer for PV power prediction. Zhou et al. [24] proposed an LSTM neural network combined with the attention mechanism (AM) for PV energy generation forecasting. The AM is used to adaptively perceive feature information from the time series data. Chen et al. [25] proposed a cloud shadow model that uses computer vision techniques to forecast actual cloud coverage for PV systems and consequently enhanced the PV energy generation forecasting results. Chang et al. [26] introduced a virtual inertia control based on PV load forecasting results, showing a real-world application of PV energy generation forecasting techniques.
In this paper, we propose a novel PV power forecasting framework combining clustering and deep learning technologies. The entire framework can be divided into three stages. In the first stage, in view of the influence of the training data discussed above, we divide the training data into four clusters using the SOM algorithm, mimicking the four seasons. The reason for using the SOM algorithm as the experimental clustering algorithm is described in Section 2.2.2, and the reason for grouping the data into four seasons for the experiment is described in Section 2.2.1. In the second stage, the AM, CNN, and LSTM are used to build the model. The third stage is the forecasting stage: according to the month of the testing data, the most suitable forecasting model is selected to predict the PV energy generation for the next time stamp. Experimental results show the superior performance of the proposed method over existing works. Compared to the existing methods tackling the same problem, the main contributions of the proposed method are summarized in the following points.
1.1. The Raw Data Is Processed by Clustering Techniques for More Accurate Forecasting Performance. Based on the data collected from the eastern region of China, the solar irradiance patterns vary between different seasons. The data is first clustered into four classes and trained using four different LSTM neural networks, respectively, to enhance the forecasting accuracy.

Methodology
The dataset employed in this study was collected by a PV power station located in Shaoxing city in the eastern part of China. This solar energy generation dataset was collected from October 2014 to September 2018, at a time interval of 7.5 min. The data from 2014 to 2016 is taken as the training dataset, and the data from 2017 to 2018 is used for testing. Remote sensors were utilized to record the PV module temperature, the current, voltage, frequency, phases, and PV power every 7.5 minutes. Since the power station uses three-phase inverter equipment, the data contains the PV module temperature, three AC currents (alternating current 1, alternating current 2, and alternating current 3), three AC voltages (AC voltage 1, AC voltage 2, and AC voltage 3), two DC currents (direct current 1 and direct current 2), two DC voltages (DC voltage 1 and DC voltage 2), frequency, phases, and PV power. The frequency and phase do not change with the PV power, so the frequency and phase data are not used in the experiments. It is very difficult to obtain a large amount of historical weather data at a time interval of 7.5 min for the same location from the weather bureau. Therefore, we do not add weather data to the experimental data; instead, we look for a data preprocessing method that effectively improves the predictive ability of the model.

Wireless Communications and Mobile Computing
We calculated correlation coefficients between the other factor data and PV power. The correlation of the multifactor combination with the PV power is shown in Table 1. A combined correlation test based on the modified RV coefficient is introduced here. The modified RV coefficient ($RV_{mod}$) is a correlation analysis method based on matrix calculation. It is calculated as

$$RV_{mod}(M, N) = \frac{\mathrm{tr}(J \cdot K)}{\sqrt{\mathrm{tr}(J \cdot J)\,\mathrm{tr}(K \cdot K)}},$$

where $J = M \cdot M^{T} - \mathrm{diag}(M \cdot M^{T})$; $K = N \cdot N^{T} - \mathrm{diag}(N \cdot N^{T})$; $M$ represents the matrix of influencing factors, including alternating current 2 (AC2) and alternating current 3 (AC3); $N$ represents the PV power output matrix; $\mathrm{diag}(\cdot)$ is a function that takes out the diagonal elements of the matrix; and $\mathrm{tr}(\cdot)$ is a function that takes the sum of the elements on the diagonal of the matrix. $RV_{mod}$ ranges between -1 and 1; the closer it is to -1 or 1, the higher the correlation between the factor and the power output.
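As an illustration of the matrix calculation described above, the following minimal sketch computes a modified RV coefficient with NumPy. It assumes the usual form $\mathrm{tr}(JK)/\sqrt{\mathrm{tr}(JJ)\,\mathrm{tr}(KK)}$ with $J$ and $K$ defined as in the text; the function name `rv_mod`, the column centering, and the synthetic data are assumptions made for the example and do not come from the dataset of this study.

```python
import numpy as np

def rv_mod(M: np.ndarray, N: np.ndarray) -> float:
    """Modified RV coefficient between two matrices with the same number of rows."""
    J = M @ M.T
    K = N @ N.T
    # Remove the diagonals: J = M M^T - diag(M M^T), K = N N^T - diag(N N^T).
    J = J - np.diag(np.diag(J))
    K = K - np.diag(np.diag(K))
    return float(np.trace(J @ K) / np.sqrt(np.trace(J @ J) * np.trace(K @ K)))

# Synthetic stand-ins (assumption): two factor columns and one power column.
rng = np.random.default_rng(0)
factors = rng.normal(size=(50, 2))
power = factors @ np.array([[1.0], [0.5]]) + 0.1 * rng.normal(size=(50, 1))

# Center each column before comparing (a common convention, assumed here).
factors = factors - factors.mean(axis=0)
power = power - power.mean(axis=0)

print(rv_mod(factors, power))
```

A value close to 1 or -1 would indicate that the factor matrix is strongly related to the power output, which is how the coefficient is used to rank candidate inputs.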
Because the coefficients of AC2 and AC3 are very close, we carried out dedicated experiments and compared the results obtained with the AC2 and AC3 datasets. The experimental results show that, although the difference between the coefficients is small, the prediction accuracy with AC2 is significantly higher than with AC3. In this study, we therefore use the AC2 data as the experimental data.

Clustering of the Original Data.
In the clustering stage, the AC2 data from 2014 to 2016 forms the training dataset and is used for clustering. Considering the different numbers of days per month and the gaps in the PV data, we selected 15 days of complete data from each month and formed each month's AC2 data into a dataset $[x_1, x_2, \ldots, x_{15}]$, where each $x$ contains the 192 real-time AC2 readings of one day. The clustering experiment uses two years of AC2 data. We merged the datasets with the same month label. The SOM algorithm is used to perform the clustering task, and the raw data was clustered into 4 clusters.
The SOM model structure is shown in Figure 1. The learning steps of the SOM algorithm are as follows.
In the first step, the weight vector corresponding to each neuron in the competition layer is initialized, and the current input mode vector $X$ and the weight vectors of the neurons are normalized. Secondly, the neuron node $d$ with the smallest Euclidean distance between $X_c$ (input vector) and $w_j$ (connection weight vector) is selected as the winning neuron node. The weights are then updated accordingly: following Equation (3), the weights of the winning neuron node $d$ and of the other nodes in its neighborhood are updated, where $\eta(t)$ is the learning rate at step $t$, in the range (0, 1), and $h_{d,j}(t)$ is the neighborhood function. $h_{d,j}(t)$ generally uses a Gaussian function, as shown in Equation (4). The specific adjustment rules are shown in Equations (5) and (6),
where $d_{dj}^{2}$ is the squared distance between neuron $d$ and neuron $j$, $r(t)$ is the neighborhood radius, INT is the rounding function, and $T$ is the learning frequency.
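The learning steps above can be sketched in code as follows. This is a minimal illustration and not the implementation used in the experiments: it assumes the commonly used update $w_j(t+1) = w_j(t) + \eta(t)\,h_{d,j}(t)\,[X_c - w_j(t)]$ with a Gaussian neighborhood $\exp(-d_{dj}^{2} / (2 r(t)^{2}))$. The linear decay schedules for $\eta(t)$ and $r(t)$, the one-dimensional competition layer, the random toy data, and the names `train_som` and `assign_clusters` are all assumptions made for the example.

```python
import numpy as np

def train_som(X, n_nodes=4, T=500, eta0=0.5, r0=2.0, seed=0):
    """Train a 1-D SOM; returns weights of shape (n_nodes, n_features)."""
    rng = np.random.default_rng(seed)
    # Normalize the input vectors and the initial weight vectors to unit length.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = rng.normal(size=(n_nodes, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    grid = np.arange(n_nodes)  # node positions in the competition layer
    for t in range(T):
        x = X[rng.integers(len(X))]
        # Winning node d: smallest Euclidean distance to the input vector.
        d = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        eta = eta0 * (1.0 - t / T)                     # assumed linear decay
        r = max(1.0, float(int(r0 * (1.0 - t / T))))   # assumed decay with INT rounding
        h = np.exp(-((grid - d) ** 2) / (2.0 * r ** 2))  # Gaussian neighborhood
        W += eta * h[:, None] * (x - W)
    return W

def assign_clusters(X, W):
    """Label each row of X with the index of its nearest SOM node."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Toy stand-in for monthly AC2 profiles (assumption): 24 rows of 192 readings.
rng = np.random.default_rng(1)
profiles = rng.random((24, 192)) + 0.1
weights = train_som(profiles, n_nodes=4, T=500)
print(assign_clusters(profiles, weights))
```

With four nodes in the competition layer, each input profile receives one of four cluster labels, mirroring the four-cluster split used in this study.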

Deep Learning Models.
In the training stage, the 4 clusters of data are each input into a model for training, so that 4 forecasting models are trained. The training process follows the usual training method for time series. We denote the training data of each category as $[y_1, y_2, y_3, \ldots, y_n]$, where $y$ represents AC2 data and $n$ represents the number of AC2 data points contained in the category. With timestep $i$, we take $[y_1, y_2, \ldots, y_i]$ as a training sample and $y_{i+1}$ as the target value; the sample is fed into the training model, and the model parameters are adjusted so that the predicted output approaches $y_{i+1}$. Next, $[y_2, y_3, \ldots, y_{i+1}]$ is input into the training model, and the parameters are adjusted so that the predicted output approaches $y_{i+2}$, and so on, until the target value is $y_n$; at that point, the model training ends and the forecasting model is obtained. The training model structure of this experiment consists of two convolutional layers (CNN), one LSTM layer with an attention mechanism, and a fully connected (FC) layer. The overall flowchart of the proposed framework is depicted in Figure 2.
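The sliding-window construction described above can be illustrated with a short sketch. The function name `make_windows` and the toy series are hypothetical; the sketch only shows how each window of length `timestep` is paired with the next value as its target.

```python
import numpy as np

def make_windows(series, timestep):
    """Return (inputs, targets) with inputs[k] = series[k:k+timestep]
    and targets[k] = series[k+timestep]."""
    series = np.asarray(series, dtype=float)
    n = len(series) - timestep
    if n <= 0:
        raise ValueError("series must be longer than timestep")
    inputs = np.stack([series[k:k + timestep] for k in range(n)])
    targets = series[timestep:]
    return inputs, targets

X, y = make_windows([1, 2, 3, 4, 5, 6], timestep=3)
print(X)  # rows: [1, 2, 3], [2, 3, 4], [3, 4, 5]
print(y)  # targets: 4, 5, 6
```

Each row of `X` plays the role of $[y_k, \ldots, y_{k+i-1}]$ and the matching entry of `y` plays the role of $y_{k+i}$.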
The proposed deep learning model (CNN-ALSTM) is a hybrid model that extracts features from the raw data and performs forecasting using the LSTM neural network. The CNN layer is employed to extract useful features from the time series data, representing additional latent information in the data, which has the potential to improve the prediction accuracy. The experimental results show that prediction performance is best when the CNN consists of one convolutional layer with 16 kernels of size 3 × 1 followed by one convolutional layer with 32 kernels of size 3 × 1.
The feature vector obtained from the second CNN layer is input into the LSTM layer for prediction. Each element of the feature vector corresponds to one of the 32 units in the LSTM layer. The attention mechanism assigns higher weights to the features that are significantly related to the current output. Last, the output vector of the attention mechanism is processed by an FC layer using the unfolding operation, and the predicted value of AC2 at the next moment is output.
LSTM is very suitable for prediction experiments on time series data. Existing works show superior forecasting performance when combining CNN and LSTM for various applications [23, 27-29]. CNN helps LSTM to better extract the characteristics of the experimental data. The attention mechanism is a process of assigning weights. Using the attention mechanism, more accurate weights are assigned to the LSTM output vector to improve the prediction capability of the model.
The predicted power is obtained by inverse normalization according to Equation (7),
where $Pr_p$ is the predicted value of power and $Pr_{Iac2}$ is the predicted value of AC2.

Long Short-Term Memory Neural Network.
The LSTM model structure is shown in Figure 3.
The LSTM cell structure effectively mitigates the gradient explosion/vanishing problems. There are four important elements in the flowchart of the LSTM model: the cell status, input gate, forget gate, and output gate. The input, forget, and output gates are used to control the update, maintenance, and deletion of the information contained in the cell status. In the forward computation process, $W_f$, $W_i$, and $W_o$ are the weight matrices of the forget gate, input gate, and output gate, respectively; $b_f$, $b_i$, and $b_o$ are the bias terms of the forget gate, input gate, and output gate, respectively; $\sigma$ is the sigmoid activation function; and tanh is the hyperbolic tangent activation function.
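For reference, the forward computation of a standard LSTM cell, written with the gate symbols defined above, takes the following form. This is the textbook formulation and is given here as an assumption about the intended equations; the candidate-state parameters $W_c$ and $b_c$, the input $x_t$, the cell status $C_t$, and the hidden output $h_t$ are notation introduced for this sketch.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

Here $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden output and the current input, and $\odot$ denotes element-wise multiplication. The forget gate $f_t$ controls how much of the previous cell status is kept, the input gate $i_t$ controls how much new information is written, and the output gate $o_t$ controls how much of the cell status is exposed as $h_t$.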

The Attention Mechanism.
The attention mechanism is a brain signal processing mechanism peculiar to human vision. Human vision quickly scans the global image to obtain the target area that needs attention and ignores other areas of useless information. The attention mechanism algorithm has been successfully implemented and applied to model training [26] and other related fields.
The model proposed in this paper uses the LSTM hidden layer output vector $H = \{h_1, h_2, \ldots, h_t\}$ as the input of the attention mechanism, and the attention mechanism finds the attention weight $\alpha_i$ of $h_i$, which can be calculated as

$$e_i = \tanh\left(W_h h_i + b_h\right), \qquad \alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{t} \exp(e_j)},$$

where $W_h$ is the weight matrix of $h_i$ and $b_h$ is the bias.

The values of $W_h$ and $b_h$ vary during the ALSTM training process. The range of $e_i$ is (-1, 1). The attention vector $H' = \{h_1', h_2', \ldots, h_t'\}$ can be obtained by multiplying the attention weight $\alpha_i$ and $h_i$: $h_i' = \alpha_i h_i$. The attention mechanism is implemented as a custom layer whose parameters are optimized using RMSProp backpropagation [33,34].
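The weighting step can be sketched with NumPy as follows. The sketch assumes a score of the form $e_i = \tanh(W_h h_i + b_h)$, which matches the stated range (-1, 1), followed by a softmax normalization to obtain $\alpha_i$; the softmax, the use of a weight vector that maps each $h_i$ to a scalar score, the function name `attention`, and the random inputs are assumptions made for the example, not the custom layer used in the experiments.

```python
import numpy as np

def attention(H, W_h, b_h):
    """H: (t, units) LSTM hidden outputs. Returns (alpha, H_prime)."""
    e = np.tanh(H @ W_h + b_h)            # one score per time step, each in (-1, 1)
    alpha = np.exp(e) / np.exp(e).sum()   # attention weights, normalized to sum to 1
    H_prime = alpha[:, None] * H          # h_i' = alpha_i * h_i
    return alpha, H_prime

# Random stand-ins (assumption): 5 time steps, 32 hidden units.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 32))
W_h = rng.normal(size=32) * 0.1
alpha, H_prime = attention(H, W_h, b_h=0.0)
print(alpha, H_prime.shape)
```

In training, `W_h` and `b_h` would be learned together with the LSTM parameters; here they are fixed only to show the shapes involved.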

The Overall Forecasting Framework.
There are four deep learning models trained in the proposed framework (Figure 4). In the testing stage, an appropriate forecasting model is selected based on the grouping result of the testing data sample. The proposed hybrid deep learning model structure is shown in Figure 5. As a necessary preprocessing step, we used the correlation coefficient to determine the inputs for prediction. The SOM algorithm was then employed to cluster the inputs into four categories. Each category is trained by the proposed hybrid deep learning framework. In particular, the CNN is used to extract the time series features of the AC2 data; the LSTM neural network further extracts high-dimensional features from the data. The attention mechanism is used to assign different attention weights to the output elements of the LSTM hidden layer. In the testing phase, according to the month of the test set, we chose the appropriate model to predict the testing result (Figures 4 and 5).
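The month-based model selection in the testing phase can be sketched as a simple lookup. The month-to-cluster mapping below is hypothetical (a plain calendar-season split); in the framework, the mapping follows from the SOM clustering result. The stand-in models `persistence` and `window_mean` are placeholders for the four trained CNN-ALSTM models, and the name `select_model` is an assumption for the example.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[Sequence[float]], float]

# Hypothetical month -> cluster assignment; the real one comes from the SOM result.
MONTH_TO_CLUSTER: Dict[int, int] = {
    12: 0, 1: 0, 2: 0,
    3: 1, 4: 1, 5: 1,
    6: 2, 7: 2, 8: 2,
    9: 3, 10: 3, 11: 3,
}

def select_model(month: int, models: Dict[int, Model]) -> Model:
    """Return the forecasting model trained on the cluster that contains `month`."""
    if month not in MONTH_TO_CLUSTER:
        raise ValueError(f"month must be in 1..12, got {month}")
    return models[MONTH_TO_CLUSTER[month]]

# Placeholder models standing in for the four trained networks.
def persistence(window: Sequence[float]) -> float:
    return float(window[-1])

def window_mean(window: Sequence[float]) -> float:
    return float(sum(window) / len(window))

models: Dict[int, Model] = {0: persistence, 1: window_mean, 2: persistence, 3: window_mean}

model = select_model(7, models)      # July falls in cluster 2 under this mapping
print(model([0.2, 0.4, 0.6]))
```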
The process of data classification method determination and hyperparameter determination is also described, including the reason for simulating data in four seasons, the selection of clustering algorithm, the settings of SOM competition layer, and the selection of SOM classified result.

The Selection of Clustering Algorithm.
The original idea of the clustering is to map the raw data into four different groups, representing the four seasons. These four groups of data are trained and tested separately using the proposed CNN-ALSTM structure. Different clustering algorithms are tested in this section. In the training phase, we analyze clustering algorithms, including k-means, FCM, and SOM (Figure 6), for clustering the training dataset into four clusters. Based on the experimental results on the training data, SOM is more accurate than most of the above-mentioned traditional clustering algorithms (Figure 6) with respect to the original season labels of the raw data.
The size of the SOM competition layer has been evaluated through a series of experiments, as shown in Figure 7, where the X-axis represents the size of the competition layer. The clustering results of the SOM algorithm have a potential relationship with the prediction performance of the forecasting model. The number of iterations of the SOM algorithm is set between 500 and 1000 and is tuned in steps of 100. The batch size is set to 15 or 30.

Experimental Process and Results
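In this study, the mean absolute percentage error (MAPE), the mean absolute error (MAE), and the root mean square error (RMSE) are used to evaluate the prediction ability of the model. The three error metrics follow their standard definitions:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}},$$

where $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, and $n$ is the number of test samples.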
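The three metrics can be computed as in the following sketch, which uses their standard definitions. The function names and the sample values are hypothetical. Note that MAPE is undefined when a measured value is zero, so samples with zero generation (for example, at night) would have to be excluded before calling `mape`.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must not contain zeros."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when a measured value is zero")
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Hypothetical measured and predicted values.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mae(measured, predicted), rmse(measured, predicted), mape(measured, predicted))
```

Because RMSE squares the errors before averaging, it penalizes large deviations more heavily than MAE, which is why the two indicators can rank models differently.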

The experiment compares the proposed model with our previous work, which combines the LSTM and the attention mechanism, and with the standalone MLP and LSTM models. AC2, the most relevant factor identified in this paper, is used as the input to the training model. The MAPE, RMSE, and MAE results are shown in Tables 2, 3, and 4, respectively.
In the case of less training data, using the clustering phase and prediction phase mentioned in this paper will reduce the prediction accuracy. When the time interval is 30 min, the amount of data is reduced to a quarter of the data volume with a time interval of 7.5 min. From the comparison of different time intervals, when the interval is 30 min, the prediction accuracy drops. On the contrary, in the case of more training data, the prediction accuracy can be effectively improved.
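The data-volume arithmetic behind this comparison can be checked with a short sketch: a 7.5 min interval gives 24 × 60 / 7.5 = 192 samples per day, whereas a 30 min interval gives 48, one quarter as many. How the coarser series were derived is not stated here, so keeping every fourth sample, as done below, is an assumption; the function names are hypothetical.

```python
import numpy as np

def samples_per_day(interval_min: float) -> int:
    """Number of readings per day at the given sampling interval in minutes."""
    return int(24 * 60 / interval_min)

def downsample(series, factor: int):
    """Keep every `factor`-th sample (one possible way to coarsen the interval)."""
    return np.asarray(series)[::factor]

day = np.arange(samples_per_day(7.5))   # one day of 7.5 min readings (toy indices)
coarse = downsample(day, 4)             # 7.5 min -> 30 min
print(len(day), samples_per_day(15), len(coarse))
```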
The experiment randomly selects two days of the month as a display of the predicted results. As shown in Figures 8-10, according to the law of PV power generation, the power generation in the early evening to the early morning of the next day is 0, which is not shown in Figures 8-10.
Through the clustering, training, and prediction stages, the proposed forecasting model significantly outperforms the compared models when the amount of training data is sufficient. The clustering stage divides the original PV data into four clusters, and each cluster is used to train a hybrid deep learning framework combining CNN and LSTM. This more sophisticated framework clearly enhances the forecasting performance over single-model methods, such as MLP and LSTM. The AM technique already shows a strong influence on the forecasting results in the experiments with ALSTM, and the performance improvement is larger still for the proposed method. According to Tables 2-4, the proposed model performs well on the RMSE indicator. Compared with the RMSE indicator, the MAE indicator reflects the actual magnitude of the prediction error. The proposed model performs better than the other models on the MAE indicator at the 7.5 min interval. It is noted that the proposed model has significantly higher accuracy for predictions at the 7.5 min interval but shows no clear accuracy advantage over the other individual models at the 15 min and 30 min intervals.

Conclusion and Discussion
Photovoltaic power generation prediction is of great significance for maintaining grid security and coordinating resource utilization. In the era of big data, it is possible for AI-driven IoT technology to perform accurate solar energy generation forecasting based on historical solar energy data [24, 30-32]. This paper proposes a hybrid deep learning method based on weather categories. Unlike traditional models, this experiment includes correlation analysis and clustering calculation of data, which effectively improves the generalization ability of the model and improves the prediction accuracy. The training algorithm employs the CNN algorithm, which can extract the data features more effectively and obtain more latent information from the data. Secondly, the attention mechanism is applied to the LSTM model, focusing on the extracted important features. In the prediction stage, the month of the test set is used to determine the forecasting model, which improves the accuracy of the prediction. This forecasting model is superior to the traditional algorithm models in prediction performance, and it also shows outstanding results in predicting other data types.
The experimental results in Section 3 show that the proposed method outperforms the traditional time series prediction methods, such as MLP, LSTM, and ALSTM, for PV system energy generation forecasting. Three evaluation metrics are used to show the superior performance: RMSE, MAPE, and MAE. According to the results collected in Tables 2-4, the proposed method achieves higher forecasting accuracy for short-term and very short-term forecasting, e.g., forecasting 7.5 min ahead. For longer-term forecasting, such as time intervals of 15 and 30 min, the forecasting performance advantage decreases.
One of the future works of this study is to extend the existing work for more generalized datasets, i.e., achieving acceptable forecasting results for longer-term forecasting of the PV system energy generation. Another future work direction is to apply the proposed framework towards a broader range of time series data applications in other fields, such as air quality forecasting [14,35] and energy consumption forecasting [36].

Data Availability
Access to data is restricted. The solar irradiance data used in this study is collected by a local company in Shaoxing, China. The data is confidential and is only available for the company's internal usage.