Short-Term Solar Irradiance Prediction Based on Multichannel LSTM Neural Networks Using Edge-Based IoT System

Most photovoltaic power generation methods use global horizontal irradiance (GHI) as the main input and output. However, randomness, instability, and intermittency are the main factors that seriously degrade solar irradiance prediction results, and traditional data-driven prediction models have difficulty making accurate predictions. In this study, a multichannel deep learning framework that combines wavelet transform, a convolutional neural network, and bidirectional long short-term memory (MC-WT-CBiLSTM), built on edge computing and an IoT system, is proposed to improve GHI prediction accuracy. The solar irradiance data is decomposed by wavelet transform to reduce data complexity. Each decomposed component is input into the multichannel MC-CBiLSTM deep learning framework for forecasting, and the component forecasts are combined to produce the final results. A comparison with existing solar irradiance forecasting methods shows that the proposed MC-WT-CBiLSTM framework has clear advantages across the tested time horizons.


Introduction
As a green, clean, and sustainable energy source, solar energy accounts for an increasing proportion of the current world energy structure. Accurate and reliable solar irradiance prediction brings significant benefits to the construction of modern smart grids [1][2][3]. For effective design and control of photovoltaic (PV) energy systems, it is necessary to accurately predict solar irradiance in advance [4]. PV power generation and irradiance are positively correlated. However, the irradiance is affected by various external factors such as temperature, weather, and seasonality. These factors interact with each other and make GHI change irregularly, which makes the prediction difficult for traditional methods [5].
Data-driven prediction and forecasting methods, including machine learning (ML), edge computing (EC), and internet of things (IoT), are crucial methods in the field and important for the operation and dispatch of Industry 4.0 [6][7][8][9]. A recent study shows that ML, EC, and IoT methods have significant advantages in irradiance data prediction compared with physics-based models. Precise irradiance prediction provides the approximation of the expected PV output power for the dispatching plan of the grid company's operators [10].
The recent development of various deep learning (DL) methods has strongly influenced time-series data analysis and forecasting [11][12][13][14][15]. Long short-term memory (LSTM) has been widely applied to different time-series data analysis fields, including air quality prediction [16], short-term load prediction [17], irradiance prediction [18], and cyber-physical systems [19,20]. The original recurrent neural network (RNN) cannot capture dependencies across long sequences, which causes problems such as vanishing or exploding gradients. LSTM addresses these problems with its gating unit structure, which selectively controls the flow of information [21].
Difficulties exist for traditional LSTM neural networks for the solar irradiance forecasting problem. First of all, the irradiance data presents high volatility with weather changes, which are difficult to accurately capture by the neural network. Secondly, a variety of external factors such as temperature, wind speed, and cloud density may have a certain impact on the irradiance prediction. Therefore, it is necessary to consider the mutual influence of multiple feature data to make more accurate predictions. For these neural network methods based on historical data, in order to make more accurate irradiance predictions, the structural complexity of the neural network must be increased. While remembering the characteristics of the longer-term sequence, the mutual influence between variables should also be considered [22].
Taking into account the shortcomings of current studies in the field, this paper proposes a multichannel framework that combines wavelet transform, a convolutional neural network, and bidirectional long short-term memory (MC-WT-CBiLSTM) for solar irradiance forecasting. BiLSTM extends LSTM by combining one LSTM that moves from the beginning of the sequence to the end with another that moves from the end of the sequence to the beginning [23]. In addition to the BiLSTM model, a one-dimensional convolutional neural network (CNN) is used to further extract data features. Wavelet transform (WT) is introduced to decompose the original input data into multiple subsequences with different frequencies. Then, each subsequence is individually connected to a CNN-BiLSTM module for short-term GHI prediction. Experimental results show that wavelet decomposition effectively reduces data complexity and improves prediction performance. At the same time, BiLSTM combined with CNN learns more sequence features from different dimensions and improves the prediction performance. According to the experiments, the proposed MC-WT-CBiLSTM deep model framework has the following advantages over the existing methods:
(i) A data preprocessing step with wavelet transform. As GHI data is affected by many factors, the solar irradiance data fluctuates greatly. The wavelet transform preprocessing step effectively reduces the data complexity and improves the prediction ability of the multichannel CNN-BiLSTM model.
(ii) A sophisticated multi-input multichannel network structure. The proposed framework takes the mutual influence of temperature and GHI data into consideration and uses multiple channels for parallel learning.
(iii) A deep network framework integrating CNN and BiLSTM. To the best of our knowledge, this is the first time that WT-CBiLSTM is combined with the multichannel idea for GHI prediction.
Comparative experiments show that the prediction performance of the framework is superior to that of current advanced prediction methods.

Related Works
The problem of time-series data prediction has always been one of the important topics in the field of artificial intelligence (AI), and a lot of work has been done on time-series forecasting. LSTM is one of the most popular deep learning models. Compared with traditional neural networks, its gated unit structure enables LSTM to remember information for a longer period of time [24][25][26][27]. Wen et al. [28] implemented an LSTM model to predict photovoltaic power generation and power load. The prediction performance of the proposed LSTM neural network is significantly better than that of the ML model. Yan et al. [29,30] proposed a hybrid deep learning framework that combines an LSTM neural network and a CNN to predict the electricity consumption of a single household. The CNN adds a preprocessing stage and extends the traditional LSTM neural network. LSTM combined with CNN predicts sequence changes more accurately. This research demonstrates the advantage of one-dimensional convolution in processing time-series data. Zhou et al. [31] proposed an LSTM model combined with an attention mechanism to predict photovoltaic power generation. Taking into account the impact of temperature data on photovoltaic power generation, the attention mechanism adaptively focuses on the more important input features, and the model outperforms the comparison models at each forecast horizon. A large number of prediction studies have shown that a variety of data preprocessing strategies greatly improve the prediction ability of neural network models. Zheng et al. [32] proposed a hybrid deep learning model that combines empirical mode decomposition (EMD) and LSTM, decomposing the original data into multiple intrinsic mode functions (IMF) for better predictive analysis. Their results indicate that decomposing the waveform benefits time-series prediction. Wu et al. [33] applied singular value decomposition to reconstruct the original cutting force signal of a tool and then used BiLSTM to predict the feature subsignals, thereby effectively improving the prediction accuracy.
As irradiance forecasting has received increasing attention, a variety of forecasting methods have been adopted for it. In [34], Yan et al. added an Inception-ResNet network for feature extraction and then input the extracted features into a GRU-Attention network for training and prediction. The fusion of complex structures increased the complexity of the network. Zhao et al. [35] proposed a 3D-CNN to perform feature analysis on ground-based cloud images for irradiance prediction and achieved very good prediction results.
The surveyed works show that, for nonlinear and unstable time-series data, adopting a variety of data preprocessing strategies effectively improves the prediction performance of neural network models [36,37]. The multichannel neural network fusion model proposed in this paper is shown to be effective at predicting unstable irradiance data. Unlike the conventional stacked CNN-LSTM, in the proposed hybrid model, CNN and LSTM extract features in parallel, which results in more robust features with less loss of data information. In [38], a multichannel DL framework was proposed for electrical load time-series prediction. The framework consists of two parallel channels and a feature fusion module. One of the channels is composed of a CNN layer, and the other is composed of an LSTM layer. The two channels are connected in the feature fusion module, which then produces the final output. The final prediction result is better than that of most deep models.
Wireless Communications and Mobile Computing

Methodology
The experimental flowchart of the proposed MC-WT-CBiLSTM model is shown in Figure 1. The features used to predict GHI include irradiance and temperature data. After normalizing each feature, a three-layer wavelet transform is performed separately to reduce the complexity of the input data to obtain a more predictable subsequence. The subsequence is trained by the proposed MC-CBiLSTM framework, and the final prediction result is obtained. The experiment uses five evaluation indicators to evaluate the predictive performance of the proposed model.

Data Source and Preprocessing.
The data used in this article comes from a comprehensive set of solar irradiance, imaging, and prediction data released by Pedro et al. [39] in 2019. The data includes three years (2014-2016) of quality-controlled, 1-minute resolution ground measurements of global horizontal irradiance and direct normal irradiance in California. In addition, it provides overlapping data from commonly used exogenous variables, including sky images, satellite images, and numerical weather prediction forecasts. The experiments use the global horizontal irradiance and temperature data. The data for the three years from 2014 to 2016 are divided into a training set and a test set at a ratio of 4:1. The experiment uses the z-score normalization method to preprocess all input data, and the calculation formula is as follows:
$$x^{*} = \frac{x - \mu}{\sigma}$$
where x is a raw sample, x* is its normalized value, μ is the average of all sample data, and σ is the standard deviation of all sample data.
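The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the toy readings are hypothetical, the split is assumed to be chronological (the text only gives the 4:1 ratio), and the statistics are computed over all samples because that is how μ and σ are defined above.

```python
import math

def zscore_fit(values):
    """Return (mean, population standard deviation) of a sequence."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, sigma

def zscore_apply(values, mu, sigma):
    """Standardize values with previously computed statistics."""
    return [(v - mu) / sigma for v in values]

def chronological_split(values, train_parts=4, test_parts=1):
    """Split a series in time order (no shuffling) at train:test = 4:1.
    Keeping the time order is an assumption; the paper states only the ratio."""
    cut = len(values) * train_parts // (train_parts + test_parts)
    return values[:cut], values[cut:]

# Toy GHI readings in W/m^2 (illustrative numbers, not taken from the dataset).
ghi = [0.0, 120.0, 340.0, 560.0, 710.0, 690.0, 480.0, 250.0, 60.0, 0.0]
mu, sigma = zscore_fit(ghi)
ghi_norm = zscore_apply(ghi, mu, sigma)
train, test = chronological_split(ghi_norm)
```

A common alternative is to compute μ and σ on the training portion only, so that no test-set statistics influence training; the text does not say which variant was used.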

Wavelet Transform.
Due to the severe volatility of the original GHI data set, this paper uses WT to decompose the original solar irradiance series into multiple subsequences of different frequencies. These subsequences include a stable part (low-frequency signal) and a fluctuating part (high-frequency signal), and they behave more regularly than the original series. By decomposing the input data into multiple subcomponents, the wavelet transform reduces the complexity and nonlinearity of the input data. The resulting subsequences are simpler and more stable, which is conducive to model training.
Generally speaking, the irradiance sequence data always presents high volatility, variability, and randomness due to its correlation with nonstationary weather conditions. Therefore, the original solar irradiance sequence may include nonlinear and dynamic components in the form of spikes and fluctuations [39]. WT is a decomposition method of discrete sampling of the input sequence. The key advantage of WT over Fourier transform is that WT can simultaneously capture frequency and position information (position in time). In addition, it is also good at multiscale information processing [40]. These advantages make WT an effective tool for complex data sequence analysis.
The main feature of the wavelet transform is that it highlights particular aspects of a signal through localized analysis in time (space) and frequency. Through dilation and translation operations, it refines the signal (function) step by step at multiple scales, finally achieving fine time resolution at high frequencies and fine frequency resolution at low frequencies. It therefore adapts automatically to the requirements of time-frequency signal analysis and can focus on any detail of the signal. The continuous wavelet transform (CWT) selects a center frequency, derives a large number of center frequencies from it through scale transformation, and then obtains a series of basis functions over different intervals through time shifts. Each basis function is multiplied by the corresponding segment of the original signal and integrated, and the frequency at which the result reaches an extreme value is the frequency contained in that interval of the original signal. Since CWT requires a continuous signal, whereas the actual sampled signal is discrete, CWT cannot be applied directly to the actual signal. To perform wavelet transformation on the irradiance sequence, the discrete wavelet transform (DWT) is introduced. The DWT is obtained by discretizing the scale and translation of the CWT in powers of 2. The characteristics of the irradiance time series make the DWT more suitable for decomposition.
There are many types of wavelet basis functions, such as the Haar wavelet, Symlet wavelets, and dbN wavelets. In this study, wavelet transform (WT) with the db1 wavelet basis function is used to decompose the original data into multiple subsignals, including denoised low-frequency components and denoised high-frequency components. The decomposition improves the learning ability of the subsequent neural network models. Wavelet transform is a localized analysis of time and frequency. It gradually refines the sequence at multiple scales through dilation and translation operations. It automatically adapts to the requirements of time-frequency sequence analysis, subdividing time at high frequencies and subdividing frequency at low frequencies. In this way, the time-frequency variation characteristics of the irradiance time series are analyzed.
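Because db1 is the Haar wavelet, a multi-level decomposition of this kind can be illustrated without any wavelet library. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the signal length must be divisible by 2 at every level (no boundary padding is implemented), and the three levels follow the three-layer decomposition described in the Methodology overview. It is not the authors' implementation.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One level of the Haar (db1) DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels=3):
    """Multi-level decomposition; returns [A_n, D_n, ..., D_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)  # only the low-frequency part is split again
        details.append(detail)
    return [approx] + details[::-1]

def haar_reconstruct(coeffs):
    """Inverse of haar_decompose: rebuilds the signal from [A_n, D_n, ..., D_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        approx = rebuilt
    return approx

# Toy normalized irradiance values (illustrative only).
x = [0.1, 0.3, 0.8, 1.2, 1.5, 1.4, 0.9, 0.2, 0.0, 0.1, 0.6, 1.1, 1.6, 1.3, 0.7, 0.1]
a3, d3, d2, d1 = haar_decompose(x, levels=3)
x_back = haar_reconstruct([a3, d3, d2, d1])
```

Each level halves the length of the low-frequency component, so a 16-sample input yields components of length 2, 2, 4, and 8, and the inverse transform recovers the input up to floating-point error.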
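Given a mother wavelet function ψ(t) and its corresponding scaling function φ(t), the wavelet ψ_{j,k}(t) and the dyadic scaling function φ_{j,k}(t) are calculated. In the standard dyadic form they are
$$\psi_{j,k}(t) = 2^{j/2}\,\psi\left(2^{j}t - k\right), \qquad \varphi_{j,k}(t) = 2^{j/2}\,\varphi\left(2^{j}t - k\right)$$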

where t represents the time index, j represents the scaling variable, and k represents the translation variable. After the original sequence is decomposed n times, one low-frequency component and n high-frequency components are obtained. At each step, the low-frequency component A_{n}(t) is decomposed into the next-level low-frequency component A_{n+1}(t) and high-frequency component D_{n+1}(t). The WT level in this paper is three, so the original data is decomposed into A3, D1, D2, and D3. The decomposed sequences are directly input into the model framework for training. The wavelet decomposition process is shown in Figure 2.

Convolutional Neural Network.
CNN is an emerging branch of DL. Different from traditional feature extraction methods, CNN automatically generates useful and discriminative features from raw data. This efficient feature extraction capability has been widely used in image recognition, speech recognition, and natural language processing [40].
Each subsequence decomposed from the original solar irradiance data set sequence is a one-dimensional sequence. A one-dimensional CNN is used as a local feature extractor. CNN adds a preprocessing stage and extends the BiLSTM neural network. In the processing stage, useful features are extracted from the original data, which improves the accuracy of subsequent predictions.
CNN recognizes simple patterns in data well and then uses them to form more complex patterns in higher layers. A one-dimensional CNN is very effective when detailed features are to be obtained from shorter (fixed-length) segments of the overall irradiance data set and the position of a feature within the segment is not important. In this paper, CNN is used to extract the features of each subsequence produced by the wavelet transform, which further improves the learning of data features and helps improve the prediction accuracy of the subsequent neural network models.
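The local feature extraction performed by a one-dimensional convolution can be illustrated with a single filter slid over a short subsequence. This is only a sketch: the helper names are hypothetical, the kernel is hand-picked for illustration (in a CNN the kernel weights are learned during training), and the kernel size, stride, and padding used in the paper are not stated.

```python
def conv1d_valid(sequence, kernel, bias=0.0):
    """Slide a kernel over a sequence with stride 1 and no padding.
    As in deep learning libraries, the kernel is not flipped (cross-correlation)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, sequence[i:i + k])) + bias
        for i in range(len(sequence) - k + 1)
    ]

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

# Toy wavelet subsequence (illustrative values).
subsequence = [0.1, 0.4, 0.9, 0.7, 0.2, 0.1, 0.5, 0.8]
# Hypothetical filter that responds to rising segments.
rise_kernel = [-1.0, 0.0, 1.0]
features = relu(conv1d_valid(subsequence, rise_kernel))
```

The same kernel is applied at every position, so a rising segment produces a positive response wherever it occurs in the window, which is the position independence mentioned above.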

Bidirectional Long Short-Term Memory Neural Network (BiLSTM).
The long short-term memory (LSTM) model is a special form of recurrent neural network (RNN) that provides feedback on each neuron. Its gating unit addresses the vanishing and exploding gradient problems that arise when an RNN processes long sequences. In the traditional RNN model and the LSTM model, information can only be propagated forward, so the current state of the model relates only to previous states. The bidirectional LSTM is an extension of the traditional LSTM that combines two LSTMs running in opposite directions. This two-way structure allows forward and reverse sequence information to be learned simultaneously, making the prediction results more complete. BiLSTM not only considers the correlation between earlier and later parts of the sequence but also mitigates the prediction lag that may exist in a one-way LSTM. The structure of BiLSTM is shown in Figure 3.
Since GHI data fluctuates significantly over time, the characteristics of the data before and after are closely related. The BiLSTM model is selected to predict the irradiance data, combined with the before and after correlation of GHI. Relying on this two-way characteristic, more detailed data characteristics are obtained. BiLSTM effectively improves the prediction accuracy of GHI.
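To make the two-direction idea concrete, the sketch below runs one LSTM over a sequence from start to end and a second one from end to start, then pairs their hidden states at each time step. It is a toy under stated assumptions: input and state are scalars, the gate equations are the standard LSTM ones, the weights are arbitrary hand-set numbers rather than trained values, and all names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One standard LSTM step with scalar input and scalar state.
    p maps each gate name to (input weight, recurrent weight, bias)."""
    def pre(name):
        w, u, b = p[name]
        return w * x + u * h_prev + b
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

def lstm_scan(xs, p):
    """Run the LSTM over xs and return the hidden state at every step."""
    h, c, out = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        out.append(h)
    return out

def bilstm(xs, p_fwd, p_bwd):
    """Pair forward and backward hidden states for each time step."""
    fwd = lstm_scan(xs, p_fwd)
    bwd = lstm_scan(xs[::-1], p_bwd)[::-1]  # re-align backward states with time
    return list(zip(fwd, bwd))

# Arbitrary illustrative weights (not trained values).
params = {name: (1.0, 0.5, 0.0) for name in ("forget", "input", "output", "candidate")}
states_a = bilstm([0.1, 0.5, 0.9], params, params)
states_b = bilstm([0.1, 0.5, -0.9], params, params)
```

Changing only the last input leaves the forward state at the first step unchanged but alters the backward state there, which is how the backward pass lets earlier positions use information from later ones.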
The final prediction output is determined by the hidden-layer values of the two directions of the bidirectional network. The formulas for the gating units of the BiLSTM model, written here in the standard LSTM form, are as follows:
$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$
$$\tilde{c}_t = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(c_t)$$
where x_t is the input at time t, h_t and c_t are the hidden state and cell state, f_t, i_t, and o_t are the forget, input, and output gates, W and b are the weights and biases of each gate, σ is the sigmoid function, and ⊙ denotes element-wise multiplication. The same equations are applied in the forward and backward directions, and the output y_t is computed from the two resulting hidden states.
The proposed framework is shown in Figure 4. Each input sequence is decomposed into multiple subsequences using WT. Then, each subsequence is input into the MC-CBiLSTM framework. Each subsequence is individually connected to a CNN-BiLSTM channel, and the channel parameters are adjusted according to the complexity of the subsequence to achieve the best prediction effect. The input GHI and temperature data are learned separately in two parallel channels. The channels are connected by a feature fusion layer, in which the feature information of each channel is shared and the prediction results are output together. Experimental results show that the output of GHI is affected by the temperature component, and practical experience also indicates that irradiance and temperature influence each other. Exploiting the interaction between the two achieves more accurate prediction results than single-sequence prediction.
In Figure 4, the multichannel training layer is divided into GHI channels and TEMP channels, which receive the corresponding subsequences after wavelet transformation. Each sequence is individually input to a CNN-BiLSTM model. Taking the internal correlation between components into account may improve the accuracy of GHI prediction. A one-dimensional CNN is used for local feature extraction to improve prediction accuracy. For different input features, the number of filters can be flexibly adjusted to achieve the best feature extraction effect.
The RMSprop optimizer is used to minimize the mean square error (MSE) loss function. The forecast steps are 10 minutes, 30 minutes, and 60 minutes. The neural network model was trained for 16 iterations. The BiLSTM units of the channels have 100, 64, 64, and 32 neural units, respectively. The remaining hyperparameters include activation = "linear" and validation split = 0.05. Each channel is connected to the feature fusion layer for information sharing, and the output finally undergoes the inverse wavelet transform to obtain the final prediction result.
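With 1-minute data, forecast steps of 10, 30, and 60 minutes correspond to targets 10, 30, and 60 samples ahead. One common way to build supervised training pairs for such horizons is a sliding window, sketched below. The function name, the lookback length of 20 samples, and the choice of a single target point per window are assumptions made for illustration; the paper does not state how its samples were constructed.

```python
def make_windows(series, lookback, horizon):
    """Pair each window of `lookback` consecutive samples with the value
    `horizon` steps after the window's last sample."""
    inputs, targets = [], []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback                      # window covers start .. end - 1
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])   # horizon steps after index end - 1
    return inputs, targets

# A toy series whose values equal their own index, which makes the offsets easy to check.
series = list(range(100))
windows_by_horizon = {
    h: make_windows(series, lookback=20, horizon=h) for h in (10, 30, 60)
}
X10, y10 = windows_by_horizon[10]
```

Longer horizons leave fewer usable windows from the same series, because the target must still fall inside the recorded data.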
The proposed MC-WT-CBiLSTM multichannel deep network framework consists of two parallel input channels, and two input features are trained separately. With edge computing solutions, input channels can be placed in different positions. For each input feature channel, four subchannels are connected, and the subchannels are used to train the subsignals after wavelet decomposition. Each subchannel consists of a CNN-BiLSTM layer, a feature fusion layer, and an output layer. The four subsequences after wavelet decomposition are input into one subchannel, respectively, and the CNN and LSTM parameters of each subchannel are different. The purpose is to train the model from different depths and finally perform overall prediction through feature fusion.
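The channel layout described above (two input features, each with four wavelet subchannels) can be written down as a small configuration structure. This sketch is purely illustrative: the class and function names are hypothetical, and the pairing of the reported BiLSTM sizes (100, 64, 64, 32) with specific subbands is an assumption, since the text lists the sizes without saying which subchannel each belongs to.

```python
from dataclasses import dataclass
from itertools import product

FEATURES = ("GHI", "TEMP")            # the two parallel input channels
SUBBANDS = ("A3", "D3", "D2", "D1")   # outputs of the three-level wavelet decomposition
# Assumed pairing of the reported BiLSTM sizes with subbands (not stated in the paper).
BILSTM_UNITS = {"A3": 100, "D3": 64, "D2": 64, "D1": 32}

@dataclass(frozen=True)
class SubChannel:
    feature: str
    subband: str
    bilstm_units: int

def build_layout():
    """Enumerate one CNN-BiLSTM subchannel per (feature, subband) pair."""
    return [SubChannel(f, s, BILSTM_UNITS[s]) for f, s in product(FEATURES, SUBBANDS)]

layout = build_layout()
ghi_subchannels = [ch for ch in layout if ch.feature == "GHI"]
```

Under these assumptions the framework has eight subchannels in total, and the outputs of all of them would be passed to the feature fusion layer.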
Figure 3: BiLSTM structure and its internal LSTM cell unit structure.

Compared with the existing methods, the proposed framework not only considers the mutual influence of temperature and GHI data but also connects multiple channels for parallel learning of the decomposed subsequences. Compared with a single-channel model, this multichannel model learns the characteristics of the input sequence in more detail and captures the time dependence between features more accurately, and the decomposition of the input signal enables the framework to model data fluctuations in more detail. The effective local feature extraction ability of one-dimensional convolution further improves the predictive ability of the model. Because BiLSTM considers the overall correlation of the sequence, it is more suitable than traditional LSTM for predicting data with periodic fluctuations, such as irradiance.

Evaluation Metrics.
In this experiment, five evaluation indexes are selected to evaluate the accuracy of prediction: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), coefficient of determination (R^2), and symmetric mean absolute percentage error (SMAPE). The specific formulas of the 5 indicators, in their standard definitions, are as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$$\mathrm{SMAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{\left(\left|y_i\right| + \left|\hat{y}_i\right|\right)/2}$$
where y_i is the measured value, \hat{y}_i is the predicted value, \bar{y} is the mean of the measured values, and n is the number of samples.
The evaluation results of each model are listed in Table 2. Each model in the table is trained using GHI and TEMP feature data. Comparing the prediction performance indexes for the three forecast horizons shows that the prediction performance of each model decreases significantly as the time interval increases, while the MC-WT-CBiLSTM model proposed in this paper still maintains good prediction performance. Machine learning models may outperform some deep learning models in short-term predictions such as the 10 min prediction. However, as the time interval increases, the performance of machine learning prediction decreases significantly. Multiple sets of comparative experiments were carried out. The experimental results show that LSTM or BiLSTM alone predicts poorly, because the network structure is relatively simple and cannot learn more detailed features. The feature extraction ability of CNN improves the learning ability of the model to a certain extent, but CNN has limited ability to handle highly volatile data. The wavelet transform is therefore added to reduce the complexity of the irradiance data, and the results show that it greatly improves the predictive ability of the neural network. This article thus approaches the problem from two angles: wavelet transform is introduced to reduce the data complexity, and CNN is introduced for feature extraction. The results show that CNN and wavelet transform each have certain limitations when used alone and that combining the two improves the prediction accuracy more effectively.
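The five metrics named in the Evaluation Metrics subsection can be computed as in the sketch below. It uses their common textbook definitions, which are assumed to match the paper's: SMAPE in particular has several variants, and the one here divides by the mean of the absolute measured and predicted values. The function names and the toy values are hypothetical.

```python
import math

def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def mape(y, p):
    """Percentage error relative to the measured value; undefined when a measured value is 0."""
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, p))

def smape(y, p):
    """Symmetric variant: error relative to the mean of |measured| and |predicted|."""
    return 100.0 / len(y) * sum(
        abs(a - b) / ((abs(a) + abs(b)) / 2.0) for a, b in zip(y, p)
    )

def r2(y, p):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Toy measured and predicted GHI values in W/m^2 (illustrative only).
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = {
    "MAE": mae(measured, predicted),
    "RMSE": rmse(measured, predicted),
    "MAPE": mape(measured, predicted),
    "SMAPE": smape(measured, predicted),
    "R2": r2(measured, predicted),
}
```

Because GHI is zero at night, MAPE as written would divide by zero on night-time samples; such samples would need to be excluded or handled separately, and the paper does not say how this was done.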
The prediction results of the proposed MC-WT-CBiLSTM deep model and multiple comparison models are shown in Figure 5. Based on a test set taken from the last year of data, the figure shows the forecast results for the four seasons of spring, summer, autumn, and winter. The prediction curves show that the proposed model learns the various fluctuations of GHI data well and better than the other models. Figures 5(a) and 5(d) show the 10-minute time interval forecast. Due to the short time interval and the relatively smooth GHI data, all models achieve good prediction results. However, most model predictions show a certain lag, and the highest and lowest points of the irradiance data are not accurately fitted. The prediction curves also show that the models with waveform decomposition capture more fluctuation information. The fitting curves in the figure allow the prediction effect of each model to be observed more intuitively. The standalone LSTM and BiLSTM models have the worst fitting results. Compared with a single neural network, the prediction effect of CNN-LSTM and CNN-BiLSTM is improved to a certain extent, but it still falls short of expectations. Because complex data fluctuations prevent the neural network from learning accurate information, the wavelet transform is introduced. Its ability to reduce complexity in the frequency domain effectively reduces the learning difficulty of neural networks. The results in Figure 5 show that the MC-WT-CBiLSTM proposed in this paper fits the original GHI data better. To show the prediction performance of each model more clearly, the region inside the blue dashed box in the figure has been enlarged. Figure 5 shows that the model proposed in this paper has significant advantages in both the overall prediction effect and the local detailed prediction effect. The model proposed in this paper accurately predicts the fluctuation of data at the three forecast horizons. The prediction effect of each model shows that adding data processing strategies to irradiance prediction effectively improves the prediction accuracy. For example, among the comparison models in this article, those that incorporate CNN or WT obtain more accurate prediction results than the traditional single model.
Both the evaluation indexes and the fitting curves demonstrate the superiority of the model proposed in this paper. The fitting results in the figure show that a single LSTM or BiLSTM model has difficulty processing such complex irradiance data. This is because a single neural network with a simple structure cannot learn deeper data features and reaches a performance bottleneck when predicting complex data. The results indicate that most of the improvement in prediction performance comes from the parallel learning of multiple channels. Multiple channels learn features of different depths, and compared with single-channel learning, these deeper features lead to more accurate predictions. At the same time, the bidirectional learning ability of BiLSTM enables the model to learn sequence features from two directions. For sequences such as irradiance, in which earlier and later values are correlated, BiLSTM is more practical than LSTM. Wavelet decomposition also makes a great contribution, since its ability to reduce data complexity improves the predictive ability of neural networks. Its disadvantage is that decomposing the waveform greatly increases the amount of training data and significantly increases the training time.
A plot of predicted against measured values for the 60 min prediction is shown for further evaluation of the prediction performance of the MC-WT-CBiLSTM model (Figure 6). The horizontal axis in the figure represents the real data, and the vertical axis represents the predicted data. The distribution of the forecast data in the figure shows that the results of the model proposed in this paper lie closer to the ideal straight line. This tight distribution indicates that the predicted results are closer to the true values.

Conclusion and Discussion
Solar irradiance prediction adopting AI and IoT technologies is of great importance for smart grid and city designs. In this study, considering the nonstationarity and nonlinearity of GHI data, a multichannel multimodel fusion framework, MC-WT-CBiLSTM, is proposed for accurate and effective solar irradiance prediction on the edge using edge computing and IoT technologies. A multichannel hybrid network model combining CNN and BiLSTM is proposed, and the wavelet decomposition strategy is selected to process the irradiance data. The experiment uses the comprehensive solar irradiance data set released by Pedro et al. in 2019. A comprehensive comparison with a variety of advanced deep models demonstrates the effectiveness of the MC-WT-CBiLSTM model. The comparison across multiple forecast horizons shows that the proposed DL model outperforms the existing approaches. The plot in Figure 6 shows that the prediction method proposed in this article has a smaller prediction error. The results of the comparative experiments show that each of the methods combined in the MC-WT-CBiLSTM model contributes to improving the prediction ability.
The experiment takes into account the internal correlation between temperature data and GHI. At the same time, multichannel parallel learning enables the model to learn more data features. The following conclusions are drawn from the forecasting method of this article. First, for complex and nonstationary data, the waveform decomposition strategy is an effective way to reduce the complexity of the data. Moreover, one-dimensional convolution has excellent feature extraction capabilities and achieves good feature extraction effects in the prediction of highly volatile time-series data. As a variant of LSTM, BiLSTM is widely used in the field of NLP, mainly due to its bidirectional learning ability. For irradiance data with a certain periodicity, it has an excellent predictive effect.
A future direction of this study is to add more features in order to make more complex predictions. In addition, the generalization ability of most current forecasting methods in the literature is poor, and good results are achieved only in a narrow range of settings. The next step is to improve the model in this paper and its generalization so that it can be applied to more time-series forecasting fields.

Data Availability
The data used in this study is confidential.

Conflicts of Interest
The authors declare that there is no competing interest.