Enhanced Deep Belief Network Based on Ensemble Learning and Tree-Structured of Parzen Estimators: An Optimal Photovoltaic Power Forecasting Method

The random fluctuation and non-uniformity of Photovoltaic (PV) power generation greatly affect the power grids’ stability and operation. This paper addresses the high volatility of PV power by proposing a precise and reliable ensemble learning model for short-term PV power generation forecasting. The proposed forecasting tool incorporates a base model and meta-model layers. The first-layer base learner combines extreme learning machines, extremely randomized trees, k-nearest neighbor, and mondrian forest models. The meta-model layer exploits deep belief network to generate the final outputs. The hyper-parameters of the proposed stacking ensemble are carefully tuned using the tree-structured of parzen estimators algorithm to achieve top-notch predictive performance. The proposed model is thoroughly assessed through an empirical study using a real data set from Australia. The simulation results confirm the performance superiority of the proposed model over the existing forecasting models with the lowest average root mean square error and mean absolute percentage error of 3.88kW and 2.30%, respectively.


I. INTRODUCTION
With the rapid growth of Photovoltaic (PV) capacity, PV Power Forecasting (PVPF) presents an effective solution to cope with the unexpected changes of weather conditions [1]. The PV forecasts allow the compensation of the deficit in PV power generation from alternative sources. A precise expectation for future PV power generation guarantees a secure and effective system commitment by improving the energy stability and ensuring the grid's reliability [2]. More precisely, the PVPF copes with the weather outliers and provides information integrity to customers and energy suppliers. In addition, accurate forecasting of PV power output is crucial for energy control and management in smart grid systems, especially when the well-known concept adopted by customers is the ''fit and forget'' approach [1]. Therefore, PVPF overcomes the lack of coordination between the load and its variant suppliers. The conditional hierarchical relations between heterogeneous generation sources and demand need an accurate forecasting model to prevent blackouts and system failures [1]. However, the intermittency and randomness of meteorological conditions pose great challenges to the accuracy of PV power production forecasts, especially during rainy, cloudy, or extreme weather conditions [3]. Consequently, PVPF remains in a theoretical exploration phase due to the unsatisfactory performance in particular case scenarios where the Root Mean Squared Error (RMSE) can exceed 50% [4]. Meanwhile, more sophisticated forecasting methods are highly needed to meet the technical requirements of actual PV plants.
Recently, PVPF has become an attractive research area for scientists and engineers. Meteorological Data-Driven Approach (MDDA) is one of the commonly used approaches in PVPF. This approach deals with the weather parameters having a direct impact on the PV power production. For a typical forecasting system, the database collection has a major impact on the forecasting quality [5], [6]. Moreover, the forecasting horizon significantly impacts the quality of the prediction [7], [8]. According to the cut-off time horizon, the data-driven methods are categorized into short, medium, and long-term predictions [2]. The Short-Term PVPF (STPF) ranges from minutes to hours, the medium horizon from hours to a few weeks, while the long-term prediction is implemented up to years ahead. Artificial intelligence methods were frequently adopted to cope with the stochastic interaction of PV systems with their external environment. A recent selection of these methods is listed in Table 1 [5], [9]- [20].
In [5], a Weighted Gaussian Process Regression (WGPR) approach has been proposed to alleviate the negative impact of outliers on the PVPF accuracy. In this approach, samples that are more likely to be outliers receive a lower weight. From the experimental results, the proposed method shows a slight improvement in terms of RMSE compared to standard Gaussian Process Regression (GPR). However, the joint distribution of input features needs to be calculated, which may lead to poor performance for multidimensional input data. A deep Residual Network (ResNet) and a Dense convolutional Network (DenseNet) have been proposed in [20]. In these architectures, shortcut connections were utilized to skip one or more layers while preventing the learning degradation problem. It has been found that ResNet achieves higher performance than DenseNet, with a coverage error rate of less than 1%. Nonetheless, the major limitation of these deep networks is the long training time and the heavy computational requirements, since the complexity grows with the network depth. In [18], a Support Vector Machine (SVM)-based Ant Colony Optimization (ACO) approach has been proposed for PVPF. Although the latter approach achieved excellent performance with a coefficient of determination ($R^2$) of 0.997, its scalability to large-scale problems has not been addressed. Besides, a comparison of the optimization method with other up-to-date metaheuristic methods is missing. Some scholars incorporate the persistence model, Auto-encoder, and Long Short-Term Memory (LSTM) into the forecasting process of PV power generation [15].
Nonetheless, the training time of LSTM is much longer than that of other algorithms. Reducing the training time under the premise of ensuring high accuracy is still a challenge worth studying. Paper [12] has adopted a solartime-based analog ensemble for regional PVPF. In this specific design, six forecasting engines were fully utilized to generate the PV installed capacity output, leading to an impressive performance with a Mean Absolute Error (MAE) of 54.82MWh. From the employed forecasting engines, LSTM and Convolutional Neural Network (CNN) were incorporated, which could entail excessive parameters and high calculation costs. Unfortunately, the practical application of this design may require big data solutions to counter the additional computational complexity and meet the technical requirements. Besides, the hyperparameter optimization of these algorithms is overlooked. In [11], a novel long-term PVPF assembled by fusing a hybrid feature selection approach and stacking ensemble model is explored. However, the proposed method ignores the impact of historical trends on the future PV power output. Authors in literature [19] exploit the LSTM model based on time correlation modification for PVPF. The LSTM model exhibits the best performance among benchmarks in solving non-linear and time-varying problems.
Nevertheless, a comparison with emerging powerful Deep Learning (DL) techniques such as the Deep Belief Network (DBN) is not considered, making the competitiveness of the depicted approach with top-level DL models questionable. In a similar vein, paper [16] adopted an LSTM architecture associated with copula function-based feature extraction to handle mid-to-long term PVPF. The proposed model efficiently extracts the relevant weather features and reaches a low Mean Absolute Percentage Error (MAPE) of 5.95%, owing to the strong ability of LSTM to characterize the dependence relationships in time-series data. It was pointed out that meteorological feature extraction is highly recommended for capturing long-term dependencies. However, the model inputs of the data set used are limited to five, which raises the question of how useful feature extraction is in such a reduced dimension. The authors in [10] applied a CNN model for day-ahead PV power forecasting. However, the proposed model lacks the capability to model the complex temporal characteristics of load series, as reflected by an unsatisfactory RMSE of 163.15 W.
The authors in [21] implemented an Extreme Learning Machine (ELM) for PVPF. The ELM architecture provides an effective selection of random nodes to determine the output weights, which protects the system from slow gradient-based learning. To validate the competitiveness and feasibility of the proposed forecasting scheme, SVM and ANN were chosen as the reference models. From the error measures, it has been found that ELM provides more precise forecasting results at the hourly forecasting horizon, with a MAPE of 3.56%, compared to SVM and ANN with MAPE values of 4.56% and 5.41%, respectively. Besides, the training time of ELM was the lowest at 0.34 seconds, compared to ANN (0.74 seconds) and SVM (2.91 seconds). However, the ELM model still has room to improve in terms of accuracy and efficiency, since it faces a complex computational issue when the number of hidden nodes is large. A recent work [22] deploys a new model named Expanded ELM (EELM) for PV power forecasting. The proposed EELM extends the original ELM with an automatic selection of the hidden layer number and random input weights. Although the proposed model outperforms ELM and the Functional Link Neural Network (FLNN), the higher extrapolation capabilities of EELM have only been demonstrated for a forecasting horizon of less than 1 hour. The authors in [23] proposed a Kernel Extreme Learning Machine (KELM) model to predict daily global solar radiation. The supremacy of the proposed method is verified using different optimal penalty parameters and kernel widths. However, the proposed KELM does not have a high generalization capability, and its performance has been proved only on a limited set of problems. The authors in [24] implemented a Stacked ELM (S-ELM) model for Time Series (TS) prediction. Despite the superior performance of the S-ELM, it was concluded that S-ELM faces a heavy computational burden compared to the traditional ELM.
To the best of the authors' knowledge, Mondrian Forest (MF) has never been implemented for PV and solar forecasting purposes yet [25].
In summary, based on the above-mentioned research works, the effectiveness of PVPF can be further enhanced. The main contributions of this paper are as follows:
• An effective ensemble learning-based stacked generalization approach combined with an intelligent optimizer is proposed for the first time. The Enhanced DBN (EDBN) model effectively creates high-level abstractions by leveraging meta-data solutions.
• A potential application of the proposed model is appraised for PV power forecasting and assessed through a real-data set. The proposed method demonstrates a high extrapolation capability for STPF application. The assessment of the proposed model has been conducted using score metrics and a comparative study with multiple benchmark models.
• The Tree-structured of Parzen Estimators (TPE) algorithm is employed for Hyperparameter (HP) tuning of the machine learning and DL models. It is quite challenging to decide the initial HP values. The TPE optimizer has provided an efficient automatic HP selection based on simulation results.
The remaining parts of this paper are organized as follows. In section II, a comprehensive overview of the adopted methodologies and the proposed method is presented. Section III assesses the proposed method for daily PV power generation using three real data sets. Finally, section IV concludes this study.

II. METHODOLOGIES AND PROPOSED APPROACH
This section briefly describes five ML models employed in this research work, including MF, ELM, KNN, DBN, and Extremely Randomized Trees (ET). The cited methods have been optimized based on TPE. Furthermore, the proposed approach is comprehensively described.

A. MONDRIAN FOREST
Recently, the MF model has been introduced as an enhanced version of Random Forest (RF). For the conventional RF, let $(\hat{f}_n)_{n \geq 1}$ be a randomized estimator, $x \in [0, 1]^d$ the query point, $\{\Pi^{(m)}, m \in \{1, \ldots, M\}\}$ the random partitions of $[0, 1]^d$, and $\hat{f}_n(x, \Pi^{(m)})$ the prediction output for the $m$-th partition. The RF prediction is computed by [26]:

$$\hat{f}_n^{(M)}(x) = \frac{1}{M} \sum_{m=1}^{M} \hat{f}_n\left(x, \Pi^{(m)}\right) \tag{1}$$

In order to avoid the inconsistency and complexity of RF, a Mondrian Process (MP) has been applied for MF in a scaled time domain. A family of distributions $\{\mathrm{MF}_t, t \in [0, \infty)\}$ makes hierarchical replications $\{\mathrm{MF}_s, s \in [0, \infty)\}$ with an accuracy enhancement for each $s > t$. The $\mathrm{MF}_s$ are shaped recursively according to an improved probability distribution and the hierarchical Bayesian prior on the leaf parameters. For each distribution, the nodes are updated at each timestamp following the conditional Mondrian algorithm. For the sake of conciseness, MF is explained from a mathematical perspective. From (1), the random partitions are drawn with timesteps $\lambda$ from the MP distribution $\mathrm{MP}(\lambda, [0, 1]^d)$. The MF parametric equation is defined as [26]:

$$\hat{f}_{\lambda,n}^{(M)}(x) = \frac{1}{M} \sum_{m=1}^{M} \hat{f}_n\left(x, \Pi_{\lambda}^{(m)}\right), \qquad \Pi_{\lambda}^{(m)} \sim \mathrm{MP}\left(\lambda, [0, 1]^d\right)$$

Fig. 1 presents the global structure of the MF algorithm, in which incremental learning proceeds over time. As shown in Fig. 1, each node is split at a specific timestep $\lambda$. MF places no limitation on continuous learning. MF also falls back to the prior mean and variance for samples far away from the training set.

B. EXTREME LEARNING MACHINE
The ELM is a learning model for generalized Single-hidden Layer Feedforward neural Networks (SLFNs) [27]. The ELM learns quickly and at a lower computational cost than the back-propagation algorithm and the Levenberg-Marquardt algorithm. Unlike slow gradient-based algorithms for neural networks, this algorithm's hidden weights and bias parameters are randomly selected, and the output weights are analytically computed.
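The training scheme just described (random, untrained hidden parameters and analytically computed output weights) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the `tanh` activation, the hidden-layer size, the ridge value, the synthetic sine data, and the names `elm_fit` and `elm_predict` are all assumptions made for illustration. The output weights are obtained with a ridge-regularised least-squares solve, which is one common way to realise the closed-form step.

```python
import numpy as np


def elm_fit(X, T, n_hidden=30, ridge=1e-3, seed=0):
    """Fit a single-hidden-layer ELM and return (W, c, beta)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    c = rng.normal(size=n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + c)                       # hidden-layer output matrix
    # Output weights in closed form (ridge-regularised least squares)
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)
    return W, c, beta


def elm_predict(X, W, c, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + c) @ beta


# Toy regression problem (synthetic, not PV data)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
T = np.sin(3.0 * X[:, 0])
W, c, beta = elm_fit(X, T)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, c, beta) - T) ** 2)))
```

Because only `beta` is learned, training reduces to a single linear solve, which is where the speed advantage over iterative gradient-based training comes from.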
The ELM learning process targets minimizing the training error in tandem with the smallest norm of output weights as follows [27]:

$$\min_{\beta} \; \|H\beta - T\|^2 \quad \text{and} \quad \|\beta\|$$

where $T$ denotes the training data-target matrix and $H$ is the hidden-layer output matrix, written for training inputs $x_1, \ldots, x_K$ as [28]:

$$H = \begin{bmatrix} g_1(w_1 \cdot x_1 + c_1) & \cdots & g_N(w_N \cdot x_1 + c_N) \\ \vdots & \ddots & \vdots \\ g_1(w_1 \cdot x_K + c_1) & \cdots & g_N(w_N \cdot x_K + c_N) \end{bmatrix}$$

The ELM mechanism consists of choosing the hidden perceptrons and calculating the output weights of the Single-hidden Layer Feedforward neural Networks (SLFNs). The ELM model with the activation function $g_i(\cdot)$ for the $i$-th hidden node and $N$ hidden nodes is presented as [27]:

$$\sum_{i=1}^{N} \beta_i \, g_i(w_i \cdot x_j + c_i) = t_j, \qquad j = 1, \ldots, K \tag{5}$$

where $w_i$ is the weight between the hidden nodes and the input nodes, $\beta_i$ denotes the output weights, $c_i$ is the bias, and $t_j$ is the target of the $j$-th training sample. This algorithm aims to get a faster training process with a minimum norm of output weights. Eq. 5 can be simplified as [27]:

$$H\beta = T$$

Since the hidden nodes' parameters are set randomly, the output weight vector is calculated by a simple multiplication of the training data-target matrix and the Moore-Penrose generalized inverse of $H$, denoted as $H^{\dagger}$ [27]:

$$\beta = H^{\dagger} T$$

The orthogonal projection method is successfully employed for the calculation of the generalized inverse:

$$H^{\dagger} = H^{T} \left(H H^{T}\right)^{-1}$$

if $HH^{T}$ is non-singular. Following the ridge regression theory, it is recommended to add a positive value $1/\lambda$ to the diagonal of $HH^{T}$. Thus, with $h(x)$ denoting the hidden-layer output for the input $x$, the corresponding output function $f(x)$ is given as follows [29]:

$$f(x) = h(x)\beta = h(x) H^{T} \left(\frac{I}{\lambda} + H H^{T}\right)^{-1} T$$

C. K-NEAREST NEIGHBOR
The traditional KNN algorithm is an instance-based supervised learning method that finds the $k$ training samples closest to the target object based on the available input patterns. The traditional KNN implementation is characterized by high simplicity and generalization potential for regression and classification problems. The Euclidean distance $D(X, Y)$ is often used in the standard KNN to measure the distance between two points $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$, given by [30]:

$$D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where $n$ denotes the number of features.
From the Euclidean distance calculation, only saving the input representations is required, and the prediction is achieved locally based on some near patterns. Despite being a mature algorithm with wide use, the traditional KNN has several shortcomings, including high sensitivity to the local structure of the data, large memory requirements, poor performance with unbalanced data, and ineffectiveness with large data sets. Therefore, several research papers have been focused on tackling these problems, including the Weighted KNN, Mixed KNN, and fuzzy KNN [31]- [33].
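As an illustration of the instance-based prediction described above, the following pure-Python sketch stores the training samples as they are and averages the targets of the $k$ nearest ones under the Euclidean distance. The plain average (no distance weighting), the toy data, and the names `euclidean` and `knn_predict` are assumptions chosen for brevity.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training samples closest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy data: three samples near the origin and one far away
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [10.0, 12.0, 14.0, 100.0]
estimate = knn_predict(train_X, train_y, (0.2, 0.2), k=3)  # mean of 10, 12, 14
```

The full sort over all stored samples for every query makes the memory and run-time cost on large data sets, noted above, directly visible.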

D. EXTREMELY RANDOMIZED TREES
The ET is an ensemble of unpruned Decision Trees (DT) that promotes tree diversity based on the classical top-down procedure [34]. The ET employs the whole set of training examples to build each tree in the ensemble [34]. The ET model splits the learning nodes by selecting cut-points fully at random while growing these trees. This process makes the ET model more computationally efficient than other tree-based ensemble methods [34]. It also reduces the variance of the model and prevents training overfitting [34]. Empirically, given a training data set $X = \{x_1, x_2, \ldots, x_N\}$, where each sample $x_i = (f_1, f_2, \ldots, f_D)$ denotes a $D$-dimensional vector with $f_j$ as the inputs and $j \in \{1, 2, \ldots, D\}$, ET generates $M$ individual DTs. Here, $S_p$ denotes the subset of the training data set $X$ at child node $p$. At each node $p$, the ET model selects the best split based on $S_p$ and a randomized subgroup of the inputs. Gini impurity is employed as a score function to select the best split rule. In each child node, the iterations are repeated until the minimum number of samples required to split is reached, or until all the samples in subset $S_p$ have an identical label. Each leaf node is represented by the label of the samples in subset $S_p$.
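The node-splitting rule described above can be sketched for a single node as follows. This is a simplified illustration, not the authors' code: one cut-point is drawn uniformly at random for each feature in a random subgroup, and the candidate with the lowest weighted Gini impurity is kept. The function names, the toy labels, and the choice of one cut-point per feature are assumptions.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def random_split(X, y, n_candidates, rng):
    """Draw one random cut-point per randomly chosen feature; keep the best by Gini."""
    best = None
    for j in rng.sample(range(len(X[0])), n_candidates):
        lo = min(row[j] for row in X)
        hi = max(row[j] for row in X)
        if lo == hi:
            continue  # constant feature, no split possible
        cut = rng.uniform(lo, hi)  # cut-point chosen at random, not optimised
        left = [t for row, t in zip(X, y) if row[j] < cut]
        right = [t for row, t in zip(X, y) if row[j] >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, j, cut)
    return best  # (weighted impurity, feature index, cut-point) or None


# Toy node: feature 0 separates the labels, feature 1 is constant
X = [(0.1, 1.0), (0.2, 1.0), (0.8, 1.0), (0.9, 1.0)]
y = ["low", "low", "high", "high"]
best_split = random_split(X, y, n_candidates=2, rng=random.Random(0))
```

Skipping the exhaustive search over thresholds is what makes each split cheap; averaging many such randomized trees is what brings the variance down.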

E. DEEP BELIEF NETWORK
The DBN is a variant of the Boltzmann Machine (BM) model formed by $h^0$ and $L$ computational layers $h^i$ $(i = 1, 2, \ldots, L)$ [35]. Every layer $h^i$ is a Restricted Boltzmann Machine (RBM), denoted $\mathrm{RBM}_i$. The RBM contains a visible layer with visible neurons denoted $v \in \{0, 1\}^{g_v}$ and a hidden layer containing hidden neural units denoted $h \in \{0, 1\}^{g_h}$, where $g_v$ and $g_h$ are the numbers of visible and hidden units, respectively [35]. As depicted in Fig. 2(a), the standard BM has a limited potential for extracting discriminant information from the input data layer by layer. Thus, the RBM is employed as an advanced and more straightforward form of BM, as there is no linkage between nodes in the same layer, while the nodes of the visible layer and the hidden layer are fully connected, as shown in Fig. 2(b). Cascaded RBMs serve as the core components of the DBN, as illustrated in Fig. 2(c).
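The RBM belongs to the energy-based models, and the energy function of the visible layer $v = (v_i)_n$ and the hidden layer $h = (h_j)_m$ is computed as [36]:

$$E(v, h) = \sum_{i=1}^{n} \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{v_i}{\sigma_i} W_{ij} h_j$$

where $W_{ij}$ and $\sigma_i$ are the weight between visible and hidden units and the Gaussian standard deviation of the visible units, respectively, and $a_i$ and $b_j$ are the bias terms. The Gaussian unit-based RBM is simplified for practical use as [35]:

$$E(v, h) = \sum_{i=1}^{n} \frac{(v_i - a_i)^2}{2\delta_i^2} + \sum_{j=1}^{m} \frac{(h_j - b_j)^2}{2\gamma_j^2} - \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{v_i}{\delta_i} \frac{h_j}{\gamma_j} W_{ij} \tag{11}$$

where $\delta_i$ and $\gamma_j$ are the standard deviations of the Gaussian noise of the visible unit $i$ and the hidden unit $j$, respectively. The joint distribution of the visible and hidden units is defined by [37]:

$$P(v, h) = \frac{e^{-E(v, h)}}{\aleph}$$

where $\aleph$ denotes a normalizing constant obtained by summing $e^{-E(v, h)}$ over all pairs of visible and hidden vectors. $\aleph$ is calculated as [35]:

$$\aleph = \sum_{v} \sum_{h} e^{-E(v, h)}$$

The marginal distributions of the visible and the hidden layer are defined as [36]:

$$P(v) = \frac{1}{\aleph} \sum_{h} e^{-E(v, h)}, \qquad P(h) = \frac{1}{\aleph} \sum_{v} e^{-E(v, h)}$$

At each $\mathrm{RBM}_i$ with $i \in \{1, 2, \ldots, L\}$, whose visible layer is $h^{i-1}$ and whose hidden layer is $h^i$, the conditional activation probabilities of the neurons in the hidden and visible layers can be expressed as [37]:

$$P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_{i} W_{ij} v_i\right), \qquad P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_{j} W_{ij} h_j\right)$$

where $\sigma$ denotes the sigmoid function formulated as [38]:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The RBM training requires learning the model parameters that maximize the log-likelihood of the probabilistic distribution of the visible training set, following [37]:

$$\theta^{*} = \arg\max_{\theta} \sum_{t=1}^{T} \log P(v^{t})$$

where $T$ denotes the number of samples in the training set $D$ and $v^{t}$ is the $t$-th training sample. For a representative sample $v^{t}$, differentiating the log-likelihood of $P(v^{t})$ with regard to $\theta$ gives [37]:

$$\frac{\partial \log P(v^{t})}{\partial \theta} = -\mathbb{E}_{P(h \mid v^{t})}\left[\frac{\partial E(v^{t}, h)}{\partial \theta}\right] + \mathbb{E}_{P(v, h)}\left[\frac{\partial E(v, h)}{\partial \theta}\right] \tag{20}$$

where $\theta = \{W_{ij}, a_i, b_j\}$ is the network parameter set. $\mathbb{E}_{P(h \mid v^{t})}$ and $\mathbb{E}_{P(v, h)}$ denote the expectations of the gradient function under the distributions specified by the data and the model, respectively. Due to the high computational burden of solving Eq. 20, the Contrastive Divergence (CD) algorithm is employed to approximate the joint distribution of $v$ and $h$ [37].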
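To make the role of Contrastive Divergence more concrete, the sketch below performs CD-1 updates on a small binary RBM with NumPy. It is a hedged illustration under simplifying assumptions: binary (not Gaussian) visible units, a single Gibbs step, a fixed learning rate, toy patterns, and the hypothetical function name `cd1_step`. The data-driven term of the gradient is computed from the training batch, and the intractable model term is replaced by statistics from a one-step reconstruction.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(V, W, a, b, lr, rng):
    """One CD-1 update on a batch V (rows are samples) of a binary RBM."""
    # Positive phase: hidden probabilities driven by the data
    ph_data = sigmoid(V @ W + b)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and up again
    pv_model = sigmoid(h_sample @ W.T + a)
    ph_model = sigmoid(pv_model @ W + b)
    n = V.shape[0]
    # Data statistics minus reconstruction statistics
    W = W + lr * (V.T @ ph_data - pv_model.T @ ph_model) / n
    a = a + lr * (V - pv_model).mean(axis=0)
    b = b + lr * (ph_data - ph_model).mean(axis=0)
    recon_error = float(np.mean((V - pv_model) ** 2))
    return W, a, b, recon_error


# Toy batch: two binary patterns repeated (4 visible units, 3 hidden units)
rng = np.random.default_rng(0)
V = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 10, dtype=float)
W = 0.01 * rng.normal(size=(4, 3))
a = np.zeros(4)
b = np.zeros(3)
for _ in range(100):
    W, a, b, recon_error = cd1_step(V, W, a, b, lr=0.1, rng=rng)
```

In a DBN, the hidden activations of one trained RBM would serve as the visible input of the next one; that stacking step is omitted here.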

F. TREE-STRUCTURED OF PARZEN ESTIMATORS
The HP optimization is a tedious task for ML methods [39]. To remedy this, the TPE algorithm is presented as a Sequential Model-Based Optimization (SMBO) approach that effectively handles categorical parameters (such as date or weather type) and conditional parameters (such as learning algorithm and learning rate) [40]. The TPE algorithm transforms the configuration space into a non-parametric Parzen-window density estimation with a function evaluation $f(\theta)$ [41]. The configuration space can be modeled by a uniform distribution, a discrete uniform distribution, or a logarithmic uniform distribution [42]. Hence, this variety of configurations contributes to the flexibility of the TPE compared to the standard Bayesian Optimization (BO) [42]. For the iterative process, the TPE approximates $f(\theta)$ in a configuration space that is probabilistically supervised and limited according to the observation history [41]. The Expected Improvement (EI) criterion is employed to locate the best HP $\theta^{*}$ from the search space [41]. The algorithm defines the probability distribution $p(\theta \mid y)$ by splitting the configuration space into good and bad samples as [41]:

$$p(\theta \mid y) = \begin{cases} \Pr_{good}(\theta), & y < y^{*} \\ \Pr_{bad}(\theta), & y > y^{*} \end{cases}$$

where $\Pr_{good}(\theta)$ and $\Pr_{bad}(\theta)$ are Parzen estimators of the densities formed by the observations $\theta_i$ such that $f(\theta_i)$ is less than and greater than $y^{*}$, respectively. $y < y^{*}$ indicates that the value of the objective function is less than the threshold, and $y > y^{*}$ denotes that the value of the objective function is higher than the threshold. The optimal HP value $\theta^{*}$ is formulated as [41]:

$$\theta^{*} = \arg\max_{\theta} \frac{\Pr_{good}(\theta)}{\Pr_{bad}(\theta)}$$

To better describe the optimization procedure of TPE, Fig. 3 presents the flow chart of the iterative steps.
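The good/bad split and the density-ratio selection described above can be sketched for a single continuous hyperparameter as follows. This is a deliberately simplified, pure-Python illustration and not the optimizer used in the study: the Gaussian Parzen windows with a fixed bandwidth, the quantile `gamma` that sets the threshold, the number of candidates, the toy objective, and the name `tpe_minimize` are all assumptions.

```python
import math
import random


def parzen_density(theta, centers, bandwidth):
    """Gaussian Parzen-window density estimate at `theta`."""
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((theta - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_minimize(f, low, high, n_iter=30, n_startup=10, gamma=0.25,
                 n_candidates=24, seed=0):
    """Minimise f over [low, high]; returns (best_theta, best_value)."""
    rng = random.Random(seed)
    history = [(t, f(t)) for t in (rng.uniform(low, high) for _ in range(n_startup))]
    bandwidth = (high - low) / 10.0
    for _ in range(n_iter):
        history.sort(key=lambda pair: pair[1])
        n_good = max(1, int(gamma * len(history)))
        good = [t for t, _ in history[:n_good]]   # observations below the threshold
        bad = [t for t, _ in history[n_good:]]    # observations above the threshold
        # Draw candidates from the "good" density, clipped to the search range
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
            for _ in range(n_candidates)
        ]
        # Keep the candidate with the largest good/bad density ratio
        chosen = max(
            candidates,
            key=lambda t: parzen_density(t, good, bandwidth)
            / (parzen_density(t, bad, bandwidth) + 1e-12),
        )
        history.append((chosen, f(chosen)))
    return min(history, key=lambda pair: pair[1])


def objective(theta):
    """Toy objective with its minimum at theta = 2 (stand-in for a validation error)."""
    return (theta - 2.0) ** 2


theta_best, y_best = tpe_minimize(objective, -5.0, 5.0)
```

In practice the objective would be a validation error returned by training a model with the candidate hyperparameters, and categorical or conditional parameters would need their own density models.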

G. PROPOSED APPROACH
To avoid the insufficiency of standalone prediction models, stacked generalization, also called stacking, is proposed as a heterogeneous integration strategy that comprises multiple base learners [43]. The stacking approach efficiently achieves an excellent non-linear fitting ability based on the diversity principle of base learners [44]. In this two-tier framework, the models in the first layer (or level-0), named the base estimators, employ the same target function to obtain the best hypotheses. Then, the second layer (or level-1), named the meta-estimator, manages the balance among the hypotheses obtained from level-0 and makes the final decision. The proposed model is a combination of MF, ELM, ET, KNN, and DBN networks to strengthen the learning effect of standalone models. The stacked scheme runs the individual base models (first-level learners) to train the meta-learner (second-level learner). The complete model architecture is shown in Fig. 4. As seen in Fig. 4, six modules are consecutively computed, namely, the data preprocessing, data split, optimization, model construction, results generation, and assessment modules. In the data preprocessing module, the data collected from the data set is cleaned of inconsistent measurements, missing data points, and outliers using the scikit-learn Python library to avoid false interpretations afterwards. Next, data encoding, correlation analysis, and data normalization take place to prepare the time-series data to be fed into the stacking fusion model. The normalized feature representations are passed to the data split module, where the data is separated into training, validation, and testing sets. In the third module, the predictors' HP are tuned using the TPE optimizer to boost the overall model accuracy. In the model construction module, MF, ELM, ET, and KNN are trained by applying five-fold cross-validation in the first level of the stacked generalization strategy to promote predictor diversity.
A series of four meta-features of base learners were merged to construct new inputs. Then, the new inputs are fed to the DBN meta-learner model. The meta-learner exploits the new features to make the final forecasting. The prediction quality is assessed in the evaluation module. The evaluation module assesses the model's applicability in real-world PV plants to justify its practical utility compared to benchmarks. The flow chart of the adopted framework is demonstrated in Fig. 5.
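The way cross-validated base-learner outputs become the inputs of a meta-learner can be sketched as below. This is an illustrative outline under stated assumptions rather than the paper's pipeline: the two toy learners (`fit_mean`, `fit_nearest`) merely stand in for MF, ELM, ET, and KNN, the folds are contiguous, and the DBN meta-learner itself is not implemented. Each meta-feature is predicted only by a model that never saw the corresponding sample during training.

```python
def kfold_indices(n_samples, n_folds=5):
    """Split range(n_samples) into contiguous folds of near-equal size."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_features(base_learners, X, y, n_folds=5):
    """Return one meta-feature per base learner, predicted on held-out folds."""
    meta = [[0.0] * len(base_learners) for _ in X]
    for held_out in kfold_indices(len(X), n_folds):
        held = set(held_out)
        train_X = [x for i, x in enumerate(X) if i not in held]
        train_y = [t for i, t in enumerate(y) if i not in held]
        for col, fit in enumerate(base_learners):
            predict = fit(train_X, train_y)  # each fit returns a prediction function
            for i in held_out:
                meta[i][col] = predict(X[i])
    return meta


def fit_mean(train_X, train_y):
    """Toy base learner: always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


def fit_nearest(train_X, train_y):
    """Toy base learner: 1-nearest neighbour on a scalar feature."""
    def predict(x):
        i = min(range(len(train_X)), key=lambda k: abs(train_X[k] - x))
        return train_y[i]
    return predict


X = [float(i) for i in range(10)]
y = [2.0 * x for x in X]
meta_X = out_of_fold_features([fit_mean, fit_nearest], X, y, n_folds=5)
# meta_X (one column per base learner) would be the training input of the meta-learner
```

With four base learners, `meta_X` would have four columns, matching the four meta-features mentioned above.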

III. EXPERIMENTAL RESULTS
The evaluation of the recommended model is performed based on numerical test results, using real PV plant data traces. This section first presents the data description and preprocessing stage. Next, the evaluation measures are presented. Finally, the simulation results are thoroughly discussed. The learning environment is set up on Google Colaboratory, a free cloud service supported by Google, with the Graphics Processing Unit (GPU) enabled. The experimental simulations were carried out in the computational environment described in Table 2.

A. DATA DESCRIPTION AND PREPROCESSING
The evaluation process is conducted using a real-world data set to prove the predictive performance of the proposed model. Data from the Desert Knowledge Alice Springs Center (DKASC) in Central Australia was collected and employed for the validation process [45]. The DKASC flagship facilities of Alice Springs contain 38 sites. The DKASC has a semi-arid climate BWh according to the Köppen climate classification, exploiting 9% of the Northern Territory. This renewable energy plant is located in a town in the Northern Territory and is considered one of Australia's top producers of solar energy [45]. Meteorological data (global horizontal irradiance (W/m²), diffuse horizontal irradiance (W/m²), relative humidity (%), wind direction (°), sampling time (min), and temperature (°C)) and historical power data of the PV arrays (kW) from March 1, 2016, to December 1, 2019, at sampling intervals of five minutes have been employed for the numerical study. The meteorological factors and the one-year lagged PV power were used together as feature inputs to predict the PV power. The training, validation, and testing sets occupy 60%, 10%, and 30% of the total collected data, respectively, and the five-minute-ahead PV generation is the target.
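A minimal sketch of how such a five-minute-ahead supervised data set and its 60/10/30 split could be built is given below. The source states only the proportions and the forecasting target; keeping the samples in chronological order (no shuffling) is an assumption made here because it is the usual choice for time series, and the helper names and toy power values are hypothetical.

```python
def one_step_ahead_pairs(features, power):
    """Pair the features at step t with the power measured at step t + 1."""
    return list(zip(features[:-1], power[1:]))


def chronological_split(samples, train_frac=0.6, val_frac=0.1):
    """Split time-ordered samples into train/validation/test without shuffling."""
    n_train = round(len(samples) * train_frac)
    n_val = round(len(samples) * val_frac)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


# Toy series sampled every five minutes (values are illustrative only)
power_kw = [float(p) for p in range(21)]
features = [(p,) for p in power_kw]  # real features would include the weather variables
pairs = one_step_ahead_pairs(features, power_kw)
train, val, test = chronological_split(pairs)
```

Splitting after the pairs are formed keeps each target in the same subset as its features.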
For data cleaning purposes, invalid observation samples such as Not a Number (NaN) or unrealistic values must be removed from the database so that they do not skew model training and so that the collected data gives a reliable picture. The short timestep (5 minutes) makes the data clearer and dirty samples easier to detect. To ensure the consistency of the features' dimensions, the data samples have been linearly scaled into [0, 1] following the Min-Max normalization approach [46]. This feature scaling approach boosts the model convergence as [46]:

$$x_n = \frac{x_r - x_{min}}{x_{max} - x_{min}}$$

where $x_n$ is the normalized (scaled) value, and $x_r$, $x_{max}$, and $x_{min}$ are the raw datum, the maximum, and the minimum, respectively.
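The Min-Max scaling into [0, 1] and its inverse can be written in a few lines of Python. This is a small sketch with hypothetical helper names and toy power values; estimating the minimum and maximum on the training portion only is a common precaution that the source does not state.

```python
def minmax_fit(values):
    """Return the (minimum, maximum) used for scaling."""
    return min(values), max(values)


def minmax_scale(values, x_min, x_max):
    """Linearly map values from [x_min, x_max] to [0, 1]."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("constant feature: min and max are equal")
    return [(v - x_min) / span for v in values]


def minmax_inverse(scaled, x_min, x_max):
    """Map scaled values back to the original range."""
    span = x_max - x_min
    return [s * span + x_min for s in scaled]


power_kw = [0.0, 50.0, 125.0, 200.0]
x_min, x_max = minmax_fit(power_kw)
scaled = minmax_scale(power_kw, x_min, x_max)    # [0.0, 0.25, 0.625, 1.0]
restored = minmax_inverse(scaled, x_min, x_max)  # back to kW
```

The inverse mapping is what converts normalized forecasts back to physical units before errors in kW are reported.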

B. EVALUATION CRITERIA
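To perform a fair assessment of the investigated prediction models, the deterministic forecasting performance of EDBN is assessed by the RMSE, MAE, MAPE, and $R^2$ metrics, owing to their common applicability in TS forecasting problems and wide acceptability within the PV forecasting research community [47]. Mathematically, the score metrics are formulated as [46]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $n$ denotes the total number of samples, $\hat{y}_i$ and $y_i$ denote the $i$-th forecast and the actual value, respectively, and $\bar{y}$ is the mean of the actual values.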
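For reference, the four score metrics can be computed with the standard library alone. The sketch below uses their usual textbook definitions; the function names and the toy actual/forecast values are illustrative assumptions. Note that MAPE is undefined whenever an actual value is zero, which matters for PV power at night, so such samples would need to be excluded or handled separately.

```python
import math


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def r2(actual, forecast):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy values in kW (not results from the paper)
actual = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 390.0]
scores = {
    "RMSE": rmse(actual, forecast),
    "MAE": mae(actual, forecast),
    "MAPE": mape(actual, forecast),
    "R2": r2(actual, forecast),
}
```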

C. RESULTS AND DISCUSSION
First, the improvement in PVPF brought by the stacking ensemble is evaluated by comparing the proposed model with its original counterpart models. These models include ELM, KNN, ET, DBN, MF, and EDBN. In this stage, four weather types were considered for the assessment procedure, namely, sunny, partially cloudy, foggy, and rainy. Then, the evaluation procedure is extended to compare the proposed model with popular single DL models. These models include Multi-Layer Perceptron (MLP), Gated Recurrent Unit (GRU), Bidirectional Long Short-Term Memory (BiLSTM), and LSTM. Finally, the proposed model is compared to a list of hybrid methodologies presented in the recent literature [47]-[54]. For a reliable evaluation of the proposed model for 5-min ahead daily forecasting, the score errors of all models are computed under identical conditions. All simulated models undergo HP optimization using the TPE method. For the TPE implementation, the TPE split parameter is used to classify the explored positions into good and bad, while the Random Position Probability (RPP) is the probability that the TPE jumps to a random position in an iteration step. The TPE configuration adopted in this study consists of thirty iterations, a TPE split parameter of 0.5, and an RPP of 0.03. To enable a rigorous assessment of the ensemble model with respect to the individual models, the EDBN model adopts the optimal HP settings of its base and meta-learners so that all the simulated models receive the same level of tuning. Subsequently, some popular neural network benchmarks, such as MLP, LSTM, BiLSTM, and GRU, are compared. In these benchmark DL methods, the number of units, the hidden layer size, the activation function, and the optimizer function need to be optimized. It is worth noting that the adopted TPE model generates the optimal HP for the reference models used in this study.
The specific HP settings of the developed model and benchmarks are found in Table 3. The proposed model is tested for the different seasons of the year to evaluate the prediction performance of the obtained architecture over the possible meteorological conditions. The forecast results of the EDBN model are vividly visualized in Fig. 6. It is worth noting that the forecasting outputs are converted to the original range using the Max-Min denormalization approach.
As can be seen from the simulation results presented in Fig. 6, the proposed PVPF method performs well. The prediction curves show excellent agreement with the measured PV power, with most forecasted points lying close to the real values. This high accuracy indicates that the proposed model is well suited to PVPF. This assessment procedure investigates the impact of rapid changes in meteorological factors on the quality of the forecasts. For a more precise judgment, the error values of the EDBN model and the counterpart models (ELM, ET, KNN, and DBN models) are summarized in Table 4, where boldface error measures represent the optimal values in each situation. The EDBN consistently obtains better testing accuracy than the standalone models and achieves the best accuracy overall. The high performance of the proposed method is explained by the combination of the benefits of the model components. To further demonstrate the performance of the proposed model, the results are compared with those of MLP, LSTM, BiLSTM, and GRU. To visually display the predictive ability of the proposed model and the other DL benchmarks, Fig. 7 plots the prediction outputs of the simulated models for typical days in the four seasons of the year 2018.
According to Fig. 7, the EDBN model tracks the daily PV power effectively and outperforms all the benchmark models for the different seasons of the year. During sunny and partially cloudy days in summer and spring, the EDBN and DBN perform better than the other models. It can be noticed that EDBN provides more stable results with fewer fluctuations compared to DBN. On foggy and rainy days, the KNN, ET, and DBN models generate satisfying forecasts despite the high volatility of meteorological conditions. It is worth mentioning that the MF and ELM models provide the worst performance in all weather types, with high errors compared to the DBN and ET models. Although the predictive performance of the EDBN model is clearly demonstrated against the comparison models, it should be remarked that the proposed method only slightly outperforms the ET and KNN models. To establish the robustness of the proposed framework compared to the reference models, the scatter plots of the simulated models are illustrated in Fig. 8.
As seen from Fig. 8(a), the forecasted PV power pattern mostly coincides with the real PV power values, and the EDBN outputs rise linearly with the actual PV measurements. The presented methodology thus shows a strong correlation between the forecasted and real PV power, the best among the compared models. By contrast, the predicted points of the original DBN in Fig. 8(c) are widely dispersed, reflecting a weaker correlation with the actual outputs. The lower the dispersion, the higher the accuracy, and vice versa. From Fig. 8(d) and Fig. 8(g), it can be seen that the weak learners of the stacking approach, namely ELM and KNN, are poorly correlated with the PV outputs. The linear relationship in Fig. 8(e) between the ET outputs and the actual values shows a better correlation than that of the ELM and KNN models. In Figs. 8(b), 8(f), 8(h), and 8(i), which present the MLP, GRU, LSTM, and BiLSTM outputs, respectively, these DL models achieve satisfactory performance for PV generation below 150 kW. In that range, their forecasting accuracy is very close to that of the EDBN model. However, the forecast accuracy of the counterpart models degrades for PV power outputs higher than 150 kW.
Interestingly, the correlation with the PV generation measurements decreases as the produced PV energy increases. Fig. 9 shows the error distribution, where the error is the difference between the actual and predicted PV generation output. Specifically, error(i) = y(i) − ŷ(i) is the test error for the i-th sample, where y(i) is the target value and ŷ(i) is the model output for time step i. The error distribution plots give a direct interpretation of the forecast errors across all weather conditions. In Fig. 9(a), the errors of the proposed model are mostly concentrated around zero, which indicates that the EDBN produces more accurate results. Its error values range from −20 kW to 30 kW, while the errors of the LSTM model shown in Fig. 9(h) range from −40 kW to 30 kW. Comparing the prediction outputs, it can be concluded that the DL models provide better performance than the machine learning prediction models. Fig. 10 shows the coefficient of determination (R²) for the competing models using the 5-minute daily generated PV outputs in different seasons of the year. Ideally, R² should be as close to 1 as possible.
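To make the error definition and the reported metrics concrete, the following minimal Python sketch computes the per-sample error y(i) − ŷ(i) together with RMSE, MAPE, and R². It assumes the standard textbook definitions of these metrics, since this section does not restate the formulas used in the study. The function names and the toy values are hypothetical and serve only as an illustration; skipping zero-valued targets in the MAPE is likewise an assumption, not a step described in the paper.

```python
import math


def forecast_errors(actual, predicted):
    """Per-sample test errors: error(i) = y(i) - y_hat(i)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(actual, predicted)]


def rmse(actual, predicted):
    """Root mean squared error, in the unit of the data (kW here)."""
    errors = forecast_errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: samples with a zero target (e.g. night-time) are skipped
    to avoid division by zero.
    """
    pairs = [(y, y_hat) for y, y_hat in zip(actual, predicted) if y != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all targets are zero")
    return 100.0 * sum(abs((y - y_hat) / y) for y, y_hat in pairs) / len(pairs)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum(e * e for e in forecast_errors(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Toy values in kW, for illustration only (not data from the study).
actual_kw = [100.0, 150.0, 200.0, 250.0]
predicted_kw = [98.0, 153.0, 196.0, 255.0]

print(forecast_errors(actual_kw, predicted_kw))
print(rmse(actual_kw, predicted_kw))
print(mape(actual_kw, predicted_kw))
print(r2(actual_kw, predicted_kw))
```

A histogram of the values returned by `forecast_errors` corresponds to the kind of error distribution plot discussed for Fig. 9.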
As seen from Fig. 10, the box plots illustrate noticeable differences between the performance of the prediction engines. The EDBN model predicts the target more accurately than the ET and KNN models despite the asymmetric behavior of PV power over the seasons of the year. The size of each box shows how tightly the coefficients of determination are concentrated, and for the EDBN they are mostly concentrated in a high range. The EDBN model has the smallest box across the different weather types, reflecting its robustness compared to the other models. To better demonstrate the feasibility of the proposed model, Fig. 11 presents the spider charts of the MAPE and R² of the proposed algorithm and the simulated benchmarks.
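The "size of the box" in the box-plot discussion above is the interquartile range (IQR) of the daily R² values. The short sketch below shows one way to compute it. The two lists of R² values are hypothetical and are not taken from Fig. 10, and the "inclusive" quantile method is an assumption, since the plotting convention used in the study is not stated.

```python
import statistics


def box_size(values):
    """Interquartile range (Q3 - Q1), i.e. the height of the box in a box plot."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical daily R^2 values for two models (illustration only).
r2_stable_model = [0.96, 0.97, 0.97, 0.98, 0.98]
r2_unstable_model = [0.80, 0.88, 0.93, 0.96, 0.98]

# A smaller box means the daily R^2 values vary less from day to day.
print(box_size(r2_stable_model) < box_size(r2_unstable_model))
```

Under this reading, a model with a small box located near 1 is both accurate and consistent across days.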
As presented in Fig. 11(a), the MAPE of the proposed EDBN is 2.30%, which is lower than that of the reference models. Furthermore, the R² value of the EDBN algorithm reaches 97%, which is higher than that of the other benchmarks. It is worth mentioning that the error values of the DL models are close to those of the proposed model, which nonetheless remains highly competitive. Daily comparisons based on scale-independent forecast error metrics are also conducted against recently proposed models. The proposed model is compared with methods presented in the literature that use other PV databases: CNN-LSTM (CLSTM) [48], LSTM-based Attention Mechanism (LSTM-AM) [49], PVPF Network (PV-Net) [50], Multi-Channel CNN (MC-CNN) [51], Radial Basis Function Neural Network (RBFNN) [52], Wavelet Packet Decomposition-LSTM (WPD-LSTM) [53], LSTM [47], GRU [47], Improved Moth-Flame Optimization algorithm-SVM (IMFO-SVM) [54], and Particle Swarm Optimization-SVM (PSO-SVM) [54]. The models' results for PVPF are listed in Table 5.
As per Table 5, the EDBN model yields accurate forecasting results, with a MAPE of 2.30%. The proposed model produces better results than high-performing benchmarks such as WPD-LSTM, IMFO-SVM, and PSO-SVM, with 2.40%, 3.92%, and 2.85%, respectively. Although hybrid models generally yield better performance, the CLSTM, LSTM-AM, and MC-CNN generate comparatively high MAPE values of 7.53%, 7.10%, and 8.63%, respectively. The RBFNN, LSTM, and GRU yield a MAPE of 3.71%, 3.61%, and 3.42%, respectively. It is worth mentioning that the EDBN model produces only slightly better results than the WPD-LSTM and PSO-SVM models (the best algorithms after the proposed one), with 2.40% and 2.85%. To assess the significance of the differences among the results listed in Table 5, Fig. 12 presents the MAPE of the EDBN and of the recent works. Fig. 12 reveals that the proposed model is highly competitive with the results reported in the literature, since the overall MAPE obtained from the adopted methodology is lower than those obtained in [47]-[54]. Therefore, the EDBN model is found to be well suited to daily PVPF.
The time complexity of an ML algorithm is an essential consideration when devising novel models for real-world implementation: a proposed model must have a running time shorter than the response time required by the underlying problem. In this work, the computational cost of the testing phase is summarized in Table 6.
From Table 6, it can be observed that the proposed technique generates the testing predictions with an acceptable computational time cost (1.67 seconds). Generating the PV forecasts in a few seconds reflects the speed of the stacking ensemble and its suitability for practical applications. Empirically, the MLP, ET, ELM, and DBN are the fastest algorithms, with 0.42, 0.77, 0.8, and 1.13 seconds, respectively, followed by the EDBN and GRU models. It is concluded that the relative complexity of the EDBN model, which accompanies its superior accuracy, does not compromise its low testing time for generating the final outputs. Future work will focus on emerging metaheuristic optimization algorithms that can further boost the model performance, such as the grey wolf optimizer and the whale optimization algorithm [38]. In summary, the proposed EDBN model is able to model periodic data with complex non-linear relationships and irregular dependencies through the stacking mechanism in a timely manner, demonstrating its applicability to real-world PV plants.
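The requirement that the running time stay below the response time of the underlying problem can be checked with a small timing harness such as the sketch below. The helper names, the dummy predictor, and the 5-minute time budget are assumptions made for illustration. The budget is merely suggested by the 5-minute data resolution mentioned earlier and is not a response time stated in the study.

```python
import time


def measure_testing_time(predict, inputs, repeats=5):
    """Best wall-clock time in seconds over several runs of predict(inputs)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(inputs)
        best = min(best, time.perf_counter() - start)
    return best


def meets_deadline(testing_time_s, response_time_s):
    """True when the forecast is produced before it is needed."""
    return testing_time_s < response_time_s


def dummy_predict(inputs):
    """Hypothetical stand-in for a trained forecaster (not the EDBN itself)."""
    return [0.9 * x for x in inputs]


# Assumption: one forecast is needed per 5-minute interval.
RESPONSE_TIME_S = 5 * 60

elapsed = measure_testing_time(dummy_predict, list(range(288)))
print(meets_deadline(elapsed, RESPONSE_TIME_S))

# The 1.67 s testing time reported in Table 6, checked against the assumed budget.
print(meets_deadline(1.67, RESPONSE_TIME_S))
```

Taking the best of several runs reduces the influence of unrelated system activity on the measurement. A stricter check could use the worst run instead.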

IV. CONCLUSION
PV power forecasting (PVPF) is fundamental to decision-making processes in smart grid systems. This paper proposes a novel PVPF framework named Enhanced Deep Belief Network (EDBN). The proposed EDBN model combines extreme learning machine, mondrian forest, k-nearest neighbors, and extremely randomized trees base learners with a deep belief network meta-model to cope with the heteroskedasticity of PV power. This model exploits the correlation patterns between the meteorological data and the meta-learning ensemble to provide a precise daily estimation of the PV power. The tree-structured Parzen estimators algorithm is employed to optimize the model performance through effective hyperparameter tuning. As demonstrated in the case study, the error measurements of the developed system are the best among all involved methods, with an RMSE and R² of 3.88 kW and 97%, respectively. The mean absolute error is reduced from 7.5 kW to 2.70 kW compared with the original DBN model. In sum, the simulation results demonstrate that the EDBN provides excellent performance for non-linear PV power output forecasting and high generalization potential compared to single reference models.
In this study, the most suitable base learners were selected through repeated experiments, which can be a cumbersome task for complex time series forecasting problems. An automatic knowledge generation method will be considered in future work to define the best model stack for various prediction problems. Furthermore, a missing-data-tolerant strategy is required to guarantee the proper operation of the proposed method in real-world environments, which will be the topic of a future study.