Short-Term Photovoltaic Power Generation Combination Forecasting Method Based on Similar Day and Cross Entropy Theory

Forecasting photovoltaic (PV) power generation is of great significance for the operation and control of power systems. In this paper, a short-term combination forecasting model for PV power based on similar day and cross entropy theory is proposed. The main influencing factors of PV power are analyzed. From the perspective of entropy theory, considering distance entropy and grey relation entropy, a comprehensive index is proposed to select similar days. Then, the least square support vector machine (LSSVM), autoregressive and moving average (ARMA), and back propagation (BP) neural network are each used to forecast PV power. The weights of the three single forecasting methods are dynamically set by the cross entropy algorithm, and the short-term combination forecasting model for PV power is established. The results show that this method can effectively improve the prediction accuracy of PV power and is of great significance to real-time economic dispatch.


Introduction
Photovoltaic (PV) power generation has been widely deployed around the world in recent years. However, because it is strongly affected by meteorological factors, PV power generation is uncertain, volatile, and intermittent, which is detrimental to safe dispatch and energy management [1]. The operational risk increases when a PV power generation system is connected to the grid. Accurate prediction of PV output power can provide a basis for dispatching decisions, and it is of great significance for reducing system operating costs and ensuring the safety and stability of the power system.
Until now, much research has been devoted to the forecasting of PV power generation. According to the forecasting approach, the methods can be divided into two categories: direct methods and indirect methods [2]. The indirect method predicts solar irradiance from the meteorological data of PV power plants and then uses the relevant formulas or algorithms to calculate the PV output power [3,4]. The direct method combines the meteorological data with historical PV power data to predict output power [5][6][7][8]. Yuan et al. [5] used a BP neural network to predict PV power, taking weather types into account. Lan et al. [6] established an ARMA and Markov chain prediction model for short-term PV power forecasting. Jing et al. [7] used the extreme learning machine method for short-term PV power forecasting. In [8], a forecast model for PV power was established using the least square support vector machine, but its prediction accuracy is affected by the model parameters.
The single direct forecast methods mentioned above have some limitations, and their prediction accuracy can be further improved. Appropriate combination forecasting methods can effectively improve the prediction accuracy of PV power generation [9][10][11][12]. Wang et al. [9] proposed a combination forecast method based on an improved grey BP neural network; the fuzzy C-means method was used to classify the historical data and then select similar days as the model training samples. Li et al. [10] adopted the grey relational analysis method to determine the meteorological factors with the highest impact on PV power generation and then, combining the advantages of each individual prediction model, proposed a combined forecasting model based on IOWA operators. Yang et al. [11] established a combination forecast method for PV power based on the entropy method and obtained appropriate combination weights. Yang and Chen [12] proposed a combination method for PV power forecasting based on the correlation coefficient, which improved the prediction accuracy. Although these combination forecasting methods can effectively improve the prediction accuracy, the weights of the single forecast methods are fixed and cannot reflect the real-time changes of PV power.
Accordingly, in order to further improve PV power forecasting accuracy, this paper proposes a short-term PV power forecasting method based on similar day and cross entropy theory. First, the main influencing factors of PV power are analyzed. Distance entropy and grey relation entropy are introduced, and a similarity index is proposed to select similar days. Then the least square support vector machine (LSSVM), the autoregressive and moving average (ARMA) method, and the BP neural network are each used to forecast PV power. The cross entropy algorithm is used to dynamically set the weights of the three single forecast methods, and the short-term PV power combination forecasting model based on cross entropy theory is established. The correctness and superiority of this model are verified by case analysis and by comparison with the combination method based on the sum of the squared errors and the combination method based on the correlation coefficient.

Similar Day Selection
2.1. Entropy Theory. Entropy is a measure of the degree of chaos in a sequence. Entropy theory originated from the laws of thermodynamics and has been widely used in many fields such as systems science, information science, and management science.
If the probability of occurrence of each variable $w_i$ in the sequence is $p(w_i)$ ($i = 1, 2, \ldots, l$), the entropy value of the sequence is defined as
$$H = -C \sum_{i=1}^{l} p(w_i) \ln p(w_i), \tag{1}$$
where $C$ is a constant and $l$ is the number of variables.
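As a minimal sketch of the entropy definition above, the following Python function evaluates $-C \sum p \ln p$ for a discrete distribution. The function name, the default $C = 1$, and the sample probabilities are illustrative assumptions, not values taken from the paper.

```python
import math


def sequence_entropy(probs, c=1.0):
    """Entropy -C * sum(p_i * ln p_i) of a discrete distribution.

    `c` plays the role of the constant C; c = 1.0 is an assumed default.
    """
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing (limit of p * ln p as p -> 0).
    return -c * sum(p * math.log(p) for p in probs if p > 0)


# Illustrative inputs: a uniform distribution is the most "chaotic" one.
uniform = sequence_entropy([0.25, 0.25, 0.25, 0.25])
peaked = sequence_entropy([0.97, 0.01, 0.01, 0.01])
```

A uniform distribution over $l$ outcomes gives the maximum value $C \ln l$, while a sharply peaked distribution gives a value close to zero.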

2.2. Distance Entropy. Distance entropy is a combination of Euclidean distance and information entropy [13]. Euclidean distance is an effective measure of the similarity between two sequences, but it treats the differences between variables of different nature in the sequence as equivalent and sometimes cannot meet the actual requirements. Combining Euclidean distance with information entropy can overcome this shortcoming. The lower the distance entropy value, the more information is represented. That is, the smaller the difference between the comparison sequence and the reference sequence, the closer the comparison sequence is to the reference sequence. The Euclidean distance of each meteorological characteristic variable between the historical day $i$ and the forecast day is
$$d_i^k = \sqrt{(x'_k - x_i^k)^2} = |x'_k - x_i^k|, \tag{2}$$
where $x'_k$ and $x_i^k$ are the meteorological characteristic variable $k$ of the forecast day and the historical day $i$, respectively. Both $x'_k$ and $x_i^k$ need to be normalized first.
The ratio of the distance of each meteorological characteristic variable to the sum of the distances of all the characteristic variables of the historical day can be calculated. This ratio is taken as the probability of occurrence of the meteorological characteristic variable for the historical day. Thereby, the distance entropy value of the historical day $i$ can be calculated as
$$E_d^i = -C \sum_{k=1}^{K} \frac{d_i^k}{\sum_{k=1}^{K} d_i^k} \ln \frac{d_i^k}{\sum_{k=1}^{K} d_i^k}, \tag{3}$$
where $d_i^k$ is the distance of the meteorological characteristic variable $k$ between the historical day $i$ and the forecast day, $K$ is the total number of meteorological characteristic variables, and $C$ is the constant in the entropy definition.
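The distance entropy calculation described above can be sketched as follows. This is an illustration under stated assumptions: the inputs are taken to be already normalized, the constant is set to $C = 1$, a day identical to the forecast day is assigned an entropy of zero, and the function name and sample values are hypothetical.

```python
import numpy as np


def distance_entropy(forecast, history, c=1.0):
    """Distance entropy of each historical day relative to the forecast day.

    forecast: shape (K,), normalized features of the forecast day.
    history:  shape (N, K), normalized features of N historical days.
    c:        entropy constant (assumed to be 1.0 here).
    """
    forecast = np.asarray(forecast, dtype=float)
    history = np.asarray(history, dtype=float)
    d = np.abs(history - forecast)                 # per-variable distances
    totals = d.sum(axis=1, keepdims=True)
    # Ratio of each distance to the day's distance sum; all-zero rows stay 0.
    p = np.divide(d, totals, out=np.zeros_like(d), where=totals > 0)
    safe = np.where(p > 0, p, 1.0)                 # avoids log(0); 0 * ln(1) = 0
    return -c * (p * np.log(safe)).sum(axis=1)


# Hypothetical normalized data: one forecast day, two historical days.
forecast_day = [0.5, 0.5, 0.5]
historical_days = [[0.6, 0.5, 0.5],   # differs in a single variable
                   [0.6, 0.6, 0.6]]   # differs equally in all variables
e_d = distance_entropy(forecast_day, historical_days)
```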
2.3. Grey Relation Entropy. Grey relation entropy is a combination of the grey relational analysis method and information entropy. Based on the grey relational analysis method, information entropy theory is used to quantitatively describe the degree of similarity between the influencing factors in the system, which can compensate for the local correlation tendency and the loss of individual information [14]. The greater the grey relation entropy value is, the stronger the correlation between the comparison sequence and the reference sequence.
The correlation coefficient $\varepsilon_i^k$ between the historical day $i$ and the forecast day is computed for each meteorological characteristic variable $k$ from $x'_k$ and $x_i^k$, the values of variable $k$ on the forecast day and the historical day $i$. Both $x'_k$ and $x_i^k$ need to be normalized first.
The ratio of the correlation coefficient of each meteorological characteristic variable to the sum of the correlation coefficients of all characteristic variables of the historical day can be calculated. The probability of occurrence of the meteorological characteristic variable $k$ can be obtained as
$$p_i^k = \frac{\varepsilon_i^k}{\sum_{k=1}^{K} \varepsilon_i^k}, \tag{5}$$
where $\varepsilon_i^k$ is the correlation coefficient of the meteorological characteristic variable $k$ between the historical day $i$ and the forecast day, and $K$ is the total number of meteorological characteristic variables.
International Journal of Photoenergy
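The correlation coefficients and their normalization into probabilities might be computed as in the sketch below. Because the coefficient formula is not reproduced in this text, the sketch assumes the commonly used grey relational coefficient with a resolution coefficient `rho = 0.5`; that choice, the constant $C = 1$, and all names and sample values are assumptions. The entropy of the resulting probabilities is computed with the general entropy definition.

```python
import numpy as np


def grey_relation_entropy(forecast, history, rho=0.5, c=1.0):
    """Grey relational coefficients, their probabilities, and the entropy.

    Assumes the standard grey relational coefficient
        eps = (d_min + rho * d_max) / (delta + rho * d_max),
    which may differ from the exact form used in the paper.
    """
    delta = np.abs(np.asarray(history, dtype=float)
                   - np.asarray(forecast, dtype=float))   # shape (N, K)
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        raise ValueError("all historical days equal the forecast day")
    eps = (d_min + rho * d_max) / (delta + rho * d_max)  # values in (0, 1]
    p = eps / eps.sum(axis=1, keepdims=True)              # rows sum to 1
    entropy = -c * (p * np.log(p)).sum(axis=1)
    return eps, p, entropy


# Hypothetical normalized data.
eps, p, e_eps = grey_relation_entropy(
    [0.5, 0.5, 0.5],
    [[0.6, 0.6, 0.6],    # equally correlated in every variable
     [0.5, 0.9, 0.5]],   # one poorly matching variable
)
```

With equal coefficients across all $K$ variables the entropy reaches its maximum $C \ln K$, which agrees with the statement that a larger grey relation entropy indicates a stronger overall correlation.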
The grey relation entropy of the historical day $i$ can be expressed as
$$E_\varepsilon^i = -C \sum_{k=1}^{K} p_i^k \ln p_i^k, \tag{6}$$
where $p_i^k$ is the probability of occurrence of the meteorological characteristic variable $k$ for the historical day $i$ and $C$ is the constant in the entropy definition.
2.4. Influence Factors of PV Power. The output power of a PV system is related to many influencing factors, including solar irradiance, temperature, humidity, wind speed, and weather type. The output power of a PV array is calculated by equation (7) [15], in which $t_0$ is the operating temperature (°C) of the photovoltaic cell, $I$ is the solar irradiation intensity (kW/m²), $S$ is the area of the photovoltaic array (m²), and $\eta$ is the conversion efficiency of the photovoltaic cell (%).
It is assumed that the array area and conversion efficiency are constant for short-term PV power prediction. It can be seen from equation (7) that once these two parameters are fixed, the PV power depends only on two factors: solar irradiance intensity and temperature. The solar irradiance intensity varies considerably with season and weather type, and so does the output power of PV arrays. Since solar irradiance data for the forecast day are generally difficult to obtain, this paper uses the sunshine hours instead and selects the mean temperature, the maximum temperature, the minimum temperature, the relative humidity, the minimum humidity, and the sunshine hours as the main influencing factors for selecting similar days. These similar days are used as training samples to predict PV power.

2.5. Similar Day Selection Based on Entropy Theory. The basic steps for similar day selection based on entropy theory are as follows:
Step 1. Determine the season and weather type of the forecast day, and then select the $N$ historical days with the same season and weather type as samples.
Step 2. Select the mean temperature, maximum temperature, minimum temperature, relative humidity, minimum humidity, and sunshine hours of the samples to form the meteorological characteristic vectors. In the characteristic vector of the $i$-th day, the three temperature components indicate the mean temperature, the maximum temperature, and the minimum temperature $T_i^{\min}$, respectively; $H_i$ and $H_i^{\min}$ indicate the relative humidity and minimum humidity of the $i$-th day; $S_i$ indicates the sunshine hours of the $i$-th day.
Step 3. Calculate the distance entropy $E_d^i$ and the grey relation entropy $E_\varepsilon^i$ of the $i$-th day.
Step 4. Combine the distance entropy and grey relation entropy into a similarity index $R_i$ that characterizes the similarity of historical day $i$. According to the comprehensive similarity index $R_i$, six similar days are selected as the training samples for PV power prediction.
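The final ranking in Step 4 can be sketched as below. The formula for $R_i$ is not reproduced in this text, so the default index used here, `e_eps - e_d`, is a purely hypothetical placeholder chosen only because it increases with grey relation entropy and decreases with distance entropy; a caller would pass the actual index through `index_fn`. The function name and sample values are likewise assumptions.

```python
import numpy as np


def select_similar_days(e_d, e_eps, n_select=6, index_fn=None):
    """Rank historical days by a similarity index and keep the best ones.

    e_d, e_eps: distance entropy and grey relation entropy of each day.
    index_fn:   callable (e_d, e_eps) -> R; the default is a HYPOTHETICAL
                placeholder, not the index defined in the paper.
    """
    e_d = np.asarray(e_d, dtype=float)
    e_eps = np.asarray(e_eps, dtype=float)
    if index_fn is None:
        index_fn = lambda d, g: g - d      # placeholder similarity index
    r = index_fn(e_d, e_eps)
    order = np.argsort(-r, kind="stable")  # largest index first
    return order[:n_select], r


# Hypothetical entropies for eight candidate days.
chosen, r = select_similar_days(
    e_d=[0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4],
    e_eps=[1.0] * 8,
)
```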
2.6. Cross Entropy Theory. Cross entropy (CE) is derived from the definition of entropy and measures the information difference between two random vectors. It is also known as the Kullback-Leibler (K-L) distance. The K-L distance is not a physical distance in length but describes the difference between two probability distribution functions. The lower the cross entropy value is, the more similar the two probability density distribution functions are. The definition of cross entropy can be divided into two situations [16].
Discrete situation:
$$D(f \,\|\, g) = \sum_{i} f_i \ln \frac{f_i}{g_i}. \tag{9}$$
Continuous situation:
$$D(f \,\|\, g) = \int f(x) \ln \frac{f(x)}{g(x)} \, dx, \tag{10}$$
where $f$ and $g$ stand for probability vectors in equation (9) and for probability density functions in equation (10).
The cross entropy stands for the distance between $f$ and $g$. It describes the degree of closeness between the two probability distributions.
Theorem 1. If $f$ and $g$ are probability density functions, then $D(f \,\|\, g) \ge 0$, with equality only when $f = g$.
Property 1. $D(f \,\|\, g) \ge 0$, with equality only when $f = g$ almost everywhere.
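The two cases can be illustrated with a short Python sketch. The discrete function follows the sum definition directly and assumes $g_i > 0$ wherever $f_i > 0$. For the continuous case, the sketch uses the well-known closed form of the K-L distance between two normal distributions, which is convenient here because the combination model later treats PV power as normally distributed. The function names and sample values are illustrative assumptions.

```python
import math


def kl_discrete(f, g):
    """K-L distance between two probability vectors.

    Assumes g_i > 0 wherever f_i > 0; terms with f_i = 0 contribute nothing.
    """
    return sum(fi * math.log(fi / gi) for fi, gi in zip(f, g) if fi > 0)


def kl_gaussian(mu_f, var_f, mu_g, var_g):
    """Closed-form K-L distance D(f || g) between two normal densities."""
    return 0.5 * (math.log(var_g / var_f)
                  + (var_f + (mu_f - mu_g) ** 2) / var_g
                  - 1.0)


same = kl_discrete([0.5, 0.5], [0.5, 0.5])    # identical vectors
forward = kl_discrete([0.9, 0.1], [0.5, 0.5])
backward = kl_discrete([0.5, 0.5], [0.9, 0.1])
shifted = kl_gaussian(0.0, 1.0, 1.0, 1.0)     # unit-variance, means 1 apart
```

The values `forward` and `backward` differ, which shows that the K-L distance is not symmetric and is therefore not a distance in the geometric sense.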
The cross entropy algorithm is a random optimization method that can be used to simulate small-probability events and solve optimization problems. The cross entropy method has been applied to practical problems such as combinatorial optimization, multiobjective optimization, combined forecasting, and machine learning.
2.7. Single Forecasting Methods. PV power is affected by solar irradiance, weather types, season types, temperature, humidity, etc., and is difficult to describe with a mathematical model. ARMA, the BP neural network, and the LSSVM method are often used to forecast PV power. ARMA is a linear model, which can predict the overall trend of the data. The BP neural network and LSSVM are nonlinear models with strong nonlinear learning ability. Therefore, this paper selects these three single methods to forecast PV power.
2.8. ARMA. The autoregressive moving average (ARMA) model is a random time sequence analysis model founded by Box and Jenkins; it is also called the B-J method. The ARMA model is a combination of autoregressive (AR) and moving average (MA) models [17]. The AR model uses past values, and the MA model uses past error values. In the linear ARMA model, $\phi_k$ and $\theta_k$ are the AR and MA model coefficients, $x_t$ is the output of the ARMA model, $\varepsilon_t$ is the residual, $t$ is the sampling time, and $p$ and $q$ are the orders of the AR and MA models, respectively. In this paper, the input variables of the ARMA model are historical PV power data of similar days. The forecast process mainly includes the stationarity test of the sequence, model parameter estimation, and model order determination. Stationarity is tested by the augmented Dickey-Fuller (ADF) unit root test. The model parameters are estimated by the least square method, and the model order is determined according to the Akaike information criterion (AIC).
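The estimation and order selection steps can be illustrated with a deliberately simplified sketch. It fits only the AR part, $x_t = \sum_{k=1}^{p} \phi_k x_{t-k} + \varepsilon_t$, by least squares and compares orders with a Gaussian AIC; the MA terms and the ADF stationarity test are omitted, so this is not the full ARMA procedure used in the paper. The function names, the candidate orders, and the synthetic series are assumptions.

```python
import numpy as np


def fit_ar(series, p):
    """Least-squares fit of an AR(p) model (MA part omitted for brevity)."""
    x = np.asarray(series, dtype=float)
    # Column k-1 holds x_{t-k} for t = p, ..., n-1.
    lagged = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
    target = x[p:]
    phi, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    resid = target - lagged @ phi
    sigma2 = float(resid @ resid) / len(target)
    aic = len(target) * np.log(sigma2) + 2 * p   # Gaussian AIC up to a constant
    return phi, aic


def forecast_next(series, phi):
    """One-step-ahead forecast from the most recent len(phi) values."""
    x = np.asarray(series, dtype=float)
    recent = x[-1:-len(phi) - 1:-1]              # x_{t-1}, ..., x_{t-p}
    return float(phi @ recent)


# Synthetic AR(1) data with coefficient 0.7 (illustrative only).
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 2000)
series = np.zeros(2000)
for t in range(1, 2000):
    series[t] = 0.7 * series[t - 1] + noise[t]

phi1, _ = fit_ar(series, 1)
aic_by_order = {p: fit_ar(series, p)[1] for p in (1, 2, 3)}
best_order = min(aic_by_order, key=aic_by_order.get)
next_value = forecast_next(series, fit_ar(series, best_order)[0])
```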
2.9. BP. The BP neural network is a multilayer feedforward neural network trained by error back propagation, which has good self-organizing learning ability and can realize any nonlinear mapping from input to output. The prediction model uses forward propagation of the input signal and back propagation of the error signal to realize the training process; it can process large-scale data in parallel and is robust and fault tolerant [18]. The basic structure of the BP neural network is shown in Figure 1, where $w_1$ is the connection weight between the input layer and the hidden layer nodes and $w_2$ is the connection weight between the hidden layer and the output layer nodes.
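The forward and backward passes described above can be sketched with a single-hidden-layer network in NumPy. The hidden size, the tanh activation, the learning rate, the number of epochs, and the toy data are all illustrative assumptions; the paper does not state the network configuration in this section.

```python
import numpy as np


def train_bp(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """Train a one-hidden-layer network by error back propagation.

    w1 connects input and hidden layers; w2 connects hidden and output layers.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward propagation of the input signal.
        h = np.tanh(X @ w1 + b1)
        out = h @ w2 + b2
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Back propagation of the error signal (gradient of the mean squared error).
        g_out = 2.0 * err / len(X)
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)
        g_w1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


# Toy data standing in for a smooth daily power profile.
X_toy = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_toy = np.sin(np.pi * X_toy)
params, losses = train_bp(X_toy, y_toy)
```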
2.10. LSSVM. The least square support vector machine (LSSVM), proposed by Suykens in 1999, is an improvement of the standard support vector machine (SVM) [19].
LSSVM training is formulated as a quadratic optimization problem with equality constraints, which can be described as
$$\min_{w,b,e} J(w, e) = \frac{1}{2} w^{T} w + \frac{c}{2} \sum_{k=1}^{N} e_k^2, \quad \text{s.t. } y_k = w^{T} \varphi(x_k) + b + e_k, \; k = 1, 2, \ldots, N, \tag{12}$$
where $c$ is the penalty factor, $e_k$ is the error variable, $w$ is the weight vector, $b$ is a constant, $\varphi(x_k)$ maps the input into the feature space in which the regression function is linear, $x_k$ is the input, $y_k$ is the output, $n$ is the dimension of the input vector, and $N$ is the total number of samples. The radial basis function, with kernel parameter $\sigma^2$, is chosen as the kernel function.
The estimation function of LSSVM can be expressed as
$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b, \tag{14}$$
where $\alpha_k$ are the Lagrange multipliers and $K(\cdot, \cdot)$ is the kernel function.
In the regression model of LSSVM, the penalty factor $c$ and the kernel parameter $\sigma^2$ have the greatest influence on the model performance. This paper uses cross-validation to optimize these parameters. The input variables of LSSVM are the PV power data, sunshine hours, mean temperature, maximum temperature, and minimum temperature of the similar days, together with the sunshine hours, mean temperature, maximum temperature, and minimum temperature of the forecast day.
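A compact way to see how LSSVM regression works is the standard linear system that results from its optimality conditions: the bias $b$ and the multipliers $\alpha$ solve one set of linear equations instead of a quadratic program. The sketch below assumes the RBF kernel form $\exp(-\lVert x - x_k \rVert^2 / (2\sigma^2))$, which is one common convention and may differ from the paper's by a factor in the exponent. The function names, the values of `c` and `sigma2`, and the toy data are illustrative assumptions, and cross-validation is omitted.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix; the 1/(2*sigma2) scaling is an assumed convention."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma2))


def lssvm_fit(X, y, c, sigma2):
    """Solve [[0, 1^T], [1, K + I/c]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / c
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                     # bias b, multipliers alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    """Estimation function y(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


# Toy regression problem (illustrative only).
X_train = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
y_train = np.sin(2.0 * np.pi * X_train[:, 0])
b, alpha = lssvm_fit(X_train, y_train, c=100.0, sigma2=0.05)
fitted = lssvm_predict(X_train, b, alpha, X_train, sigma2=0.05)
```

A larger `c` penalizes training errors more strongly and a smaller `sigma2` makes the kernel more local, which is why these two parameters dominate the model performance.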

Combination Forecasting Model Based on Cross Entropy. Before PV power generation is forecast, the similar days need to be selected, which is very important for improving the efficiency and accuracy of the forecast model. The similar day selection method proposed in this paper is used to select the samples.
The procedure of the combination forecasting method based on the cross entropy theory is as follows:
Step 1. Predict the PV power at a given time point during forecasting period $T$. Select similar days as forecasting samples. For the PV power during forecasting period $T$, $M$ algorithms are used to predict the PV power at time point $t$ ($t = 1, 2, \ldots, N$); the forecast result of the $m$-th algorithm is $P_{mt}$ ($m = 1, 2, \ldots, M$). Here, the ARMA, BP neural network, and LSSVM methods are each used to forecast PV power.
Step 2. Define the probability density function of PV power.
The value of PV power at a given time point is assumed to follow the normal distribution. Suppose the PV power at time point $t$ during forecasting period $T$ follows the normal distribution; $g_m(x)$, the PV power distribution function at time point $t$ given by the $m$-th algorithm, can then be expressed as
$$g_m(x) = \frac{1}{\sqrt{2\pi}\,\sigma_m} \exp\left(-\frac{(x - \mu_m)^2}{2\sigma_m^2}\right), \tag{15}$$
where $\mu_m$ is the mean and $\sigma_m^2$ is the variance.
Step 3. Calculate the numerical characteristics of the PV power distribution function.
The PV power at time point $t$ of similar day $k$ ($k = 1, 2, \ldots, K$) is $P_{kt}$. The PV power $P_{mt}$ predicted by the $m$-th algorithm is regarded as the mean $\mu_m$ of the probability distribution function, and the variance is calculated from the PV power of the similar days.
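Steps 2 and 3 can be sketched as follows. The variance formula is not reproduced in this text, so the sketch assumes the mean squared deviation of the similar-day powers $P_{kt}$ from the model forecast $P_{mt}$; that formula, the function names, and the power values are assumptions made for illustration.

```python
import math


def model_gaussians(forecasts, similar_day_powers):
    """Return (mean, variance) of the normal distribution for each model.

    forecasts:          P_mt, one forecast per single model at time point t.
    similar_day_powers: P_kt, measured power of the K similar days at t.
    The variance definition below is an ASSUMPTION, not the paper's formula.
    """
    k = len(similar_day_powers)
    params = []
    for p_mt in forecasts:
        var = sum((p_kt - p_mt) ** 2 for p_kt in similar_day_powers) / k
        params.append((p_mt, var))
    return params


def normal_pdf(x, mu, var):
    """Normal probability density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical values: two model forecasts and two similar-day measurements.
params = model_gaussians([10.0, 12.0], [9.0, 11.0])
peak = normal_pdf(10.0, *params[0])
```

A forecast that lies far from the similar-day measurements receives a larger variance and therefore a flatter, less informative distribution.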
Suppose $g(x)$ is the probability density distribution function of PV power considering the influence factors and $g_m(x)$ is the PV power distribution function given by the $m$-th algorithm; then $g(x)$ is given by equation (17).
The PV power at time point $t$ obtained by equation (17) is the mathematical expectation $\mu$. The variance of the combination forecasting PV power is calculated accordingly.
Step 4. Establish the support vector and obtain the objective function of cross entropy.
Build the support vector: according to the cross entropy definition, $S_m$ is used to indicate the mutual support degree between $g(x)$ and $g_m(x)$; $S_m$ is smaller when the mutual support degree is greater. The weights are then defined so that they better reflect the mutual support degree among the different information sources, and the objective function $F$ of the minimum cross entropy is formed from them.
Step 5. Solve the minimum cross entropy optimization problem.
The objective function $F$ depends on the weights $w_m$, so the optimization problem amounts to finding the minimum of $F$. In this paper, the fmincon function in MATLAB is used to solve for the minimum of $F$.
Step 6. Forecast PV power at the other moments of the forecast period.
The PV power at each of the other moments of the forecast period is calculated by repeating the procedure through Step 5; the PV power at every moment of forecast period $T$ is thereby obtained.

Case Study
In order to verify the correctness and validity of the combination forecast method based on cross entropy theory proposed in this paper, real data samples from a PV power station in Nanjing, Jiangsu Province, from Oct. 1 to Dec. 31 are employed. The meteorological data are obtained from the local weather station. For a detailed analysis, three single forecast methods (ARMA, BP, and LSSVM) and three combination forecast methods are used to predict PV power, and the predicted results are then compared and analyzed. Four typical forecast days with different weather types, namely, a sunny day (Dec. 10), a cloudy day (Dec. 12), an overcast day (Dec. 15), and a rainy day (Dec. 17), are selected from the data samples. The PV power data are recorded every 30 minutes. The prediction period is 7:00-17:00. The algorithms of all forecast models are implemented in MATLAB R2014a.

Similar Day Selected Results. Taking the forecast date (Dec. 10) as an example, the meteorological parameters of the forecast day are shown in Table 1. The similar days selected by the method proposed in this paper are shown in Table 2. All the selected similar days are taken as samples for the three single prediction methods.
In order to further verify the rationality of the similar day selection method proposed in this paper, the traditional similar day selection method (the traditional Euclidean distance selection method) is used for comparison. The ARMA prediction model is used to produce the forecasts.
Again taking the forecast date (Dec. 10) as an example, Figure 2 and Table 3 show the prediction results and the statistics of forecasting errors obtained with the different selection methods, respectively.
It can be seen from Figure 2 and Table 3 that the similar day selection method proposed in this paper better accounts for the uncertainties of PV power generation because it considers distance entropy and grey relation entropy together. Therefore, the selected similar day samples are better and the prediction accuracy is higher.

Forecasting Results of Single Models. Figure 3 shows the forecasting results of PV power obtained with the different single forecasting methods. It can be seen from Figure 3 that the prediction curves of the three single prediction methods are close to the actual curve on the sunny day and the cloudy day, and the prediction results are ideal. On the overcast day and the rainy day, however, the deviation between the predicted curves of the three single prediction methods and the actual curve is more obvious. In order to evaluate the validity of each forecast method, the mean absolute percent error (MAPE) and the Theil inequality coefficient (TIC) [20] are selected as evaluation indices. The smaller the TIC value, the better the prediction performance of the model. MAPE and TIC are expressed as
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - y_i^f}{y_i} \right| \times 100\%,$$
$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y_i^f)^2}}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} y_i^2} + \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i^f)^2}},$$
where $y_i$ is the actual power, $y_i^f$ is the forecasting power, and $n$ is the number of forecasting points.
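The two evaluation indices can be computed as in the sketch below, which uses the standard definitions of MAPE (in percent) and the Theil inequality coefficient. The function names and the sample power values are illustrative assumptions, and the actual power is assumed to be nonzero at every forecasting point.

```python
import math


def mape(actual, forecast):
    """Mean absolute percent error, in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n


def tic(actual, forecast):
    """Theil inequality coefficient; 0 means a perfect forecast."""
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    denom = (math.sqrt(sum(a * a for a in actual) / n)
             + math.sqrt(sum(f * f for f in forecast) / n))
    return rmse / denom


# Hypothetical power values (kW) at two forecasting points.
example_mape = mape([100.0, 200.0], [110.0, 180.0])
example_tic = tic([100.0, 200.0], [110.0, 180.0])
```

TIC always lies between 0 and 1, which makes it easier to compare across days with very different power levels than an absolute error would be.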
Table 4 shows the statistics of forecasting errors of the single models for the different weather types. LSSVM performs the best among the three single forecast methods. Its MAPE is 8.63 on the sunny day, and its TIC is also smaller than that of the other two methods. For the cloudy, overcast, and rainy days, the prediction error of the three single methods increases due to the fluctuation of PV power. In particular, on the overcast and rainy days, the MAPE of the ARMA model reaches 17.88 and 17.97, respectively. Compared with the sunny day, the prediction accuracy drops significantly. The BP model takes input variables such as sunshine hours, temperature, and humidity into account, so its prediction error is lower than that of ARMA, but its prediction performance is not ideal compared with the LSSVM model.

Forecasting Results of Combination Model. Table 5 gives part of the weights of the cross entropy model; $w_1$, $w_2$, and $w_3$ stand for the weights of LSSVM, ARMA, and BP, respectively. It can be seen from Table 5 that the weights of the three single methods are different at different prediction points. At t = 14 h on the sunny day, the prediction errors of the three methods are all larger. At this point, the mutual support degrees of the ARMA and BP methods are less than those of the other single methods and their weights are approximately zero, which avoids the influence of the error of a single prediction method. At t = 8.5 h on the overcast day and t = 10 h on the rainy day, the prediction errors of the three methods are similar. Due to the greater mutual support degree of LSSVM, the weight of LSSVM is increased; the weights are 0.662 and 0.728, respectively. Therefore, the weights obtained by the cross entropy theory can objectively reflect the information fusion between different forecasting methods.
Figure 4 shows the PV power predicted by the three combination forecasting methods for the different forecasting days. Combination model 1 adopts the method based on the sum of the squared errors, combination model 2 adopts the method based on the correlation coefficient, and combination model 3 is based on the cross entropy theory proposed in this paper. Compared with the forecasting results shown in Figure 3, it can be clearly seen that the PV power values predicted by the combination models are closer to the actual power. Figures 4(a) and 4(b) show the results for the sunny and cloudy days, and the results are ideal. Figures 4(c) and 4(d) show the results for the overcast and rainy days, respectively. Although the weather conditions on overcast and rainy days are more complex, the prediction accuracy of each combination forecasting model is still high, especially that of combination model 3, which has the best prediction performance under the different weather types. Table 6 shows the statistics of forecasting errors of the combination models under the different weather types. It can be seen that the combination methods predict more accurately than the single methods. On the sunny day, the MAPE of combination model 3 is 4.24 lower than that of the LSSVM model, and combination model 3 is also more accurate than combination models 1 and 2. The TIC index likewise shows that combination model 3 is better than the other two combination models: its TIC values under the four weather types are 0.0198, 0.0249, 0.0462, and 0.0465, respectively, which are the lowest among the three combination prediction models. This demonstrates the superiority of the combination model based on the cross entropy theory.

Conclusions
This paper presents a short-term PV power generation combination forecasting method based on similar day and cross entropy. The similar days selected by using the entropy theory are closer to the forecast days. The combination forecasting model can effectively reduce the prediction error under different weather types. Compared with the combination models based on the sum of the squared errors and on the correlation coefficient, the combination model based on cross entropy has lower prediction error and better prediction performance, and its prediction accuracy remains high under different weather types. Therefore, the short-term PV power combination forecasting method based on similar day and cross entropy proposed in this paper can serve as a reference for research on the operation and control of power systems with PV power generation.

Data Availability
The photovoltaic generation power data are collected from a PV power plant in Nanjing, Jiangsu. The meteorological data are collected from the local weather station and can be obtained from the China National Meteorological Information Center website. In addition, the photovoltaic power data used to support the findings of this study are under license and so cannot be made freely available. Readers who need the data can contact the corresponding author: Qi Wang (email: wangqi@njnu.edu.cn).

Figure 1: The structure of BP neural network.

Figure 2: Prediction results of different similar day methods.

Table 1: Meteorological parameters of forecasting day.

Table 2: Selection results of similar days.

Table 3: Statistics of forecasting errors for different selection methods.

Table 4: Statistics of forecasting errors for single models.

Table 5: Weights of cross entropy model.

Table 6: Statistics of forecasting errors for combination models.