Ultra-short-term PV Power Forecasting Based on Cloud Image Modeling

Frequent local, sudden changes in photovoltaic (PV) output power make its ultra-short-term forecasting very challenging. On a time scale of 1 to 2 minutes, the generation, movement, and dissipation of clouds in the sky are the main factors that affect the output power. Therefore, a prediction method based on cloud images is proposed for ultra-short-term prediction of PV output power. First, image processing is applied to extract the image features that affect changes in PV output power, including cloud coverage rate, direct sunlight rate, and image brightness. Second, the extraterrestrial irradiation and air mass are calculated at each time point. Finally, these features are used as inputs and the PV output power as the target to train a long short-term memory (LSTM) network, which serves as the PV power prediction model for ultra-short-term forecasting.


Introduction
The carbon dioxide released into the atmosphere by the massive combustion of fossil fuels causes great harm to the environment on which humans depend. Therefore, in recent years, renewable energy sources have received growing attention and application around the world [1]. The International Energy Agency (IEA) predicts that by 2050, PV power generation will account for 20%-25% of total global power generation [2]. Most of the ultra-short-term variation in PV output power comes from the irregular movement of clouds, which can significantly affect the output power on a time scale of 1 to 2 minutes. Improving the prediction accuracy of PV output power on this time scale is therefore important. Scholars have done a great deal of work on PV power forecasting. Some forecast PV power from historical generation data combined with machine learning methods such as artificial neural networks and support vector machines [3][4][5][6], but their models do not include local cloud information. Others combine meteorological information such as temperature and pressure to build a PV power prediction model [7]; however, in minute-level ultra-short-term prediction these meteorological factors do not change significantly, so they are not appropriate as features. Cloud information is a better choice of impact factor. In this paper, real-time sky images are recorded by a complementary metal oxide semiconductor (CMOS) camera, and digital image processing is used to extract the corresponding features. These features are combined with the extraterrestrial irradiation and air mass values at each time point, and an LSTM model is used to build a minute-level PV prediction model for accurate forecasting of PV output power.

Image Brightness
At present, the total sky imager (TSI) gives the best results for collecting cloud images over PV power stations, but TSI systems are relatively expensive. The image capture device used in this article is a more common CMOS camera; its resolution is set to 640×480 and the images are stored in JPG format. On its way from the sun to the ground, solar radiation passes through the earth's atmosphere and undergoes refraction, absorption, scattering and other phenomena, so the radiation reaching the ground differs to some degree from that outside the atmosphere. The most obvious change is in light intensity, which is directly reflected in the image brightness. The images collected in this article are all RGB images. For an image I composed of M×N pixels, the image brightness Br is defined as: where R, G, and B represent the value of each color channel, respectively.
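As a rough illustration of this feature, the sketch below computes a brightness value for a small RGB image. It assumes that Br is the mean, over all M×N pixels, of the average of the three channel values; a weighted luminance would be an equally plausible reading. The function name `image_brightness` and the nested-list image layout are illustrative choices, not taken from the paper.

```python
def image_brightness(image):
    """Mean brightness of an RGB image given as rows of (R, G, B) tuples.

    Assumption: brightness is the per-pixel channel average, then averaged
    over all M x N pixels. The weighting used in the paper may differ.
    """
    rows = len(image)
    cols = len(image[0])
    total = 0.0
    for row in image:
        for r, g, b in row:
            total += (r + g + b) / 3.0
    return total / (rows * cols)


# Tiny 2 x 2 example: one white, one black and two mid-gray pixels.
sample = [
    [(255, 255, 255), (0, 0, 0)],
    [(128, 128, 128), (128, 128, 128)],
]
print(image_brightness(sample))  # 127.75
```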

Cloud Coverage Rate
Moving clouds tend to block sunlight, which is the main reason for dramatic changes in PV power generation over short periods. Because clouds scatter light of different wavelengths to different degrees [8], this paper combines the grayscale image with the ratios between color channels to identify clouds. First, the image I is converted to a grayscale image $I_G$. For thin clouds, the binary image is obtained as follows: if the green-red ratio is less than the threshold α and the grayscale is less than the threshold $\alpha_G$, the pixel is set to 1; if the green-blue ratio is greater than the threshold β and the grayscale is less than the threshold $\beta_G$, the pixel is set to 1; if the red value is less than the threshold γ, the pixel is set to 0. For thick clouds, if the blue-red ratio is less than the threshold η and the grayscale lies in the interval [$\eta_{GL}$, $\eta_{GH}$], the pixel is set to 1. Reasonable threshold values have to be determined by testing on a large number of sample images. The thresholds used in this article are shown in table 1.

According to the collected images, the result of cloud recognition is shown in Figure 1.
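The thresholding rules above can be sketched per pixel as follows. This is a minimal sketch under several assumptions: all threshold values are hypothetical placeholders, the grayscale conversion uses the common ITU-R BT.601 luma weights, the red-value rule is treated as an override of the two thin-cloud rules, and the cloud coverage rate is taken to be the fraction of pixels flagged as thin or thick cloud. The names `THRESHOLDS`, `is_cloud` and `cloud_coverage_rate` are illustrative.

```python
# Hypothetical threshold values, for illustration only.
THRESHOLDS = {
    "alpha": 1.05, "alpha_g": 200,            # thin cloud: green/red ratio, gray limit
    "beta": 0.95, "beta_g": 200,              # thin cloud: green/blue ratio, gray limit
    "gamma": 60,                              # red floor that clears a thin-cloud pixel
    "eta": 1.3, "eta_gl": 80, "eta_gh": 230,  # thick cloud: blue/red ratio, gray band
}


def gray_value(r, g, b):
    # Assumed grayscale conversion (ITU-R BT.601 luma weights).
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_cloud(pixel, th=THRESHOLDS):
    """Apply the thin-cloud and thick-cloud rules to one (R, G, B) pixel."""
    r, g, b = pixel
    gray = gray_value(r, g, b)
    # Guard against division by zero on fully dark channels.
    green_red = g / max(r, 1)
    green_blue = g / max(b, 1)
    blue_red = b / max(r, 1)

    thin = ((green_red < th["alpha"] and gray < th["alpha_g"])
            or (green_blue > th["beta"] and gray < th["beta_g"]))
    if r < th["gamma"]:
        thin = False  # assumed to override the two thin-cloud rules
    thick = blue_red < th["eta"] and th["eta_gl"] <= gray <= th["eta_gh"]
    return thin or thick


def cloud_coverage_rate(image, th=THRESHOLDS):
    """Fraction of pixels classified as cloud (assumed definition)."""
    flags = [is_cloud(pixel, th) for row in image for pixel in row]
    return sum(flags) / len(flags)


blue_sky = (40, 90, 200)
gray_cloud = (150, 150, 160)
image = [[blue_sky, gray_cloud], [blue_sky, blue_sky]]
print(cloud_coverage_rate(image))  # 0.25
```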

Direct Sunlight Rate
In this paper, pixels with a gray value greater than 230 are treated as direct sunlight points, which determine the position and size of the sun in the image. If the number of direct sunlight points in the image is $N_{SU}$, the direct sunlight rate is
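A minimal sketch of this feature is given below. It assumes that the direct sunlight rate is the number of direct sunlight points divided by the total number of pixels in the image; the function name and the nested-list grayscale layout are illustrative.

```python
def direct_sunlight_rate(gray_image, threshold=230):
    """Fraction of pixels whose gray value exceeds `threshold`.

    Assumption: the rate is normalized by the total pixel count M x N.
    """
    values = [v for row in gray_image for v in row]
    n_su = sum(1 for v in values if v > threshold)  # direct sunlight points
    return n_su / len(values)


gray = [
    [250, 240, 120, 90],
    [235, 100, 80, 60],
]
print(direct_sunlight_rate(gray))  # 0.375
```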

Regular Fluctuations
The surface radiation fluctuations that cause PV power fluctuations are not purely random. The extraterrestrial irradiation and the distance that sunlight travels through the atmosphere to reach the surface are regular sources of surface radiation fluctuation.
The extraterrestrial irradiation changes according to the laws of astronomy, and its theoretical calculation involves the declination angle, hour angle and zenith angle [9]. They are defined as: In these equations, n is the day number of the year; t is the true solar time and $t_c$ is the clock time, both in hours; L is the local longitude and $L_s$ is the longitude of the local standard time meridian, both in degrees; ψ is the latitude of the site in degrees.
Based on the above definitions, the extraterrestrial irradiation at each time point can be calculated. The direction in which the extraterrestrial irradiation enters the atmosphere also affects the surface radiation. When sunlight is incident perpendicularly, its path through the atmosphere is the shortest. The ratio of the actual path length of the sunlight to this shortest distance is called the air mass. By convention, the path length of sunlight at sea level for vertical incidence is taken as 1. The air mass can be calculated by the following equation: $AM = \sqrt{1229 + (614\cos\theta_z)^2} - 614\cos\theta_z$, where $\theta_z$ is the zenith angle.
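The regular features can be sketched with textbook solar geometry. The block below is an assumption-laden illustration rather than the paper's exact equations: it uses Cooper's approximation for the declination, a longitude-only correction from clock time to true solar time (east-positive longitudes, equation of time ignored), a solar constant of 1367 W/m^2, and an air mass approximation based on the constant 614. All function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2, a commonly used value (assumption)


def declination_deg(n):
    # Cooper's approximation for the declination angle on day n of the year.
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))


def true_solar_time(tc, lon, lon_std):
    # Longitude correction only (1 hour per 15 degrees, east positive);
    # the equation of time is ignored in this sketch.
    return tc + (lon - lon_std) / 15.0


def cos_zenith(n, tc, lon, lon_std, lat):
    # Cosine of the zenith angle from latitude, declination and hour angle.
    delta = math.radians(declination_deg(n))
    omega = math.radians(15.0 * (true_solar_time(tc, lon, lon_std) - 12.0))
    psi = math.radians(lat)
    return (math.sin(psi) * math.sin(delta)
            + math.cos(psi) * math.cos(delta) * math.cos(omega))


def extraterrestrial_irradiance(n, cos_z):
    # Horizontal-plane irradiance outside the atmosphere, clipped to 0 at night.
    g_on = SOLAR_CONSTANT * (1.0 + 0.033 * math.cos(math.radians(360.0 * n / 365.0)))
    return max(0.0, g_on * cos_z)


def air_mass(cos_z):
    # Air mass approximation with the constant 614; equals 1 for vertical incidence.
    return math.sqrt(1229.0 + (614.0 * cos_z) ** 2) - 614.0 * cos_z


# Example: day 81 (near the March equinox), clock noon, site on its
# standard meridian at 30 degrees north latitude.
cz = cos_zenith(n=81, tc=12.0, lon=120.0, lon_std=120.0, lat=30.0)
print(cz, extraterrestrial_irradiance(81, cz), air_mass(cz))
```

Both outputs vary smoothly and deterministically with date and time, which is why they are treated as regular rather than random inputs.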

Prediction Model
The recurrent neural network (RNN) first appeared in the 1980s. Its main feature is that part of a neuron's output is fed back as its input, so that previous information is retained and reused. This makes it very effective for time series problems, and it has been widely applied to nonlinear system modeling [10]. The structure of the RNN is shown in figure 2, in which t is the time index; $X_t$ is the input of the neural network, $Y_t$ is the output, and $H_t$ is the hidden layer; m, n, and k are the numbers of nodes in the input layer, output layer, and hidden layer, respectively. An RNN is able to process sequence data because its current output depends not only on the current input but also on previous inputs. Although the RNN is designed to process the information of the entire sequence, it is the information from the most recent time steps that it remembers best and that has the greatest influence.
The LSTM block is shown in figure 3. It contains three gating units: the input gate $I_t$, the forget gate $F_t$ and the output gate $O_t$. The block structure can be defined as follows: In equations (13)-(17), σ is the sigmoid function; W and b are the corresponding weight matrices and bias terms. By introducing the above operations, the LSTM network can learn long-term dependencies in the time series.
Prediction accuracy is the evaluation index used to measure the prediction performance of the model. The main evaluation indexes used in this article are the root mean square error (RMSE) and the coefficient of determination ($R^2$). They are defined as follows: where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and $\bar{y}_i$ is the expected prediction value.
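For reference, a commonly used formulation of an LSTM block with the input, forget and output gates named above is sketched here. It is the standard textbook form and may group or name terms differently from equations (13)-(17); the cell state $C_t$, the candidate state $\tilde{C}_t$ and the weight subscripts are notation assumed for this sketch, and $\odot$ denotes element-wise multiplication.

```latex
\begin{aligned}
I_t &= \sigma(W_{xi} X_t + W_{hi} H_{t-1} + b_i) \\
F_t &= \sigma(W_{xf} X_t + W_{hf} H_{t-1} + b_f) \\
O_t &= \sigma(W_{xo} X_t + W_{ho} H_{t-1} + b_o) \\
\tilde{C}_t &= \tanh(W_{xc} X_t + W_{hc} H_{t-1} + b_c) \\
C_t &= F_t \odot C_{t-1} + I_t \odot \tilde{C}_t \\
H_t &= O_t \odot \tanh(C_t)
\end{aligned}
```

The forget gate scales how much of the previous cell state is kept, which is what lets the block carry information over many time steps.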

Case Study
This article uses the cloud images and power generation data of a rooftop PV system from March to April 2020. Two days are selected as prediction days, and the training set contains 10 days. The LSTM network is used to build the prediction model, with the random and regular features as inputs and the PV power as output. The data cover the period 8:00-16:00 with a sampling interval of 10 s, and the prediction horizon is 1 min. The predicted PV power is compared with the actual value, and the evaluation indexes are calculated.
Since the extracted cloud image features and the calculated features differ considerably in magnitude and are expressed on different scales, they must be normalized to a common scale. The normalization formula is $x' = (x - x_{min}) / (x_{max} - x_{min})$, where $x_{max}$ and $x_{min}$ represent the maximum and minimum values of the data set.
In this article, the LSTM network is trained with the following settings: four hidden layers, 150 LSTM blocks in each layer, a learning rate of 0.01, and a maximum of 1000 iterations. The prediction result for March 20 is shown in figure 4(a), and the prediction result for April 7 is shown in figure 4(b). The evaluation indexes of the two cases are shown in table 2.
The forecast results show that on March 20 the cloud image and power generation change relatively smoothly, so the prediction error is relatively small and the accuracy is high. On April 7 the cloud image and power generation change more drastically, so the prediction error is relatively large and the accuracy is low. The reason is that when the cloud cover changes drastically, the limited field of view of the camera makes the change difficult to predict, and a long prediction horizon then leads to large errors.
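The data preparation described in this case study can be sketched as follows: min-max normalization with $x_{max}$ and $x_{min}$, then pairing each window of past feature rows with the power value one prediction horizon ahead. The 10 s sampling interval and 1 min horizon come from the text; the window length, the toy data and the names `min_max_normalize` and `make_samples` are assumptions for illustration.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] using its own minimum and maximum."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(x - x_min) / (x_max - x_min) for x in values]


def make_samples(features, power, window, horizon):
    """Pair each window of feature rows with the power `horizon` steps
    after the last row of that window."""
    samples = []
    for start in range(len(features) - window - horizon + 1):
        end = start + window
        samples.append((features[start:end], power[end + horizon - 1]))
    return samples


SAMPLE_INTERVAL_S = 10                    # sampling interval from the case study
HORIZON_STEPS = 60 // SAMPLE_INTERVAL_S   # 1 min ahead corresponds to 6 steps
WINDOW_STEPS = 3                          # hypothetical input window length

power = [float(p) for p in range(10)]                 # toy power series
features = [[x] for x in min_max_normalize(power)]    # one toy feature per step
samples = make_samples(features, power, WINDOW_STEPS, HORIZON_STEPS)
print(len(samples), samples[0][1])  # 2 8.0
```

Changing the horizon to 20 s only changes `HORIZON_STEPS` to 2 in this sketch.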
After the prediction horizon is changed to 20 s, the prediction results are shown in figure 5, and the corresponding evaluation indexes are shown in table 3. Shortening the prediction horizon significantly improves the forecast accuracy: the accuracy for March 20 increases by 68.6%, and the accuracy for April 7 increases by 51.2%.
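The two evaluation indexes used in these comparisons, RMSE and the coefficient of determination, can be computed as in the sketch below. It uses the standard definitions, with the mean of the real values in the denominator of the coefficient of determination; the sample numbers are made up for illustration and are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between real and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(actual, predicted), r_squared(actual, predicted))
```

A lower RMSE and a coefficient of determination closer to 1 both indicate a better fit.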

Conclusion
PV power generation changes abruptly with cloud movement. This random and uncertain behavior is a serious hidden danger to power grid security. By extracting cloud image features and calculating the regular components of surface radiation fluctuation, the LSTM-based prediction model established in this paper can cope with the abrupt changes in PV power generation caused by cloud movement and achieves good results in minute-level prediction. Because a single conventional CMOS camera has a limited field of view, it is difficult to accurately predict the long-term movement and change of clouds, so the prediction performance suffers when the cloud cover changes drastically. Multiple cameras can be used in subsequent research to improve image feature capture. In addition, since PV power is strongly related to surface radiation, and the physical mechanisms of cloud change and radiation attenuation are complicated, further image features should be considered for modeling in subsequent research to make PV power prediction more accurate and more general.