Short-term solar irradiance forecasting using convolutional neural networks and cloud imagery

Access to accurate, generalizable and scalable solar irradiance prediction is critical for smooth solar-grid integration, especially in the light of the accelerated global adoption of solar energy production. Both physical and statistical prediction models of solar irradiance have been proposed in the literature. Physical models require meteorological forecasts—generated by computationally expensive models—to predict solar irradiance, with limited accuracy in sub-daily predictions. Statistical models leverage in-situ measurements which require expensive equipment and do not account for meso-scale atmospheric dynamics. We address these fundamental gaps by developing a convolutional global horizontal irradiance prediction model, using convolutional neural networks and publicly accessible satellite cloud images. Our proposed model predicts solar irradiance in 12 different locations in the US for various prediction time horizons. Our model yields up to 24% improvement in an hour-ahead predictions and 26% in a day-ahead predictions compared to a persistence forecast. Moreover, using saliency maps and target-location-focused cropping, we demonstrate the benefits of incorporating meso-scale atmospheric dynamics for prediction performance. Our results are critical for energy systems planners, utility managers and electricity market participants to ensure efficient harvesting of the solar energy and reliable operation of the grid.


Introduction
Harnessing solar energy is essential for paving the path toward energy sector decarbonization, a necessary step in reducing the adverse environmental and health impacts of fossil-fuel-based electricity production and decelerating climate change (Rai and Sigrin 2013). Global photovoltaic solar energy production has increased from 994 GWh in 2000, to 443 554 GWh in 2017 (IEA 2019). This rapid increase in capacity requires accurate forecasting of solar production at all levels, from rooftop to utility-scale deployment, to ensure smooth integration and reliable operation of the grid (Lorenz et al 2014, Golestaneh et al 2016. Short-term, hour-and day-ahead forecasts of renewable energy are critical for the efficient and cost effective operation of the electricity grid (Rachunok et al 2020). Specifically, short-term solar power plant predictions allow utilities and electricity market operators to make informed decisions for scheduling reserve capacity and designing efficient bidding strategies for hour-ahead and day-ahead wholesale power markets. Medium-and long-term forecasts are vital for understanding the strategic value of solar panel deployment from small-scale roof tops to grid-scale solar projects (Antonanzas et al 2016, Kaur et al 2016. Solar energy production is predominantly a function of the characteristics of the solar panel as well as the amount of solar radiation captured (Antonanzas et al 2016). Therefore, access to accurate solar irradiance forecasts is necessary for energy systems professionals to be able to estimate solar power production at various future time horizons. Solar irradiance is typically estimated via the global horizontal irradiance (GHI), which measures the total radiation (in W m −2 ) from the Sun on a horizontal surface on the earth (Sengupta et al 2018). As such, GHI prediction is an integral component of forecasting solar energy output.
Previous work on predicting GHI can be classified into physical and statistical models (Antonanzas et al 2016). Physical models are based on theoretical understandings of light transmission to estimate irradiance (AghaKouchak and Nakhjiri 2012). The simplicity of theoretical transmission models makes them highly generalizeable, as irradiance is calculated based on information such as wind speed, oxygen levels, ground angle, cloud coverage, and elevation (Antonanzas et al 2016). However, these theoretical models lack the specificity required for accurate GHI forecasting (Dolara et al 2015). Another class of physical models is based on the numerical weather prediction methods that use computationally expensive simulations, which are typically run on supercomputers, with limited accuracy in sub-daily predictions (Letendre et al 2014).
Statistical models learn from historical observations and do not leverage physics-based knowledge. Instead, they are empirically driven, and predict future GHI values using statistical learning methods trained on historical data (Yang et al 2018). Statistical solar irradiance prediction methods can be sub-categorized into two paradigms: endogenous and exogenous. Endogenous models generally require ground GHI data as input, as measured by both a pyranometer and a pyrheliometer (Fuquay andBuettner 1957, Kerr et al 1967). Due to requiring the physical deployment of ground instrumentation, endogenous statistical models lack the ability to spatially generalize to locations without in-situ ground GHI measurements. Contrary to endogenous models, exogenous models utilize non-GHI data sources to predict GHI values. The current state-of-the-art in exogenous modeling uses total-sky imager-ground captured sky images (Feng and Zhang 2020). This procedure takes pictures from the ground at a specific location and utilizes a convolutional neural network to analyze captured image data to predict GHI. Other examples include recent works by Feng et al (2017), Crisosto et al (2018), Kamadinata et al (2019), Ryu et al (2019), Jiang et al (2020), Le Guen and Thome (2020), all of which involve training accurate artificial neural networks (ANNs)-based models, using ground-based cloud images, to predict GHI (the two papers are hybrid models which use endogenous GHI data as well as exogenous cloud images). However, the prediction time horizons for all these works range from 0 to a maximum of 1 h. Moreover, while classified as exogenous, this approach still requires in-situ hardware deployment to photograph the sky. Thus, it is subject to the same limitations as endogenous predictions.
Moreover, GHI is affected by both point-specific and meso-scale dynamics in atmospheric dynamics conditions (Fuquay and Buettner 1957). The majority of the current GHI prediction models do not incorporate micro-and meso-scale atmospheric dynamics at a wide scale (Antonanzas et al 2016). A notable exception is a recent model proposed by Jiang et al (2019) which incorporates meso-scale atmospheric dynamics together with point-specific locational and temporal information to estimate hourly global solar radiation in China, utilizing a deep neural network. However, their estimates are based on realtime input data and thus not suitable for hour-and day-ahead forecasting purposes. Koo et al (2020) also harnesses an exogenous approach for GHI prediction. Specifically, they train an ANN model using satellite images together with solar zenith angles and hour angles to make hourly estimations of solar energy in Korea. However, their estimation is for hourly sums instead of point estimations, and it also requires realtime data.
In this paper, we propose an exogenous approach: the convolutional global horizontal irradiance (C-GHI) model to predict GHI based on freely available cloud images obtained from geostationary satellites. This approach requires no physical instrumentation, and is easily generalizable across regions. C-GHI is based on convolutional neural networks-a nonlinear deep learning technique that can learn complex information from image data. C-GHI overcomes the limitations of current statistical and physical models by accurately predicting GHI using entirely exogenous data-enabling a fully remote, worldwide GHI estimation technique, with prediction quality similar to in-situ methods. Moreover, using datapreprocessing and model inferencing techniques in a novel way, we demonstrate the benefits of leveraging a holistic approach and accounting for meso-scale dynamics in solar irradiance predictions, and illustrate model accuracy trends as a function of variable temporal prediction horizons. The structure of the paper is as follows. Section 2 summarizes the input data used in the analysis. Section 3 outlines our proposed approach. Finally, results and conclusions are presented in sections 4 and 5, respectively.

Data
The data utilized in this study consists of satellite images of the continental United States (CONUS) and GHI measurements from the Cooperative Network for Renewable Resource Measurements (CON-FRRM). Section 2.1 describes the satellite imagery data, section 2.2 describes the solar irradiance data, and section 2.3 outlines pre-processing of the data for the analysis.

Satellite imagery
We used geostationary satellite imagery captured by NOAA's Geostationary Operational Environmental Satellite 8 (GOES8) infrared channels. The satellite imagery data is maintained by the Satellite Data Services (SDS) group at the University of Wisconsin-Madison Space Science and Engineering Center. The data was collected from Multi-format  Client-agnostic File Extraction Through Contextual HTTP, a web-based API, maintained by SDS (SSEC UW-Madison 2020). Each GOES 8 image has a 2125 × 825 pixel resolution per channel. In this study, 4 infrared channels are used: channels 2-5, which operate on wavelengths: 3.9, 6.8, 10.7, and 12 µm. The spatial resolutions vary between 4 and 8 km. Figure 1 shows a sample of the input data from all 4 channels.

Solar irradiance
We collected solar irradiance data for 12 different stations in 7 states throughout the CONUS data have a 5 min temporal resolution, with the exception of Cape Canaveral FL data, with a 6 min temporal resolution. The geographic characteristics of the study sites are summarized in table 1 and figure 2, respectively. Moreover, the correlation of GHI between different case study sites are depicted in figure 3. Data is available from the start of 1996 until the end of 2012. To demonstrate the applicability of our proposed approach, we train and validate our models using data from 1999 to 2001 as it is the time period with the most complete data across all the sites. However, the model can easily be extended to more recent years, which will likely result in higher predictive accuracy owing to improved resolution of satellite imagery over time.

Methods
The proposed C-GHI prediction model is grounded in convolutional neural networks. In this section, we provide a brief overview of the model formulation (section 3.1), the technical details needed to replicate our experiment (section 3.2), model inferences (section 3.3), and measures of model performance (section 3.4).

Convolutional neural networks
ANNs are a statistical learning technique inspired by the way neurons work in the human brain. They consist of a series of neurons (i.e. nodes) which activate each other in parallel or in sequence. Each neuron and its connections are called a perceptron. Mathematically, this consists of a combination of multivariate linear regressions followed by a non-linear transformation. The non-linear transformation-called an activation function-is typically applied on the output of a perceptron. A 'vanilla' version of a neural network algorithm consists of multiple layers of perceptrons. A neural network can learn the complex non-linear mapping between input and target variables by stacking multiple non-linear functions. A mathematical representation of a layer of a fully connected network can be written as: where f i (X) is the output of ith layer given input X; W i and b i are the weight tensor and bias of ith layer respectively; and a(·) is a non-linear activation function.
The output of the last layer (f N (X)) is the prediction value given an input X, where N is the number of hidden layers of a model. A neural network with more than 2 hidden layers is considered a 'deep' network. The training of a neural network is done by finding the weights which minimize the distance between the prediction values and the ground-measured observations (distance is represented by a loss function).
Convolutional neural networks (CNNs) are a class of deep neural network commonly used for image analysis (Bishop 2006). CNNs take in an input image, assign importance to visual features in the image, and ultimately make predictions about the image's contents. They are typically used for image classification, object detection and segmentation , and have been used in several environmental applications including urban expansion detection (He et al 2019), biophysical modeling of crop growth (Lin et al 2020), and environmental and water management (Sun and Scanlon 2019).
CNNs differ from ANNs in the way in which the layers connect. Contrary to ANN, in a CNN, a neuron of a convolutional layer takes only a portion of the output of the previous layer as its input; this portion is called a receptive field. Within a layer, each neuron has a similar sized receptive field generated with the same weights and biases, called a kernel. Convolutional layers generally utilize a linear kernel followed by a non-linear activation. To generate the output of the layer, it slides the kernel throughout the input. This sliding is called a convolution operation. A formula of a 2D convolutional layer can be written as: ( 2) where (x, y) is one entry in the output tensor of (i + 1)th layer, h is a two-dimensional kernel consisting of |{t x }| × |{t y }| weights (which are shared in the layer for every (x, y)), f i is the output of ith layer, and a(·) is an activation function. As a way to maintain data dimensions, padding (usually zero padding) and stride values can be adjusted. Padding refers to adding extra (non-informative) pixels around the input images to keep size of output after a convolution, and stride refers to a step size that a kernel slides. A pooling layer, which performs a similar operation to a convolution but applies predefined summary function instead of convolution kernel, may also be applied after a convolutional layer to reduce dimension and reduce over-fitting. Two most common pooling methods are max-pooling and averagepooling. Max-pooling takes a maximum function for the summary function and average-pooling takes average function. As a result of this convolution operation, the (x, y)'s value of the output tensor can hold the information of multiple neighboring pixels in the input image (i.e. receptive field). By stacking multiple convolutional layers, each neurons of the output of the last convolutional layer encompasses a very large receptive field with respect to the input image.

Model specification
The CNN model structure used in our proposed C-GHI model is based on the VGG16 model (Simonyan and Zisserman 2014), with adjustments made for the number of layers and regularization parameters for each study site. We build separate models for each region and time horizon, using the same network architecture briefly described below. The convolutional layers receive the input image in series, each of which with filter size 32, 64, 64, 128, 128, and, 128. The convolutional kernel size is 3, and the stride is 1, in all layers. Max-pooling is applied after every other layer with kernel size 3, 3, and 2 respectively. After the convolutional layers, the data is passed through three fully connected layers-with 1024, 1024, and 1 nodes respectively. Rectified linear unit, a type of non-linear activation function, is also applied for every layer except for the last fully connected layer. Additional L2-regularization with respect to all weights is added to the loss function to reduce the risk of over-fitting. In short, the final loss function is MSE + λ ∑ ∥W∥ 2 , where MSE stands for mean squared error. Lastly, Adam Optimizer is utilized (Kingma and Ba 2015) to tune the model weights, given the input dataset.

Feature visualization
As a posterior analysis, we apply feature visualization techniques to the fitted models to visualize model performance at different spatial and temporal scales. Specifically, we harness saliency maps for the model inference. Saliency maps help visualize the gradient of the cost function with respect to each pixel of the input (Simonyan et al 2013). In other words, saliency maps involve calculating [∂E j /∂X] X=X j , where E j is the prediction error of jth observation, X is the variable indicating an image, and X j is the jth image. The gradient is then projected onto the original image plane. Saliency maps (figure 4) show the effect of a unit change in a pixel on the prediction accuracy. If the gradient of a pixel is higher, it indicates that the pixel is more informative for the particular prediction.

Model performance
In our experiments, two model performance metrics are used: mean-squared error (MSE), and root mean-squared error (RMSE). For parameter tuning in fitting each model, we use MSE, mathematically represented as: where y i andŷ i are the ground-measured and predicted GHI of ith observation respectively. RMSE is used in comparing the predictive performance of different models. We report RMSE as it is a widely applied metric for solar irradiance prediction, particularly in exogenous predictive models. Additional performance metrics of MAPE, R 2 , and nRMSE are also calculated and included in the appendix tables A3-A5.
Since two different GHI prediction models cannot be directly compared if the prediction regions, time, and temporal horizons do not match, and there is no agreed upon benchmark dataset, we compare our models' performance against the persistence model which is represented as: whereŷ(t) is the predicted GHI at time t and y(t) is the observed GHI at time t, and T is the prediction interval. In essence, the persistence model assumes all future predictions will be equal to the last observed value at T time before.

Results and discussion
Comparing model performance of both the persistence and C-GHI prediction indicates that the C-GHI outperforms the persistence model in all but one location for day-ahead predictions, and in all hour-ahead predictions. The persistence model demonstrates better performance in 30 min predictions across all regions, which is not surprising given the temporal auto-correlation in GHI values. We hypothesize that the poor performance of the CNN-based model in the EP region 1 d ahead forecasting compared to the persistence is due to a 24 h lagged auto-correlation within the GHI data measured at the site. Results in units of RMSE are presented for all temporal horizons for four regions with minimal correlation in table 2, and a comparison between the C-GHI and persistence models are shown in table 3. Results for all locations are shown in appendix tables A6 and A7.
These results indicate that the RMSE of both the C-GHI and persistence model generally increase as the time horizon increases. However, the C-GHI model's performance remains largely consistent in farther-ahead predictions across all regions. In contrast, the persistence model's performance decreases significantly as the time horizon increases. In the BS region, for example, the persistence model's day-ahead RMSE is 64% greater than the 30 min prediction. By contrast, the C-GHI model's RMSE increases only by 0.6%.
The C-GHI outperforms the persistence model by an average of 14.39% in hour-ahead prediction, and 13.35% in day-ahead prediction across all locations. This indicates that, within a single region, the prediction error increases slowly as the temporal prediction horizon increases when utilizing a satellite image based model. We also see comparable results between the C-GHI and other state-of-the-art exogenous models. Direct comparisons between GHI prediction models is not possible because of differences in prediction regions, temporal horizons, and available data. Accordingly, we utilize percentage improvement over a persistence model to compare modeling techniques with different temporal and spatial characteristics. Using this metric, C-GHI model's performance is on par with current stateof-the-art exogenous models' performance in terms of percentage improvements from the persistence model.
We compare the C-GHI to two other comparable exogenous GHI prediction models. The first model, proposed by Feng and Zhang (2020), utilizes a totalsky image (thus not fully exogenous) and is trained on 3 years of data at NREL's South table Mountain Campus, Golden, Colorado. The 1 h ahead prediction performance compared to persistence is 34.02%. The best C-GHI performance for 1 h ahead prediction is 24.26% at CN region. Qing and Niu (2018) proposed a 1 d ahead time-series model to predict GHI using weather forecast datasets. Their model for Santiago in Cape Verde utilizes 30 months of data, which is a similar data size to our dataset, and showed 30.68% performance improvement over the persistence model. This is comparable to our best 1 d ahead model in BS region, which showed 25.52% improvement.

Model visualization
We hypothesize that the model prediction quality is the result of the C-GHI locating the prediction region and adjusting the size and scope of the input data. That is, given the whole US satellite imagery and  historical GHI data for a specific point, the learning process encourages the C-GHI to seek out the prediction location and a sufficient surrounding area to optimally map an input to an output. To test this hypothesis, we leverage saliency maps (introduced in section 3.3) to understand the geographic regions which most contribute to the GHI prediction. An example is illustrated in figure 4. In these maps, lighter colors indicate regions of higher gradients with respect to the prediction and thus a greater contribution to GHI prediction. In each location, the brighter region becomes wider and more diffuse around the correct region as the prediction horizon grows.

Deterministic cropping
The C-GHI demonstrates high predictive accuracy when the entire CONUS imagery is utilized to predict local GHI values. In order to understand which areas of the country contribute to a particular regional GHI prediction, we perform the same analysis, but crop the input images to a specific window around the prediction location. We apply fixed-location deterministic cropping as a data preprocessing step, given the target prediction region's location. By fitting and comparing models for different cropping sizes and prediction horizons, we aim to understand the extent to which increased regional information improves prediction accuracy across different prediction horizons.
What follows is an example of the cropping procedure at the AU location for one channel. First, we calculate the location of the AU site (lat, lon) = (30.29 • N, 97.74 • W) within the satellite imagery. Each image is then centered around the AU site and cropped into squares with side lengths of: 10,20,40,60,100,140,180,220,260,300,360,420, and 512 pixels. All cropped images are resized to 256 × 256 using bicubic interpolation to preserve the network architecture and maintain the number of parameters within the C-GHI. Finally, each model is separately trained per cropping window size and time horizon using the same dataset. We show the results of the predictions in figure 5.
Results indicate that prediction error decreases rapidly in 5 min and 1 h prediction as a result of increasing cropping window size when window size is small, and prediction error levels off as the window size becomes very large. Additionally, the effect of increasing the cropping window size become weaker in longer-term prediction. In figure 5, for 5 min ahead prediction in AU, the cropped-image model surpasses the error of full-size-image model at approximately 30 000 pixels-the result of a cropping window of 180 × 180. For 1 h ahead prediction, this threshold increases to 150 000 pixels (roughly 380 × 380). For 1 d ahead prediction, the threshold is approximately 230 000 pixels (480 × 480). As all cropped-image models surpass the predictive quality of full-sizeimage models, we conclude that regional image cropping is beneficial to GHI prediction, particularly in short-term prediction.
Additionally, we find that the predictive performance of models with noisy saliency maps (i.e. 5 min ahead prediction in AU and CL) is substantially improved when input images are cropped. We believe this is due to the model identifying the prediction location more accurately in longer temporal horizons and the the wider regional cloud information surrounding the target point which can be incorporated. For example, the 5 min ahead prediction in AU had an 8.2% improvement by the method at window size 300 × 300, while the 5 min ahead prediction in BC had only a 4.0% improvement even at 512 × 512.
We can see that in figure 4 the saliency map of AU in the 5 min ahead prediction is noisier than that of BC. The ability of the CNN-based model to maintain predictive accuracy at increased time horizons combined with the benefits to model accuracy gained by cropping the input image reveals that the C-GHI is accurately capturing the regional meteorological patterns which contribute to GHI values. The cropping window size required to achieve similar performance to an uncropped model increases as the prediction horizon increases.

Weather condition effects
Lastly, we test the sensitivity of C-CHI's prediction performance to the degree of cloud cover, focusing on the best performing regions of ED and BC. ED, located in McAllan, Texas, has a semi-arid climate under the Köppen climate classification; it has very hot and humid summers and short and warm winters. BC, located in Daytona Beach, Florida, has a humid subtropical climate with warm and wet summers and cooler and drier winters. These two regions have similar climatological characteristics, but ED is drier and hotter.
To assess the sensitivity of model performance to degree of cloud cover, we first label the days in the model as 'sunny' , 'partly cloudy' , and 'cloudy' . The labels are chosen based on data from the Weather Underground (The Weather Underground 2020). We looked at daily rather than hourly weather data, following the convention in the solar irradiance prediction literature. The cloud cover labels are selected based on the following criteria: (i) 'sunny' represents days with more than 90% clear sky conditions, or with more than 80% clear skies, with the remaining being partly cloudy; (ii) 'cloudy' represents days with either no clear sky conditions, or when more than 70% of the day is mostly cloudy and/or cloudy; and (iii) 'partly cloudy' represent days that are neither completely sunny nor cloudy. Table 4 shows the test set performance changes over different cloud cover conditions and prediction time horizons. Results indicate that as the prediction time horizon increases, C-GHI performs better than the persistence model, especially in cloudy and partly cloudy days. We find that persistence model performs well in day-ahead predictions during sunny days, possibly due to high auto-correlation in consecutive sunny days. We also find that predictions with longer lead times tend to overestimate irradiance in cloudy days and underestimate irradiance in sunny days to minimize the total sum of squared error (figure 6). We hypothesize that this helps C-CHI (b) Figure 6. Scatterplots of ground-measured and predicted GHI. The images in row (a) are for the ED region and the images in row (b) are for the BC region; from left to right, the images represent models' fits for sunny, partly cloudy, and cloudy days.
to better fit partly cloudy days which are most frequent in both regions, thus the overall performance benefits.
Assessing the C-CHI's prediction performance as a function of cloud types is not within the scope of this paper. However, it should be highlighted that future work analyzing the effect of cloud types on CNNbased solar irradiates prediction performance will be of great value to renewable energy planners, operators and regulators.

Conclusion
We propose a novel solar irradiance prediction model, using convolutional neural networks and cloud imagery. The proposed C-GHI is entirely exogenous, predicting solar irradiance based on freelyavailable satellite imagery of clouds. The model's performance is tested throughout various prediction horizons and compared with the persistence model across different regions. The experiment results show that the C-GHI model is highly generalizable and is robust against changes in prediction horizon and locations. The persistence model outperforms C-GHI for sub-hourly predictions, owing to the temporal auto-correlation in the GHI data. However, C-GHI's predictive performance remains steady with increasing lead times, while the performance of the persistence model decreases significantly as the prediction horizons increase. C-GHI outperforms the persistence model by up to 26% in day-ahead prediction.
Using saliency maps as a inferencing tool, we find that model's improved accuracy can be attributed to the holistic approach of the CNN which includes region specific information and surrounding regions' information which are particularly helpful for longterm prediction. Additionally, we test image cropping as a method for data preprocessing and find it improves model performance in all prediction horizons, and is extremely beneficial in short-term GHI prediction. When an appropriate window size is selected, the cropped model outperforms the model with full CONUS imagery.
While our proposed holistic approach yields high accuracy, there are additional avenues for model improvements, for example by incorporating information such as wind speed and direction. Including such environmental factors could lead to higher prediction accuracy while still providing a fully-exogenous GHI prediction framework. Future approaches could add such temporal information through the use of time-series methods such as recurrent neural networks . Moreover, future extensions of this work that accounts for the influence of cloud types, collected through ceilometers, on the performance of irradiance prediction models will of great interest to renewable energy policymakers and practitioners.
In summary, the improved accuracy in predicting GHI provided by the C-GHI combines the benefits of physical GHI prediction models-namely, no need for in-situ measurement-with the accuracy of statistical models. This flexibility can allow for a low-cost exploration of PV siting locations at both a household and utility scale. More accurate GHI forecasts can also help utility-scale integration of renewable energy into the electricity grid. The uncertainty in day-ahead renewable energy production is a major hurdle to the cost-effective usage of renewable energy, and improving the accuracy of solar energy production forecasts will be vital to effectively utilizing renewable resources.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: www.nrel.gov/grid/solar-resource/confrrm.html.