A Linear Regression Model for Global Solar Radiation on Horizontal Surfaces at Warri, Nigeria

The growing concern about the negative effects of fossil fuels on the environment, together with global emission reduction targets, calls for more extensive use of renewable energy alternatives. Efficient solar energy utilization is an essential solution to the high atmospheric pollution caused by fossil fuel combustion. Global solar radiation (GSR) data, which are useful for the design and evaluation of solar energy conversion systems, are not measured at the forty-five meteorological stations in Nigeria. The dearth of measured solar radiation data calls for accurate estimation. This study proposes a temperature-based linear regression model for predicting the monthly average daily GSR on horizontal surfaces at Warri (latitude 5.02°N and longitude 7.88°E), an oil city located in the south-south geopolitical zone of Nigeria. The proposed model is analyzed using five statistical indicators (coefficient of correlation, coefficient of determination, mean bias error, root mean square error, and t-statistic) and compared with the existing sunshine-based model for the same location. The results indicate that the proposed temperature-based linear regression model could replace the existing sunshine-based model for generating global solar radiation data.


Introduction
Global warming poses serious threats to the ecosystem. The change in the earth's climate is the result of increasing concentrations of greenhouse gases (GHGs), released into the atmosphere primarily by fossil fuel combustion. Besides the abrupt changes in the earth system, global warming affects human societies and ecosystems in a variety of ways (Kamman 2001; IPCC 2007), including adverse impacts on water availability, food supply, coastal safety, and human health. A typical example of the impact of climate change is the recent flooding in Nigeria, which claimed lives, displaced citizens and destroyed property (estimated at over one billion naira). In addition, the ecological problems associated with the extraction of fossil fuel from the Niger Delta region of Nigeria have threatened the existence of its inhabitants (Obi 2009).
The growing concern about the impact of fossil fuel combustion on the environment, and the global effort to mitigate its devastating effects, call for more extensive use of renewable energy sources. Renewable energy resources are usually considered an essential solution to the high atmospheric pollution from fossil fuel combustion (Cristobal 2011).
Global Solar Radiation (GSR) is the sum of the direct and the diffuse solar radiation on a surface. The direct solar radiation is the solar radiation received from the sun without having been scattered by the atmosphere. The diffuse solar radiation is the solar radiation received from the sun with its direction changed due to scattering by the atmosphere. Knowledge of the available GSR is of fundamental importance for utilizing solar energy economically and efficiently, and is essential for the design and evaluation of solar energy conversion systems. The direct normal irradiation (DNI) is of interest in concentrating solar thermal installations and installations that track the position of the sun. GSR data are not measured at the forty-five meteorological stations in Nigeria (NIMET 2012). In the absence of these data, one has to rely on available methods of estimation and also develop new ones. The availability of such data will encourage analysis of possible applications and the efficient utilization of solar energy, in order to control GHG emissions.
Several authors (Ulgen & Hepbasli 2002; Falayi & Rabiu 2005; Falayi et al. 2008; Augustine & Nnabuchi 2009; Okundamiya & Nzeako 2010) have demonstrated the significance of regression theory in the estimation of global solar radiation on a horizontal surface. Ulgen & Hepbasli (2002) correlated solar radiation parameters (global and diffuse solar radiation) with ambient temperature in fifth-order form for the city of Izmir in Turkey. Falayi & Rabiu (2005) and Augustine & Nnabuchi (2009) applied the Angstrom-type model and correlated global solar radiation with relative sunshine duration in a simple linear regression form. Falayi et al. (2008) based their studies on the correlation between global solar radiation and other meteorological parameters such as sunshine duration, ambient temperature and relative humidity, using data collected from Iseyin, Nigeria. Okundamiya & Nzeako (2010) proposed a temperature-based model for estimating the global solar radiation on horizontal surfaces for Abuja, Benin City, Katsina, Lagos, Nsukka and Yola, representing the six geopolitical zones in Nigeria. Although Augustine & Nnabuchi (2009) had previously proposed a relationship (based on a correlation between global solar radiation and sunshine hours) for estimating global solar radiation for the city of Warri in Nigeria, this method needs improvement to support optimal economic sizing of solar energy conversion systems at that location. The developed model will provide a comprehensive database for the solar energy potential in Warri. The main objective of this research is to develop a simple and accurate model for predicting the monthly average daily global solar radiation on horizontal surfaces in Warri using linear regression theory. Statistical analysis is also performed on the proposed model to determine its predictive ability for generating global solar radiation data.

Site Description and Data
Warri is a leading oil city in Delta State, located (latitude 5.02°N and longitude 7.88°E) in the south-south geopolitical zone of Nigeria. The area, surrounded by tropical rain forest and swamp, experiences two distinct seasons: the dry and the rainy seasons. The dry season lasts from about November to April and is marked by the cold, dusty 'harmattan' haze brought by the north-east trade winds. The rainy season spans from May to October, with a brief break in August. The area is characterized by a tropical-equatorial climate with an average annual temperature of about 32.8 °C and average annual rainfall of about 2673.8 mm. The natural vegetation is rainforest, with swamp forest in some areas. The forest is rich in timber, palm and fruit trees.
Twenty-two years (July 1983 to June 2005) of monthly average daily data on global solar radiation on a horizontal surface, and the corresponding maximum and minimum air temperatures, were obtained for Warri from the archives of the National Aeronautics and Space Administration (NASA). NASA derives these datasets from a variety of earth-observing satellites and reanalysis research programs, which provide reliable meteorological resource data over regions where surface measurements are scarce or nonexistent, and which have two unique features: the data are global and, in general, contiguous in time (NASA 2011). Fig. 1 shows the correlation of the monthly average daily values of GSR on a horizontal surface and maximum air temperature.
The existing sunshine-based model (Augustine & Nnabuchi 2009) for predicting the monthly average daily global solar radiation on a horizontal surface at Warri is an Angstrom-type linear regression of $H/H_o$ on the relative sunshine duration $n/N$ (Equation 1), where $H$ is the monthly average daily global solar radiation on a horizontal surface, $H_o$ is the monthly average daily extraterrestrial radiation on a horizontal surface, $n$ is the monthly average daily bright sunshine hours, and $N$ is the maximum possible monthly average daily sunshine hours.

Fig. 1 Correlation of a twenty-two year monthly average daily dataset of GSR on a horizontal surface (kWh/m²/day) and maximum air temperature (°C) for Warri, Nigeria.
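The quantities $H_o$ and $N$ that appear in the sunshine-based model are purely geometric and can be computed from latitude and day of year. The following is a minimal sketch under stated assumptions: it uses the standard textbook expressions for solar declination, sunset hour angle and daily extraterrestrial radiation, and a solar constant of 1.367 kW/m². None of these formulas, the constant, the function names or the choice of day 17 as a representative January day are taken from this paper, whose $H_o$ values may have been obtained differently.

```python
import math

G_SC = 1.367  # solar constant in kW/m^2 (assumed value, not from the paper)


def declination_deg(day_of_year):
    """Solar declination in degrees (Cooper's approximation)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def sunset_hour_angle_deg(lat_deg, day_of_year):
    """Sunset hour angle in degrees for a horizontal surface."""
    delta = math.radians(declination_deg(day_of_year))
    x = -math.tan(math.radians(lat_deg)) * math.tan(delta)
    x = max(-1.0, min(1.0, x))  # clamp so polar day/night does not break acos
    return math.degrees(math.acos(x))


def max_sunshine_hours(lat_deg, day_of_year):
    """Maximum possible sunshine duration N in hours."""
    return 2.0 / 15.0 * sunset_hour_angle_deg(lat_deg, day_of_year)


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation H_o on a horizontal surface (kWh/m^2/day)."""
    phi = math.radians(lat_deg)
    delta = math.radians(declination_deg(day_of_year))
    omega_s = math.radians(sunset_hour_angle_deg(lat_deg, day_of_year))
    # Eccentricity correction for the varying earth-sun distance
    e0 = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    geometry = (math.cos(phi) * math.cos(delta) * math.sin(omega_s)
                + omega_s * math.sin(phi) * math.sin(delta))
    return 24.0 / math.pi * G_SC * e0 * geometry


# Latitude quoted in the text for Warri; day 17 is an assumed mid-January day.
warri_lat = 5.02
h_o_jan = extraterrestrial_radiation(warri_lat, 17)
n_max_jan = max_sunshine_hours(warri_lat, 17)
print(f"H_o = {h_o_jan:.2f} kWh/m^2/day, N = {n_max_jan:.2f} h")
```

Monthly averages would then be taken over the days of each month, or approximated by one representative day per month.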

Theory
Given $P$ data points $(x_p, y_p)$, $p = 1, 2, \dots, P$, in which $y_p$ depends linearly on $x_p$, accurate prediction requires the slope $m$ and the intercept $m_o$ of the line that best fits the data. For $j$ independent variables, the best-fit relation takes the general form

$$y = m_o + m_1 x_1 + m_2 x_2 + \dots + m_j x_j \qquad (4)$$

where $x_i$ ($i = 1, 2, \dots, j$) are the independent variables, $j$ is the number of independent variables, $m_i$ are the coefficients that minimize the sum of squares of the differences between the data and the estimated line, and $R$ is the coefficient of correlation between the data and that line.
This study intends to develop an easy and accurate method of predicting the monthly average daily global solar radiation on a horizontal surface. As such, it assumes two independent variables ($j$ in Equation 4 equals 2). If the dependent variable of Equation 4 is the clearness index $K$ and the independent variables are the monthly average daily air temperature ratio $R_T$ and the monthly average daily maximum air temperature $T_{\max}$, then

$$K_p = m_o + m_1 R_{T,p} + m_2 T_{\max,p} \qquad (5)$$

where the subscript $p$ (= 1, 2, ..., 12) refers to the monthly average daily dataset for a typical year. Expressing Equation 5 in matrix form,

$$K = A M \qquad (6)$$

where $K$ is a $P \times 1$ column matrix, $A$ is a $P \times 3$ matrix and $M$ is a $(j + 1) \times 1$ matrix. Because $A$ is not square ($P \neq j + 1$), solving for $M$ requires premultiplying both sides of Equation 6 by the transpose of $A$:

$$A^T K = A^T A \, M \qquad (7)$$

where $A^T$ is the transpose of $A$. If $(A^T A)^{-1}$ exists, then

$$M = (A^T A)^{-1} A^T K \qquad (8)$$

Since the clearness index is

$$K = \frac{H}{H_o} \qquad (9)$$

the model becomes

$$\frac{H}{H_o} = m_o + m_1 R_T + m_2 T_{\max} \qquad (10)$$

where $H$ is the predicted monthly average daily global radiation on a horizontal surface (kWh/m²/day), $H_o$ is the monthly average daily extraterrestrial radiation on a horizontal surface (kWh/m²/day), $R_T$ is the monthly average daily air temperature ratio, $T_{\min}$ is the monthly average daily minimum air temperature (°C), $T_{\max}$ is the monthly average daily maximum air temperature (°C) and $m_i$ ($i = 0, 1, 2$) are the empirical constants. The availability of the input parameters informed the choice of Equation 10: they are easily obtained and, moreover, are measured at the forty-five (45) meteorological stations in Nigeria.
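The least-squares solution $M = (A^T A)^{-1} A^T K$ described above can be illustrated with a short NumPy sketch. It is not the authors' MATLAB code. It assumes that the clearness index is $K = H/H_o$ and that the temperature ratio is $R_T = T_{\min}/T_{\max}$; the function names, the twelve monthly input values and the constants `true_m` are all hypothetical, invented only so that the fit can be checked against a known answer.

```python
import numpy as np


def fit_temperature_model(h, h_o, t_min, t_max):
    """Return M = [m_o, m_1, m_2] for K = m_o + m_1*R_T + m_2*T_max."""
    h, h_o, t_min, t_max = (np.asarray(v, dtype=float)
                            for v in (h, h_o, t_min, t_max))
    k = h / h_o          # clearness index K (ASSUMPTION: K = H / H_o)
    r_t = t_min / t_max  # ASSUMPTION: R_T = T_min / T_max
    a = np.column_stack([np.ones_like(k), r_t, t_max])  # P x 3 matrix A
    # Normal equations (A^T A) M = A^T K, solved without forming the inverse
    return np.linalg.solve(a.T @ a, a.T @ k)


def predict_gsr(m, h_o, t_min, t_max):
    """Predicted H = H_o * (m_o + m_1*R_T + m_2*T_max)."""
    h_o, t_min, t_max = (np.asarray(v, dtype=float)
                         for v in (h_o, t_min, t_max))
    return h_o * (m[0] + m[1] * t_min / t_max + m[2] * t_max)


# Hypothetical 12-month inputs, invented for illustration only.
t_max = [31.5, 32.8, 32.6, 32.0, 31.0, 29.6, 28.3, 28.4, 29.3, 30.4, 31.6, 31.4]
t_min = [22.0, 23.5, 24.1, 24.0, 23.6, 23.0, 22.6, 22.5, 22.7, 23.0, 23.2, 22.1]
h_o = [9.6, 10.1, 10.4, 10.3, 9.9, 9.6, 9.7, 10.0, 10.3, 10.1, 9.7, 9.4]
true_m = np.array([0.9, -1.2, 0.015])  # made-up constants, not the paper's

# Synthetic "observations" generated from the made-up constants
h = predict_gsr(true_m, h_o, t_min, t_max)

m = fit_temperature_model(h, h_o, t_min, t_max)
print("fitted constants:", m)
```

Solving the normal equations with `np.linalg.solve` mirrors the derivation in the text; for poorly conditioned data, `np.linalg.lstsq` applied directly to `a` and `k` is usually the more robust choice.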

Simulation
Computer codes developed in the MATLAB programming language compute the empirical constants of Equation 10 using the 15 years (July 1983 to June 1998) of monthly average daily data discussed above. The performance evaluation of the estimation models (proposed and existing) uses the remaining 7 years (July 1998 to June 2005) of monthly average daily data. The t-statistic (t-value) is computed in terms of the mean bias error (MBE) and root mean square error (RMSE) and analyzed at the 95% confidence level. The t-statistic is a well-known statistical tool (Stone 1993; Okundamiya & Nzeako 2010) with satisfactory accuracy for analyzing solar radiation data. The coefficients of determination ($R^2$) of the estimation models are obtained using Microsoft Excel 2007. Detailed analysis of MBE, RMSE, and t-value is given in the literature (Okundamiya & Nzeako 2010).
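The three error indicators can be sketched in a few lines of Python. The formulas are not given in this paper. The sketch assumes the commonly used definitions, with MBE taken as the mean of predicted minus observed values (so that a positive MBE means overestimation) and the t-statistic in the form usually attributed to Stone (1993), $t = \sqrt{(P - 1)\,\mathrm{MBE}^2 / (\mathrm{RMSE}^2 - \mathrm{MBE}^2)}$. The function names and the four sample values are hypothetical.

```python
import math


def mean_bias_error(predicted, observed):
    """MBE: mean of (predicted - observed); positive means overestimation."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def root_mean_square_error(predicted, observed):
    """RMSE: square root of the mean squared error."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def t_statistic(predicted, observed):
    """t = sqrt((P - 1) * MBE^2 / (RMSE^2 - MBE^2)) (assumed form)."""
    p = len(observed)
    mbe = mean_bias_error(predicted, observed)
    rmse = root_mean_square_error(predicted, observed)
    spread = rmse ** 2 - mbe ** 2  # variance of the errors
    if spread <= 0.0:
        # All errors are identical, so the statistic is undefined
        raise ValueError("t-statistic undefined: errors have zero spread")
    return math.sqrt((p - 1) * mbe ** 2 / spread)


# Hypothetical values in kWh/m^2/day, for illustration only.
observed = [4.0, 4.5, 5.0, 4.2]
predicted = [4.1, 4.4, 5.2, 4.2]

mbe_value = mean_bias_error(predicted, observed)
rmse_value = root_mean_square_error(predicted, observed)
t_value = t_statistic(predicted, observed)
print(mbe_value, rmse_value, t_value)
```

Because the denominator is the variance of the errors, a model with a small bias and a wide scatter of errors yields a small t-value, while a model with a consistent bias yields a large one.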

Results Analysis and Discussion
The proposed temperature-based linear regression model for Warri (Equation 11) is obtained by fitting Equation 10 to the 15-year monthly average daily dataset of global solar radiation on a horizontal surface and air temperatures (minimum and maximum). Fig. 2 shows the correlation between the observed and predicted (Equation 11) values of the global solar radiation at Warri using the 7-year monthly average daily data.

Comparison of Results
It is pertinent to compare the observed values with those of the estimation models (Equations 1 and 11) in order to determine their performance. The result of this comparison (as presented in Table 1 and Fig. 3) will determine the predictive accuracy of the proposed model.

Table 1 Comparison of predicted (based on the existing and proposed models) values of the monthly average daily global solar radiation on a horizontal surface at Warri using the 7-year monthly average daily dataset

Analysis and Discussion
The MBE and RMSE, which measure the accuracy of estimation, indicate the long-term and short-term performance of the model studied, respectively. Almorox et al. (2005) and Che et al. (2007) recommend that a zero value for MBE is ideal; however, overestimation of an individual data element will cancel underestimation in a separate observation. In addition, low RMSE values are desirable (Almorox et al. 2005; Che et al. 2007), but a few large errors in the sum can cause a significant increase in the indicator. It is possible to have a large RMSE value and, at the same time, a low MBE value. Therefore, the MBE and the RMSE statistical indicators alone are not sufficient for the evaluation of solar radiation model performance (Okundamiya & Nzeako 2011). This informs the use of the t-statistic test indicator.
The t-statistic test is chosen because it allows models to be compared and, at the same time, indicates whether a model's estimates are statistically significant at a given confidence level. It takes into account the dispersion of the results and can be computed in terms of RMSE and MBE. The smaller the t-value, the better the performance of the model. To determine whether the estimates are statistically significant, the critical t-value ($t_c$) at the 95% confidence level and ($P - 1$) degrees of freedom is obtained from standard statistical tables.
The following assertions could be deduced from a study of the results presented in Table 1.
• A value close to unity (i.e., 0.926) of the coefficient of determination ($R^2$) of the proposed model indicates a satisfactory agreement of the predicted with the observed values of GSR. The higher value (0.926 compared to 0.795) of the coefficient of determination indicates that the mapping accuracy of the proposed model in predicting global solar radiation is over 13% better than that of the existing model (see Fig. 3).
• The MBE value obtained is positive for the proposed model and negative for the existing model. This suggests that the proposed model overestimates GSR while the existing model underestimates it. However, the higher magnitude of the MBE value from the existing model indicates significant (high) underestimation, whereas the proposed model shows little (insignificant) overestimation.
• The RMSE value of the proposed model is lower than that of the existing model. This indicates a better accuracy of estimation by the proposed model.
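As a quick check on the comparison of the two coefficients of determination quoted above, the absolute and relative differences work out as follows (a worked arithmetic sketch using only the two values in the text):

```latex
\[
0.926 - 0.795 = 0.131,
\qquad
\frac{0.926 - 0.795}{0.795} \approx 0.165
\]
```

Read as an absolute difference in $R^2$, the improvement is about 13 percentage points, which appears to be the sense of "over 13%"; expressed relative to the existing model's value, it is roughly 16.5%.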
For the model's estimates to be judged statistically significant at the ($1 - \alpha$) confidence level, the calculated t-value must lie within the interval defined by $-t_c$ and $t_c$ (the acceptance region under the reduced normal distribution curve); t-values outside the range of critical t-values indicate that the model has no statistical significance. The t-values of both models are within the range of critical t-values ($t_{c(11,\,0.025)} = 1.96$). However, the lower t-value of the proposed model (0.1004 compared to 1.3995) demonstrates that the predictive efficiency of the proposed model is better than that of the existing model.

The results presented in Fig. 3 validate the performance of the proposed temperature-based linear regression model for predicting the monthly average daily global solar radiation on a horizontal surface at Warri, Nigeria. The proposed estimates compare favourably with the observed values throughout the year, while estimates from the existing sunshine-based model compare favourably with the observed values only during part of the year (between April and the August break). However, to improve the accuracy of the estimated results, additional climatological parameters such as relative humidity, relative sunshine duration, solar declination and cloud cover should be included in the proposed model (Equation 11). This paper ignores these parameters because it aims at a simple and accurate method for estimating the monthly average daily global solar radiation on a horizontal surface.
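The acceptance test applied to the two t-values above can be written as a short check. This sketch uses only the three numbers quoted in the text (the critical value 1.96 and the t-values 0.1004 and 1.3995); the function and variable names are hypothetical.

```python
def within_acceptance_region(t_value, t_critical):
    """True if the t-value lies in the interval [-t_c, t_c]."""
    return -t_critical <= t_value <= t_critical


t_critical = 1.96  # critical value quoted in the text
t_values = {"proposed": 0.1004, "existing": 1.3995}  # quoted in the text

# Check each model against the acceptance region
accepted = {name: within_acceptance_region(t, t_critical)
            for name, t in t_values.items()}

# The model with the smallest |t| is ranked as the better performer
best_model = min(t_values, key=lambda name: abs(t_values[name]))

print(accepted)
print("lowest t-value:", best_model)
```

Both models pass the test, so the ranking rests on the magnitude of the t-value rather than on acceptance alone.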

Conclusions
This study employs regression analysis to propose a temperature-based linear regression model for predicting the monthly average daily global solar radiation on a horizontal surface at Warri (Nigeria), and the predictions are in satisfactory agreement with the observed values. The predictive efficiency of the proposed temperature-based model exceeds that of the existing model for the same location. The results suggest that the proposed temperature-based linear regression model for estimating global solar radiation data could replace the existing sunshine-based model. The estimation of the monthly average daily global solar radiation at Warri (necessary for optimal economic sizing of solar photovoltaic systems) will encourage extensive use of solar energy, which is considered an essential solution to the high atmospheric pollution caused by fossil fuel combustion.