A Comparison of Models for Estimating Solar Radiation from Sunshine Duration in Croatia

The performance of seventeen sunshine-duration-based models has been assessed using data from seven meteorological stations in Croatia. Conventional statistical indicators are used as numerical measures of model performance: mean absolute percentage error (MAPE), mean bias error (MBE), mean absolute error (MAE), and root-mean-square error (RMSE). The models were ranked using a combination of all these parameters with equal weights. The Rietveld model was found to perform the best overall, followed by the Soler and Dogniaux-Lemoine monthly dependent models. For the three best-performing models, new adjusted coefficients are calculated and validated using a separate dataset. Only the Dogniaux-Lemoine model performed better with adjusted coefficients, but across all analysed locations, the adjusted models reduced the maximum percentage error.


Introduction
Any energy-related application requires knowledge of the resource availability at the site of interest. In the case of solar energy, this is rather challenging due to its intrinsic variability. Solar radiation changes during the day and throughout the year, but it also depends strongly on local atmospheric conditions, as well as on the position of the solar collector relative to the sun. In order to account for the intra- and interannual variability of solar radiation, the recommended practice is to estimate solar radiation from long-term measured data [1][2][3][4]. Direct measurement of solar radiation involves special solar radiometers: pyranometers are used for measuring global solar radiation, and pyrheliometers are employed for measuring the direct beam component of solar radiation [5]. As a standard, global solar radiation is measured on a horizontal surface (global horizontal irradiance (GHI)), while a pyrheliometer is mounted on a two-axis solar tracking device, ensuring that the instrument sensor is always directed towards the sun (direct normal irradiance (DNI)). The diffuse component of solar radiation is usually measured in the same way as GHI, with the addition of a shadowing device that prevents direct solar radiation from reaching the sensor (diffuse horizontal irradiance (DHI)). However, these techniques are rather expensive, as they require regular maintenance by well-trained personnel (regular instrument calibration, cleaning of the sensor area from dust and dirt, control of pyrheliometer alignment, adjustment of the shadowing device, etc.). In the majority of the world, solar radiation is therefore measured in some other, more economical way. One of the most widely adopted techniques is to measure sunshine duration and then use these data to calculate GHI. Sunshine duration is measured in many locations worldwide and is probably one of the solar radiation measurement techniques with the longest history.
The Campbell-Stokes sunshine recorder is the most widely used instrument for measuring sunshine duration, as it is a robust and simple instrument that requires minimal maintenance. The main drawbacks of this measurement technique are that different observers may read the photosensitive cards differently and that the measurement accuracy depends on the properties of the recording paper. Still, in many locations, this is the only means of ground-based solar radiation measurement. A more recent approach to estimating solar radiation is based on meteorological satellite images. There are several freely available solar radiation databases with data obtained from satellite observations, e.g., PVGIS [6,7], SoDa [8], and NASA SSE [9]. However, this approach involves image processing techniques and different algorithms for estimating solar radiation, so the data in these databases can differ significantly. We therefore believe that ground-based measurement is still valuable for estimating solar radiation.
Solar radiation in Croatia has not been studied extensively. While sunshine recorders have been in operation in many Croatian locations for decades, solar radiation measurement using pyranometers is limited to a small number of meteorological stations and is of recent date. In this paper, we have used the measured solar radiation from seven meteorological stations in Croatia to review and compare seventeen models for estimating GHI from sunshine duration. To the best of our knowledge, this is the first time that these data have been used for such a study. The models are evaluated and ranked with regard to the mean absolute percentage error (MAPE), mean bias error (MBE), mean absolute error (MAE), and root-mean-square error (RMSE). The three best-performing models are then selected, and their parameters are recalculated using the least squares method in order to adjust the models to the Croatian climate. The adjusted models are assessed using a separate dataset, and their performance is compared with that of the original models.

Methods
There are many different models for estimating solar radiation from various meteorological parameters. We have limited this study to models that use only geographical data for the measurement station (latitude, longitude, and altitude) and that require sunshine duration as the only meteorological parameter. The calculation is based on monthly averages of daily values, which means that each month is represented by a typical daily value. The monthly average daily global horizontal radiation H is given as a function of relative sunshine duration S/S_0 (measured sunshine duration S divided by the maximum possible sunshine duration S_0) and extraterrestrial radiation on a horizontal surface H_0. Extraterrestrial radiation is calculated using the procedure described in the European Solar Radiation Atlas (ESRA) [1,2], with the value of the solar constant updated to 1361 W/m² [10]. The analysed models are divided into three groups based on the function type and model regression coefficients: linear monthly independent models, nonlinear monthly independent models, and monthly dependent models.

The first model (Model 1) is given as follows [11]:

$$\frac{H}{H_0} = a\cos\varphi + b\,\frac{S}{S_0}, \qquad (1)$$

where coefficients a and b are equal to 0.29 and 0.52, respectively. The term cos φ, valid for φ < 60°, takes into account the effect of the latitude φ of the site. Page (Model 2) derived a linear relationship between the ratio of average daily global radiation H to the extraterrestrial radiation on a horizontal surface H_0 and the ratio of average daily sunshine duration S to the maximum possible sunshine duration S_0. The model is given as follows [12]:

$$\frac{H}{H_0} = a + b\,\frac{S}{S_0}, \qquad (2)$$

where regression coefficients a and b have constant values of 0.23 and 0.48, respectively. The Bahel (Model 3), Jain (Model 4), and Louche (Model 5) models are all represented by equation (2), with different values of the regression coefficients a and b. In the Bahel model, the coefficients are equal to 0.175 and 0.552, respectively [13].
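The two inputs shared by all of the models, extraterrestrial radiation H_0 and maximum possible sunshine duration S_0, can be sketched as follows. This is a minimal illustration that assumes the common textbook relations (Cooper's declination formula and a cosine eccentricity correction), not the exact ESRA procedure used in the study; the function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 1361.0  # W/m^2, the value quoted in the text [10]


def declination_rad(day_of_year):
    """Solar declination from Cooper's approximation (an assumption, not ESRA)."""
    return math.radians(23.45) * math.sin(2 * math.pi * (284 + day_of_year) / 365)


def sunset_hour_angle_rad(lat_deg, day_of_year):
    """Sunset hour angle; the argument is clamped to handle polar day and night."""
    phi = math.radians(lat_deg)
    x = -math.tan(phi) * math.tan(declination_rad(day_of_year))
    return math.acos(max(-1.0, min(1.0, x)))


def max_sunshine_hours(lat_deg, day_of_year):
    """Maximum possible sunshine duration S_0 in hours."""
    return 24.0 / math.pi * sunset_hour_angle_rad(lat_deg, day_of_year)


def extraterrestrial_kwh_m2(lat_deg, day_of_year):
    """Daily extraterrestrial radiation H_0 on a horizontal surface, kWh/m^2."""
    phi = math.radians(lat_deg)
    delta = declination_rad(day_of_year)
    ws = sunset_hour_angle_rad(lat_deg, day_of_year)
    eccentricity = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    h0_wh = (24 / math.pi) * SOLAR_CONSTANT * eccentricity * (
        math.cos(phi) * math.cos(delta) * math.sin(ws)
        + ws * math.sin(phi) * math.sin(delta)
    )
    return h0_wh / 1000.0
```

For a monthly calculation, a representative day of each month would be passed as `day_of_year`; which day is used is a further assumption left to the caller.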
Jain analysed data for Italy and suggested the coefficient values 0.177 and 0.692 [14], while Louche obtained the regression coefficients 0.206 and 0.546 for Ajaccio in Corsica [15].
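Since the Page, Bahel, Jain, and Louche models differ only in their coefficients, they can be evaluated with one function and a coefficient table. The sketch below reads equation (2) as the linear relation H/H_0 = a + b S/S_0 described in the text; the dictionary and function names are illustrative.

```python
# (a, b) pairs quoted in the text for the models that share equation (2)
LINEAR_MODELS = {
    "Page": (0.23, 0.48),
    "Bahel": (0.175, 0.552),
    "Jain": (0.177, 0.692),
    "Louche": (0.206, 0.546),
}


def estimate_ghi(model, sunshine_hours, max_sunshine_hours, h0):
    """Monthly average daily global horizontal radiation, in the units of h0."""
    a, b = LINEAR_MODELS[model]
    return h0 * (a + b * sunshine_hours / max_sunshine_hours)
```

For example, with S = 6 h, S_0 = 12 h, and H_0 = 10 kWh/m², the Page coefficients give 10 × (0.23 + 0.48 × 0.5) = 4.7 kWh/m².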
The Dogniaux-Lemoine model (Model 6) expresses the regression coefficients a and b of equation (2) as functions of the latitude φ of the location [16]. Rietveld (Model 7) analysed data from 42 locations worldwide and found that regression coefficient a depends linearly, and coefficient b hyperbolically, on relative sunshine duration [17]:

$$a = 0.10 + 0.24\,\frac{S}{S_0}, \qquad (5)$$

$$b = 0.38 + 0.08\,\frac{S_0}{S}. \qquad (6)$$

If equations (5) and (6) are substituted into equation (2), the model reduces to a linear relation with constant coefficients:

$$\frac{H}{H_0} = 0.18 + 0.62\,\frac{S}{S_0}.$$

The correlations are given as a third-order function of a monthly relative sunshine duration S/S_0.
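The reduction of Rietveld's variable coefficients to a constant-coefficient linear model can be checked numerically. The sketch below assumes the commonly quoted forms a = 0.10 + 0.24 S/S_0 and b = 0.38 + 0.08 S_0/S; inserting them into H/H_0 = a + b S/S_0 gives 0.10 + 0.24x + 0.38x + 0.08 = 0.18 + 0.62x, with x = S/S_0.

```python
def rietveld_ratio(x):
    """H/H_0 from the variable coefficients; x = S/S_0 must be positive."""
    a = 0.10 + 0.24 * x   # linear in relative sunshine duration
    b = 0.38 + 0.08 / x   # hyperbolic in relative sunshine duration
    return a + b * x


def rietveld_simplified(x):
    """The same model after collecting terms."""
    return 0.18 + 0.62 * x
```

The two functions agree for any positive relative sunshine duration, which is why the model behaves as a linear relation with yearly constant coefficients.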

Monthly Dependent Models.
Kilic-Ozturk (Model 15) suggested regression coefficients a and b as functions of the location latitude φ, site altitude h (expressed in meters), and solar declination δ [24]. The Dogniaux-Lemoine monthly dependent model (Model 16) [16] uses four monthly dependent coefficients: A ranges from -0.00245 to -0.00369, B ranges from 0.33459 to 0.41234, C ranges from 0.00412 to 0.00578, and D ranges from 0.27004 to 0.36377. Soler (Model 17) adjusted the Rietveld model to Europe using data from 100 stations and proposed monthly varying coefficients a and b of equation (2) [25]. The regression coefficient a ranges from 0.17 to 0.24, and coefficient b ranges from 0.52 to 0.66. The values of the coefficients for the Dogniaux-Lemoine and Soler models are given in Table 1.

The measured data were thoroughly checked for possible errors and inconsistencies. Months with more than 100 hours of missing data were excluded from the analysis in order to avoid underestimation. Two datasets are employed. The data from the seven aforementioned stations for the period 2009-2019 are used for model evaluation, for selection of the three best-performing models, and for calculation of the adjusted regression coefficients. The period 2009-2019 was selected because it had the least missing data for the majority of locations. For validation of the adjusted models, we used 2003-2013 data for Zagreb, Split, and Rijeka, as well as 2014-2019 data for Križevci and Osijek. There are two reasons for selecting these data for the validation process: first, we wanted to have at least some data covering a measurement period of at least 10 years, and second, we wanted to test the adjusted models for locations that were not included in the evaluation step and the calculation of the adjusted model coefficients. The drawbacks of the described procedure stem from limitations in the available database (a small number of locations with rather short time periods).
A partial overlap of the data used for the evaluation and validation steps (Zagreb, Split, and Rijeka, 2009-2013) was considered a necessary trade-off compared with using a shorter time period. This choice is based on the fact that the majority of missing data occurs during the initial years of measurement.
International Journal of Photoenergy
The stations are shown in Figure 1, and their data are given in Table 2.

Statistical Analysis.
The performance of the seventeen sunshine-based models was estimated using common statistical metrics: mean bias error (MBE), mean absolute error (MAE), and root-mean-square error (RMSE). The mean bias error shows whether the model tends to underestimate or overestimate the solar radiation with regard to the measurement. However, a small MBE value does not necessarily mean that the model is unbiased, because positive and negative biases offset each other. For that reason, we have also calculated the mean absolute error, which gives the average magnitude of the model deviation from the measured value. The smaller the MAE value, the more accurate the model. The same goes for the root-mean-square error, which should be as small as possible to indicate that the model agrees well with the measured data. One problem with RMSE is that it gives

Results and Discussion
The monthly average of daily global horizontal radiation is calculated for seven measuring stations in Croatia using seventeen sunshine-duration-based models. For each location, the models are compared and ranked with regard to four statistical indicators: MAPE, MBE, MAE, and RMSE. It should be noted that MBE, MAE, and RMSE are all expressed in kWh/m². The results are given in Table 3. MAPE is often used to express the relative difference between the model and measurements and is quite easy to understand. It gives the average performance of the model by relating the difference to the actual value. As a single value, however, it cannot tell anything about the performance of the model throughout the year. In a sense, MAE is a similar parameter, only not expressed in relative terms. MAE gives the accumulated difference between the model and measurement throughout the year, overcoming the intrinsic drawback of MBE, where positive and negative differences can offset each other. However, there might be cases where a rather small value of MAE hides quite large differences for months with lower solar radiation. Hence, some authors suggest using relative MAE, usually with regard to the average value. The same goes for RMSE, which can reach a rather high value in the case of larger deviations between model and measurements.
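The four indicators can be computed from paired monthly values as in the following minimal sketch. It uses the standard textbook definitions, takes the difference as model minus measurement (so a negative MBE means underestimation), and reports MAPE in percent; the function name and input layout are illustrative.

```python
import math


def error_metrics(measured, modelled):
    """Return MAPE (%), MBE, MAE, and RMSE for paired monthly values."""
    n = len(measured)
    diffs = [m - o for m, o in zip(modelled, measured)]  # model minus measurement
    mbe = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mape = 100.0 * sum(abs(d) / o for d, o in zip(diffs, measured)) / n
    return {"MAPE": mape, "MBE": mbe, "MAE": mae, "RMSE": rmse}
```

With measured values [2, 4] and modelled values [1, 5], the deviations are -1 and +1, so MBE is 0 while MAE and RMSE are both 1: a case in which a zero bias hides a real error.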
For the analysed locations, the smallest MAPE value of only 1.3% is calculated for the Dogniaux-Lemoine monthly dependent model (Model 16) for Split. In terms of MAPE, the same model was the best for Dubrovnik with a value of 3.3% and for Zadar with 3.2%. All three locations are on the coast and have a typical Mediterranean climate. For each remaining location, the best MAPE performance was achieved by a different model: for Gospić, the best model was the Soler model (Model 17) with a value of 3.0%; for Parg, the Louche model (Model 5) had 3.3%; the Newland model (Model 11) recorded a remarkable 1.6% for Zagreb, while the Rietveld model (Model 7) gave the best result for Rijeka, 4.8%, which is also the largest of the best-performing MAPE values. It is also worth noting that MAPE for all models and all locations usually remains below 10%.
In terms of MBE, the best-performing models were the Dogniaux-Lemoine monthly dependent model (best for Dubrovnik and Zadar), Gopinathan's linear model (best for Rijeka and Zagreb), the Soler model (best for Gospić), the Louche model (best for Parg), and the Dogniaux-Lemoine monthly independent model (best for Split). One interesting observation is that for the Mediterranean locations (Dubrovnik, Split, and Zadar), the majority of the tested models have a negative bias, meaning that they underestimate the solar radiation. Looking at the MAE values, the already mentioned models dominate: the Dogniaux-Lemoine monthly dependent model performs the best for Dubrovnik, Split, and Zadar. For Gospić, the lowest MAE value was calculated for the Soler model, while the Bahel model (Model 10) performed best for Parg. The Rietveld model was best for Rijeka, and the Newland model gave the best results for Zagreb.

The analysis of RMSE values shows that the Dogniaux-Lemoine monthly dependent model is once again the best for Dubrovnik, Split, and Zadar, just as it was for the MAPE and MAE values. In terms of RMSE, the best-performing model for Gospić was the Rietveld model, which was also the best for Rijeka. The Newland model was again the best for Zagreb, as was the Louche model for Parg.
In order to find the best overall performing model, the models have been ranked for every location and with regard to each statistical parameter (the smaller the parameter value, the higher the ranking). The best performing was the Rietveld model (Model 7), followed by Soler (Model 17) and Dogniaux-Lemoine monthly dependent models (Model 16).
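One way to combine per-location, per-indicator ranks with equal weights is to sum them, as sketched below. The text does not spell out the exact aggregation, so the rank sum, the use of the absolute value (so that MBE is ranked by magnitude), and the data layout are all assumptions made for illustration.

```python
def overall_ranking(scores):
    """scores[location][metric][model] -> value; returns model names, best first.

    Each model receives rank 1, 2, ... within every (location, metric) pair,
    a smaller absolute value being better, and the ranks are summed with
    equal weights. Ties are broken by dictionary order.
    """
    totals = {}
    for metrics in scores.values():
        for per_model in metrics.values():
            ordered = sorted(per_model, key=lambda name: abs(per_model[name]))
            for rank, name in enumerate(ordered, start=1):
                totals[name] = totals.get(name, 0) + rank
    return sorted(totals, key=lambda name: totals[name])
```

The values in a call such as `overall_ranking({"Site": {"MAPE": {"A": 3.0, "B": 5.0}}})` are made up; real input would hold one entry per station, indicator, and model.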
The performance of the best models throughout the year is shown in Tables 4-6. Even for the best-performing models, there are months with a significant deviation from the measured data. Only the Dogniaux-Lemoine monthly dependent model for Split had a percentage error (PE) below 5% for all months. For all other locations, there was at least one month with PE exceeding 5%. For the Rietveld and Soler models, there was no location where PE stayed below 5% in every month. This suggests that even the best-performing sunshine-duration-based models are quite limited for estimating solar radiation on shorter time scales. However, there were several locations for which PE was kept below the 10% boundary throughout the year. For all three models, that was the case for Split, Zadar, and Zagreb, and also for Gospić when using the Rietveld model. We also found that the biggest PE often occurs during the period with lower solar radiation (October-February). In that case, the model might still be valuable for estimating solar radiation, especially in terms of yearly energy production, which is often required for preliminary studies.
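The month-by-month check against the 5% and 10% boundaries can be sketched as follows. The percentage error is taken here as the signed deviation of the model from the measurement relative to the measurement; the helper names and the sample numbers in the usage note are illustrative only.

```python
def percentage_errors(measured, modelled):
    """Signed percentage error of the model for each month."""
    return [100.0 * (m - o) / o for m, o in zip(modelled, measured)]


def stays_within(pe_values, limit_percent):
    """True if the absolute percentage error is below the limit in every month."""
    return all(abs(pe) < limit_percent for pe in pe_values)
```

For made-up monthly values `measured = [1.0, 2.0, 5.0]` and `modelled = [1.08, 1.95, 5.1]`, the errors are about 8%, -2.5%, and 2%, so the series stays within the 10% boundary but not within the 5% boundary.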
In an attempt to improve the model performance, we have used the measured data and calculated the regression coefficients using the least-squares method. The adjusted Rietveld coefficients are a = 0.1756 (originally a = 0.18) Table 1.
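For a linear model of the form H/H_0 = a + b S/S_0, the least-squares adjustment amounts to an ordinary linear regression of the measured clearness ratio on the relative sunshine duration. The sketch below uses the closed-form solution and also returns the coefficient of determination; it illustrates the technique under these assumptions and is not the authors' actual procedure.

```python
def fit_linear(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot
```

Here `x` would hold the monthly S/S_0 values and `y` the corresponding measured H/H_0 values. For monthly dependent models, the fit is repeated for each month separately, with one sample per station.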
For validation of adjusted models, a new dataset has been used: data for Rijeka, Split, and Zagreb in the period 2003-2013 and data for two additional stations Križevci and Osijek covering the period 2014-2019. Križevci and Osijek have been the latest addition to the list of Croatian stations where global horizontal radiation is measured (at both sites the measurements started in 2014). The results of both original and adjusted models are given in Figures 2-6.
Comparison of the models with original and adjusted coefficients was performed with regard to the statistical parameters MAPE, MAE, and RMSE. As a metric of the model dispersion, maximum value of percentage error (max PE) is added (it is taken as absolute value, so it is positive). The results obtained for all three models with adjusted coefficients are shown in Figures 7-9. Here, we must stress that all indicators are calculated as normalized values with regard to the values of the model with original coefficients. Therefore, the values below 1 mean the adjusted model improved the parameter, while the opposite goes if the value is above 1. For the Rietveld model, we can see almost no difference between the original and adjusted models. It is somewhat expected, as the adjusted coefficients only slightly differ from the original ones. Another thing is related to the model itself. Using the simplest linear form, this model cannot account for different microclimatic conditions on a monthly basis. It is therefore difficult to expect perfect match to the measurements in every month, and usually, the coefficients are adjusted to the summer period, preferring to "move" the bigger deviations to the period with lower solar radiation. While not much improvement was achieved with adjusted coefficients to this model, it can be seen that for all five tested locations, maximum percentage error is lowered.
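The normalisation described above can be sketched as a simple ratio of each indicator for the adjusted model to the same indicator for the original model. The dictionary keys and the numbers in the usage note are made up for illustration.

```python
def max_abs_percentage_error(measured, modelled):
    """Largest absolute monthly percentage error (max PE), in percent."""
    return max(abs(100.0 * (m - o) / o) for m, o in zip(modelled, measured))


def normalised_indicators(original, adjusted):
    """Ratio adjusted/original per indicator; a value below 1 is an improvement."""
    return {name: adjusted[name] / original[name] for name in original}
```

For instance, an original model with MAPE 5.0 and max PE 12.0 and an adjusted model with MAPE 5.5 and max PE 9.0 give normalised values of 1.1 and 0.75: the average error is slightly worse while the maximum error is clearly reduced.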
In the case of the Soler model, which originally uses monthly specific regression coefficients, the adjustment was less successful. For three of the five locations, the adjusted model performs worse than the original, while for the other two locations, there was a small improvement. A probable reason why we have not been able to improve the Soler model is the small number of stations used for calculating the regression coefficients. When regression coefficients must be calculated for each month, the number of samples for linear regression is equal to the number of locations. Therefore, only seven values are taken to calculate the regression coefficients for each month. An interesting observation concerns the maximum percentage error, which is here again lower than for the original model, and in two cases (Križevci and Zagreb), the improvement is significant.

The original Dogniaux-Lemoine model performed very well for Split, so changing the coefficients resulted in a slightly worse performance. Yet again, the improvement in max PE is achieved for all five locations.

Conclusions
Up-to-date data covering the period 2009-2019, obtained from seven ground-based meteorological stations in Croatia, are used to assess the performance of seventeen models for estimating global horizontal radiation from sunshine duration data. The performance of the models was expressed numerically by combining four statistical indicators: mean absolute percentage error, mean bias error, mean absolute error, and root-mean-square error. The Rietveld model was the best overall, followed by the Soler and Dogniaux-Lemoine monthly dependent models. Although the Soler and Dogniaux-Lemoine models have monthly specific coefficients, they fell behind the simplest of the three, as the Rietveld model uses Page's linear relation with yearly constant coefficients. The Dogniaux-Lemoine model performed the best for three coastal locations with a Mediterranean climate (Dubrovnik, Split, and Zadar), the Rietveld model itself was the best model for Rijeka, and the Soler model performed the best for Gospić. The Newland model gave the best results for Zagreb and was ranked 4th overall, while the Louche model, which was the best for Parg, was only ranked 11th overall. In terms of the model percentage error, calculated in % as a deviation from the measured values, we found that in some cases this error can even surpass 20%. Generally, the biggest percentage error occurs during the months with lower solar radiation. The coefficients of the three best-performing models have been recalculated in order to adjust the models to the Croatian climate, and the validation was performed using a separate dataset. For the available dataset, we found that only the Dogniaux-Lemoine model performed better with adjusted coefficients, while the Soler model even showed slightly worse performance than with the original coefficients. We believe that this is because the number of data values used for calculation of the adjusted coefficients was too small to achieve a better correlation. This will be included in our future work.
As for the Rietveld model, the R² value was 0.9397, and the adjusted coefficients do not differ much from the original ones, so the performance of the adjusted and original models is practically the same. It is worth noting that the adjusted coefficients improved the maximum percentage error in all cases, and in some cases this improvement was significant.

Data Availability
Sunshine duration data used in the study were supplied by the Croatian Meteorological and Hydrological Service under the research license and cannot be made freely available. Calculation data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.