Evaluation of global horizontal irradiance estimates from ERA5 and COSMO-REA6 reanalyses using ground and satellite-based data

This study examines the progress made by two new reanalyses in the estimation of surface irradiance: ERA5, the new global reanalysis from the ECMWF, and COSMO-REA6, the regional reanalysis from the DWD for Europe. Daily global horizontal irradiance data were evaluated with 41 BSRN stations worldwide, 294 stations in Europe, and two satellite-derived products (NSRDB and SARAH). ERA5 achieves a moderate positive bias worldwide and in Europe of +4.05 W/m2 and +4.54 W/m2 respectively, which entails a reduction in the average bias ranging from 50% to 75% compared to ERA-Interim and MERRA-2. This makes ERA5 comparable with satellite-derived products in terms of the mean bias in most inland stations, but ERA5 results degrade in coastal areas and mountains. The bias of ERA5 varies with the cloudiness, overestimating under cloudy conditions and slightly underestimating under clear-skies, which suggests a poor prediction of cloud patterns and leads to larger absolute errors than that of satellite-based products. In Europe, the regional COSMO-REA6 underestimates in most stations (MBE = −5.29 W/m2) showing the largest deviations under clear-sky conditions, which is most likely caused by the aerosol climatology used. Above 45°N the magnitude of the bias and absolute error of COSMO-REA6 are similar to ERA5 while it outperforms ERA5 in the coastal areas due to its high-resolution grid (6.2 km). We conclude that ERA5 and COSMO-REA6 have reduced the gap between reanalysis and satellite-based data, but further development is required in the prediction of clouds while the spatial grid of ERA5 (31 km) remains inadequate for places with high variability of surface irradiance (coasts and mountains). Satellite-based data should be still used when available, but having in mind their limitations, ERA5 is a valid alternative for situations in which satellite-based data are missing (polar regions and gaps in times series) while COSMO-REA6 complements ERA5 in Central and Northern Europe mitigating the limitations of ERA5 in coastal areas.


Introduction
Different methods have been developed to estimate surface irradiance in the absence of ground records (Urraca et al., 2017c). Satellitebased models using images from geostationary satellites are the most extended approach (Sengupta et al., 2015) nowadays. They provide gridded datasets of surface irradiance since the 1980s (Polo et al., 2016), with hourly or higher time resolutions and spatial resolutions down to few km. However, these products are not freely available for some regions such as Australia or Japan, while the spatial coverage of geostationary satellites is limited to latitudes within ± 65°. Products from polar-orbiting satellites have global coverage but they only provide daily data because these satellites pass over a fixed equatorial region only twice per day. Atmospheric reanalysis is an alternative that produces long-term irradiance data with global coverage (including the poles), intra-daily time resolutions, spatial resolutions around 30-80 km and no missing values. They are usually distributed at no cost and include a large number of weather parameters besides surface irradiance, making them an attractive option to assess surface irradiance. This is shown by the increasing number of research studies and industrial applications that incorporate reanalysis products (You et al., 2013;Juruˇs et al., 2013;Pfenninger and Staffell, 2016). However, the quality of irradiance data from reanalysis is generally lower than that of satellite-based products (Bojanowski et al., 2014;Urraca et al., 2017b) The two most widely used reanalyses are probably ERA-Interim and MERRA with several validations published about their surface irradiance values. The quality of ERA-Interim values was checked against ground stations in Europe (Bojanowski et al., 2014;Urraca et al., 2017b), Spain (Urraca et al., 2017c) and in the Eastern Mediterranean (Alexandri et al., 2017), among other places. Besides, ERA-Interim was also compared against satellite products from CM SAF (Träger-Chatterjee et al., 2010;Bojanowski et al., 2014;Urraca et al., 2017b) and against the CERES-EBAF dataset (Alexandri et al., 2017). Most validations of the NASA's GMAO products (Yi et al., 2011;Zhao et al., 2013;Juruˇs et al., 2013) were based in the former MERRA dataset (Rienecker et al., 2011), as the new MERRA-2 was fully released on 2016 and only few works have already assessed the changes in surface irradiance data from MERRA to MERRA-2 Pfenninger and Staffell, 2016). MERRA and ERA-Interim were directly compared by Boilley and Wald (2015), while for more general validations that compare global reanalysis from different organizations the authors refer to Wang and Zeng (2012), Decker et al. (2012) and Zhang et al. (2016).
All these studies found large biases in global horizontal irradiance (G) estimations from MERRA, MERRA-2 and ERA-Interim when the datasets were compared against ground and satellite data. The average bias worldwide was positive for MERRA and ERA-Interim (Decker et al., 2012;Zhang et al., 2016), and strong overestimations were observed in regions such as Europe, Asia and North America. This positive bias was related to an underestimation of the cloud fraction (Zhao et al., 2013;Zhang et al., 2016), although the opposite effect, small negative biases under clear-skies, was also described by Boilley and Wald (2015). This dependence of the bias on the clearness level evidences severe limitations of the reanalyses when modeling cloud patterns (Träger-Chatterjee et al., 2010;Yi et al., 2011;Alexandri et al., 2017). The biases under clear-skies were also related to aerosols and water vapor data (Zhang et al., 2016), but it is generally considered a secondary defect compared to clouds. Some authors have attempted to correct these biases. Zhao et al. (2013) corrected MERRA with ground data using an empirical relationship based on the daily cloudiness and the elevation. Jones et al. (2017) adjusted ERA-Interim to the satellitebased dataset HelioClim-3v5 (Blanc et al., 2011) using the clearness index and the cumulative distribution functions. These approaches may partly mitigate the consequences of using data with high average biases, but there is no method able to make a posteriori corrections of the large and highly variable errors caused by a poor modeling of  (Boilley and Wald, 2015). Hence, the assessment of solar irradiance with these products is in general not recommended (Träger-Chatterjee et al., 2010;Boilley and Wald, 2015;Bojanowski et al., 2014;Urraca et al., 2017c), and its use is limited to filling gaps in times series (Bojanowski et al., 2014) or to providing gross estimates in places with a high amount of clear-sky days (Boilley and Wald, 2015). Our intention is to evaluate whether the two recently released reanalysis, the global ERA5 and the regional COSMO-REA6, are able to overcome the limitations of former reanalyses making them valid alternatives to estimate surface irradiance. To do so, we benchmark the new products against two previous reanalyses, ERA-Interim and MERRA-2, and also against two well-known products based on geostationary satellite data: SARAH for Europe and Asia, and the National Solar Radiation Database (NSRDB) for the Americas. This comparative study is performed in terms of the daily G from 2010 to 2014 using two ground datasets: 41 BSRN stations distributed worldwide and 294 additional stations in Europe.
The paper is organized as follows: Section 2 describes the different reanalysis products as well as the satellite-based datasets used in this study. Ground station data gathered for the validation are detailed in Section 3. Section 4 describes the steps followed in the evaluation. Section 5 reports the validation results for the different reanalysis products and the comparison with the satellite-based data sets. Finally we present our conclusions in Section 6.

Solar radiation data sources
2.1. Reanalysis products 2.1.1. ERA-Interim ERA-Interim (Dee et al., 2011) is the 4th generation of reanalysis products from the ECMWF. This dataset is available almost operationally (with two months of delay) from 1979 to present. ERA-Interim uses a 12-hourly 4DVar data assimilation system. The NWP forecasting model is initialized at 00:00 and 12:00 UTC with a time step of 30 min, but the output frequency is 3-hourly for surface (2D) variables such as solar irradiance. It uses climatological values for aerosols, carbon dioxide, trace gases and ozone, while it takes prognostic information from the forecasting model for the water vapor. Besides, the product has an overestimation of the irradiance at the top of the atmosphere of 2 W/m 2 (Dee et al., 2011).
The variable used for this study is the surface solar radiation downwards (SSRD) [J/m 2 ]. In particular, we retrieved the steps 3, 6, 9 and 12 of the two daily forecasts performed at 00:00 and 12:00 UTC, where the step represents the number of hours from the beginning of the forecast. For accumulated variables such as SSRD, the value provided is the sum from the beginning of the forecast to the forecast step. Hence, some post-processing was required to obtain the 3-hourly means. Data can be ordered (for free but registration required) from this web interface: http://apps.ecmwf.int/datasets/data/interim-full-daily/ levtype=sfc/

ERA5
ERA5 is the new climate reanalysis dataset from ECMWF (5th generation). The most substantial upgrades compared to ERA-Interim are the finer spatial grid (31 km vs. 79 km), the higher time resolution (hourly vs. 3-hourly), the higher number of vertical levels (137 vs. 60), a new NWP model (IFS Cycle 41r2) and the increase of the amount of data assimilated. The data assimilation model is also a 12-hourly 4DVar. The dataset will cover from 1950 to near real time, but at the time of writing only data for the period 2010-2016 is available.
The variable used is also the SSRD [J/m 2 ], which is part of the forecast fields. The short forecasts are run at 06:00 and 18:00 UTC every day generating 18 forecast steps (up to 18 h) for each run. In this study, only steps 1 to 12 (the first 12 h) of each short forecast were used. Compared to ERA-Interim, the value provided for accumulated variables is the sum since the previous forecast step, e.g., the step 1 from 06:00 UTC forecast is the SSRD from 06:00 to 07:00 UTC. Therefore, mean hourly irradiance values obtained from SSRD (accumulated radiation) forecasts are centered at half-hourly intervals, e.g., the step 1 from 06:00 UTC forecast gives the hourly mean irradiance centered at 06:30 UTC. Daily means were calculated by aggregating all steps from 00:00 to 23:59 UTC: steps 7-12 initialized at 18:00 UTC of the previous day (values centered at 00:30-5:30 UTC), steps 1-12 initialized at 06:00 UTC of the corresponding day (values centered at 6:30-17:30 UTC) and steps 1-6 initialized at 18:00 UTC of the corresponding day (values centered at 18:30-23:30 (UTC)). Instructions for download are found here: https://software.ecmwf.int/wiki/display/ CKB/How+to+download+ERA5data+via+the+ECMWF+Web +API.

MERRA-2
MERRA-2 (Gelaro et al., 2017) is the second version of the Modern-Era Retrospective Analysis for Research and Applications (MERRA) produced by NASA's GMAO. MERRA-2 was created to replace the former MERRA reanalysis (Rienecker et al., 2011) and solves the limitations of the later in the assimilation of the newest sources of satellite data. The new version maintains some of the main features of its predecessor, such as the spatial and time resolutions and the 3DVar 6-h update cycle. The NWP model is intialized at 00:00, 06:00, 12:00 and 18:00 UTC and it has hourly resolution for surface irradiance variables. The most substantial upgrades in MERRA-2 are the use of a new version of the GEOS-5 atmospheric model (Molod et al., 2015) and the assimilation of aerosol data to analyze five aerosol species including black and organic carbon, dust, sea salt and sulfates.

COSMO-REA6
COSMO-REA6 (Bollmeyer et al., 2015) is a regional reanalysis product developed by the HErZ/DWD with a resolution of about 6.2 km (COSMO-REA6) for Europe, and with a high-resolution version around 2 km (COSMO-REA2)  for Germany. The product is based on the implementation of a regional NWP model using ERA-Interim estimates as boundary conditions. The data assimilation system uses a continuous nudging scheme which makes possible the continuous assimilation of observations. This process is interrupted every 3 h (00:00, 03:00, 06:00 UTC, …) for updating lateral boundary conditions, every 6 h (00:00, 06:00, 12:00, 18:00 UTC) for snow analysis and every 24 h (00:00 UTC) for sea surface temperature analysis and soil moisture analysis. The NWP model is the COnsortium for Small-Scale MOdelling (COSMO) limited-area model (LAM) (Schättler et al., 2011), which is part of the DWD operational scheme and is run with a time step of 50 s. The radiation scheme uses instantaneous distributions of clouds and water vapor, whereas aerosols are modeled with the Tanré climatology. This climatology is used because it is the standard input of COSMO model, even though it is known that it provides too high values of aerosols optical thickness for Europe (Frank et al., 2018).
The output resolution available for surface radiation is 1 h. The variables retrieved were the instantaneous direct radiation (SWDIFDS_RAD) and the instantaneous diffuse radiation (SWDIRS_RAD) [W/m 2 ], which are the beam (B) and diffuse (D) horizontal irradiance, respectively. B and D were subsequently added to obtain G. Data can be downloaded from:ftp://ftp-rea.dwd.de/pub/ REA/COSMO_REA6.

Main features of the reanalysis products
t The spatial and temporal resolution and extent vary between different reanalysis products. In addition, while all the products contain global horizontal solar irradiance at the Earth surface, some of them also provide estimations of the solar radiation components (B and D). Table 1 lists the main features of each reanalysis product used in the study.
The native format of the reanalysis data is not given in latitudelongitude coordinates, so the data have been interpolated onto a regular lat/lon grid in the following ways: • MERRA-2 data have been downloaded using the recommended resolution of ×°0.5 0.625 (thus using the interpolation methods of the data provider).
• ERA-Interim data have been downloaded using the recommended resolution of ×°0.75 0.75 .
• ERA5 data have been downloaded using a resolution of ×°0.25 0.25 which is slightly finer than the recommended lat/lon resolution. This was done in order to make the ERA5 grid match with the ERAinterim, COSMO-REA and SARAH grids.
• COSMO-REA6 These data are downloaded in the native format which uses a rotated pole coordinate system. The data have been reprojected to a lat/lon grid with resolution ×°0.05 0.05 by bilinear interpolation, using the Climate Data Operators software (https:// code.mpimet.mpg.de/projects/cdo/).
For the three reanalyses with rather coarse resolution (MERRA-2, ERA-interim, and ERA5), the irradiance values at the station location were obtained using inverse distance weighting (IDW) interpolation from the nearest four data points. For COSMO-REA6, the extracted values are those for the corresponding pixel in the raster maps.

Satellite-based products
Solar radiation datasets derived from geostationary satellite images are necessarily limited to the part of the Earth seen by the satellites used, with the additional restriction that the calculation methods have reduced accuracy in areas that are seen at a very sharp angle by the satellite. For a given satellite, the usable area is typically restricted to less than 60-65°lat/lon away from the satellite nadir. In this study we have made use of two satellite-based solar radiation products, whose main features are detailed in Table 1. Their algorithms are briefly described in the following paragraphs.

SARAH
SARAH solar radiation data set (Müller et al., 2015a) is produced by the CM SAF Collaboration and can be freely downloaded from the CM SAF Web user interface. In this study we use the version implemented in the Photovoltaic Geographical Information System (PVGIS) (PVGIS, 2016), which is denoted as PVGIS-SARAH. This version is virtually equivalent to SARAH-1 and the main differences are that PVGIS-SARAH uses the images of the two METEOSAT geostationary satellites (0°and 57°E) covering Europe, Africa and Asia, and that the hourly values are directly calculated from one individual satellite image. The algorithm is based on a semi-empirical approach as described in Müller et al. (2015b) that starts with the calculation of the cloud index using the visible channels from METEOSAT satellite. The cloud index is used to obtain the clear-sky index and ultimately to correct the output of the fast radiative transfer model MAGIC (Müller et al., 2009). The clear-sky model uses monthly averages of water vapor from ERA-interim and monthly climatologies of aerosol optical depth from MACC. The new SARAH-2 (Pfeifroth et al., 2017) has been recently released but the upgrades have not yet been implemented in PVGIS-SARAH.

NSRDB
NSRDB is produced by the National Renewable Energy Laboratory (NREL) for the Americas. The current available version is the Physical Solar Model (PSM) (NREL, 2017), which is the first one based on a fully physical approach. In a first stage, the PATMOS-X cloud algorithm is used to obtain cloud properties (cloud mask, cloud type, cloud height, cloud optical depth, etc.) using data from 4 channels (visible and infrared) of GOES geostationary satellites situated at 75°W and 135°W. These cloud products along with other meteorological variables, mainly aerosol and water vapor, are then used to calculate the surface irradiance with two radiative models: REST2 for clear-sky conditions and FARMS for all-skies. Aerosol data is obtained from a combination of MODIS/MISR satellite products and AERONET ground data, while the rest of ancillary variables such as water vapor are obtained from the MERRA reanalysis. In this study, and similarly to SARAH, we have only used hourly values despite the fact that the NSRDB provides 30-min estimates. Data for single locations or geographical areas are available from: https://nsrdb.nrel.gov/download-instructions.

Baseline Surface Radiation Network (BSRN)
Solar radiation data from the Baseline Surface Radiation Network (BSRN) (BSRN, 2017) have been widely used for solar radiation validation purposes due to the generally high quality of the data and the worldwide (though sparse) coverage (Ohmura et al., 1998). BSRN stations record global horizontal irradiance with Secondary Standard thermopile pyranometers and they normally report these data at oneminute intervals. Excluding the stations in the Antarctic, we have chosen the 41 stations that have at least one year with 7500 h of valid data in the period 2010-2014. Only the years with 7500 h or more are used. Out of these 41 stations, 14 lie within the area covered by SARAH and 13 stations are in the area covered by NSRDB. The final number of stations and years of data used are listed in Table A.1.
The main analysis was performed excluding the values from Izaña (IZA) due to the extreme underestimation exhibited by both, reanalysis and satellite-based products, in that location. This is explained by the particular conditions of the station, which is located in Tenerife island (2034 km 2 ) at 2373 m above sea level, within Teide volcano area (3718 m above sea level). The station is above the subtropical temperature inversion layer and affected by a quasi-permanent subsidence regime (García et al., 2016), so the clouds typically affect the lower part of the island (below 2000 m above sea level) while the upper area remains clear-sky. The inclusion of IZA values in the calculation of the summary statistics and in the different figures would have interfered in the adequate evaluation of the products in the main group of stations. The bias of each product in IZA can still be found in Table A.1.

Station data for Europe
The second ground dataset consists of a dense network of pyranometers in Europe. The original dataset, described in Urraca et al. (2017b), comprises 313 stations with hourly records of G in the period 2005-2015. The majority of the records were obtained with Secondary Standard pyranometers covering Finland, Norway, Sweden, Germany, United Kingdom, France and Spain. The stations belong to national meteorological services except for Norway and Spain, where data from agricultural networks were used. Besides, some of the BSRN stations over Europe were also included. The original dataset has been modified by replacing the 33 stations from the Spanish agricultural network

Data processing and quality control
The validation was made with the daily means of G despite the fact that all reanalyses provide sub-daily values (Table 1) and all stations have at least hourly records. Hourly estimates were not validated because the hourly intervals of estimates and records do not share the same midpoint in all stations (Urraca et al., 2017b). The validation of the hourly values would have required one-minute ground records, and these are only available in the BSRN stations. Note that the bias remains constant from daily to hourly resolution if the dataset does not include missing values, which is true for reanalysis products. However, the absolute error for the hourly values will be substantially higher than that obtained with the daily means.
Prior to the calculation of the daily means, night values of ground data (solar elevation < 0°) were set to 0 and samples were quality controlled with the BSRN range tests (Eq. (1)) (Long and Dutton, 2002). (1) where θ is the solar zenith angle and E N the solar constant adjusted to Earth -Sun distance. Samples that fell outside the "Physically possible" and "Extremely rare" intervals were set to not available. The daily means were calculated with the quality controlled data following the aggregation procedure described in Roesch et al. (2011). If the data was provided at one-minute resolution, the 15-min averages were computed when at least 5 valid minute values were available. Then, the hourly means were obtained if all four 15-min values were valid. The daily values were obtained by averaging the hourly values if at least 20 h were available. A second quality control (QC) procedure was applied to the daily means of all stations. The QC method is based on the analysis of the temporal stability of the daily deviations between three independent radiation products (SARAH-1, CLARA-A2 and ERA-Interim) and the ground records at each station. See Urraca et al. (2017a) for a more detailed description of this method. This second QC enabled the detection of different operational errors in several stations such as shading, soiling, miscalibrations and large errors in the sensors. Satellite and reanalysis products were also retrieved at sub-daily time resolutions and then the aggregation procedure used with the ground records was applied to obtain the daily means. In the case of SARAH, the night slots were set to 0 and negative values during daytime (sun elevation > 0°) were set to not available. The values from gridded datasets are not generally available at the exact location of the stations. For products with fine spatial grids (SARAH, NSRDB and COSMO-REA6), values extracted were those from the pixel defined by the station location, whereas for products with coarse grids (ERA-Interim, ERA5 and MERRA-2), the nearest four data points were interpolated using IDW.

Validation with ground stations
The performance statistics calculated at each station are the mean bias error (MBE), the mean absolute error (MAE), the relative mean bias error (rMBE) and the relative mean absolute error (rMAE) (Eqs. (2)-(5)).
where G d meas and G d est are the measured and estimated daily irradiances and N d is the number of valid daily values per station. Daily records from the polar night were excluded from the calculation of these metrics.
The overall performance of each product per ground dataset was evaluated with the mean and standard deviation (sd) of the statistics in each individual station. The mean of the absolute value (mean abs ) of the individual MBE was also calculated to give a measure of the overall uncertainty in the annual means from each product (Gracia Amillo et al., 2014). The variation of the quality of each product with the cloudiness was evaluated with the clearness index (KT), which is an indicator of the optical state of the atmosphere (Eq. (6)).
where E d is the daily extraterrestrial irradiance received on a horizontal plane. In the high-latitude stations, a few samples were obtained with > KT 1 d meas in periods of low irradiance before or after the polar night and they were excluded from the figures including the KT.

Comparison of reanalysis data against satellite-based products
A second validation was performed by calculating the difference between the yearly average irradiance of reanalysis and satellite-based data, in order to mitigate the limitations of validating gridded products against ground records. Ground values are point estimates while reanalysis provides the average of solar irradiance over a cell that spans many kilometers (30-80 km). Hence, larger errors are obtained in regions with high spatial variability where the spatial representativeness of the stations decreases (mountains and coastal areas), as well as with products that use coarser grids (Hazuba et al., 2013). These uncertainties cannot be attributed to the physics underlying the reanalysis model. Besides, this second validation is justified by the superior quality exhibited by satellite-derived products, while it enables the evaluation of products in places with low density of ground stations.
The differences between gridded products were calculated at the original horizontal resolution of each product. Other authors remapped the product with higher resolution to the coarser spatial grid (Träger-Chatterjee et al., 2010) but herein, we intended to show the variation of the irradiance within a unique pixel of the coarse grids of reanalysis datasets. This emphasizes how inappropriate these grids may be in some particular regions.

Validation results for reanalysis products in the BSRN stations
The validation results for the 40 BSRN stations are summarized in Table 2, while the individual biases per station are detailed in Fig. 1 and  reanalyses. The most significant improvement is the reduction of the strong positive bias, which has traditionally been the main limitation of ERA-Interim in particular, and of most global reanalysis products in general (Juruˇs et al., 2013;Decker et al., 2012;Zhang et al., 2016;Draper et al., 2017). ERA5 has a mean positive bias of +4.54 W/m 2 worldwide, which represents a reduction in the MBE of around the 50% compared to ERA-Interim (MBE = +10.05 W/m 2 ) and MERRA-2 (MBE = +11.34 W/m 2 ). High positive biases are still obtained in North America, coastal areas and small islands, such is the case of the extreme bias of +36.22 W/m 2 obtained for Kwajalein (KWA), a small atoll of 16 km 2 in the Marshall Islands. However, the overestimation is generally reduced for most locations (Fig. 1) as well as the percentage of BSRN stations in which the bias is positive (42.5% for ERA5 vs. 82.5% and 75% for ERA-Interim and MERRA2, respectively).
In the 13 stations under the coverage of METEOSAT satellites, the mean MBE of ERA5 is similar to that of SARAH (+0.84 vs. +0.61 W/m 2 ) while the mean abs of individual MBEs is even smaller (3.70 vs. 4.79 W/m 2 ), showing also a similar standard deviation (6.16 vs. 6.00 W/m 2 ). Both products obtained small biases within ±5 W/m 2 for most Europe, while ERA5 outperforms SARAH in South American stations. SARAH tends to overestimate in Brazil obtaining a high positive bias of +14.44 W/m 2 for Petrolina (PTR) ( Table A.1). The overestimation of G in this region is observed in all products evaluated but ERA5, and it could be attributed to the high humidity in the area along with an underestimation of water vapor and cloud occurrence by the majority of methods (Thomas et al., 2016). In the places within the coverage of GOES satellites, the mean bias of ERA5 is higher than that of NSRDB (+6.30 vs. +1.77 W/m 2 ). The positive bias of ERA5 is a consequence of a strong overestimation over North America with several stations exceeding +10 W/m 2 . The NSRDB obtains smaller biases in most of those stations, but its lower mean bias is also explained by the compensation of high positive biases (+13.22 W/m 2 for BER, +21.26 W/m 2 for PTR) by high negative biases (−18.51 W/m 2 for REG) (Table A.1). Hence the mean abs of individual biases of ERA5 is actually smaller than that of NSRDB (6.79 vs. 8.19 W/m 2 ) and the same occurs with the standard deviation (5.28 vs. 10.65 W/m 2 ). In general, these results suggest that ERA5 is closer in terms of the bias to satellite-derived products than it is to previous reanalyses.
The MBE was calculated at each station for different intervals of KT (Fig. 2) to examine the effect of the cloudiness in reanalysis estimates. The MBE was only calculated if the interval had at least 20 daily values in order to mitigate the influence of unusual conditions in the location. All global reanalyses overestimate G under cloudy and intermediate conditions, while they slightly underestimate under clear-sky conditions. ERA5 reduces the overestimation for < KTs 0.6 when compared to ERA-Interim and MERRA-2, leading to the smaller average bias shown by ERA5 in Table 2. This is probably due to the upgrades in the NWP model and the data assimilation process, as well as the increased horizontal, vertical and temporal spatial resolutions. On the other hand, the negative biases under clear-skies observed in ERA-Interim still remain in ERA5, while MERRA-2 is the global reanalysis that least underestimates G under these conditions. This underestimation for > KTs 0.7 mitigates the positive biases under cloudy conditions and it partly contributes to the smaller MBEs obtained by both ECMWF products.
The most likely cause of this variation of the bias with KT is a poor modeling of clouds, which are the main attenuating component for allsky conditions (Träger-Chatterjee et al., 2010;Sengupta et al., 2017;Polo et al., 2016). Träger-Chatterjee et al. (2010) already observed a strong underestimation of ERA-Interim in Central Europe for cloudy months. They suggested that those deviations were likely dominated by clouds because the effect of aerosols was not large enough to explain them. Zhang et al. (2016) compared ERA-Interim and MERRA worldwide and they also found high discrepancies in cloudy regions, concluding that the differences observed in cloud fraction fields led to the errors obtained in surface irradiance. Boilley and Wald (2015) compared ERA-Interim and MERRA-2, and similarly to the current study, they found a strong overestimation of G under cloudy conditions and a slight underestimation of ERA-Interim under clear-skies. They stated that deficiencies by MERRA and ERA-Interim in prediction of the cloud amount would explained the low correlation observed for KT and G. The overestimation of KT under cloudy conditions could be explained by different failures related to clouds and their composition: false prediction of clear-sky situations, underestimation of the cloud fraction, optically too thin clouds or incorrect cloud properties (cloud phase or liquid/ice water content), among other issues. The type of failure may vary among products and also spatially, so further work is still needed to verify the cause of the deviations under cloudy conditions in each model. Nonetheless, Fig. 2 suggests that a deficient clouds modeling is one of the main limitations of reanalysis models compared to satellite-derived methods.
Another factor that influences the bias is the type of aerosol and water vapor data used, especially under clear-sky conditions. This could be the case of Xianghe (XIA), which has high concentrations of air pollutants (Zhang et al., 2016), and Tamanrasset (TAM), which receives high loads of dust from Sahara Desert. Other examples are Bondville (BON), Petrolina (PTR) and Brasilia (BRB), which are located in regions with a high water vapor content. However, the bias still varies with KT in all these places, which indicates that the influence of aerosols and water vapor in the bias is in a second order of importance compared to that of clouds.
Looking at the variance of the MBE within each station (Fig. 2), the places with largest variations are mostly located in coastal areas and small islands. Out of the 19 stations with highest variance (stations on the bottom in Fig. 2 ERA5 is also better than ERA-Interim and MERRA-2 in terms of the absolute error, obtaining a MAE of 23.13 W/m 2 compared to the 28.28 W/m 2 of ERA-Interim and the 30.17 W/m 2 of MERRA-2 (Table 2). Aside from the extreme MAE obtained for KWA, the individual MAEs (Fig. 3) decrease in the majority of stations and ERA5 only obtains MAEs exceeding 25 W/m 2 in coastal areas and islands. However, the improvement shown by ERA5 is not enough to reach the quality of satellitebased products. ERA5 has a moderate positive bias, but it is generally a result of an overestimation under cloudy conditions and an underestimation under clear-skies, whereas the bias of satellite-based products is more stable with the cloudiness. This high-variance in the bias leads to large absolute errors, creating a gap in terms of the absolute error between reanalysis and satellite-based data. This is especially evident in the comparison of ERA5 against SARAH (Table 2), as both products obtained similar mean biases (+0.84 vs. +0.61 W/m 2 ) but the MAE of ERA5 is significantly larger than that of SARAH (21.77 vs. 13.44 W/m 2 ). The difference is less clear between ERA5 and the NSRDB (22.36 vs. 17.46 W/m 2 ) due to the large individual MAEs obtained by the NSRDB for South America and some North American stations (Fig. 3).

Spatial comparison of reanalysis products against NSRDB and SARAH
The maps of the differences between global reanalyses and the NSRDB (Fig. 4) share a similar pattern with positive deviations in North America and high negative differences in Central and South America. In North America, the bias of the NSRDB in the BSRN stations is the smallest overall, thus the positive differences observed are most likely    two satellites used to calculate the NSRDB data. The magnitude of this discontinuity is substantially smaller than that of the reanalysis validation errors, making its impact on the results negligible. However, it serves to illustrate that the deviations shown here are not exclusively due to the reanalysis products, though in general the NSRDB data are more accurate (Table A.1). The maps become more complex in Central and South America. The NSRDB obtained strong positive biases for the few BSRN stations available and the patterns observed in the area covered by both NSRDB and SARAH differ. Hence, the NSRDB might not be the most adequate reference product in this region. The negative differences obtained by ERA5 in Western Brazil then indicate that ERA5 has a smaller bias than NSRDB there, as it occurs in the Brazilian BSRN stations (Table A.1). However, this cannot be extrapolated to the whole South America due to the lack of ground data. A pattern shared by the maps based on SARAH and NSRDB in South America is the strong negative deviations exceeding −20 W/m 2 for the Amazon Basin. This area belongs to the Inter Tropical Convergence Zone (ITCZ), which is a belt that encircles the Earth near the equator. The most likely cause of these negative deviations is the overestimation of the clouds in the ITCZ, a region that already has a high frequency of clouds (Zhang et al., 2016). Fig. 4 depicts a discontinuity in the set of pixels that cover both water and land. For instance, high negative differences are observed in MERRA-2 in the coasts of Central America, while ERA5 shows a positive deviation in the North American West coast. These discontinuities are the cause of the high biases obtained in all coastal stations evaluated and they corroborate the limitations of reanalysis in coastal areas. In the area covered by SARAH, the discontinuity at the border between METEOSAT Prime and METEOSAT East is also visible here, though somewhat less pronounced than for the NSRDB data. ERA5 presents again the smallest deviations with values within ± 5 W/m 2 for most Europe, Central Asia and South Africa. Large negative deviations are obtained for regions within the ITCZ such as the Guinean Gulf, West Africa and India. This corroborates the limitations of reanalysis models in the ITCZ in particular, and in regions with frequent occurrence of clouds in general. On the other side, largest positive deviations are observed in the Tibetan Plateau and China. The positive bias for the Tibetan Plateau agree with the ones obtained in other mountain ranges such as the Alps, the Pyrenees, the Rocky Mountains or the Andes, while the overestimation of G in China could be related to an underestimation of clouds and anthropogenic aerosols as reported by Zhang et al. (2016). In general, the maps evidence that the average irradiances from ERA5 and SARAH are comparable mostly for flat regions with low occurrence of clouds. R. Urraca et al. Solar Energy 164 (2018)  The performance metrics of the reanalysis products in the European stations are summarized in Table 3, while the individual biases per station are depicted in Fig. 5. Overall, the statistics reported for Europe are in concordance with those obtained for the BSRN stations. In that sense, ERA5 also reduces the overestimation exhibited by previous global reanalyses showing a similar moderate positive bias of +4.05 W/m 2 . The reduction of the overestimation is greater than that reported for the BSRN stations (Table 2), reaching 65% compared to ERA-Interim (MBE = +11.58 W/m 2 ) and 78% compared to MERRA-2 (MBE = +18.45 W/m 2 ). ERA5 overestimates in 80% of the stations but the bias is within ± 5 W/m 2 in 60% of the European stations. The bias of ERA5 increases in the Norwegian, British, French or Spanish coasts, although the overestimation is not as high as the one exhibited by ERA-Interim and MERRA-2. As mentioned in the analysis of the BSRN stations, the positive bias in the coasts is mostly caused by the use of coarse spatial grids in areas where the spatial variability of irradiance is high. These coarse grids include water and land within the same pixel that for ERA5 spans around 31 km. On the contrary, surface irradiance in coastal areas rapidly changes in just a few km as shown in Fig. A.1.  Fig. A.1 also depicts that in most of the Atlantic and in the Northern Mediterranean Coast the sea receives on average more irradiance than land, which explains the positive biases obtained in land stations. This pattern is not universal as can be seen in the African Coast, so this also explains the high negative biases observed in some BSRN stations located at coastal areas.
SARAH obtains the smallest average bias (MBE = +0.86 W/m 2 ) for the 271 stations under the coverage of METEOSAT satellite, but the individual biases (Fig. 5) are heterogeneous across Europe. SARAH overestimates in 61% of European stations, especially in the Mediterranean Coast and in the pre-Alps, while it underestimates in Northern Europe. The positive bias of SARAH for most Europe was already reported by Müller et al. (2015b) while the negative bias for highlatitudes is related to problems with snow detection (Riihelä et al., 2015;Urraca et al., 2017b). Therefore, the differences between ERA5 and SARAH are negligible in terms of the mean abs (3.66 vs. 4.94 W/m 2 ) and the sd (4.74 vs. 4.78 W/m 2 ). This corroborates the fact that in inland regions where the spatial variability of irradiance is low, the mean bias of ERA5 is comparable to that of satellite-derived products.
COSMO-REA6 is the only reanalysis that underestimates G in Europe with a mean bias of −5.30 W/m 2 . The bias is particularly large in South Europe exceeding −20 W/m 2 in 18 Spanish stations. COSMO-REA6 improves above 45°N and the magnitude of the bias here is comparable to that of ERA5. Besides, COSMO-REA6 is the only reanalysis that does not deteriorates in coastal areas due to its high-resolution grid (6.2 km) clearly outperforming ERA5 in the Norwegian or British coasts.
All the products degrade when applied to the 6 mountain stations ( Table 3). The bias of reanalyses gets more positive and the standard deviation of the results increases. COSMO-REA6 is the product with the lowest increase in the average bias and in the standard deviation of individual biases, as a consequence of the 6.2 km spatial grid. It even outperforms SARAH that uses a similar grid in terms of horizontal resolution, showing a lower and more stable bias. This is because SARAH has an heterogeneous performance in the mountains underestimating in the Alps, probably due to snow detection issues, and overestimating in the Pyrenees and the Spanish Central System.
Predicted KT (KT d est ) is plotted against the real KT (KT d meas ) obtained with measured data to examine the dependence of the bias on cloudiness (Fig. 6). An analysis of the MBE for different KT levels, such as the one conducted for BSRN stations in Fig. 2, is not shown for Europe due to the large number of stations. The patterns observed in Fig. 6 mostly agree with the ones discussed in the BSRN dataset (Fig. 2). ERA5 overestimates KT under cloudy and intermediate conditions, while it slightly underestimate it under clear-skies. The overestimation for < KTs 0.7 is considerably reduced in ERA5, with a higher density of points close to the ideal fit line, but the small underestimation for > KTs 0.7 is stronger than it was in ERA-Interim.
The bias of COSMO-REA6 shows a similar compensation effect. The overestimation of G for medium and low transmissivity ( < KT 0.5) cases is less severe than that shown by ERA5. Frank et al. (2018) evaluated COSMO-REA6 using simultaneous records from ceilometers and pyranometers, finding that these samples correspond to cases with low R. Urraca et al. Solar Energy 164 (2018) 339-354 (cloud base < 2 km) and medium-height clouds (2 km < cloud base < 5 km). They suggested that the positive bias for < KTs 0.5 is caused by optically too thin or too few clouds. However, the most significant deviation is the strong underestimation for high transmissivity cases ( > KTs 0.7), which ultimately leads to the mean negative bias of COSMO-REA6 (Fig. 5). Frank et al. (2018) found that the majority of these cases correspond to cloud free conditions, which is in line with the increase of the negative bias with the amount of clear-sky days per year (Fig. 5). They suggested that the negative bias for clear sky conditions is related to the use of the Tanré aerosol climatology (Zubler et al., 2011), which introduces too high aerosol optical depths and therefore a too strong extinction of solar radiation for cloud free conditions. A bias corrected version was proposed using different scaling factors for cloudy and clear-sky situations (Frank et al., 2018(Frank et al., , 2017.
SARAH obtains the best results overall with the highest density of points around the ideal fit line, the least number of extreme values, and the most stable distribution with cloudiness. This is because SARAH uses data from the visible channels of geostationary satellites to detect clouds, while reanalyses generally do not assimilate cloud data and use the forecasts of the NWP model. Therefore, SARAH is able to predict more accurately the cloud pattern than reanalyses contributing to the higher quality of irradiance data from satellite-based products.
The individual MAEs per each station (Fig. 7) verify the improvement of ERA5 in terms of the absolute error compared to MERRA-2 and ERA-Interim. The MAE of ERA5 is 19.27 W/m 2 while those obtained by ERA-Interim and MERRA-2 are 23.90 W/m s and 26.40 W/m 2 respectively (Table 3). The absolute error of ERA5 is below 15 W/m 2 in many Nordic stations due to the low irradiance there, but it is also lower than 17 W/m 2 in most Spain, where the irradiance is high but cloudy days are rare. However the differences in the absolute error between ERA5 and SARAH are still noticeable, with the MAE of ERA5 ranging between 17 and 21 W/m 2 in most inland stations while the MAE of SARAH is usually between 11 and 14 W/m 2 . This is related to the high density of points around the ideal fit line observed in SARAH scatterplot (Fig. 6), as well as with the greater stability of the bias with KT. The absolute error of COSMO-REA6 is the largest overall in Southern Europe, as a consequence of the strong underestimation of G there, but similar to ERA5 above 45°N. Besides, the differences between ERA5 and COSMO-REA6 in coastal areas are almost negligible suggesting that the use of coarse grids has a higher impact on the bias than on the absolute error.

Spatial comparison of reanalysis products against SARAH for Europe
In Europe, SARAH shows fairly low bias ranging from −6.19W/m 2 (TOR) to +6.58W/m 2 (CAR) for the BSRN stations (Table A.1) and being within ±5 W/m 2 for 74% of European stations (Fig. 5). Therefore, if the difference between a reanalysis product and SARAH (Fig. 8) is outside this range, it is an indication that the accuracy of said product is low in that region. ERA5 exhibits the lowest deviations with values within ±5 W/m 2 in most Europe, while the negative deviations observed in Southern Europe suggest that ERA5 is even able to correct the overestimation of SARAH there. Overall, the yearly biases of both products are comparable in the flat and cloudless regions of Europe. The deviations are larger and generally positive in coastal areas, due to the limitations of the spatial grid there, and also in the mountains, but here the bad performance of SARAH hinders an adequate analysis of ERA5. The underestimation of COSMO-REA6 steadily increases with decreasing latitude, exceeding −20 W/m 2 in Spain and Italy and being even higher in the Southern edge of the grid. However, the differences between COSMO-REA6 and SARAH do not increase in the coasts, while the deviations remain within ± 5W/m 2 for Northern Europe and are comparable to that of ERA5. Fig. 8 also shows the improvement of ERA5 over the previous global reanalyses by reducing the large positive biases of MERRA-2 and ERA-Interim for Europe. MERRA-2 shows the largest differences overall, especially for Central and Northern Europe with positive differences around +20 W/m 2 . These positive differences are smaller in ERA-Interim, being around +10 W/m 2 for most Europe, but these values are still considerably larger than those of ERA5. Besides, the increase of the differences in coastal areas, such as Northern Spain, Wales, Scotland and Southern Sweden, is somewhat more pronounced for ERA-Interim than for ERA5. This moderate improvement in coastal areas proves the benefits of the finer spatial grid used by ERA5 (31 km) compared to that of ERA-Interim (81 km).

Conclusions
The study reveals the progress made by the new ERA5 and the regional COSMO-REA6 in the estimation of surface irradiance. In ERA5, the most relevant improvement is the reduction of the positive bias compared to ERA-Interim, making its yearly bias comparable to that of satellite-based products in most inland regions with low occurrence of clouds where the variability of surface irradiance is low. However, a significant variation of the bias with cloudiness most likely related to a poor prediction of clouds is still observed. This leads to larger absolute errors in ERA5, and also in COSMO-REA6, than the ones obtained with satellite-based products. Besides, the study highlights the inadequacy of the grid used by ERA5 (31 km) in coastal areas and mountains, while in Fig. 8. Difference in annual average G between the four reanalysis products and SARAH in the period 2010-2014. Note that the colour scale is limited to ± 30 W/m 2 , larger differences are shown as black. Areas without data from SARAH are shown as dark grey. these regions COSMO-REA6 clearly outperforms ERA5 thanks to its high-resolution grid (6.2 km). COSMO-REA6 has a bias and absolute error comparable to ERA5 in Northern and Central Europe, but the quality of COSMO deteriorates in Southern Europe with biases exceeding −20 W/m 2 . The underestimation of COSMO-REA6 is more pronounced under clear-sky conditions and it is probably related to the use of an aerosol climatology that overestimates the aerosol content.
To sum up, reanalysis products are an attractive approach to estimate surface irradiance, as they provide long term data with global coverage while they include multiple atmospheric parameters. The quality of previous reanalyses such as ERA-Interim or MERRA-2 was quite limited and their use was generally not recommended. The new ERA5 is a substantial quality leap in the estimation of surface irradiance with reanalysis models, but still satellite-derived data should be preferred when available. More work is required in the NWP and data assimilation models in order to improve the prediction of clouds, whereas the spatial grid of ERA5 is inadequate in regions with high variability of irradiance. However, both ERA5 and COSMO-REA6 could be used keeping in mind their limitations. ERA5 can be a valid alternative for regions not covered by geostationary satellites such as the polar regions, as well as to fill gaps in time series. The regional COSMO-REA6 can complement ERA5 in Northern and Central Europe mitigating the deficiencies of ERA5 in coastal areas and mountains.