Hourly global horizontal irradiance over West Africa: A case study of one-year satellite- and reanalysis-derived estimates vs. in situ measurements

Estimates of global horizontal irradiance (GHI) from reanalysis and satellite-based data are the most important information for the design and monitoring of PV systems in Africa, but their quality is unknown due to the lack of in situ measurements. In this study, we evaluate the performance of hourly GHI from state-of-the-art reanalysis and satellite-based products (ERA5, MERRA-2, CAMS, and SARAH-2) against 37 quality-controlled in situ measurements from novel meteorological networks established in Burkina Faso and Ghana under different weather conditions for the year 2020. The effects of clouds and aerosols are also considered in the analysis by using common performance measures for the main quality attributes and a new overall performance value for the joint assessment. The results show that satellite data perform better than reanalysis data under different atmospheric conditions. Nevertheless, both data sources show substantially larger errors under cloudy skies than under clear skies, with a difference of more than 150 W/m² in terms of RMSE. The new measure of overall performance clearly shows that the hourly GHI derived from CAMS and SARAH-2 could serve as viable alternative data for assessing solar energy in the different climatic zones of West Africa.


Introduction
Global horizontal irradiance (GHI), also called surface shortwave downward radiation or solar irradiance, is defined as the amount of sunlight received from the Sun at the surface. It plays a vital role in the dynamics of the Earth's surface and drives physical processes in the atmosphere and on the land surface [1]. In addition, knowledge of GHI is crucial in the solar energy sector for installing photovoltaic (PV) systems at a given location. The West Africa region receives abundant GHI throughout the year, and the daily average is estimated to be around 5-6 kWh/m² [2]. In recent years, the capacity of solar PV technology in off-grid (rural and urban) and grid-connected systems has increased strongly. For instance, between 2016 and 2018, the installed PV capacity almost tripled, and this trend is expected to continue in the coming years [3]. However, the long-term profitability of solar energy plants based on PV technology requires accurate GHI estimation.
Ground-based measurements from state-of-the-art pyranometers that follow the WMO (World Meteorological Organization) standards are still the best data source for GHI observations [4]. However, GHI observations and related information such as sunshine duration from meteorological stations are often not accessible from African meteorological agencies due to a poor station network, national data regulations and other reasons [5-7]. In addition, station maintenance remains a challenge due to high costs, while support from local governments has declined [8]. This has had a strong negative impact on data quality [9] and continuity in Africa [10]. Therefore, obtaining reliable long-term GHI observations and related information from weather stations across the region is a fundamental problem for recent and past periods. This strongly limits the availability of reliable GHI information for the planning, operation, and quality assessment of solar energy projects. Recently, a number of initiatives such as WASCAL (West African Science Service Centre on Climate Change and Adapted Land Use; [6]), SASSCAL (Southern African Science Service Centre for Climate Change and Adaptive Land Management; [11]) and TAHMO (Trans-African Hydro-Meteorological Observatory; [12,13]) have established, for the first time, a relatively dense network of automatic weather stations providing ground-based meteorological measurements at high temporal resolution for many parts of the African continent.
GHI satellite and reanalysis data are essential in supplementing ground-based measurements, particularly in data-scarce regions such as Africa. These datasets provide long-term GHI time series for recent periods at a relatively high spatio-temporal resolution [14,15] in uniform gridded data formats, from which users can retrieve the nearest grid point for their region of interest. Taking advantage of this, many investigations rely on satellite-based or reanalysis GHI data for the assessment of solar energy potential or for climate impact studies [2,16-19].
However, to recommend the use of satellite-based or reanalysis GHI data in the absence of ground-based measurements for these studies, a detailed inter-comparison and validation of these datasets for the region of interest are required. From this point of view, several studies have already carried out an inter-comparison between observational, satellite, and reanalysis GHI data. Most of them suggest that the accuracy of GHI from satellite-based and reanalysis data is lower than that of ground-based measurements [20]. For example, Yang and Bright [21] evaluated six satellite-based and two reanalysis products against hourly GHI from 57 radiometric stations of the Baseline Surface Radiation Network (BSRN) distributed across the world over a period of 27 years. They concluded that the satellite-derived hourly GHI performed better than the reanalysis data, and that cloudy days have a higher bias than clear-sky days. Another study was carried out in the Netherlands by Marchand et al. [22], who used a dense observational network of 32 stations to assess the accuracy of hourly GHI from the Copernicus Atmosphere Monitoring Service version 3.2 (CAMS) and HelioClim-3 version 5. Both satellite-based datasets showed a relatively good correlation with the 32 radiometric stations (between 0.94 and 0.98) and satisfactorily reproduced the hourly variations of GHI. Another study conducted in Brazil showed that GHI derived from three satellite-based datasets, with a relative mean bias of about 7% for CAMS, could be used as an additional source for solar energy assessment in this region [23]. A recent study by Du et al. [24] evaluated the hourly GHI performance of the MERRA-2 (Modern-Era Retrospective Analysis for Research and Applications, Version 2) reanalysis data against 37 in situ measurements over China under different sky conditions in 2018. In general, MERRA-2 overestimates the hourly GHI over China, with a mean bias error of 69.35 W/m². Their results are consistent with [21], where high deviations occur under cloudy conditions.
For sub-Saharan Africa, [4] recently performed an inter-comparison of five datasets (CAMS, ERA5, SARAH-2, MERRA-2, and SOLCAST) for hourly GHI against data from 13 ground-based stations in South Africa, in which the MERRA-2 reanalysis exhibited the weakest performance, with a relative mean bias error (rMBE) of 11%. The authors recommended the use of the CAMS (rMBE = 2.14%) and SARAH-2 (Surface Solar Radiation Data Set - Heliosat; rMBE = 2.13%) datasets for solar energy applications in the country. In West Africa, [25] showed that ERA5 provided a good representation of daily GHI compared to ERA-Interim datasets at four weather stations in Burkina Faso for the year 2017. Later, [26] used three radiometric observations from the African Monsoon Multidisciplinary Analysis program (AMMA) to validate the daily and monthly GHI from the SARAH-2 dataset. On both temporal scales, SARAH-2 performed relatively well but with notable biases. However, GHI was evaluated on a daily and monthly basis with a limited number of stations in these studies, while hourly GHI data are essential for accurate solar power plant design and planning. Moreover, knowledge of hourly GHI is useful for GHI forecasting [27]. A detailed validation process with high-quality data is needed before GHI from ground-based measurements can be substituted with satellite-based or reanalysis GHI data. To our knowledge, no such study has used hourly GHI from dense observation networks to validate GHI derived from satellite and reanalysis data over West Africa.

Table 1. The 51 AWSs used in this study with their basic measurement characteristics and pyranometer features.
Therefore, this study aims to evaluate the performance of hourly GHI derived from MERRA-2, ERA5, SARAH-2 and CAMS data against ground-based data for the year 2020 for solar energy monitoring. For the first time in Africa, 51 automatic weather stations (AWS) are used for hourly GHI assessment. The AWSs belong to four different transboundary and national networks recently established by WASCAL, the Ghana Meteorological Agency (GMet), the Burkina Faso National Meteorological Agency (ANAM) and partner institutions, covering the most critical climate zones (Guinea, Savannah, and the Sahel) in West Africa. The focus of this study is on the evaluation of the different satellite and reanalysis datasets against observations under different atmospheric conditions: (i) cloudy-sky, (ii) clear-sky and (iii) all-sky. This is realized by using a wide range of performance measures and methods and by introducing a novel multi-objective performance measure to select the best-performing dataset for the region. In addition, the effect of aerosols on the hourly GHI during the Harmattan period over the area is investigated.
The paper is structured as follows. The following section presents the study area, detailed information on the different datasets, and the methodology used. Section 3 presents the outcomes of the study and discusses the various findings. The study ends with conclusions and general recommendations regarding satellite- and reanalysis-based GHI information.

Study area
The study focuses on the West African region, particularly Burkina Faso and Ghana (Fig. 1). The region is governed by the West African Monsoon (WAM), which modulates atmospheric processes and triggers most of the rainfall in the region [28]. West Africa is characterized by a long dry season and a rainy season (during the summer months), with annual rainfall ranging between 150 and 2500 mm [29]. The Harmattan period lasts from late November to mid-March and transports dust from the Sahara Desert across the region [30]. Owing to the strong environmental transitions from the Guinean forests in the south to the hyper-arid Sahara Desert in the north, the region can be divided into three distinct climatic zones: Guinea (4 [...] [6,34]. Measurements from this network are recorded as 5-min averages, and standard equipment maintenance, such as cleaning the radiation sensors, is carried out regularly (e.g., twice a month).
GMet operates a surface observation network of 120 weather stations in Ghana, which are well distributed across the country. In late 2018 and early 2019, GMet installed 22 novel AWS with radiation measurements, of which 15 are used in the current study. The temporal [...]

ANAM has a total of 270 weather stations across Burkina Faso, of which 22 were selected for this study, as outlined in Table 1. New AWSs were installed in 2017 in cooperation between the Burkina Faso government and its technical and financial partners. The maintenance schedule of these stations is similar to that of GMet, and data are recorded as 15-min averages. Note that all data recorded at the different AWSs are subject to basic quality control (e.g., data format, measurement interval, and data consistency) by the respective institutions. According to data availability, we collected raw data for the year 2020 to validate the GHI from the ERA5, MERRA-2, SARAH-2, and CAMS datasets.

SARAH-2 dataset
The satellite dataset used in this study is the Surface Solar Radiation Data Set - Heliosat, Edition 2 (SARAH-2), from the Satellite Application Facility on Climate Monitoring [35]. SARAH-2 covers the region of ±65° longitude and ±65° latitude (Europe, Africa, and the Atlantic Ocean) with a spatial resolution of 0.05° by 0.05° (~5 km). The dataset has a temporal resolution of 30 min (instantaneous values) and is available from 1983 to the present. The SARAH-2 products are based on the Heliosat algorithm, which incorporates the libRadtran radiative transfer model and the MAGICSOL clear-sky model to estimate GHI under cloud-free conditions [36,37].
The GHI data in the SARAH-2 product (referred to as surface incoming shortwave radiation) are calculated with a radiative transfer model from water vapor, surface albedo, a cloud index (from satellite observations), aerosols and ozone. SARAH-2 uses the monthly aerosol climatology from the Monitoring Atmospheric Composition and Climate (MACC) project, which has a spatial resolution of 120 km and is interpolated onto the SARAH-2 grid [38]. The 30-min instantaneous values of GHI were downloaded from the SARAH-2 database (https://wui.cmsaf.eu/) for the year 2020. The hourly GHI is the average of the two 30-min values within each hour.

CAMS dataset
The Copernicus Atmosphere Monitoring Service (CAMS) Radiation Service provides solar radiation products. Its algorithm for calculating these products is based on the Heliosat-4 approach [39]. The method uses the McClear algorithm to estimate GHI under clear-sky conditions [40] and the McCloud model to estimate the attenuation of solar irradiance caused by clouds. The McClear and McCloud models are implemented using the libRadtran radiative transfer model developed by Mayer and Kylling [37]. GHI under all-sky conditions is calculated as the product of GHI under clear-sky conditions and a clear-sky index, also called the cloud modification factor [39,41]. The aerosol optical depth (AOD) inputs are from the CAMS service, have a spatial resolution of 40 km, and are updated every 3 h. CAMS covers Africa, Europe, the eastern part of South America, the Middle East and the Atlantic Ocean and has been available from 2004 to the present with a delay of 2 days. The data are accessible at several temporal resolutions (e.g., 1 min, 15 min, hourly, daily, and monthly), and users can retrieve them for a specific point of interest. In this study, we used the latest version of the CAMS radiation service (version 4.5), which uses a second APOLLO_NG production chain to improve cloud redundancy. We downloaded the 1-min GHI for the 51 AWS sites for the year 2020 and then computed the average hourly GHI values.
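The all-sky relation described in this section can be written compactly as follows. The symbol names (GHI_clear for the McClear clear-sky irradiance, K_c for the clear-sky index) are notational assumptions made here for illustration and are not necessarily those used in the Heliosat-4 documentation.

```latex
\mathrm{GHI}_{\text{all-sky}} = K_c \cdot \mathrm{GHI}_{\text{clear}}
```

As a purely illustrative calculation, a clear-sky value of 800 W/m² combined with a clear-sky index of 0.6 would give an all-sky GHI of 480 W/m².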

MERRA-2 dataset
The Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) is a product of the NASA atmospheric reanalysis [42]. MERRA-2 replaces the original MERRA with an improved data assimilation system of the Goddard Earth Observing System Model version 5 (GEOS-5). The GEOS-5 model is coupled with the Goddard Chemistry Aerosol Radiation and Transport (GOCART) model and simulates five types of aerosols: sulfate, dust, sea salt, black carbon and organic carbon [43,44]. The system includes a large-scale prognostic cloud scheme in the moist physics and uses the shortwave and longwave radiation schemes from Chou and Suarez [45] and Chou et al. [46], respectively. MERRA-2 uses real-time bias-corrected AOD inputs from the Advanced Very High Resolution Radiometer (AVHRR) instruments with a spatial resolution of 1.1 km [47]. It has a spatial resolution of 0.5° by 0.625° (~50 km), with an output of 72 model levels and 42 pressure levels from the surface to 0.01 hPa, and a temporal resolution of 1 h. The data cover the period from 1980 to the present with a lag of 2 months. Hourly GHI data were downloaded from the MERRA-2 server for the year 2020. Hourly data in MERRA-2 are averaged over the specified hour and stamped at the center of the hour, i.e., 00:30 GMT, 01:30 GMT, etc.

ERA5 dataset
ERA5 is the fifth generation of atmospheric reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF; [48]). ERA5 has a spatial resolution of 0.25° by 0.25° (~31 km) and a temporal resolution of 1 h. It includes 137 model levels and 37 pressure levels and covers the entire globe. ERA5 uses the RTTOVv11 model as the radiative transfer model and "McRad" as the radiation scheme, which includes the shortwave and longwave Rapid Radiative Transfer Model for GCM (RRTMG) schemes. ERA5 uses prescribed monthly climatological aerosol information from the Global Ozone Chemistry Aerosol Radiation and Transport (GOCART) model with a horizontal resolution of 2.5° longitude by 2° latitude, which includes stratospheric sulfate aerosols [49,50]. Over West Africa, GOCART shows a discrepancy with the observed AOD from AERONET data, which is attributed to the strong perturbation of local dust sources [44]. The ERA5 data are available from 1979 to the present. From the ECMWF platform, we retrieved the hourly GHI, which refers to surface solar radiation, for the year 2020. The ERA5 GHI values are hourly accumulations expressed in J/m². We divided the accumulated values by 3600 s to get the average GHI values in W/m². The hourly GHI data in ERA5 are computed as the mean rate over the previous hour. For example, the GHI value at 12:00:00 UTC corresponds to the average GHI from 11:00:00 UTC to 11:59:59 UTC. To ensure consistency with the observation data and the other datasets, where the hourly average is computed for the current hour, we shifted the ERA5 timestamps by 1 h.
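The unit conversion and time shift described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function name `era5_to_hourly_mean`, the input layout (a list of timestamp/value pairs), and the direction of the shift (relabeling each value with the start of the hour it covers) are assumptions.

```python
from datetime import datetime, timedelta

SECONDS_PER_HOUR = 3600


def era5_to_hourly_mean(records, shift_hours=-1):
    """Convert ERA5 hourly accumulations (J/m^2) to mean irradiance (W/m^2).

    records: list of (timestamp, accumulated_J_per_m2) pairs, where each
    value covers the hour that ENDS at its timestamp.
    shift_hours: assumed relabeling; -1 stamps each value with the START
    of the hour it covers.
    """
    converted = []
    for stamp, joules in records:
        mean_w_per_m2 = joules / SECONDS_PER_HOUR  # J/m^2 over 3600 s -> W/m^2
        converted.append((stamp + timedelta(hours=shift_hours), mean_w_per_m2))
    return converted


example = era5_to_hourly_mean([(datetime(2020, 1, 1, 12), 1_800_000.0)])
```

Here an accumulation of 1.8 MJ/m² stamped at 12:00 UTC becomes a mean of 500 W/m² labeled 11:00 UTC. Whichever labeling convention is chosen, it has to match the one used when aggregating the station data.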
Table 2 shows the different datasets used with their characteristics. We used a linear interpolation technique to determine the radiation information from the ERA5, MERRA-2, and SARAH-2 datasets for the corresponding sites of the in situ measurements. The observational data used in the study have different temporal resolutions (5 min, 10 min and 15 min, see Table 1). To compute the hourly data, the sub-hourly data were averaged using the following steps:
1. If there is a missing date in the time series, the date is added, and the value for GHI is marked as missing.
2. All GHI values during nighttime are set to 0, even if there are missing values.
3. For the 5-min data, the values of GHI are averaged to an hourly value if 95% of the measurements are available within the specific hour. Otherwise, the value is set to a missing value. For the 10- and 15-min data, 100% of the measurements must be available to calculate hourly GHI values.
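The three aggregation steps above can be sketched as follows. This is an illustrative reading rather than the authors' code: the function name `hourly_ghi`, the dictionary-based input, the start-of-interval timestamp convention, and the caller-supplied `is_night` predicate are all assumptions.

```python
from datetime import datetime, timedelta


def hourly_ghi(samples, hours, step_minutes, is_night):
    """Aggregate sub-hourly GHI to hourly means with completeness checks.

    samples: dict mapping sample timestamps to GHI (W/m^2); absent or None
             entries count as missing (step 1).
    hours: iterable of hour-start timestamps to evaluate.
    step_minutes: sampling interval (5, 10 or 15).
    is_night: hypothetical predicate returning True for nighttime hours.
    """
    required = 0.95 if step_minutes == 5 else 1.0  # thresholds from step 3
    expected = 60 // step_minutes
    result = {}
    for hour in hours:
        if is_night(hour):
            result[hour] = 0.0  # step 2: nighttime is forced to zero
            continue
        stamps = [hour + timedelta(minutes=k * step_minutes) for k in range(expected)]
        values = [samples[t] for t in stamps if samples.get(t) is not None]
        if len(values) / expected >= required:
            result[hour] = sum(values) / len(values)
        else:
            result[hour] = None  # too few samples: hour marked as missing
    return result
```

Under a literal reading of the 95% rule, an hour of 5-min data holds 12 samples, so a single missing sample (11 of 12, about 92%) already falls below the threshold.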
To validate the accuracy of the hourly GHI satellite and reanalysis data, reliable ground-based GHI measurements are essential. To ensure the quality of the different AWSs, we applied the techniques shown in Fig. 2. Our first step was to exclude stations with large amounts of missing data. Fig. 3 shows the periods with missing hourly GHI data for the different AWSs in 2020. The vertical bars indicate missing periods, while the sum of the missing hours and their percentage (in parentheses) can be seen on the right ordinate. Overall, 44 out of 51 stations have no data gaps or only a few missing measurements. However, the stations Abefiti, Loagri, Aniabisi, Bango Soe, Kpando, Tuna and Gaoua have a much higher percentage of missing data, between 5% and 33.5%. These stations were excluded from the data quality assessment. No gap-filling techniques were applied to stations with less than 5% missing values. All missing data were removed from the station in question, and the values extracted at the coordinates of this station were subjected to the same exclusion process in the corresponding satellite and reanalysis datasets.
The second step was to categorize the different AWSs based on their respective climate zones (see Appendix Figs. 17-20). We then excluded stations that differed from their counterparts. Such discrepancies could be caused by shadows, faulty sensors, or calibration problems. We also combined this analysis with the clearness index (K_t) to identify suspicious AWSs. For this purpose, we calculated the daily average K_t for all AWSs. K_t is defined as the ratio of surface solar irradiance to extraterrestrial solar irradiance G_0 and is expressed as follows:

$$K_t = \frac{\mathrm{GHI}}{G_0} \qquad (1)$$

The daily GHI is determined from the hourly GHI only if no hourly value is missing.
The clearness index has been used in previous studies to identify sky conditions. For instance, Du et al. [24] classified the sky conditions using K_t to validate the MERRA-2 hourly dataset for clear-sky and cloudy conditions over China. However, the values of K_t used to define cloudy and clear skies vary by location. [52] used the modified clearness index, K_t', introduced by Perez et al. [53]; for clear skies they used 0.65 < K_t' ≤ 1. On the other hand, [54] used the diffuse fraction, K_d, and established the range 0 ≤ K_d ≤ 0.26 to correspond to clear skies worldwide. This study defines clear-sky conditions as K_t ≥ 0.6 and cloudy-sky conditions as 0.12 ≤ K_t < 0.35. These values were adopted from previous studies on West Africa [55-57]. Based on this information, the number of clear-sky days and cloudy days was calculated for each station, and those stations with no realistic clear-sky days throughout the year were removed (see Fig. 21 in the Appendix). After the first and second steps, only 38 stations passed these tests and were used for the further quality checks.
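The thresholds stated above translate into a simple day classifier. The sketch below is illustrative only: the function names and the "unclassified" label for days outside both ranges are assumptions, since the text does not name the remaining days (they still contribute to the all-sky analysis).

```python
def daily_clearness_index(daily_ghi, daily_g0):
    """Daily clearness index K_t: surface GHI over extraterrestrial irradiance G_0."""
    if daily_g0 <= 0:
        raise ValueError("extraterrestrial irradiance must be positive")
    return daily_ghi / daily_g0


def classify_sky(kt):
    """Classify a day from its daily K_t using the thresholds of this study."""
    if kt >= 0.6:
        return "clear"
    if 0.12 <= kt < 0.35:
        return "cloudy"
    return "unclassified"  # hypothetical label for days in neither class


labels = [classify_sky(kt) for kt in (0.65, 0.20, 0.50)]
```

Counting the "clear" labels per station over the year gives the number of clear-sky days used in the screening step described above.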
The third step was to identify GHI values that are outside the normal range of the 38 AWSs. We therefore used the extremely rare limit (Eq. (2)) and the physically possible limit (Eq. (3)) of GHI measurements from the BSRN guidelines [58].
In Eqs. (2) and (3), I_0 is the solar constant (1367 W m⁻²; [59]) and SZA is the solar zenith angle. For the BSRN closure tests, the analyses were done for SZA < 80° to account for the seasonality of sunrise and sunset over the region. Fig. 4 illustrates the quality control of the aggregated hourly GHI data for all stations based on Eqs. (2) and (3). The physically possible limit is drawn in red, the extremely rare limit in green. The blue dots indicate the individual hourly GHI measurements for all 38 weather stations. Most data points that fall outside the BSRN interval occur for 75° < SZA < 80°. These intervals correspond to early morning and late afternoon measurements in the region, i.e., 7am-8am and 5pm-6pm, respectively. At some stations, such as Oualem, Nebou and Mange (see Table 5 in the Appendix), some data points show a high value of GHI under conditions of low irradiance and high zenith angle. These deviations could be due to interfering reflections from the roof edge in the early morning and late afternoon hours [60]. These data points have GHI values that are above the physically possible and extremely rare limits of GHI. About 649 (0.44%) such data points were flagged and removed from the analysis.
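As an illustration of this closure test, the sketch below applies the upper limits for global shortwave irradiance as they are commonly stated in the BSRN recommended quality-control tests. Whether Eqs. (2) and (3) of this study take exactly this form is an assumption. The sketch also omits the Earth-Sun distance correction that the BSRN tests apply to the solar constant, and the function names are hypothetical.

```python
import math

I0 = 1367.0  # solar constant in W/m^2, as quoted in the text


def bsrn_upper_limits(sza_deg, i0=I0):
    """Return (extremely_rare, physically_possible) upper GHI limits in W/m^2.

    Assumed form (BSRN recommended QC tests, without Earth-Sun distance
    correction): a * I0 * cos(SZA)**1.2 + b.
    """
    mu0 = max(math.cos(math.radians(sza_deg)), 0.0)  # clamp below the horizon
    extremely_rare = 1.2 * i0 * mu0 ** 1.2 + 50.0
    physically_possible = 1.5 * i0 * mu0 ** 1.2 + 100.0
    return extremely_rare, physically_possible


def flag_ghi(ghi, sza_deg):
    """Flag one hourly GHI value against both upper limits."""
    extremely_rare, physically_possible = bsrn_upper_limits(sza_deg)
    if ghi > physically_possible:
        return "above_physically_possible"
    if ghi > extremely_rare:
        return "above_extremely_rare"
    return "ok"
```

Because both limits shrink quickly as the zenith angle grows, spurious reflections near sunrise and sunset are the values most likely to be flagged, which matches the pattern described above.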
In the last step, we used an outlier analysis to identify erroneous GHI values from the different AWSs. In this study, we considered far outliers in the observations, which are calculated as follows [61]:

The outlier analysis was based on daily K_t, and we removed from the analysis the data that fall outside the upper and lower limits. Fig. 5 shows the interquartile range (in grey), the upper outlier limit (red dot) and the lower limit (green dot) for the different AWSs. At some stations, some data points lie beyond these bounds. Consequently, after comparison with other AWSs from the same area, Bani was removed from the analysis. After performing all the steps outlined above, only 37 AWSs were used to evaluate the performance of GHI from satellite and reanalysis data for the year 2020.
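The sketch below illustrates one conventional way to compute far-outlier limits from daily K_t values, using fences at three interquartile ranges beyond the quartiles. The factor of 3 and the quartile interpolation method are assumptions based on the usual definition of far outliers; the exact formula used in this study is not reproduced here, and the function names are hypothetical.

```python
import statistics


def far_outlier_fences(values, k=3.0):
    """Return (lower, upper) fences at k interquartile ranges from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_far_outliers(values, k=3.0):
    """Keep only the values that lie within the far-outlier fences."""
    lower, upper = far_outlier_fences(values, k)
    return [v for v in values if lower <= v <= upper]


daily_kt = [0.50, 0.55, 0.60, 0.65, 0.70, 0.02]
cleaned = remove_far_outliers(daily_kt)
```

In this toy series the value 0.02 falls below the lower fence and is dropped, while the remaining daily K_t values are retained.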

Performance metrics
The performance of the different datasets against the AWSs was assessed using several statistical metrics. We used the mean absolute error (MAE), the root mean square error (RMSE) and their normalized versions (nMAE and nRMSE) as important accuracy measures. In addition, Pearson's correlation coefficient (R) was used to include a skill score in the current analysis. A statistical metric that is sensitive to extreme values is important for evaluating GHI. For that purpose, we applied the index of agreement (IOA), which is based on the ratio between the mean square error and the potential error. The value of IOA ranges from 0 to 1; 1 means perfect agreement, while 0 means no agreement [62]. The different statistical metrics are expressed as follows:

where P is the reanalysis or satellite data value and O the observation at timestep i, and n is the number of data points used for the comparison. Ō and P̄ are the mean values of the observations and of the reanalysis or satellite data, respectively.
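For illustration, the metrics named in this section can be computed as in the sketch below, using their standard textbook definitions (with the index of agreement in Willmott's form). Normalizing MAE and RMSE by the mean observation and reporting them in percent is an assumption, as are the function and key names.

```python
import math


def ghi_metrics(obs, pred):
    """Compute MAE, RMSE, nMAE, nRMSE (percent), Pearson R and IOA."""
    n = len(obs)
    if n == 0 or n != len(pred):
        raise ValueError("obs and pred must be non-empty and of equal length")
    o_mean = sum(obs) / n
    p_mean = sum(pred) / n
    errors = [p - o for o, p in zip(obs, pred)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sse / n)
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, pred))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_p = sum((p - p_mean) ** 2 for p in pred)
    # Potential error: largest squared deviation attainable about the observed mean
    potential = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2 for o, p in zip(obs, pred))
    return {
        "MAE": mae,
        "RMSE": rmse,
        "nMAE": 100.0 * mae / o_mean,    # assumed normalization by mean observation
        "nRMSE": 100.0 * rmse / o_mean,  # assumed normalization by mean observation
        "R": cov / math.sqrt(var_o * var_p),
        "IOA": 1.0 - sse / potential,
    }


scores = ghi_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 310.0])
```

With these three toy pairs, MAE and RMSE are both 10 W/m², which corresponds to 5% of the mean observation of 200 W/m².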
When observations and different datasets are compared using the above statistical metrics, selecting the best dataset can be challenging. For example, some datasets may have a low RMSE, high correlation, and high IOA, while other datasets may have a low RMSE, low correlation, and low or high IOA compared to their counterparts. We therefore included an additional performance measure based on the nRMSE, R and IOA to better determine the overall performance of the different datasets. Based on these metrics, a satellite or reanalysis dataset fits the ground-based observations perfectly if nRMSE = 0, IOA = 1, and R = 1. The new overall performance measure (OP) can be expressed as follows:

This new coefficient is dimensionless. A value of +1 means that the dataset is perfectly close to the observations, while a negative value means that the dataset is far from the observations. Moreover, the OP provides a unified grade that considers a range of statistical metrics to assess the overall performance of the dataset. It allows a more comprehensive assessment of a dataset's agreement with ground-based observations and gives valuable insight into the performance of a dataset and its suitability for a particular application or assessment.

Evaluation of GHI
The analysis was based on "clear-sky", "cloudy-sky" and "all-sky" conditions. The sky condition was determined from the observations. An algorithm was developed to identify the days that meet the criteria for cloudy and clear-sky days at the different AWSs. The days classified as cloudy-sky or clear-sky in the observations were then used as cloudy-sky or clear-sky days for the different datasets. Nevertheless, the criteria used may classify a day with aerosol particles present in the atmosphere as a cloudy day. Dust aerosols and carbonaceous aerosols from biomass burning are the main aerosol types over the region. The latter aerosol type is the most important during the winter season (Harmattan period), while dust aerosol dominates in the rest of the year [44]. Therefore, we analyzed conditions on cloudy days during the Harmattan period (December-January-February) and on cloudy days during the rainy season (June-July-August). We selected 15 stations to analyze the diurnal variation of GHI. The selection was based on the representativeness of the stations in their respective climatic zones, i.e., we took the stations with the minimum, maximum, median, 25th percentile and 75th percentile of the annual mean GHI. We also used the Taylor diagram [63] and the cumulative distribution function (CDF) to evaluate the different datasets. Finally, we analyzed the performance of the different datasets under different atmospheric conditions at the seasonal level for individual stations and for the different climate zones.

Performance of reanalysis and satellite-based hourly GHI
The performance of the different datasets varies according to the sky conditions for the 37 AWSs (Fig. 6 a-d). High performance occurs in clear skies, while low performance occurs in cloudy skies for CAMS, ERA5, SARAH-2, and MERRA-2. This performance also differs from dataset to dataset. Under cloudy skies, most data points are on the left side of the 1:1 line, i.e., all datasets overestimate the hourly GHI. The RMSE ranges from 232 to 303 W/m² and the MAE varies from 153 to 232 W/m². CAMS shows the lowest RMSE and MAE, while ERA5 gives the highest values. In general, both satellite products (CAMS and SARAH-2) perform well compared to the reanalysis data (ERA5 and MERRA-2). The errors in the reanalyses are higher than those in the satellite data. For example, the RMSE of ERA5 is 303 W/m² (122.28%), whereas SARAH-2 has a value of 238 W/m² (96.07%). This discrepancy between the satellite and reanalysis data could be explained by the methodology used to calculate the cloud contents and their optical properties in the radiative transfer model. The cloud contents and their optical properties used in CAMS and SARAH-2 come from satellite observations, while the cloud contents in the reanalyses (ERA5 and MERRA-2) are prognostic [64,65]. In addition, the misinterpretation of cloudy skies as clear skies could also be a factor in the poor performance of the reanalyses (Fig. 21 a-d).

Under clear skies, the performance of the different datasets improved significantly compared to that under cloudy skies, with a difference of more than 150 W/m² in terms of RMSE (Fig. 6 e-h). This shows how difficult it is for reanalysis and satellite data to reproduce the hourly GHI under cloudy skies. The RMSE, R, and IOA of ERA5 (120 W/m²; 0.89; 0.88) and CAMS (119 W/m²; 0.90; 0.88) are comparable, but MERRA-2 (142 W/m²; 0.86; 0.84) shows poor performance under clear-sky conditions. There is good agreement between SARAH-2 and the observations. The values of RMSE, MAE, R, and IOA for SARAH-2 are 113 W/m², 84 W/m², 0.92, and 0.89, respectively, indicating that the MAGICSOL clear-sky model used in SARAH-2 to derive GHI under cloud-free conditions performs well over the area compared to the other clear-sky models used in ERA5, MERRA-2 and CAMS.

Fig. 13. Similar to Fig. 11, but for all-sky conditions.
For all-sky conditions, CAMS outperforms the datasets from ERA5, MERRA-2, and SARAH-2 in the hourly estimates of GHI (Fig. 6 i-l). MERRA-2 shows poor performance, with an RMSE of 179 W/m² (36.49%) and an MAE of 134 W/m² (27.39%). The unsatisfactory performance of MERRA-2 is the result of its poor performance under clear skies. A similar result of poor performance of MERRA-2 in hourly GHI estimation was highlighted in South Africa [4]. Moreover, our results are comparable with those from different sites around the world under all-sky conditions. For example, Yang and Bright [21] found that the nRMSE values for the hourly GHI of MERRA-2, ERA5, CAMS and SARAH-2 ranged from 8% to 127% under all-sky conditions. Our results are consistent with previous studies that found satellite data to perform better than reanalysis data in estimating GHI [4,21,66,67]. The statistical metrics of the datasets under different atmospheric conditions are summarized in Table 3.

Fig. 7 shows the Taylor diagram and the cumulative distribution of the hourly GHI under different sky conditions. The Taylor diagram displays the correlation coefficient, the centered RMSE and the normalized standard deviation of each dataset relative to the observations. A dataset performs well when it lies close to the observation point, while a dataset with large differences lies far from it. From the Taylor diagram, it is clear that SARAH-2 and CAMS exhibit the best performance in estimating the hourly GHI under different atmospheric conditions over the area (Fig. 7 a-c). However, the satellite and reanalysis data exhibit poor performance, and each source is clustered, under cloudy-sky conditions. Moreover, both satellite and reanalysis data miss the shape of the observed distribution and overestimate the hourly values (Fig. 7 d). This shows how difficult it is to mimic the spatio-temporal variation of cloud properties with reanalysis and satellite data. In clear skies, ERA5, MERRA-2 and CAMS are clustered, with a slightly higher centered root-mean-square value for MERRA-2 (0.6 W/m²) than for the SARAH-2 dataset (about 0.4 W/m²). All datasets are able to capture the pattern of the observations, but MERRA-2 shows a slight underestimation for values of 400-800 W/m², although it agrees with the observations under all-sky conditions (Fig. 7 e-f). Under all-sky conditions, ERA5, CAMS and SARAH-2 slightly overestimate the observed values of 400-800 W/m².
To assess how well the different datasets capture the maximum observed GHI, we used the Kolmogorov-Smirnov (KS) Integral metric. This metric measures the maximum vertical distance between two CDFs. The KS metric ranges between 0 and 1, where 0 indicates that the CDFs are identical. Table 4 displays the significant KS values at a 95% confidence level for the different datasets under various sky conditions. Compared to the satellite data, the reanalysis data show high KS values under cloudy conditions. In other words, the satellite data capture the maximum observed GHI with a lower bias than the reanalysis data. Conversely, the reanalysis data exhibit a lower bias in capturing the maximum observed GHI than the satellite data under clear skies. Overall, our analysis revealed that ERA5 (KS = 0.088) and MERRA-2 (KS = 0.036) show a low bias in capturing the maximum observed GHI, whereas SARAH-2 (KS = 0.142) and CAMS (KS = 0.104) exhibit a higher bias under all-sky conditions.
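The maximum vertical distance between two empirical CDFs, as described above, can be computed as in the following sketch. It covers only the distance itself and not the significance test at the 95% confidence level; the function name is an assumption.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum vertical distance between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    points = sorted(set(a) | set(b))  # the ECDFs only change at observed values
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


distance = ks_statistic([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
```

A value of 0 is returned for identical samples and 1 for samples whose ranges do not overlap, matching the 0 to 1 range stated above.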
To better understand the poor performance of the different datasets under cloudy skies, Fig. 8 shows a density plot of GHI for cloudy skies during the Harmattan period (DJF) and the rainy season (JJA) over the region. In general, all datasets perform better in the rainy season than in the Harmattan period. In the Harmattan period, the nRMSE value reaches 20-50% of the RMSE values in the rainy season. During the Harmattan period, trade winds transport large amounts of mineral dust from the Chad Basin to the Sahel and the Guinean coast [68]. The effect of aerosols could explain the large RMSE and MAE found over the region under cloudy skies. The effects of aerosols as a source of large uncertainties in the estimation of GHI are well known in the literature [60,69,70]. Among the datasets, MERRA-2 shows the lowest RMSE (331 W/m²) and MAE (263 W/m²) during the Harmattan period. The relatively better performance of MERRA-2 in DJF (Harmattan period) is also seen under all skies (Fig. 8). The AOD inputs to MERRA-2 have a spatial resolution of 1.1 km and a temporal resolution of 1 h. This suggests that a high spatial and temporal resolution of the AOD could improve the estimated hourly GHI over the region. However, the observed large deviation suggests that the reanalysis and satellite data did not correctly estimate the hourly GHI during the dust period. This result is consistent with [24,71]. During the rainy season under cloudy skies (Fig. 7 e-f), CAMS shows the lowest RMSE (171 W/m²), while MERRA-2 gives the highest value (270 W/m²). The good performance of SARAH-2 and CAMS under cloudy skies could be a consequence of their performance during the rainy season. This can be confirmed in Fig. 9 (i-l), where both datasets show good performance under all skies compared to MERRA-2 and ERA5. In the MAM (Fig. 9 e-h) and SON (Fig. 9 m-p) seasons, the satellite data also outperform the reanalysis data.

Spatial distribution of the nRMSE
Fig. 10 depicts the spatial distribution of the nRMSE over the area for different sky conditions. For a given sky condition, the nRMSE decreases from south to north, i.e., high nRMSEs are found in the Guinea zone and low nRMSEs in the Sahel zone. The Sahel zone is known for its low cloud cover, while the Guinea zone has frequent clouds and higher humidity throughout the year. This result agrees with earlier findings that reanalysis and satellite data show a large bias in the GHI estimates for cloudy regions [21,72]. Under cloudy skies, most stations have a high nRMSE in the range of 80-120%. This large bias in cloudy regions could be due to the 3D effect of clouds leading to overshoots, a feature that becomes important in the case of patchy cumulus clouds, especially if the clouds have a large height. In particular, the angle at which the satellite views each pixel could be a relevant factor in this respect. Clouds are 3D structures, and how they reflect, absorb and scatter light as seen by the satellite can depend on the angle from which it observes them [73]. On the other hand, most AWSs show low nRMSE values under clear-sky and all-sky conditions. The nRMSE values under clear-sky conditions are better than those under all-sky conditions. The majority of the stations show good agreement with the SARAH-2 and CAMS datasets, while ERA5 and MERRA-2 show relatively poor performance under different atmospheric conditions. ERA5 has the highest nRMSE at most of the stations under cloudy conditions. The high biases in the ERA5 dataset could be due to an overestimation or underestimation of cloud properties, as reported in some studies [4,72]. However, good performance of ERA5 has been demonstrated in some regions [66,74,75]. The discrepancy in the ERA5 performance in the studied area under cloudy conditions could be due to the low number of weather stations in the region available for the ERA5 reanalysis assimilation and/or the representation of cloud properties in the dataset, as the region is located within the Intertropical Convergence Zone (ITCZ). In the region, low-level clouds are common, and it is well known that reanalysis and climate models represent them poorly [76].

Cloudy-sky conditions
The average diurnal variation of the measured and estimated values of GHI for 15 selected stations within the three climate zones under cloudy skies is shown in Fig. 11. It can be observed that the Guinea zone experiences a greater number of cloudy days than the Sahel zone. All datasets are able to reproduce the pattern of the observed GHI but overestimate the average diurnal variation. The overestimation occurs mainly at midday for all datasets and also in the early morning and late afternoon for some of them. The overestimation in the early morning could be related to cloud cover, as there is stratus in the morning, especially on the Guinea coast [77]. A minimum of convective activity occurs over the climate zones around noon, and the maximum occurs in the late afternoon (~17:00 local time), mainly at latitudes below 9°N (Guinea zone and some parts of the Savannah zone), and around 20:00 above 9°N (some parts of the Savannah zone and the Sahel zone) [77]. In the Savannah and Sahel zones, all datasets are able to mimic the late afternoon observation well. In addition, these overestimates of the diurnal GHI pattern could also be due to the suspension of dust particles, especially during the DJF season, when the reanalysis and satellite data struggle to estimate GHI (see Fig. 8 a-d). However, the satellite data show less bias than the reanalysis data in estimating the maximum observed GHI. This is consistent with the results of Table 3. Overall, the reanalysis and satellite data show how difficult it is to reproduce the average daily variations of the selected stations under cloudy skies.
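The average diurnal variation compared in Fig. 11 can be thought of as the mean GHI for each hour of the day over the selected days. The following Python sketch illustrates this under stated assumptions: the record layout (hour of day, GHI) and the function names `mean_diurnal_cycle` and `hourly_bias` are hypothetical, and the selection of days by sky condition is not reproduced.

```python
from collections import defaultdict


def mean_diurnal_cycle(records):
    """Average GHI per hour of day.

    records: iterable of (hour_of_day, ghi) pairs; ghi may be None when missing.
    Returns a dict {hour_of_day: mean GHI}, with hours in ascending order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, ghi in records:
        if ghi is None:  # skip missing hourly values
            continue
        sums[hour] += ghi
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(counts)}


def hourly_bias(obs_cycle, est_cycle):
    """Estimated minus observed mean GHI for the hours present in both cycles."""
    return {h: est_cycle[h] - obs_cycle[h] for h in obs_cycle if h in est_cycle}


# Hypothetical records for two cloudy days (hour, GHI in W/m2)
observed = [(11, 420.0), (12, 510.0), (13, 470.0), (11, 380.0), (12, None), (13, 450.0)]
estimated = [(11, 480.0), (12, 590.0), (13, 520.0), (11, 450.0), (12, 600.0), (13, 530.0)]
print(hourly_bias(mean_diurnal_cycle(observed), mean_diurnal_cycle(estimated)))
```

A positive value at a given hour corresponds to the overestimation discussed above for that part of the day.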

Clear-sky and all-sky conditions
Figs. 12 and 13 display the aggregate diurnal variations of GHI from the observation and the datasets under clear-sky and all-sky conditions, respectively. Unlike under cloudy skies, most of the datasets reproduce the pattern of the measured GHI well at most stations under clear and all skies. The number of clear-sky days increases towards the north. In the Guinea zone, ERA5 and MERRA-2 generally underestimate the maximum of the observation, while SARAH-2 and CAMS are able to capture the maximum under clear skies. In the Savannah and Sahel zones, most datasets also capture the maximum GHI, whereas SARAH-2 and CAMS slightly overestimate it. Similarly, under all skies, SARAH-2 and CAMS slightly overestimate the maximum GHI. This agrees with the KS values previously mentioned (see Table 3) for both clear and all-sky conditions. In general, most datasets overestimate the maximum GHI under all-sky conditions in all climate zones, especially in the Guinea zone. This could be the result of an overestimation of the average diurnal variation of GHI under cloudy and/or overcast skies (Kt < 0.2, not shown in this study).

Overall performance over different stations
The use of GHI derived from reanalysis and satellite data to assess and monitor solar energy is widespread. However, selecting the best product can be a difficult task. Here we present a new overall performance (OP) measure based on the nRMSE, correlation, and IOA (see Eq. (12)) to select the best product for the area. The corresponding statistical metrics (nRMSE, nMAE, R, IOA) for each station are given in the Appendix (see Figs. 22-24). Fig. 14 shows the OP of the different AWSs under various sky conditions. Under cloudy-sky conditions, all the datasets show a negative value, with a maximum of −1.5 at some stations. This means that the datasets are significantly far from the observations. However, SARAH-2 and CAMS show the lowest OP values compared to those of ERA5 and MERRA-2 at most stations. Some stations, such as Oualem, Nebou, Doninga, and Manga, show good OP for the CAMS and SARAH-2 datasets, with a high positive value especially in Nebou. The OP value is about 0.5, which means that CAMS and SARAH-2 are consistent with the observations. To verify this, Fig. 15 shows the average diurnal variation at these four stations under cloudy conditions. We can clearly see that at Nebou, Oualem, Doninga and Manga, which show a high OP value for SARAH-2 and CAMS, these two datasets are closer to the average diurnal variation of the measured GHI than ERA5 and MERRA-2. We also plotted the average diurnal variation of GHI for stations showing a high negative OP (Ada, Akue, Jirapa, and Dedougou), as shown in Fig. 16. There, the average diurnal variations of all datasets are far from the observations. The results confirm that an overall performance indicator is a good choice for selecting datasets for the estimation of GHI. The satellite data, however, show the best performance at most stations under cloudy conditions.
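For orientation, the individual metrics that enter the overall performance can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name `error_metrics` is hypothetical, the nRMSE and nMAE are assumed to be normalized by the mean observed GHI, and the IOA is assumed to follow the original Willmott formulation. The combination of these metrics into OP is defined in Eq. (12) and is not reproduced here.

```python
import math


def error_metrics(obs, est):
    """Return nRMSE (%), nMAE (%), Pearson R and IOA for paired hourly GHI."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    sq_err = sum((e - o) ** 2 for o, e in zip(obs, est))
    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(e - o) for o, e in zip(obs, est)) / n
    # Pearson correlation coefficient
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    r = cov / math.sqrt(var_o * var_e)
    # Assumption: Willmott index of agreement, bounded between 0 and 1
    potential = sum((abs(e - mean_o) + abs(o - mean_o)) ** 2 for o, e in zip(obs, est))
    ioa = 1.0 - sq_err / potential
    return {
        # Assumption: normalization by the mean of the observations
        "nRMSE": 100.0 * rmse / mean_o,
        "nMAE": 100.0 * mae / mean_o,
        "R": r,
        "IOA": ioa,
    }


# Hypothetical hourly GHI values in W/m2, for illustration only
observed = [100.0, 300.0, 500.0, 700.0]
estimated = [150.0, 320.0, 560.0, 690.0]
print(error_metrics(observed, estimated))
```

Computing the metrics from the same paired series keeps them directly comparable across datasets and stations before any combined score is formed.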
In both clear-sky and all-sky conditions, all stations show a positive OP value. The OP values of SARAH-2 and CAMS are higher than those of the MERRA-2 and ERA5 datasets, especially at stations that belong to the Guinea and Savannah zones. In the Sahel region, the OP values are comparable between ERA5, CAMS and SARAH-2 under clear skies at some stations. The OP value reaches about 0.7 under clear skies in Oualem, Nebou and Manga for the SARAH-2 dataset. In summary, it can be deduced from this analysis that the satellite data are better than the reanalysis data over the entire area.

Fig. 17. Performance metrics showing the normalized root-mean-square error (nRMSE), normalized mean absolute error (nMAE), correlation (R), index of agreement (IOA) and the overall performance (OP) for the hourly GHI in different climate zones and various sky conditions. Panels (a, d, g, j, m) show the performance of different datasets under cloudy skies, while panels (b, e, h, k, n) show that for clear skies. The performance under all skies is depicted in panels (c, f, i, l, o).
We also examined the performance of the different datasets at different stations and in different seasons, considering different atmospheric conditions. A more detailed analysis can be found in Figs. 25-27 in the Appendix. During the DJF season, when the sky is cloudy, we observed the highest uncertainties at each station. Most datasets showed similar values, but the MERRA-2 dataset showed relatively better results. In contrast, the satellite data performed better than the reanalysis data during the rainy season, which is consistent with the results shown in Fig. 9. Under clear skies, the datasets showed relatively low nRMSE values at each station throughout the year. However, during the JJA season we noted high nRMSE values at some stations, reaching up to 45%. This indicates larger uncertainties during this period. These results are consistent under all-sky conditions. Both the satellite and reanalysis data showed higher nRMSE values during the JJA season than in other seasons. Nevertheless, the satellite data outperformed the reanalysis data at each station overall.

Overall performance over the climate zones
Fig. 17 shows the performance metrics of the different datasets in the different climate zones for hourly GHI. The values were obtained by aggregating the stations in each climate zone. Under the different sky conditions, the Guinea zone has high nRMSE and nMAE values with low correlation and IOA. In the Guinea and Savannah zones, the nRMSE and nMAE values are comparable under cloudy skies. The satellite-derived data outperform the reanalysis data in the Sahel, with low nRMSE (~25%) and nMAE (~20%) under cloudy skies. Under cloudy skies, all the zones show a negative OP value; the CAMS and SARAH-2 datasets show the lowest value compared to that of the two reanalysis datasets. All climate zones exhibit a positive value for clear skies and all skies, with SARAH-2 and CAMS showing a higher value. ERA5 also performs well for clear skies in all climate zones. When estimating the hourly GHIs, the satellite data outperform the reanalysis data under all-sky conditions in all climatic zones.
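One possible way to aggregate stations per climate zone is to pool all observation-estimate pairs of a zone before computing a metric. The Python sketch below illustrates this for the nRMSE. It is an assumption, not the documented procedure: the text does not state whether the pairs are pooled or the station metrics are averaged, and the function name `nrmse_by_zone`, the station identifiers and the zone assignment are hypothetical.

```python
import math
from collections import defaultdict


def nrmse_by_zone(station_pairs, station_zone):
    """nRMSE (%) per climate zone from pooled (observed, estimated) pairs.

    station_pairs: dict {station: [(obs, est), ...]}
    station_zone:  dict {station: zone name}
    """
    pooled = defaultdict(list)
    for station, pairs in station_pairs.items():
        pooled[station_zone[station]].extend(pairs)
    result = {}
    for zone, pairs in pooled.items():
        mean_obs = sum(o for o, _ in pairs) / len(pairs)
        rmse = math.sqrt(sum((e - o) ** 2 for o, e in pairs) / len(pairs))
        # Assumption: normalization by the mean observed GHI of the zone
        result[zone] = 100.0 * rmse / mean_obs
    return result


# Hypothetical stations and values (W/m2), for illustration only
pairs = {
    "station_1": [(100.0, 110.0), (300.0, 310.0)],
    "station_2": [(200.0, 190.0), (400.0, 410.0)],
    "station_3": [(500.0, 500.0)],
}
zones = {"station_1": "Guinea", "station_2": "Guinea", "station_3": "Sahel"}
print(nrmse_by_zone(pairs, zones))
```

Pooling weights each station by its number of valid hours; averaging per-station metrics would instead weight every station equally, so the two choices can give different zone values.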

Conclusion
The aim of this study was to validate four state-of-the-art satellite and reanalysis datasets (CAMS, SARAH-2, ERA5, and MERRA-2) using hourly GHI data from ANAM, WASCAL and GMet for the year 2020. To ensure the accuracy of the data, the ground-based measurements were subjected to strict quality control; only 37 out of 51 stations were finally used as reference stations for the analysis. The evaluation was conducted under different weather conditions, including cloudy skies, clear skies and all skies, using a new overall performance measure to identify the best product for the region, along with other criteria. In addition, the study examined the relationship between aerosols, clouds, and radiation during the Harmattan period and the rainy season. The results of the study can be summarized as follows:
• For the combined 37 stations, the hourly GHI values derived from satellite and reanalysis data perform better in an area with cloud-free conditions than in a cloudy region in terms of the RMSE and MAE metrics.
• Both satellite-based hourly GHI estimates perform better in cloudy conditions than the reanalysis data.
• MERRA-2 outperforms SARAH-2, ERA5 and CAMS in estimating hourly GHI during the Harmattan period (DJF season), while SARAH-2 performs best during the rainy season (JJA) under cloudy skies.
• Most datasets capture the average diurnal variation in measured GHI under clear and all skies, while overestimating it under cloudy skies.
• The ERA5 reanalysis also shows good performance in estimating hourly GHI under clear-sky conditions.
• The overall performance measure shows that the SARAH-2 and CAMS data outperform the ERA5 and MERRA-2 data in all climate zones of the region and under different atmospheric conditions.
The results of this study showed that the satellite data from SARAH-2 and CAMS perform well in estimating hourly GHI over the study area and may serve as a viable alternative to ground-based measurements for assessing solar energy in West Africa. However, the data showed significant biases, especially during the Harmattan period, when dust is more prevalent in the region. Future research should focus on exploring the spatial and temporal resolution of the AOD data from SARAH-2 and CAMS. On the other hand, the atmospheric reanalysis datasets used in this study performed poorly under cloudy conditions compared to the satellite data. It is important to note that the use of a one-year dataset could limit the generality of the conclusions on reanalysis and satellite data in the region. Regarding the poor performance of the reanalysis data, we hypothesize that the parameterization of the convective scheme and the interaction between radiation and aerosols in global circulation models need to be improved to better capture the specific features of the monsoon, such as squall lines, in this challenging region [78]. In addition to the evaluation of the GHI products, the novel AWS networks with their sub-hourly GHI measurements enable many other important applications, such as the evaluation of regional climate models, as shown for the Weather Research and Forecasting (WRF) model [79-81]. The data can also be used for statistical refinement of the satellite and reanalysis products to remove biases and to perform spatio-temporal disaggregation of the satellite products to better meet the needs of local applications. In addition, the high-resolution measurements of the novel networks could also improve the reconstruction of weather conditions on the ground and lead to better GHI estimates over West Africa, if this information is directly incorporated into the atmospheric models that produce reanalysis products. Thus, there are many opportunities to further improve GHI data products for solar energy applications that need to be explored in future studies for West Africa. This will enable better planning and design of PV systems and directly contribute to better meeting the rapidly increasing demand for sustainable electricity in Africa.

Funding
This research is part of the project EnerSHelF, which is funded by the German Federal Ministry of Education and Research as part of the CLIENT II program. Funding reference number: 03SF0567D. It is additionally funded partly by the WASCAL CONCERT project (funding reference number 01LG2089A, BMBF).

Table 5
Number of data points that are above the physically possible limit and the extremely rare limit for different stations.

Fig. 1. Study area showing the topography of the region. The dots mark the locations of the automatic weather stations (AWSs). The AWSs shown as red and black dots are owned by GMet and ANAM, respectively. The blue dots indicate WASCAL's AWSs, and the orange dots indicate stations jointly operated by WASCAL and GMet. The red dashed lines delineate the different climatic zones. Each number corresponds to a station in Table 1.

Fig. 2. Flowchart of the quality control of the ground-based measurements used in this study.

Fig. 4. Quality control of the 38 weather stations based on the Baseline Surface Radiation Network (BSRN).The measured hourly GHI are represented in blue dots.The red dots indicate the physically possible limit, while the extremely rare limit is in green dots.

Fig. 3. Heatmap showing the missing values spread over the whole year for all the radiometric stations. Each vertical black line indicates a missing hourly value. The total number of missing hours and the corresponding percentage are given on the right side.

Fig. 6. Density plot of hourly GHI values from different datasets (CAMS, ERA5, SARAH-2, and MERRA-2) against observations for 37 stations using Gaussian kernels with normalized values of 0-1 for different sky conditions. RMSE, R, IOA, and MAE denote the root-mean-square error, the Pearson correlation, the index of agreement, and the mean absolute error, respectively, while nRMSE and nMAE denote the normalized RMSE and normalized MAE, respectively.

Fig. 5. Boxplot of the daily clearness index (Kt) of the different AWSs for the year 2020.The red dots indicate the upper outlier limit, while the green dots indicate the lower outlier limit of the individual stations.The number indicates the percentage of data points that fall outside the upper and lower outlier limits.

Fig. 7. Panels (a-c) show the Taylor diagrams of the different datasets under clear, cloudy and all-sky conditions for the 37 stations. The dashed grey circles indicate the centered root-mean-square error. Panels (d-f) present the cumulative distribution functions for the 37 stations under different atmospheric conditions.
resolution of the radiation measurements is an average of 15 min. Due to the number of weather stations across the country, maintenance is done twice a year. The WASCAL-GMet stations belong to a transboundary climate observation network established under the WASCAL programme for different West African countries [6]. Six AWSs were donated by WASCAL through funding from the German Federal Ministry of Education and Research (BMBF) and were installed by a joint team from WASCAL and GMet in December 2017, after the signing of a Memorandum of Understanding (MoU) on data sharing and services development. The installed stations were handed over to GMet, which manages their maintenance. Measurements are recorded on average every 10 min.

Fig. 8. Similar to Fig. 6 but for cloudy days occurring during the Harmattan period (DJF) and the rainy season (JJA).

Fig. 9. Similar to Fig. 6 but for all-sky conditions for different seasons.
2.3. Methodology
2.3.1. Quality control
Fig. 2 outlines a comprehensive process for quality control of the individual weather stations. This process includes visualization of the data, various tests and techniques, identification of unrealistic values and removal of outliers to ensure data quality. These steps help improve the integrity and reliability of the station data used in this study.
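The identification of unrealistic values can be illustrated with the two BSRN limits referred to in Fig. 4. The Python sketch below assumes the bounds for global irradiance from the BSRN recommended quality-control tests ("physically possible" and "extremely rare" limits). The solar constant value, the function names `bsrn_ghi_limits` and `flag_ghi`, and the flag labels are illustrative assumptions, and the other steps of Fig. 2 are not reproduced.

```python
SOLAR_CONSTANT = 1361.0  # W/m2 at 1 AU; assumed value


def bsrn_ghi_limits(cos_sza, earth_sun_au=1.0):
    """Lower and upper GHI bounds (W/m2) for the two assumed BSRN tests.

    cos_sza: cosine of the solar zenith angle (values below 0 are treated as 0)
    earth_sun_au: Earth-Sun distance in astronomical units
    """
    mu0 = max(cos_sza, 0.0)
    sa = SOLAR_CONSTANT / earth_sun_au ** 2
    return {
        "physically_possible": (-4.0, 1.5 * sa * mu0 ** 1.2 + 100.0),
        "extremely_rare": (-2.0, 1.2 * sa * mu0 ** 1.2 + 50.0),
    }


def flag_ghi(ghi, cos_sza, earth_sun_au=1.0):
    """Classify one GHI value against the two limits."""
    limits = bsrn_ghi_limits(cos_sza, earth_sun_au)
    low, high = limits["physically_possible"]
    if not low <= ghi <= high:
        return "not_physically_possible"
    low, high = limits["extremely_rare"]
    if not low <= ghi <= high:
        return "extremely_rare"
    return "ok"


# Hypothetical hourly values: (GHI in W/m2, cosine of solar zenith angle)
for ghi, mu0 in [(820.0, 0.9), (1550.0, 0.9), (90.0, 0.0)]:
    print(ghi, mu0, flag_ghi(ghi, mu0))
```

Values outside the physically possible range would normally be discarded, while values that only exceed the extremely rare limit are candidates for closer inspection.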

Fig. 10. Normalized root-mean-square error (nRMSE) for hourly GHI at each AWS for cloudy-sky, clear-sky, and all-sky conditions and different datasets. The color of each point indicates the nRMSE value given by the color bar.

Fig. 11. Average diurnal variation of the observed GHI compared with the CAMS, ERA5, SARAH-2, and MERRA-2 datasets for the selected stations under cloudy skies. Nb_days is the number of days that fall under clear-sky conditions. The grey shaded curve indicates the 95% confidence interval of the measurement.

Fig. 14. Overall performance of hourly GHI for different AWSs under cloudy-sky (a), clear-sky (b) and all-sky (c) conditions.

Fig. 15. Average diurnal variation of the observed GHI compared with the CAMS, ERA5, SARAH-2, and MERRA-2 datasets at stations with a high positive overall performance (OP) under cloudy skies. Nb_days is the number of days that fall under clear-sky conditions. The grey shaded curve indicates the 95% confidence interval of the average diurnal cycle.

Fig. 18. Same as Fig. 17, but within the Savannah zone.

Fig. 20.Bar plot showing the number of clear-sky and cloudy-sky days for different stations.

W. Sawadogo et al.

Fig. 21. Density plot of the daily clearness index (Kt) from different datasets (CAMS, ERA5, SARAH-2, and MERRA-2) against observations for 37 stations using Gaussian kernels with normalized values of 0-1 for clear and cloudy skies. The dashed grey line shows the 1:1 line. R indicates the Pearson correlation.

Fig. 22. Performance metrics of different datasets at different weather stations under cloudy skies.Panel (a) shows the normalized root-mean-square-error (nRMSE); panel (b) indicates the normalized mean absolute error (nMAE); panel (c) shows the correlation, and panel (d) displays the Index of Agreement (IOA).

Fig. 25. Performance metrics of different datasets in different seasons under cloudy skies. The numbers in the heat map show the number of cloudy days that occur in a given season at a given station. Empty areas indicate the absence of cloudy days.

Table 2
Characteristics of different satellite and reanalysis datasets used in this study.

Table 3
Error metrics of the different datasets for GHI under different atmospheric conditions, aggregated over the 37 stations. Bold numbers show the best metric values.
Station | Number of data points outside the BSRN range | Percentage of data points outside the BSRN range