A review and comparison of surface incident shortwave radiation from multiple data sources: satellite retrievals, reanalysis data and GCM simulations

ABSTRACT Surface incident shortwave radiation (Rs) promotes the circulation of matter and energy, and accurate estimates of it are of great significance for climate studies. Rs can be acquired from satellite retrievals, reanalysis predictions and general circulation model (GCM) simulations. Although Rs estimates have been evaluated and compared in previous studies, most of them focus on specific regions and use ground measurements from a limited number of stations. Therefore, it is essential to comprehensively validate Rs estimates from multiple data sources. In this study, ground measurements at 690 stations from BSRN, GEBA, CMA, GC-NET and buoys were employed to validate the Rs estimates from seven representative products (GLASS, GEWEX-SRB, CERES-EBAF, ERA5, MERRA2, CFSR and CMIP6). The validation results indicated that the selected products overestimated Rs globally, with biases ranging from 0.48 to 21.27 W/m2. Among the seven datasets, the satellite retrievals showed relatively better accuracy against ground measurements at the selected stations. Moreover, all seven products had poor accuracy in high-latitude regions, with RMSEs greater than 50 W/m2. The long-term variation trends were also analyzed in this study.


Introduction
As an indispensable energy source received at the surface, surface incident shortwave radiation (Rs) contributes to the balance of the global energy budget (Liang et al. 2019; Wild et al. 2012). It also drives the processes of the biosphere, hydrosphere and geochemical cycle by promoting the exchange of energy, water and other quantities between the surface and the atmosphere (Letu et al. 2020; Zhou et al. 2018). Thus, precise estimation of Rs is of vital importance for related research and applications, such as ecosystem primary productivity (Sakamoto et al. 2011), the hydrological cycle (Huang, Li, Ma, et al. 2016), solar photovoltaic applications (Cai et al. 2021; Mellit and Kalogirou 2008; Qiu et al. 2022), and others (Yang et al. 2022).
So far, approaches for obtaining Rs include ground-based measurements, simulations from GCMs, reanalysis datasets, and retrievals from satellite observations (Baldocchi et al. 2001; Decker et al. 2012; Hou et al. 2020; Ma and Pinker 2012; Wild 2008; Zhang et al. 2014; Zhang et al. 2019; Zhang et al. 2020). Stations from various networks can provide accurate Rs measurements at high temporal resolution. However, ground observations are sparsely distributed, and records may be interrupted or missing owing to technical problems and other causes. Therefore, they cannot meet the need for spatiotemporally continuous data on regional scales, let alone on global scales (Yan et al. 2011; Zhang et al. 2015b). Thus, ground observations are usually used for evaluating the Rs estimates from other data sources.
GCMs are indispensable for examining and analyzing past and future climate change. The Coupled Model Intercomparison Project (CMIP) organizes the design and distribution of GCM simulations, and the Rs estimates are provided as long time series (Eyring et al. 2019). Over the past decades, global Rs simulations from CMIP3, CMIP5 and CMIP6 GCMs have been successively developed, but uncertainties arise from a number of sources in designing GCMs (such as solar spectral irradiance variability, observational estimation and model formulation), which may result in uncertainty in the Rs simulations (Hood et al. 2015). Comparisons between different generations of CMIP Rs estimates and validations against observations at ground stations have been carried out to assess the performance of GCMs (Allen, Norris, and Wild 2013; Jiao et al. 2022; Li et al. 2013; Ma, Wang, and Wild 2015; Mackie et al. 2020; Wild 2008; Wild 2017; Wild et al. 2012; Wild et al. 2014). As reported by Song, Chung, and Shahid (2022), the performance of CMIP3, CMIP5 and CMIP6 GCM simulations has improved continuously. The mean biases of global average CMIP5 Rs estimates were reduced by about 32% in comparison with CMIP3 (Li et al. 2013), and the CMIP6 GCMs were remarkably improved in spatial resolution, representation of physical parameters and supplementary earth system processes compared to past generations (Eyring et al. 2019). However, they are still inclined to overestimate Rs. The CMIP3 GCMs clearly overestimated Rs by about 6 W/m2 (Wild 2008); later, the Rs estimates of some individual CMIP5 GCMs were found to deviate from the best estimate of CERES-EBAF by up to 6 W/m2, and the majority of models tended to overestimate Rs compared to ground measurements (Wild et al. 2012; Wild et al. 2014). A similar overestimation tendency in CMIP6 Rs estimates was found by Jiao et al. (2022). These evaluation results further indicate that overestimation of Rs remains a common issue in the latest CMIP6 Rs estimates. In addition, studies have also shown that the performance of CMIP Rs estimates differs among regions, such as land, ocean and other divided areas (Li et al. 2013; Zou et al. 2019).
Besides GCMs, reanalysis datasets are also feasible sources of Rs. A reanalysis merges available measurements (which provide the best representation of the state of the atmosphere in which they are taken and constrain the model calculation) with an atmospheric geophysical fluid-dynamical model to optimize the forecasts of variables (Betts et al. 2006; Zhao, Lee, and Liu 2013). The distinctive characteristic of reanalysis data is that they combine high temporal resolution with long-term, complete, global coverage, which is beneficial for long-term analysis, though their spatial resolution is relatively coarse (Huang et al. 2019; Jiang et al. 2019). ERA-Interim, ECMWF Reanalysis v5 (ERA5), Modern-Era Retrospective Analysis for Research and Applications (MERRA), MERRA-2, Climate Forecast System Reanalysis (CFSR), the Japanese 55-year Reanalysis (JRA-55) and the Global Land Data Assimilation System (GLDAS) are all widely used reanalysis data. As with the GCM Rs estimates, the Rs predictions from these reanalyses have also been evaluated using ground-based observations from different networks (Decker et al. 2012; Feng and Wang 2018; Jia et al. 2013; Ladd 2002; Wang, Wang, and Xue 2021b; Zhang et al. 2016). The quality of a reanalysis depends largely on the key variables in the model, especially cloud and aerosol (Decker et al. 2012; Yu et al. 2021). This may lead to deficiencies in reanalysis Rs estimates, and such deviations have been reported in many studies (Feng and Wang 2019; Kennedy et al. 2012; Xia et al. 2006). The Rs data from the National Center for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) were found to be higher than buoy observations by 70-80 W/m2 at two sites in the Bering Sea, and overestimated Rs by 20 W/m2 at one site in the northeast Pacific (Ladd 2002). On account of the underestimation of cloud fraction, all five reanalysis predictions (ERAI, JRA-55, CFSR, MERRA and MERRA-2) overestimated the multiyear mean Rs by 24.1-40 W/m2 over China (Feng and Wang 2019). Kennedy et al. (2012) used the Barrow and Ny-Alesund surface stations in the Arctic to assess Rs from MERRA2, CFSR, NOAA's Twentieth Century Reanalysis Project (20CR), NCEP-Department of Energy (DOE)'s Reanalysis II (R2) and ERA-Interim, and found that the monthly Rs of R2 showed a considerable bias of over 90 W/m2 in June. The evaluation of five reanalysis products over the Qinghai-Tibet Plateau, Antarctic and Arctic also showed that the performance of reanalysis Rs predictions differs among regions (Wang, Wang, and Xue 2021).
As satellite and remote sensing technology advances, satellite retrievals have grown into one of the most significant methods for estimating Rs (Babar, Graversen, and Boström 2019; Hou et al. 2020). The ability to capture the spatial distribution and dynamic evolution of elements such as clouds allows satellites to derive regional or global Rs on the basis of physical or statistical models (Huang et al. 2019; Yu et al. 2021). Over the past few decades, a number of remotely sensed Rs products have been generated with various retrieval methods and evaluated, including the International Satellite Cloud Climatology Project-Flux Data (ISCCP-FD), the Clouds and Earth's Radiant Energy System (CERES)-Synoptic Radiative Fluxes and Clouds (SYN) and Energy Balanced and Filled (EBAF), the Moderate Resolution Imaging Spectroradiometer (MODIS) level-3 products (MOD18, MYD18 and MCD18), the Global Energy and Water Cycle Experiments-Surface Radiation Budget (GEWEX-SRB) and the Global LAnd Surface Satellite (GLASS) products (Gui et al. 2010; Huang et al. 2013; Huang et al. 2016a; Jia et al. 2013; Li, Wang, and Liang 2021; Sun et al. 2018; Tang et al. 2016; Tang et al. 2021; Wang et al. 2021a; Xia et al. 2006; Yan et al. 2011; Yu, Wang, and Shi 2018; Zhang et al. 2015a; Zhang et al. 2019). Though some satellite-based Rs products were generally consistent with the observations and spatial patterns of ground stations, uncertainties still exist in certain regions (Gui et al. 2010; Zhang, Liang, et al. 2015). Sun et al. (2018) concluded that the GEWEX-SRB, ISCCP-FD and CERES-SYN Rs were less accurate in the Arctic, given that the root mean square errors (RMSEs) and mean absolute errors (MAEs) were far over 20 W/m2 at most of the stations. Similarly, a comparison of five satellite retrievals (BESS, MCD18A1, GLASS, CLARA-A2 and CERES-SYN) showed that the accuracy of these Rs products was about 7 W/m2 lower over high-latitude regions (Li, Wang, and Liang 2021). Besides, researchers also found that some satellite retrievals tended to overestimate Rs in China, and the mean bias over southern China increased to 17-29 W/m2, about two or three times higher than that in northern areas (Jia et al. 2013; Xia et al. 2006). The performance of satellite retrievals depends on the theory behind the retrieval methods and on the state of the surface and atmosphere (Wang, Wang, and Xue 2021); thus, results differ with the Rs products and regions chosen for study.
The global Rs products that have been developed and released provide a foundation for relevant large-scale studies, but they still require extensive assessment before application. Previous evaluations mostly focused on a single type of data source, or were limited in region and in the number of validation stations. A systematic review and comparison of typical Rs datasets covering satellite retrievals, reanalysis data and GCM simulations is still lacking. This study conducted a comprehensive verification and comparison of diverse and representative Rs datasets using as many stations as possible. The Rs estimates from seven representative global products, including three derived from satellites (GLASS, GEWEX-SRB and CERES-EBAF), three reanalysis predictions (ERA5, MERRA2 and CFSR) and CMIP6 GCM simulations, were evaluated using Rs measurements collected at a total of 690 stations from five independent ground measurement networks (the Baseline Surface Radiation Network [BSRN], the Global Energy Balance Archive [GEBA], the China Meteorological Administration [CMA], buoys and the Greenland Climate Network [GC-NET]). Additionally, these products were further compared in terms of spatial distributions, annual means and long-term variation trends of Rs to assess their disparities and consistencies. The remainder of this paper is organized as follows: the Rs data we used from ground measurements, satellite retrievals, reanalysis predictions and CMIP6 GCM simulations are briefly described in Section 2. Section 3 presents the evaluation results based on ground measurements and the corresponding analysis of the spatial distribution, annual means and long-term trends of Rs. Finally, the summary and conclusions are given in Section 4.

Data
The twelve datasets used in this study include ground measurements, satellite retrievals, reanalysis and GCM simulations. Table 1 summarizes the detailed information of the selected Rs products from different data sources.

Ground measurements
The station observations we used to assess the Rs estimates were derived from five data sources: BSRN, GEBA, CMA, buoys and GC-NET.
The GEBA is a database that centrally records energy fluxes measured at the surface on a global scale, and is developed and preserved at ETH Zurich (Gilgen and Ohmura 1999; Gilgen, Wild, and Ohmura 1998). Rs is the most widely measured component in GEBA; it is measured by pyranometers and has been widely used for the evaluation of Rs estimates (Wild et al. 2017). The BSRN is a new global radiometric network established by the World Climate Research Programme (WCRP), which aims at providing high-quality and high-temporal-resolution ground radiation measurements (Ohmura et al. 1998). The Rs of BSRN is directly measured using pyranometers and has been converted into daily and monthly data. The China Meteorological Administration (CMA) releases daily and monthly meteorological measurements at 122 stations, and the daily radiation data have gone through spatial and temporal consistency checks as well as manual inspection and correction (Tang et al. 2010; Zhang et al. 2015a). The buoy networks located in tropical oceans include the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) (Bourlès et al. 2008), the Tropical Atmosphere Ocean/Triangle Trans-Ocean Buoy Network (TAO/TRITON) (McPhaden et al. 1998) and the Research Moored Array for African-Asian-Australian Monsoon Analysis and Prediction (RAMA) (McPhaden et al. 2009). These buoys range from 25°S to 25°N in latitude, and the corresponding data have undergone rigorous quality control procedures (Medovaya 2002). Additionally, we also used three buoys located over relatively high-latitude oceans, namely the KEO, Papa and ARC buoys. The GC-NET stations are distributed over Greenland and provide records such as radiation observations on the ice sheet from 2000 to the present (Li, Wang, and Liang 2021). These measurements in high-latitude regions have been calibrated and make it possible to study the performance of products under particular conditions (Li et al. 2022).

Satellite retrievals
The GLASS products are remote sensing inversion products with long-term temporal coverage, high precision and high spatial resolution (Liang et al. 2021). There are two GLASS Rs products derived from different data sources and methods. The GLASS-MODIS Rs was derived with a hybrid algorithm utilizing the MODIS top-of-atmosphere reflectance (Zhang et al. 2019), but covers only the global land. The GLASS-AVHRR Rs was estimated with an improved look-up table algorithm using the AVHRR top-of-atmosphere radiance and other ancillary datasets (Zhang et al. 2014). To keep the coverage consistent with the Rs products from other data sources, we chose the GLASS-AVHRR as the research data. The latest GEWEX-SRB Rs released by NASA is version 3.0, which is produced by a radiative transfer model modified with the updated shortwave algorithm from the University of Maryland (Pinker and Laszlo 1992; Wu and Fu 2011). The major inputs of GEWEX-SRB 3.0 include spectral albedo, cloud optical depth, cloud cover, temperature, moisture and other atmospheric compositions (Pinker and Laszlo 1992). The CERES-EBAF products were improved on the basis of CERES-SYN and are applied to energy budget estimation and climate evaluation (Di Biagio et al. 2021; Ham et al. 2018; Loeb et al. 2018). The Rs of CERES-EBAF was calculated using the cloud and aerosol properties derived from instruments on the A-train constellation, based on the radiative transfer model with k-distribution and correlated-k for radiation (Kato et al. 2013). Other input parameters include the humidity, temperature, ozone amounts, TOA albedo and emissivity (Zhang, Liang, et al. 2015).

Reanalysis products
The ERA5 is the fifth-generation ECMWF reanalysis and the successor of ERA-Interim; it combines a large volume of historical observations and provides various types of global reanalysis datasets, such as atmospheric and surface data, at diverse temporal and spatial resolutions (Hersbach et al. 2020). The ERA5 applied the updated Integrated Forecasting System 'cycle 41r2' and increased the amount of assimilated data based on the 12-hourly 4DVar assimilation model, with a finer spatial grid, higher temporal resolution and more vertical levels (Hersbach et al. 2020; Urraca et al. 2018). The MERRA2 released by NASA's GMAO is an improvement on MERRA, updating the observing system and overcoming the limitations in assimilating the newest sources of satellite data (Gelaro et al. 2017). The substantial upgrades of MERRA2 were the use of a new version of the GEOS-5 atmospheric model and the assimilation of aerosol data (Molod et al. 2015). The MERRA2 enhanced the accuracy of simulation and underwent more validation efforts (Jiang et al. 2015; Molod et al. 2015). The calculation and processing of Rs in MERRA2 were specifically described by Suarez and Chou (1999). The CFSR developed by the National Centers for Environmental Prediction (NCEP) is an updated dataset (Saha et al. 2010). Compared with previous reanalysis data released by NCEP, it has higher spatial resolution and makes use of guess fields (Tahir et al. 2021). The radiative transfer model in CFSR uses an advanced cloud-radiation interaction scheme, and water vapor, ozone, carbon dioxide and atmospheric aerosols have been incorporated in the model. The Rs of CFSR was parameterized using random cloud overlap according to the NASA approach (Chou et al. 1998).

CMIP6 GCMs
The CMIP6 provides numerous GCM simulations developed and maintained by different institutions globally, and the climate models are driven by a new set of scenarios. The Rs was simulated considering natural and anthropogenic forcings such as aerosol loadings and land use (Wild 2020).
The monthly mean Rs of the 52 GCMs used in this research were all selected from CMIP6, and their details are displayed in Table 2. The historical simulations were available with the ensemble member 'r1i1p1f1' for 1850-2014, and we chose the period of 1983-2014 to match the other products. Given the disparity in spatial resolution among the CMIP6 GCMs, we unified all the models to 1°×1° with the bilinear interpolation method, as in existing studies (Kim et al. 2020; Yuan et al. 2022; Zhou et al. 2022), so that subsequent analysis could be performed by averaging the individual models.
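The regridding and averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain actually used in this study: the function `regrid_bilinear`, the helper `synthetic_model`, the cell-centre target grid and the two synthetic "models" are all hypothetical, and longitude wrap-around is ignored.

```python
import numpy as np

def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D (lat, lon) field onto a new regular grid.

    On a rectilinear grid bilinear interpolation is separable, so it is done
    as two passes of 1-D linear interpolation. src_lat and src_lon must be
    increasing. Targets outside the source range are clamped to the edge
    values (np.interp behaviour); longitude wrap-around is not handled.
    """
    # Pass 1: interpolate along longitude for every source latitude row.
    tmp = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Pass 2: interpolate along latitude for every target longitude column.
    return np.array([np.interp(dst_lat, src_lat, col) for col in tmp.T]).T

# Hypothetical common 1-degree grid, given by cell centres.
dst_lat = np.arange(-89.5, 90.0, 1.0)
dst_lon = np.arange(0.5, 360.0, 1.0)

def synthetic_model(step, offset):
    """Build a stand-in 'model' field on a grid with the given spacing."""
    lat = np.arange(-90.0, 90.0 + step / 2, step)
    lon = np.arange(0.0, 360.0 + step / 2, step)
    field = 2.0 * lat[:, None] + 3.0 * lon[None, :] + offset
    return field, lat, lon

# Two synthetic models with different native resolutions (2.5 and 2 degrees).
models = [synthetic_model(2.5, 0.0), synthetic_model(2.0, 10.0)]
regridded = [regrid_bilinear(f, la, lo, dst_lat, dst_lon) for f, la, lo in models]

# Multi-model mean on the common grid.
ensemble_mean = np.mean(np.stack(regridded), axis=0)
```

Because the synthetic fields are linear in latitude and longitude, bilinear interpolation reproduces them exactly, which makes the sketch easy to check; real model output would of course not have this property.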

Results and analysis
The seven selected Rs estimates from multiple data sources were assessed using ground measurements gathered from five networks during the period of 2001-2010. Since the spatial resolution of the selected Rs products ranged from 0.05°×0.05° to 2.8°×2.8°, they were all resampled to 1°×1° using the bilinear interpolation method to ensure their consistency. All these Rs products provided monthly mean values except the GLASS Rs products. The GLASS monthly mean Rs was calculated by averaging the original daily Rs values.
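The daily-to-monthly averaging mentioned for GLASS can be illustrated with a short sketch. The function name `monthly_means` and the synthetic series are hypothetical, and skipping missing days with a NaN-aware mean is an assumption made here for illustration rather than a documented choice of this study.

```python
import numpy as np

def monthly_means(dates, daily_rs):
    """Average daily Rs values (W/m2) into calendar-month means.

    Missing days encoded as NaN are skipped (an assumption of this sketch).
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    daily_rs = np.asarray(daily_rs, dtype=float)
    months = dates.astype("datetime64[M]")      # truncate each date to its month
    unique_months = np.unique(months)
    means = np.array([np.nanmean(daily_rs[months == m]) for m in unique_months])
    return unique_months, means

# Synthetic two-month daily series: 100 W/m2 in January, 200 W/m2 in February.
dates = np.arange("2001-01-01", "2001-03-01", dtype="datetime64[D]")
daily = np.where(dates < np.datetime64("2001-02-01"), 100.0, 200.0)
daily[3] = np.nan                               # one missing day
months, means = monthly_means(dates, daily)
```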

Evaluation using ground measurements
To evaluate and analyze the seven selected representative global Rs products, measurements at a total of 690 stations (shown in Figure 1) were collected for validation, including 43 stations from BSRN, 455 stations from GEBA, 96 stations from CMA, 79 buoys and 17 stations from GC-NET.
Figure 2 presents the scatterplots of estimated Rs from the seven representative monthly products against the records at the 690 stations. The scatterplots of Rs estimates against each network are also plotted, and the statistics for the five networks are summarized in Table 3. For all networks combined, the ranges of the correlation coefficient (R), RMSE and bias were 0.93-0.97, 20.28-36.14 W/m2 and 0.48-21.27 W/m2, respectively. The GLASS and CERES-EBAF Rs performed better, with the highest R (0.96 and 0.97), lowest bias (0.48 and 5.90 W/m2) and lowest RMSE (21.11 and 20.28 W/m2). In contrast, the Rs estimates from MERRA2 and CFSR were less accurate, with RMSEs of 36.14 and 34.49 W/m2. All of these products overestimated Rs, and the overestimation was especially pronounced for MERRA2, with a bias of 21.27 W/m2.
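The three validation statistics used above can be written compactly. The sketch below uses their standard definitions and takes the bias as estimate minus observation, so that a positive bias means overestimation, which matches how the term is used in the text; the function name `validation_stats` and the sample values are hypothetical.

```python
import numpy as np

def validation_stats(estimated, observed):
    """Return (R, RMSE, bias) for paired monthly Rs values in W/m2."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    valid = np.isfinite(est) & np.isfinite(obs)   # drop pairs with missing data
    est, obs = est[valid], obs[valid]
    diff = est - obs
    bias = diff.mean()                            # mean error, > 0 is overestimation
    rmse = np.sqrt(np.mean(diff ** 2))            # root mean square error
    r = np.corrcoef(est, obs)[0, 1]               # Pearson correlation coefficient
    return r, rmse, bias

# Synthetic example: a product that is uniformly 10 W/m2 too high.
obs = np.array([100.0, 150.0, 200.0, 250.0])
est = obs + 10.0
r, rmse, bias = validation_stats(est, obs)
```

A constant offset leaves R at 1 while RMSE and bias both equal the offset, which shows why the three statistics are reported together.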
For the individual networks, the R between observed and estimated monthly Rs varied from 0.54 to 0.99, the bias from −23.54 to 36.63 W/m2 and the RMSE from 12.91 to 68.36 W/m2, which exhibited substantial differences. Among these products, the GLASS Rs had the minimum RMSE at CMA (18.89 W/m2) and GEBA (19.46 W/m2) stations, while the CERES-EBAF Rs had the minimum RMSE at BSRN (14.07 W/m2), GC-NET (55.68 W/m2) and buoy (12.91 W/m2) stations. The maximum RMSEs at BSRN, CMA and GEBA stations were 23.70, 45.99 and 32.80 W/m2, all for MERRA2, while GLASS and CFSR showed the maximum RMSEs at GC-NET and buoys, with values of 68.36 and 49.91 W/m2, respectively. The highest RMSEs (over 50 W/m2) of all the Rs products were found at GC-NET stations. This suggested that the Rs at high-latitude areas may be less accurate, which was consistent with the result found by Sun et al. (2018). All the Rs estimates presented positive biases at CMA and GEBA stations, especially the reanalysis predictions (ERA5, MERRA2 and CFSR), which showed apparent overestimation of Rs. But at GC-NET stations, the Rs products underestimated Rs, with negative biases ranging from −23.54 to −1.71 W/m2, with the exception of CERES-EBAF and CFSR. The lower R and higher RMSE at buoys indicated relatively higher uncertainty of the MERRA2 and CFSR Rs over the ocean compared to other products.
Besides the evaluation over multiple stations, Figures 3 and 4 illustrate the RMSE and bias of monthly Rs from the seven products for each station. The RMSEs of Rs from satellite retrievals were lower (less than 20 W/m2) at more than 65% of the stations (510, 467 and 512 stations for CERES-EBAF, GEWEX-SRB and GLASS). The ERA5 Rs showed higher RMSEs over China but was similar to the satellite retrievals over other regions. By contrast, 469 (68.0%), 546 (79.1%) and 450 (65.2%) stations had RMSEs greater than 20 W/m2 for the CMIP6 GCMs, MERRA2 and CFSR Rs, respectively (Figure 3), and their higher RMSEs were mainly found at stations in China, Europe and the ocean. It is apparent that the RMSEs of the reanalysis and CMIP6 products were overall higher than those of the satellite retrievals.
Among the seven selected Rs products, only GLASS showed negative bias at over half of the stations (370 stations, accounting for 53.6%), indicating a tendency to underestimate Rs. But the absolute values of its biases were low (less than 10 W/m2) at most of the stations (67.2%), which indicated higher accuracy at these stations. The other six Rs estimates were overestimated at more than 70% of the stations. For CERES-EBAF and GEWEX-SRB, 284 (41.1%) and 268 (38.8%) stations showed positive biases lower than 10 W/m2. For Rs from reanalysis and CMIP6 GCMs, positive biases lower than 10 W/m2 were found at only a small share of the stations. But 38.7% (267 out of 690) and 53.2% (367 out of 690) of the stations displayed biases greater than 20 W/m2 for CFSR and MERRA2, as shown in Figure 4, and most of the serious overestimations or underestimations were found in low-latitude and mid-latitude regions.
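The station counts and percentages quoted above follow from a simple thresholding of per-station statistics. A minimal sketch is given below; the helper `station_share` and the per-station RMSE values are hypothetical, and only the 469-of-690 figure is taken from the text.

```python
import numpy as np

def station_share(values, condition):
    """Count stations whose statistic meets a condition, with the percentage.

    Stations with missing statistics (NaN) are excluded from the total.
    """
    values = np.asarray(values, dtype=float)
    valid = np.isfinite(values)
    count = int(np.count_nonzero(condition(values[valid])))
    total = int(valid.sum())
    return count, round(100.0 * count / total, 1)

# Hypothetical per-station RMSEs (W/m2) for one product.
rmse_by_station = np.array([12.0, 18.5, 25.0, 31.2, 19.9, np.nan, 40.0, 15.3])
n_low, pct_low = station_share(rmse_by_station, lambda v: v < 20.0)

# The same arithmetic for a count reported in the text: 469 of 690 stations.
pct_cmip6 = round(100.0 * 469 / 690, 1)
```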

Spatial distributions
The globally distributed multiyear (2001-2010) mean Rs estimates from the seven products are compared in Figure 5. It is clear that the values of Rs in low-latitude regions were generally higher than those in high-latitude regions. The maximum values at the South Pole were found to be larger than those at the North Pole, which may be owing to the relatively smaller Earth-Sun distance, lesser cloud cover and drier atmosphere (Hatzianastassiou et al. 2005). In low-latitude regions, all of the products yielded higher Rs over the central Pacific Ocean, the western coastline of South America, the central Atlantic Ocean, the coastal area of Africa and most of Australia. The comparison among these Rs estimates also reveals several differences. The GLASS showed the lowest Rs at almost all latitudinal zones. In contrast, the CFSR produced the highest Rs over the low-latitude oceans. Overall, the high Rs values of the reanalysis predictions, especially MERRA2 and CFSR, were higher than those of the satellite retrievals.
As seen in Table 4, the annual mean Rs of GLASS was the lowest over the globe, land and ocean, with values of 177.1, 179.3 and 175.5 W/m2, respectively. The CFSR displayed the highest mean Rs over the globe and ocean (191.3 and 191.5 W/m2), while MERRA2 was the highest over land (196.0 W/m2). The maximum, minimum and mean Rs estimates over land tended to be higher than those over ocean for most of the products, except for the CERES-EBAF and CFSR. For ERA5, CERES-EBAF and CMIP6 GCMs, the maximum and minimum values of Rs were close, which means that the differences in annual mean Rs of these products were minor. For a better inter-comparison of these Rs estimates, the annual mean Rs of each product at different latitude zones was also calculated and listed in Table 5.
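Means over the globe or over a latitude zone on a regular latitude-longitude grid are commonly computed with cosine-of-latitude area weights, because grid cells shrink toward the poles. The text does not state how the means reported here were obtained, so the following is only a sketch of that common approach; the function `area_weighted_mean`, its parameters and the synthetic fields are hypothetical.

```python
import numpy as np

def area_weighted_mean(field, lat, lat_min=-90.0, lat_max=90.0, mask=None):
    """Cosine-latitude weighted mean of a (lat, lon) field.

    lat_min/lat_max select a latitude zone; mask (True = keep) can restrict
    the average to land or ocean cells. NaN cells are ignored.
    """
    field = np.asarray(field, dtype=float)
    lat = np.asarray(lat, dtype=float)
    # Weight of each cell is proportional to the cosine of its latitude.
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)
    use = np.isfinite(field) & ((lat >= lat_min) & (lat <= lat_max))[:, None]
    if mask is not None:
        use &= np.asarray(mask, dtype=bool)
    return float(np.sum(field[use] * weights[use]) / np.sum(weights[use]))

lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(0.5, 360.0, 1.0)

# Synthetic field that peaks at the equator, loosely like annual mean Rs.
field = 300.0 * np.cos(np.deg2rad(lat))[:, None] * np.ones((1, lon.size))
weighted = area_weighted_mean(field, lat)
unweighted = float(field.mean())                 # over-represents polar rows

# A uniform field must return its own value for any zone.
uniform_zone = area_weighted_mean(np.full((lat.size, lon.size), 180.0), lat, 60.0, 90.0)
```

For this synthetic field the unweighted mean is noticeably lower than the weighted one, since the many small polar cells with low values are counted as heavily as the large tropical cells.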
The mean Rs decreased gradually from low-latitude to high-latitude zones, and the mean values at the South Pole were significantly higher than those at the North Pole, which was consistent with the results in Section 3.2. For all the selected products, the minimum mean Rs was found in the latitude span of 60°N to 90°N, while the maximum was found in the latitude span of 0° to 30°S. The average Rs of CFSR was the highest at low-latitude zones, the average Rs of MERRA2 and ERA5 were the highest at middle-latitude zones, and the average Rs of GLASS was the lowest at all latitude zones except 60°S to 90°S.
The annual Rs anomaly of each product over globe, land and ocean is displayed in Figure 7, and the variation trends of these datasets in two different phases are summarized in Table 6. The Rs products showed diverse trends during the different phases. Significant decreasing trends of Rs were found for GLASS from 1984 to 2018 (−2.3, −1.5 and −3.1 W/m2 per decade) and from 2001 to 2010 (−2.5, −2.4 and −3.1 W/m2 per decade) over globe, land and ocean. Some NOAA satellites, such as NOAA-14/15/16, show significant orbital drifts (changes of local equator crossing time) over time, which may affect the long-term Rs trend derived using AVHRR data. This may be the reason why the annual mean Rs of GLASS changed more strongly. But the long-term analysis conducted in this study was based on daily integrated data and the missing data were filled with climatology data, which might also help mitigate the inconsistency. The MERRA2 Rs also had remarkable decreasing trends in the two phases over globe, but the slopes over land and ocean differed between the two phases. By contrast, the Rs estimates from CERES-EBAF, ERA5 and CMIP6 all had flat trends, though some may not be significant. All of these Rs products were decreasing during 2001-2010 over globe and land, while for ocean, only the Rs of GEWEX-SRB and CFSR showed weak brightening trends, which were not significant.
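A trend expressed in W/m2 per decade can be obtained from an annual series as sketched below. The fitting method and the anomaly baseline are not specified in this section, so ordinary least squares and a full-record mean baseline are assumptions here; the function `anomaly_and_trend` and the synthetic series are hypothetical, and no significance test is included.

```python
import numpy as np

def anomaly_and_trend(years, annual_mean):
    """Return the annual anomaly series and the linear trend per decade.

    The anomaly is taken relative to the mean of the whole record, and the
    trend is the ordinary least squares slope scaled from per year to per
    decade (both are assumptions of this sketch).
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(annual_mean, dtype=float)
    anomaly = values - values.mean()
    slope, _intercept = np.polyfit(years, anomaly, 1)   # W/m2 per year
    return anomaly, slope * 10.0                        # W/m2 per decade

# Synthetic annual mean Rs declining by 0.25 W/m2 each year over 2001-2010.
years = np.arange(2001, 2011)
annual = 180.0 - 0.25 * (years - 2001)
anomaly, trend = anomaly_and_trend(years, annual)
```

With only ten annual values, as in the 2001-2010 phase, a fitted slope is sensitive to individual years, which is one reason the significance of each trend is reported alongside its magnitude.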

Annual mean and long-term trends
Figure 8 depicts the spatial pattern of variation trends at each grid cell for the seven Rs products during 2001-2010. Most regions of the globe showed slight variation trends (not significant) for all these Rs datasets. Annual mean Rs significantly decreased mainly in the northeast of Australia for all the satellite retrievals and reanalysis predictions, while significant upward trends of Rs occurred in the western Pacific. Though there were similarities, the great differences among these products still could not be neglected, which indicated that these Rs datasets may not be suitable for long-term analysis.

Table 6. Trend comparison of annual mean Rs from seven products over globe, land and ocean during the whole time series of each product and the period of 2001-2010.

Summary and conclusions
Having collected the available measurements of Rs at 690 stations from BSRN, GEBA, CMA, GC-NET and buoys during 2001-2010, this study evaluated the global estimates of Rs from seven representative datasets (GLASS, GEWEX-SRB, CERES-EBAF, ERA5, MERRA2, CFSR and CMIP6), covering satellite retrievals, reanalysis predictions and GCM simulations, at monthly scales. The performance of these Rs products was compared, and an inter-comparison of their spatial distributions, annual means and long-term trends during 2001-2010 was also provided.
The spatial resolution of the selected products varied from 0.05°×0.05° to 2.8°×2.8°; thus, all the Rs products were aggregated to 1°×1° resolution to ensure their consistency. Overall, the positive biases of the seven Rs products at the 690 stations meant that they all tended to overestimate Rs, and the relatively higher RMSEs of the MERRA2 and CFSR Rs indicated their lower accuracy. Besides the methods used for estimating Rs, many other factors affect the accuracy of Rs assessment. First of all, the input data used in estimating Rs in the different models are an important factor. Cloud and aerosol datasets are normally inputs to the retrieval algorithms, prediction models and simulation models (Li, Wang, and Liang 2021; Stubenrauch et al. 2013; Xia et al. 2006; Zhang et al. 2016). Other atmospheric inputs such as water vapor, total precipitable water and snow also influence Rs (Hatzianastassiou et al. 2005; Huang et al. 2019). Secondly, the quality of the ground Rs observations is also a potential error source in the evaluation results (Lu, Wang, et al. 2023a; Lu, Zhang, et al. 2023). It was reported that observations are affected by equipment uncertainty and operational problems, and that quality control before releasing Rs observations helps assure their quality (Song et al. 2020). Additionally, the deviations between observations and the Rs products may also be caused by the limited spatial representativeness of in-situ stations (Hakuba et al. 2013; Qin et al. 2020). The spatial nature of ground-based measurements differs fundamentally from that of Rs products: the former are point-specific and the latter are at grid scale. Spatial and temporal averaging may be a possible solution to the issue of spatial representativeness (Hakuba et al. 2013; Huang et al. 2016a).
According to the validation results, all of the selected Rs products exhibited much higher RMSEs (over 50 W/m2) at GC-NET stations. This directly indicated the lower accuracy of the selected Rs products in high-latitude regions. The possible reasons are as follows. Firstly, most radiative transfer models assume that the atmosphere is plane-parallel, which leads to low accuracy of Rs at high latitudes owing to the large solar zenith angle. Meanwhile, snow and clouds are both bright targets in remote sensing images. Most of the high latitudes are snow-covered, so the difficulty in distinguishing snow from clouds may result in errors in the inversion parameters and further lead to lower accuracy of Rs (Sun et al. 2018). In addition, the surface observations at high latitudes we used were mainly from GC-NET. The difficulty of maintaining the instruments and of quality-controlling the measurements in high-latitude areas may also affect the verification accuracy of Rs. It was also found that the Rs of most of the selected products showed higher biases at CMA stations than at stations from other networks. The quality of measurements at CMA stations or the underestimation of clouds and aerosols in China are potential error sources for this issue (Urraca et al. 2018; Zhang, Liang, et al. 2015). For the Rs predictions from reanalysis data, the biases of monthly mean Rs estimates at CMA stations were greater than 20 W/m2. We also validated the monthly Rs at individual stations, and the RMSEs of satellite retrievals at most of the stations were lower than those of reanalysis predictions and GCM simulations, indicating that the accuracy of satellite retrievals was relatively higher. The higher biases of the reanalysis predictions also showed that they overestimated Rs to a larger extent than the satellite retrievals and GCM simulations.
The geographical distribution of the ten-year (2001-2010) mean Rs from the seven products was displayed for comparison. Higher Rs values over the globe were mainly distributed in low-latitude regions, while lower values were found in high-latitude regions. The seven selected datasets showed similar characteristics in the spatial distribution of Rs, but some differences existed among their mean values. The multi-year mean Rs of these products varied from 177.1 to 191.3 W/m2, from 179.3 to 196.0 W/m2 and from 175.5 to 191.5 W/m2 over the globe, land and ocean, respectively. The lowest mean values of Rs were seen in GLASS except at southern high latitudes, while the highest Rs values were found in ERA5, MERRA2 and CFSR at low and middle latitudes.
The long-term variation trends of the Rs estimates were also analyzed. The Rs of GLASS and MERRA2 decreased significantly (by over 1.5 W/m2 per decade) over the globe, while the trends of the other products were relatively flat (not significant). Over half of the Rs products displayed clearly decreasing trends over land in the period of 2001-2010, but only GLASS showed the same over ocean. For specific regions, the spatial distribution of the multi-year (2001-2010) trends also indicated great differences among these products. Overall, these Rs products may not be suitable for long-term analysis.
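To illustrate how a per-decade trend like those quoted above can be obtained, the sketch below fits an ordinary least-squares line to a series of annual mean Rs values. The function name `trend_per_decade` and the synthetic series are hypothetical. The significance test used in the study is not specified in this section, so none is included.

```python
import numpy as np


def trend_per_decade(years, annual_means):
    """Least-squares linear trend of annual means, in units per decade."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(annual_means, dtype=float)
    # Centre the years so the fit is numerically well conditioned;
    # this does not change the slope.
    slope, _intercept = np.polyfit(years - years.mean(), values, 1)
    return slope * 10.0


# Synthetic example (not data from this study): a steady decline of
# 0.2 W/m2 per year corresponds to -2.0 W/m2 per decade.
years = np.arange(2001, 2011)
series = 180.0 - 0.2 * (years - 2001)
trend = trend_per_decade(years, series)
```

With only ten annual values, such a fitted slope is sensitive to individual years, which is in line with the caution expressed above about long-term analysis.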
We comprehensively reviewed and compared seven representative Rs products. It would be desirable in the future to obtain accurate input datasets for these Rs products, such as cloud, aerosol and other atmospheric parameters, which would help to improve the precision of the Rs estimates. Such inputs would also provide an opportunity to further explore the reasons for the differences in accuracy among the Rs products.

Figure 1. Spatial distributions of the observation stations.

Figure 2. Evaluation results of monthly Rs estimates from seven products against the ground measurements from BSRN, CMA, GEBA, GC-NET and buoys.

Figure 6 depicts the box plots of the annual mean Rs of the seven products during the period of 2001-2010, and the detailed values over the globe, land and ocean are summarized in Table 4. As seen, the annual mean Rs of GLASS was the lowest over the globe, land and ocean, with values of 177.1, 179.3 and 175.5 W/m2, respectively. CFSR displayed the highest mean Rs over the globe and ocean, at 191.3 and 191.5 W/m2, respectively, while MERRA2 was the highest over land, at 196.0 W/m2. The maximum,

Figure 6. Box plots of the ten-year (2001-2010) annual mean Rs over (a) the globe, (b) land and (c) ocean for the seven selected products. For each box, the blue point is the mean Rs, and the circles are outliers. The central line represents the median value, while the lower and upper edges represent the 25th (v1) and 75th (v3) percentiles, respectively. The top line is calculated as [v3 + 1.5 × (v3 − v1)], while the bottom line is [v1 − 1.5 × (v3 − v1)].
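The quantities named in this caption can be computed as in the short sketch below, given here only to make the definitions concrete. The function name `box_plot_stats` and the sample values are hypothetical, and the quartiles follow NumPy's default linear interpolation, which may differ slightly from the plotting software used for the figure.

```python
import numpy as np


def box_plot_stats(values):
    """Return the box-plot quantities described in the caption."""
    v = np.asarray(values, dtype=float)
    v1, median, v3 = np.percentile(v, [25, 50, 75])
    spread = v3 - v1                  # interquartile range
    top = v3 + 1.5 * spread           # top line
    bottom = v1 - 1.5 * spread        # bottom line
    outliers = v[(v < bottom) | (v > top)]
    return {
        "mean": v.mean(),
        "median": median,
        "v1": v1,
        "v3": v3,
        "top": top,
        "bottom": bottom,
        "outliers": outliers,
    }


# Synthetic values (not data from this study); 100 falls above the top line.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```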

Figure A2. Evaluation results of monthly Rs estimates from seven products against the ground measurements from CMA.

Figure A3. Evaluation results of monthly Rs estimates from seven products against the ground measurements from GEBA.

Figure A4. Evaluation results of monthly Rs estimates from seven products against the ground measurements from GC-NET.

Figure A5. Evaluation results of monthly Rs estimates from seven products against the ground measurements from buoys.

Table 1. The detailed information of various Rs products used in this study.

Table 2. The detailed information of CMIP6 GCMs used in this study.

Table 3. Summary of performance statistics for surface measurements of five networks and seven monthly Rs estimates from 2001 to 2010. The units of Bias and RMSE are W/m2.

Table 4. Overview of the maximum, minimum and mean values of Rs over the globe, land and ocean for the seven products during 2001-2010 (units are W/m2).

Table 5. The average Rs of the seven products in different latitude zones during 2001-2010 (units are W/m2).