Interactive comment on “Evaluation of cloud fraction and its radiative effect simulated by IPCC AR4 global models against ARM surface observations”

This study evaluates the representation of cloud fraction and related radiative effects in a number of climate models used in the IPCC Fourth Assessment Report (AR4). Clouds are a major source of uncertainty in climate models and their projections of future climate evolution, and therefore a comprehensive assessment of the ability of the models to reproduce clouds under present-day conditions is a worthwhile (and necessary) endeavor. While many studies use satellite data to assess clouds in climate models, here the authors focus on surface-based cloud observations, by including 3 observa-

mensions of cloud in addition to cloud amount, such as cloud optical thickness and/or cloud height, have a similar magnitude of disparity as TCF within the GCMs, and suggests that the better agreement among GCMs in solar radiative fluxes could be a result of compensating effects from errors in cloud vertical structure, overlap assumption, cloud optical depth and/or cloud fraction. The internal variability of CF simulated in ensemble runs with the same model is minimal. Similar deviation patterns between inter-model and model-measurement comparisons suggest that the climate models tend to generate larger biases against observations for those variables with larger inter-model deviation.
The GCM performance in simulating the probability distribution, transmissivity and vertical profiles of cloud is comprehensively evaluated over the three ARM sites. The GCMs perform better at SGP than at the other two sites in simulating the seasonal variation and probability distribution of TCF. However, the models remarkably underpredict the TCF at SGP, and cloud transmissivity is less susceptible to the change of TCF than observed. In the tropics, most of the GCMs tend to underpredict CF and fail to capture the seasonal variation of CF at middle and low levels. The high-level CF is much larger in the GCMs than in the observations, and the inter-model variability of CF also reaches a maximum at high levels in the tropics, indicating discrepancies in the representation of ice cloud associated with convection in the models. While the GCMs generally capture the maximum CF in the boundary layer and vertical variability, the inter-model deviation is largest near the surface over the Arctic.

Introduction
Three-dimensional general circulation models (GCMs) are probably the most powerful tools currently available to quantitatively investigate the Earth's climate system and to predict future climate change, which is affected by human activities that cause changes in greenhouse gases, aerosols, and land use and land cover (IPCC, 2007). From a physical point of view, anthropogenic climate change is first of all a perturbation of the Earth's radiation balance (Wild, 2008). Realistic simulation by GCMs of the perturbations of radiative forcing is an important pre-requisite for projecting reliable future climate responses. As Webb et al. (2001) emphasize: "If we are to have confidence in predictions from climate models, a necessary (although not sufficient) requirement is that they should be able to reproduce the observed present-day distribution of clouds and their associated radiative fluxes". Many previous studies have evaluated GCMs' performance in simulating shortwave (SW) and longwave (LW) radiation under cloudy and cloudless skies at the surface, where global ground radiation measurement networks are available, and/or at the top of atmosphere (TOA), where satellite observations can be used as constraints (e.g. Garratt, 1994; Wild et al., 1995, 1998, 2008; Li et al., 1997; Walsh et al., 2008). These studies found that GCMs were better at producing the mean TOA radiation budget than the surface radiation budget, although there were significant biases and inter-model variability in estimating the SW and LW radiation in particular regions (Wild and Liepert, 1998; Wild et al., 1999, 2005; Walsh et al., 2008; Wild, 2008). Although in some cases good agreement was found between the observed and modeled cloud radiative forcing, that could be a result of compensating errors in either cloud vertical structure, cloud optical depth or cloud fraction (Potter and Cess, 2004).
The climate science community has identified clouds as one of the highest priorities in climate modeling and climate change projection (IPCC, 2001, 2007). Accurate representation of cloud-radiation interactions is critical for climate models to simulate the evolution of the climate system. Clouds are also an essential variable in the climate system because they are directly associated with precipitation through microphysical processes and with aerosol loading through the aerosol aqueous-phase chemistry and wet removal process. Physically, cloud-radiation interactions depend largely on the cloud macrophysical (e.g. cloud fraction, liquid and ice water path) and microphysical (e.g. cloud droplet number, size, and ice particle habit) properties. Cloud Fraction (CF) has long been recognized as a dominant modulator of radiation flux at both the surface and the top of the atmosphere (Xi et al., 2010; Liu et al., 2011). For example, a 4 % increase in the area of the globe covered by marine stratocumulus clouds would offset the predicted 2-3 K rise in global temperature due to a doubling of atmospheric carbon dioxide (Randall et al., 1984). Although considerable uncertainties are still associated with cloud feedbacks in GCMs, one can assume that to reasonably simulate global climate, these models should be able to accurately reproduce the current climatology of cloud fraction (including vertical structure) at a given location.
The vertical distribution of clouds affects the vertical heating rate profiles through radiative and diabatic processes, and thus influences the atmospheric stratification and general circulation (e.g. Stephens et al., 2002). Recent studies have revealed that the uncertainty in estimating the cloud occurrence at different levels is much larger than in estimating the total cloud amount in most GCMs (Stephens et al., 2002; Zhang et al., 2005; Illingworth et al., 2007; Naud et al., 2008). By comparing the results of 10 GCMs to ISCCP and CERES datasets, Zhang et al. (2005) found that models simulated a four-fold difference in high-top clouds against the observations. Because different dynamical and thermodynamic conditions produce differing vertical distributions of clouds, accurately characterizing this vertical distribution in the model is critical to understanding cloud feedback processes.
Simultaneously evaluating climatological simulations of cloud fraction, especially vertical structure, and radiation in GCMs against observations is difficult because of the lack of a long-term continuous cloud observational dataset. In situ aircraft measurements reveal the macroscopic structure of clouds, but suffer from sampling problems and can only provide 1-D cloud snapshots. Combining aircraft and ground-based instrumentation can provide a more comprehensive view of clouds and their radiative forcing; however, the limitations of aircraft campaigns make this possible only for a number of isolated case studies, raising the question of representativeness (Illingworth et al., 2007). Remote sensing from space has provided global cloud properties over many years (Rossow and Schiffer, 1991; Webb et al., 2001), but information concerning cloud vertical structure has been lacking. The recent launch of cloud radar on CloudSat accompanied by the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO; Winker et al., 2003) provides valuable global cloud information including cloud vertical structure. However, this dataset is still relatively short, and is also limited to only two observational times per day, providing limited information on the diurnal cycle.
A review by Wild (2008) indicates that the inter-model range in SW TOA flux is about 4 % of its absolute value while the inter-model range in surface SW flux is up to 14 % of its absolute value. This result is likely due to the relative availability of global satellite versus surface data and the adjustment of model cloud parameterizations to get agreement with global mean satellite observations. Thus, information on the relationships between clouds and surface fluxes is needed to further constrain the model parameterizations so that the correct radiation budget can be obtained at both the top and bottom of the atmosphere.
In this study we use simultaneous measurements of cloud fraction and broadband radiation at the surface from the measurement sites sponsored by the Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) Program. Using radiometers, cloud radar and lidar systems, and other advanced instruments at five permanent sites located in several different climate regimes, ARM provides long-term and nearly continuous observations of the surface SW and LW radiative fluxes, sky cover and cloud vertical distributions. The long-term comprehensive ARM climatological datasets make it possible to evaluate the CF and surface radiation budgets simulated by GCMs simultaneously, which provides a unique opportunity to study the role of cloud in estimating the surface radiation budgets (Xie et al., 2010). Due to the different scales of the ARM measurements and the GCM simulations, as well as the impossibility of simulating exact weather systems in a free-running GCM, we use the multi-year ARM data to evaluate the GCMs in a climatological sense.
ARM instruments provide several CF or sky cover related products, such as cloud cover derived from the Total Sky Imager (TSI), total sky cover (TSK) derived from the surface broadband SW radiometers during daytime (Long et al., 2006), effective sky cover (ESK) using broadband LW radiometers during both daytime and nighttime (Long and Turner, 2008), and the frequency of hydrometeor occurrence statistics derived from the narrow field-of-view (FOV) lidar and radar observations (i.e. the Active Remotely-Sensed Clouds Locations or ARSCL product, Clothiaux et al., 2000). Before these data can be used to validate GCM cloud statistics, it is necessary to evaluate the measurements by these different instruments and to see if they are consistent among themselves at different time scales. One difficulty is that various observational methods and climate models use different definitions for CF (Kassianov et al., 2005). For example, the International Satellite Cloud Climatology Project (ISCCP) defines "total cloud amount" as "the fraction of the earth's surface covered by cloud" (Hahn et al., 2001). However, in surface observer climatology studies, it is defined as "the fraction of hemispherical sky covered by cloud" (Hahn et al., 2001). Climate models, on the other hand, typically interpret CF as "the horizontal area fraction covered by clouds as viewed from nadir" (Del Genio et al., 1996). Therefore, comparisons between these FOV and hemispheric observations are important if we want to utilize these data to evaluate climate models. Because of the difficulty in relating model variables to quantities retrieved from remote sensing observations, instrument simulators which use model output to directly simulate the signal that an instrument would observe have been developed in recent years. For climate models, these simulators have focused primarily on satellite observations to date. Development of techniques to simulate ground-based remote sensing observations of the type used in this study would be useful to alleviate some of the uncertainties in the model/observation comparisons.
The aim of the present paper is to evaluate the CF simulations by the GCMs in the 4th assessment report of the Intergovernmental Panel on Climate Change (IPCC AR4; IPCC, 2007), with a focus on the total cloud amount and the vertical structure of CF. Our assessment seeks to identify systematic biases and inter-model deviation in cloud fields simulated in the GCMs across seasonal scales over different regions of the world (e.g. tropics, mid-latitude continent and Arctic). Conducting the analysis over three very different climate regimes provides a better understanding of the geographical variability of clouds and their radiative forcing and a stronger constraint on model simulations.
We also examine how the CF affects cloud transmissivity, a ratio of all-sky mean downwelling SW flux to the cloud-free sky mean, in both observations and GCMs. In the following sections, we first introduce the IPCC AR4 GCMs and ARM datasets used in the analysis (Sect. 2). In Sect. 3 we compare three CF-related ARM datasets and examine their consistency at different time scales. Then we evaluate the CF simulations in GCMs against selected ARM observations, with Sect. 4 focusing on the total cloud amount and cloud transmissivity, and Sect. 5 focusing on the vertical profiles of CF. Summary and discussion are given in Sect. 6. Results of this study can help the climate modeling community to better understand the CF-related measurements from ARM sites and provide useful insights for improving the cloud-radiation interaction and the CF parameterization in climate models.
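The transmissivity definition above can be illustrated with a minimal Python sketch. The function name and the list-based inputs are assumptions made for illustration only; the sketch simply takes the ratio of the two means as the text describes and is not the ARM or GCM processing code.

```python
def cloud_transmissivity(all_sky_sw, clear_sky_sw):
    """Return mean all-sky SW flux divided by mean clear-sky SW flux.

    Both arguments are sequences of downwelling surface SW flux (W m-2)
    covering the same averaging period (hypothetical input layout).
    """
    if not all_sky_sw or len(all_sky_sw) != len(clear_sky_sw):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_all = sum(all_sky_sw) / len(all_sky_sw)
    mean_clear = sum(clear_sky_sw) / len(clear_sky_sw)
    if mean_clear <= 0.0:
        raise ValueError("clear-sky mean must be positive")
    return mean_all / mean_clear


# Example with made-up fluxes: means of 200 and 250 W m-2 give a ratio of 0.8.
ratio = cloud_transmissivity([150.0, 250.0], [200.0, 300.0])
```

Values near 1 indicate that clouds remove little of the clear-sky SW flux, and smaller values indicate stronger attenuation.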

IPCC AR4 GCMs
The CF and radiative fluxes simulated by more than a dozen GCMs participating in the experiments for IPCC-AR4 are available from the Program for Climate Model Diagnosis and Intercomparison (PCMDI). This experimental framework is also known as the World Climate Research Program (WCRP) Coupled Model Intercomparison Project (CMIP3, Meehl et al., 2007). It should be noted that some GCMs archived both total cloud amount and CF at each model layer, but some GCMs only archived total cloud amount. There are also models that did not archive cloud-free sky surface radiation fluxes. For consistency, we use monthly mean CF and surface radiation fluxes from 11 selected GCMs in this study, as shown in Table 1. The 11 models were from NCAR (model versions ccsm3 and pcm1), GFDL (cm2), GISS (e r), CCSR (MIROC3 2 with high resolution), MRI (cgcm2), UKMO (hadcm3), MPI (echam5), CNRM (cm3), IPSL (cm4) and INM (cm3). The model outputs used in this study are from the AMIP (Atmospheric Model Intercomparison Project) experiment, in which identical observed SSTs were used for all GCMs. Over 20 yr of results are available approximately from 1980 to 1999, with starting and ending years slightly varied among the models. More information about the project and these models can be found at the website of PCMDI (http://www-pcmdi.llnl.gov/).
CF is a critical variable in climate models for determining the radiative fluxes through the atmosphere and at the surface. Depending on the complexity of the model, CF may also be used in many other physics parameterizations in the model such as cloud microphysics, aerosol wet removal and convective transport. In this study, we focus on the role of CF in radiation, where the area-averaged CF is used. As discussed in Brooks et al. (2005), although CF produced by most cloud schemes is volume-averaged, most GCMs assume that the cloudy area of a grid box fills the entire grid box in the vertical, thus essentially assuming area-averaged CF is the same as the volume-averaged CF. In GCMs, CF can be parameterized using statistical, diagnostic or prognostic approaches. Due to space constraints, we just summarize the CF parameterization schemes for all GCMs used in this study in Table 1; for more details of each cloud scheme, including references, see http://www-pcmdi.llnl.gov/ipcc/modeldocumentation/ipcc model documentation.php.
In GCMs, the vertical correlations between cloud layers have to be prescribed because cloud elements are often smaller than a typical GCM grid cell and there is no general theory for how different cloud systems should overlap (Collins, 2001). Assumptions about vertical overlap of clouds can affect the exchange of energy between the atmosphere and other components in the model, influencing not only radiative heating rates but also atmospheric temperature and hydrological processes (Collins, 2001). In the IPCC AR4 models, the most common overlap assumptions are maximum/random (Geleyn and Hollingsworth, 1979). One type of maximum/random assumption has maximum cloud overlap in each of three regions representing the lower, middle, and upper troposphere, and random overlap between these regions (e.g. Chou et al., 1998). A second type of maximum/random overlap scheme has maximum overlap between clouds in adjacent levels and random overlap between groups of clouds separated by one or more clear layers (e.g. Zdunkowski et al., 1982). The latter form of maximum/random overlap was found to be more consistent with a statistical analysis of observed cloud distributions (Tian and Curry, 1989).
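To make the effect of the overlap assumption concrete, the following Python sketch computes total cloud fraction from a column of layer cloud fractions under maximum, random, and the second (adjacent-layer) form of maximum/random overlap. The function names are hypothetical, and the maximum/random expression used here is the commonly quoted product form; it is offered as an illustration and not as the code of any particular AR4 model.

```python
def total_cf_maximum(layer_cf):
    """Maximum overlap: all cloudy layers are stacked on top of each other."""
    return max(layer_cf)


def total_cf_random(layer_cf):
    """Random overlap: layers are statistically independent."""
    clear = 1.0
    for c in layer_cf:
        clear *= 1.0 - c
    return 1.0 - clear


def total_cf_max_random(layer_cf):
    """Maximum overlap between adjacent cloudy layers, random overlap
    between groups separated by at least one clear layer."""
    clear = 1.0
    prev = 0.0  # cloud fraction of the layer above (none at the top)
    for c in layer_cf:
        if prev >= 1.0:
            return 1.0  # an overcast layer already covers the column
        clear *= (1.0 - max(c, prev)) / (1.0 - prev)
        prev = c
    return 1.0 - clear


# Two adjacent layers of 0.3, one clear layer, then a layer of 0.5.
column = [0.3, 0.3, 0.0, 0.5]
results = (total_cf_maximum(column),
           total_cf_random(column),
           total_cf_max_random(column))
```

For this made-up column the maximum/random result lies between the maximum and random values, which shows how the same layer cloud fractions can yield noticeably different total cloud amounts depending on the overlap choice.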

ARM datasets
Our ground observational data are based on the measurements from three permanent ARM sites: the US Southern Great Plains (SGP) site in Lamont, Oklahoma, the North Slope of Alaska (NSA) Barrow site, and the Tropical Western Pacific (TWP) Manus site. These sites represent mid-latitude continent, Arctic and tropical climate regions, respectively.
ARM sites are equipped with ground-based active and passive remote-sensing instruments, including the millimeter-wavelength cloud radar (MMCR), the micropulse lidar (MPL), the laser ceilometer, broadband SW and LW radiometers, and the total sky imager (TSI). Through its value-added product (VAP) efforts, ARM has implemented advanced retrieval algorithms and sophisticated objective data analysis approaches to process and integrate data collected from these instruments (Xie et al., 2010). Below we give a brief overview of the observational datasets used in this study. More details about these ARM data products can be found in Xie et al. (2010) and in the references therein.

(a) Radiation flux and Total Sky Cover (TSK)
At the SGP site, surface radiation flux data are measured by three separate radiometer systems. An ARM Value Added Product called the Best Estimate Flux (the BEF data) (Shi and Long, 2002) combines the measurements from the three systems to produce the best estimate of surface radiation fluxes. For the NSA and TWP sites, the radiation measurements are from the SkyRad (sky radiation) and GndRad (ground radiation) systems measuring downwelling and upwelling fluxes, respectively. The total downwelling SW fluxes used in this study are primarily the sum of the direct plus diffuse components (measured by Eppley Normal Incidence Pyrheliometer and Eppley shaded Precision Spectral Pyranometers, respectively) whenever available; otherwise the global SW fluxes from the unshaded Eppley Precision Spectral Pyranometers are used. All radiation data used for this study have been quality tested using the QCRad (Quality Control for Radiation measurements) methodology of Long and Shi (2006, 2008). After the quality tests, data are further processed by the Radiative Flux Analysis (RFA). The RFA is a collection of analysis tools that detects clear-sky periods and produces continuous clear-sky estimates of SW fluxes (Long and Ackerman, 2000) and LW fluxes (Long and Turner, 2008) and infers bulk cloud properties such as daylight total sky cover (TSK; Long et al., 2006), longwave effective sky cover (ESK; Durr and Philipona, 2004), and cloud effective SW transmissivity from the broadband radiometer data. These measurements have a hemispherical FOV and thus provide time series of fractional sky cover (Kassianov et al., 2005).
The TSK measurements with 1-s sampling interval have hemispheric fields of view and hence are more related to fractional sky cover, the angular amount of the sky dome covered by clouds. The derived TSK is based on measurements, so it also includes the uncertainties of the measured quantities themselves. The ARM Program documents the uncertainties in broadband radiation measurements (Stoffel, 2005) as 4 W m−2 or 3 % (whichever is greater in W m−2) for diffuse SW, 20 W m−2 or 6 % for direct normal SW, and 4 W m−2 or 4 % for LW. For the direct component SW, then, the uncertainty is roughly the normal incidence uncertainty weighted by the cosine of the solar zenith angle or 20 W m−2, whichever is greater. The clear-sky estimations for diffuse, direct, and total SW are about the root mean square (RMS) of twice the measurement uncertainty for the SW components (Long and Ackerman, 2000), and about 4-5 W m−2 for the clear-sky LW (Long and Turner, 2008). Because the cloud effective SW transmissivity includes the instrument characteristics in both the numerator and denominator, the instrument characteristics are largely removed from the ratio. Thus the effective uncertainty of the ratio is at about the 2 % level (Long and Ackerman, 2000). For TSK, there is no "truth" for sky cover because of the nebulous definition of what is and is not a cloud in the community. However, the TSK, as well as sky imager retrievals and human sky observations, all tend toward the same definition of cloud. Comparisons between TSK and sky imager and human observations give agreement to better than 10 % sky cover (Long et al., 2006). All ARM radiometer-based inferred quantities are produced at the same 1-min resolution as the measurements, and averaged to longer temporal resolution as appropriate.
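The "absolute value or percentage, whichever is greater" rule quoted above for the broadband measurement uncertainties can be written as a short Python sketch. The dictionary keys and function name are hypothetical; the numbers are those stated in the text for diffuse SW, direct normal SW, and LW.

```python
# (absolute floor in W m-2, relative fraction) per component, as quoted in the text.
UNCERTAINTY_RULES = {
    "diffuse_sw": (4.0, 0.03),
    "direct_normal_sw": (20.0, 0.06),
    "lw": (4.0, 0.04),
}


def flux_uncertainty(component, flux):
    """Return the measurement uncertainty (W m-2) for a flux value,
    taking the larger of the absolute floor and the relative term."""
    floor, fraction = UNCERTAINTY_RULES[component]
    return max(floor, fraction * abs(flux))


# For a diffuse SW flux of 100 W m-2 the 4 W m-2 floor dominates (3 % is 3 W m-2);
# at 200 W m-2 the 3 % term (6 W m-2) takes over.
low = flux_uncertainty("diffuse_sw", 100.0)
high = flux_uncertainty("diffuse_sw", 200.0)
```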

(b) Total Sky Imager (TSI)
The ARM observational strategy does not include human observations; however, the TSI is the instrument most similar to a traditional human observation of cloud cover. The TSI takes hemispheric "fish eye" color digital pictures of the sky every 30 s during daylight hours from a camera mounted looking down on a curved mirror. These images are then processed to infer what fraction of the sky view contains cloud elements, or fractional sky cover. The processing uses the ratio of red to blue color values for each pixel in the sky image, except for that part of the image that is masked for the camera arm and sun blocking strip on the rotating mirror. One advantage of sky imagers over human observations is consistency of the retrieved results, where the subjective nature that affects human observations is removed. An overview and examples of this processing methodology are presented in Long (2010). Comparisons with TSK give overall agreement at better than 10 % (Long et al., 2006) and with the Scripps Whole Sky Imager at the same level (Long et al., 2001).
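The red-to-blue ratio test described above can be sketched in a few lines of Python. The function name, the pixel layout (RGB tuples, with None marking masked pixels), and the threshold of 0.6 are all assumptions made for illustration; the operational TSI processing is described in Long (2010) and is more elaborate than a single fixed threshold.

```python
def fractional_sky_cover(pixels, ratio_threshold=0.6):
    """Estimate fractional sky cover from a flat list of sky-image pixels.

    pixels: iterable of (red, green, blue) tuples, or None for pixels masked
            by the camera arm or sun-blocking strip (hypothetical layout).
    ratio_threshold: illustrative red/blue ratio above which a pixel is
            treated as cloudy (whitish pixels have ratios close to 1,
            clear blue sky has low ratios).
    """
    cloudy = 0
    valid = 0
    for px in pixels:
        if px is None:
            continue  # masked pixel, excluded from the statistics
        red, _green, blue = px
        valid += 1
        if red / max(blue, 1) > ratio_threshold:
            cloudy += 1
    if valid == 0:
        return None  # nothing usable in this image
    return cloudy / valid


# Two whitish pixels, one blue pixel and one masked pixel: cover is 2/3.
cover = fractional_sky_cover([(200, 200, 210), (40, 80, 200), None, (180, 180, 190)])
```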

(c) ARSCL cloud fraction
www.atmos-chem-phys.net/12/1785/2012/ Atmos. Chem. Phys., 12, 1785-1810, 2012
By integrating measurements from the MMCR, MPL and laser ceilometers, the ARSCL product provides an estimate of the total amount and best estimate vertical location of clouds (Clothiaux et al., 1999, 2000). These are vertically pointing instruments with a narrow FOV that can only detect clouds directly above the instruments. However, unlike the TSI and TSK measurements, they provide both vertical location of clouds and nighttime cloud detection. The ARSCL CF is derived based on the ARSCL cloud boundary information using the algorithm described in The cloud statistics obtained from such narrow FOV height-time transects might not be representative of a larger area surrounding these instruments at a short time scale (e.g. Berg and Stull, 2002; Kassianov et al., 2005), which is one reason we use longer-term observations for comparisons to the model results. Another issue with the ARSCL clouds is that cloud radar tends to underestimate the cloud top heights for thin high-altitude clouds because of detection limits and signal attenuation (Comstock et al., 2002). The consequence of this problem has been mitigated with the use of ARM MPL in CMBE, which is sensitive to small cloud particles. The ARSCL cloud statistics used in this study are calculated based on data during periods when both MMCR and MPL were in operation.

(d) Vertical mapping of CF
The vertical resolution (layer thickness) of the ARSCL data and the models are quite different. We vertically mapped the CF from each GCM into the ARSCL vertical grid so the model-observation comparisons presented later are all based on the finer ARSCL vertical grid. Since the CF is assumed vertically constant within each GCM grid layer, we simply distribute the CF in each GCM layer evenly into the much finer ARSCL layer (about 45 m thick). This should be similar to using a simple average to map the finer vertical resolution ARSCL CF to the coarser GCM vertical grid, but allows us to have a single vertical grid for comparison (rather than mapping ARSCL to each GCM vertical grid with different layer thicknesses).
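The mapping step described above can be sketched as follows in Python. The function name and the representation of GCM layers by their edge heights are assumptions for illustration; each fine layer simply inherits the CF of the GCM layer that contains its midpoint, which is one straightforward reading of "distribute the CF in each GCM layer evenly".

```python
from bisect import bisect_right


def map_cf_to_fine_grid(gcm_edges, gcm_cf, fine_thickness=45.0):
    """Map layer cloud fraction from coarse GCM layers to a uniform fine grid.

    gcm_edges: ascending layer edge heights in m (len(gcm_cf) + 1 values).
    gcm_cf: cloud fraction of each GCM layer, assumed constant in the layer.
    fine_thickness: thickness of the fine layers in m (about 45 m for ARSCL).
    """
    if len(gcm_edges) != len(gcm_cf) + 1:
        raise ValueError("need one more edge than layers")
    bottom, top = gcm_edges[0], gcm_edges[-1]
    n_fine = int((top - bottom) // fine_thickness)
    fine_cf = []
    for i in range(n_fine):
        midpoint = bottom + (i + 0.5) * fine_thickness
        k = bisect_right(gcm_edges, midpoint) - 1  # index of containing GCM layer
        fine_cf.append(gcm_cf[k])
    return fine_cf


# Two made-up GCM layers (0-90 m and 90-225 m) mapped to five 45 m layers.
profile = map_cf_to_fine_grid([0.0, 90.0, 225.0], [0.2, 0.6])
```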
CF and surface radiation data are available starting from the late 1990s at most of the sites. Approximately 10 yr of data are used in this study. There were frequent data gaps in the ARM observations, especially at the remote Manus and NSA sites. To make full use of the ARM datasets, we set up the following rules for data selection. Overall, a day or hour must have at least 30 % quality-controlled data to be included in the analysis. For dataset inter-comparisons at the daily time scale, we use exactly the same periods when all involved datasets are available (e.g. for the scatter and PDF figures). For comparison of a single dataset against the GCMs at climatological scales, we include all available data from that dataset, regardless of the other two. Therefore, it is possible that different volumes of the datasets are used for different purposes.
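The 30 % availability rule can be expressed as a small Python helper. The function name and the use of None for missing or rejected samples are assumptions for illustration; only the 30 % threshold comes from the text.

```python
def mean_if_enough_data(samples, min_valid_fraction=0.3):
    """Average the valid samples of a day (or hour) if enough of them exist.

    samples: sequence of values with None marking missing or rejected data
             (hypothetical layout).
    Returns the mean of the valid samples, or None when fewer than
    min_valid_fraction of the samples are valid.
    """
    if not samples:
        return None
    valid = [s for s in samples if s is not None]
    if len(valid) / len(samples) < min_valid_fraction:
        return None
    return sum(valid) / len(valid)


# Half of the samples are valid, so the day is kept; one in five is not enough.
kept = mean_if_enough_data([0.5, None, None, 0.7])
dropped = mean_if_enough_data([0.5, None, None, None, None])
```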

Inter-comparisons of three ground-based CF related datasets
Before presenting the evaluation of the GCM results, we first present inter-comparisons of the three ground-based CF related datasets at different time scales to understand the uncertainties inherent in the observed cloud amount. To demonstrate the differences between the three datasets on a daily timescale, two time periods are selected here: The ARM sites represent three different climate regimes, so it is important to comprehensively compare the three CF measurements over the different sites. Here we first describe the three sites and their typical meteorological conditions and then present detailed analysis at daily and monthly time scales over each site.

Manus
Manus, one of the three TWP sites located in the Western Pacific Warm Pool region, is influenced by the El Niño-Southern Oscillation (ENSO), which also plays a large role in the interannual variability observed in the global climate system. The TWP region consistently has warm sea surface temperatures that produce large surface heat and moisture fluxes into the local atmosphere, causing the formation of deep convective cloud systems and consequent high-altitude cirrus clouds.
The scatter plots comparing the daily averaged ARSCL, TSI, and TSK values over Manus are shown in the top panel of Fig. 2. The correlation coefficient between ARSCL and TSI is 0.63, and the root mean square deviation (RMSD) is 0.23, indicating a significant bias (>30 %) between ARSCL and TSI on the daily timescale. There is also a similar significant inconsistency between ARSCL and TSK, with a correlation coefficient of 0.56 and RMSD of 0.24 (not shown). The correlation coefficient between TSI and TSK is 0.79 and RMSD is 0.17, indicating a relatively smaller bias between TSK and TSI.
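For readers who wish to reproduce this kind of comparison, the two statistics can be computed as in the Python sketch below. It assumes the usual Pearson correlation coefficient and an RMSD defined as the square root of the mean squared difference between paired daily values; the function names and sample values are illustrative and are not the observed data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmsd(x, y):
    """Root mean square deviation between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up daily cloud fractions from two instruments with a constant 0.1 offset.
instrument_a = [0.0, 0.5, 1.0]
instrument_b = [0.1, 0.6, 1.1]
r_value = pearson_r(instrument_a, instrument_b)
rmsd_value = rmsd(instrument_a, instrument_b)
```

In this toy case the correlation is perfect while the RMSD equals the offset, which illustrates why the two statistics are reported together.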
As we are interested in examining the performance of GCMs in a climatological sense, we now examine multi-year monthly means and annual means of TCF from the three datasets over each site (Fig. 3). There are approximately 10 yr of data available for ARSCL and TSK over these sites. The TSI has fewer years of observations, especially at Manus and NSA. Since only 2-3 yr of TSI data are available at NSA, we do not include it in this analysis. The average TCF ranges from 0.65 to 0.85 at Manus, with a minimum value in May. The overall seasonal variability of TCF is small over Manus. Compared to the daily-based TCF, the differences among the three multi-year averaged datasets are much smaller. The annual RMSD between ARSCL and TSI, ARSCL and TSK, TSI and TSK are all less than 0.075, indicating less than 10 % disagreement among the three multi-year averages.
Given the large differences in the daily TCF between ARSCL and TSI/TSK, it is somewhat surprising that the monthly differences are less than 10 %. Figure 4 (top) shows the frequency distribution of daily total sky cover or cloud fraction over Manus for multiple years of data. The TSI frequency is larger (smaller) than ARSCL when CF is less (larger) than 0.6. Also the difference between the TSI and ARSCL tends to be significant when CF is larger than 0.8 or smaller than 0.2. This is not surprising due to the several orders of magnitude difference in field-of-view between the TSI and ARSCL. During days with large amounts of CF, because of the small sampling of the ARSCL narrow FOV, the ARSCL beam is more likely to be filled with cloud and hence overestimates the TCF. On days with lower cloud cover, the ARSCL beam is more likely to sample clear sky than cloud, and hence underestimates the TCF. The overestimation for larger CF by ARSCL is compensated by less frequent smaller CF, resulting in the small difference among the multi-year monthly mean CFs. The difference in the frequency distributions between TSI and TSK is smaller because both of them have hemispheric fields of view.

SGP
The SGP site is located in north-central Oklahoma, representing the interior regions of many mid-latitude continents, where the clouds are driven by frontal systems or by heating and local convection. The convection is usually short lived over the SGP and does not have the extensive cirrus that is found in the tropics (Mace and Benson, 2008). Shallow cumuli often form in spring and summer under stable synoptic conditions with a strong surface forcing and well-developed boundary layers (Berg et al., 2010).
The scatter plots for the daily ARSCL and TSI, and TSI and TSK values over SGP are shown in the middle panel of Fig. 2. The correlation coefficient between ARSCL and TSI is 0.82, and the RMSD is 0.23, indicating a significant bias between ARSCL and TSI on the daily timescale. The correlation coefficient between TSI and TSK is 0.86 and RMSD is 0.20. The differences among the three measurements shown in the scatter plots at SGP are similar to those at Manus.
Figure 4 (middle) shows the frequency distribution of daily total sky cover or cloud fraction over SGP. While the occurrence frequency of cloudiness shows an upward trend with the increase of CF at Manus, many more clear-sky days are found at SGP than at either NSA or Manus, where most of the time there are clouds at least part of the day. The TSI frequency is larger (smaller) than ARSCL when CF is less (larger) than 0.3. The TSI frequency is more than twice the ARSCL frequency when CF is less than 0.1. Similar to Manus, the difference in frequency distribution between TSI and TSK is much smaller than that between TSI and ARSCL.
The multi-year mean TCF over SGP ranges from 0.35 to 0.62 (Fig. 3, middle). The overall magnitude of TCF is significantly smaller than over the tropics, although the seasonal variability is greater. The maximum TCF is during winter and spring and the minimum is during July to September. Similar to the results seen at Manus, the monthly mean TCF from ARSCL is larger than that from TSI or TSK, but disagreement among the three datasets is less than 15 %.

NSA
The NSA site is located at Barrow, the northernmost location in Alaska. This site, located near cryospheric boundaries, has a prevailing east-northeast wind off the Beaufort Sea and is influenced by both extratropical and Arctic synoptic activity (Stone et al., 2002). Previous research has estimated that clouds in the Arctic are more prevalent and persistent than clouds elsewhere (Curry et al., 1996). Additionally, the Arctic site has a large amount of mixed-phase clouds, which are not well treated in current climate models (Verlinde et al., 2007).
The bottom panel of Fig. 2 shows the scatter plots for the daily ARSCL and TSI, and TSI and TSK values over NSA. At NSA only days from April to September (when solar elevation is high enough for reliable TSI and TSK measurements) are used in this analysis. The correlation coefficient between ARSCL and TSI is 0.73, and the RMSD is 0.24, similar to that at the other two sites. The correlation coefficient between TSI and TSK is 0.90 and RMSD is 0.15, indicating a better agreement between TSI and TSK than at the other two sites.
Figure 4 (bottom) shows the frequency distribution of daily total sky cover or cloud fraction over NSA. Similar to Manus, the occurrence frequency of cloudiness shows an upward trend with the increase of CF, and overcast skies (CF > 0.9) account for almost 40 % of days at NSA. Generally, the TSI frequency is larger (smaller) than ARSCL when the CF is less (larger) than 0.8. Similar to the other two sites, the difference in frequency distribution between TSI and TSK is smaller than that between TSI and ARSCL.
The multi-year mean ARSCL TCF ranges from 0.5 to 0.9 at NSA, showing a stronger seasonal variability over the Arctic than at Manus or SGP. TCF increases significantly from March to May (0.5 → 0.8), remains relatively high from May to October except for June and July, and then decreases from October to the following March. The maximum TCF occurs in August-October and the minimum occurs in March. For the available months, the ARSCL and TSK match very well and the difference between them is less than 10 %.
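A multi-year monthly mean of the kind discussed for each site can be sketched as follows. This is only an illustration under the assumption that the mean is a simple average of all available samples falling in a given calendar month; the function name and the sample values are hypothetical.

```python
def monthly_climatology(samples):
    """Average TCF per calendar month.

    samples: iterable of (month, tcf) pairs with month in 1..12.
    Returns a list of 12 values; months without data are None.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for month, tcf in samples:
        sums[month - 1] += tcf
        counts[month - 1] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical samples from several years (illustration only).
samples = [(3, 0.48), (3, 0.52), (5, 0.78), (5, 0.82), (8, 0.90)]
climatology = monthly_climatology(samples)
print(climatology[2], climatology[4], climatology[7])
```

Months with no valid measurements (for example the polar night for TSI and TSK at NSA) are returned as None rather than being filled.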

A summary and discussion for inter-comparison of three datasets
For all sites, the correlation coefficients are higher and the RMSDs are lower for the TSI/TSK comparisons than for the ARSCL/TSI comparisons. This is not surprising because ARSCL is derived from time-slice measurements with a narrow lidar/radar FOV, whereas both TSI and TSK are from hemispheric observations. There are several possible reasons for the bias between the daily TCF from ARSCL and the TSI/TSK measurements. The first possibility is the different fields of view. Although one might expect the TSI/TSK to have a higher CF than ARSCL because the hemispheric FOV instruments are more likely to be affected by cloud sides as well as cloud bases, it is also true that the narrow FOV ARSCL instrument samples only a very small fraction of the domain seen by the hemispheric view instruments (TSI and TSK). Thus, if the cloud field is not isotropic, then sampling such a small portion of the cloud field could easily lead to large biases in CF over short time periods. Pincus et al. (2005) used cloud scenes produced by a 3-D large-eddy simulation model to simulate the CF that would be seen by a vertically pointing narrow FOV instrument and compared it to the model's domain-mean CF. They found that the difference in cloud fraction varied from scene to scene and also depended on the averaging period used.
Another reason for differences in the CF is that the TSI, broadband radiometer, and radar/lidar measurements use very different techniques to detect cloud and thus have different sensitivities to different types of clouds. The MPL instrument, which is included in the ARSCL TCF, can detect very optically thin cirrus clouds that may not significantly affect the broadband SW measurements used to determine the TSK cloud amount. The significant bias between narrow FOV and hemispheric observations on a daily basis suggests that users should be extremely cautious in using these datasets to quantitatively evaluate the hourly or daily CF calculated in climate models or retrieved by satellite instruments. The Manus and NSA sites both have a large frequency of overcast cases and relatively few clear-sky cases compared to SGP. At Manus, much of the overcast is likely due to ice anvil and cirrus associated with deep convective systems, while at NSA there is often extensive low-level cloudiness. At all sites, the ARSCL frequency is less than TSI when CF is small (<0.3) and greater than TSI when CF is large (>0.8). Also, the difference between the TSI and ARSCL tends to be significant when CF is larger than 0.8 or smaller than 0.2. The compensating errors in lower and higher CF days result in small biases of TCF between ARSCL and TSI/TSK measurements when multiple years of data are averaged.
Table 2 summarizes the annual mean TCF from the different observations and some previous studies over the three sites. The mean TCF over Manus derived from the ARSCL, TSI and TSK is 0.76, 0.74, and 0.71, respectively. We did not find any previous studies that summarized annual mean TCF over Manus. The TCF ranges from 0.45 to 0.51 over SGP, based on this and some previous studies. The TCF from ARSCL at SGP is 0.51 in this study, very close to the values (0.49-0.50) from other studies, including those from synoptic weather stations. The annual TCF values based on ARSCL are larger than 0.73 over NSA, comparable to those derived from ground-based radar/lidar observations during the Surface Heat Budget of the Arctic Ocean experiment and from satellite observations over the western Arctic regions (Walsh et al., 2009).
Although the point-to-point differences are significant among the three CF datasets on a daily basis, the differences are less than 15 % in the multi-year monthly means and less than 5 % in the annual means over the three sites. The differences in annual mean TCF among ARSCL, TSI, TSK and observations from synoptic weather stations are less than 0.04 (8-9 %) over SGP. This agreement also gives us some confidence in using the narrow field of view ARSCL data to assess the models' vertical distribution of clouds. Thus the multiyear monthly means are the best available dataset to compare to the climate model results. However, this does not necessarily mean that the estimates of measured CF are unbiased, only that averaging over a longer time period and a multi-

We find that the performance of GCMs in simulating radiative fluxes is highly related to their simulation of CF. Positive biases in monthly surface downwelling SW flux can be found when the CF is underestimated (figure not shown). However, our focus here is to compare the 11 GCMs as a group in estimating CF and related radiative effects rather than to examine the performance of each individual model. Figure 5 compares the aggregate normalized standard deviation (NSD) of annual mean surface downward SW radiation under clear skies (CSWdn) and all skies (SWdn), cloud transmissivity (SWdn/CSWdn, TRANS), TCF (TSK, because it matches the surface radiation flux data), and cloud effect (CSWdn-SWdn normalized by TCF) over three sites for the 11 GCMs and for the difference between models and measurements, respectively. We define NSD as

$$\mathrm{NSD} = \frac{1}{\bar{x}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2},$$

where $x_i$ represents the annual mean value of a variable in a GCM, and $N$ is the number of GCMs used for calculation.
Here $\bar{x}$ represents an observational value or the average of $x_i$ for the 11 models. Figure 5 (top) shows that the inter-model NSD for CSWdn is less than 0.03 (3 %), and the model-measurement NSD for CSWdn is also small (less than 0.04), indicating that the models generally do a reasonably good job with surface SW radiation in cloudless skies. The sites examined generally have small aerosol optical depths, which may partly contribute to the good agreement between model and measurement for the cloudless sky fluxes. The inter-model NSD for all-sky SWdn is around 0.08-0.18, indicating around 3-5 times larger disparity among the GCMs in estimating the downward solar radiation when clouds are included. The NSD for cloud transmissivity is very close to the value for SWdn over all three sites, because the deviation in transmissivity is mainly contributed by the SWdn rather than CSWdn. At Manus and SGP, the inter-model NSD for TCF reaches 0.18-0.28, 2-3 times as large as for SWdn and transmissivity, which indicates that the inter-model disparity in TCF is much larger than in downward solar radiation. Meanwhile, the normalized cloud effect (NCE), defined as (CSWdn-SWdn)/TCF, shows a similar magnitude of NSD as TCF, indicating that other dimensions of cloud in addition to cloud amount, such as cloud optical thickness and/or cloud height, have a similar magnitude of disparity as TCF within the GCMs. This also suggests that the better agreement among GCMs in the cloudy-sky SW fluxes than in TCF or NCE could be a result of compensating effects from errors in cloud vertical structure, overlap assumption, cloud optical depth and/or cloud fraction.
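The quantities compared in Fig. 5 can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that NSD is the root-mean-square deviation of the model values about a reference value (the multi-model mean for the inter-model case, an observed value for the model-measurement case), divided by that reference, with a 1/N normalization. The function names and all flux values below are hypothetical.

```python
import math


def nsd(values, reference=None):
    """Normalized standard deviation about a reference value.

    If reference is None, the mean of the values is used (inter-model case);
    otherwise pass an observed value (model-measurement case).
    The 1/N normalization is an assumption of this sketch.
    """
    n = len(values)
    if reference is None:
        reference = sum(values) / n
    mean_sq = sum((x - reference) ** 2 for x in values) / n
    return math.sqrt(mean_sq) / reference


def transmissivity(swdn, cswdn):
    """SW cloud transmissivity, SWdn / CSWdn."""
    return swdn / cswdn


def normalized_cloud_effect(cswdn, swdn, tcf):
    """Normalized cloud effect, (CSWdn - SWdn) / TCF."""
    return (cswdn - swdn) / tcf


# Hypothetical annual mean all-sky fluxes (W m-2) from four models.
model_swdn = [180.0, 200.0, 210.0, 190.0]
print("inter-model NSD      :", round(nsd(model_swdn), 3))
print("model-measurement NSD:", round(nsd(model_swdn, reference=185.0), 3))
```

Passing the observation as the reference makes the statistic sensitive to a common model bias as well as to the spread among models, which is one way to read the slightly larger model-measurement values reported in the text.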
The NSD for the model-measurement comparison shows an overall pattern similar to that for the inter-model deviation, but the magnitude is slightly larger for all quantities (Fig. 5, bottom). The NSD for SWdn and transmissivity is 3-4 times as large as for CSWdn, and the NSD for TCF and NCE is twice as large as for SWdn and transmissivity, except for NSA where only six months of data from April to September are used. The similar overall deviation pattern between inter-model and model-measurement comparisons suggests that the climate models tend to generate larger differences against observations for those variables with larger inter-model deviation. The model-measurement NSD values for CSWdn, SWdn, and SWdn/CSWdn are similar at both Manus and SGP. At NSA, however, both inter-model deviation and model-measurement difference have similar magnitudes for TCF, SWdn and Tsw, suggesting that models have more difficulties in simulating surface radiative fluxes in high-latitude regions.

Comparison of seasonal cycle of total cloud fraction
Figure 6 shows comparisons of seasonal TCF for the 11 GCMs and their averages with the three different observations over Manus, SGP and NSA. Although overall seasonal variability of TCF is small over Manus, most of the models capture the minimum of TCF during the late spring. Simulated TCFs are more scattered during June to October than in other months. While the simulated TCFs are very diverse among the GCMs, the ensemble mean TCF (averaged over all the 11 GCMs) is close to the measurement in both magnitude and seasonal cycle.
For the long-term average, most of the GCMs (except for one model) capture the seasonal variation of TCF over the SGP, with a maximum during winter and spring and a minimum during July to September. Compared to the measurements, most of the GCMs underestimate the TCF, so the 11-model average of TCF is consistently smaller than the observations by 0.05-0.1 for almost all months.
As shown in Fig. 6, the NSA Barrow site has a relatively large cloud fraction, especially during the warm season in which the low-level cloud is persistent. This persistent large cloud coverage ensures its important role in the Arctic climate system. Unfortunately, the performance of GCMs is very diverse over NSA, especially for the cold season. The 11-model averaged TCF is close to the observations, with the exception of months from January to April, during which most of the models overpredict the CF by up to 0.4.

Frequency of occurrence vs. TCF bins
Figure 7 shows the frequency of occurrence of monthly mean TCF simulated by the 11 GCMs and calculated from observations (TSK, which has the longest record from which to calculate the PDF) over the three ARM sites. Over Manus (Fig. 7a), the observed TCF shows a narrow, nearly normal distribution, with a range from 0.4 to 0.9. Four GCMs, i.e. GISS, CCSM, UKMO and CNF Hires, generally capture the observed range and PDF pattern of TCF well. The PDF pattern from CNRM, GFDL and IPSL is slightly skewed to high TCF compared to the observed nearly normal distribution, indicating too frequent large cloud cover in those models. The simulated frequency of occurrence in PCM, INM and MPI increases dramatically from lower to higher TCF bins, with approximately 60 % of the occurrences having TCF larger than 0.9 in PCM and INM and 45 % of the occurrences having TCF larger than 0.9 in MPI. Apparently, nearly overcast days are simulated too frequently over the tropics in these three models. TCF in MRI is almost evenly distributed in bins between 0.1 and 0.8, indicating that low CF occurs too often and larger CF too rarely in this model.
Over SGP (Fig. 7b), the observed TCF also shows a near-normal distribution, with a mean of about 0.5. The PDF of TCF at SGP has a similar shape to the PDF at Manus, but values are shifted to lower bins. A few of the GCMs, such as IPSL, GFDL, MPI and CNRM, reasonably capture the near-normal distribution pattern of TCF over SGP. However, the simulated TCF in PCM and UKMO is too evenly distributed over a wider range of values. In contrast, TCF is too narrowly distributed in CNF Hires. This model, together with CCSM and GISS, shows a TCF distribution shifted to the lower bins, indicating too many cloud-free and/or low cloud-cover days in these three models. Overall the GCMs perform better at SGP than at Manus in terms of the frequency distribution. This is likely related to the weaker large scale forcing at the TWP compared to SGP, and/or to the more diverse and varying cloud regimes over the tropical region, such as frequent deep convection and cirrus cloud, which are relatively poorly simulated in GCMs.
Over NSA (Fig. 7c), observed TCFs are widely spread in bins between 0.1 and 1.0 but occur more frequently in the higher bins. Except for CNF Hires, all models tend to show a more narrowly distributed PDF that skews to the higher side of the bins, especially CNRM, INM and MPI, which means too many overcast and/or high cloud-cover days but too few low cloud-cover days in those models.
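The frequency-of-occurrence comparison above amounts to histogramming monthly mean TCF into bins. A minimal sketch follows, assuming ten equal-width bins between 0 and 1 (the bin width used in Fig. 7 is not stated in the text); the function names and sample values are hypothetical.

```python
def tcf_frequency(tcf_values, n_bins=10):
    """Relative frequency of TCF values in equal-width bins on [0, 1].

    A value of exactly 1.0 is counted in the last bin.
    """
    counts = [0] * n_bins
    for value in tcf_values:
        index = min(int(value * n_bins), n_bins - 1)
        counts[index] += 1
    total = len(tcf_values)
    return [count / total for count in counts]


def fraction_above(tcf_values, threshold):
    """Fraction of values strictly larger than the threshold."""
    return sum(1 for value in tcf_values if value > threshold) / len(tcf_values)


# Hypothetical monthly mean TCF values (illustration only).
monthly_tcf = [0.45, 0.55, 0.62, 0.71, 0.74, 0.78, 0.83, 0.95]
print(tcf_frequency(monthly_tcf))
print(fraction_above(monthly_tcf, 0.9))
```

Applying the same binning to each model and to the observations allows the shapes of the distributions to be compared directly, for example the share of occurrences with TCF larger than 0.9.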

Transmissivity vs. TCF
Not only the total cloud cover but also cloud optical properties influence the amount of SW flux reaching the surface. The absolute impact of the cloud also depends on the magnitude of incoming solar radiation. To characterize the normalized impact of clouds on the surface SW radiation, we use the SW cloud transmissivity (Tsw, or SWdn/CSWdn). It is normalized by the clear-sky downwelling flux at the surface instead of at the top of the atmosphere so that the effect of the atmosphere on the surface fluxes is minimized and the model treatment of molecular scattering, gaseous absorption, and aerosol is less important to the results (except for potential aerosol indirect effects). The monthly mean SW transmissivity is plotted against the corresponding TCF in Fig. 8 for both the observations and nine GCMs over the ARM sites (the other two GCMs did not archive SW fluxes under cloud-free skies). Since the SW flux under cloud-free skies (i.e. CSWdn) is much better simulated and has less scatter among the GCMs (see Fig. 5), the performance of models in estimating the transmissivity primarily reflects their ability to estimate the cloud influence on the SW flux under all skies (i.e. SWdn).
Figure 8a for Manus shows that the observed transmissivity ranges from 0.5 to 0.9. The observed transmissivity and TCF are highly correlated over Manus, with a correlation coefficient of −0.93. Moreover, the transmissivity decreases almost linearly with increasing TCF within the range of observed TCF. The slope of the fitted line (i.e. s = ΔTsw/ΔTCF) is −0.74, and serves as an indicator of how the aggregate cloud optical properties change with changing cloud amount. For the models, the linear fit slope serves to indicate, given the cloud amounts that the model produces, whether the resultant aggregate cloud optical properties are in line with the observations for that climate regime.
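The slope and correlation used in this comparison can be obtained from an ordinary least-squares fit of monthly transmissivity against monthly TCF. The sketch below assumes such a fit (the fitting method is not specified in the text); the function name and the sample values are hypothetical.

```python
import math


def linear_fit(tcf, tsw):
    """Least-squares fit tsw = slope * tcf + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(tcf)
    mean_x = sum(tcf) / n
    mean_y = sum(tsw) / n
    sxx = sum((x - mean_x) ** 2 for x in tcf)
    syy = sum((y - mean_y) ** 2 for y in tsw)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tcf, tsw))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical monthly means: transmissivity = SWdn / CSWdn for each month.
tcf = [0.55, 0.65, 0.70, 0.80, 0.85]
tsw = [0.80, 0.74, 0.69, 0.62, 0.59]
slope, intercept, r = linear_fit(tcf, tsw)
print(round(slope, 2), round(intercept, 2), round(r, 2))
```

A slope that is smaller in magnitude than observed means that transmissivity responds too weakly to a change in cloud amount, which is the behavior the text attributes to several of the models.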
The correlation coefficients between transmissivity and TCF over Manus simulated by the nine GCMs range from −0.74 to −0.96, with most GCMs having less correlation between transmissivity and TCF than seen in the observations. The simulated transmissivity, ranging from 0.4 to 1.0, is overall nominally consistent with the observed. However, the slope (or ΔTsw/ΔTCF) varies from −0.51 to −0.96, indicating a very wide range of different cloud transmissivity changes per unit TCF change among the GCMs. For example, the ΔTsw/ΔTCF values for UKMO and CNF Hires are −0.51 and −0.52, respectively, the smallest in magnitude among all GCMs. Both these models exhibit an underestimation of TCF (Fig. 6a) with larger transmissivities at the lower TCF range. At the same time, these models overestimate the transmissivity for larger TCF compared to the observations. These differences suggest that the cloud optical thickness is underestimated in these models, resulting in a smaller ΔTsw/ΔTCF than observed. The ΔTsw/ΔTCF is also underestimated in MRI, which also has too many lower TCF values, and in GFDL, which has too many large TCF occurrences. However, ΔTsw/ΔTCF is −0.96 in MPI, which is much larger in magnitude than in the observations or other GCMs. Here again, the MPI TCF frequency is biased toward large TCF, with anomalously high transmissivity for the few occurrences of TCF in the 40-60 % TCF range. This indicates that the clouds in MPI are too optically thin for the mid-TCF values; however, this is compensated for by the overestimated TCF in MPI (see Fig. 6a) and still results in a reasonable estimation of surface SW flux. Other models show a more reasonable agreement for ΔTsw/ΔTCF over Manus. The slope of transmissivity against TCF and the correlation coefficients between them for the observations and all GCMs are summarized in Table 3.
Over SGP, the observed transmissivity ranges from about 0.5 to 0.9, similar to that over Manus, and the correlation coefficient between TCF and transmissivity is the same as at Manus. The ΔTsw/ΔTCF is −0.70, slightly smaller than at Manus. Except for MRI, all GCMs generally underestimate the ΔTsw/ΔTCF over SGP (however, MRI had a large underestimate of ΔTsw/ΔTCF at Manus). The minimum ΔTsw/ΔTCF is −0.43 in IPSL, almost 40 % smaller than the observed. As discussed in Sect. 4.2, most of the GCMs significantly underestimate the TCF over SGP. This indicates that current global models tend to remarkably underpredict both TCF and ΔTsw/ΔTCF over SGP, i.e. cloud cover is smaller and the models tend to generate larger transmissivities at the larger TCF values. For instance, the observations give transmissivity values ranging from 0.5 to 0.7 for TCF from 0.6 to 0.7. Yet the models (except for MRI) range from about 0.6 to 0.7 for the same range of TCF, which then produces the underestimation of ΔTsw/ΔTCF.
Over NSA, a greater variety of ΔTsw/ΔTCF can be found for both observations and models (Fig. 8c). This is at least in part due to the bi-modal behavior of the relationship for snow-covered and non-snow-covered ground. In the snow-covered case, multiple reflection of SW between the surface and the clouds increases the SWdn, which increases the SWdn/CSWdn ratio, so that the ratio includes not only the actual cloud transmission but also the multiple reflection. The snow-covered ground cases are those in the upper right of the observations, with both the snow-covered and non-snow-covered cases producing about the same ΔTsw/ΔTCF slope. The transmissivity varies from 0.2 to 0.9 in GCMs for TCF larger than 0.9, indicating more divergence in transmissivity and cloud optical thickness under skies with larger cloud fraction. The actual changes are asymptotic with respect to changes in TCF because the cloud optical depth usually tends to increase with increasing cloud fraction. While a value of −0.71 for the observed ΔTsw/ΔTCF is close to that at the SGP and Manus sites, the models give very diverse predictions for ΔTsw/ΔTCF, ranging from −0.60 in CNF Hires to −1.50 in INM. Both ΔTsw/ΔTCF and TCF (see Fig. 6) are significantly underestimated in CNF Hires, showing too large transmissivity values for larger TCF in this model. In contrast, ΔTsw/ΔTCF is overestimated by more than 100 % in the INM, with an attendant lack of smaller TCF values. Seven models overestimate and two models underestimate the ΔTsw/ΔTCF over NSA, with most models exhibiting little stable correlation between TCF and transmissivity, and thus no well-defined bi-modal behavior in the relationship.

Evaluation of cloud vertical structure
In Sect. 4 we comprehensively evaluated the TCF simulated by the GCMs and its impact on mean cloud transmissivity and the surface SW flux. Here, we use the long-term ARSCL observations to evaluate the vertical structure of cloud fraction in the models. For simplicity in discussion, low, middle and high clouds are defined as those located at heights of 0-3 km, 3-6 km and higher than 6 km, respectively.

the observed. Overall, most of the GCMs tend to underpredict CF in the lower and mid troposphere. The CCSM is an exception as it significantly overpredicts CF of low and middle clouds. The observation shows a remarkable seasonal variation of CF at different levels (Fig. 10) with a minimum CF in April and a maximum CF in July at lower levels. Except for CCSM, none of the GCMs capture this seasonal variation of CF at lower levels over Manus.
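The low, middle and high cloud definitions used in this section can be applied to a vertical CF profile as in the sketch below. Assigning the boundary heights (exactly 3 km and 6 km) to the upper layer, and averaging CF over the levels in each layer, are assumptions of this sketch; the function names and profile values are hypothetical.

```python
def cloud_layer(height_km):
    """Classify a height as low (0-3 km), middle (3-6 km) or high (> 6 km)."""
    if height_km < 3.0:
        return "low"
    if height_km < 6.0:
        return "middle"
    return "high"


def layer_mean_cf(heights_km, cf_profile):
    """Mean CF over the profile levels that fall in each layer."""
    sums = {"low": 0.0, "middle": 0.0, "high": 0.0}
    counts = {"low": 0, "middle": 0, "high": 0}
    for height, cf in zip(heights_km, cf_profile):
        layer = cloud_layer(height)
        sums[layer] += cf
        counts[layer] += 1
    return {k: (sums[k] / counts[k] if counts[k] else None) for k in sums}


# Hypothetical annual mean profile (illustration only).
heights = [0.5, 1.5, 2.5, 4.0, 5.5, 8.0, 12.0, 16.0]
cf = [0.10, 0.12, 0.08, 0.06, 0.07, 0.15, 0.20, 0.05]
print(layer_mean_cf(heights, cf))
```

The same function can be applied to an observed profile and to each model profile so that the comparison by layer uses identical height ranges.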

Manus
The high level CF in almost all GCMs is substantially larger than the observation over Manus. The high level CF averaged in all 7 GCMs is around 3 times as large as that in the ARSCL observation. This is possibly to some extent a result of the different thresholds determining thin cloud in climate models and ARM measurements, and of the sensitivity limits of the ARM MMCR and MPL for high altitude clouds. The choice of thresholds in determining thin high-cloud is somewhat arbitrary in climate models and in lidar/radar retrievals. The CF at high altitudes is almost linearly dependent on the cutoff value of the optical thickness of these thin high-clouds. Except for the CCSM, none of the GCMs capture the seasonal variability of CF at high level. The GCMs such as INM and CNF Hires show no seasonal variation for the high level cloud.
Cloud top in most of the GCMs is notably higher in comparison to the ARSCL observations at Manus. The cloud top height is around 17 km in ARSCL, while except for MRI, the cloud top height reaches 19-20 km in most of the GCMs. The cloud top height in ARSCL is probably underestimated to some extent as the radar cannot detect small particles at the top of ice clouds and the lidar is often attenuated in optically thick ice cloud before reaching cloud top. It is interesting that only CCSM can capture the seasonal variability of CF at both low and high levels, although CF in CCSM is larger than the observation at all levels. Meanwhile, the TCF simulated in CCSM is close to that in ARSCL (Tables 2 and 3), which indicates that the cloud overlap scheme in CCSM produces a result similar to the observation. However, we should keep in mind that the overlap derived from the ARSCL observations is not necessarily the true overlap due to limitations in the measurements (such as the difficulty of lidar to penetrate through thick clouds and the lack of detection of thin layers by radar). The model-measurement difference reaches a maximum at around 8-16 km, which is also the height where the maximum of inter-model deviation of CF is located. Figure 9b shows that the standard deviation (SD) of CF among the 7 GCMs is larger than 0.15 between 8-16 km, while it is less than 0.1 below 6 km. The larger inter-model deviation at higher levels implies that the current GCMs have more problems in simulating the high clouds (e.g. cirrus or ice clouds) over the tropical region. Much of the high cloud observed over Manus lies in the outflow from deep convection over the Maritime Continent (Mather, 2005), indicating that the GCMs likely have trouble representing the full radiative impact of tropical deep convection systems. As argued by Waliser et al. (2009), the shortcomings in the representation of these clouds impact both the latent and radiative heating processes, and in turn the circulation and the energy and water cycles, leading to errors in weather and climate forecasts and to uncertainties in quantifying cloud feedbacks associated with global change.

SGP
Figures 11 and 12 show the annual mean vertical profiles and monthly time-height plots of CF from the ARSCL observation and GCMs over SGP. The observed CF has a bimodal vertical distribution with a higher peak around 6-10 km and a lower one below 2 km. The maximum CF of high clouds occurs during the winter and spring (Fig. 12) when baroclinic wave activity is common over the ARM SGP site (Xi et al., 2010). High-cloud fraction also varies somewhat with the tropopause heights by season due to the change in thermal thickness of the atmosphere. CF is relatively smaller during July to September, especially for low clouds, which is consistent with that for TCF as shown in Fig. 6b.
The GISS and MRI simulate the smallest CF at all levels, while CNF Hires and INM simulate the largest CF at higher levels. Most of the GCMs tend to underpredict CF by 50-150 % at low and middle levels. The CF averaged over all GCMs is around half and two-thirds of the values of ARSCL at low and middle levels, respectively. Except for MRI, all other GCMs fail to capture the distinct boundary layer cloud during winter and spring in their simulations. The ARSCL CF at high level is larger than that predicted in the GISS and MRI, but is smaller than that in the other models. The mean CF of all GCMs is only slightly larger than that for the ARSCL observation at high levels. While most of the models capture the minimum CF at low level during July to September, only CCSM reasonably captures the seasonal variation of CF at high level.
While the model-measurement difference is larger at lower levels below 5 km, the inter-model deviation of CF is larger for high clouds (i.e. above 6 km). Figure 11b shows that the SD of CF among the 7 GCMs is around 0.07 between 7-13 km, and is less than 0.03 below 6 km. The SD for both high and low clouds over SGP is only half as large as that over Manus. The smaller inter-model deviation at SGP suggests that the current GCMs perform more consistently in simulating the vertical distribution of CF over the mid-latitude continent than over the tropics, which could partly result from the much stronger large scale forcing for SGP compared to the TWP regime.

NSA
The observed and simulated annual mean vertical profiles of CF over NSA are shown in Fig. 13. Unlike Manus and SGP, most of the clouds at NSA are low level clouds below 1-2 km. The CF gradually decreases with height and the maximum cloud top height is around 10-12 km. Therefore, the total cloud cover examined earlier is dominated by low clouds over NSA, either single-layered or multi-layered systems with a significant low-cloud component.
The observed and simulated monthly mean time-height plots of CF are shown in Fig. 14. The maximum CF of low clouds occurs in late spring, characterized by optically thin cloud, and in late summer and fall, with more optically dense clouds. The deeper boundary layer and low clouds in late summer and fall than in late spring are likely due to the retreat of sea ice from the north coast of Alaska, which increases moisture fluxes into the lower atmosphere. In contrast, the cloudiness in May is typical of continental landmasses in spring (i.e. scattered "fair weather" cumulus clouds that form on an otherwise clear day). In a low-sun (large solar zenith angle) environment such as the Alaskan North Slope, the scattered cumulus clouds scatter a significant fraction of the downwelling solar flux sideways to the surface (Dong et al., 2010).
While the GCMs generally capture the maximum CF in the boundary layer and vertical variability (i.e. decreasing with height) of CF, some GCMs (e.g. PCM) tend to overpredict the CF at high level and in the boundary layer. It can be seen from Figs. 12 and 14 that a fixed cloud top is probably applied in the INM. Figure 13b shows that the SD of CF among the 7 GCMs is around 0.2 near the surface and gradually decreases with height. It becomes constant (around 0.05-0.06) between 2-10 km.

Variability among ensemble runs within the same GCM
For a few IPCC AR4 models, several ensemble simulations have been conducted, which is important for identifying the internal variability and uncertainty of the model results, especially in projecting future climate change. For example, GISS has four and IPSL has six ensemble simulations in CMIP3. Previous studies have typically focused on evaluating the model internal variability and uncertainty for the simulated temperature and precipitation. It is interesting to examine the internal variability of CF among ensemble simulations.
Figure 15 shows the vertical profiles of CF for four GISS and six IPSL simulations, and their SD among ensemble runs over Manus. The results show that the variability of CF among ensemble runs in the same GCM is minor at all levels. For example, SD for both GISS and IPSL is usually less than 0.005 at Manus, around 2-10 % of the inter-model SD. Similar conclusions are found over the SGP and NSA (not shown). This indicates that the internal variability of CF in the same model with ensemble simulations is insignificant.
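The comparison between ensemble spread and inter-model spread can be sketched as a level-by-level standard deviation across profiles. The sketch assumes a population (1/N) standard deviation, which is not stated in the text; the function names and all profile values are hypothetical.

```python
import math


def sd_by_level(profiles):
    """Standard deviation across profiles at each vertical level.

    profiles: list of CF profiles (one per ensemble run or per model),
    all defined on the same levels. Uses 1/N normalization (an assumption).
    """
    n = len(profiles)
    result = []
    for level in range(len(profiles[0])):
        column = [profile[level] for profile in profiles]
        mean = sum(column) / n
        result.append(math.sqrt(sum((v - mean) ** 2 for v in column) / n))
    return result


def spread_ratio(ensemble_profiles, model_profiles):
    """Ensemble SD divided by inter-model SD at each level."""
    ens = sd_by_level(ensemble_profiles)
    inter = sd_by_level(model_profiles)
    return [e / m if m > 0 else None for e, m in zip(ens, inter)]


# Hypothetical CF profiles on three levels (illustration only).
ensemble_runs = [[0.100, 0.200, 0.300], [0.104, 0.198, 0.306]]
models = [[0.05, 0.15, 0.20], [0.15, 0.25, 0.50], [0.10, 0.20, 0.35]]
print(spread_ratio(ensemble_runs, models))
```

A ratio well below one at every level corresponds to the situation described in the text, where the spread among runs of one model is small compared with the spread among different models.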

Summary and discussion
Cloud fraction (CF) has long been recognized as the dominant modulator of radiative fluxes. In this study, we evaluate CF simulations in the IPCC AR4 GCMs against ARM ground measurements at a climatological time-scale, with a focus on the vertical structure, total amount of cloud and its effect on cloud transmissivity, for both inter-model deviation and model-measurement discrepancy. The frequency of hydrometeor occurrence statistics derived from the Active Remotely-Sensed Clouds Locations (ARSCL) observation, the Total Sky Imager (TSI), and the Total Sky Cover (TSK) derived from surface SW radiometers are all CF or sky cover related products available at ARM sites. Our inter-comparisons reveal that they are correlated with each other but the daily differences are quite significant, suggesting that one should be extremely cautious in using transient (hourly or daily mean) CF data to quantitatively evaluate CF calculated in climate models or retrieved by satellite. A common feature among the three sites is that the TSI produces smaller TCF compared to a radar/lidar dataset for highly cloudy days (CF > 0.8), but produces a larger TCF value than the radar/lidar for less cloudy conditions (CF < 0.3). The compensating errors in lower and higher CF days result in a small bias of TCF between the vertically pointing radar/lidar dataset and the hemispheric TSI measurements when multi-year data are averaged. The differences are usually less than 10 % among their multi-year monthly mean values and less than 5 % among their annual mean values, which gives more confidence in using ARSCL CF to evaluate the GCM climatology simulations.
Detailed comparisons of the GCM results with the ARM observations reveal that the model bias against the observation and the inter-model deviation (disparity) have a similar magnitude for the total CF (TCF) and for the normalized cloud effect, and they are twice as large as those for the surface downward solar radiation and cloud transmissivity. This implies that the other dimensions of cloud, such as cloud optical depth and height, have a similar magnitude of disparity to TCF among the GCMs, and suggests that the better agreement among the GCMs in solar radiative fluxes could be a result of compensating errors in cloud vertical structure, cloud optical depth or cloud fraction. The similar deviation pattern between inter-model and model-measurement comparisons suggests that the climate models tend to generate larger bias against observations for those variables with larger inter-model deviation. The simulated TCFs from IPCC AR4 GCMs are very scattered through all seasons over the three ARM sites (SGP, Manus and NSA). The GCMs perform better at SGP than at Manus and NSA in simulating the seasonal variation and probability distribution of TCF; however, the TCF in these models is remarkably underpredicted and cloud transmissivity is less susceptible to the change of TCF than observed at SGP.
Most of the GCMs tend to underpredict CF and fail to capture the seasonal variability of CF at middle and lower levels in the tropics. The high level CF is much higher in the GCMs than in the observation and the inter-model variability of CF also reaches a maximum at high level in the tropics. Most of the GCMs tend to underpredict the CF by 50-150 % at low and middle levels over SGP. Unlike clouds over Manus and SGP, most of the clouds at NSA are in the lower troposphere. While the GCMs generally capture the maximum CF in the boundary layer and vertical variability, the inter-model deviation is largest near the surface over the Arctic. The internal variability of CF simulated in ensemble runs with the same model is minimal.
While the results in this study could be valuable for advancing our understanding of the CF-related data that are available at ARM sites and for providing insights for the climate modeling community in improving the cloud-radiation interaction and the CF parameterization in climate models, several uncertainties should be taken into account in interpreting the results of this study.The primary one is the cutoff value of the optical depth to define cloudiness in the various observations and climate models.The CF at high altitudes is almost linearly dependent on the cutoff value of the optical thickness of these thin high-clouds.However, the choice of thresholds in determining thin cloud is somewhat arbitrary in climate models and in lidar/radar retrievals.The different cutoff values in defining the clouds in the ARM measurements and in the GCMs can result in appreciable uncertainty in comparing the CF in the observations and models.This uncertainty will be reduced once the cloud radar/lidar simulators are implemented into GCMs.
The second uncertainty is related to the difference in spatial coverage between the point observations and the model results. The GCM grid variables represent an average over a grid cell (e.g. 200 km × 200 km, which is a typical horizontal grid of AR4 GCMs at lower latitudes). The ARSCL product is essentially based on vertically pointing instruments with a narrow FOV that can only detect clouds directly above the instruments. This is a common problem when evaluating model results, although the uncertainty introduced by this factor could be reduced by increasing the model spatial resolution or by averaging more years of data.
The third uncertainty is the limited length of the valid measurement data available at the ARM sites. Although the CF and surface radiative flux data are available from the late 1990s at most of the sites, less than 10 yr of ARSCL and TSI data are used for some variables in this study because of data gaps caused by instrument downtime. However, the surface radiation data are mostly complete, and we show good agreement between the TSI and TSK TCF values, suggesting that the more continuous TSK data are well suited to long-term comparisons of TCF. Nevertheless, a complete 20-yr dataset would be preferable for this kind of study in a climatological sense.
It is highly desirable to see whether the positive and negative attributes of model clouds can be associated with specific physical parameterizations. The results of this study show that no particular model with a specific cloud scheme performs better than the other models in all aspects of CF simulation at all three sites. The underestimation of TCF and the overestimation of cloud optical thickness over SGP are common to models that use very different cloud schemes; however, this could be due to completely different reasons. As suggested by Webb et al. (2001), many other model components, such as the vertical resolution and cloud microphysical properties, can be as important as the cloud and precipitation schemes in assessing clouds in models. Without controlled experiments that isolate individual physical parameterization components, it is difficult to pinpoint the source of the model differences.
In future work, we plan to do such experiments for the physics parameterizations used in the Community Atmosphere Model (CAM5). Colleagues at PNNL have implemented the CAM5 physics package in the Weather Research and Forecasting (WRF) model (J. Fast, personal communication, 2011), allowing examination of the behavior of the physics parameterizations over a range of scales, including those closer to the scale of the ARM observations. The WRF model can be run using all of the CAM5 physics or only individual components, which will allow exploration of the effects of individual parameterizations on the resulting cloud fraction and radiation relationships. The WRF model can also be run at very high spatial resolution, which can reduce the uncertainty in model evaluation induced by the different spatial coverage of model and measurement. Forcing the model with reanalysis data will also reduce the discrepancies in large-scale dynamics between model and observations that can exist in free-running climate models and that can be one cause of model/observation disagreement. We will also save hourly output from the model to allow us to investigate the diurnal cycle of cloudiness in the parameterization schemes.
Inclusion of a ground-based radar and lidar simulator in the model will allow more direct assessment of cloud overlap, vertical structure, fall velocity, cloud phase, and cloud microphysical assumptions against the ARM radar observations. New radar observations and techniques such as radar spectra measurements provide vertical velocity statistics that can be used to examine assumptions in convective parameterizations (Kollias and Albrecht, 2010), better identification of regions with multi-modal characteristics such as mixed-phase regions (Shupe et al., 2004), and better discrimination between cloud and drizzle (Kollias et al., 2011), which will be useful for investigating autoconversion rates. The satellite simulator has been installed in some of the IPCC AR5 GCMs; we may repeat the analysis for the IPCC AR5 GCMs and compare the results with those from this study.

Fig. 1. The time series of daily total sky cover or cloud fraction based on ARSCL, TSI, TSK and ARSCL-TSK over Manus, averaged only over daytime hours for 29 April to 8 July 2006 (top) and 16 January to 26 March 2007 (bottom), when all three datasets are available and both MPL and MMCR operate normally in ARSCL. The right panels show examples of total sky images.

Fig. 2. The scatter plots of daily total sky cover or cloud fraction based on ARSCL and TSI (left) and TSI and TSK (right) over Manus (top), SGP (middle) and NSA (bottom). Only the days when both datasets are available are included in each plot. At NSA, only days from April to September (when solar elevation is high enough for reliable TSI and TSK measurements) are used. For each panel, the correlation coefficient and root mean square deviation (RMSD) are given.
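The two statistics quoted in each panel can be computed as in the following minimal sketch. It assumes that a missing day is stored as NaN and that the Pearson correlation is intended; the daily values are hypothetical, not ARM data.

```python
import numpy as np

def compare_daily_series(a, b):
    """Correlation, RMSD and sample count for days when both series are available."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    both = ~np.isnan(a) & ~np.isnan(b)  # keep only days with both datasets
    a, b = a[both], b[both]
    r = np.corrcoef(a, b)[0, 1]  # Pearson correlation coefficient
    rmsd = np.sqrt(np.mean((a - b) ** 2))
    return r, rmsd, int(both.sum())

# Hypothetical daily cloud fractions; NaN marks a day without valid data.
arscl = [0.2, 0.5, np.nan, 0.9, 0.4]
tsi = [0.3, 0.6, 0.7, np.nan, 0.5]
r, rmsd, n_days = compare_daily_series(arscl, tsi)
print(f"r = {r:.2f}, RMSD = {rmsd:.2f}, n = {n_days}")
```

In this made-up case the two series differ by a constant offset, so the correlation is perfect while the RMSD is nonzero, which is why both statistics are worth reporting together.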
plicity of cloud types tends to offset detection differences between the different instruments. Meanwhile, Xi et al. (2010) analyzed one decade of ARM ARSCL and Geostationary Operational Environmental Satellite (GOES) observations at the SGP site and revealed an excellent agreement in the long-term mean CF derived from the surface and GOES data. Dong et al. (2006), Xi et al. (2010), and Kennedy et al. (2010) have also found ARSCL CF to be statistically representative of the entire sky in long-term monthly or annual averages when compared with long-term satellite and surface observations, suggesting that the long-term ARM point observations can represent large areal observations. The consistency between long-term mean narrow FOV, hemispheric and satellite observations provides confidence for using ARM multi-year averaged monthly data to evaluate CF in climate models.

4 Evaluation of surface radiation flux and total cloud amount

4.1 Inter-model divergence and model biases

Here the TCF and the surface SW fluxes simulated in the major GCMs in IPCC-AR4 are inter-compared, and they are also evaluated against ARM measurements over all three sites. We compare the SW radiative fluxes under both cloud-free and cloudy skies and also calculate the cloud radiative effects and cloud transmissivity, attempting to investigate the role of TCF and other dimensions of cloud in contributing to the biases of simulated solar radiation fluxes in the GCMs.

Fig. 6. The multi-year averaged monthly mean TCF for 11 individual GCMs and their average, along with three different observations, over Manus (top), SGP (middle) and NSA (bottom).

Figure 9 (left) shows the annual mean vertical profiles of cloud fraction (CF) derived from ARM ARSCL observations and simulated by seven GCMs over Manus, and the right panel shows the standard deviation among the models. Figure 10 shows the monthly time-height composite plots of CF. Simulated CF in most of the GCMs differs substantially from

Fig. 9. The annual mean vertical profiles of cloud fraction (CF) from ARSCL observations and GCMs (left) and their standard deviation among seven GCMs (right) over Manus. The GCM results for 1980-1999 and the ARSCL observations for 1999-2008 are used for the climatological averages.

Fig. 10. The monthly mean time-height plots of cloud fraction (CF) from ARSCL observations and GCMs over Manus.

Fig. 15. The vertical profiles of cloud fraction (CF) for four GISS and six IPSL ensemble simulations (left), and their standard deviations (SD, right) among ensemble runs over Manus.

Table 1. A summary of cloud fraction schemes and data availability in the IPCC AR4 GCMs used in this study.
Clothiaux et al. (2000)st Estimate (CMBE, Xie et al., 2010 and also at http://science.arm.gov/wg/cpm/scm/bestestimate.html). In this algorithm, a cloud point is first determined by MMCR or MPL and then screened by the best estimate of cloud base from both laser ceilometers and MPL to minimize the problem caused by precipitation in determining cloud base. As indicated in Clothiaux et al. (2000), the laser ceilometers and MPL can provide quite accurate cloud base measurements because they are usually insensitive to ice precipitation (if the concentration of precipitation particles is not sufficiently large) or clutter. The ARSCL CF is then calculated by averaging the cloud mask points (where cloudy, clear or missing points are set) in the one-hour time period. Therefore, the ARSCL CF actually represents the frequency of cloud occurrence rather than fractional cloud area coverage.
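The hourly averaging of cloud mask points described above can be sketched as follows. This is only an illustration under stated assumptions: the flag values (1 = cloudy, 0 = clear, -1 = missing), the exclusion of missing points from the average, and the number of samples per hour are hypothetical and are not taken from the ARSCL product documentation.

```python
import numpy as np

# Hypothetical flag values for the cloud mask (assumed, not the ARSCL encoding).
CLOUDY, CLEAR, MISSING = 1, 0, -1

def hourly_cloud_frequency(mask, samples_per_hour):
    """Frequency of cloud occurrence per hour, ignoring missing points.

    Hours with no valid points are returned as NaN.
    """
    mask = np.asarray(mask)
    n_hours = mask.size // samples_per_hour
    hourly = mask[: n_hours * samples_per_hour].reshape(n_hours, samples_per_hour)
    n_valid = (hourly != MISSING).sum(axis=1)
    n_cloudy = (hourly == CLOUDY).sum(axis=1)
    cf = np.full(n_hours, np.nan)
    # Divide only where at least one valid point exists; other hours stay NaN.
    np.divide(n_cloudy, n_valid, out=cf, where=n_valid > 0)
    return cf

# Three hypothetical hours with four mask points each.
mask = [1, 1, 0, 0,   1, -1, -1, 0,   -1, -1, -1, -1]
cf = hourly_cloud_frequency(mask, samples_per_hour=4)
print(cf)
```

Because each value is a fraction of time steps rather than of sky area, the result is a frequency of occurrence, consistent with the interpretation given in the text.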

Table 2. Annual mean total cloud fraction (TCF) over the three sites.

Table 3. Slope and correlation coefficient (in parentheses) between cloud cover and cloud transmissivity based on ARM data and the results of nine GCMs over the three ARM sites.
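A slope and correlation coefficient of this kind can be obtained with an ordinary least-squares fit, as in the minimal sketch below. The use of a first-degree least-squares fit and the Pearson correlation is an assumption about the method, and the data points are hypothetical.

```python
import numpy as np

def slope_and_correlation(cloud_cover, transmissivity):
    """Least-squares slope of transmissivity against cloud cover, and Pearson r."""
    x = np.asarray(cloud_cover, dtype=float)
    y = np.asarray(transmissivity, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)  # first-degree polynomial fit
    r = np.corrcoef(x, y)[0, 1]
    return slope, r

# Hypothetical monthly values: transmissivity decreases as cloud cover increases.
cover = [0.2, 0.4, 0.6, 0.8]
trans = [0.9, 0.8, 0.7, 0.6]
slope, r = slope_and_correlation(cover, trans)
print(f"slope = {slope:.2f} (r = {r:.2f})")
```

A slope of smaller magnitude in a model than in the observations would correspond to cloud transmissivity being less susceptible to changes in cloud cover.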