Journal of Geophysical Research: Atmospheres
CAUSES: Attribution of Surface Radiation Biases in NWP and Climate Models near the U.S. Southern Great Plains

Many Numerical Weather Prediction (NWP) and climate models exhibit too warm lower tropospheres near the midlatitude continents. The warm bias has been shown to coincide with important surface radiation biases that likely play a critical role in the inception or the growth of the warm bias. This paper presents an attribution study on the net radiation biases in nine model simulations, performed in the framework of the CAUSES project (Clouds Above the United States and Errors at the Surface). Contributions from deficiencies in the surface properties, clouds, water vapor, and aerosols are quantified, using an array of radiation measurement stations near the Atmospheric Radiation Measurement Southern Great Plains site. Furthermore, an in-depth analysis is shown to attribute the radiation errors to specific cloud regimes. The net surface shortwave radiation is overestimated in all models throughout most of the simulation period. Cloud errors are shown to contribute most to this overestimation, although nonnegligible contributions from the surface albedo exist in most models. Missing deep cloud events and/or simulating deep clouds with too weak cloud radiative effects dominate the cloud-related radiation errors. Some models have compensating errors, with deep clouds occurring too often but their radiative effect being largely underestimated, while other models miss deep cloud events altogether. Surprisingly, even the latter models tend to produce too much and too frequent afternoon surface precipitation. This suggests that rather than issues with the triggering of deep convection, cloud radiative deficiencies are related to too weak convective cloud detrainment and too large precipitation efficiencies.


Introduction
Despite continued improvements in the representation of atmospheric physics and the relentless increase in spatial resolution, many Numerical Weather Prediction (NWP) and climate models still suffer from persistent biases in a number of key variables. One particular bias that is shared between a vast number of NWP and climate models is a too warm lower troposphere near the midlatitude continents in summer (Ahlgrimm & Forbes, 2012; Brisson et al., 2016; Van Weverberg et al., 2015). Given the nature of this bias and the unequivocal importance of the surface temperature for both weather forecasting and climate applications, there is an urgent need to fully understand its origin.
Research so far has brought forward three hypotheses about why the midlatitude continents, and in particular the Southern Great Plains (SGP) in the American Midwest, tend to be too warm in many atmospheric models. Klein et al. (2006) postulated that mesoscale convective systems, initiated in the lee of the Rocky Mountains, are underrepresented in many of the global circulation models (GCMs) that they investigated. A lack of convective clouds directly leads to excessive downwelling shortwave (SW) radiances and hence enhanced heating of the surface and the boundary layer. Moreover, the lack of convective rainfall could result in soils being too dry and hence overestimated Bowen ratios, further exacerbating the excess heating of the surface. Other studies have hinted at deficiencies in the representation of shallow cumulus clouds near the SGP as the main

It should be emphasized that all data in the ARMBE are purely from the observations and hence are not influenced by any background model. Data were processed with a 1 h temporal resolution (similar to the radiation time step in most models under consideration) and averaged over a 2° × 2° grid surrounding the ARM SGP Central Facility, using the methods described in Zhang et al. (2001). This area includes measurements from the SGP Central Facility in Lamont, Oklahoma (E13) and three of the extended facilities (E9, E11, and E15). The location of these four sites is shown in Figure 1.
Surface precipitation data were also used from the ARMBE gridded 2-D product, which averages hourly precipitation data from the Oklahoma and Kansas Mesonet and the ARM standard rain gauge data to a 4° × 4° grid with a resolution of 0.25° × 0.25°, surrounding the ARM SGP Central Facility.

Central Facility Data
For the cloud regime analysis, vertically distributed information about the cloud position is required. While satellites with passive instruments can provide this information to some extent, they cannot detect low clouds when a high cloud layer is present. This complicates the analysis, in particular, for convective clouds with high cloud tops, one of the main regimes of interest. While this could be addressed by using active remote sensors such as those on CloudSat, the orbit is such that it will only pass over a given location once every few days. Therefore, it was deemed more sensible to make use of the detailed cloud observations at the SGP ARM Central Facility (Table 2). The Active Remote Sensing of CLouds (ARSCL) ARM Value-Added Product (VAP) provides very detailed vertically distributed information of clouds, based on cloud radar, lidar, ceilometer, and radiometer data (Clothiaux et al., 2000).
Table 1 note. Provided are the main model reference, the number of vertical levels, the approximate grid spacing (dx), the radiation time step, and the references for the radiation and convection parameterizations.

By averaging the binary ARSCL cloud locations over a time window, "observed" cloud fractions can be obtained that can be compared against the model cloud fractions. To be consistent with the model grid resolution, this averaging time window is derived for each model separately by dividing the horizontal model grid spacing by the observed wind speed, averaged between 1 and 10 km altitude. Using the actual wind speeds at each level (and hence applying a different time window for each model level) did not change the results much. Observed wind speeds are obtained from the Merged Sounding ARM product (Table 2). This approach has been widely used in model evaluation studies (Bouniol et al., 2010; Illingworth et al., 2007; Morcrette et al., 2012; Van Weverberg et al., 2015; Yang et al., 2006) and should give reliable results for lower-order moments and long time series (Gruetzun et al., 2013). It should be emphasized that this method yields slight differences in the observed cloud fractions that each of the models is compared against. Indeed, the smaller the model grid spacing, the more the inferred cloud fractions should tend to 0 or 1. Wu et al. (2014) compared six different estimates of cloud fraction near the SGP for 10 years, including ARSCL and GOES-11. They found ARSCL-derived cloud fractions to be typically about 8% larger than GOES-11. Given that it is hard to judge which of the two observations should be considered more reliable, we assume an uncertainty of 8% for all cloud fractions in our analysis, but it should be stressed that it is hard to provide an objective estimate of the real uncertainty in cloudiness.
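The time-window averaging described above can be illustrated with a minimal sketch. The function names, the sampling interval, and the use of non-overlapping windows are assumptions made for illustration only; the text does not specify how the windows are laid out.

```python
def averaging_window_seconds(grid_spacing_km, wind_speed_ms):
    """Time needed to advect a cloud field across one model grid box."""
    if wind_speed_ms <= 0:
        raise ValueError("wind speed must be positive")
    return grid_spacing_km * 1000.0 / wind_speed_ms


def windowed_cloud_fraction(mask, dt_s, window_s):
    """Average a binary cloud mask at one height over consecutive windows.

    mask: list of 0/1 cloud detections sampled every dt_s seconds
          (hypothetical sampling; not taken from the paper).
    Returns one cloud fraction per complete, non-overlapping window.
    """
    n = max(1, int(round(window_s / dt_s)))
    return [sum(mask[i:i + n]) / n for i in range(0, len(mask) - n + 1, n)]
```

For example, a 30 km grid spacing and a 10 m/s mean wind give a 3000 s (50 min) window, so a coarser model is compared against cloud fractions averaged over a longer time.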
The goal of the cloud regime analysis in this study is to match cloud regimes with errors in the surface radiation balance. Hence, it is important that the cloud property observations are as consistent as possible with the surface radiation measurements. Therefore, only the radiometer data from the ARM Central Facility (E13) were used for the cloud regime attribution analysis, ignoring radiation measurements from the ARM extended facilities (E9, E11, and E15; Figure 1). For consistency with the cloud information, all radiation data were averaged over the same time window as the observed cloud fraction data. As for the ARMBE data described in the previous section, clear-sky radiative fluxes are obtained following the empirical fitting techniques described in Long and Ackerman (2000) and Long and Turner (2008). All-sky radiation uncertainties are taken from the ARM Solar Infrared Radiation Station Handbook (Stoffel, 2005) and represent inherent instrument calibration uncertainties. These are conservative estimates based on experiences with renewable energy applications. Clear-sky SW and LW irradiance uncertainties are determined by comparing 1 min clear-sky estimates against detected clear-sky data (Long & Ackerman, 2000) and a radiative transfer model (Long & Turner, 2008), respectively.
TOA fluxes were obtained from the Satellite Cloud Observations and Radiative Property retrieval System (SATCORPS; previously referred to as VISST) ARM Value-Added Product, which employs four visible and infrared channels from the GOES-11 geostationary satellite (Minnis & Smith, 1998; Minnis et al., 2011). This product is available at the native satellite resolution ("pixel-level," about 4 km) covering the region from 32°N-42°N and 105°W-91°W, surrounding the SGP ARM site, with a temporal resolution of 15 min. For the period from April to June, version 3 of this product was used, while version 4 was used for July and August.
The changes between the two versions should be minor and include slight improvements to the ice cloud properties and cloud masks near twilight. To compare the TOA fluxes in a consistent way with the surface radiation measurements and the ARSCL-derived cloud properties, the VISST broadband albedo and broadband outgoing longwave radiation (OLR) from the pixel closest to the SGP site were first selected. Subsequently, the same time window as for the surface radiation and cloud averaging was used to average over all satellite scenes available within that time window, applying a zenith angle adjustment to calculate the grid box average outgoing shortwave radiation (OSR). Spring and summer root-mean-square errors associated with the VISST retrievals are on the order of 5% for the OLR and 9% for the OSR (Khaiyer et al., 2011), based on comparison against CERES Aqua/Terra.

VAN WEVERBERG ET AL. CAUSES: ATTRIBUTION OF RADIATION BIASES

Table 2 note. Provided are the abbreviations used throughout the text, the ARM retrieval method or value-added product, the temporal sampling (dt), the vertical spacing (dz), horizontal spacing (dx), the relative and absolute uncertainty (as a standard deviation) and the main reference to each data product and its uncertainty. Note that for the radiation measurements, the uncertainty for any particular measurement is the absolute or relative uncertainty listed, whichever is larger.
Last, a brief evaluation is performed of the liquid and ice water paths (LWP and IWP, respectively) associated with the cloud regimes. These are calculated from the liquid and ice water contents in the MICROBASE ARM Value-Added Product. This is an updated product from the version described in Dunn et al. (2011) and combines data from the ceilometer, micropulse lidar, microwave radiometer, merged sounding, and the vertically pointing Ka-band cloud radar (KAZR) at SGP. At the time of writing, this product was available with a 4 s temporal and 30 m vertical resolution for the period of the MC3E campaign. Hence, the analysis of the LWP and IWP in the models will be restricted to this period only (18 April to 6 June 2011). Similar to all above observations, the MICROBASE retrievals were time averaged using the 1-10 km averaged wind speeds from the merged sounding. Zhao et al. (2014) estimated the uncertainties in the retrievals of the liquid and ice water contents (LWC and IWC) to be on the order of 15% and 55%, respectively. The uncertainty in the IWC could be much higher than this value, however, given that uncertainties in the ice crystal habit and ice size distribution assumptions were not considered in these numbers. This needs to be borne in mind when interpreting the results shown here. Furthermore, the uncertainties mentioned in Zhao et al. (2014) are for an older version of this VAP, using the millimeter wavelength cloud radar, rather than the KAZR.
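A water path is the vertical integral of the corresponding water content. The sketch below assumes a simple rectangle-rule sum over layers of uniform thickness (the 30 m spacing quoted above); the function name and units are illustrative, and the actual processing of the MICROBASE profiles may differ.

```python
def water_path(content_g_m3, dz_m=30.0):
    """Integrate a water-content profile (g m^-3) over height.

    content_g_m3: water content in each layer, bottom to top.
    dz_m: layer thickness in metres (assumed uniform).
    Returns the water path in g m^-2.
    """
    return sum(content_g_m3) * dz_m
```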

Description of Atmospheric Models
The models that took part in the CAUSES project vary in terms of their complexity, model resolution, and application. Some models, such as the IFS, are applied mostly in an NWP context, while others, such as the LMDZOR, CNRM, or CAM5, are used predominantly in climate applications. It is worth mentioning that most of the climate models used in this study consist of the atmospheric components of model versions intermediate between the Coupled Model Intercomparison Projects Phases 5 and 6 (CMIP5 and CMIP6; Eyring et al., 2016). Further differences exist in the domains these models are typically integrated over. The WRF models are limited area models (LAMs), while all other models are global models. Nevertheless, they all share a fairly distinct warm bias near the SGP, as outlined in Morcrette et al. (2017). An overview of the models that were analyzed in the context of this paper is provided in Table 1. Note that fewer models are analyzed here compared to Morcrette et al. (2017), since not all modeling centers provided all necessary fields to carry out the radiation and cloud regime analysis.
All models were run in hindcast mode, similar to the approach followed in . ERA-Interim reanalyses (Dee et al., 2011) were used for the initialization at 00 UTC each day between 1 April and 31 August 2011, and all models were run freely for 5 days. Hence, a comparison can be made against observations at different lead times, shedding light on the growth of the model biases over time. Some key differences between the nine models in this study are highlighted in Table 1. Grid spacings of the models vary from 30 km in the METUM to 240 km in the CANCM4, while the number of vertical levels varies from 30 in the CAM5 and TAIESM to 91 levels in the IFS model. All models under investigation have parameterized deep convection and prognostic cloud microphysics. Cloud cover is diagnosed in most models. The IFS and METUM, however, have prognostic cloud fraction formulations. More details about all the models are provided in Morcrette et al. (2017).
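The hindcast setup (daily 00 UTC initializations from 1 April to 31 August 2011, each run for 5 days) can be enumerated with a short sketch. The function names and the hourly output step are assumptions made for illustration; they are not taken from the CAUSES protocol.

```python
from datetime import datetime, timedelta


def hindcast_starts(first=datetime(2011, 4, 1), last=datetime(2011, 8, 31)):
    """Daily 00 UTC initialization times, inclusive of both end dates."""
    n_days = (last - first).days + 1
    return [first + timedelta(days=d) for d in range(n_days)]


def valid_times(start, lead_hours=120, step_hours=1):
    """Valid times from T+00 to T+120 for one hindcast (hourly step assumed)."""
    return [start + timedelta(hours=h) for h in range(0, lead_hours + 1, step_hours)]


def lead_day(start, valid):
    """Hindcast day (1 to 5) a valid time falls in; T+120 is counted as day 5."""
    hours = int((valid - start).total_seconds() // 3600)
    return min(hours // 24 + 1, 5)
```

Grouping model-minus-observation differences by `lead_day` is one way to look at how a bias grows with lead time, while grouping by valid time gives the seasonal evolution.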
Given the focus on understanding the radiation biases in the models, data were analyzed at the model radiation time steps (when the radiative transfer is calculated). This time step is 1 h for most models under investigation (Table 1). The CNRM model has a longer radiation time step of 3 h and hence was analyzed at this coarser temporal resolution. The WRF models have a radiation time step of 10 min, but for practical reasons were only analyzed at hourly intervals.
While the METUM, IFS, and WRF simulations perform the radiative transfer prior to updating the cloud fields within a time step, all other models first update the cloud fields and subsequently perform the radiative transfer. This calling order matters when trying to trace the origin of biases in the surface radiation balance to the cloud properties in the models. Hence, the surface radiation fields in the IFS, METUM, and WRF models were compared against the cloud fields one physics time step prior to the radiation time step.
The analysis in the remainder of this paper focuses on the grid column closest to the ARM SGP Central Facility (36.60°N, 97.50°W). An overview of all model footprints of the grid column nearest to the SGP site is given in Figure 1. As can be seen in this figure, the location of the SGP site is fairly far from the center of the grid box nearest to it for many models. Hence, observed SW radiation has been adjusted using the cosines of the zenith angles of these two locations for each of the models. This adjustment is generally very small, but matters when comparing, for example, the observed and simulated diurnal cycles of radiation, mainly for the coarser models.
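One plausible reading of the zenith-angle adjustment is a scaling of the observed SW flux by the ratio of the cosine of the solar zenith angle at the grid-box center to that at the site. The sketch below assumes exactly that; the function names and the low-sun cutoff `mu_min` are illustrative assumptions, not details given in the text.

```python
import math


def cos_zenith(lat_deg, decl_deg, hour_angle_deg):
    """Cosine of the solar zenith angle from latitude, solar declination,
    and hour angle (0 degrees at local solar noon)."""
    lat, decl, h = (math.radians(x) for x in (lat_deg, decl_deg, hour_angle_deg))
    return math.sin(lat) * math.sin(decl) + math.cos(lat) * math.cos(decl) * math.cos(h)


def adjust_sw(sw_obs, mu_site, mu_model, mu_min=0.05):
    """Scale observed SW from the site to the model grid-box center.

    mu_site, mu_model: cosines of the zenith angle at the two locations.
    mu_min: assumed cutoff below which no scaling is applied, to avoid
            dividing by a near-zero cosine around sunrise and sunset.
    """
    if mu_site < mu_min or mu_model < mu_min:
        return sw_obs
    return sw_obs * mu_model / mu_site
```

Because the hour angle depends on longitude, the scaling mostly shifts the observed diurnal cycle slightly in time, which is why it matters more for the coarser grids.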

Radiation Attribution Methodology
The first goal of this paper consists of disentangling the net surface shortwave ($\Delta SW_{net}$) and longwave ($\Delta LW_{net}$) radiation biases into contributions from various processes as follows:

$$\Delta SW_{net} = C_{\alpha} + C_{cloud} + C_{iwv} + C_{res}$$
$$\Delta LW_{net} = C_{surf} + C_{cloud} + C_{iwv} + C_{res}$$

where $C_{\alpha}$, $C_{surf}$, $C_{cloud}$, $C_{iwv}$, and $C_{res}$ are contributions from the surface albedo, surface LW emission, cloud, integrated water vapor, and a residual term, respectively. Appendix A provides a detailed derivation of each of these terms. In the expressions for the most important terms ($C_{\alpha}$, $C_{surf}$, and $C_{cloud}$), $SW_{OD}$ is the observed downwelling SW radiation; $\alpha_M$ and $\alpha_O$ are the simulated and observed surface albedos, respectively; $LW_{MU}$ and $LW_{OU}$ are the modeled and observed upwelling LW radiation, respectively; and $SWCS_{MD}$ and $SWCS_{OD}$ are the clear-sky SW downwelling radiation for the model and the observations, respectively.
The detailed derivations for each of these components are provided in Appendix A. The residual term includes contributions from errors in the aerosol optical depth and also numerical inaccuracies and issues with the clear-sky radiative transfer calculations in the models (Pincus et al., 2015).
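To illustrate how an albedo term can be separated from the rest of the net SW bias, the sketch below assumes that net SW equals (1 − albedo) times the downwelling SW, and that the albedo difference is weighted by the observed downwelling SW. Under those assumptions the two pieces add up exactly to the net SW bias. The surface LW term is likewise assumed to be minus the difference in upwelling LW. These forms are consistent with the symbols defined above but are assumptions; the authoritative expressions are those of Appendix A.

```python
def net_sw(sw_down, albedo):
    """Net surface SW, assuming net = (1 - albedo) * downwelling."""
    return (1.0 - albedo) * sw_down


def albedo_term(sw_od, alpha_m, alpha_o):
    """Assumed albedo contribution: -SW_OD * (alpha_M - alpha_O)."""
    return -sw_od * (alpha_m - alpha_o)


def downwelling_term(sw_md, sw_od, alpha_m):
    """Remainder due to downwelling SW errors (clouds, water vapor, residual):
    (1 - alpha_M) * (SW_MD - SW_OD)."""
    return (1.0 - alpha_m) * (sw_md - sw_od)


def surface_lw_term(lw_mu, lw_ou):
    """Assumed surface LW contribution: -(LW_MU - LW_OU)."""
    return -(lw_mu - lw_ou)
```

With hypothetical values of 520 and 500 W m⁻² for the modeled and observed downwelling SW and albedos of 0.18 and 0.20, the net SW bias is 26.4 W m⁻², of which 10 W m⁻² comes from the albedo term and 16.4 W m⁻² from the downwelling term.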
Given that these contributions are not necessarily normally distributed, a Wilcoxon rank-sum test was applied on every contribution term to test whether the term was significantly different from zero at the 95% confidence level.

Cloud Regime Analysis
A second goal of this paper is to further disentangle the contribution from clouds to the net radiation bias (equation (A5)) into contributions from different cloud regimes. Indeed, cloud-related radiation biases could predominantly originate from issues with low clouds, deep clouds, or even high clouds. As guidance to model development, it is instructive to know which of these regimes contribute most to the radiation biases.
Therefore, eight cloud regimes were defined, based on nonzero cloud occurrence in the lower (0-3 km altitude), middle (3-6 km altitude), and upper (6-15 km altitude) troposphere. To number the regimes, the absence of cloud in each layer is assigned a value of 0, while the presence of cloud in the lower, middle, and upper troposphere is given a value of 1, 2, and 4, respectively, similar to Van Weverberg et al. (2015), yielding a total of eight cloud regimes as shown in Figure 2.
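The regime numbering amounts to a three-bit code, one bit per layer. A minimal sketch follows, assuming that a level exactly on a layer boundary belongs to the layer above it and using the 0.1% threshold this study adopts for "nonzero" cloud; the function and constant names are illustrative.

```python
# (base_km, top_km, value) for the lower, middle, and upper troposphere
LAYERS_KM = ((0.0, 3.0, 1), (3.0, 6.0, 2), (6.0, 15.0, 4))


def cloud_regime(heights_km, cloud_fraction, threshold=0.001):
    """Return the regime number (0 to 7) for one cloud fraction profile.

    heights_km: height of each level in km.
    cloud_fraction: cloud fraction (0 to 1) at each level.
    threshold: minimum fraction counted as cloud (0.1%).
    """
    regime = 0
    for base, top, value in LAYERS_KM:
        cloudy = any(
            cf > threshold
            for z, cf in zip(heights_km, cloud_fraction)
            if base <= z < top
        )
        if cloudy:
            regime += value
    return regime
```

Regime 0 is therefore clear sky, regime 1 is low cloud only, regime 5 is low plus high cloud, and regime 7 has cloud in all three layers.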
It should be stressed that the regime definitions are based on any nonzero (>0.1%) cloud occurrence, and hence, clouds assigned to a certain regime could still span a wide range of cloud fractions. However, this regime classification was deemed the most sensible in terms of compromising between having a manageable number of regimes and being able to identify the most problematic cloud regime. For each output time, the cloud contribution to the overall radiation bias $C_{cloud}$ (equation (A5)) could be assigned to an observed-simulated cloud regime pair. It is worth highlighting here again that, in contrast to the radiation attribution methodology described in section 3.1, the cloud regime analysis only uses the surface radiation data and vertical cloud profiles from the ARM SGP Central Facility.

Description of SW and LW Radiation Biases
Figures 3a and 3b show the monthly running mean biases of the net SW and LW radiation, respectively, averaged over all hindcast days (T + 00 to T + 120). The net biases averaged over the entire simulation period are shown in Table 3. The largest net SW biases occur in the WRFCLM, the CANCM4, and the LMDZOR simulations. Smaller biases are present in the IFS, the CNRM, and the CAM5 simulations, albeit even these models experience an important positive bias in the net SW. Net LW biases are particularly negative for the CANCM4 and LMDZOR models, lacking nearly 50 W m⁻² of net surface LW radiation. Much smaller biases of about −10 to −15 W m⁻² are present in most other models.
Since all models were run in hindcast mode, it is instructive to look at the average growth of the biases over five hindcast days as well (Figures 4a and 4b). From this figure, there does not appear to be a very systematic growth over the 5 days in any of the models, apart from a decrease in the net LW bias in the LMDZOR and CANCM4 near the end of May and July, respectively. None of the growth rates shown in Figures 4a and 4b exceed the observational uncertainty (denoted by the gray shading).
Overall, just 1 day into the hindcast, most models tend to have a fairly important radiation bias that persists throughout the remainder of the 5 day hindcast (not shown).
... Figure 1) and averaged over all hindcast days (T + 00 to T + 120). The colors indicate different models as shown in the legend. Observations used to calculate the biases are obtained using the Atmospheric Radiation Measurement Best Estimate by Tang and Xie (2015), for a region of 2° by 2° surrounding the Southern Great Plains Central Facility, as shown in Figure 1. The gray shading indicates the uncertainty in the observations, as calculated using the values in Table 2 and applying the rules for error propagation where necessary. The numbers to the upper right of each panel indicate the mean and standard deviation of the variable denoted in the panel below for the full simulation period.
Since the context of this radiation attribution study is the midlatitude warm bias, it is worth evaluating the temperature errors coinciding with the radiation biases in each of the models. The monthly running mean 2 m temperature errors in all models are depicted in Figure 3c, and the average temperature bias over the full simulation period is shown in Table 3. All models suffer from a persistent warm bias throughout most of the simulation period. The bias is particularly large in models with large positive biases in the net SW radiation (CANCM4, LMDZOR, and TAIESM; Figures 3a and 3b). Interestingly, some other models, such as the WRFCLM and WRFNOAH, have fairly modest 2 m temperature biases, yet rather large errors in the net surface SW (Figures 3a and 3c).
Table 3. Biases of Net Shortwave (SW), Net Longwave (LW), 2 m Temperature (T2M), Integrated Water Vapor (IWV), Cloud Fraction (CF), and Albedo (α) for All Models
Note. Data are obtained from the Atmospheric Radiation Measurement Best Estimate data (Tang & Xie, 2015) and include the entire simulation period (April-August 2011) and the full hindcast time (T + 00 to T + 120).
It is instructive to compare the radiation biases to the biases in a number of variables that are known to affect the radiative transfer. Figures 3d-3f show the monthly running mean relative biases of the IWV, cloud cover, and albedo for all the models. Observations are taken from the gridded ARMBE data as described in section 2. From Figure 3d, the IWV remains within about 10% of the observations throughout most of the simulation period, apart from the CANCM4 (and to a lesser extent the LMDZOR), which tends to exhibit a more prominent dry bias of up to 20%. Many models show a tendency of developing a dry bias over summer (Figure 3d), but again show fairly modest change of the bias over the hindcasts (Figure 4d). It is possible that the dry IWV bias, mainly later in summer, plays a role in the radiation biases seen in Figures 3a and 3b, given that dry atmospheres are more transparent to SW radiation and at the same time emit less LW radiation back to the surface. Further, all models tend to have insufficient cloudiness throughout most of the simulation period (Figure 3e and Table 3), with the smallest bias magnitudes in the METUM and the largest biases in the LMDZOR, CNRM, and CANCM4. Some models tend to get cloudier over the five hindcast days, in particular, the CAM5, TAIESM, and METUM around June, partly offsetting the negative bias in cloudiness (Figure 4e) later in the hindcasts. A lack of cloud would undoubtedly contribute to a positive bias in the downwelling SW and a negative bias in the downwelling LW radiation; hence, cloud cover biases seen in Figure 3e seem consistent with the radiation biases in Figures 3a and 3b.
A third possible contributor to SW radiation biases could be the surface albedo (defined as $SW_U/SW_D$), shown in Figure 3f and Table 3. Apart from the WRFNOAH, all model surface albedos are lower than observed. The albedo bias remains very constant throughout the hindcasts (Figure 4f), as would be expected given the typically slow changes in surface properties that affect the albedo. Again, a lower-than-observed albedo is consistent with a positive bias in the net SW radiation at the surface seen in Figure 3a.
It clearly is hard to identify one single culprit for the radiation biases present in all models, since many processes seem to push the SW and LW biases in the same direction throughout most of the simulation period. The next section will present a detailed attribution analysis as outlined in section 3 to quantify how much each of the above mentioned processes separately contributes to the overall radiation bias.

Attribution of Biases
4.2.1. Attribution of the Net Shortwave Bias
Since the growth rates of the biases in surface radiation over the hindcasts were found to be small (Figure 4), the remainder of this paper will focus on the entire simulation period from April to August 2011 and include the full hindcast lengths.
Given that biases in the surface albedo, clouds, and the IWV all tend to be consistent with a positive net SW bias and a negative net LW bias in most models (section 4.1), a more detailed and quantitative analysis is needed. Therefore, the bias attribution method laid out in section 3 has been applied to all model simulations. Figure 5 shows the resulting contributions to the net SW bias, along with the distribution around the mean contributions (as boxes and whiskers). The edges of the boxes are highlighted if the contribution is significant, based on a Wilcoxon rank-sum test.
Consistent with Figure 3f, the too low surface albedo in most models has a significant impact on the net SW radiation balance. The largest contribution of the surface albedo to the net SW radiation bias is found in the CAM5, the TAIESM, and the LMDZOR models, exceeding 15 W m⁻², which is an important portion of the total SW radiation bias. It is interesting to note that both the CAM5 and TAIESM share the same land surface scheme.
The WRFNOAH model has a significantly negative contribution from the surface albedo to the net SW bias (Figure 5c), consistent with the overestimated albedo throughout most of the simulation period in Figure 3f.
In all models, apart from the CAM5, clouds appear to be the dominant source of the net SW radiation bias.
The radiatively most important cloud biases are found in the WRF models, the CANCM4, and the LMDZOR (Figures 5b, 5c, 5g, and 5h), which is consistent with the large cloud fraction biases in these models in Figure 3e.
It should be highlighted that other than errors in the cloud cover (as shown in Figure 3e), errors can be present in the cloud optical properties even when the correct cloud cover is predicted. Indeed, the METUM has a fairly unbiased cloud cover on average (Table 3 and Figure 3e), yet still has a dominant contribution from cloud errors to the net SW radiation bias. Similar or slightly larger cloud-related radiation biases are found in the IFS, CNRM, and TAIESM (Figures 5e, 5f, and 5i), despite a more pronounced lack of cloud cover in these models compared to the METUM (Figure 3e). The smallest, although still significant, contribution of cloud errors to the overall net SW bias is found in the CAM5. Indeed, the cloud-related SW bias is smaller than the surface albedo-related bias (Figure 5a) for this model. This finding is consistent with Van Weverberg et al. (2015).
Figure 5. Net shortwave radiation bias attribution in all models. Contributions from albedo (Alb), clouds (Cld), integrated water vapor, and residual processes (Res) are provided and calculated from equations (A4) to (A8). Boxes contain the 15th-85th percentiles of the distribution containing the full simulation period (April to August 2011, including day and night) and the five hindcast days. The mean contribution is indicated by the bold horizontal line, and whiskers indicate the 5th and 95th percentiles. Contributions that are significantly different from zero, based on a Wilcoxon rank-sum test (at the 5% significance level), are highlighted by bold box edges and darker shading. Observational data used in this analysis are obtained from the Atmospheric Radiation Measurement Best Estimate data described in Tang and Xie (2015). Note that the Y axis is stretched below −40 and above 40 W m⁻² (indicated by the horizontal dotted line) to include all data on the panels.
From Figure 3d, it became clear that the atmosphere in most models is too dry over the SGP during the simulation period. However, SW radiation bias contributions from the biased IWV composites are very small for all models (Figure 5). Somewhat surprisingly, the small IWV contributions are still significant for most models, apart from the WRFCLM and the CNRM.
The residual bias combines effects from the aerosol optical depth and inaccuracies in the clear-sky radiative transfer calculations. While generally small, this term still entails a significant contribution to the overall net SW radiation bias in all models. The sign of this bias varies among the different models in question, however.
The LMDZOR model appears to have the largest positive radiation bias associated with the residual term. It is unclear whether this model underestimates the aerosol loading in the SGP or whether possible inaccuracies in the radiative transfer could explain this positive contribution. Interestingly, the IFS and CNRM have a fairly large, but negative, impact of the residual term on the overall net SW bias (Figure 5e), partially offsetting positive contributions from the surface albedo and clouds (Figure 5e).
In summary, this analysis has shown that cloud issues are the dominant source of the radiation biases in all models apart from the CAM5. The largest cloud radiative problems occur in the WRF models, the CANCM4, and the LMDZOR. A positive, but mostly secondary effect originates from albedo errors in most models, apart from the WRFNOAH.

Attribution of the Net LW Bias
Figure 6 provides an overview of the contributions from each of the processes outlined in section 3 to the net LW radiation bias. Given that all models have too warm near-surface temperatures (Figure 3c), it is unsurprising that the surface contribution to the net LW bias is significantly negative in all models. Indeed, following the Stefan-Boltzmann law, warmer surfaces ought to have a larger upwelling LW radiation. The largest warm biases are found in the CANCM4 and LMDZOR models (Figure 3c), which also have the largest contributions from the surface to their overall net LW radiation bias. The surface temperatures in the WRFNOAH were comparable to the observed temperatures (Figure 3c) and hence the contribution of the surface to the overall net LW radiation bias is small (albeit still significant).
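The link between a warm surface bias and a negative surface contribution to the net LW bias can be made concrete with the Stefan-Boltzmann law. The sketch below assumes a surface emissivity of 1 and treats the surface contribution as minus the difference in upwelling LW; both are simplifying assumptions, and the temperatures are hypothetical rather than values from the paper.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def upwelling_lw(t_surface_k, emissivity=1.0):
    """Upwelling LW emitted by a surface at temperature t_surface_k (K)."""
    return emissivity * SIGMA * t_surface_k ** 4


def surface_lw_contribution(t_model_k, t_obs_k, emissivity=1.0):
    """Change in net surface LW due to surface emission alone,
    taken here as -(LW_MU - LW_OU)."""
    return -(upwelling_lw(t_model_k, emissivity) - upwelling_lw(t_obs_k, emissivity))
```

For a surface at 300 K, a 2 K warm bias increases the upwelling LW by roughly 12 W m⁻², which is of the same order as the LW biases discussed here.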
As for the net SW radiation bias, errors in the cloud properties contribute very significantly to the overall net LW radiation bias in all models (second bar and whiskers in each panel of Figure 6). It is interesting to note that the variability among the different models in terms of the cloud contribution to the net LW radiation bias is smaller than for the net SW bias. Most models exhibit net LW biases of about −12 to −8 W m⁻² due to errors in the cloud fractions or radiative properties. Also note that the negative LW cloud contribution generally does not offset the positive SW cloud contribution.
The dry IWV bias (Figure 3d) appears to have a significant, but generally small impact on the net LW radiation in all models, apart from the METUM and the CNRM. A clearly negative contribution can be found for those models with the largest overall dry bias, such as the CANCM4 and the LMDZOR (Figures 6g and 6h).
Interestingly, while still having a dry bias in the atmosphere on average (Figure 3d), some models, such as the CAM5 and the TAIESM, have a positive contribution from IWV to the LW radiation bias (Figures 6a and 6i). A mechanism explaining this apparent contradiction is provided in Appendix B.
The residual component (last box plot in each panel of Figure 6) is generally fairly small, but significant for all models. Fairly large positive contributions to the overall net LW radiation are found in the CANCM4 and the TAIESM (Figures 6g and 6i). The LMDZOR model is the only model with a large negative contribution from the residual term to the net LW radiation bias (Figure 6h). As for the net SW radiation bias (Figure 5), there is a need to further investigate what causes these small but significant contributions from the residual term in each of the models. This could be related to aerosol properties and also to issues with the radiative transfer code itself, such as the definitions of the solar and infrared spectra in each of the models (e.g., Li et al., 2010). Indeed, for many models the SW and LW biases in the residual term seem to balance each other (Figures 5 and 6). It should be mentioned that, according to Morcrette et al. (2017), the warm bias extends fairly deep into the troposphere. Hence, it can be expected that many models suffer from an excess in downwelling LW radiation associated with the warmer atmosphere. Ideally, this effect should end up in the residual term in Figure 6, but it should be borne in mind that the temperature and water vapor content of the atmosphere are highly correlated. Hence, the contribution from a too warm troposphere will be hard to disentangle between the IWV and residual components in our analysis.

Cloud Contribution to Radiation Bias

4.3.1. Cloud Property Analysis
Given that clouds cause a significant portion of the net SW and LW radiation biases in many of the models in this study, a more detailed evaluation of the cloud fields is desirable. As a first step, this section deals with the ability of the models to replicate the vertical and diurnal structure of observed cloudiness. The next section will then establish the link between the cloud and radiation deficiencies, using the cloud regime definitions outlined in section 3.

Figure 7 shows the diurnal cycle of the average cloud fraction profile at the SGP in the observations (Figure 7a) and in all of the nine models under investigation. For this particular period in 2011, observed cloud fractions showed a maximum in the midlevels in the morning and evening, and a minimum in the afternoon. In the lower troposphere, a weak maximum in cloud fraction is apparent around noon (Figure 7a). This diurnal cycle of the cloud fraction profiles compares well with Zhang and Klein (2010), who looked at 11 years of cloud data over the SGP.

Similar to Morcrette et al. (2012), we acknowledge here that errors in the average cloud fraction (AVG) entail a combination of errors in the frequency of occurrence (FOO) and the amount when present (AWP). Potential compensating errors between the FOO and the AWP matter in the light of radiation errors given the nonlinear impact of clouds on the radiative transfer. Furthermore, revealing these errors can also help unveil the origins of the cloud deficiencies in the various models under investigation. The observed FOO (Figure 8a) shows how cloud frequently builds up in the low levels around local noon, then deepens throughout the afternoon, culminating in an abundance of high clouds in the late evening. However, there is also a second, weaker peak in the FOO in the early morning. This peak appears to be more restricted to the middle and high levels, possibly associated with the nocturnal elevated convective storms that frequently occur in the SGP (Geerts et al., 2017). The observed AWP (Figure 9a) is smallest in the afternoon, mainly in the low levels of the atmosphere, indicating that the large FOO (Figure 8a) in the afternoon is associated with partly cloudy scenes, typical for cumulus and congestus-type cloud.
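The split of the average cloud fraction into FOO and AWP can be sketched as follows. This is a minimal illustration that assumes a time series of cloud fraction at a single level and treats any nonzero fraction as "cloud present"; the function name, the threshold, and the sample values are hypothetical and are not taken from the paper's processing.

```python
import numpy as np


def cloud_stats(cloud_fraction, cloudy_threshold=0.0):
    """Return (AVG, FOO, AWP) for a time series of cloud fraction (0..1).

    AVG: mean cloud fraction over all times (cloudy and noncloudy).
    FOO: fraction of times with cloud fraction above cloudy_threshold.
    AWP: mean cloud fraction over the cloudy times only.
    With cloudy_threshold = 0, AVG equals FOO * AWP.
    """
    cf = np.asarray(cloud_fraction, dtype=float)
    cloudy = cf > cloudy_threshold
    avg = cf.mean()
    foo = cloudy.mean()
    awp = cf[cloudy].mean() if cloudy.any() else 0.0
    return avg, foo, awp


# Hypothetical series with the same AVG but opposite FOO/AWP behavior:
observed = [1.0, 0.0, 0.0, 0.0]       # rare but overcast
simulated = [0.25, 0.25, 0.25, 0.25]  # always present but broken

for name, series in (("observed", observed), ("simulated", simulated)):
    avg, foo, awp = cloud_stats(series)
    print(f"{name}: AVG={avg:.2f} FOO={foo:.2f} AWP={awp:.2f}")
```

Both hypothetical series share the same AVG, so a comparison of AVG alone would hide the compensation between a too large FOO and a too small AWP.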
From Figures 7b-7d, the CAM5 and the WRF models have similar diurnal cycles in their cloud fraction profiles. This is somewhat surprising given their very different cloud-related SW and LW radiation errors (panels (a)-(c) of Figures 5 and 6). However, recall that these models all share the same set of atmospheric physical parameterizations (Morcrette et al., 2017). These three models tend to have a very distinct maximum in the cloudiness in the late afternoon, at all levels of the atmosphere (Figures 7b-7d). The fairly well captured afternoon AVG cloud cover masks large compensating overestimations in the FOO (Figures 8b-8d) and underestimations in the AWP (Figures 9b-9d), however. Possibly, this hints at the CAM5 and the WRF models triggering deep (convective) cloud too easily, but systematically underestimating the cloud fraction associated with these clouds. This would be consistent with earlier findings by, for example, Wang et al. (2014). The nocturnal AVG cloud cover, and in particular the early morning observed maximum, is largely underestimated in these three models (Figure 7), due to too infrequent cloud (Figure 8) and too small AWP (Figure 9).

[Figure 7 caption] The observed cloud fraction shown in panel (a) is the fraction regridded to the IFS grid definition, since this model has the most detailed vertical resolution. Observations regridded to each of the respective model grids were used to calculate the biases shown by the black contours. The bias for each of the models is shown as solid (positive) and dashed (negative) lines on top of the color shading. Contour intervals are 3%. All panels include data from the full simulation period (April-August 2011) and all hindcast days (T + 00 to T + 120) for the respective models, including cloudy and noncloudy data points.

The diurnal evolution of the AVG cloud fraction profile is fairly similar between the METUM and the IFS (Figures 7e and 7f).
In contrast to the CAM5 and WRF models, the METUM and IFS lack the prominent maximum in the AVG cloud cover in the afternoon. Both models exhibit fairly weak diurnal cycles in their AVG cloud cover, with persistent overestimations of the high cloud cover and underestimations of the middle and low cloud cover. While the AVG cloud fraction profiles in these two models look similar, their FOO (Figures 8e  and 8f ) and AWP (Figures 9e and 9f ) are very different. The METUM seems to have cloud too infrequently in the low level and midlevels (Figure 8e), but captures the AWP fairly well. Despite its lack of midlevel AVG cloud, the IFS largely overestimates the midlevel and high-level cloud frequency (FOO; Figure 8f ), mainly in the early evening hours. However, apart from within the boundary layer, the AWP in the IFS is dramatically underestimated compared to the observations.
A more pronounced diurnal cycle in cloudiness is present in the CNRM (Figure 7g), compared to the IFS and METUM. However, rather than the observed maximum in midlevel and upper-level cloudiness in the morning and late evening, the CNRM has its maximum in AVG cloud fraction in the afternoon. At this time, the upper-level AVG cloud fraction is overestimated, while the low-level cloudiness remains underestimated throughout the entire diurnal cycle. The latter is mainly due to underestimating the FOO (Figure 8g), while the AWP is fairly well captured, or even slightly overestimated in the CNRM (Figure 9g).
The CANCM4, LMDZOR, and TAIESM all share a deficiency in their low-level and midlevel cloud throughout the entire day (Figures 7h-7j). In the CANCM4 and LMDZOR this is mainly due to very infrequent cloud at these levels of the atmosphere (Figures 8h and 8i). The TAIESM (Figure 7j) has better captured low-level cloud statistics than the LMDZOR and the CANCM4, but has dramatically less cloud in the afternoon than the CAM5 (Figure 7b). This is remarkable, given that TAIESM and CAM5 share most of their dynamics and physics. The main difference between the CAM5 and TAIESM resides in the cloud macrophysics and convective triggering, which clearly has an important impact. The updated convective triggering in the TAIESM, described in Wang et al. (2014), acts mainly to suppress the convective initiation, which has reduced the overestimated FOO in the afternoon compared to the CAM5. However, given that the AWP in the TAIESM (Figure 9j) is relatively similar to the CAM5 and still underestimated at midlevels, the net result is now an underestimation of the AVG cloud amount. The AVG high-level cloud in the TAIESM is better captured than in the CANCM4 and LMDZOR (and the CAM5), both in terms of the FOO and the AWP (Figures 8 and 9). The CANCM4 and LMDZOR have fairly similar and somewhat underestimated AVG cloud amounts in the upper levels of the troposphere (Figures 7h and 7i), but this masks very different behavior in terms of the FOO and the AWP. Indeed, while the CANCM4 exhibits a compensation between far too small FOO and too large AWP (Figures 8h and 9h), the LMDZOR compensates a far too large frequency with much too small AWP (Figures 8i and 9i).

SW Radiative Effect of Cloud Regimes
It is clear from the previous section that all models struggle to replicate the observed cloud fraction and frequency, although each of the models appears to have very model-specific issues. To further tease out the specific issues in each of the models, it is necessary to perform a more detailed analysis. While the average diurnal cycles of the cloud fraction profiles in Figure 7 shed some light on the vertical structure of the clouds, they do not provide information on how the model biases in the vertical model levels are vertically overlapped. Indeed, a bias in the average high-and low-level cloud fraction could be the result of unconnected issues with high-level clouds during a particular period and low-level clouds during a different period, but they could also originate from a bias in deep cloud properties, including low and high-level cloud at the same time. The regime classification proposed in section 3 enables a simple analysis of the vertical cloud structure, while keeping the tie between cloud issues that are vertically overlapped.
The ultimate goal of this paper is to find how errors in the cloud properties propagate into biases in the surface radiation. To this end, surface radiation biases can be assigned to their concurrent cloud regimes, allowing an unambiguous identification of the cloud regimes that matter most from a radiation error point of view. For the sake of brevity, the remainder of this analysis will focus on the SW radiation, although it should be mentioned that similar findings were made for the LW radiation errors in most models. To provide a way to evaluate the coevolution of cloud and radiation errors, the total shortwave cloud radiative effect (tCRE) is calculated for each of the models and the observations as follows:

$$\mathrm{tCRE}_i = \left(SW_{D,i} - SWCS_{D,i}\right) \times freq_i \qquad (6)$$

where the index $i$ denotes the regime (from 0 to 7), $SW_{D,i}$ and $SWCS_{D,i}$ are the average all-sky and clear-sky downwelling radiation associated with regime $i$, and $freq_i$ is the frequency of regime $i$ (as a fraction). Note that the CRE as defined here only includes downwelling radiation near the surface, as opposed to the more conventional definition using the net surface radiation.

Note. Shown are the low cloud regime (1), mid cloud (2), mid/low cloud (3), high cloud (4), low/high cloud (5), mid/high cloud (6), and the deep cloud regimes (7). The regime with the largest absolute CRE error is highlighted in bold for each model. Radiation data are from the radiometers at the SGP Central Facility, and the cloud information is from ARSCL. Data include the entire simulation period (April-August 2011) and the full hindcast time (T + 00 to T + 120) and are averaged over the entire diurnal cycle (day and night). Note that the observations are regridded to each of the model grids and hence vary slightly for each of the models.
The tCRE for each of the simulated and observed regimes is provided in Table 4. Furthermore, Figure 10 visualizes the two components of equation (6) for each of the models and the observations. The average shortwave cloud radiative effect ($\mathrm{aCRE}_i$) of each regime ($SW_{D,i} - SWCS_{D,i}$) is shown as the height of each bar in Figure 10, while the FOO $freq_i$ is shown as the width of each bar. Hence, following equation (6), the surface area of each bar (width × height) is the SW tCRE of the regime in question, also listed in Table 4. For each model under investigation, the simulated regime properties are given above the zero line in the panel, while the regridded observed regime properties are given below the zero line in the panel. A flawless model would hence appear as a perfect reflection below the zero line of the regime properties above the zero line. This allows the SW tCRE of each of the models to be evaluated, while looking at the contributions from each of the eight cloud regimes, each of which is a product of two factors. If a particular regime has a bias in its tCRE, this could be due to the model not producing this regime frequently enough (width of the bars; $freq_i$ in equation (6)) and/or the model producing the wrong radiative properties associated with the regime (height of the bars; $\mathrm{aCRE}_i$ in equation (6)). The $C_{\mathrm{cloud}}$ defined in equation (A5) is hence the difference between the simulated and observed tCRE, summed over all regimes (multiplied by one minus the observed albedo).
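The two-factor structure of equation (6) and its link to the cloud contribution can be sketched numerically. The example below assumes per-regime mean fluxes and frequencies are already available; the function names, the two-regime setup, and all flux, frequency, and albedo values are hypothetical and are not taken from Table 4.

```python
import numpy as np


def total_cre(sw_down, sw_down_clear, freq):
    """Per-regime tCRE_i = (SW_D,i - SWCS_D,i) * freq_i, in W m^-2."""
    acre = np.asarray(sw_down, dtype=float) - np.asarray(sw_down_clear, dtype=float)
    return acre * np.asarray(freq, dtype=float)


def cloud_contribution(tcre_model, tcre_obs, albedo_obs):
    """Simulated minus observed tCRE, summed over regimes, times (1 - albedo)."""
    diff = np.asarray(tcre_model, dtype=float) - np.asarray(tcre_obs, dtype=float)
    return (1.0 - albedo_obs) * diff.sum()


# Hypothetical two-regime example: [clear sky, deep cloud].
sw_clear = [800.0, 800.0]
tcre_obs = total_cre([800.0, 400.0], sw_clear, [0.9, 0.1])

# Model A: deep cloud twice as frequent, but with half the aCRE (compensation).
tcre_a = total_cre([800.0, 600.0], sw_clear, [0.8, 0.2])
# Model B: correct frequency, but with half the aCRE (no compensation).
tcre_b = total_cre([800.0, 600.0], sw_clear, [0.9, 0.1])

c_cloud_a = cloud_contribution(tcre_a, tcre_obs, albedo_obs=0.2)
c_cloud_b = cloud_contribution(tcre_b, tcre_obs, albedo_obs=0.2)
print(c_cloud_a, c_cloud_b)
```

In this sketch model A shows no net cloud contribution despite wrong frequency and wrong aCRE, whereas model B shows a positive one; this is the kind of compensation that the bar widths and heights in Figure 10 are meant to expose.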
One of the more puzzling findings of the previous sections is the large similarity in the diurnal cycles of the vertical cloud properties between the CAM5, on the one hand, and the WRF models, on the other hand (panels (b)-(d) of Figures 7 to 9), while their cloud-related surface radiation errors are very different (Figure 5). The regime frequencies are indeed similar between the CAM5 (Figure 10a) and the WRF models (Figures 10b and 10c), consistent with the diurnal cycles of the cloud properties in the previous section (Figures 7-9). Indeed, these three models tend to have clear skies too often (regime 0). The frequency of the deep cloud regime (regime 7) is slightly overestimated in the CAM5 and largely overestimated in the WRF models. It should be noted that the frequency of the deep cloud regime in the afternoon in the CAM5 is much more overestimated than in the morning (not shown). This confirms that their large overestimation in the FOO (Figures 8b to 8d) at all levels of the atmosphere in the afternoon is largely due to the deep cloud regime. Interestingly, the average cloud radiative effects (aCRE) of the regimes exhibit greater differences between the CAM5 and the WRF models (height of the bars in Figures 10a to 10c). Given its large frequency as well as its large aCRE, the deep cloud regime (regime 7) dominates the overall cloud radiative effect (Table 4 and Figure 10). The tCRE of this regime is fairly well captured in the CAM5 (Table 4 and surface area of the red bar in Figure 10a), due to compensating errors between a too large frequency and a too small aCRE. Conversely, the tCRE of regime 7 in the WRF models is largely underestimated (Table 4 and Figures 10b and 10c), due to a very small aCRE for this regime. This analysis highlights that the too large downwelling surface SW radiation in the WRF models does not originate from missing deep cloud events, but rather from misrepresenting their CRE when they are present. The following section will further delve into the origins of the different CRE of regime 7 between the CAM5 and the WRF models. Given that these models share the same atmospheric physics parameterizations, it would be interesting to find out whether this difference is predominantly a resolution effect or due to the different dynamical cores in these models. It is worth noting that many other regimes contribute small amounts to the overall bias in the CRE as well in all three models, specifically the low/mid (3) and mid/high (6) regimes (Table 4). Indeed, mainly these latter regimes are responsible for the (small) contribution of clouds to the overall surface radiation errors in the CAM5 (Figure 5).
While the AVG cloud fraction profiles were shown to be similar between the METUM and the IFS (Figures 7e  and 7f ), their cloud regime statistics differ somewhat ( Figure 10). The METUM appears to have too much clear sky at the expense of mainly the deep cloud regime (width of the red bars in Figure 10d and Table 4). The IFS captures the frequency of the deep cloud regime well, but tends to overestimate the occurrence of the mid/high cloud (regime 6) at the expense of high cloud (regime 4; Figure 10e). Focusing on the deep cloud regime, it appears from the surface area of the red bars in Figures 10d and 10e, and from Table 4, that the tCRE of this regime is very similar between the IFS and the METUM, and generally underestimated by nearly 40%. This is mainly a consequence of not having this regime often enough in the METUM, while the aCRE is well captured (Figure 10d). However, the inverse is true for the IFS. The latter model indeed produces regime 7 often enough, but typically with too low aCRE (Figure 10e). This is consistent with the very different profiles of the FOO and the AWP between these two models (panels (e) and (f ) of Figures 8 and 9). While regime 7 is the main culprit for cloud-related radiation errors in both models (responsible for about 40% of the bias), other regimes do contribute as well. The high cloud regime (4) contributes in both models due to too weak aCRE in the METUM and too small frequency in the IFS. The low/mid cloud regime behaves similarly to the deep cloud regime and it is its frequency that is underestimated in the METUM (with a fair aCRE), while the aCRE is too small in the IFS (but with fair frequency). The aCRE of the low cloud regime (1) is too small in both models, but this is compensated by too large frequency of this regime in the METUM. 
The too small aCREs in the IFS associated with most regimes are consistent with, for example, Ahlgrimm and Forbes (2012), who found the IFS to be associated with positive biases in the net SW radiation, even when low-level, deep-level, or midlevel regimes were correctly simulated. It is worth mentioning that the METUM and IFS overall appear to best reproduce the regime frequencies. Interestingly, these are the only models that have a prognostic cloud fraction scheme as well.
Similar to the METUM, the CNRM misses a large portion of the tCRE associated with the deep cloud regime due to underestimating its frequency, while capturing its aCRE quite well (Figure 10f and Table 4). However, the main characteristic of the CNRM is its spurious large frequency of low-level cloud (regime 1), mainly in the afternoon. In Figure 8g there appears to be a maximum in the FOO in the lower levels, but this seems to be still smaller than observed and never more than 15%. One possible explanation is that this model consistently produces excessive shallow cloud in a single model layer, for example, closest to the top of the boundary layer. Since this level probably varies from day to day, the very large frequency of low cloud in Figure 10f could be smeared out over several levels in Figure 8g. This might suggest that the CNRM triggers its shallow convection scheme too readily, but limits these clouds to a single model layer, which could partly explain the very small aCRE. Further, the CNRM largely underestimates the abundance of regimes 4 and 6, while it has too much clear sky (regime 0), leading to an important total bias in the CRE of all regimes combined (Table 4), consistent with Figure 5f.
Both the CANCM4 and the LMDZOR dramatically underestimate the frequency of all cloud types that include low-level cloud (regimes 1, 3, 5, and 7; Figures 10g and 10h), leading to much too small tCREs for all these cloud types ( Table 4). The very different FOO and AWP in the upper levels of the troposphere between these two models (panels (h) and (i) of Figures 8 and 9) are reflected in the very different frequency and aCRE of the high cloud regime (4; Figures 10g and 10h and Table 4). The CANCM4 misses much of the high cloud, but overdoes its aCRE, while the LMDZOR has more than double the frequency of high cloud, but with very limited aCRE. It should be highlighted here that the CANCM4 and LMDZOR are the two models with the largest overall bias in the total CRE, which appears consistent with the fact that these models also seem to have the largest 2 m temperature bias (Figure 3c).
The change in the convective triggering in the TAIESM compared to the CAM5 has had a dramatic impact on the cloud properties, as shown in the previous section. This also has important consequences from a radiation point of view. As shown in Figure 10i and Table 4, while the aCRE associated with regime 7 has been largely improved, now the frequency of deep cloud is severely underestimated in the TAIESM, compared to the overestimation in the CAM5. This is partly compensated for by an increase in the frequencies of the low, high, and mid/high cloud regimes (1, 4, and 6), but overall this has led to a large increase in the bias in the total CRE for all regimes combined (Table 4).
While the analysis from Figure 10 helps understand which deficiencies in the cloud regime statistics are most relevant from a radiation point of view, it is still unclear what role is played by timing errors in the regime occurrences. Indeed, when a certain regime has a too low frequency, it would be helpful to understand which regime is typically simulated instead. For instance, is there a tendency of missing deep cloud events altogether, or of producing cloud that is not deep enough? Even when we get the right frequency of a particular regime over a diurnal cycle, errors in the radiative properties might exist. In this case, the question is whether we get biases in the radiative effect because the regimes occur at the wrong time of day, or whether the regime consistently has wrong radiative properties, even when its timing is correct. The latter is important in light of previous studies repeatedly finding that models with parameterized convection fail to reproduce the observed diurnal cycle of cloudiness and precipitation (Clark et al., 2007;Gustafson et al., 2015;Walther et al., 2013;Yang & Slingo, 2001).
To help unravel these issues, Figure 11 shows the diurnal cycles of the cloud-related SW radiation biases (as defined in equation (A5)), decomposed into observed-simulated regime pairs.

While the cloud fraction statistics and the regime frequencies in the CAM5 and the WRF models are very similar (Figures 7 to 10), the observed-simulated regime pairs contributing most to the SW bias are very different between these models (Figures 11a-11c). While the overall cloud contribution to the SW bias remains positive throughout the day (bold black lines) in the CAM5, there are often negative contributions when the model simulates deep cloud (regime 7; red bars below the zero line in Figure 11a) when other regimes are observed. This is consistent with deep clouds being triggered too frequently in the CAM5, often when no (0) or just high clouds (4) should be present. However, when the deep cloud regime is correctly simulated, it is associated with a positive SW bias (red bars with dense crossed hatching in Figure 11a), consistent with the too small aCRE of this regime in Figure 10a. Furthermore, simulating clear skies when other regimes (mainly 3 and 7) should be present contributes a fair amount to the SW bias as well (white bars above the zero line in Figure 11a). This analysis confirms that the CAM5 has compensating errors between too frequent deep cloud and too weak aCRE. This also means that if the aCRE of deep cloud were improved in this model, it is likely that this model would develop an overall negative SW bias associated with clouds, given its too frequent deep cloud triggering. In contrast to the CAM5, virtually no cloud regime pairs contribute negatively to the overall SW bias in the WRF models (no bars below the zero line in Figures 11b and 11c). Even occasions of triggering deep cloud (7) instead of any other regime coincide with a positive SW bias in these models (red bars with "M" symbol hatching above the zero line in Figures 11b and 11c). 
A very large contribution to the SW bias appears when the model correctly simulates deep cloud (red bars with dense crossed hatching in Figures 11b and 11c). The main difference between the CAM5 and the WRF models is the even smaller aCRE for deep clouds in the WRF models, while their cloud regime statistics are very similar (Figure 10).
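The partitioning of a mean bias into observed-simulated regime pairs can be sketched as follows. This is a simplified illustration that assumes a per-sample cloud-related SW bias is available alongside the observed and simulated regime numbers; the function name and the sample values are hypothetical, and the binning by time of day used in Figure 11 is omitted (the same function could be applied to each time bin separately).

```python
from collections import defaultdict


def bias_by_regime_pair(obs_regime, sim_regime, sw_bias):
    """Contribution of each (observed, simulated) regime pair to the mean bias.

    Each sample's bias is divided by the total number of samples, so the
    contributions of all pairs add up to the overall mean bias.
    """
    n = len(sw_bias)
    contributions = defaultdict(float)
    for obs, sim, bias in zip(obs_regime, sim_regime, sw_bias):
        contributions[(obs, sim)] += bias / n
    return dict(contributions)


# Hypothetical samples (regime numbers 0..7, bias in W m^-2):
obs = [7, 7, 0, 4]
sim = [0, 7, 7, 4]
bias = [300.0, 80.0, -250.0, 10.0]

pairs = bias_by_regime_pair(obs, sim, bias)
overall = sum(bias) / len(bias)
for (o, s), value in sorted(pairs.items()):
    print(f"observed {o}, simulated {s}: {value:+.1f} W m^-2")
print(f"overall mean bias: {overall:+.1f} W m^-2")
```

In this made-up example a missed deep cloud event (observed 7, simulated 0) adds a positive contribution, a spurious deep cloud event (observed 0, simulated 7) adds a negative one, and a correctly simulated deep cloud with too weak radiative effect (7, 7) still adds a positive one.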
The METUM and IFS had very similar AVG cloud fraction profiles (Figures 7e and 7f ), but somewhat different regime statistics, with the high cloud and the mid/high cloud regimes simulated too frequently in the METUM and IFS, respectively. Simulating clear skies (white bars) or high cloud (yellow bars) instead of deep (densely crossed hatching) cloud events hence is a main contribution to the SW bias in the METUM (Figure 11d). These errors contribute fairly little in the IFS, for which the SW bias mainly originates from producing mid/high cloud (pink bars) when deep cloud should be present (Figure 11e).
Radiation is called only once every 3 h in the CNRM, hence the coarser temporal resolution in its diurnal cycle of the cloud regime analysis in Figure 11f. The excessive frequency of the low cloud regime draws most of the attention in this model. The main regime that contributes to the overall SW radiation bias is producing just low-level cloud (blue bar) when deep cloud is observed (dense crossed hatching). This might hint at a problem with the convective diagnosis in this model. Missing deep clouds (regime 0 instead of 7) altogether or correctly simulating deep clouds (7-7) also contributes a fair amount to the radiation bias.

Figure 11. Decomposition of the diurnal cycle of the cloud-related net SW radiation bias (as in equation (A5)) into the observed-simulated regime combination. All panels include data from the full simulation period (April-August 2011) and all hindcast days (T + 00 to T + 120) for the respective models. The height of each stacked colored bar denotes the average contribution of the respective observed-simulated regime pair to the overall C cloud at that time of the day. Note that the legend is constructed so that the color indicates the simulated regime and the hatching style matches the observed regime, as indicated in the legend to the right of the panels. The temporal intervals correspond to the radiation time step in each model. Observed-simulated regime pairs that contribute less than 5 W m⁻² are combined in the "mixed" category. Observed cloud fractions and radiation are obtained from ARSCL and RADFLUXANAL respectively. The black line overlayed on top of the bars denotes the overall C cloud for each time in the respective model.
The largest portion of the SW bias in the CANCM4 and LMDZOR models originates from missing deep cloud events (regime 7, dense hatching in Figures 11g and 11h). While the CANCM4 mainly simulates clear skies instead (white bars in Figure 11g), the LMDZOR mainly does so at the expense of high and mid/high clouds (yellow and pink bars in Figure 11h).
The partitioning of the cloud-related SW bias into observed-simulated regime pairs in the TAIESM looks very different from the CAM5 (Figure 11i versus Figure 11a). Given that the deep cloud regime is severely underrepresented in the TAIESM, rather than overestimated in the CAM5, there are hardly any negative contributions to the overall SW bias in the TAIESM. Moreover, most of the cloud-related SW bias finds its origin in missing deep clouds (densely hatched bars) and simulating clear skies, mid/high, high, midlevel or even low clouds instead. Interestingly, there also does not appear to be a significant SW bias associated with correctly simulating the deep cloud regime in the TAIESM, indicating the well-captured aCRE of this regime.

Figure 12. [...] (7) (e), low, middle, and high cloud fraction based on a random maximum overlap (f, g, and h), total, liquid, and ice water path (i, j, and k). Apart from the water paths, all data are for the full simulation period (April-August 2011) and all hindcast days (T + 00 to T + 120). The water paths are for the MC3E period only (18 April to 6 June 2011). Observations are for the Central Facility, as indicated in Table 2 and have been regridded to each model as outlined in the text. Gray shading indicates the observational uncertainty, based on the numbers in Table 2 and taking into account the rules for error propagation where necessary. The numbers to the upper right of each panel indicate the mean and standard deviation of the variable denoted in the panel below, averaged for the full simulation period and over all model grid lengths.

Cloud Property Evaluation of Deep Cloud Regime
While the scope of this paper is not to give a full diagnosis and explanation of the radiation errors in each of the models participating in the CAUSES project, a more detailed analysis of the cloud properties of the deep cloud regime is desirable, given its importance in the SW (and LW) radiative errors in all models.
The diurnal cycles in the absolute biases of a number of cloud and radiation properties associated with regime 7 are shown in Figure 12. Again, all observed properties have been regridded to their respective model grids before the bias was calculated. The numbers indicated above each panel denote the mean and standard deviation of the observed value of the variable shown in the panel, averaged over the diurnal cycle and all model grid lengths. It should be mentioned that the analysis period for the liquid, ice and total water path in this figure (Figures 12i-12k) is restricted to the MC3E IOP, given that these observations were only available during this period at the time of writing. The regime statistics for all models were very similar during this 6 week period compared to the full 5 month simulation period, however (not shown).
Consistent with the previous analysis, the frequency bias of the deep cloud regime varies widely from one model to another (Figure 12a). All models tend to underestimate the frequency of this regime in the early morning. This underestimation persists throughout the entire diurnal cycle in the TAIESM and CNRM, and even more in the CANCM4 and LMDZOR. The METUM continues to underestimate the frequency of regime [...]

Figure 12c shows the downwelling surface SW bias associated with regime 7. The bias is largest in the two WRF models, consistent with the shallow height of the red bars in Figures 10b and 10c. Magnitudes for the CAM5 and IFS are similar in the afternoon, experiencing slightly too large downwelling SW radiation for regime 7. The METUM has fairly well captured downwelling SW associated with regime 7 throughout the diurnal cycle, while the surface in the CNRM, TAIESM, CANCM4, and LMDZOR receives far too little SW when deep clouds are present.
From Figure 12d, the TOA OSR errors correlate reasonably well with the bias in the surface SW for most models (Figure 12c). This indicates that most of the surface radiation biases in the deep cloud regime can be explained by errors in the cloud top reflectivity. From the net TOA and surface (SFC) SW radiative fluxes, the SW column radiative heating can be estimated as follows:

$$SW_{\mathrm{heating}} = \left(SW_{D,\mathrm{TOA}} - SW_{U,\mathrm{TOA}}\right) - \left(SW_{D,\mathrm{SFC}} - SW_{U,\mathrm{SFC}}\right) \qquad (7)$$

where the suffixes D and U stand for downwelling and upwelling, respectively. Figure 12e provides the biases in the SW radiative heating for all models. While the uncertainties in the observed radiative heating are fairly large, most models, apart from the IFS in the afternoon, seem to underestimate the amount of SW heating during the daytime when deep clouds are present. For the WRF and CAM5, these biases in the radiative heating are consistent with the lack of cloud fraction in regime 7 (leading to subdued tropospheric SW absorption). For the other models (except the IFS), the lack of SW absorption seems at odds with the lack of SW radiation reaching the surface during regime 7 (Figure 12c). This does suggest that the deep clouds in these models are very reflective, leaving less radiation to be absorbed both at the surface and within the troposphere.
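The column heating estimate described here, net SW at the TOA minus net SW at the surface, can be sketched with a small example. The function name and all flux values are hypothetical; they only illustrate how an overly reflective cloud can reduce both the surface SW and the column absorption at the same time.

```python
def sw_column_heating(toa_down, toa_up, sfc_down, sfc_up):
    """SW absorbed in the atmospheric column (W m^-2).

    Computed as net SW at the top of the atmosphere (down minus up)
    minus net SW at the surface (down minus up).
    """
    return (toa_down - toa_up) - (sfc_down - sfc_up)


# Hypothetical deep cloud scene, observed versus a model whose cloud is too
# reflective (more outgoing SW at the TOA, less SW reaching the surface).
observed = dict(toa_down=1000.0, toa_up=500.0, sfc_down=300.0, sfc_up=60.0)
simulated = dict(toa_down=1000.0, toa_up=650.0, sfc_down=150.0, sfc_up=30.0)

heating_bias = sw_column_heating(**simulated) - sw_column_heating(**observed)
surface_bias = simulated["sfc_down"] - observed["sfc_down"]
print(heating_bias, surface_bias)
```

With these made-up fluxes both the surface SW bias and the column heating bias are negative, matching the interpretation that very reflective deep clouds leave less radiation to be absorbed at the surface and within the troposphere.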
Errors in the surface and TOA radiative properties of the deep cloud regime could have a number of underlying causes. Cloud fractions detrained from parameterized convective cores could be too small, the water paths in these clouds could be underestimated, or effective radii could be poorly represented in the radiation schemes. Diurnal cycles of the total, low, mid, and high cloud fractions are given in Figures 12b and 12f-12h. From these panels, it is clear that the models allowing too much SW to reach the surface in the deep cloud regime (i.e., CAM5, WRF, and IFS) also have too small low-level and midlevel cloud fractions for this regime (Figures 12f and 12g). The large difference between the surface SW bias in the CAM5 and the WRF models is not due to very different biases in the cloud fractions between these models. Indeed, low, mid, and high cloud cover are fairly similar between these models in the afternoon. The IFS suffers from a persistent lack of cloud associated with regime 7 during the entire diurnal cycle. The cloud fractions in the CNRM and METUM are reasonably well captured. The large overestimations in cloudiness during the deep cloud regime in the LMDZOR, CANCM4, and the TAIESM are mainly due to large overestimations in the low, mid/high, and high cloud cover, respectively (Figures 12f-12h).
The total (TWP), liquid (LWP), and ice water path (IWP) biases associated with regime 7 are shown in Figures 12i to 12k. It should be mentioned here that the simulated water paths only include hydrometeor species that are seen by the radiation scheme, while the MICROBASE-derived water paths will include all condensed water in the tropospheric column. In many models, precipitating hydrometeors such as snow, rain, or graupel are ignored by the radiative transfer. While this complicates the evaluation of the water paths, our focus on explaining biases in the radiative transfer justifies the approach used here. Indeed, given that this analysis tries to uncover how cloud errors trickle down to radiation errors, it is the cloud properties seen by the radiation that are mainly of interest here. It should be kept in mind, however, that biases in the water paths listed here could either point to issues with the cloud properties or to the omission of some water species in the radiative transfer.
Again, the water path biases explain some, but not all, of the biases seen in the surface and TOA radiation characteristics. The excessive afternoon downwelling SW at the surface and negative TOA OSR bias in some models (i.e., CAM5, WRF, and IFS) correspond well with an important negative bias in TWP (Figure 12i), including both the LWP (Figure 12j) and the IWP (Figure 12k). Most other models also suffer from negative biases in their early morning and afternoon TWP. Water paths are better captured in the late morning in most models, and overestimated in the CANCM4 and the LMDZOR, which is mainly due to too large LWP (Figure 12j). Interestingly, the METUM has a fairly negative TWP bias throughout the day in the deep cloud regime, due to both too little liquid and too little ice. This suggests that the well-captured surface SW and TOA OSR in the METUM (Figures 12c and 12d) hide a compensating error between slightly too large cloud fractions and too small water paths (Figures 12b and 12i). Conversely, the IFS has slightly better captured TWP (Figure 12i), but its cloud fraction is severely underestimated (Figure 12b). The large difference between the WRF and CAM5 surface SW and TOA OSR biases seems to correspond with smaller LWP in the WRF models (Figure 12j), despite their identical physics packages. The IWP and cloud fractions are indeed very similar between these three models. It needs to be further investigated whether this difference is predominantly an effect of the different dynamics between the CAM5 and the WRF models, or an impact of their different resolutions.

Evaluation of Surface Precipitation
The various issues associated with the characteristics of the deep cloud regime all hint at important deficiencies in the triggering of deep convection in many of the models. For some models, such as the CANCM4, the dry IWV bias suggests that the large-scale dynamics might not be captured well enough to trigger convection. For the other models, it should be further understood whether the deep convective parameterization is not capable of triggering local diurnal-heating-induced convection, or whether the propagating systems over the Great Plains, initiated in the lee of the Rockies (Klein et al., 2006), are poorly represented. Apart from detrained cloud fraction and condensate, the convection parameterization also produces precipitation (and changes in the atmospheric heating).
A closer look at the diurnal precipitation characteristics might hint at answers to some of the above questions. Figure 13 shows the diurnal cycle of the mean hourly precipitation rates and the frequency of rainfall rates larger than 0.5 mm h⁻¹ for the models and as observed by the ARMBE rainfall product, all averaged over the entire simulation period from April to August 2011. Note that the precipitation frequency is expected to change as the averaging area increases. Therefore, the observed frequencies are calculated both for 0.25° × 0.25° and 2.0° × 2.0° averaging domains, roughly covering the grid spacings of the models investigated. The gray-shaded area in Figure 13 hence delineates the minimum and maximum observed precipitation frequencies, considering varying averaging domains. Table 5 provides the total rainfall amounts as observed and simulated by all models. The observed precipitation rates and frequencies exhibit a distinctive peak in the evening and a second peak in the early morning, both coinciding with episodes of enhanced observed cloudiness (Figure 7a). Apart from the METUM, none of the models seem to capture the early morning peak in the precipitation rates (Figure 13, top). The afternoon peak in mean precipitation rates is largely overestimated in the WRF models, the IFS, the METUM, and the CANCM4, and its timing is too early in all models, although the timing is slightly better captured in the IFS and the LMDZOR (Figure 13, top). The latter is consistent with, for example, Rio et al. (2009), who showed that shallow cumulus in the LMDZOR played a key role in preconditioning the atmosphere for deep convection, leading to a delay in the onset of the rainfall maximum from midday to late afternoon. Despite missing the nocturnal precipitation peak, many models compensate for this error by having too much precipitation in the afternoon, so that their total precipitation amount is fairly well captured or even overestimated (Table 5).
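The dependence of the precipitation frequency on the averaging area can be illustrated with a small sketch. It composites hourly rain rates by hour of day and counts how often the 0.5 mm h⁻¹ threshold is exceeded, first for a single fine-scale cell and then for an area mean. The helper names and the rain rates are hypothetical; only the threshold comes from the text.

```python
from collections import defaultdict

THRESHOLD_MM_PER_H = 0.5  # rain-rate threshold quoted in the text


def diurnal_precip_stats(samples, threshold=THRESHOLD_MM_PER_H):
    """Composite (hour_of_day, rate) samples by hour of day.

    Returns {hour: (mean_rate, frequency_of_rates_above_threshold)}.
    """
    by_hour = defaultdict(list)
    for hour, rate in samples:
        by_hour[hour % 24].append(rate)
    stats = {}
    for hour, rates in sorted(by_hour.items()):
        mean_rate = sum(rates) / len(rates)
        freq = sum(1 for r in rates if r > threshold) / len(rates)
        stats[hour] = (mean_rate, freq)
    return stats


def area_mean(cell_series):
    """Average several equally long per-cell series into one area-mean series."""
    return [sum(vals) / len(vals) for vals in zip(*cell_series)]


# Hypothetical rain rates (mm/h) at 15 LT on three days for two adjacent cells.
hours = [15, 15, 15]
cell_a = [0.8, 0.0, 1.2]
cell_b = [0.0, 0.0, 0.6]

fine = diurnal_precip_stats(zip(hours, cell_a))
coarse = diurnal_precip_stats(zip(hours, area_mean([cell_a, cell_b])))

print(fine[15])    # single cell: threshold exceeded on 2 of 3 days
print(coarse[15])  # area mean: threshold exceeded on 1 of 3 days
```

In this example, averaging a localized shower with a dry neighboring cell pulls the area-mean rate below the threshold, which lowers the frequency. A fixed threshold therefore has to be compared at a matching spatial scale.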
It is worth noting that the afternoon rainfall is too heavy even in the models that underestimate the frequency of the deep cloud regimes at that time, such as the METUM, the CNRM, the CANCM4, and the LMDZOR (Figures 10d and 10f-10h). Conversely, while the CAM5 is one of the models with the largest frequency of the deep cloud regime (Figure 10a), it is among the models raining the least frequently and the least intensely around that time (Figure 13, top and bottom). Interestingly, models that have a similar convection parameterization, such as CAM5, the WRF models, and the CANCM4, behave fairly similarly in terms of the timing of the precipitation, but exhibit large differences in terms of the precipitation amounts. Indeed, these four models experience the largest rain rates in the afternoon and miss the nocturnal precipitation peak, but the afternoon rainfall in the WRF models exceeds the CANCM4 rainfall by far.
Note. Model data have been averaged over the five hindcast days.
While far from conclusive, this analysis does suggest that most of the models trigger precipitation more than often enough in the afternoon, but often lack the deep clouds that are observed with this precipitation. Nearly all models struggle to capture the nocturnal precipitation peak, which is likely related to propagating convective systems (Surcel et al., 2010; Trenberth et al., 2010), driven by cold pool dynamics and low-level jets. These mechanisms are known to be poorly represented in deep convective parameterizations, which likely explains their underrepresentation. The afternoon peak is more likely locally developed convection, associated with the diurnal heating of the surface (Zhang & Klein, 2010). Many models overestimate the intensity of this peak and place it too early. Possibly, this issue is related to the excess SW radiation reaching the surface in all models, heating the surface more than observed, and hence setting off convection too readily, but this needs further investigation. The fact that many models rain too heavily in the afternoon, while not detraining enough cloud from their convection schemes, suggests that precipitation efficiencies might be too high in these models over the SGP. Further study is needed to confirm this hypothesis, but increasing the detrainment of cloud condensate in the convective environment and reducing the amount of condensate raining out to the surface could possibly improve many of the radiation issues highlighted. However, if not combined with an improved representation of the nocturnal precipitation peak, this might lead to dry biases in many models and errors in the surface energy balance. This indicates that solving one issue in the models, but not tackling a compensating error at the same time, could lead to deteriorations of the overall model performance.

Discussion and Conclusions
This paper constitutes the second part of a four-part series on the midlatitude continental warm bias seen in many NWP and climate models, with a particular focus on the U.S. SGP. In the framework of the CAUSES project, the first part of this series has documented the temporal and spatial characteristics of the warm bias in 11 GCMs and limited area models. Each of these models was initialized daily from ERA-Interim reanalysis between April and August 2011 and run freely in hindcast mode for 5 days, allowing for a detailed evaluation against the wealth of observations collected by the U.S. Department of Energy ARM Program near the Great Plains.
This second paper takes a closer look at the radiation biases that manifest themselves in the models taking part in the CAUSES project. More specifically, the two main objectives of this study are to perform (i) a systematic attribution study of radiation biases to errors in the surface properties, cloud characteristics, atmospheric water vapor, and aerosol and (ii) a more detailed analysis of the role of clouds and cloud regimes in the creation of surface radiation biases. Figure 14 provides an overview table of the key findings for all models. The large positive net SW biases found in all models are predominantly caused by cloud and surface albedo deficiencies. From the first two columns in Figure 14, cloud issues tend to dominate over albedo issues in all models, apart from the CAM5. Most of the cloud contribution to the net SW radiation bias originates from issues with the deep cloud regime, defined as concurrently having cloud in the low, middle, and high troposphere. This regime makes up 40-80% of the total cloud contribution in all models, apart from the CAM5 (third column in Figure 14). The reason why this regime contributes so much varies between the models, however. Some models, like the WRF simulations or the IFS, produce this regime often enough or even more frequently than observed ($\mathrm{FREQ}_7$), but with very weak cloud radiative effects ($\mathrm{CRE}_7$). Other models, including the METUM, CNRM, and TAIESM, capture the $\mathrm{CRE}_7$, but underestimate the frequency of the deep cloud regime ($\mathrm{FREQ}_7$). The CANCM4 and LMDZOR miss most of the deep cloud events ($\mathrm{FREQ}_7$), but largely overdo their $\mathrm{CRE}_7$ when they are present. The contribution of the deep cloud regime to radiation errors in the CAM5 is small due to a compensating error between too large $\mathrm{FREQ}_7$ and too small $\mathrm{CRE}_7$ (Figure 14).
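To see how a frequency error and a cloud radiative effect error can cancel, the regime-mean radiative effect can be written as frequency times conditional CRE, and its bias split into a frequency term, a CRE term, and a cross term. This is a generic decomposition offered as an illustration under that assumption; it is not necessarily the procedure used in this study, and the function name and numbers are hypothetical.

```python
def regime_bias_terms(freq_model, cre_model, freq_obs, cre_obs):
    """Split the bias in (frequency x CRE) for one cloud regime.

    freq_* : fraction of time the regime occurs (0-1)
    cre_*  : mean surface SW cloud radiative effect when the regime
             occurs (W m-2, negative for a cooling effect)
    """
    d_freq = freq_model - freq_obs
    d_cre = cre_model - cre_obs
    return {
        "frequency": d_freq * cre_obs,  # regime occurs too often / too rarely
        "cre": freq_obs * d_cre,        # regime too bright / too dim
        "cross": d_freq * d_cre,        # covariance of the two errors
        "total": freq_model * cre_model - freq_obs * cre_obs,
    }


# Hypothetical case: the regime occurs twice as often as observed,
# but with half the observed radiative effect.
terms = regime_bias_terms(freq_model=0.20, cre_model=-200.0,
                          freq_obs=0.10, cre_obs=-400.0)
for name, value in terms.items():
    print(f"{name:9s} {value:7.1f} W m-2")
# frequency is about -40, cre about +20, cross about +20, total about 0
```

The three terms sum to the total by construction, so a near-zero total can hide sizable individual errors, as in the compensating case described above.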
VAN WEVERBERG ET AL. CAUSES: ATTRIBUTION OF RADIATION BIASES
Figure 14. Qualitative summary table for all models. The first three columns provide the contribution to the net SW radiation bias from clouds in general ($C_{\mathrm{cloud}}$; first column), surface albedo ($C_{\alpha}$; second column), and the deep cloud regime ($C_7$; third column). The width and height of the squares scale with the magnitude of these contributions and are filled when the contribution is positive and empty when the contribution is negative. The last four columns represent the biases in a few key properties. Black-filled squares that are larger than the gray outline denote overestimations, while black squares that are smaller than the gray outline denote underestimations. Provided are the frequency bias of the deep cloud regime ($\mathrm{FREQ}_7$; fourth column), the cloud radiative effect bias of the deep cloud regime ($\mathrm{CRE}_7$; fifth column), the daytime precipitation bias ($\mathrm{PREC}_{\mathrm{day}}$; sixth column), and the overall precipitation bias ($\mathrm{PREC}_{\mathrm{all}}$; seventh column).
Interestingly, this does not imply that all models miss deep convection during the daytime. Indeed, a surprising finding of this study is that many models rain too much and too often during the afternoon ($\mathrm{PREC}_{\mathrm{day}}$, Figure 14), while at the same time having too much absorbed SW radiation at the surface. This might point to issues of excessive convective precipitation efficiencies in many of the models and too weak detrainment of convective cloud.
It should also be mentioned that the finding of sufficient daytime rainfall, but with weak cloud radiative impact, does not rule out that there are still underlying issues with the surface evaporative fraction. Indeed, while the daytime precipitation tends to be too large, many models miss the large observed nocturnal precipitation peak, leading to too small total rainfall amounts in all but three models ($\mathrm{PREC}_{\mathrm{all}}$, Figure 14) and possible impacts on the soil. Moreover, the excess SW radiation during the daytime in all models, be it due to surface albedo or cloud deficiencies, will also likely impact the surface energy balance. These issues will be further explored in the third part of this paper series.
As with most model evaluation studies, there are substantial uncertainties associated with many of the observations used here, in particular, associated with overlapping the ARSCL value-added product with radiometer measurements and satellite products (Qian et al., 2012). However, many of the discrepancies between the models and the observations and between the different models are much larger than these uncertainties and appear robust to slight changes to the assumptions in our methods.
While this study has been able to pinpoint some first-order mechanisms leading to severe biases in the surface radiation balance in all models participating in CAUSES, there are many outstanding issues yet to be resolved. In particular, it should be understood what is causing the weak deep cloud CRE in many models. For some models, this appears to be partly a precipitation efficiency issue (too heavy afternoon rain, but too little cloud), while for others the large-scale environment might play a significant role as well. Moreover, in some models there might be room for improvement by including precipitating hydrometeors and more realistic particle properties in the radiative transfer. It should be teased out whether the excessive afternoon rainfall in many models is induced by the excessive surface heating or whether issues in the convection parameterizations are to blame for these excessive rainfall rates. A strategy to understand the role of convective triggering in the cloud and radiation biases should include the analysis of convection-permitting versions of those models presented here that can be run in that manner. Sensitivity studies focused on the assumptions in the convection parameterization and on the coupling of convection and other cloud physics are much needed as well, leveraging the wealth of observations from the MC3E campaign. Furthermore, additional analysis of the propagation of convective systems initiated in the lee of the Rockies is needed, both with convection-permitting and nonconvection-permitting models.
Future studies, focused on specific problems in each of the models, should bring more clarity and provide possible ways forward in resolving some of the issues that this paper has highlighted. Although more work is required to actually fix the biases seen in these models, this paper has been able to help identify the root causes of the radiation errors in all models concerned, which helps focus model development and is a first step toward improving the representation of clouds and radiation over the midlatitudes in these models. In the future, the methodologies developed in this paper could be easily applied to improved versions of these models or other models that experience similar issues with clouds and radiation.
Hence, the contribution of surface albedo errors to the overall net SW radiation bias can be written as [...]
Given the estimates of observed clear-sky SW radiation by Long and Ackerman (2000), the contribution of biases in the cloud properties to the overall net SW radiation biases can be written as [...] where $\mathrm{SWCS}_{MD}$ and $\mathrm{SWCS}_{OD}$ are the clear-sky SW downwelling radiation for the model and the observations, respectively. Note the multiplication with $(1 - \alpha_O)$ in this equation (as inherited from equation (A3)).
The remaining component now will be the clear-sky SW downwelling radiation bias: [...] This component will contain contributions from biases in the IWV, aerosol, and inaccuracies in the clear-sky radiative transfer. Focusing on the IWV first, it is instructive to compare the bias in the clear-sky radiation (equation (A6)) with the concurrent bias in IWV. For every output time (every hour in most models), the clear-sky net SW radiation bias can be assigned to one of two composites. A first composite contains all analysis times when the IWV is biased (negatively or positively) and a second composite encompasses analysis times with unbiased IWV. The threshold between the two composites has been set here to a bias of −5% or +5% of the observed IWV value, respectively. Although this is a fairly arbitrary threshold, variations between 1 and 10% did not change our conclusions.
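The albedo, cloud, and clear-sky terms described above can be checked for closure numerically. The sketch below assumes that the net SW flux is (1 − albedo) times the downwelling flux and that the downwelling differences are weighted by (1 − observed albedo), as the note on that factor suggests. This is one split that closes exactly, not a verified transcription of the appendix equations, and all variable names and values are hypothetical.

```python
def net_sw_bias(sw_md, sw_od, alpha_m, alpha_o):
    """Model-minus-observation bias in net surface SW (W m-2)."""
    return (1.0 - alpha_m) * sw_md - (1.0 - alpha_o) * sw_od


def sw_bias_contributions(sw_md, sw_od, swcs_md, swcs_od, alpha_m, alpha_o):
    """Split the net SW bias into albedo, cloud, and clear-sky terms.

    sw_*   : all-sky downwelling SW (model M, observed O)
    swcs_* : clear-sky downwelling SW (model M, observed O)
    alpha_*: surface albedo (model M, observed O)
    """
    weight = 1.0 - alpha_o
    return {
        # Albedo error acting on the model's downwelling SW.
        "albedo": -(alpha_m - alpha_o) * sw_md,
        # Bias in the cloud effect (all-sky minus clear-sky downwelling).
        "cloud": weight * ((sw_md - swcs_md) - (sw_od - swcs_od)),
        # Bias in the clear-sky downwelling SW (water vapor, aerosol, ...).
        "clear": weight * (swcs_md - swcs_od),
    }


# Hypothetical values (W m-2 and unitless albedo), for illustration only.
inputs = dict(sw_md=600.0, sw_od=500.0, swcs_md=800.0, swcs_od=790.0,
              alpha_m=0.18, alpha_o=0.20)
parts = sw_bias_contributions(**inputs)
total = net_sw_bias(inputs["sw_md"], inputs["sw_od"],
                    inputs["alpha_m"], inputs["alpha_o"])

print(parts)
print(total, sum(parts.values()))  # the two numbers agree to rounding
```

With these made-up inputs the cloud term dominates (about 72 of 92 W m⁻²), the albedo term is about 12 W m⁻², and the clear-sky term about 8 W m⁻².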
Assuming that the bias in IWV is independent from any remaining biases associated with the clear-sky SW radiation (e.g., aerosol), the contribution from a biased IWV can be estimated as follows: [...] where $C_{\mathrm{clear}}^{\mathrm{biasiwv}}$ and $C_{\mathrm{clear}}^{\mathrm{nobiasiwv}}$ are the clear-sky contributions for the biased and unbiased IWV composites, respectively, calculated following equation (A6). The parameter $\mathrm{freq}_{\mathrm{biasiwv}}$ is the frequency (as a fraction) of the biased composite.
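The compositing step can be sketched as follows. The sketch assumes that the IWV contribution is the frequency of the biased composite times the difference between the mean clear-sky contributions of the biased and unbiased composites. That form follows from writing the overall clear-sky mean as a frequency-weighted sum of the two composites and subtracting the unbiased mean, but it is an inference from the definitions above rather than a quoted equation. The function name and the data are hypothetical.

```python
def iwv_contribution(records, threshold=0.05):
    """Estimate the IWV contribution to the clear-sky radiation bias.

    records  : iterable of (iwv_model, iwv_obs, c_clear), one per output time
    threshold: relative IWV bias separating the composites (5% in the text)
    Returns (c_iwv, residual), both in the units of c_clear.
    """
    biased, unbiased = [], []
    for iwv_model, iwv_obs, c_clear in records:
        rel_bias = (iwv_model - iwv_obs) / iwv_obs
        (biased if abs(rel_bias) > threshold else unbiased).append(c_clear)
    if not biased or not unbiased:
        raise ValueError("both composites need at least one sample")

    n_total = len(biased) + len(unbiased)
    freq_biased = len(biased) / n_total
    mean_biased = sum(biased) / len(biased)
    mean_unbiased = sum(unbiased) / len(unbiased)

    c_iwv = freq_biased * (mean_biased - mean_unbiased)
    c_clear_mean = (sum(biased) + sum(unbiased)) / n_total
    return c_iwv, c_clear_mean - c_iwv


# Hypothetical hourly samples: (model IWV, observed IWV, clear-sky term).
samples = [
    (30.5, 30.0, 2.0),   # +1.7% IWV bias  -> unbiased composite
    (29.0, 30.0, 4.0),   # -3.3%           -> unbiased composite
    (26.0, 30.0, 10.0),  # -13.3%          -> biased composite
    (25.0, 30.0, 12.0),  # -16.7%          -> biased composite
]
c_iwv, c_residual = iwv_contribution(samples)
print(c_iwv, c_residual)
```

Under this assumption the residual equals the mean clear-sky contribution of the unbiased composite, which is what remains once the IWV bias is removed.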
All remaining potential contributions to the overall net SW radiation bias are contained in the residual contribution: [...] It should be mentioned that this last component could be further disentangled if aerosol optical depths had been available for the models, which unfortunately is not the case. Hence, this last residual component contains any contributions from aerosol, as well as potential inaccuracies in the radiation codes (e.g., Pincus et al., 2015) and inaccuracies from misassignments of clear and cloudy skies in the observations.
A similar approach can be followed for the net LW radiation bias, so that [...] where the $C_{\mathrm{surf}}$ is [...] The terms in equation (A10) no longer contain the albedo factor as for the SW contributions, and the observed clear-sky downwelling longwave ($\mathrm{LWCS}_{OD}$) is estimated following Long and Turner (2008).

Appendix B: Attribution of Net LW Bias to IWV
In section 4.2.2, it was found that some models experience a positive contribution from IWV to the net LW bias, despite these models being drier than observed. To understand this apparently contradictory behavior in these models, it is instructive to have a closer look at the scatter plot shown in Figure B1. This figure shows the correlation between the IWV bias and the clear-sky downwelling LW radiation bias throughout the entire simulation period (April-August 2011) and the full 5 day hindcast (T + 00 to T + 120) for the CAM5. The colors in this figure represent the simulated near-surface relative humidity (RH) associated with each data point. From this figure, it appears that when the IWV is biased dry (data points to the left of the zero-IWV-bias line), the clear-sky downwelling LW radiation bias is not much reduced compared to when the IWV is well captured (gray dots). However, when the IWV is biased wet (data points to the right of the zero-IWV-bias line), the clear-sky downwelling LW radiation bias sharply increases. Moreover, these latter data points also seem to be associated with systematically larger relative humidity (as shown by the red colors for these data points in Figure B1). This finding is relevant since hygroscopic particles start to accumulate liquid water and a haze starts to form as soon as the relative humidity reaches about 75% (also bear in mind that 60% of the downwelling LW radiative impact of the atmospheric WV originates from the bottom 100 m of the atmosphere in clear-sky conditions; McFarlane et al., 2013). Although still considered a clear-sky environment, this haze has nonnegligible effects on the downwelling LW radiation (Long & Turner, 2008). A dry bias in IWV in the CAM5 is typically associated with near-surface RH values closer to 50-60%, and under these circumstances, slight differences in the water vapor content of the atmosphere will matter less than when the near-surface RH approaches 70-80%.
Figure B1. Scatter plot of the bias in integrated water vapor (IWV) versus the bias in clear-sky downwelling longwave radiation for the CAM5 against the Atmospheric Radiation Measurement Best Estimate data (Tang & Xie, 2015). Data points are from the entire simulation period (April-August 2011, with hourly intervals) and the five hindcast days (T + 00 to T + 120). Colors indicate the simulated near-surface (2 m) relative humidity for each respective data point. Only the data points with an IWV bias of more than ±5% have been colored. The shape of the distribution and the average of the IWV bias and the clear-sky downwelling longwave radiation bias are indicated by the black thick solid lines and the thin vertical and horizontal dotted lines, respectively.