The Added Value of Large-eddy and Storm-resolving Models for Simulating Clouds and Precipitation

More than one hundred days were simulated over very large domains with fine (0.156 km to 2.5 km) grid spacing for realistic conditions to test the hypothesis that storm (kilometer) and large-eddy (hectometer) resolving simulations would provide an improved representation of clouds and precipitation in atmospheric simulations. At scales that resolve convective storms (storm-resolving for short), the vertical velocity variance becomes resolved and a better physical basis is achieved for representing clouds and precipitation. Similarly to past studies we found an improved representation of precipitation at kilometer scales, as compared to models with parameterized convection. The main precipitation features (location, diurnal cycle and spatial propagation) are well captured already at kilometer scales, and refining resolution to hectometer scales does not substantially change the simulations in these respects. It does, however, lead to a reduction in the precipitation on the time-scales considered – most notably over the ocean in the tropics. Changes in the distribution of precipitation, with less frequent extremes are also found in simulations incorporating hectometer scales. Hectometer scales appear to be more important for the representation of clouds, and make it possible to capture many important aspects of the cloud field, from the vertical distribution of cloud cover, to the distribution of cloud sizes, and to the diel (daily) cycle. Qualitative improvements, particularly in the ability to differentiate cumulus from stratiform clouds, are seen when one reduces the grid spacing from kilometer to hectometer scales. At the hectometer scale new challenges arise, but the similarity of observed and simulated scales, and the more direct connection between the circulation and the unconstrained degrees of freedom make these challenges less daunting. This quality, combined with already improved simulation as compared to more parameterized models, underpins our conviction that the use and further development of storm-resolving models offers exciting opportunities for advancing understanding of climate and climate change.


Introduction
The expectation that Earth's surface temperatures will continue to increase raises pressing questions. How will this warming be distributed spatially and temporally? What does it imply for the hydrological cycle on regional scales? And what are the implications of both for society and ecology? Climate models have been developed to provide answers to these questions. However, even after decades of development and extensive efforts to fit them to the present day climatology their biases remain large, often larger than the climate-change signals they predict (Palmer and Stevens 2019). This situation -what some authors have described as a deadlock -calls their usefulness into question. Progress in reducing model biases has been slow (Jakob 2010, Knutti and Sedlácek 2012)far too slow to give confidence that continuing along this path will bring success in a time-frame that is needed by society. New approaches are required.
An example of a new approach would be to develop climate models capable of directly simulating important processes that conventional models must parameterize (Tomita et al. 2005;Satoh et al. 2019). By replacing some of the most uncertain aspects of conventional models by representations better grounded in the laws of physics, such approaches provide an improved scientific basis for decision making. They are also simpler, because they embody fewer equations. Despite their obvious appeal the computational cost and slower workflow of such models is a disadvantage as compared to the computationally less ambitious models. Hence, before investing too heavily in the development of these new types of models, it would be helpful to determine which shortcomings one expects to address. This line of thought motivated the proposal of a German national project, called High Definition Clouds and Precipitation for Advancing Climate Prediction, HD(CP) 2 . HD(CP) 2 posed the question whether climate models developed to run on scales of hectometers or kilometers could constitute a possible way around the aforementioned modeling deadlock. The authors' answer to this question andin a distilled form -the experiences upon which this answer is based, are presented herein.
The idea that simulations at kilometer scales might provide a sound basis for representing precipitation processes has a strong empirical foundation. Studies going back more than twenty years (Weisman et al. 1997) have been demonstrating the ability of models to explicitly resolve convection using grid meshes on the order of a few kilometers. These approaches (see also the review by Guichard and Couvreaux 2017) have been so successful, see e.g., Miura et al. (2007) and Miyamoto et al. (2013), that in many countries operational weather prediction systems now incorporate them (e.g., Lean et al. 2008;Baldauf et al. 2011;Hirahara et al. 2011), and many centers have begun testing systems capable of resolving convective storms, globally (Weber and Mass 2019;Düben et al. 2020). This success has likewise motivated initiatives -such as the UK CASCADE project, a forerunner of HD(CP) 2 -to use realistically configured kilometerscale large-domain simulations to study the interaction of convection with large-scale circulations (e.g., Holloway et al. 2012;Marsham et al. 2013), and given new impetus to storm-resolving studies of regional climate (Prein et al. 2015;Kendon et al. 2017Kendon et al. , 2019Leutwyler et al. 2017). Simulations on global domains using NICAM (Satoh et al. 2017), albeit generally with slightly coarser (7 km to 14 km) grids, or using super-parameterization (Khairoutdinov et al. 2005;Arnold and Randall 2015), have also demonstrated global benefits of an explicit representation of convection. Continuous increases of computational capacity are opening a frontier to studies with yet finer resolution, as HD(CP) 2 has begun to extend the regional approaches to domains with hectometer grid spacings.
Parallel to these developments has been the growing awareness of the challenges faced by efforts to parameterize convection. What was once seen as a conceptually straightforward, even elegant, question, is increasingly seen as difficult and ill-posed. Simply visualizing a storm system as simulated on a 156 m mesh and comparing it to a parameterized version of the same case ( Fig. 1) illustrates this point. Atmospheric moist convection is comprised of many more elements than simply mass fluxes responding to forcing (cf. Arakawa and Schubert (1973)). As such parameterizations are increasingly being asked to orchestrate a symphony of elements -updrafts, downdrafts, rain-shafts, cloud shields and their radiative properties, cold-pools and their gust fronts -in ways that are general enough to capture the different conditions of different storms (e.g., Grandpeix and Lafore 2010). This is a daunting task. By contrast, by solving a handful of equations describing material conservation and force balances, and coupling them to relatively simple parameterizations of cloud microphysical and small-scale turbulent processes, major features of a storm and the interplay of its different elements, emerge naturally. With the advent of considerably more finely resolved (Δ x = 156 m) large-domain simulations, as performed within HD(CP) 2 and enabling the visualization in Fig. 1, it becomes possible to ask to what extent these fine-scales manifest themselves in improved representations of precipitating convective systems. Some studies have begun to explore these questions using relatively small domains and idealized simulations (Bryan et al. 2003;Jeevanjee 2017). HD(CP) 2 was the first project to explore these questions in more realistic situations for a variety of conditions in comparison to models with parameterized convection and abundant observations.
In the case of clouds, large-eddy simulations (LES) have long established the importance of hecto (and deca-) meter, and even finer (Mellado et al. 2018), scales. However, for reasons of computational expense most LES have been performed for idealized situations, generally for short periods of time (hours) and over comparatively small (kilometers to tens of kilometers) domains (Moeng 1986;Siebesma et al. 2003;Rieck et al. 2012;Seifert et al. 2015). Even approaches such as the LES ARM Symbiotic Simulation and observation workflow (LASSO) that are centered around intensive field measurements still adopt semi-idealized approaches using small domains that necessitate periodic boundary conditions (Fast et al. 2019). Super-parameterization has begun to allow a broader look at how an explicit representation of clouds couples to large-scale circulations (e.g., Parishani et al. 2018), but still using individual subdomains that remain small and idealized. The extent to which basic features of observed clouds, over large domains with realistic forcing and a realistic diel cycle, can be captured at hectometer scales has been much less explored. Our experience has been that the cloud-field can often be simulated in ways that appear quite real-istic, for instance as illustrated by Fig. 2. Lacking is a more quantitative comparison. By simulating realistic cases, over large domains, the present study is able to use data over a wide variety of conditions to demonstrate the added value of kilometer and hectometer scale representations of clouds.
In an earlier paper, Heinze et al. (2017) described prototype simulations conducted at the end of the first phase of the HD(CP) 2 project. Here we extend their analysis. The present approach differs from this and other scientific studies in that we make opportunistic use of a wide range of simulations, many performed for different specific purposes, in an attempt to distill more general insights. Some of the points we make, for instance, relating to the differences between parameterized and explicit representations of convective precipitation, are less novel, but are presented to corroborate and extend the existing literature on the subject, also because the comparisons to measurements that are herein made, are in most cases new. Greater emphasis is placed on the study of cloud processes, as this work goes well beyond the state-ofthe-art. The manuscript also emphasizes how simulations of scales of motion comparable with those observed greatly enhance the bandwidth between the modeling and observational communities. This enriches the present analysis and provides a footing for better addressing some important deficiencies that even a global LES would not solve The reference for the simulations, which are also archived and made available to the community for subsequent analysis, is found in Section 2. In this section it is argued that the distinction between storm-resolving simulations and approaches based on parameterized convection is that the former resolve the bulk of the energy in the field of vertical motions, the component of the velocity-vector most tightly coupled to clouds and precipitation. Observational data sources are introduced in Section 3 and used in Sec-tion 4 to demonstrate that untuned 1 simulations with grid spacings ranging from 156 m to 2.5 km capture basic aspects of the energy and mass budgets with an accuracy commensurate to what models with convective and cloud parameterizations can achieve after fine tuning. In Section 5 the precipitation component of the HD(CP) 2 hypothesis is addressed, in Section 6 the cloud component. Even at hectometer scales, important processes remain unresolved; i.e., the behavior of the simulations with respect to features still dependent on unresolved processes, such as radiative energy transfer, cloud microphysical processes, or small scale mixing in the presence of stable stratification. In Section 7 we discuss how the remaining deficiencies of hectometer and kilometer scale simulations are more amenable to observational constraints. A summary of the results and some broad conclusions are presented in Section 8.

Simulations to resolve atmospheric convection
The ICON (ICOsahedral Non-hydrostatic) modeling framework was co-developed by the German national weather service (DWD, Deutsche Wetterdienst) and the Max Planck Institute for Meteorology for weather and climate simulations . It was further developed and applied to LES (Dipankar et al. 2015;Heinze et al. 2017) and convective-stormresolving (or storm-resolving for short) simulations both over large regional ) and global domains (Stevens et al. 2019a, b;Hohenegger et al. 2020). These are referred to as the LEM and SRM configurations, respectively. The LEM version is run as a large-eddy simulation model with realistic topography and open boundary conditions nudged on the lateral boundaries to its forcing data with grid spacings of 156, 312, and 625 m. Physical parameterizations are limited to the representation of land-surface processes, three-dimensional mixing by small-scale turbulence, cloud microphysical processes and radiative transfer as described by Heinze et al. (2017). The simulations over Germany (DE) are initialized from COSMO-DE (Consortium for Smallscale Modeling) data (Baldauf et al. 2011). This model is run without explicit deep moist convection (but unlike the ICON-SRM or ICON-LEM, it does make use of a shallow convection scheme) and qualifies as an SRM; it is initialized by its larger domain and more In this color image, the 0.6 µm reflectance, R6, was used for the red channel, the 0.8 µm reflectance, R8, for the green channel and 0.5(R6 + R8) for the blue channel. The bottom visualizes the cloud scene using ray tracing (e.g., Mayer 2009) as an observer would see it from the surface in the cyan marked position delineating the field of view, on the top image.
coarsely resolved counterpart COSMO-EU. For a detailed description of how ICON-LEM was configured and of how the simulations were performed, the reader is referred to the manuscript by Heinze et al. (2017). The ICON-SRM version is run on 1.25 km and 2.5 km grid meshes. It differs from the ICON-LEM (as described by Heinze et al. 2017) in that the three-dimensional turbulence scheme is replaced by a boundary-layer parameterization and a turbulence mixing scheme that operates only on vertical columns. It differs from the ICON-NWP model in that it does not use parameterization for moist convection, and has no parameterization of shallow moist convection. The SRM uses the one-moment cloud microphysical scheme with graupel described by Baldauf et al. (2011), as also used by COSMO-DE. The SRM thus differs from the LEM that uses the two moment representation of cloud microphysics (Seifert and Beheng 2006). The initial and boundary data for the SRM are taken from the Integrated Forecasting System (IFS) of the European Centre for Medium-Range Weather Forecasts. Further details of the ICON-SRM as con-figured for this study can be found in Klocke et al. (2017) and Hohenegger et al. (2020).
In addition, COSMO was used to provide an SRM reference for comparison to the LEM simulations over the DE domain. For these (what we call COSMO-SRM) simulations, boundary conditions and initial data were taken from COSMO-EU (or ICON-EU for the 2017 simulation). COSMO-SRM is very similar to the operational COSMO-DE model, whose output is used to initialize the ICON-LEM. The main differences is that COSMO-SRM followed the LEM output protocol, and used the same two-moment microphysics used by the LEM, rather than the one-moment scheme used by COSMO-DE.
As simulations were being performed over a time period of approximately three years, bugs were identified and resolved, so that different simulations were run with different code versions as improvements were incorporated. Important updates are noted in Table 1 which provides an overview of all the simulations. The domains over which the simulations were performed and the number of simulated case  (Stevens et al. 2013) and ICON-ECHAM, which is the climate version of ICON using the ECHAM6 physics package (Giorgetta et al. 2018). ICON-NWP and ICON-ECHAM only share the same dynamical core and computational infrastructure, the ways in which subgrid-scale physical processes are parameterized differs substantially, also in terms of the applications for which they have been tuned. ICON-NWP was run with a grid spacing of 40 km. This particular resolution was chosen because it is comparable to that of the finest resolution models used in the framework of the Coupled Model Intercomparison Project CMIP6 (Eyring et al. 2016), but significantly coarser than current operational global deterministic NWP system used by DWD, and for which the physics has been tuned. ECHAM was run with a spectral-triangular truncation of T127 that results in 384 spectral-transform points along the equator, equivalent to approximately 100 km grid spacing. This resolution is state-of-the-art for decadal and centennial prediction systems. ICON-NWP, ICON-ECHAM and ECHAM were initialized by IFS data of the atmospheric and surface state and run forward in time for the same time-periods as indicated in Table 1. Running climate models as one would run a numerical weather prediction model, by initializing it with observed weather and analyzing its solutions on time-scales of hours to days, is known as Transpose AMIP (Williams et al. 2013). AMIP stands for the Atmospheric Model Intercomparison Protocol (Gates 1992), which evaluates the climate of atmospheric models forced by specified sea-surface temperatures. The Transpose AMIP approach applied to the global models differs from the treatment of limited-area SRM and LEM simulations in that the latter are continually updated at their boundaries. For the quantities we look at, and given the size of the domain and the shortness of the simulations, we have no reason to believe that this makes a large difference, but it leads to a less 'clean' comparison and should thus be kept in mind.
The main difference between an SRM or LEM and conventional general circulation models (GCMs), which run at scales where convection has to be parameterized, is that the former resolve dynamics in the third (vertical) dimension, i.e., the vertical motion and its variability. This is illustrated in Fig. 4, which illustrates the kinetic energy for the horizontal (E u, v )   (Koshyk and Hamilton 2001;Terasaki et al. 2009;Skamarock et al. 2014) has emphasized the ability of models to capture a λ -3 scaling regime at synoptic scales for the wavelength λ, and with increasing model resolution, the transition to a λ -5/3 regime at scales smaller than 400 km to 500 km, as is observed (Nastrom et al. 1984). This λ -5/3 regime is not to be confused with the more well understood inertial-range of three dimensional turbulence, which manifests at much smaller scales.
The spectra, examples of which are illustrated by Fig.  4, emphasize that most of the kinetic energy is carried by large-scale quasi-horizontal motions, i.e., E u, v  E w and is largest at large λ. Panels c) and d) in Fig. 4 illustrate a flat, almost white, spectrum of the vertical velocity, E w , which extends to scales of a few kilo-meters. Terasaki et al. (2009) noted a similar spectrum of E w in global simulations, albeit not extending to as fine a spatial scale. This scaling is most evident in the ICON-LEM spectra (bottom-right panel in Fig. 4), which fully resolve the transition to three dimensional turbulence, and the expected inertial range scaling, at wave lengths of 1 km to 5 km. At these scales nonhydrostatic accelerations start to be important. These results form the basis for claiming that storm-resolving scales are required to resolve motion in the third dimension of the atmosphere. Given the importance of vertical energy transport for our understanding of the climate system, its equipartition of variance across scales, would seem to have considerable bearing on the distribution of clouds and precipitation.

Observational data
The present study distinguishes itself from many others through its use of a large variety of observational data as a basis for the evaluation of the simulations.   Table A1 summarizes these diverse data sources, and their associated references. For many of the data products, retrievals are applied to allow the comparison of remotely sensed quantities with physical properties simulated by the models. Cloud water path is retrieved from SEVIRI 2 data utilizing the Cloud Physical Property retrieval developed by the Satellite Application Facility on Climate Monitoring (CMSAF, Schulz et al. 2009). Liquid clouds are defined to be those with tops below 3.66 km, a value chosen based on the ECHAM vertical grid. Cirrus cloud cover is retrieved using the 'Cirrus Properties from SEVIRI' (CiPS) algorithm (Strandgren et al. 2017a, b). CiPS retrieves ice clouds using an artificial neural network trained with MSG/SEVIRI Infrared (IR) observations and corresponding cirrus properties derived from CALIPSO/CALIOP backscatter retrievals. Validation against CALIOP indicates a very high sensitivity to thin ice clouds. The detection probability of ice clouds seen in CALIOP retrievals is 50, 60 and 80 % for cirrus with an optical thickness of 0:05, 0:08 and 0:14 respectively (Strandgren et al. 2017a), which corresponds to an Ice Water Path (IWP) of roughly 0.6, 1.0 and 3.0 g m -2 , respectively.

Bulk statistics
Estimates of mean properties from simulations over the different domains were compiled along with reference observations. These were intended to provide a quantitative overview of the different cases simulated, and the differences arising from different modeling frameworks. In cases where no direct measurements were available, ERA-Interim (Dee et al. 2011) data are used as a substitute for observational estimates. Sampling uncertainty is quantified through the standard deviation of the simulation case (day) means for that domain. Values are tabulated as a reference for users of the output (Appendix and Tables A2 -6). These statistics are indicative of how most of the simulations target situations where moist convection can be expected. Bowen ratios are generally less than 0.5, over the BB domain they are as low as 0.1. All domains, except for the NA, are also characterized by a net input of radiant energy at the top of the atmosphere (TOA). In terms of mean temperatures, the simulations fall in two groups: MCEA, TA and BB have surface air temperatures near 300 K whereas the DE and NA domains are approximately 12 K colder. Precipitable, PW, varies from near 20 mm in the colder domains to 50 mm (MCEA). In a relative sense the NARVAL 1 simulations (BB) are the driest, with a PW of 30 mm but temperatures are much higher than in the NA or TA cases. The NARVAL 1 cases also have the lowest precipitation rate, near 1 mm d -1 ; the other domains have mean precipitation rates near 3 mm d -1 , except for MCEA which has a domain mean precipitation rate approaching 6 mm d -1 .
The compilation of mean statistics in Tables A2 -6) aids an assessment of the extent to which LEM and SRM simulations stand out as compared to conventional models. In many cases the LEM and the SRM were run for the first time, whereas the global models have been developed over years and tuned to well represent climatological values. Despite this fact, the statistics tabulated in Appendix indicate no clear devi-ation of the LEM or SRM simulations from the conventional models. In cases where the LEM or SRM is an outlier, it is not necessarily a worse fit to the observations (for instance, TOA shortwave irradiance over the MCEA domain). Looking across the simulations for general behavior there is some evidence that the SRM simulations are brighter (as measured by a smaller net shortwave irradiance at TOA), with larger liquid water paths, but less ice. This tendency is more evident for the SRM than for the LEM simulations, consistent with the former also being brighter, and with global SRM simulations as summarized by Stevens et al. (2019a, b).

Precipitation
As discussed in the introduction, a considerable literature demonstrates the ability of simulations with a grid spacing of a few kilometers to well represent precipitating deep convection. This literature emphasizes the ability of such simulations to represent the structure of convective storms, particularly organized systems, as well as the frequency, intensity, and distribution of precipitation (Hohenegger et al. 2008;Prein et al. 2015;Kendon et al. 2017). Often it is concluded that grid-spacings of a few kilometers are adequate to capture the bulk statistics of precipitating deep convection (e.g., Langhans et al. 2012;Panosetti et al. 2019). These studies tend to emphasize case studies for a particular region; typically using a single model with grid-spacings that vary by a factor 4 -10, with (and sometimes without) parameterized convection. When global domains are considered, the grid-spacing is often still somewhat coarse. In this section we examine these questions from the perspective of a single modeling framework simulating cases from different climate regimes. We also compare our findings to global climate models with parameterized convection running at conventional (50 km to 100 km) grid spacings, hence containing little or no information associated with meso and storm-scale circulations.
We also explored new ground by analyzing simulations over large domains with much finer (hectometer) grid spacings. Where other studies have looked at the progressive impact of such finer scales, these tended to focus on idealized and isolated storms -the studies by Panosetti et al. (2019) and Langhans et al. (2012) being an exception -addressing particular questions, such as the most appropriate turbulent closure (Bryan et al. 2003), the role of non-hydrostatic accelerations (Jeevanjee 2017), and the effect of resolution on the effective buoyancy of convective plumes (Pauluis and Garner 2006). It is difficult to draw general conclusions from convergence studies as convergence depends on the metric, and the models being investigated are asymptotically inconsistent (see e.g., Stevens and Lenschow 2001). Despite these reservations, the results from these convergence studies indicate that (i) there is a smooth transition between hydrostatic and non-hydrostatic regimes, and (ii) the misrepresentation of the relevant horizontal scales (in terms of the updrafts) is not as serious a challenge as previously believed, as there are signs of bulk convergence. As regards the former, cloud mixing processes may become Reynolds number invariant (converge) at tens of centimeters (Mellado et al. 2018), but higher order moments of this mixing process might only converge at smaller scales. The latter point refers to the failure of the models to asymptotically approach known fundamental laws as a control parameter (such as grid spacing) is refined. For example, as resolution (or any other parameter) is refined, the cloud microphysical, land surface, or turbulent processes do not progressively approach known laws. For these reasons, the important question is how errors from the spatial truncation of the fluid motions compete with errors inherited from simplifications or uncertainties in the representation of other processes, such as the land surface, or microphysics; or how large do these errors end up being compared to those associated with alternative representations of convection.

The added value of hectometer grid spacings
To address this question, we look both at the TA simulations, over which cases were simulated with grid-spacings ranging between 312 m and 2500 m, and compare simulations with grid spacings ranging between 156 m and 625 m over the DE domain. The DE simulations have been run at yet coarser resolutions over a subset of cases, and the TA simulations have been performed at finer (156 m) grid-spacing over a smaller subdomain. In both cases we look for common changes in the structure, frequency, or intensity of the simulated fields of precipitation. In the simulations over the DE domain we additionally look for the signature of a better resolution of orographic effects as resolution is refined.
In a bulk sense, the simulations show some differences emerging from different configurations and/or from progressive refinement of simulations to hectometer scales. A signal of such differences is evident over the BB domain where the forcing is weaker, precipitation comes from shallower convection, and conditions are more homogeneous. This is illustrated in Fig. 5, where we have computed the mean precipitation rate for a 21 h period between 15 UTC and 12 UTC. Starting at 15 UTC (six hours after initialization) avoids a pulse in precipitation that is evident in the nested (312 m to 625 m) simulations. The area of the spatial composite is chosen slightly smaller than the large BB domain (as depicted by the outer dashed line in Fig. 3) to avoid possible boundary effects. Time-series data (not shown) indicate that the differences in Fig. 5 are also apparent over the temporal evolution, and thus appear systematic. Simulations with the smallest grid-spacing precipitate the least, but differ only slightly from those with a grid-spacing twice as coarse. The evolution with grid-spacing is not monotonic, simulations with Δ x = 1.25 km precipitate the most. Over the DE domain (not shown), there is also evidence of precipitation reducing as the grid encompasses hectometer scales, but these differences are less marked than they are for the BB domain. Bulk differences in precipitation between SRM and global (parameterized convection) simulations are larger over the TA domain than over the DE domain (compare Tables A2 and A4). We interpret these differences as evidence of a heightened sensitivity to resolution over the less strongly forced maritime conditions.
The similarity between the simulations at 625 m and 312 m over the BB domain is also evident in the histogram of rain-rate intensity (Fig. 6a), and differs markedly from TA simulations with the 2.5 km ICON-SRM subsetted to the BB domain. The more coarsely resolved simulations appear to favor rainrates with greater intensity. A similar comparison, this time between the LEM (625 m) and the COSMO-SRM (2.8 km) simulations over the DE domain ( Fig.  6b), indicates that differences between the SRM and LEM representations of precipitation intensity are less marked over land -consistent with more consistent bulk statistics. The LEM also indicates more profound differences across domains than is evident for the SRM, something that is evident by comparing COSMO-SRM to ICON-SRM in Fig. 6b. These results suggest that the LEM distinguishes between the two convective regimes, with less frequent intense precipitation over the tropical ocean as compared to mid-latitude land, in ways that the SRM does not.
One interpretation of the reduction of precipitation as the mesh is refined to scales below 1.25 km is that smaller scale features contribute to the transport of condensate, and these are accelerated more rapidly for the same buoyancy perturbation (Pauluis and Garner 2006;Jeevanjee 2017). This would make them more susceptible to mixing and less efficient at producing precipitation. Simulations of a composite diel cycle by Panosetti et al. (2019) show a similar tendency, as do global simulations analyzed by , but the simulations by Bryan et al. (2003) indicate less of a clear relationship between the fineness of the grid mesh and the amount of precipitation.
A variety of attempts were made to identify signs of the land surface imprinting itself on precipitation more clearly as the grid spacing was refined to 156 m over the DE domain. None were successful. One idea was that mountain valley circulations would become more apparent, another idea was that the effect of coastlines, or landscape variability would become more evident, at these scales. Analysis of the experiments was unable to support such ideas. As part of a PhD project, Singh and Kalthoff (manuscript in preparation, 2019) examined these questions more systematically, by independently varying the resolution of the land-surface representation and the model grid spacing. For the six cases they studied, the sensitivity to resolution was smaller than what we found over the TA domain, and the changes in the degree to which the land-surface is resolved contributed only 20 % to this change, the rest could be attributed to the effect of grid-spacing on the resolution of the atmosphere itself.

Large-scale structure and variability of precipitation
The ability of the storm-resolving model to capture the spatial distribution of precipitation is highlighted using output from the simulations over the MCEA and TA domains, as they provide a contrast to the varying influences on tropical precipitation. Mean precipitation, as simulated by ICON-SRM and ECHAM, is compared to observations for the MCEA domain in Fig. 7 and for the TA domain in Fig. 8. The figures demonstrate that in quite different conditions the Illustrated are results from grids with different grid spacings, whereby counts are computed after gridding all output to a coarser (7 km) grid to avoid grid-point effects. storm-resolving model, even without any explicit tuning, captures the amplitude, pattern, and variability of the observed precipitation more satisfactorily than the global models with parameterized convection. Over the MCEA domain (Fig. 7) the observed precipitation minimum over the South China Sea is simulated by the SRM but not ECHAM; the same applies to a similar feature north of Taiwan over the southern part of the south China Sea. The SRM also captures the observed land-sea contrast over many islands better than ECHAM, as, for instance, demonstrated by the precipitation features over and around Hainan (20°N, 110°E) and Taiwan (24°N, 121°E). For some of these features the SRM may benefit from an unfair advantage as it is continuously fed with updated boundary conditions, whereas the global models must run freely from their initial conditions. However, both the MCEA and TA simulations are reinitialized every day, so that the global models also benefit from updated information, and these simulations are consistent with the findings over MCEA, e.g., Fig. 7. Over the Atlantic, particularly in December (Fig. 8), precipitation is less bound to the coast than in ECHAM (a bias common to many climate models, as demonstrated by Siongco et al. (2014)) and less strongly coupled to local maxima in the sea-surface temperatures.
A more aggregated measure of the distribution of precipitation is given by its degree of covariation with precipitable water, PW. Recently, Mapes et al. (2018) argued that in present climate the humid, or wet, tropics is demarcated by the 48 mm contour of the water vapor path. Such a threshold is consistent with the sharp pick-up in precipitation with PW ranging from 40 mm to 50 mm evident in Fig. 9 and previous work, e.g., Bretherton et al. (2004), Peters and Neelin (2006), and Holloway and Neelin (2009). Considering the uncertainty in the observations, ICON-SRM fits the observed signal very well. Both models with parameterized convection (ECHAM and ICON-NWP) are known to provide a reasonably good fit to the observations and this is evident in Fig. 9; even so, both indicate more precipitation at lower values of PW as compared to ICON-SRM. Rain-rates are twice the observed (or SRM simulated) value in the critical range of the precipitation transition between column water vapor amounts of 40 mm to 50 mm. For high values of column water vapor, the SRM simulates a mean precipitation rate higher than reported by the observations. Not all aspects of the SRM precipitation are clear improvements. The simulations at storm-resolving scales produced intense precipitation in certain coastal regions that appear well in excess of what is derived from the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Algorithm (TMPA; Huffman et al. 2007), although one should also bear in mind possible limitations in the observations, especially in regions of complex terrain. Differences with TRMM are evident along the coast of Myanmar and Thailand (Fig. 7) and along the coast of West Africa (near Guinea) in boreal summer (Fig.  8). A similar issue is evident in the NA simulations (not shown) along the west coast of South Greenland. In every case the observations have signs of a similar local maximum, but not as pronounced. In most cases local maxima are not present on the coasts in the simulations with parameterized convection. For instance, Fig. 7 illustrates how precipitation maximizes off shore over the Bay of Bengal in the simulations with

ECHAM.
In addition to a generally better spatial distribution, precipitation features simulated by the SRM have a more realistic signature of spatial variability than is evident in the simulation with parameterized convection. This is highlighted in the Hovmöller diagram ( Fig. 10) showing the latitudinal (28 -32°) averaged precipitation rate in the boxed region of Fig. 7. Damaging floods affected this region during the simulated period in connection with the quasi-stationary Mei-yu front, with precipitation totals of 193 mm recorded over a seven day period (between 30 Jun and 6 Jul 2016). Accumulated precipitation in the SRM simulations totaled 186 mm. The models with parameterized convection produced 137 mm (ICON-ECHAM) and 143 mm (ICON-NWP). Although it is difficult to rule out chance in the ability of the SRM to better represent the higher precipitation amounts, this improvement coincides with a better representation of the structure and evolution of the responsible storms. This is evident through the eastward migration of storms over the flooded region in the SRM, which is largely absent in the models with parameterized convection for which precipitation appears more diurnally driven and spatially locked (Fig. 10). Likewise, over Africa, the lack of large-scale propagating features and a too strong diel cycle (Fig. 11) leads to a too low day-today variability in ECHAM. This can be quantified using the coefficient of variation of the temporal variability (the ratio of the standard deviation to the mean), which we have calculated for the domainaverage of the west African Sahel, where variability plays an important role. The observed value is 2.1, compared to 1.4 in ICON-ECHAM and 1.7 in ICON-SRM. Though still varying less than observed, this bias is reduced by nearly a factor of two in the SRM. The ability of storm-resolving models to better represent meso-scale convective systems over Western Africa has also been noted by earlier studies (e.g., Pearson et al. 2014;Beucher et al. 2014;Maurer et al. 2015;Zhang et al. 2016;Peters et al. 2019).

Diel cycle of precipitation
The fact that precipitation peaks in the late-afternoon or early-evening over tropical continents is known for some time, e.g., observations over Sudan (Pedgley 1969) and over Northeast Brazil (Kousky 1980). Additionally over mid-latitude continental areas, the late-afternoon peak is so well known as to feature in children's books romanticizing a lazy summer day (Stietencron 1992). The failure of convective parameterization to capture this signal was All data were coarse grained to the resolution from ECHAM before the dependency was calculated and only water vapor bins with ten values or more are considered.    July / d also identified long ago (Dai et al. 1999). Although a few groups have demonstrated an ability to capture such a signal (Takayabu and Kimoto 2008;Hourdin et al. 2013;Bechtold et al. 2014), progress is spotty and large errors continue to persist over many generations of model development (Covey et al. 2016). Over land, precipitation is too coherent with the phase and amplitude of the diel cycle in these models. By contrast, storm-resolving models, even with grid spacings as coarse as 7 km to 14 km, are able to represent the observed signal of diurnal variability in locally forced convection without any special effort or tuning (Petch et al. 2002;Sato et al. 2009). This is also our experience across all domains, especially those with an influence of the land surface, as summarized in Fig.  12. Petch et al. (2002) argued that the better representation of the sub-cloud layer at hectometer, as compared to kilometer scales provides an additional benefit in representing the daytime peak. To the extent the changes are systematic, a comparison of simulations with 156 m to 625 m over the DE domain (not shown) suggest that the changes are small compared Some of the challenges of convective parameterization, as well as associated advantages of resolving deep convection, are illustrated by the simulations of the diel cycle over Germany. Figure 12f presents the composite diel cycle for days with summer conditions and largely locally forced precipitation (29 Jul, 14 -15 Aug 2014, 4 Jul 2015, and 3 Jun 2016. For these days and region, ECHAM produces a diel cycle with rainfall peaking near noon and absent at night; this is consistent with errors evident in more tropical regions (panels a, b and e), which are typical of most climate models (e.g., Covey et al. 2016). By contrast, ICON-NWP, which uses the convection scheme developed at ECMWF (Bechtold et al. 2014) and is exemplary of models with state-of-the-art convective parameterization that have been developed to address the bias in the diel cycle of precipitation, has much smaller systematic errors in its representation of the diel cycle (panel f). However, even in the ICON-NWP simulations, precipitation still peaks too early and decays too strongly as the sun retreats. The better simulation of night-time precipitation by the LEM, as compared to the best performing model with convective parameterization, is evident by its ability to produce substantial (and more intense) precipitation falling after 2000 UTC, which is similar to what was observed. In addition to the total amount of precipitation, the distribution of precipitation rates is improved by LEM. This is illustrated in Fig. 13, where the relative rain fraction asymptotes to the fraction of the observed precipitation that is simulated. The flatness of the curves for ECHAM (P > 2 mm h -1 ) and ICON-NWP (P > 3.5 mm h -1 ) is indicative of a lack of more intense precipitation in the models with parameterized convection. Although not illustrated in the figure, because deviations are too small compared to the differences to the other models, there is a slight tendency for the LEM to better match the observations as its grid-spacing is refined from 625 m to 156 m. A systematic early decline in night-time precipitation was also noted in convection permitting simulations centered over Germany by Rasp et al. (2018), which were performed with the COSMO model on a grid of 2.8 km for 26 May to 9 June 2016. This suggests that for precipitation, the added value of the LEM, as compared to the SRM, might be more evident at night.

Clouds
For the evaluation of clouds we used simulations over the DE domain to take advantage of a dense network of observations collected over Germany (Lammert et al. 2019), spanning an area of 360000 km 2 . Simulations on the TA and BB domains enable comparisons with aircraft observations from the NARVAL expeditions (Stevens et al. 2019a, b) and ongoing measurements from the Barbados Cloud Observatory . Over the TA/BB domain the spatial sparseness of the observations poses greater challenges for the model evaluation, notwithstanding surface conditions considerably more homogeneous than those over the DE domain.

Statistical signature of clouds over the DE domain a. Cloud condensate distribution
By virtue of its close connection to cloud optical thickness, condensate water path (CWP) links strongly to the radiative effects of clouds (Stephens 1978;Harshvardhan and Espinoza 1995). Figure  14a presents the cumulative distribution of CWP for ICON-LEM, ECHAM and ICON-NWP, with measurements from all available twenty-two MODIS 3 Fig. 13. Night time (20 -24 UTC) relative rain fraction over precipitation rate for five convective days (see Table 1) in the DE domain. The relative rain fraction is calculated by dividing the amount of precipitation contributed by rain rates smaller or equal to a given precipitation rate by the total precipitation amount found in the observations. Observational data is taken from the RADOLAN network. As in Fig. 12f, only the set of convective days is considered (as in Fig. 12). overpasses taken from eight simulation days and from the SEVIRI. SEVIRI's footprint is as much as twenty times coarser than MODIS (1 km 2 at nadir). To ensure some degree of homogeneity only MODIS pixels with viewing angles within 40° off nadir were selected, and to avoid biases from the limited sensitivity of the instruments a detection threshold of CWP < 10 g m -2 was applied. Because the retrievals use measurements of reflected sunlight the comparison is done for daytime only. Similar filtering is applied on ICON and ECHAM simulations that are geographically matched (through the nearest neighbor) to the spatial and temporal footprint of the satellite. Temporal matching is ±150 s for ICON-LEM, and ±1800 s for ECHAM and ICON-NWP, whose CWP output is hourly. Figure  14 indicates that ICON-LEM is in close agreement with MODIS (for CWP > 100 g m -2 ) the instrument to which its resolution is most commensurate, albeit with a greater contribution from low CWP than seen by MODIS. Less agreement at lower CWP may be a shortcoming of the simulations, or indicative of limitations in the sensitivity of MODIS to clouds that are optically thin or composed of very small droplets; small CWP values may also contain a non-negligible contribution from thin ice clouds in single-or multilayer conditions, which are poorly treated by MODIS retrievals (Sourdeval et al. 2016). Differences between the MODIS and MSG retrievals (as presented in Section 3) are expected as a result of differences in sensor footprints. The much larger MSG footprint effectively smooths the CWP field, leading to lower values, and introduces systematic biases due to heterogeneity effects as discussed by Heinze et al. (2017). This is consistent with MSG retrievals better matching to the lower resolution models (ICON-NWP and ECHAM). However, given that both ICON-NWP and ECHAM have grid cells much coarser than even the MSG footprint, the tendency of their cumulative distributions to lie between that of MODIS and MSG is indicative of a bias. If their CWP were consistent with the observations one would, by virtue of their coarser resolution, expect the distribution to shift to smaller values than those measured by MSG.
The lack of such a shift implies clouds that are on average too bright. Simulating too much condensate has the beneficial effect of compensating systematic biases arising from a failure to account for sub-grid heterogeneity so as to still get the correct irradiance at TOA. Earlier studies, using very different methods, came to similar conclusions, for example Nam et al. (2012). The smoothing effect of off-nadir retrievals may also explain the discrepancy between the LEM and MODIS. It certainly seems plausible that the LEM would still under-represent the optically thinnest clouds, but at a first look, the agreement between the observations of CWP and the LEM output agreed.
The distribution of the liquid water path is compared with the observations in Fig. 14b. For this analysis, the ICON-LEM low-level cloud LWP is coarse-grained to the MSG grid and a detection threshold of LWP < 1 g m -2 is applied. Only values during daytime (between 6 UTC and 18 UTC for the days investigated) were analyzed. The peak at LWP » 1000 g m -2 in ICON-NWP is caused by one day (29, Jul 2014) in association with a severe storm. Given the five-hundred fold difference in the MSG versus ECHAM footprint, the lack of a shift in the ECHAM LWP distribution toward low values relative to MSG is also indicative of the parameterized clouds in both models being too bright. ICON-LEM using samples coarse-grained to the MSG footprint is more consis- tent with the observations.

b. Profi les of cloud cover
To evaluate vertical cloud fraction profi les we fi rst use super site measurements from JOYCE-CF in the west of Germany and from RAO, near Poland, in eastern Germany. Comparisons are made between 6 UTC and 0 UTC to avoid problems with the model spin-ups during the fi rst six hours of the simulations. Observed cloud profi les are based on the Cloudnet target classifi cation following Illingworth et al. (2007). Comparing spatially averaged fi elds to point measurements is an imperfect exercise. However, large qualitative differences emerge between the observed profi les and those produced by the models for which clouds are parameterized (Fig. 15). Both the SRM (COSMO-DE) and the LEM better represent the structure of the observed profi les, with a double maximum with peak coverage in the lower (near 3 km) and upper (near 9 km) troposphere. Quantitatively the LEM produces substantially few clouds, in better accord with the observations. Hentgen et al. (2019) similarly found an over-prediction of clouds at mid-levels in simulations using COSMO over Europe. For high cloud cover (above 10 km) ICON-LEM and the SRM (COSMO-DE) produce too much cloud-cover compared to the coarser models and the observations. The color shaded areas in Fig. 15 delineate the range of profi les obtained by varying the threshold of IWP required to identify a scene as cloudy. The region of shading thus demonstrates that ECHAM's and ICON-NWP's sensitivities to thin ice clouds are mostly manifest above 6 km and that ICON-LEM simulates a larger fraction of optically thin cirrus and ice clouds in general.
Poor accounting of spatial variability limits one's ability to draw strong conclusions from the above analysis. In an effort to partially address this shortcoming the coverage of ice-clouds is also compared to cloud cover derived from satellite using the CiPS retrieval (as described in Section 3). Figure 16 indicates that the CiPS derived cirrus cloud cover frequency distribution is largest for low coverage and drops for increasing cover, a quality represented by all models. The frequency distribution of ice cloud cover simu- Fig. 15. Comparison of average cloud cover profi les for six simulated days. Vertical profi les display mean values between 6 UTC and 24 UTC comparing different simulations to profi les obtained from cloud radar observations from the two super sites located in West (JOYCE-CF) and North-East (Lindenberg) Germany. Color shaded areas for model outputs represents the sensitivity to different IWP thresholds, while grey shaded area shows the mean difference between the two measurement sites. lated by the LEM is flatter and aligns more closely with the CiPS retrievals than the other models do. The shaded areas show the variation in cirrus cloud fraction due to the different IWP thresholds (following the detection probability of CiPS) as applied to the model output. This analysis also reaffirms the inference from the previous figure (Fig. 15), whereby the ice-cloud coverage in the LEM samples more thin ice-clouds than the models with parameterized convection. Different microphysical parameterizations are likely to also play a role, which makes it difficult to attribute the better performance of the LEM only to its ability to explicitly represent the condensate transport.

c. Cloud base height evaluation
Cloud-base height has a strong influence on downwelling long-wave radiation at the surface, and hence, on the surface energy budget. Figure 15 suggests that cloudiness near the cloud base (between 1 km and 2 km) simulated by the LEM is less (by as much of a factor of two) than observed, whereas ICON-NWP agrees well with the observations. ICON-LEM features fog and low stratus over the marine coastal regions on approximately half of the simulated days. This peak is not represented in ceilometer observations because they are situated over land. Above 3 km the situation is reversed, with the LEM better representing the coverage of mid to high-clouds, as discussed in the previous section (6.1b). The apparently deficient representation of low clouds by the LEM may be misleading however, as low-clouds are more strongly tied to surface features and thus more likely to be biased by poor sampling. When compared to 155 ceilometer stations evenly distributed over Germany (Kotthaus et al. 2016;Wiegner et al. 2014), the LEM cloud-base heights are more uniformly distributed between 0.5 km and 4 km and in better agreement with the observations, particularly in the afternoon and early evening after the convective boundary layer has been established (Fig. 17). ICON-NWP, by contrast accumulates cloud bases near 1 km, consistent with its peak in cloudiness at this level in Fig. 15.
The ceilometer data also provide an opportunity to study how cloud-base evolves over the day. Ceilo- The ceilometer measurements have also been used to test the representation of cloudiness duration statistics. The length of contiguous cloud-base height returns at individual instruments provides a measure of cloud size statistics. The ICON-LEM captures the roughly exponential distribution of observed cloud duration, becoming increasingly flat and in better accord with the observations as Δ x is reduced (Fig. 18). Thus the frequency of short events increases at the expense of longer events as grid spacing is refined. Short-lived clouds are simulated at the highest model resolution as frequently as observed, and the bias toward more long-lived clouds is reduced with increasing resolution.
The preceding discussion demonstrates that even for a well instrumented, and relatively homogeneous region, such as Germany, a definitive evaluation of simulated cloud statistics is difficult. Nonetheless we venture some conclusions. Overall, the ICON-LEM produces a more compelling representation of the cloud field than the models that must parameterize the scales of motion to which clouds respond. Despite some deficiencies in the way clouds are calculated in ICON-LEM, their representation is expected to improve with finer resolution. This seems trivial, but when one reflects on the construction of models with parameterized convection, whose parameterizations act independently in each grid box or column, it quickly becomes clear that convergence is a more intrinsic property of the LEM and SRM approach. Barring a few notable and increasingly anachronistic exceptions, e.g., simple statistical approaches as in Sommeria and Deardorff (1977), models with parameterized clouds lack this property, as when cloud are parameterized, every different resolution defines a different model.

Representation of clouds over the TA domain
The wide variety of cases simulated in the HD(CP) 2 project allows us to contrast the simulation of clouds over Germany with those over the tropical oceans. For this purpose we make use of a special dataset with long-term surface observations at the Barbados Cloud Observatory ) and airborne measurements from two field campaigns (Stevens et al. 2019a). In analogy to Fig. 15, Fig. 19 presents a comparison of the simulated cloud amount profile with measurements from a nadir staring cloud radar deployed from the high-altitude research aircraft High Altitude and Long Range Research Aircraft (HALO). The analysis suggests that the models with parameterized convection underestimate the amount of low clouds and over-estimate the coverage of high-clouds and that ICON-LEM is in better accord with the measurements. When neglecting ice clouds with ice water paths below 3 g m -2 , the upper level cloud amount is substantially reduced. Nevertheless, the fraction of opaque cirrus clouds for ICON-NWP and ECHAM still exceeds the total cirrus cloud fraction of ICON-SRM including thin cirrus. This finding is consistent with that of Cesana and Waliser (2016) who found that large-scale models with parameterized clouds overestimate high-cloud frequency and vertical extent. All models miss the local cloud-amount maximum near 9 km; this is thought to reflect chance detrainment from a nearby deep convective cloud on one of the flights, rather than a persistent feature. In terms of low clouds, the good agreement between the ICON-LEM and the airborne cloud radar might be misleading. Unlike the Cloudnet target classifi cation used to measure cloudiness over the two German super site stations JOYCE-CF and RAO, or the high-sensitivity ground-based radar system at the BCO, the airborne radar is less sensitive to small clouds, particularly if they lack a precipitation signature. As a result, it is blind to many small clouds whose radar refl ectivity is below -30 dBZ (Stevens et al. 2019a, b). For similar conditions as simulated and observed over the TA domain, Nuijens et al. (2014) demonstrated a peak in the cloud-base cloud fraction of approximately 14 %, twice as large as what is measured with the airborne radar.
The ground-based radar at the Barbados Cloud Observatory has a much greater sensitivity (-60 dBZ at cloud base) and although it remains a point measurement it allows a comparison of cloudiness at cloud base as a function of time. Analysis of the Barbados data indicates a distinct diurnal cycle in low clouds (Vial et al. 2019). To illustrate the diel cycle the cloud radar data are segmented, and clouds are classifi ed as 'dry-cumuli' if they do not have a precipitation echo that extends below cloud base, and if their cloud base is near the lifting condensation level. 'Moist-cumuli' are those with echoes that extend below the liftingcondensation level, and stratiform clouds are those without a precipitation echo and whose cloud base is well above the lifting condensation level of nearsurface air. The relative frequency of these different clouds (Fig. 20) exhibits a distinct diel cycle, with 'dry-cumuli' having a minimum near local noon and maximizing around midnight. 'Moist cumuli' increase through the night, maximizing their coverage in the early morning hours, as a result the net total cumulus cloud-base fraction increases after sunset, maximizing near midnight. Stratiform clouds peak around sunrise, and total coverage from clouds at cloud base (summing the dry and moist cumuli) is between 20 % to 25 %, consistent with the inference in the previous paragraph, that the airborne radar measurements underestimate cloud amount. Figure 21 illustrates the diel composites of cloud amount across the models and as a function of resolutions. At kilometer grid-spacings it would be surprising if the model captured the signature of shallow convection. Consistent with this expectation, the SRM distorts clouds, simulating a peak in cloud base cloud amounts that arrives too early in the evening, cloud-fraction profi les that are insuffi ciently differentiated into cumuli rooted at the lifting condensation level, and the lack of a distinct mode of stratiform cloudiness. However, in contrast to the models with parameterized convection (ECHAM and ICON-NWP), the SRM at least captures important elements of the cloud structure, reminiscent of what was found over the DE domain (Fig. 15), Despite substantial quantitative differences (see change in scale), unlike the models with parameterized convection, ICON-SRM simulates a midday minimum in cloudiness, and cloudiness aloft maximizes in the early morning. As the grid is refi ned 4 the evolution of the cloud profi le increasingly accords with observations. The stratiform cloud peak begins to become distinct from the cumulus cloud peak, the early evening peak in cloud base cloud amount becomes less pronounced and shifts toward later times, and the midday minimum in cloudiness becomes increasingly pronounced. The gradual reduction in cloudiness near cloud base does, however, raise the question as to whether the LEM scales begins to underestimate cloud amount, something that the EUREC 4 A measurements (Bony et al. 2017) aim to adjudicate.

Parameterizations
Even if decameter or hectometer scales are resolved, some processes will occur on scales that are unresolved. These are radiative transfer, cloud microphysics, and small scale turbulence, as well as the land surface and subsurface. In this section we report on areas where we identifi ed an added value, or additional diffi culty that emerges in association with some of these processes as storm and large-eddy scale motions begin to be represented over very large and less idealized domains.

Radiative transfer
As the grid-scale becomes fi ne enough to resolve individual clouds, the full distribution of the cloud optical depth avoids the need for tuning parameters that parameterize cloud heterogeneity (Cahalan et al. 1994). This opens the opportunity to use such simulations as a complement to observations to assess the role of cloud heterogeneity in radiative transfer for coarse-resolution models, and to develop subgrid scale parameterizations to compensate for biases arising from radiative transfer schemes of coarseresolution models (Barker and Räisänen 2006). Numerical artifacts are expected to increasingly contaminate scales in the neighborhood of the grid scale (see also Bley et al. 2017), but as hectometer scales begin to become resolved, horizontal photon transport is expected to have a greater infl uence and threedimensional radiative transfer approximations might be required. Therefore, at hectometer scales new challenges emerge in the treatment of radiative transfer.
Failing to account for horizontal photon transport at hectometer scales leads to biases in the treatment of radiation. Figure 22 illustrates the upwelling solar irradiance at TOA, which is mostly patterned by the condensate fi elds taken from a snapshot of the 156 m ICON-LEM simulation at 1202 UTC on 29 June 2014. Irradiances are calculated using the 3D radiative transfer model MYSTIC (Mayer 2009). The threedimensional calculations are compared with results for a one-dimensional simulation using the independent column approximation (IPA), which neglects horizontal photon transport. In the simulation a constant surface albedo is prescribed and the sun is positioned to the south at an angle of 60° from zenith. The calculations indicate that the IPA introduces a well-known bias that we are now able to quantify over a large domain with a realistic representation of clouds. Local differences can be of the same order of magnitude as the irradiance field itself, and a substantial component of the bias is not reduced by averaging, even when averaged over an 80 km grid (Fig. 22d). The mean irradiance is 275 W m -2 for the full 3D calculation, as compared to 257 W m -2 for the 1D calculation, corresponding to a scene bias of -18 W m -2 or 7 %. The 3D calculation leads to a much smoother irradiance field (compare panels a and b in Fig. 22), as horizontal photon transport acts as an effective diffusion. Physically, the bias is associated with an underestimation of the albedo in the 1D case, which arises from not accounting for asymmetries in the loss or gain of photons through cloud sides. This effect comes from a combination of inhomogeneities in the optical medium (clouds) and their asymmetric illumination. Even in the absence of biases horizontal photon cloud fraction / % transport will lead to a differential heating of the cloud. In idealized LES this effect has been demonstrated to be important for the cloud fi eld development Klinger et al. 2017Klinger et al. , 2019. Whether these effects substantially distort the representation of the cloud fi eld in less idealized situations, as well as how to capture these effects in parameterizations, is a topic of active research Mayer 2015, 2016;Klinger and Mayer 2016).

Cloud microphysics and convection
Even with perfect understanding of cloud microphysical processes, which is unfortunately far from reality, it is not clear what scales of fl uid motion infl uence the microphysical development of clouds. Certainly, the distribution of vertical velocities is important (Reutter et al. 2009), as it controls the adiabatic cooling and warming, the rate of water phase changes, and how hydrometeors of different sizes are transported through the fl uid. As storm-resolving and large-eddy models begin to resolve the vertical motion fi eld (Fig. 4), they start to create a physical basis for capturing some of these effects.
The advantage of resolving the vertical motion fi eld for the representation of cloud microphysical processes becomes evident from a comparison of simulations over the NA domain that apply progressively fi ner resolution and different degrees of microphysical complexity. For these purposes, additional simulations are performed as a sensitivity study for one of the simulated NA days (these additional simulations have not been listed in Table 1). In these simulations the grid-spacing of the simulations is varied from 2.5 km to 80 km, and simulations at each grid-spacing are performed using the one-moment cloud microphysical scheme with graupel also used for the ICON-SRM simulations over the TA domain (Baldauf et al. 2011), and the two-moment cloud microphysical scheme of Seifert and Beheng (2006) used in ICON-LEM. The case of a warm conveyor belt, associated with the cyclone on 23 September 2016 (known in Germany as Vladiana) is studied. For comparison with observations, TOA irradiances and cloud cover from the CERES SYN1deg-1hr Edition 4A dataset are used in Fig. 23.
We fi rst consider the ensemble of simulations that treat convection in an explicit manner and do not apply a convective parametrization scheme (squared markers in Fig. 23). As resolution is refi ned, the simulations progressively approach the observations. Although the simpler microphysical scheme best matches the observations, it does so in a way that suggests a more fi nely resolved simulation would 'overshoot' the observations -making one suspicious that the match arises from some degree of error compensation (recall the preceding discussion of planeparallel biases). Cloud cover is progressively reduced with fi ner resolution. A factor of two in resolution has a commensurate effect as a change in the level of complexity in the microphysical representation, although the origin of these changes may differ.
The analysis is also repeated for simulations with the convective parameterization enabled, using the convection scheme of the global ICON-NWP simulations. Skamarock et al. (2014) demonstrated that including a cumulus parameterization considerably reduces the power in the vertical motion fi eld, actively working against the scales of motion that the SRM the difference between a 1D (IPA) and 3D radiative transfer simulation c); and 80 km average of 1D-3D irradiance differences d).
is trying to resolve. This might explain why for simulations with parameterized convection an increase in resolution beyond 10 km no longer has an impact and the simulations stall at some distance from the observations. This may also explain the substantially reduced sensitivity to the microphysical representation when convection is parameterized. The impact of the microphysical scheme is much larger for simulations with explicit convection, which is linked to a substantial microphysical sensitivity of high-level clouds and cloud ice in the simulations with explicit convection (not shown). By contrast, the sensitivity of low-level clouds and cloud liquid to the microphysical scheme is smaller and less systematic and does not appear to be strongly modulated by the presence or absence of a convection scheme. This may reflect the fact that even at the finest grid spacings of 2.5 km, low-level clouds remain marginally resolved (recall Fig. 21), thereby damping the coupling of the microphysics to the dynamics even in the absence of convective parameterization.
From this analysis, we hypothesize that a model setup with explicit convection is favorable, as it allows the differences in the representation of cloud microphysics, as well as its coupling to the dynamics, to be represented more physically. An open question is to what extent the effects of unresolved motions on the microphysical development of clouds can be adequately represented. There are reasons to be optimistic that these types of parameterization problems might be more solvable than those posed by more coarsely resolved models with parameterized convection. These prospects depend on the extent to which the effects of the microphysical processes can be observed, using satellites and ground-based networks, on the same scale on which they are simulated, and on the similarity between the unresolved motions and those that begin to become resolved at 2 km and finer.

Small-scale turbulence
Small-scale turbulence, particularly in regions of stratification or near the surface, can have a large influence on matter and energy transports as well as cloud formation. Here mixing on scales of centimeters can become important, and is far from being resolved, even by LES (Ansorge and Mellado 2016;Mellado et al. 2018;van Stratum and Stevens 2018). However, non-turbulent motions, e.g., mesoscale motions and flow circulations are resolved on hecto and kilometer scales, and these can increase shear locally and trigger intermittent turbulence.
Given the still relatively coarse scales, as compared to the scales of nocturnal turbulence near the surface, we investigate the ability of the ICON-LEM -especially as compared to the global models -to simulate the bulk statistics of the small-scale vertical motions. To do so we compare the vertical velocity statistics from eight nights in 2013 (20,(24)(25)(26)2,5,11,28 May) of simulation over JOYCE-CF with wind lidar measurements at the site. Not only does the ICON-LEM capture the variance in the vertical wind in the lower 1.5 km above the site (Fig. 24), but it also represents the intermittent character of the turbulence well. The latter is quantified following Rotta (1956), where the fraction of the night that is turbulent (intermittency factor γ) and the number of turbulent events per night (intermittency number n) are described (see Fig. 24b). In calculating these statistics, we distinguish between non-turbulent and turbulent events, using a threshold of one standard deviation of the total duration of the observed or model-output signals of σ w . Each of the signals is normalized by its maximum value. The σ w within the nocturnal boundary layer (NBL) was calculated as an average in height up to 1.5 km from the surface, and for each time step.

CERES
The ICON-LEM is, therefore, able to resolve parts of the intermittent character of the NBL turbulence which the coarse simulations cannot capture. Although the horizontal grid-spacing of 156 m is still not sufficient to resolve smaller (sub-meso and turbulence) scales of the NBLs, case studies indicate that the model is able to simulate the effects of larger (meso-) scales (e.g., low-level jets) on NBL turbulence variability, and thus capture important preconditioning circulations to which the smaller scales should be responding (Marke et al. 2018). This is encouraging for the hypothesis that some of the smaller scales of motion that are not directly captured by the LEM or SRM, can be more readily parameterized as the conditions that cause the motion are resolved.

Summary
We report on very large domain storm-resolving (1.25 km to 2.5 km grid mesh) and large-eddy (156 m grid mesh) simulations designed to test the hypothesis that by resolving motions in the atmosphere's third (vertical) dimension, a much better representation of clouds and precipitation becomes possible. Twenty simulated days were performed and evaluated over Germany, a heavily instrumented region in central Europe, with grid spacings of 156 m over a region with a north-south extent approaching 1000 km. Complementing these simulations are storm-resolving (kilometer) scale simulations spanning the North Atlantic (14 days), the Tropical Atlantic (62 days), and the Maritime Continent and East Asia (15 days), as well as large-eddy resolving simulations in subdomains of the Tropical Atlantic, where extensive ground-based and airborne measurement data have been collected. The realism of the simulations and the synoptic conditions in which they are performed allows them to be quantitatively compared to data, and to coarsely resolved simulations that are the stateof-the-art in climate modeling. Further aiding such an evaluation is the ability of the simulations to resolve scales of motion similar to those observed.
In accordance with earlier studies, the energy spectra show that the variance of the vertical velocity has its energy uniformly distributed across all scales larger than a few kilometers. Contributions to the variance only begins to diminish as the scales of threedimensional turbulence begin to be resolved, at grid spacings on the order of hundreds of meters. Hence, at storm-resolving scales the vertical dimension of the atmospheric motions begins to become resolved. Given the importance of vertical motions for cloud and precipitation development, and thus the exchange of energy in the vertical, this is not a trivial distinction.
In almost every respect, the ability to represent the vertical motion field leads to an improved, and more physical, representation of clouds and precipitation compared to models using parameterizations to represent cloud macrophysics and moist convective processes. In some fields, such as the distribution of condensate (liquid and liquid/ice) water path, or the size distribution and diel evolution of clouds and precipitation, the hectometer scale simulations are difficult to discriminate from the observations. Improvements in precipitation include a better spatial distribution, a more realistic propagation of precipitating systems, and a greatly improved diel cycle. These improvements include not only the timing and amplitude of the daily maximum in precipitation but also the evolution of precipitation through the night.
A novelty of the present study is its extension to hectometer scales. Here changes in precipitation as the grid is further refined tend to be smaller than differences between simulations with explicit versus parameterized changes. There are quantitative changes, a systematic reduction of precipitation, as finer (hectometer) scales are being resolved, and progressive improvement in the representation of nocturnal precipitation, as well as a steepening of the frequency distribution of precipitation, indicative of relatively fewer extremes. Despite progressive improvement in precipitation at hectometer scales, our findings give us the intermittency factor, γ (graphical symbols), and the intermittency number, n (numbers above symbols), are presented for the same platforms (see text for details).

ICON-LEM (156 m)
OBS ICON-NWP ECHAM confidence that many biases in the simulation of the hydrological cycle by global climate models would already be addressed by running models at kilometer scales.
Clouds also exhibit systematic improvements as the grid mesh is refined to hectometer scales. Particularly over the Tropical Atlantic, simulations on these scales better represent the observed structure of the shallow clouds. Over land as well, the simulations over Germany indicate an improved life cycle and spatial distribution of boundary layer clouds. In stormresolving simulations (i.e., models with kilometer grid spacings), there is a hint of some of the features, such as a better representation of the diel cycle and a clearer distinction between cumulus and stratiform cloud contributions to trade-wind cloud regimes, that emerge at higher resolution. Simulations in which the grid spacing is progressively increased and in which deep convection is not parameterized, also show a more physical coupling (convergence behavior) between cloud dynamic and microphysical representation. Likewise even for scales of motion that one would not expect to be resolved at hectometer scales, such as those associated with the nocturnal boundary layer, the simulations do a surprisingly good job at capturing the spatiotemporal structure and variability that is observed in the vertical motion field.
Our investigations suggest that the parameterization problems associated with the remaining unresolved degrees of freedom appear to be better constrained, either through a better representation of their drivers or through a greater affinity to data. However, some new problems also emerge at higher resolution, for instance, the need to account in some manner for horizontal photon transport.
Many of our findings may not seem surprising, as to hypothesize that a model that begins to resolve the vertical dimension of the atmospheric motion field by using the basic laws of physics, is better at representing clouds and precipitation. Nonetheless, when discussing the advantages of storm resolving simulations, our experience is that many colleagues are quick to point out the remaining deficiencies of such models, to the point where they sometimes even questions whether there is a net benefit from such approaches at all.
Looking forward, as global simulations are now becoming practical at storm-resolving scales, the first climate projections using these types of simulations can be expected to follow in a few years (Stevens et al. 2019b). The first global LES, of a day or two in duration, are being anticipated on a similar time hori-zon (Satoh et al. 2019). Unless something happens to drastically change the landscape of information technology it seems unlikely however, that long, multiyear, global LES will ever become possible. Hence, the best hope is that global storm-resolving models capture enough of the motion field in the atmosphere's third dimension, and that the smart application of observations combined with short snapshots from global LES can be combined in ways that usefully constrain important degrees of freedom that are still not adequately captured at kilometer scales. If society hopes to anticipate how warming will change regional climate, particularly possibly large changes that would accompany shifts in circulation systems, then the present study leads us to believe that it will most likely follow from the intensive development and application of storm resolving models to the climate problem, globally.

Acknowledgments
The project HD(CP) 2 (High Definition Clouds and Precipitation for advancing Climate Prediction) was funded by the German Federal Ministry of Education and Research within the framework programme "Research for Sustainable Development (FONA)" under the numbers 01LK1501 -01LK1508A and 01LK1509A. We gratefully acknowledge the computing time that was granted by the German Climate Computing Center (DKRZ) and thank its various groups for the support in performing the simulations and in the data handling. Data were provided by Jülich Observatory for Cloud Evolution (JOYCE-CF), a core facility funded by Deutsche Forschungsgemeinschaft via grant DFG LO 901/7-1. We would like to thank Armin Mathes, Paul Dostal and Jochen Elberskirch for the administration of the project and their unwavering support. The manuscript also benefited from the insightful input of an editor, Akira Noda, Christoph Schär and an anonymous reviewer. The study was designed and executed by the lead authors (BS, CA, AH, RH, CK, DK, HR, WS, and JW), but all authors contributed to some element of the study, by contributing to the analysis, the conception of the analysis, or the interpretation of the results. All authors have read the manuscript. Vasileios Barlakas' present affiliation is at Chalmers University of Technology, Gothenburg, Sweden.    (13) n/a n/a 289.7 (4.9) 2.9 (2.9) 21.6 (7.6) 47 (40) n/a 3.0 (0.9) Table A3. As Table A2 for BB domain simulations (300 m LEM domain), for NARVAL 1 (upper row) and NARVAL 2 (lower row), for dates as specified in, Table 1. Observational estimates from HOAPS (LHF), NTAS buoy (wind speed), NTAS site (precipitation), and BCO (T 2m and HATPRO for PW, LWP) and CERES (TOA long-wave and TOA net solar radiation). Large differences in the near surface wind, V 10m , may be indicative of the point measurements from a single buoy being unrepresentative fo the domain mean. -206 ( 6) -259 ( 7) -214 ( 4) -266 ( 8) n/a n/a LW TOA [W m -2 ] 289 ( 4) 266 (13) 290 (11) 244 (47) 287 (10) 269 ( 9) 294 ( 6) 271 (10) 274 (20) (32) 236 (27) 123 (23) 324 (42) 147 (47) 274 (21) 136 (37) n/a n/a  -202.4 (6.3) -222.0 (3.9) n/a n/a 12.6 (4.5) 12.7 (2.5) 11.6 (3.9) 11.0 (2.8) n/a n/a  Table A5. As Table A2 but for NA (2.5 km) domain simulations, as in Table A2, for dates as specified in , Table 1. The values are for 14 selected days of the NAWDEX field campaign (Sep-Oct). Observational data is taken from CERES (short-and long-wave radiation), GPCP (precipitation), CMSAF (PW) and ERA-interim (SHF, LHF, T 2m , and V 10m ). LWP is a September-October average from MAC.