1 Introduction

Deep convection is a common ingredient of many atmospheric extreme events and causes weather-related hazards globally (e.g., Doswell III et al. 1996; Kunkel et al. 2012). Deep convection causes major societal impacts due to flooding, wind gusts, hail, tornadoes, landslides, and debris flows. It is essential for hydrology since it is the dominant source of precipitation in the tropics, subtropics, and mid-latitudes during summertime (Yang and Smith 2008). For example, half of the summertime rainfall in the US plains originates from deep convective systems (Jiang et al. 2006).

State-of-the-art climate models are not able to represent convective precipitation explicitly because of their coarse grid spacing [larger than 12/100 km in regional/global climate models; Taylor et al. (2012)/Jacob et al. (2014)]. These models rely on convection parameterization schemes that are major sources of errors and uncertainties (e.g., Déqué et al. 2007). Recently, increasing computational resources have made convection-permitting climate simulations (CPCSs) feasible. These simulations operate on horizontal grid spacings of < 4 km (e.g., Prein et al. 2015) and enable the explicit simulation of deep convection.

An advantage of CPCSs is their ability to improve the simulation of the convective precipitation diurnal cycle [see Prein et al. (2015) for a review]. This demands a realistic simulation of the evolution and propagation of deep convection. Our 4 km grid spacing model is too coarse to realistically simulate single cell thunderstorms. However, Mesoscale Convective Systems [MCSs, Houze (2004)], which consist of a complex of thunderstorms that become organized, can be captured. In North America, these storms include squall lines (Rotunno et al. 1988), which are storms arranged in a line along an evaporatively generated near-surface cold outflow, and Mesoscale Convective Complexes with a large circular cloud shield (Maddox 1980). Despite the advantages of CPCSs, a kilometer scale horizontal grid spacing can only resolve large-scale convective motions and has been shown to result in large convective cells that do not entrain midlevel air (Bryan and Morrison 2012). A more realistic simulation of turbulent entrainment and up/down drafts would demand reducing the grid spacing by an order of magnitude, which is not affordable with current computer resources.

To identify MCSs in observations and our simulation we use a new variation of a feature-based verification method called the method for object-based diagnostic evaluation (MODE) (Davis et al. 2006, 2009) that incorporates the time dimension [MODE time-domain or short MTD; Clark et al. (2014)]. MTD is part of the Developmental Testbed Center’s (DTC) Model Evaluation Tools (MET; the current version is available online at http://www.dtcenter.org/met/users/downloads/). MTD outputs all basic statistics included in MODE such as object size, location, and orientation, as well as information on the entire life cycle of objects including speed, changes in size and intensity, lifetime, and track length.

MODE has been used successfully to detect biases in short-term weather forecasts (e.g., Davis et al. 2006, 2009; Clark et al. 2012; Mittermaier and Bullock 2013). It was reported that coarse-resolution forecast models produce too many large rain areas and underestimate rainfall intensities (e.g., Davis et al. 2006; Wernli et al. 2008). Convection-permitting weather forecasts show substantial improvements over coarse-resolution forecasts and reduce the size and intensity biases (Wernli et al. 2008; Davis et al. 2009). Similar improvements are found in CPCSs (Prein et al. 2013a; Brisson et al. 2015).

Using a feature-tracking algorithm, Chang et al. (2016) evaluated a 12 km resolution climate model simulation during June, July, and August (JJA) over the US and found similar model biases as in low-resolution weather forecasts, i.e., an underestimation of hourly precipitation intensities and an overestimation of the rainfall area. Using the MTD tracking algorithm, Clark et al. (2014) showed that convection-permitting weather forecasts are able to capture the spatial distribution, lifetime, and the diurnal cycle of MCSs but underestimate their translation speed.

In this manuscript we assess whether CPCSs are able to simulate MCSs with a quality similar to that of short-term numerical weather forecasting models. The goal is to understand whether CPCSs can be used to analyze the impact of climate variability and climate change on North American MCSs.

2 Data and methods

2.1 Model setup and observational data

Fig. 1

Filled contours showing the model orography in the simulation domain (gray area) and the evaluation region (colored area). The red polygons show the outlines of the investigated climate regions. The computational domain contains \(1360 \times 1016 \times 51\) grid cells

The Weather Research and Forecasting (WRF) model Version 3.4.1 (Skamarock and Klemp 2008) is used to downscale the European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERA-Interim) (Dee et al. 2011) over large parts of North America to 4-km horizontal grid spacing (Liu et al. 2016; see Fig. 1). The model domain has a size of \(1360 \times 1016\) grid points and 51 stretched vertical levels topped at 50 hPa. The simulation covers 13 water years starting in October 2000 and ending in September 2013.

The main physics packages used in our WRF simulation are the Thompson aerosol-aware microphysics (Thompson and Eidhammer 2014), the Yonsei University (YSU) planetary boundary layer scheme (Hong et al. 2006), the Rapid Radiative Transfer Model for GCMs (RRTMG) (Iacono et al. 2008), and the improved Noah-MP land surface model (Niu et al. 2011). Additionally, an upgraded lake water temperature treatment is implemented and spectral nudging (von Storch et al. 2000; Miguez-Macho et al. 2004) of temperature, horizontal wind speed, and geopotential height is applied. Only wavelengths larger than ~ 2000 km above the planetary boundary layer are nudged with a moderate nudging strength (coefficient) corresponding to an e-folding time of about 6 h (Liu et al. 2016). The nudging ensures that synoptic-scale features are similar to the observations while sub-synoptic-scale processes such as the upscale growth and organization of MCSs can freely evolve. The final model setup is based on a series of sensitivity tests that incorporated model physics (Liu et al. 2011, 2016) and model grid spacing (Ikeda et al. 2010; Prein et al. 2013b). Additional information about the simulation can be found in Liu et al. (2016).

This simulation was evaluated in previous studies that found overall good performance in capturing the annual/seasonal/sub-seasonal precipitation and surface temperature climatology except for a summer dry and warm bias in the Central US, wet biases in the Southeast and Southwest during summer, and a dry bias in the Deep South (Liu et al. 2016). JJA hourly precipitation extremes, defined as the 99.95 percentile of dry and wet hours, are underestimated by ~ 30 % except along the Gulf and Atlantic coastline (Prein et al. 2017).

For the model evaluation, we use the National Centers for Environmental Prediction (NCEP) stage-IV analysis (Fulton et al. 1998; Nelson et al. 2016) that provides hourly precipitation on a 4-km Contiguous United States (CONUS) grid based on radar and gauge reports. Since stage-IV has quality issues over the western half of the CONUS, over ocean regions, and prior to 2002 [e.g., Nelson et al. (2016)], we constrain the analysis to land regions east of the Continental Divide for the period January 2002 to September 2013 (see colored area in Fig. 1). For the MTD analysis, we regridded the simulated precipitation field to the stage-IV grid by conserving total precipitation. Then we applied the mask shown in Fig. 1 to the stage-IV and the modeled hourly precipitation. As a final step, we masked all grid cells in the model data that are flagged as missing in stage-IV. We performed this data preprocessing to minimize the effect of errors and limitations of the observations on our results.
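The masking steps described above can be illustrated with a minimal sketch. It assumes that both fields are already on the stage-IV grid, that missing observations are encoded as `None`, and that the evaluation region is given as a boolean array; the function name `apply_common_mask` and these encodings are illustrative assumptions, not part of the original processing chain.

```python
def apply_common_mask(obs, model, in_region):
    """Mask observed and modeled fields consistently.

    A grid cell is kept only if it lies inside the evaluation region
    and the observation is not flagged as missing (None). All other
    cells are set to None in BOTH fields, so that data gaps in the
    observations cannot bias the comparison.
    """
    masked_obs, masked_model = [], []
    for o_row, m_row, r_row in zip(obs, model, in_region):
        keep = [inside and o is not None for o, inside in zip(o_row, r_row)]
        masked_obs.append([o if k else None for o, k in zip(o_row, keep)])
        masked_model.append([m if k else None for m, k in zip(m_row, keep)])
    return masked_obs, masked_model


# Hypothetical 2 x 2 example (values in mm per hour).
obs = [[1.0, None], [2.0, 3.0]]
model = [[0.5, 4.0], [1.5, 2.5]]
in_region = [[True, True], [False, True]]
masked_obs, masked_model = apply_common_mask(obs, model, in_region)
```

Applying the same mask to both datasets ensures that every later statistic is computed from an identical set of grid cells and hours.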

2.2 MODE time domain (MTD)

MTD automatically identifies objects in a spatial field and tracks them over time. MCSs can be identified in multiple atmospheric fields such as mid-tropospheric vorticity (Wang et al. 2011), cloud-related fields such as cloud top temperatures (Feng et al. 2016), or precipitation (Clark et al. 2014). Here we use hourly precipitation because of its high socioeconomic relevance and the availability of the stage-IV dataset that allows a sound model evaluation. Note that MCS characteristics depend on the investigated variable. For example, tracking MCS precipitation will typically lead to shorter MCS lifetimes than tracking mid-tropospheric vorticity.

In the first step, MTD smooths hourly precipitation fields by averaging over all grid cells within a square window defined by a user-defined smoothing radius. The smoothing makes the precipitation areas more contiguous and helps to filter out small, weak storms that are smaller than the effective resolution (four to eight times the horizontal grid spacing) of the simulation. The second MTD step involves the application of a user-defined threshold to the smoothed field. All precipitation values below this threshold are masked. As a result of the smoothing and thresholding, only MCSs are identified, whereas smaller storms that are not well resolved by the model, such as airmass convection and supercell thunderstorms, are masked. In the third step, contiguous precipitation regions are identified and are assigned an identifying number. Contiguous precipitation areas are defined as grid cells that are adjacent in space and time (plus or minus one time step). A more detailed description of MTD can be found in the MET Users Guide (http://www.dtcenter.org/met/users/docs/users_guide/MET_Users_Guide_v6.0.pdf).
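The three steps described above can be sketched in a few lines of pure Python. This is an illustration of the general idea only and not the MET implementation: the square averaging window clipped at the domain edges, the use of face neighbors in space and time as the adjacency rule, and the function names `smooth` and `label_objects` are all assumptions made for this sketch.

```python
from collections import deque

# Face neighbors in (time, y, x); an assumed adjacency rule for this sketch.
NEIGHBORS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]


def smooth(field, radius):
    """Step 1: average each cell over a square window of half-width `radius`."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            vals = [field[jj][ii]
                    for jj in range(max(0, j - radius), min(ny, j + radius + 1))
                    for ii in range(max(0, i - radius), min(nx, i + radius + 1))]
            out[j][i] = sum(vals) / len(vals)
    return out


def label_objects(fields, radius, threshold):
    """Steps 2 and 3: threshold the smoothed fields and label space-time objects.

    `fields` is a list of hourly 2D precipitation fields. The result maps
    each retained (t, j, i) cell to an integer object identifier.
    """
    # Step 2: values below the threshold are masked (False).
    masks = [[[v >= threshold for v in row] for row in smooth(f, radius)]
             for f in fields]
    nt, ny, nx = len(masks), len(masks[0]), len(masks[0][0])
    labels, next_id = {}, 0
    # Step 3: breadth-first flood fill over adjacent cells in space and time.
    for t in range(nt):
        for j in range(ny):
            for i in range(nx):
                if not masks[t][j][i] or (t, j, i) in labels:
                    continue
                next_id += 1
                labels[(t, j, i)] = next_id
                queue = deque([(t, j, i)])
                while queue:
                    ct, cj, ci = queue.popleft()
                    for dt, dj, di in NEIGHBORS:
                        n = (ct + dt, cj + dj, ci + di)
                        if (0 <= n[0] < nt and 0 <= n[1] < ny and 0 <= n[2] < nx
                                and masks[n[0]][n[1]][n[2]] and n not in labels):
                            labels[n] = next_id
                            queue.append(n)
    return labels
```

Because the flood fill also steps forward and backward in time, rain areas that overlap from one hour to the next receive the same identifier, which is what turns hourly rain areas into tracked objects.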

Most results presented in this study are derived by using a smoothing radius of eight grid cells (~ 32 km) and a threshold of 5 mm h\(^{-1}\) on hourly precipitation accumulations (unless otherwise noted). To test the sensitivity of our results to the MTD settings we repeated all analyses with a threshold of 2.5 mm h\(^{-1}\) and smoothing radii of 16 and 32 grid cells. The impacts of these settings are discussed and examples are shown where needed. In general, increasing the smoothing radius and the threshold leads to fewer and smaller objects. As an additional constraint, we only analyze 3D objects that have a minimum volume of 2000 grid cells with precipitation above the threshold, which results in the detection of moderate to large-scale MCSs that have a lifetime of several hours up to days (e.g., a lifetime of 10 h and an average area of 3200 km\(^2\), roughly \(15 \times 15\) grid cells). Because of limited computational resources, we had to perform MTD analyses on a monthly basis. This means that an object that begins on the last day of a month and continues into the first day of the next month is treated as two separate objects. The effect on our results should be nonsystematic and insignificant because the truncation only occurs every ~ 700 time steps (hours in a month) whereas the lifetime of MCSs is on average ~ 10 h.
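The numbers quoted above can be checked with a short calculation. The sketch below uses the 4-km grid spacing, the 2000-cell minimum volume, the ~ 10 h lifetime, and the ~ 700 hourly time steps per month given in the text. The estimate of the share of objects cut by a month boundary additionally assumes that genesis times are uniformly distributed within a month, which is an assumption of this sketch and not a result of the study.

```python
import math

GRID_SPACING_KM = 4.0


def mean_area_km2(volume_cells, lifetime_h, dx_km=GRID_SPACING_KM):
    """Mean hourly area of an object with a given space-time volume."""
    return volume_cells / lifetime_h * dx_km ** 2


def equivalent_square_side_cells(area_km2, dx_km=GRID_SPACING_KM):
    """Side length, in grid cells, of a square with the given area."""
    return math.sqrt(area_km2) / dx_km


def truncated_fraction(mean_lifetime_h, hours_per_month):
    """Rough share of objects that cross a month boundary (uniform genesis)."""
    return mean_lifetime_h / hours_per_month


area = mean_area_km2(2000, 10)             # 3200.0 km^2
side = equivalent_square_side_cells(area)  # about 14 grid cells
share = truncated_fraction(10, 700)        # about 0.014, i.e., 1 to 2 %
print(area, round(side, 1), round(share, 3))
```

Under these assumptions only about one to two percent of the objects are split at a month boundary, which supports the expectation that the monthly processing has little effect on the statistics.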

Since we are evaluating climate simulations we do not demand that individual MCSs are correctly captured by our model. Instead, we assess the model’s ability to reproduce the observed MCS climatology in four climate regions: Midwest, Southeast, Mid-Atlantic, and Northeast (Fig. 1). An MCS is assigned to the climate region where it spends more than half of its lifetime in terms of location of the MCS center.
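The assignment rule above can be written as a small function. The sketch assumes that the region containing the MCS center has already been determined for every hour of the track (with `None` for hours outside all four regions); the function name `assign_region` and this input format are hypothetical.

```python
from collections import Counter


def assign_region(hourly_regions):
    """Return the region in which the MCS center spends more than half
    of its lifetime, or None if no region reaches a strict majority."""
    counts = Counter(r for r in hourly_regions if r is not None)
    if not counts:
        return None
    region, hours = counts.most_common(1)[0]
    return region if hours > len(hourly_regions) / 2 else None


# Hypothetical 10-h track: 6 h in the Midwest, 4 h in the Southeast.
print(assign_region(["Midwest"] * 6 + ["Southeast"] * 4))
```

With a strict majority rule, an MCS that spends exactly half of its lifetime in each of two regions is not assigned to either; how such ties were handled in the analysis is not stated in the text.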

Fig. 2

An example of tracking an MCS with MTD that occurred on 12 March 2007, in Texas. Shown are observed (a) and modeled (c) hourly precipitation at the first, sixth, 11th, and 21st hour after the MCS genesis (light red to dark red contours detected by MTD in a, c). The MTD results of the temporal and spatial development of the observed (b) and modeled (d) MCS precipitation show a complex system of convection organization and dissolution. The arrow in the upper right corner (a, c) shows the direction of view in b and d, and the gray shaded areas on the xz/yz plane show the projection of the MCS to the longitude/latitude axis

An example MTD analysis of an observed and modeled MCS that occurred on 12 March 2007, in Texas is shown in Fig. 2. The maps show the MCS at different stages of its lifetime from one hour after its origin to its dissolution more than 20 h later (Fig. 2a, c). The system originated from multiple isolated small cells. After 6 h the scattered convective cells merged into a single object. At hour 21 the MCS weakened substantially and dissolved into multiple small cells. The model captures the main characteristics of the MCS well. Since we are investigating a climate simulation the aim is not to reproduce the observed storm perfectly but to show the model’s ability to reproduce realistic MCS dynamics and precipitation structures. The three-dimensional representation of the MCS in Fig. 2b, d further highlights the similarity of the observed and modeled system and the capability of MTD in tracking complex MCSs. A three-dimensional animation of the observed and simulated MCS can be found in the electronic supplement.

3 Results

3.1 MCS frequency, location, and movement

Fig. 3

Lines show observed (a) and modeled (b) MCS tracks in JJA and their color-coding corresponds to the MCS’s maximum hourly precipitation. The insets (red box in a, b) show tracks in the central US. Relative differences in the MCS track densities (modeled minus observed MCSs crossing a 100 km × 100 km region) are shown in c

Fig. 4

As Fig. 3 but results are shown for MCSs in May, June, July, and August separately (left to right column)

All JJA observed and modeled MCS tracks between 2002 and 2013 are shown in Fig. 3a, b. The model is able to capture the observed gradient of maximum MCS precipitation with frequent intense MCSs (above 90 mm h\(^{-1}\) maximum) along the Gulf and southern Atlantic Coast region and a decrease in frequency and intensity inland and towards the north. The large number of MCSs allows us to perform statistically robust analyses. The simulation generally underestimates the MCS track density in the Central U.S. by up to 70 % (Fig. 3c–e). It overestimates the track density by more than 90 % along the Southeast coast, the Appalachian region, and parts of the border to Canada. The overestimation can partly be attributed to deficiencies in the observational dataset since precipitation systems in the mountains are not well captured with US radars and the merging of Canadian radar data into the stage-IV dataset is error prone (Zhang et al. 2016). The biases in JJA track density are closely related to biases in mean precipitation, which is up to 50 % underestimated in the central US and 50 % overestimated in the Southeast and Mid-Atlantic region (Liu et al. 2016). The model performance in simulating MCS frequencies has a distinct annual cycle (Fig. 4). The central US low bias emerges during July and August whereas biases in May and June are moderate. Frequency biases in the Southeast are always positive but also intensify in late summer.
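The relative track density difference discussed above (modeled minus observed MCSs crossing a 100 km × 100 km region) can be sketched as follows. The sketch assumes that track positions are available as hourly MCS center coordinates in a projected coordinate system in km, and it counts a track in a box only if one of its hourly positions falls inside the box. Both points are simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter


def track_density(tracks, box_km=100.0):
    """Count how many tracks touch each box of size box_km x box_km.

    `tracks` is a list of tracks; each track is a list of (x_km, y_km)
    hourly center positions. Each track is counted once per box.
    """
    density = Counter()
    for track in tracks:
        boxes = {(int(x // box_km), int(y // box_km)) for x, y in track}
        density.update(boxes)
    return density


def relative_difference(model, obs):
    """Percent difference (model minus observed) for boxes with observed tracks."""
    return {box: 100.0 * (model.get(box, 0) - n) / n for box, n in obs.items()}


# Hypothetical example with four observed and two modeled tracks.
obs = track_density([[(10, 10), (120, 10)], [(20, 20)], [(30, 30)], [(40, 40)]])
mod = track_density([[(50, 50)], [(150, 50), (160, 50)]])
print(relative_difference(mod, obs))
```

Boxes without any observed track are skipped here because the relative difference is undefined when the observed count is zero.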

Fig. 5

As Fig. 3c but for MTD smoothing radius set to 8, 16, and 32 grid cells (~ 32, 64, and 128 km; left to right) and the hourly precipitation threshold to 2.5 mm h\(^{-1}\) and 5 mm h\(^{-1}\) (top down)

A sensitivity analysis of the effects of the MTD threshold and smoothing radius was performed (Fig. 5). The basic bias patterns are similar for all setups but the biases clearly intensify if either the higher (5 mm h\(^{-1}\)) threshold or a larger smoothing radius is used. This suggests that extremely strong and large MCSs are underrepresented in the central US and overestimated along the Gulf and Atlantic coast. The combination of a 5 mm h\(^{-1}\) threshold and a 128 km smoothing radius results in very few large and intense MCSs that do not allow robust statistical analysis because of their small sample size (Fig. 5f).

Fig. 6

MCS density in areas of 100 × 100 km for all MCSs between 0–6 UTC (evening), 6–12 UTC (night), 12–18 UTC (morning), and 18–24 UTC (daytime) from left to right. Shown are the observed (a–d) and modeled (e–h) MCS density and their difference (i–l). The Spearman correlation coefficient (SP) is shown above the difference maps (i–l)

There is a pronounced diurnal cycle in the occurrence of observed MCSs with a nighttime maximum in the central US and daytime/afternoon maxima in coastal regions (Fig. 6a–d), consistent with the observed variation in precipitation (Dai et al. 1999). The model shows some nocturnal MCS activity in the central US but with a substantially lower amplitude (Fig. 6f, j). MCS amounts are overestimated along the Atlantic coast, especially in the Mid-Atlantic region during the afternoon (Fig. 6d, h, l). The pattern correlation coefficients between the observed and simulated MCS density are highest in the afternoon (0.74; Fig. 6d, h, l) and lowest during the night (0.6; Fig. 6b, f, j). Model and observed MCS frequencies agree better when the diurnal cycle of occurrence probability is compared (see Fig. A1) since systematic frequency biases are not accounted for. This shows that the model is able to reproduce the phase of the MCS frequency diurnal cycle but has biases in capturing its amplitude.

Fig. 7

Average observed and modeled (a, b) direction and speed of the MCS movement in 2° × 2° regions indicated by the direction and length of arrows, respectively. The arrow color shows the mean speed of the MCSs. For example, a short red arrow indicates MCSs that are fast but move in different directions while a long blue arrow shows slower MCSs that move in the same direction. c and d show the observed and modeled annual average number of MCS genesis events in areas of 100 × 100 km

The average MCS translation direction and speeds are comparable between the observation and simulation (Fig. 7a, b). Observed MCSs move eastward with velocities of ~ 40–50 km h\(^{-1}\) in the northern part of the study region (above ~ 40° N). The MCS movement displays a southward component east of ~ 90° W. In the southern part of the domain (below ~ 40° N) MCSs move slower with typical velocities below 30 km h\(^{-1}\). Over Texas, MCSs propagate southeast while east of the Appalachians MCSs move towards the northeast. The Appalachians are a transition region between the fast southeastward moving MCSs on the western side and the slower northeast-moving systems on the eastern side. Simulated MCSs propagate faster in the Northeast (at ~ 44° N, ~ 72° W) and the northeast movement east of the Appalachians is less pronounced (at ~ 37° N, ~ 82° W).

The hot spots of MCS genesis are found in the central U.S. and the eastern parts of South and North Carolina (Fig. 7c, d). The hot spot in the eastern part of the Carolinas is captured in the simulation but the genesis frequency is overestimated. The missing genesis hot spot in the central US is clearly related to model deficiencies.

Fig. 8

The monthly mean annual cycle of the numbers of MCSs (a–d), mean precipitation (e–h), and percentage of precipitation from tracked MCSs relative to regional total monthly precipitation (i–l). Shown are the Southeast, Midwest, Mid-Atlantic, and Northeast region (left to right). The shadings show the interquartile range of the interannual variability. Black lines and blue shadings correspond to the observations while red lines and shadings show simulated results

The simulated and observed numbers of MCSs are similar during winter and spring in the Midwest (Fig. 8b). However, simulated MCS occurrence rapidly decreases from June to July whereas observed MCS occurrence does not (see also Fig. 4). In July and August, the modeled MCS counts are 75 % too low compared to the observations. A consequence of this bias is a significant underestimation of mean precipitation during July, August, and September (Fig. 8f; see also Liu et al. (2016)). Also, the ratio of MCS precipitation to total precipitation is too low (Fig. 8j). In the other regions, the differences are smaller and generally within the inter-annual variability. Results derived with a precipitation threshold of 2.5 mm h\(^{-1}\) and different smoothing radii are similar.

3.2 MCS characteristics

Fig. 9

Snapshots of hourly precipitation from the eight MCSs with highest hourly maximum precipitation rates in the Midwest during JJA in the observation (left panel; c–j) and the simulation (right panel; k–r). The dates of the extremes are shown above each panel. The average of the 40 MCSs with highest maximum precipitation rates in the observation/model is shown in a or b

Hourly precipitation snapshots of the eight most intense MCSs during JJA in the Midwest are shown in Fig. 9. The simulated MCSs (Fig. 9k–r) have realistic features, such as shape, size, and intensity, that are visually not distinguishable from observed MCSs (Fig. 9c–j). Note that this is a climatological viewpoint rather than a direct comparison as the observed and modeled MCSs occurred on different days. All modeled and observed MCSs reach precipitation maxima beyond 100 mm h\(^{-1}\). The observed and modeled size of the area where 60 mm h\(^{-1}\) is exceeded is on average 400 km\(^2\). Averaging the precipitation of the 40 observed MCSs with highest maximum hourly precipitation indicates that MCSs typically have an oval shape with the main axis zonally rotated by 30° (Fig. 9a). Note that parts of this shape can be attributed to the prevailing eastward movement of MCSs (see Fig. 7). The precipitation composite of the 40 most intense modeled MCSs is more circular and its main axis is rotated by 40° and is therefore slightly more tilted than in the observations. The composite mean intensities and the covered area are similar. MCSs typically reach their maximum hourly rain rate during the first half of their lifetime after a phase of rapid intensification and organization of convection.

Fig. 10

Probability density functions (PDFs) for the hourly MCS size (a–d), speed (e–h), maximum precipitation (i–l), mean precipitation (m–p), and total precipitation (q–t). PDFs for the MCS lifetime and track length are shown in u–x and y–B respectively. Results for the Southeast, Midwest, Mid-Atlantic, and Northeast are shown from left to right (see inlays in a–d). The numbers in the panels denote the sample sizes that were used to construct the PDFs (black/red numbers show observation/model results). A Gaussian kernel density estimate was applied to estimate the PDFs from the empirical density functions. The shaded contours show estimates of the 1–99 percentile sampling uncertainty based on 100 bootstrap samples. Observed/simulated results are shown in black/red

A more systematic comparison between the modeled and observed MTD MCS characteristics in the four climate regions is shown in Fig. 10. The size of the MCSs is very well captured especially in the Southeast (Fig. 10a) and Midwest region (Fig. 10b). Modeled MCSs tend to be too small in the Mid-Atlantic and Northeast region (Fig. 10c, d). However, some MCSs in the stage-IV observational dataset are spuriously large and intense in these regions and are likely erroneous. The typical precipitation areas have the size of small US states (e.g., Connecticut or Vermont; 10 000 to 20 000 km\(^{2}\)).

The translation speed of the MCSs (movement of the MCS center from one hour to the next) is remarkably well simulated and no significant differences can be detected (Fig. 10e–h). The fastest MCSs occur in the Midwest region where propagation speeds of up to 100 km h\(^{-1}\) are observed and modeled while the slowest MCSs occur in the Southeast. The good agreement can partly be explained by the use of spectral nudging to observed large-scale flow patterns but is also related to the correct simulation of mesoscale dynamics as we will show later.
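The translation speed defined above (movement of the MCS center from one hour to the next) can be computed from consecutive center positions. The sketch below assumes that the centers are given as latitude/longitude pairs at hourly intervals and uses the great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km. How MTD defines the object center and measures the distance is not specified in the text, so both choices are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def hourly_speeds(centers):
    """Speeds in km per hour from hourly (lat, lon) center positions."""
    return [haversine_km(*a, *b) for a, b in zip(centers, centers[1:])]


# Hypothetical track of an eastward-moving MCS center near 40 N.
track = [(40.0, -95.0), (40.0, -94.5), (40.1, -94.0)]
print([round(v, 1) for v in hourly_speeds(track)])
```

Because the positions are one hour apart, the distance between two consecutive centers in km equals the speed in km h\(^{-1}\).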

The simulated maximum MCS hourly precipitation is slightly overestimated but the differences are not significant except for the Southeast and Mid-Atlantic region (Fig. 10i, k). Prein et al. (2017) showed that the model underestimates hourly extreme precipitation, defined as the 99.95 percentile of dry and wet hours, by up to 30 % in the central US during summer. The advantage of the feature based evaluation presented here is that the individual components of the bias can be identified. The low bias shown in Prein et al. (2017) is caused by an underestimation of the frequency of MCSs. If maximum hourly precipitation from MCSs is evaluated the model overestimates the intensities by 5 % to 25 %. This is in the range of observational uncertainties that are typically ~ 20 % caused by rain gauge under-catch (Duchon and Essenberg 2001).

The distribution of the MCS mean (Fig. 10m–p) and total precipitation (Fig. 10q–t) is also well captured by the model. The largest deviations occur in the Mid-Atlantic and Northeast region. Biases in the total precipitation are related to the biases in the MCS size (compare Fig. 10c, d with s, t). The model can simulate realistic MCSs with total precipitation values of up to 100 000 m\(^{3}\) s\(^{-1}\), which is equivalent to half of the average discharge of the Amazon river or six times the discharge of the Mississippi.
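The magnitude of these volume fluxes can be made tangible with a unit conversion. The sketch below assumes that the total precipitation of an MCS at a given hour is the area-mean rain rate multiplied by the precipitation area. The example values (18 mm h\(^{-1}\) over 20 000 km\(^{2}\)) are hypothetical and chosen only to reproduce the upper end of the range quoted above.

```python
def volume_flux_m3_per_s(mean_rate_mm_per_h, area_km2):
    """Convert an area-mean rain rate over a given area to m^3 per second.

    1 mm of rain over 1 km^2 equals 1000 m^3, and 1 h equals 3600 s.
    """
    return mean_rate_mm_per_h * area_km2 * 1000.0 / 3600.0


# Hypothetical intense MCS: 18 mm/h averaged over 20 000 km^2.
flux = volume_flux_m3_per_s(18.0, 20000.0)
print(flux)  # 100000.0
```

Taking the comparison in the text at face value, a flux of 100 000 m\(^{3}\) s\(^{-1}\) implies an average Amazon discharge of about 200 000 m\(^{3}\) s\(^{-1}\) and a Mississippi discharge of roughly 17 000 m\(^{3}\) s\(^{-1}\).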

The lifetime of MCSs is on average 10 hours but can reach more than a day in rare cases (Fig. 10u–x). The model almost perfectly reproduces MCS lifetimes in all regions. A similar performance can be seen for the track lengths of MCSs (Fig. 10y–B). Especially in the Midwest, MCSs can travel for vast distances of up to 1500 km, which is a third of the east-west extent of the CONUS. The model’s ability to simulate MCS characteristics typically deteriorates when a larger smoothing radius and the higher (5 mm h\(^{-1}\)) threshold are used in MTD (Fig. A2) but Perkins skill scores (overlapping area of observed and modeled PDFs, Perkins et al. (2007)) are typically larger than 0.75 and show high model skill for all tested MTD settings.
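The Perkins skill score mentioned above measures the overlap of two PDFs: it sums, over all bins, the smaller of the observed and modeled relative frequencies, so that a score of 1 means identical distributions and 0 means no overlap. The following minimal sketch uses simple histograms with user-chosen bin edges. The binning and the function name are assumptions of this sketch and may differ from the procedure used to obtain the scores reported here.

```python
def perkins_skill_score(obs, model, bin_edges):
    """Overlap of two binned distributions: sum of bin-wise minima."""

    def rel_freq(sample):
        counts = [0] * (len(bin_edges) - 1)
        for v in sample:
            for k in range(len(counts)):
                is_last = k == len(counts) - 1
                # Bins are half-open, except that the last bin includes its upper edge.
                if bin_edges[k] <= v < bin_edges[k + 1] or (is_last and v == bin_edges[-1]):
                    counts[k] += 1
                    break
        total = sum(counts)  # assumes at least one value falls inside the bins
        return [c / total for c in counts]

    f_obs, f_mod = rel_freq(obs), rel_freq(model)
    return sum(min(a, b) for a, b in zip(f_obs, f_mod))


# Hypothetical lifetimes in hours, binned into 0-2 h and 2-4 h.
print(perkins_skill_score([1, 1, 3, 3], [1, 3, 3, 3], [0, 2, 4]))  # 0.75
```

In this example half of the observed but only a quarter of the modeled values fall into the first bin, so the two distributions share 0.25 + 0.5 = 0.75 of their probability mass.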

Fig. 11

Scatter plot matrix of MCS properties in the Midwest region. The numbers in the upper right of each panel show the Spearman’s rank correlation coefficients and the lines show locally weighted linear regression (LOWESS) curves (Cleveland 1979). Black/red colors show results from the observation/model

Important for a realistic simulation of MCSs is not only the accurate simulation of single MCS characteristics but also the replication of their relationships. Generally, the model is able to reproduce the observed relationships very well in the Midwest region (Fig. 11). There is no correlation between the MCS speed and any other MCS property (Fig. 11a, c, e, h). The simulated maximum hourly precipitation is weakly correlated (\(r=0.4\)) with MCS size, but the observations do not support this correlation (Fig. 11b). Maximum precipitation is also correlated with the mean precipitation (Fig. 11f) and total precipitation (Fig. 11i). The former relationship is non-linear, and the increase in maximum precipitation slows at around 10 mm h\(^{-1}\) mean precipitation. This result is consistent in all regions (not shown). The highest correlation occurs between total precipitation and size with a correlation coefficient of 0.62 in the observations and 0.97 in the model. This high correlation is consistent with studies that showed a high correlation between the area-time integral of precipitation from convective storms and their rainfall volume (e.g., Doneaud et al. 1984; Lopez et al. 1989). This result might seem trivial since larger MCSs can precipitate more but the relationship is much less clear in the Southeast region and correlation coefficients are below 0.2 in the model and the observations (see Fig. A3). Varying the threshold and smoothing radius in the MTD analysis leads to very similar relationships between the object properties (not shown).
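The correlations discussed above are Spearman’s rank correlation coefficients (Fig. 11), i.e., Pearson correlations computed on the ranks of the data. The sketch below is a minimal pure-Python version in which tied values share their average rank. It is meant to illustrate the statistic and is not the code used for the analysis; the sample values are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical MCS sizes (km^2) and total precipitation (m^3/s).
size = [8000, 12000, 15000, 21000]
total = [9000, 20000, 18000, 40000]
print(round(spearman(size, total), 2))
```

Because only ranks enter the calculation, the coefficient captures monotonic but non-linear relationships such as the one between maximum and mean precipitation.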

Fig. 12

Development of MCS maximum precipitation, size, and velocity (a–c) as a function of MCS duration in the Midwest region. Statistics for short-lived MCSs (blue/orange; lifetime shorter than 10 h) and long-lived MCSs (red and black; lifetime between 10 and 20 h) are shown. The shading/error bars show the interquartile spread in the sample. Observed results are shown in cold colors and simulated results in warm colors

Finally, we analyze the dynamic evolution of the MCSs in the Midwest. There is a rapid intensification in the first hours after the MCS genesis in which the observed and modeled maximum precipitation increases by 50 % (Fig. 12a). In their mature state, the MCS’s maximum precipitation is almost constant for approximately 2–3 h in short-lived MCSs (lifetime < 10 h) and rapidly decreases thereafter. Long-lived MCSs (lifetime > 10 h and < 20 h) can maintain high precipitation intensities for up to 5 h and show a steady decay afterward. Also, the MCS size increases rapidly and shows more than a five-fold increase within the first two hours after the MCS genesis (Fig. 12b). At this time short-lived MCSs reach their maximum size and start to steadily decay while long-lived MCSs continue growing for another 3 h. The decay in MCS size occurs in conjunction with reduced maximum precipitation rates. There is a clear increase in MCS propagation speed during the intensification phase from 30 km h\(^{-1}\) to 45 km h\(^{-1}\) (Fig. 12c). Afterward, the speed stays approximately constant. This might be related to the progression of mesoscale organization of deep convection and evolving cold pool dynamics within the MCSs. Overall, the model closely reproduces the dynamical development of MCSs in the Midwest. This is very encouraging because it emphasizes that the model can capture fundamental processes such as the organization of convection and the interaction of mesoscale processes with the large scale flow realistically. Similar performances are found in other regions (see Fig. A4).

3.3 Sources of MCS frequency biases

Fig. 13

ERA-Interim 700 hPa geopotential height anomalies and wind on days with observed and modeled MCSs (hit, a, e, i, m), modeled but not observed (false alarm, b, f, j, n), observed but not modeled (missed, c, g, k, o), and neither observed nor modeled (null events, d, h, l, p). The percentage of days within each category is shown above the panels. Results for May, June, July, and August are shown from the top down. Anomalies are calculated compared to monthly climatologies. Hatching shows grid cells that have significantly different mean geopotential heights than those on hit days [5 % confidence level according to the Mann–Whitney U test (Mann and Whitney 1947)]

The significant underestimation of MCS frequency during late summer in the central U.S. is partly related to missing MCS genesis in the High Plains (see Fig. 7). In this section, we investigate ERA-Interim 700 hPa geopotential height anomalies to understand the large-scale conditions that produce or inhibit the development of MCSs in the model during May, June, July, and August (Fig. 13).

Cases where the model is accurately simulating MCSs are typified by negative 700 hPa geopotential height anomalies over the western US and positive anomalies in the eastern US in all investigated months (Fig. 13a, e, i, m). In this situation, MCSs form in the area of negative anomalies and propagate east with the mean flow. In contrast, cases where no MCSs are observed and simulated are typified by positive anomalies in the western US and negative anomalies in the eastern US (Fig. 13d, h, l, p). On days when the model fails to generate MCSs (missed events), geopotential height anomalies are significantly higher in the MCS development region west (upwind) of the Midwest compared to days with correctly simulated MCSs (Fig. 13c, g, k, o). This weather pattern is infrequent in May (14 %) but more than doubles in frequency during July and August (31 and 34 %) due to the generally weaker large scale forcing in late summer and the strengthening of the subtropical ridge’s influence on the central U.S. The results are very similar if the 500 hPa geopotential height is analyzed. Similar contrasting skills between moderate and weak synoptic forcing have been previously reported in short-term convection-permitting simulations of US warm-season precipitation (Liu et al. 2006).
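The four categories used in this section (hit, false alarm, missed, and null events) follow from a simple day-by-day comparison of observed and modeled MCS occurrence. The sketch below assumes that the days with at least one observed or modeled MCS in the region of interest are available as sets of day identifiers; the function names and the input format are hypothetical.

```python
from collections import Counter

CATEGORIES = ("hit", "false alarm", "missed", "null")


def classify_day(observed, modeled):
    """Contingency category for one day."""
    if observed and modeled:
        return "hit"
    if modeled:
        return "false alarm"
    if observed:
        return "missed"
    return "null"


def category_percentages(obs_days, model_days, all_days):
    """Percentage of days in each contingency category."""
    counts = Counter(classify_day(d in obs_days, d in model_days) for d in all_days)
    n = len(all_days)
    return {c: 100.0 * counts[c] / n for c in CATEGORIES}


# Hypothetical 10-day period with MCS days given by day number.
days = list(range(1, 11))
print(category_percentages({1, 2, 3, 4}, {3, 4, 5}, days))
```

Composites such as those in Fig. 13 are then obtained by averaging the geopotential height anomalies over all days that fall into the same category.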

In summary, our model underestimates MCS frequencies in weak synoptic-scale forced conditions that typically occur in late summer. In this situation the correct representation of local-scale processes such as soil–atmosphere interactions, regional-scale wind systems, and mixing in the planetary boundary layer is essential. A summertime warm and dry bias over the central US is fairly common in weather and climate models (e.g., Klein et al. 2006; Ma et al. 2014; Bellprat et al. 2016). We are currently performing sensitivity experiments to find the sources and potential solutions for these biases.

4 Summary and conclusion

We evaluate the performance of a North American-scale, convection-permitting climate model in simulating MCSs during the period 2002–2013. The application of a feature-based evaluation method provides detailed insights into the model’s ability to simulate the frequency and characteristics of MCSs. The model is able to realistically capture the main characteristics of MCSs such as their size, propagation speed, total precipitation volume, and maximum hourly precipitation rates within observational uncertainties in all investigated regions. The realistic simulation of MCS characteristics is a major advantage compared to coarser resolution climate simulations that have to parameterize deep convection (e.g., Chang et al. 2016). These results agree well with results from convection-permitting weather forecast evaluation studies (Clark et al. 2014; Davis et al. 2009). The largest biases are found in the model’s ability to reproduce the frequency of MCSs. In the Southeast and the Mid-Atlantic region, MCS frequency is up to 70 % overestimated during JJA. Similar overestimations are found in convection-permitting weather forecasting models in all US regions (Johnson and Wang 2012; Clark et al. 2014). In contrast to the forecasting biases, MCSs are underestimated by 50 % in the central US during late summer.

Assessing the sources of MCS frequency biases is critically important for future studies. The central Great Plains are an area of especially strong land–atmosphere coupling (Koster et al. 2004). An erroneous representation of land surface processes can lead to a loss of soil moisture in this region, resulting in a boundary layer that is too dry and a reduction of MCS genesis. In particular, the surface energy balance and the effect of including groundwater and irrigation should be further studied. Also, the effect of model grid spacing on MCS genesis and dynamics is not well understood and the representation of shallow convection and turbulence needs further exploration.

The accurate simulation of MCSs has significant societal benefits since MCSs produce hazardous weather events that cause US$ 20 billion of economic losses each year in the US with steadily increasing trends (Munich 2015). CPCSs can provide valuable insights into MCS dynamics and precipitation and will allow unprecedented insights into their changes in response to climate change.